The following guidelines were observed when writing the SIL source code. Note, however, that they were treated as guidelines, not rules, with the cardinal rule being "readability first".
Rationale: All modern compilers support C99 and C++11, with occasional minor exceptions that generally do not impact SIL. Modern compilers also support anonymous unions; this feature is nonstandard in C99, but it has been added to the C11 standard, so such unions will be forward-compatible. On the flip side, the C11 standard makes C99's complex number types (the <complex.h> header) and variable-length arrays optional, so those features should be avoided.
Rationale (int): In all modern PC-class environments, the int type is at least 32 bits wide. (There are some embedded processor environments which use 16-bit ints to better match the hardware's capabilities, but such environments are generally not suited to running SIL-based programs.) Modern programs frequently need to work with data larger than a 16-bit integer can hold; requiring all code to either check for 16-bit overflow or explicitly use a 32-bit type would significantly increase the risk of bugs.
Rationale (pointers): In all modern PC-class environments, pointers are simple scalar values of at least the native word size, ensuring that an int-sized value can be safely stored in and later retrieved from a pointer variable. This technique is useful in certain cases, such as when passing an integer value through an interface that takes an opaque pointer argument. While these conversions should be avoided when possible, for the purposes of SIL they may be considered safe. Note that the converse does not hold: converting a pointer to int and back can change its value!
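As a hedged sketch of the opaque-pointer case described above, an int can be routed through intptr_t on its way into and out of a void pointer. The callback type and function names here are hypothetical and are not part of SIL.

```c
#include <stdint.h>

/* Hypothetical interface: a callback which receives an opaque pointer. */
typedef int CallbackFunc(void *userdata);

/* Hypothetical callback: recovers the integer stored in the pointer. */
static int return_userdata(void *userdata)
{
    return (int)(intptr_t)userdata;
}

/* Store an int in the opaque pointer (int -> pointer -> int). */
static int call_with_int(CallbackFunc *callback, int value)
{
    return (*callback)((void *)(intptr_t)value);
}
```

Going through intptr_t makes the intent explicit and avoids compiler warnings about casts between an integer and a pointer of different sizes.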
Rationale (int): We assume that the int type is at least 32 bits wide (see type size assumptions above), so there is no need to use long or int32_t merely to ensure a 32-bit data type. Using int as widely as possible reduces the chance of accidental truncation due to conversion between types of different sizes.
Rationale (sized types): Since we assume int is at least 32 bits wide, there should rarely be a need to specify sized types for local variables. However, sized types can be useful in certain cases, such as:
Rationale (char): While char and int8_t are normally the same internal type, char should be limited to data which is actually textual in nature. For 8-bit numeric data, use int8_t (or uint8_t, but see signed vs. unsigned integers below) to indicate to the reader that the data is numeric. Note in particular that whether char is signed or unsigned depends on the compiler, so using char for a signed 8-bit integer can result in nasty surprises.
Rationale (long and short): Sized integer types make long and short generally unnecessary, but when calling library functions with long- or short-type parameters or return values, it can be more convenient to use those types directly instead of casting back and forth. However, try not to propagate such types outside the immediate locality of the library call.
Rationale: Conversions between signed and unsigned integer types are a perennial source of bugs, so much so that most modern compilers warn about mixing them. The easiest way to avoid these bugs is not to use unsigned types at all. In particular, "this value will never be negative" is not a reason to use an unsigned type; someday the value will go negative, and your code will break.
There are still a few cases in which unsigned types are beneficial:
Rationale (data types): Experience has shown that boolean flags are used sufficiently often to warrant the use of a smaller data type than int when such values are stored in memory. On the other hand, values typically stored in registers, such as function parameters and return values or local variables, do not benefit from using a smaller-sized type. Note that C99 provides the _Bool type, along with the <stdbool.h> header which defines it as bool, but this type is not guaranteed to be link-compatible with the C++ bool type, so it is unsuitable for use in a library such as SIL which may be linked with C++ code.
Rationale (assignment of values): While C treats any nonzero value as true, assigning an arbitrary value directly to a boolean variable can have unexpected results: for example, on a system where long is larger than int, the value LONG_MIN itself is nonzero and therefore true, but assigning it directly to a boolean variable (of type int or uint8_t) will result in a false value due to truncation. It is permissible to copy the value of one boolean variable to another if the first variable's value is known to be safe (either 0 or 1), but do not assume that boolean arguments passed to an interface function have safe values.
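The truncation problem can be sketched as follows, assuming a flag stored as uint8_t. The Widget type and both setter functions are hypothetical names chosen for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical structure with a boolean flag stored in memory. */
typedef struct Widget Widget;
struct Widget {
    uint8_t visible;
};

/* Unsafe: only the low 8 bits of the value survive the assignment. */
static void widget_set_visible_unsafe(Widget *widget, long visible)
{
    widget->visible = (uint8_t)visible;
}

/* Safe: the comparison always yields exactly 0 or 1. */
static void widget_set_visible(Widget *widget, long visible)
{
    widget->visible = (visible != 0);
}
```

With the unsafe version, a nonzero argument such as 256 or LONG_MIN stores 0 in the flag; the safe version stores 1 for both.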
Rationale: Computations using double values are generally more expensive than computations with float values even on systems with hardware support for double-precision floating point. Some systems implement double-precision operations in software, in which case the performance difference can reach an order of magnitude or more and significantly impact overall program performance. For the vast majority of floating-point computations typically performed by SIL-based programs, float provides sufficient precision to ensure correct behavior. Additionally, the consistent use of a single type helps avoid unexpected behavior that can result from mixing precisions, such as loss of precision when a double value is passed through a function which takes a float parameter.
An example of appropriate usage of double is in the SIL interface time_now(), which returns a double-precision timestamp in units of seconds. If a float value were returned, it would quickly lose precision to the point of preventing accurate sub-second timing. For example, using an IEEE 754-compliant single precision type, the timestamp resolution would drop to around 1 millisecond after 8192 seconds (about 2 hours); at a frame rate of 60 frames per second, this is more than 5% of a frame, and attempting to use such a low-resolution timestamp for accurate timing could result in noticeable jitter.
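The resolution figure can be checked with a small sketch. An IEEE 754 single-precision value has a 24-bit significand, so at 8192 (2^13) seconds adjacent values are 2^-10 seconds (about 0.98 milliseconds) apart. The helper names below are hypothetical, and the code assumes float and double are the IEEE 754 single and double precision types.

```c
/* Return true if a float timestamp can tell time t from time t + dt. */
static int float_can_resolve(double t, double dt)
{
    /* volatile forces each value to be rounded to single precision. */
    volatile float before = (float)t;
    volatile float after = (float)(t + dt);
    return before != after;
}

/* The same check using double-precision timestamps. */
static int double_can_resolve(double t, double dt)
{
    volatile double before = t;
    volatile double after = t + dt;
    return before != after;
}
```

A step of 0.4 milliseconds is visible to a float timestamp near 1 second, but near 8192 seconds it rounds away entirely, while a double timestamp still resolves it.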
Rationale (f suffix): It can be easy to forget that floating-point literals are double precision by default, but including a double-precision literal in an expression causes all other operands in the expression to be promoted from single to double precision, even if the expression's value is then assigned to a single-precision variable (in which case the value must then be converted again, from double to single precision). Always include the f suffix on floating-point literals to mark them as single precision, except when the literal is intended to be a double-precision value or is used in a double-precision expression.
Rationale (integer literals): Unlike double-precision literals, integer literals do not cause promotion of floating-point operands, so they are generally safe to use in floating-point expressions, and expressions may be easier to read without extraneous ".0"s on such values. However, bear in mind that if both operands to an operator are integers, the operation will be performed as an integer and may consequently overflow; values which have the potential to cause such overflow should be written as floating-point literals, including the f suffix when appropriate.
Exception: It is not necessary to include the f suffix on floating-point literals used as initializers, since such values will be converted to single precision at compile time and thus will have no impact on runtime performance.
Examples:
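A minimal sketch of the literal guidelines above; the function names are hypothetical and do not come from the SIL sources.

```c
/* The f suffix keeps the whole expression in single precision. */
static float halve(float x)
{
    return x * 0.5f;
}

/* An integer literal does not promote x to double, so "2" is fine. */
static float twice(float x)
{
    return x * 2;
}

/* "seconds * 1000000" would be an int multiplication which could
 * overflow; a float literal makes the operation floating-point. */
static float seconds_to_microseconds(int seconds)
{
    return seconds * 1000000.0f;
}
```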
Rationale: While 0 represents a zero value in any type, use of the most appropriate literal (NULL for pointers, '\0' for characters) helps remind the reader of the data type. However, it is not necessary to write 0.0 or 0.0f for floating-point values (see also floating-point literals above).
Rationale: Conditional expressions in if, for, and while statements (as well as subexpressions of the logical operators && and ||) treat any nonzero value as true, so there is no need to explicitly write "expression != 0", and it can often be more readable to omit the comparison against zero. For example, when a zero value or null pointer indicates the absence of an object, it is generally more meaningful to say "if (object)" (which can be read "if object exists") than to explicitly compare against zero, which can suggest that zero has some special meaning. Do use an explicit zero if the value zero has a specific meaning, such as the first entry in a zero-indexed menu.
If you need to store a flag for whether a value is zero, you can use the logical negation operator "!". However, do not use a double negation to test for a nonzero value; explicitly compare against zero instead.
Examples:
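A minimal sketch of these guidelines, using hypothetical helper functions:

```c
#include <stddef.h>

/* Hypothetical helper: count the entries which exist and are non-empty. */
static int count_nonempty(const char *const *strings, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        /* Reads as "if the string exists and has a first character". */
        if (strings[i] && *strings[i]) {
            total++;
        }
    }
    return total;
}

/* Logical negation is fine for storing an "is zero" flag. */
static int is_zero(int value)
{
    return !value;
}

/* For a "nonzero" flag, compare explicitly instead of writing !!value. */
static int is_nonzero(int value)
{
    return value != 0;
}
```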
Rationale: It can be convenient to use an assignment in the conditional expression of a control statement, to assign a value and test that value at the same time. However, this can also make it harder to follow the flow of the code, so assignments should only be used when that assignment is the primary purpose of the entire expression.
Also, a lone assignment expression in a conditional statement can look like a mistyped equality comparison (and indeed, many compilers will emit a warning along those lines), so always enclose the assignment in parentheses and explicitly compare for inequality to zero.
Examples:
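One possible illustration, using the standard strchr() function in a hypothetical helper. The assignment is the primary purpose of the loop condition, so it is enclosed in parentheses and compared explicitly (against NULL, since the value is a pointer).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of ch in str.  ch must not be
 * '\0', since the loop steps past each match. */
static int count_char(const char *str, char ch)
{
    int count = 0;
    const char *s = str;
    while ((s = strchr(s, ch)) != NULL) {
        count++;
        s++;
    }
    return count;
}
```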
Rationale: In the past, a common C idiom was to replace certain arithmetic operations with bitwise operations, such as replacing multiplication by a power of two with the equivalent left-shift operation, on the (generally correct) theory that bitwise operations execute more quickly than arithmetic ones. Modern compilers are perfectly capable of performing this optimization themselves, so there is almost never any need to resort to this hack. (This can also lead to subtle bugs due to precedence errors.)
Even in cases where the use of bitwise operators can make a difference, don't use them unless you've profiled the code and you're certain that the arithmetic operator is a significant bottleneck. Remember that premature optimization is the root of all evil.
Examples:
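A short sketch of the difference, with hypothetical function names. In C, the additive operators bind more tightly than the shift operators, which is the source of the precedence trap mentioned above.

```c
/* Preferred: says what is meant, and the compiler is free to emit a
 * shift instruction for the multiplication. */
static int row_offset(int row)
{
    return row * 16 + 4;
}

/* Shift version: the parentheses are required.  Without them,
 * "row << 4 + 4" would parse as "row << (4 + 4)". */
static int row_offset_shift(int row)
{
    return (row << 4) + 4;
}
```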
Rationale: const can help prevent errors resulting from accidentally assigning to the wrong variable; as a bonus, it generally helps the compiler optimize better. Use it whenever you initialize a variable that won't be changed, such as when saving the result of a function call.
const can be applied at all pointer levels of a pointer variable, but often one const is good enough. (Also note that some library functions expect const at some levels but not at others, and the constness of each level has to match.)
Examples:
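A minimal sketch with hypothetical helpers, showing const on a saved function result and const at more than one pointer level:

```c
#include <string.h>

/* Hypothetical helper: length of str, limited to max. */
static int bounded_length(const char *str, int max)
{
    /* The saved result of a function call never changes: mark it const. */
    const int length = (int)strlen(str);
    return length < max ? length : max;
}

/* const at both pointer levels: neither the array of pointers nor the
 * strings themselves are modified. */
static int total_length(const char *const *strings, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += (int)strlen(strings[i]);
    }
    return total;
}
```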
Rationale: In C, file-scope constants are not folded, so defining such a constant in a header using static const would emit a copy of the constant in every object file including the header, needlessly wasting space. Scalar constants should therefore be defined using either #define (but be careful when using macros) or enum; the advantage of enum is that the symbol is included in debug information and can be referenced in a debugger, while the disadvantages are that the syntax is slightly more obscure and that floating-point values can't be used. Constants local to a function, on the other hand, can usually be compiled directly into the instruction stream as a register load, so there is no problem with just using const. (In this case, static is unnecessary and could potentially waste space in the object file.)
Examples:
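A minimal sketch of the three cases; all constant and function names are hypothetical.

```c
/* Header-style scalar constants: enum for integers (visible in a
 * debugger), #define for floating-point values. */
enum { MAX_PLAYERS = 4 };
#define GRAVITY  9.8f

static int valid_player(int index)
{
    return index >= 0 && index < MAX_PLAYERS;
}

static float fall_speed(float seconds)
{
    /* Function-local constant: plain const, without static. */
    const float terminal_speed = 50.0f;
    const float speed = GRAVITY * seconds;
    return speed < terminal_speed ? speed : terminal_speed;
}
```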
Rationale: Although "considered harmful" by some—and indeed, injudicious use of goto can greatly impair code maintainability—the goto statement is useful in consolidating error-handling logic in a single location, and it should be used in preference to repeating the same cleanup code over and over.
Examples:
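A hedged sketch of consolidated error handling, with a hypothetical StringPair type. Each failure jumps to the label that undoes only the work completed so far, so the cleanup code appears once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a pair of owned string copies. */
typedef struct StringPair StringPair;
struct StringPair {
    char *first;
    char *second;
};

/* Returns 1 on success; on failure, returns 0 and leaks nothing. */
static int string_pair_init(StringPair *pair, const char *a, const char *b)
{
    pair->first = malloc(strlen(a) + 1);
    if (!pair->first) {
        goto error_return;
    }
    pair->second = malloc(strlen(b) + 1);
    if (!pair->second) {
        goto error_free_first;
    }
    strcpy(pair->first, a);
    strcpy(pair->second, b);
    return 1;

  error_free_first:
    free(pair->first);
  error_return:
    pair->first = pair->second = NULL;
    return 0;
}
```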
Rationale (assert early and often): Programmers are only human, and errors will creep into any nontrivial code. Assertions provide a way to check for errors at runtime before those errors cause crashes, data corruption, or other serious problems. SIL provides (in base.h) two macros, ASSERT() and PRECOND(), which can be used for this purpose. At present, the two macros are essentially identical, but PRECOND() is intended for checking function preconditions, and further use of the macro may be made for that purpose in the future, so prefer PRECOND() over ASSERT() when checking a function argument against a precondition. Use ASSERT() for all other cases.
Rationale (only impossible conditions): By writing an assertion, you are declaring (asserting) that the asserted expression must be true under every possible condition. If there is any possible condition under which the expression might be false, no matter how unlikely, do not use an assertion. For example, never assert that a memory allocation has succeeded, because there's always the possibility that the program will run out of memory (or address space) and the allocation will fail. In such cases, always implement and test proper error handling. Exceptions: You do not need to consider hardware errors such as memory or register corruption when deciding whether a condition is possible; for example, if an interface function checks the value of an argument, its helper functions do not also need to make the same check. You may also assume that system calls and other external functions behave according to their documentation.
Rationale (include fallback actions): The ASSERT() and PRECOND() macros accept an optional failure action, which is a statement (or multiple statements) which will be executed if the assertion fails and the program is not running in debug mode. While there is a school of thought which argues that the program should always abort on assertion failure because the internal state has left the designed bounds and further behavior cannot be predicted, it is often feasible to perform some sort of recovery short of terminating the entire program. For example, if a function which expects a valid file handle receives a null pointer, the function can simply return an error state as it would for an actual error with a valid file handle. This may still result in the program terminating itself with an error message, but even that is more user-friendly than simply crashing.
Rationale (avoid complex fallback actions): By their very nature, fallback actions for assertions cannot be tested like other code, since (assuming the program does not have any relevant bugs) the asserted condition will never fail, and if it did, the test (which runs in debug mode) would terminate anyway. For this reason, fallback actions should be extremely simple; often, a single return statement is sufficient. In cases where there is no simple way to recover from an assertion failure, prefer to omit the fallback action entirely, especially if there are no serious consequences from the failure.
Examples:
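The idea of a simple fallback action can be sketched as below. SKETCH_PRECOND() is a hypothetical stand-in which models only the non-debug behavior described above; it is not the definition of PRECOND() from SIL's base.h, and first_byte() is likewise an invented function.

```c
#include <stddef.h>

/* Hypothetical stand-in (NOT SIL's macro): run the fallback action if
 * the condition fails. */
#define SKETCH_PRECOND(condition, fallback) \
    do {                                    \
        if (!(condition)) {                 \
            fallback;                       \
        }                                   \
    } while (0)

/* Hypothetical function: returns the first byte of a buffer, or -1. */
static int first_byte(const unsigned char *buffer, int size)
{
    /* Callers must never pass NULL, so this is a precondition; the
     * fallback is a single return statement. */
    SKETCH_PRECOND(buffer != NULL, return -1);
    /* An empty buffer is a possible condition, so it gets ordinary
     * error handling instead of an assertion. */
    if (size <= 0) {
        return -1;
    }
    return buffer[0];
}
```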
Macros are permitted when they serve a purpose which is difficult or impossible to serve otherwise. However, be especially careful of unintended side effects when writing a macro (see the examples below).
Note that many function-like uses of macros—specifically, those which do not include control statements like return that escape the scope of the macro and whose parameters take specific types—can be replaced with static inline functions. Doing so both avoids the potential problems of macros and allows the compiler to perform its usual type-checking.
Rationale: Preprocessor macros are a powerful metaprogramming tool, but that power can easily hurt readability. Since macros are expanded before the source code is parsed, it's easy to write a macro that has unintended consequences, and it can be difficult to figure out exactly what those consequences were.
Examples:
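A common illustration of an unintended side effect, sketched with hypothetical names: a macro which evaluates its argument twice, next to a static inline replacement.

```c
/* Risky: the argument is evaluated twice, so an argument with side
 * effects (such as a function call) runs two times. */
#define SQUARE_MACRO(x)  ((x) * (x))

/* Safer replacement: evaluates its argument once and is type-checked. */
static inline int square(int x)
{
    return x * x;
}

/* Hypothetical helper with a side effect, for demonstration. */
static int next_value(int *counter)
{
    return ++(*counter);
}
```

Starting from a counter of zero, square(next_value(&counter)) increments the counter once and yields 1, while SQUARE_MACRO(next_value(&counter)) increments it twice and yields 2.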
When including SIL headers in a source file, order the headers componentwise alphabetically by full pathname, excluding the .h filename extension. As a corollary, each header should declare all external types it references, except for those defined in src/base.h (which will always be included first).
If you need to include any system headers, list them after all SIL headers. It may be useful to further subdivide these into standard system headers and headers for specific system libraries.
Rationale: Using the full pathname of a header tells the reader immediately where the header is located; a relative pathname would force the user to check the location of the source file and manually resolve the relative path. Additionally, if a system header happens to have the same name as a header you create, the compiler may include the system header instead of yours if you give only the filename in the #include directive.
Exception: Use relative pathnames for nested includes in public headers, to avoid requiring particular compiler flags for client code.
Example:
Rationale: Including a header file inside another header file just to get the declaration of a structured type forces all users of the header to pay the cost of loading the nested header. Instead, when possible, use forward declarations of struct and union types. (This generally means you'll need to use "struct type" or "union type" instead of just the type name in function declarations.) Since C++ doesn't allow forward declarations of enums, headers which reference enum types and which may be included from C++ code (for SIL, this generally means all headers outside of the sysdep directory) will have to use nested includes for such types.
Typically, the typedef should precede the definition of the structured type itself, so the type name can be used within the definition (such as when defining a "next" pointer for a list). However, C++ does not allow referencing an enum before it has been defined, so in that case, the typedef must follow the enum.
Rationale: In C++, all tags for structured types (including class, struct, union, and enum) are automatically defined as type names, but in C, an explicit typedef is required for each type. Since C++ will not complain about such typedef statements, they should be included for all structured types visible to C code.
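The typedef ordering described above might look like the following sketch; ListNode and Color are hypothetical types.

```c
#include <stddef.h>

/* The typedef precedes the definition, so the type name can be used
 * for the "next" pointer inside it. */
typedef struct ListNode ListNode;
struct ListNode {
    ListNode *next;
    int value;
};

/* For an enum, the typedef follows the definition, since C++ does not
 * allow referencing an enum before it has been defined. */
enum Color {
    COLOR_RED,
    COLOR_GREEN,
};
typedef enum Color Color;

/* Hypothetical helper: number of nodes in a list. */
static int list_length(const ListNode *list)
{
    int length = 0;
    for (const ListNode *node = list; node; node = node->next) {
        length++;
    }
    return length;
}
```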
Rationale: UTF-8 is currently the de facto standard for text encoding, and full Unicode (including U'...' character values) is supported by at least GCC and recent versions of Clang. However, support is by no means universal, so try to avoid non-ASCII characters, and test extensively if you do use them.
Rationale: 80 columns has proven to be a good balance between avoiding unnecessary wrapping and keeping the text narrow enough to scan easily (that is, without forcing the eyes to move back and forth on each line). 80 columns is also a fairly standard width for terminal programs and editors. However, some such programs have trouble with lines that are exactly 80 columns long (for example, Emacs will wrap the 80th character to the next line when using an 80-column display); for this reason, lines should be kept to 79 characters when possible.
Exceptions:
The SIL source files include a trailer comment which causes the Emacs and Vim editors to use the proper indentation settings.
Rationale: Four columns is enough to clearly indicate the nesting depth at a glance, without being so wide that it pushes reasonably nested code off the edge of the screen. (Corollary: if code is indented so much that the line length limit becomes a problem, the nesting level is too deep.) Four columns is also divisible by two to provide an intermediate indentation for labels.
Examples:
Rationale: There is little consensus between editor programs on the width of a tab stop; thus, to properly read code indented with tabs, the reader of the code must make a special effort to configure their software properly. It's far preferable for the (single or few) writers of source code to make the effort to use spaces rather than force the (many) readers to change their editor settings for each program's source code they view.
Rationale: It can be easy to overlook extra statements on the same line, especially when they are infrequent.
Exception: If all cases in a switch block will fit on one line each, the statements may be moved to the same lines as their respective case labels. In this case, do not outdent the case labels.
Examples:
Rationale: Whitespace improves readability when used in moderation. Omitting whitespace around member reference operators and unary operators emphasizes their tighter binding.
Exceptions:
Examples:
Exception: It's okay to omit spaces after commas in nested function calls, as long as doing so doesn't hurt readability.
Examples:
Rationale: Some combinations of operators are particularly susceptible to precedence errors:
The compiler will generally emit a warning if parentheses are missing in any of the cases listed above.
Examples:
Rationale: While C does not require parentheses around the arguments to defined or (when the argument is a variable) sizeof, those keywords act like functions in that they return values*, so uses of those keywords should be styled like function calls. return, on the other hand, is not a function and does not generate a value (you couldn't put it on the right side of an assignment operator), so it shouldn't be used like one.
*Technically, defined doesn't "return a value" since it's not recognized by the compiler at all, but the preprocessor translates it into a boolean value, so it's the same sort of beast.
Examples:
Rationale (mandatory braces): Failing to use braces with control statements can easily lead to bugs, such as when attempting to add a second statement to an if without a block.
Exception (opening brace): The opening brace can be moved to the next line if it doesn't fit on the same line, or to avoid confusion between a continued line of the control statement and the first line of the nested block when the two lines have similar indentation (see the second for example below).
Note that if a block is long, it can be useful to annotate the closing brace with the control statement that began the block (see the while example below).
Examples:
Rationale (optional braces): Unlike other control statements, the use of braces in switch statements has no effect on control flow. In general, use braces when you need to define variables local to that case.
Rationale (documenting fall-through): It can be hard to tell at a glance whether a missing break statement is intentional or not. Documentation helps reassure the reader of the intended behavior, and it also avoids the risk that someone (maybe even you) will accidentally insert a break during a code cleanup session.
Examples:
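A minimal sketch of documented fall-through, using a hypothetical function. The half-level indentation of the case labels is an assumption based on the indentation rationale above.

```c
/* Hypothetical function: number of features enabled at a detail level,
 * where each level includes everything from the levels below it. */
static int features_for_level(int level)
{
    int features = 0;
    switch (level) {
      case 3:
        features++;
        /* Fall through. */
      case 2:
        features++;
        /* Fall through. */
      case 1:
        features++;
        break;
      default:
        break;
    }
    return features;
}
```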
Rationale (explicit void): In C (as opposed to C++), an empty parameter list means that the function's parameters are unspecified. This prevents the compiler from checking the number and types of parameters at call sites, so functions which take no parameters should have an explicit void to indicate that fact to the compiler.
Rationale (opening brace on following line): Putting the brace on its own line gives an additional visual indication that the brace starts a new function.
Exception: If the function is both very short (1-2 lines) and defined with static linkage, it is acceptable to put the definition's opening brace on the same line as the function declaration. If the function body fits on the same line as the declaration, the entire function may be written on one line.
Examples:
Rationale: Using an explicit dereference operation makes it clear to the reader that the thing being called is a function pointer and not an actual function.
Exception: In C, function pointers accessed through a structure do not need to be dereferenced if they are used like C++ instance methods.
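Both the rule and its exception can be sketched as follows; the types and functions are hypothetical.

```c
/* Hypothetical function type and method-like structure. */
typedef int BinaryOp(int a, int b);

typedef struct Shape Shape;
struct Shape {
    int width, height;
    int (*area)(const Shape *shape);
};

static int add(int a, int b) {return a + b;}

static int apply(BinaryOp *op, int a, int b)
{
    /* Explicit dereference: op is a function pointer, not a function. */
    return (*op)(a, b);
}

static int rectangle_area(const Shape *shape)
{
    return shape->width * shape->height;
}

static int shape_area(const Shape *shape)
{
    /* Exception: a method-like call through a structure member. */
    return shape->area(shape);
}
```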
Functions have a specific header comment format; see the code for details.
In general, an identifier's name should immediately tell the reader the purpose of the identifier, but it should be concise enough that its length does not obscure the structure of the code. For example, LIMIT would be a poor name for a global constant; the name tells us nothing about what sort of limit it is. But the same LIMIT might make perfect sense in a short function whose sole purpose was to bound its parameter to be less than a certain value, and indeed a longer name would serve no purpose except to clutter up the code.
Similarly, a file's name should make the file's purpose clear to someone looking at a directory listing, but should not be so long as to clutter log messages (which include the name of the source file which generated the message). In the case of filenames, it's acceptable to include the directory path when determining whether a filename is "clear"; for example, resource/core.c clearly refers to core functionality for resource management, and does not need to be expanded to resource/resource-core.c.
Avoid overusing abbreviations, since they can reduce readability by forcing the reader to stop and mentally expand the abbreviation each time the identifier is used. For example, in a function that uses a variable to hold a count of objects, nobj would be a poor name for the variable since its meaning is not immediately obvious to a reader unfamiliar with the code. num_obj would be better, but unless the variable is heavily used throughout the function, num_objects is more friendly to the reader. (However, number_of_objects would be unnecessarily verbose, since a num_ prefix is generally understood to mean "number of".)
Single-letter and similarly short variable names should be avoided except in cases where their meaning is obvious and generally accepted. For example, i is widely accepted as an iterator variable and may be used in that context, but it should not be used for a temporary variable, even in a limited scope. Similarly, short names for types or functions are acceptable when they are clearly derived from similar names in the standard libraries.
Examples:
Use the filename extensions listed below for each source file type:
Rationale (filename extensions): While not strictly required on modern operating systems, the filename extension is an accepted way to inform the user of the type of content in the file. Some programs (including compilers and IDEs) also use the file extension to guess the file's content type, and using a nonstandard extension would confuse such programs to the detriment of the user.
Rationale (unique names): All source files are compiled to object files with the same filename extension (typically .o). If two source files in the same directory have the same name but a different extension, their object files would collide, breaking the build. If it is necessary to have two source files in different languages with the same purpose (for example, when implementing a C++ interface to C functions), use the base filename for the source file with the most nontrivial code, and rename other files to avoid object file collision. For example: utilities.c, utilities-cxx.cc, utilities-objc.m
Rationale (non-alphanumeric characters): Non-alphanumeric characters may have special meanings to some systems, preventing files whose names contain those characters from being used properly (or at all!) on such systems. For example, quote characters are used on many systems to enclose filenames containing spaces; conversely, spaces are used on most systems to separate command arguments, and including a space in a filename can cause builds to break in unexpected ways. The only symbols accepted as safe across all systems are the hyphen and underscore. Non-ASCII characters should also be avoided because some users' systems may not be able to display them properly.
Examples:
C++ reserves a number of keywords which can also be reasonably used as identifier names; for example, try could be a counter for an operation which may need to be retried several times, and this can be used as an object pointer when implementing instance-method-like functions in C. As long as the names are appropriate for the uses to which they are put, they may be freely used in C code.
However, care is needed when such identifiers appear in header files, such as when used as structure field names. In this case, renaming the identifier is usually best, but if the identifier does not need to be referenced by C++ code (for example, if it is a parameter name in a function declaration), it is also permissible to bracket the header with a #define/#undef pair:
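One possible shape for such a pair is sketched below. The Object type, the object_id() function, and the replacement name this_ are all hypothetical; the first part stands for a header excerpt and the final function for a C source file.

```c
/* Header excerpt (may be included from C++ code). */
typedef struct Object Object;
struct Object {
    int id;
};

#ifdef __cplusplus
# define this this_  /* Hide the C++ keyword for the declaration below. */
#endif

extern int object_id(const Object *this);

#ifdef __cplusplus
# undef this
#endif

/* C source file: "this" is an ordinary identifier in C. */
int object_id(const Object *this)
{
    return this->id;
}
```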
While you should try not to introduce unnecessary computational complexity (for example, using a cubic-time algorithm when a quadratic-time algorithm is available), neither should you take "shortcuts" or "clever hacks" to cut down on execution time unless you have hard data demonstrating that such optimizations are of significant benefit to the program (or library) as a whole.
Rationale: This rule could also be phrased as, "Premature optimization is the root of all evil." The history of software development is littered with cases of programmers expending effort on optimizing routines which make no significant contribution to execution time in the first place—and introducing new, hard-to-find bugs as a result of their supposedly "clever" optimizations. Don't repeat their mistakes.
Examples: