Discussing how compilers, linkers and loaders work and the benefits of shared libraries.
Symbols and Symbol Resolution
Every relocatable object file has a symbol table and associated symbols. In the context of a linker, the following kinds of symbols are present:
- Global symbols defined by the module and referenced by other modules. All non-static functions and global variables fall in this category.
- Global symbols referenced by the input module but defined elsewhere. All functions and variables with extern declaration fall in this category.
- Local symbols defined and referenced exclusively by the input module. All static functions and static variables fall here.
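As a rough illustration of the three kinds of symbols, the following single module contains one of each. The names call_count, scale, doubled and doubled_length are made up for this sketch; only strlen is a real C library function.

```c
#include <stddef.h>

/* Referenced here but defined elsewhere (in the C library): this shows up
   as an undefined global symbol in the module's symbol table. */
extern size_t strlen(const char *s);

/* Global symbol defined by this module: other modules may reference it. */
int call_count = 0;

/* Local symbols: static, so they are not visible outside this module. */
static int scale = 2;

static size_t doubled(size_t n)
{
    return n * (size_t)scale;
}

/* Another global symbol defined by this module (a non-static function). */
size_t doubled_length(const char *s)
{
    call_count++;
    return doubled(strlen(s));
}
```

Compiling this file with gcc -c (without optimization) and running nm on the resulting object file typically lists the global definitions with an upper-case type letter (T, D or B), the static ones with a lower-case letter (t or d) and strlen with U for undefined.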
The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files. Resolving symbols that are local to a module is straightforward, as a module cannot have multiple definitions of a local symbol. Resolving references to global symbols is trickier, however. At compile time, the compiler exports each global symbol as either strong or weak. Functions and initialized global variables are strong, while uninitialized global variables are weak. (This holds when uninitialized globals are emitted as common symbols, which was GCC's default before version 10; with -fno-common they are treated as ordinary definitions.) The linker then resolves the symbols using the following rules:
- Multiple strong symbols are not allowed.
- Given a single strong symbol and multiple weak symbols, choose the strong symbol.
- Given multiple weak symbols, choose any of the weak symbols.
For example, linking the following two source files produces a link-time error:
/* foo.c */
int foo () {
    return 0;
}

/* bar.c */
int foo () {
    return 1;
}
int main () {
    foo ();
}
The linker generates an error message because foo, which is a strong symbol since it is a global function, is defined twice.
gcc foo.c bar.c
/tmp/ccM1DKre.o: In function 'foo':
/tmp/ccM1DKre.o(.text+0x0): multiple definition of 'foo'
/tmp/ccIhvEMn.o(.text+0x0): first defined here
collect2: ld returned 1 exit status
collect2 is a wrapper around the linker ld that GCC invokes to perform the link.
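The three resolution rules can be sketched as a small function that picks one definition out of a list of candidates. This is only a toy model under simplifying assumptions, not how ld is implemented; the struct definition type, the resolve function and the RESOLVE_* codes are names invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

enum strength { WEAK, STRONG };

struct definition {
    const char   *name;       /* symbol name */
    enum strength strength;   /* how the compiler exported it */
};

#define RESOLVE_UNDEFINED (-1)   /* no definition found at all */
#define RESOLVE_DUPLICATE (-2)   /* more than one strong definition */

/* Returns the index in defs[] of the definition chosen for `name`,
   or one of the negative codes above. */
int resolve(const struct definition *defs, size_t n, const char *name)
{
    int chosen = RESOLVE_UNDEFINED;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        if (defs[i].strength == STRONG) {
            /* Rule 1: a second strong definition is an error. */
            if (chosen >= 0 && defs[chosen].strength == STRONG)
                return RESOLVE_DUPLICATE;
            /* Rule 2: a strong definition overrides any weak one. */
            chosen = (int)i;
        } else if (chosen < 0) {
            /* Rule 3: among weak definitions any will do; keep the first. */
            chosen = (int)i;
        }
    }
    return chosen;
}
```

With two strong entries named foo, as in the foo.c and bar.c example, resolve returns RESOLVE_DUPLICATE, which corresponds to the multiple definition error shown above.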
Linking with Static Libraries
A static library is a collection of concatenated object files of similar type. These libraries are stored on disk in an archive. An archive also contains some directory information that makes searching for a member or a symbol faster. Each ELF archive starts with the magic eight-character string !<arch>\n, where \n is a newline.
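A minimal sketch of checking for that magic string follows. The helper name has_archive_magic and the AR_MAGIC macros are made up for this example; on glibc systems the header <ar.h> provides ARMAG and SARMAG for the same purpose.

```c
#include <stddef.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"   /* seven visible characters plus a newline */
#define AR_MAGIC_LEN 8

/* Returns 1 if the first bytes of buf are the archive magic string,
   0 otherwise (including when the buffer is too short). */
int has_archive_magic(const unsigned char *buf, size_t len)
{
    return len >= AR_MAGIC_LEN && memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) == 0;
}
```

In practice buf would hold the first eight bytes read from a file such as libc.a.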
Static libraries are passed as arguments to the compiler tools (the linker), which copy only the object modules referenced by the program. On UNIX systems, libc.a contains all the C library functions, including printf and fopen, that are used by most programs.
gcc foo.o bar.o /usr/lib/libc.a /usr/lib/libm.a
libm.a is the standard math library on UNIX systems; it contains the object modules for math functions such as sqrt, sin and cos.
During symbol resolution with static libraries, the linker scans the relocatable object files and archives from left to right, in the order they appear on the command line. During this scan, the linker maintains a set O of relocatable object files that go into the executable, a set U of unresolved symbols, and a set D of symbols defined in previous input modules. Initially, all three sets are empty.
- For each input argument on the command line, the linker determines whether the input is an object file or an archive. If the input is a relocatable object file, the linker adds it to set O, updates U and D, and proceeds to the next input file.
- If the input is an archive, the linker scans the member modules that constitute the archive to match any unresolved symbols present in U. If an archive member defines an unresolved symbol, that member is added to O, and U and D are updated according to the symbols found in that member. This process is repeated for all member object files.
- After all the input arguments have been processed through the above two steps, if U is not empty, the linker prints an error report and terminates. Otherwise, it merges and relocates the object files in O to build the output executable file.
This also explains why static libraries are placed at the end of the linker command: an archive named before the object files that reference it is scanned while U is still empty, so none of its members are picked up. Special care must be taken in cases of cyclic dependencies between libraries. Input libraries must be ordered so that, for each symbol referenced by a member of an archive, at least one definition of that symbol follows a reference to it on the command line; with cyclic dependencies this can mean naming a library more than once. Also, if an unresolved symbol is defined in more than one static library module, the definition is picked from the first library found on the command line.
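The left-to-right scan and its sensitivity to argument order can be modeled in a few lines. The sketch below is a simplified simulation, not the real ld algorithm: symbols are single bits in a mask, and main.o, libdemo.a and its members are hypothetical. It also assumes that an archive is rescanned until no further member is pulled in, and that an archive has fewer than 32 members.

```c
#include <stddef.h>

typedef unsigned symset;             /* one bit per symbol */

struct module {                      /* a relocatable object file */
    const char *name;
    symset defs;                     /* symbols it defines */
    symset refs;                     /* symbols it references */
};

struct input {                       /* one command-line argument */
    const struct module *members;
    size_t count;                    /* 1 for a plain object file */
    int is_archive;
};

struct link_state {
    symset U;                        /* unresolved symbols */
    symset D;                        /* symbols defined so far */
    size_t n_objects;                /* number of files in set O */
};

/* Hypothetical symbols and inputs used for the demonstration. */
enum { MAIN = 1, AREA = 2, UTIL = 4, UNUSED = 8 };

static const struct module main_o[] = { { "main.o", MAIN, AREA } };
static const struct module libdemo_a[] = {
    { "util.o",   UTIL,   0    },
    { "area.o",   AREA,   UTIL },
    { "unused.o", UNUSED, 0    },
};

/* Add a module to O and update U and D accordingly. */
static void add_module(struct link_state *st, const struct module *m)
{
    st->n_objects++;
    st->D |= m->defs;
    st->U = (st->U | m->refs) & ~st->D;
}

/* Scan the inputs left to right. Returns 0 if every symbol was
   resolved, -1 if U is not empty at the end. */
int link_scan(const struct input *in, size_t n, struct link_state *st)
{
    st->U = 0;
    st->D = 0;
    st->n_objects = 0;

    for (size_t i = 0; i < n; i++) {
        const struct module *m = in[i].members;
        unsigned taken = 0;          /* members already added to O */
        int changed;

        do {                         /* repeat until nothing new is added */
            changed = 0;
            for (size_t j = 0; j < in[i].count; j++) {
                /* Object files are always added; archive members only
                   when they define a currently unresolved symbol. */
                int wanted = !in[i].is_archive || (m[j].defs & st->U) != 0;
                if (wanted && !(taken & (1u << j))) {
                    add_module(st, &m[j]);
                    taken |= 1u << j;
                    changed = 1;
                }
            }
        } while (changed);
    }
    return st->U == 0 ? 0 : -1;
}
```

Passing the inputs in the order main.o, libdemo.a resolves everything and leaves unused.o out of O, while the order libdemo.a, main.o ends with AREA still in U, which mirrors the undefined reference error a real linker would report.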