CS 134

Dynamic Linking in Practice

  • Cat speaking

    So now we understand how dynamic linking works in theory, but how do we actually use it?

  • PinkRobot speaking

    Well, most of the time, it's just there in the background, making sure everything works. But there are some interesting things you can do with it!

  • BlueRobot speaking

    Let's look at some practical aspects of dynamic linking, from creating shared libraries to some clever tricks you can do with them.

Creating a Shared Library

Let's make a simple library that provides a function to print a greeting:

/* hello_lib.c */
#include <stdio.h>

void
say_hello(const char *name)
{
        printf("Hello, %s!\n", name);
}

To convert this code into a shared library, we need to compile it in a special way:

$ gcc -fPIC -c hello_lib.c
$ gcc -shared -o libhello.so hello_lib.o
  • Duck speaking

    What's PIC?

  • PinkRobot speaking

    “Position Independent Code”—remember how we said the shared library could be put anywhere in memory? -fPIC tells the compiler to generate code that totally doesn't mind where in memory it's been placed.

Now we can write a program that uses our library:

/* greet.c */
void say_hello(const char *name);

int
main(void)
{
        say_hello("Dynamic Linking");
        return 0;
}

And compile it:

$ gcc -o greet greet.c -L. -lhello
$ ./greet
./greet: error while loading shared libraries: libhello.so: cannot open shared object file: No such file or directory
  • Horse speaking

    Hay! What went wrong?

  • PinkRobot speaking

    The dynamic linker can't find our library! By default, it only looks in certain places…

Finding Libraries

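On Linux, the dynamic linker looks for libraries in the following places, roughly in this order:

  1. Directories listed in LD_LIBRARY_PATH.
  2. Directories listed in /etc/ld.so.conf (more precisely, the cache file /etc/ld.so.cache, which the ldconfig command rebuilds from that list).
  3. Standard directories like /lib and /usr/lib.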

(On a Mac it's similar, but the environment variable is DYLD_LIBRARY_PATH. Windows has a different system.)

We can see what libraries a program needs:

$ ldd greet
        linux-vdso.so.1 (0x00007fff5cd7c000)
        libhello.so => not found
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4b12c00000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4b12e00000)
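ldd predicts what the dynamic linker will load, but a running program can also ask for itself. Here's a minimal sketch, assuming Linux with glibc, that uses dl_iterate_phdr() to walk the list of objects actually loaded into the current process. The names list_loaded_objects and print_object are our own, chosen just for illustration.

```c
#define _GNU_SOURCE     /* dl_iterate_phdr() is a GNU/BSD extension */
#include <link.h>
#include <stdio.h>

/* Callback: runs once for each object (main program, vDSO, shared
   libraries, the dynamic linker) loaded into this process. */
static int
print_object(struct dl_phdr_info *info, size_t size, void *data)
{
        int *count = data;
        const char *name;

        (void)size;
        /* The main program is usually reported with an empty name */
        name = (info->dlpi_name != NULL && info->dlpi_name[0] != '\0')
            ? info->dlpi_name : "(main program)";
        printf("%s (base address %p)\n", name, (void *)info->dlpi_addr);
        (*count)++;
        return 0;       /* returning nonzero would stop the walk early */
}

/* Print every loaded object and return how many there were */
int
list_loaded_objects(void)
{
        int count = 0;

        dl_iterate_phdr(print_object, &count);
        return count;
}
```

Called from main in a dynamically linked program, list_loaded_objects() should print something resembling the ldd output above (the program itself, linux-vdso.so.1, libc.so.6, and the dynamic linker). With address-space layout randomization enabled, the base addresses will usually differ from run to run.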

To run our program, we can either

  1. Install the library in a standard location (usually followed by running ldconfig to update the dynamic linker's cache); or
  2. Set LD_LIBRARY_PATH:
$ LD_LIBRARY_PATH=. ./greet
Hello, Dynamic Linking!

Another option is to build our program with the linker's -rpath option (passed through gcc with -Wl,), which records the library's directory inside the executable so that it automatically knows where to find the library:

$ gcc -o greet greet.c -L. -lhello -Wl,-rpath,`pwd`
$ ./greet
Hello, Dynamic Linking!

Dynamically Loading a Plugin

Dynamic linking doesn't have to happen only when a program starts. We can also load libraries after the program has started running! On a POSIX system, we have the functions

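  • dlopen: open a shared library (plug-in), loading it into memory if needed and linking it against the program; returns a handle to it
  • dlsym: look up the address of a symbol, usually a function, in the library (pass the library handle and the symbol's name)
  • dlclose: release the handle; once nothing else is using the library, it can be unloaded from memory
  • dlerror: return a human-readable description of the most recent failure in one of these functions (or NULL if there wasn't one)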

Note that although the newly loaded library is linked against our program (so, for example, it can call printf), our program isn't relinked against the library, so the only way for our code to reach the library's symbols is through dlsym.
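Before the full demo, here's a minimal sketch of the calls working together. It assumes Linux with glibc, where the math library's soname is libm.so.6, and loads that library by name to look up cos(). The function name dynamic_cos is hypothetical. On glibc versions older than 2.34, link with -ldl.

```c
#include <dlfcn.h>
#include <math.h>       /* only for the NAN macro */
#include <stdio.h>

typedef double (*math_fn_t)(double);

/* Load the math library at run time, look up cos(), call it, and
   release the library again.  Returns NAN on any failure. */
double
dynamic_cos(double x)
{
        void *handle;
        math_fn_t fn;
        const char *msg;
        double result;

        /* Assumption: glibc on Linux, where libm's soname is libm.so.6 */
        if ((handle = dlopen("libm.so.6", RTLD_NOW)) == NULL) {
                fprintf(stderr, "dlopen: %s\n", dlerror());
                return NAN;
        }

        dlerror();                      /* clear any earlier error */
        fn = (math_fn_t)dlsym(handle, "cos");
        if ((msg = dlerror()) != NULL) {
                fprintf(stderr, "dlsym: %s\n", msg);
                dlclose(handle);
                return NAN;
        }

        result = fn(x);
        dlclose(handle);                /* drop our reference */
        return result;
}
```

For example, dynamic_cos(0.0) returns 1.0. The extra dlerror() call before dlsym() matters because a symbol's address can legitimately be NULL, so the reliable way to detect failure is to clear the error state first and check dlerror() afterward.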

Here's an example program that generates C code, compiles it, then loads it into itself at runtime:

#include <dlfcn.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <setjmp.h>
#include <signal.h>

typedef void (*demo_func_t)(void);

#define TEMPLATE_PATH "/tmp/dynlink_XXXXXX"
#define MAX_LINE     1024
#define COMPILE_CMD  "gcc -shared -fPIC -o %s %s"

static void
generate_source(const char *filename, const char *user_code)
{
        FILE *fp;

        if ((fp = fopen(filename, "w")) == NULL)
                err(1, "fopen");

        fprintf(fp, "#include <stdio.h>\n#include <unistd.h>\n"
                   "#include <fcntl.h>\n#include <stddef.h>\n"
                   "#include <stdint.h>\n#include <stdlib.h>\n\n"
                   "void demo_function(void)\n"
                   "{\n"
                   "        %s\n"
                   "}\n", user_code);
        fclose(fp);
}

static char *
create_temp_file(const char *suffix)
{
        char *filename;
        int fd;

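        /* Allocate space for template + suffix + null terminator */
        if (!(filename = malloc(strlen(TEMPLATE_PATH) + strlen(suffix) + 1)))
                err(1, "malloc");

        /* Build e.g. "/tmp/dynlink_XXXXXX.c".  mkstemps() replaces the
           XXXXXX but keeps the suffix, so the file it creates is the one
           we actually use (and later unlink), with no stray file left. */
        strcpy(filename, TEMPLATE_PATH);
        strcat(filename, suffix);

        if ((fd = mkstemps(filename, (int)strlen(suffix))) == -1)
                err(1, "mkstemps");
        close(fd);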

        return filename;
}

static int
compile_shared_object(const char *src_file, const char *so_file)
{
        char cmd[MAX_LINE];

        snprintf(cmd, sizeof(cmd), COMPILE_CMD, so_file, src_file);
        printf("[DEBUG] Compiling with command: %s\n", cmd);

        return system(cmd);
}

static char *src_file, *so_file;

static void
zap_files(void)
{
        if (src_file) {
                unlink(src_file);
                free(src_file);
                src_file = NULL;
        }
        if (so_file) {
                unlink(so_file);
                free(so_file);
                so_file = NULL;
        }
}

/* Signal handlers for Alarm, Segmentation fault, Bus error */

/*
 * A sigjmp_buf filled in by sigsetjmp(..., 1) also saves the signal mask,
 * so siglongjmp() out of a handler unblocks the signal again.  (With plain
 * setjmp()/longjmp(), the signal would stay blocked after the first jump,
 * and the next timeout or crash would not be caught.)
 */
static sigjmp_buf give_up;

static void
sig_alarm(int signo)
{
        static const char msg[] = "Timeout: Execution took too long\n";

        (void)signo;
        /* write() is async-signal-safe; fprintf() is not */
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
        siglongjmp(give_up, 1);
}

static void
sig_mem(int signo)
{
        static const char msg[] = "Memory access error\n";

        (void)signo;
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
        siglongjmp(give_up, 1);
}

int
main(void)
{
        char *line = NULL;      /* getline() needs NULL or a malloc'd buffer */
        size_t linecap = 0;     /* buffer size, maintained by getline() */
        void *handle;
        demo_func_t func;
        const char *error;

        /* Set up cleanup handler */
        atexit(zap_files);

        while (1) {
                printf("Enter one line of C code (for function body):\n"
                    "(e.g., --> printf(\"Hello, world!\\n\"); <-- )\n"
                    "Type 'exit(0);' to quit\n");
                if (getline(&line, &linecap, stdin) == -1) {
                        if (feof(stdin))
                                break;  /* end of input (e.g., Ctrl-D) */
                        err(1, "getline");
                }

                /* Create temporary source and shared object files */
                src_file = create_temp_file(".c");
                so_file = strdup("./temp.so");
                printf("[DEBUG] Temporary files: Source = %s, "
                    "Shared object: %s\n", src_file, so_file);

                /* Generate and compile the source */
                generate_source(src_file, line);
                printf("[DEBUG] Generated source file with user code\n");

                if (compile_shared_object(src_file, so_file) != 0) {
                        fprintf(stderr, "Compilation failed\n");
                        goto cleanup;
                }
                printf("[DEBUG] Compilation successful\n");

                /* Load the shared object */
                handle = dlopen(so_file, RTLD_NOW);
                if (handle == NULL) {
                        fprintf(stderr, "dlopen failed: %s\n", dlerror());
                        goto cleanup;
                }
                printf("[DEBUG] Loaded shared object (handle %p)\n", handle);

                /* Get the function symbol (clearing any stale error first) */
                dlerror();
                func = (demo_func_t)dlsym(handle, "demo_function");
                if ((error = dlerror()) != NULL) {
                        fprintf(stderr, "dlsym failed: %s\n", error);
                        dlclose(handle);
                        goto cleanup;
                }
                printf("[DEBUG] Retrieved function symbol at %p\n",
                       (void *)func);

                /* Execute the function */
                printf("\nExecuting user function:\n");
                printf("--------------------\n");
                /* Protect against infinite loops and some bad memory access */
                if (sigsetjmp(give_up, 1) == 0) {
                        signal(SIGALRM, sig_alarm);
                        signal(SIGSEGV, sig_mem);
                        signal(SIGBUS, sig_mem);
                        alarm(3);
                        func();
                }
                /* Reached after a normal return AND after siglongjmp(), so
                   always cancel the alarm and restore default handling */
                alarm(0);
                signal(SIGALRM, SIG_DFL);
                signal(SIGSEGV, SIG_DFL);
                signal(SIGBUS, SIG_DFL);
                printf("--------------------\n");

                /* Cleanup */
                dlclose(handle);
                printf("[DEBUG] Unloaded shared object\n");

        cleanup:
                zap_files();
                printf("[DEBUG] Removed temporary files\n");
        }

        return 0;
}
  • Hedgehog speaking

    Do I need to follow this code in detail?

  • PinkRobot speaking

    The main thing is the big picture—that we can load libraries at runtime and use them. The details are just to show how it's done and give a fun demo.

  • Dog speaking

    I think it's pretty cool. It's like a REPL for C code!

  • Cat speaking

    This whole thing of loading code at runtime is cool, but also kinda dangerous. You're giving the user the power to run arbitrary code in your program! But at least it was our choice to do that.

  • PinkRobot speaking

    Well, actually, we can load code into programs that didn't expect it, too…

Fun with LD_PRELOAD

The dynamic linker has a special feature: if we set a particular environment variable, it will load the libraries listed there before all others, so their definitions take precedence. On Linux, the variable is LD_PRELOAD (on a Mac it's DYLD_INSERT_LIBRARIES). This feature means you can override functions in other libraries!

Let's make a library that intercepts calls to time() (and related functions) and always returns the time for New Year's Day 2025:

/* faketime.c */
#include <time.h>
#include <sys/time.h>

/* Always return New Year's 2025! */
/* Midnight Pacific time (08:00 UTC) on 2025-01-01, in seconds since the epoch */
#define NEW_YEARS_2025 1735718400

/* redefine time() to our fake time */
time_t
time(time_t *tloc)
{
        time_t fake_time = NEW_YEARS_2025;
        if (tloc) *tloc = fake_time;
        return fake_time;
}

/* and, gettimeofday() is used by some programs */
int
gettimeofday(struct timeval *restrict tv, void *restrict tz)
{
        tv->tv_sec = NEW_YEARS_2025;
        tv->tv_usec = 0;
        return 0;
}

/* and, clock_gettime() is used by other programs */
int
clock_gettime(clockid_t clk_id, struct timespec *tp)
{
        tp->tv_sec = NEW_YEARS_2025;
        tp->tv_nsec = 0;
        return 0;
}

Compile it:

$ gcc -fPIC -shared -o libfaketime.so faketime.c

Now we can make just about ANY dynamically linked program think it's New Year's Day:

$ date
Mon Nov  4 11:54:23 AM PST 2024
$ LD_PRELOAD=./libfaketime.so date
Wed Jan  1 12:00:00 AM PST 2025
  • Hedgehog speaking

    That's both cool and a bit scary. You can change how programs behave without modifying them!

  • PinkRobot speaking

    True, but it does have legitimate uses. For example, you can use it to debug programs by intercepting functions and printing when they're called.

Often we want our replacement function not only to run its own code but also to call the original function. This technique is called interposition, because our code sits between the program and the original library function (interposing itself).

Interposition can be used for many things, including

  • Debugging (intercepting functions to print when they're called).
  • Testing (simulating different conditions).
  • Compatibility layers (making old programs work with new libraries).
  • Duck speaking

    But how do we call the original function?

  • PinkRobot speaking

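    It actually varies. On Linux, you look it up with dlsym using the special RTLD_NEXT handle, which finds the next definition of that name after your library. On a Mac, dyld does a switcharoo instead: your library lists its replacements in a special __interpose section, and dyld sends everyone else's calls to your version but leaves calls made from inside your library alone. So when your replacement calls the function by its usual name, it actually reaches the original!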
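Here's a minimal sketch of the Linux approach, assuming glibc (where RTLD_NEXT requires defining _GNU_SOURCE). This wrapper logs each call to time() and then forwards it to the original. Built with gcc -fPIC -shared -o libtimelog.so timelog.c (the file and library names are our own), it could be preloaded with LD_PRELOAD just like libfaketime.so. It also works when compiled directly into a program, because RTLD_NEXT simply finds the next definition after whichever object contains the wrapper. On glibc older than 2.34, add -ldl.

```c
#define _GNU_SOURCE             /* RTLD_NEXT is a GNU extension in glibc */
#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

typedef time_t (*time_fn_t)(time_t *);

/* How many calls we've intercepted (demo only; not thread-safe) */
static unsigned long intercepted_calls;

/* Replacement time(): log the call, then forward it to the original */
time_t
time(time_t *tloc)
{
        static time_fn_t real_time;

        if (real_time == NULL) {
                /* RTLD_NEXT: the next definition of "time" in search
                   order after this object, normally the one in libc */
                real_time = (time_fn_t)dlsym(RTLD_NEXT, "time");
                if (real_time == NULL) {
                        fprintf(stderr, "dlsym: %s\n", dlerror());
                        return (time_t)-1;
                }
        }
        intercepted_calls++;
        fprintf(stderr, "[interposer] time() call #%lu\n", intercepted_calls);
        return real_time(tloc);
}
```

Caching the looked-up pointer in a static variable means dlsym runs only on the first call rather than on every call.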

  • Hedgehog speaking

    It still all seems a bit sneaky to me…

What might be some security concerns with LD_PRELOAD?

Modern Unix-like systems have some safeguards, including

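  1. For setuid and setgid programs, the dynamic linker runs in a secure-execution mode that ignores LD_LIBRARY_PATH and severely restricts LD_PRELOAD (on Linux, it only preloads libraries from the standard directories, and only if they are themselves marked setuid).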
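  2. Tools that run programs with another user's privileges, such as sudo, remove LD_PRELOAD and LD_LIBRARY_PATH from the environment, so only trusted users should be able to set them for privileged programs.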
  3. System libraries are usually installed in protected directories.
  4. macOS ignores DYLD_INSERT_LIBRARIES for system binaries protected by System Integrity Protection and for programs built with the hardened runtime.

Library Versioning

  • Goat speaking

    Meh. This is all well and good, but what happens when you update a library and the new version isn't compatible with old programs? You'd have been better off statically linking!

  • PinkRobot speaking

    Actually, that's where library versioning comes in…

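Libraries follow a naming convention that lets multiple versions be installed at once. The real file carries the full version number (e.g., libpcre2-8.so.0.11.2); a symlink named for the major version (libpcre2-8.so.0, the library's soname) points to it; and an unversioned symlink (libpcre2-8.so) is what the linker finds when we build with -lpcre2-8: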

$ ( cd /lib/x86_64-linux-gnu/ ; ls -l libpcre*so* )
lrwxrwxrwx 1 root root     21 Apr  8  2024 libpcre2-16.so -> libpcre2-16.so.0.11.2
lrwxrwxrwx 1 root root     21 Apr  8  2024 libpcre2-16.so.0 -> libpcre2-16.so.0.11.2
-rw-r--r-- 1 root root 572064 Apr  8  2024 libpcre2-16.so.0.11.2
lrwxrwxrwx 1 root root     21 Apr  8  2024 libpcre2-32.so -> libpcre2-32.so.0.11.2
lrwxrwxrwx 1 root root     21 Apr  8  2024 libpcre2-32.so.0 -> libpcre2-32.so.0.11.2
-rw-r--r-- 1 root root 543392 Apr  8  2024 libpcre2-32.so.0.11.2
lrwxrwxrwx 1 root root     20 Apr  8  2024 libpcre2-8.so -> libpcre2-8.so.0.11.2
lrwxrwxrwx 1 root root     20 Apr  8  2024 libpcre2-8.so.0 -> libpcre2-8.so.0.11.2
-rw-r--r-- 1 root root 625344 Apr  8  2024 libpcre2-8.so.0.11.2
lrwxrwxrwx 1 root root     23 Apr  8  2024 libpcre2-posix.so -> libpcre2-posix.so.3.0.4
lrwxrwxrwx 1 root root     23 Apr  8  2024 libpcre2-posix.so.3 -> libpcre2-posix.so.3.0.4
-rw-r--r-- 1 root root  14568 Apr  8  2024 libpcre2-posix.so.3.0.4

And programs record the soname, which includes the major version number, that they were built against:

$ objdump -p /bin/grep | grep NEEDED
  NEEDED               libpcre2-8.so.0
  NEEDED               libc.so.6

This way,

  • Old programs keep working with the version they expect.
  • New programs can use new features.
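  • Bug and security fixes that keep the interface compatible can ship as a new minor version behind the same soname, so every program using that major version picks them up without being recompiled.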
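We can watch the same-soname arrangement at work with the C library itself: every program links against libc.so.6, but the exact release installed underneath can change. Here's a sketch, assuming glibc (gnu_get_libc_version() and the __GLIBC__/__GLIBC_MINOR__ macros are glibc-specific), that compares the version a program was compiled against with the one the dynamic linker actually loaded. The name check_libc_versions is hypothetical.

```c
#include <gnu/libc-version.h>   /* glibc-specific: gnu_get_libc_version() */
#include <stdio.h>

/* Compare the glibc release from the headers at compile time with the
   one loaded at run time (both use the soname libc.so.6).  Returns 1 if
   the run-time release is at least the compile-time one. */
int
check_libc_versions(void)
{
        const char *runtime = gnu_get_libc_version();   /* e.g., "2.39" */
        int major = 0, minor = 0;

        printf("compiled against glibc %d.%d, running with glibc %s\n",
            __GLIBC__, __GLIBC_MINOR__, runtime);
        if (sscanf(runtime, "%d.%d", &major, &minor) != 2)
                return 0;
        return major > __GLIBC__ ||
            (major == __GLIBC__ && minor >= __GLIBC_MINOR__);
}
```

On the machine where the program was compiled, the two versions normally match. Moved to a system with a newer glibc, the same binary should keep running and simply report the newer run-time version.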
  • Cat speaking

    So that's how Linux manages to update libraries without breaking everything!

  • PinkRobot speaking

    Exactly! The dynamic linker handles all the complexity of finding the right version of each library for each program.

Symbol Versioning

  • Duck speaking

    But what if you only change one function in the library? Do you need a whole new version?

  • PinkRobot speaking

    Actually, libraries can version individual functions! This is called symbol versioning.

Let's see symbol versioning in action. Here's a simple example:

/* coolmath.c - Our growing math library */
#include <math.h>

/* Original version of our function */
__asm__(".symver add_positive_v1,add_positive@COOLMATH_1.0");
int
add_positive_v1(int a, int b)
{
        return a + b;  /* Oops, we forgot to check for positive! */
}

/* New version with proper checking (@@ means also set the default version) */
__asm__(".symver add_positive_v2,add_positive@@COOLMATH_2.0");
int
add_positive_v2(int a, int b)
{
        if (a < 0 || b < 0) {
                return 0;
        }
        return a + b;
}

We also need a version script (coolmath.map):

COOLMATH_1.0 {
    global:
        add_positive;
    local:
        *;
};

COOLMATH_2.0 {
    global:
        add_positive;
} COOLMATH_1.0;
  • Horse speaking

    Hay! What does local: *; do?

  • PinkRobot speaking

    It says that any functions not explicitly listed as global are private to the library. Using it is good practice to avoid accidentally exposing internal functions, even if we forgot to declare them as static.

Compile it:

$ gcc -fPIC -shared -Wl,--version-script=coolmath.map -o libcoolmath.so coolmath.c

In the header file, we declare the function so that programs can choose a version. By default, a program just asks for add_positive and gets the default (@@) version, but if we define COOLMATH_VERSION when compiling, the .symver directive binds the program to that version instead:

/* coolmath.h */
#ifndef COOLMATH_H
#define COOLMATH_H

#define STR_HELPER(x) #x
#define STR(x) STR_HELPER(x)    /* Stringify the version */

#ifdef COOLMATH_VERSION
__asm__(".symver add_positive,add_positive@COOLMATH_" STR(COOLMATH_VERSION));
#endif
int add_positive(int a, int b);
#endif
  • Horse speaking

    So now we have two versions of add_positive in one library?

  • PinkRobot speaking

    Exactly! Let's see how different programs use them:

Here's our test program, mathtest.c:

#include <stdio.h>
#include "coolmath.h"

int
main(void)
{
        printf("3 + (-2) = %d\n", add_positive(3, -2));
        return 0;
}

We'll compile two versions of the executable: one uses the current version of the library, and the other uses the old version:

$ gcc -o newmathtest mathtest.c -L. -lcoolmath
$ gcc -DCOOLMATH_VERSION=1.0 -o oldmathtest mathtest.c -L. -lcoolmath
$ LD_LIBRARY_PATH=. ./oldmathtest
3 + (-2) = 1
$ LD_LIBRARY_PATH=. ./newmathtest
3 + (-2) = 0
  • Cat speaking

    So the same function name in the same library gives different results depending on when the program was compiled?

  • PinkRobot speaking

    Right! The dynamic linker knows which version each program expects and calls the right one.

  • Hedgehog speaking

    That seems complex. Why go to all this trouble?

  • PinkRobot speaking

    It lets library authors fix bugs and add features while still keeping compatibility with old programs.

We can see the version information with objdump:

$ objdump -T libcoolmath.so | grep add_positive
0000000000001111 g    DF .text  000000000000002b (COOLMATH_2.0) add_positive
00000000000010f9 g    DF .text  0000000000000018 (COOLMATH_1.0) add_positive
$ objdump -T oldmathtest | grep add_positive
0000000000000000      DF *UND*  0000000000000000 (COOLMATH_1.0) add_positive
$ objdump -T newmathtest | grep add_positive
0000000000000000      DF *UND*  0000000000000000 (COOLMATH_2.0) add_positive

Notice that even though we never said which version to use when we compiled newmathtest, it has recorded that it needs the COOLMATH_2.0 version of add_positive. If we later added a COOLMATH_3.0 version to the library, newmathtest would keep using the 2.0 version until we recompiled it.
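The dynamic linker isn't the only one that can pick a version: glibc also offers dlvsym(), a GNU extension that works like dlsym() but takes a version name as well. Here's a sketch, assuming glibc and the libcoolmath.so built above; call_add_positive is a hypothetical helper of our own.

```c
#define _GNU_SOURCE             /* dlvsym() is a GNU extension */
#include <dlfcn.h>
#include <stdio.h>

typedef int (*add_fn_t)(int, int);

/* Call one specific version of add_positive() from the library at
   libpath.  Returns 0 on success (result stored in *out), or -1 if the
   library or that version of the symbol can't be found. */
int
call_add_positive(const char *libpath, const char *version,
    int a, int b, int *out)
{
        void *handle;
        add_fn_t fn;

        if ((handle = dlopen(libpath, RTLD_NOW)) == NULL) {
                fprintf(stderr, "dlopen: %s\n", dlerror());
                return -1;
        }
        /* Like dlsym(), but also matches the symbol's version name */
        fn = (add_fn_t)dlvsym(handle, "add_positive", version);
        if (fn == NULL) {
                fprintf(stderr, "dlvsym: %s\n", dlerror());
                dlclose(handle);
                return -1;
        }
        *out = fn(a, b);
        dlclose(handle);
        return 0;
}
```

With the library in the current directory, call_add_positive("./libcoolmath.so", "COOLMATH_1.0", 3, -2, &r) should leave 1 in r (the buggy original), while asking for "COOLMATH_2.0" should leave 0, matching oldmathtest and newmathtest.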

In industrial-strength libraries like the C library, why is symbol versioning so important?

  • Rabbit speaking

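    In fact, glibc uses this mechanism extensively. For example, on x86-64 memcpy has two symbol versions, because glibc 2.14 changed how it behaves when the source and destination overlap, and old programs keep the old behavior. glibc also uses a related trick, indirect functions (IFUNCs), where the dynamic linker picks a memcpy implementation optimized for your CPU when the program is loaded!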

  • PinkRobot speaking

    That's a bit beyond what we need to cover here, but yes, symbol versioning is a powerful tool!

  • Rabbit speaking

    If you want to know more, here's a good article on the subject of shared libraries and all the things you can do with them.

  • Goat speaking

    Meh. I still think static linking is simpler. Worse is better! Are we done with dynamic linking now?

  • PinkRobot speaking

    I think we're at a good place to wrap up!
