Another boring day. I received feedback from a user who couldn't execute the `curl` command in his container.

It was a symbol lookup error: undefined symbol `curl_mime_free`.

I created a container from the same image on my machine, ran the same command, and it worked fine.
This doesn’t make sense intuitively. If the same image is used, at least the file system should be identical.
To investigate further, I checked the dynamic libraries linked by the `curl` executable in both environments using the `ldd` command.

Then I found the issue: the `curl` executable in the user's container was linked against a `libcurl.so` whose path differed from the one in my container.
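For reference, the comparison looked roughly like this; the paths in the comments are only illustrative, since the actual ones depend on the image:

```sh
# List the shared libraries curl resolves to and keep the libcurl line.
ldd "$(command -v curl)" | grep libcurl
# In one container this printed something like
#   libcurl.so.4 => /usr/local/lib/libcurl.so.4 (0x...)
# while the other container resolved the same soname to a different directory.
```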
I reran the `ldd` command with the `LD_DEBUG` environment variable set to `libs` to see how the dynamic linker resolves shared libraries.

It turned out that the entry for `libcurl.so` in `ld.so.cache` was different in the two environments.
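Roughly what that check looked like (the exact trace format varies between glibc versions):

```sh
# Trace how the dynamic linker searches for libraries while ldd loads curl.
# Lines like "search cache=/etc/ld.so.cache" show where the cache is consulted.
LD_DEBUG=libs ldd "$(command -v curl)" 2>&1 | grep -i libcurl
```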
But why? I used the same image and the same run command arguments; if something triggered an update of `ld.so.cache`, it should have been the same in both environments.
I suddenly realized that the user’s container was created with the NVIDIA container runtime, while mine was created with the default container runtime.
I had overlooked the `--runtime` parameter because the user's dockerd was configured to use NVIDIA's runtime by default.
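If you want to rule this out up front, the daemon's default runtime and a container's actual runtime can both be checked (the container name below is a placeholder):

```sh
# Which runtimes the daemon knows about, and which one is the default.
docker info | grep -i runtime

# Which runtime a particular container was created with.
docker inspect --format '{{.HostConfig.Runtime}}' <container-name>
```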
To find out when `ld.so.cache` was updated, I checked out the source code of the NVIDIA Container Toolkit. `ldconfig` is executed in the `nvc_ldcache_update` function, with the default `libs_dir` being `/usr/lib/x86_64-linux-gnu`:
```c
int
nvc_ldcache_update(struct nvc_context *ctx, const struct nvc_container *cnt)
{
        // ...
        /* Rebuild the container's ld.so.cache, passing the NVIDIA library
         * directories (libs_dir / libs32_dir) as extra ldconfig arguments. */
        argv = (char * []){cnt->cfg.ldconfig, "-f", "/etc/ld.so.conf", "-C", "/etc/ld.so.cache",
                           cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
        // ...
}
```
that’s why the libcurl.so
in the user’s container was linked to the wrong path.
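A rough way to see the effect, in a throwaway container, is to rebuild the cache the same way `nvc_ldcache_update` does and watch the libcurl entry move (this overwrites `/etc/ld.so.cache`, so don't do it anywhere you care about):

```sh
# Current cache entry for libcurl.
ldconfig -p | grep libcurl

# Rebuild the cache with an extra directory argument, mimicking the
# invocation in nvc_ldcache_update (libs32_dir omitted for brevity).
ldconfig -f /etc/ld.so.conf -C /etc/ld.so.cache /usr/lib/x86_64-linux-gnu

# The libcurl entry may now point to a different path than before.
ldconfig -p | grep libcurl
```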
The `nvc_ldcache_update` function is called when the `nvidia-container-cli configure` subcommand is executed, and that subcommand is invoked by `nvidia-container-runtime-hook` as the container's `prestart` hook.
And `nvidia-container-runtime-hook` is executed by `nvidia-container-runtime`, which is configured as a runtime in the daemon.json file:
```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```
Enough, my curiosity was satisfied. The fix is simple: just run `ldconfig` again inside the container, without specifying the `libs_dir` parameter.
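A minimal sketch of that workaround, assuming the affected container is still running and has `ldconfig` in its PATH:

```sh
# Rebuild ld.so.cache from /etc/ld.so.conf only, undoing the extra
# directory ordering introduced by the NVIDIA prestart hook.
docker exec <container-name> ldconfig

# curl should resolve libcurl.so to the expected path again.
docker exec <container-name> curl --version
```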