In this recipe we are going to debug a container that is failing during startup. The steps in this recipe can also be used to debug a container that failed at runtime, or simply to debug any other issues that come up from time to time.

You can find the application for this recipe in my GitHub repository.

If you do not know my recipes yet, take a look at the introduction, or check out all the recipes there are.

As usual in the recipe section we will look at the following:

The problem it solves

Of course you want to determine why something went wrong in your application running in a Docker container.
And you would be right that your application logs are the first thing to look at.

But what if something went wrong with the container itself as a result of some configuration issue? Say, for example, it could not connect to another container over an SDN (like in this example).

Then you need to check whether your Docker container/Dockerfile is simply not configured correctly, or whether your application failed for some other reason. And this is exactly what this recipe will help you do.

Debugging containers also helps you get comfortable with them by playing and working with them.

What

We are going to explore debugging by working with an ASP.NET Core application that runs inside a Docker container and should connect to a PostgreSQL database, which is also running in a Docker container. The problem will be that the connection string is not correct for connecting via the loopback interface (localhost).

This is because if you run an application inside a container, localhost now refers to the container and no longer to the Docker host's localhost.
You can read up on this in my posts about integration testing.

How

I assume that you have cloned the repository and run all the commands from its root.

Run containers manually and in background

The first issue we will look at occurs when we run a container with the -d flag, i.e. detached (in the background).
The container then does not print its logs directly to stdout.

For this we start with the following docker commands:

  1. Start the postgres container
    docker container run -d -p 5434:5432 --rm --name pg_db \
    -e POSTGRES_USER=db_user -e POSTGRES_DB=pg_db postgres:10
    
  2. Build the web api image
    docker image build . -t debug_api:docker_debug
  3. Start the web api container manually
    docker container run -d -p 5000:80 --name api -e ASPNETCORE_ENVIRONMENT=Docker \
    --add-host=inDockerhost:172.17.0.1 debug_api:docker_debug
  4. Check with curl if the web api is running smoothly:
    curl -f http://localhost:5000/api/v1/counter/

We can clearly see that curl could not connect, so something must be going wrong here.
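One way to make that failure explicit, e.g. in a script, is to check curl's exit status; a small sketch (with -f, exit code 7 means curl could not connect at all, 22 means the server answered with an HTTP error):

```shell
# Probe the API and report the exact failure mode instead of eyeballing the output.
curl -sf http://localhost:5000/api/v1/counter/
status=$?
if [ "$status" -ne 0 ]; then
  echo "API not reachable (curl exit code $status)" >&2
fi
```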

1.) Simple logging of a container

A.) Identify that a container is not running correctly
With the following command we can see that the container was run, but appears to have stopped.

docker container list -a

So while the postgres container is up and running, we still want to check whether the database can actually be connected to.

First we check whether the database is up and running, either by connecting to it:

psql -d pg_db -U db_user -p 5434 -h localhost

or by using the pg_isready command.
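A sketch of that check; pg_isready only probes whether the server accepts connections, without opening a full session (the flags mirror the psql call above):

```shell
# From the docker host, against the mapped port:
pg_isready -h localhost -p 5434 -U db_user -d pg_db

# Or, without a local postgres installation, inside the container:
docker exec pg_db pg_isready -p 5432 -U db_user -d pg_db
```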

If you do not have a local postgres installation, use the following command to execute psql directly in the container:

docker exec -it pg_db /bin/sh \
-c "psql -d pg_db -U db_user -p 5432 -h localhost"

Remember to change the port to the one exposed inside the container, not the one mapped on the Docker host; here that is 5432.

So, knowing that the database container is up and running and not at fault, we will now look at ways to debug the api container.

B.) Find out more about the container's state
When a Docker container has stopped, you can always find out more about it with the inspect command:

docker container inspect <container_name or ID>

The inspect command prints a lot of information to stdout.

It includes things like network settings, the file system in use, where volumes and data are stored, the state of the container, and much more.

It also shows you where the log files are located. You could cd to that path on the Docker host and open the file,
but it is much better to use a docker command.
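Instead of scanning the whole JSON, you can also pull out individual fields with the --format flag and a Go template; for example the container state and the log file location just mentioned:

```shell
# Why did the container stop, and with which exit code?
docker container inspect --format '{{.State.Status}} (exit code {{.State.ExitCode}})' api

# Where does the daemon keep the log file for this container?
docker container inspect --format '{{.LogPath}}' api
```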

C.) How to show log output from the application inside the container
Instead of going to that path, you can use

docker container logs <container_name or ID>

This shows you the container's stdout in your terminal.
By inspecting it you might find a hint as to why your application stopped working.
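The logs command has a few flags worth knowing; a sketch:

```shell
docker container logs --tail 20 api      # only the last 20 lines
docker container logs --timestamps api   # prefix each line with a timestamp
docker container logs -f api             # follow the output, like tail -f
```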

We can see here that it is a SocketException. I already know what the problem is, because I purposely created an incorrect connection string.

But from the stack trace we can see that an Npgsql exception points to some connection issue with the postgres database.

So this is our important clue. Something went wrong on connecting to the database. But we know already that the database is running correctly.

This means the appsettings in the Docker container are somehow at fault (because this is where we store the connection string, right 😉).

We could have one of the two following problems here:

  1. Wrong environment, so the wrong appsettings file is utilized
  2. Wrong connection string in the correct appsettings file (port, server, database name, user etc.)

The first one can be excluded by looking at the output of the docker container inspect api command, where it says:
"Env": [
    "ASPNETCORE_ENVIRONMENT=Docker",
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "ASPNETCORE_URLS=http://+:80",
    "DOTNET_RUNNING_IN_CONTAINER=true",
    "ASPNETCORE_VERSION=2.1.6"
],

So we can tell that this is the desired environment by looking at ASPNETCORE_ENVIRONMENT=Docker.
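You can also query just the environment block instead of reading the full inspect output; a sketch using the same Go template mechanism:

```shell
docker container inspect \
  --format '{{range .Config.Env}}{{println .}}{{end}}' api \
  | grep ASPNETCORE_ENVIRONMENT
```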

Now let's check the appsettings inside the container.
We can do this by copying the appsettings file from the container to our local working directory with

docker cp api:/app/appsettings.Docker.json ./appsettings.Docker.copy.json

and inspecting it with our favorite editor.
The file yielded:

{
  "DbContextString" : "Server=WrongHost;Port=5444;User ID=db_user;Database=pg_db",
  "Logging": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "AllowedHosts": "*"
}

We have two problems here:

  • We ran the postgres container with port 5434, but the file says 5444.
  • Also, Server points to WrongHost, but we configured inDockerHost for the api container.

So we change the file to
{
  "DbContextString" : "Server=inDockerHost;Port=5434;User ID=db_user;Database=pg_db",
  "Logging": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "AllowedHosts": "*"
}
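Before copying the file back, a quick grep can confirm that the corrected values are actually in place; a minimal sketch (the expected host and port are the ones configured for the api container above):

```shell
# Report whether the local copy contains the corrected values.
file=./appsettings.Docker.copy.json
if grep -q 'Server=inDockerHost;Port=5434' "$file" 2>/dev/null; then
  echo "connection string looks correct"
else
  echo "connection string still wrong (or file missing)" >&2
fi
```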

After that we push it back to the container with

docker cp ./appsettings.Docker.copy.json api:/app/appsettings.Docker.json

After that we restart the container with

docker container start api

Et voilà, it works like a charm, as we can see with curl:
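The check is the same curl call from step 4 at the beginning:

```shell
curl -f http://localhost:5000/api/v1/counter/
```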

Second way to inspect stopped containers

But there is also another way to inspect a file in a container that has stopped. Even though I like the previous one more, we will explore it here nonetheless, because it can come in quite handy:

First you do again a docker container list -a and find the ID of your container.
Then you do a

docker container commit <container_id> api:changes

This creates a new image from the container.
With this image you can now run the following command with a changed entrypoint to enter the container:

docker run -it --entrypoint=/bin/bash api:changes

Then we install nano in the container and use it to examine the appsettings file:
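Installing the editor inside the container could look like this, assuming a Debian-based base image (which the official ASP.NET Core runtime images are):

```shell
# Inside the container started with --entrypoint=/bin/bash:
apt-get update && apt-get install -y nano
nano /app/appsettings.Docker.json
```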

The file is obviously the exact same one we got before with the copy command. So we change it to the corrected version shown above.

After that we leave the container with Ctrl+D, create another commit, and create an image from that.

We can then start another container from this image, and this time it will connect.
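The whole round trip could look like the following sketch. Note that committing a container whose entrypoint we overrode with /bin/bash also bakes that entrypoint into the new image, so it has to be reset with --change; the dll name here is hypothetical, check your own Dockerfile for the real ENTRYPOINT:

```shell
# Find the ID of the container we just edited and exited.
docker container list -a

# Commit it, restoring the original entrypoint (the dll name is hypothetical).
docker container commit \
  --change 'ENTRYPOINT ["dotnet", "YourApi.dll"]' \
  <container_id> api:fixed

# Run a container from the fixed image with the same settings as before.
docker container run -d -p 5000:80 -e ASPNETCORE_ENVIRONMENT=Docker \
  --add-host=inDockerhost:172.17.0.1 api:fixed
```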

But as you can tell, this way involves a lot more work.

For a container run in the background with the --rm flag, these techniques will not work. So we will now explore what we can do then.

A container run in background stopped working

It might be obvious, but if you run a docker container with the -d and --rm flags, you will not be able to see what went wrong with the above commands, because the container has already been removed by the docker daemon.

To solve this you can either look at the image itself with

docker image inspect <image_name:tag or Id>

This prints information about the image, like environment variables, configuration, entrypoints, run commands and much more.
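Here too, --format helps to extract the interesting bits; a sketch:

```shell
# What does the image actually start, and with which environment?
docker image inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' debug_api:docker_debug
docker image inspect --format '{{range .Config.Env}}{{println .}}{{end}}' debug_api:docker_debug
```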

Another, more obvious way is to start the container manually in a new terminal window without -d (and without --rm), which prints the logs directly to stdout.

To then fix the problem, you can refer to the steps mentioned before.
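For the api container from this recipe, that foreground run could look like this (same flags as in step 3, with a different container name so it does not clash with an existing api container):

```shell
docker container run -p 5000:80 --name api_debug -e ASPNETCORE_ENVIRONMENT=Docker \
  --add-host=inDockerhost:172.17.0.1 debug_api:docker_debug
```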

Here are some general tips if you are stuck with a given image or container:

  • Look at the specific image and its documentation on Docker Hub
  • Activate system logging in the Dockerfile (to stdout)
  • Check your Dockerfile; maybe it also does something wrong. Though you can better find this out with the docker image inspect command, because the image and the Dockerfile might already be out of sync

How to avoid mistakes in general:

  • Be as explicit as possible in Dockerfiles and docker-compose files.
  • Create healthchecks.
  • Check your connection strings, network names, host names and ports.
  • Create some logging of your own, or wrap your containers in a logging script.
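Such a logging wrapper can be very small; a sketch (names and paths are illustrative):

```shell
#!/bin/sh
# run_logged.sh <image> [logfile] -- run a container in the foreground
# and keep a copy of everything it prints.
IMAGE="$1"
LOGFILE="${2:-./$(date +%Y%m%d_%H%M%S)_container.log}"
docker container run --rm "$IMAGE" 2>&1 | tee "$LOGFILE"
```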

Advanced features

  • Healthchecks for docker containerized applications
  • In general your application should rather recover from failures instead of simply handling or avoiding them. You cannot avoid all failures anyway. Some examples are:
    • Circuit breakers
    • Bulkheads
    • Timeouts
  • Connections to the database should always be wrapped in some sort of retry or fail-safe logic.
  • In the -d --rm docker run scenario, create a shared volume the container writes to, and inspect it from another container.
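For the healthcheck bullet, a HEALTHCHECK instruction in the Dockerfile is the usual starting point; a sketch against the counter endpoint used in this recipe (the intervals are arbitrary, and curl must exist in the image):

```dockerfile
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:80/api/v1/counter/ || exit 1
```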
