Imagine yourself – a software developer – going after a bug some customer reported. Do you remember saying “God, I hate these descriptive logs! I wish we logged less stuff!” You don’t remember saying it? Me neither. Because I ❤ logs.

Logs

(This article is part of What I Have Learned As a Software Developer series.)

What do I mean by logging? Usually something like adding a line of code such as this one:

logger.info("User logged in");

From my experience, people love to read logs once there is a problem they need to solve. Yet, people often forget to write them. Unless someone reminds them, they can write code for days without a single log. Logs are windows to the past. Logs are evidence you need to catch your killer. You need them.

Most Common Mistake: Missing Context

Software developers tend to focus on writing good code that satisfies the requirements. But they are not used to explaining stuff. And often, you can see it in logs. If you are lucky enough and there are some logs, the context is missing.

Hence the most common mistake: not enough information in logs to fully understand what the hell is going on. How to tell if there is enough information?

Pouring whisky

Say you drink a few glasses of Laphroaig, your favorite single malt Scotch Whisky. You go to sleep. And I wake you up by hitting a gong while screaming at you the log message: if you understand it, it’s a good message. Every time you write a log message, imagine yourself half-drunk in the middle of the night. Would you understand?

I know you’ll never read the log messages half-drunk in the middle of the night. But you are going to read the messages once there is trouble in production. The services are down, the company is losing money, you are under stress and desperately trying to find some answers in the log messages. You aren’t even at work, you are on a train on your phone reading the logs trying to help. Yes, it happened! Hence the rule of thumb:

Every log message should contain the content of some variable.

This simple rule helps you to write better messages. Is “User logged in” a good message? No, it’s not. The context is missing, there is no content of any variable. What about “User ${userId} logged in”? Suddenly, you know more. If you remember to print a single variable there, you often print more:

“User ${userId} successfully logged in the system ${systemId} as ${userRole} with generated session id ${sessionId} which expires at ${sessionExpirationTime}”

I’d say it’s pretty clear what happened. Even if I wake you up in the middle of the night.

If you write an error message, mention expected state, actual state, and sometimes even tips to fix the error

“User ${userId} is not authorized to execute ${actionName} because it’s missing the permission ${requiredPermission}. Try to switch a role or log in as a different user.”

If you read it while debugging, you have a pretty good idea of what is wrong and what should you try next.

A structure

Write structured logs. Recently, we have taken advantage of structured logs. Instead of writing a full sentences we write log messages as a JSON with custom data. Something like this:

logger.info('User successfully logged in the system', {
	"userId": userId,
	"systemId": systemId,
	"userRole": userRole,
	"sessionId": sessionId,
	"sessionExpirationTime": sessionExpirationTime
});

Such logs are easier to filter in tools like Kibana.

Choose proper log level. The common misconception is to log every “error” as an “error” level. E. g. whenever someone requests some path that does not exist on the server, the server returns 404 NOT FOUND HTTP status and you log an error such as

logger.error('Requested path ${url.path} does not exist');

But is it really an error? The server behaved as expected. Someone requested a non-existent route and the server responded with 404. This is supposed to happen. Do not log messages as errors if everything happens as expected. This is not supposed to be logged as an error level. Maybe warning, probably info level.

This is supposed to be logged as an error on the client-side, probably. If you write a client that requests the server and if you request by accident /usesr instead of /users, log the 404 response as an error. It’s the client’s error. Not server’s error. Distinguish between it. Otherwise, your logs will be cluttered by non-important error messages.

Every error in your logs should lead to a task that someone has to do. We think about the log level. The result is that whenever there is an error in the logs, someone has to do something.

  • A service is unable to connect to the database: someone has to check the connection, maybe restart the service, restart the database, fix DNS issue, or whatever.
  • A type error such as “NullPointerException” occurs: someone has to fix it in the code.
  • A service is unable to connect to remote REST API service: someone has to figure out what is going on, maybe changing configuration because the remote service moved to a different URL.

This should be your goal too. Having thousands of errors every day in your log stream is a big no-no. It’s simply not an error if you have been ignoring it for the last twelve months. Act according to it. Next time, you don’t need to stress about problems in production, you just take a sip of tea, look at the logs a find the problem immediately!

A teapot full of green tea

(Pictures are from unsplash.com)

Further reading