Logging Best Practices

Logging is not a new concept and in no way special to Python. Logfiles have existed for decades and there’s little reason to reinvent the wheel in our little world.

Therefore let’s rely on proven tools as much as possible and do only the absolutely necessary inside of Python*.

A simple but powerful approach is to log to unbuffered standard out and let other tools take care of the rest. Those tools can be your terminal window while developing, systemd redirecting your log entries to syslogd, or your cluster manager. It doesn’t matter where or how your application is running; it just works.

This is why the popular twelve-factor app methodology suggests just that.
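
As a minimal sketch of that approach, the following uses only the standard library's logging module and writes every entry to standard out. The function name get_stdout_logger and the format string are illustrative assumptions, not part of any library. StreamHandler flushes after each record; to make standard out itself unbuffered, the interpreter can be started with python -u or with the PYTHONUNBUFFERED environment variable set.

```python
import logging
import sys


def get_stdout_logger(name, stream=None):
    """Return a logger that writes one line per entry to standard out.

    `get_stdout_logger` is a hypothetical helper for illustration only.
    `stream` defaults to sys.stdout and is a parameter so it can be swapped.
    """
    stream = stream if stream is not None else sys.stdout
    # StreamHandler flushes the stream after every record it emits.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]  # replace any handlers attached earlier
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = get_stdout_logger("app")
    log.info("service started")
```

Where the output ends up (a terminal, syslogd, a cluster manager) is then decided entirely outside of the application.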

* This is obviously a privileged UNIX-centric view, but even Windows has tools and means for log management, although we won’t be able to discuss them here.

Canonical Log Lines

Generally speaking, having as few log entries per request as possible is a good thing. The less noise, the more insight.

structlog’s ability to bind data to loggers incrementally – plus thread-local context storage – can help you to minimize the output to a single log entry.

At Stripe, this concept is called Canonical Log Lines.
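
The idea can be sketched without any third-party code: collect key/value pairs while a request is processed and emit them as one entry at the end. The class CanonicalLogLine, its methods, and the duration_ms field below are hypothetical names chosen for this illustration; they mimic incremental binding but are not structlog's API.

```python
import json
import sys
import time


class CanonicalLogLine:
    """Collect fields during a request and emit them as a single entry.

    Hypothetical helper for illustration; not part of structlog.
    """

    def __init__(self, stream=None, **initial):
        self._stream = stream if stream is not None else sys.stdout
        self._fields = dict(initial)
        self._start = time.monotonic()

    def bind(self, **fields):
        """Add or overwrite fields; returns self so calls can be chained."""
        self._fields.update(fields)
        return self

    def emit(self, event):
        """Write the one and only log entry for this request."""
        entry = dict(self._fields)
        entry["event"] = event
        entry["duration_ms"] = round((time.monotonic() - self._start) * 1000, 3)
        self._stream.write(json.dumps(entry) + "\n")
        self._stream.flush()
        return entry


if __name__ == "__main__":
    line = CanonicalLogLine(request_id="abc123")
    line.bind(user_id=42)          # learned during authentication
    line.bind(status=200)          # learned when the response is built
    line.emit("request_finished")  # a single entry for the whole request
```

Each part of the request handling adds what it knows, and only the final emit call produces output.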

Centralized Logging

Nowadays you usually don’t want your logfiles in compressed archives distributed over dozens – if not thousands – of servers or cluster nodes. You want them in a single location. Parsed, indexed, and easy to search.

ELK

The ELK stack (Elasticsearch, Logstash, Kibana) from Elastic is a great way to store, parse, and search your logs.

The way it works is that you have local log shippers like Filebeat that parse your log files and forward the log entries to your Logstash server. Logstash parses the log entries and stores them in Elasticsearch. Finally, you can view and search them in Kibana.

If your log entries consist of a JSON dictionary, this is fairly easy and efficient. All you have to do is tell Logstash one of two things: either that your log entries are prefixed with a timestamp from structlog’s TimeStamper processor, or the name of the timestamp field inside the JSON dictionary.
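
To illustrate the second variant, here is a stdlib-only sketch that renders a log entry as a JSON dictionary carrying its own timestamp field. The function render_json_entry and the default field name "timestamp" are assumptions made for this example; whatever name is used is the one Logstash would need to be told about.

```python
import json
from datetime import datetime, timezone


def render_json_entry(event, timestamp_key="timestamp", now=None, **fields):
    """Render one log entry as a JSON dictionary with an ISO 8601 timestamp.

    Hypothetical helper for illustration. `now` can be passed explicitly to
    make the output reproducible; it defaults to the current time in UTC.
    """
    now = now if now is not None else datetime.now(timezone.utc)
    entry = {timestamp_key: now.isoformat(), "event": event}
    entry.update(fields)
    return json.dumps(entry)


if __name__ == "__main__":
    print(render_json_entry("user_logged_in", user="alice"))
```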

Graylog

Graylog goes one step further. It not only supports everything the tools above do (and then some); you can also send JSON log entries directly to it, optionally even through an AMQP server (like RabbitMQ) for better reliability. Additionally, Graylog’s Extended Log Format (GELF) allows for structured data, which makes it an obvious choice to use together with structlog.
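
As a rough sketch of what a structured GELF entry might look like, the following builds a GELF 1.1 payload from an event name and arbitrary key/value pairs. The function to_gelf is a hypothetical helper; it only builds the JSON payload and leaves the transport (UDP, TCP, HTTP, or AMQP) out entirely. It assumes the GELF convention that additional fields are prefixed with an underscore and that level uses syslog severity numbers.

```python
import json
import socket
import time

SYSLOG_INFO = 6  # syslog severity "informational"


def to_gelf(event, host=None, level=SYSLOG_INFO, timestamp=None, **fields):
    """Build a GELF 1.1 JSON payload (hypothetical helper, transport omitted)."""
    message = {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": event,
        # Seconds since the UNIX epoch, fractions allowed.
        "timestamp": timestamp if timestamp is not None else time.time(),
        "level": level,
    }
    # Additional structured fields carry a leading underscore in GELF.
    for key, value in fields.items():
        message["_" + key] = value
    return json.dumps(message)


if __name__ == "__main__":
    print(to_gelf("user_logged_in", user="alice", request_id="abc123"))
```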