Logging Best Practices

Servers

Logging is not a new concept and in no way special to Python. Logfiles have existed for decades and there’s little reason to reinvent the wheel in our little world.

Therefore, let’s rely on proven tools as much as possible and do only what is absolutely necessary inside of Python[*].

A simple but powerful approach is to log to unbuffered standard out and let other tools take care of the rest. That can be your terminal window while developing, systemd redirecting your log entries to syslogd, or your cluster manager. It doesn’t matter where or how your application is running: it just works.

This is why the popular twelve-factor app methodology suggests just that.
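
For illustration, here is a minimal sketch of that setup with structlog; the choice of processors and the event names and fields are made up for this example. Note that Python block-buffers standard out when it isn’t attached to a terminal, so run the interpreter with python -u or set PYTHONUNBUFFERED=1 if you need truly unbuffered output.

    import sys

    import structlog

    # Render every entry as a single line and write it straight to standard
    # out; the terminal, systemd, or your cluster manager picks it up there.
    structlog.configure(
        processors=[
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.KeyValueRenderer(key_order=["timestamp", "event"]),
        ],
        logger_factory=structlog.PrintLoggerFactory(sys.stdout),
    )

    log = structlog.get_logger()
    log.info("request_handled", path="/health", status=200)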

[*]This is obviously a privileged, UNIX-centric view, but even Windows has tools and means for log management, although we won’t be able to discuss them here.

Centralized Logging

Nowadays you usually don’t want your logfiles in compressed archives distributed over dozens – if not thousands – of servers or cluster nodes. You want them in a single location: parsed, indexed, and easy to search.

ELK

The ELK stack (Elasticsearch, Logstash, Kibana) from Elastic is a great way to store, parse, and search your logs.

The way it works is that you run local log shippers like Filebeat that tail your log files and forward the entries to your Logstash server. Logstash parses the entries and stores them in Elasticsearch. Finally, you can view and search them in Kibana.

If your log entries consist of a JSON dictionary, this is fairly easy and efficient. All you have to do is tell Logstash the name of your timestamp field.
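
As an illustration, here is a sketch of what such JSON entries could look like when produced with structlog. The "@timestamp" key follows a common Elasticsearch/Logstash convention, but the name is only an example and has to match whatever your Logstash pipeline expects.

    import sys

    import structlog

    # Emit one JSON object per log entry, with the timestamp under a single,
    # predictable key that Logstash's date handling can be pointed at.
    structlog.configure(
        processors=[
            structlog.processors.TimeStamper(fmt="iso", utc=True, key="@timestamp"),
            structlog.processors.JSONRenderer(),
        ],
        logger_factory=structlog.PrintLoggerFactory(sys.stdout),
    )

    log = structlog.get_logger()
    log.info("user_logged_in", user_id=42)
    # e.g. {"user_id": 42, "event": "user_logged_in", "@timestamp": "2024-01-01T12:00:00.000000Z"}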

Graylog

Graylog goes one step further. It not only supports everything the tools above do (and then some); you can also log JSON entries directly to it, optionally even through an AMQP server (like RabbitMQ) for better reliability. Additionally, the Graylog Extended Log Format (GELF) allows for structured data, which makes Graylog an obvious choice to use together with structlog.
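
One possible way to wire that up is sketched below. It assumes the third-party graypy package (whose handler class names differ between versions) and a Graylog GELF UDP input listening on localhost:12201; none of this is required by structlog itself, and host, port, and field names are placeholders.

    import logging

    import graypy  # third-party GELF handler for the standard library
    import structlog

    # Ship every log record to Graylog over GELF/UDP.
    handler = graypy.GELFUDPHandler("localhost", 12201)

    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)

    # structlog renders the event dict to JSON and hands it to the standard
    # library, whose GELF handler forwards it to Graylog.
    structlog.configure(
        processors=[
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.JSONRenderer(),
        ],
        logger_factory=structlog.stdlib.LoggerFactory(),
    )

    log = structlog.get_logger("myapp")
    log.info("payment_processed", amount=42, currency="EUR")

Graylog receives the rendered JSON as the message field; an extractor or pipeline rule on the Graylog side can split it back into searchable fields.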