Logging Best Practices
The best practice for you depends very much on your context. To give you some pointers nevertheless, here are a few scenarios that may be applicable to you.
Pull requests for further interesting approaches as well as refinements and more complete examples are very welcome.
Logging is not a new concept and in no way special to Python. Logfiles have existed for decades and there’s little reason to reinvent the wheel in our little world.
There are several concepts that are very well-solved in general. Especially in heterogeneous environments, using special tooling for Python applications does more harm than good and makes the operations staff build dart boards with your picture on them.
Therefore let’s rely on proven tools as much as possible and do only the absolutely necessary inside of Python[*]. A very nice approach is to simply log to standard out and let other tools take care of the rest.
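As a minimal sketch of the "log to standard out" approach, the following uses only the standard library's logging module. The function name configure_stdout_logging and the format string are illustrative assumptions, not something structlog or runit prescribes.

```python
import logging
import sys


def configure_stdout_logging(level=logging.INFO):
    """Send all log records to standard out: no files, no rotation, no detaching."""
    handler = logging.StreamHandler(sys.stdout)
    # No timestamp on purpose: the tool that collects the output can add one.
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously installed handlers
    root.setLevel(level)
    return handler


if __name__ == "__main__":
    configure_stdout_logging()
    logging.getLogger("myapp").info("service started")
```

Everything beyond this point (timestamps, rotation, shipping) is then left to whatever supervises the process.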
One runner that makes this very easy is the venerable runit project which made it a part of its design: server processes don’t detach but log to standard out instead. There it gets processed by other software – potentially by one of its own tools: svlogd. We use it extensively and it has proven itself extremely robust and capable; check out this tutorial if you’d like to try it.
If you’re not quite convinced and want an overview on running daemons, have a look at cue’s daemon showdown that discusses the most common ones.
There are basically two common ways to log to local logfiles: writing to files yourself and using syslog.
The simplest approach to logging is to forward your entries to syslogd. Twisted, uWSGI, and runit support it directly. It will happily add a timestamp and write wherever you tell it to in its configuration. You can also log from multiple processes into a single file and use your system’s logrotate for log rotation.
The only downside is that syslog has some quirks that show themselves under high load, like rate limits (they can be switched off) and lost log entries.
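For applications that have to talk to syslog themselves, the standard library ships logging.handlers.SysLogHandler. The sketch below is one possible setup; the helper name make_syslog_logger and the default address are assumptions, and many Linux systems expect the Unix socket path "/dev/log" instead of a UDP host and port.

```python
import logging
import logging.handlers


def make_syslog_logger(name, address=("127.0.0.1", 514)):
    """Return a logger that forwards its entries to a syslog daemon.

    `address` is an assumption: pass "/dev/log" (or whatever your
    system uses) to log through the local Unix socket instead of UDP.
    """
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_USER,
    )
    # syslogd adds the timestamp itself, so only tag the entry with the logger name.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicating entries via the root logger
    return logger
```

Where the entries end up and how they are rotated is then a matter of syslogd and logrotate configuration, not of Python code.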
Nowadays you usually don’t want your logfiles in compressed archives distributed over dozens – if not thousands – of servers. You want them in a single location, parsed and easy to query.
Since syslog is such a widespread solution, there are also ways to use it with basically any centralized product.
Logstash with logstash-forwarder
Logstash is a great way to parse, save, and search your logs.
The general modus operandi is that you have log shippers that parse your log files and forward the log entries to your Logstash server, which stores them in Elasticsearch. If your log entries consist of a JSON dictionary (and perhaps a tai64n timestamp), this is pretty easy and efficient.
If you can’t decide on a log shipper, logstash-forwarder (formerly known as Lumberjack) works really well.
If your lumberjack input is configured to use codec => "json", having structlog output JSON is all you need.
See the documentation on the Python Standard Library for an example configuration.
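To illustrate what "one JSON dictionary per log entry" can look like, here is a standard-library-only sketch. structlog's own structlog.processors.JSONRenderer serves this purpose when you use structlog; the class name JSONLineFormatter, the key names, and the convention of passing structured data via extra={"context": {...}} are assumptions made for this example.

```python
import json
import logging
import time


class JSONLineFormatter(logging.Formatter):
    """Render each record as a single-line JSON dictionary."""

    def format(self, record):
        entry = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "logger": record.name,
            # UTC timestamp derived from the record's creation time.
            "timestamp": time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)
            ),
        }
        # Hypothetical convention: structured data arrives via extra={"context": {...}}.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry, sort_keys=True)
```

Attached to a handler with handler.setFormatter(JSONLineFormatter()), each entry becomes one self-contained line, so a shipper using a JSON codec does not need any custom parsing rules.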
Graylog goes one step further. It not only supports everything those above do (and then some); you can also send JSON entries directly to it – optionally even through an AMQP server (like RabbitMQ) for better reliability. Additionally, Graylog’s Extended Log Format (GELF) allows for structured data, which makes it an obvious choice to use together with structlog.
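As a rough illustration of how structured data maps onto GELF, the sketch below builds a GELF 1.1 payload by hand. The function name gelf_message is hypothetical; in practice you would more likely use an existing GELF client library, and the transport (UDP, TCP, or AMQP) depends on how your Graylog inputs are configured.

```python
import json
import socket
import time


def gelf_message(short_message, host=None, level=6, **fields):
    """Build a GELF 1.1 payload as UTF-8 encoded JSON bytes."""
    payload = {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity; 6 means informational
    }
    for key, value in fields.items():
        if key == "id":
            # The GELF specification does not allow "_id" as an additional field.
            raise ValueError("'id' cannot be sent as an additional GELF field")
        # Additional (structured) fields must be prefixed with an underscore.
        payload["_" + key] = value
    return json.dumps(payload).encode("utf-8")
```

A call such as gelf_message("user logged in", user="alice") yields bytes that could, for example, be sent with a UDP socket to a GELF UDP input; the host and port of that input are deployment-specific.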
[*] This is obviously a privileged UNIX-centric view, but even Windows has tools and means for log management, although we won’t be able to discuss them here.