thinking out loud

My wishlist for software I have to run in production

Published: Fri 16 September 2016
By Adam DeConinck

In misc.

tags: 'devops' 'opslife' 'sysadmin' 'computing'

  • When possible, do things the "Unixy" way

  • There should be a way to tell the service not to accept more work

    ... and a way to tell it to come back online.
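
    One way this could look inside a service, as a minimal Python sketch. The class and method names here are made up for illustration; a real service would hook `drain()` and `resume()` up to whatever admin interface it already has (a signal, a CLI subcommand, an HTTP endpoint).

```python
import queue
import threading


class WorkIntake:
    """Gate in front of a work queue that can be closed and reopened."""

    def __init__(self):
        self._accepting = threading.Event()
        self._accepting.set()  # start out accepting work
        self.jobs = queue.Queue()

    def drain(self):
        """Stop accepting new work; already-queued jobs are left alone."""
        self._accepting.clear()

    def resume(self):
        """Come back online and accept work again."""
        self._accepting.set()

    def submit(self, job):
        """Queue a job if we are accepting work; return whether it was accepted."""
        if not self._accepting.is_set():
            return False
        self.jobs.put(job)
        return True
```

    The point of the sketch is that draining only closes the front door: work that was already accepted still gets finished, which is what makes it safe to do before maintenance.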

  • Where possible, there should be a "dry run" mode
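
    A minimal sketch of a dry-run flag in Python, assuming a hypothetical file-pruning tool (the `prune` function and its messages are invented for this example). The same code path runs either way; only the destructive call is skipped.

```python
import argparse
import os


def prune(paths, dry_run=False):
    """Delete the given files, or only report what would be deleted."""
    actions = []
    for path in paths:
        if dry_run:
            actions.append(f"would remove {path}")
        else:
            os.remove(path)
            actions.append(f"removed {path}")
    return actions


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Prune files (hypothetical tool).")
    parser.add_argument("paths", nargs="*")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="print what would happen without changing anything",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    for line in prune(args.paths, dry_run=args.dry_run):
        print(line)
```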

  • There should be a way to evict a node from the cluster, both cleanly and forcefully

  • There should be a manpage

    In a pinch, I'll accept a '--help' option that is actually helpful. :) But web-only documentation is problematic, especially as it's often only maintained for the most recent version. It's also profoundly unhelpful when I'm running the service on a network that doesn't connect to the Internet.

  • Make your software easy to package

  • Make your software easy to containerize

  • On Linux, be as distro-agnostic as you can.

    This isn't me saying you can't rely on new and shiny features. I like them too. :) But try to state those requirements in the most specific way you can -- e.g., "must have Linux kernel feature X", "must have Python version Y", or "must have Z daemon running". "Must be running on Ubuntu 16.04" is just as bad for a backend service as "must be viewed in Chrome" is on the Web.

  • Log to a file, or the system journal, or to a remote service built for logging. Please don't log to a specialized database.

    When it's 3 AM and I'm trying to remember how to spell 'ls', I don't also want to have to remember how to write an SQL query against your custom logging schema. I also don't want to have to figure out your custom tool for querying service-specific logs.

  • Log messages should specify their severity level, and there should be a way to set the level that is actually logged.

    I personally like the regular syslog severities from RFC 3164 (e.g. err, warn, info, debug; RFC 5424 has since obsoleted it but keeps the same severities), but really they can be named after Pokémon as long as their meaning is clear. :)

    I've seen the argument that the current crop of log management tools makes it unnecessary to set and filter log levels. And it's true that Splunk/Kibana/etc. are pretty good these days, so it's easy to filter logs when viewing them instead of at log time. But resource congestion is still a thing, and when debug logs fill a 500 GB disk in a day because they're just that noisy, that's obviously going to ruin your day. On the other hand, being able to get those noisy logs when debugging is also a good thing. So please -- make this a tunable.
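
    As a rough illustration of "make this a tunable", here is a sketch using Python's standard `logging` module. The environment variable name `MYSERVICE_LOG_LEVEL` is an assumption made up for this example; a config file option or a command-line flag would work just as well.

```python
import logging
import os


def configure_logging(default="INFO"):
    """Set the root log level from MYSERVICE_LOG_LEVEL (a hypothetical name)."""
    name = os.environ.get("MYSERVICE_LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {name!r}")
    logging.basicConfig(
        level=level,
        # Every message carries its severity, so it can also be filtered later.
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed earlier (Python 3.8+)
    )
    return level
```

    With this in place, an operator can run the service quietly by default and turn on debug output only while chasing a problem, without touching the code.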

  • Your service should be able to handle log rotation.

    That problem with logs filling up a disk? Log rotation exists to help with that. :)

    Again, in the Unixy spirit, I strongly prefer software that includes a logrotate.d file and can take a signal (e.g., SIGHUP) to start logging to the new file. I've spent way too much time beating my head against various vendors' buggy implementations of log rotation built into their services instead of just using logrotate. But occasionally I've seen service-specific rotation make sense, so just do what makes sense.
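
    For the signal half of that, a minimal Python sketch of a log handler that reopens its file on request. The class and function names are invented for this example, and it assumes a Unix system where logrotate has already renamed the old file and then sends SIGHUP from its `postrotate` script.

```python
import logging
import signal


class ReopenableFileHandler(logging.FileHandler):
    """File handler that reopens its log file after a reopen request."""

    def __init__(self, filename):
        super().__init__(filename, delay=True)  # open lazily on first write
        self._reopen_requested = False

    def request_reopen(self, signum=None, frame=None):
        # Safe to call from a signal handler: it only sets a flag.
        self._reopen_requested = True

    def emit(self, record):
        if self._reopen_requested:
            self._reopen_requested = False
            if self.stream is not None:
                self.stream.close()
                self.stream = None  # FileHandler.emit() reopens by path
        super().emit(record)


def install_sighup_reopen(handler):
    """Reopen the log file on SIGHUP (Unix only; call from the main thread)."""
    signal.signal(signal.SIGHUP, handler.request_reopen)
```

    The handler defers the actual reopen to the next log call so the signal handler itself stays trivial. Python's standard library also ships `logging.handlers.WatchedFileHandler`, which notices on Unix that the file was moved and reopens it without needing a signal at all.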
