Graph everything and anything

November 14th, 2012 at 04:11

It's commonplace to use Google Analytics to monitor usage and Nagios to get alerts about outages. However, some folks are going a whole lot further in the world of data & analytics…

Etsy aim to

“make it ridiculously simple for any engineer to get anything they can count or time into a graph with almost no effort”
Measure Anything, Measure Everything

They highlight that application metrics are generally harder than network or machine ones, as they are very specific to your business and they change as your applications change. For this reason it's important to make it trivial to add or change metrics.

They use StatsD and Graphite. StatsD listens for messages on a UDP port, parses the messages, extracts the metrics data, and periodically flushes it to Graphite. For timed events, StatsD also automatically tracks the count, mean, maximum, minimum, and 90th percentile (which is a good measure of the “normal” maximum value, ignoring outliers).
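To make the "almost no effort" claim concrete, here is a minimal Python sketch of what sending a metric to StatsD can look like, plus a simplified version of the timer summary described above. It assumes a StatsD daemon on 127.0.0.1 listening on port 8125 (the usual default) and the plain-text `name:value|type` message format, where `c` marks a counter and `ms` a timer. The metric names `logins` and `page_render` are made up for illustration, and `summarize_timings` is only an approximation of the idea, not StatsD's actual code.

```python
import socket


def format_metric(name: str, value: int, metric_type: str) -> str:
    """Build one StatsD message in the form <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric as a single UDP datagram (fire and forget).

    The host and port are assumptions; point them at your own StatsD daemon.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def summarize_timings(timings_ms, percentile=90):
    """Simplified timer summary: count, mean, min, max and an upper percentile.

    The percentile value is the largest timing left after discarding the
    slowest (100 - percentile)% of samples, so a few outliers do not
    dominate it.
    """
    values = sorted(timings_ms)
    count = len(values)
    keep = max(1, round(count * percentile / 100))
    return {
        "count": count,
        "mean": sum(values) / count,
        "min": values[0],
        "max": values[-1],
        f"upper_{percentile}": values[keep - 1],
    }


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_metric(format_metric("logins", 1, "c"))          # count one login
    send_metric(format_metric("page_render", 320, "ms"))  # one 320 ms timing
    print(summarize_timings([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]))
```

Because UDP is connectionless, the application does not wait for a reply and carries on even if nothing is listening, which is part of why adding a new metric is so cheap. In the sample timings, the single 1000 ms outlier drags the maximum and the mean upwards, but the 90th percentile stays with the bulk of the data.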

Here's one of their graphs:

Plus another view of the same data that highlights exceptions (using Graphite’s data-processing tools to make a graph that highlights deviations from the norm):

Because it's so simple to log and then visualise new metrics, it's easy to use these tools to identify problems and track down bottlenecks. Here are a couple of examples:

Plus you can even send alerts via Nagios with attached graphs from Graphite, or use a command line interface to draw graphs.
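Attaching a graph to an alert is possible because Graphite can render any metric as an image over HTTP. As a rough sketch, the following Python builds a URL for Graphite's `/render` endpoint that returns a PNG of the last hour of data. The host `graphite.example.com` and the metric path `stats.logins` are placeholders, not values from Etsy's setup.

```python
from urllib.parse import urlencode


def graph_url(base_url, target, since="-1h", width=600, height=300):
    """Build a Graphite render URL that returns a PNG for one metric."""
    query = urlencode({
        "target": target,   # metric path or function expression to plot
        "from": since,      # relative start time, e.g. -1h or -7d
        "width": width,
        "height": height,
        "format": "png",
    })
    return f"{base_url}/render?{query}"


if __name__ == "__main__":
    # Placeholder host and metric path, for illustration only.
    print(graph_url("http://graphite.example.com", "stats.logins"))
```

An alert script could fetch that URL and attach the resulting image to the notification it sends out.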

