Collecting metrics
==================
:date: 2021-05-08
:summary: Integrating metric collection in your application

A few startups I worked at had a similar story. When they got started they
didn't have any metrics collection (maybe some system metrics from their cloud
provider, but nothing more). After a few times where they had to debug an issue
where metrics were needed, they decided to start collecting metrics from the
application. Since they were a small team with little experience in setting up
the needed infrastructure, and without the manpower to handle such a task, they
decided to use a SaaS product (NewRelic and Datadog are both good choices here).
Then, as the company grew and the number of users, processes, components and
instances grew with it, so did the bill from that SaaS. This is usually the time
when they decide that they need a DevOps person on the team (not just because of
the metrics issue, but because as the company matures, the customer base grows,
uptime requirements increase, scaling becomes an issue, and so on). What follows
is my advice on such undertakings.

First of all, use StatsD. Not specifically the StatsD daemon, but the protocol.
It's mature, flexible enough for most tasks, and supported in all languages and
in all metrics collection software. You can even use netcat to push metrics from
shell scripts.
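
To show how simple the protocol is, here is a rough sketch of pushing a counter
and a timer from Python with nothing but a UDP socket. The metric names are made
up and 8125 is just the conventional StatsD port; adjust both to taste.

.. code-block:: python

    import socket

    # StatsD metrics are single plain-text lines: "<name>:<value>|<type>",
    # where the type is "c" for counters, "ms" for timers and "g" for gauges.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    statsd = ("localhost", 8125)  # the conventional StatsD address

    sock.sendto(b"deploys.finished:1|c", statsd)    # count a deploy
    sock.sendto(b"build.duration:3200|ms", statsd)  # a timing in milliseconds

    # The netcat equivalent from a shell script would be something like:
    # echo "deploys.finished:1|c" | nc -u -w0 localhost 8125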

You don't have to set up storage for the metrics you collect: CloudWatch,
Librato or similar services can be used and are cheaper than the more integrated
SaaS offerings. If you already have Elasticsearch for log collection, you can
use it for storing metrics as well (great for a few dozen instances). InfluxDB
and Gnocchi support the StatsD protocol too. As you can see, it's a flexible
solution.

I think that most people associate the StatsD protocol with the Graphite
protocol and software, but there is a key difference: StatsD uses UDP. It's
faster and there's less overhead. You can default to sending the metrics to
localhost, and if nothing collects them that's still fine (great for local
development and CI jobs where you don't want to run metrics collection software,
or where it doesn't make sense to). I know that some would say that TCP is more
reliable than UDP and that you can lose metrics this way. To that I would say
that the lower overhead of UDP and the lack of a connection are actually an
advantage: no :code:`connection closed` errors, and a single packet is more
likely to reach its destination when the system is under heavy load than a new
TCP connection is to be opened. You can read this `GitHub blog post
<https://github.blog/2015-06-15-brubeck/>`_ to see how they dealt with their
metric loss.
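
To make the "send to localhost and don't care" point concrete, here is a sketch
of a hypothetical fire-and-forget client (not any particular library's API): it
defaults to localhost, and if nothing is listening the packets are simply
dropped, so local development and CI runs need no collector at all.

.. code-block:: python

    import socket


    class FireAndForgetStatsd:
        """Minimal, hypothetical StatsD client that never blocks or raises."""

        def __init__(self, host="localhost", port=8125, prefix="myapp."):
            self._addr = (host, port)
            self._prefix = prefix
            self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        def incr(self, name, value=1):
            self._send(f"{name}:{value}|c")

        def timing(self, name, milliseconds):
            self._send(f"{name}:{milliseconds}|ms")

        def _send(self, line):
            try:
                self._sock.sendto((self._prefix + line).encode("ascii"), self._addr)
            except OSError:
                pass  # metrics should never take the application down

There is no connection to open or close, so the calls return immediately whether
or not a collector is running.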

I know that Prometheus is the new hotness and very popular when running
Kubernetes, and it's not a bad choice. But there are a few things that you need
to keep in mind when considering it. It's best when you have service discovery
available for it (if you're already using Kubernetes or running in a cloud
provider that's not an issue, but that's not everybody). Also, for short-lived
tasks (like cron jobs or processes that pull from a job queue), for processes
that can't open a listening socket, or if you have many copies of the same
process running on the same host, you need to set up a push gateway. The process
pushes metrics to the gateway and Prometheus collects them from it afterwards
(sounds an awful lot like the StatsD daemon, doesn't it?). Prometheus also has a
StatsD exporter, so you can still use StatsD along with Prometheus.
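
For the push gateway case, the official Python client covers it. Here is a
sketch of a short-lived, cron-style job pushing its result; the gateway address,
job name and metric name are placeholders (9091 is the push gateway's default
port).

.. code-block:: python

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    last_success = Gauge(
        "cron_job_last_success_unixtime",
        "Last time the job finished successfully",
        registry=registry,
    )

    # ... the actual work of the job goes here ...

    last_success.set_to_current_time()
    # Push to the gateway; Prometheus then scrapes the gateway on its own
    # schedule.
    push_to_gateway("localhost:9091", job="nightly-report", registry=registry)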

If you want some ad-hoc or more lightweight metrics collection, `statsd-vis
<https://github.com/rapidloop/statsd-vis>`_ is a good solution. It holds the
data in memory for a configurable time and has a built-in web UI to watch the
graphs. I have a very small `Docker image
<https://hub.docker.com/r/adarnimrod/statsd-vis>`_ for that.

Lastly, for Python applications I recommend the `Markus
<https://markus.readthedocs.io/>`_ library. It's easy to use, reliable, and you
can add more metrics as needed. Two thumbs up.
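
A rough sketch of what using Markus looks like; the backend class path and
options are as I recall them from the Markus documentation, so double check them
against the docs (the StatsD backend also needs the statsd package installed).

.. code-block:: python

    import time

    import markus

    # Configure once at application start-up, here with the StatsD backend.
    markus.configure(
        backends=[
            {
                "class": "markus.backends.statsd.StatsdMetrics",
                "options": {"statsd_host": "localhost", "statsd_port": 8125},
            }
        ]
    )

    metrics = markus.get_metrics(__name__)


    def nightly_cleanup():
        # A made-up task, just to show the calls.
        metrics.incr("cleanup.runs")
        with metrics.timer("cleanup.duration"):
            time.sleep(0.1)  # stand-in for the real work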