Collecting metrics
==================
:date: 2021-05-08
:summary: Integrating metric collection in your application
A few startups I worked at shared a similar story. When they got started they
didn't have any metrics collection (maybe some system metrics from their cloud
provider, but nothing more). After a few incidents where they had to debug an
issue and metrics were needed, they decided to start collecting metrics from
the application. Since they were a small team with little experience in
setting up the needed infrastructure, and without the manpower to handle such
a task, they decided to use a SaaS product (NewRelic and Datadog are both good
choices here). Then, as the company grew and the number of users, processes,
components and instances grew with it, so did the bill from that SaaS. This is
usually the time when they decide they need a DevOps person on the team (not
just because of the metrics issue, but because as the company matures, the
customer base grows, uptime requirements increase, scaling becomes an issue,
and so on). What follows is my advice on such undertakings.
First of all, use StatsD. Not the StatsD daemon specifically, but the
protocol. It's mature, flexible enough for most tasks, and supported in
practically every language and by most metrics collection software. You can
even use netcat to push metrics from shell scripts.
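
To make that concrete, here is a minimal sketch of the wire format, pushed
over UDP with nothing but the Python standard library. The metric names and
the :code:`localhost:8125` address are illustrative assumptions (8125 is the
conventional StatsD port):

.. code:: python

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # The protocol is just "name:value|type" in a datagram:
    # a counter increment and a timing in milliseconds.
    sock.sendto(b"myapp.logins:1|c", ("localhost", 8125))
    sock.sendto(b"myapp.request_time:320|ms", ("localhost", 8125))

    # The netcat equivalent from a shell script:
    #   echo "myapp.logins:1|c" | nc -u -w0 localhost 8125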
You don't have to set up storage for the metrics you collect; CloudWatch,
Librato or similar services can be used and are cheaper than the more
integrated SaaS offerings. If you already have Elasticsearch for log
collection, you can use it for storing metrics as well (great for a few dozen
instances). InfluxDB and Gnocchi support the StatsD protocol too. As you can
see, it's a flexible solution.
I think most people associate the StatsD protocol with the Graphite protocol
and software, but there is a key difference: StatsD uses UDP. It's faster and
there's less overhead. You can default to sending the metrics to localhost,
and if nothing collects them that's still fine (great for local development
and CI jobs, where you don't want to run metrics collection software or where
it doesn't make sense). I know some would say that TCP is more reliable than
UDP and that you can lose metrics over UDP. To that I would say that the lower
overhead of UDP and the lack of a connection are actually advantages: there
are no :code:`connection closed` errors, and when the system is under heavy
load a single packet is more likely to reach its destination than a new TCP
connection is to be established. You can read this `GitHub blog post
<https://github.blog/2015-06-15-brubeck/>`_ to see how they dealt with their
metric loss.
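
A quick way to convince yourself of the fire-and-forget behaviour: a datagram
sent to a port nobody is listening on simply disappears, with no error raised.
(A sketch; the metric name and port are the same illustrative assumptions as
above.)

.. code:: python

    import socket

    # With no collector running, the send still succeeds; the only thing
    # lost is the metric itself. There is no connection to open, refuse,
    # or close.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"myapp.ci_runs:1|c", ("localhost", 8125))
    print("sent, whether or not anything was listening")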
I know that Prometheus is the new hotness and very popular when running
Kubernetes, and it's not a bad choice. But there are a few things you need to
keep in mind when considering it. It works best when you have service
discovery available for it (if you're already using Kubernetes or running in
a cloud provider that's not an issue, but that's not everybody). Also, for
short-lived tasks (like cron jobs or processes that pull from a job queue), or
for processes that can't open a listening socket (or if you have many
instances of the same process running on the same host), you need to set up a
push gateway. The process pushes metrics to the gateway and Prometheus
collects them from there afterwards (sounds an awful lot like the StatsD
daemon, doesn't it?). Prometheus has a StatsD exporter, so you can still use
StatsD along with Prometheus.
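
For comparison, pushing from a short-lived job with the official
:code:`prometheus_client` Python library looks roughly like this (a sketch; it
assumes a Pushgateway at :code:`localhost:9091`, and the job and metric names
are made up for the example):

.. code:: python

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    last_success = Gauge(
        "job_last_success_unixtime",
        "Last time the batch job finished successfully",
        registry=registry,
    )
    last_success.set_to_current_time()

    # Prometheus scrapes the gateway later, on its own schedule.
    push_to_gateway("localhost:9091", job="nightly-batch", registry=registry)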
If you want some ad-hoc or more lightweight metrics collection, `statsd-vis
<https://github.com/rapidloop/statsd-vis>`_ is a good solution. It holds the
data in memory for a configurable time and has a built-in web UI to watch the
graphs. I have a very small `Docker image
<https://hub.docker.com/r/adarnimrod/statsd-vis>`_ for that.
Lastly, for Python applications I recommend the `Markus
<https://markus.readthedocs.io/>`_ library. It's easy to use, reliable, and
lets you add more metrics as needed. Two thumbs up.
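
Typical usage looks something like this (a sketch based on the Markus
documentation's StatsD backend; the backend options and metric names are
assumptions for the example):

.. code:: python

    import time

    import markus

    markus.configure(
        backends=[
            {
                "class": "markus.backends.statsd.StatsdMetrics",
                "options": {"statsd_host": "localhost", "statsd_port": 8125},
            }
        ]
    )

    metrics = markus.get_metrics("myapp")

    metrics.incr("logins")  # a counter
    with metrics.timer("render_time"):  # times the block, emits a "|ms" metric
        time.sleep(0.1)  # stand-in for real work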