Commit fd0d924c, authored 4 years ago by nimrod
Post on amilive.
Parent: de135d15
Pipeline #1406 passed 4 years ago (stages: test, deploy)
content/monitoring-shore.rst (new file, mode 100644): 60 additions, 0 deletions
Monitoring shore.co.il
======================

:date: 2021-05-08
:summary: Monitoring shore.co.il

Recently, I had some time to work on a project that had been on my to-do list
for a long time: monitoring the services in `shore.co.il
<https://www.shore.co.il/>`_. The project is now done and is available in my
`GitLab instance <https://git.shore.co.il/shore/amilive>`_.

Requirements
------------

By monitoring, I mean periodic checks on services and alerts when they fail. I
had a specific set of requirements in mind for this project. I wanted the
monitoring to be reliable, meaning that even if everything in my infrastructure
failed, I would still get alerts. This was critical for me since I run a lot of
my infrastructure at home and a prolonged internet or power outage would bring
down many services. Cheap and easy would also be nice.

Architecture
------------

I decided on using Lambda functions along with SMS notifications from SNS on
AWS. Lambda functions can be reliably triggered on a schedule (every x minutes)
using CloudWatch Events, and failures can be published to an SNS topic with a
subscription that sends SMS messages to my cellphone. So far this is very
reliable, with no dependency on anything in my infrastructure. For added
reliability, I added CloudWatch alarms in case a function was not invoked
recently or an invocation failed. Those alarms also send me an SMS message.
As for cost, SMS messages cost a little (hopefully there will be few of those),
I don't have enough Lambda invocations or runtime to go over the free tier, and
storing the code in S3 costs very little. For me, it was easier, cheaper and
more reliable than setting up Nagios, Sensu or similar.
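
The flow described above (a scheduled invocation runs checks and publishes each
failure to SNS) could look roughly like the sketch below. This is an
illustration and not the actual amilive code: the names ``run_checks``,
``make_sns_publisher`` and ``handler``, and the ``TOPIC_ARN`` environment
variable, are assumptions made for the example. It relies on ``boto3``, which is
available in the AWS Lambda Python runtime.

```python
import os


def make_sns_publisher(topic_arn):
    """Return a callable that publishes a text message to an SNS topic."""
    import boto3  # imported lazily so the check logic is testable without AWS

    sns = boto3.client("sns")

    def publish(message):
        sns.publish(TopicArn=topic_arn, Message=message)

    return publish


def run_checks(checks, publish):
    """Run every check and publish one alert per failure.

    `checks` maps a display name to a zero-argument callable that returns a
    truthy value on success. A check that raises counts as a failure.
    Returns a dict of failed check names to a short description.
    """
    failures = {}
    for name, check in checks.items():
        try:
            ok = check()
            detail = "check returned a falsy value"
        except Exception as exc:
            ok = False
            detail = f"{type(exc).__name__}: {exc}"
        if not ok:
            failures[name] = detail
            publish(f"{name} check failed: {detail}")
    return failures


def handler(event, context):
    """Hypothetical Lambda entry point, invoked by the scheduled rule."""
    # Assumption: the topic ARN is passed in through an environment variable.
    publish = make_sns_publisher(os.environ["TOPIC_ARN"])
    checks = {}  # placeholder: map names to real check callables here
    return run_checks(checks, publish)
```

Passing ``publish`` in as a parameter keeps the check loop independent of AWS,
so it can be exercised locally with a plain list's ``append`` method.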

Solution
--------

I wrote a few Python functions to test the different services I run (DNS, SMTP,
IMAP, SSH, different web services). To deploy them I wrote a Terraform module
that does everything, from creating the SNS topic to uploading the Python code
and hooking up the Lambda functions. Everything runs inside a GitLab CI
pipeline and uses the `GitLab remote Terraform state
<https://docs.gitlab.com/ee/user/infrastructure/terraform_state.html>`_ (I
recently had reason to try it out and I was impressed).
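
To give an idea of what such check functions can look like, here is a minimal
sketch using only the Python standard library. It is not the code from the
project: ``check_banner`` and ``check_http`` are hypothetical names, and the
banner check assumes plaintext ports (for example 22 for SSH, 25 for SMTP, 143
for IMAP), since a TLS-wrapped port would not send a readable greeting.

```python
import socket
import urllib.request


def check_banner(host, port, expected_prefix, timeout=5.0):
    """Connect over TCP and verify the server greeting starts with a prefix.

    SSH servers greet with b"SSH-", SMTP servers with b"220" and IMAP servers
    with b"* OK", so one function can cover all three on plaintext ports.
    Raises OSError (including timeouts) if the connection or read fails.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        banner = conn.recv(256)
    return banner.startswith(expected_prefix)


def check_http(url, timeout=5.0):
    """Return True if a GET request to the URL answers with status 200.

    urllib raises HTTPError for 4xx/5xx responses and URLError for
    connection problems, so a failing service surfaces as an exception.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200
```

A caller would treat both a ``False`` result and a raised exception as a failed
check, for example ``check_banner("mail.example.com", 25, b"220")`` (the host
name here is a placeholder).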

Conclusions
-----------

I don't think I would set up this specific solution for a company. A company
would most likely have an on-call schedule, and a SaaS product may be easier
and better in some aspects (like running checks from multiple locations). But
for my small infrastructure and my considerations it works well. The project
can be adapted to use a service like PagerDuty to have an on-call schedule, and
it can be deployed to multiple regions to run checks from multiple locations.
One advantage of Nagios and Sensu is their library of ready-made checks in Ruby
or Perl, so you don't have to write them yourself. This project has been live
for more than a week now and has been reliable. The AWS Cost Explorer predicts
that the cost for this month will be a few dollars. I call it a success.
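
As a rough idea of the PagerDuty adaptation mentioned above, a failed check
could be turned into a trigger event for the PagerDuty Events API v2 instead of
(or in addition to) an SNS message. This is only a sketch and not part of the
project: the function names, the ``amilive`` default source and the choice of
``critical`` severity are assumptions, and the routing key would come from a
PagerDuty service integration.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, check_name, detail, source="amilive"):
    """Build an Events API v2 trigger event for a failed check."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing the check name as the dedup key groups repeated failures
        # of the same check into one incident (an assumption of this sketch).
        "dedup_key": check_name,
        "payload": {
            "summary": f"{check_name} check failed: {detail}",
            "source": source,
            "severity": "critical",
        },
    }


def send_event(event, timeout=5.0):
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the event separately from sending it keeps the payload easy to inspect
and test without network access.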