[pkg-go] Bug#886893: document how to hook into prometheus

Antoine Beaupré anarcat at debian.org
Tue May 19 21:12:38 BST 2020


On 2020-05-16 21:21:33, Martina Ferrari wrote:
> Hi Antoine!
>
> On 11/01/2018 01:37, Antoine Beaupre wrote:
>> So I have figured out that mtail can send out metrics as Prometheus
>> wants them, under /prometheus. But I did this mostly by having my hand
>> held by the maintainer - there is very little documentation in the
>> Debian package itself on how to configure it.
>
> I am interested in knowing more about this. How did you configure that
> behaviour?

I don't exactly remember, to be honest. It looks so straightforward now
that I can't even recall what I wanted to see documented here.

Which is the entire problem with this bug report: for people like you
and me, it's pretty clear how to hook mtail into Prometheus. But for
someone new to both, it's not clear at all.

Yet I can't quite put my finger on it now.

Anyway, here's what I have.

I use an mtail snippet similar to this (a Jinja template):

https://git.autistici.org/ai3/float/-/blob/master/roles/nginx/templates/nginx.mtail.j2

That file gets dropped into /etc/mtail/nginx.mtail, and mtail then
exposes those metrics over an HTTP endpoint.
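
For illustration only, a minimal mtail program along those lines could
look like this (a made-up sketch, not the actual template linked above;
the metric name, labels and log format are assumptions):

    # minimal sketch of an mtail program; the real nginx.mtail linked
    # above is more elaborate and depends on a custom nginx log_format
    counter nginx_http_requests_total by vhost, code

    # assumes an access log line starting with "<vhost> <status code> ..."
    /^(?P<vhost>\S+) (?P<code>\d{3}) / {
      nginx_http_requests_total[$vhost][$code]++
    }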

We tell mtail to parse the nginx logfiles by adding these lines to
/etc/default/mtail:

    LOGS=/var/log/nginx/foo.log
    ENABLED=1

And restart mtail.
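
To check that it worked, something like this should show the new
metrics (note that the metrics path may differ between mtail versions;
older releases exposed them under /prometheus, as mentioned at the top
of this bug):

    curl -s http://localhost:3903/metrics | grep nginx_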

We export the scrape job from the node to the prometheus server with the
following Puppet code:

  @@prometheus::scrape_job { "${::fqdn}:3903":
    job_name => 'mtail',
    targets  => ["${::fqdn}:3903"],
    labels   => {
      'alias'   => $::hostname,
      'classes' => join(lookup('classes', Data, 'first', []), ',')
    },
  }

This basically tells the Prometheus server to scrape the mtail host on
port 3903. The above turns into
/etc/prometheus/file_sd_config.d/mtail_cache01.torproject.org:3903.yaml:

    # this file is managed by puppet; changes will be overwritten
    ---
    - targets:
      - cache01.torproject.org:3903
      labels:
        alias: cache01
        classes: roles::cache
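
On the Prometheus server side, the Puppet module wires those file_sd
files into the server configuration; done by hand, the equivalent would
be roughly this (a sketch, using the paths shown above):

    scrape_configs:
      - job_name: 'mtail'
        file_sd_configs:
          - files:
            - /etc/prometheus/file_sd_config.d/mtail_*.yaml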

The final part of the chain is to plot those things in a graph. That is
highly site-specific, unfortunately. In our case, we wanted to plot the
hit ratio of the nginx cache, so we made a custom Grafana dashboard
for this:

https://gitlab.com/anarcat/grafana-dashboards/blob/master/cache-health.json

This will look like gobbledygook to anyone (it is JSON). But the point
is to graph a few Prometheus queries, which are basically:

 "sum(rate(nginx_http_request_details_total{upstream_cache_status=~\"$cache_status\",alias=~\"$alias.*\"}[5m]))",
 "sum(nginx_http_request_details_total{alias=~\"$alias\"}) by (upstream_cache_status)",
 "histogram_quantile(0.5, sum(rate(nginx_http_request_time_seconds_bucket{upstream_cache_status=~\"$cache_status\",alias=~\"^$alias.*\"}[5m])) by (vhost,le))",
 "histogram_quantile(0.9, sum(rate(nginx_http_request_time_seconds_bucket{upstream_cache_status=~\"$cache_status\",alias=~\"^$alias.*\"}[5m])) by (vhost,le))",
 "histogram_quantile(0.99, sum(rate(nginx_http_request_time_seconds_bucket{upstream_cache_status=~\"$cache_status\",alias=~\"^$alias.*\"}[5m])) by (vhost,le))",

There are other queries there, but they are from the node exporter, also
installed on the nginx host.
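
For the hit ratio itself, the query boils down to something like this
(a sketch, assuming the metric and label names above and that nginx's
$upstream_cache_status comes through as "HIT" on cache hits):

    sum(rate(nginx_http_request_details_total{upstream_cache_status="HIT"}[5m]))
      /
    sum(rate(nginx_http_request_details_total[5m]))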

>> It would be nice to have a simple README.Debian file that would say
>> how to deploy config files (and where) and how to hook it into
>> Prometheus.
> Definitely. I have not done it in the past, because I could not find a
> way that was generic and simple enough. My usual configuration for mtail
> requires a bunch of rewrite rules, and these are dependent on the mtail
> programs loaded. My configuration for scraping mtail with apache and
> postfix log scrapers looks like this:
>
> scrape_configs:
>   - job_name: 'mtail'
>     static_configs:
>       - targets: ['foo.example.org:3903']
>     metric_relabel_configs:
>       - source_labels: [prog, server_port]
>         regex: 'apache_metrics.mtail;(.*)'
>         target_label: instance
>         replacement: ${1}
>       - source_labels: [prog, server_port]
>         regex: 'apache_metrics.mtail;.*'
>         target_label: job
>         replacement: apache
>       - source_labels: [prog]
>         regex: 'apache_metrics.mtail'
>         target_label: server_port
>         replacement: ''
>       - source_labels: [prog]
>         regex: 'apache_metrics.mtail'
>         target_label: prog
>         replacement: ''
>
>       - source_labels: [prog]
>         regex: 'postfix.mtail'
>         target_label: job
>         replacement: 'postfix'
>       - source_labels: [prog, instance]
>         regex: 'postfix.mtail;(.*):3903'
>         target_label: instance
>         replacement: '$1:25'
>       - source_labels: [prog]
>         regex: 'postfix.mtail'
>         target_label: prog
>         replacement: ''
>       - regex: 'exported_instance'
>         action: labeldrop
>
>
> These use the "prog" label to identify the source of data, and rewrite
> the "instance" label so it looks like it is an exporter running on the
> same port as an apache vhost or postfix.

I never got into relabeling. I don't see the point, to be honest: it
makes metrics site-specific and therefore makes it harder to share
(e.g.) Grafana dashboards. It's also yet another configuration to add
to Prometheus, to manage in Puppet, and so on. It just makes things
generally harder, with no obvious benefit from my
perspective... especially for new users.

One thing I'd like to use relabeling for is to trim down the number of
metrics we're storing, but I haven't quite figured out if that's worth
my time just yet. So far I've thrown hardware at the problem instead,
and that was much cheaper.
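
If I ever get to it, that would presumably be a metric_relabel_configs
rule on the scrape job, something like this (a sketch; the metric name
here is hypothetical):

    metric_relabel_configs:
      # drop a high-cardinality histogram we don't graph (hypothetical name)
      - source_labels: [__name__]
        regex: 'nginx_http_request_time_seconds_bucket'
        action: drop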

Thanks!

a.
-- 
A developed country is not a place where the poor have cars. It's
where the rich use public transportation.
                        - Gustavo Petro 


