Two simple probes to check on your Raspberry Pi's wellbeing!

After 3 years of service, my Raspberry Pi’s filesystem finally got corrupted. I expected it to crash earlier, but it lasted for quite a while!

Even though I had backups, I had to reinstall it from scratch. I was using Munin to monitor my Raspberry Pi, and I think it’s a good solution for this kind of device because it’s lightweight and performs very little I/O.

Anyway, I decided to upgrade my monitoring stack to Telegraf - InfluxDB - Grafana (TIG), which I already use on the rest of my infrastructure. I used a USB key (🤷🏻‍♂️) as the storage for InfluxDB. We’ll see how it runs in the long term!

With Munin, I used the munin-rpi-temp plugin to monitor my RPi’s CPU temperature and frequency. I wanted to have that on the new stack too, so let’s see how I proceeded.

The probes

The 2 scripts are very simple.

pi@raspberrypi ~> cat /usr/local/bin/rpi-temp
#!/bin/sh
awk '{print $1/1000}' /sys/class/thermal/thermal_zone0/temp

pi@raspberrypi ~> cat /usr/local/bin/rpi-freq
#!/bin/sh
echo "$(( $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)*1000 ))"

Make sure the 2 scripts are executable.

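In rpi-temp, we divide the output by 1000 to get a result in °C, since the kernel reports the temperature in millidegrees Celsius. We use awk to get a float.

In rpi-freq, we multiply the output by 1000 to get Hertz, since scaling_cur_freq is reported in kHz. It’s an integer, so there is no need for awk.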
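To make the two conversions concrete, here is a minimal sketch that applies them to sample values instead of reading from /sys. The function names and the sample readings are made up for illustration; they are not part of the scripts above.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the two probes, fed with sample values
# so they can be tried on any machine (not only on a Raspberry Pi).

# Millidegrees Celsius -> degrees Celsius.
# awk does floating-point division; the shell's $(( )) is integer-only.
millideg_to_celsius() {
    echo "$1" | awk '{print $1/1000}'
}

# Kilohertz -> Hertz. Integer arithmetic is enough here.
khz_to_hz() {
    echo "$(( $1 * 1000 ))"
}

millideg_to_celsius 48312   # prints 48.312
khz_to_hz 1500000           # prints 1500000000
```

Using $(( 48312 / 1000 )) for the temperature would give 48 and silently drop the decimals, which is why awk is used there.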

The Telegraf inputs

Next, we need Telegraf to execute the scripts and get the output.

[[inputs.exec]]
  commands = ["/usr/local/bin/rpi-temp"]
  name_override = "rpi_temp"
  data_format = "value"
  data_type = "float"

[[inputs.exec]]
  commands = ["/usr/local/bin/rpi-freq"]
  name_override = "rpi_freq"
  data_format = "value"
  data_type = "integer"

Since the output of each script is a single number, we can use "value" as the data_format, with data_type set to match (float or integer). See the exec plugin documentation for more information.
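Before wiring a script into Telegraf, it can help to check that its output really is just one number. The sketch below is one possible sanity check, not something Telegraf requires; is_single_number is a hypothetical helper, and the sample strings stand in for real script output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the argument is exactly one line
# containing a plain integer or decimal number.
is_single_number() {
    printf '%s\n' "$1" | awk '
        NR > 1                     { bad = 1 }   # more than one line
        !/^-?[0-9]+(\.[0-9]+)?$/   { bad = 1 }   # not a plain number
        END                        { exit (bad || NR == 0) }'
}

# Sample outputs: two valid readings and one error message.
for sample in "48.312" "1500000000" "cat: No such file or directory"; do
    if is_single_number "$sample"; then
        echo "ok: $sample"
    else
        echo "rejected: $sample"
    fi
done
```

On the Pi itself, the same check could be run as is_single_number "$(/usr/local/bin/rpi-temp)". To see what Telegraf actually collects, running telegraf --config /etc/telegraf/telegraf.conf --input-filter exec --test should print the gathered metrics once without sending them anywhere (the config path is an assumption and may differ on your install).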

Grafana visualisation

I made 2 simple panels, one for the CPU frequency and one for the temperature.

Here is the JSON for the frequency panel and for the temperature panel.