Intro
Unlike your home PC, critical security servers basically never restart on their own. But today I got a Zabbix notification that one of ours had done just that. I set out to find the root cause and prevent it from recurring. At some point, things took a weird turn.
The details
I asked my colleague to investigate the server. He dutifully looked through various logs, but nothing in them seemed amiss. Then he showed me a screenshot of the CLI prompt: the system had been up for 497 days and some hours. So there had been no restart.
We speculated that maybe just the ASM module had restarted, but we had no real evidence for that.
System uptime is an SNMP object (defined in a MIB) which we monitor in Zabbix.
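One bit of arithmetic is worth having on hand here. I'm assuming (the post so far doesn't confirm it) that the monitored object is the standard sysUpTime, which is a TimeTicks value: a 32-bit counter of hundredths of a second. The sketch below just works out how long such a counter can run before it rolls over to zero. The function names are made up for illustration, and this may or may not be what happened on our server.

```python
# Assumption: the monitored uptime is a 32-bit TimeTicks counter
# (hundredths of a second), as sysUpTime is defined in SNMP.
TICKS_PER_SECOND = 100
COUNTER_MODULUS = 2 ** 32
SECONDS_PER_DAY = 86_400


def wrap_period_days() -> float:
    """Days a 32-bit hundredths-of-a-second counter runs before wrapping."""
    return COUNTER_MODULUS / TICKS_PER_SECOND / SECONDS_PER_DAY


def reported_ticks(true_uptime_seconds: int) -> int:
    """Value such a counter would report for a given real uptime."""
    return (true_uptime_seconds * TICKS_PER_SECOND) % COUNTER_MODULUS


if __name__ == "__main__":
    print(f"counter wraps after {wrap_period_days():.2f} days")
```

Under that assumption the counter wraps after roughly 497.1 days, and a monitoring system that sees the value suddenly drop toward zero could read that as a restart.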
The graph of this uptime looks like this:

To be continued…