Recently I have been working with the ELK stack to develop some training. I had provisioned a small VM to run all the required software, and Logstash was happily ingesting a small volume of logs.
I started to fiddle with the Logstash aggregate filter when, all of a sudden, Logstash became completely unresponsive: no logs, no warnings, no errors, yet systemd told me it was happily running.
Two hours later I had completely reinstalled Logstash, only for the problem to persist. In despair, I sat back for a few minutes, and suddenly logs started flowing.
It turns out Logstash was taking roughly five minutes to start. After a bit of research I found this GitHub issue, which directed me to the JRuby wiki page detailing the problem:
When JRuby boots up, the JDK libraries responsible for random number generation go to /dev/random for (at least) initial entropy. After this point, more recent versions of JRuby will use a PRNG for subsequent random numbers, but older versions will continue to return to /dev/random. Unfortunately /dev/random can “run out” of “good” random numbers, providing a guarantee that reads from it will not return until the entropy pool is restored. On some systems – especially virtualized – the entropy pool can be small enough that this slows down JRuby’s startup time or execution time significantly.
Note the words “especially virtualized”. To see your available entropy, run:
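cat /proc/sys/kernel/random/entropy_avail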
Anything less than 1000 is sub-optimal. In my case it came back with 54!
One solution is to install the haveged package, a daemon that keeps the entropy pool topped up. On Debian/Ubuntu, for example:
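sudo apt-get install haveged

(On RHEL/CentOS the package is available from the EPEL repository: sudo yum install haveged.) Once haveged is running, re-checking entropy_avail should show a much healthier number, and Logstash should start in a reasonable time. Two hours wasted; hopefully this post will help a few people.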