That pesky “soft lockup detected” message…

If you’ve read any of my more geekish entries, you’ll know that I bought a surplus Dell PowerEdge 2900 server last year with the intent of setting up a Minecraft server or two for my son. I decided to use VirtualBox VMs because I liked the notion of playing with the software, and it seemed to me to be a bit more secure. There’s nothing all that sensitive on the machine, but security is security, and in this setup access to one server would not mean access to the rest.

Problems, in the form of apparent kernel panics reported as soft lockup messages, began as soon as I started setting up VMs:

soft lockup detected on CPU#x

I tried several Ubuntu versions, MineOS, and straight Debian, but still the issues persisted.  A couple of months ago, I whined to my Facebook friends and Ian Mclaird suggested that it might be an issue with the symmetric multiprocessing (SMP) support.  Thus armed with new words, I began another search and finally discovered that the issue is very specific to SMP in VMs.  It’s also a bit rare, I guess, as there are only two posts that I’ve been able to find addressing the issue.

See

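The short version is that you need to tweak a kernel setting and make sure it’s applied at boot. As root, open /etc/sysctl.conf in nano or your favorite editor and add a line at the end, as shown below. The setting name is its path under /proc/sys, so the file /proc/sys/kernel/watchdog_thresh is written as kernel/watchdog_thresh. Note: for the later kernels at least, you can use either a / or a . to delineate the sub-directories (kernel/watchdog_thresh or kernel.watchdog_thresh).

  • With kernels from somewhere prior to 2013, try: kernel/watchdog_thresh=180
  • For Ubuntu 14 and 15, as well as Debian Wheezy and Jessie, try: kernel/watchdog_thresh=xx (the file /proc/sys/kernel/watchdog_thresh)
  • For Ubuntu 16, try: kernel/hung_task_timeout_secs=xx (the file /proc/sys/kernel/hung_task_timeout_secs)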
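If you’re not sure what to type, the relationship between a setting name and its file under /proc/sys can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not how sysctl itself is implemented: `sysctl_key_to_path` and `sysctl_conf_line` are hypothetical helpers, and stripping a leading /proc/sys/ is just a convenience of the sketch.

```python
import os

PROC_SYS = "/proc/sys"


def _split_key(key: str) -> list[str]:
    """Split a sysctl key into path components.

    Accepts 'kernel.watchdog_thresh', 'kernel/watchdog_thresh', or (as a
    convenience of this sketch) '/proc/sys/kernel/watchdog_thresh'.
    """
    key = key.strip()
    if key.startswith(PROC_SYS + "/"):
        key = key[len(PROC_SYS) + 1:]
    key = key.strip("/")
    slash, dot = key.find("/"), key.find(".")
    # This sketch treats whichever separator appears first as the delimiter.
    if slash != -1 and (dot == -1 or slash < dot):
        return key.split("/")
    return key.split(".")


def sysctl_key_to_path(key: str, root: str = PROC_SYS) -> str:
    """Return the file under /proc/sys that backs a sysctl key."""
    return os.path.join(root, *_split_key(key))


def sysctl_conf_line(key: str, value: int) -> str:
    """Format a 'key = value' line for /etc/sysctl.conf in dotted form."""
    return f"{'.'.join(_split_key(key))} = {value}"


if __name__ == "__main__":
    print(sysctl_key_to_path("kernel/watchdog_thresh"))
    print(sysctl_conf_line("/proc/sys/kernel/watchdog_thresh", 60))
```

Running it prints /proc/sys/kernel/watchdog_thresh and then kernel.watchdog_thresh = 60; the 60 is only an example value.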

Two notes:

First: when I tried this setting on Ubuntu 16, I wound up with an error at boot to the effect of “kernel parameters not loaded.” (Sorry, I’m not going to try to recreate the exact error, and I can’t find it online…) What happened is that the value at /proc/sys/kernel/hung_task_timeout_secs is protected. I think, but have not confirmed, that there is a maximum value and that it is checked when sysctl applies the setting. As I didn’t want to mess with it any more, I reviewed the document at the third link (https://www.kernel.org/doc/Documentation/sysctl/kernel.txt) and set kernel/watchdog=0. This was not my first choice, as it disables both the hard and soft lockup watchdogs. I may get back to this, but for now the system(s) are stable, so I’m moving on!

Edit (6/17): one of my Debian Jessie servers hung, so I turned the watchdog back on. The maximum for /proc/sys/kernel/watchdog_thresh appears to be 60; anything larger tosses an error.
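Since anything above 60 tosses an error, it can help to check a number before it goes into /etc/sysctl.conf, so the failure shows up while you’re editing rather than at boot. Here is a small Python sketch, assuming the 0 to 60 range I observed above and the rule from the kernel.txt document linked earlier that the soft lockup threshold is twice watchdog_thresh, with 0 disabling lockup detection. The function names are hypothetical.

```python
from typing import Optional

# Upper limit observed in the post: larger values tossed an error on Debian Jessie.
WATCHDOG_THRESH_MAX = 60


def validate_watchdog_thresh(value: int) -> int:
    """Check a watchdog_thresh value before it goes into /etc/sysctl.conf."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"watchdog_thresh must be an integer, got {value!r}")
    if not 0 <= value <= WATCHDOG_THRESH_MAX:
        raise ValueError(
            f"watchdog_thresh must be between 0 and {WATCHDOG_THRESH_MAX}, got {value}"
        )
    return value


def soft_lockup_seconds(watchdog_thresh: int) -> Optional[int]:
    """Seconds a CPU can be stuck before a soft lockup is reported.

    Assumes the soft lockup threshold is 2 * watchdog_thresh, and that 0
    disables lockup detection entirely (returned here as None).
    """
    validate_watchdog_thresh(watchdog_thresh)
    if watchdog_thresh == 0:
        return None
    return 2 * watchdog_thresh


if __name__ == "__main__":
    print(soft_lockup_seconds(60))
    try:
        validate_watchdog_thresh(180)  # the pre-2013 value is out of range here
    except ValueError as err:
        print(err)
```

If the 2x rule holds for your kernel, even the maximum of 60 means a vCPU can sit stuck for about two minutes before the soft lockup message appears.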

Second: you can try settings in a non-persistent manner by running (as root; use sudo as necessary):

sysctl -w kernel.hung_task_timeout_secs=xx
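As I understand it, `sysctl -w` boils down to writing the new value into the matching file under /proc/sys, which is why the change only lasts until the next reboot. A minimal Python sketch of that step, assuming a dotted key; `write_sysctl` is a hypothetical helper, and the `root` parameter exists only so it can be tried against a scratch directory instead of the live /proc/sys (writing there needs root).

```python
import os
import tempfile


def write_sysctl(key: str, value, root: str = "/proc/sys") -> str:
    """Write a value to the /proc/sys file for a dotted key (non-persistent).

    Returns the value read back from the file so the caller can confirm it
    took. On a real system a value the kernel rejects is expected to
    surface here as an OSError.
    """
    path = os.path.join(root, *key.split("."))
    with open(path, "w") as f:
        f.write(f"{value}\n")
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    # Demonstrate against a scratch directory rather than the live /proc/sys.
    with tempfile.TemporaryDirectory() as scratch:
        os.makedirs(os.path.join(scratch, "kernel"))
        print(write_sysctl("kernel.hung_task_timeout_secs", 240, root=scratch))
```

The 240 is just an illustrative number, not a recommendation.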

You can also reload the /etc/sysctl.conf file with:

sysctl -p /etc/sysctl.conf
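`sysctl -p` works through the file line by line. Here is a rough Python sketch of just the parsing step, following the format described in the sysctl.conf(5) man page as I understand it: blank lines and lines starting with # or ; are ignored, and settings take the form `token = value`. `parse_sysctl_conf` is a hypothetical helper; it doesn’t apply anything, lines without an = are simply skipped, and the sample values are illustrative only.

```python
def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse sysctl.conf-style text into an ordered {key: value} mapping.

    Later entries for the same key replace earlier ones.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (# or ;).
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line; skipped in this sketch
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    sample = """
# Soft lockup tuning
kernel.watchdog_thresh = 60
; kernel/watchdog=0
kernel.hung_task_timeout_secs = 240
"""
    for key, value in parse_sysctl_conf(sample).items():
        print(f"{key} -> {value}")
```

With the sample above it prints kernel.watchdog_thresh -> 60 and kernel.hung_task_timeout_secs -> 240; the commented-out kernel/watchdog=0 line is ignored.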

Echo4Golf Clear
