How I Used journalctl to Determine the Source of an Electrical Problem

Image by: freeimageslive.co.uk CC-by-SA 4.0

The journalctl command can extract interesting data from the systemd journals. I never thought it could help with hardware electrical problems – until I tried it.

I have a home computer lab with an almost constant inventory of 12 computers that includes laptops, towers, and a MasterFrame 700 that keeps all its components in view. With all those computers, failures of various types are a fact of life. Of course, that’s what we SysAdmins live for – right?

One of my tower systems needed a new 200mm top fan as the old one was making end-of-life rattles. I powered it off and pulled it out of its location on the shelf. I usually make the easy repairs like fan replacement with the computer in place and connected because it’s less work than disconnecting everything in order to move it to a workbench.

I replaced the defective fan and restarted the system to do some initial testing, but it powered off a few seconds before the startup process finished.

Problem Determination

Because a couple of the external cables were a bit tight, I removed all except the power cord and then connected the host to one of the testing harnesses connected to my KVM switch. I tried powering on again and – quite surprisingly – it completed startup and just sat there running as if nothing were wrong. So the problem was in one of the external connections.

I needed more information. Knowing how much is available from the systemd journalling system, I decided to check the journal. The failure was during startup, so the old dmesg command would have given me information about the most recent boot but not previous ones. That information is buried in the logs and the systemd journal. Just working my way through the logs or journal could take a significant chunk of time out of my day.

However, the journalctl command gives us an elegant method for locating this information. The command in Figure 1 uses some options that cut down the need to search through thousands of lines of log and journal entries. The -b -1 option tells journalctl to display data from the previous boot and not the current one. The --dmesg option tells the command to display only the kernel messages usually displayed by the dmesg command. Finally, -o short-monotonic tells journalctl to show the timestamps as seconds from initiation of the startup process.

I did still need to search a bit but the error was in red and easy to find.

root@host:~# journalctl -b -1 --dmesg -o short-monotonic

[  119.859917] host.both.org kernel: usb usb2-port2: over-current condition
[  120.067916] host.both.org kernel: usb usb2-port6: over-current condition
[  120.466905] host.both.org kernel: usb usb1-port6: over-current condition
[  120.593902] host.both.org kernel: usb usb1-port14: over-current condition
[  120.723897] host.both.org kernel: usb usb1-port2: over-current condition

Figure 1: Results from the journalctl command pointed me in the right direction.

This data shows that at about 120 seconds into the Linux startup sequence, over-current conditions were detected on several USB ports. This is typical of an electrical short circuit. The list above identifies the affected ports by USB bus and port number. I could have traced these using a command like lsusb, but it was easiest to start with a check of the external USB cables.
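When the kernel messages from a boot run to thousands of lines, it can help to reduce them to just the ports that reported a problem. The following is a minimal sketch, not something I ran during this repair. The function name overcurrent_ports is hypothetical, and it assumes the kernel message format shown in Figure 1, where the port name (such as usb2-port2) appears as its own word followed by a colon.

```shell
#!/bin/sh
# Hypothetical helper, not part of journalctl or systemd.
# Reads journal lines on stdin and prints each USB port that reported
# an over-current condition, followed by the number of reports.
# Assumes the message format from Figure 1:
#   ... kernel: usb usb2-port2: over-current condition
overcurrent_ports() {
    awk '/over-current condition/ {
        for (i = 1; i <= NF; i++) {
            # Port names look like "usb2-port2:" in the kernel message.
            if ($i ~ /^usb[0-9]+-port[0-9]+:$/) {
                port = $i
                sub(/:$/, "", port)   # drop the trailing colon
                count[port]++
            }
        }
    }
    END { for (p in count) print p, count[p] }' | sort
}

# Possible usage, with the same command as Figure 1:
#   journalctl -b -1 --dmesg -o short-monotonic | overcurrent_ports
```

Fed the five lines from Figure 1, this would list each of the five ports with a count of 1. A port that shows up many times, or several ports on the same bus failing together, would be a reasonable place to start checking cables.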

I found that the host had an external USB cable connecting to a small 4-port hub where the keyboard and mouse were plugged in. The connector from the host at the external hub had been pulled from its normal straight configuration into a broken 90-degree configuration, and the bare wires were showing. I disconnected that cable and powered up successfully. I was also removing the devices from the hub and connecting them directly to my 16-port KVM switch, so that hub was no longer needed in any event. Had I not been doing that, a new cable would have solved the problem.

I cut the ends off the defective cable and tossed it into my recycling box.

Conclusion

The amount and type of information that can be provided by systemd via the journal still surprises me. It could have taken me a significant amount of time to locate the source of this problem if the data from the journal had not pointed me in the right direction. This incident has reminded me of the vast amounts of information collected by the systemd journal and that I need to remember to utilize it.

Another thing to note from this is that just because there is a power problem, the power supply is not always at fault. That’s where I would have started looking in the past.

I plan to use longer cables instead of ones at the limits of their length. I most certainly caused the cable damage when I pulled the computer out of its normal location instead of disconnecting it first and connecting it to the KVM cable.