Crash and illegal instruction in Electron, Chromium and friends
This post is a (hopefully) searchable note about a very ugly effect, where Chrome, Chromium, and everything built on top of it (Electron apps included) just crashes in various ways. It describes one possible cause, with a suggestion of what to do (spoiler: panic!).
I was updating my system, which is Debian/unstable, so there was a good chance that some library was broken, or the display driver, or the X server, or maybe libc6 herself, because various parts had just been upgraded.
Funny that I rarely have the time to play games, but today I needed to stay up, so I decided to check Skyrim again (which runs flawlessly under Linux+Steam, mind you; how technology has advanced…), and had to realise that the Steam UI just crashes. Beta too. Then Chromium crashed. Then the Signal desktop client, and I started to dot the t's, cross the i's and connect them with lines… I mean, I started to be suspicious about Chrome/Electron, and indeed, most of the other stuff worked. So, I started looking.
The effect
First, Steam crashed[1]:
kernel: traps: steamwebhelper[16276] trap invalid opcode ip:7f454b9db794 sp:7ffe71c9afa0 error:0 in libcef.so[7f45490ef000+7770000]
kernel: traps: Compositor[31480] trap invalid opcode ip:7fc275ddb794 sp:7fc269df70c0 error:0 in libcef.so[7fc2734ef000+7770000]
kernel: traps: steamwebhelper[31009] trap invalid opcode ip:7f81b37db794 sp:7ffcc9e43520 error:0 in libcef.so[7f81b0eef000+7770000]
kernel: traps: Compositor[3035] trap invalid opcode ip:7f92c03db794 sp:7f92b43f73f0 error:0 in libcef.so[7f92bdaef000+7770000]
kernel: traps: Compositor[3522] trap invalid opcode ip:7f92c03db794 sp:7f92b43f7290 error:0 in libcef.so[7f92bdaef000+7770000]
kernel: traps: Compositor[3680] trap invalid opcode ip:7f92c03db794 sp:7f92b43f7290 error:0 in libcef.so[7f92bdaef000+7770000]
kernel: traps: Compositor[3856] trap invalid opcode ip:7f92c03db794 sp:7f92b43f73f0 error:0 in libcef.so[7f92bdaef000+7770000]
kernel: traps: Compositor[4029] trap invalid opcode ip:7f92c03db794 sp:7f92b43f73f0 error:0 in libcef.so[7f92bdaef000+7770000]
Then Chromium followed soon:
kernel: traps: ThreadPoolForeg[11069] trap invalid opcode ip:55579b3e2cdd sp:7f9f021fb6b0 error:0
kernel: traps: ThreadPoolForeg[11067] trap invalid opcode ip:55579b3e2cdd sp:7f9f029fc6b0 error:0 error:0
kernel: in chromium[555796f53000+a30b000]
kernel: in chromium[555796f53000+a30b000]
kernel: in chromium[555796f53000+a30b000]
kernel:
kernel:
kernel:
Nice!
Fiddle-fiddle
The first thing I tried was disabling the Chromium sandbox in CEF (the Chromium Embedded Framework that Steam's UI is built on), since the net wisdom says the sandbox is very picky about library updates and may react with a crash when an upgraded library makes a system call it doesn't know. It turned out that Steam already does that by itself, since this is an already "worked around" bug (in the clone3() call, if anyone is interested). Anyway, disabling the sandbox didn't help (since it was already partially disabled).
The next step was upgrading libraries, since the crash may have been caused by any incompatible library, but after updating everything Electron depends on, it still crashed. So I updated the display driver, since the previous update may have partially done that, and the closed-source nvidia driver sometimes does funky things when the kernel driver mismatches the X libs. I restarted X, to no avail though. Still crashing. After that I was worried that any library might cause this, since I had only upgraded some packages instead of the whole system and some libs may have become incompatible with some other libs. It took a while to upgrade the system, but the effect was the same.
I was worried that the machine needed a reboot, mainly because these modern mainboards have the ugly habit of not booting from the MBR in various circumstances, and grub mentioned in a deadpan voice that "there was no BIOS partition in sda so the boot wasn't updated", which means that if the BIOS is happy with sdb then the system boots, and if not then the system needs magic. So I really didn't want to reboot right then.
Then went to sleep.
You did what?
So I was in a generally unhappy mood, since I really didn't want to reboot, and a lot of stuff uses Electron. Since the libs were upgraded, I was thinking that either it's a lib I need to downgrade (but it's a bit hard to point to one among the 5000 present), or I have to reboot the machine, which may or may not work. I was also doing my work, and tried to start something (some compiler, actually, but that's not important), which failed in a way that suggested a disk was full and the temp files could not be saved. That happens sometimes (stuff always fills any storage, no matter the size), so I started looking. No, the disk is not full. None of them, in fact. But… what is this?!
# df
Filesystem     1K-blocks    Used Available Use% Mounted on
tmpfs            6662240 6662240         0 100% /dev/shm
You may not know /dev/shm. It is a cute temporary storage in shared memory, kind of like /tmp, which is fast, sized by default at half of the physical memory, and usually quite empty.
Well, not today! It is in memory, so it gets forgotten across reboots, but between them it is fairly stable: if someone puts a file there, then they should remove it when it's not needed anymore. If specific someones create bazillions of gigantic clutter and neglect to remove it, then the storage (along with some of the available memory) is gone. Full. To the brim.
And of course it was… [drum roll] Steam+Electron!
Steam likes to crash. Not badly, but since it juggles various black magic stuff (including lots of shell scripts, a Chromium-based (CEF) UI, a patched-to-death WINE called Proton, various Vulkan shader pre-, post- and nobody-knows-why-compiling, apart from weird updates) it sometimes throws in the towel and faints. However, as it turned out, it leaves its stuff behind. And not small stuff, mind you. Hundreds of megabytes per crash. And it takes only 5-6 of those to fill the poor storage. And it did.
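To see who left what behind, something like the following sketch can list the biggest files under /dev/shm. The function name `largest_entries` is my own invention for illustration, and it reports the apparent file size, which may overstate real memory use for sparse files.

```python
import os

def largest_entries(root="/dev/shm", top=10):
    """Return up to `top` (size_in_bytes, path) pairs under root, biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # st_size is the apparent size; sparse files may use less memory
            sizes.append((st.st_size, path))
    sizes.sort(reverse=True)
    return sizes[:top]

if __name__ == "__main__":
    for size, path in largest_entries():
        print(f"{size / 2**20:10.1f} MiB  {path}")
```

The file names usually hint at the owner, which is how the culprit shows itself.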
And from then on nothing that uses the storage (and is not smart enough to notice that it's full, I'm looking at you, Electron!) was able to operate, and it all crashed in various absolutely meaningless and non-deterministic ways. Call that impossible to debug.
I see the tough life of the maintainers of Electron-based stuff: they get these invalid opcode error reports, and they may have been caused by absolutely anything.
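A check that would have saved me an evening: ask how full /dev/shm is before blaming libraries. This is only a minimal sketch; the helper names `usage_percent` and `is_nearly_full` and the 95% threshold are my own assumptions, not anything Chromium or Steam provides.

```python
import shutil

def usage_percent(path="/dev/shm"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on odd pseudo-filesystems
    return 100.0 * usage.used / usage.total

def is_nearly_full(path="/dev/shm", threshold=95.0):
    """True when usage is at or above `threshold` percent (arbitrary default)."""
    return usage_percent(path) >= threshold

if __name__ == "__main__":
    print(f"/dev/shm is {usage_percent():.0f}% full")
```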
Cleared /dev/shm (everything older than 2 days, minus /dev/shm/.tmpfs). Everything works. Damn.
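The cleanup can be sketched roughly like this. It is an approximation of what I did, not a recommended tool: the names `stale_files` and `remove_stale` are made up, it only looks at regular files directly under the directory, it skips .tmpfs by name, and it defaults to a dry run because removing a file that a running program still expects to find may upset that program.

```python
import os
import time

def stale_files(root="/dev/shm", max_age_days=2, keep=(".tmpfs",), now=None):
    """Return regular files directly under root older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.name in keep:
                continue  # explicitly protected names
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue  # leave directories, sockets, symlinks alone
                if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                    stale.append(entry.path)
            except OSError:
                continue  # entry disappeared while we were looking
    return sorted(stale)

def remove_stale(root="/dev/shm", dry_run=True, **kwargs):
    """Print (dry run) or delete the stale files; return the list of paths."""
    paths = stale_files(root, **kwargs)
    for path in paths:
        if dry_run:
            print("would remove", path)
        else:
            try:
                os.remove(path)
            except OSError as exc:
                print("could not remove", path, exc)
    return paths

if __name__ == "__main__":
    remove_stale()  # dry run: only prints what would be removed
```

Run it as a dry run first, read the list, and only then call `remove_stale(dry_run=False)`.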
Summary
Chrome/Electron crashes with illegal instructions in libcef.so or in ThreadPoolForeg(round) can be caused by the temporary storage /dev/shm being full; in my case it was filled by crashed Chromium-based code (Steam).
1. The visuals of the crash log superimposed on the keyword list on the side are part of the dramatic effect! :-)