Action Item #472
Boot time speedups
Status: In Progress
Start date: 25 Nov 2015
Category: 09 - Testing
Target version: 1.1.0 - Upgrades
Severity: 04 - Low
- Switch to LZO for kernel compression
- Delay Loop Calibration: ''lpj=''; can save > 100ms on ARMv5-based systems (boot, look for "Calibrating delay loop" in the kernel log, check what lpj is set to, then pass that value on the kernel command line)
- Parameters for boot time analysis: ''initcall_debug'', ''printk_time=1''
- Switch rootfs to UBIfs - better for flash devices
- Use initramfs (see slides)
- Integrate as many init scripts as possible into single program (my own init program), then use init=myprog
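The lpj step in the list above can be scripted. This is a minimal sketch, assuming the kernel prints its usual calibration line ending in `(lpj=NNNN)`; the function name `extract_lpj` is made up for illustration.

```sh
#!/bin/sh
# Print the lpj value found in kernel log text on stdin, e.g. from a line like
#   Calibrating delay loop... 697.95 BogoMIPS (lpj=3489792)
# Prints nothing if no such line is present.
extract_lpj() {
    sed -n 's/.*(lpj=\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Typical use on the target (not run here):
#   lpj=$(dmesg | extract_lpj)
#   [ -n "$lpj" ] && echo "add to kernel command line: lpj=$lpj"
```

The value is specific to the board and clock speed, so it should be captured on the same hardware it will be used on.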
I probably won't get to <1s boot time without bootloader access but I can certainly get to <10s, possibly to <5s.
References
RM #472: Add lpj, calculated by firstboot, to kernel boot args to speed boot process.
RM #472: Speed up module identification and loading, shaving about 7 seconds off the boot time. Fix RPi spi module name.
RM #472: Switch to just using mdev (and not nldev) because it's working better now to load needed drivers.
RM #472: Speed up the method used to load modules from modules.conf, and remove extraneous modules that are already handled (re: loaded) by mdev.
RM #472: Switch to LZ4 kernel compression, which is not enabled by default by the defconfig for RPi.
RM #472: Add pbsetentropy utility, to allow bumping available entropy after seeding /dev/urandom.
RM #472: Add RNG setup that not just seeds urandom but bumps the available entropy at boot time based on that seed.
RM #472: Only run depmod in firstboot. Place smb/nmb into the background - nothing depends on them, so it lets other stuff start up in parallel.
#3 Updated by Hammel about 2 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 10
#4 Updated by Hammel about 2 years ago
Looking through dmesg to see boot times I found the following things:
A bunch of eth drivers are compiled in. Only smsc95xx needs to be enabled and even that could be a loadable module.
bcm2708_spi takes 3.5 seconds to load
sda1 (ext4) takes 9 seconds to load
wlan0 takes 8 seconds to load
X (and the fbtft) takes 17 seconds to load
#6 Updated by Hammel about 2 years ago
Getting from power on to mounting the SD card takes about 6 seconds on a Pi 2.
[ 5.844178] EXT4-fs (mmcblk0p2): recovery complete
[ 5.853241] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[ 5.853328] VFS: Mounted root (ext3 filesystem) readonly on device 179:2.
[ 5.863535] devtmpfs: mounted
[ 5.870637] Freeing unused kernel memory: 1024K
[ 5.870975] Run /sbin/init as init process
Nearly all of this is getting USB and SD access setup. Then getting the RNG takes another 5 seconds.
[ 6.156952] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[ 9.468533] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 9.486838] FAT-fs (sdb1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 9.737072] EXT4-fs (mmcblk0p3): mounting ext3 file system using the ext4 subsystem
[ 9.758496] EXT4-fs (mmcblk0p3): recovery complete
[ 9.758531] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Opts: (null)
[ 9.979002] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 10.917545] random: crng init done
This can be fixed with preseeding. The rest of the time goes to loading modules. By the time the Bluetooth init script is run,
which is the last one that shows up in dmesg, we're at close to 30 seconds.
[ 28.846777] Bluetooth: BNEP socket layer initialized
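Long pauses like the ones in these dmesg excerpts can be picked out mechanically. The following is a rough sketch, assuming dmesg lines start with the usual `[ seconds.micros]` timestamp; `dmesg_gaps` is a hypothetical helper name and the threshold is arbitrary.

```sh
#!/bin/sh
# Read dmesg output on stdin and report every line that follows a pause of
# at least $1 seconds (default 1) since the previous timestamped line.
dmesg_gaps() {
    awk -v min="${1:-1}" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets and convert to a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && ts - prev >= min)
                printf "%.3f s gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }'
}

# Typical use on the target (not run here):
#   dmesg | dmesg_gaps 2
```

A gap only says that nothing was logged during that time, not what was running, so it is a pointer for where to add more logging rather than a diagnosis.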
Things I can do
- Call log() in all my custom scripts so I can tell which ones are taking a long time. Or at least call log() from rcS to say which script is being called. That will get more info into dmesg.
- Put network startup in the background. Nothing needs it up front to start, I don't think.
- Add RNG seeding early in the boot sequence.
- I see this with at least some wifi drivers: unknown parameter '11n_disable' ignored. I should take those out from the drivers I've seen that message from (rt2800usb, carl9170).
- There is something happening here that's taking 8 seconds - find out what it is:
[ 11.725594] usbcore: registered new interface driver rt2800usb
[ 19.922011] snd_bcm2835: module is from the staging directory, the quality is unknown, you have been warned.
I think this is module loading from /etc/modules.conf. I might be able to slurp just the module names from the file and feed them as one request to modprobe.
I feel confident I can get this down to 15 seconds, maybe less, to get to the UI. And with the changes to bootsplash it will seem much quicker.
#7 Updated by Hammel about 2 years ago
Module loading was a bottleneck. I fixed the init script to scarf all the module names into a single variable and pass them all at once to modprobe. I also switched from depmod to depmod -A which helped a bunch too. This shaves about 7 seconds off the boot time.
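The batching idea can be sketched roughly as follows. This is not the actual init script: it assumes a modules.conf-style file with one module name per line and `#` comments, and the function names and the `MODPROBE` override are made up so the logic can be dry-run without loading anything.

```sh
#!/bin/sh
# Print all module names from a modules.conf-style file on a single line.
# Comments and blank lines are dropped; only the first word of each line
# is kept, so per-module options (if any) are ignored.
module_names() {
    sed 's/#.*//' "$1" | awk 'NF { printf "%s%s", sep, $1; sep = " " }
                              END { if (sep) print "" }'
}

# Load every listed module with one modprobe call instead of one per module.
# MODPROBE is a hypothetical override used for dry runs.
load_all_modules() {
    names=$(module_names "$1")
    [ -n "$names" ] || return 0
    # Word splitting of $names is intentional: one argument per module.
    ${MODPROBE:-modprobe} -a $names
}

# Typical use on the target (not run here):
#   depmod -A
#   load_all_modules /etc/modules.conf
```

The saving comes from starting modprobe once and letting it read the dependency data once, rather than paying that cost for every module.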
The next biggest bottleneck is mdev. It takes 9 seconds for that to complete. One possible solution is to write a C program that handles all the configuration of usbhandler and friends (from nldev). Still, mdev has to spawn that C program for each device it finds so it's hard to say how much time this would save. I can start by writing usbhandler in C and seeing if that helps.
After a little googling I found that mdev has its own init script that I've been overwriting. If I use it, which has the following:
echo /sbin/mdev >/proc/sys/kernel/hotplug
/sbin/mdev -s
# coldplug modules
find /sys/ -name modalias -print0 | xargs -0 sort -u | tr '\n' '\0' | xargs -0 modprobe -abq
then it seems to boot about 6 seconds faster overall, but it also loads a lot more modules (due to that last line). In fact, it loaded the module for the Pi Zero W wifi even though I'd commented it out of the modules.conf and the node came up using that wifi instead of the usb dongle I had connected (which came up as wlan1). So it seems I should drop my mdev script, and possibly drop loading my modules as this takes care of it, and much faster.
Network bring up is about 5.5 seconds. I can put that into the background and hang dependent services off it, like smb.
smb startup, on its own, takes almost 9 seconds. Combined with the network that's almost 15 seconds that could be done in the background so the UI can be up faster.
#8 Updated by Hammel about 2 years ago
I commented out all the modules in /etc/modules.conf and tried again. I found the following modules were not loaded automatically by the mdev changes:
However, there are a lot of new drivers loaded. One of them is hid_multitouch. I'm not certain yet but I may need to blacklist that in order for tslib to work. That said, I ran the following and it looks okay:
No device specified, trying to scan all of /dev/input/event*
/dev/input/event0: ILITEK ILITEK-TP Mouse
/dev/input/event1: HID 6901:2701
/dev/input/event2: HID 6901:2701 Mouse
/dev/input/event3: HID 6901:2701 Consumer Control
/dev/input/event4: HID 6901:2701 System Control
/dev/input/event5: ILITEK ILITEK-TP
Despite having hid_multitouch loaded, event5 is mapped correctly to the touchscreen and shows up correctly so maybe I don't need to blacklist that driver. More testing is required.
#9 Updated by Hammel about 2 years ago
- % Done changed from 20 to 30
Module loading is cleaned up for both S10mdev (which now handles many of the module loading duties) and S11dev, which does the manual modules.conf module loading. It's a bit faster now.
Changes tested, committed and pushed.
log() can't be called by rcS directly (by adding it to the script) because we still need to source the functions script, which in turn sources package specific functions like bumpSplash. Getting rid of psplash (see RM #689) will help here.
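One possible way around that dependency is a tiny logger that needs nothing sourced. This is only a sketch, not the existing log() function: the name `klog` and the `KMSG` override are hypothetical, and writing to the real /dev/kmsg normally requires root.

```sh
#!/bin/sh
# Write a message to the kernel log so it shows up in dmesg with a timestamp.
# KMSG is a hypothetical override so the function can be tried on a plain file.
klog() {
    echo "$*" > "${KMSG:-/dev/kmsg}"
}

# Sketch of an rcS-style loop using it (not run here):
#   for s in /etc/init.d/S??*; do
#       klog "Running $s"
#       "$s" start
#   done
```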
So now we're down to this:
[ 37.781344] Running /etc/init.d/S97stopsplash
That's the last init process running before the UI gets going. I still think there are things I can do to speed this up, especially if I can offload network startup stuff to run in parallel with UI startup.
#10 Updated by Hammel about 2 years ago
The defconfig for RPi uses gzip for compression and I wasn't even grabbing that image - I was grabbing the uncompressed image.
I've switched to LZ4 compression and am now copying the compressed image to the package tree.
Changes tested, committed and pushed.
The following could be used with a stripped down initramfs to get a bootsplash faster. This would then switch to the animated splash in the init processing.
/etc/init.d/start.sh:
#!/bin/sh
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev
# Mount RFS / do some critical stuff
mount /dev/mmcblk0p1 /media
fbsplash -s /media/splash.ppm -d /dev/fb0
mount -o move /proc /media/proc
mount -o move /sys /media/sys
mount -o move /dev /media/dev
# Switch to production system
exec switch_root /media /linuxrc
Improving RNG setup
Getting RNG setup early is partly done, but there is the usual problem of not just seeding, but increasing the count of available entropy. The former is already done. The latter is not, because that requires an ioctl call for RNDADDTOENTCNT.
I can add a C program to piboxlib/ptools that handles this. It's a simple utility for bumping the available entropy - nothing more - using the ioctl. I can then add that to S20urandom (which I'll have to dup in my buildroot skeleton from the buildroot target). And I can generate a seed file with 4096 bits of entropy and add it to the initial build.
But S20urandom will need to change to read the whole 4096 bytes - or however large the entropy pool actually is. Right now it just reads 512 bytes from /etc/random-seed (which is where the new initialized data should go during the build).
See:
- man urandom
The man page for urandom explains how to do all of this.
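The seeding half could look roughly like this in an S20urandom-style script. It is a sketch under assumptions: it relies on /proc/sys/kernel/random/poolsize reporting the pool size in bits (true for Linux 2.6 and later), the function names are made up, and it only writes seed data into the pool. Crediting entropy still needs the separate ioctl-based utility.

```sh
#!/bin/sh
# Print the entropy pool size in bytes.
# $1: poolsize file (defaults to the /proc entry, which reports bits).
# Falls back to 512 bytes if the file is missing or not a number.
pool_bytes() {
    bits=$(cat "${1:-/proc/sys/kernel/random/poolsize}" 2>/dev/null) || bits=
    case "$bits" in
        ''|*[!0-9]*) echo 512 ;;
        *) echo $((bits / 8)) ;;
    esac
}

# Copy at most one pool's worth of seed data into the RNG device.
# $1: seed file, $2: size in bytes, $3: target (default /dev/urandom)
seed_pool() {
    dd if="$1" of="${3:-/dev/urandom}" bs="$2" count=1 2>/dev/null
}

# Typical use on the target (not run here):
#   seed_pool /etc/random-seed "$(pool_bytes)"
```

Reading the size at boot avoids hard-coding 512 or 4096 and getting bits and bytes mixed up.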
#12 Updated by Hammel about 2 years ago
- % Done changed from 30 to 60
RNG updates completed and tested, committed and pushed. This includes the C program, pbsetentropy, added to libpibox.
The last speedup I want to try is networking: how can I start up the network interfaces, and then all the subsystems that depend on them, in the background while the UI comes up?
#15 Updated by Hammel about 2 years ago
- File bootchart.png added
I ran bootchartd tonight. Here is the image.
What surprised me is that it's not really networking interface startup that's the bottleneck. There are two bottlenecks. The first is the depmod command, which may just be superfluous (it could be run on firstboot if necessary) and can be outright dropped. The second is Samba startup. That could be handled by placing the daemons (smb/nmb) in the background. I can start them first in a subshell and then background them to fully daemonize their startup.
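The subshell-plus-background approach can be sketched as below. The helper name is hypothetical, and the daemon paths in the usage comment are assumptions rather than what the real init script uses.

```sh
#!/bin/sh
# Run a command in a backgrounded subshell with its output discarded, so
# the calling init script returns immediately instead of waiting for it.
start_in_background() {
    ( "$@" ) >/dev/null 2>&1 &
}

# Sketch of use in an init script (not run here):
#   start_in_background /usr/sbin/smbd -D
#   start_in_background /usr/sbin/nmbd -D
```

This only helps when nothing later in the boot sequence waits on the daemons, which is the case described for smb/nmb.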
Together, that may shave 20 seconds on a 40 second startup!
#16 Updated by Hammel about 2 years ago
- File bootchart-2.png added
I was right. Those two things shave about 15 seconds off the boot time.
Next would be parallelizing processes depending on an IP (ntpd, sshd, httpd, etc.). That could save a few more seconds, plus there are up to 9 seconds worth of artificial sleeps in the init scripts. So I could probably drop another 10 seconds! Woohoo!
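A fixed sleep can often be replaced by polling for the condition it was waiting on. The following is a minimal sketch: `wait_until` and `has_ipv4` are hypothetical helpers, and the interface and script names in the usage comment are assumptions.

```sh
#!/bin/sh
# Poll a command once per second until it succeeds or the timeout
# (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical condition: does the interface have an IPv4 address yet?
# has_ipv4() { ip -4 addr show dev "$1" 2>/dev/null | grep -q 'inet '; }
#
# Sketch: start an IP-dependent service in the background once wlan0 is up.
#   ( wait_until 15 has_ipv4 wlan0 && /etc/init.d/S50sshd start ) &
```

Compared with a fixed sleep, this returns as soon as the condition holds and still gives up after a bounded time if it never does.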
#17 Updated by Hammel about 2 years ago
- File bootchart-rpi3.png added
- Priority changed from Immediate to High
- % Done changed from 60 to 70
Those previous tests were on a single-core Raspberry Pi Zero W. I tried the same build on a quad-core Raspberry Pi 3 and got even better results.
I've got it down to just under 12 seconds! At least 1 second of that is getting bootchartd started, and another second is psplash getting started, which I could put in the background (or will replace later). So I could get down to under 10 seconds boot on the RPi 3.
What's left is mostly:
- Wifi setup
- modprobe processing
The former I can do some tricks with (there is an artificial sleep in there that probably could be sync'd better). The latter I'm not sure what else I can do. I have tried generating the modules list on first boot so it doesn't have to get rebuilt on each boot, but that doesn't help modprobe process each module.
I think this is good enough for now. I want to move on to other things and will return to this later to try and squeeze even more out of the boot times.