Action Item #472

open

Boot time speedups

Added by Hammel about 9 years ago. Updated over 1 year ago.

Status: In Progress
Priority: Immediate
Assignee:
Category: 09 - Testing
Target version:
Start date: 25 Nov 2015
Due date:
% Done: 70%
Estimated time:
Severity: 05 - Very Low

Description

Some things I can review
  1. Switch to LZO for kernel compression
  2. Delay Loop Calibration: ''lpj=''; can save > 100ms on ARMv5 based systems (boot once, find the "Calibrating delay loop" line in the kernel log, note the lpj value it reports, then pass that value on the kernel command line)
  3. Parameters for boot time analysis: ''initcall_debug'', ''printk.time=1''
  4. Switch rootfs to UBIfs - better for flash devices
  5. Use initramfs (see slides)
  6. Integrate as many init scripts as possible into single program (my own init program), then use init=myprog
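For item 3, here is a minimal sketch of how the initcall_debug output could be ranked after a boot with those parameters. It assumes the usual "initcall <function> returned <rc> after <N> usecs" lines in the kernel log. The helper name slowest_initcalls is made up for illustration and is not part of PiBox.

```sh
#!/bin/sh
# Hypothetical helper: list the slowest initcalls found in kernel log text.
# Reads log text on stdin and prints "<usecs> <function>" lines, slowest first.
# Assumes initcall_debug lines of the form:
#   initcall <function> returned <rc> after <N> usecs
slowest_initcalls() {
    count=${1:-10}
    sed -n 's/.*initcall \([^ ]*\) returned .* after \([0-9][0-9]*\) usecs.*/\2 \1/p' |
        sort -rn |
        head -n "$count"
}

# Example (on the target, after booting with initcall_debug):
#   dmesg | slowest_initcalls 5
```

Anything near the top of that list is a candidate for becoming a module or being dropped from the kernel config.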

I probably won't get to <1s boot time without bootloader access but I can certainly get to <10s, possibly to <5s.

References

Files

bootchart.png (154 KB) Bootchart of boot process on Pi Zero W. Hammel, 27 Mar 2019 21:45
bootchart-2.png (125 KB) Boot time after minor adjustments. Hammel, 27 Mar 2019 22:10
bootchart-rpi3.png (131 KB) Boot times for RPi 3 with onboard Wifi enabled. Hammel, 30 Mar 2019 11:51

Related issues

Related to PiBox - Action Item #231: kernel config cleanup (Closed, Hammel, 14 Oct 2013)

Actions #1

Updated by Hammel almost 6 years ago

  • Priority changed from Normal to Immediate
Actions #2

Updated by Hammel almost 6 years ago

  • Target version changed from 1.0 - Atreides to 1.1.0 - Upgrades
Actions #3

Updated by Hammel almost 6 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10

Other options:

  1. Slides with general boot optimizations

This is mostly the same as the link in the issue description. Some of these would be handled with RM #447 by using squashfs and overlays that are managed by an initramfs init script for booting.

Actions #4

Updated by Hammel almost 6 years ago

From RM#231:

Looking through dmesg to see boot times I found the following things:

  • A bunch of eth drivers are compiled in. Only smsc95xx needs to be enabled and even that could be a loadable module.
  • bcm2708_spi takes 3.5 seconds to load
  • sda1 (ext4) takes 9 seconds to load
  • wlan0 takes 8 seconds to load
  • X (and the fbtft) takes 17 seconds to load
Actions #5

Updated by Hammel almost 6 years ago

  • % Done changed from 10 to 20

Added lpj to kernel boot args via a firstboot test and updates cmdline.txt.
Changes pushed.
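A rough sketch of what such a firstboot step could look like, not the actual PiBox script. It assumes the kernel log contains the usual "(lpj=NNNN)" text on the delay loop calibration line and that cmdline.txt is a single line. The function names and the /boot/cmdline.txt path in the example are assumptions.

```sh
#!/bin/sh
# Hypothetical helpers for a firstboot step that pins lpj on the command line.

# Print the first lpj value found in kernel log text read from stdin.
extract_lpj() {
    sed -n 's/.*(lpj=\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Append "lpj=<value>" to a one-line cmdline file unless lpj= is already set.
add_lpj() {
    cmdline_file=$1
    lpj=$2
    [ -n "$lpj" ] || return 1
    if grep -q 'lpj=' "$cmdline_file"; then
        return 0
    fi
    line=$(head -n 1 "$cmdline_file")
    printf '%s lpj=%s\n' "$line" "$lpj" > "$cmdline_file"
}

# Example (path is an assumption):
#   add_lpj /boot/cmdline.txt "$(dmesg | extract_lpj)"
```

The value is specific to the board and clock speed, so it has to be measured on each hardware variant rather than baked into one image for all of them.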

Actions #6

Updated by Hammel almost 6 years ago

Getting from power on to mounting the SD card takes about 6 seconds on a Pi 2.

[    5.844178] EXT4-fs (mmcblk0p2): recovery complete
[    5.853241] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[    5.853328] VFS: Mounted root (ext3 filesystem) readonly on device 179:2.
[    5.863535] devtmpfs: mounted
[    5.870637] Freeing unused kernel memory: 1024K
[    5.870975] Run /sbin/init as init process

Nearly all of this time goes to setting up USB and SD access. Initializing the RNG then takes another 5 seconds.

[    6.156952] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[    9.468533] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[    9.486838] FAT-fs (sdb1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[    9.737072] EXT4-fs (mmcblk0p3): mounting ext3 file system using the ext4 subsystem
[    9.758496] EXT4-fs (mmcblk0p3): recovery complete
[    9.758531] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Opts: (null)
[    9.979002] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   10.917545] random: crng init done

This can be fixed with preseeding. The rest of the time goes to loading modules. By the time the Bluetooth init script is run, which is the last one that shows up in dmesg, we're at close to 30 seconds.

[   28.846777] Bluetooth: BNEP socket layer initialized

Things I can do

  • Call log() in all my custom scripts so I can tell which ones are taking a long time. Or at least call log() from rcS to say which script is being called. That will get more info into dmesg.
  • Put network startup in the background. Nothing needs it up front to start, I don't think.
  • Add RNG seeding early in the boot sequence.
  • I see this with at least some wifi drivers: unknown parameter '11n_disable' ignored. I should take those out from the drivers I've seen that message from (rt2800usb, carl9170).
  • There is something happening here that's taking 8 seconds - find out what it is:
[   11.725594] usbcore: registered new interface driver rt2800usb
[   19.922011] snd_bcm2835: module is from the staging directory, the quality is unknown, you have been warned.

I think this is module loading from /etc/modules.conf. I might be able to slurp just the module names from the file and feed them as one request to modprobe.

I feel confident I can get this down to 15 seconds, maybe less, to get to the UI. And with the changes to bootsplash it will seem much quicker.

Actions #7

Updated by Hammel almost 6 years ago

Module loading was a bottleneck. I fixed the init script to scarf all the module names into a single variable and pass them all at once to modprobe. I also switched from depmod to depmod -A which helped a bunch too. This shaves about 7 seconds off the boot time.
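A minimal sketch of that batching idea, not the real init script. It assumes /etc/modules.conf lists one module per line with optional # comments, and it ignores any module options after the name. The function names are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helpers: load every module named in a list file with one
# modprobe invocation instead of one invocation per module.

# Print the module names from a list file: first word of each line,
# skipping blank lines and # comments.
module_names() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Pass all names to a single "modprobe -a" call.
load_modules() {
    mods=$(module_names "$1" | tr '\n' ' ')
    [ -n "$mods" ] || return 0
    # Word splitting is intentional here: one argument per module name.
    # shellcheck disable=SC2086
    modprobe -a $mods
}

# Example:
#   load_modules /etc/modules.conf
```

The saving comes from modprobe starting and parsing its dependency data once for the whole list rather than once per module.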

The next biggest bottleneck is mdev. It takes 9 seconds for that to complete. One possible solution is to write a C program that handles all the configuration of usbhandler and friends (from nldev). Still, mdev has to spawn that C program for each device it finds so it's hard to say how much time this would save. I can start by writing usbhandler in C and seeing if that helps.

After a little googling I found that mdev has its own init script that I've been overwriting. If I use it, which has the following:

echo /sbin/mdev >/proc/sys/kernel/hotplug
/sbin/mdev -s # coldplug modules
find /sys/ -name modalias -print0 | xargs -0 sort -u | tr '\n' '\0' | xargs -0 modprobe -abq

then it seems to boot about 6 seconds faster overall, but it also loads a lot more modules (due to that last line). In fact, it loaded the module for the Pi Zero W wifi even though I'd commented it out of the modules.conf and the node came up using that wifi instead of the usb dongle I had connected (which came up as wlan1). So it seems I should drop my mdev script, and possibly drop loading my modules as this takes care of it, and much faster.

Network bring up is about 5.5 seconds. I can put that into the background and hang dependent services off it, like smb.

smb startup, on its own, takes almost 9 seconds. Combined with the network that's almost 15 seconds that could be done in the background so the UI can be up faster.
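One way this could be sketched: run the network script and everything that depends on it as a single ordered chain inside a background subshell, so rcS can carry on to the UI. The function name and the script names in the example are assumptions, not the actual PiBox init scripts.

```sh
#!/bin/sh
# Hypothetical helper: run commands in order inside one background subshell.
# The chain stops at the first command that fails, so dependent services
# are not started if the network never comes up.
run_chain_bg() {
    (
        for cmd in "$@"; do
            sh -c "$cmd" || exit 1
        done
    ) &
}

# Example (script names are assumptions):
#   run_chain_bg "/etc/init.d/S40network start" "/etc/init.d/S91smb start"
```

Ordering inside the chain is preserved, while the caller returns immediately.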

Actions #8

Updated by Hammel almost 6 years ago

I commented out all the modules in /etc/modules.conf and tried again. I found the following modules were not loaded automatically by the mdev changes:

snd-bcm2835
spi_bcm2835
uvcvideo

However, there are a lot of new drivers loaded. One of them is hid_multitouch. I'm not certain yet but I may need to blacklist that in order for tslib to work. That said, I ran the following and it looks okay:

# evtest
No device specified, trying to scan all of /dev/input/event*
Available devices:
/dev/input/event0: ILITEK ILITEK-TP Mouse
/dev/input/event1: HID 6901:2701
/dev/input/event2: HID 6901:2701 Mouse
/dev/input/event3: HID 6901:2701 Consumer Control
/dev/input/event4: HID 6901:2701 System Control
/dev/input/event5: ILITEK ILITEK-TP

Despite having hid_multitouch loaded, event5 is mapped correctly to the touchscreen and shows up correctly so maybe I don't need to blacklist that driver. More testing is required.

Actions #9

Updated by Hammel over 5 years ago

  • % Done changed from 20 to 30

Module loading is cleaned up in both S10mdev (which now handles many of the module loading duties) and S11dev, which loads modules manually from modules.conf. It's a bit faster now.
Changes tested, committed and pushed.

log() can't be called by rcS directly (by adding it to the script) because we still need to source the functions script, which in turn sources package specific functions like bumpSplash. Getting rid of psplash (see RM #689) will help here.

So now we're down to this:

[   37.781344] Running /etc/init.d/S97stopsplash

That's the last init process running before the UI gets going. I still think there are things I can do to speed this up, especially if I can offload network startup stuff to run in parallel with UI startup.
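For reference, a stripped-down sketch of how a per-script marker like the one above can be produced. This is not the PiBox functions script: it assumes log() only needs to write a line to /dev/kmsg (which is what gives it a kernel timestamp in dmesg), and the names log, run_init_scripts and KMSG are illustrative.

```sh
#!/bin/sh
# Hypothetical rcS-style loop that records each init script in the kernel log.
# KMSG can be pointed at a regular file for testing.
KMSG=${KMSG:-/dev/kmsg}

# Write one message line to the kernel log device.
log() {
    printf '%s\n' "$*" >> "$KMSG"
}

# Run every executable S* script in a directory, logging each one first.
run_init_scripts() {
    dir=$1
    for script in "$dir"/S??*; do
        [ -x "$script" ] || continue
        log "Running $script"
        "$script" start
    done
}

# Example:
#   run_init_scripts /etc/init.d
```

The gap between two consecutive "Running" timestamps in dmesg is then the cost of the earlier script.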

Actions #10

Updated by Hammel over 5 years ago

Compressing kernel.img

The defconfig for RPi uses gzip for compression and I wasn't even grabbing that image - I was grabbing the uncompressed image.
I've switched to LZ4 compression and am now copying the compressed image to the package tree.

Changes tested, committed and pushed.

Using initramfs

The following could be used with a stripped down initramfs to get a bootsplash faster. This would then switch to the animated splash in the init processing.

/etc/init.d/start.sh:
#!/bin/sh
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev
# Mount RFS / do some critical stuff
mount /dev/mmcblk0p1 /media
fbsplash -s /media/splash.ppm -d /dev/fb0
mount -o move /proc /media/proc
mount -o move /sys /media/sys
mount -o move /dev /media/dev
# Switch to production system
exec switch_root /media /linuxrc

Improving RNG setup

Getting RNG setup early is partly done, but there is the usual problem of not just seeding, but increasing the count of available entropy. The former is already done. The latter is not, because that requires an ioctl call for RNDADDTOENTCNT.

I can add a C program to piboxlib/ptools that handles this. It's a simple utility for bumping the available entropy - nothing more - using the ioctl. I can then add that to S20urandom (which I'll have to dup in my buildroot skeleton from the buildroot target). And I can generate a seed file with 4096 bits of entropy and add it to the initial build.

But S20urandom will need to change to read the whole 4096 bytes - or however large the entropy pool actually is. Right now it just reads 512 bytes from /etc/random-seed (which is where the new initialized data should go during the build).

See:
  • /proc/sys/kernel/random/entropy_avail
  • /proc/sys/kernel/random/poolsize
  • man urandom

The man page for urandom explains how to do all of this.
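A minimal sketch of the seed-loading half, assuming the init script sizes its read from the kernel's poolsize file. That file reports the pool size in bits, so 4096 bits corresponds to 512 bytes. The function names and the 4096-bit fallback are assumptions. Writing to /dev/urandom only mixes the data in; it does not raise the entropy count, which is why the ioctl helper is still needed.

```sh
#!/bin/sh
# Hypothetical helpers for an S20urandom-style script.

# Print the entropy pool size in bytes. The kernel file reports bits;
# fall back to 4096 bits if the file is missing or not numeric.
pool_bytes() {
    pool_file=${1:-/proc/sys/kernel/random/poolsize}
    bits=$(cat "$pool_file" 2>/dev/null) || bits=
    case $bits in
        ''|*[!0-9]*) bits=4096 ;;
    esac
    echo $((bits / 8))
}

# Copy one pool-sized block from the seed file into the RNG device.
load_seed() {
    seed_file=$1
    dest=${2:-/dev/urandom}
    size=$(pool_bytes "$3")
    [ -f "$seed_file" ] || return 1
    dd if="$seed_file" of="$dest" bs="$size" count=1 2>/dev/null
}

# Example:
#   load_seed /etc/random-seed
```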

Actions #11

Updated by Hammel over 5 years ago

urandom updates are ready for testing. I've added a custom S20urandom and generate the seed file in Buildroot's postbuild.sh script. Both were verified with shellcheck.

I still need to write the C program to do the ioctl and place it in libpibox.

Actions #12

Updated by Hammel over 5 years ago

  • % Done changed from 30 to 60

RNG updates completed and tested, committed and pushed. This includes the C program, pbsetentropy, added to libpibox.

The last speedup I want to try is networking: how can I start up the network interfaces, and then all the subsystems that depend on them, in the background while the UI comes up?

Actions #13

Updated by Hammel over 5 years ago

  • Description updated (diff)
  • % Done changed from 60 to 30
Actions #14

Updated by Hammel over 5 years ago

  • % Done changed from 30 to 60

Along with networking I also want to run bootchartd to get an idea of exactly where the slow processing is.

Actions #15

Updated by Hammel over 5 years ago

I ran bootchartd tonight. Here is the image.
[Image: bootchart.png. Bootchart of boot process on Pi Zero W.]
What surprised me is that it's not really networking interface startup that's the bottleneck. There are two bottlenecks. The first is the depmod command, which may just be superfluous (it could be run on firstboot if necessary) and can be outright dropped. The second is Samba startup. That could be handled by placing the daemons (smb/nmb) in the background. I can start them first in a subshell and then background them to fully daemonize their startup.

Together, that may shave 20 seconds on a 40 second startup!

Actions #16

Updated by Hammel over 5 years ago

I was right. Those two things shave about 15 seconds off the boot time.
[Image: bootchart-2.png. Boot time after minor adjustments.]
Next would be parallelizing processes depending on an IP (ntpd, sshd, httpd, etc.). That could save a few more seconds, plus there are up to 9 seconds worth of artificial sleeps in the init scripts. So I could probably drop another 10 seconds! Woohoo!

Actions #17

Updated by Hammel over 5 years ago

Those previous tests were on a single-core Raspberry Pi Zero W. I tried the same build on a quad-core Raspberry Pi 3 and got even better results.
[Image: bootchart-rpi3.png. Boot times for RPi 3 with onboard Wifi enabled.]
I've got it down to just under 12 seconds! At least 1 second of that is getting bootchartd started, and another second is psplash getting started, which I could put in the background (or will replace later). So I could get down to under 10 seconds boot on the RPi 3.

Again, the only things left to improve that will have a big impact are
  1. Wifi setup
  2. modprobe processing

The former I can do some tricks with (there is an artificial sleep in there that probably could be sync'd better). The latter I'm not sure what else I can do. I have tried generating the modules list on first boot so it doesn't have to get rebuilt on each boot, but that doesn't help modprobe process each module.
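One possible trick for the sleep, sketched under the assumption that the script is really waiting for some condition (an interface or an address appearing): poll for the condition with a timeout instead of sleeping a fixed time. The helper name and the example condition are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (interface name is an assumption):
#   wait_for 10 test -e /sys/class/net/wlan0
```

When the condition is already true this returns immediately, so the fixed delay only remains as a worst case.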

I think this is good enough for now. I want to move on to other things and will return to this later to try and squeeze even more out of the boot times.

Actions #18

Updated by Hammel over 4 years ago

  • Priority changed from High to Immediate
  • Severity changed from 03 - Medium to 04 - Low
Actions #19

Updated by Hammel over 2 years ago

  • Severity changed from 04 - Low to 05 - Very Low
Actions #20

Updated by Hammel over 1 year ago

  • Target version changed from 1.1.0 - Upgrades to 3.0 - Corrino

This might benefit from writing some C apps that handle initial setup instead of trying to create lots of nodes manually via sh, eg.
