Ubuntu: excessive system clock drift? (2+ minutes per hour)



Question:

I am running a reasonably new install of 17.10 on a new system (fully patched, and not virtualized), and noticed that the boot time listed in /proc/stat's btime entry kept changing. This broke some scripts that used this information to compute the wall-clock time at which certain processes had been started.

With some debugging, I found that btime is calculated as now() - uptime, and the btime drift was due to the fact that the system clock was incrementing at a different rate than the uptime clock was!
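The relationship described above can be checked on a running system. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name derive_btime is made up for illustration. It reconstructs the boot time as now() - uptime and compares it with the btime entry. On an affected machine the difference would be expected to change between runs.

```shell
#!/bin/sh
# derive_btime NOW UPTIME: hypothetical helper that reconstructs the boot
# time as now - uptime, truncated to whole seconds.
derive_btime() {
    awk -v now="$1" -v up="$2" 'BEGIN { printf "%d\n", now - up }'
}

# Only touch /proc when it is actually available.
if [ -r /proc/stat ] && [ -r /proc/uptime ]; then
    # Boot time as reported by the kernel (seconds since the epoch).
    btime=$(awk '$1 == "btime" { print $2 }' /proc/stat)
    # First field of /proc/uptime is seconds since boot.
    uptime=$(cut -d' ' -f1 /proc/uptime)
    derived=$(derive_btime "$(date +%s)" "$uptime")
    echo "btime from /proc/stat: $btime"
    echo "now - uptime:          $derived"
    echo "difference (seconds):  $((derived - btime))"
fi
```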

I assumed that this was due to some sort of clock slew applied to the system clock by systemd-timesyncd.service (i.e. the ntpd replacement), so I disabled timesyncd and rebooted, as a test. Sure enough, now the uptime counter and the system clock step at the same rate. (I also installed adjtimex to check the kernel parameters to verify that no clock slew remains: there is no frequency bias applied and the tick value is 10000, as it should be.)

Without timesyncd on, however, it is clear that the system clock is very much out of whack. The clock lost about 5 minutes over the course of 135 minutes (~ -37000 ppm), which is similar to what I got using adjtimex -l -w over the course of about 20 minutes to manually estimate the system clock drift (it gives ~ -40000 ppm). (And, indeed, just to check, using a stopwatch, I found that /proc/uptime is also incrementing at the wrong rate; ~ -41000 ppm. So that's consistent.)
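For reference, the ppm figures above are just the accumulated error divided by the elapsed time, scaled by one million. A small sketch of that arithmetic (drift_ppm is a hypothetical helper name, not part of adjtimex):

```shell
#!/bin/sh
# drift_ppm ERROR_SECONDS ELAPSED_SECONDS: hypothetical helper that prints
# the drift in parts per million, rounded to the nearest integer.
# A negative error means the clock lost time.
drift_ppm() {
    awk -v err="$1" -v elapsed="$2" 'BEGIN { printf "%.0f\n", err / elapsed * 1e6 }'
}

drift_ppm -300 8100   # system clock lost 5 min in 135 min: prints -37037
drift_ppm 30 8100     # CMOS clock gained 30 s in 135 min:  prints 3704
```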

The CMOS clock is a bit off too (it gained 30 seconds over the 135 minutes), but my understanding is that this should not affect the system clock except at boot time. I can find no /etc/adjtime file by which the system clock rate would be changed at boot, and in any case, as noted above, adjtimex reports that there has been no clock-tick fudging. So I can't imagine how the CMOS clock could be causing the issue I see with the system clock.

Nevertheless, I will change the CMOS battery, as some reports have suggested that this can miraculously fix system clock problems. (Despite there being no obvious mechanism by which this could happen.)

But is there any other explanation for why the system clock could be so very wrong? And are there any solutions for the fact that the system timers are off by such a huge amount? Clearly just running timesyncd doesn't fix the problem, because the excessive clock slew that it produces is problematic (as above).

I could use adjtimex to change the kernel parameters directly (which should at least keep the uptime and system clock counters in sync), but that is really meant to address clock errors in the range of ±500 ppm. What I'm seeing is roughly 80 times larger than that, and I wonder if it indicates some more significant issue.
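To give a sense of scale, here is a rough sketch of how a correction this large would split into adjtimex's two knobs. It assumes USER_HZ=100, so the nominal tick is 10000 and one tick step is worth 100 ppm, and it assumes the frequency scaling described in the adjtimex man page (65536 units per ppm). The helper name split_correction is hypothetical. The script only does the arithmetic and does not call adjtimex.

```shell
#!/bin/sh
# split_correction PPM: hypothetical helper that prints "TICK FREQ" for a
# desired speed-up of PPM parts per million.
# Assumes USER_HZ=100: nominal tick 10000, one tick step = 100 ppm,
# and frequency units of 65536 per ppm.
split_correction() {
    awk -v ppm="$1" 'BEGIN {
        # Whole tick steps, rounded to the nearest step.
        steps = (ppm >= 0) ? int(ppm / 100 + 0.5) : -int(-ppm / 100 + 0.5)
        # Residual ppm that the tick cannot express, in frequency units.
        freq = (ppm - steps * 100) * 65536
        printf "%d %d\n", 10000 + steps, freq
    }'
}

# A clock running at 0.96 of true speed needs (1/0.96 - 1) * 1e6,
# i.e. about 41667 ppm of speed-up.
split_correction 41667   # prints: 10417 -2162688
```

Almost all of the correction would have to come from the tick value, with the frequency term covering only the last few tens of ppm.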

For the record, a 17.10 installation that I have on a very similar machine does not have this problem.

Update: changing the CMOS battery did nothing (as suspected). See below for the final resolution of the problem.


Solution 1:

It turns out the problem was with the TSC clock source. In the short term, switching the clock source to 'hpet' works around the issue. This can be done temporarily, from a root shell, via echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource, or more permanently by adding clocksource=hpet to the kernel boot parameters in /etc/default/grub (and then running update-grub).
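Before switching, it may be worth checking which clock sources the kernel actually offers. A minimal sketch, assuming the standard sysfs location; has_clocksource is a hypothetical helper, and the script only reads state and prints the command rather than changing anything:

```shell
#!/bin/sh
CS_DIR=/sys/devices/system/clocksource/clocksource0

# has_clocksource NAME LIST: hypothetical helper that succeeds if NAME
# appears as a whole word in the space-separated LIST.
has_clocksource() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only read sysfs when the clocksource directory is present.
if [ -r "$CS_DIR/available_clocksource" ]; then
    available=$(cat "$CS_DIR/available_clocksource")
    echo "current:   $(cat "$CS_DIR/current_clocksource")"
    echo "available: $available"
    if has_clocksource hpet "$available"; then
        echo "hpet is available; to switch until the next reboot, run as root:"
        echo "  echo hpet > $CS_DIR/current_clocksource"
    else
        echo "hpet is not offered on this system"
    fi
fi
```

For the boot-parameter route on Ubuntu, clocksource=hpet would typically be appended to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub.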

More broadly, this problem is due to a bug in the Linux kernel's TSC handling with respect to Skylake X desktop CPUs. This should be fixed in a forthcoming kernel release.

Update: rebuilding the current kernel with the one-line fix for that bug does in fact restore correct TSC behavior.

