[ntp:questions] drift value very large and very unstable

Andy Helten andy.helten at dot21rts.com
Mon Mar 3 21:56:50 UTC 2008


I realize this is long, but I tried to include the whole story.  I did
work earnestly to solve this on my own, but unfortunately I've been
spinning my wheels the last few days.  Thanks for any help.

I am having a problem with drift values approaching and, on occasion,
reaching +/-500ppm.  My time source setup:

GPS --> XL-GPS::IRIGB --> SBC0::IRIGB --> SBC0::NTP

The XL-GPS is synchronized with GPS time and outputs an IRIG-B signal. 
The processor board, SBC0, is a single board computer housing a
Symmetricom BC635PMC IRIG-B receiver.  Three different SBC0s and three
different BC635 PMCs were tested and all produced the same results.  The
BC635 IRIG-B receiver is the only time source for NTP (see "BC635" conf
file below) using NTP's Bancomm reference clock support.  This is the
"target system".

The drift using this configuration is typically near +/- 500ppm.  I say
"+/-" because from one run of NTP to the next it may completely swing,
for example, from +486ppm to -490ppm on the same processor board.  Most
of the time this wild swing only happens following a reboot, but I've
observed it on at least two occasions when ntpd was simply stopped and
then restarted (with no conf file changes and no reboots between).
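
For a sense of scale, the drift figures above can be converted to clock error per day.  The following is a minimal arithmetic sketch, not anything ntpd itself runs; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Convert a frequency error in parts per million to seconds of error per day."""
    return ppm * 1e-6 * SECONDS_PER_DAY


if __name__ == "__main__":
    # The two drift values quoted above, plus the 500ppm figure.
    for ppm in (486.0, -490.0, 500.0):
        print(f"{ppm:+8.1f} ppm -> {ppm_to_seconds_per_day(ppm):+7.2f} s/day")
```

By this conversion a frequency error of 500ppm is about 43 seconds per day, so a swing from +486ppm to -490ppm is a change of roughly 84 seconds per day in the clock's apparent rate.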

To make matters more interesting, the drift consistently settles at
~100ppm when using only a local NTP server that is synchronized with
other public stratum 2 NTP servers (such as ntp.idealab.com, zagbot.com,
etc).  In other words, when syncing with public Internet NTP servers, the
drift does not swing from positive to negative and it always settles at
a reasonable value (<100 ppm).

I've done several tests, including one using the 1Hz timestamp printout
feature of the XL-GPS.  The timestamp is synchronized with the system's
1PPS, so it comes out almost exactly once every second.  I wrote a
script that waits for the 1Hz timestamp.  When the timestamp is printed,
the script runs a C program that reads IRIG-B time from the BC635 PMC,
reads system time using clock_gettime(), and prints both timestamps.  I
then combine the three timestamps (XL-GPS, IRIG-B, and system time) into
a log file, one line per 1Hz sample.  This test seems to prove the
stability of the XL-GPS, BC635, and SBC0's system clock (which is not
being disciplined by NTP during the test).  In particular, the test
showed that SBC0's drift is in line with the 100ppm value seen when
syncing with a network time source.  These test results were also
consistent with the claimed accuracy of the SBC0 oscillator, 30ppm.  In
other words, the 500ppm value seems to be a completely bogus fabrication
of NTP.
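
One way a drift figure could be derived from such a log is a least-squares fit of (system time - reference time) against reference time.  The sketch below is an assumption about the analysis, not the script actually used for the test: the function name and the (reference, system) pair layout are hypothetical, and the log file format is not shown in this post.

```python
from typing import Iterable, Tuple


def estimate_drift_ppm(samples: Iterable[Tuple[float, float]]) -> float:
    """Least-squares slope of (system - reference) versus reference time, in ppm.

    Each sample is (reference_seconds, system_seconds) on the same epoch.
    A positive result means the system clock runs fast relative to the reference.
    """
    pts = list(samples)
    if len(pts) < 2:
        raise ValueError("need at least two samples")

    # Work relative to the first reference timestamp to keep the numbers small.
    t0 = pts[0][0]
    xs = [ref - t0 for ref, _ in pts]
    ys = [sys_t - ref for ref, sys_t in pts]

    n = len(pts)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("reference timestamps must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The slope is seconds of error per second of reference time; scale to ppm.
    return sxy / sxx * 1e6


if __name__ == "__main__":
    # Synthetic data only: a clock running 100ppm fast with a 0.25s fixed offset.
    synthetic = [(float(i), i * (1 + 100e-6) + 0.25) for i in range(600)]
    print(f"estimated drift: {estimate_drift_ppm(synthetic):.3f} ppm")
```

Fitting a slope over the whole run, rather than differencing the first and last samples, keeps a single late or jittery sample from dominating the estimate.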

Another piece of evidence is that the IRIG-B PMC was used on two
different single board computers (one was a Concurrent PP110, the other
a Concurrent VP315) where the drift was stable and settled at reasonable
values on both of these boards.  In this case, the BC635 IRIG-B PMC did
not have a time reference; instead, the time was set manually on the
BC635 and the BC635 operated in flywheel mode (i.e. the IRIG-B time
drifted with the clock on the BC635).  This was the "development
system".  Several weeks of testing on this system always produced stable
results.  Drift values always stabilized at the same reasonable value,
for example, ~20ppm for one of these "other" SBCs.  It was only after
several weeks of running on these boards that we then moved to the
"target system", SBC0, and then began experiencing the problem with drift.


The "target system" summary:

- SBC0 (2 Intel CPUs)
    - GPS --> XL-GPS::IRIGB --> SBC0::IRIGB --> SBC0::NTP
- Concurrent RedHawk 4.2 (Hanoi)
    - Linux sbc9 2.6.18.8-RedHawk-4.2-trace #1 SMP PREEMPT Tue May 29 12:44:24 EDT 2007 i686 i686 i386 GNU/Linux


The "development system" summary:

- SBC1: Concurrent PP110, Pentium III-M (1 CPU)
    - PP110::IRIGB --> PP110::NTP
- SBC2: Concurrent VP315, Pentium M (1 CPU)
    - VP315::IRIGB --> VP315::NTP
- Enterprise Linux, Version 4 (original release), kernel version:
  - Linux ntp1 2.6.9-5.EL #1 Wed Jan 5 19:22:18 EST 2005 i686 i686 i386 GNU/Linux


Common items between "target system" and "development system":

- ntpd - NTP daemon program - Ver. 4.2.4p0
- BC635PMC hardware (i.e. exact same pieces of hardware)
- BC635PMC v6.5.0 driver from Symmetricom


Some other notes and thoughts on this problem:

- I have searched the web and NTP mailing list and have found various
instances of problems with large drift values, but none fit my situation
exactly or the instances were resolved by some means not applicable here. 
- There is "no" activity on SBC0 when this problem occurs.  By "no" I
mean no additional applications except whatever may be running as a cron
job (which isn't much).  By "no" I also mean that there is no additional
hardware causing a heavy interrupt load on the system.
- The drift has _always_ gone near or equal to +/-500ppm -- i.e. it has
never stabilized at a reasonable value when running with the BC635 IRIGB
time source.
- I've tested the "target system" both with the XL-GPS time source and
without it; without it, the BC635 IRIG-B PMC runs in "flywheel" mode.  In
"flywheel" mode, the drift problem is the same.
- The system has only a CompactFlash for a local disk and, as such, the
Linux kernel is configured without swap space.
- The "target system" has various requirements that necessitate running
in the fashion in which we are running.  For example, there is no
connection to the Internet, nor can there be reliance on "other" network
time sources.  The system must be completely self sufficient with one
local IRIG-B synced board serving as a local stratum 1 NTP server for
several other local SBC0 boards.
- I am not sure what other run-time NTP information is useful so I
didn't include any.  Just let me know what you would like to see.  It is
not easily possible to run tests with the BC635 and the VP315 or PP110
since those pieces of hardware are no longer co-located.
- I have tested ntp-4.2.4.p4 and ntp-dev-4.2.5p113 distributions and the
drift problem is the same (although, I only ran these versions once,
long enough to see the drift go above 400ppm).
- The drift file was deleted prior to almost every run of NTP.  I say
"almost" because for some tests I wanted to see what NTP would do
when starting with a large drift value.
- There have been in the neighborhood of 50 different test runs
(probably more, but I'm not counting).
- One other test we plan to run is installing RedHat Enterprise Linux
Version 4, Update 4 on SBC0.  This is a software environment more
similar to ones on which the BC635 driver was developed.

Andy


/*******************************************************/
NTP conf file for BC635 IRIG-B PMC
/*******************************************************/

# Base conf file for all normal operation and initial sync for both server
# and client

# Debug stuff
statistics clockstats peerstats loopstats
statsdir /var/lib/ntp/log/
filegen clockstats file stats.clock type pid link enable
filegen peerstats file stats.peer type pid link enable
filegen loopstats file stats.loop type pid link enable

restrict default nomodify notrap noquery
restrict 127.0.0.1

tinker panic 0   # don't let daemon exit for any time difference

driftfile /var/lib/ntp/drift

# Base conf file for normal operation for both server and client
tinker step 0    # disable stepping, so that we only slew time

# Conf file for normal operation of a server

server  127.127.16.0 prefer mode 2 minpoll 4 iburst burst # Symmetricom BC635
tos orphan 6




/*******************************************************/
NTP conf file for network NTP server
/*******************************************************/
# Base conf file for all normal operation and initial sync for both server
# and client

# Debug stuff
statistics clockstats peerstats loopstats
statsdir /var/lib/ntp/log/
filegen clockstats file stats.clock type pid link enable
filegen peerstats file stats.peer type pid link enable
filegen loopstats file stats.loop type pid link enable

restrict default nomodify notrap noquery
restrict 127.0.0.1

tinker panic 0   # don't let daemon exit for any time difference

driftfile /var/lib/ntp/drift

# Base conf file for normal operation for both server and client
tinker step 0    # disable stepping, so that we only slew time

# Conf file for initial sync of a client

server 192.168.2.90 prefer iburst burst minpoll 5 maxpoll 9
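
Both conf files enable loopstats, so the frequency history of each run is recorded on disk.  As a rough sketch of how those files might be summarized, the code below assumes the usual loopstats field order (MJD, seconds past midnight, offset in seconds, frequency in ppm, followed by jitter, wander, and time constant).  The names, the 20ppm margin, and the sample lines are made up for illustration and are not taken from the runs described above.

```python
from typing import List, NamedTuple


class LoopSample(NamedTuple):
    mjd: int          # Modified Julian Day
    seconds: float    # seconds past midnight UTC
    offset_s: float   # clock offset in seconds
    freq_ppm: float   # frequency correction in ppm


def parse_loopstats(text: str) -> List[LoopSample]:
    """Parse loopstats lines, skipping any that are blank or malformed."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            samples.append(LoopSample(int(fields[0]), float(fields[1]),
                                      float(fields[2]), float(fields[3])))
        except ValueError:
            continue
    return samples


def near_limit(samples: List[LoopSample], limit_ppm: float = 500.0,
               margin_ppm: float = 20.0) -> List[LoopSample]:
    """Return samples whose frequency is within margin_ppm of +/-limit_ppm."""
    return [s for s in samples if abs(s.freq_ppm) >= limit_ppm - margin_ppm]


# Illustrative lines only, not real measurements.
SAMPLE = """\
54528 78000.123 0.001234567 12.345 0.000100000 0.010000 4
54528 78016.123 -0.020000000 486.000 0.000200000 1.500000 4
54528 78032.123 0.030000000 -490.000 0.000300000 2.000000 4
"""

if __name__ == "__main__":
    for s in near_limit(parse_loopstats(SAMPLE)):
        print(f"MJD {s.mjd} {s.seconds:9.3f}s  freq {s.freq_ppm:+8.3f} ppm")
```

Running something like this over the stats.loop files from a BC635 run and from a network-server run would show at what point in each run the frequency first approaches 500ppm.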




