[ntp:questions] no servers, synchronized, time reset

Trevor Woerner twoerner at gmail.com
Mon Feb 25 20:31:25 UTC 2013


Hi everyone,

I have been seeing the following pattern show up in my log files, and
I'm hoping to better understand why it happens and perhaps figure out
how to improve the situation.

I have two machines. One, the server, syncs to an upstream NTP server
over one ethernet interface and acts as the server to the other
machine, the client, over a separate ethernet interface. The two
machines are connected via their own private switch and have virtually
no traffic between them other than the NTP messages.

The client:
- is running NTP version 4.2.4p4
- has an IP address of 148.198.225.200
- runs ntp with only the '-g' option
- has an /etc/ntp.conf file that looks like:

    driftfile /etc/ntp.drift
    server 148.198.225.221 minpoll 4 maxpoll 5

When ntpd is started on the client, the clock is first set with:

    # ntpd -n -q -g

and then run ntpd as:

    # ntpd -g

The server has an IP address of 148.198.225.221 and is running ntpd 4.2.0a.

I ran tcpdump on the server to capture the network traffic between them:

    # tcpdump -i eth1 host 148.198.225.200 and udp -w outfile

I have been capturing logs for the last 4-5 days, and periodically the
following pattern shows up:

    2013-02-21T23:27:05.229+05:00 ntpd[4396]: no servers reachable
    2013-02-21T23:44:30.227+05:00 ntpd[4396]: synchronized to 148.198.225.221, stratum 4
    2013-02-21T23:44:43.049+05:00 ntpd[4396]: time reset -2.179424 s

First off, it would be nice to understand why ntpd thinks there are no
servers reachable. According to the tcpdump there weren't any network
issues at that time:

    2013-02-21 23:27:05.228770 IP 148.198.225.200.123 > 148.198.225.221.123: NTPv4, Client, length 48
        0x0000:  000d 60dd f96a 0030 640a d97b 0800 4500
        0x0010:  004c 0000 4000 4011 4d6e 94c6 e1c8 94c6
        0x0020:  e1dd 007b 007b 0038 ca33 2305 04ec 0000
        0x0030:  2815 0000 3ab7 94c6 e1dd d4d1 71b0 3aae
        0x0040:  baaf d4d1 7209 3b72 81fd d4d1 7209 3b57
        0x0050:  a187 d4d1 7219 3a8e 6060
    2013-02-21 23:27:05.228891 IP 148.198.225.221.123 > 148.198.225.200.123: NTPv4, Server, length 48
        0x0000:  0030 640a d97b 000d 60dd f96a 0800 4510
        0x0010:  004c d459 4000 4011 7904 94c6 e1dd 94c6
        0x0020:  e1c8 007b 007b 0038 ed7c 2404 04ee 0000
        0x0030:  53f1 0000 9805 94c6 28a1 d4d1 71fd 625f
        0x0040:  0f5a d4d1 7219 3a8e 6060 d4d1 7219 3a95
        0x0050:  b78c d4d1 7219 3a97 6bc1

But ignoring that for the moment, why does ntpd wait until it is out by ~2
seconds before making a change? Is it possible to have it make the
change sooner (say, when it's out by only 1/4 second)?

If I look at all the time resets:

    2013-02-21T23:44:43.049+05:00 ntpd[4396]: time reset -2.179424 s
    2013-02-22T01:19:43.289+05:00 ntpd[4396]: time reset +0.237872 s
    2013-02-22T10:30:49.047+05:00 ntpd[4396]: time reset -0.242646 s
    2013-02-22T13:58:47.261+05:00 ntpd[4396]: time reset -1.785796 s
    2013-02-22T14:48:28.015+05:00 ntpd[4396]: time reset -0.245943 s
    2013-02-22T19:17:30.870+05:00 ntpd[4396]: time reset -2.147083 s
    2013-02-23T11:32:55.635+05:00 ntpd[4396]: time reset -0.236013 s
    2013-02-23T13:22:50.772+05:00 ntpd[4396]: time reset +2.139825 s
    2013-02-23T16:01:20.626+05:00 ntpd[4396]: time reset -2.147514 s
    2013-02-24T12:15:48.772+05:00 ntpd[4396]: time reset +2.148165 s
    2013-02-24T12:49:12.921+05:00 ntpd[4396]: time reset +2.149475 s
    2013-02-24T22:41:09.775+05:00 ntpd[4396]: time reset -2.145805 s
    2013-02-25T10:27:47.921+05:00 ntpd[4396]: time reset +2.145463 s

Eight out of thirteen have a reset that is amazingly close to 2.14 seconds.
It's almost as if it's waiting until it is out by 2.14 seconds before
performing the reset. But in addition to that, the sign of the change
keeps flipping: sometimes it's minus 2 seconds, sometimes plus 2 seconds,
with no obvious pattern. It just doesn't seem as if the client is
tracking the server's time very well.
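
For anyone who wants to check the tally, here is a minimal Python sketch that
parses the "time reset" lines listed above. The 2.1 to 2.2 second window is my
own assumption for what counts as "close to 2.14 seconds"; adjust it as you
see fit.

```python
import re

# The "time reset" lines from the client's log, copied from above.
LOG = """\
2013-02-21T23:44:43.049+05:00 ntpd[4396]: time reset -2.179424 s
2013-02-22T01:19:43.289+05:00 ntpd[4396]: time reset +0.237872 s
2013-02-22T10:30:49.047+05:00 ntpd[4396]: time reset -0.242646 s
2013-02-22T13:58:47.261+05:00 ntpd[4396]: time reset -1.785796 s
2013-02-22T14:48:28.015+05:00 ntpd[4396]: time reset -0.245943 s
2013-02-22T19:17:30.870+05:00 ntpd[4396]: time reset -2.147083 s
2013-02-23T11:32:55.635+05:00 ntpd[4396]: time reset -0.236013 s
2013-02-23T13:22:50.772+05:00 ntpd[4396]: time reset +2.139825 s
2013-02-23T16:01:20.626+05:00 ntpd[4396]: time reset -2.147514 s
2013-02-24T12:15:48.772+05:00 ntpd[4396]: time reset +2.148165 s
2013-02-24T12:49:12.921+05:00 ntpd[4396]: time reset +2.149475 s
2013-02-24T22:41:09.775+05:00 ntpd[4396]: time reset -2.145805 s
2013-02-25T10:27:47.921+05:00 ntpd[4396]: time reset +2.145463 s
"""

RESET_RE = re.compile(r"time reset ([+-]\d+\.\d+) s")

# Signed step sizes in seconds, in log order.
offsets = [float(m.group(1)) for m in RESET_RE.finditer(LOG)]

# Assumption: "close to 2.14 s" means a magnitude between 2.1 and 2.2 s.
large = [x for x in offsets if 2.1 <= abs(x) <= 2.2]
signs = "".join("+" if x > 0 else "-" for x in large)

print("resets:", len(offsets))
print("about 2.14 s:", len(large))
print("signs of those:", signs)
```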

Running ntpq on the client (today) gives:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*148.198.225.221 148.198.40.161   4 u    7   32  377    0.175   -0.540   1.213

148.198.40.161 is the upstream server of the "server" machine
(148.198.225.221) in this configuration.

Any suggestions? Can I provide any more information?

Best regards,
    Trevor

