[Pool] time reset

Dave Hart davehart at gmail.com
Thu Mar 24 23:01:00 UTC 2011


On Thu, Mar 24, 2011 at 8:44 PM, Hal Murray <hmurray at megapathdsl.net> wrote:
> 2) Your local clock is broken.  There are several ways this can happen.
>  a) Power saving can change the CPU speed
>  b) The kernel timekeeping software could be broken.
>  c) Ages ago, you could get this when a lot of disk activity caused lost
> interrupts.  I haven't seen that on PCs in a while, but it might come back on
> embedded systems.
d) You're using a VM with broken timekeeping.  This is increasingly
common.  Note that just after this list discussed the flash-mob effect
that cron'd ntpdate (or similar) produces every hour on pool servers, a
former pool server operator responded to this thread saying they had an
incurably bad clock on their pool server, pulled it from the pool, and
cron'd ntpdate every _minute_.  I can only hope that's every minute plus
or minus a healthy random amount like 15s.
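
To make the "plus or minus a random amount" idea concrete, here is a
minimal sketch in Python.  The function name and defaults are
illustrative assumptions (60s base, 15s jitter, taken from the numbers
above), and the sync command itself is deliberately left out.

```python
import random


def next_interval(base_s=60.0, jitter_s=15.0, rng=None):
    """Return the wait before the next time query, in seconds.

    The result is drawn uniformly from [base_s - jitter_s, base_s + jitter_s],
    so many clients started together drift apart instead of all hitting
    the server at the same second.
    """
    if rng is None:
        rng = random.Random()
    return base_s + rng.uniform(-jitter_s, jitter_s)


if __name__ == "__main__":
    # Show a few sample waits; a real loop would sleep for each interval
    # and then run whatever one-shot time query is in use.
    for _ in range(5):
        print(round(next_interval(), 1))
```

Each client picks its own waits, so the queries spread across the minute
rather than piling up at second zero.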

The solution is often beyond the reach of the putative operator,
because they have control over only the VM client, and not the
monitor/dom0/host.  It also depends on the particular OS in the VM,
and the level of integration the underlying VMM offers for that type
of client.  The best answer is to have no clock in the VM, instead
passing through gettimeofday() and similar to the monitor/dom0/host
clock.  That doesn't allow an ntpd per VM, but assuming the
monitor/dom0/host clock is well-disciplined, delivers the best
possible client VM timing.

While I've seen a few reports of well-behaved ntpd in VMs, I wonder
how many are on lightly-oversubscribed hardware and subject to serious
degradation when the cumulative load of all the VMs increases.

> 3) Your network connection is variable and unsymmetrical.  You can easily get
> problems if you do a big download over a DSL connection.  ntpd assumes your
> network delays are symmetric.  If your system gets calibrated when the
> traffic is low (and symmetric) and you start a big download (or upload)
> without much traffic in the other direction, queuing delays can cause a big
> enough time shift to confuse ntpd.  On my DSL line, I see up to 3 second
> delays.  If this is the problem, tinker huffpuff might help.
>
> Lots of interesting info at http://www.bufferbloat.net/  (but not directly
> ntp related)
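
To illustrate why asymmetric delay matters, the sketch below applies the
standard NTP on-wire offset and delay formulas to simulated timestamps.
The simulate() helper and its delay values are assumptions chosen for
illustration, not measurements from any real link.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, fwd_delay, rev_delay):
    """Build the four timestamps for one exchange (hypothetical helper).

    true_offset is server clock minus client clock; the server is assumed
    to reply instantly.
    """
    t1 = 0.0                             # client clock
    t2 = t1 + fwd_delay + true_offset    # server clock
    t3 = t2                              # server clock, zero processing time
    t4 = t1 + fwd_delay + rev_delay      # client clock
    return ntp_offset_and_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Clocks actually agree (true_offset = 0) in both cases.
    print(simulate(0.0, 0.02, 0.02))  # symmetric path
    print(simulate(0.0, 3.0, 0.02))   # upload queue adds ~3 s one way
```

With equal delays the computed offset is zero.  With 3 s outbound and
20 ms inbound, the computed offset is about 1.49 s even though the clocks
agree: the error is half the difference between the two one-way delays.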

While most likely unhelpful to the OP in solving their problem, I
recommend everyone take a look at their internet connectivity's
buffering using

http://netalyzr.icsi.berkeley.edu/

A few higher-end consumer routers (which are also lower-end small
business routers, typically) offer traffic shaping that can prevent
upstream buffering in the "modem" by ensuring data is paced to stay
under the minimum anticipated upstream throughput of the service.
Minimizing downstream buffering at the other (ISP) end of the last mile
is trickier but is usually part of the same router QoS feature.  And
if you like building your own routers, please do join the
bloat-devel at lists.bufferbloat.net list and you too can experience the
bliss of both low latency and high throughput at the same time.
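
As a rough picture of what "paced to stay under the upstream throughput"
means, here is a minimal token-bucket sketch.  The class name, rate, and
burst size are illustrative assumptions; real router QoS implementations
queue and schedule packets rather than simply answering yes or no.

```python
class TokenBucket:
    """Admit packets only as fast as a configured rate allows."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the previous call

    def try_send(self, size_bytes, now):
        """Return True if a packet of size_bytes may be sent at time now."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


if __name__ == "__main__":
    # Shape to 800 kbit/s, a little under an assumed 1 Mbit/s uplink.
    bucket = TokenBucket(rate_bps=800_000, burst_bytes=3000)
    for t in (0.0, 0.0, 0.0, 0.02):
        print(t, bucket.try_send(1500, now=t))
```

Keeping the shaped rate below what the modem can actually send means the
queue builds in the router, where it can be managed, instead of in the
modem's large buffer.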

Briefly, big, unmanaged buffers are increasingly common as devices
typically use all available memory for packet buffering in an
ill-advised view that packet loss is always counterproductive.  In
fact, lacking ECN (which is essentially unsupported by most routers
your packets will traverse), packet loss is the only indication to TCP
that the path is congested and senders need to slow down to avoid even
more packet loss, or worst case, congestion collapse.  Buffers sized
to let you push several hundred megabits through your gigabit ethernet
are extremely oversized when that gig link is part of a path capable
of a few megabits.  Even in home routers, engineers are pushed to
ensure they can achieve maximum throughput, too often without any push
to keep latencies reasonable over WAN bottlenecks.
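
The arithmetic behind that is simple: a full buffer adds a delay equal to
its size divided by the rate at which it drains.  The sketch below uses a
256 KiB buffer purely as an assumed example size.

```python
def queue_delay_s(buffer_bytes, link_bps):
    """Worst-case queuing delay, in seconds, of a full FIFO buffer
    draining at link_bps bits per second."""
    return buffer_bytes * 8.0 / link_bps


if __name__ == "__main__":
    buf = 256 * 1024  # assumed buffer size in bytes
    for name, bps in (("1 Gbit/s", 1e9), ("1 Mbit/s", 1e6)):
        print(name, round(queue_delay_s(buf, bps) * 1000, 1), "ms")
```

The same buffer that adds roughly 2 ms at gigabit speed adds roughly 2.1
seconds when it drains through a 1 Mbit/s bottleneck.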

Cheers,
Dave Hart


