[ntp:questions] Accuracy of audio tones via VOIP

Robert Scott no-one at notreal.invalid
Tue Jul 9 23:10:56 UTC 2013

On 09 Jul 2013 19:46:35 GMT, Rob <nomail at example.com> wrote:

>Robert Scott <no-one at notreal.invalid> wrote:
>> On Tue, 09 Jul 2013 18:59:05 +0100, David Woolley
>> <david at ex.djwhome.demon.invalid> wrote:
>>>On 09/07/13 15:35, Robert Scott wrote:
>>>> as much as the local quartz oscillator frequencies?  Does anyone have
>>>> any insight on how Skype and other VOIP systems manage record/playback
>>>> rate synchronization?
>>>The workings of Skype are a trade secret.  Other ones will have a jitter 
>>>buffer, and will typically dump or repeat frames if that gets too full 
>>>or risks underflowing.  RTP allows the source to mark good places to do 
>> That does not sound too encouraging for the pitch accuracy of tones
>> transmitted over Skype.  Of course you would never notice it in normal
>> speech, which is, I guess, all that Skype is targeting.  But for
>> calibrating to the standard frequency tones from NIST (500 Hz and 600
>> Hz), it probably is not trustworthy, right?
>Would you be setting your clock to the hour pips as heard on the internet
>stream of a radio station?

No, I am not interested in time.  I am talking about frequency.  If
you call 1-303-499-7111 you will hear the audio that is transmitted on
WWV, which includes 500 Hz and 600 Hz tones.  As transmitted these
tones are as accurate as NIST can make them.  But as received they
might appear at a different frequency.

Landline phones do go through a packet switching network, but that
network is tightly synchronized.  It is not the internet.  Similarly
cellphones go through a tightly synchronized network.  I am fairly
confident that tones reconstructed by these networks maintain their
frequency because the encoding clocks and the decoding clocks are
locked.  But I was asking about Skype and similar VOIP technologies.
I can't see how they can have the same level of clock synchronization
between encoding and decoding since the decoding happens on my PC.
Sound on a PC is rendered by sound adapter hardware (the sound card)
and is clocked by a crystal oscillator in the sound card that is not
synchronized with anything.  Therefore it can be quite a bit off in
frequency.  The audio frequencies rendered will then be off by the
same amount, unless Skype does some magic in processing the signal
before sending it to the sound card.  If Skype did that, then it
would have to know exactly how far off my PC's sound card oscillator
is.  Perhaps they can find that out over time, and maybe that is part
of the installation of the software.
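To put a number on the sound card argument: suppose the VOIP software plays every captured sample exactly once, with no resampling. Then a tone comes out scaled by the ratio of the playback clock to the capture clock. Here is a minimal Python sketch under that assumption. The function name and the 50 ppm playback offset are hypothetical, chosen only for illustration. For the NIST phone line, the sketch also assumes the capture side sits on the synchronized network, so its offset is taken as 0 ppm.

```python
def rendered_frequency(f_tone_hz, capture_ppm, playback_ppm):
    """Frequency heard when samples captured on one clock are played on another.

    Assumes no resampling: each captured sample is played back exactly once,
    so the tone is scaled by (actual playback rate) / (actual capture rate).
    Offsets are in parts per million relative to the nominal sample rate.
    """
    capture_scale = 1 + capture_ppm * 1e-6
    playback_scale = 1 + playback_ppm * 1e-6
    return f_tone_hz * playback_scale / capture_scale


if __name__ == "__main__":
    # Hypothetical: network-side capture clock exact, PC sound card 50 ppm fast.
    for tone in (500.0, 600.0):
        print(f"{tone:.0f} Hz nominal -> {rendered_frequency(tone, 0, 50):.4f} Hz")
```

With a 50 ppm fast sound card, the 500 Hz tone would come out near 500.025 Hz. Dropping or repeating whole frames in a jitter buffer prevents the buffer from overflowing or running dry. It does not change the pitch of the frames that are actually played, so only deliberate resampling could remove the offset.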

If these VOIP technologies do distort audio frequencies, has anybody
ever seen an example of it?  That is, if you call the NIST number
listed above and measure the audio frequency of the nominal 500 Hz and
600 Hz tones, has anyone ever measured them to be noticeably off?
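One way to make that measurement, assuming a reasonably clean recording of the tone, is to time the rising zero crossings and divide the number of cycles by the elapsed time. Below is a minimal stdlib-only Python sketch. The function names are hypothetical, and `synth_tone` only stands in for real recorded samples. There is one catch. The estimate is expressed in the recording device's own sample clock, so a recording made through an unsynchronized sound card inherits that card's error. A real recording would also need band-pass filtering around 500 or 600 Hz first.

```python
import math


def estimate_tone_frequency(samples, sample_rate_hz):
    """Estimate the frequency of a clean sinusoid by timing rising zero crossings.

    Each crossing time is refined by linear interpolation between the two
    samples that straddle zero. The result is only as accurate as
    sample_rate_hz: if the recording clock is off by N ppm, so is the estimate.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Fractional sample position where the straight line from prev to cur hits zero.
            frac = -prev / (cur - prev)
            crossings.append((i - 1 + frac) / sample_rate_hz)
    if len(crossings) < 2:
        raise ValueError("need at least two rising zero crossings")
    cycles = len(crossings) - 1
    return cycles / (crossings[-1] - crossings[0])


def synth_tone(freq_hz, sample_rate_hz, seconds, phase=0.3):
    """Generate a unit-amplitude test tone (a stand-in for a real recording)."""
    n = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz + phase) for k in range(n)]


if __name__ == "__main__":
    fs = 48000
    tone = synth_tone(500.025, fs, 10.0)  # hypothetical slightly-off 500 Hz tone
    print(f"estimated: {estimate_tone_frequency(tone, fs):.4f} Hz")
```

Zero-crossing timing is used here because it needs no libraries and resolves small offsets well on a clean tone. On a noisy phone-line recording, an FFT or Goertzel estimate with peak interpolation is usually more robust.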
