Categories
Admin ntp Security

Correct way to run an ntp server

Intro
Concerns about DOS and DDOS have been heightened recently, for instance http://securityaffairs.co/wordpress/20934/cyber-crime/symantec-network-time-protocol-ntp-reflection-ddos-attacks.html. A more bare-bones, antiseptic description is here in CVE-2013-5211. Unfortunately those inventive hackers have found new ways to create headaches for us good guys. Last month saw an increase in DDOS attacks using poorly configured public ntp servers to create packet amplification. I’ve looked into it and determined how to run an ntp server without exposing your server to being an unwitting source of this type of traffic.

The details

This mostly applies to SUSE Linux (SLES), but I don’t think the other Linux distros would be all that different. In SLES you have the NTP configuration in /etc/ntp.conf. You have of course the regular lines, plus the server lines, which may look something like this:

...
server otc1.psu.edu prefer
server ntp2.usno.navy.mil
server tock.usno.navy.mil prefer
server navobs1.wustl.edu
...

You may not be able to use these exact same servers – sometimes you need to ask permission first.

Now if that’s all you had, plus the driftfile and the other blah, blah, you’re probably in trouble. Test this from another Linux server beforehand. Something like:

> ntpdc -c monlist

If you start seeing lines like the following you’re in trouble:

remote address          port local address      count m ver code avgint  lstint
===============================================================================
ldrj1200.drjon.drjo.ne 58372 10.192.186.15          2 7 2      0     30       0
ns.drjohnstec.com      48944 10.192.186.15          1 7 2    5d0      0      11
neus.drj.drjohnstechta   123 10.192.186.15          8 3 2    5d0      2      13
...

That’s no good because with one udp packet a whole lot of packets can be returned, or worse, sent to a different target since in general the source IP address of the UDP query packet could be spoofed.

The solution
Of course what I’m writing here is not news. It’s just somewhat hard to understand the ntp documentation on this topic on the ntp.org web site.

In my experimentation I’ve found you should add into the ntp.conf file these lines:

restrict default kod nomodify notrap nopeer noquery
# but allow some hosts access
restrict 127.0.0.1
# our monitoring server
restrict 10.192.186.89

Then, a

> sudo service ntp restart

and your remote listing should produce something like this:

> ntpdc -c monlist ntp1.johnstechtalk.com

ntp1.johnstechtalk.com: timed out, nothing received
***Request timed out

and equally important, you can still locally query your ntp server to see that it is still syncing time:

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l    5   64  377    0.000    0.000   0.001
*gps1.tns.its.ps .GPS.            1 u  886 1024  377   28.554    3.396   6.836
+ntp2.usno.navy. .PTP.            1 u  154 1024  377   13.124   -1.422   4.658
+tock.usno.navy. .PTP.            1 u  965 1024  377   13.906   -0.058   0.910
+navobs1.wustl.e .GPS.            1 u  194 1024  377   30.817    0.274   1.927

And, equally important, your local servers using your ntp server should also continue to be able to sync time against the ntp server you have set up.

Conclusion
We have shown how to prevent your ntp server from being using in a DDOS attack. Most ntp servers are probably protected by a firewall of some sort, but it still might be a good idea to lock it down in this way as a best security practice.

The official advice talks about upgrading to ntp version 4.7, but I find this impractical for a couple reasons. It is not generally available from the distro package vendors, and it is considered a development release. Hence the effort to massage the configuration of an older NTP server as I’ve documented here to make it invulnerable to this problem.

References
The IT Detective Agency: ntp server shows the wrong time after patching

Categories
Admin Linux ntp SLES

The IT Detective Agency: ntp server shows the wrong time after patching

Intro
One of my ntp servers hadn’t been patched in awhile so it was time. Other systems rely on it for time synchronization. The next morning after the patching I noticed that the ntp service wasn’t even running. I started it and went about my business. Checking back some minutes later, it had died again. What happened, and how to get it fixed? Read on to see how we diagnosed and solved this puzzler.

The details
I like to use ntpq -p to query my ntp server – it’s easy to type! So when I started it up the results looked like this:

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.          10 l   59   64   17    0.000    0.000   0.001
 drjegw.drjn.com 192.5.41.209     2 u   56   64   17    0.335  3605497  26.136
 drjegw2.drjn.co 192.5.41.41      2 u   56   64   17   19.241  3605534  39.621
 drjeuro.drjn.eu 128.252.19.1     2 u   60   64   17  105.970  3605532  38.946

That’s some offset, eh? 3.605 x 10^6 msec, or, when you think about it, just over an hour. And yet the local clock had no offset. Strange.

Date
I like to do a crude check of system time by running the date command – quickly – on two different systems. Lacking some sleep, I noticed eventually but not right away, that my ntpd server had a date that was retarded by almost exactly an hour. I didn’t notice it at first because I had trained myself to only look at the seconds, which were “only” off by five seconds.

I checked to see if the timezone or localization settings had been changed by the patching – they hadn’t. So I went ahead and advanced the system clock by an hour. Actually the yast GUI of SLES gave me the option to sync against a time server, so I chose my closest one and did that after I had stopped ntpd.

Next problem, please
That got the time in the ballpark. But ntpd still wasn’t behaving. It exhibited a strange behaviour I’ve never seen before – its offset kept increasing. I observed this behaviour over the course of several minutes:

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l    3   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u  129  128  377    0.350  146.846  81.771
+drjegw2.drjn.co 192.5.41.41      2 u    1  128  377   20.211  183.047  97.286
+drjeuro.drjn.eu 128.252.19.1     2 u   72  128  377  104.931  161.696  79.561

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l    5   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u    2  128  377    1.803  182.380  97.636
+drjegw2.drjn.co 192.5.41.41      2 u    3  128  377   20.211  183.047  97.286
+drjeuro.drjn.eu 128.252.19.1     2 u   74  128  377  104.931  161.696  79.561

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l   28   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u   89  128  377    1.803  182.380  97.636
+drjegw2.drjn.co 192.5.41.41      2 u   90  128  377   20.211  183.047  97.286
+drjeuro.drjn.eu 128.252.19.1     2 u   32  128  377  104.667  197.864  96.296

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l    5   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u    4  128  377    1.813  218.325 113.345
+drjegw2.drjn.co 128.118.25.5     2 u    5  128  377   19.667  219.077 113.157
+drjeuro.drjn.eu 128.252.19.1     2 u   75  128  377  104.667  197.864  96.296

Look at that offset column. See? It keeps going up, at about a rate of 40 msec every two minutes. It ain’t supposed to do that!

So a Unix pal of mine said he had encountered an issue in ntp and had commented out that local clock. I honestly had absolutely no idea what that LOCAL line did, but it had never hurt before.

The local clock comes from these lines in ntp.conf:

server 127.127.1.0              # local clock (LCL)
fudge  127.127.1.0 stratum 10   # LCL is unsynchronized

So I took those out, stopped ntpd with a sudo service ntp stop, synced the time with a sudo sntp -P no -r drjegw.drjn.com, and restarted ntpd. It didn’t work immediately, but it became apparent eventually that it was working.

Meantime I discovered the ntpdc command, which is kind of informative in this situation:

> ntpdc
ntpdc> loopinfo

offset:               0.097373 s
frequency:            -132.558 ppm
poll adjust:          12
watchdog timer:       841 s

This tells me the offset if 97 msec (already too large in my experience) and that for some reason the system clock hadn’t been adjusted in 841 s, and that the clock drift rate was -132 ppm – much, much higher than any other system

Then in a few minutes it clicked and got the offset in order:

ntpdc> loopinfo

offset:               0.000000 s
frequency:            -132.558 ppm
poll adjust:          4
watchdog timer:       11 s

So removing the local system clock seemed to be working. But what was the real cause of all this? I discussed it with an admin. Bear in mind that this is physical server. He said the system clock gets its time from a hardware clock which should be visible in the ILO. We checked it. Sure enough, there it was, reporting in the ILO – still, after we had fixed the problem at the OS level – as one hour retarded. There was no way to manually adjust it. The only option was to set up sntp servers, which we did, which forced the ILO to restart.

We logged back in to the ILO and voila, the time was right!

I now realize that in the OS the LOCAL Time was using that hardware clock, which must have drifted by an hour since the system was installed.

Before the patch the system was incrementally keeping up with the drift, making the necessary incremental changes periodically. But the discrepancy was too large for it after rebooting after the patching. In the /var/log/ntp I even see a line:

14 Sep 07:20:53 ntpd[10259]: time correction of 3605 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.

Conclusion
Now the system is better and we have:

ntpdc> loopinfo

offset:               0.057029 s
frequency:            -5.465 ppm
poll adjust:          4
watchdog timer:       24 s

That’s better, but the offset of 57 msec is still far larger than normal. But it’s useable for now.