Categories
Admin Network Technologies

Ping sweep for network security engineers

Intro

I swear my bash programming skills are getting worse and worse. What I really need is a bash scripting tips blog entry to remind myself of my favorite bash scripting tips. I have this for python and I refer toit and add to it all the time. I don’t care if anyone else never uses it, it’s worth having all my used tips in one place as I find I constantly forget the basics due to infrequent usage.

Oh. So to the point. What this blog post is nominally about is to provide a useable medium-quality ping swep that a network security engineer would find useful.

Conditions
  • access to host on the subnet in question
  • this accessible host has a bash shell CLI, e.g., a Checkpoint firewall
  • ping and arp programs available
What it does

This script is designed to sweep through a /24 subnet, politely pausing one second per attempt. It send s a single PING to each IP. This is the things that makes it appealing to network security engineers. it does not require a reply, which is a common situation for network security appliances. It immediately checks the arp table afterwards to see if there is an arp entry (before that has a chance to age out). If so, it reports the IP as up.

The code

I call the program sweep.sh.

#!/bin/bash

is_alive_ping()
{
  ping -c 1 -W 1 $1 > /dev/null
# arp -an output looks like: ? (10.29.129.208) at 01:c0:ed:78:b3:dc [ether] on eth0
# or if not present, like ? (10.29.129.209) at <incomplete> on eth0
  arp -an|grep -iv incomplete|grep -qi $1\)
  [ $? -eq 0 ] && echo Node with IP: $i is up.
}

if [[ ! -n $1 ]];
then
  echo "No subnet passed. Pass three octects like 10.29.129"
  exit
fi
subnet=$1
for i in ${subnet}.{1..254}
do
is_alive_ping $i
sleep 1
done

Apologies for the lousy programming. But it gets the job done.

./sweep.sh 10.29.129
Node with IP: 10.29.129.1 is up.
Node with IP: 10.29.129.2 is up.
Node with IP: 10.29.129.3 is up.
Node with IP: 10.29.129.5 is up.
Node with IP: 10.29.129.6 is up.
Node with IP: 10.29.129.10 is up.
Node with IP: 10.29.129.50 is up.
Conclusion

As a network security engineer you may be asked if it’s safe to use a paricular IP on one of your subnets where you have your equipment plus equipment frmo other groups. I provide a ping sweep script which reports which IPs are taken, not relying on an ICMP REPLY, but just on the ARP table entry which gets created if a device is on the network.

References and related

None so far!

Categories
Network Technologies Raspberry Pi

Trying to improve my home WiFi with a range extender

Intro

My Teams meetings in the mornings had poor audio quality and sometimes I could not share my screen. My suspicions focused on my home WiFi Router, which is many years old. I decided to make an experiment and get a range extender. The results are, well, mixed at best.

Windows command

netsh wlan show interface

There is 1 interface on the system:
Name : Wi-Fi 
Description : Intel(R) Dual Band Wireless-AC 3168 
GUID : f1c094c0-fcb7-4e47-86ba-51df737e58c8 
Physical address : 28:c6:3f:8f:3a:27 
State : connected 
SSID : DrJohn 
BSSID : ec:c3:02:eb:2d:7c 
Network type : Infrastructure 
Radio type : 802.11ac 
Authentication : WPA2-Personal 
Cipher : CCMP 
Connection mode : Auto Connect 
Channel : 153 
Receive rate (Mbps) : 292.5 
Transmit rate (Mbps) : 292.5 
Signal : 99% 
Profile : DrJohn

802.11ac is WiFi 5. 802.11n is WiFi 2, to be clear about it.

What’s going on

My work laptop starts out using WiFi 5 (803.11ac). The signal is around 60% or so. So I guess not super great. Then after an hour or so it switches to WiFi 2 (802.11n)! Audio in my meetings gets disturbed during this time.

My WiFi Extender did not really change this behavior to my surprise! But maybe the quality is better.

One morning I started out on WiFi 4, the signal quality varied between 94% down to 61%, all while nothing was being moved, and within a matter of minutes! The lower Signal values are associated with slower transmit and receive rates, naturally. But at least with the extender WiFi 4 seems OK. It’s useable for my interactive meetings. In my experience, once you are on WiFi 4 you are very unlikely to automagically get switched back to WiFi 5. But the reverse is not true. So there’s a lot of variability in the signal over the course of minutes. But I stayed on WiFi 4 for over three hours without its changing. I connected to a differ SSID, then connected back to my _EXT SSID and, bam, WiFi 5, but only at 52% signal strength.

The way I know this behavior in detail is that I happen to have a ThousandEyes endpoint agent installed and I have access to this history of the connection quality, signal strength, thoughput, etc. ThousandEyes is pretty cool.

Further experimentation

The last couple days I’ve been getting WiFi 5 and it’s been sticking. What’s the difference? This sounds incredibly banal, but I stood the darn extender upright! That’s right, during those days when I was mostly getting WiFi 4 the Extender had all its antennae sticking out, but it was flat on a table. I am in a room across the hallway. Then I managed to stand it upright – a little tricky since it is pluued into an extension cord. I’m still across the hallway. But things have been behaving better ever since.

Does a WiFi extender create a new SSID?

Yes! It creates an SSID named after your SSID with an _EXT appended to that name. However, it is very important to note that it is a bridged network so it means your _EXT-connected devices see all your devices not on _EXT, and that makes it very convenient. The subnet used is your primary router’s subnet, in other words.

This TP-Link (see references) seems to have lots of nice features. MIMO, AP mode, mesh mode, etc. You may or may not need them right away. For instance, the device has several status LEDs which get kind of bright for a bedroom at nighttime. Originally we covered it with a dark T-Shirt. Then I looked at it and saw it has an LED switch! That’s right. Just press that LED switch and those way-too-bright LEDs stop illuminating, while the device keeps on working. A very small but thoughtful feature which you would never even think to look for but turns out to be important. It might have overheated had we kept it covered with that T-Shirt.

Raspberry Pi

A good command is:

sudo iwconfig wlan0

wlan0 IEEE 802.11 ESSID:"Music_EXT"
Mode:Managed Frequency:5.765 GHz Access Point: 9C:53:22:02:6B:59
Bit Rate=433.3 Mb/s Tx-Power=31 dBm
Retry short limit:7 RTS thr:off Fragment thr:off
Encryption key:off
Power Management:on
Link Quality=62/70 Signal level=-48 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0

To be continued…

References and related

TPLink AC1900 WiFi Range Extender at Amazon (Costs about $69. I do not get promotional credits!)

Categories
Firewall Linux Network Technologies

The IT Detective Agency: the case of the mysterious ICMP host administratively prohibited packets

Intro

I haven’t published a new case in a while, not for lack of cases, but more that they they all fall into something I’ve already written about. But today there is definitely something new.

Some details

Thousandeyes agent-to-agent communication was generally working for all our enterprise agents after fixing firewall rules, etc, except for this one agent hosted in Azure US East. Was it something funny about the firewalls on either side of the vpn tunnel to this cloud? Ping tests were working. But a connection to tcp port 49153, which is used for agent-to-agent communication gave a response in the form of an ICMP type 3 code 10 packet which said something like host administratively prohibited. What?

The Cisco TAM suggested to look at iptables. I did a listing with iptables -L. The output is pretty long and I’m not experienced looking at it. Nothing much jumped out at me, but I did note the presence of this line:

REJECT     all  —  anywhere             anywhere             reject-with icmp-host-prohibited

in a couple of the chains, which seemed suspicous.

An Internet search pointed towards firewalld since the agent is a Redhat 7.9 system. Indeed firewalld was running:

systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-10-12 15:26:25 UTC; 5h 45min ago

The suggestion is to test with firewalld disabled. Indeed this produced correct results – no more ICMP packets back.

But it’s probably a good security measure to run firewalld, so how to modify it? This note from Redhat was particularly helpful in learning how to add a rule to the firewall. I pretty much just needed to do this to permanently add my rule:

firewall-cmd –add-port 49153/tcp –permanent

Afterwards the agent-to-agent tests began to be run successfully.

Which runs first, tcpdump or firewalld?

tcpdump

This is a good question to ask because if the order had been different, and who knows, you might have your packets dropped before you ever see them on tcpdump. But tcpdump seems to get a pretty clean mirror of what the network interface gets before application or kernel processing.

The new equivalent to netstat -an

If I want to see the listening processes in Redhat I might do a

ss -ln

In the old days I memorized using netstat -an, but that is now frowned upon.

Conclusion

We solved a case where tcp packets were getting returned with an ICMP packet which basically said: prohibited. This was due to the host, a Redhat 7 system, having restricted ports due to firewalld running. Once firewalld was modified this traffic was permitted and Thousandeyes Tests ran successfully. We also proved that tcpdump runs before firewalld.

References and related

How to add rule to firewalld on Redhat-like systems.

Categories
Network Technologies Python

Python network diagram generator

Intro

Since they took away our Visio license to save licensing fees, some of us have wondered where to turn to. I once used the venerable old MS Paint after learning one of my colleagues used it. Some have turned to Powerpoint. Since I had some time and some previous familiarity with the components – for instance when I create CAD designs for 3D printing I am basically also doing CAD as code using openSCAD – I wondered if I could generate my network diagram using code? It turns out I can, at least the basic stuff I was looking to do.

Pillow

I’m sure there are much better libraries out there but I picked something that was very common although also very limited for my purposes. That is the python Pillow package. I created a few auxiliary functions to ease my life by factoring out common calls. I call the auxiliary modules aux_modules.py. Here they are.

from PIL import Image, ImageDraw, ImageFont
serverWidth = 100
serverHeight = 40
small = 5
fnt = ImageFont.truetype('/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf', 12)
fntBold = ImageFont.truetype('/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf', 11)

def drawServer(img_draw,xCorner,yCorner,text,color='white'):
# known good colors for visibility of text: lightgreen, lightblue, tomato, pink and of course white
# draw the server
    img_draw.rectangle((xCorner,yCorner,xCorner+serverWidth,yCorner+serverHeight), outline='black', fill=color)
    img_draw.text((xCorner+small,yCorner+small),text,font=fntBold,fill='black')

def drawServerPipe(img_draw,xCorner,yCorner,len,source,color='black'):
# draw the connecting line for this server. We permit len to be negative!
# known good colors if added text is in same color as pipe: orange, purple, gold, green and of course black
    lenAbs = abs(len)
    xhalf = xCorner + int(serverWidth/2)
    if source == 'top':
        coords = [(xhalf,yCorner),(xhalf,yCorner-lenAbs)]
    if source == 'bottom':
        coords = [(xhalf,yCorner+serverHeight),(xhalf,yCorner+serverHeight+lenAbs)]
    img_draw.line(coords,color,2)

def drawArrow(img_draw,xStart,yStart,len,direction,color='black'):
# draw using several lines
    if direction == 'down':
        x2,y2 = xStart,yStart+len
        x3,y3 = xStart-small,y2-small
        x4,y4 = x2,y2
        x5,y5 = xStart+small,y3
        x6,y6 = x2,y2
        coords = [(xStart,yStart),(x2,y2),(x3,y3),(x4,y4),(x5,y5),(x6,y6)]
    if direction == 'right':
        x2,y2 = xStart+len,yStart
        x3,y3 = x2-small,y2-small
        x4,y4 = x2,y2
        x5,y5 = x3,yStart+small
        x6,y6 = x2,y2
        coords = [(xStart,yStart),(x2,y2),(x3,y3),(x4,y4),(x5,y5),(x6,y6)]
    img_draw.line(coords,color,2)
    img_draw.line(coords,color,2)

def drawText(img_draw,x,y,text,fnt,placement,color):
# draw appropriately spaced text
    xy = (x,y)
    bb = img_draw.textbbox(xy, text, font=fnt, anchor=None, spacing=4, align='left', direction=None, features=None, language=None, stroke_width=0, embedded_color=False)
# honestly, the y results from the bounding box are terrible, or maybe I don't understand how to use it
    if placement == 'lowerRight':
        x1,y1 = (bb[0]+small,bb[1])
    if placement == 'upperRight':
        x1,y1 = (bb[0]+small,bb[1]-(bb[3]-bb[1])-2*small)
    if placement == 'upperLeft':
        x1,y1 = (bb[0]-(bb[2]-bb[0])-small,bb[1]-(bb[3]-bb[1])-2*small)
    if placement == 'lowerLeft':
        x1,y1 = (bb[0]-(bb[2]-bb[0])-small,bb[1])
    xy = (x1,y1)
    img_draw.text(xy,text,font=fntBold,fill=color)

How to use

I can’t exactly show my eample due to proprietary elements. So I can just mention I write a main program making lots of calls tto these auxiliary functions.

Tip

Don’t forget that in this environment, the x axis behaves like you learned in geometry class with positive x values to the right of the y axis, but the y axis is inverted! So positive y values are below the x axis. That’s just how it is in a lot of these programs. get used to it.

What I am lacking is a good idea to do element groupings, or an obvious way to do transformations or rotations. So I just have to keep track of where I am, basically. But even still I enjoy creating a network diagram this way because there is so much control. And boy was it easy to replicate a diagram for another one which had a similar layout.

It only required the Pillow package. I am able to develop my diagrams on my local PC in my WSL environment. It’s nice and fast as well.

Example Output

This is an example output from this diagram as code approach which I produced over the last couple days, sufficiently blurred for sharing.

Network diagram (blurred) resulting from use of this code-first approach
Conclusion

I provide my auxiliary functions which permit creating “network diagrams as code.” The results are not pretty, but networking people will understand them.

References and related

I developed a way to blur images using the Python Pillow package.

CAD as code: openSCAD is what I had in mind in taking this code first approach to building up geometries.

My disorganized cheat sheet of python language features I most commonly use.

Categories
Admin Linux Network Technologies Web Site Technologies

The IT Detective Agency: This site can’t be reached

Intro

It’s been awhile since I’ve had the opportunity to relatean IT mystery. After awhile they are repates of what’s already happened in the past, or it’s too complex to relate, or I was only peripherally involved. But today I came across a good one. It falls into the never been seen before category.

The details

A web server behind my web application firewall became unreachable. In the browser they get a message This site can’t be reached. The app owners came to me looking for input. I checked the WAF and it was fine. The virtual server was looking healthy. So I took a packet trace, something to this effect:

$ tcpdump -nni 0.0 host 192.168.2.124

14:00:45.180349 IP 192.68.1.13.42045 > 192.68.2.124.443: Flags [S], seq 1106553901, win 23360, options [mss 1460,sackOK,TS val 3715803515 ecr 0], length 0 out slot1/tmm3 lis=/Common/was90extqa.drjohn.com.app/was90extqa.drjohn.com_vs port=0.53 trunk=
14:00:45.181081 IP 192.68.2.124 > 192.68.1.13: ICMP host 192.68.2.124 unreachable - admin prohibited filter, length 64 in slot1/tmm2 lis= port=0.47 trunk=
14:00:45.181239 IP 192.68.1.13.42045 > 192.68.2.124.443: Flags [R.], seq 1106553902, ack 0, win 0, length 0 out slot1/tmm3 lis=/Common/was90extqa.drjohn.com.app/was9
0extqa.drjohn.com port=0.53 trunk=

I’ve never seen that before, ICMP host 192.68.2.124 unreachable – admin prohibited filter. But I know ICMP can be used to relay out-of-band routing information on occasion, though I do not see it often. I suspect it is a BAD THING and forces the connection to be shut down. Question is, where was it coming from?

The communication is via a firewall so I check the firewall. I see a little more traffic so I narrow the filter down:

$ tcpdump -nni 0.0 host 192.168.2.124 host 443

And then I only see the initial SYN packet followed by the RST – from the same source IP! So since I didn’t see the bad ICMP packet on the firewall, but I do see it on the WAF, I preliminarily conclude the problem exists on the WAF.

Rookie mistake! Did you fall for it? So very, very often, in the heat of debugging, we invent some unit test which we’ve never done before, and we have to be satisified with the uncertainty in the testing method and hope to find a control test somehow, somewhere to validate our new unit test.

Although I very commonly do compound filters, in this case it makes no sense, as I realized a few minutes later. My port 443 filter would of course exclude logging the bad ICMP packets because ICMP does not use tcp port 443! So I took that out and re-run it. Yup. bad ICMP packet still present on the firewall, even on the interface of the firewall directly connected to the server.

So at this point I have proven to my satisfaction that this packet, which is ruining the communication, really comes frmo the server.

What the server guys say

Server support is outsourced. The vendor replies

As far as the patching activities go , there is nothing changed to the server except distro upgrading from 15.2 to 15.3. no other configs were changed. This is a regular procedure executed on almost all 15.2 servers in your environment. No other complains received so far…

So, the usual It’s not us, look somewhere else. So the app owner asks me for further guidance. I find it’s helpful to create a test that will convince the other party of the error with their service. And what is one test I would have liked to have seen but didn’t cnoduct? A packet trace on the server itself. So I write

I would suggest they (or you) do a packet trace on the server itself to prove to themselves that this server is not behaving ini an acceptable way, network-wise, if they see that same ICMP packet which I see.

The resolution

This kind of thing can often come to a stand-off, or many days can be wasted as an issue gets escalated to sufficiently competent technicians. In this case it wasn’t so bad. A few hours later the app owners write and mention that the home-grown local firewall seemed suspect to them. They dsabled it and this traffic began to work.

They are reaching out to the vendor to understand what may have happened.

Case: closed!

Conclusion

An IT mystery was resolved today – something we’ve never seen but were able to diagnose and overcome. We learned it’s sometimes a good thing to throw a wider net when seeing unexpected reset packets because maybe just maybe there is an ICMP host unreachable packet somewhere in the mix.

Most firewalls would just drop packets and you wait for a timeout. But this was a homegrown firewall running on SLES 15. So it abides by its own ways of working, I guess. So because of the RST, your connection closes quickly, not timing out as with a normal network firewall.

As always, one has to maintain an open mind as to the true source of an issue. What was working yesterday does not today. No one admits to changing anything. Finding clever ad hoc unit tests is the way forward, and don’t forget to validate the ad hoc test. We use curl a lot for these kinds of tests. A browser is a complex beast and too much of a black box.

Categories
Network Technologies

How to force snmpwalk to convert strings to numeric OIDs

Intro

It’s a little hard to find this information on the Internet, so I’m amplifying the correct answer here by using my blog.

The details

I’m not super-competent with MIBs and such, but I manage for my purposes with my basic understanding. I have access to an F5 bigip with various IPSEC tunnels on it. I want to use Zabbix to check the status of those tunnels. So I do an SMPwalk like this:

snmpwalk -v3 … -c public 127.0.0.1 F5-BIGIP-SYSTEM-MIB::sysIpsecSpdStatTunnelState

which produces output like this line:

F5-BIGIP-SYSTEM-MIB::sysIpsecSpdStatTunnelState.”/Common/tunnel-01″.58401 = STRING: up

But I cannot take that as it is and use it in an snmpget like this:

snmpget -v3 … -c public 127.0.0.1 F5-BIGIP-SYSTEM-MIB::sysIpsecSpdStatTunnelState.”/Common/tunnel-01″.58401

That produces an error like this:

Unknown Object Identifier (Index out of range: /Common/tunnel-01 (sysIpsecSpdStatTrafficSelectorName))

So we need to convert the string into a numeric OID. But how?

The answer

Use the -On switch as an additional argument in your snmpwalk.

You will get a scary long OID, but it will at least be numeric.

Gonig further

You can then deconstruct the response and reconstitute the section at the beginning with a nice name. For my F5 example

.1.3.6.1.4.1.3375.2.1.2.17.1.3.1.14

becomes

F5-BIGIP-SYSTEM-MIB::sysIpsecSpdStatBytes

I think. Then preserve the following digits as is.

Conclusion

We have shown how to output a numeric OID from an snmpwalk. This, specifically, is sueful in converting a string embedded in the output into a numeric OID, which may then be used by other SNMP applications such as Zabbix which may or may not have the MIB file loaded. The secret is simply to use the -On switch in snmpwalk.

References and related

My Zabbix FAQ – questions you wish they had answered, can be very helpful

Categories
Admin Network Technologies TCP/IP

Verizon Airspeed Hotspot uses ipv6 and interferes with VPN client Global Protect

Intro

The headline says it all. I got my shiny brand new Verizon hotspot from Walmart. I managed to activate it and add it to my Verizon account (not super easy, but after a few stumbles it did work.) I tried it out my home PC – works fine. I tried it out on my work PC. No good. My Global Protect connection was unstable. It connects for about a minute, then disconnects, then connects, etc. Basically unusable.

The details

I have heard of possible problem with the GP client (version 5.2.11) and IPv6. So I looked to see if this hotspot could be handing out IPv6 info. Yes. It is. But is that really making a difference? I concocted a simple test. I disabled IPv6 on my Wi-Fi adapter, then re-tested the GP client. The connection was smooth as glass! No disconnects!

Disable ipv6 on your Wi-Fi adapter

Bring up a powershell as administrator. Then:

get-netadapterbinding -componentid ms_tcpip6

will show you the current state of ipv6 on your adapters.

disable-netadapterbinding -Name “Wi-Fi” -ComponentID ms_tcpip6

will disable ipv6 on your Wi-Fi. And

enable-netadapterbinding -Name “Wi-Fi” -ComponentID ms_tcpip6

will re-enable it.

ipconfig /all output

For the record, here are some interesting bits from running ipconfig /all:

Wireless LAN adapter Wi-Fi:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) Dual Band Wireless-AC 8265
Physical Address. . . . . . . . . : 0C-BD-94-98-11-5B
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Temporary IPv6 Address. . . . . . : 2600:1001:b004:2b78:8ab:145c:d014:2edd(Deprecated)
IPv6 Address. . . . . . . . . . . : 2600:1001:b004:2b78:2cc0:71b0:7f1e:a973(Deprecated)
Link-local IPv6 Address . . . . . : fe80::2cc0:71b0:7f1e:a973%30(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.1.103(Preferred)

Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Thursday, April 21, 2022 4:54:04 PM
Lease Expires . . . . . . . . . . : Friday, April 22, 2022 4:54:04 AM
Default Gateway . . . . . . . . . : 192.168.1.1
DHCP Server . . . . . . . . . . . : 192.168.1.1
DHCPv6 IAID . . . . . . . . . . . : 302832932
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-28-89-F6-8E-B0-5C-DA-E6-09-0A
DNS Servers . . . . . . . . . . . : fe80::50ae:caff:fea8:1dbc%30
192.168.1.1
NetBIOS over Tcpip. . . . . . . . : Enabled

But, having done all that, I can only occasionally connect to GP. It seems to work slightly better at night. ipv6 does not seem to be the sole hiccup. No idea what the recipe for reliable success is. If I ever learn it I will publish it. Meanwhile, my phone’s hotspot, also VErizon, also handing out ipv6 info, usually permits me to connect to GP. It’s hard to see the difference.

Conclusion

The Verizon Airspeed Hotspot sends out a mix of IPv6 and IPv4 info to dhcp clients. Palo Alto Networks’ Global Protect client does not play well with that setup and wil not have a stable connection.

I do not think there is a way to disable IPv6 on the hotspot. However, for those with admin access it can be disabled on a Windows PC. And then GP will work just fine. Or not.

Oh, and by the way, otherwise the Airspeed works well and is an adequate solution where you need a good reliable hotspot. Well, in fact, don’t expect reliability like you have from a wired connection. After a couple hours, all users just got dropped for no apparent reason whatsoever.

Categories
Admin DNS Firewall Network Technologies TCP/IP

The IT detective agency: named times out tcp queries

Intro

I’ve been reliable running ISC’s BIND server for eons. Recently I had a problem getting my slave servers updated after a change to the primary master. What was going on there?

The details

This was truly a team effort. I saw that the zone file had differing serial numbers on the master versus the slave servers. My attempts to update via an rndc refresh zone was having no effect.

So I tried a zone transfer by hand: dig axfr drjohnstechtalk.com @50.17.188.196

That timed out!

Yet, regular dns qeuries went through fine: dig ns drjohnstechtakl.com @50.17.188.196

I thought about it and remembered zone transfers use TCP whereas standard queries use UDP. So I tried a TCP-based simple query: dig +tcp ns drjohnstechtalk.com @50.17.188.196. It timed out!

So of course one suspects the firewall, which is reasonable enough. And when I looked at the firewal I found some funny drops, though i cuoldn’t line them up exactly with my failed tests. But I’m not a firewall expert; I just muddle through.

The next day someone from the DNS group asked how local queries behaved? Hmm. never tried that. So I tried it: dig +tcp ns drjohnstechtalk.com @localhost. That timed out as well! That was a brilliant suggestion as we now could eliminate the firewall and all that complexity from the equation. Because I had tried to do packet traces on two different machines at the same time and line up the results. It wasn’t easy.

The whole issue was very concerning to us because we feared our secondaries would be unable to pudate their slave zones and ultimately time them out. The result would be devastating.

We have support, fortunately. A company that hearkens frmo the good old days, with real subject matter experts. But they’re extremely busy. We did not get a suggestion for a couple weeks. But eventually we did. They had seen this once before.

named time to respond to TCP-based queries

The above graph is from a Zabbix monitor showing how long it takes that dns server to respond to that simple query. 6 s is a time-out. I actually set dig to timeout at 2 s, but in wall-clock time it actually takes 6 s.

The fix

We removed this line from the options block of named.conf:

keep-response-order {any; };

The info fmo the experts is that most likely that was configured as a workaround to CVE-2019-6477 but that issue was fixed since 9.15.6.

Conclusion

We encountered the named daemon in a situation where it was unable to respond to TCP-based DNS queries and hence unable to do zone transfers. So although most queries use UDP, this was a serious issue for us and prevented zones from being updated on all authoritative nameservers.

As is the case with so many modern IT problems, the effect was not black or white. Failures were intermittent, and then permanent. A restart fixed ths issue (forgot to mention so far!). But we involved an expert to find the root cause and it was the presence of a single configuration line in our named.conf. After removing that all was good.

Categories
Admin Network Technologies Raspberry Pi

A nice alternative to speedtest.net for the DIY linux crowd

Intro

I was building some infrastucture around automated speedtest.net tests using speedtest-cli. I noticed the assigned servers keep changing, some servers are categorized as malicious sources, some time out if tested on the hour and half-hour, and results are inconsistent depending on which server you get.

So, I saw that the speedtest-cli (linux command-line python script) has a switch for a “mini” server. When I investigated that seemed the answer to the problem – you can set up your own mini server and use that for yuor tests. I.e., control both ends of the test. great.

The speedtest.net mini server was discontinued in 2017! There’s some commercial replacement. So I thought. Forget that. I was disillusioned and then happened upon a breath of fresh air – an open source alternative to speedtest.net. Enter, librespeed.

Some details

librespeed has a command-line program whih is an obvious rip-off of speedtest-cli. In fact it is called librespeed-cli and has many similar switches.

There is also a server setup. Really, just a few files you can put on any apache + php web server. There is a web GUI as well, but in fact I am not even that interested in that. And you don’t need to set it up at all.

What I like is that with the appropriate switches supplied to librespeed-cli, I can have it run against my own librespeed server. In some testing configurations I was getting 500 Mbps downloads. Under less favorable circumstances, much less.

Testing, testing, testing

I tested between Europe and the US. I tested through a proxy. I tested from the Azure cloud to an Amazon AWS server. I tested with a single cpu linux server (good old drjohnstechtakl.com) either as server, or as the client. This was all possible because I had full control over both ends.

Some tips
  1. Play with the speedtest-cli switches. See what works for you. librespeed-cli -h will shows you all the options.
  2. Increasing the stream count can compensate for slower PING times (assuming both ends have a fast connection)
  3. It does support proxy, but
  4. Downloads don’t really work through proxy if the server is only running http
  5. Counterintuitively, the cpu burden is on the client, not the server! My servers didn’t show the slightest bit of resource usage.
  6. Corollary to 5. My 4-cpu client to 1-cpu server test was much faster than the other way around where server and client roles were reversed.
  7. Most things aren’t sensitive to upload speeds anyway so seriously consider suppressing that test with the appropriate switch. Your tests will also run a lot faster (18 seconds versus 40 seconds).
  8. Worried about consuming too much bandwidth and transferring too much data? I also developed a solution for that (will be my next blog post)
  9. So I am running a librespeed server on my little VM on Amazon AWS but I can’t make it public for fear of getting overrun.
  10. ISPs that have excellent interconnects such as the various cloud providers are probably going to give the best results
  11. It is not true your web server needs write access to its directory in my experience. As long as you don’t care about sharing telemetry data and all that.
  12. To emphasize, they supply the speedtest-cli binary, pre-built, for a whole slew of OSes. You do not and should not compile it yourself. For a standard linux VM you will want the binary called librespeed-cli_1.0.9_linux_386.tar.gz
Example files

The point of these files is to test librespeed-cli, from the directory where you copied it to, against your own librespeed server.

json-ns6

[{
"name":"ns6, Germany (active-servers)",
"server":"https://ns6.drjohnstechtalk.com/",
"id":864,
"dlURL":"backend/garbage.php",
"ulURL":"backend/empty.php",
"pingURL":"backend/empty.php",
"getIpURL":"backend/getIP.php",
"sponsorName":"/dev/null/v",
"sponsorURL":"https://dev.nul.lv/"
}]

wrapper.sh

#!/bin/sh
# see ./librespeed-cli -help for all the options
./librespeed-cli --local-json json-ns6 --server 864 --simple --no-upload --no-icmp --ipv4 --concurrent 4 --skip-cert-verify

Purpose: are we getting good speeds?

My purpose in what I am constructing is to verify we are getting good download speeds. I am not trying to hit it out of the park. That consumes (read, wastes) a lot of resources. I am targeting to prove we can achieve about 150 Mbps downloads. I don’t know anyone who can point to 150 Mbps and honestly say that’s insufficient for them. For some setups that may take four simultaneous streams, for others six. But it is definitely achievable. By not going crazy we are saving a lot of data transfers. AWS charges me for my network usage. So a six stream download test at 150 Mbps (Megabits per second) consumes about 325 MBytes download data. If you’re not being careful with your switches, you can easily nudge that up to 1 GB downloads for a single test.

My librespeed client to server tests ran overnight alongside my old approach using speedtest. The speedtest results are all over the place, with a bunch of zeroes for whatever reason, as is typical, while librespeed – and mind you this is from a client in the US, going through a proxy, to a server in Europe – produced much more consistent results. In one case where the normal value was 130 mbps, it dipped down to 110 mbps.

Testing it out at home
Test from a home PC against my own librespeed server

I made my test URL on my AWS server private, but a public one is available at https://librespeed.de/

At home of course I want to test with a Raspberry Pi since I work with them so much. There is indeed a pre-built binary for Raspberry Pi. It is https://github.com/librespeed/speedtest-cli/releases/download/v1.0.9/librespeed-cli_1.0.9_linux_armv7.tar.gz

The problem with speedtest in more detail

There were two final issues with speedtest that were the straws that broke the camel’s back, and they are closely related.

When you resolve www.speedtest.net it hits a Content Distribution Network (CDN), and the returned results vary. For instance right now we get:

;; QUESTION SECTION:
;www.speedtest.net. IN A
;; ANSWER SECTION:
www.speedtest.net. 4301 IN CNAME zd.map.fastly.net.
zd.map.fastly.net. 9 IN A 151.101.66.219
zd.map.fastly.net. 9 IN A 151.101.194.219
zd.map.fastly.net. 9 IN A 151.101.2.219
zd.map.fastly.net. 9 IN A 151.101.130.219

Note that you can also run speedtest-cli with the –list switch to get a list of speedtest servers. So in my case I found some servers which procuced good results. There was one where I even know the guy who runs the ISP and know he does an excellent job. His speedtest server is 15 miles away. But, in its infinite wisdom, speedtest sometimes thinks my server is in Lousiana, and other times thinks it’s in New Jersey! So the returned server list is completely different for the two cases. And, even though each server gets assigned a unique number, and you can specify that number with the –server switch, it won’t run the test if that particular server wasn’t proposed to you in its initial listing. (It always makes a server listing call whether you specified –list or not, for its own purposes as to which servers to use.)

I tried to use some of tricks to override this behaviour, but short of re-writing the whole thing, it was not going to work. I imagined I could force speedtest-cli to always use a particular IP address, overwriting the return from the fastly results, but getting that to work through proxy was not feasible. On the other hand if you suck it up and accept their randomly assigned server, you have to put up with a lot of garbage results.

So set up your own server, right? The –mini switch seems built to accommodate that. But the mini server was discontinued in 2017. The commercial replacement seemed to have some limits. So it’s dead end upon dead end with speedtest.net.

Conclusion

An open source alternative to speedtest.net’s speedtest-cli has been identified and tested, both server and client. It is librespeed. It gives you a lot more control than speedtest, if that is your thing and you know a smidgeon of linux.

References and related

(2024 update) Cloudflare has a really nice speed test without all the bloat: https://speed.cloudflare.com/

Just to do your own test with your browser the way you do with speedtest.net: https://librespeed.de/

librespeed-cli: https://github.com/librespeed/speedtest-cli

librespeed-cli binaries download page: https://github.com/librespeed/speedtest-cli/releases

RPi version of librespeed-cli: https://github.com/librespeed/speedtest-cli/releases/download/v1.0.9/librespeed-cli_1.0.9_linux_armv7.tar.gz

The RPi I use for automatically power cycling my cable modem is hard-wired to my router and makes for an excellent platform from which to conduct these speedtests.

librespeed server: https://github.com/librespeed/speedtest . You basically can git clone it (just a bunch of js and php files) from: https://github.com/librespeed/speedtest.git

If, in spite of every positive thing I’ve had to say about librespeed, you still want to try the more commercial speedtest-cli, here is that link: https://www.speedtest.net/apps/cli

In this context a lot of people feel iperf is also worth exploring. I think it is a built-in linux command.

To kick it up a notch for professional-class bandwidth and availability measurements, ThousandEyes is the way to go. This discussion is very enlightening: https://www.thousandeyes.com/blog/caveats-of-traditional-network-tools-iperf

Categories
Admin Network Technologies

The IT Dective Agency: someone stole my switch port

Intro

A complex environment produces some too-strange-to-be-true type of issues. Yesterday was one of those days. Let me try to set this up like a script from a play.

The setting

A non-descript server room somewhere in the greater New York City area.

The equipment

A generic security appliance we’ll call ThousandEyes PX, just to make up a name.

Cisco Nexus 7K plus a FEX

The players

Dr John – the protagonist

PCT – a generic network vendor

Florence Ranjard – an admin of ThousandEyes PX in France

Shake Abel – a server room resource in PA

Cloud Johnson – someone in Request Management

Bill Otto – a network guy at heart, forced to deal with his now vendor-managed network via ITIL

The processes

ITIL – look it up

Scene 1

An email from Dr John….

Hi Bill,

Thanks.

Well that’s messed up, as they say. I wouldn’t believe it if I hadn’t seen it for myself. Someone, “stole” our port and assigned it to a different device on a different vlan – despite the fact that it was in active use!

I guess I will try to “steal” it back, assuming I can find the IT Catalog article, or maybe with the help of Cloud.

Fortunately I have console access to the Fireeye. I artificially introduced traffic, which I see reflected in the port statistics. So I know the ThousandEyes is still connected to this port, despite the wrong vlan and description.

Regards,

Dr John

Scene 2

One Week earlier

Siting at home due to Covid, Florence realizes she can no longer access the management port of her group’s ThousandEyes security appliance located in another continent. She beings to investigate and even contacts the vendor…

Scene 3

This exciting script is to be contiued, hopefully