Categories
Consumer Tech

Consumer Tech: Dyson Animal vacuum cleaner stops and starts

Intro

I am kind of annoyed that my simple problem with a Dyson Animal vacuum cleaner took way more Internet research than should have been the case to resolve. I hope to spare someone else that grief.

The details

Actually this is a friend’s vacuum cleaner. I don’t think it would have played out this way had I been the exclusive user of it. She noted that it stops and starts. It goes for a few seconds and stops. Then you can start it up for a few more seconds. And so on.

More clues

This model shows the charge remaining on the battery. Still two bars out of three, so that’s good. It’s been properly charged. After examining it I get the gut feeling that the brush motor shouldn’t draw all that much power.

We also note that when the long tube is removed and its just acting as a short hand-held device, it doesn’t stop.

After doing some Internet research (not finding an exact hit for this model), I am inspired to take things apart and check all the filters. On other models the filters can do you in. This one has been very lightly used to date and so the filters are remarkably clean. No filter issue.

The solution

So I check the brush and that area. It is simply clogged with gray dirt. I begin to clear it out with my fingers – seems no easy way to do it – until all the gray dirt is gone from the brush area where it goes up into the tube.

The results

The results are in. Works like a champ now!

Comment

I am more used to a more obscure brand of vacuum cleaner, Miele. It has a simple orange status thing that visibly shows you when it can’t suck in dirt or what not due to a full bag, a clog, whatever. In addition you can pretty much hear the pitch of the motor’w whine increase when there’s a clog. With the Dyson I didn’t notice that pitch change, nor was there any indicator of a cloged system. So Dyson’s design is faulty.

Conclusion

A Dyson Animal vacuum cleaner was only running for a few seconds at a time. After a few more seconds it could run again. The power level showed two of three led bars. The filters were all clean, but the brush head was clogged with dirt. Cleaning that out fixed everything nicely.

Dyson’s design has to be faulted for not making this clogged situation more evident. Strange, coming from a company which obviously prides itself on its innovative design.

Categories
Security Web Site Technologies

Setting up lftp to do ftp over ssl

Intro

I have seen too much advice on the Internet for resolving problems when one encounters erroers of this sort when using the lftp client on linux:

mput: myfile.log: Fatal error: Certificate verification: unable to get local issuer certificate (3E:...)

or this one:

mput: myfile.log: Fatal error: Certificate verification: unable to get issuer certificate (4A:...)

You research that and get a lot of htis that recommend to

“set ssl:verify-certificate false inside the lftp command”

But, you know, security-wise, that isn’t such a hot approach. And you can do better with just a bit more effort.

The details

Examine what certificate the ftp server is using with this openssl command:

$ openssl s_client -showcerts -connect example.com:21 -starttls ftp

The privatre pki scenario

I’m imagining a scenario where yuo are in a world where a private pki reigns. In that case you want to just make sure lftp knows where to find the private root CA and possibly the intermediate CA.

To be continued…

Categories
Linux Raspberry Pi

Raspberry Pi + LED Matrix Display project

Intro

A friend bought a bunch of parts and thought we could work on it together. He wanted to reproduce this project: Raspberry Pi Audio Spectrum Display – Hackster.io

So of interest here is that it involved a 64×64 LED matrix display, and a “hat” sold by Adafruit to supposedly make things easier to connect.

Now I am not a hardware guy and never pretended to be. When we realized that it required soldering, I bought those supplies but I didn’t want to be the one to mess things up so he volunteered for that. And yes, it was a mess.

Equipment

See later on in the post for the equipment we used.

The story continues

Who knew that gold on the PCB could be ruined so easily after you’ve changed your mind about a soldered joint and decided to undo what you’ve done? The experience pretty much validated my whole approach to staying away from soldering. So we ruined that hat and had to order another one. While we waited for it I developed a certain strtategy to deal with the shortcomings of the partially destroyed hat. See the section at the bottom entitled Recipe for a broken hat.

Comments on the project

I don’t think the guy did a good job. Details left out where needed, extra stuff added. For instance you don’t need to order that Firebeetle – it’s never used! Also, I’ve been told that his python isn’t very good either. But in general we followed his skimpy instructions. So we ordered the LED matrix from DFRobot, not Adafruit, and I think it’s different. In our case we did indeed follow the project suggestion to wire the 2×8 pin as shown in their (DFRobot’s) picture, leaving out the white wire. Once we soldered the white wire to the hat’s gpio pin 24, we were really in business. What we did not need to do is to solder pin E to either pin 8 or 16. (This is something you apparently need to do for the Adafruit LED.) In our testing it didn’t seem to matter whether or not those connections were made so we left it out on our second hat.

You think he might have mentioned just how much soldering is involed. Let’s see you have 2×20 connector, a 2×8 connector plus a single gpio connection = 57 pins to solder. Yuck.

What we got to work

Deploying images on these LED displays is cool. You just kind of have to see it. It’s hard to describe why. The picture below does not do it justice. Think stadium scoreboard.

In rpi-rgb-led-matrix/utils directory we followed the steps in the README.md file to compile the LED viewer:

sudo apt-get update
sudo apt-get install libgraphicsmagick++-dev libwebp-dev -y
make led-image-viewer
                    
#!/bin/sh
# invert images because the sound stuff is otherwise upside-down
sudo led-image-viewer –led-pixel-mapper=”U-mapper;Rotate:180″ –led-gpio-mapping=adafruit-hat –led-cols=64 –led-rows=64 /home/pi/walk-in-the-woods.jpg
Walk in the woods

Do we have flicker? Just a tiny bit. You wouldn’t notice it unless yuo were staring at it for a few seconds, and even then it’s just isolated to a small section of the display. Probably shoddy soldering – we are total amateurs.

Tip for your images

Consider that you only have 64 x 64 pixles to work with. So crop your pictures beforehand to focus on the most interesting aspect – people if there are people in the picture (like we’ve done in the above image), specifically faces if there are faces. Otherwise everything will just look like blurs and blobs. You yourself do not have to resize your pictures down to 64 x 64 – the led software will do the resizing. So just focus on cropping down to a square-sized part of the picture you want to draw attention to.

Real-time audio

So my friend got a USB microphone. I developed the following script to make the python example work with real-time sounds – music playing, conversatiom, whatever. It’s really cool – just the slightest lag. But, yes, the LEDs bounce up in response to louder sounds.

So in the directory rpi-rgb-led-matrix/bindings/python/samples I created the script drjexample.sh.

                    

#!/bin/sh
# DrJ 6/21
# make the LED react to live sounds by use of a USB microphone
# I am too lazy to look up how to make the python program read from STDIN so I will just
# make the equivalent thing by creating test.wav as a nmed pipe. It's an old linux trick.

rm test.wav; mkfifo test.wav

# background the python program. It will patiently wait for input
sudo python spectrum_matrix.py &

# Now run ffmpeg
# see my own post, https://drjohnstechtalk.com/blog/2019/04/live-stream-to-youtube-from-a-raspberry-pi-webcam/
ffmpeg \
-thread_queue_size 4096 \
-f alsa -i plughw:1,0 \
-ac 2 \
-y \
test.wav

So note that by having inverted the image (180 degree rotation) we have the sounds bars and images both in the same direction so we can switch between the two modes.

I believe to get the python bindings to work we needed to install some additional python libraries, but that part is kind of a blur now. I think what should work is to follow the directions in the README.md file in the directory rpi-rgb-led-matrix/bindings/python, namely

sudo apt-get update && sudo apt-get install python2.7-dev python-pillow -y
make build-python
sudo make install-python

Hopefully that takes care of it. For sure you need numpy.

Future project ideas

How about a board that normally plays a slideshow, but when the ambient sound reaches a certain level – presumably because music is playing – it switches to real-time sound bar mode?? We think it’s doable.

Recipe for a broken hat

For the LED matrix display we got the DFRobot one since that’s what was linked to in the project guide. But the thing is, the reviewer’s write-up is incomplete so what you need to do involves a little guesswork.

At the end of the day all we could salvage while we wait for a new Adafruit hat to come in is the top fourth of the display! The band below it is either blank, or if we push on the cables a certain way, an unreliable duplicate of the top fourth.

The next band suffers from a different problem. Its blue is non-functional. So it’s no good…

And the last band rarely comes on at all.

OK? So we’re down to a 16 x 64 pixel useable area.

But despite all those problems, it’s still kind of cool, I have to admit! I know at work we have these digital sign boards and this reminds me of that. So first thing I did was to create a custom banner – scrolling text.

I call this display program drjexample2.sh. I put it in the directory rpi-rgb-led-matrix/examples-api-use

                    
#!/bin/sh
# DrJ – 6/21
# nice example
sudo ./demo -D1 –led-rows=64 –led-cols=64 –led-multiplexing=1 –led-brightness=50 -m25 /home/pi/pil_text.ppm

But that requires the existence of a ppm file containing the text I wanted to scroll, since I was working by example. So to create that custom PPM file I created this python script.

                    
#!/usr/bin/python3
from PIL import Image, ImageDraw, ImageFont

# our fonts
###fnt = ImageFont.truetype(“/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf”, 14)
fnt = ImageFont.truetype(“/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf”, 16)
fnt36 = ImageFont.truetype(“/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf”, 36)
fnt2 = ImageFont.truetype(“/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf”, 18)
fntBold = ImageFont.truetype(“/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf”, 40)

###img = Image.new(‘RGB’, (260, 30), color = (73, 109, 137))
img = Image.new(‘RGB’, (260, 30), color = ‘black’)

d = ImageDraw.Draw(img)
###d.text((0,0), “Baby, welcome to our world!!!”, font=fnt, fill=(255,255,0))
d.text((0,0), “Baby, welcome to our world!!!”, font=fnt, fill=’yellow’)

#img.save(‘pil_text.png’)
img.save(‘pil_text.ppm’, format=’PPM’)

Once I discovered the problem with the bands – by way of running all the demos and experimenting with the arguments a bit – I noticed this directory: rpi-rgb-led-matrix/utils. I perked right up because it held out the promise of displaying jpeg images. Anyone who has seen any of my posts know that I am constantly putting out raspberry pi based photo displays in one form or another. For instance see https://drjohnstechtalk.com/blog/2021/01/raspberry-pi-photo-frame-using-the-pictures-on-your-google-drive-ii/

or

https://drjohnstechtalk.com/blog/2020/12/raspberry-pi-photo-frame-using-your-pictures-on-your-google-drive/

Matrix Display

But see how cool it is? No? It’s a sleeping, recumbent baby. It’s like the further away from it you get, the clearer it becomes. Trust me in person it does look good. And it feels like creating one of those bright LED displays they use in ballparks.

But this same picture also shows the banding problem.

To get the picture displayed I first cropped it to make it wide and short. I wisely chose a picture which was amenable to that approach. I created this display script:

                    
#!/bin/sh
sudo led-image-viewer –led-no-hardware-pulse –led-gpio-mapping=adafruit-hat –led-cols=64 –led-rows=64 –led-multiplexing=1 /home/pi/baby-sleeping.jpg

But before that can all work, you need to compile the program. Just read through the README.md. It gives these instructions which I followed:

sudo apt-get update
sudo apt-get install libgraphicsmagick++-dev libwebp-dev -y
make led-image-viewer

It went kind of slowly on my RPi 3, but it worked without incident. When I initially ran the led-image-viewer nothing displayed. So the script above shows the results of my experimentation which seems appropriate for our particular matrix display.

How did we get here?

Just to mention it, we followed the general instructions in that project. So I guess no need to repeat the recipe here.

Slideshow

You know I’m not going to let an opportunity to create a slideshow go to waste. So i created a second appropriately horizontal image which I might effectively show in my narrow available band of 16 x 64 pixels. just to share the little script, it is here:

                    
#!/bin/sh
cd /home/pi
sudo led-image-viewer –led-no-hardware-pulse –led-gpio-mapping=adafruit-hat –led-cols=64 –led-rows=64 –led-multiplexing=1 -w6 -f baby-and-mom.jpg baby-sleeping.jpg

This infinitely loops over two pictures, displaying each for six seconds.

Start on boot

I have the slideshow start on boot using a simple technique I’ve developed. You edit the crontab file (crontab -e) and enter at the bottom:

                    
@reboot sleep 35; /home/pi/rpi-rgb-led-matrix/utils/jdrexample2.sh > drjexample.log 2>&1
Lessons learned

My friend ordered all the stuff listed on the DFRobot project page, including their 64×64 LED matrix. Probably a mistake. They basically don’t document it and refer you to Adafruit, where they deal with a 64×64 LED matrix – their own – which may or may not have the same characteristics, leaving you somewhat in limbo. Next time I would order from Adafruit.

As mentioned above that gold foil on printed circuit boards really does come off pretty easily, and then you’re hosed. Because of the lack of technical specs we were never really sure if we needed to solder the E contact to 8 or to 16 and destroyed all those terminals in the process of backing out.

I actually created custom ppm files of solid colors, red, blue, green, white, so that I could prove my suspicions about the third band. Red and green display fine, blue not at all. White displays as yellow.

Viewed close up, the LED matrix doesn’t look like much, and of course I was close up when I was working with it. But when I stepped back I realized how beautiful a brightly illuminated picture of a baby can be! The pixels merge and your mind fills in the spaces between I guess.

The original idea was to tackle sound but I got stuck on the ability to use it as a photo frame (you know me). But he wants to return to sound which I am dreading….

Testing audio

In our first tests. the audio example wasn’t working. But now it seems to be. The project guy’s python code is named spectrum_matrix.py if I recall correctly. It goes into rpi-rgb-led-matrix/bindings/python/samples. And as he says, you run it from that directory as

$ sudo python spectrum_matrix.py

But, his link to test.wav is dead – yet another deficiency in his write-up. At least in my testing not every possible WAV file may work. This one, moo sounds, does however. http://soundbible.com/grab.php?id=1778&type=wav So, it plays for a few seconds – I can hear it through earphones – and the LEDs kind of go up and down. We recorded a wav file and found that that does not work. The error reads like this:

Home directory not accessible: Permission denied
W: [pulseaudio] core-util.c: Failed to open configuration file '/root/.config/pulse//daemon.conf': Permission denied
W: [pulseaudio] daemon-conf.c: Failed to open configuration file: Permission denied
Traceback (most recent call last):
File "spectrum_matrix.py.orig", line 56, in
matrix = calculate_levels(data,chunk,sample_rate)
File "spectrum_matrix.py.orig", line 49, in calculate_levels
power = np.reshape(power,(64,chunk/64))
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 292, in reshape
return _wrapfunc(a, 'reshape', newshape, order=order)
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 2048 into shape (64,64)

Note that I had renamed the original spectrum_matrix.py to spectrum_matrix.py.orig because we started messing with it. Actually, I pretty much get the same error on the file that works; it’s just that I get it at the end of the LED show, not immediately.

Superficially, the two files differ somewhat in their recording format:

$ file ~/voice.wav moo.wav

/home/pi/voice.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
moo.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 32000 Hz

I played the voice.wav on a Windows PC – it played just fine. Just a little soft.

So what’s the essential difference between the two files? Well, something that jumps out is that the one is mono, the other stereo. Can we somehow test for that? Yes! I made the simplest possible conversion af a mono to a stereo file with the following ffmpeg command:

$ ffmpeg -i ~/voice.wav -ac 2 converted.wav

$ file converted.wav


converted.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 44100 Hz

I copy converted.wav to test.wav and re-run spectrum_matrix.py. This time it works!

Not sure how my friend produced his wave file. But I want to make one on my own. He had plugged a USB microphone into the RPi. I have done research somewhat related to this – publishing a livestream to Youtube, audio only, video grayed out. That’s in this post: https://drjohnstechtalk.com/blog/2019/04/live-stream-to-youtube-from-a-raspberry-pi-webcam/ So I am not afraid to se ffmpeg any longer. So I created this tiny script, record.sh, with my desired arguments:

                    
#!/bin/sh
# see my own post, https://drjohnstechtalk.com/blog/2019/04/live-stream-to-youtube-from-a-raspberry-pi-webcam/
ffmpeg \
-thread_queue_size 4096 \
-f alsa -i plughw:1,0 \
-ac 2 \
/tmp/ffmpeg.wav

And I ran it while speaking loudly into the mic. It ran OK. The output file comes out as

test.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 48000 Hz

And…it plays with the LEDs dancing. Great.

Audio: LED responds to live input

My friend wants the LED to respond to live input such as his stereo at home. Being a terrible python programmer but at least middling linux techie, I see a way to accomplish it without his having to touch the sample program by employing an old Unix trick: named pipes. So I create this script, drjexample.sh which combines all the knowledge we gained above into one simple script:

                    

#!/bin/sh
# DrJ 6/21
# make the LED react to live sounds by use of a USB microphone
# I am too lazy to look up how to make the python program read from STDIN so i will just
# make the equivalent thing by creating test.wav as a nmed pipe. It's an old linux trick.

rm test.wav; mkfifo test.wav

# background the python program. It will patiently wait for input
sudo python spectrum_matrix.py &

# Now run ffmpeg
# see my own post, https://drjohnstechtalk.com/blog/2019/04/live-stream-to-youtube-from-a-raspberry-pi-webcam/
ffmpeg \
-thread_queue_size 4096 \
-f alsa -i plughw:1,0 \
-ac 2 \
-y \
test.wav

So a named pipe is just that. Instead of the pipe character we know and love, you coordinate process output from the first process with process input of the second process by way of this special file. The operating system does all the hard work. But it works just as though you had used the | character.

Best of all, this script actually works, to wit, the LED is now responding to live input. I see it jump when I say test into the microphone. Unknown to me is if it will play for extended periods of time – it would be easy for one process to output faster than the other can input for instance, so a backlog builds up. The responsiveness is good, I would guess around no more than 200 ms lag.

Equipment

We used the equipment from this post, except the Firebeetle. You know, that’s just another reason I consider that post to be a sloppy effort. Who lists a piece of equipment that they don’t use?? And again, next time I would rather search for an LED display from Adafruit. We use an RPi 3 and installed the image on a micro SD card with the new-ish Raspberry Pi imager, which works just great: Introducing Raspberry Pi Imager, our new imaging utility – Raspberry Pi

Oh, plus the soldering iron and solder. And a multi-colored ribbon cable with female couplers at one end and a 2×8 connector on the other. Not sure if that came with the LED or not since I didn’t order it.

Our power supply is about 5 amps and plugs into the hat. We do not need power for the RPi.

References and related

This post makes it seem like a walk in the park. Our experience is not so much. Raspberry Pi Audio Spectrum Display – Hackster.io

People seem to like this Raspberry Pi photo frame post I did which draws photos from your Google drive. https://drjohnstechtalk.com/blog/2020/12/raspberry-pi-photo-frame-using-your-pictures-on-your-google-drive/

Introducing Raspberry Pi Imager, our new imaging utility – Raspberry Pi – for putting the operating system formerly known as Raspbian onto a micro SD card.

test.wav (Use this moo wav file and rename to test.wav): http://soundbible.com/grab.php?id=1778&type=wav

Useful ffmpeg commands: 20+ FFmpeg Commands For Beginners – OSTechNix

How I figured out hot to livestream audio to YouTube without video from a RPi using ffmpeg is documented here: https://drjohnstechtalk.com/blog/2019/04/live-stream-to-youtube-from-a-raspberry-pi-webcam/

Github project for this effort (not completely working as of yet):

Categories
Admin Firewall Proxy

Checkpoint SYN Defender: what you don’t know can hurt you

Intro

Our EDI group hails me last Friday and says they can’t reach their VANs, or at best intermittently. What to do, what to do… I go on the offensive and say they have to stop using FTP (and that’s literal FTP, not sftp, not FTPs, just plain old FTP), it’s been out of date for at least 15 years.

But that wasn’t really helping the situation, so I had to dig a lot deeper. And frankly, I was coincidentally having intermittent issues with my scripted speedtests. Could the two be related?

The details

To be continued, hopefully.

References and related

VAN: Value Added Network

Categories
Admin Network Technologies

The IT Dective Agency: someone stole my switch port

Intro

A complex environment produces some too-strange-to-be-true type of issues. Yesterday was one of those days. Let me try to set this up like a script from a play.

The setting

A non-descript server room somewhere in the greater New York City area.

The equipment

A generic security appliance we’ll call ThousandEyes PX, just to make up a name.

Cisco Nexus 7K plus a FEX

The players

Dr John – the protagonist

PCT – a generic network vendor

Florence Ranjard – an admin of ThousandEyes PX in France

Shake Abel – a server room resource in PA

Cloud Johnson – someone in Request Management

Bill Otto – a network guy at heart, forced to deal with his now vendor-managed network via ITIL

The processes

ITIL – look it up

Scene 1

An email from Dr John….

Hi Bill,

Thanks.

Well that’s messed up, as they say. I wouldn’t believe it if I hadn’t seen it for myself. Someone, “stole” our port and assigned it to a different device on a different vlan – despite the fact that it was in active use!

I guess I will try to “steal” it back, assuming I can find the IT Catalog article, or maybe with the help of Cloud.

Fortunately I have console access to the Fireeye. I artificially introduced traffic, which I see reflected in the port statistics. So I know the ThousandEyes is still connected to this port, despite the wrong vlan and description.

Regards,

Dr John

Scene 2

One Week earlier

Siting at home due to Covid, Florence realizes she can no longer access the management port of her group’s ThousandEyes security appliance located in another continent. She beings to investigate and even contacts the vendor…

Scene 3

This exciting script is to be contiued, hopefully

Categories
Admin Linux Network Technologies

Speedtest automation: what they don’t tell you

Intro

I began to implement the autmoation of speedtest checks. I was running the jobs every 10 minutes, but we noticed something flaky in the results. On the hour and on the half hour the tests seemed to be garbage. What’s going on?

Our findings

Well, if you use a scheduler to run a speedtest every 10 minutes, it will start exactly on the hour and exactly on the half-hour, amongst other start times. We were running it eight times to test eight different paths. Only the last two were returning reliable results. The early ones were throwing errors. So I introduced an offset to run the jobs at 2,12,22,32,42,52 minutes. And with this offset, the results became much more reliable.

The inevitable conclusion is that too many other people are running tests exactly on the hour and half hour. A single run takes roughly 30 seconds to complete. And it must be that the servers which speedtest rely on are simply overwhelmed and refuse to do more tests.

References and related

There is a linux script written in python that implements the full speedtest. https://pypi.org/project/speedtest-cli/ It really works, which is cool.

But as well there is a RPM package you can get from the speedtest site itself.

Categories
Network Technologies

When is a switch not a switch: when it’s a Cisco Nexus

Intro

I don’t think we ask a lot of our switches. A packet comes in on this port and goes out on this other port. That sort of thing. But, apparently, when the switch is a Cisco Nexus, that is asking too much. Maybe it will go to that other port. Let’s see, it depends who sent it. So then again, maybe not. So maybe device A can ping device B. Device B cannot ping device A. But it can ping device C and device C can ping A. Or maybe Jack can ping device 1 but not device 2 on the same subnet. Yet Jill can ping device 2 but not device 1!

Yes. Those are not theoretical examples, but, sadly, actual examples lifted from recent massive debugging sessions I’ve participated in, which, eventually, focused on the Nexus switch not treating all traffic as it should. Another example: larger packets were not getting through on the same vlan.

Sometimes the Cisco support engineer is too clever by half. We had one find a few errors on supervisor board 5 so we switched to board 6. That did nothing. We finally noticed that all problems were related to FEXes connected to slot 8. As a test we plugged a FEX into a different slot and, voila, things began to work better. But the Cisco guy did not see any errors on slot 8.

Well, you only get the same Cisco engineer for a limited time. Then their shift turns over and you get to explain the whole problem all over again to the next one. Yeah, supposedly they’re briefed, but ni reality not so much. So finally the second engineer was a little more pragmatic than the first too-clever guy. This is a quote from her: “I do not see errors on that module, but logically, it has to be the problem.” This of course is after we presented all the evidence to her. So we ordered an RMA, replaced that board and yes everything began to work.

I’ve been involved in two such massive debugging session over the last four months. No one from the Cisco side is saying it, but I will. These Nexus switches define flows behind the scenes. They can probably do some very clever things with these flows. But it is no longer a simple switch. And, sadly, when the flows don’t all work, no one at Cisco is prepared to look for that issue or even consider it as a possibility. They look for obscure errors. And, heaven forbid, they are, I guess, incapable of proving to themselves their switch as at fault – eating packets – by doing what any first semester network class would have, namely, mirroring some of the ports and examining the traffic for themselves. It is instead up to the customer to prove the switch is at fault.

So in my first debugging session, which lasted about 16 hours, Cisco did not really find the errors that would lead them to believe slot 8 was a problem. Yet its replacement fixed everything. In fairness, in the second debugging session, which only lasted five hours or so, they did see errors which justified a replacement. But at no time did they ever use the word flow or offer in detail how the errors they saw could have produced the weird results we were seeing (that was the Jack and Jill example from above).

Conclusion

We guys who run servers would like to think a switch is a switch is a switch. But Cisco is requiring us to up our game, even if they don’t up theirs. Their Nexus switches absolutely can eat some of your network traffic while passing other over the same ports. I’d like to call that flows, even if they refuse to use that term.

Categories
Admin Network Technologies

What is the one DHCP problem managed network providers never recognize?

Answer: the one where their switch eats the DHCPDISCOVER packets. And the zmzaing thing is they never learn. And the second amazing thing is that they actually don’t apply the most basic networing debugging techniques when such a problem occurs. I’m talking your basic, DHCPDISCOVER packet goes to yuor switch, same DHCPDISCOVER packet never arrives to the DHCP server on same switch. We know it to be the case, but, to help convince yourself that your switch is eating the pakcets, do networky things like create a span port of the DHCP server’s port to prove to yourself that no DHCP requests are coming in. And yet, they are never prepared to do that, to propose that. So instead indirect proxies are used to draw the conclusion.

I’ve been involved in three of four such debugging sessions. They take hours. I took notes when it happened again this weekend. I guess that setup is pretty typical of how it plays out. A data center was moved, including a DHCP server. The new data center has a MAN network to the old one. All IPs were preserved. When they turned on the moved DHCP server DHCP lease were no longer getting handed out. In fact it was worse than that. With the moved DHCP sever turned off, most DHCP leases were working. But with it on, that’s when things really began to go south!

Here’s the switch port they noted for the iDRAC:

sh run int gi1/0/24
Building configuration…
Current configuration : 233 bytes
!
interface GigabitEthernet1/0/24
description --- To-cnshis01-iDRAC - iDRAC
switchport access vlan 202
switchport mode access
logging event link-status
speed 100
duplex full
spanning-tree portfast
ip dhcp snooping trust
end

The first line of course if the IOS command. OK, so they had that on the iDRAC, right. But on the actual server port they had this:

sh run int gi1/0/23
Building configuration…
Current configuration : 258 bytes
!
interface GigabitEthernet1/0/23
description --- To-cnshis01-Gb1 - Gb1
switchport access vlan 202
switchport mode access
logging event link-status
spanning-tree portfast
service-policy input PMAP_COS_REMARK_IN
service-policy output PMAP_COS_OUT
end

I basically told them cheekily up front that this is usually a network switch problem and that they have to play with the DHCP snooping enable setting.

And I have to say that the usual hours of debugging were short-circuited this time as they seemed to believe me, and simply experimented by adding

ip dhcp snooping trust

to the DHCP server’s main port. We immediately began seeing DHCPDISCOVER pakcets come in to the DHCP server, and the team testified that people were getting leases.

Final mystery explained

Now why were things behaving really badly – no leases – when the DHCP server was up but no DHCPDISCOVER requests were getting to it? I have the explanation for that as well. You see theer is a standby DHCP server which is designed for failure of the primary DHCP server. But not for this type of failure! That’s right. There is an out-of-band (by that I mean not carried over DHCP ports like UDP port 67) communication between standby and primary which tells the standby Hey, although you got this DHCPDISCOVER request, ignore it becasue the primary is active and will serve it! And meanwhile, as we have said, the primary wasn’t getting the requests at all. Upshot: no one gets leases.

Just to mention it

My second-to-last debugging session of this sort was a little different. There they mentioned that there was a “global setting” which governed this DHCP snooping on the switch. So they had to do something with that (enable or disable or something). So there was no issue with the individual switch ports. For me that’s just a variation on the same theme.

What’s the idea behind this feature?

Having done a total of zero minutes of research on the topic, I will anyway weigh in with my opinion! Suppose someone comes along and plugs in a consumer grade home router into your network. It’s probably going to act as a rogue DHCP server. Imagine the fun trying to debug that situation? We’ve all been there… These rogue devices are probably fairly common. So if your corporate switch doesn’t suppress certain DHCP packets from ports where they are not expected, then this rogue device will begin to take down your subnet and totally bewilder everyone. I imagine this setting that is the topic of this blog post stems from trying to suppress all unknown DHCP packets in advance. Its just that sometimes the setting is taken too far and, e.g., a firewall which relays DHCP requests is also getting its DHCP packets suppressed.

Conclusion

I normally would have presented this as part of my IT Detective series. But I feel this is more like a lament about the sad state of affairs with our network providers. And though I’ve seen this issue about four times in the past 12 months, they always act like they have no idea what we’re talking about. They’ve never encountered this problem. They have no idea how to fix it. And they have no idea how to further debug it.. What steps does the customer wish?

References and related

Juat because I mentioned it, here’s on of those IT Detective Agency blog posts: The IT Detecive Agency: web site not accessible

Categories
Admin Web Site Technologies

TCL iRule program with comments for F5 BigIP

Intro

A publicity-adverse colleague of mine wrote this amazing program. I wanted to publish it not so much for what it specifically does, but as well for the programming techniques it uses. I personally find i relatively hard to look up concepts when using TCL for an F5 iRule.

Program Introduction

Test

                    
# RULE_INIT is executed once every time the iRule is saved or on reboot. So it is ideal for persistent data that is shared accross all sessions.
# In our case it is used to define a template with some variables that are later substituted

when RULE_INIT {
# "static" variables in iRules are global and read only. Unlike regular TCL global variables they are CMP-friendly, that means they don't break the F5 clustered multi-processing mechanism. They exist in memory once per CMP instance. Unlike regular variables that exist once per session / iRule execution. Read more about it here: https://devcentral.f5.com/s/articles/getting-started-with-irules-variables-20403
#
# One thing to be careful about is not to define the same static variable twice in multiple iRules. As they are global, the last iRule saved overwrites any previous values.
# Originally the idea was to load an iFile here. That's also the main reason to even use RULE_INIT and static variables. The reasoning was (and I don't even know if this is true), that loading the iFile into memory once would have to be more efficient than to do it every time the iRule is executed. However, it is entirely possible that F5 already optimized iFiles in a way that loads them into memory automatically at opportune times, so this might be completely unnecessary.
# Either way, as you can tell, in the end I didn't even use iFiles. The reason for that is simply visibility. iFiles can't be easily viewed from the web UI, so it would be quite inconvenient to work with.
# The template idea and the RULE_INIT event stayed, even though it doesn't really serve a purpose, except maybe visually separating the templates from the rest of the code.
#
# As for the actual content of the variable: First thing to note is the use of  {} to escape the entire string. Works perfectly, even though the string itself contains braces. TCL magic.
# The rest is just the actual PAC file, with strategically placed TCL variables in the form of $name (this becomes important later)

            set static::pacfiletemplate {function FindProxyForURL(url, host)
{
            var globalbypass = "$globalbypass";
            var localbypass = "$localbypass";
            var ceglobalbypass = "$ceglobalbypass";
            var zpaglobalbypass = "$zpaglobalbypass";
            var zscalerbypassexception = "$zscalerbypassexception";

            var bypass = globalbypass.split(";").concat(localbypass.split(";"));
            var cebypass = ceglobalbypass.split(";");
            var zscalerbypass = zpaglobalbypass.split(";");
            var zpaexception = zscalerbypassexception.split(";");

            if(isPlainHostName(host)) {
                        return "DIRECT";
            }

            for (var i = 0; i < zpaexception.length; ++i){
                        if (shExpMatch(host, zpaexception[i])) {
                                   return "PROXY $clientproxy";
                        }
            }

            for (var i = 0; i < zscalerbypass.length; ++i){
                        if (shExpMatch(host, zscalerbypass[i])) {
                                   return "DIRECT";
                        }
            }

            for (var i = 0; i < bypass.length; ++i){
                        if (shExpMatch(host, bypass[i])) {
                                   return "DIRECT";
                        }
            }

            for (var i = 0; i < cebypass.length; ++i) {
                        if (shExpMatch(host, cebypass[i])) {
                                   return "PROXY $ceproxy";
                        }
            }

            return "PROXY $clientproxy";
}
}

            set static::forwardingpactemplate {function FindProxyForURL(url, host)
{
            var forwardinglist = "$forwardinglist";
            var forwarding = forwardinglist.split(";");

            for (var i = 0; i < forwarding.length; ++i){
                        if (shExpMatch(host, forwarding[i])) {
                                   return "PROXY $clientproxy";
                        }
            }

            return "DIRECT";
}
}
}

# Now for the actual code (executed every time a user accesses the vserver)
when HTTP_REQUEST {
    # The request URI can of course be used to differentiate between multiple PAC files or to restrict access.
    # So can basically any other request attribute. Client IP, host, etc.
            if {[HTTP::uri] eq "/proxy.pac"} {

                        # Here we set variables with the exact same name as used in the template above.
                        # In our case the values come from a data group, but of course they could also be defined
                        # directly in this iRule. Using data groups makes the code a bit more compact and it
                        # limits the amount of times anyone needs to edit the iRule (potentially making a mistake)
                        # for simple changes like adding a host to the bypass list
                        # These variables are all set unconditionally. Of course it is possible to set them based
                        # on for example client IP (to give different bypass lists or proxy entries to different groups of users)
                        set globalbypass [ class lookup globalbypass ProxyBypassLists ]
                        set localbypass [ class lookup localbypassEU ProxyBypassLists ]
                        set ceglobalbypass [ class lookup ceglobalbypass ProxyBypassLists ]
                        set zpaglobalbypass [ class lookup zpaglobalbypass ProxyBypassLists ]
                        set zscalerbypassexception [ class lookup zscalerbypassexception ProxyBypassLists ]
                        set ceproxy [ class lookup ceproxyEU ProxyHosts ]

                        # Here's a bit of conditionals, setting the proxy variable based on which virtual server the
                        # iRule is currently executed from (makes sense only if the same iRule is attached to multiple
                        # vservers of course)
                        if {[virtual name] eq "/Common/proxy_pac_http_90_vserver"} {
                            set clientproxy [ class lookup formauthproxyEU ProxyHosts ]
                        } elseif {[virtual name] eq "/Common/testproxy_pac_http_81_vserver"} {
                            set clientproxy [ class lookup testproxyEU ProxyHosts]
                        } elseif {[virtual name] eq "/Common/proxy_pac_http_O365_vserver"} {
                            set clientproxy [ class lookup ceproxyEU ProxyHosts]
                        } else {
                            set clientproxy [ class lookup clientproxyEU ProxyHosts ]
                }

                        # Now this is the actual magic. As noted above we have now set TCL variables named for example
                        # $globalbypass and our template includes the string "$globalbypass"

                        # What we want to do next is substitute the variable name in the template with the variable values
                        # from the code.
                        # "subst" does exactly that. It performs one level of TCL execution. Think of "eval" in basically
                        # any language. It takes a string and executes it as code.
                        # Except for "subst" there are two in this context very useful parameters: -nocommands and -nobackslashes.
                        # Those prevent it from executing commands (like if there was a ping or rm or ssh or find or anything
                        # in the string being subst'd it wouldn't actually try to execute those commands) and from normalizing
                        # backslashes (we don't have any in our PAC file, but if we did, it would still work).
                        # So what is left that it DOES do? Substituting variables! Exactly what we want and nothing else.
                        # Now since the static variable is read only, we can't do this substitution on the template itself.
                        # And if we could it wouldn't be a good idea, because it is shared accross all sessions. So assuming
                        # there are multiple versions of the PAC file with different proxies or bypass lists, we would
                        # constantly overwrite them with each other.
                        # The solution is simply to save the output of the subst in a new local variable that exists in
                        # session context only.
                        # So from a memory point of view the static/global template doesn't really gain us anything.
                        # In the end we have the template in memory once per CMP and then a substituted copy of the template
                        # once per session. So as noted earlier, could've probably just removed the entire RULE_INIT block,
                        # set the template in session context (HTTP_REQUEST event) and get the same result,
                        # maybe even slightly more efficient.
                        set pacfile [subst -nocommands -nobackslashes $static::pacfiletemplate]

                        # All that's left to do is actually respond to the client. Simple stuff.
                        HTTP::respond 200 content $pacfile "Content-Type" "application/x-ns-proxy-autoconfig" "Cache-Control" "private,no-cache,no-store,max-age=0"
            # In this example we have two different PAC files with different templates on different URLs
            # Other iRules we use have more differentiation based on client IP. In theory we could have one big iRule
            # with all the PAC files in the world and it would still scale very well (just a few more if/else or switch cases)
            } elseif { [HTTP::uri] eq "/forwarding.pac" } {
                set clientproxy [ class lookup clientproxyEU ProxyHosts]
                set forwardinglist [ class lookup forwardinglist ProxyBypassLists ]
            set forwardingpac [subst -nocommands -nobackslashes $static::forwardingpactemplate]
            HTTP::respond 200 content $forwardingpac "Content-Type" "application/x-ns-proxy-autoconfig" "Cache-Control" "private,no-cache,no-store,max-age=0"
            } else {
                # If someone tries to access a different path, give them a 404 and the right URL
                HTTP::respond 404 content "Please try http://webproxy.drjohns.com/proxy.pac" "Content-Type" "text/plain" "Cache-Control" "private,no-cache,no-store,max-age=0"
            }
}

To be continued...

Categories
Consumer Tech

Consumer Tech: Netflix video and audio out of sync using Firestick

Intro

While watching Mad Men last night on IMDB we saw a terribly annoying audio delay – probably about three seconds after the video. Then it cuts to the commercials and they were perfectly in sync. Then back to the show – still a 2 – 3 -second delay. Is it the brand of TV? We use a Firestick plus a Samsung LCD TV.

The solution

Well, my first inclination was to look for audio settings on either the TV or the Amazon Firestick which might be set to delay audio. I had such a setting on an old sound system, though I think it was only for a sub-second delay. But, there are no such settings on either Firestick or Samsung TV, so that’s not it.

An Internet search proved not too useful.

What I eventually realized – the in-sync commercials was a hint – is that I could rewind the show a tiny bit, and that might pop it back into sync. And…it did!

How it happens

I think it happens when I pause a show one night, to return to it a later time. It remembers where I was, which is great. But it sometimes gets out of sync this way. I have seen this with Netflix and IMDB. The common element seems to be the Firestick.