Categories
Admin Linux

Getting GNU screen to work on Windows 10 for a productive terminal multiplex environment

Intro
My jump server is getting old and they’re threatening to cut it off. A jump server is a server from which you launch CLI terminal sessions into your linux servers. Since my laptop has firewall access to all the same servers I wondered if I could build up a productive environment right within Windows 10 on my own laptop. For me this would be running GNU screen as a terminal multiplexer since I hop between terminal screens all day.

More details
Windows 10 is coming around to more fully integrating with Linux! it’s about time. WSL, windows subsystem for Linux, is all about that. And things like bash shell, ubuntu and OpenUSE Linux are available from the windows store. But that was not an option for me. My organizaiton has shut all that down.

So I thought back to my days as a Cygwin user those many years ago… Could I get GNU screen running within Cygwin environment on Windows 10? Well, yes, I can with just a few tweaks.

I think the initial Cygwin install required admin privileges, but once installed to run it does not.

Within Cygwin screen is an optional package and you can run their setup program to search and install it.

Here is my .screenrc file

defscrollback 4000
#change init sequence to not switch width
termcapinfo  xterm Z0=\E[?3h:Z1=\E[?3l:is=\E[r\E[m\E[2J\E[H\E[?7h\E[?1;4;6l
 
# Make the output buffer large for (fast) xterms.
termcapinfo xterm* OL=10000
 
# tell screen that xterm can switch to dark background and has function
# keys.
termcapinfo xterm 'VR=\E[?5h:VN=\E[?5l'
termcapinfo xterm 'k1=\E[11~:k2=\E[12~:k3=\E[13~:k4=\E[14~'
termcapinfo xterm 'kh=\E[1~:kI=\E[2~:kD=\E[3~:kH=\E[4~:kP=\E[H:kN=\E[6~'
 
# special xterm hardstatus: use the window title.
termcapinfo xterm 'hs:ts=\E]2;:fs=\007:ds=\E]2;screen\007'
 
#terminfo xterm 'vb=\E[?5h$<200/>\E[?5l'
termcapinfo xterm 'vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l'
 
# emulate part of the 'K' charset
termcapinfo   xterm 'XC=K%,%\E(B,[\304,\\\\\326,]\334,{\344,|\366,}\374,~\337'
 
# xterm-52 tweaks:
# - uses background color for delete operations
termcapinfo xterm ut
 
#from https://stackoverflow.com/questions/359109/using-the-scrollwheel-in-gnu-screen
termcapinfo xterm* ti@:te@
 
escape ^\\ # changes espace sequence
password

Note that in my .screenrc I use <Ctrl-\> as my escape sequence, so, e.g., to pop to the previous screen it is <Ctrl-\> <Ctrl-\>. I’m not sure that’s standard but my fingers will remember that to my dying day. They probably still remember some of those EDT/TPU VAX editor commands to this day!

Compare and contrast
Here are my day 0 observations.

ssh, curl, nslookup and tracert are coming from the underlying Windows system (do a which curl to see that) so that means you get the dumb version your system has.

So there is no dig, and no nc or netcat.

touch, cat, mkdir and vi behave pretty normally. man pages are installed, which can be a help.

if you use proxy, a funny thing can happen and your environment variables can get mixed. You may have inherited an HTTP_PROXY environment variable form the system, but the alias you copied from a linux jump server probably defines an http_proxy environment variable (lower case). And both can co-exist! As to which one curl would then use, who knows? Better just stick to working with the upper-case one and NOT define another in lower case.

For awhile it looked like scrolling was not working at all when screen was running. Then i found that tip I reference at the bottom of my .screenrc file which makes scrolling work via the mouse’s scroll wheel, which isn’t too bad.

Old friends like ls, grep, echo and while (built-in bash command) are available however. dig can be installed from the bind-utils package.

A lot of other packages are optionally available, including a whole X-Windows environment, which I used to run in the past but hope to avoid this time around.

No crontabs however (to have cron daemon requires installing admin privileges) which kind of hurts.

Simple output redirection seems to work, as does job control, e.g.,

ping -t 8.8.8.8 > /dev/null 2>&1 & 

Not sure why you’d want to run the above command, but this nice example shows that the /dev/null device exists, and the ping command is inherited from your Windows environment hence the -t option to run it indefinitely, and that it will create a background process which you can view and control with jobs / kill.

Now I typically move my laptop off the work environment each night, so all my ssh logins will be lost, unlike the jump server situation. But our jump server isn’t that stable anyway so no big loss I’d say…

I am sooo used to highlighting text in Teraterm, which is my current environment, and that being sufficient to put that text into the clipboard, that I keep doing that in this environment. But it doesn’t work. I have to use the CMD window convention of highlighting the text and then hitting ENTER to get it into the clipboard. oops. That was because I had been launching Cygwin from a CMD window. Now I am launching from a proper Cygwin shortcut and simple text highlighting works, BUT, right-clicking to paste it in brings up a menu rather than just doing it! So there’s that different now…

A word on package management
I don’t know why I was afraid of installing packages when I first tried Cygwin over a decade ago. Now for me that’s the key – to understand and practice installing packages because it’s actually really easy when you’re used to it.

The key is to simply keep your initial install setup hanging around, setup-x86_64.exe. In my case it’s in my downloads directory. Example usage: I wondered if I could install a decent version of ping rather than continually suffer with the dumb DOS version. So, fire up the above-mentioned executable. Go through a few screens (where it remembers the answers from the initial install), then search for the package (Yes, it’s there!), and select to install the most recent version from the drop-down. A few more clicks and it’s done and available in your path. it’s that easy… Not sure about uninstalling because you almost never need to do that. It seems maybe a thousand packages are available? so no, there’s no yum or zypper or rpm or apt-get, but who really needs those anyway?

As a concrete example, I am learning about SNMP. So I got something running on a Bluecoat proxy, and I wanted to see what I could see. The guide recommended using snmpwalk, which of course I did not have. So I learned which package it is in with a DDG search, then ran the Cygwin setup, found that package, installed it, and voila, there was snmpwalk in my path. And it worked, by the way. Easy peasy.

Creating your own scripts

If you have the funny situation, like me, where you had enough privileges to install Cygwin, perhaps by temporarily assigning your account the Admin role, but when you use it day-to-day, you do not have admin privileges, you will find yourself unable to create files in some of the system directories like /usr/local/bin – permission denied! But in your home directory you will be able to edit files.

So what I did is to create a bin directory under my home directory, where I plan to add my home-grown scripts such as mimeencode, and make sure my PATH includes this directory with a statement like

 export PATH=$PATH:${HOME}/bin

which I put in my .alias file, which in turn I source from .bashrc.

Conclusion
GNU screen for Windows is indeed possible, but you gotta run it on top of Cygwin. It’s of interest that after all these years Cygwin is still viable o Windows 10. Cygwin can be run in a pretty lightweight fashion if you avoid the X-Windows stuff. There are some quirks but it is surprisingly linux-like at the end of the day. I believe it is really suitable as a replacement for a linux jump server. screen, for the uninitiated, is a temrinal multiplexer, which means it makes it very fast for you to switch between multiple terminal windows.

Some things are a bit different.

I think I will use this both at work and at home…

If you have access, a look at WSL and native bash might be worthwhile.

References and related

Here’s the GNU Cygwin home page: https://www.cygwin.com/

Install Cygwin by running https://www.cygwin.com/setup-x86_64.exe

Interesting discussion: https://stackoverflow.com/questions/359109/using-the-scrollwheel-in-gnu-screen

If you have a linux jump server that runs screen, or just want to ssh to a linux server, teraterm can be a good choice (as opposed to putty or built-in ssh). These days it can be found here: https://osdn.net/projects/ttssh2/releases/

Categories
Admin

Azure Cloud: can you swap public IP addresses on two VMs?

Intro
I am just beginning to use Microsoft’s Azure cloud environment. Although I am inclined to be a fan of AWS, I haven’t looked at AWS networking for awhile, and the last time I did something I felt totally lost in trying to understand their terminology.

But in spite of my natural inclination to support everything Amazon, I gotta admit that Azure was a good, usable environment for what I needed to do; swap the public IPs on two VMs.

The details
I was not getting any help whatsoever from with my organization. But I did at least get sufficient access to the Resource Group where my Redhat 7.4 VM was running. That was a godsend.

In Azure network interfaces are resources. They have IPs like 10.0.1.4, 10.0.1.7, etc.

Public IP addresses are resources. They have IPs apprporiate for your region. They are typically associated with a network interface.

A network interface in turn is associated to a VM, typically.

For some reason which no one could explain to me, I could no longer patch my RHEL 7.4 server. That began about September 2019. Meantime, I was using an application which relied on a built-in package. Now Redhat always ships with old versions of everything, so running this old version plus lack of patches really put pressure on me to upgrade to a new OS. Can you do an in-place upgrade? As far as I can tell, no. I went with SLES 15 SP1 on a new VM within the same resource group and data center since I have some familiarity with Suse Linux. That OS had a newer version of that open source package, plus it could be patched.

But the IP of the old server was embedded in several places and switching it was not an option. What to do? What to do? Can you even swap IPs on two VMs within the same Resource Group? Who knows?

Well, it turns out you can. The documentation on the topic is pretty good and cleared up some things for me. Particularly the first two links in the references at the bottom.

I took this approach.

Changed the IP from dynamic to Static (go to configure section when looking at this resource). This should have been done from the get-go, but wasn’t. Who knew?

Dissociate the IP from the network interface.

Changed the second IP from dynamic to static.

Dissociate this IP from its network interface.

Associate IP to network interface of the SLES 15 server.

Associate second IP to network interface of the RHEL server.

And that’s it…. It worked like a charm.

Then I cleaned up some old public IP addresses which weren’t being used. You have to remember there is a shortage of IPs. So a lot of the quirk you encounter are due to their utilizing Ips as sparingly as possible. makes sense to me. For instance you can have a “public IP” resource which has no value! it may not get a value until its absolutely needed by virtue of being associated with a network interface on an active server. Stuff like that…

Conclusion
Yes, you can indeed swap public IPs on two servers if they belong to the same Resource Group and I guess the same data center. I know because I did it. As a bonus I found that the Azure documentation is pretty clear and sufficiently detailed.

References and related
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-ip-addresses-overview-arm
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-network-interface-addresses

Categories
Admin Network Technologies

The IT Detective Agency: the case of the Is it the firewall? or routing? or switch? or layer 2?

Intro
This is yet another tale of things in the IT world often do not turn out the way it seems at first blush. Or possibly a tale of just when you think you’ve seen it all after decades in the industry, something new (to you) occurs.

What’s going on
The firewall team was all busy so when this strange problem occurred Friday they called in the second string: me. I consider some of the team to be less-than-customer focused so I try to compensate for them and for my lack of knowledge about the firewall by applying a more customer-first attitude. In other words, a sympathetic listening ear. These days it can be hard just to find someone to complain to about your It problem, and I am keenly aware of that.

There was some strange communication which wasn’t working, mediated by a firewall I had never accessed and was not sure i even had access to. So of course I was asked to join a big conference call where an ongoing debugging session was taking place.

I refused.

I hate being blindsided, and i hate not having answers, making me sound even less competent than I already am.

But what I did do is being my research to see what the system is, if perhaps I had access, etc.

Yes. I found that through a management system I have access to I had access to view the policies on that particular firewall and view the logs as well.

So once I had that up, I agreed to join the call.

They had one server communicating to three different systems. Only one of the three systems was being reached. Yes the other two were on the same subnet. Two of our firewalls were between the system and the three servers.

And, yes, i could see some drops. The interesting TCP error stated: TCP packet out of state, first packet isn’t SYN.

No problem. routing must be screwed up such that we have asymmetric routing. It happens all the time. Right? well these systems are really appliances with only some basic networking information configurable, not real debugging facility, and really no ability to add a host route.

I could not establish a shell session onto the firewall – not sure what the password naming scheme was that they used.

Then a real firewall guy comes on the call. But his connectivity is messed up, so I keep with the debug session, if nothing else than to support him since four eyes is more effective than just two. He shares the routing tables of our two in-line firewalls. It’s hard to understand as these are all new subnets for me, some are ones that don’t look right. But just focusing on possible host routes for any of these three servers, I don’t see anything amiss.

Firewall policy
And, in firewall policy I see the entire subnet has this traffic permitted. There is no rule specific to one or the other of these systems.

So what do we have up until now?
A purist firewall administrator attitude would be as follows:
The firewall treats all these systems the same, therefore this cannot be a firewall problem. Talk to your networking or system people. Have a nice day.

Well, in fact there was some serious question about the network switch as well. So we had a network guy on the call. So they dug up the MAC addresses of these systems, from which they found the switch ports. Then they checked the port configuration. Ah, some complex 802.1x authentication was configured. As I understand this means the device would not even be allowed onto the subnet until it passed some kind of Radius authentication. So they removed this 802.1x stuff and just made sure that port was assigned to the right vlan.

Still, the problem persisted.

I think the other firewall guy was also new to this equipment. Eventually, though, he tries to do a packet trace of the one that’s working versus the one that isn’t.

You know, I never saw the results of those traces, but I’m pretty sure, reading between the lines, that they surprised him, meaning, they did not fit the hypothesis of the asymmetric routing.

In these situations there is the main communication in the mian session, then side communications going on, like between me and the firewall guy. But it is all chaotic. Acoustics are mediocre, accents are hard to understand. So the net transfer of information is pretty low. Statements, even important ones, often have to be repeated multiple times (rebroadcasts) to assure everyone “gets it.”

Typical questions were asked. When did this last work? what had changed? There were a couple changes. Some kind of networking thing (I forget what), and then the firewalls changed management systems after that. The firewall change seemed closer in time to the last known success.

You acquire more and more information as you dig into problems. It’s hard to judge which is relevant at the time and which lines if inquiry are a complete waste of time. A good incident manager or project manager can sense which are the more productive lines of investigation and nurture those discussions while suppressing the noise.

Actually it was the networking guy who found the Checkpoint link below. I looked at it. the firewall guy was muttering something about badly behaved, older applications that might exhibit this behaviour.

So we agreed to take the suggested steps, which would basically allow these out of state packets. Drat. The firewall returned an error.

But I continue to refresh the firewall logs. The communication was occurring about every minute. Lo and behold, I see the older drops, and then accepts for the last few minutes! I think it worked. I tell them to check.

They check their end. Sure enough. Communication beginning to work…

The customer tries to make assertion that this was a firewall problem all along. Not so fast. Firewall guy says, well, the firewall is doing exactly what it’s supposed to be doing. who’s right?

We’re all good for now, but we state this is a kludge for today and a follow-up meeting needs to occur.

So what happened?
I think the single most important thing is that the firewall guy switched his problem hypothesis from Must be asymmetric routing, to Maybe it’s a badly behaved application. Meaning what? What if you have an application that establishes a TCP connection, and then to beat idle timeouts, sends a KEEP ALIVE packet every minute? Well, now, suppose your firewall is rebooted in the middle of that because it has changed management stations and needs to reload policy? What might the situation look like to it?

It you were unlucky, it just might see these KEEP ALIVE TCP packets without having the connection in its connection table, in other words, exactly the situation we are observing!

What should have happened?
It would have been great if the communication were forced to be re-established form time-to-time, even once a day. This problem had been going on for days.

But, given this very stupid behaviour on the part of this application, if the app people had been aware they should have forced their application to re-establish the TCP connection after the firewall reboot. Probably, for the one that did work, it had been forced to re-establish.

A firewall person has to be sufficiently aware to realize this could be happening, and advise the app owner on what to do to prevent it.

Conclusion
So whose problem is it?

To the app people it looks like a firewall issue, cut-and-dried. To a firewall guy it looks like an application issue, cut-and-dried. I see both sides. It is some of both. An app owner has to understand enough about firewalls to see that this type of thing can occur. Assigning blame to one side or the other, as most people are wont to do, is not productive. Only a team effort could have revealed this issue. And recall that the “fix” is actually a kludge that lowers security.

Case: almost closed.

References and related
Checkpoint’s note on TCP packet out of state first packet isn’t SYN: https://community.checkpoint.com/t5/General-Topics/TCP-packet-out-of-state-First-packet-isn-t-SYN-tcp-flags-SYN-ACK/td-p/37166

The IT Detective agency cases are still coming fast and furious. Here’s another recent case. Failed to convert character

Categories
Admin Network Technologies

The IT Detective Agency: WebEx and the case of the mysterious reset

Intro
A company known to me was a contented user of WebEx until they noticed a strange behaviour: their calls were losing quality or even dropped exactly after one hour. No one, most especially the vendor of WebEx, had the slightest idea of the root cause. Read on to see how this fascinating case is playing out.

Scene: company offices, Sao Paolo, Brazil
Triago, a very competent IT professional located in Sao Paolo was the first to report the problem. More-or-less it went like this:

– when he uses the call-my-computer feature in WebEx the call quality is fine, until he has been on the meeting for one hour. At the one hour mark the voice quality of others (from his perspective) dropped dramatically. Sometimes the call was completely lost. Then, about five minutes later, the quality was OK again.
– the problem only occurred when in the office or using VPN, i.e., when using the company network
– others in South America are having the same issue
– he can use the same company laptop, on the Internet, and will not have the problem

After the usual finger-pointing amongst various vendors a debugging plan was created.

It’s going really well. There have been about 12 test calls, stretched out over the last five months. You have to admire the chutzpah of US software vendors who sell to major customers and then still manage to treat them like crap come time for support.

The pattern, more-or-less, goes like this. test call with several vendors plus Triago and I in the US. Wait around for an hour, produce the problem. Wait for software vendor to “analyze”. Wait for two weeks. Some small insight may be gleaned by them. Conclusion: another test is needed, we didn’t have all the traces we need. rinse and repeat.

Scene: A soulless office park somewhere in northern New Jersey
To be continued, literally…

Scene: an enterprise-class server room somewhere in Research triangle Park, North Carolina
If one uses the company’s guest WiFi, one uses the company’s firewall, but not the company’s proxy server. This test succeeds. But, unfortunately, one also uses UDP rather than TCP for the communication because that is the default. See the references for communication requirements.

So one thought is to knock out the ability to use UDP by blocking UDP port 9000, thereby forcing use of TCP. Testing that today….

References and related
Networking requirements for WebEx: https://help.webex.com/en-us/WBX264/How-Do-I-Allow-Webex-Meetings-Traffic-on-My-Network

Categories
Admin

What I’m Working on Now: building my own 3D printer

Intro
Because someone else on the team had a good experience with it, and so i can exchange notes and get some tips form them, I went ahead and ordered the Anet A8 3D printer + an extra roll of PLA. That is, despite the vocal negative reviews which exist.

Although normally I don’t consider myself very mechanical, I guess I’m pretty good at following instructions. So I’m watching their detailed Youtube video and mimicking their actions, step-by-step. And I’m having a blast! It’s like having an erector set all over again.

And I’m in constant awe that so much could be had for so little. There must be, what, a couple hundred parts, including five stepper motors, several limit switches, a circuit board, lcd, power supply, steel rods, belts, arylic(?) parts, and 0.5 Kg of PLA – all for $140?? I feel like its raw value is several times that. Hey, I just bought two m5 screws and nuts from Home Depot and paid almost $3.

Not exactly like the video

Things were going great, until the kit actually deviated form the parts shown in the video! Especially where the kit provided a two 3D-printed parts shaped significantly differently from what they show. Assembly slowed down after that.

The rods did not fit through the 3D parts as they should have – I had to bore them out a bit with one of the yet-unused threaded rods that came with the kit! Which did work by the way + a lot of muscle and hand strength.

Partially assembled – extruder not yet inserted.

Many gray hairs later I finished the assembly. If you’re familiar with furniture assembly of a semi-complex piece like a desk with top shelves, this is about three times more complicated. You keep hoping that OK, now it’ll get easier and I’ll cruise through this. But it never does! Every stage features unique steps presenting their own unique challenges.

In my case my cooling fan on the extruder, which looks suspiciously like a 3D part, hangs below the extruder nozzle, so I don’t think I can use it.

Then once it was assembled my first test print, a simple box, went awry. Around the third layer the whole thing started moving around on the print bed! Once again I was hoping that the hard part was the assembly and then I could cruise to printing to my heart’s content. Wrong. Now comes a whole new set of challenges unique to 3D printing, and you have to master that as well. This is nothing at all like going to Staples to get an ink jet printer where the most challenging thing you’ll have to do is change an ink cartridge. Nothing at all like that. This is more like constructing a model railroad yourself.

Things get smoother

So I took everyone’s advice and installed Cura 3D as my slicer. Then I just decided to go for it and print out my first upgraded part: a center nozzle fan. It worked really, really well!

My very first print – a center nozzle fan

You can almost see form the picture that the quality was really good. I had to sandpaper the chute a little for it to fit – apparently ABS sandpapers easily – and voila, it snapped into place.

Next I printed a filament guide. Also no problems there.

Then my filament broke.

Some terms
slicer – software which takes an STL file an translates it into a series of layer-by-layer movements. Cura 3D is the slicer I use.
gcode – the file format that the anet a8 printer understands. You take an STL file, put it into your slicer, and have it produce a gcode file for you that you print.
PLA – the type of plastic most often used for 3D printing at home

References and related
Amazon link to what I purchased – now only $135! https://smile.amazon.com/gp/product/B01N5D2ZIB/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1

Assembly video, part 1
Part 2.
Anet test guide: part 3: https://www.youtube.com/watch?v=EB5Q3_sJ-Tk

Someone’s additional first steps guide – has some useful tips: https://www.instructables.com/id/Beginners-Guide-to-3D-Printing-Anet-A8-DIY-3D-Prin/

Note that the included microSD card has PDFs for assembly instructions, usage instructions and troubleshooting guide.

Upgrade part: center nozzle fan download: http://www.thingiverse.com/thing:1620630

I am contemplating using openscad to build my 3D models. www.openscad.org. This is really good tutorial: http://www.tridimake.com/2014/09/how-to-use-openscad-tricks-and-tips-to.html

Categories
Admin IT Operational Excellence Network Technologies

No Internet, secure WiFi status message in Windows 10

Intro
Finding out how Windows decides if there is an Internet connection or not can be a challenge often posed by trying to do an Internet search comprised or words that are common and therefore used in many other contexts. I have to give credit to someone else who found most of these pertinent links that help explain how Windows decides whether or not your PC has an Internet connection.

What they don’t tell you
I think there are a lot more tests microsoft does than what they’ve documented. In my opinion, based on observation, in addition to the sites they recommend to whitelist, also whitelist

www.msftconnecttest.com

Some PCs get stuck in a loop requesting www.msftconnecttest.com/connecttest.txt indefinitely, which isn’t good for anyone.

Here’s one they don’t mention, of the same ilk:

ipv6.msftconnecttest.com/connecttest.txt

I’m thinking to just leave that one alone, unless you really are fully running on ipv6.

Now if you have a PAC file, what you’re going to see are accesses for
<PAC-file-address>/connecttest.txt

I don’t think that one’s documented either. I’m not yet sure how best to have the PAC file web server respond, where best means the reply which would make the PC most likely to decide Yes I really do have an Internet connection.

References and related
This Pulse Secure article is pretty good. You start with an Internet connection, then launch Pulse Secure vpn, then find you are told there is no longer an Internet connection. This explains why it might be, but in my opinion it is incomplete as it does not even consider the case where an authenticating proxy is the sole gateway to the Internet:
https://kb.pulsesecure.net/articles/Pulse_Secure_Article/KB43805

These are two more articles about VPN tunneling
https://community.pulsesecure.net/t5/Pulse-Desktop-Clients/Pulse-Secure-blocks-Windows-10-apps-from-internet-access/td-p/11944
https://docs.pulsesecure.net/WebHelp/PCS/8.3R1/Home.htm#PCS/PCS_AdminGuide_8.3/About_VPN_Tunneling.htm

network Location Awareness (NLA) and Network Connection Status Indicator (NCSI) are explained in these articles
https://support.microsoft.com/en-us/help/4494446/an-internet-explorer-or-edge-window-opens-when-your-computer-connects
https://support.microsoft.com/en-us/help/2778122/using-authenticated-proxy-servers-together-with-windows-8

Categories
Admin Linux Raspberry Pi

Raspberry Pi Recovery Mode or interrupting the boot process

Intro
If you installed Raspbian from the NOOBS distribution as I do, then you may occasionally “blow up” your installation as I just have! You have an out, sort of, short of re-imaging the disk, though about with the same impact.

To interrupt the boot process and enter recovery mode, attach a USB keyboard and repeatedly hit the Shift key. You should come to the NOOBS OS install selection screen. Just re-install Rasbian again…

Symptoms
When I powered up, I got the initial multi-color screen. Then a two-line text message popped up – too quickly to be read, then a grayish screen, then it split into a lower and upper part, then both halves faded away and there it stayed… At that point it was not responsive to any keyboard inputs or mouse clicks.

Conclusion
While doing my advanced slide show and rotating display project i somehow managed to blow up my OS. finding the way to interrupt the boot-up was not so easy so I am amplifying the answer that worked for me on the Internet: repeatedly hit the Shift key during the boot, until you see the NOOBS image selector screen.

Categories
Admin Linux Network Technologies Raspberry Pi Security Web Site Technologies

How to test if a web site requires a client certificate

Intro
I can not find a link on the Internet for this, yet I think some admins would appreciate a relatively simple test to know is this a web site which requires a client certificate to work? The errors generated in a browser may be very generic in these situations. I see many ways to offer help, from a recipe to a tool to some pointers. I’m not yet sure how I want to proceed!

why would a site require a client CERT? Most likely as a form of client authentication.

Pointers for the DIY crowd
Badssl.com plus access to a linux command line – such as using a Raspberry Pi I so often write about – will do it for you guys.

The Client Certificate section of badssl.com has most of what you need. The page is getting big, look for this:

So as a big timesaver badssl.com has created a client certificate for you which you can use to test with. Download it as follows.

Go to your linux prompt and do something like this:
$ wget https://badssl.com/certs/badssl.com‐client.pem

badssl.com has a web page you can test with which only shows success if you access it using a client certificate, https://client.badssl.com/

to see how this works, try to access it the usual way, without supplying a client CERT:

$ curl ‐i ‐k https://client.badssl.com/

HTTP/1.1 400 Bad Request
Server: nginx/1.10.3 (Ubuntu)
Date: Thu, 20 Jun 2019 17:53:38 GMT
Content-Type: text/html
Content-Length: 262
Connection: close
 
<html>
<head><title>400 No required SSL certificate was sent</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>No required SSL certificate was sent</center>
<hr><center>nginx/1.10.3 (Ubuntu)</center>
</body>
</html>

Now try the same thing, this time using the client CERT you just downloaded:

$ curl ‐v ‐i ‐k ‐E ./badssl.com‐client.pem:badssl.com https://client.badssl.com/

* About to connect() to client.badssl.com port 443 (#0)
*   Trying 104.154.89.105... connected
* Connected to client.badssl.com (104.154.89.105) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* NSS: client certificate from file
*       subject: CN=BadSSL Client Certificate,O=BadSSL,L=San Francisco,ST=California,C=US
*       start date: Nov 16 05:36:33 2017 GMT
*       expire date: Nov 16 05:36:33 2019 GMT
*       common name: BadSSL Client Certificate
*       issuer: CN=BadSSL Client Root Certificate Authority,O=BadSSL,L=San Francisco,ST=California,C=US
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=*.badssl.com,O=Lucas Garron,L=Walnut Creek,ST=California,C=US
*       start date: Mar 18 00:00:00 2017 GMT
*       expire date: Mar 25 12:00:00 2020 GMT
*       common name: *.badssl.com
*       issuer: CN=DigiCert SHA2 Secure Server CA,O=DigiCert Inc,C=US
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: client.badssl.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
Server: nginx/1.10.3 (Ubuntu)
< Date: Thu, 20 Jun 2019 17:59:08 GMT
Date: Thu, 20 Jun 2019 17:59:08 GMT
< Content-Type: text/html
Content-Type: text/html
< Content-Length: 662
Content-Length: 662
< Last-Modified: Wed, 12 Jun 2019 15:43:39 GMT
Last-Modified: Wed, 12 Jun 2019 15:43:39 GMT
< Connection: keep-alive
Connection: keep-alive
< ETag: "5d011dab-296"
ETag: "5d011dab-296"
< Cache-Control: no-store
Cache-Control: no-store
< Accept-Ranges: bytes
Accept-Ranges: bytes
 
<
<!DOCTYPE html>
<html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="shortcut icon" href="/icons/favicon-green.ico"/>
  <link rel="apple-touch-icon" href="/icons/icon-green.png"/>
  <title>client.badssl.com</title>
  <link rel="stylesheet" href="/style.css">
  <style>body { background: green; }</style>
</head>
<body>
<div id="content">
  <h1 style="font-size: 12vw;">
    client.<br>badssl.com
  </h1>
</div>
 
<div id="footer">
  This site requires a <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Client-authenticated_TLS_handshake">client-authenticated</a> TLS handshake.
</div>
 
</body>
</html>
* Connection #0 to host client.badssl.com left intact
* Closing connection #0

No more 400 error status – that looks like success to me. Note that we had to provide the password for our client CERT, which they kindly provided as badssl.com

Here’s an example of a real site which requires client CERTs:

$ curl ‐v ‐i ‐k ‐E ./badssl.com‐client.pem:badssl.com https://jp.nissan.biz/

* About to connect() to jp.nissan.biz port 443 (#0)
*   Trying 150.63.252.1... connected
* Connected to jp.nissan.biz (150.63.252.1) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* NSS: client certificate from file
*       subject: CN=BadSSL Client Certificate,O=BadSSL,L=San Francisco,ST=California,C=US
*       start date: Nov 16 05:36:33 2017 GMT
*       expire date: Nov 16 05:36:33 2019 GMT
*       common name: BadSSL Client Certificate
*       issuer: CN=BadSSL Client Root Certificate Authority,O=BadSSL,L=San Francisco,ST=California,C=US
* NSS error -12227
* Closing connection #0
* SSL connect error
curl: (35) SSL connect error

OK, so you get an error, but that’s to be expected because our certificate is not one it will accept.

The point is that if you don’t send it a certificate at all, you get a different error:

$ curl ‐v ‐i ‐k https://jp.nissan.biz/

* About to connect() to jp.nissan.biz port 443 (#0)
*   Trying 150.63.252.1... connected
* Connected to jp.nissan.biz (150.63.252.1) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* NSS error -12227
* Closing connection #0
curl: (35) NSS: client certificate not found (nickname not specified)

See that client certificate not found? That is the error we eliminated by supplying a client certificate, albeit one which it will not accept.

what if we have a client certificate but we use the wrong password? Here’s an example of that:

$ curl ‐v ‐i ‐k ‐E ./badssl.com‐client.pem:badpassword https://client.badssl.com/

* About to connect() to client.badssl.com port 443 (#0)
*   Trying 104.154.89.105... connected
* Connected to client.badssl.com (104.154.89.105) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* Unable to load client key -8025.
* NSS error -8025
* Closing connection #0
curl: (58) Unable to load client key -8025.

Chrome gives a fairly intelligible error

Possibly to be continued…

Conclusion
We have given a recipe for testing form a linux command line if a web site requires a client certificate or not. thus it could be turned into a program

References and related
My article about ciphers has been popular.

I’ve also used badssl.com for other related tests.

Can you use openssl directly? You’d hope so, but I haven’t had time to explore it… Here are my all-time favorite openssl commands.

https://badssl.com/ – lots of cool tests here. The creators have been really thorough.

Categories
Admin

F5 clustering tips

Intro
I was having trouble with one of my new F5 clusters. I thought I’d be clever and add two new F5’s to the existing cluster, and then remove the old members when everything looked good.

But the device groups somehow remembered the old servers, even though they weren’t in the device groups. Eventually I could not even sync the two new servers together.

Here’s a concise summary of the steps which worked, taken from this article:

Hi Manuel, do you try to setup a sync-failover device-group containing three units? To establish device trust I would recommend to force two units into offline state. Remove the machines from the existing sync-failover device-group (repeat on each machine if required) and delete the sync-failover device-group. Now reset the device trust on all machines. Next step will be to use the active machine to add both offline machines as peers. Now all three units should show up in each machine´s device list. On the active unit create a new device-group of type sync-failover with network failover enabled. Add all machines to the new device-group. This will be synced to all machines and you can release them from forced offline. Time for the initial sync now. Thanks, Stephan

I get a little confused about sync-only vs sync-failover. That difference is clearly discussed there as well:

“Sync-only” will not allow to synchronize LTM configurations.
To synchronize an LTM config all units need to belong to the same “sync-failover” device-group.

I had afraid and ignorant of the need to reset device trust. For good measure I deleted all device groups. Then this procedure seems to leave me with an auto-created datasync-global-dg (sync-only). Since there was no sync-failover group I manually created one I called sync-LTM-configs. Another auto-created, sync-only group is the device_trust_group. So I see three device groups in all when I am on the sync page, but only two when I am on the device group page.

Conclusion
I had afraid and ignorant of the need to reset device trust in cleaning up an F5 cluster with old and new members. But once I did that and followed the recipe above, I was good to go.

References and related
The discussion on the F5 Devcentral site: https://devcentral.f5.com/questions/f5-sync-options

Categories
Admin Network Technologies

Postfix Operational tips

Intro
I’m trying out the system-supplied postfix on a SLES system. i had been using sendmail but there doesn’t seem to be any development on that software.

Some commands I needed right away
Well, right away I had thousands of queued messages so I needed a way to make sense of what was happening.

For these commands to make sense you need to know that I am running a second postfix configuraiton out of /etc/postfixEXT.

Display the queue

postqueue -c /etc/postfixEXT -p

Force delivery from the queue

postqueue -c /etc/postfixEXT -f

List one email in detail

postcat -vq -c /etc/postfixEXT QUEUEID

Delete one email

postsuper -c /etc/postfixEXT -d QUEUEID

Put mail on hold

postsuper -c /etc/postfixEXT -h ALL|QUEUEID

Release mail form hold

postsuper -c /etc/postfixEXT -H ALL|QUEUEID

How to force delivery of a single message
This command is not documented anywhere – because it doesn’t exist so you have to get creative. If you have the luxury of halting all email for a few seconds simply do this:

Display the queue to find the queue ID of the email you want to force delivery of

postqueue -c /etc/postfixEXT -p

Put all mail on hold

postsuper -c /etc/postfixEXT -h ALL

Now release the hold on that one email

postsuper -c /etc/postfixEXT -H QUEUEID

QUEUEID is, of course, the queue id .e.g., F2A1A27891E, of the email in question.

Look for what happened
Check your mail log’s last lines in /var/log/mail

Revert back to normal running

postsuper -c /etc/postfixEXT -H ALL

Since mail is store-and-forward and not real time, you can do these steps, quickly, even on a production system and no one will be the wiser if you are pretty quick. Probably takes two minutes even for a slow typer.

How to run multiple listeners
I didn’t want to disturb the system-installed postfix too much. I would let it “have” the loopback address, 127.0.0.1, leaving me the other IPs for my relay config to listen on. I added these lines to /etc/postfix/main.cf

multi_instance_enable = yes
multi_instance_directories = /etc/postfixEXT

service postfix start starts up the local postfix plus my relay. Grep the process table for either master or postfix to see. However, to be honest, service postfix stop does not kill all processes. So I always end up killing one of the master processes by hand. Update: postmulti -p stop does the trick to kill all. There is also a status or start option instead of stop.

Sendmail to Postfix migration tips
This could be a separate post but I am too lazy to do that.

What happens to the access file? I kept the name of the file access but just list all the IPs, one per line, without any further arguments, to permit just those IPs relay access. In my main.cf I have a line like this to tie it together:

mynetworks = /etc/postfixEXT/access

Note that there is no hashed or .db version of this file any longer, unlike in the sendmail case.

References and related
Since I mentioned sendmail I have to give a shout out to one of my old sendmail posts.

More info on postfix multiple instances. A pretty complete giude.