Categories
3D Printing

Trying to develop a 3d object with the help of chatgpt

Intro

I develop simple 3d objects using an approach which could be called 3d-object-as-code. The language is Openscad, and my few objects are all documented here: 3d printing some parts for the house. But that was started long before generative AI took off. So it was incumbent on me to explore what assistance I could get from use of chatgpt.

The details

I wanted an object which would push against our utensil holder to pin it in place within the kitchen drawer, and at the same time make that space available for additional kitchen gadget junk. The final picture is further down below. I started by having chatgpt generate the base from my desired dimensions. So far so good. Then I had it re-do it using less material by making the base a lattice. Things already began to fall apart: it did create a lattice pattern, but it dropped two of the ends.

Now mind you no serious developer would proceed as I have done. I get the code from chatgpt, and paste it into the openscad app to render it to see what it created. Very indirect and inefficient! But this is just for me, so why not…

So anyway, I add the missing side by hand.

A time saver?

Yes! At this point chatgpt has gotten me started. On my own, I get psyched out by decisions such as whether or not to create centered versions of cubes, and my issue when doing it by hand is getting lost in translation, literally. I often find myself three translations deep, and it gets to be overwhelming to think it through. chatgpt unilaterally decided the initial cube would not be translated, so I went with that simpler approach, and that helped.

I think I did manage to get chatgpt to add the legs as well, so that was a help.

chatgpt was good at creating modular and therefore reusable code which even had nice comments! It’s like looking at the code of a colleague who writes better code than you and picking up a few pointers.

But not after a while

I realized I needed to stabilize those legs. But what words to use in the prompt? I’m not an engineer. I said something to this effect:

Starting from this openscad code, add a small cross arm having the same thickness as the leg in order to anchor the leg more firmly (openscad code...)

The result was laughable as it produced a horizontal piece attached to the bottom of the leg on the one end and attached to nothing at all on the other!

Second attempt:

Starting from this openscad code, add a small brace having the same thickness as the leg in order to anchor the leg more firmly. It should begin at (0,0,20) and end at (20,0,15) (openscad code...)

Still, it ignored these very direct start and end directives. I tried once more with no better results.

I also tried an approach requesting to add a bracing triangle of material to help stabilize the leg, but it laughably added an extruded triangle along the whole length of the leg!

At this point the ai was clearly not acting like an assistant, but like a text generator. It had zero idea what it was doing.

So at that point, it was a time-negative exercise, useful only for this blog post and to make me humbly admit I do not know how to get the most out of chatgpt.

Finally

I had to add those bracing bits by hand-coding that part. That involved a rotation, a translation and a difference. It could have been worse.

The final product
The code

// DrJ 1/2025. Parameters in mm
// Dimensions of the box
width = 110;
length = 150;
thickness = 3;
epsilon = 1;
brace_angle = 30;
brace_z = 20;
spacing = 53; // Spacing between the lines in the criss-cross pattern
line_thickness = 4; // Thickness of the lines in the pattern
leg_height = 53; // Height of the legs
leg_width = 6; // Width of the legs
leg_length = 10; // Length of the legs
//width = width - line_thickness; // correction
side_height = thickness; // Height of the side leg
side_width = width; // Width of the side leg
side_length = 10; // Length of the side leg
module leg() {
// Create a leg
cube([leg_width, leg_length, leg_height]);
}
module shave_cube(){
translate([0,-epsilon,thickness]){cube([side_width,side_length+2*epsilon,leg_height]);};
}
module leg_brace() {
// Create a leg brace
difference(){
translate([0,0,-brace_z]){rotate(a=[0,brace_angle,0]){cube([leg_width, leg_length, leg_height]);}};
shave_cube();
}
}
module side_leg() {
// Create a leg
cube([side_width, side_length, side_height]);
}
module criss_cross_pattern() {
for (i = [0 : spacing : length]) {
// Horizontal lines
translate([0, i, 0]) {
cube([width, line_thickness, thickness]);
}
}
for (i = [0 : spacing : width]) {
// Vertical lines
translate([i, 0, 0]) {
cube([line_thickness, length, thickness]);
}
}
}

// Create the criss-cross box
criss_cross_pattern();
// Add legs at two corners
translate([0, 0, -(leg_height-thickness)]) {
leg(); // Leg at the corner where y = 0
}
translate([0, length - leg_length, -(leg_height-thickness)]) {
leg(); // Leg at the corner where y = length - leg_length
}
// add stronger sides
translate([0, 0, 0]) {
side_leg(); // Solid strip along the y = 0 edge
}
translate([0, length - side_length, 0]) {
side_leg(); // Solid strip along the opposite edge
}
// one brace per leg
leg_brace();
translate([0,length - side_length,0]){
leg_brace();}
//shave_cube();
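
For anyone who also gets lost a few transformations deep, the brace placement in leg_brace() can be checked numerically outside of OpenSCAD. The following is only a sketch in Python, not part of the printed design: it re-uses the parameter values from the listing above, the helper names (rotate_y, translate, brace_point) are made up for illustration, and it assumes OpenSCAD's rotate([0, a, 0]) is a right-handed rotation about the Y axis applied before the enclosing translate.

```python
import math

# Values copied from the OpenSCAD listing above (mm and degrees).
brace_angle = 30
brace_z = 20
leg_height = 53
thickness = 3

def rotate_y(point, degrees):
    """Right-handed rotation about the Y axis, as rotate([0, a, 0]) is assumed to do."""
    x, y, z = point
    a = math.radians(degrees)
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

def translate(point, offset):
    """Shift a point by an [x, y, z] offset."""
    return tuple(p + o for p, o in zip(point, offset))

def brace_point(local):
    """Where a point of the un-rotated brace cube ends up: rotate first, then translate."""
    return translate(rotate_y(local, brace_angle), (0, 0, -brace_z))

foot = brace_point((0, 0, 0))           # bottom corner of the brace
tip = brace_point((0, 0, leg_height))   # top corner before anything is cut away

# shave_cube() removes everything above z = thickness, so find the point
# on the brace edge that sits exactly at that height.
t_cut = (thickness + brace_z) / math.cos(math.radians(brace_angle))
cut = brace_point((0, 0, t_cut))

print("foot:", foot)
print("tip before shaving:", tuple(round(v, 2) for v in tip))
print("edge meets the top of the base at:", tuple(round(v, 2) for v in cut))
```

Under those assumptions the brace starts 20 mm below the origin on the leg and its edge reaches the top surface of the base roughly 13 mm out along x, which is the part that remains after the difference with shave_cube().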

Conclusion

I got not-so-great results in my attempt to use the chatgpt o4 generative ai offered by Duckduckgo. The basic stuff, yes, it got me started and taught me how to make good modular openscad code. Anything remotely complex, forget about it. You want to treat ai like an assistant, right, but this assistant has near zero understanding of what I want and did not learn even after multiple attempts within the same chat session. It should be put out to pasture…

However, I am always willing to take the fall. I was just going by the seat of my pants with regard to prompt engineering. Maybe if I had chosen better prompts, or let ai have freer rein to do the whole design I would have experienced better results. But shouldn’t my “assistant” be better at understanding me?

References and related

All about Openscad: https://openscad.org/

The original post: 3d printing some parts for the house.

You can use chatgpt directly: chat.com or the way I used it, from the menu of https://duckduckgo.com/

The Etsy shop of the person who printed this for me for only $5: https://www.etsy.com/shop/gizmoswidgets

Categories
Admin

An IT Nightmare

Intro

This is an actual dream, or more like a nightmare, that I had recently. I guess it’s illustrative of an IT person’s worst fears.

The dream

Well I don’t remember dreams very well and I’m not the kind to embellish stories to make them sound more interesting so this is going to be brief.

So in this dream I am at the office. My work situation seems to be that I have slightly more access than I need to various systems. The workplace is a large corporate office where processes are followed but still individual contributors want to make a difference so there is some self-imposed pressure to do something of value.

So anyway I find myself in this dream needing to make a configuration change to a monitoring system. Something like a Zabbix implementation. Now I know that I am not chiefly responsible for it, and in fact I should not be modifying it, but I have some idea that what I plan to do will make it better in some way. It’s in fairly widespread usage – about 150 users.

While doing this improvement it asks me for a new administrator password. But it wasn’t exactly that. It changed the administrator password, displayed the new one, and suggested I copy it and save it, which I did. This dialog only displayed for about 10 seconds and that screen went away.

Actually I hadn’t quite had time to save that new admin password, I just had it in my clipboard. I went to notepad++ to paste it and preserve it. There was nothing in my clipboard! That deep, sinking feeling set in. Even if it continues to work, we won’t be able to do patches so it is as good as killed, it will just take a little longer.

I guess we’ve all been there, right?

Inspired by

IRL I was purchasing tickets for Shen Yun. I selected seats and was at the part where you enter credit card info. My Edge browser proposed to enter a generated credit card number, which I generally approve of as an anti-fraud measure, so I let it fill in the info. It needed biometric authentication – face. The next thing I knew the whole browser screen vanished. Edge running on a completely patched Windows 11 PC simply crashed without a word. I think it bears mentioning because unlike the bad old days where crashes were customary, these days it’s not such a quotidian occurrence. And when I restarted things, there was no memory of my seat selection but the seats were blocked from being purchasable. Kind of a worst case scenario there. I didn’t wish to wait for the 20 minute purchase timer to time out as few seats were available. Fortunately there were comparable seats in another row. Second time through it did not propose to fill in with a random card and things went through.

Categories
Cloud

Azure DevOps pipeline run limits

Intro

We wished to run a pipeline every five minutes, but when you do the math, this will result in its running 2016 times per week, well over the 1000 runs per week which according to the documentation is the limit. On the other hand, we are using private agents – our own – so why should Microsoft put limits on how often we run jobs on them??

The details

Given that there are 10080 minutes in a week, to arrive at fewer than 1000 pipeline runs per week you’d need to space your runs a little more than 10 minutes apart. I had been running once every 12 minutes, which works out to 840 runs per week. Then I would create a second pipeline running the same code, but running it at the in-between times, to end up with a logical job which runs every six minutes. But is this approach really required for our private agents?
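
The arithmetic behind that workaround is easy to check. This is just an illustrative sketch; the function names are invented here and nothing in it talks to Azure DevOps.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080

def runs_per_week(interval_minutes):
    """How many times a job fires in a week when scheduled every N minutes."""
    return MINUTES_PER_WEEK // interval_minutes

def minutes_of_hour(interval_minutes, offset_minutes):
    """Minutes past the hour at which one pipeline fires."""
    return list(range(offset_minutes, 60, interval_minutes))

every_5 = runs_per_week(5)     # the schedule we actually wanted
every_12 = runs_per_week(12)   # the schedule used per pipeline

# Two pipelines on the same 12-minute cadence, the second shifted by 6 minutes.
pipeline_a = minutes_of_hour(12, 0)
pipeline_b = minutes_of_hour(12, 6)
combined = sorted(pipeline_a + pipeline_b)

print(every_5, every_12)
print(combined)
```

Each pipeline on its own stays under 1000 runs per week, while the two together behave like one job firing every six minutes.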

We decided to put this to a real test. I created a Hello World yaml file and ran it every minute. The results are not at all what we expected!

The results

Essentially, the job runs 10 times out of every 15 minutes. This is another published limit. And you see this effect right away. So this is some kind of burst rate limiting, you might say, and it applies even to our private agents.

And during those times when it’s not being run, you don’t see it paused or anything. It simply isn’t run. But you can run it by hand (I think) and it will run.

So then you think, OK, limits apply, even to private agent pools. Then we left it running, and something funny happened.

After about 640 runs in the course of 24 hours, it simply stopped. Then about three days later it started up again, ran about 637 times, then stopped again.

So there seems to be an additional unpublished limit of something like 640 runs in a 72 hour interval.
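
To see what those two observed limits would mean for an every-five-minutes schedule, here is a back-of-the-envelope sketch. The figures are simply the ones observed above (10 runs per 15 minutes, roughly 640 runs per 72 hours); how Azure DevOps enforces them internally is not known, and the names are made up for illustration.

```python
BURST_LIMIT_RUNS = 10       # observed: 10 runs ...
BURST_WINDOW_MINUTES = 15   # ... per 15 minutes
LONG_LIMIT_RUNS = 640       # observed, unpublished: about 640 runs ...
LONG_WINDOW_HOURS = 72      # ... per 72 hours

def runs_needed(interval_minutes, window_minutes):
    """Runs a fixed schedule asks for within a window."""
    return window_minutes // interval_minutes

# An every-five-minutes job easily fits inside the burst limit ...
burst_needed = runs_needed(5, BURST_WINDOW_MINUTES)
fits_burst = burst_needed <= BURST_LIMIT_RUNS

# ... but not inside the longer window.
long_needed = runs_needed(5, LONG_WINDOW_HOURS * 60)
fits_long = long_needed <= LONG_LIMIT_RUNS

# Tightest steady schedule the longer window would allow, in minutes per run.
min_interval = LONG_WINDOW_HOURS * 60 / LONG_LIMIT_RUNS

# Weekly ceiling implied by the longer window.
weekly_ceiling = LONG_LIMIT_RUNS * 7 * 24 // LONG_WINDOW_HOURS

print(burst_needed, fits_burst)
print(long_needed, fits_long)
print(min_interval, weekly_ceiling)
```

So if the 72 hour figure holds, the binding constraint is the longer window, not the burst limit: a steady schedule would have to be no tighter than about one run every 6.75 minutes, which still allows more than 1000 runs per week.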

But, we were able to exceed 1000 runs in a week, for what it’s worth.

Then I let the job run a while. It seemed to want to run 635 times on Mondays, then stop the whole rest of the week. IDK…

Alternatives

I guess we were not using pipelines for what they were intended. They are not really to be considered cron on steroids. We’ll be looking at Azure Functions to see if it’s a better fit for our requirements.

Conclusion

Azure Pipelines was not designed to be treated like “cron on steroids.” Even when you use your own agents in your pool, your Azure Pipeline job will be rate limited to about 10 runs per 15 minutes and about 640 runs per three day interval (unpublished limit), though you can exceed 1000 runs per week. These limits prevent you from executing a run every five minutes! If you need to execute a job so often, consider finding a different approach!

References and related

How a pipeline can modify its own repository.

Categories
Consumer Tech

The IT Detective Agency: The case of the iPhone mystery alarm

Intro

My wife asked my assistance to find the source of the daily alarm which was nagging her at 6:20 AM every morning. I don’t use an iPhone so I was pretty clueless myself.

The details

Of course she had done the obvious things like look at the clock for set alarms. And at installed apps for alarms. Nothing.

Yet every day – unless the iPhone was turned completely off – this alarm would go off at 6:20 AM. And her Apple iWatch, or whatever it’s called, also had some message about this alarm.

We searched all installed apps for “alarm” and “clock” but there was nothing left to look at. Maybe one of her health apps? Nope. doesn’t seem to be. Maybe the Army Knife app with all its little useful gadgets? Nope, no alarm clock there.

The breakthrough

Then I got an idea. Since the wake-up screen mentioned something about sleep, I decided to search the phone for sleep. And voila, there is a sleep app, or at least sleep settings. And it was set to end her sleep at 6:20 AM.

So you see the misdirection at work? We kept thinking in terms of clock and alarm. But Apple just thinks of it as sleep and calls it as such.

Case: closed

Conclusion

Two people were frustrated for days trying to find the source of an iPhone alarm, which eventually was found. Beware that there is a sleep app. We followed the leads on the Internet about turning off certain notifications, which led nowhere.

Categories
Network Technologies

The IT detective agency: the case of the failed dhcp request

Intro

In full disclosure this case was not one I contributed to in any way, unlike all the others I’ve reported on. Nevertheless, a source who did work on this case told me sufficient details and it is an interesting case.

The setup

For this case to make any sense, you need to understand the background. If I got it right, some people were trying to restore a backup version of Windows 11 Professional. When they did this restore, they found the problem that they were not picking up an IP address via dhcp if they were on a company network. If they did the restore while on a home office network it went OK.

So imagine the complexity this presents in a modern IT environment. You have the PC vendor, HP, the OS vendor, Microsoft, the dhcp service operator, in-house, the LAN service provider and the network gear vendor, Cisco. The fault could lie anywhere. They all initially claim their stuff is working fine (which is always the default statement) and tell you to look elsewhere.

So what I like to say is that any hypothesis is unlikely, yet one of them will prove to be correct, eventually.

More details

Packet traces showed the DHCP Discover request being sent by the PC, but not arriving at the DHCP server. Ah, you say, simple: the switch is guilty here of dropping the DHCP Discover packet, fix it. After all, “eating” dhcp packets is something switches do all the time if dhcp snooping is misconfigured.

Yet the LAN service provider says the switch isn’t misconfigured. So they have to open a case with the switch vendor to understand the drop. I’m not sure where that support case went, meanwhile…

The in-house expert troubleshooters were able to take a second trace from a PC which did pick up an IP address after a restore. This restore feature of course used to work when it was initially released.

Categories
Linux Python

How to prevent an image from being rotated

Intro

I still use my home-grown slideshow software based on Raspberry Pi, which is quite a testament to its robustness as it has been running with only minor modifications for many years now. One recent improvement has been adding the ability to handle photos from recent iPhones, which save photos in the new-to-me HEIC format. My original implementation only handled JPEG and PNG file types, so it was skipping all our recent iPhone photos.

I figured there just had to be a converter out there which would even work on the RPi, which of course there was, heif-convert. But it has an oddity when it comes to rotation. It converts the HEIC to a JPEG, fine, and it rotates the image, but it also leaves all the EXIF meta data, including the orientation meta data, as is. This in turn means display software such as fbi may try to rotate the picture a second time. Or at least that’s what happened to my software where one of my steps is an explicit rotate. That step was creating a double rotation.
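
The double rotation is easier to see with numbers. The sketch below is only an illustration of the reasoning, not part of the slideshow software: the table lists the clockwise rotation a viewer applies for the non-mirrored EXIF orientation values, and the function name and the 90 degree example photo are made up.

```python
# Clockwise rotation (degrees) a viewer applies for the non-mirrored
# values of the EXIF orientation tag; 1 means "display as stored".
ROTATION_FOR_ORIENTATION = {1: 0, 3: 180, 6: 90, 8: 270}

def displayed_rotation(pixels_rotated_by, orientation_tag):
    """Net clockwise rotation on screen: what is baked into the pixels
    plus whatever the viewer adds because of the orientation tag."""
    viewer_rotation = ROTATION_FOR_ORIENTATION.get(orientation_tag, 0)
    return (pixels_rotated_by + viewer_rotation) % 360

# Hypothetical portrait photo that needs a 90 degree clockwise turn.
needed = 90

# The converter already rotated the pixels but left the tag at 6,
# so an EXIF-aware step rotates the image a second time.
double_rotated = displayed_rotation(pixels_rotated_by=90, orientation_tag=6)

# Same pixels with the orientation tag neutralised.
fixed = displayed_rotation(pixels_rotated_by=90, orientation_tag=0)

print(needed, double_rotated, fixed)
```

With the stale tag the picture ends up turned 180 degrees instead of 90; with the tag neutralised it is shown as intended.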

So I needed a tiny program which left all the EXIF meta data alone except the rotation, which it sets to 0, i.e., do not rotate. Seeing nothing out there, I developed my own.

The details

Here is that script, which I call 0orientation.py:

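#!/usr/bin/python3
# DrJ 10/24
# Overwrite only the EXIF orientation tag of a JPEG; all other EXIF data is kept.
# usage: 0orientation.py photo.jpg
# you need the exif package: pip install exif
# https://github.com/kennethleungty/Image-Metadata-Exif?tab=readme-ov-file
# https://exif.readthedocs.io/en/latest/
import sys
from exif import Image
# the JPEG to read is the first command-line argument
file = sys.argv[1]
with open(file, 'rb') as image_file:
    my_image = Image(image_file)
# debugging output: is there any EXIF data, and which tags are present?
exif_flag = my_image.has_exif
print(exif_flag)
exifs = my_image.list_all()
print(exifs)
# 0 is what I use for "do not rotate"; note the EXIF standard's value for normal orientation is 1
my_image.orientation = 0
# the result is always written to modified_image.jpg in the current directory
with open('modified_image.jpg', 'wb') as new_image_file:
    new_image_file.write(my_image.get_file())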

Conclusion

I have shared how to overwrite just the orientation tag in a JPEG photo.

Reference and related

My photo frame software based on photos stored in a Google Drive

Categories
Home Computing Uncategorized Web Site Technologies

How to use your phone as an ersatz workstation with equipment lying around your house

Intro

While my laptop was being shipped to me I wanted to be as productive as possible using my Samsung Galaxy A35. I was vaguely aware of the availability of Microsoft 365 apps such as Outlook. How far could I take this…?

The recipe

To cut to the chase, I was maybe 60 – 70 % effective. I used equipment found in the typical IT person’s home plus one inexpensive purchase from Walmart.

Here is what I used:

  • HDMI monitor
  • old Amazon firestick
  • cheap bluetooth keyboard purchased from Walmart
  • phone stand

And here’s what I really wished I had but did not:

  • bluetooth mouse

Which apps worked well:

  • Outlook
  • Teams, especially chat, less so the meetings function
  • One Note
  • Edge
  • VPN client

I must say the bluetooth keyboard worked really well for doing some serious typing up of emails.

How the external monitor worked

So I “came up” (in quotes because I’m sure many others figured out this same thing) with the idea of casting my phone screen onto an external monitor by way of the screen mirroring capability available on even the oldest Amazon Firestick. On the phone you simply go to Smart View Mirror Screen.

So that prevented me from having to hold the phone at least while I was drafting emails.

But, and it’s a big one, the external monitor was not a TV and the sound from meetings was killed by this setup. And I did not see a way to keep audio local to the phone while only casting the screen.

A smaller problem is that the refresh lag is quite noticeable under conditions of rapid screen refresh. So it may take a second or two to show what the phone’s screen shows.

Still, it’s pretty cool.

I would have bought a bluetooth mouse but it simply wasn’t available at my local Walmart. I was pretty inconvenienced without it having to constantly touch the phone screen for various things.

And the external keyboard

It worked pretty well. Even some shortcuts worked. Alt-TAB, which I use a lot to switch between apps, has some kind of vaguely similar effect on the phone, but not to the point where I could rely on it usefully. The unlock shortcut button sort of woke up the phone screen at least.

TAB helped me to pop from one field in the form to the next the way I would use it on a PC.

Overall responsiveness was satisfactory.

The small form factor was not a detriment, and maybe even an advantage since it’s so light and portable.

What if you have an HP G5 docking station lying around?

Well I do. It has a USB-C cord which you normally plug into your HP laptop. But I didn’t have the power supply for it so I couldn’t use it when I would have needed it. Well, it basically works with a Samsung phone – at least the keyboard and mouse worked. In my 10 second testing the attached HDMI display did not automatically show anything. Maybe there are some phone settings which would need to be changed. I didn’t mess with it at all.

But it’s cool seeing a mouse working. It suddenly paints a mouse pointer on your phone screen which you can move around and click to launch an app.

Apps are often baby implementations

At first I struggled with the Outlook app, trying to use it as though it were my full-blown Outlook client on my PC. It only had one week’s worth of messages, which was pretty limiting since I was out for more than a week. Then I had a lightbulb moment and remembered that the Web version of Outlook worked on my phone. So I switched to using Outlook through the Edge browser – much better for me. That’s https://outlook.office.com/ . I could get full history and therefore do more reliable searching through messages.

Responsive Design work-around

Sometimes the mobile version of a web site looks nice but just doesn’t have the features. Edge has a feature you can choose called View Desktop Site which gives you the “real” web site. Now it may look tiny, forcing you to expand and shrink with two fingers. But at least it will generally work.

Where is Notepad or Notepad++

I didn’t look for an app. I suppose there is one. Sometimes you just want to inspect your clipboard. I settled on pasting into a new draft Outlook email to do my visual inspection of my clipboard.

References and related

I prepared the above solution with one day’s notice. If you had a couple days you might check out the Samsung Dex. I guess it would work for modern Samsung Galaxy phones though I haven’t tried it myself.

The web version of business Outlook, which is a pretty good implementation of the full-blown client, is https://outlook.office.com/

Categories
Admin Linux Network Technologies

The IT detective agency: The mystery of the non-validating DKIM record

Intro

A colleague of mine in another timezone created the necessary DKIM records in Cloudflare for a new mail domain. There was panic as the mail team realized too late these records were not validating. I was called in to help. Unfortunately at the beginning I only had my smartphone to work with. Did you ever try to do this kind of detail work with a smartphone? Don’t.

The details

The smartphone thing is worthy of a separate post. I was getting somewhere, but it is like working with both hands tied behind your back.

So the mail team is telling me the dkim record doesn’t validate and showing me a screenshot of something from mxtoolbox to prove it.

I of course want to know the details so I can verify my mistakes before anyone else gets to – that’s how I roll!

Well, mxtoolbox has a free validator for these dkim records which is pretty useful. Go to Supertool, then click the dropdown and select DKIM. A DKIM record involves a domain and a selector. Here’s a real live example for Hurricane Electric, which uses he.net as its sending mail domain. In their DNS the DKIM txt record looks like this when viewed from dig:

"v=DKIM1; k=rsa; p=MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAonNI5HmoWfZntOsU5G3t eKi70HHBhDMe7himvGBNfq119soydCj7KoR9DsFYAMqCcPghLY29ishIbzMKsCFy 68XN4MWOSrFr+ERDHIuLXcFvaYYQ0oI5HVcViKSX85/YLXe+5JUcf5VsKoBLifNy U1NFA3UPa6MHBIOcD+JVF6F67G9m7t+COhsrhcvl9x" "kNq2NAY0OxbBM+CM+V4p0J 6pgt0PqYGnwd9s3/P7TUD2jY9elJLB5CfIec4DDCROj3MgUyTl2JfBcNy0WGzkEl OpFipd5MMesZvgyIVBsgLY58hTPldYhekkKWlOhpMpYbAi8gnvk+aJv2jZcaYHpJ kLNrri+q2gMeEX30JSoXfYNKx+B6m1Udn7Ig2ngHNVTXgNZlCw6SvbfmwXBE97q5 iG1SOnrgLKQvtgZv08Y7k5sp9+2SfoOS5MSYt" "OTfCbtknUi/VbaU4kVE76jFB0xx 6CAoR1SC9lDJBGvyFMuGvyhOXTiYV44tk1fyrV9Ba4yaKi8dhgHwe9vVbCSK8Ebt CeMXrkS/I3Dc33B6+tM1poC06GVhxElpd8rHiWvNImBuqCWwtGDsXm4ulubTcjvS gglJrB7kl4l3+AcTZn15zCrePl6xHWtL29b9vEy1w7whgExoDHaXZl+Svne9pfZ7 esXNu+mfERmGb56OreCEQQMCAwEAAQ=="

That is the value of the TXT record named henet-20240223-153551._domainkey.he.net

To validate this DKIM record in mxtoolbox we pull out the token in front of _domainkey and refer to it as the selector, and drop the _domainkey and enter it like this:

The problem with the DKIM entry I was assigned to rescue was that the DKIM syntax check was not passing. Yet it looked just like what the mail team requested. What is going on? How can this problem be broken down into smaller steps???

To be continued…

Appendix A
How did I know the exact selector for Hurricane Electric?

I looked at the SMTP headers of an email I received from them. I found this section:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=he.net;
	s=henet-20240223-153551

d must stand for domain and s for selector. This is all considered public information, albeit somewhat obscure. So the domain is he.net and the selector is henet-20240223-153551.
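
Going from a mail header to the DNS name to look up can be sketched in a few lines of Python. This is only an illustration using the header shown above; the function names are made up, and real-world header parsing has more corner cases than this handles.

```python
def parse_dkim_signature(header_value):
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            # Header folding can leave stray whitespace inside a value.
            tags[name.strip()] = "".join(value.split())
    return tags

def dkim_record_name(selector, domain):
    """DNS name of the TXT record which holds the public key."""
    return f"{selector}._domainkey.{domain}"

# The header fragment from the Hurricane Electric email, folded line included.
header = "v=1; a=rsa-sha256; c=relaxed/relaxed; d=he.net;\n\ts=henet-20240223-153551"

tags = parse_dkim_signature(header)
record_name = dkim_record_name(tags["s"], tags["d"])
print(record_name)
```

The printed name is the one to hand to dig as a TXT query.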

Categories
DNS Linux TCP/IP

The IT Detective Agency: the case of the slow dns server responses to tcp

Intro

This case was solved today. Now I just need to find the time to write it up!

I belong to a team which runs many dozens of dns servers. We have basic but thorough monitoring of these servers using both Zabbix and Thousandeyes. One day I noticed a lot of timeout alerts so I began to look into it. One mystery just led to another without coming any closer to a true root cause. There were many dead ends in the hunt. Finally our vendor came through and discovered something…

The details

The upshot are these settings we arrived at for an ISC BIND server:

   tcp-listen-queue 200;
   tcp-clients 600;
   tcp-idle-timeout 10;

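This is in the options section of the named.conf file. That’s it! This is on a four-core server with 16 GB RAM. For comparison, the defaults documented in the BIND 9 reference manual are:

tcp-listen-queue: 10

tcp-clients: 150

tcp-idle-timeout: 300, in units of 100 milliseconds, i.e., 30 seconds (so the 10 above means one second)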

Those defaults will kill you on any reasonably busy server, meaning, one which gets a couple thousand requests per second.
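
To check a server by hand rather than waiting for the monitoring, a DNS query over TCP can be timed with nothing but the Python standard library. This is a minimal sketch and not what Zabbix or Thousandeyes do internally; the function names are invented, and it only builds a plain A-record query with the two-byte length prefix that DNS over TCP requires.

```python
import socket
import struct
import time

def build_dns_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query message (qtype 1 = A record, class IN)."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def frame_for_tcp(message):
    """DNS over TCP prefixes every message with its length as two bytes."""
    return struct.pack("!H", len(message)) + message

def recv_exact(sock, count):
    """Read exactly count bytes or raise if the server hangs up first."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        data += chunk
    return data

def time_tcp_query(server, name, timeout=5.0):
    """Seconds from connect to complete answer; raises an OSError on timeout."""
    payload = frame_for_tcp(build_dns_query(name))
    start = time.monotonic()
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(payload)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        recv_exact(sock, length)
    return time.monotonic() - start
```

Calling time_tcp_query with the address of one of your own servers and a name it can answer, in a loop, gives a feel for whether TCP answers come back promptly, slowly, or not at all.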

To be continued…

Conclusion

We encountered a tough situation on our ISC BIND DNS servers. TCP queries, and only TCP queries, were responded to slowly at best or not at all. After many false starts we found the solution was setting three tcp parameters in the options section of the configuration file: tcp-listen-queue, tcp-clients and tcp-idle-timeout. We’ve never had to mess with those parameters in literally decades of running ISC BIND. Yet we have incontrovertible proof that that is what was needed.

Case: closed!

References and related

A great and very detailed discussion of this type of TCP backlog issue on Red Hat systems is found here: https://access.redhat.com/solutions/30453

Categories
Firewall Network Technologies

The IT Detective Agency: The case of the unreliable WiFi call

Intro

It’s been a while since I have added a case to the canon of IT detective stories which I have personally solved. It’s not that things don’t need resolving. They do! But either they look like what has come before, so there’s nothing new, or they are so new I’m still in the middle of them and you never know if they will ever be solved… Such was the situation with today’s subject: WiFi calling.

WiFi calling, which most people are blissfully ignorant of, can be very necessary if you are in a large building which shields you from cell phone tower signals and does not have any in-building signal boosters. In this situation, as long as you’ve enabled WiFi calling on your phone, it will be smart enough upon seeing no cell signal, to switch to using WiFi, assuming an access point and WiFi is reachable.

Well, such is the case at an office building my company has. And WiFi calling was found to be OK for phones using T-Mobile. But not for Verizon. With Verizon (VZ) phones WiFi calling was at best unpredictable: sometimes the call would go through and sometimes not.

Unfortunately there were a lot of parties involved in the communication path. WLCs (wireless LAN controllers) have access points (APs) connect to them. They in turn tunnel the communication to another site where the anchor controller resides. Then it gets handed off to a perimeter firewall for NATing and egress via Internet routers. The Internet routers have some sort of load-balancing in place. We don’t run them any more the way we used to. A vendor does that now. And firewalls are handled by a different group. And a different group is in charge of mobile devices. The phone also has a GlobalProtect client (GPC) and hence an always-on VPN connection. That part is run by yet another group! So you see how this gets impossibly messy. I realized, however, that I was in a pretty good place – probably the best place compared to anyone else – to do this troubleshooting because I touched many of the groups or had “good friends” there.

What does failure look like?

On my phone, a failed attempt looks like this. I place a call, and it doesn’t go through. It also doesn’t not go through. I just never hear anything. I wait for up to a minute, because, who is going to wait more than a minute to hear something after they’ve dialed the number?

More details

At the site they convinced themselves that whereas one SSID works, a second SSID which actually uses the same path, does not. For my part I wasn’t so sure. Eventually under my fairly extensive testing I could produce the problem every time by rebooting my phone and then placing a WiFi call very quickly afterwards.

Fun aside: how to force WiFi calling even when you have signal

On an Android device go to airplane mode. Your WiFi is then disabled. But you can re-enable your WiFi and airplane mode will stay on! Now when you bring up the built-in voice calling app, you will see the green phone icon with a WiFi icon superimposed over it. That’s how you know you are placing a WiFi call.

But then if I did nothing for about 30 minutes, often my next attempted WiFi call would go through! Go figure. And the call after that would work as well, etc. But maybe a couple hours later the whole thing would break again. I don’t think they were that systematic in their testing.

Verizon to the rescue

After spinning our wheels helplessly we finally got a call with a tech engineer from Verizon who was helpful. Because at some point you think to yourself, the app developer of the phone should be able to instrument the voice app with verbose logging to say what it thinks the problem is. Let’s switch to the firewall, where I have good access to the logs as well as a good colleague willing to grind it out with me. Well this is a Checkpoint firewall and the logs are filled with drops. Checkpoint logging says First packet isn’t SYN. So what the VZ guy said which helped us focus is that you want to look for the tunnels to 14.20.0.0/16 or something like that. Maybe it’s more like 14.20.128.0/17, or something that rhymes with that! In any case, we didn’t believe the First packet isn’t SYN drops were hurting us too much as we get those a lot, yet things just work.

Then there were dns requests to 8.8.8.8. Why? That’s not the dns server we configured in dhcp (another one of my sub-specialties). And even if the right dns server was being used, it was always possible it was hitting a dns firewall rule. So that had to be ruled out. And it did seem dns did not play into this. Then there was the worrisome matter of the vpn tunnel created by GPC. What if, somehow, these packets were going over that tunnel? They shouldn’t, but what if they do? Well, then we should see that traffic in the GPC logs (another of my sub-specialties). We didn’t. So I became somewhat comfortable ruling out GPC.

So back to VZ. The guy said on our test call that he saw the tunnel initially established, then there was no more communication over it. And so the tester did not receive the test call for him. So when we looked for destination 141.207…, yeah we could see IKE and IPSEC communication. We could see a tunnel being established over udp port 500, then further communication to that same destination over udp port 4500. These are the standard ports: udp 500 for IKE, and udp 4500 once NAT traversal is in play. The VZ guy said he did not have access to be able to do a trace on the IKE peer. We could do a packet trace on our firewall however.
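
When staring at a trace full of udp 500 and udp 4500 packets, it helps to know what is what. The sketch below is an illustration only, with an invented function name and made-up sample payloads: it sorts UDP payloads the way the NAT traversal scheme for IPsec (RFC 3948) lays them out, where IKE on port 4500 carries four leading zero bytes, a NAT keepalive is a single 0xFF byte, and anything else on that port is UDP-encapsulated ESP.

```python
def classify_udp_payload(port, payload):
    """Rough label for a UDP payload seen on the standard IKE ports."""
    if port == 500:
        return "IKE"
    if port == 4500:
        if payload == b"\xff":
            return "NAT keepalive"
        if payload[:4] == b"\x00\x00\x00\x00":
            # Four zero bytes are the non-ESP marker in front of an IKE message.
            return "IKE (NAT traversal)"
        # Otherwise the first four bytes are a non-zero ESP SPI.
        return "ESP in UDP"
    return "other"

# Made-up payloads, just to exercise each branch.
samples = [
    (500, b"\x11" * 28),
    (4500, b"\x00\x00\x00\x00" + b"\x22" * 28),
    (4500, b"\xc0\xff\xee\x01" + b"\x33" * 60),
    (4500, b"\xff"),
    (53, b"\x12\x34"),
]
labels = [classify_udp_payload(port, payload) for port, payload in samples]
print(labels)
```

A tunnel which negotiates fine and then goes quiet would show up as IKE packets followed by few or no "ESP in UDP" packets in one direction.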

More testing

So we never did see an official drop in the Checkpoint logs. Still, I began to suspect that firewall and my colleague agreed with me, or at least agreed to try some things. But first, another red herring. The VZ guy suggested we could trace the packets on the phone with pcapdroid or something like that. So I got that running on my phone. But to work it creates its own IKE tunnel, uses completely different IP addressing, and just generally makes it impossible to account for these IKE packets going to VZ.

On Checkpoint you have a general setting for how it will handle “NAT traversal” for IKE connections. It looks like this:

By the way, tracing on the firewall isn’t all that easy since there are two interfaces. We actually were running tcpdump on the inward-facing interface while running fw monitor on the outbound interface! That’s not so easy to coordinate. Neither D nor I had ever done it before. We never did reach that Aha moment where you say, look, the packet destined for the tunnel enters here, and doesn’t go out here. There was just too much competing traffic. But anyway, D wanted to play with the NAT traversal settings, which seemed easier.

First adjustment: aggressive aging

The first thing D did was to turn off aggressive aging. Well, that helped a lot. With that, I was able to place my WiFi calls successfully every time after a reboot!

But this thing is tricky. We were chatting. Some time had passed. I placed another test call. Nope. That one didn’t go through! Drat. We had more homework to do. I had been recording the exact times of the calls pretty carefully. About 16 minutes had elapsed between the two calls.

To be continued…

Conclusion

In one of our most difficult cases, we got WiFi calling working reliably on Verizon phones. There were a lot of parties involved and a lot of false leads: look for asymmetric routing, etc. The real problem was the IKE NAT traversal settings on a Checkpoint firewall. Everyone involved is much happier now.

Case: closed!

References and related

A cogent discussion of the many others having trouble with this is found at this VZ community page: https://community.verizon.com/t5/Other-Network-Discussions/What-are-the-wifi-calling-firewall-ports-and-destination-IP/m-p/1080659