Categories
Linux Perl Raspberry Pi Web Site Technologies

Convert GPS Coordinates into town name or address

Intro

This is a small piece of a larger project – displaying your photos on Google Drive using a Raspberry Pi. That project will require completion of many small investigations, this being just one of them.

I thought, wouldn’t it be cool to ask your photo frame when and where a certain picture was taken? I thought that information was typically embedded into the picture by modern smartphones. Turns out this is disappointingly not the case – at least not on our smartphones, except in a small minority of pictures. But since I got somewhere with my investigation, I wanted to share the results, regardless.

Also, I naively assumed that there surely is a web service that permits one to easily convert GPS coordinates into the name – in text – of the closest town. After all, you can enter GPS coordinates into Google Maps and get back a map showing the exact location. Why shouldn’t it be just as easy to extract the nearest town name as text? Again, this assumption turns out to be faulty. But, I found a way to do it that is not toooo difficult.

Example for Cape May, New Jersey

$ curl -s http://api.geonames.org/address?lat=38.9302957777778&lng=-74.9183310833333&username=drjohns

<geonames>
<address>
<street>Beach Dr</street>
<houseNumber>690</houseNumber>
<locality>Cape May</locality>
<postalcode>08204</postalcode>
<lng>-74.91835</lng>
<lat>38.93054</lat>
<adminCode1>NJ</adminCode1>
<adminName1>New Jersey</adminName1>
<adminCode2>009</adminCode2>
<adminName2>Cape May</adminName2>
<adminCode3/>
<adminCode4/>
<countryCode>US</countryCode>
<distance>0.03</distance>
</address>
</geonames>

The above example used the address service. The results in this case are unusually complete. Sometime the lookups simply fail for no obvious reason, or provide incomplete information, such as a missing locality. In those cases the town name is usually still reported in the adminName2 element. I haven’t checked the address accuracy much, but it seems pretty accurate, like, representing an actual address within 100 yards, usually better, of where the picture was taken.

They have another service, findNearbyPlaceName, which sometimes works even when address fails. However its results are also unpredictable. I was in Merrillville, Indiana and it gave the toponym as Chapel Manor, which is the name of the subdivision! In Virginia it gave the name The Hamlet – still not sure where that came from, but I trust it is some hyper-local name for a section of the town (James City). Just as often it does spit back the town or city name, for instance, Atlantic City. So, it’s better than nothing.

The example for Nantucket

From a browser – here I use curl in the linux command line – you enter:

$ curl -s http://api.geonames.org/findNearbyPlaceName?lat=41.282778&lng=-70.099444&username=drjohns

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<geonames>
<geoname>
<toponymName>Nantucket</toponymName>
<name>Nantucket</name>
<lat>41.28346</lat>
<lng>-70.09946</lng>
<geonameId>4944903</geonameId>
<countryCode>US</countryCode>
<countryName>United States</countryName>
<fcl>P</fcl>
<fcode>PPLA2</fcode>
<distance>0.07534</distance>
</geoname>
</geonames>

So what did we do? For this example I looked up Nantucket in Wikipedia to find its GPS coordinates. Then I used the geonames api to convert those coordinates into the town name, Nantucket.

Note that drjohns is an actual registered username with geonames. I am counting on the unpopularity of my posts to prevent an onslaught of usage as the usage credits are limited for free accounts. If I understood the terms, a few lookups per hour would not be an issue.

I’m finding the PlaceName lookup pretty useless, the address lookup fails about 30% of the time, so I’m thinking as a backstop to use this sort of lookup:

$ curl ‘http://api.geonames.org/extendedFindNearby?lat=41.00050&lng=-74.65329&username=drjohn’

<?xml version=”1.0″ encoding=”UTF-8″ standalone=”no”?>
<geonames>
<address>
<street>Stanhope Rd</street>
<mtfcc>S1400</mtfcc>
<streetNumber>439</streetNumber>
<lat>41.00072</lat>
<lng>-74.6554</lng>
<distance>0.18</distance>
<postalcode>07871</postalcode>
<placename>Lake Mohawk</placename>
<adminCode2>037</adminCode2>
<adminName2>Sussex</adminName2>
<adminCode1>NJ</adminCode1>
<adminName1>New Jersey</adminName1>
<countryCode>US</countryCode>
</address>
</geonames>

Note that gets a reasonably close address, and more importantly, a zipcode. The placename is too local and I will probably discard it. But another lookup can turn a zipcode into a town or city name which is what I am after.

$ curl ‘http://api.geonames.org/postalCodeSearch?country=US&postalcode=07871&username=drjohns’

<?xml version=”1.0″ encoding=”UTF-8″ standalone=”no”?>
<geonames>
<totalResultsCount>1</totalResultsCount>
<code>
<postalcode>07871</postalcode>
<name>Sparta</name>
<countryCode>US</countryCode>
<lat>41.0277</lat>
<lng>-74.6407</lng>
<adminCode1 ISO3166-2=”NJ”>NJ</adminCode1>
<adminName1>New Jersey</adminName1>
<adminCode2>037</adminCode2>
<adminName2>Sussex</adminName2>
<adminCode3/>
<adminName3/>
</code>
</geonames>

See? It was a lot of work, but we finally got the township name, Sparta, returned to us.

Ocean GPS?

I was whale-watching and took some pictures with GPS info. Trying to apply the methods above worked, but just barely. Basically all I could get out of the extended find nearby search was a name field with value North Atlantic Ocean! Well, that makes it sounds like I was on some Titanic-style ocean crossing. In fact I was in the Gulf of Maine a few miles from Provincetown. So they really could have done a better job there… Of course it’s understandable to not have a postalcode and street address and such. But still, bodies of waters have names and geographical boundaries as well. Casinos seem to be the main sponsors of geonames.org, and I guess they don’t care. Yesterday my script came up with a location Earth! But now I see geonames proposed several locations and I only look at the first one. I am creating a refinement which will perform better in such cases. Stay tuned… And…yes…the refinement is done. I had to do a wee bit of xml parsing, which I now do.

To get your own account at geonames.org

The process of getting your own account isn’t too difficult, just a bit squirrelly. For the record, here is what you do.

Go to http://www.geonames.org/login to create your account. It sends an email confirmation. Oh. Be sure to use a unique browser-generated password for this one. The security level is off-the-charts awful – just assume that any and all hackers who want that password are going to get it. It sends you a confirmation email. so far so good. But when you then try to use it in an api call it will tell you that that username isn’t known. This is the tricky part.

So go to https://www.geonames.org/manageaccount . It will say:

Free Web Services
the account is not yet enabled to use the free web services. Click here to enable. 

And that link, in turn is https://www.geonames.org/enablefreewebservice . And having enabled your account for the api web service, the URL, where you’ve put your username in place of drjohns, ought to work!

For a complete overview of all the different things you can find out from the GPS coordinates from geonames, look at this link: https://www.geonames.org/export/ws-overview.html

Working with pictures

Please look at this post for the python code to extract the metadata from an image, including, if available GPS info. I called the python program getinfo.py.

Here’s an actual example of running it to learn the GPS info:

$ ../getinfo.py 20170520_102248.jpg|grep -ai gps

GPSInfo = {0: b'\x02\x02\x00\x00', 1: 'N', 2: (42.0, 2.0, 18.6838), 3: 'W', 4: (70.0, 4.0, 27.5448), 5: b'\x00', 6: 0.0, 7: (14.0, 22.0, 25.0), 29: '2017:05:20'}

I don’t know if it’s good or bad, but the GPS coordinates seem to be encoded in the degrees, minutes, seconds format.

A nice little program to put things together

I call it analyzeGPS.pl and a, using it on a Raspberry Pi, but could easily be adapted to any linux system.

                    
#!/usr/bin/perl
# use in combination with this post https://drjohnstechtalk.com/blog/2020/12/convert-gps-coordinates-into-town-name/
use POSIX;
$DEBUG = 1;
$HOME = "/home/pi";
#$file = "Pictures/20180422_134220.jpg";
while(<>){
$GPS = $date = 0;
$gpsinfo = "";
$file = $_;
open(ANAL,"$HOME/getinfo.py \"$file\"|") || die "Cannot open file: $file!!\n";
#open(ANAL,"cat \"$file\"|") || die "Cannot open file: $file!!\n";
print STDERR "filename: $file\n" if $DEBUG;
while(<ANAL>){
  $postalcode = $town = $name = "";
  if (/GPS/i) {
    print STDERR "GPS: $_" if $DEBUG;
# GPSInfo = {1: 'N', 2: (39.0, 21.0, 22.5226), 3: 'W', 4: (74.0, 25.0, 40.0267), 5: 1.7, 6: 0.0, 7: (23.0, 4.0, 14.0), 29: '2016:07:22'}
   ($pole,$deg,$min,$sec,$hemi,$lngdeg,$lngmin,$lngsec) = /1: '([NS])', 2: \(([\d\.]+), ([\d\.]+), ([\d\.]+)...3: '([EW])', 4: \(([\d\.]+), ([\d\.]+), ([\d\.]+)\)/i;
   print STDERR "$pole,$deg,$min,$sec,$hemi,$lngdeg,$lngmin,$lngsec\n" if $DEBUG;
   $lat = $deg + $min/60.0 + $sec/3600.0;
   $lat = -$lat if $pole eq "S";
   $lng = $lngdeg + $lngmin/60.0 + $lngsec/3600.0;
   $lng = -$lng if $hemi = "W" || $hemi eq "w";
   print STDERR "lat,lng: $lat, $lng\n" if $DEBUG;
   #$placename = `curl -s "$url"|grep -i toponym`;
   next if $lat == 0 && $lng == 0;
# the address API is the most precise
   $url = "http://api.geonames.org/address?lat=$lat\&lng=$lng\&username=drjohns";
   print STDERR "Url: $url\n" if $DEBUG;
   $results = `curl -s "$url"|egrep -i 'street|house|locality|postal|adminName'`;
   print STDERR "results: $results\n" if $DEBUG;
   ($street) = $results =~ /street>(.+)</;
   ($houseNumber) = $results =~ /houseNumber>(.+)</;
   ($postalcode) = $results =~ /postalcode>(.+)</;
   ($state) = $results =~ /adminName1>(.+)</;
   ($town) = $results =~ /locality>(.+)</;
   print STDERR "street, houseNumber, postalcode, state, town: $street, $houseNumber, $postalcode, $state, $town\n" if $DEBUG;
# I think locality is pretty good name. If it exists, don't go  further
   $postalcode = "" if $town;
   if (!$postalcode && !$town){
# we are here if we didn't get interesting results from address reverse loookup, which often happens.
     $url = "http://api.geonames.org/extendedFindNearby?lat=$lat\&lng=$lng\&username=drjohns";
     print STDERR "Address didn't work out. Trying extendedFindNearby instead. Url: $url\n" if $DEBUG;
     $results = `curl -s "$url"`;
# parse results - there may be several objects returned
     $topelemnt = $results =~ /<geoname>/i ? "geoname" : "geonames";
     @elmnts = ("street","streetnumber","lat","lng","locality","postalcode","countrycode","countryname","name","adminName2","adminName1");
     $cnt = xml1levelparse($results,$topelemnt,@elmnts);

     @lati = @{ $xmlhash{lat}};
     @long = @{ $xmlhash{lng}};
# find the closest entry
     $distmax = 1E7;
     for($i=0;$i<$cnt;$i++){
       $dist = ($lat - $lati[$i])**2 + ($lng - $long[$i])**2;
       print STDERR "dist,lati,long: $dist, $lati[$i], $long[$i]\n" if $DEBUG;
       if ($dist < $distmax) {
         print STDERR "dist < distmax condition. i is: $i\n";
         $isave = $i;
       }
     }
     $street = @{ $xmlhash{street}}[$isave];
     $houseNumber = @{ $xmlhash{streetnumber}}[$isave];
     $admn2 = @{ $xmlhash{adminName2}}[$isave];
     $postalcode = @{ $xmlhash{postalcode}}[$isave];
     $name = @{ $xmlhash{name}}[$isave];
     $countrycode = @{ $xmlhash{countrycode}}[$isave];
     $countryname = @{ $xmlhash{countryname}}[$isave];
     $state = @{ $xmlhash{adminName1}}[$isave];
     print STDERR "street, houseNumber, postalcode, state, admn2, name: $street, $houseNumber, $postalcode, $state, $admn2, $name\n" if $DEBUG;
     if ($countrycode ne "US"){
       $state .= " $countryname";
     }
     $state .= " (approximate)";
   }
# turn zipcode into town name with this call
   if ($postalcode) {
     print STDERR "postalcode $postalcode exists, let's convert to a town name\n";
     print STDERR "url: $url\n";
     $url = "http://api.geonames.org/postalCodeSearch?country=US\&postalcode=$postalcode\&username=drjohns";
     $results = `curl -s "$url"|egrep -i 'name|locality|adminName'`;
     ($town) = $results =~ /<name>(.+)</i;
     print STDERR "results,town: $results,$town\n";
   }
   if (!$town) {
# no town name, use adminname2 which is who knows what in general
     print STDERR "Stil no town name. Use adminName2 as next best thing\n";
     $town = $admn2;
   }
   if (!$town) {
# we could be in the ocean! I saw that once, and name was North Atlantic Ocean
     print STDERR "Still no town. Try to use name: $name as last resort\n";
     $town = $name;
   }
   $gpsinfo = "$houseNumber $street $town, $state" if $locality || $town;
   } # end of GPS info exists condition
  } # end loop over ANAL file
  $gpsinfo = $gpsinfo || "No info found";
  print qq(Location: $gpsinfo
);
} # end loop over STDIN

#####################
# function to parse some xml and fill a hash of arrays
sub xml1levelparse{
# build an array of hashes
$string = shift;
# strip out newline chars
$string =~ s/\n//g;
$parentelement = shift;
@elements = @_;
$i=0;
while($string =~ /<$parentelement>/i){
 $i++;
 ($childelements) = $string =~ /<$parentelement>(.+?)<\/$parentelement>/i;
 print STDERR "childelements: $childelements" if $DEBUG;
 $string =~ s/<$parentelement>(.+?)<\/$parentelement>//i;
 print STDERR "string: $string\n" if $DEBUG;
 foreach $element (@elements){
  print STDERR "element: $element\n" if $DEBUG;
  ($value) = $childelements =~ /<$element>([^<]+)<\/$element>/i;
  print STDERR "value: $value\n" if $DEBUG;
  push @{ $xmlhash{$element} }, $value;
 }
} # end of loop over parent elements
return $i;
} # end sub xml1levelparse

Here’s a real example of calling it, one of the more difficult cases:

$ echo -n 20180127_212203.jpg|./analyzeGPS.pl

GPS: GPSInfo = {0: b'\x02\x02\x00\x00', 1: 'N', 2: (41.0, 0.0, 2.75), 3: 'W', 4: (74.0, 39.0, 12.0934), 5: b'\x00', 6: 0.0, 7: (2.0, 21.0, 58.0), 29: '2018:01:28'}
N,41.0,0.0,2.75,W,74.0,39.0,12.0934
lat,lng: 41.0007638888889, -74.6533592777778
Url: http://api.geonames.org/address?lat=41.0007638888889&lng=-74.6533592777778&username=drjohns
results:
street, houseNumber, postalcode, state, town: , , , ,
Address didn't work out. Trying extendedFindNearby instead. Url: http://api.geonames.org/extendedFindNearby?lat=41.0007638888889&lng=-74.6533592777778&username=drjohns
childelements: <address> <street>Stanhope Rd</street> <mtfcc>S1400</mtfcc> <streetNumber>433</streetNumber> <lat>41.00121</lat> <lng>-74.65528</lng> <distance>0.17</distance> <postalcode>07871</postalcode> <placename>Lake Mohawk</placename> <adminCode2>037</adminCode2> <adminName2>Sussex</adminName2> <adminCode1>NJ</adminCode1> <adminName1>New Jersey</adminName1> <countryCode>US</countryCode> </address>string: <?xml version="1.0" encoding="UTF-8" standalone="no"?>
element: street
value: Stanhope Rd
element: streetnumber
value: 433
element: lat
value: 41.00121
element: lng
value: -74.65528
element: locality
value:
element: postalcode
value: 07871
element: countrycode
value: US
element: countryname
value:
element: name
value:
element: adminName2
value: Sussex
element: adminName1
value: New Jersey
dist,lati,long: 3.88818897839883e-06, 41.00121, -74.65528
dist < distmax condition. i is: 0
street, houseNumber, postalcode, state, admn2, name: Stanhope Rd, 433, 07871, New Jersey, Sussex,
postalcode 07871 exists, let's convert to a town name
url: http://api.geonames.org/extendedFindNearby?lat=41.0007638888889&lng=-74.6533592777778&username=drjohns
results,town: <geonames>
<name>Sparta</name>
<adminName1>New Jersey</adminName1>
<adminName2>Sussex</adminName2>
<adminName3/>
</geonames>
,Sparta
Location: 433 Stanhope Rd Sparta, New Jersey (approximate)

Or, if you just want the interesting stuff,

$ echo -n 20180127_212203.jpg|./analyzeGPS.pl 2>/dev/null

Location: 433 Stanhope Rd Sparta, New Jersey (approximate)

Conclusion

An api for reverse lookup of GPS coordinates which returns the nearest address, including town name, is available. I have provided examples of how to use it. It is unreliable, however, and Geonames.org does provide alternatives which have their own drawbacks. In my image gallery, only a minority of my pictures have encoded GPS data, but it is fun to work with them to pluck out the town where they were shot.

I have incorporated this functionality into a Raspberry Pi-based photo frame I am working on.

I have created an example Perl program that analyzes a JPEG image to extract the GPS information and turn it into an address that is remarkably accurate. It is amazing and uncanny to see it at work. It deals with the screwy and inconsistent results returned by the free service, Geonames.org.

References and related

There are lots of different things you can derive given the GPS coordinates using the Geonames api. Here is a list: https://www.geonames.org/export/ws-overview.html

In this photo frame version of mine, I extract all the EXIF metadata which includes the GPS info.

One day my advanced photo frame will hopefully include an option to learn where a photo was taken by interacting with a remote control. Here is the start of that write-up.

You can pay $5 and get a zip codes to cities database in any format. I’m sure they’ve just re-packaged data from elsewhere, but it might be worth it: https://www.uszipcodeslist.com/

For a more professional api, https://smartystreets.com/ looks quite nice. Free level is 250 queries per month, so not too many. But their documentation and usability looks good to me. For this post I was looking for free services and have tried to avoid commercial services.

Categories
Perl Python Raspberry Pi Web Site Technologies

Raspberry Pi photo frame using your pictures on your Google Drive

Editor’s Note

Please note I am putting all my currently active development and latest updates into this newer post: Raspberry Pi photo frame using your pictures on your Google Drive II

Intro

All my spouse’s digital photo frames are either broken or nearly broken – probably she got them from garage sales. Regardless, they spend 99% of the the time black. Now, since I had bought that Raspberry Pi PiDisplay awhile back, and it is underutilized, and I know a thing or two about linux, I felt I could create a custom photo frame with things I already have lying around – a Raspberry Pi 3, a PiDisplay, and my personal Google Drive. We make a point to copy all our cameras’ pictures onto the Google Drive, which we do the old-fashioned, by-hand way. After 17 years of digital photos we have about 40,000 of them, over 200 GB.

So I also felt obliged to create features you will never have in a commercial product, to make the effort worthwhile. I thought, what about randomly picking a few for display from amongst all the pictures, displaying that subset for a few days, and then moving on to a new randomly selected sample of images, etc? That should produce a nice review of all of them over time, eventually. You need an approach like that because you will never get to the end if you just try to display 40000 images in order!

Equipment

This work was done on a Raspberry Pi 3 running Raspbian Lite (more on that later). I used a display custom-built for the RPi, Amazon.com: Raspberry Pi 7″ Touch Screen Display: Electronics), though I believe any HDMI display would do.

The scripts
Here is the master file which I call master.sh.

                    
#!/bin/sh
# DrJ 8/2019
# call this from cron once a day to refesh random slideshow once a day
RANFILE=”random.list”
NUMFOLDERS=20
DISPLAYFOLDER=”/home/pi/Pictures”
DISPLAYFOLDERTMP=”/home/pi/Picturestmp”
SLEEPINTERVAL=3
DEBUG=1
STARTFOLDER=”MaryDocs/Pictures and videos”

echo “Starting master process at “`date`

rm -rf $DISPLAYFOLDERTMP
mkdir $DISPLAYFOLDERTMP

#listing of all Google drive files starting from the picture root
if [ $DEBUG -eq 1 ]; then echo Listing all files from Google drive; fi
rclone ls remote:”$STARTFOLDER” > files

# filter down to only jpegs, lose the docs folders
if [ $DEBUG -eq 1 ]; then echo Picking out the JPEGs; fi
egrep ‘\.[jJ][pP][eE]?[gG]$’ files |awk ‘$1 > 11000 {$1=””; print substr($0,2)}’|grep -i -v /docs/ > jpegs.list

# throw NUMFOLDERS or so random numbers for picture selection, select triplets of photos by putting
# names into a file
if [ $DEBUG -eq 1 ]; then echo Generate random filename triplets; fi
./random-files.pl -f $NUMFOLDERS -j jpegs.list -r $RANFILE

# copy over these 60 jpegs
if [ $DEBUG -eq 1 ]; then echo Copy over these random files; fi
cat $RANFILE|while read line; do
rclone copy remote:”${STARTFOLDER}/$line” $DISPLAYFOLDERTMP
sleep $SLEEPINTERVAL
done

# rotate pics as needed
if [ $DEBUG -eq 1 ]; then echo Rotate the pics which need it; fi
cd $DISPLAYFOLDERTMP; ~/rotate-as-needed.sh
cd ~

# kill any qiv slideshow
if [ $DEBUG -eq 1 ]; then echo Killing old qiv and fbi slideshow; fi
pkill -9 -f qiv
sudo pkill -9 -f fbi
pkill -9 -f m2.pl

# remove old pics
if [ $DEBUG -eq 1 ]; then echo Removing old pictures; fi
rm -rf $DISPLAYFOLDER

mv $DISPLAYFOLDERTMP $DISPLAYFOLDER

#run looping fbi slideshow on these pictures
if [ $DEBUG -eq 1 ]; then echo Start fbi slideshow in background; fi
cd $DISPLAYFOLDER ; nohup ~/m2.pl >> ~/m2.log 2>&1 &

if [ $DEBUG -eq 1 ]; then echo “And now it is “`date`; fi

I call the following script random-files.pl:

                    

#!/usr/bin/perl
use Getopt::Std;
my %opt=();
getopts("c:df:j:r:",\%opt);
$nofolders = $opt{f} ? $opt{f} : 20;
$DEBUG = $opt{d} ? 1 : 0;
$cutoff = $opt{c} ? $opt{c} : 5;
$cutoffS = 60*$cutoff;
$jpegs = $opt{j} ? $opt{j} : "jpegs.list";
$ranpicfile = $opt{r} ? $opt{r} : "jpegs-random.list";
print "d,f,j,r: $opt{d}, $opt{f}, $opt{j}, $opt{r}\n" if $DEBUG;
open(JPEGS,$jpegs) || die "Cannot open jpegs listing file $jpegs!!\n";
@jpegs = ;
# remove newline character
$nopics = chomp @jpegs;
open(RAN,"> $ranpicfile") || die "Cannot open random picture file $ranpicfile!!\n";
for($i=0;$i<$nofolders;$i++) {
  $t = int(rand($nopics-2));
  print "random number is: $t\n" if $DEBUG;
# a lot of our pics follow this naming convention
# 20160831_090658.jpg
  ($date,$time) = $jpegs[$t] =~ /(\d{8})_(\d{6})/;
  if ($date) {
    print "date, time: $date $time\n" if $DEBUG;
# ensure neighboring picture is at least five minutes different in time
    $iPO = $iP = $diff = 0;
    ($hr,$min,$sec) = $time =~ /(\d\d)(\d\d)(\d\d)/;
    $secs = 3600*$hr + 60*$min + $sec;
    print "Pre-pic logic\n";
    while ($diff < $cutoffS) {
      $iP++;
      $priorPic = $jpegs[$t-$iP];
      $Pdate = $Ptime = 0;
      ($Pdate,$Ptime) = $priorPic =~ /(\d{8})_(\d{6})/;
      ($Phr,$Pmin,$Psec) = $Ptime =~ /(\d\d)(\d\d)(\d\d)/;
      $Psecs = 3600*$Phr + 60*$Pmin + $Psec;
      print "hr,min,sec,Phr,Pmin,Psec: $hr,$min,$sec,$Phr,$Pmin,$Psec\n" if $DEBUG;
      $diff = abs($secs - $Psecs);
      print "diff: $diff\n" if $DEBUG;
# end our search if we happened upon different dates
      $diff = 99999 if $Pdate ne $date;
    }
# post-picture logic - same as pre-picture
    print "Post-pic logic\n";
    $diff = 0;
    while ($diff < $cutoffS) {
      $iPO++;
      $postPic = $jpegs[$t+$iPO];
      $Pdate = $Ptime = 0;
      ($Pdate,$Ptime) = $postPic =~ /(\d{8})_(\d{6})/;
      ($Phr,$Pmin,$Psec) = $Ptime =~ /(\d\d)(\d\d)(\d\d)/;
      $Psecs = 3600*$Phr + 60*$Pmin + $Psec;
      print "hr,min,sec,Phr,Pmin,Psec: $hr,$min,$sec,$Phr,$Pmin,$Psec\n" if $DEBUG;
      $diff = abs($Psecs - $secs);
      print "diff: $diff\n" if $DEBUG;
# end our search if we happened upon different dates
      $diff = 99999 if $Pdate ne $date;
    }
  } else {
    $iP = $iPO = 2;
  }
  $priorPic = $jpegs[$t-$iP];
  $Pic = $jpegs[$t];
  $postPic = $jpegs[$t+$iPO];
  print RAN qq($priorPic
$Pic
$postPic
);
}
close(RAN);

Bunch of simple python scripts

I call this one getinfo.py:

                    
#!/usr/bin/python3
import os,sys
from PIL import Image
from PIL.ExifTags import TAGS

for (tag,value) in Image.open(sys.argv[1])._getexif().items():
print (‘%s = %s’ % (TAGS.get(tag), value))

print (‘%s = %s’ % (TAGS.get(tag), value))

And here’s rotate.py:

                    
#!/usr/bin/python3
import PIL, os
import sys
from PIL import Image

picture= Image.open(sys.argv[1])

# if orientation is 6, rotate clockwise 90 degrees
picture.rotate(-90,expand=True).save(“rot_” + sys.argv[1])

While here is rotatecc.py:

                    
#!/usr/bin/python3
import PIL, os
import sys
from PIL import Image

picture= Image.open(sys.argv[1])

# if orientation is 8, rotate counterclockwise 90 degrees
picture.rotate(90,expand=True).save(“rot_” + sys.argv[1])

And rotate-as-needed.sh:

                    
#!/bin/sh
# DrJ 12/2020
# some of our downloaded files will be sideways, and fbi doesn’t auto-rotate them as far as I know
# assumption is that are current directory is the one where we want to alter files
ls -1|while read line; do
echo fileis “$line”
o=`~/getinfo.py “$line”|grep -ai orientation|awk ‘{print $NF}’`
echo orientation is $o
if [ “$o” -eq “6” ]; then
echo “90 clockwise is needed, o is $o”
# rotate and move it
~/rotate.py “$line”
mv rot_”$line” “$line”
elif [ “$o” -eq “8” ]; then
echo “90 counterclock is needed, o is $o”
# rotate and move it
~/rotatecc.py “$line”
mv rot_”$line” “$line”
fi
don

And finally, m2.pl:

                    

#!/usr/bin/perl
# show the pics ; rotate the screen as needed
# for now, assume the display is in a neutral
# orientation at the start
use Time::HiRes qw(usleep);
$DEBUG = 1;
$delay = 6; # seconds between pics
$mdelay = 200; # milliseconds
$mshow = "$ENV{HOME}/mediashow";
$pNames = "$ENV{HOME}/pNames";
# pics are here
$picsDir = "$ENV{HOME}/Pictures";

chdir($picsDir);
system("ls -1 > $pNames");
# forther massage names
open(TMP,"$pNames");
@lines = ;
foreach (@lines) {
  chomp;
  $filesNullSeparated .= $_ . "\0";
}
open(MS,">$mshow") || die "Cannot open mediashow file $mshow!!\n";
print MS $filesNullSeparated;
close(MS);
print "filesNullSeparated: $filesNullSeparated\n" if $DEBUG;
$cn = @lines;
print "$cn files\n" if $DEBUG;
# throw up a first picture - all black. Trick to make black bckgrd permanent
system("sudo fbi -a --noverbose -T 1 $ENV{HOME}/black.jpg");
system("sudo fbi -a --noverbose -T 1 $ENV{HOME}/black.jpg");
sleep(1);
system("sleep 2; sudo killall fbi");
# start infinitely looping fbi slideshow
for (;;) {
# then start slide show
# shell echo cannot work with null character so we need to use a file to store it
    #system("cat $picNames|xargs -0 qiv -DfRsmi -d $delay \&");
    system("sudo xargs -a $mshow -0 fbi -a --noverbose -1 -T 1  -t $delay ");
# fbi runs in background, then exits, so we need to monitor if it's still alive
# wait appropriate estimated amount of time, then look aggressively for fbi
    sleep($delay*($cn - 2));
    for(;;) {
      open(MON,"ps -ef|grep fbi|grep -v grep|") || die "Cannot launch ps -ef!!\n";
      $match = ;
      if ($match) {
        print "got fbi match\n" if $DEBUG > 1;
        } else {
        print "no fbi match\n" if $DEBUG;
# fbi not found
          last;
      }
      close(MON);
      print "usleeping, noexist is $noexit\n" if $DEBUG > 1;
      usleep($mdelay);
    } # end loop testing if fbi has exited
} # close of infinite loop

You’ll need to make these files executable. Something like this should work:

$ chmod +x *.py *.pl *.sh

My crontab file looks like this (you edit crontab using the crontab -e command):

@reboot sleep 25; cd ~ ; ./m2.pl >> ./m2.log 2>&1
24 16 * * * ./master.sh >> ./master.log 2>&1

This invokes master.sh once a day at 4:24 PM to refresh the 60 photos. My refresh took about 13 minutes the other day, but the old slideshow keeps playing until almost the last second, so it’s OK.

The nice thing about this approach is that fbi works with a lightweight OS – Raspbian Lite is fine, you’ll just need to install a few packages. My SD card is unstable or something, so I have to re-install the OS periodically. An install of Raspberry Pi Lite on my RPi 4 took 11 minutes. Anyway, fbi is installed via:

$ sudo apt-get install fbi

But if your RPi is freshly installed, you may first need to do a

$ sudo apt-get update && sudo apt-get upgrade

python image manipulation

The drawback of this approach, i.e., not using qiv, is that we gotta do some image manipulation, for which python is the best candidate. I’m going by memory. I believe I installed python3, perhaps as sudo apt-get install python3. Then I needed pip3: sudo apt-get install python3-pip. Then I needed to install Pillow using pip3: sudo pip3 install Pillow.

m2.pl refers to a black.jpg file. It’s not a disaster to not have that, but under some circumstances it may help. There it is!

Many of my photos do not have EXIF information, yet they can still be displayed. So for those photos running getinfo.py will produce an error (but the processing of the other photos will continue.)

I was originally rotating the display 90 degrees as needed to display the photos with the using the maximum amount of display real estate. But that all broke when I tried to revive it. And the cheap servo motor was noisy. But folks were pretty impressed when I demoed it, because I did it get it the point where it was indeed working correctly.

Picture selection methodology

There are 20 “folders” (random numbers) of three triplets each. The idea is to give you additional context to help jog your memory. The triplets, with some luck, will often be from the same time period.

I observed how many similar pictures are adjacent to each other amongst our total collection. To avoid identical pictures, I require the pictures to be five minutes apart in time. Well, I cheated. I don’t pull out the timestamp from the EXIF data as I should (at least not yet – future enhancement, perhaps). But I rely on a file-naming convention I notice is common – 20201227_134508.jpg, which basically is a timestamp-encoded name. The last six digits are HHMMSS in case it isn’t clear.

Rclone

You must install the rclone package, sudo apt-get install rclone.

Can you configure rclone on a headless Raspberry Pi?

Indeed you can. I know because I just did it. You enable your Pi for ssh access. do the rclone-config (or whatever it’s called) using putty from a Windows 10 system. You’ll get a long Google URL in the course of configuring that you can paste into your browser. You verify it’s you, log into your Google account. Then you get back a url like http://127.0.0.1:5462/another-long-url-string. Well, put that url into your clipboard and in another login window, enter curl clipboard_contents

That’s what I did, not certain it would work, but I saw it go through in my rclone-config window, and that was that!

Don’t want to deal with rclone?

So you want to use a traditional flash drive you plug in to a USB port, just like you have for the commerical photo frames, but you otherwise like my approach of randomizing the picture selection each day? I’m sure that is possible. A mid-level linux person could rip out the rclone stuff I have embedded and replace as needed with filesystem commands. I’m imagining a colossal flash drive with all your tens of thousands of pictures on it where my random selection still adds value. If this post becomes popular enough perhapsI will post exactly how to do it.

Getting started with this

After you’ve done all that, and want to try it out. you can run

$ ./master.sh

First you should see a file called files growing in size – that’s rclone doing its listing. That takes a few minutes. Then it generates random numbers for photo selection – that’s very fast, maybe a second. Then it slowly copies over the selected images to a temporary folder called Picturestmp. That’s the slowest part. If you do a directory listing you should see the number of images in that directory growing slowly, adding maybe three per minute until it reaches 60 of them. Finally the rotation are applied. But even if you didn’t set up your python environment correctly, it doesn’t crash. It effectively skips the rotations. A rotation takes a couple seconds per image. Finally all the images are copied over to the production area, the directory called Pictures; the old slideshow program is “killed,” and the new slideshow starts up. Whole process takes around 15 minutes.

I highly recommend running master.sh by hand as just described to make sure it all works. Probably some of it won’t. I don’t specialize in making recipes, more just guidance. But if you’re feeling really bold you can just power it up and wait a day (because initially you won’t have any pictures in your slideshow) and pray that it all works.

Still missing

I’d like to display a transition image when switching from the current set of photos to the new ones.

Suppressing boot up messages might be nice for some. Personally I think they’re kind of cool – makes it look like you’ve done a lot more techie work than you actually have!

You’re going to get some junk images. I’ve seen where an image is a thumbnail (I guess) and gets blown up full screen so that you see these giant blocks of pixels. I could perhaps magnify those kind of images less.

Movies are going to be tricky so let’s not even go there…

I was thinking about making it a navigation-enabled photo frame, such as integration with a Gameboy controller. You could do some really awesome stuff: Pause this picture; display the location (town or city) where this photo was taken; refresh the slideshow. It sounds fantastical, but I don’t think it’s beyond the capability of even modestly capable hobbyist programmers such as myself.

I may still spin the frame 90 degrees this way an that. I have the servo mounted and ready. Just got to revive the control commands for it.

References and related

This 7″ display is a little small, but it’s great to get you started. It’s $64 at Amazon: Amazon.com: Raspberry Pi 7″ Touch Screen Display: Electronics

I have an older approach using qiv which I lost the files for, and my blog post got corrupted. Hence this new approach.

In this slightly more sophisticated approach, I make a greater effort to separate the photos in time. But I also make a whole bunch of other improvements as well. But it’s a lot more files so it may only be appropriate for a more seasoned RPi command-line user.

My advanced slideshow treatment is beginning to take shape. I just add to it while I develop it, so check it periodically if that is of interest. Raspberry Pi advanced photo frame.

Categories
DNS Perl Raspberry Pi

Domain Services: does Backorder work?

Intro

This is a memoir of my personal experience with trying to obtain a DNS domain that was registered by another person and about to expire. Plus some technical discussion of how whois on linux probably works.

The details

I’ve been watching a particular domain for years now. It’s always been registered at auction sites, and has changed hands at least once, maybe even twice. So i was excited this year when it was about to expire at the end of September. I kept checking via linux whois – figuring, or really more like hoping, that a direct query to the authoritative whois server would not tip off the owner if it were done outside of a web page. The linux command is whois -h whois.epik.com drjohnss.com (ok, that is not the real domain, just using it for the sake of preserving anonymitiy).

So about 10 days after it “expired” – at which point I believe it is very easy for the owner to still renew it – I wanted to increase my chances so I decided to make a bid for it, figuring, the owner would face either my offer or the prospect of getting nothing for the domain or shelling out for the renewal. So I offered $150 which is what it’s worth to me.

To my surprise I got a return email:

Hello John,
Thanks for the inquiry.
This seller will not sell for less than $10K. What is your budget?

Christina

Wow, right? Then I thought for a few minutes? I’ve seen this before – at work. There was this no-name domain which matched something the marketing folks were planning, so we made an offer through a third-party service. The response was to the effect, The seller is not interested in selling, but for $47,000 you could buy it. WTF. You can’t make this stuff up. I don’t have a lot of respect for domainers because frankly, almost all my interactions have been negative. Consider the evidence. At work I constantly get unsolicited offers for company_name.nz. The emails always come from different email addresses to avoid spam filters. That is cyber-squatting. Deplorable. I once got an unsolicited offer for a domain similar to one we owned (without the “s”). I checked it and found it wasn’t even registered! So that con artist was trying to take advantage of our naivete. Scum. Then a month ago I was offered some $ for any GoDaddy account which had been registered years ago and so had access to its API auction service, which you apparently cannot get any longer. Sounds like an invitation to violate the terms of service to me – another dodgy tactic.

So I thought about that statement and decided, that’s just a negotiating tactic to make me cower and think unless I raised my offer to, say, $1000, I wouldn’t stand a chance. I decided not to cave. I am the world’s worst negotiator but here I felt I had somewhat a position of strength given my tepid feelings about the domain and the fact that it had officially expired. My – somewhat flip – response:

Hi Christina,
Thanks for the response. Well, I am content to see it expire so the seller gets $0. I know it’s been doing nothing for years now. I am a private person with no commercial interest in development of the domain. My budget is $200.

Christina’s response:

Thanks John.
I hear you.
I advise you to get the refundable exclusive backorder.
Just buy it and then don’t check it.
Regards,
Christina

So now this Christina lady sounds like she’s on my side seeing I wasn’t a big bucks buyer. At some point it’s a matter of trust. So I plunk down $200 for their backorder service and wait and don’t check.

Christina sends me this encouraging note:

John,
If you cancel the backorder, the fee is refunded.
And checking WHOIS is data we collect and which the registrant can see.
So, best to wait patiently.
Regards,
Christina

She encourages me to be super patient and asks what my plans are for it. My response:

Hi Christina,
Bragging rights at family gatherings, etc.
Then I’ll think about more ambitious things like a private social media site, but I doubt I’ll go there.
Thanks,John

So how did it end up?

Not so good. I eventually broke down and did a single whois check after a couple weeks and found the domain had been renewed. Foiled once again, and out the $200 backorder fee.*

*Technically not out since Christina also said it was refundable. I’m just going to sit on it until next year, and the year after that, …

What is that business model?

I had plenty of days to think about it, and I was trying to square two irreconcilable facts. 1) The seller was going to hold out for big money for a worthless domain, thereby losing money. 2) Yet, presumably, the seller is overall making money. Hmm. So I came up with this hypothesis.

Although to an outsider like myself the seller’s approach is irrational, I have a hypothesis for a business model which could justify it.
My hypothesis for a business model that supports such behavior is that some domainers own hundreds or even thousands of seemingly low-value domains – a domain farm – which they patiently cultivate. In the Internet there is commonly seen the long-tail phenomenon. Chris Anderson described it in a book. So instead of following a normal distribution around the nominal value of an unlikely-sounding domain, the actual value distribution has a long tail on the upside. So, if one owns enough domains, although any one may never get the big offer, it only takes a few big ones a year to hit, make up for all the losers and create positive cash flow. After all a domain is really worth what a buyer is willing to pay, not what the algorithms judge them to be worth. Some people will be willing to pay big.

An industry insider I contacted demurred when asked for confirmation or denial of my hypothesis, but insteadpointed me to this link: https://domaingraduate.com/ . If I understand it correctly, chapter 7, The domain Name Aftermarket, addresses this scenario. But it says it basically doesn’t work the way that I hypothesized. And that plus the other chapters in total present a much, much more complex story. There are business models, of course, but, well, just read it for yourself. I don’t care. I still like my domain farm plus long valuation tail concept.

About whois on linux

I need to investigate further what goes on when a simple whois lookup is done. Like everything, there’s a lot of history and it’s not so straightforward. This somewhat outdated article seems to cover it really well: https://securitytrails.com/blog/whois-lookup . I’m still digesting it myself. I’ve done a trace on port 43 for a whois lookup of drjohnstechtalk.com and see somewhat confounding results – it’s talking to two whois servers, a Verisign one (whois.verisign.com or similar), which provides some minimal information, and one which refuses to provide any information – whois.godaddy.com (GoDaddy is the registrar for this domain). My tenuous conclusion is that whois to Verisign does a static lookup and Verisign has a database which covers all of the .com domains with basic information. More detailed information can be provided by the actual registrar for that domain. But GoDaddy refuses to do that. However, it appears other registrars do accept these requests for details! In particular the registrars which are used by domainers to park their domains. Hence it is entirely possible, even from packet analysis, that a registrar gets tipped off by a linux command-line whois lookup (and therefore could provide metrics back to the registrant about these occurrences.)

Double however

I did still more research on whois, i.e., RTFM type stuff. It looks like there are switches which should turn off lookups on other server, like -r or -R, but when you try them they don’t actually work. But, I enabled verbose mode which shows you the whois servers being queried – no need to do a laborious packet trace – and I discovered that if you run the command this way:

$ whois –verbose -h whois.verisign-grs.com <domain_name>

then the query stays with Verisign’s whois server and there is no data leakage or data sharing with the actual registrar! So, mission accomplished. Note that the Verisign whois server probably only covers .com and .net gTLDs. For others like .io, .us, .info you have to figure out the principal whois server for yourself. Or ask for help in the comments section.

drjwhois makes it easier

I decided to write my own wrapper for whois to make this easier for anyone going down this path. Just bear in mind its limited applicability. It’s aimed at people interested in a domain, probably one on the after market, where they want to know if it’s about to expire or has actually expired, without tipping off the seller. As I said I call it drjwhois.

#!/usr/bin/perl
# DrJ's wrapper for whois - prevents data leakage
# Drj 11/20
$DEBUG = 0;
$domain = lc $ARGV[0];
# These are just the TLDs I consider the most important. Obviously there are thousands. Many do not have a resale market.
#to find the whois server just run whois --verbose
$BIZ = "whois.nic.biz";
$BR = "whois.registro.br";
$CA = "whois.cira.ca";
$CO = "whois.nic.io";
$DE = "whois.denic.de"; # de but whois server does not reveal anything! Must use their web site.
$ENOM = "whois.enom.com"; # biz
$IE = "whois.iedr.ie";
$IN = "whois.registry.in";
$INFO = "whois.afilias.net";
$IO = "whois.nic.io";
$ME = "whois.nic.me";
$ORG = "whois.pir.org";
$RU = "whois.tcinet.ru";
$US = "whois.nic.us";
$Verisign = "whois.verisign-grs.com"; # com, net, edu
%TLDs = ('biz',$BIZ,'br',$BR,'ca',$CA,'com',$Verisign,'me',$ME,'net',$Verisign,'edu',$Verisign,'ie',$IE,'io',$IO,'co',$CO,
'in',$IN,'info',$INFO,'org',$ORG,'ru',$RU,'tv',$ENOM,'us',$US);
if ($DEBUG) {
  foreach $key (keys %TLDs) {
    print $key . " " . $TLDs{"$key"} . "\n";
  }
}
$_ = $domain;
($tld) = /.([^.]+)$/;
print qq(Domain:\t\t$domain
TLD:\t\t$tld
WHOIS server:\t$TLDs{$tld}\n\n);
#$result = whois -h $TLDs{$tld} $domain;
#print $result;
unless ($TLDs{$tld}) {
  print "drjwhois has no information about this TLD. Instead use whois $domain\n";
  exit;
}
open(WHOIS,"whois -h $TLDs{$tld} $domain|") || die "Cannot launch whois -h $TLDs{$tld} $domain!!\n";
while(<WHOIS>) {
  if (/(whois|expir|paid|renewal)/i) {
    print ;
    $exists = 1;
  }
}
print "Domain $domain appears to be unregistered!\n" unless $exists;
print qq(\n\ndrjwhois is designed to only show information about the expiration
date of a domain, and if it has become unregistered, all without
leaking the query to aftermarket sellers such as Sedo, Epik, enom, etc.
If you want full information just use whois $domain
);

Example usage

$ drjwhois johnstechtalk.com

Domain: johnstechtalk.com
TLD:    com
WHOIS server: whois.verisign-grs.com

Registrar WHOIS Server: whois.godaddy.com
Registry Expiry Date: 2021-04-23T00:54:17Z
NOTICE: The expiration date displayed in this record is the date the
currently set to expire. This date does not necessarily reflect the expiration
view the registrar's reported date of expiration for this registration.

drjwhois is designed to only show information about the expiration
date of a domain, and if it has become unregistered, all without
leaking the query to aftermarket sellers such as Sedo, Epik, enom, etc.
If you want full information just use whois johnstechtalk.com

Anyway, I say the write-up is outdated because it’s a lot harder than it was a few years ago to get the registrant information. ICANN was chastened I believe by GDPR (data privacy) concerns and so most of the registrant’s personal details has been yanked, generally speaking. But there are left a few valuable nuggets of information.

How about all those nice web interfaces to whois?

I would personally avoid all the web interfaces registrars offer to whois – they seem to be run by the sales and marketing departments without exception. They almost guarantee data sharing with the registrant in addition to selling you services you don’t want.

Conclusion

My guess is that backorders rarely work out. Mine certainly didn’t. But if you like gambling it has a certain thrill to it since you never know…

If you want to play with the big boys and girls and make some money from buying and selling domains, my impression is that Epik is an honest broker, and that’s important to have when so many are not above coloring outside the lines in this business.

linux whois does indeed provide a way to avoid having your interest in a domain leak out to the owner. Use whois -h whois.verisign-grs.com <domain_name> and you are not giving yourself away.

References and related

An old blog post of mine which describes writing a program to GoDaddy’s api for buying a domain as soon as it becomes available.

Whois – what goes on behind the scenes during a whois lookup: https://securitytrails.com/blog/whois-lookup

Best resource I am aware of which covers the strange virtual world of buying and selling domains for a living.: https://domaingraduate.com/

If you’re dying to try out whois on linux but don’t have access to linux, you could either get a Raspberry Pi, though there is some set up and cost involved there, or install Cygwin on Windows 10, though there is some setup involved in getting the package setup, but at least there’s no cost.

On Centos linux, Raspbian (used by Raspberry Pi) and Cygwin, whois is its own package. On my Centos 8 server it is whois-5.5.1-2.

Categories
Perl

Dear Perl programmer, Here is a lifeline

Pythonizer

If you fit a certain profile: been in IT for > 20 years, managed to crate a few utility scripts in Perl, ut never wrapped your head around the newer and flashier Python, this blog post is for you.

Conversely, if you have grown up with Python and find yourself stuck maintaining some obscure legacy Perl code, this post is also for you.

A friend of mine has written a conceptually cool program that converts Perl programs into Python which he calls a Pythonizer.

I’m sure it won’t do well with special Perl packages and such. In fact it is an alpha release I think. But perhaps for those scripts which use the basic built-in Perl functions and operations, it will do the job.

When I get a chance to try it myself I will give some more feedback here. I have a perfect example in mind, i.e., a self-contained little Perl script which ought to work if anything will.

Conclusion

Old Perl programs have been given new life by Pythonizer, which can convert Perl programs into Python.

References and related

https://github.com/softpano/pythonizer

Perl is not a dead language after all. Work continues on Perl 7, which will be known as v5.32. Should be ready next year: https://www.perl.com/article/announcing-perl-7/?ref=alian.info

Categories
Admin Perl

Counting active leases on an old ISC DHCP server

Intro
Checkpoint Gaia offers a DHCP service, but it ias based on a crude and old dhcp daemon implementation frmo ISC. Doesn’t give you much. Mostly just the file /var/lib/dhcpd/dhcpd.leases, which it constantly updates. A typical dhcp client entry looks like this:

 
lease 10.24.69.22 {
  starts 5 2018/11/16 22:32:59;
  ends 6 2018/11/17 06:32:59;
  binding state active;
  next binding state free;
  hardware ethernet 30:d9:d9:20:ca:4f;
  uid "\0010\331\331 \312O";
  client-hostname "KeNoiPhone";
}


The details

So I modified a perl script to take all those lines and make sense of them.
I called it lease-examine.pl.
Here it is

#!/usr/bin/perl
# from https://askubuntu.com/questions/219609/how-do-i-show-active-dhcp-leases - DrJ 11/15/18
 
my $VERSION=0.03;
 
##my $leases_file = "/var/lib/dhcpd/dhcpd.leases";
my $leases_file = "/tmp/dhcpd.leases";
 
##use strict;
use Date::Parse;
 
my $now = time;
##print $now;
##exit;
# 12:22 PM 11/15/18 EST
#my $now = "1542302555";
my %seen;       # leases file has dupes (because logging failover stuff?). This hash will get rid of them.
 
open(L, $leases_file) or die "Cant open $leases_file : $!\n";
undef $/;
my @records = split /^lease\s+([\d\.]+)\s*\{/m, <L>;
shift @records; # remove stuff before first "lease" block
 
## process 2 array elements at a time: ip and data
foreach my $i (0 .. $#records) {
    next if $i % 2;
    ($ip, $_) = @records[$i, $i+1];
    ($ip, $_) = @records[$i, $i+1];
 
    s/^\n+//;     # && warn "leading spaces removed\n";
    s/[\s\}]+$//; # && warn "trailing junk removed\n";
 
    my ($s) = /^\s* starts \s+ \d+ \s+ (.*?);/xm;
    my ($e) = /^\s* ends   \s+ \d+ \s+ (.*?);/xm;
 
    ##my $start = str2time($s);
    ##my $end   = str2time($e);
    my $start = str2time($s,UTC);
    my $end   = str2time($e,UTC);
 
    my %h; # to hold values we want
 
    foreach my $rx ('binding', 'hardware', 'client-hostname') {
        my ($val) = /^\s*$rx.*?(\S+);/sm;
        $h{$rx} = $val;
    }
 
    my $formatted_output;
 
    if ($end && $end < $now) {
        $formatted_output =
            sprintf "%-15s : %-26s "              . "%19s "         . "%9s "     . "%24s    "              . "%24s\n",
                    $ip,     $h{'client-hostname'}, ""              , $h{binding}, "expired"               , scalar(localti
me $end);
    }
    else {
        $formatted_output =
            sprintf "%-15s : %-26s "              . "%19s "         . "%9s "     . "%24s -- "              . "%24s\n",
                    $ip,     $h{'client-hostname'}, "($h{hardware})", $h{binding}, scalar(localtime $start), scalar(localti
me $end);
    }
 
    next if $seen{$formatted_output};
    $seen{$formatted_output}++;
    print $formatted_output;
}

Even that script produces a thicket of confusing information. So then I further process it. I call this script dhcp-check.sh:

#!/bin/sh
# DrJ 11/15/18
# bring over current dhcp lease file from firewall FW-1
date
echo fetching lease file dhcpd.leases
scp admin@FW-1:/var/lib/dhcpd/dhcpd.leases /tmp
# analyze it. this should show us active leases
echo analyze dhcpd.leases
DIR=`dirname $0`
$DIR/lease-examine.pl|grep active|grep -v expired > /tmp/intermed-results
# intermed-results looks like:
#10.24.76.124   : "android-7fe22a415ce21c55" (50:92:b9:b8:92:a0)    active Thu Nov 15 11:32:13 2018 -- Thu Nov 15 15:32:13 2018
#10.24.76.197   : "android-283a4cb47edf3b8c" (98:39:8e:a6:4f:15)    active Thu Nov 15 11:37:23 2018 -- Thu Nov 15 15:32:14 2018
#10.24.70.236   : "other-Phone"            (38:25:6b:79:31:60)    active Thu Nov 15 11:32:24 2018 -- Thu Nov 15 15:32:24 2018
#10.24.74.133   : "iPhone-de-Lucia"          (34:08:bc:51:0b:ae)    active Thu Nov 15 07:32:26 2018 -- Thu Nov 15 15:32:26 2018
#exit
# further processing. remove the many duplicate lines
echo count active leases
awk '{print $1}' /tmp/intermed-results|sort -u|wc -l > /tmp/dhcp-active-count
echo count is `cat /tmp/dhcp-active-count`

And that script gives my what I believe is an accurate count of the active leases. I run it every 10 minutes from SiteScope and voila, we have a way to make sure we’re coming close to running out of IP addresses.

Categories
DNS Linux Perl Raspberry Pi Web Site Technologies

Roll your own domain drop catching service using GoDaddy

Intro
I’m after a particular domain and have been for years. But as a matter of pride I don’t want to overpay for it, so I don’t want to go through an auction. There are services that can help grab a DNS domain immediately after it expires, but they all want $$. That may make sense for high-demand domains. Mine is pretty obscure. I want to grab it quickly – perhaps within a few seconds after it becomes available, but I don’t expect any competition for it. That is a description of domain drop catching.

Since I am already using GoDaddy as my registrar I thought I’d see if they have a domain catching service. They don’t which is strange because they have other specialized domain services such as domain broker. They have a service which is designed for much the same purpose, however, called backorder. That creates an auction bid for the domain before it has expired. The cost isn’t too bad, but since I started down a different path I will roll my own. Perhaps they have an API which can be used to create my own domain catcher? It turns out they do!

It involves understanding how to read a JSON data file, which is new to me, but otherwise it’s not too bad.

The domain lifecycle
This graphic from ICANN illustrates it perfectly for your typical global top-level domain such as .com, .net, etc:
gtld-lifecycle

To put it into words, there is the
initial registration,
optional renewals,
expiration date,
auto-renew grace period of 0 – 45 days,
redemption grace period of 30 days,
pending delete of 5 days, and then
it’s released and available.

So in domain drop catching we are keenly interested in being fully prepared for the pending delete five day window. From an old discussion I’ve read that the precise time .com domains are released is usually between 2 -3 PM EST.

A word about the GoDaddy developer site
It’s developer.godaddy.com. It looks like one day it will be a great site, but for now it is wanting in some areas. Most of the menu items are duds and are placeholders. Really there are only three (mostly) working sections: get started, documentation and demo. Get started is only a few words and one slender snippet of Ajax code, and the demo itself also extremely limited, so the only real resource they provide is Documentation. Documentation is designed as an active documentation that you can try out functions with your data. You run it and it shows you all the needed request headers and data as well as the received response. The thing is that it’s very finicky. It’s supposed to show all the available functions but I couldn’t get it to work under Firefox. And with Internet Explorer/Edge it only worked about half the time. It seems to help to access it with a newly launched browser. The documentation, as good as it is, leaves some things unsaid. I have found:

https://api.ote-godaddy.com/ – use for TEST. Maybe ote stands for optional test environment?
https://api.godaddy.com/ – for production (what I am calling PROD)

The TEST environment does not require authentication for some things that PROD does. This shell script for checking available domains, which I call available-test.sh, works in TEST but not in PROD:

#!/bin/sh
# pass domain as argument
# apparently no AUTH is rquired for this one
curl -k 'https://api.ote-godaddy.com/v1/domains/available?domain='$1'&amp;checkType=FAST&amp;forTransfer=false'

In PROD I had to insert the authorization information – the key and secret they showed me on the screen. I call this script available.sh.

#!/bin/sh
# pass domain as argument
curl -s -k -H 'Authorization: sso-key *******8m_PwFAffjiNmiCUrKe******:**FF73L********' 'https://api.godaddy.com/v1/domains/available?domain='$1'&amp;checkType=FULL&amp;forTransfer=false'

I found that my expiring domain produced different results about five days after expiring if I used checkType of FAST versus checkType of FULL – and FAST was wrong. So I learned you have to use FULL to get an answer you can trust!

Example usage of an available domain

$ ./available.sh dr-johnstechtalk.com

{"available":true,"domain":"dr-johnstechtalk.com","definitive":false,"price":11990000,"currency":"USD","period":1}

2nd example – a non-available domain
$ ./available.sh drjohnstechtalk.com

{"available":false,"domain":"drjohnstechtalk.com","definitive":true,"price":11990000,"currency":"USD","period":1}

Example JSON file
I had to do a lot of search and replace to preserve my anonymity, but I feel this post wouldn’t be complete without showing the real contents of my JSON file I am using for both validate, and, hopefully, as the basis for my API-driven domain purchase:

{
  "domain": "dr-johnstechtalk.com",
  "renewAuto": true,
  "privacy": false,
  "nameServers": [
  ],
  "consent": {
    "agreementKeys": ["DNRA"],
    "agreedBy": "50.17.188.196",
    "agreedAt": "2016-09-29T16:00:00Z"
  },
  "period": 1,
  "contactAdmin": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    },
    "phone": "+1.5555551212"
  },
  "contactBilling": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    },
    "phone": "+1.5555551212"
  },
  "contactRegistrant": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "phone": "+1.5555551212",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    }
  },
  "contactTech": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "phone": "+1.5555551212",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    }
  }
}

Note the agreementkeys value: DNRA. GoDaddy doesn’t document this very well, but that is what you need to put there! Note also that the nameservers are left empty. I asked GoDaddy and that is what they advised to do. The other values are pretty much what you’d expect. I used my own server’s IP address for agreedBy – use your own IP. I don’t know how important it is to get the agreedAt date close to the current time. I’m going to assume it should be within 24 hours of the current time.

How do we test this JSON input file? I wrote a validate script for that I call validate.sh.

#!/bin/sh
# DrJ 9/2016
# godaddy-json-register was built using GoDaddy's documentation at https://developer.godaddy.com/doc#!/_v1_domains/validate
jsondata=`tr -d '\n' &lt; godaddy-json-register`
 
curl -i -k -H 'Authorization: sso-key *******8m_PwFAffjiNmiCUrKe******:**FF73L********' -H 'Content-Type: application/json' -H 'Accept: application/json' -d "$jsondata" https://api.godaddy.com/v1/domains/purchase/validate

Run the validate script
$ ./validate.sh

HTTP/1.1 100 Continue
 
HTTP/1.1 100 Continue
Via: 1.1 api.godaddy.com
 
HTTP/1.1 200 OK
Date: Thu, 29 Sep 2016 20:11:33 GMT
X-Powered-By: Express
Vary: Origin,Accept-Encoding
Access-Control-Allow-Credentials: true
Content-Type: application/json; charset=utf-8
ETag: W/"2-mZFLkyvTelC5g8XnyQrpOw"
Via: 1.1 api.godaddy.com
Transfer-Encoding: chunked

Revised versions of the above scripts
So we can pass the domain name as argument I revised all the scripts. Also, I provide an agreeddAt date which is current.

The data file: godaddy-json-register

{
  "domain": "DOMAIN",
  "renewAuto": true,
  "privacy": false,
  "nameServers": [
  ],
  "consent": {
    "agreementKeys": ["DNRA"],
    "agreedBy": "50.17.188.196",
    "agreedAt": "DATE"
  },
  "period": 1,
  "contactAdmin": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    },
    "phone": "+1.5555551212"
  },
  "contactBilling": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    },
    "phone": "+1.5555551212"
  },
  "contactRegistrant": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "phone": "+1.5555551212",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    }
  },
  "contactTech": {
    "nameFirst": "Dr","nameLast": "John",
    "email": "dr-john@gmail.com",
    "phone": "+1.5555551212",
    "addressMailing": {
      "address1": "555 Piney Drive",
      "city": "Smallville","state": "New Jersey","postalCode": "55555",
      "country": "US"
    }
  }
}

validate.sh

#!/bin/sh
# DrJ 10/2016
# godaddy-json-register was built using GoDaddy's documentation at https://developer.godaddy.com/doc#!/_v1_domains/validate
# pass domain as argument
# get date into accepted format
domain=$1
date=`date -u --rfc-3339=seconds|sed 's/ /T/'|sed 's/+.*/Z/'`
jsondata=`tr -d '\n' &lt; godaddy-json-register`
jsondata=`echo $jsondata|sed 's/DATE/'$date'/'`
jsondata=`echo $jsondata|sed 's/DOMAIN/'$domain'/'`
#echo date is $date
#echo jsondata is $jsondata
curl -i -k -H 'Authorization: sso-key *******8m_PwFAffjiNmiCUrKe******:**FF73L********' -H 'Content-Type: application/json' -H 'Accept: application/json' -d "$jsondata" https://api.godaddy.com/v1/domains/purchase/validate

available.sh
No change. See listing above.

purchase.sh
Exact same as validate.sh, with just a slightly different URL. I need to test that it really works, but based on my reading I think it will.

#!/bin/sh
# DrJ 10/2016
# godaddy-json-register was built using GoDaddy's documentation at https://developer.godaddy.com/doc#!/_v1_domains/purchase
# pass domain as argument
# get date into accepted format
domain=$1
date=`date -u --rfc-3339=seconds|sed 's/ /T/'|sed 's/+.*/Z/'`
jsondata=`tr -d '\n' &lt; godaddy-json-register`
jsondata=`echo $jsondata|sed 's/DATE/'$date'/'`
jsondata=`echo $jsondata|sed 's/DOMAIN/'$domain'/'`
#echo date is $date
#echo jsondata is $jsondata
curl -s -i -k -H 'Authorization: sso-key *******8m_PwFAffjiNmiCUrKe******:**FF73L********' -H 'Content-Type: application/json' -H 'Accept: application/json' -d "$jsondata" https://api.godaddy.com/v1/domains/purchase

Putting it all together
Here’s a looping script I call loop.pl. I switched to perl because it’s easier to do certain control operations.

#!/usr/bin/perl
#DrJ 10/2016
$DEBUG = 0;
$status = 0;
open STDOUT, '&gt;', "loop.log" or die "Can't redirect STDOUT: $!";
                   open STDERR, "&gt;&amp;STDOUT"     or die "Can't dup STDOUT: $!";
 
                   select STDERR; $| = 1;      # make unbuffered
                   select STDOUT; $| = 1;      # make unbuffered
# edit this and change to your about-to-expire domain
$domain = "dr-johnstechtalk.com";
while ($status != 200) {
# show that we're alive and working...
  print "Now it's ".`date` if $i++ % 10 == 0;
  $hr = `date +%H`;
  chomp($hr);
# run loop more aggressively during times of day we think Network Solutions releases domains back to the pool, esp. around 2 - 3 PM EST
  $sleep = $hr &gt; 11 &amp;&amp; $hr &lt; 16 ? 1 : 15;
  print "Hr,sleep: $hr,$sleep\n" if $DEBUG;
  $availRes = `./available.sh $domain`;
# {"available":true,"domain":"dr-johnstechtalk.com","definitive":false,"price":11990000,"currency":"USD","period":1}
  print "$availRes\n" if $DEBUG;
  ($available) = $availRes =~ /^\{"available":([^,]+),/;
  print "$available\n" if $DEBUG;
  if ($available eq "false") {
    print "test comparison OP for false result\n" if $DEBUG;
  } elsif ($available eq "true") {
# available value of true is extremely unreliable with many false positives. Confirm availability by making a 2nd call
    print "available.sh results: $availRes\n";
    $availRes = `./available.sh $domain`;
    print "available.sh re-test results: $availRes\n";
    ($available2) = $availRes =~ /^\{"available":([^,]+),/;
    next if $available2 eq "false";
# We got two available eq true results in a row so let's try to buy it!
    print "$domain is very likely available. Trying to buy it at ".`date`;
    open(BUY,"./purchase.sh $domain|") || die "Cannot run ./purchase.pl $domain!!\n";
    while() {
# print out each line so we can analyze what happened
      print ;
# we got it if we got back
# HTTP/1.1 200 OK
      if (/1.1 200 OK/) {
        print "We just bought $domain at ".`date`;
        $status = 200;
     }
    } # end of loop over results of purchase
    close(BUY);
    print "\n";
    exit if $status == 200;
  } else {
    print "available is neither false nor true: $available\n";
  }
  sleep($sleep);
}

Running the loop script
$ nohup ./loop.pl > loop.log 2>&1 &
Stopping the loop script
$ kill ‐9 %1

Description of loop.pl
I gotta say this loop script started out as a much simpler script. I fortunately started on it many days before my desired domain actually became available so I got to see and work out all the bugs. Contributing to the problem is that GoDaddy’s API results are quite unreliable. I was seeing a lot of false positives – almost 20%. So I decided to require two consecutive calls to available.sh to return true. I could have required available true and definitive true, but I’m afraid that will make me late to the party. The API is not documented to that level of detail so there’s no help there. But so far what I have seen is that when available incorrectly returns true, simultaneously definitive becomes false, whereas all other times definitive is true.

Results of running an earlier and simpler version of loop.pl

This shows all manner of false positives. But at least it never allowed me to buy the domain when it wasn’t available.

Now it's Wed Oct  5 15:20:01 EDT 2016
Now it's Wed Oct  5 15:20:19 EDT 2016
Now it's Wed Oct  5 15:20:38 EDT 2016
available.sh results: {"available":true,"domain":"dr-johnstechtalk.com","definitive":false,"price":11990000,"currency":"USD","period":1}
dr-johnstechtalk.com is available. Trying to buy it at Wed Oct  5 15:20:46 EDT 2016
HTTP/1.1 100 Continue
 
HTTP/1.1 100 Continue
Via: 1.1 api.godaddy.com
 
HTTP/1.1 422 Unprocessable Entity
Date: Wed, 05 Oct 2016 19:20:47 GMT
X-Powered-By: Express
Vary: Origin,Accept-Encoding
Access-Control-Allow-Credentials: true
Content-Type: application/json; charset=utf-8
ETag: W/"7d-O5Dw3WvJGo8h30TqR7j8zg"
Via: 1.1 api.godaddy.com
Transfer-Encoding: chunked
 
{"code":"UNAVAILABLE_DOMAIN","message":"The specified `domain` (dr-johnstechtalk.com) isn't available for purchase","name":"ApiError"}
Now it's Wed Oct  5 15:20:58 EDT 2016
Now it's Wed Oct  5 15:21:16 EDT 2016
Now it's Wed Oct  5 15:21:33 EDT 2016
available.sh results: {"available":true,"domain":"dr-johnstechtalk.com","definitive":false,"price":11990000,"currency":"USD","period":1}
dr-johnstechtalk.com is available. Trying to buy it at Wed Oct  5 15:21:34 EDT 2016
HTTP/1.1 100 Continue
 
HTTP/1.1 100 Continue
Via: 1.1 api.godaddy.com
 
HTTP/1.1 422 Unprocessable Entity
Date: Wed, 05 Oct 2016 19:21:36 GMT
X-Powered-By: Express
Vary: Origin,Accept-Encoding
Access-Control-Allow-Credentials: true
Content-Type: application/json; charset=utf-8
ETag: W/"7d-O5Dw3WvJGo8h30TqR7j8zg"
Via: 1.1 api.godaddy.com
Transfer-Encoding: chunked
 
{"code":"UNAVAILABLE_DOMAIN","message":"The specified `domain` (dr-johnstechtalk.com) isn't available for purchase","name":"ApiError"}
Now it's Wed Oct  5 15:21:55 EDT 2016
Now it's Wed Oct  5 15:22:12 EDT 2016
Now it's Wed Oct  5 15:22:30 EDT 2016
available.sh results: {"available":true,"domain":"dr-johnstechtalk.com","definitive":false,"price":11990000,"currency":"USD","period":1}
dr-johnstechtalk.com is available. Trying to buy it at Wed Oct  5 15:22:30 EDT 2016
...

These results show why i had to further refine the script to reduce the frequent false positives.

Review
What have we done? Our looping script, loop.pl, loops more aggressively during the time of day we think Network Solutions releases expired .com domains (around 2 PM EST). But just in case we’re wrong about that we’ll run it anyway at all hours of the day, but just not as quickly. So during the aggressive period we sleep just one second between calls to available.sh. When the domain finally does become available we call purchase.sh on it and exit and we write some timestamps and the domain we’ve just registered to our log file.

Performance
Miserable. This API seems tuned for relative ease-of-use, not speed. The validate call often takes, oh, say 40 seconds to return! I’m sure the purchase call will be no different. For a domainer that’s a lifetime. So any strategy that relies on speed had better turn to a registrar that’s tuned for it. GoDaddy I think is more aiming at resellers of their services.

Don’t have a linux environment handy?
Of course I’m using my own Amazon AWS server for this, but that needn’t be a barrier to entry. I could have used one of my Raspberry Pi’s. Probably even Cygwin on a Windows PC could be made to work.

Appendix A
How to remove all newline characters from your JSON file

Let’s say you have a nice JSON file which was created for you from the Documentation exercises called godaddy-json-register. It will contain lots of newline (“\n”) characters, assuming you’re using a Linux server. Remove them and put the output into a file called compact-json:

$ tr ‐d ‘\n'<godaddy‐json‐register>compact‐json

I like this because then I can still use curl rather than wget to make my API calls.

Appendix B
What an expiring domain looks like in whois

Run this from a linux server
$ whois <expiring‐domain.com>

Domain Name: expiring-domain.com
...
Creation Date: 2010-09-28T15:55:56Z
Registrar Registration Expiration Date: 2016-09-27T21:00:00Z
...
Domain Status: clientDeleteHold
Domain Status: clientDeleteProhibited
Domain Status: clientTransferProhibited
...

You see that Domain Status: clientDeleteHold? You don’t get that for regular domains whose registration is still good. They’ll usually have the two lines I show below that, but not that one. This is shown for my desired domain just a few days after its official expiration date.

2020 update

Four years later, still hunting for that domain – I am very patient! So I dusted off the program described here. Suprisingly, it all still works. Except maybe the JSON file. The onlything wrong with that was the lack of nameservers. I added some random GoDaddy nameservers and it seemed all good.

Conclusion
We show that GoDaddy’s API works and we provide simple scripts which can automate what is known as domain dropcatching. This approach should attempt to register a domain within a cople seconds of its being released – if we’ve done everything right. The GoDaddy API results are a little unstable however.

References and related
If you don’t mind paying, Fabulous.com has a domain drop catching service.
ICANN’s web site with the domain lifecycle infographic.
GoDaddy’s API documentation: http://developer.godaddy.com/
More about Raspberry Pi: http://raspberrypi.org/
I really wouldn’t bother with Cygwin – just get your hands on real Linux environment.
Curious about some of the curl options I used? Just run curl ‐‐help. I left out the description of the switches I use because it didn’t fit into the narrative.
Something about my Amazon AWS experience from some years ago.
All the perl tricks I used would take another blog post to explain. It’s just stuff I learned over the years so it didn’t take much time at all.
People who buy and sell domains for a living are called domainers. They are professionals and my guide will not make you competitive with them.

Categories
Linux Perl

Solution to NPR puzzle: capitals and cities

Problem statement
Take a US state capital. Drop a letter. Re-arrange the remaining letters to get the name of a US city. There are two answers.

I know Wlll Shortz says he makes up problems that are not programmable, but he goofed up this time, big time. This one is eminently programmable. I am not even a programmer, more a scripter. I don’t want to have an unfair advantage so I’m sharing my Perl scripts that solve this problem.

The setup
Find a Linux machine. A Raspberry Pi makes a great economical choice if you don’t have anything like that.

Scrape the 50 state capitals from a web page after arriving at a page (I used Wikipedia) that lists them. By scrape I mean copy-and-paste into a file in a terminal window of your Linux server.

Same for US cities. I went to a more obscure listing and picked up the 666 of the largest cities from a listing of the top 1000.

Cleaning up the Capitals
Here’s a script for processing those capital cities. The input file contains lines like these first two:

Alabama         AL      1819    Montgomery      1846    155.4   2       205,764         374,536         Birmingham is the s
tate's largest city
Alaska  AK      1959    Juneau  1906    2716.7  3       31,275          Juneau is the largest capital by land area. Anchora
ge is the state's largest city.
...

I called it cap-fltr.pl

#!/usr/bin/perl
while(<STDIN>) {
  ($cap) = /\d{4}\s+(\D+)\s+\d{4}/;
  $cap = lc $cap;
  $cap =~ s/\s//g;
  print "$cap\n";
}

Create a clean listing of capitals like this:

$ ./cap-fltr.pl < capitals.orig > capitals

assuming, that is, that you dumped your paste buffer of scraped capitals and related information into a file called capitals.orig. So that creates a file called capitals looking like this:

montgomery
juneau
phoenix
littlerock
sacramento
denver
...

Cleaning our cities
My scraped cities file morecities.orig looks like this:

1       New York - city         New York        8,491,079       5.9%
2       Los Angeles - city      California      3,928,864       6.0%
3       Chicago - city  Illinois        2,722,389       -5.9%
4       Houston - city  Texas   2,239,558       13.2%
5       Philadelphia - city     Pennsylvania    1,560,297       3.0%
6       Phoenix - city  Arizona         1,537,058       15.8%
...

These can be cleaned up with this script, which I call morecities-fltr.pl:

#!/usr/bin/perl
while(<STDIN>) {
  ($city) = /^\d+\s+([^-]+)\s-/;
  $city = lc $city;
  $city =~ s/\s//g;
  if ($city =~ /st\./) {
    $form1 = $city;
    $form1 =~ s/\.//g;
    print "$form1\n";
  }
  $city =~ s/st\./saint/;
  print "$city\n";
}

Note in this and the capitals filter I get rid of space, lowercase everything and here I consider st and saint as alternate forms, just in case. Those puzzle-masters can be tricky!

So the cleanup goes into a cities file with a command like this:

$ ./morecities-fltr.pl < morecities.orig > cities

Finally the program that solves the puzzle…
The solution script, which I called sol2.pl, isn’t too bad if you know a tiny bit of Perl. It is this:

#!/usr/bin/perl
$DEBUG = 0;
open(CAPITALS,"capitals");
open(CITIES,"cities");
@cities = <CITIES>;
@capitals = <CAPITALS>;
foreach $capital (@capitals) {
  chomp($capital);
  print "capital: $capital\n" if $DEBUG;
  $lenCap = length $capital;
  foreach $city (@cities) {
    chomp($city);
    print "city: $city\n" if $DEBUG;
    $lenCity = length $city;
    next unless $lenCap == $lenCity + 1;
# put the individual letters of the city into an array
    @citychars = split //, $city;
    $capitalcopy = $capital;
# the following would be a more crude way to do it, leaving us results to be picked over by hand
    #$capitalcopy =~ s/[$city]//g;
# this is the heart of it - eat the characters of the capital away, one-by-one, with the characters of the city
# this provides for an exact, no-thinking solution
    foreach $citychar (@citychars) {
      $capitalcopy =~ s/$citychar//;
    }
    print "capitalcopy: $capitalcopy\n" if $DEBUG;
# solution occurs when the copy of the capital is left with excatly one character
    print "capital, city: $capital, $city\n" if $capitalcopy =~ /^\w$/;
  }
}

With the comments it’s pretty self-documenting.

I’ll publish my answer after the deadline. you have to do some work after all!

Python approach
This hasn’t been thoroughly tested yet. A friend has suggested the following python code might work.

current_city = ""
current_capital = ""
missed_letters = 0
 
 
for count, capital in enumerate(capitals):
    missed_letters = 0
    current_capital = capital
    print current_capital, "current capital"
 
    for count, city in enumerate(cities):
        current_city = city
        print current_city, "current city"
 
        for count, letter in enumerate(capital):
            while missed_letters <2:
                if missed_letters <2 and len(city)== count+1:
                    print current_capital, "Take one letter from this capital"
                    print current_city, "Rearrange the letters in the capital to get this city"
 
                elif letter in city:
                    pass
                    print "letter in the capital matched the city"
 
                elif letter not in city:
                    missed_letters+=1
                    print "letters didn't match any in the word"

My answer
$ ./sol2.pl

capital, city: trenton, renton
capital, city: salem, mesa
capital, city: salem, ames

Actual answer
salem -> mesa
st paul -> tulsa

This illustrates a good lesson. Program, yes, but don’t turn off your brain. I was aware of possible saint/st spelling pitfalls, but I only applied it to the cities list. For some reason I overlooked applying this ambiguity to the capitals list! If add stpaul to my capitals list the program finds tulsa instantly – I just tried it.

If there’s any satisfaction, very, very few people got the correct answer.

References and related
Audio clip of the puzzle
Another NPR puzzle amenable to simple computer tools, in this case shell commands, is documented here: puzzle
Raspberry Pis. Here we describe building a four-monitor display with Raspberry Pis.
A good python tutorial: https://docs.python.org/2/tutorial/

Categories
Admin Perl

Elegant code, poor results

Intro
A colleague of mine is an awesome Perl programmer. It may be a dying language but for those of us who know it it can still do some great things in our hands. So this guy, let’s call him Stain, wrote some code to analyze large proxy logs in parallel by using threads. Using threads presumably makes the analysis program run a lot faster since you’re throwing more CPUs at a problem that is CPU bound. It would have taken me years to write but I bet he did it in a couple days because he’s so experienced. I was eager to reap the benefits of his labor but was disappointed with the results I was getting…

The details
What I need the program for is very basic – just looking for a character string in the lnies of a proxy log. In fact my needs are so basic I don’t need a program at all. I can cobble together my analysis with Linux command-line utilities like a

$ gunzip -c proxy_log.gz|grep drjohnstechtalk > /tmp/drjs

And that’s what I was doing before his program came along. when I run his program with eight threads on a server with 16 cores, it should finish a lot faster than my command line, but it wasn’t. So I rolled up my sleeves to look into his code to see what the problem is. Running his code with a single thread showed that it ran much, much slower than my command-line commands. I focused on that problem as the root of the issue.

Here is the essence of his code for the analysis:

#!/usr/bin/perl
# DrJ research labs...
use strict;
use Compress::Zlib;
use threads;
use threads::shared;
use File::Find;
my @searchStrings = ("drjohnstechtalk");
my @matches;
my $lines;
&amp;searchGZip("/tmp/p-02.gz");
#----------------------------------------------------------------------------
# searchGZip
#----------------------------------------------------------------------------
sub searchGZip{
        my $file = shift(@_);
        my $line;
        my $gz;
        my $loglines = 0;
 
        if($gz = gzopen($file, "rb")){
 
                while($gz-&gt;gzreadline($line)){
                        $loglines++;
                        if(&amp;matchLine($line)){
                                push(@matches,$line);
                        }
                }
                $lines += $loglines;
                $gz-&gt;gzclose;
        }else{
                #&amp;addAppLogMsg("Could not open file $file: $gzerrno\n");
                print "Cannot read file $file\n";
        }
}
 
#----------------------------------------------------------------------------
# matchLine
#----------------------------------------------------------------------------
sub matchLine{
        my $line = shift;
 
        foreach(@searchStrings){
 
                return 1 if $line =~ /$_/
        }
        return 0;
}

He’s using functions I never even head of, like gzopen and gzreadline, apprently from the Compress::Zlib package. Looks elegant, right? Yes. But slow as a dog.

Version 2
So I began to introduce more shell commands into the Perl since shell by itself is faster. In this version I’ve gotten rid of those unfamiliar gzopen and gzreadlikne functions in favor of what I know:

#!/usr/bin/perl
# DrJ research labs...
use strict;
use Compress::Zlib;
use threads;
use threads::shared;
use File::Find;
my @searchStrings = ("drjohnstechtalk");
my @matches;
my $lines;
&amp;searchGZip("/tmp/p-02.gz");
#----------------------------------------------------------------------------
# searchGZip
#----------------------------------------------------------------------------
sub searchGZip{
        my $file = shift(@_);
        my $line;
        my $gz;
        my $loglines = 0;
 
        open(PXY,"gunzip -c $file|");
        while() {
                        $loglines++;
        $line = $_;
                        if(&amp;matchLine($line)){
                                push(@matches,$line);
                        }
                $lines += $loglines;
        }
}
 
#----------------------------------------------------------------------------
# matchLine
#----------------------------------------------------------------------------
sub matchLine{
        my $line = shift;
 
        foreach(@searchStrings){
 
                return 1 if $line =~ /$_/
        }
        return 0;
}

Version 3
Not satisfied with the time, I got rid of the function call for each line in this version.

#!/usr/bin/perl
# DrJ research labs...
use strict;
use Compress::Zlib;
use threads;
use threads::shared;
use File::Find;
my @searchStrings = ("drjohnstechtalk");
my @matches;
my $lines;
&amp;searchGZip("/tmp/p-02.gz");
#----------------------------------------------------------------------------
# searchGZip
#----------------------------------------------------------------------------
sub searchGZip{
        my $file = shift(@_);
        my $line;
        my $gz;
        my $loglines = 0;
 
        open(PXY,"gunzip -c $file|");
        while() {
                        $loglines++;
        $line = $_;
           push(@matches,$line) if $line =~ /$searchStrings[0]/;
                $lines += $loglines;
        }
}

That was indeed still faster, but not as fast as the command line.

Version 4
I wondered if I could do better still by also using the shells built-in match command, grep. That leads to this version:

#!/usr/bin/perl
# DrJ reseach labs...
use strict;
use Compress::Zlib;
use threads;
use threads::shared;
use File::Find;
my @searchStrings = ("drjohnstechtalk");
my @matches;
my $lines;
&amp;searchGZip("/tmp/p-02.gz");
#----------------------------------------------------------------------------
# searchGZip
#----------------------------------------------------------------------------
sub searchGZip{
        my $file = shift(@_);
        my $line;
        my $gz;
        my $loglines = 0;
 
        open(PXY,"gunzip -c $file|grep $searchStrings[0]|");
        while() {
        $line = $_;
           push(@matches,$line);
        }
}

Here’s table with the performance summaries.

Version changes Time (wall clock)
Original 63.2 s
2 gunzip -c instead of gzopen/gzreadline 18.6 s
3 inline instead of function call 14.0 s
4 grep instead of perl match operator 10.8 s

So with these simple improvements the timing on this routine improved from 63.2 s to 10.8 s – a factor of six!!

I incorporated my changes into his original program and now it really is a big help when it runs multi-threaded. I can run eight threads and finish an analysis about six times as quick as searching from the command-line. That’s a significant speed-up which will make us more productive.

Conclusion
Some elegant perl code is taken apart and found to be the cause of a significant slow-down. I show how we rewrote it in a more primitive style that results in a huge performance gain – code that runs six times faster!

Categories
Linux Perl

HTTP replay user-agent

Intro
When debugging F5’s Web Application Firewall (AKA ASM) you get all the HTTP request headers in the GUI which you can cut and paste. Sometimes it is not so obvious how to fix a particular SupportID and you don’t want to bother the user to reproduce the error. This is code that allows you to scrape the headers and send them back, assuming you have a Linux server on the Internet. It’s not quite exact – Perl wants to add a TE header and maybe some others. More importantly, F5 only gives you around the first 5 KB of the headers, and often the real headers + data can be much longer than that. But It’s pretty good at preserving the basic headers and the POSTed data, if any.

The details

#!/usr/bin/perl
# DrJ - 6/2015
# take waf Full Request as input from STDIN and spit it out
use LWP;
$DEBUG = 1;
# example:
#POST /cognosbi/cgi-bin/cognos.cgi HTTP/1.1
#Accept: */*
#Accept-Language: en-us
#Referer: https://ag-intelligence.drj.com/cognosbi/pat/rsapp.htm
#soapaction: http://www.ibm.com/xmlns/prod/cognos/reportService/201109/
#Content-Type: text/xml; charset=utf-8
#Accept-Encoding: gzip, deflate
#User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/6.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .
NET4.0C; .NET4.0E; Ion 3.3.0.125; InfoPath.3; MDPA101; OWASMIME/4.0200)
#Host: ag-intelligence.drj.com
#Content-Length: 161667
#DNT: 1
#Connection: Keep-Alive
#Cache-Control: no-cache
#Cookie: c8server=https%3A%2F%2Fag-intelligence.drj.com%3A443%2Fcognosbi%2Fcgi-bin%2Fcognos.cgi; mob_lang=en; TSe36f05=b6a15d33eff8797d98bc75f22d2c85e1f
20fa2f8dda9abed5570682f728766c9684dc6d2c7f603263f299dc03470ed0e5dd2c529d1af78e016f94f63ba1abc528f04a493fe2f9ff9dd4ea3b99a99de5024487a2a9a99de5024487a2a9
a99de5024487a2a9a99de5024487a2a; cc_state="null"; cam_passport=MTsxMDE6ZmVjNDEzYzQtMGVjOC03YjgyLTJlNTgtYjgxNWFhNTE0ZDQxOjMzNjQ5NDA0MDQ7MDszOzA7; CRN=col
umnsPerPage%3D3%26contentLocale%3Den-us%26showHiddenObjects%3Dtrue%26skin%3Dcorporate%26linesPerPage%3D35%26showWelcomePage%3Dtrue%26http%3A%2F%2Fdevelo
per.cognos.com%2Fceba%2Fconstants%2FsystemOptionEnum%23accessibilityFeatures%3Dfalse%26listViewSeparator%3Dbackground%26displayMode%3Dlist%26timeZoneID%
3DEST%26http%3A%2F%2Fdeveloper.cognos.com%2Fceba%2Fconstants%2FbiDirectionalOptionEnum%23biDirectionalFeaturesEnabled%3Dfalse%26format%3DHTML%26automati
cPageRefresh%3D30%26productLocale%3Den%26useAccessibilityFeatures%3Dfalse%26showOptionSummary%3Dtrue%26; cea-ssa=false; usersessionid="AQgAAAA/tHlVAAAA
AoAAACSXrdg581uvb1aFAAAAFJ3FMiV1XFHYYjmBPlmT/mXlCpnFAAAAFlzIt8pvs4xewrB8PBrB1omTHOq"; userCapabilities=f%3Bfdbffc6d%3Be000002f%3Bff077efa%26ARQAAABSdxTI
ldVxR2GI5gT5Zk%2F5l5QqZwtel%2FW97E5sO8sAbwnsxaQA5mLc; cc_session="s_cc:|s_conf:na|s_sch:td|s_hd:sa|s_serv:na|s_disp:na|s_set:|s_dep:na|s_dir:na|s_sms:dd
|s_ct:sa|s_cs:sa|s_so:sa|e_hp:CAMID(*22DRJ*3au*3aagarwap*22)|e_proot:Public*20Folders|prootid:i1E4054857A7F43398F82386788807209|e_mroot:My*20Folders|mr
ootid:i45E22CE51A2D4F9DA668810E67B44CD2|e_mrootpath:CAMID(*22DrJ*3au*3aagarwap*22)*2ffolder*5b*40name*3d*27My*20Folders*27*5d|e_user:John*20Doe|
e_tenantDisplayName:|e_showTenantInfo:true|cl:en-us|dcid:i1E4054857A7F43398F82386788807209|show_logon:true|uig:|ui:|rsuiprofile:all|lch:f|lca:f|ci:f|wri
te:true|eom:0|pp:3364940404|null:|cachestamp:2015-06-04T13:04:34|null:"; BIGipServerag-intelligence.drj.us.secure=1090633994.47873.0000; TS83f7ea=c439d
424e740dc4c1ed042aa7bafa59df20fa2f8dda9abed5570851f1e66c5c4bbcf97c1
 
#<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:
SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bus='http://developer.cognos.com/schemas/bibus/3/' xmlns:rns1='http://developer.cognos.com/schemas/rep
ortService/1'><SOAP-ENV:Header><bus:biBusHeader xsi:type="bus:biBusHeader"><bus:CAM xsi:type="bus:CAM"><authenticityToken xsi:type="xsd:base64Binary">Vj
FZcyLfKb7OMXsKwfDwawdaJkxzqg==</authenticityToken></bus:CAM><bus:CAF xsi:type="bus:CAF"><contextID xsi:type="xsd:string">CAFW00000070Q0FGQTNjMDAwMDAwD
GQUFBQUZKM0ZNaVYxWEZIWVlqbUJQbG1ULW1YbENwbld5amlFRU5ObWJJcTZHaEdLNTdjVWZ0Zy1GWV8zOTgxNzd8cnM_</contextID></bus:CAF><bus:userPreferenceVars SOAP-ENC:arra
yType="bus:userPreferenceVar[]" xsi:type="SOAP-ENC:Array"><item><bus:name xsi:type="xsd:string">productLocale</bus:name><bus:value xsi:type="xsd:string"
>en</bus:value></item><item><bus:name xsi:type="xsd:string">contentLocale</bus:name><bus:value xsi:type="xsd:string">en-us</bus:value></item></bus:userP
referenceVars><bus:dispatcherTransportVars xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="bus:dispatcherTransportVar[]"><item xsi:type="bus:dispatcherTra
nsportVar"><name xsi:type="xsd:string">rs</name><value xsi:type="xsd:string">true</value></item></bus:dispatcherTransportVars></bus:biBusHeader></SOAP-E
NV:Header><SOAP-ENV:Body><rns1:update><object xsi:type="bus:report"><searchPath><value xsi:type="xsd:string">/content/folder[@name=&apos;Dashboards and
Portals&apos;]/folder[@name=&apos;DRJOPS (DrJ)&apos;]/folder[@name=&apos;Dashboard Reports&apos;]/report[@name="Sales"]</value></searchPath><specific
ation><value xsi:type="xsd:string" xml:space="preserve">&lt;report xmlns=&quot;http://developer.cognos.com/schemas/report/9.0/&quot; useStyleVersion=&qu
ot;10&quot; expressionLocale=&quot;en-us&quot;&gt;
#&lt;modelPath&gt;/content/folder[@name=&apos;Shared Packages&apos;]/package[@name=&apos;DrJ_Dashboard_Cube&apos;]/model[@name=&apos;model&apos;]&lt;
/modelPath&gt;
#&lt;drillBehavior modelBasedDrillThru=&quot;true&quot;/&gt;
#&lt;layouts&gt;
#&lt;layout&gt;
#&lt;reportPages&gt;
 
#&lt;page name=&quot;DrJ Dashbord&quot;&gt;
#&lt;pageBody&gt;
#&lt;contents&gt;&lt;table&gt;&lt;style&gt;&lt;defaultStyles&gt;&lt;defaultStyle refStyle=&quot;tb&quot;/&gt;&lt;/defaultStyles&gt;&lt;CSS value=&quot;f
ont-family:Arial;font-size:8pt;font-weight:bold&quot;/&gt;&lt;/style&gt;&lt;tableRows&gt;&lt;tableRow&gt;&lt;tableCells&gt;&lt;tableCell colSpan=&quot;1
&quot;&gt;&lt;contents&gt;&lt;table&gt;&lt;style&gt;&lt;defaultStyles&gt;&lt;defaultStyle refStyle=&quot;tb&quot;/&gt;&lt;/defaultStyles&gt;&lt;CSS valu
e=&quot;border-collapse:collapse;width:100%;text-align:center&quot;/&gt;&lt;/style&gt;&lt;tableRows&gt;&lt;tableRow&gt;&lt;tableCells&gt;&lt;tableCell&g
t;&lt;contents&gt;&lt;textItem&gt;&lt;dataSource&gt;&lt;staticValue&gt;In Thousands, Data As of: &lt;/staticValue&gt;&lt;/dataSource&gt;&lt;style&gt;&lt
;CSS value=&quot;color:#0D3F53&quot;/&gt;&lt;/style&gt;&lt;/textItem&gt;&lt;textItem&gt;&lt;dataSource&gt;&lt;reportExpression&gt;CubeDataUpdatedOn ([Gr
ower_Dashboard].[Year])&lt;/reportExpression&gt;&lt;/dataSource&gt;&lt;style&gt;&lt;CSS value=&quot;color:#0D3F53&quot;/&gt;&lt;/style&gt;&lt;/textItem&
gt;&lt;/contents&gt;&lt;style&gt;&lt;CSS value=&quot;text-align:left;vertical-align:middle;font-family:Arial;font-size:9pt;font-weight:bold;border-top-s
tyle:none;border-right-style:none;border-bottom-style:none;border-left-style:none&quot;/&gt;&lt;/style&gt;&lt;/tableCell&gt;&lt;tableCell&gt;&lt;content
s&gt;&lt;image&gt;
#&lt;dataSource&gt;
#&lt;staticValue&gt;../samples/images/Dashboard_Excel.png&lt;/staticValue&gt;
#&lt;/dataSource&gt;
#&lt;conditionalStyles&gt;&lt;conditionalStyleCases refVariable=&quot;For ReportOutput&quot;&gt;&lt;conditionalStyle refVariableValue=&quot;1&quot;&gt;&
lt;CSS value=&quot;display:none&quot;/&gt;&lt;/conditionalStyle&gt;&lt;/conditionalStyleCases&gt;&lt;conditionalStyleDefault/&gt;&lt;/conditionalStyles&
gt;&lt;reportDrills&gt;&lt;reportDrill name=&quot;Excel&quot;&gt;&lt;drillLabel&gt;&lt;dataSource&gt;&lt;staticValue/&gt;&lt;/dataSource&gt;&lt;/drillLa
bel&gt;&lt;drillTarget method=&quot;execute&quot; outputFormat=&quot;spreadsheetML&quot; showInNewWindow=&quot;true&quot;&gt;&lt;reportPath path=&quot;/
content/folder[@name=&apos;Dashboards and Portals&apos;]/folder[@name=&apos;USCROP (Grower)&apos;]/folder[@name=&apos;Dashboard Reports&apos;]/report[@n
ame=&apos;Sales&apos;]&quot;&gt;&lt;XMLAttributes&gt;&lt;XMLAttribute name=&quot;ReportName&quot; value=&quot;Sales&quot; output=&quot;no&quot;/&gt;&lt;
XMLAttribute name=&quot;class&quot; value=&quot;report&quot; output=&quot;no&quot;/&gt;&lt;/XMLAttributes&gt;&lt;/reportPath&gt;&lt;drillLinks&gt;&lt;dr
illLink&gt;&lt;drillTargetContext&gt;&lt;parameterContext parameter=&quot;Select Area&quot;/&gt;&lt;/drillTargetContext&gt;&lt;drillSourceContext&gt;&lt
;parameterContext parameter=&quot;Select Area&quot;/&gt;&lt;/drillSourceContext&gt;&lt;/drillLink&gt;&lt;drillLink&gt;&lt;drillTargetContext&gt;&lt;para
meterContext parameter=&quot;Select DrJIdea Specialist&quot;/&gt;&lt;/drillTargetContext&gt;&lt;drillSourceContext&gt;&lt;parameterContext parameter=
&quot;Select Innovation Specialist&quot;/&gt;&lt;/drillSourceContext&gt;&lt;/drillLink&gt;&lt;drillLink&gt;&lt;drillTargetContext&gt;&lt;parameterContex
t parameter=&quot;AreaTgtBy&quot;/&gt;&lt;/drillTargetContext&gt;&lt;drillSourceContext&gt;&lt;parameterContext parameter=&quot;AreaTgtBy&quot;/&gt;&lt;
/drillSourceContext&gt;&lt;/drillLink&gt;&lt;drillLink&gt;&lt;drillTargetContext&gt;&lt;parameterContext parameter=&quot;ISTgtBy&quot;/&gt;&lt;/drillTar
getContext&gt;&lt;drillSourceContext&gt;&lt;parameterContext parameter=&quot;ISTgtBy&quot;/&gt;&lt;/drillSourceContext&gt;&lt;/drillLink&gt;&lt;drillLin
k&gt;&lt;drillTargetContext&gt;&lt;parameterContext parameter=&quot;Grower Type&quot;/&gt;&lt;/drillTargetContext&gt;&lt;drillSourceContext&gt;&lt;param
eterContext parameter=&quot;Grower Type&quot;/&gt;&lt;/drillSourceContext&gt;&lt;/drillLink&gt;&lt;/drillLinks&gt;&lt;/drillTarget&gt;&lt;/reportDrill&g
t;&lt;/reportDrills&gt;&lt;/image&gt;&lt;image&gt;
#&lt;dataSource&gt;
#&lt;staticValue&gt;../samples/images/Dashboard_PDF.png&lt;/staticValue&gt;
#&lt;/dataSource&gt;
#&lt;conditionalStyles&gt;&lt;conditionalStyleCases refVariable=&quot;For ReportOutput&quot;&gt;&lt;conditionalStyle refVariableValue=&quot;1&quot;&gt;&
lt;CSS value=&quot;display:none&quot;/&gt;&lt;/conditionalStyle&gt;&lt;/conditionalStyleCases&gt;&lt;conditionalStyleDefault/&gt;&lt;/conditionalStyles&
gt;&lt;reportDrills&gt;&lt;reportDrill name=&quot;PDF&quot;&gt;&lt;drillLabel&gt;&lt;dataSource&gt;&lt;staticValue/&gt;&lt;/dataSource&gt;&lt;/drillLabe
l&gt;&lt;drillTarget method=&quot;execute&quot; outputFormat=&quot;PDF&quot; showInNewWindow=&quot;true&quot;&gt;&lt;reportPath path=&quot;/content/fold
er[@name=&apos;Dashboards and Portals&apos;]/folder[@name=&apos;DRJOPS (Grower)&apos;]/folder[@name=&apos;Dashboard Reports&apos;]/report[@name=&apos;Sa
les&apos;]&quot;&gt;
@lines = <STDIN>;
for $line (@lines) {
  $_ = $line;
  chomp;
  if (/^(GET|POST)\s+(\/.*)\s(http|HTTP)/) {
    $uri = $2;
    $method = $1;
    next;
  } elsif (/^User-Agent:/ && ! $datasec) {
    ($agent) = /^User-Agent:\s+(.+)$/;
    print "User-Agent: $agent\n" if $DEBUG;
    next;
  } elsif (/^Content-Length:\s/ && ! $datasec) {
# LWP calculates its own content-length header
      next;
  } elsif  (! $datasec) {
# add header to hash
      ($hdr,$val) = /^(\S+):\s+(.+)$/;
      print "hdr,val: $hdr, $val\n" if $DEBUG;
      $myhash{$hdr} = $val if $hdr;
  }
  if (/^Host: (\S+)/i && ! $datasec) {
    $host = $1;
  }
  $data .= $line if $datasec;
# test if we are at the end of the headers
  $datasec = 1 if /^$/;
}
 
# Create a user agent object
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
$ua->agent($agent);
$req = HTTP::Request->new($method => "https://$host$uri");
# enter all the headers
foreach $key (keys %myhash) {
  $val = $myhash{$key};
  print "key,val: $key, $val\n" if $DEBUG;
  $req->header($key => $val);
}
print "data: $data\n" if $DEBUG;
$req->content($data);
# Pass request to the user agent and get a response back
my $res = $ua->request($req);
 
# Check the outcome of the response
if ($res->is_success) {
 print $res->content;
} else {
  print $res->status_line, "\n";
}

Conclusion
We’ve created a Perl-based HTTP user agent to preserve headers. It’s not perfect but it’s a help, for instance in debugging am F5 web application firewall policy.

Categories
DNS Linux Network Technologies Perl

Announcing a simple DNS web interface and code

Intro
For demonstration purposes I’ve written a WEB interface to do DNS queries. This can be used for light querying. Once it gets abused I will pull it from the web site.

Motivation
Some large enterprises are behind not only a corporate firewall, but also confined to a private namespace with no access to Internet name resolution. Users in such situations can use one of the many available tools to do DNS resolution through the web, but they all want to throw advertising at you and it’s not clear which can be trusted not to load you up with spyware. I am offering this ad-free DNS lookup using my position on the Internet as a trusted source.

And if you’re lucky and looking for code to do this yourself, you might find it. But nowhere will you find a site that’s running its own published code for DNS resolution. Except here.

The code
Admittedly very simple-minded, but hopefully not fatally flawed, here it is in Perl.

#!/usr/bin/perl
use CGI;
$query = new CGI;
%allowedArgs = (domainname =&gt; 'dum',type =&gt; 'dum',short =&gt; 'dum');
#
print "Content-type: text/html\n\n";
print "
\n";
foreach $key ($query->param) {
  exit(1) unless defined $allowedArgs{$key};
  exit(1) if $query->param($key) !~ /^([a-zA-Z0-9\.-]){2,256}$/;
  print "$key " . $query->param($key) . "\n";
}
# possible keys: domainname, type
$domainname = $query->param(domainname);
$type     = $query->param(type);
$type = "any" unless $type;
# argument validation checks
exit(1) if $domainname !~ /^([a-zA-Z0-9\.-]){2,256}$/ || $domainname =~ /\.\./ ||  ! $domainname;
exit(1) if $type !~ /^([a-zA-Z]){1,8}$/;

# short answer?
$short = "+short" if defined $query->param(short);

# authoritative request?
if (defined $query->param(authoritative)) {
# this will be a lot more complicated and so is not implemented. Perhaps someday if there is a request...
}

open(DIG,"dig $short $type $domainname|") || die "Cannot run dig!!\n";
while() {
  print ;
}

Yes it’s very old-school. I do not even use a DNS package. Why bother? It’s not rocket science. There’s a lot more to argument validation than it looks like – you would not believe the evil things people send to your web server. So you have to vigilant about injection attacks or shelling out by use of unexpected characters.

Usage

2020 Update

This URL has been deactivated since I moved to my new server. I’ll have to see if there’s time and interest to restore this functionality.

example 1

https://drjohnstechtalk.com/cgi-bin/digiface.cgi?domainname=johnstechtalk.com&type=a

domainname johnstechtalk.com
type a
 
; &lt;&lt;&gt;&gt; DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.4 &lt;&lt;&gt;&gt; a johnstechtalk.com
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 8711
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
 
;; QUESTION SECTION:
;johnstechtalk.com.		IN	A
 
;; ANSWER SECTION:
johnstechtalk.com.	3600	IN	A	50.17.188.196
 
;; Query time: 10 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Mon May  4 14:59:05 2015
;; MSG SIZE  rcvd: 51

example 2
https://drjohnstechtalk.com/cgi-bin/digiface.cgi?domainname=drjohnstechtalk.com&short

domainname drjohnstechtalk.com
short 
50.17.188.196

Familiarity with dig will help you determine the best switches to use as you can see that at the end of the day it is merely calling dig and sending back that output with a minimum of html markup. This will make it easy to parse the output programatically.

Conclusion
A simple DNS web interface is being announced today. Both the service and the code are being made available. The service may be pulled once it becomes abused.

References
A nice, not too commercial web interface to dig and traceroute that is more user-friendly than mine is http://www.kloth.net/services/dig.php
The dig man pages can be helpful.

Got a geoDNS entry? Although this link has ads, it’s quite interesting because it sends your query to open DNS servers around the world: https://dnschecker.org/.

You can explore some details behind Google’s public resolving server 8.8.8.8 by using the web site: https://dns.google.com/. It’s quite helpful.

I won’t paste the link to my service but you can see what it is from the examples above.

There’s a simple but effective DIG available for your Android smartphone from the Playstore. That’s DNS debugger from TurboBytes. No obnoxious ads and yet no cost.

Of course if you are on the Internet and have access to dig, Google’s DNS servers are available for you to use directly.