pandas – Dr John's Tech Talk

Intro

I mostly transitioned from perl to python programming. I resisted for the longest time, but now I would never go back. I realized something. I was never really good at Perl. What I was good at were the regular expressions. So Perl for me was just a framework to write RegExes. But Perl code looks ugly with all those semicolons. Python is neater and therefore more legible at a glance. Python also has more libraries to pick from. Data structures in Perl were just plain ugly. I never mastered the syntax. I think I got it now in Python, which is a huge timesaver – I don’t have to hit the books every time I have a complex data structure.

I will probably find these tips useful and will improve upon them as I find better ways to do things. It’s mostly for my own reference, but maybe someone else will find them handy. I use maybe 5% of python? As I need additional things I’ll throw them in here.

I’ve added some entries which I realized I needed so I can understand other people’s programming. For instance there are multiple ways to initialize an empty list.

What is this object?

Say you have an object <obj> and want to know what type it is because you’re a little lost. Do:

print(<obj>.__class__)

Check if this key exists in this dict

if "model" in thisdict:

Remove key from dict

if "model" in thisdict: del thisdict["model"]

Copy (assign) one dict to another – watch the assignment operator!

Do not use dict2 = dict1! That is accepted, syntactically, but won’t work as you expect because the assignment operator (=) does not copy anything: it just makes dict2 a second name for the same dict, so a change made through one name shows up in the other. Instead do this:

dict2 = dict1.copy()

It may even be necessary to use deepcopy:

import copy

dict2_complex = copy.deepcopy(dict1_complex)
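Here is a small sketch of the difference between plain assignment, copy() and deepcopy(). The dict contents and the names alias, shallow and deep are made up for the illustration.

```python
import copy

dict1 = {"model": "X", "tags": ["a", "b"]}  # hypothetical example data

alias = dict1                 # NOT a copy: a second name for the same dict
shallow = dict1.copy()        # new dict, but the inner list is still shared
deep = copy.deepcopy(dict1)   # new dict and new inner list

dict1["model"] = "Y"          # change a top-level value
dict1["tags"].append("c")     # change the nested list in place

print(alias["model"])    # Y  (alias follows dict1)
print(shallow["model"])  # X  (top level was copied)
print(shallow["tags"])   # ['a', 'b', 'c']  (nested list is shared)
print(deep["tags"])      # ['a', 'b']  (fully independent)
```

So copy() is enough for a flat dict, while a dict holding lists or other dicts is where deepcopy earns its keep.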

Multiple assignments in one line

a,b,c = "hi",23,"there"

Key and value from a single line

for itemid,val in itemvals.items():

Formatting

I guess it is pretty common to use a space-based (not tab) indent of four spaces for each subsequent code block.

Initializing lists and dicts

alist = []

blist = list() # another way to initialize an empty list

adict = {}

adict = dict() # another way to initialize an empty dict

Test for an empty list or empty dict or empty string

if not alist: print("empty list")

if not adict: print("the dict adict is empty")

astring=""

if not astring: print("the string is empty")

Avoid the KeyError: error

I just learned this technique. Wish I had known sooner!

a = adict.get('my_nonexistent_key') # a is None if the key does not exist. To test a: if a is None: …
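A short sketch of get() with and without a default value. The dict contents and the fallback string "unknown" are just placeholders I picked for the example.

```python
adict = {"model": "Mustang"}  # hypothetical example data

a = adict.get("my_nonexistent_key")             # no default given, so None
b = adict.get("my_nonexistent_key", "unknown")  # second argument is the fallback
c = adict.get("model", "unknown")               # key exists, fallback is ignored

if a is None:
    print("key is missing")
print(b)
print(c)
```

The second argument saves a separate test for None when a sensible fallback exists.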

Length of a list or string

len(alist)

len(astring)

Merge two lists together

for elmnt in list2: list1.append(elmnt) # list1.extend(list2) does the same in one call

Address first/last element in a list

alist[0] # first element

alist[-1] # last element

Iterate (loop) over a list

for my_element in alist: print(my_element) # all on one line for demo!

First/Last two characters in a string

astring[:2]

astring[-2:]

Third and fourth characters in a string

astring[2:4] # returns AE for astring = EUAEABUDH0014

Lowercase a string

astring.lower() # there is an upper() function as well, of course

Conditional (comparison) operators

if a == b: print("equals") # so == is comparison operator for strings

if re.search(r'eq',a):
    pass # do something
elif re.search(r'newstring',a):
    pass # do something else
else:
    pass # etc.

Order of evaluation of conditionals and max value of a dictionary

a = {'hi':0,'there':1,'man':2}

if not a or max(a.values()) < 3: do something

Is the above expression safe to evaluate in the case where the dict a is defined but empty? Answer: yes, it is! Although by itself max(a.values()) would raise a ValueError, Python evaluates or from left to right and stops as soon as the result is known (short-circuit evaluation). Since not a is True for an empty dict, execution never reaches max(). The same reasoning applies to and, which stops at the first operand that evaluates as False.
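To convince myself, here is a minimal sketch with an empty dict. The variable names or_result, and_result and raised are only there to record what happened.

```python
a = {}  # defined but empty

# "or" stops at the first True operand, so max() is never called here
or_result = not a or max(a.values()) < 3

# "and" stops at the first False operand; an empty dict counts as False
and_result = bool(a and max(a.values()) < 3)

# on its own, max() of an empty sequence raises ValueError
raised = False
try:
    max(a.values())
except ValueError:
    raised = True

print(or_result, and_result, raised)
```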

Ternary operator

Python has one (a if condition else b), but I don’t think it is well-developed and it shouldn’t be used (my opinion).

++ operator? Doesn’t exist. += and its ilk do, however.

Absolute Value

abs(a)

Boolean variables + multiple assignment example

a, b=True, False

if a==b: print("equals")

if a: print("a is true")

Reduce number of lines in the program

for n in range(12): colors[n] = 'red' # colors must already exist (a dict, or a list with 12+ elements)

if mykey not in mydict: mydict[mykey] = []

Printing stuff while developing

print("mydict",mydict,flush=True)

Python figures out how to print your object, whatever type it is, which is cool. That flush=True is needed if you want to see your output right away when you’ve redirected it to a file! Otherwise it gets buffered.

Reading and writing files – prettyify.py

import json, os, sys
from pathlib import Path

aql_file = sys.argv[1]
aql_path = Path(aql_file)
json_file = str(aql_path.with_suffix('.json'))

# Script path
dir_path = os.path.dirname(os.path.realpath(__file__))
dir_path_files = dir_path + "/files/"

# make ugly json file prettier
# this is kind of a different example, mixed in there
file = sys.argv[1]
with open(file) as f:
    # return json obj as dict
    fjson = json.load(f)
nicer = json.dumps(fjson,indent=4)
print(nicer,flush=True)
# back to original example
# body was never defined in this snippet; ASSUMPTION: it is the text to be written out
body = nicer
with open(dir_path_files + json_file,'w+') as f:
    f.write(body)

Reading in command-line arguments

Reading in a boolean value

python pgm.py False

So, you could use argparse, but I chose ast. Then I have a line in the script:

import ast, sys
overwrite_s = sys.argv[1] # either True or False - whether to overwrite or not
overwrite = ast.literal_eval(overwrite_s)

Nota Bene that if you fail to take these steps your argument will be read in as a string, not a boolean!
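A quick sketch of why the conversion matters. Here a hard-coded string stands in for sys.argv[1] so the example runs without any command-line arguments.

```python
import ast

overwrite_s = "False"  # stands in for sys.argv[1]

# Any non-empty string is truthy, so the raw argument behaves like True
as_string = bool(overwrite_s)

# literal_eval turns the text "False" into the real Boolean False
overwrite = ast.literal_eval(overwrite_s)

print(as_string)                   # True
print(overwrite, type(overwrite))  # False <class 'bool'>
```

Note that literal_eval raises an exception if the argument is not a valid Python literal, for instance lowercase false.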

See Reading and Writing files example.

Parsing command line arguments II

Here is a more versatile and generalized way to parse command line arguments.

import optparse
p = optparse.OptionParser()
p.add_option('-b','--brushWidth',dest='brushWidth',type='float')
p.set_defaults(brushWidth=1.0)
opt, args = p.parse_args()
width = opt.brushWidth
print('brushWidth',width)
print(width.__class__)
# remaining arguments
print(args)

$ python3 tst.py -b 1.2 my_file.png

brushWidth 1.2

<class 'float'>

['my_file.png']
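The Python documentation marks optparse as deprecated in favor of argparse, so here is a rough argparse equivalent of the same brushWidth example. The positional name files is my own choice, and an explicit argument list is passed to parse_args() only so the sketch runs as-is.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-b', '--brushWidth', dest='brushWidth', type=float, default=1.0)
parser.add_argument('files', nargs='*')  # remaining arguments (hypothetical name)

# Explicit list for demonstration; call parser.parse_args() with no
# arguments in a real script to read the actual command line.
opt = parser.parse_args(['-b', '1.2', 'my_file.png'])

print('brushWidth', opt.brushWidth)
print(opt.brushWidth.__class__)
print(opt.files)
```

A side benefit of argparse is that it builds a -h/--help message on its own.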

Rounding a floating point number to two decimal places

a = round(901/3600,2)

Command line tips

The command line is your friend and should be used for little tests. Moreover, you can print an object without even a print statement.

>>> a = [1,'hi',3]

>>> a

Going from byte object to string

s_b = b'string'

s = s_b.decode('utf-8')

Test if object is a string

if isinstance(thisobject, str): print("It is a string")

Python as a calculator

I always used the python command line as a calculator, even before I knew the language syntax! It’s very handy.

>>> 5 + 6/23

Breaking out of a for loop

Use the continue statement after testing a condition to skip the rest of the block and move on to the next iteration. Use break to leave the loop completely. Note that break and continue only apply to the innermost loop!
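A tiny sketch of both statements in one loop. The conditions and the list name seen are arbitrary, picked just to show the flow.

```python
seen = []
for n in range(10):
    if n % 2 == 0:
        continue      # even number: skip the rest of this iteration
    if n > 7:
        break         # first odd number above 7: leave the loop entirely
    seen.append(n)

print(seen)  # [1, 3, 5, 7]
```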

Infinite loop

while True: # then continue with statements in a code block

Iterator to get key value pairs out of a dict

>>> a = {'hi':'there','hi2':12}

>>> for k,v in a.items():
...     print('key,value',k,v)

Executing shell commands

import os

os.system("ls -l")

But, to capture the output, you can use the subprocess package:

import subprocess

output = subprocess.run(cmd, shell=True, capture_output=True)
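What comes back from subprocess.run is a CompletedProcess object, not the text itself. Below is a minimal sketch of pulling the pieces out. The echo command is just a stand-in for cmd, and text=True is an extra I added so stdout arrives as a string instead of bytes.

```python
import subprocess

cmd = "echo hello"  # placeholder command for the demo

# text=True decodes stdout/stderr to str; without it you get bytes
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

print(result.returncode)      # exit status of the command, 0 on success
print(result.stdout.strip())  # captured standard output
print(result.stderr)          # captured standard error (empty here)
```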

Generate (pseudo-)random numbers

import random

a = random.random()

Accessing environment variables

os.environ['ENV_TOKEN']

Handling glob (wildcards) in your shell command

import glob

for query_results_file in glob.glob(os.path.join(dir_path_files,OSpattern)): print(“query_results_file”,query_results_file)

But, if you want the results in the same order as the shell gives, put a sorted() around that. Otherwise the results come out in arbitrary order.

JSON tips

Python is great for reading and writing JSON files.

# Load inventory file

with open(dir_path_files + inventory_file) as f:
    inventory_json = json.load(f)

sitenoted={'gmtOffset':jdict["gmtOffset"],'timezoneId':jdict["timezoneId"]}

# update inventory with custom field Site Notes – put GMT – make sitenoted pretty using json.dumps

sitenote=json.dumps(sitenoted,indent=4)
print("sitenote",sitenote)

Convert a string which basically is in json format to a Python data structure

import json
txt_d = json.loads(response.text)

Test for null in JSON value

You may see “mykey”:null in your json values. How to test for that?

if my_dict[mykey] is None: continue
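A small sketch showing that JSON null arrives in Python as None. The JSON text and the list name non_null are invented for the example.

```python
import json

my_dict = json.loads('{"mykey": null, "other": 5}')  # made-up JSON text

non_null = []
for mykey in my_dict:
    if my_dict[mykey] is None:   # JSON null was converted to None
        continue
    non_null.append(mykey)

print(non_null)  # ['other']
```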

Validate a json file

python3 -m json.tool JSON_FILE

Sleep

from time import sleep

sleep(0.1)

RegExes

Although supported in Python, they seem kind of ugly. Many RegExes will need to be prefaced with r (raw), or else you’ll get yourself into trouble, as in

import re

r'[a-z]{4}.\s*\w(abc|def)'

if re.search('EGW-',locale): continue

b = re.sub(' ','-',locale) # replaces every space with a hyphen; add count=1 to replace only the first

b = re.split(r'\s','a b c d e f') # creates list with value ['a','b','c','d','e','f']

[subnet,descr] = re.split(',','10.1.2.3/24,descr,etc',maxsplit=1)

Minimalist URL example

import urllib.request

res = urllib.request.urlopen('https://drjohnstechtalk.com/').read()

Function arguments: are they passed by reference or by value?

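This section needs more research and may be incomplete. Python passes a reference to the object itself (sometimes called pass by object reference), so what matters is whether the object is mutable. A dict or a list can be changed in place inside the function, and the caller sees the change. A Boolean, number or string is immutable, so assigning a new value to the parameter inside the function only rebinds the local name and the caller’s variable is untouched. I got burned by this. I wanted a Boolean to persist across a function call. In the end I simply stuffed it into a dict! And that worked. Python itself doesn’t use the by-reference/by-value terminology. But it means you can pass your complex data structure, say a list of dicts of dicts, start appending to the list in your function, and the calling program has access to the additional members.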
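Here is a minimal sketch of the behavior, with three throwaway functions whose names I made up: one reassigns a Boolean parameter, one changes a dict in place, and one appends to a list.

```python
def try_to_flip(flag):
    flag = True            # rebinds the local name only; caller is unaffected

def flip_in_dict(state):
    state["flag"] = True   # changes the caller's dict in place

def append_item(items):
    items.append("new")    # changes the caller's list in place

flag = False
try_to_flip(flag)

state = {"flag": False}
flip_in_dict(state)

items = []
append_item(items)

print(flag)    # False
print(state)   # {'flag': True}
print(items)   # ['new']
```

Returning the new value from the function and assigning it in the caller is the other common way to get a Boolean back out.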

Print to a string a la sprintf

In python 3.6 and later you have the f-format which is way cool. Stuff between curly braces gets evaluated in place. Say a = 3 and b = ‘man’, then

msg = f"first some text mixed with value of a, which is {a} and the text of b, which is {b}" # avoid naming it str, which would shadow the built-in

So no need to paste a string together with awkward combos of strings, plus signs and variables!

Insert a newline character into a string

a='b\nc' # when you print(a) b and c will be on separate lines

Putting the concepts to work: print out n randomly sampled lines from a file

import random,sys

def random_line(fname):
    with open(fname) as f: # with closes the file when done
        lines = f.read().splitlines() # splitlines removes \n chars
    return random.choice(lines)

file = sys.argv[1]
no_lines = int(sys.argv[2])
for n in range(no_lines): # note: the same line can be picked more than once
    print(random_line(file))

Count occurrences of a substring within a string

if 'egw-fw'.count('egw') > 1:

Working with IP addresses

Test whether an IP address is in a subnet

import ipaddress
ipad = ipaddress.ip_address('192.0.2.1')
ipsubnet = ipaddress.IPv4Network('192.0.0.0/22')
if ipad in ipsubnet: print('hi')

Excel files

I’ve been using the package openpyxl quite successfully to read and write Excel files but I see that pandas also has built-in functions to read spreadsheets.

Date and time

import time

epoch_time = int(time.time()) # seconds since the epoch

Math

numpy seems to be the go-to package.

Using syslog

Please see this post.

Can a keyword be a variable?

Yes. Here’s an example.

from datetime import datetime, timedelta

timeunit = 'days'

numbr = 3

datetime.now() + timedelta(**{timeunit: numbr})

Today’s Date in UTC

from datetime import datetime, timezone
today = datetime.now(timezone.utc) # current time in UTC land; utcnow() is deprecated as of Python 3.12
date = today.strftime('%Y%m%d') # e.g., 20240418

Working with exit()

I like to add an exit() when testing code inside a loop so that the first iteration executes but I don’t sit around waiting for the whole thing to be done because I probably have other mistakes I need to correct. However, that can cause trouble if that is inside a try/except block! exit() works by raising SystemExit, so a bare except: (one that names no exception class) catches it and you won’t exit! To get around this, this construct can be used:

try:
    exit() # this always raises SystemExit
except SystemExit:
    print("exit() worked as expected")
except:
    print("Something is horribly wrong") # some other exception got raised

Python and self-signed certificates, or certificates issued by private CAs

I updated this blog article to help address that: Adding private root CAs in Redhat or SLES or Debian.

Write it with style

Use flake8 to see if your python program conforms to the best practice style. In CentOS I installed flake8 using pip while in Debian linux I installed it using apt-get install flake8.

Skip first element of a generator function

subnet_g = ipaddress.IPv4Network('10.23.97.0/26').hosts() # subnet_g is a generator
subnet_l = list(subnet_g) # turn it into a list
for ip in subnet_l[1:]: # skips over first element in the list
    print('ip is',ip)
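An alternative sketch that skips the first element without building the whole list first, using itertools.islice. This is not what I did above, just another option; the names ips and subnet_g2 are mine.

```python
import ipaddress
from itertools import islice

subnet_g2 = ipaddress.IPv4Network('10.23.97.0/26').hosts()

# islice(iterable, 1, None) starts at index 1, i.e. drops the first element
ips = [str(ip) for ip in islice(subnet_g2, 1, None)]

print(ips[0])    # 10.23.97.2
print(len(ips))  # 61
```

For a small subnet the list() approach is perfectly fine; islice mostly matters when the generator is huge.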

Does it at least pass the compiler – check syntax without running it

Install pyflakes: pip3 install pyflakes. Then

pyflakes your_script.py

Can I modify a Python script while it’s running?

Sure. No worries. It is safe to do so.

Print statement prints everything twice

This happens if you unfortunately named your program the same as a module you are importing. In this situation the program imports itself and runs twice. Rename your program something different!

Create virtual environment for portability

I like to call my virtual environment venv.

python3 -m venv venv # requires the SYSTEM package python3.11-venv

Use this virtual environment

source ./venv/bin/activate

List all the packages in this virtual environment

Good portable development style would have you install the minimal set of packages in your virtual environment and then build a requirements.txt file:

pip3 freeze > requirements.txt

Leave this virtual environment

deactivate

Test if package has been installed

python3 -c "import pymsteams" # is pymsteams package present?

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pymsteams'

Conclusion

I’ve written down some of my favorite tips for using python effectively.

References and related

Adding private root CAs in Redhat or SLES or Debian.

Writing output to syslog