Tag: grep

My favorite bash scripting tips

Post author By john
Post date January 4, 2024

Intro

The Linux bash shell is great and very flexible. I love to use it and have even installed WSL 2 on my PCs so I can use it as much as possible. When it comes to scripting, though, it’s not exactly my favorite. It has absorbed so much history that there are multiple ways to do everything: the really old way, the new way, the alternate way, etc. Your version of bash can also determine which features you can use. Nevertheless, if you stick to the basics it makes sense to use bash for simple scripting tasks.

So just like I’ve compiled all the python tips I need for writing my simple python scripts in one convenient, searchable page, I will now do the same for bash. No one but me uses it, but that’s fine.

Iterate (loop) over a range of numbers

END=255 # for instance to loop over an octet of an IP address
for i in $(seq 1 $END); do
  echo $i
done
# But if it's OK to just hard-wire start and end, then it's simpler to use:
for i in {1..255}; do echo $i; done
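
Brace expansion happens before variables are expanded, so {1..$END} does not work. A minimal sketch of one more option, the C-style for loop, which takes a variable bound without calling seq. The small END value and the total variable are only there for illustration:

```bash
END=5      # hypothetical small bound, just for the demo
total=0
for ((i = 1; i <= END; i++)); do
  total=$((total + i))   # add up 1..END
done
echo "$total"
```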

Infinite loop

while /bin/true; do...done

Use break to leave the loop, or exit to end the whole script.

Sort IPs in a sensible order

$ sort -n -t . -k1,1 -k2,2 -k 3,3 -k4,4 tmp
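
A small sketch of what that does, using a few made-up addresses instead of a file. Each -kN,N restricts a sort key to one dot-separated field, and -n compares the keys as numbers, so 9 sorts before 10:

```bash
# Sample addresses are invented for the demonstration
sorted=$(printf '%s\n' 10.0.0.10 10.0.0.9 192.168.1.1 9.255.0.1 |
  sort -n -t . -k1,1 -k2,2 -k3,3 -k4,4)
echo "$sorted"
```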

What directory is this script in?

DIR=$(cd "$(dirname "$0")"; pwd); echo "$DIR"

Guarantee this script is interpreted (run) by bash and not good ‘ole shell (sh)!

if [ ! "$BASH_VERSION" ] ; then
  exec /bin/bash "$0" "$@"
  exit
fi

Count number of occurrences, even if string occurs multiple times in the same line

grep -o string filename|wc -l

Count total occurrences of the word print in a bunch of files which may or may not be compressed, storing the output in a file

prints=0
zgrep -c print tst*|cut -d: -f2|while read pline; do prints=$((prints + pline));echo $prints>prints; done

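Note that much of the awkwardness of the above line is to get around issues I had with variable scope: the while loop on the right side of a pipe runs in a subshell, so the running total is lost when the loop ends, which is why it gets written to a file.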
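
One way around the scope issue, as a sketch rather than the definitive fix: feed the loop with process substitution, < <(...), so the while loop runs in the current shell and the total survives. The temporary files and their contents are invented for the demo, and plain grep stands in for zgrep, which prints the same file:count format:

```bash
# Hypothetical sample files, created only for this demonstration
tmpdir=$(mktemp -d)
printf 'print a\nprint b\n' > "$tmpdir/tst1"
printf 'no match here\nprint c\n' > "$tmpdir/tst2"

prints=0
# grep -c prints one "filename:count" line per file
while IFS=: read -r name count; do
  prints=$((prints + count))
done < <(grep -c print "$tmpdir"/tst*)

echo "$prints"     # still set here, because no subshell was involved
rm -rf "$tmpdir"
```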

How to create a grep search pattern which includes the tab character

grep $'pattern\t'
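
A quick sketch to show the $'...' quoting at work, with two made-up input lines of which only the first has a real tab after key:

```bash
# $'key\t' is expanded by bash into the letters k-e-y followed by a tab
matches=$(printf 'key\tvalue\nkey value\n' | grep -c $'key\t')
echo "$matches"
```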

Permitted characters in variable names

Letters, digits and the underscore _ are allowed, just as in Python, but not a hyphen or other punctuation. And do not begin with a number!

Execute a command

I used to use backticks ` in the old days. The $( ) syntax is more visually appealing, and it nests cleanly:

print1=$(cat prints)

Variable type

No, variables are not typed. Everything is treated as a string.

Function definition

Put function definitions before they are invoked in the script. Invocation is by plain name. Function syntax is as in the example.

sendsummary() {
# function execution statements go here, then close it out
  : # no-op placeholder, since bash rejects an empty function body
} # optionally with a comment like end function sendsummary
sendsummary # invoke our sendsummary function
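
A slightly fuller sketch showing arguments and a local variable. The function name greet and its message are made up for the example:

```bash
greet() {
  local name="$1"        # $1 is the first argument; local keeps name out of global scope
  echo "hello, $name"
}

msg=$(greet world)       # capture what the function prints
echo "$msg"
```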

Indentation

Unlike python, line indentation does not matter. I recommend indenting blocks of code, two spaces for example, for readability.

Booleans and order of execution

[[ "$DEBUG" -eq "1" ]] && echo subject, $subject, intro, "$intro"

The second statement only gets executed if the first one evaluated as true. Now a more complex example.

[[ $day == $DAY ]] || [[ -n "$anomalies" ]] && { statements…}

The second expression gets evaluated only if the first one is false. If either the first or the second expression is true, then the last expression (a series of statements in what is essentially an unnamed function, hence the enclosing braces) gets executed. The -n is a test to see if the length of a string is non-zero. See man test.

Or just use old-fashioned if-then statements?

The huge problem with the approach above is that the grouped statements may get executed in their own forked shell: grouping with parentheses ( ) always runs them in a subshell, while braces { …; } keep them in the current shell. So if they’re trying to set a variable, or even do an exit, inside a subshell, it will not produce the desired result! I may need further research to refine my approach, but the old if-then clause works for me: no subshell needs to be created.
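
A minimal sketch of the difference, with a made-up variable named flag. The assignment inside parentheses happens in a subshell and is lost, while the one inside braces sticks:

```bash
flag=unset

true && ( flag=set_in_parens )     # runs in a subshell
after_parens=$flag                 # the change did not survive

true && { flag=set_in_braces; }    # runs in the current shell
after_braces=$flag                 # the change is visible

echo "$after_parens $after_braces"
```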

Conditionals

Note that clever use of && and || can in many cases obviate the need for a classic if…then structure, but see the warning above. But you can use if-thens. An if block is terminated by a fi. There is an else statement as well as an elif (else if) statement.
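
For completeness, a small sketch of the full if / elif / else / fi shape. The classify function is hypothetical, purely to have something to test:

```bash
classify() {
  local n="$1"
  if [[ $n -lt 0 ]]; then
    echo negative
  elif [[ $n -eq 0 ]]; then
    echo zero
  else
    echo positive
  fi
}

classify 7
```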

grep conditionals

ping -c1 8.8.8.8|grep -iq '1 received'
[ $? -eq 0 ] && echo this host is alive

So the $? variable after grep is run contains 0 if there was a match and 1 if there was no match. -q argument puts grep in “quiet” mode (no output).
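
Since if tests the exit status of whatever command follows it, the pipeline can also go straight into an if with no need for $?. A sketch, where a canned line of ping-style output (invented here) stands in for the real ping so it runs anywhere:

```bash
# Stand-in for real ping output
sample='1 packets transmitted, 1 received, 0% packet loss'

if echo "$sample" | grep -iq '1 received'; then
  status=alive
else
  status=down
fi
echo "this host is $status"
```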

More sophisticated example testing exit status and executing multiple commands

#!/bin/bash
# restart mariaDB if home page response becomes greater than one second
curl -m1 -ksH 'Host:drjohnstechtalk.com' https://localhost/blog/ > /dev/null
# if curl didn't have enough time (one sec), its exit status is 28
[ $? -eq 28 ] && (systemctl stop mariadb; sleep 3; systemctl start mariadb; echo mariadb restart at $(date))

Note that I had to group the commands after the conditional test with surrounding parentheses (). That creates a code block (which bash runs in a subshell). Without those, the first semicolon ; would have ended the conditional list, because a semicolon separates commands, and the remaining commands would have run unconditionally. Further note that I nested parentheses and that seems to work as you would hope. Also note that STDOUT has been redirected by the greater than sign > to /dev/null in order to silently discard all STDOUT output. /dev/null is specific to Linux and other Unix-like systems. The Windows equivalent, apparently, is nul. Use curl -so nul to suppress output on a Windows system.

Reading in parameters from a config file

Lots of techniques demoed in this example!

# read in params from file QC.conf
IFS=$'\n'
echo Parameters from file
for line in $(<QC.conf); do
  [[ "$line" =~ ^# ]] || {
  pval=$(echo "$line"|sed 's/ //g')
  lhs=$(echo "$pval"|cut -d= -f1)
  rhs=$(echo "$pval"|cut -d= -f2)
  declare -g $lhs="$rhs"
  echo $lhs is ${!lhs}
  }
done

Note the use of declare with the -g (global) switch to assign a value to a variable whose name is itself held in a variable! Note the use of $(<file) to read the file without running an external command such as cat. Note the use of the =~ operator inside [[ ]] to skip comment lines with a regular expression match. Note that the way to get the value of a variable whose name is itself held in a variable var is ${!var}.

This script parses a config file with values like a = a_val, where spaces may or may not be present.
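
The two key tricks, isolated into a tiny sketch. The parameter name timeout and its value are invented for the demo:

```bash
lhs=timeout            # hypothetical parameter name read from a config line
rhs=30                 # hypothetical value
declare "$lhs=$rhs"    # creates a variable literally named timeout
echo "$lhs is ${!lhs}" # ${!lhs} looks up the variable whose name is stored in lhs
```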

One square bracket or two?

I have no idea and I use whatever I get to work. All my samples work and I don’t have time to test all variations.
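
For what it’s worth, a minimal sketch of the difference as I understand it: [ is the old test command, so unquoted variables get word-split before it ever sees them, while [[ is bash syntax that does not split them and can also do pattern matching. The variable names are made up:

```bash
value="two words"

# [[ ]] is bash syntax: no word splitting, and == matches glob patterns
if [[ $value == two* ]]; then double=match; else double=nomatch; fi

# [ ] is the portable test command: quote your variables, use = for equality
if [ "$value" = "two words" ]; then single=match; else single=nomatch; fi

echo "$double $single"
```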

Variable scope

I really struggled with this so I may come back to this topic!

Variable interpolation

$variable will suffice for simple, i.e., one-word content. But if the variable contains anything a bit complex such as words separated by spaces, or containing unusual characters, better go with double quotes around it, "$variable". And sometimes syntactically throw in curly braces to separate it from other elements, "${variable}"
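
A small sketch of why the quotes matter. The helper count_args is hypothetical; it just reports how many arguments it received:

```bash
count_args() { echo "$#"; }

variable="two words"
unquoted=$(count_args $variable)     # word splitting turns it into two arguments
quoted=$(count_args "$variable")     # quotes keep it as one argument
label="${variable}_suffix"           # braces stop bash from reading $variable_suffix

echo "$unquoted $quoted $label"
```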

Eval

eval="ls -l"
$eval # executes ls -l

Shell expansion

mv Pictures{,.old} # renames directory Pictures to Pictures.old
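
To see what the shell actually hands to mv, put echo in front. A sketch:

```bash
# The braces expand to one word per comma-separated alternative (the first is empty)
expanded=$(echo Pictures{,.old})
echo "$expanded"
```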

Poor man’s launch at boot time

Use crontab’s @reboot feature!

@reboot sleep 25; ./recordswitch.sh > recordswitch.log 2>&1

The above expression also shows how to redirect standard error to standard out and have both go into a file.

Run cron job every n minutes plus offset

5-59/20 * * * *

will run the job every 20 minutes starting at five minutes after the hour.
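
A quick way to sanity-check which minutes a range/step expression such as 5-59/20 covers is to generate the same sequence with seq. This is only a sketch of the arithmetic, not something cron itself runs:

```bash
# Start at 5, step by 20, stop at 59
minutes=$(echo $(seq 5 20 59))
echo "$minutes"
```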

Use extended regular expressions, retrieving a positional field using awk, and how to subtract (or add) two numbers

t1=`echo -n $line|awk '{print $1}'` 
t2=`echo -n $line|awk '{print $4}'` 
# test for integer inputs 
[[ "$t1" =~ ^[0-9]+$ ]] && [[ "$t2" =~ ^[0-9]+$ ]] && downtime=$(($t1-$t2))

Oops, I used the backticks there! I never claim that my way is the best way, just the way that I know to work! I know of a zillion options to add or subtract numbers…

Get last field using awk

echo hi.there.111|awk -F\. '{print $NF}' # returns 111

Print all but the first field using awk

awk '{$1=""; print substr($0,2)}'
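
A sketch with a made-up input line. Blanking $1 leaves a leading space on the rebuilt line, and substr($0,2) trims it off:

```bash
rest=$(echo 'first second third' | awk '{$1=""; print substr($0,2)}')
echo "$rest"
```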

Why do assignments have no extra spaces?

It simply doesn’t work if you try to put in spacing around the assignment operator =.

Divert stdout and stderr to a file from within the script

log=/tmp/my-log.log
exec 1>$log 
exec 2>&1

Lists, arrays and dictionary variables

I don’t think bash is for you if you need these types of variables.
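
That said, for small jobs bash 4 and later does have indexed arrays and associative arrays (its version of a dictionary). A minimal sketch, with made-up host and port names:

```bash
hosts=(alpha beta gamma)      # indexed array
declare -A port               # associative array, needs bash 4 or later
port[http]=80
port[https]=443

for h in "${hosts[@]}"; do
  echo "host: $h"
done
echo "${#hosts[@]} hosts, https is on port ${port[https]}"
```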

Formatted date

date +%F

produces yyyy-mm-dd, i.e., 2024-01-25

date +%Y%m%d -> 20240417

Poor man’s source code versioning

The old EDT/TPU editor on VAX used to do this automatically. Now I want to save a version of whatever little script I’m currently working on in the ~/tmpFRI (if it’s Friday) directory to sort of spread out my work by day of the week. I call this script cpj so it’s easy to type:

#!/bin/bash
# save file using sequential versioning to tmp area named after this day - DrJ
DIR='~'/tmp$(date +%a|tr '[a-z]' '[A-Z]') # ~/tmp + day of the week, e.g., FRI
DIRREAL=$(eval "echo $DIR") # the real directory we need
mkdir -p $DIRREAL
for file in $*; do
  res=$(ls -tr $DIRREAL|egrep "$file"'\.[0-9]{1,}$') # look for saved version numbers of this filename
  if test -n "$res"; then # we have seen this file...
    suffix=$(echo $res|awk -F\. '{print $NF}')  # pull out just the number at the end
    nxt=$(($suffix+1)) # add one to the version number
    saveFile="${file}"."${nxt}"
  else # new file to archive or no versioned number exists yet
    [[ -f $DIRREAL/$file ]] && saveFile="$file".1
    [[ -f $DIRREAL/$file ]] || saveFile=""
  fi
  cp "$file" $DIRREAL/"$saveFile"
  [[ -n $saveFile ]] && target=$DIR/"$saveFile"
  [[ -n $saveFile ]] || target="$DIR"
  echo copying "$file" to "$target"
done

It is a true mish-mash of programming styles, but it gets the job done. Note the use of eval. I’m still wrapping my head around that. Also note the technique used to upper case a string using tr. Note the use of extended regular expressions and egrep. Note the use of tilde ~ expansion. I insist on showing the target directory as ~/tmpSAT or whatever because that is what my brain is looking for. Note the use of nested $‘s.

Now that cpj is in place I occasionally know I want to make that versioned copy before I launch the vi editor, so I created a vij in my bash alias file thusly:

vij () { cpj "$@";sleep 1;vi "$@"; }

Complementing these programs is my gitj script which pushes my code changes to my repository after running pyflakes for python files:

#!/bin/bash
file="$@"
status=0

pushfile() {
  git add "$file"
  echo -n "Enter comment: "
  read comment
  fullComment=$(echo -e ${file}: "${comment}\n[skip ci]")
  echo -e "The full comment will be:\n${fullComment}"
  git commit -m "$fullComment"
  git push
  date
}

suffix=$(echo $file|awk -F\. '{print $NF}')  # pull out just the file type
if [[ $suffix == py ]]; then
  echo python file. Now running pyflakes on it;pyflakes $file;status=$?
  if [[ $status -eq 1 ]]; then echo syntax error detected so no git commands will be run
    exit 1
  else # python file checked out
    pushfile
  fi
else # was not a python file
  pushfile
fi

Name of script example

scriptName=$(echo $0|sed 's/.*\///')
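
Two other ways to get the same result, sketched on a made-up path rather than $0 so the output is predictable: the basename command, and the ${var##*/} expansion, which strips everything up to the last slash without starting another process:

```bash
path=/usr/local/bin/cpj         # hypothetical stand-in for $0

name1=$(basename "$path")       # external command
name2=${path##*/}               # pure bash: remove the longest prefix matching */

echo "$name1 $name2"
```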

Another example

I wrote this to retain one backup per month plus the last 28 days.

#!/bin/bash
# do some date arithmetic to preserve backup from first Monday in the month
#[[ $(date +%a) == "Wed" ]] && { echo hi; }
DEBUG=0
DRYRUN=''
[[ $DEBUG -eq 1 ]] && DRYRUN='--dry-run'
if [[ $(date +%a) == "Mon" ]] && [[ $(date +%-d) -lt 8 ]]; then
# preserve one month ago's backup!
  echo "On this first Monday of the month we are keeping the Monday backup from four weeks ago"
else
  d4wksAgo=$(date +%Y%m%d -d'-4 weeks') # four weeks ago
  oldBackup=zones-${d4wksAgo}.tar.gz
  git rm $DRYRUN backups/$oldBackup
fi
today=$(date +%Y%m%d)
todaysBackup=zones-${today}.tar.gz
git add $DRYRUN backups/$todaysBackup

It incorporates a lot of the tricks I’ve accumulated over the years, too numerous to recount. But it’s a good example to study.

Calculate last weekday

today=$(date -u +%Y%m%d) # UTC date
# last weekday calculation
delta="-1"
[[ $(date -u +%a) != "Mon" ]] || delta="-3"
lastday=$(date -u +%Y%m%d -d"${delta} days")
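
The snippet above only special-cases Monday, which is fine if it never runs on a weekend. A sketch of a variant that also handles Sunday and takes the date as an argument so it can be tested. The function name last_weekday is made up, and like the original it assumes GNU date for the -d option:

```bash
# Hypothetical helper: print the last weekday before a given YYYY-MM-DD date
last_weekday() {
  local day="$1" delta=1
  case $(date -u -d "$day" +%u) in   # %u: 1 = Monday ... 7 = Sunday
    1) delta=3 ;;                    # Monday -> previous Friday
    7) delta=2 ;;                    # Sunday -> previous Friday
  esac
  date -u -d "$day $delta days ago" +%Y%m%d
}

last_weekday 2024-01-08   # 2024-01-08 was a Monday
```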

Output the tab character in an echo statement

Just use the -e switch as in this example:

echo -e "$subnet\t$SSID"

Get top output in a non-interactive (batch) shell

top -b -n 1

Prompting for user input

echo -n "Give your input: "

read userInput
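
The read builtin can also print the prompt itself with -p, which saves the separate echo. In this sketch a here-string stands in for the typing so it runs unattended (bash only shows the -p prompt when input comes from a terminal):

```bash
# -r keeps backslashes literal; <<< feeds "hello" as if the user had typed it
read -r -p "Give your input: " userInput <<< "hello"
echo "You typed: $userInput"
```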

Print first 120 characters of each line in a text file

cat file | cut -c -120

Reverse the lines in a file

tac file > file-reversed # tac is cat in reverse!

Send email when there is no mailx, mail or postfix setup

Use curl!

curl --url smtp://mail-relay.com --mail-from "$sender" --mail-rcpt "$recipient" -T <(echo -e "$msg")

Format json into something readable

curl json_api|python3 -m json.tool

Merge every other line in a file

sed 'N;s/\n/ /' file
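
A sketch on four made-up lines. N appends the next line to the current one, and the substitution turns the newline between them into a space:

```bash
merged=$(printf 'name1\nvalue1\nname2\nvalue2\n' | sed 'N;s/\n/ /')
echo "$merged"
```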

Ending script on compound conditional can be a bad idea

I ended my script with this statement:

# send alerts if needed
[[ $notify -gt 0 ]] && alerting

The problem was that this last statement normally returns 1 (the first condition is false, so the second expression is not evaluated). The whole script therefore exits with value 1, and my ADO pipeline felt that was an error! Guess I’ll add an exit 0 at the end…
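
Another way out, sketched below, is to write the last statement as an if: when no branch is taken, an if statement returns 0. The alerting function here is a made-up stand-in for the real one:

```bash
notify=0
alerting() { echo "sending alerts"; }   # hypothetical stand-in

[[ $notify -gt 0 ]] && alerting
and_status=$?        # status of the failed test carries through

if [[ $notify -gt 0 ]]; then alerting; fi
if_status=$?         # an if with no branch taken returns 0

echo "$and_status $if_status"
```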

Editing file in place with sed

The -i switch to sed is designed to do your substitutions right in the file. Here’s an actual crontab entry where I used that switch:

35 22 * * * sed -i s'/enabled=0/enabled=1/' /etc/yum.repos.d/thousandeyes.repo > /dev/null 2>&1

Use sed to first test for condition then substitute depending if found

I learned this from chatgpt. Only do substitution if : is not present:

connectString=$(echo $url|cut -d/ -f3|sed '/:/! s/$/:443/')
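
To see both cases, here is the same pipeline wrapped in a small function and run on two invented URLs. The function name to_connect_string is hypothetical:

```bash
# /:/! means "on lines that do NOT contain a colon", then append :443
to_connect_string() {
  echo "$1" | cut -d/ -f3 | sed '/:/! s/$/:443/'
}

plain=$(to_connect_string https://example.com/path)
withport=$(to_connect_string https://example.com:8443/path)
echo "$plain $withport"
```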

Remove last bits of string with // operator

filename="example.tar.gz"
echo "${filename//.tar.gz/}" # example
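
One thing to be aware of: // replaces every occurrence, not just the one at the end. A sketch comparing it with the % operator, which removes only a matching suffix. The filename is made up to show the difference:

```bash
filename="my.tar.gz.backup.tar.gz"

everywhere=${filename//.tar.gz/}    # removes every .tar.gz
suffix_only=${filename%.tar.gz}     # removes only the trailing .tar.gz

echo "$everywhere $suffix_only"
```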

Date of a file in seconds

The output from e.g., ls -l is hard to parse. This will do the trick. Technically this reports the age of filename, i.e., the number of seconds since it was last modified.

echo $(($(date +%s) - $(date +%s -r "$filename")))

Change uppercase to lowercase and vice versa

I call it flipcase.

#!/bin/bash
# Read from STDIN and flip the case of every letter
tr '[:lower:][:upper:]' '[:upper:][:lower:]'
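
A usage sketch, running the same tr command on a sample string instead of through the script file:

```bash
flipped=$(echo 'Hello World' | tr '[:lower:][:upper:]' '[:upper:][:lower:]')
echo "$flipped"
```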

Or just use chatgpt

Similar to my latest experience in self-educating on python, in bash I’m regrettably finding it more convenient to ask duck.ai (chatgpt-o4) about certain questions. For instance I saw that double slash // in someone else’s script and wanted to know how it worked so I simply asked What does the // operator do in bash? And out came a nice comprehensive answer – much better than this blog post (again I hate to admit it). I’m usually the only user and customer for my own blog – it’s sort of a high-quality notebook. And now I’m in danger of losing my last customer!!!

Conclusion

I have documented here most of the techniques I use from bash to achieve simple yet powerful scripts. My style is not always top form, but as I learn better ways I will adopt and improve.

References and related

https://stackoverflow.com/questions/12786410/run-cron-job-every-n-minutes-plus-offset#19204734

Python Tips

Tags /dev/null, awk, bash, code block, extended regular expression, gitj, grep, parentheses, semicolon, tilde, tr

IT Operational Excellence Linux

Grep is Slow as a Snail in SLES 11 – Solved

Post author By john
Post date June 27, 2011

I had written earlier about the performance problems of Suse Linux Enterprise Server v 11 Service Pack 1 (SLES 11 SP1) under VMWare: http://drjohnstechtalk.com/blog/2011/06/performance-degradation-with-sles-11-sp1-under-vmware/. What I hadn’t fully appreciated at that time is that part of the problem could be with the command grep itself. Further investigation has convinced me that grep as implemented under SLES 11 SP 1 X86_64 is horrible. It is seriously broken. The following results are invariant under both a VM and a physical server.

Methodology 1

A cksum shows that grep has changed between SLES 10 SP 3 and SLES 11 SP 1. I’m not sure what the changes are. So I performed an strace while grep’ing a short file to see if there are any extra system calls which occur under SLES 11 SP 1. There are not.

I copied the grep binary from SLES 10 SP 3 to a SLES 11 SP 1 system. I was afraid this wouldn’t work because it might rely on dynamic libraries which also could have changed. However this appears to not be the case and the grep binary from the SLES 10 system is about 17 times faster, running on the same SLES 11 system!

Methodology 2

I figure that I am a completely amateur programmer. If with all my limitations I can implement a search utility that does considerably better than the shell command grep, I can fairly decisively conclude that grep is broken. Recall that we already have comparisons that show that grep under SLES 10 SP 3 is many times faster than under SLES 11 SP 1.

Results

The table summarizes the findings. All tests were on a 109 MB file which has 460,000 lines.

OS	Type of Grep	Time (s)
SLES 11 SP 1	built-in	42.6
SLES 11 SP 1	SLES 10 SP 3 grep binary	2.5
SLES 11 SP 1	Perl grep	1.1
SLES 10 SP 3	built-in	1.2
SLES 10 SP 3	Perl grep	0.35

The Code for Perl Grep

Hey, I don’t know about you, but I only use a fraction of the features in grep. The switches i and v cover about 99% of what I do with it. Well, come to think of it I do use alternate expressions in egrep (w/ the “|” character), and the C switch (provides context by including surrounding lines) can sometimes be really helpful. The l (filenames only) and n (include line numbers) look useful on paper, but you almost never end up needing them. Anyways I simply didn’t program those things to keep it simple. Maybe later. To make it as fast as possible I avoided anything I thought the interpreter might trip over, at the expense of repeating code snippets multiple times. At some point (allowing another switch or two) my approach would be ludicrous as there would be too many combinations to consider. But at least in my testing it does function just like grep, only, as you see from the table above, it is much faster than grep. If I had written it in a compiled language like C it should go even faster still. Perl is an interpreted language so there should always be a performance penalty in using it. The advantage is of course that it is so darn easy to write useful code.

#!/usr/bin/perl
# J.Hilgart, 6/2011
# model grep implementation in Perl
# feel free to borrow or use this, but it will not be supported
use Getopt::Std;
$DEBUG = 0;
# Get the command line options.
getopts('iv');
# the search string has to be present
$mstr = shift @ARGV;
usage() unless $mstr;
$mstr =~ s/\./\\./g;
# the remaining arguments are the files to be searched
$nofiles = @ARGV;
print "nofiles: $nofiles\n" if $DEBUG;
$filePrefix = ""; # set per file inside each subroutine when more than one file is searched
 
# call subroutine based on arguments present
optiv() if $opt_i && $opt_v;
opti()  if $opt_i;
optv()  if $opt_v;
normal();
################################
sub normal {
foreach $file (@ARGV) {
  open(FILE,"$file") || die "Cannot open $file!!\n";
# print filename if there is more than one file being searched
  $filePrefix = $nofiles > 1 ? "$file:" : "";
  while(<FILE>) {
    print "$filePrefix$_" if /$mstr/;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print if /$mstr/;
}
}
exit;
} # end sub normal
###############################
sub opti {
foreach $file (@ARGV) {
  open(FILE,"$file") || die "Cannot open $file!!\n";
  $filePrefix = $nofiles > 1 ? "$file:" : "";
  while(<FILE>) {
    print "$filePrefix$_" if /$mstr/i;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print if /$mstr/i;
}
}
exit;
} # end sub opti
#################################
sub optv {
foreach $file (@ARGV) {
  open(FILE,"$file") || die "Cannot open $file!!\n";
  $filePrefix = $nofiles > 1 ? "$file:" : "";
  while(<FILE>) {
    print "$filePrefix$_" unless /$mstr/;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print unless /$mstr/;
}
}
exit;
} # end sub optv
##############################
sub optiv {
foreach $file (@ARGV) {
  open(FILE,"$file") || die "Cannot open $file!!\n";
  $filePrefix = $nofiles > 1 ? "$file:" : "";
  while(<FILE>) {
    print "$filePrefix$_" unless /$mstr/i;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print unless /$mstr/i;
}
}
exit;
} # end sub optiv
sub usage {
# I never did finish this...
}

Conclusion
So built-in grep performs horribly on SLES 11 SP 1, about 17 times slower than the SLES 10 SP 3 grep. I wonder what an examination of the source code would reveal? But who has time for that? So I’ve shown a way to avoid it entirely, by using a perl grep instead – modify to suit your needs. It’s considerably faster than what the system provides, which is really sad since it’s an amateur, two-hour effort compared to the decade+ (?) of professional development on Posix grep. What has me more concerned is what haven’t I found, yet, that also performs horribly under SLES 11 SP 1? It’s like deer on the side of the road in New Jersey – where there’s one there’s likely to be more lurking nearby : ) .

Follow Up
We will probably open a support case with Novell. I am not very optimistic about our prospects. This will not be an easy problem for them to resolve – the code may be contributed, for instance. So, this is where it gets interesting. Is the much-vaunted rapid bug-fixing of open source really going to make a substantial difference? I would have to look to OpenSUSE to find out (where I suppose the fixed code would first be released), which I may do. I am skeptical this will be fixed this year. With luck, in a year’s time.

7/15 Update
There is a newer version of grep available. Old version: grep-2.5.2-90.18.41; new version: grep-2.6.3-90.18.41. Did it fix the problem? Depends how low you want to lower the bar. It’s a lot better, yes. But it’s still three times slower than grep from SLES 10 SP3. So…still a long ways to go.

9/7 Update – The Solution
Novell came through today, three months later. I guess that’s better than I pessimistically predicted, but hardly anything to brag about.

Turns out that things get dramatically better if you simply define the environment variable LC_ALL=POSIX. They do expect a better fix with SLES 11 SP 2, but there’s no release date for that yet. Being a curious sort, I revisited SLES 10 SP3 with this environment variable defined and it also considerably improved performance there as well! This variable has to do with the Locale and language support. Here’s a table with some recent results. Unfortunately the SLES 11 SP 1 is a VM, and SLES 10 SP3 is a physical server, although the same file was used. So the thing to concentrate on is the improvement in performance of grep with vs without LC_ALL defined.

OS	LC_ALL=POSIX defined?	Time (s)
SLES 11 SP 1	no	6.9
SLES 11 SP 1	yes	0.36
SLES 10 SP 3	no	0.35
SLES 10 SP 3	yes	0.19

So if you use SLES 10/11, make sure you have a

export LC_ALL=POSIX

defined somewhere in your profile if you plan to use grep very often. It makes a 19x performance improvement in SLES 11 and almost a 2x performance improvement under SLES 10 SP3.
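
If you would rather not change the locale for your whole session, the variable can also be set for a single command by putting the assignment in front of it. A sketch with made-up input in place of a big file:

```bash
# The assignment applies to this one grep only; the rest of the shell keeps its locale
count=$(printf 'alpha\nbeta\nalpha beta\n' | LC_ALL=POSIX grep -c alpha)
echo "$count"
```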

Related
If you like the idea of grep but want a friendlier interface, I was thinking I ought to mention Splunk. A Google search will lead you to it. It started with a noble concept – all the features of grep, plus a convenient web interface so you never have to get your hands dirty and actually log into a Linux/unix system. It was like a grep on steroids. But then in my opinion they ruined a simple utility and blew it up with so many features that it’ll take hours to just scratch the surface of its capabilities. And I’m not even sure a free version is still available. Still, it might be worth a look in some cases. In my case it also slowed down searching though supposedly it should have sped them up.

And to save for last what should have come first, grep is a search utility that’s great for looking at unstructured (not in a relational database) data.

Tags grep, linux, OpenSUSE, SLES