After its customary overnight charging, my A51 simply showed me a black screen in the morning. Yet I sensed something was still alive, because when I plugged it into the computer’s USB port the device was recognized. I was very concerned, but I did manage to completely fix it!
The symptoms
Various sites address this problem and give somewhat different advice; I needed to combine them. So let’s review.
Black screen
Holding the power button down for any length of time does nothing
Plugging into a computer’s USB port shows the A51 device
What kind of works, but not really
Yes, it’s true that holding the power button and the volume down button simultaneously for a few seconds (about three or four) will bring up a menu. The choices presented are:
Restart
Power off
Emergency Call
There’s no point in trying Emergency Call. But when you try Restart you are asked to confirm the Restart a second time. Then the screen goes black again and you are back where you started. If you choose Power off, the screen goes black and, again, you are back where you started.
What actually works
Continue to hold the power button and volume down button simultaneously – ignore the menu mentioned above. After another 15 seconds or so it displays a lightning bolt inside a circle. If you keep holding, that will disappear and you have a black screen. Keep holding and the lightning bolt appears again, etc. So let them go. I don’t think it matters at which stage.
Now hopefully you have really powered off the phone. So hold the power button for a few seconds, like you do to start the phone after it’s been powered off. It should start normally now.
As the other posts say, when you see Samsung on your screen you know you are golden.
Conclusion
I have shared what worked for me to recover my Samsung Galaxy A51 from its Black Screen of Death.
I give to lots of non-profits and realized there are some common elements and that perhaps others could learn from what I have observed. I also give to some political organizations.
Define terms
By non-profit I mean a 501(c)(3) as per IRS regulations. These can have a local focus or a national focus. They will be chartered in a particular state which will have its own rules for incorporation. For my purposes that doesn’t matter too much. I believe they all will have a board of directors. They all have to abide by certain rules such as spending most of what they take in (I think).
Common to all
Engage, engage, engage
They want to send you frequent correspondence, sometimes under various pretenses, to keep you engaged. You will receive correspondence under the following pretenses: the “annual renewal”, the “quarterly report”, the xmas update, the thank-you-for-contributing letter, the “for tax purposes” letter, the emergency appeal or rapid reaction to something in the news, the special donor who will multiply your gift by 3x or even 10x, the estate giving solicitation, and, worst of all, the fake survey. I’m talking about you, ACLU. I have never once seen a published result of these fake surveys, which have zero scientific value and consist of one-sided questions. I used to fill them out the opposite way they expected out of spite, but to no avail, as they kept coming – with self-addressed stamped envelopes, no less. What all these mailings have in common is that they always solicit you to give even more money, as though what you’ve already given isn’t good enough.
But by all means read the newsletters on occasion to make sure they are doing the things you expect of them based on their mission. And ignore the extra pleas for money unless you are truly sympathetic. Emergencies do occur, after all.
Snail mail? No problem
You would naively think that by imposing a known, non-trivial cost on these non-profits – namely, forcing them to contact you by postal mail – they would send you fewer requests for money. Not so! I only contribute online when it seems to be the only practical way to do so (I’m thinking of you, Wikimedia), yet still I get, no exaggeration, about a letter every two weeks from my organizations.
Phone etiquette
First off, you don’t need to give out your phone number even though they ask for it. It’s asked for in a purposefully ambiguous way, near the billing information, as though it is needed to process your credit card. It isn’t. I happily omit my phone number. I figure if they really need it they can just write me a letter asking for it – and that’s never happened.
But if you’ve made the mistake of having given out your number, perhaps in the past, you may get called periodically. They do have the right to call you. But you can ask them to put you on a do-not-call list. What I do, once I learn which organization is calling, is – sometime during their long opening sentence, which may come after you’ve confirmed your identity – to hold the phone away from my ear a little, calmly say I’m not going to give any more money, and hang up.
Universities have a special way of asking for money. I knew classmates who did this for their campus job. They call alumni, especially recent alumni who are more naive, and engage them with a scripted conversation that starts innocently enough: “I’m from your college and I wanted to update you on recent happenings on campus.” Pretty soon they’re being asked to donate at the $250 level, then after an uncomfortable No, they’re relieved to learn they can still contribute at the $125 level, and so on down until the hapless alumnus/alumna is guilted into contributing something, anything at all.
Local giving
Fortunately, local giving – where they haven’t signed on with professional fund-raising organizations – is more pleasant because you are normally not solicited very often, typically just once a year.
Track it
I keep a spreadsheet with my gifts and a summed column at the bottom. I create a new worksheet named with the year for each new year. I have a separate section at the bottom with my non-deductible contributions.
I try to give all my gifts in the first one or two months of the year.
Come tax time, I print out the non-profit giving and include it with my paperwork for my accountant, but more on that below.
Deductions – forget about it
I pay a lot of taxes, and still, these days (from about 2018 onwards), I don’t get any tax benefit for my contributions. Why? The reason is that the standard deduction is so high that it applies instead. This has been the case ever since the tax changes of 2017. So if that’s true for me, I imagine it’s true for most people. But each year I try…
Non-deductible organizations
Some organizations you would think are non-profits, but they are actually structured differently and so they are not. I’m thinking of you, The Sierra Club. The Sierra Club uses much of your donation to lobby politicians to its point of view about environmental issues and therefore, by the rules, cannot be a non-profit in the sense of a 501(c)(3).
Privacy
I’m not sure what privacy rules apply to your giving. In my personal experience, there are few constraints. This means you should expect your name to be sold as part of a giant list of donors. You are data, and worth $$ to the organization selling your name to, usually, like-minded organizations who hope to extract more money out of you. To be concrete, let’s say you donated one time to a senator in a tight senate race. Before six months are up, every senator in a competitive race from that party will be soliciting you for funds. And not just one time, but often on that bi-weekly basis! Once again, using snail mail seems to be no obstacle. Maybe it is even worse, because with email you can in theory unsubscribe. I haven’t tried it but perhaps I should. I just hate to have my inbox flooded with solicitations. I’m really afraid to contribute to a congressional race for this reason. Or a governor’s race.
But this privacy issue is not restricted to PACs sharing your data. Let’s say a relative had congestive heart failure, so you decide to contribute to a heart association. Eventually you will be solicited by other major organizations with a focus on other organs or health in general: lungs, kidneys, cancer, even the same organ but from a different organization, etc. Your data has been sold…
Amazon Smile – Giving while shopping
When I first learned of Amazon Smile from a friend at work I thought there was no way this could be true. Margins are said to be razor thin in retail, yet here was Amazon giving away one half percent of your order to the charity of your choice?? Yet it was true. And Amazon gave away hundreds of millions of dollars. Even my local church got into the program. My original recipient was Habitat for Humanity, which raised well over ten thousand dollars from Amazon Smile.
But Amazon killed this too-good-to-be-true program in March 2023 for reasons unknown. I’m not sure if other merchants have something which can replace it and will update this if I ever find out.
The efficiency of your charity
You want to know whether a large portion of your gift to a particular charity is going towards the cause that is its mission, or to administrative costs such as fund-raising itself. I’ve noticed good charities actually show you a pie chart which breaks down the amount taken by administrative overhead – usually 5 – 10 percent. Another way to learn something about efficiency is to use a third-party web site such as Charity Navigator. But don’t get too worked up about their ratings. I have read criticisms of their methods. Still, it’s better than nothing. 5 – 10% administrative costs is fine. Hey, I used to know people who worked in such administrative positions and they are good people who deserve decent pay. Another drawback of Charity Navigator is that it won’t have ratings for your local charities.
For PACs as far as I know, there is no easy way to get comparable information. You just have to hope your money is well spent. I guess they have quarterly disclosure forms they fill out, but I don’t know how to hunt that down.
Tactics
The national organizations know everything you have ever given and will suggest you give at slightly higher amounts than you have in the past. Twenty-five years ago the American Cancer Society asked if I would solicit my neighbors for contributions, which I did. I pooled all the money and gave them something like $300. I swear that for the next 15 years they solicited me suggesting I contribute at that level, even though I personally never gave more than $40 in any of those years. So annoying…
Death – an opportunity – for them
Many charities will encourage you to remember them in your estate planning. I suppose this may be a reasonable option if you feel really identified with their cause. I suppose The Nature Conservancy evokes these kinds of emotions, for example, because who doesn’t love nature? So think about your legacy, what you’re leaving behind.
National, with local chapters
Some national charities have local chapters. I’m thinking of you, Red Cross. I’m not really sure how this works out, but I know I have received solicitations from both the local chapter and the national chapter. So just be mindful of this. I suppose when you give to the local chapter it has more discretion to spend your donation locally, and I guess it passes a fraction along to the national chapter.
Charitable Annuities
I don’t know all the details, but if you have, for instance, appreciated equities, then instead of paying capital gains taxes you could gift them to a charity and receive a deduction for the gift. They in turn, if they’re a big outfit – usually a university – can set up a charitable annuity which provides you further tax benefits. I will flesh this out if I ever come across it again.
Conclusion
As a reliable contributor I am annoyed by the methods employed to shake even more out of my pockets. But I guess those methods work in bulk and so they continue to be used. As far as I can tell all national non-profits use professional fund-raising methods which closely follow the same patterns.
Although the tenor of this post is terribly cynical, obviously, I think non-profits are doing invaluable work and filling some of the gaping holes left by government. If I didn’t think so I wouldn’t be contributing. Most non-profits do good work and are run efficiently, but the occasional scam happens.
I wrote about my new HP Pavilion Aero laptop previously and how pleased I am with this purchase. And I’m not getting any kickbacks from HP for saying it! Well, this week brought a sad story: all of a sudden, the wireless driver could no longer detect the presence of the Mediatek wireless card. We hadn’t done anything! All the reboots in the world didn’t help. Fortunately it is still under warranty, and fortunately HP’s consumer tech support is actually quite good. They helped me fix the problem. I wish to share with the wider community what happened and what fixed it.
The symptoms
No amount of rebooting fixes the issue
WiFi tile no longer appears (so there is no option to simply turn WiFi back on because you accidentally turned it off)
duet.exe file is not found (I don’t think this matters, honestly)
Where you normally see a WiFi icon in the shape of an amphitheater in the system tray, instead you only see:
a globe for the WiFi icon
HP PC Hardware Diagnostics Windows utility shows:
wireless IRQ test (RZ616) 160 MB FAILED
wireless ROM test FAILED
This diagnostics tool can be run in BIOS mode. It restarts the computer and puts you into a special BIOS mode diagnostics. When you run the wireless networking component test:
BIOS level component test of wireless networking PASSES
Yes, that’s right. You really didn’t fry the adapter, but Windows 11 totally messed it up.
On my own I tried…
to run HP PC Hardware Diagnostic Windows utility. It suggested I upgrade the BIOS, which I did. I ran some checks. The wireless IRQ test (RZ616) 160 MB failed, as did the wireless ROM test.
I uninstalled the Mediatek driver and reinstalled it.
Nothing doing. I had the insight to make the laptop useful, i.e., connected to the Internet, by inserting an old USB wireless adapter that I used to use with my old Raspberry Pi model 2’s! It worked perfectly, except only on the 2.4 GHz band, ha, ha. But I knew that wasn’t a long-term solution.
Quickly…
The BIOS diag succeeded.
Hold the power button down for a long time to bring up a new menu. The resulting sequence seems to be:
Initial normal boot
Forced shutdown
Boot into a special BIOS submenu
Then you enable something. I don’t remember what. But it should be obvious as there were not a lot of choices.
Another reboot, and voila, the WiFi normal icon appears, though it has forgotten the passwords to the networks.
A word about HP support
Maybe I got a tech support person who was exceptionally knowledgeable, but I have to say tech support was exceptional in its own right. And this is coming from someone who is jaded with regards to tech support. My support person was clearly not simply following a script, but actually creatively thinking in real time. So kudos to them.
Conclusion
I lost my Mediatek WiFi adapter on my brand new HP Pavilion Aero notebook, which I was so enamored with. HP support said it was due to a deficiency in the way Microsoft does Windows 11 upgrades. But they did not dance around the issue and helped me to resolve it. Although I don’t know exactly what we did, I have tried to provide enough clues that someone else could benefit from my misfortune. Or perhaps I will be the beneficiary should this happen again.
I sometimes find myself in need of blurring images to avoid giving away details. I once blurred just a section of an image using a labor-intensive method involving MS Paint. Here I provide a python program to blur an entire image.
The program
I call it blur.py. It uses the Pillow package and it takes an image file as its input.
# Dr John - 4/2023
# Uses the Pillow Image and ImageEnhance modules
import optparse
from PIL import Image, ImageEnhance
import sys,re
p = optparse.OptionParser()
p.add_option('-b','--brushWidth',dest='brushWidth',type='float')
p.set_defaults(brushWidth=3.0)
opt, args = p.parse_args()
brushWidth = opt.brushWidth
print('brushWidth',brushWidth)
# Open Image
image = args[0]
print('image file',image)
base = re.sub(r'\.\S+$','',image)
file_type = re.sub(r'^\w+\.','',image)
canvas = Image.open(image)
width,height = canvas.size
print('Original image size',width,height)
widthn = int(width/brushWidth)
heightn = int(height/brushWidth)
smallerCanvas = canvas.resize((widthn, heightn), resample=Image.Resampling.LANCZOS)
# Creating object of Sharpness class
im3 = ImageEnhance.Sharpness(smallerCanvas)
# no of blurring passes to make. 5 seems to be a minimum required
iterations = 5
# showing resultant image
# 0,1,2: blurred,original,sharpened
for i in range(iterations):
    canvas_fuzzed = im3.enhance(0.0)
    im3 = ImageEnhance.Sharpness(canvas_fuzzed)
# resize back to original size
canvas = canvas_fuzzed.resize((width,height), resample=Image.Resampling.LANCZOS)
canvas.save(base + '-blurred.' + file_type)
So there would be nothing to write about if the Pillow ImageEnhance worked as expected. But it doesn’t. As far as I can tell, on its own it will only do a little blurring. My insight was to realize that by making several passes you can enhance the blurring effect. My second insight is that ImageEnhance is probably only working within a 1 pixel radius. I have introduced the concept of a brush size, where the default width is 3.0 (pixels). I effectuate a brush size by reducing the image by a factor equal to the brush size! Then I do the blurring passes, then finally resize back to the original size! Brilliant, if I say so myself.
So in general it is called as
$ python3 blur.py -b 5 image.png
That example uses a brush size of 5 pixels. But that is optional, so you can use my default value of 3 and call it simply as:
$ python3 blur.py image.png
Example output
Blur a select section of an image
You can easily find the coordinates of a rectangular section of an image by using, e.g., MS Paint and doing a mouseover in the corners of the rectangular section you wish to blur. Note the coordinates in the upper left corner and then again in the lower right corner. Mark them down in that order. My program even allows more than one section to be included. In this example I have three sections. The resulting image with its blurred sections is shown below.
Three rectangular sections of this image were blurred
Here is the code, which I call DrJblur.py for lack of a better name.
# blur one or more sections of an image. Section coordinates can easily be picked up using e.g., MS Paint
# partially inspired by this tutorial: https://auth0.com/blog/image-processing-in-python-with-pillow/
# This will import Image and ImageChops modules
from PIL import Image, ImageEnhance
import sys,re
def blur_this_rectangle(image,x1,y1,x2,y2):
    box = (x1,y1,x2,y2)
    cropped_image = image.crop(box)
    # Creating object of Sharpness class
    im3 = ImageEnhance.Sharpness(cropped_image)
    # no of blurring passes to make. 10 seems to be a minimum required
    iterations = 10
    # showing resultant image
    # 0,1,2: blurred,original,sharpened
    for i in range(iterations):
        cropped_fuzzed = im3.enhance(-.75)
        im3 = ImageEnhance.Sharpness(cropped_fuzzed)
    # paste this blurred section back onto original image
    image.paste(cropped_fuzzed,(x1,y1)) # this modifies the original image
# Open Image
image = sys.argv[1]
base = re.sub(r'\.\S+$','',image)
file_type = re.sub(r'^\w+\.','',image)
canvas = Image.open(image)
argNo = len(sys.argv)
boxNo = int(argNo/4) # number of box sections to blur
# (x1,y1) (x2,y2) of rectangle to blur is the next argument
for i in range(boxNo):
    j = i*4 + 2
    x1 = int(sys.argv[j])
    y1 = int(sys.argv[j+1])
    x2 = int(sys.argv[j+2])
    y2 = int(sys.argv[j+3])
    blur_this_rectangle(canvas,x1,y1,x2,y2)
canvas.save(base + '-blurred.' + file_type)
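To make the calling convention concrete, here is how I would invoke it for a three-section example – the coordinates here are made up, and each section is given as its upper-left x y followed by its lower-right x y:
$ python3 DrJblur.py image.png 50 80 220 140 300 400 480 460 100 500 260 560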
Since it can be a little hard to find a simple and easy-to-use blurring program, I have written my own and provided it here for general use. Actually, I have provided two programs. One blurs an entire picture; the other blurs rectangular sections within a picture. Although I hardcoded 10 passes, that number may need to be increased depending on the amount of blurriness desired. To blur a larger font I changed it to 50 passes, for example!
Obviously, if you have a decent image editing program like Adobe Photoshop, you would just use that. There are also probably some online tools available. I myself am leery of using “free” online tools – there is always a hidden cost. And if all you want to do is erase that rectangle and not blur it, even lowly MS Paint can do that quite nicely all on its own. But as for me, I will continue to use my blurring program – I like it!
References and related
The need for the ability to blur an image arose when I wanted to share something concrete resulting from my network diagram as code effort.
Since they took away our Visio license to save on licensing fees, some of us have wondered where to turn. I once used the venerable old MS Paint after learning one of my colleagues used it. Some have turned to Powerpoint. Since I had some time and some previous familiarity with the components – for instance, when I create CAD designs for 3D printing I am basically also doing CAD as code, using openSCAD – I wondered if I could generate my network diagram using code. It turns out I can, at least for the basic stuff I was looking to do.
Pillow
I’m sure there are much better libraries out there but I picked something that was very common although also very limited for my purposes. That is the python Pillow package. I created a few auxiliary functions to ease my life by factoring out common calls. I call the auxiliary modules aux_modules.py. Here they are.
from PIL import Image, ImageDraw, ImageFont
serverWidth = 100
serverHeight = 40
small = 5
fnt = ImageFont.truetype('/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf', 12)
fntBold = ImageFont.truetype('/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf', 11)
def drawServer(img_draw,xCorner,yCorner,text,color='white'):
    # known good colors for visibility of text: lightgreen, lightblue, tomato, pink and of course white
    # draw the server
    img_draw.rectangle((xCorner,yCorner,xCorner+serverWidth,yCorner+serverHeight), outline='black', fill=color)
    img_draw.text((xCorner+small,yCorner+small),text,font=fntBold,fill='black')
def drawServerPipe(img_draw,xCorner,yCorner,len,source,color='black'):
    # draw the connecting line for this server. We permit len to be negative!
    # known good colors if added text is in same color as pipe: orange, purple, gold, green and of course black
    lenAbs = abs(len)
    xhalf = xCorner + int(serverWidth/2)
    if source == 'top':
        coords = [(xhalf,yCorner),(xhalf,yCorner-lenAbs)]
    if source == 'bottom':
        coords = [(xhalf,yCorner+serverHeight),(xhalf,yCorner+serverHeight+lenAbs)]
    img_draw.line(coords,color,2)
def drawArrow(img_draw,xStart,yStart,len,direction,color='black'):
    # draw using several lines
    if direction == 'down':
        x2,y2 = xStart,yStart+len
        x3,y3 = xStart-small,y2-small
        x4,y4 = x2,y2
        x5,y5 = xStart+small,y3
        x6,y6 = x2,y2
        coords = [(xStart,yStart),(x2,y2),(x3,y3),(x4,y4),(x5,y5),(x6,y6)]
    if direction == 'right':
        x2,y2 = xStart+len,yStart
        x3,y3 = x2-small,y2-small
        x4,y4 = x2,y2
        x5,y5 = x3,yStart+small
        x6,y6 = x2,y2
        coords = [(xStart,yStart),(x2,y2),(x3,y3),(x4,y4),(x5,y5),(x6,y6)]
    img_draw.line(coords,color,2)
def drawText(img_draw,x,y,text,fnt,placement,color):
    # draw appropriately spaced text
    xy = (x,y)
    bb = img_draw.textbbox(xy, text, font=fnt, anchor=None, spacing=4, align='left', direction=None, features=None, language=None, stroke_width=0, embedded_color=False)
    # honestly, the y results from the bounding box are terrible, or maybe I don't understand how to use it
    if placement == 'lowerRight':
        x1,y1 = (bb[0]+small,bb[1])
    if placement == 'upperRight':
        x1,y1 = (bb[0]+small,bb[1]-(bb[3]-bb[1])-2*small)
    if placement == 'upperLeft':
        x1,y1 = (bb[0]-(bb[2]-bb[0])-small,bb[1]-(bb[3]-bb[1])-2*small)
    if placement == 'lowerLeft':
        x1,y1 = (bb[0]-(bb[2]-bb[0])-small,bb[1])
    xy = (x1,y1)
    img_draw.text(xy,text,font=fntBold,fill=color)
How to use
I can’t exactly show my example due to proprietary elements. So I can just mention that I write a main program making lots of calls to these auxiliary functions. A minimal, made-up sketch of such a main program follows.
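To give a flavor of what that main program looks like, here is a tiny invented example – the server names, addresses and coordinates are all made up, and I am assuming aux_modules.py sits alongside it in the same directory:
# sketch_diagram.py - hypothetical example using the auxiliary functions above
from PIL import Image, ImageDraw
import aux_modules

# start with a blank white canvas
img = Image.new('RGB', (600, 320), 'white')
img_draw = ImageDraw.Draw(img)

# two made-up servers, a connecting pipe and a labeled arrow
aux_modules.drawServer(img_draw, 50, 50, 'web01', color='lightblue')
aux_modules.drawServer(img_draw, 50, 200, 'db01', color='lightgreen')
aux_modules.drawServerPipe(img_draw, 50, 50, 110, 'bottom', color='purple')
aux_modules.drawArrow(img_draw, 250, 70, 120, 'right', color='green')
aux_modules.drawText(img_draw, 250, 70, 'TCP 1433', aux_modules.fnt, 'upperRight', 'green')

img.save('sketch-diagram.png')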
Tip
Don’t forget that in this environment the x axis behaves like you learned in geometry class, with positive x values to the right of the y axis, but the y axis is inverted! So positive y values are below the x axis. That’s just how it is in a lot of these programs. Get used to it.
What I am lacking is a good idea to do element groupings, or an obvious way to do transformations or rotations. So I just have to keep track of where I am, basically. But even still I enjoy creating a network diagram this way because there is so much control. And boy was it easy to replicate a diagram for another one which had a similar layout.
It only required the Pillow package. I am able to develop my diagrams on my local PC in my WSL environment. It’s nice and fast as well.
Example Output
This is an example output from this diagram as code approach which I produced over the last couple days, sufficiently blurred for sharing.
Network diagram (blurred) resulting from use of this code-first approach
Conclusion
I provide my auxiliary functions which permit creating “network diagrams as code.” The results are not pretty, but networking people will understand them.
I am very pleased with my online purchase of an HP laptop, so I am sharing my experience here. Believe it or not, I did not, unfortunately, receive anything for this endorsement! I simply am thrilled with the product. I heartily recommend this laptop to others if it is similarly configured.
Requirements
Requirements are never made in the abstract, but represent a combination of what is possible and what others offer.
laptop
13″ diagonal screen
lightweight
“fast,” whatever that means
future-proof, if at all possible
distinctive (you’ll see what that means in a second)
durable
no touch-screen!! (hate them)
Windows 11 Home Edition
under $1200
1 TB of storage space
SSD
HP brand
What I got
I used to be a fan of Dell until I got one a few years back in which the left half of the keyboard went dead. It seems that problem was not so uncommon when you do a search. Also, my company seems to be much more on the HP bandwagon than the Dell one, and they generally know what they are doing.
I remember buying an HP Pavilion laptop in November 2017. It was an advertised model which had the features I sought at the time, including Windows 7 and a 512 GB SSD. Surely, with the inexorable improvements in everything, wouldn’t you have thought that in the intervening five years 1 TB would have become commonplace, even on relatively low-end laptop models? For whatever reason, that upgrade didn’t happen, and even five years later 1 TB is all but unheard of on sub-$1000 laptops. I guess everyone trusts the cloud for their storage. I work with cloud computing every day. But I want the assurance of having my photos on my drive, and not exclusively owned by some corporation. And we have lots of photos. So our Google Drive is about 400 GB. So with regards to storage, future-proof for me means room to grow for years, hence, 1 TB.
My company uses HP Elitebooks. They have touchscreens which I never use and are more geared towards business uses. Not only do I dislike touchscreens (you’re often touching them unintentionally), but they add weight and draw power. So not having one – that’s a win-win.
So since so few cheap laptops offer 1 TB standard, I imagined, correctly, that HP would have a configurator. The model which supports this is the HP Pavilion Aero. I configured a few key upgrades, all of which are worthwhile.
The screen size and the fact of running Windows 11 are not upgrades; everything else on the above list is. Some, like the cpu, are a bit pricey. But my five-year-old laptop, which runs fine, by the way, is EOL because Microsoft refuses to support its cpu for the Windows 11 upgrade. I’m hoping when I write my five-year lookback in 2028 the same does not happen to this laptop!
I especially like the pale rose gold trim. Why? When you go to a public place such as an airport, your laptop does not look like everyone else’s.
We also want to carry this laptop around. So another benefit is that it’s one of the lightest laptops around, for its size. Again, a touchscreen would have been heavier.
Of course the Aero has a microphone and built-in speakers, but no ethernet port (I’m a little leery about that). There are only two USB ports, plus a USB-C port and a full-sized HDMI port.
One usage beef I have is that it supposedly has a back-lit keyboard, but I’ve never seen it turn on.
My company has a coupon code for a roughly four percent discount – not huge, but every bit helps. Shipping is free. But to get the discount I had to talk to a human being to place the order, which is a good idea anyway for a purchase of this magnitude. She carefully reviewed the order with me multiple times. She commended me on my choice to upgrade to the OLED display, which gave me a good feeling.
Unexpected features
I wasn’t really looking for it, but there it is, a fingerprint scanner(!) in order to do a Windows Hello verification. I did not set it up. I guess it could also do a facial recognition as well (that’s what I use at work for Windows Hello for Business), but I also didn’t try that.
I think there’s a mini stereo output but maybe no microphone input? Of course get a USB microphone and you’re all good…
Price
Price as configured above and with my company coupon code applied was $1080. I think that’s much better than a similarly equipped Surface tablet though I honestly didn’t do any real comparisons since I wanted to go HP from the get-go.
Conclusion
I bought a new HP Pavilion Aero laptop. It’s only been a month, but I am very pleased with it so far. I configured it with the upgrades important to me, since no off-the-shelf model has adequate storage capacity at the sub-$1000 price point where I am.
I recommend this configuration for others. I think it’s really a winning combo. I have – I know this is hard to believe – not been compensated in any way for this glowing review! See my site – no ads? That shows you this is a different kind of web site, the kind that reflects the ideals of the Internet when it was conceived decades ago as an altruistic exchange of ideas, not an overly commercialized hellscape.
Since I saw this laptop was a winner I decided to give it away to a loved one, and now I’m back on that five-year-old HP Pavilion laptop!
I am using the Azure DevOps (ADO) REST api, needless to say. I cannot say I have mastered the api or even come close to understanding it. I have, however, leveraged the same api call I have previously used, since I observed it contains a lot of interesting data.
conf_check_all.ini
This config file is written as json to make importing a breeze. You set up optional trigger conditions for the various pipeline runs you will have, or even whether or not to perform any checks on a given pipeline at all.
{
"organization":"drjohns4ServicesCoreSystems",
"project":"Connectivity",
"url_base":"https://dev.azure.com/",
"url_params":"&api-version=7.1-preview.7",
"test_flag":false,
"run_ct_min":2,
"queue_time_max":1800,
"pipelines":{
"comment":{"maximum_processing_time_in_seconds":"integer_value","minimum_processing_time_in_seconds":"integer_value","(optional) check_flag - to potentially disable the checks for this pipeline":"either true or false"},
"default":{"max_proc_time":1800,"min_proc_time":3,"check_flag":true},
"feed_influxdb":{"max_proc_time":180,"min_proc_time":3},
"PAN-Usage4Mgrs":{"max_proc_time":900,"min_proc_time":60,"check_flag":true},
"PAN-Usage4Mgrs-2":{"max_proc_time":900,"min_proc_time":60},
"speed-up-sampling":{"max_proc_time":900,"min_proc_time":"2","check_flag":false},
"Pipeline_check":{"max_proc_time":45,"min_proc_time":"2","check_flag":false},
"Discover new vEdges":{"max_proc_time":3600,"min_proc_time":3,"check_flag":true},
}
}
So you see at the bottom is a dictionary where the keys are the names of the pipelines I am running, plus a default entry.
#!/usr/bin/python3
# fetch raw log to local machine
# for relevant api section, see:
#https://learn.microsoft.com/en-us/rest/api/azure/devops/build/builds/get-build-log?view=azure-devops-rest-7.1
import urllib.request,json,sys,os
from datetime import datetime,timedelta
from modules import aux_modules
conf_file = sys.argv[1]
# pipeline uses UTC so we must follow suit or we will miss files
#a_day_ago = (datetime.utcnow() - timedelta(days = 1)).strftime('%Y-%m-%dT%H:%M:%SZ')
startup_delay = 30 # rough time in seconds before the pipeline even begins to execute our script
an_hour_ago = (datetime.utcnow() - timedelta(hours = 1, seconds = startup_delay)).strftime('%Y-%m-%dT%H:%M:%SZ')
print('An hour ago was (UTC)',an_hour_ago)
format = '%Y-%m-%dT%H:%M:%SZ'
#url = 'https://dev.azure.com/drjohns4ServicesCoreSystems/Connectivity/_apis/build/builds?minTime=2022-10-11T13:00:00Z&api-version=7.1-preview.7'
# dump config file into a dict
config_d = aux_modules.parse_config(conf_file)
test_flag = config_d['test_flag']
if test_flag:
    print('config_d',config_d)
    print('We are in a testing mode because test_flag is:',test_flag)
url_base = f"{config_d['url_base']}{config_d['organization']}/{config_d['project']}/_apis/build/builds"
url = f"{url_base}?minTime={an_hour_ago}{config_d['url_params']}"
#print('url',url)
req = urllib.request.Request(url)
req.add_header('Authorization', 'Basic ' + os.environ['ADO_AUTH'])
# Get buildIds for pipeline runs from last 1 hour
with urllib.request.urlopen(req) as response:
    html = response.read()
txt_d = json.loads(html)
#{"count":215,"value":[{"id":xxx, "buildNumber":"20230203.107","status":"completed","result":"succeeded","queueTime":"2023-02-03T21:12:01.0865046Z","startTime":"2023-02-03T21:12:05.2177605Z","finishTime":"2023-02-03T21:17:28.1523128Z","definition":{"name":"PAN-Usage4Mgrs-2"
value_l = txt_d['value']
all_msgs = ''
header_msg = '**Recent pipeline issues**\n'
# check for too few pipeline runs
if len(value_l) <= config_d['run_ct_min']:
    all_msgs = f"There have been fewer than expected pipeline runs this past hour. Greater than **{config_d['run_ct_min']}** runs are expected, but there have been only **{len(value_l)}** runs. \nSomething may be wrong. \n"
for builds in value_l:
    msg = aux_modules.check_this_build(builds,config_d,url_base)
    if msg: all_msgs = f"{all_msgs} \n{msg} \n"
if all_msgs:
    if not test_flag: aux_modules.sendMessageToTeams(header_msg + all_msgs) # send to WebHook if not in a testing mode
    print(header_msg + all_msgs)
else:
    print('No recent pipeline errors')
Short explanation
I consider the code to be mostly self-explanatory. A cool thing I’m trying out here is the f-string format specifier, which lets you build strings kind of like sprintf. I run this script every hour from, yes, an ADO pipeline! But since this job looks for errors, including errors which indicate a systemic problem with the agent pool, I run it from a different agent pool.
import json,re
import os,urllib.request
from datetime import datetime,timedelta
import pymsteams
def parse_config(conf_file):
    # config file should be a json file
    f = open(conf_file)
    config_d = json.load(f)
    f.close()
    return config_d
def get_this_log(config_d,name,buildId,build_number):
    # leaving out the api-version etc works better
    #GET https://dev.azure.com/{organization}/{project}/_apis/build/builds/{buildId}/logs/{logId}?api-version=7.1-preview.2
    #https://dev.azure.com/drjohns4ServicesCoreSystems/d6338e-f5b4-45-6c-7b3a86/_apis/build/builds/44071/logs/7'
    buildId_s = str(buildId)
    log_name = config_d['log_dir'] + "/" + name + "-" + build_number
    # check if we already got this one
    if os.path.exists(log_name):
        return
    #url = url_base + organization + '/' + project + '/_apis/build/builds/' + buildId_s + '/logs/' + logId + '?' + url_params
    url = config_d['url_base'] + config_d['organization'] + '/' + config_d['project'] + '/_apis/build/builds/' + buildId_s + '/logs/' + config_d['logId']
    print('url for this log',url)
    req = urllib.request.Request(url)
    req.add_header('Authorization', 'Basic ' + config_d['auth'])
    with urllib.request.urlopen(req) as response:
        html = response.read()
    #print('log',html)
    print("Getting (name,build_number,buildId,logId) ",name,build_number,buildId_s,config_d['logId'])
    f = open(log_name,"wb")
    f.write(html)
    f.close()
def check_this_build(builds,config_d,url_base):
    format = '%Y-%m-%dT%H:%M:%SZ'
    buildId = builds['id']
    build_number = builds['buildNumber']
    status = builds['status'] # normally: completed
    result = builds['result'] # normally: succeeded
    queueTime = builds['queueTime']
    startTime = builds['startTime']
    finishTime = builds['finishTime']
    build_def = builds['definition']
    name = build_def['name']
    print('name,build_number,id',name,build_number,buildId)
    print('status,result,queueTime,startTime,finishTime',status,result,queueTime,startTime,finishTime)
    qTime = re.sub(r'\.\d+','',queueTime)
    fTime = re.sub(r'\.\d+','',finishTime)
    sTime = re.sub(r'\.\d+','',startTime)
    qt_o = datetime.strptime(qTime, format)
    ft_o = datetime.strptime(fTime, format)
    st_o = datetime.strptime(sTime, format)
    duration_o = ft_o - st_o
    duration = int(duration_o.total_seconds())
    print('duration',duration)
    queued_time_o = st_o - qt_o
    queued_time = int(queued_time_o.total_seconds())
    queue_time_max = config_d['queue_time_max']
    # and from the config file we have...
    pipes_d = config_d['pipelines']
    this_pipe = pipes_d['default']
    if name in pipes_d: this_pipe = pipes_d[name]
    msg = ''
    if 'check_flag' in this_pipe:
        if not this_pipe['check_flag']:
            print('Checking for this pipeline has been disabled: ',name)
            return msg # skip this build if in test mode or whatever
    print('duration,min_proc_time,max_proc_time',duration,this_pipe['min_proc_time'],this_pipe['max_proc_time'])
    print('queued_time,queue_time_max',queued_time,queue_time_max)
    if duration > this_pipe['max_proc_time'] or duration < this_pipe['min_proc_time']:
        msg = f"ADO Pipeline **{name}** run is outside of expected time range. Build number: **{build_number}**. \n Duration, max_proc_time, min_proc_time: **{duration},{this_pipe['max_proc_time']},{this_pipe['min_proc_time']}**"
    if not status == 'completed' or not result == 'succeeded':
        msg = f"ADO Pipeline **{name}** run has unexpected status or result. Build number: **{build_number}**. \n - Status: **{status}** \n - Result: **{result}**"
    if queued_time > queue_time_max: # Check if this job was queued for too long
        msg = f"ADO Pipeline **{name}** build number **{build_number}** was queued too long. Queued time was **{queued_time}** seconds"
    if msg:
        # get the logs meta info to see which log is the largest
        url = f"{url_base}/{buildId}/logs"
        req = urllib.request.Request(url)
        req.add_header('Authorization', 'Basic ' + os.environ['ADO_AUTH'])
        with urllib.request.urlopen(req) as response:
            html = response.read()
        txt_d = json.loads(html)
        value_l = txt_d['value']
        #{"count":11,"value":[{"lineCount":31,"createdOn":"2023-02-13T19:03:17.577Z","lastChangedOn":"2023-02-13T19:03:17.697Z","id":1...
        l_ct_max = 0
        log_id_err = 0
        # determine log with either an error or the most lines - it differs for different pipeline jobs
        for logs_d in value_l[4:]: # only consider the later logs
            url = f"{url_base}/{buildId}/logs/{logs_d['id']}"
            req = urllib.request.Request(url)
            req.add_header('Authorization', 'Basic ' + os.environ['ADO_AUTH'])
            with urllib.request.urlopen(req) as response:
                html = response.read().decode('utf-8')
            if re.search('error',html):
                log_id_err = logs_d['id']
                print('We matched the word error in log id',log_id_err)
            l_ct = logs_d['lineCount']
            if l_ct > l_ct_max:
                l_ct_max = l_ct
                log_id_all = logs_d['id']
        if log_id_err > 0 and not log_id_all == log_id_err: # error over long log file when in conflict
            log_id_all = log_id_err
        url_all_logs = f"{url_base}/{buildId}/logs/{log_id_all}"
        msg = f"{msg} \n**[Go to Log]({url_all_logs})** "
    print(msg)
    return msg
def sendMessageToTeams(msg: str):
    """
    Send a message to a Teams Channel using webhook
    """
    # my Pipeline_check webhook
    webHookUrl = "https://drjohns.webhook.office.com/webhookb2/66f741-9b1e-401c-a8d3-9448d352db@ec386b-c8f-4c0-a01-740cb5ba55/IncomingWebhook/2c8e881d05caba4f484c92617/7909f-d2f-b1d-3c-4d82a54"
    try:
        # escaping underscores to avoid alerts in italics.
        msg = msg.replace('_', '\_')
        teams_msg = pymsteams.connectorcard(webHookUrl)
        teams_msg.text(f'{msg}')
        teams_msg.send()
    except Exception as e:
        print(f'failed to send alert: {str(e)}')
aux_modules.py contains most of the logic for checking each pipeline against the criteria and constructing an alert in Markdown to send to MS Teams. I’m not saying it’s beautiful code. I’m still learning. But I am saying it works.
I’ve revised the code to find the log file which is most likely to contain the “interesting” stuff. That’s usually the longest one excluding the first five or so. There are often about 10 logs available for even a minimal pipeline run. So this extra effort helps.
Then I further revised the code to fetch the logs and look for the word “error.” That may show up in the longest log or it may not. If it shows up in a different log, that log takes precedence as the most interesting log.
# Python package
# Create and test a Python package on multiple Python versions.
# Add steps that analyze code, save the dist with the build record, publish to a PyPI-compatible index, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/python
##trigger:
##- main
trigger: none
pool:
  name: dsc-adosonar-drjohns4ServicesCoreSystems-agent
#  name: visibility_agents
#strategy:
#  matrix:
#    Python36:
#      python.version: '3.6'
steps:
#- task: UsePythonVersion@0
#  inputs:
#    versionSpec: '$(python.version)'
#  displayName: 'Use Python $(python.version)'
- script: pip3 install -vvv --timeout 60 -r Pipeline_check/requirements.txt
  displayName: 'Install requirements'
- script: python3 check_all_pipelines.py conf_check_all.ini
  displayName: 'Run script'
  workingDirectory: $(System.DefaultWorkingDirectory)/Pipeline_check
  env:
    ADO_AUTH: $(ado_auth)
    PYTHONPATH: $(System.DefaultWorkingDirectory)/Pipeline_check:$(System.DefaultWorkingDirectory)
schedules:
- cron: "19 * * * *"
  displayName: Run the script at 19 minutes after the hour
  branches:
    include:
    - main
  always: true
We sort of drag this yaml file around from pipeline to pipeline, so some of it may not appear too optimized for this particular pipeline. But it does the job without fuss.
Conclusion
My ADO pipeline checker is conveniently showing us all our pipeline runs which have failed for various reasons – takes too long, completed with errors, too few jobs have run in the last hour, … It sends its output to an MS Teams channel we have subscribed to, by way of a webhook we set up. So far it’s working great!
Some of my colleagues had used Influxdb and Grafana at their previous job so they thought it might fit for what we’re doing in the Visibility team. It sounded good in theory, anyway, so I had to agree. There were a lot of pitfalls. Eventually I got it to the point where I’m satisfied with my accomplishments and want to document the hurdles I’ve overcome.
So as time permits I will be fleshing this out.
Grafana
I’m going to lead with the picture and then the explanation makes a lot more sense.
I’ve spent the bulk of my time wrestling with Grafana. Actually it looks like a pretty capable tool. It’s mostly just understanding how to make it do what you are dreaming about. Our installed version currently is 9.2.1.
My goal is to make a heatmap, but a special kind, similar to what I saw our network provider offers. That would entail one vedge per row and one column per hour, hence 24 columns in total. A vedge is a kind of SD-WAN router. I want to help the networking group look at hundreds of them at a time. So that’s one potential dashboard; it would give a view of a day. Another dashboard would show just one router, with each row representing a day and the columns again showing an hour. Also a heatmap. The multi-vedge dashboard should link to the individual dashboard, ideally. In the end I pulled it off. I am also responsible for feeding the raw data into Influxdb and hence also for the table design.
Getting a workable table design was really important. I tried to design it in a vacuum, but that only partially worked. So I revised, adding tags and fields as I felt I needed to, while being mindful of not blowing up the cardinality. I am now using these two tables – sorry, measurements.
The vedge measurement and the vedge_stats measurement (schema screenshots)
Although there are hundreds of vedges, some of my tags are redundant, so don’t get overly worried about my high cardinality. UTChour is, yes, a total kludge – not the “right” way to do things. But I’m still learning and it was simpler in my mind. item in the first measurement is redundant with itemid, but it is more user-friendly: a human-readable name.
Influx Query Explorer
It’s very helpful to use the Explorer, but the syntax there is not exactly the same as it will be when you define template variables. Go figure.
Multiple vedges for the last day
So how did I do it in the end?
Mastering template variables is really key here. I have a drop-down selection for region. In Grafana-world it is a custom variable with potential values EU,NA,SA,AP. That’s pretty easy. I also have a threshold variable, with possible values: 0,20,40,60,80,90,95. And a math variable with values n95,avg,max. More recently I’ve added a threshold_max and a math_days variable.
It gets more interesting however, I promise. I have a category variable which is of type query:
Multi-value and Include all options are checked. Just to make it meaningful, category is assigned by the WAN provider and has values such as Gold, Silver, Bronze.
And it gets still more interesting because the last variable depends on the earlier ones, hence we are using chained variables. The last variable, item, is defined thusly:
So what it is designed to do is to generate a list of all the items, which in reality are particular interfaces of the vedges, into a drop-down list.
Note that I want the user to be able to select multiple categories. It’s not well-documented how to chain such a variable, so note the use of contains and set in that one filter function.
And note the double-quotes around ${Region}, another chained variable. You need those double-quotes! It kind of threw me because in Explorer I believe you may not need them.
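I can’t paste the exact variable queries here since they were screenshots, but to give the idea, a chained item query has roughly this shape. This is only a sketch: it assumes category is stored as a tag alongside item (my real schema differs in details), and it uses Grafana’s ${category:doublequote} variable format to expand the multi-value selection into a Flux array for contains/set:
from(bucket: "poc_bucket2")
  |> range(start: -1d)
  |> filter(fn: (r) => r._measurement == "vedge" and r.hostname =~ /^${Region}/)
  |> filter(fn: (r) => contains(value: r.category, set: [${category:doublequote}]))
  |> keep(columns: ["item"])
  |> distinct(column: "item")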
And all that would be merely window dressing if we didn’t somehow incorporate these template variables into our panels. I use the Stat visualization, so you’ll get one stat per series. That’s why I artificially created a tag UTChour – so I could easily get a unique stat box for each hour.
The stat visualization flux Query
Here it is…
data = from(bucket: "poc_bucket2")
  |> range(start: -24h, stop: now())
  |> filter(fn: (r) =>
    r._measurement == "vedge" and
    r._field == "percent" and r.hostname =~ /^${Region}/ and r.item == "${item}"
  )
  |> drop(columns: ["itemid","ltype","hostname"])
data
Note I have dropped my extra tags and such which I do not wish to appear during a mouseover.
Remember our regions can be one of AP, EU, NA or SA? Well, the hostnames assigned to each vedge start with the two letters of its region of location. Hence the regular expression matching works there to restrict consideration to just the vedges in the selected region.
We are almost done.
Making it a heat map
So my measurement has a field called percent, which is the percent of available bandwidth that is being used. So I created color-based thresholds:
Colorful percent-based thresholds
You can imagine how colorful the dashboard gets as you ratchet up the threshold template variable. So the use of these thresholds is what turns our stat squares into a true heatmap.
Heatmap visualization
I found the actual heatmap visualization useless for my purposes, by the way!
There is also an unsupported heatmap plugin for Grafana which simply doesn’t work. Hence my roll-your-own approach.
Repetition
How do we get a panel row per vedge? The stat visualization has a feature called Repeat Options. So you repeat by variable. The variable selected is item. Remember that item came from our very last template variable. Repeat direction is Vertical.
For calculation I choose mean. Layout orientation is Vertical.
The visualization title is also variable-driven. It is ${item} .
The panels are long and thin. Like maybe two units high? – one unit for the label (the item) and the one below it for the 24 horizontal stat boxes.
Put it all together and voila, it works and it’s cool and interactive and fast!
Single vedge heatmap data over multiple days
Of course this is very similar to the multiple vedge dashboard. But now we’re drilling down into a single vedge to look at its usage over a period of time, such as the last two weeks.
Variables
As before we have threshold, Region and category variables, with category derived from the same flux query shown above. A new variable is day, which is custom and hidden. It has values 1,2,3,4,…,14. I don’t know how to do a loop in flux, or I might have opted for a more elegant method to specify the last 14 days.
I did the item variable query a little differently, but I think it’s mostly an alternate and could have been the same:
Notice the slightly different handling of Region. And those double-quotes are important, as I learned from the school of hard knocks!
The flux query in the panel is of course different. It looks like this:
import "date"
b = date.add(d: 1d, to: -${day}d)
data = from(bucket: "poc_bucket2")
|> range(start: -${day}d, stop: b)
|> filter(fn: (r) =>
r._measurement == "vedge" and
r._field == "percent" and
r.item == "$item"
)
|> drop(columns:["itemid","ltype","hostname"])
data
So we’re doing some date arithmetic so we can get panel strips, one per day. These panels are long and thin, same as before, but I omitted the title since it’s all the same vedge.
The repeat options are repeat by variable day, repeat direction Vertical as in the other dashboard. The visualization is Stat, as in the other dashboard.
And that’s about it! Here the idea is that you play with the independent variables such as Region and threshold; that generates a list of matching vedge interfaces and you pick one from the drop-down list.
Linking the multiple vedge dashboard to the single vedge history dashboard
Of course the more interactive you make these things the cooler it becomes, right? I was excited to be able to link these two dashboards together in a sensible way.
In the panel config you have Data links. I found this link works:
To generalize, since most of the URL is specific to my implementation: both dashboards utilize the item variable. I basically discovered the URL for a single vedge dashboard, dissected it, and parameterized the item, getting the syntax right with a little Internet research.
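Since my actual link contains proprietary bits, here is the general shape of the data link I used – the host and dashboard UID below are made up, but the key idea is to pass the template variables along as var-name query parameters:
https://grafana.drjohns.com/d/abc123xy/single-vedge-usage-history?orgId=1&var-item=${item}&var-Region=${Region}&var-threshold=${threshold}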
So the net effect is that when you hover over any of the vedge panels in the multi-vedge dashboard, you can click on that vedge and pull up – in a new tab in my case – the individual vedge usage history. It’s pretty awesome.
Influxdb
Influxdb is a time series database. It takes some getting used to. Here is my cheat sheet which I like to refer to.
A bucket is a named location with a retention policy where time-series data is stored.
A series is a logical grouping of data defined by a shared measurement, tag and field.
A measurement is similar to an SQL database table.
A tag is similar to an indexed column in an SQL database.
A field is similar to an unindexed column in an SQL database.
A point is similar to an SQL row.
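To tie those terms together, here is roughly what a single point in my vedge measurement looks like in line protocol – the hostname and values are made up, and the timestamp is in seconds precision:
vedge,hostname=NAUSNYC01,itemid=682837,ltype=MPLS,item=NAUSNYC01_MPLS_ge0/0_ingress value=829885.15,percent=42 1671036337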
This is not going to make a lot of sense to anyone who isn’t Dr John. But I’m sure I’ll be referring to this section for a How I did this reminder.
OK. So I wrote a feed_influxdb.py script which runs every 12 minutes in an Azure DevOps pipeline. It extracts the relevant vedge data from Zabbix using the Zabbix api and puts it into my influxdb measurement vedge whose definition I have shown above. I would say the code is fairly generic, except that it relies on the existence of a master file which contains all the relevant static data about the vedges such as their interface names, Zabbix itemids, and their maximum bandwidth (we called it zabbixSpeed). You could pretty much deduce the format of this master file by reverse-engineering this script. So anyway here is feed_influxdb.py.
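To spare you the reverse-engineering, the master file has roughly the following shape – every name and value below is invented, but the keys are the ones the scripts actually read:
{
    "SSID0001": {
        "region": "NA",
        "siteCategory": "Gold",
        "gmtOffset": -5,
        "hostname": {
            "NAUSNYC01": [
                {"itemname": "ge0/0 interface ingress traffic", "itemid": "682837", "lineType": "MPLS", "zabbixSpeed": "100000000"},
                {"itemname": "ge0/0 interface egress traffic", "itemid": "682838", "lineType": "MPLS", "zabbixSpeed": "100000000"}
            ]
        }
    }
}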
from pyzabbix import ZabbixAPI
import requests, json, sys, os, re
import time,datetime
from time import sleep
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
from modules import aux_modules,influx_modules
# we need to get data out of Zabbix
inventory_file = 'prod.config.visibility_dashboard_reporting.json'
#inventory_file = 'inv-w-bw.json' # this is a modified version of the above and includes Zabbix bandwidth for most ports
# Login Zabbix API - use hidden variable to this pipeline
token_zabbix = os.environ['ZABBIX_AUTH_TOKEN']
url_zabbix = 'https://zabbix.drjohns.com/'
zapi = ZabbixAPI(url_zabbix)
zapi.login(api_token=token_zabbix)
# Load inventory file
with open(inventory_file) as inventory_file:
    inventory_json = json.load(inventory_file)
# Time range which want to get data (unixtimestamp)
inventory_s = json.dumps(inventory_json)
inventory_d = json.loads(inventory_s)
time_till = int(time.mktime(datetime.datetime.now().timetuple()))
time_from = int(time_till - 780) # about 12 minutes plus an extra minute to reflect start delay, etc
i=0
max_items = 200
item_l = []
itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname = {},{},{},{}
gmtOffset_d = {}
for SSID in inventory_d:
    print('SSID',SSID)
    hostname_d = inventory_d[SSID]['hostname']
    gmtOffset = aux_modules.gmtOffset_calc(inventory_d[SSID])
    gmtOffset_d[SSID] = gmtOffset
    for vedge_s in hostname_d:
        print('vedge_s',vedge_s,flush=True)
        items_l = hostname_d[vedge_s]
        for item_d in items_l:
            print('item_d',item_d,flush=True)
            itemname = item_d['itemname']
            if not 'lineType' in item_d: continue # probably SNMP availability or something of no interest to us
            lineType = item_d['lineType']
            if 'zabbixSpeed' in item_d:
                bandwidth = int(item_d['zabbixSpeed'])
            else:
                bandwidth = 0
            itemid = item_d['itemid']
            if lineType == 'MPLS' or lineType == 'Internet':
                i = i + 1
                itemid_to_vedge[itemid] = vedge_s # we need this b.c. Zabbix only returns itemid
                itemid_to_ltype[itemid] = lineType # This info is nice to see
                itemid_to_bw[itemid] = bandwidth # So we can get percentage used
                itemid_to_itemname[itemid] = itemname # So we can get percentage used
                item_l.append(itemid)
            if i > max_items:
                print('item_l',item_l,flush=True)
                params = {'itemids':item_l,'time_from':time_from,'time_till':time_till,'history':0,'limit':500000}
                print('params',params)
                res_d = zapi.do_request('history.get',params)
                #print('res_d',res_d)
                #exit()
                print('After call to zapi.do_request')
                result_l = res_d['result']
                Pts = aux_modules.zabbix_to_pts(result_l,itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname)
                for Pt in Pts:
                    print('Pt',Pt,flush=True)
                    # DEBUGGING!!! Normally call to data_entry is outside of this loop!!
                    #influx_modules.data_entry([Pt])
                influx_modules.data_entry(Pts,gmtOffset_d)
                item_l = [] # empty out item list
                i = 0
                sleep(0.2)
else:
    # we have to deal with leftovers which did not fill the max_items
    if i > 0:
        print('Remainder section')
        print('item_l',item_l,flush=True)
        params = {'itemids':item_l,'time_from':time_from,'time_till':time_till,'history':0,'limit':500000}
        res_d = zapi.do_request('history.get',params)
        print('After call to zapi.do_request')
        result_l = res_d['result']
        Pts = aux_modules.zabbix_to_pts(result_l,itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname)
        for Pt in Pts:
            # DEBUGGING!!! normally data_entry is called after this loop
            print('Pt',Pt,flush=True)
            #influx_modules.data_entry([Pt])
        influx_modules.data_entry(Pts,gmtOffset_d)
print('All done feeding influxdb!')
I’m not saying it’s great code. I’m only saying that it gets the job done. I made it more generic in April 2023 so that far fewer lines of code have hard-coded values, which even I recognized as ugly and limiting. I now feed in the dict structure, which is pretty cool. It relies on a couple of auxiliary scripts. Here is aux_modules.py (it may include some packages I need later on).
import re
import time as tm
import numpy as np
def zabbix_to_pts(result_l,itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname):
# turn Zabbix results into a list of points which can be fed into influxdb
# [{'itemid': '682837', 'clock': '1671036337', 'value': '8.298851463718859E+005', 'ns': '199631779'},
Pts = []
for datapt_d in result_l:
itemid = datapt_d['itemid']
time = datapt_d['clock']
value_s = datapt_d['value']
value = float(value_s) # we are getting a floating point represented as a string. Convert back to float
hostname = itemid_to_vedge[itemid]
ltype = itemid_to_ltype[itemid]
itemname = itemid_to_itemname[itemid]
# item is a hybrid tag, like a primary tag key
iface_dir = re.sub(r'(\S+) interface (\S+) .+',r'\1_\2',itemname)
item = hostname + '_' + ltype + '_' + iface_dir
if itemid in itemid_to_bw:
bw_s = itemid_to_bw[itemid]
bw = int(bw_s)
if bw == 0:
percent = 0
else:
percent = int(100*value/bw)
else:
percent = 0
#tags = [{'tag':'hostname','value':hostname},{'tag':'itemid','value':itemid},{'tag':'ltype','value':ltype},{'tag':'item','value':item}]
tags = {'hostname':hostname,'itemid':itemid,'ltype':ltype,'item':item}
fields = {'value':value,'percent':percent}
Pt = {'measurement':'vedge','tags':tags,'fields':fields,'time':time}
Pts.append(Pt)
return Pts
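# To illustrate the itemname parsing above with a made-up but representative name:
# re.sub(r'(\S+) interface (\S+) .+', r'\1_\2', 'ge0/1.4000 interface ingress traffic')
# returns 'ge0/1.4000_ingress', so the hybrid item tag ends up looking like
# 'NAUSNEWTO0057_vEdge1_MPLS_ge0/1.4000_ingress'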
def itembasedd(json_data,Region):
# json_data is the master json file, i.e. the vedge inventory
itemBasedD = {}
offsetdflt = {'AP':8,'NA':-5,'EU':1,'SA':-3}
for SSID_k in json_data:
SSID_d = json_data[SSID_k]
print('SSID_k',SSID_k)
region = SSID_d['region']
if not region == Region: continue # just look at region of interest
siteCategory = SSID_d['siteCategory']
if 'gmtOffset' in SSID_d:
time_off = SSID_d['gmtOffset']
else:
time_off = offsetdflt[region]
for vedge_k in SSID_d['hostname']:
vedge_l = SSID_d['hostname'][vedge_k]
#print('vedge_d type',vedge_d.__class__)
#print('vedge_d',vedge_d)
for this_item_d in vedge_l:
print('this_item_d',this_item_d)
if not 'lineType' in this_item_d: continue
ltype = this_item_d['lineType']
if not (ltype == 'MPLS' or ltype == 'Internet'): continue
itemname = this_item_d['itemname']
if not re.search('gress ',itemname): continue
itemid = this_item_d['itemid']
if not 'zabbixSpeed' in this_item_d: continue # some dicts may be historic
zabbixSpeed = int(this_item_d['zabbixSpeed']) # zabbixSpeed is stored as a string
iface = re.sub(r' interface .+','',itemname)
direction = re.sub(r'.+ interface (\S+) traffic',r'\1',itemname)
item = vedge_k + '_' + ltype + '_' + iface + '_' + direction
# we may need additional things in this dict
itemBasedD[itemid] = {"item":item, "Time_Offset":time_off,"region":region,"speed":zabbixSpeed,'category':siteCategory}
print('itemid,itemBasedD',itemid,itemBasedD[itemid])
# let's have a look
#for itemid,items in itemBasedD.items():
#for itemid,items in itemBasedD.items():
# print("item, dict",itemid,items)
return itemBasedD
def getitemlist(region,itemBasedD,max_items):
# return list of itemids we will need for this region
iteml1,iteml2 = [],[]
for itemid,items in itemBasedD.items():
if itemid == '0000': continue
iregion = items['region']
if iregion == region:
if len(iteml1) == max_items:
iteml2.append(itemid)
else:
iteml1.append(itemid)
return iteml1,iteml2
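# e.g. with max_items = 200 the first 200 matching itemids land in iteml1 and any others
# spill into iteml2, presumably so each Zabbix API call stays a manageable size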
def get_range_data(alldata,itemD):
data_range = []
#
for datal in alldata:
#print("datal",datal)
# check all these keys...
itemid = datal["itemid"]
timei = datal["clock"]
timei = int(timei)
# timei is CET. Subtract 3600 s to arrive at time in UTC.
timei = timei - 3600
# hour of day, UTC TZ
H = int(tm.strftime("%H",tm.gmtime(timei)))
# transform H based on gmt offset of this vedge
H = H + itemD[itemid]["Time_Offset"]
H = H % 24
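# e.g. for a NA vedge with Time_Offset -5: an hour of 20 (UTC, after the CET correction)
# becomes 15 local and passes the check below, while an hour of 1 becomes 20 and is skipped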
# Now check if this hour is in the range of 7 AM to 7 PM local time
#if H < 7 or H > 18:
if H < 8 or H > 17: # change to 8 AM to 6 PM range 22/03/08
#print("H out of range",H)
continue
data_range.append(datal)
return data_range
def massage_data(alldata,item_based_d):
# itemvals - a dict indexed by itemid
itemvals = {}
#print("alldata type",alldata.__class__)
for datal in alldata:
# datal is a dict
#print("datal type",datal.__class__)
#print("datal",datal)
val = datal["value"]
valf = float(val)
itemid = datal["itemid"]
if not itemid in itemvals:
itemvals[itemid] = []
itemvals[itemid].append(valf)
return itemvals
def domath(itemvals,item_based_d):
for itemid,valarray in itemvals.items():
#print("itemid,valarray",itemid,valarray)
avg = np.average(valarray)
n95 = np.percentile(valarray,95)
max = np.amax(valarray)
speed = item_based_d[itemid]["speed"]
if speed > 0:
avg_percent = 100*avg/speed
n95_percent = 100*n95/speed
max_percent = 100*max/speed
else:
avg_percent = 0.0
n95_percent = 0.0
max_percent = 0.0
avgm = round(avg/1000000.,1) # convert to megabits
n95m = round(n95/1000000.,1)
maxm = round(max/1000000.,1)
item_based_d[itemid]["avg"] = avgm
item_based_d[itemid]["n95"] = n95m
item_based_d[itemid]["max"] = maxm
item_based_d[itemid]["avg_percent"] = round(avg_percent,1)
item_based_d[itemid]["n95_percent"] = round(n95_percent,1)
item_based_d[itemid]["max_percent"] = round(max_percent,1)
item_based_d[itemid]["speedm"] = round(speed/1000000.,1)
#print("item_based_d",item_based_d)
def pri_results(item_based_d):
print('item_based_d',item_based_d)
def stats_to_pts(item_based_d):
# turn item-based dict results into a list of points which can be fed into influxdb
#{'683415': {'item': 'NAUSNEWTO0057_vEdge1_MPLS_ge0/1.4000_ingress', 'region': 'NA', 'category': 'Hybrid Silver+', 'avg': 4.4, 'n95': 16.3, 'max': 19.5, 'avg_percent': 22.0, 'n95_percent': 81.6, 'max_percent': 97.3,
Pts = []
time = int(tm.time()) # kind of a fake time. I don't think it matters
for itemid,itemid_d in item_based_d.items():
category = itemid_d['category']
item = itemid_d['item']
region = itemid_d['region']
t_off = itemid_d['Time_Offset']
speed = float(itemid_d['speed']) # speed needs to be a float
if 'avg' in itemid_d and 'n95' in itemid_d:
avg = itemid_d['avg_percent']
n95 = itemid_d['n95_percent']
max = itemid_d['max_percent']
else:
avg,n95,max = (0.0,0.0,0.0)
tags = {'item':item,'category':category,'region':region,'GMT_offset':t_off}
fields = {'avg':avg,'n95':n95,'max':max,'speed':speed}
Pt = {'measurement':'vedge_stat','tags':tags,'fields':fields,'time':time}
Pts.append(Pt)
return Pts
def gmtOffset_calc(SSID_d):
offsetdflt = {'AP':8,'NA':-5,'EU':1,'SA':-3}
region = SSID_d['region']
if 'gmtOffset' in SSID_d and SSID_d['gmtOffset']:
gmtOffset = SSID_d['gmtOffset']
else:
gmtOffset = offsetdflt[region]
return gmtOffset
And here is influx_modules.py, the other auxiliary script:
import influxdb_client, os, time
from datetime import datetime, timezone
import pytz
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
import random,re
def data_entry(Pts,gmtOffset_d):
# Set up variables
bucket = "poc_bucket2" # DrJ test bucket
org = "poc_org"
influxdb_cloud_token = os.environ['INFLUX_AUTH_TOKEN']
# PROD setup
bucket_prod = "UC02" # we are use case 2
#bucket_prod = "test" # we are use case 2
org_prod = "DrJohns - Network Visibility"
influxdb_cloud_token_prod = os.environ['INFLUX_AUTH_TOKEN_PROD']
# Store the URL of your InfluxDB instance
url_local ="http://10.199.123.233:8086/"
url_prod ="https://westeurope-1.azure.cloud2.influxdata.com/"
# Initialize client
client = influxdb_client.InfluxDBClient(url=url_local,token=influxdb_cloud_token,org=org)
client_prod = influxdb_client.InfluxDBClient(url=url_prod,token=influxdb_cloud_token_prod,org=org_prod)
# Write data
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api_prod = client_prod.write_api(write_options=SYNCHRONOUS)
pts = []
SSID_seen_flag = {}
for Pt in Pts:
item = Pt['tags']['item']
time = int(Pt['time'])
# look up the gmtOffset. SSID is the key to the gmt dict
SSID = re.sub(r'_.+','',item) # NAUSNEWTOO0001_vEdge1_MPLS_ge0/1.4084_ingres
gmtOffset = gmtOffset_d[SSID] # units are hours, and can include fractions
gmtOffset_s = int(3600 * gmtOffset)
time_local = time + gmtOffset_s
# convert seconds since epoch into format required by influxdb. pt_time stays utc, not local!
pt_time = datetime.fromtimestamp(time, timezone.utc).isoformat('T', 'milliseconds')
# pull out the hour of the offset-shifted time; this ends up being the site-local hour (see note further below)
ts = datetime.fromtimestamp(time_local).astimezone(pytz.UTC)
Hlocal = ts.strftime('%H')
if len(Hlocal) == 1: Hlocal = '0' + Hlocal # pad single digits with a leading 0 so sort behaves as expected
# extend dict with tag for UTChour
Pt['tags']['UTChour'] = Hlocal
# overwrite time here
Pt['time'] = pt_time
if not SSID in SSID_seen_flag:
#print('item,Hlocal,gmtOffset,gmtOffset_s,time,time_local',item,Hlocal,gmtOffset,gmtOffset_s,time,time_local) # first iteration print
print('item,Pt',item,Pt)
SSID_seen_flag[SSID] = True
##point = Point(measurement).tag("hostname",hostname).tag("itemid",itemid).tag("ltype",ltype).tag("item",item).tag("UTChour",Hlocal).field('value',value).field('percent',percent).time(pt_time)
# based on https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/write/point.py
point = Point.from_dict(Pt)
pts.append(point)
# write to POC and PROD buckets for now
print('Writing pts to old and new Influx locations')
write_api.write(bucket=bucket, org="poc_org", record=pts, write_precision=WritePrecision.S)
write_api_prod.write(bucket=bucket_prod, org=org_prod, record=pts, write_precision=WritePrecision.S)
def data_entry_stats(Pts):
# Set up variables
bucket = "poc_bucket2" # DrJ test bucket
org = "poc_org"
influxdb_cloud_token = os.environ['INFLUX_AUTH_TOKEN']
# Store the URL of your InfluxDB instance
url_local ="http://10.199.123.233:8086/"
url_prod ="https://westeurope-1.azure.cloud2.influxdata.com/"
# PROD setup
bucket_prod = "UC02" # we are use case 2
org_prod = "DrJohns - Network Visibility"
influxdb_cloud_token_prod = os.environ['INFLUX_AUTH_TOKEN_PROD']
# Initialize client
client = influxdb_client.InfluxDBClient(url=url_local,token=influxdb_cloud_token,org=org)
client_prod = influxdb_client.InfluxDBClient(url=url_prod,token=influxdb_cloud_token_prod,org=org_prod)
# Write data
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api_prod = client_prod.write_api(write_options=SYNCHRONOUS)
pts = []
for Pt in Pts:
# debug
# print('avg type',avg.__class__,'item',item,flush=True)
time = Pt['time']
# convert seconds since epoch into format required by influxdb
pt_time = datetime.fromtimestamp(int(time), timezone.utc).isoformat('T', 'milliseconds')
# overwrite time here
Pt['time'] = pt_time
##point = Point(measurement).tag("item",item).tag("category",category).tag("region",region).tag("GMT_offset",t_off).field('n95',n95).field('avg',avg).field('max',max).field('speed',speed).time(pt_time)
# see aux_modules stats_to_Pts for our dictionary structure for Pt
point = Point.from_dict(Pt)
pts.append(point)
print('Write to old and new influxdb instances')
write_api.write(bucket=bucket, org="poc_org", record=pts, write_precision=WritePrecision.S)
write_api_prod.write(bucket=bucket_prod, org=org_prod, record=pts, write_precision=WritePrecision.S)
These scripts show how I accumulate a batch of points and only then make an entry in InfluxDB, which makes things go faster. These days I am updating two InfluxDB instances: a production one that actually uses InfluxDB Cloud (hence the URL is a generic endpoint which may actually work for you), and a POC one which I run on my private network.
What it looks like
This is the view of multiple vedges which match the selection criteria of high bandwidth usage in region Europe:
Then I figured out how to provide a link to a detailed traffic graph for these selection criteria. Obviously, that mostly involved switching the visualization to Time Series. But I also wanted to show the interface bandwidth on the same graph. That was tricky: it involved creating a transform, a config query, which takes speed from the table and turns it into Threshold1, which I draw as a red dashed line. It’s too much detail to go into further in this article. I wanted to make a second config query, but it turns out this is not supported – still.
As for the link, I have a text panel where I use raw HTML. My HTML, which creates the active link you see displayed, is:
So here is what the detailed traffic graph looks like:
I love that red dashed line showing the interface bandwidth capacity!
I almost forgot to mention it: there is a second query, B, which I use as the basis for the dynamic threshold, to pick up the “speed” of the interface. Here it is:
data = from(bucket: "UC02")
|> range(start: -1d, stop: now())
|> filter(fn: (r) =>
r._measurement == "vedge_stat" and
r._field == "speed" and r.item == "${item}"
)
|> drop(columns: ["item","category","region","GMT_offset"])
data
Back to single vedge
At the top of this post I showed the heat map for a single vedge. It includes an active link which leads to a detailed traffic graph. That link in turn is in a Text Panel with HTML text. This is the HTML.
The single vedge detailed graph is a little different from the multiple vedge detailed graph – but not by much. I am getting long-winded, so I will omit the details. Mainly I’ve just blown up the vertical scale and omitted panel iteration. So here is what you get:
In full disclosure
In all honesty, I added another field called speed to the vedge_stat InfluxDB measurement. It’s kind of redundant, but it made things a lot simpler for me. It is that field I use in the config query to set the threshold which I draw with a red dashed line.
Not sure I mentioned it, but at some point I re-interpreted the meaning of UTChour to be the local time zone hour! This was also a convenience for me, since there was a desire to display the heat maps in the local timezone. Instead of messing around with shifting hours in the Flux query language – which would have taken me days or weeks to figure out – I just did it in my Python code, which I (think I) shared above. So much easier…
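For the record, that reinterpretation boils down to the same few lines already shown in data_entry, isolated here with a made-up timestamp and offset (and assuming the machine running it keeps its clock on UTC):
from datetime import datetime
import pytz
time_utc = 1671036337                          # sample epoch value, seconds since 1970 UTC
gmtOffset = -5                                 # e.g. a NA site; fractional offsets are allowed
time_local = time_utc + int(3600 * gmtOffset)
# shift the epoch by the site offset, then format its hour as though it were UTC;
# on a UTC-clocked host this yields the site-local hour that lands in the UTChour tag
Hlocal = datetime.fromtimestamp(time_local).astimezone(pytz.UTC).strftime('%H')
print(Hlocal)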
Complaints
I am not comfortable with the Flux query documentation, nor the Grafana documentation for that matter. They both give you a taste of what you need, without many details or examples. For instance, it looks like there are multiple syntaxes available to you in Flux. I just pragmatically developed what works.
Conclusion
Yes, this is truly amateur hour for InfluxDB and Grafana. But even amateurs can produce decent results if the tools are adequate. And that’s the case here. The results I am producing are “good enough” for our purposes – internal usage.
I am content if no one reads this and it only serves as my own documentation. But perhaps it will help someone facing similar issues. Unfortunately one of the challenges is asking a good question to a search engine when you’re a newbie and haven’t mastered the concepts.
But without too much effort I was able to master enough Grafana to create something that will probably be useful to our vendor management team. Grafana is fun and powerful. There is a slight lack of examples and the documentation is a wee bit sparse. InfluxDB seems to be a good back-end database for Grafana to use. The Flux query language, while still obscure to me, was powerful enough for me to accomplish my basic goals.
Or the beginning of creating your own smart speaker
Intro
Imagine you could use a low-cost device to interpret speech without the aid of the big cloud services and their complexity and security and big-brotherly-ness. Well if you have a DIY mindset, you can!
I wanted to control the Raspberry Pi-based slideshow I have written about many times in the past with voice commands. The question became: how could I do it, and is it even possible at all? And would I need to master the complex APIs provided by either Amazon or Google cloud services? Well, it turns out that it is possible to do passable speech-to-text without any external cloud provider, and I am very excited to share what I’ve learned so far.
It will be helpful to install and test the examples:
git clone https://github.com/alphacep/vosk-api
cd vosk-api/python/example
python3 ./test_simple.py test.wav
On my RPi 4 it took 36 s the first time, and 6.6 s the second time, to process this test.wav. So I got worried and fully expected it would be just too slow on these underpowered RPi systems.
But I forged ahead and looked for an example which could do real-time speech-to-text. They provide a microphone example. It requires some additional packages, but even after installing them it still produced a nasty segmentation fault, so I gave up on that. Then I noticed an ffmpeg-based example. Well, it turns out I have lots of prior ffmpeg experience, as I also post about live recording of audio with the Raspberry Pi.
It turns out their example simply uses ffmpeg to interpret a file, but I didn’t know that to begin with. Still, I know my way around ffmpeg well enough that I could use it to process a live stream instead. So I made those changes, and voila. I’m glad to say I was dead wrong about the processing speed. On the RPi 4 it can keep up with its speech-to-text task in real time!
Basic program to examine your speech in real time
I developed the following Python script based on one of the Python examples from the API. I call it drjtst4.py, just to give it a name:
#!/usr/bin/env python3
import subprocess
import re
from modules import aux_modules
from vosk import Model, KaldiRecognizer, SetLogLevel
SAMPLE_RATE = 16000
SetLogLevel(0)
model = Model(lang="en-us")
rec = KaldiRecognizer(model, SAMPLE_RATE)
start,start_a = 0,0
input_device = 'plughw:1,0'
phrase = ''
accumulating = False
# wake word hey photo is often confused with a photo by vosk...
wake_word_re = '^(hey|a) photo'
with subprocess.Popen(["ffmpeg","-loglevel", "quiet","-f","alsa","-i",
input_device,
"-ar", str(SAMPLE_RATE) , "-ac", "1", "-f", "s16le", "-"],
stdout=subprocess.PIPE) as process:
while True:
data = process.stdout.read(4000)
if len(data) == 0:
break
if rec.AcceptWaveform(data):
print('in first part')
print(rec.Result())
text = rec.PartialResult()
# text is a "string" which is basically a dict
start,start_a,accumulating,phrase = aux_modules.process_text(wake_word_re,text,start,start_a,accumulating,phrase)
else:
# this part always seems to be executed for whatever reason
print('in else part')
text = rec.PartialResult()
start,start_a,accumulating,phrase = aux_modules.process_text(wake_word_re,text,start,start_a,accumulating,phrase)
print(rec.PartialResult())
# we never seem to get here
print(rec.FinalResult())
print('In final part')
text = rec.FinalResult()
I created a modules directory and in it a file called aux_modules.py. It looks like this:
import re,time,json
def process_text(wake_word_re,text_s,start,start_a,accumulating,phrase):
max = 5.5 # seconds
inactivity = 10 # seconds
short_max = 1.5 # seconds
elapsed = 0
if time.time() - start_a < inactivity:
# Allow some time to elapse since we just took an action
return start,start_a,accumulating,phrase
# convert text to real text. Real text is in 'partial'
text_d = json.loads(text_s)
text = ''
if 'partial' in text_d:
text = text_d['partial']
if 'text' in text_d:
text = text_d['text']
if not text == '': phrase = text
if re.search(wake_word_re,text):
if not accumulating:
start = time.time()
accumulating = True
print('Wake word detected. Now accumulating text.')
l = len(re.split(r'\s',text))
print('text, word ct',text,l)
if accumulating:
elapsed = time.time() - start
print('Elapsed time:',elapsed)
if l > 1:
phrase = text
if elapsed > max or (elapsed > short_max and l == 1):
# we're at a natural ending here...
print('This is the total text',phrase)
# do some action
# reset everything
accumulating = False
phrase = ''
start_a = time.time()
return start,start_a,accumulating,phrase
And you just invoke it with python3 drjtst4.py.
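If you want to poke at the accumulation logic without a microphone, a quick hypothetical driver like this exercises process_text directly (the JSON strings imitate what Vosk’s PartialResult() returns; everything else is made up):
import time
from modules import aux_modules

wake_word_re = '^(hey|a) photo'
start, start_a, accumulating, phrase = 0, 0, False, ''
# feed it a couple of fake partial results, much as the main loop would
for text_s in ('{"partial": "hey photo"}', '{"partial": "hey photo play slideshow"}'):
    start, start_a, accumulating, phrase = aux_modules.process_text(
        wake_word_re, text_s, start, start_a, accumulating, phrase)
    time.sleep(2)   # let a little elapsed time accrue between "partial results"
print('accumulated phrase:', phrase)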
Sample session output
in else part
text, word ct 1
{
"partial" : ""
}
in else part
text, word ct hey 1
{
"partial" : "hey"
}
in else part
text, word ct hey 1
{
"partial" : "hey"
}
in else part
text, word ct hey 1
{
"partial" : "hey"
}
in else part
Wake word detected. Now accumulating text.
text, word ct hey photo 2
Elapsed time: 0.0004639625549316406
{
"partial" : "hey photo"
}
in else part
text, word ct hey photo 2
Elapsed time: 0.003415822982788086
{
"partial" : "hey photo"
}
in else part
text, word ct hey photo 2
Elapsed time: 0.034906625747680664
{
"partial" : "hey photo"
}
in else part
text, word ct hey photo 2
Elapsed time: 0.09063172340393066
{
"partial" : "hey photo"
}
in else part
text, word ct hey photo 2
Elapsed time: 0.2488384246826172
{
"partial" : "hey photo"
}
in else part
text, word ct hey photo 2
Elapsed time: 0.33771753311157227
{
"partial" : "hey photo"
}
in else part
text, word ct hey photo place 3
Elapsed time: 0.7102789878845215
{
"partial" : "hey photo place"
}
in else part
text, word ct hey photo place 3
Elapsed time: 0.7134637832641602
{
"partial" : "hey photo place"
}
in else part
text, word ct hey photo player 3
Elapsed time: 0.8728365898132324
{
"partial" : "hey photo player"
}
in else part
text, word ct hey photo player 3
Elapsed time: 0.8759913444519043
{
"partial" : "hey photo player"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.0684640407562256
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.0879075527191162
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.3674390316009521
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.3706269264221191
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.5532972812652588
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.5963218212127686
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.74298095703125
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.842745065689087
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 1.9888567924499512
{
"partial" : "hey photo play slideshow"
}
in else part
text, word ct hey photo play slideshow 4
Elapsed time: 2.0897343158721924
{
"partial" : "hey photo play slideshow"
}
in first part
{
"text" : "hey photo play slideshow"
}
text, word ct 1
Elapsed time: 2.3853299617767334
This is the total text hey photo play slideshow
in else part
{
"partial" : ""
}
in else part
{
"partial" : ""
}
A word on accuracy
It isn’t Alexa or Google. No one expected it would be, right? But if you’re a native English speaker it isn’t too bad. You can see it trying to correct itself.
The desire to choose an uncommon wake word of three syllables is directly at odds with how neural networks are trained! So… although I wanted my wake word to be “hey photo,” I also allow “a photo.” “A photo” was probably in their training set, whereas “hey photo” certainly was not – hence the bias against recognizing a unique wake word. And no way will I re-train their model – way too much effort. But to lower false positives, this phrase has to occur at the beginning of a spoken phrase.
Turning this into a smart speaker
You can see I’ve got all the pieces set up. At least I think I do! I’ve got my wake word. I don’t have natural language processing, but I think I can forgo that. I have a place in the code where I print out the “final text.” That’s where the spoken command is perceived to have been uttered, and a potential action could be executed at that point.
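For what it’s worth, such a dispatch could be as simple as the following hypothetical fragment – the phrases and the signal-file mechanism are placeholders of my own invention, not part of my actual slideshow setup:
import re, subprocess

def do_action(phrase):
    # hypothetical mapping of recognized phrases to slideshow actions
    if re.search(r'play slideshow', phrase):
        subprocess.run(['touch', '/tmp/slideshow_play'])     # placeholder: signal the slideshow to start
    elif re.search(r'(stop|pause) slideshow', phrase):
        subprocess.run(['touch', '/tmp/slideshow_pause'])    # placeholder: signal it to pause
    else:
        print('No action matches:', phrase)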
Dead ends
To be fleshed out later as time permits.
Conclusion
I have demonstrated how speech-to-text can be achieved easily on an inexpensive Raspberry Pi, without the use of complex cloud APIs such as those provided by Amazon and Google.
I will be building on this facility in subsequent posts as I turn my RPi-powered slideshow into a slideshow which reacts to voice commands!
As far as I can tell there’s no way to search through multiple pipeline logs with a single command. In Linux it’s trivial. Seeing the lack of this basic functionality, I decided to copy all my pipeline logs over to a Linux server using the Azure DevOps (ADO) API.
The details
This is the main program which I’ve called get_raw_logs.py.
#!/usr/bin/python3
# fetch raw log to local machine
# for relevant api section, see:
#https://learn.microsoft.com/en-us/rest/api/azure/devops/build/builds/get-build-log?view=azure-devops-rest-7.1
import urllib.request,json,sys
from datetime import datetime,timedelta
from modules import aux_modules
conf_file = sys.argv[1]
# pipeline uses UTC so we must follow suit or we will miss files
a_day_ago = (datetime.utcnow() - timedelta(days = 1)).strftime('%Y-%m-%dT%H:%M:%SZ')
print('a day ago (UTC)',a_day_ago)
#url = 'https://dev.azure.com/drjohns4ServicesCoreSystems/Connectivity/_apis/build/builds?minTime=2022-10-11T13:00:00Z&api-version=7.1-preview.7'
# dump config file into a dict
config_d = aux_modules.parse_config(conf_file)
url = config_d['url_base'] + config_d['organization'] + '/' + config_d['project'] + '/_apis/build/builds?minTime=' + a_day_ago + config_d['url_params']
#print('url',url)
req = urllib.request.Request(url)
req.add_header('Authorization', 'Basic ' + config_d['auth'])
# Get buildIds for pipeline runs from last 24 hours
with urllib.request.urlopen(req) as response:
html = response.read()
txt_d = json.loads(html)
#{"count":215,"value":[{"id":xxx,buildNumber":"20221011.106","definition":{"name":"PAN-Usage4Mgrs-2"
value_l = txt_d['value']
for builds in value_l:
buildId = builds['id']
build_number = builds['buildNumber']
build_def = builds['definition']
name = build_def['name']
#print('name,build_number,id',name,build_number,buildId)
#print('this_build',builds)
if name == config_d['pipeline1'] or name == config_d['pipeline2']:
aux_modules.get_this_log(config_d,name,buildId,build_number)
It runs very efficiently so I run it every three minutes.
In my pipelines, all the interesting stuff is in logId 7, so I’ve hardcoded that. It could have turned out differently. Notice I am getting the logs from two pipelines, due to the limitation, discussed previously, that you can only run 1000 pipeline runs a week: I was forced to run two identical pipelines, staggered, every 12 minutes, with pipeline-2 sleeping the first six minutes.
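The config file itself never appears above, and its on-disk format is whatever aux_modules.parse_config implements, which I don’t show. For reference, though, these are the keys the script expects in the dict it gets back – every value below is just a placeholder:
# hypothetical config_d contents; the keys are taken from how get_raw_logs.py uses them
config_d = {
    'url_base': 'https://dev.azure.com/',
    'organization': 'drjohns4ServicesCoreSystems',
    'project': 'Connectivity',
    'url_params': '&api-version=7.1-preview.7',
    'auth': '<base-64 of any:my_auth_token>',
    'pipeline1': 'PAN-Usage4Mgrs',
    'pipeline2': 'PAN-Usage4Mgrs-2',
}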
The auth is the base-64 encoded text for any:<my_auth_token>.
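In case it helps, that string can be produced with a couple of lines of Python (the token value is of course a placeholder):
import base64
token = '<my_auth_token>'                                    # your ADO personal access token
auth = base64.b64encode(('any:' + token).encode()).decode()  # base-64 of any:<my_auth_token>
print(auth)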
Conclusion
I have shown how to copy the logs from Azure DevOps pipeline runs over to a local Unix system, where you can run normal, cool Linux commands on them.