
Everything I need to know about Influxdb and Grafana

Intro

Some of my colleagues had used Influxdb and Grafana at their previous job so they thought it might fit for what we’re doing in the Visibility team. It sounded good in theory, anyway, so I had to agree. There were a lot of pitfalls. Eventually I got it to the point where I’m satisfied with my accomplishments and want to document the hurdles I’ve overcome.

So as time permits I will be fleshing this out.

Grafana

I’m going to lead with the picture and then the explanation makes a lot more sense.

I’ve spent the bulk of my time wrestling with Grafana. Actually it looks like a pretty capable tool. It’s mostly a matter of understanding how to make it do what you’re dreaming about. Our currently installed version is 9.2.1.

My goal is to make a heatmap, but a special kind similar to what I saw our network provider has: one vedge per row, and one column per hour, so 24 columns in total. A vedge is a kind of SD-WAN router. I want to help the networking group look at hundreds of them at a time. So that’s one potential dashboard; it gives a view of a single day. Another dashboard would show just one router, with each row representing a day and the columns again representing hours. Also a heatmap. Ideally the multi-vedge dashboard should link to the individual dashboard. In the end I pulled it off. I am also responsible for feeding the raw data into Influxdb, and hence also for the table design.

Getting a workable table design was really important. I tried to design it in a vacuum, but that only partially worked. So I revised, adding tags and fields as I felt I needed to, while being mindful of not blowing up the cardinality. I am now using these two tables – sorry, measurements.

vedge measurement
vedge_stats measurement

Although there are hundreds of vedges, some of my tags are redundant, so don’t get overly worried about my high cardinality. UTChour is, yes, a total kludge – not the “right” way to do things. But I’m still learning and it was simpler in my mind. item in the first measurement is redundant with itemid, but it is more user-friendly: a human-readable name.
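
To make the schema concrete, here is roughly what one point in each measurement looks like, written as the kind of Python dict my feed scripts (shown further down) build. The tag and field names match those scripts; the hostnames and values are invented:

# one point in the vedge measurement (one per interface per polling interval)
vedge_point = {
    'measurement': 'vedge',
    'tags': {'hostname': 'NAUSNEWTO0057_vEdge1',                       # invented example
             'itemid': '683415',                                        # Zabbix itemid, redundant with item
             'ltype': 'MPLS',                                           # line type
             'item': 'NAUSNEWTO0057_vEdge1_MPLS_ge0/1.4000_ingress',    # human-readable key
             'UTChour': '14'},                                          # the kludge tag, one value per hour
    'fields': {'value': 829885.1, 'percent': 33},                       # raw bits/sec and percent of capacity
    'time': 1671036337,
}

# one point in the vedge_stats measurement (one per interface per statistics run)
vedge_stats_point = {
    'measurement': 'vedge_stats',
    'tags': {'item': 'NAUSNEWTO0057_vEdge1_MPLS_ge0/1.4000_ingress',
             'category': 'Gold', 'region': 'NA', 'GMT_offset': -5},
    'fields': {'avg': 22.0, 'n95': 81.6, 'max': 97.3, 'speed': 20000000.0},
    'time': 1671036337,
}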

Influx Query Explorer

It’s very helpful to use the Explorer, but the syntax there is not exactly the same as it will be when you define template variables. Go figure.

Multiple vedges for the last day

So how did I do it in the end?

Mastering template variables is really key here. I have a drop-down selection for region. In Grafana-world it is a custom variable with potential values EU,NA,SA,AP. That’s pretty easy. I also have a threshold variable, with possible values: 0,20,40,60,80,90,95. And a math variable with values n95,avg,max. More recently I’ve added a threshold_max and a math_days variable.

It gets more interesting, however, I promise. I have a category variable, which is of type query:

from(bucket: "poc_bucket2")
|> range (start: -1d)
|> filter(fn:(r) => r._measurement == "vedge_stats")
|> group()
|> distinct(column: "category")

Multi-value and Include all options are checked. Just to make it meaningful, category is assigned by the WAN provider and has values such as Gold, Silver, Bronze.

And it gets still more interesting because the last variable depends on the earlier ones, hence we are using chained variables. The last variable, item, is defined thusly:

from(bucket: "poc_bucket2")
|> range (start: -${math_days}d)
|> filter(fn:(r) => r._measurement == "vedge_stats" and r.region == "${Region}")
|> filter(fn:(r) => contains(value: r.category, set: ${category:json}))
|> filter(fn:(r) => r._field == "${math}" and r._value >= ${threshold} and r._value <= ${threshold_max})
|> group()
|> distinct(column: "item")

So what this is designed to do is populate a drop-down list with all the matching items, which in reality are particular interfaces of the vedges.

Note that I want the user to be able to select multiple categories. It’s not well documented how to chain such a multi-value variable, so note the use of contains and set in that one filter function. With, say, Gold and Silver selected, ${category:json} expands to ["Gold","Silver"], which is exactly the kind of set the contains function expects.

And note the double-quotes around ${Region}, another chained variable. You need those double-quotes! It kind of threw me because in Explorer I believe you may not need them.

And all of that would be merely academic if we didn’t somehow incorporate these template variables into our panels. I use the Stat visualization, so you get one stat per series. That’s why I artificially created the tag UTChour: so I could easily get a unique stat box for each hour.

The Stat visualization Flux query

Here it is…

data = from(bucket: "poc_bucket2")
  |> range(start: -24h, stop: now())
  |> filter(fn: (r) =>
    r._measurement == "vedge" and
    r._field == "percent" and r.hostname =~ /^${Region}/ and r.item == "${item}"
  )
  |> drop(columns: ["itemid","ltype","hostname"])
data

Note I have dropped my extra tags and such, which I do not wish to appear during a mouseover.

Remember our regions can be one of AP, EU, NA or SA? Well, the hostnames assigned to each vedge start with the two letters of its region. Hence the regular expression matching works there to restrict consideration to just the vedges in the selected region.

We are almost done.

Making it a heat map

So my measurement has a field called percent, which is the percent of available bandwidth that is being used. So I created color-based thresholds:

Colorful percent-based thresholds

You can imagine how colorful the dashboard gets as you ratchet up the threshold template variable. So the use of these thresholds is what turns our stat squares into a true heatmap.

Heatmap visualization

I found the actual heatmap visualization useless for my purposes, by the way!

There is also an unsupported heatmap plugin for Grafana which simply doesn’t work. Hence my roll-your-own approach.

Repetition

How do we get a panel row per vedge? The stat visualization has a feature called Repeat Options. So you repeat by variable. The variable selected is item. Remember that item came from our very last template variable. Repeat direction is Vertical.

For calculation I choose mean. Layout orientation is Vertical.

The visualization title is also variable-driven: it is ${item}.

The panels are long and thin, maybe two units high: one unit for the label (the item) and the one below it for the 24 horizontal stat boxes.

Put it all together and voila, it works and it’s cool and interactive and fast!

Single vedge heatmap data over multiple days

Of course this is very similar to the multiple vedge dashboard. But now we’re drilling down into a single vedge to look at its usage over a period of time, such as the last two weeks.

Variables

As before we have threshold, Region and category variables, with category derived from the same flux query shown above. A new variable is day, which is custom and hidden. It has the values 1,2,3,4,…,14. I don’t know how to do a loop in flux, or I might have opted for a more elegant way to specify the last 14 days.
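
As an aside, if that list ever needs to grow, a throwaway line of python will print the comma-separated values to paste into the variable definition. It’s purely a convenience; nothing in the pipeline depends on it:

# convenience one-liner to generate the day variable's values
print(','.join(str(d) for d in range(1, 15)))   # prints 1,2,3,...,14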

I did the item variable query a little differently, but I think it’s mostly just an alternative and could have been the same:

from(bucket: "poc_bucket2")
|> range (start: -${math_days}d)
|> filter(fn:(r) => r._measurement == "vedge_stats" and r.region == "${Region}")
|> filter(fn:(r) => contains(value: r.category, set: ${category:json}))
|> filter(fn:(r) => r._field == "${math}" and r._value >= ${threshold} and r._value <= ${threshold_max})
|> group()
|> distinct(column: "item")

Notice the slightly different handling of Region. And those double-quotes are important, as I learned from the school of hard knocks!

The flux query in the panel is of course different. It looks like this:

import "date"
b = date.add(d: 1d, to: -${day}d)
data = from(bucket: "poc_bucket2")
  |> range(start: -${day}d, stop: b)
  |> filter(fn: (r) =>
    r._measurement == "vedge" and
    r._field == "percent" and
    r.item == "$item"
  )
  |> drop(columns:["itemid","ltype","hostname"])
data

So we’re doing some date arithmetic so we can get panel strips, one per day. These panels are long and thin, same as before, but I omitted the title since it’s all the same vedge.

The repeat options are repeat by variable day, repeat direction Vertical as in the other dashboard. The visualization is Stat, as in the other dashboard.

And that’s about it! Here the idea is that you play with the independent variables such as Region and threshold; that generates a list of matching vedge interfaces, and you pick one from the drop-down list.

Linking the multiple vedge dashboard to the single vedge history dashboard

Of course the more interactive you make these things the cooler it becomes, right? I was excited to be able to link these two dashboards together in a sensible way.

In the panel config you have Data links. I found this link works:

https://drjohns.com:3000/d/1MqpjD24k/single-vedge-usage-history?orgId=1&var-threshold=60&var-math=n95&${item:queryparam}

To generalize, since most of the URL is specific to my implementation: both dashboards utilize the item variable. I basically discovered the URL for the single vedge dashboard, dissected it and parameterized the item, getting the syntax right with a little Internet research. The ${item:queryparam} piece expands to var-item=<the selected item>, which is how Grafana passes variable values in a URL.

So the net effect is that when you hover over any of the vedge panels in the multi-vedge dashboard, you can click on that vedge and pull up – in a new tab in my case – the individual vedge usage history. It’s pretty awesome.

Influxdb

Influxdb is a time series database. It takes some getting used to. Here is my cheat sheet which I like to refer to.

  • bucket is a named location, with a retention policy, where time-series data is stored.
  • series is a logical grouping of data defined by a shared measurement, tag set and field key.
  • measurement is similar to an SQL database table.
  • tag is similar to an indexed column in an SQL database.
  • field is similar to an unindexed column in an SQL database.
  • point is similar to an SQL row.

This is not going to make a lot of sense to anyone who isn’t Dr John. But I’m sure I’ll be referring to this section for a How I did this reminder.

OK. So I wrote a feed_influxdb.py script which runs every 12 minutes in an Azure DevOps pipeline. It extracts the relevant vedge data from Zabbix using the Zabbix API and puts it into my influxdb measurement vedge, whose definition I have shown above. I would say the code is fairly generic, except that it relies on the existence of a master file which contains all the relevant static data about the vedges, such as their interface names, Zabbix itemids, and their maximum bandwidth (we called it zabbixSpeed). You could pretty much deduce the format of this master file by reverse-engineering this script.
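
To save you some of that reverse-engineering, here is roughly what a single entry looks like. It’s a hypothetical example with made-up values, pieced together only from how the script and aux_modules read the file:

# hypothetical single entry of the master (inventory) file, values invented
inventory_json = {
    'NAUSNEWTO0057': {                       # top-level key: the site (SSID)
        'region': 'NA',
        'siteCategory': 'Gold',
        'gmtOffset': -5,                     # optional; a per-region default is used if absent
        'hostname': {
            'NAUSNEWTO0057_vEdge1': [        # one list of item dicts per vedge
                {'itemname': 'ge0/1.4000 interface ingress traffic',
                 'lineType': 'MPLS',         # or Internet; anything else is skipped
                 'itemid': '683415',
                 'zabbixSpeed': '20000000'}  # interface capacity in bits/sec, stored as a string
            ]
        }
    }
}

So anyway, here is feed_influxdb.py.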


from pyzabbix import ZabbixAPI
import requests, json, sys, os, re
import time,datetime
from time import sleep
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
from modules import aux_modules,influx_modules

# we need to get data out of Zabbix
inventory_file = 'prod.config.visibility_dashboard_reporting.json'
#inventory_file = 'inv-w-bw.json' # this is a modified version of the above and includes Zabbix bandwidth for most ports
# Login Zabbix API - use hidden variable to this pipeline
token_zabbix = os.environ['ZABBIX_AUTH_TOKEN']
url_zabbix = 'https://zabbix.drjohns.com/'
zapi = ZabbixAPI(url_zabbix)
zapi.login(api_token=token_zabbix)
# Load inventory file
with open(inventory_file) as inventory_file:
    inventory_json = json.load(inventory_file)
# Time range which want to get data (unixtimestamp)
inventory_s = json.dumps(inventory_json)
inventory_d = json.loads(inventory_s)
time_till = int(time.mktime(datetime.datetime.now().timetuple()))
time_from = int(time_till - 780)  # about 12 minutes plus an extra minute to reflect start delay, etc
i=0
max_items = 200
item_l = []
itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname = {},{},{},{}
gmtOffset_d = {}
for SSID in inventory_d:
    print('SSID',SSID)
    hostname_d = inventory_d[SSID]['hostname']
    gmtOffset = aux_modules.gmtOffset_calc(inventory_d[SSID])
    gmtOffset_d[SSID] = gmtOffset
    for vedge_s in hostname_d:
        print('vedge_s',vedge_s,flush=True)
        items_l = hostname_d[vedge_s]
        for item_d in items_l:
            print('item_d',item_d,flush=True)
            itemname = item_d['itemname']
            if not 'lineType' in item_d: continue # probably SNMP availability or something of no interest to us
            lineType = item_d['lineType']
            if 'zabbixSpeed' in item_d:
                bandwidth = int(item_d['zabbixSpeed'])
            else:
                bandwidth = 0
            itemid = item_d['itemid']
            if lineType == 'MPLS' or lineType == 'Internet':
                i = i + 1
                itemid_to_vedge[itemid] = vedge_s # we need this b.c. Zabbix only returns itemid
                itemid_to_ltype[itemid] = lineType # This info is nice to see
                itemid_to_bw[itemid] = bandwidth # So we can get percentage used
                itemid_to_itemname[itemid] = itemname # human-readable item name, used to build the item tag
                item_l.append(itemid)
                if i > max_items:
                    print('item_l',item_l,flush=True)
                    params = {'itemids':item_l,'time_from':time_from,'time_till':time_till,'history':0,'limit':500000}
                    print('params',params)
                    res_d = zapi.do_request('history.get',params)
                    #print('res_d',res_d)
                    #exit()
                    print('After call to zapi.do_request')
                    result_l = res_d['result']
                    Pts = aux_modules.zabbix_to_pts(result_l,itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname)
                    for Pt in Pts:
                        print('Pt',Pt,flush=True)
                        # DEBUGGING!!! Normally call to data_entry is outside of this loop!!
                        #influx_modules.data_entry([Pt])
                    influx_modules.data_entry(Pts,gmtOffset_d)
                    item_l = [] # empty out item list
                    i = 0
                    sleep(0.2)
else:
# we have to deal with leftovers which did not fill the max_items
    if i > 0:
        print('Remainder section')
        print('item_l',item_l,flush=True)
        params = {'itemids':item_l,'time_from':time_from,'time_till':time_till,'history':0,'limit':500000}
        res_d = zapi.do_request('history.get',params)
        print('After call to zapi.do_request')
        result_l = res_d['result']
        Pts = aux_modules.zabbix_to_pts(result_l,itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname)
        for Pt in Pts:
            # DEBUGGING!!! normally data_entry is called after this loop
            print('Pt',Pt,flush=True)
            #influx_modules.data_entry([Pt])
        influx_modules.data_entry(Pts,gmtOffset_d)
print('All done feeding influxdb!')

I’m not saying it’s great code. I’m only saying that it gets the job done. I made it more generic in April 2023 so that far fewer lines of code have hard-coded values, which even I recognized as ugly and limiting. I now feed in the dict structure, which is pretty cool. It relies on a couple of auxiliary scripts. Here is aux_modules.py (it may include some packages I need later on).


import re
import time as tm
import numpy as np

def zabbix_to_pts(result_l,itemid_to_vedge,itemid_to_ltype,itemid_to_bw,itemid_to_itemname):

# turn Zabbix results into a list of points which can be fed into influxdb
# [{'itemid': '682837', 'clock': '1671036337', 'value': '8.298851463718859E+005', 'ns': '199631779'},

    Pts = []
    for datapt_d in result_l:
        itemid = datapt_d['itemid']
        time = datapt_d['clock']
        value_s = datapt_d['value']
        value = float(value_s) # we are getting a floating point represented as a string. Convert back to float
        hostname = itemid_to_vedge[itemid]
        ltype = itemid_to_ltype[itemid]
        itemname = itemid_to_itemname[itemid]
# item is a hybrid tag, like a primary tag key
        iface_dir = re.sub(r'(\S+) interface (\S+) .+',r'\1_\2',itemname)
        item = hostname + '_' + ltype + '_' + iface_dir
        if itemid in itemid_to_bw:
            bw_s = itemid_to_bw[itemid]
            bw = int(bw_s)
            if bw == 0:
                percent = 0
            else:
                percent = int(100*value/bw)
        else:
            percent = 0
        #tags = [{'tag':'hostname','value':hostname},{'tag':'itemid','value':itemid},{'tag':'ltype','value':ltype},{'tag':'item','value':item}]
        tags = {'hostname':hostname,'itemid':itemid,'ltype':ltype,'item':item}
        fields = {'value':value,'percent':percent}
        Pt = {'measurement':'vedge','tags':tags,'fields':fields,'time':time}
        Pts.append(Pt)
    return Pts
def itembasedd(json_data,Region):
# json_data is the master json file the vedge inventory
    itemBasedD = {}

    offsetdflt = {'AP':8,'NA':-5,'EU':1,'SA':-3}

    for SSID_k in json_data:
        SSID_d = json_data[SSID_k]
        print('SSID_k',SSID_k)
        region = SSID_d['region']
        if not region == Region: continue # just look at region of interest
        siteCategory = SSID_d['siteCategory']
        if 'gmtOffset' in SSID_d:
            time_off = SSID_d['gmtOffset']
        else:
            time_off = offsetdflt[region]
        for vedge_k in SSID_d['hostname']:
            vedge_l = SSID_d['hostname'][vedge_k]
            #print('vedge_d type',vedge_d.__class__)
            #print('vedge_d',vedge_d)
            for this_item_d in vedge_l:
                    print('this_item_d',this_item_d)
                    if not 'lineType' in this_item_d: continue
                    ltype = this_item_d['lineType']
                    if not (ltype == 'MPLS' or ltype == 'Internet'): continue
                    itemname = this_item_d['itemname']
                    if not re.search('gress ',itemname): continue
                    itemid =  this_item_d['itemid']
                    if not 'zabbixSpeed' in this_item_d: continue # some dicts may be historic
                    zabbixSpeed = int(this_item_d['zabbixSpeed']) # zabbixSpeed is stored as a string
                    iface = re.sub(r' interface .+','',itemname)
                    direction = re.sub(r'.+ interface (\S+) traffic',r'\1',itemname)
                    item = vedge_k + '_' + ltype + '_' + iface + '_' + direction
# we may need additional things in this dict
                    itemBasedD[itemid] = {"item":item, "Time_Offset":time_off,"region":region,"speed":zabbixSpeed,'category':siteCategory}
                    print('itemid,itemBasedD',itemid,itemBasedD[itemid])
# let's have a look
#for itemid,items in itemBasedD.items():
#for itemid,items in itemBasedD.items():
#    print("item, dict",itemid,items)

    return itemBasedD

def getitemlist(region,itemBasedD,max_items):
# return list of itemids we will need for this region
    iteml1,iteml2 = [],[]
    for itemid,items in itemBasedD.items():
        if itemid == '0000': continue
        iregion = items['region']
        if iregion == region:
            if len(iteml1) == max_items:
                iteml2.append(itemid)
            else:
                iteml1.append(itemid)

    return iteml1,iteml2

def get_range_data(alldata,itemD):
    data_range = []
#
    for datal in alldata:
        #print("datal",datal)
# check all these keys...
        itemid = datal["itemid"]
        timei = datal["clock"]
        timei = int(timei)
# timei is CET. Subtract 3600 s to arrive at time in UTC.
        timei = timei - 3600
# hour of day, UTC TZ
        H = int(tm.strftime("%H",tm.gmtime(timei)))
# transform H based on gmt offset of this vedge
        H = H + itemD[itemid]["Time_Offset"]
        H = H % 24
# Now check if this hour is within the business-hours range, local time
        #if H < 7 or H > 18:
        if H < 8 or H > 17: # change to 8 AM to 6 PM range 22/03/08
        #print("H out of range",H)
            continue
        data_range.append(datal)

    return data_range

def massage_data(alldata,item_based_d):
# itemvals - a dict indexed by itemid
    itemvals = {}
    #print("alldata type",alldata.__class__)
    for datal in alldata:
# datal is a dict
        #print("datal type",datal.__class__)
        #print("datal",datal)
        val = datal["value"]
        valf = float(val)
        itemid = datal["itemid"]
        if not itemid in itemvals:
            itemvals[itemid] = []
        itemvals[itemid].append(valf)

    return itemvals

def domath(itemvals,item_based_d):
    for itemid,valarray in itemvals.items():
        #print("itemid,valarray",itemid,valarray)
        avg = np.average(valarray)
        n95 = np.percentile(valarray,95)
        max = np.amax(valarray)
        speed = item_based_d[itemid]["speed"]
        if speed > 0:
            avg_percent = 100*avg/speed
            n95_percent = 100*n95/speed
            max_percent = 100*max/speed
        else:
            avg_percent = 0.0
            n95_percent = 0.0
            max_percent = 0.0

        avgm = round(avg/1000000.,1) # convert to megabits
        n95m = round(n95/1000000.,1)
        maxm = round(max/1000000.,1)
        item_based_d[itemid]["avg"] = avgm
        item_based_d[itemid]["n95"] = n95m
        item_based_d[itemid]["max"] = maxm
        item_based_d[itemid]["avg_percent"] = round(avg_percent,1)
        item_based_d[itemid]["n95_percent"] = round(n95_percent,1)
        item_based_d[itemid]["max_percent"] = round(max_percent,1)
        item_based_d[itemid]["speedm"] = round(speed/1000000.,1)

    #print("item_based_d",item_based_d)

def pri_results(item_based_d):
    print('item_based_d',item_based_d)

def stats_to_pts(item_based_d):

# turn item-based dict results into a list of points which can be fed into influxdb
#{'683415': {'item': 'NAUSNEWTO0057_vEdge1_MPLS_ge0/1.4000_ingress', 'region': 'NA', 'category': 'Hybrid Silver+', 'avg': 4.4, 'n95': 16.3, 'max': 19.5, 'avg_percent': 22.0, 'n95_percent': 81.6, 'max_percent': 97.3,

    Pts = []
    time = int(tm.time()) # kind of a fake time. I don't think it matters
    for itemid,itemid_d in item_based_d.items():
        category = itemid_d['category']
        item = itemid_d['item']
        region = itemid_d['region']
        t_off = itemid_d['Time_Offset']
        speed = float(itemid_d['speed']) # speed needs to be a float
        if 'avg' in itemid_d and 'n95' in itemid_d:
            avg = itemid_d['avg_percent']
            n95 = itemid_d['n95_percent']
            max = itemid_d['max_percent']
        else:
            avg,n95,max = (0.0,0.0,0.0)
        tags = {'item':item,'category':category,'region':region,'GMT_offset':t_off}
        fields = {'avg':avg,'n95':n95,'max':max,'speed':speed}
        Pt = {'measurement':'vedge_stat','tags':tags,'fields':fields,'time':time}
        Pts.append(Pt)
    return Pts
def gmtOffset_calc(SSID_d):
    offsetdflt = {'AP':8,'NA':-5,'EU':1,'SA':-3}
    region = SSID_d['region']
    if 'gmtOffset' in SSID_d and SSID_d['gmtOffset']:
        gmtOffset = SSID_d['gmtOffset']
    else:
        gmtOffset = offsetdflt[region]
    return gmtOffset

Next I’ll show influx_modules.py.


import influxdb_client, os, time
from datetime import datetime, timezone
import pytz
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
import random,re

def data_entry(Pts,gmtOffset_d):
# Set up variables
    bucket = "poc_bucket2" # DrJ test bucket
    org = "poc_org"
    influxdb_cloud_token = os.environ['INFLUX_AUTH_TOKEN']
# PROD setup
    bucket_prod = "UC02" # we are use case 2
    #bucket_prod = "test" # we are use case 2
    org_prod = "DrJohns - Network Visibility"
    influxdb_cloud_token_prod = os.environ['INFLUX_AUTH_TOKEN_PROD']

# Store the URL of your InfluxDB instance
    url_local ="http://10.199.123.233:8086/"
    url_prod ="https://westeurope-1.azure.cloud2.influxdata.com/"

# Initialize client
    client = influxdb_client.InfluxDBClient(url=url_local,token=influxdb_cloud_token,org=org)
    client_prod = influxdb_client.InfluxDBClient(url=url_prod,token=influxdb_cloud_token_prod,org=org_prod)

# Write data
    write_api = client.write_api(write_options=SYNCHRONOUS)
    write_api_prod = client_prod.write_api(write_options=SYNCHRONOUS)

    pts = []
    SSID_seen_flag = {}
    for Pt in Pts:
        item = Pt['tags']['item']
        time = int(Pt['time'])

# look up the gmtOffset. SSID is the key to the gmt dict
        SSID = re.sub(r'_.+','',item) # NAUSNEWTOO0001_vEdge1_MPLS_ge0/1.4084_ingres
        gmtOffset = gmtOffset_d[SSID] # units are hours, and can include fractions
        gmtOffset_s = int(3600 * gmtOffset)
        time_local = time + gmtOffset_s
# convert seconds since epoch into format required by influxdb. pt_time stays utc, not local!
        pt_time = datetime.fromtimestamp(time, timezone.utc).isoformat('T', 'milliseconds')
# pull out the hour in the vedge's local time zone (this is what ends up in the UTChour tag)
        ts = datetime.fromtimestamp(time_local).astimezone(pytz.UTC)
        Hlocal = ts.strftime('%H')
        if len(Hlocal) == 1: Hlocal = '0' + Hlocal # pad single digits with a leading 0 so sort behaves as expected
# extend dict with tag for UTChour
        Pt['tags']['UTChour'] = Hlocal
# overwrite time here
        Pt['time'] = pt_time
        if not SSID in SSID_seen_flag:
            #print('item,Hlocal,gmtOffset,gmtOffset_s,time,time_local',item,Hlocal,gmtOffset,gmtOffset_s,time,time_local) # first iteration print
            print('item,Pt',item,Pt)
            SSID_seen_flag[SSID] = True
        ##point = Point(measurement).tag("hostname",hostname).tag("itemid",itemid).tag("ltype",ltype).tag("item",item).tag("UTChour",Hlocal).field('value',value).field('percent',percent).time(pt_time)
# based on https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/write/point.py
        point = Point.from_dict(Pt)

        pts.append(point)
# write to POC and PROD buckets for now
    print('Writing pts to old and new Influx locations')
    write_api.write(bucket=bucket, org="poc_org", record=pts, write_precision=WritePrecision.S)
    write_api_prod.write(bucket=bucket_prod, org=org_prod, record=pts, write_precision=WritePrecision.S)

def data_entry_stats(Pts):
# Set up variables
    bucket = "poc_bucket2" # DrJ test bucket
    org = "poc_org"
    influxdb_cloud_token = os.environ['INFLUX_AUTH_TOKEN']

# Store the URL of your InfluxDB instance
    url_local ="http://10.199.123.233:8086/"
    url_prod ="https://westeurope-1.azure.cloud2.influxdata.com/"

# PROD setup
    bucket_prod = "UC02" # we are use case 2
    org_prod = "DrJohns - Network Visibility"
    influxdb_cloud_token_prod = os.environ['INFLUX_AUTH_TOKEN_PROD']

# Initialize client
    client = influxdb_client.InfluxDBClient(url=url_local,token=influxdb_cloud_token,org=org)
    client_prod = influxdb_client.InfluxDBClient(url=url_prod,token=influxdb_cloud_token_prod,org=org_prod)

# Write data
    write_api = client.write_api(write_options=SYNCHRONOUS)
    write_api_prod = client_prod.write_api(write_options=SYNCHRONOUS)

    pts = []
    for Pt in Pts:
# debug
#        print('avg type',avg.__class__,'item',item,flush=True)

        time = Pt['time']
# convert seconds since epoch into format required by influxdb
        pt_time = datetime.fromtimestamp(int(time), timezone.utc).isoformat('T', 'milliseconds')
# overwrite time here
        Pt['time'] = pt_time
        ##point = Point(measurement).tag("item",item).tag("category",category).tag("region",region).tag("GMT_offset",t_off).field('n95',n95).field('avg',avg).field('max',max).field('speed',speed).time(pt_time)
# see aux_modules stats_to_Pts for our dictionary structure for Pt
        point = Point.from_dict(Pt)
        pts.append(point)
    print('Write to old and new influxdb instances')
    write_api.write(bucket=bucket, org="poc_org", record=pts, write_precision=WritePrecision.S)
    write_api_prod.write(bucket=bucket_prod, org=org_prod, record=pts, write_precision=WritePrecision.S)

These scripts show how I accumulate a bunch of points and write them to influxdb in one batch once I have enough of them, to make things go faster. These days I am updating two influxdb instances: a production one that actually uses InfluxDB Cloud (hence the URL is a generic endpoint which may actually work for you), and a POC one which I run on my private network.
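
By the way, I don’t show the companion script that populates vedge_stats, but it chains the helper functions above roughly like this. It’s a simplified sketch, not the production code, and it assumes zapi, inventory_json, time_from and time_till are set up as in feed_influxdb.py, with the time range covering the statistics window (for instance the last day):

# Simplified sketch of the vedge_stats feed (not the actual production script).
from modules import aux_modules, influx_modules

for Region in ('EU', 'NA', 'SA', 'AP'):
    item_based_d = aux_modules.itembasedd(inventory_json, Region)        # static info per itemid
    iteml1, iteml2 = aux_modules.getitemlist(Region, item_based_d, 200)  # split into batches
    for item_l in (iteml1, iteml2):
        if not item_l: continue
        params = {'itemids': item_l, 'time_from': time_from, 'time_till': time_till, 'history': 0, 'limit': 500000}
        result_l = zapi.do_request('history.get', params)['result']
        result_l = aux_modules.get_range_data(result_l, item_based_d)    # keep business-hours samples only
        itemvals = aux_modules.massage_data(result_l, item_based_d)      # group raw values per itemid
        aux_modules.domath(itemvals, item_based_d)                       # compute avg / n95 / max and percents
    Pts = aux_modules.stats_to_pts(item_based_d)                         # turn the dict into influxdb points
    influx_modules.data_entry_stats(Pts)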

What it looks like

This is the view of multiple vedges which match the selection criteria of high bandwidth usage in region Europe:

Then I figured out how to provide a link to a detailed traffic graph for this selection criteria. Obviously, that mostly involved switching the visualization to Time Series. But I also wanted to show the interface bandwidth on the same graph. That was tricky and involved creating a transform (a config query) which takes speed from the table and turns it into Threshold1, which I draw as a red dashed line. It’s sort of too much detail to go into further in this article. I wanted to make a second config query, but it turns out this is not supported – still.

As for the link, I have a text panel where I use raw HTML. This is the HTML which creates the active link you see displayed:


<br>
<H2>
<a target="details-multiple-ifaces" href=
"/d/8aXikCa4k/multiple-vedges-graph?orgId=1&${Region:queryparam}&${threshold:queryparam}&${math:queryparam}&${math_days:queryparam}">
Detailed traffic graph for matching interfaces</a>
</H2>

So here is what the detailed traffic graph looks like:

I love that red dashed line showing the interface bandwidth capacity!

I almost forgot to mention it: there is a second query, B, which I use as the basis for the dynamic threshold, to pick up the “speed” of the interface. Here it is:

data = from(bucket: "UC02")
  |> range(start: -1d, stop: now())
  |> filter(fn: (r) =>
    r._measurement == "vedge_stat" and
    r._field == "speed" and r.item == "${item}"
  )
  |> drop(columns: ["item","category","region","GMT_offset"])
data
Back to single vedge

At the top of this post I showed the heat map for a single vedge. It includes an active link which leads to a detailed traffic graph. That link in turn is in a Text Panel with HTML text. This is the HTML.


<br>
<H2>
<a target="details-single-iface" href=
"/d/ozKDXiB4k/single-vedge-graph?orgId=1&${Region:queryparam}&${threshold:queryparam}&${math:queryparam}&${math_days:queryparam}&${item:queryparam}">
Detailed traffic graph for ${item}</a>
</H2>

The single vedge detailed graph is a little different from the multiple vedge detailed graph – but not by much. I am getting long-winded so I will omit all the details. Mainly I’ve just blown up the vertical scale and omitted the panel iteration. So here is what you get:

In full disclosure

In all honesty I added another field called speed to the vedge_stats InfluxDB measurement. It’s kind of redundant, but it made things a lot simpler for me. It is that field I use in the config query to set the threshold which I draw with a red dashed line.

I’m not sure I mentioned it, but at some point I re-interpreted the meaning of UTChour to be the local time zone hour! This was also a convenience for me, since there was a desire to display the heat maps in the local timezone. Instead of messing around with shifting hours in the flux query language – which would have taken me days or weeks to figure out – I just did it in my python code, which I (think) I shared above. So much easier…

Complaints

I am not comfortable with the flux query documentation. Nor the Grafana documentation for that matter. They both give you a taste of what you need, without many details or examples. For instance it looks like there are multiple syntaxes available to you using flux. I just pragmatically developed what works.

Conclusion

Yes, this is truly amateur hour for InfluxDB and Grafana. But even amateurs can produce decent results if the tools are adequate. And that’s the case here. The results I am producing are “good enough” for our purposes – internal usage.

I am content if no one reads this and it only serves as my own documentation. But perhaps it will help someone facing similar issues. Unfortunately one of the challenges is asking a good question to a search engine when you’re a newbie and haven’t mastered the concepts.

But without too much effort I was able to master enough Grafana to create something that will probably be useful to our vendor management team. Grafana is fun and powerful. There is a slight lack of examples and the documentation is a wee bit sparse. InfluxDB seems to be a good back-end database for Grafana to use. The flux query language, while still obscure to me, was powerful enough for me to accomplish my basic goals.

References and related

I developed a blurring program in python which I used to present most of these images.
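
For the curious, something along these lines will blur a selected region of a screenshot using the Pillow library. It’s a minimal sketch with made-up file names and coordinates, not necessarily my exact program:

from PIL import Image, ImageFilter

img = Image.open('dashboard.png')                                 # hypothetical screenshot file
box = (0, 100, 300, 400)                                          # left, upper, right, lower of the region to obscure
region = img.crop(box).filter(ImageFilter.GaussianBlur(radius=8)) # blur just that region
img.paste(region, box)                                            # put the blurred region back
img.save('dashboard-blurred.png')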