Intro
As far as I can tell there’s no way to search through multiple pipeline logs with a single command. In Linux that’s trivial. Since this basic functionality is missing, I decided to copy all my pipeline logs over to a Linux server using the Azure DevOps (ADO) API.
The details
This is the main program, which I’ve called get_raw_logs.py.
#!/usr/bin/python3
# fetch raw log to local machine
# for relevant api section, see:
# https://learn.microsoft.com/en-us/rest/api/azure/devops/build/builds/get-build-log?view=azure-devops-rest-7.1
import urllib.request,json,sys
from datetime import datetime,timedelta
from modules import aux_modules

conf_file = sys.argv[1]

# pipeline uses UTC so we must follow suit or we will miss files
a_day_ago = (datetime.utcnow() - timedelta(days = 1)).strftime('%Y-%m-%dT%H:%M:%SZ')
print('a day ago (UTC)',a_day_ago)
#url = 'https://dev.azure.com/drjohns4ServicesCoreSystems/Connectivity/_apis/build/builds?minTime=2022-10-11T13:00:00Z&api-version=7.1-preview.7'

# dump config file into a dict
config_d = aux_modules.parse_config(conf_file)
url = config_d['url_base'] + config_d['organization'] + '/' + config_d['project'] + '/_apis/build/builds?minTime=' + a_day_ago + config_d['url_params']
#print('url',url)
req = urllib.request.Request(url)
req.add_header('Authorization', 'Basic ' + config_d['auth'])

# Get buildIds for pipeline runs from last 24 hours
with urllib.request.urlopen(req) as response:
    html = response.read()
txt_d = json.loads(html)
# sample response: {"count":215,"value":[{"id":xxx,"buildNumber":"20221011.106","definition":{"name":"PAN-Usage4Mgrs-2"
value_l = txt_d['value']
for builds in value_l:
    buildId = builds['id']
    build_number = builds['buildNumber']
    build_def = builds['definition']
    name = build_def['name']
    #print('name,build_number,id',name,build_number,buildId)
    #print('this_build',builds)
    if name == config_d['pipeline1'] or name == config_d['pipeline2']:
        aux_modules.get_this_log(config_d,name,buildId,build_number)
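You run it with the config file (shown further below) as its only argument, e.g., ./get_raw_logs.py ado.config; that filename is just an illustration.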
In the modules directory, this is aux_modules.py.
import json
import os,urllib.request

def parse_config(conf_file):
    # config file should be a json file
    f = open(conf_file)
    config_d = json.load(f)
    f.close()
    return config_d

def get_this_log(config_d,name,buildId,build_number):
    # leaving out the api-version etc works better
    #GET https://dev.azure.com/{organization}/{project}/_apis/build/builds/{buildId}/logs/{logId}?api-version=7.1-preview.2
    #https://dev.azure.com/drjohns4ServicesCoreSystems/d6335c8e-f5b4-44a5-8f6c-7b17fe663a86/_apis/build/builds/44071/logs/7
    buildId_s = str(buildId)
    log_name = config_d['log_dir'] + "/" + name + "-" + build_number
    # check if we already got this one
    if os.path.exists(log_name):
        return
    #url = url_base + organization + '/' + project + '/_apis/build/builds/' + buildId_s + '/logs/' + logId + '?' + url_params
    url = config_d['url_base'] + config_d['organization'] + '/' + config_d['project'] + '/_apis/build/builds/' + buildId_s + '/logs/' + config_d['logId']
    print('url for this log',url)
    req = urllib.request.Request(url)
    req.add_header('Authorization', 'Basic ' + config_d['auth'])
    with urllib.request.urlopen(req) as response:
        html = response.read()
    #print('log',html)
    print("Getting (name,build_number,buildId,logId) ",name,build_number,buildId_s,config_d['logId'])
    f = open(log_name,"wb")
    f.write(html)
    f.close()
Unlike most programs I write, this one keeps some of the key logic in the config file. My config file looks something like this.
{ "organization":"drjohns4ServicesCoreSystems", "project":"Connectivity", "pipeline1":"PAN-Usage4Mgrs", "pipeline2":"PAN-Usage4Mgrs-2", "logId":"7", "auth":"Yaskaslkasjklaskldslkjsasddenxisv=", "url_base":"https://dev.azure.com/", "url_params":"&api-version=7.1-preview.7", "log_dir":"/var/tmp/rawlogs" }
It runs very efficiently, so I run it every three minutes.
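If cron is how you schedule things, an entry along these lines would do it; the paths are placeholders for wherever the script and config actually live.

# hypothetical crontab entry: fetch any new pipeline logs every three minutes
*/3 * * * * /usr/bin/python3 /path/to/get_raw_logs.py /path/to/ado.config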
In my pipelines, all the interesting stuff is in logId 7, so I’ve hardcoded that; it could have turned out differently. Notice I am getting the logs from two pipelines. That’s due to the limitation, discussed previously, that a pipeline can only run 1000 times a week: I was forced to run two identical pipelines, staggered, every 12 minutes, with pipeline-2 sleeping for the first six minutes.
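If in your case the good stuff lands in some other logId, a short sketch like this (my addition, not part of the scripts above) could enumerate a build’s logs via the logs-list endpoint from the same ADO REST reference; it assumes the config_d dict from get_raw_logs.py.

# sketch: print the available logIds for one build
# GET {url_base}{organization}/{project}/_apis/build/builds/{buildId}/logs
import json,urllib.request

def list_logs(config_d,buildId):
    url = config_d['url_base'] + config_d['organization'] + '/' + config_d['project'] + '/_apis/build/builds/' + str(buildId) + '/logs'
    req = urllib.request.Request(url)
    req.add_header('Authorization', 'Basic ' + config_d['auth'])
    with urllib.request.urlopen(req) as response:
        logs_d = json.loads(response.read())
    for log in logs_d['value']:
        # lineCount can help identify the log with the real output
        print(log['id'],log.get('lineCount'))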
The auth is the base64-encoded text of any:<my_auth_token>; the username portion really can be any string, since ADO only checks the token.
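In case it helps, here is a one-off way to generate that string in Python; the token value is of course a placeholder for your own.

# generate the "auth" value for the config file
# 'any' is a throwaway username; only the token matters to ADO
import base64
my_auth_token = 'xxxxxxxxxxxxxxxxx'   # placeholder for your personal access token
print(base64.b64encode(('any:' + my_auth_token).encode()).decode())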
Conclusion
I have shown how to copy logs from Azure DevOps pipeline runs to a local Unix system, where you can run all the usual cool Linux commands on them.
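For instance, something as simple as grep -i error /var/tmp/rawlogs/* now searches every recent run in one shot (that path being the log_dir from my config).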
References and related
Running an ADO pipeline more than 1000 times a week.
ADO Rest api reference section relevant for this post: https://learn.microsoft.com/en-us/rest/api/azure/devops/build/builds/get-build-log?view=azure-devops-rest-7.1