Python: KeyError: 'style_code' even when style_code is in my json data - json

I might just need new eyes to look at this but im not sure why im yielding this error KeyError: 'style_code'
I know that style code is in my json data and i looked up why this error occurs and it said it was because it cannot find it in the list. Here is my code of the json data...
data1['task'].append({
'profile': profiles_select.get(),
'proxy pool': pool_combo.get(),
'captcha': captcha,
'task_id': num_id,
'style_code': stylecode_entry.get(),
'delay': int(delay_entry.get()),
'size': size,
'splash': splash,
'browser': browser
})
num_id = num_id + 1
with open('tasks_ys.txt', 'w', encoding='utf-8') as outfile:
json.dump(data1, outfile, indent=2)
As you can see 'style_code' is clearly stated in there. Here is my code which is giving me the error.
with open('tasks_ys.txt', 'r') as json_file:
load_data = json.load(json_file)
for task in load_data['task']:
style_c = load_data['style_code']
product_lbl = Label(ys_tasks_frame, text=style_c, bg='#1a2228', fg=fgcolor,
font=("Candara", 12))
product_lbl.place(x=150, y=task_y)
id_lbl = Label(ys_tasks_frame, text=num_t, bg='#1a2228', fg=fgcolor, font=("Candara", 12))
id_lbl.place(x=25, y=task_y)
num_t = num_t + 1
task_y = task_y + 25
Can anyone help me and point out what is wrong about the code?

Related

Pandas DataFrame KeyError: 1

I am a beginner so cant figure a reason for the error in the following code when train.jsonl uses format like that
{"claim": "But he said if people really want to know if they have CHIP they can get a blood test that costs a few MONEYc1", "evidence": "sentenceID100037", "label": "0"}
{"claim": "This is rather a courtly formulation and would doubtless trigger further eyerolling if uttered in", "evidence": "sentenceID100038", "label": "0"}
The top part executes without problem and displays the data.
import pandas as pd
prefix = '/content/'
train_df = pd.read_json(prefix + 'train.jsonl', orient='records', lines=True)
train_df.head()
[See my Colab Notebook][https://colab.research.google.com/gist/lenyabloko/0e17ebe0f3a0e808779bc1fa95e9b24d/semeval2020-delex.ipynb]
I even tried this additional trick which explained comments about 0 column
prefix = '/content/'
train_df = pd.read_json(prefix + 'train_delex.jsonl', orient='columns')
train_df.to_csv(prefix+'train.tsv', sep='\t', index=False, header=False)
train_df = pd.read_csv(prefix + 'train.tsv', header=None)
train_df.head()
Now I see column labeled '0' instead of the original three columns {"claim": "...", "evidence": " ...", "label": "..."} from the above JSONL file (why is that?)
But when I add DataFrame code it results in error
train_df = pd.DataFrame({
'id': train_df[1],
'text': train_df[0],
'labels':train_df[2]
})
In light of the column named "0" this wouldn't work. But where did that column come from??
KeyError Traceback (most recent call last)
2 frames
<ipython-input-16-0537eda6b397> in <module>()
6
7 train_df = pd.DataFrame({
----> 8 'id': train_df[1],
9 'text': train_df[0],
10 'labels':train_df[2]
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2993 if self.columns.nlevels > 1:
2994 return self._getitem_multilevel(key)
-> 2995 indexer = self.columns.get_loc(key)
2996 if is_integer(indexer):
2997 indexer = [indexer]
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
Here is the solution that worked for me:
import pandas as pd
prefix = '/content/'
test_df = pd.read_json(prefix + 'test_delex.jsonl', orient='records', lines=True)
test_df.rename(columns={'claim': 'text', 'evidence': 'id', 'label':'labels'}, inplace=True)
cols = test_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
cols = cols[-1:] + cols[:-1]
test_df = test_df[cols]
test_df.to_csv(prefix+'test.csv', sep=',', index=False, header=False)
test_df.head()
I updated my shared Colab Notebook linked in the question above

Python Json creating dictionary from a text file, printing file issue

I was able to take a text file, read each line, create a dictionary per line, update(append) each line and store the json file. The issue is when reading the json file it will not read correctly. the error point to a storing file issue?
The text file looks like:
84.txt; Frankenstein, or the Modern Prometheus; Mary Wollstonecraft (Godwin) Shelley
98.txt; A Tale of Two Cities; Charles Dickens
...
import json
import re
path = "C:\\...\\data\\"
books = {}
books_json = {}
final_book_json ={}
file = open(path + 'books\\set_of_books.txt', 'r')
json_list = file.readlines()
open(path + 'books\\books_json.json', 'w').close() # used to clean each test
json_create = []
i = 0
for line in json_list:
line = line.replace('#', '')
line = line.replace('.txt','')
line = line.replace('\n','')
line = line.split(';', 4)
BookNumber = line[0]
BookTitle = line[1]
AuthorName = line[-1]
file
if BookNumber == ' 2701':
BookNumber = line[0]
BookTitle1 = line[1]
BookTitle2 = line[2]
AuthorName = line[3]
BookTitle = BookTitle1 + ';' + BookTitle2 # needed to combine title into one to fit dict format
books = json.dumps( {'AuthorName': AuthorName, 'BookNumber': BookNumber, 'BookTitle': BookTitle})
books_json = json.loads(books)
final_book_json.update(books_json)
with open(path + 'books\\books_json.json', 'a'
) as out_put:
json.dump(books_json, out_put)
with open(path + 'books\\books_json.json', 'r'
) as out_put:
'books\\books_json.json', 'r')]
print(json.load(out_put))
The reported error is: JSONDecodeError: Extra data: line 1 column 133
(char 132) - adding this is right between the first "}{". Not sure
how json should look in a flat-file format? The output file as seen on
an editor looks like: {"AuthorName": " Mary Wollstonecraft (Godwin)
Shelley", "BookNumber": " 84", "BookTitle": " Frankenstein, or the
Modern Prometheus"}{"AuthorName": " Charles Dickens", "BookNumber": "
98", "BookTitle": " A Tale of Two Cities"}...
I ended up changing the approach and used pandas to read the text and then spliting the single-cell input.
books = pd.read_csv(path + 'books\\set_of_books.txt', sep='\t', names =('r','t', 'a') )
#print(books.head(10))
# Function to clean the 'raw(r)' inoput data
def clean_line(cell):
...
return cell
books['r'] = books['r'].apply(clean_line)
books = books['r'].str.split(';', expand=True)

Is there a way to take a list of strings and create a JSON file, where both the key and value are list items?

I am creating a python script that can read scanned, and tabular .pdfs and extract some important data and insert it into a JSON to later be implemented into a SQL database (I will also be developing the DB as a project for learning MongoDB).
Basically, my issue is I have never worked with any JSON files before but that was the format I was recommended to output to. The scraping script works, the pre-processing could be a lot cleaner, but for now it works. The issue I run into is the keys, and values are in the same list, and some of the values because they had a decimal point are two different list items. Not really sure where to even start.
I don't really know where to start, I suppose since I know what the indexes of the list are I can easily assign keys and values, but then it may not be applicable to any .pdf, that is the script cannot be coded explicitly.
import PyPDF2 as pdf2
import textract
with "TestSpec.pdf" as filename:
pdfFileObj = open(filename, 'rb')
pdfReader = pdf2.pdfFileReader(pdfFileObj)
num_pages = pdfReader.numpages
count = 0
text = ""
while count < num_pages:
pageObj = pdfReader.getPage(0)
count += 1
text += pageObj.extractText()
if text != "":
text = text
else:
text = textract.process(filename, method="tesseract", language="eng")
def cleanText(x):
'''
This function takes the byte data extracted from scanned PDFs, and cleans it of all
unnessary data.
Requires re
'''
stringedText = str(x)
cleanText = stringedText.replace('\n','')
splitText = re.split(r'\W+', cleanText)
caseingText = [word.lower() for word in splitText]
cleanOne = [word for word in caseingText if word != 'n']
dexStop = cleanOne.index("od260")
dexStart = cleanOne.index("sheet")
clean = cleanOne[dexStart + 1:dexStop]
return clean
cleanText = cleanText(text)
This is the current output
['n21', 'feb', '2019', 'nsequence', 'lacz', 'rp', 'n5', 'gat', 'ctc', 'tac', 'cat', 'ggc', 'gca', 'cat', 'ttc', 'ccc', 'gaa', 'aag', 'tgc', '3', 'norder', 'no', '15775199', 'nref', 'no', '207335463', 'n25', 'nmole', 'dna', 'oligo', '36', 'bases', 'nproperties', 'amount', 'of', 'oligo', 'shipped', 'to', 'ntm', '50mm', 'nacl', '66', '8', 'xc2', 'xb0c', '11', '0', '32', '6', 'david', 'cook', 'ngc', 'content', '52', '8', 'd260', 'mmoles', 'kansas', 'state', 'university', 'biotechno', 'nmolecular', 'weight', '10', '965', '1', 'nnmoles']
and we want the output as a JSON setup like
{"Date | 21feb2019", "Sequence ID: | lacz-rp", "Sequence 5'-3' | gat..."}
and so on. Just not sure how to do that.
here is a screenshot of the data from my sample pdf
So, i have figured out some of this. I am still having issues with grabbing the last 3rd of the data i need without explicitly programming it in. but here is what i have so far. Once i have everything working then i will worry about optimizing it and condensing.
# for PDF reading
import PyPDF2 as pdf2
import textract
# for data preprocessing
import re
from dateutil.parser import parse
# For generating the JSON file array
import json
# This finds and opens the pdf file, reads the data, and extracts the data.
filename = "*.pdf"
pdfFileObj = open(filename, 'rb')
pdfReader = pdf2.PdfFileReader(pdfFileObj)
text = ""
pageObj = pdfReader.getPage(0)
text += pageObj.extractText()
# checks if extracted data is in string form or picture, if picture textract reads data.
# it then closes the pdf file
if text != "":
text = text
else:
text = textract.process(filename, method="tesseract", language="eng")
pdfFileObj.close()
# Converts text to string from byte data for preprocessing
stringedText = str(text)
# Removed escaped lines and replaced them with actual new lines.
formattedText = stringedText.replace('\\n', '\n').lower()
# Slices the long string into a workable piece (only contains useful data)
slice1 = formattedText[(formattedText.index("sheet") + 10): (formattedText.index("secondary") - 2)]
clean = re.sub('\n', " ", slice1)
clean2 = re.sub(' +', ' ', clean)
# Creating the PrimerData dictionary
with open("PrimerData.json",'w') as file:
primerDataSlice = clean[clean.index("molecular"): -1]
primerData = re.split(": |\n", primerDataSlice)
primerKeys = primerData[0::2]
primerValues = primerData[1::2]
primerDict = {"Primer Data": dict(zip(primerKeys,primerValues))}
# Generatring the JSON array "Primer Data"
primerJSON = json.dumps(primerDict, ensure_ascii=False)
file.write(primerJSON)
# Grabbing the date (this has just the date, so json will have to add date.)
date = re.findall('(\d{2}[\/\- ](\d{2}|january|jan|february|feb|march|mar|april|apr|may|may|june|jun|july|jul|august|aug|september|sep|october|oct|november|nov|december|dec)[\/\- ]\d{2,4})', clean2)
Without input data it is difficult to give you working code. A minimal working example with input would help. As for JSON handling, python dictionaries can dump to json easily. See examples here.
https://docs.python-guide.org/scenarios/json/
Get a json string from a dictionary and write to a file. Figure out how to parse the text into a dictionary.
import json
d = {"Date" : "21feb2019", "Sequence ID" : "lacz-rp", "Sequence 5'-3'" : "gat"}
json_data = json.dumps(d)
print(json_data)
# Write that data to a file
So, I did figure this out, the problem was really just that because of the way my pre-processing was pulling all the data into a single list wasn't really that great of an idea considering that the keys for the dictionary never changed.
Here is the semi-finished result for making the Dictionary and JSON file.
# Collect the sequence name
name = clean2[clean2.index("Sequence") + 11: clean2.index("Sequence") + 19]
# Collecting Shipment info
ordered = input("Who placed this order? ")
received = input("Who is receiving this order? ")
dateOrder = re.findall(
r"(\d{2}[/\- ](\d{2}|January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec)[/\- ]\d{2,4})",
clean2)
dateReceived = date.today()
refNo = clean2[clean2.index("ref.No. ") + 8: clean2.index("ref.No.") + 17]
orderNo = clean2[clean2.index("Order No.") +
10: clean2.index("Order No.") + 18]
# Finding and grabbing the sequence data. Storing it and then finding the
# GC content and melting temp or TM
bases = int(clean2[clean2.index("bases") - 3:clean2.index("bases") - 1])
seqList = [line for line in clean2 if re.match(r'^[AGCT]+$', line)]
sequence = "".join(i for i in seqList[:bases])
def gc_content(x):
count = 0
for i in x:
if i == 'G' or i == 'C':
count += 1
else:
count = count
return round((count / bases) * 100, 1)
gc = gc_content(sequence)
tm = mt.Tm_GC(sequence, Na=50)
moleWeight = round(mw(Seq(sequence, generic_dna)), 2)
dilWeight = float(clean2[clean2.index("ug/OD260:") +
10: clean2.index("ug/OD260:") + 14])
dilution = dilWeight * 10
primerDict = {"Primer Data": {
"Sequence": sequence,
"Bases": bases,
"TM (50mM NaCl)": tm,
"% GC content": gc,
"Molecular weight": moleWeight,
"ug/0D260": dilWeight,
"Dilution volume (uL)": dilution
},
"Shipment Info": {
"Ref. No.": refNo,
"Order No.": orderNo,
"Ordered by": ordered,
"Date of Order": dateOrder,
"Received By": received,
"Date Received": str(dateReceived.strftime("%d-%b-%Y"))
}}
# Generating the JSON array "Primer Data"
with open("".join(name) + ".json", 'w') as file:
primerJSON = json.dumps(primerDict, ensure_ascii=False)
file.write(primerJSON)

How to load a json file with strings including double quotes (")

I've been given a load of JSON files which I'm trying to load into python 3.5
I've already had to do some clean up work, removing double backslashes and extra quotations, however I've run into an issue I don't know how to solve.
I'm running the following code:
with open(filepath,'r') as json_file:
reader = json_file.readlines()
for row in reader:
row = row.replace('\\', '')
row = row.replace('"{', '{')
row = row.replace('}"', '}')
response = json.loads(row)
for i in response:
responselist.append(i['ActionName'])
However it's throwing up the error:
JSONDecodeError: Expecting ',' delimiter: line 1 column 388833 (char 388832)
The part of the JSON that's causing the issue is the status text entry below:
"StatusId":8,
"StatusIdString":"UnknownServiceError",
"StatusText":"u003cCompany docTypeu003d"Mobile.Tile" statusIdu003d"421" statusTextu003d"Start time of 11/30/2015 12:15:00 PM is more than 5 minutes in the past relative to the current time of 12/1/2015 12:27:01 AM." copyrightu003d"Copyright Company Inc." versionNumberu003d"7.3" createdDateu003d"2015-12-01T00:27:01Z" responseIdu003d"e74710c0-dc7c-42db-b608-bf905d95d153" /u003e",
"ActionName":"GetTrafficTile"
I added the line breaks to illustrate my point, it looks like python is unhappy that the string contains double quotes.
I have a feeling this may be to do with my replacing '\ \' with '' messing with the unicode characters in the string. Is there any way to repair these nested strings? I don't mind if the StatusText field is deleted completely, all I'm after is a list of the ActionName fields.
EDIT:
I've hosted an example file here:
https://www.dropbox.com/s/1oanrneg3aqandz/2015-12-01T00%253A00%253A42.527Z_2015-12-01T00%253A01%253A17.478Z?dl=0
This is exactly as I received, before I've replaced the extra backslashes and quotations
Here is a pared down version of the sample with one bad entry
["{\"apiServerType\":0,\"RequestId\":\"52a65260-1637-4653-a496-7555a2386340\",\"StatusId\":0,\"StatusIdString\":\"Ok\",\"StatusText\":null,\"ActionName\":\"GetCameraImage\",\"Url\":\"http://mosi-prod.cloudapp.net/api/v1/GetCameraImage?AuthToken=vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew%7C&CameraId=13782\",\"Lat\":0.0,\"Lon\":0.0,\"iVendorId\":12561,\"iConsumerId\":2986897,\"iSliverId\":51846,\"UserId\":\"2986897\",\"HardwareId\":null,\"AuthToken\":\"vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew|\",\"RequestTime\":\"2015-12-01T00:00:42.5278699Z\",\"ResponseTime\":\"2015-12-01T00:01:02.5926127Z\",\"AppId\":null,\"HttpMethod\":\"GET\",\"RequestHeaders\":\"{\\\"Connection\\\":[\\\"keep-alive\\\"],\\\"Via\\\":[\\\"HTTP/1.1 nycnz01msp1ts10.wnsnet.attws.com\\\"],\\\"Accept\\\":[\\\"application/json\\\"],\\\"Accept-Encoding\\\":[\\\"gzip\\\",\\\"deflate\\\"],\\\"Accept-Language\\\":[\\\"en-us\\\"],\\\"Host\\\":[\\\"mosi-prod.cloudapp.net\\\"],\\\"User-Agent\\\":[\\\"Traffic/5.4.0\\\",\\\"CFNetwork/758.1.6\\\",\\\"Darwin/15.0.0\\\"]}\",\"RequestContentHeaders\":\"{}\",\"RequestContentBody\":\"\",\"ResponseBody\":null,\"ResponseContentHeaders\":\"{\\\"Content-Type\\\":[\\\"image/jpeg\\\"]}\",\"ResponseHeaders\":\"{}\",\"MiniProfilerJson\":null}"]
The problem is a little different than you think. Whatever program built these files used data that was already json-encoded and ended up double and even triple encoding some of the information. I peeled it apart in a shell session and got usable python data. You can (1) go dope-slap whoever wrote the program that built this steaming pile of... um... goodness? and (2) manually scan through and decode inner json strings.
I decoded the data and it was a list of strings, but those strings looked suspiciously like json
>>> data = json.load(open('test.json'))
>>> type(data)
<class 'list'>
>>> d0 = data[0]
>>> type(d0)
<class 'str'>
>>> d0[:70]
'{"apiServerType":0,"RequestId":"52a65260-1637-4653-a496-7555a2386340",'
Sure enough, I can decode it
>>> d0_1 = json.loads(d0)
>>> type(d0_1)
<class 'dict'>
>>> d0_1
{'ResponseBody': None, 'StatusText': None, 'AppId': None, 'ResponseTime': '2015-12-01T00:01:02.5926127Z', 'HardwareId': None, 'RequestTime': '2015-12-01T00:00:42.5278699Z', 'StatusId': 0, 'Lon': 0.0, 'Url': 'http://mosi-prod.cloudapp.net/api/v1/GetCameraImage?AuthToken=vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew%7C&CameraId=13782', 'RequestContentBody': '', 'RequestId': '52a65260-1637-4653-a496-7555a2386340', 'MiniProfilerJson': None, 'RequestContentHeaders': '{}', 'ActionName': 'GetCameraImage', 'StatusIdString': 'Ok', 'HttpMethod': 'GET', 'iSliverId': 51846, 'ResponseHeaders': '{}', 'ResponseContentHeaders': '{"Content-Type":["image/jpeg"]}', 'apiServerType': 0, 'AuthToken': 'vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew|', 'iConsumerId': 2986897, 'RequestHeaders': '{"Connection":["keep-alive"],"Via":["HTTP/1.1 nycnz01msp1ts10.wnsnet.attws.com"],"Accept":["application/json"],"Accept-Encoding":["gzip","deflate"],"Accept-Language":["en-us"],"Host":["mosi-prod.cloudapp.net"],"User-Agent":["Traffic/5.4.0","CFNetwork/758.1.6","Darwin/15.0.0"]}', 'iVendorId': 12561, 'Lat': 0.0, 'UserId': '2986897'}
Picking one of the entries, that looks like more json
>>> hdrs = d0_1['RequestHeaders']
>>> type(hdrs)
<class 'str'>
Yep, it decodes to what I want
>>> hdrs_0 = json.loads(hdrs)
>>> type(hdrs_0)
<class 'dict'>
>>>
>>> hdrs_0["Via"]
['HTTP/1.1 nycnz01msp1ts10.wnsnet.attws.com']
>>>
>>> type(hdrs_0["Via"])
<class 'list'>
Here you are :) :
responselist = []
with open('dataFile.json','r') as json_file:
reader = json_file.readlines()
for row in reader:
strActNm = 'ActionName":"'; lenActNm = len(strActNm)
actionAt = row.find(strActNm)
while actionAt > 0:
nxtQuotAt = row.find('"',actionAt+lenActNm+2)
responselist.append( row[actionAt-1: nxtQuotAt+1] )
actionAt = row.find('ActionName":"', nxtQuotAt)
print(responselist)
which gives:
>python3.6 -u "dataFile.py"
['"ActionName":"GetTrafficTile"']
>Exit code: 0
where dataFile.json is the file with the line you provided and dataFile.py the code provided above.
It's the hard tour, but if the files are in a bad format you have to find a way around and a simple pattern matching works in any case. For more complex cases you will need regex (regular expressions), but in this case a simple .find() is enough to do the job.
The code finds also multiple "actions" in the line (if the line would contain more than one action).
Here the result for the file you provided in your link while using following small modification of the code above:
responselist = []
with open('dataFile1.json','r') as json_file:
reader = json_file.readlines()
for row in reader:
strActNm='\\"ActionName\\":\\"'
# strActNm = 'ActionName":"'
lenActNm = len(strActNm)
actionAt = row.find(strActNm)
while actionAt > 0:
nxtQuotAt = row.find('"',actionAt+lenActNm+2)
responselist.append( row[actionAt: nxtQuotAt+1].replace('\\','') )
actionAt = row.find('ActionName":"', nxtQuotAt)
print(responselist)
gives:
>python3.6 -u "dataFile.py"
['"ActionName":"GetCameraImage"']
>Exit code: 0
where dataFile1.json is the file you provided in the link.

Failed processing pyformat-parameters; 'MySQLConverter' object has no attribute '_list_to_mysql'

I have a python script which everytime a wait_for_page call is made it writes the time it took to wait for the page to a database. The query is below:
conn = mysql.connector.connect(**config)
connect = conn.cursor()
params = {'build': self.tc.tag, 'page': unicode(self), 'object_id': self.object_id, 'page_header':
self.page_header, 'interval': t.interval, 'timestamp': timestamp}
query = u'INSERT INTO page_load_times (build, page, object_id, page_header, elapsed_time, date_run) ' \
'VALUES (%(build)s, %(page)s, %(object_id)s, %(page_header)s, %(interval)s, %(timestamp)s)'
connect.execute(query, params)
conn.commit()
conn.close()
Occasionally, when this runs, I get an error which says:
"Failed processing pyformat-parameters; %s" % err)
ProgrammingError: Failed processing pyformat-parameters; 'MySQLConverter'
object has no attribute '_list_to_mysql'
I know what is causing this, just uncertain how to go about fixing it. The 'page': unicode(self) param occasionally gets a list as a result.
In an attempt to fix this, I tweaked the above script to extract the list into a string, with the following:
page_list = u'{}'.format(self)
page_results = "('%s')" % "','".join(page_list)
params = {'build': self.tc.tag, 'page': page_results, 'object_id': self.object_id, 'page_header':
self.page_header, 'interval': t.interval, 'timestamp': timestamp}
When I run this, the error I am getting now is that the data is too long for the field. I debug it, to find that my page results has each character parsed out individually looking like so:
u'(\\'A\\',\\'p\\',\\'p\\',\\'M\\',\\'a\\',\\'i\\',\\'n\\',\\'M\\',\\'e\\',\\'n\\',\\'u\\',\\':\\',\\' \\',\\'N\\',\\'o\\',\\'n\\',\\'e\\')'
So the solution was to do the following, which takes the page_header and if it is in the instance of list to make that list a string:
conn = mysql.connector.connect(**config)
connect = conn.cursor()
page_list = u'{}'.format(self)
page_header_list = u'{}'.format(self.page_header)
if isinstance(page_header_list, list):
page_header_list = ', '.join(page_header_list)[0:100]
params = {'build': self.tc.tag, 'page': page_list, 'object_id': self.object_id,
'page_header': page_header_list, 'interval': t.interval, 'timestamp': timestamp}
query = u'INSERT INTO page_load_times (build, page, object_id, page_header, elapsed_time, date_run) ' \
'VALUES (%(build)s, %(page)s, %(object_id)s, %(page_header)s, %(interval)s, %(timestamp)s)'
connect.execute(query, params)
conn.commit()
conn.close()
Thank you #DarthOpto, you gave me lots of light. I solved it by putting srt around the variable:
params = {'build': self.tc.tag, 'page': srt(page_list),'object_id': self.object_id,
'page_header': srt(page_header_list),'interval': t.interval,'timestamp': timestamp}