Print a python output to an html file - html

I am writing a script to print the output to an html file. I am stuck on the format of my output. Below is my code:
def printTohtml(Alist):
    myfile = open('zip_files.html', 'w')
    html = """<html>
<head></head>
<body><p></p>{htmlText}</body>
</html>"""
    title = "Study - User - zip file - Last date modified"
    myfile.write(html.format(htmlText = title))
    for newL in Alist:
        for j in newL:
            if j == newL[-1]:
                myfile.write(html.format(htmlText=j))
            else:
                message = j + ', '
                myfile.write(html.format(htmlText = message))
    myfile.close()
Alist = [['123', 'user1', 'New Compressed (zipped) Folder.zip', '05-24-17'],
         ['123', 'user2', 'Iam.zip', '05-19-17'], ['abcd', 'Letsee.zip', '05-22-17'],
         ['Here', 'whichTwo.zip', '06-01-17']]
printTohtml(Alist)
I want my output to be like this:
Study - User - zip file - Last date modified
123, user1, New Compressed (zipped) Folder.zip, 05-24-17
123, user2, Iam.zip, 05-19-17
abcd, Letsee.zip, 05-22-17
Here, whichTwo.zip, 06-01-17
But my code is giving me everything on its own line. Can anyone please help me?
Thanks in advance for your help!
My Output:
Study - User - zip file - Last date modified
123,
user1,
New Compressed (zipped) Folder.zip,
05-24-17
123,
user2,
Iam.zip,
05-19-17
abcd,
Letsee.zip,
05-22-17
Here,
whichTwo.zip,
06-01-17

Try this instead:
for newL in Alist:
    message = ''
    for j in newL:
        if message:
            message = message + ', ' + j
        else:
            message = j
    myfile.write(html.format(htmlText=message))
Build up each row in 'message' while you loop over newL. Once a newL has been read completely, write 'message' to myfile in a single call.

With each iteration, your loop writes a complete new document (<html><head></head><body><p></p>...</body></html>), so every item ends up in its own block.
Also, if you want to remove the space between paragraphs, use <style>p { margin: 0 !important; }</style>.
You could try this function which gives the HTML output you wanted.
def printTohtml(Alist, htmlfile):
    html = "<html>\n<head></head>\n<style>p { margin: 0 !important; }</style>\n<body>\n"
    title = "Study - User - zip file - Last date modified"
    html += '\n<p>' + title + '</p>\n'
    for line in Alist:
        para = '<p>' + ', '.join(line) + '</p>\n'
        html += para
    with open(htmlfile, 'w') as f:
        f.write(html + "\n</body>\n</html>")
Alist = [['123', 'user1', 'New Compressed (zipped) Folder.zip', '05-24-17'],
         ['123', 'user2', 'Iam.zip', '05-19-17'], ['abcd', 'Letsee.zip', '05-22-17'],
         ['Here', 'whichTwo.zip', '06-01-17']]
printTohtml(Alist, 'zip_files.html')
The markup written to zip_files.html is:
<html>
<head></head>
<style>p { margin: 0 !important; }</style>
<body>
<p>Study - User - zip file - Last date modified</p>
<p>123, user1, New Compressed (zipped) Folder.zip, 05-24-17</p>
<p>123, user2, Iam.zip, 05-19-17</p>
<p>abcd, Letsee.zip, 05-22-17</p>
<p>Here, whichTwo.zip, 06-01-17</p>
</body>
</html>
The page displays the following output:
Study - User - zip file - Last date modified
123, user1, New Compressed (zipped) Folder.zip, 05-24-17
123, user2, Iam.zip, 05-19-17
abcd, Letsee.zip, 05-22-17
Here, whichTwo.zip, 06-01-17
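One thing the join-based approach does not cover: a file name could contain characters such as & or <, which a browser would treat as markup. Below is a minimal sketch of the same idea that passes each line through html.escape from the standard library first. The function name build_page and the sample file name are made up for illustration, and the function returns the page as a string rather than writing a file.

```python
import html

def build_page(title, rows):
    """Return an HTML page with the title and one comma-separated line per row."""
    lines = [html.escape(title)]
    # Join each row first, then escape it so special characters show up as text
    lines += [html.escape(", ".join(row)) for row in rows]
    body = "\n".join("<p>{}</p>".format(line) for line in lines)
    return (
        "<html>\n<head><style>p { margin: 0; }</style></head>\n"
        "<body>\n" + body + "\n</body>\n</html>"
    )

# Hypothetical sample data; the first file name contains & and < on purpose
rows = [['123', 'user1', 'Q&A <draft>.zip', '05-24-17'],
        ['abcd', 'Letsee.zip', '05-22-17']]
page = build_page("Study - User - zip file - Last date modified", rows)
```

Writing the result is then a single call, for example f.write(page) inside a with open(...) block.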

Related

Formatting json text in discord.py bot

@client.command()
async def show(ctx, player, *args): # General stats
    rs = requests.get(apiLink + "/checkban?name=" + str(player))
    if rs.status_code == 200: # HTTP OK
        rs = rs.json()
        joined_array = ','.join({str(rs["otherNames"]['usedNames'])})
        embed = discord.Embed(title="Other users for" + str(player),
                              description="""User is known as:
""" +joined_array)
        await ctx.send(embed=embed)
My goal here is to have every username on different lines after each comma, and preferably without the [] at the start and end. I have tried adding
joined_array = ','.join({str(rs["otherNames"]['usedNames'])}) but the response from the bot is the same as shown in the image.
Any answer or tip/suggestion is appreciated!
Try this:
array = ['user1', 'user2', 'user3', 'user4', 'user5', 'user6'] #your list
new = ",\n".join(array)
print(new)
Output:
user1,
user2,
user3,
user4,
user5,
user6
In your case I think array should be replaced with rs["otherNames"]['usedNames']
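To see why the brackets show up in the first place, here is a small sketch. It assumes that rs["otherNames"]["usedNames"] is a plain list of strings, which the question does not show explicitly; used_names below is a stand-in for it.

```python
# Stand-in for rs["otherNames"]["usedNames"]; assumed to be a list of strings
used_names = ["user1", "user2", "user3"]

# str() turns the whole list into one string, so join() only has a single
# item to work with and returns it unchanged, brackets included.
as_in_question = ','.join({str(used_names)})

# Joining the list itself puts each name on its own line.
one_per_line = ",\n".join(used_names)

print(as_in_question)
print(one_per_line)
```

The second form is what would go into the embed description.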

Python Json creating dictionary from a text file, printing file issue

I was able to take a text file, read each line, create a dictionary per line, append each one to the output and store the JSON file. The issue is that the JSON file cannot be read back correctly. Does the error point to a problem with how the file is stored?
The text file looks like:
84.txt; Frankenstein, or the Modern Prometheus; Mary Wollstonecraft (Godwin) Shelley
98.txt; A Tale of Two Cities; Charles Dickens
...
import json
import re
path = "C:\\...\\data\\"
books = {}
books_json = {}
final_book_json ={}
file = open(path + 'books\\set_of_books.txt', 'r')
json_list = file.readlines()
open(path + 'books\\books_json.json', 'w').close() # used to clean each test
json_create = []
i = 0
for line in json_list:
    line = line.replace('#', '')
    line = line.replace('.txt','')
    line = line.replace('\n','')
    line = line.split(';', 4)
    BookNumber = line[0]
    BookTitle = line[1]
    AuthorName = line[-1]
    if BookNumber == ' 2701':
        BookNumber = line[0]
        BookTitle1 = line[1]
        BookTitle2 = line[2]
        AuthorName = line[3]
        BookTitle = BookTitle1 + ';' + BookTitle2 # needed to combine title into one to fit dict format
    books = json.dumps( {'AuthorName': AuthorName, 'BookNumber': BookNumber, 'BookTitle': BookTitle})
    books_json = json.loads(books)
    final_book_json.update(books_json)
    with open(path + 'books\\books_json.json', 'a') as out_put:
        json.dump(books_json, out_put)
with open(path + 'books\\books_json.json', 'r') as out_put:
    print(json.load(out_put))
The reported error is: JSONDecodeError: Extra data: line 1 column 133 (char 132). That position is right between the first "}{". I'm not sure how JSON should look in a flat-file format. The output file as seen in an editor looks like:
{"AuthorName": " Mary Wollstonecraft (Godwin) Shelley", "BookNumber": " 84", "BookTitle": " Frankenstein, or the Modern Prometheus"}{"AuthorName": " Charles Dickens", "BookNumber": " 98", "BookTitle": " A Tale of Two Cities"}...
I ended up changing the approach and used pandas to read the text, then split the single-cell input.
import pandas as pd
books = pd.read_csv(path + 'books\\set_of_books.txt', sep='\t', names =('r','t', 'a') )
#print(books.head(10))
# Function to clean the raw ('r') input data
def clean_line(cell):
    ...
    return cell
books['r'] = books['r'].apply(clean_line)
books = books['r'].str.split(';', expand=True)
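For reference, the "Extra data" error arises because json.load expects exactly one JSON document per file, while appending one object per line produces several back to back (the "}{" in the output). A minimal sketch of one way around it without pandas: collect the records in a list and serialize the list once. The helper name parse_book_line is made up for illustration, and it assumes every line has the form "number.txt; title; author", where only the title may itself contain a semicolon.

```python
import json

def parse_book_line(line):
    """Split 'number.txt; title; author' into a dict; the title may contain ';'."""
    parts = [part.strip() for part in line.replace('.txt', '').split(';')]
    return {
        'BookNumber': parts[0],
        # Everything between the first and last field belongs to the title
        'BookTitle': '; '.join(parts[1:-1]),
        'AuthorName': parts[-1],
    }

sample_lines = [
    "84.txt; Frankenstein, or the Modern Prometheus; Mary Wollstonecraft (Godwin) Shelley",
    "98.txt; A Tale of Two Cities; Charles Dickens",
]
books = [parse_book_line(line) for line in sample_lines]

# One dumps call for the whole list yields a single valid JSON document
json_text = json.dumps(books, indent=2)

# Reading it back now works, because there is only one top-level value
round_trip = json.loads(json_text)
```

When writing to disk, the equivalent is one json.dump(books, f) call on a file opened with 'w', instead of one append per line.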

Is there a way to take a list of strings and create a JSON file, where both the key and value are list items?

I am creating a python script that can read scanned, and tabular .pdfs and extract some important data and insert it into a JSON to later be implemented into a SQL database (I will also be developing the DB as a project for learning MongoDB).
Basically, my issue is that I have never worked with JSON files before, but that was the format I was recommended to output to. The scraping script works; the pre-processing could be a lot cleaner, but for now it works. The problem I run into is that the keys and values are in the same list, and some of the values are split into two list items because they contained a decimal point.
I don't really know where to start. Since I know the indexes of the list I could assign keys and values by position, but then the script would not apply to any other .pdf; that is, the script cannot be coded explicitly.
import re
import PyPDF2 as pdf2
import textract
filename = "TestSpec.pdf"
pdfFileObj = open(filename, 'rb')
pdfReader = pdf2.PdfFileReader(pdfFileObj)
num_pages = pdfReader.numPages
count = 0
text = ""
while count < num_pages:
    # Read page number `count`, not page 0 on every pass
    pageObj = pdfReader.getPage(count)
    count += 1
    text += pageObj.extractText()
if text != "":
    text = text
else:
    text = textract.process(filename, method="tesseract", language="eng")
def cleanText(x):
    '''
    This function takes the byte data extracted from scanned PDFs, and cleans it of all
    unnecessary data.
    Requires re
    '''
    stringedText = str(x)
    cleanText = stringedText.replace('\n','')
    splitText = re.split(r'\W+', cleanText)
    caseingText = [word.lower() for word in splitText]
    cleanOne = [word for word in caseingText if word != 'n']
    dexStop = cleanOne.index("od260")
    dexStart = cleanOne.index("sheet")
    clean = cleanOne[dexStart + 1:dexStop]
    return clean
cleanText = cleanText(text)
This is the current output
['n21', 'feb', '2019', 'nsequence', 'lacz', 'rp', 'n5', 'gat', 'ctc', 'tac', 'cat', 'ggc', 'gca', 'cat', 'ttc', 'ccc', 'gaa', 'aag', 'tgc', '3', 'norder', 'no', '15775199', 'nref', 'no', '207335463', 'n25', 'nmole', 'dna', 'oligo', '36', 'bases', 'nproperties', 'amount', 'of', 'oligo', 'shipped', 'to', 'ntm', '50mm', 'nacl', '66', '8', 'xc2', 'xb0c', '11', '0', '32', '6', 'david', 'cook', 'ngc', 'content', '52', '8', 'd260', 'mmoles', 'kansas', 'state', 'university', 'biotechno', 'nmolecular', 'weight', '10', '965', '1', 'nnmoles']
and we want the output as a JSON setup like
{"Date | 21feb2019", "Sequence ID: | lacz-rp", "Sequence 5'-3' | gat..."}
and so on. Just not sure how to do that.
here is a screenshot of the data from my sample pdf
So, I have figured out some of this. I am still having issues with grabbing the last third of the data I need without explicitly programming it in, but here is what I have so far. Once I have everything working, I will worry about optimizing and condensing it.
# for PDF reading
import PyPDF2 as pdf2
import textract
# for data preprocessing
import re
from dateutil.parser import parse
# For generating the JSON file array
import json
# This finds and opens the pdf file, reads the data, and extracts the data.
filename = "*.pdf"
pdfFileObj = open(filename, 'rb')
pdfReader = pdf2.PdfFileReader(pdfFileObj)
text = ""
pageObj = pdfReader.getPage(0)
text += pageObj.extractText()
# checks if extracted data is in string form or picture, if picture textract reads data.
# it then closes the pdf file
if text != "":
    text = text
else:
    text = textract.process(filename, method="tesseract", language="eng")
pdfFileObj.close()
# Converts text to string from byte data for preprocessing
stringedText = str(text)
# Removed escaped lines and replaced them with actual new lines.
formattedText = stringedText.replace('\\n', '\n').lower()
# Slices the long string into a workable piece (only contains useful data)
slice1 = formattedText[(formattedText.index("sheet") + 10): (formattedText.index("secondary") - 2)]
clean = re.sub('\n', " ", slice1)
clean2 = re.sub(' +', ' ', clean)
# Creating the PrimerData dictionary
with open("PrimerData.json",'w') as file:
    primerDataSlice = clean[clean.index("molecular"): -1]
    primerData = re.split(": |\n", primerDataSlice)
    primerKeys = primerData[0::2]
    primerValues = primerData[1::2]
    primerDict = {"Primer Data": dict(zip(primerKeys,primerValues))}
    # Generating the JSON array "Primer Data"
    primerJSON = json.dumps(primerDict, ensure_ascii=False)
    file.write(primerJSON)
# Grabbing the date (this has just the date, so json will have to add date.)
date = re.findall(r'(\d{2}[/\- ](\d{2}|january|jan|february|feb|march|mar|april|apr|may|june|jun|july|jul|august|aug|september|sep|october|oct|november|nov|december|dec)[/\- ]\d{2,4})', clean2)
Without input data it is difficult to give you working code. A minimal working example with input would help. As for JSON handling, python dictionaries can dump to json easily. See examples here.
https://docs.python-guide.org/scenarios/json/
Get a json string from a dictionary and write to a file. Figure out how to parse the text into a dictionary.
import json
d = {"Date" : "21feb2019", "Sequence ID" : "lacz-rp", "Sequence 5'-3'" : "gat"}
json_data = json.dumps(d)
print(json_data)
# Write that data to a file
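As a rough illustration of the "parse the text into a dictionary" step: if the cleaned list can be arranged so that keys and values alternate, slicing and zip will pair them up. The token list below is hypothetical (the values are borrowed from the question), and real extracted text will likely need more cleanup before it alternates this neatly.

```python
import json

# Hypothetical, already-cleaned tokens: key, value, key, value, ...
tokens = ["date", "21feb2019", "sequence id", "lacz-rp", "bases", "36"]

# Even positions are keys, odd positions are the matching values
pairs = dict(zip(tokens[0::2], tokens[1::2]))

# Serialize the dictionary; indent only makes the text easier to read
json_text = json.dumps(pairs, indent=2)
print(json_text)
```

The resulting text can be written with f.write(json_text), or the dictionary can be passed straight to json.dump(pairs, f).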
So, I did figure this out. The problem was that my pre-processing pulled all the data into a single list, which wasn't a great idea considering that the keys for the dictionary never change.
Here is the semi-finished result for making the Dictionary and JSON file.
# Collect the sequence name
name = clean2[clean2.index("Sequence") + 11: clean2.index("Sequence") + 19]
# Collecting Shipment info
ordered = input("Who placed this order? ")
received = input("Who is receiving this order? ")
dateOrder = re.findall(
r"(\d{2}[/\- ](\d{2}|January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec)[/\- ]\d{2,4})",
clean2)
dateReceived = date.today()
refNo = clean2[clean2.index("ref.No. ") + 8: clean2.index("ref.No.") + 17]
orderNo = clean2[clean2.index("Order No.") +
10: clean2.index("Order No.") + 18]
# Finding and grabbing the sequence data. Storing it and then finding the
# GC content and melting temp or TM
bases = int(clean2[clean2.index("bases") - 3:clean2.index("bases") - 1])
seqList = [line for line in clean2 if re.match(r'^[AGCT]+$', line)]
sequence = "".join(i for i in seqList[:bases])
def gc_content(x):
    count = 0
    for i in x:
        if i == 'G' or i == 'C':
            count += 1
        else:
            count = count
    return round((count / bases) * 100, 1)
gc = gc_content(sequence)
tm = mt.Tm_GC(sequence, Na=50)
moleWeight = round(mw(Seq(sequence, generic_dna)), 2)
dilWeight = float(clean2[clean2.index("ug/OD260:") +
10: clean2.index("ug/OD260:") + 14])
dilution = dilWeight * 10
primerDict = {"Primer Data": {
                  "Sequence": sequence,
                  "Bases": bases,
                  "TM (50mM NaCl)": tm,
                  "% GC content": gc,
                  "Molecular weight": moleWeight,
                  "ug/OD260": dilWeight,
                  "Dilution volume (uL)": dilution
              },
              "Shipment Info": {
                  "Ref. No.": refNo,
                  "Order No.": orderNo,
                  "Ordered by": ordered,
                  "Date of Order": dateOrder,
                  "Received By": received,
                  "Date Received": str(dateReceived.strftime("%d-%b-%Y"))
              }}
# Generating the JSON array "Primer Data"
with open("".join(name) + ".json", 'w') as file:
    primerJSON = json.dumps(primerDict, ensure_ascii=False)
    file.write(primerJSON)

Printing out python output to an html file

You might want to try something like this. I haven't tested it, but it builds the whole string first and then writes it to the file once. That might be faster because it avoids multiple writes, though I am not sure how Python handles that in the background. Note that the lines are joined with <br>, because a browser collapses a plain "\n" into a space.
def printTohtml(Alist):
    myfile = open('zip_files.html', 'w')
    html = """<html>
<head></head>
<body><p></p>{htmlText}</body>
</html>"""
    title = "Study - User - zip file - Last date modified"
    lines = [title] + [", ".join(line) for line in Alist]
    myfile.write(html.format(htmlText = "<br>\n".join(lines)))
    myfile.close()
Alist = [['123', 'user1', 'New Compressed (zipped) Folder.zip', '05-24-17'],
         ['123', 'user2', 'Iam.zip', '05-19-17'], ['abcd', 'Letsee.zip', '05-22-17'],
         ['Here', 'whichTwo.zip', '06-01-17']]
printTohtml(Alist)
Your issue is that you're including the html, body and paragraph tags every time you write a line to your file.
Why don't you concatenate your string, separating the lines with <br> tags, and then write it to your file in one go, like so:
def printTohtml(Alist):
    myfile = open('zip_files.html', 'w')
    html = """<html>
<head></head>
<body><p>{htmlText}</p></body>
</html>"""
    complete_string = "Study - User - zip file - Last date modified<br>"
    for newL in Alist:
        for j in newL:
            if j == newL[-1]:
                # Last item of the row: end the line here
                complete_string += j + "<br>"
            else:
                complete_string += j + ', '
    myfile.write(html.format(htmlText = complete_string))
    myfile.close()
Also, your template placeholder is in the wrong spot; it should be between your paragraph tags.

How to convert a dynamic JSON like file to a CSV file

I have a file which looks exactly as below.
{"eventid" : "12345" ,"name":"test1","age":"18"}
{"eventid" : "12346" ,"age":"65"}
{"eventid" : "12336" ,"name":"test3","age":"22","gender":"Male"}
Think of the above file as event.json.
The number of fields may vary per line.
I would like the following CSV output, saved as output.csv:
eventid,name,age,gender
12345,test1,18
12346,,65
12336,test3,22,Male
Could someone kindly help me? I can accept an answer in any scripting language (JavaScript, Python, etc.).
This code will collect all the headers dynamically and write the file to CSV.
Read comments in code for details:
import json
# Load data from file
data = '''{"eventid" : "12345" ,"name":"test1","age":"18"}
{"eventid" : "12346" ,"age":"65"}
{"eventid" : "12336" ,"name":"test3","age":"22","gender":"Male"}'''
# Store records for later use
records = []
# Keep track of headers in a set
headers = set()
for line in data.split("\n"):
    line = line.strip()
    # Parse each line as JSON
    parsedJson = json.loads(line)
    records.append(parsedJson)
    # Make sure all found headers are kept in the headers set
    for header in parsedJson.keys():
        headers.add(header)
# You only know what headers were there once you have read all the JSON once.
# Now we have all the information we need, like what all possible headers are.
outfile = open('output_json_to_csv.csv','w')
# write headers to the file in order
outfile.write(",".join(sorted(headers)) + '\n')
for record in records:
    # write each record based on available fields
    curLine = []
    # For each header in alphabetical order
    for header in sorted(headers):
        # If that record has the field (dict.has_key() no longer exists in Python 3)
        if header in record:
            # Then write that value to the line
            curLine.append(record[header])
        else:
            # Otherwise put an empty value as a placeholder
            curLine.append('')
    # Write the line to file
    outfile.write(",".join(curLine) + '\n')
outfile.close()
Here is a solution using jq.
If filter.jq contains the following filter
(reduce (.[]|keys_unsorted[]) as $k ({};.[$k]="")) as $o # object with all keys
| ($o | keys_unsorted), (.[] | $o * . | [.[]]) # generate header and data
| join(",") # convert to csv
and data.json contains the sample data then
$ jq -Mrs -f filter.jq data.json
produces
eventid,name,age,gender
12345,test1,18,
12346,,65,
12336,test3,22,Male
Here's a Python solution (should work in both Python 2 & 3).
I'm not proud of the code, as there's probably a better way to do this (using the csv module), but this gives you the desired output.
I've taken the liberty of naming your JSON data data.json, and I'm naming the output csv file output.csv.
import json
header = ['eventid', 'name', 'age', 'gender']
with open('data.json', 'r') as infile, \
        open('output.csv', 'w+') as outfile:
    # Writes header row
    outfile.write(','.join(header))
    outfile.write('\n')
    for row in infile:
        line = ['', '', '', ''] # I'm sure there's a better way
        datarow = json.loads(row)
        for key in datarow:
            line[header.index(key)] = datarow[key]
        outfile.write(','.join(line))
        outfile.write('\n')
Hope this helps.
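Since the csv module came up: a minimal sketch of how csv.DictWriter could handle this, collecting the headers in first-seen order rather than hard-coding them. The function name json_lines_to_csv is made up for illustration, and the example writes to an in-memory buffer instead of output.csv so it can be run as-is.

```python
import csv
import io
import json

def json_lines_to_csv(lines, out):
    """Write one CSV row per JSON line to `out`; return the header list."""
    records = [json.loads(line) for line in lines if line.strip()]
    # Collect every key in the order it is first seen
    headers = []
    for record in records:
        for key in record:
            if key not in headers:
                headers.append(key)
    # restval fills in fields a record does not have; DictWriter also
    # quotes values that contain commas
    writer = csv.DictWriter(out, fieldnames=headers, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return headers

sample = [
    '{"eventid" : "12345" ,"name":"test1","age":"18"}',
    '{"eventid" : "12346" ,"age":"65"}',
    '{"eventid" : "12336" ,"name":"test3","age":"22","gender":"Male"}',
]
buffer = io.StringIO()
headers = json_lines_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For real files, pass the open input file as lines and an output file opened with newline='' as out.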
Using AngularJS with the ngCsv plugin, we can generate a CSV file from the desired JSON with dynamic headers.
// Code goes here
var myapp = angular.module('myapp', ["ngSanitize", "ngCsv"]);
myapp.controller('myctrl', function($scope) {
  $scope.filename = "test";
  $scope.getArray = [{
    label: 'Apple',
    value: 2,
    x:1,
  }, {
    label: 'Pear',
    value: 4,
    x:38
  }, {
    label: 'Watermelon',
    value: 4,
    x:38
  }];
  $scope.getHeader = function() {
    var vals = [];
    for( var key in $scope.getArray ) {
      for(var k in $scope.getArray[key]){
        vals.push(k);
      }
      break;
    }
    return vals;
  };
});
<!DOCTYPE html>
<html>
<head>
<link href="https://netdna.bootstrapcdn.com/bootstrap/3.0.0/css/bootstrap.min.css" rel="stylesheet">
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.7/angular.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.4.7/angular-sanitize.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/ng-csv/0.3.6/ng-csv.min.js"></script>
</head>
<body>
<div ng-app="myapp">
<div class="container" ng-controller="myctrl">
<div class="page-header">
<h1>ngCsv <small>example</small></h1>
</div>
<button class="btn btn-default" ng-csv="getArray" csv-header="getHeader()" filename="{{ filename }}.csv" field-separator="," decimal-separator=".">Export to CSV with header</button>
</div>
</div>
</body>
</html>
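// obj is the object whose values should be written as one CSV row
var arr = $.map(obj, function(el) { return el });
// Join the values themselves; for...in would iterate over the array indexes
var content = arr.join(",");
var filePath = "someFile.csv";
// ActiveXObject is only available in Internet Explorer / Windows Script Host
var fso = new ActiveXObject("Scripting.FileSystemObject");
// 8 = open for appending; false = do not create the file if it is missing
var fh = fso.OpenTextFile(filePath, 8, false, 0);
fh.WriteLine(content);
fh.Close();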