How to access values in an OrderedDict? - csv

I opened a CSV file passed on argv and read it into a list of dictionaries:
import csv
from sys import argv

data = open(argv[1])
reader = csv.DictReader(data)
dict_list = []
for line in reader:
    dict_list.append(line)
Now, when I print the first row directly:
print(dict_list[0])
all I get is this:
OrderedDict([('name', 'Alice'), ('AGATC', '2'), ('AATG', '8'), ('TATC', '3')])
With this loop:
for x in dict_list[0]:
    print(x)
I get this result:
name
AGATC
AATG
TATC
Can you help me access 'Alice', '2', '8' and '3'?

You can iterate through the dictionary in a couple of ways.
Let's initialize the dictionary with your values:
from collections import OrderedDict

dict_list = OrderedDict([('name', 'Alice'), ('AGATC', '2'), ('AATG', '8'), ('TATC', '3')])
which gets us:
OrderedDict([('name', 'Alice'), ('AGATC', '2'), ('AATG', '8'), ('TATC', '3')])
You can then iterate through each key and look up the value attached:
for k in dict_list:
    print(f"key={k}, value={dict_list[k]}")
and you will get:
key=name, value=Alice
key=AGATC, value=2
key=AATG, value=8
key=TATC, value=3
Or, you can get both the key and the value at the same time:
for k, v in dict_list.items():
    print(f"key={k}, value={v}")
which will get you the same output:
key=name, value=Alice
key=AGATC, value=2
key=AATG, value=8
key=TATC, value=3
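If you only need the values, you can also iterate over .values() (a small sketch using the same dict_list as above):
for v in dict_list.values():
    print(v)
which prints Alice, 2, 8 and 3, one per line.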

I converted each OrderedDict in dict_list to a plain dict, and now I can access the values by key:
for x in dict_list:
    temp = dict(x)
    for y in types_count:  # types_count is assumed to be a list of the key names to look up
        print(temp.get(y))
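For reference, a minimal self-contained sketch of the whole flow (the file name sequences.csv is an assumption):
import csv

with open('sequences.csv') as data:
    dict_list = list(csv.DictReader(data))

row = dict_list[0]             # first data row, a dict-like mapping
print(list(row.values()))      # e.g. ['Alice', '2', '8', '3']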

Related

List of tuples with strings not all arguments converted during string formatting

I am trying to insert hundreds of rows into a MySQL db at once. There are two types of records, unanswered calls and answered calls. I am putting all records into a list of tuples, with each record as its own tuple, so that I can use the executemany function. I am getting TypeError: not all arguments converted during string formatting, and I don't understand why.
answered = []
unanswered = []
insertQuery = """ INSERT INTO cdr (recno, relcause, starttime, answertime, endtime, releasecausetext, releasecausecode, 1streleasedialog,
origtrunk, callingnumber, orighost, callednumber, desthost, origcallid, origremotepayloadip, origremotepayloadport,
origlocalpayloadip, origlocalpayloadport, termtrunk, termsourcenumber, termsourcehost, termdestnumber, termdesthostname,
termcallid, termremotepayloadip, termremotepayloadport, termlocalpayloadip, termlocalpayloadport, duration, postdialdelay,
ringtime, durationms, routetableused, origtidalias, termtidalias, termpddms, reasoncause, mappedcausecode, mappedreasoncause,
reasoncausetext, origmos, termmos) VALUES ('%s'); """
for y in cdrList:
    # Check to make sure the record does not already exist
    sqlQuery = "select * from cdr where recno = %d and origcallid = %s;" % (int(y[0]), y[13])
    if cursor.execute(sqlQuery):
        print("Record exists")
    else:
        if y[7] == 'NA':
            unanswered.append((y[0], y[5], extractSqlDate(y[6]), 'null', extractSqlDate(y[8]), y[10], y[11], y[12], y[13], y[15], y[16], y[17], y[18], y[19], y[20], y[21], y[22], y[23], y[32], y[34], y[35], y[36], y[37], y[38], y[39], y[40], y[41], y[42], y[53], y[54], y[55], y[56], y[60], y[66], y[67], y[71], y[78], y[79], y[80], y[81], y[85], y[88]))
        else:
            answered.append((y[0], y[5], extractSqlDate(y[6]), extractSqlDate(y[7]), extractSqlDate(y[8]), y[10], y[11], y[12], y[13], y[15], y[16], y[17], y[18], y[19], y[20], y[21], y[22], y[23], y[32], y[34], y[35], y[36], y[37], y[38], y[39], y[40], y[41], y[42], y[53], y[54], y[55], y[56], y[60], y[66], y[67], y[71], y[78], y[79], y[80], y[81], y[85], y[88]))
try:
    print(answered)
    cursor.executemany(insertQuery, answered)
    cursor.executemany(insertQuery, unanswered)
    db.commit()
    print("Record inserted successfully")
except MySQLdb.Error as e:
    print(e)
I have confirmed that each element in each tuple in the list is a string:
Successfully connected to database
/PATH/20190610/20190610-0015-1750147245-1750147250.cdr
[('1750147245', '0001', '2019-06-10 00:10:50', '2019-06-10 00:10:59', '2019-06-10 00:11:13', 'Normal BYE', ' 200', 'O', '001102', '+tn', 'ip', '+tn', 'ip', '273418599_83875291#ip', 'ip', '20530', 'ip', '11944', '000020', '+tn', 'ip', 'tn', 'ip', '4121333-0-2851866068#ip', 'ip', '16840', 'ip', '11946', '13', '1', '8', '13450', '50', 'C - Peerless C6933_04 Origin', 'P - Thirdlane 6', '1150', '', '200', '', '', '0', '0')]
I found the problem. The tuples contain strings, so with '%s' the insert query was quoting values twice, producing ''value''. I removed the quotes around the %s and, following #jonrsharpe's comment, added a %s for every other column, and it worked.
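For illustration, a hedged sketch of the shape of the fix (not the poster's verbatim code): generate one bare %s per column and splice that into the full 42-column INSERT from the question; the driver handles the quoting itself.
# One bare %s per column; 42 matches the column list in the question.
placeholders = ", ".join(["%s"] * 42)   # "%s, %s, ..., %s"
insertQuery = "INSERT INTO cdr ({}) VALUES ({});".format(column_list, placeholders)
# column_list here stands in for the comma-separated 42-column list shown above.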

How to convert a list of dataframes to JSON in Python

I want to convert the dataframes below to JSON.
Salary:
Balance before Salary Salary
Date
Jun-18 27.20 15300.0
Jul-18 88.20 15300.0
Aug-18 176.48 14783.0
Sep-18 48.48 16249.0
Oct-18 241.48 14448.0
Nov-18 49.48 15663.0
Balance:
Balance
Date
Jun-18 3580.661538
Jul-18 6817.675556
Aug-18 7753.483077
Sep-18 5413.868421
Oct-18 5996.120000
Nov-18 8276.805000
Dec-18 9269.000000
I tried:
dfs = [Salary, Balance]
dfs.to_json("path/test.json")
but it gives me an error:
AttributeError: 'list' object has no attribute 'to_json'
but when I tried it on a single dataframe, I got the following result:
{"Balance before Salary":{"Jun-18":27.2,"Jul-18":88.2,"Aug-18":176.48,"Sep-18":48.48,"Oct-18":241.48,"Nov-18":49.48},"Salary":{"Jun-18":15300.0,"Jul-18":15300.0,"Aug-18":14783.0,"Sep-18":16249.0,"Oct-18":14448.0,"Nov-18":15663.0}}
You can use the to_json method.
From the docs:
>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
... index=['row 1', 'row 2'],
... columns=['col 1', 'col 2'])
>>> df.to_json(orient='records')
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
Use concat to combine them into one DataFrame (the frames need matching index values for alignment), and then convert that to JSON:
import numpy as np
import pandas as pd

dfs = [Salary, Balance]
df = pd.concat(dfs, axis=1, keys=np.arange(len(dfs)))
df.columns = ['{}{}'.format(b, a) for a, b in df.columns]
df.to_json("path/test.json")
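Since Balance has an extra Dec-18 row that Salary lacks, concat with axis=1 will leave NaNs for the missing rows. An alternative sketch (the top-level key names "salary" and "balance" are my choice) serializes each DataFrame separately:
import json

combined = {
    "salary": json.loads(Salary.to_json()),
    "balance": json.loads(Balance.to_json()),
}
with open("path/test.json", "w") as fh:
    json.dump(combined, fh)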

python csv writing SET values

Python 3.6
I am trying to write set values to a CSV file, and I am getting the following output for the given code.
import csv

class test_write:
    @classmethod
    def test_write1(cls):
        fieldnames1 = ['first_name', 'last_name']
        cls.write_a_test1(fieldnames=fieldnames1)

    @classmethod
    def write_a_test1(cls, fieldnames):
        with open('/Users/Desktop/delete1.csv', 'w') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            abc = cls.var1()
            writer.writerow(abc)
            print("Done writing")

    @staticmethod
    def var1():
        d = ('my', 'name', 'is', 'hahaha')
        c = set()
        abc = {'first_name': c, 'last_name': d}
        return abc

test_write.test_write1()
When I open CSV file:
Output:
first_name last_name
set() ('my', 'name', 'is', 'hahaha')
I don't want it to print set() in the file when the set is empty; I need a blank instead. Variable c may or may not have values, depending on the data. How do I proceed with that?
DictWriter expects the keys and values to be strings, so str is being called on the objects. What you should use is something like:
d = ('my', 'name', 'is', 'hahaha')
c = set()
abc = {'first_name': ' '.join(c), 'last_name': ' '.join(d)}
return abc
The resulting file will be:
first_name,last_name
,my name is hahaha
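Putting the fix together, a minimal standalone sketch (class scaffolding and the absolute path dropped for brevity):
import csv

def var1():
    d = ('my', 'name', 'is', 'hahaha')
    c = set()
    # ' '.join renders an empty set as '' and a populated one as space-separated text
    return {'first_name': ' '.join(c), 'last_name': ' '.join(d)}

with open('delete1.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['first_name', 'last_name'])
    writer.writeheader()
    writer.writerow(var1())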

Convert pandas columns to comma separated lists to be used in sql statements

I have a dataframe and I am trying to turn each column into a comma-separated list. The end goal is to pass this comma-separated list as the list of filter items in a SQL query.
How do I go about doing this?
import pandas as pd

mydata = [{'id': 'jack', 'b': 87, 'c': 1000},
          {'id': 'jill', 'b': 55, 'c': 2000},
          {'id': 'july', 'b': 5555, 'c': 22000}]
df = pd.DataFrame(mydata)
df
Expected solution (note the quotes around the ids, since they are strings, and none around the items in column 'b', since that is a numerical field; that is how SQL expects them). I would then eventually send a query like select * from mytable where ids in (my_ids) or values in (my_values):
my_ids = 'jack', 'jill', 'july'
my_values = 87, 55, 5555
I encountered a similar issue and solved it in one line using values and tolist():
df['col_name'].values.tolist()
So in your case, it will be:
my_ids = df['id'].values.tolist()   # ['jack', 'jill', 'july']
my_values = df['b'].values.tolist() # [87, 55, 5555]
Let's use apply with the argument reduce=False (deprecated in later pandas releases in favor of result_type), then check the dtype of each series and pass the proper argument to join:
df.apply(lambda x: ', '.join(x.astype(str)) if x.dtype=='int64' else ', '.join("\'"+x.astype(str)+"\'"), reduce=False)
Output:
b 87, 55, 5555
c 1000, 2000, 22000
id 'jack', 'jill', 'july'
dtype: object
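On pandas versions where reduce= has been removed, a version-agnostic sketch of the same idea (my own variant, reusing the df defined in the question):
quoted = {
    col: ', '.join(
        df[col].astype(str) if df[col].dtype != object
        else "'" + df[col].astype(str) + "'"
    )
    for col in df.columns
}
# quoted['id'] -> "'jack', 'jill', 'july'";  quoted['b'] -> "87, 55, 5555"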

Merge three csv files with same headers in Python

I have multiple CSVs; however, I'm having difficulty merging them because they all have the same headers. Here's an example.
CSV 1:
ID,COUNT
1,3037
2,394
3,141
5,352
7,31
CSV 2:
ID, COUNT
1,375
2,1178
3,1238
5,2907
6,231
7,2469
CSV 3:
ID, COUNT
1,675
2,7178
3,8238
6,431
7,6469
I need to combine all the CSV file on the ID, and create a new CSV with additional columns for each count column.
I've been testing it with 2 CSVs but I'm still not getting the right output.
with open('csv1.csv', 'r') as checkfile:  # CSV the data is pulled from
    checkfile_result = {record['ID']: record for record in csv.DictReader(checkfile)}

with open('csv2.csv', 'r') as infile:
    #infile_result = {addCount['COUNT']: addCount for addCount in csv.DictReader(infile)}
    with open('Result.csv', 'w') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames + ['COUNT'])
        writer.writeheader()
        for item in reader:
            record = checkfile_result.get(item['ID'], None)
            if record:
                item['ID'] = record['COUNT']  # ???
                item['COUNT'] = record['COUNT']
            else:
                item['COUNT'] = None
                item['COUNT'] = None
            writer.writerow(item)
However, with the above code I get three columns, but the data from the first CSV is populated in both COUNT columns. For example:
Result.csv (notice the rows skip the IDs that don't exist in the first CSV):
ID, COUNT, COUNT
1, 3037, 3037
2, 394, 394
3, 141, 141
5, 352, 352
7, 31, 31
The result should be:
ID, COUNT, COUNT
1, 3037, 375
2, 394, 1178
3, 141, 1238
5, 352, 2907
6, , 231
7, 31, 2469
and so on.
Any help will be greatly appreciated.
This works:
import csv

def read_csv(fobj):
    reader = csv.DictReader(fobj, delimiter=',')
    return {line['ID']: line['COUNT'] for line in reader}

with open('csv1.csv') as csv1, open('csv2.csv') as csv2, \
     open('csv3.csv') as csv3, open('out.csv', 'w') as out:
    data = [read_csv(fobj) for fobj in [csv1, csv2, csv3]]
    all_keys = sorted(set(data[0]).union(data[1]).union(data[2]))
    out.write('ID, COUNT, COUNT, COUNT\n')
    for key in all_keys:
        counts = (entry.get(key, '') for entry in data)
        out.write('{}, {}, {}, {}\n'.format(key, *tuple(counts)))
The content of the output file:
ID, COUNT, COUNT, COUNT
1, 3037, 375, 675
2, 394, 1178, 7178
3, 141, 1238, 8238
5, 352, 2907,
6, , 231, 431
7, 31, 2469, 6469
The Details
The function read_csv returns a dictionary with the IDs as keys and the counts as values. We use this function to read all three inputs. For example, for csv1.csv:
with open('csv1.csv') as csv1:
    print(read_csv(csv1))
we get this result:
{'1': '3037', '3': '141', '2': '394', '5': '352', '7': '31'}
We need to have all keys. One way is to convert them to sets and use union to find the unique ones. We also sort them:
all_keys = sorted(set(data[0]).union(data[1]).union(data[2]))
['1', '2', '3', '5', '6', '7']
In the loop over all keys, we retrieve the count using entry.get(key, ''). If the key is not contained, we get an empty string. Look at the output file: you see just commas and no values at the places where no value was found in the input. We use a generator expression so we don't have to re-type everything three times:
counts = (entry.get(key, '') for entry in data)
This is the content of one of those generators, materialized as a tuple:
tuple(counts)
('3037', '375', '675')
Finally, we write to our output file. The * converts a tuple like this ('3037', '375', '675') into three arguments, i.e. .format() is called like this .format(key, '3037', '375', '675'):
out.write('{}, {}, {}, {}\n'.format(key, *tuple(counts)))
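For comparison, a sketch of the same merge using pandas (assuming the three files shown above; skipinitialspace handles the stray space after the commas in the headers):
import pandas as pd
from functools import reduce

frames = [pd.read_csv(name, skipinitialspace=True)
          for name in ('csv1.csv', 'csv2.csv', 'csv3.csv')]
# Outer-join on ID so IDs missing from any one file still appear.
merged = reduce(lambda left, right: pd.merge(left, right, on='ID', how='outer'),
                frames)
merged.sort_values('ID').to_csv('out.csv', index=False)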