EOF Error During Dict Slice - json

I am trying to compile monthly data into an existing JSON file that I loaded via import json. Initially, my JSON data had just one property, which is 'name':
json_data['features'][1]['properties']
>>{'name':'John'}
But the end result I want, with the monthly data included, is like this:
json_data['features'][1]['properties']
>>{'name':'John',
'2016-01': {'x1':0, 'x2':0, 'x3':1, 'x4':0},
'2016-02': {'x1':1, 'x2':0, 'x3':1, 'x4':0}, ... }
My monthly data are in separate tsv files. They have this format:
John 0 0 1 0
Jane 1 1 1 0
so I loaded them via import csv, looped through a list of file names, and set about placing them in a collective dictionary like so:
file_strings = ['2016-01.tsv', '2016-02.tsv', ... ]
collective_dict = {}
for i in file_strings:
    with open(i) as f:
        tsv_object = csv.reader(f, delimiter='\t')
        collective_dict[i[:-4]] = rows[0]:rows[1:5] for rows in tsv_object
I checked how things turned out by slicing collective_dict like so:
collective_dict['2016-01']['John'][0]
>>'0'
Which is correct; it just needs to be cast into an integer.
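(For example, int(collective_dict['2016-01']['John'][0]) gives 0.)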
For my next feat, I attempted to assign all of the monthly data to the respective json members as part of their external properties:
for i in file_strings:
    for j in range(len(json_data['features'])):
        json_data['features'][j]['properties'][i[:-4]] = {}
        json_data['features'][j]['properties'][i[:-4]]['x1'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][0])
        json_data['features'][j]['properties'][i[:-4]]['x2'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][1])
        json_data['features'][j]['properties'][i[:-4]]['x3'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][2])
        json_data['features'][j]['properties'][i[:-4]]['x4'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][3])
Here I got an arrow pointing at the last few characters:
SyntaxError: unexpected EOF while parsing
It is a pretty complicated slice, so I suppose user error is not to be ruled out. However, I did double- and triple-check things. I also looked up this error; it seems to come up with input()-related calls. I'm left a bit confused: I don't see how I made a mistake (although I'm mentally prepared to accept that I did).
My only guess was that something somewhere was not a string. When I checked collective_dict and json_data, everything that was supposed to be a string was a string ('John', 'Jane' et al.). So, I guess it's something else.
I made the problem as simple as I could while keeping the original structure of the data and for loops and so forth. I'm using Python 3.6.
Question
Why am I getting the EOF error? How can I build my external properties data without encountering such an error?

Here I have rewritten your last code block to:
for i in file_strings:
    file_name = i[:-4]
    for j in range(len(json_data['features'])):
        name = json_data['features'][j]['properties']['name']
        file_dict = json_data['features'][j]['properties'][file_name] = {}
        for x in range(4):
            x_string = 'x{}'.format(x+1)
            file_dict[x_string] = int(collective_dict[file_name][name][x])
from:
for i in file_strings:
    for j in range(len(json_data['features'])):
        json_data['features'][j]['properties'][i[:-4]] = {}
        json_data['features'][j]['properties'][i[:-4]]['x1'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][0])
        json_data['features'][j]['properties'][i[:-4]]['x2'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][1])
        json_data['features'][j]['properties'][i[:-4]]['x3'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][2])
        json_data['features'][j]['properties'][i[:-4]]['x4'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][3])
That is just to make it a bit more readable, but that shouldn't change anything.
A thing I noticed in another part of your code is the following:
collective_dict[i[:-4]] = rows[0]:rows[1:5] for rows in tsv_object
The thing I refer to is the = rows[0]:rows[1:5] for rows in tsv_object part. In my IDE that does not work, and I'm not sure whether that is a typo in your question or whether it is actually in your code, but I imagine you want it to actually be
collective_dict[i[:-4]] = {rows[0]:rows[1:5] for rows in tsv_object}
or something like that. I'm not sure if that could confuse the parser into thinking that there is an error at the end of the file.
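For completeness, here is a minimal sketch of the fixed loading loop, with the braces added and the int() conversion folded in so that no casting is needed later (the int() part is my addition, not in your original code):
import csv

file_strings = ['2016-01.tsv', '2016-02.tsv']  # add the remaining month files here
collective_dict = {}
for i in file_strings:
    with open(i) as f:
        tsv_object = csv.reader(f, delimiter='\t')
        # The braces turn this into a dict comprehension; int() converts
        # the four values up front
        collective_dict[i[:-4]] = {rows[0]: [int(x) for x in rows[1:5]]
                                   for rows in tsv_object}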
The ValueError: Invalid literal for int()
If your tsv-data is
John 0 0 1 0
Jane 1 1 1 0
Then it should be no problem to do int() on the string values. E.g. int('42') will become an int with value 42. However, if you have an error in one or several lines of your files, then use something like this block of code to figure out which file and line it is:
file_strings = ['2016-01.tsv', '2016-02.tsv', ... ]
collective_dict = {}
for file_name in file_strings:
    print('Reading {}'.format(file_name))
    with open(file_name) as f:
        tsv_object = csv.reader(f, delimiter='\t')
        for line_no, (name, *x_values) in enumerate(tsv_object):
            if len(x_values) != 4:
                print('On line {}, there are only {} values!'.format(line_no, len(x_values)))
            try:
                intx = [int(x) for x in x_values]
            except ValueError as e:
                # Catch "invalid literal for int()"
                print('Line {}: {}'.format(line_no, e))
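If one of the files does contain a bad value, the except branch will print something like Line 3: invalid literal for int() with base 10: 'x', which tells you exactly which line and value to fix.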

Related

Lua - Match pattern for CSV import to array, that factors in empty values (two commas next to each other)

I have been using the following Lua code for a while to do simple CSV-to-array conversions. Previously, everything had a value in every column, but this time, on a CSV-formatted bank statement, there are empty values, which this code does not handle.
Here’s an example CSV, with debits and credits.
Transaction Date,Transaction Type,Sort Code,Account Number,Transaction Description,Debit Amount,Credit Amount,Balance
05/04/2022,DD,'11-70-79,6033606,Refund,,10.00,159.57
05/04/2022,DEB,'11-70-79,6033606,Henry Ltd,30.00,,149.57
05/04/2022,SO,'11-70-79,6033606,NEIL PARKS,20.00,,179.57
01/04/2022,FPO,'11-70-79,6033606,MORTON GREEN,336.00,,199.57
01/04/2022,DD,'11-70-79,6033606,WORK SALARY,,100.00,435.57
01/04/2022,DD,'11-70-79,6033606,MERE BC,183.63,,535.57
01/04/2022,DD,'11-70-79,6033606,ABC LIFE,54.39,,719.20
I’ve tried different patterns (https://www.lua.org/pil/20.2.html), but none seem to work, and I’m beginning to think I can’t fix this via the pattern, as it’ll break how it works for the rest. I’d appreciate it if anyone can share how they would approach this…
local csvfilename = "/mnt/nas/Fireflyiii.csv"
local MATCH_PATTERN = "[^,]+"

local function create_array_from_file(csvfilename)
    local file = assert(io.open(csvfilename, "r"))
    local arr = {}
    for line in file:lines() do
        local row = {}
        for match in string.gmatch(line, MATCH_PATTERN) do
            table.insert(row, match)
        end
        table.insert(arr, row)
    end
    return arr
end

Can I use a value stored in a table as a key in another table?

I'm new to Lua and I still haven't gotten the hang of how classes work in it, so my question probably has a very simple answer. I'm trying to make a function that takes a CSV file and turns it into a Lua table.
The input file would be something like this
PropertyKey1,Propertykey2,Propertykey3
object1property1,object1property2,object1property3
object2property1,object2property2,object2property3
object3property1,object3property2,object3property3
and I want the resulting lua table to look something like this
objects = {
    {
        PropertyKey1 = object1property1,
        PropertyKey2 = object1property2,
        PropertyKey3 = object1property3
    },
    {
        PropertyKey1 = object2property1,
        PropertyKey2 = object2property2,
        PropertyKey3 = object2property3
    },
    {
        PropertyKey1 = object3property1,
        PropertyKey2 = object3property2,
        PropertyKey3 = object3property3
    }
}
this is what I have thus far
function loadcsv(path)
    local OutTable = {}
    local file = io.open(path, "r")
    local linecount = 0
    for line in file:lines() do
        local data = {}
        local headers = {}
        local headerkey = 1
        if linecount < 1 then
            for val in line:gmatch("([^,]+),?") do
                table.insert(headers, val)
            end
        else
            for word in line:gmatch("([^,]+),?") do
                key = headers[headerkey]
                data[headerkey] = word
                headerkey = headerkey + 1
                table.insert(OutTable, data)
            end
        end
        linecount = linecount + 1
    end
    file:close()
    return OutTable
end
The above code does not run. When I try to print any of the values, they come out as nil.
The problem is this bit
key = headers[headerkey]
data[headerkey] = word
I wanted to use the values I stored in one table as keys on the second table, but it looks like, since Lua only passes references, that doesn't work.
I did a quick experiment to confirm it. I first set up 2 tables.
test = {}
test2 = {}
test[1]={"index"}
key = test[1]
key2 = "index"
First I tried assigning the value directly from the table
test2[test[1]] = "text"
print(test2.index) --This did not work
then I tried going through another variable
test2[key] = "texto"
print(test2.index) --This did not work
I even tried using tostring()
key = tostring(test[1])
test2[key] = "texto"
print(test2.index) --This did not work
I wrote the string directly in the variable "key2" to confirm that I was using the right notation.
test2[key2] = "text"
print(test2.index) --This one worked
I read a bit on metatables, but I'm not fully clear on those. Would that be the simplest way to do what I'm trying to do, or is my approach flawed in some other way?
key = headers[headerkey]
key is not used, so why assign a value to it?
data[headerkey] = word
headerkey is a numeric key. You start at 1 for each line and add 1 for each word in a line. So you end up with
data = {
    [1] = "object1property1",
    [2] = "object1property2",
    [3] = "object1property3"
}
Instead of the intended
data = {
    PropertyKey1 = "object1property1",
    PropertyKey2 = "object1property2",
    PropertyKey3 = "object1property3"
}
So you probably meant to write
local key = headers[headerkey]
data[key] = word
But you have to move headers out of the loop. Otherwise you'll end up with an empty headers table for every line after the first, so key would be nil, which would cause Lua errors for using a nil table index.
The following line is called for every word
table.insert(OutTable, data)
You need to do this for every line!
Your code basically produces this output:
local tableA = {"object1property1", "object1property2", "object1property3"}
local tableB = {"object2property1", "object2property2", "object2property3"}
local tableC = {"object3property1", "object3property2", "object3property3"}
OutTable = {
    tableA, tableA, tableA, tableB, tableB, tableB, tableC, tableC, tableC
}
I suggest you formulate your program in your first language and then translate it into Lua. This helps to avoid such errors.
Your problem is not related to metatables, classes or anything else mentioned. You simply used the wrong variable and messed up your inner loop.

Capture any standard report to JSON or XML?

I know that I can use LIST_TO_ASCI to convert a report to ASCII, but I would like to have a higher-level data format like JSON, XML, or CSV.
Is there a way to get something that is easier to handle than ASCII?
Here is the report I'd like to convert: (screenshot omitted)
The conversion needs to be executed in ABAP, on a result which was produced like this:
SUBMIT <REPORT_NAME> ... EXPORTING LIST TO MEMORY AND RETURN.
You can get access to the SUBMIT list in memory like this:
call function 'LIST_FROM_MEMORY'
  TABLES
    listobject = t_list
  EXCEPTIONS
    not_found = 1
    others    = 2.
if sy-subrc <> 0.
  message 'Unable to get list from memory' type 'E'.
endif.
call function 'WRITE_LIST'
  TABLES
    listobject = t_list
  EXCEPTIONS
    EMPTY_LIST = 1
    OTHERS     = 2.
if sy-subrc <> 0.
  message 'Unable to write list' type 'E'.
endif.
And the final step of the solution (converting the result table to JSON) was already answered for you in your question.
I found a solution here: http://zevolving.com/2015/07/salv-table-22-get-data-directly-after-submit/
This is the code:
DATA: lt_outtab TYPE STANDARD TABLE OF alv_t_t2.
FIELD-SYMBOLS: <lt_outtab> LIKE lt_outtab.
DATA lo_data TYPE REF TO data.

" Let the model know what to collect
cl_salv_bs_runtime_info=>set(
  EXPORTING
    display  = abap_false
    metadata = abap_false
    data     = abap_true ).

SUBMIT salv_demo_table_simple
  AND RETURN.

TRY.
    " Get the data from the SALV model
    cl_salv_bs_runtime_info=>get_data_ref(
      IMPORTING
        r_data = lo_data ).
    ASSIGN lo_data->* TO <lt_outtab>.
    BREAK-POINT.
  CATCH cx_salv_bs_sc_runtime_info.
ENDTRY.
Big thanks to Sandra Rossi, who gave me the hint about cx_salv_bs_sc_runtime_info.
Related answer: https://stackoverflow.com/a/52834118/633961

random line in file

This question was given to me during an interview. The interview is long over, but I'm still thinking about the problem and it's bugging me:
You have a language that contains the following tools: a rand() function, while and for loops, if statements, and a readline() method (similar to Python's readline()). Given these tools, write an algorithm that returns a random line in the file. You don't know the size of the file, and you can only loop over the file's contents once.
I don't know the desired answer, but my solution would be the following:
chosen_line = ""
lines = 0
while (current_line = readline()):
if (rand(0, lines) == 0):
chosen_line = current_line
lines++
return chosen_line
Edit: A good explanation why this works was posted in this comment.
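For reference, here is a runnable Python version of that algorithm; it is a sketch in which random.randint stands in for rand() and the file object's line iterator stands in for readline():
import random

def random_line(f):
    chosen_line = ""
    lines = 0
    for current_line in f:
        # Line k (0-based) replaces the current choice with probability
        # 1/(k+1), which leaves every line equally likely after n lines.
        if random.randint(0, lines) == 0:
            chosen_line = current_line
        lines += 1
    return chosen_line

with open('data.txt') as f:  # 'data.txt' is a placeholder name
    print(random_line(f))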
One method, guaranteeing a uniform distribution:
(1) Read the file line-by-line into an array (or similar, e.g. python list)
(2) Use rand() to select a number between 0 and largest index in the array.
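A minimal Python sketch of those two steps, assuming the whole file fits in memory and with random.randrange standing in for rand():
import random

with open('data.txt') as f:  # 'data.txt' is a placeholder name
    lines = f.readlines()
chosen_line = lines[random.randrange(len(lines))]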
Another, not guaranteeing a uniform distribution:
Read each line. On each read, also call rand(). If over a threshold, return the line.
Although similar to Marcin's third option, Luc's implementation always returns the first line, while parsing the whole file.
It should be something like:
chosen_line = ""
threshold = 90
max = 100
while chosen_line == "":
    current_line = readline()
    if (rand(0, max) > threshold):
        chosen_line = current_line
print chosen_line
You could also return current_line in case no line was chosen by the time you have read the whole file.
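For what it's worth, with these numbers each line passes the check with probability of roughly 10%, and a later line is only reached if every earlier line was rejected; that is exactly why this approach skews toward the beginning of the file and does not give a uniform distribution.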

how to chunk a csv (dict)reader object in python 3.2?

I am trying to use Pool from the multiprocessing module to speed up reading in large csv files. For this, I adapted an example (from py2k), but it seems like the csv.DictReader object has no length. Does it mean I can only iterate over it? Is there a way to chunk it still?
These questions seemed relevant, but did not really answer my question:
Number of lines in csv.DictReader,
How to chunk a list in Python 3?
My code tried to do this:
source = open('/scratch/data.txt','r')

def csv2nodes(r):
    strptime = time.strptime
    mktime = time.mktime
    l = []
    ppl = set()
    for row in r:
        cell = int(row['cell'])
        id = int(row['seq_ei'])
        st = mktime(strptime(row['dat_deb_occupation'],'%d/%m/%Y'))
        ed = mktime(strptime(row['dat_fin_occupation'],'%d/%m/%Y'))
        # collect list
        l.append([(id,cell,{1:st,2: ed})])
        # collect separate sets
        ppl.add(id)
    return (l,ppl)
def csv2graph(source):
    r = csv.DictReader(source,delimiter=',')
    MG = nx.MultiGraph()
    l = []
    ppl = set()
    # Remember that I use integers for edge attributes, to save space! Dic above.
    # start: 1
    # end: 2
    p = Pool(processes=4)
    node_divisor = len(p._pool)*4
    node_chunks = list(chunks(r,int(len(r)/int(node_divisor))))
    num_chunks = len(node_chunks)
    pedgelists = p.map(csv2nodes,
                       zip(node_chunks))
    ll = []
    for l in pedgelists:
        ll.append(l[0])
        ppl.update(l[1])
    MG.add_edges_from(ll)
    return (MG,ppl)
According to the csv.DictReader documentation (and the underlying csv.reader it wraps), the class returns an iterator. The code should have thrown a TypeError when you called len().
You can still chunk the data, but you'll have to read it entirely into memory. If you're concerned about memory, you can switch from csv.DictReader to csv.reader and skip the overhead of the dictionaries csv.DictReader creates. To improve readability in csv2nodes(), you can assign constants to address each field's index:
CELL = 0
SEQ_EI = 1
DAT_DEB_OCCUPATION = 4
DAT_FIN_OCCUPATION = 5
I also recommend using a different variable than id, since that's a built-in function name.
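Putting that together, here is a sketch of the approach rather than a drop-in fix: read everything into a list first so that len() works, then slice the list into chunks for Pool.map. The chunks() helper, the header-skipping line, and the chunk size are illustrative, and csv2nodes() would need to be adapted to index rows with the constants above instead of dictionary keys:
import csv
from multiprocessing import Pool

def chunks(lst, n):
    # Yield successive n-sized slices of lst
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

if __name__ == '__main__':
    with open('/scratch/data.txt') as source:
        reader = csv.reader(source, delimiter=',')
        header = next(reader)  # skip the row DictReader used for field names
        rows = list(reader)    # everything is in memory now, so len(rows) works

    p = Pool(processes=4)
    node_divisor = len(p._pool) * 4
    chunk_size = max(1, len(rows) // node_divisor)
    node_chunks = list(chunks(rows, chunk_size))
    pedgelists = p.map(csv2nodes, node_chunks)
Reading everything up front trades memory for the ability to split the work evenly across the pool, which is the trade-off described above.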