I can explore a struct on the command line, like this:
octave:1> fieldnames(data)
ans =
{
[1,1] = training
[2,1] = validation
[3,1] = test
}
octave:2> fieldnames(data.training)
ans =
{
[1,1] = inputs
[2,1] = targets
}
But is there any way I could dump the entire structure? I'm envisioning some kind of output like this:
data :: struct
training :: struct
inputs :: 256x1000 double
...
Thanks in advance!
Take a look at Basic Usage & Examples, where it says:
Note that when Octave prints the value of a structure that contains other structures, only a few levels are displayed. [...] This prevents long and confusing output from large deeply nested structures. The number of levels to print for nested structures may be set with the function struct_levels_to_print, and the function print_struct_array_contents may be used to enable printing of the contents of structure arrays.
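So for the dump you describe, you can raise the print depth and then just display the variable. A minimal sketch, assuming the data struct from your session (the exact display format varies by Octave version):
struct_levels_to_print (10);        % print up to 10 levels of nesting
print_struct_array_contents (true); % also show contents of struct arrays
data                                % nested fields are now displayed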
I have data loaded from JSON and am trying to extract arbitrary nested values using a list as input, where the list corresponds to the names of successive children. I want a function get_value(data,lookup) that returns the value from data by treating each entry in lookup as a nested child.
In the example below, when lookup=['alldata','TimeSeries','rates'], the return value should be [1.3241,1.3233].
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data,lookup):
res = data
for item in lookup:
res = res[item]
return res
lookup = ['alldata','TimeSeries','rates']
get_value(json_data,lookup)
My example works, but there are two problems:
It's inefficient - in my for loop, I copy the whole TimeSeries object to res, only to then replace it with the rates list. (As @Andrej Kesely explained, res is just a reference at each iteration, so the data isn't actually being copied.)
It's not concise - I was hoping to find a concise (e.g. one- or two-line) way of extracting the data, using something like list comprehension syntax.
If you want a one-liner and you are using Python 3.8+, you can use an assignment expression (the "walrus operator"):
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data,lookup):
return [data:=data[item] for item in lookup][-1]
lookup = ['alldata','TimeSeries','rates']
print( get_value(json_data,lookup) )
Prints:
[1.3241, 1.3233]
I don't think you can do it without a loop, but you could use a reducer here to increase readability.
import functools
functools.reduce(dict.get, lookup, json_data)  # applies each key in lookup to json_data in turn
I am trying to read a simple xlsx file with xlsread in Octave. Its CSV version is shown below:
2,4,abc,6
8,10,pqr,12
14,16,xyz,18
I am trying to read and write the contents with this code:
[~, ~, RAW] = xlsread('file.xlsx');
allData = cell2mat(RAW); # error with cell2mat()
printf('data nrows=%d, ncolms=%d\n', rows(allData), columns(allData));
for i=1:rows(allData)
for j=1:columns(allData)
printf('data(%d,%d) = %d\n', i,j, allData(i,j));
endfor
endfor
and I am getting the following error:
error: cell2mat: wrong type elements or mixed cells, structs, and matrices
I have experimented with several variations of this problem:
(A) If I delete the column with the text data, ie the xlsx file contains only numbers, then this code works fine.
(B) On the other hand, if I delete the cell2mat() call, even for the purely numeric xlsx, I get an error during the cell access:
error: printf: wrong type argument 'cell'
(C) If I use cell2mat() during printf, like this:
printf('data(%d,%d) = %d\n', i,j, cell2mat(allData(i,j)));
I get correct data for the integers, and garbage for the text items.
So, how can I access and print each cell of the xlsx data, when the xlsx contains mixed-type data?
In other words, given a column index, and given that I know what type of data I am expecting there (integer or string), how can I convert the cell's content before using it?
A numeric array cannot hold mixed-class data, which is why cell2mat fails. Cell arrays exist to hold exactly this kind of data, and you already have one, so there is no need for any conversion; just skip that line (allData = cell2mat(RAW);).
Within the loop, you have this line:
printf('data(%d,%d) = %d\n', i, j, allData(i,j) );
There are two problems in it: the %d format specifier, and the parenthesis indexing allData(i,j).
You've mixed data in your cell array, but you're using %d as the format specifier. You can fix this by converting each value to a string and then using %s as the specifier.
If you use parentheses ( ) for indexing a cell array, you get a cell. What you need here is the content of that cell, and braces { } are used for that.
So it will be:
printf('data(%d,%d) = %s\n', i,j, num2str(RAW{i,j}));
Note that instead of all that, you can simply enter RAW to get this:
octave:1> RAW
RAW =
{
[1,1] = 2
[2,1] = 8
[3,1] = 14
[1,2] = 4
[2,2] = 10
[3,2] = 16
[1,3] = abc
[2,3] = pqr
[3,3] = xyz
[1,4] = 6
[2,4] = 12
[3,4] = 18
}
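As for the last part of your question: if you know in advance which columns are numeric and which hold text, you can split RAW by column instead of converting the whole thing. A sketch, assuming the layout above where column 3 holds the strings:
nums = cell2mat (RAW(:, [1 2 4]));  % numeric columns collapse into a 3x3 matrix
txt  = RAW(:, 3);                   % the text column stays a cell array of strings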
Do you know a very fast JSON Parser for Matlab?
Currently I'm using JSONlab, but with larger JSON files (mine is 12 MB, 500,000 lines) it's really slow. Or do you have any tips for me to increase the speed?
P.S. The JSON file is at most 3 levels deep.
If you want to be fast, you could use the Java JSON parser.
And before this answer gets out of hand, I am going to post the stuff I put down so far:
clc
% input example
jsonStr = '{"bool1": true, "string1": "some text", "double1": 5, "array1": [1,2,3], "nested": {"val1": 1, "val2": "one"}}'
% use java..
javaaddpath('json.jar');
jsonObj = org.json.JSONObject(jsonStr);
% check out the available methods
jsonObj.methods % see also http://www.json.org/javadoc/org/json/JSONObject.html
% get some stuff
b = jsonObj.getBoolean('bool1')
s = jsonObj.getString('string1')
d = jsonObj.getDouble('double1')
i = jsonObj.getJSONObject('nested').getInt('val1')
% put some stuff
jsonObj = jsonObj.put('sum', 1+1);
% getting an array or matrix is not so easy (you get a JSONArray)
e = jsonObj.get('array1');
% what are the methods to access that JSONArray?
e.methods
for idx = 1:e.length()
e.get(idx-1)
end
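% if you need the JSONArray as a plain numeric vector, one way is to
% copy it element by element (JSONArray indices are 0-based):
vals = zeros(e.length(), 1);
for idx = 1:e.length()
    vals(idx) = e.getDouble(idx-1);
end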
% but putting arrays or matrices works fine
jsonObj = jsonObj.put('matrix1', ones(5));
% you can get these also easily ..
m1 = jsonObj.get('matrix1')
% .. as long as you don't convert the obj back to a string
jsonObj = org.json.JSONObject(jsonObj.toString());
m2 = jsonObj.get('matrix1')
If you can afford to call .NET code, you may want to have a look at this lightweight guy (I'm the author):
https://github.com/ysharplanguage/FastJsonParser#PerfDetailed
Coincidentally, my benchmark includes a test ("fathers data") right in the 12 MB ballpark (and with a couple of levels of depth, too) that this parser parses into POCOs in under 250 ms on my cheap laptop.
As for the MATLAB + .NET code integration:
http://www.mathworks.com/help/matlab/using-net-libraries-in-matlab.html
HTH,
If you just want to read JSON files, and have a C++11 compiler, you can use the very fast json_read mex function.
Since MATLAB R2016b, you can use jsondecode.
I have not compared its performance to other implementations.
From personal experience I can say that it is not horribly slow.
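For completeness, a minimal sketch of jsondecode (it maps JSON objects to structs and numeric arrays to column vectors):
str = '{"name":"CAD/USD","rates":[1.3241,1.3233]}';
data = jsondecode(str);
data.rates          % -> 2x1 double: [1.3241; 1.3233]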
I'm trying to use Pool from the multiprocessing module to speed up reading large CSV files. For this, I adapted an example (from Python 2), but it seems like the csv.DictReader object has no length. Does that mean I can only iterate over it? Is there a way to chunk it anyway?
These questions seemed relevant, but did not really answer my question:
Number of lines in csv.DictReader,
How to chunk a list in Python 3?
My code tried to do this:
import csv
import time
import networkx as nx
from multiprocessing import Pool

source = open('/scratch/data.txt','r')
def csv2nodes(r):
strptime = time.strptime
mktime = time.mktime
l = []
ppl = set()
for row in r:
cell = int(row['cell'])
id = int(row['seq_ei'])
st = mktime(strptime(row['dat_deb_occupation'],'%d/%m/%Y'))
ed = mktime(strptime(row['dat_fin_occupation'],'%d/%m/%Y'))
# collect list
l.append([(id,cell,{1:st,2: ed})])
# collect separate sets
ppl.add(id)
return (l,ppl)
def csv2graph(source):
r = csv.DictReader(source,delimiter=',')
MG=nx.MultiGraph()
l = []
ppl = set()
# Remember that I use integers for edge attributes, to save space! Dic above.
# start: 1
# end: 2
p = Pool(processes=4)
node_divisor = len(p._pool)*4
node_chunks = list(chunks(r,int(len(r)/int(node_divisor))))  # fails here: a DictReader has no len()
num_chunks = len(node_chunks)
pedgelists = p.map(csv2nodes,
zip(node_chunks))
ll = []
for l in pedgelists:
ll.append(l[0])
ppl.update(l[1])
MG.add_edges_from(ll)
return (MG,ppl)
From the csv.DictReader documentation (and the underlying csv.reader it wraps), the class returns an iterator. The code should have raised a TypeError when you called len() on it.
You can still chunk the data, but you'll have to read it entirely into memory first. If you're concerned about memory, you can switch from csv.DictReader to csv.reader and skip the overhead of the dictionaries csv.DictReader creates. To improve readability in csv2nodes(), you can assign constants to address each field's index:
CELL = 0
SEQ_EI = 1
DAT_DEB_OCCUPATION = 4
DAT_FIN_OCCUPATION = 5
I also recommend using a different variable name than id, since that shadows a built-in function.
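Putting that together, a sketch of the csv.reader + chunking approach (the chunks helper and the chunk size are my own choices here, not from your original):
import csv
from itertools import islice

# field indices, matching the constants above
CELL = 0
SEQ_EI = 1
DAT_DEB_OCCUPATION = 4
DAT_FIN_OCCUPATION = 5

def chunks(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

with open('/scratch/data.txt') as source:
    reader = csv.reader(source)
    next(reader)               # skip the header row
    rows = list(reader)        # read everything into memory, so len() works

node_chunks = list(chunks(rows, max(1, len(rows) // 16)))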
In my iOS application I need to export some data in CSV or HTML format. How can I do this?
RegexKitLite comes with an example of how to read a CSV file into an NSArray of NSArrays, and going in the reverse direction is pretty trivial.
It'd be something like this (warning: code typed in browser):
NSArray * data = ...; //An NSArray of NSArrays of NSStrings
NSMutableString * csv = [NSMutableString string];
for (NSArray * line in data) {
  NSMutableArray * formattedLine = [NSMutableArray array];
  for (NSString * field in line) {
    BOOL shouldQuote = NO;
    NSRange r = [field rangeOfString:@","];
    //fields that contain a , must be quoted
    if (r.location != NSNotFound) {
      shouldQuote = YES;
    }
    r = [field rangeOfString:@"\""];
    //fields that contain a " must have them escaped to "" and be quoted
    if (r.location != NSNotFound) {
      field = [field stringByReplacingOccurrencesOfString:@"\"" withString:@"\"\""];
      shouldQuote = YES;
    }
    if (shouldQuote == YES) {
      [formattedLine addObject:[NSString stringWithFormat:@"\"%@\"", field]];
    } else {
      [formattedLine addObject:field];
    }
  }
  NSString * combinedLine = [formattedLine componentsJoinedByString:@","];
  [csv appendFormat:@"%@\n", combinedLine];
}
[csv writeToFile:@"/path/to/file.csv" atomically:NO encoding:NSUTF8StringEncoding error:NULL];
The general solution is to use stringWithFormat: to format each row. Presumably, you're writing this to a file or socket, in which case you would write a data representation of each string (see dataUsingEncoding:) to the file handle as you create it.
If you're formatting a lot of rows, you may want to use initWithFormat: and explicit release messages, in order to avoid running out of memory by piling up too many string objects in the autorelease pool.
And always, always, always remember to escape the values correctly before passing them to the formatting method.
Escaping (along with unescaping) is a really good thing to write unit tests for. Write a function to CSV-format a single row, and have test cases that compare its result to correct output. If you have a CSV parser on hand, or you're going to need one, or you just want to be really sure your escaping is correct, write unit tests for the parsing and unescaping as well as the escaping and formatting.
If you can start with a single record containing any combination of CSV-special and/or SQL-special characters, format it, parse the formatted string, and end up with a record equal to the one you started with, you know your code is good.
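As a sketch of that round-trip test (formatCSVRow and parseCSVRow are hypothetical names for the functions you'd write, and this assumes XCTest):
NSArray *record = @[@"plain", @"has,comma", @"has\"quote\""];
NSString *line = formatCSVRow(record);   // hypothetical formatter under test
NSArray *parsed = parseCSVRow(line);     // hypothetical parser under test
XCTAssertEqualObjects(parsed, record);   // round-trip must reproduce the record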
(All of the above applies equally to CSV and to HTML. If possible, you might consider using XHTML, so that you can use XML validation tools and parsers, including NSXMLParser.)
CSV stands for comma-separated values. I usually just iterate over the data structures in my application and output one set of values per line, with the values within a set separated by commas.
#include <string>

struct person
{
    std::string first_name;
    std::string second_name;
};

person tony = {"tony", "momo"};
person john = {"john", "smith"};
would look like
tony, momo
john, smith
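To make that concrete, a minimal sketch (write_csv and the use of std::vector are my own choices here):
#include <iostream>
#include <string>
#include <vector>

struct person
{
    std::string first_name;
    std::string second_name;
};

// one record per line, values separated by commas
void write_csv(std::ostream& out, const std::vector<person>& people)
{
    for (const person& p : people)
        out << p.first_name << "," << p.second_name << "\n";
}

int main()
{
    write_csv(std::cout, {{"tony", "momo"}, {"john", "smith"}});
}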