Do you know a very fast JSON Parser for Matlab?
Currently I'm using JSONlab, but with larger JSON files (mine is 12 MB, 500 000 lines) it's really slow. Or do you have any tips' for me to increase the speed?
P.S. The JSON file is max. 3 levels deep.
If you want to be fast, you could use the Java JSON parser.
And before this answer gets out of hand, I am going to post the stuff I put down so far:
clc
% input example
jsonStr = '{"bool1": true, "string1": "some text", "double1": 5, "array1": [1,2,3], "nested": {"val1": 1, "val2": "one"}}'
% use java..
javaaddpath('json.jar');
jsonObj = org.json.JSONObject(jsonStr);
% check out the available methods
jsonObj.methods % see also http://www.json.org/javadoc/org/json/JSONObject.html
% get some stuff
b = jsonObj.getBoolean('bool1')
s = jsonObj.getString('string1')
d = jsonObj.getDouble('double1')
i = jsonObj.getJSONObject('nested').getInt('val1')
% put some stuff
jsonObj = jsonObj.put('sum', 1+1);
% getting an array or matrix is not so easy (you get a JSONArray)
e = jsonObj.get('array1');
% what are the methods to access that JSONArray?
e.methods
for idx = 1:e.length()
e.get(idx-1)
end
% but putting arrays or matrices works fine
jsonObj = jsonObj.put('matrix1', ones(5));
% you can get these also easily ..
m1 = jsonObj.get('matrix1')
% .. as long as you dont convert the obj back to a string
jsonObj = org.json.JSONObject(jsonObj.toString());
m2 = jsonObj.get('matrix1')
If you can afford to call .NET code, you may want to have a look at this lightweight guy (I'm the author):
https://github.com/ysharplanguage/FastJsonParser#PerfDetailed
Coincidentally, my benchmark includes a test ("fathers data") in the 12MB ballpark precisely (and with a couple levels of depth also) that this parser parses into POCOs in under 250 ms on my cheap laptop.
As for the MATLAB + .NET code integration:
http://www.mathworks.com/help/matlab/using-net-libraries-in-matlab.html
'HTH,
If you just want to read JSON files, and have a C++11 compiler, you can use the very fast json_read mex function.
Since Matlab 2016b, you can use jsondecode.
I have not compared its performance to other implementations.
From personal experience I can say that it is not horribly slow.
Related
Following script describes the decoding of a JSON Object, that is received via MQTT. In this case, we shall take following JSON Object as an example:
{"00-06-77-2f-37-94":{"publish_topic":"/stations/test","sample_rate":5000}}
After being received and decoded in the handleOnReceive function, the local function saveTable is called up with the decoded object which looks like:
["00-06-77-2f-37-94"] = {
publish_topic = "/stations/test",
sample_rate = 5000
}
The goal of the saveTable function is to go through the table above and assign the values "/stations/test" and 5000 respectively to the variables pubtop and rate. When I however print each of both variables, nil is returned in both cases.
How can I extract the values of this table and save them in mentioned variables?
If i can only save the values "publish_topic = "/stations/test"" and "sample_rate = 5000" at first, would I need to parse these to get the values above and save them, or is there another way?
local pubtop
local rate
local function saveTable(t)
local conversionTable = {}
for k,v in pairs(t) do
if type(v) == "table" then
conversionTable [k] = string.format("%q: {", k)
printTable(v)
print("}")
else
print(string.format("%q:", k) .. v .. ",")
end
end
pubtop = conversionTable[0]
rate = conversionTable[1]
end
local lua_value
local function handleOnReceive(topic, data, _, _)
print("handleOnReceive: topic '" .. topic .. "' message '" .. data .. "'")
print(data)
lua_value = JSON:decode(data)
saveTable(lua_value)
print(pubtop)
print(rate)
end
client:register('OnReceive', handleOnReceive)
previous question to thread: Decode and Parse JSON to Lua
The function I gave you was to recursively print table contents. It was not ment to be modified to get some specific values.
Your modifications do not make any sense. Why would you store that string in conversionTable[k]? You obviously have no idea what you're doing here. No offense but you should learn some basics befor you continue.
I gave you that function so you can print whatever is the result of your json decode.
If you know you get what you expect there is no point in recursively iterating through that table.
Just do it like that
for k,v in pairs(lua_value) do
print(k)
print(v.publish_topic)
print(v.sample_rate)
end
Now read the Lua reference manual and do some beginners tutorials please.
You're wasting a lot of time and resources if you're trying to implement things like that if you do not know how to access the elements of a table. This is like the most basic and important operation in Lua.
I'm trying to write a MySQL UDF in Go with cgo, in which I have a basic one functioning, but there's little bits and pieces that I can't figure out because I have no idea what some of the C variables are in terms of Go.
This is an example that I have written in C that forces the type of one of the MySQL parameters to an int
my_bool unhex_sha3_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {
if (args->arg_count != 2) {
strcpy(message, "`unhex_sha3`() requires 2 parameters: the message part, and the bits");
return 1;
}
args->arg_type[1] = INT_RESULT;
initid->maybe_null = 1; //can return null
return 0;
}
And that works fine, but then I try to do the same/similar thing with this other function in Go like this
//export get_url_param_init
func get_url_param_init(initid *C.UDF_INIT, args *C.UDF_ARGS, message *C.char) C.my_bool {
if args.arg_count != 2 {
message = C.CString("`get_url_param` require 2 parameters: the URL string and the param name")
return 1
}
(*args.arg_type)[0] = C.STRING_RESULT
(*args.arg_type)[1] = C.STRING_RESULT
initid.maybe_null = 1
return 0
}
With this build error
./main.go:24: invalid operation: (*args.arg_type)[0] (type uint32 does
not support indexing)
And I'm not totally sure what that means. Shouldn't this be a slice of some sort, not a uint32?
And this is where it'd be super helpful have some way of dumping the args struct somewhere somehow (maybe even in Go syntax as a super plus) so that I can tell what I'm working with.
Well I used spew to dump the variable contents to a tmp file inside the init function (commenting out the lines that made it not compile) and I got this
(string) (len=3) "%#v"
(*main._Ctype_struct_st_udf_args)(0x7ff318006af8)({
arg_count: (main._Ctype_uint) 2,
_: ([4]uint8) (len=4 cap=4) {
00000000 00 00 00 00 |....|
},
arg_type: (*uint32)(0x7ff318006d18)(0),
args: (**main._Ctype_char)(0x7ff318006d20->0x7ff3180251b0)(0),
lengths: (*main._Ctype_ulong)(0x7ff318006d30)(0),
maybe_null: (*main._Ctype_char)(0x7ff318006d40)(0),
attributes: (**main._Ctype_char)(0x7ff318006d58->0x7ff318006b88)(39),
attribute_lengths: (*main._Ctype_ulong)(0x7ff318006d68)(2),
extension: (unsafe.Pointer) <nil>
})
Alright so huge help with #JimB who stuck with me even though I'm clearly less adept with Go (and especially CGO) but I've got a working version of my UDF, which is an easy and straight forward (and fast) function that pulls a single parameter out of a URL string and decodes it correctly and what not (e.g. %20 gets returned as a space, basically how you would expect it to work).
This seemed incredibly tricky with a pure C UDF because I don't really know C (as well as I know other languages), and there's a lot that can go wrong with URL parsing and URL parameter decoding, and native MySQL functions are slow (and there's not really a good, clean way to do the decoding either), so Go seemed like the better-than-perfect candidate for this kind of problem, for strong performance, ease of writing, and wide variety of easy to use built ins & third party libraries.
The full UDF and it's installation/usage instructions are here https://github.com/StirlingMarketingGroup/mysql-get-url-param/blob/master/main.go
First problem was debugging output. And I did that by Fprintfing to a tmp file instead of the standard output, so that I could check the file to see variable dumps.
t, err := ioutil.TempFile(os.TempDir(), "get-url-param")
fmt.Fprintf(t, "%#v\n", args.arg_type)
And then after I got my output (I was expecting args.arg_type to be an array like it is in C, but instead was a number) I needed to convert the data referenced by that number (the pointer to the start of the C array) to a Go array so I could set it's values.
argsTypes := *(*[2]uint32)(unsafe.Pointer(args.arg_type))
argsTypes[0] = C.STRING_RESULT
argsTypes[1] = C.STRING_RESULT
QueryRequest req=new QueryRequest(solrQuery);
NoOpResponseParser responseParser = new NoOpResponseParser();
responseParser.setWriterType("csv");
searcherServer.setParser(responseParser);
NamedList<Object> resp=searcherServer.request(req);
QueryResponse res = searcherServer.query(solrQuery);
responseString = (String)resp.get("response");
I use the above code to get the output in CSV format. The data I am trying to fetch is huge (In billions). So I want to include deep pagination of SOLR and get chunks of CSV output. Is there a way to do? Also, with the current version of SOLR (I cannot upgrade) I have to use the above code to get CSV output.
I tried the below way to fetch the results.
searcherServer = new HttpSolrServer(url);
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(query);
solrQuery.set("fl","field1");
solrQuery.setParam("wt", "csv");
solrQuery.setStart(0);
solrQuery.setRows(1000);
solrQuery.setSort(SolrQuery.SortClause.asc("field2"));
In the output from the above code has wt as javabin. So I cannot get the CSV output.
Any suggestions?
You have two ways.
use Solr export request handler (or add it) and wt=csv parameter. Just to be clear, this is an Implicit Request Handler usually available even in older Solr versions and specifically designed to handle scenarios that involve exporting millions of records.
implement deep paging correctly. I suggest Yonic post paging and deep paging, it easier than you think. But after you'll have correctly implement, you also need to create the csv file by yourself.
The solution I found was:
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(query); //what you want to fetch
QueryResponse res = searcherServer.query(solrQuery);
int numFound = (int)res.getResults().getNumFound();
int rowsToBeFetched = (numFound > 1000 ? (int)(numFound/6) : numFound);
for(int i=0; i< numFound; i=i+rowsToBeFetched ){
solrQuery.set("fl","fieldToBeFetched");
solrQuery.setParam("wt", "csv");
solrQuery.setStart(i);
solrQuery.setRows(rowsToBeFetched);
QueryRequest req=new QueryRequest(solrQuery);
NoOpResponseParser responseParser = new NoOpResponseParser();
responseParser.setWriterType("csv");
searcherServer.setParser(responseParser);
NamedList<Object> resp=searcherServer.request(req);
responseString = (String)resp.get("response"); //This is in CSV format
}
Pros:
Since I don't get the result at once, it was faster.
The output was csv.
Hitting solr multiple items isn't costly.
Cons:
The result is not unique, meaning there can be repeated data based on what you are fetching.
To get unique results, you can use facets.
Thanks!
I have a string function (and I am sure it is reversible, so no need to test this), could I call it in reverse to perform the opposite operation?
For example:
def sample(s):
return s[1:]+s[:1]
would put the first letter of a string on the end and return it.
'Output' would become 'utputO'.
When I want to get the opposite operation, could I use this same function?
'utputO' would return 'Output'.
Short answer: no.
Longer answer: I can think of 3, maybe 4 ways to approach what you want -- all of which depend on how are you allowed to change your functions (possibly restricting to a sub-set of Python or mini language), train them, or run them normally with the operands you are expecting to invert later.
So, method (1) - would probably not reach 100% determinism, and would require training with a lot of random examples for each function: use a machine learning approach. That is cool, because it is a hot topic, this would be almost a "machine learning hello world" to implement using one of the various frameworks existing for Python or even roll your own - just setup a neural network for string transformation, train it with a couple thousand (maybe just a few hundred) string transformations for each function you want to invert, and you should have the reverse function. I think this could be the best - at least the "least incorrect" approach - at least it will be the more generic one.
Method(2): Create a mini language for string transformation with reversible operands. Write your functions using this mini language. Introspect your functions and generate the reversed ones.
May look weird, but imagine a minimal stack language that could remove an item from a position in a string, and push it on the stack, pop an item to a position on the string, and maybe perform a couple more reversible primitives you might need (say upper/lower) -
OPSTACK = []
language = {
"push_op": (lambda s, pos: (OPSTACK.append(s[pos]), s[:pos] + s[pos + 1:])[1]),
"pop_op": (lambda s, pos: s[:pos] + OPSTACK.pop() + s[pos:]),
"push_end": (lambda s: (OPSTACK.append(s[-1]), s[:-1])[1]),
"pop_end": lambda s: s + OPSTACK.pop(),
"lower": lambda s: s.lower(),
"upper": lambda s: s.upper(),
# ...
}
# (or pip install extradict and use extradict.BijectiveDict to avoid having to write double entries)
reverse_mapping = {
"push_op": "pop_op",
"pop_op": "push_op",
"push_end": "pop_end",
"pop_end": "push_end",
"lower": "upper",
"upper": "lower"
}
def engine(text, function):
tokens = function.split()
while tokens:
operator = tokens.pop(0)
if operator.endswith("_op"):
operand = int(tokens.pop(0))
text = language[operator](text, operand)
else:
text = language[operator](text)
return text
def inverter(function):
inverted = []
tokens = function.split()
while tokens:
operator = tokens.pop(0)
inverted.insert(0, reverse_mapping[operator])
if operator.endswith("_op"):
operand = tokens.pop(0)
inverted.insert(1, operand)
return " ".join(inverted)
Example:
In [36]: sample = "push_op 0 pop_end"
In [37]: engine("Output", sample)
Out[37]: 'utputO'
In [38]: elpmas = inverter(sample)
In [39]: elpmas
Out[39]: 'push_end pop_op 0'
In [40]: engine("utputO", elpmas)
Out[40]: 'Output'
Method 3: If possible, it is easy to cache the input and output of each call, and just use that to operate in reverse - it could be done as a decorator in Python
from functools import wraps
def reverse_cache(func):
reverse_cache = {}
wraps(func)
def wrapper(input_text):
result = func(input_text)
reverse_cache[result] = input_text
return result
wrapper.reverse_cache = reverse_cache
return wrapper
Example:
In [3]: #reverse_cache
... def sample(s):
... return s[1:]+s[:1]
In [4]:
In [5]: sample("Output")
Out[5]: 'utputO'
In [6]: sample.reverse_cache["utputO"]
Out[6]: 'Output'
Method 4: If the string operations are limited to shuffling the string contents in a deterministic way, like in your example, (and maybe offsetting the character code values by a constant - but no other operations at all), it is possible to write a learner function without the use of neural-network programming: it would construct a string with one character of each (possibly with code-points in ascending order), pass it through the function, and note down the numeric order of the string that was output -
so, in your example, the reconstructed output order would be (1,2,3,4,5,0) - given that sequence, one just have to reorder the input for the inverted function according to those indexes - which is trivial in Python:
def order_map(func, length):
sample_text = "".join(chr(i) for i in range(32, 32 + length))
result = func(sample_text)
return [ord(char) - 32 for char in result]
def invert(func, text):
map_ = order_map(func, len(text))
reordered = sorted(zip(map_, text))
return "".join(item[1] for item in reordered)
Example:
In [47]: def sample(s):
....: return s[1:] + s[0]
....:
In [48]: sample("Output")
Out[48]: 'utputO'
In [49]: invert(sample, "uputO")
Out[49]: 'Ouput'
In [50]:
What is the most efficient way to compare two JSON-formatted objects data in node.js?
These objects do not contain "undefined" or functions and their propotype is Object.
I've heard there is a good support if JSON in node.js
Friends! I'm really sorry if I don't understand something, or explain question in wrong terms. All I wanna do is to compare the equality of two pieces of JSON-standardized data like this:
{"skip":0, "limit":7, "arr": [1682, 439, {"x":2, arr:[]}] }
{"skip":0, "limit":7, "arr": [1682, 450, "a", ["something"] }
I'm 100% sure there will be no functions, Date, null or undefined, etc. in these data. I want to say I don't want to compare JavaScript objects in the most general case (with complex prototypes, circular links and all this stuff). The prototype of these data will be Object. I'm also sure lots of skillful programmers have answered this question before me.
The main thing I'm missing is that I don't know how to explain my question correctly. Please feel free to edit my post.
My answer:
First way: Unefficient but reliable. You can modify a generic method like this so it does not waste time looking for functions and undefined. Please note that generic method iterates the objects three times (there are three for .. in loops inside)
Second way: Efficient but has one restriction. I've found JSON.stringify is extremely fast in node.js. The fastest solution that works is:
JSON.stringify(data1) == JSON.stringify(data2)
Very important note! As far as I know, neither JavaScript nor JSON don't matter the order of the object fields. However, if you compare strings made of objects, this will matter much. In my program the object fields are always created in the same order, so the code above works. I didn't search in the V8 documentation, but I think we can rely on the fields creation order. In other case, be aware of using this method.
In my concrete case the second way is 10 times more efficient then the first way.
I've done some benchmarking of the various techniques, with underscore coming out on top:
> node nodeCompare.js
deep comparison res: true took: 418 (underscore)
hash comparison res: true took: 933
assert compare res: true took: 2025
string compare res: true took: 1679
Here is the source code:
var _ = require('underscore');
var assert = require('assert');
var crypto = require('crypto');
var large = require('./large.json');
var large2 = JSON.parse(JSON.stringify(large));
var t1 = new Date();
var hash = crypto.createHash('md5').update(JSON.stringify(large)).digest('hex');
var t2 = new Date();
var res = _.isEqual(large, large2);
var t3 = new Date();
var res2 = (hash == crypto.createHash('md5').update(JSON.stringify(large2)).digest('hex'));
var t4 = new Date();
assert.deepEqual(large, large2);
var t5 = new Date();
var res4 = JSON.stringify(large) === JSON.stringify(large2);
var t6 = new Date();
console.log("deep comparison res: "+res+" took: "+ (t3.getTime() - t2.getTime()));
console.log("hash comparison res: "+res2+" took: "+ (t4.getTime() - t3.getTime()));
console.log("assert compare res: true took: "+ (t5.getTime() - t4.getTime()));
console.log("string compare res: "+res4+" took: "+ (t6.getTime() - t5.getTime()));
On the face of it, I believe this is what you're looking for:
Object Comparison in JavaScript
However, the more complete answer is found here:
How do you determine equality for two JavaScript objects?
These answers are from 2 and 3 years ago, respectively. It's always a good idea to search the site for your intended question before posting, or search Google more broadly -- "javascript compare two JSON objects" in Google returns a lot. The top 4 hits are all StackOverflow answers, at the moment.
You can use rename/unset/set diff for JSON objects - https://github.com/mirek/node-rus-diff
npm install rus-diff
Your example JSON objects:
a = {"skip":0, "limit":7, "arr": [1682, 439, {"x":2, arr:[]}] }
b = {"skip":0, "limit":7, "arr": [1682, 450, "a", ["something"]] }
var rusDiff = require('rus-diff').rusDiff
console.log(rusDiff(a, b))
...generate the following diff:
{ '$set': { 'arr.1': 450, 'arr.2': 'a', 'arr.3': [ 'something' ] } }
...which is MongoDB compatible diff format.
If documents are the same, rusDiff(a, b) returns false.