Efficient parsing of first four elements of large JSON arrays

I am using Jackson to parse JSON from an InputStream that looks like the following:
[
  [
    36,
    100,
    "The 3n + 1 problem",
    56717,
    0,
    1000000000,
    0,
    6316,
    0,
    0,
    88834,
    0,
    45930,
    0,
    46527,
    5209,
    200860,
    3597,
    149256,
    3000,
    1
  ],
  [
    ........
  ],
  [
    ........
  ],
  ..... // and almost 5000 arrays like above
]
This is the original feed link: http://uhunt.felix-halim.net/api/p
I want to parse it and keep only the first 4 elements of every array, skipping the other 18 elements:
36
100
The 3n + 1 problem
56717
Code structure I have tried so far:
while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
    jsonParser.nextToken(); // '['
    while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
        // I tried many approaches here but not found appropriate one
    }
}
As this feed is pretty big, I need to do this efficiently, with little overhead and memory use.
There are also three models for processing JSON: the Streaming API, Data Binding, and the Tree Model. Which one is appropriate for my purpose?
How can I parse this JSON efficiently with Jackson? How can I skip those 18 elements and jump to the next array for better performance?
Edit: (Solution)
Jackson and Gson work in almost the same way (incremental mode, since content is read and written incrementally). I am switching to Gson because it has a skipValue() function, which is aptly named. Although Jackson's nextToken() can be used like skipValue(), Gson seems more flexible to me. Thanks to @Kowser for the recommendation; I had come across Gson before but somehow ignored it. This is my working code:
reader.beginArray();
while (reader.hasNext()) {
    reader.beginArray();
    int a = reader.nextInt();
    int b = reader.nextInt();
    String c = reader.nextString();
    int d = reader.nextInt();
    System.out.println(a + " " + b + " " + c + " " + d);
    while (reader.hasNext())
        reader.skipValue();
    reader.endArray();
}
reader.endArray();
reader.close();

This is for Jackson
Follow this tutorial.
Judicious use of jsonParser.nextToken() should help you.
while (jsonParser.nextToken() != JsonToken.END_ARRAY) { // might be JsonToken.START_ARRAY?
The pseudo-code is
find next array
read values
skip other values
skip next end token
This is for gson.
Take a look at this tutorial, and consider the second example in it.
Judicious use of reader.begin*, reader.end* and reader.skipValue should do the job for you.
And here is the documentation for JsonReader

Related

How to parse json and replace a value in a nested json?

I have the below json:
{
"status":"success",
"data":{
"_id":"ABCD",
"CNTL":{"XMN Version":"R3.1.0"},
"OMN":{"dree":["ANY"]},
"os0":{
"Enable":true,"Service Reference":"","Name":"",
"TD ex":["a0.c985.c0"],
"pn ex":["s0.c100.c0"],"i ex":{},"US Denta Treatment":"copy","US Denta Value":0,"DP":{"Remote ID":"","cir ID":"","Sub Options":"","etp Number":54469},"pe":{"Remote ID":"","cir ID":""},"rd":{"can Identifier":"","can pt ID":"","uno":"Default"},"Filter":{"pv":"pass","pv6":"pass","ep":"pass","pe":"pass"},"sc":"Max","dc":"","st Limit":2046,"dm":false},
"os1":{
"Enable":false,"Service Reference":"","Name":"",
"TD ex":[],
"pn ex":[],"i ex":{},"US Denta Treatment":"copy","US Denta Value":0,"DP":{"Remote ID":"","cir ID":"","Sub Options":"","etp Number":54469},"pe":{"Remote ID":"","cir ID":""},"rd":{"can Identifier":"","can pt ID":"","uno":"Default"},"Filter":{"pv":"pass","pv6":"pass","ep":"pass","pe":"pass"},"sc":"Max","dc":"","st Limit":2046,"dm":false},
"ONM":{
"ONM-ALARM-XMN":"Default","Auto Boot Mode":false,"XMN Change Count":0,"CVID":0,"FW Bank Files":[],"FW Bank":[],"FW Bank Ptr":65535,"pn Max Frame Size":2000,"Realtime Stats":false,"Reset Count":0,"SRV-XMN":"Unmodified","Service Config Once":false,"Service Config pts":[],"Skip ot":false,"Name":"","Location":"","dree":"","Picture":"","Tag":"","PHY Delay":0,"Labels":[],"ex":"From OMN","st Age":60,"Laser TX Disable Time":0,"Laser TX Disable Count":0,"Clear st Count":0,"MIB Reset Count":0,"Expected ID":"ANY","Create Date":"2023-02-15 22:41:14.422681"},
"SRV-XMN Values":{},
"nc":{"Name":"ABCD"},
"Alarm History":{
"Alarm IDs":[],"Ack Count":0,"Ack Operator":"","Purge Count":0},"h FW Upgrade":{"wsize":64,"Backoff Divisor":2,"Backoff Delay":5,"Max Retries":4,"End Download Timeout":0},"Epn FW Upgrade":{"Final Ack Timeout":60},
"UNI-x 1":{"Max Frame Size":2000,"Duplex":"Auto","Speed":"Auto","lb":false,"Enable":true,"bd Rate Limit":200000,"st Limit":100,"lb Type":"PHY","Clear st Count":0,"ex":"Off","pc":false},
"UNI-x 2":{"Max Frame Size":2000,"Duplex":"Auto","Speed":"Auto","lb":false,"Enable":true,"bd Rate Limit":200000,"st Limit":100,"lb Type":"PHY","Clear st Count":0,"ex":"Off","pc":false},
"UNI-POTS 1":{"Enable":true},"UNI-POTS 2":{"Enable":true}}
}
All I am trying to do is replace one small value in this very complicated JSON: the value of TD ex under os0, from ["a0.c985.c0"] to ["a0.c995.c0"].
Is FreeMarker the best way to do this? I need to change only one value. Can this be done with a regex, or should I use Gson?
I can replace the value like this:
JsonObject jsonObject = new JsonParser().parse(inputJson).getAsJsonObject();
JsonElement jsonElement = jsonObject.get("data").getAsJsonObject().get("os0").getAsJsonObject().get("TD ex");
String str = jsonElement.getAsString();
System.out.println(str);
String[] strs = str.split("\\.");
String replaced = strs[0] + "." + strs[1].replaceAll("\\d+", "201") + "." + strs[2];
System.out.println(replaced);
How do I put it back and regenerate the JSON?

FreeMarker is a template engine, so it's not the tool for this. Load the JSON with a real JSON parser library (such as Jackson or Gson) into a node tree, change the value in that tree, and then use the same library to generate JSON from it. Also, avoid manipulating JSON with regular expressions: JSON (like most practical languages) can describe the same value in many ways, so writing a truly correct regular expression is impractical.
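The parse, modify, serialize cycle described above can be sketched as follows. This is only an illustration in Python with the standard json module, not the Gson code from the question; the input is a trimmed-down stand-in for the real document, and the helper name replace_td_ex is hypothetical.

```python
import json

# Trimmed-down stand-in for the document in the question (assumption:
# only the keys needed for the illustration are kept).
input_json = (
    '{"status":"success","data":{"_id":"ABCD",'
    '"os0":{"Enable":true,"TD ex":["a0.c985.c0"]}}}'
)


def replace_td_ex(raw: str, new_value: str) -> str:
    """Parse the JSON, overwrite data.os0["TD ex"][0], and serialize it again."""
    tree = json.loads(raw)                      # text -> node tree
    tree["data"]["os0"]["TD ex"][0] = new_value  # change one value in the tree
    return json.dumps(tree, separators=(",", ":"))  # node tree -> text


output_json = replace_td_ex(input_json, "a0.c995.c0")
print(output_json)
```

Everything other than the one replaced element passes through the round trip unchanged, which is the main advantage over editing the text with a regular expression.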

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U57') dtype('<U57') dtype('<U57')

I am using Great Expectations for pipeline testing.
I have one DataFrame batch of type:
great_expectations.dataset.pandas_dataset.PandasDataset
I want to build a validation expression dynamically,
i.e.
batch.validationtype("columnname","value"), in which
validationtype, columnname and value come from a JSON file.
JSON structure:
{
"column_name": "sex",
"validation_type": "expect_column_values_to_be_in_set",
"validation_value": ["MALE","FEMALE"]
},
When I build this expression, I get the error message shown below.
Code:
def add_validation(self, batch, validation_list):
    for d in validation_list:
        expression = ("." + d["validation_type"] + "(" + d["column_name"] + "," +
                      str(d["validation_value"]) + ")")
        print(expression)
        batch + expression  # this line raises the TypeError shown below
    batch.save_expectation_suite(discard_failed_expectations=False)
    return batch
Output:-
print statement output
.expect_column_values_to_be_in_set(sex,['MALE','FEMALE'])
Error:-
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U57') dtype('<U57') dtype('<U57')

In great_expectations, the expectation_suite object is designed to capture all of the information necessary to evaluate an expectation. So, in your case, the most natural thing to do would be to translate the source json file you have into the great_expectations expectation suite format.
The best way to do that will depend on where you're getting the original JSON structure from -- you'd ideally want to do the translation as early as possible (maybe even before creating that source JSON?) and keep the expectations in the GE format.
For example, if all of the expectations you have are of the type expect_column_values_to_be_in_set, you could do a direct translation:
expectations = []
for d in validation_list:
    expectation_config = {
        "expectation_type": d["validation_type"],
        "kwargs": {
            "column": d["column_name"],
            "value_set": d["validation_value"]
        }
    }
    # collect each config so it ends up in the suite below
    expectations.append(expectation_config)
expectation_suite = {
    "expectation_suite_name": "my_suite",
    "expectations": expectations
}
On the other hand, if you are working with a variety of different expectations, you would also need to make sure that the validation_value in your JSON gets mapped to the right kwargs for the expectation (for example, if you use expect_column_values_to_be_between, then you actually need to provide min_value and/or max_value).
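A minimal sketch of such a mapping is shown here. The KWARGS_BUILDERS table, the build_expectation_suite helper, the "age" entry, and the convention that a between-check stores its bounds as a [min, max] pair are all assumptions made for illustration; only the expectation names and their value_set, min_value and max_value arguments come from Great Expectations itself.

```python
# Hypothetical table: expectation type -> function that turns the JSON
# "validation_value" into that expectation's keyword arguments.
KWARGS_BUILDERS = {
    "expect_column_values_to_be_in_set": lambda v: {"value_set": v},
    # Assumption: the JSON stores the bounds as a two-element [min, max] list.
    "expect_column_values_to_be_between": lambda v: {"min_value": v[0], "max_value": v[1]},
}


def build_expectation_suite(validation_list, suite_name="my_suite"):
    """Translate the source JSON entries into an expectation-suite dictionary."""
    expectations = []
    for d in validation_list:
        builder = KWARGS_BUILDERS.get(d["validation_type"])
        if builder is None:
            raise ValueError(f"unsupported validation_type: {d['validation_type']}")
        kwargs = {"column": d["column_name"], **builder(d["validation_value"])}
        expectations.append({"expectation_type": d["validation_type"], "kwargs": kwargs})
    return {"expectation_suite_name": suite_name, "expectations": expectations}


validation_list = [
    {"column_name": "sex",
     "validation_type": "expect_column_values_to_be_in_set",
     "validation_value": ["MALE", "FEMALE"]},
    # Made-up second entry to exercise the min/max mapping.
    {"column_name": "age",
     "validation_type": "expect_column_values_to_be_between",
     "validation_value": [0, 120]},
]
suite = build_expectation_suite(validation_list)
print(suite)
```

As a side note, the original TypeError comes from batch + expression, which tries to add a string to the DataFrame. If calling the method by name is still preferred, Python's built-in getattr(batch, d["validation_type"]) returns the bound method, which can then be called with the column and value.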

CFML/JS Creating nested JSON/Array from plain SQL

I would like to build a tree structure from a plain JSON array.
The usual depth is approx. 6-7 (max. 10), and there are about 5,000 records.
My input JSON looks like this:
[3,"01","GruppenAnfangHook",1,0,1,0,"Installationsmaterial",1.0,"",null,null,0.0,-1.0,null,803.0300,803.0300,0.00000,1,1]
[5,"01.001","JumboAnfangHook",3,0,3,0,"MBS Wandler 1.000",6.0,"St",null,null,0.0,-6.0,0.0000,336.7800,56.1300,0.00000,2,2],
[38,"","ArtikelHook",3,5,3,0,"ASK 61.4 1000/5A 5VA Kl.1 Preis lt. Hr. K am 16.05.17",6.0,"stk",6.0,6.0,0.0,-6.0,null,21.5000,21.5000,0.00000,3,3]
But I need it structured with children, like this:
{"0":34,"1":"02.003","2":"JumboBegin","3":26,"4":0,"5":26,"6":0,"children":[
{"0":36,"1":"","2":"Article","3":26,"4":34,"5":26,"6":"0","7":"Artikel"},
{"0":35,"1":"","2":"JumboEnd","3":26,"4":34,"5":26,"6":"0","7":"Stunde"}
]}
My best approach so far was to build the child-structure with the following JS function in the frontend
function nest(data, parentId = 0) {
    return data.reduce((r, e) => {
        let obj = Object.assign({}, e)
        if (parentId == e[4]) {
            let children = nest(data, e[0])
            if (children.length) obj.children = children
            r.push(obj)
        }
        return r;
    }, [])
}
It works well and fast (< 1 s) with a small number of records (< 500), but my browser begins to freeze at 2,000 and above.
My thought was that this is too much data, so I tried to solve it in the CFML backend.
Since I'm new to recursion, Ben Nadel's blog helped me a lot; I used his post about recursion and created a working example with sample data.
q = queryNew("id,grpCol,jumCol,leiCol,name,typ,order");
The grpCol is level 0. Up to 5 groups can be nested inside each other, and two kinds of containers (jumCol and leiCol) can be placed in those groups. The containers can be nested inside each other too, but not inside themselves.
But now I am failing to convert it to an array of structures with child members. The structure of the HTML tree generated as output in the example is exactly what I want for my frontend JSON.
Because of the recursion, I don't understand how to store the result in an array outside of the function.
My goal is a final return of serializeJSON(array).
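One likely reason for the freeze is that nest rescans the whole array once per node, so the work grows roughly quadratically with the number of records. A non-recursive alternative indexes every row by its id and then attaches each row to its parent in a single pass. The sketch below is in Python rather than CFML or JS, purely as an illustration; it assumes, as the nest function does, that index 0 holds the id and index 4 the parent id, and the sample rows are the question's rows truncated to their first seven fields.

```python
def nest(rows, root_parent_id=0):
    """Build the tree in two linear passes instead of one scan per node."""
    # Pass 1: turn each row into an object keyed "0", "1", ... and index it by id.
    nodes = {row[0]: {str(i): v for i, v in enumerate(row)} for row in rows}
    roots = []
    # Pass 2: hang each node under its parent, or collect it as a root.
    for row in rows:
        node = nodes[row[0]]
        parent_id = row[4]
        if parent_id == root_parent_id:
            roots.append(node)
        elif parent_id in nodes:
            nodes[parent_id].setdefault("children", []).append(node)
    return roots


# Rows from the question, truncated to their first seven fields.
rows = [
    [3, "01", "GruppenAnfangHook", 1, 0, 1, 0],
    [5, "01.001", "JumboAnfangHook", 3, 0, 3, 0],
    [38, "", "ArtikelHook", 3, 5, 3, 0],
]
tree = nest(rows)
print(tree)
```

Because the nodes are stored in a lookup table outside any recursion, the result array is simply built up as the loop runs, which also sidesteps the question of how to return data from nested recursive calls.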

Node-red SQL output object / Array conversion

I'm doing a SQL query in Node-Red to output a load of time/value data. This data is then passed to a web page for display in a graph.
Previously I used PHP to run the SQL query, which I'm now trying to replace. However, PHP delivers the query results in a different format.
With Node-Red, I get:
[
{
"Watts": 1018,
"Time": 1453825454
},
{
"Watts": 1018,
"Time": 1453825448
},
{
"Watts": 1010,
"Time": 1453825442
}]
With PHP, I get:
[
[1453819620000,962],
[1453819614000,950],
[1453819608000,967],
[1453819602000,947]
]
I think I'm getting an array of arrays from PHP and an array of JSON objects from Node-Red. How do I convert the Node-Red objects so that Node-Red outputs them in the same format as PHP does? (I.e., I want to handle the processing at the server rather than the client.)

A function node can be used to generate something in the same format.
var array = msg.payload;
var phpFormat = "[";
for (var i = 0; i < array.length; i++) {
    phpFormat += "[" +
        // time formats differ: the Node-Red data is in seconds,
        // the PHP data is in milliseconds
        (array[i].Time * 1000) +
        "," +
        array[i].Watts + "],";
}
// take the last "," off
phpFormat = phpFormat.substring(0, phpFormat.length - 1);
phpFormat += "]";
msg.payload = phpFormat;
return msg;

I've had a bit of help from a chap at work and here is what he's come up with, modified for node-red by me:
var outputArray = [];
for (var i in msg.payload) {
    var entryData = [msg.payload[i]['Time']];
    for (var attr in msg.payload[i]) {
        if (attr != 'Time') {
            entryData.push(msg.payload[i][attr]);
        }
    }
    outputArray.push(entryData);
}
var returnMsg = {"payload": outputArray};
return returnMsg;

I know, I know, this question is over 2 years old... however, for the next 500 people seeking an answer to a similar problem, I'd like to highlight the new JSONata expression feature built-in to the change node. Using this simple expression:
payload.[Time, Watts]
transforms your JS objects into the requested output of an array of arrays. In fact, much of my old repetitive looping through arrays has been replaced with some simpler (to me) expressions like this.
The magic of the lambda syntax evaluator is documented on the JSONata site. There you will also find the online exerciser where you can build an expression against your own data and immediately see the resulting structure.
Note: in order to use a jsonata expression in your change node, be sure to select the J: pulldown next to the input field (not the {} JSON option)... two totally different things!

How do I ensure SerializeJSON keeps trailing/leading zeroes?

EDIT 3: The problem below exists in ColdFusion 9.0; updating to 9.0.1 does indeed fix it.
I have an application that is using SerializeJSON to encode query results:
#SerializeJSON('Ok works fine')#
Unfortunately it trims the trailing zeroes from numbers:
#SerializeJSON(12345.50)#
If I manually make the same value a string, the same thing occurs:
#SerializeJSON('12345.50')#
How can I prevent this from happening?
EDIT - my scenario specifics
Database (Oracle) has these example values stored on a row
benefactor_id : 0000729789 varchar2(10)
life_gift_credit_amt : 12345.50 number(14,2)
When I query using ColdFusion 9.0.1 (cfscript, if it matters), here is an RC dump. Notice that the id string retains its leading zeroes, but the number column has lost its trailing zero.
While that is interesting, it doesn't matter to the original issue: I can create a query manually that retains the trailing zero, as below, and it still gets lost in serializeJSON.
I take the query results and encode the values using serializeJSON. The JSON is consumed by jQuery DataTables via Ajax. Notice that the id string has become a number and has had '.0' added, as Miguel-F mentioned.
<cfscript>
...
rc.sql = q.setsql;
rc.qResult = q.execute().getresult();
savecontent variable="rc.aaData" {
    for (i=1; i <= rc.qResult.RecordCount; i++) {
        writeOutput('{');
        for (col=1; col <= iColumnsLen; col++) {
            // the following line contains a conditional specific to this example
            writeOutput('"#aColumns[col]#":#SerializeJSON(rc.qResult[aColumns[col]][i])#');
            //former statement, discarded due to not being able to handle apostrophe's ... writeOutput('"#jsStringFormat(rc.qResult[aColumns[col]][i])#"');
            writeOutput((col NEQ iColumnsLen) ? ',' : '');
        }
        writeOutput('}');
        writeOutput((i NEQ rc.qResult.RecordCount) ? ',' : '');
    }
};
</cfscript>
I was originally using jsStringFormat instead of serializeJSON, but this would return invalid JSON because the comments text area contains apostrophes, etc.
{
"sEcho": 1,
"iTotalRecords": 65970,
"iTotalDisplayRecords": 7657,
"aaData": [
{
"nd_event_id": 525,
"benefactor_id": 729789.0,
"seq_number": 182163,
"life_gift_credit_amt": 12345.5,
"qty_requested": 2,
"b_a_comment": "#swap",
"pref_mail_name": "Jay P. Rizzi"
}
]
}
EDIT 2
A quick side note: if I change my serialization line to
writeOutput('"#aColumns[col]#": "#SerializeJSON(rc.qResult[aColumns[col]][i])#"');
then every value in the result set is wrapped in double quotes, but strings end up double-double-quoted, and the trailing zero is still removed. This leads me to believe serializeJSON is casting the value to a type?
"aaData": [
{
"nd_event_id": "525",
"benefactor_id": "729789.0",
"seq_number": "182163",
"life_gift_credit_amt": "12345.5",
"qty_requested": "2",
"b_a_comment": ""#swap"",
"pref_mail_name": ""JayP.Rizzi""
},

This is a bit baffling... I tested in CF 9 as well. Not really knowing what you are doing with the serialized data (passing it to a service, outputting it on a page, etc.), I put together some test patterns. One possible solution, if you are only trying to serialize a single value: don't. You can actually run deserialize against your numeric value without serializing it, and all it does is strip the trailing 0. Otherwise, if you must serialize a single value and don't want the trailing 0 stripped, set the variable so that it contains the quotation marks:
<cfset manualserial = '"111.10"'>
<cfdump var="#DeSerializeJson(manualserial)#">
At this point you can use Deserialize and see that it maintains the 0, with an output of 111.10.
Below is some additional testing, so you can see what happens when serializing an array while trying to keep the trailing 0... no luck. However when I forwent the built in CF serialize and just created a serialized string, the trailing 0 is maintained (refer to var customarr and d_customarr in WriteDump example below).
Hope that helps a little.
<cfscript>
/*initial testing*/
string = SerializeJSON('Ok works fine');
numericstring = SerializeJSON('12345.50');
numeric = SerializeJSON(12345.50);
arr = SerializeJSON([12345.50,12345.10,'12345.20']);
arrFormat = SerializeJSON([NumberFormat(12345.50,'.00') & ' ',12345.10,'12345.20']);
d_string = DeSerializeJSON(string);
d_numericstring = DeSerializeJSON(numericstring);
d_numeric = DeSerializeJSON(numeric);
d_arr = DeSerializeJSON(arr);
d_arrFormat = DeSerializeJSON(arrFormat);
/*technically, there is no need to serialize a single string value, as running through DeSerialize just trims the trailing 0
if you need to do so, you would want to pass in as a string with quotation marks*/
customstring = '"12345.50"';
d_customstring = DeSerializeJSON(customstring);
customarr = '["12345.50","12345.10","12345.20"]'; //--you can format your own array instead of using CF to serialize
d_customarr = DeSerializeJSON(customarr);
WriteDump(variables);
</cfscript>
=======appended possible solution b========
I think that manually serializing your records may be the most stable option. Try this example; if it works, you should be able to add the function to a CFC or create a UDF for reuse. Hope it helps.
<cfscript>
q = QueryNew('nd_event_id,benefactor_id,seq_number,life_gift_credit_amt,qty_requested,b_a_comment,pref_mail_name',
'Integer,VarChar,Integer,Decimal,Integer,VarChar,VarChar');
r = queryaddrow(q,2);
querysetcell(q, 'nd_event_id', 525, 1);
querysetcell(q, 'benefactor_id', '0000729789', 1); // quoted so the VarChar value keeps its leading zeroes
querysetcell(q, 'seq_number', 182163, 1);
querysetcell(q, 'life_gift_credit_amt', 12345.50, 1);
querysetcell(q, 'qty_requested', 2, 1);
querysetcell(q, 'b_a_comment', '##swap', 1);
querysetcell(q, 'pref_mail_name', 'Jay P. Rizzi', 1);
querysetcell(q, 'nd_event_id', 525, 2);
querysetcell(q, 'benefactor_id', '0000729790', 2); // quoted so the VarChar value keeps its leading zeroes
querysetcell(q, 'seq_number', 182164, 2);
querysetcell(q, 'life_gift_credit_amt', 12345.90, 2);
querysetcell(q, 'qty_requested', 10, 2);
querysetcell(q, 'b_a_comment', '##swap', 2);
querysetcell(q, 'pref_mail_name', 'Jay P. Rizzi', 2);
WriteDump(q);
s = membershipManualSerializer(q);
public string function membershipManualSerializer(required query q){
    var jsonString = '{"aaData":[';
    var cols = listtoarray(q.columnList,',');
    for(var i=1; i lte q.recordcount; i++){
        jsonString &= "{";
        for(var c=1;c lte arraylen(cols);c++){
            jsonString &= '"' & cols[c] & '":"' & q[cols[c]][i] & '"';
            jsonString &= (c lt arraylen(cols))? ",":"";
        }
        jsonString &= (i lt q.recordcount)? "},":"}]";
    }
    jsonString &="}";
    return jsonString;
}
WriteOutput(s);
WriteDump(DeserializeJson(s));
</cfscript>
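Both workarounds above rest on the same property of JSON: a number carries no formatting, so leading and trailing zeroes survive only when the value is emitted as a string. The short sketch below shows that property with Python's standard json module. It is an illustration of JSON behaviour in general, not of ColdFusion's SerializeJSON, and the variable names are made up.

```python
import json

# A float has no notion of a trailing zero, so the serialized number loses it.
amount_as_number = json.dumps(12345.50)

# The same digits sent as a string keep every character, including the zero.
amount_as_string = json.dumps("12345.50")

# Leading zeroes survive a round trip only because the id is a JSON string.
benefactor_id = json.loads('"0000729789"')

print(amount_as_number, amount_as_string, benefactor_id)
```

The consumer then has to treat such fields as text, or convert them to numbers itself where arithmetic is needed.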

Taken from the comments
The original poster (OP) of this question initially reported that they were having this issue with ColdFusion 9.0.1. As it turned out they were actually running ColdFusion 9.0.0. This is significant because Adobe had made changes to how the SerializeJSON() function treats numbers in version 9.0.1. When the server was upgraded to version 9.0.1 these issues were resolved.
This blog post by Raymond Camden discusses the changes made in 9.0.1 - Not happy with the CF901 JSON Changes?
In that blog post he references bug 83638 that had been entered and then fixed in HotFix 1 for version 9.0.1 - Cumulative Hotfix 1 (CHF1) for ColdFusion 9.0.1
If you search the BugBase for JSON under version 9.0.1 there are several reporting the same issue as the OP.
Those reported bugs also mentioned another issue that the OP had not initially reported: a .0 was being appended to integers as well. Later in the discussion the OP confirmed that they too were seeing this behavior. This led them to check the ColdFusion version in use, and they found that it was not 9.0.1.
