Read Dynamic JSON Property

So I'm currently using JSON.NET in Visual Studio to parse my JSON, since deserialization is too slow for what I'm trying to do. I'm pulling stock information from TD Ameritrade and can request multiple stocks at the same time. The JSON result below is from pulling only one. As you can see, the first line is "TQQQ". If I were to pull more than one stock, I'd have "TQQQ", then "CEI", in separate blocks representing different objects.
Under normal deserialization, I could just deserialize into a dictionary, and it would populate the dictionary accordingly with whatever class I had written for it. However, since I need to parse line by line, is there a clean way of telling when I've arrived at the next object?
I could keep track of the very last field and then add the next line (the next ticker's name) to the dictionary, but that seems a little hacky.
I don't think any VB code is necessary besides the initial startup of creating a new JsonReader.
{
  "TQQQ": {
    "assetType": "ETF",
    "symbol": "TQQQ",
    "description": "ProShares UltraPro QQQ",
    "bidPrice": 54.59,
    "bidSize": 200,
    "bidId": "Q",
    "askPrice": 54.6,
    "askSize": 8000,
    "askId": "Q",
    "lastPrice": 54.6,
    "lastSize": 100,
    "lastId": "P",
    "openPrice": 51.09,
    "highPrice": 54.6,
    "lowPrice": 50.43,
    "bidTick": " ",
    "closePrice": 48.92,
    "netChange": 5.68,
    "totalVolume": 14996599,
    "quoteTimeInLong": 1540493136946,
    "tradeTimeInLong": 1540493136946,
    "mark": 54.6,
    "exchange": "q",
    "exchangeName": "NASDAQ",
    "marginable": true,
    "shortable": true,
    "volatility": 0.02960943,
    "digits": 4,
    "52WkHigh": 73.355,
    "52WkLow": 38.6568,
    "nAV": 0,
    "peRatio": 0,
    "divAmount": 0,
    "divYield": 0,
    "divDate": "2016-12-21 00:00:00.0",
    "securityStatus": "Normal",
    "regularMarketLastPrice": 54.6,
    "regularMarketLastSize": 1,
    "regularMarketNetChange": 5.68,
    "regularMarketTradeTimeInLong": 1540493136946,
    "delayed": true
  }
}
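For example, pulling TQQQ and CEI together would come back shaped like this (abridged, with only a couple of illustrative fields per ticker):

{
  "TQQQ": {
    "assetType": "ETF",
    "symbol": "TQQQ"
  },
  "CEI": {
    "assetType": "EQUITY",
    "symbol": "CEI"
  }
}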

Is there a clean way of being able to tell when I've arrived at the next object?
Yes, assuming you are using a JsonTextReader, you can look at the TokenType property and check whether it is JsonToken.StartObject. This corresponds to an opening brace { in the JSON. There is also a JsonToken.EndObject token type corresponding to a closing brace }, which will probably also be useful depending on how your code is written.
The typical usage pattern is something like this:
If reader.TokenType = JsonToken.StartObject Then
    While reader.Read() AndAlso reader.TokenType <> JsonToken.EndObject
        ' Process the properties of the current JSON object here.
    End While
End If
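For your specific case, here is a minimal sketch (assuming the raw response is already in a String parameter named json, which is hypothetical) that uses the reader's Depth property: property names at depth 1 are the ticker symbols, and the EndObject at depth 1 marks the end of that ticker's block.

Imports System.IO
Imports Newtonsoft.Json

Module QuoteParser
    Sub ParseQuotes(json As String)
        Using reader As New JsonTextReader(New StringReader(json))
            Dim currentTicker As String = Nothing
            While reader.Read()
                If reader.Depth = 1 AndAlso reader.TokenType = JsonToken.PropertyName Then
                    ' A top-level property name is the next ticker, e.g. "TQQQ", then "CEI".
                    currentTicker = CStr(reader.Value)
                ElseIf reader.Depth = 1 AndAlso reader.TokenType = JsonToken.EndObject Then
                    ' The matching EndObject closes that ticker's quote block.
                    currentTicker = Nothing
                End If
            End While
        End Using
    End Sub
End Module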

Related

What JSON format does STRIP_OUTER_ARRAY support?

I have a file composed of a single array containing multiple records.
{
  "Client": [
    {
      "ClientNo": 1,
      "ClientName": "Alpha",
      "ClientBusiness": [
        {
          "BusinessNo": 1,
          "IndustryCode": "12345"
        },
        {
          "BusinessNo": 2,
          "IndustryCode": "23456"
        }
      ]
    },
    {
      "ClientNo": 2,
      "ClientName": "Bravo",
      "ClientBusiness": [
        {
          "BusinessNo": 1,
          "IndustryCode": "34567"
        },
        {
          "BusinessNo": 2,
          "IndustryCode": "45678"
        }
      ]
    }
  ]
}
I load it with the following code:
create or replace stage stage.test
  url='azure://xxx/xxx'
  credentials=(azure_sas_token='xxx');

create table if not exists stage.client (json_data variant not null);

copy into stage.client
  from @stage.test/client_test.json
  file_format = (type = 'JSON' strip_outer_array = true);
Snowflake imports the entire file as one row.
I would like the COPY INTO command to remove the outer array structure and load the records into separate table rows.
When I load larger files, I hit the size limit for VARIANT and get the error Error parsing JSON: document is too large, max size 16777216 bytes.
If you can import the file into Snowflake as a single row, then you can use LATERAL FLATTEN on the Client field to generate one row per element in the array.
Here's a blog post on LATERAL and FLATTEN (or you can look them up in the Snowflake docs):
https://support.snowflake.net/s/article/How-To-Lateral-Join-Tutorial
If the format of the file is, as specified, a single object whose single property contains an array with 500 MB worth of elements, then importing it may still work; if it does, LATERAL FLATTEN is exactly what you want. But that shape is not great for data processing, so you might want to massage the data with a text-processing script first.
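For example, a minimal sketch (assuming the file was imported into the stage.client table from the question, with the whole document in its json_data column):

select c.value:ClientNo::integer  as client_no,
       c.value:ClientName::string as client_name
from stage.client,
     lateral flatten(input => json_data:Client) c;
-- returns one row per element of the Client array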
RECOMMENDATION #1:
The problem with your JSON is that it doesn't have an outer array. It has a single outer object containing a property with an inner array.
If you can fix the JSON, that would be the best solution; STRIP_OUTER_ARRAY will then work as you expect.
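For reference, a file that STRIP_OUTER_ARRAY can split would start and end with the array itself, for example (abridged from your data):

[
  {
    "ClientNo": 1,
    "ClientName": "Alpha",
    "ClientBusiness": [ { "BusinessNo": 1, "IndustryCode": "12345" } ]
  },
  {
    "ClientNo": 2,
    "ClientName": "Bravo",
    "ClientBusiness": [ { "BusinessNo": 1, "IndustryCode": "34567" } ]
  }
]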
You could also try to recompose the JSON (an ugly business) after reading it line by line with:
CREATE OR REPLACE TABLE X (CLIENT VARCHAR);
COPY INTO X FROM (SELECT $1 CLIENT FROM @My_Stage/Client.json);
User Response to Recommendation #1:
Thank you. So from what I gather, COPY with STRIP_OUTER_ARRAY can handle a file starting and ending with square brackets, and parse the file as if they were not there.
The real files don't have line breaks, so I can't read the file line by line. I will see if the source system can change the export.
RECOMMENDATION #2:
Also, if you would like to see what the JSON parser does, you can experiment with the code below; I have parsed JSON in the COPY command using similar code. Working with your JSON data in a small project can help you shape the COPY command to work as intended.
CREATE OR REPLACE TABLE SAMPLE_JSON
(
  ID INTEGER,
  DATA VARIANT
);

INSERT INTO SAMPLE_JSON (ID, DATA)
SELECT 1, parse_json('{
  "Client": [
    {
      "ClientNo": 1,
      "ClientName": "Alpha",
      "ClientBusiness": [
        {
          "BusinessNo": 1,
          "IndustryCode": "12345"
        },
        {
          "BusinessNo": 2,
          "IndustryCode": "23456"
        }
      ]
    },
    {
      "ClientNo": 2,
      "ClientName": "Bravo",
      "ClientBusiness": [
        {
          "BusinessNo": 1,
          "IndustryCode": "34567"
        },
        {
          "BusinessNo": 2,
          "IndustryCode": "45678"
        }
      ]
    }
  ]
}');
SELECT
    C.value:ClientNo AS ClientNo
    ,C.value:ClientName::STRING AS ClientName
    ,ClientBusiness.value:BusinessNo::INTEGER AS BusinessNo
    ,ClientBusiness.value:IndustryCode::INTEGER AS IndustryCode
FROM SAMPLE_JSON f
    ,TABLE(FLATTEN(f.DATA, 'Client')) C
    ,TABLE(FLATTEN(C.value:ClientBusiness, '')) ClientBusiness;
User Response to Recommendation #2:
Thank you for the parse_json example!
Trouble is, the real files are sometimes 500 MB, so the parse_json function chokes.
Follow-up on Recommendation #2:
The JSON needs to be in the NDJSON (http://ndjson.org/) format: one complete record per line. Otherwise the JSON will be impossible to parse, given how large the files can get.
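For the question's data, that target shape would look like this (one complete Client record per line):

{"ClientNo":1,"ClientName":"Alpha","ClientBusiness":[{"BusinessNo":1,"IndustryCode":"12345"},{"BusinessNo":2,"IndustryCode":"23456"}]}
{"ClientNo":2,"ClientName":"Bravo","ClientBusiness":[{"BusinessNo":1,"IndustryCode":"34567"},{"BusinessNo":2,"IndustryCode":"45678"}]}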
Hope the above helps others running into similar questions!

How can I use RegEx to extract data within a JSON document

I am no RegEx expert. I am trying to understand if I can use RegEx to find a block of data in a JSON file.
My Scenario:
I am using an AWS RDS instance with enhanced monitoring. The monitoring data is sent to a CloudWatch log stream. I am trying to make the data posted to CloudWatch visible in the log management solution Loggly.
The ingestion is no problem; I can see the data in Loggly. However, the whole message is contained in one big blob field. The field content is a JSON document. I am trying to figure out if I can use RegEx to extract only certain parts of the JSON document.
Here is a sample extract from the JSON payload I am using:
{
  "engine": "MySQL",
  "instanceID": "rds-mysql-test",
  "instanceResourceID": "db-XXXXXXXXXXXXXXXXXXXXXXXXX",
  "timestamp": "2017-02-13T09:49:50Z",
  "version": 1,
  "uptime": "0:05:36",
  "numVCPUs": 1,
  "cpuUtilization": {
    "guest": 0,
    "irq": 0.02,
    "system": 1.02,
    "wait": 7.52,
    "idle": 87.04,
    "user": 1.91,
    "total": 12.96,
    "steal": 2.42,
    "nice": 0.07
  },
  "loadAverageMinute": {
    "fifteen": 0.12,
    "five": 0.26,
    "one": 0.27
  },
  "memory": {
    "writeback": 0,
    "hugePagesFree": 0,
    "hugePagesRsvd": 0,
    "hugePagesSurp": 0,
    "cached": 505160,
    "hugePagesSize": 2048,
    "free": 2830972,
    "hugePagesTotal": 0,
    "inactive": 363904,
    "pageTables": 3652,
    "dirty": 64,
    "mapped": 26572,
    "active": 539432,
    "total": 3842628,
    "slab": 34020,
    "buffers": 16512
  },
My Question
My question is: can I use RegEx to extract, say, a subset of the document? For example, cpuUtilization or memory? If that is possible, how do I write the RegEx? If so, I could also use it to drill down into the extracted document and get individual data elements.
Many thanks for your help.
First I agree with Sebastian: A proper JSON parser is better.
Anyway, sometimes the dirty approach must be used. If your text layout will not change, then a regexp is simple:
E.g. "total": (\d+\.\d+) gets the CPU usage, and "total": (\d\d\d+) the total memory usage (match at least three digits so as not to match the first "total"; memory will probably never be less than 100 :-).
If changes are to be expected, make it a bit more stable: ["']total["']\s*:\s*(\d+\.\d+).
It may also be possible to match across line breaks like this: "cpuUtilization"\s*:\s*\{\s*\n.*\n\s*"irq"\s*:\s*(\d+\.\d+), making it a bit more stable (this time for the irq value).
And so on and so on.
You can see how quickly this gets into very complex expressions. That approach is very fragile!
P.S. Depending on the exact details of Loggly's regex flavor, the patterns may need adjusting. The examples above are based on Perl.
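To make that concrete, a minimal sketch in Perl (reading the blob from stdin; the field name and sample value come from the payload above):

#!/usr/bin/perl
use strict;
use warnings;

my $blob = do { local $/; <STDIN> };   # slurp the whole JSON message

# The hardened pattern from above; the first "total" with a decimal part
# is cpuUtilization.total (12.96 in the sample payload).
if ($blob =~ /["']total["']\s*:\s*(\d+\.\d+)/) {
    print "cpu total: $1\n";
}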

Parsing JSON object in RUBY with a wildcard?

Problem:
I'm relatively new to programming and am learning Ruby. I've worked with JSON before, but I've been stumped by this problem.
I'm taking a hash, running hash.to_json, and getting back a JSON object that looks like this:
quantity =
{
  "line_1": {
    "row": "1",
    "productNumber": "111",
    "availableQuantity": "4"
  },
  "line_2": {
    "row": "2",
    "productNumber": "112",
    "availableQuantity": "6"
  },
  "line_3": {
    "row": "3",
    "productNumber": "113",
    "availableQuantity": "10"
  }
}
I want to find the 'availableQuantity' value that's greater than 5 and return the line number.
Further, I'd like to return the line number and the product number.
What I've tried
I've searched for a way to use a wildcard in a JSON query to get past the "line_" key on each entry, but with no luck.
First, to simply identify a value for 'availableQuantity' within the JSON object greater than 5:
q = JSON.parse(quantity)
q.find {|key| key["availableQuantity"] > 5}
However, this returns the error: "{TypeError}no implicit conversion of String into Integer."
I've googled this error, but I cannot understand what it means in the context of this problem.
or even
q.find {|key, value| value > 2}
which returns the error: "undefined method `>' for {"row"=>"1", "productNumber"=>111, "availableQuantity"=>4}:Hash"
This attempt looks so simplistic I'm ashamed, but it reveals a fundamental gap in my understanding of how to loop over collections with Enumerable.
Can anyone help explain a solution, and ideally what the steps in the solution mean? For example, does the solution require an enumerable with find? Or does Ruby handle a direct query to the JSON?
This would help my learning considerably.
I want to find the 'availableQuantity' value that's greater than 5 and [...] return the line number and the product number.
First problem: your value is not a number, so you can't compare it to 5. You need to_i to convert.
Second problem: getting the line number is easiest with regular expressions. /\d+/ is "any consecutive digits". Combining that...
q.select { |key, value|
value['availableQuantity'].to_i > 5
}.map { |key, value|
[key[/\d+/].to_i, value['productNumber'].to_i]
}
# => [[2, 112], [3, 113]]

Check if JSON string in Orbeon repeating grid contains a specific value

I am working with repeating grids through the form builder.
I have a custom control whose string value is represented as JSON.
{
  "data": {
    "type": "File",
    "itemID": "12345",
    "name": "Annual Summary",
    "parentFolderID": "fileID",
    "owner": "Owner",
    "lastModifiedDate": "2016-10-17 22:48:05Z"
  }
}
In the controls outside of the repeating grid, I need to check whether name = "Annual Summary".
Previously, I had a drop-down control, and using the Calculated Value $dropdownControl = "Annual Summary", it was able to return true if any of the repeated rows contained the value. My understanding is that with the = operator, it validates against all rows.
Now, with the JSON output of the control, I am attempting to use
contains($jsonStringValue, 'Annual Summary')
However, this only works with one entry and will be null if there are multiple rows.
Two questions:
How would I validate whether "Annual Summary" (or any other text) is present within any of the repeated rows?
Is there any way to navigate the JSON, or to parse it to XML and navigate that?
Constraints: the solution must work within the Calculated Value or Visibility fields in Form Builder, or by manipulating the source that Form Builder generates.
You probably want to parse the JSON string first. See also this other Stack Overflow question.
Until Orbeon Forms 2016.3 is released, you would write:
(
  for $v in $jsonStringValue
  return converter:jsonStringToXml($v)
)//name = 'Annual Summary'
With the above, you also need to scope the namespace:
xmlns:converter="org.orbeon.oxf.json.Converter"
Once Orbeon Forms 2016.3 is released you can switch to:
$jsonStringValue/xxf:json-to-xml()//name = 'Annual Summary'

How to add nested json object to Lucene Index

I need a little help regarding Lucene index files; I thought maybe some of you could help me out.
I have JSON like this:
[
  {
    "Id": 4476,
    "UrlName": null,
    "PhoneData": [
      {
        "PhoneType": "O",
        "PhoneNumber": "0065898"
      },
      {
        "PhoneType": "F",
        "PhoneNumber": "0065898"
      }
    ],
    "Contact": [],
    "Services": [
      {
        "ServiceId": 10,
        "ServiceGroup": 2
      },
      {
        "ServiceId": 20,
        "ServiceGroup": 1
      }
    ]
  }
]
Adding the first two fields is relatively easy:
// add lucene fields mapped to db fields
doc.Add(new Field("Id", sampleData.Id.Value.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("UrlName", sampleData.UrlName.Value ?? "null" , Field.Store.YES, Field.Index.ANALYZED));
But how can I add PhoneData and Services to the index so they stay connected to the unique Id?
For indexing JSON objects I would go this way:
Store the whole value under a payload field, named for example $json. This field would be stored but not indexed.
For each (possibly nested) indexable property, create an indexable field whose name is an XPath-like expression identifying the property, for example PhoneData.PhoneType.
If it is OK for all nested properties to be indexed, then it's simple: just iterate over all of them, generating these indexable fields (see the sketch after this list).
But if you don't want to index all of them (a more realistic case), knowing which properties are indexable is another problem; in this case you could:
Accept from the client the path expressions of the index fields to be created when storing the document, or
Put JSON Schema into play to describe your data (assuming your JSON records share a common schema), and extend it with a custom property that tags which properties are indexable.
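As a minimal sketch of the first two points (C#, using the same Lucene.NET Field API as your snippet; rawJson and the sampleData.PhoneData collection are assumed to exist on your side):

// 1. Keep the full document retrievable, but not searchable.
doc.Add(new Field("$json", rawJson, Field.Store.YES, Field.Index.NO));

// 2. Flatten each nested property into a path-named, indexable field.
//    Repeated values simply become multiple fields with the same name.
foreach (var phone in sampleData.PhoneData)
{
    doc.Add(new Field("PhoneData.PhoneType", phone.PhoneType,
                      Field.Store.NO, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("PhoneData.PhoneNumber", phone.PhoneNumber,
                      Field.Store.NO, Field.Index.NOT_ANALYZED));
}

A query such as PhoneData.PhoneType:O then matches the document, and the stored Id field links the hit back to your database row.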
I have created a library that does this (and much more); maybe it can help you.
You can check it at https://github.com/brutusin/flea-db