I need to parse JSON into Greenplum - json

How do I parse this JSON
{
    "35673": [
        "234",
        "357",
        "123"
    ],
    "34566": [
        "333",
        "456",
        "789"
    ]
}
into a Greenplum table, with one key and one value per row?

In your basic example there are no nested JSON objects, no arrays of JSON objects, and all of the object values are arrays of strings. Within these assumptions you can use the jsonb_path_query function with the jsonpath keyvalue() method to split your JSON object into a set of key/value pairs, and then use jsonb_array_elements_text to split the resulting values, which are always one-dimensional arrays (note that the jsonpath functions require a PostgreSQL 12 or newer base):
SELECT j->>'key' AS Key, jsonb_array_elements_text(j->'value') AS Value
FROM jsonb_path_query(your_json_object_here, '$.keyvalue()') AS j
The output for your example is:
Key Value
34566 333
34566 456
34566 789
35673 234
35673 357
35673 123
All these functions apply to the jsonb type, which is preferred over the json type; see the manual:
"The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage."
If your JSON structure is more complex, with nested objects or arrays of objects, you will have to adapt the function in the FROM clause.
If your object values are not always arrays of strings, you will have to adapt the function applied to the resulting JSON value field.

Related

Azure Data Factory - Azure Dataflow, json being converted/serialized into a different format string

For context, I have an Azure Dataflow reading from CosmosDB, where the table has two "schemas". By this I mean one format has a field called "data" that is a JSON representation of data I need. The other format also has a field called "data", but that field is just a compressed string, and I believe this is what causes the issue I'm having.
When the dataflow reads the field from the source, the JSON gets turned into a non-JSON format through some form of serialization, because when the projection gets imported the data field is read as a string and not a complex object (example below). I suspect this is because the two "data" fields have the same name.
I unfortunately cannot change how the data is stored in CosmosDB, so I cannot rename one of the fields.
Is there any way to prevent this? I would like to keep the field in JSON format, with quotes and colons etc., instead of what I have below.
Example of data in CosmosDB:
{
    "Name": "sample",
    "ValueInfo": [
        {
            "Field1": "foo",
            "Field2": "bar"
        }
    ]
}
How it looks in ADF
{
    Name=sample,
    ValueInfo=[
        {
            Field1=foo,
            Field2=bar
        }
    ]
}

Using JSON: A way to apply a number and a string on same key-value pair?

I need to define one nested JSON object which, in use case 1, stores a key with an integer value (an amount of something) and, in use case 2, a key with a string value (a UUID).
The goal is to analyse the data in later procedures.
I have now decided to put the number into quotes and get away with implicit conversion, which has been described here:
Can JSON numbers be quoted?
Example 1:
"kpiValue": {
    "type": "Driver",
    "value": 16        // => amount of something
}
Example 2:
"kpiValue": {
    "type": "Driver",
    "value": "ident"   // => UUID is handled as string
},
Any thoughts on this?
A quoted number is a string.
You can safely do this.
NB: don't forget to determine the type of the property and convert it afterwards.
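For what it's worth, a minimal Python sketch of that last step could look like the following; it assumes the kpiValue fragment above is wrapped in a complete JSON document, and the function name is purely illustrative:
import json

def typed_kpi_value(document):
    # "kpiValue" / "value" are the key names from the examples above.
    value = json.loads(document)["kpiValue"]["value"]
    if isinstance(value, str):
        try:
            return int(value)    # use case 1: a quoted amount such as "16"
        except ValueError:
            return value         # use case 2: a UUID, kept as a string
    return value                 # the value was a plain JSON number

print(typed_kpi_value('{"kpiValue": {"type": "Driver", "value": "16"}}'))     # 16
print(typed_kpi_value('{"kpiValue": {"type": "Driver", "value": "ident"}}'))  # ident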

Get multiple JSON keypair values within a JSON object only if a specific keypair value matches iterable

I am trying to pull multiple JSON key/value pairs from within any given JSON object in a JSON file produced via an API call, but only for the JSON objects that contain a specific value in one of their key/value pairs; that specific value is obtained by iterating through a list. Hard to articulate, so let me show you:
Here is a list (a fake list representing my real list) containing the "specific value within a key/value pair" I'm interested in:
mylist = ['device1', 'device2', 'device3']
And here is the json file (significantly abridged, obviously) that I am reading from my python script:
[
    {
        "name": "device12345",
        "type": "other",
        "os_name": "Microsoft Windows 10 Enterprise",
        "os_version": null
    },
    {
        "name": "device2",
        "type": "other",
        "os_name": "Linux",
        "os_version": null
    },
    {
        "name": "device67890",
        "type": "virtual",
        "os_name": "Microsoft Windows Server 2012 R2 Standard",
        "os_version": null
    }
]
I want to iterate through mylist, and for each item in mylist that matches the value of the "name" key in the JSON file, print the values of 'name' and 'os_name'. In this case, after iterating through mylist, device2 should match the JSON object containing {'name': 'device2'}, and the script should print BOTH the 'name' value ('device2') and the 'os_name' value ('Linux'), then continue with the next item in mylist.
My code, which does not work; let us assume the JSON file was loaded correctly in Python and is defined as my_json_file with type list:
for i in mylist:
    for key in my_json_file:
        if key == i:
            deviceName = i.get('name')
            deviceOS = i.get('os_name')
            print(deviceName)
            print(deviceOS)
I think the problem here is that I can't figure out how to "identify" json objects that "match" items from mylist, since the json objects are "flat" (i.e. 'device2' in my json file doesn't have 'name' and 'os_name' nested under it, which would allow me to dict.get('device2') and then retrieve nested data).
Sorry if I did not understand your question clearly, but it appears you're trying to read values off the list instead of the JSON data, since you're calling i.get instead of key.get, and key is what actually contains the information.
As for optimization, I'd recommend converting your list of devices into a set. You can then iterate through the JSON array and check whether a given name is in the set, instead of doing it the other way around. The advantage is that a set can answer a membership check in O(1) time, which brings the overall running time down to O(n), where n is the size of the JSON array.
for key in my_json_file:
    if key.get('name') in mylist:
        deviceName = key.get('name')
        deviceOS = key.get('os_name')
        print(deviceName)
        print(deviceOS)
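Putting it together, a minimal sketch of the set-based version could look like this (the devices.json filename is assumed purely for illustration; the data is the abridged JSON shown above):
import json

mylist = ['device1', 'device2', 'device3']
wanted = set(mylist)                  # O(1) membership checks

with open('devices.json') as fh:      # filename assumed for this sketch
    my_json_file = json.load(fh)      # a list of dicts, as shown above

for device in my_json_file:
    if device.get('name') in wanted:
        print(device.get('name'))     # e.g. device2
        print(device.get('os_name'))  # e.g. Linux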

Are Json values type specific

I'm trying to understand what is going on when I query data from a database, provide it as JSON and then push it to a GUI, in regard to what is converted to a string, and when.
If I make a call to a database and provide this data as JSON to the front end, my JSON may look like
{ "MyData": [ { "Name": "Anna", "Age": 50 }, { "Name": "Bob", "Age": 40 } ] }
Visually it appears as if the value for Name is a string (as it's in quote marks) and the value for Age is an integer (as it has no quote marks).
My 2 questions are
Is my understanding that the value for Name is a string and the value of Age is an integer or does JavaScript convert/cast behind the scenes?
How would I specify the type DateTime? According to The "right" JSON date format, I would actually just ensure the format is correct, but ultimately it's a string.
Is my understanding that the value for Name is a string and the value of Age is an integer...?
Almost. It's a number, which may or may not be an integer. For instance, 10.5 is a number, but not an integer.
...or does JavaScript convert/cast behind the scenes?
JavaScript and JSON are different things. JSON defines that the number will be represented by a specific format of digits, . and possibly the e or E character. Whatever environment you're parsing the JSON in will map that number to an appropriate data type in its environment. In JavaScript's case, it's an IEEE-754 double-precision binary floating point number. So in that sense, JavaScript may cast/convert the number, if the JSON defines a number that can't be accurately represented in IEEE-754, such as 9007199254740999 (which becomes the number 9007199254741000 because IEEE-754 only has ~15 digits of decimal precision).
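To make that precision point concrete, here is a tiny Python illustration (Python's int is exact, while float is the same IEEE-754 double-precision type JavaScript uses for numbers):
import json

exact = json.loads("9007199254740999")   # parsed as an exact Python int
as_double = float(exact)                 # coerced to an IEEE-754 double
print(exact)       # 9007199254740999
print(as_double)   # 9007199254741000.0 -- the last digits are rounded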
How would I specify the type DateTime? According to The "right" JSON date format, I would actually just ensure the format is correct, but ultimately it's a string.
JSON has a fixed number of types: object, array, string, number, boolean, null. You can't add your own. So what you do is encode any type meta-data you want into either the key or value, or use an object with separate properties for type and value, and enforce that agreement at both ends.
An example of doing this is the common format for encoding date/time values: "/Date(1465198261547)/". As far as the JSON parser is concerned, that's just a string, but many projects use it as an indicator that it's actually a date, with the number in the parens being the number of milliseconds since The Epoch (Jan 1 1970 at midnight, GMT).
Two concepts related to this are a replacer and a reviver: When converting a native structure to JSON, the JSON serializer you're using may support a replacer which lets you replace a value during the serialization with another value. The JSON parser you're using may support a reviver that lets you handle the process in the other direction.
For example, if you were writing JavaScript code and wanted to preserve Dates across a JSON layer, you might do this when serializing to JSON:
var obj = {
    foo: "bar",
    answer: 42,
    date: new Date()
};
var json = JSON.stringify(obj, function(key, value) {
    var rawValue = this[key];
    if (rawValue instanceof Date) {
        // It's a date, convert it to our special format
        return "/Date(" + rawValue.getTime() + ")/";
    }
    return value;
});
console.log(json);
console.log(json);
That uses the "replacer" parameter of JSON.stringify to encode dates in the way I described earlier.
Then you might do this when parsing that JSON:
var json = '{"foo":"bar","answer":42,"date":"/Date(1465199095286)/"}';
var rexIsDate = /^\/Date\((-?\d+)\)\/$/;
var obj = JSON.parse(json, function(key, value) {
    var match;
    if (typeof value == "string" && (match = rexIsDate.exec(value)) != null) {
        // It's a date, create a Date instance using the number
        return new Date(+match[1]);
    }
    return value;
});
console.log(obj);
console.log(obj);
That uses a "reviver" to detect string values in the special format and convert them back into Date instances.
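For comparison only, the same round trip can be sketched in Python, where json.dumps's default hook plays the role of the replacer and json.loads's object_hook plays the role of the reviver (the field names mirror the JavaScript example above):
import json
import re
from datetime import datetime, timezone

rex_is_date = re.compile(r"^/Date\((-?\d+)\)/$")

def encode_special(value):
    # json.dumps calls this only for values it cannot serialize natively,
    # such as datetime instances; encode them in the /Date(ms)/ format.
    if isinstance(value, datetime):
        return "/Date(%d)/" % int(value.timestamp() * 1000)
    raise TypeError("not JSON serializable: %r" % (value,))

def revive_dates(obj):
    # json.loads calls this for every decoded JSON object (dict); convert
    # any string value in the special format back into a datetime.
    for key, value in obj.items():
        if isinstance(value, str):
            match = rex_is_date.match(value)
            if match:
                obj[key] = datetime.fromtimestamp(int(match.group(1)) / 1000,
                                                  tz=timezone.utc)
    return obj

obj = {"foo": "bar", "answer": 42, "date": datetime.now(timezone.utc)}
json_text = json.dumps(obj, default=encode_special)
print(json_text)
print(json.loads(json_text, object_hook=revive_dates))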
The specific format of a JSON string can be described by JSON Schema; in particular, date-time is apparently a format already defined in JSON Schema.

JSON decoding in Perl - Maintaining the original data type

I am writing a simple Perl script to read JSON from a file and insert it into MongoDB, but I am facing issues with JSON decoding.
All non-string values in my original JSON are getting converted to object types after decode_json.
Input JSON (only part of it, since the original is huge):
{
    "_id": 2006010100000801089,
    "show_image": false,
    "event": "publish",
    "publish_date": 1136091600,
    "data_version": 1
}
JSON that gets inserted into MongoDB:
{
    "_id": NumberLong("2006010100000801089"),
    "show_image": BinData(0,"MA=="),
    "event": "publish",
    "publish_date": NumberLong(1136091600),
    "data_version": NumberLong(1)
}
I am providing a custom _id for the documents, which I want converted to the NumberLong type. That is working as expected, as you can see from the JSON above. But notice how the other non-string values for show_image, publish_date and data_version got converted to their object representations.
Is there any way I can retain the original type for these values?
Perl code snippet that does the insert:
use MongoDB;
use MongoDB::OID;
use JSON;
use JSON::XS;

while (my $record = <$source_file>) {
    my $record_decoded = decode_json($record);
    $db_collection->insert($record_decoded);
}
Perl version used v5.18.2.
I looked up JSON::XS docs but couldn't find a way to do this. Any help is appreciated. Thanks in advance!
I am very new to perl. Sorry if this is a trivial question.
I am providing a custom _id for the documents, which I want converted to the NumberLong type. That is working as expected, as you can see from the JSON above. But notice how the other non-string values for show_image, publish_date and data_version got converted to their object representations.
From your example, all of the data types actually match, aside from the boolean value for show_image, which is currently being converted to binary data.
It is expected that numeric types are displayed as NumberLong or NumberInt when queried from the mongo shell. The mongo shell uses JavaScript, which only has a single numeric type of Number (64-bit floating point). Shell helpers like NumberLong() and NumberInt() are used to represent values in MongoDB's BSON data types that do not have a native JavaScript equivalent.
Referring to my sample JSON, I want value of show_image to be inserted as false instead of BinData(0,"MA==") and publish_date to be inserted as 1136091600 instead of NumberLong(1136091600)
While it's OK to insert publish_date as a Unixtime if that suits your use case, you may find it more useful to use MongoDB's Date type instead. There are convenience methods for querying dates including Date Aggregation Operators. FYI, date fields will be displayed in the mongo shell with an ISODate() wrapper.
The boolean value for show_image definitely needs an assist, though.
If you use Data::Dumper to inspect the result from decode_json(), you will see that the show_image field is a blessed object:
'show_image' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' )
In order to get the expected boolean value in MongoDB, the recommended approach in the MongoDB module docs is to use the boolean module (see: MongoDB::DataTypes).
I couldn't find an obvious built-in option for JSON or JSON::XS to support serialising booleans to something other than the JSON emulated boolean class, but one solution would be to use the Data::Clean::Base module which is part of the Data::Clean::JSON distribution.
Sample snippet (excluding the MongoDB set up):
use Data::Clean::Base;
use boolean;

my $cleanser = Data::Clean::Base->new(
    'JSON::XS::Boolean' => ['call_func', 'boolean::boolean'],
    'JSON::PP::Boolean' => ['call_func', 'boolean::boolean']
);

while (my $record = <$source_file>) {
    my $record_decoded = decode_json($record);
    $cleanser->clean_in_place($record_decoded);
    $db_collection->insert($record_decoded);
}
Sample record as saved in MongoDB 3.0.2:
{
    "_id": NumberLong("2006010100000801089"),
    "event": "publish",
    "data_version": NumberLong("1"),
    "show_image": false,
    "publish_date": NumberLong("1136091600")
}
JSON data contains only (double-precision) numbers, strings, and the special values true, false, and null. They can be arranged in arrays or "objects" (hashes).
The MongoDB engine is converting these basic types into something more complex, but the original values are available in the hash referred to by $record_decoded, like so
$record_decoded->{_id}
$record_decoded->{show_image}
$record_decoded->{event}
$record_decoded->{publish_date}
$record_decoded->{data_version}
Is that what you wanted?
The object serialization documentation (particularly allow_tags) in JSON::XS may do something like what you want. Note, though, that this is not a standard JSON feature and will only work with JSON::XS.