Create single dictionary with jq - json

I have the following input:
[
  {
    constant_id: 5,
    object_id: 2,
    object_type: 'delimited_file',
    name: 'data_file_pattern',
    value: 'list_of_orders.csv',
    insert_date: 2021-11-23T10:24:16.568Z,
    update_date: null
  },
  {
    constant_id: 6,
    object_id: 2,
    object_type: 'delimited_file',
    name: 'header_count',
    value: '1',
    insert_date: 2021-11-23T10:24:16.568Z,
    update_date: null
  }
]
That I'd like to combine to get the following result:
{
  data_file_pattern: 'list_of_orders.csv',
  header_count: '1'
}
Basically creating a single dictionary with only the name and value keys from the input dictionaries. I believe I've done this before but for the life of me I can't figure it out again.

If you get your quoting right in the input JSON (double quotes around keys and string values), it's as simple as calling the from_entries builtin. It converts an array of objects into a single object with the given key/value pairs, taking the key from a field called key, Key, name, or Name and the value from a field called value or Value:
from_entries
{
  "data_file_pattern": "list_of_orders.csv",
  "header_count": "1"
}
Note: I believe the second field name should read header_count instead of delimited_file as you wanted to take its name from .name, not .object_type.
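For comparison, the same reshaping can be sketched in plain Python. This is not jq itself, just a minimal stand-in using the standard library, with the input quoting fixed to valid JSON and only the two fields from_entries actually looks at:

```python
import json

# The asker's data, reduced to the name/value fields, with valid JSON quoting.
raw = """[
  {"name": "data_file_pattern", "value": "list_of_orders.csv"},
  {"name": "header_count", "value": "1"}
]"""

# jq's from_entries collapses [{name, value}, ...] into a single object;
# the equivalent Python reshaping is a dict comprehension over the array.
entries = json.loads(raw)
result = {e["name"]: e["value"] for e in entries}
print(json.dumps(result, indent=2))
```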

How to read field from nested json?

This is my test JSON file.
{
  "item" : {
    "fracData" : [ ],
    "fractimeData" : [ {
      "number" : "1232323232",
      "timePeriods" : [ {
        "validFrom" : "2021-08-03"
      } ]
    } ],
    "Module" : [ ]
  }
}
This is how I read the JSON file.
starhist_test_df = spark.read.json("/mapr/xxx/yyy/ttt/dev/rawdata/Test.json", multiLine=True)
starhist_test_df.createOrReplaceTempView("v_test_df")
This query works.
df_test_01 = spark.sql("""
select item.fractimeData.number from v_test_df""")
df_test_01.collect();
Result
[Row(number=['1232323232'])]
But this query doesn't work.
df_test_01 = spark.sql("""
select item.fractimeData.timePeriods.validFrom from v_test_df""")
df_test_01.collect();
Error
cannot resolve 'v_test_df.`item`.`fractimeData`.`timePeriods`['validFrom']' due to data type mismatch: argument 2 requires integral type, however, ''validFrom'' is of string type.; line 3 pos 0;
What do I have to change, to read the validFrom field?
Dot notation for accessing values works on struct or array<struct> types.
The schema of field number inside item.fractimeData is string, and accessing it via dot notation returns an array<string>, since fractimeData is an array.
Similarly, the schema of field timePeriods inside item.fractimeData is array<struct<validFrom>>, and accessing it via dot notation wraps it in another array, resulting in a final schema of array<array<struct<validFrom>>>.
The error you get is because dot notation can work on array<struct> but not on array<array>.
Hence, flatten the result of item.fractimeData.timePeriods to get back an array<struct<validFrom>>, then apply the dot notation.
df_test_01 = spark.sql("""
select flatten(item.fractimeData.timePeriods).validFrom as validFrom from v_test_df""")
df_test_01.collect()
"""
[Row(validFrom=['2021-08-03', '2021-08-03'])]
"""

How to retrieve values from an array of json objects in PostgreSQL?

I have the following JSON:
[
  {
    "transition":"random_word",
    "from":"paris",
    "to":"porto",
    "date":{
      "date":"2020-05-28 11:51:25.201864",
      "timezone_type":3,
      "timezone":"Europe\/Paris"
    }
  },
  {
    "transition":"rainbow",
    "from":"porto",
    "to":"faro",
    "date":{
      "date":"2020-06-06 23:10:06.878539",
      "timezone_type":3,
      "timezone":"Europe\/Paris"
    }
  },
  {
    "transition":"banana",
    "from":"faro",
    "to":"rio_de_janeiro",
    "date":{
      "date":"2020-06-06 23:14:10.975099",
      "timezone_type":3,
      "timezone":"Europe\/Paris"
    }
  },
  {
    "transition":"hello",
    "from":"rio_de_janeiro",
    "to":"buenos_aires",
    "date":{
      "date":"2020-06-06 23:14:15.314370",
      "timezone_type":3,
      "timezone":"Europe\/Paris"
    }
  }
]
Imagine I want to retrieve the last stop of my traveler (the value of the key "to" from the last JSON object, here buenos_aires) and the date (here 2020-06-06 23:14:15.314370).
How should I proceed, knowing that I want to do this in PostgreSQL?
If with "last" you mean the order in which the elements show up in the array, you can use jsonb_array_length() to get the length of the array and use that to obtain the last element:
select (the_json_column -> jsonb_array_length(the_json_column) - 1) ->> 'to' as "to",
       (the_json_column -> jsonb_array_length(the_json_column) - 1) #>> '{date,date}' as "date"
from the_table
The expression jsonb_array_length(the_json_column) - 1 calculates the index of the "last" element in the array.
If your column is defined as json rather than jsonb (jsonb is usually the better choice), you need to use the equivalent json_array_length() instead.
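The same last-element logic can be checked outside the database with a plain-Python sketch. This is only an illustration of the indexing, using a trimmed copy of the question's array (two stops are enough):

```python
import json

# Trimmed copy of the array from the question.
trips = json.loads("""[
  {"transition": "banana", "from": "faro", "to": "rio_de_janeiro",
   "date": {"date": "2020-06-06 23:14:10.975099"}},
  {"transition": "hello", "from": "rio_de_janeiro", "to": "buenos_aires",
   "date": {"date": "2020-06-06 23:14:15.314370"}}
]""")

# jsonb_array_length(col) - 1 computes the index of the last element;
# in Python that is trips[len(trips) - 1] (or simply trips[-1]).
last = trips[len(trips) - 1]
print(last["to"], last["date"]["date"])
```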

How to model endpoint with different data types?

I have a model in the DB with the following structure
id
key
value_type (can be 'String', 'Integer' and 'Boolean')
string_value
int_value
bool_value
Only one of string_value, int_value, or bool_value holds a value while the other two are null, depending on the value of value_type. For example:
id:1, key:'key1', value_type:'String', string_value:'random string', int_value:null, bool_value: null
id:2, key:'key2', value_type:'Integer', string_value:NULL, int_value:777, bool_value:NULL
id:3, key:'key3', value_type:'Boolean', string_value:NULL, int_value:NULL, bool_value:false
I want to open an endpoint that allows GET for entities of this type, and I am considering the following two JSON structures for the returned object:
Option 1:
{
  id: <id>,
  key: <key>,
  value_type: <value_type>,
  value: <a string representation of the value>
}
for instance:
{
  id: 2,
  key: "key2",
  value_type: "Integer",
  value: "777"
}
Option 2:
{
  id: <id>,
  key: <key>,
  value_type: <value_type>,
  string_value: <string_value>,
  integer_value: <integer_value>,
  boolean_value: <boolean_value>
}
for instance:
{
  id: 2,
  key: "key2",
  value_type: "Integer",
  string_value: null,
  integer_value: 777,
  boolean_value: null
}
The first option is cleaner, while the second is more type-safe. Is there any standard, or considerations to weigh, before making a choice?
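To make the trade-off concrete, here is a minimal Python sketch of the two serializations. The row dict mirrors the question's schema; the helper names (to_option_1, to_option_2) are hypothetical, not from any framework:

```python
# Hypothetical row shaped like the table in the question.
row = {"id": 2, "key": "key2", "value_type": "Integer",
       "string_value": None, "int_value": 777, "bool_value": None}

def to_option_1(r):
    # One stringified 'value' field; the client must re-parse it
    # according to value_type. (False is kept: the test is "is not None".)
    value = next(v for v in (r["string_value"], r["int_value"], r["bool_value"])
                 if v is not None)
    return {"id": r["id"], "key": r["key"],
            "value_type": r["value_type"], "value": str(value)}

def to_option_2(r):
    # Three typed fields, two of which are always null; the types
    # survive serialization, at the cost of a wider payload.
    return {"id": r["id"], "key": r["key"], "value_type": r["value_type"],
            "string_value": r["string_value"],
            "integer_value": r["int_value"],
            "boolean_value": r["bool_value"]}
```

With option 1 the integer arrives as the string "777" and needs client-side conversion; with option 2 it arrives as the number 777, but every response carries two null fields.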

What do I call the object-hierarchical "path" of JSON attributes / properties / members and what is their standardized name?

Consider I have following typed JSON objects:
Parent: {
  "field1" : "Value of field1",
  "fieldC" : {Child}
}
Child: {
  "field2" : "Value of field2"
}
Q: What do I call field1 and field2?
Just Strings?
Q: What to i call the "path" fieldC.field2?
Accessor path?
Field path?
Member hierarcy path?
field1 and field2 are just strings (object keys).
[ ... ] is an array; its elements can be any JSON values.
Beyond that, JSON has numbers (integer or decimal, positive or negative, optionally with an exponent e), the booleans true/false, and null.
{Child} is an object. I don't think the path has an official name (I'd say that's opinion-based); maybe field path, but fieldC really just names a child object. A key is always a string, and a value is an object, array, string, boolean, null, or number.
All the possibilities in one example:
{
  "string": "string-value",
  "nulltype": null,
  "child_object": {
    "boolean": true,
    "any_decimal_int": -1.5e3
  },
  "array_values": [
    {
      "any_value": true
    },
    {
      "any_value": false
    }
  ]
}
Of course you can combine these and nest child objects and arrays without limit :)
jsonapi.org seems to refer to field1, fieldC, and field2 as member names, which I find much more descriptive than just 'strings'.
As mentioned in my comment on the first answer, I'll personally use (hierarchical) property path or just (object) member hierarchy when writing out an object-hierarchical property/attribute/member 'path' such as fieldC.field2. There seems to be a lot of room for interpretation here. : ]

Can you store numbers, bools, etc in a postgres jsonb field?

I have one field in my data model that can be practically anything: a number, a string, a bool, or a complex object.
Can I store this in postgres as jsonb?
 id | response (jsonb)
----+---------------------------------------------
  1 | "hello"
  2 | 3
  3 | { "firstName": "bob", "lastName": "wilson" }
  4 | true
I'm currently getting this error when I try to save a numeric json value into that column:
column "response" is of type jsonb but expression is of type text
Is it only possible if I save it in an object structure, like this?
{ "value" : "hello" }
{ "value" : 3 }
{ "value" : ... }
I'd rather do it the first way if it's possible.
drop table if exists t;
create table t (j jsonb);
insert into t (j) values
  (to_json(1)::jsonb),
  (to_json('hello'::text)::jsonb),
  (to_json(true)::jsonb);
select * from t;
    j
---------
 1
 "hello"
 true
Notice that the string must be cast before being converted to json.
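A client-side sketch of the same idea: serializing each value to its JSON text first produces exactly the literals PostgreSQL accepts when cast to jsonb (e.g. '"hello"'::jsonb, '3'::jsonb, 'true'::jsonb), which avoids the "expression is of type text" error. This uses only the standard library, not a database driver:

```python
import json

# json.dumps renders each Python value as a standalone JSON literal;
# note the string gains its own double quotes, and True becomes true.
values = ["hello", 3, True, {"firstName": "bob", "lastName": "wilson"}]
literals = [json.dumps(v) for v in values]
print(literals)
```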
It turns out that the original JSON standard (RFC 4627) required the top-level value to be an array or object, even though numbers, bools, and strings have valid JSON representations; newer specifications (RFC 8259) and PostgreSQL's jsonb do accept scalar top-level values, as the answer above shows.
To stay compatible with strict parsers, I decided to store it as an object instead.