Parsing JSON without key names to retrieve a column - json

I am loading JSON from data.gov that does not have key names for the values in the data, e.g. below; the metadata is available separately.
I am able to load the JSON into a variant column, but cannot see how to parse and query for specific columns, e.g. "Frankford" below. I have tried JSONcol:data[0], which returns the entire entry, but I am unable to see how to specify column 4, say.
{
data: [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\"address\": \"4509 BELAIR ROAD\", \"city\": \"Baltimore\", \"state\": \"MD\", \"zip\": \"\"}", null, null, null, true ], null, null, null ]]
}
The following code is used to create and load the snowflake table:
create or replace table snowpipe.public.snowtable(jsontext variant);
copy into snowpipe.public.snowtable
from @snowpipe.public.snowstage
file_format = (type = 'JSON')
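Concretely, the query being attempted looks something like this (a sketch against the table above; jsontext:data[0] comes back as the whole inner array rather than a single value):
select jsontext:data[0] as first_entry
from snowpipe.public.snowtable;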

Not exactly sure how your variant data looks once you have loaded it, but here is some experimenting on a variant built via PARSE_JSON from your object (I had to double-slash the \ to make it valid SQL).
select
PARSE_JSON('{ data: [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\\"address\\": \\"4509 BELAIR ROAD\\", \\"city\\": \\"Baltimore\\", \\"state\\": \\"MD\\", \\"zip\\": \\"\\"}", null, null, null, true ], null, null, null ]]}') as j
,j:data as jd
,jd[0] as jd0
,jd0[3] as jd0_3
,array_slice(j:data[0],3,5) as jd0_3to4
;
This shows that you can use [0] notation to index the arrays, and thus get the results:
J: { "data": [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\"a...
JD: [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\"address\": \"4509 BELAIR ROAD\", \"city\": \"...
JD0: [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\"address\": \"4509 BELAIR ROAD\", \"city\": \"Baltimore\", \"state\": \"MD\", \"...
JD0_3: 1486063689
JD0_3TO4: [ 1486063689, null ]
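Applied to the table loaded above, the same notation should work directly against the variant column (a sketch, assuming the create/copy statements from the question; "Frankford" sits at 0-based index 10 of the inner array):
select jsontext:data[0][10]::text as value_at_10   -- "Frankford" in the sample row
from snowpipe.public.snowtable;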
So if you have an unknown number of first-level elements in data that you want to access, then use LATERAL FLATTEN like so:
WITH data as (
select PARSE_JSON('{ data: [ [ "row-1", "0B8", 0 ],["row-2", "F94", 2],
["row-3", "EE5", 4]]}') as j
)
select f.value[0]::text as row_name
,f.value[1]::text as serial_number
,f.value[2]::number as num
from data d,
lateral flatten(input=> d.j:data) f;
gives:
ROW_NAME SERIAL_NUMBER NUM
row-1 0B8 0
row-2 F94 2
row-3 EE5 4
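The same pattern can be pointed at the table from the question (again a sketch, assuming the snowtable/jsontext names from the create statement and the element positions shown in the sample row):
select f.value[0]::text  as row_id
      ,f.value[10]::text as col_10   -- "Frankford" in the sample row
      ,f.value[12]::text as col_12   -- "NORTHEASTERN" in the sample row
from snowpipe.public.snowtable t
    ,lateral flatten(input => t.jsontext:data) f;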

Parse the following JSON using google apps script

Hi everyone, it's been several days now that I have been trying to parse the following JSON using Google Apps Script.
[
{
"NOMBRE": "ViejosNoUsarEl Quebrachal",
"ACTIVO": false,
"CODIGO": "ViejosNoUsarQUEB",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "Jujuy",
"LOCALIDAD": "EL MORRO",
"ZONA": null,
"SUPERFICIE": 3900,
"CODIGOEXTERNO": ""
},
{
"NOMBRE": "ViejoNoUsarSanta Teresa",
"ACTIVO": false,
"CODIGO": "ViejoNoUsarST",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "San Luis",
"LOCALIDAD": "Villa MercedesOLD",
"ZONA": "Oeste",
"SUPERFICIE": 3700,
"CODIGOEXTERNO": ""
},
{
"NOMBRE": "ViejosNoUsarGil",
"ACTIVO": false,
"CODIGO": "ViejosNoUsarGIL",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "Cordoba",
"LOCALIDAD": "9 DE JULIO",
"ZONA": "Oeste",
"SUPERFICIE": 200,
"CODIGOEXTERNO": ""
},
{
"NOMBRE": "ViejosNoUsarDon Manuel",
"ACTIVO": false,
"CODIGO": "ViejosNoUsarDM",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "Cordoba",
"LOCALIDAD": "9 DE JULIO",
"ZONA": "Oeste",
"SUPERFICIE": 400,
"CODIGOEXTERNO": ""
}
]
The GET response is giving me the JSON as I posted it.
Using Google Apps Script, I want to add to a Google Sheet as many rows as there are objects in the array.
In this case there would be 4 Google Sheet rows. I want to parse only the values of the properties.
As an example, the first row would look like this:
ViejosNoUsarEl Quebrachal | false | ViejosNoUsarQUEB | null | null | Jujuy | EL MORRO | null | 3900 |
I want to focus this question on the parsing matter, not on adding the rows to the Google Sheet yet.
The problem is that I can't get the dot notation to extract the values I want.
For example, Logger.log(response.provincia); prints "Information null".
Modification points:
From your sample data and For example, Logger.log(response.provincia); prints "Information null"., I think the reason for your issue is that you are trying to retrieve the values from an array using response.provincia. In this case, it needs to be response[i].PROVINCIA, where i is the index into the array. If you want to retrieve the value of "PROVINCIA" from the 1st element of the array, you can use response[0].PROVINCIA. Also, from your data, provincia needs to be PROVINCIA; when response[0].provincia is run, undefined is returned. Please be careful about this.
When you want to retrieve the values like ViejosNoUsarEl Quebrachal | false | ViejosNoUsarQUEB | null | null | Jujuy | EL MORRO | null | 3900 | in order, the values can be retrieved by preparing the keys in that order.
When these points are reflected in a sample script, it becomes as follows.
Sample script:
const keys = ["NOMBRE", "ACTIVO", "CODIGO", "CALLE", "NUMERO", "PROVINCIA", "LOCALIDAD", "ZONA", "SUPERFICIE", "CODIGOEXTERNO"];
const response = [
{
"NOMBRE": "ViejosNoUsarEl Quebrachal",
"ACTIVO": false,
"CODIGO": "ViejosNoUsarQUEB",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "Jujuy",
"LOCALIDAD": "EL MORRO",
"ZONA": null,
"SUPERFICIE": 3900,
"CODIGOEXTERNO": ""
},
{
"NOMBRE": "ViejoNoUsarSanta Teresa",
"ACTIVO": false,
"CODIGO": "ViejoNoUsarST",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "San Luis",
"LOCALIDAD": "Villa MercedesOLD",
"ZONA": "Oeste",
"SUPERFICIE": 3700,
"CODIGOEXTERNO": ""
},
{
"NOMBRE": "ViejosNoUsarGil",
"ACTIVO": false,
"CODIGO": "ViejosNoUsarGIL",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "Cordoba",
"LOCALIDAD": "9 DE JULIO",
"ZONA": "Oeste",
"SUPERFICIE": 200,
"CODIGOEXTERNO": ""
},
{
"NOMBRE": "ViejosNoUsarDon Manuel",
"ACTIVO": false,
"CODIGO": "ViejosNoUsarDM",
"CALLE": null,
"NUMERO": null,
"PROVINCIA": "Cordoba",
"LOCALIDAD": "9 DE JULIO",
"ZONA": "Oeste",
"SUPERFICIE": 400,
"CODIGOEXTERNO": ""
}
];
const values = response.map(o => keys.map(h => o[h]));
console.log(values)
When this script is run, the values are returned as a 2-dimensional array. This can be put into the Spreadsheet using setValues.
Reference:
map()

Retrieving sub-fields from parsed JSON in snowflake

I'm having some difficulty getting the individual components of the address field.
with data as (select
PARSE_JSON('{ "data" : [
[ "row-ea6u~fkaa~32ry", "00000000-0000-0000-01B7-0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\"address\": \"4509 BELAIR ROAD\", \"city\": \"Baltimore\", \"state\": \"MD\", \"zip\": \"\"}", null, null, null, true ], null, null, null ]
]}') as j
)
select f.value[1][0]::text
from data d,
lateral flatten(input=> d.j:data,recursive=>TRUE) f;
f.value[1][0] has a field address
{"address": "4509 BELAIR ROAD", "city": "Baltimore", "state": "MD", "zip": ""}
but
f.value[1][0].address returns null
How do I get the individual attributes of f.value[1] like address, city, etc?
The problem is that, given you have three levels of nested data, you should not be using recursive=>TRUE, as the objects are not the same shape, so you cannot make anything of value out of the flattened output. You need to break the different layers apart manually.
with data as (
select
PARSE_JSON('{ data: [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\\"address\\": \\"4509 BELAIR ROAD\\", \\"city\\": \\"Baltimore\\", \\"state\\": \\"MD\\", \\"zip\\": \\"\\"}", null, null, null, true ], null, null, null ]]}') as j
), data_rows as (
select f.value as r
from data d,
lateral flatten(input=> d.j:data) f
)
select dr.r[0] as v0
,dr.r[1] as v1
,dr.r[2] as v2
,dr.r[3] as v3
,f.value as addr_n
from data_rows dr,
lateral flatten(input=> dr.r[13]) f;
So this gets all the rows (of which your example has only one), then unpacks the values of interest (you will need to complete this part and give v0 - vN meaning), but there is an array of addresses:
V0 V1 V2 V3 ADDR_N
"row-ea6u~fkaa~32ry" "0B8F94EE5292" 0 1486063689 "{\"address\": \"4509 BELAIR ROAD\", \"city\": \"Baltimore\", \"state\": \"MD\", \"zip\": \"\"}"
"row-ea6u~fkaa~32ry" "0B8F94EE5292" 0 1486063689 null
"row-ea6u~fkaa~32ry" "0B8F94EE5292" 0 1486063689 null
"row-ea6u~fkaa~32ry" "0B8F94EE5292" 0 1486063689 null
"row-ea6u~fkaa~32ry" "0B8F94EE5292" 0 1486063689 true
Now, to decode the address as JSON, parse_json(f.value) as addr_n does that, so you can break it apart like:
with data as (
select
PARSE_JSON('{ data: [ [ "row-ea6u~fkaa~32ry", "0B8F94EE5292", 0, 1486063689, null, 1486063689, null, "{ }", "410", "21206", "Frankford", "2", "NORTHEASTERN", [ "{\\"address\\": \\"4509 BELAIR ROAD\\", \\"city\\": \\"Baltimore\\", \\"state\\": \\"MD\\", \\"zip\\": \\"\\"}", null, null, null, true ], null, null, null ]]}') as j
), data_rows as (
select f.value as r
from data d,
lateral flatten(input=> d.j:data) f
)
select dr.r[0] as v0
,dr.r[1] as v1
,dr.r[2] as v2
,dr.r[3] as v3
,parse_json(f.value) as addr_n
,addr_n:address::text as addr_address
,addr_n:city::text as addr_city
,addr_n:state::text as addr_state
,addr_n:zip::text as addr_zip
from data_rows dr,
lateral flatten(input=> dr.r[13]) f;
You can either leave the addr_n dummy column in or swap it out by cut'n'pasting it like so:
,parse_json(f.value):address::text as addr_address
,parse_json(f.value):city::text as addr_city
,parse_json(f.value):state::text as addr_state
,parse_json(f.value):zip::text as addr_zip
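If you only want the element of dr.r[13] that actually holds the address string (in the sample the other elements are nulls and a boolean), one option is to filter the flatten output; a sketch, assuming that shape holds for every row, where the final SELECT of the query above becomes:
select parse_json(f.value):address::text as addr_address
      ,parse_json(f.value):city::text    as addr_city
      ,parse_json(f.value):state::text   as addr_state
      ,parse_json(f.value):zip::text     as addr_zip
from data_rows dr,
     lateral flatten(input=> dr.r[13]) f
where is_varchar(f.value);   -- keep only the element that is a JSON string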
You can follow this article for a step-by-step guide:
https://community.snowflake.com/s/article/Using-lateral-flatten-to-extract-data-from-JSON-internal-field
Hope this helps!

TSQL JSON_QUERY can you use a filter in the JSON Path

I have a table with a column that holds valid JSON; here's an example of the JSON structure:
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}
I need to return the values from the MultiSelected collection depending on the value of ListId.
I'm using the following JSON path to return the values:
$.Requirements."$values"[?(@.ListId=='956cf9c5-24ab-47d9-8082-940118f2f1a3')].MultiSelected."$values"
This worked fine in a JSON Expression tester.
But when I try to use it to query the table I get the following error:
JSON path is not properly formatted. Unexpected character '?' is found at position 25.
The query I'm using is as follows:
SELECT ID AS PayloadID,
Items.Item AS ItemsValues
FROM dbo.Payload
CROSS APPLY ( SELECT *
FROM OPENJSON( JSON_QUERY( Payload, '$.Requirements."$values"[?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')].MultiSelected."$values"' ) )
WITH ( Item nvarchar(200) '$' ) ) AS Items
WHERE ID = 3
I've tried replacing
?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')
with 0 and it works fine on SQL Server.
My question is, is filter syntax ?(...) supported in JSON_QUERY or is there something else I should be doing?
The database is running on Azure, where the database compatibility level is set to SQL Server 2017 (140).
Thanks for your help in advance.
Andy
SQL Server's JSON path syntax does not support the ?() filter expressions, so I would use OPENJSON twice instead:
drop table if exists #payload
create table #payload(ID int,Payload nvarchar(max))
insert into #payload VALUES
(3,N'
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}'
)
SELECT ID AS PayloadID,
Items.[value]
FROM #Payload a
CROSS APPLY OPENJSON( Payload, '$.Requirements."$values"' ) with ( ListId varchar(50),MultiSelected nvarchar(max) as json) b
CROSS APPLY OPENJSON( MultiSelected,'$."$values"' ) Items
where
a.id=3
AND b.listid='956cf9c5-24ab-47d9-8082-940118f2f1a3'
Result:
+-----------+--------+
| PayloadID | value |
+-----------+--------+
| 3 | Value1 |
| 3 | Value2 |
| 3 | Value3 |
+-----------+--------+
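If the list id needs to vary, the same query can take it as a variable rather than a hard-coded literal (a sketch against the temp table above; @ListId is just an illustrative name):
DECLARE @ListId varchar(50) = '956cf9c5-24ab-47d9-8082-940118f2f1a3';
SELECT a.ID AS PayloadID,
       Items.[value]
FROM #payload a
CROSS APPLY OPENJSON( a.Payload, '$.Requirements."$values"' )
     WITH ( ListId varchar(50), MultiSelected nvarchar(max) AS JSON ) b
CROSS APPLY OPENJSON( b.MultiSelected, '$."$values"' ) Items
WHERE a.ID = 3
  AND b.ListId = @ListId;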

Converting 1-to-many json into csv

I'm trying to parse json output from an API call. The output has an array of orders, and each order has an array of items. I want to parse the output such that I have a single CSV output of each individual item with its parent order ID.
So if a single order contains multiple items, I need the orderID repeated for each item in its order. I've read the jq documentation and dozens of samples, and I've tried some trial and error for hours. I'm SO confused as to how to do this.
I'm struggling very much with the jq parsing syntax. None of the examples are really helping, and I'm just confused. Here are the basics:
curl -s 'https://api.site.com/orders?page=1&pageSize=10' | jq '.'
A sample of the json is below.
{
"orders": [
{
"orderId": 217356098,
"items": [
{
"orderItemId": 327010821,
"lineItemKey": "1",
"sku": "AJC-C10S",
"name": "TestDescription",
"imageUrl": null,
"weight": null,
"quantity": 2,
"unitPrice": 106.85,
"taxAmount": null,
"shippingAmount": null,
"warehouseLocation": null,
"options": [],
"productId": null,
"fulfillmentSku": null,
"adjustment": false,
"upc": null,
"createDate": "2016-11-09T02:11:28.307",
"modifyDate": "2016-11-09T02:11:28.307"
},
{
"orderItemId": 327010822,
"lineItemKey": "1",
"sku": "AJC-C106",
"name": "AnotherTestDescription",
"imageUrl": null,
"weight": null,
"quantity": 2,
"unitPrice": 106.85,
"taxAmount": null,
"shippingAmount": null,
"warehouseLocation": null,
"options": [],
"productId": null,
"fulfillmentSku": null,
"adjustment": false,
"upc": null,
"createDate": "2016-11-09T02:11:28.307",
"modifyDate": "2016-11-09T02:11:28.307"
}
]
}
],
"total": 359934,
"page": 1,
"pages": 179968
}
Expected output (without column headers of course):
orderId,orderItemId,sku,name
217356098,327010821,"AJC-C10S","TestDescription"
217356098,327010822,"AJC-C106","AnotherTestDescription"
As you can see, each item has its own line, but if they came from the same order, the orderId should be repeated on each line.
How can I do this?
With the -r command-line option, the following jq filter:
.orders[]
| .orderId as $oid
| .items[]
| [$oid, .orderItemId, .sku, .name]
| @csv
produces the desired output.
If there's any chance that any of the selected values might be [], then consider adding a line like the following immediately before the last line above:
| map_values(if . == [] then "NONE" else . end)
Thanks! That worked with a slight alteration:
.orders[]
| .orderId as $oid
| .items[]
| [$oid, .orderItemId, .sku, .name | tostring]
| @csv

Extract values from inside JSON column in MySQL based on conditions

Consider a table in MySQL 5.7.x having a JSON column.
-- CREATE TABLE "plans" -----------------------------------
CREATE TABLE `plans` (
`id` VarChar( 36 ) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL,
`name` VarChar( 50 ) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`structure` JSON NOT NULL,
PRIMARY KEY ( `id` ),
CONSTRAINT `index_exam_plans_on_id` UNIQUE( `id` ),
CHARACTER SET = utf8mb4
COLLATE = utf8mb4_general_ci
ENGINE = InnoDB;
The structure column holds JSON. Please find an example of the JSON below:
{
"33aa1e1c-0c95-4860-9b71-ccd13f393dd0": {
"name": "Term 1",
"tags": [
"Term"
],
"type": "default",
"uuid": "33aa1e1c-0c95-4860-9b71-ccd13f393dd0",
"is_locked": false
},
"cb896a12-f07c-4bcc-9c22-7bdfa585f5f7": {
"name": "English",
"tags": [
"Paper",
"Course Paper"
],
"type": "course_paper",
"uuid": "cb896a12-f07c-4bcc-9c22-7bdfa585f5f7",
"course_id": 1,
"is_locked": false
},
"e6d2f9fb-0429-42b2-b704-c438e1695044": {
"name": "Written Work",
"tags": [
"Paper",
"Regular Paper"
],
"type": "regular_paper",
"uuid": "e6d2f9fb-0429-42b2-b704-c438e1695044",
"course_id": 2,
"is_locked": false
},
"d0d3eeff-9ffb-4f35-b0fb-b94373d1fe5b": {
"name": "Summative Assessment",
"tags": [
"Exam"
],
"type": "default",
"uuid": "d0d3eeff-9ffb-4f35-b0fb-b94373d1fe5b",
"is_locked": false
},
"0952e100-cd4a-473e-bd24-2370e0dfcc1c": {
"name": "Speaking skills",
"tags": [
"Paper",
"Regular Paper"
],
"type": "regular_paper",
"uuid": "0952e100-cd4a-473e-bd24-2370e0dfcc1c",
"course_id": 5,
"is_locked": false
}
}
Points to note:
The JSON is an object, containing any number of key-value pairs. The example structure has 5 key-value pairs.
The "value" in each key-value pair is another JS object of key-value pairs. Let's call this the "inner JSON".
Every inner JSON has a key called type.
In the example structure above, the first and fourth inner JSONs have default as the value for the type key.
The requirement
I want to find all the inner JSONs which have type set to default. This should be done in just SQL. For the example above, the output should be just the first and fourth inner JSONs.
How do I do this?
I have tried the following, but it does not return the inner JSONs based on the condition. It returns every inner JSON in the structure column.
select JSON_EXTRACT(structure, '$.*') from plans
where name = 'Academic'
and JSON_CONTAINS( structure->'$.*.type', '"default"' );
Could somebody give the right way to go about this?
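Since MySQL 5.7 has no JSON_TABLE, one possible direction (a sketch, not a verified answer; it assumes at most one matching inner JSON per row and that "default" only ever appears under a type key) is to locate the path of the matching type with JSON_SEARCH, strip the trailing .type, and extract the enclosing object:
select id,
       json_extract(
         structure,
         trim(trailing '.type' from
              json_unquote(json_search(structure, 'one', 'default', null, '$.*.type')))
       ) as first_default_inner   -- the first inner JSON whose type is "default"
from plans
where json_search(structure, 'one', 'default', null, '$.*.type') is not null;
For several matches per row you would need 'all' plus a way to iterate over the returned paths, or JSON_TABLE on MySQL 8.0.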