How do I parse a nested JSON string in Snowflake?

Thanks for reading and I hope you can help me.
This is what my json string looks like. I'm struggling to find a way to parse it in Snowflake.
{"date":"2020-07-13T00:00:00.0000000","Reason":"{\"description\":\"Test\",\"alternates\":{},\"position\":10}","forename":"Tester","surname":"Test","title":"Mr","dateOfBirth":"2000-11-22T00:00:00.0000000"}
When I try PARSE_JSON(), I get the following error:
SQL Error [100069] [22P02]: Error parsing JSON: missing comma, pos 51
I'm exploring the possibility of cleansing/transforming the data before ingestion but perhaps someone out there has a better solution to deal with this issue within Snowflake.
So far I haven't been able to parse this or to write a regular expression that replaces the quote marks after the backslash.
Any help is much appreciated
Thanks!
jc

JCB,
I am unable to reproduce your issue. Here is what I am using:
WITH X AS (
SELECT PARSE_JSON($1) AS MY_JSON
FROM VALUES ($$
{
"date": "2020-07-13T00:00:00.0000000",
"Reason": "{\"description\":\"Test\",\"alternates\":{},\"position\":10}",
"forename": "Tester",
"surname": "Test",
"title": "Mr",
"dateOfBirth": "2000-11-22T00:00:00.0000000"
}
$$)
)
SELECT MY_JSON
FROM X
;
Please provide the EXACT SQL that you are using, so that others here can assist you better.

I managed to parse the JSON with Darren's help. I also managed to list any new keys and attributes with a lateral join to a FLATTEN subquery.
SELECT DISTINCT
    f.path,
    TYPEOF(f.value)
FROM
    REPORT_DATA,
    LATERAL FLATTEN(SRC, RECURSIVE => true) f
WHERE
    TYPEOF(f.value) != 'OBJECT';
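To close the loop on the original nested-string question: because Reason arrives as a JSON string inside the outer document, it has to be parsed a second time with another PARSE_JSON call. A minimal sketch, assuming the same REPORT_DATA table with a VARIANT column SRC holding the outer document:
-- Reason is a stringified JSON object, so cast it to string and re-parse it
SELECT
    SRC:forename::string                               AS forename,
    PARSE_JSON(SRC:Reason::string)                     AS reason_obj,
    PARSE_JSON(SRC:Reason::string):description::string AS reason_description
FROM REPORT_DATA;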

Related

Redshift JSON Parsing

I have some JSON data in Redshift table of type character varying. An example entry is:
[{"value":["*"], "key":"testData"}, {"value":"["GGG"], key: "differentData"}]
I want to return values based on keys. How can I do this? I'm attempting to do something like
json_extract_path_text(column, 'value') but unfortunately it errors out. Any ideas?
So the first issue is that your string isn't valid JSON. There are mismatched and missing quotes. I think you mean:
[{"value":["*"], "key":"testData"}, {"value":["GGG"], "key": "differentData"}]
I don't know if this is a data issue or a transcription error, but these functions won't work unless the JSON text is valid.
The next thing to consider is that at the top level this JSON is an array, so you will need the json_extract_array_element_text() function to pick an element out of the array. For example:
json_extract_array_element_text('json string', 0)
So putting this together, we can extract the first "value" with (untested):
json_extract_path_text(
    json_extract_array_element_text(
        '[{"value":["*"], "key":"testData"}, {"value":["GGG"], "key": "differentData"}]', 0
    ), 'value'
)
This should return the string ["*"].
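Applied to a table column rather than a string literal, the same nesting works. A sketch, where json_table and json_col are hypothetical names standing in for your table and varchar column:
select
    json_extract_path_text(json_extract_array_element_text(json_col, 0), 'value') as first_value,
    json_extract_path_text(json_extract_array_element_text(json_col, 1), 'key')   as second_key
from json_table;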

Bracket notation for SQL Server json_value?

This works:
select json_value('{ "a": "b" }', '$.a')
This doesn't work:
select json_value('{ "a": "b" }', '$["a"]')
and neither does this:
select json_value('{ "a": "b" }', '$[''a'']')
In JSON, these are the same:
foo = { "a": "b" }
console.log(foo.a)
console.log(foo["a"])
What am I missing? I get an error trying to use bracket notation in SQL Server:
JSON path is not properly formatted. Unexpected character '"' is found at position 2
No sooner do I ask than I stumble on an answer. I couldn't find this in any documentation anywhere, but select json_value('{ "a": "b" }', '$."a"') works. Bracket notation is not supported, but otherwise-invalid keys can be escaped with quotation marks, e.g. select json_value('{ "I-m so invalid][": "b" }', '$."I-m so invalid]["'), which in JavaScript would be foo["I-m so invalid]["].
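A quick way to verify the dot form and the quoted-key escape side by side (each column returns b):
select
    json_value('{ "a": "b" }', '$.a')   as dot_key,
    json_value('{ "a": "b" }', '$."a"') as quoted_key,
    json_value('{ "I-m so invalid][": "b" }', '$."I-m so invalid]["') as escaped_key;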
MsSql reserves bracket notation for array indexes: SQL Server parses all JSON as a string literal, not as an object (JSON or ARRAY) with addressable keys.
Some of what SQL can do will vary with version. Here's a crash course on the annoying (but also really powerful, and fast once in place) requirements. I'm posting more than you need because a lot of the documentation for JSON through MsSql is lacking, and doesn't do justice to how strong it is with JSON.
MsDoc here: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15
In this example, we are working with a JSON "object" to separate the data into columns. Note how calling a position inside of an array is weird.
declare @data nvarchar(max) = N'[{"a":"b","c":[{"some":"random","array":"value"},{"another":"random","array":"value"}]},{"e":"f","c":[{"some":"random","array":"value"},{"another":"random","array":"value"}]}]'
--make sure SQL is happy. It will not accept partial snippets
select ISJSON(@data)
--let's look at the data in tabular form
select
    json1.*
    , json2.*
from openjson(@data)
with (
    a varchar --note there is no "path" specified here, as "a" is a key in the first layer of the object
    , c nvarchar(max) as JSON --must use "nvarchar(max)" and "as JSON" or SQL freaks out
    , c0 nvarchar(max) N'$.c[0]' as JSON
) as json1
cross apply openjson(json1.c) as json2
You can also pull out the individual values, if needed
select oj.value from openjson(@data) as oj where oj.[key] = 1;
select
    oj.value
    , JSON_VALUE(oj.value,N'$.e')
    , JSON_VALUE(oj.value,N'$.c[0].some')
    , JSON_VALUE(@data,N'$[1].c[0].some') --Similar to your first example, but uses index position instead of key value. Works because SQL views the "[]" brackets as an array while trying to parse.
from openjson(@data) as oj
where oj.[key] = 1
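One related caveat: JSON_VALUE only returns scalar values and gives NULL (or an error in strict mode) for objects and arrays; JSON_QUERY is its counterpart for extracting a fragment as JSON text. A small sketch against the same @data variable from above:
select
    JSON_QUERY(@data, N'$[1].c')      as whole_nested_array --the nested array as JSON text
    , JSON_QUERY(@data, N'$[1].c[0]') as first_array_element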

Python3: JSON to CSV

I have a JSON dict in Python which I would like to parse into a CSV; my data and code look like this:
import csv
import json
x = {
"success": 1,
"return": {
"variable_id": {
"var1": "val1",
"var2": "val2"
}...
f = csv.writer(open("foo.csv", "w", newline=''))
for x in x:
f.writerow([x["success"],
'--variable value--',
x["return"]["variable_id"]["var1"],
x["return"]["variable_id"]["var2"])
However, since variable_id's value is going to change, I don't know how to refer to it in the code. Apologies if this is trivial, but I guess I lack the terminology to find the solution.
You can use the * (unpack) operator to do this, assuming only the values in your variable_id matter:
f.writerow([x["success"],
'--variable value--',
*[val for variable_id in x['return'].values() for val in variable_id.values()])
The unpack operator essentially takes every value nested under x['return'] and appends it to the list you're creating as input for writerow.
EDIT: this should now work if you don't know how to reference variable_id. This will work best if you have several variable_ids in x['return'].
If you only have one variable_id, then you can also try this :
f.writerow([x["success"],
'--variable value--',
*list(x['return'].values())[0].values()])
Or
f.writerow([x["success"],
'--variable value--',
*next(iter(x['return'].values())).values()])
You can get the variable_id key name using list(x['return'].keys())[0].

Parse complex Json string contained in Hadoop

I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed by using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use a 'Loader'-based approach since I use the HCatalog Loader to load data from Hive.)
But the problem is the structure of my data: each value of the MAP structure contains the value part of a complex JSON document. For example,
1. My table looks like this (WARNING: the type of 'complex_data' is not STRING but a MAP of <STRING, STRING>!):
TABLE temp_table
(
user_id BIGINT COMMENT 'user ID.',
complex_data MAP <STRING, STRING> COMMENT 'complex json data'
)
COMMENT 'temp data.'
PARTITIONED BY(created_date STRING)
STORED AS RCFILE;
2. And 'complex_data' contains the following (the values I want are marked with two *s, so basically #'d'#'f' from each PARSED_STRING(complex_data#'c')):
{ "a": "[]",
"b": "\"sdf\"",
"**c**":"[{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},
{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},]"
}
3. So, I tried... (same approach for Elephant Bird)
REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
data = LOAD temp_table USING org.apache.hive.hcatalog.pig.HCatLoader();
values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS
-- dump values_of_map shows correct chararray data per each row
-- eg) ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }]) ...
attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSES AN ERROR
attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}')); -- IT ALSO DOES NOT WORK
I guessed that attempt1 failed because the value doesn't contain a full JSON document. However, when I CONCAT as in attempt2, additional \ marks are generated (so each line starts with {\"key\": ). I'm not sure whether these additional marks break the parsing rules or not. In any case, I want to parse the given JSON string so that Pig can understand it. If you have any method or solution, please feel free to let me know.
I finally solved my problem by using the jyson library with a jython UDF.
I know that I could solve it with Java or other languages.
But I think that jython with jyson is the simplest answer to this issue.

SQL Server dynamic JSON using within Analysis Services?

I am trying to get my head around which direction to even start with the following.
Imagine a dynamic form (JSON) that I store in SQL Server 2016+. So far, I have seen / tried a couple of dynamic queries to take the dynamic JSON and flatten it out as columns.
Given the "dynamic" nature, it is hard to "store" that flatten out data. I have been looking at temporary/temporal/memory tables to store that dynamic flattened data for a "relatively short period" of time (say an hour or two).
I have also been asked if it is possible to use the dynamic JSON data to build a cube within Analysis Services. Again, given the dynamic nature of this, would something like that even be possible?
I guess my question is two-fold:
1. Pointers to flatten out dynamic JSON within SQL Server.
2. Is it possible to take dynamic JSON, flatten it out to columns, and somehow use it within Analysis Services, i.e. ultimately within a cube?
Realise the above is a bit vague, but any pointers to get me going in the correct direction would be appreciated!
Many thanks.
Dynamically converting JSON into columns can get tricky, especially if you are NOT certain of the structure. That said, have you considered converting the JSON into a hierarchy via a Recursive CTE?
Example
declare @json varchar(max)='
[
{
"url": "https://www.google.com",
"image-url": "https://www.google.com/imghp",
"labels": [
{
"source": "Bob, Inc",
"name": "Whips",
"info": "Ouch"
},
{
"source": "Weezles of Oregon",
"name": "Chains",
"info": "Let me go"
}
],
"Fact": "Fictional"
}
]';
;with cte0 as (
Select *
,[Level]=1
,[Path]=convert(varchar(max),row_number() over(order by (select null)))
From OpenJSON(@json,'$')
Union All
Select R.*
,[Level]=p.[Level]+1
,[Path]=concat(P.[Path],'\',row_number() over(order by (select null)))
From cte0 p
Cross Apply OpenJSON(p.value,'$') R
Where P.[Type]>3
)
Select [Level]
,[Path]
,Title = replicate('|---',[Level]-1)+[Key]
,Item = [Key]
,Value = case when [type]<4 then Value else null end
From cte0
Order By [Path]
Returns one row per JSON node: its Level, Path, an indented Title, the Item key, and the scalar Value (null when the node is an object or array), ordered by Path.