How to convert a JSON object into an array in PrestoDB/Athena

I have a JSON object in this format: data = {"1": {"col1":"a", "col2":"b"}, "2": {"col1":"c", "col2":"d"}, ..., "99": {"col1":"asd", "col2":"exm"}}. I would like to get all the values for col1 and col2 using Athena. How do I achieve this?

I was able to solve it by casting the parsed JSON to a MAP and then unnesting it:
SELECT t.key, t.value
FROM table
CROSS JOIN UNNEST(CAST(json_parse(data) AS MAP(varchar, JSON))) AS t(key, value)
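Outside Athena, the same parse-then-unnest idea can be sketched in Python. This is only an illustration, assuming the JSON text is available as a string; `sample` and `unnest_object` are made-up names, not part of the original query.

```python
import json

def unnest_object(data):
    """Turn a JSON object of objects into (key, col1, col2) rows."""
    parsed = json.loads(data)  # comparable to json_parse(data)
    rows = []
    for key, value in parsed.items():  # comparable to UNNEST over the MAP
        rows.append((key, value.get("col1"), value.get("col2")))
    return rows

# Shortened, hypothetical copy of the data from the question
sample = '{"1": {"col1": "a", "col2": "b"}, "2": {"col1": "c", "col2": "d"}}'
for row in unnest_object(sample):
    print(row)
```

Each outer key becomes one row, which is what unnesting the MAP does in the query.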

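Alternatively, ignore the fact that it is JSON (the structure is not simple to parse) and use a regex function, passing 1 as the group argument so that only the captured value is returned:
SELECT regexp_extract_all(column_data, '"col1":"([a-z]+)",', 1); -- [a, c, ... asd]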
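The regex approach can be tried out quickly in Python before running it against a table. A minimal sketch, assuming the same pattern; `extract_col1` and `sample` are hypothetical names. Note that the pattern only matches lowercase values written without whitespace after the colon and followed by a comma, so it is fragile compared with real JSON parsing.

```python
import re

def extract_col1(text):
    # With exactly one capture group, findall returns just the captured values
    return re.findall(r'"col1":"([a-z]+)",', text)

# Shortened, hypothetical copy of the data from the question
sample = '{"1": {"col1":"a", "col2":"b"}, "2": {"col1":"c", "col2":"d"}}'
print(extract_col1(sample))
```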

Related

Parse JSON data in T-SQL [duplicate]

This is driving me nuts, and I don't understand what's wrong with my approach.
I generate a JSON object in SQL like this:
select @output = (
select distinct lngEmpNo, txtFullName
from tblSecret
for json path, root('result'), include_null_values
)
I get a result like this:
{"result":[{"lngEmpNo":696969,"txtFullName":"Clinton, Bill"}]}
ISJSON() confirms that it's valid JSON, and JSON_QUERY(@output, '$.result') returns the array [] portion of the JSON object... cool!
BUT, when I try to use JSON_QUERY to extract a specific value, I get NULL. Why? I've tried it with the [0], without the [0], and of course with txtFullName[0]:
SELECT JSON_QUERY(@jsonResponse, '$.result[0].txtFullName');
When I prefix the path with strict, SELECT JSON_QUERY(@jsonResponse, 'strict $.result[0].txtFullName');, it tells me this:
Msg 13607, Level 16, State 4, Line 29
JSON path is not properly formatted. Unexpected character 't' is found at position 18.
What am I doing wrong? What is wrong with my structure?
JSON_QUERY only extracts an object or an array; in the default lax mode it returns NULL when the path points to a scalar. You are trying to extract a single value, so you need to use JSON_VALUE. For example:
SELECT JSON_VALUE(@jsonResponse, '$.result[0].txtFullName');
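The split between the two functions can be pictured with a rough Python analogy. This is not how SQL Server implements them; `json_query_like` and `json_value_like` are made-up helpers that only mimic the lax-mode rule described above (objects and arrays on one side, scalars on the other).

```python
import json

doc = json.loads('{"result":[{"lngEmpNo":696969,"txtFullName":"Clinton, Bill"}]}')

def json_query_like(value):
    # Mimics lax-mode JSON_QUERY: only objects and arrays are returned
    return value if isinstance(value, (dict, list)) else None

def json_value_like(value):
    # Mimics lax-mode JSON_VALUE: only scalars are returned
    return None if isinstance(value, (dict, list)) else value

name = doc["result"][0]["txtFullName"]
print(json_query_like(name))           # None, like the NULL in the question
print(json_value_like(name))           # Clinton, Bill
print(json_query_like(doc["result"]))  # the whole array
```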

Pull data from JSON column and create new output with ABSENT ON NULL option

I have a JSON column in an Oracle DB that was populated without the ABSENT ON NULL option, and some of the values are quite long because of this.
I would like to trim things down. I have created a new table similar to the first, and I would like to select the JSON from the old table, apply the ABSENT ON NULL option, and insert the new, shorter values into the new table.
I can see the JSON easily enough with:
SELECT json_query(json_data,'$') FROM table;
This will give a result like:
{
"REC_TYPE_IND":"1",
"ID":"1234",
"OTHER_ID":"4321",
"LOCATION":null,
"EFF_BEG_DT":"19970101",
"EFF_END_DT":"99991231",
"NAME":"Joe",
"CITY":null
}
When I try to remove the null values like
SELECT json_object (json_query(json_data,'$') ABSENT ON NULL
RETURNING VARCHAR2(4000)
) AS col1 FROM table;
I get the following:
ORA-02000: missing VALUE keyword
I assume this is because the function json_object expects the format:
json_object ('REC_TYPE_IND' VALUE '1',
'ID' VALUE '1234')
Is there a way around this, to turn the JSON back into values that JSON_OBJECT can recognize like above or is there a function I am missing?
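To make the intended transformation concrete, here is a minimal Python sketch of dropping the null-valued keys from one record. It is not an Oracle solution, and `drop_nulls` and `sample` are hypothetical names; it only shows the result the question is after.

```python
import json

def drop_nulls(json_text):
    """Return compact JSON text with null-valued top-level keys removed."""
    record = json.loads(json_text)
    kept = {k: v for k, v in record.items() if v is not None}
    return json.dumps(kept, separators=(",", ":"))

# Shortened copy of the record shown in the question
sample = '{"REC_TYPE_IND":"1","ID":"1234","LOCATION":null,"NAME":"Joe","CITY":null}'
print(drop_nulls(sample))
```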

How to extract JSON array stored as string in BigQuery

I have a JSON array that looks similar to this
[{"key":"Email","slug":"customer-email","value":"abc@gmail.com"},{"key":"Phone Number","slug":"mobile-phone-number","value":"123456789"},{"key":"First Name","slug":"first-name","value":"abc"},{"key":"Last Name","slug":"last-name","value":"xyz"},{"key":"Date of birth","slug":"date-of-birth","value":"01/01/1990"}]
The tricky part is that this array is stored as a string. So I am thinking that the first step would be to convert the string into an array, then unnest it, and then follow the method in How do I parse value from JSON array into columns in BigQuery.
Is this approach doable? If so, the challenge I am having is converting the string into an array. If not, or if you have a more efficient method, please help. Thanks
Have you tried json_extract_array?
select json_extract_array(
"""[{"key":"Email","slug":"customer-email","value":"abc@gmail.com"},{"key":"Phone Number","slug":"mobile-phone-number","value":"123456789"},{"key":"First Name","slug":"first-name","value":"abc"},{"key":"Last Name","slug":"last-name","value":"xyz"},{"key":"Date of birth","slug":"date-of-birth","value":"01/01/1990"}]""");
Below is for BigQuery Standard SQL
#standardSQL
SELECT
id,
JSON_EXTRACT_ARRAY(json_string) AS json_array
FROM `project.dataset.table`
Applied to the sample data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '[{"key":"Email","slug":"customer-email","value":"abc@gmail.com"},{"key":"Phone Number","slug":"mobile-phone-number","value":"123456789"},{"key":"First Name","slug":"first-name","value":"abc"},{"key":"Last Name","slug":"last-name","value":"xyz"},{"key":"Date of birth","slug":"date-of-birth","value":"01/01/1990"}]' json_string
)
SELECT
id,
JSON_EXTRACT_ARRAY(json_string) AS json_array
FROM `project.dataset.table`
The output is:
Row  id  json_array
1    1   {"key":"Email","slug":"customer-email","value":"abc@gmail.com"}
         {"key":"Phone Number","slug":"mobile-phone-number","value":"123456789"}
         {"key":"First Name","slug":"first-name","value":"abc"}
         {"key":"Last Name","slug":"last-name","value":"xyz"}
         {"key":"Date of birth","slug":"date-of-birth","value":"01/01/1990"}
From this point, you can use the solution in How do I parse value from JSON array into columns in BigQuery, which you referenced in your question.
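The two steps (string to array, then array elements to columns) can be prototyped in Python to check the expected shape of the result. A minimal sketch, assuming a shortened copy of the sample string; `elements` and `by_slug` are illustrative names, and using the slug as the column name is an assumption rather than something the question specifies.

```python
import json

# Shortened copy of the string from the question
json_string = (
    '[{"key":"Email","slug":"customer-email","value":"abc@gmail.com"},'
    '{"key":"Phone Number","slug":"mobile-phone-number","value":"123456789"},'
    '{"key":"First Name","slug":"first-name","value":"abc"}]'
)

# Step 1: string -> list of objects (the role JSON_EXTRACT_ARRAY plays above)
elements = json.loads(json_string)

# Step 2: one "column" per slug, holding that element's value
by_slug = {item["slug"]: item["value"] for item in elements}

print(len(elements))
print(by_slug["customer-email"])
```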

Iterate over json array with python 2.7

I want to iterate over a JSON array that looks like this:
[
{"key":"value"},
{"key2":"value2"},
{"key3":"value3"}
]
I have tried with the json library, but I have not managed to iterate over it. The index is not always 0; the indices are successive.
json_result = json.loads(json_var)
print(json_result[0])
print(json_result[0]["key"])
print(json_result[1])
print(json_result[1]["key2"])
This prints:
{"key":"value"}
value
{"key2":"value2"}
value2
So I would like to get the values without accessing them by name, something like this:
for x in json_result:
    print(json_result[0][x])
Try this:
for i in json_result:
    for key in i:
        print(i[key])
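Put together as a self-contained script, the nested loop might look like the sketch below. The `json_var` string is an assumed stand-in for the data in the question, and the values are collected into a list so they can be inspected afterwards.

```python
import json

# Assumed stand-in for the JSON text from the question
json_var = '[{"key": "value"}, {"key2": "value2"}, {"key3": "value3"}]'
json_result = json.loads(json_var)

values = []
for item in json_result:   # each item is a dict
    for key in item:       # iterate over the keys without knowing their names
        values.append(item[key])

print(values)
```

The loop works the same way on Python 2.7, although the strings are printed there with a `u` prefix.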

Parse JSON into U-SQL then convert to csv

I'm trying to convert some telemetry data that is in JSON format into CSV format, then write it out to a file, using U-SQL.
The problem is that some of the JSON key names have periods in them, so when I do the SELECT operation, U-SQL does not recognize them. When I check the output file, all I see is the values for "p1". How can I represent the JSON key names in the script so that they are recognized? Thanks in advance for any help!
Code:
REFERENCE ASSEMBLY MATSDevDB.[Newtonsoft.Json];
REFERENCE ASSEMBLY MATSDevDB.[Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@jsonDocuments =
EXTRACT jsonString string
FROM @"adl://xxxx.azuredatalakestore.net/xxxx/{*}/{*}/{*}/telemetry_{*}.json"
USING Extractors.Tsv(quoting:false);
@jsonify =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS json
FROM @jsonDocuments;
@columnized = SELECT
json["EventInfo.Source"] AS EventInfoSource,
json["EventInfo.InitId"] AS EventInfoInitId,
json["EventInfo.Sequence"] AS EventInfoSequence,
json["EventInfo.Name"] AS EventInfoName,
json["EventInfo.Time"] AS EventInfoTime,
json["EventInfo.SdkVersion"] AS EventInfoSdkVersion,
json["AppInfo.Language"] AS AppInfoLanguage,
json["UserInfo.Language"] AS UserInfoLanguage,
json["DeviceInfo.BrowserName"] AS DeviceInfoBrowswerName,
json["DeviceInfo.BrowserVersion"] AS BrowswerVersion,
json["DeviceInfo.OsName"] AS DeviceInfoOsName,
json["DeviceInfo.OsVersion"] AS DeviceInfoOsVersion,
json["DeviceInfo.Id"] AS DeviceInfoId,
json["p1"] AS p1,
json["PipelineInfo.AccountId"] AS PipelineInfoAccountId,
json["PipelineInfo.IngestionTime"] AS PipelineInfoIngestionTime,
json["PipelineInfo.ClientIp"] AS PipelineInfoClientIp,
json["PipelineInfo.ClientCountry"] AS PipelineInfoClientCountry,
json["PipelineInfo.IngestionPath"] AS PipelineInfoIngestionPath,
json["AppInfo.Id"] AS AppInfoId,
json["EventInfo.Id"] AS EventInfoId,
json["EventInfo.BaseType"] AS EventInfoBaseType,
json["EventInfo.IngestionTime"] AS EventInfoIngestionTime
FROM @jsonify;
OUTPUT @columnized
TO "adl://xxxx.azuredatalakestore.net/poc/TestResult.csv"
USING Outputters.Csv(quoting : false);
JSON:
{"EventInfo.Source":"JS_default_source","EventInfo.Sequence":"1","EventInfo.Name":"daysofweek","EventInfo.Time":"2018-01-25T21:09:36.779Z","EventInfo.SdkVersion":"ACT-Web-JS-2.6.0","AppInfo.Language":"en","UserInfo.Language":"en-US","UserInfo.TimeZone":"-08:00","DeviceInfo.BrowserName":"Chrome","DeviceInfo.BrowserVersion":"63.0.3239.132","DeviceInfo.OsName":"Mac OS X","DeviceInfo.OsVersion":"10","p1":"V1","PipelineInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z","PipelineInfo.ClientCountry":"CA","PipelineInfo.IngestionPath":"FastPath","EventInfo.BaseType":"custom","EventInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z"}
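For comparison, outside U-SQL the dotted names are ordinary flat keys with no path semantics. A minimal Python sketch of the same JSON-to-CSV step, using a shortened copy of the record above; the `columns` list is chosen for illustration and includes one key that is missing from the record to show how it comes out empty.

```python
import csv
import io
import json

# Shortened copy of the telemetry record from the question
line = '{"EventInfo.Source":"JS_default_source","EventInfo.Name":"daysofweek","p1":"V1"}'
record = json.loads(line)

# "EventInfo.InitId" is absent from the record, so restval fills it with ""
columns = ["EventInfo.Source", "EventInfo.Name", "EventInfo.InitId", "p1"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerow(record)

csv_text = buffer.getvalue()
print(csv_text)
```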
I got this to work with single quotes and single square brackets, e.g.:
@columnized = SELECT
json["['EventInfo.Source']"] AS EventInfoSource,
...
Full code:
@columnized = SELECT
json["['EventInfo.Source']"] AS EventInfoSource,
json["['EventInfo.InitId']"] AS EventInfoInitId,
json["['EventInfo.Sequence']"] AS EventInfoSequence,
json["['EventInfo.Name']"] AS EventInfoName,
json["['EventInfo.Time']"] AS EventInfoTime,
json["['EventInfo.SdkVersion']"] AS EventInfoSdkVersion,
json["['AppInfo.Language']"] AS AppInfoLanguage,
json["['UserInfo.Language']"] AS UserInfoLanguage,
json["['DeviceInfo.BrowserName']"] AS DeviceInfoBrowswerName,
json["['DeviceInfo.BrowserVersion']"] AS BrowswerVersion,
json["['DeviceInfo.OsName']"] AS DeviceInfoOsName,
json["['DeviceInfo.OsVersion']"] AS DeviceInfoOsVersion,
json["['DeviceInfo.Id']"] AS DeviceInfoId,
json["p1"] AS p1,
json["['PipelineInfo.AccountId']"] AS PipelineInfoAccountId,
json["['PipelineInfo.IngestionTime']"] AS PipelineInfoIngestionTime,
json["['PipelineInfo.ClientIp']"] AS PipelineInfoClientIp,
json["['PipelineInfo.ClientCountry']"] AS PipelineInfoClientCountry,
json["['PipelineInfo.IngestionPath']"] AS PipelineInfoIngestionPath,
json["['AppInfo.Id']"] AS AppInfoId,
json["['EventInfo.Id']"] AS EventInfoId,
json["['EventInfo.BaseType']"] AS EventInfoBaseType,
json["['EventInfo.IngestionTime']"] AS EventInfoIngestionTime
FROM @jsonify;
My results: