Extract JSON Values in Snowflake SQL

Extract JSON Values in Snowflake SQL - json

I want to create two columns from a column of values containing JSON in Snowflake using SQL.
Say this table is called keywords_bids
then there is a column called keywords that has JSON in it
example json in a cell in the keywords column:
row1: {"apple":0.1, "peach":0.2, "banana":0.1} row2: similar JSON, etc....
input image
I want to create a columns called keyword and it is bid price from the JSON
output would be:
keyword | Bid
'apple' | 0.1
'peach' | 0.2
'banana'| 0.3

First for JSON you'll need to change the single quotes to double quotes.
Then you just need to flatten the json to get keys and values:
with data as (
select parse_json('{"apple":0.1, "peach":0.2, "banana":0.1}') j
)
select k.key, k.value
from data, table(flatten(j)) k
;

https://community.snowflake.com/s/article/Dynamically-extracting-JSON-using-LATERAL-FLATTEN
This article is to demonstrate various examples of using LATERAL FLATTEN to extract information from a JSON Document. Examples are provided for its utilization together with GET_PATH, UNPIVOT, and SEQ functions.

Related

mySQL json array reformatting

I have a table which contains a column "owners", which has json data in it like this:
[
{
"first":"bob",
"last":"boblast"
},
{
"first":"mary",
"last": "marylast"
}
]
I would like to write a query that would return for each row that contains data like this, a column that has all of the first names concatenated with a comma.
i.e.
id owners
----------------------------
1 bob,mary
2 frank,tom
Not on mysql 8.0 yet.

You can get the values as a JSON array:
SELECT JSON_EXTRACT(owners, '$[*].first') AS owners ...
But that returns in JSON array format:
+-----------------+
| owners |
+-----------------+
| ["bob", "mary"] |
+-----------------+
JSON_UNQUOTE() won't take the brackets and double-quotes out of that. You'd have to use REPLACE() as I show in a recent answer here:
MYSQL JSON search returns results in square brackets
You should think about not storing data in JSON format if it doesn't support the way you need to query them.

Here is another option, get a helper table with running numbers up to the max json array length, and extract values by individual index, after that group_concat the values, something like this:
SELECT g.id, GROUP_CONCAT(g.name)
FROM (
SELECT a.id, JSON_UNQUOTE(JSON_EXTRACT(a.owners, CONCAT('$[', n.idx, '].first'))) name
FROM running_numbers n
JOIN mytable a
) g
GROUP BY g.id
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=d7453c9edf89f79ca4ab2f63578b320c

Bigquery: Extract data from an array of json

(This is an extension to this question, but my reputation is too low to comment or ask more questions on that topic...)
We work on bigquery, hence limited in importing packages or using other languages. And, as per the link above, js is a solution, but not what I'm looking for here. I implemented it in js, and it was too slow for our needs.
Suppose one of our columns is a string that look like this (array of json):
[{"location":[22.99902,66.000],"t":1},{"location":[55.32168,140.556],"t":2},{"location":[85.0002,20.0055],"t":3}]
I want to extract from the column the json for which "t":2
Where:
some columns don't have elements "t":2
Some columns have several elements "t":2
The number of json elements in each string can change
element "t":2 is not always in second position.
I don't know regexp well enough for this. We tried regexp_extract with this pattern: r'(\{.*?\"t\":2.*?\})')), but that doesn't work. It extracts everything that precedes "t":2, including the json for "t":2. We only want the json of element "t":2.
Could you advise a regexp pattern that would work?
EDIT:
I have a preference for a solution that gives me 1 match. Suppose I have this string:
[{"location":[22.99902,66.000],"t":1},{"location":[55.32168,140.556],"t":2},{"location":[55.33,141.785],"t":2}],
I would prefer receiving only 1 answer, the first one.
In that case perhaps regexp is less appropriate, but I'm really not sure?

How about this:
(?<=\{)(?=.*?\"t\"\s*:\s*2).*?(?=\})
As seen here

There is another solution but it is not regexp based (as I had originally asked). So this should not count as the final answer to my own question, nonetheless could be useful.
It is based on a split of the string in array and then chosing the element in the array that satisfies my needs.
Steps:
transform the string into something better for splits (using '|' as seperator):
replace(replace(replace(my_field,'},{','}|{'),'[{','{'),'}]','}')
split it using split(), which yields an array of strings (each one a json element)
find the relevant element ("t":2) - in my case, the first one is good enough, so I limit the query to 1: array( select data from unnest(split(replace(replace(replace(my_field,'},{','}|{'),'[{','{'),'}]','}'),'|')) as data where data like '%"t":2%' limit 1)
Convert that into a useable string with array_to_string() and use json_extract on that string to extract the relevant info from the element that I need (say for example, location coordinate x).
So putting it all together:
round(safe_cast(json_extract(array_to_string(array( select data from unnest(split(replace(replace(replace(my_field,'},{','}|{'),'[{','{'),'}]','}'),'|')) as data where data like '%"t":2%' limit 1),''),'$.location[0]') as float64),3) loc_x

May 1st, 2020 Update
A new function, JSON_EXTRACT_ARRAY, has been just added to the list of JSON
functions. This function allows you to extract the contents of a JSON document as
a string array.
so in below you can replace use of json2array UDF with just in-built function JSON_EXTRACT_ARRAY as in below example
#standardSQL
SELECT id,
(
SELECT x
FROM UNNEST(JSON_EXTRACT_ARRAY(json, '$')) x
WHERE JSON_EXTRACT_SCALAR(x, '$.t') = '2'
) extracted
FROM `project.dataset.table`
==============
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";
SELECT id,
(
SELECT x
FROM UNNEST(json2array(JSON_EXTRACT(json, '$'))) x
WHERE JSON_EXTRACT_SCALAR(x, '$.t') = '2'
) extracted
FROM `project.dataset.table`
You can test, play with above using dummy data as in below example
#standardSQL
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";
WITH `project.dataset.table` AS (
SELECT 1 id, '[{"location":[22.99902,66.000],"t":1},{"location":[55.32168,140.556],"t":2},{"location":[85.0002,20.0055],"t":3}]' json UNION ALL
SELECT 2, '[{"location":[22.99902,66.000],"t":11},{"location":[85.0002,20.0055],"t":13}]'
)
SELECT id,
(
SELECT x
FROM UNNEST(json2array(JSON_EXTRACT(json, '$'))) x
WHERE JSON_EXTRACT_SCALAR(x, '$.t') = '2'
) extracted
FROM `project.dataset.table`
with output
Row id extracted
1 1 {"location":[55.32168,140.556],"t":2}
2 2 null
Above assumes that there is no more than one element with "t":2 in json column. In case if there can be more than one - you should add ARRAY as below
SELECT id,
ARRAY(
SELECT x
FROM UNNEST(json2array(JSON_EXTRACT(json, '$'))) x
WHERE JSON_EXTRACT_SCALAR(x, '$.t') = '2'
) extracted
FROM `project.dataset.table`

Even though, you have posted a work around your issue. I believe this answer will be informative. You mentioned that one of the answer selected more than what you needed, I wrote the query below to reproduce your case and achieve aimed output.
WITH
data AS (
SELECT
" [{ \"location\":[22.99902,66.000]\"t\":1},{\"location\":[55.32168,140.556],\"t\":2},{\"location\":[85.0002,20.0055],\"t\":3}] " AS string_j
UNION ALL
SELECT
" [{ \"location\":[22.99902,66.000]\"t\":1},{\"location\":[55.32168,140.556],\"t\":3},{\"location\":[85.0002,20.0055],\"t\":3}] " AS string_j
UNION ALL
SELECT
" [{ \"location\":[22.99902,66.000]\"t\":1},{\"location\":[55.32168,140.556],\"t\":3},{\"location\":[85.0002,20.0055],\"t\":3}] " AS string_j
UNION ALL
SELECT
" [{ \"location\":[22.99902,66.000]\"t\":1},{\"location\":[55.32168,140.556],\"t\":3},{\"location\":[85.0002,20.0055],\"t\":3}] " AS string_j ),
refined_data AS (
SELECT
REGEXP_EXTRACT(string_j, r"\{\"\w*\"\:\[\d*\.\d*\,\d*\.\d*\]\,\"t\"\:2\}") AS desired_field
FROM
data )
SELECT
*
FROM
refined_data
WHERE
desired_field IS NOT NULL
Notice that I have used the dummy described in the temp table, populated inside the WITH method. As below:
Afterwords, in the table refined_data, I used the REGEXP_EXTRACT to extract the desired string from the column. Observe that for the rows which there is not a match expression, the output is null. Thus, the table refined_data is as follows :
As you can see, now it is just needed a simple WHERE filter to obtain the desired output, which was done in the last select.
In addition you can see the information about the regex expression I provided here.

How to pick values from json_object based on field matching from list in hive

i am working with json_object where for different product categories we have different color variable for ex- kdt_color,fcy_color, etc. (check below)
How to select appropriate color_variable for each product to extract color value from json_object. There are 100+ verticals so cant use case here.
{"ctg_ideal_for":["Women"],"ctg_fabric":["Chiffon"],"ctg_design_style":["Umbrella Burqa"],"aba_color":["Black"],"aba_sleeve":["Full Sleeves"],"aba_with_hijab":[true]}
{"blz_color":["single"],"blz_size":["34\"36\"38\"40\"42"],"blz_sleeve_type":["Full Sleeves"],"ctg_ideal_for":["Men"],"ctg_fabric":["Imported"],"ctg_design_style":["Plaid Blazer"]}
{"color":["Multicolor"],"material":["PU"],"ideal_for":["Women"],"closure":["Zipper"],"bpk_style_code":["RMMY2418"]}

It is possible to use regexp_extract to extract color":["SomeColor from JSON string like this:
select nbr, regexp_extract(json, 'color":\\["([A-Za-z]*)',1) as color
from
(
select 1 as nbr, '{"ctg_ideal_for":["Women"],"ctg_fabric":["Chiffon"],"ctg_design_style":["Umbrella Burqa"],"aba_color":["Black"],"aba_sleeve":["Full Sleeves"],"aba_with_hijab":[true]}' as json union all
select 2 as nbr, '{"blz_color":["single"],"blz_size":["34\"36\"38\"40\"42"],"blz_sleeve_type":["Full Sleeves"],"ctg_ideal_for":["Men"],"ctg_fabric":["Imported"],"ctg_design_style":["Plaid Blazer"]}' as json union all
select 3 as nbr, '{"color":["Multicolor"],"material":["PU"],"ideal_for":["Women"],"closure":["Zipper"],"bpk_style_code":["RMMY2418"]}' as json
)s;
OK
1 Black
2 single
3 Multicolor
You can also use RegexSerDe to define regex columns in a table DDL.

Querying a JSON String in SQL

I have a table that is a single column made up of a JSON string. The JSON has multiple key pairs, and is a string because it is the raw table.
One of the keys is "Ticket" and has dollar amount values. I am not certain if prices are in __.__ format, or just ____. I want to query the column to return me the entire string if this "Ticket" ends in a 6, as in 96 cents, or 66 cents, etc.
This is my query:
SELECT json FROM tablename
WHERE json RLIKE '%"TICKET": "___6",%'
OR json RLIKE '%"TICKET": "__._6",%'
This currently returns as blank.
How can I get the entire string if the dollar amount ends in a 6 (as in 6 cents)?

The search strings you are using are what you would use for LIKE
So you could use LIKE :
select * from tablename
where (json LIKE '%"TICKET": "___6"%' or json LIKE '%"TICKET": "__._6"%')
Or a RLIKE with a regex:
select * from tablename
where json RLIKE '"TICKET":[ ]*"[0-9.]+6"'

Remove elements from string json array in MySQL

I have table with many columns that contains string array json of objects.
I need to remove several elements from these arrays.
I find how can I remove elements from one row, but how can I do it on multiple rows?
For one row I used json_search to find element that must be removed, but I have many rows and many elements to remove. Is there any way to do it without stored procedure (while loop)?
This is sample of data:
-------------------------------------------------------------------
id | DATA |
-------------------------------------------------------------
1 | {"array":[{"a":"a","b":"b","c":"c"},{"b":"b","c":"c"}]}|
------------------------------------------------------------
2 | {"array":[{"b":"b","c":"c","f":"f"},{"b":"b","c":"c","d":"d"}]}|
-------------------------------------------------------------------
3 | {"array":[{"a":"a","b":"b","c":"c"},{"g":"g","ff":"ff"}]}|
4 | {"array":[{"q":"q"},{"g":"f","e":"e"}]}|
I need to remove only elements from each array that contans a and/or g
My query is:
UPDATE MY_TABLE
SET DATA = JSON_REMOVE(
DATA,
REPLACE(JSON_SEARCH(
(SELECT DATA WHERE DATA LIKE "%a%"),
'all',
"%a%"
),
'"',
'')
) WHERE DATA LIKE "%a%";
I found the way how to update all columns, but this query removes only json field, not whole object. How can I remove whole object?

Found the way.
The data was in "$.array[5].a" format after calling JSON_SEARCH. I just needed to wrap replace into another replace that will replace '.a' to just ''.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Extract JSON Values in Snowflake SQL - json

First for JSON you'll need to change the single quotes to double quotes. Then you just need to flatten the json to get keys and values: with data as ( select parse_json('{"apple":0.1, "peach":0.2, "banana":0.1}') j ) select k.key, k.value from data, table(flatten(j)) k ;

Related

mySQL json array reformatting

Bigquery: Extract data from an array of json

How to pick values from json_object based on field matching from list in hive

Querying a JSON String in SQL

Remove elements from string json array in MySQL

Categories

Resources