I have a postgres db with a json data field.
The json I have is an array of objects:
[{"name":"Mickey Mouse","age":10},{"name":"Donald Duck","age":5}]
I'm trying to return values for a specific key in a JSON array, so in the above example I'd like to return the values for name.
When I use the following query I just get a NULL value returned:
SELECT data->'name' AS name FROM json_test
I'm assuming this is because it's an array of objects? Is it possible to address the name key directly?
Ultimately, I need to return a count of every unique name. Is this possible?
Thanks!
You have to unnest the array of JSON objects first, using json_array_elements (or jsonb_array_elements if you have the jsonb data type). Then you can access the values by specifying the key.
WITH json_test (col) AS (
values (json '[{"name":"Mickey Mouse","age":10},{"name":"Donald Duck","age":5}]')
)
SELECT
y.x->'name' "name"
FROM json_test jt,
LATERAL (SELECT json_array_elements(jt.col) x) y
-- outputs:
name
--------------
"Mickey Mouse"
"Donald Duck"
To get a count of unique names, the query is similar to the one above, except that the COUNT(DISTINCT ...) aggregate is applied to y.x->>'name':
WITH json_test (col) AS (
values (json '[{"name":"Mickey Mouse","age":10},{"name":"Donald Duck","age":5}]')
)
SELECT
COUNT( DISTINCT y.x->>'name') distinct_names
FROM json_test jt,
LATERAL (SELECT json_array_elements(jt.col) x) y
It is necessary to use ->> instead of -> because ->> returns the extracted value as text, which supports equality comparison (needed for a distinct count), whereas -> returns the value as json, which has no equality operator.
Alternatively, cast the json to jsonb. jsonb supports equality comparison, so it is possible to use COUNT(DISTINCT ...) together with extraction via ->, i.e.:
COUNT(DISTINCT (y.x::jsonb)->'name')
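To make the semantics of the unnest-then-count query concrete, here is a small Python sketch that mirrors it outside the database. It is only an illustration, not a PostgreSQL API: the function name count_distinct_names is hypothetical, and it assumes the column value is a JSON array of objects as in the question. Missing keys are skipped, just as COUNT(DISTINCT ...) ignores the NULLs that ->> returns for absent keys.

```python
import json


def count_distinct_names(raw: str) -> int:
    """Mimic COUNT(DISTINCT elem->>'name') over json_array_elements(col)."""
    elements = json.loads(raw)  # the JSON array stored in the column
    # Unnest: one "row" per array element. Elements without a name are
    # skipped, because COUNT(DISTINCT ...) ignores NULL values.
    names = {elem["name"] for elem in elements if elem.get("name") is not None}
    return len(names)


data = '[{"name":"Mickey Mouse","age":10},{"name":"Donald Duck","age":5}]'
print(count_distinct_names(data))  # 2
```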
Updated answer for PostgreSQL 12+
It is now possible to extract (unnest) specific keys from a list of objects using jsonb path queries, as long as the queried field is jsonb and not json.
Example:
WITH json_test (col) AS (
values (jsonb '[{"name":"Mickey Mouse","age":10},{"name":"Donald Duck","age":5}]')
)
SELECT jsonb_path_query(col, '$[*].name') "name"
FROM json_test
-- replaces this original snippet:
-- SELECT
-- y.x->'name' "name"
-- FROM json_test jt,
-- LATERAL (SELECT json_array_elements(jt.col) x) y
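If you only need to filter rows whose array contains an object with a given name, use the jsonb containment operator @> (casting the column to jsonb if it is json):
SELECT * FROM json_test WHERE column_name::jsonb @> '[{"name": "Mickey Mouse"}]';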
You can use jsonb_array_elements (when using jsonb) or json_array_elements (when using json) to expand the array elements.
For example:
WITH sample_data_array(arr) AS (
VALUES ('[{"name":"Mickey Mouse","age":10},{"name":"Donald Duck","age":5}]'::jsonb)
)
, sample_data_elements(elem) AS (
SELECT jsonb_array_elements(arr) FROM sample_data_array
)
SELECT elem->'name' AS extracted_name FROM sample_data_elements;
In this example, sample_data_elements is equivalent to a table with a single jsonb column called elem, with two rows (the two array elements in the initial data).
The result consists of two rows (one jsonb column, or of type text if you used ->>'name' instead):
extracted_name
----------------
"Mickey Mouse"
"Donald Duck"
(2 rows)
You should then be able to group and aggregate as usual (for example, GROUP BY the extracted name with count(*)) to return the count for each name.
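The grouping step can be pictured with a short Python sketch, assuming each table row holds one JSON array like the sample data. Here collections.Counter stands in for GROUP BY plus count(*); the function name count_per_name and the second sample row are hypothetical. As in SQL, elements without a name fall into a single None group, the analogue of the NULL group.

```python
import json
from collections import Counter


def count_per_name(raw_rows):
    """Mimic: SELECT elem->>'name', count(*)
    FROM t, jsonb_array_elements(t.col) AS elem GROUP BY 1."""
    counts = Counter()
    for raw in raw_rows:                # one JSON array per table row
        for elem in json.loads(raw):    # unnest the array elements
            counts[elem.get("name")] += 1  # missing key -> None, like SQL NULL
    return counts


rows = [
    '[{"name":"Mickey Mouse","age":10},{"name":"Donald Duck","age":5}]',
    '[{"name":"Mickey Mouse","age":11}]',
]
print(count_per_name(rows))
```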
I have the following data:
{
"City": "Fontana",
"Timezone": "America/Los_Angeles",
"Longitude": "-117.4864123",
"Timestamp": "2020-07-15T12:13:00-07:00",
"refs": ["123", "456", "789"], "tZone": "PPP"
}
The above data is stored in the analytics.col_json column.
My table structure is:
CREATE TABLE analytics
(
id bigint NOT NULL,
col_typ character varying(255) COLLATE pg_catalog."default",
col_json json,
cre_dte timestamp without time zone,
CONSTRAINT clbk_logs_pkey PRIMARY KEY (id)
);
There are n rows of such records.
I am trying to fetch records based on 'refs' by passing a list of strings.
I have a separate list of values to filter the table on.
My query is as follows:
select * FROM public.analytics
where col_json-> 'refs' in (
'123',
'pqa',
'bhu',
'qwerty'
);
But the above query does not work for me.
The more advanced JSON capabilities are only available when using the jsonb type, so you will have to cast your column every time you want to do something non-trivial. It would be better to define the column as jsonb in the long run.
You can use the ?| operator
select a.*
from analytics a
where col_json::jsonb -> 'refs' ?| array['123','pqa','bhu','qwerty'];
Note that this only works if all array elements are strings. For example, if the JSON contained "refs": [123,456] (numbers), it would not match.
Alternatively you can use an EXISTS condition with a sub-query:
select a.*
from analytics a
where exists (select *
from json_array_elements_text(a.col_json -> 'refs') as x(item)
where x.item in ('123','pqa','bhu','qwerty'));
If you want refs to contain all of the values in your list, you can use the containment operator @>:
select a.*
from analytics a
where a.col_json::jsonb -> 'refs' @> '["123", "456"]';
Or alternatively: where a.col_json::jsonb @> '{"refs": ["123", "456"]}'
The above will only return rows where both values are contained in the refs array.
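The difference between the "any" (?|) and "all" (@>) filters can be sketched in Python. This is an illustration of the matching rules only, not how PostgreSQL implements them; refs_any and refs_all are hypothetical names, and the sample row is trimmed from the question's data. Only string elements count for ?|, so a numeric 123 never matches '123'.

```python
import json


def refs_any(col_json: str, wanted: list) -> bool:
    """Mimic (col_json::jsonb -> 'refs') ?| array[...]:
    true if any wanted string appears as a string element of refs."""
    refs = json.loads(col_json).get("refs", [])
    # Non-string elements (e.g. the number 123) are never matched.
    return any(isinstance(r, str) and r in wanted for r in refs)


def refs_all(col_json: str, wanted: list) -> bool:
    """Mimic (col_json::jsonb -> 'refs') @> '[...]':
    true only if every wanted value is an element of refs."""
    refs = json.loads(col_json).get("refs", [])
    return all(w in refs for w in wanted)


row = '{"City": "Fontana", "refs": ["123", "456", "789"], "tZone": "PPP"}'
print(refs_any(row, ["123", "pqa", "bhu", "qwerty"]))  # True
print(refs_all(row, ["123", "456"]))                   # True
```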
In my MySQL 8.0 table, I have a JSON ARRAY column. It is an array of JSON objects. I want to pick one object out of each row's array, based on the key value pairs in the objects.
Example row:
[{"bool": false, "number": 0, "value": "hello"},
{"bool": true, "number": 1, "value": "world"},
{"bool": true, "number": 2, "value": "foo"},
{"bool": false, "number": 1, "value": "bar"}]
What I am trying to do is get the 'value' WHERE bool=true, AND number=1. So I want a query that in this example returns 'world'.
What would also work is if I could get the index of the object where bool=true and number=1, in this example it would return '$[1]'.
I am trying to run a query across the whole column, setting a new column to the value returned from the query. Is this possible with MySQL JSON functions? I've looked at the references but none have objects inside arrays like my example.
EDIT: If I do
SELECT JSON_SEARCH(column->"$[*]", 'all', '1');
SELECT JSON_SEARCH(names->"$[*]", 'all', 'true');
I get the paths/indexes of objects where number=1, and where bool=true, respectively. I would like the overlap of these two results.
You can use JSON_TABLE to convert the JSON into a derived table which you can then extract values from:
SELECT j.value
FROM test t
JOIN JSON_TABLE(t.jsonStr,
'$[*]'
COLUMNS(bool BOOLEAN PATH '$.bool',
number INT PATH '$.number',
value VARCHAR(20) PATH '$.value')) j
WHERE j.bool = true AND j.number = 1
Output:
value
world
If you also want the index of the matching object within each JSON value, you can add a FOR ORDINALITY clause to your JSON_TABLE. Note that FOR ORDINALITY counts from 1, so subtract 1 to get the zero-based path index ($[1] in this example):
SELECT j.idx, j.value
FROM test t
JOIN JSON_TABLE(t.jsonStr,
'$[*]'
COLUMNS(idx FOR ORDINALITY,
bool BOOLEAN PATH '$.bool',
number INT PATH '$.number',
value VARCHAR(20) PATH '$.value')) j
WHERE j.bool = true AND j.number = 1
Output:
idx value
2 world
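For comparison, here is a Python sketch of what the JSON_TABLE query computes for a single row: expand the array, number the elements from 1 (like FOR ORDINALITY), and keep those where bool is true and number is 1. It only illustrates the logic; the function name find_matches is hypothetical and the sample row assumes the valid-JSON form of the example data.

```python
import json


def find_matches(json_str: str):
    """Mimic JSON_TABLE(..., '$[*]' COLUMNS(idx FOR ORDINALITY, ...))
    WHERE bool = true AND number = 1. Returns (idx, value) pairs."""
    matches = []
    # enumerate(start=1) reproduces the 1-based FOR ORDINALITY counter.
    for idx, obj in enumerate(json.loads(json_str), start=1):
        if obj.get("bool") is True and obj.get("number") == 1:
            matches.append((idx, obj.get("value")))
    return matches


row = ('[{"bool": false, "number": 0, "value": "hello"},'
       ' {"bool": true, "number": 1, "value": "world"},'
       ' {"bool": true, "number": 2, "value": "foo"},'
       ' {"bool": false, "number": 1, "value": "bar"}]')
for idx, value in find_matches(row):
    print(idx, value, f"$[{idx - 1}]")  # 2 world $[1]
```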
In PostgreSQL, using a JSONB column, I can store arrays of values. In a WHERE clause, I can then filter these arrays by performing comparisons on individual array items.
As an example, I can check "Is the first item of array data, cast to a number, greater than 5?" by using WHERE CAST((data -> 0) AS FLOAT) > 5.
What I would like to be able to do is check "Is any item of array data, cast to a number, greater than 5?".
Is there a way to do this as part of a PostgreSQL query, as opposed to first fetching all data and then manually performing this filter?
Use the function jsonb_array_elements_text(), example:
with my_table (id, data) as (
values
(1, '[1,2,3]'::jsonb),
(2, '[4,5,6]'::jsonb)
)
select *
from my_table
where exists (
select
from jsonb_array_elements_text(data)
where value::float > 5
)
id | data
----+-----------
2 | [4, 5, 6]
(1 row)
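The EXISTS subquery amounts to an "any element satisfies the predicate" test. A Python sketch of the same filter, using the two sample rows; the function name any_greater_than is hypothetical and this is not a PostgreSQL API.

```python
import json


def any_greater_than(data: str, threshold: float) -> bool:
    """Mimic EXISTS (SELECT FROM jsonb_array_elements_text(data)
    WHERE value::float > threshold)."""
    # Each element is converted to float, like the value::float cast.
    return any(float(value) > threshold for value in json.loads(data))


rows = {1: "[1,2,3]", 2: "[4,5,6]"}
print([row_id for row_id, data in rows.items() if any_greater_than(data, 5)])  # [2]
```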
I have defined a Hive table where a single column contains JSON text:
CREATE EXTERNAL TABLE IF NOT EXISTS my.rawdata (
json string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = '\n',
'quoteChar' = '\0',
'escapeChar' = '\r'
)
STORED AS TEXTFILE
LOCATION 's3://mydata/';
Is there a Presto/Athena query that can list out all field names that occur within the JSON and their frequency (i.e. total number of times the attribute appears in the table)?
Use the JSON functions to parse the JSON and turn it into a map. Then extract the keys and unnest them. Finally, use a normal SQL aggregation:
SELECT key, count(*)
FROM (
SELECT map_keys(cast(json_parse(json) AS map(varchar, json))) AS keys
FROM rawdata
)
CROSS JOIN UNNEST (keys) AS t (key)
GROUP BY key
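The same key-frequency count can be sketched in Python to show what the query returns: each row's top-level keys are counted once, since casting to MAP(VARCHAR, JSON) exposes only top-level keys. The function name key_frequencies and the sample rows are hypothetical.

```python
import json
from collections import Counter


def key_frequencies(json_rows):
    """Mimic map_keys(CAST(json_parse(json) AS MAP(VARCHAR, JSON))),
    UNNEST, then GROUP BY key with count(*)."""
    counts = Counter()
    for raw in json_rows:
        counts.update(json.loads(raw).keys())  # top-level keys only
    return counts


rows = ['{"a": 1, "b": {"c": 2}}', '{"a": 3}']
print(key_frequencies(rows))
```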
Supports multi-level documents (keys inside nested objects are counted)
Ignores the keys of nesting elements, i.e. keys whose value is itself an object
select key
,count(*)
from t cross join
unnest (regexp_extract_all(json,'"([^"]+)"\s*:\s*("[^"]+"|[^,{}]+)',1)) u (key)
group by key
;
Given a table that contains a column of JSON like this:
{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}
{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}
How can I write a Presto query to give me the average b value across all entries?
So far I think I need to use something like Hive's lateral view explode, whose equivalent is cross join unnest in Presto.
But I'm stuck on how to write the Presto query for cross join unnest.
How can I use cross join unnest to expand all array elements and select them?
Here's an example of that
with example(message) as (
VALUES
(json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
(json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}')
)
SELECT
n.type,
avg(n.value)
FROM example
CROSS JOIN
UNNEST(
CAST(
JSON_EXTRACT(message,'$.payload')
as ARRAY(ROW(type VARCHAR, value INTEGER))
)
) as x(n)
WHERE n.type = 'b'
GROUP BY n.type
WITH defines a common table expression (CTE) named example with a column aliased as message.
VALUES returns a literal table rowset.
UNNEST takes an array within a column of a single row and returns the elements of the array as multiple rows.
CAST changes the JSON type into the ARRAY type that UNNEST requires. It could just as easily have been an ARRAY(MAP(...)), but I find ARRAY(ROW(...)) nicer because you can specify field names and use dot notation in the SELECT clause.
JSON_EXTRACT uses a JSONPath expression to return the array value of the payload key.
avg() and GROUP BY should be familiar SQL.
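As a cross-check of the expected result, here is a Python sketch of the same pipeline: extract payload, unnest it, filter on type, cast value to an integer, and average. It is illustrative only; the function name average_value is hypothetical, and returning None for no matches mirrors avg() returning NULL over zero rows.

```python
import json


def average_value(messages, wanted_type):
    """Mimic CROSS JOIN UNNEST(CAST(JSON_EXTRACT(message, '$.payload')
    AS ARRAY(ROW(type VARCHAR, value INTEGER)))) ... WHERE n.type = wanted_type."""
    values = [
        int(item["value"])                          # CAST to INTEGER: "9" -> 9
        for message in messages
        for item in json.loads(message)["payload"]  # UNNEST: one row per element
        if item["type"] == wanted_type              # WHERE n.type = 'b'
    ]
    return sum(values) / len(values) if values else None  # avg() over no rows is NULL


messages = [
    '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}',
    '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}',
]
print(average_value(messages, "b"))  # 6.0
```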
As you pointed out, this was finally implemented in Presto 0.79. :)
Here is an example of the cast syntax:
select cast(cast ('[1,2,3]' as json) as array<bigint>);
A special word of advice: there is no 'string' type in Presto like there is in Hive.
That means if your array contains strings, make sure you use the type 'varchar'; otherwise you get an error message saying 'type array does not exist', which can be misleading.
select cast(cast ('["1","2","3"]' as json) as array<varchar>);
The problem was that I was running an old version of Presto.
unnest was added in version 0.79
https://github.com/facebook/presto/blob/50081273a9e8c4d7b9d851425211c71bfaf8a34e/presto-docs/src/main/sphinx/release/release-0.79.rst