Given a table that contains a column of JSON like this:
{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}
{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}
How can I write a Presto query to give me the average b value across all entries?
So far I think I need to use something like Hive's lateral view explode, whose equivalent is cross join unnest in Presto.
But I'm stuck on how to write the Presto query for cross join unnest.
How can I use cross join unnest to expand all array elements and select them?
Here's an example of that
WITH example(message) AS (
  VALUES
    (JSON '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
    (JSON '{"payload":[{"type":"c","value":"7"},{"type":"b","value":"3"}]}')
)
SELECT
  n.type,
  avg(n.value)
FROM example
CROSS JOIN UNNEST(
  CAST(
    JSON_EXTRACT(message, '$.payload')
    AS ARRAY(ROW(type VARCHAR, value INTEGER))
  )
) AS x(n)
WHERE n.type = 'b'
GROUP BY n.type
WITH defines a common table expression (CTE) named example with a single column aliased as message.
VALUES returns a literal rowset.
UNNEST takes an array within a column of a single row and returns each element of the array as its own row.
CAST changes the JSON type into the ARRAY type that UNNEST requires. It could just as easily have been ARRAY(MAP(...)), but I find ARRAY(ROW(...)) nicer because you can specify column names and use dot notation in the SELECT clause.
JSON_EXTRACT uses a JSONPath expression to return the array value of the payload key.
avg() and GROUP BY should be familiar SQL.
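For readers who want to sanity-check the query's semantics, the same explode-and-average logic can be sketched client-side in Python (illustrative only, using the sample rows from the question):

```python
import json

# Sample rows copied from the question
rows = [
    '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}',
    '{"payload":[{"type":"c","value":"7"},{"type":"b","value":"3"}]}',
]

# UNNEST: flatten every payload array into one stream of elements
elements = [e for row in rows for e in json.loads(row)["payload"]]

# CAST(... value INTEGER), WHERE type = 'b', then avg()
b_values = [int(e["value"]) for e in elements if e["type"] == "b"]
average_b = sum(b_values) / len(b_values)  # (9 + 3) / 2 = 6.0
```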
As you pointed out, this was finally implemented in Presto 0.79. :)
Here is an example of the syntax for the cast from here:
select cast(cast ('[1,2,3]' as json) as array<bigint>);
A special word of advice: there is no 'string' type in Presto as there is in Hive.
That means if your array contains strings, make sure you use type 'varchar'; otherwise you get an error message saying 'type array does not exist', which can be misleading.
select cast(cast ('["1","2","3"]' as json) as array<varchar>);
The problem was that I was running an old version of Presto.
unnest was added in version 0.79
https://github.com/facebook/presto/blob/50081273a9e8c4d7b9d851425211c71bfaf8a34e/presto-docs/src/main/sphinx/release/release-0.79.rst
I have a table with numeric/decimal columns, and I am converting the rows to JSON:
select to_jsonb(t.*) from my_table t
I need the numeric columns cast to text before they are converted to JSON.
The reason is that JavaScript doesn't handle really big numbers well, so I may lose precision. I use decimal.js, and a string representation is the best way to construct a decimal.js number.
I know I can do this
select to_jsonb(t.*) || jsonb_build_object('numeric_column', numeric_column::text) from my_table t
But I want to have it done automatically. Is there a way to somehow cast all numeric columns to text before passing them to the to_jsonb function?
It can be a user-defined Postgres function.
EDIT: Just to clarify my question: what I need is some function similar to to_jsonb, except that all columns of type numeric/decimal are stored as strings in the resulting JSON.
Thanks
You can run a query like:
select row_to_json(row(t.column1,t.column2,t.column_numeric::text)) from my_table t
Result here
This solution converts all the JSON values into text:
SELECT jsonb_object_agg(d.key, d.value)
FROM my_table AS t
CROSS JOIN LATERAL jsonb_each_text(to_jsonb(t.*)) AS d
GROUP BY t
whereas this solution only converts JSON numbers into text:
SELECT jsonb_object_agg(d.key, CASE WHEN jsonb_typeof(d.value) = 'number' THEN to_jsonb(d.value :: text) ELSE d.value END)
FROM my_table AS t
CROSS JOIN LATERAL jsonb_each(to_jsonb(t.*)) AS d
GROUP BY t
test result in dbfiddle.
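The number-to-string rewrite the second query performs can also be expressed outside the database. This Python sketch (a hypothetical client-side helper, not part of Postgres) shows the intended transformation: only numeric values become strings, everything else is untouched:

```python
from decimal import Decimal

def stringify_numbers(record):
    """Return a copy of a row dict with numeric values rendered as strings,
    mirroring the jsonb_typeof(...) = 'number' branch of the SQL above.
    bool is excluded because it is a subclass of int in Python."""
    return {
        k: str(v) if isinstance(v, (int, float, Decimal)) and not isinstance(v, bool) else v
        for k, v in record.items()
    }

# Invented sample row with a numeric too big for JavaScript's Number type
row = {"id": 1, "price": Decimal("12345678901234567890.12"), "name": "widget", "active": True}
safe = stringify_numbers(row)
# safe["price"] is now the exact string "12345678901234567890.12"
```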
I have a JSON column and the data stored looks like:
{"results":{"made":true,"cooked":true,"eaten":true}}
{"results":{"made":true,"cooked":true,"eaten":false}}
{"results":{"made":true,"eaten":true,"a":false,"b":true,"c":false}, "more": {"ignore":true}}
I need to find all rows where 1+ values in $.results is false.
I tried using JSON_CONTAINS() but didn't find a way to get it to compare to a boolean JSON value, or to look at all values in $.results.
This needs to work with MySQL 5.7 but if it's not possible I will accept a MySQL 8+ answer.
I don't know of a way to search for a JSON true/false/null value using the JSON functions; in practice these values are treated as string-type values when searching with JSON_CONTAINS, JSON_SEARCH, etc.
Use a regular expression for the check. Something like:
SELECT id,
JSON_PRETTY(jsondata)
FROM test
WHERE jsondata REGEXP '"results": {[^}]+: false.*}';
DEMO
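To see what the pattern matches (and its sensitivity to spacing), the same check can be mirrored in Python. Note that MySQL normalizes stored JSON with a space after each colon, which this pattern relies on:

```python
import re

# Same pattern as the SQL REGEXP: a false value somewhere inside the
# "results" object (depends on MySQL's normalized JSON formatting)
pattern = re.compile(r'"results": \{[^}]+: false.*\}')

rows = [
    '{"results": {"made": true, "cooked": true, "eaten": true}}',
    '{"results": {"made": true, "cooked": true, "eaten": false}}',
]
matches = [bool(pattern.search(r)) for r in rows]  # [False, True]
```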
You could simply search the JSON_EXTRACT output using a LIKE condition, this way:
SELECT * FROM table1 WHERE JSON_EXTRACT(json_dict, '$.results') LIKE '%: false%';
Check this DB FIDDLE
An alternative to the pattern matching in the other answers is to extract all values from $.results and check each entry against a helper table seq of running numbers (0, 1, 2, ...):
SELECT DISTINCT v.id, v.json_value
FROM (
SELECT id, json_value, JSON_EXTRACT(json_value, '$.results.*') value_array
FROM json_table
) v
JOIN seq ON seq.n < JSON_LENGTH(v.value_array)
WHERE JSON_EXTRACT(v.value_array, CONCAT('$[', seq.n, ']')) = false
Here is the demo
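What the seq-table join computes is simply "does any value under $.results equal false". A Python rendering of that predicate, using the sample rows from the question:

```python
import json

rows = [
    (1, '{"results": {"made": true, "cooked": true, "eaten": true}}'),
    (2, '{"results": {"made": true, "cooked": true, "eaten": false}}'),
    (3, '{"results": {"made": true, "eaten": true, "a": false, "b": true}, "more": {"ignore": true}}'),
]

# JSON_EXTRACT('$.results.*') plus the seq scan boils down to this any() check
hits = [row_id for row_id, doc in rows
        if any(v is False for v in json.loads(doc)["results"].values())]
# hits == [2, 3]
```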
I have the following table:
I need to create a select that returns me something like this:
I have tried this code:
SELECT Code, json_extract_path(Registers::json,'sales', 'name')
FROM tbl_registers
The previous code returns NULL from json_extract_path, and I have tried the operator ::json->'sales'->>'name', but that doesn't work either.
You need to unnest the array and then aggregate the names back. This can be done using json_array_elements with a scalar subquery:
select code,
(select string_agg(e ->> 'name', ',')
from json_array_elements(t.products) as x(e)) as products
from tbl_registers t;
I would also strongly recommend changing your column's type to jsonb.
step-by-step demo:db<>fiddle
SELECT
code,
string_agg( -- 3
elems ->> 'name', -- 2
','
) as products
FROM tbl_registers,
json_array_elements(products::json) as elems -- 1
GROUP BY code
1. If the column has type text (strictly not recommended; please use an appropriate data type, json or jsonb), you need to cast it to type json first (I assume you have type text because you already do the cast in your example code). Afterwards, extract the array elements into one row per element.
2. Fetch the name value.
3. Reaggregate by grouping, and use string_agg() to create the string list.
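Since the original table is only shown as an image, here's a small Python rendering of the same three steps with invented sample data (the (code, products) shape follows the query above):

```python
import json
from collections import defaultdict

# Invented sample data: (code, products-as-JSON-array-of-objects)
rows = [
    (1, '[{"name": "Soda"}, {"name": "Beer"}]'),
    (2, '[{"name": "Milk"}]'),
]

agg = defaultdict(list)
for code, products in rows:
    for elem in json.loads(products):   # 1: json_array_elements, one row per element
        agg[code].append(elem["name"])  # 2: elems ->> 'name'

# 3: GROUP BY code plus string_agg(..., ',')
result = {code: ",".join(names) for code, names in agg.items()}
```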
I am storing event data in S3 and want to use Athena to query the data. One of the fields is a dynamic JSON field that I do not know the field names for. Therefore, I need to query the keys in the JSON and then use those keys to query for the first non-null for that field. Below is an example of the data stored in S3.
{
  timestamp: 1558475434,
  request_id: "83e21b28-7c12-11e9-8f9e-2a86e4085a59",
  user_id: "example_user_id_1",
  traits: {
    this: "is",
    dynamic: "json",
    as: ["defined", "by", "the", "client"]
  }
}
So, I need a query to extract the keys from the traits column (which is stored as JSON), and use those keys to get the first non-null value for each field.
The closest I could get was sampling a value using min_by, but this doesn't allow me to add a WHERE clause without returning null values. I will need to use Presto's first_value, but I cannot get it to work with the keys extracted from the dynamic JSON field.
SELECT DISTINCT trait, min_by(json_extract(traits, concat('$.', cast(trait AS varchar))), received_at) AS value
FROM TABLE
CROSS JOIN UNNEST(regexp_extract_all(traits,'"([^"]+)"\s*:\s*("[^"]+"|[^,{}]+)', 1)) AS t(trait)
WHERE json_extract(traits, concat('$.', cast(trait AS varchar))) IS NOT NULL OR json_size(traits, concat('$.', cast(trait AS varchar))) <> 0
GROUP BY trait
It's not clear to me what you expect as a result, and what you mean by "first non-null value". Your example has both string and array values, and none of them is null. It would be helpful if you provided more examples and the expected output.
As a first step towards a solution, here's a way to filter out the null values from traits:
If you set the type of the traits column to map<string,string> you should be able to do something like this:
SELECT
  request_id,
  MAP_AGG(trait_key, trait_value) AS trait
FROM (
  SELECT
    request_id,
    trait_key,
    trait_value
  FROM some_table
  CROSS JOIN UNNEST (trait) AS t (trait_key, trait_value)
  WHERE trait_value IS NOT NULL
)
GROUP BY request_id
However, if you also want to filter values that are arrays and pick out the first non-null value, it becomes more complex. It could probably be done with a combination of casts to JSON, the filter function, and COALESCE.
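Since the expected output is ambiguous, here is one reading of "first non-null value per trait key", sketched in Python under the assumption that "first" means smallest timestamp (the data below is invented):

```python
# Client-side sketch: first non-null value per trait key across records,
# taking "first" to mean the record with the smallest timestamp.
records = [
    {"timestamp": 1558475434, "traits": {"this": "is", "dynamic": None}},
    {"timestamp": 1558475500, "traits": {"dynamic": "json", "as": ["defined", "by"]}},
]

first_values = {}
for rec in sorted(records, key=lambda r: r["timestamp"]):  # like min_by(..., received_at)
    for key, value in rec["traits"].items():
        if value is not None and key not in first_values:
            first_values[key] = value
# first_values == {"this": "is", "dynamic": "json", "as": ["defined", "by"]}
```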
I'm trying to query a table with a JSON column which will always hold an array of "primitive" values (i.e. integers, strings, booleans -- not objects or arrays).
My query should be similar to [ref2], but I can't do ->>'id' because I'm not trying to access a JSON object but the value itself.
In the [ref1] fiddle (a blatant fork of the above), there's an incomplete query... I'd like to query all things which contain 3 among their values.
Even more so, I'd like some rows to have arrays of strings, others arrays of integers, and others arrays of booleans... so casting is undesirable.
I believe ->> returns the original JSON value type, but I need the "root" object... That is, my JSON value is [1,2,3,4], using json_array_elements should yield e.g. 2, but that is a JSON type according to my tests.
Upgrading to 9.4 is planned in the near future, but I haven't read anything yet that gave me a clue jsonb would help me.
UPDATE: at the moment, I'm (1) making sure all values are integers (mapping non-integer values to integers), which is suboptimal; (2) querying like this:
SELECT *
FROM things, json_array_elements(things.values) AS vals
WHERE vals.value::text::integer IN (1,2,3);
I need the double cast (otherwise it complains that it cannot cast type json to integer).
ref1: http://sqlfiddle.com/#!15/5febb/1
ref2: How to query an array of JSON in PostgreSQL 9.3?
Rather than using json_array_elements you can unpack the array with generate_series, using the ->> operator to extract a text representation.
SELECT things.*
FROM things
CROSS JOIN generate_series(0, json_array_length(values) - 1) AS idx
WHERE values ->> idx = '1'
GROUP BY things.id;
This is a workaround for the lack of json_array_elements_text in 9.3.
You need an operator(=) for json to do this without either messing with casts or relying on the specific textual representation of integers, booleans, etc., and operator(=) is only available for jsonb. So on 9.3 you're stuck with using the text representation (so 1.00 won't equal 1) or casting to a PostgreSQL type based on the element type.
In 9.4 you could use to_json and the jsonb operator(=), e.g.:
SELECT things.*
FROM things
CROSS JOIN generate_series(0, json_array_length(values) - 1) AS idx
WHERE (values -> idx)::jsonb = to_json(1)::jsonb
GROUP BY things.id;
id | date | values
----+-------------------------------+---------
1 | 2015-08-09 04:54:38.541989+08 | [1,2,3]
(1 row)
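The difference between matching on the text representation and on typed (jsonb-style) equality can be mimicked in Python. This is an analogy, not Postgres behavior; note in particular that text matching also catches the string "1" (->> strips the quotes from JSON strings), while 1.00 slips through because its text form differs:

```python
import json

# The array under test; note 1.00 parses to the float 1.0
arr = json.loads('[1, 1.00, "1", 3]')

# Text-representation matching (the 9.3 route): the string "1" matches too,
# while 1.00 does not, because its text form is "1.0"
text_matches = [v for v in arr if str(v) == "1"]

# Typed (jsonb-style) matching: numeric equality, so 1.00 == 1 but "1" != 1
typed_matches = [v for v in arr if v == json.loads("1")]
```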