I have a Postgres table that has content similar to this:
id | data
1 | {"a":"4", "b":"5"}
2 | {"a":"6", "b":"7"}
3 | {"a":"8", "b":"9"}
The first column is an integer and the second is a json column.
I want to be able to expand out the keys and values from the json so the result looks like this:
id | key | value
1 | a | 4
1 | b | 5
2 | a | 6
2 | b | 7
3 | a | 8
3 | b | 9
Can this be achieved in PostgreSQL?
What I've tried
Given that the original table can be simulated as such:
select *
from (
  values
    (1, '{"a":"4", "b":"5"}'::json),
    (2, '{"a":"6", "b":"7"}'::json),
    (3, '{"a":"8", "b":"9"}'::json)
) as q (id, data)
I can get just the keys using:
select id, json_object_keys(data::json)
from (
  values
    (1, '{"a":"4", "b":"5"}'::json),
    (2, '{"a":"6", "b":"7"}'::json),
    (3, '{"a":"8", "b":"9"}'::json)
) as q (id, data)
And I can get them as record sets like this:
select id, json_each(data::json)
from (
  values
    (1, '{"a":"4", "b":"5"}'::json),
    (2, '{"a":"6", "b":"7"}'::json),
    (3, '{"a":"8", "b":"9"}'::json)
) as q (id, data)
But I can't work out how to achieve the result with id, key and value.
Any ideas?
Note: the real json I'm working with is significantly more nested than this, but I think this example represents my underlying problem well.
SELECT q.id, d.key, d.value
FROM q
JOIN json_each_text(q.data) d ON true
ORDER BY 1, 2;
The function json_each_text() is a set-returning function, so you should use it as a row source. The output of the function is joined laterally to the table q: for each row in the table, each (key, value) pair from the data column is joined only to that row, so the relationship between the original row and the rows formed from the json object is maintained.
The table q can also be a very complicated sub-query (or a VALUES clause, like in your question). The function is applied to the appropriate column of that sub-query's result, so you only reference the alias of the sub-query and the (alias of the) column within it.
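For example, combining the lateral join with the VALUES simulation from the question (the explicit LATERAL keyword is optional for set-returning functions, but spelling it out shows what happens):

SELECT q.id, d.key, d.value
FROM (
  VALUES
    (1, '{"a":"4", "b":"5"}'::json),
    (2, '{"a":"6", "b":"7"}'::json),
    (3, '{"a":"8", "b":"9"}'::json)
) AS q (id, data)
CROSS JOIN LATERAL json_each_text(q.data) AS d
ORDER BY 1, 2;

which returns exactly the requested result:

id | key | value
1  | a   | 4
1  | b   | 5
2  | a   | 6
2  | b   | 7
3  | a   | 8
3  | b   | 9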
This will solve it as well:

select your_table.id, js.key, js.value
from your_table, json_each(your_table.data) as js
Another way that I find very easy to work with when you have multiple json documents to join is something like:
SELECT data -> 'key' AS key,
       data -> 'value' AS value
FROM (SELECT hstore(json_each_text(data)) AS data
      FROM "your_table") t;
You can also do:

select js.key, js.value
from metadata, json_each(metadata.column_metadata) as js
where id = '6eec';
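If the column is jsonb rather than json, the same pattern works with the jsonb_* variants (jsonb_each(), jsonb_each_text()):

select id, js.key, js.value
from your_table, jsonb_each_text(data) as js;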
Related
I have a table with a json column that contains an array of objects, like the following:
create table test_json (json_id int not null primary key, json_data json not null)
select 1 as json_id, '[{"category":"circle"},{"category":"square", "qualifier":"def"}]' as json_data
union
select 2 as json_id, '[{"category":"triangle", "qualifier":"xyz"},{"category":"square"}]' as json_data;
+---------+----------------------------------------------------------------------+
| json_id | json_data                                                            |
+---------+----------------------------------------------------------------------+
|       1 | [{"category":"circle"}, {"category":"square", "qualifier":"def"}]   |
|       2 | [{"category":"triangle", "qualifier":"xyz"}, {"category":"square"}] |
+---------+----------------------------------------------------------------------+
I'd like to be able to query this table to look for any rows (json_id's) that contain a json object in the array with both a "category" value of "square" and no "qualifier" property.
The sample table above is just a sample and I'm looking for a query that would work over hundreds of rows and hundreds of objects in the json array.
In MySQL 8.0, you would use JSON_TABLE() for this:
mysql> select json_id, j.* from test_json, json_table(json_data, '$[*]' columns (
category varchar(20) path '$.category',
qualifier varchar(10) path '$.qualifier')) as j
where j.category = 'square' and j.qualifier is null;
+---------+----------+-----------+
| json_id | category | qualifier |
+---------+----------+-----------+
| 2 | square | NULL |
+---------+----------+-----------+
It's not clear why you would use JSON for this at all. It would be better to store the data in the normal manner, one row per object, with category and qualifier as individual columns.
A query against normal columns is a lot simpler to write, and you can optimize the query easily with an index:
select * from mytable where category = 'square' and qualifier is null;
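For instance, the search could then be satisfied efficiently with a composite index over the two columns (same hypothetical table and column names as above):

CREATE INDEX idx_category_qualifier ON mytable (category, qualifier);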
I found another solution using only MySQL 5.7 JSON functions:
select json_id, json_data from test_json
where json_extract(json_data,
concat(
trim(trailing '.category' from
json_unquote(json_search(json_data, 'one', 'square'))
),
'.qualifier')
) is null
This assumes the value 'square' only occurs as a value for a "category" field. This is true in your simple example, but I don't know if it will be true in your real data.
Result:
+---------+------------------------------------------------------------------------+
| json_id | json_data |
+---------+------------------------------------------------------------------------+
| 2 | [{"category": "triangle", "qualifier": "xyz"}, {"category": "square"}] |
+---------+------------------------------------------------------------------------+
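To see the mechanics of the path manipulation: JSON_SEARCH with 'one' returns the path of the first match as a JSON string, which JSON_UNQUOTE strips, for example:

select json_unquote(json_search('[{"category":"triangle"},{"category":"square"}]', 'one', 'square'));
-- returns: $[1].category

Trimming the trailing '.category' leaves $[1], and appending '.qualifier' yields the path of the qualifier in the same array element, which JSON_EXTRACT then tests for NULL.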
I still think that it's a code smell anytime you reference JSON columns in a condition in the WHERE clause. I understood your comment that this is a simplified example, but regardless of the JSON structure, if you need search conditions, your queries will be far easier to develop if the data is stored in conventional columns in normalized tables.
Your request is not quite clear: neither of your rows has such properties at the top level, but objects inside your JSON arrays do. Assuming you are trying to find any record that contains such an object, the following is an answer (note that it only inspects the first two array elements):
create table test_json (json_id int not null primary key, json_data json not null)
select 1 as json_id, '[{"category":"circle", "qualifier":"abc"},{"category":"square", "qualifier":"def"}]' as json_data
union
select 2 as json_id, '[{"category":"triangle", "qualifier":"xyz"},{"category":"square"}]' as json_data;
select * from test_json;
select * from test_json
where 'square' in (JSON_EXTRACT(json_data, '$[0].category'), JSON_EXTRACT(json_data, '$[1].category'))
and (JSON_EXTRACT(json_data, '$[0].qualifier') is null or JSON_EXTRACT(json_data, '$[1].qualifier') is null);
Also see JSON Function Reference
In MySQL (or SQL in general), is it possible to generate a list of pre-defined identifiers, joined with matching table data?
Take for instance the following table data, let's call it my_table:
id | value
---+------
1 | 'a'
3 | 'c'
Now, I have a list of possible id values and would like to get a full list of these values, together with joined data from the table above. With a list [1, 2, 3, 4], the desired result is:
item | id | value
-----+------+------
1 | 1 | 'a'
2 | NULL | NULL
3 | 3 | 'c'
4 | NULL | NULL
Obviously, a query like SELECT * FROM my_table WHERE id IN (1, 2, 3, 4) yields only results for two rows (values 'a' and 'c').
For a solution, I am thinking along the line of some form of temporary table, fed with the full list of id's ([1, 2, 3, 4]) and left joining that with the table data, such as
SELECT t1.`item`, t2.`id`, t2.`value`
FROM
...
AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
But how do I do that?
Is this even possible? Or is it really necessary to compare the result with the initial list in external code? (That would be possible, but not trivial, as in my case the identifiers are not integers.)
(The ultimate idea of this, is that I would like a result set from the DB with all input id's so that I can easily identify the non-existing records)
Update: I guess it boils down to the question: how can I get a result set such as
id
---
1
2
3
4
from a (My)SQL server without having this as data in some table, but from setting the data in some query?
A new approach flashed into my mind... using a union.
SELECT t1.`item`, t2.`id`, t2.`value`
FROM (
select 1 as `item`
union select 2
union select 3
union select 4
) AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
It answers the question, but it remains to be seen whether this is the 'best' answer. It works as long as the list of items is not too long (which is the case for me).
Anyone a better solution?
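One later-added alternative: if you are on MySQL 8.0.19 or newer, a VALUES row constructor can replace the UNION chain (a sketch, using the same table and column names as above):

SELECT t1.item, t2.id, t2.value
FROM (VALUES ROW(1), ROW(2), ROW(3), ROW(4)) AS t1 (item)
LEFT JOIN my_table AS t2 ON t2.id = t1.item;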
My table is like this:
create table alphabet_soup(
  id numeric,
  "index" json
);
My data looks like this (each json value is one big array of key/value objects):

(1, '[{"key":1,"value":"A"},{"key":2,"value":"C"},{"key":3,"value":"C"}, ... ,{"key":600,"value":"B"}]')
How do I sum across the json the number of occurrences of A and of B, and compute the % of occurrence of each value? I have about 6 different types of values (ABCDEF), but for simplicity I am just looking at a comparison of 3 values.
I am trying to find something to help me calculate the % of occurrence of a value from a key/value pair in json. I am using postgres 9.4. I am new to both json and postgres, and I keep landing on the same json functions manual page of postgres over and over.
I have managed to find a sum, but how do I calculate the % in a nested select and display the values in decreasing order of occurrence, as follows:
value | occurrence | %
------+------------+----
A     | 300        | 50
B     | 198        | 33
C     | 102        | 17
The script I am using for the sum is:
select id, index->'key'::key as key
sum(case when (1,index::json->'1')::text = (1,index::json->'2')::text
then 1
else 0
end)/count(id) as res
from
alphabet_soup
group by id;
limit 10;
I get an output as follows:
column "alphabet_soup.id" must appear in the group by clause or be used in an aggregate function.
The easiest way to do this is to expand the json document into a regular row set using the json_each_text() function. Every json document then becomes a set of rows, and you can apply aggregate functions as you would on any other row set. However, you need to use the function as a row source (section 7.2.1.4 of the manual), since it returns a set of rows, and then select the value field, which holds the category of interest. Note that the function uses a field of the table through an implicit LATERAL join (section 7.2.1.5).
SELECT id, value
FROM alphabet_soup, json_each_text("index");
which yields something like:
test=# SELECT id, value FROM alphabet_soup, json_each_text("index");
id | value
----+-------
1 | A
1 | C
1 | C
1 | B
To this you can apply regular aggregate functions over the appropriate windows to get the result you are looking for:
SELECT DISTINCT id, value,
count(value) OVER (PARTITION BY id, value) AS occurrence,
count(value) OVER (PARTITION BY id, value) * 100.0 /
count(id) OVER (PARTITION BY id) AS percentage
FROM (
SELECT id, value
FROM alphabet_soup, json_each_text("index") ) sub
ORDER BY id, value;
Which gives a result like:
id | value | occurrence | percentage
----+-------+------------+---------------------
1 | A | 1 | 25.0000000000000000
1 | B | 1 | 25.0000000000000000
1 | C | 2 | 50.0000000000000000
This will work for any number of categories (ABCDEF) and any number of ids.
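If instead you want the totals across the whole table in the exact shape asked for (value | occurrence | %), a plain GROUP BY with a window over the aggregate is one way to write it (a sketch against the same table):

SELECT value,
       count(*) AS occurrence,
       round(count(*) * 100.0 / sum(count(*)) OVER (), 0) AS percentage
FROM alphabet_soup, json_each_text("index")
GROUP BY value
ORDER BY occurrence DESC;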
For fun, I added some more to the code to compare the percentages in the result set:

WITH q1 AS (
  SELECT DISTINCT id, value,
         count(value) OVER (PARTITION BY id, value) AS occurrence,
         count(value) OVER (PARTITION BY id, value) * 100.0 /
         count(id) OVER (PARTITION BY id) AS percentage
  FROM (SELECT id, value FROM alphabet_soup, json_each_text("index")) sub
  ORDER BY id, value
)
SELECT DISTINCT id, value, least(percentage)
FROM q1
WHERE least(percentage) > 20
ORDER BY id, value;
The output for this is:
id | value | least
----+-------+--------
1 | B | 33
1 | C | 50
I have a Redshift table that looks like this:
id | metadata
---------------------------------------------------------------------------
1 | [{"pet":"dog"},{"country":"uk"}]
2 | [{"pet":"cat"}]
3 | []
4 | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]
All array elements have just one field.
There is no guarantee that a particular field will feature in any of an array's elements.
A field name can be repeated in an array
The array elements can be in any order
I am wanting to get back a table that looks like this:
id | field | value
------------------------
1 | pet | dog
1 | country | uk
2 | pet | cat
4 | country | germany
4 | education | masters
4 | country | belgium
I can then combine this with my queries on the rest of the input table.
I have tried playing around with the Redshift JSON functions, but without being able to write functions/use loops/have variables in Redshift, I really can't see a way to do this!
Please let me know if I can clarify anything else.
Thanks to an inspired blog post, I've been able to craft a solution. This is:
Create a look-up table to effectively 'iterate' over the elements of each array. The number of rows in this table has to be equal to or greater than the maximum number of elements in any array. Let's say this is 4 (it can be calculated using SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table):
CREATE VIEW seq_0_to_3 AS (
  SELECT 0 AS i UNION ALL
  SELECT 1 UNION ALL
  SELECT 2 UNION ALL
  SELECT 3
);
From this, we can create one row per JSON element:
WITH exploded_array AS (
SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
FROM input_table, seq_0_to_3 AS seq
WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT *
FROM exploded_array;
Producing:
id | json
------------------------------
1 | {"pet":"dog"}
1 | {"country":"uk"}
2 | {"pet":"cat"}
4 | {"country":"germany"}
4 | {"education":"masters"}
4 | {"country":"belgium"}
However, I still needed to extract the field names/values. As I can't see any way to extract JSON field names using Redshift's limited functions, I did this with a regular expression:
WITH exploded_array AS (
SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
FROM input_table, seq_0_to_3 AS seq
WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field)
FROM (
  SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
  FROM exploded_array
) AS fields;
There is a generic version of CREATE VIEW seq_0_to_3. Let's call it CREATE VIEW seq_0_to_n. It can be generated by:
CREATE VIEW seq_0_to_n AS (
SELECT row_number() over (
ORDER BY TRUE)::integer - 1 AS i
FROM <insert_large_enough_table> LIMIT <number_less_than_table_entries>);
This helps in generating large sequences as a view.
It's now possible in Redshift to treat strings in either array format [] or json format {} as parsable json structures. First let's make a temp table based on your data:
create temporary table #t1 (id int, json_str varchar(100));
truncate table #t1;
insert into #t1 values (1, '[{"pet":"dog"},{"country":"uk"}]');
insert into #t1 values (2, '[{"pet":"cat"}]');
insert into #t1 values (3, '[]');
insert into #t1 values (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]');
This common table expression (CTE) implicitly converts the json_str field into a formal json structure of SUPER type via json_parse(). If the table's field were already of SUPER type, we could skip this step.
drop table if exists #t2;
create temporary table #t2 as
with cte as
(select
x.id,
json_parse(x.json_str) as json_str -- convert string to SUPER structure
from
#t1 x
)
select
x.id
,unnested
from
cte x, x.json_str as unnested -- an alias of cte and x.json_str is required!
order by
id
;
Now we have an exploded list of key/value pairs to easily extract:
select
t2.id
,json_key -- this is the extracted key
,cast(json_val as varchar) as json_val -- eliminates the double quote marks
from
#t2 t2, unpivot t2.unnested as json_val at json_key --"at some_label" (e.g. json_key) will extract the key
order by
id
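Run against the sample rows above, this should produce exactly the shape requested in the original question (the empty array for id 3 contributes no rows):

id | json_key  | json_val
1  | pet       | dog
1  | country   | uk
2  | pet       | cat
4  | country   | germany
4  | education | masters
4  | country   | belgium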
A different way to render the info is to allow the parsing engine to turn keys into columns. This isn't what you asked for, but potentially interesting:
select
id
,cast(t2.unnested.country as varchar) -- data is already parsed into rows, so it's directly addressable now
,cast(t2.unnested.education as varchar)
,cast(t2.unnested.pet as varchar)
from
#t2 t2
;
If you want more info on this, use a search engine to search for parsing the SUPER data type. If the data already existed as SUPER in the Redshift table, these latter 2 queries would work natively against the table, no need for a temp table.
I'm looking for something like
SELECT
`foo`.*,
(SELECT MAX(`foo`.`bar`) FROM `foo`)
FROM
(SELECT * FROM `fuz`) AS `foo`;
but it seems that foo is not recognized in the nested query, as there is an error like
[Err] 1146 - Table 'foo' doesn't exist
I tried the query above because I think it's faster than something like
SELECT
`fuz`.*,
(SELECT MAX(`bar`) FROM `fuz`) as max_bar_from_fuz
FROM `fuz`
Please give me some suggestions.
EDIT: I am looking for solutions with better performance than the second query. Please assume that my table fuz is a very, very big one, thus running an additional query getting max_bar cost me a lot.
What you want, for the first query (with some modification) to work, is called a Common Table Expression; MySQL did not have that feature when this was asked (it was added in MySQL 8.0).
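For reference, written with a CTE on MySQL 8.0+, the original idea works as intended (a sketch):

WITH foo AS (SELECT * FROM fuz)
SELECT foo.*,
       (SELECT MAX(bar) FROM foo) AS max_bar
FROM foo;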
If your second query does not perform well, you can use this:
SELECT
fuz.*,
fuz_grp.max_bar
FROM
fuz
CROSS JOIN
( SELECT MAX(bar) AS max_bar
FROM fuz
) AS fuz_grp
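Since the derived table fuz_grp yields exactly one row, the CROSS JOIN simply appends max_bar to every row of fuz without multiplying rows, and the aggregate is computed only once rather than per row.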
An alias created in a SELECT clause can only be used to access scalar values; it is not a synonym for a table. If you want to return the max value of a column alongside all returned rows, you can run a query beforehand that calculates the max value into a variable, and then use this variable as a scalar value in your query, like:
-- create and populate a table to demonstrate concept
CREATE TABLE fuz (bar INT, col0 VARCHAR(20), col1 VARCHAR(20) );
INSERT INTO fuz(bar, col0, col1) VALUES (1, 'A', 'Airplane');
INSERT INTO fuz(bar, col0, col1) VALUES (2, 'B', 'Boat');
INSERT INTO fuz(bar, col0, col1) VALUES (3, 'C', 'Car');
-- create the scalar variable with the value of MAX(bar)
SELECT @max_foo := MAX(bar) FROM fuz;
-- use the scalar variable into the query
SELECT *, @max_foo AS `MAX_FOO`
FROM fuz;
-- result:
-- | BAR | COL0 | COL1 | MAX_FOO |
-- |-----|------|----------|---------|
-- | 1 | A | Airplane | 3 |
-- | 2 | B | Boat | 3 |
-- | 3 | C | Car | 3 |
Or simply use the MAX function (note that, without a GROUP BY, this collapses the result to a single row, and is rejected when ONLY_FULL_GROUP_BY is enabled):
SELECT
`fuz`.*,
MAX(`fuz`.`bar`)
FROM
`fuz`
or, if you build foo from a join:
SELECT
`foo`.*,
MAX(`foo`.`bar`)
FROM
(SELECT * FROM `fuz` JOIN `loolse` ON (`fuz`.`field` = `loolse`.`smile`)) AS `foo`;