Background
I have a MySQL table with a JSON field. That field stores an array of JSON objects.
The order is not always the same, so I need to get the path to a key for update operations.
Entity | ID, jsonField |
Entity | 1, [{clazz:'health', hp:'100'},{...},{...}] // Health at the first index
Entity | 2, [{...},{...},{clazz:'health', hp:'25'}] // Health at the last index
The question
How do I get the path to the .hp field for every single entity in order to update its value? Or, a bit more precisely: how do we set the .hp field to, let's say, 100 in every entity's jsonField array, regardless of its position?
You can use a recursive CTE to find the index for the hp key in the array (if it exists), and then use that result to make the update:
with recursive cte(id, js, ind, f) as (
select e.id, e.jsonfield, 0, json_extract(e.jsonfield, '$[0].hp') is not null from entities e
union all
select c.id, c.js, c.ind+1, json_extract(c.js, concat('$[', c.ind+1, '].hp')) is not null from cte c where not c.f and c.ind+1 < json_length(c.js)
),
inds(id, ind) as (select id, ind from cte where f)
update entities e join inds i on e.id = i.id set e.jsonfield = json_set(e.jsonfield, concat('$[', i.ind, '].hp'), '100');
select * from entities;
Output:
entity:
id | jsonfield
---+------------------------------------------------------------
 1 | [{"hp": "100", "clazz": "health"}, {"x": "1"}, {"y": "2"}]
 2 | [{"x": "1"}, {"y": "2"}, {"hp": "100", "clazz": "health"}]
I have a PostgreSQL table with unique key/value pairs, which were originally in a JSON format, but have been normalized and melted:
key | value
-----------------------------
name | Bob
address.city | Vancouver
address.country | Canada
I need to turn this into a hierarchical JSON:
{
"name": "Bob",
"address": {
"city": "Vancouver",
"country": "Canada"
}
}
Is there a way to do this easily within SQL?
jsonb_set() almost does everything for you, but unfortunately it can only create missing leaves (i.e. a missing last key on a path), not whole missing branches. To overcome this, here is a modified version of it, which can set values on any missing levels:
create function jsonb_set_rec(jsonb, jsonb, text[])
returns jsonb
language sql
as $$
select case
when array_length($3, 1) > 1 and ($1 #> $3[:array_upper($3, 1) - 1]) is null
then jsonb_set_rec($1, jsonb_build_object($3[array_upper($3, 1)], $2), $3[:array_upper($3, 1) - 1])
else jsonb_set($1, $3, $2, true)
end
$$;
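A quick sanity check of the difference, using a value from the question (output shown in comments):

select jsonb_set('{}'::jsonb, '{address,city}', '"Vancouver"', true);
-- returns {} : plain jsonb_set() cannot create the missing "address" branch
select jsonb_set_rec('{}'::jsonb, '"Vancouver"'::jsonb, array['address','city']);
-- returns {"address": {"city": "Vancouver"}}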
Now you only need to apply this function to your rows one by one, starting with an empty JSON object: {}. (I assume a grouping column grp that ties together the rows belonging to one object, and a pk column that orders them, as used below.) You can do this either with a recursive CTE:
with recursive props as (
(select distinct on (grp)
pk, grp, jsonb_set_rec('{}', to_jsonb(value), string_to_array(key, '.')) json_object
from eav_tbl
order by grp, pk)
union all
(select distinct on (grp)
eav_tbl.pk, grp, jsonb_set_rec(json_object, to_jsonb(value), string_to_array(key, '.'))
from props
join eav_tbl using (grp)
where eav_tbl.pk > props.pk
order by grp, eav_tbl.pk)
)
select distinct on (grp)
grp, json_object
from props
order by grp, pk desc;
Or, with a custom aggregate defined as:
create aggregate jsonb_set_agg(jsonb, text[]) (
sfunc = jsonb_set_rec,
stype = jsonb,
initcond = '{}'
);
your query becomes as simple as:
select grp, jsonb_set_agg(to_jsonb(value), string_to_array(key, '.'))
from eav_tbl
group by grp;
Online example: https://rextester.com/TULNU73750
There are no ready-to-use tools for this. The function below generates a hierarchical JSON object based on a path:
create or replace function jsonb_build_object_from_path(path text, value text)
returns jsonb language plpgsql as $$
declare
obj jsonb;
keys text[] := string_to_array(path, '.');
level int := cardinality(keys);
begin
obj := jsonb_build_object(keys[level], value);
while level > 1 loop
level := level - 1;
obj := jsonb_build_object(keys[level], obj);
end loop;
return obj;
end $$;
You also need the aggregate function jsonb_merge_agg(jsonb) described in this answer. Its definition is not repeated here, but as a rough sketch (my own, not the referenced one), a deep-merge aggregate over nested objects could look like this, with right-hand values winning on scalar conflicts:
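create or replace function jsonb_merge_deep(a jsonb, b jsonb)
returns jsonb language plpgsql immutable as $$
begin
    if jsonb_typeof(a) = 'object' and jsonb_typeof(b) = 'object' then
        -- merge the key sets pairwise, recursing where a key exists on both sides
        return coalesce(
            (select jsonb_object_agg(
                        coalesce(ka, kb),
                        case
                            when va is null then vb
                            when vb is null then va
                            else jsonb_merge_deep(va, vb)
                        end)
             from jsonb_each(a) e1(ka, va)
             full join jsonb_each(b) e2(kb, vb) on ka = kb),
            '{}'::jsonb);
    end if;
    return b;  -- scalars and arrays: the right-hand side wins
end $$;

create aggregate jsonb_merge_agg(jsonb) (
    sfunc = jsonb_merge_deep,
    stype = jsonb,
    initcond = '{}'
);

With both pieces in place, the query: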
with my_table (path, value) as (
values
('name', 'Bob'),
('address.city', 'Vancouver'),
('address.country', 'Canada'),
('first.second.third', 'value')
)
select jsonb_merge_agg(jsonb_build_object_from_path(path, value))
from my_table;
gives this object:
{
"name": "Bob",
"first":
{
"second":
{
"third": "value"
}
},
"address":
{
"city": "Vancouver",
"country": "Canada"
}
}
The function does not recognize JSON arrays.
I can't really think of something simpler, although I think there should be an easier way.
I assume there is some additional column that can be used to bring the keys belonging to one "person" together; I used p_id for that in my example.
select p_id,
jsonb_object_agg(k, case level when 1 then v -> k else v end)
from (
select p_id,
elements[1] k,
jsonb_object_agg(case cardinality(elements) when 1 then ky else elements[2] end, value) v,
max(cardinality(elements)) as level
from (
select p_id,
"key" as ky,
string_to_array("key", '.') as elements, value
from kv
) t1
group by p_id, k
) t2
group by p_id;
The innermost query just converts the dot notation to an array for easier access later.
The next level then builds JSON objects depending on the "key". For the "single level" keys, it just uses key/value, for the others it uses the second element + the value and then aggregates those that belong together.
The second query level returns the following:
p_id | k | v | level
-----+---------+--------------------------------------------+------
1 | address | {"city": "Vancouver", "country": "Canada"} | 2
1 | name | {"name": "Bob"} | 1
2 | address | {"city": "Munich", "country": "Germany"} | 2
2 | name | {"name": "John"} | 1
The aggregation done in the second step leaves one level too many for the "single element" keys, and that's what we need the level column for.
If that distinction wasn't made, the final aggregation would return {"name": {"name": "Bob"}, "address": {"city": "Vancouver", "country": "Canada"}} instead of the wanted: {"name": "Bob", "address": {"city": "Vancouver", "country": "Canada"}}.
The expression case level when 1 then v -> k else v end essentially turns {"name": "Bob"} back to "Bob".
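For instance:

select '{"name": "Bob"}'::jsonb -> 'name';
-- returns "Bob"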
So, with the following sample data:
create table kv (p_id integer, "key" text, value text);
insert into kv
values
(1, 'name','Bob'),
(1, 'address.city','Vancouver'),
(1, 'address.country','Canada'),
(2, 'name','John'),
(2, 'address.city','Munich'),
(2, 'address.country','Germany');
the query then returns:
p_id | jsonb_object_agg
-----+-----------------------------------------------------------------------
1 | {"name": "Bob", "address": {"city": "Vancouver", "country": "Canada"}}
2 | {"name": "John", "address": {"city": "Munich", "country": "Germany"}}
Online example: https://rextester.com/SJOTCD7977
create table kv (key text, value text);
insert into kv
values
('name','Bob'),
('address.city','Vancouver'),
('address.country','Canada'),
('name','John'),
('address.city','Munich'),
('address.country','Germany');
create view v_kv as select row_number() over() as nRec, key, value from kv;
create view v_datos as
select k1.nrec, k1.value as name, k2.value as address_city, k3.value as address_country
from v_kv k1 inner join v_kv k2 on (k1.nrec + 1 = k2.nrec)
inner join v_kv k3 on ((k1.nrec + 2 = k3.nrec) and (k2.nrec + 1 = k3.nrec))
where mod(k1.nrec, 3) = 1;
select json_agg(json_build_object('name',name, 'address', json_build_object('city',address_city, 'country', address_country)))
from v_datos;
Let's say I have a JSON column named data in some MySQL table, and this column is a single array. So, for example, data may contain:
[1,2,3,4,5]
Now I want to select all rows which have a data column where one of its array elements is greater than 2. Is this possible?
I tried the following, but it seems to be true regardless of the values in the array:
SELECT * from my_table
WHERE JSON_EXTRACT(data, '$[*]') > 2;
You may search an array of integers as follows:
JSON_CONTAINS('[1,2,3,4,5]','7','$') Returns: 0
JSON_CONTAINS('[1,2,3,4,5]','1','$') Returns: 1
You may search an array of strings as follows:
JSON_CONTAINS('["a","2","c","4","x"]','"x"','$') Returns: 1
JSON_CONTAINS('["1","2","3","4","5"]','"7"','$') Returns: 0
Note: JSON_CONTAINS returns either 1 or 0
In your case you may search using a query like so:
SELECT * from my_table
WHERE JSON_CONTAINS(data, '2', '$');
SELECT JSON_SEARCH('["1","2","3","4","5"]', 'one', "2") IS NOT NULL;
-- true
SELECT JSON_SEARCH('["1","2","3","4","5"]', 'one', "6") IS NOT NULL;
-- false
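Applied to the question this becomes (a sketch; note that JSON_SEARCH only matches string values, so it fits arrays of strings like the ones above, not arrays of numbers):

SELECT * FROM my_table
WHERE JSON_SEARCH(data, 'one', '2') IS NOT NULL;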
Since MySQL 8 there is a new function called JSON_TABLE.
CREATE TABLE my_table (id INT, data JSON);
INSERT INTO my_table VALUES
(1, "[1,2,3,4,5]"),
(2, "[0,1,2]"),
(3, "[3,4,-10]"),
(4, "[-1,-2,0]");
SELECT DISTINCT my_table.*
FROM my_table, JSON_TABLE(data, "$[*]" COLUMNS(nr INT PATH '$')) as ids
WHERE ids.nr > 2;
+------+-----------------+
| id | data |
+------+-----------------+
| 1 | [1, 2, 3, 4, 5] |
| 3 | [3, 4, -10] |
+------+-----------------+
2 rows in set (0.00 sec)
I use a combination of JSON_EXTRACT and JSON_CONTAINS (MariaDB):
SELECT * FROM table WHERE JSON_CONTAINS(JSON_EXTRACT(json_field, '$[*].id'), 11, '$');
I don't know if a solution was already found. With MariaDB I found a way to search for a path in an array. For example, in the array [{"id":1}, {"id":2}], I want to find the path whose id equals 2.
SELECT JSON_SEARCH(name_field, 'one', 2, null, '$[*].id')
FROM name_table
The result is:
"$[1].id"
The asterisk indicates that the entire array is searched.
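If the goal is to update the matched element, the returned path can be fed back into JSON_SET (a sketch, assuming the search matches as shown above; JSON_UNQUOTE strips the quotes from the returned path):

UPDATE name_table
SET name_field = JSON_SET(name_field,
    JSON_UNQUOTE(JSON_SEARCH(name_field, 'one', 2, null, '$[*].id')),
    99)
WHERE JSON_SEARCH(name_field, 'one', 2, null, '$[*].id') IS NOT NULL;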
This example works for me with MySQL 5.7 and above:
SET @j = '{"a": [ "8428341ffffffff", "8428343ffffffff", "8428345ffffffff", "8428347ffffffff","8428349ffffffff", "842834bffffffff", "842834dffffffff"], "b": 2, "c": {"d": 4}}';
select JSON_CONTAINS(JSON_EXTRACT(@j, '$.a'), '"8428341ffffffff"', '$');
-- returns 1
Note the double quotes around the search keyword: '"8428341ffffffff"'.
A possible way is to treat the problem as string matching: convert the JSON to a string and match.
Or you can use JSON_CONTAINS.
You can use JSON_EXTRACT to search and select data:
SELECT data, data->"$.id" as selectdata
FROM table
WHERE JSON_EXTRACT(data, "$.id") = '123'
# ORDER BY c->"$.name"
LIMIT 10;
SET @doc = '[{"SongLabels": [{"SongLabelId": "111", "SongLabelName": "Funk"}, {"SongLabelId": "222", "SongLabelName": "RnB"}], "SongLabelCategoryId": "test11", "SongLabelCategoryName": "曲风"}]';
SELECT *, JSON_SEARCH(@doc, 'one', '%un%', null, '$[*].SongLabels[*].SongLabelName') FROM t_music_song_label_relation;
result: "$[0].SongLabels[0].SongLabelName"
SELECT song_label_content->'$[*].SongLabels[*].SongLabelName' FROM t_music_song_label_relation;
result: ["Funk", "RnB"]
I had a similar problem and solved it with a search function:
create function searchGT(threshold int, d JSON)
returns int
begin
    set @i = 0;
    while @i < json_length(d) do
        if json_extract(d, CONCAT('$[', @i, ']')) > threshold then
            return json_extract(d, CONCAT('$[', @i, ']'));
        end if;
        set @i = @i + 1;
    end while;
    return null;
end;
select searchGT(3, CAST('[1,10,20]' AS JSON));
This seems to be possible with the JSON_TABLE function. It's available in MySQL 8.0 and MariaDB 10.6.
With this test setup
CREATE TEMPORARY TABLE mytable
WITH data(a,json) AS (VALUES ('a','[1]'),
('b','[1,2]'),
('c','[1,2,3]'),
('d','[1,2,3,4]'))
SELECT * from data;
we get the following table
+---+-----------+
| a | json |
+---+-----------+
| a | [1] |
| b | [1,2] |
| c | [1,2,3] |
| d | [1,2,3,4] |
+---+-----------+
It's possible to select every row from mytable which has a value greater than 2 in the json array with this query.
SELECT * FROM mytable
WHERE TRUE IN (SELECT val > 2
FROM JSON_TABLE(json,'$[*]'
columns (val INT(1) path '$')
) as json
)
Returns:
+---+-----------+
| a | json |
+---+-----------+
| c | [1,2,3] |
| d | [1,2,3,4] |
+---+-----------+
I have nearly 50 JSON files. I want to create a Postgres database based on these files. Each file contains the data of one table. The files are not very large (a few thousand records at most). Example data from customers.json (in fact there are more fields; I have simplified it):
[
{
"Id": 55948,
"FullName": "Full name #1",
"Address": "Address #1",
"Turnover": 120400.5,
"DateOfRegistration": "2014-02-13",
"LastModifiedAt": "2015-11-03 12:04:44" },
{
"Id": 55949,
"FullName": "Full name %2",
"Address": "Address #2",
"Turnover": 120000.0,
"DateOfRegistration": "2012-12-01",
"LastModifiedAt": "2015-11-04 17:14:21" }
]
I am trying to write a function which creates a table and inserts all the data into it. My attempt is based on a dynamic query using EXECUTE:
CREATE OR REPLACE FUNCTION import_json(table_name text, data json)
RETURNS VOID AS $$
DECLARE
query text;
colname text;
BEGIN
query := 'CREATE TABLE ' || table_name || ' (';
FOR colname IN SELECT json_object_keys(data->0)
LOOP query := query || lower(colname) || ' text,';
END LOOP;
query := rtrim(query, ',') || ');';
EXECUTE(query);
END $$ LANGUAGE plpgsql;
My function creates a table with the expected column names, but all columns are of type text. The problem is that I do not know how to define the proper column types.
The json files are formatted well and contain integer, numeric, date, timestamp and text values. I would like to get the table:
CREATE TABLE customers (
id integer,
fullname text,
address text,
turnover numeric,
date_of_registration date,
last_modified_at timestamp);
The main question: how can I recognize the types of the columns in the generated table?
Additionally, is there an easy way to transform Pascal case to underscore notation ("DateOfRegistration" -> "date_of_registration")?
You can determine the type of a column by examining a value.
The function below formats a column definition from a (key, value) pair.
It uses regex pattern matching.
It also transforms the column name to underscore notation (using the regexp_replace() function).
Of course, the function won't work properly if a value represents NULL, so you have to make sure that the first JSON record has no null values.
create or replace function format_column(ckey text, cval text)
returns text language sql immutable as $$
select format('%s %s',
lower(regexp_replace(ckey, '(.)([A-Z])', '\1_\2', 'g')),
case
when cval ~ '^[\+-]{0,1}\d+$' then 'integer'
when cval ~ '^[\+-]{0,1}\d*\.\d+$' then 'numeric'
when cval ~ '^"\d\d\d\d-\d\d-\d\d"$' then 'date'
when cval ~ '^"\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d"$' then 'timestamp'
else 'text'
end
)
$$;
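The name transformation can be checked on its own:

select lower(regexp_replace('DateOfRegistration', '(.)([A-Z])', '\1_\2', 'g'));
-- date_of_registration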
select format_column(key, value)
from (
values
('Id', '55948'),
('FullName', '"Full name #1"'),
('Turnover', '120400.5'),
('DateOfRegistration', '"2014-02-13"')
) val(key, value);
format_column
---------------------------
id integer
full_name text
turnover numeric
date_of_registration date
(4 rows)
In the main function you do not need variables or loops.
Use the format() function to build strings with parameters and string_agg() to create comma-separated lists.
Since you need both keys and values, use json_each() instead of json_object_keys(). In the second query you can use row_number() to make sure the aggregated values end up grouped per source record.
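To illustrate json_each() on the sample data:

select key, value from json_each('{"Id": 55948, "FullName": "Full name #1"}'::json);
   key    | value
----------+----------------
 Id       | 55948
 FullName | "Full name #1"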
create or replace function import_table(table_name text, jdata json)
returns void language plpgsql as $$
begin
execute format('create table %s (%s)', table_name, string_agg(col, ', '))
from (
select format_column(key::text, value::text) col
from json_each(jdata->0)
) sub;
execute format('insert into %s values %s', table_name, string_agg(val, ','))
from (
with lines as (
select row_number() over () rn, line
from (
select json_array_elements(jdata) line
) sub
)
select rn, format('(%s)', string_agg(value, ',')) val
from (
select rn, format('%L', trim(value::text, '"')) as value
from lines, json_each(line)
) sub
group by 1
) sub;
end $$;
Test:
select import_table('customers',
'[{ "Id": 55948,
"FullName": "Full name #1",
"Address": "Address #1",
"Turnover": 120400.5,
"DateOfRegistration": "2014-02-13",
"LastModifiedAt": "2015-11-03 12:04:44" },
{ "Id": 55949,
"FullName": "Full name %2",
"Address": "Address #2",
"Turnover": 120000.0,
"DateOfRegistration": "2012-12-01",
"LastModifiedAt": "2015-11-04 17:14:21" }]');
\d customers
Table "public.customers"
Column | Type | Modifiers
----------------------+-----------------------------+-----------
id | integer |
full_name | text |
address | text |
turnover | numeric |
date_of_registration | date |
last_modified_at | timestamp without time zone |
select * from customers;
id | full_name | address | turnover | date_of_registration | last_modified_at
-------+--------------+------------+----------+----------------------+---------------------
55948 | Full name #1 | Address #1 | 120400.5 | 2014-02-13 | 2015-11-03 12:04:44
55949 | Full name %2 | Address #2 | 120000.0 | 2012-12-01 | 2015-11-04 17:14:21
(2 rows)
I have a table like this:
+--------+--------------------+
| ID | Attribute |
+--------+--------------------+
| 1 |"color" => "red" |
+--------+--------------------+
| 1 |"color" => "green" |
+--------+--------------------+
| 1 |"shape" => "square" |
+--------+--------------------+
| 2 |"color" => "blue" |
+--------+--------------------+
| 2 |"color" => "black" |
+--------+--------------------+
| 2 |"flavor" => "sweat" |
+--------+--------------------+
| 2 |"flavor" => "salty" |
+--------+--------------------+
And I want to run some Postgres query that gets a result table like this:
+--------+------------------------------------------------------+
| ID | Attribute |
+--------+------------------------------------------------------+
| 1 |"color" => "red, green", "shape" => "square" |
+--------+------------------------------------------------------+
| 2 |"color" => "blue, black", "flavor" => "sweat, salty" |
+--------+------------------------------------------------------+
The attribute column can either be hstore or json format. I wrote it in hstore for an example, but if we cannot achieve this in hstore, but in json, I would change the column to json.
I know that hstore does not support one key with multiple values; when I tried some merge methods, it only kept one value for each key. But for json, I didn't find anything that supports a multiple-value merge like this either. I think this can be done by a function merging values for the same key into a string/text and adding it back to the key/value pair, but I'm stuck implementing it.
Note: if this is implemented in some function, ideally no concrete key such as color or shape should appear in the function, since keys can be added dynamically.
Does anyone have any idea about this? Any advice or brainstorm might help. Thank you!
Just a note before anything else: in your desired output I would use proper JSON, not that kind of lookalike. So a correct output according to me would be:
+--------+----------------------------------------------------------------------+
| ID | Attribute |
+--------+----------------------------------------------------------------------+
| 1 | '{"color":["red","green"], "flavor":[], "shape":["square"]}' |
+--------+----------------------------------------------------------------------+
| 2 | '{"color":["blue","black"], "flavor":["sweat","salty"], "shape":[]}' |
+--------+----------------------------------------------------------------------+
A PL/pgSQL function which parses the json attributes and executes a dynamic query would do the job, something like this:
CREATE OR REPLACE FUNCTION merge_rows(PAR_table regclass) RETURNS TABLE (
id integer,
attributes json
) AS $$
DECLARE
ARR_attributes text[];
VAR_attribute text;
ARR_query_parts text[];
BEGIN
-- Get JSON attributes names
EXECUTE format('SELECT array_agg(name ORDER BY name) AS name FROM (SELECT DISTINCT json_object_keys(attribute) AS name FROM %s) AS s', PAR_table) INTO ARR_attributes;
-- Write json_build_object() query part
FOREACH VAR_attribute IN ARRAY ARR_attributes LOOP
ARR_query_parts := array_append(ARR_query_parts, format('%L, array_remove(array_agg(l.%s), null)', VAR_attribute, VAR_attribute));
END LOOP;
-- Return dynamic query
RETURN QUERY EXECUTE format('
SELECT t.id, json_build_object(%s) AS attributes
FROM %s AS t,
LATERAL json_to_record(t.attribute) AS l(%s)
GROUP BY t.id;',
array_to_string(ARR_query_parts, ', '), PAR_table, array_to_string(ARR_attributes, ' text, ') || ' text');
END;
$$ LANGUAGE plpgsql;
I've tested it and it seems to work; it returns a json object with an array of values for each key. Here is my test code:
CREATE TABLE mytable (
id integer NOT NULL,
attribute json NOT NULL
);
INSERT INTO mytable (id, attribute) VALUES
(1, '{"color":"red"}'),
(1, '{"color":"green"}'),
(1, '{"shape":"square"}'),
(2, '{"color":"blue"}'),
(2, '{"color" :"black"}'),
(2, '{"flavor":"sweat"}'),
(2, '{"flavor":"salty"}');
SELECT * FROM merge_rows('mytable');
Of course you can pass the id and attribute column names as parameters as well and maybe refine the function a bit; this is just to give you an idea.
EDIT: If you're on 9.4, please consider using the jsonb datatype; it's much better and gives you room for improvements. You would just need to change the json_* functions to their jsonb_* equivalents.
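For instance, the LATERAL step inside the dynamic query would then read (a sketch; the column list is generated dynamically in the function, and the cast is only needed while the column is still json):

SELECT t.id, l.*
FROM mytable AS t,
     LATERAL jsonb_to_record(t.attribute::jsonb) AS l(color text, flavor text, shape text);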
If you just want this for display purposes, this might be enough:
select id, string_agg(key||' => '||vals, ', ')
from (
select t.id, x.key, string_agg(value, ',') vals
from t
join lateral each(t.attributes) x on true
group by id, key
) t
group by id;
If you are not on 9.4, you can't use the lateral join:
select id, string_agg(key||' => '||vals, ', ')
from (
select id, key, string_agg(val, ',') as vals
from (
select t.id, skeys(t.attributes) as key, svals(t.attributes) as val
from t
) t1
group by id, key
) t2
group by id;
This will return:
id | string_agg
---+-------------------------------------------
1 | color => red,green, shape => square
2 | color => blue,black, flavor => sweat,salty
SQLFiddle: http://sqlfiddle.com/#!15/98caa/2