Extract key-value pairs from JSON objects in MySQL

From a MySQL JSON column named data, I'm extracting data from an array like so:
SELECT
data ->> '$.fields[*]' as fields
FROM some_database...
which returns:
[{
"id": 111056,
"hint": null,
"slug": "email",
"label": "E-mail",
"value": null,
"field_value": "test@example.com",
"placeholder": null
}, {
"id": 111057,
"hint": null,
"slug": "name",
"label": "Imię",
"value": null,
"field_value": "Aneta",
"placeholder": null
}]
I can also extract a single column:
SELECT
data ->> '$.fields[*].field_value' as fields
FROM some_database...
and that returns the following result:
[test@example.com, Aneta]
But how can I extract field_value together with label as key-value pairs?
Preferred output would be a single multi-row string containing pairs:
label: field_value
label: field_value
...
Using the example shown above, it would give me the following output:
E-mail: test@example.com
Imię: Aneta
A one-liner is preferred, as I have multiple such arrays to extract from various fields.

Here's an example of extracting the key names as rows:
select j.keyname from some_database
cross join json_table(
json_keys(data->'$[0]'),
'$[*]' columns (
keyname varchar(20) path '$'
)
) as j;
Output:
+-------------+
| keyname     |
+-------------+
| id          |
| hint        |
| slug        |
| label       |
| value       |
| field_value |
| placeholder |
+-------------+
Now you can join that to the values:
select n.n, j.keyname,
json_unquote(json_extract(f.data, concat('$[', n.n, ']."', j.keyname, '"'))) as value
from some_database as d
cross join json_table(
json_keys(d.data->'$[0]'),
'$[*]' columns (
keyname varchar(20) path '$'
)
) as j
cross join n
join some_database as f on n.n < json_length(f.data);
Output:
+---+-------------+------------------+
| n | keyname     | value            |
+---+-------------+------------------+
| 0 | id          | 111056           |
| 0 | hint        | null             |
| 0 | slug        | email            |
| 0 | label       | E-mail           |
| 0 | value       | null             |
| 0 | field_value | test@example.com |
| 0 | placeholder | null             |
| 1 | id          | 111057           |
| 1 | hint        | null             |
| 1 | slug        | name             |
| 1 | label       | Imię             |
| 1 | value       | null             |
| 1 | field_value | Aneta            |
| 1 | placeholder | null             |
+---+-------------+------------------+
I'm using a utility table n which is just filled with integers.
create table n (n int primary key);
insert into n values (0),(1),(2),(3)...;
If this seems like a lot of complex work, then maybe the lesson is that storing data in JSON is not easy, when you want SQL expressions to work on the discrete fields within JSON documents.
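For the specific "label: field_value" output asked for in the question, one possible approach is to let JSON_TABLE() turn each array element into a row and then fold the rows back into one string with GROUP_CONCAT(). The following is a minimal sketch rather than a tested drop-in: it assumes MySQL 8.0.4 or later, the array stored under $.fields as in the question, and a hypothetical table some_table with an id column to group by.

```sql
-- Assumptions (not from the original post): table name some_table,
-- an integer id column, and MySQL 8.0.4+ for JSON_TABLE().
SELECT t.id,
       GROUP_CONCAT(
           -- COALESCE keeps pairs whose field_value is JSON null
           CONCAT(f.label, ': ', COALESCE(f.field_value, ''))
           ORDER BY f.ord
           SEPARATOR '\n'
       ) AS pairs
FROM some_table AS t
CROSS JOIN JSON_TABLE(
    t.data,
    '$.fields[*]' COLUMNS (
        ord         FOR ORDINALITY,                        -- keeps array order
        label       VARCHAR(255) PATH '$.label',
        field_value VARCHAR(255) PATH '$.field_value'
    )
) AS f
GROUP BY t.id;
```

GROUP_CONCAT() output is truncated at group_concat_max_len (1024 by default), so long arrays may need a larger value for that session variable.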

You can use JSON_VALUE (available in MySQL 8.0.21 and later):
select JSON_VALUE(json_value_col, '$.selected_key') as selected_value from user_details;
You can also use JSON_EXTRACT:
select JSON_EXTRACT(json_value_col, '$.selected_key') as selected_value from user_details;
For more details, see:
https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html

Related

Print key, value pairs from nested MYSQL json

I have extracted a MySQL JSON dictionary structure and I wish to get all the values associated with the keys alpha and beta; however, I also wish to print the key. The structure of the dictionary is:
results =
{1:
{"a": {"alpha": 1234,
"beta": 2345},
"b": {"alpha": 1234,
"beta": 2345},
"c": {"alpha": 1234,
"beta": 2345},
},
2:
{"ab": {"alpha": 1234,
"beta": 2345},
"ac": {"alpha": 1234,
"beta": 2345},
"bc": {"alpha": 1234,
"beta": 2345},
},
3:
{"abc": {"alpha": 1234,
"beta": 2345}
}
"random_key": "not_interested_in_this_value"
}
So far I have had some success extracting the data I want using:
SELECT JSON_EXTRACT alpha, beta FROM results;
This gave me the alpha and beta columns; however, I would ideally like to associate each value with its key to get:
+-------+---------+---------+
| key   | alpha   | beta    |
+-------+---------+---------+
| a     | 1234.   | 2345.   |
| b     | 1234.   | 2345.   |
| c     | 1234.   | 2345.   |
| ab    | 1234.   | 2345.   |
| ac    | 1234.   | 2345.   |
| bc    | 1234.   | 2345.   |
| abc   | 1234.   | 2345.   |
+-------+---------+---------+
I am very new to MySQL and any help is appreciated.
First of all, what you posted is not valid JSON. You can use integers as values, but you can't use integers as keys in objects. You also have a few spurious commas. I had to fix these mistakes before I could insert the data into a table to test.
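For reference, a corrected version of the document might look like the sketch below. The table name mytable and the column results match the query used in this answer; the numeric keys are quoted, the trailing commas are removed, and a comma is added before "random_key".

```sql
-- Test setup assumed for the query in this answer.
CREATE TABLE mytable (results JSON);

INSERT INTO mytable (results) VALUES ('{
  "1": {
    "a": {"alpha": 1234, "beta": 2345},
    "b": {"alpha": 1234, "beta": 2345},
    "c": {"alpha": 1234, "beta": 2345}
  },
  "2": {
    "ab": {"alpha": 1234, "beta": 2345},
    "ac": {"alpha": 1234, "beta": 2345},
    "bc": {"alpha": 1234, "beta": 2345}
  },
  "3": {
    "abc": {"alpha": 1234, "beta": 2345}
  },
  "random_key": "not_interested_in_this_value"
}');
```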
I was able to solve this using MySQL 8.0's JSON_TABLE() function in the following way:
select
j2.`key`,
json_extract(results, concat('$."',j1.`key`,'"."',j2.`key`,'".alpha')) as alpha,
json_extract(results, concat('$."',j1.`key`,'"."',j2.`key`,'".beta')) as beta
from mytable
cross join json_table(json_keys(results), '$[*]' columns (`key` int path '$')) as j1
cross join json_table(json_keys(json_extract(results, concat('$."',j1.`key`,'"'))), '$[*]' columns (`key` varchar(3) path '$')) as j2
where j2.`key` IS NOT NULL;
Output:
+------+-------+------+
| key  | alpha | beta |
+------+-------+------+
| a    | 1234  | 2345 |
| b    | 1234  | 2345 |
| c    | 1234  | 2345 |
| ab   | 1234  | 2345 |
| ac   | 1234  | 2345 |
| bc   | 1234  | 2345 |
| abc  | 1234  | 2345 |
+------+-------+------+
If you find this sort of query too difficult, I would encourage you to reconsider whether you want to store data in JSON.
If I were you, I'd store data in normal rows and columns, then the query would be a lot simpler and easier to write and maintain.
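As an illustration of that last point, here is one way the same data could be laid out in ordinary rows and columns. The table name results_rows and the column group_id are made up for this sketch; the original post does not define them.

```sql
-- Hypothetical normalized layout: one row per inner key.
CREATE TABLE results_rows (
  group_id INT         NOT NULL,  -- the outer key: 1, 2, 3
  `key`    VARCHAR(10) NOT NULL,  -- the inner key: a, b, ab, ...
  alpha    INT         NOT NULL,
  beta     INT         NOT NULL,
  PRIMARY KEY (group_id, `key`)
);

INSERT INTO results_rows (group_id, `key`, alpha, beta) VALUES
  (1, 'a',   1234, 2345),
  (1, 'b',   1234, 2345),
  (1, 'c',   1234, 2345),
  (2, 'ab',  1234, 2345),
  (2, 'ac',  1234, 2345),
  (2, 'bc',  1234, 2345),
  (3, 'abc', 1234, 2345);

-- The report from the question becomes a plain SELECT.
SELECT `key`, alpha, beta
FROM results_rows
ORDER BY group_id, `key`;
```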

How to optimize mysql query to get count from json column which have huge data

I have a query to find the count of rejected serial numbers for different reasons. I need to find the count for each reason within a date range. I have 3 tables, for example:
+-----------+----------+------+-----+---------+
| Field     | Type     | Null | Key | Default |
+-----------+----------+------+-----+---------+
| id        | int(11)  | NO   | PRI | NULL    |
| client_id | int(11)  | YES  | MUL | NULL    |
| tc_date   | datetime | YES  |     | NULL    |
+-----------+----------+------+-----+---------+
mysql> desc job_order_finish_product_serial_no;
+------------------------------+--------------+------+-----+---------+
| Field                        | Type         | Null | Key | Default |
+------------------------------+--------------+------+-----+---------+
| id                           | int(11)      | NO   | PRI | NULL    |
| serial_no                    | varchar(255) | YES  | MUL | NULL    |
| specification                | json         | YES  |     | NULL    |
| client_id                    | int(11)      | YES  | MUL | NULL    |
| tc_id                        | int(11)      | YES  | MUL | NULL    |
| job_order_finish_products_id | int(11)      | YES  | MUL | NULL    |
+------------------------------+--------------+------+-----+---------+
In the specification column, my sample data looks like:
{
"Leakage":{
"time":"2021-09-20 10:00:00",
"status":"completed",
"rework":[],
"user":{
"name":"xyz",
"id":1
}
},
"Thickness":{
"time":"2021-09-20 10:00:00",
"status":"rejected",
"rework":[],
"user":{
"name":"xyz",
"id":1
}
},
"Diameter":{
"time":null,
"status":"pending",
"rework":[],
"user":{
}
},
"Bung":{
"time":null,
"status":"pending",
"rework":[],
"user":{
}
}
}
For each serial_no in job_order_finish_product_serial_no, the specification should look like the above snippet. I need a count of rejected serial numbers for each rejection reason within a date range. The job_order_finish_product_serial_no table has 2251543 rows. My count query is:
select tc.tc_date,
(
SELECT count(serial_no)
from nucleus.job_order_finish_product_serial_no jfps
where jfps.tc_id=tc.id
and specification ->> "$.Thickness.status" = "rejected"
and client_id = 154
) as rejected_thickness,
(
SELECT count(serial_no)
from nucleus.job_order_finish_product_serial_no jfps
where jfps.tc_id=tc.id
and specification ->> "$.Leakage.status" = "rejected"
and client_id = 154
) as rejected_leakage,
(
SELECT count(serial_no)
from nucleus.job_order_finish_product_serial_no jfps
where jfps.tc_id=tc.id
and specification ->> "$.Bung.status" = "rejected"
and client_id = 154
) as rejected_bung
from nucleus.tc_details tc
inner join nucleus.job_order_finish_product_serial_no jfps
ON jfps.tc_id=tc.id
inner join nucleus.job_order_finish_products jofp
ON jofp.id=jfps.job_order_finish_products_id
where tc.tc_date between '2021-09-18 00:00:00'
AND '2021-09-22 23:59:59'
and tc.client_id=154
and jofp.client_id=154
and jfps.client_id=154
group by job_order_finish_product_id,tc.tc_date;
Output:
data rejected_thickness rejected_bung rejected_leakage rejected_diameter
21-09-2021 2 10 23 3
With the above query, each subquery takes about 2 minutes to return its result, and the entire query takes almost 10 minutes. Is there any way to optimize the query? Thank you!
Indexes that may help:
jfps, jofp: INDEX(tc_id, client_id) -- or in the opposite order
tc: INDEX(client_id, tc_date, id)
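Spelled out as DDL, those suggestions could look like the following sketch. The index names are invented here, and only the two tables whose columns are shown in the question are covered.

```sql
-- Index names are hypothetical; column lists follow the suggestions above.
ALTER TABLE nucleus.job_order_finish_product_serial_no
  ADD INDEX idx_jfps_tc_client (tc_id, client_id);

ALTER TABLE nucleus.tc_details
  ADD INDEX idx_tc_client_date (client_id, tc_date, id);
```

Running EXPLAIN on the query before and after adding them shows whether the optimizer actually uses the new indexes.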
I prefer to write date range tests this way (5 days covers the 18th through the 22nd, as in the question):
date >= '2021-09-18'
AND date < '2021-09-18' + INTERVAL 5 DAY
A JOIN with a 'derived' table might be faster than 3 subqueries.
Unless COUNT(serial_no) is needed to exclude NULL values of serial_no, use COUNT(*).
Something like
SELECT ...
SUM(jfps.specification ->> "$.Leakage.status" = "rejected")
as rejected_leakage,
SUM(...) as ...,
SUM(...) as ...,
...
JOIN nucleus.job_order_finish_product_serial_no AS jfps
ON jfps.tc_id = tc.id
WHERE ...
AND jfps.client_id = 154
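Filled in with the names from the question, the outline above might become something like the query below. It is a sketch under assumptions: it drops the join to job_order_finish_products and groups by tc_date only, which may or may not match the grouping the original query intended.

```sql
-- One pass over jfps per tc row instead of three correlated subqueries.
-- Each comparison yields 1, 0 or NULL; SUM() ignores NULLs, and COALESCE
-- turns an all-NULL group into 0.
SELECT tc.tc_date,
       COALESCE(SUM(jfps.specification ->> '$.Thickness.status' = 'rejected'), 0)
           AS rejected_thickness,
       COALESCE(SUM(jfps.specification ->> '$.Leakage.status' = 'rejected'), 0)
           AS rejected_leakage,
       COALESCE(SUM(jfps.specification ->> '$.Bung.status' = 'rejected'), 0)
           AS rejected_bung
FROM nucleus.tc_details AS tc
JOIN nucleus.job_order_finish_product_serial_no AS jfps
  ON jfps.tc_id = tc.id
WHERE tc.client_id = 154
  AND jfps.client_id = 154
  -- half-open range covering 18 to 22 September inclusive
  AND tc.tc_date >= '2021-09-18'
  AND tc.tc_date <  '2021-09-18' + INTERVAL 5 DAY
GROUP BY tc.tc_date;
```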

MySQL - How can you select multiple columns on a nested IFNULL...GROUP_CONCAT() condition?

I have a web application which is connected to a MySQL (5.5.64-MariaDB) database.
One of the queries is as follows:
SELECT
d.id,
d.label AS display_label,
d.anchor,
r.id AS regulation_id,
IFNULL(
(SELECT GROUP_CONCAT(value) FROM display_substances `ds`
WHERE `ds`.`display_id` = `d`.`id`
AND ds.substance_id = 1 -- For example, substance ID = 1
GROUP BY `ds`.`display_id`
), "Not Listed"
) `display_value` FROM displays `d`
JOIN groups g ON d.group_id = g.id
JOIN regulations r ON g.regulation_id = r.id
An example of the output is as follows:
+-----+------------------------------------+-----------------------------------------------------------------------------------------+
| id  | name                               | display_value                                                                           |
+-----+------------------------------------+-----------------------------------------------------------------------------------------+
|   4 | techfunction                       | Intermediate / monomer; Corrosion inhibitor / anodiser / galvaniser; Catalyst; Additive |
| 323 | russia_chemsafety_register_display | Not Listed                                                                              |
| 733 | peru_pcb_display                   | Not Listed                                                                              |
+-----+------------------------------------+-----------------------------------------------------------------------------------------+
This query does what we need. For explanatory purposes:
There are 2 tables, displays and display_substances
The query is obtaining display_substances.value for each displays.id
If there is no corresponding display_substances.value then the string "Not Listed" (refer to query above) is returned. If there is a corresponding value then display_substances.value is returned. So in the example data above, IDs 323 and 733 refer to a scenario where there is no corresponding entry, therefore we want "Not Listed". Conversely ID 4 does have a value ("Intermediate / monomer; Corrosion inhibitor / anodiser / galvaniser; Catalyst; Additive") so we get that.
The table structures are as follows:
DESCRIBE displays;
+----------+----------------------+------+-----+---------+----------------+
| Field    | Type                 | Null | Key | Default | Extra          |
+----------+----------------------+------+-----+---------+----------------+
| id       | smallint(5) unsigned | NO   | PRI | NULL    | auto_increment |
| name     | varchar(127)         | NO   |     | NULL    |                |
| label    | varchar(255)         | NO   |     | NULL    |                |
+----------+----------------------+------+-----+---------+----------------+
DESCRIBE display_substances;
+--------------+-----------------------+------+-----+---------+----------------+
| Field        | Type                  | Null | Key | Default | Extra          |
+--------------+-----------------------+------+-----+---------+----------------+
| id           | mediumint(8) unsigned | NO   | PRI | NULL    | auto_increment |
| display_id   | smallint(5) unsigned  | NO   | MUL | NULL    |                |
| substance_id | mediumint(8) unsigned | NO   | MUL | NULL    |                |
| value        | text                  | NO   |     | NULL    |                |
| automated    | tinyint(4)            | YES  |     | NULL    |                |
+--------------+-----------------------+------+-----+---------+----------------+
I want to be able to return display_substances.automated (refer to table structure above) as a column from my query. But I can't see how to do this.
The reference to the display_substances table is ds, so I cannot use that in the initial SELECT statement because at that point there's no alias. Equally there is no JOIN condition that would make it possible, because not every row returned obtains data from display_substances (i.e. those that are "Not Listed" are not getting anything from that table).
If I want an additional column next to display_value in the sample output above that shows display_substances.automated, or NULL if it doesn't exist, how can I achieve that?
For reference the automated field either contains a 1 (to represent data that has been obtained through automated processes by our application), or NULL if it isn't automated.
"there is no JOIN condition that would make it possible, because not every row returned obtains data from display_substances"
For this case you can use a LEFT JOIN:
SELECT d.id, d.label display_label, d.anchor, r.id regulation_id,
COALESCE(ds.value, 'Not Listed') display_value,
ds.automated
FROM displays d
INNER JOIN groups g ON d.group_id = g.id
INNER JOIN regulations r ON g.regulation_id = r.id
LEFT JOIN (
SELECT display_id, GROUP_CONCAT(value) value, MAX(automated) automated
FROM display_substances
WHERE substance_id = 1
GROUP BY display_id
) ds ON ds.display_id = d.id
I used MAX(automated) as the returned column, but you can use GROUP_CONCAT(automated) just like you do for value and also COALESCE():
COALESCE(ds.automated, 'Not Listed')

Parse JSON Array where each member has different schema but same general structure

I have a JSON data feed coming into SQL Server 2016. One of the attributes I must parse contains a JSON array. Unfortunately, instead of implementing a key/value design, the source system sends each member of the array with a different attribute name. The attribute names are not known in advance, and are subject to change/volatility.
declare @json nvarchar(max) =
'{
"objects": [
{"foo":"fooValue"},
{"bar":"barValue"},
{"baz":"bazValue"}
]
}';
select * from openjson(json_query(@json, 'strict $.objects'));
As you can see:
element 0 has a "foo" attribute
element 1 has a "bar" attribute
element 2 has a "baz" attribute:
+-----+--------------------+------+
| key | value              | type |
+-----+--------------------+------+
| 0   | {"foo":"fooValue"} | 5    |
| 1   | {"bar":"barValue"} | 5    |
| 2   | {"baz":"bazValue"} | 5    |
+-----+--------------------+------+
Ideally, I would like to parse and project the data like so:
+-----+---------------+----------------+------+
| key | attributeName | attributeValue | type |
+-----+---------------+----------------+------+
| 0   | foo           | fooValue       | 5    |
| 1   | bar           | barValue       | 5    |
| 2   | baz           | bazValue       | 5    |
+-----+---------------+----------------+------+
Reminder: The attribute names are not known in advance, and are subject to change/volatility.
select o.[key], v.* --v.[key] as attributeName, v.value as attributeValue
from openjson(json_query(@json, 'strict $.objects')) as o
cross apply openjson(o.[value]) as v;
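To get exactly the column names from the desired output, the commented-out aliases in the query above can be written out. This is a T-SQL sketch, assuming the type column should come from the outer array element (which is where the 5, meaning "object", in the expected output comes from):

```sql
declare @json nvarchar(max) = N'{
  "objects": [
    {"foo":"fooValue"},
    {"bar":"barValue"},
    {"baz":"bazValue"}
  ]
}';

-- Outer OPENJSON: one row per array element (key = index, type = 5 for objects).
-- Inner OPENJSON: one row per attribute of that element, whatever its name.
select o.[key],
       v.[key]   as attributeName,
       v.[value] as attributeValue,
       o.[type]
from openjson(json_query(@json, 'strict $.objects')) as o
cross apply openjson(o.[value]) as v;
```

If an array element ever carries more than one attribute, this returns one row per attribute for that element.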

How to calculate count of each value in MySQL JSON array?

I have a MySQL table with the following definition:
mysql> desc person;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| id     | int(11) | NO   | PRI | NULL    |       |
| name   | text    | YES  |     | NULL    |       |
| fruits | json    | YES  |     | NULL    |       |
+--------+---------+------+-----+---------+-------+
The table has some sample data as follows:
mysql> select * from person;
+----+------+----------------------------------+
| id | name | fruits                           |
+----+------+----------------------------------+
|  1 | Tom  | ["apple", "orange"]              |
|  2 | John | ["apple", "mango"]               |
|  3 | Tony | ["apple", "mango", "strawberry"] |
+----+------+----------------------------------+
How can I calculate the total number of occurrences for each fruit? For example:
+------------+-------+
| fruit      | count |
+------------+-------+
| apple      |     3 |
| orange     |     1 |
| mango      |     2 |
| strawberry |     1 |
+------------+-------+
Some research shows that the JSON_LENGTH function can be used but I cannot find an example similar to my scenario.
You can use the JSON_EXTRACT() function to extract the value at each of the first three positions of the arrays ("apple", "mango", "strawberry" and "orange" in the sample data), and then apply UNION ALL to combine the queries. Note that this only covers arrays of up to three elements:
SELECT comp, count(*)
FROM
(
SELECT JSON_EXTRACT(fruits, '$[0]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruits, '$[1]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruits, '$[2]') as comp FROM person
) q
WHERE comp is not null
GROUP BY comp
If your database is MySQL 8.0 or later, you can also use the JSON_TABLE() function:
SELECT j.fruit, count(*)
FROM person p
JOIN JSON_TABLE(
p.fruits,
'$[*]' columns (fruit varchar(50) path '$')
) j
GROUP BY j.fruit;
If JSON_TABLE() is not available (it was added in MySQL 8.0), you can't do it without first creating a table with one row per fruit.
CREATE TABLE allfruits (fruit VARCHAR(10) PRIMARY KEY);
INSERT INTO allfruits VALUES ('apple'), ('orange'), ('mango'), ('strawberry');
Without JSON_TABLE(), there is no good way to generate this table from the JSON.
Once you have that table, you can join it to the JSON and then use GROUP BY to count the occurrences.
SELECT fruit, COUNT(*) AS count
FROM allfruits
JOIN person ON JSON_SEARCH(person.fruits, 'one', fruit) IS NOT NULL
GROUP BY fruit;
Output:
+------------+-------+
| fruit      | count |
+------------+-------+
| apple      |     3 |
| mango      |     2 |
| orange     |     1 |
| strawberry |     1 |
+------------+-------+
Note that it will do a table-scan on the person table to find each fruit. This is pretty inefficient, and as your person table gets larger, it will become a performance problem.
If you want to optimize for this type of query, then you shouldn't use JSON to store an array of fruits. You should store data in a normalized way, representing the many-to-many relationship between persons and fruits with another table.
This is related to my answer to "Is storing a delimited list in a database column really that bad?"
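A normalized layout along those lines could look like the sketch below. The table names fruit and person_fruit and the alias fruit_count are invented for illustration; person is the table from the question.

```sql
-- Hypothetical many-to-many layout replacing the JSON array column.
CREATE TABLE fruit (
  id   INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE person_fruit (
  person_id INT NOT NULL,
  fruit_id  INT NOT NULL,
  PRIMARY KEY (person_id, fruit_id),
  KEY idx_fruit (fruit_id),                        -- supports counting per fruit
  FOREIGN KEY (person_id) REFERENCES person (id),
  FOREIGN KEY (fruit_id)  REFERENCES fruit (id)
);

-- Counting occurrences is then an ordinary join and GROUP BY.
SELECT f.name AS fruit, COUNT(*) AS fruit_count
FROM person_fruit AS pf
JOIN fruit AS f ON f.id = pf.fruit_id
GROUP BY f.name;
```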
I think the simplest solution would be to use the JSON_TABLE function.
The query you need is
select ft.fruit, count(ft.fruit) from person,
json_table(
fruits,
'$[*]' columns(
fruit varchar(128) path '$'
)
) as ft
group by ft.fruit
;
You can find a working example in this dbfiddle