How to calculate count of each value in MySQL JSON array? - mysql

I have a MySQL table with the following definition:
mysql> desc person;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | text | YES | | NULL | |
| fruits | json | YES | | NULL | |
+--------+---------+------+-----+---------+-------+
The table has some sample data as follows:
mysql> select * from person;
+----+------+----------------------------------+
| id | name | fruits |
+----+------+----------------------------------+
| 1 | Tom | ["apple", "orange"] |
| 2 | John | ["apple", "mango"] |
| 3 | Tony | ["apple", "mango", "strawberry"] |
+----+------+----------------------------------+
How can I calculate the total number of occurrences for each fruit? For example:
+------------+-------+
| fruit | count |
+------------+-------+
| apple | 3 |
| orange | 1 |
| mango | 2 |
| strawberry | 1 |
+------------+-------+
Some research shows that the JSON_LENGTH function can be used but I cannot find an example similar to my scenario.

You can use JSON_EXTRACT() function to extract each value ("apple", "mango", "strawberry" and "orange") of all three components of the arrays, and then then apply UNION ALL to combine all such queries:
SELECT comp, count(*)
FROM
(
SELECT JSON_EXTRACT(fruit, '$[0]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruit, '$[1]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruit, '$[2]') as comp FROM person
) q
WHERE comp is not null
GROUP BY comp
Indeed If your DB's version is 8, then you can also use JSON_TABLE() function :
SELECT j.fruit, count(*)
FROM person p
JOIN JSON_TABLE(
p.fruits,
'$[*]' columns (fruit varchar(50) path '$')
) j
GROUP BY j.fruit;
Demo

You can't do it without first creating a table with one row per fruit.
CREATE TABLE allfruits (fruit VARCHAR(10) PRIMARY KEY);
INSERT INTO allfruits VALUES ('apple'), ('orange'), ('mango'), ('strawberry');
There is not a good way to generate this from the JSON.
Once you have that table, you can join it to the JSON and then use GROUP BY to count the occurrences.
SELECT fruit, COUNT(*) AS count
FROM allfruits
JOIN person ON JSON_SEARCH(person.fruits, 'one', fruit) IS NOT NULL
GROUP BY fruit;
Output:
+------------+-------+
| fruit | count |
+------------+-------+
| apple | 3 |
| mango | 2 |
| orange | 1 |
| strawberry | 1 |
+------------+-------+
Note that it will do a table-scan on the person table to find each fruit. This is pretty inefficient, and as your person table gets larger, it will become a performance problem.
If you want to optimize for this type of query, then you shouldn't use JSON to store an array of fruits. You should store data in a normalized way, representing the many-to-many relationship between persons and fruits with another table.
This is related to my answer to Is storing a delimited list in a database column really that bad?

I think the simplest solution would be to use JSON_TABLE function.
The query you need is
select ft.fruit, count(ft.fruit) from person,
json_table(
fruits,
'$[*]' columns(
fruit varchar(128) path '$'
)
) as ft
group by ft.fruit
;
You can find working example in this dbfiddle
Fruit demo

Related

MySQL self join return all rows

Per the example data below, I need a query that returns every row, where if the 'contingent_on' field is NULL, it is returned as NULL, but if it is not NULL it is returned with the 'ticket_name' corresponding to the 'primary_key' value.
I tried self join queries but could only get them to return the not NULL rows.
example table data:
primary_key | ticket_name | contingent_on
1 | site preparation | NULL
2 | tender process | NULL
3 | construction | 1
All rows should be returned, where the in the 'construction' row return, 'site preparation' is input in place of '1' in the 'contingent_on' field.
You need a self left join:
select
t.primary_key,
t.ticket_name,
tt.ticket_name ticket_name2
from tablename t left join tablename tt
on tt.primary_key = t.contingent_on
order by t.primary_key
See the demo.
Results:
| primary_key | ticket_name | ticket_name2 |
| ----------- | ---------------- | ---------------- |
| 1 | site preparation | null |
| 2 | tender process | null |
| 3 | construction | site preparation |
It looks simple query:
select
primary_key,
ticket_name,
case when contingent_on is not null then ticket_name else contingent_on end as contingent_on
from <<your_table>>
order by primary_key

Exotic GROUP BY In MySQL

Consider a typical GROUP BY statement in SQL: you have a table like
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| B | 2 |
| A | 3 |
| B | 4 |
+------+-------+
And you ask for
SELECT Name, SUM(Value) as Value
FROM table
GROUP BY Name
You'll receive
+------+-------+
| Name | Value |
+------+-------+
| A | 4 |
| B | 6 |
+------+-------+
In your head, you can imagine that SQL generates an intermediate sorted table like
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| A | 3 |
| B | 2 |
| B | 4 |
+------+-------+
and then aggregates together successive rows: the "Value" column has been given an aggregator (in this case SUM), so it's easy to aggregate. The "Name" column has been given no aggregator, and thus uses what you might call the "trivial partial aggregator": given two things that are the same (e.g. A and A), it aggregates them into a single copy of one of the inputs (in this case A). Given any other input it doesn't know what to do and is forced to begin aggregating anew (this time with the "Name" column equal to B).
I want to do a more exotic kind of aggregation. My table looks like
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| BC | 2 |
| AY | 3 |
| AZ | 4 |
| B | 5 |
| BCR | 6 |
+------+-------+
And the intended output is
+------+-------+
| Name | Value |
+------+-------+
| A | 8 |
| B | 13 |
+------+-------+
Where does this come from? A and B are the "minimal prefixes" for this set of names: they occur in the data set and every Name has exactly one of them as a prefix. I want to aggregate data by grouping rows together when their Names have the same minimal prefix (and add the Values, of course).
In the toy grouping model from before, the intermediate sorted table would be
+------+-------+
| Name | Value |
+------+-------+
| A | 1 |
| AY | 3 |
| AZ | 4 |
| B | 5 |
| BC | 2 |
| BCR | 6 |
+------+-------+
Instead of using the "trivial partial aggregator" for Names, we would use one that can aggregate X and Y together iff X is a prefix of Y; in that case it returns X. So the first three rows would be aggregated together into a row with (Name, Value) = (A, 8), then the aggregator would see that A and B couldn't be aggregated and would move on to a new "block" of rows to aggregate.
The tricky thing is that the value we're grouping by is "non-local": if A were not a name in the dataset, then AY and AZ would each be a minimal prefix. It turns out that the AY and AZ rows are aggregated into the same row in the final output, but you couldn't know that just by looking at them in isolation.
Miraculously, in my use case the minimal prefix of a string can be determined without reference to anything else in the dataset. (Imagine that each of my names is one of the strings "hello", "world", and "bar", followed by any number of z's. I want to group all of the Names with the same "base" word together.)
As I see it I have two options:
1) The simple option: compute the prefix for each row and group by that value directly. Unfortunately I have an index on the Name, and computing the minimal prefix (whose length depends on the Name itself) prevents me from using that index. This forces a full table scan, which is prohibitively slow.
2) The complicated option: somehow convince MySQL to use the "partial prefix aggregator" for Name. This runs into the "non-locality" problem above, but that's fine as long as we scan the table according to my index on Name, since then every minimal prefix will be encountered before any of the other strings it is a prefix of; we would never try to aggregate AY and AZ together if A were in the dataset.
In a declarative programming language #2 would be rather easy: extract rows one at a time, in alphabetical order, keeping track of the current prefix. If your new row's Name has that as a prefix, it goes in the bucket you're currently using. Otherwise, start a new bucket with that as your prefix. In MySQL I am lost as to how to do it. Note that the set of minimal prefixes is not known beforehand.
Edit 2
It occurred to me that if the table is ordered by Name, this would be a lot easier (and faster). Since I don't know if your data is sorted, I've included a sort in this query, but if the data is sorted, you can strip out (SELECT * FROM table1 ORDER BY Name) t1 and just use FROM table1
SELECT prefix, SUM(`Value`)
FROM (SELECT Name, Value, #prefix:=IF(Name NOT LIKE CONCAT(#prefix, '_%'), Name, #prefix) AS prefix
FROM (SELECT * FROM table1 ORDER BY Name) t1
JOIN (SELECT #prefix := '~') p
) t2
GROUP BY prefix
Updated SQLFiddle
Edit
Having slept on the problem, I realised that there is no need to do the IN, it's enough to just have a WHERE NOT EXISTS clause on the JOINed table:
SELECT t1.Name, SUM(t2.Value) AS `Value`
FROM table1 t1
JOIN table1 t2 ON t2.Name LIKE CONCAT(t1.Name, '%')
WHERE NOT EXISTS (SELECT *
FROM table1 t3
WHERE t1.Name LIKE CONCAT(t3.Name, '_%')
)
GROUP BY t1.Name
Updated Explain (Name changed to UNIQUE key from PRIMARY)
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 index Name Name 11 NULL 6 Using where; Using index; Using temporary; Using filesort
1 PRIMARY t2 ALL NULL NULL NULL NULL 6 Using where; Using join buffer (Block Nested Loop)
3 DEPENDENT SUBQUERY t3 index NULL Name 11 NULL 6 Using where; Using index
Updated SQLFiddle
Original Answer
Here is one way you could do it. First, you need to find all the unique prefixes in your table. You can do that by looking for all values of Name where it does not look like another value of Name with other characters on the end. This can be done with this query:
SELECT Name
FROM table1 t1
WHERE NOT EXISTS (SELECT *
FROM table1 t2
WHERE t1.Name LIKE CONCAT(t2.Name, '_%')
)
For your sample data, that will give
Name
A
B
Now you can sum all the values where the Name starts with one of those prefixes. Note we change the LIKE pattern in this query so that it also matches the prefix, otherwise we wouldn't count the values for A and B in your example:
SELECT t1.Name, SUM(t2.Value) AS `Value`
FROM table1 t1
JOIN table1 t2 ON t2.Name LIKE CONCAT(t1.Name, '%')
WHERE t1.Name IN (SELECT Name
FROM table1 t3
WHERE NOT EXISTS (SELECT *
FROM table1 t4
WHERE t3.Name LIKE CONCAT(t4.Name, '_%')
)
)
GROUP BY t1.Name
Output:
Name Value
A 8
B 13
An EXPLAIN says that both of these queries use the index on Name, so should be reasonably efficient. Here is the result of the explain on my MySQL 5.6 server:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 index PRIMARY PRIMARY 11 NULL 6 Using index; Using temporary; Using filesort
1 PRIMARY t3 eq_ref PRIMARY PRIMARY 11 test.t1.Name 1 Using where; Using index
1 PRIMARY t2 ALL NULL NULL NULL NULL 6 Using where; Using join buffer (Block Nested Loop)
3 DEPENDENT SUBQUERY t4 index NULL PRIMARY 11 NULL 6 Using where; Using index
SQLFiddle Demo
Here are some hints on how to do the task. This locates any prefixes that are useful. That's not what you asked for, but the flow of the query and the usage of #variables, plus the need for 2 (actually 3) levels of nesting, might help you.
SELECT DISTINCT `Prev`
FROM
(
SELECT #prev := #next AS 'Prev',
#next := IF(LEFT(city, LENGTH(#prev)) = #prev, #next, city) AS 'Next'
FROM ( SELECT #next := ' ' ) AS init
JOIN ( SELECT DISTINCT city FROM us ) AS dedup
ORDER BY city
) x
WHERE `Prev` = `Next` ;
Partial output:
+----------------+
| Prev |
+----------------+
| Alamo |
| Allen |
| Altamont |
| Ames |
| Amherst |
| Anderson |
| Arlington |
| Arroyo |
| Auburn |
| Austin |
| Avon |
| Baker |
Check the Al% cities:
mysql> SELECT DISTINCT city FROM us WHERE city LIKE 'Al%' ORDER BY city;
+-------------------+
| city |
+-------------------+
| Alabaster |
| Alameda |
| Alamo | <--
| Alamogordo | <--
| Alamosa |
| Albany |
| Albemarle |
...
| Alhambra |
| Alice |
| Aliquippa |
| Aliso Viejo |
| Allen | <--
| Allen Park | <--
| Allentown | <--
| Alliance |
| Allouez |
| Alma |
| Aloha |
| Alondra Park |
| Alpena |
| Alpharetta |
| Alpine |
| Alsip |
| Altadena |
| Altamont | <--
| Altamonte Springs | <--
| Alton |
| Altoona |
| Altus |
| Alvin |
+-------------------+
40 rows in set (0.01 sec)

merger one row with null values to not null values of another row mysql

I want to merge two rows into one.The below format is in the database.
+----+---------+-----------------------+-------------------------+
| id | appid | photo | signature |
+====+=========+=======================+=========================+
| 1 | 10001 | 10001.photograph.jpg | NULL |
| 2 | 10001 | NULL | 10001.signature.jpg |
+----+---------+-----------------------+-------------------------+
I want a mysql query so that i can fetch data like below,
+--------+------------------------+-------------------------+
| appid | photo | signature |
+========+========================+=========================+
|10001 | 10001.photograph.jpg | 10001.signature.jpg |
+--------+------------------------+-------------------------+
Kindly suggest...
You can also use max function
select appid,
max(photo) photo,
max(signature) signature
from test
group by appid
Demo
This should do this:
select t1.appid,t1.photo,t2.signature from mytable t1 join mytable t2 on t1.appid=t2.appid where t1.id=1 and t2.id=2

SQL select statement optimizing (id, parent_id, child_ids)

we have a very old custom db (oracle, mysql, derby) with the restrictions: no new table fileds, no views, no functions, no procedures.
My table MYTABLE:
| id | ... | parent_id |
------------------------
| 1 | ... | |
| 2 | ... | 1 |
| 3 | ... | 1 |
| 4 | ... | 2 |
| 5 | ... | 1 |
and I my first statement:
select * from MYTABLE where id in ('1','2','3','4','5');
give my 5 records.
Then I need the information about the first (no deeper) child ids.
My current solution:
for (record in records) {
// get child ids as comma separated string list
// e.g. "2,3,5" for id 1
String childIds = getChildIds(record.id);
}
with the second statement in getChildIds(record.Id):
select id from MYTABLE where parent_id='record.Id';
So I have 1 + 5 = 6 statements for the required information.
I'm looking for a solution to select the records from the following "imaginary" table with the "imaginary" field "child_ids":
| id | ... | parent_id | child_ids |
------------------------------------
| 1 | ... | | 2,3,5 |
| 2 | ... | 1 | 4 |
| 3 | ... | 1 | |
| 4 | ... | 2 | |
| 5 | ... | 1 | |
Does anyone have an idea how I can get this information with only one statement (or with 2 statements)?
Thanks for your help, Thomas
FOR MYSQL:
How about using the GROUP_CONCAT() function like the following:
SELECT id, parent_id,
GROUP_CONCAT(child_id ORDER BY child_id SEPARATOR ',') AS child_ids
FROM MYTABLE
WHERE id IN ('1','2','3','4','5')
FOR ORACLE:
If you have a later version of Oracle you could use the LISTAGG() function:
SELECT parent_id,
LISTAGG(child_id, ', ') WITHIN GROUP (ORDER BY child_id) "child_ids"
FROM MYTABLE
WHERE id IN ('1','2','3','4','5')
GROUP BY parent_id
FOR DERBY:
I don't know anything about derby, but doing a little research it uses IBM DB2 SQL syntax. So, maybe using a combination of XMLSERIALIZE(), XMLAGG(), and XMLTEXT() will work for you:
SELECT parent_id,
XMLSERIALIZE(XMLAGG(XMLTEXT(child_id) ORDER BY child_id) AS CLOB(30K))
FROM table GROUP BY parent_id

Get distinct results from several tables

I need to implement mysql query to calculate space used by user's mailbox.
A message thread may have multiple messages (reply, follow up) by 2 parties
(sender/recipient) and is tagged with one or more tags (Inbox, Sent etc.).
The following conditions have to be met:
a) user is either recipient OR author of the message;
b) message IS TAGGED by any of the tags: 1,2,3,4;
c) distinct records only, ie if the thread, containing messages is tagged with
more than one of the 4 tags (for example 1 and 4: Inbox and Sent) the calculation
is done on one tag only
I have tried the following query but I am not able to get distinct values - the
subject/body values are duplicated:
SELECT SUM(LENGTH(subject)+LENGTH(body)) AS sum
FROM om_msg_message omm
JOIN om_msg_index omi ON omm.mid = omi.mid
JOIN om_msg_tags_index omti ON omi.thread_id = omti.thread_id AND omti.uid = user_id
WHERE (omi.recipient = user_id OR omi.author = user_id) AND omti.tag_id IN (1,2,3,4)
GROUP BY omi.mid;
Structure of the tables:
om_msg_message - fields subject and body are the ones to be calculated
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| mid | int(10) unsigned | NO | PRI | NULL | auto_increment |
| subject | varchar(255) | NO | | NULL | |
| body | longtext | NO | | NULL | |
| timestamp | int(10) unsigned | NO | | NULL | |
| reply_to_mid | int(10) unsigned | NO | | 0 | |
+--------------+------------------+------+-----+---------+----------------+
om_msg_index
+-----+-----------+-----------+--------+--------+---------+
| mid | thread_id | recipient | author | is_new | deleted |
+-----+-----------+-----------+--------+--------+---------+
| 1 | 1 | 1392 | 1211 | 0 | 0 |
| 2 | 1 | 1211 | 1392 | 1 | 0 |
+-----+-----------+-----------+--------+--------+---------+
om_msg_tags_index
+--------+------+-----------+
| tag_id | uid | thread_id |
+--------+------+-----------+
| 1 | 1211 | 1 |
| 4 | 1211 | 1 |
| 1 | 1392 | 1 |
| 4 | 1392 | 1 |
+--------+------+-----------+
Here's another solution:
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body)) as totalLength
FROM om_msg_message omm
JOIN om_msg_index omi
ON omi.mid = omm.mid
AND (omi.recipient = user_id OR omi.author = user_id)
JOIN (SELECT DISTINCT thread_id
FROM om_msg_tags_index
WHERE uid = user_id
AND tag_id IN (1, 2, 3, 4)) omti
ON omti.thread_id = omi.thread_id
I'm assuming that:
user_id is a parameter marker/host variable, being queried for an individual user.
You want the total of all messages per user, not the total length of each message (which is what the GROUP BY clause in your version was getting you).
That mid in both om_msg_message and om_msg_index is unique.
So, your problem is the IN clause. I'm not a MYSQL guru, but in T-SQL you could change it to have a where clause on a subquery that contained an EXISTS so your join didn't pop out two rows. You need to compensate for the fact that you have two rows with different tagID's associated with each row of your primary join data.
The way I could do it cross-platform would be with four left-joins that linked tables then demanded a non-null value for 1, 2, 3, or 4. Fairly inefficient; I'm sure there's a better way to do it in MySQL, but now that you know what the problem is you might know a better solution.