MySQL many-many JSON aggregation merging duplicate keys


I'm having trouble returning a JSON representation of a many-many join. My plan was to encode the columns returned using the following JSON format
{
"dog": [
"duke"
],
"location": [
"home",
"scotland"
]
}
This format would handle duplicate keys by aggregating the results in a JSON array. However, all of my attempts at aggregating this structure so far have just removed duplicates, so the arrays only ever have a single element.
Tables
Here is a simplified table structure I've made for the purposes of explaining this query.
media
| media_id | sha256 | filepath |
| 1 | 33327AD02AD09523C66668C7674748701104CE7A9976BC3ED8BA836C74443DBC | /photos/cat.jpeg |
| 2 | 323b5e69e72ba980cd4accbdbb59c5061f28acc7c0963fee893c9a40db929070 | /photos/dog.jpeg |
| 3 | B986620404660DCA7B3DEC4EFB2DE80C0548AB0DE243B6D59DA445DE2841E474 | /photos/dog2.jpeg |
| 4 | 1be439dd87cd87087a425c760d6d8edc484f126b5447beb2203d21e09e2a8f11 | /photos/balloon.jpeg |
media_metadata_labels_has_media (for many-many joins)
| media_metadata_labels_label_id | media_media_id |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 5 | 2 |
| 1 | 3 |
| 6 | 3 |
| 7 | 3 |
| 8 | 4 |
| 9 | 4 |
media_metadata_labels
| label_id | label_key | label_value |
| 2 | cat | lily |
| 4 | dog | duke |
| 6 | dog | rex |
| 1 | pet size | small |
| 3 | location | home |
| 7 | location | park |
| 8 | location | scotland |
| 9 | location | sky |
| 5 | location | studio |
My current attempt
My latest attempt at querying this data uses JSON_MERGE_PRESERVE with two arguments, the first is just an empty JSON object and the second is an invalid JSON document. It's invalid because there are duplicate keys, but I was hoping that JSON_MERGE_PRESERVE would merge them. It turns out JSON_MERGE_PRESERVE will only merge duplicates if they're not in the same JSON argument.
For example, this won't merge two keys
SET @key_one = '{}';
SET @key_two = '{"location": ["home"], "location": ["scotland"]}';
SELECT JSON_MERGE_PRESERVE(@key_one, @key_two);
-- returns {"location": ["scotland"]}
but this will
SET @key_one = '{"location": ["home"] }';
SET @key_two = '{"location": ["scotland"]}';
SELECT JSON_MERGE_PRESERVE(@key_one, @key_two);
-- returns {"location": ["home", "scotland"]}
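The same asymmetry can be reproduced outside the database. The sketch below is a Python analogy, not MySQL itself: Python's json parser also keeps only the last of two duplicate keys within one document, while a merge across two separately parsed documents can keep both values. The helper name merge_preserve is hypothetical and only approximates JSON_MERGE_PRESERVE for objects whose values are arrays.

```python
import json


def merge_preserve(left: dict, right: dict) -> dict:
    """Merge two parsed JSON objects, concatenating array values for shared keys."""
    merged = {key: list(values) for key, values in left.items()}
    for key, values in right.items():
        merged.setdefault(key, []).extend(values)
    return merged


# Duplicate keys inside ONE document: the parser keeps only the last one.
single_doc = json.loads('{"location": ["home"], "location": ["scotland"]}')

# The same keys split across TWO documents can still be merged.
two_docs = merge_preserve(
    json.loads('{"location": ["home"]}'),
    json.loads('{"location": ["scotland"]}'),
)

print(single_doc)
print(two_docs)
```

The point is that the duplicate is already lost at parse time, before any merge function sees it, so the merging has to happen while the key/value pairs are still separate rows.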
So anyway, here's my current attempt
SELECT
m.media_id,
m.filepath,
JSON_MERGE_PRESERVE(
'{}',
CAST(
CONCAT(
'{',
GROUP_CONCAT(CONCAT('"', l.label_key, '":["', l.label_value, '"]')),
'}'
)
AS JSON)
)
as labels
FROM media AS m
LEFT JOIN media_metadata_labels_has_media AS lm ON lm.media_media_id = m.media_id
LEFT JOIN media_metadata_labels AS l ON l.label_id = lm.media_metadata_labels_label_id
GROUP BY m.media_id, m.filepath
-- HAVING JSON_CONTAINS(labels, '"location"', CONCAT('$.', '"home"')); -- this would let me filter on labels once they're in the correct JSON format
After trying different combinations of JSON_MERGE, JSON_OBJECTAGG, JSON_ARRAYAGG, CONCAT and GROUP_CONCAT this still leaves me scratching my head.

Disclaimer: Since posting this question I've switched from Oracle MySQL to MariaDB. The function below uses MariaDB's stored aggregate function syntax (CREATE AGGREGATE FUNCTION with FETCH GROUP NEXT ROW), which MySQL does not provide, so it will not run on MySQL as written.
I solved this by creating a custom aggregation function:
DELIMITER //
CREATE AGGREGATE FUNCTION JSON_LABELAGG (
    json_key TEXT,
    json_value TEXT
) RETURNS JSON
BEGIN
    DECLARE complete_json JSON DEFAULT '{}';
    DECLARE current_jsonpath TEXT;
    DECLARE current_jsonpath_value_type TEXT;
    DECLARE current_jsonpath_value JSON;
    DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN complete_json;
    main_loop: LOOP
        FETCH GROUP NEXT ROW;
        SET current_jsonpath = CONCAT('$.', json_key); -- the jsonpath to our json_key
        SET current_jsonpath_value_type = JSON_TYPE(JSON_EXTRACT(complete_json, current_jsonpath)); -- the json object type at the current path
        SET current_jsonpath_value = JSON_QUERY(complete_json, current_jsonpath); -- the json value at the current path
        -- if this is the first label value with this key then place it in a new array
        IF (ISNULL(current_jsonpath_value_type)) THEN
            SET complete_json = JSON_INSERT(complete_json, current_jsonpath, JSON_ARRAY(json_value));
            ITERATE main_loop;
        END IF;
        -- confirm that an array is at this jsonpath, otherwise that's an exception
        CASE current_jsonpath_value_type
            WHEN 'ARRAY' THEN
                -- check if our json_value is already within the array and don't push a duplicate if it is
                IF (ISNULL(JSON_SEARCH(JSON_EXTRACT(complete_json, current_jsonpath), "one", json_value))) THEN
                    SET complete_json = JSON_ARRAY_APPEND(complete_json, current_jsonpath, json_value);
                END IF;
                ITERATE main_loop;
            ELSE
                SIGNAL SQLSTATE '45000'
                    SET MESSAGE_TEXT = 'Expected JSON label object to be an array';
        END CASE;
    END LOOP;
    RETURN complete_json;
END //
DELIMITER ;
and editing my query to use it
SELECT
m.media_id,
m.filepath,
JSON_LABELAGG(l.label_key, l.label_value) as labels
FROM media AS m
LEFT JOIN media_metadata_labels_has_media AS lm ON lm.media_media_id = m.media_id
LEFT JOIN media_metadata_labels AS l ON l.label_id = lm.media_metadata_labels_label_id
GROUP BY m.media_id, m.filepath
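To make the intended result concrete, here is a minimal Python sketch of the aggregation that JSON_LABELAGG performs, run over the sample rows from the question. It is an illustration of the logic only, not a replacement for the SQL function; the names LABELS, LINKS, label_agg and labels_by_media are hypothetical.

```python
from collections import defaultdict

# label_id -> (label_key, label_value), copied from media_metadata_labels
LABELS = {
    1: ("pet size", "small"), 2: ("cat", "lily"), 3: ("location", "home"),
    4: ("dog", "duke"), 5: ("location", "studio"), 6: ("dog", "rex"),
    7: ("location", "park"), 8: ("location", "scotland"), 9: ("location", "sky"),
}

# (label_id, media_id), copied from media_metadata_labels_has_media
LINKS = [(1, 1), (2, 1), (3, 1), (1, 2), (4, 2), (5, 2),
         (1, 3), (6, 3), (7, 3), (8, 4), (9, 4)]


def label_agg(pairs):
    """Collect values per key into a list, skipping values already present."""
    result = {}
    for key, value in pairs:
        values = result.setdefault(key, [])
        if value not in values:
            values.append(value)
    return result


def labels_by_media():
    """Group the label pairs by media_id, then aggregate each group."""
    grouped = defaultdict(list)
    for label_id, media_id in LINKS:
        grouped[media_id].append(LABELS[label_id])
    return {media_id: label_agg(pairs) for media_id, pairs in grouped.items()}


for media_id, labels in labels_by_media().items():
    print(media_id, labels)
```

With this sample data, media 4 is the row that carries two values under the same key, which is the case the GROUP_CONCAT attempt collapsed to a single element.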

Related

Parse JSON Array where each member has different schema but same general structure

I have a JSON data feed coming into SQL Server 2016. One of the attributes I must parse contains a JSON array. Unfortunately, instead of implementing a key/value design, the source system sends each member of the array with a different attribute name. The attribute names are not known in advance, and are subject to change/volatility.
declare @json nvarchar(max) =
'{
"objects": [
{"foo":"fooValue"},
{"bar":"barValue"},
{"baz":"bazValue"}
]
}';
select * from openjson(json_query(@json, 'strict $.objects'));
As you can see:
element 0 has a "foo" attribute
element 1 has a "bar" attribute
element 2 has a "baz" attribute:
+-----+--------------------+------+
| key | value | type |
+-----+--------------------+------+
| 0 | {"foo":"fooValue"} | 5 |
| 1 | {"bar":"barValue"} | 5 |
| 2 | {"baz":"bazValue"} | 5 |
+-----+--------------------+------+
Ideally, I would like to parse and project the data like so:
+-----+---------------+----------------+------+
| key | attributeName | attributeValue | type |
+-----+---------------+----------------+------+
| 0 | foo | fooValue | 5 |
| 1 | bar | barValue | 5 |
| 2 | baz | bazValue | 5 |
+-----+---------------+----------------+------+
Reminder: The attribute names are not known in advance, and are subject to change/volatility.

select o.[key], v.* --v.[key] as attributeName, v.value as attributeValue
from openjson(json_query(@json, 'strict $.objects')) as o
cross apply openjson(o.[value]) as v;
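The nested OPENJSON call works because the outer call yields one row per array element and the inner call yields one row per attribute of that element, whatever its name. As a rough illustration of the same two-level iteration outside SQL Server, here is a small Python sketch; flatten_objects is a hypothetical helper name.

```python
import json

doc = json.loads(
    '{"objects": [{"foo": "fooValue"}, {"bar": "barValue"}, {"baz": "bazValue"}]}'
)


def flatten_objects(objects):
    """Return (array index, attribute name, attribute value) for every attribute."""
    rows = []
    for index, element in enumerate(objects):      # outer OPENJSON: one row per element
        for name, value in element.items():        # inner OPENJSON: one row per attribute
            rows.append((index, name, value))
    return rows


rows = flatten_objects(doc["objects"])
for row in rows:
    print(row)
```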

Cross table with multiselect

I have a table whose two text columns are filled with strings:
CREATE TABLE [tbl_text]
(
[directoryName] nvarchar(200),
[text1] nvarchar(200),
[text2] nvarchar(200)
)
The strings are built like the following:
| Text1 | Text2 |
|------------|----------|
|tz1 tz3 tz2 | al1 al2 |
| tz1 tz3 | al1 al3 |
| tz2 | al3 |
| tz3 tz2 | al1 al2 |
Now I want to count how many times each Text1 value occurs together with each Text2 value, giving a result like this:
| Text1 | al1 | al2 | al3 |
|-------|------|------|------|
| tz1 | 2 | 1 | 1 |
| tz2 | 2 | 2 | 1 |
| tz3 | 3 | 2 | 1 |
I tried solving it with an SQL query like this:
TRANSFORM Count(tt.directoryName) AS Value
SELECT tt.Text1
FROM tbl_text as tt
GROUP BY tt.Text1
PIVOT tt.Text2;
This works fine if each field holds only one value, as in the third row (the complete data source would have to be in that one-value style).
But in my case I'm using the strings as a multiselect.
If I run this query against a data source that has " " between the values, the result is completely messed up.
Any suggestions for how the query should look to get this result?

You'll have to split the strings inside Text1/Text2 before you can do anything with them. In VBA, you'd loop a recordset, use the Split() function and insert the results into a temp table.
In Sql Server there are more powerful options available.
Coming from here: Split function equivalent in T-SQL? ,
you should read this page:
http://www.sommarskog.se/arrays-in-sql-2005.html#tablelists
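The split-then-count idea can be sketched in a few lines of Python, using the four sample rows from the question. This only illustrates the logic (split both columns on whitespace, then count every Text1/Text2 pair); it is not the VBA or T-SQL implementation, and the names ROWS and cross_count are hypothetical.

```python
from collections import Counter
from itertools import product

# (Text1, Text2) rows from the question
ROWS = [
    ("tz1 tz3 tz2", "al1 al2"),
    (" tz1 tz3", "al1 al3"),
    ("tz2", "al3"),
    ("tz3 tz2", "al1 al2"),
]


def cross_count(rows):
    """Count how often each Text1 token appears in the same row as each Text2 token."""
    counts = Counter()
    for text1, text2 in rows:
        # split() with no argument also discards leading and repeated spaces
        for pair in product(text1.split(), text2.split()):
            counts[pair] += 1
    return counts


counts = cross_count(ROWS)
for t1 in ("tz1", "tz2", "tz3"):
    print(t1, [counts[(t1, a)] for a in ("al1", "al2", "al3")])
```

Once the pairs exist as individual rows (for example in a temp table), the original TRANSFORM ... PIVOT query can be applied to them unchanged.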

MYSQL Function with a calculation based on data in a db column

I have a table that is a lookup for scoring points based on place (P) and number of racers (N),
and on the scoring format indicated by points_id. Two cases are shown in the table.
Sometimes the points are determined directly by the values of P and N, as for points_id = 3;
other times they are most easily determined by a simple calculation, shown in the pts_calc column.
|points_id| P | N |points|pts_calc|
| 1 | 0 | 0 | NULL | pin |
| 1 |DNS| 0 | NULL | nin+1 |
| 3 | 1 | 0 |102.00| NULL |
| 3 | 2 | 0 | 98.00| NULL |
| 3 | 3 | 0 | 96.00| NULL |
| 3 | 4 | 0 | 93.00| NULL |
| 3 | 5 | 0 | 91.00| NULL |
| 3 | 6 | 0 | 89.00| NULL |
| 3 |DNF| 0 | 85.00| NULL |
I was hoping to create a function that returned the points from the three input variables.
points_id, P, N.
Below is what I tried.
CREATE FUNCTION POINTS(pid INT,pin VARCHAR(3),nin INT)
RETURNS DEC(6,2)
DETERMINISTIC
BEGIN
DECLARE pts DECIMAL(6,2);
DECLARE pcalc VARCHAR(20);
SELECT points,pts_calc INTO pts,pcalc FROM scoring_points WHERE points_id=pid AND (P=pin OR P='0') AND (N=nin or N=0);
IF(pts IS NULL) THEN
SET @s= CONCAT('SET pts = ',pcalc);
PREPARE stmt FROM @s;
EXECUTE stmt;
END IF;
RETURN pts;
END
But I got this error:
1336 - Dynamic SQL is not allowed in stored function or trigger
Further research shows that PREPARE is allowed in stored procedures but not in stored functions.
I was hoping to do something like:
SELECT SUM(Points(pid,place,numb)) FROM t1 GROUP BY racer.id
But it's on to plan B (TBD) unless someone has a great idea.

I think you might fare better having three numeric columns instead of your pts_calc column:
cPIN - coefficient of pin term
cNIN - coefficient of nin term
cnst - constant term
Your function could then perform:
SELECT IFNULL(points, cPIN*pin + cNIN*nin + cnst) INTO pts
FROM scoring_points
WHERE ...
Depending on your needs, you might even be able to get rid of the points column by just using cnst and leaving the other two equal to 0.
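As a quick check of the idea, the two formulas in the question map onto coefficients as follows: pin becomes (cPIN, cNIN, cnst) = (1, 0, 0) and nin+1 becomes (0, 1, 1). The Python sketch below mirrors the IFNULL expression; the function name points and its parameter names are hypothetical, and pin is assumed to already be numeric at this stage.

```python
from typing import Optional


def points(fixed_points: Optional[float], c_pin: float, c_nin: float,
           cnst: float, pin: float, nin: float) -> float:
    """Mirror IFNULL(points, cPIN*pin + cNIN*nin + cnst)."""
    if fixed_points is not None:
        return fixed_points
    return c_pin * pin + c_nin * nin + cnst


# pts_calc = 'pin'   -> coefficients (1, 0, 0)
print(points(None, 1, 0, 0, pin=4, nin=12))
# pts_calc = 'nin+1' -> coefficients (0, 1, 1)
print(points(None, 0, 1, 1, pin=0, nin=12))
# a fixed lookup row (points_id = 3, P = 1) ignores the coefficients
print(points(102.0, 0, 0, 0, pin=1, nin=12))
```

Because the formula is now plain arithmetic over columns, no dynamic SQL is needed inside the stored function.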

Hierarchical queries in MySQL

I'm trying to find all the parents, grandparents, etc. of a particular field with any depth. For example, given the below structure, if I provide 5, the values returned should be 1, 2, 3 and 4.
| a | b |
-----------
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 3 | 6 |
| 4 | 7 |
How would I do this?

SELECT @id :=
(
SELECT senderid
FROM mytable
WHERE receiverid = @id
) AS person
FROM (
SELECT @id := 5
) vars
STRAIGHT_JOIN
mytable
WHERE @id IS NOT NULL
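For comparison, here is a different technique from the user-variable query above: a recursive common table expression. MySQL 8.0 and MariaDB 10.2 and later accept WITH RECURSIVE; the sketch below runs the query through Python's built-in sqlite3 module purely so that it is self-contained. It assumes the question's table is named t, with a as the parent and b as the child; the function name ancestors is hypothetical.

```python
import sqlite3

# (a, b) rows from the question: a is the parent of b
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6), (4, 7)]


def ancestors(edges, start):
    """Return every ancestor of `start`, found by walking b -> a recursively."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT a FROM t WHERE b = ?                  -- direct parent
            UNION
            SELECT t.a FROM t JOIN up ON t.b = up.id     -- parent of each row found so far
        )
        SELECT id FROM up
        """,
        (start,),
    ).fetchall()
    conn.close()
    return sorted(row[0] for row in rows)


print(ancestors(EDGES, 5))
```

Unlike the user-variable approach, this does not depend on the order in which the server evaluates rows, and it also handles nodes with more than one parent.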

The following answer is not MySQL-only; it uses PHP. It may be useful to those who end up on this page during their search (as I did) but are not limited to using MySQL alone.
If you have a database with a nested structure of unknown depth, you can print out the contents using a recursive loop:
function goDownALevel($parent){
$children = $parent->getChildren(); //underlying SQL function
if($children != null){
foreach($children as $child){
//Print the child content here
goDownALevel($child);
}
}
}
This function can also be rewritten in any other language like Javascript.

Getting limited amount of records from hierarchical data

Let's say I have 3 tables (significant columns only)
Category (catId key, parentCatId)
Category_Hierarchy (catId key, parentTrail, catLevel)
Product (prodId key, catId, createdOn)
There's a reason for having a separate Category_Hierarchy table: it is populated by triggers on the Category table, because a MySQL trigger can't populate columns on the same table if it needs the auto_increment values. For the sake of this problem this is irrelevant. These two tables are 1:1 anyway.
Category table could be:
+-------+-------------+
| catId | parentCatId |
+-------+-------------+
| 1 | NULL |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 3 |
| 6 | 4 |
| ... | ... |
+-------+-------------+
Category_Hierarchy
+-------+-------------+----------+
| catId | parentTrail | catLevel |
+-------+-------------+----------+
| 1 | 1/ | 0 |
| 2 | 1/2/ | 1 |
| 3 | 1/2/3/ | 2 |
| 4 | 1/2/3/4/ | 3 |
| 5 | 1/2/3/5/ | 3 |
| 6 | 1/2/3/4/6/ | 4 |
| ... | ... | ... |
+-------+-------------+----------+
Product
+--------+-------+---------------------+
| prodId | catId | createdOn |
+--------+-------+---------------------+
| 1 | 4 | 2010-02-03 12:09:24 |
| 2 | 4 | 2010-02-03 12:09:29 |
| 3 | 3 | 2010-02-03 12:09:36 |
| 4 | 1 | 2010-02-03 12:09:39 |
| 5 | 3 | 2010-02-03 12:09:50 |
| ... | ... | ... |
+--------+-------+---------------------+
Category_Hierarchy makes it simple to get category subordinate trees like this:
select c.*
from Category c
join Category_Hierarchy h
on (h.catId = c.catId)
where h.parentTrail like '1/2/3/%'
This returns the complete subordinate tree of category 3 (which is below 2, which is below the root category 1), including the subtree's own root node. Excluding that root node is just one more WHERE condition.
The problem
I would like to write a stored procedure:
create procedure GetLatestProductsFromSubCategories(in catId int)
begin
/* return 10 latest products from each */
/* catId subcategory subordinate tree */
end;
This means if a certain category had 3 direct sub categories (with whatever number of nodes underneath) I would get 30 results (10 from each subordinate tree). If it had 5 sub categories I'd get 50 results.
What would be the best/fastest/most efficient way to do this? If possible I'd like to avoid cursors unless they'd work faster compared to any other solution as well as prepared statements, because this would be one of the most frequent calls to DB.
Edit
Since a picture tells 1000 words I'll try to better explain what I want using an image. Below image shows category tree. Each of these nodes can have an arbitrary number of products related to them. Products are not included in the picture.
So if I'd execute this call:
call GetLatestProductsFromSubCategories(1);
I'd like to effectively get 30 products:
10 latest products from the whole orange subtree
10 latest products from the whole blue subtree and
10 latest products from the whole green subtree
I don't want to get 10 latest products from each node under catId=1 node which would mean 320 products.

Final Solution
This solution has O(n) performance:
CREATE PROCEDURE foo(IN in_catId INT)
BEGIN
    DECLARE done BOOLEAN DEFAULT FALSE;
    DECLARE first_iteration BOOLEAN DEFAULT TRUE;
    DECLARE current VARCHAR(255);
    DECLARE categories CURSOR FOR
        SELECT parentTrail
        FROM category
        JOIN category_hierarchy USING (catId)
        WHERE parentCatId = in_catId;
    DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = TRUE;
    SET @query := '';
    OPEN categories;
    category_loop: LOOP
        FETCH categories INTO current;
        IF `done` THEN LEAVE category_loop; END IF;
        IF first_iteration = TRUE THEN
            SET first_iteration = FALSE;
        ELSE
            SET @query = CONCAT(@query, " UNION ALL ");
        END IF;
        SET @query = CONCAT(@query, "(SELECT product.* FROM product JOIN category_hierarchy USING (catId) WHERE parentTrail LIKE CONCAT('",current,"','%') ORDER BY createdOn DESC LIMIT 10)");
    END LOOP category_loop;
    CLOSE categories;
    IF @query <> '' THEN
        PREPARE stmt FROM @query;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END IF;
END
Edit
Due to the latest clarification, this solution was simply edited to simplify the categories cursor query.
Note: Make the VARCHAR on line 5 the appropriate size based on your parentTrail column.
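To show what the generated UNION ALL query computes, here is a small Python sketch over the sample rows from the question: for each direct child of the given category, take the newest products whose category trail starts with that child's parentTrail. It is an illustration of the selection logic only, not a substitute for the procedure; the names CATEGORY_PARENT, PARENT_TRAIL, PRODUCTS and latest_per_subtree are hypothetical.

```python
# catId -> parentCatId (Category table)
CATEGORY_PARENT = {1: None, 2: 1, 3: 2, 4: 3, 5: 3, 6: 4}

# catId -> parentTrail (Category_Hierarchy table)
PARENT_TRAIL = {1: "1/", 2: "1/2/", 3: "1/2/3/", 4: "1/2/3/4/",
                5: "1/2/3/5/", 6: "1/2/3/4/6/"}

# (prodId, catId, createdOn) rows (Product table)
PRODUCTS = [
    (1, 4, "2010-02-03 12:09:24"),
    (2, 4, "2010-02-03 12:09:29"),
    (3, 3, "2010-02-03 12:09:36"),
    (4, 1, "2010-02-03 12:09:39"),
    (5, 3, "2010-02-03 12:09:50"),
]


def latest_per_subtree(cat_id, limit=10):
    """Return up to `limit` newest products from each direct child's subtree."""
    result = []
    for child, parent in CATEGORY_PARENT.items():
        if parent != cat_id:
            continue
        trail = PARENT_TRAIL[child]
        # equivalent of: WHERE parentTrail LIKE CONCAT(trail, '%')
        in_subtree = [p for p in PRODUCTS if PARENT_TRAIL[p[1]].startswith(trail)]
        # equivalent of: ORDER BY createdOn DESC LIMIT n
        in_subtree.sort(key=lambda p: p[2], reverse=True)
        result.extend(in_subtree[:limit])
    return result


print([p[0] for p in latest_per_subtree(1, limit=2)])
print([p[0] for p in latest_per_subtree(3)])
```

Each direct child contributes one independently limited block, which is why the procedure builds one parenthesised SELECT per child and joins them with UNION ALL.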
