MySQL merging json arrays in group by - mysql

I'm trying to merge a scalar array json field within a group by to have all the distinct values in one list.
Consider the following table:
CREATE TABLE transaction
(
id INT UNSIGNED AUTO_INCREMENT,
source_account_id VARCHAR(32) NOT NULL,
target_account_ids JSON NOT NULL,
PRIMARY KEY (id)
) ENGINE = InnoDB CHARSET utf8mb4;
target_account_ids is a simple array of strings, for example '["account1", "account2"]'.
I'd like to gather all the target_account_ids of a single source to have a unified result.
For example:
id | source_account_id | target_account_ids
1  | account1          | '["account1", "account2"]'
2  | account1          | '["account1", "account3"]'
And the desired result set would be:
source_account_id | target_account_ids
account1          | '["account1", "account2", "account3"]'
I tried to play around with JSON_ARRAYAGG, but it just wraps the arrays inside another array, so I end up with nested arrays rather than one merged list.

You have to explode the array with JSON_TABLE(), then reduce the values with DISTINCT, then you can recombine them with JSON_ARRAYAGG().
select source_account_id, json_arrayagg(target_account_id) as target_account_ids
from (
select distinct source_account_id, j.account_id as target_account_id
from transaction
cross join json_table(target_account_ids, '$[*]' columns (account_id varchar(32) path '$')) as j
) as t
group by source_account_id;
GROUP_CONCAT() supports a DISTINCT keyword in its argument, but JSON_ARRAYAGG() doesn't (this feature has been requested: https://bugs.mysql.com/bug.php?id=91993).
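Until that feature request lands, one workaround sketch (my addition, not from the answer) is to build the deduplicated JSON array with GROUP_CONCAT(DISTINCT ...), assuming the account ids contain no double quotes or embedded separators:

```sql
-- Sketch: emulate a DISTINCT JSON_ARRAYAGG with GROUP_CONCAT (MySQL 8.0+ for JSON_TABLE).
-- Assumes account ids contain no double quotes or '", "' sequences.
select source_account_id,
       cast(concat('["',
                   group_concat(distinct j.account_id separator '", "'),
                   '"]') as json) as target_account_ids
from transaction
cross join json_table(target_account_ids, '$[*]'
                      columns (account_id varchar(32) path '$')) as j
group by source_account_id;
```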
If this seems like a lot of needless work, or if you can't use JSON_TABLE() because you're still using MySQL 5.7, then you should store multi-valued attributes in normal rows and columns, instead of using JSON.

Related

Mysql - Returning number of rows where JSON document is contained within a target JSON document

I have a table with a JSON column data in MySQL.
In the data column I expect an array of products under the key "products".
I am looking for the best and fastest way to count the rows containing a specified value in the products array.
This is what I have tried so far, on 1 million rows, with these results:
SELECT COUNT(*) as "cnt" FROM `components` `c`
WHERE (JSON_CONTAINS(`c`.`data`, '"some product from array"', '$."products"') = true)
The first one takes ~4 sec.
SELECT COUNT(*) as "cnt" FROM `components` `c`
WHERE (JSON_SEARCH(`c`.`data`, 'one', 'some product from array', null, '$."products"') is not null)
The second one takes ~2.5 sec.
Is there any faster way to get this row count?
I noticed that as of MySQL 8.0.17 it is possible to add multi-valued indexes on a JSON column. Is it possible to create a multi-valued index on an array of strings? I tried something like:
CREATE INDEX products ON components ( (CAST(data->'$.products' AS VARCHAR(255) ARRAY)) )
but it gives me an error. Is there any way to accomplish this?
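A hedged sketch (my addition, not from the thread): MySQL's CAST() has no VARCHAR target type, which is the likely source of the error; the documented form for string-valued multi-valued indexes (8.0.17+) uses CHAR(n) ARRAY, and the index is only considered for MEMBER OF, JSON_CONTAINS(), or JSON_OVERLAPS() predicates:

```sql
-- Sketch, assuming MySQL 8.0.17+: CAST() accepts CHAR, not VARCHAR,
-- so cast the array elements to CHAR(n) ARRAY.
CREATE INDEX products ON components ( (CAST(data->'$.products' AS CHAR(255) ARRAY)) );

-- The optimizer can use the multi-valued index for MEMBER OF,
-- JSON_CONTAINS(), or JSON_OVERLAPS():
SELECT COUNT(*) AS cnt
FROM components
WHERE 'some product from array' MEMBER OF (data->'$.products');
```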

How to create index on json array type in mysql5.7

Now I use a query like SELECT * FROM user WHERE JSON_CONTAINS(users, '[1]'); but it scans the full table, which is inefficient. So I want to create an index on the users column.
For example, I have a column named users, with data like [1,2,3,4]. Please tell me how to set an index on a JSON array type (generated virtual column). I read the documentation on the MySQL website, but it only covers indexing a JSON object type using the JSON_EXTRACT() function.
It's now possible with MySQL 8+
Here is an example:
CREATE TABLE customers (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
custinfo JSON
);
ALTER TABLE customers ADD INDEX comp(id, modified,
(CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
Use it this way:
SELECT * FROM customers
WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
More info:
https://dev.mysql.com/doc/refman/8.0/en/create-index.html
You cannot, at least not the way you intend. Under The JSON Data Type we can read:
JSON columns, like columns of other binary types, are not indexed
directly; instead, you can create an index on a generated column that
extracts a scalar value from the JSON column. See Indexing a
Generated Column to Provide a JSON Column Index, for a detailed
example.
So with the restriction comes the workaround ;-)
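As a sketch of that workaround (my addition; the column names are hypothetical, and on 5.7 a generated column can only capture a scalar, such as one fixed array position, not arbitrary membership):

```sql
-- Sketch for MySQL 5.7: extract a scalar from the JSON column into a
-- generated column and index that. This only indexes one fixed array
-- position; it cannot answer "does the array contain X" in general.
ALTER TABLE user
  ADD COLUMN first_user INT
    GENERATED ALWAYS AS (users->'$[0]') VIRTUAL,
  ADD INDEX idx_first_user (first_user);

-- Queries on the generated column can use the index:
SELECT * FROM user WHERE first_user = 1;
```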

Where clause for JSON field on array of objects

Working with MySQL 5.7.19. I swear an hour ago this worked, but now I'm getting nothing returned from my query.
CREATE TABLE FlattenedData.blog_posts
(
post_id CHAR(13) NOT NULL PRIMARY KEY UNIQUE,
post_data JSON,
date_published DATETIME NOT NULL, # for primary indexing
date_added DATETIME DEFAULT CURRENT_TIMESTAMP,
date_updated DATETIME DEFAULT CURRENT_TIMESTAMP on UPDATE CURRENT_TIMESTAMP,
post_categories VARCHAR(255) GENERATED ALWAYS AS (post_data->>"$.categories[*].slug"),
post_tags VARCHAR(512) GENERATED ALWAYS AS (post_data->>"$.tags[*].slug"),
KEY idx_date_published (date_published),
INDEX idx_categories (post_categories),
INDEX idx_tags (post_tags),
INDEX idx_categories_tags (post_categories, post_tags)
);
as a note: a post can have multiple categories, just like tags
Here's my query
select
*
from blog_posts
WHERE
post_data->>"$.categories[*].slug" = "site-news"
Like I said, I swear this was working earlier, but now I get nothing back.
Here's the explain (attached as a screenshot, not reproduced here).
Even if I delete the indexes and generated columns and just use a plain JSON field, I still get nothing back. The only thing that returns results is JSON_SEARCH, but there are thousands of records with rather large JSON blobs, and the searched text can also show up in the post body.
By the way, a category field looks like this:
[{"slug": "site-news", "title": "Site News"}, {"slug": "personal", "title": "Personal"}]
Tags follow exactly the same structure.
EDIT
I just tried with post_data->>"$.categories[0].slug" = "site-news" and that brought in records. But I need the where clause to take into consideration all elements of the array as I cannot guarantee the array element slot this category is going to be in.
As MySQL's docs state: [*] represents the values of all cells in the array. https://dev.mysql.com/doc/refman/5.7/en/json-path-syntax.html
A core problem of using
select
*
from blog_posts
WHERE
JSON_CONTAINS(post_data->"$.categories[*].slug", json_quote("site-news"))
is that it cannot use my indexes, which are going to be key here.
Your wildcard path expression returns all matching values as a JSON array.
SELECT post_data->>'$[*].slug' FROM blog_posts;
+---------------------------+
| post_data->>'$[*].slug' |
+---------------------------+
| ["site-news", "personal"] |
+---------------------------+
That's clearly not equal to the scalar string 'site-news'.
So you can use JSON_SEARCH() on the JSON array to find a specific string:
SELECT * FROM blog_posts
WHERE JSON_SEARCH(post_data->>'$[*].slug', 'one', 'site-news') IS NOT NULL;
I tested that with MySQL 8.0.3-rc. I loaded this data:
INSERT INTO blog_posts (post_id, date_published, post_data)
VALUES('blah blah', now(), '[{"slug": "site-news", "title": "Site News"}, {"slug": "personal", "title": "Personal"}]');
I know this isn't the format of your post_data, but it still demonstrates that using a wildcard path on JSON returns an array.
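If upgrading from 5.7 is an option, MySQL 8.0's JSON_TABLE() can explode the categories array into rows so an ordinary WHERE clause applies to every element (a sketch I'm adding, using the table from the question):

```sql
-- Sketch, MySQL 8.0+: turn each element of $.categories into a row,
-- then filter on the extracted slug like any scalar column.
SELECT p.*
FROM blog_posts AS p
CROSS JOIN JSON_TABLE(p.post_data, '$.categories[*]'
                      COLUMNS (slug VARCHAR(255) PATH '$.slug')) AS c
WHERE c.slug = 'site-news';
```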

Why does comparing a varchar to a numerical value always return True?

I've got the following table in MySQL (MySQL Server 5.7):
CREATE TABLE IF NOT EXISTS SIMCards (
SIMCardID INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,
ICCID VARCHAR(50) UNIQUE NOT NULL,
MSISDN BIGINT UNSIGNED UNIQUE);
INSERT INTO SIMCards (ICCID, MSISDN) VALUES
(89441000154687982548, 905511528749),
(89441000154687982549, 905511528744),
(89441000154687982547, 905511528745);
I then run the following query:
SELECT SIMCardID FROM SIMCards WHERE ICCID = 89441000154687982549;
However, rather than returning just the relevant row, it returns all of them. If I surround the ICCID in quotes, it works fine, e.g.:
SELECT SIMCardID FROM SIMCards WHERE ICCID = '89441000154687982549';
Why does the first SELECT query not work as I expected?
An integer in MySQL has a maximum unsigned value of 4294967295, and even an unsigned BIGINT tops out at 18446744073709551615; your 20-digit ICCIDs are larger than that. When MySQL compares a string column to a numeric literal, it converts both sides to DOUBLE. A DOUBLE carries only about 15-17 significant decimal digits, so all of your ICCIDs round to the same floating-point value and every row compares equal.
So you need to make sure to always compare ICCID against a string, as in your second query.
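The floating-point collision can be demonstrated directly (a sketch I'm adding, reusing two of the question's values):

```sql
-- Sketch: both operands are converted to DOUBLE before comparing,
-- and these 20-digit values round to the same DOUBLE:
SELECT '89441000154687982548' = 89441000154687982549;  -- returns 1
```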

SQL coalesce(): what type does the combined column have?

Let's say I use coalesce() to combine two columns into one in a SELECT, and subsequently in a view constructed around that SELECT.
Tables:
values_int
id INTEGER(11) PRIMARY KEY
value INTEGER(11)
values_varchar
id INTEGER(11) PRIMARY KEY
value VARCHAR(255)
vals
id INTEGER(11) PRIMARY KEY
value INTEGER(11) //foreign key to both values_int and values_varchar
The primary keys are unique across values_int and values_varchar, and that allows me to do:
SELECT vals.id, coalesce(values_int.value, values_varchar.value) AS value
FROM vals
JOIN values_int ON values_int.id = vals.value
JOIN values_varchar ON values_varchar.id = vals.value
This produces a nice assembled view with an ID column and a combined value column containing the actual values from the two other tables.
What type does this combined column have?
When turned into view and then queried with a WHERE clause using this combined "value" column, how is that actually handled type-wise? I.e. WHERE value > 10
Some rambling thoughts (most likely wrong):
The reason I am asking is that the alternative to this design is to merge all three tables into one, with INT values in one column and VARCHAR values in another. That would produce a lot of NULL values in both columns but would save me the JOINs. I do not like that solution because it would require additional type checking to choose the right column and to deal with the NULLs, but maybe the presented design requires the same (if the resulting column is actually VARCHAR). I would hope that the WHERE clause is pushed down through the view to the source tables (so that the combined column does not have a type per se), but I am likely wrong about that.
Your query should be explicit to be clear. In this case MySQL is using VARCHAR.
I would write this query like this to be clear (note that MySQL's CAST() uses SIGNED and CHAR rather than INTEGER and VARCHAR):
coalesce(values_int.value, cast(values_varchar.value as signed), 0)
or
coalesce(cast(values_int.value as char(20)), values_varchar.value, '0')
You should include that last default value unless you want the column to be NULL when both columns are NULL.
Returns the data type of expression with the highest data type precedence. If all expressions are nonnullable, the result is typed as nonnullable.
So in your case the type will be VARCHAR(255)
Let's say I use coalesce() to combine two columns into one
NO, that's not what the COALESCE function does. It returns its first non-NULL argument, so it's used for supplying a default when a column value is NULL. In your case, if values_int.value IS NULL then it will select the value in values_varchar.value:
coalesce(values_int.value, values_varchar.value) AS value
If you want to combine the data, use the CONCAT() function instead:
concat(values_int.value, values_varchar.value) AS value
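The difference between the two functions is easy to see in isolation (a small sketch I'm adding):

```sql
-- Sketch: COALESCE picks the first non-NULL argument;
-- CONCAT joins values, and returns NULL if ANY argument is NULL.
SELECT COALESCE(NULL, 'b');  -- 'b'
SELECT CONCAT('a', 'b');     -- 'ab'
SELECT CONCAT(NULL, 'b');    -- NULL
```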
Verify it yourself. An easy way to check in MySQL is to DESCRIBE a VIEW you create to capture your dynamic column:
mysql> CREATE VIEW v AS
-> SELECT vals.id, coalesce(values_int.value, values_varchar.value) AS value
-> FROM vals
-> JOIN values_int ON values_int.id = vals.value
-> JOIN values_varchar ON values_varchar.id = vals.value;
Query OK, 0 rows affected (0.01 sec)
Now DESCRIBE v will show you what's what. Note that under MySQL 5.1, I see the column as varbinary(255), but under 5.5 I see varchar(255).