Where clause for JSON field on array of objects - mysql

Working with MySQL 5.7.19. I swear an hour ago this worked, but now I'm getting nothing returned from my query:
CREATE TABLE FlattenedData.blog_posts
(
  post_id CHAR(13) NOT NULL PRIMARY KEY UNIQUE,
  post_data JSON,
  date_published DATETIME NOT NULL, # for primary indexing
  date_added DATETIME DEFAULT CURRENT_TIMESTAMP,
  date_updated DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  post_categories VARCHAR(255) GENERATED ALWAYS AS (post_data->>"$.categories[*].slug"),
  post_tags VARCHAR(512) GENERATED ALWAYS AS (post_data->>"$.tags[*].slug"),
  KEY idx_date_published (date_published),
  INDEX idx_categories (post_categories),
  INDEX idx_tags (post_tags),
  INDEX idx_categories_tags (post_categories, post_tags)
);
As a note: a post can have multiple categories, just like tags.
Here's my query:
select *
from blog_posts
WHERE post_data->>"$.categories[*].slug" = "site-news"
Like I said, I swear this was working earlier, but now I get nothing back.
Even if I delete the indexes and generated columns and just use the plain JSON field, I still suddenly get nothing. The only thing that gets results is JSON_SEARCH, but there are thousands of records, these are rather large JSON blobs, and it's possible for the searched text to show up in the body.
By the way, a categories field looks like this:
[{"slug": "site-news", "title": "Site News"}, {"slug": "personal", "title": "Personal"}]
Tags follow exactly the same structure.
EDIT
I just tried post_data->>"$.categories[0].slug" = "site-news" and that brought back records. But I need the WHERE clause to take all elements of the array into consideration, as I cannot guarantee which array slot this category is going to be in.
As MySQL's docs state: [*] represents the values of all cells in the array. https://dev.mysql.com/doc/refman/5.7/en/json-path-syntax.html
A core problem of using
select *
from blog_posts
WHERE JSON_CONTAINS(post_data->"$.categories[*].slug", json_quote("site-news"))
is that it completely avoids using my indexes, which is going to be key here

Your path expression with a wildcard returns all matching values, as an array in JSON notation.
SELECT post_data->>'$[*].slug' FROM blog_posts;
+----------------------------+
| post_data->>'$[*].slug'    |
+----------------------------+
| ["site-news", "personal"]  |
+----------------------------+
That's clearly not equal to the scalar string 'site-news'.
So you can use JSON_SEARCH() on the JSON array to find a specific string:
SELECT * FROM blog_posts
WHERE JSON_SEARCH(post_data->>'$[*].slug', 'one', 'site-news') IS NOT NULL;
I tested that with MySQL 8.0.3-rc. I loaded this data:
INSERT INTO blog_posts (post_id, date_published, post_data)
VALUES('blah blah', now(), '[{"slug": "site-news", "title": "Site News"}, {"slug": "personal", "title": "Personal"}]');
I know this isn't the format of your post_data, but it still demonstrates that using a wildcard path on JSON returns an array.
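Adapted to the actual structure in the question, a sketch along the same lines (assuming the categories array sits under $.categories, as in the table definition above) restricts the search to the slug path, so matches in the post body are ignored:
SELECT * FROM blog_posts
WHERE JSON_SEARCH(post_data, 'one', 'site-news', NULL, '$.categories[*].slug') IS NOT NULL;
Note that, like JSON_CONTAINS(), this cannot use the generated-column indexes; MySQL 5.7 has no index type that can match individual elements inside a JSON array.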

Related

MySQL merging json arrays in group by

I'm trying to merge a JSON field containing a scalar array within a GROUP BY, to get all the distinct values in one list.
Consider the following table:
CREATE TABLE transaction
(
  id INT UNSIGNED AUTO_INCREMENT,
  source_account_id VARCHAR(32) NOT NULL,
  target_account_ids JSON NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB CHARSET utf8mb4;
target_account_ids is a simple array of strings, for example '["account1", "account2"]'.
I'd like to gather all the target_account_ids of a single source to have a unified result.
For example:
+----+-------------------+-----------------------------+
| id | source_account_id | target_account_ids          |
+----+-------------------+-----------------------------+
| 1  | account1          | '["account1", "account2"]'  |
| 2  | account1          | '["account1", "account3"]'  |
+----+-------------------+-----------------------------+
And the desired result set would be:
+-------------------+-----------------------------------------+
| source_account_id | target_account_ids                      |
+-------------------+-----------------------------------------+
| account1          | '["account1", "account2", "account3"]'  |
+-------------------+-----------------------------------------+
I tried to play around with JSON_ARRAYAGG, but it just nests the arrays within another array and basically results in an "endless" array.
You have to explode the array with JSON_TABLE(), then reduce the values with DISTINCT, then you can recombine them with JSON_ARRAYAGG().
select source_account_id, json_arrayagg(target_account_id) as target_account_ids
from (
  select distinct source_account_id, j.account_id as target_account_id
  from transaction
  cross join json_table(target_account_ids, '$[*]' columns (account_id varchar(32) path '$')) as j
) as t
group by source_account_id;
GROUP_CONCAT() supports a DISTINCT keyword in its argument, but JSON_ARRAYAGG() doesn't (this feature has been requested: https://bugs.mysql.com/bug.php?id=91993).
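For completeness, a sketch of the GROUP_CONCAT() variant, which does deduplicate inline but returns a comma-separated string instead of a JSON array:
select source_account_id,
       group_concat(distinct j.account_id) as target_account_ids
from transaction
cross join json_table(target_account_ids, '$[*]' columns (account_id varchar(32) path '$')) as j
group by source_account_id;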
If this seems like a lot of needless work, or if you can't use JSON_TABLE() because you're still using MySQL 5.7, then you should store multi-valued attributes in normal rows and columns, instead of using JSON.
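For illustration, a minimal sketch of that normalized design (the transaction_target table name is made up here):
create table transaction_target (
  transaction_id int unsigned not null,
  target_account_id varchar(32) not null,
  primary key (transaction_id, target_account_id)
);

-- the original question then becomes an ordinary join + aggregate:
select t.source_account_id,
       group_concat(distinct tt.target_account_id) as target_account_ids
from transaction t
join transaction_target tt on tt.transaction_id = t.id
group by t.source_account_id;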

MySQL - Returning number of rows where JSON document is contained within a target JSON document

I have a table with a JSON column named data in MySQL.
In the data column I expect an array of products under the key "products".
I am looking for the best and fastest approach to get the number of rows containing a specified value in the array of products.
This is what I have tried so far, on 1 million rows, with the resulting timings:
SELECT COUNT(*) as "cnt" FROM `components` `c`
WHERE (JSON_CONTAINS(`c`.`data`, '"some product from array"', '$."products"') = true)
The first one takes ~4 sec.
SELECT COUNT(*) as "cnt" FROM `components` `c`
WHERE (JSON_SEARCH(`c`.`data`, 'one', 'some product from array', null, '$."products"') is not null)
The second one takes ~2.5 sec.
Is there any faster way I can get this number of rows?
I noticed that as of MySQL 8.0.17 it is possible to add multi-valued indexes on a JSON column. Is it possible to create a multi-valued index on an array of strings? I tried something like:
CREATE INDEX products ON components ( (CAST(data->'$.products' AS VARCHAR(255) ARRAY)) )
but it gives me an error. Is there any way to accomplish this?
Best regards.
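For what it's worth, one likely cause of that error: the CAST ... ARRAY syntax used by multi-valued indexes only accepts certain target types, and VARCHAR is not among them while CHAR is. A sketch of the variant that should be accepted on MySQL 8.0.17+ (treat this as an assumption to verify against your version):
-- CHAR(255) ARRAY instead of VARCHAR(255) ARRAY
CREATE INDEX products ON components ( (CAST(data->'$.products' AS CHAR(255) ARRAY)) );
A query shaped so the optimizer can use a multi-valued index would then look like:
SELECT COUNT(*) AS cnt FROM components
WHERE 'some product from array' MEMBER OF (data->'$.products');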

Postgres: Create jsonb object with given set of keys and a default value

I am using PostgreSQL 9.6. I have an array like ARRAY['a', 'b']::text[] which comes from application code and is transformed a bit in SQL, so I do not know its length in the application code.
In a table I have a field of type jsonb which I need to set to a JSON object, where the keys are the values from the given array and the values are all the same and equal to the current timestamp, i.e.:
+----+---------------------------------------------------+
| id | my_field                                          |
+----+---------------------------------------------------+
| 1  | {"a":"1544605046.21065", "b":"1544605046.21065"}  |
+----+---------------------------------------------------+
I am trying to find an update query to perform this update, e.g. something like
UPDATE mytable
SET my_field = some_function(ARRAY['a','b']::text[], EXTRACT(EPOCH FROM CURRENT_TIMESTAMP))
WHERE <some_condition>;
I was looking at the jsonb_build_object function, which would likely help me if I could transform my array, interleaving its elements with the current timestamp; however, I did not find a way to do this.
Please note that I am likely to have hundreds of thousands of records to update, therefore I am looking for a fast implementation.
I would be grateful for any advice on this matter.
demo: db<>fiddle
UPDATE my_table
SET my_field = s.json_data
FROM (
  SELECT jsonb_object_agg(key, extract(epoch from current_timestamp)) as json_data
  FROM unnest(array['a', 'b']) as u(key)
) s
WHERE <some condition>
To use the array elements as keys of the JSON object, you need to split them apart with unnest(), which creates one row per element.
Then aggregate the rows with jsonb_object_agg(key, value): the key is the column of array elements, the value is the current timestamp. This function aggregates them into your expected syntax.
Putting this into a subquery allows you to do the update.
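To see the aggregation on its own, a quick check (the ::text cast is an addition here, since the question's example shows the epoch values as strings):
SELECT jsonb_object_agg(key, extract(epoch from current_timestamp)::text) AS json_data
FROM unnest(ARRAY['a', 'b']) AS u(key);
-- e.g. {"a": "1544605046.21065", "b": "1544605046.21065"}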

How to create an index on a JSON array type in MySQL 5.7

Now I use a query like SELECT * FROM user WHERE JSON_CONTAINS(users, '[1]'); but it scans the full table, which is inefficient. So I want to create an index on the users column.
For example, I have a column named users whose data looks like [1,2,3,4]. Please tell me how to set an index on a JSON array type (via a generated virtual column). I have read the documentation on the MySQL website, but it all talks about indexing JSON object types using the JSON_EXTRACT() function.
It's now possible with MySQL 8+
Here is an example:
CREATE TABLE customers (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  custinfo JSON
);
ALTER TABLE customers ADD INDEX comp(id, modified,
  (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
Use it this way:
SELECT * FROM customers
WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON));
More info:
https://dev.mysql.com/doc/refman/8.0/en/create-index.html
You cannot, at least not the way you intend. At The JSON Data Type we can read:
JSON columns, like columns of other binary types, are not indexed directly; instead, you can create an index on a generated column that extracts a scalar value from the JSON column. See Indexing a Generated Column to Provide a JSON Column Index for a detailed example.
So with the restriction comes the workaround ;-)
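To make the workaround concrete, a minimal sketch (column and index names are illustrative). Note that a generated column holds one scalar per row, so this only indexes a single array position; it is no substitute for a true multi-valued index over the whole array:
ALTER TABLE user
  ADD first_user INT GENERATED ALWAYS AS (users->'$[0]') VIRTUAL,
  ADD INDEX idx_first_user (first_user);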

Comparing strings up to column length (using index)

Basically what I want to do is to reverse the column LIKE 'string%' behavior. Consider following table:
CREATE TABLE texts (
  id int not null,
  txt varchar(30) not null,
  primary key(id),
  key `txt_idx` (txt)
) engine=InnoDB;
INSERT INTO texts VALUES(1, 'abcd');
According to B-Tree Index Characteristics, the following query will utilize the txt_idx index:
SELECT txt FROM texts WHERE txt LIKE 'abc%';
Now I want somewhat different behavior: I want the 'abcd' row to be returned when querying for 'abcde'. At the moment I'm stuck with this query:
SELECT txt FROM texts WHERE 'abcde' LIKE CONCAT(txt, '%');
Obviously (confirmed by EXPLAIN) it does not utilize any index, but my intuition tells me it should be possible to compare a particular value against the index up to the indexed value's length (just like strncmp does).
The main reason for this is my huge table of domain entries. I want to select both "example.org" and "something.example.org" (but not "else.example.org") when querying for "www.something.example.org". Splitting and performing multiple queries or applying OR clauses unfortunately works too slowly for me.
The only thing I can think of is to convert it to the equivalent IN test:
WHERE txt IN ('a', 'ab', 'abc', 'abcd', 'abcde')
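Applied to the domain use case from the question, the same trick means enumerating the candidate parent domains of the queried name (a sketch; the list would be built by application code by splitting at label boundaries):
SELECT txt FROM texts
WHERE txt IN ('www.something.example.org',
              'something.example.org',
              'example.org',
              'org');
Each value is an exact-match lookup, so txt_idx can be used.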