MySQL group and merge JSON values

I am using some native JSON fields to store information about application entities in a MySQL 5.7.10 database. I can have N rows per "entity" and need to roll up and merge the JSON objects together, with any conflicting keys replaced rather than merged. I can do this in code, but if I can do it natively and efficiently in MySQL, even better.
I have attempted this using a combination of GROUP_CONCAT and JSON_MERGE, but I've run into two issues:
1. JSON_MERGE won't take the results of GROUP_CONCAT as a valid argument
2. JSON_MERGE combines conflicting keys instead of replacing them. What I really need is something like JSON_SET, but taking N JSON documents instead of "key, value" notation.
Is this possible with the current MySQL JSON implementation?

First, GROUP_CONCAT only returns a string, so you have to cast it. Second, there is a function that does exactly what you want: JSON_MERGE_PATCH(). Try the following:
SELECT
  JSON_MERGE_PATCH(
    yourExistingJson,
    CAST(
      CONCAT('[', GROUP_CONCAT(myJson), ']')
      AS JSON
    )
  ) AS myJsonArray
....
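As a quick sanity check on the semantics: JSON_MERGE_PATCH() follows RFC 7396, so when both arguments are objects, a key in the second document replaces the same key in the first:
SELECT JSON_MERGE_PATCH('{"a": 1, "b": 2}', '{"b": 3}') AS merged;
-- {"a": 1, "b": 3}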
I just noticed your version. You would have to upgrade to 5.7.22 or higher; is that possible in your case? If not, there may be other ways, but they won't be elegant :(

You could do something like the following:
SELECT
  CAST(CONCAT(
    '[',
    GROUP_CONCAT(
      DISTINCT JSON_OBJECT(
        'foo', mytable.foo,
        'bar', mytable.bar
      )
    ),
    ']'
  ) AS JSON) AS myJsonArr
FROM mytable
GROUP BY mytable.someGroup;

JSON_MERGE won't take the results of GROUP_CONCAT as a valid argument
GROUP_CONCAT gives a,b,c,d, which is not a JSON array. Use JSON_ARRAYAGG (introduced in MySQL 5.7.22), which works much like GROUP_CONCAT but produces a proper JSON array, ["a", "b", "c", "d"], that the JSON functions will accept.
Prior to 5.7.22, you need to use a workaround:
CAST(
  CONCAT('["',                               -- opening bracket and quote
    GROUP_CONCAT(`field` SEPARATOR '", "'),  -- separator with quotes and comma
    '"]'                                     -- closing quote and bracket
  ) AS JSON
)
JSON_MERGE combines conflicting keys instead of replacing them. What I really need is something like JSON_SET, but taking N JSON documents instead of "key, value" notation.
Use JSON_MERGE_PATCH instead, as introduced in MySQL 5.7.22. JSON_MERGE is a synonym for JSON_MERGE_PRESERVE.
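The difference in one line: JSON_MERGE_PRESERVE combines the values of a duplicate key into an array, which is exactly the behavior the question wants to avoid:
SELECT JSON_MERGE_PRESERVE('{"k": 1}', '{"k": 2}');
-- {"k": [1, 2]}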
See https://dev.mysql.com/doc/refman/5.7/en/json-function-reference.html.
Read my Best Practices for using MySQL as JSON storage.

Aggregation of JSON Values
I just found this in the MySQL docs:
"For aggregation of JSON values, SQL NULL values are ignored as for other data types. Non-NULL values are converted to a numeric type and aggregated, except for MIN(), MAX(), and GROUP_CONCAT(). The conversion to number should produce a meaningful result for JSON values that are numeric scalars, although (depending on the values) truncation and loss of precision may occur. Conversion to number of other JSON values may not produce a meaningful result."
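A minimal illustration of that coercion, using a hypothetical table jt with a JSON column doc holding numeric scalars:
CREATE TABLE jt (doc JSON);
INSERT INTO jt VALUES ('1'), ('2.5');
SELECT SUM(doc) FROM jt;
-- 3.5: each JSON scalar is converted to a number before aggregation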

Related

Match Numeric value after comma separated, concatenated by underscore values using MYSQL/MariaDB & REGEXP_SUBSTR

I have a column with values stored like this:
texta_123,textb_456
My SQL:
SELECT *
FROM mytable
WHERE 456 = REGEXP_SUBSTR(mytable.concatenated_csv_values, 'textb_(?<number>[0-9]+)')
NOTE: I'm aware there are multiple ways of doing this, but for the purposes of example I simplified my query substantially; the part I need to work is REGEXP_SUBSTR()
Effectively, I want to: "query results where an id equals the numeric value extracted after an underscore in a column with comma-separated values"
When I test my Regex, it seems to work fine.
However, in MySQL (technically, I'm using MariaDB 10.4.19), when I run the query I get a warning: "Warning: #1292 Truncated incorrect INTEGER value: 'textb_456'"
The warning happens because REGEXP_SUBSTR returns the entire match ('textb_456'), not just the capture group, and comparing that string to the integer 456 forces a numeric conversion that fails. More fundamentally, you should seriously consider fixing your database design to not store unnormalized CSV data like this. As a temporary workaround, we can use REGEXP_REPLACE along with FIND_IN_SET:
SELECT *
FROM mytable
WHERE FIND_IN_SET(
  '456',
  REGEXP_REPLACE(concatenated_csv_values, '[a-z]+_', '')
) > 0;
The regex trick used here strips each underscore-terminated prefix, converting a CSV input of texta_123,textb_456 to just 123,456. Then we can easily search for a given ID using FIND_IN_SET.
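To see the two pieces in isolation, using the sample value from the question (in MariaDB, REGEXP_REPLACE replaces every match by default):
SELECT REGEXP_REPLACE('texta_123,textb_456', '[a-z]+_', '');
-- 123,456
SELECT FIND_IN_SET('456', '123,456');
-- 2 (any position greater than 0 means the value was found)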

Convert an MySQL script to Presto involving DATE_FORMAT

I'm trying to convert this MySQL line:
if(DATE_FORMAT(y.first_endperiod,"%Y-%m-%d") = DATE_FORMAT(x.end_period,"%Y-%m-%d"), 1, 0) = 1
to PrestoDB. I have tried using date_format, date_parse, and to_char, and all of them return the following error:
An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 40:41: Column '%y-%m-%d' cannot be resolved.
I'm using Athena for querying data from S3 bucket. Any idea how to fix this?
It looks like you're comparing date/time by the date portion, so you should just be able to do this:
CAST(y.first_endperiod AS date) = CAST(x.end_period AS date)
In standard SQL, the double-quotes are used to delimit identifiers, e.g. column names. So your SQL query above is interpreted as if you had a column named %Y-%m-%d. This is unlikely, but technically it'd be a legal identifier in SQL.
You're probably accustomed to MySQL, in which by default double-quotes are used the same as single-quotes, to delimit a string literal. This is a non-standard feature of MySQL.
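The distinction in miniature, with a hypothetical table t and timestamp column ts:
SELECT date_format(ts, '%Y-%m-%d') FROM t;  -- string literal: works in Presto
SELECT date_format(ts, "%Y-%m-%d") FROM t;  -- identifier: Column '%y-%m-%d' cannot be resolved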
Switch to single-quotes around your string literals and it should fix your problem:
if(DATE_FORMAT(y.first_endperiod,'%Y-%m-%d') = DATE_FORMAT(x.end_period,'%Y-%m-%d'), 1, 0) = 1
See also Do different databases use different name quote?

Migrate comma separated string to json array

I have an old database where some columns store comma-separated strings like this: technician,director,website designer
I want to convert them to a JSON array, so I can use the MySQL JSON array type and the methods associated with it.
So basically I am looking for a method to convert technician,director,website designer to ["technician","director","website designer"] in SQL.
The length of the list is arbitrary.
The biggest struggle I am having is how to apply a SQL function to each element of the comma-separated string (so I can, for example, run JSON_QUOTE() on each element); adding the brackets is just a simple CONCAT.
The solution should be for MySQL 5.7.
You can use REPLACE to get the expected string:
SELECT CONCAT('["', REPLACE('technician,director,website designer', ',', '","'), '"]')
-- ["technician","director","website designer"]
Using JSON_VALID you can check if the result of the conversion is a valid JSON value:
SELECT JSON_VALID(CONCAT('["', REPLACE('technician,director,website designer', ',', '","'), '"]'))
-- 1
demo on dbfiddle.uk
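To migrate a column in place, the same expression can drive an UPDATE. A sketch, assuming a hypothetical CSV column roles and a new JSON column roles_json, and that no element contains a double quote or backslash (the naive quoting above does not escape them):
ALTER TABLE mytable ADD COLUMN roles_json JSON;
UPDATE mytable
SET roles_json = CONCAT('["', REPLACE(roles, ',', '","'), '"]')
WHERE roles IS NOT NULL AND roles <> '';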

MySQL 5.7 - Query to set the value of a JSON key to a JSON Object

Using MySQL 5.7, how can I set the value of a JSON key in a JSON column to a JSON object rather than a string?
I used this query:
SELECT JSON_SET(profile, '$.twitter', '{"key1":"val1", "key2":"val2"}')
FROM account WHERE id = 2;
Output:
{"twitter": "{\"key1\":\"val1\", \"key2\":\"val2\"}", "facebook": "value", "googleplus": "google_val"}
But it seems like it considers it as a string since the output escapes the JSON characters in it. Is it possible to do that without using JSON_OBJECT()?
There are a couple of options that I know of:
Use the JSON_UNQUOTE function to unquote the output (i.e. not cast it to a string), as documented here
Possibly use the ->> operator and select a specific path, documented here
Though it has a lot of implications, you could disable backslashes as an escape character. I haven't tried this, so I don't even know if it works, but it's mentioned in the docs
On balance, I'd either use the ->> operator, or handle the conversion on the client side, depending on what you want to do.
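For completeness, a variant not in the list above: cast the string to JSON before passing it to JSON_SET, which makes MySQL store it as a nested object rather than an escaped string (and satisfies the "without JSON_OBJECT()" constraint). A sketch against the question's table:
SELECT JSON_SET(profile, '$.twitter', CAST('{"key1": "val1", "key2": "val2"}' AS JSON))
FROM account WHERE id = 2;
-- {"twitter": {"key1": "val1", "key2": "val2"}, "facebook": "value", "googleplus": "google_val"}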

PostgreSQL return result set as JSON array?

I would like to have PostgreSQL return the result of a query as one JSON array. Given
create table t (a int primary key, b text);
insert into t values (1, 'value1');
insert into t values (2, 'value2');
insert into t values (3, 'value3');
I would like something similar to
[{"a":1,"b":"value1"},{"a":2,"b":"value2"},{"a":3,"b":"value3"}]
or
{"a":[1,2,3], "b":["value1","value2","value3"]}
(actually it would be more useful to know both). I have tried some things like
select row_to_json(row) from (select * from t) row;
select array_agg(row) from (select * from t) row;
select array_to_string(array_agg(row), '') from (select * from t) row;
And I feel I am close, but not quite there. Should I be looking at other documentation besides 9.15. JSON Functions and Operators?
By the way, I am not sure about my idea. Is this a usual design decision? My thinking is that I could, of course, take the result (for example) of the first of the above 3 queries and manipulate it slightly in the application before serving it to the client, but if PostgreSQL can create the final JSON object directly, it would be simpler, because I still have not included any dependency on any JSON library in my application.
TL;DR
SELECT json_agg(t) FROM t
for a JSON array of objects, and
SELECT
  json_build_object(
    'a', json_agg(t.a),
    'b', json_agg(t.b)
  )
FROM t
for a JSON object of arrays.
List of objects
This section describes how to generate a JSON array of objects, with each row being converted to a single object. The result looks like this:
[{"a":1,"b":"value1"},{"a":2,"b":"value2"},{"a":3,"b":"value3"}]
9.3 and up
The json_agg function produces this result out of the box. It automatically figures out how to convert its input into JSON and aggregates it into an array.
SELECT json_agg(t) FROM t
If you need jsonb (introduced in 9.4), 9.5 added jsonb_agg, which works the same way. Alternatively, you can aggregate the rows into an array and then convert them:
SELECT to_jsonb(array_agg(t)) FROM t
or combine json_agg with a cast:
SELECT json_agg(t)::jsonb FROM t
My testing suggests that aggregating them into an array first is a little faster. I suspect that this is because the cast has to parse the entire JSON result.
9.2
9.2 does not have the json_agg or to_json functions, so you need to use the older array_to_json:
SELECT array_to_json(array_agg(t)) FROM t
You can optionally include a row_to_json call in the query:
SELECT array_to_json(array_agg(row_to_json(t))) FROM t
This converts each row to a JSON object, aggregates the JSON objects as an array, and then converts the array to a JSON array.
I wasn't able to discern any significant performance difference between the two.
Object of lists
This section describes how to generate a JSON object, with each key being a column in the table and each value being an array of the values of the column. It's the result that looks like this:
{"a":[1,2,3], "b":["value1","value2","value3"]}
9.5 and up
We can leverage the json_build_object function:
SELECT
  json_build_object(
    'a', json_agg(t.a),
    'b', json_agg(t.b)
  )
FROM t
You can also aggregate the columns, creating a single row, and then convert that into an object:
SELECT to_json(r)
FROM (
  SELECT
    json_agg(t.a) AS a,
    json_agg(t.b) AS b
  FROM t
) r
Note that aliasing the arrays is absolutely required to ensure that the object has the desired names.
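To see why, drop the aliases: each output column then defaults to the function name, so the keys collide. A sketch; based on PostgreSQL's default naming of output columns, I'd expect something like:
SELECT to_json(r)
FROM (SELECT json_agg(t.a), json_agg(t.b) FROM t) r;
-- {"json_agg":[1,2,3],"json_agg":["value1","value2","value3"]}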
Which one is clearer is a matter of opinion. If using the json_build_object function, I highly recommend putting one key/value pair on a line to improve readability.
You could also use array_agg in place of json_agg, but my testing indicates that json_agg is slightly faster.
For jsonb there is jsonb_build_object (also added in 9.5), which works the same way. Alternatively, you can aggregate into a single row and convert:
SELECT to_jsonb(r)
FROM (
  SELECT
    array_agg(t.a) AS a,
    array_agg(t.b) AS b
  FROM t
) r
Unlike the other queries for this kind of result, array_agg seems to be a little faster when using to_jsonb. I suspect this is due to the overhead of parsing and validating the JSON result of json_agg.
Or you can use an explicit cast:
SELECT
  json_build_object(
    'a', json_agg(t.a),
    'b', json_agg(t.b)
  )::jsonb
FROM t
The to_jsonb version allows you to avoid the cast and is faster, according to my testing; again, I suspect this is due to the overhead of parsing and validating the result.
9.4 and 9.3
The json_build_object function was new to 9.5, so you have to aggregate and convert to an object in previous versions:
SELECT to_json(r)
FROM (
  SELECT
    json_agg(t.a) AS a,
    json_agg(t.b) AS b
  FROM t
) r
or
SELECT to_jsonb(r)
FROM (
  SELECT
    array_agg(t.a) AS a,
    array_agg(t.b) AS b
  FROM t
) r
depending on whether you want json or jsonb.
(9.3 does not have jsonb.)
9.2
In 9.2, not even to_json exists. You must use row_to_json:
SELECT row_to_json(r)
FROM (
  SELECT
    array_agg(t.a) AS a,
    array_agg(t.b) AS b
  FROM t
) r
Documentation
The documentation for the JSON functions is on the JSON functions page.
json_agg is on the aggregate functions page.
Design
If performance is important, ensure you benchmark your queries against your own schema and data, rather than trust my testing.
Whether it's a good design or not really depends on your specific application. In terms of maintainability, I don't see any particular problem. It simplifies your app code and means there's less to maintain in that portion of the app. If PG can give you exactly the result you need out of the box, the only reason I can think of to not use it would be performance considerations. Don't reinvent the wheel and all.
Nulls
Aggregate functions typically give back NULL when they operate over zero rows. If this is a possibility, you might want to use COALESCE to avoid them. A couple of examples:
SELECT COALESCE(json_agg(t), '[]'::json) FROM t
Or
SELECT to_jsonb(COALESCE(array_agg(t), ARRAY[]::t[])) FROM t
Credit to Hannes Landeholm for pointing this out.
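The same guard works for the jsonb aggregate on 9.5 and up:
SELECT COALESCE(jsonb_agg(t), '[]'::jsonb) FROM t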
Also, if you want selected fields from the table aggregated as an array of objects:
SELECT json_agg(json_build_object(
  'data_a', a,
  'data_b', b
)) FROM t;
The result will look like this:
[{"data_a": 1, "data_b": "value1"},
 {"data_a": 2, "data_b": "value2"}]