In a MySQL 8 JSON column I have a JSON object with values of different types (but with no nested objects). Like this:
{
"abc": "Something123",
"foo": 63.4,
"bar": "Hi world!",
"xyz": false
}
What is the simplest way to select the string values joined together? For example, from the above JSON the result should be "Something123 Hi world!".
Here's a solution for MySQL 8.0:
select group_concat(json_unquote(json_extract(t.data, concat('$.', j.`key`))) separator ' ') as joined_string
from mytable t
cross join json_table(json_keys(t.data), '$[*]' columns (`key` varchar(20) path '$')) j
where json_type(json_extract(t.data, concat('$.', j.`key`))) = 'STRING';
Output given your example data:
+------------------------+
| joined_string          |
+------------------------+
| Something123 Hi world! |
+------------------------+
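I believe JSON_TABLE also accepts the '$.*' wildcard, which iterates over an object's values directly and skips the JSON_KEYS() round-trip. A sketch of that variant (add a GROUP BY on your primary key if the table holds more than one row):
select group_concat(json_unquote(j.v) separator ' ') as joined_string
from mytable t
cross join json_table(t.data, '$.*' columns (v json path '$')) j
where json_type(j.v) = 'STRING';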
I wonder, however, whether it's a good idea to store data as JSON if you need to do this sort of thing. The solution is complex to develop, and would be hard to debug or modify.
Using JSON to store data as a document when you really want SQL predicates to treat the document's fields as discrete elements is harder than using normal rows and columns.
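For comparison, a normalized layout makes the original question a one-liner. A sketch (all names here are illustrative):
-- one row per attribute instead of one JSON document per row
create table mytable_attrs (
  id       int,
  attr_key varchar(20),
  str_val  varchar(255),  -- populated only for string attributes
  num_val  double,
  bool_val tinyint(1)
);

select group_concat(str_val separator ' ') as joined_string
from mytable_attrs
where str_val is not null
group by id;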
If you're using MySQL 5.7 or earlier, then the JSON_TABLE() function is not supported. In that case, I'd suggest fetching the whole JSON document into your application, and explode it into an object you can manipulate.
Here is a solution:
SELECT GROUP_CONCAT(va SEPARATOR ' ') AS str_vals_only
FROM
(
  SELECT id, JSON_UNQUOTE(JSON_EXTRACT(my_json_col, CONCAT('$.', j.obj_key))) AS va
  FROM my_tbl,
       JSON_TABLE(JSON_KEYS(my_json_col), '$[*]' COLUMNS(obj_key TEXT PATH "$")) j
  WHERE JSON_TYPE(JSON_EXTRACT(my_json_col, CONCAT('$.', j.obj_key))) = 'STRING'
) a
GROUP BY id
Not very simple and concise, but I guess that's as good as it gets.
I love the JSON support in MySQL. For some specific cases the JSON type can provide a very nice and flexible solution. That's the main reason I don't even think about switching to MariaDB (which has much more limited JSON support). But I wish there were more functions for JSON manipulation. The JSON_TABLE function is very powerful, but it can be a bit complicated and verbose.
So I have three databases: an Oracle one, a SQL Server one, and a Postgres one. I have a table with two columns, name and value, both text. The value is a stringified JSON object. I need to update the nested value.
This is what I currently have:
name: 'MobilePlatform',
value:
'{
"iosSupported":true,
"androidSupported":false,
}'
I want to add {"enableTwoFactorAuth": false} into it.
In PostgreSQL you should be able to do this:
UPDATE mytable
SET value = jsonb_set(value::jsonb, '{enableTwoFactorAuth}', 'false')
WHERE name = 'MobilePlatform';
In Postgres, the plain concatenation operator || for jsonb could do it:
UPDATE mytable
SET value = value::jsonb || '{"enableTwoFactorAuth":false}'::jsonb
WHERE name = 'MobilePlatform';
If a top-level key "enableTwoFactorAuth" already exists, it is replaced. So it's an "upsert" really.
Or use jsonb_set() for manipulating nested values.
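For example, if the flag lived one level deeper, a sketch (the "flags" key is hypothetical, and jsonb_set() does not create missing intermediate objects, so "flags" would have to exist already):
UPDATE mytable
SET value = jsonb_set(value::jsonb, '{flags,enableTwoFactorAuth}', 'false', true)
WHERE name = 'MobilePlatform';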
The cast back to text works implicitly as an assignment cast. (The result is in standard format; any insignificant whitespace is effectively removed.)
If the content is valid JSON, the storage type should be json to begin with. In Postgres, jsonb would be preferable as it's easier to manipulate, but that's not directly portable to the other two RDBMS mentioned.
(Or, possibly, a normalized design without JSON altogether.)
For Oracle 21:
update mytable
set json_col = json_transform(
        json_col,
        INSERT '$.value.enableTwoFactorAuth' = 'false' FORMAT JSON
    )
where json_exists(json_col, '$?(@.name == "MobilePlatform")');
(FORMAT JSON makes the inserted value the boolean false rather than the string 'false'.)
With json_col being a JSON column, or a VARCHAR2/CLOB column with an IS JSON constraint.
(It must be JSON, though, if you want a multivalue index on json_value.name:
create multivalue index ix_json_col_name on mytable t ( t.json_col.name.string() );
)
Two of the databases you are using support a native JSON data type, so it doesn't make sense to keep the value as a stringified JSON object in a text column.
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-in-oracle-database.html
PostgreSQL: https://www.postgresql.org/docs/current/datatype-json.html
Apart from these, MS SQL Server also provides functions to work with JSON data.
MS SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16
Using a JSON type column in any of the above databases would enable you to use their JSON functions to perform the tasks that you are looking for.
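In PostgreSQL, for instance, the existing text column could be migrated in place; a sketch, assuming every stored value is valid JSON:
ALTER TABLE dataTable
  ALTER COLUMN value TYPE jsonb USING value::jsonb;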
If you have to use text only, then you can use REPLACE to add the key-value pair just before the closing brace of your JSON:
update dataTable set value = REPLACE(value, '}', ',"enableTwoFactorAuth": false}') where name = 'MobilePlatform';
Here dataTable is the name of the table. Note that REPLACE substitutes every occurrence of '}', so this is only safe while the stored JSON contains no nested objects.
The cleaner and less risky way would be to connect to the db from your application and use JSON methods such as JSON.parse in JavaScript or json.loads in Python. This gives you a JSON object (a dictionary in the case of Python) to work on. You can look for similar methods in other languages as well.
But I would suggest, where possible, using JSON columns instead of text to store JSON values.
I have a use case where I need to calculate set overlaps over arbitrary time periods.
My data looks like this when loaded into pandas. In MySQL the user_ids column is stored with the data type JSON.
I need to calculate the size of the union set when grouping by the date column. E.g., in the example below, if 2021-01-31 is grouped with 2021-02-28, then the result should be
In [1]: len(set([46, 44, 14] + [44, 7, 36]))
Out[1]: 5
Doing this in Python is trivial, but I'm struggling with how to do this in MySQL.
Aggregating the arrays into an array of arrays is easy:
SELECT
date,
JSON_ARRAYAGG(user_ids) as uids
FROM mytable
GROUP BY date
but after that I face two problems:
How to flatten the array of arrays into a single array
How to extract distinct values (e.g. convert the array into a set)
Any suggestions? Thank you!
PS. In my case I can probably get by with doing the flattening and set conversion on the client side, but I was pretty surprised at how difficult something simple like this turned out to be... :/
As mentioned in other comments, storing JSON arrays in your database really is sub-optimal and should be avoided if you can. That aside, it is actually easier to first expand the JSON array into rows (which also gives you the result you wanted from your second point):
SELECT mytable.date, jtable.VAL as user_id
FROM mytable, JSON_TABLE(user_ids, '$[*]' COLUMNS(VAL INT PATH '$')) jtable;
From here on out, we can group the dates again and recombine the user_ids into a JSON array with the JSON_ARRAYAGG function you already found:
SELECT mytable.date, JSON_ARRAYAGG(jtable.VAL) as user_ids
FROM mytable, JSON_TABLE(user_ids, '$[*]' COLUMNS(VAL INT PATH '$')) jtable
GROUP BY mytable.date;
You can try this out in this DB fiddle.
NOTE: this does require MySQL 8+ / MariaDB 10.6+.
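If you only need the size of the union set rather than the recombined array, COUNT(DISTINCT ...) over the same JSON_TABLE expansion gives it directly; a minimal sketch:
SELECT mytable.date, COUNT(DISTINCT jtable.VAL) AS union_size
FROM mytable, JSON_TABLE(user_ids, '$[*]' COLUMNS(VAL INT PATH '$')) jtable
GROUP BY mytable.date;
Group by a coarser expression (e.g. month or quarter) to merge several dates into one set.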
Thank you for the answers.
For anybody who's interested, the solution I ended up with was to store the data like this:
And then do the set calculations in pandas.
(
    df.groupby(pd.Grouper(key="date", freq="QS")).aggregate(
        num_unique_users=(
            "user_ids",
            lambda uids: len({uid for ul in uids for uid in ul}),
        ),
    )
)
I was able to reduce a 20GiB table to around 300MiB, which is fast enough to query and retrieve data from.
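For reference, a similar quarterly distinct-user count could also be pushed into MySQL 8 itself, reusing the JSON_TABLE expansion from the accepted answer (a sketch; the table name mytable is assumed):
SELECT YEAR(t.date) AS yr,
       QUARTER(t.date) AS qtr,
       COUNT(DISTINCT jt.uid) AS num_unique_users
FROM mytable t,
     JSON_TABLE(t.user_ids, '$[*]' COLUMNS(uid INT PATH '$')) jt
GROUP BY YEAR(t.date), QUARTER(t.date);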
I'm storing some json arrays in my MySQL db json field.
e.g: ["Python", "C", "C++", "C#", "R"]
An example of my db:
| name | techs |
|--------|------------------|
| victor | ["Python", "R"] |
| anne | ["C#", "Python"] |
I need to search the lines that the json array contains at least one of the items of another json array. The problem is in the query that I'm executing:
select name from devs
where json_contains(techs, '["Python"]')
This actually works fine and returns all the lines whose array contains "Python" (in this example, Victor and Anne), but the problem comes when I pass items that don't exist in any of the arrays:
select * from devs
where json_contains(techs, '["Python", "Java"]')
This returns nothing, because there isn't an array with both "Python" AND "Java" in it. Instead, I would like to receive all the lines with "Python" OR "Java" in their JSON array.
Is there a syntax to return the data the way that I want?
Thanks in advance.
Useful information:
MySQL v8.0, working on Windows 10.
MySQL 8.0 has function JSON_OVERLAPS(), which does exactly what you ask for:
Compares two JSON documents. Returns true (1) if the two documents have any key-value pairs or array elements in common.
When comparing two arrays, JSON_OVERLAPS() returns true if they share one or more array elements in common, and false if they do not.
You can use that in a self-join query, like:
select t.*
from mytable t
inner join mytable t1 on json_overlaps(t.techs, t1.techs)
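Applied directly to the question's table, no join is needed at all; a minimal sketch:
select name from devs
where json_overlaps(techs, '["Python", "Java"]');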
I am using some native JSON fields to store information about some application entities in a MySQL 5.7.10 database. I can have 'N' rows per "entity" and need to roll-up and merge the JSON objects together, and any conflicting keys should replace instead of merge. I can do this through code, but if I can do it natively and efficiently in MySQL even better.
I have attempted this using a combination of GROUP_CONCAT and JSON_MERGE, but I've run into two issues:
JSON_MERGE won't take the results of GROUP_CONCAT as a valid argument
JSON_MERGE combines conflicting keys instead of replacing them. What I really need is more of a JSON_SET but with 'N' number of JSON docs instead of "key, value" notation.
Is this possible with the current MySQL JSON implementation?
First of all, GROUP_CONCAT only returns a string, so you have to cast it. Second, there is a function that does exactly what you want, called JSON_MERGE_PATCH(). Try the following:
SELECT
JSON_MERGE_PATCH(
yourExistingJson,
CAST(
CONCAT(
'[',GROUP_CONCAT(myJson),']'
)
AS JSON)
) AS myJsonArray
....
Just realized your version. You would have to upgrade to 5.7.22 or higher. Is that possible in your case? If not, there may be other ways, but they won't be elegant :(
You could do something like the following:
SELECT
CAST(CONCAT(
'[',
GROUP_CONCAT(
DISTINCT JSON_OBJECT(
'foo', mytable.foo,
'bar', mytable.bar
)
),
']'
) AS JSON) AS myJsonArr
FROM mytable
GROUP BY mytable.someGroup;
JSON_MERGE won't take the results of GROUP_CONCAT as a valid argument
GROUP_CONCAT gives a,b,c,d, not a JSON array. Use JSON_ARRAYAGG (introduced in MySQL 5.7.22), which works just like GROUP_CONCAT but produces a proper JSON array ["a", "b", "c", "d"] that is accepted by the JSON functions.
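Side by side, on a hypothetical one-column table tags(tag):
SELECT GROUP_CONCAT(tag) FROM tags;   -- a,b,c,d               (a plain string)
SELECT JSON_ARRAYAGG(tag) FROM tags;  -- ["a", "b", "c", "d"]  (valid JSON)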
Prior to 5.7.22, you need to use a workaround:
cast(
  concat('["',                               -- opening bracket and quote
    group_concat(`field` separator '", "'),  -- separators with quotes
  '"]'                                       -- closing quote and bracket
  ) as json
)
JSON_MERGE combines conflicting keys instead of replacing them. What I really need is more of a JSON_SET but with 'N' number of JSON docs instead of "key, value" notation.
Use JSON_MERGE_PATCH instead, as introduced in MySQL 5.7.22. JSON_MERGE is a synonym for JSON_MERGE_PRESERVE.
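The difference shows on a conflicting key:
SELECT JSON_MERGE_PRESERVE('{"a": 1}', '{"a": 2}');  -- {"a": [1, 2]}
SELECT JSON_MERGE_PATCH('{"a": 1}', '{"a": 2}');     -- {"a": 2}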
See https://dev.mysql.com/doc/refman/5.7/en/json-function-reference.html.
Read my Best Practices for using MySQL as JSON storage.
Aggregation of JSON Values
I just found this in the MySQL docs:
For aggregation of JSON values, SQL NULL values are ignored as for other data types. Non-NULL values are converted to a numeric type and aggregated, except for MIN(), MAX(), and GROUP_CONCAT(). The conversion to number should produce a meaningful result for JSON values that are numeric scalars, although (depending on the values) truncation and loss of precision may occur. Conversion to number of other JSON values may not produce a meaningful result.
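In practice this means numeric scalars inside JSON documents can be aggregated directly; a sketch (the orders table and its doc column are hypothetical):
SELECT SUM(doc->'$.price') AS total FROM orders;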
I would like to have PostgreSQL return the result of a query as one JSON array. Given
create table t (a int primary key, b text);
insert into t values (1, 'value1');
insert into t values (2, 'value2');
insert into t values (3, 'value3');
I would like something similar to
[{"a":1,"b":"value1"},{"a":2,"b":"value2"},{"a":3,"b":"value3"}]
or
{"a":[1,2,3], "b":["value1","value2","value3"]}
(actually it would be more useful to know both). I have tried some things like
select row_to_json(row) from (select * from t) row;
select array_agg(row) from (select * from t) row;
select array_to_string(array_agg(row), '') from (select * from t) row;
And I feel I am close, but not there really. Should I be looking at other documentation except for 9.15. JSON Functions and Operators?
By the way, I am not sure about my idea. Is this a usual design decision? My thinking is that I could, of course, take the result (for example) of the first of the above 3 queries and manipulate it slightly in the application before serving it to the client, but if PostgreSQL can create the final JSON object directly, it would be simpler, because I still have not included any dependency on any JSON library in my application.
TL;DR
SELECT json_agg(t) FROM t
for a JSON array of objects, and
SELECT
json_build_object(
'a', json_agg(t.a),
'b', json_agg(t.b)
)
FROM t
for a JSON object of arrays.
List of objects
This section describes how to generate a JSON array of objects, with each row being converted to a single object. The result looks like this:
[{"a":1,"b":"value1"},{"a":2,"b":"value2"},{"a":3,"b":"value3"}]
9.3 and up
The json_agg function produces this result out of the box. It automatically figures out how to convert its input into JSON and aggregates it into an array.
SELECT json_agg(t) FROM t
For a jsonb result, there is jsonb_agg, but only from 9.5 (jsonb itself was introduced in 9.4). You can also aggregate the rows into an array and then convert them:
SELECT to_jsonb(array_agg(t)) FROM t
or combine json_agg with a cast:
SELECT json_agg(t)::jsonb FROM t
My testing suggests that aggregating them into an array first is a little faster. I suspect that this is because the cast has to parse the entire JSON result.
9.2
9.2 does not have the json_agg or to_json functions, so you need to use the older array_to_json:
SELECT array_to_json(array_agg(t)) FROM t
You can optionally include a row_to_json call in the query:
SELECT array_to_json(array_agg(row_to_json(t))) FROM t
This converts each row to a JSON object, aggregates the JSON objects as an array, and then converts the array to a JSON array.
I wasn't able to discern any significant performance difference between the two.
Object of lists
This section describes how to generate a JSON object, with each key being a column in the table and each value being an array of the values of the column. It's the result that looks like this:
{"a":[1,2,3], "b":["value1","value2","value3"]}
9.5 and up
We can leverage the json_build_object function:
SELECT
json_build_object(
'a', json_agg(t.a),
'b', json_agg(t.b)
)
FROM t
You can also aggregate the columns, creating a single row, and then convert that into an object:
SELECT to_json(r)
FROM (
SELECT
json_agg(t.a) AS a,
json_agg(t.b) AS b
FROM t
) r
Note that aliasing the arrays is absolutely required to ensure that the object has the desired names.
Which one is clearer is a matter of opinion. If using the json_build_object function, I highly recommend putting one key/value pair on a line to improve readability.
You could also use array_agg in place of json_agg, but my testing indicates that json_agg is slightly faster.
There is also a jsonb version, jsonb_build_object (from 9.5). Alternatively, you can aggregate into a single row and convert:
SELECT to_jsonb(r)
FROM (
SELECT
array_agg(t.a) AS a,
array_agg(t.b) AS b
FROM t
) r
Unlike the other queries for this kind of result, array_agg seems to be a little faster when using to_jsonb. I suspect this is due to overhead parsing and validating the JSON result of json_agg.
Or you can use an explicit cast:
SELECT
json_build_object(
'a', json_agg(t.a),
'b', json_agg(t.b)
)::jsonb
FROM t
The to_jsonb version allows you to avoid the cast and is faster, according to my testing; again, I suspect this is due to overhead of parsing and validating the result.
9.4 and 9.3
The json_build_object function arrived in 9.4 and its jsonb counterpart only in 9.5, so on these versions you can instead aggregate and convert to an object:
SELECT to_json(r)
FROM (
SELECT
json_agg(t.a) AS a,
json_agg(t.b) AS b
FROM t
) r
or
SELECT to_jsonb(r)
FROM (
SELECT
array_agg(t.a) AS a,
array_agg(t.b) AS b
FROM t
) r
depending on whether you want json or jsonb.
(9.3 does not have jsonb.)
9.2
In 9.2, not even to_json exists. You must use row_to_json:
SELECT row_to_json(r)
FROM (
SELECT
array_agg(t.a) AS a,
array_agg(t.b) AS b
FROM t
) r
Documentation
The JSON functions are documented in JSON Functions and Operators.
json_agg is on the aggregate functions page.
Design
If performance is important, ensure you benchmark your queries against your own schema and data, rather than trust my testing.
Whether it's a good design or not really depends on your specific application. In terms of maintainability, I don't see any particular problem. It simplifies your app code and means there's less to maintain in that portion of the app. If PG can give you exactly the result you need out of the box, the only reason I can think of to not use it would be performance considerations. Don't reinvent the wheel and all.
Nulls
Aggregate functions typically give back NULL when they operate over zero rows. If this is a possibility, you might want to use COALESCE to avoid them. A couple of examples:
SELECT COALESCE(json_agg(t), '[]'::json) FROM t
Or
SELECT to_jsonb(COALESCE(array_agg(t), ARRAY[]::t[])) FROM t
Credit to Hannes Landeholm for pointing this out.
Also, if you want selected fields from the table aggregated as an array of objects:
SELECT json_agg(json_build_object('data_a', a,
                                  'data_b', b)) FROM t;
The result will look like this:
[{"data_a":1,"data_b":"value1"},
 {"data_a":2,"data_b":"value2"},
 {"data_a":3,"data_b":"value3"}]