Postgres JSON querying - match all pairs, ignore order - json

I have a Postgres table that contains a jsonb column, the data in which is arbitrarily deep.
id | jsonb_data
---|----------------------
1 | '{"a":1}'
2 | '{"a":1,"b":2}'
3 | '{"a":1,"b":2,"c":{"d":4}}'
Given a JSON object in my WHERE clause, I want to find the rows that contain objects that contain the same data and no more, but in any order. Including, preferably, nested objects.
SELECT * FROM table
WHERE json_match_ignore_order(jsonb_data, '{"b":2,"a":1}');
id | jsonb_data
---|-----------
2 | '{"a":1,"b":2}'
This would essentially work identically to the following Ruby code, but I'd really like to do it in the database if possible.
table.select { |row| row.jsonb_data_as_a_hash == {b: 2, a: 1} }
How can I do this?

With jsonb type you can use equal sign even for values with nested objects.
Thus the following will also work:
create table jsonb_table(
id serial primary key,
jsonb_data jsonb
);
insert into jsonb_table(jsonb_data)
values
('{"a":1}'),
('{"a":{"c":5},"b":2}'),
('{"a":{"c":5},"b":2,"c":{"d":4}}');
select * from jsonb_table
where jsonb_data = '{"b":2,"a":{"c":5}}'::jsonb;
You will get rows with objects containing same keys with same values recursively (in this case only the second row).

Related

Sort JSON array by values

Is it possible to sort json arrays so you would get the same result no matter the original value order?
Original:
id json_field
--------------
1 ["john", "mike"]
2 ["mike", "john"]
Altered:
id json_field
--------------
1 ["john", "mike"]
2 ["john", "mike"]
I need this so i could create a virtually generated index to quickly find records with json fields containing same values
ALTER TABLE `table` ADD COLUMN sorted_json_index VARCHAR(255) GENERATED ALWAYS as FUNCTION_TO_SORT_JSON_SOMEHOW(json_field) AFTER json_field

Filter objects inside array in MySQL JSON column [duplicate]

MySQL 5.7.24
Lets say I have 3 rows like this:
ID (PK) | Name (VARCHAR) | Data (JSON)
--------+----------------+-------------------------------------
1 | Admad | [{"label":"Color", "value":"Red"}, {"label":"Age", "value":40}]
2 | Saleem | [{"label":"Color", "value":"Green"}, {"label":"Age", "value":37}, {"label":"Hoby", "value":"Chess"}]
3 | Daniel | [{"label":"Food", "value":"Grape"}, {"label":"Age", "value":47}, {"label":"State", "value":"Sel"}]
Rule #1: The JSON column is dynamic. Means not everybody will have the same structure
Rule #2: Assuming I can't modify the data structure
My question, it it possible to query so that I can get the ID of records where the Age is >= 40? In this case 1 & 3.
Additional Info (after being pointed as duplicate): if you look at my data, the parent container is array. If I store my data like
{"Age":"40", "Color":"Red"}
then I can simply use
Data->>'$.Age' >= 40
My current thinking is to use a stored procedure to loop the array but I hope I don't have to take that route. The second option is to use regex (which I also hope not). If you think "JSON search" is the solution, kindly point to me which one (or some sample for this noob of me). The documentation's too general for my specific needs.
Here's a demo:
mysql> create table letsayi (id int primary key, name varchar(255), data json);
mysql> > insert into letsayi values
-> (1, 'Admad', '[{"label":"Color", "value":"Red"}, {"label":"Age", "value":"40"}]'),
-> (2, 'Saleem', '[{"label":"Color", "value":"Green"}, {"label":"Age", "value":"37"}, {"label":"Hoby", "value":"Chess"}]');
mysql> select id, name from letsayi
where json_contains(data, '{"label":"Age","value":"40"}');
+----+-------+
| id | name |
+----+-------+
| 1 | Admad |
+----+-------+
I have to say this is the least efficient way you could store your data. There's no way to use an index to search for your data, even if you use indexes on generated columns. You're not even storing the integer "40" as an integer — you're storing the numbers as strings, which makes them take more space.
Using JSON in MySQL when you don't need to is a bad idea.
Is it still possible to query age >= 40?
Not using JSON_CONTAINS(). That function is not like an inequality condition in a WHERE clause. It only matches exact equality of a subdocument.
To do an inequality, you'd have to upgrade to MySQL 8.0 and use JSON_TABLE(). I answered another question recently about that: MySQL nested JSON column search and extract sub JSON
In other words, you have to convert your JSON into a format as if you had stored it in traditional rows and columns. But you have to do this every time you query your data.
If you need to use conditions in the WHERE clause, you're better off not using JSON. It just makes your queries much too complex. Listen to this old advice about programming:
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
— Brian Kernighan
how people tackle dynamically added form fields
You could create a key/value table for the dynamic form fields:
CREATE TABLE keyvalue (
user_id INT NOT NULL,
label VARCHAR(64) NOT NULL,
value VARCHAR(255) NOT NULL,
PRIMARY KEY (user_id, label),
INDEX (label)
);
Then you can add key/value pairs for each user's dynamic form entries:
INSERT INTO keyvalue (user_id, label, value)
VALUES (123, 'Color', 'Red'),
(123, 'Age', '40');
This is still a bit inefficient in storage compared to real columns, because the label names are stored every time you enter a user's data, and you still store integers as strings. But if the users are really allowed to store any labels of their own choosing, you can't make those real columns.
With the key/value table, querying for age > 40 is simpler:
SELECT user_id FROM key_value
WHERE label = 'Age' AND value >= 40

MYSQL: How to query where JSON array contain specific label

MySQL 5.7.24
Lets say I have 3 rows like this:
ID (PK) | Name (VARCHAR) | Data (JSON)
--------+----------------+-------------------------------------
1 | Admad | [{"label":"Color", "value":"Red"}, {"label":"Age", "value":40}]
2 | Saleem | [{"label":"Color", "value":"Green"}, {"label":"Age", "value":37}, {"label":"Hoby", "value":"Chess"}]
3 | Daniel | [{"label":"Food", "value":"Grape"}, {"label":"Age", "value":47}, {"label":"State", "value":"Sel"}]
Rule #1: The JSON column is dynamic. Means not everybody will have the same structure
Rule #2: Assuming I can't modify the data structure
My question, it it possible to query so that I can get the ID of records where the Age is >= 40? In this case 1 & 3.
Additional Info (after being pointed as duplicate): if you look at my data, the parent container is array. If I store my data like
{"Age":"40", "Color":"Red"}
then I can simply use
Data->>'$.Age' >= 40
My current thinking is to use a stored procedure to loop the array but I hope I don't have to take that route. The second option is to use regex (which I also hope not). If you think "JSON search" is the solution, kindly point to me which one (or some sample for this noob of me). The documentation's too general for my specific needs.
Here's a demo:
mysql> create table letsayi (id int primary key, name varchar(255), data json);
mysql> > insert into letsayi values
-> (1, 'Admad', '[{"label":"Color", "value":"Red"}, {"label":"Age", "value":"40"}]'),
-> (2, 'Saleem', '[{"label":"Color", "value":"Green"}, {"label":"Age", "value":"37"}, {"label":"Hoby", "value":"Chess"}]');
mysql> select id, name from letsayi
where json_contains(data, '{"label":"Age","value":"40"}');
+----+-------+
| id | name |
+----+-------+
| 1 | Admad |
+----+-------+
I have to say this is the least efficient way you could store your data. There's no way to use an index to search for your data, even if you use indexes on generated columns. You're not even storing the integer "40" as an integer — you're storing the numbers as strings, which makes them take more space.
Using JSON in MySQL when you don't need to is a bad idea.
Is it still possible to query age >= 40?
Not using JSON_CONTAINS(). That function is not like an inequality condition in a WHERE clause. It only matches exact equality of a subdocument.
To do an inequality, you'd have to upgrade to MySQL 8.0 and use JSON_TABLE(). I answered another question recently about that: MySQL nested JSON column search and extract sub JSON
In other words, you have to convert your JSON into a format as if you had stored it in traditional rows and columns. But you have to do this every time you query your data.
If you need to use conditions in the WHERE clause, you're better off not using JSON. It just makes your queries much too complex. Listen to this old advice about programming:
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
— Brian Kernighan
how people tackle dynamically added form fields
You could create a key/value table for the dynamic form fields:
CREATE TABLE keyvalue (
user_id INT NOT NULL,
label VARCHAR(64) NOT NULL,
value VARCHAR(255) NOT NULL,
PRIMARY KEY (user_id, label),
INDEX (label)
);
Then you can add key/value pairs for each user's dynamic form entries:
INSERT INTO keyvalue (user_id, label, value)
VALUES (123, 'Color', 'Red'),
(123, 'Age', '40');
This is still a bit inefficient in storage compared to real columns, because the label names are stored every time you enter a user's data, and you still store integers as strings. But if the users are really allowed to store any labels of their own choosing, you can't make those real columns.
With the key/value table, querying for age > 40 is simpler:
SELECT user_id FROM key_value
WHERE label = 'Age' AND value >= 40

How to organise the storage many to many relations in Json in mysql

I have the table with JSON-field (example)
# table1
id | json_column
---+------------------------
1 | {'table2_ids':[1,2,3], 'sone_other_data':'foo'}
---+------------------------
2 | {'foo_data':'bar', 'table2_ids':[3,5,11]}
And
# table2
id | title
---+------------------------
1 | title1
---+------------------------
2 | title2
---+------------------------
...
---+------------------------
11 | title11
Yes, I know about stored many-to-many relation in the third table. But it's a duplication data (in first case relations in Json_column, in second in third-table)
I know about generated columns in MySQL, but I don't understand how to use it for stored m2m relations. Maybe I have use views to get pairs of table1.id <-> table2.id. But how use index in this case?
I can't understand your explanation for why you can't use a third table to represent the many-to-many pairs. Using a third table is of course the best solution.
I think views have no relevance to this problem.
You could use JSON_EXTRACT() to access individual members of the array. You can use a generated column to pull each member out so you can easily reference it as an individual value.
create table table1 (
id int auto_increment primary key,
json_column json,
first_table2_id int as (json_extract(json_column, '$.table2_ids[0]'))
);
insert into table1 set json_column = '{"table2_ids":[1,2,3], "sone_other_data":"foo"}'
(You must use double-quotes inside a JSON string, and single-quotes to delimit the whole JSON string.)
select * from table1;
+----+-----------------------------------------------------+-----------------+
| id | json_column | first_table2_id |
+----+-----------------------------------------------------+-----------------+
| 1 | {"table2_ids": [1, 2, 3], "sone_other_data": "foo"} | 1 |
+----+-----------------------------------------------------+-----------------+
But this is still a problem: In SQL, the table must have the columns defined by the table metadata, and all rows therefore have the same columns. There no such thing as each row populating additional columns based on the data.
So you need to create another extra column for each potential member of the array of table2_ids. If the array has fewer elements than the number of columns, JSON_EXTRACT() will fill in NULL when the expression returns nothing.
alter table table1 add column second_table2_id int as (json_extract(json_column, '$.table2_ids[1]'));
alter table table1 add column third_table2_id int as (json_extract(json_column, '$.table2_ids[2]'));
alter table table1 add column fourth_table2_id int as (json_extract(json_column, '$.table2_ids[3]'));
I'll query using vertical output, so the columns will be easier to read:
select * from table1\G
*************************** 1. row ***************************
id: 1
json_column: {"table2_ids": [1, 2, 3], "sone_other_data": "foo"}
first_table2_id: 1
second_table2_id: 2
third_table2_id: 3
fourth_table2_id: NULL
This is going to get very awkward. How many columns do you need? That depends on how many table2_ids is the maximum length of the array.
If you need to search for rows in table1 that reference some specific table2 id, which column should you search? Any of the columns may have that value.
select * from table1
where first_table2_id = 2
or second_table2_id = 2
or third_table2_id = 2
or fourth_table2_id = 2;
You could put an index on each of these generated columns, but the optimizer won't use them.
These are some reasons why storing comma-separated lists is a bad idea, even inside a JSON string, if you need to reference individual elements.
The better solution is to use a traditional third table to store the many-to-many data. Each value is stored on its own row, so you don't need many columns or many indexes. You can search one column if you need to look up references to a given value.
select * from table1_table2 where table2_id = 2;

How should I batch-insert MySQL records that have dependents?

I have business objects that can have "categories", so I'm using a simple table structure that looks like
objects:
| id | name |
categories:
| obj_id | cat |
Adding a new object along with its categories is simple enough:
INSERT INTO OBJECTS (name) VALUES ("This thing");
/* store the LAST_INSERT_ID() */
INSERT INTO categories values (lastId, "A category"), (lastId, "Another category"), ...;
The wrinkle is, I'm using JDBC batch queries to do the object inserts, so I'm not sure how to get the last-insert-id. I need to bulk insert lots of objects but somehow keep the IDs from each so I can bulk-insert the related categories for each.
I thought about writing a stored procedure where I could pass the category as an array (nope, MySQL doesn't have an array type) or a comma-separated list (ditto for built-in string tokenization). Is there an easier way? Something like "INSERT INTO ... JOIN ..."?