Creating a good MySQL schema for web front end usage - mysql

I have recently started working with a company who sends me data via JSON, the JSON looks like this:
[{
"name": "company1",
"dataset": null,
"data": [{
"x": "2015-01-01T00:00",
"y": 182
},
{
"x": "2015-01-02T00:00",
"y": 141
}
]
},
{
"name": "company2",
"dataset": null,
"data": [{
"x": "2015-01-01T00:00",
"y": 182
},
{
"x": "2015-01-02T00:00",
"y": 141
}
]
},
{
"name": "company3",
"dataset": null,
"data": [{
"x": "2015-01-01T00:00",
"y": 182
},
{
"x": "2015-01-02T00:00",
"y": 141
}
]
}
]
I get 57 of these daily, one for each metric the company tracks (they're almost identical; the only difference is that the y value changes according to the metric). As you can see, the way they've written the JSON (x & y key/value pairs) makes it rather hard to store nicely.
I've made 57 tables in MySQL, one for each JSON feed, each holding the values for that specific metric; however, querying to get all activity for a day takes a LONG time due to the number of joins.
I'm hoping one of you might be able to tell me the best way to insert this into a MySQL table so that I end up with either one table containing all 57 values, or the best way to query across 57 tables without waiting hours for MySQL to load it.
This is a personal project for my own business, so funds are tight and I am doing what I can at the moment - sorry if this sounds ridiculous!

If I were to be required to store this data, I would personally be inclined to use a table for all the results, with a company table holding the 'master' information about each company.
The company table would be structured like this:
company_id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(50) -- Arbitrary size - change as needed
The company_update table would be structured like this:
company_update_id INT NOT NULL AUTO_INCREMENT,
company_id INT NOT NULL,
update_timestamp DATETIME,
update_value INT -- This may need to be another type
There would be a foreign key from company_update.company_id to company.company_id.
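A minimal sketch of that DDL, assuming InnoDB and the sizes above (the unique key on name supports the lookup-by-name step below; adjust types as needed):
CREATE TABLE company (
    company_id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,  -- arbitrary size - change as needed
    PRIMARY KEY (company_id),
    UNIQUE KEY uq_company_name (name)
) ENGINE=InnoDB;

CREATE TABLE company_update (
    company_update_id INT NOT NULL AUTO_INCREMENT,
    company_id INT NOT NULL,
    update_timestamp DATETIME,
    update_value INT,  -- may need to be another type
    PRIMARY KEY (company_update_id),
    KEY idx_company_time (company_id, update_timestamp),
    CONSTRAINT fk_cu_company FOREIGN KEY (company_id)
        REFERENCES company (company_id)
) ENGINE=InnoDB;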
When receiving the JSON update:
Check that the name exists in the company table. If not, create it.
Get the company's unique ID (for use in the next step) from the company table.
For each item in the data array, add a record to company_update using the appropriate company ID.
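A hedged sketch of that ingest flow in plain SQL (it assumes the UNIQUE key on name from the DDL above; the LAST_INSERT_ID(expr) idiom makes LAST_INSERT_ID() return the existing ID when the row turns out to be a duplicate):
-- Steps 1 & 2: ensure the company exists and capture its ID
INSERT INTO company (name) VALUES ('company1')
    ON DUPLICATE KEY UPDATE company_id = LAST_INSERT_ID(company_id);
SET @company_id = LAST_INSERT_ID();

-- Step 3: one row per item in the data array
-- (timestamps converted from the feed's ISO format)
INSERT INTO company_update (company_id, update_timestamp, update_value)
VALUES (@company_id, '2015-01-01 00:00:00', 182),
       (@company_id, '2015-01-02 00:00:00', 141);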
Now in order to get results for all companies, I would just use a query like:
SELECT c.name,
cu.update_timestamp,
cu.update_value
FROM company c
INNER JOIN company_update cu ON cu.company_id = c.company_id
ORDER BY c.name, cu.update_timestamp DESC
Note that I have made the following assumptions which you may need to address:
The company name size is at most 50 characters
The y value in the data array is an integer
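Since the original goal was all activity for a single day, a variant that filters on the timestamp may be closer to what you need (an index on update_timestamp would likely help here):
SELECT c.name,
       cu.update_timestamp,
       cu.update_value
FROM company c
INNER JOIN company_update cu ON cu.company_id = c.company_id
WHERE cu.update_timestamp >= '2015-01-01'
  AND cu.update_timestamp < '2015-01-02'
ORDER BY c.name, cu.update_timestamp DESC;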

Related

Problem returning JSON data from table with more than one row

Environment: I'm using an Oracle 19c database, so I am using the JSON functions already available in PL/SQL (for instance JSON_TABLE).
What I am trying to do: I am trying to query JSON data from a CLOB column when there are multiple rows in the table.
Problem: When I have one row in the base table (with the CLOB column containing the JSON data), I can query the JSON data successfully, but when I add a second row to the base table my query fails with ORA-01427: single-row subquery returns more than one row.
Example:
I created a table with a CLOB to hold JSON data.
Create Table MY_RECEIVED_DATA
(LOAD_ID NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
DATE_LOADED DATE DEFAULT SYSDATE NOT NULL,
DATA_BUCKET CLOB CONSTRAINT check_json CHECK (DATA_BUCKET IS JSON))
LOB (DATA_BUCKET) STORE AS SECUREFILE (
TABLESPACE MYTAB
ENABLE STORAGE IN ROW
CHUNK 8192
RETENTION
NOCACHE
LOGGING)
TABLESPACE MYTAB;
I then inserted 3 rows into the table showing the employees of Acme Inc.
insert into MY_RECEIVED_DATA (DATA_BUCKET)
VALUES ('[
{ "displayName": "McNubbins, Bubba",
"employeeType": "Minion",
"givenName": "Bubba",
"company": "Acme Inc"},
{ "displayName": "Blorgg, Gupda",
"employeeType": "Minion Supervisor",
"givenName": "Goopy",
"company": "Acme Inc"},
{ "displayName": "Zumba, Blippins",
"employeeType": "Overlord",
"givenName": "Blippy",
"company": "Acme Inc"}
]');
Commit;
I then ran this select to see if I had any data and got this result
select * from MY_RECEIVED_DATA;
LOAD_ID DATE_LOADED DATA_BUCKET
4 8/2/2022 1:45:16 PM (CLOB)
I then ran this query to look at the JSON data in the DATA_BUCKET with these results
select *
from json_table((select DATA_BUCKET from MY_RECEIVED_DATA), '$[*]'
COLUMNS ("displayName", "employeeType","givenName","company")) jt
group by "displayName","employeeType","givenName","company";
DISPLAYNAME EMPLOYEETYPE GIVENNAME COMPANY
McNubbins, Bubba Minion Bubba Acme Inc
Blorgg, Gupda Minion Supervisor Goopy Acme Inc
Zumba, Blippins Overlord Blippy Acme Inc
I then insert my second row of data
insert into MY_RECEIVED_DATA (DATA_BUCKET)
VALUES ('[
{ "displayName": "Corbin, Sammy",
"employeeType": "Jr. Minion",
"givenName": "Slimy",
"company": "Pipe Wrenches Llc."},
{ "displayName": "Zima, Dancy",
"employeeType": "Minion",
"givenName": "Dancy",
"company": "Pipe Wrenches Llc."},
{ "displayName": "Hoptom, Clarence",
"employeeType": "Overlord",
"givenName": "Sir",
"company": "Pipe Wrenches Llc."}
]');
Commit;
I then re-ran this select to see if I had any data and got this result
select * from MY_RECEIVED_DATA;
LOAD_ID DATE_LOADED DATA_BUCKET
4 8/2/2022 1:45:16 PM (CLOB)
5 8/2/2022 2:10:34 PM (CLOB)
So now I have two rows both containing JSON data in the DATA_BUCKET column. I then run this same query to get the JSON columns
select *
from json_table((select DATA_BUCKET from MY_RECEIVED_DATA), '$[*]'
COLUMNS ("displayName", "employeeType","givenName","company")) jt
group by "displayName","employeeType","givenName","company";
When I run the select I get the following error:
ORA-01427: single-row subquery returns more than one row
The system will be updating the table several times a day, and we will have to process each new row's JSON data.
Question:
How can I query multiple rows of JSON data?
Don't use a subquery; join the real table to the json_table clause:
select jt.*
from MY_RECEIVED_DATA mrd
cross apply json_table(
mrd.DATA_BUCKET,
'$[*]'
COLUMNS (
"displayName", "employeeType","givenName","company"
)
) jt
displayName       employeeType       givenName  company
McNubbins, Bubba  Minion             Bubba      Acme Inc
Blorgg, Gupda     Minion Supervisor  Goopy      Acme Inc
Zumba, Blippins   Overlord           Blippy     Acme Inc
Corbin, Sammy     Jr. Minion         Slimy      Pipe Wrenches Llc.
Zima, Dancy       Minion             Dancy      Pipe Wrenches Llc.
Hoptom, Clarence  Overlord           Sir        Pipe Wrenches Llc.
db<>fiddle
Not sure why you have the group-by clause; the query works with or without it, but it isn't really adding anything. If you think you might have duplicates, you can use distinct instead of grouping.
I'd also suggest you give the columns non-quoted aliases, as in this db<>fiddle, but it depends how you'll use the result.
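For illustration, a hedged sketch of what those non-quoted aliases might look like with explicit PATH clauses (the column sizes are assumptions; adjust to your data):
select jt.*
from MY_RECEIVED_DATA mrd
cross apply json_table(
    mrd.DATA_BUCKET,
    '$[*]'
    COLUMNS (
        display_name  VARCHAR2(100) PATH '$.displayName',
        employee_type VARCHAR2(100) PATH '$.employeeType',
        given_name    VARCHAR2(100) PATH '$.givenName',
        company       VARCHAR2(100) PATH '$.company'
    )
) jt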

How to search within MySQL JSON object array?

Consider the following JSON object,
[
{
"id": 5964460916832,
"name": "Size",
"value": "Small",
"position": 1,
"product_id": 4588516409440
},
{
"id": 5964460916833,
"name": "Size",
"value": "Medium",
"position": 2,
"product_id": 4588516409440
},
{
"id": 5964460916834,
"name": "Size",
"value": "Large",
"position": 3,
"product_id": 4588516409440
}
]
This is a value present in a table field called custom_attrs of JSON data type in a MySQL 8.0 table. I wanted to search the JSON data to match with multiple fields in the same object.
For example,
I wanted to see if there's a match for name "Size" and value "Medium" within the same object. It should not match the name in the first object and the value in the second object.
While we can always use JSON table, I don't prefer that due to the complexities it brings during the JOINs.
JSON_SEARCH supports the LIKE operator, but it cannot ensure the match comes from the same object.
JSON_CONTAINS supports multiple fields but not LIKE, as follows:
SET @doc = CAST('[{"id":5964460916832,"name":"Size","value":"Small","position":1,"product_id":4588516409440},{"id":5964460916833,"name":"Size","value":"Medium","position":2,"product_id":4588516409440},{"id":5964460916834,"name":"Size","value":"Large","position":3,"product_id":4588516409440}]' AS JSON);
SELECT JSON_CONTAINS(@doc, '{"name":"Size", "value":"Small"}');
Is there any way to get the same JSON_CONTAINS-like functionality with a partial search, like {"name":"Size", "value":"%sma%"}?
Any help on this would be greatly appreciated.
JSON_CONTAINS() only works with equality, not with pattern matching.
The JSON_TABLE() function is the solution intended to address the task you are trying to do. But you said you don't want to use it.
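For reference, a hedged sketch of what the JSON_TABLE() approach might look like on MySQL 8.0 (the table name products is hypothetical, and the column sizes are assumptions):
SELECT jt.*
FROM products p,
     JSON_TABLE(p.custom_attrs, '$[*]'
         COLUMNS (
             `id`    BIGINT      PATH '$.id',
             `name`  VARCHAR(50) PATH '$.name',
             `value` VARCHAR(50) PATH '$.value'
         )
     ) AS jt
WHERE jt.`name` = 'Size'
  AND jt.`value` LIKE '%sma%';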
You can simulate JSON_TABLE() using other functions.
select * from (
select
json_unquote(json_extract(col, concat('$[',n.i,'].id'))) as `id`,
json_unquote(json_extract(col, concat('$[',n.i,'].name'))) as `name`,
json_unquote(json_extract(col, concat('$[',n.i,'].value'))) as `value`
from (select @doc as col) j
cross join (select 0 as i union select 1 union select 2 union select 3 union select 4 union select 5 ...) as n
) as t
where t.`id` is not null
order by id, `name`;
Output:
+---------------+------+--------+
| id | name | value |
+---------------+------+--------+
| 5964460916832 | Size | Small |
| 5964460916833 | Size | Medium |
| 5964460916834 | Size | Large |
+---------------+------+--------+
You could then easily add a condition like AND value LIKE '%sma%'.
As you can see, this query is even more complex than if you had used JSON_TABLE().
Really, any solution is going to be complex when you store your data in JSON format, then try to use SQL expressions and relational operations to query them as if they are normalized data. This is because you're practically implementing a mini-database within the functions of a real database. This is sometimes called the Inner-Platform Effect:
The inner-platform effect is the tendency of software architects to create a system so customizable as to become a replica, and often a poor replica, of the software development platform they are using. This is generally inefficient and such systems are often considered to be examples of an anti-pattern.
If you want simple queries, you should store data in normal rows and columns, not in JSON. Then you could get your result using quite ordinary SQL:
SELECT id, name, value FROM MyTable WHERE name = 'Size' AND value LIKE '%sma%';

psql equivalent of pandas .to_dict('index')

I want to return a psql table, but I want to return it in json format.
Let's say the table looks like this...
id  name  value
1   joe   6
2   bob   3
3   joey  2
But I want to return it as an object like this...
{
"1": {
"name": "joe",
"value": 6
},
"2": {
"name": "bob",
"value": 3
},
"3": {
"name": "joey",
"value": 2
}
}
So if I were doing this with pandas and the table existed as a dataframe, I could transform it like this...
df.set_index('id').to_dict('index')
But I want to be able to do this inside the psql code.
The closest I've gotten is by doing something like this
select
json_build_object (
id,
json_build_object (
'name', name,
'value', value
)
)
from my_table
But instead of aggregating this all into one object, the result is a bunch of separate single-key objects, one per row... that being said, it's kinda the same idea...
Any ideas?
You want jsonb_object_agg() to get this:
select jsonb_object_agg(id, jsonb_build_object('name', name, 'value', value))
from my_table
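Against the sample table above, that collapses everything into a single row holding one object, along the lines of:
{"1": {"name": "joe", "value": 6}, "2": {"name": "bob", "value": 3}, "3": {"name": "joey", "value": 2}}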
But this is not going to work well for real-world sized tables: there is a limit of roughly 1GB for a single value, so this might fail with an out-of-memory error on larger tables (or with large values inside the columns).

How to aggregate array values in JSONB?

I have the following PostgreSQL table:
CREATE TABLE orders
(
id uuid NOT NULL,
order_date timestamp without time zone,
data jsonb
);
Where data contains json documents like this:
{
"screws": [
{
"qty": 1000,
"value": "Wood screw"
},
{
"qty": 500,
"value": "Drywall screw"
},
{
"qty": 500,
"value": Concrete screw"
}
],
"nails": [
{
"qty": 1000,
"value": "Round Nails"
}
]
}
How can I get an overall quantity for all types of screws across all orders? Something like this :)
select value, sum(qty) from orders where section = 'screws' group by value;
I am not quite sure why you are trying to sum up the qty values, because the GROUP BY value only makes sense if the same value can occur several times and be summed, e.g. if you had the value Wood screw twice.
Nevertheless, this would be the query:
step-by-step demo:db<>fiddle
SELECT
elems ->> 'value' AS value,
SUM((elems ->> 'qty')::int) AS qty
FROM
orders,
jsonb_array_elements(data -> 'screws') elems
GROUP BY 1
Expand the screws array into one row per array element with jsonb_array_elements()
Get the qty value with the ->> operator (which returns type text) and cast it to type int
If really necessary, aggregate these key/value pairs.
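If you also want the totals for nails (or any other top-level section), a hedged sketch that walks all keys with jsonb_each() instead of hard-coding 'screws':
SELECT
    sections.key AS section,
    elems ->> 'value' AS value,
    SUM((elems ->> 'qty')::int) AS qty
FROM
    orders,
    jsonb_each(data) AS sections(key, val),
    jsonb_array_elements(sections.val) AS elems
GROUP BY 1, 2;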

Retrieve first N records of a JSON array with a Postgresql query

PostgreSQL has had some native JSON operations since version 9.3. Suppose you have a table, my_table, with a json column, my_json_col, structured as follows:
[
{ "id": 1, "some_field": "blabla" },
{ "id": 2, "some_field": "foo" }
...
]
To retrieve the n-th element of my_json_col, you would execute something like: SELECT my_json_col->n FROM my_table WHERE .... So if n = 1, the query would return the "id": 2 record in my example.
I want to retrieve the first n elements, e.g. if n = 2 the query should return the first two records in my example. Is this possible?
In PostgreSQL 12, you can do:
SELECT jsonb_path_query_array('["a","b","c","d","e","f"]', '$[0 to 3]');
jsonb_path_query_array
------------------------
["a", "b", "c", "d"]
(1 row)
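Applied to the question's setup, that might look like the following (the cast is needed because the jsonpath functions take jsonb, while the column is json; for the first n = 2 elements the path is '$[0 to 1]'):
SELECT jsonb_path_query_array(my_json_col::jsonb, '$[0 to 1]')
FROM my_table;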
I think you need to convert the JSON array to a regular Postgres array, then take a slice of it:
select (array_agg(e))[2:3]
from (select json_array_elements('[{"id":1},{"id":2},{"id":3},{"id":4}]'::json)) x(e);
If you need the result to be JSON, you can use array_to_json:
select array_to_json((array_agg(e))[2:3])
from (select json_array_elements('[{"id":1},{"id":2},{"id":3},{"id":4}]'::json)) x(e);
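For the literal above, the [2:3] slice keeps the second and third elements, so the JSON variant should return:
[{"id":2},{"id":3}]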
For anyone who stumbles here trying the same thing, here is what I faced.
I had a similar problem: I needed the first N results from a jsonb field containing an array. The result I needed was all the fields of the table plus N elements from the data field, for each row where the condition is satisfied.
After going through this again and again I found out that the accepted answer aggregates the rows. So if you have a table
id  type      data                                                              status
1   employee  {{"age": 29, "name": "EMP 1"}, {"age": 30, "name": "EMP 2"},...}  Active
2   manager   {{"age": 28, "name": "MNG 1"}, {"age": 30, "name": "MNG 2"},...}  Active
and you run the query
select (array_agg(e))[2:3]
from (select json_array_elements(data::json) from <table>) x(e);
then the output will be
data
{{"age": 30, "name": "EMP 2"}, {"age": 28, "name": "MNG 1"}}
which was not what I needed. What I needed was n elements for each individual row where the condition is satisfied, e.g.
data
{{"age": 29, "name": "EMP 1"},{"age": 30, "name": "EMP 2"}}
{{"age": 28, "name": "MNG 1"},{"age": 30, "name": "MNG 2"}}
so after searching a little and going through the link provided by @Paul A Jungwirth in the accepted answer, I found out that this can also be achieved by
select (ARRAY(select json_array_elements_text(data::json)))[0:2]
and the result will give you the first n elements of the jsonb field for each row where the condition is satisfied; you can access the other table fields alongside it as well. Say you want id plus n elements from this table: you can do that just by adding id to the select list. (I was unable to get "id" into the query in the accepted answer.)
select id, (ARRAY(select json_array_elements(data::json)))[0:2] from table where condition
will give output
id  data
1   {{"age": 29, "name": "EMP 1"},{"age": 30, "name": "EMP 2"}}
2   {{"age": 28, "name": "MNG 1"},{"age": 30, "name": "MNG 2"}}
Hope this will be helpful to someone.