Efficient database schema - mysql

I have two MySQL tables that store data as shown below:
table INFO:
the "key" must be unique in this INFO table, and the same "group" can appear on many keys.
info_id (pk) | group | key
--------------------------
1 | GrA | aaa
2 | GrA | bbb
3 | GrB | ccc
4 | GrC | ddd
table HISTORY: if the product "product_name" has no row for a given info_id yet (checked with a SELECT query),
then a row with that info_id is inserted for the product.
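index (pk) | product_name | group | info_id
-------------------------------------------
1 | ProductA | GrA | 1
2 | ProductA | GrB | 3
3 | ProductA | GrA | 2
4 | ProductB | GrA | 1
5 | ProductC | GrA | 1
6 | ProductC | GrA | 2
7 | ProductD | GrC | 4
8 | ProductD | GrA | 2
9 | ProductE | GrB | 3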
The client that runs the SQL queries is Python.
The tables above work today, but INFO has over 600,000 records and HISTORY has over 5,000,000.
Query performance is really slow: a single query takes about 5 seconds to finish.
To get faster results for each query, I want to rebuild this schema.
Edit:
Hello,
I'm using the queries below:
1. SELECT COUNT(group) FROM INFO : to get the count of a specific group
2. SELECT * FROM INFO WHERE group = "GrB" and key = "EEE"
3. INSERT INTO INFO(group, key) VALUES("GrB", "EEE") : insert if the result of query 2 is None
4. SELECT * FROM HISTORY WHERE product_name = "ProductA" and info_id = "4"
5. INSERT INTO HISTORY(product_name, group, info_id) VALUES("ProductA", "GrC", "4") : insert if the result of query 4 is None

I could explain the problem better if you had provided the CREATE TABLE statements and the SELECT. Meanwhile, I will guess that key is either not indexed or indexed by itself. Based on how you described the table, this would be much faster:
CREATE TABLE info (
`key` VARCHAR(...) ..., -- KEY is a reserved word in MySQL, hence the backticks
grp VARCHAR(...) ...,
PRIMARY KEY(`key`),
INDEX(grp) -- needed if you ever look up all the keys for a grp
) ENGINE=InnoDB;
and replace info_id with key in the other table.
But then why have grp in both tables? Show us the schema and query; I may come up with a better way.
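One way to read the suggestion above, sketched in Python because that is the client named in the question. SQLite stands in for MySQL so the snippet runs anywhere. The column name info_key, the helper names, and the decision to drop grp from history are assumptions made for illustration, not the asker's real schema. INSERT OR IGNORE is SQLite's spelling of what MySQL writes as INSERT IGNORE. The idea is to let a unique key do the "does it exist yet?" check, so that each SELECT-then-INSERT pair becomes a single indexed statement.

```python
import sqlite3


def open_db():
    """Create an in-memory database with the keyed layout (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE info (
            info_key TEXT PRIMARY KEY,      -- the unique "key" column
            grp      TEXT NOT NULL
        );
        CREATE INDEX info_grp ON info (grp);  -- for per-group counts

        CREATE TABLE history (
            product_name TEXT NOT NULL,
            info_key     TEXT NOT NULL REFERENCES info (info_key),
            PRIMARY KEY (product_name, info_key)  -- one row per pair
        );
        """
    )
    return conn


def record(conn, product_name, grp, info_key):
    """Insert the info row and the history row only if they are missing."""
    with conn:  # one transaction for both statements
        conn.execute(
            "INSERT OR IGNORE INTO info (info_key, grp) VALUES (?, ?)",
            (info_key, grp),
        )
        conn.execute(
            "INSERT OR IGNORE INTO history (product_name, info_key) VALUES (?, ?)",
            (product_name, info_key),
        )


conn = open_db()
record(conn, "ProductA", "GrA", "aaa")
record(conn, "ProductA", "GrA", "aaa")  # duplicate, silently skipped
record(conn, "ProductB", "GrA", "aaa")

history_rows = conn.execute(
    "SELECT product_name, info_key FROM history ORDER BY product_name"
).fetchall()
info_count = conn.execute("SELECT COUNT(*) FROM info").fetchone()[0]
print(history_rows, info_count)
```

Whether this helps on the real tables depends on the actual CREATE TABLE statements, which the question does not show.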

Related

PDO: deleting entries from a table that do not exist in the POST request data

I have a table which consists of the columns person_id, level_id, is_admin:
person_id | level_id | is_admin
--------------------------------
1 | 1 | 1
1 | 2 | 0
3 | 2 | 1
On the server side, I have a function that accepts request data, which is an array of objects:
[
{person_id: 5, level_id: 1, is_admin: 1},
{person_id: 1, level_id: 2, is_admin: 0}
]
What I want to achieve is to delete the rows from the table whose person_id, level_id, is_admin values do not exist in the post request data.
For example, the expected output of the delete query:
person_id | level_id | is_admin
--------------------------------
1 | 1 | 1
3 | 2 | 1
Notice that the second row is deleted.
EDIT: You might wonder why I delete entries that do not exist in the post data. Yes, that's right: the function is meant to insert rows into the table and delete existing rows that do not exist in the post data.
My current delete query is:
$delete = "
DELETE FROM pivotTable
WHERE NOT EXISTS (
SELECT * FROM (
SELECT
{$personId} AS person_id,
{$levelId} AS level_id,
{$isAdmin} AS is_admin
) as delTemp
);
";
$this->pdo->exec($delete);
There is no error, but it seems that it's not deleting the row in the database.
The easiest way to debug this would be to run the query as a SELECT:
SELECT * FROM pivotTable
WHERE NOT EXISTS (
SELECT * FROM (
SELECT
{$personId} AS person_id,
{$levelId} AS level_id,
{$isAdmin} AS is_admin
) as delTemp
);
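After this you can check whether the rows it returns are the ones you want to delete. Note that the inner SELECT never references pivotTable, so it always returns one row; NOT EXISTS is therefore false for every row and nothing is deleted.
I would also recommend looking into using WHERE ... NOT IN,
as in: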
DELETE FROM pivotTable
WHERE (person_id, level_id, is_admin) NOT IN ((5,1,1), (1,2,0));
Also, it seems that you aren't using prepared statements, which leaves you vulnerable to SQL injection. I would recommend reading up on prepared statements here:
https://phpdelusions.net/pdo
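The two points above (correlate the subquery with the target table, and bind values instead of interpolating them) can be sketched together. The snippet below is Python with SQLite standing in for PDO and MySQL, purely so it is runnable; the temporary table keep and the function sync_rows are hypothetical names. It removes the same rows as the NOT IN query shown earlier, but the request values only ever travel as bound parameters.

```python
import sqlite3


def sync_rows(conn, wanted):
    """Delete every pivotTable row that is not listed in `wanted`.

    `wanted` is a list of (person_id, level_id, is_admin) tuples.
    """
    with conn:  # single transaction
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS keep "
            "(person_id INTEGER, level_id INTEGER, is_admin INTEGER)"
        )
        conn.execute("DELETE FROM keep")
        # Values are bound as parameters, never pasted into the SQL text.
        conn.executemany("INSERT INTO keep VALUES (?, ?, ?)", wanted)
        # The subquery refers to pivotTable, so it is evaluated per row.
        conn.execute(
            """
            DELETE FROM pivotTable
            WHERE NOT EXISTS (
                SELECT 1 FROM keep
                WHERE keep.person_id = pivotTable.person_id
                  AND keep.level_id  = pivotTable.level_id
                  AND keep.is_admin  = pivotTable.is_admin
            )
            """
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pivotTable (person_id INTEGER, level_id INTEGER, is_admin INTEGER)"
)
conn.executemany(
    "INSERT INTO pivotTable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 0), (3, 2, 1)],
)

sync_rows(conn, [(5, 1, 1), (1, 2, 0)])
remaining = conn.execute("SELECT * FROM pivotTable").fetchall()
print(remaining)  # [(1, 2, 0)]
```

In PDO the same shape would use prepare() and execute() with placeholders; the exact statements depend on the real table definition.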

MySQL - how to get count of a single item frequency in a table of CSV values

I have a mysql table called "projects" with a single field containing CSV lists of project Ids. Assume that I cannot change the table structure.
I need a query that will allow me to quickly retrieve a count of rows that contain a particular project id, for example:
select count(*) from projects where '4' in (project_ids);
This returns just 1 result, which is incorrect (should be 3 results), but I think that it illustrates what I'm attempting to do.
CREATE TABLE `projects` (
`project_ids` varchar(255) DEFAULT NULL
);
INSERT INTO `projects` (`project_ids`)
VALUES
('1,2,4'),
('1,2'),
('4'),
('4,5,2'),
('1,2,5');
I was hoping that there might be a simple MySQL function that would achieve this, so that I don't have to do anything complex SQL-wise.
You could use this approach:
SELECT COUNT(*)
FROM projects
WHERE CONCAT(',', project_ids, ',') LIKE '%,4,%';
Or use FIND_IN_SET for a built-in way:
SELECT COUNT(*)
FROM projects
WHERE FIND_IN_SET('4', project_ids) > 0;
But, as Gordon's comment suggests, a much better table design would be a junction table that relates a primary key in one table to all of its projects in another table. That junction table, based on your sample data, would look like this:
PK | project_id
1 | 1
1 | 2
1 | 4
2 | 1
2 | 2
3 | 4
4 | 4
4 | 5
4 | 2
5 | 1
5 | 2
5 | 5
With this design, if you wanted to find the count of PKs having a project_id of 4, you would only need a much simpler (and sargable) query:
SELECT COUNT(*)
FROM junction_table
WHERE project_id = 4;
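As a rough illustration of how the junction table could be filled from the existing CSV column, here is a Python sketch using SQLite as a stand-in for MySQL. Using the row's rowid as the PK value and the index name are assumptions made only for this example. It also cross-checks the count against the delimiter-wrapped LIKE approach, written with SQLite's || operator in place of CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (project_ids TEXT)")
conn.executemany(
    "INSERT INTO projects (project_ids) VALUES (?)",
    [("1,2,4",), ("1,2",), ("4",), ("4,5,2",), ("1,2,5",)],
)

conn.execute(
    "CREATE TABLE junction_table ("
    " pk INTEGER NOT NULL,"
    " project_id INTEGER NOT NULL,"
    " PRIMARY KEY (pk, project_id))"
)
conn.execute("CREATE INDEX junction_project ON junction_table (project_id)")

# Split each CSV list in the client and store one row per (pk, project_id).
source_rows = conn.execute("SELECT rowid, project_ids FROM projects").fetchall()
for pk, csv_ids in source_rows:
    conn.executemany(
        "INSERT INTO junction_table (pk, project_id) VALUES (?, ?)",
        [(pk, int(part)) for part in csv_ids.split(",")],
    )

count_junction = conn.execute(
    "SELECT COUNT(*) FROM junction_table WHERE project_id = ?", (4,)
).fetchone()[0]

# Cross-check with the LIKE approach on the original CSV column.
count_like = conn.execute(
    "SELECT COUNT(*) FROM projects WHERE ',' || project_ids || ',' LIKE ?",
    ("%,4,%",),
).fetchone()[0]

print(count_junction, count_like)  # 3 3
```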
You would need to use a LIKE condition, as follows:
select count(*)
from projects
where concat(',',project_ids,',') like '%,4,%';

Why does this simple Left Join return data from unmatched rows?

Please see the simple http://sqlfiddle.com/#!9/e853f/1 for this problem in operation.
I refer to MySQL ver 5.6.12-log
As I understand it, a left join returns NULL for columns in the rightmost dataset where the key in the left dataset does not exist in the right dataset.
However, I am getting data returned from the right hand side even where the left hand key does not exist in the right.
Can anyone explain what is going on here?
The SQLfiddle creates:
A table with 6 rows, each containing just an integer ID
A second table with 3 rows containing some of those integer IDs plus two more INT fields
A view based upon that second table that returns 3 rows containing the integer ID plus a textual field, derived from the two other INT fields
(Obviously, the 3 IDs in the view correspond to some of the IDs in the 6 row table.)
The SQL
SELECT * FROM <table> LEFT JOIN <view> ON table_ID = view_ID;
returns 6 rows as expected, but all of them have data in the textual field instead of the 3 unmatched ones being NULL.
BUT
If the method used in the view to derive the textual column is slightly altered, then the Left Join SQL gives the correct result.
(You can show this by selectively commenting out one or other of the two methods in sql fiddle)
But surely the optimiser evaluates the view first, so it shouldn't matter how the data is created, just what it contains?
(This is a much simplified version of an earlier question of mine that I admit was rather too complicated to elicit sensible answers.)
It has been suggested (by Jeroen Mostert) that I show data and expected results. Here they are:
Table person
personID
--------
1
2
3
4
5
6
View payment_state
payment_personID | state
----------------------------
1 | 'equal'
2 | 'under'
3 | 'over'
Query
SELECT * FROM person
LEFT JOIN payment_state
ON personID = payment_personID;
Expected result
personID | payment_personID |state
-------------------------------------
1 | 1 | 'equal'
2 | 2 | 'under'
3 | 3 | 'over'
4 | NULL | NULL
5 | NULL | NULL
6 | NULL | NULL
Actual result
personID | payment_personID |state
-------------------------------------
1 | 1 | 'equal'
2 | 2 | 'under'
3 | 3 | 'over'
4 | NULL | 'equal'
5 | NULL | 'equal'
6 | NULL | 'equal'
I beg to disagree with the other answers. This is a MySQL defect; specifically, it is bug #83707 in MySQL 5.6. It looks like it's fixed in MySQL 5.7.
This bug is already fixed in MariaDB 5.5.
The internal join strategy (Nested Loop Join, Merge Join, or Hash Join) does not matter. The result should be correct in any case.
I tried the same query in PostgreSQL and Oracle, and it works as expected, returning null values for the last three rows.
Oracle Example
CREATE TABLE person (personID INT);
INSERT INTO person (personID) VALUES (1);
INSERT INTO person (personID) VALUES(2);
INSERT INTO person (personID) VALUES(3);
INSERT INTO person (personID) VALUES(4);
INSERT INTO person (personID) VALUES(5);
INSERT INTO person (personID) VALUES(6);
CREATE TABLE payments (
payment_personID INT,
Due INT,
Paid INT) ;
INSERT INTO payments (payment_personID, due, paid) VALUES (1, 5, 5);
INSERT INTO payments (payment_personID, due, paid) VALUES (2, 5, 3);
INSERT INTO payments (payment_personID, due, paid) VALUES (3, 5, 8);
CREATE VIEW payment_state AS (
SELECT
payment_personID,
CASE
WHEN COALESCE(paid,0) < COALESCE(due,0) AND due <> 0 THEN 'under'
WHEN COALESCE(paid,0) > COALESCE(due,0) THEN 'over'
WHEN COALESCE(paid,0) = COALESCE(due,0) THEN 'equal'
END AS state
FROM payments);
SELECT *
FROM
person
LEFT JOIN
payment_state
ON personID = payment_personID;
Result:
PERSONID PAYMENT_PERSONID STATE
======== ================ =====
1 1 equal
2 2 under
3 3 over
6 <null> <null>
5 <null> <null>
4 <null> <null>
Works perfectly!
PostgreSQL Example
CREATE TABLE person (personID INT);
INSERT INTO person (personID) VALUES
(1),(2),(3),(4),(5),(6);
CREATE TABLE payments (
payment_personID INT,
Due INT,
Paid INT) ;
INSERT INTO payments (payment_personID, due, paid) VALUES
(1, 5, 5), (2, 5, 3), (3, 5, 8);
CREATE VIEW payment_state AS (
SELECT
payment_personID,
CASE
WHEN COALESCE(paid,0) < COALESCE(due,0) AND due <> 0 THEN 'under'
WHEN COALESCE(paid,0) > COALESCE(due,0) THEN 'over'
WHEN COALESCE(paid,0) = COALESCE(due,0) THEN 'equal'
END AS state
FROM payments);
SELECT *
FROM
person
LEFT JOIN
payment_state
ON personID = payment_personID;
Result:
personid payment_personid state
======== ================ =====
1 1 equal
2 2 under
3 3 over
4 <null> <null>
5 <null> <null>
6 <null> <null>
Also, works perfectly!
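The same check can be scripted. The sketch below repeats the test in Python against SQLite, which is only a convenient third engine here and says nothing about MySQL itself; it uses the same tables and view as the examples above and asserts that the unmatched rows come back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (personID INT);
    INSERT INTO person (personID) VALUES (1),(2),(3),(4),(5),(6);

    CREATE TABLE payments (payment_personID INT, due INT, paid INT);
    INSERT INTO payments (payment_personID, due, paid) VALUES
        (1, 5, 5), (2, 5, 3), (3, 5, 8);

    CREATE VIEW payment_state AS
    SELECT
        payment_personID,
        CASE
            WHEN COALESCE(paid,0) < COALESCE(due,0) AND due <> 0 THEN 'under'
            WHEN COALESCE(paid,0) > COALESCE(due,0) THEN 'over'
            WHEN COALESCE(paid,0) = COALESCE(due,0) THEN 'equal'
        END AS state
    FROM payments;
    """
)

rows = conn.execute(
    """
    SELECT personID, payment_personID, state
    FROM person
    LEFT JOIN payment_state ON personID = payment_personID
    ORDER BY personID
    """
).fetchall()

# Rows with no matching payment must have NULL in every view column.
unmatched = [row for row in rows if row[1] is None]
assert all(state is None for _, _, state in unmatched)

for row in rows:
    print(row)
```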
The processing algorithm of your view causes this result. By default, MySQL usually chooses MERGE, because it is more efficient. If you create the view with the TEMPTABLE algorithm, you will see NULL for the unmatched rows.
http://www.mysqltutorial.org/create-sql-views-mysql.aspx
CREATE ALGORITHM = TEMPTABLE VIEW payment_state AS (
SELECT
payment_personID,
CASE
WHEN IFNULL(paid,0) < IFNULL(due,0) AND due <> 0 THEN 'under'
WHEN IFNULL(paid,0) > IFNULL(due,0) THEN 'over'
WHEN IFNULL(paid,0) = IFNULL(due,0) THEN 'equal'
END AS state
FROM payments);
This is the normal way LEFT JOIN works. It appends new columns to the result, then fills them with:
values pulled from the table being JOINed if the JOIN succeeds,
NULLs if the JOIN doesn't match (that includes the fields you joined ON)!
Normally there is no distinction between NULLs pulled from real tables (where the JOIN matched) and NULLs filled in because the JOIN didn't match. The CASE + IFNULL just look for NULLs and swap them for 0s (no matter their source). That's why you have results in the state column even in unmatched rows.
As a matter of fact, if you want to know whether a given NULL you are looking at is the result of a JOIN that didn't match, you need to check this explicitly: if all the key fields you JOINed on are NULL, then the NULL in this column is a fill-in. If the key fields are present in the row yet there is still a NULL in another column, then it is there because it was pulled from the table you JOINed.
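The explicit check described in the last paragraph can be sketched as follows. This is a Python and SQLite illustration with a cut-down payments table (only a key and a nullable paid column, both chosen for the example): person 1 has a payment row whose paid value is NULL, person 2 has no payment row at all. Both show NULL in paid, and only the joined key column tells them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (personID INT);
    INSERT INTO person (personID) VALUES (1), (2);

    CREATE TABLE payments (payment_personID INT, paid INT);
    INSERT INTO payments (payment_personID, paid) VALUES (1, NULL);
    """
)

rows = conn.execute(
    """
    SELECT personID,
           paid,
           CASE WHEN payment_personID IS NULL
                THEN 'no match'   -- NULL filled in by the LEFT JOIN
                ELSE 'matched'    -- NULL stored in the payments row
           END AS origin
    FROM person
    LEFT JOIN payments ON personID = payment_personID
    ORDER BY personID
    """
).fetchall()

print(rows)  # [(1, None, 'matched'), (2, None, 'no match')]
```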

Update many-to-many relation table without violating primary key constraint

Considering two tables with a many-to-many relation :
Company Speciality
--------- ---------
id id
--------- ---------
1 21
2 22
3 23
4
CompanySpeciality
--------------------------
company_id | speciality_id
--------------------------
1 | 21
1 | 22
4 | 21
4 | 23
I want to delete company 4 and associate its specialities with company 1.
If I use a simple UPDATE statement on CompanySpeciality to set "company_id = 1 WHERE company_id=4", I hit a primary key constraint violation because the pair 1|21 already exists.
Is there a way to update the relation table with a single query? This query should only affect rows that will not be duplicated.
The result would be :
CompanySpeciality
--------------------------
company_id | speciality_id
--------------------------
1 | 21
1 | 22
1 | 23
Something to the effect of:
UPDATE CompanySpeciality
SET company_id=1
WHERE company_id=4
AND NOT EXISTS (SELECT * FROM CompanySpeciality cs WHERE cs.company_id=1 AND cs.speciality_id=CompanySpeciality.speciality_id);
should work for you. (I haven't tested the exact syntax, but using a NOT EXISTS clause should help you avoid violating the primary key constraint.)
You will then have to remove the extra records left in the table for company 4 in a separate query:
DELETE FROM CompanySpeciality
WHERE company_id=4;
You don't want to UPDATE, you want to INSERT and ignore dupes:
INSERT IGNORE INTO CompanySpeciality (company_id, speciality_id)
SELECT 1, speciality_id
FROM CompanySpeciality
WHERE company_id=4;
You won't be able to both update and delete records in a single query. You can use transactions:
mysql: select, insert, delete and update in one query
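A minimal sketch of the insert-then-delete approach wrapped in one transaction, written in Python with SQLite standing in for MySQL. INSERT OR IGNORE is SQLite's counterpart of MySQL's INSERT IGNORE, and merge_company is a hypothetical helper name. Either both statements take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CompanySpeciality (
        company_id    INT NOT NULL,
        speciality_id INT NOT NULL,
        PRIMARY KEY (company_id, speciality_id)
    );
    INSERT INTO CompanySpeciality (company_id, speciality_id) VALUES
        (1, 21), (1, 22), (4, 21), (4, 23);
    """
)


def merge_company(conn, source_id, target_id):
    """Give target_id every speciality of source_id, then drop source_id's rows."""
    with conn:  # commits on success, rolls back if either statement fails
        # Copy the pairs; rows that would duplicate an existing pair are skipped.
        conn.execute(
            """
            INSERT OR IGNORE INTO CompanySpeciality (company_id, speciality_id)
            SELECT ?, speciality_id
            FROM CompanySpeciality
            WHERE company_id = ?
            """,
            (target_id, source_id),
        )
        conn.execute(
            "DELETE FROM CompanySpeciality WHERE company_id = ?",
            (source_id,),
        )


merge_company(conn, 4, 1)
merged = conn.execute(
    "SELECT company_id, speciality_id FROM CompanySpeciality "
    "ORDER BY company_id, speciality_id"
).fetchall()
print(merged)  # [(1, 21), (1, 22), (1, 23)]
```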

Efficiency question - Selecting numeric data from one field

I have a pair of tables and I need to search for numeric values in Table1 that match associated IDs on Table2. For example:
Table1
ID | Item
1 Cat
3 Frog
9 Dog
11 Horse
Table2
Category | Contains
Group 1 1
Group 2 3|9
Group 3 3|9|11
Originally I was thinking a LIKE would work, but if I searched for "1", I'd end up matching "11". I looked into SETs, but the MySQL docs state that the maximum number of elements is 64 and I have over 200 rows of items in Table1. I could wrap each item id with a character (e.g. "|1|") but that doesn't seem very efficient. Each Group will have unique items (e.g., there won't be two Cats in the same Group).
I found a similar topic as my problem and one of the answers suggested making another table, but I don't understand how that would work. A new table containing what, exactly?
The other option I have is to split the Contains into 6 separate columns, since there's never going to be more than 6 items in a Group, but then I'm not sure how to search all 6 columns without relying on six OR queries:
Category | C1 | C2 | C3 | C4 (etc)
Group 1 1 null null null
Group 2 3 9 null null
Group 3 3 9 11 null
SELECT * FROM Table2 WHERE C1 = '1' OR C2 = '1' OR C3 = '1' etc.
I'm not sure what the most efficient way of handling this is. I could use some advice from those with more experience with normalizing this kind of data please. Thank you.
I think it'd be best to create another table to normalize your data; however, what you're proposing is not exactly what I'd suggest.
Realistically what you are modeling is a many-to-many relationship between table1 and table2. This means that one row in table1 can be associated with many rows in table2, and vice versa.
In order to create this kind of relation, you need a third table, which we can call rel_table1_table2 for now.
rel_table1_table2 will contain only primary key values from the two associated tables, which in this case seem to be table1.ID and table2.Category.
When you want to associate a row in table1 with a row in table2, you'd add a row to rel_table1_table2 with the primary key values from table1 and table2 respectively.
Example:
INSERT INTO rel_table1_table2 (ID, Category) VALUES (1, "Group 1")
When you need to find out what Items belong to a Category, you'd simply query your association table, for example:
SELECT t1.Item from table1 t1 join rel_table1_table2 r on t1.ID=r.ID join table2 t2 on r.Category=t2.Category WHERE t2.Category="Group 3"
Does that make sense?
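To make the association table concrete, here is a small runnable sketch in Python with SQLite in place of MySQL. The table and column names follow the answer above; the helper functions items_in and categories_with are hypothetical. It loads the sample data from the question and runs the lookup in both directions, including the case from the question where searching for 1 must not match 11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Item TEXT NOT NULL);
    CREATE TABLE table2 (Category TEXT PRIMARY KEY);
    CREATE TABLE rel_table1_table2 (
        ID       INTEGER NOT NULL REFERENCES table1 (ID),
        Category TEXT    NOT NULL REFERENCES table2 (Category),
        PRIMARY KEY (ID, Category)
    );

    INSERT INTO table1 VALUES (1, 'Cat'), (3, 'Frog'), (9, 'Dog'), (11, 'Horse');
    INSERT INTO table2 VALUES ('Group 1'), ('Group 2'), ('Group 3');
    INSERT INTO rel_table1_table2 VALUES
        (1, 'Group 1'),
        (3, 'Group 2'), (9, 'Group 2'),
        (3, 'Group 3'), (9, 'Group 3'), (11, 'Group 3');
    """
)


def items_in(category):
    """Names of all items that belong to the given category."""
    rows = conn.execute(
        """
        SELECT t1.Item
        FROM table1 t1
        JOIN rel_table1_table2 r ON t1.ID = r.ID
        WHERE r.Category = ?
        ORDER BY t1.ID
        """,
        (category,),
    )
    return [item for (item,) in rows]


def categories_with(item_id):
    """All categories that contain the given item ID (exact match)."""
    rows = conn.execute(
        "SELECT Category FROM rel_table1_table2 WHERE ID = ? ORDER BY Category",
        (item_id,),
    )
    return [category for (category,) in rows]


print(items_in("Group 3"))   # ['Frog', 'Dog', 'Horse']
print(categories_with(1))    # ['Group 1']
print(categories_with(11))   # ['Group 3']
```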
That "new" table would contain one row for each category an animal belongs to.
create table animal(
animal_id int not null
,name varchar(100) not null -- column types are assumed; adjust to your data
,primary key(animal_id)
);
create table category(
category_id int not null
,name varchar(100) not null
,primary key(category_id)
);
create table animal_categories(
animal_id int not null
,category_id int not null
,primary key(animal_id, category_id)
);
For your example data, the animal_categories table would contain:
+-------------+-----------+
| category_id | animal_id |
+-------------+-----------+
| 1           | 1         |
| 2           | 3         |
| 2           | 9         |
| 3           | 3         |
| 3           | 9         |
| 3           | 11        |
+-------------+-----------+
Instead of using LIKE, use REGEXP so that you don't get "11" when looking for "1".
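The kind of pattern this would need can be tried out in Python first. The sketch below is only an illustration of the matching rule (an ID must be bounded by a "|" or by the start or end of the string); the function name contains_id is hypothetical, and the MySQL spelling in the comment is an untested assumption.

```python
import re


def contains_id(contains, item_id):
    """True if item_id is one of the '|'-separated IDs in `contains`."""
    # The ID must sit between a delimiter (or string start) and a
    # delimiter (or string end), so "1" does not match inside "11".
    pattern = r"(^|\|)" + re.escape(str(item_id)) + r"(\||$)"
    return re.search(pattern, contains) is not None


# Assumed MySQL equivalent for ID 1 (not tested here):
#   WHERE Contains REGEXP '(^|[|])1([|]|$)'

print(contains_id("1", 1))        # True
print(contains_id("3|9|11", 1))   # False
print(contains_id("3|9|11", 11))  # True
```

A pattern match like this still cannot use an ordinary index on the column, so it scans every row.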
Break Table2.Contains in another table which joins Item and Category:
Item Item_Category Category
------ -------------- ---------
ID (1)----(*)ItemID Name
Name CategoryID(*)-------(1) ID
Now, your query will look like:
SELECT Category.* FROM Category, Item_Category
WHERE (Item_Category.CategoryID = Category.ID)
AND (Item_Category.ItemID IN (1, 2, 3, 11))
It seems like your problem is the way you are using the rows in Table 2. In databases, storing a list of values in a single row should always raise a red flag.
Rather than having each category be in a single row in Table 2, how about using the same category in multiple rows, with the Contains column storing only a single value? Your example could be changed to:
Table 1
ID | Item
1 Cat
3 Frog
9 Dog
11 Horse
Table 2
Category | Contains
Group 1 1
Group 2 3
Group 2 9
Group 3 3
Group 3 9
Group 3 11
Now when you want to find out "What items does group 2 contain?", you can write a query for that which selects all of the "Group 2" category rows from Table 2. When you want to find out, "What is the name of item 9", you can write a query that selects a row from Table 1.