Reduce number of joins in mysql - mysql

I have 12 fixed tables (group, local, element, sub_element, service, ...), each table with different numbers of rows.
The columns 'id_' in all table is a primary key (int). The others columns are of datatype varchar(20). The maximum number of rows in these tables are 300.
Each table was created in this way:
CREATE TABLE group
(
id_G int NOT NULL,
name_group varchar(20) NOT NULL,
PRIMARY KEY (id_G)
);
|........GROUP......| |.......LOCAL.......| |.......SERVICE.......|
| id_G | name_group | | id_L | name_local | | id_S | name_service |
+------+------------+ +------+------------+ +------+--------------+
| 1 | group1 | | 1 | local1 | | 1 | service1 |
| 2 | group2 | | 2 | local2 | | 2 | service2 |
And I have one table that combine all these tables depending on user selects.
The 'id_' come from fixed tables selected by the user are recorded into this table.
This table was crate in this way:
CREATE TABLE group
(
id_E int NOT NULL,
event_name varchar(20) NOT NULL,
id_G int NOT NULL,
id_L int NOT NULL,
...
PRIMARY KEY (id_G)
);
The tables (event) look like this:
|....................EVENT.....................|
| id_E | event_name | id_G | id_L | ... |id_S |
+------+-------------+------+------+-----+-----+
| 1 | mater1 | 1 | 1 | ... | 3 |
| 2 | master2 | 2 | 2 | ... | 6 |
This table get greater each day, an now it has about thousunds of rows.
Column id_E is the primary key (int), event_name is varchar(20).
This table has, in addition of id_E and event_name columns, 12 other columns the came from the fixed tables.
Every time than I need to retrieve information on the event table, to turn more readable, I need to do about 12 joins.
My query look like this where i need to retrieve all columns from table event:
SELECT event_name, name_group, name_local ..., name_service
FROM event
INNER JOIN group on event.id_G = group.id_G
INNER JOIN local on event.id_L = local.id_L
...
INNER JOIN service on event.id_S = service.id_S
WHERE event.id_S = 7 (for example)
This slows down my system performance. Is there a way to reduce the number of joins? I've heard about using Natural Keys, but I think this is not a good idea to form my case thinking in future maintenance.
My queries are taking about 7 seconds and I need to reduce this time.
I changed the WHERE clause and this caused not affect. So, I am sure that the problem is that the query has so many joins.
Could someone give some help? thanks a lot...

MySQL has a great keyword of "STRAIGHT_JOIN" and might be what you are looking for. First, each of your lookup tables (id/description) I have to assume already have an index on the ID column since that is primary key.
Your event table is the one you are querying as the primary basis of the details and joining to the lookups per their respective IDs. As long as your WHERE clause applicable to the EVENT table is optimized, such as the ID you are looking for, it SHOULD be virtually instantaneous.
If it is not, then it might be that MySQL is trying to think for you and take one of the secondary lookup tables and make it a primary basis of the query for whatever reason, such as much lower record count. In this case, add the keyword and try it..
SELECT STRAIGHT_JOIN ... rest of your query
This tells MySQL to do the query in the order you gave it, thus the Event table first and it's where clause on the ID. It should find that one thing, then grab all the corresponding lookup descriptions from the other tables.

Create indexes, concretely use compound indexes, for instance, start creating a compound index for event and groups:
on table events create one for (event id, group id).
then, on the group table create another one for the next relation (group id, local id).
on local do the same with service, and so on...

Related

Could a slow join on two tables be avoided by adding the desired columns from one to the other, or do I have bad relationships?

I have Table A
+-------------+---------+----------+
| id | int | NOT NULL |
| name | varchar | NOT NULL |
| number | varchar | NOT NULL |
| description | varchar | NOT NULL |
| type | varchar | NOT NULL |
+-------------+---------+----------+
I then create table C
+--------+---------+----------+
| B_id | int | NOT NULL |
| number | varchar | NOT NULL |
| qty | int | NOT NULL |
+--------+---------+----------+
Our current query looks like the following:
SELECT C.*, A.* FROM C
JOIN A ON A.number = C.number
WHERE C.B_id = '<insert any number here>'
This join seems to be running a little slow even though we've created an INDEX on A.number. My question is, could we simply avoid the join by taking the desired columns we want from A and add them as columns in table C, or is this bad practice.
I ask this also because at my day job, in our schemas, we have several tables that reference the same column names from table to table. The are indexed, and from millions of rows of data pull seamlessly. Why can I not achieve this with such small tables? Am I setting up the relationships incorrectly?
Yes, violating normalization is sometimes necessary to resolve performance problems.
But you should try to avoid this, and only do it when absolutely necessary. Adding the redundant columns means you need to ensure that the columns in C are always in sync with A. You may be able to do this with triggers, but it adds complexity and performance impacts to all queries that update the tables.
This shouldn't normally be needed for individual columns that can be fetched using a simple join on indexed columns. It can be more useful for aggregated data, since queries that perform grouping and aggregation can be very expensive for large datasets. For instance, if you frequently need transaction totals by date, you could use the Event Scheduler to update a table with these totals every night. Past transactions are not usually changed, so you don't have to worry about this getting out of sync with the raw transactions table.
Your particular query would benefit from this index:
C: INDEX(B_id)
The query then would
find the index rows in C for the given B_id
reach over to C's data BTree to get C.*
use C.number to reach into A's INDEX(number)
reach over to A's data Btree to get A.*
If you don't need all of *, there may be further optimizations (by using a "covering" index).
Note: The above assumes ENGINE=InnoDB.

MySQL Database Table optimization for FASTER Querying & Performance

I have 2 tables in a my MySQL Database.
Let's call 1st main, 2nd final.
TABLE `main` has the structure | TABLE `final` has the structure
|
`id` --> PRIMARY KEY (Auto Increment) | `id` --> PRIMARY KEY (Auto Increment)
| `id_main` --> ?? (Need help here)
|
id | name | info | id | id_main | name | info(changed)
--------------------- | ---------------------------------------
1 | Peter | 5,9 | 1 | 2 | Butters | 0.3,34
2 | Butters | 3,3 | 2 | 4 | Stewie | 1.2,4.4
3 | Stan | 2,96 | 3 | 1 | Peter | 5.7,0.9
4 | Stewie | 1,84 | 4 | 3 | Stan | 4.8,0.74
After analysing data in main the results get put into final.
As you can see final has an extra column (id_main) which points back to main.id
In actuality these 2 tables are 100 million+ rows each, my problem arises while performing SQL queries.
How should final especially (id & id_main) be configured so that Querying from main to final is the fastest.
Can I do away with final.id (PRIMARY KEY, Auto Increment) & keep
final.id_main (As an UNIQUE Index?)
OR
Should I keep id AS PRIMARY KEY (AI) & final.id_main AS UNIQUE Index?
I would be making calls like:
int id_From_Main= 10000;
SELECT `id_main` FROM `final` WHERE `id`='"+id_From_Main+"'
If there's a 1:1 relation between those tables, I don't see any reason why they would need two separate auto-incremented primary keys.
I would remove the final.id column and have the final.id_main as a non-auto-incremented primary key and a foreign key to the main.id column.
In general, you can also have a table without a primary key at all. It depends on if you want to be able to select specific individual rows or not.
I don't understand your query SELECT id_main FROM final WHERE id = '"+id_From_Main+"' — you're trying to select the value of ID from main by ID from main. What's the purpose, why are you trying to get the value you already have?
Anyway, you're not providing enough information to give you a qualified answer. You have to optimize you data structures according to queries you'll be doing.
Make sure you have indexes on columns which you are using in the WHERE clausule. If you're selecting by final.id_main, have an index on that column. If you're selecting by final.id_main and final.name, have a composite index on both columns, etc.
Do you really need to have the name column in both tables? It's a bad database design, unless it's some performance optimization (to avoid a join).
So, you should:
collect all queries you're currently using, set proper indexes according to them
remove any unnecessary columns (e.g. final.id, final.name)
use the EXPLAIN on your queries to get execution information (you can also use the Explain analyzer to help you interpret the results)
you can try query profiling
In mysql, you have to define id as PK because it is auto_increment. Define id_main as UNIQUE.

Very slow MySQL query performance

I've a query that takes about 18 seconds to finish:
THE QUERY:
SELECT YEAR(c.date), MONTH(c.date), p.district_id, COUNT(p.owner_id)
FROM commission c
INNER JOIN partner p ON c.customer_id = p.id
WHERE (c.date BETWEEN '2018-01-01' AND '2018-12-31')
AND (c.company_id = 90)
AND (c.source = 'ACTUAL')
AND (p.id IN (3062, 3063, 3064, 3065, 3066, 3067, 3068, 3069, 3070, 3071,
3072, 3073, 3074, 3075, 3076, 3077, 3078, 3079, 3081, 3082, 3083, 3084,
3085, 3086, 3087, 3088, 3089, 3090, 3091, 3092, 3093, 3094, 3095, 3096,
3097, 3098, 3099, 3448, 3449, 3450, 3451, 3452, 3453, 3454, 3455, 3456,
3457, 3458, 3459, 3460, 3461, 3471, 3490, 3491, 6307, 6368, 6421))
GROUP BY YEAR(c.date), MONTH(c.date), p.district_id
The commission table has around 2,8 millions of records, of which 860 000+ belong to the current year 2018. The partner table has at this moment 8600+ records.
RESULT
| `YEAR(c.date)` | `MONTH(c.date)` | district_id | `COUNT(c.id)` |
|----------------|-----------------|-------------|---------------|
| 2018 | 1 | 1 | 19154 |
| 2018 | 1 | 5 | 9184 |
| 2018 | 1 | 6 | 2706 |
| 2018 | 1 | 12 | 36296 |
| 2018 | 1 | 15 | 13085 |
| 2018 | 2 | 1 | 21231 |
| 2018 | 2 | 5 | 10242 |
| ... | ... | ... | ... |
55 rows retrieved starting from 1 in 18 s 374 ms
(execution: 18 s 368 ms, fetching: 6 ms)
EXPLAIN:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
|----|-------------|-------|------------|-------|------------------------------------------------------------------------------------------------------|----------------------|---------|-----------------|------|----------|----------------------------------------------|
| 1 | SIMPLE | p | null | range | PRIMARY | PRIMARY | 4 | | 57 | 100 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c | null | ref | UNIQ_6F7146F0979B1AD62FC0CB0F5F8A7F73,IDX_6F7146F09395C3F3,IDX_6F7146F0979B1AD6,IDX_6F7146F0AA9E377A | IDX_6F7146F09395C3F3 | 5 | p.id | 6716 | 8.33 | Using where |
DDL:
create table if not exists commission (
id int auto_increment
primary key,
date date not null,
source enum('ACTUAL', 'EXPECTED') not null,
customer_id int null,
transaction_id varchar(255) not null,
company_id int null,
constraint UNIQ_6F7146F0979B1AD62FC0CB0F5F8A7F73 unique (company_id, transaction_id, source),
constraint FK_6F7146F09395C3F3 foreign key (customer_id) references partner (id),
constraint FK_6F7146F0979B1AD6 foreign key (company_id) references companies (id)
) collate=utf8_unicode_ci;
create index IDX_6F7146F09395C3F3 on commission (customer_id);
create index IDX_6F7146F0979B1AD6 on commission (company_id);
create index IDX_6F7146F0AA9E377A on commission (date);
I noted that by removing the partner IN condition MySQL takes only 3s. I tried to replace it doing something crazy like this:
AND (',3062,3063,3064,3065,3066,3067,3068,3069,3070,3071,3072,3073,3074,3075,3076,3077,3078,3079,3081,3082,3083,3084,3085,3086,3087,3088,3089,3090,3091,3092,3093,3094,3095,3096,3097,3098,3099,3448,3449,3450,3451,3452,3453,3454,3455,3456,3457,3458,3459,3460,3461,3471,3490,3491,6307,6368,6421,'
LIKE CONCAT('%,', p.id, ',%'))
and the result was about 5s... great! but it's a hack.
WHY this query is taking a very long execution time when I uses IN statement? workaround, tips, links, etc. Thanks!
MySQL can use one index at a time. For this query you need a compound index covering the aspects of the search. Constant aspects of the WHERE clause should be used before range aspects like:
ALTER TABLE commission
DROP INDEX IDX_6F7146F0979B1AD6,
ADD INDEX IDX_6F7146F0979B1AD6 (company_id, source, date)
Here's what the Optimizer sees in your query.
Checking whether to use an index for the GROUP BY:
Functions (YEAR()) in the GROUP BY, so no.
Multiple tables (c and p) mentioned, so no.
For a JOIN, Optimizer will (almost always) start with one, then reach into the other. So, let's look at the two options:
If starting with p:
Assuming you have PRIMARY KEY(id), there is not much to think about. It will simply use that index.
For each row selected from p, it will then look into c, and any variation of this INDEX would be optimal.
c: INDEX(company_id, source, customer_id, -- in any order (all are tested "=")
date) -- last, since it is tested as a range
If starting with c:
c: INDEX(company_id, source, -- in any order (all are tested "=")
date) -- last, since it is tested as a range
-- slightly better:
c: INDEX(company_id, source, -- in any order (all are tested "=")
date, -- last, since it is tested as a range
customer_id) -- really last -- added only to make it "covering".
The Optimizer will look at "statistics" to crudely decide which table to start with. So, add all the indexes I suggested.
A "covering" index is one that contains all the columns needed anywhere in the query. It is sometimes wise to extend a 'good' index with more columns to make it "covering".
But there is a monkey wrench in here. c.customer_id = p.id means that customer_id IN (...) effectively exists. But now there are two "range-like" constraints -- one is an IN, the other is a 'range'. In some newer versions, the Optimizer will happily jump around due to the IN and still be able to do "range" scans. So, I recommend this ordering:
Test(s) of column = constant
Test(s) with IN
One 'range' test (BETWEEN, >=, LIKE with trailing wildcard, etc)
Perhaps add more columns to make it "covering" -- but don't do this step if you end up with more than, say, 5 columns in the index.
Hence, for c, the following is optimal for the WHERE, and happens to be "covering".
INDEX(company_id, source, -- first, but in any order (all "=")
customer_id, -- "IN"
date) -- last, since it is tested as a range
p: (same as above)
Since there was an IN or "range", there is no use seeing if the index can also handle the GROUP BY.
A note on COUNT(x) -- it checks that x is NOT NULL. It is usually just as correct to say COUNT(*), which counts the number of rows without any extra checking.
This is a non-starter since it hides the indexed column (id) in a function:
AND (',3062,3063,3064,3065,3066,...6368,6421,'
LIKE CONCAT('%,', p.id, ',%'))
With your LIKE-hack you are tricking optimizer so it uses different plan (most probably using IDX_6F7146F0AA9E377A index on the first place).
You should be able to see this in explain.
I think the real issue in your case is the second line of explain: server executing multiple functions (MONTH, YEAR) for 6716 rows and then trying to group all these rows. During this time all these 6716 rows should be stored (in memory or on disk that is based on your server configuration).
SELECT COUNT(*) FROM commission WHERE (date BETWEEN '2018-01-01' AND '2018-12-31') AND company_id = 90 AND source = 'ACTUAL';
=> How many rows are we talking about?
If the number in above query is much lower then 6716 I'd try to add covering index on columns customer_id, company_id, source and date. Not sure about the best order as it depends on data you have (check cardinality for these columns). I'd started with index (date, company_id, source, customer_id). Also, I'd add unique index (id, district_id, owner_id) on partner.
It is also possible to add additional generated stored columns _year and _month (if your server is a bit old you can add normal columns and fill them in with trigger) to rid off the multiple function executions.

MYSQL, PHP, order by not working, primary key

I am generating a mySQL query from PHP.
Part of the query re-orders a table based on some variables (which do not include the primary key).
The code doesn't produce errors, however the table is not sorted.
I echo'd out the SQL code, and it looks correct, I tried running it directly in phpMyAdmin, and it runs also without error, but the table is still not sorted as requested.
alter table anavar order by dset_name, var_id;
I am pretty sure that this has to do with the fact that I have a primary key variable (UID) which is not present in the sort.
Both prior and post running the query the table remains ordered by UID. Deleting UID and re-running the query results in a correctly sorted table, but this seems like an overkill solution.
Any suggestions?
create table t2
( id int auto_increment primary key,
someInt int not null,
thing varchar(100) not null,
theWhen datetime not null,
key(theWhen) -- creates an index on theWhen
);
-- my table now has 2 indexes on it
-- see it by running `show indexes from t2`
-- truncate table t2;
insert t2(someInt,thing,theWhen) values
(17,'chopstick','2016-05-08 13:00:00'),
(14,'alligator','2016-05-01'),
(11,'snail','2016-07-08 19:00:00');
select * from t2; -- returns in physical order (the primary key `id`)
select * from t2 order by thing; -- returns via thing, which has no index anyway
select * from t2 order by theWhen,thing; -- partial index use
note that indexes aren't even used until you have a significant number of rows in a db anyway
Edit (new data comes in)
insert t2 (someInt,thing,theWhen) values (777,'apple',now());
select t2.id,t2.thing,t2.theWhen,#rnk:=#rnk+1 as rank
from t2
cross join (select #rnk:=0) xParams
order by thing;
+----+-----------+---------------------+------+
| id | thing | theWhen | rank |
+----+-----------+---------------------+------+
| 2 | alligator | 2016-05-01 00:00:00 | 1 |
| 4 | apple | 2016-09-04 15:04:50 | 2 |
| 1 | chopstick | 2016-05-08 13:00:00 | 3 |
| 3 | snail | 2016-07-08 19:00:00 | 4 |
+----+-----------+---------------------+------+
Focus on the fact that you can maintain your secondary indices and generate a rank on the fly whenever you want.

mysql fast select query without reading all db

I have a large database with two tables: stat and total.
The example of the relation is the following:
STAT:
| ID | total event |
+--------+--------------+
| 7 | 2 |
| 8 | 1 |
TOTAL:
|ID | Event |
+---+--------------+
| 7 | "hello" |
| 7 | "everybody" |
| 8 | "hi" |
This is a very simplified version; also consider that STAT table could have 500K records, and for each STAT I can have about 200 TOTAL rows.
Currently, if I run a simple SELECT query in table TOTAL the system is terribly slow.
Could anyone help me with some advice for the creation of the TOTAL table? Is it possible to say to MySQL that the id column is already sorted so that there is no reason to scan all the rows till the end where, for example, id=7?
Add INDEX(ID) to your tables (both), if you did not already.
SELECT COUNT(*) FROM TOTAL WHERE ID=7 -> if ID is indexed, this will be fast.
You can add an index, and furthermore you can partition your table.
As per #ypercube's comment, tables are not stored in a sorted state, so one cannot "tell" this to the database. However you can add an index on tables to make them faster to search.
One important thing to check - it looks like TOTAL.ID is intended as a foreign key - if so, the table TOTAL should have a primary key called ID. Rename the existing column of that name to STAT_ID instead, so it is obvious what it is a foreign key for. Then add an index on STAT_ID.
Lastly, as a point of style, I recommend that you make your table and column names case-insensitive, and write them in lower-case. It makes it easier to read SQL when keywords are in upper case, and database objects are in lower.