Assistance in improving a query's performance - MySQL

Overview:
I have a system that builds query statements, some of which must join certain tables to others based on parameters passed into the system. When running performance tests on the generated queries, I noticed that some of them were doing FULL TABLE SCANS, which, from what I've read, is not good for large tables.
What I'm trying to do:
1 - Remove the full table scans
2 - Speed up the Query
3 - Find out if there is a more efficient query I can have the system build instead
The Query:
SELECT a.p_id_one, b.p_id_two, b.fk_id_one, c.fk_id_two, d.fk_id_two,
d.id_three, d.fk_id_one
FROM ATable a
LEFT JOIN BTable b ON a.p_id_one = b.fk_id_one
LEFT JOIN CTable c ON b.p_id_two = c.fk_id_two
LEFT JOIN DTable d ON b.p_id_two = d.fk_id_two
WHERE a.p_id_one = 1234567890
The Explain
Query Time
Showing rows 0 - 10 (11 total, Query took 0.0016 seconds.)
Current issues:
1 - Query time for my system/DBMS (phpMyAdmin) is between 0.0013 and 0.0017 seconds.
What have I done to fix this?
The full table scans ('ALL' access type) are being run on tables 'BTable' and 'DTable', so I've tried to use FORCE INDEX on the appropriate ids.
Using FORCE INDEX removes the full table scans, but it doesn't improve performance.
I double checked my fk_constraints and index relationships to ensure I'm not missing anything. So far everything checks out.
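For reference, the checks can be reproduced with standard MySQL commands (table names from the query above):
SHOW CREATE TABLE BTable;
SHOW INDEX FROM BTable;
SHOW INDEX FROM DTable;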
2 - Advisor shows multiple warnings; a few relate back to the full table scans and the indexes.
Question(s):
Assume all indexes are available and created
1 - Is there a better way to perform this query?
2 - How many joins are too many joins?
3 - Could the joins be the problem?
4 - Does the issue rest within the WHERE clause?
5 - What optimization technique/tool could I have missed?
6 - How can I get this query to run in between 0.0001 and 0.0008 seconds?
If images and visuals are needed to help clarify my situation, please ask in a comment below. I appreciate any and all assistance.
Thank you =)

"p_id_one" does not tell us much. Is this an auto_increment? Real column names sometimes gives important clues of cardinality and intent. As Willem said, "there must be more to this issue" and "what is the overall problem".
LEFT -- do you need it? It prevents certain forms of optimizations; remove it if the 'right' table row is not optional.
WHERE a.p_id_one = 1234567890 needs INDEX(p_id_one). Is that the PRIMARY KEY already? In that case, an extra INDEX is not needed. (Please provide SHOW CREATE TABLE.)
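If p_id_one is not already the PRIMARY KEY, a minimal sketch of adding the index (the index name is illustrative):
ALTER TABLE ATable ADD INDEX idx_p_id_one (p_id_one);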
Are those really the columns/expressions you are SELECTing? It can make a difference -- especially when suggesting a "covering index" as an optimization.
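For illustration only (assuming the SELECT list shown in the question), a covering index on BTable might look like:
ALTER TABLE BTable ADD INDEX idx_b_cover (fk_id_one, p_id_two);
That would let the join on fk_id_one and the selected column p_id_two both be satisfied entirely from the index, without touching the row.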
Please provide the output from EXPLAIN SELECT ... (That is not what you provided.) That output would help with clues of 1:many, cardinality, etc.
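That is, the exact statement to run, using the query from the question:
EXPLAIN SELECT a.p_id_one, b.p_id_two, b.fk_id_one, c.fk_id_two, d.fk_id_two, d.id_three, d.fk_id_one
FROM ATable a
LEFT JOIN BTable b ON a.p_id_one = b.fk_id_one
LEFT JOIN CTable c ON b.p_id_two = c.fk_id_two
LEFT JOIN DTable d ON b.p_id_two = d.fk_id_two
WHERE a.p_id_one = 1234567890;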
If these are FOREIGN KEYs, you already have indexes on b.fk_id_one, c.fk_id_two, and d.fk_id_two; so there is nothing more to do there.
1.6ms is an excellent time for a query involving 4 tables. Don't plan on speeding it up significantly. You could probably handle hundreds of connections doing thousands of similar queries per second. Do you need more than that?
Are you using InnoDB? That is better at concurrent access.
Your example does not seem to have any full table scans; please provide an example that does.
ALL on a 10-row table is nothing to worry about; on a million-row table it is a big deal. Will your tables grow significantly? Keep this in mind when worrying about ALL: a full table scan is sometimes faster than using the 'perfect' index. The optimizer decides on the scan when the estimated number of rows is more than about 20% of the table. A table scan is efficient because it reads straight through the table, even if it skips 80% of the rows. Using an index is more complex: the index is scanned, but for each row found in the index, a lookup is needed into the data to find the row. If you see ALL when you don't think you should, it is probably because the index is not very selective. Don't worry.
Don't use FORCE INDEX -- although it may help the query with today's values, it may hurt tomorrow's query.
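If the optimizer seems to be choosing badly, a gentler alternative (a sketch, assuming stale statistics are the cause) is to refresh the index statistics instead of forcing:
ANALYZE TABLE BTable, DTable;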

Related

Group by, Order by and Count MySQL performance

I have the following query to get the 15 most-sold plates in a place:
This query takes 12 seconds to execute over 100,000 rows. I think that is too long, so I am searching for a way to optimize the query.
I ran the EXPLAIN command in phpMyAdmin and I got this:
(screenshot of the EXPLAIN output)
According to this, the main problem is on the p table, which is scanning the entire table, but how can I fix this? The id of the p table is a primary key; do I need to set it as an index as well? Also, is there anything else I can do to make the query run faster?
You can make a relationship between the two tables.
https://database.guide/how-to-create-a-relationship-in-mysql-workbench/
Besides this, you can also use a LEFT JOIN so you won't load in the whole right table.
ORDER BY is slow in MySQL; if you are processing the results in code afterwards, you can sort there instead, which is much faster than ORDER BY.
I hope I helped; community, feel free to edit :)
You did include the explain plan, but you did not give any information about your table structure, data distribution, cardinality, or volumes. Assuming your indices are accurate and you have an even data distribution, the query is having to process over 12 million rows, not 100,000. Even then, that is relatively poor performance, but you never told us what hardware this sits on or what the background load is.
A query with so many joins is always going to be slow - are they all needed?
the main problem is on the p table which is scanning the entire table
Full table scans are not automatically bad. The cost of dereferencing an index lookup, as opposed to a streaming read, is about 20 times higher. Since the only constraint you apply to this table is its joins to other tables, there's nothing in the question you asked to suggest there is much scope for improving this.

Explain query - MySQL not using index from table

I'm trying to learn the explain statement in MySQL but ran into a wall.
For my experiment, I created two tables (each having 10 rows) and ran EXPLAIN over a simple join. Naturally, no indexes were used and 10*10 = 100 rows were scanned. (I've added the output as images because the very long EXPLAIN output was being wrapped on itself. The code is also in this pastebin):
I then added primary keys and indexes and reissued the explain command:
But as you can see, the users table is still being fully scanned by MySQL, as if there was no primary key. What is going wrong?
This is a bit long for a comment.
Basically, your tables are too small. You cannot get reasonable performance indications on such small data -- the query only needs to load two data pages into memory. A nested loop join requires 100 comparisons. By comparison, loading the indexes and doing the binary searches is probably about the same amount of effort, if not more.
If you want to get a feel for explain, then use tables with a few tens of thousands of rows.
You seem to be asking about EXPLAIN, INDEXing, and optimizing particular SELECTs.
For this:
select u.name
from users as u
join accounts as a on u.id = a.user_id
where a.amount > 1000;
the optimizer will pick between users and accounts for which table to look at first. Then it will repeatedly reach into the other table.
Since you say a.amount > ... but nothing about u, the optimizer is very likely to pick a first.
If a.amount > 1000 is selective enough (less than, say, 20% of the rows) and there is INDEX(amount), it will use that index. Else it will do a table scan of a.
To reach into u, it needs some index starting with id. Keep in mind that a PRIMARY KEY is an index.
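A hedged sketch of the index that reasoning suggests (the index names are illustrative; the tables are the ones from the query above):
ALTER TABLE accounts ADD INDEX idx_amount (amount);
-- or a covering variant for this particular query:
ALTER TABLE accounts ADD INDEX idx_amount_user (amount, user_id);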
This, and many more basics, are covered in my index cookbook.
See also myxlpain for a discussion of EXPLAIN.
Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.
EXPLAIN FORMAT=JSON SELECT... is also somewhat cryptic, but it does have more details than a regular EXPLAIN.
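For example, with the query above:
EXPLAIN FORMAT=JSON
select u.name
from users as u
join accounts as a on u.id = a.user_id
where a.amount > 1000;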
Well,
since your main filter uses the '>' comparison operator, it does a full table scan, because it may or may not return all rows.
As you join the 'accounts' table on the 'user_id' column, it shows the 'user_id' index in possible_keys, but it doesn't use it, because of the full table scan.

MySQL: SELECT millions of rows

I have a dataset of about 32Million rows that I'm trying to export to provide some data for an analytics project.
Since my final data query will be large, I'm trying to limit the number of rows I have to work with initially. I'm doing this by running a CREATE TABLE ... SELECT on the main table (32 million records) with a join on another table that's about 5k records. I made indexes on the columns where the JOIN takes place, but not on the other WHERE conditions. This query has been running for over 4 hours now.
What could I have done to speed this up, and if there is something, would it be worth it to stop this query, make the change, and start over? The data set is static and I'm not worried about preserving anything or proper database design long-term. I just need to get the data out and will discard the schema.
A simplified version of the query is below
CREATE TABLE RELEVANT_ALERTS
SELECT a.time, s.name,s.class, ...
FROM alerts a, sig s
WHERE a.IP <> 0
AND a.IP not between x and y
AND s.class in ('c1','c2','c3')
First of all, try EXPLAIN SELECT to see what is going on. Are your indexes properly set up?
Also, you are not joining the two tables on their primary keys; is that on purpose? Where are your primary key and foreign key?
Can you also provide us with a table schema?
Also, could your hardware be the problem? How much RAM and processing power does it have? I hope you are not running this on a single-core processor, as that is bound to take a long time.
I have a table with 2,000,000,000 rows (2 billion rows, 219 GB) and it takes no more than 0.3 seconds to execute a query similar to yours with properly set up indexes. This is on an 8-core (2 GHz) processor with 64 GB RAM. So it's not the beefiest setup for the size of the database, but the indexes are held in memory, so the queries can be fast.
It should not take that long. Can you please make sure you have indexes on a.IP and s.class?
Also, can't you put the a.IP <> 0 comparison after a.IP NOT BETWEEN x AND y, so you already have a filtered set for the 0 comparison (as that will compare every single record, I believe)?
You could also move s.class up to be the first comparison, depending on how many rows the s table has, to really speed up the comparison.
Your join seems to be a full cross join. That will take a very long time in any case. Is there no common field in the two tables? Why do you need this join? If you really want to do this, you should first create two tables from alerts and sig that fulfill your WHERE conditions, and then join the resulting tables if you must.
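A hedged sketch of the shape such a rewrite could take; the joining column sig_id is purely an assumption, since the real common column (if there is one) has to come from your schema:
CREATE TABLE RELEVANT_ALERTS
SELECT a.time, s.name, s.class
FROM alerts a
JOIN sig s ON s.sig_id = a.sig_id  -- assumed join column
WHERE a.IP <> 0
AND a.IP NOT BETWEEN x AND y
AND s.class IN ('c1','c2','c3');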
Agree with Vish.
In addition, depending on your query workload, you could probably change the storage engine to MyISAM if it is currently InnoDB, since MyISAM is more optimized for read-only queries.
ALTER TABLE my_table ENGINE = MyISAM;
Also, you could change the isolation level of your database. For example, to set isolation level to read uncommitted:
SET tx_isolation = 'READ-UNCOMMITTED';
First try EXPLAIN SELECT to see what is slowing it down, then try adding some indexes if you don't have any.
Trust me, 4 hours is very normal: you have a table of 32 million rows, and with the cross join you multiply 32 million by 5,000, so your query has a complexity on the order of 32,000,000 * 5,000 = 160 billion row combinations...
To avoid that, I suggest you use an ETL workflow, like Microsoft SSIS...
With SSIS you can greatly reduce the query time...

Optimizing the Joining of Multiple MySQL Views

I have multiple views in my database that I am trying to perform a JOIN on when certain queries get very complex. As a worst case I would have to join 3 views with the following stats:
View 1 has 60K+ rows with 26 fields.
View 2 has 60K+ rows with 15 fields.
View 3 has 80K+ rows with 8 fields.
Joining views 1 and 2 seems to be no problem, but any time I try to join the third view, the query hangs. I'm wondering if there are any best practices I should be following to keep these queries from hanging. I've tried to use the smallest field types possible (medium/small ints where possible, etc.).
We are using MySQL 5.0.92 community edition with MyISAM tables. Not sure if InnoDB would be more efficient.
As a last resort I'm thinking of splitting the one query into two: hitting views 1 & 2 with the first query, and then view 3 separately with the second. Is there any downside to this other than making 2 queries?
Thanks.
You need to use EXPLAIN to understand why the performance is poor.
I wouldn't think you need to worry about MyISAM vs. InnoDB for this particular read performance just yet. MyISAM versus InnoDB
I am going to post my comments as an answer:
1) Take a look at the EXPLAIN command and see what it says.
2) Check the performance of the individual views. Are they as fast as you think on their own?
3) For the columns you are using in your WHERE or JOIN clauses, do the underlying tables have indexes that apply to them? Something to keep in mind:
A composite index (an index with more than one column) with columns (a, b) would not help when you query only for b. It helps with a, and with a + b, but not with only b. That's why the single index you added improved the situation.
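As a quick illustration of that point (table and column names are hypothetical):
ALTER TABLE t ADD INDEX idx_ab (a, b);
SELECT * FROM t WHERE a = 1;            -- can use idx_ab
SELECT * FROM t WHERE a = 1 AND b = 2;  -- can use idx_ab
SELECT * FROM t WHERE b = 2;            -- cannot use idx_ab; would need INDEX(b)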
4) Are you using all the columns and all the views? If not, wouldn't it be simpler to take a look at the views and come up with a single query instead?
If it's possible to get how the original VIEWs are defined, then using that as a basis to create your own single query might be a better approach... A while back, another person had similar issues with their query. They needed to go back to the raw table behind one such view to ensure it had proper indexes to support the optimization of the query they were trying to perform. Remember that a view is a subset of something else and does not have an index of its own to work with. So, if you can't take advantage of an index on the root table of a view, you can see this kind of performance hit.

MySQL indexes - how many are enough?

I'm trying to fine-tune my MySQL server, so I'm checking my settings, analyzing the slow-query log, and simplifying my queries where possible.
Sometimes correct indexing is enough, sometimes not. I've read somewhere (please correct me if this is wrong) that having more indexes than I need can have the same effect as having no indexes at all.
How many indexes are enough? You can say it depends on hundreds of factors, but I'm curious how I can clean up my mysql-slow.log enough to reduce server load.
Furthermore, I saw some "interesting" log entries like this:
# Query_time: 0 Lock_time: 0 Rows_sent: 22 Rows_examined: 44
SELECT * FROM `categories` ORDER BY `orderid` ASC;
The table in question contains exactly 22 rows, with an index set on orderid. Why is this query showing up in the log at all? And why examine 44 rows if the table only contains 22?
How much indexing is appropriate, and where the line of doing too much falls, will depend on a lot of factors. On small tables like your "categories" table you usually don't want or need an index, and it can actually hurt performance. The reason is that it takes I/O (i.e. time) to read an index and then more I/O and time to retrieve the records associated with the matched rows. An exception is when you only query the columns contained within the index.
In your example you are retrieving all the columns, and with only 22 rows it may be faster to just do a table scan and sort those rows than to use the index. The optimizer may/should be doing this and ignoring the index. If that is the case, then the index is just taking up space with no benefit. If your "categories" table is accessed often, you may want to consider pinning it in memory so the db server keeps it accessible without having to go to the disk all the time.
When adding indexes you need to balance disk space, query performance, and the performance of updating and inserting into the tables. You can get away with more indexes on tables that are static and don't change much, as opposed to tables with millions of updates a day; at that point you'll start feeling the effects of index maintenance. What is acceptable in your environment, though, can only be determined by you and your organization.
When doing your analysis, be sure to generate/update your table and index statistics so that you can be assured of accurate calculations.
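In MySQL that is typically done with ANALYZE TABLE, e.g. for the table from the question:
ANALYZE TABLE categories;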
As a general rule, you should have indexes on all primary keys (you don't have a choice in that), all foreign keys, and any other fields you commonly use to fetch rows.
For example, if I commonly look up users by username, I would have that indexed, even if user ID was the primary key.
How many indexes you need depends entirely on the queries you're running, what kinds of joins are being done (if any), the kind of data stored in the table, and how big the tables are (as well as many other factors). There's really no exact science to it. The greatest tool in your arsenal for figuring out how to optimize a query is EXPLAIN. Using EXPLAIN you can find out what kind of joins are being done, what possible keys could be used, which key (if any) was used, and how many rows were examined for each table in the join.
Using this information you can decide how to key your tables and/or modify your queries to make them more efficient. The syntax for explain is very simple.
EXPLAIN SELECT * FROM `categories` ORDER BY `orderid` ASC;
Note that EXPLAIN does not actually run the query, so if you're using it to debug a query that takes 5 minutes to run, EXPLAIN will still be very fast.
You do need to be careful when adding indexes, though, as they do cause inserts and updates to go slower, and on very large tables this performance hit can become noticeable, especially if the same table is used for a lot of reads. While adding a lot of indexes generally won't kill the performance of a query, you should still only add them as you need them.
Also keep in mind that MySQL will use a maximum of one index per table in a SELECT statement (if you are using a join, it can use one for each joined table). So indexing "just because" is a waste of disk space and will slow the database down on writes. If you commonly use a WHERE clause on two columns, create one index containing both of those columns; it will be significantly faster than indexing just one of them alone.
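A minimal sketch of that suggestion (table and column names are hypothetical):
ALTER TABLE orders ADD INDEX idx_cust_status (customer_id, status);
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';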
An index can speed up a SELECT query, but it will slow down INSERT/UPDATE/DELETE queries because they need to update the index as well, not just the row.
This is just personal opinion (I've got no facts to back it up), but I think that if there is a query that is taking a long time and an index would speed it up - go for it! "Too many" indexes would be if you added indexes that didn't do any good (e.g. there were no queries it would speed up). For example, a silly thing to do would be to place an index on every column "just because".
There's no magic number for the "best" number of indexes. The basic rule is this: add indexes for queries that are used often and/or need to run quickly.
Having "too many" indexes shouldn't slow down queries, but each index added adds a small amount of time to inserts and updates in the db (since the indexes must be modified as well), and a small amount of space. However, if you're just adding indexes as required, this is probably not a big concern.