How to do performance tuning for a huge MySQL table?

I have a MySQL table MtgoxTrade(id,time,price,amount,type,tid) with more than 500M records, and I need to query three fields (time,price,amount) from all records:
SELECT time, price, amount FROM MtgoxTrade;
It takes 110 seconds on Win7, which is too slow. My questions are:
Will a compound index help here? Note that my SQL query has no WHERE clause.
Could any other optimization be made to improve the query performance here?
Update: Sorry, the MtgoxTrade table has six fields in total: (id,time,price,amount,type,tid). My SQL only needs to query three fields (time,price,amount). I already tried adding a composite index on (time,price,amount), but it doesn't seem to help.

If this is your real query - NO, nothing could possibly help. Come to think of it - you are asking it to deliver the contents of the whole 500M+ row table! It will be slow no matter what you do - the whole table must be processed.
If you can constrain your program logic to only process some smaller subset of your table, then it is possible to make it faster.
For example, you can process only the results for the last month using a WHERE clause:
SELECT time, price, amount
FROM MtgoxTrade
WHERE time BETWEEN '2013-09-01' AND '2013-09-21'
This can work really fast, but you would still need to add an index on the time field, like this:
CREATE INDEX mtgoxtrade_time_idx ON mtgoxtrade (time);
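If your application really does need to walk all the rows, one way to keep each individual query small is to page through the table in primary-key batches. A rough sketch, assuming id is an auto-increment primary key and an arbitrary batch size:
SELECT id, time, price, amount
FROM MtgoxTrade
WHERE id > 0        -- replace 0 with the largest id seen in the previous batch
ORDER BY id
LIMIT 100000;       -- batch size is arbitrary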

Related

Group by, Order by and Count MySQL performance

I have the following query to get the 15 most sold plates in a place:
This query takes 12 seconds to execute over 100,000 rows. I think that is too long, so I am looking for a way to optimize the query.
I ran the EXPLAIN SQL command in phpMyAdmin and I got this:
(screenshot of the EXPLAIN output)
According to this, the main problem is on the p table, which is scanning the entire table, but how can I fix this? The id of the p table is a primary key; do I need to also set it as an index? Also, is there anything else I can do to make the query run faster?
You can make a relationship between the two tables.
https://database.guide/how-to-create-a-relationship-in-mysql-workbench/
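For example (the table and column names here are only placeholders, since the actual query is not shown), a foreign key on the join column also gives you an index the join can use:
ALTER TABLE orders
  ADD CONSTRAINT fk_orders_plate
  FOREIGN KEY (plate_id) REFERENCES plates (id);
-- InnoDB creates an index on orders.plate_id for this foreign key if one
-- does not already exist, so the join no longer scans the whole table.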
Besides this, you can also use a LEFT JOIN so you won't load in the whole right table.
ORDER BY is slow in MySQL; if you are processing the results in code afterwards, you can sort there instead, which is much faster than ORDER BY.
I hope I helped, and community, feel free to edit :)
You did include the explain plan, but you did not give any information about your table structure, data distribution, cardinality, or volumes. Assuming your indices are accurate and you have an even data distribution, the query is having to process over 12 million rows - not 100,000. Even then, that is relatively poor performance. You also never told us what hardware this sits on or what the background load is.
A query with so many joins is always going to be slow - are they all needed?
the main problem is on the p table which is scanning the entire table
Full table scans are not automatically bad. Dereferencing an index lookup costs roughly 20 times more than a streaming read. Since the only constraint you apply to this table is its joins to other tables, there is nothing in the question you asked to suggest there is much scope for improving this.

Assistance in improving a query's performance

Overview:
I have a system that builds query statements, some of which must join tables to others based on parameters passed into the system. When running performance tests on the generated queries, I noticed that some of them were doing FULL TABLE SCANS, which, from what I've read, is not good for large tables in many cases.
What I'm trying to do:
1 - Remove the full table scans
2 - Speed up the Query
3 - Find out if there is a more efficient query I can have the system build instead
The Query:
SELECT a.p_id_one, b.p_id_two, b.fk_id_one, c.fk_id_two, d.fk_id_two,
d.id_three, d.fk_id_one
FROM ATable a
LEFT JOIN BTable b ON a.p_id_one = b.fk_id_one
LEFT JOIN CTable c ON b.p_id_two = c.fk_id_two
LEFT JOIN DTable d ON b.p_id_two = d.fk_id_two
WHERE a.p_id_one = 1234567890
The Explain
Query Time
Showing rows 0 - 10 (11 total, Query took 0.0016 seconds.)
Current issues:
1 - The query takes between 0.0013 and 0.0017 seconds in my system/DBMS (phpMyAdmin).
What have I done to try to fix it?
The full table scans, or 'ALL' type queries, are being run on tables ('BTable', 'DTable'), so I've tried to use FORCE INDEX on the appropriate ids.
Using FORCE INDEX removes the full table scans, but it doesn't speed up the performance.
I double checked my fk_constraints and index relationships to ensure I'm not missing anything. So far everything checks out.
2 - The Advisor shows multiple warnings, a few of which relate back to the full table scans and the indexes.
Question(s):
Assume all indexes are available and created
1 - Is there a better way to perform this query?
2 - How many joins are too many joins?
3 - Could the joins be the problem?
4 - Does the issue rest within the WHERE clause?
5 - What optimize technique/tool could I have missed?
6 - How can I get this query to perform at a speed between 0.0001 and 0.0008 seconds?
If images and visuals are needed to help clarify my situation please do ask in a comment below. I appreciate any and all assistance.
Thank you =)
"p_id_one" does not tell us much. Is this an auto_increment? Real column names sometimes give important clues about cardinality and intent. As Willem said, "there must be more to this issue" and "what is the overall problem".
LEFT -- do you need it? It prevents certain forms of optimizations; remove it if the 'right' table row is not optional.
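For example, if matching rows in BTable, CTable, and DTable are always present for a given a row, the same query can be written with plain inner joins:
SELECT a.p_id_one, b.p_id_two, b.fk_id_one, c.fk_id_two, d.fk_id_two,
       d.id_three, d.fk_id_one
FROM ATable a
JOIN BTable b ON a.p_id_one = b.fk_id_one
JOIN CTable c ON b.p_id_two = c.fk_id_two
JOIN DTable d ON b.p_id_two = d.fk_id_two
WHERE a.p_id_one = 1234567890;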
WHERE a.p_id_one = 1234567890 needs INDEX(p_id_one). Is that the PRIMARY KEY already? In that case, an extra INDEX is not needed. (Please provide SHOW CREATE TABLE.)
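If you are not sure, something along these lines would check, and only add the index if it is actually missing (the index name is just a placeholder):
SHOW CREATE TABLE ATable;
-- Only if p_id_one is not already the PRIMARY KEY or otherwise indexed:
CREATE INDEX idx_atable_p_id_one ON ATable (p_id_one);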
Are those really the columns/expressions you are SELECTing? It can make a difference -- especially when suggesting a "covering index" as an optimization.
Please provide the output from EXPLAIN SELECT ... (that is not what you provided). That output would help with clues about 1:many relationships, cardinality, etc.
If these are FOREIGN KEYs, you already have indexes on b.fk_id_one, c.fk_id_two, and d.fk_id_two, so there is nothing more to do there.
1.6ms is an excellent time for a query involving 4 tables. Don't plan on speeding it up significantly. You can probably handle hundreds of connections doing thousands of similar queries per second. Do you need more than that?
Are you using InnoDB? That is better at concurrent access.
Your example does not seem to have any full table scans; please provide an example that does.
ALL on a 10-row table is nothing to worry about. On a million-row table it is a big deal. Will your tables grow significantly? You should note this when worrying about ALL: a full table scan is sometimes faster than using the 'perfect' index. The optimizer decides on the scan when the estimated number of rows is more than about 20% of the table. A table scan is efficient because it reads straight through the table, even if it skips 80% of the rows. Using an index is more complex -- the index is scanned, but for each row found in the index, a lookup is needed into the data to find the row. If you see ALL when you don't think you should, then probably the index is not very selective; don't worry.
Don't use FORCE INDEX -- although it may help the query with today's values, it may hurt tomorrow's query.

mysql determining which part of a query is the slowest

I've written a SELECT statement in MySQL. The duration is 50 seconds, and the fetch is 206 seconds. This is a long time. I'd like to understand WHICH part of my query is inefficient so I can improve its run time, but I'm not sure how to do that in MySQL.
My table has a little over 1,000,000 records. I have an index built in as well:
KEY `idKey` (`id`,`name`),
Here is my query:
SELECT name, id, alt_id, count(id), min(cost), avg(resale), code
FROM history
WHERE name LIKE "%brian%"
GROUP BY id;
I've looked at the MySQL execution plan, but I can't tell from it what is wrong:
If I hover over the "Full Index Scan" part of the image, I see this:
Access Type: Index
Full Index Scan
Key/Index:
Used Key Parts: id, name
Possible Keys: idKey, id-Key, nameKey
Attached Condition:
(`allhistory`.`history`.`name` LIKE '%brian%')
Rows Examined Per Scan: 1098181
Rows Produced Per Join: 1098181
Filter: 100%
I know I can just scan a smaller subset of the data by adding LIMIT 100 to the query, and while that makes the time much shorter (28 second duration, 0.000 sec fetch), I also want to see all the records - so I don't really want to put a limit on it.
Can someone more knowledgeable on this topic suggest where my query, my index, or my methodology might be inefficient for what I'm trying to accomplish?
This question only has a solution in MySQL's full-text search functionality.
I don't consider the use of LIKE a workable solution. Table scans are not a solution with millions of rows.
I wrote up an answer at this link; I hope you find a workable solution for yours with that reference and quick walk-through.
Here is one of the MySQL manual pages on Full Text Search.
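A rough sketch of that approach (the index name is arbitrary; note that full-text search needs MyISAM or InnoDB on MySQL 5.6+, and it matches whole words rather than arbitrary substrings the way LIKE '%brian%' does):
ALTER TABLE history ADD FULLTEXT INDEX ft_name (name);

SELECT name, id, alt_id, count(id), min(cost), avg(resale), code
FROM history
WHERE MATCH(name) AGAINST ('brian')
GROUP BY id;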
I'm thinking your covering index may be backwards. Try switching the order to (name, id). That way the WHERE clause can take advantage of the index.
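If you want to try that, the change would be something like the statement below; note that with a leading-wildcard LIKE, MySQL can still only scan the index rather than jump straight to matching rows, so verify with EXPLAIN whether it actually helps:
ALTER TABLE history DROP KEY idKey, ADD KEY idKey (name, id);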

Does it improve performance to index a date column?

I have a table with millions of rows where one of the columns is a TIMESTAMP and against which I frequently select for date ranges. Would it improve performance any to index that column, or would that not furnish any notable improvement?
EDIT:
So, I've indexed the TIMESTAMP column. The following query
select count(*) from interactions where date(interaction_time) between date('2013-10-10') and date(now())
Takes 3.1 seconds.
There are just over 3 million records in the interactions table.
The above query produces a result of ~976k
Does this seem like a reasonable amount of time to perform this task?
If you want to improve the efficiency of your queries, you need 2 things:
First, index the column.
Second, and this is more important, make sure the conditions on your queries are sargable, i.e. that indexes can be used. In particular, functions should not be used on the columns. In your example, one way to write the condition would be:
WHERE interaction_time >= '2013-10-10'
AND interaction_time < (CURRENT_DATE + INTERVAL 1 DAY)
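Putting the two together (the index name is just an example), the index and the rewritten query would look like this:
CREATE INDEX idx_interactions_time ON interactions (interaction_time);

SELECT COUNT(*)
FROM interactions
WHERE interaction_time >= '2013-10-10'
  AND interaction_time < (CURRENT_DATE + INTERVAL 1 DAY);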
The general rule with indexes is that they speed up retrieval of data in large data sets, but SLOW the insertion and update of records.
If you have millions of rows and need to select a small subset of them, then an index most likely will improve performance when doing a SELECT. (If you need most or all of them, it will make little or no difference.)
Without an index, a table scan (i.e. a read of every record to locate the required ones) will occur, which can be slow.
With tables with only a few records, a table scan can actually be faster than an index, but this is not your situation.
Another consideration is how many distinct values you have. If you only have a handful of different dates, indexing probably won't help much, if at all; however, if you have a wide range of dates, the index will most likely help.
One caveat: if the index is very big and won't fit in memory, you may not get the performance benefits you might hope for.
You also need to consider what other fields you are retrieving, joins, etc., as they all have an impact.
A good way to check how performance is impacted is to use the EXPLAIN statement to see how MySQL will execute the query.
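For example, prefixing the original query with EXPLAIN and comparing the output before and after creating the index will show whether the index is picked up (with the date() wrapper from the original query it typically will not be, which ties back to the sargability point above):
EXPLAIN SELECT count(*) FROM interactions
WHERE date(interaction_time) BETWEEN date('2013-10-10') AND date(now());
-- Check the type, key, and rows columns in the output.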
It would improve performance if:
there are at least "several" different values
your query uses a date range that would select less than "most" of the rows
To find out for sure, use EXPLAIN to show what index is being used. Use EXPLAIN before creating the index and again after - you should see whether or not the new index is being used. If it is being used, you can be confident performance is better.
You can also simply compare query timings.
For
select count(*) from interactions where date(interaction_time) between date('2013-10-10') and date(now())
query to be optimized, you need to do the following:
Use just interaction_time instead of date(interaction_time)
Create an index that covers the interaction_time column
(optional) Use just '2013-10-10' instead of date('2013-10-10')
You need #1 because indexes are only used if the columns are compared as-is, not as arguments to other expressions.
Adding an index on a date column definitely increases performance.
My table has 11 million rows, and a query to fetch rows which were updated on a particular date took the following times, depending on conditions:
Without index: ~2.5s
With index: ~5ms

Retrieve min and max values from different tables with the same structure

I have some log tables with the same structure. Each table is related to a site and contains billions of entries. The reason for this split is to allow quick and efficient queries, because 99.99% of the queries are related to a single site.
But now I would like to retrieve the min and max values of a column across these tables.
I can't manage to write the SQL query. Should I use UNION?
I am just looking for the concept, not the final SQL query.
You could use a UNION, yes. Something like this should do:
SELECT MAX(PartialMax) AS TotalMax
FROM
( SELECT MAX(YourColumn) AS PartialMax FROM FirstTable
  UNION ALL
  SELECT MAX(YourColumn) AS PartialMax FROM SecondTable ) AS X;
If you have an index on the column you want the MAX of, you should get very good performance, as the query should seek to the end of the index on that column and find the maximum value very rapidly. Without an index on that column, the query has to scan the whole table to find the maximum value, since nothing inherently orders it.
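If such indexes do not exist yet, they would look something like this (using the placeholder names from above):
CREATE INDEX idx_firsttable_yourcolumn ON FirstTable (YourColumn);
CREATE INDEX idx_secondtable_yourcolumn ON SecondTable (YourColumn);
-- Each MAX(YourColumn) can then seek to the end of its index instead of scanning the table.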
Added some details to address a concern about "enormous queries".
I'm not sure what you mean by "enormous". You could create a VIEW that does the UNIONs for you; then, you use the view and it will make the query very small:
SELECT MAX(YourColumn) FROM YourView;
but that just optimizes for the size of your query's text. Why do you believe it is important to optimize for that? The VIEW can be helpful for maintenance -- if you add or remove a partition, just fix the view appropriately. But a long query text shouldn't really be a problem.
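As a sketch, such a view could union the per-table maximums (placeholder names again), so the short query above stays efficient:
CREATE VIEW YourView AS
  SELECT MAX(YourColumn) AS YourColumn FROM FirstTable
  UNION ALL
  SELECT MAX(YourColumn) AS YourColumn FROM SecondTable;
-- Each branch is a fast index seek; SELECT MAX(YourColumn) FROM YourView
-- then only has to look at one row per underlying table.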
Or by "enormous", are you worried about the amount of I/O the query will do? Nothing can help much with that, aside from making sure each table has an index on YourColumn so that the maximum value in each partition can be found very quickly.