I have a table with 10 columns that contains about 8 million rows. I'm doing a statistical job with this table. The problem is that the longer the script runs, as the id pointer grows, the slower the select query gets.
Here is the query:
SELECT * FROM transaction
WHERE id > :pointer
  AND col_a = :col_a
  AND col_b >= :from
ORDER BY id ASC LIMIT 5000
All 3 fields in the query have indexes created on them.
After each loop, I will run the query again with a new pointer value; the pointer is the id of the last row of the previous result set. I don't use OFFSET.
In the end, the script took half a day to run with this query, which is too long.
So how can I fix this performance problem?
All 3 fields in the query have indexes created on them
MySQL can generally use only one index per table in a query. If you created separate indexes for each field, then MySQL can use only one of them to speed up your query, not all 3.
I would create a multi-column index on the id, col_a, col_b fields (in this order). This way a single index can satisfy all 3 conditions in the WHERE clause and the ORDER BY as well.
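A minimal sketch of creating such an index (idx_pager is a hypothetical name; the table and column names come from the question):
ALTER TABLE transaction ADD INDEX idx_pager (id, col_a, col_b);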
After each loop, I will run the query again with a new pointer value
Your code suggests that you use some kind of parametrised query, but we cannot tell whether it is a proper MySQL prepared statement. If it is not, consider using a MySQL prepared statement for this process.
Prepare the query before the loop, then in the loop just adjust the parameters and execute the prepared statement again. This way MySQL parses the query only once, not on every iteration.
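For illustration, a rough sketch using MySQL's server-side PREPARE/EXECUTE syntax (the parameter values below are placeholders):
PREPARE stmt FROM 'SELECT * FROM transaction WHERE id > ? AND col_a = ? AND col_b >= ? ORDER BY id ASC LIMIT 5000';
SET @pointer = 0, @a = 'some_value', @from = '2015-01-01';
EXECUTE stmt USING @pointer, @a, @from;
-- after processing each batch, set @pointer to the last id returned and EXECUTE again
DEALLOCATE PREPARE stmt;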
I have a table with 40,000 rows. On the code side, at roughly the same second, about 20,000 users will need to run a query to find their related row. Which is the better approach here?
Loading all 40,000 rows into a cache and running a for loop over them to find the record?
Simply querying the database?
Here is what the query will look like, where the parameter will be the user's IP:
SELECT * FROM iplist WHERE ipfrom <= INET_ATON('xxx.xxx.xx.xx') LIMIT 1;
MySQL already caches the data, in the form of the InnoDB Buffer Pool. As pages of data and indexes are requested, they are copied to RAM, and used for any subsequent queries.
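For reference, you can check how much memory the buffer pool has been given (innodb_buffer_pool_size is the standard InnoDB server variable):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';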
You should define an index for the column you search on, if you don't already have an index or a primary key defined for that column:
ALTER TABLE iplist ADD INDEX (ipfrom);
Then searching for a specific value in that column won't require a table-scan, it will narrow down the search efficiently.
Note that when you use LIMIT, you should also use ORDER BY; otherwise the row you get will simply be the first one read in index order, which may not always be what you want. If the ORDER BY is redundant (i.e. it matches the order in which the index is read), it will be optimized out.
SELECT * FROM iplist where ipfrom <= INET_ATON(?) ORDER BY ipfrom LIMIT 1;
I’m facing problems with MySQL and an index. I created an index on a MySQL table with 50M rows.
I’m trying to do this:
SELECT userid FROM database WHERE userid = 4 ORDER BY id DESC LIMIT 10
I created an index on userid, and when I EXPLAIN the query it shows 92000 records in the Rows field, but it takes at least 15-40s to display the results on the first selection. If I run it again a second after the first select, it is very fast, 0.02s.
I realized that if I change the SELECT so it doesn't use the index, my EXPLAIN goes down to 3000 in the Rows field, but I have the same timing problem.
Other information that may be important: my table is MyISAM.
What’s the problem with my index and my query?
My Google Cloud SQL table currently has 1126571 rows, and a minimum of 30 thousand are added every day. When I execute the query:
SELECT COUNT(DISTINCT sno) AS tot FROM visits
the SQL prompt generates the following error:
Error 0: Unable to execute statement
Is a Cloud SQL query subject to a 60-second timeout? How can I overcome the problem when the table becomes large?
Break the table into two tables: one to receive new visits ... transactions ... and one for reporting. Index the reporting table. Transfer and clear the data on a regular basis.
The transaction table will remain relatively small and thus it will be fast to count. The reporting table will be fast to count because of the index.
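A sketch of the regular transfer step (visits_live and visits_report are hypothetical table names, and an auto-increment id column is assumed):
-- capture a boundary first so rows arriving mid-transfer are not lost
SET @boundary = (SELECT MAX(id) FROM visits_live);
INSERT INTO visits_report SELECT * FROM visits_live WHERE id <= @boundary;
DELETE FROM visits_live WHERE id <= @boundary;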
Add an INDEX on your sno column and it will improve performance.
ALTER TABLE visits ADD INDEX (sno)
Try to split your select query into many parts: for example, the first select query is limited to 50000 rows, then the second starts from 50000 and is limited to 50000 rows, and so on.
You can do that with this scenario:
1- Get the record count.
2- Make a loop that ends at the record count.
3- In each iteration, make the select query fetch 50000 records and append the results to a data table (the details depend on your programming language).
4- In the next iteration, start selecting from where the previous iteration ended; for example, the second query selects the next 50000 records, and so on.
You can specify the starting index of your select with this SQL statement (ordering by a field keeps the chunks deterministic):
SELECT * FROM mytable ORDER BY somefield LIMIT 50000 OFFSET 0;
Then you will get all the data that you want.
NOTE: run a test to see the maximum record count that can be loaded in 60 seconds; this will reduce the number of loops and therefore increase performance.
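Put together, the successive queries would look something like this (assuming an indexed column such as id to order by):
SELECT * FROM visits ORDER BY id LIMIT 50000 OFFSET 0;      -- first chunk
SELECT * FROM visits ORDER BY id LIMIT 50000 OFFSET 50000;  -- second chunk, and so on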
What is the complexity of the GROUP BY statement in MySQL?
I am managing very big tables and would also like to know if there is any method to estimate how much time a query is going to take.
This question is impossible to answer without knowing what the entire query looks like. Some GROUP BYs can be prohibitively expensive while others are very cheap; it all depends on how the indexes in the database are set up, whether the value you group by can be cached, and so on.
For example, this is a very cheap GROUP BY:
CREATE TABLE t (a INT, KEY(a));
SELECT * FROM t WHERE 1 GROUP BY a;
Since a is indexed.
But for something like this, it's very expensive, since it requires a full table scan:
CREATE TABLE t (a INT);
SELECT * FROM t WHERE 1 GROUP BY a;
Generally, if no suitable key is available, the database creates an in-memory temporary table for the GROUP BY: it goes through all the values, inserts each value into the temporary table together with a reference to the corresponding row in the result set, then selects from the temporary table, picking the first row from each group, and sends that back as the result. Depending on whether you use aggregate expressions in the GROUP BY query (e.g. MAX(), GROUP_CONCAT(), or similar), it may need to fetch all rows again.
You can use EXPLAIN to figure out what strategy MySQL will use. The 'Extra' column will contain (in ascending order of cost to execute): 'Using index' if an index can be used, 'Using filesort' if reading all rows from disk will be necessary, and 'Using temporary' if a temporary table will be required.
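For example (a sketch against the t table defined above):
EXPLAIN SELECT a, COUNT(*) FROM t GROUP BY a;
-- with KEY(a), the Extra column typically shows 'Using index'
-- without the key, it typically shows 'Using temporary; Using filesort'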
If I SELECT IDs and then UPDATE using those IDs, the UPDATE query is faster than if I UPDATE using the same conditions as in the SELECT.
To illustrate:
SELECT id FROM table WHERE a IS NULL LIMIT 10; -- 0.00 sec
UPDATE table SET field = value WHERE id IN (...); -- 0.01 sec
The above is about 100 times faster than an UPDATE with the same conditions:
UPDATE table SET field = value WHERE a IS NULL LIMIT 10; -- 0.91 sec
Why?
Note: the a column is indexed.
Most likely the second UPDATE statement locks many more rows, while the first one uses the unique key and locks only the rows it's going to update.
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
I don't think there can be one straightforward answer to your "why?" without doing some analysis and research.
SELECT queries are normally cached, which means that if you run the same SELECT query multiple times, the execution time of the first run is normally greater than that of the following runs. Note that this behavior is only noticeable where the SELECT is heavy, not in scenarios where even the first SELECT is fast. So, in your example, the SELECT may have taken 0.00s because of caching. The UPDATE queries use different WHERE clauses, so it is likely that their execution times differ.
Though the column a is indexed, it is not guaranteed that MySQL actually uses the index for the SELECT or the UPDATE. Please study the EXPLAIN output. Also, check the output of SHOW INDEX and see whether the "Comment" column reads "disabled" for any index. You may read more here: http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
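For instance (a sketch; backticks are needed because table is a reserved word, and EXPLAIN for UPDATE statements requires MySQL 5.6 or later):
EXPLAIN SELECT id FROM `table` WHERE a IS NULL LIMIT 10;
EXPLAIN UPDATE `table` SET field = 'value' WHERE a IS NULL LIMIT 10;
SHOW INDEX FROM `table`;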
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't using the same WHERE condition - the first one runs on the id column and the latter on a. Though both columns are indexed, that does not necessarily mean all the table's indexes perform alike. One index may be more efficient than another depending on its size, the datatype of the indexed column, or whether it is a single- or multiple-column index. There may well be other reasons, but I am not an expert on this.
Also, I think the second UPDATE does more work in the sense that it may take more row-level locks than the first one. It is true that both UPDATEs ultimately change the same number of rows. But where the first UPDATE locks only the 10 rows, I think the second UPDATE locks all rows where a is NULL (which is more than 10) before doing the UPDATE. Perhaps MySQL first applies the locking and then applies the LIMIT clause to update only a limited number of records.
Hope the above explanation makes sense!
Do you have a composite index or separate indexes?
If it is a composite index on the id and a columns, then in the 2nd update statement the a part of the index cannot be used. The reason is that only leftmost prefixes of an index are usable (unless a is the PRIMARY KEY).
So if you want the index on a to be used, you need to include id in your WHERE clause as well, with id first and a second.
It also depends on which storage engine you are using, since MySQL implements indexes at the engine level, not the server level.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this, id is the leftmost column in the index, followed by a.
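A sketch of the composite index being described (idx_id_a is a hypothetical name):
ALTER TABLE `table` ADD INDEX idx_id_a (id, a);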
Also, from your comments: if you are using InnoDB, updating indexed columns can mean that the storage engine has to move index entries to a different page, or split a page if it is already full, because InnoDB stores indexes in sequential order. This process is VERY slow and expensive, and gets even slower if your indexes are fragmented or your table is very big.
The comment by Michael J.V is the best description. This answer assumes a is a column that is not indexed and id is.
The WHERE clause in the first UPDATE command works off the primary key of the table, id.
The WHERE clause in the second UPDATE command works off a non-indexed column. This makes finding the rows to update significantly slower.
Never underestimate the power of indexes. A table will perform better with correctly used indexes than a table a tenth the size with no indexing.
Regarding "MySQL doesn't support updating the same table you're selecting from"
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just do this:
UPDATE table SET field = value
WHERE id IN (select id from (SELECT id FROM table WHERE a IS NULL LIMIT 10));
The accepted answer seems right but is incomplete; there are major differences.
As far as I understand, and I'm not a SQL expert:
In the first query you SELECT N rows and then UPDATE them using the primary key.
That's very fast, as you have direct access to all rows based on the fastest possible index.
In the second query you UPDATE N rows using LIMIT.
That will lock all the rows and release them again once the update is finished.
The big difference is that you have a RACE CONDITION in case 1) and an atomic UPDATE in case 2).
If you have two or more simultaneous calls of the case 1) query, you'll have the situation where you select the SAME ids from the table.
Both calls will then update the same IDs simultaneously, overwriting each other.
This is called a "race condition".
The second case avoids that issue: MySQL locks all the rows during the update.
If a second session issues the same command, it will wait until the rows are unlocked.
So no race condition is possible, at the expense of some lost time.
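If you need the approach from case 1) without the race condition, one common technique (a sketch, assuming InnoDB and the schematic names used in the question) is to lock the selected rows with SELECT ... FOR UPDATE inside a transaction:
START TRANSACTION;
SELECT id FROM `table` WHERE a IS NULL LIMIT 10 FOR UPDATE;
UPDATE `table` SET field = 'value' WHERE id IN (...);  -- the ids returned above
COMMIT;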