I have a big InnoDB table which contains 10,000,000 rows.
This query:
SELECT count(id) FROM table_name
takes 4-6 seconds to execute.
I need to decrease the query execution time.
1) Can someone advise how to achieve this without changing the table to MyISAM?
2) In case we need to use the MySQL query cache, how can we turn that on on the server?
Lie about the count in the application. Really, only in rare cases do you need an exact number.
An approximate row count (about as fast as SELECT COUNT(*) on MyISAM) you can get from:
SELECT MAX(id) - MIN(id) AS count FROM table
If you still need the exact number, you can create a table holding the count and update it with triggers ON INSERT and ON DELETE.
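A minimal sketch of that approach, assuming a hypothetical counter table (all names here are illustrative):
CREATE TABLE table_count (cnt BIGINT UNSIGNED NOT NULL);
-- Seed it once with the current count.
INSERT INTO table_count SELECT COUNT(id) FROM table_name;
-- Keep it current; a matching AFTER DELETE trigger with cnt - 1 completes the picture.
CREATE TRIGGER table_count_ins AFTER INSERT ON table_name
FOR EACH ROW UPDATE table_count SET cnt = cnt + 1;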
If your table does not change often, using the MySQL query cache is a good solution (note that the query cache was removed entirely in MySQL 8.0):
[mysqld]
query_cache_type = 1
query_cache_size = 10M
Also add an index on the field id if it doesn't exist. And if you are writing an application, store the count in a separate table as you insert or update; it's handy.
Can anyone help me rewrite the query to speed up the execution time? It took 37 seconds to execute.
DELETE FROM storefront_categories
WHERE userid IN (SELECT userid
                 FROM MASTER
                 WHERE expirydate < '2020-2-4')
At the same time, this query took only 4.69 seconds to execute.
DELETE FROM storefront_categories
WHERE userid NOT IN (SELECT userid FROM MASTER)
The table storefront_categories has 97K records, whereas MASTER has 40K records. We have created an index on the MASTER.expirydate field.
When deleting 40K rows, expect it to take time. The main cost (assuming adequate indexing and a decent query) is the overhead of transactional semantics of an "atomic" delete. This involves making a copy of each row being deleted, just in case there is a crash. That way, InnoDB can bring the database back to what it had been before the crash.
When deleting 40% of a table, it is much faster to copy the rows you want to keep into another table, then swap the tables.
When deleting a large number of rows (regardless of the percentage), it is better to do it in chunks. And it is best to walk through the table based on the PRIMARY KEY.
I discuss both of those techniques, plus others, in http://mysql.rjweb.org/doc.php/deletebig
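As a hedged sketch of the chunking idea with the tables from the question (this is the simple LIMIT-based variant; walking the PRIMARY KEY in ranges, as the blog describes, avoids rescanning):
-- Repeat until it reports 0 rows affected; each run deletes at most 1000 rows,
-- keeping each transaction (and its undo copies) small.
DELETE FROM storefront_categories
WHERE userid IN (SELECT userid FROM MASTER WHERE expirydate < '2020-02-04')
LIMIT 1000;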
As for the query formulation:
It is version-dependent; old versions of MySQL did a poor job on some flavors.
NOT IN (SELECT ...) and NOT EXISTS tend to be the worst performers.
IN (SELECT ...) and/or EXISTS may be better.
"Multi-table DELETE is another option. It works like JOIN.
(Bottom line: You did not say what version you are running; I can't predict which formulation will be best.)
My blog avoids the formulation debate.
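For reference, a sketch of the multi-table form against the tables from the question:
-- JOIN-style DELETE: removes only the storefront_categories rows
-- that match an expired MASTER row.
DELETE sc
FROM storefront_categories AS sc
JOIN MASTER AS m ON m.userid = sc.userid
WHERE m.expirydate < '2020-02-04';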
The query looks fine as it is.
I would suggest the following indexes for optimization:
master(expirydate, userid)
storefront_categories(userid)
The first index is a covering index for the subquery on master: the database can execute the subquery by looking at the index alone (whereas with just expirydate in the index, it would still need to look at the table data to fetch the related userid).
The second index lets the database optimize the IN operation.
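In DDL form (the index names are illustrative):
CREATE INDEX idx_master_expiry_user ON master (expirydate, userid);
CREATE INDEX idx_sc_userid ON storefront_categories (userid);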
I would try with EXISTS:
DELETE FROM storefront_categories
WHERE EXISTS (SELECT 1
              FROM MASTER M
              WHERE M.userid = storefront_categories.userid AND
                    M.expirydate < '2020-02-04');
Indexes matter here; I would expect indexes on storefront_categories(userid) and MASTER(userid, expirydate).
I would advise you to use NOT EXISTS with the correct index:
DELETE sc
FROM storefront_categories sc
WHERE NOT EXISTS (SELECT 1
FROM master m
WHERE m.userid = sc.userid AND
m.expirydate < '2020-02-04'
);
The index you want is on master(userid, expirydate). The order of the columns is important. For this version, an index on storefront_categories does not help.
Note that I changed the date format. I recommend using YYYY-MM-DD to avoid ambiguity -- and to use the full 10 characters.
I need to get the last id (primary key) of a table (InnoDB), and to do so I perform the following query:
SELECT (SELECT `AUTO_INCREMENT` FROM `information_schema`.`TABLES` WHERE `TABLE_SCHEMA` = 'mySchema' AND `TABLE_NAME` = 'myTable') - 1;
which returns the wrong AUTO_INCREMENT. The problem is the TABLES table of information_schema is not updated with the current value, unless I run the following query:
ANALYZE TABLE `myTable`;
Why doesn't MySQL update information_schema automatically, and how could I fix this behavior?
Running MySQL Server 8.0.13 X64.
Q: Why doesn't MySQL update information_schema automatically, and how could I fix this behavior?
A: InnoDB has traditionally held the auto_increment value in memory without persisting it to disk (MySQL 8.0 does persist the counter, but the information_schema view of it can still be stale).
Behavior of metadata queries (e.g. SHOW TABLE STATUS) is influenced by setting of innodb_stats_on_metadata and innodb_stats_persistent variables.
https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_stats_on_metadata
Forcing an ANALYZE every time we query metadata can be a drain on performance.
Other than the settings of those variables, or forcing statistics to be collected by manually executing the ANALYZE TABLE, I don't think there's a "fix" for the issue.
(I think that mostly because I don't think it's a problem that needs to be fixed.)
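That said, since the question mentions MySQL 8.0.13: in 8.0 the values shown in information_schema.TABLES come from cached statistics controlled by information_schema_stats_expiry. A sketch of reading a fresh value without a manual ANALYZE:
-- 0 disables the statistics cache for this session (MySQL 8.0+).
SET SESSION information_schema_stats_expiry = 0;
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mySchema' AND TABLE_NAME = 'myTable';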
To get the highest value of an auto_increment column in a table, the normative pattern is:
SELECT MAX(`ai_col`) FROM `myschema`.`mytable`
What puzzles me is why we need to retrieve this particular piece of information. What are we going to use it for?
Certainly, we aren't going to use that in application code to determine a value that was assigned to a row we just inserted. There's no guarantee that the highest value isn't from a row that was inserted by some other session. And we have LAST_INSERT_ID() mechanism to retrieve the value of a row our session just inserted.
If we go with the ANALYZE TABLE to refresh statistics, there's still a small window of time between that and a subsequent SELECT... another session could slip in another INSERT, so the value we get from the gathered stats could be "out of date" by the time we retrieve it.
SELECT * FROM tbl ORDER BY insert_datetime DESC LIMIT 1;
will get you all the data from the "latest" inserted row. No need to deal with AUTO_INCREMENT, no need to use subqueries, no ANALYZE, no information_schema, no extra fetch once you have the id, etc.
Yes, you do need an index on the column that you use to determine what is "latest". Yes, id could be used, but it should not be. AUTO_INCREMENT values are guaranteed to be unique, but nothing else.
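A sketch of that index, using the table and column names from the example above:
CREATE INDEX idx_insert_datetime ON tbl (insert_datetime);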
I've been running a website that handles a large amount of data.
Users save data like ip, id, and date to the server, and it is stored in a MySQL database. Each entry is stored as a single row in a table.
Right now there are approximately 24 million rows in the table.
Problem 1:
Things are getting slow now: a full table scan can take many minutes, even though I have already indexed the table.
Problem 2:
If a user is running a SELECT on the table, it can potentially block all other users' access to the site (as the table is locked) until the query is complete.
Our server:
32 GB RAM
12-core / 24-thread CPU
The table uses the MyISAM engine.
EXPLAIN SELECT SUM(impresn), SUM(rae), SUM(reve), `date` FROM `publisher_ads_hits` WHERE date between '2015-05-01' AND '2016-04-02' AND userid='168' GROUP BY date ORDER BY date DESC
To echo the comment from @Max P.: if you write to MyISAM tables, ALL SELECTs are blocked; there is only a table lock. If you use InnoDB, there is a row lock that only locks the rows it needs. Also show us the EXPLAIN of your queries, since it is possible that you must create some new indexes. MySQL can only handle one index per query, so if you use more fields in the WHERE condition it can be useful to have a COMPOSITE INDEX over these fields.
According to EXPLAIN, the query doesn't use an index. Try to add a composite index (userid, date); a sketch follows.
If you have many update and delete operations, try changing the engine to InnoDB.
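A sketch of both suggestions against the table from the question:
-- Composite index so WHERE userid = ... AND date BETWEEN ... can be served by one index.
ALTER TABLE publisher_ads_hits ADD INDEX idx_user_date (userid, `date`);
-- Row-level locking instead of MyISAM's table locks.
ALTER TABLE publisher_ads_hits ENGINE = InnoDB;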
The basic problem is the full table scan. Some suggestions are:
Partition the table based on date (a sketch follows this list) and don't keep more than 6-12 months of data in the live system
Add an index on userid
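A hedged sketch of the partitioning suggestion (MySQL requires the partitioning column to be part of every unique key on the table, so this assumes `date` is in the primary key; partition names are illustrative):
ALTER TABLE publisher_ads_hits
PARTITION BY RANGE (TO_DAYS(`date`)) (
    PARTITION p2015 VALUES LESS THAN (TO_DAYS('2016-01-01')),
    PARTITION p2016 VALUES LESS THAN (TO_DAYS('2017-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
-- Dropping a whole partition is far cheaper than DELETEing old rows one by one.
ALTER TABLE publisher_ads_hits DROP PARTITION p2015;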
I have a simple MySQL table with id (primary key) and hash (indexed). There are some other columns (varchar / int), but no queries on them are needed.
My total table size is around 350 MB, with 2.5M rows.
SELECT COUNT(*) FROM table LIMIT 1;
is taking about 0.5-1 s. My InnoDB buffer pool is set at 1 GB. I've also tried variations (without improvement) like:
SELECT COUNT(id) FROM table LIMIT 1;
SELECT COUNT(*) FROM table WHERE id > 0 LIMIT 1;
A single
SELECT * FROM table WHERE id = 'x' LIMIT 1;
would return within 1 ms (localhost MySQL). Any tips on improving the slow count (0.5-1 s) would be greatly appreciated.
You can find a brief explanation here. In short, InnoDB has to do a full table scan in order to count all rows (without a WHERE clause that could utilize an index).
See also this answer.
BTW, I can't see any point in using LIMIT 1 in your query. Since there is no GROUP BY clause, it will always return one record.
Some time ago I found that MyISAM tables make these operations faster. But not all tables and architectures can be MyISAM. Check your schema; maybe you can switch this table to MyISAM.
Also use COUNT(1) instead of COUNT(*)
And another technique for you: create a trigger and save the count in a separate place. Create a counter_table and the following trigger:
DELIMITER //
CREATE TRIGGER update_counter AFTER INSERT ON table_name
FOR EACH ROW
BEGIN
  UPDATE counter_table SET counter = counter + 1;
END//
DELIMITER ;
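For the counter to stay accurate, a matching delete trigger is needed as well; a minimal sketch:
DELIMITER //
CREATE TRIGGER update_counter_del AFTER DELETE ON table_name
FOR EACH ROW
BEGIN
  UPDATE counter_table SET counter = counter - 1;
END//
DELIMITER ;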
If I SELECT IDs and then UPDATE using those IDs, the UPDATE query is faster than if I UPDATE using the conditions from the SELECT.
To illustrate:
SELECT id FROM table WHERE a IS NULL LIMIT 10; -- 0.00 sec
UPDATE table SET field = value WHERE id IN (...); -- 0.01 sec
The above is about 100 times faster than an UPDATE with the same conditions:
UPDATE table SET field = value WHERE a IS NULL LIMIT 10; -- 0.91 sec
Why?
Note: the a column is indexed.
Most likely the second UPDATE statement locks many more rows, while the first one uses the unique key and locks only the rows it's going to update.
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
I don't think there can be one straightforward answer to your "why?" without doing some analysis and research.
SELECT queries are normally cached, which means that if you run the same SELECT multiple times, the execution time of the first run is normally greater than that of the following runs. Note that this is only noticeable where the SELECT is heavy, not in scenarios where even the first SELECT is fast. So in your example it might be that the SELECT took 0.00 s because of caching. The UPDATE queries use different WHERE clauses, and hence it is likely that their execution times differ.
Though the column a is indexed, MySQL does not necessarily use the index for the SELECT or the UPDATE. Please study the EXPLAIN outputs. Also see the output of SHOW INDEX and check whether the "Comment" column reads "disabled" for any index. You may read more here: http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
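For instance (EXPLAIN works on UPDATE statements from MySQL 5.6 on):
-- Compare the two plans: check the key column to see whether the index on a (or id) is used.
EXPLAIN SELECT id FROM table WHERE a IS NULL LIMIT 10;
EXPLAIN UPDATE table SET field = value WHERE a IS NULL LIMIT 10;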
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't using the same WHERE condition - the first one runs on the id column and the latter on a. Though both columns are indexed, that does not necessarily mean all the table's indexes perform alike. One index may be more efficient than the other depending on the size of the index, the datatype of the indexed column, or whether it is a single- or multiple-column index. There sure might be other reasons, but I ain't an expert on it.
Also, I think the second UPDATE does more work in the sense that it might take more row-level locks than the first UPDATE. It is true that both UPDATEs finally update the same number of rows. But where the first UPDATE locks 10 rows, I think the second UPDATE locks all rows where a is NULL (which is more than 10) before doing the update. Perhaps MySQL first applies the locking and then runs the LIMIT clause to update only a limited number of records.
Hope the above explanation makes sense!
Do you have a composite index or separate indexes?
If it is a composite index of the id and a columns, then in the 2nd UPDATE statement the index on a would not be used. The reason is that only the leftmost prefix of a composite index can be used (unless a is the PRIMARY KEY).
So if you want the index on a to be used, you need to include id in your WHERE clause as well, with id first, then a.
Also, it depends on which storage engine you are using, since MySQL implements indexes at the engine level, not the server level.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this, id is the leftmost column in the index, followed by a.
Also, from your comments: the lookups are much faster because, if you are using InnoDB, updating indexed columns means the storage engine may have to move index entries to a different page node, or split a page if it is already full, since InnoDB stores indexes in sequential order. This process is VERY slow and expensive, and gets even slower if your indexes are fragmented or your table is very big.
The comment by Michael J.V is the best description. This answer assumes a is a column that is not indexed and 'id' is.
The WHERE clause in the first UPDATE command is working off the primary key of the table, id
The WHERE clause in the second UPDATE command is working off a non-indexed column. This makes finding the rows to be updated significantly slower.
Never underestimate the power of indexes. A table whose indexes are used correctly will outperform a table a tenth its size with no indexing.
Regarding "MySQL doesn't support updating the same table you're selecting from"
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just do this instead; wrapping the subquery in a derived table makes MySQL materialize it first, which sidesteps the restriction:
UPDATE table SET field = value
WHERE id IN (select id from (SELECT id FROM table WHERE a IS NULL LIMIT 10));
The accepted answer seems right but is incomplete; there are major differences.
As far as I understand (and I'm not a SQL expert):
With the first query, you SELECT N rows and UPDATE them using the primary key.
That's very fast, as you have direct access to all rows based on the fastest possible index.
With the second query, you UPDATE N rows using LIMIT.
That will lock all rows and release them again after the update is finished.
The big difference is that you have a RACE CONDITION in case 1) and an atomic UPDATE in case 2).
If you have two or more simultaneous calls of the case 1) query, you'll have the situation that both select the SAME ids from the table.
Both calls will then update the same ids simultaneously, overwriting each other.
This is called a "race condition".
The second case avoids that issue: MySQL will lock the rows during the update.
If a second session runs the same command, it will wait until the rows are unlocked.
So no race condition is possible, at the expense of some lost time.
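If the two-step approach of case 1) is still wanted, one way to avoid the race (assuming InnoDB) is to lock the selected rows inside a transaction with SELECT ... FOR UPDATE; a sketch:
START TRANSACTION;
-- Locks the matched rows until COMMIT, so a concurrent session
-- running the same block waits instead of picking the same ids.
SELECT id FROM table WHERE a IS NULL LIMIT 10 FOR UPDATE;
UPDATE table SET field = value WHERE id IN (...); -- the ids from the SELECT above
COMMIT;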