Doing a more efficient COUNT - mysql

I have a page that loads some high-level statistics. Nothing fancy, just about 5 metrics. There are two particular queries that take about 5s each to load:
+ SELECT COUNT(*) FROM mybooks WHERE book_id IS NOT NULL
+ SELECT COUNT(*) FROM mybooks WHERE is_media = 1
The table has about 500,000 rows. Both columns are indexed.
This information changes all the time, so I don't think that caching here would work. What are some techniques to use that could speed this up? I was thinking:
Create a denormalized stats table that is updated whenever the columns are updated.
Load the slow queries via ajax (this doesn't speed it up, but it allows the page to load immediately).
What would be suggested here? The requirement is that the page loads within 1s.
Table structure:
id (pk, autoincrementing)
book_id (bigint)
is_media (boolean)

The stats table is probably the biggest/quickest bang for the buck. Assuming you have full control of your MySQL server and don't already have job scheduling in place to take care of this, you could remedy it by using the MySQL event scheduler. As Vlad mentioned above, your data will be a bit out of date. Here is a quick example:
Example stats table
CREATE TABLE stats(stat VARCHAR(20) PRIMARY KEY, count BIGINT);
Initialize your values
INSERT INTO stats(stat, count)
VALUES('all_books', 0), ('media_books', 0);
Create your event that updates every 10 minutes
DELIMITER |
CREATE EVENT IF NOT EXISTS updateBookCountsEvent
ON SCHEDULE EVERY 10 MINUTE STARTS NOW()
COMMENT 'Update book counts every 10 minutes'
DO
BEGIN
UPDATE stats
SET count = (SELECT COUNT(*) FROM mybooks)
WHERE stat = 'all_books';
UPDATE stats
SET count = (SELECT COUNT(*) FROM mybooks WHERE is_media = 1)
WHERE stat = 'media_books';
END |
Check to see if it executed
SELECT * FROM mysql.event;
No? Check to see if the event scheduler is enabled
SELECT @@GLOBAL.event_scheduler;
If it is off, you'll want to enable it on startup using the parameter --event-scheduler=ON or by setting it in your my.cnf. See this answer or the docs.
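For example, to turn it on for the running server without a restart (requires sufficient privileges; the my.cnf lines are shown as comments and are the usual way to persist it):
-- enable the scheduler immediately
SET GLOBAL event_scheduler = ON;
-- persist it across restarts, e.g. in my.cnf:
--   [mysqld]
--   event_scheduler = ON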

There are a couple of things you can do to speed up the query.
Run OPTIMIZE TABLE on your mybooks table.
Change your book_id column to INT UNSIGNED, which allows 4.2 billion values and takes 4 bytes instead of 8 (BIGINT), making the table and index more efficient.
Also, I'm not sure whether this will help, but rather than COUNT(*) I would count the column used in the WHERE clause. So, for example, your first query would be SELECT COUNT(book_id) FROM mybooks WHERE book_id IS NOT NULL.
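Putting those suggestions together, the statements might look like this (a sketch; verify first that no book_id exceeds the INT UNSIGNED range of about 4.29 billion):
OPTIMIZE TABLE mybooks;
-- shrink book_id from BIGINT (8 bytes) to INT UNSIGNED (4 bytes)
ALTER TABLE mybooks MODIFY book_id INT UNSIGNED NULL;
-- count via the indexed column; COUNT(book_id) ignores NULLs by definition
SELECT COUNT(book_id) FROM mybooks;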

Related

MySQL query with a union over 100 tables vs 100 separate queries: which is more performant?

I have a legacy project using a MySQL DB with MyISAM tables, and the DB design is far from perfect. I ran into an N+1 problem because there is an entity table plus a number of entity_SOME_ID_subentity tables that share a similar base structure plus some random additional columns, where SOME_ID is the primary key value of a record in the entity table.
This is certainly not great, but let's assume it is our initial condition and cannot be changed in the short term. So I need to optimize a query where I select some records from the entity table together with aggregated data from the related entity_SOME_ID_subentity tables. The aggregation uses only columns that are common to all subentity tables. Initially this was implemented as a single query to the entity table followed by a query in a loop to each corresponding entity_SOME_ID_subentity table.
I cannot use joins since each entity has a separate subentity table, so maybe a union can help reduce the number of queries down to 2, where the second one uses a union over subqueries to each required subentity table.
An additional note is that I need to sort everything before pagination is applied.
Can you advise whether it is worth trying the union approach in this situation, or will performance be bad in both cases? Or maybe you have better ideas about how this can be handled?
Update:
The query to the entity table is trivial and looks like:
SELECT col1, col2, col3 FROM entity WHERE ... LIMIT 10 OFFSET 0;
And the query to entity_SOME_ID_subentity looks like:
SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_1234_subentity
ORDER BY started;
Here entity_1234_subentity is an example of what the table names look like.
And using unions can look like:
SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_1111_subentity
UNION
(SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_2222_subentity)
UNION
(SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_3333_subentity)
...
ORDER BY started
That's a typical design that seemed smart at the time it was created but turns out to be absolutely not scalable... I have seen a lot of projects like this. If I were you, I would create an index for the search function.
You could
a) use an external indexing/search engine such as SOLR or ElasticSearch;
b) in your RDBMS, create an index table containing the recurring information from all sub-tables (like id, start_date, completion_date in your case) which gets updated either on every sub-table update or, if there are too many places in the code you would have to change, every hour/day/whatever by a cron job.
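A minimal sketch of option b), with assumed names (entity_subentity_index is hypothetical; adapt the columns to whatever you aggregate):
-- roll-up table holding the recurring columns from all sub-tables
CREATE TABLE entity_subentity_index (
    entity_id  BIGINT NOT NULL,
    total      INT NOT NULL,
    started    DATETIME NULL,
    completed  DATETIME NULL,
    PRIMARY KEY (entity_id)
);

-- refreshed per entity by a cron job or right after each sub-table write, e.g.:
REPLACE INTO entity_subentity_index (entity_id, total, started, completed)
SELECT 1234, COUNT(id), MIN(start_date), MAX(completion_date)
FROM entity_1234_subentity;
The sorted, paginated query can then run against this single table instead of a union over every sub-table.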
It smells like someone tried to implement table inheritance and left a mess.
You can fix this using JSON and views, possibly faster than you can write out 100 unions.
In a transaction (or at least test it on a copy) modify entity so it can hold all the information in the subtables.
Add all the common columns from the subtables into entity.
Add a JSON column to hold the grab bag of data.
alter table entity add column start_date datetime;
alter table entity add column completion_date datetime;
alter table entity add column data json;
If you're not into JSON, you can use a traditional key/value table to store the extra columns, though this loses some flexibility because the value must be a string.
create table entity_data (
    entity_id bigint not null,
    `key` varchar(255) not null,
    `value` varchar(255) not null
);
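For example, the grab-bag columns of one entity could be stored like this (the values are placeholders):
INSERT INTO entity_data (entity_id, `key`, `value`)
VALUES (123, 'extra1', 'some text'), (123, 'extra2', '42');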
Then, for each subtable...
Update each entity's info with its subentity info. The common columns are updated directly. The rest turn into JSON.
update entity e
inner join entity_123_subentity s on e.id = 123
set
    e.start_date = s.start_date,
    e.completion_date = s.completion_date,
    e.data = json_object('extra1', s.extra1, 'extra2', s.extra2)
where e.id = 123;
Once that's done and verified, drop the subtable and replace it with a view.
drop table entity_123_subentity;
create view entity_123_subentity
(id, start_date, completion_date, extra1, extra2)
as
select
id, start_date, completion_date, data->>'$.extra1', data->>'$.extra2'
from entity
where subid = 123;
Repeat until there are no more subtables.
New queries can be written efficiently, and old queries will still work until they can be rewritten.

Fastest way to remove a HUGE set of row keys from a table via primary key? [duplicate]

I have two tables. Let's call them KEY and VALUE.
KEY is small, somewhere around 1.000.000 records.
VALUE is huge, say 1.000.000.000 records.
Between them there is a connection such that each KEY might have many VALUES. It's not a foreign key but basically the same meaning.
The DDL looks like this
create table KEY (
key_id int,
primary key (key_id)
);
create table VALUE (
key_id int,
value_id int,
primary key (key_id, value_id)
);
Now, my problem. About half of all key_ids in VALUE have been deleted from KEY and I need to delete them in an orderly fashion while both tables are still under high load.
It would be easy to do
delete v
from VALUE v
left join KEY k using (key_id)
where k.key_id is null;
However, since a multi-table DELETE does not allow a LIMIT, I don't like this approach. Such a delete would take hours to run, which makes it impossible to throttle the deletes.
Another approach is to create a cursor to find all missing key_ids and delete them one by one with a limit. That seems very slow and kind of backwards.
Are there any other options? Some nice tricks that could help?
Any solution that tries to delete so much data in one transaction is going to overwhelm the rollback segment and cause a lot of performance problems.
A good tool to help is pt-archiver. It performs incremental operations on moderate-sized batches of rows, as efficiently as possible. pt-archiver can copy, move, or delete rows depending on options.
The documentation includes an example of deleting orphaned rows, which is exactly your scenario:
pt-archiver --source h=host,D=db,t=VALUE --purge \
--where 'NOT EXISTS(SELECT * FROM `KEY` WHERE key_id=`VALUE`.key_id)' \
--limit 1000 --commit-each
Executing this will take significantly longer to delete the data, but it won't use too many resources and it won't interrupt service on your existing database. I have used it successfully to purge hundreds of millions of rows of outdated data.
pt-archiver is part of the Percona Toolkit for MySQL, a free (GPL) set of scripts that help common tasks with MySQL and compatible databases.
Directly from MySQL documentation
If you are deleting many rows from a large table, you may exceed the
lock table size for an InnoDB table. To avoid this problem, or simply
to minimize the time that the table remains locked, the following
strategy (which does not use DELETE at all) might be helpful:
Select the rows not to be deleted into an empty table that has the same structure as the original table:
INSERT INTO t_copy SELECT * FROM t WHERE ... ;
Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:
RENAME TABLE t TO t_old, t_copy TO t;
Drop the original table:
DROP TABLE t_old;
No other sessions can access the tables involved while RENAME TABLE
executes, so the rename operation is not subject to concurrency
problems. See Section 12.1.9, “RENAME TABLE Syntax”.
So in your case you may do:
INSERT INTO value_copy SELECT * FROM VALUE WHERE key_id IN
(SELECT key_id FROM `KEY`);
RENAME TABLE value TO value_old, value_copy TO value;
DROP TABLE value_old;
And according to what they wrote here, the RENAME operation is quick and the number of records doesn't affect it.
What about this for having a limit?
delete x
from `VALUE` x
join (select key_id, value_id
from `VALUE` v
left join `KEY` k using (key_id)
where k.key_id is null
limit 1000) y
on x.key_id = y.key_id AND x.value_id = y.value_id;
First, examine your data. Find the keys which have too many values to be deleted "fast". Then find out which times during the day you have the smallest load on the system. Perform the deletion of the "bad" keys during that time. For the rest, start deleting them one by one with some downtime between deletes so that you don't put too much pressure on the database while you do it.
Maybe instead of a limit, divide the whole set of rows into small parts by key_id:
delete v
from VALUE v
left join KEY k using (key_id)
where k.key_id is null and v.key_id > 0 and v.key_id < 100000;
then delete rows with key_id in 100000..200000 and so on.
You can try deleting in separate transaction batches.
This is MSSQL, but it should be similar:
declare @i INT
declare @step INT
set @i = 0
set @step = 100000
while (@i < (select max(VALUE.key_id) from VALUE))
BEGIN
    BEGIN TRANSACTION
    delete from VALUE where
        VALUE.key_id between @i and @i+@step and
        not exists(select 1 from KEY where KEY.key_id = VALUE.key_id and KEY.key_id between @i and @i+@step)
    set @i = (@i+@step)
    COMMIT TRANSACTION
END
Create a temporary table!
drop table if exists batch_to_delete;
create temporary table batch_to_delete as
select v.* from `VALUE` v
left join `KEY` k on k.key_id = v.key_id
where k.key_id is null
limit 10000; -- tailor batch size to your taste
-- optional but may help for large batch size
create index batch_to_delete_ix_key on batch_to_delete(key_id);
create index batch_to_delete_ix_value on batch_to_delete(value_id);
-- do the actual delete
delete v from `VALUE` v
join batch_to_delete d on d.key_id = v.key_id and d.value_id = v.value_id;
To me this is the kind of task whose progress I would want to see in a log file, and I would avoid solving it in pure SQL; I would use scripting in Python or a similar language. Another thing that would bother me is that lots of LEFT JOINs with WHERE ... IS NULL between the tables might cause unwanted locks, so I would avoid the JOINs as well.
Here is some pseudo code:
max_key = select_db('SELECT MAX(key) FROM VALUE')
while max_key > 0:
    cur_range = range(max_key, max_key - 100, -1)
    good_keys = select_db('SELECT key FROM KEY WHERE key IN (%s)' % cur_range)
    keys_to_del = set(cur_range) - set(good_keys)
    while 1:
        deleted_count = update_db('DELETE FROM VALUE WHERE key IN (%s) LIMIT 1000' % keys_to_del)
        db_commit()
        log_something()
        if not deleted_count:
            break
    max_key -= 100
This should not bother the rest of the system very much, but it may take a long time. Another issue is optimizing the table after you have deleted all those rows, but that is another story.
If the target columns are properly indexed, this should go fast:
DELETE FROM `VALUE`
WHERE NOT EXISTS(SELECT 1 FROM `key` k WHERE k.key_id = `VALUE`.key_id)
-- ORDER BY key_id, value_id -- order by PK is good idea, but check the performance first.
LIMIT 1000
Adjust the limit (e.g. from 10 up to 10000) to get acceptable performance, and rerun it several times (a loop sketch follows below).
Also keep in mind that such a mass delete performs locking and logging for each row, which multiplies the execution time per row several times over. There are advanced methods to prevent this, but the easiest workaround is just to put a transaction around this query.
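A rough sketch of that loop as a stored procedure (purge_orphans is a made-up name; with autocommit on, each batch commits on its own):
DELIMITER |
CREATE PROCEDURE purge_orphans()
BEGIN
    REPEAT
        DELETE FROM `VALUE`
        WHERE NOT EXISTS (SELECT 1 FROM `KEY` k WHERE k.key_id = `VALUE`.key_id)
        LIMIT 1000;
        SET @deleted = ROW_COUNT();
    UNTIL @deleted = 0 END REPEAT;
END |
DELIMITER ;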
Do you have a slave or a Dev/Test environment with the same data?
The first step is to find out your data distribution, if you are worried about a particular key having 1 million value_ids:
SELECT v.key_id, COUNT(IFNULL(k.key_id,1)) AS cnt
FROM `value` v LEFT JOIN `key` k USING (key_id)
WHERE k.key_id IS NULL
GROUP BY v.key_id ;
The EXPLAIN plan for the above query is much better than after adding
ORDER BY COUNT(IFNULL(k.key_id,1)) DESC ;
Since you don't have partitioning on key_id (too many partitions in your case) and want to keep the database running during your delete process, the option is to delete in chunks with SLEEP() between deletes for different key_ids to avoid overwhelming the server. Don't forget to keep an eye on your binary logs to avoid filling the disk.
The quickest way is:
Stop the application so the data is not changed.
Dump key_id and value_id from the VALUE table, keeping only rows whose key_id matches the KEY table, by using
mysqldump YOUR_DATABASE_NAME value --where="key_id in (select key_id from YOUR_DATABASE_NAME.key)" --lock-all-tables --opt --quick --quote-names --skip-extended-insert > VALUE_DATA.txt
Truncate the VALUE table.
Load the data exported in step 2.
Start the application.
As always, try this in a Dev/Test environment with Prod data and the same infrastructure so you can calculate the downtime.
Hope this helps.
I am just curious what the effect would be of adding a non-unique index on key_id in the VALUE table. Selectivity is not high at all (~0.001), but I wonder how it would affect the join performance.
Why don't you split your VALUE table into several ones according to some rule like key_id modulo some power of 2 (256, for example)?
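If MySQL's native partitioning would do instead of literally separate tables, a hedged sketch of the idea (key_id is part of the primary key, which hash partitioning requires; note that this ALTER rebuilds the whole table):
ALTER TABLE `VALUE`
    PARTITION BY HASH (key_id)
    PARTITIONS 256;
This mostly pays off when you can drop whole partitions; for orphans scattered across key_ids you would still need the batched deletes discussed above.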

Update on a big MySQL table hangs for too long

Performance problem updating a big MySQL MyISAM table to fill a column in ascending order based on an index on the same table.
My problem is that the server has only 4 GB of memory.
I have to do an update query like the one in this previously asked question.
Mine is this:
set @orderid = 0;
update images im
set im.orderid = (select @orderid := @orderid + 1)
ORDER BY im.hotel_id, im.idImageType;
On im.hotel_id, im.idImageType I have an ascending index.
On im.orderid I have an ascending index too.
The table has 21 million records and is a MyISAM table.
The table is this:
CREATE TABLE `images` (
`photo_id` int(11) NOT NULL,
`idImageType` int(11) NOT NULL,
`hotel_id` int(11) NOT NULL,
`room_id` int(11) DEFAULT NULL,
`url_original` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`url_max300` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`url_square60` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`archive` int(11) NOT NULL DEFAULT '0',
`orderid` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`photo_id`),
KEY `idImageType` (`idImageType`),
KEY `hotel_id` (`hotel_id`),
KEY `hotel_id_idImageType` (`hotel_id`,`idImageType`),
KEY `archive` (`archive`),
KEY `room_id` (`room_id`),
KEY `orderid` (`orderid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The problem is the performance: it hangs for several minutes!
The server disk gets busy too.
My question is: is there a better way to achieve the same result?
Do I have to partition the table or do something else to increase the performance?
I cannot modify the server hardware but I can tune the MySQL db server settings.
best regards
Thanks to everybody. Your answers helped me a lot. I think that now I have found a better solution.
This problem involves two critical issues:
efficient pagination on a large table
updating a large table.
To get efficient pagination on the large table I had found a solution that makes a preliminary update on the table, but doing so I ran into the 51 minutes the update needs, and consequently my Java infrastructure timed out (spring-batch step).
Now, with your help, I have found two solutions to paginate on the large table, and one solution to update the large table.
To reach this performance the server needs memory. I tried this solution on a development server with 32 GB of memory.
common solution step
To paginate over a tuple of fields like I needed, I had created one index:
KEY `hotel_id_idImageType` (`hotel_id`,`idImageType`)
To achieve the new solution we have to change this index by adding the primary key to the index tail, i.e. KEY hotel_id_idImageType (hotel_id, idImageType, primary key fields):
drop index hotel_id_idImageType on images;
create index hotelTypePhoto on images (hotel_id, idImageType, photo_id);
This is needed to avoid touching the table file and to work only on the index file.
Suppose we want the 10 records after the 19000000th record.
Note: in this answer the comma is used as the decimal separator (e.g. 1,5 sec means 1.5 seconds).
solution 1
This solution is very practical, does not need the extra orderid field, and you don't have to do any update before paginating:
select * from images im inner join
(select photo_id from images
order by hotel_id, idImageType, photo_id
limit 19000000,10) k
on im.photo_id = k.photo_id;
Building the derived table k on my 21 million record table takes only 1,5 sec because it uses only the three fields in the index hotelTypePhoto, so it doesn't have to access the table file and works only on the index file.
The order is the same as the original requirement (hotel_id, idImageType) because it is a prefix of (hotel_id, idImageType, photo_id): same subset.
The join takes no time, so even the first time a given page is requested it needs only 1,5 sec, and that is a good time if you have to execute it in a batch once every 3 months.
On the production server with 4 GB of memory the same query takes 3,5 sec.
Partitioning the table does not help to improve performance.
If the server has it in cache the time goes down, or if you use a prepared statement with JDBC parameters the time goes down too (I suppose).
If you have to use it often, it has the advantage that it does not care whether the data changes.
solution 2
This solution needs the extra orderid field, the orderid update has to be run once per batch import, and the data must not change until the next batch import.
Then you can paginate the table in 0,000 sec.
set @orderid = 0;
update images im inner join (
    select photo_id, (@orderid := @orderid + 1) as newOrder
    from images order by hotel_id, idImageType, photo_id
) k
on im.photo_id = k.photo_id
set im.orderid = k.newOrder;
The derived table k is fast, almost like in the first solution.
The whole update takes only 150,551 sec, much better than 51 minutes! (150 s vs 3060 s)
After this update in the batch you can paginate with:
select * from images im where orderid between 19000000 and 19000010;
or better
select * from images im where orderid >= 19000000 and orderid< 19000010;
This takes 0,000 sec to execute, the first time and every time after.
Edit after Rick's comment
Solution 3
This solution avoids extra fields and the use of OFFSET, but it needs to remember the last page read, as in this solution.
This is a fast solution and can work on the online production server using only 4 GB of memory.
Suppose you need to read the ten records after record 20000000.
There are two scenarios to take care of:
You can start reading from the first record up to the 20000000th if you need all of them, like me, and update some variable to remember the last page read.
You have to read only the 10 records after 20000000.
In the second scenario you have to do a pre-query to find the starting page:
select hotel_id, idImageType, photo_id
from images im
order by hotel_id, idImageType, photo_id limit 20000000,1
It gives me:
+----------+-------------+----------+
| hotel_id | idImageType | photo_id |
+----------+-------------+----------+
| 1309878 | 4 | 43259857 |
+----------+-------------+----------+
This takes 6,73 sec.
So you can store these values in variables for later use.
Suppose we name @hot=1309878, @type=4, @photo=43259857.
Then you can use them in a second query like this:
select * from images im
where
    hotel_id>@hot OR (
        hotel_id=@hot and idImageType>@type OR (
            idImageType=@type and photo_id>@photo
        )
    )
order by hotel_id, idImageType, photo_id limit 10;
The first clause hotel_id>@hot takes all records after the current value of the first field of the scrolling index, but misses some records. To pick them up we add the OR clause, which takes, within the first index field's current value, all the remaining unread records.
This takes only 0,10 sec now.
But this query can be optimized (boolean distributivity):
select * from images im
where
    hotel_id>@hot OR (
        hotel_id=@hot and
        (idImageType>@type or idImageType=@type)
        and (idImageType>@type or photo_id>@photo)
    )
order by hotel_id, idImageType, photo_id limit 10;
which becomes:
select * from images im
where
    hotel_id>@hot OR (
        hotel_id=@hot and
        idImageType>=@type
        and (idImageType>@type or photo_id>@photo)
    )
order by hotel_id, idImageType, photo_id limit 10;
which becomes:
select * from images im
where
    (hotel_id>@hot OR hotel_id=@hot) and
    (hotel_id>@hot OR
        (idImageType>=@type and (idImageType>@type or photo_id>@photo))
    )
order by hotel_id, idImageType, photo_id limit 10;
which becomes:
select * from images im
where
    hotel_id>=@hot and
    (hotel_id>@hot OR
        (idImageType>=@type and (idImageType>@type or photo_id>@photo))
    )
order by hotel_id, idImageType, photo_id limit 10;
Is this the same data we would get with the LIMIT/OFFSET approach?
For a quick, non-exhaustive test, run:
select im.* from images im inner join (
select photo_id from images order by hotel_id, idImageType, photo_id limit 20000000,10
) k
on im.photo_id=k.photo_id
order by im.hotel_id, im.idImageType, im.photo_id;
This takes 6,56 sec and the data is the same as in the query above.
So the test is positive.
With this solution you spend 6,73 sec only the first time you need to seek to the first page to read (but not if you read everything from the start, as I do).
To read all other pages you need only 0,10 sec, a very good result.
Thanks to Rick for his hint about a solution based on storing the last page read.
Conclusion
With solution 1 you don't have any extra field and each page takes 3,5 sec.
With solution 2 you have the extra field and need a server with a lot of memory (32 GB tested) for the 150 sec update, but then you read each page in 0,000 sec.
With solution 3 you don't have any extra field but have to store a pointer to the last page read, and if you do not start reading from the first page you have to spend 6,73 sec for the first page. After that you spend only 0,10 sec on every other page.
Best regards
Edit 3
Solution 3 is exactly the one suggested by Rick. I'm sorry: in my previous solution 3 I had made a mistake, and when I coded the right solution I applied some boolean rules like the distributive property, and in the end I got the same solution as Rick!
regards
You can use some of these:
Switch the engine to InnoDB; on update it locks only the affected rows, not the whole table.
Create a temp table with photo_id and the correct orderid and then update your table from this temp table (a sketch of building it follows the query below):
update images im, temp tp
set im.orderid = tp.orderid
where im.photo_id = tp.photo_id
This will be the fastest way, and while you fill your temp table there are no locks on the primary table.
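A hedged sketch of how that temp table could be built, reusing the @orderid trick from the question (the name temp matches the update above):
SET @orderid = 0;
CREATE TEMPORARY TABLE temp (PRIMARY KEY (photo_id)) ENGINE=InnoDB AS
SELECT photo_id, (@orderid := @orderid + 1) AS orderid
FROM images
ORDER BY hotel_id, idImageType;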
You can also drop the indexes before the mass update and recreate them afterwards; maintaining them during the update takes a long time.
KEY `hotel_id` (`hotel_id`),
KEY `hotel_id_idImageType` (`hotel_id`,`idImageType`),
DROP the former; the latter takes care of any need for it. (This won't speed up the original query.)
"The problem is the performance: hang for several minutes!" What is the problem?
Other queries are blocked for several minutes? (InnoDB should help.)
You run this update often and it is annoying? (Why in the world??)
Something else?
This one index is costly while doing the Update:
KEY `orderid` (`orderid`)
DROP it and re-create it. (Don't bother dropping the rest.) Another reason for going with InnoDB is that these operations can be done (in 5.6) without copying the table over. (21M rows == long time if it has to copy the table!)
Why are you building a second Unique index (orderid) in addition to photo_id, which is already Unique? I ask this because there may be another way to solve the real problem that does not involve this time-consuming Update.
I have two more concrete suggestions, but I want to hear your answers first.
Edit Pagination, ordered by hotel_id, idImageType, photo_id:
It is possible to read the records in order by that triple. And even to "paginate" through them.
If you "left off" after ($hid, $type, $pid), here would be the 'next' 20 records:
WHERE hotel_id >= $hid
AND ( hotel_id > $hid
OR idImageType >= $type
AND ( idImageType > $type
OR photo_id > $pid
)
)
ORDER BY hotel_id, idImageType, photo_id
LIMIT 20
and have
INDEX(hotel_id, idImageType, photo_id)
This avoids the need for orderid and its time consuming Update.
It would be simpler to paginate one hotel_id at a time. Would that work?
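A hedged sketch of what per-hotel pagination could look like (the row-constructor comparison is plain SQL; on older versions the expanded AND/OR form above may use the index better):
SELECT *
FROM images
WHERE hotel_id = @hid
  AND (idImageType, photo_id) > (@type, @pid)
ORDER BY idImageType, photo_id
LIMIT 20;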
Edit 2 -- eliminate downtime
Since you are reloading the entire table periodically, do this when you reload:
CREATE TABLE New with the recommended index changes.
Load the data into New. (Be sure to avoid your 51-minute timeout; I don't know what is causing that.)
RENAME TABLE images TO old, New TO images;
DROP TABLE old;
That will avoid blocking the table for the load and for the schema changes. There will be a very short block for the atomic Step #3.
Plan on doing this procedure each time you reload your data.
Another benefit -- After step #2, you can test the New data to see if it looks OK.

MySQL Query Optimization for large table

I have a very large database of images and I need to run an update to increment the view count on the images. Every hour there are over one million unique rows to update. Right now it takes about an hour to run this query; is there any way to make it run faster?
I'm creating a memory table:
CREATE TABLE IF NOT EXISTS tmp_views_table (
    `key` VARCHAR(7) NOT NULL,
    views INT NOT NULL,
    PRIMARY KEY ( `key` )
) ENGINE = MEMORY
Then I insert 1000 views at a time using a loop that runs until all the views have been inserted into the memory table:
insert low_priority into tmp_views_table
values ('key', 'count'),('key', 'count'),('key', 'count'), etc...
Then I run an update on the actual table like this:
update images, tmp_views_table
set images.views = images.views+tmp_views_table.views
where images.key = tmp_views_table.key
This last update is the one that takes around an hour; the memory table part runs pretty quickly.
Is there a faster way I can do this update?
You are using InnoDB, right? Try general tuning of MySQL and the InnoDB engine to allow for faster data changes.
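A few of the settings usually checked first for write-heavy InnoDB work (a sketch; the values are placeholders to adapt):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- typical my.cnf adjustments:
--   innodb_buffer_pool_size = <a large share of available RAM>
--   innodb_flush_log_at_trx_commit = 2   (trades a little durability for speed)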
I suppose you have an index on the key field of the images table. You can also try your update query without the index on the memory table; in that case the query optimizer should choose a full table scan of the memory table.
I have never used joins with UPDATE statements, so I don't know exactly how it is executed, but maybe the JOIN is taking too long. Maybe you can post an EXPLAIN result of that query.
Here is what I have used in one project to do something similar: insert/update real-time data into a temp table and merge it into the aggregate table once a day. You can try whether it executes faster:
INSERT INTO st_views_agg (pageid,pagetype,day,count)
SELECT pageid,pagetype,DATE(`when`) AS day, COUNT(*) AS count FROM st_views_pending WHERE (pagetype=4) GROUP BY pageid,pagetype,day
ON DUPLICATE KEY UPDATE count=count+VALUES(count);

Optimisation of volatile data querying

I'm trying to solve a problem with latency on queries to a mysql-5.0 db.
The query itself is extremely simple: SELECT SUM(items) FROM tbl WHERE col = 'val'
There's an index on col and there are not more than 10000 values to sum in the worst case (mean of count(items) for all values of col would be around 10).
The table has up to 2M rows.
The query is run frequently enough that sometimes the execution time goes up to 10s, although 99% of them take << 1s
The query is not really cacheable: in almost every case, each query like this one will be followed by an insert to that table within the next minute, and showing old values is out of the question (billing information).
keys are good enough - ~100% hits
The result I'm looking for is every single query < 1s. Are there any ways to improve the select time without changes to the table? Alternatively, are there any interesting changes that would help to resolve the problem? I thought about simply having a table where the current sum is updated for every col right after every insert - but maybe there are better ways to do it?
Another approach is to add a summary table:
create table summary ( col varchar(10) primary key, items int not null );
and add some triggers to tbl so that:
on insert:
insert into summary values( new.col, new.items )
on duplicate key update items = items + new.items;
on delete:
update summary set summary.items = summary.items - old.items where summary.col = old.col
on update:
update summary set summary.items = summary.items - old.items where summary.col = old.col;
update summary set summary.items = summary.items + new.items where summary.col = new.col;
This will slow down your inserts, but allow you to hit a single row in the summary table for
select items from summary where col = 'val';
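As a concrete sketch, the insert and delete triggers described above could look like this (assuming the tbl(col, items) columns from the question):
DELIMITER |
CREATE TRIGGER tbl_ai AFTER INSERT ON tbl FOR EACH ROW
BEGIN
    INSERT INTO summary (col, items) VALUES (NEW.col, NEW.items)
    ON DUPLICATE KEY UPDATE items = items + NEW.items;
END |
CREATE TRIGGER tbl_ad AFTER DELETE ON tbl FOR EACH ROW
BEGIN
    UPDATE summary SET items = items - OLD.items WHERE col = OLD.col;
END |
DELIMITER ;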
The biggest problem with this is bootstrapping the values of the summary table. If you can take the application offline, you can easily initialise summary with values from tbl.
insert into summary select col, sum(items) from tbl group by col;
However, if you need to keep the service running, it is a lot more difficult. If you have a replica, you can stop replication, build the summary table, install the triggers, restart replication, then failover the service to using the replica, and then repeat the process on the retired primary.
If you cannot do that, then you could update the summary table one value of col at a time to reduce the impact:
lock tables tbl write, summary write;
delete from summary where col = 'val';
insert into summary select col, sum(items) from tbl where col = 'val';
unlock tables;
Or if you can tolerate a prolonged outage:
lock tables tbl write, summary write;
delete from summary;
insert into summary select col, sum(items) from tbl group by col;
unlock tables;
A covering index should help:
create index cix on tbl (col, items);
This will enable the sum to be performed without reading from the data file - which should be faster.
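You can confirm the index is covering by checking that EXPLAIN reports "Using index" in the Extra column:
EXPLAIN SELECT SUM(items) FROM tbl WHERE col = 'val';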
You should also track how effective your key-buffer is, and whether you need to allocate more memory for it. This can be done by polling the server status and watching the 'key%' values:
SHOW STATUS LIKE 'Key%';
MySQL Manual - show status
The ratio between key_read_requests (i.e. the number of index lookups) and key_reads (i.e. the number of requests that required index blocks to be read from disk) is important. The higher the number of disk reads, the slower the query will run. You can improve this by increasing the key buffer size (key_buffer_size) in the config file.
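For example (the size is a placeholder to adapt to your RAM):
SHOW VARIABLES LIKE 'key_buffer_size';
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;  -- or set key_buffer_size in my.cnf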