I am in the process of setting up a multi-server installation where some user data gets synchronized across servers depending on the user's priority rating (a number going from 0 to ...). The synchronization is intended to run as a lazy background job, with higher-priority users seeing their data synchronized sooner.
To that end I have a DB table
CREATE TABLE IF NOT EXISTS `spiegel` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`priority` tinyint(1) NOT NULL,
`cql` varchar(4096) NOT NULL,
PRIMARY KEY (`id`),
KEY `priority` (`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
where the INSERT/DELETE/UPDATE transactions of all users with a non-zero priority are recorded. A cron job periodically (every 10 minutes) runs a PHP script that takes a few entries (I want to limit this to N entries, say, 100) and deals with synchronizing the databases on the other servers.
What I want to do is
Pick up a maximum of N entries ordered by their priority
Remove those entries from the table
The first bit is OK:
SELECT * FROM `spiegel` ORDER BY priority LIMIT 100;
This is where my knowledge of SQL lets me down. It is not clear to me how I can efficiently remove the "picked up" entries. The best I have been able to do is to build a CSV list of the ids of the picked-up entries and then use a
DELETE FROM `spiegel` WHERE id IN (idcsvlist)
However, I suspect that this is a long-winded way of doing things. I'd be much obliged to anyone who can suggest a better approach.
This is the MySQL subquery LIMIT problem: MySQL does not allow LIMIT directly inside an IN subquery, so the limited SELECT has to be wrapped in a derived table (see the linked question).
Try this:
SELECT * FROM `spiegel` ORDER BY priority LIMIT 100;
Then
DELETE FROM `spiegel`
WHERE id IN (
    SELECT id FROM (
        SELECT id FROM `spiegel` ORDER BY priority LIMIT 100
    ) AS temp
);
This is the typical way to do it (taken from https://stackoverflow.com/a/4041332/3565972), although note that MySQL rejects LIMIT inside an IN subquery, which is why the derived-table wrapper shown above is needed:
DELETE FROM `spiegel` WHERE id IN (SELECT id FROM `spiegel` ORDER BY priority LIMIT 100);
A note for anyone running into this thread: to get the results I was after, you need
ORDER BY priority DESC
The DESC part was missing from the answers here.
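To pull the pieces together, here is a minimal sketch of the whole pick-up step with the DESC ordering applied to both statements. Wrapping the two statements in one transaction is my assumption about how the cron job runs them; if other processes can insert rows between the SELECT and the DELETE, deleting by the explicit list of ids you actually processed (as in the original question) remains the safer variant.
START TRANSACTION;

-- Read the highest-priority entries for processing in the PHP script
SELECT * FROM `spiegel` ORDER BY priority DESC LIMIT 100;

-- Remove the same entries; the derived table works around MySQL's
-- restriction on LIMIT inside an IN subquery
DELETE FROM `spiegel`
WHERE id IN (
    SELECT id FROM (
        SELECT id FROM `spiegel` ORDER BY priority DESC LIMIT 100
    ) AS temp
);

COMMIT;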
Related
Given this table in MySQL 5.6:
create table PlayerSession
(
id bigint auto_increment primary key,
lastActivity datetime not null,
player_id bigint null,
...
constraint FK4410E05525A98981
foreign key (player_id) references Player (id)
)
How can it possibly be that this query returns about 2000 rows instantly:
SELECT * FROM PlayerSession
WHERE player_id = ....
ORDER BY lastActivity DESC
but adding LIMIT 1 makes it take 4 seconds, even though all that should do is pick the first result?
Using EXPLAIN I found the only difference to be that without the limit, filesort is used. From what I gather, this should make it slower, not faster. The whole table contains about 2M rows.
Also, adding LIMIT 3 or anything higher gives the same performance as no limit.
And yes, I have since created an index on playerId, lastActivity, which, surprise surprise, makes it fast again. While that takes the immediate stress out of the situation (the server was rather overloaded), it doesn't really explain the mystery.
What specific version of 5.6? Please provide EXPLAIN FORMAT=JSON SELECT .... Please provide SHOW CREATE TABLE; we need to see the other indexes, plus datatypes.
INDEX(playerId, lastActivity) lets the query avoid "filesort".
A possible reason for the strange timings could be caching. Run each query twice to avoid that hiccup.
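For concreteness, a minimal sketch of the composite index the comments describe, using the column names from the CREATE TABLE above; the index name and the player id value are placeholders:
-- Composite index: the WHERE on player_id and the ORDER BY lastActivity
-- are both satisfied by the index, so no filesort is needed
ALTER TABLE PlayerSession
    ADD INDEX idx_player_lastActivity (player_id, lastActivity);

-- With that index in place, LIMIT 1 can stop after reading a single index entry
SELECT * FROM PlayerSession
WHERE player_id = 12345          -- placeholder id
ORDER BY lastActivity DESC
LIMIT 1;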
Performance problem updating a big MySQL MyISAM table to set a column in ascending order based on an index on the same table
My problem is that the server has only 4 GB of memory.
I have to run an update query like the one in this previously asked question.
Mine is this:
set @orderid = 0;
update images im
set im.orderid = (select @orderid := @orderid + 1)
ORDER BY im.hotel_id, im.idImageType;
On (im.hotel_id, im.idImageType) I have an ascending index.
On im.orderid I have an ascending index too.
The table has 21 million records and is a MyISAM table.
The table is this:
CREATE TABLE `images` (
`photo_id` int(11) NOT NULL,
`idImageType` int(11) NOT NULL,
`hotel_id` int(11) NOT NULL,
`room_id` int(11) DEFAULT NULL,
`url_original` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`url_max300` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`url_square60` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`archive` int(11) NOT NULL DEFAULT '0',
`orderid` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`photo_id`),
KEY `idImageType` (`idImageType`),
KEY `hotel_id` (`hotel_id`),
KEY `hotel_id_idImageType` (`hotel_id`,`idImageType`),
KEY `archive` (`archive`),
KEY `room_id` (`room_id`),
KEY `orderid` (`orderid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The problem is the performance: it hangs for several minutes!
The server disk gets busy too.
My question is: is there a better way to achieve the same result?
Should I partition the table, or do something else, to improve the performance?
I cannot modify the server hardware, but I can tune the MySQL server settings.
Best regards
Thanks to everybody; your answers helped me a lot. I think I have now found a better solution.
This problem involves two critical issues:
efficient pagination of a large table
updating a large table.
For efficient pagination of a large table I had found a solution based on a preliminary update of the table, but that ran into the 51 minutes needed for the update, which made my Java infrastructure time out (a spring-batch step).
Now, with your help, I have found two solutions for paginating the large table and one solution for updating it.
To reach this performance the server needs memory. I tried this solution on a development server with 32 GB of memory.
Common solution step
To paginate over the tuple of fields I needed, I had created one index:
KEY `hotel_id_idImageType` (`hotel_id`,`idImageType`)
To implement the new solutions we have to change this index by adding the primary key to the tail of the index, i.e. KEY hotel_id_idImageType (hotel_id, idImageType, primary key fields):
drop index hotel_id_idImageType on images;
create index hotelTypePhoto on images (hotel_id, idImageType, photo_id);
This is needed so the query avoids touching the table data and works only on the index file.
Suppose we want the 10 records after record 19000000.
(Note: in these answers the comma is used as the decimal separator, e.g. 1,5 sec means 1.5 seconds.)
Solution 1
This solution is very practical, does not need the extra orderid field, and does not require any update before paginating:
select * from images im inner join
(select photo_id from images
order by hotel_id, idImageType, photo_id
limit 19000000,10) k
on im.photo_id = k.photo_id;
Building the derived table k on my 21-million-record table takes only 1,5 sec, because it uses only the three fields in the hotelTypePhoto index, so it never has to access the table file and works only on the index file.
The order is the one originally required, (hotel_id, idImageType), because it is a prefix of (hotel_id, idImageType, photo_id).
The join takes no time, so even the first time a given page is requested it needs only 1,5 sec, and that is a good time if you only have to run it in a batch once every 3 months.
On the production server, with 4 GB of memory, the same query takes 3,5 sec.
Partitioning the table does not help to improve the performance.
If the server has it in cache the time goes down, and if you use a JDBC prepared statement the time goes down too (I suppose).
If you have to use it often, it has the advantage that it does not care whether the data changes.
Solution 2
This solution needs the extra orderid field and requires running the orderid update once per batch import; the data must not change until the next batch import.
After that you can paginate over the table in 0,000 sec.
set @orderid = 0;
update images im inner join (
    select photo_id, (@orderid := @orderid + 1) as newOrder
    from images order by hotel_id, idImageType, photo_id
) k
on im.photo_id = k.photo_id
set im.orderid = k.newOrder;
The derived table k is built almost as fast as in the first solution.
The whole update takes only 150,551 sec, much better than 51 minutes! (150 s vs 3060 s)
After this update in the batch you can paginate with:
select * from images im where orderid between 19000000 and 19000010;
or better
select * from images im where orderid >= 19000000 and orderid < 19000010;
This takes 0,000 sec to execute, the first time and every time after.
Edit after Rick's comment
Solution 3
This solution avoids the extra field and the use of OFFSET, but it requires remembering the last page read, as in this solution.
It is a fast solution and can work on the online production server using only 4 GB of memory.
Suppose you need to read the ten records after record 20000000.
There are two scenarios to take care of:
you can read from the first record up to record 20000000 if you need all of them, as I do, updating a variable to remember the last page read;
you only have to read the 10 records after record 20000000.
In the second scenario you have to run a preliminary query to find the starting point:
select hotel_id, idImageType, photo_id
from images im
order by hotel_id, idImageType, photo_id limit 20000000,1
It gives me:
+----------+-------------+----------+
| hotel_id | idImageType | photo_id |
+----------+-------------+----------+
| 1309878 | 4 | 43259857 |
+----------+-------------+----------+
This takes 6,73 sec.
So you can store these values in variables for the next step.
Suppose we set @hot=1309878, @type=4, @photo=43259857.
Then you can use them in a second query like this:
select * from images im
where
    hotel_id>@hot OR (
        hotel_id=@hot and (
            idImageType>@type OR (
                idImageType=@type and photo_id>@photo
            )
        )
    )
order by hotel_id, idImageType, photo_id limit 10;
The first clause, hotel_id>@hot, picks up all the records after the current value of the first index field, but on its own it misses some records. To pick those up we add the OR branch, which selects, within the current value of the first field, all the records not yet read.
This now takes only 0,10 sec.
But the query can be optimized further (boolean distributive law):
select * from images im
where
    hotel_id>@hot OR (
        hotel_id=@hot and
        (idImageType>@type or idImageType=@type) and
        (idImageType>@type or photo_id>@photo)
    )
order by hotel_id, idImageType, photo_id limit 10;
which becomes:
select * from images im
where
    hotel_id>@hot OR (
        hotel_id=@hot and
        idImageType>=@type and
        (idImageType>@type or photo_id>@photo)
    )
order by hotel_id, idImageType, photo_id limit 10;
which becomes:
select * from images im
where
    (hotel_id>@hot OR hotel_id=@hot) and
    (hotel_id>@hot OR
        (idImageType>=@type and (idImageType>@type or photo_id>@photo))
    )
order by hotel_id, idImageType, photo_id limit 10;
which becomes:
select * from images im
where
    hotel_id>=@hot and
    (hotel_id>@hot OR
        (idImageType>=@type and (idImageType>@type or photo_id>@photo))
    )
order by hotel_id, idImageType, photo_id limit 10;
Does this return the same data we would get with the LIMIT offset?
For a quick, non-exhaustive test, run:
select im.* from images im inner join (
select photo_id from images order by hotel_id, idImageType, photo_id limit 20000000,10
) k
on im.photo_id=k.photo_id
order by im.hotel_id, im.idImageType, im.photo_id;
This takes 6,56 sec and returns the same data as the query above.
So the test is positive.
With this solution you spend 6,73 sec only the first time, when you need to seek to the page you start reading from (not at all if you read everything from the beginning, as I do).
Reading every other page takes only 0,10 sec, a very good result.
Thanks to Rick for his hint about a solution based on storing the last page read.
Conclusion
With solution 1 you have no extra field and every page takes 3,5 sec.
With solution 2 you have the extra field and need a server with a lot of memory (32 GB tested) for the 150-sec update, but afterwards you read each page in 0,000 sec.
With solution 3 you have no extra field, but you have to store a pointer to the last page read, and if you do not start reading from the first page you have to spend 6,73 sec on the first page. After that you spend only 0,10 sec on every other page.
Best regards
Edit 3
Solution 3 is exactly what Rick suggested. I am sorry; in my previous solution 3 I had made a mistake, and when I coded the correct solution and then applied some boolean rules such as the distributive law, I ended up with the same solution as Rick!
Regards
You can use some of these suggestions:
Switch the engine to InnoDB; it locks only the rows being updated, not the whole table.
Create a temporary table temp with photo_id and the correct orderid, and then update your table from it (a sketch of filling it follows the update below):
update images im, temp tp
set im.orderid = tp.orderid
where im.photo_id = tp.photo_id
This will be the fastest way, and while you are filling your temp table there are no locks on the primary table.
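A rough sketch of filling that temp table, assuming the new orderid should follow the (hotel_id, idImageType, photo_id) order discussed elsewhere in this thread; the temp table name matches the UPDATE above:
-- Number all photos in (hotel_id, idImageType, photo_id) order into a temp table
set @orderid = 0;

CREATE TEMPORARY TABLE temp
SELECT photo_id, (@orderid := @orderid + 1) AS orderid
FROM images
ORDER BY hotel_id, idImageType, photo_id;

-- A primary key on photo_id keeps the join in the UPDATE above fast
ALTER TABLE temp ADD PRIMARY KEY (photo_id);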
You can drop the indexes before the mass update and re-create them afterwards; maintaining the indexes during every single-row update is what takes a long time.
KEY `hotel_id` (`hotel_id`),
KEY `hotel_id_idImageType` (`hotel_id`,`idImageType`),
DROP the former; the latter takes care of any need for it. (This won't speed up the original query.)
"The problem is the performance: hang for several minutes!" What is the problem?
Other queries are blocked for several minutes? (InnoDB should help.)
You run this update often and it is annoying? (Why in the world??)
Something else?
This one index is costly while doing the Update:
KEY `orderid` (`orderid`)
DROP it and re-create it. (Don't bother dropping the rest.) Another reason for going with InnoDB is that these operations can be done (in 5.6) without copying the table over. (21M rows == long time if it has to copy the table!)
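A minimal sketch of that drop-and-recreate step, using the index name from the CREATE TABLE above:
-- Drop the index that would otherwise be maintained row by row during the update
ALTER TABLE images DROP INDEX orderid;

-- ... run the big UPDATE that renumbers orderid here ...

-- Re-create the index once, after the update has finished
ALTER TABLE images ADD INDEX orderid (orderid);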
Why are you building a second Unique index (orderid) in addition to photo_id, which is already Unique? I ask this because there may be another way to solve the real problem that does not involve this time-consuming Update.
I have two more concrete suggestions, but I want to hear your answers first.
Edit Pagination, ordered by hotel_id, idImageType, photo_id:
It is possible to read the records in order by that triple. And even to "paginate" through them.
If you "left off" after ($hid, $type, $pid), here would be the 'next' 20 records:
WHERE hotel_id >= $hid
AND ( hotel_id > $hid
OR idImageType >= $type
AND ( idImageType > $type
OR photo_id > $pid
)
)
ORDER BY hotel_id, idImageType, photo_id
LIMIT 20
and have
INDEX(hotel_id, idImageType, photo_id)
This avoids the need for orderid and its time consuming Update.
It would be simpler to paginate one hotel_id at a time. Would that work?
Edit 2 -- eliminate downtime
Since you are reloading the entire table periodically, do this when you reload:
CREATE TABLE New with the recommended index changes.
Load the data into New. (Be sure to avoid your 51-minute timeout; I don't know what is causing that.)
RENAME TABLE images TO old, New TO images;
DROP TABLE old;
That will avoid blocking the table for the load and for the schema changes. There will be a very short block for the atomic Step #3.
Plan on doing this procedure each time you reload your data.
Another benefit -- After step #2, you can test the New data to see if it looks OK.
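As a rough sketch of those steps, assuming the index changes recommended earlier in this thread (how you load the data in step 2 depends on your import process):
-- 1. New table with the recommended index changes
CREATE TABLE New LIKE images;
ALTER TABLE New
    DROP INDEX hotel_id,
    DROP INDEX hotel_id_idImageType,
    ADD INDEX hotelTypePhoto (hotel_id, idImageType, photo_id);

-- 2. Load the data into New (LOAD DATA INFILE, INSERT ... SELECT, etc.)

-- 3. Atomic swap: readers see either the old or the new table, never a half-loaded one
RENAME TABLE images TO old, New TO images;

-- 4. Drop the old copy
DROP TABLE old;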
I need to store game user profiles in a MySQL InnoDB table. There can be millions of them, and a profile record consists of 200+ int32 values plus several other fields of in-game data. I also need to compute the user's ranking for each of those 200+ ints.
I plan to do it this way (for level 121, for example):
select count(distinct(val121)) from ldbrd where
val121<$current_usr_val order by val121 desc
or use a helper table {id, maxscore, minscore, count} to speed things up.
I see two problems here:
the table has 200+ int fields. Is that OK? Will such a table be fast?
MySQL has a limit of 64 indexes per table, so I cannot just create an index for every int field, but without indexes such searches will be slow.
But I'm pretty sure this task is common. I mean, there are many games with hundreds of levels; Candy Crush is the best-known, I think. So how do people solve it? Thanks!
---update
I've decided to store the profiles as a binary field (all those ints packed into one field) in one table and to create another table for rank evaluation: {id, user_id, level_no, score, timestamp}. But with 100000 users and 200 levels we get 20M rows in that table.
Yes, all the fields have indexes, but it's too much for a VDS with 2 cores at 2.4 GHz and 2 GB of RAM. For example, this query:
SELECT COUNT(*) FROM leaderboard WHERE level_no = 153
AND score < 10000 ORDER BY score DESC
finishes in 19 seconds! That's far too long; moreover there will be many requests per second, and every one will need a rank evaluation.
I have also thought about storing one table per level; maybe it will be faster that way (at least the tables will be much smaller).
Another thought: generate, say every hour, a top 100 for each level. That's pretty simple and fast:
SELECT score FROM leaderboard WHERE
level_no = 156 ORDER BY score DESC LIMIT 100
But everyone wants to know their exact ranking. We could add a 'rank' field, but evaluating the ranking over a 20M-row table would take a very long time and the server would be busy for everyone else during that query; I'd like to avoid that.
How do people usually do such things?
-----update 2
table schema:
CREATE TABLE IF NOT EXISTS `leaderboard` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`store_user_id` char(64) NOT NULL,
`level_no` int(11) NOT NULL,
`score` int(11) NOT NULL,
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `store_user_id` (`store_user_id`),
KEY `level_no` (`level_no`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`),
KEY `lev_sc` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (id)
PARTITIONS 10 */ AUTO_INCREMENT=19999831 ;
I've finally partitioned the table and added the (level_no, score) key. It was 19 sec before; now it's 0,017 s. Pretty cool!
Still there are questions:
should I partition the table in production? I've heard people have had problems with partitioning.
what's the best option: 500 tables, one per level, or one table 500 times bigger, as I have now? A table per level works a little faster (0,048 s vs 0,072 s), but that's while I only have a few tables. Will 500+ tables work at the same speed?
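For reference, a minimal sketch of the per-user rank lookup that the (level_no, score) composite key is meant to serve; the column names come from the schema above, and the literal values are placeholders:
-- Rank of a given score on level 153: one more than the number of better scores.
-- The (level_no, score) index lets this count run from the index alone.
SELECT COUNT(*) + 1 AS rank
FROM leaderboard
WHERE level_no = 153
  AND score > 10000;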
I have a table similar to the one below, which has a GUID as its key.
I am trying to display its contents using paging, with the GUID as the key, but I am running into the issue of how to do that.
CREATE TABLE `planetgeni`.`PostComment` (
`PostCommentId` CHAR(36) DEFAULT NULL,
`UserId` INT NOT NULL,
`CreatedAt` DATETIME NULL DEFAULT NULL ,
.
.
.
PRIMARY KEY (`PostCommentId`)
)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
If it were an int key, my stored procedure would look something like this, giving me the next 10 in descending order. But with a GUID I am not sure how to do that type of paging.
getPostComment( int lastPostID)
where PostCommentId< lastPostID order by PostCommentId desc LIMIT 10;
You can still do this with GUID's, but since GUID's are pseudorandom, when you ORDER BY postcommentid the order probably won't be what you want. You probably want something in approximately chronological order, and as you sort by the random GUID, the order will be repeatable, but random.
As @James comments, you could use another column for the sort order, but that column would need to be unique, or else you would either miss some duplicate rows (if you use >) or repeat values on the next page (if you use >=).
You'll just have to use LIMIT with OFFSET. MySQL optimizes LIMIT queries, so it quits examining rows once it finds the rows it needs for the page. But it also must examine all the preceding rows, so the query gets more expensive as you advance through higher-numbered pages.
A couple of ways to mitigate this:
Don't let your users view higher-numbered pages. Definitely don't give them a direct link to the "Last" page. Just give them a link to the "Next" page and hope they give up searching before they advance so far that the queries become very costly.
Fetch more than one page at a time, and cache it. For instance, instead of LIMIT 10, you could LIMIT 70 and then keep the results in memcached or something. Use application code to present 10 rows at a time, until the user advances through that set of rows. Then only if they go on to the 8th page, run another SQL query. Users typically don't search through more than a few pages, so the chance you'll have to run a second or a third query become very small.
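If an approximately chronological order is acceptable, here is a minimal sketch of keyset ("seek") pagination that sorts by CreatedAt and uses the GUID only as a unique tie-breaker; the supporting index and the @last... variables are assumptions, not part of the original table or procedure:
-- Assumed supporting index for the sort and the seek condition
ALTER TABLE PostComment ADD INDEX idx_created_id (CreatedAt, PostCommentId);

-- First page
SELECT * FROM PostComment
ORDER BY CreatedAt DESC, PostCommentId DESC
LIMIT 10;

-- Next page: pass in the CreatedAt and PostCommentId of the last row already shown
SELECT * FROM PostComment
WHERE CreatedAt < @lastCreatedAt
   OR (CreatedAt = @lastCreatedAt AND PostCommentId < @lastPostCommentId)
ORDER BY CreatedAt DESC, PostCommentId DESC
LIMIT 10;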
Change the column you use in the ORDER BY:
getPostComment( int lastPostID)
where PostCommentId< lastPostID order by CreatedAt,UserId desc LIMIT 10;
The following query is pretty simple. It selects the last 20 records from a messages table for use in a paging scenario. The first time this query is run, it takes from 15 to 30 seconds. Subsequent runs take less than a second (I expect some caching is involved). I am trying to determine why the first time takes so long.
Here's the query:
SELECT DISTINCT ID,List,`From`,Subject, UNIX_TIMESTAMP(MsgDate) AS FmtDate
FROM messages
WHERE List='general'
ORDER BY MsgDate
LIMIT 17290,20;
MySQL version: 4.0.26-log
Here's the table:
messages CREATE TABLE `messages` (
`ID` int(10) unsigned NOT NULL auto_increment,
`List` varchar(10) NOT NULL default '',
`MessageId` varchar(128) NOT NULL default '',
`From` varchar(128) NOT NULL default '',
`Subject` varchar(128) NOT NULL default '',
`MsgDate` datetime NOT NULL default '0000-00-00 00:00:00',
`TextBody` longtext NOT NULL,
`HtmlBody` longtext NOT NULL,
`Headers` text NOT NULL,
`UserID` int(10) unsigned default NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `List` (`List`,`MsgDate`,`MessageId`),
KEY `From` (`From`),
KEY `UserID` (`UserID`,`List`,`MsgDate`),
KEY `MsgDate` (`MsgDate`),
KEY `ListOnly` (`List`)
) TYPE=MyISAM ROW_FORMAT=DYNAMIC
Here's the explain:
table type possible_keys key key_len ref rows Extra
------ ------ ------------- -------- ------- ------ ------ --------------------------------------------
m ref List,ListOnly ListOnly 10 const 18002 Using where; Using temporary; Using filesort
Why is it using a filesort when I have indexes on all the relevant columns? I added the ListOnly index just to see if it would help. I had originally thought that the List index would handle both the list selection and the sorting on MsgDate, but it didn't. Now that I added the ListOnly index, that's the one it uses, but it still does a filesort on MsgDate, which is what I suspect is taking so long.
I tried using FORCE INDEX as follows:
SELECT DISTINCT ID,List,`From`,Subject, UNIX_TIMESTAMP(MsgDate) AS FmtDate
FROM messages
FORCE INDEX (List)
WHERE List='general'
ORDER BY MsgDate
LIMIT 17290,20;
This does seem to force MySQL to use the index, but it doesn't speed up the query at all.
Here's the explain for this query:
table type possible_keys key key_len ref rows Extra
------ ------ ------------- ------ ------- ------ ------ ----------------------------
m ref List List 10 const 18002 Using where; Using temporary
UPDATES:
I removed DISTINCT from the query. It didn't help performance at all.
I removed the UNIX_TIMESTAMP call. It also didn't affect performance.
I made a special case in my PHP code so that if I detect the user is looking at the last page of results, I add a WHERE clause that returns only the last 7 days of results:
SELECT ID, List, `From`, Subject, MsgDate
FROM messages
WHERE MsgDate>='2009-11-15'
ORDER BY MsgDate DESC
LIMIT 20
This is a lot faster. However, as soon as I navigate to another page of results, it must use the old SQL and takes a very long time to execute. I can't think of a practical, realistic way to do this for all pages. Also, doing this special case makes my PHP code more complex.
Strangely, only the first time the original query is run takes a long time. Subsequent runs of either the same query or a query showing a different page of results (i.e., only the LIMIT clause changes) are very fast. The query slows down again if it has not been run for about 5 minutes.
SOLUTION:
The best solution I came up with is based on Jason Orendorff and Juliet's idea.
First, I determine if the current page is closer to the beginning or end of the total number of pages. If it's closer to the end, I use ORDER BY MsgDate DESC, apply an appropriate limit, then reverse the order of the returned records.
This makes retrieving pages close to the beginning or end of the resultset much faster (first time now takes 4-5 seconds instead of 15-30). If the user wants to navigate to a page near the middle (currently around the 430th page), then the speed might drop back down. But that would be a rare case.
So while there seems to be no perfect solution, this is much better than it was for most cases.
Thank you, Jason and Juliet.
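As a sketch of the near-the-end case, with made-up numbers: suppose the 'general' list has 17310 matching rows and the user asks for rows 17290..17309. Instead of an offset of 17290 from the front, the offset from the back is 17310 - 17290 - 20 = 0, and the PHP code reverses the returned rows before displaying them:
-- Read the page from the end of the result set instead of the front;
-- the PHP script reverses these 20 rows before rendering the page
SELECT DISTINCT ID, List, `From`, Subject, UNIX_TIMESTAMP(MsgDate) AS FmtDate
FROM messages
WHERE List='general'
ORDER BY MsgDate DESC
LIMIT 0,20;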
Instead of ORDER BY MsgDate LIMIT 17290,20, try ORDER BY MsgDate DESC LIMIT 20.
Of course the results will come out in the reverse order, but that should be easy to deal with.
EDIT: Do your MessageId values always increase with time? Are they unique?
If so, I would make an index:
UNIQUE KEY `ListMsgId` ( `List`, `MessageId` )
and query based on the message ids rather than the date when possible.
-- Most recent messages (in reverse order)
SELECT * FROM messages
WHERE List = 'general'
ORDER BY MessageId DESC
LIMIT 20
-- Previous page (in reverse order)
SELECT * FROM messages
WHERE List = 'general' AND MessageId < '15885830'
ORDER BY MessageId DESC
LIMIT 20
-- Next page
SELECT * FROM messages
WHERE List = 'general' AND MessageId > '15885829'
ORDER BY MessageId
LIMIT 20
I think you're also paying for having varchar columns where an int type would be a lot faster. For example, List could instead be a ListId that points to an entry in a separate table. You might want to try it out in a test database to see if that's really true; I'm not a MySQL expert.
You can drop the ListOnly key. The compound index List already contains all the information in it.
Your EXPLAIN for the List-indexed query looks much better, lacking the filesort. You may be able to get better real performance out of it by swapping the ORDER as suggested by Jason, and maybe losing the UNIX_TIMESTAMP call (you can do that in the application layer, or just use Unix timestamps stored as INTEGER in the schema).
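If you go the stored-integer route, a rough sketch (MsgEpoch is a made-up column name, not something already in the schema):
-- Store the Unix timestamp alongside the DATETIME so the query can skip
-- the per-row UNIX_TIMESTAMP() call, and index it for sorting
ALTER TABLE messages ADD COLUMN MsgEpoch INT UNSIGNED NOT NULL DEFAULT 0;
UPDATE messages SET MsgEpoch = UNIX_TIMESTAMP(MsgDate);
ALTER TABLE messages ADD INDEX MsgEpoch (MsgEpoch);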
What version of MySQL are you using? Some of the older versions used the LIMIT clause as a post-process filter (meaning the server gathers all the records matching the query but only returns the 20 you requested).
You can see from your EXPLAIN that 18002 rows are being examined even though you are only showing 20 of them. Is there any way to adjust your selection criteria to identify the 20 rows you want to return, rather than reading 18000 rows and only showing 20 of them?