I recently switched servers because Joyent is ending their service soon. But queries on RimuHosting seem to take significantly longer (2-6 times as long), and there's a huge variance in behavior: most queries run in .02 seconds or less, and then sometimes those same exact queries take .5 seconds or more. Both servers were running MySQL and PHP (similar versions, but not the exact same numbers).
The CPU is 20-40% idle. Most of the memory is in use, but they tell me that's normal, and tech support says it's not swapping:
Here's what it looks like right now (though memory usage will eventually climb to near max, like last time):
Mem: 1513548k total, 1229316k used, 284232k free, 63540k buffers
Swap: 131064k total, 0k used, 131064k free, 981420k cached
MySQL max_connections is set to 400.
So, why am I getting these super slow queries, sometimes?
Here is an example of a query that sometimes takes .01 seconds and sometimes more than 1 second:
SELECT (!attacked AND (firstLoginDate > 1348703469)) AS protected,
       id, universe.uid, universe.name AS obj_name, top, left, guilds.name AS alliance,
       rotate, what, player_data.first, player_data.last,
       aid AS gid, (aid=1892 AND aid>0) AS am,
       fleet LIKE '%Turret%' AS turret,
       startLeft, startTop, endLeft, endTop, duration, startTime, movetype,
       moving, speed, defend, hp, lastAttack>1349740269 AS ra
FROM universe
LEFT JOIN player_data ON universe.uid=player_data.uid
LEFT JOIN guilds ON aid=guilds.gid
WHERE (sector='21_82' OR sector='22_82' OR sector='21_83' OR sector='22_83')
   OR (universe.uid=1568425485 AND (upgrading=1 OR building=1))
Yes, I do have indexes on all the appropriate columns. And all three tables above are InnoDB, which means they use row-level locking, not table-level locking.
But this is interesting (new server):
Innodb_row_lock_time_avg   400    (the average time to acquire a row lock, in milliseconds)
Innodb_row_lock_time_max   4,010  (the maximum time to acquire a row lock, in milliseconds)
Innodb_row_lock_waits      31     (the number of times a row lock had to be waited for)
Why does it take so long to acquire a row lock?
My old server was able to acquire row locks faster:
Innodb_row_lock_time_avg   26     (the average time to acquire a row lock, in milliseconds)
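(These counters come from the server status variables; a quick way to pull them on both boxes:)
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';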
Here's the new server:
Opened_tables        5,500 (in just 2 hours)  (the number of tables that have been opened; if this is big, your table_cache value is probably too small)
table_cache          256
Table_locks_waited   3,302 (in just 2 hours)
Here is the old server:
Opened_tables   420
table_cache     64
Does that make sense? If I increase the table_cache, will that alleviate things?
Note: I have 1.5 GB of RAM on this server.
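(A minimal sketch of checking and raising the cache at runtime; note the variable is table_cache through MySQL 5.0 and was renamed table_open_cache in 5.1.3+, and the change should also be persisted in my.cnf:)
SHOW GLOBAL STATUS LIKE 'Opened_tables';   -- if this keeps climbing, the cache is too small
SET GLOBAL table_cache = 1024;             -- use table_open_cache on 5.1.3 and later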
Here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE universe index_merge uidwhat,uid,uidtopleft,upgrading,building,sector sector,uid 38,8 NULL 116 Using sort_union(sector,uid); Using where
1 SIMPLE player_data ref mainIndex mainIndex 8 jill_sp.universe.uid 1
1 SIMPLE guilds eq_ref PRIMARY PRIMARY 8 jill_sp.player_data.aid 1
I have to create a table that assigns a user ID and a product ID to some data (modeling two one-to-many relationships). I will run a lot of queries like:
select * from table where userid = x;
The first thing I am interested in is how big the table can get before the query time becomes noticeable (let's say it takes more than 1 second).
Also, how can this be optimized?
I know that this might depend on the implementation. I will use MySQL for this specific project, but I am interested in more general answers as well.
It all depends on the horsepower of your machine. To make that query more efficient, create an index on userid.
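(A minimal sketch, with a hypothetical table name:)
CREATE INDEX idx_userid ON user_product_data (userid);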
how big should the table get before the query starts to be observable (let's say it takes more than 1 second)
There are too many factors to deterministically measure run time. CPU speed, memory, I/O speed, etc. are just some of the external factors.
how this can be optimized?
That's more straightforward. If there is an index on userid, the query will likely do an index seek, which is about as fast as you can get as far as finding the record. If userid is the clustered index, it will be faster still, because the engine won't have to use the position from the index to find the record in the data pages - the data is physically organized as part of the index.
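(A sketch of the clustered variant in InnoDB terms - the primary key is the clustered index, so leading with userid stores each user's rows together; all names here are hypothetical:)
CREATE TABLE user_product_data (
    userid    INT NOT NULL,
    productid INT NOT NULL,
    data      VARCHAR(255),
    PRIMARY KEY (userid, productid)   -- InnoDB clusters rows on the primary key
) ENGINE=InnoDB;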
let's say it takes more than 1 second
With an index on userid, MySQL will find the correct row in O(log n) in the worst case. How many "seconds" that is depends on the performance of your machine.
It is impossible to give you an exact number without knowing how long one operation takes.
As an example: assume you have a database with 4 records. Finding one requires 2 operations in the worst case. Every time you double your data, one more operation is required.
For example:
# records | # operations to find entry in worst case
        2 |  1
        4 |  2
        8 |  3
       16 |  4
      ... |
    4,096 | 12
      ... |
     ~1 B | 30
     ~2 B | 31
So, with a huge number of records, the time remains almost constant: for 1 billion records you would need to perform ~30 operations, and for 2 billion records, 31 operations.
So, let's say your query executes in 0.001 seconds for 4,096 entries (12 operations). It would then take around (0.001 / 12 * 30 =) 0.0025 seconds for 1 billion records.
Important side note: this considers only the runtime complexity of the binary search, but it shows how the complexity scales.
In a nutshell: your database would be unimpressed by a single query on an indexed value. However, if you run a heavy number of those queries at the same time, the time increases, of course.
I have a table with a few million rows. Currently, I'm working my way through them 10,000 at a time by doing this:
for (my $ival = 0; $ival < $c_count; $ival += 10000)
{
    my %record;
    my $qry = $dbh->prepare(
        "select * from big_table where address not like '%-XX%' limit $ival, 10000");
    $qry->execute();
    # bind each selected column to the like-named (lowercased) entry in %record
    $qry->bind_columns( \( @record{ @{ $qry->{NAME_lc} } } ) );
    while (my $record = $qry->fetch) {
        this_is_where_the_magic_happens($record);
    }
}
I did some benchmarking and found that the prepare/execute part, while initially fast, slows down considerably after multiple 10,000-row batches. Is this a boneheaded way to write this? I just know that if I try to select everything in one go, the query takes forever.
Here's some snippets from the log:
(Thu Aug 21 12:51:59 2014) Processing records 0 to 10000
SQL Select => 1 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)
(Thu Aug 21 12:52:13 2014) Processing records 10000 to 20000
SQL Select => 1 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(Thu Aug 21 12:52:25 2014) Processing records 20000 to 30000
SQL Select => 2 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(Thu Aug 21 12:52:40 2014) Processing records 30000 to 40000
SQL Select => 5 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(Thu Aug 21 12:52:57 2014) Processing records 40000 to 50000
SQL Select => 13 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)
...
(Thu Aug 21 14:33:19 2014) Processing records 650000 to 660000
SQL Select => 134 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)
(Thu Aug 21 14:35:50 2014) Processing records 660000 to 670000
SQL Select => 138 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(Thu Aug 21 14:38:27 2014) Processing records 670000 to 680000
SQL Select => 137 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(Thu Aug 21 14:41:00 2014) Processing records 680000 to 690000
SQL Select => 134 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Would it be faster to do this some other way? Should I remove the WHERE clause and just throw out the results I don't want in the loop?
Thanks for the help.
Others have made useful suggestions. I'll just add a few thoughts that come to mind...
Firstly, see my old but still very relevant Advanced DBI Tutorial, specifically page 80, which addresses paging through a large result set - a situation similar to yours. It also covers profiling and fetchrow_hashref vs bind_columns.
Consider creating a temporary table with an auto-increment field, loading it with the data you want via an INSERT ... SELECT ... statement, then building/enabling an index on the auto-increment field (which will be faster than loading the data with the index already enabled), then selecting ranges of rows from that temporary table using the key value. That will be very fast for fetching, but there's an up-front cost to build the temporary table.
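(A hedged sketch of that approach; the payload columns are hypothetical, and for simplicity this builds the key up front rather than enabling it after the load:)
CREATE TEMPORARY TABLE work_queue (
    seq     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    id      INT UNSIGNED NOT NULL,     -- hypothetical payload columns
    address VARCHAR(255)
);
INSERT INTO work_queue (id, address)
    SELECT id, address FROM big_table WHERE address NOT LIKE '%-XX%';
-- then fetch fixed-size slices by key range; no rows are scanned and discarded:
SELECT * FROM work_queue WHERE seq >     0 AND seq <= 10000;
SELECT * FROM work_queue WHERE seq > 10000 AND seq <= 20000;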
Consider enabling mysql_use_result in DBD::mysql. Rather than load all the rows into memory within the driver, the driver will start to return rows to the application as they stream in from the server. This reduces latency and memory use but comes at the cost of holding a lock on the table.
You could combine using mysql_use_result with my previous suggestion, but it might be simpler to combine it with using SELECT SQL_BUFFER_RESULT .... Both would avoid the lock problem (which might not be a problem for you anyway). Per the docs, SQL_BUFFER_RESULT "forces the result to be put into a temporary table". (Trivia: I think I suggested SQL_BUFFER_RESULT to Monty many moons ago.)
The problem is that you're running multiple queries. Your dataset may also change between queries, so you may miss rows or see duplicate rows; inserts or deletions on the rows you're searching will affect this.
The reason the first batches go fast is that the DB stops scanning once it has satisfied the LIMIT window. It isn't reading all the rows matching your query, and thus runs faster. The query isn't "getting slower"; it's just doing more of the work, over and over and over - reading past the first 10,000 rows, then the first 20,000 rows, then the first 30,000. You've written a Schlemiel the painter's database query. (http://www.joelonsoftware.com/articles/fog0000000319.html)
You should run the query without a limit and iterate over the resultset. This will ensure data integrity. You may also want to look into using where clauses that can take advantage of database indices to get a faster response to your query.
What @Oesor says is correct: you don't want to run multiple queries (unless you know you are the only one who can modify this table).
However, you have other issues.
You don't have an ORDER BY clause. Without one, your LIMIT is meaningless, since you won't necessarily get the same order each time.
Consider using LIMIT n OFFSET m rather than LIMIT m,n - it's supported by PostgreSQL and is clearer to users of other database systems.
Decide whether you are using bind_columns or returning a row reference - I can't see why you are trying to do both. Perhaps fetchrow_hashref is what you want.
Oh - and be particularly careful about using bind_columns with SELECT *. What happens if you add a new column to the table? What if that new column is called ival?
OK, now let's look at what you're doing. It's not obvious, actually, since ...magic_happens isn't a terribly descriptive name. If it's updates, then try to do it all in the database. MySQL isn't as capable as PostgreSQL, but you're still better off doing things like batch updates within the RDBMS than shuffling large amounts of data back and forth.
If not, and you want to batch or "page" through the result set, then:
1. Order by the primary key (or some other unique column set)
2. Keep track of the final key value in that batch
3. Use that in a "greater than" test in the next query
This will allow you to use an index (if you have one) on the relevant unique columns and should let the database skip forward to row #30000 without having to read and discard 29,999 other rows first.
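(A minimal sketch of that keyset approach, assuming big_table has an integer primary key id:)
-- first batch:
SELECT * FROM big_table
WHERE address NOT LIKE '%-XX%'
ORDER BY id
LIMIT 10000;
-- every later batch: seek past the last key seen, instead of using an offset
SELECT * FROM big_table
WHERE address NOT LIKE '%-XX%' AND id > ?   -- ? = highest id from the previous batch
ORDER BY id
LIMIT 10000;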
According to your benchmarking numbers, the CPU times are very small, so you need to profile your DBI layer. Try running your code with DBI::Profile to collect those statistics.
You probably also need to define an index on your table to avoid a full scan for that query.
I am running MySQL queries using the HeidiSQL editor. When it tells me my query time, it will sometimes also include a network time:
Duration for 1 query: 1.194 sec. (+ 10.078 sec. network)
But it can't really be the network, since everything is on my own computer, can it? Is that extra time something that would disappear with another setup, or do I need to improve my query performance the usual way (rewriting/reworking)? It's hard to improve performance on a query when I'm not even sure what's causing the poor performance.
EDIT: Profiling info
I used this neat profiling SQL: http://www.mysqlperformanceblog.com/2012/02/20/how-to-convert-show-profiles-into-a-real-profile/
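(For reference, the mechanism underneath is MySQL's SHOW PROFILE interface, roughly:)
SET profiling = 1;            -- enable per-query profiling for this session
SELECT COUNT(*) FROM my_table_with_100_thousand_rows;
SHOW PROFILES;                -- lists recent queries with their Query_IDs
SHOW PROFILE FOR QUERY 1;     -- per-state timing for the chosen query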
Query 1:
Select count(*) from my_table_with_100_thousand_rows;
"Duration for 1 query: 0.390 sec."
(This one did not show any network time, but almost .4 seconds for a simple count(*) seems a lot.)
STATE Total_R Pct_R Calls R/Call
Sending data 0.392060 35.84 1 0.3920600000
freeing items 0.000214 0.02 1 0.0002140000
starting 0.000070 0.01 1 0.0000700000
Opening tables 0.000031 0.00 1 0.0000310000
statistics 0.000024 0.00 1 0.0000240000
init 0.000020 0.00 1 0.0000200000
(shorter times not included)
Query 2:
select * from 4 tables with many rows, joined by primary_key-foreign_key or indexed column.
"Duration for 1 query: 0.156 sec. (+ 10.140 sec. network)" (the times below add up to more than the total?)
STATE Total_R Pct_R Calls R/Call
Sending data 16.424433 NULL 1 16.4244330000
freeing items 0.000390 NULL 1 0.0003900000
starting 0.000116 NULL 1 0.0001160000
statistics 0.000054 NULL 1 0.0000540000
Opening tables 0.000050 NULL 1 0.0000500000
init 0.000046 NULL 1 0.0000460000
preparing 0.000033 NULL 1 0.0000330000
optimizing 0.000028 NULL 1 0.0000280000
(shorter times not included)
Query 3:
Same as query 2, but with count(*) instead of select *.
"Duration for 1 query: 10.047 sec."
STATE Total_R Pct_R Calls R/Call
Sending data 10.050007 NULL 1 10.0500070000
(shorter times not included)
It seems to me that it includes the network time in the "duration" when it has to display a lot of rows, but this does NOT mean I can subtract that time when it doesn't have to display the rows - it's real query time. Does this seem right?
Old question!
I'm pretty sure Heidi counts as "network time" the elapsed time from receipt of the first response packet over the network to receipt of the last response packet in the result set.
So, for your SELECT COUNT(*) query, the first packet comes back right away and declares that there's a single column containing an integer.
The rest of that result set comes when the query engine is done counting the rows. So Heidi's so-called "network time" is the time to count the rows. That's practically instantaneous for MyISAM, and takes a while for InnoDB.
For your SELECT tons of columns FROM complex join the same thing applies. The first packet arrives when the query planner has figured out what columns will be in the result set. The last packet arrives when all that data has finally been transferred to Heidi over your computer's internal loopback (localhost) network.
It's like what you see in your browser's devtools: the query time is analogous to the "time to first byte", and the "network time" is the time to deliver the rest of the result. Time to first byte is the query parsing/planning time plus the time to gather enough information to send the result set metadata. Network time is the time to get the rest. If the query planner can stream the rows to you directly from table storage, you'll see a high proportion of network time. If, on the other hand, it has to crunch the data (for example with ORDER BY), you'll see a higher proportion of query time. But don't overthink this stuff: MariaDB and MySQL are very complex, with layers of caching and fetching, and the way they satisfy queries is sometimes hard to figure out.
I have a table created like this:
CREATE TABLE rh857_omf.picture (
    MeasNr       TINYINT UNSIGNED,
    ExperimentNr TINYINT UNSIGNED,
    Time         INT,
    SequenceNr   SMALLINT UNSIGNED,
    Picture      MEDIUMBLOB,
    PRIMARY KEY (MeasNr, ExperimentNr, Time, SequenceNr)
);
The first four columns, MeasNr, ExperimentNr, Time and SequenceNr, are the identifiers and form the primary key. The fifth column, Picture, is the payload: an 800x800-pixel, 8-bit greyscale picture (size = 625 kB).
If I want to load a picture, I use the following command:
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2 AND
ExperimentNr = 3 AND SequenceNr = 150;
In MySQL Workbench, I see the duration and the fetch time when I run this command. For smaller tables (800 MB, 2376 entries, 640x480 pictures), it's very fast (<100 ms). With a bigger table (5800 MB, 9024 entries), it gets very slow (>9 s).
For instance, I run the following command (on the big table):
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2 AND
ExperimentNr = 3 AND SequenceNr = 1025 LIMIT 0, 1000;
it takes 5.2 / 3.9 seconds (duration / fetch) the first time. The same command run a second time takes 0.2 / 0.2 seconds. If I change the SequenceNr:
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2 AND
ExperimentNr = 3 AND SequenceNr = 977 LIMIT 0, 1000;
it's also very fast: 0.1 / 0.3 seconds.
But if I change the ExperimentNr, for instance:
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2
AND ExperimentNr = 4 AND SequenceNr = 1025 LIMIT 0, 1000;
it takes a long time: 4.4 / 5.9 seconds.
Does anybody know why the database behaves like that, and how I could improve the speed? Would it help if I created several smaller picture tables and split the load across them? By the way, I use MySQL 5.1.62 with MyISAM tables; I also tested InnoDB, which was even slower.
It would help if you could post the EXPLAIN for the query - mostly, the answers are in there (somewhere).
However, at a guess, I'd explain this behaviour by the fact that your primary key includes Time and your queries don't; therefore, they may make only partial use of the index. I'd guess the query plan uses the index to filter out records in the MeasNr and ExperimentNr range, and then scans for matching SequenceNr values. If there are many records matching the first two criteria, that could be quite slow.
The reason you see a speed-up the second time round is that the queries get cached; this is not hugely predictable, depending on load, cache size, etc.
Try creating an index that matches your WHERE clause, and see what EXPLAIN tells you.
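(A minimal sketch - since Time sits between ExperimentNr and SequenceNr in the primary key, an index matching the WHERE clause exactly would be something like:)
CREATE INDEX idx_meas_exp_seq ON rh857_omf.picture (MeasNr, ExperimentNr, SequenceNr);
EXPLAIN SELECT Picture FROM rh857_omf.picture
WHERE MeasNr = 2 AND ExperimentNr = 3 AND SequenceNr = 150;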
This MySQL query has been running for around 10 hours and has not finished. Something is horribly wrong.
There are two tables here, text and spam. spam stores the IDs of the spam entries in text that I want to delete.
DELETE FROM tname.text WHERE old_id IN (SELECT textid FROM spam);
spam has just 2 columns, both ints; 800K entries, with a file size of a few MB. Both ints together form the primary key.
text has 3 columns: id (primary key), text, and flags; around 1200K entries and around 2.1 GB in size (mostly spam).
The server is a quad Xeon with 2 GB of RAM (don't ask me why). Only Apache (why?) and mysqld are running. It's an old FreeBSD with MySQL 4.1.2 (don't ask me why).
Threads: 6 Questions: 188805 Slow queries: 318 Opens: 810 Flush tables: 1 Open tables: 157 Queries per second avg: 7.532
Mysql my.cnf:
[mysqld]
datadir=/usr/local/mysql
log-error=/usr/local/mysql/mysqld.err
pid-file=/usr/local/mysql/mysqld.pid
tmpdir=/var/tmp
innodb_data_home_dir =
innodb_log_files_in_group = 2
join_buffer_size=2M
key_buffer_size=32M
max_allowed_packet=1M
max_connections=800
myisam_sort_buffer_size=32M
query_cache_size=8M
read_buffer_size=2M
sort_buffer_size=2M
table_cache=256
skip-bdb
log-slow-queries = slow.log
long_query_time = 1
#skip-innodb
#default-table-type=innodb
innodb_data_file_path = /usr/local/mysql/ibdata1:10M:autoextend
innodb_log_group_home_dir = /usr/local/mysql/
innodb_buffer_pool_size = 128M
innodb_log_file_size = 16M
innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit=1
#innodb_additional_mem_pool_size=1M
#innodb_lock_wait_timeout=50
log-bin
server-id=201
[isamchk]
key_buffer_size=128M
read_buffer_size=128M
write_buffer_size=128M
sort_buffer_size=128M
[myisamchk]
key_buffer_size=128M
read_buffer_size=128M
write_buffer_size=128M
sort_buffer_size=128M
tmpdir=/var/tmp
The query is using just one CPU; top says 25% CPU time (so 1 core of 4).
real memory = 2146828288 (2047 MB)
avail memory = 2095534080 (1998 MB)
62 processes: 2 running, 60 sleeping
CPU states: 25.2% user, 0.0% nice, 1.6% system, 0.0% interrupt, 73.2% idle
Mem: 244M Active, 1430M Inact, 221M Wired, 75M Cache, 112M Buf, 31M Free
Swap: 4096M Total, 1996K Used, 4094M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11536 mysql 27 20 0 239M 224M kserel 3 441:16 94.29% mysqld
Any idea how to fix it?
In my experience, subqueries are often a cause of slow execution times in SQL statements, therefore I try to avoid them. Try this:
DELETE tname FROM tname INNER JOIN spam ON (tname.old_id = spam.textid);
Disclaimer: This query is not tested, make backups first! :-)
Your choice of where id in (select ...) will always perform poorly.
Instead, use a normal join which will be very efficient:
DELETE `text`
FROM spam
join `text` on `text`.old_id = spam.textid;
Note the selection from spam first, then the join to text, which will give the best performance.
Of course it takes a lot of time: the subquery is executed for every record, whereas with an INNER JOIN the whole query is executed only once.
Say the subquery takes 10 ms per record: for 50,000 records that's 50,000 * 10 ms ---> 8.333 minutes, at least! And don't forget the condition check and the deletion time...
But using a join, the query is executed only once:
DELETE t FROM tname.text t INNER JOIN (SELECT textid FROM spam) sq on t.old_id = sq.textid ;
Copy the rows that are not in spam from text to a new table. Then drop the text table and rename the created table. It's a good idea not to add any keys to the created table; add the keys after renaming.
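(A hedged sketch of that approach, using a LEFT JOIN anti-join that works even on MySQL 4.1; the column list is assumed from the description above:)
CREATE TABLE tname.text_new (
    id     INT NOT NULL,        -- same columns as text, but no keys yet
    old_id INT NOT NULL,
    `text` MEDIUMTEXT,
    flags  INT
);
INSERT INTO tname.text_new
    SELECT t.id, t.old_id, t.`text`, t.flags
    FROM tname.text t
    LEFT JOIN spam s ON s.textid = t.old_id
    WHERE s.textid IS NULL;     -- keeps only the non-spam rows
RENAME TABLE tname.text TO tname.text_old, tname.text_new TO tname.text;
ALTER TABLE tname.text ADD PRIMARY KEY (id);   -- add keys after the swap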
I think you might want to chunk the deletes down with a LIMIT, and you might want to do the delete as a JOIN. I wrote a bit more about this in an article that deals specifically with archiving data and deleting no-longer-needed rows:
https://shatteredsilicon.net/blog/2021/07/12/mariadb-mysql-performance-tuning-optimization-how-to-delete-faster-on-mysql/
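(A sketch of the chunking idea; MySQL's multi-table DELETE syntax doesn't accept LIMIT, so one workaround is to bound each statement by a key range instead, repeating with the next range until the table is covered:)
DELETE t
FROM tname.text t
JOIN spam s ON t.old_id = s.textid
WHERE t.old_id BETWEEN 0 AND 99999;   -- then 100000 to 199999, and so on
Each statement then touches a bounded slice of the table, which keeps transaction sizes and lock times short.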