I have a table created like this:
CREATE TABLE rh857_omf.picture (
    MeasNr TINYINT UNSIGNED,
    ExperimentNr TINYINT UNSIGNED,
    Time INT,
    SequenceNr SMALLINT UNSIGNED,
    Picture MEDIUMBLOB,
    PRIMARY KEY (MeasNr, ExperimentNr, Time, SequenceNr)
);
The first four columns, MeasNr, ExperimentNr, Time and SequenceNr, are the identifiers and form the primary key. The fifth column, Picture, is the payload: an 800x800 pixel, 8-bit grey-value picture (size = 625 kBytes).
If I want to load a picture, I use the following command:
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2 AND
ExperimentNr = 3 AND SequenceNr = 150;
In MySQL Workbench, I see the duration and the fetch time when I run this command. For a smaller table (800 MBytes, 2376 entries, 640x480 pictures), it's very fast (<100 ms). With a bigger table (5800 MBytes, 9024 entries), it gets very slow (>9 s).
For instance, I run the following command (on the big table):
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2 AND
ExperimentNr = 3 AND SequenceNr = 1025 LIMIT 0, 1000;
the first time it takes 5.2 / 3.9 seconds (duration / fetch). The same command run a second time takes 0.2 / 0.2 seconds. If I change the SequenceNr,
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2 AND
ExperimentNr = 3 AND SequenceNr = 977 LIMIT 0, 1000;
it's also very fast: 0.1 / 0.3 seconds.
But if I change the ExperimentNr, for instance
SELECT Picture FROM rhunkn1_omf.picture WHERE MeasNr = 2
AND ExperimentNr = 4 AND SequenceNr = 1025 LIMIT 0, 1000;
it takes a long time: 4.4 / 5.9 seconds.
Does anybody know why the database behaves like that and how I could improve the speed? Would it help to create several smaller picture tables and split the load across them? By the way, I use MySQL 5.1.62 with MyISAM tables, but I also tested InnoDB, which was even slower.
It would help if you could post the EXPLAIN output for the query - mostly, the answers are in there (somewhere).
However, at a guess, I'd explain this behaviour by the fact that your primary key includes Time while your queries don't; therefore, they can make only partial use of the index. I'd guess the query plan uses the index to filter out records in the MeasNr and ExperimentNr range, and then scans for matching SequenceNr values. If there are many records which match the first two criteria, that could be quite slow.
The reason you see a speed-up the second time round is that the query results get cached; this is not hugely predictable, depending on load, cache size, etc.
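To take caching out of the equation while benchmarking, you can add SQL_NO_CACHE to the query from the question:

SELECT SQL_NO_CACHE Picture FROM rhunkn1_omf.picture
WHERE MeasNr = 2 AND ExperimentNr = 3 AND SequenceNr = 150;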
Try creating an index which matches your WHERE clause, and see what EXPLAIN tells you.
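For example, something along these lines (the index name here is made up):

ALTER TABLE rhunkn1_omf.picture
  ADD INDEX idx_meas_exp_seq (MeasNr, ExperimentNr, SequenceNr);

EXPLAIN SELECT Picture FROM rhunkn1_omf.picture
WHERE MeasNr = 2 AND ExperimentNr = 3 AND SequenceNr = 150;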
I have a MySQL table with more than 3 billion rows hosted on Google Cloud SQL.
I wish to understand how the total size on disk can be explained from the column data types, the number of rows and the indexes.
I was hoping that it would be something like
size_of_table_in_bytes = num_rows * [ Sum over i { bytes_for_datatype_of_column(i) }
                                      + Sum over j { bytes_for_index(j) } ]
But the disk size I compute does not match the size my database reports.
Using the bytes per data type from
https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html
and the additional bytes for the InnoDB header and indexes from
https://dev.mysql.com/doc/refman/5.7/en/innodb-physical-record.html#innodb-compact-row-format-characteristics
here is my understanding of the bytes occupied by the header, each column and each index:
TABLE `depth` (
Bytes| Column/Header/Index
    2| variable-length header, Ceil(num_columns/8) = Ceil(10/8)
    5| fixed-length header
    3| `date` date DEFAULT NULL,
    7| `receive_time` datetime(3) DEFAULT NULL,
    8| `instrument_token` bigint(20) unsigned DEFAULT NULL,
    1| `level` tinyint(3) unsigned DEFAULT NULL,
    2| `bid_count` smallint(5) unsigned DEFAULT NULL,
    8| `bid_size` bigint(20) unsigned DEFAULT NULL,
    4| `bid_price` float DEFAULT NULL,
    4| `ask_price` float DEFAULT NULL,
    8| `ask_size` bigint(20) unsigned DEFAULT NULL,
    2| `ask_count` smallint(5) unsigned DEFAULT NULL,
    6| KEY `date_time_sym` (`date`,`receive_time`,`instrument_token`),
    6| KEY `date_sym_time` (`date`,`instrument_token`,`receive_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
which comes to 72 bytes in total.
But as per SHOW TABLE STATUS, Avg_row_length = 79.
Question 1: Where am I getting the number of bytes per row wrong?
I am reasonably sure that there are no nulls in my data.
Assuming I am making some mistake in counting bytes, I use 79 bytes per row and the row count from SELECT COUNT(*), which is 3,017,513,240:
size_of_table_in_bytes = 79 * 3,017,513,240 = 238,383,545,960
Another way to get the size is with the MySQL query
SHOW TABLE STATUS FROM mktdata WHERE Name = 'depth';
This returns one row; the values of a few important fields are:
Name: depth
Engine:InnoDB
Version:10
Row_format:Dynamic
Rows: 1,720,821,447
Avg_row_length: 78
Index_length: 183,900,307,456
Data_length: 135,245,332,480
At first I was alarmed that Rows showed 1.7 billion instead of 3.01 billion, but I found this in the documentation:
Rows
The number of rows. Some storage engines, such as MyISAM, store the exact
count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.
So, 3.01 billion seems right for the number of rows, and therefore I expect the table size to be 238 GB.
But then, if I add up Data_length and Index_length, I get 319,145,639,936.
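For reference, both numbers can also be pulled with a regular query against information_schema (a sketch; schema and table names as above):

SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH, INDEX_LENGTH,
       DATA_LENGTH + INDEX_LENGTH AS total_bytes
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mktdata' AND TABLE_NAME = 'depth';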
Question 2: Why do I get 319 GB by one method and 238 GB by the other? Which one is right, if either?
Moreover, the overall size shown for the SQL database by the Google Cloud SQL Console is 742 GB. The only other table I have, trade, has exactly 1/5th the number of rows of depth, and 5 columns. Its size, summing Data_length and Index_length, is 57 GB. Adding both table sizes gives 376 GB.
Question 3: 742 GB is roughly twice 376 GB (752, to be exact). Could this be because of the back-up? I know Google Cloud SQL does an automatic back-up once a day.
Because Question 3 seems plausible, I now suspect that my simple method of size = num_rows * num_bytes_per_row is wrong! This is really troubling me, and I will appreciate any help in resolving it.
There is more overhead than you mentioned. 20 bytes/row might be close.
Don't trust SHOW TABLE STATUS for "Rows"; use SELECT COUNT(*) ... Notice how it was off by nearly a factor of 2.
Compute the other way: 135245332480 / 3017513240 = 45 bytes.
From 45 bytes, I deduce that a lot of the cells are NULL?
Each column in each row has 1 or 2 bytes of overhead.
The ROW_FORMAT matters.
TEXT and BLOB (etc) have radically different rules than simple datatypes.
The indexes take a lot more than the 6 bytes you mentioned (see your other posting).
The BTree structure has some overhead. When loaded in order, 15/16 of each block is filled (that is mentioned somewhere in the docs). After churn, the fill factor can easily range from 50% to 100%; a BTree gravitates toward 69% full (1/0.69 gives the 1.45 factor in the other posting).
Reserving an equal amount of space for backup...
I don't know if that is what they are doing.
If they use mysqldump (or similar), it is not a safe formula - the text dump of a database can be significantly larger or smaller.
If they use LVM, then they have room for a full binary dump. But that does not make sense because of copy-on-write (COW).
(So, I give up on Q3.)
Could the Cloud service be doing some kind of compression?
I'm trying to identify a performance bottleneck in a CREATE TABLE AS / INSERT INTO type of query. My previous investigation showed that the source SELECT statement runs pretty smoothly, with a response time under 1 second. No issues on that side. The problem appears when I try to insert the result (approx. 9000 rows) into an InnoDB table: this part takes more than 30 seconds, which doesn't meet the expected SLA.
Right now I'm looking into the INFORMATION_SCHEMA.PROFILING information to understand what is going wrong with my query, but the output I'm getting is a bit confusing.
show profiles;
This returns the result below:
-------------------------------------------------------------------------
Query_ID | Duration |Query
-------------------------------------------------------------------------
'3' |'0.03072700' |'drop table `test_db`.`page_55`'
'4' |'30.27254125' |'CREATE TABLE `test_db`.`page_55` AS SELECT ...'
'5' |'0.00010050' |'SHOW WARNINGS'
I can clearly see query ID = 4, which is my candidate, with a duration of 30 seconds. But when I investigate it further in INFORMATION_SCHEMA.PROFILING, there is no step that could explain such a long response time. All entries have Duration < 1 second, as verified by the query below:
SELECT sum(DURATION)
FROM INFORMATION_SCHEMA.PROFILING
WHERE QUERY_ID = 4;
It returns 0.003308 s, a completely different value from what I previously got from SHOW PROFILES.
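For completeness, the per-stage breakdown can also be listed directly, sorted by cost (query id 4 as above):

SHOW PROFILE FOR QUERY 4;

SELECT STATE, SUM(DURATION) AS seconds
FROM INFORMATION_SCHEMA.PROFILING
WHERE QUERY_ID = 4
GROUP BY STATE
ORDER BY seconds DESC;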
Does anyone have any idea how to interpret this behavior?
Thank you very much.
Martin
I have to create a table that assigns a user id and a product id to some data (it models two one-to-many relationships). I will run a lot of queries like
select * from table where userid = x;
The first thing I am interested in is how big the table can get before the query becomes noticeably slow (let's say it takes more than 1 second).
Also, how can this be optimised?
I know that this might depend on the implementation. I will use MySQL for this specific project, but I am interested in more general answers as well.
It all depends on the horsepower of your machine. To make the query more efficient, create an index on userid.
how big should the table get before the query starts to be observable (let's say it takes more than 1 second)
There are too many factors to deterministically measure run time. CPU speed, memory, I/O speed, etc. are just some of the external factors.
how this can be optimized?
That's more straightforward. If there is an index on userid, then the query will likely do an index seek, which is about as fast as you can get as far as finding the record. If userid is the clustered index, it will be faster still, because the engine won't have to take the position from the index and then find the record in the data pages - the data is physically organized as part of the index.
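A minimal sketch, assuming a table named user_product with columns userid and productid (both names are made up):

-- secondary index on the lookup column
CREATE INDEX idx_userid ON user_product (userid);

-- or, with InnoDB, let userid lead the primary key so it becomes the clustered index
ALTER TABLE user_product ADD PRIMARY KEY (userid, productid);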
let's say it takes more than 1 second
With an index on userid, MySQL will find the correct row in O(log n) in the worst case. How many seconds that is depends on the performance of your machine.
It is impossible to give you an exact number without knowing how long one operation takes, but the worst case is ceil(log2(#records)) operations.
As an example: assume a database with 4 records. This requires 2 operations in the worst case. Every time you double your data, one more operation is required,
for example:
# records | # operations to find entry (worst case)
        2 | 1
        4 | 2
        8 | 3
       16 | 4
      ... | ...
     4096 | 12
      ... | ...
     ~1 B | 30
     ~2 B | 31
So, with a huge number of records, lookup time remains almost constant: for 1 billion records you would need to perform ~30 operations, and for 2 billion records, 31.
So, let's say your query executes in 0.001 seconds for 4096 entries (12 ops); it would then take around (0.001 / 12 * 30 =) 0.0025 seconds for 1 billion records.
An important side note: this considers only the runtime complexity of the binary search, but it shows how the lookup scales.
In a nutshell: your database would be unimpressed by a single query on an indexed value. However, if you run a heavy number of those queries at the same time, response time of course increases.
There is a search page in a web application (pagination is used: 10 records per page). The database is MySQL. The table has around 100,000 records. The query is tuned, in the sense that it uses an index (checked the EXPLAIN plan). The result set fetches around 17,000 rows and takes around 5 seconds. Can anyone please suggest how to optimize the search query? (Note: I tried to use LIMIT, but the query time did not improve.)
Query Eg:
SELECT * FROM abc
JOIN def ON abc.id = def.id
WHERE date >= '2013-09-03'
  AND date <= '2014-10-01'
  AND def.state = 1
-- id on both tables is indexed
-- the date and state columns cannot be indexed as they have low selectivity
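One pattern that often helps when plain LIMIT doesn't is the deferred join: page through just the ids, then join back for the full rows. A sketch (which table date belongs to is an assumption; abc is used here):

SELECT abc.*
FROM abc
JOIN (
    -- fetch only the ids for the requested page first
    SELECT abc.id
    FROM abc
    JOIN def ON abc.id = def.id
    WHERE abc.date >= '2013-09-03'
      AND abc.date <= '2014-10-01'
      AND def.state = 1
    ORDER BY abc.date
    LIMIT 0, 10
) AS page ON abc.id = page.id;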
I recently switched servers because Joyent is ending their service soon, but queries on rimuhosting seem to take significantly longer (2-6 times). And there's a huge variance in behavior: most queries run in 0.02 seconds or less, and then sometimes those same exact queries take 0.5 seconds or more. Both servers run MySQL and PHP (similar versions, but not the exact same number).
The server load is 20-40% idle CPU. Most of the memory is being used, but they tell me that's normal. Tech support tells me it's not swapping.
Here's what it looks like right now (though memory usage will increase to near max eventually, like last time):
Mem: 1513548k total, 1229316k used, 284232k free, 63540k buffers
Swap: 131064k total, 0k used, 131064k free, 981420k cached
SQL max connections is set to 400.
So, why am I getting these super slow queries, sometimes?
Here is an example of a query that sometimes takes 0.01 seconds and sometimes more than 1 second:
SELECT (!attacked AND (firstLoginDate > 1348703469)) AS protected,
       id, universe.uid, universe.name AS obj_name, top, left, guilds.name AS alliance,
       rotate, what, player_data.first, player_data.last,
       aid AS gid, (aid=1892 AND aid>0) AS am,
       fleet LIKE '%Turret%' AS turret,
       startLeft, startTop, endLeft, endTop, duration, startTime, movetype,
       moving, speed, defend, hp, lastAttack > 1349740269 AS ra
FROM universe
LEFT JOIN player_data ON universe.uid = player_data.uid
LEFT JOIN guilds ON aid = guilds.gid
WHERE (sector='21_82' OR sector='22_82' OR sector='21_83' OR sector='22_83')
   OR (universe.uid=1568425485 AND (upgrading=1 OR building=1))
Yes, I do have indexes on all the appropriate columns, and all 3 tables above are InnoDB tables, which means they use row locks, not table locks.
But this is interesting (new server):
Innodb_row_lock_time_avg    400    The average time to acquire a row lock, in milliseconds.
Innodb_row_lock_time_max  4,010    The maximum time to acquire a row lock, in milliseconds.
Innodb_row_lock_waits        31    The number of times a row lock had to be waited for.
Why does it take so long to get a row lock?
My old server was able to get row locks faster:
Innodb_row_lock_time_avg     26    The average time to acquire a row lock, in milliseconds.
Here's the new server:
Opened_tables       5,500 (in just 2 hours)    The number of tables that have been opened. If Opened_tables is big, your table_cache value is probably too small.
table_cache           256
Table_locks_waited  3,302 (in just 2 hours)
Here is the old server:
Opened_tables         420
table_cache            64
Does that make sense? If I increase the table cache, will that alleviate things?
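For reference, the cache can be raised at runtime (a sketch; the value is a guess to tune against your workload, and the variable is table_open_cache from MySQL 5.1.3 on, plain table_cache before that):

SET GLOBAL table_open_cache = 1024;
-- or persistently in my.cnf:
-- [mysqld]
-- table_open_cache = 1024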
Note: I have 1.5 GB of RAM on this server.
Here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE universe index_merge uidwhat,uid,uidtopleft,upgrading,building,sector sector,uid 38,8 NULL 116 Using sort_union(sector,uid); Using where
1 SIMPLE player_data ref mainIndex mainIndex 8 jill_sp.universe.uid 1
1 SIMPLE guilds eq_ref PRIMARY PRIMARY 8 jill_sp.player_data.aid 1
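Given the sort_union(sector,uid) merge in the first row, one rewrite worth trying is splitting the OR into a UNION so each branch can use a single index. A sketch with the column list cut down to two columns:

SELECT id, universe.uid
FROM universe
LEFT JOIN player_data ON universe.uid = player_data.uid
LEFT JOIN guilds ON aid = guilds.gid
WHERE sector IN ('21_82','22_82','21_83','22_83')
UNION
SELECT id, universe.uid
FROM universe
LEFT JOIN player_data ON universe.uid = player_data.uid
LEFT JOIN guilds ON aid = guilds.gid
WHERE universe.uid = 1568425485 AND (upgrading = 1 OR building = 1);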