I have a MySQL (MyISAM) database with several tables. Let's take for example the database "rh955_omf" with the following tables:
signal (600 MBytes, 17925 entries)
picture (5'355 MBytes, 17925 entries)
velocity (680 MBytes, 4979 entries)
At the moment I'm concentrating only on the signal table, so I'll describe it in a bit more detail. It was created as follows:
CREATE TABLE rh955_omf.signal (
  MeasNr TINYINT UNSIGNED,
  ExperimentNr TINYINT UNSIGNED,
  Time INT,
  SequenceNr SMALLINT UNSIGNED,
  MeanBeatRate SMALLINT UNSIGNED,
  MedBeatRate SMALLINT UNSIGNED,
  MeanAmp1 MEDIUMINT UNSIGNED,
  MeanAmp2 MEDIUMINT UNSIGNED,
  StdDeviationAmp1 DOUBLE,
  StdDeviationAmp2 DOUBLE,
  MeanDeltaAmp MEDIUMINT UNSIGNED,
  Offset INT UNSIGNED,
  NrOfPeaks SMALLINT UNSIGNED,
  `Signal` MEDIUMBLOB,
  Peakcoord MEDIUMBLOB,
  Validity BOOL,
  Comment VARCHAR(255),
  PRIMARY KEY(MeasNr, ExperimentNr, SequenceNr)
);
I load the values from this table with the following command:
SELECT MeanBeatRate FROM rh955_omf.signal WHERE MeasNr = 3 AND ExperimentNr = 10 AND SequenceNr BETWEEN 0 AND 407
If I load the whole MeanBeatRate column (16-bit integer values) for the first time, it takes about 54 seconds (MeasNr = 1..3, ExperimentNr = 1..24, SequenceNr >= min AND <= max). If I load it again, it takes 0.5 seconds (cache).
So what I want to do is speed up the database. To investigate, I created some new databases, each containing only a subset of the tables:
rh955_copy_omf: signal table
rh955_p_copy_omf: signal table, picture table
rh955_v_p_copy_omf: signal table, picture table, velocity table
I restarted the computer and loaded all the MeanBeatRate values from the signal table in each database. That gave me the following times:
rh955_omf: 54s (as mentioned before)
rh955_copy_omf: 3.1s
rh955_p_copy_omf: 12.9s
rh955_v_p_copy_omf: 10.7s
So it looks like the time to load the data depends on the other tables in the database. Is this even possible (since I'm only querying the signal table)? And what is even more confusing: in the database rh955_v_p_copy_omf I have all the data the original database has, yet the performance is ~5 times better. Any explanation for that behavior? I would be thankful for any help, because I'm really stuck at this point and need to increase the database performance.
Additional information: in one case I stored the data in the table with the command "LOAD DATA INFILE 'D:/Exported MySQL/rh955/signal.omf' INTO TABLE rh955_omf.signal" (that's the case where loading data is slow); in the other cases I stored the data line by line. Maybe that's why the performance is different? If so, what's the workaround for loading data from a file?
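One possible experiment, assuming the slowdown really comes from the physical row order in the MyISAM data file (LOAD DATA INFILE writes rows in file order, while line-by-line insertion in key order leaves them roughly clustered by the primary key), is to physically reorder the table after the bulk load; whether this recovers the fast timings is an assumption, not something measured in the post:

-- Hedged sketch: rewrite the MyISAM data file in primary-key order after a
-- bulk LOAD DATA INFILE (table and key columns taken from the question).
ALTER TABLE rh955_omf.signal ORDER BY MeasNr, ExperimentNr, SequenceNr;
-- Optionally rebuild/re-analyze the table afterwards.
OPTIMIZE TABLE rh955_omf.signal;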
Are they indexed the same?
Are the server parameters the same for each DB (same memory, start-up configuration and parameters, etc.)? (A couple of these points can be checked with the sketch after this list.)
Are they on the same disk?
If they are on the same disk is there some other application running at the same time which influences where the read-write heads are?
Do you stop the other databases each time, or are some still running at times?
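A minimal sketch of how a few of these points could be checked from the MySQL client; the database and table names come from the question, the rest is standard introspection:

-- Compare index definitions between the original and a copied database.
SHOW INDEX FROM rh955_omf.signal;
SHOW INDEX FROM rh955_copy_omf.signal;
-- Check the MyISAM key cache size and the table's data/index footprint.
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW TABLE STATUS FROM rh955_omf LIKE 'signal';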
Hi everyone. Here is a problem with my MySQL server.
I have a table with about 40,000,000 rows and 10 columns.
Its size is about 4 GB and the engine is InnoDB.
It is a master database, and it executes essentially only one kind of SQL statement, like this:
insert into mytable ... on duplicate key update ...
About 99% of the executed statements take the UPDATE path.
Now the server is becoming slower and slower.
I heard that splitting the table may enhance performance, so I tried it on my personal computer: I split it into 10 tables and it failed, then tried 100 tables and that failed too. The speed became slower instead. So I wonder why splitting the table didn't enhance the performance?
Thanks in advance.
More details:
CREATE TABLE my_table (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
identifier VARCHAR(64),
account_id VARCHAR(64),
top_speed INT UNSIGNED NOT NULL,
total_chars INT UNSIGNED NOT NULL,
total_time INT UNSIGNED NOT NULL,
keystrokes INT UNSIGNED NOT NULL,
avg_speed INT UNSIGNED NOT NULL,
country_code VARCHAR(16),
update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY(id), UNIQUE KEY(user_id)
);
PS:
I also tried different computers, with a Solid State Drive and with a Hard Disk Drive, but that didn't help either.
Splitting up a table is unlikely to help at all. Ditto for PARTITIONing.
Let's count the disk hits. I will skip counting non-leaf nodes in BTrees; they tend to be cached; I will count leaf nodes in the data and indexes; they tend not to be cached.
IODKU (INSERT ... ON DUPLICATE KEY UPDATE) does:
Read the index block containing the entry for any UNIQUE key. In your case, that is probably user_id. Please provide a sample SQL statement. 1 read.
If the user_id entry is found in the index, read the record from the data as indexed by the PK(id) and do the UPDATE, and leave this second block in the buffer_pool for eventual rewrite to disk. 1 read now, 1 write later.
If the record is not found, do INSERT. The index block that needs the new row was already read, so it is ready to have a new entry inserted. Meanwhile, the "last" block in the table (due to id being AUTO_INCREMENT) is probably already cached. Add the new row to it. 0 reads now, 1 write later (UNIQUE). (Rewriting the "last" block is amortized over, say, 100 rows, so I am ignoring it.)
Eventually do the write(s).
Total, assuming essentially all take the UPDATE path: 2 reads and 1 write. Assuming the user_id follows no simple pattern, I will assume that all 3 I/Os are "random".
Let's consider a variation... What if you got rid of id? Do you need id anywhere else? Since you have a UNIQUE key, it could be the PK. That is, replace your two indexes with just PRIMARY KEY(user_id); see the sketch after these counts. Now the counts are:
1 read
If UPDATE, 0 read, 1 write
If INSERT, 0 read, 0 write
Total: 1 read, 1 write. 2/3 as many as before. Better, but still not great.
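A sketch of that variation, assuming id is not referenced anywhere else (application code, foreign keys, replication filters); user_id is the implicit name MySQL gives to an unnamed UNIQUE KEY(user_id) index:

ALTER TABLE my_table
  DROP PRIMARY KEY,          -- drop the surrogate AUTO_INCREMENT key
  DROP COLUMN id,
  DROP KEY user_id,          -- the old UNIQUE index becomes redundant...
  ADD PRIMARY KEY (user_id); -- ...because user_id is now the clustered PK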
Caching
How much RAM do you have?
What is the value of innodb_buffer_pool_size?
SHOW TABLE STATUS -- What are Data_length and Index_length?
I suspect that the buffer_pool is not big enough and could possibly be raised. If you have more than 4GB of RAM, make it about 70% of RAM.
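A quick way to check (and, on MySQL 5.7.5+ where the setting is dynamic, change) these values; the 6 GB figure below is only an illustration and assumes roughly 8 GB of RAM:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW TABLE STATUS LIKE 'my_table';   -- look at Data_length and Index_length
-- Example value only: about 70% of an assumed 8 GB of RAM. On older versions
-- this must be set in my.cnf and requires a server restart.
SET GLOBAL innodb_buffer_pool_size = 6 * 1024 * 1024 * 1024;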
Others
SSDs should have helped significantly, since you appear to be I/O bound. Can you tell whether you are I/O-bound or CPU-bound?
How many rows are you updating at once? How long does it take? Is it batched, or one at a time? There may be a significant improvement possible here.
Do you really need BIGINT (8 bytes)? INT UNSIGNED is only 4 bytes.
Is a transaction involved?
Is the Master having a problem? The Slave? Both? I don't want to fix the Master in such a way that it messes up the Slave.
Try splitting your database across several MySQL instances, using a proxy such as mysql-proxy or HAProxy, instead of a single MySQL instance. Maybe you can get much better performance.
I am looking into storing a "large" amount of data and am not sure what the best solution is, so any help would be most appreciated. The structure of the data is:
450,000 rows
11,000 columns
My requirements are:
1) Need as fast access as possible to a small subset of the data e.g. rows (1,2,3) and columns (5,10,1000)
2) Needs to be scalable: I will be adding columns every month, but the number of rows is fixed.
My understanding is that often its best to store as:
id| row_number| column_number| value
but this would create 4,950,000,000 entries? I have tried storing the data as-is, as plain rows and columns in MySQL, but it is very slow at subsetting the data.
Thanks!
Build the giant matrix table
As N.B. said in comments, there's no cleaner way than using one mysql row for each matrix value.
You can do it without the id column:
CREATE TABLE `stackoverflow`.`matrix` (
`rowNum` MEDIUMINT NOT NULL ,
`colNum` MEDIUMINT NOT NULL ,
`value` INT NOT NULL ,
PRIMARY KEY ( `rowNum`, `colNum` )
) ENGINE = MYISAM ;
You may add a UNIQUE INDEX on (colNum, rowNum), or only a non-unique INDEX on colNum, if you often access the matrix by column (because the PRIMARY KEY is on ( `rowNum`, `colNum` ), note the order, so it will be inefficient when it comes to selecting a whole column).
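For example, the extra index for column-wise access could look like this (the index name byCol is arbitrary):

-- Non-unique is enough here, since (rowNum, colNum) is already the PRIMARY KEY.
ALTER TABLE `stackoverflow`.`matrix`
  ADD INDEX `byCol` ( `colNum`, `rowNum` );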
You'll probably need more than 200 GB to store the 450,000 × 11,000 values, including indexes.
Inserting data may be slow (because there are two indexes to rebuild, and 450,000 entries [1 per row] to add when adding a column).
Edits should be very fast, as the index wouldn't change and the value is of fixed size.
If you access same subsets (rows + cols) often, maybe you can use PARTITIONing of the table if you need something "faster" than what mysql provides by default.
After years of experience (edit added years later)
Re-reading myself years later, I would say the "cache" ideas are totally dumb, as it's MySQL's role to handle this sort of caching (the data should actually already be in the InnoDB buffer pool).
A better idea would be, if the matrix is full of zeroes, not to store the zero values and to treat 0 as the "default" in the client code. That way you may lighten up the storage (if needed; MySQL should actually be pretty fast responding to queries even on such a 5-billion-row table).
Another option, if storage is an issue, is to use a single ID to identify both row and col: you say the number of rows is fixed (450,000), so you may replace (row, col) with a single value (id = 450000*col + row) [though it needs a BIGINT, so maybe not better than 2 columns].
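A sketch of that single-ID variant, assuming exactly 450,000 rows numbered 0..449999; the table name matrixById and the column name cellId are made up for the example:

CREATE TABLE `stackoverflow`.`matrixById` (
  `cellId` BIGINT UNSIGNED NOT NULL ,   -- cellId = 450000 * colNum + rowNum
  `value` INT NOT NULL ,
  PRIMARY KEY ( `cellId` )
) ENGINE = MYISAM ;

-- Reading the cell at row 123, column 45:
SELECT value FROM `stackoverflow`.`matrixById` WHERE cellId = 450000 * 45 + 123;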
Don't do what's described below: don't reinvent the MySQL cache
Add a cache (actually no)
Since you said you add values and don't seem to edit existing matrix values, a cache can speed up frequently requested rows/columns.
If you often read the same rows/columns, you can cache their result in another table (same structure to make it easier):
CREATE TABLE `stackoverflow`.`cachedPartialMatrix` (
`rowNum` MEDIUMINT NOT NULL ,
`colNum` MEDIUMINT NOT NULL ,
`value` INT NOT NULL ,
PRIMARY KEY ( `rowNum`, `colNum` )
) ENGINE = MYISAM ;
That table will be empty at the beginning, and each SELECT on the matrix table will feed the cache. When you want to get a column/row:
SELECT the row/column from that caching table
If the SELECT returns an empty/partial result (no data returned, or not enough data to match the expected row/column number), then do the SELECT on the matrix table
Save the result of that SELECT from the matrix table into cachedPartialMatrix (see the sketch after this list)
If the caching matrix gets too big, clear it (the bigger the cached matrix is, the slower it becomes)
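For completeness (keeping in mind the warning above that this duplicates what the buffer pool already does), the "save to the cache" step might look like this; the row number 42 is just an example:

INSERT IGNORE INTO `stackoverflow`.`cachedPartialMatrix` (rowNum, colNum, value)
SELECT rowNum, colNum, value FROM `stackoverflow`.`matrix` WHERE rowNum = 42;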
Smarter cache (actually, no)
You can make it even smarter with a third table to count how many times a selection is done:
CREATE TABLE `stackoverflow`.`requestsCounter` (
`isRowSelect` BOOLEAN NOT NULL ,
`index` INT NOT NULL ,
`count` INT NOT NULL ,
`lastDate` DATETIME NOT NULL,
PRIMARY KEY ( `isRowSelect` , `index` )
) ENGINE = MYISAM ;
When you do a request on your matrix for the Nth row or Kth column (one may use TRIGGERS), increment the counter. When the counter gets big enough, feed the cache.
lastDate can be used to remove some old values from the cache (take care: if you remove the Nth column from cache entries because its `lastDate` is old enough, you may break the cache for some other entries), or to regularly clear the cache and only leave the recently selected values.
In my database I have a table named fo_image_guestimage; it contains only about 2,63,000 rows. But when I try to update just one value in it, it takes too much time (121.683 ms).
My table structure:
My query execution and its timing:
How can I minimize the query time in MySQL? The table engine is InnoDB.
EDIT 1-
My database size: 3.5 GB; fo_guest_image table size: 2.8 GB
Table Structure
CREATE TABLE `fo_guest_image` (
`Fo_Image_Id` INT(10) NOT NULL AUTO_INCREMENT,
`Fo_Image_Regno` VARCHAR(10) NULL,
`Fo_Image_GuestHistoryId` INT(10) NOT NULL,
`Fo_Image_Photo` LONGBLOB NOT NULL,
`Fo_Image_Doc1` LONGBLOB NOT NULL,
`Fo_Image_Doc2` LONGBLOB NOT NULL,
`Fo_Image_Doc3` LONGBLOB NOT NULL,
`Fo_Image_Doc4` LONGBLOB NOT NULL,
`Fo_Image_Doc5` LONGBLOB NOT NULL,
`Fo_Image_Doc6` LONGBLOB NOT NULL,
`Fo_Image_Billno` VARCHAR(10) NULL,
PRIMARY KEY (`Fo_Image_Id`)
)
ENGINE=InnoDB
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=36857
Queries with execution times:
select COUNT(Fo_Image_Regno) from fo_guest_image; Time: 11.483ms
select * from fo_guest_image where Fo_Image_Regno='G13603'; Time: 101.381ms
update fo_guest_image set Fo_Image_Regno='T13603' where Fo_Image_Regno='G13603'; Time: 144.360ms
I have also tried a non-BLOB table, fo_daybook (size 400 KB).
Queries with execution times:
select * from fo_daybook; Time: 0.144ms
select fo_daybok_Regno from fo_daybook; Time: 0.004ms
update fo_daybook set fo_daybok_Regno ='T13603' where fo_daybok_Regno ='G13603'; Time: 0.011ms
My client adds 1000 rows per day to fo_guest_image. The table is now 2.8 GB, and it will surely keep growing day by day. I am scared of what will happen to performance if the table one day reaches 10 GB.
Short solution: add an index on the column "Fo_Image_Regno".
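A sketch of that index; the index name is arbitrary:

ALTER TABLE fo_guest_image ADD INDEX idx_fo_image_regno (Fo_Image_Regno);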
Longer, but much better, solution: do not store the images in the table. Store only links (file paths) to the images in the table, and keep the images themselves in folders/directories on the file system. It will be much better now and in the future.
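As an illustration only, a reworked table along those lines might keep just file paths; the column names mirror the originals, while the _Path columns and their lengths are assumptions:

CREATE TABLE `fo_guest_image_paths` (
  `Fo_Image_Id` INT(10) NOT NULL AUTO_INCREMENT,
  `Fo_Image_Regno` VARCHAR(10) NULL,
  `Fo_Image_GuestHistoryId` INT(10) NOT NULL,
  `Fo_Image_Photo_Path` VARCHAR(255) NOT NULL,  -- path to the photo on disk
  `Fo_Image_Doc1_Path` VARCHAR(255) NULL,       -- one path column per document
  `Fo_Image_Billno` VARCHAR(10) NULL,
  PRIMARY KEY (`Fo_Image_Id`),
  INDEX `idx_regno` (`Fo_Image_Regno`)
)
ENGINE=InnoDB;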
My apologies for the incorrect first answer. I did not realise that you are updating a non-BLOB column. However, all my reservations about using BLOB columns to store documents still stand.
Re: this UPDATE statement updated 14,119 rows. It had to read all 2.9 million rows, find the ~14k that match the WHERE clause, and only then update them.
Check how long an equivalent SELECT query runs. I'm sure it will be pretty close to the timing of the UPDATE statement. 2.9 million rows is not a small dataset.
Adding an index on fo_image_GuestHistoryID will speed up this query, but slightly slow down inserts into the table. Also, it will take some time to create the index.
An index will speed it up, but there are costs to having an index. In the case of adding 1000 rows a day, the benefits of the index should outweigh its costs.
I have a table with symbol names (e.g. functions) and their start and end memory addresses. Now I want to look up many addresses that fall between the start and end addresses and map each one to its symbol name (or, more simply, to the start address, as in the example below).
I do a query like this:
SELECT r.caller_addr AS caller_addr,sm.addrstart AS caller FROM rets AS r
JOIN symbolmap AS sm ON r.caller_addr BETWEEN sm.addrstart AND sm.addrend;
rets is a table that contains approximately a million caller_addr. The symbolmap table is created as:
CREATE TABLE
symbolmap
(addrstart BIGINT NOT NULL,
addrend BIGINT NOT NULL,
name VARCHAR(45),
PRIMARY KEY (addrstart),
UNIQUE INDEX (addrend)) ENGINE = InnoDB;
All the addrstart to addrend ranges are non-overlapping, i.e. there can be only one matching row for any requested address (r.caller_addr in the example). The symbolmap table contains 42,000 rows. I have tried a few other indexing approaches as well, but the select still takes a very long time (many tens of minutes) and has not managed to finish.
Any suggestions on better indexes or other select statements that have better performance? I'm running this on MySQL 5.1.41 and I don't need to worry about portability.
When I search for what others do, I only find results with constant boundaries, not cases where the task is to find the row with the right boundaries. But it seems to me like a quite general problem.
Try to combine the two columns in a single index:
CREATE TABLE
symbolmap
(addrstart BIGINT NOT NULL,
addrend BIGINT NOT NULL,
name VARCHAR(45),
PRIMARY KEY (addrstart, addrend)
) ENGINE = InnoDB;
Also make sure that caller_addr is a BIGINT as well.
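One way to see whether the combined key actually gets used for the range join is to run EXPLAIN on the original query (taken from the question) and look at the key and rows columns for symbolmap:

EXPLAIN
SELECT r.caller_addr AS caller_addr, sm.addrstart AS caller FROM rets AS r
JOIN symbolmap AS sm ON r.caller_addr BETWEEN sm.addrstart AND sm.addrend;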
I’m seeing a performance behavior in mysqld that I don’t understand.
I have a table t with a primary key id and four data columns col1, …, col4.
The data are in 4 TSV files 'col1.tsv', … 'col4.tsv'. The procedure I use to ingest them is:
CREATE TABLE t (
id INT NOT NULL,
col1 INT NOT NULL,
col2 INT NOT NULL,
col3 INT NOT NULL,
col4 CHAR(12) CHARACTER SET latin1 NOT NULL );
LOAD DATA LOCAL INFILE # POP 1
'col1.tsv' INTO TABLE t (id, col1);
ALTER TABLE t ADD PRIMARY KEY (id);
SET GLOBAL hot_keys.key_buffer_size= # something suitable
CACHE INDEX t IN hot_keys;
LOAD INDEX INTO CACHE t;
DROP TABLE IF EXISTS tmpt;
CREATE TABLE tmpt ( id INT NOT NULL, val INT NOT NULL );
LOAD DATA LOCAL INFILE 'col2.tsv' INTO TABLE tmpt;
INSERT INTO t (id, col2) # POP 2
SELECT tt.id, tt.val FROM tmpt tt
ON DUPLICATE KEY UPDATE col2=tt.val;
DROP TABLE IF EXISTS tmpt;
CREATE TABLE tmpt ( id INT NOT NULL, val INT NOT NULL );
LOAD DATA LOCAL INFILE 'col3.tsv' INTO TABLE tmpt;
INSERT INTO t (id, col3) # POP 3
SELECT tt.id, tt.val FROM tmpt tt
ON DUPLICATE KEY UPDATE col3=tt.val;
DROP TABLE IF EXISTS tmpt;
CREATE TABLE tmpt ( id INT NOT NULL,
val CHAR(12) CHARACTER SET latin1 NOT NULL );
LOAD DATA LOCAL INFILE 'col4.tsv' INTO TABLE tmpt;
INSERT INTO t (id, col4) # POP 4
SELECT tt.id, tt.val FROM tmpt tt
ON DUPLICATE KEY UPDATE col4=tt.val;
Now here’s the performance thing I don’t understand. Sometimes the POP 2 and POP 3 INSERT INTO … SELECT … ON DUPLICATE KEY UPDATE queries run very fast, with mysqld occupying 100% of a core, and at other times mysqld bogs down at 1% CPU, reading t.MYD, i.e. table t’s MyISAM data file, at random offsets.
I’ve had a very hard time isolating the circumstances in which it is fast and in which it is slow, but I have found one repeatable case:
In the above sequence, POP 2 and POP 3 are very slow. But if I create t without col4, then POP 2 and POP 3 are very fast. Why?
And if, after that, I add col4 with an ALTER TABLE query, then POP 4 runs very fast too.
Again, when the INSERTs run slow, mysqld is bogged down in file IO, reading from random offsets in table t’s MyISAM data file. I don’t even understand why it is reading that file.
MySQL server version 5.0.87. OS X 10.6.4 on Core 2 Duo iMac.
UPDATE
I eventually found (what I think is) the answer to this question. The mysterious difference between some inserts being slow and some fast depends on the data.
The clue was: when the insert is slow, mysqld seeks on average 0.5 GB between reads on t.MYD. When it is fast, successive reads have tiny relative offsets.
The confusion arose because some of the 'col?.tsv' files happen to have their rows in roughly the same order w.r.t. the id column while others are randomly ordered relative to them.
I was able to drastically reduce overall processing time by using sort(1) on the tsv files before loading and inserting them.
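An in-database alternative to sort(1), assuming the goal is simply to touch t.MYD sequentially, is to insert in primary-key order (shown for POP 2; the same ORDER BY would go into POP 3 and POP 4). Whether this fully matches the sorted-file approach is an assumption:

INSERT INTO t (id, col2)                 # POP 2, sorted variant
SELECT tt.id, tt.val FROM tmpt tt
ORDER BY tt.id
ON DUPLICATE KEY UPDATE col2=tt.val;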
It's a pretty open question... here's a speculative, open answer. :)
... when the INSERTs run slow, mysqld is bogged down in file IO reading from random offsets in table t’s MyISAM data file. I don’t even understand why it is reading that file.
I can think of two possible explanations:
Even after it knows there is a primary key collision, it has to see what used to be in the field that will be updated -- if it is coincidentally the destination value already, 0 in this case, it won't perform the update -- i.e. zero rows affected.
Moreover, when you update a field, I believe MySQL re-writes the whole row back to disk (if not multiple rows due to paging), and not just that single field as one might assume.
But if I create t without col4 then POP 2 and POP 3 are very fast. Why?
If it's a fixed-row size MyISAM table, which it looks like due to the datatypes in the table, then including the CHAR field, even if it's blank, will make the table 75% larger on disk (4 bytes per INT field = 16 bytes, whereas the CHAR(12) would add another 12 bytes). So, in theory, you'll need to read/write 75% more.
Does your dataset fit in memory? Have you considered using InnoDB or Memory tables?
Addendum
If the usable/active/hot dataset goes from fitting in memory to not fitting in memory, a performance drop of orders of magnitude isn't unheard of. A couple of reads:
http://jayant7k.blogspot.com/2010/10/foursquare-outage-post-mortem.html
http://www.mysqlperformanceblog.com/2010/11/19/is-there-benefit-from-having-more-memory/