In my database I have a table named fo_guest_image. It contains more than 263,000 rows, but when I try to update just one value in it, the query takes far too much time (121.683 ms).
My table structure and my query execution times are shown below.
How can I minimize the query time in MySQL? The table uses the InnoDB engine.
EDIT 1 -
My database size is 3.5 GB; the fo_guest_image table is 2.8 GB.
Table Structure
CREATE TABLE `fo_guest_image` (
`Fo_Image_Id` INT(10) NOT NULL AUTO_INCREMENT,
`Fo_Image_Regno` VARCHAR(10) NULL,
`Fo_Image_GuestHistoryId` INT(10) NOT NULL,
`Fo_Image_Photo` LONGBLOB NOT NULL,
`Fo_Image_Doc1` LONGBLOB NOT NULL,
`Fo_Image_Doc2` LONGBLOB NOT NULL,
`Fo_Image_Doc3` LONGBLOB NOT NULL,
`Fo_Image_Doc4` LONGBLOB NOT NULL,
`Fo_Image_Doc5` LONGBLOB NOT NULL,
`Fo_Image_Doc6` LONGBLOB NOT NULL,
`Fo_Image_Billno` VARCHAR(10) NULL,
PRIMARY KEY (`Fo_Image_Id`)
)
ENGINE=InnoDB
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=36857
Query With Execution Time -
select COUNT(Fo_Image_Regno) from fo_guest_image; Time: 11.483ms
select * from fo_guest_image where Fo_Image_Regno='G13603'; Time: 101.381ms
update fo_guest_image set Fo_Image_Regno='T13603' where Fo_Image_Regno='G13603'; Time: 144.360ms
I have also tried a non-BLOB table, fo_daybook (size 400 KB).
Query With Execution Time -
select * from fo_daybook; Time: 0.144ms
select fo_daybok_Regno from fo_daybook; Time: 0.004ms
update fo_daybook set fo_daybok_Regno ='T13603' where fo_daybok_Regno ='G13603'; Time: 0.011ms
My client adds about 1,000 rows per day to fo_guest_image. The table is already 2.8 GB and will surely keep growing day by day. I am worried about what will happen to performance if it one day reaches 10 GB.
Short solution: add an index on the Fo_Image_Regno column.
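A minimal sketch of that index (the index name idx_regno is my own choice, not part of the original schema):
-- A secondary index on Fo_Image_Regno lets the SELECT and UPDATE above find
-- matching rows without scanning every BLOB-laden row in the table.
ALTER TABLE fo_guest_image ADD INDEX idx_regno (Fo_Image_Regno);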
Longer, but much better, solution: do not store images in the tables. Store only links to the images in the table. And then store images in local folders/directories on the system. It will be much better now and in the future
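A hedged sketch of that approach; the table name and the *_Path columns below are hypothetical, just to illustrate storing file paths instead of the images themselves:
-- Assumption: the application saves uploaded files to disk and keeps only their
-- paths, so each row stays a few hundred bytes instead of several megabytes.
CREATE TABLE fo_guest_image_path (
  Fo_Image_Id INT(10) NOT NULL AUTO_INCREMENT,
  Fo_Image_Regno VARCHAR(10) NULL,
  Fo_Image_GuestHistoryId INT(10) NOT NULL,
  Fo_Image_PhotoPath VARCHAR(255) NOT NULL,  -- e.g. '/images/guest/12345_photo.jpg'
  Fo_Image_Doc1Path VARCHAR(255) NULL,
  Fo_Image_Billno VARCHAR(10) NULL,
  PRIMARY KEY (Fo_Image_Id),
  INDEX idx_regno (Fo_Image_Regno)
) ENGINE=InnoDB;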
My apologies for the incorrect first answer; I did not realise that you are updating a non-BLOB column. However, all my reservations about using BLOB columns to store documents are still valid.
Regarding this UPDATE statement: it updated 14,119 rows. It had to read all 2.9 million rows, find the 14k that match the WHERE clause, and only then update them.
Check how long the equivalent SELECT query runs; I'm sure it will be pretty close to the timing of the UPDATE statement. 2.9 million rows is not a small dataset.
Adding an index on fo_image_GuestHistoryID will speed up this query, but it will slightly slow down inserts into the table. It will also take some time to create the index.
An index will speed it up, but there are costs to having an index. When you are adding 1,000 rows a day, the benefits of the index should outweigh its costs.
I am having difficulties optimizing this SQL statement in MySQL. I have two tables that are populated independently, so the times logged in each table's column will not be the same. What I want is a single table (view) that lists all the records in sensor_history together with the process information that was current at the sensor's measurement_time. If no process log time was present, I can live with NULLs in the process fields of the resulting view for that particular record.
What I have here works but it is brute force and woefully inefficient. There are about 500k records in the sensor_history table and about 20k records in the process_history table. I have tried getting my head around different join methods but I run into syntax issues or bad results. I have tried some online optimizers without success and so I am hoping someone here can point me in the right direction.
For simplicity, I have removed the foreign key relations to other tables. There are no indices in use but feel free to suggest any that may help. Here are the basics:
CREATE TABLE `sensor_history` (
`measurement_time_utc` int(11) NOT NULL,
`sensor_id` int(11) NOT NULL,
`sensor_measurement_x` double NOT NULL,
`sensor_measurement_y` double NOT NULL,
`sensor_measurement_z` double NOT NULL,
`sensor_quality` int(11) NOT NULL
);
CREATE TABLE `process_history` (
`log_time_utc` int(11) NOT NULL,
`process_id` int(11) NOT NULL,
`process_speed` double NOT NULL,
`process_load` int(11) NOT NULL
);
CREATE VIEW `rollup` AS SELECT
`sensor_history`.`measurement_time_utc`,
`sensor_history`.`sensor_id`,
`sensor_history`.`sensor_measurement_x`,
`sensor_history`.`sensor_measurement_y`,
`sensor_history`.`sensor_measurement_z`,
`sensor_history`.`sensor_quality`,
(SELECT `process_history`.`process_id` FROM `process_history` WHERE `sensor_history`.`measurement_time_utc`>=`process_history`.`log_time_utc` ORDER BY `process_history`.`log_time_utc` DESC LIMIT 1) AS `process_id`,
(SELECT `process_history`.`process_speed` FROM `process_history` WHERE `sensor_history`.`measurement_time_utc`>=`process_history`.`log_time_utc` ORDER BY `process_history`.`log_time_utc` DESC LIMIT 1) AS `process_speed`,
(SELECT `process_history`.`process_load` FROM `process_history` WHERE `sensor_history`.`measurement_time_utc`>=`process_history`.`log_time_utc` ORDER BY `process_history`.`log_time_utc` DESC LIMIT 1) AS `process_load`
FROM `sensor_history`;
How can I make a more efficient rollup view? Thanks in advance.
Views are really hard to optimize in MySQL. Your best hope is for an index on:
process_history(log_time_utc, process_id, process_speed)
The last two columns are included so the index covers the query and doesn't need to refer to the data pages.
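As DDL, that suggestion would look roughly like this (the index name idx_ph_cover is my own; if you also want the third subquery covered, process_load could be appended as a fourth column):
-- Covering index: the range scan on log_time_utc returns process_id and
-- process_speed straight from the index, with no lookup into the table rows.
CREATE INDEX idx_ph_cover ON process_history (log_time_utc, process_id, process_speed);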
While you are trying to figure out what the Analysts really need, let's do some improvements that are easier to do now than later.
DOUBLE takes 8 bytes and delivers about 16 significant digits. That is gross overkill for every sensor I have heard of. Consider the 4-byte FLOAT, which gives you about 7 significant digits.
(Where am I going with this? Capturing "sensor" data keeps coming, and it eventually fills up disk and that makes it slow. So, let's shrink things soon.)
INT is 4 bytes and has a range of +/- 2 billion. Are you expecting that many sensors? How about a 1-byte TINYINT UNSIGNED with a range of 0..255? Or a 2-byte SMALLINT UNSIGNED with a range of 0..65,535? Ditto for any other ids.
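A hedged sketch of that shrinking applied to sensor_history (only do this after verifying the real value ranges; the choice of SMALLINT UNSIGNED for sensor_id is an assumption):
ALTER TABLE sensor_history
  MODIFY sensor_id SMALLINT UNSIGNED NOT NULL,   -- assumes fewer than 65,536 sensors
  MODIFY sensor_measurement_x FLOAT NOT NULL,    -- ~7 significant digits, 4 bytes
  MODIFY sensor_measurement_y FLOAT NOT NULL,
  MODIFY sensor_measurement_z FLOAT NOT NULL;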
Or... Do you really need to save all the data? Maybe day-old data can be summarized down to hourly min, max, avg, etc? And month-old data is needed only to a day's resolution?
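For example, a one-off backfill sketch of an hourly summary for the x measurement (the table and column names here are hypothetical; y, z and quality could be summarized the same way, and a real job would only summarize hours it has not processed yet):
CREATE TABLE sensor_history_hourly (
  hour_utc INT NOT NULL,          -- measurement_time_utc truncated to the hour
  sensor_id INT NOT NULL,
  x_min DOUBLE NOT NULL,
  x_max DOUBLE NOT NULL,
  x_avg DOUBLE NOT NULL,
  sample_count INT NOT NULL,
  PRIMARY KEY (sensor_id, hour_utc)
);
INSERT INTO sensor_history_hourly
SELECT measurement_time_utc - (measurement_time_utc % 3600) AS hour_utc,
       sensor_id,
       MIN(sensor_measurement_x), MAX(sensor_measurement_x), AVG(sensor_measurement_x),
       COUNT(*)
FROM sensor_history
WHERE measurement_time_utc < UNIX_TIMESTAMP() - 86400   -- data older than one day
GROUP BY hour_utc, sensor_id;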
We have lots to discuss once your analysts explain to you what they do want. Then you need to read between the lines to see what they will want. (I can help there, too.)
I have a table like this:
create table test (
id int primary key auto_increment,
idcard varchar(30),
name varchar(30),
custom_value varchar(50),
index i1(idcard)
)
I insert 30,000,000 rows into the table and then execute:
select * from test where idcard='?'
The statement takes 12 seconds to return.
When I use iostat to monitor the disk, the read speed is about 6 MB/s while utilization is 94%.
Is there any way to optimize it?
12 seconds may be realistic.
Assumptions about the question:
A total of 30M rows, but only 3000 rows in the resultset.
Not enough room to cache things in RAM or you are running from a cold start.
InnoDB or MyISAM (the analysis is the same; the details are radically different).
Any CHARACTER SET and COLLATION for idcard.
INDEX(idcard) exists and is used in the query.
HDD disk drive, not SSD.
Here's a breakdown of the processing:
Go to the index, find the first entry with ?, scan forward until hitting an entry that is not ? (about 3K rows later).
For each of those 3K items, reach into the table to find all the columns (cf. SELECT *).
Deliver them.
Step 1: Fast.
Step 2: This is (based on the assumption that nothing is cached) costly. It may involve about 3K disk hits; at roughly 10 ms per random seek, that would be about 30 seconds on an HDD. So, 12 seconds could imply that some of the data was cached or happened to be near other needed rows.
Step 3: This is a network cost, which I am not considering.
Run the query a second time. It may take only 1 second this time, because all 3K blocks are cached in RAM, and iostat will show zero activity!
Is there any way to optimize it?
Well...
You already have the best index.
What are you going to do with 3000 rows all at once? Is this a one-time task?
When using InnoDB, innodb_buffer_pool_size should be about 70% of available RAM, but not so big that it leads to swapping. What is its setting, and how much RAM do you have and what else is running on the machine?
Could you do more of the task while you are fetching the 3K rows?
Switching to SSDs would help, but I don't like hardware bandaids; they are not reusable.
How big is the table (in GB) -- perhaps 3GB data plus index? (SHOW TABLE STATUS.) If you can't make the buffer_pool big enough for it, and you have a variety of queries that compete for different parts of this (and other) tables, then more RAM may be beneficial.
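A quick way to check the numbers being asked about here; these statements only read server status and table metadata:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- current buffer pool size, in bytes
SHOW TABLE STATUS LIKE 'test';                  -- Data_length + Index_length give the approximate size on disk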
This seems more like an I/O limitation than something that could be solved by adding indexes. What will improve the speed is changing the collation of the idcard column to latin1_bin. This uses only 1 byte per character, and it uses binary comparison, which is faster than case-insensitive comparison.
Only do this if you have no special characters in the idcard column, because the latin1 character set is quite limited.
ALTER TABLE `test` CHANGE COLUMN `idcard` `idcard` VARCHAR(30) COLLATE 'latin1_bin' AFTER `id`;
Furthermore, ROW_FORMAT=FIXED also improves the speed. ROW_FORMAT=FIXED is not available with the InnoDB engine, but it is with MyISAM. The resulting table I now have is shown below. It's 5 times quicker (80% less time) on SELECT statements than the initial table.
Note that I also changed the collation for 'name' and 'custom_value' to latin1_bin. This does make quite a difference in speed in my test setup, and I'm still figuring out why.
CREATE TABLE `test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`idcard` VARCHAR(30) COLLATE 'latin1_bin',
`name` VARCHAR(30) COLLATE 'latin1_bin',
`custom_value` VARCHAR(50) COLLATE 'latin1_bin',
PRIMARY KEY (`id`),
INDEX `i1` (`idcard`)
)
ENGINE=MyISAM
ROW_FORMAT=FIXED ;
You may try adding the three other columns in the select clause to the index:
CREATE INDEX idx ON test (idcard, id, name, custom_value);
The three columns other than idcard are being added to allow the index to cover everything being selected. The problem with your current index is that it is only on idcard. This means that once MySQL has traversed down to each leaf node in the index, it would have to do another seek back to the clustered index to lookup the values of all columns mentioned in the select *. As a result of this, MySQL may choose to ignore the index completely. The suggestion I made above avoids this additional seek.
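To verify that the new index actually covers the query, an EXPLAIN check like the one below could be used (the literal id card value is just a placeholder):
-- If the covering index is chosen, the key column should show idx and the
-- Extra column should include "Using index".
EXPLAIN SELECT * FROM test WHERE idcard = 'ABC123';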
I'd like to ask a question about how to improve performance of a big MySQL table using the InnoDB engine:
There's currently a table in my database with around 200 million rows. This table periodically stores the data collected by different sensors. The structure of the table is as follows:
CREATE TABLE sns_value (
value_id int(11) NOT NULL AUTO_INCREMENT,
sensor_id int(11) NOT NULL,
type_id int(11) NOT NULL,
date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
value int(11) NOT NULL,
PRIMARY KEY (value_id),
KEY idx_sensor_id (sensor_id),
KEY idx_date (date),
KEY idx_type_id (type_id) );
At first, I thought of partitioning the table in months, but due to the steady addition of new sensors it would reach the current size in about a month.
Another solution that I came up with was partitioning the table by sensors. However, due to the limit of 1024 partitions of MySQL that wasn't an option.
I believe that the right solution would be using a table with the same structure for each of the sensors:
sns_value_XXXXX
This way there would be more than 1,000 tables with an estimated size of 30 million rows per year. These tables could, at the same time, be partitioned by month for faster access to the data.
What problems would result from this solution? Is there a more normalized solution?
Editing with additional information
I consider the table to be big in relation to my server:
Cloud 2xCPU and 8GB Memory
LAMP (CentOS 6.5 and MySQL 5.1.73)
Each sensor may have more than one variable types (CO, CO2, etc.).
I mainly have two slow queries:
1) Daily summary for each sensor and type (avg, max, min):
SELECT round(avg(value)) as mean, min(value) as min, max(value) as max, type_id
FROM sns_value
WHERE sensor_id=1 AND date BETWEEN '2014-10-29 00:00:00' AND '2014-10-29 12:00:00'
GROUP BY type_id limit 2000;
This takes more than 5 min.
2) Vertical to Horizontal view and export:
SELECT sns_value.date AS date,
sum((sns_value.value * (1 - abs(sign((sns_value.type_id - 101)))))) AS one,
sum((sns_value.value * (1 - abs(sign((sns_value.type_id - 141)))))) AS two,
sum((sns_value.value * (1 - abs(sign((sns_value.type_id - 151)))))) AS three
FROM sns_value
WHERE sns_value.sensor_id=1 AND sns_value.date BETWEEN '2014-10-28 12:28:29' AND '2014-10-29 12:28:29'
GROUP BY sns_value.sensor_id,sns_value.date LIMIT 4500;
This also takes more than 5 min.
Other considerations
Timestamps may be repeated due to the characteristics of the inserts.
Periodic inserts must coexist with selects.
No updates nor deletes are performed on the table.
Suppositions made to the "one table for each sensor" approach
Tables for each sensor would be much smaller so access would be faster.
Selects will be performed only on one table for each sensor.
Selects mixing data from different sensors are not time-critical.
Update 02/02/2015
We have created a new table for each year of data, which we have also partitioned on a daily basis. Each table has around 250 million rows with 365 partitions. The new index used is as Ollie suggested (sensor_id, date, type_id, value), but the query still takes between 30 seconds and 2 minutes. We do not use the first query (daily summary), just the second one (vertical to horizontal view).
In order to be able to partition the table, the primary index had to be removed.
Are we missing something? Is there a way to improve the performance?
Many thanks!
Edited based on changes to the question
One table per sensor is, with respect, a very bad idea indeed. There are several reasons for that:
MySQL servers on ordinary operating systems have a hard time with thousands of tables. Most OSs can't handle that many simultaneous file accesses.
You'll have to create tables each time you add (or delete) sensors.
Queries that involve data from multiple sensors will be slow and convoluted.
My previous version of this answer suggested range partitioning by timestamp. But that won't work with your value_id primary key. However, with the queries you've shown and proper indexing of your table, partitioning probably won't be necessary.
(Avoid the column name date if you can: it's a reserved word and you'll have lots of trouble writing queries. Instead I suggest you use ts, meaning timestamp.)
Beware: int(11) values aren't big enough for your value_id column. You're going to run out of ids. Use bigint(20) for that column.
You've mentioned two queries. Both these queries can be made quite efficient with appropriate compound indexes, even if you keep all your values in a single table. Here's the first one.
SELECT round(avg(value)) as mean, min(value) as min, max(value) as max,
type_id
FROM sns_value
WHERE sensor_id=1
AND date BETWEEN '2014-10-29 00:00:00' AND '2014-10-29 12:00:00'
GROUP BY type_id limit 2000;
For this query, you're first looking up sensor_id using a constant, then you're looking up a range of date values, then you're aggregating by type_id. Finally you're extracting the value column. Therefore, a so-called compound covering index on (sensor_id, date, type_id, value) will be able to satisfy your query directly with an index scan. This should be very fast for you--certainly faster than 5 minutes even with a large table.
In your second query, a similar indexing strategy will work.
SELECT sns_value.date AS date,
sum((sns_value.value * (1 - abs(sign((sns_value.type_id - 101)))))) AS one,
sum((sns_value.value * (1 - abs(sign((sns_value.type_id - 141)))))) AS two,
sum((sns_value.value * (1 - abs(sign((sns_value.type_id - 151)))))) AS three
FROM sns_value
WHERE sns_value.sensor_id=1
AND sns_value.date BETWEEN '2014-10-28 12:28:29' AND '2014-10-29 12:28:29'
GROUP BY sns_value.sensor_id,sns_value.date
LIMIT 4500;
Again, you start with a constant value of sensor_id and then use a date range. You then extract both type_id and value. That means the same four column index I mentioned should work for you.
CREATE TABLE sns_value (
value_id bigint(20) NOT NULL AUTO_INCREMENT,
sensor_id int(11) NOT NULL,
type_id int(11) NOT NULL,
ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
value int(11) NOT NULL,
PRIMARY KEY (value_id),
INDEX query_opt (sensor_id, ts, type_id, value)
);
Creating a separate table for a range of sensors would be an idea.
Do not use an auto_increment primary key if you don't have to. The DB engine usually clusters the data by its primary key.
Use a composite key instead; depending on your use case, the sequence of the columns may differ.
EDIT: I also added the type into the PK. Considering the queries, I would do it like this. The choice of field names is intentional: they should be descriptive, and always watch out for reserved words.
CREATE TABLE snsXX_readings (
sensor_id int(11) NOT NULL,
reading int(11) NOT NULL,
reading_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
type_id int(11) NOT NULL,
PRIMARY KEY (reading_time, sensor_id, type_id),
KEY idx_reading_time (reading_time),
KEY idx_type_id (type_id)
);
Also, consider summarizing the readings or grouping them into a single field.
You could try getting randomized summary data.
I have a similar table: MyISAM engine (smallest table size), 10 million records, and no index on the table because it proved useless (tested). Fetching the whole date range of data takes about 10 seconds with this query:
SELECT * FROM (
SELECT sensor_id, value, date
FROM sns_value l
WHERE l.sensor_id= 123 AND
(l.date BETWEEN '2013-10-29 12:28:29' AND '2015-10-29 12:28:29')
ORDER BY RAND() LIMIT 2000
) as tmp
ORDER BY tmp.date;
In the first step this query filters on the date range and randomly picks 2,000 rows; in the second step it sorts those rows by date. Each run returns 2,000 results drawn from different data.
I'm trying to run what I believe to be a simple query on a fairly large dataset, and it's taking a very long time to execute -- it stalls in the "Sending data" state for 3-4 hours or more.
The table looks like this:
CREATE TABLE `transaction` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`uuid` varchar(36) NOT NULL,
`userId` varchar(64) NOT NULL,
`protocol` int(11) NOT NULL,
... A few other fields: ints and small varchars
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `uuid` (`uuid`),
KEY `userId` (`userId`),
KEY `protocol` (`protocol`),
KEY `created` (`created`)
) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 COMMENT='Transaction audit table'
And the query is here:
select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;
The table has approximately 222 million rows, and the where clause in the query filters down to about 20 million rows. The distinct option will bring it down to about 700,000 distinct rows, and then after grouping, (and when the query finally finishes), 4 to 5 rows are actually returned.
I realize that it's a lot of data, but it seems that 4-5 hours is an awfully long time for this query.
Thanks.
Edit: For reference, this is running on AWS on a db.m2.4xlarge RDS database instance.
Why don't you profile the query and see what exactly is happening?
SET PROFILING = 1;
SET profiling_history_size = 0;
SET profiling_history_size = 15;
/* Your query should be here */
SHOW PROFILES;
SELECT state, ROUND(SUM(duration),5) AS `duration (summed) in sec` FROM information_schema.profiling WHERE query_id = 3 GROUP BY state ORDER BY `duration (summed) in sec` DESC;
SET PROFILING = 0;
EXPLAIN /* Your query again should appear here */;
I think this will help you see where exactly the query spends its time, and based on the result you can decide what to optimize.
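As a usage sketch: the two placeholder comments would hold the actual statement from the question, and the query_id filter should be changed to whatever Query_ID SHOW PROFILES reports for it, which is not necessarily 3:
/* in place of the placeholder, the actual query: */
select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;
SHOW PROFILES;  -- note the Query_ID assigned to the statement above and use
                -- that id in the information_schema.profiling SELECT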
This is a really heavy query. To understand why it takes so long you should understand the details.
You have a range condition on an indexed field: MySQL finds the smallest matching created value in the index, and for each index entry it gets the corresponding primary key, retrieves the row from disk, and fetches the required fields (protocol, userId) that are missing from the index record. It puts them in a "temporary table" and does the grouping on those 700,000 rows. The index can be used, and is used here, only to speed up the range condition.
The only way to speed it up is to have an index that contains all the necessary data, so that MySQL does not need to do on-disk lookups for the rows. That is called a covering index. But you should understand that the index will reside in memory and will contain roughly sizeOf(created + protocol + userId + PK) * rowCount bytes, which may become a burden in itself for queries that update the table and for the other indexes. It is often easier to create a separate aggregates table and periodically update it using your query.
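A hedged sketch of such a covering index (the index name is my own; on a 222-million-row table, building it will take significant time and disk space):
-- created leads so the range condition can seek directly; protocol and userId are
-- included so the GROUP BY and COUNT(DISTINCT) are answered from the index alone.
ALTER TABLE `transaction` ADD INDEX idx_created_protocol_user (created, protocol, userId);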
Both distinct and group by will need to sort and store temporary data on the server. With that much data that might take a while.
Indexing different combinations of userId, created and protocol will help, but I can't say how much or what index will help the most.
Starting from a certain version of MariaDB (maybe since 10.5), I noticed that after importing a dump with
mysql dbname < dump.sql
the optimizer's idea of the data differs from how it really is, and it makes wrong decisions about indexes.
In general, even listing InnoDB tables with phpMyAdmin becomes very, very slow.
I noticed that running
ANALYZE TABLE myTable;
fixes it.
So after each import I run the following, which is equivalent to running ANALYZE on each table:
mysqlcheck -aA
For clarity, all the other tables in the DB work as expected and load ~2 million rows in a fraction of a second. The one table of just ~600 rows is taking 10+ minutes to load in Navicat.
I can't think of any possible reason for this. There are just 4 columns. One of them is a large text field, but I've worked with large text fields before and they've never been this slow.
Running EXPLAIN SELECT * FROM parser_queue I get:
id  select_type  table         type  possible_keys  key  key_len  ref  rows  Extra
1   SIMPLE       parser_queue  ALL   -              -    -        -    658   -
The profile tells me that 453 seconds are spent 'sending data'
I've also got this in the 'Status' tab. I don't understand most of it, but these numbers are much higher than my other tables.
Bytes_received 31
Bytes_sent 32265951
Com_select 1
Created_tmp_files 16
Handler_read_rnd_next 659
Key_read_requests 9018487
Key_reads 3928
Key_write_requests 310431
Key_writes 4290
Qcache_hits 135077
Qcache_inserts 14289
Qcache_lowmem_prunes 4133
Qcache_queries_in_cache 983
Questions 1
Select_scan 1
Table_locks_immediate 31514
The data stored in the text field is about 12000 chars on average.
There is a primary, auto increment int id field, a tinyint status field, a text field, and a timestamp field with on update current timestamp.
OK I will try out both answers, but I can answer the questions quickly first:
Primary key on the ID field is the only key. This table is used for queuing, with ~50 records added/deleted per hour, but I only created it yesterday. Could it become corrupted in such a short time?
It is MyISAM
More work trying to isolate the problem:
REPAIR TABLE did nothing.
OPTIMIZE TABLE did nothing.
Created a temp table; queries were about 50% slower on the temp table.
Deleted the table and rebuilt it. SELECT * takes 18 seconds with just 4 rows.
Here is the SQL I used to create the table:
CREATE TABLE IF NOT EXISTS `parser_queue` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL DEFAULT '1',
`data` text NOT NULL,
`last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
Stranger still, everything seems fine on my local box. The slowness only happens on the dev site.
For clarity: there are more than 100 tables on the dev site and this is the only one that is funky.
OK I have disabled all cron jobs which use this table. SHOW PROCESSLIST does not reveal any locks on the table.
Changing the engine to InnoDB did not produce any significant improvement (86 seconds vs 94 for MyISAM)
Any other ideas?
Running SHOW PROCESSLIST during the query reveals that it spends most of its time in the 'Writing to net' state.
If you suspect corruption somewhere, you can try either (or both) of the following:
CREATE TABLE temp SELECT * FROM parser_queue;
This will create a new table "identical" to the previous one, except it will be recreated. Alternatively (or maybe after you've made a copy), you can try:
REPAIR TABLE parser_queue;
You may also want to try optimizing the table; it might have gotten fragmented since you are using it as a queue.
OPTIMIZE TABLE parser_queue;
You can determine whether the table is fragmented by running SHOW TABLE STATUS LIKE 'parser_queue' and checking whether the Data_free column shows a high number.
Update
You say you are storing gzcompressed data in the TEXT columns. Try changing the TEXT column to BLOB instead, which is meant to handle binary data, such as compressed text.
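A minimal sketch of that change, using the data column from the CREATE TABLE shown earlier:
-- BLOB stores the gzcompressed bytes without any character-set handling.
ALTER TABLE parser_queue MODIFY `data` BLOB NOT NULL;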
The name gives away that you are using the table for queueing (lots of inserts and delets, maybe?). Maybe you have had the table a while and it's heavily fragmented. If my assumptions are correct, try OPTIMIZE TABLE parser_queue;
You can read more about this in the manual:
http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
Right, the problem seems to have been only this: the text fields were too huge.
Running
SELECT id, status, last_updated FROM parser_queue
takes less time than
SELECT data FROM parser_queue WHERE id = 6
Since all the queries I will be running return only one row, the slowdown will not affect me so much. I'm already using gzcompress on the data stored, so I don't think there is much more I could do anyway.