Speed Up A Large Insert From Select Query With Multiple Joins - mysql

I'm trying to denormalize a few MySQL tables into a new table that I can use to speed up some complex queries with lots of business logic. The problem is that there are 2.3 million records I need to add to the new table, and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed):
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works: I was able to run it using a filter on log_set_name. However, that filter only helps for less than 1% of the total records, because one of the values of log_set_name has 2.2 million records, which is the majority of the records. From what I can see, there is nothing else I can use to break this query up into smaller chunks. The problem is that the query takes too long to run on the remaining 2.2 million records. It times out after a few hours, the transaction is rolled back, and none of the 2.2 million records are added to the new table. Only the 0.1 million records were processed, and only because I could add a filter of where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns in separate queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that they slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node. I am also using MySQL Workbench, so I increased all of its timeout values to their maximums, giving them all nines. All three of these steps helped and were necessary to get the 1% of the records into the new table, but it still wasn't enough to get the 2.2 million records in without timing out. I'd appreciate any insights, as I'm not experienced with this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1
...
CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is the column I have to join everything on. It's TEXT, so I cannot index it, and the largest value is above 255 characters, so I cannot change it to a VARCHAR and index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT '0',
`validation_result` tinyint(1) NOT NULL DEFAULT '0',
`archived` tinyint(1) NOT NULL DEFAULT '0',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT '0',
`random_guid` tinyint(1) NOT NULL DEFAULT '0',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1

baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://localhost/rjweb/mysql/doc.php/uuid ; (MySQL 8.0 has similar functions.)
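To make the size difference concrete, here is a minimal Python sketch, assuming standard hyphenated 36-character GUID strings (the GUID below is made up). The text form takes 36 bytes in a latin1 VARCHAR(36), while the packed form takes 16 bytes. In MySQL 8.0 the built-in UUID_TO_BIN() and BIN_TO_UUID() do this conversion; the sketch only mirrors the idea.

```python
import uuid


def guid_to_binary16(guid_text: str) -> bytes:
    """Pack a 36-char GUID string into the 16 raw bytes a BINARY(16) column would hold."""
    return uuid.UUID(guid_text).bytes


def binary16_to_guid(raw: bytes) -> str:
    """Unpack 16 raw bytes back into the canonical lowercase hyphenated form."""
    return str(uuid.UUID(bytes=raw))


# Hypothetical GUID standing in for a baselog_index_guid value.
guid = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"
packed = guid_to_binary16(guid)
print(len(guid), len(packed))            # 36 16
print(binary16_to_guid(packed) == guid)  # True
```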
It would probably be more efficient if you have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
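One way to get those separate columns is to compute them once, when the rows are loaded, instead of on every query. The Python sketch below approximates MySQL's SUBSTRING_INDEX() and derives the same three values the INSERT .. SELECT computes (offload_date, vehicle, baselog_path). The path layout is a made-up example, since the real paths are not shown in the question.

```python
from datetime import date, datetime


def substring_index(s: str, delim: str, count: int) -> str:
    """Approximation of MySQL SUBSTRING_INDEX(str, delim, count).

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the |count|-th delimiter, counting from the right.
    With fewer delimiters than |count|, the whole string is returned.
    """
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def derive_columns(path: str) -> dict:
    """The values the INSERT .. SELECT derives from `path`, computed once up front."""
    offload_text = substring_index(substring_index(path, "/", 7), "/", -1)
    return {
        # Equivalent of STR_TO_DATE(..., '%Y.%m.%d')
        "offload_date": datetime.strptime(offload_text, "%Y.%m.%d").date(),
        "vehicle": substring_index(substring_index(path, "/", 8), "/", -1),
        "baselog_path": substring_index(path, "/", 9),
    }


# Hypothetical path layout, for illustration only.
example = "/mnt/s3/bucket/logs/raw/2019.03.14/truck42/run_001/part-0.log"
print(derive_columns(example))
```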
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there are, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
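A minimal Python sketch of that hash idea, assuming a hypothetical fixed-width column that stores MD5(original_location) (16 bytes, for example in a BINARY(16) column) with an ordinary index on it. The lookup goes through the short hash, and the full strings are still compared, because a matching hash is not proof that the paths are equal. Lists of dicts stand in for the two tables.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def location_hash(location: str) -> bytes:
    """16-byte MD5 digest: fixed width no matter how long the path is."""
    # latin-1 matches the tables' DEFAULT CHARSET=latin1 (assumption for non-ASCII paths)
    return hashlib.md5(location.encode("latin-1")).digest()


def hash_left_join(log_trees: List[dict], logset_logs: List[dict]) -> List[Tuple[str, Optional[str]]]:
    """LEFT JOIN logset_logs to log_trees ON original_location = baselog_path, via the hash."""
    # Build the "index": hash -> log_trees rows with that hash.
    by_hash: Dict[bytes, List[dict]] = {}
    for tree in log_trees:
        by_hash.setdefault(location_hash(tree["original_location"]), []).append(tree)

    rows: List[Tuple[str, Optional[str]]] = []
    for log in logset_logs:
        candidates = by_hash.get(location_hash(log["baselog_path"]), [])
        # Re-check the full string: equal hashes make a match likely, not certain.
        matches = [t for t in candidates if t["original_location"] == log["baselog_path"]]
        if matches:
            rows.extend((log["index_guid"], t["baselog_index_guid"]) for t in matches)
        else:
            rows.append((log["index_guid"], None))  # LEFT JOIN keeps unmatched rows
    return rows
```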
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
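A rough Python sketch of the chunking bookkeeping, assuming you chunk on the `id` PRIMARY KEY of baselog_and_amendment_guid_to_path_mappings (its AUTO_INCREMENT of 2387821 suggests roughly 2,400 chunks of 1000). Each range would go into the derived table of the INSERT .. SELECT as a WHERE clause, and each INSERT would be committed on its own, so a failure costs one chunk instead of the whole load. The code that executes the statements is left out, and the template below is illustrative.

```python
from typing import Iterator, List, Tuple


def id_chunks(min_id: int, max_id: int, chunk_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) primary-key ranges that together cover min_id..max_id."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    lo = min_id
    while lo <= max_id:
        hi = lo + chunk_size
        yield lo, hi
        lo = hi


# Hypothetical filter for the derived table over
# baselog_and_amendment_guid_to_path_mappings (its PRIMARY KEY is `id`).
WHERE_TEMPLATE = "WHERE id >= {lo} AND id < {hi}"


def chunk_filters(min_id: int, max_id: int, chunk_size: int = 1000) -> List[str]:
    """One WHERE clause per chunk; each would back its own INSERT .. SELECT and COMMIT."""
    return [WHERE_TEMPLATE.format(lo=lo, hi=hi) for lo, hi in id_chunks(min_id, max_id, chunk_size)]


filters = chunk_filters(1, 2500)
print(filters[0])    # WHERE id >= 1 AND id < 1001
print(len(filters))  # 3
```

Half-open ranges mean gaps in the AUTO_INCREMENT sequence are harmless: a chunk may simply hold fewer than 1000 rows.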

Related

Add an effective index on a huge table

I have a MySQL database table with more than 34M rows (and growing).
CREATE TABLE `sensordata` (
`userID` varchar(45) DEFAULT NULL,
`instrumentID` varchar(10) DEFAULT NULL,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`data` varchar(200) DEFAULT NULL,
`dataState` varchar(45) NOT NULL DEFAULT 'Original',
`gps` varchar(45) DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
`speed` varchar(20) NOT NULL DEFAULT '0',
`unitID` varchar(5) NOT NULL DEFAULT '1',
`parameterID` varchar(5) NOT NULL DEFAULT '1',
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
`status` varchar(7) DEFAULT 'Offline',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
I access this table from multiple threads (at least 400 threads) every minute to insert data into the table.
As the table was growing, it was getting slower to read and write the data. One SELECT query used to take about 25 seconds; then I added a unique index:
UNIQUE INDEX idx_userInsDate ( userID,instrumentID,utcDateTime)
This reduced the read time from 25 seconds to a few milliseconds, but it increased the insert time, because the index has to be updated for each record.
Also, if I run a SELECT query from multiple threads at the same time, the queries take too long to return the data.
This is an example query
Select dateTime from sensordata WHERE userID = 'someUserID' AND instrumentID = 'someInstrumentID' AND dateTime between 'startDate' AND 'endDate' order by dateTime asc;
Can someone help me improve the table schema or add an effective index to improve the performance, please?
Thank you in advance
A PRIMARY KEY is a UNIQUE key. Toss the redundant UNIQUE(id)!
Is id referenced by any other tables? If not, then get rid of it all together. Instead have just
PRIMARY KEY ( userID, instrumentID, utcDateTime)
That is, if that triple is guaranteed to be unique. You mentioned DST -- use the datatype TIMESTAMP instead of DATETIME. Doing that, you can convert to DATETIME if needed, thereby eliminating one of the columns.
That one index (the PK) takes virtually no space since it is "clustered" with the data in InnoDB.
Your table is awfully fat with all those VARCHARs. For example, status can be reduced to a 1-byte ENUM. Others can be normalized. Things like speed can be either a 4-byte FLOAT or some smaller DECIMAL, depending on how much range and precision you need.
With 34M wide rows, you have probably recently exceeded the cacheability of the RAM you have. By making the row narrower, you will postpone that overflow.
Why attack the indexes? Every UNIQUE (including PRIMARY) index is checked before allowing the row to be inserted. By getting it down to 1 index, that minimizes the cost there. (InnoDB really needs a PRIMARY KEY.)
INT is 4 bytes. Do you have a billion instruments? Maybe instrumentID could be SMALLINT UNSIGNED, which is 2 bytes, with a max of 64K? Think about all the other IDs.
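To put numbers on those byte sizes, here is a small illustrative Python helper that picks the narrowest MySQL integer type for a value range, using the standard widths of TINYINT (1 byte), SMALLINT (2), MEDIUMINT (3), INT (4) and BIGINT (8). The instrument count in the usage line is made up.

```python
from typing import List, Tuple

# Storage sizes of MySQL's integer types, in bytes.
MYSQL_INT_TYPES: List[Tuple[str, int]] = [
    ("TINYINT", 1), ("SMALLINT", 2), ("MEDIUMINT", 3), ("INT", 4), ("BIGINT", 8),
]


def int_range(nbytes: int, unsigned: bool) -> Tuple[int, int]:
    """Smallest and largest value an integer of nbytes bytes can hold."""
    bits = 8 * nbytes
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_int_type(min_value: int, max_value: int) -> Tuple[str, int]:
    """Narrowest type covering [min_value, max_value]; UNSIGNED when nothing is negative."""
    unsigned = min_value >= 0
    for name, nbytes in MYSQL_INT_TYPES:
        lo, hi = int_range(nbytes, unsigned)
        if lo <= min_value and max_value <= hi:
            return (name + " UNSIGNED" if unsigned else name), nbytes
    raise ValueError("range does not fit in BIGINT")


# Hypothetical: up to 50,000 instruments.
print(smallest_int_type(0, 50_000))  # ('SMALLINT UNSIGNED', 2)
```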
You have 400 INSERTs/minute, correct? That is not bad. If you get to 400/second, we need to have a different talk.
("Fill factor" is not tunable in MySQL because it does not make much difference.)
How much RAM do you have? What is the setting for innodb_buffer_pool_size? Optimal is somewhere around 70% of available RAM.
Let's see your main queries; there may be other issues to address.
It's not the indexes that are at fault here; it's your data types. As the size of the data on disk grows, the speed of all operations decreases. Indexes can certainly help speed up selects, provided your data is properly structured, but it appears that it isn't.
CREATE TABLE `sensordata` (
`userID` int, /* shouldn't this have a foreign key constraint? */
`instrumentID` int,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
/* what exactly are you putting here? Are you sure it's not causing any redundancy? */
`data` varchar(200) DEFAULT NULL,
/* your states will be a finite number of elements. They can be represented by constants in your code or a set of values in a related table */
`dataState` int,
/* what's this? Sounds like what you are saving in location */
`gps` varchar(45) DEFAULT NULL,
`location` point,
`speed` float,
`unitID` int DEFAULT '1',
/* as above */
`parameterID` int NOT NULL DEFAULT '1',
/* are you sure this is different from data? */
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
/* as above and isn't this the same as */
`status` int,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
First: avoid VARCHARs for indexes, and especially for IDs. Every index entry stores the full key value, so wide string keys make an index much larger (and slower to search and maintain) than integer keys would.
Second: your SELECT filters on dateTime, but your index is on utcDateTime. It will only use userID and instrumentID and ignore the utcDateTime part.
Advice: change the data types of the IDs, and change your index to match the query (dateTime, not utcDateTime).
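A simplified Python model of that "leftmost prefix" rule, illustrative only: walk the index columns in order; an equality predicate extends the usable prefix, the first range predicate is used and then ends it, and a column the WHERE clause does not mention ends it immediately. It ignores refinements such as index condition pushdown and MySQL 8.0's skip scan.

```python
from typing import List, Set


def usable_index_prefix(index_cols: List[str], equality: Set[str], ranges: Set[str]) -> List[str]:
    """Leading index columns a WHERE clause can use to seek into / range-scan a B-tree."""
    used: List[str] = []
    for col in index_cols:
        if col in equality:
            used.append(col)   # '=' keeps the prefix going
        elif col in ranges:
            used.append(col)   # first range column is used, then the prefix ends
            break
        else:
            break              # column not in the WHERE clause: prefix ends here
    return used


where_eq = {"userID", "instrumentID"}
where_range = {"dateTime"}
print(usable_index_prefix(["userID", "instrumentID", "utcDateTime"], where_eq, where_range))
print(usable_index_prefix(["userID", "instrumentID", "dateTime"], where_eq, where_range))
```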
Every index slows down inserts, and unfortunately there is no such thing as a fill factor for indexes in MySQL right now. So the best thing you can do is keep the indexes as small as possible.
Another approach on heavily loaded databases with random access would be: write to an unindexed table, read from an indexed one. At a given time, build the indexes and swap the tables (may require a third table for the index creation while leaving the other ones untouched in between).
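The final swap in that approach can be a single multi-table RENAME TABLE, which MySQL performs atomically. The Python helper below only builds that statement; the table names are hypothetical, and copying the rows and creating the indexes on the new table are assumed to have happened beforehand.

```python
import re

_NAME = re.compile(r"[A-Za-z0-9_]+")


def swap_tables_sql(live: str, rebuilt: str, retired: str) -> str:
    """Build a RENAME TABLE that moves `rebuilt` into the `live` name in one atomic step."""
    for name in (live, rebuilt, retired):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected table name: {name!r}")
    # live -> retired and rebuilt -> live happen together, so readers never find the name missing.
    return f"RENAME TABLE `{live}` TO `{retired}`, `{rebuilt}` TO `{live}`;"


print(swap_tables_sql("sensordata", "sensordata_indexed", "sensordata_old"))
```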

Simple select query takes more time in very large table in MySQL database in C# application

I am using a MySQL database in my ASP.NET web application with C#. The MySQL Server version is 5.7, and the PC has 8 GB of RAM. When I execute a select query on a MySQL table, it takes a long time; a simple select query takes around 42 seconds. The table has about 1 crore (10 million) records, and I have also added indexes to it. How can I fix this?
The following is my table structure.
CREATE TABLE `smstable_read` (
`MessageID` int(11) NOT NULL AUTO_INCREMENT,
`ApplicationID` int(11) DEFAULT NULL,
`Api_userid` int(11) DEFAULT NULL,
`ReturnMessageID` varchar(255) DEFAULT NULL,
`Sequence_Id` int(11) DEFAULT NULL,
`messagetext` longtext,
`adtextid` int(11) DEFAULT NULL,
`mobileno` varchar(255) DEFAULT NULL,
`deliverystatus` int(11) DEFAULT NULL,
`SMSlength` int(11) DEFAULT NULL,
`DOC` varchar(255) DEFAULT NULL,
`DOM` varchar(255) DEFAULT NULL,
`BatchID` int(11) DEFAULT NULL,
`StudentID` int(11) DEFAULT NULL,
`SMSSentTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTimeTicks` decimal(28,0) DEFAULT '0',
`SMSSentTimeTicks` decimal(28,0) DEFAULT '0',
`Sent_SMS_Day` int(11) DEFAULT NULL,
`Sent_SMS_Month` int(11) DEFAULT NULL,
`Sent_SMS_Year` int(11) DEFAULT NULL,
`smssent` int(11) DEFAULT '1',
`Batch_Name` varchar(255) DEFAULT NULL,
`User_ID` varchar(255) DEFAULT NULL,
`Year_ID` int(11) DEFAULT NULL,
`Date_Time` varchar(255) DEFAULT NULL,
`IsGroup` double DEFAULT NULL,
`Date_Time_Ticks` decimal(28,0) DEFAULT NULL,
`IsNotificationSent` int(11) DEFAULT NULL,
`Module_Id` double DEFAULT NULL,
`Doc_Batch` decimal(28,0) DEFAULT NULL,
`SMS_Category_ID` int(11) DEFAULT NULL,
`SID` int(11) DEFAULT NULL,
PRIMARY KEY (`MessageID`),
KEY `index2` (`ReturnMessageID`),
KEY `index3` (`mobileno`),
KEY `BatchID` (`BatchID`),
KEY `smssent` (`smssent`),
KEY `deliverystatus` (`deliverystatus`),
KEY `day` (`Sent_SMS_Day`),
KEY `month` (`Sent_SMS_Month`),
KEY `year` (`Sent_SMS_Year`),
KEY `index4` (`ApplicationID`,`SMSSentTimeTicks`),
KEY `smslength` (`SMSlength`),
KEY `studid` (`StudentID`),
KEY `batchid_studid` (`BatchID`,`StudentID`),
KEY `User_ID` (`User_ID`),
KEY `Year_Id` (`Year_ID`),
KEY `IsNotificationSent` (`IsNotificationSent`),
KEY `isgroup` (`IsGroup`),
KEY `SID` (`SID`),
KEY `SMS_Category_ID` (`SMS_Category_ID`),
KEY `SMSSentTimeTicks` (`SMSSentTimeTicks`)
) ENGINE=MyISAM AUTO_INCREMENT=16513292 DEFAULT CHARSET=utf8;
The following is my select query:
SELECT messagetext, SMSSentTime, StudentID, batchid,
User_ID,MessageID,Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch
FROM smstable_read
WHERE StudentID=977 AND SID = 8582 AND MessageID>16013282
You need to learn about compound indexes and covering indexes. Read about those things.
Your query is slow because it's doing a half-scan of the table. It uses the primary key to find the first row with a qualifying MessageID, then looks at every row of the table to find matching rows.
Your filter criteria are StudentID = constant, SID = constant AND MessageID > constant. That means you need those three columns, in that order, in an index. The first two filter criteria will random-access your index to the correct place. The third criterion will scan the index starting right after the constant value in your query. It's called an Index Range Scan operation, and it's quite efficient.
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId);
This compound index should make your query efficient. Notice that in MyISAM, the primary key column of a table should appear in compound indexes. That's cool in this case because it's also part of your query criteria.
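To see why the (StudentId, SID, MessageId) column order matters, here is a toy Python model of the index range scan: a sorted list of tuples stands in for the B-tree, a binary search plays the role of the seek, and the scan stops as soon as the (StudentID, SID) prefix changes. It is a simplification, not how MyISAM is implemented internally.

```python
import bisect
from typing import List, Tuple

Entry = Tuple[int, int, int]  # (StudentID, SID, MessageID)


def index_range_scan(index: List[Entry], student_id: int, sid: int, after_message_id: int) -> List[int]:
    """MessageIDs with StudentID = student_id AND SID = sid AND MessageID > after_message_id."""
    # Seek: first entry strictly after (student_id, sid, after_message_id).
    i = bisect.bisect_right(index, (student_id, sid, after_message_id))
    found = []
    # Scan forward only while the equality prefix (StudentID, SID) still matches.
    while i < len(index) and index[i][:2] == (student_id, sid):
        found.append(index[i][2])
        i += 1
    return found


# Toy index entries, kept sorted the way the B-tree keeps them.
toy_index = sorted([
    (976, 8582, 16013999), (977, 8582, 100), (977, 8582, 16013282),
    (977, 8582, 16013283), (977, 8582, 16013300), (977, 8583, 16013290),
])
print(index_range_scan(toy_index, 977, 8582, 16013282))  # [16013283, 16013300]
```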
If this query is used very frequently, you could make a covering index: you could add the other columns of the query (the ones mentioned in your SELECT clause) to the index.
But unfortunately you have defined your messageText column with a longtext data type. That allows each message to contain up to four gigabytes. (Why? Is this really SMS data? There's a limit of 160 characters per message in SMS. Four gigabytes >> 160 characters.)
Now the point of a covering index is to allow the query to be satisfied entirely from the index, without referring back to the table. But when you include a longtext or any other LOB column in an index, it only contains a subset of the data. So the point of the covering index is lost.
If I were you I would change my table so messageText was a VARCHAR(255) data type, and then create this covering index:
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId,
SMSSentTime, batchid,
User_ID, Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch,
messageText);
(Notice that you should put variable-length items last in the index if you can.)
If you can't change your application to handle VARCHAR(255) then go with the first index I mentioned.
Pro tip: putting lots of single-column indexes on MySQL tables rarely helps SELECT performance and always harms INSERT and UPDATE performance. You need an index on your primary key, and you need indexes to support the queries you run. Extra indexes are harmful.
It looks like your database is not properly indexed, and not even properly normalized. Normalizing your database will go a long way toward speeding up all your queries, particularly in view of the fact that MySQL generally uses only one index per table in a query. Even though you have lots of indexes, most of them cannot be used together.
Your current query filters on StudentID, SID, and MessageID. The last is an inequality comparison, so an index will not be very effective with that, but the other two columns are equality comparisons. I suggest an index like this:
KEY `studid` (`StudentID`,`SID`)
Follow that up by dropping your existing index on SID. If you find that you don't want to drop it because it's used in another query, that is further evidence that your table is in desperate need of normalization.
Too many indexes slow down inserts and add a little overhead to each SELECT, because the query planner needs more effort to figure out which index to use.

Very Slow simple MySql query with index

I have this table:
CREATE TABLE `messenger_contacts` (
`number` varchar(15) NOT NULL,
`has_telegram` tinyint(1) NOT NULL DEFAULT '0',
`geo_state` int(11) NOT NULL DEFAULT '0',
`geo_city` int(11) NOT NULL DEFAULT '0',
`geo_postal` int(11) NOT NULL DEFAULT '0',
`operator` tinyint(1) NOT NULL DEFAULT '0',
`type` tinyint(1) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `messenger_contacts`
ADD PRIMARY KEY (`number`),
ADD KEY `geo_city` (`geo_city`),
ADD KEY `geo_postal` (`geo_postal`),
ADD KEY `type` (`type`),
ADD KEY `type1` (`operator`),
ADD KEY `has_telegram` (`has_telegram`),
ADD KEY `geo_state` (`geo_state`);
with about 11 million records.
A simple count select on this table takes about 30 to 60 seconds to complete, which seems very high.
select count(number) from messenger_contacts where geo_state=1
I am not a database pro, so besides setting indexes I don't know what else I can do to make the query faster.
UPDATE:
OK, I made some changes to the column types and sizes:
CREATE TABLE IF NOT EXISTS `messenger_contacts` (
`number` bigint(13) unsigned NOT NULL,
`has_telegram` tinyint(1) NOT NULL DEFAULT '0' ,
`geo_state` int(2) NOT NULL DEFAULT '0',
`geo_city` int(4) NOT NULL DEFAULT '0',
`geo_postal` int(10) NOT NULL DEFAULT '0',
`operator` tinyint(1) NOT NULL DEFAULT '0' ,
`type` tinyint(1) NOT NULL DEFAULT '0' ,
PRIMARY KEY (`number`),
KEY `has_telegram` (`has_telegram`,`geo_state`),
KEY `geo_city` (`geo_city`),
KEY `geo_postal` (`geo_postal`),
KEY `type` (`type`),
KEY `type1` (`operator`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Now the query only takes 4 to 5 seconds with both COUNT(*) and COUNT(number).
Thanks, everyone, for your help, even the person who gave me a -1. This will be good enough for now, considering that my server is low-end hardware and I will be caching the select count results.
Maybe
select count(geo_state) from messenger_contacts where geo_state=1
as it will give the same result but will not use number column from the clustered index?
If this does not help, I would try changing the number column to an INT type, which should reduce the index size, or try increasing the amount of memory MySQL can use for caching indexes.
You did not change the datatypes. INT(11) == INT(2) == INT(100) -- each is a 4-byte signed integer. You probably want 1-byte unsigned TINYINT UNSIGNED or 2-byte SMALLINT UNSIGNED.
It is a waste to index "flags", which I assume type and has_telegram are. The optimizer will never use them, because using them would be less efficient than simply doing a table scan.
The standard coding pattern is:
select count(*)
from messenger_contacts
where geo_state=1
unless you need to not count NULLs, which is what COUNT(geo_state) implies.
Once you have the index on geo_state (or an index starting with geo_state), the query will scan the index (which is a separate BTree structure) starting with the first occurrence of geo_state=1 until the last, counting as it goes. That is, it will touch 1.1 million index entries. So, a few seconds is to be expected. Counting a 'rare' geo_state will run much faster.
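A toy Python model of that counting pass, with a sorted list standing in for the geo_state index: one binary search finds the first geo_state=1 entry, then the loop walks and counts entries until the value changes, so the work grows with the number of matching rows. The data is made up.

```python
import bisect
from typing import List


def count_equal(sorted_keys: List[int], value: int) -> int:
    """COUNT(*) ... WHERE key = value, answered by walking a sorted single-column index."""
    i = bisect.bisect_left(sorted_keys, value)  # seek to the first matching entry
    touched = 0
    while i < len(sorted_keys) and sorted_keys[i] == value:
        touched += 1                            # one index entry read per matching row
        i += 1
    return touched


# Made-up geo_state values; the real index holds about 11 million entries.
geo_state_index = sorted([1, 1, 1, 2, 2, 3, 1, 5, 2, 1])
print(count_equal(geo_state_index, 1))  # 5
print(count_equal(geo_state_index, 4))  # 0
```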
The reason for 30-60 seconds versus 4-5 seconds is very likely to be caching. The former had to read stuff from disk; the latter did not. Run the query twice.
Using the geo_state index will be faster for that query than using the PRIMARY KEY unless there are caching differences.
INDEX(number,geo_state) is virtually useless for any of the SELECTs mentioned -- geo_state should be first. This is an example of a "covering" index for the select count(number)... case.
More on building indexes.

How to optimized mysql query having large dataset

I have two tables with the following schema,
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table has 2.5 million records, whereas the sent_log table has 0.25 million records.
I want to filter out the records from open log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is the query is taking too much time to execute. I've waited for an hour for the query completion but didn't succeed.
Kindly suggest what I can do to make it faster, since the tables are large.
Thanks,
Faisal Nasir
First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM sent_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sentlog_emailid_deliveryid on sent_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The select distinct in the subquery is unnecessary.
Exists can be faster than in.
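A small Python model of what the EXISTS version computes, with lists of dicts standing in for the tables. A set of (email_id, delivery_id) tuples plays the role of the composite index, so each probe compares the two columns directly instead of building a concatenated string. As in SQL, rows with a NULL (None) key never match, and an open_log row is returned once even if sent_log has several matching rows.

```python
from typing import List


def exists_filter(open_log: List[dict], sent_log: List[dict]) -> List[dict]:
    """open_log rows for which some sent_log row has the same (email_id, delivery_id)."""
    # The set of key tuples stands in for the (email_id, delivery_id) index on sent_log.
    sent_keys = {
        (r["email_id"], r["delivery_id"])
        for r in sent_log
        if r["email_id"] is not None and r["delivery_id"] is not None  # NULL never equals NULL in SQL
    }
    # Each open_log row is tested once, so duplicates in sent_log do not duplicate output.
    return [r for r in open_log if (r["email_id"], r["delivery_id"]) in sent_keys]
```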
If this query runs often, you can speed it up greatly by creating a BIGINT hash column, even if it is not unique.
For example, you can add the column and fill it with a trigger:
alter table sent_log add column for_get bigint, add index idx_for_get (for_get);
After that, create a trigger (and run a one-time UPDATE for existing rows) that puts the hash into that BIGINT:
for_get=CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
If you have such a column in both tables, with an index on it, the query will look like:
SELECT *
FROM open_log ol
left join sent_log sl on sl.for_get=ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.
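The for_get expression can be reproduced in Python to check what it stores, assuming ASCII/latin1 values. The first 10 hex digits of the MD5 give a 40-bit number, which always fits in a BIGINT. Because CONCAT is used here without a separator, different (email_id, delivery_id) pairs can share a hash, which is one reason the query above still compares the real columns.

```python
import hashlib


def for_get(email_id: str, delivery_id: str) -> int:
    """Python analogue of CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10)."""
    digest_hex = hashlib.md5((email_id + delivery_id).encode("latin-1")).hexdigest()
    # 10 hex digits = 40 bits, comfortably inside BIGINT's range.
    return int(digest_hex[:10], 16)


print(for_get("someone@example.com", "D-1001"))
```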

MySQL procedures take too much time and the tables are very large

I have a large live database where around 1000 users each make 2 or more updates every minute. At the same time, 4 users are getting reports and adding new items. The main 2 tables contain around 2 million and 4 million rows so far.
Queries using these tables are taking too much time, even simple queries like:
"SELECT COUNT(*) FROM MyItemsTable" and "SELECT COUNT(*) FROM MyTransactionsTable"
which take 10 seconds and 26 seconds respectively.
Large reports now take 15 minutes!!! Way too much time.
All the tables that I'm using are InnoDB.
Is there any way to solve this problem before I read about reputation??
Thank you in advance for any help
Edit
Here is the structure and indexes of MyItemsTable:
CREATE TABLE `pos_MyItemsTable` (
`itemid` bigint(15) NOT NULL,
`uploadid` bigint(15) NOT NULL,
`itemtypeid` bigint(15) NOT NULL,
`statusid` int(1) NOT NULL,
`uniqueid` varchar(10) DEFAULT NULL,
`referencenb` varchar(30) DEFAULT NULL,
`serialnb` varchar(25) DEFAULT NULL,
`code` varchar(50) DEFAULT NULL,
`user` varchar(16) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`pass` varchar(100) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`expirydate` date DEFAULT NULL,
`userid` bigint(15) DEFAULT NULL,
`insertdate` datetime DEFAULT NULL,
`updateuser` bigint(15) DEFAULT NULL,
`updatedate` datetime DEFAULT NULL,
`counternb` int(1) DEFAULT '0',
PRIMARY KEY (`itemid`),
UNIQUE KEY `referencenb_unique` (`referencenb`),
KEY `MyItemsTable_r04` (`itemtypeid`),
KEY `MyItemsTable_r05` (`uploadid`),
KEY `FK_MyItemsTable` (`statusid`),
KEY `ind_MyItemsTable_serialnb` (`serialnb`),
KEY `uniqueid_key` (`uniqueid`),
KEY `ind_MyItemsTable_insertdate` (`insertdate`),
KEY `ind_MyItemsTable_counternb` (`counternb`),
CONSTRAINT `FK_MyItemsTable` FOREIGN KEY (`statusid`) REFERENCES `MyItemsTable_statuses` (`statusid`),
CONSTRAINT `MyItemsTable_r04` FOREIGN KEY (`itemtypeid`) REFERENCES `itemstypes` (`itemtypeid`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `MyItemsTable_r05` FOREIGN KEY (`uploadid`) REFERENCES `uploads` (`uploadid`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Just having a few indexes does not mean your tables and queries are optimized.
Try to identify the queries that run the slowest and add specific indexes for them.
Selecting * from a huge table where you have columns that contain text / images / files will always be slow. Try to avoid selecting such fat columns when you don't need them.
Further reading:
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
and some more advanced configurations:
http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/
http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/
UPDATE:
Try to use composite keys for some of the heaviest queries,
by placing the main fields that are compared in ONE index:
KEY `MyItemsTable_r88` (`itemtypeid`,`statusid`, `serialnb`), ...
This will give you faster results for queries that compare only columns from the index:
SELECT * FROM my_table WHERE `itemtypeid` = 5 AND `statusid` = 0 AND `serialnb` > '500'
and extremely fast results if you search and select only values from the index:
SELECT `serialnb` FROM my_table WHERE `statusid` = 0 AND `itemtypeid` IN (1,2,3);
These are really basic examples; you will have to read a bit more and analyze your data for the best results.