mysql bulk table script execution - mysql

I have 120 tables in my project, and I have to migrate them from MSSQL to MySQL. I have already written the CREATE TABLE queries for all of those tables, and they work. My problem is speed: when I execute the script in MSSQL it completes within a second, but MySQL takes around 4 minutes. I want to improve the performance in MySQL, but I don't know how. If anyone knows, please help me.
Thank you.
Here is my sample table script:
MySQL
CREATE TABLE `rb_tbl_bak` (
`BakPathId` int NOT NULL AUTO_INCREMENT,
`BakPath` varchar(500) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`BakDate` datetime(3) DEFAULT NULL,
PRIMARY KEY (`BakPathId`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
MSSQL
--Create table and its columns
CREATE TABLE [dbo].[RB_Tbl_Bak] (
[BakPathId] [int] NOT NULL IDENTITY (1, 1),
[BakPath] [nvarchar](500) NULL,
[BakDate] [datetime] NULL);
GO
I have to do this for 120+ tables.

Oh well, in this case MySQL simply takes its time. You can turn on profiling to get an idea of what takes so long. An example using MySQL's CLI:
SET profiling = 1;
CREATE TABLE rb_tbl_back (id BIGINT UNSIGNED NOT NULL PRIMARY KEY);
SHOW PROFILES;
You should get a response like this:
mysql> SHOW PROFILES;
+----------+------------+--------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                              |
+----------+------------+--------------------------------------------------------------------+
|        1 | 0.00913800 | CREATE TABLE rb_tbl_back (id BIGINT UNSIGNED NOT NULL PRIMARY KEY) |
+----------+------------+--------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SHOW PROFILE FOR QUERY 1;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000071 |
| checking permissions | 0.000007 |
| Opening tables | 0.001698 |
| System lock | 0.000043 |
| creating table | 0.007260 |
| After create | 0.000004 |
| query end | 0.000004 |
| closing tables | 0.000015 |
| freeing items | 0.000031 |
| logging slow query | 0.000002 |
| cleaning up | 0.000003 |
+----------------------+----------+
11 rows in set (0.00 sec)
If you read the profiling documentation, SHOW PROFILE has other flags (CPU, BLOCK IO, etc.) that might help you dig into the 'creating table' stage.
I got this answer from here

Related

Recommended MySQL INDEX for storing domain names

I'm trying to store about 100 Million domain names in a MySQL database, but I can't figure out the right INDEX method to use on the domain names.
The issue being that LIKE queries will also be executed:
SELECT id FROM domains WHERE domain LIKE '%.example.com'
or
SELECT id FROM domains WHERE domain LIKE 'example.%'
If it makes things easier, '%example%' is not a requirement, just a nice-to-have.
What would be the proper index to use? Left-to-right matching (example.%) should be relatively straightforward, but right-to-left (%.example.com) is problematic, and it is the most common query.
I'm using MariaDB 10.3 on Linux, with the DB running on a PCIe SSD. Lookup times longer than 10 seconds should be considered "unacceptable".
You can add a persistent virtual (generated) column rdomain to your table that stores the domain name in reverse order, i.e. REVERSE(domain). That makes it possible to search from the start of the string: a search for '%.mydomain.com' becomes WHERE rdomain LIKE REVERSE('%.mydomain.com'), which turns the leading wildcard into a trailing one.
the table
CREATE TABLE `myreverse` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`domain` varchar(64) CHARACTER SET latin1 DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_domain` (`domain`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
add the column
ALTER TABLE myreverse
ADD COLUMN rdomain VARCHAR(64) AS (REVERSE(domain)),
ADD KEY idx_rdomain (rdomain);
insert some data
INSERT INTO `myreverse` (`id`, `domain`)
VALUES
(2, 'img.google.com'),
(3, 'w3.google.com'),
(1, 'www.coogle.com'),
(4, 'www.google.de'),
(5, 'www.mydomain.com');
see the data
mysql> SELECT * from myreverse;
+----+------------------+------------------+
| id | domain           | rdomain          |
+----+------------------+------------------+
|  1 | www.coogle.com   | moc.elgooc.www   |
|  2 | img.google.com   | moc.elgoog.gmi   |
|  3 | w3.google.com    | moc.elgoog.3w    |
|  4 | www.google.de    | ed.elgoog.www    |
|  5 | www.mydomain.com | moc.niamodym.www |
+----+------------------+------------------+
5 rows in set (0.01 sec)
mysql>
now you can query with reverse order and MySQL can use the index.
query
mysql> select * from myreverse WHERE rdomain like REVERSE('%.google.com');
+----+----------------+----------------+
| id | domain | rdomain |
+----+----------------+----------------+
| 3 | w3.google.com | moc.elgoog.3w |
| 2 | img.google.com | moc.elgoog.gmi |
+----+----------------+----------------+
2 rows in set (0.00 sec)
mysql>
Here you can see that the optimizer uses the index.
mysql> EXPLAIN select * from myreverse WHERE rdomain like REVERSE('%.google.com');
+----+-------------+-----------+------------+-------+---------------+-------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+-------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | myreverse | NULL | range | idx_rdomain | idx_rdomain | 195 | NULL | 2 | 100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+-------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
mysql>
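The whole reversed-column trick can be sketched end to end. The following uses SQLite from Python purely as a runnable stand-in (SQLite has no generated REVERSE() column, so the reversal is computed in the application; names mirror the example above):

```python
import sqlite3

# Sketch of the reversed-domain trick, with SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myreverse (id INTEGER PRIMARY KEY, domain TEXT, rdomain TEXT)")
conn.execute("CREATE INDEX idx_rdomain ON myreverse (rdomain)")

for d in ["img.google.com", "w3.google.com", "www.coogle.com",
          "www.google.de", "www.mydomain.com"]:
    # In MySQL, rdomain would be a generated column: REVERSE(domain).
    conn.execute("INSERT INTO myreverse (domain, rdomain) VALUES (?, ?)",
                 (d, d[::-1]))

# '%.google.com' reversed is 'moc.elgoog.%': a left-anchored prefix pattern,
# which a B-tree index on rdomain can satisfy.
prefix = ".google.com"[::-1] + "%"
hits = sorted(d for (d,) in conn.execute(
    "SELECT domain FROM myreverse WHERE rdomain LIKE ?", (prefix,)))
print(hits)  # ['img.google.com', 'w3.google.com']
```

The application-side `d[::-1]` is only needed because of the SQLite stand-in; with MariaDB's generated columns the reversal lives entirely in the schema, as the ALTER TABLE above shows.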
I'm not sure an index would help you here. If you can't change the database, your options seem limited. One thing you could do, if you're running a subdomain query and a domain query back to back, is to run the subdomain query first. That should reduce the number of rows the domain query has to cover.
It would definitely help to split the URL into separate subdomain and domain columns in the database, with indexes on both. Then you could query the subdomains only or the domains only, which should speed things up. And if there are a lot of repeating values, you should normalize those fields to remove the repetition and speed up queries even more.

Lock wait timeout exceeded with one query running

I run the following query and it is the only query running on my large (2 vCPU, 7.5 GB RAM, 100GB SSD) RDS hosted database.
DELETE
FROM books
WHERE book_type = '/type/edition'
AND json LIKE '%"languages":%'
AND json NOT LIKE '%/eng%';
But I get the following error.
Error Code: 1205. Lock wait timeout exceeded; try restarting transaction
I increased the timeout to 1200 seconds using SET innodb_lock_wait_timeout = 1200;.
However, I get that same error. There are no other queries running on the database, it's newly created and not in production. Here is the result of show processlist:
+----+----------+----------------------------------------------------------+-------------+---------+------+----------+------------------------------------------------------------------------------------------------------+
| Id | User     | Host                                                     | db          | Command | Time | State    | Info                                                                                                 |
+----+----------+----------------------------------------------------------+-------------+---------+------+----------+------------------------------------------------------------------------------------------------------+
|  1 | rdsadmin | localhost:37959                                          |             | Sleep   |   10 |          |                                                                                                      |
|  5 | website  | host109-156-119-150.range109-156.btcentralplus.com:57923 | openlibrary | Sleep   |  606 |          |                                                                                                      |
|  6 | website  | host109-156-119-150.range109-156.btcentralplus.com:57924 | openlibrary | Query   |  599 | updating | DELETE FROM books WHERE book_type = '/type/edition' AND json LIKE '%"languages":%' AND json NOT LIKE |
|  8 | website  | host109-156-119-150.range109-156.btcentralplus.com:58021 | openlibrary | Sleep   |  145 |          |                                                                                                      |
|  9 | website  | host109-156-119-150.range109-156.btcentralplus.com:58022 | openlibrary | Query   |    0 | init     | show processlist                                                                                     |
+----+----------+----------------------------------------------------------+-------------+---------+------+----------+------------------------------------------------------------------------------------------------------+
Here is the schema for this table.
CREATE TABLE `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`book_type` varchar(50) DEFAULT NULL,
`book_key` varchar(50) DEFAULT NULL,
`revision` tinyint(4) DEFAULT NULL,
`last_modified` varchar(50) DEFAULT NULL,
`json` text,
`date` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `book_type` (`book_type`),
KEY `book_key` (`book_key`),
KEY `revision` (`revision`)
) ENGINE=InnoDB AUTO_INCREMENT=97545025 DEFAULT CHARSET=utf8;
Please note, this table has about 100 million rows and contains 51GB of data.
Why am I getting a lock wait timeout? I thought this error could occur only when you are running multiple queries.
Well, since you have already tried most other things, maybe you could try creating an index on book_type and a prefix of json? That might help some (though note that an index cannot help the leading-wildcard LIKE conditions on json).
Otherwise, try splitting the delete operation into ranges,
e.g.
e.g.
DELETE
FROM books
WHERE
ID between 1 and 1000000
AND book_type = '/type/edition'
AND json LIKE '%"languages":%'
AND json NOT LIKE '%/eng%';
and then run it again for ID between 1000001 and 2000000, and so on.
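The range-splitting suggestion can be sketched as a loop. This is a runnable illustration with SQLite in Python (schema reduced, data invented); the point is that each batch commits separately, so no single statement scans and locks the whole table:

```python
import sqlite3

# Range-batched DELETE: each batch is its own short transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, book_type TEXT, json TEXT)")
conn.executemany(
    "INSERT INTO books (id, book_type, json) VALUES (?, ?, ?)",
    [(i, "/type/edition",
      '{"languages": "/fre"}' if i % 2 else '{"languages": "/eng"}')
     for i in range(1, 5001)],
)

BATCH = 1000
max_id = conn.execute("SELECT MAX(id) FROM books").fetchone()[0]
deleted = 0
for lo in range(1, max_id + 1, BATCH):
    cur = conn.execute(
        "DELETE FROM books WHERE id BETWEEN ? AND ? AND book_type = ? "
        "AND json LIKE ? AND json NOT LIKE ?",
        (lo, lo + BATCH - 1, "/type/edition", '%"languages":%', "%/eng%"),
    )
    conn.commit()  # committing per batch keeps each transaction small
    deleted += cur.rowcount
print(deleted)  # 2500 of the 5000 invented rows match
```

In MySQL the same loop would live in the application or a stored procedure, with each DELETE ... WHERE id BETWEEN ... as its own transaction.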
I have done the same thing. Here is a very simple example:
DELETE FROM `database`.`table` WHERE ((action="limit") AND (info='login') AND (creation < DATE_SUB(NOW(), INTERVAL 10 MINUTE)))
If you still face the error after that, then do the following:
You can set it to higher value in /etc/my.cnf permanently with this line
[mysqld]
innodb_lock_wait_timeout=120
and restart mysql. If you cannot restart mysql at this time, run this:
SET GLOBAL innodb_lock_wait_timeout = 120;
You could also set it just for the duration of your session:
SET innodb_lock_wait_timeout = 120;
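As a runnable aside, the "raise the per-session limit rather than the global one" pattern exists in other engines too; SQLite's analogous per-connection knob is PRAGMA busy_timeout (in milliseconds), shown here only to illustrate the session-scoped idea:

```python
import sqlite3

# Session-scoped wait limit, illustrated with SQLite's busy_timeout
# (a rough analogue of MySQL's per-session innodb_lock_wait_timeout).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA busy_timeout = 120000")  # 120 s, this connection only
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
print(timeout_ms)  # 120000
```

Like the SET (session) form in MySQL, this affects only the current connection and vanishes when it closes.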

dbf2mysql not inserting records

I am using the dbf2mysql library (http://manpages.ubuntu.com/manpages/natty/man1/dbf2mysql.1.html) to port some data to MySQL, but when I try to view the inserted records, nothing has been inserted.
Here is the command I am running:
$ dbf2mysql -vvv -q -h localhost -P password -U root smb/C_clist.DBF -d opera_dbf -t pricelists -c
Opening dbf-file smb/C_clist.DBF
dbf-file: smb/C_clist.DBF - Visual FoxPro w. DBC, MySQL-dbase: opera_dbf, MySQL-table: pricelists
Number of records: 12
Name Length Display Type
-------------------------------------
CL_CODE 8 0 C
CL_DESC 30 0 C
CL_CURR 3 0 C
CL_FCDEC 1 0 N
Making connection to MySQL-server
Dropping original table (if one exists)
Building CREATE-clause
Sending create-clause
CREATE TABLE pricelists (CL_CODE varchar(8) not null,
CL_DESC varchar(30) not null,
CL_CURR varchar(3) not null,
CL_FCDEC int not null)
fields in dbh 4, allocated mem for query 279, query size 139
Inserting records
Inserting record 0
LOAD DATA LOCAL INFILE '/tmp/d2mygo04TM' REPLACE INTO table pricelists fields terminated by ',' enclosed by ''''
Closing up....
Then, in MySQL, the tables are created with the correct field types, but with no data:
mysql> use opera_dbf;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> describe pricelists;
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| CL_CODE | varchar(8) | NO | | NULL | |
| CL_DESC | varchar(30) | NO | | NULL | |
| CL_CURR | varchar(3) | NO | | NULL | |
| CL_FCDEC | int(11) | NO | | NULL | |
+----------+-------------+------+-----+---------+-------+
4 rows in set (0.13 sec)
mysql> select * from pricelists;
Empty set (0.00 sec)
mysql>
What am I missing?
I removed the -q option and it works. From the man page:
-q  dbf2mysql: "Quick" mode. Inserts data via a temporary file using
    the 'LOAD DATA INFILE' MySQL statement. This increased insertion
    speed on my PC 2-2.5 times. Also note that during the whole 'LOAD
    DATA' the affected table is locked.
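The man-page excerpt describes a two-step flow: stage rows to a temporary delimited file, then bulk-load the file in a single statement. A rough sketch of that flow, using SQLite in Python as a stand-in (SQLite has no LOAD DATA INFILE, so the load step is an executemany; table and data are illustrative, mirroring the pricelists example):

```python
import csv
import os
import sqlite3
import tempfile

# Invented sample records shaped like the C_clist.DBF columns above.
records = [
    ("BASE", "Base price list", "GBP", "2"),
    ("EXPORT", "Export price list", "USD", "2"),
]

# Step 1: stage rows to a temp CSV file (dbf2mysql's /tmp/d2my... file).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows(records)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pricelists (
    CL_CODE TEXT NOT NULL, CL_DESC TEXT NOT NULL,
    CL_CURR TEXT NOT NULL, CL_FCDEC INT NOT NULL)""")

# Step 2: bulk-load the staged file in one pass.
with open(path, newline="") as f:
    conn.executemany("INSERT INTO pricelists VALUES (?, ?, ?, ?)", csv.reader(f))
os.remove(path)

count = conn.execute("SELECT COUNT(*) FROM pricelists").fetchone()[0]
print(count)  # 2
```

The staging file is the moving part that can fail quietly in quick mode, which is presumably why dropping -q (plain row-by-row INSERTs) made the data appear.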

Need help understanding how mysql indexes work

I have a table that looks like this:
CREATE TABLE `metric` (
`metricid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`host` varchar(50) NOT NULL,
`userid` int(10) unsigned DEFAULT NULL,
`lastmetricvalue` double DEFAULT NULL,
`receivedat` int(10) unsigned DEFAULT NULL,
`name` varchar(255) NOT NULL,
`sampleid` tinyint(3) unsigned NOT NULL,
`type` tinyint(3) unsigned NOT NULL DEFAULT '0',
`lastrawvalue` double NOT NULL,
`priority` tinyint(3) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`metricid`),
UNIQUE KEY `unique-metric` (`userid`,`host`,`name`,`sampleid`)
) ENGINE=InnoDB AUTO_INCREMENT=1000000221496 DEFAULT CHARSET=utf8
It has 177,892 rows at the moment, and when I run the following query:
select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE m.userid = 8
AND (host, name, sampleid) IN (('localhost','0.4350799184758216cpu-3/cpu-nice',0),
('localhost','0.4350799184758216cpu-3/cpu-system',0),
('localhost','0.4350799184758216cpu-3/cpu-idle',0),
('localhost','0.4350799184758216cpu-3/cpu-wait',0),
('localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
('localhost','0.4350799184758216cpu-3/cpu-softirq',0),
('localhost','0.4350799184758216cpu-3/cpu-steal',0),
('localhost','0.4350799184758216cpu-4/cpu-user',0),
('localhost','0.4350799184758216cpu-4/cpu-nice',0),
('localhost','0.4350799184758216cpu-4/cpu-system',0),
('localhost','0.4350799184758216cpu-4/cpu-idle',0),
('localhost','0.4350799184758216cpu-4/cpu-wait',0),
('localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
('localhost','0.4350799184758216cpu-4/cpu-softirq',0),
('localhost','0.4350799184758216cpu-4/cpu-steal',0),
('localhost','_util/billing-bytes',0),('localhost','_util/billing-metrics',0));
it takes 0.87 seconds to return results, explain is:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ref
possible_keys: unique-metric
key: unique-metric
key_len: 5
ref: const
rows: 85560
Extra: Using where
1 row in set (0.00 sec)
profile looks like this:
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000160 |
| checking permissions | 0.000010 |
| Opening tables | 0.000021 |
| exit open_tables() | 0.000008 |
| System lock | 0.000008 |
| mysql_lock_tables(): unlocking | 0.000005 |
| exit mysqld_lock_tables() | 0.000007 |
| init | 0.000068 |
| optimizing | 0.000018 |
| statistics | 0.000091 |
| preparing | 0.000042 |
| executing | 0.000005 |
| Sending data | 0.870180 |
| innobase_commit_low():trx_comm | 0.000012 |
| Sending data | 0.000111 |
| end | 0.000009 |
| query end | 0.000009 |
| ha_commit_one_phase(-1) | 0.000015 |
| innobase_commit_low():trx_comm | 0.000004 |
| ha_commit_one_phase(-1) | 0.000005 |
| query end | 0.000005 |
| closing tables | 0.000012 |
| freeing items | 0.000562 |
| logging slow query | 0.000005 |
| cleaning up | 0.000005 |
| sleeping | 0.000006 |
+--------------------------------+----------+
Which seems way too high to me. I've tried replacing the userid = 8 AND (host, name, sampleid) IN part of the first query with (userid, host, name, sampleid) IN, and that query runs in about 0.5s, almost twice as fast. For reference, here's the query:
select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (userid, host, name, sampleid) IN ((8,'localhost','0.4350799184758216cpu-3/cpu-nice',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-system',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-idle',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-wait',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-softirq',0),
(8,'localhost','0.4350799184758216cpu-3/cpu-steal',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-user',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-nice',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-system',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-idle',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-wait',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-softirq',0),
(8,'localhost','0.4350799184758216cpu-4/cpu-steal',0),
(8,'localhost','_util/billing-bytes',0),
(8,'localhost','_util/billing-metrics',0));
its explain looks like this:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: m
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 171121
Extra: Using where
1 row in set (0.00 sec)
Next I've updated the table to contain a single joined column:
alter table `metric` add `forindex` varchar(120) not null default '';
update metric set forindex = concat(userid,`host`,`name`,sampleid);
alter table metric add index `forindex` (`forindex`);
Updated the query to have only 1 string searched:
select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (forindex) IN (('8localhost0.4350799184758216cpu-3/cpu-nice0'),
('8localhost0.4350799184758216cpu-3/cpu-system0'),
('8localhost0.4350799184758216cpu-3/cpu-idle0'),
('8localhost0.4350799184758216cpu-3/cpu-wait0'),
('8localhost0.4350799184758216cpu-3/cpu-interrupt0'),
('8localhost0.4350799184758216cpu-3/cpu-softirq0'),
('8localhost0.4350799184758216cpu-3/cpu-steal0'),
('8localhost0.4350799184758216cpu-4/cpu-user0'),
('8localhost0.4350799184758216cpu-4/cpu-nice0'),
('8localhost0.4350799184758216cpu-4/cpu-system0'),
('8localhost0.4350799184758216cpu-4/cpu-idle0'),
('8localhost0.4350799184758216cpu-4/cpu-wait0'),
('8localhost0.4350799184758216cpu-4/cpu-interrupt0'),
('8localhost0.4350799184758216cpu-4/cpu-softirq0'),
('8localhost0.4350799184758216cpu-4/cpu-steal0'),
('8localhost_util/billing-bytes0'),
('8localhost_util/billing-metrics0'));
And now I get the same results in 0.00 sec! Explain is:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: m
type: range
possible_keys: forindex
key: forindex
key_len: 362
ref: NULL
rows: 17
Extra: Using where
1 row in set (0.00 sec)
So to summarize, here are the results:
m.userid = X AND (host, name, sampleid) IN - index used, 85560 rows scanned, runs in 0.9s
(userid, host, name, sampleid) IN - index not used, 171121 rows scanned, runs in 0.5s
additional column with compound index replaced with an index over a concatenated utility column - index used, 17 rows scanned, runs in 0s
Why does second query run faster than the first? And why is the third query so much faster than the rest? Should I keep such a column for the sole purpose of faster searching?
MySQL version is:
mysqld Ver 5.5.34-55 for Linux on x86_64 (Percona XtraDB Cluster (GPL), wsrep_25.9.r3928)
Indexes help your search terms in the WHERE clause by narrowing down the search as much as possible. You can see this happening...
The rows field of EXPLAIN gives an estimate of how many rows the query will have to examine to find the rows that match your query. By comparing the rows reported in each EXPLAIN, you can see how much better your better-optimized query is:
rows: 85560 -- first query
rows: 171121 -- second query examines 2x more rows, but it was probably
-- faster because the data was buffered after the first query
rows: 17 -- third query examines 5,000x fewer rows than first query
You would also notice in the SHOW PROFILE details if you ran that for the third query that "Sending data" is a lot faster for the quicker query. This process state indicates how long it took to copy rows from the storage engine up to the SQL layer of MySQL. Even when doing memory-to-memory copying, this takes a while for so many thousands of rows. This is why indexes are so beneficial.
For a more detailed explanation, see my presentation How to Design Indexes, Really.
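On MySQL versions that cannot use an index for row-constructor IN lists (as the second EXPLAIN's type: ALL shows), a workaround short of adding a concatenated column is to expand the tuples into per-row equality probes, each of which can use the full composite key. A sketch of that rewrite using SQLite in Python (data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE metric (
    metricid INTEGER PRIMARY KEY, userid INT, host TEXT,
    name TEXT, sampleid INT)""")
conn.execute("CREATE UNIQUE INDEX uq_metric ON metric (userid, host, name, sampleid)")
conn.executemany(
    "INSERT INTO metric (userid, host, name, sampleid) VALUES (?, ?, ?, ?)",
    [(8, "localhost", f"cpu-{c}/cpu-{s}", 0)
     for c in (3, 4) for s in ("user", "nice", "system", "idle")],
)

wanted = [(8, "localhost", "cpu-3/cpu-nice", 0),
          (8, "localhost", "cpu-4/cpu-idle", 0)]

# One fully indexed equality probe per tuple, glued together with UNION ALL.
probe = ("SELECT metricid, name FROM metric "
         "WHERE userid = ? AND host = ? AND name = ? AND sampleid = ?")
sql = " UNION ALL ".join([probe] * len(wanted))
params = [v for tup in wanted for v in tup]
names = sorted(name for (_, name) in conn.execute(sql, params))
print(names)  # ['cpu-3/cpu-nice', 'cpu-4/cpu-idle']
```

Each branch supplies constants for every column of the composite index, so the optimizer can seek rather than scan; generating the UNION ALL from the tuple list keeps the application code simple.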

SELECTing non-indexed column increases 'sending data' 25x - why and how to improve?

Given this table on local MySQL instance 5.1 with query caching off:
show create table product_views\G
*************************** 1. row ***************************
Table: product_views
Create Table: CREATE TABLE `product_views` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`dateCreated` datetime NOT NULL,
`dateModified` datetime DEFAULT NULL,
`hibernateVersion` bigint(20) DEFAULT NULL,
`brandName` varchar(255) DEFAULT NULL,
`mfrModel` varchar(255) DEFAULT NULL,
`origin` varchar(255) NOT NULL,
`price` float DEFAULT NULL,
`productType` varchar(255) DEFAULT NULL,
`rebateDetailsViewed` tinyint(1) NOT NULL,
`rebateSearchZipCode` int(11) DEFAULT NULL,
`rebatesFoundAmount` float DEFAULT NULL,
`rebatesFoundCount` int(11) DEFAULT NULL,
`siteSKU` varchar(255) DEFAULT NULL,
`timestamp` datetime NOT NULL,
`uiContext` varchar(255) DEFAULT NULL,
`siteVisitId` bigint(20) NOT NULL,
`efficiencyLevel` varchar(255) DEFAULT NULL,
`siteName` varchar(255) DEFAULT NULL,
`clicks` varchar(1024) DEFAULT NULL,
`rebateFormDownloaded` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `siteVisitId` (`siteVisitId`,`siteSKU`),
KEY `FK52C29B1E3CAB9CC4` (`siteVisitId`),
KEY `rebateSearchZipCode_idx` (`rebateSearchZipCode`),
KEY `FIND_UNPROCESSED_IDX` (`siteSKU`,`siteVisitId`,`timestamp`),
CONSTRAINT `FK52C29B1E3CAB9CC4` FOREIGN KEY (`siteVisitId`) REFERENCES `site_visits` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=32909504 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
This query takes ~3s:
SELECT pv.id, pv.siteSKU
FROM product_views pv
CROSS JOIN site_visits sv
WHERE pv.siteVisitId = sv.id
AND pv.siteSKU = 'foo'
AND sv.siteId = 'bar'
AND sv.postProcessed = 1
AND pv.timestamp >= '2011-05-19 00:00:00'
AND pv.timestamp < '2011-06-18 00:00:00';
But this one (non-indexed column added to SELECT) takes ~65s:
SELECT pv.id, pv.siteSKU, pv.hibernateVersion
FROM product_views pv
CROSS JOIN site_visits sv
WHERE pv.siteVisitId = sv.id
AND pv.siteSKU = 'foo'
AND sv.siteId = 'bar'
AND sv.postProcessed = 1
AND pv.timestamp >= '2011-05-19 00:00:00'
AND pv.timestamp < '2011-06-18 00:00:00';
Nothing in 'where' or 'from' clauses is different. All the extra time is spent in 'sending data':
mysql> show profile for query 1;
+--------------------+-----------+
| Status | Duration |
+--------------------+-----------+
| starting | 0.000155 |
| Opening tables | 0.000029 |
| System lock | 0.000007 |
| Table lock | 0.000019 |
| init | 0.000072 |
| optimizing | 0.000032 |
| statistics | 0.000316 |
| preparing | 0.000034 |
| executing | 0.000002 |
| Sending data | 63.530402 |
| end | 0.000044 |
| query end | 0.000005 |
| freeing items | 0.000091 |
| logging slow query | 0.000002 |
| logging slow query | 0.000109 |
| cleaning up | 0.000004 |
+--------------------+-----------+
16 rows in set (0.00 sec)
I understand that using a non-indexed column in the WHERE clause would slow things down, but why here? What can be done to improve the latter case, given that I will actually want to SELECT * from product_views?
EXPLAIN Output
explain extended select pv.id, pv.siteSKU from product_views pv cross join site_visits sv where pv.siteVisitId=sv.id and pv.siteSKU='foo' and sv.siteId='bar' and sv.postProcessed=1 and pv.timestamp>='2011-05-19 00:00:00' and pv.timestamp<'2011-06-18 00:00:00';
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+--------------------------+
| id | select_type | table | type   | possible_keys                                       | key                  | key_len | ref                  | rows  | filtered | Extra                    |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+--------------------------+
|  1 | SIMPLE      | pv    | ref    | siteVisitId,FK52C29B1E3CAB9CC4,FIND_UNPROCESSED_IDX | FIND_UNPROCESSED_IDX | 258     | const                | 41810 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | sv    | eq_ref | PRIMARY,post_processed_idx                          | PRIMARY              | 8       | clabs.pv.siteVisitId |     1 |   100.00 | Using where              |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> explain extended select pv.id, pv.siteSKU, pv.hibernateVersion from product_views pv cross join site_visits sv where pv.siteVisitId=sv.id and pv.siteSKU='foo' and sv.siteId='bar' and sv.postProcessed=1 and pv.timestamp>='2011-05-19 00:00:00' and pv.timestamp<'2011-06-18 00:00:00';
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+-------------+
| id | select_type | table | type   | possible_keys                                       | key                  | key_len | ref                  | rows  | filtered | Extra       |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+-------------+
|  1 | SIMPLE      | pv    | ref    | siteVisitId,FK52C29B1E3CAB9CC4,FIND_UNPROCESSED_IDX | FIND_UNPROCESSED_IDX | 258     | const                | 41810 |   100.00 | Using where |
|  1 | SIMPLE      | sv    | eq_ref | PRIMARY,post_processed_idx                          | PRIMARY              | 8       | clabs.pv.siteVisitId |     1 |   100.00 | Using where |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
UPDATE1: Splitting into 2 queries brings the total time down to the ~30s range
Not sure why, but splitting the latter query into the following two reduces latency from 65s to ~30s:
1) SELECT pv.id .... //from, where clauses same as above
2) SELECT * FROM product_views where id in (idList); //idList
UPDATE2: TABLE SIZE
table has on the order of 10M rows
query returns about 3k rows
When you select only indexed columns, MySQL reads just the index and does not need to read the table data at all. This is, as far as I remember, called an index-covered query. However, when the select list includes columns that are not present in the index used, MySQL has to open the table and read the row data as well. That is the reason index-covered queries are so much faster.
See Using Covering Indexes to Improve Query Performance.
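The covered-vs-uncovered distinction is easy to see in a query plan. This sketch uses SQLite's EXPLAIN QUERY PLAN from Python as a stand-in (MySQL signals the same thing as "Using index" in EXPLAIN's Extra column, visible in the two EXPLAIN outputs above); schema and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product_views (
    id INTEGER PRIMARY KEY, siteSKU TEXT,
    siteVisitId INTEGER, hibernateVersion INTEGER)""")
conn.execute("CREATE INDEX sku_visit_idx ON product_views (siteSKU, siteVisitId)")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is the plan detail text.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

# id is the rowid, which every SQLite index stores implicitly, so this
# query is answerable from the index alone ("covering").
covered = plan("SELECT id, siteVisitId FROM product_views WHERE siteSKU = 'foo'")
# hibernateVersion is not in the index, so the table rows must be read too.
uncovered = plan("SELECT id, hibernateVersion FROM product_views WHERE siteSKU = 'foo'")
print(covered)    # ... USING COVERING INDEX sku_visit_idx ...
print(uncovered)  # ... USING INDEX sku_visit_idx ... (no "COVERING")
```

Adding one non-indexed column to the select list is exactly what flips the plan from the first form to the second, which matches the 3s vs 65s behaviour in the question.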
As for the improvement, how many rows are in the table, how much the query returns and what is your buffer pool size, how much RAM is available, etc.?
From what I have read about SHOW PROFILE, 'Sending data' covers a portion of the execution process and has almost nothing to do with sending actual data to the client. You can take a look at this thread.
Also, the MySQL docs say this about "Sending data":
The thread is reading and processing rows for a SELECT statement, and sending data to the client. Because operations occurring during this state tend to perform large amounts of disk access (reads), it is often the longest-running state over the lifetime of a given query.
In my opinion, MySQL would do better not to lump "reading and processing rows for a SELECT statement" together with "sending data" in one state, especially one named "Sending data", which causes a lot of confusion.
I don't know MySQL internals at all, but Darhazer's explanation looks like the winner to me. When the non-indexed field is added, the entire row must be retrieved, and your rows are very wide. I can't quite tell from the names how (if at all) the table is denormalized, but I suspect it is: siteName and siteSKU smell like they belong in a site lookup table with an FK, and rebatesFoundAmount and rebatesFoundCount sound like statistics that should come from a join to a separate product-rebate table, etc.