I have a number of reports involving joins on large datasets, and the underlying tables are written to many times per second. My cron jobs run the queries at the least impactful times, but I am still concerned about harming performance by locking the tables.
Here is a simple example they requested as a one-off today. It shows playtimes for an RIAA report:
SELECT
date_format(p.`played`, '%Y-%m') as `month`,
SUM(TIME_TO_SEC(s.`length`))/3600 as `playtime`
INTO OUTFILE "/tmp/120313_playtime.csv"
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
FROM
`plays` p,
`songs` s
GROUP BY `month`
How do I construct this to avoid causing issues for the radio app writing to the plays table while the query is running? Should I create temp tables and copy the live ones over?
// EDIT: EXPLAIN output added per request
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
|  1 | SIMPLE      | s     | ALL  | NULL          | NULL | NULL    | NULL |    3909 | Using temporary; Using filesort |
|  1 | SIMPLE      | p     | ALL  | NULL          | NULL | NULL    | NULL | 4040933 | Using join buffer               |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
CREATE TABLE `plays` (
`play_id` int(11) NOT NULL auto_increment,
`song_id` int(11) NOT NULL,
`played` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
PRIMARY KEY (`play_id`)
) ENGINE=MyISAM AUTO_INCREMENT=4040992 DEFAULT CHARSET=latin1 COMMENT='play counts for songs';
CREATE TABLE `songs` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(255) NOT NULL,
`artist_id` int(11) NOT NULL,
`length` time NOT NULL,
`album_id` int(11) NOT NULL,
`active` tinyint(4) NOT NULL,
`tracknum` varchar(16) NOT NULL,
`bitrate` varchar(32) NOT NULL,
`date_created` datetime NOT NULL,
`date_modified` timestamp NOT NULL default '0000-00-00 00:00:00' on update CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4136 DEFAULT CHARSET=latin1;
Just two immediate things come to mind. One: there is no JOIN condition between plays and songs, which will result in a Cartesian product. Two: add a WHERE clause; since the "played" column is a date/time, you can query for all records < NOW(), so any rows added while the query is running are excluded. Since it appears you are reporting monthly, you might even create a separate table that is nothing but the running totals per time period, grouped by month and year; then you don't have to worry about a super-long query and can just run it for the current month in question, still < NOW().
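Putting both fixes together, a corrected version might look like this (the join condition uses plays.song_id = songs.id, per the schemas above):
SELECT
    DATE_FORMAT(p.`played`, '%Y-%m') AS `month`,
    SUM(TIME_TO_SEC(s.`length`)) / 3600 AS `playtime`
INTO OUTFILE '/tmp/120313_playtime.csv'
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
FROM `plays` p
INNER JOIN `songs` s ON s.`id` = p.`song_id` -- the missing join condition
WHERE p.`played` < NOW() -- rows inserted while the query runs are excluded
GROUP BY `month`;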
CREATE TABLE `new_table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`firstname` varchar(45) NOT NULL,
`lastname` varchar(45) NOT NULL,
`age` int(10) unsigned NOT NULL,
`index_id` int(10) unsigned NOT NULL,
`field1` varchar(45) NOT NULL,
`field2` int(10) unsigned NOT NULL,
`ip` varchar(15) NOT NULL,
`port` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `unq` (`ip`,`port`),
KEY `idx` (`index_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6472004 DEFAULT
This is the table that the query selects from. I run a program that inserts rows in an infinite loop; each iteration runs an INSERT of 2000 rows into the table. id is not provided in the insert query, so it is auto-incremented, and ip and index_id (from 0 - 20000000) are randomly generated. While the program is running, I run a query to select some rows, but the query is stuck for at least 10+ seconds, and sometimes it is stuck forever.
select * from new_table where index_id > 2131223 limit 100;
These are the insert queries and a select query. The table is InnoDB with the REPEATABLE READ isolation level, so it should not be a lock problem at all. I'm curious what is happening to the select query. If I sleep one second per loop iteration to slow down the inserts, the select query is fine, so I think the problem might be the priority between writing and reading?
Let me know if you have an answer to this problem!
Here is the EXPLAIN for the select query:
mysql> explain select * from new_table where index_id > 2131223 limit 100;
+----+-------------+-----------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
| id | select_type | table     | partitions | type  | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                            |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
|  1 | SIMPLE      | new_table | NULL       | range | idx           | idx  | 4       | NULL | 2430541 |   100.00 | Using index condition; Using MRR |
+----+-------------+-----------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)
Update:
The problem is solved if I turn off MRR, so I think it is an interaction between MRR and bulk insertions.
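For reference, MRR can be switched off for a session through the optimizer_switch system variable, which makes this easy to test:
-- disable Multi-Range Read for the current session, then retry the select
SET optimizer_switch = 'mrr=off';
SELECT * FROM new_table WHERE index_id > 2131223 LIMIT 100;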
I have two tables, events and event_params.
The first table stores the events with these columns:
events | CREATE TABLE `events` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`project` varchar(24) NOT NULL,
`event` varchar(24) NOT NULL,
`date` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `project` (`project`,`event`)
) ENGINE=InnoDB AUTO_INCREMENT=2915335 DEFAULT CHARSET=latin1
and the second stores parameters for each event with these columns:
event_params | CREATE TABLE `event_params` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`event_id` int(10) unsigned NOT NULL,
`name` varchar(24) NOT NULL,
`value` varchar(524) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `event_id` (`event_id`),
KEY `value` (`value`)
) ENGINE=InnoDB AUTO_INCREMENT=20789391 DEFAULT CHARSET=latin1
Now I want to get the count of events for each distinct value of a specified parameter.
I wrote this query for the campaign parameter, but it is too slow (15 seconds to respond):
SELECT
event_params.value as campaign,
count(*) as count
FROM `events`
left join event_params on event_params.event_id = events.id
and event_params.name = 'campaign'
WHERE events.project = 'foo'
GROUP by event_params.value
and here is the EXPLAIN query result:
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
| id | select_type | table        | partitions | type | possible_keys       | key      | key_len | ref              | rows | filtered | Extra                                        |
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | events       | NULL       | ref  | project             | project  | 26      | const            |    1 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | event_params | NULL       | ref  | name,event_id,value | event_id | 4       | events.events.id |    4 |   100.00 | Using where                                  |
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
Can I speed up this query?
You may try adding the following index on the event_params table, which might speed up the join:
CREATE INDEX idx1 ON event_params (event_id, name, value);
The aggregation step probably can't be optimized much because the COUNT operation involves counting each record.
Move the "campaign value" into the main table, with a suitable length for VARCHAR and then
SELECT
campaign,
count(*) as count
FROM `events`
WHERE project = 'foo'
GROUP by campaign
And have
INDEX(project, campaign)
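A sketch of that schema change (the VARCHAR length and index name here are illustrative guesses; size the column to your actual campaign values):
ALTER TABLE events
    ADD COLUMN campaign VARCHAR(64) NOT NULL DEFAULT '',
    ADD INDEX project_campaign (project, campaign);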
A bit of advice when tempted to use EAV: Move the 'important' values into the main table; leave only the rarely used or rarely set 'values' in the other table. Also (assuming there are no dups), have
PRIMARY KEY(event_id, name)
More discussion: http://mysql.rjweb.org/doc.php/eav
I've been trying to wrap my head around this for a good while, but had no luck. I have a simple queue system implemented on my small site and a cron job to check if there are any items in the queue. It's supposed to fetch several items ordered by priority and process them, but for some reason the priority index gets ignored. My create table syntax is
CREATE TABLE `site_queue` (
`row_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`task` tinyint(3) unsigned NOT NULL COMMENT '0 - email',
`priority` int(10) unsigned DEFAULT NULL,
`commands` text NOT NULL,
`added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`row_id`),
KEY `task` (`task`),
KEY `priority` (`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
The query to fetch queued items is
SELECT `row_id`, `task`, `commands` FROM `site_queue` ORDER BY `priority` DESC LIMIT 5;
The EXPLAIN query returns the following:
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | site_queue | ALL  | NULL          | NULL | NULL    | NULL | 1269 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
Can anyone offer some insight on what might be causing this?
Because when there are only a few rows (originally 4, later increased to ~1k), there is no reason to use the index, since that would be slower (MySQL would have to read both index and data pages too many times).
So the rule of thumb for MySQL query optimization: test with a reasonably large amount of data, ideally comparable in size to real production data.
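In the meantime, if you want to verify that the index is usable at all, MySQL's FORCE INDEX hint will override the optimizer's choice (purely a diagnostic here; with production-sized data the optimizer should pick the index on its own):
SELECT `row_id`, `task`, `commands`
FROM `site_queue` FORCE INDEX (`priority`)
ORDER BY `priority` DESC LIMIT 5;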
mysql> explain
select c.userEmail,f.customerId
from comments c
inner join flows f
on (f.id = c.typeId)
inner join users u
on (u.email = c.userEmail)
where c.addTime >= 1372617000
and c.addTime <= 1374776940
and c.type = 'flow'
and c.automated = 0;
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| id | select_type | table | type   | possible_keys                          | key        | key_len | ref                 | rows   | Extra       |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
|  1 | SIMPLE      | f     | index  | PRIMARY                                | customerId | 4       | NULL                | 144443 | Using index |
|  1 | SIMPLE      | c     | ref    | userEmail_idx,addTime,automated,typeId | typeId     | 198     | f.id,const          |      1 | Using where |
|  1 | SIMPLE      | u     | eq_ref | email                                  | email      | 386     | c.userEmail         |      1 | Using index |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
How do I make the above query faster - it constantly shows up in the slow query logs.
Indexes present:
id is the auto-incremented primary key of the flows table.
customerId of flows table.
userEmail of comments table.
composite index (typeId,type) on comments table.
email of users table (unique)
automated of comments table.
addTime of comments table.
Number of rows:
1. flows - 150k
2. comments - 500k (half of them have automated = 1 and the others have automated = 0; also, the value of type is 'flow' for all rows except 500)
3. users - 50
Table schemas:
users | CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8
comments | CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userEmail` varchar(128) DEFAULT NULL,
`content` mediumtext NOT NULL,
`addTime` int(11) NOT NULL,
`typeId` int(11) NOT NULL,
`automated` tinyint(4) NOT NULL,
`type` varchar(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `userEmail_idx` (`userEmail`),
KEY `addTime` (`addTime`),
KEY `automated` (`automated`),
KEY `typeId` (`typeId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=572410 DEFAULT CHARSET=utf8 |
flows | CREATE TABLE `flows` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(32) NOT NULL,
`status` varchar(128) NOT NULL,
`customerId` int(11) NOT NULL,
`createTime` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `flowType_idx` (`type`),
KEY `customerId` (`customerId`),
KEY `status` (`status`),
KEY `createTime` (`createTime`)
) ENGINE=InnoDB AUTO_INCREMENT=134127 DEFAULT CHARSET=utf8 |
You have the required indexes to perform the joins efficiently. However, it looks like MySQL is joining the tables in a less efficient manner. The EXPLAIN output shows that it is doing a full index scan of the flows table then joining the comments table.
It will probably be more efficient to read the comments table first before joining. That is, in the order you have specified in your query so that the comment set is restricted by the predicates you have supplied (probably what you intended).
Running OPTIMIZE TABLE or ANALYZE TABLE can improve the decisions the query optimizer makes, particularly on tables that have had extensive changes.
If the query optimizer still gets it wrong, you can force the tables to be read in the order you specify in the query by beginning your statement with SELECT STRAIGHT_JOIN or by changing the INNER JOINs to STRAIGHT_JOIN.
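For example, applied to the query above (the written order, comments first, is then enforced):
SELECT STRAIGHT_JOIN c.userEmail, f.customerId
FROM comments c
INNER JOIN flows f ON (f.id = c.typeId)
INNER JOIN users u ON (u.email = c.userEmail)
WHERE c.addTime >= 1372617000
  AND c.addTime <= 1374776940
  AND c.type = 'flow'
  AND c.automated = 0;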
I have a table with about 30 million records which I need to perform queries upon. From my reading, I thought that a composite index using leftmost prefixing with all the fields I need to select would be the correct way to do it, but when I run an explain on the query, it's not even using the index.
This is the query:
select distinct email FROM my_table
WHERE `customer_id` IN(278,428,186,40,208,247,59,79,376,73,38,52,68,227)
AND `company_id` = 4
AND `active` = 1
AND `date` > '2012-04-15';
The explain looks like this
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key   | key_len | ref  | rows     | Extra       |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
|  1 | SIMPLE      | emails | index | customer_id   | email | 772     | NULL | 29296705 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
These are the fields
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL DEFAULT '',
`customer_id` int(10) unsigned DEFAULT NULL,
`company_id` int(10) unsigned NOT NULL,
`active` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date` date DEFAULT NULL
Indexes look like this
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`,`customer_id`),
KEY `customer_id` (`customer_id`,`company_id`,`active`,`date`)
I'm not quite sure what the best way to optimize this is.
MySQL is often fussy about IN on the left side of the index. Try one query for each customer_id and see if that uses your index; you can use the UNION syntax to join them together. The other possibility is that MySQL figures it's faster to sift through everything for 10% of the rows than to try to use indexes for them.
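A sketch of the per-customer_id rewrite, with only the first two IDs shown (UNION without ALL de-duplicates, so it also replaces the DISTINCT):
SELECT email FROM my_table
WHERE customer_id = 278 AND company_id = 4 AND active = 1 AND `date` > '2012-04-15'
UNION
SELECT email FROM my_table
WHERE customer_id = 428 AND company_id = 4 AND active = 1 AND `date` > '2012-04-15';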