Optimizing Slow, Indexed Select MySQL Query

I am trying to execute a simple select query using a table indexed on src_ip like so:
SELECT * FROM netflow_nov2 WHERE src_IP=3111950672;
However, this does not complete even after 4 or 5 hours. I need the response time to be in the range of a few seconds, and I am wondering how I can optimize the query to make that happen.
Also note that the source IPs were converted to integers using the built-in SQL function.
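For reference (the question doesn't name the exact function), the usual MySQL built-ins for this conversion are INET_ATON and INET_NTOA:

```sql
-- INET_ATON maps a dotted-quad string to the unsigned 32-bit integer
-- form used in the query above; INET_NTOA reverses it.
SELECT INET_ATON('185.124.153.80');  -- 3111950672
SELECT INET_NTOA(3111950672);        -- '185.124.153.80'
```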
Other information about the table:
The table contains netflow data parsed from nfdump. I am using the table to get information about specific IP addresses. In other words, basically only queries like the above will be used.
Here is the relevant info as given by SHOW TABLE STATUS for this table:
Rows: 4,205,602,143 (4 billion)
Data Length: 426,564,911,104 (426 GB)
Index Length: 57,283,706,880 (57 GB)
Information about the system:
Hard disk: ~2TB, using close to maximum
RAM: 64GB
my.cnf file:
see gist: https://gist.github.com/ashtonwebster/e0af038101e1b42ca7e3
Table structure:
mysql> DESCRIBE netflow_nov2;
+-----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+-------+
| date | datetime | YES | MUL | NULL | |
| duration | float | YES | | NULL | |
| protocol | varchar(16) | YES | | NULL | |
| src_IP | int(10) unsigned | YES | MUL | NULL | |
| src_port | int(2) | YES | | NULL | |
| dest_IP | int(10) unsigned | YES | MUL | NULL | |
| dest_port | int(2) | YES | | NULL | |
| flags | varchar(8) | YES | | NULL | |
| Tos | int(4) | YES | | NULL | |
| packets | int(8) | YES | | NULL | |
| bytes | int(8) | YES | | NULL | |
| pps | int(8) | YES | | NULL | |
| bps | int(8) | YES | | NULL | |
| Bpp | int(8) | YES | | NULL | |
| Flows | int(8) | YES | | NULL | |
+-----------+------------------+------+-----+---------+-------+
15 rows in set (0.02 sec)
I have additional info about the indexes and the results of EXPLAIN, but briefly:
-The indexes are B-trees, and there are indexes on date, src_IP, and dest_IP, but only src_IP will really be used
-Based on the output of EXPLAIN, the src_IP index is being used for the query mentioned at the top
And the output of mysqltuner:
see gist: https://gist.github.com/ashtonwebster/cbfd98ee1799a7f6b323
SHOW CREATE TABLE output:
| netflow_nov2 | CREATE TABLE `netflow_nov2` (
`date` datetime DEFAULT NULL,
`duration` float DEFAULT NULL,
`protocol` varchar(16) DEFAULT NULL,
`src_IP` int(10) unsigned DEFAULT NULL,
`src_port` int(2) DEFAULT NULL,
`dest_IP` int(10) unsigned DEFAULT NULL,
`dest_port` int(2) DEFAULT NULL,
`flags` varchar(8) DEFAULT NULL,
`Tos` int(4) DEFAULT NULL,
`packets` int(8) DEFAULT NULL,
`bytes` int(8) DEFAULT NULL,
`pps` int(8) DEFAULT NULL,
`bps` int(8) DEFAULT NULL,
`Bpp` int(8) DEFAULT NULL,
`Flows` int(8) DEFAULT NULL,
KEY `src_IP` (`src_IP`),
KEY `dest_IP` (`dest_IP`),
KEY `date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Thanks in advance

Your current table structure is optimized for random writes: records are placed on disk in the order of writes.
Unfortunately the only read pattern that is well supported by such a structure is a full-table scan.
Usage of non-covering secondary indices still results in a lot of random disk seeks which are killing performance.
The best reading performance is obtained when data is read in the same order as it is located on disk, which for InnoDB means in the primary key order.
A materialized view (another InnoDB table that has an appropriate primary key) could be a possible solution. In this case a primary key that starts with src_IP is required.
upd: The idea is to achieve data locality and avoid random disk IO, aiming for sequential reading. This means that your materialized view will look like this:
CREATE TABLE `netflow_nov2_view` (
`row_id` bigint not null, -- see below
`date` datetime DEFAULT NULL,
`duration` float DEFAULT NULL,
`protocol` varchar(16) DEFAULT NULL,
`src_IP` int(10) unsigned DEFAULT NULL,
`src_port` int(2) DEFAULT NULL,
`dest_IP` int(10) unsigned DEFAULT NULL,
`dest_port` int(2) DEFAULT NULL,
`flags` varchar(8) DEFAULT NULL,
`Tos` int(4) DEFAULT NULL,
`packets` int(8) DEFAULT NULL,
`bytes` int(8) DEFAULT NULL,
`pps` int(8) DEFAULT NULL,
`bps` int(8) DEFAULT NULL,
`Bpp` int(8) DEFAULT NULL,
`Flows` int(8) DEFAULT NULL,
PRIMARY KEY (`src_IP`, `row_id`) -- you won't need other keys
) ENGINE=InnoDB DEFAULT CHARSET=latin1
where row_id has to be maintained by your materializing logic, since you don't have it in the original table (or you can introduce an explicit auto_increment field to your original table, it's how InnoDB handles it anyway).
The crucial difference is that now all data on the disk is placed in the primary key order, which means that once you locate the first record with a given 'src_IP' all other records can be obtained as sequentially as possible.
Depending on the way your data is written and adjacent application logic it can be accomplished either via triggers or by some custom external process.
If it is possible to sacrifice current write performance (or use some async queue as a buffer) then probably having a single table optimized for reading would suffice.
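A minimal sketch of the one-off backfill for such a view (assuming the table names above; row_id is generated on the fly because the source table has no primary key):

```sql
-- Backfill the read-optimized copy; InnoDB will cluster rows by
-- (src_IP, row_id), so all rows for one source IP sit together on disk.
SET @row_id := 0;

INSERT INTO netflow_nov2_view
SELECT (@row_id := @row_id + 1) AS row_id, n.*
FROM netflow_nov2 AS n;
```

For a 4-billion-row table you would want to run this in chunks (e.g. by date range) rather than as a single statement.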
More on InnoDB indexing:
http://dev.mysql.com/doc/refman/5.6/en/innodb-index-types.html

I would think that reading the table without an index would take less than 5 hours. But you do have a big table. There are two "environmental" possibilities that would kill the performance:
The table is locked by another process.
The result set is huge (tens of millions of rows) and the network latency/processing time for returning the result set is causing the problem.
My first guess, though, is that the query is not using the index. I missed this at first, but you have one multi-part index. The only index this query can take advantage of is one where the first key is src_IP. So, if your index is either netflow_nov2(src_IP, date, dest_ip) or netflow_nov2(src_IP, dest_ip, date), then you are ok. If either of the other columns comes first, then this index will not be used. You can easily see what is happening by putting EXPLAIN in front of the query to see whether the index is being used.
If this is a problem, create an index with src_IP as the first (or only) key in the index.
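A hedged sketch of that fix and the check (the index name is made up; per the SHOW CREATE TABLE output the table already has a single-column src_IP key, so this is only needed if that key were missing or multi-part with src_IP not first):

```sql
-- Add an index led by src_IP, then confirm the optimizer picks it.
ALTER TABLE netflow_nov2 ADD INDEX idx_src_ip (src_IP);

EXPLAIN SELECT * FROM netflow_nov2 WHERE src_IP = 3111950672;
```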

Related

indexes and constraints not created in CREATE TABLE statement

I am using MySQL/MariaDB, and I create an employees table:
CREATE TABLE employees(
id INT AUTO_INCREMENT,
name VARCHAR(40) NOT NULL,
description VARCHAR(50) DEFAULT 'No Description',
random_assignment_id INT UNIQUE,
birth_date DATE,
salary DECIMAL(5,2),
supervisor_id INT,
branch_id INT NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT random_assignment_check CHECK (LENGTH(random_assignment_id) = 5),
INDEX(random_assignment_id, supervisor_id, branch_id),
PRIMARY KEY(id)
)
Then I confirm the table is created as expected:
SHOW CREATE TABLE employees\G;
*************************** 1. row ***************************
Table: employees
Create Table: CREATE TABLE `employees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(40) NOT NULL,
`description` varchar(50) DEFAULT 'No Description',
`random_assignment_id` int(11) DEFAULT NULL,
`birth_date` date DEFAULT NULL,
`salary` decimal(5,2) DEFAULT NULL,
`supervisor_id` int(11) DEFAULT NULL,
`branch_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `random_assignment_id` (`random_assignment_id`),
KEY `random_assignment_id_2` (`random_assignment_id`,`supervisor_id`,`branch_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.01 sec)
I don't see the random_assignment_check constraint listed, and I expected it to index random_assignment_id, supervisor_id and branch_id, but it does not:
DESCRIBE employees;
+----------------------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(40) | NO | | NULL | |
| description | varchar(50) | YES | | No Description | |
| random_assignment_id | int(11) | YES | UNI | NULL | |
| birth_date | date | YES | | NULL | |
| salary | decimal(5,2) | YES | | NULL | |
| supervisor_id | int(11) | YES | | NULL | |
| branch_id | int(11) | NO | | NULL | |
| created_at | timestamp | NO | | CURRENT_TIMESTAMP | |
| updated_at | timestamp | NO | | CURRENT_TIMESTAMP | |
+----------------------+--------------+------+-----+-------------------+----------------+
There are no MUL flags under key.
Note that I have read that MariaDB now supports constraints; according to Homebrew, I am using:
brew info mariadb
mariadb: stable 10.3.12 (bottled)
Drop-in replacement for MySQL
What am I doing wrong?
MySQL treats KEY as a synonym for INDEX. Your index INDEX(random_assignment_id, supervisor_id, branch_id) became KEY random_assignment_id_2 (random_assignment_id,supervisor_id,branch_id). The index name was generated by MySQL, but logically they're the same index.
When I tested your CREATE TABLE statement and then used DESC to display it, I also saw no MUL indicators. This fits the documentation:
If Key is MUL, the column is the first column of a nonunique index in which multiple occurrences of a given value are permitted within the column.
So in your output, the Key field for random_assignment_id is UNI because it's a unique key, in addition to being part of a multi-column key.
MySQL doesn't support CHECK constraints. It parses them, then ignores them. They are not stored with your table, and consequently SHOW CREATE TABLE doesn't show them.
MariaDB has implemented CHECK constraints in 10.2.1 according to https://mariadb.com/kb/en/library/constraint/#check-constraints (I don't use MariaDB, so I'll trust their doc).
It's not clear from your question whether you tested your CHECK constraint on MySQL Community Edition or MariaDB. A CHECK constraint will not be saved on MySQL; on MariaDB it seems okay per the docs, but I have never tested it.
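A quick way to test it yourself on MariaDB 10.2.1+ (hypothetical values; per the MariaDB docs the constraint should reject this row):

```sql
-- random_assignment_id here has only 3 digits, so LENGTH(...) = 5 fails:
-- MariaDB should raise a constraint violation, while MySQL (before 8.0.16,
-- which added CHECK enforcement) would silently accept the row.
INSERT INTO employees (name, branch_id, random_assignment_id)
VALUES ('Alice', 1, 123);
```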
We should all stop thinking of MariaDB as a drop-in replacement for MySQL. The two products have been diverging for nearly 10 years, and they can no longer be assumed to be compatible.

Sql Query Taking too much time with group by

I have a table containing 5 million records as of today. Over time this will increase to 1 or 2 billion records. My task is to generate a summary report from this data, for which I am using the query below.
SELECT creation_date
,caller_circle
,count(id)
FROM call_reporting
WHERE enterprise_id = 206
GROUP BY DATE (creation_date)
,caller_circle limit 10;
Table structure looks like this.
CREATE TABLE `call_reporting` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`SESSION_ID` varchar(255) DEFAULT NULL,
`CALLER_NUMBER` bigint(20) NOT NULL,
`DIALED_NUMBER` bigint(20) NOT NULL,
`CALL_START_TIME` datetime DEFAULT NULL,
`CALL_END_TIME` datetime DEFAULT NULL,
`OUT_CALL_START_TIME` datetime DEFAULT NULL,
`OUT_CALL_END_TIME` datetime DEFAULT NULL,
`HUNTING_START_TIME` datetime DEFAULT NULL,
`IN_CALL_DURATION` bigint(20) DEFAULT NULL,
`OUT_CALL_DURATION` bigint(20) DEFAULT NULL,
`HUNTING_DURATION` bigint(20) DEFAULT NULL,
`ADV_ID` bigint(20) DEFAULT NULL,
`ENTERPRISE_ID` bigint(20) DEFAULT NULL,
`AGENT_ID` bigint(20) DEFAULT NULL,
`HUNT_TRY` int(255) DEFAULT NULL,
`CAMPAIGN_ID` bigint(20) DEFAULT NULL,
`CALL_STATUS` varchar(255) DEFAULT NULL,
`URL_CALLING_STATUS` varchar(255) DEFAULT NULL,
`REMARKS` text,
`REF_NO` varchar(255) DEFAULT NULL,
`POST_CALL_RESULT` bit(1) DEFAULT NULL,
`CREATION_DATE` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`AGENT_DIAL_OUT_NUMBER` bigint(20) DEFAULT NULL,
`DATA_SYNC` bit(1) DEFAULT NULL,
`CALLER_CIRCLE` varchar(50) DEFAULT NULL,
`STATUS_CODE` varchar(50) DEFAULT NULL,
`OPERATOR_NAME` varchar(50) DEFAULT NULL,
`OBD_RESULT_STATUS` bit(1) DEFAULT NULL,
`MAIL_SENT` bit(1) DEFAULT NULL,
`SDR_ID` varchar(255) DEFAULT NULL,
`KEY_PRESS` varchar(1024) DEFAULT NULL,
`ENTERPRISE_USER_ID` bigint(20) DEFAULT NULL,
`CAMAIGN_NAME` varchar(128) DEFAULT NULL,
`DND_NO` bit(1) DEFAULT b'0',
KEY `ID` (`ID`),
KEY `ENTERPRISE_ID` (`ENTERPRISE_ID`),
KEY `SDR_ID` (`SDR_ID`),
KEY `CALLER_NUMBER` (`CALLER_NUMBER`),
KEY `CREATION_DATE` (`CREATION_DATE`),
KEY `DIALED_NUMBER` (`DIALED_NUMBER`),
KEY `CALLER_CIRCLE` (`CALLER_CIRCLE`),
KEY `CAMAIGN_NAME` (`CAMAIGN_NAME`),
KEY `ADV_ID` (`ADV_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=2612658 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (MONTH(CREATION_DATE))
PARTITIONS 12 */ |
The table is also partitioned, but when I run the given query it takes:
10 rows in set (15.11 sec)
The query profile gives the following stats:
+--------------------------------+-----------+
| Status | Duration |
+--------------------------------+-----------+
| starting | 0.000052 |
| Waiting for query cache lock | 0.000017 |
| checking query cache for query | 0.000106 |
| checking permissions | 0.000023 |
| Opening tables | 0.000051 |
| System lock | 0.000035 |
| Waiting for query cache lock | 0.000015 |
| init | 0.000085 |
| optimizing | 0.000036 |
| statistics | 0.003924 |
| preparing | 0.000075 |
| Creating tmp table | 0.000077 |
| executing | 0.000014 |
| Copying to tmp table | 16.945653 |
| Sorting result | 0.000879 |
| Sending data | 0.001254 |
| end | 0.000012 |
| removing tmp table | 0.000017 |
| end | 0.000010 |
| query end | 0.000013 |
| closing tables | 0.000019 |
| freeing items | 0.000030 |
| logging slow query | 0.000008 |
| logging slow query | 0.000008 |
| cleaning up | 0.000007 |
+--------------------------------+-----------+
25 rows in set (0.01 sec)
Most of the time is spent copying data into the temp table. Is there any way to reduce this execution time? In my case the temp table size is:
tmp_table_size | 16777216 |
I was also thinking of loading the data into RAM, but I don't know the pros and cons of that, since in my case the data size will keep growing. Please suggest a way to do this.
Thanks in advance.
Try creating a covering index on
( enterprise_id, creation_date, caller_circle )
This way, the WHERE clause is satisfied by the leading index column, and the GROUP BY can be served from the index too.
Also, change "count(ID)" to just "count(*)" so the engine does not have to visit the data page to fetch the ID; it knows a record qualifies from the covering index alone.
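A sketch of both suggestions together (the index name is made up):

```sql
-- Covering index: the WHERE filter on enterprise_id leads, and
-- COUNT(*) can be answered from the index without touching rows.
ALTER TABLE call_reporting
  ADD INDEX idx_ent_date_circle (ENTERPRISE_ID, CREATION_DATE, CALLER_CIRCLE);

SELECT DATE(creation_date) AS call_date,
       caller_circle,
       COUNT(*) AS num_calls
FROM call_reporting
WHERE enterprise_id = 206
GROUP BY DATE(creation_date), caller_circle
LIMIT 10;
```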
I would then suggest creating a secondary table that holds the simple aggregate roll-ups for the given enterprise, date and caller circle. That is... if the raw data won't change historically (such as back-filling call data). It happened, it's done, the totals are what they are for the historical date. Then, if anything needs to be re-calibrated, you could rebuild on a gradual basis, such as for a month/year so you are not plowing through a billion records in the future.
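One possible shape for that roll-up table (all names are hypothetical; refresh it from a cron job, one month at a time, so you never re-scan the whole history):

```sql
CREATE TABLE call_reporting_summary (
  enterprise_id BIGINT NOT NULL,
  call_date     DATE NOT NULL,
  caller_circle VARCHAR(50) NOT NULL DEFAULT '',
  call_count    BIGINT NOT NULL,
  PRIMARY KEY (enterprise_id, call_date, caller_circle)
) ENGINE=InnoDB;

-- Rebuild a single month's aggregates; re-running is idempotent.
INSERT INTO call_reporting_summary
SELECT enterprise_id, DATE(creation_date), IFNULL(caller_circle, ''), COUNT(*)
FROM call_reporting
WHERE creation_date >= '2019-01-01' AND creation_date < '2019-02-01'
  AND enterprise_id IS NOT NULL
GROUP BY enterprise_id, DATE(creation_date), IFNULL(caller_circle, '')
ON DUPLICATE KEY UPDATE call_count = VALUES(call_count);
```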

MySQL Query optimization on big table

I can't find a way to speed up simple queries on a huge table.
I don't think I'm asking anything crazy of MySQL, even with this amount of data, and I can't understand why the following queries have such different execution times!
I tried my best to read all the articles about big data in MySQL and field optimization, and I already managed to reduce query time by tuning field types, but I'm really getting lost now with these kinds of simple queries!
Here is an example on MySQL 5.1.69 :
SELECT rv.`id_prd`,SUM(`quantite`)
FROM `report_ventes` AS rv
WHERE `periode` BETWEEN 201301 AND 201312
GROUP BY rv.`id_prd`
Execution time : 3.76 sec
Let's add a LEFT JOIN and another selected field :
SELECT rv.`id_prd`,SUM(`quantite`),`acl_cip_7`
FROM `report_ventes` AS rv
LEFT JOIN `report_produits` AS rp
ON (rv.`id_prd` = rp.`id_prd`)
WHERE `periode` BETWEEN 201301 AND 201312
GROUP BY rv.`id_prd`
Execution time : 12.10 sec
Explain :
+----+-------------+-------+--------+---------------+---------+---------+--------------------------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+--------------------------+----------+----------------------------------------------+
| 1 | SIMPLE | rv | ALL | periode | NULL | NULL | NULL | 16556188 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | rp | eq_ref | PRIMARY | PRIMARY | 4 | main_reporting.rv.id_prd | 1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+--------------------------+----------+----------------------------------------------+
And let's add another WHERE clause:
SELECT rv.`id_prd`,SUM(`quantite`),`acl_cip_7`
FROM `report_ventes` AS rv
LEFT JOIN `report_produits` AS rp
ON (rv.`id_prd` = rp.`id_prd`)
WHERE rp.`id_clas_prd` LIKE '1%'
AND `periode` BETWEEN 201301 AND 201312
GROUP BY rv.`id_prd`
Execution time : 21.00 sec
Explain :
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+----------+----------------------------------------------+
| 1 | SIMPLE | rv | ALL | periode | NULL | NULL | NULL | 16556188 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | rp | eq_ref | PRIMARY,id_clas_prd | PRIMARY | 4 | main_reporting.rv.id_prd | 1 | Using where |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+----------+----------------------------------------------+
And here are the tables parameters :
report_produits : 80 000 rows
CREATE TABLE `report_produits` (
`id_prd` int(11) unsigned NOT NULL,
`acl_cip_7` int(7) NOT NULL,
`acl_cip_ean_13` varchar(255) DEFAULT NULL,
`lib_prd` varchar(255) DEFAULT NULL,
`id_clas_prd` char(7) NOT NULL DEFAULT '',
`id_lab_prd` int(11) unsigned NOT NULL,
`id_rbt_prd` int(11) unsigned NOT NULL,
`id_tva_prd` int(11) unsigned NOT NULL,
`t_gen` varchar(255) NOT NULL,
`id_grp_gen` varchar(16) NOT NULL DEFAULT '',
`id_liste_delivrance` int(11) unsigned NOT NULL,
PRIMARY KEY (`id_prd`),
KEY `index_lab` (`id_lab_prd`),
KEY `index_grp` (`id_grp_gen`),
KEY `id_clas_prd` (`id_clas_prd`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
report_ventes : 16 556 188 rows
CREATE TABLE `report_ventes` (
`id` int(13) NOT NULL AUTO_INCREMENT,
`periode` mediumint(6) DEFAULT NULL,
`id_phie` smallint(4) unsigned NOT NULL,
`id_prd` mediumint(8) unsigned NOT NULL,
`quantite` smallint(11) DEFAULT NULL,
`ca_ht` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `periode` (`periode`)
) ENGINE=MyISAM AUTO_INCREMENT=18491315 DEFAULT CHARSET=utf8;
There is no covering index and MySQL decides that scanning the whole table is more effective than to use an index and lookup for the requested values.
You are joining to report_ventes on id_prd, but that column is not part of the clustering index (the PK in MySQL). This means the server has to look up each value individually. The server bypasses the periode index, possibly because it is not selective enough to be worth using.
An index could help which includes the id_prd, periode and quantite columns. With this index, there is a chance that the MySQL server will use it since it is a covering index for this query.
Give it a try, but it's hard to tell for sure without testing it in the actual environment.
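A hedged sketch of that index. The answer doesn't specify the column order; one plausible choice, assuming the range filter on periode is reasonably selective, is to lead with periode so the BETWEEN can use a range scan, with id_prd and quantite making it covering for this query:

```sql
-- Covering index for: WHERE periode BETWEEN ... GROUP BY id_prd, SUM(quantite)
ALTER TABLE report_ventes
  ADD INDEX idx_periode_prd_qte (periode, id_prd, quantite);
```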
Basically your indexes are not being used. I can't spot the precise reason without trying it on a SQL server, but a common cause is that the data has different types.
AND periode BETWEEN 201301 AND 201312
"periode" has datatype mediumint(6) and the literal "201301" possibly has datatype int(10)
LEFT JOIN `report_produits` AS rp ON (rv.`id_prd` = rp.`id_prd`)
Here the two datatypes are also different.
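Following this answer's diagnosis, a sketch of aligning the join column types (this rebuilds a 16-million-row MyISAM table, so test on a copy first):

```sql
-- Make rv.id_prd (mediumint unsigned) match rp.id_prd (int unsigned)
ALTER TABLE report_ventes
  MODIFY `id_prd` int(11) unsigned NOT NULL;
```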

Need help to optimize a MySQL query

I need help on optimizing the following query
select
DATE_FORMAT( traffic.stat_date, '%Y/%m'),
pt.promotion,
sum(traffic.voice_nat_onnet_mins - pt.promo_minutes_onnet) as total_onnet_mins,
sum(traffic.voice_nat_offnet_mins + traffic.voice_nat_landline_mins + traffic.voice_int_mins + traffic.voice_nng_mins + traffic.voice_not_rec_mins - pt.promo_minutes_offnet) as total_offnet_mins,
sum(traffic.sms_ptp_onnet_evts) as total_onnet_sms,
sum(traffic.sms_ptp_offnet_evts + traffic.sms_vas_pta_evts) as total_offnet_sms,
sum(traffic.dati_kb) as internet_kb
from
stats_novercanet.mnp_prod_stat_outgoing_traffic traffic
INNER JOIN stats_novercanet.mnp_prod_stat_promotion_traffic pt
ON pt.id_source_user=traffic.id_source_user
INNER JOIN stats_novercanet.mnp_prod_stat_customer_first_signup fs
ON pt.id_source_user = fs.id_source_user
where
traffic.stat_date between '2013-11-01' and '2013-11-30'
and traffic.stat_date >= (
select min(ft.stat_date)
from stats_novercanet.mnp_prod_stat_promotion_traffic ft
where
traffic.id_source_user=ft.id_source_user
and (ft.sub_rev>0 or ft.ren_rev>0)
and pt.promotion=ft.promotion
)
and pt.stat_date between '2013-11-01' and '2013-11-30'
group by
DATE_FORMAT( traffic.stat_date, '%Y/%m'),
pt.promotion
order by
DATE_FORMAT( traffic.stat_date, '%Y/%m'),
pt.promotion
I have used explain for this query and it showed me following result
+----+--------------------+---------+-------+------------------------------------------------+---------------------------------+---------+-----------------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+-------+------------------------------------------------+---------------------------------+---------+-----------------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | pt | range | idx_prod_stat_pro_tra_stat_date,id_source_user | idx_prod_stat_pro_tra_stat_date | 4 | NULL | 530114 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | fs | ref | id_source_user | id_source_user | 5 | stats_novercanet.pt.id_source_user | 1 | Using where; Using index |
| 1 | PRIMARY | traffic | ref | stat_date,id_source_user | id_source_user | 5 | stats_novercanet.pt.id_source_user | 60 | Using where |
| 2 | DEPENDENT SUBQUERY | ft | ref | id_source_user,promotion | id_source_user | 5 | stats_novercanet.traffic.id_source_user | 93 | Using where |
+----+--------------------+---------+-------+------------------------------------------------+---------------------------------+---------+-----------------------------------------+--------+----------------------------------------------+
Any help on optimization would be great. I have created indexes on id_source_user, stat_date and promotion as well, but no luck. I also tried moving the subquery into a join, but no luck.
The SHOW CREATE TABLE result for mnp_prod_stat_promotion_traffic is as follows:
| mnp_prod_stat_promotion_traffic | CREATE TABLE `mnp_prod_stat_promotion_traffic` (
`stat_date` date DEFAULT NULL,
`id_source_user` int(64) DEFAULT NULL,
`promotion` varchar(64) DEFAULT NULL,
`num_of_sub` int(64) DEFAULT NULL,
`num_of_ren` int(64) DEFAULT NULL,
`credit` float DEFAULT NULL,
`minutes` float DEFAULT NULL,
`kb` float DEFAULT NULL,
`sms` int(64) DEFAULT NULL,
`lbs` int(64) DEFAULT NULL,
`sub_rev` float DEFAULT NULL,
`ren_rev` float DEFAULT NULL,
`consumed_credit` float DEFAULT NULL,
`sim_type` varchar(32) DEFAULT NULL,
`price_plan` varchar(64) DEFAULT NULL,
`WiFi_mins` float DEFAULT NULL,
`over_min` float DEFAULT NULL,
`over_min_consumed` float DEFAULT NULL,
`over_sms` float DEFAULT NULL,
`over_sms_consumed` float DEFAULT NULL,
`over_data` float DEFAULT NULL,
`over_data_consumed` float DEFAULT NULL,
`promo_minutes_onnet` float DEFAULT NULL,
`promo_minutes_offnet` float DEFAULT NULL,
`promo_sms_onnet` int(64) DEFAULT NULL,
`promo_sms_offnet` int(64) DEFAULT NULL,
KEY `idx_prod_stat_pro_tra_stat_date` (`stat_date`),
KEY `id_source_user` (`id_source_user`),
KEY `promotion` (`promotion`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
How many results are you expecting to get returned? If you know, for example, that you only want one record returned, then you can use LIMIT.
When a query is run it will search the whole table for matching records, but if you know there are only one, two, or three results to return, then you can use LIMIT. This will save MySQL a lot of time, but again it depends on the number of results you are expecting, and you will have to apply it to the tables you are running it on.
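For example, if only a single promotion row per user were needed, a LIMIT lets MySQL stop scanning as soon as it finds a match (hypothetical query, not the report query from the question):

```sql
-- Stops at the first matching row instead of scanning the full index range.
SELECT *
FROM stats_novercanet.mnp_prod_stat_promotion_traffic
WHERE id_source_user = 42
LIMIT 1;
```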
Also, another tip is to check which table types (storage engines) you are using. Have a look at this webpage for more information: http://www.mysqltutorial.org/understand-mysql-table-types-innodb-myisam.aspx
Another option to do is to build a script to use your existing query above and store the result in a new table, and only run the script once a month via cron at like midnight. I did this for an analytical project, and it worked well.

How can I optimize a Mysql query that searches for rows in a certain date range

Here is the query:
select timespans.id as timespan_id, count(*) as num
from reports, timespans
where timespans.after_date >= '2011-04-13 22:08:38' and
timespans.after_date <= reports.authored_at and
reports.authored_at < timespans.before_date
group by timespans.id;
Here are the table defs:
CREATE TABLE `reports` (
`id` int(11) NOT NULL auto_increment,
`source_id` int(11) default NULL,
`url` varchar(255) default NULL,
`lat` decimal(20,15) default NULL,
`lng` decimal(20,15) default NULL,
`content` text,
`notes` text,
`authored_at` datetime default NULL,
`created_at` datetime default NULL,
`updated_at` datetime default NULL,
`data` text,
`title` varchar(255) default NULL,
`author_id` int(11) default NULL,
`orig_id` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `index_reports_on_title` (`title`),
KEY `index_content_on_reports` (`content`(128))
CREATE TABLE `timespans` (
`id` int(11) NOT NULL auto_increment,
`after_date` datetime default NULL,
`before_date` datetime default NULL,
`after_offset` int(11) default NULL,
`before_offset` int(11) default NULL,
`is_common` tinyint(1) default NULL,
`created_at` datetime default NULL,
`updated_at` datetime default NULL,
`is_search_chunk` tinyint(1) default NULL,
`is_day` tinyint(1) default NULL,
PRIMARY KEY (`id`),
KEY `index_timespans_on_after_date` (`after_date`),
KEY `index_timespans_on_before_date` (`before_date`)
And here is the explain:
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9 | NULL | 84 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | reports | ALL | NULL | NULL | NULL | NULL | 183297 | Using where |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
And here is the explain after I create an index on authored_at. As you can see, the index is not actually getting used (I think...)
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9 | NULL | 86 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | reports | ALL | index_reports_on_authored_at | NULL | NULL | NULL | 183317 | Range checked for each record (index map: 0x8) |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
There are about 142k rows in the reports table, and far fewer in the timespans table.
The query is taking about 3 seconds now.
The strange thing is that if I add an index on reports.authored_at, it actually makes the query far slower, about 20 seconds. I would have thought it would do the opposite, since it would make it easy to find the reports at either end of the range, and throw the rest away, rather than having to examine all of them.
Can someone clarify? I'm stumped.
Instead of two separate indexes for the timespan table, try merging them into a single multi-column index with before_date and after_date in a single index. Then add that index to authored_at as well.
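A sketch of those two changes (index names are made up; note the question has already tried an authored_at index, so the multi-column timespans index is the new part):

```sql
-- One composite index instead of two single-column ones on timespans
ALTER TABLE timespans
  ADD INDEX idx_after_before (after_date, before_date);

-- Plus the reports index on the join/range column
ALTER TABLE reports
  ADD INDEX idx_authored_at (authored_at);
```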
I would rewrite your query like this:
select t.id, count(*) as num from timespans t
join reports r where t.after_date >= '2011-04-13 22:08:38'
and r.authored_at >= '2011-04-13 22:08:38'
and r.authored_at < t.before_date
group by t.id order by null;
and change indexes of tables
alter table reports add index authored_at_idx(authored_at);
You can use the database's partitioning feature on the after_date column. It will help a lot.
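A hedged sketch of what that might look like (partition names and ranges are made up; MySQL requires the partitioning column to appear in every unique key and to be NOT NULL for this scheme, so the primary key has to be widened first):

```sql
-- Widen the PK to include the partitioning column
ALTER TABLE timespans
  MODIFY after_date datetime NOT NULL,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, after_date);

-- Range-partition by year so date-bounded queries prune partitions
ALTER TABLE timespans
PARTITION BY RANGE (TO_DAYS(after_date)) (
  PARTITION p2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
  PARTITION p2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```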