MySQL subqueries performance

My SQL query (with a subquery) takes very long (nearly 24 hours). Is using subqueries bad for performance?
My table is as below:
mysql> show create table eventnew;
CREATE TABLE `eventnew` (
`id` int(50) NOT NULL AUTO_INCREMENT,
`date` datetime DEFAULT NULL,
`src_ip` int(10) unsigned DEFAULT NULL,
`src_port` int(10) unsigned DEFAULT NULL,
`dst_ip` int(10) unsigned DEFAULT NULL,
`dst_port` int(10) unsigned DEFAULT NULL,
`repo_ip` varchar(50) DEFAULT NULL,
`link` varchar(50) DEFAULT NULL,
`binary_hash` varchar(50) DEFAULT NULL,
`sensor_id` varchar(50) DEFAULT NULL,
`repox_ip` int(10) unsigned DEFAULT NULL,
`flags` varchar(50) DEFAULT NULL,
`shellcode` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`),
KEY `sensor_id` (`sensor_id`),
KEY `src_ip` (`src_ip`)
) ENGINE=MyISAM AUTO_INCREMENT=883278 DEFAULT CHARSET=latin1
My SQL is as below:
SELECT COUNT(DISTINCT binary_hash) AS cnt
FROM eventnew
WHERE date >= '2010-10-16'
  AND date < '2010-10-17'
  AND binary_hash NOT IN (
      SELECT DISTINCT binary_hash
      FROM eventnew
      WHERE date < '2010-10-16'
        AND binary_hash IS NOT NULL
  )
Below is the result of running EXPLAIN:
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| 1 | PRIMARY | eventnew | range | date | date | 9 | NULL | 14296 | Using where |
| 2 | DEPENDENT SUBQUERY | eventnew | range | date | date | 9 | NULL | 384974 | Using where |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+

Using subqueries certainly can affect performance. For instance, say table T1 has n records and T2 has m records: in the worst case a join (or an IN condition) between T1 and T2 has to consider n*m row combinations and then filter them based on your condition. An extra constraint inside the subquery decreases efficiency further. That said, subqueries can't always be avoided in practice; sometimes they are the natural way to express the query.

I'd suggest you use NOT EXISTS instead of NOT IN, or rewrite the NOT IN as a LEFT JOIN anti-join.

Try this:
SELECT COUNT(DISTINCT a.binary_hash) AS cnt
FROM eventnew a
LEFT JOIN eventnew b
  ON a.binary_hash = b.binary_hash
 AND b.binary_hash IS NOT NULL
 AND b.date < '2010-10-16'
WHERE a.date >= '2010-10-16'
  AND a.date < '2010-10-17'
  AND b.date IS NULL
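For reference, the NOT EXISTS form of the same anti-join would look roughly like this (a sketch only; it should return the same count as the LEFT JOIN version, since COUNT(DISTINCT ...) ignores NULL hashes either way):
SELECT COUNT(DISTINCT a.binary_hash) AS cnt
FROM eventnew a
WHERE a.date >= '2010-10-16'
  AND a.date < '2010-10-17'
  AND NOT EXISTS (
      -- anti-join: keep only hashes never seen before 2010-10-16
      SELECT 1
      FROM eventnew b
      WHERE b.binary_hash = a.binary_hash
        AND b.date < '2010-10-16'
  )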

Related

how to speed up the mysql query to count average value?

The table structure is as follows:
CREATE TABLE `crm_member` (
`member_id` int(11) NOT NULL AUTO_INCREMENT,
`shop_id` int(11) NOT NULL,
`nick` varchar(255) NOT NULL DEFAULT '',
`name` varchar(255) NOT NULL DEFAULT '',
`mobile` varchar(255) NOT NULL DEFAULT '',
`grade` int(11) NOT NULL DEFAULT '-1',
`trade_count` int(11) NOT NULL,
`trade_amount` float NOT NULL,
`last_trade_time` int(11) NOT NULL,
`trade_from` tinyint(4) NOT NULL,
`avg_price` float NOT NULL,
`seller_flag` tinyint(1) NOT NULL,
`is_black` tinyint(1) NOT NULL DEFAULT '0',
`created` int(11) NOT NULL,
PRIMARY KEY (`member_id`),
UNIQUE KEY `shop_id` (`shop_id`,`nick`),
KEY `last_trade_time` (`last_trade_time`),
KEY `idx_shop_id_grade` (`shop_id`,`grade`),
KEY `idx_shopid_created` (`shop_id`,`created`),
KEY `idx_trade_amount` (`shop_id`,`trade_amount`),
KEY `idx_trade_count` (`shop_id`,`trade_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The table has 2148037 rows under shop_id 3498706.
And my query is like:
SELECT AVG(trade_count) as trade_count, AVG(trade_amount) as trade_amount, AVG(grade) as grade from `crm_member_0141` WHERE shop_id = '3498706' and grade >= 0 and trade_count > 0 and is_black = 0 LIMIT 1
The query takes about 30 seconds to execute.
The EXPLAIN result:
mysql> explain SELECT member_id, AVG(trade_count) as trade_count, AVG(trade_amount) as trade_amount, AVG(grade) as grade from `crm_member_0141` WHERE shop_id = 3498706 and grade >= 0 and trade_count > 0 and is_black = 0 order by member_id LIMIT 1;
+----+-------------+-----------------+------------+------+-------------------------------------------------------------------------------+---------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+------+-------------------------------------------------------------------------------+---------+---------+-------+---------+----------+-------------+
| 1 | SIMPLE | crm_member_0141 | NULL | ref | shop_id,idx_shop_id_grade,idx_shopid_created,idx_trade_amount,idx_trade_count | shop_id | 4 | const | 1074018 | 1.11 | Using where |
+----+-------------+-----------------+------------+------+-------------------------------------------------------------------------------+---------+---------+-------+---------+----------+-------------+
How can I speed up the query?
It looks like MySQL chooses the `shop_id` unique index (on shop_id, nick) instead of the `idx_shop_id_grade` index, although (shop_id, grade) would probably speed up the selection of rows.
You can tell MySQL to use a specific index for a query with the USE INDEX hint:
Try adding USE INDEX (idx_shop_id_grade) after the table name in your query and see if it is faster.
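For example, a sketch of the same query with the hint added:
SELECT AVG(trade_count) AS trade_count,
       AVG(trade_amount) AS trade_amount,
       AVG(grade) AS grade
FROM `crm_member_0141` USE INDEX (idx_shop_id_grade)
WHERE shop_id = 3498706
  AND grade >= 0
  AND trade_count > 0
  AND is_black = 0;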
Otherwise, if this query is especially important and frequently called in your app, you can build a more specialised index for it:
Your query selects on particular values of shop_id and is_black, then does a range selection on grade and trade_count.
I would suggest trying an index on (shop_id, is_black, grade, trade_count).
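A sketch of that DDL (the index name here is just a placeholder):
ALTER TABLE `crm_member_0141`
    ADD INDEX idx_shop_black_grade_count (shop_id, is_black, grade, trade_count);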
Note: obviously, test this first on a test database.

Slow mysql query with multiple joins

I have the following tables in my database:
product_fav:
CREATE TABLE `product_fav` (
`user_id` int(9) unsigned NOT NULL,
`asin` varchar(10) NOT NULL DEFAULT '',
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`price` decimal(7,2) NOT NULL,
PRIMARY KEY (`user_id`,`asin`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
product_info:
CREATE TABLE `product_info` (
`asin` varchar(10) NOT NULL,
`name` varchar(200) DEFAULT NULL,
`brand` varchar(50) DEFAULT NULL,
`part_number` varchar(50) DEFAULT NULL,
`url` text,
`image` text,
`availabillity` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`asin`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
product_price:
CREATE TABLE `product_price` (
`asin` varchar(10) NOT NULL,
`date` date NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`price` decimal(7,2) NOT NULL DEFAULT '0.00',
PRIMARY KEY (`asin`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I have the following query:
SELECT pi.*,
pp.price,
pf.date,
pf.price AS price_added,
round((100.0 * (pp.price - pf.price) / pf.price), 0) AS percentdiff
FROM product_info pi
JOIN
(
SELECT *
FROM product_price
ORDER BY date DESC) pp
ON pp.asin = pi.asin
JOIN product_fav pf
ON pp.asin = pf.asin
WHERE pf.user_id=". $user['user_id'] ."
GROUP BY asin
product_price has many records and the query takes about 3 seconds. Is it possible to make it faster?
I also have the same issue with this search query:
SELECT pi.*,
price,
date
FROM product_info pi
JOIN (SELECT *
FROM product_price
ORDER BY date DESC) pp
ON pi.asin = pp.asin
WHERE (`name` LIKE '%".$search."%')
GROUP BY pi.asin
ORDER BY price
EXPLAIN returns this:
+----+-------------+---------------+--------+---------------+---------+---------+---------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+---------------+---------+---------+---------------+--------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 106709 | Using temporary; Using filesort |
| 1 | PRIMARY | pi | eq_ref | PRIMARY | PRIMARY | 32 | pp.asin | 1 | |
| 1 | PRIMARY | pf | eq_ref | PRIMARY | PRIMARY | 36 | const,pp.asin | 1 | |
| 2 | DERIVED | product_price | ALL | NULL | NULL | NULL | NULL | 112041 | Using filesort |
+----+-------------+---------------+--------+---------------+---------+---------+---------------+--------+---------------------------------+
Don't ORDER BY before the JOIN. If you need ordering, do it after the WHERE and GROUP BY, so there is less data to sort. The part in question is this derived table:
JOIN
(
SELECT *
FROM product_price
ORDER BY date DESC) pp
Create an index for asin so the JOIN ON pp.asin = pi.asin will be more efficient.
Create an index for user_id so the WHERE pf.user_id=". $user['user_id'] ." will be more efficient.
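A sketch of those indexes, if they don't already exist (note that asin is already the leading column of product_price's primary key and user_id leads product_fav's primary key, so MySQL may already be able to use them):
CREATE INDEX idx_product_price_asin ON product_price (asin);   -- likely redundant with PRIMARY KEY (asin, date)
CREATE INDEX idx_product_fav_user_id ON product_fav (user_id); -- likely redundant with PRIMARY KEY (user_id, asin)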
Try running an EXPLAIN on your query to figure out where the bottleneck is.
What's with the ORDER BY date in the inner query? Try getting rid of it. Also try replacing the inner query with a JOIN; joins tend to be faster.
Also, do you have an index on the date field? Try adding one for the ORDER BY at the end of the query.
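One way to apply the "replace the inner query with a JOIN" idea is to join each favourite directly to its most recent price row, instead of sorting the whole product_price table in a derived table and relying on GROUP BY to pick a row. A rough sketch (assuming you want the latest price per asin; the ? stands in for the user id):
SELECT pi.*,
       pp.price,
       pf.date,
       pf.price AS price_added,
       ROUND(100.0 * (pp.price - pf.price) / pf.price, 0) AS percentdiff
FROM product_fav pf
JOIN product_info pi
  ON pi.asin = pf.asin
JOIN product_price pp
  ON pp.asin = pf.asin
 -- pick the newest price row per asin; PRIMARY KEY (asin, date) makes this lookup cheap
 AND pp.date = (SELECT MAX(date) FROM product_price WHERE asin = pf.asin)
WHERE pf.user_id = ?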

How to reduce the MySQL query running time

This is my query, which runs on one page of my site:
SELECT
DISTINCT b.CruisePortID,
b.SailingDates,
b.CruisePortID,
b.ArriveTime,
b.DepartTime,
b.PortName,
b.DayNumber
FROM
cruise_itineraries a,
cruise_itinerary_days b,
cruise_ports c
WHERE
a.ID = b.CruiseItineraryID
AND a.CruisePortID = c.ID
AND a.ID = '352905'
AND b.CruisePortID != 0
GROUP BY b.DayNumber;
When running this query in phpMyAdmin it takes 3.20 sec, because cruise_itineraries has more than 300 000 records.
I tried indexing; after indexing it shows 2.92 sec. Is it possible to reduce the query time to less than 0.10 sec? It would help my site's performance.
Here are the details:
CREATE TABLE IF NOT EXISTS `cruise_itineraries` (
`cl` int(11) NOT NULL,
`ID` bigint(20) NOT NULL,
`Description` varchar(500) NOT NULL,
`SailingPlanID` varchar(100) NOT NULL,
`VendorID` varchar(100) NOT NULL,
`VendorName` varchar(100) NOT NULL,
`ShipID` varchar(100) NOT NULL,
`ShipName` varchar(100) NOT NULL,
`Duration` int(11) NOT NULL,
`DestinationID` varchar(100) NOT NULL,
`Date` datetime NOT NULL,
`CruisePortID` varchar(100) NOT NULL,
`TradeRestriction` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `cruise_itinerary_days` (
`cld` int(11) NOT NULL,
`CruiseItineraryID` varchar(100) NOT NULL,
`SailingDates` datetime NOT NULL,
`VendorID` int(11) NOT NULL,
`VendorName` varchar(100) NOT NULL,
`ShipID` int(11) NOT NULL,
`ShipName` varchar(100) NOT NULL,
`SailingPlanID` int(11) NOT NULL,
`PlanName` varchar(100) NOT NULL,
`DayNumber` bigint(20) NOT NULL,
`PortName` varchar(100) NOT NULL,
`CruisePortID` varchar(100) NOT NULL,
`ArriveTime` varchar(100) NOT NULL,
`DepartTime` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `cruise_ports` (
`cp` int(11) NOT NULL,
`ID` varchar(100) NOT NULL,
`Name` varchar(100) NOT NULL,
`Description` varchar(1000) NOT NULL,
`NearestAirportCode` varchar(100) NOT NULL,
`UNCode` varchar(100) NOT NULL,
`Address` varchar(500) NOT NULL,
`City` varchar(100) NOT NULL,
`StateCode` varchar(100) NOT NULL,
`CountryCode` varchar(100) NOT NULL,
`PostalCode` varchar(100) NOT NULL,
`Phone` varchar(50) NOT NULL,
`Fax` varchar(100) NOT NULL,
`Directions` varchar(1000) NOT NULL,
`Content` varchar(1000) NOT NULL,
`HomePageURL` varchar(100) NOT NULL,
`Longitude` varchar(100) NOT NULL,
`Latitude` varchar(500) NOT NULL,
`CarnivalID` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `cruise_itineraries`
ADD PRIMARY KEY (`cl`),
ADD KEY `ID_2` (`ID`);
ALTER TABLE `cruise_itinerary_days`
ADD PRIMARY KEY (`cld`);
ALTER TABLE `cruise_ports`
ADD PRIMARY KEY (`cp`);
ALTER TABLE `cruise_itineraries`
MODIFY `cl` int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE `cruise_itinerary_days`
MODIFY `cld` int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE `cruise_ports`
MODIFY `cp` int(11) NOT NULL AUTO_INCREMENT;
EXPLAIN RESULT:
+----+-------------+-------+------+---------------+------+---------+-------+---------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+---------+--------------------------------------------------------+
| 1 | SIMPLE | a | ref | ID_2 | ID_2 | 8 | const | 1 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 3267 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 2008191 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+-------+---------+--------------------------------------------------------+
+----+-------------+-------+------+------------------------------------+------------------------------------+---------+-------+------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------------------+------------------------------------+---------+-------+------+--------------------------------------------------------------+
| 1 | SIMPLE | b | ref | Idx_CruiseItineraryID_CruisePortID | Idx_CruiseItineraryID_CruisePortID | 9 | const | 12 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | a | ref | ID_2 | ID_2 | 8 | const | 1 | Distinct |
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 3267 | Using where; Distinct; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+------------------------------------+------------------------------------+---------+-------+------+--------------------------------------------------------------+
First, I would suggest avoiding implicit MySQL joins; use INNER JOIN instead.
I personally think INNER JOIN is better because it is more readable: it shows the relations between the tables more clearly. You put those relations in the JOIN and do the filtering in the WHERE clause, and this separation makes the query easier to read.
The faults I've found:
The data type of cruise_itineraries.ID is BIGINT while cruise_itinerary_days.CruiseItineraryID is VARCHAR, yet you are comparing them in the query. The join will be slow even with an index on cruise_itinerary_days.CruiseItineraryID, because the type mismatch prevents the index from being used effectively.
Change the data type of cruise_itinerary_days.CruiseItineraryID to BIGINT:
ALTER TABLE cruise_itinerary_days MODIFY CruiseItineraryID BIGINT;
Next, create a composite index on the cruise_itinerary_days table based on your query:
ALTER TABLE cruise_itinerary_days ADD INDEX Idx_CruiseItineraryID_CruisePortID (CruiseItineraryID, CruisePortID);
Now create an index in cruise_ports table on cruise_ports.ID field.
ALTER TABLE cruise_ports ADD INDEX Idx_cruise_ports_ID (ID);
And finally, the query reformulated with INNER JOINs, for the reasons stated above:
SELECT
DISTINCT b.CruisePortID,
b.SailingDates,
b.CruisePortID,
b.ArriveTime,
b.DepartTime,
b.PortName,
b.DayNumber
FROM cruise_itineraries a
INNER JOIN cruise_itinerary_days b ON a.ID = b.CruiseItineraryID
INNER JOIN cruise_ports c ON a.CruisePortID = c.ID
WHERE a.ID = 352905
AND b.CruisePortID != 0
GROUP BY b.DayNumber;

MySQL optimization query

I have a MySQL issue. I have to optimize some queries on my website. One of them I have already done, but there are still some which I cannot resolve without your help.
I have a table called "news":
CREATE TABLE IF NOT EXISTS `news` (
`id` int(10) NOT NULL auto_increment,
`edited` smallint(1) NOT NULL default '0',
`site` varchar(30) default NULL,
`foreign_id` varchar(25) default NULL,
`title` varchar(255) NOT NULL,
`text` text NOT NULL,
`image` varchar(255) default NULL,
`horizontal` smallint(1) NOT NULL,
`image_author` varchar(255) default NULL,
`text_author` varchar(255) default NULL,
`lang` varchar(3) NOT NULL,
`link` varchar(255) NOT NULL,
`date` date NOT NULL,
`redirect` smallint(1) NOT NULL,
`parent` int(10) NOT NULL,
`views` int(5) NOT NULL,
`status` smallint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `lang` (`lang`,`status`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=47122 ;
As you can see, I have two indexes: "lang" and "date".
I have tried several combinations of indexes and this one has produced the best results... unfortunately only on my local computer. On the server I still get bad results, even though the database is the same.
The query:
SELECT id FROM news WHERE lang = 'en' AND STATUS =1 ORDER BY DATE DESC LIMIT 0, 10
localhost explain:
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | news | index | lang | date | 3 | NULL | 23 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
server explain:
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
| 1 | SIMPLE | news | ref | status | status | 13 | const,const | 15840 | Using where; Using filesort |
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
I have looked at a lot of similar topics, but unfortunately I cannot find any solution that works on my server. I would be very glad to hear a solution with some explanation, so I can optimize my other queries as well.
Thanks!
This is your query:
SELECT id
FROM news
WHERE lang = 'en' AND STATUS =1
ORDER BY DATE DESC
LIMIT 0, 10
The best index is one that contains all the fields used in the query (four fields in all). The ordering of columns in the index is: equality conditions from the WHERE clause first, then the ORDER BY columns, then the remaining columns from the SELECT clause.
So, try this index: news(lang, status, date, id).
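A sketch of that DDL (the index name is just an example):
ALTER TABLE news ADD INDEX idx_lang_status_date_id (lang, status, date, id);
Because the index also contains id, the query can be answered from the index alone, without touching the table rows.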

How can I optimize a Mysql query that searches for rows in a certain date range

Here is the query:
select timespans.id as timespan_id, count(*) as num
from reports, timespans
where timespans.after_date >= '2011-04-13 22:08:38' and
timespans.after_date <= reports.authored_at and
reports.authored_at < timespans.before_date
group by timespans.id;
Here are the table defs:
CREATE TABLE `reports` (
`id` int(11) NOT NULL auto_increment,
`source_id` int(11) default NULL,
`url` varchar(255) default NULL,
`lat` decimal(20,15) default NULL,
`lng` decimal(20,15) default NULL,
`content` text,
`notes` text,
`authored_at` datetime default NULL,
`created_at` datetime default NULL,
`updated_at` datetime default NULL,
`data` text,
`title` varchar(255) default NULL,
`author_id` int(11) default NULL,
`orig_id` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `index_reports_on_title` (`title`),
KEY `index_content_on_reports` (`content`(128))
CREATE TABLE `timespans` (
`id` int(11) NOT NULL auto_increment,
`after_date` datetime default NULL,
`before_date` datetime default NULL,
`after_offset` int(11) default NULL,
`before_offset` int(11) default NULL,
`is_common` tinyint(1) default NULL,
`created_at` datetime default NULL,
`updated_at` datetime default NULL,
`is_search_chunk` tinyint(1) default NULL,
`is_day` tinyint(1) default NULL,
PRIMARY KEY (`id`),
KEY `index_timespans_on_after_date` (`after_date`),
KEY `index_timespans_on_before_date` (`before_date`)
And here is the explain:
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9 | NULL | 84 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | reports | ALL | NULL | NULL | NULL | NULL | 183297 | Using where |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
And here is the explain after I create an index on authored_at. As you can see, the index is not actually getting used (I think...)
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9 | NULL | 86 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | reports | ALL | index_reports_on_authored_at | NULL | NULL | NULL | 183317 | Range checked for each record (index map: 0x8) |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
There are about 142k rows in the reports table, and far fewer in the timespans table.
The query is taking about 3 seconds now.
The strange thing is that if I add an index on reports.authored_at, it actually makes the query far slower, about 20 seconds. I would have thought it would do the opposite, since it would make it easy to find the reports at either end of the range, and throw the rest away, rather than having to examine all of them.
Can someone clarify? I'm stumped.
Instead of two separate indexes on the timespans table, try merging them into a single multi-column index covering after_date and before_date. Then add an index on reports.authored_at as well.
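A sketch of those indexes (the names are just examples; your second EXPLAIN suggests an index on authored_at was already created, in which case skip the second statement):
ALTER TABLE timespans ADD INDEX idx_timespans_after_before (after_date, before_date);
ALTER TABLE reports ADD INDEX index_reports_on_authored_at (authored_at);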
I would rewrite your query like this:
select t.id, count(*) as num
from timespans t
join reports r
where t.after_date >= '2011-04-13 22:08:38'
  and r.authored_at >= '2011-04-13 22:08:38'
  and r.authored_at >= t.after_date
  and r.authored_at < t.before_date
group by t.id order by null;
and change the indexes of the tables:
alter table reports add index authored_at_idx(authored_at);
You can use the database's partitioning feature on the after_date column; it would help a lot. (Note that MySQL requires the partitioning column to be part of every unique key on the table, so the primary key would need to include after_date.)