Two WHERE conditions slower than separate queries - MySQL

I'm using Google Cloud SQL (the micro server version) to run a couple of performance tests.
I want to do the following query:
select count(*) from table where A = valueA and B like "%input_string%";
+----------+
| count(*) |
+----------+
| 512997 |
+----------+
1 row in set (9.64 sec)
If I run them separately, I get:
select count(*) from table where A = valueA;
+----------+
| count(*) |
+----------+
| 512998 |
+----------+
1 row in set (0.18 sec)
select count(*) from table where B like "%input_string%";
+----------+
| count(*) |
+----------+
| 512997 |
+----------+
1 row in set (1.43 sec)
How is that difference in performance possible?
Both the A and B columns have indexes, as they are used to order tables in a web application.
Thanks!
EDIT:
table schema
table | CREATE TABLE `table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`A` varchar(9) DEFAULT NULL,
`B` varchar(50) DEFAULT NULL,
`C` varchar(10) DEFAULT NULL,
`D` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `A` (`A`),
KEY `B` (`B`)
) ENGINE=InnoDB AUTO_INCREMENT=512999 DEFAULT CHARSET=utf8

An option might be to use a FULLTEXT INDEX and query it with MATCH().
CREATE TABLE `table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`A` varchar(9) DEFAULT NULL,
`B` varchar(50) DEFAULT NULL,
`C` varchar(10) DEFAULT NULL,
`D` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY(A),
FULLTEXT INDEX(B)
) ENGINE=InnoDB AUTO_INCREMENT=512999 DEFAULT CHARSET=utf8
And a query rewrite
SELECT
count(*)
FROM
`table`
WHERE
A = 'A'
AND
B IN (
SELECT
B
FROM
`table`
WHERE
MATCH(B) AGAINST('+input_string' IN BOOLEAN MODE)
)
The inner SQL will filter down the possible results based on the FULLTEXT index.
And the outer SQL will do the other filtering.
You could also use a UNION ALL, now that I think about it.
It should work with this question's CREATE TABLE statement.
The general idea is to get two counts, one for each filter, and pick the lowest as the valid count.
Query
SELECT
MIN(counted) AS 'COUNT(*)' # Result 512997
FROM (
select count(*) AS counted from `table` where A = 'A' # Result 512998
UNION ALL
select count(*) from `table` where B like "%input_string%" # Result 512997
) AS counts

Did you run each timing twice? If not, there could be caching involved that confuses you.
where A = valueA and B like "%input_string%"; begs for INDEX(A, B). Note: That composite index is not equivalent to your two separate indexes.
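A minimal sketch of adding that composite index (the index name is illustrative):
ALTER TABLE `table` ADD INDEX idx_a_b (A, B);
Because the COUNT(*) only needs columns A and B, the query can then be answered entirely from this index: MySQL narrows to A = valueA via the index and checks the LIKE against the B values stored there, instead of visiting the table rows.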
If you go with a FULLTEXT index on B, then this would be simpler:
SELECT COUNT(*) FROM t
WHERE MATCH(B) AGAINST('+input_string' IN BOOLEAN MODE)
AND A = valueA
(The use of a subquery should be unnecessary and slower.)
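If you would rather add the FULLTEXT index to the existing table than recreate it, something like this should do (a sketch; InnoDB has supported FULLTEXT indexes since MySQL 5.6):
ALTER TABLE `table` ADD FULLTEXT INDEX ft_b (B);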

Related

MySQL Search for duplicate of a one to many relationship

I need to find the most efficient way in MySQL to compare two different instances of a one-to-many relationship. Take this table:
CREATE TABLE `Table` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`ParentID` int(11) NOT NULL,
`ChildID` int(11) NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `pach` (`ParentID`,`ChildID`),
KEY `ParentID` (`ParentID`),
KEY `ChildID` (`ChildID`)
) ENGINE=InnoDB AUTO_INCREMENT=1;
insert into `Table` (`ID`,`ParentID`,`ChildID`) values
(1,1,1),
(2,1,2),
(3,1,3),
(4,1,4),
(5,2,1),
(6,2,3),
(7,3,1),
(8,3,3),
(9,3,4),
(10,4,1),
(11,4,4),
(12,4,3);
ParentID 3 has an identical set of children to ParentID 4, and that's what I need my query to be able to identify: given ParentID=4, return ParentID 3 because it has exactly the same children.
So far the only thing I can come up with is a very ugly group_concat query (see below). What would be a better approach to solve this problem?
select distinct(b.ParentID)
from `Table` a, `Table` b where
(select group_concat(ChildID order by ParentID asc) from `Table` where ParentID=a.ParentID )
=
(select group_concat(ChildID order by ParentID asc) from `Table` where ParentID=b.ParentID )
and b.ParentID!=a.ParentID
and a.parentID=4;
+----------+
| ParentID |
+----------+
| 3 |
+----------+
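Not from the original thread, but one possible alternative (a sketch) avoids group_concat entirely: join the candidate parents' child rows to the given parent's child rows and require the number of matches to equal both parents' child counts.
SELECT t2.ParentID
FROM `Table` t1
JOIN `Table` t2
  ON t2.ChildID = t1.ChildID     # same child
 AND t2.ParentID <> t1.ParentID  # different parent
WHERE t1.ParentID = 4
GROUP BY t2.ParentID
HAVING COUNT(*) = (SELECT COUNT(*) FROM `Table` WHERE ParentID = 4)            # every child of parent 4 is matched
   AND COUNT(*) = (SELECT COUNT(*) FROM `Table` WHERE ParentID = t2.ParentID); # and the candidate has no extra children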

MySql group by optimization - avoid tmp table and/or filesort

I have a slow query; without the GROUP BY it is fast (0.1-0.3 seconds), but with the (required) GROUP BY the duration is around 10-15 seconds.
The query joins two tables, events (nearly 50 million rows) and events_locations (5 million rows).
Query:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I have already tried:
Creating an index on (el.some_id, el.group_alias) (sketched as SQL below)
Decreasing the varchar size to 20
Increasing the size of sort_buffer_size and read_rnd_buffer_size
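For reference, that first index was created with something like this (the index name is illustrative):
ALTER TABLE events_locations ADD INDEX idx_someid_groupalias (some_id, group_alias);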
Any suggestions for performance tuning would be much appreciated!
In your case the events table has an index on time_stamp. So, before joining the two tables, first select the required records from the events table for the specific date range, with the required details. Then join events_locations using the relationship between the tables.
Use MySQL's EXPLAIN keyword to check how your query approaches the table records. It will tell you how many rows are scanned before the required records are selected.
The number of rows scanned also contributes to query execution time. Use the logic below to reduce the number of rows that are scanned.
SELECT
`e`.`id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`entity_id` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
The relationship between these tables is 1:1, so I asked myself why a GROUP BY was required, and I found some duplicated rows: 200 out of 50,000. So, somehow, my system is inserting duplicates, and someone (years ago) added that GROUP BY instead of looking for the bug.
So, I will mark this as solved, more or less...
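For anyone checking for the same issue, here is a sketch of how such duplicates might be located; it assumes the duplication is on event_id in events_locations, which is an assumption rather than something stated above:
SELECT event_id, COUNT(*) AS copies
FROM events_locations
GROUP BY event_id
HAVING COUNT(*) > 1;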

MySQL 5.6 InnoDB count, group by, having and order by return wrong result

I have a problem with a big query but tried to simplify it and found similar strange behaviour:
select concat(a.col1,a.col2) as b,
count(a.id) as c
from test as a
group by a.id
having b = "644591"
order by b
The same query returns no results on 5.6 InnoDB but 5.5 MyISAM returns one correct match.
If you remove the "order by b", it returns the correct result on InnoDB too.
Table:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`col1` varchar(100) NOT NULL DEFAULT '',
`col2` varchar(100) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB
id | col1     | col2
1  |          | 644591
2  | 70083531 | 1226109
How about...
select concat(a.col1,a.col2) as b,
count(a.id) as c
from test a
where concat(a.col1,a.col2) = '644591'
group by concat(a.col1,a.col2)
order by b;

How can I optimize MySQL query for update?

I have a table with 300,000 records. The table has duplicate rows, and I want to update the "flag" column.
TABLE
------------------------------------
|number | flag | ... more column ...|
------------------------------------
|ABCD | 0 | ...................|
|ABCD | 0 | ...................|
|ABCD | 0 | ...................|
|BCDE | 0 | ...................|
|BCDE | 0 | ...................|
I use this query to update the "flag" column:
UPDATE table i
INNER JOIN (SELECT number FROM table
GROUP BY number HAVING count(number) > 1 ) i2
ON i.number = i2.number
SET i.flag = '1'
This query runs very, very slowly (more than 600 seconds) for these 300,000 records.
How can I optimize this query?
STRUCTURE OF MY TABLE
CREATE TABLE IF NOT EXISTS `inv` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`pn` varchar(10) NOT NULL COMMENT 'Part Number',
`qty` int(5) NOT NULL,
`qty_old` int(5) NOT NULL,
`flag_qty` tinyint(1) NOT NULL,
`name` varchar(60) NOT NULL,
`vid` int(11) NOT NULL ,
`flag_d` tinyint(1) NOT NULL ,
`flag_u` tinyint(1) NOT NULL ,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `pn` (`pn`),
KEY `name` (`name`),
KEY `vid` (`vid`),
KEY `pn_2` (`pn`),
KEY `flag_qty` (`flag_qty`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=0 ;
If "name" is duplicate I want to update flag_qty
If you do not already have an index on number you should add one -
CREATE INDEX table_number ON table (number);
UPDATE: Try this -
UPDATE inv t1
INNER JOIN inv t2
ON t1.name = t2.name
AND t1.id <> t2.id
SET t1.flag_qty = 1;
You can create your table with just the duplicates by selecting this data directly into another table instead of doing this flag update first.
INSERT INTO duplicate_invs
SELECT DISTINCT inv1.*
FROM inv AS inv1
INNER JOIN inv AS inv2
ON inv1.name = inv2.name
AND inv1.id < inv2.id
If you can explain the logic for which rows get deleted from the inv table, it may be that the whole process can be done in one step.
Get MySQL to EXPLAIN the query to you. Then you will see what indexing would improve things.
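A sketch of what that could look like here; older MySQL versions cannot EXPLAIN an UPDATE directly, so this explains the equivalent SELECT, using the inv schema and the name column from the question's edit:
EXPLAIN SELECT i.id
FROM inv i
INNER JOIN (SELECT name FROM inv GROUP BY name HAVING COUNT(*) > 1) dup
        ON i.name = dup.name;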
EXPLAIN will show you where it is slow, and here are some ideas for how to improve performance:
Add indexing
Use InnoDB foreign keys
Split the query into two and process them separately in the language you use.
Write the same idea as a MySQL procedure (not sure whether this would be fast).
I would use a temp table. 1) Select all relevant records into a temp table and set an INDEX on id. 2) Update the table using something like this:
UPDATE table i, tmp_i
SET i.flag = '1'
WHERE i.id = tmp_i.id
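For completeness, step 1 might look roughly like this (a sketch using the same placeholder names as above; it assumes the table has an id primary key, like the inv table in the question):
CREATE TEMPORARY TABLE tmp_i (id INT NOT NULL, PRIMARY KEY (id)); # indexed on id, as described in step 1
INSERT INTO tmp_i
SELECT id FROM `table`
WHERE number IN (SELECT number FROM `table` GROUP BY number HAVING count(number) > 1);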
You can try this (assuming VB.NET, but it can be implemented in any language).
Dim ids As String = Cmd.ExecuteScalar("select group_concat(number) from (SELECT number FROM `table` GROUP BY number HAVING count(number) > 1) AS t")
After you get the list of IDs (comma-delimited), then use:
UPDATE `table` i
SET i.flag = '1'
WHERE i.number IN ( .... )
It can also be slow, but the first step - the SELECT - will not lock up your database, replication, etc., and the UPDATE will be faster.

Optimizing slow query in MySql InnoDB Database

I have a MySQL database with a query that is running really slowly. I'm trying the best I can to make it perform better, but I can't see what I'm doing wrong here. Maybe you can?
CREATE TABLE `tablea` (
`a` int(11) NOT NULL auto_increment,
`d` mediumint(9) default NULL,
`c` int(11) default NULL,
PRIMARY KEY (`a`),
KEY `d` USING BTREE (`d`)
) ENGINE=InnoDB AUTO_INCREMENT=1867710 DEFAULT CHARSET=utf8;
CREATE TABLE `tableb` (
`b` int(11) NOT NULL auto_increment,
`d` mediumint(9) default '1',
`c` int(10) NOT NULL,
`e` mediumint(9) default NULL,
PRIMARY KEY (`b`),
KEY `c` (`c`),
KEY `d` (`d`)
) ENGINE=InnoDB AUTO_INCREMENT=848150 DEFAULT CHARSET=utf8;
The query:
SELECT tablea.a, tableb.e
FROM tablea
INNER JOIN tableb ON (tablea.c=tableb.c) AND (tablea.d=tableb.d OR tableb.d=1)
WHERE tablea.d=100
This query takes like 10 seconds to run if tablea.d=100 gives 1500 rows and (tablea.d=tableb.d OR tableb.d=1) gives 1600 rows. This seems really slow. I need to make it much faster but I can't see what I'm doing wrong.
MySql EXPLAIN outputs:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tablea ref d d 4 const 1092 Using where
1 SIMPLE tableb ref c,d c 4 tablea.c 1 Using where
If I am not confused by the OR, the query is equivalent to:
SELECT tablea.a, tableb.e
FROM tablea
INNER JOIN tableb
ON tablea.c = tableb.c
WHERE tablea.d = 100
AND tableb.d IN (1,100)
Try it (using EXPLAIN) with various indexes. An index on the d field (in both tables) would help. Perhaps more, an index on (d,c).
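For example, a sketch of adding the suggested (d,c) index to both tables (the index names are illustrative):
ALTER TABLE tablea ADD INDEX idx_d_c (d, c);
ALTER TABLE tableb ADD INDEX idx_d_c (d, c);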