MySQL query optimization with BETWEEN or greater-than (>) conditions

Problem: slow query.
table1 has about 5,000 rows
table2 has about 50,000 rows
timestamp format is int(11)
MySQL - 20 seconds (with indexes)
PostgreSQL - 0.04 seconds (with indexes)
SELECT *
FROM table1
LEFT JOIN table2
ON table2_timestamp BETWEEN table1_timestamp - 500
AND table1_timestamp + 500;
Can anybody help me optimize this query for MySQL?
Explain:
1 SIMPLE a index a 9 2 Using index
1 SIMPLE b index b b 9 5 Using index
Tables:
CREATE TABLE `a` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`table1_timestamp` bigint(20) NULL DEFAULT NULL ,
PRIMARY KEY (`id`),
INDEX `a` (`table1_timestamp`) USING BTREE
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
AUTO_INCREMENT=3
ROW_FORMAT=COMPACT
;
CREATE TABLE `b` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`table2_timestamp` bigint(20) NULL DEFAULT NULL ,
PRIMARY KEY (`id`),
INDEX `a` (`table2_timestamp`) USING BTREE
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
AUTO_INCREMENT=3
ROW_FORMAT=COMPACT
;

A few points spring to mind, but they all feel like long shots. Realistically, it looks as though there shouldn't be much you can do to the query itself, assuming your example is an accurate representation.
1: You are using BIGINT, which has a maximum value of 9x10^18 (SIGNED). INT has a maximum value of 4x10^9 (UNSIGNED), while today's Unix timestamp is around 1.4x10^9 (all values approximate), so consider changing the data type of that column in both tables from BIGINT to INT UNSIGNED or DATETIME.
2: The ROW_FORMAT is COMPACT, which may cause issues with BTREE indexes (source). You are dealing with integer data types, so a ROW_FORMAT of FIXED would suffice; try changing to ROW_FORMAT=FIXED on both tables.
3: If you always expect rows to be returned from table2 for table1's rows, then an INNER JOIN would be more efficient than a LEFT JOIN.
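Applying suggestions 1 and 2 might look like the following sketch (my assumption: every stored timestamp fits in INT UNSIGNED, i.e. stays below roughly 4.3x10^9):
-- Suggestion 1: shrink the timestamp columns from 8-byte BIGINT to 4-byte INT UNSIGNED
ALTER TABLE `a` MODIFY `table1_timestamp` INT UNSIGNED NULL DEFAULT NULL;
ALTER TABLE `b` MODIFY `table2_timestamp` INT UNSIGNED NULL DEFAULT NULL;
-- Suggestion 2: request the FIXED row format
-- (caveat: FIXED is really a MyISAM feature; InnoDB may silently keep its default row format)
ALTER TABLE `a` ROW_FORMAT=FIXED;
ALTER TABLE `b` ROW_FORMAT=FIXED;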

Related

Design Database to store lists

I apologize for the ambiguity of the column and table names.
My database has two tables, A and B. It's a many-to-many relationship between these tables.
Table A has around 200 records
Table A structure:
Id | Definition
12 | Def1
42 | Def2
... etc.
Table B has around 5 billion records.
Column 1 | Associated Id (from table A)
abc | 12
abc | 21
pqr | 42
I am trying to optimize the way data is stored in table B, as it has a lot of redundant data. The structure I am thinking of is as follows:
Column 1 | Associated Ids
abc | 12, 21
pqr | 42
The "Associated Id" column can have updates when new rows are added to table A.
Is this a good structure to create in this scenario? If yes what should the column type be for the "Associated Id"? I am using mysql database.
Create table statements.
CREATE TABLE `A` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(100) DEFAULT NULL,
`name` varchar(100) DEFAULT NULL,
`creat_usr_id` varchar(20) NOT NULL,
`creat_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`modfd_usr_id` varchar(20) DEFAULT NULL,
`modfd_ts` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `A_ak1` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=277 DEFAULT CHARSET=utf8;
CREATE TABLE `B`(
`col1` varchar(128) NOT NULL,
`id` int(11) NOT NULL,
`added_dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`creat_usr_id` varchar(20) NOT NULL,
`creat_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`col1`,`id`,`added_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(added_dt))
(PARTITION Lessthan_2016 VALUES LESS THAN (1451606400) ENGINE = InnoDB,
PARTITION Lessthan_201603 VALUES LESS THAN (1456790400) ENGINE = InnoDB,
PARTITION Lessthan_201605 VALUES LESS THAN (1462060800) ENGINE = InnoDB,
PARTITION Lessthan_201607 VALUES LESS THAN (1467331200) ENGINE = InnoDB,
PARTITION Lessthan_201609 VALUES LESS THAN (1472688000) ENGINE = InnoDB,
PARTITION Lessthan_201611 VALUES LESS THAN (1477958400) ENGINE = InnoDB,
PARTITION Lessthan_201701 VALUES LESS THAN (1483228800) ENGINE = InnoDB,
PARTITION pfuture VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Indexes.
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Index_type
B 0 PRIMARY 1 col1 A 2 NULL NULL BTREE
B 0 PRIMARY 2 id A 6 NULL NULL BTREE
B 0 PRIMARY 3 added_dt A 6 NULL NULL BTREE
5 billion rows here. Let me walk through things:
col1 varchar(128) NOT NULL,
How often is this column repeated? That is, is it worth it to normalize it?
id int(11) NOT NULL,
Cut the size of this column in half (4 bytes -> 2), since you have only 200 distinct ids:
a_id SMALLINT UNSIGNED NOT NULL
Range of values: 0..65535
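A minimal sketch of that change (the rename to a_id is illustrative; note that rebuilding a multi-billion-row table is itself a heavy operation):
-- Shrink the 4-byte INT to a 2-byte SMALLINT UNSIGNED and rename it
ALTER TABLE `B` CHANGE `id` `a_id` SMALLINT UNSIGNED NOT NULL;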
added_dt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
Please explain why this is part of the PK. That is a rather odd thing to do.
creat_usr_id varchar(20) NOT NULL,
creat_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
Toss these as clutter, unless you can justify keeping track of 5 billion actions this way.
PRIMARY KEY (col1,id,added_dt)
I'll bet you will eventually get two rows in the same second. A PK is 'unique'. Perhaps you need only (col1, a_id)? Otherwise, you are allowing a col1-a_id pair to be added multiple times. Or maybe you want IODKU (INSERT ... ON DUPLICATE KEY UPDATE) to add a new row versus update the timestamp? See the sketch below.
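A sketch of the IODKU pattern, under the assumption that the PK is reduced to (col1, a_id) and that added_dt should record the latest occurrence:
-- Insert the pair, or just bump the timestamp if the pair already exists
INSERT INTO `B` (col1, a_id)
VALUES ('abc', 12)
ON DUPLICATE KEY UPDATE added_dt = CURRENT_TIMESTAMP;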
PARTITION...
This is useful if (and probably only if) you intend to remove 'old' rows; otherwise, please explain why you picked partitioning. A sketch of the purge pattern follows.
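If purging old rows is the intent, dropping a whole partition is a near-instant metadata operation, unlike a massive DELETE; a sketch against the partitions defined above:
-- Removes every row from before 2016 without scanning the table
ALTER TABLE `B` DROP PARTITION Lessthan_2016;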
It is hard to review a schema without seeing the main SELECTs. In the case of large tables, we should also review the INSERTs, UPDATEs, and DELETEs, since each of them could pose serious performance problems.
At 100 rows inserted per second, it will take more than a year to add 5B rows (5x10^9 rows / 100 rows/sec = 5x10^7 seconds, roughly 580 days). How fast will the rows be coming in? This may be a significant performance issue, too.

Reduce execution time of query

I want to reduce the time taken by this query in MySQL.
There are three tables, say:
A ~600K rows,
B ~2K rows,
C ~100K rows,
having 2 columns each.
A has one column which is used in aggregation and another to join with table B.
B has one column to join with A and another with C.
C has one column to join with B and another column to group by.
What should the indexing plan be to reduce the run time? As of now it is using temporary tables and then filesort. Is there any way we could avoid the temporary tables?
Sample query :
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
`category_groups` AS `category_groups`,
`revenue_facts` AS `revenue_facts`,
`dim_products` AS `dim_products`
WHERE
`dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk` AND
`revenue_facts`.`product_sk` = `dim_products`.`product_sk`
GROUP BY `category_groups`.`category_name`;
I already have indexes on the GROUP BY column and on the join columns.
My query is currently taking *6 minutes*. I want to reduce the time taken. The table structures are as follows:
Table A:
CREATE TABLE `revenue_facts` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_sk` bigint(20) unsigned NOT NULL,
`total_price` decimal(12,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `product_sk` (`product_sk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table B:
CREATE TABLE `dim_products` (
`product_sk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_category_group_sk` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`product_sk`),
KEY `product_category_group_index` (`product_category_group_sk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table C:
CREATE TABLE `category_groups` (
`product_category_group_sk` bigint(20) unsigned NOT NULL,
`category_sk` bigint(20) unsigned NOT NULL,
`category_name` varchar(255) NOT NULL,
PRIMARY KEY (`product_category_group_sk`,`category_sk`),
KEY `category_sk` (`category_sk`),
KEY `product_category_group_sk` (`product_category_group_sk`),
KEY `category_name` (`category_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Execution plan used is:
1 SIMPLE dim_products index PRIMARY,product_category_group_index product_category_group_index 8 NULL 651264 Using index; Using temporary; Using filesort
1 SIMPLE category_groups ref PRIMARY,category_sk,product_category_group_sk,category_name product_category_group_sk 8 etl_testing.dim_products.product_category_group_sk 4 Using index
1 SIMPLE revenue_facts ref product_sk product_sk 8 etl_testing.dim_products.product_sk 5 NULL
Try this:
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
(`dim_products` LEFT JOIN `category_groups` ON `dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk`)
LEFT JOIN `revenue_facts` ON `dim_products`.`product_sk` = `revenue_facts`.`product_sk`
GROUP BY `category_groups`.`category_name`;
Also, as Abdul said:
"Post your table structures and explain plan"

How to improve search performance in MySQL

I have a table that contains two bigint columns, beginNumber and endNumber, defined as UNIQUE. The ID is the primary key.
ID | beginNumber | endNumber | Name | Criteria
The second table contains a number. I want to retrieve the record from table1 when the Number from table2 is found to be between any two numbers. This is the query:
select distinct t1.Name, t1.Criteria
from t1
where t2.Number
BETWEEN t1.beginNumber AND t1.endNumber
The query is taking too much time as I have so many records. I don't have experience with databases, but I read that indexing a table improves searches, so that MySQL does not have to pass through every row searching for the Number, and that this can be done, for example, by having UNIQUE values. I made beginNumber and endNumber in table1 UNIQUE. Is this all I can do? Is there any possible way to improve the time? Please provide detailed answers.
EDIT:
table1:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`beginNumber` bigint(20) DEFAULT NULL,
`endNumber` bigint(20) DEFAULT NULL,
`Name` varchar(255) DEFAULT NULL,
`Criteria` varchar(455) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `beginNumber_UNIQUE` (`beginNumber`),
UNIQUE KEY `endNumber_UNIQUE` (`endNumber`)
) ENGINE=InnoDB AUTO_INCREMENT=327 DEFAULT CHARSET=utf8
table2:
CREATE TABLE `t2` (
`id2` int(11) NOT NULL AUTO_INCREMENT,
`description` varchar(255) DEFAULT NULL,
`Number` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id2`),
UNIQUE KEY `description_UNIQUE` (`description`)
) ENGINE=InnoDB AUTO_INCREMENT=433 DEFAULT CHARSET=utf8
This is a toy example of the tables but it shows the concerned part.
I'd suggest an index on t2.Number like this:
ALTER TABLE t2 ADD INDEX numindex(Number);
Your query won't work as written because it won't know which t2 to use. Try this:
SELECT DISTINCT t1.Name, t1.Criteria
FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE t2.Number BETWEEN t1.beginNumber AND t1.endNumber);
Without the t2.Number index EXPLAIN gives this query plan:
1 PRIMARY t1 ALL 1 Using where; Using temporary
2 DEPENDENT SUBQUERY t2 ALL 1 Using where
With an index on t2.Number, you get this plan:
1 PRIMARY t1 ALL 1 Using where; Using temporary
2 DEPENDENT SUBQUERY t2 index numindex numindex 9 1 Using where; Using index
The important part to understand is that an ALL comparison is slower than an index comparison.
This is a good place to use a BTREE index. BTREE indexes are best when you often sort or use BETWEEN on a column. (Note: InnoDB indexes are BTREE by default; HASH indexes are only available for the MEMORY engine.)
CREATE INDEX index_name
ON table_name (column_name)
USING BTREE;

MySQL optimization - large table joins

To start out here is a simplified version of the tables involved.
tbl_map has approx 4,000,000 rows, tbl_1 has approx 120 rows, and tbl_2 contains approx 5,000,000 rows. I know the data shouldn't be considered that large given that Google, Yahoo!, etc. use much larger datasets, so I'm just assuming that I'm missing something.
CREATE TABLE `tbl_map` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`tbl_1_id` bigint(20) DEFAULT '-1',
`tbl_2_id` bigint(20) DEFAULT '-1',
`rating` decimal(3,3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tbl_1_id` (`tbl_1_id`),
KEY `tbl_2_id` (`tbl_2_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_2` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The query of interest is below (also, instead of ORDER BY RAND(), it is ORDER BY t.id DESC). The query takes as much as 5-10 seconds and causes a considerable wait when users view this page.
EXPLAIN SELECT t.data, t.id , tm.rating
FROM tbl_2 AS t
JOIN tbl_map AS tm
ON t.id = tm.tbl_2_id
WHERE tm.tbl_1_id =94
AND tm.rating IS NOT NULL
ORDER BY t.id DESC
LIMIT 200
1 SIMPLE tm ref tbl_1_id, tbl_2_id tbl_1_id 9 const 703438 Using where; Using temporary; Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 8 tm.tbl_2_id 1
I would just liked to speed up the query, ensure that I have proper indexes, etc.
I appreciate any advice from DB Gurus out there! Thanks.
SUGGESTION: Index the table as follows:
ALTER TABLE tbl_map ADD INDEX (tbl_1_id,rating,tbl_2_id);
As per Rolando, yes, you definitely need an index on the map table, but I would expand it to ALSO include tbl_2_id, which serves your ORDER BY clause on table 2's id (the same value is in the map table, so just use that index). Also, since the index now holds all 3 fields, keyed on tbl_1_id and the null-or-not criterion on rating, the 3rd column is already in order for your ORDER BY clause.
INDEX (tbl_1_id,rating, tbl_2_id);
Then, I would just have the query as
SELECT STRAIGHT_JOIN
t.data,
t.id ,
tm.rating
FROM
tbl_map tm
join tbl_2 t
on tm.tbl_2_id = t.id
WHERE
tm.tbl_1_id = 94
AND tm.rating IS NOT NULL
ORDER BY
tm.tbl_2_id DESC
LIMIT 200

Why would this query run so slow?

I have two MySQL tables, say A and B. A contains just one varchar column (let's call it A1) with about 23,000 records. Table B (70,000 records) has some more columns, one of them corresponding with A1 from table A (let's call that one B1). I want to know which values in A are not in the corresponding column in B, so I use:
SELECT A1
FROM A
LEFT JOIN B
ON A1 = B1
WHERE B1 IS NULL
Both columns A1 and B1 have indices defined on them. Still, this query runs very slowly. I've run EXPLAIN; this is the output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE A index \N PRIMARY 767 \N 23269 Using index
1 SIMPLE B ALL \N \N \N \N 70041 Using where; Not exists
UPDATE: SHOW CREATE TABLE for both tables (changed the original names):
CREATE TABLE `A` (
`A1` varchar(255) NOT NULL,
PRIMARY KEY (`A1`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `B` (
`col1` int(10) unsigned NOT NULL auto_increment,
`col2` datetime NOT NULL,
`col3` datetime default NULL,
`col4` datetime NOT NULL,
`col5` varchar(30) NOT NULL,
`col6` int(10) default NULL,
`col7` int(11) default NULL,
`col8` varchar(20) NOT NULL,
`B1` varchar(255) default NULL,
`col10` tinyint(1) NOT NULL,
`col11` varchar(255) default NULL,
PRIMARY KEY (`col1`),
KEY `NewIndex1` (`B1`)
) ENGINE=MyISAM AUTO_INCREMENT=70764 DEFAULT CHARSET=latin1
Another edit: data_length and index_length from SHOW TABLE STATUS:
table data_length index_length
A 465380 435200
B 5177996 1344512
The character sets of the two columns that you are comparing in an OUTER JOIN differ. I was not sure whether this is the cause, so I tested and got these results:
SELECT A1
FROM A
LEFT JOIN B ON A1 = B1
WHERE B1 IS NULL
-- Table A..: 23258 rows, collation = utf8_general_ci
-- Table B..: 70041 rows, collation = latin1_swedish_ci
-- Time ....: I CANCELLED THE QUERY AFTER 20 MINUTES
-- Table A..: 23258 rows, collation = latin1_swedish_ci
-- Table B..: 70041 rows, collation = latin1_swedish_ci
-- Time ....: 0.187 sec
-- Table A..: 23258 rows, collation = utf8_general_ci
-- Table B..: 70041 rows, collation = utf8_general_ci
-- Time ....: 0.344 sec
Solution: make the character sets of the two tables (or at least of the two columns) the same.
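A sketch of that conversion (assuming latin1 can represent all of A's data; converting B to utf8 instead would work just as well):
-- Align A's character set with B's so the join can compare the keys directly
ALTER TABLE `A` CONVERT TO CHARACTER SET latin1 COLLATE latin1_swedish_ci;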
This query will scan all rows of table A, but if you have an index on B1 then most likely it will not scan table B:
select A1
from A
where not exists (
select *
from B
where B.B1 = A.A1
)
Before running this or your original query you may try to run ANALYZE TABLE in order to update key distribution information for those tables:
ANALYZE TABLE A, B
If this doesn't help then you can try to play with indexes, for instance:
select A1
from A ignore index (PRIMARY)
where not exists (
select *
from B force index (NewIndex1)
where B.B1 = A.A1
)
It seems A1 and B1 are large fields. You created indices for both A1 and B1; make sure that they are indexed!
SELECT A1
FROM A
WHERE A1 NOT IN (
SELECT B1 FROM B WHERE B1 IS NOT NULL -- B1 is nullable; NULLs inside NOT IN would make the outer query return no rows
);
If I use your CREATE TABLE statements and run an EXPLAIN on your SELECT statement, I get this result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE A index NULL PRIMARY 767 NULL 2 Using index
1 SIMPLE B index NULL NewIndex1 258 NULL 4 Using where; Using index
On my MySQL version (5.1.41) the index is used as expected, so I think this might be a bug that has already been fixed in MySQL, assuming your index is set up as in the CREATE TABLE statement you posted. Which MySQL version do you use?
Try this query:
SELECT B1
FROM B
WHERE B1 NOT IN (
SELECT A1
FROM A
)