MySQL optimization - large table joins

MySQL optimization - large table joins - mysql

To start out here is a simplified version of the tables involved.
tbl_map has approx 4,000,000 rows, tbl_1 has approx 120 rows, tbl_2 contains approx 5,000,000 rows. I know the data shouldn't be consider that large given that Google, Yahoo!, etc use much larger datasets. So I'm just assuming that I'm missing something.
CREATE TABLE `tbl_map` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`tbl_1_id` bigint(20) DEFAULT '-1',
`tbl_2_id` bigint(20) DEFAULT '-1',
`rating` decimal(3,3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tbl_1_id` (`tbl_1_id`),
KEY `tbl_2_id` (`tbl_2_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_2` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The Query in interest: also, instead of ORDER BY RAND(), ORDERY BY t.id DESC. The query is taking as much as 5~10 seconds and causes a considerable wait when users view this page.
EXPLAIN SELECT t.data, t.id , tm.rating
FROM tbl_2 AS t
JOIN tbl_map AS tm
ON t.id = tm.tbl_2_id
WHERE tm.tbl_1_id =94
AND tm.rating IS NOT NULL
ORDER BY t.id DESC
LIMIT 200
1 SIMPLE tm ref tbl_1_id, tbl_2_id tbl_1_id 9 const 703438 Using where; Using temporary; Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 8 tm.tbl_2_id 1
I would just liked to speed up the query, ensure that I have proper indexes, etc.
I appreciate any advice from DB Gurus out there! Thanks.

SUGGESTION : Index the table as follows:
ALTER TABLE tbl_map ADD INDEX (tbl_1_id,rating,tbl_2_id);

As per Rolando, yes, you definitely need an index on the map table but I would expand to ALSO include the tbl_2_id which is for your ORDER BY clause of Table 2's ID (which is in the same table as the map, so just use that index. Also, since the index now holds all 3 fields, and is based on the ID of the key search and criteria of null (or not) of rating, the 3rd element has them already in order for your ORDER BY clause.
INDEX (tbl_1_id,rating, tbl_2_id);
Then, I would just have the query as
SELECT STRAIGHT_JOIN
t.data,
t.id ,
tm.rating
FROM
tbl_map tm
join tbl_2 t
on tm.tbl_2_id = t.id
WHERE
tm.tbl_1_id = 94
AND tm.rating IS NOT NULL
ORDER BY
tm.tbl_2_id DESC
LIMIT 200

Related

Mysql EXISTS vs IN slow performance

I have posts and websites (and connecting post_websites). Each post can be on multiple websites, and some websites share the content, so I am trying to access the posts which are attached to particular website IDs.
Most of the cases WHERE IN works fine, but not for all websites, some of them are laggy, and I can't understand a difference.
SELECT *
FROM `posts`
WHERE `posts`.`id` IN (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
)
) AND
`status` = 1 AND
`posts`.`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
Explain
select_type
table
type
key
key_len
ref
rows
Extra
SIMPLE
post_websites
range
post_websites_website_id_index
4
NULL
440
Using index condition; Using temporary; Using filesort; Start temporary
SIMPLE
posts
eq_ref
PRIMARY
4
post_websites.post_id
1
Using where; End temporary
Other version with EXISTS
SELECT *
FROM `posts`
WHERE EXISTS (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
) AND
`posts`.`id` = `post_websites`.`post_id`
) AND
`status` = 1 AND
`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
EXPLAIN:
select_type
table
type
key
key_len
ref
rows
Extra
PRIMARY
posts
index
post_date_index
5
NULL
12
Using where
DEPENDENT SUBQUERY
post_websites
ref
post_id_website_id_unique
4
post.id
1
Using where; Using index
Long story short: based on different amounts of posts on each site and amount of websites sharing content the results are different from 20ms to 50s!
Based on the EXPLAIN the EXISTS works better, but on practice when the amount of data in subquery is lower, it can be very slow.
Is there a query I am missing that could work like a charm for all cases? Or should I check something before querying and choose the method of doing so dynamically?
migrations:
CREATE TABLE `posts` (
`id` int(10) UNSIGNED NOT NULL,
`title` varchar(225) COLLATE utf8_unicode_ci NOT NULL,
`description` varchar(500) COLLATE utf8_unicode_ci NOT NULL,
`post_date` timestamp NULL DEFAULT NULL,
`status` tinyint(4) NOT NULL DEFAULT '1',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `posts`
ADD PRIMARY KEY (`id`),
ADD KEY `created_at_index` (`created_at`) USING BTREE,
ADD KEY `status_deleted_at_index` (`status`,`deleted_at`) USING BTREE,
ADD KEY `post_date_index` (`post_date`) USING BTREE,
ADD KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
CREATE TABLE `post_websites` (
`post_id` int(10) UNSIGNED NOT NULL,
`website_id` int(10) UNSIGNED NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `post_websites`
ADD PRIMARY KEY (`website_id`, `post_id`),
ADD UNIQUE KEY `post_id_website_id_unique` (`post_id`,`website_id`),
ADD KEY `website_id_index` (`website_id`),
ADD KEY `post_id_index` (`post_id`);
eloquent:
$news = Post::select(['title', 'description'])
->where('status', 1)
->whereExists(
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('websites_id', $sites)
->whereRaw('post_websites.post_id = posts.id');
})
->orderBy('post_date', 'desc');
->limit(6)
->get();
or
$q->whereIn('posts.id',
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('website_id', $sites);
});
Thanks.

Many:many table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
That says to get rid if id (because it slows things down), promote that UNIQUE to be the PK, and add an INDEX in the opposite direction.
Don't use IN ( SELECT ... ). A simple JOIN is probably the best alternative here.
Did some 3rd party package provide those 3 TIMESTAMPs for each table? Are they ever used? Get rid of them.
KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
is mostly backward. Some rules:
Don't start an index with the PRIMARY KEY column(s).
Do start an index with = tests: status,deleted_at

Mysql GROUP BY really slow on a view

I have these tables.
CREATE TABLE `movements` (
`movementId` mediumint(8) UNSIGNED NOT NULL,
`movementType` tinyint(3) UNSIGNED NOT NULL,
`deleted` tinyint(1) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `movements`
ADD PRIMARY KEY (`movementId`),
ADD KEY `movementType` (`movementType`) USING BTREE,
ADD KEY `deleted` (`deleted`),
ADD KEY `movementId` (`movementId`,`deleted`);
CREATE TABLE `movements_items` (
`movementId` mediumint(8) UNSIGNED NOT NULL,
`itemId` mediumint(8) UNSIGNED NOT NULL,
`qty` decimal(10,3) UNSIGNED NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `movements_items`
ADD KEY `movementId` (`movementId`),
ADD KEY `itemId` (`itemId`),
ADD KEY `movementId_2` (`movementId`,`itemId`);
and this view called "movements_items_view".
SELECT
movements_items.itemId, movements_items.qty,
movements.movementId, movements.movementType
FROM movements_items
JOIN movements ON (movements.movementId=movements_items.movementId
AND movements.deleted=0)
The first table has 5913 rows, the second one has 144992.
The view is very fast, it loads 20 result in PhpMyAdmin in 0.0011s but as soon as I ask for a GROUP BY on it (I need it to do statistics with SUM()) es:
SELECT * FROM movements_items_view GROUP BY itemId LIMIT 0,20
time jumps to 0.2s or more and it causes "Using where; Using temporary; Using filesort" on movements join.
Any help appreciated, thanks.
EDIT:
I also run via phpMyAdmin this query to try to not use the view:
SELECT movements.movementId, movements.movementType, movements_items.qty
FROM movements_items
JOIN movements ON movements.movementId=movements_items.movementId
GROUP BY itemId LIMIT 0,20
And the performance is the same.
Edit. Here is the EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE movements index PRIMARY,movementid movement_type 1 NULL 5913 Using index; Using temporary; Using filesort
1 SIMPLE movements_items ref movementId,itemId,movementId_2 movementId_2 3 movements.movementId 12 Using index

Turn it inside out. See if this works:
SELECT movementId, m.movementType, mi.qty
FROM
( SELECT movementId, qty
FROM movements_items
GROUP BY itemId
ORDER BY itemId
LIMIT 20
) AS mi
JOIN movements AS m USING(movementId)
The trick is to do the LIMIT sooner. The original way had all the data being hauled around, not just 20 rows.
In movements_items, is no column or combination of columns 'unique'? If so, make it the PRIMARY KEY.
In movement, KEY movementId (movementId,deleted) is redundant and should be dropped.
In movement_items, KEY movementId (movementId) is redundant and should be dropped.

Reduce execution time of query

I want to reduce the time taken by the query in mysql.
There are three tables say
A ~600k rows,
B ~2K rows,
C ~100K rows
having 2 columns each.
A has one column which is used in aggregation and other to join with table B.
B has one column to join with A and other with C
C has one column to join with B and other column to group by.
What should be the indexing plan to reduce the run time. As of now it is using temporary tables and then file sort. Is there any way we could avoid temporary tables.
Sample query :
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
`category_groups` AS `category_groups`,
`revenue_facts` AS `revenue_facts`,
`dim_products` AS `dim_products`
WHERE
`dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk` AND
`revenue_facts`.`product_sk` = `dim_products`.`product_sk`
GROUP BY `category_groups`.`category_name`;
I already have indexes on group by column and the columns in join.
my query is currently taking *6 minute*s. I want to reduce the time taken. table structure is as
table A :
CREATE TABLE `revenue_facts` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_sk` bigint(20) unsigned NOT NULL,
`total_price` decimal(12,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `product_sk` (`product_sk`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table B :
CREATE TABLE `dim_products` (
`product_sk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_category_group_sk` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`product_sk`),
KEY `product_id` (`product_id`),
KEY (`product_sk`) (`product_sk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
table C :
CREATE TABLE `category_groups` (
`product_category_group_sk` bigint(20) unsigned NOT NULL,
`category_sk` bigint(20) unsigned NOT NULL,
`category_name` varchar(255) NOT NULL,
PRIMARY KEY (`product_category_group_sk`,`category_sk`),
KEY `category_sk` (`category_sk`),
KEY `product_category_group_sk` (`product_category_group_sk`
KEY `category_sk` (`category_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Execution plan used is:
1 SIMPLE dim_products index PRIMARY,product_category_group_index product_category_group_index 8 NULL 651264 Using index; Using temporary; Using filesort
1 SIMPLE category_groups ref PRIMARY,category_sk,product_category_group_sk,category_name product_category_group_sk 8 etl_testing.dim_products.product_category_group_sk 4 Using index
1 SIMPLE revenue_facts ref product_sk product_sk 8 etl_testing..dim_products.product_sk 5 NULL

Try this:
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
(`dim_products` LEFT JOIN `category_groups` ON `dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk`)
LEFT JOIN `revenue_facts` ON `dim_products`.`product_sk` = `revenue_facts`.`product_sk`
GROUP BY `category_groups`.`category_name`;
Also, as Abdul said:
"Post your table structures and explain plan"

MySQL query optimization with between or larger than > condition

Problem: slow query.
table1 has about 5 000 rows
table2 has about 50 000 rows
timestamp format is int(11)
MySQL - 20 seconds (with indexes)
PostgreSQL - 0,04 seconds (with indexes)
SELECT *
FROM table1
LEFT JOIN table2
ON table2_timestamp BETWEEN table1_timestamp - 500
AND table1_timestamp + 500 ;
Can anybody help me with optimize this query for MySQL?
Explain:
1 SIMPLE a index a 9 2 Using index
1 SIMPLE b index b b 9 5 Using index
Tables:
CREATE TABLE `a` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`table1_timestamp` bigint(20) NULL DEFAULT NULL ,
PRIMARY KEY (`id`),
INDEX `a` (`table1_timestamp`) USING BTREE
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
AUTO_INCREMENT=3
ROW_FORMAT=COMPACT
;
CREATE TABLE `b` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`table2_timestamp` bigint(20) NULL DEFAULT NULL ,
PRIMARY KEY (`id`),
INDEX `a` (`table2_timestamp`) USING BTREE
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
AUTO_INCREMENT=3
ROW_FORMAT=COMPACT
;

A couple of points spring to mind but both feel like long-shots. Realistically it looks as though there shouldn't be much you can do to your query assuming your example is an accurate representation.
1 : You are using BIGINT which has a maximum value of 9x10^18 (SIGNED). INT has a max value of 4x10^9 (UNSIGNED), compared to days timestamp which is around 1.4x10^9 (all values approximate) and so consider changing the data type of that column in both tables from BIGINT to INT UNSIGNED or DATETIME
2 : The ROW_FORMAT is COMPACT which may cause issues with BTREE indexes (source). You are dealing with INT data types and so a ROW_FORMAT of FIXED would suffice so try changing to ROW_FORMAT=FIXED on both tables
3 : If always expecting rows to be returned from table2 for table1 rows then INNER JOIN would be more efficient than LEFT JOIN

How to improve search performance in MySQL

I have a table that contains two bigint columns: beginNumber, endNumber, defined as UNIQUE. The ID is the Primary Key.
ID | beginNumber | endNumber | Name | Criteria
The second table contains a number. I want to retrieve the record from table1 when the Number from table2 is found to be between any two numbers. The is the query:
select distinct t1.Name, t1.Country
from t1
where t2.Number
BETWEEN t1.beginIpNum AND t1.endNumber
The query is taking too much time as I have so many records. I don't have experience in DB. But, I read that indexing the table will improve the search so MySQL does not have to pass through every row searching about m Number and this can be done by, for example, having UNIQE values. I made the beginNumber & endNumber in table1 as UNIQUE. Is this all what I can do ? Is there any possible way to improve the time ? Please, provide detailed answers.
EDIT:
table1:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`beginNumber` bigint(20) DEFAULT NULL,
`endNumber` bigint(20) DEFAULT NULL,
`Name` varchar(255) DEFAULT NULL,
`Criteria` varchar(455) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `beginNumber_UNIQUE` (`beginNumber`),
UNIQUE KEY `endNumber_UNIQUE` (`endNumber `)
) ENGINE=InnoDB AUTO_INCREMENT=327 DEFAULT CHARSET=utf8
table2:
CREATE TABLE `t2` (
`id2` int(11) NOT NULL AUTO_INCREMENT,
`description` varchar(255) DEFAULT NULL,
`Number` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id2`),
UNIQUE KEY ` description _UNIQUE` (`description `)
) ENGINE=InnoDB AUTO_INCREMENT=433 DEFAULT CHARSET=utf8
This is a toy example of the tables but it shows the concerned part.

I'd suggest an index on t2.Number like this:
ALTER TABLE t2 ADD INDEX numindex(Number);
Your query won't work as written because it won't know which t2 to use. Try this:
SELECT DISTINCT t1.Name, t1.Criteria
FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE t2.Number BETWEEN t1.beginNumber AND t1.endNumber);
Without the t2.Number index EXPLAIN gives this query plan:
1 PRIMARY t1 ALL 1 Using where; Using temporary
2 DEPENDENT SUBQUERY t2 ALL 1 Using where
With an index on t2.Number, you get this plan:
PRIMARY t1 ALL 1 Using where; Using temporary
DEPENDENT SUBQUERY t2 index numindex numindex 9 1 Using where; Using index
The important part to understand is that an ALL comparison is slower than an index comparison.

This is a good place to use binary tree index (default is hashmap). Btree indexes are best when you often sort or use between on column.
CREATE INDEX index_name
ON table_name (column_name)
USING BTREE

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

MySQL optimization - large table joins - mysql

SUGGESTION : Index the table as follows: ALTER TABLE tbl_map ADD INDEX (tbl_1_id,rating,tbl_2_id);

Related

Mysql EXISTS vs IN slow performance

Mysql GROUP BY really slow on a view

Reduce execution time of query

MySQL query optimization with between or larger than > condition

How to improve search performance in MySQL

Categories

Resources