MySQL 500 million rows table in select query with join - mysql

I'm concerned about the performance of the query below once the tables are fully populated. So far it's under development and performs well with dummy data.
The table "adress_zoo" will contain about 500 million records once fully populated. "adress_zoo" table looks like this:
CREATE TABLE `adress_zoo`
( `adress_id` int(11) NOT NULL, `zoo_id` int(11) NOT NULL,
UNIQUE KEY `pk` (`adress_id`,`zoo_id`),
KEY `adress_id` (`adress_id`) )
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The other tables will contain maximum 500 records each.
The full query looks like this:
SELECT a.* FROM jos_zoo_item AS a
JOIN jos_zoo_search_index AS zsi2 ON zsi2.item_id = a.id
WHERE a.id IN (
SELECT r.id FROM (
SELECT zi.id AS id, Max(zi.priority) as prio
FROM jos_zoo_item AS zi
JOIN jos_zoo_search_index AS zsi ON zsi.item_id = zi.id
LEFT JOIN jos_zoo_tag AS zt ON zt.item_id = zi.id
JOIN jos_zoo_category_item AS zci ON zci.item_id = zi.id
**JOIN adress_zoo AS az ON az.zoo_id = zi.id**
WHERE 1=1
AND ( (zci.category_id != 0 AND ( zt.name != 'prolong' OR zt.name is NULL))
OR (zci.category_id = 0 AND zt.name = 'prolong') )
AND zi.type = 'telefoni'
AND zsi.element_id = '44d3b1fd-40f6-4fd7-9444-7e11643e2cef'
AND zsi.value = 'Small'
AND zci.category_id > 15
**AND az.adress_id = 5**
GROUP BY zci.category_id ) AS r
)
AND a.application_id = 6
AND a.access IN (1,1)
AND a.state = 1
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2012-06-07 07:51:26')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2012-06-07 07:51:26')
AND zsi2.element_id = '1c3cd26e-666d-4f8f-a465-b74fffb4cb14'
GROUP BY a.id
ORDER BY zsi2.value ASC
The query will usually return about 25 records.
Based on your experience, will this query perform acceptable (respond within say 3 seconds)?
What can I do to optimise this?
As adviced by #Jack I ran the query with EXPLAIN and got this:

This part is an important limiter:
az.adress_id = 5
MySQL will limit the table to only those records where adress_id matches before joining it with the rest of the statement, so it will depend on how big you think that result set might be.
Btw, you have a UNIQUE(adress_id, zoo_id) and a separate INDEX. Is there a particular reason? Because the first part of a spanning key can be used by MySQL to select with as well.
What's also important is to use EXPLAIN to understand how MySQL will "attack" your query and return the results. See also: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html

To avoid subquery you can try to rewrite your query as:
SELECT a.* FROM jos_zoo_item AS a
JOIN jos_zoo_search_index AS zsi2 ON zsi2.item_id = a.id
INNER JOIN
(
SELECT ** distinct ** r.id FROM (
SELECT zi.id AS id, Max(zi.priority) as prio
FROM jos_zoo_item AS zi
JOIN jos_zoo_search_index AS zsi ON zsi.item_id = zi.id
LEFT JOIN jos_zoo_tag AS zt ON zt.item_id = zi.id
JOIN jos_zoo_category_item AS zci ON zci.item_id = zi.id
**JOIN adress_zoo AS az ON az.zoo_id = zi.id**
WHERE 1=1
AND ( (zci.category_id != 0 AND ( zt.name != 'prolong' OR zt.name is NULL))
OR (zci.category_id = 0 AND zt.name = 'prolong') )
AND zi.type = 'telefoni'
AND zsi.element_id = '44d3b1fd-40f6-4fd7-9444-7e11643e2cef'
AND zsi.value = 'Small'
AND zci.category_id > 15
**AND az.adress_id = 5**
GROUP BY zci.category_id ) AS r
) T
on a.id = T.id
where
AND a.application_id = 6
AND a.access IN (1,1)
AND a.state = 1
AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2012-06-07 07:51:26')
AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2012-06-07 07:51:26')
AND zsi2.element_id = '1c3cd26e-666d-4f8f-a465-b74fffb4cb14'
GROUP BY a.id
ORDER BY zsi2.value ASC
This approach don't perform subquery for each candidate row. Performance may be increased only if T is calculated in few milliseconds.

Related

How to optimize the product fetching query

I have used below query for product listing. Query is working fine but it takes approximately 0.4534 seconds. How can I optimize the same query.
SELECT SQL_CALC_FOUND_ROWS DISTINCT tp.prod_id, tp.prod_name, tp.prod_shop, tp.prod_retail_price, tp.prod_sale_price, tp.prod_initial_price, tp.prod_stock, ts.shop_id, ts.shop_name, ts.shop_logo, ts.shop_description, ts.shop_title, tu.user_profile_image, ( SELECT pdiscount_price FROM tbl_product_discounts tpd WHERE tpd.pdiscount_product_id = tp.prod_id AND tpd.pdiscount_qty = '1' AND( ( tpd.pdiscount_start_date = '0000-00-00' OR tpd.pdiscount_start_date < NOW()) AND( tpd.pdiscount_end_date = '0000-00-00' OR tpd.pdiscount_end_date > NOW()) ) ORDER BY tpd.pdiscount_priority ASC, tpd.pdiscount_price ASC LIMIT 1 ) AS discount FROM tbl_products tp LEFT JOIN tbl_shops ts ON tp.prod_shop = ts.shop_id AND ts.shop_is_deleted = 0 INNER JOIN tbl_users tu ON ts.shop_user_id = tu.user_id WHERE tp.prod_is_deleted = '0' LIMIT 0, 20
Without checking you table or Requirement :
Try to use group by instead of DISTINCT
Do not use sub query if possible .
Try To use indexing in you table .
This will help you to optimize you query.

MySQL query taking too much time

query taking 1 minute to fetch results
SELECT
`jp`.`id`,
`jp`.`title` AS game_title,
`jp`.`game_type`,
`jp`.`state_abb` AS game_state,
`jp`.`location` AS game_city,
`jp`.`zipcode` AS game_zipcode,
`jp`.`modified_on`,
`jp`.`posted_on`,
`jp`.`game_referal_amount`,
`jp`.`games_referal_amount_type`,
`jp`.`status`,
`jp`.`is_flaged`,
`u`.`id` AS employer_id,
`u`.`email` AS employer_email,
`u`.`name` AS employer_name,
`jf`.`name` AS game_function,
`jp`.`game_freeze_status`,
`jp`.`game_statistics`,
`jp`.`ats_value`,
`jp`.`integration_id`,
`u`.`account_manager_id`,
`jp`.`model_game`,
`jp`.`group_id`,
(CASE
WHEN jp.group_id != '0' THEN gm.group_name
ELSE 'NA'
END) AS group_name,
`jp`.`priority_game`,
(CASE
WHEN jp.country != 'US' THEN jp.country_name
ELSE ''
END) AS game_country,
IFNULL((CASE
WHEN
`jp`.`account_manager_id` IS NULL
OR `jp`.`account_manager_id` = 0
THEN
(SELECT
(CASE
WHEN
account_manager_id IS NULL
OR account_manager_id = 0
THEN
`u`.`account_manager_id`
ELSE account_manager_id
END) AS account_manager_id
FROM
user_user
WHERE
id = (SELECT
user_id
FROM
game_user_assigned
WHERE
game_id = `jp`.`id`
LIMIT 1))
ELSE `jp`.`account_manager_id`
END),
`u`.`account_manager_id`) AS acc,
(SELECT
COUNT(recach_limit_id)
FROM
recach_limit
WHERE
recach_limit = '1'
AND recach_limit_game_id = rpr.recach_limit_game_id) AS somewhatgame,
(SELECT
COUNT(recach_limit_id)
FROM
recach_limit
WHERE
recach_limit = '2'
AND recach_limit_game_id = rpr.recach_limit_game_id) AS verygamecommitted,
(SELECT
COUNT(recach_limit_id)
FROM
recach_limit
WHERE
recach_limit = '3'
AND recach_limit_game_id = rpr.recach_limit_game_id) AS notgame,
(SELECT
COUNT(joa.id) AS applicationcount
FROM
game_refer_to_member jrmm
INNER JOIN
game_refer jrr ON jrr.id = jrmm.rid
INNER JOIN
game_applied joa ON jrmm.id = joa.referred_by
WHERE
jrmm.STATUS = '1'
AND jrr.referby_user_id IN (SELECT
ab_testing_user_id
FROM
ab_testing)
AND joa.game_post_id = rpr.recach_limit_game_id
AND (rpr.recach_limit = 1
OR rpr.recach_limit = 2)) AS gamecount
FROM
(`game_post` AS jp)
JOIN
`user_info` AS u ON `jp`.`user_user_id` = `u`.`id`
JOIN
`game_functional` jf ON `jp`.`game_functional_id` = `jf`.`id`
LEFT JOIN
`group_musesm` gm ON `gm`.`group_id` = `jp`.`group_id`
LEFT JOIN
`recach_limit` rpr ON `jp`.`id` = `rpr`.`recach_limit_game_id`
WHERE
`jp`.`status` != '3'
GROUP BY `jp`.`id`
ORDER BY `posted_on` DESC
LIMIT 10
I would first suggest not nesting select statements because this will cause an n^x performance hit on every xth level and I see at least 3 levels of selects inside this query.
Add index
INDEX(status, posted_on)
Move LIMIT inside
Then, instead of saying
FROM (`game_post` AS jp)
say
FROM ( SELECT id FROM game_post
WHERE status != 3
ORDER BY posted_on DESC
LIMIT 10 ) AS ids
JOIN game_post AS jp USING(id)
(I am assuming that the PK of jp is (id)?)
That should efficiently use the new index to get the 10 ids needed. Then it will reach back into game_post to get the other columns.
LEFT
Also, don't say LEFT unless you need it. It costs something to generate NULLs that you may not be needing.
Is GROUP BY necessary?
If you remove the GROUP BY, does it show dup ids? The above changes may have eliminated the need.
IN(SELECT) may optimize poorly
Change
AND jrr.referby_user_id IN ( SELECT ab_testing_user_id
FROM ab_testing )
to
AND EXISTS ( SELECT * FROM ab_testing
WHERE ab_testing_user_id = jrr.referby_user_id )
(This change may or may not help, depending on the version you are running.)
More
Please provide EXPLAIN SELECT if you need further assistance.

Same MySQL query runs much slower in 5.6 than in 5.1

I am encountering a strange issue where this particular MySQL query that we have would run almost 50 times slower after we upgraded our database from MySQL 5.1.73 to 5.6.23.
This is the SQL query:
SELECT `companies`.*
FROM `companies`
LEFT OUTER JOIN `company_texts`
ON `company_texts`.`company_id` = `companies`.`id`
AND `company_texts`.`language` = 'en'
AND `company_texts`.`region` = 'US'
INNER JOIN show_texts
ON show_texts.company_id = companies.id
AND `show_texts`.`is_deleted` = 0
AND `show_texts`.`language` = 'en'
AND `show_texts`.`region` = 'US'
INNER JOIN show_region_counts
ON show_region_counts.show_id = show_texts.show_id
AND show_region_counts.region = 'US'
WHERE ( ( `companies`.`id` NOT IN ( '77', '26' ) )
AND ( `company_texts`.is_deleted = 0 )
AND `companies`.id IN (
SELECT DISTINCT show_texts.company_id AS
id
FROM shows
INNER JOIN `show_rollups`
ON
`show_rollups`.`show_id` = `shows`.`id`
AND ( `show_rollups`.`device_id` = 3 )
AND ( `show_rollups`.`package_group_id` = 2 )
AND ( `show_rollups`.`videos_count` > 0 )
LEFT OUTER JOIN `show_texts` ON
`show_texts`.`show_id` = `shows`.`id`
AND
`show_texts`.`is_deleted` = 0
AND
`show_texts`.`language` = 'en'
AND
`show_texts`.`region` = 'US'
AND
shows.is_browseable = 1
AND
show_texts.show_id IS NOT NULL
AND (
`show_rollups`.`episodes_count` > 0
OR `show_rollups`.`clips_count` > 0
OR `show_rollups`.`games_count` > 0
)
) )
GROUP BY companies.id
ORDER BY Sum(show_region_counts.view_count) DESC
LIMIT 30 offset 30;
Now the problem is when I run this query in MySQL 5.1.73 before the upgrade, the query would only take around 1.5 seconds, but after the upgrade to 5.6.23, it now can take upward to 1 minute.
So I did an EXPLAIN of this query in 5.1.73, and I saw this:
Enlarged version : http://i.stack.imgur.com/c4ko0.jpg
And when I did EXPLAIN in 5.6.23 , I saw this :
Enlarged version : http://i.stack.imgur.com/CgBtA.jpg
I can see that in both cases, there is a full scan (type ALL) of the shows table, but is there something else I am not seeing that is causing the massive slowdown in 5.6?
Thanks
IS
Please provide SHOW CREATE TABLE. Without that, I will guess that you are missing this desirable index:
show_rollups: INDEX(device_id, package_group_id, videos_count)
You have LEFT OUTER JOIN show_texts ... ON ... show_texts.show_id IS NOT NULL. This is probably wrong for two reasons: (a) Don't use LEFT if you are not looking for NULL, and (b) the NULL test should be in the missing WHERE clause, not in the ON clause.
Getting rid of the IN ( SELECT ... ) may help it on both machines:
SELECT c.*
FROM
( SELECT DISTINCT st.company_id AS id
FROM shows
INNER JOIN `show_rollups` AS sr ON sr.`show_id` = `shows`.`id`
AND ( sr.`device_id` = 3 )
AND ( sr.`package_group_id` = 2 )
AND ( sr.`videos_count` > 0 )
LEFT OUTER JOIN `show_texts` AS st ON st.`show_id` = `shows`.`id`
AND st.`is_deleted` = 0
AND st.`language` = 'en'
AND st.`region` = 'US'
AND shows.is_browseable = 1
AND st.show_id IS NOT NULL
AND ( sr.`episodes_count` > 0
OR sr.`clips_count` > 0
OR sr.`games_count` > 0 )
) AS x
JOIN `companies` AS c ON x.id = c.id
LEFT OUTER JOIN `company_texts` AS ct ON ct.`company_id` = c.`id`
AND ct.`language` = 'en'
AND ct.`region` = 'US'
INNER JOIN show_texts AS st ON st.company_id = c.id
AND st.`is_deleted` = 0
AND st.`language` = 'en'
AND st.`region` = 'US'
INNER JOIN show_region_counts AS src ON src.show_id = st.show_id
AND src.region = 'US'
WHERE ( c.`id` NOT IN ( '77', '26' ) )
AND ( ct.is_deleted = 0 )
GROUP BY c.id
ORDER BY Sum(src.view_count) DESC
LIMIT 30 offset 30;
There is some chance that the JOINs will mess with the computation in Sum(src.view_count).
I'm sorry, but I can't make out the EXPLAIN screen-grabs on my display.
Without that, I'd wonder if the schemas for the tables changed at all, or of the database engines changed. In particular, companies.id seems to be used as a primary key.

SUM with CASE Statement in mysql

I have following Mysql query
SELECT c.`id`
,c.`category_name`
,c.`category_type`
,c.bookmark_count
,f.category_id cat_id
,f.unfollow_at
,(
CASE WHEN c.id = f.follower_category_id
THEN (
SELECT count(`user_bookmarks`.`id`)
FROM `user_bookmarks`
WHERE (`user_bookmarks`.`category_id` = cat_id)
AND ((`f`.`unfollow_at` > `user_bookmarks`.`created_at`) || (`f`.`unfollow_at` = '0000-00-00 00:00:00'))
)
ELSE 0 END
) counter
,c.id
,f.follower_category_id follow_id
,c.user_id
FROM categories c
LEFT JOIN following_follower_categories f ON f.follower_category_id = c.id
WHERE c.user_id = 26
ORDER BY `category_name` ASC
and here is output what i am getting after execuation
now i just want to count . here i have field id having value 172 against it i have counter 30,3, 2 and Bookmark_count is 4( i need to include only once)
and i am accepting output for id 172 is 30+3+2+4(bookmark_count only once).
I am not sure how to do this.
Can anybody help me out
Thanks a lot
The following may be the most inefficient query for that purpose, but I added a cover to your query in order to hint at grouping the results.
(I removed the second c.id, and my example may have errors since I couldn't try it.)
SELECT `id`,
`category_name`,
`category_type`,
max(`bookmark_count`),
`cat_id`,
`unfollow_at`,
sum(`counter`)+max(`bookmark_count`) counter,
follow_id`, `user_id`
FROM
(SELECT c.`id`
,c.`category_name`
,c.`category_type`
,c.bookmark_count
,f.category_id cat_id
,f.unfollow_at
,(
CASE WHEN c.id = f.follower_category_id
THEN (
SELECT count(`user_bookmarks`.`id`)
FROM `user_bookmarks`
WHERE (`user_bookmarks`.`category_id` = cat_id)
AND ((`f`.`unfollow_at` > `user_bookmarks`.`created_at`) || (`f`.`unfollow_at` = '0000-00-00 00:00:00'))
)
ELSE 0 END
) counter
,f.follower_category_id follow_id
,c.user_id
FROM categories c
LEFT JOIN following_follower_categories f ON f.follower_category_id = c.id
WHERE c.user_id = 26)
GROUP BY `id`, `category_name`, `category_type`, `cat_id`, `unfollow_at`, `follow_id`, `user_id`
ORDER BY `category_name` ASC

MySQL: How to optimize this query?

I almost spent a day to optimize this query:
SELECT
prod. *,
cat.slug category_slug,
sup.bname bname,
sup.slug bname_slug
FROM bb_admin.bb_directory_products AS prod
LEFT JOIN bb_admin.bb_categories_products AS cat
ON prod.primary_category_id = cat.category_id
LEFT JOIN bb_admin.bb_directory_suppliers AS sup
ON prod.supplier_id = sup.supplier_id
LEFT JOIN bb_admin.bb_directory_suppliers AS credit_sup
ON prod.credit_supplier_id = credit_sup.supplier_id
LEFT JOIN bb_admin.bb_directory_suppliers AS photo_sup
ON prod.photo_supplier_id = photo_sup.supplier_id
WHERE (
prod.status = '1'
OR prod.status = '3'
OR prod.status = '5'
)
AND (
sup.active_package_id != '1'
OR sup.active_package_id != '5'
OR sup.active_package_id != '6'
OR prod.supplier_id = '0'
)
AND (
sup.supplier_id = '1989'
OR credit_sup.supplier_id = '1989'
OR photo_sup.supplier_id = '1989'
)
GROUP BY prod.product_id
ORDER BY prod.priority_index ASC
Can you help me to optimized this query?
Update your column data types to be INT or one of its variants, since the ones you are checking against are all integer IDs (assumption).
Create indexes on following columns(if possible in all tables):
prod.status
supplier_id
active_package_id
Use IN clause instead of concatenating OR segments.
I'll be putting the updated WHERE clause here:
WHERE prod.status IN(1, 3, 5)
AND ( sup.active_package_id NOT IN(1, 5, 6)
OR prod.supplier_id = 0
)
AND 1989 IN (prod.supplier_id, prod.credit_supplier_id, prod.photo_supplier_id)