MySQL LEFT JOIN returns empty resultset

maybe I miss something stupid but...
I have three tables in m-to-m relation:
CREATE TABLE tbl_users (
usr_id INT NOT NULL AUTO_INCREMENT ,
usr_name VARCHAR( 64 ) NOT NULL DEFAULT '' ,
usr_surname VARCHAR( 64 ) NOT NULL DEFAULT '' ,
usr_pwd VARCHAR( 64 ) NOT NULL ,
usr_level INT( 1 ) NOT NULL DEFAULT 0,
PRIMARY KEY ( usr_id )
) ENGINE = InnoDB;
CREATE TABLE tbl_houses (
house_id INT NOT NULL AUTO_INCREMENT ,
city VARCHAR( 100 ) DEFAULT '' ,
address VARCHAR( 100 ) DEFAULT '' ,
PRIMARY KEY ( house_id )
) ENGINE = InnoDB;
CREATE TABLE tbl_users_houses (
user_id INT NOT NULL ,
house_id INT NOT NULL ,
INDEX user_key (user_id),
FOREIGN KEY (user_id) REFERENCES tbl_users(usr_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
INDEX house_key (house_id) ,
FOREIGN KEY (house_id) REFERENCES tbl_houses(house_id)
ON DELETE CASCADE
ON UPDATE CASCADE
) ENGINE = InnoDB;
Into the link table I have two records:
user_id house_id
1 1
1 2
Now, trying to select all houses with:
select * from tbl_houses AS H
left join tbl_users_houses AS UH on H.house_id = UH.house_id
where UH.user_id = 2;
Why I get no data instead of all houses?

Because of this line:
where UH.user_id = 2;
This is only true if UH.user_id is non-null, so it effectively excludes any case where you have a house without a matching row in UH, which is the point of using a LEFT JOIN.
If you want all houses, and UH data where there is a match, use this:
select * from tbl_houses AS H
left join tbl_users_houses AS UH on H.house_id = UH.house_id and UH.user_id = 2;

Your WHERE clause is specifying that
UH.user_id = 2
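What happens if you change it to H.user_id = 2? (As posted, tbl_houses has no user_id column, so this only applies if the houses table carries one.)
To give this (all houses for user_id = 2):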
select * from tbl_houses AS H
left join tbl_users_houses AS UH on H.house_id = UH.house_id
where H.user_id = 2;
Or, if you want all houses regardless, plus the data for user_id = 2 where it exists in tbl_users_houses, try this:
select * from tbl_houses AS H
left join tbl_users_houses AS UH on H.house_id = UH.house_id and UH.user_id = 2;

Because you have no user with id 2.

Assuming your question "Why I get no data instead of all houses?" means you are wondering why you are not getting all the houses when the houses table is on the preserved (left) side of an outer join: this is happening because you placed the predicate after the join (in the WHERE clause) instead of in the join condition. This effectively converts the join to an inner join. Change it to:
select * from tbl_houses H
left join tbl_users_houses UH
on UH.house_id = H.house_id
and UH.user_id = 2;
Conditions in the WHERE clause are applied after all joins have been processed. At that point, for every left-table row that found no match, the columns coming from the right-hand table of the LEFT JOIN are all NULL, so any predicate on such a column eliminates those rows.
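To see the difference between filtering in WHERE and filtering in ON, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL (an assumption made so it runs anywhere; LEFT JOIN semantics are the same for this case), and the city values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_houses (house_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE tbl_users_houses (user_id INTEGER NOT NULL, house_id INTEGER NOT NULL);
    INSERT INTO tbl_houses (house_id, city) VALUES (1, 'Rome'), (2, 'Milan');
    INSERT INTO tbl_users_houses (user_id, house_id) VALUES (1, 1), (1, 2);
""")

# Filter in WHERE: applied after the join, so rows without user 2 are discarded.
where_rows = conn.execute("""
    SELECT H.house_id, UH.user_id
    FROM tbl_houses AS H
    LEFT JOIN tbl_users_houses AS UH ON H.house_id = UH.house_id
    WHERE UH.user_id = 2
""").fetchall()

# Filter in ON: only restricts which link rows can match; every house is kept.
on_rows = conn.execute("""
    SELECT H.house_id, UH.user_id
    FROM tbl_houses AS H
    LEFT JOIN tbl_users_houses AS UH
           ON H.house_id = UH.house_id AND UH.user_id = 2
    ORDER BY H.house_id
""").fetchall()

print(where_rows)  # []
print(on_rows)     # [(1, None), (2, None)]
```

With the condition in ON, both houses come back and the link-table columns are NULL because user 2 owns nothing.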

Related

Index probably not used correctly on simple SQL query

Size:
Campaigns: 3k rows (200 with campaigns.is_active = 1)
Links: 20k rows (4k with links.status = 1 // 500 with links.status = 1 AND campaigns.is_active = 1)
Clicks: 10mln rows (50k with created > '2020-10-25 00:00:00')
This query runs 2 seconds
SELECT links.id, COUNT(clicks.id)
FROM links
INNER JOIN campaigns ON campaigns.id = links.campaign_id
AND campaigns.is_active = 1
LEFT JOIN clicks ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id
When I remove the following line, it runs just 0.13 seconds (15 times faster)
AND campaigns.is_active = 1
There is an INDEX on campaigns.is_active.
Also tried to set an index on 2 columns (campaigns.id + campaigns.is_active) but didn't help.
"campaigns.is_active" contains simply 0 or 1. The campaigns table is small, the campaigns.is_active condition actually reduces the amount of rows. So it should speed up the query instead.
Why does it take so much longer because of this condition and how to fix it?
If I would remove the JOIN to campaigns and instead add links.campaign_id to the SELECT fields and then query every single of the returned campaign_id's in an additional query like "SELECT is_active FROM campaigns WHERE id = ?" it would still be faster, because such a query is 0.000x. From my experience when something is faster in 2 queries, it usually means the first query isn't optimized to its full extent.
Explain-Select
Structure
CREATE TABLE `campaigns` (
`id` int(11) UNSIGNED NOT NULL,
`is_active` tinyint(4) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `clicks` (
`id` int(11) UNSIGNED NOT NULL,
`link_id` int(11) UNSIGNED NOT NULL,
`created` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `links` (
`id` int(11) UNSIGNED NOT NULL,
`campaign_id` int(8) UNSIGNED NOT NULL,
`status` tinyint(4) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `campaigns`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `id_isactive` (`id`,`is_active`),
ADD KEY `is_active` (`is_active`);
ALTER TABLE `clicks`
ADD PRIMARY KEY (`id`),
ADD KEY `link_id` (`link_id`),
ADD KEY `created` (`created`);
ALTER TABLE `links`
ADD PRIMARY KEY (`id`),
ADD KEY `campaign_id` (`campaign_id`);
How long does this take?
SELECT l.id,
(SELECT COUNT(*)
FROM clicks cl
WHERE cl.link_id = l.id AND
cl.created > '2020-10-25'
)
FROM links l JOIN
campaigns ca
ON ca.id = l.campaign_id
WHERE l.status = 1 AND ca.is_active = 1;
EDIT:
Hmmm, with an order by, you can try:
SELECT l.id,
(SELECT COUNT(*)
FROM clicks cl
WHERE cl.link_id = l.id AND
cl.created > '2020-10-25'
)
FROM links l
WHERE EXISTS (SELECT 1
FROM campaigns ca
WHERE ca.id = l.campaign_id AND ca.is_active = 1
)
AND l.status = 1
ORDER BY l.id;
For this, you want an index on links(status, id) and campaigns(id, is_active).
Question... If a campaign is not currently active, you don't want any output for it, correct? Furthermore, there won't be any clicks for inactive campaigns, correct? Then why bother checking is_active?
Even if my analysis is wrong, it may be faster to ignore is_active until after the counts have been tallied.
Please don't use LEFT when it is not functional. You have a simple JOIN.
Use COUNT(*); COUNT(x) tests x for being not null.
SELECT links.id, COUNT(*)
FROM links
JOIN clicks ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id
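The two points above (a WHERE condition on the right-hand table makes LEFT behave like a plain JOIN, and COUNT(x) skips NULLs while COUNT(*) counts rows) can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the rows are hypothetical: link 2 deliberately has no clicks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);
    CREATE TABLE clicks (id INTEGER PRIMARY KEY, link_id INTEGER NOT NULL,
                         created TEXT NOT NULL);
    INSERT INTO links VALUES (1, 1), (2, 1);
    INSERT INTO clicks VALUES (10, 1, '2020-10-26 08:00:00'),
                              (11, 1, '2020-10-27 09:00:00');
""")

# LEFT JOIN, but the WHERE test on clicks.created rejects NULL-extended rows.
left_where = conn.execute("""
    SELECT links.id, COUNT(clicks.id)
    FROM links
    LEFT JOIN clicks ON clicks.link_id = links.id
    WHERE links.status = 1 AND clicks.created > '2020-10-25 00:00:00'
    GROUP BY links.id
""").fetchall()

# Plain JOIN with COUNT(*): same result as above.
inner = conn.execute("""
    SELECT links.id, COUNT(*)
    FROM links
    JOIN clicks ON clicks.link_id = links.id
    WHERE links.status = 1 AND clicks.created > '2020-10-25 00:00:00'
    GROUP BY links.id
""").fetchall()

# Date filter moved into ON: link 2 survives; COUNT(clicks.id) and COUNT(*) differ.
left_on = conn.execute("""
    SELECT links.id, COUNT(clicks.id), COUNT(*)
    FROM links
    LEFT JOIN clicks ON clicks.link_id = links.id
                    AND clicks.created > '2020-10-25 00:00:00'
    WHERE links.status = 1
    GROUP BY links.id
    ORDER BY links.id
""").fetchall()

print(left_where)  # [(1, 2)]
print(inner)       # [(1, 2)]
print(left_on)     # [(1, 2, 2), (2, 0, 1)]
```

If links with zero clicks should appear in the output, the date condition belongs in ON and COUNT(clicks.id) is the right counter; otherwise the plain JOIN with COUNT(*) says the same thing more directly.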
This is redundant:
ADD UNIQUE KEY `id_isactive` (`id`,`is_active`),
since PRIMARY KEY(id) declares id to be an index and unique.
I prefer not to fight the database engine optimizer.
SELECT links.id, campaigns.is_active, COUNT(clicks.id)
FROM links
INNER JOIN campaigns ON campaigns.id = links.campaign_id
LEFT JOIN clicks ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id, campaigns.is_active
HAVING campaigns.is_active = 1;
Second variant!
-- Second Variant
EXPLAIN
SELECT links.id AS LinksId
, COUNT(clicks.id) AS ClickCount
FROM links
LEFT JOIN clicks
ON links.id = clicks.link_id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
AND links.campaign_id IN (SELECT id
FROM campaigns
WHERE is_active = 1)
GROUP BY links.id;
Third time is the charm! Using CTEs due to the published cardinalities. (CTEs require MySQL 8.0 or later.)
-- Third time is the charm
WITH ActiveCampaigns
AS
(SELECT *
FROM campaigns
WHERE is_active = 1)
SELECT links.id, COUNT(clicks.id)
FROM links
INNER JOIN ActiveCampaigns
ON ActiveCampaigns.id = links.campaign_id
LEFT JOIN clicks
ON clicks.link_id = links.id
WHERE links.status = 1
AND clicks.created > '2020-10-25 00:00:00'
GROUP BY links.id;

Optimizing rand query with join

I have a rand query which runs very slow like almost every rand query. I researched all stackoverflow but cannot find any good solution for my query
SELECT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND u.username NOT IN (
SELECT blocked_username
FROM hided_mobile_users_from_shuffle
WHERE username = 'enesdoo'
)
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
GROUP
BY l.username
ORDER
BY RAND( )
LIMIT 27
If I remove the ORDER BY RAND() line, this runs very quickly, about 100 times faster.
How can I speed up this query?
mobile_login_list has > 50k rows
users has > 1m rows
Edit:
Explain:
My table:
CREATE TABLE IF NOT EXISTS `mobile_login_list` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(30) COLLATE utf8_bin NOT NULL,
`key` varchar(32) COLLATE utf8_bin NOT NULL,
`time` int(11) NOT NULL,
`ip` int(11) NOT NULL,
`version` smallint(4) NOT NULL,
`messaged` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `kontrol` (`username`,`key`),
KEY `username` (`username`),
KEY `time` (`time`),
KEY `username_2` (`username`,`time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=3351637 ;
In the lingo of random retrieval, this is called a deal operation (deal 27 different cards from a shuffled deck of 4k or so). The other random operation is called roll: it allows duplicates.
You're using SELECT mess-of-columns FROM mess-of-joins WHERE mess-of-criteria ORDER BY RAND() LIMIT small-number to do a shuffle-and-deal operation. That is a notorious performance antipattern. It causes extra work for the server because it must order a fairly large result set and then discard almost all of it (with the LIMIT).
A way to save some of the trouble is to defer the joins to the details. Shuffle only the ids. Then take the small number of results and fetch the details you need. Something like this.
SELECT u.id /* just the id values */
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND u.username NOT IN (
SELECT blocked_username
FROM hided_mobile_users_from_shuffle
WHERE username = 'enesdoo'
)
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
ORDER
BY RAND( )
LIMIT 27
You can debug, run EXPLAIN and optimize this subquery by changing indexes and maybe tightening up your selection criteria. It's the one doing all the hard work of shuffling and dealing.
Then join that resultset to your detail tables to choose the data you need. This outer query only needs to process your 27 rows. Be sure to shuffle again.
SELECT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN (
/* the subquery from above */
) selected ON u.id = selected.id
ORDER BY RAND()
Putting it all together, you get this big repetitive mess of a query. But it should be a little faster.
SELECT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN (
SELECT u.id
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND u.username NOT IN (
SELECT blocked_username
FROM hided_mobile_users_from_shuffle
WHERE username = 'enesdoo'
)
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
ORDER
BY RAND( )
LIMIT 27
) selected ON u.id = selected.id
ORDER BY RAND()
A more performant way to deal records is this, if you do the dealing a lot.
Add a FLOAT column to the table you're dealing from, let's call it deal. Put an index on it.
Every few hours, or maybe overnight or even once a week, shuffle the table by running this query UPDATE users SET deal = RAND(); It will take a while; it needs to change the deal value in every row.
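When you need to deal, do ...WHERE deal >= @cut ... ORDER BY deal LIMIT n, where @cut was set once beforehand with SET @cut = RAND() * 0.9; (MySQL evaluates RAND() in a WHERE clause separately for every row, so writing WHERE deal >= RAND() * 0.9 directly would compare each row against a different number.) The multiplication by 0.9 helps ensure you don't hit the end of the table by choosing a random number too close to 1.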
This is equivalent, in cardshark terms, to shuffling the deck every few hours and then just cutting it for every deal. It's the way Wikipedia implements their "show a random article" feature.
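The shuffle-then-cut idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module instead of MySQL, a made-up 1000-row users table, a hypothetical helper named deal_rows, and it draws the random numbers in Python rather than with RAND().

```python
import random
import sqlite3

rng = random.Random(42)  # seeded only so the sketch is repeatable
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, deal REAL)")
conn.executemany("INSERT INTO users (id) VALUES (?)",
                 [(i,) for i in range(1, 1001)])

# "Shuffle the deck": give every row a fresh random position (the periodic UPDATE).
conn.executemany("UPDATE users SET deal = ? WHERE id = ?",
                 [(rng.random(), i) for i in range(1, 1001)])
conn.execute("CREATE INDEX idx_users_deal ON users (deal)")


def deal_rows(n):
    """'Cut the deck': pick the cut point once, then read n rows in index order."""
    cut = rng.random() * 0.9
    return conn.execute(
        "SELECT id, deal FROM users WHERE deal >= ? ORDER BY deal LIMIT ?",
        (cut, n),
    ).fetchall()


hand = deal_rows(27)
print(len(hand), hand[0])
```

Each deal is an index range scan that stops after n rows, so nothing is sorted at query time. The trade-off is that rows dealt between two shuffles keep the same relative order, so two deals with nearby cut points overlap.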
Can we see the EXPLAIN for this instead...?
SELECT DISTINCT u.id
, u.is_instagram_connected
, u.tokens
, u.username
, u.name
, u.photo
, u.bio
, u.voice
, u.mobile_update
, 1584450999 - l.time idleTime
FROM mobile_login_list l
JOIN users u
ON l.username = u.username
JOIN mobile_token_list t
ON t.username = l.username
LEFT
JOIN hided_mobile_users_from_shuffle x
ON x.blocked_username = u.username
AND x.username = 'enesdoo'
WHERE l.time > 1584393399
AND l.username NOT IN ('enesdoo')
AND x.blocked_username IS NULL
AND u.ban_status = 0
AND u.perma_ban = 0
AND u.mobile_online_status = 1
AND u.lock_status = 0
ORDER
BY RAND( )
LIMIT 27
Given my limited knowledge of query optimisation, I would simply define the table as follows, but maybe someone else can suggest further improvements:
CREATE TABLE IF NOT EXISTS mobile_login_list
(id SERIAL PRIMARY KEY
,username varchar(30) COLLATE utf8_bin NOT NULL
,`key` varchar(32) COLLATE utf8_bin NOT NULL
,time int NOT NULL
,ip int NOT NULL
,version smallint NOT NULL
,messaged int NOT NULL DEFAULT 0
,KEY username_2 (username,time) -- or (time,username)
);
Note that key is a reserved word (and time is a 'keyword'), which makes them poor choices for table/column identifiers.

Improve SQL Query performance

I have a complex query which takes 700ms to run on my machine. I found that the bottleneck is the ORDER BY at_firstname.value clause, but how can I use indexes to improve this?
SELECT
`e`.*
, `at_default_billing`.`value` AS `default_billing`
, `at_billing_postcode`.`value` AS `billing_postcode`
, `at_billing_city`.`value` AS `billing_city`
, `at_billing_region`.`value` AS `billing_region`
, `at_billing_country_id`.`value` AS `billing_country_id`
, `at_company`.`value` AS `company`
, `at_firstname`.`value` AS `firstname`
, `at_lastname`.`value` AS `lastname`
, CONCAT(at_firstname.value
, " "
, at_lastname.value) AS `full_name`
, `at_phone`.`value` AS `phone`
, IFNULL(at_phone.value,"N/A") AS `telephone`
, `e`.`entity_id` AS `id`
FROM
`customer_entity` AS `e`
LEFT JOIN
`customer_entity_int` AS `at_default_billing`
ON (`at_default_billing`.`entity_id` = `e`.`entity_id`)
AND (`at_default_billing`.`attribute_id` = '13')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_postcode`
ON (`at_billing_postcode`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_postcode`.`attribute_id` = '30')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_city`
ON (`at_billing_city`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_city`.`attribute_id` = '26')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_region`
ON (`at_billing_region`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_region`.`attribute_id` = '28')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_country_id`
ON (`at_billing_country_id`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_country_id`.`attribute_id` = '27')
LEFT JOIN
`customer_address_entity_varchar` AS `at_company`
ON (`at_company`.`entity_id` = `at_default_billing`.`value`)
AND (`at_company`.`attribute_id` = '24')
LEFT JOIN
`customer_entity_varchar` AS `at_firstname`
ON (`at_firstname`.`entity_id` = `e`.`entity_id`)
AND (`at_firstname`.`attribute_id` = '5')
LEFT JOIN
`customer_entity_varchar` AS `at_lastname`
ON (`at_lastname`.`entity_id` = `e`.`entity_id`)
AND (`at_lastname`.`attribute_id` = '7')
LEFT JOIN
`customer_entity_varchar` AS `at_phone`
ON (`at_phone`.`entity_id` = `e`.`entity_id`)
AND (`at_phone`.`attribute_id` = '136')
ORDER BY
`at_firstname`.`value` ASC LIMIT 20
This is the execution plan (EXPLAIN of the query):
'1','SIMPLE','e',NULL,'ALL',NULL,NULL,NULL,NULL,'19951','100.00','Using temporary; Using filesort'
'1','SIMPLE','at_default_billing',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_INT_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_INT_ENTITY_ID,IDX_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_billing_postcode',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_city',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_region',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_billing_country_id',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_company',NULL,'eq_ref','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ADDRESS_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.at_default_billing.value,const','1','100.00','Using where'
'1','SIMPLE','at_firstname',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_lastname',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
'1','SIMPLE','at_phone',NULL,'eq_ref','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID,IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE','UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID','6','lazurd.e.entity_id,const','1','100.00',NULL
Table Structure:
CREATE TABLE `customer_entity_varchar` (
`value_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Value Id',
`entity_type_id` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity Type Id',
`attribute_id` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Attribute Id',
`entity_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity Id',
`value` varchar(255) DEFAULT NULL COMMENT 'Value',
PRIMARY KEY (`value_id`),
UNIQUE KEY `UNQ_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID` (`entity_id`,`attribute_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_TYPE_ID` (`entity_type_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ATTRIBUTE_ID` (`attribute_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID` (`entity_id`),
KEY `IDX_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_ATTRIBUTE_ID_VALUE` (`entity_id`,`attribute_id`,`value`),
CONSTRAINT `FK_CSTR_ENTT_VCHR_ATTR_ID_EAV_ATTR_ATTR_ID` FOREIGN KEY (`attribute_id`) REFERENCES `eav_attribute` (`attribute_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_CSTR_ENTT_VCHR_ENTT_TYPE_ID_EAV_ENTT_TYPE_ENTT_TYPE_ID` FOREIGN KEY (`entity_type_id`) REFERENCES `eav_entity_type` (`entity_type_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_CUSTOMER_ENTITY_VARCHAR_ENTITY_ID_CUSTOMER_ENTITY_ENTITY_ID` FOREIGN KEY (`entity_id`) REFERENCES `customer_entity` (`entity_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=131094 DEFAULT CHARSET=utf8 COMMENT='Customer Entity Varchar';
As of now your query is:
Performing ALL left outer joins first.
Then ORDERing the rows.
Then LIMITing the rows.
I would perform the strictly needed outer joins first, then ordering and limiting (to reduce to 20 rows), and finally I would do all the rest of the outer joins. In short I would do:
Performing minimal left outer join first. That is, two tables only.
Then ORDERing the rows.
Then LIMITing the rows. This produces a maximum of 20 rows.
Perform all the rest of outer joins. At this point this is not thousands of rows anymore, but only 20.
This change should massively reduce the "Unique Key Lookup" executions. The modified query will look like:
select
e.*
, `at_default_billing`.`value` AS `default_billing`
, `at_billing_postcode`.`value` AS `billing_postcode`
, `at_billing_city`.`value` AS `billing_city`
, `at_billing_region`.`value` AS `billing_region`
, `at_billing_country_id`.`value` AS `billing_country_id`
, `at_company`.`value` AS `company`
, `at_lastname`.`value` AS `lastname`
, CONCAT(firstname
, " "
, at_lastname.value) AS `full_name`
, `at_phone`.`value` AS `phone`
, IFNULL(at_phone.value,"N/A") AS `telephone`
from ( -- Step #1: joining customer_entity with customer_entity_varchar
SELECT
`e`.*
, `at_firstname`.`value` AS `firstname`
, `e`.`entity_id` AS `id`
FROM
`customer_entity` AS `e`
LEFT JOIN
`customer_entity_varchar` AS `at_firstname`
ON (`at_firstname`.`entity_id` = `e`.`entity_id`)
AND (`at_firstname`.`attribute_id` = '5')
ORDER BY -- Step #2: Sorting (the bare minimum)
`at_firstname`.`value` ASC
LIMIT 20 -- Step #3: Limiting (to 20 rows)
) e
LEFT JOIN -- Step #4: Performing all the rest of outer joins (only few rows now)
`customer_entity_int` AS `at_default_billing`
ON (`at_default_billing`.`entity_id` = `e`.`entity_id`)
AND (`at_default_billing`.`attribute_id` = '13')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_postcode`
ON (`at_billing_postcode`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_postcode`.`attribute_id` = '30')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_city`
ON (`at_billing_city`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_city`.`attribute_id` = '26')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_region`
ON (`at_billing_region`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_region`.`attribute_id` = '28')
LEFT JOIN
`customer_address_entity_varchar` AS `at_billing_country_id`
ON (`at_billing_country_id`.`entity_id` = `at_default_billing`.`value`)
AND (`at_billing_country_id`.`attribute_id` = '27')
LEFT JOIN
`customer_address_entity_varchar` AS `at_company`
ON (`at_company`.`entity_id` = `at_default_billing`.`value`)
AND (`at_company`.`attribute_id` = '24')
LEFT JOIN
`customer_entity_varchar` AS `at_lastname`
ON (`at_lastname`.`entity_id` = `e`.`entity_id`)
AND (`at_lastname`.`attribute_id` = '7')
LEFT JOIN
`customer_entity_varchar` AS `at_phone`
ON (`at_phone`.`entity_id` = `e`.`entity_id`)
AND (`at_phone`.`attribute_id` = '136')
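A scaled-down sketch of the same "sort and limit first, join the rest afterwards" rewrite, to check that both forms return the same rows. It runs on Python's sqlite3 module with hypothetical data (50 customers, attribute 5 as first name, attribute 7 as last name) and a limit of 3 instead of 20; the aliases fname, lname and t are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_entity (entity_id INTEGER PRIMARY KEY);
    CREATE TABLE customer_entity_varchar (
        entity_id INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL,
        value TEXT,
        UNIQUE (entity_id, attribute_id)
    );
""")
for i in range(1, 51):
    conn.execute("INSERT INTO customer_entity VALUES (?)", (i,))
    conn.execute("INSERT INTO customer_entity_varchar VALUES (?, 5, ?)",
                 (i, f"first{100 - i:03d}"))
    conn.execute("INSERT INTO customer_entity_varchar VALUES (?, 7, ?)",
                 (i, f"last{i:03d}"))

# Original shape: join everything, then sort and limit.
join_then_limit = conn.execute("""
    SELECT e.entity_id, fname.value, lname.value
    FROM customer_entity AS e
    LEFT JOIN customer_entity_varchar AS fname
           ON fname.entity_id = e.entity_id AND fname.attribute_id = 5
    LEFT JOIN customer_entity_varchar AS lname
           ON lname.entity_id = e.entity_id AND lname.attribute_id = 7
    ORDER BY fname.value
    LIMIT 3
""").fetchall()

# Rewritten shape: sort and limit on the minimal join, then join the rest.
limit_then_join = conn.execute("""
    SELECT t.entity_id, t.firstname, lname.value
    FROM (
        SELECT e.entity_id, fname.value AS firstname
        FROM customer_entity AS e
        LEFT JOIN customer_entity_varchar AS fname
               ON fname.entity_id = e.entity_id AND fname.attribute_id = 5
        ORDER BY fname.value
        LIMIT 3
    ) AS t
    LEFT JOIN customer_entity_varchar AS lname
           ON lname.entity_id = t.entity_id AND lname.attribute_id = 7
    ORDER BY t.firstname
""").fetchall()

print(limit_then_join)
```

Note the ORDER BY repeated on the outer query: the ordering of a derived table is not guaranteed to survive the later joins, so it is safer to state it again once only the limited rows remain.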
Unfortunately, SELECT whole_mess_of_rows FROM many_tables ORDER BY one_col LIMIT small_number is a notorious performance antipattern. Why? Because it sorts a big result set, just to discard most of it.
The trick is to cheaply find out which rows are within that LIMIT small_number, then retrieve only those rows from the larger query.
Which rows do you want? It looks to me like this query will retrieve their customer_entity.entity_id values. But it's hard to be sure, so you should test this subquery.
SELECT customer_entity.entity_id
FROM customer_entity
LEFT JOIN customer_entity_varchar AS at_firstname
ON (at_firstname.entity_id = customer_entity.entity_id)
AND (at_firstname.attribute_id = '5')
ORDER BY at_firstname.value ASC
LIMIT 20
This should give the twenty relevant entity_id values. Test it. Look at its execution plan. Add an appropriate index if need be. That index might be on customer_entity_varchar (attribute_id, value, entity_id). But I am guessing.
Then you can join this into your main query as a derived table, right after FROM `customer_entity` AS `e`. (MySQL does not allow LIMIT inside an IN (...) subquery, so a join is used here instead of WHERE e.entity_id IN (...).)
JOIN (
SELECT ce.entity_id
FROM customer_entity AS ce
LEFT JOIN customer_entity_varchar AS fn
ON (fn.entity_id = ce.entity_id)
AND (fn.attribute_id = '5')
ORDER BY fn.value ASC
LIMIT 20
) AS top20 ON top20.entity_id = e.entity_id
and things should be a bit faster.
I agree with the previous Answers, but want to emphasize one more antipattern: over-normalization.
Your schema is a curious (and inefficient) variant on the already-bad EAV schema pattern.
There is little advantage, and some disadvantage, in splitting an address across 5 rows of customer_address_entity_varchar (and therefore 5 joins). Similarly for customer_entity_varchar.
An address should (usually) be stored as a few columns in a single table; no JOINs to other tables.
Likewise for firstname+lastname.
Phone could be another issue, since a person/company/entity could have multiple phone numbers (cell, home, work, fax, etc). But that is a different story.

SQL Multiple table JOINS, GROUP BY and HAVING

I've a table structured somewhat similar to this:
CREATE TABLE `user`
(`id` int, `name` varchar(7));
CREATE TABLE `email`
(`id` int, `email_address` varchar(50), `verified_flag` tinyint(1),`user_id` int);
CREATE TABLE `social`
(`id` int,`user_id` int);
INSERT INTO `user`
(`id`, `name`)
VALUES
(1,'alex'),
(2,'jon'),
(3,'arya'),
(4,'sansa'),
(5,'hodor')
;
INSERT INTO `email`
(`id`,`email_address`,`verified_flag`,`user_id`)
VALUES
(1,'alex@gmail.com','1',1),
(2,'jon@gmail.com','0',1),
(3,'arya@gmail.com','0',3),
(4,'sansa@gmail.com','1',4),
(5,'reek@gmail.com','0',3),
(6,'hodor@gmail.com','0',5),
(7,'tyrion@gmail.com','0',1)
;
INSERT INTO `social`
(`id`,`user_id`)
VALUES
(1,4),
(2,4),
(3,5),
(4,4),
(5,4)
;
What I want to get is all emails:
which are not verified
which belong to a user who has no (i.e. 0) verified emails
which belong to a user who has no (i.e. 0) social records
With the below query I'm able to apply the 1st and 3rd condition but not the 2nd one:
SELECT *
FROM `email`
INNER JOIN `user` ON `user`.`id` = `email`.`user_id`
LEFT JOIN `social` ON `user`.`id` = `social`.`user_id`
WHERE `email`.`verified_flag` = 0
GROUP BY `email`.`user_id`,`email`.`email_address`
HAVING COUNT(`social`.`id`) = 0
How can I achieve the result?
Here's the sqlfiddle as well
Interesting and tricky one.
I see you've got something going on there. But HAVING and subqueries become a VERY bad idea when your tables become large.
See below for an approach. Don't forget to set up your indexes!
SELECT * from email
LEFT JOIN social on email.user_id = social.user_id
-- tricky ... i'm going back to email table to pick verified emails PER user
LEFT JOIN email email2 on email2.user_id = email.user_id AND email2.verified_flag = 1
WHERE
-- you got this one going already :)
email.verified_flag = 0
-- user does not have any social record
AND social.id is null
-- email2 comes in handy here ... we limit resultset to include only users that DOES NOT have a verified email
AND email2.id is null
ORDER BY email.user_id asc;
You can use the following query:
SELECT e.`id`, e.`email_address`, e.`verified_flag`, e.`user_id`
FROM (
SELECT `id`,`email_address`,`verified_flag`,`user_id`
FROM `email`
WHERE `verified_flag` = 0) AS e
INNER JOIN (
SELECT `id`, `name`
FROM `user` AS t1
WHERE NOT EXISTS (SELECT 1
FROM `email` AS t2
WHERE `verified_flag` = 1 AND t1.`id` = t2.`user_id`)
AND
NOT EXISTS (SELECT 1
FROM `social` AS t3
WHERE t1.`id` = t3.`user_id`)
) AS u ON u.`id` = e.`user_id`;
This query uses two derived tables:
e implements the first condition, i.e. returns all emails which are not verified
u implements the 2nd and 3rd condition, i.e. it returns a set of all users that have no verified emails and have no social records.
Performing an INNER JOIN between e and u returns all emails satisfying condition no. 1 which belong to users satisfying conditions no. 2 and 3.
Demo here
You can alternatively use this query:
SELECT *
FROM `email`
WHERE `user_id` IN (
SELECT `email`.`user_id`
FROM `email`
INNER JOIN `user` ON `user`.`id` = `email`.`user_id`
LEFT JOIN `social` ON `user`.`id` = `social`.`user_id`
GROUP BY `email`.`user_id`
HAVING COUNT(`social`.`id`) = 0 AND
COUNT(CASE WHEN `email`.`verified_flag` = 1 THEN 1 END) = 0 )
The subquery is used in order to select all user_id satisfying conditions no. 2 and 3. Condition no. 1 is redundant since if the user has no verified emails, then there is no way a verified email is related to this user.
Demo here
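As a sanity check, the three conditions can be run against the question's sample rows. The sketch below uses Python's sqlite3 module rather than MySQL and trims the email table to the columns the conditions use. It compares a compact NOT EXISTS form with the GROUP BY / HAVING form above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER, name TEXT);
    CREATE TABLE email (id INTEGER, verified_flag INTEGER, user_id INTEGER);
    CREATE TABLE social (id INTEGER, user_id INTEGER);
    INSERT INTO user VALUES (1,'alex'),(2,'jon'),(3,'arya'),(4,'sansa'),(5,'hodor');
    INSERT INTO email VALUES (1,1,1),(2,0,1),(3,0,3),(4,1,4),(5,0,3),(6,0,5),(7,0,1);
    INSERT INTO social VALUES (1,4),(2,4),(3,5),(4,4),(5,4);
""")

# Conditions 1-3 written directly as NOT EXISTS tests on each email row.
not_exists_ids = conn.execute("""
    SELECT e.id
    FROM email AS e
    WHERE e.verified_flag = 0
      AND NOT EXISTS (SELECT 1 FROM email AS v
                      WHERE v.user_id = e.user_id AND v.verified_flag = 1)
      AND NOT EXISTS (SELECT 1 FROM social AS s
                      WHERE s.user_id = e.user_id)
    ORDER BY e.id
""").fetchall()

# The same conditions via conditional aggregation per user.
having_ids = conn.execute("""
    SELECT id
    FROM email
    WHERE user_id IN (
        SELECT email.user_id
        FROM email
        INNER JOIN user ON user.id = email.user_id
        LEFT JOIN social ON user.id = social.user_id
        GROUP BY email.user_id
        HAVING COUNT(social.id) = 0
           AND COUNT(CASE WHEN email.verified_flag = 1 THEN 1 END) = 0)
    ORDER BY id
""").fetchall()

print(not_exists_ids)  # [(3,), (5,)]
print(having_ids)      # [(3,), (5,)]
```

Only email ids 3 and 5 (both belonging to user 3) qualify: user 1 has a verified email, user 4 has a verified email and social records, and user 5 has a social record.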
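Simply run a Union Query. Note that this returns emails matching any one of the conditions, labelled by type, rather than only the emails matching all of them: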
SELECT `user_id`, `email_address`, `verified_flag`, 'No Email' as `Type`
FROM `email` RIGHT JOIN `user` ON `user`.`id` = `email`.`user_id`
WHERE `email`.`user_id` IS NULL
UNION
SELECT `user_id`, `email_address`, `verified_flag`, 'Not Verified' as `Type`
FROM `email` INNER JOIN `user` ON `user`.`id` = `email`.`user_id`
WHERE `email`.`verified_flag` = 0
UNION
SELECT `email`.`user_id`, `email_address`, `verified_flag`, 'No Social' as `Type`
FROM `email` INNER JOIN `user` ON `user`.`id` = `email`.`user_id`
LEFT JOIN `social` ON `user`.`id` = `social`.`user_id`
GROUP BY `email`.`user_id`, `email_address`, `verified_flag`
HAVING COUNT(`social`.`id`) = 0;
SELECT
u.id AS u_id
, u.name AS u_name
, e.email_address AS e_email
, e.verified_flag AS e_verify
, e.user_id AS e_uid
, s.id AS s_id
, s.user_id AS s_uid
, COALESCE(ver_e.ver_email_count,0) as ver_email_count
FROM
email as e
LEFT OUTER JOIN
user as u
ON u.id = e.user_id
LEFT OUTER JOIN
social AS s
ON u.id = s.user_id
LEFT OUTER JOIN
(
SELECT
COUNT(email_address) AS ver_email_count
, user_id
FROM
email
WHERE
verified_flag = 1
GROUP BY
user_id
) AS ver_e
ON u.id = ver_e.user_id
WHERE e.verified_flag = 0
AND
COALESCE(ver_e.ver_email_count,0) = 0
AND
s.id IS NULL
Uses one derived table to get the number of verified email addresses each user has got

MySQL how to SUM() columns of two JOINed tables into a new column?

Good morning/evening everybody,
I am trying to (LEFT) JOIN two tables into a table and SUM() specific columns' values of the matching ON fk_id = id... statement. This is what the tables look like:
ws1 table:
ws2 table:
The queries I have tried so far:
SELECT
alias.name alias,
(SUM(IFNULL(ws1.teamkills,0)) + SUM(IFNULL(ws2.teamkills,0))) teamkills
FROM pickup
JOIN player ON player.pickup_id = pickup.id
JOIN alias ON player.alias_id = alias.id
LEFT JOIN weapon_stats_1 ws1 ON ws1.pickup_id = pickup.id AND ws1.player_id = player.id
LEFT JOIN weapon_stats_2 ws2 ON ws2.pickup_id = pickup.id AND ws2.player_id = player.id
WHERE pickup.logfile_name = 'srv-20130725-2151-log' GROUP BY player.id
Result:
and:
SELECT
alias.name alias,
(SUM(DISTINCT IFNULL(ws1.teamkills,0)) + SUM(DISTINCT IFNULL(ws2.teamkills,0))) teamkills
FROM pickup
JOIN player ON player.pickup_id = pickup.id
JOIN alias ON player.alias_id = alias.id
LEFT JOIN weapon_stats_1 ws1 ON ws1.pickup_id = pickup.id AND ws1.player_id = player.id
LEFT JOIN weapon_stats_2 ws2 ON ws2.pickup_id = pickup.id AND ws2.player_id = player.id
WHERE pickup.logfile_name = 'srv-20130725-2151-log' GROUP BY player.id
Result:
I understand that SUM(DISTINCT.... ) returns 2, because DISTINCT selects only one result of the same value.
My goal is to get SUM()s of both teamkills fields and add them together. In the example it should return 3 where player_id is 4. How can I do that?
EDIT:
Table 'player':
Table 'pickup':
You need two dependent subqueries instead of a join of ws1 + ws2; a join won't work here.
Something like:
SELECT id, player_alias,
COALESCE(( SELECT sum( teamkills ) FROM ws1
WHERE ws1.player_id = player.id ), 0)
+
COALESCE(( SELECT sum( teamkills ) FROM ws2
WHERE ws2.player_id = player.id ), 0) as total
FROM player
JOIN alias ON ......
Here is SQLFiddle demo, look at the first query (and the resultset below) to gain better understanding why you get wrong results from join, and in general, how joins work.
Join combines (glues) each record from one table to all corresponding records from the other table (that meet join criteria), and in your case it produces 4 rows with duplicated data.
The third query in this demo is an example of dependent subqueries that gives proper result (for example data in this demo).
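The row multiplication is easy to reproduce. The sketch below uses Python's sqlite3 module and hypothetical rows chosen to match the numbers in the question (player 4 with two rows in each stats table and a true total of 3); the real table contents were only shown as images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id INTEGER PRIMARY KEY);
    CREATE TABLE weapon_stats_1 (player_id INTEGER, teamkills INTEGER);
    CREATE TABLE weapon_stats_2 (player_id INTEGER, teamkills INTEGER);
    INSERT INTO player VALUES (4);
    INSERT INTO weapon_stats_1 VALUES (4, 1), (4, 1);
    INSERT INTO weapon_stats_2 VALUES (4, 1), (4, 0);
""")

# Joining both stats tables pairs every ws1 row with every ws2 row (2 x 2 = 4 rows).
joined = conn.execute("""
    SELECT COUNT(*),
           SUM(IFNULL(ws1.teamkills, 0)) + SUM(IFNULL(ws2.teamkills, 0)),
           SUM(DISTINCT IFNULL(ws1.teamkills, 0))
             + SUM(DISTINCT IFNULL(ws2.teamkills, 0))
    FROM player
    LEFT JOIN weapon_stats_1 AS ws1 ON ws1.player_id = player.id
    LEFT JOIN weapon_stats_2 AS ws2 ON ws2.player_id = player.id
    GROUP BY player.id
""").fetchone()

# Summing each table on its own avoids the fan-out.
per_table = conn.execute("""
    SELECT COALESCE((SELECT SUM(teamkills) FROM weapon_stats_1
                     WHERE player_id = player.id), 0)
         + COALESCE((SELECT SUM(teamkills) FROM weapon_stats_2
                     WHERE player_id = player.id), 0)
    FROM player
""").fetchone()

print(joined)     # (4, 6, 2)
print(per_table)  # (3,)
```

The plain SUM over the join double counts (6), SUM(DISTINCT ...) collapses equal values and undercounts (2), and only the per-table sums give 3.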
You may try something like the following.
Table t1
CREATE TABLE `t1` (
`pik_id` int(11) NOT NULL AUTO_INCREMENT,
`palyer_id` int(11) DEFAULT NULL,
`amount` double DEFAULT NULL,
UNIQUE KEY `pik_id` (`pik_id`)
)
ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table t2
CREATE TABLE `t2` (
`playayer_id` int(11) NOT NULL AUTO_INCREMENT,
`amount` double DEFAULT NULL,
UNIQUE KEY `playayer_id` (`playayer_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
and the query for the join and SUM:
SELECT playayer_id, t1.amount+t2.amount amount
FROM
(SELECT t1.palyer_id,SUM(t1.amount) amount FROM t1 GROUP BY t1.palyer_id)t1
JOIN
(SELECT t2.playayer_id,t2.amount FROM t2)t2
ON t1.palyer_id=t2.playayer_id
GROUP BY playayer_id
playayer_id amount
1 133
2 152
3 1076
I hope this solves your problem.
A possible solution without using correlated subqueries
SELECT a.name alias, SUM(q.teamkills) teamkills
FROM
(
SELECT player_id, teamkills
FROM weapon_stats_1 w JOIN pickup p
ON w.pickup_id = p.id
WHERE p.logfile_name = 'srv-20130725-2151-log'
UNION ALL
SELECT player_id, teamkills
FROM weapon_stats_2 w JOIN pickup p
ON w.pickup_id = p.id
WHERE p.logfile_name = 'srv-20130725-2151-log'
) q JOIN player p
ON q.player_id = p.id JOIN alias a
ON p.alias_id = a.id
GROUP BY a.name
Sample output:
| ALIAS | TEAMKILLS |
----------------------
| alias4 | 3 |
Here is SQLFiddle demo