Is there a shorter alternative to my MySql query? - mysql

I'm a student of Java and do SQL too. In a lesson we were presented with an example database sketch and a query, which I replicate in this question.
I have made an example with MySql and it has three tables,
CREATE TABLE `employed` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
CREATE TABLE `department` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
CREATE TABLE `employees_departments` (
`employed_id` int(11) NOT NULL,
`department_id` int(11) NOT NULL,
PRIMARY KEY (`employed_id`,`department_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
employed was filled with
(1 'Karl'), (2 'Bengt'), (3 'Adam'), (4 'Stefan')
department was filled with
(4, 'HR'), (5, 'Sälj'), (6, 'New departm')
employees_departments was filled with
1 4
2 5
3 4
So "Stefan" has no department, and "New departm" has no employed.
I wanted a query that would give the employees with all their departments, plus employees without departments and departments with no employees. I found one solution like this:
select A.name, C.name from employed A
left join employees_departments B on (A.id=B.employed_id)
left join department C on (B.department_id = C.id)
union
select C.name, A.name from department A
left join employees_departments B on (A.id=B.department_id)
left join employed C on (B.employed_id = C.id)
It would be nice if there were a shorter query to do this...
Also, I made this without foreign key constraints, since I want to keep it as simple as possible for this example.
Greetings

MySQL doesn't support a FULL OUTER join operation.
We can emulate that by combining two sets... the result of an OUTER JOIN and the result from an anti-JOIN.
(
SELECT ee.name AS employed_name
, dd.name AS department_name
FROM employed ee
LEFT
JOIN employees_departments ed
ON ed.employed_id = ee.id
LEFT
JOIN department dd
ON dd.id = ed.department_id
)
UNION ALL
(
SELECT nn.name AS employed_name
, nd.name AS department_name
FROM department nd
LEFT
JOIN employees_departments ne
ON ne.department_id = nd.id
LEFT
JOIN employed nn
ON nn.id = ne.employed_id
WHERE nn.id IS NULL
)
The first SELECT returns every employed name along with the matching department name, including employed rows that have no department.
The second SELECT returns just the department names that have no matching rows in employed.
The results from the two SELECTs are concatenated using the UNION ALL set operator. (UNION ALL avoids the potentially expensive duplicate-elimination step that the plain UNION set operator would force.)
This is the shortest query pattern to return these rows.
We could make the SQL a little shorter. For example, if there is a foreign key relationship between employees_departments and employed (the original post gives no indication that such a relationship is enforced, so we don't assume one), then we could omit the employed table from the second SELECT:
UNION ALL
(
SELECT NULL AS employed_name
, nd.name AS department_name
FROM department nd
LEFT
JOIN employees_departments ne
ON ne.department_id = nd.id
WHERE ne.department_id IS NULL
)
With suitable indexes available, this is going to give us the most efficient access plan.
Is there shorter SQL that will return an equivalent result? If there is, it's likely not going to perform as efficiently as the above.
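As a quick check of the pattern above, here is a minimal sketch that loads the sample rows from the question and runs the shorter variant of the query. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only so the example is self-contained; the parentheses around each SELECT are dropped because SQLite does not accept them in a compound query.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employed (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees_departments (
        employed_id INTEGER NOT NULL,
        department_id INTEGER NOT NULL,
        PRIMARY KEY (employed_id, department_id)
    );
    INSERT INTO employed VALUES (1, 'Karl'), (2, 'Bengt'), (3, 'Adam'), (4, 'Stefan');
    INSERT INTO department VALUES (4, 'HR'), (5, 'Sälj'), (6, 'New departm');
    INSERT INTO employees_departments VALUES (1, 4), (2, 5), (3, 4);
""")

# Outer join (all employed rows) plus anti-join (departments with no link row).
FULL_OUTER_EMULATION = """
    SELECT ee.name AS employed_name, dd.name AS department_name
      FROM employed ee
      LEFT JOIN employees_departments ed ON ed.employed_id = ee.id
      LEFT JOIN department dd ON dd.id = ed.department_id
    UNION ALL
    SELECT NULL AS employed_name, nd.name AS department_name
      FROM department nd
      LEFT JOIN employees_departments ne ON ne.department_id = nd.id
     WHERE ne.department_id IS NULL
"""

rows = conn.execute(FULL_OUTER_EMULATION).fetchall()
for row in rows:
    print(row)
```

With the sample data this yields five rows: the three employee/department pairs, Stefan with no department, and 'New departm' with no employee. The shorter second SELECT is safe here only because the sample link table contains no orphaned employed_id values.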

Related

query taking too long, while split it to two queries taking 0.2 sec

I have the following query:
select m.id, ms.severity, ms.risk_score, count(distinct si.id), boarding_date_tbl.boarding_date
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true
join (select m.id merchant_id, min(s_for_boarding.scan_date) boarding_date
from merchant m
left join merchant_has_scan ms on m.id = ms.merchant_id
left join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
group by m.id) boarding_date_tbl on boarding_date_tbl.merchant_id = m.id
group by m.id
limit 100;
When I run it on a big schema (about 2 million "merchant" rows) it takes more than 20 sec.
But if I split it into:
select m.legal_name, m.unique_id, m.merchant_status, s_for_boarding.scan_date
from merchant m
join merchant_has_scan ms on m.id = ms.merchant_id
join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
group by m.id
limit 100;
and
select m.id, ms.severity, ms.risk_score, count(distinct si.id)
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true
group by m.id
limit 100;
both take about 0.1 sec.
The reason for that is clear: the low limit means it doesn't need to do much to get the first 100 rows. It is also clear that the inner select is what makes the first query run as long as it does.
My question: is there a way to run the inner select only on the relevant merchants and not on the entire table?
Update
Making a left join instead of a join before the inner query helped reduce it to 6 sec, but that is still a lot more than what I get if I run 2 queries.
UPDATE 2
create table for merchant:
CREATE TABLE `merchant` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`last_scan_completed_id` bigint(20) DEFAULT NULL,
`last_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
CONSTRAINT `FK_9lhkm7tb4bt87qy4j3fjayec5` FOREIGN KEY (`last_scan_completed_id`) REFERENCES `merchant_has_scan` (`id`)
)
merchant_has_scan:
CREATE TABLE `merchant_has_scan` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`merchant_id` bigint(20) NOT NULL,
`risk_score` int(11) DEFAULT NULL,
`scan_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_merchant_id` (`scan_id`,`merchant_id`),
CONSTRAINT `FK_3d8f81ts5wj2u99ddhinfc1jp` FOREIGN KEY (`scan_id`) REFERENCES `scan` (`id`),
CONSTRAINT `FK_e7fhioqt9b9rp9uhvcjnk31qe` FOREIGN KEY (`merchant_id`) REFERENCES `merchant` (`id`)
)
scan_item:
CREATE TABLE `scan_item` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`is_registered` bit(1) NOT NULL,
`merchant_has_scan_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `FK_avcc5q3hkehgreivwhoc5h7rb` FOREIGN KEY (`merchant_has_scan_id`) REFERENCES `merchant_has_scan` (`id`)
)
scan:
CREATE TABLE `scan` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`scan_date` datetime DEFAULT NULL,
`scan_type` int(11) NOT NULL,
PRIMARY KEY (`id`)
)
and the explain:
You don't have the latest version of MySQL, which would be able to create an index for the derived table. (What version are you running?)
The "derived table" (the subquery) will be the first table in the EXPLAIN because, well, it has to be.
merchant_has_scan is a many:many table, but without the optimization tips here -- fixing this may be the biggest factor in speeding it up. Caveat: The tips suggest getting rid of id, but you seem to have a use for id, so keep it.
The COUNT(DISTINCT si.id) and JOIN si... can be replaced by ( SELECT COUNT(*) FROM scan_item WHERE ...), thereby eliminating one of the JOINs and possibly diminishing the explode-implode effect (the JOIN inflates the row count, then GROUP BY collapses it again).
LEFT JOIN -- are you sometimes expecting to get NULL for boarding_date? If not, please use JOIN, not LEFT JOIN. (It is better to state your intention than to leave the query open to multiple interpretations.)
If you can remove the LEFTs, then since m.id and merchant_id are specified to be equal, why list them both in the SELECT? (This is a confusion factor, not a speed question).
You say you split it into two -- but you did not. You added LIMIT 100 to the inner query when you pulled it out. If you need that, add it to the derived table, too. Then you may be able to remove GROUP BY m.id LIMIT 100 from the outer query.
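To illustrate the suggestion about replacing JOIN plus COUNT(DISTINCT ...) with a correlated subquery, here is a small sketch. It uses Python's sqlite3 module with cut-down versions of the tables and a few made-up rows, all of which are assumptions for illustration; it says nothing about timings on the real 2-million-row schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down tables with hypothetical sample rows (not from the original post).
conn.executescript("""
    CREATE TABLE merchant (id INTEGER PRIMARY KEY, last_scan_completed_id INTEGER);
    CREATE TABLE merchant_has_scan (id INTEGER PRIMARY KEY, merchant_id INTEGER, risk_score INTEGER);
    CREATE TABLE scan_item (id INTEGER PRIMARY KEY, is_registered INTEGER, merchant_has_scan_id INTEGER);
    INSERT INTO merchant VALUES (1, 10), (2, 20);
    INSERT INTO merchant_has_scan VALUES (10, 1, 5), (20, 2, 7);
    INSERT INTO scan_item VALUES (100, 1, 10), (101, 1, 10), (102, 0, 10), (103, 1, 20);
""")

# Original shape: join to scan_item, then collapse with GROUP BY.
join_version = conn.execute("""
    SELECT m.id, ms.risk_score, COUNT(DISTINCT si.id)
      FROM merchant m
      JOIN merchant_has_scan ms ON m.last_scan_completed_id = ms.id
      JOIN scan_item si ON si.merchant_has_scan_id = ms.id AND si.is_registered = 1
     GROUP BY m.id, ms.risk_score
     ORDER BY m.id
""").fetchall()

# Suggested shape: count in a correlated subquery, no GROUP BY needed.
subquery_version = conn.execute("""
    SELECT m.id, ms.risk_score,
           (SELECT COUNT(*)
              FROM scan_item si
             WHERE si.merchant_has_scan_id = ms.id AND si.is_registered = 1)
      FROM merchant m
      JOIN merchant_has_scan ms ON m.last_scan_completed_id = ms.id
     ORDER BY m.id
""").fetchall()

print(join_version)
print(subquery_version)
```

One difference to keep in mind: the inner JOIN drops merchants that have no registered scan items, while the subquery form keeps them with a count of 0. The sample rows avoid that case, so both forms return the same result here.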

query only returns one value and it should return 2

I have the following tables.
CREATE TABLE `Customer` (
`CID` varchar(10) CHARACTER SET latin1 NOT NULL DEFAULT '',
`Name` varchar(40) CHARACTER SET latin1 NOT NULL DEFAULT '',
`City` varchar(40) CHARACTER SET latin1 NOT NULL DEFAULT '',
`State` varchar(40) CHARACTER SET latin1 NOT NULL DEFAULT '',
PRIMARY KEY (`CID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin;
CREATE TABLE `LineItem` (
`LID` varchar(10) NOT NULL DEFAULT '',
`OID` varchar(10) NOT NULL DEFAULT '',
`PID` varchar(110) NOT NULL DEFAULT '',
`Number` int(11) DEFAULT NULL,
`TotalPrice` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`LID`),
KEY `Order ID` (`OID`),
CONSTRAINT `Order ID` FOREIGN KEY (`OID`) REFERENCES `OrderItem` (`OID`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `OrderItem` (
`OID` varchar(10) NOT NULL DEFAULT '',
`CID` varchar(10) NOT NULL DEFAULT '',
PRIMARY KEY (`OID`),
KEY `CID` (`CID`),
CONSTRAINT `CID` FOREIGN KEY (`CID`) REFERENCES `Customer` (`CID`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `Product` (
`PID` varchar(10) NOT NULL DEFAULT '',
`ProductName` varchar(40) DEFAULT '',
`Price` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`PID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
What I've been trying to do in my query is run it so I can successfully get it to do the following:
List the products bought by all the customers of Newark
List the products ordered only by the customers of Newark
For #5, I tried this query:
Select product.productname
From Customer as c INNER JOIN OrderItem as o
ON c.CID = o.CID
INNER JOIN LineItem line
ON o.OID = line.OID
Inner Join Product product
ON line.PID = product.PID
Where C.City = 'Newark'
Having Count(product.productname) > 1;
But it only returns one value and it should return 2 (unless I am not using it properly).
For #6 I understand the concept but I don't know how to "subtract tables" in SQL.
The goal of the first question is to list the common items purchased by everyone from Newark. So if Person A bought Items X, Y and Z and Person B bought W, V, and Y, the query will return "Item Y".
I guess my comment is an answer.
Having Count(product.productname) > 1;
Having requires a group by to function correctly as it's a filter on an aggregate and aggregates require a group by. 90% of database engines would have returned an error explicitly stating it requires a group by...but MySQL prefers to do the wrong thing instead of return an error to you (it's why you got one row...MySQL did a group by of whatever it felt like). Add a group by (I assume on product name with what you have here) and it should work.
You need to add GROUP BY. Try this:
Select product.productname
From Customer as c
INNER JOIN OrderItem as o ON c.CID = o.CID
INNER JOIN LineItem line ON o.OID = line.OID
Inner Join Product product ON line.PID = product.PID
Where c.City = 'Newark'
Group by product.productname
Having Count(*) > 1;
Two other answers have pointed out that if you want the HAVING then you need a GROUP BY. But your question doesn't actually ask a question or explain what your query is supposed to return. (You are not explaining clearly in your question or comments.) You wrote a comment, "I have two people from Newark and was trying to show the item(s) [product(s)?] that both of them purchased." But if your query is only "corrected" with grouping then it calculates the wrong counts.
A problem is that you should return PID (and maybe ProductName). You need to list products. PIDs as key are presumably 1:1 with products but ProductName is not a key so product names are not 1:1 with products. ProductName can even be NULL. So selecting ProductName does not get you all the relevant products. (Also in LineItem PID should be a FOREIGN KEY.)
Another problem is that you should use PID to GROUP BY product. ProductName is not a key of Product. So two products can have the same name. So you will get the count for each product name, not for each product. Plus ProductName can be NULL. (Even if you were only returning the names not PIDs of products with those counts, you would need to group by PID.)
Another problem is the counts are wrong. Grouping by PID groups rows that can be made by combining a Newark Customer row, an OrderItem row and a LineItem row for a product. Those combination rows are counted by COUNT(PID). But you want the number of distinct Newark customers in those rows. You could do this by a sub-select but there happens to be a shorter way.
SELECT p.PID, p.ProductName
FROM Customer c
JOIN OrderItem AS o ON c.CID = o.CID
JOIN LineItem l ON o.OID = l.OID
JOIN Product p ON l.PID = p.PID
WHERE c.City = 'Newark'
GROUP BY (p.PID)
HAVING COUNT(DISTINCT c.CID) > 1;
You get a query for goal 1 after a change in part of its condition.
HAVING COUNT(DISTINCT c.CID)
= (SELECT COUNT(*) FROM Customer WHERE City='Newark')
Relational algebra MINUS/DIFFERENCE corresponds to EXCEPT in the SQL standard, but MySQL did not support EXCEPT before version 8.0.31. You can do it using LEFT JOIN, NOT IN or NOT EXISTS instead.
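Both goals can be sketched end to end as follows. The example uses Python's sqlite3 module in place of MySQL, and the customers, orders and products are made-up rows (assumptions for illustration). Goal 2 is read here as "products ordered by at least one Newark customer and by no customer from anywhere else", which is one possible interpretation of the assignment, expressed with NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two Newark customers and one from Boston.
conn.executescript("""
    CREATE TABLE Customer (CID TEXT PRIMARY KEY, Name TEXT, City TEXT, State TEXT);
    CREATE TABLE OrderItem (OID TEXT PRIMARY KEY, CID TEXT);
    CREATE TABLE LineItem (LID TEXT PRIMARY KEY, OID TEXT, PID TEXT);
    CREATE TABLE Product (PID TEXT PRIMARY KEY, ProductName TEXT);
    INSERT INTO Customer VALUES ('C1', 'Ann', 'Newark', 'NJ'),
                                ('C2', 'Bob', 'Newark', 'NJ'),
                                ('C3', 'Cy', 'Boston', 'MA');
    INSERT INTO OrderItem VALUES ('O1', 'C1'), ('O2', 'C2'), ('O3', 'C3');
    INSERT INTO Product VALUES ('PX', 'X'), ('PY', 'Y'), ('PZ', 'Z');
    INSERT INTO LineItem VALUES ('L1', 'O1', 'PX'), ('L2', 'O1', 'PY'),
                                ('L3', 'O2', 'PY'), ('L4', 'O2', 'PZ'),
                                ('L5', 'O3', 'PX');
""")

# Goal 1: products bought by ALL Newark customers (count distinct buyers per product).
bought_by_all = conn.execute("""
    SELECT p.PID, p.ProductName
      FROM Customer c
      JOIN OrderItem o ON c.CID = o.CID
      JOIN LineItem l ON o.OID = l.OID
      JOIN Product p ON l.PID = p.PID
     WHERE c.City = 'Newark'
     GROUP BY p.PID, p.ProductName
    HAVING COUNT(DISTINCT c.CID) = (SELECT COUNT(*) FROM Customer WHERE City = 'Newark')
     ORDER BY p.PID
""").fetchall()

# Goal 2: products ordered ONLY by Newark customers (set difference via NOT EXISTS).
only_newark = conn.execute("""
    SELECT DISTINCT p.PID, p.ProductName
      FROM Product p
      JOIN LineItem l ON l.PID = p.PID
      JOIN OrderItem o ON o.OID = l.OID
      JOIN Customer c ON c.CID = o.CID
     WHERE c.City = 'Newark'
       AND NOT EXISTS (SELECT 1
                         FROM LineItem l2
                         JOIN OrderItem o2 ON o2.OID = l2.OID
                         JOIN Customer c2 ON c2.CID = o2.CID
                        WHERE l2.PID = p.PID AND c2.City <> 'Newark')
     ORDER BY p.PID
""").fetchall()

print(bought_by_all)
print(only_newark)
```

In this sample, only product Y is bought by both Newark customers, while Y and Z are ordered by nobody outside Newark; X is excluded from the second result because the Boston customer also ordered it.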

complicated sql query returns a result with empty tables

I have three empty tables
--
-- Table structure for table `projects`
--
CREATE TABLE IF NOT EXISTS `projects` (
`id_project` int(11) NOT NULL AUTO_INCREMENT,
`id_plan` int(11) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`description` longtext NOT NULL,
PRIMARY KEY (`id_project`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
-- --------------------------------------------------------
--
-- Table structure for table `project_plans`
--
CREATE TABLE IF NOT EXISTS `project_plans` (
`id_plan` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` longtext NOT NULL,
`max_projects` int(11) DEFAULT NULL,
`max_member` int(11) DEFAULT NULL,
`max_filestorage` bigint(20) NOT NULL DEFAULT '3221225472' COMMENT '3GB storage space',
PRIMARY KEY (`id_plan`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
-- --------------------------------------------------------
--
-- Table structure for table `project_users`
--
CREATE TABLE IF NOT EXISTS `project_users` (
`id_user` int(11) NOT NULL,
`id_project` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
All these tables are empty, but I still get a result from my query?
My query:
SELECT
A.id_plan,
A.name AS plan_name,
A.description AS plan_description,
A.max_projects,
A.max_member,
A.max_filestorage,
B.id_plan,
B.name AS project_name,
B.description AS project_description,
C.id_user,
C.id_project,
COUNT(*) AS max_project_member
FROM
".$this->config_vars["projects_plans_table"]." AS A
LEFT JOIN
".$this->config_vars["projects_table"]." AS B
ON
B.id_plan = A.id_plan
LEFT JOIN
".$this->config_vars["projects_user_table"]." AS C
ON
C.id_project = B.id_project
WHERE
C.id_project = '".$id."'
&& B.deleted = '0'
I think the problem is the COUNT(*) AS ...
How can I solve the problem?
For one, you are getting a record explicitly because of the COUNT(). An aggregate without a GROUP BY treats the whole result set as a single group, so the query always returns exactly one row, even when no rows match; in that case the count is zero.
So the engine is basically saying: there are no records, but I have to send you a row so you have a COUNT() column to look at and do with what you will. It is doing what you asked.
Now, for the comment to the other question where you asked...
Yes but i want to count the project member from a project, how i can count the users from project_users where all users have the id_project 1.
Since you only care about a count, and not the specific WHO involved, you can get this result directly from the project_users table (which should have one index on id_user and another on id_project). Then:
select count(*)
from project_users
where id_project = 1
To expand from basis of your original question to get the extra details, I would do...
select
p.id_project,
p.id_plan,
p.name as projectName,
p.description as projectDescription,
pp.name as planName,
pp.description as planDescription,
pp.max_projects,
pp.max_member,
pp.max_filestorage,
PJCnt.ProjectMemberCount
from
( select id_project,
count(*) as ProjectMemberCount
from
project_users
where
id_project = 1
group by
id_project ) PJCnt
JOIN projects p
on PJCnt.id_project = p.id_project
JOIN project_plans pp
on p.id_plan = pp.id_plan
Now, based on this layout of tables, a plan can have a max member count, but there is nothing indicating max members for the plan based on all projects, or max per SINGLE project. So, if a plan allows for 20 people, can there be 20 people for 10 different projects under the same plan? That's something only you would know the impact of... just something to consider what you are asking for.
Your cleaned-up query should look like this (see also the SQL Fiddle demo: http://sqlfiddle.com/#!2/e693f5/9):
SELECT
A.id_plan,
A.name AS plan_name,
A.description AS plan_description,
A.max_projects,
A.max_member,
A.max_filestorage,
B.id_plan,
B.name AS project_name,
B.description AS project_description,
C.id_user,
C.id_project,
COUNT(*) AS max_project_member
FROM
project_plans AS A
LEFT JOIN
projects AS B
ON
B.id_plan = A.id_plan
LEFT JOIN
project_users AS C
ON
C.id_project = B.id_project
WHERE
C.id_project = '".$id."';
This returns NULL for all the other columns in the select, because the result set still contains one legitimate row: the one carrying the COUNT(*) output, 0.
To fix this, either add a GROUP BY at the end (see the GROUP BY example: http://sqlfiddle.com/#!2/14d46/2), or
remove the COUNT(*); the NULL values will then be gone, along with the COUNT(*) value of 0.
See a simple SQL example here: http://sqlfiddle.com/#!2/ab7dd/5
Just comment out the COUNT() and you have fixed your NULL problem!
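The one-row behaviour is easy to reproduce on an empty table. The sketch below uses Python's sqlite3 module rather than MySQL (an assumption made for the sake of a self-contained example); SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts a non-aggregated column next to COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as project_users in the question, left empty on purpose.
conn.execute(
    "CREATE TABLE project_users (id_user INTEGER NOT NULL, id_project INTEGER NOT NULL)"
)

# Aggregate without GROUP BY: the empty input is still one (empty) group.
no_group = conn.execute(
    "SELECT id_project, COUNT(*) FROM project_users WHERE id_project = 1"
).fetchall()

# Aggregate with GROUP BY: no input rows means no groups, so no output rows.
with_group = conn.execute(
    "SELECT id_project, COUNT(*) FROM project_users "
    "WHERE id_project = 1 GROUP BY id_project"
).fetchall()

print(no_group)    # [(None, 0)]
print(with_group)  # []
```

The first query returns a single row holding NULL and a count of 0, which matches what the question describes; adding GROUP BY makes the empty result genuinely empty.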

Mysql query to get detail of comma-separated ids data

I have 2 tables, items and members :
CREATE TABLE IF NOT EXISTS `items` (
`id` int(5) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`member` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `members` (
`id` int(5) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
What if, for example I have a record inside items, such as
INSERT INTO `test`.`items` (
`id` ,
`name` ,
`member`
)
VALUES (
NULL , 'xxxx', '1, 2, 3'
);
in members :
INSERT INTO `members` (`id`, `name`) VALUES
(1, 'asdf'),
(2, 'qwert'),
(3, 'uiop'),
(4, 'jkl;');
and I'd like to display items.member data with members.name, something like 1#asdf, 2#qwert, 3#uiop??
I've tried the following query,
SELECT items.id, items.name, GROUP_CONCAT(CONCAT_WS('#', members.id, members.name) ) as member
FROM `items`
LEFT JOIN members AS members on (members.id = items.member)
WHERE items.id = 1
But the result is not what I expected. Is there any other way to display the data with a single query? Because I'm using PHP, right now I explode items.member and loop over it one by one to display each members.name.
You could look into using FIND_IN_SET() in your join criteria:
FROM items JOIN members ON FIND_IN_SET(members.id, items.member)
However, note from the definition of FIND_IN_SET():
A string list is a string composed of substrings separated by “,” characters.
Therefore the items.member column should not contain any spaces (I suppose you could use FIND_IN_SET(members.id, REPLACE(items.member, ' ', '')) - but this is going to be extremely costly as your database grows).
Really, you should normalise your schema:
CREATE TABLE memberItems (
item_id INT(5) NOT NULL,
member_id INT(5) NOT NULL,
FOREIGN KEY (item_id) REFERENCES items (id),
FOREIGN KEY (member_id) REFERENCES members (id)
);
INSERT INTO memberItems
(item_id, member_id)
SELECT items.id, members.id
FROM items
JOIN members ON FIND_IN_SET(members.id, REPLACE(items.member,' ',''))
;
ALTER TABLE items DROP member;
This is both index-friendly (and therefore can be queried very efficiently) and has the database enforce referential integrity.
Then you can do:
FROM items JOIN memberItems ON memberItems.item_id = items.id
JOIN members ON members.id = memberItems.member_id
Note also that it's generally unwise to use GROUP_CONCAT() to combine separate records into a string in this fashion: your application should instead be prepared to loop over the resultset to fetch each member.
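Here is a rough sketch of the normalised layout in action, using Python's sqlite3 module as a stand-in for MySQL (an assumption for illustration). SQLite has no FIND_IN_SET(), so the one-off migration splits the comma-separated string in Python instead; the final join is the same one shown above, and the "id#name" labels are built by looping over the result set in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL, member TEXT NOT NULL);
    CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE memberItems (
        item_id INTEGER NOT NULL REFERENCES items (id),
        member_id INTEGER NOT NULL REFERENCES members (id),
        PRIMARY KEY (item_id, member_id)
    );
    INSERT INTO items VALUES (1, 'xxxx', '1, 2, 3');
    INSERT INTO members VALUES (1, 'asdf'), (2, 'qwert'), (3, 'uiop'), (4, 'jkl;');
""")

# One-off migration: split each comma-separated list into junction rows.
for item_id, member_csv in conn.execute("SELECT id, member FROM items").fetchall():
    pairs = [(item_id, int(part)) for part in member_csv.split(",") if part.strip()]
    conn.executemany(
        "INSERT INTO memberItems (item_id, member_id) VALUES (?, ?)", pairs
    )

# Query through the junction table; both join conditions can use indexes.
rows = conn.execute("""
    SELECT items.id, items.name, members.id, members.name
      FROM items
      JOIN memberItems ON memberItems.item_id = items.id
      JOIN members ON members.id = memberItems.member_id
     WHERE items.id = 1
     ORDER BY members.id
""").fetchall()

# Build the display string in the application by looping over the rows.
labels = [f"{member_id}#{member_name}" for _, _, member_id, member_name in rows]
print(", ".join(labels))
```

For the sample item this prints 1#asdf, 2#qwert, 3#uiop. The composite primary key on memberItems is an addition in this sketch; it prevents the same member from being linked to an item twice.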
Please take a look at this sample:
SQLFIDDLE
Your query seems to work for what you have mentioned in the question... :)
SELECT I.ID, I.ITEM,
GROUP_CONCAT(CONCAT("#",M.ID,
M.NAME, " ")) AS MEMB
FROM ITEMS AS I
LEFT JOIN MEMBERS AS M
ON M.ID = I.MID
WHERE i.id = 1
;
EDITED ANSWER
This query will not work for you, as your schema doesn't seem to have any integrity or proper references. Plus, your member IDs are delimited by commas, which this answer neglects.

MySQL multiple LEFT OUTER JOIN bug

I hope someone can help me with my MySQL problem. I have a bug where, with one left outer join on the contributions table, the resulting amount is $100 (which is correct). If I include a second left outer join on another table (ikes) and I have 2 ikes, the amount doubles ($200); if I have 3 ikes, it triples ($300). For the life of me, I cannot figure this out. What do the ikes have to do with the contribution amount? I've separated the queries and they work by themselves, but together they cause the problem.
Can anyone see the problem? I've included the query and the tables below.
SELECT COUNT(i.type) AS xlike,
SUM(c.amount) AS amount,
w.*
FROM wish w
LEFT OUTER JOIN contributions c ON w.ID=c.receiveid
LEFT OUTER JOIN ikes i ON w.ID=i.wishid
WHERE w.ID = 236
Tables:
CREATE TABLE IF NOT EXISTS `contributions` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`amount` decimal(19,2) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
CREATE TABLE IF NOT EXISTS `ikes` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`type` enum('likes','dislikes') NOT NULL,
`wishid` int(11) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
While most will tell you to use JOINs, you have to be aware that joins will duplicate parent records if more than one child record is associated to it. This is what can inflate values from aggregate functions.
I re-wrote your query as:
SELECT w.*,
COALESCE(x.amount, 0) AS amount,
COALESCE(y.type, 0) AS type
FROM WISH w
LEFT JOIN (SELECT c.receiveid,
SUM(c.amount) AS amount
FROM CONTRIBUTIONS c
GROUP BY c.receiveid) x ON x.receiveid = w.ID
LEFT JOIN (SELECT i.wishid,
COUNT(i.type) AS type
FROM IKES i
GROUP BY i.wishid) y ON y.wishid = w.ID
WHERE w.ID = 236
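The row duplication can be reproduced with one contribution and two ikes. The sketch below uses Python's sqlite3 module in place of MySQL, with simplified tables and made-up rows (assumptions for illustration); the receiveid column is taken from the query in the question, since the posted CREATE TABLE for contributions does not list it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified tables: one wish, one $100 contribution, two ikes.
conn.executescript("""
    CREATE TABLE wish (ID INTEGER PRIMARY KEY);
    CREATE TABLE contributions (ID INTEGER PRIMARY KEY, receiveid INTEGER, amount NUMERIC);
    CREATE TABLE ikes (ID INTEGER PRIMARY KEY, type TEXT, wishid INTEGER);
    INSERT INTO wish VALUES (236);
    INSERT INTO contributions VALUES (1, 236, 100);
    INSERT INTO ikes VALUES (1, 'likes', 236), (2, 'likes', 236);
""")

# Joining both child tables first pairs the contribution with every ike,
# so the same amount is summed once per ike.
naive = conn.execute("""
    SELECT COUNT(i.type), SUM(c.amount)
      FROM wish w
      LEFT JOIN contributions c ON w.ID = c.receiveid
      LEFT JOIN ikes i ON w.ID = i.wishid
     WHERE w.ID = 236
""").fetchone()

# Aggregating each child table on its own before joining keeps one row per wish.
fixed = conn.execute("""
    SELECT COALESCE(y.xlike, 0), COALESCE(x.amount, 0)
      FROM wish w
      LEFT JOIN (SELECT receiveid, SUM(amount) AS amount
                   FROM contributions
                  GROUP BY receiveid) x ON x.receiveid = w.ID
      LEFT JOIN (SELECT wishid, COUNT(type) AS xlike
                   FROM ikes
                  GROUP BY wishid) y ON y.wishid = w.ID
     WHERE w.ID = 236
""").fetchone()

print(naive)  # (2, 200)
print(fixed)  # (2, 100)
```

The naive query sees two joined rows (one contribution times two ikes) and reports 200, while the pre-aggregated version reports the correct 100 with the same like count.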