I did a test today and there were 2 questions I couldn't figure out. I forgot the second one, but here is the first:
You have a database about beer. Three tables, only 2 relevant to the question. These are:
Varieties:
variety_id
variety_name
Beers:
beer_id
beer_name
variety_id
beer_alcohol
beer_alcohol is a double, representing the alcohol percentage.
There were 38 varieties of beer and 1215 individual beer entries.
The question was: Display all individual varieties of beer, per variety the highest alcohol percentage and also the name of the beer that has this percentage.
At first sight, this is an "inner join" on the variety_id, a "max()" on the alcohol and a "group by" on the variety_id/variety_name.
The problem is, this won't display the name of the beer with the highest % alcohol of its variety. It will display the alphabetically first beer of its variety.
And I racked my brain over it, but I can't begin to imagine how to do this without a function.
Can someone enlighten me?
Do one more join: join the grouped result back to the beers table on variety_id and the maximum percentage. Doing this on the outside returns the correct beer name.
This will bring back 2 rows for a variety if two of its beers tie for the highest percentage.
You could also sub-select, eg
select v.variety_id, v.variety_name, b.beer_name, b.beer_alcohol as abv
FROM varieties v
JOIN (
select variety_id, MAX(beer_alcohol) as abv
FROM beers
GROUP BY variety_id
) booziest ON booziest.variety_id = v.variety_id
JOIN beers b ON b.variety_id = booziest.variety_id AND b.beer_alcohol = booziest.abv
I usually like to have sample data and sample output but let's blindly answer the question.
This is a greatest-n-per-group question (with n = 1). You can solve this with a derived table with a group by or, my personal favourite, the left join:
SELECT v.variety_id, v.variety_name, b1.* FROM beers b1
JOIN varieties v ON b1.variety_id = v.variety_id
LEFT JOIN beers b2
ON b1.variety_id = b2.variety_id AND b1.beer_alcohol < b2.beer_alcohol
WHERE b2.beer_alcohol IS NULL
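To sanity-check the left-join idea without the real database, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and variety names are made up for illustration, and the join to varieties is placed before the WHERE clause. Note that 'Alpha' is alphabetically first in its variety but is not the strongest, so a plain GROUP BY could not be trusted to return the right name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE varieties (variety_id INTEGER PRIMARY KEY, variety_name TEXT);
CREATE TABLE beers (beer_id INTEGER PRIMARY KEY, beer_name TEXT,
                    variety_id INTEGER, beer_alcohol REAL);
-- Hypothetical sample data, not from the original test database.
INSERT INTO varieties VALUES (1, 'Pilsner'), (2, 'Tripel');
INSERT INTO beers VALUES
  (1, 'Alpha',   1, 4.5),
  (2, 'Bravo',   1, 5.2),
  (3, 'Charlie', 2, 9.0),
  (4, 'Delta',   2, 8.5);
""")

# A beer survives the WHERE only if no beer of the same variety
# has a strictly higher alcohol percentage (b2 found no match).
rows = conn.execute("""
    SELECT v.variety_name, b1.beer_name, b1.beer_alcohol
    FROM beers b1
    JOIN varieties v ON b1.variety_id = v.variety_id
    LEFT JOIN beers b2
      ON b1.variety_id = b2.variety_id
     AND b1.beer_alcohol < b2.beer_alcohol
    WHERE b2.beer_id IS NULL
    ORDER BY v.variety_name
""").fetchall()

for variety, beer, abv in rows:
    print(variety, beer, abv)
```

With this data the strongest beers are 'Bravo' and 'Charlie'. Ties would still produce one row per tied beer.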
I have a list of plant that can be filtered with a CONCAT. Originally the columns were just text, but I have converted them to IDs. Before the conversion it showed all records and could be filtered.
This involves 4 tables. (with example data) "" are not used in the fields, they are just to show you that it is a word.
plant
idplant example 1
plantname example "001 Forklift"
idplanttype1 example 1
idlocation1 example 1
iddepartment1 example 1
planttypes
idplanttype example 1
planttype example "Forklift Truck"
locations
idlocation example 1
location example "Preston"
departments
iddepartment example 1
department example "Waste Disposal"
Without the WHERE clause, it shows all records, including those with nulls (but the filter doesn't work).
With the WHERE clause, it only shows complete records (those with no null fields, and the filter works); records with nulls do not show.
The issue seems to be the CONCAT. (I've cleaned up the parentheses, but had to add a 1 to make the IDs different.)
if(isset($_POST['search'])) {$valueToSearch = $_POST['valueToSearch'];}
$sql = "
SELECT idplant, plantname, planttype, location, department
FROM plant
LEFT JOIN planttypes ON idplanttype1 = idplanttype
LEFT JOIN locations ON idlocation1 = idlocation
LEFT JOIN departments ON iddepartment1 = iddepartment
WHERE CONCAT(plantname, planttype, location, department) LIKE
'%".$valueToSearch."%'
ORDER BY plantname";
SOLUTION
The above code works; it was just missing CONCAT_WS in the WHERE clause:
WHERE CONCAT_WS
I'm new to Joins, so any help would be greatly appreciated.
Edit: Using Linux Server - Apache Version 2.4.46
Thanks in advance!
Your problem is probably blanks.
WHERE CONCAT(plantname, planttype, location, department)
LIKE '%001 Forklift Forklift Truck Preston Waste Disposal%'
won't find anything, for example, as the concatenated strings result in '001 ForkliftForklift TruckPrestonWaste Disposal', not '001 Forklift Forklift Truck Preston Waste Disposal'.
You want blanks between the substrings, which is easiest to achieve with CONCAT_WS:
SELECT p.idplant, p.plantname, pt.planttype, l.location, d.department
FROM plant p
INNER JOIN planttypes pt ON pt.idplanttype = p.idplanttype1
INNER JOIN locations l ON l.idlocation = p.idlocation1
INNER JOIN departments d ON d.iddepartment = p.iddepartment1
WHERE CONCAT_WS(' ', p.plantname, pt.planttype, l.location, d.department)
LIKE '%001 Forklift Forklift Truck Preston Waste Disposal%'
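One more difference between the two functions may explain the missing rows: in MySQL, CONCAT returns NULL as soon as any argument is NULL, while CONCAT_WS skips NULL arguments. With LEFT JOINs, a plant without a location produces a NULL, so the whole CONCAT becomes NULL and the LIKE never matches. The sketch below reproduces that in Python's sqlite3. SQLite's own string functions differ from MySQL's, so `concat` and `concat_ws` here are small Python stand-ins registered as SQL functions (an assumption for illustration, not the MySQL implementation), and the tables are cut down to two columns each.

```python
import sqlite3

def mysql_concat(*parts):
    # Stand-in for MySQL CONCAT: NULL if any argument is NULL.
    if any(p is None for p in parts):
        return None
    return "".join(str(p) for p in parts)

def mysql_concat_ws(sep, *parts):
    # Stand-in for MySQL CONCAT_WS: NULL arguments are skipped.
    if sep is None:
        return None
    return sep.join(str(p) for p in parts if p is not None)

conn = sqlite3.connect(":memory:")
conn.create_function("concat", -1, mysql_concat)
conn.create_function("concat_ws", -1, mysql_concat_ws)
conn.executescript("""
CREATE TABLE plant (idplant INTEGER, plantname TEXT, idlocation1 INTEGER);
CREATE TABLE locations (idlocation INTEGER, location TEXT);
INSERT INTO locations VALUES (1, 'Preston');
-- Plant 2 has no location, so the LEFT JOIN yields NULL for it.
INSERT INTO plant VALUES (1, '001 Forklift', 1), (2, '002 Forklift', NULL);
""")

def search(expression, term):
    # The search term is bound as a parameter instead of being
    # concatenated into the SQL string.
    sql = f"""
        SELECT plantname
        FROM plant
        LEFT JOIN locations ON idlocation1 = idlocation
        WHERE {expression} LIKE ?
        ORDER BY plantname"""
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]

with_concat = search("concat(plantname, location)", "Forklift")
with_concat_ws = search("concat_ws(' ', plantname, location)", "Forklift")
print(with_concat)     # only the plant that has a location
print(with_concat_ws)  # both plants
```

Only the complete record comes back with `concat`, while `concat_ws` also returns the plant whose location is NULL, which matches the behaviour described in the question.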
I have had this doubt for a while now. After some practice in SQL I started to ask myself: 'When is the right time to use NATURAL JOIN?'
Because the example database I'm using to practice my SQL skills is enormous, I'm just going to put two sample queries here. Let's say I want to
Find, for each item, the total quantity sold by the departments on the second floor
The sample answer of this question is:
SELECT Item.ItemName, SUM(SaleQTY)
FROM Item INNER JOIN Sale INNER JOIN Department
ON Item.ItemID = Sale.ItemID
AND Department.DepartmentID = Sale.DepartmentID
WHERE DepartmentFloor = 2
GROUP BY Item.ItemName
ORDER BY Item.ItemName;
However when doing this question myself I only used NATURAL JOIN and here is my attempt:
SELECT Item.ItemName, SUM(SaleQTY)
FROM Item NATURAL JOIN SALE NATURAL JOIN Department
WHERE DepartmentFloor = 2
GROUP BY Item.ItemName
ORDER BY Item.ItemName
And it produced the exact same output as the sample answer:
ItemName SUM(SaleQTY)
Boots - snakeproof 2
Camel saddle 1
Elephant polo stick 1
Hat - polar explorer 3
Pith helmet 1
Pocket knife - Nile 2
Sextant 2
I understand that the reason for an INNER JOIN is to ensure the integrity of the data through the conditions applied in the code, eliminating any data that does not satisfy them. Still, I'm wondering: is NATURAL JOIN sufficient to crack this problem?
If not, what are some important rules to follow?
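One rule that seems worth keeping in mind: NATURAL JOIN matches on every column name the two tables share, so it gives the same result as the explicit join only while the shared names are exactly the intended keys. The sketch below uses Python's sqlite3 with a cut-down Item and Sale table (sample rows and the `Notes` column are hypothetical) to show the same query changing its result after both tables gain a same-named column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (ItemID INTEGER, ItemName TEXT);
CREATE TABLE Sale (SaleID INTEGER, ItemID INTEGER, SaleQTY INTEGER);
INSERT INTO Item VALUES (1, 'Sextant'), (2, 'Pith helmet');
INSERT INTO Sale VALUES (10, 1, 2), (11, 2, 1);
""")

query = """
    SELECT ItemName, SUM(SaleQTY)
    FROM Item NATURAL JOIN Sale
    GROUP BY ItemName
    ORDER BY ItemName"""

# Only ItemID is shared, so this behaves like an INNER JOIN on ItemID.
before = conn.execute(query).fetchall()

# Hypothetical schema change: both tables get a column called Notes.
conn.executescript("""
ALTER TABLE Item ADD COLUMN Notes TEXT;
ALTER TABLE Sale ADD COLUMN Notes TEXT;
UPDATE Item SET Notes = 'catalogue text';
UPDATE Sale SET Notes = 'sold at discount';
""")

# Now the join silently requires ItemID AND Notes to match.
after = conn.execute(query).fetchall()

print(before)
print(after)
```

The query text is unchanged, yet the second run returns no rows because the Notes values never match. An explicit INNER JOIN ... ON Item.ItemID = Sale.ItemID would not be affected by the new column.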
I have been playing around with this for what seems like hours and I can't get the results I want. Here is the query I am having trouble with:
SELECT year.year, dstate,
(SELECT sum(amount) FROM gift
WHERE year.year = gift.year
AND gift.donorno = donor.donorno)
FROM donor, gift, year
WHERE year.year = gift.year
AND gift.donorno = donor.donorno;
This seems redundant. Anyway, I am trying to display the total donations (gift.amount) for each state by year.
ex.
1999 GA 500 (donorno 1 from GA donated 200 and donorno 2 from GA donated 300)
1999 FL 400
2000 GA 600
2000 FL 500
...
To clarify donors can be from the same state but I am trying to total the gift amounts for that state for the year it is donated.
Any advice is appreciated. I feel like the answer is right in front of me.
Here is a picture of tables for reference:
This is a very simple join & aggregation problem.
SELECT y.year, d.dstate, SUM(g.amount) AS total
FROM gift AS g
INNER JOIN year AS y ON y.year = g.year
INNER JOIN donor AS d ON d.donorno = g.donorno
GROUP BY y.year, d.dstate
You don't need the sub-query in your SELECT clause in order to get the total amount. You can sum it by grouping. (I think the GROUP BY clause is what you're missing. I recommend reading up on it.) What you've done is called a correlated sub-query and it is going to be very slow over large data sets because it has to be calculated row-by-row instead of as a set operation.
Also, please don't use the old style comma join syntax. Instead use the explicit join syntax as shown above. It is much clearer and will help avoid accidental Cartesian products.
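As a quick check, here is a minimal sketch of the grouped query in Python's sqlite3, loaded with the example figures from the question. The table layouts are assumptions, since the picture of the tables is not available; the state column is called `dstate` as in the question's own query, and a third donor is invented for FL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layouts; only the column names used in the question are known.
CREATE TABLE year (year INTEGER);
CREATE TABLE donor (donorno INTEGER, dstate TEXT);
CREATE TABLE gift (donorno INTEGER, year INTEGER, amount INTEGER);
INSERT INTO year VALUES (1999), (2000);
INSERT INTO donor VALUES (1, 'GA'), (2, 'GA'), (3, 'FL');
INSERT INTO gift VALUES
  (1, 1999, 200), (2, 1999, 300), (3, 1999, 400),
  (1, 2000, 600), (3, 2000, 500);
""")

# One output row per (year, state); SUM adds up every gift in the group.
rows = conn.execute("""
    SELECT y.year, d.dstate, SUM(g.amount) AS total
    FROM gift AS g
    INNER JOIN year AS y ON y.year = g.year
    INNER JOIN donor AS d ON d.donorno = g.donorno
    GROUP BY y.year, d.dstate
    ORDER BY y.year, d.dstate
""").fetchall()

for row in rows:
    print(row)
```

The two GA donors in 1999 are combined into a single total of 500, as in the expected output.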
I have a MySQL question I cannot solve myself (for the first time).
I have a query-with-parameters database plus PHP program that, together, generate extensive MySQL queries to run.
The problem is actually a simple one: that of correct summation. I need to SUM distinct rows (not values) within a complex, multi-joined query, and I cannot get it to work.
Do not ask why I work with the data structure below - I am working with data that is supplied to me and it needs to be this way. (The tables represent existing invoices.)
I will try to reproduce the situation very simplified here.
TABLE INVOICE
=============
Inv.Nr (ID) Other Data
------------------------
#1 Stuff
#2 Stuff
#3 More Stuff
TABLE INVOICE LINE
==================
ID Inv.Nr QUANTITY ArticleID UNIT PRICE
----------------------------------------------
1 #1 1 5 € 2.50
2 #1 1 109 € 4.00
3 #2 4 77 € 5.00
4 #2 10 91 € 6.00
TABLE INVOICE LINE VAT
======================
ID LINE-ID AMOUNT VATP VAT
1 1 € 2.00 25% € 0.50
2 2 € 2.00 25% € 0.50
3 2 € 1.42 6% € 0.08
4 3 €18.87 6% € 1.23
5 4 €16.00 25% € 4.00
6 4 €37.74 6% € 2.26
As you can see: some articles have a double VAT rate, because they consist of more elements that have different VAT rates (i.e. a book with a cd).
Now the queries are very long; there are many more joined tables, and the WHERE and GROUP BY clauses can be dynamic. So a query might look somewhat like this (again much simplified):
SELECT `Inv.Nr`, ArticleID, SUM(Quantity), SUM(Amount), SUM(VAT)
FROM ((((`Invoice` INNER JOIN `Invoice Line`
ON `Invoice`.`Inv.Nr`=`Invoice Line`.`Inv.Nr`)
INNER JOIN `Invoice Line VAT`
ON `Invoice Line`.ID = `Invoice Line VAT`.`Line-ID`)
INNER JOIN `More Stuff`
ON .... )
INNER JOIN ....
ON ..... )
WHERE ....
GROUP BY .....
HAVING .....
The INNER JOINs defined by ... are many to 1, so Invoice Line VAT is on the many-side of both its JOIN relations.
The WHERE, GROUP BY and HAVING are semi-dynamically created in PHP code.
My problem is that I cannot get a proper SUM(Amount) and SUM(Quantity) at the same time, since the Quantity is added multiple times if there are multiple VAT rates to one invoice line.
SUM(DISTINCT Quantity) obviously doesn't work, since I need distinct rows, not values.
I cannot really create a subquery that either calculates the number of VAT rates (and divides the SUM(Quantity)), or calculates the Amount, since the subquery needs the same WHERE/HAVING parameters as the main query to work properly, and those are semi-dynamic (the queries are in a database and contain parameters that are filled in following the user's commands). Well, to be fair, I could do it, but it would leave the query-database and the php software extremely complicated, and I don't want to use a very complex solution for such a very simple problem, especially since someone else will have to maintain it in the future.
So how do I:
SUM the quantity only on distinct rows, or
COUNT the number of VAT rates per line, given the WHERE/HAVING (so without a subquery)?
I could add extra fields to the tables to help with this problem, but that possibility didn't help me - yet. For instance: storing the number of VAT rates doesn't help, since in the WHERE there may be a selection on VAT rate.
I hope it is something VERY simple that I overlooked, but I have been searching for hours now to no avail...
If anyone can help me that would be great! Thanks in advance!
EDIT: I found a solution, but I am not very pleased with it. I have to split up the WHERE, SUM the SUMs and repeat columns... It is UGLY and hard to maintain.
It is as follows:
SELECT `Inv.Nr`, ArticleID, SUM(Quantity), SUM(Amount), SUM(VAT)
FROM ((`Invoice` INNER JOIN `Invoice Line`
ON `Invoice`.`Inv.Nr`=`Invoice Line`.`Inv.Nr`)
INNER JOIN
(SELECT SUM(Amount) AS Amount, SUM(VAT) AS VAT, `Line-ID`
FROM ((`Invoice Line VAT`
INNER JOIN `More Stuff`
ON .... )
INNER JOIN ....
ON ..... )
WHERE some-where-stuff
GROUP BY `Line-ID`) x
ON `Invoice Line`.ID = x.`Line-ID`)
WHERE other-where-stuff
GROUP BY .....
HAVING .....
I hope someone got a more elegant, simpler solution!
In an update to the question, I answered the question myself. I said that I hoped for a less ugly, more maintainable solution than:
SELECT `Inv.Nr`, ArticleID, SUM(Quantity), SUM(Amount), SUM(VAT)
FROM ((`Invoice` INNER JOIN `Invoice Line`
ON `Invoice`.`Inv.Nr`=`Invoice Line`.`Inv.Nr`)
INNER JOIN
(SELECT SUM(Amount) AS Amount, SUM(VAT) AS VAT, `Line-ID`
FROM ((`Invoice Line VAT`
INNER JOIN `More Stuff`
ON .... )
INNER JOIN ....
ON ..... )
WHERE some-where-stuff
GROUP BY `Line-ID`) x
ON `Invoice Line`.ID = x.`Line-ID`)
WHERE other-where-stuff
GROUP BY .....
HAVING .....
Now that I am working with this solution and rephrasing all my queries based on it, it turns out not to be so humongous and ugly after all. It works quite well, and much better than the other solutions and workarounds I have tried. Because I guess there is no other solution than what I wrote, I am closing this question by stating that the answer cited above is the right one.
It turns out that using the correct SQL code instead of workarounds is the right way to go, even when it looks too complicated at first. And since there is nothing like SUM(DISTINCT ...) that works with distinct records instead of values, in this case the above code is the correct code.
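For anyone who wants to see the effect in isolation, here is a minimal sketch in Python's sqlite3 using the invoice line and VAT rows from the question. The table and column names are simplified (no spaces or dots) and the amounts are stored as integer cents to keep the arithmetic exact; both are my own assumptions for the example. The naive join counts the quantity once per VAT row, while the pre-aggregated derived table keeps one row per invoice line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice_line (id INTEGER, inv_nr INTEGER, quantity INTEGER);
CREATE TABLE invoice_line_vat (id INTEGER, line_id INTEGER,
                               amount_cents INTEGER, vat_cents INTEGER);
INSERT INTO invoice_line VALUES (1, 1, 1), (2, 1, 1), (3, 2, 4), (4, 2, 10);
INSERT INTO invoice_line_vat VALUES
  (1, 1, 200, 50), (2, 2, 200, 50), (3, 2, 142, 8),
  (4, 3, 1887, 123), (5, 4, 1600, 400), (6, 4, 3774, 226);
""")

# Naive join: a line with two VAT rows contributes its quantity twice.
naive = conn.execute("""
    SELECT l.inv_nr, SUM(l.quantity), SUM(v.amount_cents)
    FROM invoice_line l
    JOIN invoice_line_vat v ON v.line_id = l.id
    GROUP BY l.inv_nr
    ORDER BY l.inv_nr
""").fetchall()

# Pre-aggregate the VAT rows per line, then join: one row per line.
fixed = conn.execute("""
    SELECT l.inv_nr, SUM(l.quantity), SUM(x.amount_cents)
    FROM invoice_line l
    JOIN (SELECT line_id, SUM(amount_cents) AS amount_cents
          FROM invoice_line_vat
          GROUP BY line_id) x ON x.line_id = l.id
    GROUP BY l.inv_nr
    ORDER BY l.inv_nr
""").fetchall()

print(naive)  # [(1, 3, 542), (2, 24, 7261)]
print(fixed)  # [(1, 2, 542), (2, 14, 7261)]
```

The amounts agree in both versions; only the quantities are inflated by the naive join (3 and 24 instead of 2 and 14).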
Let's assume we have this very simple table:
|class |student|
---------------
Math Alice
Math Bob
Math Peter
Math Anne
Music Bob
Music Chris
Music Debbie
Music Emily
Music David
Sports Alice
Sports Chris
Sports Emily
.
.
.
Now I want to find out, who I have the most classes in common with.
So basically I want a query that gets as input a list of classes (some subset of all classes)
and returns a list like:
|student |common classes|
Brad 6
Melissa 4
Chris 3
Bob 3
.
.
.
What I'm doing right now is a single query for every class. Merging the results is done on the client side. This is very slow, because I am a very hardworking student and I'm attending around 1000 classes - and so do most of the other students. I'd like to reduce the transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I'd be glad if someone could give me some hints on how to do that.
(note: I'm using a MySQL cluster, because it's a very big school with 1 million classes and several million students)
UPDATE
Ok, it's obvious that I'm not a DB expert ;) Four times nearly the same answer means it's too easy.
Thank you anyway! I tested the following SQL statement and it's returning what I need, although it is very slow on the cluster (but that will be another question, I guess).
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
But actually I simplified my problem a bit too much, so let's make it a bit harder:
Some classes are more important than others, so they are weighted:
| class | importance |
Music 0.8
Math 0.7
Sports 0.01
English 0.5
...
Additionally, students can be more or less important.
(In case you're wondering what this is all about... it's an analogy. And it's getting worse. So please just accept that fact. It has to do with normalizing.)
|student | importance |
Bob 3.5
Anne 4.2
Chris 0.3
...
This means a simple COUNT() won't do it anymore.
In order to find out who I have the most in common with, I want to do the following:
map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
float myScoreForClassC = getMyScoreForClass(c);
List students = getStudentsAttendingClass(c);
foreach (Student s in students)
{
float studentScoreForClassC = c.classImportance*s.Importance;
studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
}
}
I hope it's not getting too confusing.
I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure, which classes I'm attending.
SELECT
tbl.student,
COUNT(tbl.class) AS common_classes
FROM
tbl
WHERE tbl.class IN (SELECT
sub.class
FROM
tbl AS sub
WHERE
(sub.student = "BEN")) -- substitute "BEN" as appropriate
GROUP BY tbl.student
ORDER BY common_classes DESC;
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
Update re your question update.
Assuming there's a table class_importance and student_importance as you describe above:
SELECT classes.student, SUM(ci.importance*si.importance) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
The only thing this doesn't have is the LEAST(weighted_importance, myScoreForClassC) because I don't know how you calculate that.
Supposing you have another table myScores:
class | score
Math 10
Sports 0
Music 0.8
...
You can combine it all like this (see the extra LEAST inside the SUM):
SELECT classes.student, SUM(LEAST(m.score,ci.importance*si.importance)) -- min
AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
LEFT JOIN myScores m ON classes.class=m.class -- add in myScores
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
If your myScores didn't have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score,defaultvalue).
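Here is a minimal sketch of the combined query running in Python's sqlite3, using the importance values and myScores rows quoted in this thread plus a few made-up attendance rows. SQLite has no LEAST, so the two-argument scalar MIN(a, b) stands in for it; on MySQL you would keep LEAST as written above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes (class TEXT, student TEXT);
CREATE TABLE class_importance (class TEXT, importance REAL);
CREATE TABLE student_importance (student TEXT, importance REAL);
CREATE TABLE myScores (class TEXT, score REAL);
-- Attendance rows are hypothetical; the weights come from the question.
INSERT INTO classes VALUES
  ('Math', 'Bob'), ('Math', 'Anne'),
  ('Music', 'Bob'), ('Music', 'Chris'), ('Sports', 'Chris');
INSERT INTO class_importance VALUES
  ('Music', 0.8), ('Math', 0.7), ('Sports', 0.01);
INSERT INTO student_importance VALUES
  ('Bob', 3.5), ('Anne', 4.2), ('Chris', 0.3);
INSERT INTO myScores VALUES ('Math', 10), ('Sports', 0), ('Music', 0.8);
""")

# Per class: take the smaller of my score and the student's weighted
# score, then add those per-class values up for each student.
rows = conn.execute("""
    SELECT classes.student,
           SUM(MIN(m.score, ci.importance * si.importance))
             AS weighted_importance
    FROM classes
    LEFT JOIN class_importance ci ON classes.class = ci.class
    LEFT JOIN student_importance si ON classes.student = si.student
    LEFT JOIN myScores m ON classes.class = m.class
    WHERE classes.class IN ('Math', 'Music', 'Sports')
    GROUP BY classes.student
    ORDER BY weighted_importance DESC
""").fetchall()

for student, score in rows:
    print(student, round(score, 2))
```

With these rows Bob ranks first (about 3.25), then Anne (about 2.94), then Chris (about 0.24), which is what the loop in the question would compute by hand.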
As I understand your question, you can simply run a query like this:
SELECT `student`, COUNT(`class`) AS `commonClasses`
FROM `classes_to_students`
WHERE `class` IN ('Math', 'Music', 'Sport')
GROUP BY `student`
ORDER BY `commonClasses` DESC
Do you need to specify the classes? Or could you just specify the student? Knowing the student would let you get their classes and then get the list of other students who share those classes.
SELECT
otherStudents.Student,
COUNT(*) AS sharedClasses
FROM
class_student_map AS myClasses
INNER JOIN
class_student_map AS otherStudents
ON otherStudents.class = myClasses.class
AND otherStudents.student != myClasses.student
WHERE
myClasses.student = 'Ben'
GROUP BY
otherStudents.Student
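A minimal sketch of this self-join in Python's sqlite3, loaded with the sample rows from the question and searching for 'Emily' instead of 'Ben' (who is not in the sample). The ORDER BY is my addition so the output comes back ranked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_student_map (class TEXT, student TEXT)")
conn.executemany(
    "INSERT INTO class_student_map VALUES (?, ?)",
    [
        ("Math", "Alice"), ("Math", "Bob"), ("Math", "Peter"), ("Math", "Anne"),
        ("Music", "Bob"), ("Music", "Chris"), ("Music", "Debbie"),
        ("Music", "Emily"), ("Music", "David"),
        ("Sports", "Alice"), ("Sports", "Chris"), ("Sports", "Emily"),
    ],
)

# myClasses picks the given student's rows; otherStudents finds everyone
# else sitting in the same class. Counting rows per other student gives
# the number of shared classes.
rows = conn.execute("""
    SELECT otherStudents.student, COUNT(*) AS sharedClasses
    FROM class_student_map AS myClasses
    INNER JOIN class_student_map AS otherStudents
        ON otherStudents.class = myClasses.class
       AND otherStudents.student != myClasses.student
    WHERE myClasses.student = ?
    GROUP BY otherStudents.student
    ORDER BY sharedClasses DESC, otherStudents.student
""", ("Emily",)).fetchall()

print(rows)
# [('Chris', 2), ('Alice', 1), ('Bob', 1), ('David', 1), ('Debbie', 1)]
```

Emily attends Music and Sports, and Chris is the only other student in both, so Chris comes out on top.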
EDIT
To follow up your edit, you just need to join on the new table and do your calculation.
Using the SQL example you gave in the edit...
SELECT
classes_table.student,
MIN(class_importance.importance * student_importance.importance) as `rank`
FROM
classes_table
INNER JOIN
class_importance
ON classes_table.class = class_importance.class
INNER JOIN
student_importance
ON classes_table.student = student_importance.student
WHERE
classes_table.class in (my_subject_list)
GROUP BY
classes_table.student
ORDER BY
2