I have had this question for a while now. After some practice in SQL, I started to ask myself: 'When is the right time to use NATURAL JOIN?'
Due to the enormous size of the database example that I'm using to practice my SQL skills I'm just going to put two sample queries here. Let's say I want to
Find, for each item, the total quantity sold by the departments on the second floor
The sample answer of this question is:
SELECT Item.ItemName, SUM(SaleQTY)
FROM Item INNER JOIN Sale INNER JOIN Department
ON Item.ItemID = Sale.ItemID
AND Department.DepartmentID = Sale.DepartmentID
WHERE DepartmentFloor = 2
GROUP BY Item.ItemName
ORDER BY Item.ItemName;
However when doing this question myself I only used NATURAL JOIN and here is my attempt:
SELECT Item.ItemName, SUM(SaleQTY)
FROM Item NATURAL JOIN SALE NATURAL JOIN Department
WHERE DepartmentFloor = 2
GROUP BY Item.ItemName
ORDER BY Item.ItemName
And it produced the exact same output as the sample answer:
ItemName SUM(SaleQTY)
Boots - snakeproof 2
Camel saddle 1
Elephant polo stick 1
Hat - polar explorer 3
Pith helmet 1
Pocket knife - Nile 2
Sextant 2
I understand that the explicit conditions in an INNER JOIN are there to ensure the integrity of the data, by eliminating any rows that do not satisfy them. But I'm still wondering: is NATURAL JOIN sufficient to solve this problem?
If not, what are some important rules to follow?
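One rule that is easy to check for yourself: NATURAL JOIN matches on every column name the two tables share, not only on the key columns. Here is a minimal sketch using Python's sqlite3 module with a cut-down, made-up version of the Item/Sale/Department tables (the sample rows and the extra Notes column are assumptions for illustration, not part of the practice database). Both queries agree while only the key columns share names, and the NATURAL JOIN silently returns nothing once an unrelated column with the same name appears in two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, ItemName TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentFloor INTEGER);
CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY, SaleQTY INTEGER,
                   ItemID INTEGER, DepartmentID INTEGER);
INSERT INTO Item VALUES (1, 'Sextant'), (2, 'Pith helmet');
INSERT INTO Department VALUES (10, 2), (20, 1);
INSERT INTO Sale VALUES (100, 2, 1, 10), (101, 1, 2, 10), (102, 5, 1, 20);
""")

EXPLICIT = """
SELECT Item.ItemName, SUM(SaleQTY)
FROM Item
INNER JOIN Sale ON Item.ItemID = Sale.ItemID
INNER JOIN Department ON Department.DepartmentID = Sale.DepartmentID
WHERE DepartmentFloor = 2
GROUP BY Item.ItemName
ORDER BY Item.ItemName
"""

NATURAL = """
SELECT Item.ItemName, SUM(SaleQTY)
FROM Item NATURAL JOIN Sale NATURAL JOIN Department
WHERE DepartmentFloor = 2
GROUP BY Item.ItemName
ORDER BY Item.ItemName
"""

# Only the key columns share names, so both queries return the same rows.
before_explicit = conn.execute(EXPLICIT).fetchall()
before_natural = conn.execute(NATURAL).fetchall()

# Hypothetical schema change: Item and Sale both gain an unrelated Notes column.
conn.executescript("""
ALTER TABLE Item ADD COLUMN Notes TEXT;
ALTER TABLE Sale ADD COLUMN Notes TEXT;
UPDATE Item SET Notes = 'catalogue entry';
UPDATE Sale SET Notes = 'paid in cash';
""")

# The explicit join is unaffected; the natural join now also requires
# Item.Notes = Sale.Notes, which never holds, so it returns no rows.
after_explicit = conn.execute(EXPLICIT).fetchall()
after_natural = conn.execute(NATURAL).fetchall()

print(before_natural, after_explicit, after_natural)
```

So NATURAL JOIN is fine for a quick query against a schema you know well, but an explicit ON (or USING) keeps the query stable when columns are added later.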
Related
I have a list of plant that can be filtered using a CONCAT. Originally the fields were just text, but I have converted them to IDs instead. Before I converted to IDs, it showed all records and could be filtered.
This involves 4 tables (with example data). The quotation marks are not used in the fields; they are just to show you that the value is a word.
plant
idplant example 1
plantname example "001 Forklift"
idplanttype1 example 1
idlocation1 example 1
iddepartment1 example 1
planttypes
idplanttype example 1
planttype example "Forklift Truck"
locations
idlocation example 1
location example "Preston"
departments
iddepartment example 1
department example "Waste Disposal"
Without the WHERE clause, it shows all records, including those with NULLs (but the filter doesn't work).
With the WHERE clause, it only shows complete records: records with no NULL fields show and the filter works, but records with NULLs do not show.
The issue seems to be the CONCAT. (I've cleaned up the parentheses, but had to add a 1 to make the IDs different.)
if(isset($_POST['search'])) {$valueToSearch = $_POST['valueToSearch'];}
$sql = "
SELECT idplant, plantname, planttype, location, department
FROM plant
LEFT JOIN planttypes ON idplanttype1 = idplanttype
LEFT JOIN locations ON idlocation1 = idlocation
LEFT JOIN departments ON iddepartment1 = iddepartment
WHERE CONCAT(plantname, planttype, location, department) LIKE
'%".$valueToSearch."%'
ORDER BY plantname";
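SOLUTION
The above code works; the WHERE clause was just missing CONCAT_WS in place of CONCAT (WHERE CONCAT_WS instead of WHERE CONCAT). CONCAT returns NULL as soon as any argument is NULL, so rows with a NULL field could never match the LIKE, whereas CONCAT_WS skips NULL arguments.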
I'm new to Joins, so any help would be greatly appreciated.
Edit: Using Linux Server - Apache Version 2.4.46
Thanks in advance!
Your problem is probably blanks.
WHERE CONCAT(plantname, planttype, location, department)
LIKE '%001 Forklift Forklift Truck Preston Waste Disposal%'
won't find anything, for example, because the concatenated strings result in '001 ForkliftForklift TruckPrestonWaste Disposal', not '001 Forklift Forklift Truck Preston Waste Disposal'.
You want blanks between the substrings, which is easiest to achieve with CONCAT_WS:
SELECT p.idplant, p.plantname, pt.planttype, l.location, d.department
FROM plant p
INNER JOIN planttypes pt ON pt.idplanttype = p.idplanttype1
INNER JOIN locations l ON l.idlocation = p.idlocation1
INNER JOIN departments d ON d.iddepartment = p.iddepartment1
WHERE CONCAT_WS(' ', p.plantname, pt.planttype, l.location, d.department)
LIKE '%001 Forklift Forklift Truck Preston Waste Disposal%'
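Separately from the blanks, CONCAT and CONCAT_WS also treat NULL differently, which matters once LEFT JOIN rows with missing lookups are involved. Below is a rough sketch in Python's sqlite3 module; since SQLite is not MySQL, the two functions are emulated by hand-written Python functions that follow MySQL's documented NULL behaviour (that emulation, the reduced two-table schema and the sample rows are all assumptions for illustration). It also binds the search term as a parameter rather than pasting it into the SQL string.

```python
import sqlite3

def concat(*parts):
    # Emulates MySQL CONCAT: the result is NULL if any argument is NULL.
    if any(p is None for p in parts):
        return None
    return "".join(str(p) for p in parts)

def concat_ws(sep, *parts):
    # Emulates MySQL CONCAT_WS: NULL arguments after the separator are skipped.
    if sep is None:
        return None
    return sep.join(str(p) for p in parts if p is not None)

conn = sqlite3.connect(":memory:")
conn.create_function("CONCAT", -1, concat)
conn.create_function("CONCAT_WS", -1, concat_ws)
conn.executescript("""
CREATE TABLE plant (idplant INTEGER, plantname TEXT, idplanttype1 INTEGER);
CREATE TABLE planttypes (idplanttype INTEGER, planttype TEXT);
INSERT INTO planttypes VALUES (1, 'Forklift Truck');
-- plant 2 has no plant type, so the LEFT JOIN yields a NULL planttype
INSERT INTO plant VALUES (1, '001 Forklift', 1), (2, '002 Forklift', NULL);
""")

def search(expression, term):
    # expression is a fixed SQL fragment; only the search term is user input,
    # and it is passed as a bound parameter.
    sql = f"""
        SELECT plantname
        FROM plant
        LEFT JOIN planttypes ON idplanttype1 = idplanttype
        WHERE {expression} LIKE ?
        ORDER BY plantname"""
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]

with_concat = search("CONCAT(plantname, planttype)", "Forklift")
with_concat_ws = search("CONCAT_WS(' ', plantname, planttype)", "Forklift")

print(with_concat)     # the row with a NULL planttype is missing
print(with_concat_ws)  # both rows are found
```

Note that switching the LEFT JOINs to INNER JOINs would drop the rows with NULL lookups regardless of which function is used.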
I did a test today and there were 2 questions I couldn't figure out. I forgot the second one, but here is the first:
You have a database about beer. Three tables, only 2 relevant to the question. These are:
Varieties:
variety_id
variety_name
Beers:
beer_id
beer_name
variety_id
beer_alcohol
beer_alcohol is a double, representing the alcohol percentage.
There were 38 varieties of beer and 1215 individual beer entries.
The question was: Display all individual varieties of beer, per variety the highest alcohol percentage and also the name of the beer that has this percentage.
At first sight, this is an "inner join" on the variety_id, a "max()" on the alcohol and a "group by" on the variety_id/variety_name.
The problem is, this won't display the name of the beer with the highest % alcohol of its variety. It will display the alphabetically first beer of its variety.
And I racked my brain over it, but I can't begin to imagine how to do this without a function.
Can someone enlighten me?
Do one more join back to the beers table, joining on variety_id and the alcohol percentage. Doing this on the outside gets the correct info.
This will bring 2 results back if there are beers with equal percentages.
You could also sub-select, eg
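SELECT v.variety_id, v.variety_name, b.beer_name, booziest.abv
FROM varieties v
JOIN (
    -- highest alcohol percentage per variety
    SELECT variety_id, MAX(beer_alcohol) AS abv
    FROM beers
    GROUP BY variety_id
) booziest ON booziest.variety_id = v.variety_id
-- join back to beers to pick up the name of the beer with that percentage
JOIN beers b ON b.variety_id = booziest.variety_id
            AND b.beer_alcohol = booziest.abv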
I usually like to have sample data and sample output but let's blindly answer the question.
This is a greatest-n-per-group question (with n = 1). You can solve this with a derived table with a group by or, my personal favourite, the left join:
SELECT v.variety_id, v.variety_name, b1.* FROM beers b1
JOIN varieties v ON b1.variety_id = v.variety_id
LEFT JOIN beers b2
ON b1.variety_id = b2.variety_id AND b1.beer_alcohol < b2.beer_alcohol
WHERE b2.beer_alcohol IS NULL
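Since there was no sample data, here is a small sketch of the left-join approach run through Python's sqlite3 module. The beer and variety names are invented for illustration. A beer survives the WHERE only if no other beer of the same variety has a strictly higher percentage, so ties come back as several rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE varieties (variety_id INTEGER PRIMARY KEY, variety_name TEXT);
CREATE TABLE beers (beer_id INTEGER PRIMARY KEY, beer_name TEXT,
                    variety_id INTEGER, beer_alcohol REAL);
-- made-up sample rows
INSERT INTO varieties VALUES (1, 'Stout'), (2, 'Pilsner');
INSERT INTO beers VALUES
    (1, 'Alpha Stout',  1, 4.5),
    (2, 'Zulu Stout',   1, 9.0),
    (3, 'Bravo Pils',   2, 5.0),
    (4, 'Yankee Pils',  2, 5.0),
    (5, 'Charlie Pils', 2, 4.2);
""")

rows = conn.execute("""
    SELECT v.variety_name, b1.beer_name, b1.beer_alcohol
    FROM beers b1
    JOIN varieties v ON b1.variety_id = v.variety_id
    -- b2 matches only beers of the same variety that are stronger than b1
    LEFT JOIN beers b2
           ON b1.variety_id = b2.variety_id
          AND b1.beer_alcohol < b2.beer_alcohol
    -- no stronger beer found, so b1 is (one of) the strongest of its variety
    WHERE b2.beer_id IS NULL
    ORDER BY v.variety_name, b1.beer_name
""").fetchall()

for row in rows:
    print(row)
```

With this data the strongest stout is not the alphabetically first one, and the two pilsners tied at 5.0 are both returned.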
I'm working on an EAV database implemented in MySQL, so when I say entity, you can read that as table. Since it's a non-relational database I cannot provide any SQL for tables etc., but I'm hoping to get the conceptual answer for a relational database and I will translate it to EAV SQL myself.
I'm building a mini stock market system. There is an "asset" entity that can have many "demand" and "offer" entities. The asset entity may also have many "deal" entities. Each deal entity has a "share_price" attribute. Not all assets have demand, offer or deal entities.
I want to return a list of offer and demand entities, grouped by asset i.e. if an asset has 2 offers and 3 demands only 1 result will show. This must be sorted by the highest share_price of deals attached to assets of the demand or offer. Then, the highest share_price for each demand or offer is sorted overall. If an asset has demands or offers but no deals, it will be returned with NULL for share_price.
So say the data is like this:
Asset 1 has 1 offer, 1 demand and 2 deals with share_price 7.50 and 12.00
Asset 2 has 1 offer and 1 deal with share_price 8.00
Asset 3 has 3 offers and 3 demands and no deals
Asset 4 has no offers and no demand and 1 deal with share_price 13.00
I want the results:
Asset share_price
Asset 1 12.00
Asset 2 8.00
Asset 3 null
Note: Asset 4 is not in the result set because it has no offers or demands.
I know this is a complex one, but I really don't want to have to go to the database more than once or do any array re-ordering in PHP. Any help greatly appreciated.
Some users want to see the SQL I have. Here it is, but it won't make too much sense as it's a specialised EAV database.
SELECT DISTINCT data.asset_guid, r.guid_two, data.share_price FROM (
select rr.guid_one as asset_guid, max(msv.string) as share_price from market_entities ee
join market_entity_relationships rr on ee.guid = rr.guid_two
JOIN market_metadata as mt on ee.guid = mt.entity_guid
JOIN market_metastrings as msn on mt.name_id = msn.id
JOIN market_metastrings as msv on mt.value_id = msv.id
where subtype = 6 and msn.string = 'share_price' and rr.relationship = 'asset_deal'
group by
rr.guid_one
) data
left outer JOIN market_entities e on e.guid = data.asset_guid
left outer JOIN market_entity_relationships r on r.guid_one = e.guid
WHERE r.relationship = 'trade_share'
GROUP BY data.asset_guid
Without fully understanding your table structure (you should post that), looks like you just need to use a single LEFT JOIN, with GROUP BY and MAX:
SELECT a.assetname, MAX(d.share_price)
FROM asset a
LEFT JOIN deal d ON a.AssetId = d.AssetId
GROUP BY a.assetname
ORDER BY MAX(d.share_price) DESC
I'm using the assumption that your Asset table and your Deal table have a common key, in the above case, AssetId. Not sure why you'd need to join on Demand or Offer, unless those link to your Deal table. Posting your table structure would alleviate that concern...
--EDIT--
In regards to your comments, you want to only show the assets which have either an offer or a demand? If so, this should work:
SELECT a.assetname, MAX(d.share_price)
FROM asset a
LEFT JOIN deal d ON a.AssetId = d.AssetId
LEFT JOIN offer o ON o.AssetId = a.AssetId
LEFT JOIN demand de ON de.AssetId = a.AssetId
WHERE o.AssetId IS NOT NULL OR de.AssetId IS NOT NULL
GROUP BY a.assetname
ORDER BY MAX(d.share_price) DESC
This will only include the asset if it has at least an offer or at least a demand.
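To check the idea against the sample data from the question, here is a sketch using Python's sqlite3 module. The relational table and column names are the assumed ones from this answer, not the real EAV schema. It swaps the offer and demand joins for EXISTS tests, a variation on the query above that avoids multiplying rows when an asset has several offers and demands, and it tests them against the asset rather than the deal so that assets without deals are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (AssetId INTEGER PRIMARY KEY, assetname TEXT);
CREATE TABLE deal (AssetId INTEGER, share_price REAL);
CREATE TABLE offer (AssetId INTEGER);
CREATE TABLE demand (AssetId INTEGER);
INSERT INTO asset VALUES (1, 'Asset 1'), (2, 'Asset 2'), (3, 'Asset 3'), (4, 'Asset 4');
INSERT INTO deal VALUES (1, 7.50), (1, 12.00), (2, 8.00), (4, 13.00);
INSERT INTO offer VALUES (1), (2), (3), (3), (3);
INSERT INTO demand VALUES (1), (3), (3), (3);
""")

rows = conn.execute("""
    SELECT a.assetname, MAX(d.share_price) AS share_price
    FROM asset a
    -- LEFT JOIN keeps assets that have no deals (share_price becomes NULL)
    LEFT JOIN deal d ON d.AssetId = a.AssetId
    -- keep only assets with at least one offer or one demand
    WHERE EXISTS (SELECT 1 FROM offer o WHERE o.AssetId = a.AssetId)
       OR EXISTS (SELECT 1 FROM demand de WHERE de.AssetId = a.AssetId)
    GROUP BY a.assetname
    ORDER BY MAX(d.share_price) DESC
""").fetchall()

for row in rows:
    print(row)
```

Asset 4 drops out because it has neither offers nor demands, and Asset 3 sorts last with a NULL share_price (both SQLite and MySQL put NULLs last in a descending sort).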
Assuming you have 3 tables, assets, offers and shares, you can use a query like the one below.
SELECT asset, MAX(share_Price)
FROM assets
INNER JOIN offers ON assets.id = offers.id -- requires there are offers
LEFT OUTER JOIN shares ON assets.id = shares.id -- returns results even if no shares
GROUP BY asset
ORDER BY asset
Let's assume we have this very simple table:
|class |student|
---------------
Math Alice
Math Bob
Math Peter
Math Anne
Music Bob
Music Chris
Music Debbie
Music Emily
Music David
Sports Alice
Sports Chris
Sports Emily
.
.
.
Now I want to find out, who I have the most classes in common with.
So basically I want a query that gets as input a list of classes (some subset of all classes)
and returns a list like:
|student |common classes|
Brad 6
Melissa 4
Chris 3
Bob 3
.
.
.
What I'm doing right now is a single query for every class. Merging the results is done on the client side. This is very slow, because I am a very hardworking student and I'm attending around 1000 classes - and so do most of the other students. I'd like to reduce the transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I'd be glad if someone could give me some hints on how to do that.
(note: I'm using a MySQL cluster, because it's a very big school with 1 million classes and several million students)
UPDATE
OK, it's obvious that I'm not a DB expert ;) Getting nearly the same answer 4 times means it's too easy.
Thank you anyway! I tested the following SQL statement and it's returning what I need, although it is very slow on the cluster (but that will be another question, I guess).
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
But actually I simplified my problem a bit too much, so let's make it a bit harder:
Some classes are more important than others, so they are weighted:
| class | importance |
Music 0.8
Math 0.7
Sports 0.01
English 0.5
...
Additionally, students can be more or less important.
(In case you're wondering what this is all about... it's an analogy. And it's getting worse. So please just accept that fact. It has to do with normalizing.)
|student | importance |
Bob 3.5
Anne 4.2
Chris 0.3
...
This means a simple COUNT() won't do it anymore.
In order to find out who I have the most in common with, I want to do the following:
map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
float myScoreForClassC = getMyScoreForClass(c);
List students = getStudentsAttendingClass(c);
foreach (Student s in students)
{
float studentScoreForClassC = c.classImportance*s.Importance;
studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
}
}
I hope it's not getting too confusing.
I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure, which classes I'm attending.
SELECT
tbl.student,
COUNT(tbl.class) AS common_classes
FROM
tbl
WHERE tbl.class IN (SELECT
sub.class
FROM
tbl AS sub
WHERE
(sub.student = "BEN")) -- substitute "BEN" as appropriate
GROUP BY tbl.student
ORDER BY common_classes DESC;
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
Update re your question update.
Assuming there's a table class_importance and student_importance as you describe above:
SELECT classes.student, SUM(ci.importance*si.importance) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
The only thing this doesn't have is the LEAST(weighted_importance, myScoreForClassC) because I don't know how you calculate that.
Supposing you have another table myScores:
class | score
Math 10
Sports 0
Music 0.8
...
You can combine it all like this (see the extra LEAST inside the SUM):
SELECT classes.student, SUM(LEAST(m.score,ci.importance*si.importance)) -- min
AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
LEFT JOIN myScores m ON classes.class=m.class -- add in myScores
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
If your myScores didn't have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score,defaultvalue).
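To see the numbers come out, here is a sketch of that last query run with Python's sqlite3 module on the sample weights from the question. SQLite has no LEAST, so a two-argument Python stand-in is registered under that name (an assumption made so the SQL can stay as written above); the rows in classes are a small made-up subset, and my_subject_list is passed in as bound parameters.

```python
import sqlite3

def least(a, b):
    # Stand-in for MySQL LEAST with two arguments: NULL if either one is NULL.
    if a is None or b is None:
        return None
    return min(a, b)

conn = sqlite3.connect(":memory:")
conn.create_function("LEAST", 2, least)
conn.executescript("""
CREATE TABLE classes (class TEXT, student TEXT);
CREATE TABLE class_importance (class TEXT, importance REAL);
CREATE TABLE student_importance (student TEXT, importance REAL);
CREATE TABLE myScores (class TEXT, score REAL);
INSERT INTO classes VALUES ('Math', 'Bob'), ('Math', 'Anne'),
    ('Music', 'Bob'), ('Music', 'Chris'), ('Sports', 'Chris');
INSERT INTO class_importance VALUES ('Music', 0.8), ('Math', 0.7), ('Sports', 0.01);
INSERT INTO student_importance VALUES ('Bob', 3.5), ('Anne', 4.2), ('Chris', 0.3);
INSERT INTO myScores VALUES ('Math', 10), ('Sports', 0), ('Music', 0.8);
""")

my_subject_list = ["Math", "Music", "Sports"]
placeholders = ", ".join("?" for _ in my_subject_list)

ranking = conn.execute(f"""
    SELECT classes.student,
           SUM(LEAST(m.score, ci.importance * si.importance)) AS weighted_importance
    FROM classes
    LEFT JOIN class_importance ci ON classes.class = ci.class
    LEFT JOIN student_importance si ON classes.student = si.student
    LEFT JOIN myScores m ON classes.class = m.class
    WHERE classes.class IN ({placeholders})
    GROUP BY classes.student
    ORDER BY weighted_importance DESC
""", my_subject_list).fetchall()

for student, score in ranking:
    print(student, round(score, 3))
```

For example, Bob's Math term is min(10, 0.7 * 3.5) and his Music term is min(0.8, 0.8 * 3.5), so his total is about 3.25, which puts him ahead of Anne and Chris.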
As I understand your question, you can simply run a query like this:
SELECT `student`, COUNT(`class`) AS `commonClasses`
FROM `classes_to_students`
WHERE `class` IN ('Math', 'Music', 'Sport')
GROUP BY `student`
ORDER BY `commonClasses` DESC
Do you need to specify the classes? Or could you just specify the student? Knowing the student would let you get their classes and then get the list of other students who share those classes.
SELECT
otherStudents.Student,
COUNT(*) AS sharedClasses
FROM
class_student_map AS myClasses
INNER JOIN
class_student_map AS otherStudents
ON otherStudents.class = myClasses.class
AND otherStudents.student != myClasses.student
WHERE
myClasses.student = 'Ben'
GROUP BY
otherStudents.Student
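A quick way to try the self-join is Python's sqlite3 module with a handful of rows. The table name class_student_map is the assumed one from the query above, the rows are a made-up subset of the question's table, and an ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class_student_map (class TEXT, student TEXT);
INSERT INTO class_student_map VALUES
    ('Math', 'Alice'), ('Math', 'Bob'),
    ('Music', 'Bob'), ('Music', 'Chris'), ('Music', 'Emily'),
    ('Sports', 'Alice'), ('Sports', 'Chris'), ('Sports', 'Emily');
""")

rows = conn.execute("""
    SELECT otherStudents.student, COUNT(*) AS sharedClasses
    FROM class_student_map AS myClasses
    INNER JOIN class_student_map AS otherStudents
            ON otherStudents.class = myClasses.class
           AND otherStudents.student != myClasses.student
    WHERE myClasses.student = ?
    GROUP BY otherStudents.student
    ORDER BY sharedClasses DESC, otherStudents.student
""", ("Emily",)).fetchall()

for row in rows:
    print(row)
```

Emily attends Music and Sports here, so Chris (in both) shares two classes, while Alice and Bob share one each.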
EDIT
To follow up your edit, you just need to join on the new table and do your calculation.
Using the SQL example you gave in the edit...
SELECT
classes_table.student,
MIN(class_importance.importance * student_importance.importance) as `rank` -- RANK is a reserved word in MySQL 8, hence the backticks
FROM
classes_table
INNER JOIN
class_importance
ON classes_table.class = class_importance.class
INNER JOIN
student_importance
ON classes_table.student = student_importance.student
WHERE
classes_table.class in (my_subject_list)
GROUP BY
classes_table.student
ORDER BY
2
I have a custom shop, and I need to redo the shipping. However, that is for some time later, and in the meantime I need to add a shipping option for when a cart only contains a certain range of products.
So there is a ship_method table:
id  menuname       name       zone  maxweight
1   UK Standard    ukfirst    1     2000
2   UK Economy     uksecond   1     750
3   Worldwide Air  world_air  4     2000
To this I have added another column prod_restrict which is 0 for the existing ones, and 1 for the restricted ones, and a new table called ship_prod_restrict which contains two columns, ship_method_id and item_id, listing what products are allowed in a shipping category.
So all I need to do is look in my transactions, and for each cart, just check which shipping methods are either prod_restrict of 0 or have 1 and have no products in the cart that aren't in the restriction table.
Unfortunately, it seems that because you can't pass values from an outer query to an inner one, I can't find a neat way of doing it. (Edited to show the full query due to comments below.)
select ship_method.* from ship_method, ship_prod_restrict where
ship_method.`zone` = 1 and prod_restrict='0' or
(
prod_restrict='1'
and ship_method.id = ship_prod_restrict.ship_method_id
and (
select count(*) from (
select transactions.item from transactions
LEFT JOIN ship_prod_restrict
on ship_prod_restrict.item_id = transactions.item
and ship_prod_restrict.ship_method_id=XXXXX
where transactions.session='shoppingcartsessionid'
and item_id is null
) as non_permitted_items < 1 )
group by ship_method.id
gives you a list of whether the section matches or not, and works as an inner query but I can't get that ship_method_id in there (at XXXXX).
Is there a simple way of doing this, or am I going about it the wrong way? I can't currently change the primary shipping table, as this is already in place for now, but the other bits can change. I could also do it within PHP but you know, that seems like cheating!
Not sure how the count is important, but this might be a bit lighter - hard to tell without a full table schema dump:
SELECT COUNT(t.item) FROM transactions t
INNER JOIN ship_prod_restrict r
ON r.item_id = t.item
WHERE t.session = 'foo'
AND r.ship_method_id IN (**restricted, id's, here**)
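Another option, if the goal is the list of usable shipping methods per cart, is a correlated NOT EXISTS: a subquery in the WHERE clause can refer to the outer ship_method row directly, unlike a derived table in FROM. This is only a sketch under assumptions, run through Python's sqlite3 module: the columns are cut down to the ones described in the question, and the restricted method uk_letter, the item ids and the session names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ship_method (id INTEGER PRIMARY KEY, name TEXT, zone INTEGER,
                          prod_restrict INTEGER);
CREATE TABLE ship_prod_restrict (ship_method_id INTEGER, item_id INTEGER);
CREATE TABLE transactions (session TEXT, item INTEGER);
INSERT INTO ship_method VALUES
    (1, 'ukfirst', 1, 0), (2, 'uksecond', 1, 0), (3, 'world_air', 4, 0),
    (4, 'uk_letter', 1, 1);           -- hypothetical restricted method
INSERT INTO ship_prod_restrict VALUES (4, 10), (4, 11);
INSERT INTO transactions VALUES
    ('cart_a', 10), ('cart_a', 11),   -- only permitted items
    ('cart_b', 10), ('cart_b', 99);   -- item 99 is not permitted for method 4
""")

QUERY = """
SELECT m.name
FROM ship_method m
WHERE m.zone = 1
  AND (m.prod_restrict = 0
       -- restricted method: allowed only if the cart holds no item
       -- that is missing from the method's restriction list
       OR NOT EXISTS (
            SELECT 1
            FROM transactions t
            WHERE t.session = ?
              AND NOT EXISTS (
                    SELECT 1
                    FROM ship_prod_restrict r
                    WHERE r.ship_method_id = m.id
                      AND r.item_id = t.item)))
ORDER BY m.id
"""

def methods_for(session):
    return [row[0] for row in conn.execute(QUERY, (session,))]

print(methods_for("cart_a"))
print(methods_for("cart_b"))
```

One thing to be aware of with this shape: an empty cart passes the NOT EXISTS test trivially, so restricted methods would be offered for it unless that case is handled separately.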