Let's assume we have this very simple table:
|class |student|
---------------
Math Alice
Math Bob
Math Peter
Math Anne
Music Bob
Music Chris
Music Debbie
Music Emily
Music David
Sports Alice
Sports Chris
Sports Emily
.
.
.
Now I want to find out who I have the most classes in common with.
So basically I want a query that gets as input a list of classes (some subset of all classes)
and returns a list like:
|student |common classes|
Brad 6
Melissa 4
Chris 3
Bob 3
.
.
.
What I'm doing right now is a single query for every class. Merging the results is done on the client side. This is very slow, because I am a very hardworking student and I'm attending around 1000 classes - and so do most of the other students. I'd like to reduce the transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I'd be glad if someone could give me some hints on how to do that.
(note: I'm using a MySQL cluster, because it's a very big school with 1 million classes and several million students)
UPDATE
Ok, it's obvious that I'm not a DB expert ;) Four nearly identical answers mean it's too easy.
Thank you anyway! I tested the following SQL statement and it's returning what I need, although it is very slow on the cluster (but that will be another question, I guess).
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
But actually I simplified my problem a bit too much, so let's make it a bit harder:
Some classes are more important than others, so they are weighted:
| class | importance |
Music 0.8
Math 0.7
Sports 0.01
English 0.5
...
Additionally, students can be more or less important.
(In case you're wondering what this is all about... it's an analogy. And it's getting worse. So please just accept that fact. It has to do with normalizing.)
|student | importance |
Bob 3.5
Anne 4.2
Chris 0.3
...
This means a simple COUNT() won't do it anymore.
In order to find out who I have the most in common with, I want to do the following:
map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
float myScoreForClassC = getMyScoreForClass(c);
List students = getStudentsAttendingClass(c);
foreach (Student s in students)
{
float studentScoreForClassC = c.classImportance*s.Importance;
studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
}
}
I hope it's not getting too confusing.
I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure, which classes I'm attending.
SELECT
tbl.student,
COUNT(tbl.class) AS common_classes
FROM
tbl
WHERE tbl.class IN (SELECT
sub.class
FROM
tbl AS sub
WHERE
(sub.student = "BEN")) -- substitute "BEN" as appropriate
GROUP BY tbl.student
ORDER BY common_classes DESC;
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
Update re your question update.
Assuming there's a table class_importance and student_importance as you describe above:
SELECT classes.student, SUM(ci.importance*si.importance) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
The only thing this doesn't have is the LEAST(weighted_importance, myScoreForClassC) because I don't know how you calculate that.
Supposing you have another table myScores:
class | score
Math 10
Sports 0
Music 0.8
...
You can combine it all like this (see the extra LEAST inside the SUM):
SELECT classes.student, SUM(LEAST(m.score,ci.importance*si.importance)) -- min
AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
LEFT JOIN myScores m ON classes.class=m.class -- add in myScores
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
If your myScores didn't have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score,defaultvalue).
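To make the combined query easier to try out, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The enrolment rows are hypothetical sample data; the importance values and myScores values are the ones quoted in this thread. SQLite has no LEAST, so the sketch assumes its two-argument MIN(a, b) as a stand-in; on MySQL you would keep LEAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names follow the answer above. The enrolment rows are hypothetical.
conn.executescript("""
    CREATE TABLE classes (class TEXT, student TEXT);
    CREATE TABLE class_importance (class TEXT, importance REAL);
    CREATE TABLE student_importance (student TEXT, importance REAL);
    CREATE TABLE myScores (class TEXT, score REAL);

    INSERT INTO classes VALUES
        ('Math', 'Bob'), ('Math', 'Anne'),
        ('Music', 'Bob'), ('Music', 'Chris'), ('Sports', 'Chris');
    INSERT INTO class_importance VALUES
        ('Music', 0.8), ('Math', 0.7), ('Sports', 0.01);
    INSERT INTO student_importance VALUES
        ('Bob', 3.5), ('Anne', 4.2), ('Chris', 0.3);
    INSERT INTO myScores VALUES ('Math', 10), ('Sports', 0), ('Music', 0.8);
""")

my_classes = ["Math", "Music", "Sports"]
# One "?" per class keeps the IN list parameterized.
placeholders = ", ".join("?" for _ in my_classes)

# SQLite's two-argument MIN(a, b) plays the role of MySQL's LEAST(a, b).
sql = f"""
    SELECT c.student,
           SUM(MIN(m.score, ci.importance * si.importance)) AS weighted_importance
    FROM classes c
    LEFT JOIN class_importance ci ON c.class = ci.class
    LEFT JOIN student_importance si ON c.student = si.student
    LEFT JOIN myScores m ON c.class = m.class
    WHERE c.class IN ({placeholders})
    GROUP BY c.student
    ORDER BY weighted_importance DESC
"""
weighted = conn.execute(sql, my_classes).fetchall()
for student, score in weighted:
    print(student, round(score, 3))
```

With this sample data Bob ranks first, then Anne, then Chris, because each class contributes the smaller of my score and the student's weighted score.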
As I understand your question, you can simply run a query like this:
SELECT `student`, COUNT(`class`) AS `commonClasses`
FROM `classes_to_students`
WHERE `class` IN ('Math', 'Music', 'Sports')
GROUP BY `student`
ORDER BY `commonClasses` DESC
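If you want to check the counting query without a MySQL server, a rough sketch in Python with an in-memory SQLite database could look like this. The helper name common_classes is my own invention, and the rows are the sample rows from the question (reading "Chis" as "Chris").

```python
import sqlite3

def common_classes(conn, my_classes):
    """Return (student, shared class count) rows, highest count first."""
    # One "?" placeholder per class keeps the IN list parameterized.
    placeholders = ", ".join("?" for _ in my_classes)
    sql = f"""
        SELECT student, COUNT(class) AS commonClasses
        FROM classes_to_students
        WHERE class IN ({placeholders})
        GROUP BY student
        ORDER BY commonClasses DESC, student
    """
    return conn.execute(sql, list(my_classes)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classes_to_students (class TEXT, student TEXT)")
conn.executemany(
    "INSERT INTO classes_to_students VALUES (?, ?)",
    [("Math", "Alice"), ("Math", "Bob"), ("Math", "Peter"), ("Math", "Anne"),
     ("Music", "Bob"), ("Music", "Chris"), ("Music", "Debbie"),
     ("Music", "Emily"), ("Music", "David"),
     ("Sports", "Alice"), ("Sports", "Chris"), ("Sports", "Emily")],
)

# Suppose I attend Math and Sports.
ranking = common_classes(conn, ["Math", "Sports"])
for student, shared in ranking:
    print(student, shared)
```

The extra "student" sort key is only there to make ties come back in a stable order.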
Do you need to specify the classes? Or could you just specify the student? Knowing the student would let you get their classes and then get the list of other students who share those classes.
SELECT
otherStudents.Student,
COUNT(*) AS sharedClasses
FROM
class_student_map AS myClasses
INNER JOIN
class_student_map AS otherStudents
ON otherStudents.class = myClasses.class
AND otherStudents.student != myClasses.student
WHERE
myClasses.student = 'Ben'
GROUP BY
otherStudents.Student
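As a quick sanity check of the self-join idea, here is a small sketch in Python with SQLite. It assumes that 'Ben' is stored in class_student_map like everyone else, and all rows are made-up sample data. I added an ORDER BY so the best match comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_student_map (class TEXT, student TEXT)")
# Hypothetical rows: Ben attends Math and Music.
conn.executemany(
    "INSERT INTO class_student_map VALUES (?, ?)",
    [("Math", "Ben"), ("Music", "Ben"),
     ("Math", "Alice"), ("Math", "Bob"), ("Music", "Bob"),
     ("Music", "Chris"), ("Sports", "Chris")],
)

# Join Ben's rows to everyone else's rows for the same class, then count.
sql = """
    SELECT other.student, COUNT(*) AS sharedClasses
    FROM class_student_map AS mine
    INNER JOIN class_student_map AS other
        ON other.class = mine.class
       AND other.student != mine.student
    WHERE mine.student = ?
    GROUP BY other.student
    ORDER BY sharedClasses DESC, other.student
"""
shared = conn.execute(sql, ("Ben",)).fetchall()
print(shared)
```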
EDIT
To follow up your edit, you just need to join on the new table and do your calculation.
Using the SQL example you gave in the edit...
SELECT
classes_table.student,
MIN(class_importance.importance * student_importance.importance) AS student_rank -- RANK is a reserved word in MySQL 8
FROM
classes_table
INNER JOIN
class_importance
ON classes_table.class = class_importance.class
INNER JOIN
student_importance
ON classes_table.student = student_importance.student
WHERE
classes_table.class in (my_subject_list)
GROUP BY
classes_table.student
ORDER BY
2
The following table holds the student's name, the student's lector (one lector can have several students), and the student's mark after the course.
Stud Lector Mark
-----------------
Joe Mr.A 5
Steve Mr.A 4
Bob Mr.B 5
Jim Mr.D 5
Kai Mr.C 4
Mo Mr.A 3
Hue Mr.B 3
Mia Mr.D 5
What query will return the lector(s) whose students ALL passed the course excellently (got a mark of 5)? In our case, Mr.D should be returned as the query result.
I think the easiest approach is via aggregation on the lector:
SELECT Lector
FROM yourTable
GROUP BY Lector
HAVING SUM(CASE WHEN Mark < 5 THEN 1 ELSE 0 END) = 0;
The nice thing about this approach is that it reads closely according to the actual logic you want to implement. That is, we simply check the marks for each lector and make sure that no sub-5 marks occurred.
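For anyone who wants to try it, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database via Python and runs the aggregation. The table name yourTable is the placeholder used above; the ORDER BY is only added to keep the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Stud TEXT, Lector TEXT, Mark INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("Joe", "Mr.A", 5), ("Steve", "Mr.A", 4), ("Bob", "Mr.B", 5),
     ("Jim", "Mr.D", 5), ("Kai", "Mr.C", 4), ("Mo", "Mr.A", 3),
     ("Hue", "Mr.B", 3), ("Mia", "Mr.D", 5)],
)

# Count the sub-5 marks per lector and keep lectors where that count is zero.
sql = """
    SELECT Lector
    FROM yourTable
    GROUP BY Lector
    HAVING SUM(CASE WHEN Mark < 5 THEN 1 ELSE 0 END) = 0
    ORDER BY Lector
"""
excellent = [row[0] for row in conn.execute(sql)]
print(excellent)
```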
You have to join the table to itself; once to get the excellent marks and again to get the potentially less than excellent marks. There are a few ways to achieve this, but the best IMHO is this way:
select distinct a.lector
from mytable a
left join mytable b on a.lector = b.lector
and b.mark != 5
where a.mark = 5
and b.mark is null
The trick here is using an outer join from excellent marks to worse marks and using b.mark is null to ensure there are in fact no worse marks.
I am very new to SQL. I have done some basic "select from where ..." queries but I struggle with my current project.
Lets say this is my source table:
Project Involved
1 Harald
1 Kerstin
1 Peter
1 Christian
1 Lisa
1 Linda
2 Sören
2 Schmidt
2 Jörg
2 Robert
2 Harald
2 Lisa
My question should be fairly simple. The input is the name "Lisa" and "Harald". I want to know "Which projects are Lisa and Harald involved in"
If this is super easy and you cannot understand why I ask such an easy thing: provide me with a link where this is explained and I'll read through it myself. I'm just not so sure what exactly to look for, so I thought this was a faster way to get started :)
You can solve this in many ways, but here is the primary way I would solve it. Start simply with the projects LISA is associated with. This avoids looking at what could be 1000s of projects / people: if Lisa is only associated with 5 (or 2 in this case), why query against all of them?
Select
p1.project
from
project p1
where
p1.involved = 'Lisa'
So this lists projects 1 & 2 (obviously, given your short sample of data). Now that we know this works, I would just JOIN again for Harald on the same project.
Select
p1.project
from
project p1
join project p2
on p1.project = p2.project
AND p2.involved = 'Harald'
where
p1.involved = 'Lisa'
Ensure you have an index on (involved, project) to help optimize the query
Additionally, others may propose to do a group by and having clause based on both parties you are interested in.... something like
select
p1.project
from
project p1
where
p1.involved in ( 'Lisa', 'Harald')
group by
p1.project
having
count(*) = 2
This basically tells the engine: give me each project where either Lisa or Harald exists. By applying a GROUP BY, each project shows only once, so I don't see duplicates. The HAVING clause states how many rows you EXPECT per project; since you are asking for 2 names and want both of them, HAVING COUNT(*) = 2 ensures BOTH are included (assuming a person is listed at most once per project).
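Here is a rough sketch of the GROUP BY / HAVING variant, run from Python against an in-memory SQLite database with the sample rows from the question. The extra project 3 row is a hypothetical addition to show that a project with only one of the two names is filtered out. I use COUNT(DISTINCT involved) instead of COUNT(*) as a precaution in case a name could appear twice on a project; that is my assumption, not something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (project INTEGER, involved TEXT)")
rows = (
    [(1, n) for n in ("Harald", "Kerstin", "Peter", "Christian", "Lisa", "Linda")]
    + [(2, n) for n in ("Sören", "Schmidt", "Jörg", "Robert", "Harald", "Lisa")]
    + [(3, "Lisa")]  # hypothetical extra row: Lisa without Harald
)
conn.executemany("INSERT INTO project VALUES (?, ?)", rows)

names = ["Lisa", "Harald"]
placeholders = ", ".join("?" for _ in names)

# Keep only projects on which every requested name appears.
sql = f"""
    SELECT project
    FROM project
    WHERE involved IN ({placeholders})
    GROUP BY project
    HAVING COUNT(DISTINCT involved) = ?
    ORDER BY project
"""
both = [row[0] for row in conn.execute(sql, names + [len(names)])]
print(both)
```

Passing len(names) as the expected count means the same query works for three or more names without editing the SQL.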
I've named table as person, so this is the example
SELECT
p1.project
FROM
person p1,
person p2
WHERE
p1.name = 'Harald'
AND p2.name = 'Lisa'
AND p1.project = p2.project
To see the projects which both Lisa and Harald are involved in:
SELECT P1.Project FROM
(SELECT Project FROM MyTable WHERE Involved = 'Lisa') P1
INNER JOIN
(SELECT Project FROM MyTable WHERE Involved = 'Harald') P2
ON P1.Project = P2.Project
I'm working on an EAV database implemented in MySQL, so when I say entity, you can read that as table. Since it's a non-relational database I cannot provide any SQL for tables etc., but I'm hoping to get the conceptual answer for a relational database and I will translate to EAV SQL myself.
I'm building a mini stock market system. There is an "asset" entity that can have many "demand" and "offer" entities. The asset entity also may have many "deal" entities. Each deal entity has a "share_price" attribute. Not all assets have demand, offer or deal entities.
I want to return a list of offer and demand entities, grouped by asset i.e. if an asset has 2 offers and 3 demands only 1 result will show. This must be sorted by the highest share_price of deals attached to assets of the demand or offer. Then, the highest share_price for each demand or offer is sorted overall. If an asset has demands or offers but no deals, it will be returned with NULL for share_price.
So say the data is like this:
Asset 1 has 1 offer, 1 demand and 2 deals with share_price 7.50 and 12.00
Asset 2 has 1 offer and 1 deal with share_price 8.00
Asset 3 has 3 offers and 3 demands and no deals
Asset 4 has no offers and no demand and 1 deal with share_price 13.00
I want the results:
Asset share_price
Asset 1 12.00
Asset 2 8.00
Asset 3 null
Note: Asset 4 is not in the result set because it has no offers or demands.
I know this is a complex one, but I really don't want to have to go to the database more than once or do any array re-ordering in PHP. Any help greatly appreciated.
Some users want to see the SQL I have. Here it is, but this won't make too much sense as it's a specialised EAV database.
SELECT DISTINCT data.asset_guid, r.guid_two, data.share_price FROM (
select rr.guid_one as asset_guid, max(msv.string) as share_price from market_entities ee
join market_entity_relationships rr on ee.guid = rr.guid_two
JOIN market_metadata as mt on ee.guid = mt.entity_guid
JOIN market_metastrings as msn on mt.name_id = msn.id
JOIN market_metastrings as msv on mt.value_id = msv.id
where subtype = 6 and msn.string = 'share_price' and rr.relationship = 'asset_deal'
group by
rr.guid_one
) data
left outer JOIN market_entities e on e.guid = data.asset_guid
left outer JOIN market_entity_relationships r on r.guid_one = e.guid
WHERE r.relationship = 'trade_share'
GROUP BY data.asset_guid
Without fully understanding your table structure (you should post that), looks like you just need to use a single LEFT JOIN, with GROUP BY and MAX:
SELECT a.assetname, MAX(d.share_price)
FROM asset a
LEFT JOIN deal d ON a.AssetId = d.AssetId
GROUP BY a.assetname
ORDER BY MAX(d.share_price) DESC
I'm using the assumption that your Asset table and your Deal table have a common key, in the above case, AssetId. Not sure why you'd need to join on Demand or Offer, unless those link to your Deal table. Posting your table structure would alleviate that concern...
--EDIT--
In regards to your comments, you want to only show the assets which have either an offer or a demand? If so, this should work:
SELECT a.assetname, MAX(d.share_price)
FROM asset a
LEFT JOIN deal d ON a.AssetId = d.AssetId
LEFT JOIN offer o ON o.AssetId = a.AssetId -- join on the asset, so assets without deals still match
LEFT JOIN demand de ON de.AssetId = a.AssetId
WHERE o.AssetId IS NOT NULL OR de.AssetId IS NOT NULL
GROUP BY a.assetname
ORDER BY MAX(d.share_price) DESC
This will only include the asset if it has at least an offer or at least a demand.
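To check this against the sample data in the question, here is a small sketch using Python and an in-memory SQLite database. The relational layout (asset, deal, offer and demand tables sharing an AssetId column) is an assumption, since the real schema is EAV. Note that the sketch joins offer and demand to the asset's key rather than to the deal's key; otherwise an asset without deals, like Asset 3, would never match an offer or demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed relational layout; the rows mirror the four assets in the question.
conn.executescript("""
    CREATE TABLE asset (AssetId INTEGER PRIMARY KEY, assetname TEXT);
    CREATE TABLE deal (AssetId INTEGER, share_price REAL);
    CREATE TABLE offer (AssetId INTEGER);
    CREATE TABLE demand (AssetId INTEGER);

    INSERT INTO asset VALUES
        (1, 'Asset 1'), (2, 'Asset 2'), (3, 'Asset 3'), (4, 'Asset 4');
    INSERT INTO deal VALUES (1, 7.50), (1, 12.00), (2, 8.00), (4, 13.00);
    INSERT INTO offer VALUES (1), (2), (3), (3), (3);
    INSERT INTO demand VALUES (1), (3), (3), (3);
""")

# MAX is unaffected by the row multiplication the extra joins cause.
sql = """
    SELECT a.assetname, MAX(d.share_price) AS share_price
    FROM asset a
    LEFT JOIN deal d ON d.AssetId = a.AssetId
    LEFT JOIN offer o ON o.AssetId = a.AssetId
    LEFT JOIN demand de ON de.AssetId = a.AssetId
    WHERE o.AssetId IS NOT NULL OR de.AssetId IS NOT NULL
    GROUP BY a.assetname
    ORDER BY MAX(d.share_price) DESC
"""
result = conn.execute(sql).fetchall()
for name, price in result:
    print(name, price)
```

Asset 4 drops out because it has neither an offer nor a demand, and Asset 3 comes back with a NULL (None in Python) share_price, sorted last.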
Assuming you have 3 tables, assets, offers and shares, you can use a query like the one below.
SELECT asset, MAX(share_Price)
FROM assets
INNER JOIN offers ON assets.id = offers.id -- requires there are offers
LEFT OUTER JOIN shares ON assets.id = shares.id -- returns results even if no shares
GROUP BY asset
ORDER BY asset
I have a query that returns some users related to a specific user (Bob).
I need to retrieve the nearest records, meaning I must return users whose ID column is near Bob's ID.
For example:
ID
Tom 5
Mike 8
Bob 10
Jack 12
Brian 13
The query:
SELECT users.* FROM users
INNER JOIN neighboors on neighboors.neighboor_id = users.id #ignore this join, just to exemplify
WHERE neighboors.user_id = 10 # bobs id
ORDER BY something
LIMIT 3 # i want to return only the 3 nearest users (according to the table above:mike, jack and brian)
How can i achieve this?
updated
The logic is: users can plant trees, and each tree has a species. The query should return users that have planted the same tree species.
And why is it important to order by proximity of id? The client wants it this way :) There is no other reason.
Try with this, should do what you need :
SELECT users.* FROM users
INNER JOIN neighboors ON neighboors.neighboor_id = users.id
WHERE neighboors.user_id = 10
ORDER BY ABS(users.id - 10)
LIMIT 3
The ABS function is used here to calculate the "distance" between each returned user's id and the selected user_id (the value filtered by the WHERE ...).
To obtain better performance on large tables you should index the column neighboors.user_id (if it is not indexed yet).
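Here is a minimal sketch of the idea with the sample users from the question, using Python and an in-memory SQLite database. The helper name nearest_neighbours and the neighboors rows are made up for illustration. The distance is measured on users.id (the neighbour's own id), because neighboors.user_id is always equal to the filtered value and would give a distance of 0 for every row. The second sort key, users.id, is my own addition to break ties such as Mike (8) and Jack (12), which are both 2 away from Bob.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE neighboors (user_id INTEGER, neighboor_id INTEGER);

    INSERT INTO users VALUES
        (5, 'Tom'), (8, 'Mike'), (10, 'Bob'), (12, 'Jack'), (13, 'Brian');
    INSERT INTO neighboors VALUES (10, 5), (10, 8), (10, 12), (10, 13);
""")

def nearest_neighbours(conn, user_id, limit=3):
    """Names of the related users whose id is closest to user_id."""
    sql = """
        SELECT users.name
        FROM users
        INNER JOIN neighboors ON neighboors.neighboor_id = users.id
        WHERE neighboors.user_id = ?
        ORDER BY ABS(users.id - ?), users.id
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (user_id, user_id, limit))]

nearest = nearest_neighbours(conn, 10)
print(nearest)
```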
One way to do this is to store the differences as a separate column in an inner query and then query for the smallest differences. A good example for nested queries is at :
http://dev.mysql.com/tech-resources/articles/subqueries_part_1.html
The problem is that nearness works in both a positive and negative direction.
If you had:
Tom 5
Mike 8
Sally 9
Bob 10
Sarah 11
Jack 12
Brian 13
Then do you want to return Mike, Sally and Sarah, or Sally, Sarah and Jack? Do you prefer ascending proximity or descending proximity?
It will help to know exactly what business logic this is trying to implement. Why is it important to select by proximity of the ID? How does the ID relate users to each other?
I'd be interested in helping if you can provide more details.
The actual question is a little more complex than that, so here goes.
I have a website which reviews games. Ratings/reviews are posted for each game, and so I have a MySQL database to handle it all.
Thing is, I'd really like a page that showed what score (out of 10) meant what, and to illustrate it would have the game that was last reviewed as an example. I can always do it without, but this would be cooler.
So the query should return something like this (but running from 10 to 0):
|---------------*----------------*-----------------*-----------------|
* game.gameName | game.gameImage | review.ourScore | review.postedOn *
|---------------*----------------*-----------------*-----------------|
| Top Game | img | 10 | (unix timestamp)|
| NearlyTop Game| img | 9 | (unix timestamp)|
| Great Game | img | 8 | (unix timestamp)|
|---------------*----------------*-----------------*-----------------|
The information is in two tables, game and review. I think you'd use MAX() to find out the last timestamp and corresponding game information, but as far as complex queries go, I'm in way over my head.
Of course this could be done with 10 simple SELECTs but I'm sure there must be a way to do this in one query.
Thanks for any help.
Here is an ugly solution I found:
This query simply gets the IDs and scores of the reviews that you want to look at. I have included it so that you can understand what the trick is, without getting distracted by other stuff:
SELECT * FROM
(SELECT reviewID, ourScore FROM review ORDER BY postedOn DESC) as `r`
GROUP BY ourScore
ORDER BY ourScore DESC;
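This exploits MySQL's permissive 'GROUP BY' behavior. When the grouping is done and the source rows have different values in a non-grouped column, MySQL picks the value from one of the rows. In practice this has often been the topmost source row, but the documentation does not guarantee it, and the query is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled. So if you had rows in this order: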
reviewId Score
1 3
0 3
2 3
Then after you group by score, the reviewId is 1 because that row was on the top:
reviewId Score
1 3
So we want to put the most recent review on the top before we do the group by. Since ordering is always done after grouping within a single SELECT statement, I had to make a subquery to accomplish this. Now we just dress up this query a little bit to get all the fields you wanted:
SELECT `r`.*, game.gameName, game.gameImage FROM
(SELECT reviewID, ourScore, postedOn, gameID FROM review ORDER BY postedOn DESC) as `r`
JOIN game ON `r`.gameID = game.gameID
GROUP BY ourScore
ORDER BY ourScore DESC;
That should work.
SELECT DISTINCT game.gameName, game.gameImage, review.ourScore FROM game
LEFT JOIN review
ON game.ID = review.gameID
ORDER BY review.postedOn
LIMIT 10
Or something like that, check out how to use the Distinct first, I'm not sure on the syntax, and you may have to tell the ORDER BY DESC or ASC depending on what you want.
Well..
SELECT game.gameName, game.gameImage, review.ourScore
FROM game
LEFT JOIN review ON game.gameID = review.gameID
GROUP BY review.ourScore DESC
LIMIT 10
returns a list of games grouped by each individual score. But this isn't what I want, I want the game that is last posted - this is why the timestamp is important. With that query, MySQL returns the first result it can find.
I think this would work:
select g.gameName, g.gameImage, r.ourScore, r.postedOn
from game g, review r
where g.gameId = r.gameId
and r.postedOn = (select max(sr.postedOn)
from review sr where sr.ourScore = r.ourScore)
group by r.ourScore
order by r.ourScore desc;
Edit: above SQL was corrected after David Grayson's comment. I think this query is pretty easy to understand but probably performs poorly compared with his solution.
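For comparison, here is a small sketch of the correlated-subquery approach, run from Python against an in-memory SQLite database. The rows, image names and the "Older Top Game" entry are hypothetical sample data. The sketch drops the GROUP BY and assumes that no two reviews with the same score share the same postedOn timestamp; if they could, you would get more than one row for that score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; postedOn stands in for a unix timestamp.
conn.executescript("""
    CREATE TABLE game (gameID INTEGER PRIMARY KEY, gameName TEXT, gameImage TEXT);
    CREATE TABLE review (reviewID INTEGER PRIMARY KEY, gameID INTEGER,
                         ourScore INTEGER, postedOn INTEGER);

    INSERT INTO game VALUES
        (1, 'Top Game', 'top.png'),
        (2, 'Older Top Game', 'old.png'),
        (3, 'Great Game', 'great.png');
    INSERT INTO review VALUES
        (1, 2, 10, 100), (2, 1, 10, 200), (3, 3, 8, 150);
""")

# For each review, keep it only if it is the newest one with its score.
sql = """
    SELECT g.gameName, g.gameImage, r.ourScore, r.postedOn
    FROM review r
    JOIN game g ON g.gameID = r.gameID
    WHERE r.postedOn = (SELECT MAX(sr.postedOn)
                        FROM review sr
                        WHERE sr.ourScore = r.ourScore)
    ORDER BY r.ourScore DESC
"""
latest = conn.execute(sql).fetchall()
for row in latest:
    print(row)
```

Unlike the GROUP BY trick, this does not depend on which row the engine happens to pick within a group.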