SQL query for extracting data from related tables - mysql

Although I found several similar questions, I couldn't apply them to resolve my issue.
Problem statement: I have three MySQL tables, STUDENT, SUBJECT and STUDIES, as described below:
STUDENT-
rollno Name
X1 Alpha
X2 Beta
Y1 Zeta
X3 Omega
Here the letter in each rollno identifies the student's batch/class. E.g. students X1, X2 and X3 belong to the same class, whereas Y1 belongs to a different one.
SUBJECT-
Code Title Credits
abc subject1 2
bcd subject2 4
gfp subject3 3
STUDIES-
rollno code
X1 abc
X1 bcd
X1 gfp
X2 bcd
X2 abc
Y1 gfp
X3 abc
I need help in framing mysql queries for:
a) displaying the credits undertaken by each student.
Like
Rollno Name Credits
X1 Alpha 9
X2 Beta 6
Y1 Zeta 3
X3 Omega 2
The best I have come up with is this:
select rollno,
(select sum(credits) from subject
where studies.code=subject.code)
from studies;
But what I get are the rollno and credits displayed individually for every subject a student studies. (I haven't yet been able to extend my query to get the student's name from the third table.)
b) Finding out the subjects which have been taken by all the students of a class/batch.
In the given scenario the answer would be
Batch Subject Code Title
X abc subject1
I can extract the distinct batches by string processing but don't know how to proceed from that point.
c) Being a MySQL newbie, could you also point me to some good web resources with practice problems for learning advanced queries like these? I have gone through a few over the last couple of days but have not found them sufficient for developing the concepts required for these queries.
EDIT: Sharing the CREATE queries for the tables:-
For SUBJECT:
CREATE TABLE IF NOT EXISTS subject (
code varchar(8) UNIQUE, title varchar(75) NOT NULL,
credits int, check (credits <5),PRIMARY KEY (code));
For STUDENT:
CREATE TABLE IF NOT EXISTS student (
rollno varchar(9) UNIQUE,name varchar(50));
For Studies:
CREATE TABLE IF NOT EXISTS studies (
rollno varchar(9) NOT NULL,code varchar(50) NOT NULL,
FOREIGN KEY(rollno) REFERENCES student(rollno));

Well, for your first query, you want to join across all three tables, looking for rollno, name, and credits. So at a first pass, you need to join like this:
SELECT s.rollno, s.name, sb.credits
FROM student s
INNER JOIN studies st
ON st.rollno = s.rollno
INNER JOIN subject sb
ON sb.code = st.code
This is part of the solution - it gives you the information you want, and now you just have to use an aggregate function to tally up the credits, using SUM and GROUP BY:
SELECT s.rollno, s.name, SUM(sb.credits) AS credits
FROM student s
INNER JOIN studies st
ON st.rollno = s.rollno
INNER JOIN subject sb
ON sb.code = st.code
GROUP BY s.rollno, s.name
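If you want to try the aggregate query without a MySQL server, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. Using SQLite as a stand-in for MySQL is an assumption made purely for convenience, and the ORDER BY is added only so the row order is deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (rollno TEXT PRIMARY KEY, name TEXT);
CREATE TABLE subject (code TEXT PRIMARY KEY, title TEXT NOT NULL, credits INTEGER);
CREATE TABLE studies (rollno TEXT NOT NULL, code TEXT NOT NULL);
INSERT INTO student VALUES ('X1','Alpha'), ('X2','Beta'), ('Y1','Zeta'), ('X3','Omega');
INSERT INTO subject VALUES ('abc','subject1',2), ('bcd','subject2',4), ('gfp','subject3',3);
INSERT INTO studies VALUES ('X1','abc'), ('X1','bcd'), ('X1','gfp'),
                           ('X2','bcd'), ('X2','abc'), ('Y1','gfp'), ('X3','abc');
""")

# Same joins as above; SUM + GROUP BY collapse the per-subject rows per student.
credit_totals = conn.execute("""
SELECT s.rollno, s.name, SUM(sb.credits) AS credits
FROM student s
INNER JOIN studies st ON st.rollno = s.rollno
INNER JOIN subject sb ON sb.code = st.code
GROUP BY s.rollno, s.name
ORDER BY s.rollno
""").fetchall()

for row in credit_totals:
    print(row)
```

Each student comes back exactly once, with the credits of all their subjects added up.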
The second part is tougher, and there are likely other (and better) ways to do this, but here's my approach:
SELECT q1.batch, q1.code, sb.title
FROM
(SELECT st.code, SUBSTR(st.rollno,1,1) batch,
COUNT(SUBSTR(st.rollno,1,1)) numb
FROM studies st
GROUP BY st.code, SUBSTR(st.rollno,1,1)) q1
INNER JOIN
(SELECT SUBSTR(s.rollno,1,1) batch,
COUNT(SUBSTR(s.rollno,1,1)) numb
FROM student s
GROUP BY SUBSTR(s.rollno,1,1)) q2
ON q1.batch = q2.batch AND q1.numb = q2.numb
INNER JOIN subject sb
ON q1.code = sb.code
Some explanation: the first sub-query (q1) counts the number of students from each batch in each subject. The output from that would be:
abc X 3
bcd X 2
gfp X 1
gfp Y 1
The second subquery (q2) counts the number of students in each batch, with output:
X 3
Y 1
By JOINing these two subqueries, we select only those subjects where the batch and the batch count are the same:
abc X 3
gfp Y 1
Finally, JOIN on the subject table to get the subject title included (and set the starting SELECT statement to only select batch, code, and title), giving output:
X abc subject1
Y gfp subject3
Note that the last row here (Y - subject3) is valid, since every member of the 'Y' batch (of which there is only one) is enrolled in course gfp.
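The batch query can be checked the same way. This is only a sketch: it assumes an in-memory SQLite database as a stand-in for MySQL (SUBSTR behaves the same for this case), uses COUNT(*) in place of COUNT(SUBSTR(...)), which is equivalent here because rollno is never NULL, and adds an ORDER BY for a stable row order.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; data is the question's sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (rollno TEXT PRIMARY KEY, name TEXT);
CREATE TABLE subject (code TEXT PRIMARY KEY, title TEXT NOT NULL, credits INTEGER);
CREATE TABLE studies (rollno TEXT NOT NULL, code TEXT NOT NULL);
INSERT INTO student VALUES ('X1','Alpha'), ('X2','Beta'), ('Y1','Zeta'), ('X3','Omega');
INSERT INTO subject VALUES ('abc','subject1',2), ('bcd','subject2',4), ('gfp','subject3',3);
INSERT INTO studies VALUES ('X1','abc'), ('X1','bcd'), ('X1','gfp'),
                           ('X2','bcd'), ('X2','abc'), ('Y1','gfp'), ('X3','abc');
""")

# q1: students per (subject, batch); q2: students per batch.
# A subject is taken by the whole batch when the two counts match.
common_subjects = conn.execute("""
SELECT q1.batch, q1.code, sb.title
FROM (SELECT st.code, SUBSTR(st.rollno, 1, 1) AS batch, COUNT(*) AS numb
      FROM studies st
      GROUP BY st.code, SUBSTR(st.rollno, 1, 1)) q1
INNER JOIN (SELECT SUBSTR(s.rollno, 1, 1) AS batch, COUNT(*) AS numb
            FROM student s
            GROUP BY SUBSTR(s.rollno, 1, 1)) q2
  ON q1.batch = q2.batch AND q1.numb = q2.numb
INNER JOIN subject sb ON q1.code = sb.code
ORDER BY q1.batch, q1.code
""").fetchall()

for row in common_subjects:
    print(row)
```

Note that the count comparison only works if studies never lists the same (rollno, code) pair twice; a unique key on those two columns would guarantee that.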
As for recommended sites and resources - that's a bit outside the scope of SO. You can find lots of good online resources by Googling 'SQL tutorial' or 'SQL online courses'. Lots of good free stuff out there.

Related

Variance Dissimilarity in SQL with three table

I have three tables:
rtgitems
rtgusers
POI
(the tables aren't complete for reasons of space).
I want to compute this formula:
where r_i,x is the value of column "voto" given by user ("rater") i to item x, and avg_x is the average rating of item x (totalrate / nrrates). |G| is given, so it is not a problem.
I want this table result:
Nome (from POI) | VD_x(G)
Tour Eiffel | 23
Arc | 18
...
I tried this on the first two tables to get the values needed to calculate the average (I don't know how to match the third table with the others):
SELECT totalrate, nrrates, voto FROM rtgitems INNER JOIN rtgusers ON rtgitems.item=rtgusers.item GROUP BY rater
but it doesn't work.
Thanks for the help.
Just focus on the rtgusers table. If you want to bring in the names, that's fine. You can do it after the variance calculation (you seem to know what a join is). The first table seems superfluous to the problem.
You can calculate the variance by pre-calculating the summary values and then applying the formula. I think this is the basic logic that you want:
SELECT ru.item, (1.0 / max(rus.n)) * sum(power(ru.voto - rus.avg_voto, 2))
FROM rtgusers ru join
(select ru.item, avg(voto * 1.0) as avg_voto, count(*) as n
from rtgusers ru
group by ru.item
) rus
on ru.item = rus.item
group by ru.item;
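To see the numbers this produces, here is a small sketch run against an in-memory SQLite database from Python. The sample ratings are invented for illustration, SQLite is only a stand-in for MySQL, and the squared difference is written as a plain multiplication because power() is not available in every SQLite build. It divides by the number of ratings per item; if your |G| is a different, fixed value, divide by that instead.

```python
import sqlite3

# Assumption: hypothetical sample ratings; only rater/item/voto come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rtgusers (rater INTEGER, item INTEGER, voto INTEGER);
INSERT INTO rtgusers VALUES (1, 1, 1), (2, 1, 3), (3, 1, 5), (4, 1, 7),
                            (1, 2, 2), (2, 2, 4);
""")

# Pre-compute the per-item average and count, then average the squared deviations.
variances = conn.execute("""
SELECT ru.item,
       SUM((ru.voto - rus.avg_voto) * (ru.voto - rus.avg_voto)) / MAX(rus.n) AS vd
FROM rtgusers ru
JOIN (SELECT item, AVG(voto * 1.0) AS avg_voto, COUNT(*) AS n
      FROM rtgusers
      GROUP BY item) rus
  ON ru.item = rus.item
GROUP BY ru.item
ORDER BY ru.item
""").fetchall()

for item, vd in variances:
    print(item, vd)
```

Item 1 has ratings 1, 3, 5, 7 (average 4), so the squared deviations 9 + 1 + 1 + 9 = 20 are divided by 4. Joining the result to POI on the item id afterwards would bring in the names.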

Database Design for Time Table Generation

I am doing a project using J2EE(servlet) for Time Table Generation of College.
There are Six Slots(6 Hours) in a Day
4 x 1 HR Lectures
1 x 2 HR Lab
There Are three batches ( 3IT, 5IT, 7IT)
2 Classroom
1 LAB
Each slot in the time table will have
(Subject,Faculty)
For Lab I will duplicate the slot.
The Tables
Subject(SubjectID INT, SubjectName VARCHAR);
Faculty(FacultyID INT,FacultyName VARCHAR,NumOfSub INT,Subjects XYZ);
Here I am not able to decide the DATATYPE for Subjects. What should I use, given that a faculty member can teach multiple subjects? Also, how do I link it with the Subject table?
P.S. Using MySQL Database
You don't want to actually store either NumOfSub (number of subjects) OR Subjects in Faculty. Storing subjects that way is a violation of First Normal Form, and dealing with it would cause major headaches.
Instead, what you want is another table:
FacultySubject
----------------
FacultyId -- fk for Faculty.FacultyId
SubjectId -- fk for Subject.SubjectId
From this, you can easily get the count of subjects, or a set of rows listing the subjects (I believe MySQL also has functions to return a list of values, but I have no experience with those):
This query will retrieve the count of Subjects taught by a particular teacher:
SELECT Faculty.FacultyId, COUNT(*)
FROM Faculty
JOIN FacultySubject
ON FacultySubject.FacultyId = Faculty.FacultyId
WHERE Faculty.FacultyName = 'Really Cool Professor'
GROUP BY Faculty.FacultyId
... and this query will get all the subjects (named) that they teach:
SELECT Subject.SubjectId, Subject.SubjectName
FROM Faculty
JOIN FacultySubject
ON FacultySubject.FacultyId = Faculty.FacultyId
JOIN Subject
ON Subject.SubjectId = FacultySubject.SubjectId
WHERE Faculty.FacultyName = 'Really Cool Professor'
(note that this last returns the subjects as a set of rows ie:
SubjectId SubjectName
=========================
1 Tree Houses
2 Annoying Younger Sisters
3 Swimming Holes
4 Fishing
)
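The list-returning function mentioned above is GROUP_CONCAT in MySQL. The sketch below shows the junction table, the count and the concatenated list in one query, run against an in-memory SQLite database from Python as a stand-in. The second professor and the choice of which subjects are taught are made up for illustration. SQLite writes the separator as a second argument, whereas MySQL's form is GROUP_CONCAT(SubjectName SEPARATOR ', ').

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subject (SubjectId INTEGER PRIMARY KEY, SubjectName TEXT);
CREATE TABLE Faculty (FacultyId INTEGER PRIMARY KEY, FacultyName TEXT);
CREATE TABLE FacultySubject (
    FacultyId INTEGER REFERENCES Faculty(FacultyId),
    SubjectId INTEGER REFERENCES Subject(SubjectId),
    PRIMARY KEY (FacultyId, SubjectId)   -- one row per (teacher, subject) pair
);
INSERT INTO Subject VALUES (1, 'Tree Houses'), (2, 'Annoying Younger Sisters'),
                           (3, 'Swimming Holes'), (4, 'Fishing');
INSERT INTO Faculty VALUES (1, 'Really Cool Professor'), (2, 'Other Professor');
INSERT INTO FacultySubject VALUES (1, 1), (1, 4), (2, 3);
""")

# NumOfSub and the subject list are derived on demand instead of being stored.
row = conn.execute("""
SELECT Faculty.FacultyName,
       COUNT(*) AS NumOfSub,
       GROUP_CONCAT(Subject.SubjectName, ', ') AS Subjects
FROM Faculty
JOIN FacultySubject ON FacultySubject.FacultyId = Faculty.FacultyId
JOIN Subject ON Subject.SubjectId = FacultySubject.SubjectId
WHERE Faculty.FacultyName = 'Really Cool Professor'
GROUP BY Faculty.FacultyId, Faculty.FacultyName
""").fetchone()

print(row)
```

The order of the names inside the concatenated string is not guaranteed unless you ask for one.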

MySQL multiple intersection with self performance

For simplicity, let's say we have a table with two columns: uid (user id) and fruit, describing what kinds of fruit a user likes.
E.g.:
uid | fruit
----|------------
1 | Strawberry
1 | Orange
2 | Strawberry
2 | Banana
3 | Watermelon
and so forth.
If I want to find what kinds of fruit are common in N particular users (i.e. the intersection N times of the table with itself), the first option is to use an INNER JOIN.
SELECT DISTINCT fruit FROM Fruits f1
INNER JOIN Fruits f2 USING (fruit)
INNER JOIN Fruits f3 USING (fruit)
...
INNER JOIN Fruits fN USING (fruit)
WHERE f1.uid = 1 AND f2.uid = 2 ... AND fN.uid = M
But this kind of looks silly to me. What if N = 10? Or even 20? Is it sensible to do 20 joins? Is there some other join operation I'm missing?
Before learning the "magic" of joins, I used another method, which would apply in my current case as follows:
SELECT DISTINCT fruit FROM Fruits
WHERE uid IN (1, 2, ..., M)
GROUP BY fruit
HAVING COUNT(*) = N
It seems much more compact, but I remember somebody telling me to avoid using GROUP BY because it is slower than an INNER JOIN.
So, I guess my question really is, is there maybe a third method for doing the above? If yes/no, which one is the most efficient?
-- EDIT --
So, it seems a question has been asked before, bearing a resemblance to mine. The two answers provided, are actually the two methods I'm using.
But the question remains. Which one is really more efficient? Is there, maybe, a third one?
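For reference, the GROUP BY / HAVING variant scales to any N without changing the shape of the query. The sketch below is one possible way to drive it from Python; it uses an in-memory SQLite database as a stand-in for MySQL, and the helper name common_fruits is made up for illustration. It counts DISTINCT uid so that a duplicated (uid, fruit) row cannot inflate the count. It says nothing about which method is faster; that would have to be measured on the real table and indexes.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fruits (uid INTEGER, fruit TEXT);
INSERT INTO Fruits VALUES (1, 'Strawberry'), (1, 'Orange'),
                          (2, 'Strawberry'), (2, 'Banana'), (3, 'Watermelon');
""")

def common_fruits(conn, uids):
    """Return the fruits liked by every user in uids (hypothetical helper)."""
    uids = list(dict.fromkeys(uids))          # drop repeated ids, keep order
    if not uids:
        return []
    placeholders = ", ".join("?" for _ in uids)
    sql = f"""
        SELECT fruit
        FROM Fruits
        WHERE uid IN ({placeholders})
        GROUP BY fruit
        HAVING COUNT(DISTINCT uid) = ?        -- liked by all requested users
        ORDER BY fruit
    """
    return [fruit for (fruit,) in conn.execute(sql, (*uids, len(uids)))]

print(common_fruits(conn, [1, 2]))
print(common_fruits(conn, [1, 2, 3]))
```

Only the length of the IN list and the bound count change with N, so the statement stays the same size however many users are intersected.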

Better solution to MySQL nested select in's

Currently I have two MySQL tables
Properties
id name
1 Grove house
2 howard house
3 sunny side
Advanced options
prop_id name
1 Wifi
1 Enclosed garden
1 Swimming pool
2 Swimming pool
As you can see table two contains specific features about the properties
When I only had a maximum of 3 options, the query below worked just fine (maybe a little slow, but OK). Now things have expanded somewhat: there are up to 12 options that can be searched by, and it is causing me some major speed issues. The query below is for 8 options and, as you can see, it's very messy. Is there a better way of doing what I'm trying to achieve?
SELECT * FROM properties WHERE id in (
select prop_id from advanced_options where name = 'Within 2 miles of sea or river' and prop_id in (
select prop_id from advanced_options where name = 'WiFi' and prop_id in (
select prop_id from advanced_options where name = 'Walking distance to pub' and prop_id in (
select prop_id from advanced_options where name = 'Swimming pool' and prop_id in (
select prop_id from advanced_options where name = 'Sea or River views' and prop_id in (
select prop_id from advanced_options where name = 'Pet friendly' and prop_id in (
select prop_id from advanced_options where name = 'Open fire, wood burning stove or a real flame fire-place' and prop_id in (
select prop_id from advanced_options where name='Off road parking')
)
)
)
)
)
)
)
As Mike Brant suggests, I would consider limiting your data model to a fixed set of options and creating a column for each of them in your properties table. But sometimes the boss comes along and says, "We also need 'flatscreen tv'", and then you have to go back to the DB and update the schema and your data access layer.
One way to move this logic out of the database is to use bitwise comparison. This allows you to make simple queries, but requires a bit of preprocessing before you make your query.
Judge for yourself.
I've put everything in a test suite for you here sqlfiddle
The basic idea is that each option in your advanced_options table has an id that is a power of 2. Like this:
INSERT INTO `advanced_options` (id, name)
VALUES
(1, 'Wifi'),
(2, 'Enclosing Garden'),
(8, 'Swimming Pool'),
(16, 'Grill');
You can then store a single value in your properties table by adding up the options:
Wifi + Swimming Pool = 1 + 8 = 9
If you want to find all properties with wifi and a swimming pool you then do like this:
SELECT * FROM `properties` WHERE `advanced_options` & 9 = 9
If you just wanted swimming pool this would be it:
SELECT * FROM `properties` WHERE `advanced_options` & 8 = 8
Go try out the fiddle
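The preprocessing step amounts to OR-ing the option ids together before querying. Here is a rough sketch of it in Python against an in-memory SQLite database (a stand-in for MySQL; both support the & operator on integers). The names OPTION_BITS, mask_for and properties_with are made up for this example, and the option assignments follow the sample data in the question.

```python
import sqlite3

# Option ids from the INSERT above; each one is a distinct power of two.
OPTION_BITS = {"Wifi": 1, "Enclosing Garden": 2, "Swimming Pool": 8, "Grill": 16}

def mask_for(options):
    """Combine option names into one integer bitmask (hypothetical helper)."""
    mask = 0
    for name in options:
        mask |= OPTION_BITS[name]
    return mask

# Assumption: properties gets an integer advanced_options column holding the mask.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE properties ("
    "id INTEGER PRIMARY KEY, name TEXT, advanced_options INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?)",
    [
        (1, "Grove house", mask_for(["Wifi", "Enclosing Garden", "Swimming Pool"])),
        (2, "howard house", mask_for(["Swimming Pool"])),
        (3, "sunny side", 0),
    ],
)

def properties_with(conn, options):
    """Names of properties that have every requested option."""
    mask = mask_for(options)
    rows = conn.execute(
        "SELECT name FROM properties WHERE (advanced_options & ?) = ? ORDER BY id",
        (mask, mask),
    )
    return [name for (name,) in rows]

print(mask_for(["Wifi", "Swimming Pool"]))
print(properties_with(conn, ["Wifi", "Swimming Pool"]))
```

One trade-off to keep in mind: a condition of the form column & mask = mask generally cannot use an ordinary index, so the database has to check every row.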
You really need to consider a schema change to your table. It seems that advanced options in and of themselves don't have any properties, so instead of an advanced_options table that is trying to be a many-to-many JOIN table, why not just have a property_options table with a field for each "options". Something like this
|prop_id | wifi | swimming_pool | etc..
-----------------------------------
| 1 | 0 | 1 |
| 2 | 1 | 0 |
Here each field is a simple TINYINT field with 0/1 boolean representation.
Then you could query like:
SELECT * FROM properties AS p
INNER JOIN property_options AS po ON p.id = po.prop_id
WHERE wifi = 1 AND swimming_pool = 1 ....
Here you would just build your WHERE clause based on which options you are querying for.
There actually wouldn't be any need to even have a separate table, as these records would have a one-to-one relationship with the properties, so you could move these fields onto your properties table if you like.
Join back to the advanced_options table multiple times. Here's a sample with 2 (lather, rinse, repeat).
select o1.prop_id
from advanced_options o1
inner join advanced_options o2 on o1.prop_id = o2.prop_id and o2.name = "WiFi"
where o1.name = 'Within 2 miles of sea or river'
Could you do something like this?:
select p.*,count(a.prop_id) as cnt
from properties p
inner join advanced_options a on a.prop_id = p.id
where a.name in ('Enclosed garden','Swimming pool')
group by p.id
having cnt = 2
That query would get all the properties that have ALL of those advanced_options...
I would also suggest normalizing your tables by creating a separate table Called Advanced_option (id,name) where you store your unique Option values and then create a junction entity table like Property_x_AdvancedOption (fk_PropertyID, FK_AdvancedOptionID) that way you use less resources and avoid data integrity issues.
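The counting approach can be tried out on the sample data from the question. This is a sketch only: it runs on an in-memory SQLite database from Python as a stand-in for MySQL, the helper name properties_with_all is invented, and it counts DISTINCT option names so a duplicated option row cannot produce a false match.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE advanced_options (prop_id INTEGER, name TEXT);
INSERT INTO properties VALUES (1, 'Grove house'), (2, 'howard house'), (3, 'sunny side');
INSERT INTO advanced_options VALUES (1, 'Wifi'), (1, 'Enclosed garden'),
                                    (1, 'Swimming pool'), (2, 'Swimming pool');
""")

def properties_with_all(conn, options):
    """Properties that have every option in the list (hypothetical helper)."""
    options = list(dict.fromkeys(options))    # drop repeated names, keep order
    if not options:
        return []
    placeholders = ", ".join("?" for _ in options)
    sql = f"""
        SELECT p.id, p.name
        FROM properties p
        INNER JOIN advanced_options a ON a.prop_id = p.id
        WHERE a.name IN ({placeholders})
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT a.name) = ?     -- matched every requested option
        ORDER BY p.id
    """
    return conn.execute(sql, (*options, len(options))).fetchall()

print(properties_with_all(conn, ["Enclosed garden", "Swimming pool"]))
```

Searching by 12 options only makes the IN list longer; there is still a single join and a single GROUP BY.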

mysql update with a self referencing query

I have a table of surveys which contains (amongst others) the following columns
survey_id - unique id
user_id - the id of the person the survey relates to
created - datetime
ip_address - of the submission
ip_count - the number of duplicates
Due to a large record set, it's impractical to run this query on the fly, so I am trying to create an update statement which will periodically store a "cached" result in ip_count.
The purpose of ip_count is to show the number of duplicate ip_address survey submissions that have been received for the same user_id within a 12 month period (+/- 6 months of the created date).
Using the following dataset, this is the expected result.
survey_id  user_id  created    ip_address   ip_count  # counted duplicates (survey_id)
1          1        01-Jan-12  123.132.123  1         # 2
2          1        01-Apr-12  123.132.123  2         # 1, 3
3          2        01-Jul-12  123.132.123  0         #
4          1        01-Aug-12  123.132.123  3         # 2, 6
6          1        01-Dec-12  123.132.123  1         # 4
This is the closest solution I have come up with so far, but the query fails to take the date restriction into account and I am struggling to come up with an alternative method.
UPDATE surveys
JOIN(
SELECT ip_address, created, user_id, COUNT(*) AS total
FROM surveys
WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
GROUP BY ip_address, user_id
) AS ipCount
ON (
ipCount.ip_address = surveys.ip_address
AND ipCount.user_id = surveys.user_id
AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
)
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address
Thank you for your help in advance :)
A few (very) minor tweaks to what is shown above. Thank you again!
UPDATE surveys AS s
INNER JOIN (
SELECT x, count(*) c
FROM (
SELECT s1.id AS x, s2.id AS y
FROM surveys AS s1, surveys AS s2
WHERE s1.state IN (1, 3) # completed and verified
AND s1.id != s2.id # dont self join
AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries
AND s1.ip_address = s2.ip_address
AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH))
AND s1.user_id = s2.user_id # where completed for the same user
) AS ipCount
GROUP BY x
) n on s.id = n.x
SET s.ip_count = n.c
I don't have your table with me, so it's hard for me to write SQL that definitely works, but I can take a shot at this, and hopefully be able to help you.
First I would need to take the cartesian product of surveys against itself and filter out the rows I don't want
select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and s2.created between (s1.created - interval 6 month) and (s1.created + interval 6 month)
The output of this should contain every pair of surveys that match (according to your rules) TWICE (once for each id in the 1st position and once for it to be in the 2nd position)
Then we can do a GROUP BY on the output of this to get a table that basically gives me the correct ip_count for each survey_id
(select x, count(*) c from (select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and s2.created between (s1.created - interval 6 month) and (s1.created + interval 6 month)) pairs group by x)
So now we have a table mapping each survey_id to its correct ip_count. To update the original table, we need to join that against this and copy the values over
So that should look something like
UPDATE surveys SET s.ip_count = n.c from surveys s inner join (ABOVE QUERY) n on s.survey_id = n.x
There is some pseudo code in there, but I think the general idea should work
I have never had to update a table based on the output of another query myself before.. Tried to guess the right syntax for doing this from this question - How do I UPDATE from a SELECT in SQL Server?
Also, if I needed to do something like this for my own work, I wouldn't attempt to do it in a single query. It would be a pain to maintain and might have memory/performance issues. It would be best to have a script traverse the table row by row, updating a single row in a transaction before moving on to the next row. Much slower, but simpler to understand and possibly lighter on your database.
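A row-by-row script along those lines could look roughly like this. It is a sketch under several assumptions: it runs on an in-memory SQLite database from Python instead of MySQL, the sample rows are invented, dates are stored as ISO strings so SQLite's date() modifiers can build the +/- 6 month window, and the state filter from the question is left out for brevity. The function name refresh_ip_counts is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE surveys (
    survey_id  INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    created    TEXT NOT NULL,          -- ISO date, e.g. '2012-01-01'
    ip_address TEXT,
    ip_count   INTEGER DEFAULT 0
);
INSERT INTO surveys (survey_id, user_id, created, ip_address) VALUES
    (1, 1, '2012-01-01', '10.0.0.1'),
    (2, 1, '2012-04-01', '10.0.0.1'),
    (3, 2, '2012-07-01', '10.0.0.1'),
    (4, 1, '2013-03-01', '10.0.0.1');
""")

def refresh_ip_counts(conn):
    """Recompute ip_count one survey at a time, one transaction per row."""
    rows = conn.execute(
        "SELECT survey_id, user_id, created, ip_address FROM surveys "
        "WHERE ip_address IS NOT NULL AND ip_address != ''"
    ).fetchall()
    for survey_id, user_id, created, ip_address in rows:
        with conn:  # commits on success, rolls back if the update fails
            (dupes,) = conn.execute(
                """SELECT COUNT(*) FROM surveys
                   WHERE survey_id != ?            -- do not count the row itself
                     AND user_id = ?
                     AND ip_address = ?
                     AND created BETWEEN date(?, '-6 months')
                                     AND date(?, '+6 months')""",
                (survey_id, user_id, ip_address, created, created),
            ).fetchone()
            conn.execute(
                "UPDATE surveys SET ip_count = ? WHERE survey_id = ?",
                (dupes, survey_id),
            )

refresh_ip_counts(conn)
ip_counts = dict(conn.execute("SELECT survey_id, ip_count FROM surveys"))
print(ip_counts)
```

Surveys 1 and 2 fall within six months of each other for the same user and IP, so each counts the other; survey 3 belongs to a different user and survey 4 is too far away in time, so both stay at 0.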