Complicated join and MAX() query in MySQL

I need to make a join across 4 tables while picking the maximum (i.e. most recent) test timestamp to associate with each person. For each student in a class, I want to look up their most recent test and get its ID and timestamp. Here is my query:
SELECT students.ref,
students.fname,
students.sname,
classes.name AS 'group',
tests.id,
max(tests.timestamp)
FROM tests, students, classlinks, classes
WHERE tests.ref=students.ref AND
classlinks.ref=students.ref AND
classlinks.classid=29 AND
tests.grade=2 AND
tests.subject=2
GROUP BY students.ref
ORDER BY students.sname ASC, students.fname ASC
It looks perfect at first: for each student in the class, it gives the timestamp of their most recent test. Unfortunately, the test ID shown next to that timestamp is wrong: it is the ID of an arbitrary test from that student's group, not the test with that timestamp.
If I change the 'group by' to be
GROUP BY students.ref, tests.id
then the query matches correct test IDs to correct timestamps, but now there are several entries for each student. Does anyone have any advice so that I can get one row for each student, with correct test ID matched to correct most recent timestamp? Any help appreciated. Thanks.
Table descriptions:
mysql> describe students;
+--------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ref | varchar(50) | NO | UNI | NULL | |
| fname | varchar(22) | NO | | NULL | |
| sname | varchar(22) | NO | | NULL | |
| school | int(11) | NO | | NULL | |
| year | int(11) | NO | | NULL | |
+--------+-------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
mysql> describe classes;
+---------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| subject | int(11) | YES | MUL | NULL | |
| type | int(11) | YES | | 1 | |
| school | int(11) | YES | | NULL | |
| year | int(11) | YES | | NULL | |
| name | varchar(50) | YES | | NULL | |
+---------+-------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
mysql> describe classlinks;
+---------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ref | varchar(50) | YES | MUL | NULL | |
| subject | int(11) | YES | | NULL | |
| school | int(11) | YES | | NULL | |
| classid | int(11) | YES | MUL | NULL | |
| type | int(11) | YES | | 1 | |
+---------+-------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
mysql> describe tests;
+------------+-------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| subject | int(11) | YES | | NULL | |
| ref | varchar(22) | NO | MUL | NULL | |
| test | int(3) | NO | | NULL | |
| grade | varchar(22) | NO | | NULL | |
| timestamp | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| lastupdate | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------+-------------+------+-----+-------------------+-----------------------------+

I am assuming that the combination of (ref, timestamp) is unique in the tests table. Here is my solution, but I don't have any of your sample data to verify it. If it is incorrect, then post some sample data so that I can test it.
UPDATE: here is the updated query, which works; see the SQL Fiddle.
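SELECT students.ref,
students.fname,
students.sname,
classes.name AS 'group',
tests.id,
T.timestamp
FROM (select ref, max(timestamp) as timestamp from tests where grade=2 and subject=2 group by ref) as T
natural join tests, students, classlinks, classes
WHERE
T.ref=students.ref AND
classlinks.ref=students.ref AND
classlinks.classid=classes.id AND
classlinks.classid=29 AND
tests.grade=2 AND
tests.subject=2
ORDER BY students.sname ASC, students.fname ASC
The grade and subject filters also go inside the derived table T, so that max(timestamp) is taken only over the matching tests; otherwise a student whose newest test overall is in another subject drops out of the result.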
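A minimal sketch of the derived-table pattern, run with Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite is not MySQL, and the sample rows and the alias grp are hypothetical). It takes MAX(timestamp) per student inside the derived table, joins back to tests on (ref, timestamp) to pick up the matching id, and applies the grade/subject filters before the MAX(). The second query applies the filters only after the MAX(), which drops a student whose newest test overall belongs to another subject.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (ref TEXT PRIMARY KEY, fname TEXT, sname TEXT);
CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classlinks (ref TEXT, classid INTEGER);
CREATE TABLE tests (id INTEGER PRIMARY KEY, ref TEXT, subject INTEGER,
                    grade TEXT, timestamp TEXT);
INSERT INTO students VALUES ('s1', 'Ann', 'Able'), ('s2', 'Bob', 'Baker');
INSERT INTO classes VALUES (29, '7X');
INSERT INTO classlinks VALUES ('s1', 29), ('s2', 29);
INSERT INTO tests VALUES
  (1, 's1', 2, '2', '2013-01-01 09:00:00'),
  (2, 's1', 2, '2', '2013-02-01 09:00:00'),  -- latest subject-2 test for Ann
  (3, 's2', 2, '2', '2013-01-15 09:00:00'),  -- latest subject-2 test for Bob
  (4, 's2', 5, '2', '2013-03-01 09:00:00');  -- newer, but a different subject
""")

# Filter first, take MAX(timestamp) per student, then join back to tests
# on (ref, timestamp) to recover the id of that particular test.
LATEST_TEST = """
SELECT s.ref, s.fname, s.sname, c.name AS grp, t.id, t.timestamp
FROM (SELECT ref, MAX(timestamp) AS latest
      FROM tests
      WHERE grade = '2' AND subject = 2
      GROUP BY ref) AS m
JOIN tests t       ON t.ref = m.ref AND t.timestamp = m.latest
JOIN students s    ON s.ref = t.ref
JOIN classlinks cl ON cl.ref = s.ref
JOIN classes c     ON c.id = cl.classid
WHERE cl.classid = 29 AND t.grade = '2' AND t.subject = 2
ORDER BY s.sname, s.fname
"""
rows = conn.execute(LATEST_TEST).fetchall()

# Same join, but the filters are applied only after MAX() over all tests:
# Bob's newest test is subject 5, so he disappears from the result.
FILTER_AFTER_MAX = """
SELECT t.ref
FROM (SELECT ref, MAX(timestamp) AS latest FROM tests GROUP BY ref) AS m
JOIN tests t ON t.ref = m.ref AND t.timestamp = m.latest
WHERE t.grade = '2' AND t.subject = 2
ORDER BY t.ref
"""
late_filter_refs = [r[0] for r in conn.execute(FILTER_AFTER_MAX)]

for row in rows:
    print(row)
print(late_filter_refs)
```

Like the answer, this relies on (ref, timestamp) being unique: if two matching tests share a student's latest timestamp, the join back to tests returns both rows.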

Using standard SQL logic, the query can be written as follows. I'm not sure about MySQL specifically, but the logic should carry over.
Select A.ref
,A.fname
,A.sname
,A.id
,A.`group`
,A.timestamp
From
(select
S.ref
,S.fname
,S.sname
,T.id
,classes.name AS `group`
,T.timestamp
from
tests T,students S, classlinks, classes
Where
T.ref=S.ref and
T.grade=2 AND
classlinks.ref=S.ref AND
classlinks.classid=29 AND
classlinks.classid=classes.id AND
T.subject=2 ) A
inner join
(SELECT tests.ref
,max(tests.timestamp) AS timestamp
FROM
tests
-- same filters as in A, so max() only considers matching tests
WHERE tests.grade=2 AND tests.subject=2
group by
tests.ref
) B
on
A.ref=B.ref and
A.timestamp = B.timestamp

Related

Check that all rows in table A have a specific value in table B, including a GROUP BY

I have two tables, students and evidence, and for every entry in students I'm trying to check for a corresponding value in one column of evidence, grouped by another column of evidence.
These are what the tables look like:
> desc students;
+------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| firstname | varchar(191) | NO | | NULL | |
| surname | varchar(191) | NO | | NULL | |
| class_id | int(11) | NO | | NULL | |
| dob | datetime | NO | | NULL | |
| enrollment | datetime | NO | | NULL | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
+------------+---------------------+------+-----+---------+----------------+
> desc evidence;
+---------------+------------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------------------------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| type | enum('written','image','audio','video','link') | YES | | NULL | |
| mime | varchar(191) | YES | | NULL | |
| path | varchar(191) | YES | | NULL | |
| user_id | int(11) | NO | | NULL | |
| date_recorded | datetime | YES | | NULL | |
| statement_id | int(11) | NO | | NULL | |
| progress | int(11) | NO | | NULL | |
| notes | mediumtext | YES | | NULL | |
| student | int(11) | NO | MUL | NULL | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
+---------------+------------------------------------------------+------+-----+---------+----------------+
Entries in the evidence table are associated with a student (evidence.student) and an evidence statement (evidence.statement_id), and are given a progress value of 1 (in progress) or 2 (complete).
I want to be able to check that for each statement_id every student has at least one row with the progress entry set to 2. Ideally I'd like to GROUP BY statement_id and only return a value of 2 for progress if every student has at least one row in the evidence table for that statement_id where progress has been set to 2.
The goal is to list all of the statement_ids where everyone in a class has completed the task (and therefore had some evidence added with progress set to complete).
I've tried doing joins similar to this
SELECT * FROM students left join evidence ON students.id = evidence.student GROUP BY evidence.statement_id HAVING progress = 2;
but the problem there is that if one student is marked as completing a statement_id but another student doesn't have any entries for that statement_id then progress will return 2. I'd rather it just returned NULL for the student without any entries.
I'm pretty stumped on this one, so any help is greatly appreciated.
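The thread has no answer to this question. One common approach is relational division written as a double NOT EXISTS: keep a statement_id when there is no student in the class for whom no progress = 2 row exists. The sketch below runs it on hypothetical sample rows with Python's sqlite3 standing in for MySQL; the `?` placeholder is sqlite3's parameter syntax, and the class ids 10 and 11 are made up.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, class_id INTEGER);
CREATE TABLE evidence (id INTEGER PRIMARY KEY, student INTEGER,
                       statement_id INTEGER, progress INTEGER);
INSERT INTO students VALUES (1, 10), (2, 10), (3, 11);
INSERT INTO evidence (student, statement_id, progress) VALUES
  (1, 100, 2), (2, 100, 1), (2, 100, 2),  -- 100: both class-10 students complete
  (1, 200, 2),                            -- 200: student 2 has no rows at all
  (1, 300, 2), (2, 300, 1);               -- 300: student 2 only in progress
""")

# Keep a statement when NO student of the class lacks a completed
# (progress = 2) evidence row for that statement.
COMPLETED_BY_WHOLE_CLASS = """
SELECT DISTINCT e.statement_id
FROM evidence e
WHERE NOT EXISTS (
    SELECT 1 FROM students s
    WHERE s.class_id = ?
      AND NOT EXISTS (
          SELECT 1 FROM evidence e2
          WHERE e2.student = s.id
            AND e2.statement_id = e.statement_id
            AND e2.progress = 2))
ORDER BY e.statement_id
"""
completed = [r[0] for r in conn.execute(COMPLETED_BY_WHOLE_CLASS, (10,))]
completed_other_class = [r[0] for r in conn.execute(COMPLETED_BY_WHOLE_CLASS, (11,))]
print(completed)
print(completed_other_class)
```

Two caveats: a statement with no evidence rows at all never appears, because statement IDs are taken from evidence, and a class with no students would vacuously match every statement.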

Join between two tables based on multiple criteria

I have a table accounts with columns ip_from, ip_to, start_time, end_time, bytes.
There is a second table called all_audit with columns project, ip, time.
I need to join the tables in order to get a resulting table with columns for project, time and bytes.
Two conditions have to hold: time must fall between start_time and end_time, and ip can match either ip_from or ip_to.
The schemas for the two tables are:
accounts
+----------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| ip_from | char(15) | NO | PRI | NULL | |
| ip_to | char(15) | NO | PRI | NULL | |
| bytes | bigint(20) unsigned | NO | | NULL | |
| start_time | datetime | NO | PRI | NULL | |
| end_time | datetime | YES | | NULL | |
+----------------+---------------------+------+-----+---------+-------+
all_audit
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| project | varchar(255) | YES | | NULL | |
| ip | varchar(32) | YES | MUL | NULL | |
| time | timestamp | YES | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
result
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| project | varchar(255) | YES | | NULL | |
| time | timestamp | YES | | NULL | |
| bytes | bigint(20) unsigned| NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
I know it will be a join but I just don’t know where to start. Pointers will be very helpful as I am not that competent yet in sql statements but willing to learn.
I suspect you are looking for something like this:
SELECT aa.project
, aa.time
, a.bytes
FROM all_audit aa
JOIN accounts a
on (aa.ip = a.ip_from OR aa.ip = a.ip_to)
AND aa.time BETWEEN a.start_time AND a.end_time
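The BETWEEN test above never matches an account whose end_time is NULL (the column is nullable), because a comparison with NULL is not true. Assuming a NULL end_time means the window is still open, which the question does not state, here is a sketch with an explicit NULL check, run on hypothetical rows with Python's sqlite3 standing in for MySQL:

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; datetimes stored as ISO strings,
# which compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (ip_from TEXT, ip_to TEXT, bytes INTEGER,
                       start_time TEXT, end_time TEXT);
CREATE TABLE all_audit (id INTEGER PRIMARY KEY, project TEXT, ip TEXT, time TEXT);
INSERT INTO accounts VALUES
  ('10.0.0.1', '10.0.0.2', 500, '2014-01-01 00:00:00', '2014-01-01 01:00:00'),
  ('10.0.0.3', '10.0.0.1', 700, '2014-01-01 02:00:00', NULL);  -- still open
INSERT INTO all_audit (project, ip, time) VALUES
  ('alpha', '10.0.0.2', '2014-01-01 00:30:00'),  -- matches ip_to, inside window
  ('beta',  '10.0.0.1', '2014-01-01 03:00:00'),  -- only the open window fits
  ('gamma', '10.0.0.9', '2014-01-01 00:30:00');  -- no matching IP
""")

# ip may match either end; a NULL end_time is treated as "no upper bound".
OPEN_ENDED = """
SELECT aa.project, aa.time, a.bytes
FROM all_audit aa
JOIN accounts a
  ON (aa.ip = a.ip_from OR aa.ip = a.ip_to)
 AND aa.time >= a.start_time
 AND (a.end_time IS NULL OR aa.time <= a.end_time)
ORDER BY aa.project
"""
rows = conn.execute(OPEN_ENDED).fetchall()

# The plain BETWEEN version, for comparison: the open window never matches.
WITH_BETWEEN = """
SELECT aa.project
FROM all_audit aa
JOIN accounts a
  ON (aa.ip = a.ip_from OR aa.ip = a.ip_to)
 AND aa.time BETWEEN a.start_time AND a.end_time
ORDER BY aa.project
"""
between_projects = [r[0] for r in conn.execute(WITH_BETWEEN)]
print(rows)
print(between_projects)
```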

MySQL - how can I do a join and order by the sum of votes?

I have schema like this (just experimenting, so if you have improvement suggestions I am all ears):
mysql> describe contest_entries;
+---------------+----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+----------------+------+-----+---------+----------------+
| entry_id | int(10) | NO | PRI | NULL | auto_increment |
| member_id | int(10) | YES | | NULL | |
| person_name | varchar(10000) | NO | | NULL | |
| date | date | NO | | NULL | |
| platform | varchar(30) | YES | | NULL | |
| business_name | varchar(100) | YES | | NULL | |
| url | varchar(200) | YES | | NULL | |
| business_desc | varchar(3000) | YES | | NULL | |
| guid | varchar(50) | YES | UNI | NULL | |
+---------------+----------------+------+-----+---------+----------------+
9 rows in set (0.00 sec)
mysql> describe contest_votes;
+------------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------+------+-----+---------+----------------+
| vote_id | int(10) | NO | PRI | NULL | auto_increment |
| user_id | int(10) | NO | | NULL | |
| contest_entry_id | int(10) | NO | MUL | NULL | |
| vote | int(7) | NO | | NULL | |
+------------------+---------+------+-----+---------+----------------+
And I am trying to pull the data as a leaderboard, ordering the results by the most votes. How would I do that? I am able to do the left-join part, but the sum and the ordering parts of the query are confusing me.
Thank you!
SELECT entry_id
FROM contest_entries
LEFT OUTER JOIN contest_votes ON entry_id = contest_entry_id
GROUP BY entry_id
ORDER BY SUM(vote) DESC
select e.entry_id, coalesce(sum(v.vote), 0) as votes
from contest_entries e
left join contest_votes v on e.entry_id = v.contest_entry_id
group by e.entry_id
order by votes desc
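A small check of the grouping, using Python's sqlite3 as a stand-in for MySQL and hypothetical rows in which member 7 owns two entries. Grouping by entry_id keeps one leaderboard row per entry, while grouping by member_id merges that member's entries into a single row. COALESCE turns the NULL sum of an entry with no votes into 0.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contest_entries (entry_id INTEGER PRIMARY KEY, member_id INTEGER,
                              person_name TEXT);
CREATE TABLE contest_votes (vote_id INTEGER PRIMARY KEY, user_id INTEGER,
                            contest_entry_id INTEGER, vote INTEGER);
INSERT INTO contest_entries VALUES (1, 7, 'Ann'), (2, 7, 'Ann'), (3, 8, 'Bob');
INSERT INTO contest_votes (user_id, contest_entry_id, vote) VALUES
  (1, 1, 1), (2, 1, 1), (3, 3, 1);
""")

# One row per entry; entries without votes still appear with 0.
LEADERBOARD = """
SELECT e.entry_id, COALESCE(SUM(v.vote), 0) AS votes
FROM contest_entries e
LEFT JOIN contest_votes v ON e.entry_id = v.contest_entry_id
GROUP BY e.entry_id
ORDER BY votes DESC, e.entry_id
"""
leaderboard = conn.execute(LEADERBOARD).fetchall()

# Grouping by member instead collapses entries 1 and 2 into one row.
BY_MEMBER = """
SELECT e.member_id, COALESCE(SUM(v.vote), 0) AS votes
FROM contest_entries e
LEFT JOIN contest_votes v ON e.entry_id = v.contest_entry_id
GROUP BY e.member_id
ORDER BY e.member_id
"""
by_member = conn.execute(BY_MEMBER).fetchall()
print(leaderboard)
print(by_member)
```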

MySQL query with JOIN and GROUP BY optimization. Is it possible?

I have two tables: gpnxuser and key_value
mysql> describe gpnxuser;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | NO | MUL | NULL | |
| partner_id | bigint(20) | NO | MUL | NULL | |
| password | varchar(255) | YES | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
and
mysql> describe key_value;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
| upkey | varchar(255) | NO | MUL | NULL | |
| user_id | bigint(20) | YES | MUL | NULL | |
| security_level | int(11) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
key_value.user_id is a FK that references gpnxuser.id. I also have an index on gpnxuser.partner_id, which is a FK that references a table called "partner" (which, I think, does not matter much to this question).
For partner_id = 64, I have 500K rows in gpnxuser, which relate to approximately 6M rows in key_value.
I wanted a query that returns all distinct key_value.upkey values for users belonging to a given partner. I did something like this:
select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
which takes forever to run. The explain for the query looks like:
mysql> explain select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | gpnxuser | ref | PRIMARY,FKB2D9FEBE725C505E | FKB2D9FEBE725C505E | 8 | const | 259640 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | key_value | ref | FK9E0C0F912D11F5A9 | FK9E0C0F912D11F5A9 | 9 | gpnx_finance_db.gpnxuser.id | 14 | Using where |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
My question is: is there a query that can run fast and obtain the result that I want?
What you need here is an EXISTS subquery: EXISTS stops scanning as soon as it finds a matching row, so only part of the table is read for each candidate upkey.
select upkey from (select distinct upkey from key_value) upk
where EXISTS
(select 1 from gpnxuser u, key_value kv
where u.id=kv.user_id and partner_id=64 and kv.upkey = upk.upkey)
NB: in the original query, GROUP BY is used without any aggregate just to remove duplicates; DISTINCT states that intent more directly (MySQL treats DISTINCT as a special case of GROUP BY in most cases):
select DISTINCT upkey from gpnxuser join key_value on
gpnxuser.id=key_value.user_id where partner_id=64
I would look into partitioning your key_value table on user_id, if you typically run queries based on this column.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
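To confirm that the DISTINCT and EXISTS forms return the same set of keys, here is a sketch on hypothetical rows using Python's sqlite3 in place of MySQL. It only checks that the results agree; it says nothing about speed, which should be compared with EXPLAIN and timings on the real MySQL data.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; only the columns the queries need.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpnxuser (id INTEGER PRIMARY KEY, partner_id INTEGER);
CREATE TABLE key_value (id INTEGER PRIMARY KEY, upkey TEXT, user_id INTEGER);
CREATE INDEX kv_user ON key_value (user_id);
CREATE INDEX kv_upkey ON key_value (upkey);
INSERT INTO gpnxuser VALUES (1, 64), (2, 64), (3, 65);
INSERT INTO key_value (upkey, user_id) VALUES
  ('color', 1), ('size', 1), ('color', 2), ('lang', 3);
""")

# Join, then deduplicate.
DISTINCT_QUERY = """
SELECT DISTINCT kv.upkey
FROM gpnxuser u JOIN key_value kv ON u.id = kv.user_id
WHERE u.partner_id = ?
ORDER BY kv.upkey
"""
# Enumerate candidate keys, keep those with at least one matching user.
EXISTS_QUERY = """
SELECT upk.upkey
FROM (SELECT DISTINCT upkey FROM key_value) AS upk
WHERE EXISTS (
    SELECT 1 FROM gpnxuser u JOIN key_value kv ON u.id = kv.user_id
    WHERE u.partner_id = ? AND kv.upkey = upk.upkey)
ORDER BY upk.upkey
"""
via_distinct = [r[0] for r in conn.execute(DISTINCT_QUERY, (64,))]
via_exists = [r[0] for r in conn.execute(EXISTS_QUERY, (64,))]
print(via_distinct, via_exists)
```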

Writing MySQL query with several table joins or multiple select

I am trying to write a MySQL query that returns the organisation name, its post code, any events that belong to the organisation and the post code of each event. I've tried all sorts of JOIN and SELECT combinations to no avail. Is this possible? (I could have separate tables for organisation addresses and event addresses, but it seems like it should be possible to use just one table.)
My table structures:
mysql> DESCRIBE cc_organisations;
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| user_id | int(10) unsigned | NO | MUL | NULL | |
| type | enum('C','O') | YES | | NULL | |
| name | varchar(150) | NO | MUL | NULL | |
| description | text | YES | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> DESCRIBE cc_events;
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| org_id | int(10) unsigned | NO | MUL | NULL | |
| name | varchar(150) | NO | MUL | NULL | |
| start_date | int(11) | NO | MUL | NULL | |
| end_date | int(11) | YES | MUL | NULL | |
| start_time | int(11) | NO | | NULL | |
| end_time | int(11) | NO | | NULL | |
| description | text | YES | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)
mysql> DESCRIBE cc_addresses;
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| org_id | int(10) unsigned | YES | MUL | NULL | |
| event_id | int(10) unsigned | YES | MUL | NULL | |
| post_code | varchar(7) | NO | MUL | NULL | |
| address_1 | varchar(100) | NO | | NULL | |
| address_2 | varchar(100) | YES | | NULL | |
| town | varchar(50) | NO | | NULL | |
| county | varchar(50) | NO | | NULL | |
| email | varchar(150) | NO | | NULL | |
| phone | int(11) | YES | | NULL | |
| mobile | int(11) | YES | | NULL | |
| website_uri | varchar(150) | YES | | NULL | |
| facebook_uri | varchar(250) | YES | | NULL | |
| twitter_uri | varchar(250) | YES | | NULL | |
+--------------+------------------+------+-----+---------+----------------+
14 rows in set (0.00 sec)
select o.name as org_name, oAddress.post_code as org_post_code, e.name as event_name, eAddress.post_code as event_post_code
from cc_organisations o
inner join cc_addresses oAddress on oAddress.org_id = o.id
left outer join cc_events e on e.org_id=o.id
left outer join cc_addresses eAddress on eAddress.event_id = e.id
SELECT cco.name as OrgName, cca.post_code as EventPostCode, cce.id,
cce.org_id, cce.name, cce.start_date, cce.end_date, cce.start_time,
cce.end_time, cce.description
FROM cc_events cce, cc_addresses cca, cc_organisations cco
WHERE cca.event_id = cce.id AND cco.id=cce.org_id
ORDER BY cce.start_date
LIMIT 50;
You can change the sort order and the limit; I only added them because I don't know how big your database is. You may even be able to get away with:
SELECT cco.name as OrgName, cca.post_code as EventPostCode, cce.*
FROM cc_events cce, cc_addresses cca, cc_organisations cco
WHERE cca.event_id = cce.id AND cco.id=cce.org_id
ORDER BY cce.start_date LIMIT 50;
MySQL allows a table-qualified cce.* alongside other select-list columns, so this shorter form should work as well.
Your address table has the post codes in it, but it also has organization ID and event ID foreign keys. We only need to check the event_id from the address table, because every event belongs to an organization. So the join conditions are that the address's event_id matches the event ID (cca.event_id = cce.id) and that the event's org_id matches the organization ID (cco.id = cce.org_id).
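The idea of joining cc_addresses twice under different aliases, once on org_id and once on event_id, can be checked on sample data. In this sketch every join after the organisation is a LEFT JOIN, so an organisation with no events still appears with NULL event columns. It runs on hypothetical rows with Python's sqlite3 standing in for MySQL, and the column aliases are made up.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cc_organisations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cc_events (id INTEGER PRIMARY KEY, org_id INTEGER, name TEXT);
CREATE TABLE cc_addresses (id INTEGER PRIMARY KEY, org_id INTEGER,
                           event_id INTEGER, post_code TEXT);
INSERT INTO cc_organisations VALUES (1, 'Choir'), (2, 'Club');
INSERT INTO cc_events VALUES (10, 1, 'Concert');
INSERT INTO cc_addresses (org_id, event_id, post_code) VALUES
  (1, NULL, 'AB1 2CD'),   -- address of organisation 1
  (NULL, 10, 'EF3 4GH'),  -- address of event 10
  (2, NULL, 'IJ5 6KL');   -- address of organisation 2, which has no events
""")

# cc_addresses appears twice: oa for the organisation, ea for the event.
ORG_AND_EVENT_POST_CODES = """
SELECT o.name, oa.post_code AS org_post_code,
       e.name AS event_name, ea.post_code AS event_post_code
FROM cc_organisations o
LEFT JOIN cc_addresses oa ON oa.org_id = o.id
LEFT JOIN cc_events e     ON e.org_id = o.id
LEFT JOIN cc_addresses ea ON ea.event_id = e.id
ORDER BY o.id
"""
rows = conn.execute(ORG_AND_EVENT_POST_CODES).fetchall()
for row in rows:
    print(row)
```

If the last join were an inner join instead, it would discard every organisation without an event, which undoes the LEFT JOIN on cc_events.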