MySQL - output extra column based on a certain condition

First, I want to apologize for such a weak title; I couldn't describe the problem in a better way.
Consider the following: we have three tables, one for users, one for records and one for ratings. The tables are fairly self-explanatory, but the database schema is as follows:
+---------------------+
| Tables_in_relations |
+---------------------+
| records |
| ratings |
| users |
+---------------------+
The schema for the records table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(256) | NO | | NULL | |
| year | int(4) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The schema for the users table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(256) | NO | | NULL | |
| name | varchar(256) | NO | | NULL | |
| password | varchar(256) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The ratings table is, obviously, where the ratings are stored along with the record_id and user_id; it works as a relation table.
Its schema is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| record_id| smallint(5) unsigned | NO | MUL | NULL | |
| user_id | smallint(5) unsigned | NO | MUL | NULL | |
| rating | int(1) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
Now, in my application, I have a search function that fetches records based on a keyword. The output should also include the average rating of each record and the total number of ratings per record. This can be accomplished with the following query:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
which will give me the following output:
+----+------------------------+------+----------+-------------------+
| id | title | year | avg_rate | total_times_rated |
+----+------------------------+------+----------+-------------------+
| 1 | Test Record 1 | 2008 | 3 | 4 |
| 2 | Test Record 2 | 2012 | 2 | 4 |
| 3 | Test Record 3 | 2003 | 3 | 4 |
| 4 | Test Record 4 | 2012 | 3 | 3 |
| 5 | Test Record 5 | 2003 | 2 | 3 |
| 6 | Test Record 6 | 2006 | 2 | 3 |
+----+------------------------+------+----------+-------------------+
Question:
Now here comes the tricky part, at least for me. In my app, you can search records whether or not you are signed in, and if you are signed in, I'd also like to include your own rating in the above query.
I know that I can check whether the user is signed in by reading the session value and run a corresponding query based on that. I just don't know how to include that user's individual rating in the above query.

You can add the user's rating to the result by adding a subquery to the column list:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated,
(SELECT rating FROM ratings WHERE user_id = ? AND record_id = re.id) as user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
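We can get the user_id from the session and pass it to this query to generate the user_rating column in the result.
This assumes a user can rate a given record at most once. If a user can rate the same record multiple times, the subquery returns more than one row and MySQL fails with "Subquery returns more than 1 row", so wrap the column in an aggregate such as SUM(rating) or MAX(rating).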
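A minimal sketch of the scalar-subquery approach, run with Python's built-in sqlite3 module as a stand-in for MySQL, so it shows the query shape rather than MySQL-specific behaviour. The sample rows and the search() helper are made up for illustration, and year is left out for brevity. Passing None as the user id models a signed-out visitor: user_id = NULL is never true, so the subquery yields NULL and the same query can serve both cases.

```python
import sqlite3

# Made-up sample data; SQLite stands in for MySQL.
SCHEMA = """
CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL);
CREATE TABLE ratings (id INTEGER PRIMARY KEY, record_id INTEGER NOT NULL,
                      user_id INTEGER NOT NULL, rating INTEGER NOT NULL);
INSERT INTO records (id, title, year) VALUES
  (1, 'Test Record 1', 2008), (2, 'Test Record 2', 2012), (3, 'Test Record 3', 2003);
INSERT INTO ratings (record_id, user_id, rating) VALUES
  (1, 10, 4), (1, 11, 2), (2, 10, 1);
"""

QUERY = """
SELECT re.id, re.title, ROUND(AVG(ra.rating)) AS avg_rate,
       COUNT(ra.record_id) AS total_times_rated,
       -- correlated scalar subquery: assumes at most one rating per user and record
       (SELECT rating FROM ratings WHERE user_id = ? AND record_id = re.id) AS user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id
ORDER BY re.id
"""

def search(conn, user_id):
    """Return one row per record; user_id=None models a signed-out visitor."""
    return conn.execute(QUERY, (user_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for row in search(conn, 10):
    print(row)
```

Record 3 has no ratings, so the LEFT JOIN still returns it with a count of 0 and NULL for both the average and the user's rating.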
Update
If you don't want the GROUP BY to apply to that value, you can wrap the existing query in another query and add the column there, e.g.:
SELECT a.id, a.title, a.year, a.avg_rate, a.total_times_rated,
(SELECT rating FROM ratings WHERE user_id = ? AND record_id = a.id) as user_rating
FROM (SELECT re.id as id, re.title as title, re.year as year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id) a;

Related

Debugging a rather difficult/complex MySQL query

I'm having trouble making a rather difficult MySQL query work. I've been trying, but writing complex queries has never been my strong side.
This query includes 4 tables, which I'll describe of course.
First, we have the song table, from which I need to select the required information.
+--------------+-----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------+------+-----+---------+----------------+
| ID | int(6) | NO | PRI | - | auto_increment |
| Anime | char(100) | NO | | - | |
| Title | char(100) | NO | | - | |
| Type | char(20) | NO | | - | |
| Singer | char(50) | NO | | - | |
| Youtube | char(30) | NO | | - | |
| Score | double | NO | | 0 | |
| Ratings | int(8) | NO | | 0 | |
| Favourites | int(7) | NO | | 0 | |
| comments | int(11) | NO | | 0 | |
| release_year | int(4) | NO | | 2019 | |
| season | char(10) | NO | | Spring | |
+--------------+-----------+------+-----+---------+----------------+
Then we have song_ratings, which basically represents each user's list, since once you rate a song, it appears on your list.
+------------+----------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+-------------------+----------------+
| ID | int(11) | NO | PRI | 0 | auto_increment |
| UserID | int(11) | NO | MUL | 0 | |
| SongID | int(11) | NO | MUL | 0 | |
| Rating | double | NO | | 0 | |
| RatedAt | datetime | NO | | CURRENT_TIMESTAMP | |
| Favourited | int(1) | NO | | 0 | |
+------------+----------+------+-----+-------------------+----------------+
Users have the option to create custom lists (playlists), which are stored in the lists table:
+------------+-----------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------+------+-----+-------------------+----------------+
| ID | int(11) | NO | PRI | 0 | auto_increment |
| userID | int(11) | NO | MUL | 0 | |
| name | char(50) | NO | | - | |
| likes | int(11) | NO | | 0 | |
| favourites | int(11) | NO | | 0 | |
| created_at | datetime | NO | | CURRENT_TIMESTAMP | |
| cover | char(100) | NO | | - | |
| locked | int(1) | NO | | 0 | |
| private | int(1) | NO | | 0 | |
+------------+-----------+------+-----+-------------------+----------------+
And finally, list_elements, the table that contains all the songs that have been added to any playlist:
+--------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | 0 | auto_increment |
| listID | int(11) | NO | MUL | 0 | |
| songID | int(11) | NO | MUL | 0 | |
+--------+---------+------+-----+---------+----------------+
My query needs to list all the songs on a user's list, i.e. the records in song_ratings where userID = ? (the ID of the user), that are not on a specific playlist, i.e. that have no record in list_elements where listID = ? (the ID of that playlist).
This is the query I've been using so far, but after a while I realized it doesn't actually work the way I wanted:
SELECT DISTINCT
COUNT(*)
FROM
song
INNER JOIN song_ratings ON song_ratings.songID = song.ID
LEFT JOIN list_elements ON song_ratings.songID = list_elements.songID
WHERE
song_ratings.userID = 34 AND list_elements.songID IS NULL
I have also tried something like this, and several variants of it:
SELECT DISTINCT
COUNT(*)
FROM
song
INNER JOIN song_ratings ON song_ratings.songID = song.ID
INNER JOIN lists ON lists.userID = song_ratings.userID
LEFT JOIN list_elements ON song_ratings.songID = list_elements.songID
WHERE
song_ratings.userID = 34 AND lists.ID = 1
To make it easier, here's a SQL Fiddle, with all the necessary tables and records in them.
What you need to know: when you check the playlist with ID 1, the query needs to return 23 (basically all matches), because playlist 1 is empty, so all of the songs in the song_ratings table can be added to it (at least the ones that exist in the song table, which is only half of the overall records now).
When you do the same with ID 4, it needs to return 21 if the query works correctly, because playlist 4 already has 2 songs added to it, so only 21 are left available for adding.
Or, in case the numbers are wrong: playlist 1 needs to return all matches, and playlist 4 needs to return all matches minus 2 (because 2 songs are already added).
The userID needs to remain the same (34), and there are no records with a different ID, so don't change it.
You could try a subquery with a NOT IN clause, restricted to the playlist you are checking (4 here):
SELECT
COUNT(*)
FROM
song
INNER JOIN song_ratings ON song_ratings.songID = song.ID
WHERE
song_ratings.userID = 34 AND song.ID NOT IN (SELECT songID FROM list_elements WHERE listID = 4)
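Your original query was almost correct, but its LEFT JOIN matches list_elements rows from every playlist, so a song that is on any playlist gets excluded, not just one on the playlist you are checking. Put the playlist condition into the ON clause. (If you put a condition such as list_elements.listID = 4 in the WHERE clause instead, it rejects the NULL rows produced by the LEFT JOIN and effectively turns it into an INNER JOIN.) The IS NULL test stays in the WHERE clause, because that is what removes the songs already on the playlist:
SELECT COUNT(*)
FROM song
INNER JOIN song_ratings ON song_ratings.songID = song.ID
LEFT JOIN list_elements ON song_ratings.songID = list_elements.songID
AND list_elements.listID = 4
WHERE song_ratings.userID = 34
AND list_elements.songID IS NULL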
In MySQL, a JOIN is often faster than an equivalent subquery, so this will probably be faster as well.
By the way, you don't need DISTINCT when you only select COUNT(*). Without GROUP BY, COUNT(*) returns a single row, so there is nothing to deduplicate.
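The counts can be checked against a small, made-up data set using Python's sqlite3 as a stand-in for MySQL: user 34 has rated songs 1 to 5, song 5 is missing from the song table, playlist 4 holds songs 1 and 2, and song 3 sits on a hypothetical playlist 9. The sketch compares the anti-join with the playlist filter in ON, a NOT IN restricted to one playlist, the question's first query (no playlist filter), and a LEFT JOIN with IS NULL placed in the ON clause.

```python
import sqlite3

SCHEMA = """
CREATE TABLE song (ID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE song_ratings (ID INTEGER PRIMARY KEY, UserID INTEGER NOT NULL, SongID INTEGER NOT NULL);
CREATE TABLE list_elements (ID INTEGER PRIMARY KEY, listID INTEGER NOT NULL, songID INTEGER NOT NULL);
INSERT INTO song (ID, Title) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO song_ratings (UserID, SongID) VALUES (34, 1), (34, 2), (34, 3), (34, 4), (34, 5);
INSERT INTO list_elements (listID, songID) VALUES (4, 1), (4, 2), (9, 3);
"""

BASE = """
SELECT COUNT(*)
FROM song
INNER JOIN song_ratings ON song_ratings.SongID = song.ID
"""

# Anti-join: playlist filter in ON, IS NULL in WHERE. Params: (listID, userID).
ANTI_JOIN = BASE + """
LEFT JOIN list_elements ON list_elements.songID = song_ratings.SongID
                       AND list_elements.listID = ?
WHERE song_ratings.UserID = ? AND list_elements.songID IS NULL
"""

# NOT IN restricted to one playlist. Params: (userID, listID).
NOT_IN = BASE + """
WHERE song_ratings.UserID = ?
  AND song.ID NOT IN (SELECT songID FROM list_elements WHERE listID = ?)
"""

# No playlist filter: excludes songs that are on ANY playlist. Params: (userID,).
ANY_LIST = BASE + """
LEFT JOIN list_elements ON list_elements.songID = song_ratings.SongID
WHERE song_ratings.UserID = ? AND list_elements.songID IS NULL
"""

# IS NULL inside ON never matches a NOT NULL column, so nothing is excluded.
IS_NULL_IN_ON = BASE + """
LEFT JOIN list_elements ON list_elements.songID = song_ratings.SongID
                       AND list_elements.songID IS NULL
WHERE song_ratings.UserID = ?
"""

def count(conn, sql, params):
    """Run a COUNT(*) query and return the single integer it produces."""
    return conn.execute(sql, params).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for list_id in (1, 4):
    print(list_id, count(conn, ANTI_JOIN, (list_id, 34)), count(conn, NOT_IN, (34, list_id)))
```

With this data, playlist 1 (empty) leaves 4 addable songs and playlist 4 leaves 2, mirroring the "all matches" and "all matches minus 2" rule from the question; the filter-free query returns 1 and the IS NULL-in-ON version returns 4 whatever the playlist.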

Getting a SQL query to print 0 for null count results across 3 tables

I'm trying to get a SQL query to return a count per topic, but I need the result to include rows where the count is 0. The solution I found was to use IFNULL(COUNT(*), 0) in place of COUNT(*), but that had no effect on the result. I also tried using a LEFT JOIN, but I got a syntax error when I tried to add one. Here's my table setup:
User
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| UserID | mediumint(9) | NO | PRI | NULL | auto_increment |
| firstName | varchar(15) | NO | | NULL | |
| lastName | varchar(15) | NO | | NULL | |
| Protocol | varchar(10) | NO | | NULL | |
| Endpoint | varchar(50) | NO | | NULL | |
| UsergroupID | mediumint(9) | NO | MUL | NULL | |
+-------------+--------------+------+-----+---------+----------------+
Subscription
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| SubscriptionID | mediumint(9) | NO | PRI | NULL | auto_increment |
| TopicID | mediumint(9) | NO | MUL | NULL | |
| UserID | mediumint(9) | NO | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
Topic
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| TopicID | mediumint(9) | NO | PRI | NULL | auto_increment |
| Name | varchar(50) | NO | | NULL | |
| FBName | varchar(30) | YES | | NULL | |
| FBToken | varchar(255) | YES | | NULL | |
| TWName | varchar(10) | YES | | NULL | |
| TWToken | varchar(50) | YES | | NULL | |
| TWSecret | varchar(50) | YES | | NULL | |
+----------+--------------+------+-----+---------+----------------+
My SQL query to get the COUNT is:
SELECT Topic.TopicID as ID, Topic.Name AS TopicName, COUNT(*) AS numSubscriptions
FROM User, Topic, Subscription
WHERE Subscription.UserID = User.UserID
AND Subscription.TopicID = Topic.TopicID
GROUP BY Topic.TopicID;
I've tried replacing COUNT(*) with IFNULL(COUNT(*), 0), I've tried replacing User, Topic, Subscription with User JOIN Subscription JOIN Topic, and I've also tried User LEFT JOIN Subscription LEFT JOIN Topic, but that gave a SQL error.
The output I'm getting is:
+----+-----------+------------------+
| ID | TopicName | numSubscriptions |
+----+-----------+------------------+
| 2 | test | 2 |
| 3 | test2 | 1 |
+----+-----------+------------------+
I need to be getting
+----+-----------+------------------+
| ID | TopicName | numSubscriptions |
+----+-----------+------------------+
| 2 | test | 2 |
| 3 | test2 | 1 |
| 4 | test3 | 0 |
+----+-----------+------------------+
Joins are evaluated left to right, and a LEFT JOIN keeps every row of the table on its left. So the trick is to start with Topic and outer-join the other tables onto it:
SELECT Topic.TopicID as ID, Topic.Name AS TopicName,
COUNT(User.UserID) AS numSubscriptions
FROM Topic
LEFT JOIN Subscription
ON Subscription.TopicID = Topic.TopicID
LEFT JOIN User
ON User.UserID = Subscription.UserID
GROUP BY Topic.TopicID
This allows for multiple subscriptions per user and only counts a subscription if its user record exists.
COUNT(column) ignores NULLs, so any topic without a matching subscription and user record shows a count of 0.
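A small sketch with Python's sqlite3 standing in for MySQL and made-up rows that reproduce the question's expected output. It compares chaining a second LEFT JOIN to User against a plain (inner) JOIN to User after the LEFT JOIN to Subscription. The per_topic() helper is hypothetical and only splices one of two fixed keywords into the SQL, never user input.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (UserID INTEGER PRIMARY KEY, firstName TEXT NOT NULL);
CREATE TABLE Topic (TopicID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Subscription (SubscriptionID INTEGER PRIMARY KEY,
                           TopicID INTEGER NOT NULL, UserID INTEGER NOT NULL);
INSERT INTO User (UserID, firstName) VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Topic (TopicID, Name) VALUES (2, 'test'), (3, 'test2'), (4, 'test3');
INSERT INTO Subscription (TopicID, UserID) VALUES (2, 1), (2, 2), (3, 1);
"""

def per_topic(user_join):
    """Build the per-topic count query; user_join is 'LEFT JOIN' or 'JOIN'."""
    return f"""
    SELECT Topic.TopicID AS ID, Topic.Name AS TopicName,
           COUNT(User.UserID) AS numSubscriptions  -- COUNT(col) skips NULLs
    FROM Topic
    LEFT JOIN Subscription ON Subscription.TopicID = Topic.TopicID
    {user_join} User ON User.UserID = Subscription.UserID
    GROUP BY Topic.TopicID, Topic.Name
    ORDER BY Topic.TopicID
    """

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(conn.execute(per_topic("LEFT JOIN")).fetchall())
print(conn.execute(per_topic("JOIN")).fetchall())
```

Topic 4 has no subscription, so its Subscription.UserID is NULL; the inner JOIN to User then discards that row, while the second LEFT JOIN keeps it and COUNT reports 0.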
If you're not concerned whether the user record exists, you could simplify it to the following:
SELECT Topic.TopicID as ID, Topic.Name AS TopicName,
COUNT(Subscription.TopicID) AS numSubscriptions
FROM Topic
LEFT JOIN Subscription
ON Subscription.TopicID = Topic.TopicID
GROUP BY Topic.TopicID
The example below should do what you're after. The column in the COUNT() can be any column of the subscription table, but using its ID is a good practice.
Using the left join ensures that all entries of the user table will show up in the results, even if there are no matching subscriptions.
SELECT User.firstName,
User.lastName,
Topic.Name AS TopicName,
COUNT(Subscription.SubscriptionId) AS numSubscriptions
FROM User
LEFT OUTER JOIN Subscription ON Subscription.UserID=User.UserID
LEFT OUTER JOIN Topic ON Subscription.TopicID=Topic.TopicID
GROUP BY User.firstName, User.lastName, Topic.Name;

MySQL merge results into table from count of 2 other tables, matching ids

I've got 3 tables: model, model_views, and model_views2. To have one column per row holding the aggregated views, I've done a migration that makes the model table look something like this, with a new views column:
+---------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
| [...] | | | | | |
| views | int(20) | YES | | 0 | |
+---------------+---------------+------+-----+---------+----------------+
This is what the columns for model_views and model_views2 look like:
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | smallint(5) | NO | MUL | NULL | |
| model_id | smallint(5) | NO | MUL | NULL | |
| time | int(10) unsigned | NO | | NULL | |
| ip_address | varchar(16) | NO | MUL | NULL | |
+------------+------------------+------+-----+---------+----------------+
model_views and model_views2 are gargantuan, each totalling tens of millions of rows. Each row represents one view, and this is a terrible mess for performance. So far, I've got this MySQL query to count the rows representing single views in both tables, grouped by model_id and added up:
SELECT model_id, SUM(c) FROM (
SELECT model_views.model_id, COUNT(*) AS c FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c FROM model_views2
GROUP BY model_views2.model_id)
AS foo GROUP BY model_id
So that I get a nice big table with the following:
+----------+--------+
| model_id | SUM(c) |
+----------+--------+
| 1 | 1451 |
| [...] | |
+----------+--------+
What would be the safest way, from here on, to merge the SUM(c) values into the model.views column, matching model.id to the model_id values from the above query? I only want to fill the rows for models that still exist; there are probably model_views rows referring to models that have been deleted.
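You can just use UPDATE with a JOIN on your subquery. Keep the outer SUM: without it, UNION ALL leaves up to two rows per model_id and the UPDATE sets views from just one of them rather than their total. Because it is an inner join, only models that still exist are updated; view rows for deleted models are ignored:
UPDATE model
JOIN (
SELECT model_id, SUM(c) AS c
FROM (
SELECT model_views.model_id, COUNT(*) AS c
FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c
FROM model_views2
GROUP BY model_views2.model_id) AS foo
GROUP BY model_id) toupdate ON model.id = toupdate.model_id
SET model.views = toupdate.c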
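SQLite does not accept MySQL's UPDATE ... JOIN syntax, so this sketch (Python's sqlite3, made-up rows) materialises the summed counts into a hypothetical temporary table totals and updates from it with a correlated subquery; in MySQL the UPDATE model JOIN (...) form does the same job. It also shows why the outer SUM matters: without it, UNION ALL leaves two rows for a model that appears in both view tables. Model 99 stands for a deleted model.

```python
import sqlite3

SCHEMA = """
CREATE TABLE model (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, views INTEGER DEFAULT 0);
CREATE TABLE model_views (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL);
CREATE TABLE model_views2 (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL);
INSERT INTO model (id, user_id) VALUES (1, 7), (2, 7);
-- model 1: 3 + 2 views; model 2: 1 + 0 views; model 99 was deleted but still has views
INSERT INTO model_views (model_id) VALUES (1), (1), (1), (2), (99);
INSERT INTO model_views2 (model_id) VALUES (1), (1), (99);
"""

# Per-table counts stacked with UNION ALL (up to two rows per model_id).
RAW = """
SELECT model_id, COUNT(*) AS c FROM model_views GROUP BY model_id
UNION ALL
SELECT model_id, COUNT(*) AS c FROM model_views2 GROUP BY model_id
"""

# The same counts summed into one row per model_id.
TOTALS = "SELECT model_id, SUM(c) AS c FROM (" + RAW + ") AS foo GROUP BY model_id"

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("CREATE TEMP TABLE totals AS " + TOTALS)
# Stand-in for MySQL's UPDATE ... JOIN: only ids present in model are touched,
# so the views recorded for model 99 are ignored.
conn.execute("""
UPDATE model
SET views = (SELECT c FROM totals WHERE totals.model_id = model.id)
WHERE id IN (SELECT model_id FROM totals)
""")
print(conn.execute("SELECT id, views FROM model ORDER BY id").fetchall())
```

On tens of millions of rows, running the aggregation once into a staging table like this can also make the final UPDATE shorter, though that is a judgement call rather than something the answer requires.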

NATURAL JOIN vs WHERE IN Clauses

Recently, I had to retrieve a large amount of data, thousands of records, from a MySQL database. Since it was the first time I had handled such a large data set, I didn't think about the efficiency of the SQL statement, and that's where the problem came in.
Here are the tables of the database
(It is just a simple database model of a curriculum system):
course:
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| course_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| lecturer | varchar(20) | NO | | NULL | |
| credit | float | NO | | NULL | |
| week_from | tinyint(3) unsigned | NO | | NULL | |
| week_to | tinyint(3) unsigned | NO | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
select:
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| select_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| card_no | int(10) unsigned | NO | | NULL | |
| course_id | int(10) unsigned | NO | | NULL | |
| term | varchar(7) | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
When I want to retrieve all the courses that a student has selected (by his card number),
the SQL statement is:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` WHERE course_id IN (
SELECT course_id FROM `select` WHERE card_no=<student's card number>
);
But it was extremely slow; it didn't return anything for a long time.
So I replaced the WHERE IN clause with a NATURAL JOIN. Here is the SQL:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no=<student's card number>;
It returns immediately and works fine!
So my question is:
What's the difference between NATURAL JOIN and WHERE IN Clauses?
What makes them perform differently?
(Is it maybe because I didn't set up any INDEX?)
When shall we use NATURAL JOIN or WHERE IN?
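Theoretically the two queries are equivalent, as long as a student has at most one row per course in the select table (otherwise the join returns duplicates while IN does not). I think it's just a weakness of the MySQL query optimizer that makes JOIN more efficient than WHERE IN: MySQL 5.5 and earlier rewrite IN (subquery) as a correlated subquery that runs once per outer row. So I always use JOIN.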
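A small check of the equivalence point, using Python's sqlite3 in place of MySQL with made-up rows and only the name column of course: when a student has selected the same course in two terms, the NATURAL JOIN returns that course twice while WHERE IN returns it once, and SELECT DISTINCT brings the join back in line. The composite index on (card_no, course_id) and its name are assumptions about what would help the IN form; whether it helps depends on your MySQL version, so check with EXPLAIN.

```python
import sqlite3

SCHEMA = """
CREATE TABLE course (course_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE "select" (select_id INTEGER PRIMARY KEY, card_no INTEGER NOT NULL,
                       course_id INTEGER NOT NULL, term TEXT NOT NULL);
-- hypothetical index covering the subquery's filter and result column
CREATE INDEX select_card_course ON "select" (card_no, course_id);
INSERT INTO course (course_id, name) VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
-- student 100 took course 1 in two terms
INSERT INTO "select" (card_no, course_id, term) VALUES
  (100, 1, '2012-1'), (100, 1, '2012-2'), (100, 2, '2012-1'), (200, 3, '2012-1');
"""

IN_QUERY = """
SELECT course_id, name FROM course
WHERE course_id IN (SELECT course_id FROM "select" WHERE card_no = ?)
ORDER BY course_id
"""

# NATURAL JOIN matches on every shared column name; here that is only course_id.
JOIN_QUERY = """
SELECT course_id, name FROM "select" NATURAL JOIN course
WHERE card_no = ?
ORDER BY course_id
"""

DISTINCT_JOIN_QUERY = """
SELECT DISTINCT course_id, name FROM "select" NATURAL JOIN course
WHERE card_no = ?
ORDER BY course_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for sql in (IN_QUERY, JOIN_QUERY, DISTINCT_JOIN_QUERY):
    print(conn.execute(sql, (100,)).fetchall())
```

Because NATURAL JOIN silently joins on any column the two tables happen to share, adding, say, a name column to select later would change the result; an explicit JOIN ... ON course.course_id = select.course_id avoids that.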
Have you looked at the output of EXPLAIN for the two queries? Here's what I got for a WHERE IN query on my own tables:
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
| 1 | PRIMARY | t_users | ALL | NULL | NULL | NULL | NULL | 2458304 | Using where |
| 2 | DEPENDENT SUBQUERY | t_user_attributes | index_subquery | PRIMARY,attribute | PRIMARY | 13 | func,const | 7 | Using index; Using where |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
It's scanning every row of the main table (type ALL, about 2.4 million rows) and running the dependent subquery once per row to test whether it's in the list; no index is used for the main table. For the JOIN I get:
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| 1 | SIMPLE | t_user_attributes | ref | PRIMARY,attribute | attribute | 1 | const | 15 | Using where |
| 1 | SIMPLE | t_users | eq_ref | username,username_2 | username | 12 | bbodb_test.t_user_attributes.username | 1 | |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
Now it uses the index.
Try this:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` c
WHERE c.course_id IN (
SELECT s.course_id
FROM `select` s
WHERE card_no=<student's card number>
AND c.course_id = s.course_id
);
Notice the addition of the AND clause in the subquery. This is called a correlated subquery because it relates the two course_ids, just as the NATURAL JOIN does.
I think Barmar's index explanation is on the mark.

Obtaining the most recent records in a table of transactions

I'm trying to run a query on a table where we keep transactions about records in our database. To be more specific, when we "expire" an "asset" (as we call it), we change its state to expired in the main table, and then record when it was expired in another table (this was not my design).
The problem is that sometimes the end user gets impatient with the front end, and we end up with multiple expired transactions for the same record.
The table in question is as follows:
+---------------+-----------------------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------------+------+-----+-------------------+-------+
| m_id | int(11) | NO | PRI | 0 | |
| a_ordinal | int(11) | NO | PRI | 0 | |
| date_expired | datetime | NO | PRI | | |
| expire_state | enum('EXPIRED','UNEXPIRED') | YES | | NULL | |
| note | text | YES | | NULL | |
| created_by | varchar(40) | YES | | NULL | |
| creation_date | datetime | NO | | | |
| updated_by | varchar(40) | NO | | | |
| last_update | timestamp | NO | | CURRENT_TIMESTAMP | |
+---------------+-----------------------------+------+-----+-------------------+-------+
From what I can ascertain, m_id, a_ordinal and date_expired form a composite key.
What I need is a query that displays the most recent transaction for each expired record (m_id, a_ordinal, date_expired). Currently it displays 809 records, because a record can have multiple instances of being expired:
| 2223 | 20 | 2011-05-02 12:15:43 | EXPIRED | 165 Plays. Program quality is poor.
| 2223 | 20 | 2011-05-02 12:16:05 | EXPIRED | 165 Plays. Program quality is poor.
I think it involves a subquery with a join (or perhaps not?), but it's been 5 years since I've worked with MySQL, and I'm very rusty. Any help would be appreciated!
SELECT t.m_id, t.a_ordinal, t.date_expired, t.note
FROM expiry_table_name t
INNER JOIN (
SELECT m_id, a_ordinal, MAX(date_expired) AS date_expired
FROM expiry_table_name
GROUP BY m_id, a_ordinal
) g
ON g.m_id = t.m_id
AND g.a_ordinal = t.a_ordinal
AND g.date_expired = t.date_expired
n.b. If (m_id, a_ordinal, date_expired) really is the primary key, there cannot be duplicate date_expired values for one m_id, a_ordinal combination; if duplicates are possible, you'll need something more sophisticated to break ties.
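A quick check of the join-on-MAX approach with Python's sqlite3 standing in for MySQL. The table name expiry, the trimmed column list and the third row are made up; date_expired is stored as ISO-8601 text here, which sorts chronologically, so MAX picks the latest expiry just as it would on a MySQL DATETIME column.

```python
import sqlite3

SCHEMA = """
CREATE TABLE expiry (
    m_id INTEGER NOT NULL, a_ordinal INTEGER NOT NULL, date_expired TEXT NOT NULL,
    expire_state TEXT, note TEXT,
    PRIMARY KEY (m_id, a_ordinal, date_expired)
);
INSERT INTO expiry VALUES
  (2223, 20, '2011-05-02 12:15:43', 'EXPIRED', '165 Plays. Program quality is poor.'),
  (2223, 20, '2011-05-02 12:16:05', 'EXPIRED', '165 Plays. Program quality is poor.'),
  (2224, 3,  '2011-04-01 09:00:00', 'EXPIRED', 'Duplicate upload.');
"""

# Derived table g holds the latest date per (m_id, a_ordinal); joining back on all
# three key columns picks out exactly that transaction row.
LATEST = """
SELECT t.m_id, t.a_ordinal, t.date_expired, t.note
FROM expiry t
INNER JOIN (
    SELECT m_id, a_ordinal, MAX(date_expired) AS date_expired
    FROM expiry
    GROUP BY m_id, a_ordinal
) g
  ON g.m_id = t.m_id AND g.a_ordinal = t.a_ordinal AND g.date_expired = t.date_expired
ORDER BY t.m_id, t.a_ordinal
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for row in conn.execute(LATEST):
    print(row)
```

The two impatient-click rows for 2223/20 collapse to the one expired at 12:16:05, while an asset expired only once comes through unchanged.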
I believe you'll need to join on a subquery to do this... try the following:
SELECT
yt.m_id,
yt.a_ordinal,
yt.date_expired
FROM
yourtable yt
INNER JOIN (
SELECT m_id, a_ordinal, MAX(date_expired) as `max_date`
FROM yourtable
GROUP BY m_id, a_ordinal) dt
ON (yt.m_id = dt.m_id AND yt.a_ordinal = dt.a_ordinal AND yt.date_expired = dt.max_date)