I'm practicing MySQL and trying to solve an exercise. I have data containing reviews of a hotel by different users: one user can have many reviews if they have visited more than once. Each review has its own id and rating values from 1 to 5.
The reviews also have dates, and I would like to compute the average ratings of the first visits only (the earliest date per user). My problem is that the ways I have tried to retrieve the earliest date don't actually work: I get the same results with and without the HAVING and WHERE clauses. Could someone help me with this? Thanks!
Here is my query (I have tried with the HAVING and WHERE methods)
SELECT AVG(overall_rating), AVG(rooms_rating), AVG(service_rating), AVG(location_rating), AVG(value_rating)
FROM reviews
HAVING MIN(review_date)
#WHERE review_date IN (SELECT MIN(review_date) FROM repatrons_reviews GROUP BY id)
Here is an example of the data
user_id | id | rooms_rating | service_rating | location_rating | value_rating | date
---------------------------------------------------------------------------------------
matt21 | 123 | 4 | 5 | 2 | 4 | 2007-08-20
This
SELECT ...
FROM reviews
HAVING MIN(review_date)
cannot work. Let's say the minimum date in the table is DATE '2020-01-01', then what is HAVING DATE '2020-01-01' supposed to mean?
This
SELECT ...
FROM reviews
WHERE review_date IN (SELECT MIN(review_date) FROM reviews GROUP BY id);
is close, but you want the minimum date per user ID, not per review ID. And even if you replace id with user_id, there is still a problem, because the first date for one user can be the third date for another.
Here is this query corrected:
SELECT
AVG(overall_rating), AVG(rooms_rating),
AVG(service_rating), AVG(location_rating), AVG(value_rating)
FROM reviews
WHERE (user_id, review_date) IN
(SELECT user_id, MIN(review_date) FROM reviews GROUP BY user_id);
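To see the difference between the date-only IN and the (user_id, review_date) pair, here is a small sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, and the sample rows are made up for illustration; only the overall_rating column is included to keep it short.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews "
    "(id INTEGER, user_id TEXT, overall_rating INTEGER, review_date TEXT)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?)",
    [
        (1, "matt21", 4, "2007-08-20"),  # matt21's first visit
        (2, "matt21", 2, "2008-01-15"),  # matt21's second visit
        (3, "anna", 5, "2008-01-15"),    # anna's first visit, same date as row 2
        (4, "anna", 1, "2009-03-01"),    # anna's second visit
    ],
)

# Date-only IN: row 2 slips in because its date is anna's first date.
date_only = conn.execute(
    """
    SELECT AVG(overall_rating) FROM reviews
    WHERE review_date IN
      (SELECT MIN(review_date) FROM reviews GROUP BY user_id)
    """
).fetchone()[0]

# Pair IN: each row must match its own user's earliest date.
per_user = conn.execute(
    """
    SELECT AVG(overall_rating) FROM reviews
    WHERE (user_id, review_date) IN
      (SELECT user_id, MIN(review_date) FROM reviews GROUP BY user_id)
    """
).fetchone()[0]

print(date_only, per_user)
```

With this data the date-only version averages rows 1, 2 and 3, while the pair version averages only rows 1 and 3.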
You can do it with NOT EXISTS:
SELECT AVG(r.overall_rating), AVG(r.rooms_rating), AVG(r.service_rating), AVG(r.location_rating), AVG(r.value_rating)
FROM reviews r
WHERE NOT EXISTS (SELECT 1 FROM reviews r2 WHERE r2.user_id = r.user_id AND r2.review_date < r.review_date)
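A quick way to check which rows the NOT EXISTS filter keeps is to select the ids instead of the averages. This is only a sketch: it assumes SQLite as a stand-in for MySQL, a date column named review_date as in the question's query, and invented sample rows.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews "
    "(id INTEGER, user_id TEXT, overall_rating INTEGER, review_date TEXT)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?)",
    [
        (1, "matt21", 4, "2007-08-20"),
        (2, "matt21", 2, "2008-01-15"),
        (3, "anna", 5, "2008-01-15"),
        (4, "anna", 1, "2009-03-01"),
    ],
)

# Keep a review only if the same user has no earlier review.
first_visit_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT r.id FROM reviews r
        WHERE NOT EXISTS (
            SELECT 1 FROM reviews r2
            WHERE r2.user_id = r.user_id
              AND r2.review_date < r.review_date
        )
        ORDER BY r.id
        """
    )
]
print(first_visit_ids)
```

Note that if a user has two reviews on their earliest date, both are kept, because neither one is strictly earlier than the other.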
I've tried a few things but I've ended up confusing myself.
What I am trying to do is find the most recent records from a table and left join the first after a certain date.
An example might be
id | acct_no | created_at | some_other_column
1 | A0001 | 2017-05-21 00:00:00 | x
2 | A0001 | 2017-05-22 00:00:00 | y
3 | A0001 | 2017-05-22 00:00:00 | z
So ideally what I'd like is to find the latest record of each acct_no sorted by created_at DESC so that the results are grouped by unique account numbers, so from the above record it would be 3, but obviously there would be multiple different account numbers with records for different days.
Then, what I am trying to achieve is to join on the same table and find the first record with the same account number after a certain date.
For example, record 1 would be returned for a query joining on acct_no A0001 after or equal to 2017-05-21 00:00:00 because it is the first result after/equal to that date, so these are sorted by created_at ASC AND created_at >= "2017-05-21 00:00:00" (and possibly AND id != latest.id).
It seems quite straightforward but I just can't get it to work.
I only have my most recent attempt after discarding multiple different queries.
Here I am trying to solve the first part which is to select the most recent of each account number:
SELECT latest.* FROM my_table latest
JOIN (SELECT acct_no, MAX(created_at) FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no
but that still returns all rows rather than the most recent of each.
I did have something using a join on a subquery, but it took so long to run that I quit it before it finished, even though I have indexes on acct_no and created_at. I've also run into other problems where columns in the select are not in the group by. I know this check can be turned off, but I'm trying to find a way to perform the query that doesn't require that.
Just try a little edit to your initial query:
SELECT latest.* FROM my_table latest
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no AND latest.created_at = latest2.max_time
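One thing worth checking with the join on MAX(created_at) is what happens when two rows of the same account share the latest timestamp, as rows 2 and 3 do in the question's sample. A minimal sketch, assuming SQLite as a stand-in for MySQL and adding a made-up second account (A0002):

```python
import sqlite3

# Rows 1-3 come from the question; row 4 is a hypothetical extra account.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(id INTEGER, acct_no TEXT, created_at TEXT, some_other_column TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "A0001", "2017-05-21 00:00:00", "x"),
        (2, "A0001", "2017-05-22 00:00:00", "y"),
        (3, "A0001", "2017-05-22 00:00:00", "z"),
        (4, "A0002", "2017-05-20 00:00:00", "w"),
    ],
)

# Join each row to its account's maximum created_at.
latest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT latest.id FROM my_table latest
        JOIN (SELECT acct_no, MAX(created_at) AS max_time
              FROM my_table GROUP BY acct_no) latest2
          ON latest.acct_no = latest2.acct_no
         AND latest.created_at = latest2.max_time
        ORDER BY latest.id
        """
    )
]
print(latest_ids)
```

Both tied rows for A0001 come back, so if exactly one row per account is needed, a tie-breaker (for example the highest id) has to be added.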
Trying a different approach. Not sure about the performance impact. But hoping that avoiding self join and group by would be better in terms of performance.
SELECT * FROM (
SELECT mytable1.*, IF(@temp <> acct_no, 1, 0) selector, @temp := acct_no FROM `mytable1`
JOIN (SELECT @temp := '') a
ORDER BY acct_no, created_at DESC , id DESC
) b WHERE selector = 1
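If window functions are available (an assumption: MySQL 8.0 or later), the same "first row per account after sorting" idea can be written with ROW_NUMBER() instead of user variables. This is a different technique from the query above, sketched here against SQLite (3.25 or later) as a stand-in, with a made-up second account:

```python
import sqlite3

# Rows 1-3 come from the question; row 4 is a hypothetical extra account.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(id INTEGER, acct_no TEXT, created_at TEXT, some_other_column TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "A0001", "2017-05-21 00:00:00", "x"),
        (2, "A0001", "2017-05-22 00:00:00", "y"),
        (3, "A0001", "2017-05-22 00:00:00", "z"),
        (4, "A0002", "2017-05-20 00:00:00", "w"),
    ],
)

# Number the rows of each account, newest first; id DESC breaks ties.
ranked_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY acct_no
                       ORDER BY created_at DESC, id DESC
                   ) AS rn
            FROM my_table t
        ) ranked
        WHERE rn = 1
        ORDER BY id
        """
    )
]
print(ranked_ids)
```

Because id DESC is part of the ordering, the tie between rows 2 and 3 resolves to row 3, which is the row the question expects.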
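You need to get the id of the row that has the max created_at date for each account.
SELECT latest.* FROM my_table latest
JOIN (SELECT MAX(t.id) AS id FROM my_table t
JOIN (SELECT acct_no, MAX(created_at) AS max_time FROM my_table GROUP BY acct_no) m
ON t.acct_no = m.acct_no AND t.created_at = m.max_time
GROUP BY t.acct_no) latest2
ON latest.id = latest2.id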
I have a table of records (lets call them TV shows) with an air_date field.
I have another table of advertisements that are related by a show_id field.
I am trying to get the average number of advertisements per show for each date (with a where clause specifying the shows).
I currently have this:
SELECT
`air_date`,
(SELECT COUNT(*) FROM `commercial` WHERE `show_id` = `show`.`id`) AS `num_commercials`
FROM `show`
WHERE ...
This gives me a result like so:
air_date | num_commercials
2015-6-30 | 6
2015-6-30 | 3
2015-6-30 | 8
2015-6-30 | 2
2015-6-31 | 9
2015-6-31 | 4
When I do a GROUP BY, it only gives me one of the records, but I want the average for each air_date.
Not too sure I am clear on what you want - but does this do it
SELECT `air_date`,
AVG((SELECT COUNT(*) FROM `commercial` WHERE `show_id` = `show`.`id`)) AS `num_commercials`
FROM `show`
WHERE .....
GROUP BY `air_date`
(Note that the double parentheses in the AVG call are required: the inner pair delimits the subquery.)
You can use a sub-query to select count of commercials by air_date/show, then use an outer query to select the average commercials count per air_date.
Something like this should work:
select air_date, avg(num_commercials)
from
(
select `show`.air_date as air_date,
`show`.id as show_id,
count(*) as num_commercials
from `show`
inner join commercial on commercial.show_id = `show`.id
where ...
group by `show`.air_date, `show`.id
) sub
group by air_date
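One caveat with the inner join: shows that have no commercials at all drop out of the subquery, which pushes the average up. The sketch below compares an inner join with a left join plus COUNT(commercial.id). It assumes SQLite as a stand-in for MySQL, and the tables, rows and the avg_per_date helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `show` (id INTEGER, air_date TEXT)")
conn.execute("CREATE TABLE commercial (id INTEGER, show_id INTEGER)")
conn.executemany(
    "INSERT INTO `show` VALUES (?, ?)",
    [(1, "2015-06-29"), (2, "2015-06-29"), (3, "2015-06-30"), (4, "2015-06-30")],
)
# Show 1 has 3 commercials, show 2 has 1, show 3 has none, show 4 has 2.
conn.executemany(
    "INSERT INTO commercial VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 4), (6, 4)],
)


def avg_per_date(join_kind):
    """Average commercials per show for each air_date (hypothetical helper)."""
    query = f"""
        SELECT air_date, AVG(num_commercials)
        FROM (
            SELECT s.air_date AS air_date,
                   s.id AS show_id,
                   COUNT(c.id) AS num_commercials  -- counts 0 for unmatched shows
            FROM `show` s
            {join_kind} JOIN commercial c ON c.show_id = s.id
            GROUP BY s.air_date, s.id
        ) sub
        GROUP BY air_date
        ORDER BY air_date
    """
    return conn.execute(query).fetchall()


inner_result = avg_per_date("INNER")
left_result = avg_per_date("LEFT")
print(inner_result)
print(left_result)
```

For 2015-06-30 the inner join ignores show 3 and reports an average of 2.0, while the left join counts it as zero and reports 1.0. Which one is right depends on whether shows without commercials should count.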
Let's say I have a schools table (cols = "ids (int)") and a users table (cols = "id (int), school_id (int), created_at (datetime)").
I have a list of school ids saved in <school_ids>. I want to group those schools by the yearweek(users.created_at) value for the user at that school with the earliest created_at value, and for each group list the value of yearweek(users.created_at) and the number of schools.
In other words, I want to find the earliest-created user for each school, and then group the schools by the yearweek() result for that created_at date, so that I effectively have the number of schools that signed up their first user in each week.
So, I want results like
| 201301 | 22 | #meaning there are 22 schools where the earliest created_at user
#has yearweek(created_at) = "201301"
| 201302 | 5 | #meaning there are 5 schools where the earliest created_at user
#has yearweek(created_at) = "201302"
etc
As a sanity check, the total of all rows in the second column should equal the size of <school_ids>, ie the number of ids in school_ids.
Does that make sense? I can't quite figure out how to get this without doing several queries and storing values in between. I'm sure there's a one-liner. Thanks! max
You could use a subquery that returns the minimum created_at field for every school_id, and then you can group by yearweek and do the count:
SELECT
yearweek(u.min_created_at) AS yearweek_first_user,
COUNT(*)
FROM
(
SELECT school_id, MIN(created_at) AS min_created_at
FROM users
GROUP BY school_id
) u
GROUP BY
yearweek(u.min_created_at)
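Here is a rough way to try the idea out, including the restriction to a given list of school ids from the question. It is only a sketch: SQLite has no YEARWEEK(), so strftime('%Y%W', ...) is used as a stand-in (its week numbering does not match MySQL's in general), and the sample users and the school_ids list are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, school_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, 1, "2013-01-08 10:00:00"),  # school 1, first user
        (2, 1, "2013-02-01 10:00:00"),  # school 1, later user
        (3, 2, "2013-01-09 10:00:00"),  # school 2, first user
        (4, 3, "2013-01-15 10:00:00"),  # school 3, first user
        (5, 4, "2013-01-02 10:00:00"),  # school 4, not in the list below
    ],
)

school_ids = [1, 2, 3]  # hypothetical stand-in for <school_ids>
placeholders = ", ".join("?" for _ in school_ids)

# Inner query: earliest user per school. Outer query: schools per week.
rows = conn.execute(
    f"""
    SELECT strftime('%Y%W', u.min_created_at) AS yearweek_first_user,
           COUNT(*) AS num_schools
    FROM (
        SELECT school_id, MIN(created_at) AS min_created_at
        FROM users
        WHERE school_id IN ({placeholders})
        GROUP BY school_id
    ) u
    GROUP BY yearweek_first_user
    ORDER BY yearweek_first_user
    """,
    school_ids,
).fetchall()
print(rows)
```

The sanity check from the question holds here: the counts add up to the number of ids in the list, as long as every listed school has at least one user.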
I'm having trouble writing this query. I have two tables, vote_table and click_table. In vote_table I have two fields, id and date; the date format is "12/30/11 : 14:28:36". In click_table I also have two fields, id and date; the date format is "12.30.11".
The ids occur multiple times in both tables. I want to produce a result with 3 fields: id, votes, clicks. The id column should have distinct id values, the votes column should have the number of times that id appears in vote_table with a date matching 12/30/11%, and the clicks column should have the number of times that id appears in click_table with the date 12.30.11, so something like this:
ID | VOTES | CLICKS
001 | 24 | 50
002 | 30 | 45
Assuming that the types of the 'date' columns are actually either DATE or DATETIME (rather than, say, VARCHAR), the required operation is fairly straightforward:
SELECT v.id, v.votes, c.clicks
FROM (SELECT id, COUNT(*) AS votes
FROM vote_table AS v1
WHERE DATE(v1.`date`) = TIMESTAMP('2011-12-30')
GROUP BY v1.id) AS v
JOIN (SELECT id, COUNT(*) AS clicks
FROM click_table AS c1
WHERE DATE(c1.`date`) = TIMESTAMP('2011-12-30')
GROUP BY c1.id) AS c
ON v.id = c.id
ORDER BY v.id;
Note that this only shows ID values for which there is at least one vote and at least one click on the given day. If you need to see all the ID values which either voted or clicked or both, then you have to do more work.
If you have to normalize the dates because they are VARCHAR columns, the WHERE clauses become correspondingly more complex.
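As a sketch of that "more work" case, the following assumes the dates really are strings in the two formats given in the question, and that every id with a vote or a click on the day should appear. It collects the ids with a UNION and left-joins the two counts onto it. SQLite stands in for MySQL and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote_table (id TEXT, `date` TEXT)")
conn.execute("CREATE TABLE click_table (id TEXT, `date` TEXT)")
conn.executemany(
    "INSERT INTO vote_table VALUES (?, ?)",
    [
        ("001", "12/30/11 : 14:28:36"),
        ("001", "12/30/11 : 15:00:00"),
        ("002", "12/30/11 : 09:00:00"),
        ("002", "12/29/11 : 09:00:00"),  # different day, not counted
    ],
)
conn.executemany(
    "INSERT INTO click_table VALUES (?, ?)",
    [
        ("001", "12.30.11"),
        ("003", "12.30.11"),
        ("003", "12.30.11"),
        ("002", "12.29.11"),  # different day, not counted
    ],
)

totals = conn.execute(
    """
    SELECT ids.id, COALESCE(v.votes, 0), COALESCE(c.clicks, 0)
    FROM (SELECT id FROM vote_table WHERE `date` LIKE '12/30/11%'
          UNION
          SELECT id FROM click_table WHERE `date` = '12.30.11') ids
    LEFT JOIN (SELECT id, COUNT(*) AS votes FROM vote_table
               WHERE `date` LIKE '12/30/11%' GROUP BY id) v ON v.id = ids.id
    LEFT JOIN (SELECT id, COUNT(*) AS clicks FROM click_table
               WHERE `date` = '12.30.11' GROUP BY id) c ON c.id = ids.id
    ORDER BY ids.id
    """
).fetchall()
print(totals)
```

Ids that only voted or only clicked show up with a 0 in the other column instead of disappearing.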
I am storing all visits to my site in a table, I store the date, the page visited and a session id.
In theory, I can group somebody by their session id and this counts as 1 visit.
What I'd like to do however is go through the table and get the total of visits for each date. So it would group by the session id, and then group by the date.
ie:
SELECT DATE(added) as date, COUNT(*) FROM visits GROUP BY sessionID, date
This doesn't work, as it returns the number of visits for each combination of session id and date.
My table structure looks a bit like this:
----------------------------------
| id | added | page | sessionid
----------------------------------
Any ideas?
My query gives me results that look like this:
2010-11-24 | 2
2010-11-24 | 14
2010-11-24 | 17
2010-11-24 | 1
While I'd be hoping for something more like a total of all those under the 1 date, ie:
2010-11-24 | 34
Each date contains the time which will be different for each request. If you use DATE in the GROUP BY clause just like you did in the SELECT clause, that will solve your problem.
By grouping by sessionID, it's going to create a row for every session. If instead of grouping by sessionID, you use COUNT(DISTINCT sessionID), that will count the distinct number of session IDs for that date.
SELECT DATE(added) as date, COUNT(DISTINCT sessionID) as sessions FROM visits GROUP BY DATE(added)
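A small check of the COUNT(DISTINCT ...) query, assuming SQLite as a stand-in for MySQL and a few invented page hits:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER, added TEXT, page TEXT, sessionid TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [
        (1, "2010-11-24 09:00:00", "/", "a"),
        (2, "2010-11-24 09:01:00", "/about", "a"),
        (3, "2010-11-24 09:05:00", "/contact", "a"),
        (4, "2010-11-24 12:30:00", "/", "b"),
        (5, "2010-11-25 08:00:00", "/", "a"),
    ],
)

# One row per calendar day; each session is counted once per day.
sessions_per_day = conn.execute(
    """
    SELECT DATE(added) AS day, COUNT(DISTINCT sessionid) AS sessions
    FROM visits
    GROUP BY DATE(added)
    ORDER BY day
    """
).fetchall()
print(sessions_per_day)
```

Session "a" has three page hits on 2010-11-24 but counts as one visit that day. If the total number of page hits per day is wanted instead, COUNT(*) with the same GROUP BY gives that.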