SQL Returning Double Rows - mysql

I have 2 tables:
users (user_id, fname, lname, department) and clock (id, punchType, punchTime, comment, user_id).
The SQL query below pulls 2 rows for some records and I can't figure out why. Any insight would be helpful.
SELECT user.user_id, user.fname, user.lname, user.department, punchType, punchTime, comment
FROM user
INNER JOIN (
SELECT *
FROM clock
WHERE punchTime IN (
SELECT MAX(punchTime)
FROM clock
GROUP BY user_id
)
) AS a
ON user.user_id = a.user_id

This happens because different users can have the same punch time: one user's punchTime could be another user's maximum punchTime. Here is one fix:
SELECT *
FROM clock
WHERE (user_id, punchTime) IN (
SELECT user_id, MAX(punchTime)
FROM clock
GROUP BY user_id
);
This could also be fixed with correlated subqueries and other methods.
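To see the difference concretely, here is a small sketch that runs the original filter, the row-value fix and a correlated-subquery variant through Python's built-in sqlite3 module as a stand-in for MySQL. Assumptions: SQLite (3.15 or later accepts the same `(a, b) IN (SELECT ...)` syntax) is not MySQL, and the sample rows are hypothetical, chosen so that user 1's morning punch equals user 2's maximum punch.

```python
import sqlite3

# Hypothetical sample data; ISO-8601 text timestamps so MAX() orders them correctly.
ROWS = [
    (1, "2021-01-01 08:00"),  # user 1, early punch: equals user 2's maximum
    (1, "2021-01-01 17:00"),  # user 1, latest punch
    (2, "2021-01-01 08:00"),  # user 2, only (and latest) punch
]


def make_db():
    """Build an in-memory SQLite stand-in for the clock table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clock (id INTEGER PRIMARY KEY, user_id INTEGER, punchTime TEXT)")
    con.executemany("INSERT INTO clock (user_id, punchTime) VALUES (?, ?)", ROWS)
    return con


def time_only(con):
    # Original filter: a row survives if its time equals ANY user's maximum.
    return con.execute(
        "SELECT user_id, punchTime FROM clock "
        "WHERE punchTime IN (SELECT MAX(punchTime) FROM clock GROUP BY user_id) "
        "ORDER BY user_id, punchTime"
    ).fetchall()


def user_and_time(con):
    # Fix: the (user_id, punchTime) pair must match that user's own maximum.
    return con.execute(
        "SELECT user_id, punchTime FROM clock "
        "WHERE (user_id, punchTime) IN "
        "(SELECT user_id, MAX(punchTime) FROM clock GROUP BY user_id) "
        "ORDER BY user_id, punchTime"
    ).fetchall()


def correlated(con):
    # Alternative: compare each row with its own user's MAX via a correlated subquery.
    return con.execute(
        "SELECT c.user_id, c.punchTime FROM clock c "
        "WHERE c.punchTime = (SELECT MAX(c2.punchTime) FROM clock c2 "
        "WHERE c2.user_id = c.user_id) "
        "ORDER BY c.user_id, c.punchTime"
    ).fetchall()


con = make_db()
print(time_only(con))      # user 1 appears twice
print(user_and_time(con))  # one row per user
print(correlated(con))     # same rows as user_and_time
```

Both fixes key the comparison on the user as well as the time, which is what removes the cross-user match.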

You will notice that when you filter by punchTime alone, you can end up with duplicate records per user. A row stays in the set if its punchTime matches any user's max punch time. So if one of a user's punch times matches someone else's max time, or the user has two or more records at their own max punch time, you will be joining multiple clock rows with the same user_id to the user table.
For example:
SELECT
user_id,
MAX(punchTime) as real_max_time,
COUNT(1) as dupe_count,
COUNT(DISTINCT(punchTime)) as unique_punchTimes,
COUNT(DISTINCT(punchType)) as unique_punchTypes
FROM clock
WHERE punchTime IN (
SELECT MAX(punchTime)
FROM clock
GROUP BY user_id
)
GROUP BY
user_id
HAVING COUNT(1) > 1
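A sketch of that diagnostic run against hypothetical rows in an in-memory SQLite database (a stand-in for MySQL). The data is made up to trigger both causes: user 1 has an earlier punch equal to user 2's maximum, and user 3 has two punches at the same maximum time.

```python
import sqlite3

# Hypothetical rows illustrating both causes of duplication.
ROWS = [
    (1, "in", "2021-01-01 08:00"),   # equals user 2's max punch time
    (1, "out", "2021-01-01 17:00"),  # user 1's own max
    (2, "in", "2021-01-01 08:00"),   # user 2's only punch
    (3, "in", "2021-01-02 09:00"),   # user 3 has two punches
    (3, "out", "2021-01-02 09:00"),  # at the same (max) time
]

DIAGNOSTIC = """
SELECT
  user_id,
  MAX(punchTime) AS real_max_time,
  COUNT(1) AS dupe_count,
  COUNT(DISTINCT punchTime) AS unique_punchTimes,
  COUNT(DISTINCT punchType) AS unique_punchTypes
FROM clock
WHERE punchTime IN (SELECT MAX(punchTime) FROM clock GROUP BY user_id)
GROUP BY user_id
HAVING COUNT(1) > 1
ORDER BY user_id
"""


def find_dupes():
    """Run the diagnostic against an in-memory SQLite copy of clock."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clock (user_id INTEGER, punchType TEXT, punchTime TEXT)")
    con.executemany("INSERT INTO clock VALUES (?, ?, ?)", ROWS)
    return con.execute(DIAGNOSTIC).fetchall()


for row in find_dupes():
    print(row)
```

A unique_punchTimes above 1 points at a cross-user match; a unique_punchTimes of 1 with dupe_count above 1 points at ties on the user's own maximum.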
Otherwise, you could have duplicate user_ids in your user table. Maybe one user has been in multiple departments, or changed names?
Find duplicated user_ids with the following:
SELECT
user_id,
COUNT(1) as duplicate_user_count
FROM user
GROUP BY user_id
HAVING COUNT(1) >1
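Here is a minimal SQLite sketch (hypothetical data, standing in for MySQL) of how a duplicated user_id multiplies join rows, and how the check above catches it. User 1 appears twice in user, so their single clock row is joined twice.

```python
import sqlite3


def build():
    """In-memory user/clock tables with one duplicated user_id (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user (user_id INTEGER, fname TEXT, lname TEXT, department TEXT)")
    con.execute("CREATE TABLE clock (user_id INTEGER, punchTime TEXT)")
    con.executemany(
        "INSERT INTO user VALUES (?, ?, ?, ?)",
        [
            (1, "Ann", "Lee", "Sales"),
            (1, "Ann", "Lee", "Support"),  # same user_id, second department
            (2, "Bob", "Ray", "Sales"),
        ],
    )
    con.executemany(
        "INSERT INTO clock VALUES (?, ?)",
        [(1, "2021-01-01 17:00"), (2, "2021-01-01 16:00")],
    )
    return con


def duplicate_user_ids(con):
    # The check from the answer: user_ids that occur more than once in user.
    return con.execute(
        "SELECT user_id, COUNT(1) AS duplicate_user_count FROM user "
        "GROUP BY user_id HAVING COUNT(1) > 1"
    ).fetchall()


def joined_rows_per_user(con):
    # Each duplicate user row is paired with every matching clock row.
    return con.execute(
        "SELECT u.user_id, COUNT(*) FROM user u "
        "JOIN clock c ON u.user_id = c.user_id "
        "GROUP BY u.user_id ORDER BY u.user_id"
    ).fetchall()


con = build()
print(duplicate_user_ids(con))
print(joined_rows_per_user(con))
```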
Putting it all back together: find where the duplication is happening, then add the other columns you care about once it is resolved:
SELECT
users.user_id,
users.dupe_users,
max_time.distinct_punchtimes,
max_time.distinct_punchtypes,
max_time.max_punchTime
FROM (
SELECT
user_id,
COUNT(1) as dupe_users
FROM user
GROUP BY
user_id
) as users
INNER JOIN (
SELECT
user_id,
COUNT(1) as clock_rows,
COUNT(DISTINCT(punchTime)) as distinct_punchtimes,
COUNT(DISTINCT(punchType)) as distinct_punchtypes,
MAX(punchTime) max_punchTime
FROM clock
GROUP BY user_id
) as max_time
ON users.user_id = max_time.user_id

Related

SQL query with most recent name and total count

I already have a table, "table_one", set up on phpMyAdmin that has the following columns:
USER_ID: A discord user ID (message.author.id)
USER_NAME: A discord username (message.author.name)
USER_NICKNAME: The user's display name on the server (message.author.display_name)
TIMESTAMP: A datetime timestamp when the message was entered (message.created_at)
MESSAGE_CONTENT: A cleaned input keyword; for this example, consider "apple" or "orange" as the two target keywords.
What I'd like as a result is a view or query that returns a table with the following:
The user's most recent display name (USER_NICKNAME), based off the most recent timestamp
The total number of times a user has entered a specific keyword, for example counting only "apple" and not instances of "orange"
My intention is that if a user entered a keyword 10 times, then changed their server nickname and entered the same keyword 10 more times, the result would show their most recent nickname and that they entered the keyword 20 times in total.
This is the closest I have gotten to my desired result so far. The query correctly groups instances where a user has changed their nickname, based on the static Discord ID, but I would like it to keep this behavior while showing the most recent USER_NICKNAME instead of the USER_ID:
SELECT USER_ID, COUNT(USER_ID)
FROM table_one
WHERE MESSAGE_CONTENT = 'apple'
GROUP BY USER_ID
I don't think there is an uncomplicated way to do this. In Postgres, I would use SELECT DISTINCT ON to get the nickname, but MySQL has no DISTINCT ON, so I believe you are limited to JOINing grouped queries (or, in MySQL 8.0 and later, using window functions such as ROW_NUMBER()).
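For comparison with DISTINCT ON, here is a sketch of the window-function route: number each user's rows newest first with ROW_NUMBER() and keep row 1. It runs through Python's sqlite3 as a stand-in for MySQL 8.0; assumptions are SQLite 3.25 or later (for window functions) and hypothetical sample rows. This is an alternative technique, not the JOIN-of-grouped-queries approach used in this answer.

```python
import sqlite3

# Hypothetical messages: user 1 changes nickname between messages.
ROWS = [
    (1, "old_nick", "2021-01-01 10:00", "apple"),
    (1, "new_nick", "2021-02-01 10:00", "apple"),
    (2, "bob", "2021-01-05 10:00", "orange"),
]

LAST_NICKNAME = """
SELECT USER_ID, USER_NICKNAME
FROM (
  SELECT USER_ID, USER_NICKNAME,
         ROW_NUMBER() OVER (PARTITION BY USER_ID ORDER BY TIMESTAMP DESC) AS rn
  FROM table_one
) ranked
WHERE rn = 1  -- keep only the newest row per user, like DISTINCT ON
ORDER BY USER_ID
"""


def last_nicknames():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_one (USER_ID INTEGER, USER_NICKNAME TEXT, "
        "TIMESTAMP TEXT, MESSAGE_CONTENT TEXT)"
    )
    con.executemany("INSERT INTO table_one VALUES (?, ?, ?, ?)", ROWS)
    return con.execute(LAST_NICKNAME).fetchall()


print(last_nicknames())
```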
I would combine two queries (or three, depending how you look at it).
First, to get the keyword count, use your original query:
SELECT USER_ID, COUNT(USER_ID) as apple_count
FROM table_one
WHERE MESSAGE_CONTENT = 'apple'
GROUP BY USER_ID;
Second, to get the last nickname, group by USER_ID without subsetting rows and use the result as a subquery in a JOIN statement:
SELECT a.USER_ID, a.USER_NICKNAME AS last_nickname
FROM table_one a
INNER JOIN
(SELECT USER_ID, MAX(TIMESTAMP) AS max_ts
FROM table_one
GROUP BY USER_ID) b
ON a.USER_ID = b.USER_ID AND TIMESTAMP = max_ts
I would then JOIN these two, using a WITH clause (common table expressions, available in MySQL 8.0 and later) to make it clearer what's going on:
WITH
nicknames AS
(SELECT a.USER_ID, a.USER_NICKNAME AS last_nickname
FROM table_one a
INNER JOIN
(SELECT USER_ID, MAX(TIMESTAMP) AS max_ts
FROM table_one
GROUP BY USER_ID) b
ON a.USER_ID = b.USER_ID AND TIMESTAMP = max_ts),
counts AS
(SELECT USER_ID, COUNT(USER_ID) AS apple_count
FROM table_one
WHERE MESSAGE_CONTENT = 'apple'
GROUP BY USER_ID)
SELECT nicknames.USER_ID, nicknames.last_nickname, counts.apple_count
FROM nicknames
INNER JOIN counts
ON nicknames.USER_ID = counts.USER_ID;
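A runnable sketch of the combined query against an in-memory SQLite table (a stand-in for MySQL; the rows are hypothetical). User 1 enters "apple" twice, changes nickname, enters it twice more, and then sends an "orange"; user 3 never enters "apple" and so drops out of the INNER JOIN.

```python
import sqlite3

ROWS = [
    (1, "old_nick", "2021-01-01 10:00", "apple"),
    (1, "old_nick", "2021-01-02 10:00", "apple"),
    (1, "new_nick", "2021-02-01 10:00", "apple"),
    (1, "new_nick", "2021-02-02 10:00", "apple"),
    (1, "new_nick", "2021-02-03 10:00", "orange"),  # latest row, not counted
    (2, "bob", "2021-01-05 10:00", "apple"),
    (3, "carol", "2021-01-06 10:00", "orange"),     # no apples at all
]

QUERY = """
WITH
nicknames AS (
  SELECT a.USER_ID, a.USER_NICKNAME AS last_nickname
  FROM table_one a
  INNER JOIN (SELECT USER_ID, MAX(TIMESTAMP) AS max_ts
              FROM table_one GROUP BY USER_ID) b
    ON a.USER_ID = b.USER_ID AND a.TIMESTAMP = b.max_ts
),
counts AS (
  SELECT USER_ID, COUNT(USER_ID) AS apple_count
  FROM table_one
  WHERE MESSAGE_CONTENT = 'apple'
  GROUP BY USER_ID
)
SELECT nicknames.USER_ID, nicknames.last_nickname, counts.apple_count
FROM nicknames
INNER JOIN counts ON nicknames.USER_ID = counts.USER_ID
ORDER BY nicknames.USER_ID
"""


def nickname_counts():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_one (USER_ID INTEGER, USER_NICKNAME TEXT, "
        "TIMESTAMP TEXT, MESSAGE_CONTENT TEXT)"
    )
    con.executemany("INSERT INTO table_one VALUES (?, ?, ?, ?)", ROWS)
    return con.execute(QUERY).fetchall()


print(nickname_counts())
```

Switching the final INNER JOIN to a LEFT JOIN (with COALESCE on apple_count) would keep users like user 3 with a count of 0.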

Mysql get sum of two tables columns grouped

I have 3 tables:
I would like to select the difference of the total gain and total spent per user. So my hypothetical table could be:
I tried this:
SELECT g.total - s.total AS quantity, id FROM
(SELECT SUM(quantity) AS total FROM gain GROUP BY user) AS g,
(SELECT SUM(quantity) AS total FROM spent GROUP BY user) AS s, users
But it doesn't work...
You need to use the users table as the base table so that all users are considered, and then LEFT JOIN to the subqueries computing total spent and total gain. This is because some users may not have any entries in the gain or spent table. COALESCE() then turns the NULL from a non-matching row into 0.
SELECT
u.id AS user,
COALESCE(tot_gain, 0) - COALESCE(tot_spent, 0) AS balance
FROM users AS u
LEFT JOIN (SELECT user, SUM(quantity) as tot_spent
FROM spent
GROUP BY user) AS s ON s.user = u.id
LEFT JOIN (SELECT user, SUM(quantity) as tot_gain
FROM gain
GROUP BY user) AS g ON g.user = u.id
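A quick check of the LEFT JOIN plus COALESCE query in an in-memory SQLite database (a stand-in for MySQL; the table contents are hypothetical, since the original tables are not shown). User 2 has no spent rows and user 3 has no gain rows, yet both still get a balance.

```python
import sqlite3


def build():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE gain  (user INTEGER, quantity INTEGER);
        CREATE TABLE spent (user INTEGER, quantity INTEGER);
        INSERT INTO users VALUES (1), (2), (3);
        INSERT INTO gain  VALUES (1, 10), (1, 5), (2, 7);  -- user 3 gained nothing
        INSERT INTO spent VALUES (1, 4), (3, 2);           -- user 2 spent nothing
        """
    )
    return con


BALANCE = """
SELECT u.id AS user,
       COALESCE(tot_gain, 0) - COALESCE(tot_spent, 0) AS balance
FROM users AS u
LEFT JOIN (SELECT user, SUM(quantity) AS tot_spent
           FROM spent GROUP BY user) AS s ON s.user = u.id
LEFT JOIN (SELECT user, SUM(quantity) AS tot_gain
           FROM gain GROUP BY user) AS g ON g.user = u.id
ORDER BY u.id
"""


def balances():
    return build().execute(BALANCE).fetchall()


print(balances())
```

Aggregating inside each subquery before joining also avoids the row multiplication you would get by joining gain and spent directly.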
Madhur's solution is fine. An alternative is union all and group by:
select user, sum(gain) as gain, sum(spent) as spent, sum(gain) - sum(spent) as balance
from ((select user, quantity as gain, 0 as spent
from gain
) union all
(select user, 0, quantity as spent
from spent
)
) u
group by user;
You can join to the users table if you want users that appear in neither table, or if you need additional columns. Otherwise, that join may not be necessary.
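A sketch of the UNION ALL approach on the same hypothetical data, run in SQLite as a stand-in for MySQL. One assumption to flag: SQLite does not accept parentheses around the individual SELECTs of a compound query, so they are dropped here; MySQL accepts either form.

```python
import sqlite3


def build():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE gain  (user INTEGER, quantity INTEGER);
        CREATE TABLE spent (user INTEGER, quantity INTEGER);
        INSERT INTO gain  VALUES (1, 10), (1, 5), (2, 7);
        INSERT INTO spent VALUES (1, 4), (3, 2);
        """
    )
    return con


UNION_TOTALS = """
SELECT user, SUM(gain) AS gain, SUM(spent) AS spent,
       SUM(gain) - SUM(spent) AS balance
FROM (SELECT user, quantity AS gain, 0 AS spent FROM gain  -- gains in column 2
      UNION ALL
      SELECT user, 0, quantity FROM spent) u               -- spending in column 3
GROUP BY user
ORDER BY user
"""


def union_totals():
    return build().execute(UNION_TOTALS).fetchall()


print(union_totals())
```

Because every row carries a 0 in the column it does not fill, no COALESCE is needed; a user with no rows in either table simply does not appear.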

SQL: Return only the most recent activity per customer

I have two tables with linked data. There are activity updates against individual customers (customer_id). I want to return the most recent activity per customer:
contact:
customer_id (auto_increment)
last_name
first_name
phone_work
activity:
activity_id (auto_increment)
data_item_id
entered_by
date_created
notes
I can return the entire set of activities:
SELECT last_name, first_name, date_created, notes FROM contact JOIN activity ON contact.customer_id=activity.data_item_id;
...but I only want the most recent activity per customer. If I use unique, it seems to return the first activity per customer and not the most recent. I'm sure it's extremely simple, but I haven't found it yet. Thoughts?
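You can use MAX(date_created) to choose the correct row from the joined table: compute each customer's latest date in a subquery, then join back on it. Without that, MySQL returns every joined row; and with a plain GROUP BY, non-aggregated columns such as notes come from an arbitrary row in each group (or the query is rejected when ONLY_FULL_GROUP_BY is enabled), not necessarily the most recent one.
So the query would be:
SELECT a.date_created AS date, c.last_name, c.first_name, a.notes
FROM contact c JOIN activity a ON c.customer_id = a.data_item_id
JOIN (SELECT data_item_id, MAX(date_created) AS max_date FROM activity GROUP BY data_item_id) m
ON m.data_item_id = a.data_item_id AND m.max_date = a.date_created
ORDER BY c.customer_id ASC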
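A runnable sketch of using MAX(date_created) to pick the row: each customer's latest date is computed in a subquery and the activity table is joined back on it, so notes comes from the same row as the date. It uses an in-memory SQLite database as a stand-in for MySQL, with hypothetical contacts and activities.

```python
import sqlite3


def build():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE contact (customer_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
        CREATE TABLE activity (activity_id INTEGER PRIMARY KEY, data_item_id INTEGER,
                               date_created TEXT, notes TEXT);
        INSERT INTO contact VALUES (1, 'Smith', 'John'), (2, 'Doe', 'Jane');
        INSERT INTO activity (data_item_id, date_created, notes) VALUES
            (1, '2021-01-01', 'call'),
            (1, '2021-03-01', 'email'),
            (2, '2021-02-01', 'meeting');
        """
    )
    return con


LATEST = """
SELECT a.date_created AS date, c.last_name, c.first_name, a.notes
FROM contact c
JOIN activity a ON c.customer_id = a.data_item_id
JOIN (SELECT data_item_id, MAX(date_created) AS max_date
      FROM activity GROUP BY data_item_id) m          -- latest date per customer
  ON m.data_item_id = a.data_item_id AND m.max_date = a.date_created
ORDER BY c.customer_id
"""


def latest_activity():
    return build().execute(LATEST).fetchall()


print(latest_activity())
```

If a customer has two activities with the same latest date_created, both rows are returned.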
WITH TempTbl AS (
SELECT data_item_id, MAX(activity_id) AS activity_id -- activity_id is auto_increment, so the highest one should be the most recent record
FROM activity
GROUP BY data_item_id
)
SELECT c.last_name, c.first_name, a.date_created, a.notes
FROM TempTbl t
JOIN activity a ON a.activity_id = t.activity_id
JOIN contact c ON c.customer_id = a.data_item_id
If you still get multiple records per customer, consider using RANK() OVER (PARTITION BY ...) to select the most recent record.
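A sketch of the RANK() OVER (PARTITION BY ...) idea, partitioning by customer and ranking activities newest first. It runs in SQLite 3.25 or later as a stand-in for MySQL 8.0; the data is hypothetical. Note that RANK() gives tied rows the same rank, so ties on date_created still yield several rows; ROW_NUMBER() would keep exactly one per customer, chosen arbitrarily among the ties.

```python
import sqlite3


def build():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE contact (customer_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
        CREATE TABLE activity (activity_id INTEGER PRIMARY KEY, data_item_id INTEGER,
                               date_created TEXT, notes TEXT);
        INSERT INTO contact VALUES (1, 'Smith', 'John'), (2, 'Doe', 'Jane');
        INSERT INTO activity (data_item_id, date_created, notes) VALUES
            (1, '2021-01-01', 'call'),
            (1, '2021-03-01', 'email'),
            (2, '2021-02-01', 'meeting');
        """
    )
    return con


RANKED = """
SELECT last_name, first_name, date_created, notes
FROM (
  SELECT c.customer_id, c.last_name, c.first_name, a.date_created, a.notes,
         RANK() OVER (PARTITION BY a.data_item_id
                      ORDER BY a.date_created DESC) AS rnk  -- 1 = newest
  FROM contact c JOIN activity a ON c.customer_id = a.data_item_id
) ranked
WHERE rnk = 1
ORDER BY customer_id
"""


def most_recent():
    return build().execute(RANKED).fetchall()


print(most_recent())
```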
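One more solution (note that it relies on MySQL keeping, for each group, the first row in the derived table's order, which is not guaranteed: the optimizer may ignore an ORDER BY inside a derived table, and with ONLY_FULL_GROUP_BY enabled the query is rejected):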
select temp.* from
(
SELECT c.last_name last, c.first_name first, a.date_created, a.notes
FROM contact c JOIN activity a ON c.customer_id=a.data_item_id
order by a.date_created desc
) temp
group by temp.last, temp.first;

Select max with group by on other table

I have 2 tables: users_item, with the 2 columns user_id and item_id, and item_rates, with the 2 columns rate_item_id and rate.
They are connected by a foreign key on users_item.item_id = item_rates.rate_item_id. I need to select the item_ids with the max rate for a given range of users. One user can have many items.
My select is:
SELECT MAX(rate), rate_item_id, user_id
FROM users_item JOIN item_rates ON item_id = rate_item_id
AND user_id in (2706,2979) GROUP BY user_id;
but the item IDs it returns do not correspond to the max rate. In the given example, the select should return just 2 rows. Can someone help with this? Thanks in advance.
OK, I think I found what you want. Try this:
SELECT users_item.user_id, item_id, maxrate
FROM users_item
JOIN item_rates ON users_item.item_id = item_rates.rate_item_id
JOIN (SELECT MAX(rate) AS maxrate, user_id
FROM users_item JOIN item_rates ON item_id = rate_item_id
WHERE user_id in (1,2)
GROUP BY user_id) AS maxis
ON users_item.USER_ID = maxis.USER_ID
WHERE item_rates.rate = maxrate
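You need the subquery because MAX(rate) on its own does not tell you which item has that rate: the subquery finds each user's maximum rate, and joining back on it picks out the matching item. If several items owned by the same user share that maximum rate, all of them are returned.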
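A runnable check of the join-back approach using the user IDs from the question (2706 and 2979) and hypothetical items and rates, in SQLite as a stand-in for MySQL. User 1 also has an item, but falls outside the IN list and does not appear.

```python
import sqlite3


def build():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE users_item (user_id INTEGER, item_id INTEGER);
        CREATE TABLE item_rates (rate_item_id INTEGER PRIMARY KEY, rate INTEGER);
        INSERT INTO users_item VALUES (2706, 10), (2706, 11), (2979, 12), (2979, 13), (1, 14);
        INSERT INTO item_rates VALUES (10, 3), (11, 5), (12, 4), (13, 2), (14, 9);
        """
    )
    return con


MAX_RATE_ITEMS = """
SELECT ui.user_id, ui.item_id, maxis.maxrate
FROM users_item ui
JOIN item_rates ir ON ui.item_id = ir.rate_item_id
JOIN (SELECT user_id, MAX(rate) AS maxrate          -- best rate per user
      FROM users_item JOIN item_rates ON item_id = rate_item_id
      WHERE user_id IN (2706, 2979)
      GROUP BY user_id) maxis
  ON ui.user_id = maxis.user_id
WHERE ir.rate = maxis.maxrate                         -- keep the item(s) with that rate
ORDER BY ui.user_id, ui.item_id
"""


def max_rate_items():
    return build().execute(MAX_RATE_ITEMS).fetchall()


print(max_rate_items())
```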
Try grouping by user_id, rate_item_id
I'm surprised that MySQL doesn't give you an error; Oracle would... (MySQL does reject it when ONLY_FULL_GROUP_BY is enabled, which is the default since MySQL 5.7.5.)

Get the average time from subscription until payment

I have two tables. The first is subscribers, and each subscriber is also assigned to a category. The second table holds the payments that subscribers made. I want to know the average time between the time of subscription and a subscriber's FIRST payment (they can make multiple).
Here is a piece of SQL, but it doesn't do what I want just yet - although I have the feeling I'm close ;)
SELECT category,
AVG(TIMESTAMPDIFF(HOUR, subs.timestamp, MIN(payments.timestamp)))
FROM subs
JOIN payments ON (payments.user_id = subs.user_id)
GROUP BY category
Now I get "Invalid use of group function" - because of the MIN function, so that ain't right. What do I have to do now? Thanks in advance!
SELECT category,
AVG(TIMESTAMPDIFF(HOUR, subs.timestamp, p.timestamp))
FROM subs
JOIN ( SELECT user_id
, min(timestamp) timestamp
FROM payments
GROUP BY user_id
) p
ON p.user_id = subs.user_id
GROUP BY category
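A sketch of the same "first payment per user, then average per category" shape in SQLite, standing in for MySQL, with hypothetical rows. SQLite has no TIMESTAMPDIFF, so the assumption here is that integer division of the epoch-second difference (strftime('%s', ...)) by 3600 stands in for TIMESTAMPDIFF(HOUR, ...), which likewise truncates to whole hours.

```python
import sqlite3


def build():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE subs (user_id INTEGER, category TEXT, timestamp TEXT);
        CREATE TABLE payments (user_id INTEGER, timestamp TEXT);
        INSERT INTO subs VALUES
            (1, 'A', '2021-01-01 00:00:00'),
            (2, 'A', '2021-01-01 00:00:00'),
            (3, 'B', '2021-01-01 00:00:00');
        INSERT INTO payments VALUES
            (1, '2021-01-01 10:00:00'),  -- user 1 first payment: 10 h
            (1, '2021-01-03 00:00:00'),  -- later payment, ignored by MIN()
            (2, '2021-01-01 20:00:00'),  -- 20 h
            (3, '2021-01-02 00:00:00');  -- 24 h
        """
    )
    return con


AVG_HOURS = """
SELECT category,
       AVG((strftime('%s', p.timestamp) - strftime('%s', subs.timestamp)) / 3600)
FROM subs
JOIN (SELECT user_id, MIN(timestamp) AS timestamp   -- first payment per user
      FROM payments
      GROUP BY user_id) p
  ON p.user_id = subs.user_id
GROUP BY category
ORDER BY category
"""


def avg_hours():
    return build().execute(AVG_HOURS).fetchall()


print(avg_hours())
```

Moving MIN() into a derived table is what avoids "Invalid use of group function": the per-user minimum is already a plain column by the time AVG() runs.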
If you needed to update another table with the results of this query, you could do something like this (not tested, so there may be syntax errors but hopefully you get the idea). I assume that another_table has category and avg_hrs_spent columns.
UPDATE another_table
SET avg_hrs_spent =
(
SELECT a.avg_hrs_spent FROM
(SELECT category,
AVG(TIMESTAMPDIFF(HOUR, subs.timestamp, p.timestamp)) avg_hrs_spent
FROM subs
JOIN ( SELECT user_id
, min(timestamp) timestamp
FROM payments
GROUP BY user_id
) p
ON p.user_id = subs.user_id
GROUP BY category) a
WHERE a.category = another_table.category
)