Struggling with a simple SQL JOIN - mysql

I'm pretty bad at SQL and have been having trouble doing something like a UNIQUE join of two tables. The SQL structure is somewhat abysmal, but I didn't design it.
I have two tables:
users
uid, ufn, uln, ue
Where users id = uid.
and
transactions
uid, unit, address, start_date
Basically in the transactions table, there are multiple entries per uid. What I am looking to do is select users.ufn, users.uln, users.ue, transactions.unit, transactions.address based on ONLY the newest start_date. Meaning I will only get ONE result per uid. Currently I'm getting returns for ALL uid entries in the transactions table.
I've tried doing some JOINS, LEFT JOINS, and things with MAX, but have been largely unsuccessful.
SELECT * FROM users JOIN ( SELECT unit, address, start_date FROM transactions GROUP BY uid) as a ON users.tenant_id = a.tenant_id
Is what I tried among a mix of other things.
Any hint as to the right direction would be much appreciated. Thank you!

This will get you close. The problem will be if 2 transactions have the same start date for the same user. But if you don't have that case this should work fine.
select u.ufn,
u.uln,
u.ue,
t.unit,
t.address
from users u
inner join (
select uid,
max(start_date) as newest_start_date
from transactions
group by uid) x
on u.uid = x.uid
inner join transactions t
on t.start_Date = x.newest_start_date
and t.uid = u.uid
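To see the tie behaviour mentioned above, here is a minimal sketch that runs the same join from Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for MySQL, and the sample rows and names are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; all sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER, ufn TEXT, uln TEXT, ue TEXT);
CREATE TABLE transactions (uid INTEGER, unit TEXT, address TEXT, start_date TEXT);
INSERT INTO users VALUES (1, 'Ann', 'Lee', 'ann@example.com'),
                         (2, 'Bob', 'Ray', 'bob@example.com');
INSERT INTO transactions VALUES
  (1, 'A1', '1 Main St', '2020-01-01'),
  (1, 'A2', '2 Main St', '2021-06-01'),
  (2, 'B1', '9 Oak Ave', '2019-03-01'),
  (2, 'B2', '8 Oak Ave', '2019-03-01');  -- tie on the newest start_date
""")

# Same shape as the query above: find MAX(start_date) per uid, then join back.
LATEST_SQL = """
SELECT u.uid, t.unit
FROM users u
JOIN (SELECT uid, MAX(start_date) AS newest_start_date
      FROM transactions GROUP BY uid) x ON u.uid = x.uid
JOIN transactions t ON t.uid = u.uid AND t.start_date = x.newest_start_date
ORDER BY u.uid, t.unit
"""
rows = conn.execute(LATEST_SQL).fetchall()
print(rows)
```

User 1 comes back once, while user 2 comes back twice because both of their rows share the newest start_date.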

Your example SQL has "tenant_id" but that is not in your tables example?
Are you running this once or 10000 times a day?
Try this:
SELECT users.ufn, users.uln, users.ue, transactions.unit, transactions.address
FROM users join transactions on users.uid = transactions.uid
WHERE (transactions.uid, transactions.start_date) IN
(SELECT uid, MAX(start_date) FROM transactions GROUP BY uid);

Another option is to use an ANTI JOIN on an inequality
select u.ufn,
u.uln,
u.ue,
t.unit,
t.address
from users u
INNER JOIN transactions t
ON t.uid = u.uid
LEFT JOIN transactions t1
ON t.uid = t1.uid
and t.start_date < t1.start_date
WHERE
t1.uid is null
Because of t.start_date < t1.start_date and t1.uid is null, only records that have no other record with a greater start_date are selected.
As with MAX(), if two or more transactions tie on start_date for a user, you will get all of them.
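The anti-join can be tried in isolation. This is a small sketch, assuming SQLite through Python's sqlite3 module instead of MySQL; the helper name newest_units and the sample rows are hypothetical.

```python
import sqlite3

def newest_units(transactions):
    """Return (uid, unit) for rows that have no later row for the same uid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (uid INTEGER, unit TEXT, start_date TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", transactions)
    # t1 matches only rows that are newer than t; if none exists,
    # the LEFT JOIN leaves t1.uid NULL and t is the newest row.
    sql = """
    SELECT t.uid, t.unit
    FROM transactions t
    LEFT JOIN transactions t1
      ON t.uid = t1.uid AND t.start_date < t1.start_date
    WHERE t1.uid IS NULL
    ORDER BY t.uid, t.unit
    """
    return conn.execute(sql).fetchall()

sample = [(1, 'A1', '2020-01-01'), (1, 'A2', '2021-06-01'), (2, 'B1', '2019-03-01')]
print(newest_units(sample))
```

Adding a second row for uid 2 with the same start_date makes both rows survive the filter, which is the tie case described above.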

This query could work:
SELECT
u.ufn,
u.uln,
u.ue,
t2.unit,
t2.address
FROM
users AS u
INNER JOIN
(
SELECT
uid
, MAX(start_date) AS start_date
FROM
transactions
GROUP BY
uid
) AS t1 ON t1.uid = u.uid
INNER JOIN
transactions AS t2 ON t2.uid = t1.uid AND t2.start_date = t1.start_date
Temporary tables are also an option (may be faster, you have to try):
CREATE TEMPORARY TABLE
last_transactions
AS SELECT
uid
, MAX(start_date) AS start_date
FROM
transactions
GROUP BY
uid
;
SELECT
u.ufn,
u.uln,
u.ue,
t2.unit,
t2.address
FROM
users AS u
INNER JOIN
last_transactions AS t1 ON t1.uid = u.uid
INNER JOIN
transactions AS t2 ON t2.uid = t1.uid AND t2.start_date = t1.start_date
P.S.: You should definitely consider adding a primary key to the transactions table. This would allow for a better join clause between t1 and t2. It would also let you avoid the duplicate rows that appear when several transactions share the same start_date for one user.
P.P.S.: wouldn't adding a last_transaction_start_date column to the user table be wiser?
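The temporary-table variant can be exercised end to end. The sketch below assumes SQLite through Python's sqlite3 module rather than MySQL, with invented sample rows; it says nothing about which variant is faster on real data.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER, ufn TEXT);
CREATE TABLE transactions (uid INTEGER, unit TEXT, start_date TEXT);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO transactions VALUES
  (1, 'A1', '2020-01-01'), (1, 'A2', '2021-06-01'), (2, 'B1', '2019-03-01');
-- Step 1: materialise the newest start_date per user.
CREATE TEMPORARY TABLE last_transactions AS
  SELECT uid, MAX(start_date) AS start_date FROM transactions GROUP BY uid;
""")

# Step 2: join users to the temporary table, then back to transactions.
temp_rows = conn.execute("""
SELECT u.ufn, t2.unit
FROM users AS u
JOIN last_transactions AS t1 ON t1.uid = u.uid
JOIN transactions AS t2 ON t2.uid = t1.uid AND t2.start_date = t1.start_date
ORDER BY u.uid
""").fetchall()
print(temp_rows)
```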

I am not sure whether the syntax is exactly the same in MySQL (window functions such as RANK() are only available from MySQL 8.0 onward), but here is how you would do it in SQL Server.
Use a rank() function to determine the latest date.
SELECT x.* , y.*
FROM users as x
JOIN
(SELECT *, RANK() Over (Partition By UID Order By Start_Date DESC) as Rank_ FROM Transactions) as y
ON x.uid = y.uid and y.rank_ = 1
Hope this helps.
Cheers!
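For comparison, here is a small sketch of how RANK() and ROW_NUMBER() treat ties. It assumes SQLite 3.25 or newer through Python's sqlite3 module as a stand-in; the helper first_per_user, the sample rows, and the use of unit as a tie-breaker are all assumptions for illustration.

```python
import sqlite3

# Window functions need SQLite 3.25+ (or MySQL 8.0+); sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (uid INTEGER, unit TEXT, start_date TEXT);
INSERT INTO transactions VALUES
  (1, 'A1', '2020-01-01'), (1, 'A2', '2021-06-01'),
  (2, 'B1', '2019-03-01'), (2, 'B2', '2019-03-01');
""")

def first_per_user(window_expr):
    """Return (uid, unit) rows whose window position is 1."""
    sql = f"""
    SELECT uid, unit FROM (
        SELECT uid, unit, {window_expr} AS pos FROM transactions
    ) AS ranked
    WHERE pos = 1
    ORDER BY uid, unit
    """
    return conn.execute(sql).fetchall()

# RANK gives tied rows the same position, so both rows of uid 2 are kept.
with_rank = first_per_user(
    "RANK() OVER (PARTITION BY uid ORDER BY start_date DESC)")
# ROW_NUMBER with an extra sort key (here: unit) keeps exactly one row per uid.
with_row_number = first_per_user(
    "ROW_NUMBER() OVER (PARTITION BY uid ORDER BY start_date DESC, unit)")
print(with_rank, with_row_number)
```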

Related

Subquery left join refer to parent ID

I am trying to make a query to fetch the newest car for each user:
select * from users
left join
(select cars.* from cars
where cars.userid=users.userid
order by cars.year desc limit 1) as cars
on cars.userid=users.userid
It looks like it says Unknown column "users.userid" in where clause
I tried to remove cars.userid=users.userid part, but then it only fetches 1 newest car, and sticks it on to each user.
Is there any way to accomplish what I'm after? thanks!!
For this purpose, I usually use row_number():
select *
from users u left join
(select c.* , row_number() over (partition by c.userid order by c.year desc) as seqnum
from cars c
) c
on c.userid = u.userid and c.seqnum = 1;
One option is to filter the left join with a subquery:
select * -- better enumerate the columns here
from users u
left join cars c
on c.userid = u.userid
and c.year = (select max(c1.year) from cars c1 where c1.userid = c.userid)
For performance, consider an index on cars(userid, year).
Note that this might return multiple cars per user if you have duplicate (userid, year) in cars. It would be better to have a real date rather than just the year.
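The filtered left join can be checked on a tiny data set. This sketch assumes SQLite via Python's sqlite3 module in place of MySQL; the name and model columns, the index name, and the rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; table contents are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cars (userid INTEGER, model TEXT, year INTEGER);
CREATE INDEX cars_userid_year ON cars (userid, year);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO cars VALUES (1, 'old', 2001), (1, 'new', 2015), (2, 'only', 2010);
""")

# The correlated subquery keeps only the car whose year is the user's maximum.
newest = conn.execute("""
SELECT u.userid, c.model, c.year
FROM users u
LEFT JOIN cars c
  ON c.userid = u.userid
 AND c.year = (SELECT MAX(c1.year) FROM cars c1 WHERE c1.userid = c.userid)
ORDER BY u.userid
""").fetchall()
print(newest)
```

Because the filter lives in the ON clause rather than in WHERE, the user without any car is still returned, with NULL (None in Python) for the car columns.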
There may be better and more efficient ways to query this, but here is my solution:
select users.userid, cars.*
from users
left join cars on cars.userid = users.userid
join (SELECT userid, MAX(year) AS maxDate
FROM cars
GROUP BY userid) as sub on cars.userid = sub.userid and cars.year = sub.maxDate;

How to get most recent balance from many users balances?

I have two tables, users and transactions. The transactions table references users. The tables look like this:
users
id  name  email       created
1   a     a@mail.com  12-03-01
2   b     b@mail.com  11-03-01
Transactions
id  user_id  balance
1   1        250
2   1        550
3   2        50
4   2        1000
I need each user's last inserted balance from the transactions table, together with all of the user's information. I am new to SQL.
So I tried the code below:
select * from transactions
where id in (select max(id) from transactions group by user_id)
INNER JOIN users on transactions.user_id=users.id
It gives me a syntax error near INNER JOIN. Have I made a mistake in the inner join, or am I heading in the wrong direction?
If you only want the balance, then a correlated subquery might be faster:
select u.*,
(select t.balance
from transactions t
where t.user_id = u.id
order by t.id desc
limit 1
) as MostRecentBalance
from users u;
For maximum performance, you want an index on transactions(user_id, id desc, balance).
The reason this is faster is because it avoids the aggregation on the entire transactions table. This is even more important if you are only selecting a subset of users.
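Here is a runnable sketch of the correlated subquery using the question's sample balances. It assumes SQLite via Python's sqlite3 module instead of MySQL, and the index name tx_user_id is made up.

```python
import sqlite3

# SQLite stands in for MySQL; the rows mirror the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, balance INTEGER);
CREATE INDEX tx_user_id ON transactions (user_id, id DESC, balance);
INSERT INTO users VALUES (1, 'a'), (2, 'b');
INSERT INTO transactions VALUES (1, 1, 250), (2, 1, 550), (3, 2, 50), (4, 2, 1000);
""")

# For each user, look up only that user's newest transaction row.
balances = conn.execute("""
SELECT u.id, u.name,
       (SELECT t.balance FROM transactions t
        WHERE t.user_id = u.id
        ORDER BY t.id DESC LIMIT 1) AS most_recent_balance
FROM users u
ORDER BY u.id
""").fetchall()
print(balances)
```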
EDIT:
I originally read this question as one row per user. However, if you only want one row returned -- for the last insert into transactions -- then a simpler method is:
select u.*, t.balance
from users u join
transactions t
on u.id = t.user_id
order by t.id desc
limit 1;
The JOIN should be part of the FROM clause, so it should look more like the code below.
select *
from transactions ts
INNER JOIN users
ON (ts.user_id=users.id)
where ts.id in
(
select max(transactions.id)
from transactions
group by user_id
);
Edited to clarify which id is in use, as per Gordon's suggestion.
Two simple methods.
A subquery to get the latest transaction, and from that all the transaction details and then the user:
SELECT users.*
FROM users
INNER JOIN transactions
ON users.id = transactions.user_id
INNER JOIN
(
SELECT MAX(id) AS max_id
FROM transactions
) sub0
ON transactions.id = sub0.max_id
Or you could try ordering by the id descending with a limit of 1:-
SELECT users.*
FROM users
INNER JOIN transactions
ON users.id = transactions.user_id
ORDER BY transactions.id DESC
LIMIT 1
EDIT
To get the last transaction for all users then you could use the following:-
SELECT *
FROM users
INNER JOIN transactions
ON users.id = transactions.user_id
INNER JOIN
(
SELECT user_id, MAX(id) AS max_id
FROM transactions
GROUP BY user_id
) sub0
ON transactions.id = sub0.max_id
AND transactions.user_id = sub0.user_id

Query on two tables with belongs_to/has_many relation

One table is Users with id and email columns.
Another table is Payments with id, created_at, user_id and foo columns.
User has many Payments.
I need a query that returns each user's email, his last payment date and this last payment's foo value. How do I do that? What I have now is:
SELECT users.email, MAX(payments.created_at), payments.foo
FROM users
JOIN payments ON payments.user_id = users.id
GROUP BY users.id
This is wrong, because foo value does not necessarily belong to user's most recent payment.
Try this :
select users.email, payments.foo, payments.created_at
from users
left join(
select a.* from payments a
inner join (
select user_id, max(created_at) as created_at
from payments
group by user_id
)b on a.user_id = b.user_id and a.created_at = b.created_at
) payments on users.id = payments.user_id
If a user has no payments yet, foo and created_at are NULL. If you want to exclude users who have no payments, use INNER JOIN instead.
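The idea of joining payments back to the per-user maximum of created_at can be sketched as follows. This assumes SQLite via Python's sqlite3 module as a stand-in for MySQL, and the sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; emails and payments are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE payments (id INTEGER PRIMARY KEY, created_at TEXT,
                       user_id INTEGER, foo TEXT);
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO payments VALUES
  (1, '2020-01-01', 1, 'first'),
  (2, '2020-02-01', 1, 'second');
""")

# b holds the newest created_at per user; a picks up foo from that exact row.
last_payments = conn.execute("""
SELECT u.email, p.created_at, p.foo
FROM users u
LEFT JOIN (
    SELECT a.user_id, a.created_at, a.foo
    FROM payments a
    JOIN (SELECT user_id, MAX(created_at) AS created_at
          FROM payments GROUP BY user_id) b
      ON a.user_id = b.user_id AND a.created_at = b.created_at
) p ON p.user_id = u.id
ORDER BY u.id
""").fetchall()
print(last_payments)
```

The second user has no payments and is still listed, with None for both payment columns.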
One approach would be to use a MySQL version of rank over partition and then select only those rows with rank = 1:
select tt.email,tt.created_at,tt.foo from (
select t.*,
case when @cur_id = t.id then @r:=@r+1 else @r:=1 end as rank,
@cur_id := t.id
from (
SELECT users.id,users.email, payments.created_at, payments.foo
FROM users
JOIN payments ON payments.user_id = users.id
order by users.id asc,payments.created_at desc
) t
JOIN (select @cur_id:=-1,@r:=0) r
) tt
where tt.rank =1;
This would save hitting the payments table twice. Could be slower though. Depends on your data!

Mysql count and return just one row of data

I need to count the number of users that have answered all 3 of those profile_options (so they have at least 3 records in the profile_answers table).
SELECT COUNT(DISTINCT(users.id)) users_count
FROM users
INNER JOIN profile_answers ON profile_answers.user_id = users.id
WHERE profile_answers.profile_option_id IN (37,86,102)
GROUP BY users.id
HAVING COUNT(DISTINCT(profile_answers.id))>=3
The problem is that this query returns a table with one row per user, showing how many options they answered (in this case always 3). What I need is a single row with the total number of users (that is, the number of rows in this result).
I know how to do it with another subquery but the problem is that I am running into "Mysql::Error: Too high level of nesting for select"
Is there a way to do this without the extra subquery?
SELECT SUM(sum_sub.users_count) FROM (
(SELECT COUNT(DISTINCT(users.id)) users_count
FROM users
INNER JOIN profile_answers ON profile_answers.user_id = users.id
WHERE profile_answers.profile_option_id IN (37,86,102)
GROUP BY users.id
HAVING COUNT(DISTINCT(profile_answers.id))>=3)
) sum_sub
Please give this query a shot:
SELECT COUNT(DISTINCT(u.id)) AS users_count
FROM users AS u
INNER JOIN (
SELECT user_id, COUNT(DISTINCT profile_option_id) AS total
FROM profile_answers
WHERE profile_option_id IN (37,86,102)
GROUP BY user_id
HAVING COUNT(DISTINCT profile_option_id) = 3
) AS a ON a.user_id = u.id
If you have lots of data in your tables, you may get better performance by using a temporary table, like so:
CREATE TEMPORARY TABLE a (KEY(user_id)) ENGINE = MEMORY
SELECT user_id, COUNT(DISTINCT profile_option_id) AS total
FROM profile_answers
WHERE profile_option_id IN (37,86,102)
GROUP BY user_id
HAVING COUNT(DISTINCT profile_option_id) = 3;
Then your final query will look like this:
SELECT COUNT(DISTINCT(u.id)) as users_count
FROM a
INNER JOIN users AS u ON a.user_id = u.id
Unless there is a need to join the users table you can go with this
SELECT COUNT(*) AS users_count
FROM (
SELECT user_id, COUNT(DISTINCT profile_option_id) AS total
FROM profile_answers
WHERE profile_option_id IN (37,86,102)
GROUP BY user_id
HAVING COUNT(DISTINCT profile_option_id) = 3
) AS a
Should you need another solution, please consider providing the EXPLAIN EXTENDED output for the query and the table definitions, along with a better problem description.
I hope this helps
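The version without the users join is easy to test on its own. This sketch assumes SQLite via Python's sqlite3 module instead of MySQL; the helper count_complete_users and the sample answers are hypothetical.

```python
import sqlite3

def count_complete_users(answers, option_ids=(37, 86, 102)):
    """Count users who answered every option in option_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profile_answers ("
                 "id INTEGER PRIMARY KEY, user_id INTEGER, "
                 "profile_option_id INTEGER)")
    conn.executemany(
        "INSERT INTO profile_answers (user_id, profile_option_id) VALUES (?, ?)",
        answers)
    placeholders = ", ".join("?" for _ in option_ids)
    # Inner query: one row per user who covers all requested options.
    # Outer query: collapse those rows into a single count.
    sql = f"""
    SELECT COUNT(*) FROM (
        SELECT user_id
        FROM profile_answers
        WHERE profile_option_id IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT profile_option_id) = ?
    ) AS a
    """
    return conn.execute(sql, (*option_ids, len(option_ids))).fetchone()[0]

# User 1 answered all three; user 2 answered 37 twice plus 86; user 3 only 102.
sample = [(1, 37), (1, 86), (1, 102), (2, 37), (2, 37), (2, 86), (3, 102)]
print(count_complete_users(sample))
```

Counting DISTINCT profile_option_id rather than rows means a user who answered the same option twice is not mistaken for a user who answered all three.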
You can give the queries a name using the AS clause. See the updated query below.
SELECT SUM(sum_sub.users_count) FROM (
(SELECT COUNT(DISTINCT(users.id)) as users_count
FROM users
INNER JOIN profile_answers ON profile_answers.user_id = users.id
WHERE profile_answers.profile_option_id IN (37,86,102)
GROUP BY users.id
HAVING COUNT(DISTINCT(profile_answers.id))>=3)
) as sum_sub
You should not group by on a field not present in select statement.
select id, count(*) from users group by id is fine
select count(id) from users group by id is NOT
Regarding your query I think the link to user table is not necessary. Just using foreign key should be fine.
Try this one:
select count(*) from
(SELECT profile_answers.user_id, count(*) as cnt
FROM profile_answers
INNER JOIN users ON profile_answers.user_id = users.id
WHERE profile_answers.profile_option_id IN (37,86,102)
group by profile_answers.user_id
having count(*) >= 3) as answered

Selecting the last record and comparing a datetime

I'm building a discussion board and I want to get a list of unread topics.
A topic should be unread and selected if the created_at datetime for the last post in a topic is greater than the last time the currently logged in user viewed this topic.
http://sqlfiddle.com/#!2/4e2e99/1
If you delete all of the user view inserts ALL of the topics should be listed.
I have three tables:
topics
id
user_id
created_at
topic_posts
id
topic_id
created_at
topic_user_views
id
topic_id
created_at
My query so far (but it doesn't work):
SELECT DISTINCT `topics`.`id`, `topics`.`name`
FROM `topics`
INNER JOIN `topic_user_views`
ON `topic_user_views`.`topic_id` = `topics`.`id`
INNER JOIN `topic_posts`
ON `topic_posts`.`topic_id` = `topics`.`id`
WHERE `topic_posts`.`created_at` > `topic_user_views`.`created_at`
AND `topic_user_views`.`user_id` = 1
ORDER BY `id` DESC
I don't know how to compare a topic's last post's created_at column post to the last time this user has viewed the topic.
Here is one way of doing it. We will use exists to test whether the topic was viewed before the last topic post was created. See the SQL Fiddle: http://sqlfiddle.com/#!2/4e2e99/11
select t.id as topic_id, t.name as topic_name
from topics t
where not exists(
select tuv.topic_id, max(tuv.created_at) as last_view, max(tp.created_at) as last_post
from topic_user_views tuv
inner join topic_posts as tp
on tuv.topic_id=tp.topic_id and tuv.created_at > tp.created_at
group by topic_id
having t.id=topic_id)
ORDER BY id DESC
With this example, topics will also be selected if the user has not viewed anything yet:
SELECT DISTINCT
`topics`.`id`,
`topics`.`name`
FROM `topics`
LEFT JOIN `topic_user_views` AS tuv
ON `tuv`.`topic_id` = `topics`.`id`
WHERE (SELECT
1
FROM topic_posts AS tp
WHERE tp.topic_id = `topics`.`id`
AND (tp.created_at > tuv.created_at
OR tuv.created_at IS NULL)
LIMIT 1)IS NOT NULL
ORDER BY `id` DESC;
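A different way to phrase the same requirement is to compare each topic's newest post with the given user's newest view. The sketch below is not the query from this answer; it assumes SQLite via Python's sqlite3 module, a user_id column on topic_user_views, and invented rows, and the helper unread_topics is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; schema follows the question, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE topic_posts (id INTEGER PRIMARY KEY, topic_id INTEGER, created_at TEXT);
CREATE TABLE topic_user_views (id INTEGER PRIMARY KEY, topic_id INTEGER,
                               user_id INTEGER, created_at TEXT);
INSERT INTO topics VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO topic_posts (topic_id, created_at) VALUES
  (1, '2020-01-01 10:00'), (2, '2020-01-01 10:00'),
  (2, '2020-01-01 12:00'), (3, '2020-01-01 10:00');
INSERT INTO topic_user_views (topic_id, user_id, created_at) VALUES
  (1, 1, '2020-01-01 11:00'), (2, 1, '2020-01-01 11:00');
""")

def unread_topics(user_id):
    """Topics whose newest post is later than the user's newest view, if any."""
    sql = """
    SELECT t.id, t.name
    FROM topics t
    JOIN (SELECT topic_id, MAX(created_at) AS last_post
          FROM topic_posts GROUP BY topic_id) p ON p.topic_id = t.id
    LEFT JOIN (SELECT topic_id, MAX(created_at) AS last_view
               FROM topic_user_views WHERE user_id = ?
               GROUP BY topic_id) v ON v.topic_id = t.id
    WHERE v.last_view IS NULL OR p.last_post > v.last_view
    ORDER BY t.id DESC
    """
    return conn.execute(sql, (user_id,)).fetchall()

print(unread_topics(1))
```

For user 1, topic 1 is read, topic 2 has a post newer than the last view, and topic 3 was never viewed, so topics 3 and 2 are listed. A user with no views at all gets every topic.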
Here is what I came up with :
SELECT t1.topic_id, t3.account_user_id, t1.created_at last_post_date, t4.created_at as last_seen_date
FROM topic_posts t1
INNER JOIN (SELECT topic_id, MAX(created_at) as created_at
FROM topic_posts
GROUP BY topic_id) t2
USING (topic_id, created_at)
INNER JOIN topic_user_views t3
USING (topic_id)
INNER JOIN (SELECT topic_id, account_user_id, created_at
FROM topic_user_views
INNER JOIN (SELECT topic_id, account_user_id, MAX(created_at) as created_at
FROM topic_user_views
GROUP BY topic_id, account_user_id) _
USING (topic_id, created_at, account_user_id)) t4
ON t1.topic_id = t4.topic_id and t3.account_user_id = t4.account_user_id and t3.created_at = t4.created_at
WHERE t1.created_at > t4.created_at
AND t3.account_user_id = 1;
The idea is to join the date of the last posted message in a topic (t2) and the date of the last seen message in a topic by a user (t4), and then you just have to filter out the results you don't want.
If I'm not mistaken, in the SQLFiddle you provided, there are no topics that have not already been seen by users, so this returns nothing. I slightly modified it and it seems to work as wanted.
To consider the topics never viewed by a user, I think the best solution is to use the account table with a RIGHT OUTER JOIN. Something similar to :
SELECT t1.topic_id, t3.account_user_id, t1.created_at last_post_date, t4.created_at as last_seen_date
FROM topic_posts t1
INNER JOIN (SELECT topic_id, MAX(created_at) as created_at
FROM topic_posts
GROUP BY topic_id) t2
USING (topic_id, created_at)
INNER JOIN topic_user_views t3
USING (topic_id)
INNER JOIN (SELECT topic_id, account_user_id, created_at
FROM topic_user_views
INNER JOIN (SELECT topic_id, account_user_id, MAX(created_at) as created_at
FROM topic_user_views
GROUP BY topic_id, account_user_id) _
USING (topic_id, created_at, account_user_id)) t4
ON t1.topic_id = t4.topic_id and t3.account_user_id = t4.account_user_id and t3.created_at = t4.created_at
RIGHT OUTER JOIN account
ON account.user_id = t3.account_user_id
WHERE t1.created_at > t4.created_at or t4.account_user_id IS NULL
AND t3.account_user_id = 1;
which I did not test.