MySQL returning other columns with DISTINCT - mysql

Need a little bit of help with a query.
SELECT id,email FROM user_info WHERE username!=''
AND email='example@gmail.com'
GROUP BY email ORDER BY id DESC LIMIT 1
What this does currently is fetch a distinct email but not the newest ID for an account under this email.

It sounds like you can apply an aggregate to the id field and it will return the most recent id for each email:
SELECT Max(id), email
FROM user_info
WHERE username!=''
AND email='example@gmail.com'
GROUP BY email
Unfortunately, when you use GROUP BY and select a column such as id without an aggregate, there is no guarantee which id will be returned; selecting MAX(id) makes the choice explicit.
If you want to return the username that is associated with the max(id), then you can use a subquery:
SELECT i.MaxId,
i.email,
u.username
FROM user_info u
inner join
(
select max(id) MaxId, email
from user_info
WHERE username!=''
AND email='example@gmail.com'
group by email
) i
on u.id = i.maxid
and u.email = i.email
WHERE username!=''
AND email='example@gmail.com';
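The join against the MAX(id) subquery can be tried out on a few rows. The following is a minimal sketch, not the original poster's setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the max_id alias are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_info (id INTEGER PRIMARY KEY, username TEXT, email TEXT);
    INSERT INTO user_info (id, username, email) VALUES
        (1, 'alice_old', 'example@gmail.com'),
        (2, '',          'example@gmail.com'),
        (3, 'alice_new', 'example@gmail.com'),
        (4, 'bob',       'other@gmail.com');
""")

# The derived table finds the largest id per email; the join then
# fetches the remaining columns of exactly that row.
latest = conn.execute("""
    SELECT i.max_id, i.email, u.username
    FROM user_info u
    INNER JOIN (
        SELECT MAX(id) AS max_id, email
        FROM user_info
        WHERE username != '' AND email = ?
        GROUP BY email
    ) i ON u.id = i.max_id AND u.email = i.email
""", ("example@gmail.com",)).fetchall()

print(latest)
```

Because the username comes from the joined row rather than from the grouped query, it is guaranteed to belong to the row with the largest id.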

If by "newest id" you mean the largest value, then here's one approach that makes use of MySQL user variables to retain the value from previous rows, so a comparison can be made:
SELECT IF(u.email=@prev_email,1,0) AS dup_email_ind
, u.id
, u.username
, @prev_email := u.email AS email
FROM user_info u
CROSS
JOIN ( SELECT @prev_email := NULL) i
WHERE u.username != ''
AND u.email = 'example@gmail.com'
ORDER
BY u.email DESC
, u.id DESC
, u.username DESC
This returns all of the rows, with an indicator of whether the row is considered an older "duplicate" email or not. Rows that have dup_email_ind = 1 are identified as older duplicates, dup_email_ind = 0 indicates that this row is the latest row (the row with the largest id value) for a given email value.
(Usually, when I'm looking for duplicates like this, it's helpful for me to return both, or all, of the rows that are "duplicates".)
To return only the rows with the "newest id", wrap the query above (as an inline view) in another query, so that its output is used as a row source for the outer query:
SELECT d.*
FROM (
-- the query above gets put here
) d
WHERE d.dup_email_ind = 0
Another approach is to use correlated subqueries in the SELECT list, although this is really only suitable for returning small sets. (This approach can have serious performance issues with large sets.)
SELECT ( SELECT u1.id
FROM user_info u1
WHERE u1.email = e.email
AND u1.username != ''
ORDER BY u1.id DESC
LIMIT 1
) AS id
, ( SELECT u2.username
FROM user_info u2
WHERE u2.email = e.email
AND u2.username != ''
ORDER BY u2.id DESC
LIMIT 1
) AS username
, e.email
FROM ( SELECT u.email
FROM user_info u
WHERE u.email = 'example@gmail.com'
AND u.username != ''
GROUP BY u.email
) e

Related

SQL - how to remove whole row if one of the column in subquery return NULL

I am stuck in 1 SQL query
SELECT u.*,
um2.meta_value as parent_user_id,
( select u.user_email FROM wp_users u WHERE u.ID = um2.meta_value ) AS parent_user_email
FROM
wp_users u
JOIN wp_usermeta um2 ON u.ID = um2.user_id
AND um2.meta_key = 'parent_user_id'
GROUP BY
u.ID
This query returns 4 rows (as shown in the screenshot).
I want this behavior: if the subquery returns NULL, the whole row should not be shown.
In this example, "childthree" should not be shown because its "parent_user_email" is NULL, so the whole third row needs to be removed.
Use a join instead:
SELECT u.*, um2.meta_value as parent_user_id,
u2.user_email as parent_user_email
FROM wp_users u JOIN
wp_usermeta um2
ON u.ID = um2.user_id AND
um2.meta_key = 'parent_user_id' JOIN
wp_users u2
ON u2.ID = um2.meta_value
GROUP BY u.ID;
Note: This assumes that the email value itself is never NULL. If that is possible, add WHERE u2.user_email IS NOT NULL.
Also, your query should fail because the GROUP BY columns are inconsistent with the SELECT. Logically it seems OK, because there is only one parent and one user email per user, but I would still include those columns in the GROUP BY.
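To see why the inner join drops the row whose parent does not exist, here is a small sketch under stated assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the tables are reduced to the columns the query needs, the sample users are invented, and the explicit CAST is added only because meta_value is stored as text.

```python
import sqlite3

# Hypothetical, cut-down versions of the WordPress tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_users (ID INTEGER PRIMARY KEY, user_login TEXT, user_email TEXT);
    CREATE TABLE wp_usermeta (user_id INTEGER, meta_key TEXT, meta_value TEXT);
    INSERT INTO wp_users VALUES
        (1, 'parent',     'parent@example.com'),
        (2, 'childone',   'one@example.com'),
        (3, 'childthree', 'three@example.com');
    INSERT INTO wp_usermeta VALUES
        (2, 'parent_user_id', '1'),
        (3, 'parent_user_id', '99');  -- no user with ID 99 exists
""")

# The second JOIN is an inner join, so a child whose parent row is
# missing produces no match and disappears from the result.
rows = conn.execute("""
    SELECT u.user_login,
           um2.meta_value AS parent_user_id,
           u2.user_email  AS parent_user_email
    FROM wp_users u
    JOIN wp_usermeta um2
      ON u.ID = um2.user_id AND um2.meta_key = 'parent_user_id'
    JOIN wp_users u2
      ON u2.ID = CAST(um2.meta_value AS INTEGER)
    ORDER BY u.ID
""").fetchall()

print(rows)
```

With the scalar subquery from the question, "childthree" would still appear with a NULL email; with the join it is filtered out.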

mysql join table and search for most recent record on where clause

I have two tables
users: id, email, firstName, lastName
subscriptions: id, userId, currentPeriodStart, currentPeriodEnd
Below just shows you how the two tables are related. I want to return subscriptions that expire after 1565827199, but it needs to check against each user's most recent subscription.
select
u.id
from users u
join subscriptions s on u.id = s.userId
where s.currentPeriodEnd > 1565827199
ORDER BY u.lastName ASC
A user may have multiple subscriptions in the subscriptions table. What I need to do is modify the query above, so it checks against that user's most recent subscription and not the first one it finds.
select * from subscriptions ORDER BY currentPeriodEnd DESC LIMIT 1
I've tried a few different things (alias table, sub query) I found elsewhere on stackoverflow without any luck.
You can filter with a correlated subquery, like so:
select u.*, s.*
from users u
inner join subscriptions s on u.id = s.userId
where s.currentPeriodEnd = (
select max(s1.currentPeriodEnd)
from subscriptions s1
where s1.userId = u.id and s1.currentPeriodEnd > 1565827199
)
order by u.lastName
For performance, consider an index on subscriptions(userId, currentPeriodEnd).
Alternatively, if you are running MySQL 8.0, you can use row_number():
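select *
from (
select
u.*,
-- s columns are listed explicitly: u.* plus s.* would give the derived table two id columns
s.currentPeriodStart,
s.currentPeriodEnd,
row_number() over(partition by u.id order by s.currentPeriodEnd desc) rn
from users u
inner join subscriptions s on u.id = s.userId
where s.currentPeriodEnd > 1565827199
) t
where rn = 1
order by lastName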
Join with a subquery that gets the latest time for each user, and filters it down to just the ones after your specified timestamp.
select u.id
from users u
join (
select userid
FROM subscriptions
GROUP BY userid
HAVING MAX(currentPeriodEnd) > 1565827199
) s ON s.userid = u.id
ORDER BY u.lastName ASC
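The HAVING MAX(...) variant can be checked on a tiny data set. This is a sketch only: SQLite (through Python's sqlite3) stands in for MySQL, the users and timestamps are invented, and the latestEnd alias is not part of the original answer.

```python
import sqlite3

THRESHOLD = 1565827199  # cutoff timestamp taken from the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, firstName TEXT, lastName TEXT);
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, userId INTEGER,
                                currentPeriodStart INTEGER, currentPeriodEnd INTEGER);
    INSERT INTO users VALUES
        (1, 'a@example.com', 'Ann', 'Avery'),
        (2, 'b@example.com', 'Bob', 'Baker');
    INSERT INTO subscriptions VALUES
        (1, 1, 1500000000, 1560000000),  -- Ann: old, expired
        (2, 1, 1560000000, 1570000000),  -- Ann: latest, still active
        (3, 2, 1500000000, 1560000000);  -- Bob: only subscription, expired
""")

# Group first so that each user contributes a single row holding the
# latest period end, then keep only the groups past the cutoff.
active = conn.execute("""
    SELECT u.id, u.lastName, s.latestEnd
    FROM users u
    JOIN (
        SELECT userId, MAX(currentPeriodEnd) AS latestEnd
        FROM subscriptions
        GROUP BY userId
        HAVING MAX(currentPeriodEnd) > ?
    ) s ON s.userId = u.id
    ORDER BY u.lastName ASC
""", (THRESHOLD,)).fetchall()

print(active)
```

Each user appears at most once, no matter how many subscription rows they have.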

Left join to get most recent entries

I have the tables users and statuses . I want to select all the users, plus any statuses they might have, but only the most recent status from each user.
Here is the code that doesn't work:
SELECT users.id, alias, gender, login, logout, users.create_date, statustext as statustxt, TIMESTAMPDIFF(YEAR,birthdate,CURDATE()) AS age
FROM users
LEFT JOIN statuses s ON users.id = s.user_id
WHERE s.ID = (
SELECT MAX(s2.ID)
FROM statuses s2
WHERE s2.user_id = s.user_id
)
This gets the users with the most recent statuses, but not the users from the users table as well. Maybe it can be fixed by some small adjustment?
I got the sub query by searching, but I don't understand how that code works. It seems to compare two versions of the same table (For example: WHERE s2.user_id = s.user_id ) . Where can I read about this sort of technique?
Is a sub query required in this case by the way?
A solution would be great, and a basic explanation of how it works would be highly appreciated.
----------EDIT---------------
I took one of the responses (by maresa) and combined with the sub query of my initial code , and this works(!) It has 3 selects and looks a bit over complicated maybe?:
SELECT users.id, alias, gender, login, logout, users.create_date, statustext as statustxt, TIMESTAMPDIFF(YEAR,birthdate,CURDATE()) AS age
FROM users
LEFT JOIN (
SELECT user_id, statustext FROM statuses s
WHERE s.ID = (
SELECT MAX(s2.ID)
FROM statuses s2
WHERE s2.user_id = s.user_id
)
) as s ON users.id = s.user_id
I've encountered a similar problem. This post is relevant: http://www.microshell.com/database/sql/optimizing-sql-that-selects-the-maxminetc-from-a-group/.
Regarding your specific query: since you only care about the latest status, you first want to get the latest status for each user. Assuming that the latest status has the highest ID (based on your sample), the SQL would be as below:
SELECT
MAX(ID) AS ID, user_id
FROM
statuses
GROUP BY
user_id
The query above gets the latest status ID per user_id. Once you have that, you can think of it as if it were a table. Then simply join on this "table" (the query) instead of the real one (the statuses table). Your query would therefore look like this:
SELECT
users.id, alias, gender, login, logout, users.create_date, statustext as statustxt, TIMESTAMPDIFF(YEAR,birthdate,CURDATE()) AS age
FROM
users
LEFT JOIN (
SELECT
MAX(ID) AS ID, user_id
FROM
statuses
GROUP BY
user_id
) as s ON users.id = s.user_id
LEFT JOIN statuses ON statuses.ID = s.ID -- EDIT: Added this line.
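The difference between filtering in WHERE and joining to a grouped subquery shows up as soon as one user has no status. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and uses invented users and statuses; the tables carry only the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, alias TEXT);
    CREATE TABLE statuses (ID INTEGER PRIMARY KEY, user_id INTEGER, statustext TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO statuses VALUES
        (10, 1, 'first'),
        (11, 1, 'second'),
        (12, 2, 'hello');
""")

# Filtering the left-joined table in WHERE: for a user without statuses
# s.ID is NULL, the comparison is not true, and the user is dropped.
filtered_in_where = conn.execute("""
    SELECT users.id, s.statustext
    FROM users
    LEFT JOIN statuses s ON users.id = s.user_id
    WHERE s.ID = (SELECT MAX(s2.ID) FROM statuses s2 WHERE s2.user_id = s.user_id)
    ORDER BY users.id
""").fetchall()

# Joining to the grouped subquery first keeps every user; the second
# LEFT JOIN then fetches the text of the latest status, if there is one.
latest_per_user = conn.execute("""
    SELECT users.id, statuses.statustext
    FROM users
    LEFT JOIN (
        SELECT MAX(ID) AS ID, user_id FROM statuses GROUP BY user_id
    ) s ON users.id = s.user_id
    LEFT JOIN statuses ON statuses.ID = s.ID
    ORDER BY users.id
""").fetchall()

print(filtered_in_where)
print(latest_per_user)
```

The first result lacks the user without statuses, which matches the behavior described in the question; the second keeps that user with a NULL status.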
You could use a subselect as the join and limit it to a single row:
SELECT
users.id,
alias,
gender,
login,
logout,
users.create_date,
statustext as statustxt,
TIMESTAMPDIFF(YEAR,birthdate,CURDATE()) AS age
FROM users
LEFT JOIN LATERAL ( -- LATERAL (MySQL 8.0.14+) lets the derived table reference users.id
SELECT user_id,statustext
FROM statuses s1
WHERE s1.user_id = users.id
ORDER BY status_date DESC
LIMIT 1
) s ON users.id = s.user_id
You could use an IN clause with a tuple:
SELECT
users.id
, alias
, gender
, login
, logout
, users.create_date
, statustext as statustxt
, TIMESTAMPDIFF(YEAR,birthdate,CURDATE()) AS age
FROM users
LEFT JOIN statuses s ON users.id = s.user_id
WHERE (s.user_id, s.ID) in (
SELECT user_id, MAX(s2.ID)
FROM statuses s2
group by user_id
)

how to select the last text from message table

I have two tables (messages and user). I want to select the last (msg_id,text) from the messages table for a particular ad_id and need to select the name of the user from the user table.
SELECT u.id
, m.date
, m.ad_id
, max(m.msg_id)as msg_id
, u.first_name
, m.text
, m.u_to_id
, m.u_from_id
FROM user u
JOIN messages m
ON CASE WHEN m.u_from_id ='14' THEN u.id = m.u_to_id
ELSE u.id = m.u_from_id END
AND (m.u_from_id='14' OR m.u_to_id='14')
AND m.ad_id='20'
GROUP BY CONCAT(m.ad_id,u.id)
ORDER by m.msg_id DESC
This query works, but I can't select the last m.text.
SELECT u.id, m.text
FROM user u
JOIN messages m ON m.msg_id = (SELECT max(msg_id) FROM messages WHERE u_from_id = u.id)
I simplified your query to show the logic relevant to your question. Basically you want to join your messages table on the msg_id that is equal to the inner query of the max msg_id with that user.
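The join on the per-user maximum msg_id can be tried on a few rows. This is a minimal sketch with assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the tables are reduced to a handful of columns, and the users and messages are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE messages (msg_id INTEGER PRIMARY KEY, u_from_id INTEGER,
                           u_to_id INTEGER, ad_id INTEGER, text TEXT);
    INSERT INTO user VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO messages VALUES
        (1, 1, 14, 20, 'hi'),
        (2, 1, 14, 20, 'still there?'),
        (3, 2, 14, 20, 'offer');
""")

# For each user, the correlated subquery yields that user's highest
# msg_id; the join then returns the full row of that one message.
last_messages = conn.execute("""
    SELECT u.id, m.text
    FROM user u
    JOIN messages m
      ON m.msg_id = (SELECT MAX(msg_id) FROM messages WHERE u_from_id = u.id)
    ORDER BY u.id
""").fetchall()

print(last_messages)
```

A user who has sent no messages gets NULL from the subquery and therefore drops out of the inner join.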
After many experiments, I added a new column (bargainer) to identify the recipient, and this query works fine for me:
select m.msg_id, m.text, m.status, m.date, m.bargainer, m.ad_id, u.first_name, u.id
from user u
JOIN messages m
where msg_id in (
select max(msg_id)
from messages m
where m.ad_id=20 and u.id=m.bargainer
group by (m.bargainer)
)
group by (m.msg_id)
order by msg_id DESC

Nested Join in Subquery and failing correlation

I have 3 tables sc_user, sc_cube, sc_cube_sent
I want to join to a user query (sc_user) one distinct random message/cube (from sc_cube) that has not been sent to that user before (sc_cube_sent), so that each row in the result set has a distinct user id and a random cubeid from sc_cube that is not associated with that user id in sc_cube_sent.
I am facing the problem that I seem not to be able to use a correlation id for the case that I need the u.id of the outer query in the inner On clause. I would need the commented section to make it work.
# get one random idcube per user not already sent to that user
SELECT u.id, sub.idcube
FROM sc_user as u
LEFT JOIN (
SELECT c.idcube, sent.idreceiver FROM sc_cube c
LEFT JOIN sc_cube_sent sent ON ( c.idcube = sent.idcube /* AND sent.idreceiver = u.id <-- "unknown column u.id in on clause" */ )
WHERE sent.idcube IS NULL
ORDER BY RAND()
LIMIT 1
) as sub
ON 1
I added a fiddle with some data : http://sqlfiddle.com/#!9/7b0bc/1
new cubeids ( sc_cube ) that should show for user 1 are the following : 2150, 2151, 2152, 2153
Edit>>
I could do it with another subquery instead of a join, but that has a huge performance impact and is not feasible ( 30 secs+ on couple of thousand rows on each table with reasonably implemented keys ), so I am still looking for a way to use the solution with JOIN.
SELECT
u.id,
(SELECT sc_cube.idcube
FROM sc_cube
WHERE NOT EXISTS(
SELECT sc_cube.idcube FROM sc_cube_sent WHERE sc_cube_sent.idcube = sc_cube.idcube AND sc_cube_sent.idreceiver = u.id
)
ORDER BY RAND() LIMIT 0,1
) as idcube
FROM sc_user u
Without being able to test this, I would say you need to include sc_user in the subquery, because the outer u is out of scope there:
LEFT JOIN
( SELECT c.idcube, sent.idreceiver
FROM sc_user u
JOIN sc_cube c ON c.whatever_your_join_column_is = u.whatever_your_join_column_is
LEFT JOIN sc_cube_sent sent ON ( c.idcube = sent.idcube AND sent.idreceiver = u.id )
WHERE sent.idcube IS NULL
ORDER BY RAND()
LIMIT 1
) sub
If you want to get message ids that have not been sent to a particular user, then why use a join or left join at all?
Just do:
SELECT sent.idcube
FROM sc_cube_sent sent
WHERE sent.idreceiver <> u.id
Then the query may look like this:
SELECT u.id,
/* sub.idcube */
( SELECT sent.idcube
FROM sc_cube_sent sent
WHERE sent.idreceiver <> u.id
ORDER BY RAND()
LIMIT 1
) as idcube
FROM sc_user as u
Got it working with NOT IN subselect in the on clause. Whereas the correlation link u.id is not given within the LEFT JOIN scope, it is for the scope of the ON clause. Here is how it works:
SELECT u.id, sub.idcube
FROM sc_user as u
LEFT JOIN (
SELECT idcube FROM sc_cube c ORDER BY RAND()
) sub ON (
sub.idcube NOT IN (
SELECT s.idcube FROM sc_cube_sent s WHERE s.idreceiver = u.id
)
)
GROUP BY u.id
Fiddle : http://sqlfiddle.com/#!9/7b0bc/48
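Since the result is random, it helps to check a candidate query against the stated requirement rather than against fixed output. The sketch below is an assumption-laden harness, not the fiddle's data: it uses SQLite (via Python's sqlite3) in place of MySQL, invented rows, and the NOT EXISTS form from the question's edit, with RANDOM() as SQLite's counterpart of RAND().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sc_user (id INTEGER PRIMARY KEY);
    CREATE TABLE sc_cube (idcube INTEGER PRIMARY KEY);
    CREATE TABLE sc_cube_sent (idcube INTEGER, idreceiver INTEGER);
    INSERT INTO sc_user VALUES (1), (2);
    INSERT INTO sc_cube VALUES (2148), (2149), (2150), (2151);
    INSERT INTO sc_cube_sent VALUES (2148, 1), (2149, 1), (2150, 2);
""")

# One random cube per user, skipping cubes already sent to that user.
picks = conn.execute("""
    SELECT u.id,
           (SELECT c.idcube
            FROM sc_cube c
            WHERE NOT EXISTS (
                SELECT 1 FROM sc_cube_sent s
                WHERE s.idcube = c.idcube AND s.idreceiver = u.id
            )
            ORDER BY RANDOM()  -- SQLite spelling; MySQL uses RAND()
            LIMIT 1) AS idcube
    FROM sc_user u
    ORDER BY u.id
""").fetchall()

# Requirement check: no (user, cube) pair may already exist in sc_cube_sent.
already_sent = set(conn.execute("SELECT idreceiver, idcube FROM sc_cube_sent"))
violations = [(uid, cube) for uid, cube in picks if (uid, cube) in already_sent]

print(picks, violations)
```

The same check can be pointed at any of the other query variants by swapping the SQL text and keeping the violations test.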