SQL Group by combination? - mysql

I am having problems selecting items from a table where a device_id can be in either the from_device_id column or the to_device_id column. I am trying to return all chats where the given device ID is in the from_device_id or to_device_id column, but only return the latest message.
select chat.*, (select screen_name from usr where chat.from_device_id=usr.device_id limit 1) as from_screen_name, (select screen_name from usr where chat.to_device_id=usr.device_id limit 1) as to_screen_name from chat where to_device_id="ffffffff-af28-3427-a2bc-83865900edbe" or from_device_id="ffffffff-af28-3427-a2bc-83865900edbe" group by from_device_id, to_device_id;
+----+--------------------------------------+--------------------------------------+---------+---------------------+------------------+----------------+
| id | from_device_id | to_device_id | message | date | from_screen_name | to_screen_name |
+----+--------------------------------------+--------------------------------------+---------+---------------------+------------------+----------------+
| 20 | ffffffff-af28-3427-a2bc-83860033c587 | ffffffff-af28-3427-a2bc-83865900edbe | ee | 2011-02-28 12:36:38 | kevin | handset |
| 1 | ffffffff-af28-3427-a2bc-83865900edbe | ffffffff-af28-3427-a2bc-83860033c587 | yyy | 2011-02-27 17:43:17 | handset | kevin |
+----+--------------------------------------+--------------------------------------+---------+---------------------+------------------+----------------+
2 rows in set (0.00 sec)
As expected, two rows are returned. How can I modify this query to only return one row?
mysql> describe chat;
+----------------+---------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| from_device_id | varchar(128) | NO | | NULL | |
| to_device_id | varchar(128) | NO | | NULL | |
| message | varchar(2048) | NO | | NULL | |
| date | timestamp | YES | | CURRENT_TIMESTAMP | |
+----------------+---------------+------+-----+-------------------+----------------+
5 rows in set (0.00 sec)

select chat.*,
(select screen_name
from usr
where chat.from_device_id=usr.device_id
limit 1
) as from_screen_name,
(select screen_name
from usr
where chat.to_device_id=usr.device_id
limit 1
) as to_screen_name
from chat
where to_device_id="ffffffff-af28-3427-a2bc-83865900edbe" or
from_device_id="ffffffff-af28-3427-a2bc-83865900edbe"
group by from_device_id, to_device_id
order by date DESC
limit 1;
You need to tell SQL that it should sort the returned data by date to get the most recent chat. Then you just limit the returned rows to 1.

You shouldn't need to use a Group By at all. Rather, you can sort by date and use the Limit clause to return only the latest row. In addition, you shouldn't need subqueries, as you can simply use Joins. If chat.from_device_id and chat.to_device_id are both not-nullable and always have a matching row in usr, then you can replace the Left Joins with Inner Joins.
Select chat.id
, chat.from_device_id
, chat.to_device_id
, chat.message
, chat.date
, FromUser.screen_name As from_screen_name
, ToUser.screen_name As to_screen_name
From chat
Left Join usr As FromUser
On FromUser.device_id = chat.from_device_id
Left Join usr As ToUser
On ToUser.device_id = chat.to_device_id
Where chat.to_device_id="ffffffff-af28-3427-a2bc-83865900edbe"
Or chat.from_device_id="ffffffff-af28-3427-a2bc-83865900edbe"
Order By chat.date Desc
Limit 1
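A minimal sketch of the join-plus-limit approach above, run against an in-memory SQLite database from Python rather than MySQL (that swap, the shortened device IDs, and the helper name latest_chat are assumptions made for illustration). The extra sort on chat.id is only a tie-breaker for messages with identical timestamps.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usr (device_id TEXT, screen_name TEXT);
    CREATE TABLE chat (
        id INTEGER PRIMARY KEY,
        from_device_id TEXT NOT NULL,
        to_device_id TEXT NOT NULL,
        message TEXT NOT NULL,
        date TEXT
    );
    -- Hypothetical short device IDs replace the long UUIDs from the question.
    INSERT INTO usr VALUES ('dev-handset', 'handset'), ('dev-kevin', 'kevin');
    INSERT INTO chat VALUES
        (1, 'dev-handset', 'dev-kevin', 'yyy', '2011-02-27 17:43:17'),
        (20, 'dev-kevin', 'dev-handset', 'ee', '2011-02-28 12:36:38');
""")


def latest_chat(conn, device_id):
    """Return the most recent chat row that involves device_id, or None."""
    return conn.execute(
        """
        SELECT chat.id, chat.message, chat.date,
               FromUser.screen_name AS from_screen_name,
               ToUser.screen_name AS to_screen_name
        FROM chat
        LEFT JOIN usr AS FromUser ON FromUser.device_id = chat.from_device_id
        LEFT JOIN usr AS ToUser ON ToUser.device_id = chat.to_device_id
        WHERE chat.to_device_id = ? OR chat.from_device_id = ?
        ORDER BY chat.date DESC, chat.id DESC
        LIMIT 1
        """,
        (device_id, device_id),
    ).fetchone()


row = latest_chat(conn, "dev-handset")
print(row)
```

Binding the device ID as a parameter, instead of pasting it into the SQL string, also avoids quoting problems with user-supplied values.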

Related

Delete rows with null values mysql

I want to delete the rows with null values in the column.
How can I delete them?
SELECT employee.Name,
`department`.NUM,
SALARY
FROM employee
LEFT JOIN `department` ON employee.ID = `department`.ID
ORDER BY NUM;
+--------------------+-------+----------+
| Name | NUM | SALARY |
+--------------------+-------+----------+
| Gallegos | NULL | NULL |
| Lara | NULL | NULL |
| Kent | NULL | NULL |
| Lena | NULL | NULL |
| Flores | NULL | NULL |
| Alexandra | NULL | NULL |
| Hodge | 8001 | 973.45 |
+--------------------+-------+----------+
Should be like this
+--------------------+-------+----------+
| Name | NUM | SALARY |
+--------------------+-------+----------+
| | | |
| Hodge | 8001 | 973.45 |
+--------------------+-------+----------+
You are asking to delete, but it looks more like you want to remove the nulls from the result of a select statement. If so, use:
SELECT employee.Name,
`department`.NUM,
SALARY
FROM employee
LEFT JOIN `department` ON employee.ID = `department`.ID
WHERE (`department`.NUM IS NOT NULL AND SALARY IS NOT NULL)
ORDER BY NUM;
Note: The parentheses are not required, but it's good practice to enclose grouped comparisons for better readability.
The above query excludes a row if either column is null, for example when the NUM column is not null but the SALARY column is null, and vice versa.
If by deleting you mean that you don't want to see rows with null values in your result, you can use INNER JOIN instead of LEFT JOIN.
You use INNER JOIN when you want to return only records that have a match on both sides, and LEFT JOIN when you need all records from the "left" table, whether or not they have a match in the "right" table.
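As a rough illustration of the difference between the two join types, the sketch below runs both against a tiny in-memory SQLite database (SQLite instead of MySQL, the shortened employee list, and placing SALARY in the department table are assumptions, since the question does not show the schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ID INTEGER, Name TEXT);
    -- Assumption: NUM and SALARY both live in department.
    CREATE TABLE department (ID INTEGER, NUM INTEGER, SALARY REAL);
    INSERT INTO employee VALUES (1, 'Gallegos'), (2, 'Lara'), (3, 'Hodge');
    INSERT INTO department VALUES (3, 8001, 973.45);
""")

# LEFT JOIN keeps every employee; unmatched department columns come back as NULL.
left_rows = conn.execute("""
    SELECT employee.Name, department.NUM, department.SALARY
    FROM employee
    LEFT JOIN department ON employee.ID = department.ID
    ORDER BY employee.ID
""").fetchall()

# INNER JOIN keeps only employees that have a matching department row.
inner_rows = conn.execute("""
    SELECT employee.Name, department.NUM, department.SALARY
    FROM employee
    INNER JOIN department ON employee.ID = department.ID
    ORDER BY employee.ID
""").fetchall()

print(left_rows)
print(inner_rows)
```

Python's None is how sqlite3 reports SQL NULL, so the first result shows the padded rows and the second shows only the matched one.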

Performant way to self-join and filter by revised rows

I'm trying to select all rows in this table, with the constraint that revised rows are selected instead of the original ones. So, if a row has a revision, that revision is selected instead of that row; if there are multiple revisions, the one with the highest revision number is preferred.
I think an example table, output, and query will explain this better:
Table:
+----+-------+-------------+-----------------+-------------+
| id | value | original_id | revision_number | is_revision |
+----+-------+-------------+-----------------+-------------+
| 1 | abcd | null | null | 0 |
| 2 | zxcv | null | null | 0 |
| 3 | qwert | null | null | 0 |
| 4 | abd | 1 | 1 | 1 |
| 5 | abcde | 1 | 2 | 1 |
| 6 | zxcvb | 2 | 1 | 1 |
| 7 | poiu | null | null | 0 |
+----+-------+-------------+-----------------+-------------+
Desired Output:
+----+-------+-------------+-----------------+
| id | value | original_id | revision_number |
+----+-------+-------------+-----------------+
| 3 | qwert | null | null |
| 5 | abcde | 1 | 2 |
| 6 | zxcvb | 2 | 1 |
| 7 | poiu | null | null |
+----+-------+-------------+-----------------+
View Called revisions_max:
SELECT
responses.original_id AS original_id,
MAX(responses.revision_number) AS revision
FROM
responses
WHERE
original_id IS NOT NULL
GROUP BY responses.original_id
My Current Query:
SELECT
responses.*
FROM
responses
WHERE
id NOT IN (
SELECT
original_id
FROM
revisions_max
)
AND
is_revision = 0
UNION
SELECT
responses.*
FROM
responses
INNER JOIN revisions_max ON revisions_max.original_id = responses.original_id
AND revisions_max.revision = responses.revision_number
This query works, but takes 0.06 seconds to run on a table of only 2000 rows. This table will quickly grow to tens or hundreds of thousands of rows. The query under the union is what takes most of the time.
What can I do to improve this query's performance?
How about using coalesce()?
SELECT COALESCE(y.id, x.id) AS id,
COALESCE(y.value, x.value) AS value,
COALESCE(y.original_id, x.original_id) AS original_id,
COALESCE(y.revision_number, x.revision_number) AS revision_number
FROM responses x
LEFT JOIN (SELECT r1.*
FROM responses r1
INNER JOIN (SELECT responses.original_id AS
original_id,
Max(responses.revision_number) AS
revision
FROM responses
WHERE original_id IS NOT NULL
GROUP BY responses.original_id) rev
ON r1.original_id = rev.original_id
AND r1.revision_number = rev.revision) y
ON x.id = y.original_id
WHERE y.id IS NOT NULL
OR x.original_id IS NULL;
The approach I would take with any other DBMS is to use NOT EXISTS:
SELECT r1.*
FROM Responses AS r1
WHERE NOT EXISTS
( SELECT 1
FROM Responses AS r2
WHERE r2.original_id = COALESCE(r1.original_id, r1.id)
AND r2.revision_number > COALESCE(r1.revision_number, 0)
);
This removes any rows where a higher revision number exists for the same id (or original_id if it is populated). However, in MySQL, LEFT JOIN/IS NULL will perform better than NOT EXISTS (see note 1). As such, I would rewrite the above as:
SELECT r1.*
FROM Responses AS r1
LEFT JOIN Responses AS r2
ON r2.original_id = COALESCE(r1.original_id, r1.id)
AND r2.revision_number > COALESCE(r1.revision_number, 0)
WHERE r2.id IS NULL;
I realise that you have said that you don't want to use LEFT JOIN and check for nulls, but I don't see that there is a better solution.
1. At least this was the case historically; I don't actively use MySQL, so I don't keep up to date with developments in the optimiser.
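To check the LEFT JOIN/IS NULL anti-join against the sample table from the question, here is a small sketch using an in-memory SQLite database from Python (SQLite rather than MySQL is an assumption, and the index name idx_responses_rev is hypothetical). A composite index on (original_id, revision_number) is one plausible way to help the self-join, though whether it pays off on a real MySQL table would need measuring.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        value TEXT,
        original_id INTEGER,
        revision_number INTEGER,
        is_revision INTEGER
    );
    -- Hypothetical index covering both columns used in the join condition.
    CREATE INDEX idx_responses_rev ON responses (original_id, revision_number);
    INSERT INTO responses VALUES
        (1, 'abcd',  NULL, NULL, 0),
        (2, 'zxcv',  NULL, NULL, 0),
        (3, 'qwert', NULL, NULL, 0),
        (4, 'abd',   1,    1,    1),
        (5, 'abcde', 1,    2,    1),
        (6, 'zxcvb', 2,    1,    1),
        (7, 'poiu',  NULL, NULL, 0);
""")

# Anti-join: keep r1 only when no row r2 carries a higher revision
# for the same original (originals count as revision 0).
latest = conn.execute("""
    SELECT r1.id, r1.value, r1.original_id, r1.revision_number
    FROM responses AS r1
    LEFT JOIN responses AS r2
      ON r2.original_id = COALESCE(r1.original_id, r1.id)
     AND r2.revision_number > COALESCE(r1.revision_number, 0)
    WHERE r2.id IS NULL
    ORDER BY r1.id
""").fetchall()

for row in latest:
    print(row)
```

The rows printed match the desired output in the question: ids 3, 5, 6 and 7.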

Mysql use query result in new query

So I have two tables one called points_log and one called leaderboard.
mysql> describe points_log;
+---------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| user_id | int(11) | NO | | NULL | |
| points | int(11) | YES | | 0 | |
| date | date | NO | | NULL | |
+---------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> describe leaderboard;
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| bucket | varchar(255) | YES | | NULL | |
| user_id | int(11) | YES | | NULL | |
| school_id | int(11) | YES | | NULL | |
+-----------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
I have the following query:
SELECT leaderboard.user_id FROM leaderboard where
leaderboard.bucket=(SELECT bucket FROM leaderboard WHERE leaderboard.user_id=$user_id) AND
leaderboard.school_id = (SELECT school_id FROM leaderboard WHERE leaderboard.user_id=$user_id)
This will return one or more rows with the user_ids that are in the same bucket as the $user_id passed in. What I want to do is take all of those user_ids and run the following query for each of them:
SELECT sum(points) FROM points_log WHERE user_id=$user_id AND
date >= (SELECT subdate(curdate(), INTERVAL (weekday(now())) DAY))
The issue is that this second query is not guaranteed to return anything, so in the case that it doesn't, I want sum(points) to be 0. I also need to return the user_id, bucket, and sum(points) for each row.
Right now what I have is
SELECT leaderboard.user_id,sum(points_log.points) AS points, leaderboard.bucket
FROM points_log LEFT JOIN leaderboard ON points_log.user_id = leaderboard.user_id
WHERE points_log.DATE >= (SELECT subdate(curdate(), INTERVAL (weekday(now())) DAY))
AND leaderboard.bucket=(SELECT bucket FROM leaderboard WHERE leaderboard.user_id=$user_id)
AND leaderboard.school_id = (SELECT school_id FROM leaderboard WHERE leaderboard.user_id=$user_id)
GROUP BY USER_ID ORDER BY SUM(points) DESC
The issue with this is that it only works when there is a value in points_log for that user. I'm unsure how to make it default to 0 if there is no value.
Any help is greatly appreciated!
SELECT leaderboard.user_id, COALESCE(SUM(points_log.points), 0) AS points, leaderboard.bucket
FROM points_log RIGHT OUTER JOIN leaderboard ON points_log.user_id = leaderboard.user_id
AND points_log.DATE >= (SELECT subdate(curdate(), INTERVAL (weekday(now())) DAY))
WHERE leaderboard.bucket=(SELECT bucket FROM leaderboard WHERE leaderboard.user_id=$user_id)
AND leaderboard.school_id = (SELECT school_id FROM leaderboard WHERE leaderboard.user_id=$user_id)
GROUP BY leaderboard.user_id, leaderboard.bucket ORDER BY points DESC
Try this... note the Outer Join and the COALESCE function. The date condition sits in the ON clause rather than the WHERE clause, because a WHERE filter on points_log would discard the users who have no rows there, which defeats the outer join.
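The sketch below shows the same idea (outer join from leaderboard, COALESCE around the sum, date filter inside the join condition) on an in-memory SQLite database. SQLite instead of MySQL, the sample rows, the helper name weekly_points, and passing the week start as a parameter instead of computing it with subdate() are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points_log (user_id INTEGER NOT NULL, points INTEGER DEFAULT 0, date TEXT NOT NULL);
    CREATE TABLE leaderboard (bucket TEXT, user_id INTEGER, school_id INTEGER);
    INSERT INTO leaderboard VALUES ('A', 1, 10), ('A', 2, 10), ('B', 3, 10);
    INSERT INTO points_log VALUES
        (1, 5,   '2024-01-08'),
        (1, 7,   '2024-01-09'),
        (1, 100, '2024-01-01'),  -- before the week start, must be ignored
        (2, 50,  '2024-01-02');  -- user 2 has no points in the current week
""")


def weekly_points(conn, user_id, week_start):
    """Points since week_start for everyone sharing user_id's bucket and school."""
    return conn.execute(
        """
        SELECT lb.user_id, COALESCE(SUM(pl.points), 0) AS points, lb.bucket
        FROM leaderboard AS lb
        LEFT JOIN points_log AS pl
          ON pl.user_id = lb.user_id
         AND pl.date >= ?          -- filter in ON keeps users without rows
        WHERE lb.bucket = (SELECT bucket FROM leaderboard WHERE user_id = ?)
          AND lb.school_id = (SELECT school_id FROM leaderboard WHERE user_id = ?)
        GROUP BY lb.user_id, lb.bucket
        ORDER BY points DESC, lb.user_id
        """,
        (week_start, user_id, user_id),
    ).fetchall()


rows = weekly_points(conn, 1, "2024-01-08")
print(rows)
```

User 2 appears with 0 points because the unmatched join row contributes NULL to SUM, and COALESCE turns that NULL into 0.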

Get all rows from a table for a particular user along with sum

I have a table called real_estate. Its structure and data are as follows:
| id | user_id | details | location | worth
| 1 | 1 | Null | Null | 10000000
| 2 | 1 | Null | Null | 20000000
| 3 | 2 | Null | Null | 10000000
My query is the following:
SELECT * , SUM( worth ) as sum
FROM real_estate
WHERE user_id = '1'
The result which I get from this query is
| id | user_id | details | location | worth | sum
| 1 | 1 | Null | Null | 10000000 | 30000000
I want result to be like
| id | user_id | details | location | worth | sum
| 1 | 1 | Null | Null | 10000000 | 30000000
| 2 | 1 | Null | Null | 20000000 | 30000000
Is there any way to get the result the way I want or should I write 2 different queries?
1)To get the sum of worth
2)To get all the rows for that user
You need to use a subquery that calculates the sum for every user, and then JOIN the result of the subquery with your table:
SELECT real_estate.*, s.user_sum
FROM
real_estate INNER JOIN (SELECT user_id, SUM(worth) AS user_sum
FROM real_estate
GROUP BY user_id) s
ON real_estate.user_id = s.user_id
WHERE
real_estate.user_id = '1'
but if you just need to return records for a single user, you could use this:
SELECT
real_estate.*,
(SELECT SUM(worth) FROM real_estate WHERE user_id='1') AS user_sum
FROM
real_estate
WHERE
user_id='1'
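For a quick check of the join-against-aggregate approach, here is a sketch on an in-memory SQLite database populated with the rows from the question (using SQLite instead of MySQL and an integer user_id parameter are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE real_estate (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        details TEXT,
        location TEXT,
        worth INTEGER
    );
    INSERT INTO real_estate VALUES
        (1, 1, NULL, NULL, 10000000),
        (2, 1, NULL, NULL, 20000000),
        (3, 2, NULL, NULL, 10000000);
""")

# Join every row to its user's total; the WHERE column is qualified
# because user_id exists on both sides of the join.
rows = conn.execute(
    """
    SELECT real_estate.*, s.user_sum
    FROM real_estate
    INNER JOIN (SELECT user_id, SUM(worth) AS user_sum
                FROM real_estate
                GROUP BY user_id) AS s
      ON real_estate.user_id = s.user_id
    WHERE real_estate.user_id = ?
    ORDER BY real_estate.id
    """,
    (1,),
).fetchall()

for row in rows:
    print(row)
```

Both rows for user 1 come back, each carrying the same total of 30000000.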
You can do your sum in a subquery like this
SELECT * , (select SUM(worth) from real_estate WHERE user_id = '1' ) as sum
FROM real_estate WHERE user_id = '1'
Group by id
SELECT * , SUM( worth ) as sum FROM real_estate WHERE user_id = '1' group by id

Optimize MySQL nested select with arithmetic operation

I have this SQL query running on a non-normalized table in MySQL 5.1. It works the way I want it to, but it can be quite slow. I added an index on the day column, but it still needs to be faster. Any suggestions on how to make it faster? (Maybe with a join instead?)
SELECT DISTINCT(bucket) AS b,
(possible_free_slots -
(SELECT COUNT(availability)
FROM ip_bucket_list
WHERE bucket = b
AND availability = 'used'
AND tday = 'evening'
AND day LIKE '2012-12-14%'
AND network = '10_83_mh1_bucket')) AS free_slots
FROM ip_bucket_list
ORDER BY free_slots DESC;
The individual queries are fast:
SELECT DISTINCT(bucket) FROM ip_bucket_list;
1024 rows in set (0.05 sec)
SELECT COUNT(availability) from ip_bucket_list WHERE bucket = 0 AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket';
1 row in set (0.00 sec)
Table:
mysql> describe ip_bucket_list;
+---------------------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ip | varchar(50) | YES | | NULL | |
| bucket | int(11) | NO | MUL | NULL | |
| availability | varchar(20) | YES | | NULL | |
| network | varchar(100) | NO | MUL | NULL | |
| possible_free_slots | int(11) | NO | | NULL | |
| tday | varchar(20) | YES | | NULL | |
| day | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+---------------------+--------------+------+-----+-------------------+----------------+
and the DESC:
DESC SELECT DISTINCT(bucket) as b,(possible_free_slots - (SELECT COUNT(availability) from ip_bucket_list WHERE bucket = b AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket')) as free_slots FROM ip_bucket_list ORDER BY free_slots DESC;
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| 1 | PRIMARY | ip_bucket_list | ALL | NULL | NULL | NULL | NULL | 328354 | Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | ip_bucket_list | ref | bucket,network,ip_bucket_list_day_index | bucket | 4 | func | 161 | Using where |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
I would move the correlated subquery from the SELECT clause into the FROM clause, using a join:
SELECT distinct ipbl.bucket as b,
(ipbl.possible_free_slots - coalesce(a.avail, 0)) as free_slots
FROM ip_bucket_list ipbl left outer join
(SELECT bucket, COUNT(availability) as avail
from ip_bucket_list
WHERE availability = 'used' AND tday = 'evening' AND
day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
GROUP BY bucket
) a
on ipbl.bucket = a.bucket
ORDER BY free_slots DESC;
The version in the SELECT clause is probably being re-run for every row (even before the distinct is running). By putting it in the from clause, the ip_bucket_list table will be scanned only once.
Also, if you are expecting each bucket to only show up once, then I would recommend that you use group by rather than distinct. It would clarify the purpose of the query. You may be able to eliminate the second reference to the table altogether, with something like:
SELECT bucket as b,
max(possible_free_slots) -
sum(case when availability = 'used' AND tday = 'evening' AND
day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
then 1 else 0
end) as free_slots
FROM ip_bucket_list
group by bucket
ORDER BY free_slots DESC;
To speed up your version of the query, you need an index on bucket, because this is used for the correlated subquery.
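A small sketch of the single-pass idea (group by bucket and count the matching rows with a conditional sum), run on an in-memory SQLite database. SQLite instead of MySQL, the sample rows, and the half-open date range used in place of LIKE '2012-12-14%' are assumptions; the range form is a swap made here because it compares the column directly, and it presumes possible_free_slots is the same for every row of a bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ip_bucket_list (
        id INTEGER PRIMARY KEY,
        bucket INTEGER NOT NULL,
        availability TEXT,
        network TEXT NOT NULL,
        possible_free_slots INTEGER NOT NULL,
        tday TEXT,
        day TEXT NOT NULL
    );
    INSERT INTO ip_bucket_list
        (bucket, availability, network, possible_free_slots, tday, day)
    VALUES
        (0, 'used', '10_83_mh1_bucket', 3, 'evening', '2012-12-14 19:00:00'),
        (0, 'used', '10_83_mh1_bucket', 3, 'evening', '2012-12-14 20:30:00'),
        (0, 'free', '10_83_mh1_bucket', 3, 'evening', '2012-12-14 21:00:00'),
        (1, 'used', '10_83_mh1_bucket', 3, 'morning', '2012-12-14 08:00:00'),
        (1, 'used', '10_83_mh1_bucket', 3, 'evening', '2012-12-15 19:00:00');
""")

# One scan of the table: each row adds 1 to the sum only if it is a
# used evening slot on the requested day and network.
free_slots = conn.execute(
    """
    SELECT bucket,
           MAX(possible_free_slots)
             - SUM(CASE WHEN availability = 'used'
                         AND tday = 'evening'
                         AND day >= ? AND day < ?
                         AND network = ?
                        THEN 1 ELSE 0 END) AS free_slots
    FROM ip_bucket_list
    GROUP BY bucket
    ORDER BY free_slots DESC, bucket
    """,
    ("2012-12-14", "2012-12-15", "10_83_mh1_bucket"),
).fetchall()

print(free_slots)
```

Bucket 0 has two used evening slots on that day, so one of its three slots remains; bucket 1 has none that match, so all three remain.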
Try moving the subquery into the main query - like so:
SELECT b.bucket AS b,
b.possible_free_slots - COUNT(l.availability) AS free_slots
FROM (SELECT DISTINCT bucket, possible_free_slots FROM ip_bucket_list) b
LEFT JOIN ip_bucket_list l
ON l.bucket = b.bucket
AND l.availability = 'used'
AND l.tday = 'evening'
AND l.day LIKE '2012-12-14%'
AND l.network = '10_83_mh1_bucket'
GROUP BY b.bucket, b.possible_free_slots
ORDER BY 2 DESC