MySQL subquery: select the first row for each group

I need to create a MySQL stored procedure that selects, for each User, the SUM of all the Points they've earned.
The query should group Game by StartDate and select only the first row of each group, ordered by Points. I'm trying to ignore duplicate StartDate values for each User but still keep the first one. This should prevent cheating if the User saves the same game twice.
If the User hasn't played any Game, the User should still be returned, with NULL as the value.
CREATE PROCEDURE `spGetPoints`(
IN _StartDate DATETIME,
IN _EndDate DATETIME,
IN _Limit INT,
IN _Offset INT
)
BEGIN
SELECT `User`.`UserId`, `User`.`Username`,
(SELECT SUM(`Game`.`Points`)
FROM `Game`
WHERE `Game`.`UserId` = `User`.`UserId` AND
`Game`.`StartDate` > _StartDate AND `Game`.`StartDate` < _EndDate
GROUP BY `Game`.`StartDate`
ORDER BY `Game`.`Points` DESC
LIMIT 1
) AS `Value`
FROM `User`
ORDER BY `Value` DESC, `User`.`Username` ASC
LIMIT _Limit OFFSET _Offset;
END
Sample User Table
+--------+----------+
| UserId | Username |
+--------+----------+
| 1 | JaneDoe |
| 2 | JohnDoe |
+--------+----------+
Sample Game Table
+--------+--------+-------------------------+--------+
| GameId | UserId | StartDate | Points |
+--------+--------+-------------------------+--------+
| 1 | 1 | 2019-01-09 12:43:00 AM | 1789 |
| 2 | 1 | 2019-01-09 11:35:00 AM | 1048 |
| 3 | 1 | 2019-01-09 9:22:00 AM | 900 |
| 4 | 1 | 2019-01-09 12:43:00 AM | 1789 |
| 5 | 1 | 2019-01-09 11:35:00 AM | 1048 |
| 6 | 1 | 2019-01-09 9:22:00 AM | 900 |
| 7 | 1 | 2019-01-09 12:43:00 AM | 1789 |
| 8 | 1 | 2019-01-09 11:35:00 AM | 1048 |
| 9 | 2 | 2019-01-17 12:05:00 AM | 552 |
| 10 | 2 | 2019-01-24 12:08:00 AM | 512 |
| 11 | 2 | 2019-01-27 5:13:00 PM | 0 |
+--------+--------+-------------------------+--------+
Current Result
+--------+----------+-------+
| UserId | Username | Value |
+--------+----------+-------+
| 1 | JaneDoe | 5367 |
| 2 | JohnDoe | 552 |
+--------+----------+-------+
Expected Result
+--------+----------+-------+
| UserId | Username | Value |
+--------+----------+-------+
| 1 | JaneDoe | 3737 |
| 2 | JohnDoe | 1064 |
+--------+----------+-------+
I was able to get the expected result with the following statement by selecting the SUM from a subquery and hardcoding the UserId.
SELECT SUM(`x`.`Points`) FROM
(SELECT `Points`
FROM `Game`
WHERE `Game`.`UserId` = 1 AND
`Game`.`StartDate` > STR_TO_DATE('01/09/2019', '%m/%d/%Y') AND `Game`.`StartDate` < STR_TO_DATE('02/09/2019', '%m/%d/%Y')
GROUP BY `Game`.`StartDate`
ORDER BY `Game`.`Points` ASC) AS `x`;
When I put that statement in a subquery, as in the following statement, I get this error: Error Code: 1054. Unknown column 'User.UserId' in 'where clause'. The error occurs because `User`.`UserId` isn't visible inside the nested derived table (before MySQL 8.0.14, a derived table cannot reference columns of the outer query).
SELECT `User`.`UserId`, `User`.`Username`,
(SELECT SUM(`x`.`Points`) FROM (SELECT `Game`.`Points`
FROM `Game`
WHERE `Game`.`UserId` = `User`.`UserId` AND
`Game`.`StartDate` > STR_TO_DATE('01/09/2019', '%m/%d/%Y') AND `Game`.`StartDate` < STR_TO_DATE('02/09/2019', '%m/%d/%Y')
GROUP BY `Game`.`StartDate`
ORDER BY `Game`.`Points` DESC) AS `x`
) AS `Value`
FROM `User`
ORDER BY `Value` DESC, `User`.`Username` ASC;

I changed the query to use a LEFT JOIN on a derived Game table. I also added GROUP BY `Game`.`UserId`, `Game`.`StartDate` inside the derived table and GROUP BY `User`.`UserId` in the outer query.
CREATE PROCEDURE `spGetPoints`(
IN _StartDate DATETIME,
IN _EndDate DATETIME,
IN _Limit INT,
IN _Offset INT
)
BEGIN
SELECT `User`.`UserId`, `User`.`Username`,
SUM(`Game`.`Points`) AS `Value`
FROM `User`
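LEFT JOIN (
-- one row per (UserId, StartDate), so a game saved twice is counted once;
-- aggregating Points (rather than SELECT *) keeps this valid under ONLY_FULL_GROUP_BY
SELECT `Game`.`UserId`, `Game`.`StartDate`, MAX(`Game`.`Points`) AS `Points`
FROM `Game`
WHERE `Game`.`StartDate` > _StartDate AND `Game`.`StartDate` < _EndDate
GROUP BY `Game`.`UserId`, `Game`.`StartDate`
) AS `Game` ON `User`.`UserId` = `Game`.`UserId`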
GROUP BY `User`.`UserId`
ORDER BY `Value` DESC, `User`.`Username` ASC
LIMIT _Limit OFFSET _Offset;
END
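A quick way to sanity-check the dedupe-then-sum logic is to run it against the sample tables. The sketch below is a minimal illustration, not the procedure itself: it uses Python's built-in sqlite3 module as a stand-in for MySQL, inlines the query with the procedure parameters as bound parameters, rewrites the 12-hour sample times as 24-hour strings, and adds a hypothetical third user, NoGames, only to show the NULL case.

```python
import sqlite3

def get_points(conn, start_date, end_date, limit, offset):
    # Derived table: one row per (UserId, StartDate), so a game saved several
    # times is counted once. LEFT JOIN keeps users with no games (SUM -> None).
    sql = """
        SELECT u.UserId, u.Username, SUM(g.Points) AS Value
        FROM "User" AS u
        LEFT JOIN (
            SELECT UserId, StartDate, MAX(Points) AS Points
            FROM Game
            WHERE StartDate > ? AND StartDate < ?
            GROUP BY UserId, StartDate
        ) AS g ON u.UserId = g.UserId
        GROUP BY u.UserId, u.Username
        ORDER BY Value DESC, u.Username ASC
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (start_date, end_date, limit, offset)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "User" (UserId INTEGER PRIMARY KEY, Username TEXT)')
conn.execute("CREATE TABLE Game (GameId INTEGER PRIMARY KEY, UserId INTEGER, "
             "StartDate TEXT, Points INTEGER)")
conn.executemany('INSERT INTO "User" VALUES (?, ?)',
                 [(1, "JaneDoe"), (2, "JohnDoe"), (3, "NoGames")])  # user 3 is hypothetical
conn.executemany("INSERT INTO Game VALUES (?, ?, ?, ?)", [
    (1, 1, "2019-01-09 00:43:00", 1789), (2, 1, "2019-01-09 11:35:00", 1048),
    (3, 1, "2019-01-09 09:22:00", 900), (4, 1, "2019-01-09 00:43:00", 1789),
    (5, 1, "2019-01-09 11:35:00", 1048), (6, 1, "2019-01-09 09:22:00", 900),
    (7, 1, "2019-01-09 00:43:00", 1789), (8, 1, "2019-01-09 11:35:00", 1048),
    (9, 2, "2019-01-17 00:05:00", 552), (10, 2, "2019-01-24 00:08:00", 512),
    (11, 2, "2019-01-27 17:13:00", 0),
])

rows = get_points(conn, "2019-01-01 00:00:00", "2019-02-01 00:00:00", 10, 0)
print(rows)  # [(1, 'JaneDoe', 3737), (2, 'JohnDoe', 1064), (3, 'NoGames', None)]
```

Picking MAX(Points) per StartDate matches the "highest Points first" ordering from the question. Because duplicate saves carry identical Points in the sample, any row of the group would give the same total.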
This question also helped: "Select first row in each GROUP BY group?"

Related

Recover the missing hour/minute intervals in a MySQL database

This is part of my table in a MySQL database:
+----------+---------------------+--------+
| sID | sDatetime | sETX |
+----------+---------------------+--------+
| 16213404 | 2020-04-24 16:00:00 | 497681 |
| 16213398 | 2020-04-20 14:58:56 | 281011 |
+----------+---------------------+--------+
This table contains 14,121,398 records.
I noticed that, in this case, more than one hour passed between the previous row and the next one:
mysql> SELECT
TIMEDIFF(
'2020-04-20 16:00:00',
'2020-04-20 14:58:56'
);
+------------------------------------------------------------------+
| TIMEDIFF(
'2020-04-20 16:00:00',
'2020-04-20 14:58:56'
) |
+------------------------------------------------------------------+
| 01:01:04 |
+------------------------------------------------------------------+
1 row in set
This should not be possible, because data is downloaded from the source at least every five minutes.
In this case, the time slot between 3 PM and 4 PM is missing.
I tried the query below without success: it returns all zeros, I think because the sID values are not consecutive.
Here is the code I tried:
SELECT A.`sID`, A.`sDatetime`, (B.`sDatetime` - A.`sDatetime`) AS timedifference
FROM tbl_2020 A INNER JOIN tbl_2020 B ON B.sID = (A.sID + 1)
ORDER BY A.sID ASC;
How can I find these anomalies in the MySQL table?
My MySQL version is 5.5.62-log.
The column is named sDatetime and its type is DATETIME.
Any suggestions?
Thanks in advance for any help.
edit #01
+----------+-----------+---------------------+
| sID | time_diff | sDatetime |
+----------+-----------+---------------------+
| 18389322 | 301 | 2020-05-16 23:53:29 |
| 18390472 | 308 | 2020-05-16 23:48:21 |
| 18389544 | 301 | 2020-05-16 23:43:20 |
| 18388687 | 303 | 2020-05-16 23:38:17 |
| 18388398 | 301 | 2020-05-16 23:33:16 |
| 18390451 | 308 | 2020-05-16 23:28:08 |
| 18388915 | 302 | 2020-05-16 23:23:06 |
| 18388208 | 301 | 2020-05-16 23:18:05 |
| 18390516 | 301 | 2020-05-16 23:13:04 |
| 18389904 | 301 | 2020-05-16 23:08:03 |
+----------+-----------+---------------------+
mysql> SELECT
TIMEDIFF(
'2020-05-16 23:53:29',
'2020-05-16 23:48:21'
) AS td;
+----------+
| td |
+----------+
| 00:05:08 |
+----------+
1 row in set
You should try something like this
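SELECT
sID
,TIME_TO_SEC(TIMEDIFF(@date,sDatetime)) time_diff -- seconds between the previously visited (later) row and this one
,@date := sDatetime -- remember this row's time for the next row
,sETX
FROM(
SELECT * FROM table1
ORDER BY sDatetime DESC) s1,(SELECT @date :=(SELECT MAX(sDatetime) FROM table1)) s2 -- start from the newest timestamp
-- note: MySQL does not guarantee the evaluation order of user variables within one SELECT
HAVING time_diff > 300;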
First order the table by time, then compute the time difference between each pair of consecutive rows and check whether it is larger than 5 minutes (300 seconds).
See the example here: https://www.db-fiddle.com/f/2yKt6d5RWngXVYJKPGZL6m/8
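The idea in that answer (sort by time, then compare each row with the previous one) can be sketched outside MySQL in a few lines of Python. This is only an illustration using the sDatetime values from edit #01. The function name `consecutive_gaps` and the threshold values are assumptions. The sketch also shows a pitfall: with a 300-second threshold, every normal interval (301 to 308 seconds) is flagged, so finding genuinely missing slots needs a looser tolerance.

```python
from datetime import datetime

def consecutive_gaps(timestamps, threshold_s):
    """Return (earlier, later, seconds) for consecutive readings more than threshold_s apart."""
    # Order by time, not by sID: the sID values are not in chronological order.
    ordered = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        seconds = int((later - earlier).total_seconds())
        if seconds > threshold_s:
            gaps.append((earlier, later, seconds))
    return gaps

# sDatetime values from edit #01
edit01 = [
    "2020-05-16 23:53:29", "2020-05-16 23:48:21", "2020-05-16 23:43:20",
    "2020-05-16 23:38:17", "2020-05-16 23:33:16", "2020-05-16 23:28:08",
    "2020-05-16 23:23:06", "2020-05-16 23:18:05", "2020-05-16 23:13:04",
    "2020-05-16 23:08:03",
]

print(len(consecutive_gaps(edit01, 300)))  # 9: every ~5-minute interval exceeds 300 s
print(consecutive_gaps(edit01, 600))       # []: no real gap in this excerpt
```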
Comparing the current row to the previous one works:
drop table if exists t;
create table t
(sID int, sDatetime datetime, sETX int);
insert into t values
( 16213404 , '2020-04-24 16:00:00' , 497681),
( 16213398 , '2020-04-20 14:58:56' , 281011);
select sid,sdatetime,(select sdatetime from t t1 where t1.sid < t.sid order by t1.sid desc limit 1) prevdt,
time_to_sec(sdatetime) - time_to_sec((select sdatetime from t t1 where t1.sid < t.sid order by t1.sid desc limit 1)) diff
from t
where time_to_sec(sdatetime) - time_to_sec((select sdatetime from t t1 where t1.sid < t.sid order by t1.sid desc limit 1)) > 300;
+----------+---------------------+---------------------+------+
| sid | sdatetime | prevdt | diff |
+----------+---------------------+---------------------+------+
| 16213404 | 2020-04-24 16:00:00 | 2020-04-20 14:58:56 | 3664 |
+----------+---------------------+---------------------+------+
1 row in set (0.002 sec)
If this is too slow, add your table definition so that we can see which indexes you have.
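Two details of the query above may matter on the real data. First, it orders by sID, which edit #01 shows is not chronological. Second, TIME_TO_SEC() uses only the time-of-day part of a DATETIME, so the 3664 shown ignores the four days between the two sample rows. Below is a minimal sketch of a variant that looks up the previous row by sDatetime and measures the full difference. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL; in MySQL 5.5 the difference would be written with TIMESTAMPDIFF(SECOND, prev, sDatetime). The 600-second threshold is an assumed tolerance, and the table keeps only sID and sDatetime.

```python
import sqlite3

def find_gaps(conn, threshold_s):
    # For each row, fetch the latest earlier sDatetime with a correlated
    # subquery (the same pattern works on MySQL 5.5, which has no LAG()),
    # then keep rows whose distance to it exceeds threshold_s seconds.
    # The earliest row has no predecessor: prev and gap are NULL, so WHERE drops it.
    sql = """
        SELECT sID, sDatetime, prev, gap
        FROM (
            SELECT sID, sDatetime, prev,
                   CAST(strftime('%s', sDatetime) AS INTEGER)
                 - CAST(strftime('%s', prev) AS INTEGER) AS gap
            FROM (
                SELECT t.sID, t.sDatetime,
                       (SELECT MAX(p.sDatetime) FROM tbl_2020 AS p
                        WHERE p.sDatetime < t.sDatetime) AS prev
                FROM tbl_2020 AS t
            ) AS with_prev
        ) AS with_gap
        WHERE gap > ?
        ORDER BY sDatetime
    """
    return conn.execute(sql, (threshold_s,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_2020 (sID INTEGER PRIMARY KEY, sDatetime TEXT)")
conn.execute("CREATE INDEX idx_sdatetime ON tbl_2020 (sDatetime)")  # lets MAX(...) WHERE < use the index
conn.executemany("INSERT INTO tbl_2020 VALUES (?, ?)", [
    (16213404, "2020-04-24 16:00:00"), (16213398, "2020-04-20 14:58:56"),
    (18389322, "2020-05-16 23:53:29"), (18390472, "2020-05-16 23:48:21"),
    (18389544, "2020-05-16 23:43:20"), (18388687, "2020-05-16 23:38:17"),
    (18388398, "2020-05-16 23:33:16"), (18390451, "2020-05-16 23:28:08"),
    (18388915, "2020-05-16 23:23:06"), (18388208, "2020-05-16 23:18:05"),
    (18390516, "2020-05-16 23:13:04"), (18389904, "2020-05-16 23:08:03"),
])

for row in find_gaps(conn, 600):
    print(row)
```

With these rows, only the two jumps between the sample excerpts exceed the threshold. The normal ~5-minute spacing inside edit #01 is not reported.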

Find and Delete Duplicate rows in MySQL

I'm having trouble finding duplicates in a database table with the following setup:
==========================================================================
| stock_id | product_id | store_id | stock_qty | updated_at |
==========================================================================
| 9990 | 51 | 1 | 13 | 2014-10-25 16:30:01 |
| 9991 | 90 | 2 | 5 | 2014-10-25 16:30:01 |
| 9992 | 161 | 1 | 3 | 2014-10-25 16:30:01 |
| 9993 | 254 | 1 | 18 | 2014-10-25 16:30:01 |
| 9994 | 284 | 2 | 12 | 2014-10-25 16:30:01 |
| 9995 | 51 | 1 | 11 | 2014-10-25 17:30:02 |
| 9996 | 90 | 2 | 5 | 2014-10-25 17:30:02 |
| 9997 | 161 | 1 | 3 | 2014-10-25 17:30:02 |
| 9998 | 254 | 1 | 16 | 2014-10-25 17:30:02 |
| 9999 | 284 | 2 | 12 | 2014-10-25 17:30:02 |
==========================================================================
Stock updates are imported into this table every hour. I'm trying to find duplicate stock entries (any rows with a matching product_id and store_id) so I can delete the older ones. The query below is my attempt: by comparing product IDs and store IDs in a self join like this, I can find one set of duplicates:
SELECT s.`stock_id`, s.`product_id`, s.`store_id`, s.`stock_qty`, s.`updated_at`
FROM `stock` s
INNER JOIN `stock` j ON s.`product_id`=j.`product_id` AND s.`store_id`=j.`store_id`
GROUP BY `stock_id`
HAVING COUNT(*) > 1
ORDER BY s.updated_at DESC, s.product_id ASC, s.store_id ASC, s.stock_id ASC;
While this query works, it doesn't find ALL duplicates, only one set. That means that if an import goes awry and isn't noticed until the morning, we could be left with tons of duplicate stock entries. My MySQL skills are sadly lacking, and I'm at a complete loss about how to find and delete all duplicates in a fast, reliable manner.
Any help or ideas are welcome. Thanks
You can use this query:
DELETE st FROM stock st, stock st2
WHERE st.stock_id < st2.stock_id AND st.product_id = st2.product_id AND
st.store_id = st2.store_id;
This query deletes the older records that share a product_id and store_id with a newer one, and keeps only the latest record (the one with the highest stock_id) for each pair.
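The same rule can be tried on the sample rows from the question. This is a minimal sketch that uses Python's sqlite3 as a stand-in for MySQL. SQLite has no multi-table DELETE, so the rule is written as a correlated EXISTS: delete a row if another row with the same product_id and store_id has a larger stock_id. (In MySQL this EXISTS form is rejected with error 1093, because a DELETE cannot select from its own target table in a subquery. That is why the answer uses the multi-table form.) Like the answer, the sketch assumes that a larger stock_id means a newer import.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stock (
    stock_id INTEGER PRIMARY KEY, product_id INTEGER, store_id INTEGER,
    stock_qty INTEGER, updated_at TEXT)""")
conn.executemany("INSERT INTO stock VALUES (?, ?, ?, ?, ?)", [
    (9990, 51, 1, 13, "2014-10-25 16:30:01"), (9991, 90, 2, 5, "2014-10-25 16:30:01"),
    (9992, 161, 1, 3, "2014-10-25 16:30:01"), (9993, 254, 1, 18, "2014-10-25 16:30:01"),
    (9994, 284, 2, 12, "2014-10-25 16:30:01"), (9995, 51, 1, 11, "2014-10-25 17:30:02"),
    (9996, 90, 2, 5, "2014-10-25 17:30:02"), (9997, 161, 1, 3, "2014-10-25 17:30:02"),
    (9998, 254, 1, 16, "2014-10-25 17:30:02"), (9999, 284, 2, 12, "2014-10-25 17:30:02"),
])

# List every duplicated (product_id, store_id) pair, not just one set.
dupes = conn.execute("""
    SELECT product_id, store_id, COUNT(*) AS copies
    FROM stock GROUP BY product_id, store_id HAVING COUNT(*) > 1
    ORDER BY product_id, store_id""").fetchall()

# Delete every row for which a newer row (higher stock_id) exists for the same pair.
conn.execute("""
    DELETE FROM stock
    WHERE EXISTS (SELECT 1 FROM stock AS newer
                  WHERE newer.product_id = stock.product_id
                    AND newer.store_id = stock.store_id
                    AND newer.stock_id > stock.stock_id)""")

remaining = [r[0] for r in conn.execute("SELECT stock_id FROM stock ORDER BY stock_id")]
print(dupes)      # [(51, 1, 2), (90, 2, 2), (161, 1, 2), (254, 1, 2), (284, 2, 2)]
print(remaining)  # [9995, 9996, 9997, 9998, 9999]
```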
A self join on store_id, product_id and 'is older', combined with DISTINCT, should give you all rows for which a newer version also exists:
> SHOW CREATE TABLE stock;
CREATE TABLE `stock` (
`stock_id` int(11) NOT NULL,
`product_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`stock_qty` int(11) DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`stock_id`)
)
> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at |
+----------+------------+----------+-----------+---------------------+
| 1 | 1 | 1 | 1 | 2001-01-01 12:00:00 |
| 2 | 2 | 2 | 1 | 2001-01-01 12:00:00 |
| 3 | 2 | 2 | 1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+
> SELECT DISTINCT s1.stock_id, s1.store_id, s1.product_id, s1.updated_at
FROM stock s1 JOIN stock s2
ON s1.store_id = s2.store_id
AND s1.product_id = s2.product_id
AND s1.updated_at < s2.updated_at;
+----------+----------+------------+---------------------+
| stock_id | store_id | product_id | updated_at |
+----------+----------+------------+---------------------+
| 2 | 2 | 2 | 2001-01-01 12:00:00 |
+----------+----------+------------+---------------------+
> DELETE stock FROM stock
JOIN stock s2 ON stock.store_id = s2.store_id
AND stock.product_id = s2.product_id
AND stock.updated_at < s2.updated_at;
Query OK, 1 row affected (0.02 sec)
> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at |
+----------+------------+----------+-----------+---------------------+
| 1 | 1 | 1 | 1 | 2001-01-01 12:00:00 |
| 3 | 2 | 2 | 1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+
Or you can use a stored procedure:
DELIMITER //
DROP PROCEDURE IF EXISTS removeDuplicates //
CREATE PROCEDURE removeDuplicates(
stockID INT
)
BEGIN
DECLARE stockToKeep INT;
DECLARE storeID INT;
DECLARE productID INT;
-- gets the store and product value
SELECT DISTINCT store_id, product_id
FROM stock
WHERE stock_id = stockID
LIMIT 1
INTO
storeID, productID;
SELECT stock_id
FROM stock
WHERE product_id = productID AND store_id = storeID
ORDER BY updated_at DESC
LIMIT 1
INTO
stockToKeep;
DELETE FROM stock
WHERE product_id = productID AND store_id = storeID
AND stock_id != stockToKeep;
END //
DELIMITER ;
Afterwards, call it for every stock_id (and therefore for every product_id/store_id pair) from a cursor procedure:
DELIMITER //
CREATE PROCEDURE updateTable() BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE stockID INT UNSIGNED;
DECLARE cur CURSOR FOR SELECT DISTINCT stock_id FROM stock;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;
OPEN cur;
testLoop: LOOP
FETCH cur INTO stockID;
IF done THEN
LEAVE testLoop;
END IF;
CALL removeDuplicates(stockID);
END LOOP testLoop;
CLOSE cur;
END//
DELIMITER ;
Then just call the second procedure:
CALL updateTable();

SQL Help: Why is the total from this query different from a summation query?

This query does a group by on lead_source_id:
SELECT ch.lead_source_id,
Count(DISTINCT ch.repurchased_date)
FROM customers_history ch
WHERE ch.repurchased_date >= '2014-04-01'
AND ch.repurchased_date < '2014-05-01'
AND ch.lead_source_id IS NOT NULL
GROUP BY ch.lead_source_id;
And this query totals the records in the table:
SELECT Count(DISTINCT( repurchased_date ))
FROM customers_history
INNER JOIN (SELECT DISTINCT( customer_id ) AS xcid
FROM customers_history
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL) AS Temp
ON Temp.xcid = customer_id
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL;
On our production data, the per-group counts from the first query add up to 7963, but the second query returns 7905. Why the difference, and how can we fix our queries?
Here's our table layout:
+--------+-------------+----------------+---------------------+--------+
| id | customer_id | lead_source_id | repurchased_date | Rating |
+--------+-------------+----------------+---------------------+--------+
| 422923 | 420450 | 4 | 2014-04-14 09:16:48 | Warm |
| 422924 | 420450 | 4 | 2014-04-14 09:16:48 | Cold |
| 422956 | 420450 | 4 | 2014-04-14 09:16:49 | Hot |
| 422933 | 420451 | 37 | 2014-04-14 09:18:41 | Hot |
| 422938 | 420452 | 1 | 2014-04-10 20:50:30 | Hot |
| 422984 | 420452 | 1 | 2014-04-12 20:50:30 | Warm |
| 422940 | 420453 | 47 | 2014-04-14 09:20:27 | Hot |
+--------+-------------+----------------+---------------------+--------+
EDIT
To answer some of the possibilities about nulls:
select count(id) from customers_history where customer_id is null: 0
select count(id) from customers_history where lead_source_id is null: 5103
select count(id) from customers_history where repurchased_date is null: 0
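The most obvious explanation is that some repurchased_date values occur under more than one lead_source_id: the first query counts such a value once in each group, while the second counts it only once overall.
Another possibility is that you have NULL values for customer_id, which the second query's join filters out.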
The third possibility is that NULL values of lead_source_id are adding additional values in the first query.
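The first explanation can be reproduced on the sample rows. In the sketch below, Python's sqlite3 stands in for MySQL, and row 422999 is hypothetical: it reuses the repurchased_date 2014-04-14 09:16:48 under lead_source_id 37. The overall count is simplified to a plain COUNT(DISTINCT ...) with the same filter. This is equivalent to the second query because the edit shows customer_id is never NULL. The last query is a diagnostic. Each timestamp shared by k lead sources adds k - 1 to the first total, so if this is the cause, the excess it reports should equal the 58-row difference on production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers_history (
    id INTEGER PRIMARY KEY, customer_id INTEGER, lead_source_id INTEGER,
    repurchased_date TEXT, Rating TEXT)""")
conn.executemany("INSERT INTO customers_history VALUES (?, ?, ?, ?, ?)", [
    (422923, 420450, 4, "2014-04-14 09:16:48", "Warm"),
    (422924, 420450, 4, "2014-04-14 09:16:48", "Cold"),
    (422956, 420450, 4, "2014-04-14 09:16:49", "Hot"),
    (422933, 420451, 37, "2014-04-14 09:18:41", "Hot"),
    (422938, 420452, 1, "2014-04-10 20:50:30", "Hot"),
    (422984, 420452, 1, "2014-04-12 20:50:30", "Warm"),
    (422940, 420453, 47, "2014-04-14 09:20:27", "Hot"),
    (422999, 420454, 37, "2014-04-14 09:16:48", "Hot"),  # hypothetical shared timestamp
])

# Constant filter shared by all three queries (not user input).
FILTER = """repurchased_date >= '2014-04-01' AND repurchased_date < '2014-05-01'
            AND lead_source_id IS NOT NULL"""

# Query 1: distinct dates counted separately inside each lead_source_id group.
per_source = conn.execute(f"""
    SELECT lead_source_id, COUNT(DISTINCT repurchased_date)
    FROM customers_history WHERE {FILTER} GROUP BY lead_source_id""").fetchall()
sum_of_groups = sum(n for _, n in per_source)

# Query 2 (simplified): distinct dates counted once across the whole range.
overall = conn.execute(f"""
    SELECT COUNT(DISTINCT repurchased_date)
    FROM customers_history WHERE {FILTER}""").fetchone()[0]

# Diagnostic: timestamps that appear under more than one lead source.
shared = conn.execute(f"""
    SELECT repurchased_date, COUNT(DISTINCT lead_source_id) AS sources
    FROM customers_history WHERE {FILTER}
    GROUP BY repurchased_date HAVING COUNT(DISTINCT lead_source_id) > 1""").fetchall()
excess = sum(k - 1 for _, k in shared)

print(sum_of_groups, overall, excess)  # 7 6 1
```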

Calculation involving repeated items in MySQL table

I have a table with a composite primary key on EID (event ID) and start_time. I have another column called attending.
Users make their events more popular by reusing the event ID and changing the date; when that happens, I create a new row in the database.
I would like to create a fourth column, actual_attending, equal to the attending value minus the previous event's attending value. If there is no previous event with that ID, the column can be NULL. How can I calculate this with an UPDATE?
Here is a sqlfiddle as an example: http://sqlfiddle.com/#!2/43f2c5
update event e1
set e1.actual_attending = (select e1.attending - e2.attending
from event e2
where e2.eid(+) = e1.previous_eid
)
SELECT a.*
, a.attending-b.attending new_actual_attending
FROM
( SELECT x.*
, COUNT(*) `rank`
FROM event x
JOIN event y
ON y.eid = x.eid
AND y.start_time <= x.start_time
GROUP
BY x.eid, x.start_time
) a
LEFT
JOIN
( SELECT x.*
, COUNT(*) `rank`
FROM event x
JOIN event y
ON y.eid = x.eid
AND y.start_time <= x.start_time
GROUP
BY x.eid, x.start_time
) b
ON b.eid = a.eid
AND b.`rank` = a.`rank` - 1;
+-----+------------+-----------+------------------+------+----------------------+
| eid | start_time | attending | actual_attending | rank | new_actual_attending |
+-----+------------+-----------+------------------+------+----------------------+
| 1 | 2013-06-08 | 29 | NULL | 1 | NULL |
| 2 | 2013-06-09 | 72 | NULL | 1 | NULL |
| 2 | 2013-06-16 | 104 | NULL | 2 | 32 |
| 3 | 2013-06-07 | 224 | NULL | 1 | NULL |
| 3 | 2013-06-14 | 222 | NULL | 2 | -2 |
+-----+------------+-----------+------------------+------+----------------------+
http://sqlfiddle.com/#!2/43f2c5/2
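The answer computes the value with a SELECT, but the question asked for an UPDATE. Here is a minimal sketch that uses Python's sqlite3 as a stand-in and loads the rows from the result table above, with dates only as shown there. It fills actual_attending from the most recent earlier row with the same eid. MySQL rejects an UPDATE whose subquery reads the table being updated (error 1093). In MySQL, the same idea would therefore need a multi-table UPDATE joined to a derived table, such as the ranked query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE event (
    eid INTEGER, start_time TEXT, attending INTEGER, actual_attending INTEGER,
    PRIMARY KEY (eid, start_time))""")
conn.executemany("INSERT INTO event (eid, start_time, attending) VALUES (?, ?, ?)", [
    (1, "2013-06-08", 29), (2, "2013-06-09", 72), (2, "2013-06-16", 104),
    (3, "2013-06-07", 224), (3, "2013-06-14", 222),
])

# attending minus the attending of the latest earlier row with the same eid;
# when there is no earlier row the subquery yields NULL, so the result is NULL.
conn.execute("""
    UPDATE event
    SET actual_attending = attending - (
        SELECT prev.attending FROM event AS prev
        WHERE prev.eid = event.eid AND prev.start_time < event.start_time
        ORDER BY prev.start_time DESC LIMIT 1)""")

rows = conn.execute(
    "SELECT eid, start_time, actual_attending FROM event ORDER BY eid, start_time").fetchall()
print(rows)
```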

How to make this query run faster?

I have a query like this:
SELECT
`om_chapter`.`manganame` as `link`,
(SELECT `manganame` FROM `om_manga` WHERE `Active` = '1' AND `om_manga`.`link` = `om_chapter`.`manganame` LIMIT 0,1) AS `manganame`,
(SELECT `cover` FROM `om_manga` WHERE `Active` = '1' AND `om_manga`.`link` = `om_chapter`.`manganame` LIMIT 0,1) AS `cover`,
(SELECT `othername` FROM `om_manga` WHERE `Active` = '1' AND `om_manga`.`link` = `om_chapter`.`manganame` LIMIT 0,1) AS `othername`
FROM `om_chapter`
WHERE
`Active` = '1' AND
(SELECT `Active` From `om_manga` WHERE `om_manga`.`link` = `om_chapter`.`manganame` LIMIT 0,1) AND
`id` IN ( SELECT MAX(`id`) FROM `om_chapter` WHERE `Active` = '1' GROUP BY `manganame` )
ORDER BY `id` DESC LIMIT 10
So how can I make this query faster?
Here are my tables:
om_chapter:
id | manganame | chapter | Active
-----------------------------------------
1 | naruto | 1 | 1
2 | naruto | 12 | 1
3 | naruto | 22 | 1
4 | bleach | 10 | 1
5 | bleach | 15 | 1
6 | gents | 1 | 1
7 | naruto | 21 | 1
om_manga:
id | othername | manganame | cover | Active
-----------------------------------------------------
1 | naruto | naruto | n.jpg | 1
2 | bleach | bleach | b.jpg | 1
4 | gents | gents | g.jpg | 1
The first thing I want from this query is the last 10 rows from om_chapter, grouped by manganame and ordered by id. I tried a simple query using GROUP BY or even DISTINCT, but neither gives me the right result.
In a simple query with group or distinct, the result is like this:
id | manganame | chapter | Active
-----------------------------------------
7 | prince | 21 | 1
5 | gent | 15 | 1
2 | naruto | 12 | 1
1 | bleach | 1 | 1
But i want this result:
id | manganame | chapter | Active
-----------------------------------------
9 | gents | 21 | 1
8 | bleach | 21 | 1
7 | prince | 21 | 1
6 | naruto | 1 | 1
So I use this:
WHERE
`Active` = '1' AND
(SELECT `Active` From `om_manga` WHERE `om_manga`.`link` = `om_chapter`.`manganame` LIMIT 0,1) AND
`id` IN ( SELECT MAX(`id`) FROM `om_chapter` WHERE `Active` = '1' GROUP BY `manganame` )
I use a subselect in the WHERE clause because I want the Active field in the om_manga table to be 1.
For the rest of the subselects, I haven't actually tried a join yet, but I will!
I might have misunderstood your intentions, but here's one try:
SELECT c.`manganame` AS `link`
, m.`manganame`
, m.`cover`
, m.`othername`
FROM
`om_manga` m
INNER JOIN `om_chapter` c
ON m.`link` = c.`manganame`
INNER JOIN
( SELECT `manganame`, MAX(`id`) AS `maxid`
FROM `om_chapter`
WHERE `Active` = '1'
GROUP BY `manganame` ) mx
ON mx.`maxid` = c.`id`
ORDER BY c.`id` DESC LIMIT 10
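Here is a minimal sketch of that query against the sample rows, using Python's sqlite3 as a stand-in for MySQL. The sample om_manga table has no link column, so this version joins on manganame (an assumption). It also adds the om_manga.Active = 1 condition the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE om_chapter (id INTEGER PRIMARY KEY, manganame TEXT, "
             "chapter INTEGER, Active INTEGER)")
conn.execute("CREATE TABLE om_manga (id INTEGER PRIMARY KEY, othername TEXT, "
             "manganame TEXT, cover TEXT, Active INTEGER)")
conn.executemany("INSERT INTO om_chapter VALUES (?, ?, ?, ?)", [
    (1, "naruto", 1, 1), (2, "naruto", 12, 1), (3, "naruto", 22, 1),
    (4, "bleach", 10, 1), (5, "bleach", 15, 1), (6, "gents", 1, 1),
    (7, "naruto", 21, 1),
])
conn.executemany("INSERT INTO om_manga VALUES (?, ?, ?, ?, ?)", [
    (1, "naruto", "naruto", "n.jpg", 1), (2, "bleach", "bleach", "b.jpg", 1),
    (4, "gents", "gents", "g.jpg", 1),
])

# mx: newest active chapter id per manga; join back to fetch that chapter row,
# then to om_manga for cover/othername, keeping only active mangas.
latest = conn.execute("""
    SELECT c.id, c.manganame AS link, m.manganame, m.cover, m.othername
    FROM om_chapter AS c
    JOIN (SELECT manganame, MAX(id) AS maxid
          FROM om_chapter WHERE Active = 1
          GROUP BY manganame) AS mx ON mx.maxid = c.id
    JOIN om_manga AS m ON m.manganame = c.manganame AND m.Active = 1
    ORDER BY c.id DESC
    LIMIT 10""").fetchall()
print(latest)
```

A single join replaces the four correlated subselects of the original query. Each of those subselects re-reads om_manga for every chapter row.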
I would introduce a foreign key constraint on the om_chapter table to represent the link from a manga to its chapters.
This is how I would conceptualize the problem.
A manga can have many chapters. A chapter belongs to one manga.
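Then I would alter the om_chapter table to add a foreign key linking each chapter to its manga:
ALTER TABLE om_chapter
ADD COLUMN mangaID INT,
ADD CONSTRAINT fk_chapter_manga FOREIGN KEY (mangaID) REFERENCES om_manga (id);
-- fk_chapter_manga is an illustrative name; copy the existing links before dropping the old column,
-- assuming om_chapter.manganame matches om_manga.manganame
UPDATE om_chapter c
JOIN om_manga m ON m.manganame = c.manganame
SET c.mangaID = m.id;
And drop the manganame column, since it is now redundant:
ALTER TABLE om_chapter
DROP COLUMN manganame;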
Your tables could then look like this:
om_manga:
id | othername | manganame | cover | Active
-----------------------------------------------------
1 | naruto | naruto | n.jpg | 1
2 | bleach | bleach | b.jpg | 1
4 | gents | gents | g.jpg | 1
om_chapter:
id | chapter | Active | mangaID
-----------------------------------------
1 | 1 | 1 | 1
2 | 12 | 1 | 1
3 | 22 | 1 | 1
4 | 10 | 1 | 2
5 | 15 | 1 | 2
6 | 1 | 1 | 4
Finally you could query the tables like so
SELECT m.Manganame AS link,
m.Manganame,
m.cover,
m.othername
FROM om_manga AS m INNER JOIN
om_chapter AS c ON m.ID = c.mangaID
WHERE m.active = 1 AND c.active = 1
ORDER BY m.ID DESC
LIMIT 10;
Why not a simple join?
SELECT om_chapter.manganame, cover, othername
FROM om_chapter
JOIN om_manga ON om_chapter.manganame = om_manga.manganame
WHERE om_chapter.Active = 1 AND om_manga.Active = 1
Unless I'm misreading your version.
Use a left outer join (and lose the sub-queries, and the back-quotes):
SELECT c.manganame AS link,
m.manganame AS manganame,
m.cover AS cover,
m.othername AS othername
FROM om_chapter AS c
LEFT JOIN om_manga AS m
ON c.manganame = m.manganame
WHERE c.Active = '1'
AND c.id IN (SELECT MAX(o.id)
FROM om_chapter AS o
WHERE o.active = 1
GROUP BY o.manganame)
ORDER BY c.id DESC LIMIT 10
Were it my query, I'd probably select 'c.id AS id' too.