In a project I am working on I have the table you can see below. On the frontend I need to show only the records that are published grouped by the entity_id. For example, in the example below only id 1, 11, 16 and 19 should be shown. I have no idea how to make this query. I tried several things with subqueries etc but none of them work. I guess there should be a way to retrieve this data. What am I missing?
| id | revision | entity_id | status
========================================
| 1 | 1 | 1 | published
| 2 | 2 | 1 | archived
| 3 | 1 | 2 | draft
| 4 | 2 | 2 | draft
| 5 | 3 | 2 | draft
| 6 | 4 | 2 | ready
| 7 | 5 | 2 | draft
| 8 | 6 | 2 | published
| 9 | 7 | 2 | published
| 10 | 8 | 2 | ready
| 11 | 9 | 2 | published
| 13 | 1 | 3 | draft
| 14 | 1 | 4 | draft
| 15 | 2 | 4 | draft
| 16 | 3 | 4 | published
| 18 | 1 | 5 | draft
| 19 | 2 | 5 | published
| 20 | 3 | 5 | draft
| 21 | 10 | 5 | archived
I created a DBFiddle to play around:
https://www.db-fiddle.com/f/4UcjKhTvzzNQWL3Pfkfew4/1
Note: this is not the same as "SQL select only rows with max value on a column", since the answer there would select all the revisions that are published and not just the latest one.
Presumably you're after something like this...
DROP TABLE IF EXISTS entities;
CREATE TABLE `entities`
( id SERIAL PRIMARY KEY
, entity_id INT NOT NULL
, revision INT NOT NULL DEFAULT '1'
, type enum('gym','trainer')
, status enum('published','ready','draft','archived') NOT NULL DEFAULT 'draft'
, UNIQUE KEY entities_entity_id_revision_unique (entity_id,revision)
);
INSERT INTO entities
(id, entity_id, revision, type,status) VALUES
( 1,1, 1,'gym','published'),
( 2,1, 2,'gym','archived'),
( 3,2, 1,'gym','draft'),
( 4,2, 2,'gym','draft'),
( 5,2, 3,'gym','draft'),
( 6,2, 4,'gym','ready'),
( 7,2, 5,'gym','draft'),
( 8,2, 6,'gym','published'),
( 9,2, 7,'gym','published'),
(10,2, 8,'gym','ready'),
(11,2, 9,'gym','published'),
(13,3, 1,'gym','draft'),
(14,4, 1,'gym','draft'),
(15,4, 2,'gym','draft'),
(16,4, 3,'gym','published'),
(18,5, 1,'gym','draft'),
(19,5, 2,'gym','draft'),
(20,5, 3,'gym','draft'),
(21,5,10,'gym','published');
SELECT x.*
FROM entities x
JOIN
( SELECT entity_id
, MAX(revision) revision
FROM entities
WHERE status = 'published'
GROUP
BY entity_id
) y
ON y.entity_id = x.entity_id
AND y.revision = x.revision;
+----+-----------+----------+------+-----------+
| id | entity_id | revision | type | status |
+----+-----------+----------+------+-----------+
| 1 | 1 | 1 | gym | published |
| 11 | 2 | 9 | gym | published |
| 16 | 4 | 3 | gym | published |
| 21 | 5 | 10 | gym | published |
+----+-----------+----------+------+-----------+
4 rows in set (0.00 sec)
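For anyone who wants to try the join without a MySQL server, here is a minimal sketch that replays the same data and query through Python's built-in sqlite3 module. Using SQLite as a stand-in for MySQL is my assumption, and the `type` column is left out for brevity; the query itself is the one above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities ("
    " id INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL,"
    " revision INTEGER NOT NULL, status TEXT NOT NULL)"
)
rows = [
    (1, 1, 1, "published"), (2, 1, 2, "archived"),
    (3, 2, 1, "draft"), (4, 2, 2, "draft"), (5, 2, 3, "draft"),
    (6, 2, 4, "ready"), (7, 2, 5, "draft"), (8, 2, 6, "published"),
    (9, 2, 7, "published"), (10, 2, 8, "ready"), (11, 2, 9, "published"),
    (13, 3, 1, "draft"), (14, 4, 1, "draft"), (15, 4, 2, "draft"),
    (16, 4, 3, "published"), (18, 5, 1, "draft"), (19, 5, 2, "draft"),
    (20, 5, 3, "draft"), (21, 5, 10, "published"),
]
conn.executemany("INSERT INTO entities VALUES (?, ?, ?, ?)", rows)

# Inner query: highest published revision per entity.
# Outer join: fetch the full row that carries that revision.
latest_published_ids = [
    r[0] for r in conn.execute(
        """
        SELECT x.id
        FROM entities x
        JOIN (SELECT entity_id, MAX(revision) AS revision
              FROM entities
              WHERE status = 'published'
              GROUP BY entity_id) y
          ON y.entity_id = x.entity_id
         AND y.revision = x.revision
        ORDER BY x.id
        """
    )
]
print(latest_published_ids)
```

Entity 3 has no published revision, so it does not appear in the result at all.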
You can also use ROW_NUMBER() OVER (PARTITION BY ...), which is available in MySQL 8.0 and later. Order by revision descending so that row number 1 is the latest published revision of each entity.
SELECT * FROM(
SELECT *,
ROW_NUMBER() OVER( PARTITION BY ENTITY_ID ORDER BY REVISION DESC) AS RN
FROM ENTITIES
WHERE STATUS = 'PUBLISHED') K WHERE RN =1
| id | entity_id | revision | type | status | name | slug | short_description | description | address | phone | email | website | openinghours | images | thumbnail | pricerange | created_at | updated_at | deleted_at | RN |
| --- | --------- | -------- | ---- | --------- | ---- | ---- | ----------------- | ----------- | ------- | ----- | ----- | ------- | ------------ | ------ | --------- | ---------- | ------------------- | ------------------- | ------------------- | --- |
| 1 | 1 | 1 | gym | published | | | | | | | | | | | | | 2020-10-03 21:49:14 | 2020-10-03 21:49:14 | | 1 |
| 11 | 2 | 9 | gym | published | | | | | | | | | | | | | 2020-10-10 16:48:20 | 2020-10-10 17:00:47 | | 1 |
| 16 | 4 | 3 | gym | published | | | | | | | | | | | | | 2020-10-10 17:06:38 | 2020-10-10 17:06:53 | 2020-10-10 17:06:53 | 1 |
| 21 | 5 | 10 | gym | published | | | | | | | | | | | | | 2020-10-11 14:54:16 | 2020-10-11 14:54:16 | | 1 |
View on DB Fiddle
It's almost the same as with the max value. Note that MIN(id) returns the first published row of each entity:
Query #1
SELECT *
FROM
`entities`
WHERE id IN
(SELECT MIN(id) AS id FROM
`entities`
WHERE `status` = 'published'
GROUP BY `entity_id`);
| id | entity_id | revision | type | status | name | slug | short_description | description | address | phone | email | website | openinghours | images | thumbnail | pricerange | created_at | updated_at | deleted_at |
| --- | --------- | -------- | ---- | --------- | ---- | ---- | ----------------- | ----------- | ------- | ----- | ----- | ------- | ------------ | ------ | --------- | ---------- | ------------------- | ------------------- | ------------------- |
| 1 | 1 | 1 | gym | published | | | | | | | | | | | | | 2020-10-03 21:49:14 | 2020-10-03 21:49:14 | |
| 8 | 2 | 6 | gym | published | | | | | | | | | | | | | 2020-10-10 16:28:14 | 2020-10-10 16:28:15 | |
| 16 | 4 | 3 | gym | published | | | | | | | | | | | | | 2020-10-10 17:06:38 | 2020-10-10 17:06:53 | 2020-10-10 17:06:53 |
| 21 | 5 | 10 | gym | published | | | | | | | | | | | | | 2020-10-11 14:54:16 | 2020-10-11 14:54:16 | |
View on DB Fiddle
If an entity has multiple published revisions and you want the latest one, use MAX instead. The query below matches on MAX(revision); as long as the id increases with every new revision, MAX(id) gives the same result.
Query #1
SELECT *
FROM
`entities`
WHERE (`entity_id` ,`revision`) IN
(SELECT `entity_id` ,MAX(`revision`) FROM
`entities`
WHERE `status` = 'published'
GROUP BY `entity_id`);
| id | entity_id | revision | type | status | name | slug | short_description | description | address | phone | email | website | openinghours | images | thumbnail | pricerange | created_at | updated_at | deleted_at |
| --- | --------- | -------- | ---- | --------- | ---- | ---- | ----------------- | ----------- | ------- | ----- | ----- | ------- | ------------ | ------ | --------- | ---------- | ------------------- | ------------------- | ------------------- |
| 1 | 1 | 1 | gym | published | | | | | | | | | | | | | 2020-10-03 21:49:14 | 2020-10-03 21:49:14 | |
| 11 | 2 | 9 | gym | published | | | | | | | | | | | | | 2020-10-10 16:48:20 | 2020-10-10 17:00:47 | |
| 16 | 4 | 3 | gym | published | | | | | | | | | | | | | 2020-10-10 17:06:38 | 2020-10-10 17:06:53 | 2020-10-10 17:06:53 |
| 21 | 5 | 10 | gym | published | | | | | | | | | | | | | 2020-10-11 14:54:16 | 2020-10-11 14:54:16 | |
View on DB Fiddle
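To check that the variants discussed here really agree, the sketch below runs three of them against the sample rows from the first answer: the row-value IN on MAX(revision), the same idea with MAX(id), and a ROW_NUMBER() ordered by revision descending. It uses Python's sqlite3 as a stand-in for MySQL, which is an assumption on my part; it needs SQLite 3.25 or newer for the window function, and the helper `ids` is just an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (id INTEGER PRIMARY KEY, entity_id INTEGER,"
    " revision INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "published"), (2, 1, 2, "archived"),
        (3, 2, 1, "draft"), (4, 2, 2, "draft"), (5, 2, 3, "draft"),
        (6, 2, 4, "ready"), (7, 2, 5, "draft"), (8, 2, 6, "published"),
        (9, 2, 7, "published"), (10, 2, 8, "ready"), (11, 2, 9, "published"),
        (13, 3, 1, "draft"), (14, 4, 1, "draft"), (15, 4, 2, "draft"),
        (16, 4, 3, "published"), (18, 5, 1, "draft"), (19, 5, 2, "draft"),
        (20, 5, 3, "draft"), (21, 5, 10, "published"),
    ],
)


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Variant 1: match (entity_id, revision) against the max published revision.
by_revision = ids("""
    SELECT id FROM entities
    WHERE (entity_id, revision) IN
          (SELECT entity_id, MAX(revision) FROM entities
           WHERE status = 'published' GROUP BY entity_id)
    ORDER BY id
""")

# Variant 2: rely on id growing with every new revision.
by_id = ids("""
    SELECT id FROM entities
    WHERE id IN (SELECT MAX(id) FROM entities
                 WHERE status = 'published' GROUP BY entity_id)
    ORDER BY id
""")

# Variant 3: number the published rows per entity, newest revision first.
by_window = ids("""
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY entity_id
                                  ORDER BY revision DESC) AS rn
        FROM entities
        WHERE status = 'published') k
    WHERE rn = 1
    ORDER BY id
""")

print(by_revision, by_id, by_window)
```

The MAX(id) variant is only equivalent while ids are handed out in revision order; if revisions can be inserted out of order, matching on the revision is the safer choice.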
I'm working with a pretty nasty table schema which unfortunately I can't change as it's defined by our SCADA program. There's one analog float value (power usage), and one digital int value (machine setting). I need to be able to find the Min, Max, and Avg of the power usage for each machine setting.
So basically each time a new machine setting (intvalue) is recorded, I need the aggregate power usage (floatvalue) until the next machine setting. I'd like to be able to group by intvalue as well, so I could get these aggregate numbers for a whole month, for example.
So far, I've tried playing around with joins and nested queries, but I can't get anything to work. I can't really find any examples like this either, since it's such a poor table design.
Table schema found here: http://www.sqlfiddle.com/#!9/29164/1
Data:
| tagid | intvalue | floatvalue | t_stamp |
|-------|----------|------------|----------------------|
| 2 | 9 | (null) | 2019-07-01T00:01:58Z |
| 1 | (null) | 120.2 | 2019-07-01T00:02:00Z |
| 1 | (null) | 120.1 | 2019-07-01T00:02:31Z |
| 2 | 11 | (null) | 2019-07-01T00:07:58Z |
| 1 | (null) | 155.9 | 2019-07-01T00:08:00Z |
| 1 | (null) | 175.5 | 2019-07-01T00:10:12Z |
| 1 | (null) | 185.5 | 2019-07-01T00:10:58Z |
| 2 | 2 | (null) | 2019-07-01T00:11:22Z |
| 1 | (null) | 10.1 | 2019-07-01T00:11:22Z |
| 1 | (null) | 12 | 2019-07-01T00:13:58Z |
| 1 | (null) | 9.9 | 2019-07-01T00:14:21Z |
| 2 | 9 | (null) | 2019-07-01T00:15:38Z |
| 1 | (null) | 120.9 | 2019-07-01T00:15:39Z |
| 1 | (null) | 119.2 | 2019-07-01T00:16:22Z |
Desired output:
| intvalue | min | avg | max |
|----------|-------|-------|-------|
| 2 | 9.9 | 10.7 | 12 |
| 9 | 119.2 | 120.1 | 120.9 |
| 11 | 155.9 | 172.3 | 185.5 |
Is this possible?
You can fill the missing intvalues with a subquery in the SELECT clause:
select t.*, (
select t1.intvalue
from sqlt_data_1_2019_07 t1
where t1.t_stamp <= t.t_stamp
and t1.intvalue is not null
order by t1.t_stamp desc
limit 1
) as group_int
from sqlt_data_1_2019_07 t
order by t.t_stamp;
The result will be
| tagid | intvalue | floatvalue | t_stamp | group_int |
| ----- | -------- | ---------- | ------------------- | --------- |
| 2 | 9 | | 2019-07-01 00:01:58 | 9 |
| 1 | | 120.2 | 2019-07-01 00:02:00 | 9 |
| 1 | | 120.1 | 2019-07-01 00:02:31 | 9 |
| 2 | 11 | | 2019-07-01 00:07:58 | 11 |
| 1 | | 155.9 | 2019-07-01 00:08:00 | 11 |
| 1 | | 175.5 | 2019-07-01 00:10:12 | 11 |
| 1 | | 185.5 | 2019-07-01 00:10:58 | 11 |
| 2 | 2 | | 2019-07-01 00:11:22 | 2 |
| 1 | | 10.1 | 2019-07-01 00:11:22 | 2 |
| 1 | | 12 | 2019-07-01 00:13:58 | 2 |
| 1 | | 9.9 | 2019-07-01 00:14:21 | 2 |
| 2 | 9 | | 2019-07-01 00:15:38 | 9 |
| 1 | | 120.9 | 2019-07-01 00:15:39 | 9 |
| 1 | | 119.2 | 2019-07-01 00:16:22 | 9 |
Now you can simply group by the result of the subquery:
select (
select t1.intvalue
from sqlt_data_1_2019_07 t1
where t1.t_stamp <= t.t_stamp
and t1.intvalue is not null
order by t1.t_stamp desc
limit 1
) as group_int,
min(floatvalue) as min,
avg(floatvalue) as avg,
max(floatvalue) as max
from sqlt_data_1_2019_07 t
group by group_int
order by group_int;
And you get:
| group_int | min | avg | max |
| --------- | ----- | ------------------ | ----- |
| 2 | 9.9 | 10.666666666666666 | 12 |
| 9 | 119.2 | 120.10000000000001 | 120.9 |
| 11 | 155.9 | 172.29999999999998 | 185.5 |
View on DB Fiddle
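The two steps above (fill in the most recent machine setting, then aggregate per setting) can be replayed locally with the sketch below. It uses Python's sqlite3 as a stand-in for MySQL, which is my assumption, and wraps the fill-in query in a derived table (the alias `filled` is mine) instead of grouping by the column alias directly; the logic is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sqlt_data_1_2019_07 ("
    " tagid INTEGER, intvalue INTEGER, floatvalue REAL, t_stamp TEXT)"
)
conn.executemany(
    "INSERT INTO sqlt_data_1_2019_07 VALUES (?, ?, ?, ?)",
    [
        (2, 9, None, "2019-07-01 00:01:58"),
        (1, None, 120.2, "2019-07-01 00:02:00"),
        (1, None, 120.1, "2019-07-01 00:02:31"),
        (2, 11, None, "2019-07-01 00:07:58"),
        (1, None, 155.9, "2019-07-01 00:08:00"),
        (1, None, 175.5, "2019-07-01 00:10:12"),
        (1, None, 185.5, "2019-07-01 00:10:58"),
        (2, 2, None, "2019-07-01 00:11:22"),
        (1, None, 10.1, "2019-07-01 00:11:22"),
        (1, None, 12.0, "2019-07-01 00:13:58"),
        (1, None, 9.9, "2019-07-01 00:14:21"),
        (2, 9, None, "2019-07-01 00:15:38"),
        (1, None, 120.9, "2019-07-01 00:15:39"),
        (1, None, 119.2, "2019-07-01 00:16:22"),
    ],
)

# Inner query: attach the latest non-null intvalue at or before each reading.
# Outer query: aggregate the power readings per machine setting.
stats = conn.execute(
    """
    SELECT group_int,
           MIN(floatvalue), AVG(floatvalue), MAX(floatvalue)
    FROM (
        SELECT t.floatvalue,
               (SELECT t1.intvalue
                FROM sqlt_data_1_2019_07 t1
                WHERE t1.t_stamp <= t.t_stamp
                  AND t1.intvalue IS NOT NULL
                ORDER BY t1.t_stamp DESC
                LIMIT 1) AS group_int
        FROM sqlt_data_1_2019_07 t
        WHERE t.floatvalue IS NOT NULL
    ) filled
    GROUP BY group_int
    ORDER BY group_int
    """
).fetchall()

for setting, lo, avg, hi in stats:
    print(setting, lo, round(avg, 1), hi)
```

Note that setting 9 occurs twice in the data, and both periods end up in the same group, which is what the desired output asks for.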
This is my table with sample data:
Table:PersTrans
+------------+-------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+------------+-------+
| PersTrID | char(10) | NO | PRI | | |
| PersTrSeq | int(11) | NO | PRI | 0 | |
| PersTrDate | date | YES | | 1001-01-01 | |
| PersTrPaid | float(9,2) | YES | | 0.00 | |
+------------+-------------+------+-----+------------+-------+
mysql> select * from PersTrans;
+------------+-----------+-----------+-------------+
| PersTrID | PersTrSeq | PersTrDate | PersTrPaid |
+------------+-----------+-----------+-------------+
| MOCK | 1 | 2015-10-10 | 400.00 |
| MOCK | 2 | 2017-11-07 | 10.00 |
| NAGA | 1 | 2015-11-11 | 500.00 |
| NASSA | 1 | 2015-12-16 | 800.00 |
+------------+-----------+-----------+-------------+
I'd like to pick up the maximum PersTrSeq, and attach it to all the records that have the same PersTrId. What I want:
+------------+-----------+------------+------------+----------------+
| PersTrID | PersTrSeq | PersTrDate | PersTrPaid | max(PersTrSeq) |
+------------+-----------+-----------+------------+-----------------+
| MOCK | 1 | 2015-10-10 | 400.00 | 2 |
| MOCK | 2 | 2017-11-07 | 10.00 | 2 |
| NAGA | 1 | 2015-11-11 | 500.00 | 1 |
| NASSA | 1 | 2015-12-16 | 800.00 | 1 |
+------------+-----------+-----------+------------+-----------------+
These two attempts didn't work. I've looked for other suggestions but haven't found anything helpful.
mysql> SELECT *, max(PersTrSeq) from PersTrans where PersTrID = 'Mock' group by PersTrSeq;
+------------+-----------+------------+------------+----------------+
| PersTrID | PersTrSeq | PersTrDate | PersTrPaid | max(PersTrSeq) |
+------------+-----------+-----------+------------+-----------------+
| MOCK | 1 | 2015-10-10 | 400.00 | 1 |
| MOCK | 2 | 2017-11-07 | 10.00 | 2 |
+------------+-----------+-----------+------------+-----------------+
mysql> SELECT *, max(PersTrSeq) as maxseq from PersTrans group by PersTrId;
+------------+-----------+------------+------------+--------+
| PersTrID | PersTrSeq | PersTrDate | PersTrPaid | maxseq |
+------------+------------+-----------+------------+--------+
| MOCK | 1 | 2015-10-10 | 400.00 | 2 |
| NAGA | 1 | 2015-11-11 | 500.00 | 1 |
| NASSA | 1 | 2015-12-16 | 800.00 | 1 |
+------------+-----------+-----------+------------+---------+
Can anyone offer a single query that will get the result I'm looking for?
The following query will work:
select *,
(select max(PersTrSeq) from PersTrans p2
where p2.PersTrId = p1.PersTrId
) as maxSeq
from PersTrans p1;
If I understand correctly, you want the same number of records as the actual data, substituting the max(PersTrSeq) for all rows with a certain PersTrID.
SELECT
`PersTrID`,
(SELECT max(`PersTrSeq`) FROM `PersTrans` b WHERE b.`PersTrID` = a.`PersTrID`) as `PersTrSeq`,
`PersTrDate`,
`PersTrPaid`
from `PersTrans` a
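Here is a small sketch of the correlated-subquery approach run against the sample rows, using Python's sqlite3 in place of MySQL (my assumption, chosen so it runs anywhere). It keeps the original PersTrSeq and adds the per-ID maximum as an extra column named maxSeq, as in the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PersTrans ("
    " PersTrID TEXT NOT NULL, PersTrSeq INTEGER NOT NULL,"
    " PersTrDate TEXT, PersTrPaid REAL,"
    " PRIMARY KEY (PersTrID, PersTrSeq))"
)
conn.executemany(
    "INSERT INTO PersTrans VALUES (?, ?, ?, ?)",
    [
        ("MOCK", 1, "2015-10-10", 400.00),
        ("MOCK", 2, "2017-11-07", 10.00),
        ("NAGA", 1, "2015-11-11", 500.00),
        ("NASSA", 1, "2015-12-16", 800.00),
    ],
)

# For every row, look up the highest PersTrSeq that shares its PersTrID.
result = conn.execute(
    """
    SELECT p1.PersTrID,
           p1.PersTrSeq,
           (SELECT MAX(p2.PersTrSeq)
            FROM PersTrans p2
            WHERE p2.PersTrID = p1.PersTrID) AS maxSeq
    FROM PersTrans p1
    ORDER BY p1.PersTrID, p1.PersTrSeq
    """
).fetchall()

for row in result:
    print(row)
```

No GROUP BY is involved in the outer query, which is why every original row survives.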
Is it possible to get the sum of values in last 10 rows with respect to the current row?
I have created a database for a shop, which contains a table named purchase_details. Structure of that table is:
+--------------------------------+---------------+-----+
| Field | Type | Key |
+--------------------------------+---------------+-----+
| Trans_ID | int(11) | PRI |
| Dealer_Name | varchar(40) | |
| Todays_Purchase | double(18,10) | |
| Total_Purchase_In_Last_10_Days | double(18,10) | |
+--------------------------------+---------------+-----+
Sample data:
+----------+-------------+-----------------+--------------------------------+
| Trans_ID | Dealer_Name | Todays_Purchase | Total_Purchase_In_Last_10_Days |
+----------+-------------+-----------------+--------------------------------+
| 1 | Rahul | 7769.1488285639 | NULL |
| 2 | Rahul | 4158.5117578537 | NULL |
| 3 | Rahul | 7200.1363099802 | NULL |
| 4 | Rahul | 9338.8341269511 | NULL |
| 5 | Rahul | 5897.7252866370 | NULL |
| 6 | Rahul | 3266.6656585172 | NULL |
| 7 | Rahul | 3188.0742696276 | NULL |
| 8 | Rahul | 4270.5917314234 | NULL |
| 9 | Rahul | 2604.3369713541 | NULL |
| 10 | Rahul | 7908.6014441989 | NULL |
| 11 | Rahul | 2693.4584823737 | NULL |
| 12 | Rahul | 7945.7825034862 | NULL |
| 13 | Rahul | 1904.1472157570 | NULL |
| 14 | Rajesh | 7093.0478540344 | NULL |
| 15 | Rajesh | 3219.3736989638 | NULL |
+----------+-------------+-----------------+--------------------------------+
I want to get the sum of purchases made in the last 10 transactions, with the condition that there should be at least 10 transactions to sum up.
Expected output:
+----------+-------------+-----------------+--------------------------------+
| Trans_ID | Dealer_Name | Todays_Purchase | Total_Purchase_In_Last_10_Days |
+----------+-------------+-----------------+--------------------------------+
| 1 | Rahul | 7769.1488285639 | 0.0000000000 |
| 2 | Rahul | 4158.5117578537 | 0.0000000000 |
| 3 | Rahul | 7200.1363099802 | 0.0000000000 |
| 4 | Rahul | 9338.8341269511 | 0.0000000000 |
| 5 | Rahul | 5897.7252866370 | 0.0000000000 |
| 6 | Rahul | 3266.6656585172 | 0.0000000000 |
| 7 | Rahul | 3188.0742696276 | 0.0000000000 |
| 8 | Rahul | 4270.5917314234 | 0.0000000000 |
| 9 | Rahul | 2604.3369713541 | 0.0000000000 |
| 10 | Rahul | 7908.6014441989 | 55602.6263900000 |
| 11 | Rahul | 2693.4584823737 | 50526.9360400000 |
| 12 | Rahul | 7945.7825034862 | 54314.2067800000 |
| 13 | Rahul | 1904.1472157570 | 49018.2176900000 |
| 14 | Rajesh | 7093.0478540344 | 0.0000000000 |
| 15 | Rajesh | 3219.3736989638 | 0.0000000000 |
+----------+-------------+-----------------+--------------------------------+
For this, I've created a mysql function, which will take the Trans_ID and Dealer_Name as a parameter, and will return the sum of Todays_Purchase column.
Function definition:
CREATE FUNCTION GET_TOTAL_PURCHASE(paramTransID INT, paramDealerName VARCHAR(40))
RETURNS DOUBLE(18,10)
READS SQL DATA
BEGIN
DECLARE totalPurchase DOUBLE(18,10);
SET totalPurchase = 0;
SELECT SUM(Todays_Purchase)
INTO totalPurchase
FROM purchase_details
WHERE Trans_ID > (paramTransID-10)
AND Trans_ID <= paramTransID
AND Dealer_Name = paramDealerName;
RETURN totalPurchase;
END
And the SQL query to update Total_Purchase_In_Last_10_Days column is:
UPDATE purchase_details
SET Total_Purchase_In_Last_10_Days = GET_TOTAL_PURCHASE(Trans_ID, Dealer_Name);
The above SQL works properly, but it takes too much time to execute. There are more than a million records in the table, so the query takes more than 5 minutes. How can I improve this?
Derived information (such as what you are asking for) is properly done in SELECTs, not by having redundant code in the table.
If one user pulls up his info, it will be reasonably cheap to compute the sum on the fly. And you already have the FUNCTION to do that.
However, can you really trust Trans_ID to be consecutive, no gaps, etc? Your nomenclature is inconsistent: "_Last_10_Days" vs "previous 10 rows" versus "Trans_". "10 days" can be tricky if there are gaps. Etc.
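As one possible way to compute the figure on the fly in a SELECT, here is a sketch using a window function, replayed through Python's sqlite3 (SQLite 3.25 or newer; MySQL 8.0 supports similar window syntax). It is an assumption-laden variant, not the function from the question: it sums the last 10 rows per dealer in Trans_ID order, so it does not depend on Trans_ID being gap-free, and it returns 0 until a dealer has 10 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_details ("
    " Trans_ID INTEGER PRIMARY KEY, Dealer_Name TEXT, Todays_Purchase REAL)"
)
amounts = [
    7769.1488285639, 4158.5117578537, 7200.1363099802, 9338.8341269511,
    5897.7252866370, 3266.6656585172, 3188.0742696276, 4270.5917314234,
    2604.3369713541, 7908.6014441989, 2693.4584823737, 7945.7825034862,
    1904.1472157570, 7093.0478540344, 3219.3736989638,
]
# Transactions 1-13 belong to Rahul, 14-15 to Rajesh (as in the sample data).
conn.executemany(
    "INSERT INTO purchase_details VALUES (?, ?, ?)",
    [
        (i, "Rahul" if i <= 13 else "Rajesh", amount)
        for i, amount in enumerate(amounts, start=1)
    ],
)

# The frame covers the current row plus up to 9 earlier rows of the same
# dealer; the sum is reported only once the frame holds a full 10 rows.
totals = dict(
    conn.execute(
        """
        SELECT Trans_ID,
               CASE WHEN COUNT(*) OVER w >= 10
                    THEN SUM(Todays_Purchase) OVER w
                    ELSE 0.0
               END AS last_10_total
        FROM purchase_details
        WINDOW w AS (PARTITION BY Dealer_Name
                     ORDER BY Trans_ID
                     ROWS BETWEEN 9 PRECEDING AND CURRENT ROW)
        ORDER BY Trans_ID
        """
    )
)

for trans_id, total in totals.items():
    print(trans_id, round(total, 4))
```

Because the whole table is processed in one pass, this avoids calling a stored function once per row.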
I have a table containing donations, and I am now creating a page to view statistics. I would like to fetch monthly data from the database with gross and cumulative gross.
mysql> describe donations;
+------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| transaction_id | varchar(64) | NO | UNI | | |
| donor_email | varchar(255) | NO | | | |
| net | double | NO | | 0 | |
| gross | double | NO | | NULL | |
| original_request | text | NO | | NULL | |
| time | datetime | NO | | NULL | |
| claimed | tinyint(4) | NO | | NULL | |
+------------------+------------------+------+-----+---------+----------------+
Here's what I've tried:
SET @cgross = 0;
SELECT YEAR(`time`), MONTH(`time`), SUM(`gross`), (@cgross := @cgross + SUM(`gross`)) AS `cumulative_gross` FROM `donations` GROUP BY YEAR(`time`), MONTH(`time`);
The result is:
+--------------+---------------+--------------+------------------+
| YEAR(`time`) | MONTH(`time`) | SUM(`gross`) | cumulative_gross |
+--------------+---------------+--------------+------------------+
| 2013 | 1 | 257 | 257 |
| 2013 | 2 | 140 | 140 |
| 2013 | 3 | 311 | 311 |
| 2013 | 4 | 279 | 279 |
+--------------+---------------+--------------+------------------+
Which is wrong. The desired result would be:
+--------------+---------------+--------------+------------------+
| YEAR(`time`) | MONTH(`time`) | SUM(`gross`) | cumulative_gross |
+--------------+---------------+--------------+------------------+
| 2013 | 1 | 257 | 257 |
| 2013 | 2 | 140 | 397 |
| 2013 | 3 | 311 | 708 |
| 2013 | 4 | 279 | 987 |
+--------------+---------------+--------------+------------------+
I tried this without SUM, and it did work as expected.
SET @cgross = 0;
SELECT YEAR(`time`), MONTH(`time`), SUM(`gross`), (@cgross := @cgross + 10) AS `cumulative_gross` FROM `donations` GROUP BY YEAR(`time`), MONTH(`time`);
+--------------+---------------+--------------+------------------+
| YEAR(`time`) | MONTH(`time`) | SUM(`gross`) | cumulative_gross |
+--------------+---------------+--------------+------------------+
| 2013 | 1 | 257 | 10 |
| 2013 | 2 | 140 | 20 |
| 2013 | 3 | 311 | 30 |
| 2013 | 4 | 279 | 40 |
+--------------+---------------+--------------+------------------+
Why doesn't it work with SUM? Any ideas how I could fix it?
Thanks,
Lassi
A subquery without variables will do it just as easily, and quite a bit more portably;
SELECT YEAR(`time`),
MONTH(`time`),
SUM(gross),
(SELECT SUM(gross)
FROM donations
WHERE `time`<=MAX(a.`time`)) cumulative_gross
FROM donations a GROUP BY YEAR(`time`), MONTH(`time`);
An SQLfiddle to test with.
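Here is a runnable sketch of the variable-free idea, using Python's sqlite3 rather than MySQL. Several things are assumptions on my part: the individual donation rows are made up so that the monthly totals match the question (257, 140, 311, 279), months are keyed with strftime instead of YEAR()/MONTH(), and the running total is taken over a CTE of monthly sums (named `monthly` here) rather than with MAX(a.`time`) in the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (id INTEGER PRIMARY KEY, gross REAL, time TEXT)")
# Hypothetical rows, chosen only to reproduce the monthly totals above.
conn.executemany(
    "INSERT INTO donations (gross, time) VALUES (?, ?)",
    [
        (100, "2013-01-05 10:00:00"), (157, "2013-01-20 12:30:00"),
        (140, "2013-02-11 09:15:00"),
        (200, "2013-03-02 18:45:00"), (111, "2013-03-28 08:00:00"),
        (279, "2013-04-15 14:20:00"),
    ],
)

# Step 1 (CTE): one row per month with that month's gross.
# Step 2: for each month, add up the gross of all months up to and including it.
report = conn.execute(
    """
    WITH monthly AS (
        SELECT strftime('%Y-%m', time) AS ym, SUM(gross) AS gross
        FROM donations
        GROUP BY strftime('%Y-%m', time)
    )
    SELECT m.ym,
           m.gross,
           (SELECT SUM(m2.gross) FROM monthly m2 WHERE m2.ym <= m.ym)
               AS cumulative_gross
    FROM monthly m
    ORDER BY m.ym
    """
).fetchall()

for ym, gross, cumulative in report:
    print(ym, gross, cumulative)
```

Since no session variable is involved, the result does not depend on the order in which the server evaluates the grouped rows.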