Cross join with aggregate greatest value - mysql

I have the following table, let's call it Segments:
-------------------------------------
| SegmentStart | SegmentEnd | Value |
-------------------------------------
| 1 | 4 | 20 |
| 4 | 8 | 60 |
| 8 | 10 | 20 |
| 10 | 1000000 | 0 |
-------------------------------------
I am trying to join this table with itself, to obtain the following result set:
-------------------------------------
| SegmentStart | SegmentEnd | Value |
-------------------------------------
| 1 | 4 | 20 |
| 1 | 8 | 60 |
| 1 | 10 | 60 |
| 1 | 1000000 | 60 |
| 4 | 8 | 60 |
| 4 | 10 | 60 |
| 4 | 1000000 | 60 |
| 8 | 10 | 20 |
| 8 | 1000000 | 20 |
| 10 | 1000000 | 0 |
-------------------------------------
Basically, I need to join every row with every row that comes after it, then take the MAX() of Value over all the rows in between, endpoints included. Example: if I am joining row 1 with row 3, I need the MAX(Value) of all three rows.
What I have so far is the following query:
SELECT s1.SegmentStart, s2.SegmentEnd, GREATEST(s1.Value, s2.Value) as Value FROM Segments s1 CROSS JOIN Segments s2 ON s1.SegmentStart < s2.SegmentEnd
This query creates a table similar to the desired one, but some Value fields come out wrong (I've marked the rows that differ with !!):
-------------------------------------
| SegmentStart | SegmentEnd | Value |
-------------------------------------
| 1 | 4 | 20 |
| 1 | 8 | 60 |
| 1 | 10 | !20! |
| 1 | 1000000 | !20! |
| 4 | 8 | 60 |
| 4 | 10 | 60 |
| 4 | 1000000 | 60 |
| 8 | 10 | 20 |
| 8 | 1000000 | 20 |
| 10 | 1000000 | 0 |
-------------------------------------
The problem is with the GREATEST() function, because it only compares the two rows that are being joined (start-end 1-4 and 8-10), and not the whole interval (in this case, 3 rows: the ones with start-end 1-4, 4-8 and 8-10).
How should I modify this query, or what query should I use, to get my desired result?
Additional info that may help: the rows in the original table are always ordered by SegmentStart, and there can be no duplicate or missing values. Every interval between x and y appears only once in the table, with no overlaps and no gaps at all.
I am using MariaDB 10.3.13.

Something like this?
SELECT
s1.SegmentStart
, s2.SegmentEnd
, MAX(s.Value) as Value
FROM
Segments s1
INNER JOIN Segments s2 ON (
s2.SegmentEnd > s1.SegmentStart
)
INNER JOIN Segments s ON (
s.SegmentStart >= s1.SegmentStart
AND s.SegmentEnd <= s2.SegmentEnd
)
GROUP BY
s1.SegmentStart
, s2.SegmentEnd
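
For anyone who wants to try this out, here is a minimal, self-contained sketch of the same query against the sample data from the question. The column types are an assumption (the question only shows the values), and the ORDER BY is added only to make the output order predictable. The third copy of the table, s, matches every segment lying inside the combined interval from s1.SegmentStart to s2.SegmentEnd, which is why MAX() sees the whole range instead of just the two endpoints.

```sql
-- Assumed column types; the question only shows the data.
CREATE TABLE Segments (
    SegmentStart INT NOT NULL,
    SegmentEnd   INT NOT NULL,
    Value        INT NOT NULL
);

INSERT INTO Segments (SegmentStart, SegmentEnd, Value) VALUES
    (1, 4, 20),
    (4, 8, 60),
    (8, 10, 20),
    (10, 1000000, 0);

SELECT
    s1.SegmentStart,
    s2.SegmentEnd,
    MAX(s.Value) AS Value          -- maximum over every segment in the range
FROM Segments s1
INNER JOIN Segments s2
    ON s2.SegmentEnd > s1.SegmentStart       -- every later (or same) segment end
INNER JOIN Segments s
    ON  s.SegmentStart >= s1.SegmentStart    -- segments inside the combined
    AND s.SegmentEnd   <= s2.SegmentEnd      -- interval, endpoints included
GROUP BY s1.SegmentStart, s2.SegmentEnd
ORDER BY s1.SegmentStart, s2.SegmentEnd;
```

With the four sample rows this should return the ten rows of the desired result set, including 60 for the 1-10 and 1-1000000 intervals.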

Related

How to sum values of two tables and group by date

I am building a trading system where users need to know their running account balance by date for a specific user (uid) including how much they made from trading (results table) and how much they deposited or withdrew from their accounts (adjustments table).
Here is the sqlfiddle and tables: http://sqlfiddle.com/#!9/6bc9e4/1
Adjustments table:
+-------+-----+-----+--------+------------+
| adjid | aid | uid | amount | date |
+-------+-----+-----+--------+------------+
| 1 | 1 | 1 | 20 | 2019-08-18 |
| 2 | 1 | 1 | 50 | 2019-08-21 |
| 3 | 1 | 1 | 40 | 2019-08-21 |
| 4 | 1 | 1 | 10 | 2019-08-19 |
+-------+-----+-----+--------+------------+
Results table:
+-----+-----+-----+--------+-------+------------+
| tid | uid | aid | amount | taxes | date |
+-----+-----+-----+--------+-------+------------+
| 1 | 1 | 1 | 100 | 3 | 2019-08-19 |
| 2 | 1 | 1 | -50 | 1 | 2019-08-20 |
| 3 | 1 | 1 | 100 | 2 | 2019-08-21 |
| 4 | 1 | 1 | 100 | 2 | 2019-08-21 |
+-----+-----+-----+--------+-------+------------+
How do I get the results below for uid 1?
+--------------+------------+------------------+----------------+------------+
| ResultsTotal | TaxesTotal | AdjustmentsTotal | RunningBalance | Date |
+--------------+------------+------------------+----------------+------------+
| - | - | 20 | 20 | 2019-08-18 |
| 100 | 3 | 10 | 133 | 2019-08-19 |
| -50 | 1 | - | 84 | 2019-08-20 |
| 200 | 4 | 90 | 378 | 2019-08-21 |
+--------------+------------+------------------+----------------+------------+
Where RunningBalance is the current account balance for the particular user (uid).
Based on @Gabriel's answer, I came up with something like the following, but it gives me an empty balance and duplicate records:
SELECT SUM(ResultsTotal), SUM(TaxesTotal), SUM(AdjustmentsTotal), @runningtotal:= @runningtotal+SUM(ResultsTotal)+SUM(TaxesTotal)+SUM(AdjustmentsTotal) as Balance, date
FROM (
SELECT 0 AS ResultsTotal, 0 AS TaxesTotal, adjustments.amount AS AdjustmentsTotal, adjustments.date
FROM adjustments LEFT JOIN results ON (results.uid=adjustments.uid) WHERE adjustments.uid='1'
UNION ALL
SELECT results.amount AS ResultsTotal, taxes AS TaxesTotal, 0 as AdjustmentsTotal, results.date
FROM results LEFT JOIN adjustments ON (results.uid=adjustments.uid) WHERE results.uid='1'
) unionTable
GROUP BY DATE ORDER BY date
For what you are asking, you would want to union the rows from both tables and then group them by date; this should give the results you want. However, I recommend calculating the running balance outside of MySQL, since it adds some complexity to the query.
Weird things could start to happen if, for example, someone has already defined the @runningBalance variable earlier in the same session.
SELECT aggregateTable.*, @runningBalance := ifNULL(@runningBalance, 0) + TOTAL AS RunningBalance
FROM (
SELECT SUM(ResultsTotal), SUM(TaxesTotal), SUM(AdjustmentsTotal)
, SUM(ResultsTotal) + SUM(TaxesTotal) + SUM(AdjustmentsTotal) as TOTAL
, date
FROM (
SELECT 0 AS ResultsTotal, 0 AS TaxesTotal, amount AS AdjustmentsTotal, date
FROM adjustments
UNION ALL
SELECT amount AS ResultsTotal, taxes AS TaxesTotal, 0 as AdjustmentsTotal, date
FROM results
) unionTable
GROUP BY date
) aggregateTable
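
If the server supports window functions (MySQL 8.0+ or MariaDB 10.2+, which is an assumption about your setup), the running balance can be sketched without a user variable at all. This version also filters on uid = 1, as the question asks, and gives every column an explicit alias. It keeps the answer's choice of adding results, taxes and adjustments together.

```sql
SELECT
    SUM(ResultsTotal)     AS ResultsTotal,
    SUM(TaxesTotal)       AS TaxesTotal,
    SUM(AdjustmentsTotal) AS AdjustmentsTotal,
    -- Window SUM over the per-date totals, accumulated in date order
    SUM(SUM(ResultsTotal + TaxesTotal + AdjustmentsTotal))
        OVER (ORDER BY `date`) AS RunningBalance,
    `date`
FROM (
    SELECT 0 AS ResultsTotal, 0 AS TaxesTotal, amount AS AdjustmentsTotal, `date`
    FROM adjustments
    WHERE uid = 1
    UNION ALL
    SELECT amount AS ResultsTotal, taxes AS TaxesTotal, 0 AS AdjustmentsTotal, `date`
    FROM results
    WHERE uid = 1
) AS unionTable
GROUP BY `date`
ORDER BY `date`;
```

On the sample rows this should produce running balances of 20, 133, 84 and 378 for the four dates, with 0 where the question shows a dash.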

return a unique list from query result after removing duplicate rows from the table

I have two columns, product_id and r_store_id, which have the same values in several rows. The remaining columns have different values in those rows.
The duplicate rows with the same r_store_id and product_id exist because I have to add a new entry to this table every time. I want a list of unique rows, keeping the one with the latest update_dt
(refer to the DB table below).
id | m_store_id |r_store_id|product_id | amount |update_dt |
1 | 4 | 1 | 45 | 10 |18/03/5 |
2 | 4 | 1 | 45 | 100 |18/03/9 |
3 | 4 | 1 | 45 | 20 |18/03/4 |
4 | 5 | 2 | 49 | 10 |18/03/8 |
5 | 5 | 2 | 49 | 60 |18/03/2 |
6 | 9 | 3 | 45 | 19 |18/03/5 |
7 | 9 | 3 | 45 | 56 |18/03/3 |
My result should look like this:
id | m_store_id |r_store_id|product_id | amount |update_dt |
2 | 7 | 1 | 45 | 100 |18/03/9 |
4 | 5 | 2 | 49 | 10 |18/03/8 |
6 | 9 | 3 | 45 | 19 |18/03/5 |
I want to put this result in a list like this:
List<Sales> salesList = (List<Sales>) query.list();
I am not able to find an easy solution. Please help me with this!
We can select the chronologically most recent update for each combination of store and product, and then join back to the table to get all the columns:
select a.*
from mytable a
join (select m_store_id, r_store_id, product_id, max(update_dt) as maxdate
from mytable
group by 1,2,3) b
on a.m_store_id=b.m_store_id
and a.r_store_id=b.r_store_id
and a.product_id=b.product_id
and a.update_dt = b.maxdate;
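
As an alternative sketch, on MySQL 8.0+ or MariaDB 10.2+ the same "latest row per group" idea can be written with ROW_NUMBER(). This assumes update_dt is stored as a real DATE or DATETIME column (the table in the question only shows a formatted value) and reuses the mytable name from the query above. It partitions by r_store_id and product_id, the two columns the question names as defining a duplicate.

```sql
SELECT id, m_store_id, r_store_id, product_id, amount, update_dt
FROM (
    SELECT t.*,
           -- Number the rows of each (store, product) group, newest first
           ROW_NUMBER() OVER (
               PARTITION BY r_store_id, product_id
               ORDER BY update_dt DESC
           ) AS rn
    FROM mytable AS t
) AS ranked
WHERE rn = 1;   -- keep only the newest row of each group
```

Unlike the join on MAX(update_dt), this returns exactly one row per group even when two rows share the same latest update_dt.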

Get Sum, Multiple Group By with Filter

I have a table with the following columns, from which I am trying to create a view for a report. I need to get the sum of completed hours for a particular class, but with a specific filter:
| PK_CLASS_DAYS_ID | FK_MAIN_ID | FK_CLASS_ID | CLASS_DAY | OUTCOME | CLASS_DATE | HOURS |
|------------------|------------|-------------|-----------|---------|------------|-------|
| 1 | 27452 | 137 | 1 | *15 | 2015-11-15 | 8 |
| 2 | 27452 | 137 | 2 | *15 | 2015-11-16 | 8 |
| 3 | 27452 | 137 | 4 | *15 | 2015-11-18 | 8 |
| 4 | 27452 | 137 | 5 | BS15 | 2015-11-19 | 8 |
| 5 | 27452 | 2 | 1 | *16 | 2001-01-01 | 8 |
| 6 | 27452 | 48 | 1 | *16 | 2016-01-12 | 8 |
| 7 | 27452 | 48 | 2 | *16 | 2016-02-27 | 4 |
| 8 | 27452 | 2 | 1 | *17 | 2017-07-01 | 8 |
| 9 | 27452 | 137 | 1 | *16 | 2016-07-16 | 8 |
I need to find the SUM of hours completed for each class (FK_CLASS_ID) for every student in my table (currently filtered to ID 27452 for testing purposes) while applying the following filters for each class (FK_CLASS_ID):
(1) CLASS_DAY must be distinct
(2) OUTCOME must begin with "*"
(3) CLASS_DATE must be the most recent, while still satisfying the previous two conditions
The resulting view should be as follows:
| PK_CLASS_DAYS_ID | FK_MAIN_ID | FK_CLASS_ID | Hrs |
|------------------|-------------|--------------|------|
| 1 | 27452 | 137 | 32 |
| 2 | 27452 | 2 | 8 |
| 3 | 27452 | 48 | 12 |
The furthest I've gotten in trying to accomplish this is the following select statement:
SELECT
t1.CLASS,
SUM(class_hours) as Hrs,
GROUP_CONCAT('D',classes_days.class_day) as DaysList,
main.FULLNAME
FROM
classes t1
INNER JOIN classes_days ON classes_days.FK_CLASS_ID = t1.CLASS_ID
INNER JOIN main ON main.PK_MAIN_ID = classes_days.FK_MAIN_ID
WHERE
main.PK_MAIN_ID = 27452
GROUP BY FK_CLASS_ID
ORDER BY CLASS
To accomplish this, do the filtering and summation in a subquery, then join your other tables to that derived result on the criteria you need.
Based on your provided query and desired results, it should look like this:
SELECT
`t1`.`CLASS`,
SUM(`class_hours`.`HOURS`) AS `Hrs`,
GROUP_CONCAT('D', `class_hours`.`CLASS_DAY` ORDER BY `class_hours`.`CLASS_DAY`) AS `DaysList`,
`main`.`FULLNAME`
FROM `classes` AS `t1`
INNER JOIN (
#Filter the total hours by student, class, and day
SELECT `class_dates`.`FK_MAIN_ID`, `class_dates`.`CLASS_DAY`, `class_dates`.`FK_CLASS_ID`, SUM(`class_dates`.`HOURS`) as `HOURS`
FROM (
#Filter Most Recent Days beginning with star, by most recent date
SELECT `classes_days`.*
FROM `classes_days`
WHERE `classes_days`.`OUTCOME` LIKE '*%'
ORDER BY `CLASS_DATE` DESC
) AS `class_dates`
GROUP BY `class_dates`.`FK_MAIN_ID`, `class_dates`.`CLASS_DAY`, `class_dates`.`FK_CLASS_ID`
) AS `class_hours`
ON `class_hours`.`FK_CLASS_ID` = `t1`.`CLASS_ID`
INNER JOIN `main`
ON `main`.`PK_MAIN_ID` = `class_hours`.`FK_MAIN_ID`
GROUP BY `class_hours`.`FK_MAIN_ID`, `class_hours`.`FK_CLASS_ID`
ORDER BY `FULLNAME`, `CLASS`;
Resulting In:
| CLASS | Hrs | DaysList | FULLNAME |
|---------|-----|-------------|----------|
| History | 8 | D1 | Joe |
| Math | 32 | D1,D2,D4 | Joe |
| Science | 12 | D1,D2 | Joe |
| Math | 10 | D1,D2 | Mike |
Example: http://sqlfiddle.com/#!9/d5828/1
In the fiddle, the top query is your original query and the bottom query is the subquery join example. I removed the PK_MAIN_ID criterion to show it working against multiple students.
Do keep in mind that combining GROUP BY with ORDER BY in MySQL does not always yield the rows you expect, so the filtering should be done in a subquery, as demonstrated by the joined subquery that selects the most recent dates whose OUTCOME begins with *.
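
If window functions are available (MySQL 8.0+ or MariaDB 10.2+, which is an assumption), the three conditions from the question can also be sketched more literally: keep only starred outcomes, keep the most recent row per student, class and class day, then sum. The table and column names are the ones used in the question; the aliases d, latest and rn are made up for this example.

```sql
SELECT FK_MAIN_ID, FK_CLASS_ID, SUM(HOURS) AS Hrs
FROM (
    SELECT d.*,
           -- Newest row first within each (student, class, day)
           ROW_NUMBER() OVER (
               PARTITION BY FK_MAIN_ID, FK_CLASS_ID, CLASS_DAY
               ORDER BY CLASS_DATE DESC
           ) AS rn
    FROM classes_days AS d
    WHERE d.OUTCOME LIKE '*%'      -- condition (2): outcome begins with *
) AS latest
WHERE rn = 1                        -- conditions (1) and (3): one row per day, the newest
GROUP BY FK_MAIN_ID, FK_CLASS_ID;
```

With the nine sample rows from the question this should give 24 hours for class 137 (day 1 is counted once, from its 2016-07-16 row), 8 for class 2 and 12 for class 48. The question's expected table lists 32 for class 137, which would require counting day 1 twice, so check which behaviour you actually want.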

MySQL Subselect Issue with two tables and aggregate functions into single query

I have 2 tables
Transaction table
+-----+------------+------------+
| TID | CampaignID | DATE       |
+-----+------------+------------+
| 1   | 5          | 2016-01-01 |
| 2   | 5          | 2016-01-01 |
| 3   | 2          | 2016-01-01 |
| 4   | 5          | 2016-01-01 |
| 5   | 1          | 2016-01-01 |
| 6   | 1          | 2016-02-02 |
| 7   | 3          | 2016-02-02 |
| 8   | 3          | 2016-02-02 |
| 9   | 5          | 2016-02-02 |
| 10  | 4          | 2016-02-02 |
+-----+------------+------------+
Campaign Table
+------------+---------------------+----------------+
| CampaignID | DailyMaxImpressions | CampaignActive |
+------------+---------------------+----------------+
| 1          | 5                   | Y              |
| 2          | 5                   | Y              |
| 3          | 5                   | Y              |
| 4          | 5                   | Y              |
| 5          | 1                   | Y              |
+------------+---------------------+----------------+
What I am trying to do is get a single random campaign where the count in the transaction table is less than the daily max impressions in the campaign table. I might also be passing a date as part of the query for the transaction table.
So for CampaignID 1 there must be 4 transactions or fewer in the transaction table, and CampaignActive must be "Y".
Any help would be appreciated if this can be done in a single statement (MySQL).
Thanks in advance,
Jeff Godstein
This should get it for you. The basic query selects each campaign that is active. The inner query pre-aggregates the transaction count per campaign for the date in question. The LEFT JOIN then allows a campaign to be returned either when it does NOT exist in the subquery at all (no transactions that day) or when it DOES exist but its count is less than the maximum allowed for that date. The ORDER BY RAND() shuffles the qualifying campaigns, and LIMIT 1 keeps a single one.
SELECT
c.CampaignID
from
Campaign c
LEFT JOIN
( select
t1.CampaignID,
count(*) as CampCount
from
Transaction t1
where
t1.Date = YourDateParameterValue
group by
t1.CampaignID ) as t
ON c.CampaignID = t.CampaignID
where
c.CampaignActive = 'Y'
AND ( t.CampaignID IS NULL
OR t.CampCount < c.DailyMaxImpressions )
order by
RAND()
limit 1

How to get this specific user rankings query in mysql?

I've got a table tbl_items in my user database, and I want to rank users by their quantity of a particular item with a certain id (514). I have test data on my dev environment with this set of data:
mysql> select * from tbl_items where classid=514;
+---------+---------+----------+
| ownerId | classId | quantity |
+---------+---------+----------+
| 1 | 514 | 3 |
| 2 | 514 | 5 |
| 3 | 514 | 11 |
| 4 | 514 | 46 |
| 5 | 514 | 57 |
| 6 | 514 | 6 |
| 7 | 514 | 3 |
| 8 | 514 | 27 |
| 10 | 514 | 2 |
| 11 | 514 | 73 |
| 12 | 514 | 18 |
| 13 | 514 | 31 |
+---------+---------+----------+
12 rows in set (0.00 sec)
So far so good :) I wrote the following query:
set @row=0;
select a.*, @row:=@row+1 as rank
from (select a.ownerid,a.quantity from tbl_items a
where a.classid=514) a order by quantity desc;
+---------+----------+------+
| ownerid | quantity | rank |
+---------+----------+------+
| 11 | 73 | 1 |
| 5 | 57 | 2 |
| 4 | 46 | 3 |
| 13 | 31 | 4 |
| 8 | 27 | 5 |
| 12 | 18 | 6 |
| 3 | 11 | 7 |
| 6 | 6 | 8 |
| 2 | 5 | 9 |
| 7 | 3 | 10 |
| 1 | 3 | 11 |
| 10 | 2 | 12 |
+---------+----------+------+
12 rows in set (0.00 sec)
That ranks the users correctly. However, in a table with lots of records, I need to do the following:
1) Be able to get a small portion of the list around where the user's ranking actually resides, i.e. something that gets me the surrounding records while preserving the overall rank.
I tried to do this by setting a user variable to the ranking of the current user and by using offset and limit, but I couldn't preserve the overall ranking.
This should get me something like the following (for instance, ownerId=2 and a surrounding limit of 5):
+---------+----------+------+
| ownerid | quantity | rank |
+---------+----------+------+
| 3 | 11 | 7 |
| 6 | 6 | 8 |
| 2 | 5 | 9 | --> ownerId=2
| 7 | 3 | 10 |
| 1 | 3 | 11 |
+---------+----------+------+
5 rows in set (0.00 sec)
2) I'd also need another query (preferably a single query) that gets me the top 3 places plus the ranking of a particular user with a certain id, no matter whether he's among the top 3 places or not. I couldn't get this to work either.
It would look like the following (for instance ownerId=2 again):
+---------+----------+------+
| ownerid | quantity | rank |
+---------+----------+------+
| 11 | 73 | 1 |
| 5 | 57 | 2 |
| 4 | 46 | 3 |
| 2 | 5 | 9 | --> ownerId=2
+---------+----------+------+
4 rows in set (0.00 sec)
I'm also a bit concerned about the performance of these queries on a table with millions of records...
Hope someone helps :)
1) 5 entries around a given id.
set @row=0;
set @rk2=-1;
set @id=2;
select b.* from (
select a.*, @row:=@row+1 as rank, if(a.ownerid=@id, @rk2:=@row, -1) as rank2
from (
select a.ownerid,a.quantity
from tbl_items a
where a.classid=514) a
order by quantity desc) b
where b.rank > @rk2 - 3
limit 5;
Note that you'll get an extra column, rank2; you probably want to filter it out by listing the columns explicitly instead of using b.*. It may also be possible with a HAVING clause rather than an extra level of nesting.
2) 3 top ranked entries + 1 specific id
set @row=0;
set @id=2;
select b.* from (
select a.*, @row:=@row+1 as rank
from (
select a.ownerid,a.quantity
from tbl_items a
where a.classid=514) a
order by quantity desc) b
where b.rank < 4 or b.ownerid=@id;
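
On MySQL 8.0+ or MariaDB 10.2+ (an assumption about the server version), both requests can be sketched with ROW_NUMBER() and a common table expression instead of user variables. The alias rnk is used because RANK is a reserved word in MySQL 8.0, and the ownerId DESC tie-breaker is only an assumption chosen to reproduce the order shown in the question for the two users with quantity 3. The owner id 2 is hard-coded here for illustration.

```sql
-- 1) Five entries around a given owner (two above, two below).
WITH ranked AS (
    SELECT ownerId,
           quantity,
           ROW_NUMBER() OVER (ORDER BY quantity DESC, ownerId DESC) AS rnk
    FROM tbl_items
    WHERE classId = 514
)
SELECT r.ownerId, r.quantity, r.rnk
FROM ranked AS r
JOIN ranked AS me
    ON me.ownerId = 2                          -- the owner we centre on
WHERE r.rnk BETWEEN me.rnk - 2 AND me.rnk + 2
ORDER BY r.rnk;

-- 2) Top 3 plus the given owner, wherever that owner is ranked.
WITH ranked AS (
    SELECT ownerId,
           quantity,
           ROW_NUMBER() OVER (ORDER BY quantity DESC, ownerId DESC) AS rnk
    FROM tbl_items
    WHERE classId = 514
)
SELECT ownerId, quantity, rnk
FROM ranked
WHERE rnk <= 3 OR ownerId = 2
ORDER BY rnk;
```

With the twelve sample rows, the first query should return ranks 7 to 11 (owners 3, 6, 2, 7, 1) and the second should return ranks 1, 2, 3 and 9. Both still rank every row for classId 514, so on a table with millions of records an index that covers classId and quantity is likely to matter.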