Get cumulative sum based on unique item in MySQL - mysql

I'm using MySQL and I'm trying to write a stored procedure query that joins two tables and produces a running sum of a particular column. Instead of the usual continuous running sum, I would like one that resets each time the item changes.
I hope what I'm requesting is clearer after my reproducible sample.
Table 1
CREATE TABLE `table1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`Date` date DEFAULT NULL,
`Item` varchar(20) DEFAULT NULL,
`Quantity` decimal(5,3) DEFAULT NULL,
`Volume` decimal(20,2) DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO table1 (Date, Item, Quantity, Volume)
VALUES ('2022-04-25', 'Ball', 5, 30),
('2022-04-25', 'Balloon', 3, 14),
('2022-04-25', 'Bag', 2, 7),
('2022-04-24', 'Ball', 7, 20),
('2022-04-24', 'Balloon', 1, 9),
('2022-04-24', 'Bag', 4, 18),
('2022-04-23', 'Ball', 9, 53),
('2022-04-23', 'Balloon', 4, 25),
('2022-04-23', 'Bag', 11, 12),
('2022-04-22', 'Ball', 13, 8);
Table 2
CREATE TABLE `table2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`Date` date DEFAULT NULL,
`Item` varchar(20) DEFAULT NULL,
`Size (inches)` decimal(10,2) DEFAULT NULL,
`density` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO table2 (Date, Item, `Size (inches)`, density)
VALUES ('2022-04-25', 'Ball', 15, 20),
('2022-04-25', 'Balloon', 13, 34),
('2022-04-25', 'Bag', 12, 17),
('2022-04-24', 'Ball', 17, 50),
('2022-04-24', 'Balloon', 11, 19),
('2022-04-24', 'Bag', 14, 8),
('2022-04-23', 'Ball', 19, 3),
('2022-04-23', 'Balloon', 14, 5),
('2022-04-23', 'Bag', 31, 2),
('2022-04-22', 'Ball', 42, 18);
This is the stored procedure I have at the moment:
DELIMITER $$
CREATE DEFINER=`localhost`@`%` PROCEDURE `procedure1`()
BEGIN
DROP TABLE IF EXISTS `procedure_table`;
SET @running_total:=0;
CREATE TABLE `procedure_table` AS SELECT * FROM (
SELECT i.`Item`,
i.`Date`,
ROUND(i.`Volume`/i.`Size (inches)`,2) as `Volume/Size`,
ROUND(i.`Quantity`/i.`Volume`,2) as `Quantity x Volume`,
i.`Size (inches)` as `Size (inches)`,
i.`density` as density,
i.`Quantity`,
ROUND(i.Volume) as `Oil Volume`,
(@running_total := @running_total + IFNULL(i.`Volume`,0)) AS `Cumulative Volume`
FROM (SELECT `table1`.*,
`table2`.`Size (inches)`,
`table2`.`density`
FROM `table1`
LEFT JOIN `table2`
ON `table1`.Item = `table2`.`Item`
AND Month(`table1`.Date) = Month(`table2`.Date)
AND Year(`table1`.Date) = Year(`table2`.Date)
ORDER BY `table1`.Item,Date) as i) u;
END$$
DELIMITER ;
When I run this I get a table that looks like this:
What I want instead is:
I've tried using PARTITION BY but haven't been able to get it to work in MySQL.
How do I get my desired output?
Edit - Output without problematic column
SELECT * FROM (
SELECT i.`Item`,
i.`Date`,
ROUND(i.`Volume`/i.`Size (inches)`,2) as `Volume/Size`,
ROUND(i.`Quantity`/i.`Volume`,2) as `Quantity x Volume`,
i.`Size (inches)` as `Size (inches)`,
i.`density` as density,
i.`Quantity`,
ROUND(i.Volume) as `Oil Volume`
FROM (SELECT `table1`.*,
`table2`.`Size (inches)`,
`table2`.`density`
FROM `table1`
LEFT JOIN `table2`
ON `table1`.Item = `table2`.`Item`
AND Month(`table1`.Date) = Month(`table2`.Date)
AND Year(`table1`.Date) = Year(`table2`.Date)
ORDER BY `table1`.Item,Date) as i) u
ORDER BY Item;

Output without problematic column
SELECT *
FROM ( SELECT i.`Item`,
i.`Date`,
ROUND(i.`Volume`/i.`Size (inches)`,2) as `Volume/Size`,
ROUND(i.`Quantity`/i.`Volume`,2) as `Quantity x Volume`,
i.`Size (inches)` as `Size (inches)`,
i.`density` as density,
i.`Quantity`,
ROUND(i.Volume) as `Oil Volume`
FROM ( SELECT `table1`.*,
`table2`.`Size (inches)`,
`table2`.`density`
FROM `table1`
LEFT JOIN `table2` ON `table1`.Item = `table2`.`Item`
AND Month(`table1`.Date) = Month(`table2`.Date)
AND Year(`table1`.Date) = Year(`table2`.Date)
ORDER BY `table1`.Item,Date
) as i
) u
ORDER BY Item;
The ordering is not deterministic. Looking at your desired output, I see that the secondary sort is on the Date output column. So, for correct row ordering and a correct cumulative sum, the ORDER BY must be expanded to ORDER BY Item, `Date`;.
And the query will be:
SELECT *,
@sum := CASE WHEN Item = @item
THEN @sum + `Oil Volume`
ELSE `Oil Volume`
END AS `cumulative sum`,
@item := Item AS Item
FROM ( SELECT i.`Item`,
i.`Date`,
ROUND(i.`Volume`/i.`Size (inches)`,2) as `Volume/Size`,
ROUND(i.`Quantity`/i.`Volume`,2) as `Quantity x Volume`,
i.`Size (inches)` as `Size (inches)`,
i.`density` as density,
i.`Quantity`,
ROUND(i.Volume) as `Oil Volume`
FROM ( SELECT `table1`.*,
`table2`.`Size (inches)`,
`table2`.`density`
FROM `table1`
LEFT JOIN `table2` ON `table1`.Item = `table2`.`Item`
AND Month(`table1`.Date) = Month(`table2`.Date)
AND Year(`table1`.Date) = Year(`table2`.Date)
ORDER BY `table1`.Item,Date
) as i
) u
CROSS JOIN ( SELECT @item := '', @sum:=0 ) init_variables
ORDER BY Item, `Date`;
The first additional column either adds the current Oil Volume to the previous sum or takes only the current value, depending on whether the item is the same as in the previous row. The second additional column simply stores the current Item value for use when the next row is evaluated. These columns can be moved within the output column list, but their relative position must be preserved.
PS. If the (Item, `Date`) pair is not unique, the row ordering is again not deterministic. In that case you must either group in the subquery so that the pair becomes unique, or extend the ORDER BY expression further.
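On MySQL 8.0 or later, where window functions are available, the same per-item reset can also be written with SUM() OVER (PARTITION BY ...). The following is only a sketch, not part of the answer above: it runs the query through Python's sqlite3 module (which needs SQLite 3.25+ for window functions) against a trimmed, hypothetical copy of table1 that keeps only Date, Item and Volume. The window clause itself is standard SQL and should read the same in MySQL 8.0+.

```python
import sqlite3

# Hypothetical, trimmed-down copy of table1 from the question (Date, Item, Volume only).
rows = [
    ("2022-04-25", "Ball", 30), ("2022-04-25", "Balloon", 14), ("2022-04-25", "Bag", 7),
    ("2022-04-24", "Ball", 20), ("2022-04-24", "Balloon", 9), ("2022-04-24", "Bag", 18),
    ("2022-04-23", "Ball", 53), ("2022-04-23", "Balloon", 25), ("2022-04-23", "Bag", 12),
    ("2022-04-22", "Ball", 8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Date TEXT, Item TEXT, Volume REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

# PARTITION BY Item restarts the sum for every item;
# ORDER BY Date fixes the order in which volumes are accumulated.
query = """
SELECT Item, Date, Volume,
       SUM(Volume) OVER (PARTITION BY Item ORDER BY Date) AS cumulative_volume
FROM table1
ORDER BY Item, Date
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

With this sample the total restarts at each item's first date; for Ball it runs 8, 61, 81, 111. As with the variable-based query, the result is only well defined while (Item, Date) is unique.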

Related

MYSQL 5.6 get latest data of each user

My database table is shown below. I need to get the latest mark of each student. The latest entry is the row with the maximum udate and, within that date, the maximum oder. (oder is incremented by one for each entry with the same date.)
In my example, I have two students, Mujeeb and Zakariya, and two subjects, ENGLISH and MATHS. I need to get the latest mark of each student for each subject. My expected result is as follows:
My sample data is
DROP TABLE IF EXISTS `students`;
CREATE TABLE IF NOT EXISTS `students` (
`uid` int(11) NOT NULL AUTO_INCREMENT,
`udate` date NOT NULL,
`oder` int(11) NOT NULL,
`name` varchar(20) NOT NULL,
`Subject` varchar(20) NOT NULL,
`mark` int(11) NOT NULL,
PRIMARY KEY (`uid`)
) ENGINE=MyISAM AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;
INSERT INTO `students` (`uid`, `udate`, `oder`, `name`, `Subject`, `mark`) VALUES
(1, '2021-08-01', 1, 'Mujeeb', 'ENGLISH', 10),
(2, '2021-08-01', 1, 'Zakariya', 'ENGLISH', 20),
(3, '2021-08-10', 2, 'Mujeeb', 'ENGLISH', 50),
(4, '2021-08-11', 2, 'Zakariya', 'ENGLISH', 60),
(5, '2021-08-02', 1, 'Mujeeb', 'ENGLISH', 100),
(6, '2021-08-03', 1, 'Zakariya', 'ENGLISH', 110),
(7, '2021-08-10', 1, 'Mujeeb', 'ENGLISH', 500),
(8, '2021-08-11', 1, 'Zakariya', 'ENGLISH', 600),
(9, '2021-08-01', 2, 'Mujeeb', 'MATHS', 100),
(10, '2021-08-01', 2, 'Zakariya', 'MATHS', 75),
(11, '2021-08-10', 3, 'Mujeeb', 'MATHS', 50),
(12, '2021-08-11', 3, 'Zakariya', 'MATHS', 60);
Use NOT EXISTS:
SELECT s1.*
FROM students s1
WHERE NOT EXISTS (
SELECT 1
FROM students s2
WHERE s2.name = s1.name AND s2.Subject = s1.Subject
AND (s2.udate > s1.udate OR (s2.udate = s1.udate AND s2.oder > s1.oder))
);
Or with a correlated subquery in the WHERE clause:
SELECT s1.*
FROM students s1
WHERE s1.uid = (
SELECT s2.uid
FROM students s2
WHERE s2.name = s1.name AND s2.Subject = s1.Subject
ORDER BY s2.udate DESC, s2.oder DESC LIMIT 1
);
See the demo.
Since ROW_NUMBER() is not available in older versions of MySQL, this solution emulates it with user variables.
-- MySQL (v5.6)
SELECT p.uid, p.udate, p.oder, p.name, p.Subject, p.mark
FROM (SELECT @row_no := IF((@prev_val = t.name && @prev_val1 = t.Subject), @row_no + 1, 1) AS row_number
, @prev_val := t.name AS name
, @prev_val1 := t.Subject AS Subject
, t.mark
, t.oder
, t.uid
, t.udate
FROM students t,
(SELECT @row_no := 0) x,
(SELECT @prev_val := '') y,
(SELECT @prev_val1 := '') z
ORDER BY t.name, t.Subject, t.udate DESC, t.oder DESC ) p
WHERE p.row_number = 1
ORDER BY p.name, p.Subject;
Please check the url http://sqlfiddle.com/#!9/b5befe/18
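For comparison, on MySQL 8.0+ the user-variable emulation can be replaced by a real ROW_NUMBER() window function. The sketch below is an illustration only, assuming SQLite 3.25+ through Python's sqlite3 module as a stand-in for a MySQL server; the table and column names (including oder) are taken from the question.

```python
import sqlite3

# Sample rows copied from the question's students table.
students = [
    (1, "2021-08-01", 1, "Mujeeb", "ENGLISH", 10),
    (2, "2021-08-01", 1, "Zakariya", "ENGLISH", 20),
    (3, "2021-08-10", 2, "Mujeeb", "ENGLISH", 50),
    (4, "2021-08-11", 2, "Zakariya", "ENGLISH", 60),
    (5, "2021-08-02", 1, "Mujeeb", "ENGLISH", 100),
    (6, "2021-08-03", 1, "Zakariya", "ENGLISH", 110),
    (7, "2021-08-10", 1, "Mujeeb", "ENGLISH", 500),
    (8, "2021-08-11", 1, "Zakariya", "ENGLISH", 600),
    (9, "2021-08-01", 2, "Mujeeb", "MATHS", 100),
    (10, "2021-08-01", 2, "Zakariya", "MATHS", 75),
    (11, "2021-08-10", 3, "Mujeeb", "MATHS", 50),
    (12, "2021-08-11", 3, "Zakariya", "MATHS", 60),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (uid INTEGER PRIMARY KEY, udate TEXT, oder INTEGER, "
    "name TEXT, Subject TEXT, mark INTEGER)"
)
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?, ?)", students)

# Number the rows of each (name, Subject) group from newest to oldest,
# then keep only the newest row (rn = 1).
query = """
SELECT uid, name, Subject, mark
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY name, Subject
                              ORDER BY udate DESC, oder DESC) AS rn
    FROM students s
) ranked
WHERE rn = 1
ORDER BY name, Subject
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

On the sample data this keeps uid 3, 11, 4 and 12, i.e. one row per student and subject.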

MySQL Attendance IN & OUT columns with correct times

I have a database for attendance. It works fine as long as a person's shift does not span two dates. I want to use an IN and OUT system for the records, but I do not know how to do the final step, and what I found on the forum either does not work on MySQL or I am doing something wrong.
This is my database; the queries are below.
BTW, the database was built using phpMyAdmin and MySQL Workbench.
CREATE TABLE `entries` (
`indexing` int(11) NOT NULL,
`emp_id` int(5) NOT NULL,
`Date` datetime DEFAULT current_timestamp() ) ;
INSERT INTO `entries` (`indexing`, `emp_id`, `Date`) VALUES
(61, 1, '2020-07-07 05:41:36'),
(62, 1, '2020-07-07 05:44:21'),
(63, 2, '2020-07-07 05:44:36'),
(64, 3, '2020-07-07 05:49:23'),
(65, 2, '2020-07-07 05:49:39'),
(66, 3, '2020-07-07 05:50:00'),
(67, 4, '2020-07-07 09:56:51'),
(68, 5, '2020-07-07 09:57:13'),
(69, 3, '2020-07-07 09:57:18'),
(70, 2, '2020-07-07 09:57:28'),
(71, 1, '2020-07-07 09:57:42'),
(72, 4, '2020-07-07 09:57:49'),
(73, 5, '2020-07-07 09:59:38'),
(74, 1, '2020-07-08 05:59:42'),
(75, 2, '2020-07-08 06:00:05'),
(76, 3, '2020-07-08 06:38:20'),
(77, 1, '2020-07-08 09:47:43'),
(78, 4, '2020-07-08 09:56:14'),
(79, 5, '2020-07-08 09:56:47'),
(80, 1, '2020-07-08 09:56:59'),
(81, 3, '2020-07-08 09:57:34'),
(82, 2, '2020-07-08 09:58:07'),
(83, 4, '2020-07-08 09:58:11'),
(84, 5, '2020-07-08 09:59:20'),
(85, 5, '2020-07-08 09:59:50'),
(86, 4, '2020-07-08 11:08:36'),
(87, 3, '2020-07-08 11:09:30');
CREATE TABLE `user` (
`emp_id` int(5) NOT NULL,
`Name` varchar(50) NOT NULL,
`company` set('First','second') NOT NULL DEFAULT 'First',
`department` set('Outbound','Inbound','UE','Returns','QC','Cleaner','Admin','IT Technician','Supervisor','Manager') NOT NULL,
`driver` set('PPT','VNA','HLOP','CB','PPT VNA HLOP','PPT HLOP','PPT CB') DEFAULT NULL
) ;
INSERT INTO `user` (`emp_id`, `Name`, `company`, `department`, `driver`) VALUES
(1, 'Micinka', 'second', 'IT Technician', ''),
(2, 'Dusbica', 'First', 'IT Technician', ''),
(3, 'Klaudocka', 'First', 'Returns', ''),
(4, 'Patrycginis', 'First', 'Cleaner', ''),
(5, 'Stuistow', 'First', 'Cleaner', '');
--
ALTER TABLE `entries`
ADD PRIMARY KEY (`indexing`),
ADD KEY `emp_id` (`emp_id`);
--
-- Indexes for table `user`
--
ALTER TABLE `user`
ADD PRIMARY KEY (`emp_id`);
-- Constraints for table `entries`
--
ALTER TABLE `entries`
ADD CONSTRAINT `entries_ibfk_1` FOREIGN KEY (`emp_id`) REFERENCES `user` (`emp_id`) ON DELETE CASCADE;
COMMIT;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
These are the queries. The last one shows how I want the table to look, but with correct IN and OUT times; at the moment both are the same.
select entries.emp_id, entries.Date, dense_rank() over (partition by entries.emp_id order by entries.indexing) % 2 AS 'IN and OUT' from entries;
drop table report_inout;
create view report_inout as select entries.emp_id, entries.Date,
CASE WHEN DENSE_RANK() OVER (PARTITION BY entries.emp_id ORDER BY entries.Date) % 2 = 0
THEN 'OUT' ELSE 'IN' END AS `IN and OUT`
FROM entries
ORDER BY
entries.indexing;
select date_format(report_inout.Date,'%d/%M/%Y') as `Date`,user.Name, time_format(report_inout.Date,'%H:%i:%s') as `IN`, time_format(report_inout.Date,'%H:%i:%s') as `OUT`,
user.company as Company,user.department as Department from report_inout
join user on user.emp_id = report_inout.emp_id
group by user.Name, report_inout.`In and Out`;
These are the results from the queries I posted.
emp_id;"Date";"IN and OUT"
1;"2020-07-07 05:41:36";"IN"
1;"2020-07-07 05:44:21";"OUT"
2;"2020-07-07 05:44:36";"IN"
3;"2020-07-07 05:49:23";"IN"
2;"2020-07-07 05:49:39";"OUT"
3;"2020-07-07 05:50:00";"OUT"
4;"2020-07-07 09:56:51";"IN"
5;"2020-07-07 09:57:13";"IN"
3;"2020-07-07 09:57:18";"IN"
2;"2020-07-07 09:57:28";"IN"
1;"2020-07-07 09:57:42";"IN"
4;"2020-07-07 09:57:49";"OUT"
5;"2020-07-07 09:59:38";"OUT"
1;"2020-07-08 05:59:42";"OUT"
2;"2020-07-08 06:00:05";"OUT"
3;"2020-07-08 06:38:20";"OUT"
1;"2020-07-08 09:47:43";"IN"
4;"2020-07-08 09:56:14";"IN"
5;"2020-07-08 09:56:47";"IN"
1;"2020-07-08 09:56:59";"OUT"
3;"2020-07-08 09:57:34";"IN"
2;"2020-07-08 09:58:07";"IN"
4;"2020-07-08 09:58:11";"OUT"
5;"2020-07-08 09:59:20";"OUT"
5;"2020-07-08 09:59:50";"IN"
The last query returns this, but it always has the same time in IN and OUT:
Date;"Name";"IN";"OUT";"Company";"Department"
08/July/2020;"Dusbica";"09:58:07";"09:58:07";"First";"IT Technician"
08/July/2020;"Dusbica";"06:00:05";"06:00:05";"First";"IT Technician"
08/July/2020;"Klaudocka";"09:57:34";"09:57:34";"First";"Returns"
08/July/2020;"Klaudocka";"11:09:30";"11:09:30";"First";"Returns"
08/July/2020;"Micinka";"09:47:43";"09:47:43";"second";"IT Technician"
08/July/2020;"Micinka";"09:56:59";"09:56:59";"second";"IT Technician"
08/July/2020;"Patrycginis";"11:08:36";"11:08:36";"First";"Cleaner"
08/July/2020;"Patrycginis";"09:58:11";"09:58:11";"First";"Cleaner"
08/July/2020;"Stuistow";"09:59:50";"09:59:50";"First";"Cleaner"
08/July/2020;"Stuistow";"09:59:20";"09:59:20";"First";"Cleaner"
Assuming that:
The first record for each emp_id is an IN event
No events are missing
WITH cte AS ( SELECT emp_id, `Date`,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY `Date`) - 1 rn
FROM entries )
SELECT t1.emp_id, user.name, t1.`Date` in_date, t2.`Date` out_date
FROM user
JOIN cte t1 ON user.emp_id = t1.emp_id
LEFT JOIN cte t2 ON t1.emp_id = t2.emp_id
AND t1.rn DIV 2 = t2.rn DIV 2
AND t2.rn MOD 2
WHERE NOT t1.rn MOD 2
ORDER BY emp_id, in_date;
Idea.
We enumerate all rows for each employee separately, starting from zero. So the first IN is 0, the first OUT is 1, the second IN is 2, and so on.
You can see that matching IN and OUT events give the same result when their numbers are integer-divided by 2, and the remainder is 0 for IN and 1 for OUT.
This is enough for a correct join.
The second copy of the CTE is joined with a LEFT JOIN because the last IN row may have no matching OUT row, which means the employee is still on site. In that case the final row contains NULL in the out_date column.
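The numbering idea can be checked outside the database. The snippet below is a minimal Python sketch of the same pairing rule, not a replacement for the query: pair_in_out is a hypothetical helper, and the rows are the events of employees 1 and 4 from the question's entries table. It relies on the same two assumptions (first event is IN, no events missing).

```python
from collections import defaultdict

# (emp_id, timestamp) rows for employees 1 and 4, copied from the entries table.
entries = [
    (1, "2020-07-07 05:41:36"), (1, "2020-07-07 05:44:21"),
    (4, "2020-07-07 09:56:51"), (1, "2020-07-07 09:57:42"),
    (4, "2020-07-07 09:57:49"), (1, "2020-07-08 05:59:42"),
    (1, "2020-07-08 09:47:43"), (4, "2020-07-08 09:56:14"),
    (1, "2020-07-08 09:56:59"), (4, "2020-07-08 09:58:11"),
    (4, "2020-07-08 11:08:36"),
]


def pair_in_out(rows):
    """Pair every IN event with the OUT event that shares its rn // 2."""
    by_emp = defaultdict(list)
    for emp_id, ts in rows:
        by_emp[emp_id].append(ts)

    pairs = []
    for emp_id in sorted(by_emp):
        stamps = sorted(by_emp[emp_id])       # ISO timestamps sort chronologically
        for rn, ts in enumerate(stamps):      # rn starts at 0, like ROW_NUMBER() - 1
            if rn % 2 == 0:                   # remainder 0: IN event
                out_rn = rn + 1               # same rn // 2, remainder 1: OUT event
                out_ts = stamps[out_rn] if out_rn < len(stamps) else None
                pairs.append((emp_id, ts, out_ts))
    return pairs


pairs = pair_in_out(entries)
for pair in pairs:
    print(pair)
```

Employee 1's second pair spans two dates (IN on 2020-07-07, OUT on 2020-07-08), and employee 4's last IN has no OUT yet, so its out value is None, which corresponds to the NULL produced by the LEFT JOIN.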

Ponderate average MYSQL

We have a little simulator of a tour-operator DB (MYSQL) and we are asked to get a Query that gives us the weighted avg of duration of the tours that we have.
https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
Using subquery I got to this point where I have the days that each tour lasts and the weight of each tour from the total of tours, but I am stuck and don't know how to get the weighted avg from here. I know I have to use another select from the result I already got but I would appreciate some help.
SQLfiddle down here:
http://sqlfiddle.com/#!9/53d80/2
Tables and data
CREATE TABLE STAGE
(
ID INT AUTO_INCREMENT NOT NULL,
TOUR INT NOT NULL,
TYPE INT NOT NULL,
CITY INT NOT NULL,
DAYS INT NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE TOUR
(
ID INT AUTO_INCREMENT NOT NULL,
DESCRIPTION VARCHAR(255) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI
NOT NULL,
STARTED_ON DATE NOT NULL,
TYPE INT NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO TOUR (DESCRIPTION, STARTED_ON, TYPE) VALUES
('Mediterranian Cruise','2018-01-01',3),
('Trip to Nepal','2017-12-01',1),
('Tour in Nova York','2015-04-24',5),
('A week at the Amazones','2014-09-11',2),
('Visiting the Machu Picchu','2013-02-19',4);
INSERT INTO STAGE (TOUR, TYPE, CITY, DAYS) VALUES
(1, 1, 38254, 1),
(1, 2, 22460, 3),
(1, 2, 47940, 3),
(1, 2, 42600, 4),
(1, 3, 38254, 1),
(2, 1, 13097, 1),
(2, 2, 29785, 5),
(2, 3, 13097, 1),
(3, 1, 788, 2),
(3, 2, 48019, 6),
(3, 3, 788, 1),
(4, 1, 38254, 2),
(4, 2, 8703, 3),
(4, 3, 38254, 4),
(5, 1, 10453, 1),
(5, 2, 32045, 5),
(5, 3, 10453, 2);
Query:
SELECT
AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
COUNT(TD.TOUR_ID) AS WEIGHT
FROM
(
SELECT
TOUR.ID AS TOUR_ID,
SUM(DAYS) AS TOUR_DAYS,
COUNT(STAGE.ID) AS STAGE_DAYS
FROM
TOUR
INNER JOIN
STAGE
ON
TOUR.ID = STAGE.TOUR
GROUP BY
TOUR.ID
) AS TD
GROUP BY
TD.TOUR_DAYS
The weighted avg would be:
(1×7+1×8+2×9+1×12) / (1+1+2+1) = 9
A weighted average can be calculated with SUM(value * weight) / SUM(weight). In your case:
SELECT SUM(AVERAGE_DAYS * WEIGHT) / SUM(WEIGHT)
FROM (
SELECT
AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
COUNT(TD.TOUR_ID) AS WEIGHT
FROM
(
SELECT
TOUR.ID AS TOUR_ID,
SUM(DAYS) AS TOUR_DAYS,
COUNT(STAGE.ID) AS STAGE_DAYS
FROM
TOUR
INNER JOIN
STAGE
ON
TOUR.ID = STAGE.TOUR
GROUP BY
TOUR.ID
) AS TD
GROUP BY
TD.TOUR_DAYS
) sub
http://sqlfiddle.com/#!9/53d80/4
I'm not 100% sure, but it looks like the following query is doing exactly the same:
SELECT AVG(TOUR_DAYS)
FROM (
SELECT TOUR, SUM(DAYS) AS TOUR_DAYS
FROM STAGE
GROUP BY TOUR
) sub;
Or even without any subqueries:
SELECT SUM(DAYS) / COUNT(DISTINCT TOUR)
FROM STAGE;
That would mean the requirement can be simplified to "Get the average number of days per tour".
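The equivalence can be checked with a few lines of arithmetic. This is only a sketch in Python using the (TOUR, DAYS) pairs from the question's STAGE rows; the variable names are illustrative and not from the original post.

```python
from collections import Counter, defaultdict

# (TOUR, DAYS) pairs copied from the question's STAGE rows.
stages = [
    (1, 1), (1, 3), (1, 3), (1, 4), (1, 1),
    (2, 1), (2, 5), (2, 1),
    (3, 2), (3, 6), (3, 1),
    (4, 2), (4, 3), (4, 4),
    (5, 1), (5, 5), (5, 2),
]

# Equivalent of SUM(DAYS) ... GROUP BY TOUR.
days_per_tour = defaultdict(int)
for tour, days in stages:
    days_per_tour[tour] += days

# Weight of each duration = number of tours that last that many days.
weights = Counter(days_per_tour.values())
weighted_avg = sum(d * w for d, w in weights.items()) / sum(weights.values())

# Plain average of the per-tour durations.
plain_avg = sum(days_per_tour.values()) / len(days_per_tour)

print(weighted_avg, plain_avg)
```

Both values come out as 9.0, because weighting each distinct duration by the number of tours that have it just counts every tour once.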

MySQL challenge using MIN and subquery

My intent is to return a date value based on a selected year as well as a minimum date value based on a dataset that includes the complete dataset across all years. The query always returns the minimum date value in 2017. I want it to return the minimum start_date from the whole dataset.
What I get for min_date_over_all_years is
orgA 2017-10-09
orgB 2017-10-08
Required result for min_date_over_all_years is
orgA 2015-10-10
orgB 2014-10-09
Please see the attached fiddle for the example:
http://sqlfiddle.com/#!9/c0f74/9
The schema is:
CREATE TABLE IF NOT EXISTS `project` (
`project_id` int(11) NOT NULL AUTO_INCREMENT,
`p_name` varchar(10) NOT NULL,
`start_date` DATE NOT NULL,
`organisation_id` int(11) NOT NULL,
PRIMARY KEY (`project_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `project` (`project_id`, `p_name`,
`start_date`, `organisation_id`)
VALUES
(1, 'testP1', '2017-10-09', 1),
(2, 'testP2', '2016-10-10', 1),
(3, 'testP3', '2015-10-10', 1),
(4, 'testP4', '2017-10-10', 2),
(5, 'testP5', '2014-10-10', 2),
(6, 'testP6', '2017-10-10', 1),
(7, 'testP7', '2016-10-10', 1),
(8, 'testP8', '2015-10-10', 1),
(9, 'testP9', '2017-10-08', 2),
(10, 'testP10', '2014-10-09', 2);
CREATE TABLE IF NOT EXISTS `organisation` (`organisation_id` int(11) NOT NULL AUTO_INCREMENT,
`org_name` varchar(10) NOT NULL,
PRIMARY KEY (`organisation_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `organisation` (`organisation_id`, `org_name`
)
VALUES
(1, 'orgA'),
(2, 'orgB');
And the query I have tried (along with simpler subquery and CASE versions) is:
SELECT o.org_name, MIN(p.start_date) AS min_date_2017, YEAR(p.start_date) AS year_selected,
(SELECT MIN(p.start_date) FROM project p2
INNER JOIN organisation o2 ON o2.organisation_id = p2.organisation_id
WHERE p2.organisation_id = o.organisation_id
GROUP BY o2.organisation_id) AS min_date_over_all_years
FROM organisation o
INNER JOIN project p on p.organisation_id = o.organisation_id
WHERE YEAR(p.start_date)=2017
GROUP BY o.organisation_id
You can't put a subquery that returns multiple rows in the SELECT list; when a subquery is being used as an expression, it has to return a single row with a single column.
You don't need a separate query.
SELECT o.org_name,
MIN(IF(YEAR(p.start_date) = 2017, p.start_date, NULL)) AS min_date_2017,
2017 AS year_selected,
MIN(p.start_date) AS min_date_over_all_years
FROM organisation AS o
INNER JOIN project AS p ON p.organisation_id = o.organisation_id
GROUP BY o.organisation_id
You can also join with a subquery that gets the overall data.
SELECT o.org_name, MIN(p.start_date) AS min_date_2017, YEAR(p.start_date) AS year_selected, overall.start_date AS min_date_over_all_years
FROM organisation o
INNER JOIN project p on p.organisation_id = o.organisation_id
INNER JOIN (
SELECT organisation_id, MIN(start_date) AS start_date
FROM project
GROUP BY organisation_id) AS overall ON o.organisation_id = overall.organisation_id
WHERE YEAR(p.start_date)=2017
GROUP BY o.organisation_id

Topological sorting in sql

I am resolving dependency between some objects in a table.
I have to process the objects in order of their dependencies.
For example, the first object doesn't depend on any object. The second and third ones depend on the first one, and so on. I have to use topological sorting.
Could someone show a sample implementation of such sorting in T-SQL?
I have a table:
create table dependency
(
DependencyId PK
,ObjectId
,ObjectName
,DependsOnObjectId
)
I want to get
ObjectId
ObjectName
SortOrder
Thank you.
It seems to work:
declare @step_no int
declare @dependency table
(
DependencyId int
,ObjectId int
,ObjectName varchar(100)
,DependsOnObjectId int
,[rank] int NULL
,degree int NULL
);
insert into @dependency values (5, 5, 'Obj 5', 2, NULL, NULL)
insert into @dependency values (6, 6, 'Obj 6', 7, NULL, NULL)
insert into @dependency values (2, 2, 'Obj 2', 1, NULL, NULL)
insert into @dependency values (3, 3, 'Obj 3', 1, NULL, NULL)
insert into @dependency values (1, 1, 'Obj 1', 1, NULL, NULL)
insert into @dependency values (4, 4, 'Obj 4', 2, NULL, NULL)
insert into @dependency values (7, 7, 'Obj 7', 2, NULL, NULL)
update @dependency set rank = 0
-- computing the degree of the nodes
update d set d.degree =
(
select count(*) from @dependency t
where t.DependsOnObjectId = d.ObjectId
and t.ObjectId <> t.DependsOnObjectId
)
from @dependency d
set @step_no = 1
while 1 = 1
begin
update @dependency set rank = @step_no where degree = 0
if (@@rowcount = 0) break
update @dependency set degree = NULL where rank = @step_no
update d set degree = (
select count(*) from @dependency t
where t.DependsOnObjectId = d.ObjectId and t.ObjectId != t.DependsOnObjectId
and t.ObjectId in (select tt.ObjectId from @dependency tt where tt.rank = 0))
from @dependency d
where d.degree is not null
set @step_no = @step_no + 1
end
select * from @dependency order by rank
You have a simple tree structure with only one path to each ObjectId, so labeling each object by the number of DependsOnObjectId links traversed gives a single answer, and one that is good enough to process the right objects first. This is easy to do with a recursive common table expression and has the benefit of easy portability:
with dependency_levels as
(
select ObjectId, ObjectName, 0 as links_traversed
from dependency where DependsOnObjectId is null
union all
select dependency.ObjectId, dependency.ObjectName, links_traversed+1
from dependency
join dependency_levels on dependency.DependsOnObjectId = dependency_levels.ObjectId
)
select ObjectId, ObjectName, links_traversed
from dependency_levels
order by links_traversed
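If an object can depend on more than one other object, the structure is no longer a tree and a general topological sort is needed. The following is a minimal sketch of Kahn's algorithm in Python, shown only to illustrate the technique; it is not the method used in either answer. The (ObjectId, DependsOnObjectId) pairs are the sample rows from the first answer, a row that depends on itself is treated as a root (an assumption), and topological_levels is a hypothetical helper name.

```python
from collections import defaultdict

# (ObjectId, DependsOnObjectId) pairs from the sample data in the first answer.
# Assumption: a row that depends on itself marks a root object.
dependencies = [(5, 2), (6, 7), (2, 1), (3, 1), (1, 1), (4, 2), (7, 2)]


def topological_levels(edges):
    """Return {object_id: sort_order}; level 0 objects have no prerequisites."""
    nodes = set()
    dependents = defaultdict(list)  # prerequisite -> objects waiting on it
    pending = defaultdict(int)      # object -> number of unmet prerequisites
    for obj, dep in edges:
        nodes.update((obj, dep))
        if obj != dep:              # ignore self-references
            dependents[dep].append(obj)
            pending[obj] += 1

    level = {}
    current = sorted(n for n in nodes if pending[n] == 0)
    depth = 0
    while current:
        next_batch = []
        for node in current:
            level[node] = depth
            for child in dependents[node]:
                pending[child] -= 1
                if pending[child] == 0:   # all prerequisites are now placed
                    next_batch.append(child)
        current = sorted(next_batch)
        depth += 1

    if len(level) != len(nodes):
        raise ValueError("dependency cycle detected")
    return level


levels = topological_levels(dependencies)
for object_id, sort_order in sorted(levels.items(), key=lambda kv: (kv[1], kv[0])):
    print(object_id, sort_order)
```

For the sample data, object 1 gets sort order 0, objects 2 and 3 get 1, objects 4, 5 and 7 get 2, and object 6 gets 3. Unlike the recursive CTE, this also reports a cycle instead of looping forever.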