How to sort a list of people by birth and death dates when data are incomplete - mysql

I have a list of people who may or may not have a birth date and/or a death date. I want to be able to sort them meaningfully - a subjective term - by birth date.
BUT - if they don't have a birth date but they do have a death date, I want to have them collated into the list proximal to other people who died then.
I recognize that this is not a discrete operation - there is ambiguity about where someone should go when their birth date is missing. But I'm looking for something that is a good approximation, most of the time.
Here's an example list of what I'd like:
Alice 1800 1830
Bob 1805 1845
Carol 1847
Don 1820 1846
Esther 1825 1860
In this example, I'd be happy with Carol appearing either before or after Don - that's the ambiguity I'm prepared to accept. The important outcome is that Carol is sorted in the list relative to her death date as a death date, not sorting the death dates in with the birth dates.
What doesn't work is if I coalesce or otherwise map birth and death dates together. For example, ORDER BY birth_date, death_date would put Carol after Esther, which is way out of place by my thinking.

I think you're going to have to calculate the average lifespan (from the people who have both a birth date and a death date), and then either subtract it from the death date or add it to the birth date for people who are missing the other one.
Doing this in one query may not be efficient, and it may be ugly because MySQL before 8.0 doesn't have window functions. You may be better off precalculating the average lifespan beforehand. But let's try to do it in one query anyway:
SELECT name, birth_date, death_date
FROM people
ORDER BY COALESCE(
birth_date,
DATE_SUB(death_date, INTERVAL (
SELECT AVG(DATEDIFF(death_date, birth_date))
FROM people
WHERE birth_date IS NOT NULL AND death_date IS NOT NULL
) DAY)
)
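As a rough illustration of the averaging idea above, here is a minimal Python sketch (not MySQL) of the same sort key: use the birth date when it exists, otherwise the death date minus the average lifespan. The sample rows and the helper names `average_lifespan` and `sort_key` are hypothetical, and years are stored as 1 January dates purely for the example.

```python
from datetime import date, timedelta

# Hypothetical sample rows: (name, birth_date, death_date); None marks a missing date.
people = [
    ("Alice", date(1800, 1, 1), date(1830, 1, 1)),
    ("Bob", date(1805, 1, 1), date(1845, 1, 1)),
    ("Carol", None, date(1847, 1, 1)),
    ("Don", date(1820, 1, 1), date(1846, 1, 1)),
    ("Esther", date(1825, 1, 1), date(1860, 1, 1)),
]

def average_lifespan(rows):
    """Mean lifespan over the rows that have both dates."""
    spans = [(death - birth).days for _, birth, death in rows if birth and death]
    return timedelta(days=sum(spans) / len(spans))

def sort_key(row, lifespan):
    """Birth date if known, else an estimated birth date, else sort last."""
    _, birth, death = row
    if birth is not None:
        return birth
    if death is not None:
        return death - lifespan
    return date.max

lifespan = average_lifespan(people)
ordered = sorted(people, key=lambda row: sort_key(row, lifespan))
names = [row[0] for row in ordered]
print(names)
```

With this data Carol's estimated birth date falls between Bob's and Don's, so she is listed between them.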

N.B.: I've tried with a larger dataset, and it is not working completely as I'd expect.
Try with this query (it needs an id primary key column):
SELECT * FROM people p
ORDER BY (
  CASE WHEN birth IS NOT NULL THEN (
    SELECT ord FROM (
      SELECT id, @rnum := @rnum + 1 AS ord
      FROM people, (SELECT @rnum := 0) r1
      ORDER BY (CASE WHEN birth IS NOT NULL THEN 0 ELSE 1 END), birth, death
    ) o1
    WHERE id = p.id
  ) ELSE (
    SELECT ord FROM (
      SELECT id, @rnum := @rnum + 1 AS ord
      FROM people, (SELECT @rnum := 0) r2
      ORDER BY (CASE WHEN death IS NOT NULL THEN 0 ELSE 1 END), death, birth
    ) o2
    WHERE id = p.id
  )
  END)
;
What I've done is, basically, to sort the dataset two times, once by birth date and then by death date. Then I've used these two sorted lists to assign the final order to the original dataset, picking the place from the birth-sorted list at first, and using the place from the death-sorted list when a row has no birth date.
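The two-pass idea described above can be sketched outside the database as well. This is a minimal Python illustration, assuming the same five sample rows as the table below; the helpers `nulls_last`, `positions` and `final_position` are hypothetical names, not part of the original query.

```python
# Hypothetical rows mirroring the sample table: (id, name, birth, death).
people = [
    (1, "Alice", 1800, None),
    (2, "Bob", 1805, 1845),
    (3, "Carol", None, 1847),
    (4, "Don", 1820, 1846),
    (5, "Esther", 1815, 1860),
]

def nulls_last(*values):
    """Build a sort key that places None after every real value."""
    return tuple((v is None, v if v is not None else 0) for v in values)

def positions(rows, key):
    """Map each row id to its 1-based position in the list sorted by key."""
    return {row[0]: pos for pos, row in enumerate(sorted(rows, key=key), start=1)}

# First pass: position by birth (then death). Second pass: position by death (then birth).
by_birth = positions(people, lambda r: nulls_last(r[2], r[3]))
by_death = positions(people, lambda r: nulls_last(r[3], r[2]))

def final_position(row):
    """Use the birth-sorted position when birth is known, else the death-sorted one."""
    return by_birth[row[0]] if row[2] is not None else by_death[row[0]]

ordered = sorted(people, key=final_position)
names = [row[1] for row in ordered]
print(names)
```

Note that positions taken from two different lists can collide: here Carol (third by death) and Esther (third by birth) both get position 3, and Python's stable sort keeps them in input order. That kind of tie may be one reason the approach needs testing on larger datasets.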
Here's a few problems with that query:
I didn't run it against lots of datasets, so I can't really guarantee it will work with any dataset;
I didn't check its performance, so it could be quite slow on large datasets.
This is the table I've used to write it, tested with MySQL 5.6.21 (I can't understand why, but SQL Fiddle is rejecting my scripts with a Create script error, so I can't provide you with a live example).
Table creation:
CREATE TABLE `people` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL,
`birth` INT(11) NULL DEFAULT NULL,
`death` INT(11) NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
Data (I actually slightly changed yours):
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Alice', 1800, NULL);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Bob', 1805, 1845);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Carol', NULL, 1847);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Don', 1820, 1846);
INSERT INTO `people` (`name`, `birth`, `death`) VALUES ('Esther', 1815, 1860);

You can use a subquery to pick a suitable birthdate for sorting purposes, and then a union to join with the records that have a birthdate.
For example:
select d1.name, null as birthdate, d1.deathdate, max(d2.birthdate) sort from
d as d1, d as d2
where d1.birthdate is null and d2.deathdate <=d1.deathdate
group by d1.name, d1.deathdate
union all
select name, birthdate, deathdate, birthdate from d
where birthdate is not null
order by 4
http://sqlfiddle.com/#!9/2d91c/1
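To make the borrowed sort value concrete, here is a small Python sketch of the same rule: a person without a birthdate is sorted by the latest birthdate among people who died no later than they did. The rows reuse the question's example list, and `sort_value` is a hypothetical helper name.

```python
# Hypothetical rows from the question's example: (name, birthdate, deathdate).
rows = [
    ("Alice", 1800, 1830),
    ("Bob", 1805, 1845),
    ("Carol", None, 1847),
    ("Don", 1820, 1846),
    ("Esther", 1825, 1860),
]

def sort_value(row, all_rows):
    """Own birthdate if known, else the max birthdate of people who died no later."""
    _, birth, death = row
    if birth is not None:
        return birth
    if death is None:
        return None  # nothing to sort by; such rows go last in this sketch
    candidates = [
        b for _, b, d in all_rows
        if b is not None and d is not None and d <= death
    ]
    return max(candidates) if candidates else None

def key(row):
    value = sort_value(row, rows)
    return (value is None, value if value is not None else 0)

ordered = sorted(rows, key=key)
names = [row[0] for row in ordered]
print(names)
```

Carol borrows Don's birth year (1820) here, so the two tie and end up adjacent, which is the ambiguity the question says is acceptable.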

Not sure if this will work, but worth a try (I can't test this on MySQL) so trying to guess:
order by case when birth_date is null then death_date else birth_date end

Related

Need to find Three most expensive path on average having code 'A' with delivery in May 2021

Statement:
I need to find the three most expensive paths (from_to_ in the table) on average that have code (with_ in the table) 'A' with delivery in May 2021. If paths tie, include all of them.
Schema:
'observed_on', date,
'from_', varchar(3),
'to_', varchar(3),
'from_to_', varchar(8),
'with_', varchar(3),
'cart_no', varchar(8),
'deliver_on', date,
'd_charge', double,
Sample data: provided as a linked image.
Solution I tried:
SELECT
from_to_
,avg_price
FROM
(
SELECT
from_to_
,ROUND(AVG(d_charge),2) AS avg_price
,DENSE_RANK() OVER(ORDER BY ROUND(AVG(d_charge),2) DESC) rank_by_avgp
FROM
(
SELECT
*
FROM DELIVERY
WHERE deliver_on BETWEEN '2021-05-01' AND '2021-05-30'
AND with_ = 'A'
) AS A
GROUP BY from_to_
) AS bb
WHERE bb.rank_by_avgp <=3;
I know it's a workaround so I am looking for a better solution
@nishant, you have a quite poorly asked question here.
The issues with your question
The sample data you reference is a picture. Why not give text values or, even better, the DML statements to set the example up?
In the referenced dataset(picture) all the DELIVERY.deliver_on date values are before 2010. So the conditions you have here deliver_on BETWEEN '2021-05-01' AND '2021-05-30' will just not return anything for the example data.
If you say
I know it's a workaround so I am looking for a better solution
then based on what? It is a workaround for what? It looks to be producing correct results, so what is the problem with it? Do you want it to perform better or what?
You are not specifying the DB version. Different MySQL versions can have different solutions.
One possible solution
The example dataset setup:
DDL
CREATE TABLE DELIVERY (
observed_on DATE
, from_ VARCHAR(3)
, to_ VARCHAR(3)
, from_to_ VARCHAR(8)
, with_ VARCHAR(3)
, cart_no VARCHAR(8)
, deliver_on DATE
, d_charge DOUBLE
);
DML
INSERT INTO DELIVERY VALUES ('2012-01-19','Aus','Nzl','AusNzl','A','2118','2021-04-19',82.3);
INSERT INTO DELIVERY VALUES ('2012-01-19','Aus','Nzl','AusNzl','A','2118','2021-05-19',82.3);
INSERT INTO DELIVERY VALUES ('2012-01-19','Aus','Nzl','AusNzl','A','2118','2021-05-19',82.3);
INSERT INTO DELIVERY VALUES ('2013-01-19','Ind','Sla','IndSla','B','2233','2021-05-19',70.32);
INSERT INTO DELIVERY VALUES ('2013-01-19','Ind','Sla','IndSla','A','2233','2021-05-19',70.32);
INSERT INTO DELIVERY VALUES ('2013-01-19','Eur','Usa','EurUsa','C','2434','2021-05-19',67.53);
INSERT INTO DELIVERY VALUES ('2013-01-19','Eur','Usa','EurUsa','A','2434','2021-05-19',67.53);
INSERT INTO DELIVERY VALUES ('2013-01-19','Xyz','Usa','XyzUsa','A','2434','2021-05-19',67.53);
INSERT INTO DELIVERY VALUES ('2013-01-19','Xyz','Sla','XyzSla','A','2434','2021-05-19',67.51);
INSERT INTO DELIVERY VALUES ('2012-01-19','Aus','Nzl','AusNzl','A','2323','2021-05-19',82.3);
INSERT INTO DELIVERY VALUES ('2012-01-19','Aus','Nzl','AusNzl','A','2118','2021-06-19',82.3);
QUERY
SELECT from_to_
, avg_d_charge
, denserank_avg_d_charge
FROM /*SUB_to_calculate_the_denserank*/
(
SELECT from_to_
, ROUND(avg_d_charge, 2) AS avg_d_charge
, DENSE_RANK() OVER (
ORDER BY ROUND(avg_d_charge, 2) DESC
) denserank_avg_d_charge /*dense ranking*/
FROM /*SUB_to_calculate_the_averages*/
(
SELECT from_to_
, ROW_NUMBER() OVER (PARTITION BY from_to_) AS rownumber /*To filter for only one row per from_to_.*/
, AVG(d_charge) OVER (PARTITION BY from_to_) AS avg_d_charge /*Average calculation*/
FROM /*DELIVERY*/
DELIVERY
WHERE 1 = 1
AND with_ = 'A' /* The "code" filter*/
AND DATE_SUB(deliver_on, INTERVAL DAYOFMONTH(deliver_on) - 1 DAY) = '2021-05-01' /* The 2021-05 filter*/
) SUB_to_calculate_the_averages
WHERE 1 = 1
AND rownumber = 1
) SUB_to_calculate_the_denserank
WHERE 1 = 1
AND denserank_avg_d_charge < 4;
The main difference from your solution is that I do not use the aggregate GROUP BY here, only analytical (window) functions. I often prefer this because it lets me carry other attributes through the query without having to apply aggregate functions to them. In the end it comes down to performance and to the requirements of what should be done and how.
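For readers who want to check the ranking logic independently of the database, here is a minimal Python sketch of the same steps: filter by code and delivery month, average per path, then dense-rank the rounded averages so that ties share a rank. The rows are a reduced copy of the DML above (only four columns), and `top_paths` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Reduced copy of the sample rows: (from_to_, with_, deliver_on, d_charge).
deliveries = [
    ("AusNzl", "A", date(2021, 4, 19), 82.3),
    ("AusNzl", "A", date(2021, 5, 19), 82.3),
    ("AusNzl", "A", date(2021, 5, 19), 82.3),
    ("IndSla", "B", date(2021, 5, 19), 70.32),
    ("IndSla", "A", date(2021, 5, 19), 70.32),
    ("EurUsa", "C", date(2021, 5, 19), 67.53),
    ("EurUsa", "A", date(2021, 5, 19), 67.53),
    ("XyzUsa", "A", date(2021, 5, 19), 67.53),
    ("XyzSla", "A", date(2021, 5, 19), 67.51),
    ("AusNzl", "A", date(2021, 5, 19), 82.3),
    ("AusNzl", "A", date(2021, 6, 19), 82.3),
]

def top_paths(rows, code, year, month, top_n=3):
    """Return (path, avg_charge, dense_rank) for paths ranked within top_n."""
    charges = defaultdict(list)
    for path, with_, deliver_on, charge in rows:
        if with_ == code and (deliver_on.year, deliver_on.month) == (year, month):
            charges[path].append(charge)
    averages = {p: round(sum(v) / len(v), 2) for p, v in charges.items()}
    # Dense rank: equal averages share a rank and no rank numbers are skipped.
    distinct = sorted(set(averages.values()), reverse=True)
    rank_of = {avg: i for i, avg in enumerate(distinct, start=1)}
    ranked = [(p, a, rank_of[a]) for p, a in averages.items() if rank_of[a] <= top_n]
    return sorted(ranked, key=lambda t: (t[2], t[0]))

result = top_paths(deliveries, "A", 2021, 5)
for path, avg_charge, rank in result:
    print(path, avg_charge, rank)
```

Because EurUsa and XyzUsa tie on their average, both share rank 3 and four paths come back, which matches the "include ties" requirement.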

MySQL add balance from previous rows

I’ve tried a few things I’ve seen on here but it doesn’t work in my case, the balance on each row seems to duplicate.
Anyway I have a table that holds mortgage transactions, that table has a Column that stores an interest added value or a payment value.
So I might have:
Balance: 100,000
Interest added 100 - balance 100,100
Payment made -500 - balance 99,600
Interest added 100 - balance 99,700
Payment made -500 - balance 99,200
What I’m looking for is a query that pulls all of these rows in date order, newest first, with a balance column that sums whichever of the interest or payment values is set (the one that isn’t will be null), so that the final balance shows the current liability.
I can’t remember what the query I tried was but it ended up duplicating rows and the balance was weird
Sample structure and data:
CREATE TABLE account(
id int not null primary key auto_increment,
account_name varchar(50),
starting_balance float(10,6)
);
CREATE TABLE account_transaction(
id int not null primary key auto_increment,
account_id int NOT NULL,
date datetime,
interest_amount int DEFAULT null,
payment_amount float(10,6) DEFAULT NULL
);
INSERT INTO account (account_name,starting_balance) VALUES('Test Account','100000');
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-10-01 00:00:00',300,null);
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-10-01 00:00:00',null,-500);
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-11-01 00:00:00',300,null);
INSERT INTO account_transaction (account_id,date,interest_amount,payment_amount) VALUES(1,'2020-11-05 00:00:00',-500,null);
So interest will be added on to the rolling balance, and the starting balance is stored against the account - if we have to have a transaction added for this then ok. Then when a payment is added it can be either negative or positive to decrease the balance moving to each row.
So above example i'd expect to see something along the lines of:
I hope this makes it clearer
WITH
starting_dates AS ( SELECT account_id, MIN(`date`) startdate
FROM account_transaction
GROUP BY account_id ),
combine AS ( SELECT 0 id,
starting_dates.account_id,
starting_dates.startdate `date`,
0 interest_amount,
account.starting_balance payment_amount
FROM account
JOIN starting_dates ON account.id = starting_dates.account_id
UNION ALL
SELECT id,
account_id,
`date`,
interest_amount,
payment_amount
FROM account_transaction )
SELECT DATE(`date`) `Date`,
CASE WHEN interest_amount = 0 THEN 'Balance Brought Forward'
WHEN payment_amount IS NULL THEN 'Interest Added'
WHEN interest_amount IS NULL THEN 'Payment Added'
ELSE 'Unknown transaction type'
END `Desc`,
CASE WHEN interest_amount = 0 THEN ''
ELSE COALESCE(interest_amount, 0)
END Interest,
COALESCE(payment_amount, 0) Payment,
SUM(COALESCE(payment_amount, 0) + COALESCE(interest_amount, 0))
OVER (PARTITION BY account_id ORDER BY id) Balance
FROM combine
ORDER BY id;
PS. The source data provided (the row with id=4) was altered to match the desired output. The source structure was altered too: FLOAT(10,6), which is not compatible with the provided values, was replaced with DECIMAL.
PPS. The presence of more than one account is allowed.
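The running total that the window SUM produces can be checked with a few lines of Python. This is only a sketch for a single account, assuming the altered data described in the PS (row 4 is a payment of -500) and whole-number amounts; the variable names are hypothetical.

```python
from itertools import accumulate

# Assumed single-account data: (date, interest_amount, payment_amount).
starting_balance = 100000
transactions = [
    ("2020-10-01", 300, None),
    ("2020-10-01", None, -500),
    ("2020-11-01", 300, None),
    ("2020-11-05", None, -500),  # altered row: the payment sits in payment_amount
]

# Each row changes the balance by whichever of the two amounts is set.
changes = [(interest or 0) + (payment or 0) for _, interest, payment in transactions]

# The first element is the balance brought forward, then one balance per transaction.
balances = list(accumulate(changes, initial=starting_balance))
print(balances)
```

The last element of `balances` is the current liability; the `initial` argument of `accumulate` needs Python 3.8 or later.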

MySQL select records using MAX(datefield) minus three days

Clearly, I am missing the forest for the trees...I am missing something obvious here!
Scenario:
I've a typical table asset_locator with multiple fields:
id, int(11) PRIMARY
logref, int(11)
unitno, int(11)
tunits, int(11)
operator, varchar(24)
lineid, varchar(24)
uniqueid, varchar(64)
timestamp, timestamp
My current challenge is to SELECT records from this table based on a date range. More specifically, a date range using the MAX(timestamp) field.
So...when selecting I need to start with the latest timestamp value and go back 3 days.
EX: I select all records WHERE the lineid = 'xyz' and going back 3 days from the latest timestamp. Below is an actual example (of the dozens) I've been trying to run.
MySQL returns a single row with all NULL values for the following:
SELECT id, logref, unitno, tunits, operator, lineid,
uniqueid, timestamp, MAX( timestamp ) AS maxdate
FROM asset_locator
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
ORDER BY uniqueid DESC
There MUST be something obvious I am missing. If anyone has any ideas, please share.
Many thanks!
MAX() is an aggregate function, which means your SELECT will always return one row containing the maximum value, unless you use GROUP BY, but it looks like that's not what you need.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_max
If you need all the entries between MAX(timestamp) and 3 days before, then you need to do a subselect to obtain the max date, and after that use it in the search condition. Like this:
SELECT id, logref, unitno, tunits, operator, lineid, uniqueid, timestamp
FROM asset_locator
WHERE timestamp >= DATE_ADD( (SELECT MAX(timestamp) FROM asset_locator), INTERVAL -3 DAY)
It will still run efficiently as long as you have an index defined on timestamp column.
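The same two-step logic (find the latest timestamp first, then keep everything within three days of it) looks like this in a minimal Python sketch. The rows and the `recent` helper are hypothetical and carry only three of the table's columns.

```python
from datetime import datetime, timedelta

# Hypothetical rows: (id, lineid, timestamp).
readings = [
    (1, "xyz", datetime(2012, 5, 1, 8, 0)),
    (2, "xyz", datetime(2012, 5, 3, 12, 0)),
    (3, "xyz", datetime(2012, 5, 5, 9, 0)),
    (4, "xyz", datetime(2012, 5, 6, 12, 0)),
]

def recent(rows, days=3):
    """Keep rows whose timestamp is within `days` of the latest timestamp."""
    if not rows:
        return []
    latest = max(ts for _, _, ts in rows)   # plays the role of the MAX() subselect
    cutoff = latest - timedelta(days=days)  # plays the role of DATE_ADD(..., INTERVAL -3 DAY)
    return [row for row in rows if row[2] >= cutoff]

kept_ids = [row[0] for row in recent(readings)]
print(kept_ids)
```

The cutoff is inclusive, like the `>=` in the query, so the row stamped exactly three days before the latest one is kept.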
Note: In your example
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
Here you are actually using the string 'maxdate' because of the quotes, so the condition never evaluates to true. That's why you were seeing NULL for all fields.
Edit: Oops, forgot the "FROM asset_locator" in query. It got lost at some point when writing the answer :)

MySQL Number of Days inside a DateRange, inside a month (Booking Table)

I'm attempting to create a report for an accommodation service with the following information:
Number of Bookings (Easy, use the COUNT function)
Revenue Amount (Kind of easy).
Number of Room nights. (Rather Hard it seems)
Broken down into each month of the year.
Limitations - I'm currently using PHP/MySQL to create this report.
I'm pulling the data out of the booking system 1 month at a time, then using an ETL process to put it into MySQL.
Because of this, I have duplicate records, when a booking splits across the end of the Month. (eg BookingID = 9216 below - This is because for Revenue purposes we need to split the percentage of the revenue into the corresponding month).
The Question.
How do I write some SQL that will:
Calculate the number of room nights booked into a Property, grouped by month. If a booking spans the end of a month, the room nights that fall in the check-in month should count towards that month, and the room nights that fall in the check-out month should count towards the check-out month.
At first I used this: DATEDIFF(Checkout, Checkin).
But that led to one month having 48 room nights in a 31-day month, because (a) it counted one booking as 11 nights even though it was split across the two months, and (b) that booking appears twice.
Then once I have the statement I need to integrate it back into my CrossTab SQL for the entire year.
Some resources that I have found, but can't seem to make work (MySql Query- Date Range within a Date Range & php mysql double date range)
Here is a Sample of the Table: (There are ~100,000 rows of similar data).
CREATE TABLE IF NOT EXISTS `bookingdata` (
`idBookingData` int(11) NOT NULL AUTO_INCREMENT,
`PropertyID` int(10) NOT NULL,
`Checkin` date DEFAULT NULL,
`Checkout` date DEFAULT NULL,
`Rent` decimal(10,2) DEFAULT NULL,
`BookingID` int(11) DEFAULT NULL,
PRIMARY KEY (`idBookingData`),
UNIQUE KEY `idBookingData_UNIQUE` (`idBookingData`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10472 ;
INSERT INTO `bookingdata` (`idBookingData`, `PropertyID`, `Checkin`, `Checkout`, `Rent`, `BookingID`) VALUES
(5148, 2, '2011-07-02', '2011-07-05', 1105.00, 10612),
(5149, 2, '2011-07-05', '2011-07-13', 2155.00, 10184),
(5151, 2, '2011-07-14', '2011-07-17', 1105.00, 11102),
(5153, 2, '2011-07-22', '2011-07-24', 930.00, 14256),
(5154, 2, '2011-07-24', '2011-08-04', 1832.73, 9216),
(5907, 2, '2011-07-24', '2011-08-04', 687.27, 9216),
(5910, 2, '2011-08-11', '2011-08-14', 1140.00, 13633),
(5911, 2, '2011-08-15', '2011-08-16', 380.00, 17770),
(5915, 2, '2011-08-25', '2011-08-29', 1350.00, 17719),
(5916, 2, '2011-08-30', '2011-09-01', 740.00, 16813);
You're on the right lines. You need to join your query with a table of the months for which you want data, which can either be permanent or (as shown in my example below) created dynamically in a UNION subquery:
SELECT YEAR(month.d),
MONTHNAME(month.d),
SUM(1 + DATEDIFF( -- add 1 because start&finish on same day is still 1 day
LEAST(Checkout, LAST_DAY(month.d)), GREATEST(Checkin, month.d)
)) AS days
FROM bookingdata
RIGHT JOIN (
SELECT 20110101 AS d
UNION ALL SELECT 20110201 UNION ALL SELECT 20110301
UNION ALL SELECT 20110401 UNION ALL SELECT 20110501
UNION ALL SELECT 20110601 UNION ALL SELECT 20110701
UNION ALL SELECT 20110801 UNION ALL SELECT 20110901
UNION ALL SELECT 20111001 UNION ALL SELECT 20111101
UNION ALL SELECT 20111201
) AS month ON
Checkin <= LAST_DAY(month.d)
AND month.d <= Checkout
GROUP BY month.d
See it on sqlfiddle.
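As a cross-check on the per-month split, here is a small Python sketch that walks each booking night by night and credits it to the month it falls in. It departs from the query above in two ways, both of which are assumptions rather than the answer's method: it counts nights (the checkout day is excluded), and it collapses the duplicated ETL rows by BookingID before counting. The name `nights_per_month` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (BookingID, Checkin, Checkout).
bookings = [
    (10612, date(2011, 7, 2), date(2011, 7, 5)),
    (10184, date(2011, 7, 5), date(2011, 7, 13)),
    (11102, date(2011, 7, 14), date(2011, 7, 17)),
    (14256, date(2011, 7, 22), date(2011, 7, 24)),
    (9216, date(2011, 7, 24), date(2011, 8, 4)),
    (9216, date(2011, 7, 24), date(2011, 8, 4)),  # duplicate from the revenue split
    (13633, date(2011, 8, 11), date(2011, 8, 14)),
    (17770, date(2011, 8, 15), date(2011, 8, 16)),
    (17719, date(2011, 8, 25), date(2011, 8, 29)),
    (16813, date(2011, 8, 30), date(2011, 9, 1)),
]

def nights_per_month(rows):
    """Return {(year, month): room nights}, counting each booking once."""
    totals = defaultdict(int)
    # Assumption: rows sharing BookingID and dates describe the same stay.
    for _, checkin, checkout in set(rows):
        night = checkin
        while night < checkout:  # the checkout day is not a night stayed
            totals[(night.year, night.month)] += 1
            night += timedelta(days=1)
    return dict(totals)

result = nights_per_month(bookings)
print(sorted(result.items()))
```

For the sample this gives 24 nights in July and 13 in August, with booking 9216 contributing 8 nights to July and 3 to August.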

How to avoid duplicate registrations in MySQL

I wonder if it is possible to prevent users from inserting duplicate registration records.
For example some team is registered from 5.1.2009 - 31.12.2009. Then someone registers the same team for 5.2.2009 - 31.12.2009.
Usually the end_date is not an issue, but start_date should not be between existing records start and end date
CREATE TABLE IF NOT EXISTS `ejl_team_registration` (
`id` int(11) NOT NULL auto_increment,
`team_id` int(11) NOT NULL,
`league_id` smallint(6) NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
PRIMARY KEY (`team_id`,`league_id`,`start_date`),
UNIQUE KEY `id` (`id`)
);
I would check it in the code of the program, not the database.
If you want to do this in database, you can probably use pre-insert trigger that will fail if there are any conflicting records.
This is a classic problem of time overlapping. Say you want to register a certain team for the period of A (start_date) until B (end_date).
This should NOT be allowed in the following cases:
the same team is already registered, so that the registered period is completely inside the A-B period (start_date >= A and end_date <= B)
the same team is already registered at point A (start_date <= A and end_date >= A)
the same team is already registered at point B (start_date <= B and end_date >= B)
In those cases, registering would cause time overlap. In any other it would not, so you're free to register.
In sql, the check would be:
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date>=A and end_date<=B)
or (start_date<=A and end_date>=A)
or (start_date<=B and end_date>=B)
);
... with of course real values for the team_id, league_id, A and B.
If the query returns anything other than 0, the team is already registered and registering again would cause time overlap.
To demonstrate this, let's populate the table:
insert into ejl_team_registration (id, team_id, league_id, start_date, end_date)
values (1, 123, 45, '2007-01-01', '2007-12-31')
, (2, 123, 45, '2008-01-01', '2008-12-31')
, (3, 123, 45, '2010-01-01', '2010-12-31');
Let's check if we could register team 123 in league 45 between '2009-02-03' and '2009-12-31':
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date>='2009-02-03' and end_date<='2009-12-31')
or (start_date<='2009-02-03' and end_date>='2009-02-03')
or (start_date<='2009-12-31' and end_date>='2009-12-31')
);
The result is 0, so we can register freely.
Registering between e.g. '2009-02-03' and '2011-12-31' would not be possible.
I'll leave checking other values for you as a practice.
PS: You mentioned the end date is usually not an issue. As a matter of fact it is, since inserting an entry with invalid end date would cause overlapping as well.
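The three-case overlap check above can be exercised in a few lines of Python. This is a minimal sketch using the three demonstration registrations (2007, 2008 and 2010); the function names are hypothetical. It also shows a shorter test that gives the same answer as the three cases, assuming every period has its start on or before its end.

```python
from datetime import date

# Existing registrations for one team and league: (start_date, end_date).
registrations = [
    (date(2007, 1, 1), date(2007, 12, 31)),
    (date(2008, 1, 1), date(2008, 12, 31)),
    (date(2010, 1, 1), date(2010, 12, 31)),
]

def conflicts(existing, a, b):
    """Count registrations overlapping the requested period a..b (three-case check)."""
    return sum(
        1
        for start, end in existing
        if (start >= a and end <= b)   # existing period lies inside a..b
        or (start <= a and end >= a)   # existing period covers point a
        or (start <= b and end >= b)   # existing period covers point b
    )

def conflicts_short(existing, a, b):
    """Single-condition form; assumes start <= end and a <= b."""
    return sum(1 for start, end in existing if start <= b and end >= a)

free = conflicts(registrations, date(2009, 2, 3), date(2009, 12, 31))
blocked = conflicts(registrations, date(2009, 2, 3), date(2011, 12, 31))
print(free, blocked)
```

The first request finds no conflicts, while the second collides with the 2010 registration, in line with the walkthrough above.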
Before doing your INSERT, do a SELECT to check.
SELECT COUNT(*) FROM `ejl_team_registration`
WHERE `team_id` = [[myTeamId]] AND `league_id` = [[myLeagueId]]
AND `start_date` <= NOW()
AND `end_date` >= NOW()
If that returns more than 0, then don't insert.