I am working with a large MySQL database of Italian employment contracts (about 20 million rows). Each row of my core table represents one contract signed by a worker with a specific employer. To reconstruct the work history of each worker, I sorted the table during the import by worker identification code and contract start date. Each row has its own progressive ID, and I added two fields to each row: one holding the ID of the previous row and one holding the ID of the following row. These two fields are non-null only if the previous or the following row belongs to the same worker.
I have made a small example of what my data looks like here (alternatively, the script below creates a small reproducible example).
[Image: how the history of a worker may look]
[Image: how it should change at the end]
My current task is to calculate the effective number of days worked by each individual in my table. However, the data contain a lot of overlap, because each individual may hold several contracts at the same time. For example, a contract that started on 01/01/2010 and ended on 01/01/2012 may be followed by several shorter contracts that start later and end before 01/01/2012. If I simply sum the days of every contract for this individual, I count some days twice. For this reason, I want to rearrange the contracts by changing their end dates so that I obtain consecutive, non-overlapping contracts. The only overlap allowed is a single day.
I have made a graphical example of what the working history of an individual may look like, and how I want to rearrange it, in the two images above.
Since I cannot modify the starting date of each contract/row, I wanted to work on the ending date of each contract by modifying it according to the previous contract.
I worked by following these steps:
If the end date of the previous contract is later than the end date of the current contract (of each row), I set the current end date equal to the end date of the previous one.
Since I do not know how many contracts actually overlap (each contract is linked to the previous one and the following one, but there may be an overlapping contract further in the past), I decided to repeat this step as many times as the maximum number of contracts that any individual has in my table. With this procedure, I extend the overlapping end date forward until the overlap no longer occurs. For example, the end date of contract n.3 of the example would extend to contracts n.4, n.5, and n.6. At the end of this iterative procedure, they will all have the same end date, equal to day 12.
Once this procedure has finished, I set the end date of each contract equal to the start date of the following one if the two overlap.
Here below you can find the code I used for this procedure.
-- My example table (data_example.csv on GitHub)
drop table if exists mytable;
create table mytable
(
id INT,
WORKER_ID INT not null,
EMPLOYER_ID INT not null,
dt_start date not null, -- Contract start date
dt_end date, -- Contract end date
id_prev INT, -- ID of previous contract
dt_start_prev date, -- Start date of previous contract
dt_end_prev date, -- End date of previous contract
id_next INT, -- ID of next contract
dt_start_next date, -- Start date of next contract
dt_end_next date, -- End date of next contract
primary key(id)
);
insert into mytable
(id, WORKER_ID, EMPLOYER_ID, dt_start, dt_end,
id_prev, dt_start_prev, dt_end_prev,
id_next, dt_start_next, dt_end_next)
values
(1, 5157, 3384722, '2012-01-01', '2012-01-03', NULL, NULL, NULL, 2, '2012-01-02', '2012-01-04'),
(2, 5157, 3384722, '2012-01-02', '2012-01-04', 1, '2012-01-01', '2012-01-03', 3, '2012-01-04', '2012-01-12'),
(3, 5157, 96120, '2012-01-04', '2012-01-12', 2, '2012-01-02', '2012-01-04', 4, '2012-01-07', '2012-01-08'),
(4, 5157, 3384722, '2012-01-07', '2012-01-08', 3, '2012-01-04', '2012-01-12', 5, '2012-01-08', '2012-01-10'),
(5, 5157, 3384722, '2012-01-08', '2012-01-10', 4, '2012-01-07', '2012-01-08', 6, '2012-01-10', '2012-01-11'),
(6, 5157, 3954093, '2012-01-10', '2012-01-11', 5, '2012-01-08', '2012-01-10', 7, '2012-01-12', '2012-01-15'),
(7, 5157, 3384722, '2012-01-12', '2012-01-15', 6, '2012-01-10', '2012-01-11', 8, '2012-01-14', '2012-01-16'),
(8, 5157, 3954093, '2012-01-14', '2012-01-16', 7, '2012-01-12', '2012-01-15', 9, '2012-01-14', '2012-01-14'),
(9, 5157, 3384722, '2012-01-14', '2012-01-14', 8, '2012-01-14', '2012-01-16', 10, '2012-01-14', '2012-01-20'),
(10, 5157, 96120, '2012-01-14', '2012-01-20', 9, '2012-01-14', '2012-01-14', NULL, NULL, NULL),
(11, 5990, 1940957, '2012-01-01', '2012-01-30', NULL, NULL, NULL, 12, '2012-02-01', '2012-02-15'),
(12, 5990, 4822105, '2012-02-01', '2012-02-15', 11, '2012-01-01', '2012-01-30', 13, '2012-02-10', '2012-02-10'),
(13, 5990, 1940957, '2012-02-10', '2012-02-10', 12, '2012-02-01', '2012-02-15', 14, '2012-02-16', '2012-02-20'),
(14, 5990, 1940957, '2012-02-16', '2012-02-20', 13, '2012-02-10', '2012-02-10', 15, '2012-02-17', '2012-02-28'),
(15, 5990, 4822105, '2012-02-17', '2012-02-28', 14, '2012-02-16', '2012-02-20', NULL, NULL, NULL);
-- The following table counts the number of contracts for each individual
-- I will use it to determine the maximum number of contracts per worker
drop table if exists max_act;
create table max_act
as select WORKER_ID, count(*) n
from mytable
group by WORKER_ID;
set SQL_SAFE_UPDATES = 0;
-- Here I create the procedure
drop procedure if exists doiterate;
delimiter //
create procedure doiterate()
begin
declare total INT unsigned DEFAULT 0;
-- The number of iterations is equal to the maximum value in the table 'max_act'
while total <= (select MAX(n) from max_act) do
-- If the end date of the previous contract is greater than the end of the current contract
-- the procedure sets the end date equal to the end date of the previous contract
update mytable a
set a.dt_end =
case
when a.dt_end is NOT null and a.dt_end_prev > a.dt_end then a.dt_end_prev
else a.dt_end end
;
-- Here I update in each row the end date of the previous contract
update mytable a
left outer join mytable p on a.id_prev = p.id
set a.dt_end_prev =
case
when a.dt_end_prev is NOT null and a.dt_end_prev != p.dt_end then p.dt_end
else a.dt_end_prev end
;
set total = total + 1;
end while;
end//
delimiter ;
CALL doiterate();
-- Here I set the end date of each contract equal to the beginning of the next one if there is overlapping
update mytable a
set a.dt_end =
case
when a.dt_end is NOT null and a.dt_start_next < a.dt_end then a.dt_start_next
else a.dt_end end
;
set SQL_SAFE_UPDATES = 1;
However, I think this procedure is anything but optimal. I have estimated that it would take days to finish. I would really appreciate it if someone could give me some hints on how to handle this issue. Thank you in advance.
As already stated in one comment, I tried using both the LAG() and LEAD() functions to chain all contracts of each individual in chronological order. However, that procedure (maybe through my own fault) turned out to be even slower.
Therefore, I simply decided to run the procedure only on those workers who actually had at least two overlapping contracts. It may not be the best solution (certainly not in terms of coding), but at least I was able to complete the procedure (it took more or less a day and a half).
-- Here I am identifying contracts with an overlapping previous contract
alter table mytable add column flag_overlap INT default 0;
update mytable set flag_overlap = 1 where dt_end is NOT null and dt_end_prev > dt_end;
-- Creating a table with only those workers with at least two overlapping contracts
drop table if exists mytable_id;
create table mytable_id as select WORKER_ID
from mytable where flag_overlap = 1
group by WORKER_ID;
-- This is my table of interest, with all the contracts of the workers identified in the previous step
-- (select a.* only: select * would return WORKER_ID twice and the create would fail)
drop table if exists mytable_mod;
create table mytable_mod
as select a.*
from mytable a
inner join mytable_id b on a.WORKER_ID = b.WORKER_ID
order by a.WORKER_ID, a.dt_start;
alter table mytable_mod add unique index idx_ord_id(id);
-- The rest of the code is the same as the one posted in this question,
-- simply I referred to the table 'mytable_mod' and no longer to 'mytable'.
-- [...]
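-- At the end I updated the 'revised' end date of my original table 'mytable'
-- (inner join on the primary key 'id' of the example table: with a left outer join,
-- rows of workers that are not in 'mytable_mod' would have their dates set to NULL)
UPDATE mytable a
inner join mytable_mod b on a.id = b.id
set
a.dt_end = b.dt_end ,
a.dt_end_next = b.dt_end_next ,
a.dt_end_prev = b.dt_end_prev
;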
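For comparison, the two steps described above (extend each end date to the latest earlier end date, then clip it at the next start date) can be sketched as a single set-based pass. This is only a sketch under assumptions that are not in the original post: MySQL 8.0 or later (CTEs and window functions), the example table `mytable`, and contracts ordered by (dt_start, id) within each worker. The names `running_end`, `next_start`, and `dt_end_adjusted` are made up for illustration. It relies on a running MAX() instead of repeated LAG()/LEAD() passes, and it has not been timed on 20 million rows.

```sql
-- Sketch only: assumes MySQL 8.0+ and the example table `mytable`.
WITH extended AS (
    SELECT id,
           WORKER_ID,
           dt_start,
           dt_end,
           -- Latest end date seen so far for this worker, current row included.
           -- This replaces the iterative "copy the previous end date" loop.
           MAX(dt_end) OVER (PARTITION BY WORKER_ID
                             ORDER BY dt_start, id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_end,
           -- Start date of the following contract of the same worker (NULL for the last one).
           LEAD(dt_start) OVER (PARTITION BY WORKER_ID
                                ORDER BY dt_start, id) AS next_start
    FROM mytable
)
SELECT id,
       WORKER_ID,
       dt_start,
       CASE
           WHEN dt_end IS NULL THEN NULL                   -- open-ended contracts are left untouched
           WHEN next_start < running_end THEN next_start   -- clip at the next start date
           ELSE running_end                                 -- no overlap with the next contract
       END AS dt_end_adjusted
FROM extended
ORDER BY WORKER_ID, dt_start, id;
```

Tracing the example data by hand, contract 3 ends up with 2012-01-07 and contract 6 with 2012-01-12, which is what the iterative procedure produces. One difference to be aware of: MAX() skips NULL end dates, so an earlier end date keeps propagating past a contract with a NULL dt_end, whereas the loop stops at that row. An index on (WORKER_ID, dt_start, id) would probably help the window sort, but that is a guess rather than a measurement.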
Here is the schema:
CREATE TABLE `available_timings` (
`id` bigint(20) NOT NULL ,
`from_time` time DEFAULT NULL,
`to_time` time DEFAULT NULL
);
INSERT INTO `available_timings` (`id`, `from_time`, `to_time`) VALUES
(1, '15:11:00' , '17:15:00'),
(2, '15:11:00', '15:11:00'),
(3, '09:00:00', '12:30:00'),
(4, '15:40:00', '15:40:00'),
(5,'13:30:00', '17:15:00'),
(6, '16:10:00', '16:10:00'),
(7, '07:45:00', '11:45:00'),
(8, '19:00:00', '22:30:00'),
(9, '16:14:00', '16:14:00'),
(10, '09:30:00', '17:45:00'),
(11, '10:30:00','15:15:00');
http://sqlfiddle.com/#!9/fc9afe/2
I am trying to check in MySQL whether the current time falls between from_time and to_time:
SELECT *
FROM `available_timings`
WHERE curtime() >=`from_time` or curtime() <=`to_time`
I have searched many forums and tried a few queries, but without success.
Can anyone help me solve this problem?
Thank you
Use BETWEEN
SELECT *
FROM `available_timings`
WHERE curtime() BETWEEN `from_time` AND `to_time`
Use AND instead of OR
WHERE curtime() >= `from_time`
AND curtime() <= `to_time`
You don't want either one of these conditions to be true on its own. You want both of them to be true at the same time.
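To see the difference between the two conditions side by side, here is a small sketch against the `available_timings` table from the question. It swaps CURTIME() for the fixed literal '16:00:00' so the result does not depend on when it is run; that literal and the column aliases `matches_or` and `matches_and` are assumptions made for illustration.

```sql
-- Sketch only: a fixed time stands in for CURTIME() to keep the result reproducible.
SELECT id,
       from_time,
       to_time,
       (TIME('16:00:00') >= from_time OR  TIME('16:00:00') <= to_time) AS matches_or,
       (TIME('16:00:00') >= from_time AND TIME('16:00:00') <= to_time) AS matches_and
FROM available_timings
ORDER BY id;
```

With OR, every row whose from_time is not later than its to_time matches, whatever the time is, because any time is either at or after the start or before it (and then also before the end). With AND, only the rows whose range actually contains 16:00:00 match, which for the sample data are ids 1, 5, and 10.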
I have a table which has a date column, some self-reports of happiness in another column, and a flag column which indicates a gym day.
I want to get the average happiness scores on the day before, the day of, and the day after a gym session.
If you imagine this table, the averages should return day_before = 1, day_of = 2, and day_after = 3.
So the set up is like in this fiddle, although in my actual database the gym flag column is joined in from a separate table.
CREATE TABLE test
(`date` datetime, `gym` int, `happiness` int)
;
INSERT INTO test
(`date`, `gym`, `happiness`)
VALUES
('2019-06-01 00:00:00', NULL, 1),
('2019-06-02 00:00:00', 1, 2),
('2019-06-03 00:00:00', NULL, 3),
('2019-06-04 01:00:00', NULL, 1),
('2019-06-05 01:00:00', 1, 2),
('2019-06-06 01:00:00', NULL, 3),
('2019-06-07 01:00:00', NULL, 1),
('2019-06-08 01:00:00', 1, 2),
('2019-06-09 01:00:00', NULL, 3)
;
I tried using a subquery to return when the "gym" column in date - 1 = 1, and also use the results in a case which would have "day of", "day before", and "day after" strings. Then I could simply group by that column. I couldn't get this to work and I'm not even sure if that's something you can do.
Use two self-joins. (BEFORE is a reserved word in MySQL, so the table aliases below are prev, cur, and nxt.)
SELECT AVG(prev.happiness) AS day_before, AVG(cur.happiness) AS day_of, AVG(nxt.happiness) AS day_after
FROM test AS cur
JOIN test AS prev ON prev.date = DATE_SUB(cur.date, INTERVAL 1 DAY)
JOIN test AS nxt ON nxt.date = DATE_ADD(cur.date, INTERVAL 1 DAY)
WHERE cur.gym = 1
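A variant of the self-join idea, sketched under two assumptions that are not in the question: the rows are one per calendar day but may carry different times of day, and a gym day should still count when the day before or after it has no row. The aliases `prev`, `cur`, and `nxt` are illustrative.

```sql
-- Sketch only: compares calendar days with DATE() and keeps gym days without neighbours.
SELECT AVG(prev.happiness) AS day_before,
       AVG(cur.happiness)  AS day_of,
       AVG(nxt.happiness)  AS day_after
FROM test AS cur
LEFT JOIN test AS prev
       ON DATE(prev.`date`) = DATE(cur.`date`) - INTERVAL 1 DAY
LEFT JOIN test AS nxt
       ON DATE(nxt.`date`)  = DATE(cur.`date`) + INTERVAL 1 DAY
WHERE cur.gym = 1;
```

AVG() ignores NULLs, so a gym day with a missing neighbour still contributes to day_of without distorting the other two averages. Wrapping the column in DATE() prevents an index on `date` from being used for the join, which may matter on a large table.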
I stumbled across what might be a bug in phpMyAdmin, although it is more likely a misunderstanding of MySQL on my part, so I was hoping someone could shed some light on this behaviour.
Using the following schema
CREATE TABLE IF NOT EXISTS `mlfsql_test` (
`id` int(11) NOT NULL,
`frequency_length` smallint(3) DEFAULT NULL,
`frequency_units` varchar(10) DEFAULT NULL,
`next_delivery_date` date DEFAULT NULL,
`last_created_delivery_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
INSERT INTO `mlfsql_test`
(`id`, `frequency_length`, `frequency_units`, `next_delivery_date`, `last_created_delivery_date`) VALUES
(1, 2, 'week', '2014-06-25', NULL),
(2, 3, 'day', '2014-06-27', NULL),
(3, 1, 'week', '2014-08-08', NULL),
(4, 2, 'day', NULL, '2014-06-26');
I want to determine rows with an upcoming delivery, based on their currently set delivery date, or their last delivery date with the frequency taken into consideration.
I came up with the following query, which works fine:
SELECT *, IF (next_delivery_date IS NOT NULL, next_delivery_date,
CASE frequency_units
WHEN 'day' THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length DAY)
WHEN 'week' THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length WEEK)
WHEN 'month' THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length MONTH)
END)
AS next_order_due_date
FROM mlfsql_test
HAVING next_order_due_date IS NULL OR (next_order_due_date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY)
With the data currently in the table, I am expecting it to return 3 rows, but phpMyAdmin states there are a total of 4 rows of results, although it only displays 3...
I've found that if I add a WHERE clause to my query such as WHERE 1, it'll return the 3 rows and also state that there is a total of 3.
Why does it give an incorrect number of returned rows without the WHERE clause? I'm assuming without one phpMyAdmin assumes that all rows will match, however only returns those that actually did, so the count is wrong? Any help would be appreciated.
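One way to check the real number of matching rows independently of the total that phpMyAdmin displays is to let MySQL count them. The sketch below wraps the same expression in a derived table; the aliases `due` and `matching_rows` are made up, and COALESCE() is used as an equivalent of the IF(... IS NOT NULL ...) in the query above.

```sql
-- Sketch only: counts the rows that the HAVING filter above would keep.
SELECT COUNT(*) AS matching_rows
FROM (
    SELECT COALESCE(
               next_delivery_date,
               CASE frequency_units
                   WHEN 'day'   THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length DAY)
                   WHEN 'week'  THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length WEEK)
                   WHEN 'month' THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length MONTH)
               END
           ) AS next_order_due_date
    FROM mlfsql_test
) AS due
WHERE next_order_due_date IS NULL
   OR next_order_due_date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY;
```

The count depends on the current date, so it is only meaningful when compared with the rows the original query returns on the same day.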
Edit: phpMyAdmin Version 4.2.0
This seems to be a bug in phpMyAdmin v4.2.x. I have opened a bug ticket (see Bug #4473) and proposed a fix to the maintainers (see PR #1253). You can also apply that patch yourself to fix it in v4.2.4. It will most likely be fixed in the upcoming bugfix release, i.e. v4.2.5.
Edit 1: My patch was accepted, and this issue is fixed in v4.2.5 (the upcoming minor release).
I want to draw an accurate graph of time vs. site visits.
The X axis will be 4, 8, 12, 16, 20, 24, that is, increments of four hours.
The Y axis will be the total number of visits in the first four hours, then in the next four hours, and so on.
How can I do this using MySQL? There might be some trick using GROUP BY, but I couldn't work it out. I store every visit to my site, with a Unix timestamp for the time.
The query could look like this:
SELECT FLOOR(HOUR(FROM_UNIXTIME(unix_ts)) / 4) period, COUNT(*) visit_count_per_4_hours FROM visits_table
WHERE DATE(FROM_UNIXTIME(unix_ts)) = DATE(NOW())
GROUP BY period;
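This query returns the visits for the current day only (period 0 covers hours 0 to 3, period 1 covers hours 4 to 7, and so on). For any other day or date range, the WHERE condition has to be modified.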
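If the result should carry the X axis values 4, 8, 12, 16, 20, 24 directly, the bucket number can be turned into the hour at which the period ends. This is a sketch that reuses `visits_table` and `unix_ts` from the query above; the aliases `period_end_hour` and `visit_count` are made up, and the range condition on `unix_ts` is an assumption chosen so that an index on that column could be used.

```sql
-- Sketch only: labels each four-hour bucket with its closing hour (4, 8, ..., 24).
SELECT (FLOOR(HOUR(FROM_UNIXTIME(unix_ts)) / 4) + 1) * 4 AS period_end_hour,
       COUNT(*) AS visit_count
FROM visits_table
WHERE unix_ts >= UNIX_TIMESTAMP(CURDATE())
  AND unix_ts <  UNIX_TIMESTAMP(CURDATE() + INTERVAL 1 DAY)
GROUP BY period_end_hour
ORDER BY period_end_hour;
```

Periods without any visit do not appear in the result at all, so the charting code (or a join against a small table of the six period values) has to fill in the zeros.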
Try grouping by the four-hour period (a WHERE clause such as WHERE DATE_SUB(...) does not filter anything, so it is left out):
SELECT SUM( visit ) , FLOOR( HOUR( `time_column` ) / 4 ) AS period
FROM time_table
GROUP BY period
Working example:
CREATE TABLE IF NOT EXISTS `time_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`waqt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`visit` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;
--
-- Dumping data for table `time_table`
--
INSERT INTO `time_table` (`id`, `waqt`, `visit`) VALUES
(1, '2011-07-28 13:29:04', 3),
(2, '2011-07-28 15:29:10', 4),
(3, '2011-07-28 13:45:35', 7),
(4, '2011-07-28 15:00:47', 5),
(5, '2011-07-28 14:45:03', 6),
(6, '2011-07-28 13:00:21', 3);
and then I run the per-hour visit query:
SELECT SUM(visit), HOUR(waqt)
FROM time_table
GROUP BY HOUR(waqt)