I run a MySQL query that looks up results in two different tables.
Tables
Contract
id, contract, creditor_id, client_id, event_id
Invoice
id, contract_id, invoice, due, value
The idea is to select contracts using some parameters in the query, such as:
minimum and maximum delay, minimum and maximum value, events, and creditor.
For this, I use the INNER JOIN, HAVING and IN.
Details:
After receiving the result, I loop over the rows and run an UPDATE for each one, using the ID from the result.
I built an example in SQL Fiddle for better visualization.
The problem is that when this query returns very large results (thousands of rows), it is really slow.
So I would like to know whether there is a more efficient way to write the same query.
Query:
SELECT `c`.`id`,
`c`.`contract`,
`c`.`creditor_id`,
`c`.`client_id`,
`c`.`event_id`,
`t`.`total_value`,
`delay`
FROM `contract` `c`
INNER JOIN
(SELECT contract_id,
Sum(value) total_value,
Datediff(Curdate(), due) AS delay
FROM invoice t GROUP BY contract_id
HAVING delay <= 99999
AND delay >= 1
AND total_value >= 1
AND total_value < 99999) t ON `t`.`contract_id` = `c`.`id`
WHERE `c`.`creditor_id` = 1
AND `c`.`event_id` IN(4, 7, 5, 8, 13, 3, 6, 15, 2, 24, 1, 21, 20, 14, 17, 18, 16, 23, 25, 22, 9, 10, 26, 12, 19, 11)
If "1..99999" means "any value", then remove the test from the query. That is, construct a different query when the user wants an open-ended test.
Deal with the lack of due in the GROUP BY.
Change Datediff(Curdate(), due) > 123 to due < CURDATE() - INTERVAL 123 DAY. That will give us a chance to use due in an INDEX.
Qualify due and value; we can't tell which table they are in.
Please provide SHOW CREATE TABLE.
c could use INDEX(creditor_id, event_id), but after the above issues are addressed, there may be an even better index.
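Two of the points above can be illustrated outside the database. The following is a minimal Python sketch, not part of the original answer: it checks on sample dates that the DATEDIFF form and the rewritten `due < cutoff` form select the same rows, and it shows one way to leave out a bound when the user wants an open-ended test. The function names, the `%s` placeholder style and the fixed "today" are assumptions made for illustration.

```python
from datetime import date, timedelta


def delay_exceeds(due, today, days):
    """Mirror of DATEDIFF(CURDATE(), due) > days: a calculation on every due date."""
    return (today - due).days > days


def due_before_cutoff(due, today, days):
    """Mirror of due < CURDATE() - INTERVAL days DAY: due is compared unchanged."""
    return due < today - timedelta(days=days)


def build_total_filter(min_total=None, max_total=None):
    """Return (sql_fragment, params), skipping any bound the user left open.

    total_value is the alias used in the question; the %s placeholder
    style is an assumption about the database driver.
    """
    clauses, params = [], []
    if min_total is not None:
        clauses.append("total_value >= %s")
        params.append(min_total)
    if max_total is not None:
        clauses.append("total_value < %s")
        params.append(max_total)
    fragment = "HAVING " + " AND ".join(clauses) if clauses else ""
    return fragment, params


today = date(2024, 3, 15)  # stand-in for CURDATE(), chosen arbitrarily
sample_dues = [today - timedelta(days=n) for n in range(0, 400)]
same = all(
    delay_exceeds(d, today, 123) == due_before_cutoff(d, today, 123)
    for d in sample_dues
)
print(same)
print(build_total_filter(min_total=1))
print(build_total_filter())
```

The second form matters because the column is left bare on one side of the comparison, which is what gives an index on due a chance to be used.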
I am dealing with a huge MySQL database of Italian employment contracts (about 20 million rows). Each row of my core table represents a contract signed by a worker with a specific employer. To reconstruct the work history of each worker, I ordered the rows by worker identification code and contract start date when I indexed the table during import. Each row has its own progressive ID, and I also added two fields to each row: one referring to the previous ID and the other to the following one. These two fields are non-null only if the previous or the following ID belongs to the same worker.
I have made a small example of what my data look like here (alternatively, the script below creates a small reproducible example).
How the history of a worker may look like
How it should change at the end
My current task is to calculate the effective number of days worked by each individual in my table. However, the data contain many overlaps, since each individual may have several overlapping contracts. For example, a contract that started on 01/01/2010 and ended on 01/01/2012 may be followed by several shorter contracts that start later and end before 01/01/2012. If I simply add up the days of every contract for this individual, I double count. For this reason, I want to rearrange the contracts by changing their end dates so that I obtain consecutive non-overlapping contracts. The only overlap allowed is one day.
I have made a graphical example, in two images, of how the working history of an individual may look and how I want to rearrange it.
Since I cannot modify the start date of each contract/row, I decided to work on the end date of each contract, modifying it according to the previous contract.
I worked by following these steps:
If the end date of the previous contract is later than the end date of the current contract (for each row), I set the current end date equal to the end date of the previous one.
Since I do not know how many contracts are actually overlapping (each contract is linked to the previous one and the following one, but there may be an overlapping contract further in the past), I decided to repeat this step as many times as the maximum number of contracts that an individual has in my table. With this procedure, I extend the overlapping end date forward until the overlap no longer occurs. For example, the end date of contract n.3 in the example is carried over to contracts n.4, n.5, and n.6. At the end of this iterative procedure, they all have the same end date, equal to day 12.
Once this procedure has finished, I set the end date of each contract equal to the start date of the following one if the two overlap.
Below you can find the code I used for this procedure.
-- My example table (data_example.csv on GitHub)
drop table if exists mytable;
create table mytable
(
id INT,
WORKER_ID INT not null,
EMPLOYER_ID INT not null,
dt_start date not null, -- Contract start date
dt_end date, -- Contract end date
id_prev INT, -- ID of previous contract
dt_start_prev date, -- Start date of previous contract
dt_end_prev date, -- End date of previous contract
id_next INT, -- ID of next contract
dt_start_next date, -- Start date of next contract
dt_end_next date, -- End date of next contract
primary key(id)
);
insert into mytable
(id, WORKER_ID, EMPLOYER_ID, dt_start, dt_end,
id_prev, dt_start_prev, dt_end_prev,
id_next, dt_start_next, dt_end_next)
values
(1, 5157, 3384722, '2012-01-01', '2012-01-03', NULL, NULL, NULL, 2, '2012-01-02', '2012-01-04'),
(2, 5157, 3384722, '2012-01-02', '2012-01-04', 1, '2012-01-01', '2012-01-03', 3, '2012-01-04', '2012-01-12'),
(3, 5157, 96120, '2012-01-04', '2012-01-12', 2, '2012-01-02', '2012-01-04', 4, '2012-01-07', '2012-01-08'),
(4, 5157, 3384722, '2012-01-07', '2012-01-08', 3, '2012-01-04', '2012-01-12', 5, '2012-01-08', '2012-01-10'),
(5, 5157, 3384722, '2012-01-08', '2012-01-10', 4, '2012-01-07', '2012-01-08', 6, '2012-01-10', '2012-01-11'),
(6, 5157, 3954093, '2012-01-10', '2012-01-11', 5, '2012-01-08', '2012-01-10', 7, '2012-01-12', '2012-01-15'),
(7, 5157, 3384722, '2012-01-12', '2012-01-15', 6, '2012-01-10', '2012-01-11', 8, '2012-01-14', '2012-01-16'),
(8, 5157, 3954093, '2012-01-14', '2012-01-16', 7, '2012-01-12', '2012-01-15', 9, '2012-01-14', '2012-01-14'),
(9, 5157, 3384722, '2012-01-14', '2012-01-14', 8, '2012-01-14', '2012-01-16', 10, '2012-01-14', '2012-01-20'),
(10, 5157, 96120, '2012-01-14', '2012-01-20', 9, '2012-01-14', '2012-01-14', NULL, NULL, NULL),
(11, 5990, 1940957, '2012-01-01', '2012-01-30', NULL, NULL, NULL, 12, '2012-02-01', '2012-02-15'),
(12, 5990, 4822105, '2012-02-01', '2012-02-15', 11, '2012-01-01', '2012-01-30', 13, '2012-02-10', '2012-02-10'),
(13, 5990, 1940957, '2012-02-10', '2012-02-10', 12, '2012-02-01', '2012-02-15', 14, '2012-02-16', '2012-02-20'),
(14, 5990, 1940957, '2012-02-16', '2012-02-20', 13, '2012-02-10', '2012-02-10', 15, '2012-02-17', '2012-02-28'),
(15, 5990, 4822105, '2012-02-17', '2012-02-28', 14, '2012-02-16', '2012-02-20', NULL, NULL, NULL);
-- The following table counts the number of contracts for each individual
-- I will use it to determine the maximum number of contracts per worker
drop table if exists max_act;
create table max_act
as select WORKER_ID, count(*) n
from mytable
group by WORKER_ID;
set SQL_SAFE_UPDATES = 0;
-- Here I create the procedure
drop procedure if exists doiterate;
delimiter //
create procedure doiterate()
begin
declare total INT unsigned DEFAULT 0;
-- The number of iterations is equal to the maximum value in the table 'max_act'
while total <= (select MAX(n) from max_act) do
-- If the end date of the previous contract is greater than the end of the current contract
-- the procedure sets the end date equal to the end date of the previous contract
update mytable a
set a.dt_end =
case
when a.dt_end is NOT null and a.dt_end_prev > a.dt_end then a.dt_end_prev
else a.dt_end end
;
-- Here I update in each row the end date of the previous contract
update mytable a
left outer join mytable p on a.id_prev = p.id
set a.dt_end_prev =
case
when a.dt_end_prev is NOT null and a.dt_end_prev != p.dt_end then p.dt_end
else a.dt_end_prev end
;
set total = total + 1;
end while;
end//
delimiter ;
CALL doiterate();
-- Here I set the end date of each contract equal to the beginning of the next one if there is overlapping
update mytable a
set a.dt_end =
case
when a.dt_end is NOT null and a.dt_start_next < a.dt_end then a.dt_start_next
else a.dt_end end
;
set SQL_SAFE_UPDATES = 1;
However, I think this procedure is anything but optimal. I estimate it would take days to finish. I would really appreciate any hints on how to handle this issue. Thank you in advance.
As already stated in one comment, I tried using both the LAG() and LEAD() functions to chain all contracts in chronological order by individual. However, that procedure turned out to be even slower (maybe my fault).
Therefore, I simply decided to run the procedure only on those workers who actually had at least two overlapping contracts. It may not be the best solution (certainly not in terms of coding), but at least I was able to complete the procedure (it took about a day and a half).
-- Here I am identifying contracts with an overlapping previous contract
alter table mytable add column flag_overlap INT default 0;
update mytable set flag_overlap = 1 where dt_end is NOT null and dt_end_prev > dt_end;
-- Creating a table with only those workers with at least two overlapping contracts
drop table if exists mytable_id;
create table mytable_id as select WORKER_ID
from mytable where flag_overlap = 1
group by WORKER_ID;
-- This is my table of interests with all the contracts for those workers identified in the previous step
drop table if exists mytable_mod;
create table mytable_mod
as select a.* -- a.* only: selecting * would also pull in b.WORKER_ID and duplicate the column name
from mytable a
inner join mytable_id b on a.WORKER_ID = b.WORKER_ID
order by a.WORKER_ID, a.dt_start;
alter table mytable_mod add unique index idx_ord_id(id);
-- The rest of the code is the same as the one posted in this question,
-- simply I referred to the table 'mytable_mod' and no longer to 'mytable'.
-- [...]
-- At the end I updated the 'revised' end date of my original table 'mytable'
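-- Inner join on the row ID: only rows present in 'mytable_mod' are updated
-- (a left outer join would overwrite the dates of all other rows with NULL)
UPDATE mytable a
inner join mytable_mod b on a.id = b.id
set
a.dt_end = b.dt_end ,
a.dt_end_next = b.dt_end_next ,
a.dt_end_prev = b.dt_end_prev
;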
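The steps described in the question (carry the latest end date forward, then cut each contract at the start of the next one) do not strictly need repeated full-table updates. As a minimal sketch only, here is the same logic written as a single pass per worker in Python, outside the database. It assumes every contract has a non-null end date and that rows are ordered by worker, start date and ID, as in the example table; the function name `remove_overlaps` is made up for illustration.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def remove_overlaps(contracts):
    """contracts: iterable of (id, worker_id, dt_start, dt_end), dates non-null.

    Returns {id: revised dt_end}.
    """
    revised = {}
    # Same ordering as the question: by worker, then start date (ID breaks ties)
    ordered = sorted(contracts, key=itemgetter(1, 2, 0))
    for _, group in groupby(ordered, key=itemgetter(1)):
        rows = list(group)
        latest_end = None
        for i, (cid, _worker, _start, end) in enumerate(rows):
            # Carry forward the latest end date seen so far for this worker
            # (what the iterated update converges to)
            if latest_end is None or end > latest_end:
                latest_end = end
            new_end = latest_end
            # Cut the contract at the start of the next one if they overlap
            if i + 1 < len(rows) and rows[i + 1][2] < new_end:
                new_end = rows[i + 1][2]
            revised[cid] = new_end
    return revised


# Worker 5990 from the example data
sample = [
    (11, 5990, date(2012, 1, 1), date(2012, 1, 30)),
    (12, 5990, date(2012, 2, 1), date(2012, 2, 15)),
    (13, 5990, date(2012, 2, 10), date(2012, 2, 10)),
    (14, 5990, date(2012, 2, 16), date(2012, 2, 20)),
    (15, 5990, date(2012, 2, 17), date(2012, 2, 28)),
]
result = remove_overlaps(sample)
for cid in sorted(result):
    print(cid, result[cid])
```

Because each worker is handled in one ordered scan, the work grows with the number of rows rather than with rows times the maximum number of contracts per worker. Whether the same idea is fast inside MySQL (for example with window functions) would still need to be measured on the real table.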
I have original data like this.
Original Data
I need to create two reports from it. This is the first report:
First Report
The running value can be achieved with this expression
RunningValue(Fields!City.Value+Fields!Month.Value,CountDistinct,"Region")
The second report I need is this:
Second Report
How can I add logic to the running value so that it skips numbering rows where Sum(Amount) is zero?
I'm not sure you can do this using RunningValue, other people may know of a way.
What I did was move the logic to the query.
I reproduced some data to match your final report numbers (your sample data does not match the sample report output).
Here's the sample data I used.
-- Table variables are declared with @ (a # prefix denotes a temporary table)
DECLARE @t TABLE(Region varchar(10), City varchar(10), MonthID int, Amount int)
INSERT INTO @t VALUES
('Asia', 'Tokyo', 4, 1000),
('Asia', 'Tokyo', 4, 500),
('Asia', 'Tokyo', 5, 2000),
('Asia', 'Tokyo', 5, -2000),
('Asia', 'Tokyo', 6, 1000),
('Asia', 'Tokyo', 6, -500),
('Asia', 'Bangkok', 4, 500),
('Asia', 'Bangkok', 4, 500),
('Asia', 'Bangkok', 5, 3000),
('Asia', 'Bangkok', 5, -500),
('Asia', 'Bangkok', 6, -750),
('Asia', 'Bangkok', 6, 750)
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY Region, City ORDER BY MonthID) as RowN1
, ROW_NUMBER() OVER(PARTITION BY (CASE Amount WHEN 0 THEN 0 ELSE 1 END), Region, City ORDER BY MonthID) as RowN2
FROM
(
SELECT
Region, City, MonthID
, SUM(Amount) AS Amount
FROM @t
GROUP BY Region, City, MonthID
) x
ORDER BY Region, City DESC, MonthID
I used the ROW_NUMBER function to assign row numbers for both reports.
The first one, "RowN1", is a simple row number within each city.
The second one, "RowN2", does the same thing but puts zero amounts in their own partition, so they are numbered separately from the other data.
This gives us the following dataset
Now you can use a simple table to display the result in your first report using RowN1
In your second report use RowN2 with the expression
=IIF(Fields!Amount.Value=0, Nothing, Fields!RowN2.Value)
This simply forces a blank to be displayed if the amount is zero.
I did this and got the following results.
Note: I used a month number in the data just to make sorting easier, in the report I used =MonthName(Fields!MonthID.Value) to show the actual name.
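To make the two numbering schemes concrete, here is a small Python sketch (an illustration only, not what SSRS or SQL Server runs) that reproduces the logic on the same sample rows: monthly totals are summed per city, RowN1 counts every month, and RowN2 counts only months with a non-zero total, showing a blank (None) otherwise. The function name `number_rows` is an assumption.

```python
from collections import defaultdict

rows = [
    ("Asia", "Tokyo", 4, 1000), ("Asia", "Tokyo", 4, 500),
    ("Asia", "Tokyo", 5, 2000), ("Asia", "Tokyo", 5, -2000),
    ("Asia", "Tokyo", 6, 1000), ("Asia", "Tokyo", 6, -500),
    ("Asia", "Bangkok", 4, 500), ("Asia", "Bangkok", 4, 500),
    ("Asia", "Bangkok", 5, 3000), ("Asia", "Bangkok", 5, -500),
    ("Asia", "Bangkok", 6, -750), ("Asia", "Bangkok", 6, 750),
]


def number_rows(rows):
    # Equivalent of the inner GROUP BY Region, City, MonthID with SUM(Amount)
    totals = defaultdict(int)
    for region, city, month, amount in rows:
        totals[(region, city, month)] += amount

    row_n1 = defaultdict(int)  # counter per (region, city)
    row_n2 = defaultdict(int)  # counter per (region, city), non-zero months only
    numbered = []
    for (region, city, month), total in sorted(totals.items()):
        row_n1[(region, city)] += 1
        if total != 0:
            row_n2[(region, city)] += 1
            shown = row_n2[(region, city)]
        else:
            shown = None  # the report displays a blank for zero amounts
        numbered.append((region, city, month, total, row_n1[(region, city)], shown))
    return numbered


numbered = number_rows(rows)
for line in numbered:
    print(line)
```

The rows here are sorted by region, city and month in ascending order, so Bangkok is listed before Tokyo; the numbering within each city does not depend on that choice.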
I have the following sample data:
id, user_id, action, date, item_id
(5, 1, 'created', '2016-09-08', 1),
(6, 1, 'sold', '2016-09-14', 1),
(7, 2, 'created', '2016-09-08', 2),
(8, 2, 'sold', '2016-09-30', 2),
(9, 3, 'created', '2016-10-08', 3)
I'm trying to write a query that returns the percentage of items sold within 1 week. The value of the "action" column indicates whether the item has been put up for sale or sold. What could this look like? Should I use a subquery?
Expected result should be a single percentage (the number of items sold within 1 week, of the total number of items created).
Assuming that the data is indeed this simple, this can easily be done by joining the same table to itself. The first reference to the sample data can be aliased as created and will filter to items with an action of created. Likewise, the sold table reference will restrict itself to items with an action of sold.
Once that's done, each row has an item's creation and sold dates. Anything that doesn't have a sold action is simply discarded by the inner join. The built-in function datediff(date1, date2) returns date1 minus date2 in days, so the sold date goes first. If the result is less than or equal to 7, you know that the item was sold within a week.
select
created.id
, created.user_id
, created.item_id
, datediff(sold.date, created.date) as days_to_sell
from
sample_data created
join sample_data sold
on created.item_id = sold.item_id
where
created.action = 'created'
and sold.action = 'sold'
and datediff(sold.date, created.date) <= 7
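The query above lists the items sold within a week; the question asks for a single percentage over all created items. As a rough sketch of that last step, here is the arithmetic in Python on the sample rows. It assumes each item has exactly one 'created' row and at most one 'sold' row; the function name `percent_sold_within` is made up for illustration.

```python
from datetime import date

# (id, user_id, action, date, item_id), copied from the sample data
rows = [
    (5, 1, "created", date(2016, 9, 8), 1),
    (6, 1, "sold", date(2016, 9, 14), 1),
    (7, 2, "created", date(2016, 9, 8), 2),
    (8, 2, "sold", date(2016, 9, 30), 2),
    (9, 3, "created", date(2016, 10, 8), 3),
]


def percent_sold_within(rows, max_days=7):
    """Percentage of created items that were sold within max_days of creation."""
    created = {item: d for _, _, action, d, item in rows if action == "created"}
    sold = {item: d for _, _, action, d, item in rows if action == "sold"}
    if not created:
        return 0.0
    quick = sum(
        1
        for item, created_on in created.items()
        # sold date minus created date, the same order as in DATEDIFF
        if item in sold and 0 <= (sold[item] - created_on).days <= max_days
    )
    return 100.0 * quick / len(created)


pct = percent_sold_within(rows)
print(round(pct, 2))
```

On the sample data, item 1 sold after 6 days, item 2 after 22 days, and item 3 has not sold, so one of three created items counts. Note that the denominator is all created items, which means unsold items must not be dropped before counting.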
I am developing an employee login system in which users' check-in and check-out times are recorded. I have the following MySQL table schema, from which I would like to query the total working hours of an employee for a particular month.
AttendanceId  UserId  Operation  CreatedDate
24            4       1          2016-03-20 23:18:59
25            4       2          2016-03-20 23:19:50
26            4       1          2016-03-20 23:20:28
27            4       2          2016-03-20 23:20:31
Operation 1 is for check-in and operation 2 is for check-out. Can anyone help me build this query?
A pleasingly complicated question, thanks. My query deals with:
Attendances that aren't precisely measured in hours. The number of seconds is totalled and divided by 3600 at the end of the calculation.
Attendances that span the month boundary at either end (thanks strawberry)
Attendances in the current month that have started (there is an entry with operation "1") but not yet finished (there is no corresponding operation "2").
I used the following data for testing:
INSERT INTO Attendance(UserId, Operation, CreatedDate) VALUES
(4, 1, '2016-01-01 15:00:00'),
(4, 2, '2016-01-01 19:00:00'),
(4, 1, '2016-01-31 23:00:00'),
(4, 2, '2016-02-01 01:00:00'),
(4, 1, '2016-02-20 23:18:59'),
(4, 2, '2016-02-20 23:19:50'),
(4, 1, '2016-02-20 23:20:28'),
(4, 2, '2016-02-20 23:20:31'),
(4, 1, '2016-02-29 23:00:00'),
(4, 2, '2016-03-01 01:00:00'),
(4, 1, '2016-03-02 15:00:00'),
(4, 2, '2016-03-02 18:00:00'),
(4, 1, '2016-03-22 10:00:00');
The query selects all users' hours for a specific month. Selecting results for more than one month in one query is more complicated because attendances can span month boundaries. If that is required, it might be simplest to iterate over the months and run the query repeatedly, adjusting the date literals in the SQL each time.
The innermost query selects each arrival time and the corresponding departure time for all users. The outer query then restricts them to the selected month, calculates the difference between the two times, and sums the differences by user.
SELECT UserId, SUM(TIMESTAMPDIFF(
SECOND,
GREATEST(TimeIn, '2016-02-01'),
LEAST(COALESCE(TimeOut, NOW()), '2016-03-01'))) / 3600 HoursInMonth
FROM (SELECT TimeIn.UserId, TimeIn.CreatedDate TimeIn, MIN(TimeOut.CreatedDate) TimeOut
FROM Attendance TimeIn
LEFT JOIN Attendance TimeOut ON TimeOut.UserId = TimeIn.UserId
AND TimeOut.Operation = 2
AND TimeOut.CreatedDate > TimeIn.CreatedDate
WHERE TimeIn.operation = 1
GROUP BY TimeIn.AttendanceId
ORDER BY TimeIn.CreatedDate) TimeInOut
WHERE DATE_FORMAT(TimeIn, '%Y-%m') = '2016-02'
OR DATE_FORMAT(TimeOut, '%Y-%m') = '2016-02'
OR (DATE_FORMAT(TimeIn, '%Y-%m') < '2016-02' AND TimeOut IS NULL)
GROUP BY UserId;
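For readers who want to check the logic by hand, the pairing and clipping can be sketched in Python as below. This is an illustration under assumptions, not a translation of the query: the function name `hours_in_month` and the explicit `now` argument (standing in for NOW()) are made up, and each check-in is paired with the earliest later check-out of the same user, as the inner query does with MIN.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def hours_in_month(events, month_start, month_end, now):
    """events: (user_id, operation, created); operation 1 = in, 2 = out."""
    check_ins = defaultdict(list)
    check_outs = defaultdict(list)
    for user, operation, created in events:
        (check_ins if operation == 1 else check_outs)[user].append(created)

    seconds = defaultdict(float)
    for user, ins in check_ins.items():
        outs = sorted(check_outs[user])
        for time_in in ins:
            # Earliest check-out strictly after this check-in, else still open
            i = bisect_right(outs, time_in)
            time_out = outs[i] if i < len(outs) else now
            # Clip the attendance to the month, like GREATEST/LEAST in the query
            start = max(time_in, month_start)
            end = min(time_out, month_end)
            if end > start:
                seconds[user] += (end - start).total_seconds()
    return {user: total / 3600 for user, total in seconds.items()}


def t(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


events = [
    (4, 1, t("2016-01-01 15:00:00")), (4, 2, t("2016-01-01 19:00:00")),
    (4, 1, t("2016-01-31 23:00:00")), (4, 2, t("2016-02-01 01:00:00")),
    (4, 1, t("2016-02-20 23:18:59")), (4, 2, t("2016-02-20 23:19:50")),
    (4, 1, t("2016-02-20 23:20:28")), (4, 2, t("2016-02-20 23:20:31")),
    (4, 1, t("2016-02-29 23:00:00")), (4, 2, t("2016-03-01 01:00:00")),
    (4, 1, t("2016-03-02 15:00:00")), (4, 2, t("2016-03-02 18:00:00")),
    (4, 1, t("2016-03-22 10:00:00")),
]
february = hours_in_month(
    events, t("2016-02-01 00:00:00"), t("2016-03-01 00:00:00"),
    now=t("2016-03-22 12:00:00"),
)
print(february)
```

For February 2016 the test data gives 3600 + 51 + 3 + 3600 seconds, that is 7254 / 3600 hours. One difference from the query: because every attendance is clipped rather than filtered by month first, this sketch would also count an attendance that starts before the month and ends after it.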