I started an HR management project and I want to count the days between two dates without counting holidays and weekends, so that HR can count an employee's days off.
Here's the case, I want to count between 2018-02-14 and 2018-02-20 where there is an office holiday on 2018-02-16. The result should be 3 days.
I have already created a table called tbl_holiday where I put all weekends and holidays in one year there
I found this post, and I tried it on my MariaDB
Here's my query:
SELECT 5 * (DATEDIFF('2018-02-20', '2018-02-14') DIV 7) +
MID('0123444401233334012222340111123400012345001234550', 7 *
WEEKDAY('2018-02-14') + WEEKDAY('2018-02-20') + 1, 1) -
(SELECT COUNT(dates) FROM tbl_holiday WHERE dates NOT IN (SELECT dates FROM tbl_holiday)) as Days
The query works but the result is 4 days, not 3 days. It means the query only excludes the weekends but not the holiday.
What is wrong with my query? Am I missing something? Thank you for helping me
#RichardDoe, from the question comments.
In a reasonable implementation of a date table, you create a list of all days (covering a sufficient range to cope with any query you may run against it - 15 years each way from today is probably a useful minimum), and alongside each day you store a variety of derived attributes.
I wrote a Q&A recently with basic tools that would get you started in SQL Server: https://stackoverflow.com/a/48611348/9129668
Unfortunately I don't have a MySQL environment or intimate familiarity with it to allow me to write or adapt queries off the top of my head (as I'm doing here), but I hope this will illustrate the structure of a solution for you in SQL Server syntax.
In terms of the answer I link to (which generates a date table on the fly) and extending it by adding in your holiday table (and making some inferences about how you've defined your holiday table), and noting that a working day is any day Mon-Fri that isn't a holiday, you'd write a query like so to get the number of working days between any two dates:
WITH
dynamic_date_table AS
(
SELECT *
FROM generate_series_datetime2('2000-01-01','2030-12-31',1)
CROSS APPLY datetime2_params_fxn(datetime2_value)
)
,date_table_ext1 AS
(
SELECT
ddt.*
,IIF(hol.dates IS NOT NULL, 1, 0) AS is_company_holiday
FROM
dynamic_date_table AS ddt
LEFT JOIN
tbl_holiday AS hol
ON (hol.dates = ddt.datetime2_value)
)
,date_table_ext2 AS
(
SELECT
*
,IIF(is_weekend = 1 OR is_company_holiday = 1, 0, 1) AS is_company_work_day
FROM date_table_ext1
)
SELECT
COUNT(datetime2_value)
FROM
date_table_ext2
WHERE
(datetime2_value BETWEEN '2018-02-14' AND '2018-02-20')
AND
(is_company_work_day = 1)
Obviously, the idea for a well-factored solution is that these intermediate calculations (being general in nature to the entire company) get rolled into the datetime2_params_fxn, so that any query run against the database gains access to the pre-defined list of company workdays. Queries that are run against it then start to resemble plain English (rather than the approach you linked to and adapted in your question, which is ingenious but far from clear).
If you want top performance (which will be relevant if you are hitting these calculations heavily) then you define appropriate parameters, save the lot into a stored date table, and index that table appropriately. This way, your query would become as simple as the final part of the query here, but referencing the stored date table instead of the with-block.
The sequentially-numbered workdays I referred to in my comment on your question, are another step again for the efficiency and indexability of certain types of queries against a date table, but I won't complicate this answer any further for now. If any further clarification is required, please feel free to ask.
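To make the date-table idea concrete outside SQL Server, here is a minimal Python sketch of the same structure: one row per calendar day, with the weekend, holiday and work-day flags derived once and then filtered. The function names and the dictionary layout are made up for illustration, and the holiday set stands in for tbl_holiday under the assumption that it holds the single office holiday from the question.

```python
from datetime import date, timedelta

def build_date_table(start, end, holidays):
    """Return one row per calendar day with derived work-day flags."""
    rows = []
    day = start
    while day <= end:
        is_weekend = day.weekday() >= 5  # Saturday = 5, Sunday = 6
        is_company_holiday = day in holidays
        rows.append({
            "day": day,
            "is_weekend": is_weekend,
            "is_company_holiday": is_company_holiday,
            "is_company_work_day": not (is_weekend or is_company_holiday),
        })
        day += timedelta(days=1)
    return rows

def count_work_days(table, first, last):
    """Count work days with first <= day <= last (both ends inclusive)."""
    return sum(
        1 for row in table
        if first <= row["day"] <= last and row["is_company_work_day"]
    )

# Assumed holiday list: only the office holiday from the question.
holidays = {date(2018, 2, 16)}
table = build_date_table(date(2018, 1, 1), date(2018, 12, 31), holidays)
work_days = count_work_days(table, date(2018, 2, 14), date(2018, 2, 20))
print(work_days)
```

Like BETWEEN, this counts both endpoints, so the example range gives 4 (the 14th, 15th, 19th and 20th). The 3 expected in the question corresponds to leaving one endpoint out of the count.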
I found the answer for this problem
It turns out, I just need to use a simple arithmetic operator for this problem
SELECT (SELECT DATEDIFF('2018-02-20', '2018-02-14')) - (SELECT COUNT(id) FROM tbl_holiday WHERE dates BETWEEN '2018-02-14' AND '2018-02-20');
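For anyone checking the arithmetic, here is a small Python sketch of what that query computes, assuming tbl_holiday contains the weekend days as well as the office holiday (as described in the question). The function name and the sample rows are hypothetical.

```python
from datetime import date

def days_off(start, end, non_working_days):
    """Mirror DATEDIFF(end, start) minus the non-working days in [start, end]."""
    span = (end - start).days  # like DATEDIFF: the start day itself is not counted
    skipped = sum(1 for d in non_working_days if start <= d <= end)
    return span - skipped

# Hypothetical contents of tbl_holiday for the example week:
# the office holiday plus the weekend that follows it.
tbl_holiday = [date(2018, 2, 16), date(2018, 2, 17), date(2018, 2, 18)]
result = days_off(date(2018, 2, 14), date(2018, 2, 20), tbl_holiday)
print(result)  # 3
```

One thing to watch: DATEDIFF does not count the start day, but BETWEEN does include it, so if the start date is itself a weekend or holiday it gets subtracted even though it was never counted.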
Related
I would like to discuss the "best" way to store date periods in a database. Let's talk about SQL/MySQL, but this question may apply to any database. I have the feeling I have been doing something wrong for years...
In English, the information I have is:
-In year 2014, value is 1000
-In year 2015, value is 2000
-In year 2016, there is no value
-In year 2017 (and go on), value is 3000
Someone may store as:
BeginDate EndDate Value
2014-01-01 2014-12-31 1000
2015-01-01 2015-12-31 2000
2017-01-01 NULL 3000
Others may store as:
Date Value
2014-01-01 1000
2015-01-01 2000
2016-01-01 NULL
2017-01-01 3000
The validation rules for the first method look like mayhem to develop in order to avoid holes and overlaps.
In the second method, the problem seems to be finding the period that contains a given date.
What do my colleagues prefer? Any other suggestions?
EDIT: I used full year only for example, my data usually change with day granularity.
EDIT 2: I thought about using the stored "Date" as "BeginDate", ordering rows by Date, then taking the "EndDate" from the next (or previous) row. Storing "BeginDate" and "Interval" would lead to the same hole/overlap problem as method one, which needs a complex validation rule to avoid.
It mostly depends on the way you will be using this information - I'm assuming you do more than just store values for a year in your database.
Lots of guesses here, but I guess you have other tables with time-bounded data, and that you need to compare the dates to find matches.
For instance, in your current schema:
select *
from other_table ot
inner join year_table yt on ot.transaction_date between yt.year_start and yt.year_end
That should be an easy query to optimize - it's a straight data comparison, and if the table is big enough, you can add indexes to speed it up.
In your second schema suggestion, it's not as easy:
select *
from other_table ot
inner join year_table yt
on ot.transaction_date >= yt.year_start
and ot.transaction_date < yt.year_start + INTERVAL 1 YEAR
Crucially - this is harder to optimize, as every comparison needs to execute a scalar function. It might not matter - but with a large table, or a more complex query, it could be a bottleneck.
You can also store the year as an integer (as some of the commenters recommend).
select *
from other_table ot
inner join year_table yt on year(ot.transaction_date) = yt.year
Again - this is likely to have a performance impact, as every comparison requires a function to execute.
The purist in me doesn't like to store this as an integer - so you could also use MySQL's YEAR datatype.
So, assuming data size isn't an issue you're optimizing for, the solution really would lie in the way your data in this table relates to the rest of your schema.
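As a rough illustration of the second layout from the question (begin dates only, with the end of each period implied by the next row), here is a Python sketch of the lookup. The row data comes from the question; the function name and the use of None for "no value" are assumptions made for the example.

```python
from bisect import bisect_right
from datetime import date

# (begin_date, value) rows sorted by begin_date; None marks a period with no value.
rows = [
    (date(2014, 1, 1), 1000),
    (date(2015, 1, 1), 2000),
    (date(2016, 1, 1), None),
    (date(2017, 1, 1), 3000),
]
begin_dates = [begin for begin, _ in rows]

def value_on(day):
    """Return the value of the period containing day, or None if there is none."""
    i = bisect_right(begin_dates, day)  # number of periods starting on or before day
    if i == 0:
        return None  # day falls before the first stored period
    return rows[i - 1][1]

print(value_on(date(2015, 6, 30)))
```

Because each period ends where the next one begins, holes and overlaps cannot occur in this layout. The price is that every lookup has to find the latest begin date that is not after the date in question.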
I have a table that tracks the activity in several websites. Each row is of the following form: (Date, Hour, Website, Hits)
The Hour field is a number between 0 and 23 and represents an entire hour (for example, 22 is for any hits between 22:00 and 22:59).
I want to find the overall slowest hour for each website, meaning the output should be something like (Website, Hour).
In order to do that, I was thinking I should have a nested query to find the minimum hits for each website on each day, and then count the values of Hour (again, for each website on each day), and see which value is the maximal.
I'm still new to SQL so I'm having difficulties using the min() function properly, to find the minimal value only for a specific date and website. Then I have the same problem with using count() for a specific website.
I'm also curious if I can get not just the most common slowest hour, but maybe the 3 slowest, but at least to me it seems like it's really complicating the problem.
For the first nested query, I considered something like this:
SELECT DISTINCT Date Date_t, Website Website_t, Hour,
(SELECT min(Hits) from HITS_TABLE WHERE Date=Date_t and Website=Website_t) as MinHits
FROM HITS_TABLE
But not only does it take an abnormally long time to calculate, it also gives me multiple entries of (Date_t, Website_t, Hour, min(Hits)) for each value of Hour, so I take it that I'm not doing it in the smartest or most efficient way.
Thanks in advance for any help!
You can get the minimum hour using a trick in MySQL:
select website, substring_index(group_concat(hour order by hits), ',', 1) as minhour
from table t
group by website;
For each website, this constructs a comma-delimited list of hours, ordered by the number of hits. The function substring_index() returns the first element of that list.
This is something of a hack. In most other databases, you would use window/analytic functions, but these are not available in MySQL before version 8.0.
EDIT:
You can do this in standard SQL as well:
select t.*
from table t
where not exists (select 1
from table t2
where t2.website = t.website and
t2.hits < t.hits
);
This is interpreted as: "Get me all rows from the table where there are no other rows for the same website with a lower number of hits." This is a round-about way of saying: "Get me the hour with the minimum value for each website." Note that this will return multiple rows when there are ties.
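To try the NOT EXISTS pattern without a MySQL server, here is a small sketch using Python's built-in sqlite3 module. The table name hits_table, the sample rows and the SQLite dialect are all assumptions for the example, and the subquery is correlated on website so that each site gets its own minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits_table (date TEXT, hour INTEGER, website TEXT, hits INTEGER)"
)
# Made-up sample rows: (date, hour, website, hits).
conn.executemany(
    "INSERT INTO hits_table VALUES (?, ?, ?, ?)",
    [
        ("2014-01-01", 3, "a.com", 5),
        ("2014-01-01", 4, "a.com", 9),
        ("2014-01-02", 3, "a.com", 7),
        ("2014-01-01", 22, "b.com", 2),
        ("2014-01-01", 23, "b.com", 8),
    ],
)

# Keep a row only if no other row for the same website has fewer hits.
slowest = conn.execute(
    """
    SELECT t.website, t.hour
    FROM hits_table t
    WHERE NOT EXISTS (
        SELECT 1
        FROM hits_table t2
        WHERE t2.website = t.website
          AND t2.hits < t.hits
    )
    ORDER BY t.website
    """
).fetchall()
print(slowest)
```

This picks the hour of the single lowest row per website. Counting how often each hour is the daily minimum, as the question describes, would need the same anti-join correlated on date as well, followed by a GROUP BY on website and hour.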
As part of database testing, we are to verify if the data is correctly rendered onto the webpage from database.
We have a table called 'emp_details' which stores employee details. We store the joining date of each employee in it. Now, using this joining date field, I need to get a list of all the employees who have a start date or anniversary date within the last ten days.
I tried various combinations of DATEDIFF() in MySQL but did not succeed.
The format on the webpage would look like this:
Name Start Date Years
----------------------------------
William 07/25/2004 8
Gordon 07/22/2007 5
Jill 07/26/2009 3
Could anyone please help me with the query for a MySQL DB?
Thanks,
select * from
employees where
dayofyear(`start date`) between dayofyear(curdate())-10 and dayofyear(curdate())
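You can use the following in the WHERE clause (ignoring leap years):
DAYOFYEAR(CURDATE()) - DAYOFYEAR(start_date) BETWEEN 0 AND 9
OR DAYOFYEAR(start_date) - DAYOFYEAR(CURDATE()) > (365 - 10)
The second condition covers a ten-day window that wraps around the new year.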
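Since the expected result is needed for verification anyway, here is a hedged Python sketch of the anniversary rule that can be used to cross-check whatever the SQL returns. The function name is hypothetical, and "within the last ten days" is assumed to mean today plus the nine days before it.

```python
from datetime import date, timedelta

def recent_anniversary(start_date, today, window=10):
    """Return the years of service if start_date's month and day fell within
    the last `window` days (today included), otherwise None."""
    for back in range(window):
        day = today - timedelta(days=back)
        same_month_day = (day.month, day.day) == (start_date.month, start_date.day)
        if same_month_day and day.year >= start_date.year:
            return day.year - start_date.year
    return None

# Sample rows from the question, checked against an assumed "today".
today = date(2012, 7, 30)
print(recent_anniversary(date(2004, 7, 25), today))
```

Walking back one calendar day at a time avoids the day-of-year arithmetic altogether, so windows that cross the new year need no special case. A start date of 29 February only matches in leap years under this rule.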
In Rails 3 (also with the meta_where gem, if you feel like using it in your query), I have a really tricky query that I have been banging my head against:
Suppose I have two models, customers and purchases, where a customer has many purchases. Let's define customers with at least 2 purchases as "repeat_customer". I need to find the total number of repeat_customers by each day for the past 3 months, something like:
Date TotalRepeatCustomerCount
1/1/11 10 (10 repeat customers by the end of 1/1/11)
1/2/11 15 (5 more customer gained "repeat" status on this date)
1/3/11 16 (1 more customer gained "repeat" status on this date)
...
3/30/11 150
3/31/11 160
Basically I need to group customer count based on the date of creation of their second purchase, since that is when they "gain repeat status".
Certainly this can be achieved in ruby, something like:
Customer.includes(:purchases).all.select{|x| x.purchases.count >= 2 }.group_by{|x| x.purchases.second.created_at.to_date }.map{|date, customers| [date, customers.count]}
However, the above code will fire queries along the lines of Customer.all and Purchase.all, then do a bunch of calculations in Ruby. I would much prefer doing the selection, grouping and calculations in MySQL, since it is not only much faster, it also reduces the bandwidth from the database. In large databases, the code above is basically useless.
I have been trying for a while to conjure up the query in Rails/ActiveRecord, but have had no luck even with the nice meta_where gem. If I have to, I will accept a solution as a pure MySQL query as well.
Edited: I would cache it (or add a "repeat" field to customers), though only for this simplified problem. The criteria for repeat customer can change by the client at any point (2 purchases, 3 purchases, 4 purchases etc), so unfortunately I do have to calculate it on the spot.
SELECT p_date, COUNT(customers.id) FROM
(
SELECT p_date - INTERVAL 1 day p_date, customers.id
FROM
customers NATURAL JOIN purchases
JOIN (SELECT DISTINCT date(purchase_date) p_date FROM purchases) p_dates
WHERE purchases.purchase_date < p_date
GROUP BY p_date, customers.id
HAVING COUNT(purchases.id) >= 2
) a
GROUP BY p_date
I didn't test this in the slightest, so I hope it works. Also, I hope I understood what you are trying to accomplish.
But please note that you should not do this, it'll be too slow. Since the data never changes once the day is passed, just cache it for each day.
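To pin down what the query is supposed to return (and what would be cached per day), here is a small Python sketch of the computation: find the date of each customer's qualifying purchase, count customers per date, then keep a running total. The function name, the threshold parameter and the sample data are made up for illustration, and only days on which someone gained repeat status appear in the output.

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import accumulate

def repeat_customers_by_day(purchases, threshold=2):
    """purchases: iterable of (customer_id, purchase_date) pairs.
    Returns a sorted list of (date, cumulative repeat-customer count)."""
    by_customer = defaultdict(list)
    for customer_id, purchased_on in purchases:
        by_customer[customer_id].append(purchased_on)

    # A customer gains repeat status on the date of their threshold-th purchase.
    gained = Counter()
    for dates in by_customer.values():
        if len(dates) >= threshold:
            gained[sorted(dates)[threshold - 1]] += 1

    days = sorted(gained)
    running_totals = accumulate(gained[day] for day in days)
    return list(zip(days, running_totals))

# Made-up purchases: customers 1-3 become repeat customers, customer 4 does not.
purchases = [
    (1, date(2011, 1, 1)), (1, date(2011, 1, 1)),
    (2, date(2010, 12, 30)), (2, date(2011, 1, 2)),
    (3, date(2011, 1, 2)), (3, date(2011, 1, 3)), (3, date(2011, 1, 5)),
    (4, date(2011, 1, 4)),
]
totals = repeat_customers_by_day(purchases)
print(totals)
```

The threshold argument reflects the point in the question that the definition of a repeat customer (2, 3, 4 purchases) can change at any time.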
I'm currently developing a program that will generate reports based upon lead data. My issue is that I'm running 3 queries for something that I would like to only have to run one query for. For instance, I want to gather data for leads generated in the past day
submission_date > (NOW() - INTERVAL 1 DAY)
and I would like to find out how many total leads there were, and how many sold leads there were in that timeframe (sold=1 / sold=0). The issue is that this is currently being done with 2 queries, one with WHERE sold = 1 and one with WHERE sold = 0. This is all well and good, but when I want to generate this data for the past day, week, month, year, and all time, I will have to run 10 queries to obtain it. I feel like there HAS to be a more efficient way of doing this. I know I can create a MySQL function for this, but I don't see how this could solve the problem.
Thanks!!
Why not GROUP BY sold, so you get the totals for sold and not sold?
One way to do this is to exploit the aggregate functions (usually SUM and COUNT help you the most in this situation) along with MySQL's IF() function.
For example, you could use a query such as:
SELECT
SUM(IF(sold = 1, 1, 0)) AS TotalSold,
SUM(IF(sold = 0, 1, 0)) AS TotalUnsold,
SUM(IF(submission_date > (NOW() - INTERVAL 1 WEEK)
AND sold = 1, 1, 0)) AS TotalSoldThisWeek
FROM ...
WHERE ...
The condition (e.g. sold = 1) could be as complex as you want by using AND and OR.
Disclaimer: the code wasn't tested; it is just provided as an example that should work with minor modifications.
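Here is a runnable sketch of the same conditional-aggregation idea, using Python's built-in sqlite3 module so it can be tried without a MySQL server. The leads table, the sample rows and the fixed reference time are assumptions; SQLite's CASE WHEN and datetime() stand in for MySQL's IF() and NOW() - INTERVAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (submission_date TEXT, sold INTEGER)")
# Made-up sample rows: (submission_date, sold).
conn.executemany(
    "INSERT INTO leads VALUES (?, ?)",
    [
        ("2024-03-10 09:00:00", 1),
        ("2024-03-09 18:00:00", 0),
        ("2024-03-05 08:00:00", 1),
        ("2024-03-01 08:00:00", 1),
        ("2023-06-15 10:00:00", 0),
    ],
)

# A fixed reference time instead of NOW() keeps the example reproducible.
params = {"now": "2024-03-10 12:00:00"}

# One pass over the table; each SUM counts only the rows matching its condition.
row = conn.execute(
    """
    SELECT
        COUNT(*) AS total_leads,
        SUM(CASE WHEN sold = 1 THEN 1 ELSE 0 END) AS total_sold,
        SUM(CASE WHEN sold = 0 THEN 1 ELSE 0 END) AS total_unsold,
        SUM(CASE WHEN submission_date > datetime(:now, '-1 day')
                 THEN 1 ELSE 0 END) AS leads_last_day,
        SUM(CASE WHEN submission_date > datetime(:now, '-1 day') AND sold = 1
                 THEN 1 ELSE 0 END) AS sold_last_day,
        SUM(CASE WHEN submission_date > datetime(:now, '-7 days') AND sold = 1
                 THEN 1 ELSE 0 END) AS sold_last_week
    FROM leads
    """,
    params,
).fetchone()
print(row)
```

Adding the month, year and all-time figures is a matter of adding more SUM(CASE ...) columns, so the ten queries from the question collapse into one.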