Query event durations - mysql

Assume I have a notes table which contains (amongst other things) entries that indicate when an employee starts/ends work on a project; there will only be one note that indicates the "start", but there could be a number of different entries that indicate the "end" (we would ignore any extra "end" dates). There will be times when multiple people are working on the project, as well as times when no one is working on it.
I need to query the table, to establish the number of days where the project had someone working on it:
projectID dateStart dateEnd
--------- ---------- ----------
20769720 2018-01-26
20769720 2018-01-29
20769720 2018-02-02
20769720 2018-03-20
20825496 2018-02-07
20825496 2018-02-12
20825496 2018-03-07
20825496 2018-03-15
The above table is what we have extracted as 'key' events depicting the start/end dates, and we can see that:
[Project 20769720] has someone working on 26-29th January (4 days);
[Project 20825496] has someone working from 7th Feb to 7th March (28 days) and from 15th March to present (19 days) = total 47 days
We considered dumping the data into a temporary table and processing it with a number of updates, but we can't create temporary tables, cursors or stored procedures for this; it all has to be done in a single query that returns:
projectID days
--------- ----
20769720 4
20825496 47

Get the min dateStart and max dateEnd, and find the difference. Keep in mind that if a project starts and stops (i.e. has gaps in work), this will not work properly.
SELECT
projectID,
-- DATEDIFF(a, b) returns a - b in days, so the later date goes first
DATEDIFF(MAX(dateEnd), MIN(dateStart)) AS `days`
FROM projectLog a
GROUP BY projectID
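To illustrate why gaps matter, here is a minimal Python sketch (not a MySQL query) that merges overlapping work intervals before counting days. The interval list is an assumption reconstructed from the question's description of project 20825496, "present" is assumed to be 2018-04-03, and days are counted as plain date differences, so inclusive counting would add one day per merged run.

```python
from datetime import date

def covered_days(intervals):
    """Sum the lengths, in days, of the union of (start, end) date intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous run and open a new one.
            if cur_end is not None:
                total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).days
    return total

# Hypothetical intervals for project 20825496 (assumed, not taken from a real table).
intervals_20825496 = [
    (date(2018, 2, 7), date(2018, 3, 7)),
    (date(2018, 2, 12), date(2018, 3, 7)),
    (date(2018, 3, 15), date(2018, 4, 3)),
]

merged_total = covered_days(intervals_20825496)
# Naive min/max difference, which also counts the idle gap in March.
naive_total = (date(2018, 4, 3) - date(2018, 2, 7)).days
print(merged_total, naive_total)
```

The merged total matches the 47 days expected in the question, while the min/max difference also counts the days when nobody was working.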

Related

MySQL - Select row with column + X > column

We have a database for patients that shows the details of their various visits to our office, such as their weight during that visit. I want to generate a report that returns the visit (a row from the table) based on the difference between the date of that visit and the patient's first visit being the largest value possible but not exceeding X number of days.
That's confusing, so let me try an example. Let's say I have the following table called patient_visits:
visit_id | created | patient_id | weight
---------+---------------------+------------+-------
1 | 2006-08-08 09:00:05 | 10 | 180
2 | 2006-08-15 09:01:03 | 10 | 178
3 | 2006-08-22 09:05:43 | 10 | 177
4 | 2006-08-29 08:54:38 | 10 | 176
5 | 2006-09-05 08:57:41 | 10 | 174
6 | 2006-09-12 09:02:15 | 10 | 173
In my query, if I were wanting to run this report for "30 days", I would want to return the row where visit_id = 5, because it's 28 days into the future, and the next row is 35 days into the future, which is too much.
I've tried a variety of things, such as joining the table to itself, or creating a subquery in the WHERE clause to try to return the max value of created WHERE it is equal to or less than created + 30 days, but I seem to be at a loss at this point. As a last resort, I can just pull all of the data into a PHP array and build some logic there, but I'd really rather not.
The bigger picture is this: The database has about 5,000 patients, each with any number of office visits. I want to build the report to tell me what the average weight loss has been for all patients combined when going from their first visit to X days out (that is, X days from each individual patient's first visit, not an arbitrary X-day period). I'm hoping that if I can get the above resolved, I'll be able to work the rest out.
You can get the date of the first visit and of the latest visit within the window using a query like this (written for MySQL, with a 30-day window):
select
first_visits.patient_id,
first_visits.first_date,
max(next_visit.created) as next_date
from (
-- earliest visit per patient
select patient_id, min(created) as first_date
from patient_visits
group by patient_id
) as first_visits
inner join patient_visits next_visit
on (next_visit.patient_id = first_visits.patient_id
and next_visit.created between first_visits.first_date and first_visits.first_date + interval 30 day)
group by first_visits.patient_id, first_visits.first_date
So basically you need to find the start date by grouping by patient_id, then join patient_visits and find the max date that falls within the 30-day window.
Then you can join the result back to patient_visits to get the start and end weights and calculate the loss.
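As a rough cross-check of that logic, here is a small Python sketch using the sample rows from the question. The function name `loss_within` is made up for illustration. It finds each patient's first visit, takes the latest visit no more than X days later, and subtracts the weights.

```python
from datetime import datetime, timedelta

# (visit_id, created, patient_id, weight) rows copied from the question's sample table.
visits = [
    (1, "2006-08-08 09:00:05", 10, 180),
    (2, "2006-08-15 09:01:03", 10, 178),
    (3, "2006-08-22 09:05:43", 10, 177),
    (4, "2006-08-29 08:54:38", 10, 176),
    (5, "2006-09-05 08:57:41", 10, 174),
    (6, "2006-09-12 09:02:15", 10, 173),
]

def loss_within(rows, days):
    """Return {patient_id: weight lost between the first visit and the last visit within `days`}."""
    by_patient = {}
    for _visit_id, created, patient_id, weight in rows:
        stamp = datetime.strptime(created, "%Y-%m-%d %H:%M:%S")
        by_patient.setdefault(patient_id, []).append((stamp, weight))
    result = {}
    for patient_id, stamped in by_patient.items():
        stamped.sort()
        first_time, first_weight = stamped[0]
        cutoff = first_time + timedelta(days=days)
        # The first visit is always inside the window, so this list is never empty.
        last_weight = [w for t, w in stamped if t <= cutoff][-1]
        result[patient_id] = first_weight - last_weight
    return result

losses = loss_within(visits, 30)
print(losses)
```

For the sample data this picks visit 5 as the end of the window, as the question expects. Averaging the values of the returned dictionary would give the overall figure for all patients.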

Database table structure for storing statistics data

I am trying to create a table in my MySQL database for storing click data for my posts on a daily basis. What I came up with is something like this:
ID | post_id | click_type | created_date
1 1 page_click 2015-12-11 18:13:13
2 2 page_click 2015-12-13 11:16:34
3 3 page_click 2015-12-13 13:24:01
4 1 page_click 2015-12-15 15:31:10
With this kind of storage I can find out how many clicks post number 1 got in December 2015, and even how many clicks a given post got on 15 December between 1 and 11 pm. However, let's say I am getting 2000 clicks per day, which means it will create 2000 rows per day, 60.000 per month and 720.000 per year.
Another approach that comes to mind is this one, which stores one row per post per day and increases the count if there is more than one click on that day:
ID | post_id | click_type | created_date | count
1 1 page_click 2015-12-11 13
2 2 page_click 2015-12-11 26
3 3 page_click 2015-12-11 152
4 1 page_click 2015-12-12 14
5 2 page_click 2015-12-12 123
6 3 page_click 2015-12-12 163
With this approach, if every post is clicked at least once every day (which means a row is created), it will generate 1000 rows each day (let's say I have 1000 posts), 30.000 per month and 360.000 per year.
I am looking for advice on how to store these statistics so that I can get daily click statistics. I have some concerns about performance (of course it's nothing for big data guys :D but sorry for my lack of experience). Do you think it will be OK if there are over 1 million rows in that table after 2-3 years? And which approach do you think is going to be more effective for me?
720,000 records per year is not necessarily a lot of data. One option may be not to worry about it. Something to consider may be how long the click data matters. If after a year you don't really care anymore then you can have an historical data cleanup protocol that removes data that is older than you care about.
If you are worried about storing large amounts of data and you don't want to erase history, then you can consider pre-calculating your summarized statistics and storing them instead of your transaction detail.
The issue with this is that you have to know in advance what the smallest resolution of time will be that you will continue to care about. Also, if your motivation is saving space then you have to be careful that your summary data doesn't end up taking more space than the original transactions. This can easily happen if you store summarized data at multiple resolutions, as you might in a data warehouse arrangement.
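A minimal Python sketch of that pre-calculation, assuming one day per post is the smallest resolution you care about. The rows are taken from the question's first table, and `summarize_daily` is a hypothetical helper name.

```python
from collections import Counter

# Raw click rows as (post_id, click_type, created_date), from the question's first table.
clicks = [
    (1, "page_click", "2015-12-11 18:13:13"),
    (2, "page_click", "2015-12-13 11:16:34"),
    (3, "page_click", "2015-12-13 13:24:01"),
    (1, "page_click", "2015-12-15 15:31:10"),
]

def summarize_daily(rows):
    """Count clicks per (post_id, click_type, day), dropping the time of day."""
    return Counter(
        (post_id, click_type, created[:10])
        for post_id, click_type, created in rows
    )

daily = summarize_daily(clicks)
for (post_id, click_type, day), count in sorted(daily.items()):
    print(post_id, click_type, day, count)
```

Once the time of day is dropped it cannot be recovered, which is the resolution trade-off described above.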
This seems like a good application for rrdtool (http://oss.oetiker.ch/rrdtool/). Here you can specify several resolutions for different time intervals, e.g:
average 5 min for 1 day
average 30 min for 1 week
average 2 hours for 1 month
average 1 day for 1 Year
etc. This is also often used for graphs. Usually it is used with rrd files, but it can also be based on MySQL with rrdgraph_libdbi.

Best way to select n-th rows based on data in a field for mySQL table

The final result of this will be used for a graphing application where sometimes we would not want the detailed granularity of data at the level it is stored in the table. This may be hard to phrase in a single question so I will give an example:
Example table:
DateTime AddressID Amount
1/1/2015 10:00:00 1 10
1/1/2015 10:00:00 2 8
1/1/2015 10:01:00 1 7
1/1/2015 10:01:00 2 12
1/1/2015 10:02:00 1 21
1/1/2015 10:02:00 2 15
etc...
Note: The times will always have 00 for the seconds - if that helps.
Note: There may NOT always be an entry for every minute, but there generally should be. So it is possible that some times might be skipped. But there will always be an entry for both AddressIDs (1 & 2) every time without fail.
I need to return the above 3 fields, in a period of time requested (for example past 24 hours), but only for certain increments of time FOR EACH OF THE ADDRESS IDs. For example, records for every 5 minutes, or every 10 minutes.
So in the case of 5 minutes it would return:
DateTime AddressID Amount
1/1/2015 10:**00**:00 1 10
1/1/2015 10:**00**:00 2 8
1/1/2015 10:**05**:00 1 11
1/1/2015 10:**05**:00 2 17
1/1/2015 10:**10**:00 1 28
1/1/2015 10:**10**:00 2 5
etc...
Performance is very important. I hope I explained that well enough for someone to get the idea of what I need and I thank you in advance for your suggestions.
EDIT: For clarification, the 5 minutes in the above example should be the minimum time BETWEEN each row. So, if in the above example, on the rare chance that there was a missing time entry for 10:05:00 it should not simply select the 10:10:00 row, it should select the 10:06:00 record and then the next row selected would be 10:11:00, etc.
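The rule in the EDIT can be sketched in Python as a single pass per address ID that keeps a row only when the minimum gap has elapsed since the last kept row. This only illustrates the selection logic and is not a MySQL query. The sample rows (one per minute, with 10:05 missing) and the name `thin_rows` are assumptions.

```python
from datetime import datetime, timedelta

def thin_rows(rows, min_gap_minutes):
    """Keep, per address ID, only rows at least min_gap_minutes after the last kept row."""
    gap = timedelta(minutes=min_gap_minutes)
    last_kept = {}  # address_id -> timestamp of the last kept row
    kept = []
    for ts, address_id, amount in sorted(rows):
        previous = last_kept.get(address_id)
        if previous is None or ts - previous >= gap:
            kept.append((ts, address_id, amount))
            last_kept[address_id] = ts
    return kept

# Hypothetical data: one row per minute from 10:00 to 10:11 for address 1, with 10:05 missing.
rows = [
    (datetime(2015, 1, 1, 10, minute), 1, minute)
    for minute in range(12)
    if minute != 5
]

kept_minutes = [ts.minute for ts, _address_id, _amount in thin_rows(rows, 5)]
print(kept_minutes)
```

With 10:05 missing, the kept rows are 10:00, 10:06 and 10:11, which is the behaviour described in the EDIT.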

SQL query for various time periods

I have a table that contains the following entries:
completed_time | BOOK_CNT
---------------+---------
2013-07-23     | 2
2013-07-22     | 1
2013-07-19     | 3
2013-07-16     | 5
2013-07-12     | 4
2013-07-11     | 2
2013-07-02     | 9
2013-06-30     | 5
Now, I want to use the above entries for data analysis.
Let's say DAYS_FROM, DAYS_TO and PERIOD are three variables.
I need to run queries of the following sort:
"Total books from DAYS_FROM to DAYS_TO in intervals of PERIOD."
DAYS_FROM is a date in the format YYYY-MM-DD,
DAYS_TO is a date in the format YYYY-MM-DD,
PERIOD is one of {1W,2W,1M,2M,1Y},
where W, M and Y represent WEEK, MONTH and YEAR.
Example: The query with DAYS_FROM=2013-07-23, DAYS_TO=2013-07-03 and PERIOD=1W should return:
ith week - total
1 - 3
2 - 8
3 - 6
4 - 14
Explanation:
1-3 means (the total books from 2013-07-21 (Sun) to 2013-07-23 (Tue) is 3)
2-8 means (the total books from 2013-07-14 (Sun) to 2013-07-21 (Sun) is 8)
3-6 means (the total books from 2013-07-07 (Sun) to 2013-07-14 (Sun) is 6)
4-14 means (the total books from 2013-07-03 (Wed) to 2013-07-07 (Sun) is 14)
Please refer to the calendar image for better understanding.
How can I run such a query?
What have I tried?
SELECT DAY(completed_time), COUNT(total) AS Total
FROM my_tab
WHERE completed_time BETWEEN '2013-07-23' - INTERVAL 1 WEEK AND '2013-07-03'
GROUP BY DAY(completed_time);
The above query subtracted 7 days from 2013-07-23 and thus treated 2013-07-16 to 2013-07-23 as the first week, 2013-07-09 to 2013-07-16 as the second week, and so on.
A simple starting point would be something like the query below. Of course, you may want to adjust the ith value to suit your needs.
SET @period='1M';
SELECT CASE WHEN @period='1Y' THEN YEAR(completed_time)
WHEN @period='1M' THEN YEAR(completed_time)*100+MONTH(completed_time)
WHEN @period='2M' THEN FLOOR((YEAR(completed_time)*100+MONTH(completed_time))/2)*2
WHEN @period='1W' THEN YEARWEEK(completed_time)
WHEN @period='2W' THEN FLOOR(YEARWEEK(completed_time)/2)*2
END ith,
SUM(BOOK_CNT) Total
FROM my_tab
GROUP BY ith
ORDER BY ith DESC;
An SQLfiddle to test with.
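To see what the weekly grouping does with the sample data, here is a small Python sketch that buckets each row by the Sunday that starts its week, matching the Sunday-based weeks in the question. It does not reproduce MySQL's YEARWEEK numbering or apply the DAYS_FROM/DAYS_TO filter, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# (completed_time, BOOK_CNT) rows from the question's table.
rows = [
    (date(2013, 7, 23), 2),
    (date(2013, 7, 22), 1),
    (date(2013, 7, 19), 3),
    (date(2013, 7, 16), 5),
    (date(2013, 7, 12), 4),
    (date(2013, 7, 11), 2),
    (date(2013, 7, 2), 9),
    (date(2013, 6, 30), 5),
]

def week_start(d):
    """Return the Sunday on or before d (date.weekday() is 0 for Monday, 6 for Sunday)."""
    return d - timedelta(days=(d.weekday() + 1) % 7)

def totals_by_week(data):
    """Sum BOOK_CNT per Sunday-based week."""
    totals = defaultdict(int)
    for completed, count in data:
        totals[week_start(completed)] += count
    return dict(totals)

weekly = totals_by_week(rows)
for start in sorted(weekly, reverse=True):
    print(start, weekly[start])
```

Listed from the most recent week backwards, the totals are 3, 8, 6 and 14, the same figures as in the question's expected output.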

MS Access: Using Single form to enter query parameters in MS access

Compliments of the day.
Following the previous feedback I received, and after creating a ticket sales database in MS Access, I want to use a single form to query the price of a particular ticket in a particular month and have the price displayed back in the form, in a text field or label.
Below are the sample tables and the query I used:
CompanyTable
CompID CompName
A Ann
B Bahn
C Can
KK Seven
- --
TicketTable
TicketCode TicketDes
10 Two people
11 Monthly
12 Weekend
14 Daily
TicketPriceTable
ID TicketCode Price ValidFrom
1 10 $35.50 8/1/2010
2 10 $38.50 8/1/2011
3 11 $20.50 8/1/2010
4 11 $25.00 11/1/2011
5 12 $50.50 12/1/2010
6 12 $60.50 1/1/2011
7 14 $15.50 2/1/2010
8 14 $19.00 3/1/2011
9 10 $40.50 4/1/2012
Used query:
SELECT TicketPriceTable.Price
FROM TicketPriceTable
WHERE (((TicketPriceTable.ValidFrom)=[DATE01]) AND ((TicketPriceTable.TicketCode)=[TCODE01]));
In MS Access, small dialog boxes pop up asking for the parameters when the query runs. How can I use a single form to enter the parameters for [DATE01] and [TCODE01], and have the price displayed in the same form in a text field (for further calculations)?
For example, the 'Month' field would be the input for the [DATE01] parameter,
the 'Ticket Code' field would be the input for the [TCODE01] parameter,
and the text field would show the output of the query (the ticket price).
If possible, I would like to use only the month and year, in the format MM/YYYY. The day is not necessary. How can I achieve this in MS Access?
If you have any questions, please don't hesitate to ask.
Thanks very much for your time and anticipated feedback.
You can refer to the values in the form fields by using expressions like: [Forms]![NameOfTheForm]![NameOfTheField]
Entering up to 300 different types of tickets
(Answer to your comment referring to Accessing data from a ticket database, based on months in MS Access)
You can use Cartesian products to create a lot of records. If you select two tables in a query but do not join them, the result is a Cartesian product, which means that every record from one table is combined with every record from the other.
Let's add a new table called MonthTable
MonthNr MonthName
1 January
2 February
3 March
... ...
Now if you combine this table containing 12 records with your TicketTable containing 4 records, you will get a result containing 48 records
SELECT M.MonthNr, M.MonthName, T.TicketCode, T.TicketDes
FROM MonthTable M, TicketTable T
ORDER BY M.MonthNr, T.TicketCode
You get something like this
MonthNr MonthName TicketCode TicketDes
1 January 10 Two people
1 January 11 Monthly
1 January 12 Weekend
1 January 14 Daily
2 February 10 Two people
2 February 11 Monthly
2 February 12 Weekend
2 February 14 Daily
3 March 10 Two people
3 March 11 Monthly
3 March 12 Weekend
3 March 14 Daily
... ... ... ...
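The same idea can be sketched in Python with `itertools.product`, using the 4 rows of TicketTable and 12 month rows generated from the standard `calendar` module (standing in for the MonthTable above):

```python
import calendar
from itertools import product

# 12 (MonthNr, MonthName) rows, standing in for MonthTable.
months = [(nr, calendar.month_name[nr]) for nr in range(1, 13)]

# (TicketCode, TicketDes) rows from TicketTable.
tickets = [
    (10, "Two people"),
    (11, "Monthly"),
    (12, "Weekend"),
    (14, "Daily"),
]

# Every month row combined with every ticket row: 12 * 4 = 48 combinations.
combos = [
    (month_nr, month_name, code, description)
    for (month_nr, month_name), (code, description) in product(months, tickets)
]

print(len(combos))
for row in combos[:4]:
    print(row)
```

`product` iterates the second list fastest, so the rows come out ordered by month and then by ticket code, like the ORDER BY in the query.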
You can also get the price actually valid for a ticket type like this
SELECT X.TicketCode, T.Price, X.ActualPeriod AS ValidFrom
FROM (SELECT TicketCode, MAX(ValidFrom) AS ActualPeriod
FROM TicketPriceTable
WHERE ValidFrom <= Date()
GROUP BY TicketCode) X
INNER JOIN TicketPriceTable T
ON X.TicketCode = T.TicketCode AND X.ActualPeriod=T.ValidFrom
The WHERE ValidFrom <= Date() condition is there in case you have entered future prices.
Here the subquery selects the actually valid period, i.e. the ValidFrom that applies for each TicketCode. If you find sub-selects a bit confusing, you can also store them as query in Access or as view in MySQL and base a subsequent query on them. This has the advantage that you can create them in the query designer.
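As an illustration of what that query computes, here is a Python sketch over the rows of TicketPriceTable. It assumes the ValidFrom values are in M/D/YYYY format, and `valid_prices` is a hypothetical helper that takes the "as of" date as an argument instead of using today's date.

```python
from datetime import date
from decimal import Decimal

# (TicketCode, Price, ValidFrom) rows from TicketPriceTable, dates read as M/D/YYYY (assumption).
price_rows = [
    (10, Decimal("35.50"), date(2010, 8, 1)),
    (10, Decimal("38.50"), date(2011, 8, 1)),
    (11, Decimal("20.50"), date(2010, 8, 1)),
    (11, Decimal("25.00"), date(2011, 11, 1)),
    (12, Decimal("50.50"), date(2010, 12, 1)),
    (12, Decimal("60.50"), date(2011, 1, 1)),
    (14, Decimal("15.50"), date(2010, 2, 1)),
    (14, Decimal("19.00"), date(2011, 3, 1)),
    (10, Decimal("40.50"), date(2012, 4, 1)),
]

def valid_prices(rows, as_of):
    """Return {ticket_code: (price, valid_from)} using the latest valid_from <= as_of."""
    latest = {}
    for code, price, valid_from in rows:
        if valid_from > as_of:
            continue  # future price, not valid yet
        if code not in latest or valid_from > latest[code][1]:
            latest[code] = (price, valid_from)
    return latest

current = valid_prices(price_rows, date(2011, 9, 1))
for code in sorted(current):
    print(code, current[code][0], current[code][1])
```

For 1 September 2011, ticket 11 still gets the 8/1/2010 price because its next price only becomes valid on 11/1/2011.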
Consider not creating all your 300 records physically, but just getting them dynamically from a Cartesian product.
I'll let you put all the pieces together now.
In Access Forms you can set the RecordSource to be a query, not only a table. This can be either the name of a stored query or a SQL statement. This allows you to have controls bound to different tables through this query.
You can also place subforms on the main form that are bound to other tables than the main form.
You can also display the result of an expression in a TextBox by setting its ControlSource to an expression that starts with an equal sign. Note that Me is only available in VBA code, so in a ControlSource you refer to the control by name:
=DLookUp("Price", "TicketPriceTable", "TicketCode=" & [cboTicketCode])
You can set the Format of a TextBox to MM\/yyyy or use the format function
s = Format$(Now, "MM\/yyyy")