Understanding how MySQL fetches data with same user id - mysql

I'm working on a table that keeps a record of students attending courses. The data is stored using student identification number, course id, course name, start date and end date. A student can take more than one course, so the same student identification number repeats with a different course id, course name, start date and end date. What I'm trying to do here is select students based on the number of days, using DATEDIFF.
Sorry for not explaining my problem properly. The structure of the table I'm talking about:
studcourse1(internalstudentid, staffnoic, courseid, coursenm, StDt, EndDt, location, organizer, generalcategorycd, generalsubcategorycd, eid)
staffnoic - staff identification number,
StDt - Start Date.
EndDt - End Date
I've checked: there's no primary key or index on this table, as it's not a base table but a view.
Sorry if the previous statement is too long. Let's use this instead.
SELECT GradeGroupCd, StudCourse1.StaffNoIC, (DATEDIFF( EndDt, StDt ) +1) TotalDay, StDt, EndDt
FROM StudCourse1, tblStaff, tblRefTitleGred
WHERE tblStaff.TitleGredCd = tblRefTitleGred.TitleGredCd
AND StudCourse1.StaffNoIC = tblStaff.StaffNoIC
AND StDt >= '2009-1-1' AND YEAR(EndDt) <= YEAR(NOW())
AND (DATEDIFF( EndDt, StDt ) +1) > 90
AND (GeneralSubCategoryCd = 'S0012' OR GeneralSubCategoryCd = 'S0014')
GROUP BY GradeGroupCd, StudCourse1.StaffNoIC
The statement above fetches results for students (matched via staffnoic in tblStaff and StudCourse1) who have taken a course for more than 90 days, using (DATEDIFF(EndDt, StDt) + 1). What really confuses me is this: take, for example, the records from studcourse1 with a staffnoic of '111111111111'. Sample data:
studcourse1(internalstudentid, staffnoic, courseid, coursenm, StDt, EndDt, location, organizer, generalcategorycd, generalsubcategorycd, eid)
studcourse1(10629,111111111111,AAA1811,Course1,2010-01-01 00:00:00, 2010-12-31 00:00:00, '', ABC Org, G003, S0012, E00001812)
(30684,111111111111,AAA6968,Course2,2009-02-10 00:00:00, 2012-02-09 00:00:00, '', ABC Org, G003, S0012, E00006894)
(30685,111111111111,AAA6970,Course3,2011-01-01 00:00:00, 2012-02-09 00:00:00, '', ABC Org, G003, S0014, E00006896)
Running the SQL statement selects the one with the StDt of 2010-01-01 00:00:00 and EndDt of 2010-12-31 00:00:00. Why is it singling out that record and not the others, as they all satisfy "AND StDt >= '2009-1-1' AND YEAR(EndDt) <= YEAR(NOW())"? The current year here is 2012. How can I make it select the one with the StDt (2009-02-10) and EndDt (2012-02-09)?
And also "(DATEDIFF( EndDt, StDt ) +1) > 90": why does it add 1? Wouldn't DATEDIFF( EndDt, StDt ) > 90 be the right one?
Sorry if that's too many questions. I just started learning MySQL recently. Thank you for your time.

OK, it is hard to understand what you have written. Please provide the tables you have used and their structure.
The simplest way to get this done is to have 3 tables:
1) Students - student id, name and so on.
2) Course - course id, course name and so on.
3) Enrolment - enrol id, start date, end date and foreign keys (student id and course id).
And I didn't get what you meant by selecting students based on the number of days using DATEDIFF.

Related

Generate a join similar to a vlookup based on closest date

I have the following two tables:
movie_sales (provided daily)
movie_id
date
revenue
movie_rank (provided every few days or weeks)
movie_id
date
rank
The tricky thing is that every day I have data for sales, but only data for ranks once every few days. Here is an example of sample data:
`movie_sales`
- titanic (ID), 2014-06-01 (date), 4.99 (revenue)
- titanic (ID), 2014-06-02 (date), 5.99 (revenue)
`movie_rank`
- titanic (ID), 2014-05-14 (date), 905 (rank)
- titanic (ID), 2014-07-01 (date), 927 (rank)
And, because the movie_rank.date of 2014-05-14 is closer to the two sales dates, the output should be:
id date revenue closest_rank
titanic 2014-06-01 4.99 905
titanic 2014-06-02 5.99 905
The following query works to get the results by getting the min date difference in the sub-select:
SELECT
id,
date,
revenue,
(SELECT rank from movie_rank where id=s.id ORDER BY ABS(DATEDIFF(date, s.date)) ASC LIMIT 1)
FROM
movie_sales s
But I'm afraid that this would have terrible performance as it will literally be doing millions of subselects...on millions of rows. What would be a better way to do this, or is there really no proper way to do this since an index can not be properly done with a DATEDIFF ?
Unfortunately, you are right. The movie rank table must be searched for each movie sale, and of all matching movie rows the closest must be picked.
With an index on movie_rank(id) the DBMS finds the movie rows quickly, but an index on movie_rank(id, date) would be better, because the date could be read from the index and only the one best match would be read from the table.
But you also say that there are new ranks every few days. If it is guaranteed that a rank exists within a certain range, e.g. for each date there will be at least one rank in the twenty days before and at least one rank in the twenty days after, you can limit the search accordingly. (The index on movie_rank(id, date) would be essential for this, though.)
SELECT
id,
date,
revenue,
(
select r.rank
from movie_rank r
where r.id = s.id
and r.date between s.date - interval 20 day
and s.date + interval 20 day
order by abs(datediff(r.date, s.date)) asc
limit 1
)
FROM movie_sales s;
This is difficult to get quick with SQL. In a programming language I would choose this algorithm:
1. Sort the two tables by date and point to the first rows.
2. Move the rank pointer forward until we match the sales date or are beyond it. (If we aren't there already.)
3. Compare the sales date with the rank date we are pointing at and with the rank date of the previous row. Take the closer one.
4. Move the sales pointer one row forward.
5. Go to 2.
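The steps above can be sketched outside SQL roughly as follows. This is only an illustration for a single movie, in Python; the function name `closest_ranks` and the tuple layouts are assumptions, not something from the question. On a tie it keeps the earlier rank.

```python
from datetime import date

def closest_ranks(sales, ranks):
    """Pair every sale with the rank whose date is closest.

    sales: list of (sale_date, revenue) for one movie (hypothetical layout)
    ranks: list of (rank_date, rank) for the same movie (hypothetical layout)
    """
    if not ranks:
        # No rank rows at all: nothing to attach.
        return [(d, revenue, None) for d, revenue in sorted(sales)]

    sales = sorted(sales)
    ranks = sorted(ranks)
    result = []
    i = 0  # rank pointer; it only ever moves forward
    for sale_date, revenue in sales:
        # Move the rank pointer until it reaches or passes the sale date.
        while i < len(ranks) - 1 and ranks[i][0] < sale_date:
            i += 1
        best = ranks[i]
        if i > 0:
            prev = ranks[i - 1]
            # Compare the rank we point at with the previous one, keep the closer.
            if abs((sale_date - prev[0]).days) <= abs((best[0] - sale_date).days):
                best = prev
        result.append((sale_date, revenue, best[1]))
    return result

example = closest_ranks(
    [(date(2014, 6, 1), 4.99), (date(2014, 6, 2), 5.99)],
    [(date(2014, 5, 14), 905), (date(2014, 7, 1), 927)],
)
```

With the sample data from the question, both sales rows end up with rank 905, since 2014-05-14 is closer than 2014-07-01.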
With this algorithm we would already be in about the position we want to be. Let's see, if we can do the same with SQL. Iterations are done with recursive queries in SQL. These are available in MySQL as of version 8.0.
We start with sorting the rows, i.e. giving them numbers. Then we iterate through both data sets.
with recursive
sales as
(
select *, row_number() over (partition by movie_id order by date) as rn
from movie_sales
),
ranks as
(
select *, row_number() over (partition by movie_id order by date) as rn
from movie_rank
),
cte (movie_id, revenue, srn, rrn, sdate, rdate, rrank, closest_rank) as
(
select
movie_id, s.revenue, s.rn, r.rn, s.date, r.date, r.ranking,
case when s.date <= r.date then r.ranking end
from (select * from sales where rn = 1) s
join (select * from ranks where rn = 1) r using (movie_id)
union all
select
cte.movie_id,
cte.revenue,
coalesce(s.rn, cte.srn),
coalesce(r.rn, cte.rrn),
coalesce(s.date, cte.sdate),
coalesce(r.date, cte.rdate),
coalesce(r.ranking, cte.rrank),
case when coalesce(r.date, cte.rdate) >= coalesce(s.date, cte.sdate) then
case when abs(datediff(coalesce(r.date, cte.rdate), coalesce(s.date, cte.sdate))) <
abs(datediff(cte.rdate, coalesce(s.date, cte.sdate)))
then coalesce(r.ranking, cte.rrank)
else cte.rrank
end
end
from cte
left join sales s on s.movie_id = cte.movie_id and s.rn = cte.srn + 1 and cte.closest_rank is not null
left join ranks r on r.movie_id = cte.movie_id and r.rn = cte.rrn + 1 and cte.rdate < cte.sdate
where s.movie_id is not null or r.movie_id is not null
-- where cte.closest_rank is null
)
select
movie_id,
sdate,
revenue,
closest_rank
from cte
where closest_rank is not null;
(BTW: I named the column ranking, because rank is a reserved word in SQL.)
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e994cb56798efabc8f7249fd8320e1cf
This is probably still slow. The reason for this is: there are no pointers to a row in SQL. If we want to go from row #1 to row #2, we must search that row, while in a programming language we would really just move the pointer one step forward. If the tables had an ID, we could build a chain (next_row_id) instead of using row numbers. That could speed this process up. But well, I guess you already notice: this is not an algorithm made for SQL.
Another approach... Avoid the problem by cleansing the data.
Make sure the rank is available for every day. When a new date comes in, find the previous rank, then fill in all the rows for the intervening days.
(This will take some initial effort to 'fix' all the previous missing dates. After that, it is a small effort when a new list of ranks comes in.)
The "report" would be a simple JOIN on the date. You would probably need a 2-column INDEX(movie_id, date) or something like that.
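The fill-in step described above could look roughly like this. It is a minimal sketch in Python for a single movie; `fill_daily_ranks` and its dict layout are hypothetical names used only for illustration. Each missing day simply inherits the most recent earlier rank.

```python
from datetime import date, timedelta

def fill_daily_ranks(ranks, last_day):
    """Return a rank for every day from the first known rank through last_day.

    ranks: dict mapping date -> rank for one movie (hypothetical layout)
    """
    filled = {}
    if not ranks:
        return filled
    day = min(ranks)
    current = None
    while day <= last_day:
        # Use the rank published on this day if there is one,
        # otherwise carry the previous rank forward.
        current = ranks.get(day, current)
        filled[day] = current
        day += timedelta(days=1)
    return filled

filled = fill_daily_ranks(
    {date(2014, 5, 14): 905, date(2014, 5, 17): 910},
    date(2014, 5, 18),
)
```

Once every day has a row, the sales report becomes a plain equality join on (movie_id, date).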
The ultimate solution would be not to calculate all the ranks every time, but to store them (in a new column, or even in a new table if you don't want to change the existing tables).
Each time you update, you could look for sales data without a rank and calculate it only for those rows.
With the above approach you always get the last available rank BEFORE the sales date (e.g. if you have rank data from 14 days before and 1 day after, the one before would still be used).
If you strictly need the ranking closest in time, then you also need to run an UPDATE when new ranking info arrives. I believe it would still be more efficient in the long run.

calculate Age from year, month and day fields

I have a table of people and I need to know how many of them are actual minors.
I have the following query:
SELECT count(*) as minors from
FilesMain a INNER JOIN Sides b
ON a.FileID = b.FileID
INNER JOIN SideData c
ON b.SideDataID = c.SideDataID
WHERE a.StatusCode IN (100,101) AND (YEAR(CURDATE()) - BirthYear<17)
Basically in the query above, I am calculating current date year minus BirthYear field.
I have the person's birth date separated into year, month and day in 3 different fields. Please don't ask why; I inherited the data. What would be the correct way to use the Month and Day fields as well to get a more specific result? Just using Year treats someone born on January 1 and someone born on December 31 the same.
Thanks
... AND TIMESTAMPDIFF(YEAR,
CONCAT_WS('-', BirthYear, BirthMonth, BirthDay),
CURRENT_DATE) < 17
You may also add a generated column:
ALTER TABLE tablename
ADD COLUMN DOB DATE
GENERATED ALWAYS AS (CONCAT_WS('-', BirthYear, BirthMonth, BirthDay));
and use this column instead of the above expression.
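To make the effect of month and day visible, here is a small sketch of the same completed-years rule in Python. The function name `age_in_years` is an assumption for illustration, and edge cases such as February 29 birthdays may be handled differently by TIMESTAMPDIFF.

```python
from datetime import date

def age_in_years(birth_year, birth_month, birth_day, today):
    """Completed years between the birth date and today."""
    years = today.year - birth_year
    # If the birthday has not happened yet this year, one year less is complete.
    if (today.month, today.day) < (birth_month, birth_day):
        years -= 1
    return years

born_january = age_in_years(2000, 1, 1, date(2017, 6, 1))
born_december = age_in_years(2000, 12, 31, date(2017, 6, 1))
```

On 2017-06-01, the person born on 2000-01-01 has completed 17 years, while the person born on 2000-12-31 has completed only 16, which a year-only subtraction cannot distinguish.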

SQL query to calculate number of active users at the end of every day

I have three columns User_ID, New_Status and DATETIME.
New_Status contains 0(inactive) and 1(active) for users.
Every user starts from active status - ie. 1.
Subsequently table stores their status and datetime at which they got activated/inactivated.
How can I calculate the number of active users at the end of each date, including dates when no records were written to the table?
Sample data:
| ID | New_Status | DATETIME |
+----+------------+---------------------+
| 1 | 1 | 2019-01-01 21:00:00 |
| 1 | 0 | 2019-02-05 17:00:00 |
| 1 | 1 | 2019-03-06 18:00:00 |
| 2 | 1 | 2019-01-02 01:00:00 |
| 2 | 0 | 2019-02-03 13:00:00 |
Format the date time value to a date only string and group by it
SELECT DATE_FORMAT(DATETIME, '%Y-%m-%d') as day, COUNT(*) as active
FROM test
WHERE New_Status = 1
GROUP BY day
ORDER BY day
In MySQL 8 you can use the row_number() window function to get the last status of each user per day. Then filter for the rows that indicate the user was active, GROUP BY the day and count them.
SELECT date(x.datetime),
count(*)
FROM (SELECT date(t.datetime) datetime,
t.new_status,
row_number() OVER (PARTITION BY t.user_id, date(t.datetime)
ORDER BY t.datetime DESC) rn
FROM elbat t) x
WHERE x.rn = 1
AND x.new_status = 1
GROUP BY x.datetime;
If not all days are in the table you need to create a (possibly derived) table with all days and cross join it.
Find out the last activity status of users whose activity was changed for each day
select User_ID, New_Status, DATE_FORMAT(DATETIME, '%Y-%m-%d')
from activity_table
where not exists
(
select 1
from activity_table at
where at.User_ID = activity_table.User_ID and
DATE_FORMAT(at.DATETIME, '%Y-%m-%d') = DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d') and
at.DATETIME > activity_table.DATETIME
)
order by DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d');
This is not the solution yet, but it is very useful information on the way to the solution. Note that not all dates are covered yet, and the values are individual records, more precisely each user's last value on each day, ordered by date.
Let's get aggregate numbers
Using the query above as a subselect and aliasing it into a table, you can group by DATETIME and do a select sum(new_Status) as activity, count(*) total, DATETIME so you will know that activity - (total - activity) is the difference in comparison to the previous day.
Knowing the delta for each day present in the result
In the previous section we saw how the delta can be calculated. If the whole query in the previous section is aliased, you can self join it using a left join on pairs of (previous date, current date), still having gaps between dates, but not worrying about that just yet. For the first date, its activity is the delta. For subsequent records, adding the accumulated delta of the previous days to their own delta yields the result you need. To achieve this you can use a recursive query, supported by MySQL 8, or, alternatively, a subquery that sums the deltas of the previous days (with special attention to the first date, as described earlier); adding the current date's delta to that sum yields the result we need.
Fill the gaps
The previous section would already work perfectly (assuming no integrity problems) if there were activity changes on every day, but we will not rely on that assumption. Here we know that the figures are correct for each date where a figure is present, and we just need to add the missing dates to the result. If the results are properly ordered, as they should be, one can use a cursor and loop over them. At each record after the first, we can determine which dates are missing. There may be zero or more such dates between two consecutive records. What we do know about the gaps is that their values are exactly the same as those of the previous record that does have data: if there were no activity changes on a given date, the number of active users is exactly the same as on the previous day. Using some structure, such as a table, you can generate the results you need with the knowledge described here.
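The loop described above can be sketched as follows. This is a minimal illustration in Python rather than a cursor in the database; `active_users_per_day` and the event tuple layout are assumptions, and it presumes the table really does contain each user's first activation.

```python
from datetime import datetime, timedelta

def active_users_per_day(events):
    """Count active users at the end of every day, including days without events.

    events: iterable of (user_id, new_status, changed_at) (hypothetical layout)
    """
    events = sorted(events, key=lambda e: e[2])
    if not events:
        return {}
    status = {}  # user_id -> latest known status (0 or 1)
    result = {}
    day = events[0][2].date()
    last_day = events[-1][2].date()
    i = 0
    while day <= last_day:
        # Apply every change recorded on this day, in time order,
        # so the last one of the day wins for each user.
        while i < len(events) and events[i][2].date() == day:
            user_id, new_status, _ = events[i]
            status[user_id] = new_status
            i += 1
        # A day without events keeps the previous day's statuses unchanged.
        result[day] = sum(status.values())
        day += timedelta(days=1)
    return result

counts = active_users_per_day([
    (1, 1, datetime(2019, 1, 1, 21, 0)),
    (1, 0, datetime(2019, 2, 5, 17, 0)),
    (1, 1, datetime(2019, 3, 6, 18, 0)),
    (2, 1, datetime(2019, 1, 2, 1, 0)),
    (2, 0, datetime(2019, 2, 3, 13, 0)),
])
```

Because only the last status per user is kept, a user who is deactivated and reactivated on the same day is counted once, as active.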
Solving possible integrity problems
There are several possibilities for such problems:
First, a data item might have existed before records started being written to this table.
Second, bugs or other causes might have paused the creation of records in this activity table.
Third, the addition of a user does not or did not necessarily generate an activity change, since its popping into existence renders its previous state of activity undefined and subject to human standards, which might change over time.
Fourth, the removal of a user does not or did not necessarily generate an activity change, since its popping out of existence renders its current state of activity undefined and subject to human standards, which might change over time.
Fifth, there is an infinity of other issues which might cause data integrity issues.
To cope with these you will need to comprehensively analyze whatever you can from the source-code and the history of the project, including database records, logs and humanly available information to detect such anomalies, the time they were effective and figure out what their solution is if they exist.
EDIT
In the meantime I was thinking about the possibility of a user who was active at the start of the day being deactivated and then activated again by the end of the day. Similarly, a user who was inactive at the start of a day might be activated and then deactivated again by the end of the day. For users that have more than one status change in a day, we need to compare their activity status at the start and at the end of the day to find out what the difference was.
SELECT
DATE(DATETIME),
COUNT(*)
FROM your_table
WHERE New_Status = 1
GROUP BY User_ID,
DATE(DATETIME)
For MySQL 8+
WITH RECURSIVE
cte AS (
SELECT MIN(DATE(DT)) dt
FROM src
UNION ALL
SELECT dt + INTERVAL 1 DAY
FROM cte
WHERE dt < ( SELECT MAX(DATE(DT)) dt
FROM src )
),
cte2 AS
(
SELECT users.id,
cte.dt,
SUM( CASE src.New_Status WHEN 1 THEN 1
WHEN 0 THEN -1
ELSE 0
END ) OVER ( PARTITION BY users.id
ORDER BY cte.dt ) status
FROM cte
CROSS JOIN ( SELECT DISTINCT id
FROM src ) users
LEFT JOIN src ON src.id = users.id
AND DATE(src.dt) = cte.dt
)
SELECT dt, SUM(status)
FROM cte2
GROUP BY dt;
Do not forget to adjust the maximum recursion depth (cte_max_recursion_depth, 1000 by default) if the date range spans more days than that.
Here is what I believe is a good solution for this problem of yours:
SELECT SUM(New_Status) "Number of active users"
, DATE_FORMAT(DATEC, '%Y-%m-%d') "Date"
FROM TEST T1
WHERE DATE_FORMAT(DATEC,'%H:%i:%s') =
(SELECT MAX(DATE_FORMAT(T2.DATEC,'%H:%i:%s'))
FROM TEST T2
WHERE T2.ID = T1.ID
AND DATE_FORMAT(T1.DATEC, '%Y-%m-%d') = DATE_FORMAT(T2.DATEC, '%Y-%m-%d')
GROUP BY ID
, DATE_FORMAT(DATEC, '%Y-%m-%d'))
GROUP BY DATE_FORMAT(DATEC, '%Y-%m-%d');

Need help with a join and some calculations on a MySQL insert

I'll try to provide some context so you can understand what I'm trying to achieve here. My company uses open source software to manage the employees leaves (Jorani, feel free to google it :) ).
There are different types of leave (holidays, sick leave, etc.) and we want to calculate the days "not used" from the holidays of 2016 and "copy" them to another type of leave called "Remaining Holidays 2016".
The important tables are:
entitleddays (here you specify how many days of each type you give to an employee)
id employee startdate enddate type days description
661 3 2016-01-01 2017-02-28 1 14.00 Holidays 2016
1296 3 2016-01-01 2016-12-31 4 18.00 Sick leave 2016
leaves (this table has information about the leaves taken by the employees)
id startdate enddate status employee cause duration type
2436 2016-08-01 2016-08-01 3 78 OK from managers 1.00 1
2766 2016-09-05 2016-09-12 3 63 Holidays 6.00 1
So basically we have:
Entitled leaves:
Data stored in the entitleddays table shown above. In our example let's say I have 14 days for my 2016 holidays.
Taken leaves:
Leaves taken by the user, stored in the table called leaves shown above. For our example let's say I took a day off on the first of August and 6 days in September.
Available leaves:
Available days are calculated as entitled days minus "taken leaves". For this example, 14 entitled days - 7 = 7 days. So I still have seven days available for holidays :D
So my goal is to insert these 7 days for this user as entitled days for the new type: "Remaining days from 2016" and do this for every user. So the solution that comes up to my mind is to do something like this for every user:
INSERT INTO entitleddays (employee, startdate, enddate, type, days, description)
SELECT id, '2017-01-01', '2017-02-28', '8', (entitled holidays for 2016 minus all the taken leaves of this type), 'Remaining holidays from 2016'
FROM users
Where 8 is the new type of leave where I want to copy the days (Remaining holidays from 2016).
For example I can get the taken holidays from 2016 for a specific user doing this:
SELECT SUM(duration)
FROM leaves
WHERE employee=3 AND status=3 AND type=1
Note: Type 1 is the type of leave "Holidays 2016" and status 3 means that the leave request was accepted.
I can probably achieve all of this in a single SQL statement, but it can also be split into several if that is simpler or easier to manage/understand.
Many thanks in advance.
This is how you can handle the calculation:
- sum the entitleddays in a subquery by grouping the datasets in its table per employee
  - maybe even group by year? In this case I just filtered for 2016 via WHERE-clause
- sum the taken holidays in a subquery, again by grouping per employee
  - group by year or filter directly for the one you need
- join this subquery onto the other resultset of the other query
- calculate (entitled days - taken leaves) in the outer query
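The steps above can be sketched outside SQL like this, just to show the arithmetic. It is a rough illustration in Python; `remaining_days` and the row layouts are hypothetical, the date filter is left out for brevity, and the status filter (3 = accepted) is taken from the question rather than from the query in this answer.

```python
from collections import defaultdict

def remaining_days(entitled_rows, leave_rows, leave_type=1, accepted_status=3):
    """Entitled days minus taken days, per employee.

    entitled_rows: iterable of (employee, type, days)               (hypothetical layout)
    leave_rows:    iterable of (employee, type, status, duration)   (hypothetical layout)
    """
    # Step 1: sum the entitled days per employee.
    entitled = defaultdict(float)
    for employee, row_type, days in entitled_rows:
        if row_type == leave_type:
            entitled[employee] += days

    # Step 2: sum the accepted leaves per employee.
    taken = defaultdict(float)
    for employee, row_type, status, duration in leave_rows:
        if row_type == leave_type and status == accepted_status:
            taken[employee] += duration

    # Step 3: left-join style subtraction; no leaves taken counts as 0.
    return {emp: total - taken.get(emp, 0.0) for emp, total in entitled.items()}

remaining = remaining_days(
    [(3, 1, 14.0), (3, 4, 18.0), (5, 1, 10.0)],
    [(3, 1, 3, 1.0), (3, 1, 3, 6.0), (3, 1, 2, 2.0)],
)
```

Employee 3 ends up with 14 - 7 = 7 days, matching the example in the question, and employee 5, who took no leave, keeps all 10 days instead of getting an empty value.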
Query:
SELECT
entitled.employee,
'2017-01-01',
'2017-02-28',
'8' AS type,
entitled.days - COALESCE(takenDays.days, 0),
'Remaining holidays from 2016'
FROM
(
SELECT
employee,
SUM(days) AS days
FROM
entitleddays
WHERE
startdate >= '2016-01-01'
AND type = 1
GROUP BY
employee
) AS entitled
LEFT JOIN (
SELECT
employee,
SUM(duration) AS days
FROM
`leaves`
WHERE
startdate >= '2016-01-01'
AND type = 1
GROUP BY
employee
) AS takenDays ON takenDays.employee = entitled.employee
I am not sure if this is how you want to calculate the sums for the days of entitleddays and taken days. The query just checks if startdate >= '2016-01-01'.
Also you mentioned a table users in your attempt but didn't provide details for the table, so I left it out. I guess you could use it as a basis otherwise. In the current query the grouped result of entitleddays is the basis.
For the insert
INSERT INTO entitleddays (employee, startdate, enddate, type, days, description)
SELECT
entitled.employee,
'2017-01-01',
'2017-02-28',
'8' AS type,
entitled.days - COALESCE(takenDays.days, 0),
'Remaining holidays from 2016'
FROM
(
SELECT
employee,
SUM(days) AS days
FROM
entitleddays
WHERE
startdate >= '2016-01-01'
AND type = 1
GROUP BY
employee
) AS entitled
LEFT JOIN (
SELECT
employee,
SUM(duration) AS days
FROM
`leaves`
WHERE
startdate >= '2016-01-01'
AND type = 1
GROUP BY
employee
) AS takenDays ON takenDays.employee = entitled.employee

Query a MySQL Database and Group By Date Range to Create a Chart

I'm looking to create the following chart from a MySQL database. I know how to actually create the chart (using excel or similar program), my problem is how to get the data needed to create the chart. In this example, I can see that on January 1, 60 tickets were in the state illustrated by the green line.
I need to track the historical state of tickets of a project through a date range. The date range is determined by a project manager (in this case it's January 1st through January 9th).
For each ticket, I have the following set of historical data. Each time something changes in the ticket (state, description, assignee, customer update, and other attributes not shown in this problem), a "timestamp" entry is made in the database.
ticket_num status_changed_date from_state to_state
123456 2011-01-01 18:03:44 -- 1
123456 2011-01-01 18:10:26 1 2
123456 2011-01-01 14:37:10 2 2
123456 2011-01-02 07:55:44 2 3
123456 2011-01-03 06:12:18 3 2
123456 2011-01-04 19:03:43 3 3
123456 2011-01-05 02:05:24 3 4
123456 2011-01-06 18:13:28 4 4
123456 2011-01-07 13:14:48 4 5
123456 2011-01-09 01:35:39 5 5
How can I query the database for a given time (determined by my script) and find out what state each of the tickets is in?
For example: To produce the chart shown above, given the date 2011-01-02 12:00:00, how many tickets were in the state "2"?
I've tried querying the database with specific dates and ranges, but can't figure out the proper way to get the data to create the chart. Thanks in advance for any help.
I'm not exactly sure I know what you want. But . . .
Assuming a table definition like:
create table ticket_data (ticket_num int,
status_changed_date datetime,
from_state int,
to_state int);
The following, for example would give you the number of values per day:
select date(status_changed_date) as status_date, count(*)
from ticket_data
group by status_date;
Now, if you want just from_state = 2, add a where clause to that effect. If you want just the ones on Jan 2, then add where date(status_changed_date) = '2011-01-02'.
Or, if you're looking for the distinct number of tickets per day, change count(*) to count(distinct ticket_num).
Is this what you're asking?
Ok so if you are trying to get a count of records in a certain state at a certain time, I think a stored proc might be necessary.
CREATE PROCEDURE spStatesAtDate
@Date datetime,
@StateId int
AS
BEGIN
SET NOCOUNT ON;
SELECT COUNT(*) as Count
FROM ticket_table t1
WHERE to_state = @StateId AND status_changed_date < @Date
AND status_changed_date = (SELECT MAX(status_changed_date) FROM ticket_table t2 where t2.ticket_num=t1.ticket_num AND status_changed_date < @Date)
END
Then, to call this for the above example, your query would look like
EXEC spStatesAtDate @Date='2011-01-02 12:00:00', @StateId=2
You can use a subquery to select the last modification date before a given point grouped by ticket_num and then select the states at this time.
SELECT
ticket_num,
to_state,
status_changed_date
FROM
tickets
WHERE
(ticket_num, status_changed_date) IN (
SELECT ticket_num, MAX(status_changed_date)
FROM tickets
WHERE status_changed_date < '2012-02-01 01:00:00'
GROUP BY ticket_num
)
It all boils down to a common question: how to get a list of items and their most recent statuses. Given one issue, we can get its most recent status with this query:
select to_state
from ticket_states
where ticket_num = t.ticket_num
order by status_changed_date desc
limit 1
Next, we need to get all applicable distinct issue ids, which is a simple distinct select:
select distinct ticket_num from ticket_states
With these two subqueries we can already start building. For example, the list of issues and their latest statuses as of a specified date would be:
select t.ticket_num
, (select to_state
from ticket_states
where ticket_num = t.ticket_num
and status_changed_date <= '2012-01-01'
order by status_changed_date desc
limit 1) as last_state
from (select distinct ticket_num
from ticket_states) t;
All issues that did not exist yet at the specified time will have last_state set to null.
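For comparison, the same "latest state as of a point in time" idea can be sketched in Python. The names `states_at` and `count_in_state` and the tuple layout are assumptions for illustration; unlike the query above, tickets with no history before the given time are simply absent from the result instead of appearing with a null state.

```python
from datetime import datetime

def states_at(history, as_of):
    """Latest to_state of every ticket at or before as_of.

    history: iterable of (ticket_num, status_changed_date, to_state) (hypothetical layout)
    """
    latest = {}  # ticket_num -> (changed_at, to_state)
    for ticket_num, changed_at, to_state in history:
        if changed_at > as_of:
            continue  # this change had not happened yet
        if ticket_num not in latest or changed_at > latest[ticket_num][0]:
            latest[ticket_num] = (changed_at, to_state)
    return {ticket: state for ticket, (_, state) in latest.items()}

def count_in_state(history, as_of, state):
    """Number of tickets whose latest state at as_of equals state."""
    return sum(1 for s in states_at(history, as_of).values() if s == state)

history = [
    (123456, datetime(2011, 1, 1, 18, 3, 44), 1),
    (123456, datetime(2011, 1, 1, 18, 10, 26), 2),
    (123456, datetime(2011, 1, 2, 7, 55, 44), 3),
    (123456, datetime(2011, 1, 3, 6, 12, 18), 2),
    (654321, datetime(2011, 1, 5, 9, 0, 0), 1),
]
noon_jan_2 = states_at(history, datetime(2011, 1, 2, 12, 0, 0))
```

Calling `count_in_state` once per chart point (one call per date and state) gives the series needed for the chart; ticket 654321 is a made-up second ticket added only to show that later tickets are ignored.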
This probably isn't the best way of doing this, but it is the first that came to mind. I'll leave the rest to you. I should also mention that this is not a very efficient solution.