Continue most recent value over a time range - mysql

I have this existing schema where a "schedule" table looks like this (very simplified).
CREATE TABLE schedule (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(45),
start_date date,
availability int(3),
PRIMARY KEY (id)
);
For each person it specifies a start date and the percentage of work time available to spend on this project. That availability percentage implicitly continues until a newer value is specified.
For example take a project that lasts from 2012-02-27 to 2012-03-02:
id | name | start_date | availability
-------------------------------------
1 | Tom | 2012-02-27 | 100
2 | Tom | 2012-02-29 | 50
3 | Ben | 2012-03-01 | 80
So Tom starts on Feb. 27th, full time, until Feb. 29th, from which point he'll be available for only 50% of his work time.
Ben only starts on March, 1st and only with 80% of his time.
Now the goal is to "normalize" this sparse data, so that there is a result row for each person for each day with the availability coming from the last specified day:
name | start_date | availability
--------------------------------
Tom | 2012-02-27 | 100
Tom | 2012-02-28 | 100
Tom | 2012-02-29 | 50
Tom | 2012-03-01 | 50
Tom | 2012-03-02 | 50
Ben | 2012-02-27 | 0
Ben | 2012-02-28 | 0
Ben | 2012-02-29 | 0
Ben | 2012-03-01 | 80
Ben | 2012-03-02 | 80
Think of a chart showing the availability of each person over time, or calculating the "resource" values in a burndown diagram.
I can easily do this with procedural code in the app layer, but would prefer a nicer, faster solution.

To make this remotely effective, I recommend creating a calendar table: one that contains each and every date of interest. You then use that as a template on which to join your data.
Equally, things improve further if you have a person table to act as the template for the name dimension of your results.
You can then use a correlated sub-query in your join to pick which record in schedule matches the calendar/person template you have created.
SELECT
*
FROM
calendar
CROSS JOIN
person
LEFT JOIN
schedule
ON schedule.name = person.name
AND schedule.start_date = (SELECT MAX(s2.start_date)
FROM schedule AS s2
WHERE s2.name = person.name
AND s2.start_date <= calendar.date)
WHERE
calendar.date >= <yourStartDate>
AND calendar.date <= <yourEndDate>
etc
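As a rough, runnable sketch of the query above, the following uses Python's built-in sqlite3 module as a stand-in for MySQL. The contents of the calendar and person tables are assumptions made up for the example (they are not part of the original schema), and COALESCE is added only so that days before a person's first record show 0, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedule (id INTEGER PRIMARY KEY, name TEXT,
                           start_date TEXT, availability INTEGER);
    INSERT INTO schedule (name, start_date, availability) VALUES
        ('Tom', '2012-02-27', 100),
        ('Tom', '2012-02-29', 50),
        ('Ben', '2012-03-01', 80);

    -- Hypothetical template tables (assumed for this sketch).
    CREATE TABLE person (name TEXT PRIMARY KEY);
    INSERT INTO person VALUES ('Tom'), ('Ben');
    CREATE TABLE calendar (date TEXT PRIMARY KEY);
    INSERT INTO calendar VALUES ('2012-02-27'), ('2012-02-28'),
        ('2012-02-29'), ('2012-03-01'), ('2012-03-02');
""")

# Every (date, person) pair, joined to the latest schedule row on or before that date.
rows = conn.execute("""
    SELECT person.name, calendar.date, COALESCE(schedule.availability, 0)
    FROM calendar
    CROSS JOIN person
    LEFT JOIN schedule
           ON schedule.name = person.name
          AND schedule.start_date = (SELECT MAX(s2.start_date)
                                     FROM schedule AS s2
                                     WHERE s2.name = person.name
                                       AND s2.start_date <= calendar.date)
    WHERE calendar.date BETWEEN ? AND ?
    ORDER BY person.name DESC, calendar.date
""", ("2012-02-27", "2012-03-02")).fetchall()

for row in rows:
    print(row)
```

Dates are stored as ISO-formatted text here, so string comparison orders them correctly; in MySQL the DATE columns would be compared directly.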
Often, however, it is more efficient to deal with it in one of two other ways...
Don't allow gaps in the data in the first place. Have a nightly batch process, or some other business logic, that ensures all relevant data points are populated.
Or deal with it in your client. Return each dimension in your report (date and name) as separate data sets to act as your templates, and then return the data as your final data set. Your client can iterate over the data and fill in the blanks as appropriate. It's more code, but can actually use fewer resources overall than trying to fill the gaps with SQL.
(If your client-side code does this slowly, post another question examining that code. Provided that the data is sorted, this is actually quite quick to do in most languages.)
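The client-side approach could look roughly like the following Python sketch. The function name fill_forward and the tuple layout are assumptions for illustration; it expects the sparse rows to arrive sorted by start date and treats "no record yet" as 0% availability.

```python
from datetime import date, timedelta

def fill_forward(records, names, start, end):
    """Expand sparse (name, start_date, availability) rows to one row per person per day.

    `records` must be sorted by start_date; `names` is the person template.
    """
    # Group the sparse rows per person, preserving date order.
    by_name = {name: [] for name in names}
    for name, day, availability in records:
        if name in by_name:
            by_name[name].append((day, availability))

    result = []
    for name in names:
        changes = by_name[name]
        idx = 0
        current = 0  # assumption: no record yet means 0% availability
        day = start
        while day <= end:
            # Apply every change that takes effect on or before this day.
            while idx < len(changes) and changes[idx][0] <= day:
                current = changes[idx][1]
                idx += 1
            result.append((name, day, current))
            day += timedelta(days=1)
    return result

records = [
    ("Tom", date(2012, 2, 27), 100),
    ("Tom", date(2012, 2, 29), 50),
    ("Ben", date(2012, 3, 1), 80),
]
filled = fill_forward(records, ["Tom", "Ben"], date(2012, 2, 27), date(2012, 3, 2))
for row in filled:
    print(row)
```

Each person's change list is walked once, so the work grows with the number of output rows plus the number of input rows.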

Related

Using an SQL View to dynamically place field data in buckets

I have a complex(?) SQL query I am needing to build. We have an application that captures a simple data set for multiple clients:
ClientID | AttributeName | AttributeValue | TimeReceived
----------------------------------------------------------------
002 | att1 | 123.98 | 23:02:00 02-03-2017
----------------------------------------------------------------
003 | att2 | 987.2 | 23:02:00 02-03-2017
I need to be able to return a single record per client that looks something like this
Attribute | Hour_1 | Hour_2 | Hour_x |
--------------------------------------
att1 | 120.67 | | |
--------------------------------------
att2 | 10 | 89.3 | |
The hours are to be determined by a time provided to the query. If the time was 11:00 on 02-03-2017, then hour 1 would be from 10-11 on 02-03-2017, and hour 2 from 9-10 on 02-03-2017. Attributes will be allocated to these hourly buckets based on the hour/date in their time stamp (not all buckets will have data). There will be a limit on the number of hours allocated in a single query. In summary, there are possibly 200-300 attributes and hourly blocks of up to 172 hours. To be honest, I am not really sure where to start to build a query like this. Any guidance appreciated.
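One possible reading of the bucketing rule described above, sketched in Python rather than SQL: hour k covers the k-th hour before the supplied time. The function name bucket_by_hour, the sample rows, the date interpretation (2 March 2017) and the boundary handling (each bucket includes its start and excludes its end) are all assumptions. Values that land in the same bucket are simply collected in a list, since the question does not say how they should be combined.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def bucket_by_hour(rows, reference, max_hours):
    """Group (attribute, value, time_received) rows into hourly buckets before `reference`.

    Bucket 1 is the hour ending at `reference`, bucket 2 the hour before that, and so on.
    """
    table = defaultdict(dict)
    for attribute, value, received in rows:
        age_in_hours = (reference - received) / timedelta(hours=1)
        # Ages in (0, 1] fall in bucket 1, (1, 2] in bucket 2, etc.
        bucket = math.ceil(age_in_hours)
        if 1 <= bucket <= max_hours:
            table[attribute].setdefault(bucket, []).append(value)
    return table

reference = datetime(2017, 3, 2, 11, 0)
rows = [
    ("att1", 120.67, datetime(2017, 3, 2, 10, 15)),
    ("att2", 10, datetime(2017, 3, 2, 10, 59)),
    ("att2", 89.3, datetime(2017, 3, 2, 9, 30)),
    ("att2", 5, datetime(2017, 2, 1, 0, 0)),  # older than the 172-hour limit, ignored
]
buckets = bucket_by_hour(rows, reference, max_hours=172)
for attribute, hours in buckets.items():
    print(attribute, hours)
```

In SQL the same idea would usually be expressed by computing the bucket number per row and then pivoting with conditional aggregation, one output column per hour.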

Is it possible to search for a year in a date range with MySQL

Given is data which contains a period of time, spanning years. Just like this:
| ID | Name | Alive |
|----|--------------------|-----------------------|
| 1 | Washington, George | 1732-02-22/1799-12-14 |
| 2 | Adams, John | 1735-10-30/1826-07-04 |
| 3 | Jefferson, Thomas | 1743-04-13/1826-07-04 |
…
Is it possible to store this data in MySQL in a way that a search for an intermediate date (over all fields, just a year), like the search term 1788, yields results?
What I am looking for is something like this:
CREATE TABLE t (
id INT NOT NULL,
name VARCHAR(30),
alive DATERANGE
);
SELECT * FROM t WHERE * LIKE '%1788%'
The only solution I see is to add another column which contains a list of years (1732, 1733, …), but I guess there are better solutions. Do I need the date in one field or two, and what’s the column type I need for this to work? Can I have underspecified date ranges in that column (such as 1155/1227), or do I have to rewrite them before insert (like 1155-01-01/1227-12-31)?
Border matches shall be returned as well. A search for the string 1799 should still return George Washington, even though he was not alive from 1st of January until 31st of December inclusively. I guess this is rather simple since it is a string match already.
If you can edit your data, then I suggest changing it to separate Born and Died fields. If not, we can use the LEFT and INSTR functions for Born and the SUBSTRING_INDEX function for Died:
SELECT ID, Name, Alive,
LEFT(Alive, INSTR(Alive, '/') - 1) AS Born,
SUBSTRING_INDEX(Alive, '/', -1) AS Died
FROM t
Which will split out Born and Died dates:
| ID | Name | Alive | Born | Died |
|----|--------------------|-----------------------|------------|------------|
| 1 | Washington, George | 1732-02-22/1799-12-14 | 1732-02-22 | 1799-12-14 |
| 2 | Adams, John | 1735-10-30/1826-07-04 | 1735-10-30 | 1826-07-04 |
| 3 | Jefferson, Thomas | 1743-04-13/1826-07-04 | 1743-04-13 | 1826-07-04 |
Then you can use:
WHERE Alive LIKE '%1788%'
To search dates.
Or individually as Born:
WHERE LEFT(Alive, INSTR(Alive, '/') - 1) LIKE '%1788%'
Died:
WHERE SUBSTRING_INDEX(Alive,'/',-1) LIKE '%1788%'
Or if you just wanted the years in the Born and Died fields use an additional LEFT function:
SELECT ID, Name, Alive,
LEFT(LEFT(Alive, INSTR(Alive, '/') - 1), 4) AS Born,
LEFT(SUBSTRING_INDEX(Alive,'/',-1),4) AS Died
FROM t
Which would give you:
| ID | Name | Alive | Born | Died |
|----|--------------------|-----------------------|------|------|
| 1 | Washington, George | 1732-02-22/1799-12-14 | 1732 | 1799 |
| 2 | Adams, John | 1735-10-30/1826-07-04 | 1735 | 1826 |
| 3 | Jefferson, Thomas | 1743-04-13/1826-07-04 | 1743 | 1826 |
EDIT:
You can use the BETWEEN operator the other way around for that.
SELECT ID, Name, Alive,
LEFT(LEFT(Alive, INSTR(Alive, '/') - 1), 4) AS Born,
LEFT(SUBSTRING_INDEX(Alive, '/', -1), 4) AS Died
FROM t
WHERE 1788 BETWEEN LEFT(LEFT(Alive, INSTR(Alive, '/') - 1), 4) AND LEFT(SUBSTRING_INDEX(Alive, '/', -1), 4)
Do I need the date in one field or two
Definitely two, birth and death, and use the predicate BETWEEN ... AND ... for your searches. It’s less expensive than splitting a field in two at every query, and it makes better use of indexes.
and what’s the column type I need for this to work
That’s trickier. I would normally agree with the comments saying that you must use date fields, for a variety of well-known good reasons. However, it is obvious from your question that you are interested only in years and effectively disregard the actual dates. Furthermore, you are dealing with historic data that might be incomplete: missing days or even months are usual in this context. Such incomplete dates can be stored in date fields but return NULL on some operations, which might create problems. And when you have a date field you cannot create an index on the year, so your queries would all be full table scans. In short, in your particular case, I’d go for SMALLINT UNSIGNED for the years and CHAR(5) to store the less useful month-and-day info, just in case you need it in the future to build a real date on the fly with CAST(CONCAT(year,'-', month_and_day) AS DATE).
In conclusion, this is the design I propose:
CREATE TABLE t (
id INT NOT NULL,
name VARCHAR(30),
birth_year SMALLINT UNSIGNED,
birth_md CHAR(5),
death_year SMALLINT UNSIGNED,
death_md CHAR(5)
);
CREATE INDEX t_ndx ON t(birth_year, death_year);
SELECT * FROM t WHERE 1788 BETWEEN birth_year AND death_year;
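To try the proposed design quickly, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. SQLite's INTEGER and TEXT types replace SMALLINT UNSIGNED and CHAR(5), and the helper name alive_in is an assumption for the example; the sample rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        id INTEGER NOT NULL,
        name TEXT,
        birth_year INTEGER, birth_md TEXT,
        death_year INTEGER, death_md TEXT
    );
    CREATE INDEX t_ndx ON t (birth_year, death_year);
    INSERT INTO t VALUES
        (1, 'Washington, George', 1732, '02-22', 1799, '12-14'),
        (2, 'Adams, John',        1735, '10-30', 1826, '07-04'),
        (3, 'Jefferson, Thomas',  1743, '04-13', 1826, '07-04');
""")

def alive_in(year):
    """Names of everyone whose lifespan includes `year` (border years count)."""
    rows = conn.execute(
        "SELECT name FROM t WHERE ? BETWEEN birth_year AND death_year ORDER BY id",
        (year,),
    )
    return [name for (name,) in rows]

print(alive_in(1788))
print(alive_in(1800))
```

Because BETWEEN is inclusive on both ends, a search for 1799 still returns George Washington, which matches the border-match requirement in the question.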
Like @CBroe suggested, you should have two columns instead (startDate & endDate, or bornDate & deathDate). You can then write your query this way:
select * from t where YEAR(startDate) <= 1788 AND YEAR(endDate) >= 1788

join two records from the same table

(This is in access)
I have a table with the full history of my workers' wages. Each record has a starting date on which the employee started receiving that wage and a reporting date on which we started reporting this wage. Why? Legal stuff. (Sometimes the starting date is older than the date we started reporting it.)
-------------------------------------------------------
|WorkerID|StartingDate|ReportingDate|Salary|
-------------------------------------------------------
| 001 | 01/01/2015 | 01/01/2015 |10,000|
| 001 | 01/01/2016 | 01/02/2016 |15,000|
-------------------------------------------------------
So if I want to write a check for worker 001 on 01/01/2016 it should be $15,000, but I have to report $10,000. So now I need a query that tells me the wage I should pay and the wage I should report, just like this:
------------------------------------------------------------
|WorkerID|StartingDate|ReportingDate|PaySalary|ReportSalary|
------------------------------------------------------------
| 001 | 01/01/2016 | 01/02/2016 | 15,000 | 10,000 |
------------------------------------------------------------
The table is called Wages_History and I don't have a clue how to start the query... Thanks!
If I understand correctly, this is a simple where filter:
select *
from Wages_History
where ReportingDate > StartingDate;
If this is the case, then I would suggest that you devote some effort to learning SQL . . . there are good books, tutorials, and courses available.
EDIT:
If you also want the previous salary:
select wh.*,
(select top 1 wh2.Salary
from Wages_History wh2
where wh2.WorkerID = wh.WorkerID and
wh2.StartingDate < wh.StartingDate
order by wh2.StartingDate desc
) as prev_salary
from Wages_History wh
where wh.ReportingDate > wh.StartingDate;
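For a quick check of the previous-salary idea, here is a sketch using Python's sqlite3 module instead of Access. SQLite has no TOP, so LIMIT 1 is swapped in. Storing the dates as ISO strings and reading 01/02/2016 as 1 February 2016 are assumptions; the output column names PaySalary and ReportSalary follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Wages_History (WorkerID TEXT, StartingDate TEXT,
                                ReportingDate TEXT, Salary INTEGER);
    -- ISO date strings (assumed) so that text comparison orders them correctly.
    INSERT INTO Wages_History VALUES
        ('001', '2015-01-01', '2015-01-01', 10000),
        ('001', '2016-01-01', '2016-02-01', 15000);
""")

# For each wage whose reporting lags its start, look up the worker's previous salary.
rows = conn.execute("""
    SELECT wh.WorkerID, wh.StartingDate, wh.ReportingDate,
           wh.Salary AS PaySalary,
           (SELECT wh2.Salary
              FROM Wages_History AS wh2
             WHERE wh2.WorkerID = wh.WorkerID
               AND wh2.StartingDate < wh.StartingDate
             ORDER BY wh2.StartingDate DESC
             LIMIT 1) AS ReportSalary
      FROM Wages_History AS wh
     WHERE wh.ReportingDate > wh.StartingDate
""").fetchall()

print(rows)
```

If a worker has no earlier wage record, the sub-query yields NULL for ReportSalary, which the calling code would need to handle.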

MySQL Daily Appointment Scheduling Schema

I am looking to create a booking system. On one end you have a client looking to book an appointment and on the other end you have a provider who has a schedule that a client can book against
When signing up as a provider they are allowed to pick their days of work and hours. They have HTML check-boxes representing the days they can select, and once they select a day the hours are displayed (drop-downs in Angular), as you can see below.
(Image: HTML schedule form)
On the MySQL side I am thinking I can have a table which has a column for each day and have a comma separated list in there for the start time, end time, lunch time and lunch length
i.e. Provider selects Monday and Tuesday to work from the hours below
Provider 'Schedule' Table
|ScheduleID|ProviderID|Monday |Tuesday |Wednesday|Thursday|Friday|Saturday|Sunday|
|----------|----------|--------|--------|---------|--------|------|--------|------|
|1 | 2 |09:00am,|10:00am,| | | | | |
| | |08:30pm,|07:00pm,| | | | | |
| | |12:00pm,|01:00pm,| | | | | |
| | |30 min |60 min | | | | | |
|----------|----------|--------|--------|---------|--------|------|--------|------|
The table would have a schedule id and a provider id which links back to the "provider" table to link the provider to his schedule
Or is this better?
|-------------|-------------|----------|-----------|----------|------------|--------------|
| schedule_id | provider_id | week_day |start_time | end_time | lunch_time | lunch_length |
|-------------|-------------|----------|-----------|----------|------------|--------------|
| 1 | 1 | Monday | 06:00 AM | 08:00 PM | 12:30 PM | 60 |
|-------------|-------------|----------|-----------|----------|------------|--------------|
| 2 | 1 | Friday | 06:00 AM | 08:00 PM | 12:30 PM | 60 |
|-------------|-------------|----------|-----------|----------|------------|--------------|
| 3 | 2 | Tuesday | 06:00 AM | 08:00 PM | 12:30 PM | 60 |
|-------------|-------------|----------|-----------|----------|------------|--------------|
If not, please post something that is.
Before I go into how I believe you should structure your Provider 'Schedule' Table, please make sure to, in the future, remove fluff.
It may serve you better to make the following changes:
make all column headers lowercase, as this might prevent errors if you attempt to query your database another way
change scheduleId to id
Instead of having seven columns, one for every day of the week, you could simply put a weekDay column that stores the value of that weekday
Then create columns for startTime, endTime, lunchTime and lunchLength
Finally, create a scheduleId column that ties together all the different weekday rows of someone's schedule to one provider
Some considerations:
Instead of having strings "Monday" or "Sunday" in the weekDay column you could instead insert 0..6, where 0 is a Sunday and 6 is a Saturday to make it more compatible with other languages
You could always just keep scheduleId in this table and create another table with the individual schedule days and link them with a foreign key, but this might prove to cause more problems than it's worth
Keep lunchLength as just an integer, as that will make everything easier
The reasoning behind splitting up the data as much as possible is because if you are querying using another language you might need to go through all the extra work of splitting those Monday and Tuesday columns if you just want the startTime for instance.
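A minimal sketch of the one-row-per-weekday layout, using Python's sqlite3 module as a stand-in for MySQL. The table name provider_schedule, the snake_case column names, the 24-hour 'HH:MM' text format and the helper start_time are assumptions for the example; the two rows mirror the Monday and Tuesday values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE provider_schedule (
        id           INTEGER PRIMARY KEY,
        provider_id  INTEGER NOT NULL,
        week_day     INTEGER NOT NULL CHECK (week_day BETWEEN 0 AND 6),  -- 0 = Sunday
        start_time   TEXT NOT NULL,   -- 24-hour 'HH:MM' (assumed format)
        end_time     TEXT NOT NULL,
        lunch_time   TEXT,
        lunch_length INTEGER,         -- minutes
        UNIQUE (provider_id, week_day)
    );
    INSERT INTO provider_schedule
        (provider_id, week_day, start_time, end_time, lunch_time, lunch_length)
    VALUES
        (2, 1, '09:00', '20:30', '12:00', 30),
        (2, 2, '10:00', '19:00', '13:00', 60);
""")

def start_time(provider_id, week_day):
    """Start time for one provider on one weekday, or None if they do not work that day."""
    row = conn.execute(
        "SELECT start_time FROM provider_schedule WHERE provider_id = ? AND week_day = ?",
        (provider_id, week_day),
    ).fetchone()
    return row[0] if row else None

print(start_time(2, 1))
print(start_time(2, 3))
```

With this layout, getting a single value such as the Monday start time is a plain lookup, with no string splitting in the application code.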
Hopefully the above is either a solution or allows you to consider another approach.
Here is a Java Android Library that you can convert into JavaScript: https://bitbucket.org/warwick/schedule_utils_demo/src/master/
Running the code in a client side language will save your server the burden as scheduling code is very inefficient.
Hope this helps.

MySQL query to compare content of one column with title of other column

I know the title makes no sense at first glance. But here's the situation: the DB table is named 'teams'. In it, there are a bunch of columns for positions in a soccer team (gk1, def1, def2, ... , st2). Each column is type VARCHAR and contains a player's name. There is also a column named 'captain'. The content of that column (not the most fortunate solution) is not the name of the captain, but rather the position.
So if the content of 'st1' is Zlatan Ibrahimovic and he's the captain, then the content of 'captain' is the string 'st1', and not the string 'Zlatan Ibrahimovic'.
Now, I need to write a query which gets a row from the 'teams' table, but only if the captain is Zlatan Ibrahimovic. Note that at this point I don't know if he plays st1, st2 or some other position. So I need to use just the name in the query, to check if the position he plays in is set as captain. Logically, it would look like:
if(Zlatan is captain)
get row content
In MySQL, the if condition would actually be the 'where' clause. But is there a way to write it?
$query="select * from teams where ???";
The "Teams" table structure is:
-----------------------------------------------------------------
| gk1 | def1 | def2 | ... | st2 | captain |
-----------------------------------------------------------------
| player1 | player2 | player3 | ... | playerN | captainPosition |
-----------------------------------------------------------------
With all fields being of VARCHAR type.
Because the content of the captain column is the position and not the name, and you want to choose based on the position, this is trivial.
$query="select * from teams where captain='st1'";
Revised following question edit:
Your database design doesn't allow this to be done very efficiently. You are looking at a query like
SELECT * FROM teams WHERE
(gk1='Zlatan' AND captain='gk1') OR
(def1='Zlatan' AND captain='def1') OR
...
The design mandates this sort of query for many functions: how can you find the team that a particular player plays for without searching every position? [Actually, you could do that by finding the name in a concatenation of all the positions, but it's still not very efficient or flexible.]
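Since the OR chain needs one term per position column, it is easier to generate than to type. A rough Python sketch, using sqlite3 as a stand-in for MySQL: the POSITIONS list is a shortened, assumed subset of the real columns, and the placeholder player names and the helper teams_captained_by are made up for the example.

```python
import sqlite3

# Hypothetical subset of the position columns; the real table has more.
POSITIONS = ["gk1", "def1", "def2", "st1", "st2"]

conn = sqlite3.connect(":memory:")
columns = ", ".join(f"{p} TEXT" for p in POSITIONS)
conn.execute(f"CREATE TABLE teams ({columns}, captain TEXT)")
conn.execute("INSERT INTO teams VALUES (?, ?, ?, ?, ?, ?)",
             ("Player A", "Player B", "Player C", "Zlatan Ibrahimovic", "Player D", "st1"))
conn.execute("INSERT INTO teams VALUES (?, ?, ?, ?, ?, ?)",
             ("Player E", "Player F", "Player G", "Player H", "Zlatan Ibrahimovic", "gk1"))

def teams_captained_by(player):
    # One "(column = ? AND captain = 'column')" term per position. Column names come
    # from the fixed POSITIONS list, never from user input; the player name is bound
    # as a parameter.
    where = " OR ".join(f"({p} = ? AND captain = '{p}')" for p in POSITIONS)
    sql = f"SELECT * FROM teams WHERE {where}"
    return conn.execute(sql, [player] * len(POSITIONS)).fetchall()

print(teams_captained_by("Zlatan Ibrahimovic"))
```

Only the first team is returned: in the second team the player is on the pitch at st2, but the captain column points at gk1.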
A better solution would be to normalise your data so you had a single table showing which player was playing where:
Situation
Team | Player | Posn | Capt
-----+--------+------+------
1 | 12 | 1 | 0
1 | 11 | 2 | 1
1 | 13 | 10 | 0
...with other tables which allow you to identify the Team, Player and Position referenced here. There would need to be some referential checks to ensure that each team had only one captain, and only plays one goalkeeper, etc.
You could then easily see that the captain of Team 1 is Player 11 who plays in position 2; or find the team (if any) for which player 11 is captain.
SELECT Teams.Name
FROM Teams, Situation, Players
WHERE Situation.Team = Teams.id
AND Situation.Capt = 1
AND Situation.Player = Players.id
AND Players.Name = 'Zlatan';
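A small runnable sketch of the normalised layout, again using Python's sqlite3 module as a stand-in for MySQL. The Teams and Players table contents (including the team name 'Example FC' and the other player names) and the helper captained_by are assumptions for the example; the Situation rows are the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teams   (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Players (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Situation (Team INTEGER, Player INTEGER, Posn INTEGER, Capt INTEGER);

    -- Hypothetical sample data.
    INSERT INTO Teams VALUES (1, 'Example FC');
    INSERT INTO Players VALUES (11, 'Zlatan'), (12, 'Player B'), (13, 'Player C');
    INSERT INTO Situation VALUES (1, 12, 1, 0), (1, 11, 2, 1), (1, 13, 10, 0);
""")

def captained_by(player_name):
    """Names of the teams for which the given player is captain."""
    rows = conn.execute("""
        SELECT Teams.Name
          FROM Situation
          JOIN Teams   ON Teams.id = Situation.Team
          JOIN Players ON Players.id = Situation.Player
         WHERE Situation.Capt = 1
           AND Players.Name = ?
    """, (player_name,))
    return [name for (name,) in rows]

print(captained_by("Zlatan"))
print(captained_by("Player B"))
```

The query no longer depends on how many positions exist, so adding a position means adding rows, not columns or OR terms.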
A refinement on that idea might be
Situation
Team | Player | Posn | Capt | Playing
-----+--------+------+------+--------
1 | 12 | 1 | 0 | 1
1 | 11 | 2 | 1 | 1
1 | 13 | 10 | 0 | 0
1 | 78 | 1 | 0 | 0
...so that you could have two players who are goalkeepers (for example) but only field one of them.
Redesigning the database may be a lot of work; but it's nowhere near as complicated or troublesome as using your existing design. And you will find that the performance is better if you don't need to use inefficient queries.
From what you have described, you just need to combine the two conditions and check whether the query returns one record. If it returns no records, he is not the captain:
SELECT *
FROM Teams
WHERE name = 'Zlatan Ibrahimovic' AND position = 'st1';