MySQL Week Function Unexpected Results - mysql

I am querying a database of hour entries and summing up by company and by week. I understand that MySQL's week function is based on a calendar week. That being said, I'm getting some unexpected grouping results. Perhaps you sharp-eyed folks can lend a hand:
SELECT * FROM (
SELECT
tms.date,
SUM( IF( tms.skf_group = "HP Group", tms.hours, 0000.00 )) as HPHours,
SUM( IF( tms.skf_group = "SKF Canada", tms.hours, 000.00 )) as SKFHours
FROM time_management_system tms
WHERE date >= "2012-01-01"
AND date <= "2012-05-11"
AND tms.skf_group IN ( "HP Group", "SKF Canada" )
GROUP BY WEEK( tms.date, 7 )
# ORDER BY tms.date DESC
# LIMIT 7
) AS T1
ORDER BY date ASC
My results are as follows: (Occasionally we don't have entries on a Sunday for example. Do null values matter?)
('date'=>'2012-01-01','HPHours'=>'0.00','SKFHours'=>'2.50'),
('date'=>'2012-01-02','HPHours'=>'97.00','SKFHours'=>'78.75'),
('date'=>'2012-01-09','HPHours'=>'86.50','SKFHours'=>'100.00'),
('date'=>'2012-01-16','HPHours'=>'68.00','SKFHours'=>'96.25'),
('date'=>'2012-01-24','HPHours'=>'39.00','SKFHours'=>'99.50'),
('date'=>'2012-02-05','HPHours'=>'3.00','SKFHours'=>'93.00'),
('date'=>'2012-02-06','HPHours'=>'12.00','SKFHours'=>'122.50'),
('date'=>'2012-02-13','HPHours'=>'64.75','SKFHours'=>'117.50'),
('date'=>'2012-02-21','HPHours'=>'64.50','SKFHours'=>'93.00'),
('date'=>'2012-03-02','HPHours'=>'45.50','SKFHours'=>'143.25'),
('date'=>'2012-03-05','HPHours'=>'62.00','SKFHours'=>'136.75'),
('date'=>'2012-03-12','HPHours'=>'54.25','SKFHours'=>'133.00'),
('date'=>'2012-03-19','HPHours'=>'77.75','SKFHours'=>'130.75'),
('date'=>'2012-03-26','HPHours'=>'61.00','SKFHours'=>'147.00'),
('date'=>'2012-04-02','HPHours'=>'86.75','SKFHours'=>'96.75'),
('date'=>'2012-04-09','HPHours'=>'84.25','SKFHours'=>'120.50'),
('date'=>'2012-04-16','HPHours'=>'90.00','SKFHours'=>'127.25'),
('date'=>'2012-04-23','HPHours'=>'103.25','SKFHours'=>'89.50'),
('date'=>'2012-05-02','HPHours'=>'72.50','SKFHours'=>'143.75'),
('date'=>'2012-05-07','HPHours'=>'68.25','SKFHours'=>'119.00')
January 2nd is the first Monday, so the Jan 1st group covers only one day. I would expect the output to show consecutive Mondays (Jan 2, 9, 16, 23, 30, etc.), but dates such as 2012-01-24 and 2012-02-05 show up instead, and these unexpected week groupings continue throughout the results. Any ideas?
Thanks very much!

It's not clear what selecting tms.date even means when you're grouping by some function on tms.date. My guess is that it means "the date value from any source row corresponding to this group". At that point, the output is entirely reasonable.
Given that each group can contain up to seven dates, which date do you want in the results?
EDIT: This behaviour is actually documented in "GROUP BY and HAVING with Hidden Columns":
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
...
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
The tms.date column isn't part of the GROUP BY clause - only a function operating on tms.date is part of the GROUP BY clause, so I believe the text above applies to the way that you're selecting tms.date: you're getting any date within that week.
If you want the earliest date, you might try
SELECT MIN(tms.date), ...
MIN() works with date values as well as numbers, so this returns the earliest date that actually has entries in each week. If a week has no entries on its Monday, that will be a later day.
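A small Python model makes the difference concrete. It does not run in MySQL, and the hour entries are hypothetical. It groups rows by the Monday that starts each week and reports both that Monday and the earliest date in the group. The earliest date is what MIN(tms.date) would return. In MySQL, that Monday could be computed as DATE_SUB(tms.date, INTERVAL WEEKDAY(tms.date) DAY).

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical hour entries: (entry date, skf_group, hours).
# There is deliberately no entry on Monday 2012-01-23.
entries = [
    (date(2012, 1, 1), "SKF Canada", 2.50),
    (date(2012, 1, 2), "HP Group", 8.00),
    (date(2012, 1, 4), "SKF Canada", 7.50),
    (date(2012, 1, 24), "HP Group", 6.00),
    (date(2012, 1, 26), "SKF Canada", 4.25),
]


def week_start(d: date) -> date:
    """Monday of d's week, like DATE_SUB(d, INTERVAL WEEKDAY(d) DAY)."""
    return d - timedelta(days=d.weekday())  # weekday(): Monday=0 ... Sunday=6


groups = defaultdict(list)
for d, grp, hours in entries:
    groups[week_start(d)].append((d, grp, hours))

summary = []
for monday in sorted(groups):
    rows = groups[monday]
    earliest = min(d for d, _, _ in rows)  # what MIN(tms.date) would report
    hp = sum(h for _, g, h in rows if g == "HP Group")
    skf = sum(h for _, g, h in rows if g == "SKF Canada")
    summary.append((monday, earliest, hp, skf))

for row in summary:
    print(row)
```

Monday 2012-01-23 has no entry, so the earliest date for that week is Tuesday the 24th. The computed Monday stays the same no matter which days have entries. Sunday 2012-01-01 belongs to the week that starts on 2011-12-26, so that partial week is labeled with a December date.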

The question is not clear to me, but I guess you don't want to group by week, because WEEK() gives the week of the year, which is the 19th week today.
I think you want to group by weekday, like GROUP BY WEEKDAY(tms.date).

Related

COUNT() domain names in emails based on the current month returning all records

I have the following query:
SELECT right(accounts.username, length(accounts.username)-
INSTR(accounts.username, '@')) domain,
COUNT(*) email_count
FROM tickets
LEFT JOIN accounts ON tickets.user = accounts.ID
WHERE (tickets.timestamp >= UNIX_TIMESTAMP(MONTH(CURRENT_DATE())))
GROUP BY domain
ORDER BY email_count DESC
I LEFT JOIN the ticket table to the accounts table to get the email address (username) of the user behind each ticket.
I am trying to count how many tickets appear with each email domain for the current month. The problem is that the query ignores the month and returns all matching records.
For instance
yahoo.com 3,356
gmail.com 1,345
I get these same numbers when I search all records, but they should be much lower for just the current month. I am using UNIX timestamps for this.
Can anyone help me?
Consider the UNIX_TIMESTAMP(MONTH(CURRENT_DATE())) expression:
MONTH(CURRENT_DATE()) => 1 (in January)
UNIX_TIMESTAMP(1) => this should result either in an error (1292, incorrect datetime value) or in a warning with the same message and 0 as the result, depending on whether strict SQL mode is enabled.
Since the query returns all records, strict SQL mode must be turned off. The condition then effectively becomes tickets.timestamp >= 0, which every row satisfies. Running with strict mode off can cause silent issues like this; a straight error message would have been easier to diagnose.
If you want to return records from the current month, you can use the following expression. It uses the YEAR() and MONTH() functions to get the current year and month and concatenates them with 1 to build the first day of the month:
tickets.timestamp >= UNIX_TIMESTAMP(CONCAT(YEAR(CURRENT_DATE()),'-',MONTH(CURRENT_DATE()),'-','1'))
WHERE tickets.timestamp >= UNIX_TIMESTAMP(MONTH(CURRENT_DATE()))
This expression probably does not do what you think. MONTH() returns the number of the month (1 to 12), while you want the beginning of the current month.
You can use the following expression to compute the beginning of the month:
date_format(current_date(), '%Y-%m-01')
In your condition:
where tickets.timestamp >= unix_timestamp(date_format(current_date(), '%Y-%m-01'))
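Here is the same idea as a Python sketch, which contrasts the broken cutoff of 0 with a cutoff at midnight on the 1st of the month. The sketch makes a few assumptions. The ticket timestamps are hypothetical UNIX timestamps. "Now" is fixed at a date in January 2014 so the result is reproducible. MySQL's UNIX_TIMESTAMP() interprets a date in the session time zone, but this sketch uses UTC throughout.

```python
from datetime import datetime, timezone


def month_start_ts(now: datetime) -> int:
    """UNIX timestamp of 00:00 on the 1st of now's month (UTC in this sketch)."""
    first = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int(first.timestamp())


def tickets_this_month(timestamps, now):
    cutoff = month_start_ts(now)
    return [ts for ts in timestamps if ts >= cutoff]


now = datetime(2014, 1, 15, 9, 30, tzinfo=timezone.utc)
tickets = [
    int(datetime(2013, 12, 31, 23, 59, tzinfo=timezone.utc).timestamp()),
    int(datetime(2014, 1, 1, 0, 0, tzinfo=timezone.utc).timestamp()),
    int(datetime(2014, 1, 14, 12, 0, tzinfo=timezone.utc).timestamp()),
]

buggy = [ts for ts in tickets if ts >= 0]  # cutoff of 0, as with UNIX_TIMESTAMP(1)
fixed = tickets_this_month(tickets, now)
print(len(buggy), len(fixed))  # 3 2
```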
Modified for only the current month. tickets.timestamp holds UNIX timestamps, so convert it with FROM_UNIXTIME() before taking YEAR() and MONTH():
SELECT
RIGHT(accounts.username, LENGTH(accounts.username)-INSTR(accounts.username, '@')) AS domain, COUNT(1) AS email_count
FROM tickets
LEFT JOIN accounts ON tickets.user = accounts.ID
WHERE
YEAR(FROM_UNIXTIME(tickets.timestamp)) = YEAR(NOW())
AND MONTH(FROM_UNIXTIME(tickets.timestamp)) = MONTH(NOW())
GROUP BY domain
ORDER BY email_count DESC
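The Python sketch below models what this query computes, using hypothetical ticket data and assuming ASCII usernames. The domain is everything after the first @, which is what the RIGHT/INSTR expression extracts. A ticket is kept only if its UNIX timestamp falls in the same year and month as "now". The sketch interprets timestamps in UTC, whereas MySQL uses the session time zone. The counts come back largest first.

```python
from collections import Counter
from datetime import datetime, timezone


def domain_of(username: str) -> str:
    # Like RIGHT(u, LENGTH(u) - INSTR(u, '@')) for ASCII text: everything after the
    # first '@'; with no '@', INSTR() is 0 and the whole string is returned.
    return username[username.find("@") + 1:]


def domain_counts(tickets, now):
    """tickets: (unix_ts, username) pairs. Counts this month's tickets per domain."""
    counts = Counter()
    for ts, username in tickets:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        # YEAR(...) = YEAR(NOW()) AND MONTH(...) = MONTH(NOW())
        if (t.year, t.month) == (now.year, now.month):
            counts[domain_of(username)] += 1
    return counts.most_common()  # ORDER BY email_count DESC


now = datetime(2014, 1, 15, tzinfo=timezone.utc)
tickets = [
    (1388534400, "ann@yahoo.com"),  # 2014-01-01 00:00 UTC
    (1388620800, "bob@yahoo.com"),  # 2014-01-02
    (1388707200, "cy@gmail.com"),   # 2014-01-03
    (1356998400, "dee@gmail.com"),  # 2013-01-01: same month number, wrong year
]
print(domain_counts(tickets, now))  # [('yahoo.com', 2), ('gmail.com', 1)]
```

The 2013 ticket shows why the year check is needed alongside the month check.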

SQL query to select values grouped by hour(col) and weekday(row) based on the timestamp

I have searched SO for this question and found slightly similar posts but was unable to adapt to my needs.
I have a database with server requests since forever, each one with a timestamp, and I'm trying to come up with a query that allows me to create a heat-matrix chart (CCC HeatGrid).
The sql query result must represent the server load grouped by each hour of each weekday.
Like this: Example table
I just need the SQL query; I know how to create the chart.
Thank you,
Those look like "counts" of rows.
One of the issues is "sparse" data, we can address that later.
To get the day of the week ('Sunday','Monday',etc.) returned, you can use the DATE_FORMAT function. To get those ordered, we need to include an integer value 0 through 6, or 1 through 7. We can use an ORDER BY clause on that expression to get the rows returned in the order we want.
To get the "hour" across the top, we can use expressions in the SELECT list that conditionally increment the count.
Assuming your timestamp column is named ts, and assuming you want to pull all rows from the year 2014, we start with something like this:
SELECT DAYOFWEEK(t.ts)
, DATE_FORMAT(t.ts,'%W')
FROM mytable t
WHERE t.ts >= '2014-01-01'
AND t.ts < '2015-01-01'
GROUP BY DAYOFWEEK(t.ts)
ORDER BY DAYOFWEEK(t.ts)
(WEEKDAY and DAYOFWEEK are very similar, but we want the one that returns the lowest value for Sunday and the highest value for Saturday. According to the MySQL documentation, that is DAYOFWEEK, which returns 1 for Sunday through 7 for Saturday. WEEKDAY returns 0 for Monday through 6 for Sunday.)
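To keep the two numbering schemes straight, here is a Python sketch rather than MySQL. The helper names are hypothetical; each one just reproduces the documented return range of the corresponding function.

```python
from datetime import date


def mysql_dayofweek(d: date) -> int:
    """DAYOFWEEK() numbering: 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    return d.isoweekday() % 7 + 1  # isoweekday(): Monday = 1 ... Sunday = 7


def mysql_weekday(d: date) -> int:
    """WEEKDAY() numbering: 0 = Monday, ..., 6 = Sunday (same as date.weekday())."""
    return d.weekday()


# 2014-01-05 was a Sunday, 2014-01-06 a Monday, 2014-01-11 a Saturday
for d in (date(2014, 1, 5), date(2014, 1, 6), date(2014, 1, 11)):
    print(d, mysql_dayofweek(d), mysql_weekday(d))
```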
The "trick" now is the columns across the top.
We can extract the "hour" from timestamp using the DATE_FORMAT() function, the HOUR() function, or an EXTRACT() function... take your pick.
The expressions we want return 1 if the timestamp is in the specified hour, and 0 otherwise. Then we can use a SUM() aggregate to add up the 1s. In MySQL, a boolean expression returns 1 for TRUE and 0 for FALSE.
, SUM( HOUR(t.ts)=0 ) AS `h0`
, SUM( HOUR(t.ts)=1 ) AS `h1`
, SUM( HOUR(t.ts)=2 ) AS `h2`
, '...'
, SUM( HOUR(t.ts)=22 ) AS `h22`
, SUM( HOUR(t.ts)=23 ) AS `h23`
A boolean expression can also evaluate to NULL, but since we have a predicate (i.e. condition in the WHERE clause) that ensures us that ts can't be NULL, that won't be an issue.
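To sanity-check the pivot logic, here is a Python sketch, not the SQL itself, over a few hypothetical request timestamps. Like SUM(HOUR(t.ts)=h), it adds up booleans, where True counts as 1. It only produces a row for days that actually occur in the data.

```python
from datetime import datetime

# Hypothetical request timestamps; 2014-01-05 is a Sunday, 2014-01-07 a Tuesday
requests = [
    datetime(2014, 1, 5, 9, 15),
    datetime(2014, 1, 5, 9, 40),
    datetime(2014, 1, 5, 23, 5),
    datetime(2014, 1, 7, 0, 30),
]


def dayofweek(ts: datetime) -> int:
    """Same numbering as MySQL DAYOFWEEK(): 1 = Sunday ... 7 = Saturday."""
    return ts.isoweekday() % 7 + 1


def hour_pivot(timestamps):
    """One row per day that occurs: [dayofweek, day name, h0, ..., h23]."""
    rows = []
    for dow in sorted({dayofweek(ts) for ts in timestamps}):  # GROUP/ORDER BY DAYOFWEEK
        day_ts = [ts for ts in timestamps if dayofweek(ts) == dow]
        name = day_ts[0].strftime("%A")  # full day name, like DATE_FORMAT(ts, '%W')
        # Like SUM(HOUR(t.ts)=h): each True adds 1, each False adds 0
        counts = [sum(ts.hour == h for ts in day_ts) for h in range(24)]
        rows.append([dow, name] + counts)
    return rows


grid = hour_pivot(requests)
for row in grid:
    print(row)
```

Monday has no requests in this sample, so it gets no row. That is exactly the sparse-data case.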
The other issue we can encounter (as I mentioned earlier) is "sparse" data. To illustrate, consider what happens with our query if there are no rows with a ts value on a Monday: we don't get a row for Monday in the result set at all. If a row does go "missing" for Monday (or any day of the week), we know that all of the hourly counts for that missing row would be zero.
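One way to handle the sparse rows after the query returns is to start from all seven DAYOFWEEK values and substitute an all-zero row for any missing day. The minimal Python sketch below does this. It assumes each result row has the hypothetical shape [dayofweek, day name, h0 ... h23], which mirrors the SELECT list described above.

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def fill_missing_days(pivot_rows):
    """Return exactly seven rows, DAYOFWEEK 1..7, using all-zero counts for absent days."""
    by_dow = {row[0]: row for row in pivot_rows}
    return [
        by_dow.get(dow, [dow, DAY_NAMES[dow - 1]] + [0] * 24)
        for dow in range(1, 8)
    ]


# Hypothetical query result with only Sunday (1) and Tuesday (3) present
result = [
    [1, "Sunday"] + [0] * 9 + [2] + [0] * 13 + [1],
    [3, "Tuesday", 1] + [0] * 23,
]
full = fill_missing_days(result)
for row in full:
    print(row[:2], sum(row[2:]))
```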

MySQL Group By Order and Count(Distinct)

What is the best way to think about the Group By function in MySQL?
I am writing a MySQL query to pull data through an ODBC connection into a pivot table in Excel so that users can easily access the data.
For example, I have:
Select
statistic_date,
week(statistic_date,4),
year(statistic_date),
Emp_ID,
count(distinct Emp_ID),
Site,
Cost_Center
I'm trying to count the number of unique employees we have by site by week. The problem I'm running into is that around year end the calendar years don't always match up, so it is important to have the rows by date so that I can manually filter down to the correct dates using a pivot table (2013/2014 had a week where we had to add week 53 + week 1).
I'm experimenting by using different group by statements but I'm not sure how the order matters and what changes when I switch them around.
i.e.
Group by week(statistic_date,4), Site, Cost_Center, Emp_ID
vs
Group by Site, Cost_Center, week(statistic_date,4), Emp_ID
Other things to note:
-Employees can work any number of days. Some are working 4 x 10's, others 5 x 8's with possibly a 6th day if they sign up for OT. If I sum the counts by week, I get anywhere between 3-7 per Emp_ID. I'm hoping to get 1 for the week.
-There are different pay codes per employee, so the distinct count helps when we are looking by day (VTO = Voluntary Time Off, OT = Over Time, LOA = Leave of Absence, etc.). The distinct count shows me 1, whereas otherwise I will often have 2-3 rows for the same employee on the same day (they hit 40 hours and start accruing OT, then take VTO or use personal time on the same day).
I'm starting with a query I wrote to understand our paid hours by week. I'm trying to adapt it for this application. Actual code is below:
SELECT
dkh.STATISTIC_DATE AS 'Date'
,week(dkh.STATISTIC_DATE,4) as 'Week'
,month(dkh.STATISTIC_DATE) as 'Month'
,year(dkh.STATISTIC_DATE) as 'Year'
,dkh.SITE AS 'Site ID Short'
,aep.LOC_DESCR as 'Site Name'
,dkh.EMPLOYEE_ID AS 'Employee ID'
,count(distinct dkh.EMPLOYEE_ID) AS 'Distinct Employee ID'
,aep.NAME AS 'Employee Name'
,aep.BUSINESS_TITLE AS 'Business_Title'
,aep.SPRVSR_NAME AS 'Manager'
,SUBSTR(aep.DEPTID,1,4) AS 'Cost_Center'
,dkh.PAY_CODE
,dkh.PAY_CODE_SHORT
,dkh.HOURS
FROM metrics.DAT_KRONOS_HOURS dkh
JOIN metrics.EMPLOYEES_PUBLIC aep
ON aep.SNAPSHOT_DATE = SUBDATE(dkh.STATISTIC_DATE, DAYOFWEEK(dkh.STATISTIC_DATE) + 1)
AND aep.EMPLID = dkh.EMPLOYEE_ID
WHERE dkh.STATISTIC_DATE BETWEEN adddate(now(), interval -1 year) AND DATE(now())
group by dkh.SITE, SUBSTR(aep.DEPTID,1,4), week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE, dkh.EMPLOYEE_ID
The order you use in group by doesn't matter. Each unique combination of the values gets a group of its own. Selecting columns you don't group by gives you somewhat arbitrary results; you'd probably want to use some aggregation function on them, such as SUM to get the group total.
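The order of the GROUP BY terms only changes how a group is labeled, not which rows end up together. The Python sketch below illustrates this with hypothetical punch rows. It computes the equivalent of COUNT(DISTINCT Emp_ID) per site, cost center, and week, first with the key written in two different orders and then with Emp_ID added to the key.

```python
from collections import defaultdict

# Hypothetical punch rows: (site, cost_center, week, emp_id, pay_code)
rows = [
    ("SEA1", "1234", 10, "E1", "REG"),
    ("SEA1", "1234", 10, "E1", "OT"),
    ("SEA1", "1234", 10, "E2", "REG"),
    ("SEA1", "1234", 11, "E1", "VTO"),
    ("PDX2", "5678", 10, "E3", "REG"),
]


def distinct_employees(rows, key):
    """COUNT(DISTINCT emp_id) per group; key(row) plays the role of the GROUP BY list."""
    groups = defaultdict(set)
    for row in rows:
        groups[key(row)].add(row[3])
    return {k: len(v) for k, v in groups.items()}


# Same columns, different order: (week, site, cost_center) vs (site, cost_center, week)
week_first = distinct_employees(rows, lambda r: (r[2], r[0], r[1]))
site_first = distinct_employees(rows, lambda r: (r[0], r[1], r[2]))
# Relabel week_first keys into site-first order to compare the groups themselves
relabeled = {(s, c, w): n for (w, s, c), n in week_first.items()}

# Adding emp_id to the key puts each employee in a group of their own
with_emp = distinct_employees(rows, lambda r: (r[0], r[1], r[2], r[3]))

print(site_first)
```

With Emp_ID in the key, every group holds exactly one employee, so each distinct count is 1. Leaving it out gives the number of unique employees per site per week.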
Grouping by values you derive from other values that you already use in group by, like below, isn't very useful.
week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE
If two rows have different weeks, they'll also have different dates, right?
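The Python sketch below makes the same point. The week number is Python's ISO week, used only as a stand-in for week(STATISTIC_DATE, 4); any value derived from the date behaves the same way. Adding it to the grouping key next to the date neither splits nor merges any groups.

```python
from datetime import date, timedelta

# Hypothetical data: 30 consecutive dates, each appearing twice (e.g. two pay codes)
dates = [date(2014, 1, 1) + timedelta(days=i) for i in range(30)] * 2

# GROUP BY date  vs  GROUP BY week, date
by_date = {d for d in dates}
by_week_and_date = {(d.isocalendar()[1], d) for d in dates}

print(len(by_date), len(by_week_and_date))  # 30 30
```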

SQL Statement Database

I have a MySQL table that holds the dates that are booked (for certain holiday properties).
Example...
Table "listing_availability"
Columns...
availability_date (this shows the date format 2013-04-20 etc)
availability_bookable (This can be yes/no. "Yes" = the booking changeover day and it is "available". "No" means the property is booked for those dates)
All the other dates in the year (apart from the ones with "No") are available to be booked. These dates are not in the database, only the booked dates.
My question is...
I have to write an SQL statement that first calls the Get Date function (not sure if this is the correct terminology)
Then removes the dates from "availability_date" WHERE "availability_bookable" = "No"
This will give me the dates that are available for bookings, for the year, for a property.
Can anyone help?
Regards M
Seems like you've almost written the query.
SELECT availability_date FROM listing_availability
WHERE availability_bookable <> 'NO'
AND availability_date >= CURDATE()
AND YEAR(CURDATE()) = YEAR(availability_date)
I think I understand, and you'll obviously confirm. Your listing_availability table has some records in it, but not one for every single day of the year. It only has rows for days that may have had something booked, and not all of those are committed: some could have Yes, some No.
So you want to simulate all dates within a given date range, say April 1 to July 1, because someone is looking to book a party within that time period. Since you don't pre-fill your production table, you can't show that April 27th is open and available, because no such record exists.
To SIMULATE a calendar of days for a date range, you can do it using MySQL variables and join to "any" table in your database provided it has enough records to SIMULATE the date range you want...
select
@myDate := DATE_ADD( @myDate, INTERVAL 1 DAY ) as DatesForAvailabilityCheck
from
( select @myDate := '2013-03-31' ) as SQLVars,
AnyTableThatHasEnoughRows
limit
120;
This will just give you a list of dates starting with April 1, 2013. The initial @myDate is one day before the start date because the select list adds 1 day to it before returning it. It then keeps incrementing for a limit of 120 days (or whatever range you are looking for: 30 days, 60, 90, 22, whatever). The "AnyTableThatHasEnoughRows" could actually be your listing_availability table, but we are just using it as a source of rows, with no join or where condition. It only needs enough records to produce the 120 rows.
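The same calendar can be expressed as a Python sketch, not MySQL. The SQL version increments @myDate once per row of the helper table, while this one simply adds i days to the start date for i = 0 .. n-1.

```python
from datetime import date, timedelta


def calendar_days(start: date, n_days: int):
    """n_days consecutive dates beginning at start (inclusive)."""
    return [start + timedelta(days=i) for i in range(n_days)]


days = calendar_days(date(2013, 4, 1), 120)
print(days[0], days[-1], len(days))  # 2013-04-01 2013-07-29 120
```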
Now, we can use this to join to whatever table you want and apply your condition. You just created a full calendar of days to compare against. Your final query may be different, but this should get it most of the way for you.
select
JustDates.DatesForAvailabilityCheck
from
( select
@myDate := DATE_ADD( @myDate, INTERVAL 1 DAY ) as DatesForAvailabilityCheck
from
( select @myDate := '2013-03-31' ) as SQLVars,
listing_availability
limit
120 ) JustDates
LEFT JOIN listing_availability la
on JustDates.DatesForAvailabilityCheck = la.availability_date
where
la.availability_date IS NULL
OR la.availability_bookable = 'Yes'
So the query above builds the sample calendar and checks each date against the availability table. If no matching date exists (the IS NULL test), you want that date, because there is no conflict. If there IS a record in the table, you only want it when availability_bookable is "Yes". Such an entry is not committed, so you CAN book that date and it belongs in your result of available dates.
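The filter in that WHERE clause can be modeled in Python as follows. A hypothetical dict stands in for listing_availability. A calendar day counts as available if it has no row at all (the IS NULL branch) or if its row says "Yes".

```python
from datetime import date, timedelta

# Hypothetical stand-in for listing_availability: availability_date -> availability_bookable
listing_availability = {
    date(2013, 4, 5): "No",
    date(2013, 4, 6): "No",
    date(2013, 4, 7): "Yes",  # changeover day, still bookable
}


def available_dates(start: date, n_days: int, booked: dict):
    """Calendar days with no row (the IS NULL branch) or a row marked 'Yes'."""
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if booked.get(d) in (None, "Yes")]


free = available_dates(date(2013, 4, 1), 10, listing_availability)
print(free)
```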

Mysql summary query with date range, multiple tables

I'm running an SQL query that returns results between the dates I selected (2012-07-01 to 2012-08-01), but I can tell from the values that they are wrong.
I'm confused because MySQL doesn't report a syntax error, yet the values returned are wrong.
The dates in my database are stored in the date column in the format YYYY-mm-dd.
SELECT `jockeys`.`JockeyInitials` AS `Initials`, `jockeys`.`JockeySurName` AS `Lastname`,
COUNT(`runs`.`JockeysID`) AS 'Rides',
COUNT(CASE
WHEN `runs`.`Finish` = 1 THEN 1
ELSE NULL
END
) AS `Wins`,
SUM(`runs`.`StakeWon`) AS 'Winnings'
FROM runs
INNER JOIN jockeys ON runs.JockeysID = jockeys.JockeysID
INNER JOIN races ON runs.RacesID = races.RacesID
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` <= STR_TO_DATE('2012,08,01', '%Y,%m,%d')
GROUP BY `jockeys`.`JockeySurName`
ORDER BY `Wins` DESC
It's hard to guess what the problem is from your question.
Are you looking to summarize all the races in July and the races on the first of August? That's a slightly strange date range.
You should try the following kind of date-range selection if you want to be more precise. You MUST use it if your races.RaceDate column is a DATETIME expression.
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,08,01', '%Y,%m,%d') + INTERVAL 1 DAY
This will pick up the July races and the races at any time on the first of August.
But, it's possible you're looking for just the July races. In that case you might try:
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,07,01', '%Y,%m,%d') + INTERVAL 1 MONTH
That will pick up everything from midnight July 1, inclusive, to midnight August 1 exclusive.
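A small Python sketch shows why the upper bound matters when RaceDate holds times. The race times are hypothetical. The model assumes that a bare date such as '2012-08-01' behaves like midnight at the start of that day when compared with a DATETIME.

```python
from datetime import datetime, timedelta

# Hypothetical race start times stored in a DATETIME column
races = [
    datetime(2012, 7, 1, 13, 0),
    datetime(2012, 7, 31, 20, 45),
    datetime(2012, 8, 1, 14, 30),
]
start = datetime(2012, 7, 1)  # '2012-07-01' compared to a DATETIME: midnight
end = datetime(2012, 8, 1)    # '2012-08-01' likewise means 00:00 on August 1

closed = [r for r in races if start <= r <= end]  # the original <= test
through_aug1 = [r for r in races if start <= r < end + timedelta(days=1)]  # + INTERVAL 1 DAY
july_only = [r for r in races if start <= r < end]  # same as + INTERVAL 1 MONTH here

print(len(closed), len(through_aug1), len(july_only))  # 2 3 2
```

The original `<=` bound silently drops the afternoon race on August 1, even though the range appears to include that day.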
Also, you're not using GROUP BY correctly. When you summarize, every column in your result set must either be a summary (SUM() or COUNT() or some other aggregate function) or mentioned in your GROUP BY clause. Some DBMSs enforce this. MySQL just rolls with it and gives strange results. Try this expression.
GROUP BY `jockeys`.`JockeyInitials`,`jockeys`.`JockeySurName`
My best guess is that the jockey surnames are not unique. Try changing the group by expression to:
group by `jockeys`.`JockeyInitials`, `jockeys`.`JockeySurName`
In general, it is bad practice to include columns in the SELECT clause of an aggregation query that are not included in the GROUP BY line. You can do this in MySQL (but not in other databases), because of a (mis)feature called Hidden Columns.
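Grouping by surname alone silently merges different jockeys who share a surname. A Python sketch with hypothetical rides shows the win counts under both grouping keys.

```python
from collections import Counter

# Hypothetical rides: (JockeyInitials, JockeySurName, Finish)
rides = [
    ("J", "Smith", 1),
    ("J", "Smith", 3),
    ("A", "Smith", 1),
    ("P", "Jones", 2),
]

# GROUP BY JockeySurName: both Smiths collapse into one row
wins_by_surname = Counter(s for i, s, f in rides if f == 1)
# GROUP BY JockeyInitials, JockeySurName: one row per jockey
wins_by_jockey = Counter((i, s) for i, s, f in rides if f == 1)

print(wins_by_surname)
print(wins_by_jockey)
```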