What is the best way to think about the GROUP BY clause in MySQL?
I am writing a MySQL query to pull data through an ODBC connection into a pivot table in Excel so that users can easily access the data.
For example, I have:
Select
statistic_date,
week(statistic_date,4),
year(statistic_date),
Emp_ID,
count(distinct Emp_ID),
Site,
Cost_Center
I'm trying to count the number of unique employees we have by site by week. The problem I'm running into is that around year end the weeks don't always line up with the calendar years, so it is important to have them by date so that I can manually filter down to the correct dates using a pivot table (2013/2014 had a week where we had to add week 53 + week 1).
I'm experimenting by using different group by statements but I'm not sure how the order matters and what changes when I switch them around.
i.e.
Group by week(statistic_date,4), Site, Cost_Center, Emp_ID
vs
Group by Site, Cost_Center, week(statistic_date,4), Emp_ID
Other things to note:
-Employees can work any number of days. Some work 4 x 10's, others 5 x 8's, possibly with a 6th day if they sign up for OT. If I sum the counts by week, I get anywhere between 3 and 7 per Emp_ID. I'm hoping to get 1 for the week.
-There are different pay codes per employee, so the distinct count helps when we are looking by day (VTO = Voluntary Time Off, OT = Overtime, LOA = Leave of Absence, etc.). The distinct count shows me 1, whereas I often have 2-3 rows for the same employee on the same day (they hit 40 hours and start accruing OT, then take VTO or use personal time on the same day).
I'm starting with a query I wrote to understand our paid hours by week. I'm trying to adapt it for this application. Actual code is below:
SELECT
dkh.STATISTIC_DATE AS 'Date'
,week(dkh.STATISTIC_DATE,4) as 'Week'
,month(dkh.STATISTIC_DATE) as 'Month'
,year(dkh.STATISTIC_DATE) as 'Year'
,dkh.SITE AS 'Site ID Short'
,aep.LOC_DESCR as 'Site Name'
,dkh.EMPLOYEE_ID AS 'Employee ID'
,count(distinct dkh.EMPLOYEE_ID) AS 'Distinct Employee ID'
,aep.NAME AS 'Employee Name'
,aep.BUSINESS_TITLE AS 'Business_Title'
,aep.SPRVSR_NAME AS 'Manager'
,SUBSTR(aep.DEPTID,1,4) AS 'Cost_Center'
,dkh.PAY_CODE
,dkh.PAY_CODE_SHORT
,dkh.HOURS
FROM metrics.DAT_KRONOS_HOURS dkh
JOIN metrics.EMPLOYEES_PUBLIC aep
ON aep.SNAPSHOT_DATE = SUBDATE(dkh.STATISTIC_DATE, DAYOFWEEK(dkh.STATISTIC_DATE) + 1)
AND aep.EMPLID = dkh.EMPLOYEE_ID
WHERE dkh.STATISTIC_DATE BETWEEN adddate(now(), interval -1 year) AND DATE(now())
group by dkh.SITE, SUBSTR(aep.DEPTID,1,4), week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE, dkh.EMPLOYEE_ID
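The order you use in GROUP BY doesn't matter for the result: each unique combination of the values gets a group of its own. (In MySQL 5.7 and earlier, GROUP BY also sorted the output by the grouping columns as a side effect, so reordering them could change the row order; MySQL 8.0 dropped that implicit sort, so use ORDER BY whenever row order matters.) Selecting columns you don't group by gives you somewhat arbitrary values from each group, and is rejected with an error when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7); you'd probably want to use an aggregate function on them, such as SUM to get the group total.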
Grouping by values you derive from other values that you already use in group by, like below, isn't very useful.
week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE
If two rows have different weeks, they'll also have different dates, right?
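To see both points in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table `hours` and its columns are hypothetical simplifications of the query in the question, and the week is stored as a plain number rather than computed with WEEK(). It shows that listing the same columns in a different order yields the same groups, and that the distinct count only collapses to one per employee per site per week once the employee ID (and the date) are left out of GROUP BY.

```python
import sqlite3

# SQLite stands in for MySQL here. "hours" is a hypothetical, trimmed-down
# version of DAT_KRONOS_HOURS, with the week number stored precomputed
# instead of derived with WEEK(statistic_date, 4).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (site TEXT, emp_id INTEGER, wk INTEGER, pay_code TEXT)")
conn.executemany(
    "INSERT INTO hours VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 1, "REG"), ("A", 1, 1, "REG"), ("A", 1, 1, "OT"),  # emp 1: 3 rows in week 1
        ("A", 2, 1, "REG"), ("A", 2, 1, "VTO"),
        ("B", 3, 1, "REG"),
        ("A", 1, 2, "REG"),
        ("B", 3, 2, "REG"), ("B", 3, 2, "LOA"),
    ],
)

# The same grouping columns listed in two different orders.
# Compared as sets, because GROUP BY is not a substitute for ORDER BY.
groups_1 = set(conn.execute("SELECT wk, site, COUNT(*) FROM hours GROUP BY wk, site"))
groups_2 = set(conn.execute("SELECT wk, site, COUNT(*) FROM hours GROUP BY site, wk"))

# Grouping by employee too: every group holds one employee, so the
# distinct count is always 1 (the symptom described in the question).
per_emp = conn.execute(
    "SELECT site, wk, emp_id, COUNT(DISTINCT emp_id) FROM hours GROUP BY site, wk, emp_id"
).fetchall()

# Leaving emp_id (and the date) out of GROUP BY: one row per site per week,
# counting each employee once no matter how many rows they have.
distinct_per_week = conn.execute(
    "SELECT site, wk, COUNT(DISTINCT emp_id) FROM hours GROUP BY site, wk ORDER BY site, wk"
).fetchall()
print(distinct_per_week)  # [('A', 1, 2), ('A', 2, 1), ('B', 1, 1), ('B', 2, 1)]
```

Applied to the query in the question, that presumably means grouping by site, cost center and week only, and removing the per-row columns (date, employee ID, pay code, hours) from the SELECT list or wrapping them in aggregates.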
Related
I have a query as such
SELECT right(accounts.username, length(accounts.username)-
INSTR(accounts.username, '@')) domain,
COUNT(*) email_count
FROM tickets
LEFT JOIN accounts ON tickets.user = accounts.ID
WHERE (tickets.timestamp >= UNIX_TIMESTAMP(MONTH(CURRENT_DATE())))
GROUP BY domain
ORDER BY email_count DESC
I have a ticket table that I LEFT JOIN to associate the user accounts of that ticket to get the email(username) of that user.
I am trying to count, for the current MONTH, how many tickets come from users with each email domain. The problem is that it ignores the MONTH and returns all matching records.
For instance
yahoo.com 3,356
gmail.com 1,345
If I do a search for all records I get these numbers, but it should be much lower if it is just for the month. I am using UNIX timestamps for this.
Can anyone help me?
If you consider the UNIX_TIMESTAMP(MONTH(CURRENT_DATE())) expression:
MONTH(CURRENT_DATE()) => 1 (in January; in general an integer from 1 to 12)
UNIX_TIMESTAMP(1) => this should result either in an error (1292 incorrect datetime value) or in a warning of the same and 0 as a result, depending on whether strict SQL mode is enabled.
Since you say the query returns all records, strict SQL mode must be turned off, so the condition becomes tickets.timestamp >= 0, which every row satisfies. Running without strict mode can cause silent issues like this; it would have been easier to get a straight error message.
If you want to return records from the current month, then you can use the following expression, where I used year() and month() functions to get current year and month and concatenated 1 to it to get the 1st day of the month:
tickets.timestamp >= UNIX_TIMESTAMP(CONCAT(YEAR(CURRENT_DATE()),'-',MONTH(CURRENT_DATE()),'-','1'))
WHERE tickets.timestamp >= UNIX_TIMESTAMP(MONTH(CURRENT_DATE()))
This expression probably does not do what you think. MONTH() returns the number of the month (1 to 12), while you want the beginning of the current month.
You can use the following expression to compute the beginning of the month:
date_format(current_date(), '%Y-%m-01')
In your condition:
where tickets.timestamp >= unix_timestamp(date_format(current_date(), '%Y-%m-01'))
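A minimal sketch of the corrected filter, using Python's sqlite3 as a stand-in for MySQL. SQLite has no RIGHT() or DATE_FORMAT(), so the domain is taken with substr()/instr() and the first of the month is computed in Python. The accounts and tickets are made up, "now" is pinned to a fixed UTC date so the result is reproducible, and note that MySQL's UNIX_TIMESTAMP() interprets the date in the session time zone rather than UTC.

```python
import sqlite3
from datetime import datetime, timezone


def month_start_unix(now: datetime) -> int:
    """Unix timestamp of 00:00 on the 1st of now's month (the same idea as
    UNIX_TIMESTAMP(DATE_FORMAT(CURRENT_DATE(), '%Y-%m-01')) in MySQL)."""
    first = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int(first.timestamp())


def ts(y: int, m: int, d: int) -> int:
    """Unix timestamp for midnight UTC on the given date."""
    return int(datetime(y, m, d, tzinfo=timezone.utc).timestamp())


# Fixed "now" so the example is reproducible.
now = datetime(2016, 1, 20, 12, 0, tzinfo=timezone.utc)
start = month_start_unix(now)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (ID INTEGER PRIMARY KEY, username TEXT)")
conn.execute("CREATE TABLE tickets (user INTEGER, timestamp INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1, "ann@yahoo.com"), (2, "bob@gmail.com"), (3, "cy@yahoo.com")])
conn.executemany("INSERT INTO tickets VALUES (?, ?)", [
    (1, ts(2016, 1, 5)), (3, ts(2016, 1, 19)), (2, ts(2016, 1, 2)),
    (1, ts(2015, 12, 30)), (2, ts(2015, 11, 1)),  # earlier months: filtered out
])

rows = conn.execute("""
    SELECT substr(a.username, instr(a.username, '@') + 1) AS domain,
           COUNT(*) AS email_count
    FROM tickets t
    LEFT JOIN accounts a ON t.user = a.ID
    WHERE t.timestamp >= ?          -- bare column compared with a constant
    GROUP BY domain
    ORDER BY email_count DESC
""", (start,)).fetchall()
print(rows)  # [('yahoo.com', 2), ('gmail.com', 1)]
```

Comparing the bare timestamp column with a precomputed constant also leaves an index on tickets.timestamp usable, which a YEAR()/MONTH() comparison on the column does not.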
Modified for only the current month (the column holds UNIX timestamps, so convert it with FROM_UNIXTIME() before taking YEAR() and MONTH()):
SELECT
RIGHT(accounts.username, LENGTH(accounts.username) - INSTR(accounts.username, '@')) AS domain, COUNT(1) AS email_count
FROM tickets
LEFT JOIN accounts ON tickets.user = accounts.ID
WHERE
YEAR(FROM_UNIXTIME(tickets.timestamp)) = YEAR(NOW())
AND MONTH(FROM_UNIXTIME(tickets.timestamp)) = MONTH(NOW())
GROUP BY domain
ORDER BY email_count DESC
I've just started a job and my boss wants me to learn MySQL, so please bear with me; I've been learning for only 2 days and I'm not that good at it yet.
So I've been given 3 tables and several tasks to do.
The tables are:
mobile_log_messages_sms
mobile_providers
service_instances
And with them I've got to:
-Find out how many messages there were in the last 25 days and how much income they made.
-Group them by day (per day, excluding hours) and provider name.
-Ignore all the messages that have an empty string in the service column.
-Ignore the messages that made 0 income, and count only those where service_enabled = 1.
-Sort the result by date, descending.
In the tables:
mobile_log_messages_sms:
message_id - used to count the messages
price - used for the income; exclude those with 0
time - date in yyyy/mm/dd hh:mm:ss format
service - exclude all those that have an empty string (or NULL)
mobile_providers:
provider_name - used for grouping
service_instances:
enabled - only use if the value is 1
I've started with:
SELECT message_id, price, time
FROM mobile_log_messages_sms
WHERE time BETWEEN '2017-02-26 00:00:00'
AND '2017-03-22 00:00:00'
But I need to change the date format and then use JOINs, and I don't know how. I know I need to add more to it, but I'm stumped even at the start. Also, my starting query just lists the messages, but I need the total income (sum of price) per day.
Can anyone at least point me in the right direction, since I'm still a noob? Many thanks in advance, and sorry if I worded something badly; English is not my first language.
Find out how many messages there were in the last 25 days and how much income they made
1.
SELECT COUNT(message_id), SUM(price)
FROM mobile_log_messages_sms
WHERE CAST(time AS DATE) BETWEEN DATE_SUB(CURRENT_DATE,INTERVAL 25 DAY)
AND CURRENT_DATE;
2.
SELECT CAST(time AS DATE) AS day, COUNT(message_id), SUM(price)
FROM mobile_log_messages_sms
WHERE CAST(time AS DATE) BETWEEN DATE_SUB(CURRENT_DATE,INTERVAL 25 DAY)
AND CURRENT_DATE
GROUP BY CAST(time AS DATE)
ORDER BY day DESC;
3.
SELECT CAST(time AS DATE) AS day, COUNT(message_id), SUM(price)
FROM mobile_log_messages_sms
WHERE CAST(time AS DATE) BETWEEN DATE_SUB(CURRENT_DATE,INTERVAL 25 DAY)
AND CURRENT_DATE AND service IS NOT NULL AND service <> '' AND price > 0
GROUP BY CAST(time AS DATE)
ORDER BY day DESC;
The rest (grouping by provider name and keeping only enabled services) can't be done without a JOIN, so make sure the tables share at least one column you can join on.
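Putting all the requirements together, here is a sketch with Python's sqlite3 standing in for MySQL. The join columns provider_id and service_instance_id are assumptions (the question doesn't say which columns the tables share), the rows are made up, and "today" is fixed instead of using CURRENT_DATE. In MySQL the window would be written with DATE_SUB(CURRENT_DATE, INTERVAL 25 DAY) rather than SQLite's date(?, '-25 days').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the join keys provider_id and service_instance_id
    -- are assumptions; check the real tables for their shared columns.
    CREATE TABLE mobile_providers (provider_id INTEGER PRIMARY KEY, provider_name TEXT);
    CREATE TABLE service_instances (service_instance_id INTEGER PRIMARY KEY, enabled INTEGER);
    CREATE TABLE mobile_log_messages_sms (
        message_id INTEGER PRIMARY KEY, price REAL, time TEXT, service TEXT,
        provider_id INTEGER, service_instance_id INTEGER);
    INSERT INTO mobile_providers VALUES (1, 'ProvA'), (2, 'ProvB');
    INSERT INTO service_instances VALUES (10, 1), (11, 0);
    INSERT INTO mobile_log_messages_sms VALUES
        (1, 0.50, '2017-03-20 08:15:00', 'news', 1, 10),
        (2, 0.50, '2017-03-20 19:40:00', 'news', 1, 10),
        (3, 1.00, '2017-03-20 10:00:00', 'game', 2, 10),
        (4, 0.75, '2017-03-21 09:00:00', 'news', 1, 10),
        (5, 0.00, '2017-03-21 09:05:00', 'news', 1, 10),  -- zero income: excluded
        (6, 0.75, '2017-03-21 09:10:00', '',     1, 10),  -- empty service: excluded
        (7, 0.75, '2017-03-21 09:20:00', 'news', 1, 11),  -- disabled instance: excluded
        (8, 0.75, '2017-01-01 09:00:00', 'news', 1, 10);  -- older than 25 days
""")

today = "2017-03-22"  # fixed instead of CURRENT_DATE so the output is reproducible
rows = conn.execute("""
    SELECT date(m.time) AS day, p.provider_name,
           COUNT(m.message_id) AS messages, SUM(m.price) AS income
    FROM mobile_log_messages_sms m
    JOIN mobile_providers p ON p.provider_id = m.provider_id
    JOIN service_instances s ON s.service_instance_id = m.service_instance_id
    WHERE m.time >= date(?, '-25 days') AND m.time < date(?, '+1 day')
      AND m.service IS NOT NULL AND m.service <> ''
      AND m.price > 0
      AND s.enabled = 1
    GROUP BY date(m.time), p.provider_name
    ORDER BY day DESC, p.provider_name
""", (today, today)).fetchall()
print(rows)
```

Each filter from the task list maps to one WHERE condition, and the two JOINs bring in the provider name and the enabled flag.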
The question I am working on is as follows:
What is the difference in the amount received for each month of 2004 compared to 2003?
This is what I have so far,
SELECT @2003 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT @2004 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT MONTH(orderDate), (@2004 - @2003) AS Diff
FROM Payments, Orders
WHERE Orders.customerNumber = Payments.customerNumber
Group By MONTH(orderDate);
In the output I am getting the months, but for Diff I am getting NULL. Please help. Thanks
I cannot test this because I don't have your tables, but try something like this:
SELECT a.orderMonth, (a.orderTotal - b.orderTotal ) AS Diff
FROM
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal
FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as a,
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as b
WHERE a.orderMonth=b.orderMonth
Q: How do I subtract two declared variables in MySQL.
A: You'd first have to DECLARE them, in the context of a MySQL stored program. But those variable names wouldn't begin with an at sign character. Variable names that start with an at sign @ character are user-defined variables, and there is no DECLARE statement for them; we can't declare them to be a particular type.
To subtract them within a SQL statement
SELECT @foo - @bar AS diff
Note that MySQL user-defined variables are scalar values.
Assignment of a value to a user-defined variable in a SELECT statement is done with the Pascal style assignment operator :=. In an expression in a SELECT statement, the equals sign is an equality comparison operator.
As a simple example of how to assign a value in a SQL SELECT statement
SELECT @foo := '123.45' ;
In the OP queries, there's no assignment being done. The equals sign is a comparison, of the scalar value to the return from a subquery. Are those first statements actually running without throwing an error?
User-defined variables are probably not necessary to solve this problem.
You want to return how many rows? Sounds like you want one for each month. We'll assume that by "year" we're referring to a calendar year, as in January through December. (We might want to check that assumption. Just so we don't find out way too late, that what was meant was the "fiscal year", running from July through June, or something.)
How can we get a list of months? Looks like you've got a start. We can use a GROUP BY or a DISTINCT.
The question was... "What is the difference in the amount received ... "
So, we want amount received. Would that be the amount of payments we received? Or the amount of orders that we received? (Are we taking orders and receiving payments? Or are we placing orders and making payments?)
When I think of "amount received", I'm thinking in terms of income.
Given the only two tables that we see, I'm thinking we're filling orders and receiving payments. (I probably want to check that, so when I'm done, I'm not told... "oh, we meant the number of orders we received" and/or "the payments table is the payments we made, the 'amount we received' is in some other table".)
We're going to assume that there's a column that identifies the "date" that a payment was received, and that the datatype of that column is DATE (or DATETIME or TIMESTAMP), some type that we can reliably determine what "month" a payment was received in.
To get a list of months that we received payments in, in 2003...
SELECT MONTH(p.payment_received_date)
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2004-01-01'
GROUP BY MONTH(p.payment_received_date)
ORDER BY MONTH(p.payment_received_date)
That should get us twelve rows. Unless we didn't receive any payments in a given month. Then we might only get 11 rows. Or 10. Or, if we didn't receive any payments in all of 2003, we won't get any rows back.
For performance, we want our predicates (conditions in the WHERE clause) to reference bare columns. With an appropriate index available, MySQL can make effective use of an index range scan operation. Suppose instead we wrap the column in a function, e.g.
WHERE YEAR(p.payment_received_date) = 2003
With that, we force MySQL to evaluate the function on every flipping row in the table, and then compare the return from the function to the literal. We prefer not to do that, and reference bare columns in predicates (conditions in the WHERE clause).
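The effect on the query plan can be checked with a small sketch. SQLite (through Python's sqlite3) is used here instead of MySQL, with strftime('%Y', ...) playing the role of YEAR(), and the table follows the answer's hypothetical payment_received. With an index on the date column, the range predicate should appear as a SEARCH using the index, while the function-wrapped predicate falls back to a full SCAN. MySQL's EXPLAIN shows the same contrast as access type range versus ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table from the answer, with an index on the date column.
conn.execute("CREATE TABLE payment_received (payment_received_date TEXT, payment_amount REAL)")
conn.execute("CREATE INDEX idx_received_date ON payment_received (payment_received_date)")


def plan(sql: str) -> str:
    """Return the detail text of EXPLAIN QUERY PLAN (last column of each row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Bare column compared with literals: eligible for an index range search.
range_plan = plan(
    "SELECT payment_amount FROM payment_received "
    "WHERE payment_received_date >= '2003-01-01' AND payment_received_date < '2004-01-01'"
)
# Column wrapped in a function (SQLite's analogue of YEAR()): full table scan.
func_plan = plan(
    "SELECT payment_amount FROM payment_received "
    "WHERE strftime('%Y', payment_received_date) = '2003'"
)
print(range_plan)
print(func_plan)
```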
We could repeat the same query to get the payments received in 2004. All we need to do is change the date literals.
Or, we could get all the rows in 2003 and 2004 all together, and collapse that into a list of distinct months.
We can use conditional aggregation. Since we're using calendar years, I'll use the YEAR() shortcut (rather than a range check). Here, we're not as concerned with using a bare column inside the expression.
SELECT MONTH(p.payment_received_date) AS `mm`
, MAX(MONTHNAME(p.payment_received_date)) AS `month`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0)) AS `2004_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2003_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0))
- SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2004_2003_diff`
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2005-01-01'
GROUP
BY MONTH(p.payment_received_date)
ORDER
BY MONTH(p.payment_received_date)
If this is a homework problem, I strongly recommend you work on this problem yourself. There are other query patterns that will return an equivalent result.
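As a sketch of two such patterns side by side, the following uses Python's sqlite3 in place of MySQL (CASE instead of IF(), strftime() instead of YEAR()/MONTH()) with made-up rows in the answer's hypothetical payment_received table. Conditional aggregation keeps a month that has payments in only one of the two years, while the inner join of two per-year derived tables, as in the first answer, drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up rows in the answer's hypothetical payment_received table.
conn.execute("CREATE TABLE payment_received (payment_received_date TEXT, payment_amount REAL)")
conn.executemany("INSERT INTO payment_received VALUES (?, ?)", [
    ("2003-01-15", 100.0), ("2003-01-20", 50.0), ("2003-02-03", 80.0),
    ("2004-01-09", 200.0), ("2004-02-11", 30.0), ("2004-03-01", 40.0),  # no March 2003
])

# Pattern 1: conditional aggregation over both years in one pass.
conditional = conn.execute("""
    SELECT CAST(strftime('%m', payment_received_date) AS INTEGER) AS mm,
           SUM(CASE WHEN strftime('%Y', payment_received_date) = '2004'
                    THEN payment_amount ELSE 0 END)
         - SUM(CASE WHEN strftime('%Y', payment_received_date) = '2003'
                    THEN payment_amount ELSE 0 END) AS diff
    FROM payment_received
    WHERE payment_received_date >= '2003-01-01'
      AND payment_received_date < '2005-01-01'
    GROUP BY mm
    ORDER BY mm
""").fetchall()

# Pattern 2: one derived table per year, inner-joined on the month.
per_year = """
    SELECT CAST(strftime('%m', payment_received_date) AS INTEGER) AS mm,
           SUM(payment_amount) AS total
    FROM payment_received
    WHERE strftime('%Y', payment_received_date) = ?
    GROUP BY mm
"""
joined = conn.execute(
    "SELECT a.mm, a.total - b.total AS diff "
    f"FROM ({per_year}) AS a JOIN ({per_year}) AS b ON a.mm = b.mm ORDER BY a.mm",
    ("2004", "2003"),  # first placeholder is in a (2004), second in b (2003)
).fetchall()

print(conditional)  # [(1, 50.0), (2, -50.0), (3, 40.0)]
print(joined)       # [(1, 50.0), (2, -50.0)]
```

A LEFT JOIN from a list of all months would be needed to keep month 3 in the second pattern.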
I think this is the problem:
In @2003 and @2004, you select only the sum. And even if you group by the month, you still select only one column, i.e. no row says which month it was selected for. So when you try to subtract, MySQL has no way to tell which row of @2003 should be subtracted from which row of @2004.
So I think the solution is to select the month with the sum and do the subtract later based on the month.
I am having trouble understanding the structure of the query I wish to perform. What I have is a large set of data in a table with multiple UnitIDs. The units have temperatures and timestamps of when the temperatures were recorded.
I want to be able to display the data where I can see the Average temperature of each unit separated in a weekly interval.
Apologies for my previous post, I'm still a novice with querying. But I will show you what I have done so far.
SELECT UnitID AS 'Truck ID',
AVG(Temp) As 'AVG Temp',
LogTime AS 'Event Time',
DAY(g.`LogTime`) as 'Day',
MONTH(g.`LogTime`) as 'Month',
COUNT(*) AS 'Count'
FROM `temperature` as g
WHERE DATE_SUB(g.`LogTime`,INTERVAL 1 WEEK)
AND Ana > 13 AND Ana < 16 AND NOT g.Temp = -100
GROUP BY `Truck ID`, YEAR(g.`LogTime`),MONTH(g.`LogTime`),WEEK(g.`LogTime`)
Order BY `Truck ID`, YEAR(g.`LogTime`),MONTH(g.`LogTime`),WEEK(g.`LogTime`)
;
(Sorry, I don't know how to display a table result at the moment)
This result gives me the weekly temperature averages of a truck, and shows me on which day of the month the temperature was recorded, as well as a count of temperatures per week, per truck.
The Query I want to perform , creates 5 columns, being UnitID, Week1, Week2, Week3, Week4.
Within the 'Week' columns I want to be able to display a weekly(Every day of the Week) temperature average for each truck, where the following week is set a week after the previous week (ie. Week2 is set to display the avg(temp) one week from Week1).
And this is where I am stuck on the structure of the query. I'm not sure if I need to create sub-queries or use a UNION clause. I have tried a couple of queries, but I deleted them because they did not work. I'm not sure if this query is too complex or if it's even possible.
If anyone will be able to help I would greatly appreciate it. If there is any other info I can supply that will help, I will try to do so.
Hopefully this is solvable. :p
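MySQL has a WEEK function that returns the week of the year as an integer (0-53 or 1-53, depending on the mode argument). You can use that in your GROUP BY clause, and then use the AVG aggregate function to get the average temperature. If the data spans more than one year, group by YEAR() as well (or use YEARWEEK()); otherwise the same week number from different years ends up in one group. Your query would look something like this:
SELECT unitID, YEAR(dateColumn) AS year, WEEK(dateColumn) AS week, AVG(tempColumn) AS averageTemperature
FROM myTable
GROUP BY unitID, YEAR(dateColumn), WEEK(dateColumn);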
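For the Week1 to Week4 columns asked for in the question, one option is conditional aggregation: compute a week offset from a chosen start date, then average each offset in its own column. A minimal sketch with Python's sqlite3 in place of MySQL; the start date and the rows are hypothetical, and julianday() stands in for what would be something like FLOOR(DATEDIFF(LogTime, start) / 7) in MySQL. A week with no readings comes back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (UnitID INTEGER, Temp REAL, LogTime TEXT)")
conn.executemany("INSERT INTO temperature VALUES (?, ?, ?)", [
    (1, 4.0, "2017-03-01 08:00:00"), (1, 6.0, "2017-03-03 08:00:00"),  # unit 1, week 1
    (1, 5.0, "2017-03-09 08:00:00"),                                   # unit 1, week 2
    (1, 7.0, "2017-03-20 08:00:00"),                                   # unit 1, week 3
    (2, 3.0, "2017-03-02 08:00:00"), (2, 3.0, "2017-03-27 08:00:00"),  # unit 2, weeks 1 and 4
])

start = "2017-03-01"  # hypothetical first day of Week1
rows = conn.execute("""
    SELECT UnitID,
           AVG(CASE WHEN wk = 0 THEN Temp END) AS Week1,
           AVG(CASE WHEN wk = 1 THEN Temp END) AS Week2,
           AVG(CASE WHEN wk = 2 THEN Temp END) AS Week3,
           AVG(CASE WHEN wk = 3 THEN Temp END) AS Week4
    FROM (SELECT UnitID, Temp,
                 -- whole weeks elapsed since the start date: 0, 1, 2, 3
                 CAST((julianday(date(LogTime)) - julianday(?)) / 7 AS INTEGER) AS wk
          FROM temperature
          WHERE LogTime >= ? AND LogTime < date(?, '+28 days')) AS t
    GROUP BY UnitID
    ORDER BY UnitID
""", (start, start, start)).fetchall()
print(rows)  # [(1, 5.0, 5.0, 7.0, None), (2, 3.0, None, None, 3.0)]
```

No sub-query per week and no UNION is needed: the single inner query assigns each reading to a week, and the CASE expressions spread those weeks across columns.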
The Date and Time Functions chapter of the MySQL reference manual lists other functions that may be useful for querying your database.
I have a table (we're on InfoBright columnar storage and I use MySQL Workbench as my interface) that essentially tracks users and a count of activities with a datestamp. It's a daily aggregate table. Schema is essentially
userid (int)
activity_count (int)
date (date)
What I'm trying to find is how many of my users are churning from month to month, where an active user is defined as one whose monthly activity count sums to more than 10.
To find how many users are active in a given month I am currently using
select year, month, count(distinct user) as users
from
(
select YEAR(date) as year, MONTH(date) as month, userid as user, sum(activity_count) as activity
from table
group by YEAR(date), MONTH(date), userid
having activity > 10
order by YEAR(date), MONTH(date)
) t1
group by year, month
Not being a SQL expert, I am sure this can be improved and would appreciate the input on that.
My bigger goal though is to figure out from month to month, how many of the users who are in this count are new or repeat from the previous month. I don't know how to do that without what feels like ugly nesting or joining, and I feel like it should be fairly simple.
Thanks in advance.
I think that further nesting is the best way to achieve this. I would look to do something like selecting the user for the min concatenated Year & Month as a middle layer to the above (i.e. between outer and inner queries) so that you can establish the first month that the user became active. You can then add a where clause to the outer query to filter so that only the months you require are showing. Let me know if you need help with the syntax.
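A sketch of the previous-month comparison, with Python's sqlite3 standing in for MySQL. It builds the monthly active users once (the inner GROUP BY ... HAVING from the question; since that already yields one row per user per month, COUNT(*) is enough in the outer query) and LEFT JOINs that set to itself on the previous month. It uses a WITH clause, which MySQL only supports from 8.0 and which InfoBright may not support, so on older versions the CTE would have to be repeated as two derived tables. "New" here means not active in the previous month, which also counts users returning after a gap; the MIN() approach described above instead finds each user's first active month. The table name `daily` and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical daily aggregate table matching the question's schema.
conn.execute("CREATE TABLE daily (userid INTEGER, activity_count INTEGER, date TEXT)")
conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", [
    (1, 6, "2017-01-03"), (1, 6, "2017-01-20"),  # user 1, Jan total 12: active
    (2, 11, "2017-01-05"),                        # user 2, Jan total 11: active
    (1, 11, "2017-02-02"),                        # user 1 active again in Feb: repeat
    (3, 4, "2017-02-10"), (3, 9, "2017-02-11"),  # user 3, Feb total 13: new
    (2, 5, "2017-02-14"),                         # user 2, Feb total 5: not active
])

rows = conn.execute("""
    WITH active AS (
        -- one row per user per month in which the user counts as active;
        -- month_idx = year * 12 + (month - 1), so "previous month" is month_idx - 1
        SELECT userid,
               CAST(strftime('%Y', date) AS INTEGER) * 12
                 + CAST(strftime('%m', date) AS INTEGER) - 1 AS month_idx
        FROM daily
        GROUP BY userid, month_idx
        HAVING SUM(activity_count) > 10
    )
    SELECT cur.month_idx / 12            AS year,
           cur.month_idx % 12 + 1        AS month,
           COUNT(*)                      AS active_users,
           SUM(prev.userid IS NOT NULL)  AS repeat_users,  -- also active last month
           SUM(prev.userid IS NULL)      AS new_users      -- not active last month
    FROM active AS cur
    LEFT JOIN active AS prev
           ON prev.userid = cur.userid
          AND prev.month_idx = cur.month_idx - 1
    GROUP BY cur.month_idx
    ORDER BY cur.month_idx
""").fetchall()
print(rows)  # [(2017, 1, 2, 0, 2), (2017, 2, 2, 1, 1)]
```

The numeric month index avoids string concatenation of year and month and makes the "previous month" join a simple subtraction, including across a December to January boundary.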