SQL Pivot Table with Subtraction - mysql

I am using SQL and have a table with three columns: user_id, user_action, and timestamp. Each row records the time of one of five types of user action, numbered 1 through 5.
I have several thousand user_ids and actions over a period of several years.
First, I need to create a pivot table that has the timestamps from
only two of the user_actions, grouped by user_id, and then create a
brand new column that subtracts the time difference - call this
column time_difference.
The code that was provided by Caius Jard in the comments works for this part.
Now I need to add a week-number column. TIMESTAMP is in DATETIME2 format, so I need to incorporate DATEPART(week, timestamp) AS week into this code and use it to compute a two-week moving average of time_difference based on the week number.

This isn't a complete answer to the entire question. Here's a stub for step 1:
SELECT
  user,
  -- take the timestamp (not the action number) for each action of interest
  MAX(CASE WHEN action = 2 THEN timestamp END) AS action2,
  MAX(CASE WHEN action = 5 THEN timestamp END) AS action5,
  DATEDIFF(
    MAX(CASE WHEN action = 2 THEN timestamp END),
    MAX(CASE WHEN action = 5 THEN timestamp END)
  ) AS days_between
FROM t
WHERE action IN (2, 5)
GROUP BY user
Though this may not calculate your minus column exactly as you need: I wasn't entirely sure what data type timestamp is and whether any direct math is possible, or if it's a string that needs converting. If you can add this detail to your question (or leave a comment), it will help.
I wasn't easily able to make sense of your step 2. Please enhance your question by providing expected results.
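The MAX(CASE ...) pattern from the stub can be tried end to end with a small script. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in database, the sample rows are made up, and julianday() is SQLite-specific (on MySQL or SQL Server you would use that engine's own date-difference function instead).

```python
import sqlite3

# Hypothetical sample rows: (user_id, user_action, timestamp).
rows = [
    (1, 2, "2020-01-01 08:00:00"),
    (1, 3, "2020-01-02 09:00:00"),
    (1, 5, "2020-01-04 08:00:00"),
    (2, 2, "2020-02-10 12:00:00"),
    (2, 5, "2020-02-11 18:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, user_action INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# One row per user: the timestamp of action 2, the timestamp of action 5,
# and the gap between them in days (julianday() is SQLite-specific).
pivot = conn.execute(
    """
    SELECT user_id,
           MAX(CASE WHEN user_action = 2 THEN timestamp END) AS action2,
           MAX(CASE WHEN user_action = 5 THEN timestamp END) AS action5,
           julianday(MAX(CASE WHEN user_action = 5 THEN timestamp END))
         - julianday(MAX(CASE WHEN user_action = 2 THEN timestamp END)) AS time_difference
    FROM t
    WHERE user_action IN (2, 5)
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

for row in pivot:
    print(row)
```

Each user collapses to a single row because the CASE returns NULL for the "other" action and MAX ignores NULLs.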

Related

MYSQL SUM, value has to change on condition

I have a "CONTRACTS" table in which the user can select whether a Contract is "ANUAL" or "MONTHLY" (working on MariaDB/phpmyadmin)
The data is stored in the following manner:
CONTRACT  PERIOD  CICLE    SALE PRICE  CATEGORY
001       1       YEARLY   12000       CAT1
002       1       MONTHLY  1000        CAT2
I want to make a report that tells me the SUM of monthly contract values by CATEGORY.
Right now, the query below runs, but it's useless, since it sums the yearly contracts along with the monthly ones:
SELECT SUM(contracts.salesprice), `categories`.*
FROM `contracts`
LEFT JOIN `categories` ON `contracts`.`cat_id` = `categories`.`id_cat`
GROUP BY `categories`.`descripcion_cat`;
I'm a newbie, and so far I was fine with INSERT, SELECT, UPDATE, and DELETE.
I tried reading all the documentation about CASE and IF, but I can't figure out how to tell MySQL to apply a calculation conditionally inside the SUM:
when CICLE = YEARLY, then use SALEPRICE / 12 (to get the monthly value)
You were on the correct track with CASE.
The following code snippet will convert your yearly sales prices into monthly:
SUM(
CASE
WHEN contracts.cicle = 'YEARLY' THEN (contracts.salesprice / 12)
WHEN contracts.cicle = 'MONTHLY' THEN contracts.salesprice
ELSE 0
END
)
To use it in your query, simply replace your SUM(...) with that one.
To explain what it is doing: the CASE expression has several WHEN conditions and uses the value of the first one that is true. If none are true, it uses the ELSE value (which you can change if you don't like 0). All of the resulting values are then added up by SUM.
The benefit of CASE over IF is that CASE can be extended as needed if you need more calculations for bi-annual, quarterly, etc.
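To see the conditional SUM in action, here is a minimal sketch that runs the full query against the two example contracts. It assumes Python's built-in sqlite3 module as a stand-in for MariaDB; the third contract row and the exact column types are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id_cat INTEGER PRIMARY KEY, descripcion_cat TEXT);
    CREATE TABLE contracts (contract TEXT, cicle TEXT, salesprice REAL, cat_id INTEGER);
    INSERT INTO categories VALUES (1, 'CAT1'), (2, 'CAT2');
    INSERT INTO contracts VALUES
        ('001', 'YEARLY', 12000, 1),
        ('002', 'MONTHLY', 1000, 2),
        ('003', 'MONTHLY', 500, 1);  -- hypothetical extra row
    """
)

# Yearly prices are divided by 12 before summing; monthly prices are used as-is.
monthly_by_category = conn.execute(
    """
    SELECT categories.descripcion_cat,
           SUM(CASE
                 WHEN contracts.cicle = 'YEARLY' THEN contracts.salesprice / 12
                 WHEN contracts.cicle = 'MONTHLY' THEN contracts.salesprice
                 ELSE 0
               END) AS monthly_total
    FROM contracts
    LEFT JOIN categories ON contracts.cat_id = categories.id_cat
    GROUP BY categories.descripcion_cat
    ORDER BY categories.descripcion_cat
    """
).fetchall()

print(monthly_by_category)
```

With this data, CAT1 comes out as 12000 / 12 + 500 and CAT2 as 1000.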

Aggregating/Grouping a set of rows/records in MySQL

I have a table, say "sample", which saves a new record every five minutes.
Users might ask for data collected at a specific sampling interval of, say, 10 minutes, 30 minutes, or an hour.
Since I have a record every five minutes, when a user asks for data with an hour sampling interval, I have to group every 12 (60/5) records into one record (already sorted by timestamp), and the criterion could be min, max, avg, or last value.
I tried doing this in Java after fetching all the records, but I am seeing pretty bad performance because I have to iterate through the collection multiple times. I have read about alternatives like jAgg and lambdaj, but wanted to check whether this is possible in SQL (MySQL) itself.
The sampling interval is dynamic and the aggregation function (min/max/avg/last) too is user provided.
Any pointers ?
You can do this in SQL, but you have to carefully construct the statement. Here is an example by hour for all four aggregations:
select min(datetime) as datetime,
       min(val) as minval, max(val) as maxval, avg(val) as avgval,
       substring_index(group_concat(val order by datetime desc), ',', 1) as lastval
from sample t
-- one group per hour; change 60*60 to the interval in seconds
group by floor(to_seconds(datetime) / (60*60));
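Because the sampling interval is dynamic, the divisor can be passed in as a parameter. The sketch below illustrates that idea under assumptions: it uses Python's built-in sqlite3 module rather than MySQL, so strftime('%s', ...) with integer division stands in for floor(to_seconds(...) / n), the column names ts and val are made up, and the "last" aggregate is left out for brevity.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (ts TEXT, val REAL)")

# Hypothetical data: one reading every five minutes for two hours, val = 0..23.
start = datetime(2024, 1, 1, 0, 0, 0)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [
        ((start + timedelta(minutes=5 * i)).strftime("%Y-%m-%d %H:%M:%S"), float(i))
        for i in range(24)
    ],
)


def aggregate(interval_seconds):
    """Return (bucket_start, min, max, avg) rows for the given interval."""
    return conn.execute(
        """
        SELECT MIN(ts), MIN(val), MAX(val), AVG(val)
        FROM sample
        GROUP BY CAST(strftime('%s', ts) AS INTEGER) / ?
        ORDER BY MIN(ts)
        """,
        (interval_seconds,),
    ).fetchall()


hourly = aggregate(60 * 60)        # 12 readings per bucket
half_hourly = aggregate(30 * 60)   # 6 readings per bucket

for row in hourly:
    print(row)
```

All of the grouping happens in the database, so the application only receives one row per interval instead of iterating over every five-minute record.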

MySQL Group By Order and Count(Distinct)

What is the best way to think about the Group By function in MySQL?
I am writing a MySQL query to pull data through an ODBC connection in a pivot table in Excel so that users can easily access the data.
For example, I have:
Select
statistic_date,
week(statistic_date,4),
year(statistic_date),
Emp_ID,
count(distinct Emp_ID),
Site,
Cost_Center
I'm trying to count the number of unique employees we have by site by week. The problem I'm running into is around year end: the calendar years don't always match up, so it is important to have them by date so that I can manually filter down to the correct dates using a pivot table (2013/2014 had a week where we had to add week 53 + week 1).
I'm experimenting by using different group by statements but I'm not sure how the order matters and what changes when I switch them around.
i.e.
Group by week(statistic_date,4), Site, Cost_Center, Emp_ID
vs
Group by Site, Cost_Center, week(statistic_date,4), Emp_ID
Other things to note:
-Employees can work any number of days. Some are working 4 x 10's, others 5 x 8's with possibly a 6th day if they sign up for OT. If I sum the counts by week, I get anywhere between 3-7 per Emp_ID. I'm hoping to get 1 for the week.
-There are different pay code per employee so the distinct count helps when we are looking by day (VTO = Voluntary Time Off, OT = Over Time, LOA = Leave of Absence, etc). The distinct count will show me 1, where often times I will have 2-3 for the same emp in the same day (hits 40 hours and starts accruing OT then takes VTO or uses personal time in the same day).
I'm starting with a query I wrote to understand our paid hours by week. I'm trying to adapt it for this application. Actual code is below:
SELECT
dkh.STATISTIC_DATE AS 'Date'
,week(dkh.STATISTIC_DATE,4) as 'Week'
,month(dkh.STATISTIC_DATE) as 'Month'
,year(dkh.STATISTIC_DATE) as 'Year'
,dkh.SITE AS 'Site ID Short'
,aep.LOC_DESCR as 'Site Name'
,dkh.EMPLOYEE_ID AS 'Employee ID'
,count(distinct dkh.EMPLOYEE_ID) AS 'Distinct Employee ID'
,aep.NAME AS 'Employee Name'
,aep.BUSINESS_TITLE AS 'Business_Title'
,aep.SPRVSR_NAME AS 'Manager'
,SUBSTR(aep.DEPTID,1,4) AS 'Cost_Center'
,dkh.PAY_CODE
,dkh.PAY_CODE_SHORT
,dkh.HOURS
FROM metrics.DAT_KRONOS_HOURS dkh
JOIN metrics.EMPLOYEES_PUBLIC aep
ON aep.SNAPSHOT_DATE = SUBDATE(dkh.STATISTIC_DATE, DAYOFWEEK(dkh.STATISTIC_DATE) + 1)
AND aep.EMPLID = dkh.EMPLOYEE_ID
WHERE dkh.STATISTIC_DATE BETWEEN adddate(now(), interval -1 year) AND DATE(now())
group by dkh.SITE, SUBSTR(aep.DEPTID,1,4), week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE, dkh.EMPLOYEE_ID
The order you use in group by doesn't matter. Each unique combination of the values gets a group of its own. Selecting columns you don't group by gives you somewhat arbitrary results; you'd probably want to use some aggregation function on them, such as SUM to get the group total.
Grouping by values you derive from other values that you already use in group by, like below, isn't very useful.
week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE
If two rows have different weeks, they'll also have different dates, right?
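Both points can be checked on a toy table. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the table name hours and its rows are made up. It runs the same query with the GROUP BY columns in two different orders and compares the results. Note that it groups only by week and site: once the date and the employee ID are left out of the GROUP BY, COUNT(DISTINCT ...) counts each employee once per week and site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (week INTEGER, site TEXT, emp_id INTEGER)")
# Hypothetical rows: one row per employee per day worked.
conn.executemany(
    "INSERT INTO hours VALUES (?, ?, ?)",
    [
        (1, "A", 100), (1, "A", 100), (1, "A", 100),  # employee 100 worked 3 days
        (1, "A", 101), (1, "A", 101),
        (1, "B", 200),
        (2, "A", 100),
    ],
)


def headcount(group_by):
    """Distinct employees per (week, site); group_by is a hard-coded column list."""
    sql = f"""
        SELECT week, site, COUNT(DISTINCT emp_id)
        FROM hours
        GROUP BY {group_by}
    """
    return sorted(conn.execute(sql).fetchall())


week_first = headcount("week, site")
site_first = headcount("site, week")

print(week_first)
print(week_first == site_first)
```

The two orderings produce the same set of groups; only an ORDER BY would change how the rows are arranged.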

Calculating time difference between activity timestamps in a query

I'm reasonably new to Access and having trouble solving what should be (I hope) a simple problem - think I may be looking at it through Excel goggles.
I have a table named importedData into which I (not so surprisingly) import a log file each day. This log file is from a simple data-logging application on some mining equipment, and essentially it saves a timestamp and status for the point at which the current activity changes to a new activity.
A sample of the data looks like this:
This information is then filtered using a query to define the range I want to see information for, say from 29/11/2013 06:00:00 AM until 29/11/2013 06:00:00 PM
Now the object of this is to take a status entry's timestamp and get the time difference between it and the record on the subsequent row of the query results. As the equipment works for a 12hr shift, I should then be able to build a picture of how much time the equipment spent doing each activity during that shift.
In the above example, the equipment was in status "START_SHIFT" for 00:01:00, in status "DELAY_WAIT_PIT" for 06:08:26, and so on. I would then build a unique list of the status entries for the period selected, and sum the total time for each status to get my shift summary.
You can use a correlated subquery to fetch the next timestamp for each row.
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#;
Then you can use that query as a subquery in another query where you compute the duration between timestamp and next_timestamp. And then use that entire new query as a subquery in a third where you GROUP BY status and compute the total duration for each status.
Here's my version which I tested in Access 2007 ...
SELECT
sub2.status,
Format(Sum(Nz(sub2.duration,0)), 'hh:nn:ss') AS SumOfduration
FROM
(
SELECT
sub1.status,
(sub1.next_timestamp - sub1.timestamp) AS duration
FROM
(
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#
) AS sub1
) AS sub2
GROUP BY sub2.status;
If you run into trouble or need to modify it, break out the innermost subquery, sub1, and test that by itself. Then do the same for sub2. I suspect you will want to change the WHERE clause to use parameters instead of hard-coded times.
Note the query Format expression would not be appropriate if your durations exceed 24 hours. Here is an Immediate window session which illustrates the problem ...
' duration greater than one day:
? #2013-11-30 02:00# - #2013-11-29 01:00#
1.04166666667152
' this Format() makes the 25 hr. duration appear as 1 hr.:
? Format(#2013-11-30 02:00# - #2013-11-29 01:00#, "hh:nn:ss")
01:00:00
However, if you're dealing exclusively with data from 12 hr. shifts, this should not be a problem. Keep it in mind in case you ever need to analyze data which spans more than 24 hrs.
If subqueries are unfamiliar, see Allen Browne's page: Subquery basics. He discusses correlated subqueries in the section titled Get the value in another record.
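For anyone who wants to experiment with the correlated-subquery idea outside Access, here is a rough sketch using Python's built-in sqlite3 module. It is not the Access query above: the log rows are made up (only the first two durations follow the question's example), the HAULING and END_SHIFT status names are hypothetical, and durations are summed in seconds and formatted by a small helper so that totals over 24 hours still display correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE importedData (status TEXT, timestamp TEXT)")
# Hypothetical log rows; each row marks the moment a new status begins.
conn.executemany(
    "INSERT INTO importedData VALUES (?, ?)",
    [
        ("START_SHIFT", "2013-11-29 06:00:00"),
        ("DELAY_WAIT_PIT", "2013-11-29 06:01:00"),
        ("HAULING", "2013-11-29 12:09:26"),      # hypothetical status name
        ("DELAY_WAIT_PIT", "2013-11-29 17:00:00"),
        ("END_SHIFT", "2013-11-29 18:00:00"),    # hypothetical status name
    ],
)

# Inner query: pair each row with the next later timestamp.
# Outer query: sum the gaps (in seconds) per status; the final row has no
# next timestamp and is skipped.
totals = dict(conn.execute(
    """
    SELECT sub1.status,
           SUM(CAST(strftime('%s', sub1.next_timestamp) AS INTEGER)
             - CAST(strftime('%s', sub1.timestamp) AS INTEGER)) AS seconds
    FROM (
        SELECT i.status, i.timestamp,
               (SELECT MIN(timestamp) FROM importedData
                WHERE timestamp > i.timestamp) AS next_timestamp
        FROM importedData AS i
    ) AS sub1
    WHERE sub1.next_timestamp IS NOT NULL
    GROUP BY sub1.status
    """
).fetchall())


def hms(seconds):
    """Format seconds as hh:mm:ss; the hours part may exceed 24."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


for status, seconds in sorted(totals.items()):
    print(status, hms(seconds))
```

Keeping the totals as plain seconds until the last step sidesteps the wrap-around shown in the Immediate window session.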

MySQL Query to perform calculation and display data based on 2 different date criteria

Good morning,
I am trying to combine two queries into one so that the result array can be populated into a single table. Data is pulled from a single table, and math calculations must take place for one of the columns. Here is what I have currently:
SELECT
laboratory,
SUM(total_produced_week) AS total_produced_sum,
SUM(total_produced_over14) AS total_over14_sum,
100*(SUM(total_produced_over14)/sum(total_produced_week)) as divided_sum,
max(case when metrics_date =maxdate then total_backlog else null end) as total_backlog,
max(case when metrics_date =maxdate then days_workable else null end) as days_workable,
max(case when metrics_date =maxdate then workable_backlog else null end) as workable_backlog,
max(case when metrics_date =maxdate then deferred_over_30_days else null end) as deferred_over_30_days
FROM
test,
(
select max(metrics_date) as maxdate
from metrics
) as x
WHERE
YEAR(metrics_date) = YEAR(CURDATE())
AND MONTH(metrics_date) = MONTH(CURDATE())
GROUP BY
laboratory
ORDER BY 1 ASC
Here's the breakdown:
For each laboratory site, I need:
1) Perform a MONTH TO DATE (current month only) sum, division and multiply by 100 for each site to obtain percentage.
2) Display other columns (total_backlog, days_workable, workable_backlog, deferred_over_30_days) for the most recent update date (metrics_date) only.
The above query performs #1 just fine - I get a total_produced_sum, total_over14_sum and divided_sum column with correct math.
The other columns mentioned in #2, however, return NULL. Data is available in the table for the most recently updated date, so the columns should be reporting that data. It seems like I have a problem with the CASE, but I'm not very familiar with the function so it could be incorrect.
I am running MySQL 5.0.45
Thanks in advance for any suggestions!
Chris
P.S. Here are the two original queries that work correctly. These need to be combined so that the full resultset can be output to a table, organized by laboratory.
Query 1:
SELECT SUM(total_produced_week) AS total_produced_sum,
SUM(total_produced_over14) AS total_over14_sum
FROM test
WHERE laboratory = 'Site1'
AND YEAR(metrics_date) = YEAR(CURDATE()) AND MONTH(metrics_date) = MONTH(CURDATE())
Query 2:
SELECT laboratory, total_backlog, days_workable, workable_backlog, deferred_over_30_days,
items_over_10_days, open_ncs, total_produced_week, total_produced_over14
FROM metrics
WHERE metrics_date = (select MAX(metrics_date) FROM metrics)
ORDER BY laboratory ASC
Operator Error.
I copied the original table (named "metrics") into a table named "test". I then modified the metrics_date in the new "test" table to include data from January 2011 (for the month-to-date). While the first part of the query that performs the math was using the "test" table (and working properly), the second half that pulls the most-recently-updated data was using the original "metrics" table, which did not have any rows with a metrics_date this month.
When I changed the query to use "test" for both parts of the query, everything works as expected. And now I feel really dumb.
Thanks anyway, guys!
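For reference, here is a small reproduction of the corrected query, with the derived table reading from the same table as the outer query. It is a sketch under assumptions: it runs on Python's built-in sqlite3 module instead of MySQL 5.0, so strftime('%Y-%m', ...) with a fixed month parameter replaces the YEAR()/MONTH()/CURDATE() filter, only one of the "latest date" columns is shown, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (laboratory TEXT, metrics_date TEXT, "
    "total_produced_week INTEGER, total_produced_over14 INTEGER, total_backlog INTEGER)"
)
# Hypothetical weekly rows for two sites in January 2011.
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [
        ("Site1", "2011-01-07", 100, 10, 50),
        ("Site1", "2011-01-14", 100, 30, 40),
        ("Site2", "2011-01-07", 200, 20, 70),
        ("Site2", "2011-01-14", 200, 20, 60),
    ],
)

rows = conn.execute(
    """
    SELECT laboratory,
           SUM(total_produced_week) AS total_produced_sum,
           SUM(total_produced_over14) AS total_over14_sum,
           100.0 * SUM(total_produced_over14) / SUM(total_produced_week) AS divided_sum,
           MAX(CASE WHEN metrics_date = maxdate THEN total_backlog END) AS total_backlog
    FROM test,
         (SELECT MAX(metrics_date) AS maxdate FROM test) AS x  -- same table as the outer query
    WHERE strftime('%Y-%m', metrics_date) = ?
    GROUP BY laboratory
    ORDER BY laboratory
    """,
    ("2011-01",),
).fetchall()

for row in rows:
    print(row)
```

If the derived table pointed at a different table whose latest date never appears in the filtered rows, the CASE would never match and total_backlog would come back NULL, which is the symptom described in the question.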