Create a summary row for data across multiple tables

Create a summary row for data across multiple tables - mysql

I'm trying to write a SQL query to generate a summary row for the actions performed by a given user in a given period. I have the following relevant table structure:
users
    id
    team
audit_periods (can be processing, shipping, break, etc)
    user_id
    period_type (can be "processing", "shipping", etc -- not currently normalized)
    started_at
    finished_at (can be null for the current period, hence the logic around times below)
audit_tasks
    audit_period_id
    audit_task_type_id
    created_at
    score
audit_task_types
    name ("scan", "place_in_pallet", etc)
    score (seems redundant, but we need to maintain the score that the audit_task received at the time it was performed, as the audit_task_type score can change later)
For each user for a given period, I'd like to create something like the following row of data:
users.id users.email time_spent_processing time_spent_shipping ... number_of_scans number_of_pallets
which would be calculated by figuring out for each user:
What audit_periods fall at least partially in the desired window? (Uses started_at and finished_at.)
How long did a user spend in each type of audit_period? (Should involve group by audit_periods.period_type, I'd imagine.)
What audit_tasks fall within the desired window? (Uses created_at -- not in the code below yet.)
How many of each type of audit_task did a user accomplish during the window? (Joins out to audit_task_type, and likely involves a group by on audit_task_types.name.)
How many points were earned during the time period? (Sums the scores of all the audit_tasks in the window.)
I've exhausted all of the SQL tricks I know (not many) and came up with something like the following:
select
    u.id as user_id,
    u.email as email,
    u.team as team,
    ap.period_type as period_type,
    att.name,
    time_to_sec(
        timediff(least("2011-03-17 00:00:00", ifnull(ap.finished_at, utc_timestamp())), greatest("2011-03-16 00:00:00", ap.started_at))
    ) as period_duration,
    sum(at.score) as period_score
from audit_periods as ap
inner join users as u on ap.user_id = u.id
left join audit_tasks as at on at.audit_period_id = ap.id
left join audit_task_types as att on at.audit_task_type_id = att.id
where (ap.started_at >= "2011-03-16 00:00:00" or (ap.finished_at >= "2011-03-17 00:00:00" and ap.finished_at <= "2011-03-17 00:00:00"))
    and (ap.finished_at <= "2011-03-17 00:00:00" or (ap.started_at >= "2011-03-16 00:00:00" and ap.started_at <= "2011-03-16 00:00:00"))
    and u.team in ("Foo", "Bar")
group by u.id, ap.id, at.id
but this seems to be functionally equivalent to just selecting all of the audit tasks in the end. I've tried some subqueries as well, but to little avail. More directly, this generates something like (skipping less important columns):
user_id | period_type | period_duration | name            | score
1       | processing  | 1800s           | scan            | 200
1       | shipping    | 1000s           | place_in_pallet | 100
1       | shipping    | 1000s           | place_in_pallet | 100
1       | break       | 500s            | null            | null
when I want:
user_id | processing | shipping | break | scan | place_in_pallet | score
1       | 1800s      | 1000s    | 500s  | 1    | 2               | 400
I can easily fetch all of the audit_tasks for a given user and roll them up in code, but I might be fetching hundreds of thousands of audit_tasks over a given period, so it needs to be done in SQL.
Just to be clear -- I'm looking for a query to generate one row per user, containing summary data collected across the other 3 tables. So, for each user, I want to know how much time he spent in each type of audit_period (3600 seconds processing, 3200 seconds shipping, etc), as well as how many of each audit_task he performed (5 scans, 10 items placed in pallet, etc).
I think I have the elements of a solution, I'm just having trouble piecing them together. I know exactly how I would accomplish this in Ruby/Java/etc, but I don't think I understand SQL well enough to know which tool I'm missing. Do I need a temp table? A union? Some other construct entirely?
Any help is greatly appreciated, and I can clarify if the above is complete nonsense.

You will need to break this up into two crosstab queries, one that gives you the audit_periods information by user and another that gives you the audit_tasks information by user, and then join both to the users table. It isn't clear how you want to roll up the information in each case. For example, if a given user has 10 audit_period rows, how should the query roll up those durations? I assumed a sum of the durations here, but you might want a min or max or perhaps even an overall delta.
-- @StartDate and @EndDate are user variables holding the window bounds.
-- As written, only periods that lie entirely inside the window are counted.
Select U.id As user_id
    , AuditPeriodByUser.TotalDuration_Processing As processing
    , AuditPeriodByUser.TotalDuration_Shipping As shipping
    , AuditPeriodByUser.TotalDuration_Break As break
    , AuditTasksByUser.TotalCount_Scan As scan
    , AuditTasksByUser.TotalCount_Place_In_Pallet As place_in_pallet
    , AuditTasksByUser.TotalScore As score
From users As U
    Left Join (
        -- One row per user: seconds spent in each period type
        Select AP.user_id
            , Sum( Case When AP.period_type = 'processing'
                        Then Time_To_Sec(
                            TimeDiff(
                                Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) )
                        End ) As TotalDuration_Processing
            , Sum( Case When AP.period_type = 'shipping'
                        Then Time_To_Sec(
                            TimeDiff(
                                Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) )
                        End ) As TotalDuration_Shipping
            , Sum( Case When AP.period_type = 'break'
                        Then Time_To_Sec(
                            TimeDiff(
                                Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) )
                        End ) As TotalDuration_Break
        From audit_periods As AP
        Where AP.started_at >= @StartDate
            And AP.finished_at <= @EndDate
        Group By AP.user_id
        ) As AuditPeriodByUser
        On AuditPeriodByUser.user_id = U.id
    Left Join (
        -- One row per user: task counts by type and total score
        Select AP.user_id
            , Sum( Case When ATT.name = 'scan' Then 1 Else 0 End ) As TotalCount_Scan
            , Sum( Case When ATT.name = 'place_in_pallet' Then 1 Else 0 End ) As TotalCount_Place_In_Pallet
            , Sum( AT.score ) As TotalScore
        From audit_tasks As AT
            Join audit_task_types As ATT
                On ATT.id = AT.audit_task_type_id
            Join audit_periods As AP
                On AP.id = AT.audit_period_id
        Where AP.started_at >= @StartDate
            And AP.finished_at <= @EndDate
        Group By AP.user_id
        ) As AuditTasksByUser
        On AuditTasksByUser.user_id = U.id
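
The shape of that query (aggregate each child table per user first, then join the two one-row-per-user results to users) can be tried out on the sample rows from the question. The following is only a sketch: it uses Python's sqlite3 with SQLite standing in for MySQL, so durations are computed with strftime('%s', ...) instead of Time_To_Sec/TimeDiff, the date window is left out, and the timestamps, the brk alias, and the helper names demo_connection and summarize_users are made up for illustration.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, team TEXT);
        CREATE TABLE audit_periods (id INTEGER PRIMARY KEY, user_id INTEGER,
                                    period_type TEXT, started_at TEXT, finished_at TEXT);
        CREATE TABLE audit_task_types (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE audit_tasks (id INTEGER PRIMARY KEY, audit_period_id INTEGER,
                                  audit_task_type_id INTEGER, score INTEGER);
        INSERT INTO users VALUES (1, 'Foo');
        -- Hypothetical timestamps chosen to give 1800 s, 1000 s and 500 s.
        INSERT INTO audit_periods VALUES
            (1, 1, 'processing', '2011-03-16 08:00:00', '2011-03-16 08:30:00'),
            (2, 1, 'shipping',   '2011-03-16 09:00:00', '2011-03-16 09:16:40'),
            (3, 1, 'break',      '2011-03-16 10:00:00', '2011-03-16 10:08:20');
        INSERT INTO audit_task_types VALUES (1, 'scan'), (2, 'place_in_pallet');
        INSERT INTO audit_tasks VALUES (1, 1, 1, 200), (2, 2, 2, 100), (3, 2, 2, 100);
    """)
    return conn


def summarize_users(conn):
    """Return one row per user: seconds per period type, task counts, total score."""
    return conn.execute("""
        SELECT u.id, p.processing, p.shipping, p.brk,
               tk.scan, tk.place_in_pallet, tk.score
        FROM users AS u
        LEFT JOIN (
            -- Periods rolled up to one row per user before joining.
            SELECT user_id,
                   SUM(CASE WHEN period_type = 'processing' THEN secs ELSE 0 END) AS processing,
                   SUM(CASE WHEN period_type = 'shipping' THEN secs ELSE 0 END) AS shipping,
                   SUM(CASE WHEN period_type = 'break' THEN secs ELSE 0 END) AS brk
            FROM (SELECT user_id, period_type,
                         CAST(strftime('%s', finished_at) AS INTEGER)
                       - CAST(strftime('%s', started_at) AS INTEGER) AS secs
                  FROM audit_periods) AS d
            GROUP BY user_id
        ) AS p ON p.user_id = u.id
        LEFT JOIN (
            -- Tasks rolled up to one row per user before joining.
            SELECT ap.user_id,
                   SUM(CASE WHEN att.name = 'scan' THEN 1 ELSE 0 END) AS scan,
                   SUM(CASE WHEN att.name = 'place_in_pallet' THEN 1 ELSE 0 END) AS place_in_pallet,
                   SUM(task.score) AS score
            FROM audit_tasks AS task
            JOIN audit_task_types AS att ON att.id = task.audit_task_type_id
            JOIN audit_periods AS ap ON ap.id = task.audit_period_id
            GROUP BY ap.user_id
        ) AS tk ON tk.user_id = u.id
        ORDER BY u.id
    """).fetchall()


if __name__ == "__main__":
    print(summarize_users(demo_connection()))
```

Because each derived table already has a single row per user, joining both to users cannot multiply rows, which is what went wrong when periods and tasks were joined directly and grouped afterwards.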

Related

improve sql query with 2 EXISTS sub queries

I have this query (mysql):
SELECT `budget_items`.*
FROM `budget_items`
WHERE (budget_category_id = 4
AND ((is_custom_for_family = 0)
OR (is_custom_for_family = 1
AND custom_item_family_id = 999))
AND ((EXISTS
(SELECT 1
FROM balance_histories
WHERE balance_histories.budget_item_id = budget_items.id
AND balance_histories.family_id = 999
AND payment_date >= '2021-02-01'
AND payment_date <= '2021-02-28' ))
OR (EXISTS
(SELECT 1
FROM budget_lines
WHERE family_id = 999
AND budget_id = 188311
AND budget_item_id = budget_items.id
AND amount > 0))))
It runs multiple times on app start, and all of those runs together take more than 10 seconds.
I have indexes on:
balance_histories table: budget_item_id, family_id (tried also payment_date)
budget_lines table: family_id, budget_id, budget_item_id
How can I improve the speed? Query or maybe mysql (8) configuration.
balance_histories table:
budget_lines table:

I would start this query in reverse of what you have. Assuming you COULD have years of data, but your EXISTS query is looking more specifically at a date-range, or specific budget lines, start there, it will probably be much smaller. Once you have DISTINCT IDs, then go back to the budget items by qualified ID PLUS the additional criteria.
To help optimize the queries, I would have indexes on
table              index
balance_histories  ( family_id, payment_date, budget_item_id )
budget_lines       ( family_id, budget_id, amount )
budget_items       ( id, budget_category_id, is_custom_for_family, custom_item_family_id )
select
bi.*
from
-- pre-query a list of DISTINCT IDs from the balance history
-- and budget lines that qualify. THEN join to the rest.
( select distinct
bh.budget_item_id id
from
balance_histories bh
where
bh.family_id = 999
AND bh.payment_date >= '2021-02-01'
AND bh.payment_date <= '2021-02-28'
UNION
select
bl.budget_item_id
FROM
budget_lines bl
WHERE
bl.family_id = 999
AND bl.budget_id = 188311
AND bl.amount > 0 ) PQ
JOIN budget_items bi
on PQ.id = bi.id
AND bi.budget_category_id = 4
AND ( bi.is_custom_for_family = 0
OR
( bi.is_custom_for_family = 1
AND bi.custom_item_family_id = 999 )
)
Feedback
As with many SQL queries, there are typically multiple ways to get a solution. Sometimes using EXISTS works well, sometimes not as much. You need to consider the cardinality of your data, and that is what I was shooting for. Look at what you were asking for first: get budget items that are all category 4 and where custom-for-family is 1 or 0 (which is all of them), but if custom for a family, only those for family 999. You were correct on your balance of AND/OR. However, this goes through EVERY RECORD, and if you have millions of rows, that is what you are scanning through. Only after scanning through every row are you doing a secondary query (for each record that qualified) against the histories for the specific date range OR the family/budget lines.
My guess is that the number of possible records returned from your two EXISTS queries was going to be very small. So starting with a DISTINCT list of just those IDs that are part of that union gives you a very small subset. Once an ID is found, it becomes a direct match to the budget items table, and the final filtering on category ID / family / custom item is applied there.
Having indexes that better match the context of your query's WHERE clause will optimize pulling the data. I have answered several other questions with similar resolutions, and those answers explain the indexes and the reasoning behind them.
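
To check that the UNION pre-query selects the same budget items as the original EXISTS form, both can be run side by side on a few rows. This is only a sketch using Python's sqlite3 in place of MySQL; the sample rows and the function names demo_db, ids_with_exists and ids_with_union are invented for illustration, and it says nothing about timing, only about equivalence of the results.

```python
import sqlite3


def demo_db():
    """In-memory tables with a handful of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE budget_items (id INTEGER PRIMARY KEY, budget_category_id INTEGER,
                                   is_custom_for_family INTEGER, custom_item_family_id INTEGER);
        CREATE TABLE balance_histories (budget_item_id INTEGER, family_id INTEGER,
                                        payment_date TEXT);
        CREATE TABLE budget_lines (budget_item_id INTEGER, family_id INTEGER,
                                   budget_id INTEGER, amount INTEGER);
        INSERT INTO budget_items VALUES
            (1, 4, 0, NULL), (2, 4, 1, 999), (3, 4, 1, 123), (4, 5, 0, NULL), (5, 4, 0, NULL);
        INSERT INTO balance_histories VALUES
            (1, 999, '2021-02-10'), (1, 999, '2021-02-11'),
            (3, 999, '2021-02-15'), (5, 999, '2021-03-01');
        INSERT INTO budget_lines VALUES
            (2, 999, 188311, 50), (4, 999, 188311, 10), (5, 999, 188311, 0);
    """)
    return conn


ITEM_FILTER = """
    bi.budget_category_id = 4
    AND (bi.is_custom_for_family = 0
         OR (bi.is_custom_for_family = 1 AND bi.custom_item_family_id = 999))
"""


def ids_with_exists(conn):
    """Original shape: filter budget_items, then probe both tables with EXISTS."""
    rows = conn.execute(f"""
        SELECT bi.id FROM budget_items AS bi
        WHERE {ITEM_FILTER}
          AND (EXISTS (SELECT 1 FROM balance_histories AS bh
                       WHERE bh.budget_item_id = bi.id AND bh.family_id = 999
                         AND bh.payment_date >= '2021-02-01'
                         AND bh.payment_date <= '2021-02-28')
               OR EXISTS (SELECT 1 FROM budget_lines AS bl
                          WHERE bl.budget_item_id = bi.id AND bl.family_id = 999
                            AND bl.budget_id = 188311 AND bl.amount > 0))
        ORDER BY bi.id
    """).fetchall()
    return [r[0] for r in rows]


def ids_with_union(conn):
    """Reversed shape: collect the qualifying IDs first, then join to budget_items."""
    rows = conn.execute(f"""
        SELECT bi.id
        FROM (SELECT bh.budget_item_id AS id FROM balance_histories AS bh
              WHERE bh.family_id = 999
                AND bh.payment_date >= '2021-02-01' AND bh.payment_date <= '2021-02-28'
              UNION
              SELECT bl.budget_item_id FROM budget_lines AS bl
              WHERE bl.family_id = 999 AND bl.budget_id = 188311 AND bl.amount > 0) AS pq
        JOIN budget_items AS bi ON bi.id = pq.id AND {ITEM_FILTER}
        ORDER BY bi.id
    """).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = demo_db()
    print(ids_with_exists(conn), ids_with_union(conn))
```

UNION (without ALL) already removes duplicate IDs, so item 1, which has two qualifying history rows, appears only once in the pre-query.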

Growth for each quarter+year in SQL over my user table

I am using MYSQL and I have a User database table where my registered users are stored. I'd love to see how many users have registered on an increasing timeline for each quarter. So maybe Q1 2016 I had 1000 users total, then in Q2 2016 I had 2000 users register, in Q3 2016 4000 total users registered, etc (so I want to see the increase, not just how many registered in each quarter)
From another Stack Overflow post, I was able to create a query to see it by each day:
select u.created, count(*)
from (select distinct date(DateCreated) created from `Users`) u
join `Users` u2 on u.created >= date(u2.DateCreated)
group by u.created
and this works for each day, but I'd like to now group it by quarter and year. I tried using the QUARTER(d) function in mysql and even QUARTER(d) + YEAR(d) to concat it but I still can't get the data right (The count(*) ends up producing incredibly high values).
Would anyone be able to help me get my data grouped by quarter/year? My timestamp column is called DateCreated (it's a unix timestamp in milliseconds, so I have to divide by 1000 too)
Thanks so much

I would suggest using a correlated subquery -- this allows you to easily define each row in the result set. I think this is the logic that you want:
select dates.yyyy, dates.q,
       (select count(*)
        from Users u
        where u.DateCreated < dates.mindc + interval 3 month
       ) as cnt
from (select year(DateCreated) as yyyy, quarter(DateCreated) as q,
             min(DateCreated) as mindc
      from Users u
      group by year(DateCreated), quarter(DateCreated)
     ) dates;
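
The same correlated-subquery idea, one output row per quarter with a running total of everyone registered up to and including that quarter, can be checked on a few dates. This is a sketch in Python's sqlite3 rather than MySQL: SQLite has no quarter() function, so the quarter is derived from the month, the comparison is done on (year, quarter) pairs instead of on the earliest date plus three months, and the function name cumulative_by_quarter and the sample dates are made up.

```python
import sqlite3


def cumulative_by_quarter(created_dates):
    """Return (year, quarter, running total of users) for each quarter with signups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (created TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(d,) for d in created_dates])
    rows = conn.execute("""
        WITH uq AS (
            -- Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on.
            SELECT CAST(strftime('%Y', created) AS INTEGER) AS yyyy,
                   (CAST(strftime('%m', created) AS INTEGER) + 2) / 3 AS qtr
            FROM users
        )
        SELECT d.yyyy, d.qtr,
               (SELECT COUNT(*) FROM uq AS u
                 WHERE u.yyyy < d.yyyy
                    OR (u.yyyy = d.yyyy AND u.qtr <= d.qtr)) AS cnt
        FROM (SELECT DISTINCT yyyy, qtr FROM uq) AS d
        ORDER BY d.yyyy, d.qtr
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(cumulative_by_quarter(
        ["2016-01-15", "2016-02-01", "2016-05-10", "2016-11-30", "2017-01-02"]))
```

Quarters with no registrations (Q3 2016 in the sample) simply do not appear, which matches the behaviour of grouping on the Users table itself.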

Displaying data with respect to specific date?

I am trying to build a reporting system where I need to display a report for each date.
This is my table schema for selected_items
This is stock_list
I am using PHP on the back end and Java on the front end to display the data. I tried a couple of queries to get the desired output, but so far I have not been able to get it. These are some of the queries I used.
SELECT
COALESCE(stock_list.date, selected_items.date) AS date,
SUM( stock_list.qty ) AS StockSum,
SUM( stock_list.weight ) AS Stockweight,
COUNT( selected_items.barcode ) AS BilledItems,
SUM( selected_items.weight ) AS Billedweight
FROM stock_list join selected_items
ON stock_list.date = selected_items.date
GROUP BY COALESCE(stock_list.date, selected_items.date)
ORDER BY COALESCE(stock_list.date, selected_items.date);
This gives me the first five columns but the output gives me wrong values.
Then I also tried Union.
SELECT SUM( qty ) AS StockSum, SUM( weight ) AS Stockweight
FROM `stock_list`
WHERE DATE LIKE '08-Jan-2016'
UNION SELECT COUNT( barcode ) AS BilledItems, SUM( weight ) AS Billedweight
FROM `selected_items`
WHERE DATE LIKE '08-Jan-2016'
UNION SELECT SUM( qty ) AS TotalStock, SUM( weight ) AS TotalWeight
FROM `stock_list`;
Here I get the correct values for four columns, but the problem is that the result is displayed in two columns when I would like it to be in four columns.
Can anyone guide me, please? I have figured out the Java part of it, but I am not good at PHP and MySQL.
Thank you

Unfortunately, SQL Fiddle crashed while I was trying to execute this query
-- Each subquery must select date so it can be joined on;
-- DISTINCT collapses the one-row-per-stock_list-row repetition.
SELECT DISTINCT sl.date AS date, B.qtySum AS StockSum, B.weightSum AS Stockweight,
       C.barcodeCount AS BilledItems, C.weightSum AS Billedweight
FROM stock_list sl
JOIN (SELECT date, SUM(qty) AS qtySum, SUM(weight) AS weightSum
      FROM stock_list GROUP BY date) AS B
  ON B.date = sl.date
JOIN (SELECT date, SUM(weight) AS weightSum, COUNT(barcode) AS barcodeCount
      FROM selected_items GROUP BY date) AS C
  ON C.date = sl.date;
The problem with joining the raw tables is that rows are joined multiple times, so the sums go awry. For example, if four rows from the second table match, the sum comes out four times higher than it should be. With subqueries you avoid this problem, because you count and sum before joining, so the numbers should fit. Alas, I couldn't run the query, so I'm not 100% sure it works, but it should be the right approach.
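
The inflation is easy to reproduce. The sketch below uses Python's sqlite3 instead of MySQL, with made-up rows (two stock rows and three billed rows on the same date) and a hypothetical function name compare_joins; it runs a plain join next to the aggregate-then-join form.

```python
import sqlite3


def compare_joins():
    """Return (naive_rows, preaggregated_rows) for the same sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stock_list (date TEXT, qty INTEGER, weight INTEGER);
        CREATE TABLE selected_items (date TEXT, barcode TEXT, weight INTEGER);
        INSERT INTO stock_list VALUES ('08-Jan-2016', 5, 10), ('08-Jan-2016', 3, 6);
        INSERT INTO selected_items VALUES
            ('08-Jan-2016', 'A', 1), ('08-Jan-2016', 'B', 2), ('08-Jan-2016', 'C', 3);
    """)
    # Plain join: every stock row pairs with every billed row of the same date,
    # so each stock row is summed three times and each barcode counted twice.
    naive = conn.execute("""
        SELECT s.date, SUM(s.qty), COUNT(i.barcode)
        FROM stock_list AS s
        JOIN selected_items AS i ON i.date = s.date
        GROUP BY s.date
    """).fetchall()
    # Aggregate each table to one row per date first, then join those rows.
    pre = conn.execute("""
        SELECT b.date, b.qty_sum, b.weight_sum, c.barcode_count, c.weight_sum
        FROM (SELECT date, SUM(qty) AS qty_sum, SUM(weight) AS weight_sum
              FROM stock_list GROUP BY date) AS b
        JOIN (SELECT date, COUNT(barcode) AS barcode_count, SUM(weight) AS weight_sum
              FROM selected_items GROUP BY date) AS c
          ON c.date = b.date
    """).fetchall()
    conn.close()
    return naive, pre


if __name__ == "__main__":
    print(compare_joins())
```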

Summarize weekly earnings, including empty records

I'm trying to summarize weekly earnings for a time management system (PHP/MySQL), but having a lot of trouble with this SQL query. Basically, I need to return sums of earnings for the past 8 weeks including records for weeks with no earnings, but I can't make this work when I add anything into the WHERE clause to narrow it down to specific kinds of tasks. There are three tables involved in this query:
tbltask stores information about tasks, including the date the task is logged, which user did it, how much time should be billed for, and whether the task is billable or not (some are not billable and should be excluded from the earnings calculation)...
task_id task_name time_est billable date_logged user_id
----------------------------------------------------------------------
223 some task 120 0 2014-12-19 1
224 a billable task 45 1 2014-12-19 2
225 also billable 90 1 2014-12-20 1
tbluser stores user information, so I need to join with it to get the payrate per hour...
user_id payrate
--------------------
1 50
2 75
calendar is just a table with a wide range of dates so that I can join with it and produce results for dates with no records.
datefield
--------------------
2013-01-01
2013-01-02
2013-01-03
[...]
2025-12-31
Below is what I have so far, to give me a total of everyone's earnings by week (starting on a Monday) for the 8 weeks prior to the date in question. This seems to work as expected, but counts all tasks instead of just billable tasks. If there are no tasks logged for any of these weeks, I get records returned with 0 as total_earned, which is important because I need records for the past 8 weeks even if no time is logged.
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM( IFNULL( time_est /60 * payrate, 0 ) ) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser .user_id
WHERE datefield <= '2014-12-26'
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
BUT, I need to only add up earnings for tasks that are billable (billable=1). As soon as I add this in, I no longer get weeks with no logs, so there are weeks missing from the records returned.
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM( IFNULL( time_est /60 * payrate, 0 ) ) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser .user_id
WHERE datefield <= '2014-12-26' AND billable = 1
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
I understand why this result makes sense (because there are no billable=1 tasks completed in those weeks, so no record is returned), but I can't for the life of me figure out how to rewrite the query to get what I want. I would also like to write queries that get the earnings for a particular user instead of all users totalled (user_id=1) but of course that gives me the same problem. I think I might need to use a subquery?
Can anyone point me in the right direction?
SOLUTION:
In case anyone else ends up struggling with something similar, I used terary's suggestion of IF() to move the billable=1 and user_id=1 logic inside the SUM calculation, instead of putting it in the WHERE clause. This solved my problem because it returns all the empty weeks with a 0 for total earnings instead of skipping those weeks without records. I'm sure there are other ways to do this, but this does work. Here's the resulting query:
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM(IF(billable=1 AND tbltask.user_id=1, time_est /60 * payrate, 0)) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser.user_id
WHERE datefield <= '2014-12-26'
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
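
The difference between filtering in WHERE and filtering inside the SUM can be seen on the sample rows above. This is a sketch in Python's sqlite3 rather than MySQL: it groups by day instead of by week to stay short, uses CASE instead of IF(), divides by 60.0 because SQLite would otherwise do integer division, and the function name daily_earnings and the three calendar dates are made up.

```python
import sqlite3


def daily_earnings(filter_in_where):
    """Sum billable earnings per calendar day, filtering either in WHERE or inside SUM."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE calendar (datefield TEXT);
        CREATE TABLE tbltask (task_id INTEGER, time_est INTEGER, billable INTEGER,
                              date_logged TEXT, user_id INTEGER);
        CREATE TABLE tbluser (user_id INTEGER, payrate INTEGER);
        INSERT INTO calendar VALUES ('2014-12-19'), ('2014-12-20'), ('2014-12-21');
        INSERT INTO tbltask VALUES (223, 120, 0, '2014-12-19', 1),
                                   (224, 45, 1, '2014-12-19', 2),
                                   (225, 90, 1, '2014-12-20', 1);
        INSERT INTO tbluser VALUES (1, 50), (2, 75);
    """)
    if filter_in_where:
        # The WHERE clause discards the NULL-extended rows of days without tasks.
        total = "SUM(time_est / 60.0 * payrate)"
        where = "WHERE billable = 1"
    else:
        # The condition only decides what each row contributes; no row is removed.
        total = "SUM(CASE WHEN billable = 1 THEN time_est / 60.0 * payrate ELSE 0 END)"
        where = ""
    rows = conn.execute(f"""
        SELECT datefield, {total} AS total_earned
        FROM calendar
        LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
        LEFT JOIN tbluser ON tbltask.user_id = tbluser.user_id
        {where}
        GROUP BY datefield
        ORDER BY datefield
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(daily_earnings(filter_in_where=True))
    print(daily_earnings(filter_in_where=False))
```

With the filter in WHERE, 2014-12-21 disappears from the result; with the condition inside the SUM, it comes back with a total of 0.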

Forgive me, I didn't create the tables to run the queries, so I cannot debug what you have.
However,
A)
You can create tbltask.wkyear, which will eliminate the need for one of your tables.
This can be stored on insert via a trigger (maybe a default value?), or you can just write the value yourself.
B) Well, I guess I am uncertain of your goal?
I guess I have half a clue now.
I think SELECT IF(billable=1,50,0) is your friend.
http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html
http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_yearweek
SELECT YEARWEEK('1987-01-01');
-> 198653

sql query to get duplicate records with different dates

I need to get duplicate records that have a different date field.
table Sites:
field id
reference
created
Every day we add a lot of records, so I need a function that extracts all existing records that are duplicates of the rows just added, in order to send some notifications.
The condition I can't get right is that the difference between the records of the current day and the old data in the table should be one to four days.
Is there a simple query to do that without using a transaction?

I'm not sure I totally understand what you mean by duplicate records, but here's a basic date query:
SELECT fieldId, reference, created, DATE(created) AS the_date
FROM Sites
-- A column alias cannot be referenced in WHERE, so repeat the expression
WHERE DATE(created)
BETWEEN DATE(DATE_SUB(NOW(), INTERVAL 3 DAY))
AND DATE(NOW())

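I'm making several assumptions, such as:
You don't want the "first" row returned
Duplicates don't carry the date forward (the next row after the initial 4 days is not a duplicate)
The 4 days means +4 days, so day 5 is included
So, my code is as follows (note that the three-argument DATEDIFF(DAY, ...) is SQL Server syntax; MySQL's DATEDIFF takes two date arguments):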
with originals as (
select s1.*
from sites as s1
where 0 = (
select count(*)
from sites as s2
where s1.field_id = s2.field_id
and s1.reference = s2.reference
and s1.created <> s2.created
and DATEDIFF(DAY,s2.created, s1.created) between 1 and 4
)
)
select s1.*
from sites as s1
inner join originals as o
on s1.field_id = o.field_id
and s1.reference = o.reference
and s1.created <> o.created
where DATEDIFF(DAY,o.created, s1.created) between 1 and 4
order by 1,2,3;
Here it is in a fiddle: http://sqlfiddle.com/#!3/9b407/20
This could be simpler if some conditions are relaxed.
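
One relaxed reading of the requirement is: for every row created today, list the older rows with the same reference that were created one to four days earlier. The sketch below tries that reading with Python's sqlite3 instead of MySQL or SQL Server; the column set, the sample rows, the fixed "today" parameter and the function name recent_duplicates are assumptions, and julianday() stands in for DATEDIFF.

```python
import sqlite3


def recent_duplicates(rows, today):
    """Pair each row created on `today` with same-reference rows 1 to 4 days older."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (id INTEGER, reference TEXT, created TEXT)")
    conn.executemany("INSERT INTO sites VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT fresh.id, prior.id
        FROM sites AS fresh
        JOIN sites AS prior
          ON prior.reference = fresh.reference
         -- Whole-day difference between the two creation dates.
         AND julianday(date(fresh.created)) - julianday(date(prior.created))
             BETWEEN 1 AND 4
        WHERE date(fresh.created) = ?
        ORDER BY fresh.id, prior.id
    """, (today,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "A", "2024-01-01 10:00:00"),
        (2, "A", "2024-01-03 09:00:00"),
        (3, "B", "2023-12-20 09:00:00"),
        (4, "B", "2024-01-03 12:00:00"),
        (5, "A", "2024-01-03 15:00:00"),
    ]
    print(recent_duplicates(sample, "2024-01-03"))
```

Rows 2 and 5 are both paired with row 1 (two days older), while row 4 has no match because its only predecessor is fourteen days older.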

Thanks a lot to everyone who tried to help me.
I found this solution after a lot of testing:
SELECT `id`,`reference`,count(`config_id`) as c,`created` FROM `sites`
where datediff(date(current_date()),date(`created`)) < 4
group by `reference`
having c > 1
Thanks a lot for your help.
