I have been looking at several different questions related to hourly average queries but I could not find any that addresses the following.
I have a log table that keeps track of how many times a page is accessed by a user:
ID USERID PAGEID SECNO DATE
1 123 120 14 6/08/2013 10:07:29 AM
1 124 438 1 6/08/2013 11:00:01 AM
1 123 211 18 6/09/2013 14:07:59 PM
1 123 120 14 6/10/2013 05:07:18 PM
1 124 312 4 6/10/2013 08:04:32 PM
1 128 81 54 6/11/2013 07:02:15 AM
and I am trying to get two different queries. One that looks like this:
HOURLY Count Average
12am 0 0
1am 0 0
2am 0 0
3am 0 0
4am 0 0
5am 1 0
6am 0 0
7am 1 0
8am 0 0
9am 0 0
10am 1 0
11am 1 0
12pm 0 0
1pm 0 0
2pm 1 0
3pm 0 0
4pm 0 0
5pm 1 0
6pm 0 0
7pm 0 0
8pm 1 0
9pm 0 0
10pm 0 0
11pm 0 0
The second query should look like this:
DAY PERCENTAGE
Monday 10%
Tuesday 16%
Wednesday 14%
Thursday 22%
Friday 21%
Saturday 14%
Sunday 3%
Please note that the Average values above are just samples.
So far for the first query I have something like this:
SELECT
HOUR(date) AS hourly,
Count(*)
FROM
logs
GROUP BY
hourly
I tried adding AVG() after COUNT(), but it did not work.
My log table does not have data for every single hour, but I still need to display all the hours on my report; if an hour has no data, the value should be 0. Any ideas how I could achieve that?
Try this for the first query:
SELECT
    h.hour,
    IFNULL(tmp.the_count, 0) AS the_count,
    IFNULL(tmp.the_avg, 0)   AS the_avg
FROM
    hourly h
LEFT JOIN (
    -- total visits per hour, and average visits per distinct user in that hour
    SELECT
        hourly,
        SUM(visits) AS the_count,
        SUM(visits) / COUNT(DISTINCT userid) AS the_avg
    FROM (
        -- visits per user per hour
        SELECT
            HOUR(date) AS hourly,
            COUNT(*) AS visits,
            userid
        FROM
            logs
        GROUP BY
            hourly,
            userid
    ) AS per_user
    GROUP BY
        hourly
) AS tmp
    ON tmp.hourly = h.hour
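The query assumes a helper table named hourly with a single hour column holding one row per hour of the day (0-23). If you do not already have one, a minimal version could look like this (the table and column names are just the ones the query above expects):
-- one-row-per-hour lookup table assumed by the query above
CREATE TABLE hourly (hour TINYINT NOT NULL PRIMARY KEY);
INSERT INTO hourly (hour)
VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
       (12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);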
Try this for the second query:
SELECT
    theday,
    IFNULL(percentage, 0) AS percentage
FROM (
    -- the seven weekday names for the week ending 2013-06-16
    SELECT DATE_FORMAT('2013-06-16','%W') AS theday UNION
    SELECT DATE_FORMAT('2013-06-16' - INTERVAL 1 DAY,'%W') UNION
    SELECT DATE_FORMAT('2013-06-16' - INTERVAL 2 DAY,'%W') UNION
    SELECT DATE_FORMAT('2013-06-16' - INTERVAL 3 DAY,'%W') UNION
    SELECT DATE_FORMAT('2013-06-16' - INTERVAL 4 DAY,'%W') UNION
    SELECT DATE_FORMAT('2013-06-16' - INTERVAL 5 DAY,'%W') UNION
    SELECT DATE_FORMAT('2013-06-16' - INTERVAL 6 DAY,'%W')
) AS weekt
LEFT JOIN (
    SELECT
        DATE_FORMAT(date,'%W') AS daily,
        -- each weekday's share of the week's visits, expressed as a percentage
        COUNT(*) * 100 / (SELECT COUNT(*) FROM logs
                          WHERE date >= '2013-06-10' AND date <= '2013-06-16') AS percentage
    FROM
        logs
    WHERE
        date >= '2013-06-10'
        AND date <= '2013-06-16'
    GROUP BY
        daily
) AS logdata
    ON logdata.daily = weekt.theday
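The anchor date '2013-06-16' above is hard-coded in several places. If the report should always cover the last seven days, one option (just a sketch; the WHERE range would need the same treatment) is to build the weekday list from CURDATE() instead:
-- sketch: derive the weekday list from the current date instead of a fixed one
SELECT DATE_FORMAT(CURDATE(),'%W') AS theday UNION
SELECT DATE_FORMAT(CURDATE() - INTERVAL 1 DAY,'%W') UNION
SELECT DATE_FORMAT(CURDATE() - INTERVAL 2 DAY,'%W') UNION
SELECT DATE_FORMAT(CURDATE() - INTERVAL 3 DAY,'%W') UNION
SELECT DATE_FORMAT(CURDATE() - INTERVAL 4 DAY,'%W') UNION
SELECT DATE_FORMAT(CURDATE() - INTERVAL 5 DAY,'%W') UNION
SELECT DATE_FORMAT(CURDATE() - INTERVAL 6 DAY,'%W')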
SQL has no way to "create" an hour out of nothing. So the simple trick is to have a table numbers (number int) holding the numbers you need (maybe 1-31 to be ready for months, or 1-366 for years). You can then LEFT JOIN that table with your data, along these lines:
select n.number as hour, count(l.date) as cnt  -- count a logtable column so empty hours give 0, not 1
from numbers as n
left join logtable as l
  on hour(l.date) = n.number
group by n.number
You could "simulate" it without a table, but there are several occasions where that table is helpful.
Table1:
id  hour  date        tableValue1  tableValue2
1   3     2020-05-29  123          145
2   2     2020-05-29  1500         3400
Table2:
id  hour  date        tableValue3  tableValue4
1   1     2020-05-29  4545         3697
2   3     2020-05-29  5698         2896
Table3:
id  hour  date        tableValue5  tableValue6
1   2     2020-05-29  7841         5879
2   1     2020-05-29  1485         3987
I want to select multiple columns from different tables with one query.
Expected Output:
hour  tableValue1  tableValue3  tableValue5
1     0            4545         1485
2     1500         0            7841
3     123          5698         0
I've tried this query without success:
SELECT hour , tableValue1 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10 FROM table1
UNION ALL
SELECT hour , tableValue3 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10 FROM table2
UNION ALL
SELECT hour , tableValue5 WHERE date = "2020-05-29" AND hour BETWEEN 10 AND 10 FROM table3
I'm getting instead the following:
hour  tableValue1
3     123
2     1500
1     4545
3     5698
2     5879
1     3987
The columns the tables have in common are hour and date. Do I need to redesign the database structure to link the tables so that I can use a JOIN, and if so, how? Or is there a SQL command to select multiple columns from multiple tables?
There are a couple of issues in your code:
the WHERE clause should come after the FROM clause in your subqueries
you want three value columns, but you select only one value column from each table: each subquery should return all three columns
your rows are not ordered because you're missing an ORDER BY clause at the end of your code
your rows are not aggregated to remove the excess zeroes: it is sufficient to apply a MAX aggregation to each value field, grouping on the hour field
WITH cte AS (
    SELECT hour,
           tableValue1,
           0 AS tableValue3,
           0 AS tableValue5
    FROM table1
    WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
    UNION ALL
    SELECT hour,
           0 AS tableValue1,
           tableValue3,
           0 AS tableValue5
    FROM table2
    WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
    UNION ALL
    SELECT hour,
           0 AS tableValue1,
           0 AS tableValue3,
           tableValue5
    FROM table3
    WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
)
SELECT hour,
       MAX(tableValue1) AS tableValue1,
       MAX(tableValue3) AS tableValue3,
       MAX(tableValue5) AS tableValue5
FROM cte
GROUP BY hour
ORDER BY hour -- the ORDER BY goes on the final SELECT, after the aggregation
Check the demo here.
You must introduce placeholder (zero) columns in each query:
SELECT hour , tableValue1, 0 tableValue3, 0 tableValue5 FROM table1 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
UNION ALL
SELECT hour , 0, tableValue3, 0 FROM table2 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
UNION ALL
SELECT hour , 0, 0, tableValue5 FROM table3 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
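Note that the plain UNION still returns up to one row per source table for each hour; to get the single row per hour shown in the expected output, the union can be wrapped with the same MAX/GROUP BY idea used in the previous answer:
SELECT hour,
       MAX(tableValue1) AS tableValue1,
       MAX(tableValue3) AS tableValue3,
       MAX(tableValue5) AS tableValue5
FROM (
    SELECT hour, tableValue1, 0 tableValue3, 0 tableValue5 FROM table1 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
    UNION ALL
    SELECT hour, 0, tableValue3, 0 FROM table2 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
    UNION ALL
    SELECT hour, 0, 0, tableValue5 FROM table3 WHERE date = "2020-05-29" AND hour BETWEEN 0 AND 10
) u
GROUP BY hour
ORDER BY hour;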
This is my table; I want to find concurrent users per hour for a given week.
I am trying to calculate the number of concurrent users in a time range. The input looks something like the below:
Table
id user_id login_time
1 23 2016-06-08 09:10:00
2 24 2016-06-08 08:55:00
3 25 2016-06-08 09:29:00
4 26 2016-06-08 09:40:00
5 27 2016-06-08 09:08:00
6 28 2016-06-09 13:40:00
7 31 2016-06-09 14:04:00
How do I get the concurrent users in a time range?
Expected Output Table
Date        Hour  User
2014-08-04  0     3
2014-08-04  1     2
2014-08-04  2     0
2014-08-05  0     1
Similar question: concurrent users sql
I created a DBFIDDLE.
First I entered the data from your question; half-way through I changed the data to what was given here: http://sqlfiddle.com/#!9/67356f/2
cte1 contains the first and last date from users.
cte2 contains all the dates between StartDate and EndDate.
cte3 contains all 24 hours for each of those dates.
After this it is just counting to see if a user is logged in.
WITH RECURSIVE cte1 AS (
    -- first and last date present in the data
    SELECT
        DATE(MIN(login_time)) StartDate,
        DATE(MAX(login_time)) EndDate
    FROM users),
cte2 AS (
    -- every date between StartDate and EndDate
    SELECT cte1.StartDate
    FROM cte1
    UNION ALL
    SELECT DATE_ADD(cte2.StartDate, INTERVAL 1 DAY)
    FROM cte2
    CROSS JOIN cte1 WHERE cte2.StartDate < cte1.EndDate
),
cte3 AS (
    -- hours 0-23 for every date
    SELECT StartDate, 0 AS H
    FROM cte2
    UNION ALL
    SELECT StartDate, H+1 FROM cte3 WHERE H < 23
)
SELECT * FROM (
    SELECT
        StartDate AS `Date`,
        H AS `hour`,
        (SELECT COUNT(*) FROM users
         -- half-open interval so a login at exactly hh:00 is only counted once
         WHERE login_time >= DATE_ADD(StartDate, INTERVAL H HOUR)
           AND login_time <  DATE_ADD(StartDate, INTERVAL (H+1) HOUR)
        ) AS `Count`
    FROM cte3) x
WHERE x.`Count` <> 0
ORDER BY 1, 2;
You can begin with this, but in my opinion the result you are trying to get does not make much sense, because you would need to take the session time into account:
If a user enters at 9:30, leaves at 9:35 and re-enters at 9:45, that is not a concurrent user, but this SQL counts it as one.
If one user enters at 9:59 and another at 10:01, they may be concurrent, but you won't see that with this hour-bucket logic.
Concurrent users across different days (23:59 and 00:01 logins) are missed as well.
In any case, the SQL you are asking for:
SQL Fiddle
SELECT
    up.user_id,
    up.diff AS TimeDiff
FROM
(
    -- pairs of logins by the same user less than an hour apart
    SELECT TIMESTAMPDIFF(HOUR, u1.login_time, u2.login_time) AS diff, u1.user_id
    FROM users u1
    JOIN users u2
      ON u1.user_id = u2.user_id
     AND u1.login_time < u2.login_time ) up
WHERE up.diff < 1
And without DIFF time (as you requested):
SELECT
    g.user_id,
    g.hour,
    g.datelogin,
    COUNT(*) AS times
FROM
    (SELECT HOUR(login_time) AS hour, DATE(login_time) AS datelogin, user_id FROM users) g
GROUP BY datelogin, hour, user_id
HAVING COUNT(*) > 1 -- show only users that logged in more than once in the same hour
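For the per-hour user counts in the expected output (ignoring the empty hours, which the calendar CTEs in the other answer fill in), a simpler sketch using the column names from the question would be:
-- distinct users per date and hour; hours with no logins are simply absent
SELECT DATE(login_time) AS `Date`,
       HOUR(login_time) AS `Hour`,
       COUNT(DISTINCT user_id) AS `User`
FROM users
GROUP BY DATE(login_time), HOUR(login_time)
ORDER BY 1, 2;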
Changing the question because of a misunderstanding in use case.
Amazon Redshift Query for the following problem statement.
The data structure:
id - primary key
acc_id - id unique to a loan account (this id is the same for all EMIs of a particular loan account; it may be repeated 6 or 12 times based on the loan tenure, which can be 6 or 12 months respectively)
status - PAID or UNPAID (unpaid EMIs are followed by unpaid EMIs only)
s_id - just a scheduling id, consecutive numbers for a particular loan id
due_date - the due date for that particular EMI
principal - amount that is due
The table:
id acc_id status s_id due_date principal
9999957 10003 PAID 102 2018-07-02 12:00:00 4205
9999958 10003 UNPAID 103 2018-08-02 12:00:00 4100
9999959 10003 UNPAID 104 2018-09-02 12:00:00 4266
9999960 10003 UNPAID 105 2018-10-02 12:00:00 4286
9999962 10004 PAID 106 2018-07-02 12:00:00 3200
9999963 10004 PAID 107 2018-08-02 12:00:00 3100
9999964 10004 UNPAID 108 2018-09-02 12:00:00 3266
9999965 10004 UNPAID 109 2018-10-02 12:00:00 3286
The use case -
The unpaid amount becomes delinquent (overdue) after the due_date.
So I need to calculate the delinquent amount at the end of every month, from the first due_date (in this case 2nd July) to the last due_date (assume it to be 2nd November, which is the current month).
I also need to calculate the days past due at the end of that month.
Illustration from the above data:
From the sample data provided, no EMI is overdue at the end of July, so the delinquent amount is 0.
But at the end of August the id 9999958 is overdue: as of 31st August the delinquent amount is 4100 and the days past due is 29 (31st August minus 2nd August).
The catch: I need to calculate this for the loan (acc_id) and not the EMI.
To further explain, a first EMI will be 29 days overdue in the first month and 59 days overdue in the second month, while the second EMI will be 29 days overdue in the second month. But I need this at loan level (acc_id).
Continuing the same example for 30th September: acc_id 10003 has been due since 2nd August, so as of 30th September the due amount is 8366 (4100 + 4266) and the DPD (days_past_due) is 59 (29 + 30).
Also acc_id 10004 is due 3100 and its DPD is 28 (30th September minus 2nd September).
The final output would be something like this:
Month_End DPD_Band Amount
2018/08/31 0-29 4100
2018/08/31 30-59 0
2018/08/31 60-89 0
2018/08/31 90+ 0
2018/09/30 0-29 3100
2018/09/30 30-59 8366
2018/09/30 60-89 0
2018/09/30 90+ 0
Query attempt: the DPD bands can be created with CASE statements on the delinquent days. I need help first in generating the end-of-month dates, and then in finding the portfolio-level amounts for the different delinquency bands as explained above.
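For illustration, the band CASE the question refers to might look like the following (days_past_due is assumed to already be computed; deriving it per month-end is the part being asked about):
-- sketch only: bucketing an already-computed days_past_due value into bands
SELECT days_past_due,
       CASE
           WHEN days_past_due BETWEEN 0  AND 29 THEN '0-29'
           WHEN days_past_due BETWEEN 30 AND 59 THEN '30-59'
           WHEN days_past_due BETWEEN 60 AND 89 THEN '60-89'
           ELSE '90+'
       END AS dpd_band
FROM (SELECT 29 AS days_past_due UNION ALL SELECT 59 UNION ALL SELECT 95) sample_values;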
Edited to be Redshift compatible after the OP clarified which RDBMS. (MySQL would need a different answer.)
The following creates one record for each month between your first record, and the end of last month.
It then joins on to your unpaid records, and the aggregation chooses which bracket to put the results in to.
WITH
first_month AS
(
SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
SELECT
LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
FROM
first_month
CROSS JOIN
generate_series(
1,
DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
)
AS s(id)
),
monthly_delinquents AS
(
SELECT
yourTable.*,
months.end_date AS month_end_date,
DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
FROM
months
LEFT JOIN
yourTable
ON yourTable.status = 'UNPAID'
AND yourTable.due_date < months.end_date
)
SELECT
month_end_date,
SUM(CASE WHEN days_past_due >= 00 AND days_past_due < 30 THEN principal ELSE 0 END) AS dpd_00_29,
SUM(CASE WHEN days_past_due >= 30 AND days_past_due < 60 THEN principal ELSE 0 END) AS dpd_30_59,
SUM(CASE WHEN days_past_due >= 60 AND days_past_due < 90 THEN principal ELSE 0 END) AS dpd_60_89,
SUM(CASE WHEN days_past_due >= 90 THEN principal ELSE 0 END) AS dpd_90plus
FROM
monthly_delinquents
GROUP BY
month_end_date
ORDER BY
month_end_date
That said, pivoting things like this is normally a bad idea. What happens when something is a year past due? It just sits in the 90plus category and never moves. And if you want to expand it, you need to change the query and any other query you ever write that depends on it.
Instead, you could normalise your output...
WITH
first_month AS
(
SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
SELECT
LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
FROM
first_month
CROSS JOIN
generate_series(
1,
DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
)
AS s(id)
),
monthly_delinquents AS
(
SELECT
yourTable.*,
months.end_date AS month_end_date,
DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
FROM
months
LEFT JOIN
yourTable
ON yourTable.status = 'UNPAID'
AND yourTable.due_date < months.end_date
)
SELECT
month_end_date,
(days_past_due / 30) * 30 AS days_past_due_band,
SUM(principal) AS total_principal,
COUNT(*) AS total_rows
FROM
monthly_delinquents
GROUP BY
month_end_date,
(days_past_due / 30) * 30
ORDER BY
month_end_date,
(days_past_due / 30) * 30
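If the report still needs labels like the original 0-29/30-59 buckets, the numeric band start from the normalised query can be turned back into text in the final SELECT. A tiny standalone sketch of the expression (Redshift string concatenation, with a sample value standing in for days_past_due_band):
-- sketch: 30 becomes '30-59', 90 becomes '90-119', and so on
SELECT CAST(days_past_due_band AS VARCHAR) || '-' ||
       CAST(days_past_due_band + 29 AS VARCHAR) AS dpd_band_label
FROM (SELECT 30 AS days_past_due_band) example;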
In the data shown below, we need to do a continuous-pattern check on the Leaves column,
for example:
CASE
WHEN count("Leaves") BETWEEN 1 AND 2 THEN '1-2'
WHEN count("Leaves") BETWEEN 3 AND 5 THEN '3-5'
WHEN count("Leaves") >5 THEN '>5'
ELSE 'Above 5' END AS "Leave Occurence",
On Jan 1st and 2nd the employee has taken 2 leaves together, which falls in the '1-2' bucket;
similarly, 8th-11th Jan is continuous for 4 days, so it falls in the '3-5' bucket,
and a continuous run of more than 5 leaves falls in the '>5' bucket.
Now we need the count of each bucket per month.
Here the '1-2' bucket count is 2,
the '3-5' bucket count is 1,
and the '>5' bucket count is also 1.
We used the code above, but it gives the overall SUM and does not check for the continuous pattern. The data:
Year Month Leaves
2011 1-Jan 1
2-Jan 1
3-Jan 0
4-Jan 0
5-Jan 0
6-Jan 0
7-Jan 0
8-Jan 1
9-Jan 1
10-Jan 1
11-Jan 1
12-Jan 0
13-Jan 0
14-Jan 0
15-Jan 1
16-Jan 1
17-Jan 1
18-Jan 1
19-Jan 1
20-Jan 1
21-Jan 0
22-Jan 0
23-Jan 1
24-Jan 1
You can identify each group of leaves by counting the number of non-leaves before it. Then you have aggregation:
select min(date), max(date), count(*) as numdays
from (select t.*,
(select count(*)
from t t2
where t2.date <= t.date and t2.leave = 0
) as grp
from t
where t.leave = 1
) t
group by grp;
You can then format the results however you like. This gives you one row per continuous "leave" period.
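If the goal is the monthly bucket counts from the question, the query above can be wrapped with the question's own CASE. A sketch, assuming (as in the answer) a table t with a real DATE column named date and a 0/1 column named leave:
SELECT DATE_FORMAT(start_date, '%Y-%m') AS `year_month`,
       CASE
           WHEN numdays BETWEEN 1 AND 2 THEN '1-2'
           WHEN numdays BETWEEN 3 AND 5 THEN '3-5'
           ELSE '>5'
       END AS leave_occurrence,
       COUNT(*) AS occurrences
FROM (
    -- one row per continuous leave period, as in the answer above
    select min(date) as start_date, max(date) as end_date, count(*) as numdays
    from (select t.*,
                 (select count(*)
                  from t t2
                  where t2.date <= t.date and t2.leave = 0
                 ) as grp
          from t
          where t.leave = 1
         ) t
    group by grp
) islands
GROUP BY 1, 2;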
With respect to the sample table below, and keeping in mind the following definitions,
start, end and timestamp are all unix timestamps
Definition: duration = ((end - start)/3600), that is, in hours
I would like to get the following mysql query:
Group by student and calculate all money spent by each student - that is, (duration x cost)
This is what I got and it works, but is incomplete!
SELECT student, SUM(ceil(cost*(end-start)/3600)) AS expenses
FROM schedules GROUP BY student;
AND (this one does not work, but the idea is actually what I want to attain)
SELECT student, SUM(SELECT ceil(cost*(end-start)/3600) FROM schedules WHERE paid = 1) AS expenses, SUM(SELECT ceil(cost*(end-start)/3600) FROM schedules WHERE paid = 0) AS debts FROM schedules GROUP BY student;
My BIGGEST problem is calculating expenses from today into the past, as well as debts where today's date is already past start and paid is still set to 0.
Thank you all for your ideas!
Sample Table
id meta_id start end admin student tutor course cost paid paydate timestamp
18 4 1359867600 1359867690 jnc banjune cameron 2 90 1 1361521193 1359881165
19 4 1360472400 1360472490 jnc banjune cameron 2 90 1 1361521195 1359881165
20 4 1359867600 1359867690 jnc saadcore cameron 2 90 1 1361547064 1359881165
25 6 1359914400 1359919800 jnc johndoe cameron 3 35 1 1361547080 1359893058
26 6 1360000800 1360006200 jnc johndoe cameron 3 35 0 0 1359893058
27 6 1360087200 1360092600 jnc johndoe cameron 3 35 0 0 1359893058
I got the desired solution:
SELECT
student,
SUM(CASE WHEN paid = 1 AND FROM_UNIXTIME(start) <= now() THEN ceil(cost*(end-start)/3600)
ELSE 0 END) as expenses,
SUM(CASE WHEN paid = 0 AND FROM_UNIXTIME(start) <= now() THEN ceil(cost*(end-start)/3600)
ELSE 0 END) as debts
FROM schedules
GROUP BY student;
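A small variation on the same logic (just a sketch): since start is already a unix timestamp, the per-row FROM_UNIXTIME conversion can be avoided by comparing against UNIX_TIMESTAMP() directly:
-- equivalent comparison on the raw unix timestamps
SELECT
    student,
    SUM(CASE WHEN paid = 1 AND start <= UNIX_TIMESTAMP() THEN CEIL(cost*(end-start)/3600) ELSE 0 END) AS expenses,
    SUM(CASE WHEN paid = 0 AND start <= UNIX_TIMESTAMP() THEN CEIL(cost*(end-start)/3600) ELSE 0 END) AS debts
FROM schedules
GROUP BY student;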