I have a statistics table for an internet radio station (MySQL) with these columns:
ip_address
time_start (datetime of listening start)
time_end (datetime of listening finish)
I need to select the listener peak for each day, i.e. the maximum number of simultaneous unique IP listeners.
It would also be great to have the start and finish time of that peak.
For example:
2011-01-30 | 4 listeners peak | from 10:30 | till 11:25
IMHO it's simpler to load these 35,000 rows into memory, enumerate them, and maintain a count of the concurrent listeners at each moment.
This becomes simpler if you load the rows in the following format:
IP, time, flag indicating whether this IP starts or stops listening at that time
That way you can load the data ordered by time and simply enumerate the rows while maintaining a list of currently listening IPs.
By the way, how do you handle multiple connections from the same IP? There can be 10 different listeners behind a NAT sharing the same IP address.
Update:
You don't really need to change the DB structure; it's enough to use a different SQL query to load the data (note that the ORDER BY must come after the whole UNION, not inside the first SELECT):
SELECT ip_address, time_start AS MyTime, 1 AS StartStop
FROM MyTable
UNION ALL
SELECT ip_address, time_end AS MyTime, 0 AS StartStop
FROM MyTable
ORDER BY MyTime
With this SQL you can load all the data and then enumerate the rows; it's important that they are sorted correctly by time.
If StartStop = 1, someone started listening: add their IP to the list of listeners and increment the listener count by 1.
If StartStop = 0, someone stopped listening: remove their IP from the list and decrement the listener count by 1.
In the enumeration loop, track the moment the listener count reaches its maximum.
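The enumeration described above can be sketched in Python. This is a minimal sketch with made-up sample rows, not the actual table; the tie-break that processes stops before starts at the same instant is an assumption about how back-to-back sessions should count:

```python
from datetime import datetime

# Each row: (ip_address, time_start, time_end) -- invented sample data.
rows = [
    ("1.1.1.1", datetime(2011, 1, 30, 10, 0), datetime(2011, 1, 30, 11, 25)),
    ("2.2.2.2", datetime(2011, 1, 30, 10, 30), datetime(2011, 1, 30, 12, 0)),
    ("3.3.3.3", datetime(2011, 1, 30, 10, 30), datetime(2011, 1, 30, 11, 0)),
]

# Build (time, delta) events: +1 when a listener starts, -1 when one stops.
events = []
for ip, start, end in rows:
    events.append((start, 1))
    events.append((end, -1))

# Sort by time; at equal times the -1 sorts first, so back-to-back
# sessions are not counted as overlapping.
events.sort(key=lambda e: (e[0], e[1]))

count = peak = 0
peak_time = None
for t, delta in events:
    count += delta
    if count > peak:
        peak = count
        peak_time = t
```

With the sample rows above, the peak is 3 simultaneous listeners, reached at 10:30.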
Let's look for an algorithm that gets the results with the best performance.
Splitting time: time is a continuous dimension, so we need some checkpoints at which to recount listeners. Which moments should we check? I think the best strategy is to use the distinct time_start and time_end values.
This is my approach to splitting time. I create a view to simplify the post:
create view time_split as
select p_time from (
    select time_start as p_time from your_table
    union
    select time_end as p_time from your_table
) as T
I suggest two database indexes:
your_table(time_start, time_end) <--(1) explained below
your_table(time_end)
to avoid table scans.
Count the listeners peak: join the previous view with your table to recount listeners at each time checkpoint.
This is my approach to counting listeners per checkpoint:
create view peak_by_time as
select p_time, count(*) as peak
from your_table t
inner join time_split
    on time_split.p_time between t.time_start and t.time_end
group by p_time
order by p_time, peak
Remember to create the database index on your_table(time_start, time_end) <--(1) here.
Looking for the max peak: unfortunately MySQL has no analytic functions, so OVER (PARTITION BY ...) is not available and there is no direct way to take the max peak per day from the previous view. You have to aggregate the previous view again, which is a performance killer. I suggest doing this operation, and the next one, in application logic rather than in the database.
This is my approach to getting the max peak by day (performance killer):
create view max_peak_by_day as
select
    cast(p_time as date) as p_day,
    max(peak) as max_peak
from peak_by_time
group by cast(p_time as date)
Looking for slot times: at this point you have the max_peak for each day; now you need to find consecutive checkpoints with the same max_peak. MySQL also lacks window functions and CTEs, so I suggest writing this part in the app layer too. But if you want to do it entirely in the database, this is a way (warning: performance killer):
First, extend the checkpoint list so that each p_time knows the previous p_time, then attach the peak for both:
create view time_split_extended as
select c.p_time, max( p.p_time) as previous_ptime
from
time_split c
inner join
time_split p
on p.p_time < c.p_time
group by c.p_time
create view peak_by_time_and_previous as
select
te.p_time,
te.previous_ptime,
pc.peak as peak,
pp.peak as previous_peak
from
time_split_extended te
inner join
peak_by_time pc on te.p_time = pc.p_time
inner join
peak_by_time pp on te.previous_ptime = pp.p_time
Now look for runs where the previous checkpoint and the current one both sit at the day's max peak:
select
    cast(p_time as date) as p_day,
    min(p_time) as slot_from,
    max(p_time) as slot_to,
    peak
from
    peak_by_time_and_previous p
    inner join max_peak_by_day m
        on cast(p.p_time as date) = m.p_day
        and p.peak = m.max_peak
where
    p.peak = p.previous_peak
group by cast(p_time as date), peak
Disclaimer:
This is not tested; there are surely mistakes in table aliases or columns.
The last steps are performance killers. Perhaps someone can suggest a better approach for them.
Also, I suggest creating temporary tables to materialize each view in this answer. That will improve performance and also let you see how long each step takes.
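As a sanity check, the checkpoint pipeline can be materialized end to end with SQLite from Python. This is a sketch with invented sample data; SQLite's date() stands in for MySQL's CAST(... AS DATE), and timestamps are plain strings:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE your_table (ip_address TEXT, time_start TEXT, time_end TEXT);
INSERT INTO your_table VALUES
    ('1.1.1.1', '2011-01-30 10:00', '2011-01-30 11:25'),
    ('2.2.2.2', '2011-01-30 10:30', '2011-01-30 12:00'),
    ('3.3.3.3', '2011-01-30 10:30', '2011-01-30 11:00');
""")

# Checkpoints: every distinct start or end time.
con.execute("""
CREATE VIEW time_split AS
    SELECT time_start AS p_time FROM your_table
    UNION
    SELECT time_end FROM your_table
""")

# Listener count at each checkpoint.
con.execute("""
CREATE VIEW peak_by_time AS
    SELECT p_time, COUNT(*) AS peak
    FROM your_table t
    JOIN time_split ON time_split.p_time BETWEEN t.time_start AND t.time_end
    GROUP BY p_time
""")

# Max peak per day.
rows = con.execute("""
    SELECT date(p_time) AS p_day, MAX(peak) AS max_peak
    FROM peak_by_time
    GROUP BY date(p_time)
""").fetchall()
```

Materializing each view as a temporary table, as suggested above, would follow the same structure with CREATE TEMPORARY TABLE ... AS SELECT in place of CREATE VIEW.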
This is essentially an implementation of the answer given by Max above. For simplicity I'll represent each listening episode as a start time and length stored as integers (they could be changed to actual datetimes, in which case the queries would need date arithmetic).
> select * from episodes;
+--------+------+
| start | len |
+--------+------+
| 50621 | 480 |
| 24145 | 546 |
| 93943 | 361 |
| 67668 | 622 |
| 64681 | 328 |
| 110786 | 411 |
...
The following query combines the start and end times with a UNION, flagging end times to distinguish from start times, and keeping a running accumulator of the number of listeners:
SET @idx = 0;
SET @n = 0;
SELECT (@idx := @idx + 1) AS idx,
       t,
       (@n := @n + delta) AS n
FROM
  (SELECT start AS t,
          1 AS delta
   FROM episodes
   UNION ALL
   SELECT start + len AS t,
          -1 AS delta
   FROM episodes
   ORDER BY t) stage;
+------+--------+------+
| idx | t | n |
+------+--------+------+
| 1 | 8 | 1 |
| 2 | 106 | 2 |
| 3 | 203 | 3 |
| 4 | 274 | 2 |
| 5 | 533 | 3 |
| 6 | 586 | 2 |
...
where t is the start of each interval (it's a new "interval" whenever the number of listeners, n, changes). In a version where t is an actual datetime, you could easily group by day to obtain a peak episode for each day, or other such summaries. To get the end time of each interval, take the table above and join it to itself on right.idx = left.idx + 1 (i.e. join each row with the succeeding one).
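In application code, that join-with-successor step amounts to zipping the running-count rows with themselves shifted by one. A sketch using the sample numbers from the output above:

```python
# Running-count rows as produced by the query: (idx, t, n).
stage = [(1, 8, 1), (2, 106, 2), (3, 203, 3), (4, 274, 2), (5, 533, 3), (6, 586, 2)]

# Pair each row with the succeeding one: interval [t, next_t) has n listeners.
intervals = [(t, next_t, n)
             for (_, t, n), (_, next_t, _) in zip(stage, stage[1:])]

# The peak interval is the one with the largest n (ties go to the earliest).
peak = max(intervals, key=lambda iv: iv[2])
```

With these six rows, the peak interval is (203, 274) with 3 listeners; grouping the intervals by day before taking the max would give the per-day peaks.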
SELECT
    COUNT(*) AS listeners,
    current.time_start AS peak_start,
    MIN(overlap.time_end) AS peak_end
FROM
    yourTable AS current
INNER JOIN
    yourTable AS overlap
    ON overlap.time_start <= current.time_start
    AND overlap.time_end > current.time_start
GROUP BY
    current.time_start,
    current.time_end
HAVING
    MIN(overlap.time_end) < COALESCE((SELECT MIN(time_start) FROM yourTable WHERE time_start > current.time_start), current.time_end + 1)
For each record, join on everything that overlaps it.
The MIN() of the overlapping records' time_end is when the first current listener stops listening.
If that time is earlier than the next occurrence of a time_start, it's a peak (a peak = a start immediately followed by a stop).
Related
I am trying to create a candle (OHLC) chart using the table below. It has a score and a month, and there can be as many as 4 scores in a month.
id | score | month
1 | 10 | 12
.. | .. | ..
And here is what I actually did,
select
score as open,
max(score) as high,
min(score) as low
from score_table
group by month
I succeeded in getting the open, high and low.
My problem is getting the close, which is basically the fourth score of a month. I tried some solutions using joins, but unfortunately I couldn't get it right, which left me quite confused. I'm not good at SQL and need help...
When you group by month, the records just give you a high and a low with the same values. What I changed is to get the high and low separately from the open.
There should be separate columns for high, low and open, with a row for each time period (if you are only working on one candle it's fine, but for many candles over a time period there should be a row per period).
That data is quite hard to work with the way the table is laid out; restructuring it as
id | Month | Open | High | Low |
would be more ideal. Nonetheless, I changed the MySQL query a bit to reflect the data as per your description. I achieved it by combining two queries to get the open from row 3:
select x.open, y.high, y.low
from
    (select score as open
     from score
     where id = 3) as x,
    (select max(score) as high,
            min(score) as low
     from score) as y;
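If the close really is just the last score of the month in id order, the whole candle can also be assembled in application code. A sketch assuming ids increase in insertion order (the sample scores are invented):

```python
from collections import defaultdict

# (id, score, month) rows -- invented sample data, four scores in month 12.
rows = [(1, 10, 12), (2, 14, 12), (3, 8, 12), (4, 12, 12)]

by_month = defaultdict(list)
for _id, score, month in sorted(rows):  # sort by id = insertion order
    by_month[month].append(score)

# open = first score, close = last score, high/low = extremes.
candles = {
    m: {"open": s[0], "high": max(s), "low": min(s), "close": s[-1]}
    for m, s in by_month.items()
}
```

For month 12 this yields open 10, high 14, low 8, close 12.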
I have very limited experience with MySQL past standard queries, but when it comes to joins and relations between multiple tables I have a bit of an issue.
I've been tasked with creating a job that will pull a few values from a MySQL database every 15 minutes, but the info it needs to display is pulled from multiple tables.
I have worked with it for a while to figure out the relationships in the phone system, and I have worked out what I need to pull; now I'm trying to find the right way to build the job that does the joins.
I'm thinking of creating a new table for the info I need, with columns named as:
Extension | Total Talk Time | Total Calls | Outbound Calls | Inbound Calls | Missed Calls
I know that I need to start with the extension ID from my 'user' table and match it with 'extensionID' in my 'callSession'. There may be multiple instances of each extensionID but each instance creates a new 'UniqueCallID'.
The 'UniqueCallID' field then matches to 'UniqueCallID' in my 'CallSum' table. At that point, I just need to be able to say "For each 'uniqueCallID' that is associated with the same 'extensionID', get the sum of all instances in each column or a count of those instances".
Here is an example of what I need it to do:
callSession Table
UniqueCallID | extensionID |
----------------------------
A 123
B 123
C 123
callSum table
UniqueCallID | Duration | Answered |
------------------------------------
A 10 1
B 5 1
C 15 0
newReport table
Extension | Total Talk Time | Total Calls | Missed Calls
--------------------------------------------------------
123 30 3 1
Hopefully that conveys my idea properly.
If I create a table to hold these values, I need to know how I would select, join and insert those things based on that diagram but I'm unable to construct the right query/statement.
You simply JOIN the two tables and GROUP BY the extensionID, then add aggregate expressions to summarize and gather the info.
SELECT
`extensionID` AS `Extension`,
SUM(`Duration`) AS `Total Talk Time`,
COUNT(DISTINCT `UniqueCallID`) as `Total Calls`,
SUM(IF(`Answered` = 1,0,1)) AS `Missed Calls`
FROM `callSession` a
JOIN `callSum` b
ON a.`UniqueCallID` = b.`UniqueCallID`
GROUP BY a.`extensionID`
ORDER BY a.`extensionID`
You can use a join and group by
select
a.extensionID
, sum(b.Duration) as Total_Talk_Time
, count(b.Answered) as Total_Calls
, count(b.Answered) -sum(b.Answered) as Missed_calls
from callSession as a
inner join callSum as b on a.UniqueCallID = b.UniqueCallID
group by a.extensionID
This should do the trick. What you are being asked to do is to aggregate the number of and duration of calls. Unless explicitly requested, you do not need to create a new table to do this. The right combination of JOINs and AGGREGATEs will get the information you need. This should be pretty straightforward... the only semi-interesting part is calculating the number of missed calls, which is accomplished here using a "CASE" statement as a conditional check on whether each call was answered or not.
Pardon my syntax... My experience is with SQL Server.
SELECT CS.extensionID AS Extension,
       SUM(CA.Duration) AS [Total Talk Time],
       COUNT(CS.UniqueCallID) AS [Total Calls],
       SUM(CASE WHEN CA.Answered = 0 THEN 1 ELSE 0 END) AS [Missed Calls]
FROM callSession CS
INNER JOIN callSum CA ON CA.UniqueCallID = CS.UniqueCallID
GROUP BY CS.extensionID
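The aggregation can be sanity-checked against the sample tables with SQLite from Python. A sketch reproducing the sample newReport row; the missed-call count uses a CASE on Answered:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE callSession (UniqueCallID TEXT, extensionID INTEGER);
CREATE TABLE callSum (UniqueCallID TEXT, Duration INTEGER, Answered INTEGER);
INSERT INTO callSession VALUES ('A', 123), ('B', 123), ('C', 123);
INSERT INTO callSum VALUES ('A', 10, 1), ('B', 5, 1), ('C', 15, 0);
""")

# Join the two tables, group by extension, and aggregate.
rows = con.execute("""
    SELECT s.extensionID,
           SUM(c.Duration) AS total_talk_time,
           COUNT(*)        AS total_calls,
           SUM(CASE WHEN c.Answered = 0 THEN 1 ELSE 0 END) AS missed_calls
    FROM callSession s
    JOIN callSum c ON c.UniqueCallID = s.UniqueCallID
    GROUP BY s.extensionID
""").fetchall()
```

This returns exactly the sample newReport row: extension 123 with 30 talk time, 3 total calls, 1 missed.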
In Access I have an MachinesList and ActionsList. A Machine can be set to Active and set to Inactive several times per year. Each change of status has its own ActionID and ActionDate.
In VBA I added some code to get the first and last date the Machine is Active. As this can happen more than once, I can now create a list with start and end dates for each period the Machine is Active.
Two questions:
1) Can this be done with a query instead of VBA?
2) Is it possible to display these dates in some sort of timeline in Access?
This is what I have to create my list of dates:
SELECT DISTINCT Requests.RequestNumber, Requests.MachineID, Actions.Assignee, Actions.Action, Actions.TRDate
FROM SelectedIDs
LEFT JOIN (Requests LEFT JOIN Actions ON Requests.RequestNumber = Actions.RequestNumber)
    ON SelectedIDs.MachineID = Requests.MachineID
ORDER BY Requests.MachineID, Actions.TRDate;
I do need the RequestNumber and the Assignee (in the case of Activation) for further use. And since the RequestNumbers for Activation and Deactivation differ, I cannot use the MIN(date) and MAX(date) approach because of the GROUP BY clause.
The list produced in VBA looks somewhat like this:
2325 ID1234 29-11-2016 16-3-2017
2323 ID1234 28-3-2017 27-4-2017
2203 ID9999 25-1-2017 27-2-2017
This list I want to see in some sort of timeline in Access.
Something like this:
ID | wk01 | wk02 | wk03 | etc
88 | N | N | Y | Y | Y | N
99 | N | Y | Y | N | N | Y
But any timeline is fine. Suggestions anyone?
Thanks, Karin
It is possible to do, yes. Your resulting recordset will not be tabular; it will be more like { Week, MachineID, Active }, and it will be up to you to format it for reporting purposes. It may take a few subqueries: an obvious one is the max activation date before the week's end, and the min deactivation date after that activation date. If that deactivation date is before the week's start, the machine was not active. The actual SQL depends on how your Week table is defined. If you don't have a week table, things get a bit more complicated, as you also need a subquery that replaces it.
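The week-overlap test described above can be sketched in application code (Python here, since the exact SQL depends on your Week table). The Monday-start 7-day week and the sample periods from the VBA list are assumptions:

```python
from datetime import date, timedelta

# Active periods per machine, as produced by the VBA list: (machine, start, end).
periods = [
    ("ID1234", date(2016, 11, 29), date(2017, 3, 16)),
    ("ID1234", date(2017, 3, 28), date(2017, 4, 27)),
]

def active_in_week(machine, week_start):
    """A machine is active in a week if any of its periods overlaps
    [week_start, week_start + 6 days]."""
    week_end = week_start + timedelta(days=6)
    return any(m == machine and s <= week_end and e >= week_start
               for m, s, e in periods)
```

Evaluating this for each machine and each week of interest gives the Y/N grid shown above, which can then be laid out in a crosstab or report.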
Okay, I'm still fairly new to MS Access, but I have some of the basics down. My next issue is pulling data from two different queries while still showing all rows.
Here's what I have
I have one query with the following information
| ID Number | Points |
The other query has the following
| ID Number | Points over 1000 |
In this new query I need to display the following:
| ID Number | Points | Points over 1000 | Total Points |
There are going to be some rows where Points over 1000 doesn't exist and needs to be empty or 0, but I need the ID Number in Points over 1000 to match against the ID Number in the Points query,
and in the end add the two up in the Total Points column.
I hope that makes sense?
Thanks again
In theory this Query should work the way you want it to.
SELECT
tmpQ.ID,
Sum(tmpQ.Points) As ActualPoints,
Sum(tmpQ.PointsOver1000) As Over1000,
[ActualPoints] + [Over1000] As TotalPoints
FROM
(
SELECT
qryA.[ID Number] As ID,
Sum(qryA.Points) As Points,
Sum(0) As PointsOver1000
FROM
qryA
GROUP BY
qryA.[ID Number]
UNION ALL
SELECT
qryB.[ID Number] As ID,
Sum(0) As Points,
Sum(qryB.PointsOver1000) As PointsOver1000
FROM
qryB
GROUP BY
qryB.[ID Number]
) As tmpQ
GROUP BY
tmpQ.ID;
Where qryA and qryB are the two queries you have that will give you the result of two different Points.
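The union-then-aggregate pattern can be checked quickly in SQLite from Python. A sketch; the table and column names are simplified stand-ins for qryA and qryB:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE qryA (id INTEGER, points INTEGER);
CREATE TABLE qryB (id INTEGER, points_over_1000 INTEGER);
INSERT INTO qryA VALUES (1, 500), (2, 300);
INSERT INTO qryB VALUES (1, 1200);  -- id 2 has no 'over 1000' row
""")

# Stack both sources with a zero column for the side each lacks, then sum.
rows = con.execute("""
    SELECT id,
           SUM(points)           AS actual_points,
           SUM(points_over_1000) AS over_1000,
           SUM(points) + SUM(points_over_1000) AS total_points
    FROM (
        SELECT id, points, 0 AS points_over_1000 FROM qryA
        UNION ALL
        SELECT id, 0, points_over_1000 FROM qryB
    ) AS tmpQ
    GROUP BY id
    ORDER BY id
""").fetchall()
```

Note that id 2, which has no Points-over-1000 row, still appears with 0 in that column, which is exactly the behavior the question asks for.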
I have a SELECT query that returns some fields like this:
Date | Campaign_Name | Type | Count_People
Oct | Cats | 1 | 500
Oct | Cats | 2 | 50
Oct | Dogs | 1 | 80
Oct | Dogs | 2 | 50
The query uses aggregation, and I only want to include results where, for Type = 1, the corresponding Count_People is greater than 99.
Using the example table, I'd like only the two Cats rows returned: since the Dogs type 1 row is below 100, both Dogs rows should be excluded.
Put another way, if the type = 1 count is less than 100, remove all records of the corresponding campaign name.
I started out trying this:
HAVING CASE WHEN type = 1 THEN COUNT(DISTINCT Count_People) > 99 END
I used Teradata earlier in the year and remember working on a query that used the analytic QUALIFY ... PARTITION BY construct. I suspect something along those lines is what I need? I need to base the exclusion on an aggregation computed before the outer query runs?
How would I do this in MySQL? Am I making sense?
Now that I understand the question, I think your best bet will be a subquery to determine which date/campaign combinations of a type=1 have a count_people greater than 99.
SELECT
    <table>.date,
    <table>.campaign_name,
    <table>.type,
    count(distinct count_people) as count_people
FROM
    (
    SELECT
        date,
        campaign_name
    FROM
        <table>
    WHERE type = 1
    GROUP BY 1, 2
    HAVING count(distinct count_people) > 99
    ) type1
LEFT OUTER JOIN <table> ON
    type1.campaign_name = <table>.campaign_name AND
    type1.date = <table>.date
WHERE <table>.type IN (1, 2)
GROUP BY 1, 2, 3
The subquery only returns campaign/date combinations where type = 1 AND there are more than 99 count_people. Joining back to the table on those combinations ensures that only they make it into the result set.
The WHERE on the main query keeps the results to types 1 and 2 only, which you stated was already a filter in place (not mentioned in the question, but stated in a comment to a previous answer).
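The subquery-then-join idea can be verified against the sample data with SQLite from Python. A sketch; since the sample has a single row per campaign/type, MAX(count_people) stands in for the COUNT(DISTINCT ...) condition, and the table name is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE campaigns (date TEXT, campaign_name TEXT, type INTEGER, count_people INTEGER);
INSERT INTO campaigns VALUES
    ('Oct', 'Cats', 1, 500),
    ('Oct', 'Cats', 2, 50),
    ('Oct', 'Dogs', 1, 80),
    ('Oct', 'Dogs', 2, 50);
""")

# Keep only campaigns whose type-1 row has count_people > 99,
# then pull back every row for those campaigns.
rows = con.execute("""
    SELECT c.date, c.campaign_name, c.type, c.count_people
    FROM campaigns c
    JOIN (
        SELECT date, campaign_name
        FROM campaigns
        WHERE type = 1
        GROUP BY date, campaign_name
        HAVING MAX(count_people) > 99
    ) ok ON ok.date = c.date AND ok.campaign_name = c.campaign_name
    ORDER BY c.campaign_name, c.type
""").fetchall()
```

Only the two Cats rows survive; both Dogs rows are dropped because the Dogs type-1 count is below 100, matching the expected result in the question.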
Based on your comments on the answer by @JNevill, I think you will have no option but to use subselects to pre-filter the record set you are dealing with, as working with HAVING limits you to the group currently being evaluated; there is no way to compare against previous or subsequent records in the set in this manner.
So have a look at something like this:
SELECT
full_data.date AS date,
full_data.campaign_name AS campaign_name,
full_data.type AS type,
COUNT(full_data.people) AS people_count
FROM
(
SELECT
date,
campaign_name,
type,
COUNT(people) AS people_count
FROM table
WHERE type IN (1,2)
GROUP BY date, campaign_name, type
) AS full_data
LEFT JOIN
(
SELECT
date,
campaign_name,
COUNT(people) AS people_count
FROM table
WHERE type = 1
GROUP BY date, campaign_name
HAVING people_count < 100
) AS filter
ON
full_data.date = filter.date
AND full_data.campaign_name = filter.campaign_name
WHERE
filter.date IS NULL
AND filter.campaign_name IS NULL
The first subselect is basically your current query without any attempt at using HAVING to filter the results. The second subselect finds all date/campaign name combos whose type = 1 people_count is below 100; the LEFT JOIN plus the IS NULL check then excludes those combos from the full data set.