Count timestamps with offset between two databases in a view - mysql

I am trying to count timestamps across two databases, but their timestamp ranges overlap because of a design flaw that is not mine.
SELECT date(time + INTERVAL 8 HOUR) as day, COUNT(DISTINCT comment)
FROM news.data
GROUP BY day
UNION ALL
SELECT date(time + INTERVAL 8 HOUR) as day, COUNT(DISTINCT comment)
FROM`news-backup`.`data`
GROUP BY day
ORDER BY year(day) desc, day(day) DESC
LIMIT 20
What seems to happen is that some timestamps fall within the range of both databases, so certain dates get two separate counts. For example, TODAY gets one count from news and another from news-backup.
EX:
date count
2013-1-15 10
2013-1-15 13
2013-1-14 8
2013-1-13 15
What I want is
EX:
date count
2013-1-15 23
2013-1-14 8
2013-1-13 15
Here is the kicker: I need it in a view, so there are some limitations (no subqueries allowed). Thoughts? And no, I cannot change the data dump sequence that happens between the two DBs.

MySQL (before 5.7.7) does not allow a subquery in the FROM clause of a view, but you can select from a view inside another view.
So:
create view view1 as
SELECT date(time + INTERVAL 8 HOUR) as day, 'current' as which, COUNT(DISTINCT comment) as cnt
FROM news.data
GROUP BY day
UNION ALL
SELECT date(time + INTERVAL 8 HOUR) as day, 'backup' as which, COUNT(DISTINCT comment) as cnt
FROM`news-backup`.`data`
GROUP BY day, which
I'm not sure what your logic for combining them is:
create view view2 as
select day, max(cnt) -- sum(cnt)? prefer current or backup?
from view1
group by day
ORDER BY day desc
The documentation that bans subqueries is here. Be sure to search for "The SELECT statement cannot contain".
If you have a table of all the dates, the following "absurd" SQL might work:
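select c.date,
-- count() returns 0 rather than NULL, so nullif() maps 0 to NULL and lets coalesce() fall through to the backup
coalesce( nullif((select count(distinct comment) from news.data where date(time + INTERVAL 8 HOUR) = c.date), 0),
(select count(distinct comment) from `news-backup`.`data` where date(time + INTERVAL 8 HOUR) = c.date)
) as NumComments
from calendar c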
This version is assuming you want the "new" first, then the backup. If you want the sum, then you would add them.
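The view-on-a-view idea can be tried out in a small sandbox. The following is a minimal sketch, assuming SQLite (through Python's sqlite3) as a stand-in for MySQL; the table names news_data and backup_data, the view names per_source and per_day, and the sample rows are all hypothetical, and date(time, '+8 hours') is SQLite's spelling of date(time + INTERVAL 8 HOUR).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news_data   (time TEXT, comment TEXT);  -- stands in for news.data
CREATE TABLE backup_data (time TEXT, comment TEXT);  -- stands in for `news-backup`.`data`
INSERT INTO news_data VALUES
  ('2013-01-15 01:00:00', 'a'), ('2013-01-15 02:00:00', 'b');
INSERT INTO backup_data VALUES
  ('2013-01-15 03:00:00', 'c'), ('2013-01-14 05:00:00', 'd');

-- First view: one row per (day, source), no subquery needed.
CREATE VIEW per_source AS
  SELECT date(time, '+8 hours') AS day, 'current' AS which,
         COUNT(DISTINCT comment) AS cnt
  FROM news_data GROUP BY day
  UNION ALL
  SELECT date(time, '+8 hours') AS day, 'backup' AS which,
         COUNT(DISTINCT comment) AS cnt
  FROM backup_data GROUP BY day;

-- Second view: collapse the per-source rows into one row per day.
CREATE VIEW per_day AS
  SELECT day, SUM(cnt) AS cnt FROM per_source GROUP BY day;
""")

daily_totals = conn.execute(
    "SELECT day, cnt FROM per_day ORDER BY day DESC"
).fetchall()
print(daily_totals)
```

SUM adds the two sources together, which matches the desired output in the question; a comment stored in both databases on the same day would be counted twice, so MAX may be the better choice if the backup duplicates current rows.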

Related

calculating variance with mysql week over week

I need to display week over week difference with mysql in Week Over Week Users column. My data looks like the following:
Date        Users  Week Over Week Users
06-01-2019  10     10
06-08-2019  15     15
06-15-2019  5      5
Currently, Week Over Week Users only reflects the data that I have in Users column. The desired output would be:
Date        Users  Week Over Week Users
06-01-2019  10     10
06-08-2019  15     5
06-15-2019  5      -10
Basically if on the second week the number of users grew up to 15 users, then I need to display 5 (as in +5 users since last week, so new week Users - last week Users would be the formula)
Here is my code:
(
SUM(
CASE
WHEN WEEK(`Date`) = WEEK(CURRENT_DATE()) THEN `Users`
ELSE 0
END
) - SUM(
CASE
WHEN WEEK(`Date`) = WEEK(CURRENT_DATE()) - 1 THEN `Users`
ELSE 0
END
)
)
But it doesn't work as it duplicates the Users column.
You want lag():
select t.*,
(users - lag(users, 1, 0) over (order by date)) as week_over_week
from t;
If you are running MySQL 5.x, where window functions such as lag() are not available, you can use a correlated subquery to get the "previous" value:
select
t.date,
t.users,
t.users - coalesce(
(
select t1.users
from mytable t1
where t1.date < t.date
order by t1.date desc
limit 1
),
0
) week_over_week_users
from mytable t
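To check the correlated-subquery version against the sample data from the question, here is a minimal sketch that assumes SQLite (through Python's sqlite3) in place of MySQL. The dates are stored in ISO format (YYYY-MM-DD), which is an assumption made so that comparing and ordering them as text gives chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (date TEXT, users INTEGER);
INSERT INTO mytable VALUES
  ('2019-06-01', 10), ('2019-06-08', 15), ('2019-06-15', 5);
""")

week_over_week = conn.execute("""
SELECT t.date,
       t.users,
       -- previous row = latest row with an earlier date; 0 when there is none
       t.users - COALESCE(
           (SELECT t1.users
            FROM mytable t1
            WHERE t1.date < t.date
            ORDER BY t1.date DESC
            LIMIT 1),
           0) AS week_over_week_users
FROM mytable t
ORDER BY t.date
""").fetchall()
print(week_over_week)
```

The first week has no predecessor, so the COALESCE default of 0 makes its difference equal to its own user count, as in the desired output.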

Aggregating table data in MySQL, is there an easier way to do this?

I'm trying to write a query that aggregates data from a table.
Essentially I have a long list of devices that have been inventoried and eventually installed over the last couple of years.
I want to find the average amount of time between when the device was received and when it was installed, and then have that data sorted by the month the device was installed. BUT in each month's row, I also want to include the data from the previous months.
So essentially what I want to see is: (sorry for terrible formatting)
MonthInstalled | TimeToInstall | Total#Devices
-----------------+---------------+----------------------------
Jan | 10 Days | 5
Feb(=Jan+Feb) | 15 Days | 18 (5 in Jan + 13 in Feb)
Mar(=Jan+Feb+Mar)| 13 Days | 25 (5 + 13 + 7)
...
The query I currently have written looks like this:
INSERT INTO DevicesInstall
SELECT ROUND(AVG(DATEDIFF(dvc.dt_install , dvc.dt_receive)), 1) AS 'Install',
COUNT(dvc.dvc_model) AS 'Total Devices',
MAX(dvc.dt_install) AS 'Date',
loc.loc_campus AS 'Campus'
FROM dvc_info dvc, location loc
WHERE dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < '20160201'
;
Although this works, I have to repeat it manually for each month, so it is not scalable. Is there a way to condense this at all?
We can return the dates using an inline view (derived table), and then join to the dvc_info table, so we can get the "cumulative" results.
To get the results for:
Jan
Jan+Feb
Jan+Feb+Mar
We need to return three copies of the rows for Jan, and two copies of the rows for Feb, and then collapse those rows into the appropriate groups.
The loc_campus is being included in the SELECT list... not clear why that is needed. If we want results "by campus", then we need to include that expression in the GROUP BY clause. Otherwise, the value returned for that non-aggregate is indeterminate... we will get a value for some row "in the group", but it could be any row.
Something like this:
SELECT d.dt AS `before_date`
, loc.loc_campus AS `Campus`
, ROUND(AVG(DATEDIFF(dvc.dt_install,dvc.dt_receive)),1) AS `Install`
, COUNT(dvc.dvc_model) AS `Total Devices`
, MAX(dvc.dt_install) AS `latest_dt_install`
FROM ( SELECT '2016-01-01' + INTERVAL 1 MONTH AS dt
UNION ALL SELECT '2016-01-01' + INTERVAL 2 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 3 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 4 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 5 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 6 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 7 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 8 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 9 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 10 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 11 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 12 MONTH
) d
CROSS
JOIN location loc
LEFT
JOIN dvc_info dvc
ON dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < d.dt
GROUP
BY d.dt
, loc.loc_campus
ORDER
BY d.dt
, loc.loc_campus
Note that the value returned for d.dt will be the "up until" date. We're going to get '2016-02-01' returned for the January results. If we want to return a January value instead, we can use an expression in the SELECT list...
SELECT DATE_FORMAT(d.dt + INTERVAL -1 MONTH,'%Y-%m') AS `month`
Lots of options on query alternatives.
But it looks like the "big hump" is that to get cumulative results, we need to return multiple copies of the dvc_info rows, so the rows can be collapsed into each "grouping".
I recommend working on just the SELECT first. And get that tested working, before monkeying around to turn it into an INSERT ... SELECT.
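The "multiple copies" idea can be seen on a tiny data set. This is a minimal sketch, assuming SQLite (through Python's sqlite3) rather than MySQL: julianday() differences stand in for DATEDIFF(), the location join is left out for brevity, and the three device rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dvc_info (dvc_model TEXT, dt_receive TEXT, dt_install TEXT);
INSERT INTO dvc_info VALUES
  ('m1', '2016-01-01', '2016-01-11'),   -- 10 days, installed in Jan
  ('m2', '2016-02-01', '2016-02-21'),   -- 20 days, installed in Feb
  ('m3', '2016-03-01', '2016-03-31');   -- 30 days, installed in Mar
""")

cumulative = conn.execute("""
SELECT d.dt AS before_date,
       COUNT(dvc.dvc_model) AS total_devices,
       ROUND(AVG(julianday(dvc.dt_install) - julianday(dvc.dt_receive)), 1)
           AS install
FROM (SELECT '2016-02-01' AS dt          -- "up until" dates, one per month
      UNION ALL SELECT '2016-03-01'
      UNION ALL SELECT '2016-04-01') d
LEFT JOIN dvc_info dvc
       ON dvc.dt_install < d.dt          -- each device joins every later cutoff
GROUP BY d.dt
ORDER BY d.dt
""").fetchall()
print(cumulative)
```

The January device is joined to all three cutoff dates and the February device to two of them, which is what makes each row cumulative.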
FOLLOWUP
We can use any query as an inline view (derived table d) that returns a set of dates we want.
e.g.
FROM ( SELECT DATE_FORMAT(m.dt_install,'%Y-%m-01') + INTERVAL 1 MONTH AS dt
FROM dvc_info m
WHERE m.dt_install >= '2016-01-01'
GROUP BY DATE_FORMAT(m.dt_install,'%Y-%m-01') + INTERVAL 1 MONTH
) d
Note that with this approach, if there is no dt_install in February, we won't get back a row for February. Using the static UNION ALL SELECT approach allows us to get back "zero" counts, i.e. to return rows for months where there isn't a dt_install in that month. (But that's the answer to a different question... how do I get back a "zero" count for February when there aren't any rows for February?)
Alternatively, if we have a calendar table e.g. cal that contains a list of the dates we want, we could just reference the table in place of the inline view, or the inline view query could get rows from that.
FROM ( SELECT cal.dt
FROM cal cal
WHERE cal.dt >= '2016-01-01'
AND cal.dt <= NOW()
AND DATE_FORMAT(cal.dt,'%d') = '01'
) d

Find number of "active" rows each month for multiple months in one query

I have a MySQL database with each row containing an activate and a deactivate date. This refers to the period of time when the object the row represents was active.
activate deactivate id
2015-03-01 2015-05-10 1
2013-02-04 2014-08-23 2
I want to find the number of rows that were active at any time during each month. Ex.
Jan: 4
Feb: 2
Mar: 1
etc...
I figured out how to do this for a single month, but I'm struggling with how to do it for all 12 months in a year in a single query. The reason I would like it in a single query is for performance, as information is used immediately and caching wouldn't make sense in this scenario. Here's the code I have for a month at a time. It checks if the activate date comes before the end of the month in question and that the deactivate date was not before the beginning of the period in question.
SELECT * from tblName WHERE activate <= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND deactivate >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
If anybody has any idea how to change this and do grouping such that I can do this for an indefinite number of months I'd appreciate it. I'm at a loss as to how to group.
If you have a table of months that you care about, you can do:
select m.*,
(select count(*)
from tblName t
where t.activate <= m.month_end and
t.deactivate >= m.month_start
) as Actives
from months m;
If you don't have such a table handy, you can create one on the fly:
select m.*,
(select count(*)
from tblName t
where t.activate <= m.month_end and
t.deactivate >= m.month_start
) as Actives
from (select date('2015-01-01') as month_start, date('2015-01-31') as month_end union all
select date('2015-02-01') as month_start, date('2015-02-28') as month_end union all
select date('2015-03-01') as month_start, date('2015-03-31') as month_end union all
select date('2015-04-01') as month_start, date('2015-04-30') as month_end
) m;
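The overlap test (activated on or before the month's end, deactivated on or after its start) can be verified on a few rows. This is a minimal sketch, assuming SQLite (through Python's sqlite3) in place of MySQL and ISO-formatted date strings; rows 3 and 4 are invented to make the counts differ between months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblName (id INTEGER, activate TEXT, deactivate TEXT);
INSERT INTO tblName VALUES
  (1, '2015-03-01', '2015-05-10'),
  (2, '2013-02-04', '2014-08-23'),
  (3, '2015-01-15', '2015-02-02'),
  (4, '2015-02-10', '2015-03-05');
""")

actives = conn.execute("""
SELECT m.month_start,
       (SELECT COUNT(*)
        FROM tblName t
        WHERE t.activate   <= m.month_end      -- started before the month ended
          AND t.deactivate >= m.month_start    -- and ended after it began
       ) AS actives
FROM (SELECT '2015-01-01' AS month_start, '2015-01-31' AS month_end UNION ALL
      SELECT '2015-02-01', '2015-02-28' UNION ALL
      SELECT '2015-03-01', '2015-03-31' UNION ALL
      SELECT '2015-04-01', '2015-04-30') m
ORDER BY m.month_start
""").fetchall()
print(actives)
```

Row 4 is active from mid-February to early March, so it is counted in both months, which is the "active at any time during the month" behaviour the question asks for.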
EDIT:
A potentially faster way is to calculate a cumulative sum of activations and deactivations and then take the maximum per month:
select year(d), month(d), max(cumes)
from (select d, (@s := @s + inc) as cumes
from (select activate as d, 1 as inc from tblName t union all
select deactivate, -1 as inc from tblName t
) t cross join
(select @s := 0) param
order by d
) s
group by year(d), month(d);
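The running-sum idea is easier to follow outside SQL. Below is a minimal sketch in plain Python, using two made-up activation periods; the names periods, events and peak are hypothetical and only mirror the +1/-1 events and the per-month maximum of the query above.

```python
from datetime import date

# Hypothetical (activate, deactivate) pairs.
periods = [
    (date(2015, 3, 1), date(2015, 5, 10)),
    (date(2015, 4, 15), date(2015, 4, 20)),
]

# One +1 event per activation and one -1 event per deactivation, in date order.
events = sorted(
    [(start, 1) for start, _ in periods] + [(end, -1) for _, end in periods]
)

running = 0
peak = {}  # (year, month) -> largest running sum recorded in that month
for day, inc in events:
    running += inc
    key = (day.year, day.month)
    peak[key] = max(peak.get(key, running), running)

print(peak)
```

In this sketch a month only sees the sums recorded at events that fall inside it: May reports 0 even though the first period was still active until May 10, and a month with no events at all gets no entry. It is worth comparing the output with the month-table query before relying on it.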

Return a zero for a day with no results

I have a query which returns the total of users who registered for each day. Problem is if a day had no one register it doesn't return any value, it just skips it. I would rather it returned zero
this is my query so far
SELECT count(*) total FROM users WHERE created_at < NOW() AND created_at >
DATE_SUB(NOW(), INTERVAL 7 DAY) AND owner_id = ? GROUP BY DAY(created_at)
ORDER BY created_at DESC
Edit
I grouped the data so I would get a count for each day. As for the date range, I wanted the total users registered for the previous seven days.
A variation on the theme "build your own 7-day calendar inline":
SELECT D, count(created_at) AS total FROM
(SELECT DATE_SUB(NOW(), INTERVAL D DAY) AS D
FROM
(SELECT 0 as D
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
) AS D
) AS D
LEFT JOIN users ON date(created_at) = date(D) AND owner_id = ? -- filter in ON so days with no matching user still return a row
GROUP BY D
ORDER BY D DESC
I don't have your table structure at hand, so this will probably need adjusting. Likewise, I use NOW() as the reference date, but that is easy to change. Anyway, that's the spirit...
See a live demo at http://sqlfiddle.com/#!2/ab5cf/11
If you had a table that held all of your days you could do a left join from there to your users table.
SELECT SUM(CASE WHEN U.Id IS NOT NULL THEN 1 ELSE 0 END)
FROM DimDate D
LEFT JOIN Users U ON CONVERT(DATE,U.Created_at) = D.DateValue
WHERE YourCriteria
GROUP BY YourGroupBy
The tricky bit is that you group by the date field in your data, which might have 'holes' in it, and thus miss records for that date.
A way to solve it is by filling a table with all dates for the past 10 and next 100 years or so, and to (outer)join that to your data. Then you will have one record for each day (or week or whatever) for sure.
I have only had to do this on MS SQL Server, so how to fill a date table (or perhaps generate it dynamically) is for someone else to answer.
A bit long winded, but I think this will work...
SELECT count(users.created_at) total FROM
(SELECT DATE_SUB(CURDATE(),INTERVAL 6 DAY) as cdate UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 5 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 4 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 3 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 2 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 1 DAY) UNION ALL
SELECT CURDATE()) t1 left join users
ON date(created_at)=t1.cdate AND owner_id = ? -- filter in ON so days with no matching user still return a row
GROUP BY t1.cdate
ORDER BY t1.cdate DESC
It differs from your query slightly in that it works on dates rather than date times which your query is doing. From your description I have assumed you mean to use whole days and therefore have used dates.
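The inline-calendar approach can be tested with a fixed reference date instead of the current date. This is a minimal sketch, assuming SQLite (through Python's sqlite3) in place of MySQL; the aliases cal and nums, the reference date 2024-03-10, owner 7 and the sample users are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, owner_id INTEGER, created_at TEXT);
INSERT INTO users VALUES
  (1, 7, '2024-03-10 09:00:00'),
  (2, 7, '2024-03-10 17:30:00'),
  (3, 7, '2024-03-08 12:00:00'),
  (4, 9, '2024-03-09 08:00:00');  -- belongs to a different owner
""")

per_day = conn.execute("""
SELECT cal.cdate, COUNT(u.id) AS total
FROM (SELECT date(:ref, '-' || nums.n || ' days') AS cdate   -- last 7 days
      FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2
            UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
            UNION ALL SELECT 6) nums) cal
LEFT JOIN users u
       ON date(u.created_at) = cal.cdate
      AND u.owner_id = :owner          -- filter here, not in WHERE
GROUP BY cal.cdate
ORDER BY cal.cdate DESC
""", {"ref": "2024-03-10", "owner": 7}).fetchall()
print(per_day)
```

COUNT(u.id) counts only matched rows, so unmatched calendar days come back as 0. With the owner filter in the ON clause, 2024-03-09 still appears with a zero even though another owner's user registered that day; with a WHERE owner_id = ? OR owner_id IS NULL filter that day would drop out.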

Average posts per hour on MySQL?

I have a number of posts saved into an InnoDB table on MySQL. The table has the columns "id", "date", "user", "content". I wanted to make some statistic graphs, so I ended up using the following query to get the number of posts per hour of yesterday:
SELECT HOUR(FROM_UNIXTIME(`date`)) AS `hour`, COUNT(date) from fb_posts
WHERE DATE(FROM_UNIXTIME(`date`)) = CURDATE() - INTERVAL 1 DAY GROUP BY hour
This outputs the following data:
I can edit this query to get any day I want. But what I want now is the AVERAGE of each hour of every day, so that if on Day 1 at 00 hours I have 20 posts and on Day 2 at 00 hours I have 40, I want the output to be "30". I'd like to be able to pick date periods as well if it's possible.
Thanks in advance!
You can use a sub-query to group the data by day/hour, then take the average by hour across the sub-query.
Here's an example to give you the average count by hour for the past 7 days:
select the_hour,avg(the_count)
from
(
select date(from_unixtime(`date`)) as the_day,
hour(from_unixtime(`date`)) as the_hour,
count(*) as the_count
from fb_posts
where `date` >= unix_timestamp(current_date() - interval 7 day)
and `date` < unix_timestamp(current_date())
group by the_day,the_hour
) s
group by the_hour
Aggregate the information by date and hour, and then take the average by hour:
select hour, avg(numposts)
from (SELECT date(FROM_UNIXTIME(`date`)) as day, HOUR(FROM_UNIXTIME(`date`)) AS `hour`,
count(*) as numposts
from fb_posts
WHERE DATE(FROM_UNIXTIME(`date`)) between <date1> and <date2>
GROUP BY date(FROM_UNIXTIME(`date`)), hour
) d
group by hour
order by 1
By the way, I prefer including the explicit order by, since most databases do not order the results of a group by. MySQL before 8.0 happens to be one database that does.
SELECT
HOUR(FROM_UNIXTIME(`date`)) AS `hour`
, COUNT(`id`) / COUNT(DISTINCT DATE(FROM_UNIXTIME(`date`))) AS avgHourlyPostCount
FROM fb_posts
WHERE `date` > UNIX_TIMESTAMP('2012-01-01') -- your optional date criteria
GROUP BY hour
This gives you a count of all the posts, divided by the number of days, by hour.
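The two-level aggregation from the first answers (count per day and hour, then average per hour) can be checked on a handful of posts. This is a minimal sketch, assuming SQLite (through Python's sqlite3) in place of MySQL, with date(..., 'unixepoch') and strftime('%H', ...) standing in for FROM_UNIXTIME and HOUR; the helper ts and the sample timestamps are made up, and hours are taken in UTC.

```python
import sqlite3
from datetime import datetime, timezone


def ts(day, hour, minute):
    """Unix timestamp for a moment in January 2024 (UTC); sample data only."""
    return int(datetime(2024, 1, day, hour, minute, tzinfo=timezone.utc).timestamp())


# Day 1: two posts at hour 0, one at hour 1. Day 2: four posts at hour 0.
posts = [ts(1, 0, 5), ts(1, 0, 30), ts(1, 1, 10),
         ts(2, 0, 1), ts(2, 0, 2), ts(2, 0, 3), ts(2, 0, 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fb_posts (id INTEGER PRIMARY KEY, date INTEGER)")
conn.executemany("INSERT INTO fb_posts (date) VALUES (?)", [(p,) for p in posts])

hourly_avg = conn.execute("""
SELECT the_hour, AVG(the_count) AS avg_posts
FROM (SELECT date(date, 'unixepoch') AS the_day,
             CAST(strftime('%H', date, 'unixepoch') AS INTEGER) AS the_hour,
             COUNT(*) AS the_count
      FROM fb_posts
      GROUP BY the_day, the_hour) s     -- one row per (day, hour)
GROUP BY the_hour                       -- then average across days
ORDER BY the_hour
""").fetchall()
print(hourly_avg)
```

Hour 0 averages 2 and 4 posts to 3.0. Hour 1 averages to 1.0 because day 2 has no row for that hour, so the inner query averages only over days that actually had posts in that hour; dividing by the total number of days, as the last query above does, would give 0.5 instead.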