Count records which fulfil given condition - mysql

I have a table with the following schema:
+-------------------------------------------------------+
| table_counter |
+----+---------------------+------------+---------------+
| id | timestamp | entry_type | country |
+----+---------------------+------------+---------------+
+----+---------------------+------------+---------------+
| 10 | 2017-05-01 12:00:00 | click | Germany |
+----+---------------------+------------+---------------+
| 11 | 2017-05-01 12:00:00 | view | Austria |
+----+---------------------+------------+---------------+
| 12 | 2017-05-01 12:00:00 | click | UK |
+----+---------------------+------------+---------------+
| 13 | 2017-05-01 12:00:00 | view | USA |
+----+---------------------+------------+---------------+
I need to return the following result: Select the sum of views and clicks of the top 5 countries by sum of views in the past 30 days.
I know how to count the records all right, but how do I define the constraints? How do I return all entries from the five countries with the highest number of views?
Limiting the result to the last 30 days is trivial, but I'm pretty much stuck at the beginning.

Use the ORDER BY and LIMIT keywords:
SELECT country, SUM(IF(entry_type = 'view', 1, 0)) AS view_count FROM t3 GROUP BY country ORDER BY view_count DESC LIMIT 5
--EDIT
As per the requirement stated in the comments, here's the updated query:
SELECT SUM(view_click_count) as all_total FROM (SELECT country, SUM(IF(entry_type = "view", 1, 0)) as view_count, SUM(IF(entry_type = "click", 1, 0)) as click_count, count(entry_type) as view_click_count FROM t3 GROUP BY country ORDER BY view_count DESC LIMIT 5) t2
all_total gives the total count as needed, for top 5 countries.

You can do it this way:
select
tc.country,
count(case entry_type when 'click' then 1 else null end) clicks,
count(case entry_type when 'view' then 1 else null end) views
from table_counter tc
inner join (
select top 5 country from [dbo].[table_counter]
where entry_type = 'view'
and timestamp >= DATEADD(DAY, -30, GETDATE())
group by country
order by count(entry_type) desc
) t on t.country = tc.country
where timestamp >= DATEADD(DAY, -30, GETDATE())
group by tc.country
order by views desc
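This is for SQL Server. A few tweaks are needed for MySQL, e.g. LIMIT 5 at the end of the subquery instead of TOP 5, and DATE_SUB(NOW(), INTERVAL 30 DAY) instead of DATEADD(DAY, -30, GETDATE()).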

You can get top 5 countries by views with the following query, e.g.:
SELECT country, count(*) as 'views'
FROM `table`
WHERE timestamp BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
AND entry_type = 'view'
GROUP BY country
ORDER BY count(*) DESC
LIMIT 5
Now, to select the clicks as well, you can add a correlated subquery to the SELECT list, e.g.:
SELECT t.country, COUNT(*) as 'views',
(SELECT COUNT(*)
FROM `table`
WHERE country = t.country
AND entry_type = 'click'
AND timestamp BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
) as 'clicks'
FROM `table` t
WHERE t.timestamp BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
AND t.entry_type = 'view'
GROUP BY t.country
ORDER BY count(*) DESC
LIMIT 5
Here's the SQL Fiddle.
Update
To get the SUM of views and clicks, wrap the above query into another SELECT, e.g.:
SELECT country, views + clicks
FROM(
SELECT t.country, COUNT(*) as 'views',
(SELECT COUNT(*)
FROM `table`
WHERE country = t.country
AND entry_type = 'click'
AND timestamp BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
) as 'clicks'
FROM `table` t
WHERE t.timestamp BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
AND t.entry_type = 'view'
GROUP BY t.country
ORDER BY count(*) DESC
LIMIT 5
) b;
Here's the updated SQL Fiddle.
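The same numbers can also be obtained with conditional aggregation in a single pass. The following is only a minimal sketch, not the answerers' code: it uses Python's built-in sqlite3 module so that it runs without a MySQL server, the sample rows are invented, and the 30-day filter is left out (in MySQL that would be a condition such as timestamp >= DATE_SUB(NOW(), INTERVAL 30 DAY)).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE table_counter (id INTEGER PRIMARY KEY, entry_type TEXT, country TEXT)"
)

# Invented sample data: (country, number of views, number of clicks)
sample = [("Germany", 5, 2), ("Austria", 4, 1), ("UK", 3, 3),
          ("USA", 6, 0), ("France", 2, 9), ("Spain", 1, 9)]
for country, views, clicks in sample:
    rows = [("view", country)] * views + [("click", country)] * clicks
    con.executemany(
        "INSERT INTO table_counter (entry_type, country) VALUES (?, ?)", rows
    )

# Count views and clicks per country, keep the 5 countries with the most views.
query = """
SELECT country,
       SUM(CASE WHEN entry_type = 'view'  THEN 1 ELSE 0 END) AS views,
       SUM(CASE WHEN entry_type = 'click' THEN 1 ELSE 0 END) AS clicks
FROM table_counter
GROUP BY country
ORDER BY views DESC
LIMIT 5
"""
top5 = con.execute(query).fetchall()

# Sum of views and clicks over those five countries.
total = sum(views + clicks for _, views, clicks in top5)

for row in top5:
    print(row)
print("views + clicks for the top 5 countries:", total)
```

Spain has many clicks but the fewest views, so it is excluded: the ranking is by views only, as the question asks.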

Related

Query to subtract same column value at different interval of day with SQL database

In MySQL, I want to subtract one of column value at different interval of time based on another column 'timestamp'.
table structure is :
id | generator_id | timestamp | generated_value
1 | 1 | 2019-05-27 06:55:20 | 123456
2 | 1 | 2019-05-27 07:55:20 | 234566
3 | 1 | 2019-05-27 08:55:20 | 333456
..
..
20 | 1 | 2019-05-27 19:55:20 | 9876908
From the above table I want to compute the difference in generated_value between the first timestamp of the day and the last timestamp of the day.
In the above example I am looking for a query that gives 9,753,452 (9876908 - 123456) as output.
In general, to fetch the single record holding the first or the last value of the day, I use the queries below:
-- this gives me the last value of the day
SELECT * FROM generator_meters where generator_id=1 and timestamp like '2019-05-27%' order by timestamp desc limit 1 ;
-- this gives me the first value of the day
SELECT * FROM generator_meters where generator_id=1 and timestamp like '2019-05-27%' order by timestamp limit 1 ;
The question is how to get the final generated_value by subtracting the first value of the day from the last value of the day.
Expected Output
generator_id | generated_value
1 | 9753452
Thanks in advance !!
In your example the value gets bigger and bigger. If this is guaranteed to be so, you can use
select max(generated_value) - min(generated_value) as result
from sun_electric.generator_meters
where generator_id = 1
and date(timestamp) = date '2019-05-27';
Or for multiple IDs:
select generator_id, max(generated_value) - min(generated_value) as result
from sun_electric.generator_meters
where date(timestamp) = date '2019-05-27'
group by generator_id
order by generator_id;
If the value is not ascending, then you can use the following query for ID 1:
select last_row.generated_value - first_row.generated_value as result
from
(
select *
from sun_electric.generator_meters
where generator_id = 1
and date(timestamp) = date '2019-05-27'
order by timestamp
limit 1
) first_row
cross join
(
select *
from sun_electric.generator_meters
where generator_id = 1
and date(timestamp) = date '2019-05-27'
order by timestamp desc
limit 1
) last_row;
Here is one way to get a result for multiple IDs:
select
minmax.generator_id,
(
select generated_value
from sun_electric.generator_meters gm
where gm.generator_id = minmax.generator_id
and gm.timestamp = minmax.max_ts
) -
(
select generated_value
from sun_electric.generator_meters gm
where gm.generator_id = minmax.generator_id
and gm.timestamp = minmax.min_ts
) as result
from
(
select generator_id, min(timestamp) as min_ts, max(timestamp) as max_ts
from sun_electric.generator_meters
where date(timestamp) = date '2019-05-27'
group by generator_id
) minmax
order by minmax.generator_id;
You can also move the subqueries to the from clause and join them, if you like this better. Yet another approach would be to use window functions, available as of MySQL 8.
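As a rough illustration of moving the subqueries into the FROM clause, here is a sketch that joins the per-generator min/max timestamps back to the table twice. It is run through Python's sqlite3 module only so that it is self-contained; the rows for generator 2 are invented, and it assumes that a generator never has two rows with the same timestamp.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE generator_meters (
    id INTEGER PRIMARY KEY,
    generator_id INTEGER,
    timestamp TEXT,
    generated_value INTEGER)""")
con.executemany(
    "INSERT INTO generator_meters (generator_id, timestamp, generated_value) "
    "VALUES (?, ?, ?)",
    [(1, "2019-05-27 06:55:20", 123456),
     (1, "2019-05-27 07:55:20", 234566),
     (1, "2019-05-27 19:55:20", 9876908),
     (2, "2019-05-27 08:00:00", 500),    # invented second generator
     (2, "2019-05-27 18:00:00", 400)])   # its value decreases during the day

# minmax finds the first and last timestamp per generator for the day;
# the two joins fetch the values stored at exactly those timestamps.
query = """
SELECT minmax.generator_id,
       last_row.generated_value - first_row.generated_value AS result
FROM (SELECT generator_id, MIN(timestamp) AS min_ts, MAX(timestamp) AS max_ts
      FROM generator_meters
      WHERE date(timestamp) = ?
      GROUP BY generator_id) AS minmax
JOIN generator_meters AS first_row
  ON first_row.generator_id = minmax.generator_id
 AND first_row.timestamp = minmax.min_ts
JOIN generator_meters AS last_row
  ON last_row.generator_id = minmax.generator_id
 AND last_row.timestamp = minmax.max_ts
ORDER BY minmax.generator_id
"""
result = con.execute(query, ("2019-05-27",)).fetchall()
print(result)
```

Because the values are looked up by timestamp rather than by min/max of generated_value, this also works when the value is not ascending, as with the invented generator 2.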
The following script returns your expected result for the filtered ID and date:
SELECT A.generator_id, CAST(A.timestamp AS DATE) AS day,
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MAX(A.timestamp)
)
-
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MIN(A.timestamp)
) AS Diff
FROM sun_electric.generator_meters A
WHERE A.generator_id = 1
AND CAST(A.timestamp AS DATE) = '2019-05-27'
GROUP BY A.generator_id, CAST(A.timestamp AS DATE);
If you want the same result grouped by ID and date, just remove the filter:
SELECT A.generator_id, CAST(A.timestamp AS DATE) AS day,
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MAX(A.timestamp)
)
-
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MIN(A.timestamp)
) AS Diff
FROM sun_electric.generator_meters A
GROUP BY A.generator_id, CAST(A.timestamp AS DATE);

How can one calculate percentage from previous week in aggregate query?

I have a challenge I set out to do that seemed initially trivial. Not so for my developer brain.
Consider the following simple view, used to validate a cron job that queries a subset of 200,000 statements every Saturday.
It goes as follows:
mysql> SELECT
-> DATE_FORMAT(s.created, "%Y-%m-%d") as "Date",
-> count(s.id) AS "Accounts credited",
-> sum(s.withdrawal) "Total Credited"
-> -- 100 * (sum(s.withdrawal) - sum(prev.withdrawal))
-- / sum(prev.withdrawal) "Difference in %"
-> FROM statements s
-> -- LEFT JOIN prev
-> -- s.created - interval 7 DAY
-> -- ON prev.created = s.created - interval 7 DAY
-- AND (prev.status_id = 'OPEN'
-- OR prev.status_id = 'PENDING')
-> WHERE (s.status_id = 'OPEN' OR s.status_id = 'PENDING')
-> GROUP BY YEAR(s.created), MONTH(s.created), DAY(s.created)
-> ORDER BY s.created DESC
-> LIMIT 8;
+------------+-------------------+----------------+
| Date | Accounts credited | Total Credited |
+------------+-------------------+----------------+
| 2019-01-19 | 18175 | 3173.68 |
| 2019-01-12 | 18135 | 4768.43 |
| 2019-01-05 | 17588 | 6968.49 |
| 2018-12-29 | 17893 | 5404.18 |
| 2018-12-22 | 17353 | 7048.18 |
| 2018-12-15 | 16893 | 7181.34 |
| 2018-12-08 | 16220 | 9547.09 |
| 2018-12-01 | 15476 | 7699.59 |
+------------+-------------------+----------------+
8 rows in set (0.79 sec)
As is, the query is efficient and practical. I merely would like to add a column, difference in percentage, from previous week's total, as seen with the -- commented out code.
I have tried various approaches, but because of the GROUP BY, adding an inline column to get the sum(withdrawal) of previous week makes the query run ... forever.
I then tried the LEFT JOIN approach, but this obviously has the same problem. I think the added JOIN has to fetch the sum of the previous week for every row of the outer select.
I then had the (not so smart) idea of querying my view, but it seems I would have the same issue even then.
I assume there are much more optimal approaches out there to this simple task.
Is there an elegant way to calculate a percentage from such a query?
Would a stored procedure or some other 'non-plain-sql' approach be more optimal?
I used this query in SQL Server:
SELECT TOP 8
DATE_FORMAT(s.created, "%Y-%m-%d") as "Date",
count(s.id) AS "Accounts credited",
sum(s.withdrawal) "Total Credited",
100 * (sum(s.withdrawal) - sum(s1.withdrawal)) / sum(s1.withdrawal) "Difference in %"
FROM statements s
LEFT JOIN statements s1 ON s1.created = s.created - 7
AND (s1.status_id = 'OPEN' OR s1.status_id = 'PENDING')
WHERE (s.status_id = 'OPEN' OR s.status_id = 'PENDING')
GROUP BY YEAR(s.created), MONTH(s.created), DAY(s.created)
ORDER BY s.created DESC
You just need to handle a NULL or zero sum of s1.withdrawal.
I hope it works.
If you are happy with your original query then a correlated sub query like so may be all you need
select t.*,
(select totalcredited from t t1 where t1.dt < t.dt order by t1.dt desc limit 1) prev,
(
totalcredited / (select totalcredited from t t1 where t1.dt < t.dt order by t1.dt desc limit 1) * 100
) -100 as chg
from (your query) as t;
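To make the correlated-subquery idea concrete, here is a small sketch. It assumes the weekly totals are already available as a table t with columns dt and totalcredited (standing in for "your query"), filled with four of the figures from the question, and it runs through Python's sqlite3 module only so that no MySQL server is needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# t stands in for the aggregated result of the original query.
con.execute("CREATE TABLE t (dt TEXT PRIMARY KEY, totalcredited REAL)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("2019-01-19", 3173.68), ("2019-01-12", 4768.43),
     ("2019-01-05", 6968.49), ("2018-12-29", 5404.18)])

# For every row, look up the closest earlier row and compute the change in %.
query = """
SELECT cur.dt, cur.totalcredited, cur.prev,
       100.0 * (cur.totalcredited - cur.prev) / cur.prev AS chg
FROM (SELECT t.dt, t.totalcredited,
             (SELECT t1.totalcredited
              FROM t AS t1
              WHERE t1.dt < t.dt
              ORDER BY t1.dt DESC
              LIMIT 1) AS prev
      FROM t) AS cur
ORDER BY cur.dt DESC
"""
changes = con.execute(query).fetchall()
for row in changes:
    print(row)
```

The oldest row has no predecessor, so its prev and chg come back as NULL (None in Python). Since the subquery runs against only a handful of aggregated rows, it avoids rescanning the large statements table for every output row.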
I've noticed a mistake in my previous example so here's an update.
NOTE: the query compares the current week with the previous one.
I hope that this is what you need.
SELECT
Date,
SUM(CASE week WHEN 0 THEN accounts_credited ELSE 0 END) AS 'Accounts credited',
SUM(CASE week WHEN 0 THEN total_credited ELSE 0 END) AS 'Total Credited',
100 * (
SUM(CASE week WHEN 0 THEN total_credited ELSE 0 END) - SUM(CASE week WHEN 1 THEN total_credited ELSE 0 END)
) / SUM(CASE week WHEN 1 THEN total_credited ELSE 0 END) AS 'Difference in %'
FROM
(SELECT
DATE_FORMAT(created, '%Y-%m-%d') as 'Date',
COUNT(id) AS 'accounts_credited',
SUM(withdrawal) 'total_credited',
0 AS 'week'
FROM
statements
WHERE
status_id IN ('OPEN','PENDING')
AND
YEARWEEK(created, 1) = YEARWEEK(CURDATE(), 1)
GROUP BY
DATE(created)
UNION
SELECT
DATE_FORMAT(created, '%Y-%m-%d') as 'Date',
COUNT(id) AS 'accounts_credited',
SUM(withdrawal) 'total_credited',
1 AS 'week'
FROM
statements
WHERE
status_id IN ('OPEN','PENDING')
AND
(
DATE(created) >= CURDATE() - INTERVAL DAYOFWEEK(CURDATE())+6 DAY
AND
DATE(created) < CURDATE() - INTERVAL DAYOFWEEK(CURDATE())-1 DAY
)
GROUP BY
DATE(created)
) AS tmp
GROUP BY Date
ORDER BY Date
This is your query:
select date_format(s.created, '%Y-%m-%d') as "Date",
count(*) AS "Accounts credited",
sum(s.withdrawal) "Total Credited"
from statements s
where s.status_id in ('OPEN', 'PENDING')
group by date_format(s.created, '%Y-%m-%d')
order by s.created desc
limit 8;
In MySQL, perhaps the simplest solution is variables. However, because of the rules around MySQL variables, this is a bit complicated:
select s.*,
(case when (@new_prev := @prev) = NULL then NULL -- never gets here
when (@prev := Total_Credited) = NULL then NULL -- never gets here
else @new_prev
end) as previous_week_Total_Credited
from (select date_format(s.created, '%Y-%m-%d') as "Date",
count(*) AS Accounts_credited,
sum(s.withdrawal) as Total_Credited
from statements s
where s.status_id in ('OPEN', 'PENDING')
group by date_format(s.created, '%Y-%m-%d')
order by "Date" desc
) s cross join
(select @prev := NULL) params
limit 8;
You can then just use this as a subquery for your final calculation.

Add to query in order to get total sum of previous results SQL

I currently have a query that provides the result set below, I now need to add to this query to provide a total at the bottom of all the sales. I am not sure how to do this.
Current query:
SELECT
product,
COUNT(OrderNumber) AS CountOf
FROM
orders
WHERE
STATUS = 'booking' AND
Date(OrderDate) <= CURDATE() AND
Date(OrderDate) > DATE_SUB(CURDATE(),INTERVAL 30 DAY)
GROUP BY
product
ORDER BY CountOf DESC
Current Resultset:
product| count
-----------------------
pd1 | 3
pd4 | 1
pd2 | 1
desired result set =
product| count
-----------------------
pd1 | 3
pd4 | 1
pd2 | 1
Total | 5
Maybe you can add a UNION, and a SELECT with total amount. Something like this:
SELECT
product,
COUNT(OrderNumber) AS CountOf
FROM
orders
WHERE
STATUS = 'booking' AND
Date(OrderDate) <= CURDATE() AND
Date(OrderDate) > DATE_SUB(CURDATE(),INTERVAL 30 DAY)
GROUP BY
product
UNION
SELECT 'Total', count(OrderNumber) AS CountOf
FROM orders
WHERE
STATUS = 'booking' AND
Date(OrderDate) <= CURDATE() AND
Date(OrderDate) > DATE_SUB(CURDATE(),INTERVAL 30 DAY)
ORDER BY CountOf DESC;
Try using an inner join on the same table; the union did not work for me because the two sides had a different number of columns.
The initial SELECT had 2 fixed columns, whereas the second SELECT (after the UNION) did not.
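One detail of the UNION approach is that ORDER BY CountOf DESC sorts the total to the top, since it is the largest count. A possible way around this, sketched here under assumptions, is to carry an extra sort key (the hypothetical column is_total) and use UNION ALL. The example runs on Python's sqlite3 module with invented rows, and the date filter is omitted for brevity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (OrderNumber INTEGER PRIMARY KEY, product TEXT, status TEXT)"
)
con.executemany(
    "INSERT INTO orders (product, status) VALUES (?, ?)",
    [("pd1", "booking")] * 3
    + [("pd4", "booking"), ("pd2", "booking"), ("pd3", "cancelled")])

# is_total is a helper column used only for sorting: 0 for product rows,
# 1 for the total row, so the total always comes last.
query = """
SELECT product, CountOf
FROM (SELECT product, COUNT(OrderNumber) AS CountOf, 0 AS is_total
      FROM orders
      WHERE status = 'booking'
      GROUP BY product
      UNION ALL
      SELECT 'Total', COUNT(OrderNumber), 1
      FROM orders
      WHERE status = 'booking') AS combined
ORDER BY is_total, CountOf DESC, product
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

Both halves of the UNION ALL return the same three columns, which is what the union requires. In MySQL, GROUP BY product WITH ROLLUP is another option; it appends a summary row whose product is NULL.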

SQL Query to find rows that didn't occur this month

I am trying to find the number of sellers that made a sale last month but didn't make a sale this month.
I have a query that works, but I don't think it's efficient and I haven't figured out how to do this for all months.
SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = 12
AND YEAR(date) = 2015
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN
(
SELECT distinct user_id
FROM transactions
WHERE MONTH(date) = 1
AND YEAR(date) = 2016
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
)
The structure of the table is:
+---------+------------+-------------+--------+
| user_id | date | status | amount |
+---------+------------+-------------+--------+
| 1 | 2016-01-01 | 'COMPLETED' | 1.00 |
| 2 | 2015-12-01 | 'COMPLETED' | 1.00 |
| 3 | 2015-12-01 | 'COMPLETED' | 2.00 |
| 1 | 2015-12-01 | 'COMPLETED' | 3.00 |
+---------+------------+-------------+--------+
So in this case, users with ID 2 and 3, didn't make a sale this month.
Use conditional aggregation:
SELECT count(*) as users
FROM
(
SELECT user_id
FROM transactions
-- 1st of previous month
WHERE date BETWEEN SUBDATE(SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1), interval 1 month)
-- end of current month
AND LAST_DAY(CURRENT_DATE)
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
GROUP BY user_id
-- any row from previous month
HAVING MAX(CASE WHEN date < SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
THEN date
END) IS NOT NULL
-- no row in current month
AND MAX(CASE WHEN date >= SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
THEN date
END) IS NULL
) AS dt
SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1) = first day of current month
SUBDATE(first day of current month, interval 1 month) = first day of previous month
LAST_DAY(CURRENT_DATE) = end of current month
If you want to generalize it, you can use curdate() to get the current month and DATE_SUB(curdate(), INTERVAL 1 MONTH) to get the previous month (the January/December boundary needs extra handling, though, because the query below compares only MONTH() and not YEAR()):
SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = MONTH(DATE_SUB(curdate(), INTERVAL 1 MONTH))
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN
(
SELECT distinct user_id
FROM transactions
WHERE MONTH(date) = MONTH(curdate())
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
)
As far as efficiency goes, I don't see a problem with this one.
The following should be pretty efficient. In order to make it even more so, you would need to provide the table definition and the EXPLAIN output.
SELECT COUNT(DISTINCT user_id) users
FROM transactions t
LEFT
JOIN transactions x
ON x.user_id = t.user_id
AND x.date BETWEEN '2016-01-01' AND '2016-01-31'
AND x.status = 'COMPLETED'
AND x.amount > 0
WHERE t.date BETWEEN '2015-12-01' AND '2015-12-31'
AND t.status = 'COMPLETED'
AND t.amount > 0
AND x.user_id IS NULL;
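The anti-join pattern (LEFT JOIN plus IS NULL) can be extended to cover every month at once, which the question also asks about. The sketch below is one possible way to do that, not part of the answer above: each (user, month) pair is joined against the same user's sales in the following month. It uses Python's sqlite3 module and SQLite's strftime date functions with the sample rows from the question; in MySQL, DATE_FORMAT and DATE_ADD would play the corresponding roles.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE transactions (user_id INTEGER, date TEXT, status TEXT, amount REAL)"
)
con.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, "2016-01-01", "COMPLETED", 1.00),
     (2, "2015-12-01", "COMPLETED", 1.00),
     (3, "2015-12-01", "COMPLETED", 2.00),
     (1, "2015-12-01", "COMPLETED", 3.00)])

# m: one row per (user, month with a sale), plus the label of the next month.
# n: one row per (user, month with a sale), used to probe that next month.
# Rows of m without a match in n are sellers who did not sell in the next month.
query = """
SELECT m.month, COUNT(*) AS users
FROM (SELECT DISTINCT user_id,
             strftime('%Y-%m', date) AS month,
             strftime('%Y-%m', date, 'start of month', '+1 month') AS next_month
      FROM transactions
      WHERE status = 'COMPLETED' AND amount > 0) AS m
LEFT JOIN (SELECT DISTINCT user_id, strftime('%Y-%m', date) AS month
           FROM transactions
           WHERE status = 'COMPLETED' AND amount > 0) AS n
  ON n.user_id = m.user_id
 AND n.month = m.next_month
WHERE n.user_id IS NULL
GROUP BY m.month
ORDER BY m.month
"""
lost_sellers = con.execute(query).fetchall()
print(lost_sellers)
```

The row for 2015-12 counts users 2 and 3, matching the question. The most recent month in the data always counts all of its sellers, because no following month exists yet, so that last row should be ignored or filtered out.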
Just some input for thought:
You could create aggregated lists of user-IDs per month, representing all the unique buyers in that month. In your application, you would then simply have to subtract the two months in question in order to get all user-IDs that have only made a sale in one of the two months.
See below for query- and post-processing-examples.
In order to make your query efficient, I would recommend at least a 2-column index for table transactions on [status, amount]. However, in order to prevent the query from having to look up data in the actual table, you could even create a 4-column index [status, amount, date, user_id], which should further improve the performance of your query.
Postgres (v9.0+, tested)
SELECT (DATE_PART('year', t.date) || '-' || DATE_PART('month', t.date)) AS d,
STRING_AGG( DISTINCT t.user_id::TEXT, ',' ) AS buyers
FROM transactions t
WHERE t.status = 'COMPLETED'
AND t.amount > 0
GROUP BY DATE_PART('year', t.date),
DATE_PART('month', t.date)
ORDER BY DATE_PART('year', t.date),
DATE_PART('month', t.date)
;
MySQL (not tested)
SELECT CONCAT(YEAR(t.date), '-', MONTH(t.date)) AS d,
GROUP_CONCAT( DISTINCT t.user_id ) AS buyers
FROM transactions t
WHERE t.status = 'COMPLETED'
AND t.amount > 0
GROUP BY YEAR(t.date), MONTH(t.date)
ORDER BY YEAR(t.date), MONTH(t.date)
;
Ruby (example for post-processing)
db_result = ActiveRecord::Base.connection_pool.with_connection { |con| con.execute( db_query ) }
unique_buyers = db_result.map{|e|[e['d'],e['buyers'].split(',')]}.to_h
buyers_dec15_but_not_jan16 = unique_buyers['2015-12'] - unique_buyers['2016-1']
buyers_nov15_but_not_dec15 = (unique_buyers['2015-11'] || []) - (unique_buyers['2015-12'] || [])
...(and so on)...

Change MySql value based on time that has past

I have the following table:
user_id post_streak streak_date streak first_name club_id
-------- ----------- ------------ --------- ----------- --------
18941684 1 2015-05-05 15:36:18 3 user 1000
I want to change streak to 0 if it has been longer than 12 days.
current query:
select
first_name, streak, user_id from myTable
where
club_id = 1000
and
post_streak = 1
and
streak_date between date_sub(now(),INTERVAL 12 DAY) and now()
order by streak desc;
This doesn't show results older than 12 days. I want to show all results but change "streak" to 0 if it has been longer than 12 days.
What is the best way to go about this?
UPDATE myTable
SET streak = 0
WHERE streak_date < DATE_SUB(NOW(), INTERVAL 12 DAY);
SELECT first_name, streak, user_id from myTable
WHERE
club_id = 1000
AND
post_streak = 1
ORDER BY streak DESC;
The first query sets streak to 0 for all records whose streak_date is more than 12 days ago.
The second query lists all records that have a club_id of 1000 and a post_streak of 1.
Put the condition in the select, rather than the where:
select first_name,
(case when streak_date between date_sub(now(), INTERVAL 12 DAY) and now()
then streak
else 0
end) as streak,
user_id from myTable
where club_id = 1000
order by streak desc;
I'm not sure if the post_streak condition is needed in the where clause.
http://sqlfiddle.com/#!9/d8bbd/6
select
user_id,
first_name,
streak_date,
IF(streak_date between date_sub(now(),INTERVAL 12 DAY) and now(),streak,0) as streak
from myTable
where
club_id = 1000
and
post_streak = 1
order by streak desc;
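For comparison, here is a small runnable sketch of the "condition in the SELECT" idea. It rests on a few assumptions: it uses Python's sqlite3 module instead of MySQL, the current time is passed in as a fixed parameter so the output is reproducible, the second row is invented, and the computed column is given the hypothetical name shown_streak to keep it apart from the stored streak.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE myTable (
    user_id INTEGER, post_streak INTEGER, streak_date TEXT,
    streak INTEGER, first_name TEXT, club_id INTEGER)""")
con.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)",
    [(18941684, 1, "2015-05-05 15:36:18", 3, "user", 1000),
     (2, 1, "2015-04-01 09:00:00", 7, "old", 1000)])  # invented stale row

now = "2015-05-10 12:00:00"  # fixed "current time" for a reproducible result

# Every matching row is returned; the streak is reported as 0 when the
# streak_date is more than 12 days before :now. The table is not modified.
query = """
SELECT first_name,
       CASE WHEN streak_date BETWEEN datetime(:now, '-12 days') AND :now
            THEN streak
            ELSE 0
       END AS shown_streak,
       user_id
FROM myTable
WHERE club_id = 1000
  AND post_streak = 1
ORDER BY shown_streak DESC
"""
rows = con.execute(query, {"now": now}).fetchall()
for row in rows:
    print(row)
```

Unlike the UPDATE approach, this leaves the stored streak values untouched and only changes what is displayed.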