MySQL: country with most new users in January?

I have 2 tables, users and events:
**Users:**
usersid
age
geo_country
gender
**events:**
ts
usersid
event
videoid
Here ts is the timestamp field, and the possible events are 'start_video', 'browse_catalog', and 'exit_video'.
I want to find out which country had the most new users in January.
My code is as follows:
SELECT DISTINCT (u.geo_country), COUNT(e.userid) As Users_Ids
FROM (SELECT userid, DATE(MIN(ts)) AS first_time
FROM events
WHERE ts BETWEEN '2017-01-01 00:00:00' and '2017-01-31 24:00:00'
GROUP BY userid) AS e
LEFT JOIN users u ON u.userid= e.userid
GROUP BY first_time
ORDER BY COUNT(e.userid) DESC;
Since I don't have the session field, is my subquery all right in providing new users for January 2017?
Any help would be highly appreciated.
Thanks,
Claudia

I think the query you posted is slightly incorrect.
GROUP BY tells the aggregate function how to partition the rows. In your outer query you want to count users per country, so the COUNT should be paired with GROUP BY u.geo_country rather than GROUP BY first_time. Once you group by country, the DISTINCT on geo_country is no longer necessary.
GROUP BY first_time also gives wrong answers, because it counts users per distinct first_time value instead of per country. In addition, filtering ts in the WHERE clause of the subquery only finds each user's first event within January, so users who were already active before January would be counted as new. The date filter has to be applied to MIN(ts) after grouping, in a HAVING clause.
The correct query should be:
SELECT u.geo_country,
COUNT(e.userid) As Users_Ids
FROM (SELECT userid, DATE(MIN(ts)) AS first_time
FROM events
GROUP BY userid
HAVING first_time BETWEEN '2017-01-01' AND '2017-01-31')
AS e
LEFT JOIN users u ON u.userid= e.userid
GROUP BY u.geo_country
ORDER BY Users_Ids DESC;
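To see why the date filter belongs on MIN(ts) rather than on ts, here is a minimal sketch that runs the same logic from Python against an in-memory SQLite database. SQLite only stands in for MySQL, the sample rows are invented, and the range is written half-open (>= '2017-01-01' and < '2017-02-01') on the raw timestamp, which is a substitution for the BETWEEN on DATE(MIN(ts)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, geo_country TEXT);
CREATE TABLE events (ts TEXT, userid INTEGER, event TEXT);
INSERT INTO users VALUES (1, 'DE'), (2, 'DE'), (3, 'FR'), (4, 'FR');
INSERT INTO events VALUES
  ('2016-12-20 10:00:00', 4, 'start_video'),    -- user 4 was first seen in December
  ('2017-01-05 09:00:00', 4, 'browse_catalog'),
  ('2017-01-02 08:00:00', 1, 'start_video'),
  ('2017-01-15 12:00:00', 2, 'start_video'),
  ('2017-01-31 23:30:00', 3, 'start_video');
""")

# Group first, then keep only users whose very first event falls in January.
NEW_USERS_BY_COUNTRY = """
SELECT u.geo_country, COUNT(*) AS new_users
FROM (SELECT userid, MIN(ts) AS first_ts
      FROM events
      GROUP BY userid
      HAVING MIN(ts) >= '2017-01-01' AND MIN(ts) < '2017-02-01') AS e
JOIN users u ON u.userid = e.userid
GROUP BY u.geo_country
ORDER BY new_users DESC
"""
rows = conn.execute(NEW_USERS_BY_COUNTRY).fetchall()
print(rows)
```

User 4 has a January event but is not counted, because the first event for that user is from December. With the filter in WHERE instead, user 4 would be counted as a new French user.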

Related

How to use SQL to count events in the first week

I'm trying to write a SQL query, which says how many logins each user made in their first week.
Assume, for the purpose of this question, that I have a table with at least user_id and login_date. I'm trying to produce an output table with user_id and num_logins_first_week
Use aggregation to get the first date for each user. Then join in the logins and aggregate:
select t.user_id, count(*) as num_logins_first_week
from t join
(select user_id, min(login_date) as first_login_date
from t
group by user_id
) tt
on tt.user_id = t.user_id and
t.login_date >= tt.first_login_date and
t.login_date < tt.first_login_date + interval 7 day
group by t.user_id;
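As a quick check of the half-open seven-day window, the sketch below runs the same join from Python on an in-memory SQLite database. The table name logins and the sample rows are assumptions made for illustration, and SQLite's date(x, '+7 days') replaces MySQL's x + interval 7 day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (user_id INTEGER, login_date TEXT);
INSERT INTO logins VALUES
  (1, '2020-01-01'), (1, '2020-01-03'), (1, '2020-01-07'),
  (1, '2020-01-08'),                     -- day 8, outside the first week
  (2, '2020-02-10'), (2, '2020-03-01');
""")

# Join every login to the user's first login and keep those within 7 days.
FIRST_WEEK = """
SELECT l.user_id, COUNT(*) AS num_logins_first_week
FROM logins l
JOIN (SELECT user_id, MIN(login_date) AS first_login_date
      FROM logins
      GROUP BY user_id) f
  ON f.user_id = l.user_id
 AND l.login_date >= f.first_login_date
 AND l.login_date < date(f.first_login_date, '+7 days')
GROUP BY l.user_id
ORDER BY l.user_id
"""
first_week = conn.execute(FIRST_WEEK).fetchall()
print(first_week)
```

Every user appears in the result, since the first login always falls inside its own window.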

SQL beginner practice problems

Given two tables, orders (order_id, date, $, customer_id) and customers (ID, name)
Here's my method but I'm not sure if it's working & I'd like to know if there's faster/better way of solving these problems:
1) find out number of customers who made at least one order on date 7/9/2018
Select count (distinct customer_id)
From
(
Select customer_id from orders a
Left join customer b
On a.customer_id = b.ID
Group by customer_id,date
Having date = 7/9/2018
) a
2) find out number of customers who did not make an order on 7/9/2018
Select count (customer_id) from customer where customer_id not in
(
Select customer_id from orders a
Left join customer b
On a.customer_id = b.ID
Group by customer_id,date
Having date = 7/9/2018
)
3) find the date with most sales between 7/1 and 7/30
select date, max($)
from (
Select sum($),date from orders a
Left join customer b
On a.customer_id = b.ID
Group by date
Having date between 7/1 and 7/30
)
Thanks,
For problem 1, a valid solution might look like this:
SELECT COUNT(DISTINCT customer_id) x
FROM orders
WHERE date = '2018-09-07'; -- or is that '2018-07-09' ??
For problem 2, a valid solution might look like this:
SELECT COUNT(*) x
FROM customer c
LEFT
JOIN orders o
ON o.customer_id = c.id
AND o.date = '2018-07-09'
WHERE o.order_id IS NULL;
Assuming there are no ties, a valid solution to problem 3 might look like this:
SELECT date
, COUNT(*) sales
FROM orders
WHERE date BETWEEN '2018-07-01' AND '2018-07-30'
GROUP
BY date
ORDER
BY sales DESC
LIMIT 1;
The default format for a date in MySQL is YYYY-MM-DD, although this can be customized. You have to put quotes around it, otherwise it's treated as an arithmetic expression.
And none of your queries need to join with the customer table. The customer ID is already in the orders table, and you're not returning any info about the customers (like the name or address), you're just counting them.
1) You don't need the subquery or grouping.
SELECT COUNT(DISTINCT customer_id)
FROM orders
WHERE date = '2018-07-09'
2) Again, you don't need GROUP BY in the subquery. There's also a better pattern than NOT IN to get the count of non-matching rows.
SELECT COUNT(*)
FROM customer AS c
LEFT JOIN orders AS o ON c.id = o.customer_id AND o.date = '2018-07-09'
WHERE o.order_id IS NULL
See Return row only if value doesn't exist for various patterns to do this.
3) You can't use MAX($) in the outer query because the inner query doesn't return a column with that name. But even if you fix that, it still won't work, because the date column won't necessarily come from the same row that has the maximum. See SQL select only rows with max value on a column for more explanation of this.
You don't need a subquery at all. Use a query that returns the total sales for each day, then use ORDER BY to get the highest one.
SELECT date, SUM($) AS total_sales
FROM orders
WHERE date BETWEEN '2018-07-01' AND '2018-07-30'
GROUP BY date
ORDER BY total_sales DESC
LIMIT 1
If "most sales" is supposed to mean "most number of sales", replace SUM($) with COUNT(*).
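The three patterns discussed in these answers can be tried together in a small sketch, run here from Python on an in-memory SQLite database rather than MySQL. The column names order_date and amount are stand-ins for the question's date and $ columns, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                     amount REAL, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES
  (10, '2018-07-09', 20.0, 1),
  (11, '2018-07-09',  5.0, 1),
  (12, '2018-07-10', 50.0, 2),
  (13, '2018-07-11', 30.0, 1);
""")

# 1) Customers with at least one order on the date: no join or grouping needed.
ordered = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM orders WHERE order_date = ?",
    ("2018-07-09",)).fetchone()[0]

# 2) Customers with no order on the date: the date test sits in the ON clause,
#    so unmatched customers survive the LEFT JOIN with NULL order columns.
not_ordered = conn.execute("""
    SELECT COUNT(*)
    FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.order_date = ?
    WHERE o.order_id IS NULL""", ("2018-07-09",)).fetchone()[0]

# 3) Date with the highest total sales in the range (ties are not handled).
best_day = conn.execute("""
    SELECT order_date, SUM(amount) AS total_sales
    FROM orders
    WHERE order_date BETWEEN '2018-07-01' AND '2018-07-30'
    GROUP BY order_date
    ORDER BY total_sales DESC
    LIMIT 1""").fetchone()
print(ordered, not_ordered, best_day)
```

Moving the date condition of query 2 from ON to WHERE would turn the LEFT JOIN into an effective inner join and the count would drop to zero.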

Using an alias in where clause with a group by

Using a SQL query, I am trying to find the number of users that have had page views greater than 5 in a given month.
What I have so far does exactly that, except that I can't add the condition of a minimum of 5 page views. It currently shows the number of users who have had at least 1 page view in a given month.
SELECT CONCAT(MONTH(analytics.date),'/',YEAR(analytics.date)) AS DATE,
COUNT(analytics.id) AS views,
COUNT(DISTINCT users.id) AS num_users
FROM users
LEFT JOIN analytics ON users.id = analytics.user_id
WHERE users.banned = 0
AND analytics.id IS NOT NULL
GROUP BY YEAR(analytics.date), MONTH(analytics.date)
I tried adding AND views > 5 in the where clause but that didn't work as I get an unknown column.
I don't think a HAVING clause will work as this is applied after the GROUP BY and I need to find individual users who have had more than 5 page views.
How else can I achieve this?
If this is your requirement, then you need to aggregate twice, once at the user level and second at the analytics level. Or, use a subquery in the where clause. Here is what you may need:
SELECT CONCAT(MONTH(a.date),'/',YEAR(a.date)) AS DATE,
COUNT(a.id) AS views,
COUNT(DISTINCT u.id) AS num_users
FROM users u LEFT JOIN
analytics a
ON u.id = a.user_id
WHERE u.banned = 0 AND a.id IS NOT NULL AND
5 <= (SELECT COUNT(*) FROM analytics a2 WHERE a2.user_id = u.id)
GROUP BY YEAR(a.date), MONTH(a.date);
This uses the overall count for the limit.
EDIT: To speed up the subquery, be sure you have an index on analytics(user_id, date).
You have to use a subquery for this, since you're selecting which users feed into the GROUP BY. Here, we do a subquery in the WHERE clause to ask for each row if the user has at least five entries in the analytics table.
SELECT CONCAT(MONTH(analytics.date),'/',YEAR(analytics.date)) AS DATE,
COUNT(analytics.id) AS views,
COUNT(DISTINCT users.id) AS num_users
FROM users
LEFT JOIN analytics ON users.id = analytics.user_id
WHERE users.banned = 0
AND (SELECT COUNT(*) FROM analytics AS a WHERE a.user_id = users.id) > 5
AND analytics.id IS NOT NULL
GROUP BY YEAR(analytics.date), MONTH(analytics.date)
If you want there to be more than 5 views for the user in the given month, then you have to modify your query and you'll need to use an inner join:
SELECT CONCAT(MONTH(analytics.date),'/',YEAR(analytics.date)) AS DATE,
COUNT(analytics.id) AS views,
COUNT(DISTINCT users.id) AS num_users
FROM users
JOIN analytics ON users.id = analytics.user_id
WHERE users.banned = 0
AND (SELECT COUNT(*) FROM analytics AS a WHERE a.user_id = users.id AND EXTRACT(YEAR_MONTH FROM a.date) = EXTRACT(YEAR_MONTH FROM analytics.date)) > 5
AND analytics.id IS NOT NULL
GROUP BY YEAR(analytics.date), MONTH(analytics.date)
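The "aggregate twice" idea mentioned in the first answer can also be written with a derived table instead of a correlated subquery. The sketch below is one possible reading, assuming "more than 5 views in that month": it counts views per user per month, filters with HAVING, then counts the surviving users per month. It runs from Python on in-memory SQLite with invented rows, and strftime('%Y-%m', ...) replaces the MySQL YEAR()/MONTH() calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, banned INTEGER);
CREATE TABLE analytics (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
INSERT INTO users VALUES (1, 0), (2, 0), (3, 1);
""")
views = [(1, f"2015-03-{d:02d}") for d in range(1, 7)]    # user 1: 6 views in March
views += [(2, f"2015-03-{d:02d}") for d in range(1, 4)]   # user 2: 3 views in March
views += [(3, f"2015-03-{d:02d}") for d in range(1, 9)]   # user 3: 8 views, but banned
conn.executemany("INSERT INTO analytics (user_id, date) VALUES (?, ?)", views)

# Inner query: one row per (month, user) with more than 5 views.
# Outer query: roll those rows up to one row per month.
HEAVY_USERS = """
SELECT per_user.month,
       SUM(per_user.user_views) AS total_views,
       COUNT(*) AS num_users
FROM (SELECT strftime('%Y-%m', a.date) AS month,
             a.user_id,
             COUNT(*) AS user_views
      FROM analytics a
      JOIN users u ON u.id = a.user_id
      WHERE u.banned = 0
      GROUP BY strftime('%Y-%m', a.date), a.user_id
      HAVING COUNT(*) > 5) AS per_user
GROUP BY per_user.month
"""
heavy = conn.execute(HEAVY_USERS).fetchall()
print(heavy)
```

Here HAVING does work, because it is applied at the per-user level inside the derived table rather than on the final per-month grouping.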

MySQL query with GROUP BY and ORDER BY timestamp DESC

I am saving the history of Facebook likes for a page, identified by user_id.
Now from this table, I need to get a set representing the user_id's and their latest number of likes, based on the most recent timestamp.
I started off with this:
SELECT *
FROM facebook_log
GROUP BY user_id
ORDER BY timestamp DESC;
But that does not do what I want because it returns the first records with the lowest timestamps.
I read something online about GROUP returning the very first records from the table.
I also understood something about JOIN the table with itself, but that doesn't work either, or I did something wrong.
If you just need the user_id and the timestamp, you can just do
select f.user_id, max(f.timestamp)
from facebook_log f
group by f.user_id;
if you need all the data from the table, you can do
select f.*
from facebook_log f
inner join (select max(timestamp) mt, user_id
from facebook_log
group by user_id) m
on m.user_id = f.user_id and m.mt = f.timestamp
You can also get the latest number of likes by using this MySQL trick:
select f.user_id, max(f.timestamp),
substring_index(group_concat(f.numlikes order by f.timestamp desc), ',', 1) as LatestLikes
from facebook_log f
group by f.user_id;
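For reference, the join against the per-user maximum timestamp can be exercised with a few rows. This sketch uses Python with in-memory SQLite in place of MySQL, and the sample data is invented; the column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facebook_log (user_id INTEGER, numlikes INTEGER, timestamp TEXT);
INSERT INTO facebook_log VALUES
  (1, 10, '2013-01-01 10:00:00'),
  (1, 15, '2013-01-02 10:00:00'),
  (2,  7, '2013-01-01 09:00:00'),
  (2,  4, '2013-01-03 09:00:00');
""")

# Find each user's newest timestamp, then join back to fetch the whole row.
LATEST = """
SELECT f.user_id, f.numlikes, f.timestamp
FROM facebook_log f
JOIN (SELECT user_id, MAX(timestamp) AS mt
      FROM facebook_log
      GROUP BY user_id) m
  ON m.user_id = f.user_id AND m.mt = f.timestamp
ORDER BY f.user_id
"""
latest = conn.execute(LATEST).fetchall()
print(latest)
```

If two rows for the same user share the same maximum timestamp, this join returns both of them.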

get the average time for time from subscription until payment

I have two tables. The first is subscribers. Subscribers are also assigned to a category. The second table is payments that the subscribers made. I want to know what the average time is between the time of subscription and the FIRST payment of a subscriber (they can make multiple).
Here is a piece of SQL, but it doesn't do what I want just yet - although I have the feeling I'm close ;)
SELECT category,
AVG(TIMESTAMPDIFF(HOUR, subs.timestamp, MIN(payments.timestamp)))
FROM subs
JOIN payments ON (payments.user_id = subs.user_id)
GROUP BY category
Now I get "Invalid use of group function" - because of the MIN function, so that ain't right. What do I have to do now? Thanks in advance!
SELECT category,
AVG(TIMESTAMPDIFF(HOUR, subs.timestamp, p.timestamp))
FROM subs
JOIN ( SELECT user_id
, min(timestamp) timestamp
FROM payments
GROUP BY user_id
) p
ON p.user_id = subs.user_id
GROUP BY category
If you needed to update another table with the results of this query, you could do something like this (not tested, so there may be syntax errors but hopefully you get the idea). I assume that another_table has category and avg_hrs_spent columns.
UPDATE another_table
SET avg_hrs_spent =
(
SELECT a.avg_hrs_spent
FROM
(
SELECT category,
AVG(TIMESTAMPDIFF(HOUR, subs.timestamp, p.timestamp)) avg_hrs_spent
FROM subs
JOIN ( SELECT user_id
, min(timestamp) timestamp
FROM payments
GROUP BY user_id
) p
ON p.user_id = subs.user_id
GROUP BY category
) a
WHERE a.category = another_table.category
)
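The SELECT from this answer (reduce payments to one first-payment row per user, then average per category) can be checked on a few rows. The sketch below runs from Python on in-memory SQLite, not MySQL, with invented data. Since SQLite has no TIMESTAMPDIFF, whole hours are computed from Unix seconds via strftime('%s', ...), which is a substitution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subs (user_id INTEGER PRIMARY KEY, category TEXT, timestamp TEXT);
CREATE TABLE payments (user_id INTEGER, timestamp TEXT);
INSERT INTO subs VALUES
  (1, 'A', '2014-01-01 00:00:00'),
  (2, 'A', '2014-01-01 00:00:00'),
  (3, 'B', '2014-02-01 00:00:00');
INSERT INTO payments VALUES
  (1, '2014-01-01 10:00:00'),   -- first payment after 10 hours
  (1, '2014-01-05 00:00:00'),   -- later payment, must be ignored
  (2, '2014-01-02 06:00:00'),   -- first payment after 30 hours
  (3, '2014-02-01 05:00:00');   -- first payment after 5 hours
""")

# MIN() runs in the derived table, so the outer AVG() never nests an aggregate.
AVG_HOURS = """
SELECT s.category,
       AVG((CAST(strftime('%s', p.first_payment) AS INTEGER)
          - CAST(strftime('%s', s.timestamp) AS INTEGER)) / 3600) AS avg_hours
FROM subs s
JOIN (SELECT user_id, MIN(timestamp) AS first_payment
      FROM payments
      GROUP BY user_id) p
  ON p.user_id = s.user_id
GROUP BY s.category
ORDER BY s.category
"""
avg_hours = conn.execute(AVG_HOURS).fetchall()
print(avg_hours)
```

Subscribers without any payment drop out of the inner join, so they do not affect the average.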