For a game, I want to count the number of user sign-ups by hour (using MySQL).
Quite easy, something like this:
SELECT COUNT(*), DAY(date_user), HOUR(date_user)
FROM users
GROUP BY DAY(date_user), HOUR(date_user)
After that, I want to consider only users who have played the game at least once. I have a second table with scores.
SELECT COUNT(*), DAY(date_user), HOUR(date_user)
FROM users, scores
WHERE users.id = scores.userid
GROUP BY DAY(date_user), HOUR(date_user)
Great...
Now, I want the two queries to result in one table, like:
Global signups | Signups of playing users | Day | Hour
I have not found a working query for this yet. Should I use unions? Or joins?
Here's one way:
select
day(date_user)
, hour(date_user)
, count(distinct u.id) as GlobalSignups
, count(distinct s.userid) as SignupsOfPlayingUsers
from users u
left join scores s on u.id = s.userid
group by day(date_user), hour(date_user)
Counting distinct user ids gives the total number of users. When the left join finds no matching score, s.userid is NULL, and count() does not count NULLs. So count(distinct s.userid) returns the number of signed-up users who have at least one score.
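To see the two counts side by side, here is a minimal sketch in Python, assuming SQLite as a stand-in for MySQL. The sample rows are made up, and strftime replaces DAY()/HOUR(), which SQLite does not have.

```python
import sqlite3

# Hypothetical miniature of the users/scores tables (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, date_user TEXT);
    CREATE TABLE scores (userid INTEGER, score INTEGER);
    INSERT INTO users VALUES
        (1, '2024-01-05 10:15:00'),
        (2, '2024-01-05 10:40:00'),
        (3, '2024-01-05 11:05:00');
    INSERT INTO scores VALUES (1, 50), (1, 70), (3, 20);
""")
# User 1 played twice, user 3 once, user 2 never.

rows = conn.execute("""
    SELECT strftime('%d', u.date_user) AS day,
           strftime('%H', u.date_user) AS hour,
           COUNT(DISTINCT u.id)        AS global_signups,
           COUNT(DISTINCT s.userid)    AS signups_of_playing_users
    FROM users u
    LEFT JOIN scores s ON u.id = s.userid
    GROUP BY day, hour
    ORDER BY day, hour
""").fetchall()

for row in rows:
    print(row)
```

With this data, hour 10 has two signups but only one playing user, even though that user has two score rows.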
I have 3 tables:
I would like to select the difference of the total gain and total spent per user. So my hypothetical table could be:
I tried this:
SELECT g.total - s.total AS quantity, id FROM
(SELECT SUM(quantity) AS total FROM gain GROUP BY user) AS g,
(SELECT SUM(quantity) AS total FROM spent GROUP BY user) AS s, users
But it doesn't work...
You need to use the users table as the base table so that every user is considered, and then LEFT JOIN it to the subqueries that compute the total spent and total gain. This is necessary because some users may have no rows in the gain or spent tables. The COALESCE() function turns the resulting NULL (no matching row) into 0.
SELECT
u.id AS user,
COALESCE(tot_gain, 0) - COALESCE(tot_spent, 0) AS balance
FROM users AS u
LEFT JOIN (SELECT user, SUM(quantity) as tot_spent
FROM spent
GROUP BY user) AS s ON s.user = u.id
LEFT JOIN (SELECT user, SUM(quantity) as tot_gain
FROM gain
GROUP BY user) AS g ON g.user = u.id
Madhur's solution is fine. An alternative is union all and group by:
select user, sum(gain) as gain, sum(spent) as spent
from ((select user, quantity as gain, 0 as spent
from gain
) union all
(select user, 0, quantity as spent
from spent
)
) u
group by user;
You can join to users if you want users that are not in either table or you need additional columns. However, that join may not be necessary.
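The difference between the two approaches can be checked with a small sketch in Python, assuming SQLite as a stand-in for MySQL. The data is made up, and the column is called user_id here instead of user purely for the illustration.

```python
import sqlite3

# Hypothetical data: user 1 gains and spends, user 2 only gains, user 3 has no rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE gain  (user_id INTEGER, quantity INTEGER);
    CREATE TABLE spent (user_id INTEGER, quantity INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO gain  VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO spent VALUES (1, 4);
""")

# LEFT JOIN from users to pre-aggregated totals: every user appears.
left_join_rows = conn.execute("""
    SELECT u.id, COALESCE(g.tot_gain, 0) - COALESCE(s.tot_spent, 0) AS balance
    FROM users u
    LEFT JOIN (SELECT user_id, SUM(quantity) AS tot_gain
               FROM gain GROUP BY user_id) g ON g.user_id = u.id
    LEFT JOIN (SELECT user_id, SUM(quantity) AS tot_spent
               FROM spent GROUP BY user_id) s ON s.user_id = u.id
    ORDER BY u.id
""").fetchall()

# UNION ALL + GROUP BY: only users with at least one gain or spent row appear.
union_rows = conn.execute("""
    SELECT user_id, SUM(gain) - SUM(spent) AS balance
    FROM (SELECT user_id, quantity AS gain, 0 AS spent FROM gain
          UNION ALL
          SELECT user_id, 0, quantity FROM spent) t
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(left_join_rows)
print(union_rows)
```

Both give the same balances for users 1 and 2. User 3 shows up (with balance 0) only in the LEFT JOIN version.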
I have the following scenario:
Table: users : user_id, username ,...
Table: login: user_id, login_date, ...
Table: point: user_id, points, point_time
Joins will be on the basis of users.user_id with other tables.
Now, I want to get count of all the logins as well as sum of all the points earned by the user.
Now, when I do:
select users.user_id,count(*) from users
inner join login on users.user_id=login.user_id
group by users.user_id
It returns count as 36 (for example).
Whenever I run:
select users.user_id,count(*),sum(points) from users
inner join point on users.user_id=point.user_id
group by users.user_id
It returns sum as 400 (for example) and count as 2.
But if I combine both the queries:
select users.user_id,count(*),sum(points) from users
inner join login on users.user_id=login.user_id
inner join point on users.user_id=point.user_id
group by users.user_id
It returns count as 72 (36 * 2) and sum as 800 (400 * 2).
Twice because of multiple userIds present.
I tried several things like combining with distincts but nothing seems to work. Please help. Better if it's possible with joins alone. Thanks in advance. I am using MySQL with PHP.
You can count the logins and sum the points in separate subqueries, then join both results to users:
select users.user_id, l.login_count, p.points_sum from users
inner join (select user_id, count(*) as login_count from login
group by user_id) as l on users.user_id = l.user_id
inner join (select user_id, sum(points) as points_sum
from point group by user_id) as p on users.user_id = p.user_id
You should be able to do your count by joining in your login table and then including a subquery to get your sum of points:
select users.user_id, count(*) as login_count,
(select sum(points) from point
where point.user_id = users.user_id) as points_sum
from users
inner join login on users.user_id=login.user_id
group by users.user_id
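The multiplication comes from joining two independent one-to-many tables: every login row gets paired with every point row of the same user. A minimal sketch in Python, assuming SQLite as a stand-in for MySQL and made-up data (3 logins, 2 point rows), shows the effect and the pre-aggregation fix:

```python
import sqlite3

# Hypothetical data: one user with 3 logins and 2 point rows (100 + 300 = 400).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE login (user_id INTEGER, login_date TEXT);
    CREATE TABLE point (user_id INTEGER, points INTEGER, point_time TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO login VALUES (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-03');
    INSERT INTO point VALUES (1, 100, '2024-01-01'), (1, 300, '2024-01-02');
""")

# Joining both child tables directly yields 3 x 2 = 6 rows per user before grouping.
naive = conn.execute("""
    SELECT u.user_id, COUNT(*), SUM(p.points)
    FROM users u
    JOIN login l ON u.user_id = l.user_id
    JOIN point p ON u.user_id = p.user_id
    GROUP BY u.user_id
""").fetchall()

# Aggregating each child table first leaves one row per user on each side of the join.
fixed = conn.execute("""
    SELECT u.user_id, l.login_count, p.points_sum
    FROM users u
    JOIN (SELECT user_id, COUNT(*) AS login_count
          FROM login GROUP BY user_id) l ON u.user_id = l.user_id
    JOIN (SELECT user_id, SUM(points) AS points_sum
          FROM point GROUP BY user_id) p ON u.user_id = p.user_id
""").fetchall()

print(naive)
print(fixed)
```

The naive query reports 6 logins and 1200 points. The pre-aggregated one reports the expected 3 and 400.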
I have the following problem with my query:
I have two tables:
Customer
Subscriber
linked together by customer.id=subscriber.customer_id
In the subscriber table, I have records with customer_id=0 (these are email records that do not have a full customer account).
Now I want to show how many customers I have per day, how many subscribers with a customer_id, and how many subscribers with customer_id=0 (I call them emailonlies).
Somehow, I cannot manage to get those emailonlies.
Perhaps it has something to do with not using the right join type.
When I use a left join, I get the right number of customers, but not the right number of emailonlies. When I use an inner join, I get the wrong number of customers. Am I using the group function correctly? I think it has something to do with that.
THIS IS MY QUERY:
SELECT DATE(c.date_register),
COUNT(DISTINCT c.id) AS newcustomers,
COUNT(DISTINCT s.customer_id) AS newsubscribedcustomers,
COUNT(DISTINCT s.subscriber_id AND s.customer_id=0) AS emailonlies
FROM customer c
LEFT JOIN subscriber s ON s.customer_id=c.id
GROUP BY DATE(c.date_register)
ORDER BY DATE(c.date_register) DESC
LIMIT 10
;
I'm not entirely sure, but I think in DISTINCT s.subscriber_id AND s.customer_id=0, it runs the AND before the DISTINCT, so the DISTINCT only ever sees true and false.
Why don't you just take
COUNT(DISTINCT s.subscriber_id) - (COUNT(DISTINCT s.customer_id) - 1)?
(The -1 is there because DISTINCT s.customer_id will count 0.)
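The point about AND being evaluated before DISTINCT can be checked in isolation. This is a minimal sketch in Python, assuming SQLite as a stand-in for MySQL and made-up rows. It queries the subscriber table directly rather than through the left join, and the CASE form is one common way to count distinct ids under a condition, not something taken from the question.

```python
import sqlite3

# Hypothetical data: subscribers 2, 3 and 5 are email-only (customer_id = 0).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriber (subscriber_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO subscriber VALUES (1, 10), (2, 0), (3, 0), (4, 11), (5, 0);
""")

boolean_count, case_count = conn.execute("""
    SELECT COUNT(DISTINCT subscriber_id AND customer_id = 0),
           COUNT(DISTINCT CASE WHEN customer_id = 0 THEN subscriber_id END)
    FROM subscriber
""").fetchone()

# The first expression collapses to the booleans 0 and 1, so at most 2 distinct values.
# The second yields the subscriber_id for email-only rows and NULL otherwise.
print(boolean_count, case_count)
```

With three email-only rows, the boolean form still reports 2, while the CASE form reports 3.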
Got it. The only risk is that I get no emailonlies if there are no customers on a given day, because of the left join. But this one works:
SELECT customers.regdatum,customers.customersqty,subscribers.emailonlies
FROM (
(SELECT DATE(c.date_register) AS regdatum,COUNT(DISTINCT c.id) AS customersqty
FROM customer c
GROUP BY DATE(c.date_register)
) AS customers
LEFT JOIN
(SELECT DATE(s.added) AS voegdatum,COUNT(DISTINCT s.subscriber_id) AS emailonlies
FROM subscriber s
WHERE s.customer_id=0
GROUP BY DATE(s.added)
) AS subscribers
ON customers.regdatum=subscribers.voegdatum
)
ORDER BY customers.regdatum DESC
;
I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says this should work as expected. So it may be working now, but it may stop working in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The: SQL-fiddle
A compound index on (id_people, year) would help efficiency.
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0
But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
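Here is a small sketch of the tie problem and the MAX(id) tie-breaker, written in Python and assuming SQLite as a stand-in for MySQL. The rows are made up: Ann has two visits in her latest year.

```python
import sqlite3

# Hypothetical data: Ann has two visits in 2012, Bob has a single visit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER, year INTEGER, note TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO visits VALUES
        (1, 1, 2010, 'a'), (2, 1, 2012, 'b'), (3, 1, 2012, 'c'), (4, 2, 2011, 'd');
""")

# Joining on MAX(year) alone returns one row per visit in the latest year.
with_ties = conn.execute("""
    SELECT p.name, v.year, v.note
    FROM people p
    JOIN (SELECT id_people, MAX(year) AS year
          FROM visits GROUP BY id_people) ym ON ym.id_people = p.id
    JOIN visits v ON v.id_people = ym.id_people AND v.year = ym.year
    ORDER BY p.id, v.id
""").fetchall()

# Taking MAX(id) within the latest year leaves exactly one visit per person.
one_per_person = conn.execute("""
    SELECT p.id, p.name, v.year, v.note
    FROM (SELECT v.id_people, MAX(v.id) AS id
          FROM (SELECT id_people, MAX(year) AS year
                FROM visits GROUP BY id_people) ym
          JOIN visits v ON ym.id_people = v.id_people AND ym.year = v.year
          GROUP BY v.id_people) m
    JOIN people p ON m.id_people = p.id
    JOIN visits v ON m.id = v.id
    ORDER BY p.id
""").fetchall()

print(with_ties)
print(one_per_person)
```

The first query lists Ann twice. The second keeps only her visit with the highest id in 2012.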
I have two tables user (one) and transaction (many) and I need to get the average time in days from when a user was created to when they made their first transaction. I'm using AVG(TIMESTAMPDIFF) which is working well, except that the GROUP BY returns an average against every user instead of one single average for all unique users in the transaction table. If I remove the GROUP BY, I get a single average figure but it takes into account multiple transactions from users, whereas I just want to have one per user (the first they made).
Here's my SQL:
SELECT AVG(TIMESTAMPDIFF(DAY, u.date_created, t.transaction_date)) AS average
FROM transaction t
LEFT JOIN user u ON u.id = t.user_id
WHERE t.user_id IS NOT NULL AND t.status = 1
GROUP BY t.user_id;
I'd appreciate it if someone can help me return the average for unique users only. It's fine to break the query down into two, but the tables are large so returning lots of data and putting it back in is a no-go. Thanks in advance.
SELECT AVG(TIMESTAMPDIFF(DAY, s.date_created, s.first_transaction_date)) AS average
FROM (
SELECT u.date_created, MIN(t.transaction_date) AS first_transaction_date
FROM transaction t
INNER JOIN user u ON u.id = t.user_id
WHERE t.status = 1
GROUP BY t.user_id, u.date_created
) s
I replaced the LEFT JOIN with an INNER JOIN because I think that's what you want, but it's not 100% equivalent to your WHERE t.user_id IS NOT NULL.
Feel free to put the LEFT JOIN back if need be.
select avg( TIMESTAMPDIFF(DAY, u.date_created, min_tdate) ) as average
from user u
inner join
(select t.user_id, min(t.transaction_date) as min_tdate
from transaction t
where t.status=1
group by t.user_id
) as min_t
on u.id=min_t.user_id;
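The effect of reducing to one first transaction per user before averaging can be seen in a minimal sketch in Python, assuming SQLite as a stand-in for MySQL. The tables are named users and transactions here because TRANSACTION is a keyword in SQLite, the data is made up, and a julianday difference stands in for TIMESTAMPDIFF(DAY, ...).

```python
import sqlite3

# Hypothetical data: both users were created on Jan 1.
# User 1 transacts on Jan 3 and Jan 10; user 2 has a status-0 row on Jan 2 and a valid one on Jan 5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, date_created TEXT);
    CREATE TABLE transactions (user_id INTEGER, transaction_date TEXT, status INTEGER);
    INSERT INTO users VALUES (1, '2024-01-01'), (2, '2024-01-01');
    INSERT INTO transactions VALUES
        (1, '2024-01-03', 1),
        (1, '2024-01-10', 1),
        (2, '2024-01-02', 0),
        (2, '2024-01-05', 1);
""")

# One row per user (the first valid transaction), then a single average over those rows.
first_tx_avg = conn.execute("""
    SELECT AVG(CAST(julianday(m.min_tdate) - julianday(u.date_created) AS INTEGER))
    FROM users u
    JOIN (SELECT user_id, MIN(transaction_date) AS min_tdate
          FROM transactions
          WHERE status = 1
          GROUP BY user_id) m ON u.id = m.user_id
""").fetchone()[0]

# Without the per-user reduction, repeat transactions pull the average up.
all_tx_avg = conn.execute("""
    SELECT AVG(CAST(julianday(t.transaction_date) - julianday(u.date_created) AS INTEGER))
    FROM transactions t
    JOIN users u ON u.id = t.user_id
    WHERE t.status = 1
""").fetchone()[0]

print(first_tx_avg, all_tx_avg)
```

The first-transaction average is 3 days (2 and 4). Averaging over every valid transaction gives 5 days (2, 9 and 4).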