I posted about this a few weeks ago, but I don't think I asked the question clearly because the answers I got were not what I was looking for. I think it's best to start again.
I'm trying to query a database to retrieve the number of unique entries over time. The data looks something like this:
Day | UserID
1 | A
1 | B
2 | B
3 | A
4 | B
4 | C
5 | D
I'd like the query result to look like this:
Time Span | COUNT(DISTINCT UserID)
Day 1 to Day 1 | 2
Day 1 to Day 2 | 2
Day 1 to Day 3 | 2
Day 1 to Day 4 | 3
Day 1 to Day 5 | 4
If I do something like
SELECT COUNT(DISTINCT `UserID`) FROM `table` GROUP BY `Day`
the distinct counts will not consider user IDs from previous days.
Any ideas? The data set I'm using is quite large, so running multiple queries and post-processing the results takes a long time (that's how I'm currently doing it).
Thanks
You can use a correlated subquery.
Sample table:
create table visits (day int, userid char(1));
insert visits values
(1,'a'),
(1,'b'),
(2,'b'),
(3,'a'),
(4,'b'),
(4,'c'),
(5,'d');
The query:
select d.day,
(select count(distinct userid) from visits where day <= d.day) as distinct_users
from (select distinct day from visits) d
order by d.day;
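To check the correlated subquery, here is a small sketch that loads the sample rows into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here; this particular query is plain SQL that both accept. The function name cumulative_distinct_users is illustrative.

```python
import sqlite3

# Sample rows from the answer above.
VISITS = [(1, "a"), (1, "b"), (2, "b"), (3, "a"), (4, "b"), (4, "c"), (5, "d")]


def cumulative_distinct_users(conn):
    """For each day, count distinct users seen on that day or any earlier day."""
    query = """
        SELECT d.day,
               (SELECT COUNT(DISTINCT v.userid)
                  FROM visits AS v
                 WHERE v.day <= d.day) AS distinct_users
          FROM (SELECT DISTINCT day FROM visits) AS d
         ORDER BY d.day
    """
    return conn.execute(query).fetchall()


# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day INTEGER, userid TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", VISITS)
print(cumulative_distinct_users(conn))
# [(1, 2), (2, 2), (3, 2), (4, 3), (5, 4)]
```

Because the subquery runs once per distinct day and each run scans every earlier row, the cost grows roughly with days × rows; on a large table an index on (day, userid) is likely worth trying first.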
How about something like this:
SELECT Count(UserID), Day
FROM
(SELECT Count(UserID) as Logons, UserID, Day
FROM yourDailyLog
GROUP BY Day, UserID) AS daily
GROUP BY Day
The inner select should eliminate duplicate visits by the same user on a given day.
Stay away from DISTINCT. It is usually a questionable approach to almost any SQL problem.
Wait: I see now that you want the time period to increase over time. That makes things a little trickier. Why don't you aggregate the rest of this information in code rather than doing it all through sql?
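A sketch of the "aggregate it in code" suggestion, in Python, assuming the rows come back from a single hypothetical query such as SELECT Day, UserID FROM yourDailyLog. One pass over the rows keeps a set of every user seen so far and records its size at the end of each day; running_distinct_counts is an illustrative name, not an existing API.

```python
from itertools import groupby
from operator import itemgetter


def running_distinct_counts(rows):
    """Given (day, user_id) rows, return (day, distinct users from the first day through that day)."""
    seen = set()
    result = []
    # Sort by day so that each groupby() group holds all rows for one day.
    for day, day_rows in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        seen.update(user for _, user in day_rows)
        result.append((day, len(seen)))
    return result


rows = [(1, "A"), (1, "B"), (2, "B"), (3, "A"), (4, "B"), (4, "C"), (5, "D")]
print(running_distinct_counts(rows))  # [(1, 2), (2, 2), (3, 2), (4, 3), (5, 4)]
```

Memory grows with the number of distinct users, since the set keeps every ID seen so far.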
Related
I'd like your help creating a query that does the following:
I have an online shop, and a database where I store the products each user has viewed, with a timestamp. I want to visualize 'product sessions': for each day, the time the user spent viewing products (and how many products were viewed that day).
An example:
My recently viewed table with user ID
I want a query that gives me this output:
| DAY | TIME | PRODUCTS |
----------------------------------
|2018-07-31| 00:00:04 | 2 |
----------------------------------
|2018-08-01| 02:38:56 | 5 |
So far I was only able to do this:
SELECT DATE(`added_timestamp`) AS day, COUNT(*) AS num_products
FROM tb_recently_viewed
WHERE `user_id`= 'bac240e3eefbb7dff0bc03d00f392f0d'
GROUP BY DATE(`added_timestamp`)
ORDER BY day
This outputs the number of products seen per day:
Any help would be appreciated.
Here you go:
select
date(added_timestamp) as day,
count(*) as num_products,
timediff(max(time(added_timestamp)), min(time(added_timestamp))) as time_diff
from tb_recently_viewed
where user_id = 'bac240e3eefbb7dff0bc03d00f392f0d'
group by date(added_timestamp)
Result:
day num_products time_diff
---------- ------------ ---------
2018-07-31 2 00:00:04
2018-08-01 2 02:38:56
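The same idea can be tried in SQLite with Python's sqlite3 module. SQLite has no TIMEDIFF(), so this sketch subtracts Unix-epoch seconds obtained with strftime('%s', ...) and formats the difference with time(..., 'unixepoch'). The rows are hypothetical sample data for the user ID from the question, and the query filters on user_id as the question's own query does. Note that it measures the span from the first to the last view of the day, not the time actually spent on each product.

```python
import sqlite3

# Hypothetical sample rows for one user (not the asker's real data).
USER = "bac240e3eefbb7dff0bc03d00f392f0d"
VIEWS = [
    (USER, "2018-07-31 10:00:00"),
    (USER, "2018-07-31 10:00:04"),
    (USER, "2018-08-01 09:00:00"),
    (USER, "2018-08-01 10:00:00"),
    (USER, "2018-08-01 11:38:56"),
]


def product_sessions(conn, user_id):
    """Per day: span between first and last view as HH:MM:SS, plus the number of views."""
    query = """
        SELECT date(added_timestamp) AS day,
               -- SQLite has no TIMEDIFF(): subtract Unix seconds, then format them
               time(MAX(CAST(strftime('%s', added_timestamp) AS INTEGER))
                    - MIN(CAST(strftime('%s', added_timestamp) AS INTEGER)),
                    'unixepoch') AS time_spent,
               COUNT(*) AS num_products
          FROM tb_recently_viewed
         WHERE user_id = ?
         GROUP BY date(added_timestamp)
         ORDER BY day
    """
    return conn.execute(query, (user_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_recently_viewed (user_id TEXT, added_timestamp TEXT)")
conn.executemany("INSERT INTO tb_recently_viewed VALUES (?, ?)", VIEWS)
print(product_sessions(conn, USER))
# [('2018-07-31', '00:00:04', 2), ('2018-08-01', '02:38:56', 3)]
```

Since each group is a single calendar day, the span stays under 24 hours, so time(..., 'unixepoch') never wraps around.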
I guess there are many variants of this question, but this one has a twist.
My primary table contains logged kilometers for certain dates for certain users:
Table km_run, where 'dato' is the specific date. Rows look like this:
|entry|mnumber|dato      |km |
|1    |3      |2013-01-01|5.7|
For a specific user ('mnumber') I want to calculate the sum in each week of a year. For this purpose I have made a 'dummy-table' just containing the week numbers from 1 to 53:
Table `week_list`:
|week|
|1 |
|2 |
etc.
This query gives the sum; however, I cannot find a way to return zero when there are no entries in 'km_run' for a specific week.
SELECT `week_list`.`week`, WEEKOFYEAR(`km_run`.`dato`), SUM(`km_run`.`km`)
FROM `week_list` LEFT JOIN `km_run` ON WEEKOFYEAR(`dato`) = `week_list`.`week`
WHERE `km_run`.`mnumber` = 3 AND `km_run`.`dato` >= '2013-01-01'
AND `km_run`.`dato` < '2014-01-01'
GROUP BY WEEKOFYEAR(`dato`)
I have tried to do COALESCE( SUM(km),0) and I have also tried to use the IFNULL function around the sum. Despite the left join, not all records from week_list are returned in the sql statement.
Here's the result:
week | WEEKOFYEAR(`km_run`.`dato`) | SUM(`km_run`.`km`)
1 | 1 | 58.4
3 | 3 | 50.7
4 | 4 | 39.2
As you can see, week 2 is skipped instead of returning 0.
First, the LEFT JOIN works and creates rows such as:
week=2 weekofyear=NULL mnumber=NULL km=NULL ...
Then the WHERE clause (for example, WHERE mnumber = 3) excludes those rows, because NULL = 3 is never true.
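Following that explanation, the usual alternative is to keep the km_run filters in the ON clause, so weeks without matching rows survive the LEFT JOIN, and to group by week_list.week, since WEEKOFYEAR(dato) is NULL for every unmatched week. Below is a minimal sketch using Python's sqlite3 module with hypothetical sample rows. week_list is trimmed to weeks 1 to 4. SQLite's strftime('%W', ...) (Monday-based, with week 01 starting on the first Monday of the year) stands in for MySQL's ISO-based WEEKOFYEAR(), so week numbers near the start and end of a year can differ from MySQL's.

```python
import sqlite3


def weekly_km(conn, mnumber, start, end):
    """Total km per week in week_list for one mnumber; 0 for weeks with no runs."""
    query = """
        SELECT w.week, COALESCE(SUM(k.km), 0) AS total_km
          FROM week_list AS w
          LEFT JOIN km_run AS k
            -- strftime %W stands in for MySQL WEEKOFYEAR()
            ON CAST(strftime('%W', k.dato) AS INTEGER) = w.week
           AND k.mnumber = ?            -- filters live in ON, not WHERE,
           AND k.dato >= ?              -- so unmatched weeks keep their row
           AND k.dato < ?
         GROUP BY w.week                -- not the km_run week, which is NULL here
         ORDER BY w.week
    """
    return conn.execute(query, (mnumber, start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE week_list (week INTEGER)")
conn.executemany("INSERT INTO week_list VALUES (?)", [(w,) for w in range(1, 5)])
conn.execute("CREATE TABLE km_run (entry INTEGER, mnumber INTEGER, dato TEXT, km REAL)")
conn.executemany(
    "INSERT INTO km_run VALUES (?, ?, ?, ?)",
    [
        (1, 3, "2013-01-08", 5.5),   # week 1
        (2, 3, "2013-01-09", 3.25),  # week 1
        (3, 4, "2013-01-15", 7.0),   # week 2, other user: must not count for mnumber 3
        (4, 3, "2013-01-22", 10.0),  # week 3
    ],
)
print(weekly_km(conn, 3, "2013-01-01", "2014-01-01"))
# [(1, 8.75), (2, 0), (3, 10.0), (4, 0)]
```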
You could try something like this:
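SELECT week, SUM(km) FROM (
(SELECT km_run.km AS km, WEEKOFYEAR(km_run.dato) AS week
FROM km_run
WHERE mnumber = 3 AND km_run.dato >= '2013-01-01' AND km_run.dato < '2014-01-01')
UNION ALL -- plain UNION would merge identical (km, week) rows, e.g. two 5.7 km runs in one week
(SELECT 0 AS km, week_list.week AS week FROM week_list)
) AS weekly -- MySQL requires an alias on every derived table
GROUP BY week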
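Reproducing this UNION workaround in SQLite (again with strftime('%W', ...) standing in for WEEKOFYEAR(), and hypothetical sample rows) shows why ALL matters: with plain UNION, two runs of the same distance in the same week collapse into one row before SUM() sees them. SQLite does not accept parentheses around the individual SELECTs of a compound query, so they are dropped here. The union_op parameter exists only to compare the two variants.

```python
import sqlite3


def weekly_km_union(conn, mnumber, union_op="UNION ALL"):
    """The UNION workaround; union_op selects UNION ALL or plain UNION."""
    assert union_op in ("UNION ALL", "UNION")  # only these two are interpolated
    query = f"""
        SELECT week, SUM(km) AS total_km FROM (
            SELECT km, CAST(strftime('%W', dato) AS INTEGER) AS week
              FROM km_run
             WHERE mnumber = ? AND dato >= '2013-01-01' AND dato < '2014-01-01'
            {union_op}
            SELECT 0 AS km, week FROM week_list
        ) AS weekly
        GROUP BY week
        ORDER BY week
    """
    return conn.execute(query, (mnumber,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE week_list (week INTEGER)")
conn.executemany("INSERT INTO week_list VALUES (?)", [(w,) for w in range(1, 4)])
conn.execute("CREATE TABLE km_run (entry INTEGER, mnumber INTEGER, dato TEXT, km REAL)")
conn.executemany(
    "INSERT INTO km_run VALUES (?, ?, ?, ?)",
    [
        (1, 3, "2013-01-08", 5.5),  # week 1
        (2, 3, "2013-01-10", 5.5),  # week 1, same distance as the run above
        (3, 3, "2013-01-22", 4.0),  # week 3
    ],
)
print(weekly_km_union(conn, 3))           # week 1 totals 11.0
print(weekly_km_union(conn, 3, "UNION"))  # week 1 drops to 5.5
```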
I have a simple table to keep count of the number of visitors on a website.
|Day|Visitors|
|1 |2 |
|2 |5 |
|4 |1 |
I want to select the number of visitors per day for days 1 to 4, but I also want a value for day 3. Since day 3 is missing, I wonder if it is possible to select all integers in a range and return a default when the row for a day is missing. A simple "SELECT visitors FROM table WHERE day >= 1 AND day <= 4 ORDER BY day" query returns "2, 5, 1", but the query I'm looking for would return "2, 5, 0, 1".
Here is an example for your data:
select n.n as days, coalesce(visitors, 0) as visitors
from (select 1 as n union all select 2 union all select 3 union all select 4
) n left outer join
t
on t.day = n.n
order by n.n;
You need to fill in all the numbers of days in the n subquery. Perhaps you have another table with sequential numbers which can help with this and other queries.
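Instead of typing every number into the n subquery, a recursive common table expression can generate the range. Here is a sketch in SQLite via Python's sqlite3 module, with t as the visitors table (assumed columns day and visitors) and the sample rows from the question. MySQL accepts the same WITH RECURSIVE syntax from version 8.0 on; older versions do not support it.

```python
import sqlite3


def visitors_per_day(conn, first_day, last_day):
    """One row per day in [first_day, last_day]; 0 visitors where t has no row."""
    query = """
        WITH RECURSIVE days(day) AS (
            SELECT ?                                   -- first day of the range
            UNION ALL
            SELECT day + 1 FROM days WHERE day < ?     -- stop at the last day
        )
        SELECT days.day, COALESCE(t.visitors, 0) AS visitors
          FROM days
          LEFT JOIN t ON t.day = days.day
         ORDER BY days.day
    """
    return conn.execute(query, (first_day, last_day)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day INTEGER, visitors INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (2, 5), (4, 1)])
print(visitors_per_day(conn, 1, 4))  # [(1, 2), (2, 5), (3, 0), (4, 1)]
```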
Use the scripting language that runs your website to check for the missing days and show 0 for those days.
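The question does not say which scripting language the site uses, so here is the idea in Python as an illustration: fetch the existing rows (for example with SELECT day, visitors ... WHERE day BETWEEN 1 AND 4) and fill the gaps with a default. fill_missing_days is a hypothetical helper name.

```python
def fill_missing_days(rows, first_day, last_day, default=0):
    """Map (day, visitors) rows onto every day in the range, using default for gaps."""
    counts = dict(rows)
    return [counts.get(day, default) for day in range(first_day, last_day + 1)]


rows = [(1, 2), (2, 5), (4, 1)]  # (day, visitors) as returned by the plain SELECT
print(fill_missing_days(rows, 1, 4))  # [2, 5, 0, 1]
```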
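If you REALLY need to get this from the database, you can use a table that holds the day numbers and LEFT JOIN your table to it. Put the range filter on days_table, because table.day is NULL for the missing days and a filter on it would remove them again:
SELECT days_table.day, coalesce(table.visitors, 0) AS visitors
FROM days_table
LEFT JOIN table ON days_table.day = table.day
WHERE days_table.day >= 1 AND days_table.day <= 4
ORDER BY days_table.day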
I have 1 table with similar data:
CustomerID | ProjectID | DateListed | DateCompleted
123456 | 045 | 07-29-2010 | 04-03-2011
123456 | 123 | 10-12-2011 | 11-30-2011
123456 | 157 | 12-12-2011 | 02-10-2012
123456 | 258 | 06-07-2011 | NULL
Basically, a customer contacts us, we get a project on our list, and we mark it completed when we're done with it.
What I'm after is a simple (you'd think, at least) count of all projects, with expected output like below:
YEAR | TotalListed | TotalCompleted
2010 | 1 | 0
2011 | 3 | 2
2012 | 0 | 1
My query below, because of the join, isn't showing 2012's count, since no project was listed in 2012. I can't simply reverse the query either, because then 2010's count wouldn't show up (nothing was completed in 2010).
I'm open to any suggestions or tips on how to do this. I've pondered a temp table; is that the best way to go? I'm open to anything that gets me what I need!
(If the code looks familiar, y'all helped me get the subquery made! MySQL Subquery with main query data variable)
SELECT YEAR(p1.DateListed) AS YearListed, COUNT(p1.ProjectID) As Listed, PreQuery.Completed
FROM(
SELECT YEAR(DateCompleted) AS YearCompleted, COUNT(ProjectID) AS Completed
FROM projects
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YEAR(DateCompleted)
) PreQuery
RIGHT OUTER JOIN projects p1 ON PreQuery.YearCompleted = YEAR(p1.DateListed)
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YearListed
ORDER BY p1.DateListed
After reviewing your table, query, and expected results, I believe I have a revised query that suits your needs. It is a fairly full rewrite of your existing query, but I've tested it with your given data and received the results you expect:
SELECT
years.`year`,
SUM(IF(YEAR(DateListed) = years.`year`, 1, 0)) AS TotalListed,
SUM(IF(YEAR(DateCompleted) = years.`year`, 1, 0)) AS TotalCompleted
FROM
projects
LEFT JOIN (
SELECT DISTINCT `year` FROM (
SELECT YEAR(DateListed) AS `year` FROM projects
UNION SELECT YEAR(DateCompleted) AS `year` FROM projects WHERE DateCompleted IS NOT NULL
) as year_inner
) AS years
ON YEAR(DateListed) = `year`
OR YEAR(DateCompleted) = `year`
WHERE
CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY
years.`year`
ORDER BY
years.`year`
To explain, start with the innermost query (aliased as year_inner). It selects every year found in the DateListed and DateCompleted columns, and the query around it selects a DISTINCT list of those to form the years sub-query, which is the full list of years we want data for. Doing it this way, as opposed to a sub-query with counts and groupings, means you only have to define the WHERE clause on the outermost query. (If efficiency becomes an issue with many thousands of records, you could also add a WHERE clause to the inner query, or an index on the date columns.)
After building the inner queries, we LEFT JOIN the projects table to those results on the YEAR() of either DateListed or DateCompleted. Because of the OR, a project listed in one year and completed in another joins to both year rows, which is what lets a year with only completions (like 2012) show up.
For the field selections, we use the year column from our inner query to assure that we get a full list of years to display. Then, we compare the current row's DateListed & DateCompleted YEAR() value to the current year; if they're equal, add 1 - else add 0. When we GROUP BY year, our SUM() will count all of the 1's for that year for each column and give you the output you want (hopefully, of course =P).
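Here is a sketch of the same years-table approach, run against SQLite through Python's sqlite3 module with the four sample rows from the question. Assumptions: dates are stored as ISO 'YYYY-MM-DD' text (SQLite's strftime('%Y', ...) plays the role of YEAR()), the year extraction is factored into CTEs for readability, and the five-year DateListed filter is left out because the sample rows are older than five years. An inner JOIN is enough here because every project's listed year appears in years.

```python
import sqlite3

# The question's sample rows, with dates rewritten as ISO text.
PROJECTS = [
    (123456, "045", "2010-07-29", "2011-04-03"),
    (123456, "123", "2011-10-12", "2011-11-30"),
    (123456, "157", "2011-12-12", "2012-02-10"),
    (123456, "258", "2011-06-07", None),
]


def yearly_totals(conn, customer_id):
    """(year, TotalListed, TotalCompleted) for one customer."""
    query = """
        WITH p AS (
            SELECT CustomerID,
                   CAST(strftime('%Y', DateListed) AS INTEGER)    AS listed_year,
                   CAST(strftime('%Y', DateCompleted) AS INTEGER) AS completed_year
              FROM projects
        ),
        years AS (
            -- UNION already removes duplicate years
            SELECT listed_year AS yr FROM p
            UNION
            SELECT completed_year FROM p WHERE completed_year IS NOT NULL
        )
        SELECT years.yr,
               SUM(CASE WHEN p.listed_year = years.yr THEN 1 ELSE 0 END)    AS TotalListed,
               SUM(CASE WHEN p.completed_year = years.yr THEN 1 ELSE 0 END) AS TotalCompleted
          FROM p
          JOIN years
            ON p.listed_year = years.yr OR p.completed_year = years.yr
         WHERE p.CustomerID = ?
         GROUP BY years.yr
         ORDER BY years.yr
    """
    return conn.execute(query, (customer_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (CustomerID INTEGER, ProjectID TEXT, DateListed TEXT, DateCompleted TEXT)"
)
conn.executemany("INSERT INTO projects VALUES (?, ?, ?, ?)", PROJECTS)
print(yearly_totals(conn, 123456))  # [(2010, 1, 0), (2011, 3, 2), (2012, 0, 1)]
```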
I have a table called user_logins which tracks user logins into the system. It has three columns: login_id, user_id, and login_time.
login_id(INT) | user_id(INT) | login_time(TIMESTAMP)
------------------------------------------------------
1 | 4 | 2010-8-14 08:54:36
1 | 9 | 2010-8-16 08:56:36
1 | 9 | 2010-8-16 08:59:19
1 | 3 | 2010-8-16 09:00:24
1 | 1 | 2010-8-16 09:01:24
I am looking to write a query that determines the number of unique logins for each day that has a login, covering only the past 30 days from the current date. The output should look like this:
logins(INT) | login_date(DATE)
---------------------------
1 | 2010-8-14
3 | 2010-8-16
In the result table, 2010-8-16 has only 3 because user_id 9 logged in twice that day, and his logins count as only 1 login for that day. I am only looking for unique logins for a particular day. Remember, I only want the past 30 days, so it's like a snapshot of the last month of user logins for the system.
I have attempted to write the query with little success. What I have so far is this:
SELECT
DATE(login_time) as login_date,
COUNT(login_time) as logins
FROM
user_logins
WHERE
login_time > (SELECT DATE(SUBDATE(NOW())-1)) FROM DUAL)
AND
login_time < LAST_DAY(NOW())
GROUP BY FLOOR(login_time/86400)
I know this is wrong: it only returns logins starting from the beginning of the current month, and it doesn't group them correctly. Some direction on how to do this would be greatly appreciated. Thank you.
You need to use COUNT(DISTINCT ...):
SELECT
DATE(login_time) AS login_date,
COUNT(DISTINCT user_id) AS logins
FROM user_logins
WHERE login_time > NOW() - interval 30 day
GROUP BY DATE(login_time)
I was a little unsure what you wanted for your WHERE clause because your question seems to contradict itself. You may need to modify the WHERE clause depending on what you want.
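Here is a quick check of counting DISTINCT user_id per day (one login per user per day, as the question asks) against the sample rows, using SQLite via Python's sqlite3. To keep the result independent of today's date, "now" is passed in as a parameter instead of calling NOW(). Timestamps are stored zero-padded ('2010-08-14 08:54:36') so that text comparison matches chronological order. Note that counting DISTINCT login_id would not work on this sample, since every row has login_id 1.

```python
import sqlite3

# The question's sample rows, with zero-padded timestamps.
LOGINS = [
    (1, 4, "2010-08-14 08:54:36"),
    (1, 9, "2010-08-16 08:56:36"),
    (1, 9, "2010-08-16 08:59:19"),
    (1, 3, "2010-08-16 09:00:24"),
    (1, 1, "2010-08-16 09:01:24"),
]


def unique_logins_per_day(conn, now):
    """Distinct users per day for logins in the 30 days before `now`."""
    query = """
        SELECT date(login_time) AS login_date,
               COUNT(DISTINCT user_id) AS logins
          FROM user_logins
         WHERE login_time > datetime(?, '-30 days')
         GROUP BY date(login_time)
         ORDER BY login_date
    """
    return conn.execute(query, (now,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (login_id INTEGER, user_id INTEGER, login_time TEXT)")
conn.executemany("INSERT INTO user_logins VALUES (?, ?, ?)", LOGINS)
print(unique_logins_per_day(conn, "2010-08-20 12:00:00"))
# [('2010-08-14', 1), ('2010-08-16', 3)]
```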
As Mark suggests, you can use COUNT(DISTINCT ...).
Alternatively:
SELECT login_day, COUNT(*)
FROM (
SELECT DATE_FORMAT(login_time, '%D %M %Y') AS login_day,
user_id
FROM user_logins
WHERE login_time>DATE_SUB(NOW(), INTERVAL 1 MONTH)
GROUP BY DATE_FORMAT(login_time, '%D %M %Y'),
user_id
) AS daily_logins
GROUP BY login_day
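The two-step version (deduplicate per day and user, then count) can be checked the same way with SQLite via Python's sqlite3. SQLite has no DATE_FORMAT(), so this sketch groups on date(login_time) instead; the ISO dates it produces also sort chronologically, which '%D %M %Y' strings such as '16th August 2010' do not. "now" is again a parameter so the result does not depend on the current date.

```python
import sqlite3

# The question's sample rows, with zero-padded timestamps.
LOGINS = [
    (1, 4, "2010-08-14 08:54:36"),
    (1, 9, "2010-08-16 08:56:36"),
    (1, 9, "2010-08-16 08:59:19"),
    (1, 3, "2010-08-16 09:00:24"),
    (1, 1, "2010-08-16 09:01:24"),
]


def unique_logins_two_step(conn, now):
    """Collapse to one row per (day, user), then count rows per day."""
    query = """
        SELECT login_day, COUNT(*) AS logins
          FROM (
                SELECT date(login_time) AS login_day, user_id
                  FROM user_logins
                 WHERE login_time > datetime(?, '-1 month')
                 GROUP BY date(login_time), user_id
               ) AS daily_logins
         GROUP BY login_day
         ORDER BY login_day
    """
    return conn.execute(query, (now,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (login_id INTEGER, user_id INTEGER, login_time TEXT)")
conn.executemany("INSERT INTO user_logins VALUES (?, ?, ?)", LOGINS)
print(unique_logins_two_step(conn, "2010-08-20 12:00:00"))
# [('2010-08-14', 1), ('2010-08-16', 3)]
```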