Select distinct ID's - mysql

We would like an SQL statement that lists the number of times a unique IP/UniqueID pair has visited on any unique date, ordered by the maximum number of times that the UniqueID/IP pair has visited.
Here is the table structure:
Column Type
------------------------------
Date Timestamp
NumberofUsers smallint
ipaddress varchar(16)
location varchar(2)
Count bigint(20)
Here is the sql we have been trying:
SELECT
LicenseID,
MAX(Date) AS LatestAccess,
COUNT(DISTINCT Location) AS DifferentCountries,
COUNT(DISTINCT IPAddress) AS DistinctIPCount,
COUNT(DISTINCT Date,IPAddress) AS DistinctDate
FROM
LicenseHistory
WHERE
(LicenseID<>30002)
GROUP BY
LicenseID
ORDER BY
DistinctDate DESC
Here is some sample data from the table in CSV format:
2009-10-08 10:37,30002,8,24.108.64.80,CA,2399
2009-05-27 16:57,24508,50,24.108.64.80,CA,645
2008-11-06 12:04,30,100,24.108.64.80,CA,282
2008-02-04 10:51,24508,30,24.69.19.207,CA,62
2009-10-08 14:52,13136,5,24.108.64.80,CA,285
2013-05-13 13:10,718,10,66.251.68.106,US,23860
2008-02-12 11:10,30002,8,24.69.19.207,CA,36
2008-04-09 17:49,18504,10,70.90.32.57,US,121
2007-07-26 13:38,30002,8,76.226.201.191,US,2
2009-12-03 22:35,30002,8,196.25.255.214,ZA,14
2013-05-13 6:49,20341,4,66.232.201.125,US,2676
2007-07-28 23:57,30002,8,75.81.107.238,US,1
2007-07-29 10:39,30002,8,70.63.54.162,US,1
2007-07-30 3:53,30002,8,121.210.199.31,AU,4
2007-07-30 5:11,30002,8,41.207.67.10,KE,2
Here are some sample results (not correct yet; the last column should not match the second-to-last):
uniqueID LatestAccess DifferentCountries DistinctIPCount DistinctDate
--------------------------------------------------------------------------------
20677 2013-05-13 18:20:15 4 162 162
27749 2013-05-14 05:30:59 7 155 155
459 2013-05-13 11:12:47 2 143 143
24965 2013-05-14 13:44:56 6 123 123
25226 2013-05-06 16:11:56 3 104 104
20370 2013-05-14 05:54:04 4 100 100
I think the problem is in the "COUNT(DISTINCT Date,IPAddress) AS DistinctDate" piece.

You need a COUNT DISTINCT. Here's a guess because there's no table structure provided:
SELECT
VisitDate,
COUNT(DISTINCT IPAddress, UniqueID) AS UniqueVisits
FROM MyTable
GROUP BY VisitDate
ORDER BY UniqueVisits DESC
Or if your visit date is a datetime or timestamp, cut out the time part with the DATE function (note the changes on the second and fifth lines):
SELECT
DATE(VisitDate),
COUNT(DISTINCT IPAddress, UniqueID) AS UniqueVisits
FROM MyTable
GROUP BY DATE(VisitDate)
ORDER BY UniqueVisits DESC

Your date format has a time in it. So, I think all the dates are unique. Try this:
SELECT
LicenseID,
MAX(Date) AS LatestAccess,
COUNT(DISTINCT Location) AS DifferentCountries,
COUNT(DISTINCT IPAddress) AS DistinctIPCount,
COUNT(DISTINCT date(Date), IPAddress) AS DistinctDate
FROM
LicenseHistory
WHERE
(LicenseID<>30002)
GROUP BY
LicenseID
ORDER BY
DistinctDate DESC
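To see why stripping the time part matters, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The rows are made up for illustration, and because SQLite does not accept COUNT(DISTINCT a, b), the two columns are concatenated instead; in MySQL you would keep COUNT(DISTINCT DATE(Date), IPAddress) as written above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LicenseHistory (Date TEXT, LicenseID INTEGER, IPAddress TEXT)"
)

# Hypothetical rows: one license, two IPs, two calendar days.
rows = [
    ("2009-10-08 10:37:00", 24508, "24.108.64.80"),
    ("2009-10-08 14:52:00", 24508, "24.108.64.80"),  # same day, same IP
    ("2009-10-09 09:00:00", 24508, "24.108.64.80"),  # next day, same IP
    ("2009-10-09 11:30:00", 24508, "24.69.19.207"),  # same day, new IP
]
conn.executemany("INSERT INTO LicenseHistory VALUES (?, ?, ?)", rows)

query = """
SELECT LicenseID,
       COUNT(DISTINCT IPAddress)                       AS DistinctIPCount,
       COUNT(DISTINCT Date || '|' || IPAddress)        AS PerTimestamp,
       COUNT(DISTINCT date(Date) || '|' || IPAddress)  AS PerDay
FROM LicenseHistory
GROUP BY LicenseID
"""
result = conn.execute(query).fetchone()
print(result)  # (24508, 2, 4, 3)
```

With the full timestamp every row counts as its own pair (4), while truncating to the day collapses the two visits from the same IP on 2009-10-08 into one (3).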

Related

How to optimize the subqueries in SQL?

I have a data-set, the columns sample information are like below:
Date ID Cost
05/01 1001 30
05/01 1024 19
05/01 1001 29
05/02 1001 28
05/02 1002 19
05/02 1008 16
05/03 1017 89
05/04 1003 28
05/04 1001 16
05/05 1017 28
05/06 1002 44
... etc...
And I want to create a table that displays the top payer (the ID with the highest cost) on each day, which means there are only two columns in the table, and the output sample should be like this:
Date ID
05/01 1001
05/02 1001
05/03 1017
05/04 1003
...etc...
I know this question is simple, and my problem is that I want to simplify the queries.
My query:
select Date, ID
from (select Date, ID, max(SumCost)
from (select Date, ID, sum(cost) as SumCost
from table1
group by Date, ID) a
group by Date, ID) b;
It seems kind of stupid, and I want to optimize the queries. The point is that I want to only output the Date and the Id, these two columns.
Any suggestions?
Here is a method using a correlated subquery:
select t.*
from t
where t.cost = (select max(t2.cost) from t t2 where t2.date = t.date);
If we take the max cost when there are multiple costs for the same payer on the same day, then this query will work. The query that you have written above is incorrect.
Select date, ID
from
(
Select Date, ID, row_number() over(partition by date order by cost desc) as rnk
from table1
) a
where rnk = 1
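If "cost the most" means the summed cost per ID per day (which is what the query in the question computes), the ranking can be applied on top of the sums. Below is a rough sketch with Python's sqlite3 module as a stand-in for MySQL, loaded with the first rows from the question. It assumes window function support (SQLite 3.25+, or MySQL 8.0+ for the same SQL); the aliases daily and ranked are just illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Date TEXT, ID INTEGER, Cost INTEGER)")

# Sample rows taken from the question.
rows = [
    ("05/01", 1001, 30), ("05/01", 1024, 19), ("05/01", 1001, 29),
    ("05/02", 1001, 28), ("05/02", 1002, 19), ("05/02", 1008, 16),
    ("05/03", 1017, 89),
    ("05/04", 1003, 28), ("05/04", 1001, 16),
]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

query = """
SELECT Date, ID
FROM (
    -- rank each ID within its day by total cost, highest first
    SELECT Date, ID,
           ROW_NUMBER() OVER (PARTITION BY Date ORDER BY SumCost DESC) AS rnk
    FROM (
        -- total cost per ID per day
        SELECT Date, ID, SUM(Cost) AS SumCost
        FROM table1
        GROUP BY Date, ID
    ) AS daily
) AS ranked
WHERE rnk = 1
ORDER BY Date
"""
top_payers = conn.execute(query).fetchall()
print(top_payers)
# [('05/01', 1001), ('05/02', 1001), ('05/03', 1017), ('05/04', 1003)]
```

Note that ROW_NUMBER picks a single row per day, so if two IDs tie for the highest total, only one of them is returned (which one is arbitrary unless you add a tie-breaker to the ORDER BY).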

How can I ignore duplicate values in another column when using GROUP?

I have the following query:
SELECT
DATE(`timeStamp`),COUNT(*)
FROM
`wf`.`sh`
WHERE
(DATE(`timeStamp`) >= curdate()- INTERVAL 31 DAY)
GROUP BY
DATE(`timeStamp`)
HAVING
COUNT(DATE(`timeStamp`)) > 0
ORDER BY
DATE(`timeStamp`) ASC;
The purpose of this query is to retrieve the number of users online in my system per day, over the space of a month.
Example dataset:
uID timeStamp
1 2016-11-28 00:27:01
1 2016-11-28 01:10:15
1234 2016-11-28 02:50:00
2 2016-11-28 06:11:09
47 2016-11-28 08:32:48
1246 2016-11-28 09:51:47
In its current format, this query returns the count of rows with duplicate dates, for example:
timeStamp COUNT(*)
2017-01-29 256
2017-01-30 224
2017-01-31 240
2017-02-01 95
2017-02-02 136
I have another field uID; I need to modify my query so that GROUP also ignores rows with a duplicate uID field for each day. I tried creating another GROUP BY but was given an error that 'incorrect GROUP BY clause' (or something of that nature).
Can this be done via pure MySQL?
You can use a subselect:
SELECT
visitDate,COUNT(*)
FROM
(SELECT DISTINCT DATE(`timeStamp`) as visitDate, uID FROM `wf`.`sh`) alias_t
WHERE
(visitDate >= curdate()- INTERVAL 31 DAY)
GROUP BY
visitDate
HAVING
COUNT(visitDate) > 0
ORDER BY
visitDate ASC;
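Another way to get the same numbers, as far as I can tell, is COUNT(DISTINCT uID) grouped by the day, which avoids the derived table. Here is a small sketch using Python's sqlite3 module as a stand-in for MySQL, loaded with the example dataset from the question. The 31-day filter is left out because the sample dates are long past; in MySQL you would keep the WHERE clause and write DATE(`timeStamp`) as in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sh (uID INTEGER, timeStamp TEXT)")

# Example dataset from the question: uID 1 appears twice on the same day.
rows = [
    (1, "2016-11-28 00:27:01"),
    (1, "2016-11-28 01:10:15"),
    (1234, "2016-11-28 02:50:00"),
    (2, "2016-11-28 06:11:09"),
    (47, "2016-11-28 08:32:48"),
    (1246, "2016-11-28 09:51:47"),
]
conn.executemany("INSERT INTO sh VALUES (?, ?)", rows)

query = """
SELECT date(timeStamp)     AS visitDate,
       COUNT(*)            AS row_count,      -- what the original query returns
       COUNT(DISTINCT uID) AS distinct_users  -- each user counted once per day
FROM sh
GROUP BY date(timeStamp)
ORDER BY visitDate
"""
per_day = conn.execute(query).fetchall()
print(per_day)  # [('2016-11-28', 6, 5)]
```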

Cumulative Counts by Date Issue

I have a table that shows, for each date, a list of customer ids, i.e. the customers who were active on that particular day. So each date can include ids that are also present on another date.
bdate customer_id
2012-01-12 111
2012-01-13 222
2012-01-13 333
2012-01-14 111
2012-01-14 333
2012-01-14 666
2012-01-14 777
I am looking to write a query which calculates the total number of unique ids between two dates - the starting date is the row date and the ending date is a particular date in the future.
My query looks like this:
select
bdate,
count(distinct customer_id) as cts
from users
where bdate between bdate and current_date
group by 1
order by 1
But this produces a count of unique users for each date like this:
bdate customer_id
2012-01-12 1
2012-01-13 2
2012-01-14 4
My desired result is (for a count of users between the starting row date and 2012-01-14):
bdate customer_id
2012-01-12 5 - includes (111,222,333,666,777)
2012-01-13 5 - includes (222,333,111,666,777)
2012-01-14 4 - includes (111,333,666,777)
Like @Strawberry said, you can make a join like this:
select
t1.bdate,
count(distinct t2.customer_id) as cts
from users t1
join users t2 on t2.bdate >= t1.bdate
where t1.bdate between t1.bdate and current_date
group by t1.bdate
order by t1.bdate
The join to t2 pulls in every row dated on or after each t1.bdate, and counting the distinct t2.customer_id values per t1.bdate gives the result you want.
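For anyone who wants to check the self-join idea against the sample data, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. As a small variation (my assumption, not part of the answer above), t1 is reduced to the distinct dates first so the join produces fewer rows; the upper bound on current_date is omitted because all sample dates are in the past.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (bdate TEXT, customer_id INTEGER)")

# Sample rows from the question.
rows = [
    ("2012-01-12", 111),
    ("2012-01-13", 222), ("2012-01-13", 333),
    ("2012-01-14", 111), ("2012-01-14", 333),
    ("2012-01-14", 666), ("2012-01-14", 777),
]
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)

query = """
SELECT t1.bdate, COUNT(DISTINCT t2.customer_id) AS cts
FROM (SELECT DISTINCT bdate FROM users) AS t1   -- one row per starting date
JOIN users AS t2 ON t2.bdate >= t1.bdate        -- every row on or after it
GROUP BY t1.bdate
ORDER BY t1.bdate
"""
cumulative = conn.execute(query).fetchall()
print(cumulative)
# [('2012-01-12', 5), ('2012-01-13', 5), ('2012-01-14', 4)]
```

These counts match the desired result listed in the question.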

MySQL Selecting MAX total with date and ID

I have a cron script that writes the total number of active users to a table every day. I'm trying to now generate a simple report that would show the "high water mark" for each month. Because some accounts expire during the month it's possible the highest number may NOT be at the end of the month.
Here's a sample of my table structure
tblUserLog
-----------
record_id INT(11) // PRIMARY KEY
run_date DATE // DATE RUN
ttl_count INT(11) // TOTAL FOR DAY
Sample data:
record_id run_date ttl_count
1 2013-06-01 500
2 2013-06-10 510
3 2013-06-20 520
4 2013-06-30 515
5 2013-07-01 525
6 2013-07-10 530
7 2013-07-20 540
8 2013-07-31 550
9 2013-08-01 560
What I would like returned is:
record_id run_date ttl_count
3 2013-06-20 520
8 2013-07-31 550
9 2013-08-01 560
I've tried two queries that are close...
// This will give me the total for the first of the month
SELECT s.record_id, s.run_date, s.ttl_count
FROM tblStatsIndividual s
JOIN (
SELECT record_id
FROM tblStatsIndividual
GROUP BY DATE_FORMAT(run_date, '%Y %m')
HAVING MAX(ttl_count)
) s2
ON s2.record_id = s.record_id
ORDER BY run_date DESC
This returns the total for the first of each month, along with the record_id and correct date for the total.
Tried this...
SELECT record_id,max(run_date), max(ttl)
FROM (
SELECT record_id,run_date, max(ttl_count) AS ttl
FROM tblStatsIndividual
GROUP BY DATE_FORMAT(run_date, '%Y %m')
) a
GROUP BY DATE_FORMAT(run_date, '%Y %m')
ORDER BY run_date DESC
This one appears to get the correct "high water mark" but it's not returning the record_id, or the run_date for the row that IS the high water mark.
How do you get the record_id and the run_date for the highest total?
Something like
Select detail.Record_ID, detail.Run_Date, detail.ttl_Count
From tblStatsIndividual detail
Inner Join
(Select Year(run_date) as Year, Month(Run_date) as Month, Max(ttl_count) as ttl
From tblStatsIndividual
Group By Year(run_date), Month(Run_date)) maximums
On maximums.Year = Year(detail.Run_date) and maximums.Month = Month(detail.Run_date)
and maximums.ttl = detail.ttl_count
Should do it. NB based on your requirement if you had two records in the same month with the same (and highest in the month) ttl_count, they would both be returned.
Based on the help from @Tony Hopkinson, this query gets me the info. The one caveat is that it shows the ID and date for the first occurrence of the MAX total, so if the total is the same three days in a row in a month, the first day's ID is returned. For my purposes the last ID would be preferable, but I can live with this:
SELECT s.Record_ID, s.Run_Date, s.ttl_Count
FROM tblStatsIndividual s
INNER JOIN (
SELECT YEAR(run_date) AS yr, MONTH(run_date) AS mon, MAX(ttl_count) AS ttl
FROM tblStatsIndividual
GROUP BY DATE_FORMAT(run_date, '%Y %m')
) maximums
ON maximums.yr = YEAR(s.run_date)
AND maximums.mon = MONTH(s.run_date)
AND maximums.ttl = s.ttl_Count
GROUP BY ttl_count
ORDER BY run_date DESC
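If the last record should win when the monthly maximum occurs on several days, one option is to rank the rows within each month and break ties on run_date. The following is only a sketch, using Python's sqlite3 module as a stand-in for MySQL with the sample data from the question. It assumes window functions are available (SQLite 3.25+; MySQL 8.0+ would need DATE_FORMAT(run_date, '%Y %m') in place of strftime), and the alias ranked is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblStatsIndividual ("
    "record_id INTEGER PRIMARY KEY, run_date TEXT, ttl_count INTEGER)"
)

# Sample data from the question.
rows = [
    (1, "2013-06-01", 500), (2, "2013-06-10", 510), (3, "2013-06-20", 520),
    (4, "2013-06-30", 515), (5, "2013-07-01", 525), (6, "2013-07-10", 530),
    (7, "2013-07-20", 540), (8, "2013-07-31", 550), (9, "2013-08-01", 560),
]
conn.executemany("INSERT INTO tblStatsIndividual VALUES (?, ?, ?)", rows)

query = """
SELECT record_id, run_date, ttl_count
FROM (
    SELECT record_id, run_date, ttl_count,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y-%m', run_date)  -- one group per month
               ORDER BY ttl_count DESC, run_date DESC    -- highest total, latest day first
           ) AS rn
    FROM tblStatsIndividual
) AS ranked
WHERE rn = 1
ORDER BY run_date
"""
high_water = conn.execute(query).fetchall()
print(high_water)
# [(3, '2013-06-20', 520), (8, '2013-07-31', 550), (9, '2013-08-01', 560)]
```

The sample data has no ties, so the tie-breaker is not exercised here; it only matters when the same maximum appears on more than one day in a month.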

MySql order by date after DATE_FORMAT

I have two tables, fbpost and fbalbum, which each have a DATETIME column called createdTime. I'm finding the number of albums per month, and the number of posts per month, and adding them together.
Here's my query (which works as described):
SELECT createdTime, itemCount FROM
(SELECT DATE_FORMAT(createdTime, '%m-%Y') AS createdTime, COUNT(*) AS itemCount FROM fbpost WHERE page_id =2
GROUP BY YEAR(createdTime), MONTH(createdTime)
UNION ALL
SELECT DATE_FORMAT(createdTime, '%m-%Y') AS createdTime, COUNT(*) AS itemCount FROM fbalbum WHERE page_id=2
GROUP BY YEAR(createdTime), MONTH(createdTime)) AS foo
GROUP BY createdTime
This gives the results:
01-2009 | 173
01-2010 | 21
01-2011 | 521
01-2012 | 776
02-2009 | 117
02-2010 | 158
02-2011 | 678
...
But I would like the results to be ordered like this:
01-2009 | 173
02-2009 | 56
03-2009 | 543
04-2009 | 211
05-2009 | 723
06-2009 | 55
07-2009 | 521
...
How can I achieve this?
Note: DATE_FORMAT() gives a string, not a DATETIME, so you can't sort by date. But if I take out the DATE_FORMAT() in the two nested select statements, I get 2 rows for most months, since that would leave the day. Though there would be only one row per month for each nested select, the day would usually differ, since the last item in a month may be on any day.
Don't use DATE_FORMAT until your outer query:
SELECT DATE_FORMAT(a.createdTime, '%m-%Y') AS createdTime, SUM(a.itemCount) AS itemCount
FROM
(SELECT DATE(createdTime) AS createdTime, COUNT(*) AS itemCount
FROM fbpost WHERE page_id = 2
GROUP BY DATE(createdTime)
UNION ALL
SELECT DATE(createdTime) AS createdTime, COUNT(*) AS itemCount
FROM fbalbum WHERE page_id = 2
GROUP BY DATE(createdTime)) a
GROUP BY YEAR(a.createdTime), MONTH(a.createdTime)
ORDER BY YEAR(a.createdTime), MONTH(a.createdTime)
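To illustrate the idea of sorting on the real year and month while displaying the '%m-%Y' label, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The rows are invented for the example, and strftime plays the role that DATE_FORMAT, YEAR and MONTH play in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fbpost (page_id INTEGER, createdTime TEXT)")
conn.execute("CREATE TABLE fbalbum (page_id INTEGER, createdTime TEXT)")

# Hypothetical rows spanning two years.
conn.executemany("INSERT INTO fbpost VALUES (?, ?)", [
    (2, "2009-01-05 10:00:00"), (2, "2009-01-20 12:00:00"),
    (2, "2009-02-03 09:30:00"), (2, "2010-01-10 08:15:00"),
])
conn.executemany("INSERT INTO fbalbum VALUES (?, ?)", [
    (2, "2009-01-15 11:00:00"), (2, "2009-02-14 14:00:00"),
    (2, "2009-02-28 16:45:00"),
])

query = """
SELECT strftime('%m-%Y', createdTime) AS label,  -- display format only
       COUNT(*) AS itemCount
FROM (
    SELECT createdTime FROM fbpost  WHERE page_id = 2
    UNION ALL
    SELECT createdTime FROM fbalbum WHERE page_id = 2
) AS items
GROUP BY strftime('%Y-%m', createdTime)          -- group on year, then month
ORDER BY strftime('%Y-%m', createdTime)          -- sort chronologically
"""
monthly = conn.execute(query).fetchall()
print(monthly)  # [('01-2009', 3), ('02-2009', 3), ('01-2010', 1)]
```

Sorting on the '%m-%Y' label itself would put 01-2010 ahead of 02-2009, because the string comparison looks at the month digits first.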