How to find duplicate rows with SQL - GROUP BY - MySQL

I have a table:
+----+------------+
| id | day |
+----+------------+
| 1 | 2006-10-08 |
| 2 | 2006-10-08 |
| 3 | 2006-10-09 |
| 4 | 2006-10-09 |
| 5 | 2006-10-09 |
| 5 | 2006-10-09 |
| 6 | 2006-10-10 |
| 7 | 2006-10-10 |
| 8 | 2006-10-10 |
| 9 | 2006-10-10 |
+----+------------+
I want to group by frequency and count the dates at each frequency. For example:
The date 2006-10-08 appears twice, so its frequency is 2, and it is the only date that appears twice, so the total number of dates with frequency 2 is 1.
Another example:
2006-10-10 and 2006-10-09 each appear 4 times, so the frequency is 4 and the total number of dates with frequency 4 is 2.
The expected output is:
+----------+--------------------------------+
| Frequency | Total Dates with frequency N |
+----------+--------------------------------+
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 2 |
+----------+--------------------------------+
and so on up to the maximum frequency.
What I've tried is the following:-
select day, count(*) from test GROUP BY day;
It returns the frequency of each date, ie
+------------+----------+
| day | count(*) |
+------------+----------+
| 2006-10-08 | 2 |
| 2006-10-09 | 4 |
| 2006-10-10 | 4 |
+------------+----------+
Please help with the above problem.

Just use your query as a subquery:
select freq, count(*)
from (select day, count(*) as freq
from test
group by day
) d
group by freq;
If you want to get the 0 values, then you have to work harder. A numbers table is handy (if you have one) or you can do:
select n.freq, count(d.day)
from (select 1 as freq union all select 2 union all select 3 union all select 4
) n left join
(select day, count(*) as freq
from test
group by day
) d
on n.freq = d.freq
group by n.freq;
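To try the zero-filling version without a MySQL server, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here, the ids 1 to 10 are made up for the sample rows, and the hard-coded list 1..4 is still an assumption about the maximum frequency.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table `test`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, day TEXT)")

# Same day distribution as in the question: 2, 4 and 4 occurrences.
sample_days = ["2006-10-08"] * 2 + ["2006-10-09"] * 4 + ["2006-10-10"] * 4
conn.executemany(
    "INSERT INTO test (id, day) VALUES (?, ?)",
    list(enumerate(sample_days, start=1)),  # ids are illustrative only
)

query = """
SELECT n.freq, COUNT(d.day) AS total_dates
FROM (SELECT 1 AS freq UNION ALL SELECT 2
      UNION ALL SELECT 3 UNION ALL SELECT 4) AS n
LEFT JOIN (SELECT day, COUNT(*) AS freq
           FROM test
           GROUP BY day) AS d
       ON n.freq = d.freq
GROUP BY n.freq
ORDER BY n.freq
"""

# COUNT(d.day) counts only matched rows, so unmatched frequencies give 0.
freq_rows = conn.execute(query).fetchall()
for freq, total_dates in freq_rows:
    print(freq, total_dates)
```

The LEFT JOIN keeps every frequency from the numbers list, and counting d.day rather than * is what turns the unmatched ones into 0.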

Related

SQL query that picks a random id from all possible ids in a table and outputs the rows containing that id

I want a query that selects all rows where UploadedbyUserID equals a randomly chosen id. The random id must come from the UploadedbyUserID values that actually exist in the table (in this case 4, 3 and 22, and only those three, not 2 or 5).
If the random pick is 4, it outputs this:
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 1 | 2222 | Testing | 4 |
| 2 | Jack | description| 4 |
| 3 | ffdsd| 2007-05-06 | 4 |
| 6 | Zara | 2007-02-06 | 4 |
+------+------+------------+--------------------+
This is the whole table
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 1 | 2222 | Testing | 4 |
| 2 | Jack | description| 4 |
| 3 | ffdsd| 2007-05-06 | 4 |
| 4 | dsm | 2007-05-27 | 3 |
| 5 | dddd | 2007-04-06 | 3 |
| 6 | Zara | 2007-02-06 | 4 |
| 7 | John | 2007-01-24 | 22 |
+------+------+------------+--------------------+
and if the random pick is 3, it outputs this:
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 4 | dsm | 2007-05-27 | 3 |
| 5 | dddd | 2007-04-06 | 3 |
+------+------+------------+--------------------+
Ask if you need more information
Hmmm. This is one way:
select t.*
from (select uploadedbyuserid
from t
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
First, let me say that this is weighted by the number of times that a user has uploaded something. So, user "4" would appear a bit more often than "3", in your example. If this is an issue:
select t.*
from (select uploadedbyuserid
from (select distinct uploadedbyuserid from t) t
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
The next observation is that this can be compute-intensive. If you have lots of rows, there are various ways to speed this up. For instance, one simple method would be to sample about 1 out of 1000 rows before sorting:
select t.*
from (select uploadedbyuserid
from (select distinct uploadedbyuserid
from t
) t
where rand() < 0.001
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
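If you want to experiment with the unweighted version locally, a rough sketch using Python's sqlite3 module follows. It assumes SQLite's random() in place of MySQL's rand(), keeps only three of the columns from the question, and uses the alias ids, which is not in the original query.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table `t`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, name TEXT, uploadedbyuserid INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2222", 4), (2, "Jack", 4), (3, "ffdsd", 4), (4, "dsm", 3),
     (5, "dddd", 3), (6, "Zara", 4), (7, "John", 22)],
)

# Pick one uploader id uniformly from the distinct ids, then join back
# to fetch every row uploaded by that id.
query = """
SELECT t.*
FROM (SELECT uploadedbyuserid
      FROM (SELECT DISTINCT uploadedbyuserid FROM t) AS ids
      ORDER BY random()   -- SQLite spelling; MySQL uses rand()
      LIMIT 1) AS u
JOIN t USING (uploadedbyuserid)
"""
picked_rows = conn.execute(query).fetchall()
for row in picked_rows:
    print(row)
```

Each run prints all rows for exactly one of the ids 3, 4 or 22, and each id is equally likely because the shuffle happens after DISTINCT.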

Find Latest record by date for each distinct column value

I am working with a dataset with a similar format to the following:
Table: Account
*-----------*----------*-------------*
| id | amount | date |
*-----------*----------*-------------*
| 1 | 100 | 01/01/2016 |
| 2 | 100 | 01/02/2016 |
| 3 | 100 | 01/03/2016 |
| 4 | 200 | 01/04/2016 |
| 5 | 200 | 01/05/2016 |
| 6 | 200 | 01/06/2016 |
| 7 | 300 | 01/07/2016 |
| 8 | 300 | 01/08/2016 |
| 9 | 300 | 01/09/2016 |
| 10 | 400 | 01/10/2016 |
*-----------*----------*-------------*
I need a query that returns the most recent record for every distinct amount in the table. So, the above table would return
*-----------*----------*-------------*
| id | amount | date |
*-----------*----------*-------------*
| 3 | 100 | 01/03/2016 |
| 6 | 200 | 01/06/2016 |
| 9 | 300 | 01/09/2016 |
| 10 | 400 | 01/10/2016 |
*-----------*----------*-------------*
I am still new to subqueries, but I tried the following:
SELECT a.id, a.amount, a.date FROM account a WHERE a.date IN (SELECT MAX(date) FROM account)
However, this only returns the row with the latest date overall. How can I get the latest date for every distinct value in the amount column?
If you only need amount:
SELECT amount, MAX(date) from myTable group by amount
If you need more data:
SELECT * from myTable where (amount, date) IN (
SELECT amount, MAX(date) as date from myTable group by amount
)
Or maybe this will run faster:
SELECT * from myTable A WHERE NOT EXISTS (
SELECT 1
FROM myTable B
WHERE A.date < B.date
AND A.amount = B.amount
)
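Here is a small runnable sketch of the NOT EXISTS variant, using Python's sqlite3 module as a stand-in for MySQL and the account table from the question. The dates are rewritten as ISO strings so that plain text comparison orders them chronologically; reading 01/03/2016 as 2016-01-03 is an assumption on my part.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table `account`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, amount INTEGER, date TEXT)")

amounts = [100, 100, 100, 200, 200, 200, 300, 300, 300, 400]
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(i, amount, "2016-01-%02d" % i) for i, amount in enumerate(amounts, 1)],
)

# Keep a row only if no other row with the same amount has a later date.
query = """
SELECT a.id, a.amount, a.date
FROM account AS a
WHERE NOT EXISTS (SELECT 1
                  FROM account AS b
                  WHERE b.amount = a.amount
                    AND b.date > a.date)
ORDER BY a.id
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

If two rows share both the amount and the latest date, this form returns both of them, which may or may not be what you want.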

Complex SQL query suggestions please

I have three tables with schema as below:
Table: Apps
| ID (bigint) | USERID (Bigint)| START_TIME (datetime) |
-------------------------------------------------------------
| 1 | 13 | 2013-05-03 04:42:55 |
| 2 | 13 | 2013-05-12 06:22:45 |
| 3 | 13 | 2013-06-12 08:44:24 |
| 4 | 13 | 2013-06-24 04:20:56 |
| 5 | 13 | 2013-06-26 08:20:26 |
| 6 | 13 | 2013-09-12 05:48:27 |
Table: Hosts
| ID (bigint) | APPID (Bigint)| DEVICE_ID (Bigint) |
-------------------------------------------------------------
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 1 |
| 4 | 3 | 3 |
| 5 | 1 | 4 |
| 6 | 2 | 3 |
Table: Usage
| ID (bigint) | APPID (Bigint)| HOSTID (Bigint) | Factor (varchar) |
-------------------------------------------------------------------------------------
| 1 | 1 | 1 | Low |
| 2 | 1 | 3 | High |
| 3 | 2 | 2 | Low |
| 4 | 3 | 4 | Medium |
| 5 | 1 | 5 | Low |
| 6 | 2 | 2 | Medium |
Now, given a userid as input, I want to get the count of rows of table Usage for each month (across all apps) for each "Factor", month by month, for the last 6 months.
If a DEVICE_ID appears more than once in a month (based on START_TIME, found by joining Apps and Hosts), only the latest rows of Usage (based on the combination of Apps, Hosts and Usage) should be considered when calculating the count.
Example output of the query for the above example should be: (for input user id=13)
| MONTH | USAGE_COUNT | FACTOR |
-------------------------------------------------------------
| 5 | 0 | High |
| 6 | 0 | High |
| 7 | 0 | High |
| 8 | 0 | High |
| 9 | 0 | High |
| 10 | 0 | High |
| 5 | 2 | Low |
| 6 | 0 | Low |
| 7 | 0 | Low |
| 8 | 0 | Low |
| 9 | 0 | Low |
| 10 | 0 | Low |
| 5 | 1 | Medium |
| 6 | 1 | Medium |
| 7 | 0 | Medium |
| 8 | 0 | Medium |
| 9 | 0 | Medium |
| 10 | 0 | Medium |
How is this calculated?
For the month May 2013 (05-2013), there are two apps in table Apps.
In table Hosts, these apps are associated with device_ids 1, 1, 1, 4, 3.
For this month (05-2013) and device_id=1, the latest value of start_time is 2013-05-12 06:22:45 (from tables Hosts and Apps), so in table Usage, look for the combination appid=2 and hostid=2, for which there are two rows, one with factor Low and the other Medium.
For this month (05-2013) and device_id=4, following the same procedure gives one entry, i.e. 0 Low.
All the other values are calculated similarly.
To get the last 6 months via a query, I'm trying the following:
SELECT MONTH(DATE_ADD(NOW(), INTERVAL aInt MONTH)) AS aMonth
FROM
(
SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5
) AS LastSix
Please check sqlfiddle: http://sqlfiddle.com/#!2/55fc2
Because the calculation you're doing involves the same join multiple times, I started by creating a view.
CREATE VIEW `app_host_usage`
AS
SELECT a.id "appid", h.id "hostid", u.id "usageid",
a.userid, a.start_time, h.device_id, u.factor
FROM apps a
LEFT OUTER JOIN hosts h ON h.appid = a.id
LEFT OUTER JOIN `usage` u ON u.appid = a.id AND u.hostid = h.id
WHERE a.start_time > DATE_ADD(NOW(), INTERVAL -7 MONTH)
The WHERE condition is there because I made the assumption that you don't want July 2005 and July 2006 to be grouped together in the same count.
With that view in place, the query becomes
SELECT months.Month, COUNT(DISTINCT device_id), factors.factor
FROM
(
-- Get the last six months
SELECT (MONTH(NOW()) + aInt + 11) % 12 + 1 "Month" FROM
(SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5) LastSix
) months
JOIN
(
-- Get all known factors
SELECT DISTINCT factor FROM `usage`
) factors
LEFT OUTER JOIN
(
-- Get factors for each device...
SELECT
MONTH(start_time) "Month",
device_id,
factor
FROM app_host_usage a
WHERE userid=13
AND start_time IN (
-- ...where the corresponding usage row is connected
-- to an app row with the highest start time of the
-- month for that device.
SELECT MAX(start_time)
FROM app_host_usage a2
WHERE a2.device_id = a.device_id
GROUP BY MONTH(start_time)
)
GROUP BY MONTH(start_time), device_id, factor
) usageids ON usageids.Month = months.Month
AND usageids.factor = factors.factor
GROUP BY factors.factor, months.Month
ORDER BY factors.factor, months.Month
which is insanely complicated, but I've tried to comment explaining what each part does. See this sqlfiddle: http://sqlfiddle.com/#!2/5c871/1/0
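The only non-obvious arithmetic in the query is the "last six months" expression, (MONTH(NOW()) + aInt + 11) % 12 + 1. As a quick sanity check, here is a tiny Python sketch of the same formula; the function name last_six_months is made up for illustration.

```python
def last_six_months(current_month):
    """Month numbers (1-12) for the current month and the five before it."""
    # Mirrors (MONTH(NOW()) + aInt + 11) % 12 + 1 for aInt = 0, -1, ..., -5.
    # Adding 11 keeps the left operand positive, so the result does not
    # depend on how an engine treats % with a negative operand.
    return [
        (current_month + offset + 11) % 12 + 1
        for offset in range(0, -6, -1)
    ]


print(last_six_months(3))
```

For March this gives 3, 2, 1, 12, 11, 10, so the list wraps correctly across the year boundary. It carries no year information, which is why the view needs its own date filter.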

How to approach this in a MySQL query?

I want to select data according to a condition: I have a table with physician_key and the corresponding quality score for a given month. I want to select the count of distinct physicians with quality score 1 or 2.
Within a month there can be several entries for a physician_key, each with an assigned quality (on a scale of 1-7). I want to count only those physicians who have quality 1 or 2; if the same physician also has quality > 2 in the given month, I don't want to count that physician. I want the information by product and month.
I created an example table, since you didn't provide one:
mysql> select * from sales_mkt_rep_qual;
+-------------------+---------+-------+-------------------+
| GEO_PHYSICIAN_KEY | product | month | SALES_REP_QUALITY |
+-------------------+---------+-------+-------------------+
| 1 | a | 8 | 1 |
| 1 | a | 8 | 2 |
| 1 | a | 8 | 3 |
| 2 | b | 8 | 2 |
| 2 | b | 8 | 1 |
| 2 | b | 9 | 2 |
| 1 | a | 9 | 2 |
| 2 | b | 9 | 3 |
| 3 | a | 9 | 2 |
+-------------------+---------+-------+-------------------+
The query from your comment indeed gives an error:
SELECT COUNT(DISTINCT GEO_PHYSICIAN_KEY) AS encount_1to2,
product,MONTH
FROM sales_mkt_rep_qual
WHERE MAX(SALES_REP_QUALITY) = 2 ;
ERROR 1111 (HY000): Invalid use of group function
If you change that to:
SELECT DISTINCT geo_physician_key AS encount_1to2, product, month
FROM sales_mkt_rep_qual
WHERE (geo_physician_key,month,product)
NOT IN (
SELECT geo_physician_key, month, product
FROM sales_mkt_rep_qual
WHERE sales_rep_quality >2 );
you see the detailed result:
+--------------+---------+-------+
| encount_1to2 | product | month |
+--------------+---------+-------+
| 2 | b | 8 |
| 1 | a | 9 |
| 3 | a | 9 |
+--------------+---------+-------+
Now you can introduce the counting:
SELECT COUNT(distinct geo_physician_key ) AS no_of_physicians,product, month
FROM sales_mkt_rep_qual
WHERE (geo_physician_key,month,product)
NOT IN (
SELECT geo_physician_key, month, product
FROM sales_mkt_rep_qual WHERE sales_rep_quality >2 )
GROUP BY month, product;
+------------------+---------+-------+
| no_of_physicians | product | month |
+------------------+---------+-------+
| 1 | b | 8 |
| 2 | a | 9 |
+------------------+---------+-------+
If that still isn't what you are looking for, give more specific table structure and data example.
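Try this:
-- An aggregate such as max() cannot appear in WHERE; filter the groups with HAVING instead.
SELECT count(*)
FROM (SELECT physician_key
FROM my_table
WHERE month = desired_month
GROUP BY physician_key
HAVING max(quality) <= 2
) p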
Actually I want the data to be like the output below:
+--------------+---------+-------+
| encount_1to2 | product | MONTH |
+--------------+---------+-------+
| 2 | b | 8 |
+--------------+---------+-------+
As for the criterion SALES_REP_QUALITY <= 2: isn't it possible that, while selecting the distinct geo physician key, it picks a physician based on the first two rows because they match the criterion? That's the reason I used Thanix's approach of the max function with GROUP BY product and month, so that the aggregate function is applied to every product within a month.
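One way to combine the max idea with grouping by product and month is to test MAX() per physician in a HAVING clause and then count the surviving groups. The following is only a sketch, run against SQLite through Python's sqlite3 module with the example table from the answer above; the alias qualified is made up.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_mkt_rep_qual ("
    "geo_physician_key INTEGER, product TEXT, "
    "month INTEGER, sales_rep_quality INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_mkt_rep_qual VALUES (?, ?, ?, ?)",
    [(1, "a", 8, 1), (1, "a", 8, 2), (1, "a", 8, 3),
     (2, "b", 8, 2), (2, "b", 8, 1), (2, "b", 9, 2),
     (1, "a", 9, 2), (2, "b", 9, 3), (3, "a", 9, 2)],
)

# Inner query: one row per physician/product/month whose worst (highest)
# quality is at most 2. Outer query: count those physicians.
query = """
SELECT month, product, COUNT(*) AS no_of_physicians
FROM (SELECT geo_physician_key, product, month
      FROM sales_mkt_rep_qual
      GROUP BY geo_physician_key, product, month
      HAVING MAX(sales_rep_quality) <= 2) AS qualified
GROUP BY month, product
ORDER BY month, product
"""
physician_counts = conn.execute(query).fetchall()
for row in physician_counts:
    print(row)
```

Because MAX() is evaluated over all of a physician's rows in the month, a physician with qualities 1, 2 and 3 is dropped as a whole rather than being counted for the first two rows.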

SQL Group by using the First N elements in each group

Suppose I have the next table:
+------------+---------+
| MovieId | rating |
+------------+---------+
| 1 | 4 |
| 1 | 3 |
| 1 | 2 |
| 1 | 4 |
| 1 | 5 |
| 2 | 3 |
| 2 | 4 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 5 |
| 4 | 4 |
| 4 | 2 |
+------------+---------+
I would like to get the average by group BUT using only the first 2 elements of each group.
Example:
+------------+---------+
| MovieId | rating |
+------------+---------+
| 1 | 4 |
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
| 3 | 1 |
| 3 | 2 |
| 4 | 4 |
| 4 | 2 |
+------------+---------+
answer expected:
+------------+---------+
| MovieId | AVG |
+------------+---------+
| 1 | 3.5 |
| 2 | 3.5 |
| 3 | 1.5 |
| 4 | 3 |
+------------+---------+
This is the SQL query I have to get the AVG for all of the movies. But as I said, I would like to use just the first 2 elements for each group.
SELECT movieid, AVG(cast(rating as DECIMAL(10,2))) AS AVG
FROM ratings
group by movieid
If you can help me write the SQL, I would appreciate it. I will also use Linq, in case some of you know it.
In a SQL DBMS -- as in the relational model -- there is no "first". Do you mean any arbitrary 2 rows for each movie, or the two highest ratings, or something else?
If you can't define an order, then the query is meaningless.
If you can define an order, join the table to itself as I show in my canonical example to create a ranking, and select where RANK < 3.
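To make the self-join ranking concrete, here is a minimal sketch run against SQLite through Python's sqlite3 module. It assumes an extra column seq that defines what "first" means (insertion order here); that column, and the aliases earlier, ranked and rnk, are not in the original table and are purely illustrative.

```python
import sqlite3

# In-memory SQLite database; `seq` is a hypothetical ordering column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (seq INTEGER PRIMARY KEY, "
    "movieid INTEGER, rating INTEGER)"
)
sample = [(1, 4), (1, 3), (1, 2), (1, 4), (1, 5),
          (2, 3), (2, 4), (2, 2),
          (3, 1), (3, 2), (3, 3), (3, 5),
          (4, 4), (4, 2)]
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(seq, m, r) for seq, (m, r) in enumerate(sample, start=1)],
)

# Rank each row by counting the rows of the same movie that come at or
# before it in seq order, then average only ranks 1 and 2.
query = """
SELECT ranked.movieid, AVG(ranked.rating) AS avg_rating
FROM (SELECT r.movieid, r.rating, COUNT(*) AS rnk
      FROM ratings AS r
      JOIN ratings AS earlier
        ON earlier.movieid = r.movieid
       AND earlier.seq <= r.seq
      GROUP BY r.seq, r.movieid, r.rating) AS ranked
WHERE ranked.rnk < 3
GROUP BY ranked.movieid
ORDER BY ranked.movieid
"""
first_two_avg = conn.execute(query).fetchall()
for row in first_two_avg:
    print(row)
```

With a different ordering column (for example rating DESC for "two highest ratings"), only the join condition on seq needs to change.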
For MySQL:
-- Assumes a table movies(id, rating) and that rows are read in id order,
-- so rownum numbers the rows consecutively, group by group.
select id, avg(rating)
from (SELECT a.*, @num := @num + 1 rownum,
(select count(*)
from movies m
where m.id<=a.id) last_count,
(select count(*)
from movies m1
where a.id=m1.id) grp_count
from movies a, (SELECT @num := 0) d) f
where grp_count-(last_count-rownum)<=2
group by id;
In Oracle you can use the ROWNUM pseudocolumn, and in SQL Server the ROW_NUMBER() function.
This is a solution in SQL Server (T-SQL):
Create table #tempMovie (movieId int, rating int)
-- Without an ORDER BY, which 2 rows TOP returns for each movie is arbitrary.
INSERT INTO #tempMovie
Select top 2 * from ratings where movieid=1
Union all
Select top 2 * from ratings where movieid=2
Union all
Select top 2 * from ratings where movieid=3
Union all
Select top 2 * from ratings where movieid=4
The temporary table #tempMovie will contain data like this:
+------------+---------+
| MovieId | rating |
+------------+---------+
| 1 | 4 |
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
| 3 | 1 |
| 3 | 2 |
| 4 | 4 |
| 4 | 2 |
+------------+---------+
Then apply GROUP BY (the cast avoids integer averaging):
Select movieId, AVG(CAST(rating AS DECIMAL(10,2)))
from #tempMovie
Group by movieId
Drop table #tempMovie