Finding the most common entry in a column - SQL - mysql

I have a table called MyTable like so
A B
101 Dog
209 Cat
209 Cat
209 Dog
193 Cow
193 Dog
101 Dog
193 Dog
193 Cow
And I want to pull out the most common B for each A so it would end up being like this (note that there can be ties)
A B
101 Dog
209 Cat
193 Dog
193 Cow
How could I write sql to do this?

Alternatively, you can use a HAVING clause instead of a JOIN.
SELECT A, B
FROM table1 o
GROUP BY A, B
HAVING COUNT(*) =
(
SELECT MAX(totalCount)
FROM
(
SELECT A, B, COUNT(*) totalCount
FROM table1
GROUP BY A,B
) x
WHERE o.A = x.A
GROUP BY x.A
)

You could use a filtering join to list the (A,B) combination with the highest rowcount:
select src.*
from (
select A
, B
, count(*) cnt
from YourTable
group by
A
, B
) src
join (
select A
, max(cnt) as maxcnt
from (
select A
, B
, count(*) cnt
from YourTable
group by
A
, B
) comb
group by
A
) maxab
on maxab.A = src.A
and maxab.maxcnt = src.cnt
If your database supports windowing functions, you can use dense_rank(), like:
select *
from (
select dense_rank() over (
partition by A
order by cnt desc) as rn
, t1.*
from (
select A
, B
, count(*) cnt
from YourTable
group by
A
, B
) t1
) t2
where rn = 1
Window functions are available in recent versions of SQL Server, Oracle, PostgreSQL, and MySQL (8.0 and later).
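To check how the dense_rank() approach above treats ties, here is a minimal sketch that runs the same query shape against the question's sample rows. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions) rather than MySQL; the helper name most_common_per_group and the derived-table aliases are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (101, "Dog"), (209, "Cat"), (209, "Cat"), (209, "Dog"), (193, "Cow"),
    (193, "Dog"), (101, "Dog"), (193, "Dog"), (193, "Cow"),
]

# Count each (A, B) pair, rank the counts within each A, keep rank 1.
# DENSE_RANK gives tied counts the same rank, so ties are all returned.
QUERY = """
SELECT A, B
FROM (
    SELECT A, B,
           DENSE_RANK() OVER (PARTITION BY A ORDER BY cnt DESC) AS rn
    FROM (
        SELECT A, B, COUNT(*) AS cnt
        FROM MyTable
        GROUP BY A, B
    ) AS counted
) AS ranked
WHERE rn = 1
ORDER BY A, B
"""


def most_common_per_group(rows):
    """Load rows into an in-memory table and return the top B values per A."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (A INTEGER, B TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for a, b in most_common_per_group(ROWS):
        print(a, b)
```

For A = 193 both Cow and Dog occur twice, so both rows come back, which is the tie behaviour the question asks for.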

select g3.A,g3.B
from
(
select A,Max(C) MC
from
(
select A,B,count(*) C
from (<your entire select query>) tbl
group by A,B
) g1
group by A
) g2
join
(
select A,B,count(*) C
from (<your entire select query>) tbl
group by A,B
) g3 on g2.A=g3.A and g3.C=g2.MC

-- dense_rank() keeps ties; distinct collapses the repeated (A, B) rows
select distinct
A, B
from
(
select
A, B, dense_rank() over (partition by A order by cnt desc) as RowNum
from
(
select
T.A, T.B, count(*) over (partition by T.A, T.B) as cnt
from T
) as A
) as B
where RowNum = 1

Related

Select rows with non matching column

I am trying to retrieve the rows for every ID whose Volume values are all the same (including IDs with only one row), but I could not come up with the SQL logic.
Data:
ID Volume
A 100
A 100
B 101
B 102
B 103
B 104
C 400
Required Output:
ID Volume
A 100
A 100
C 400
This one is achievable using a subquery.
select * from test where col1 in (
select t.col1
from(
select col1, col2,
dense_rank() over (partition by col1 order by col2) as dr
from test) t
group by t.col1
having sum(case when t.dr = 1 then 0 else t.dr end) = 0)
This can be done more simply:
select t1.id,
t1.volume
from tbl t1
inner join (select id
from tbl
group by id
having count(distinct volume) = 1
) as t2 on t1.id=t2.id;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=92bc234e631a1106b0e322bc4954d696
having count(distinct volume) = 1 returns only the ids whose rows all share the same volume, including ids that have just one row.
I'd naturally be inclined towards Ergest Basha's pattern.
It can also be expressed using NOT EXISTS()
SELECT
t.*
FROM
tbl AS t
WHERE
NOT EXISTS (
SELECT *
FROM tbl
WHERE id = t.id
AND volume <> t.volume
)
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=243c42008f527391514d1ad124730587
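As a quick check that the COUNT(DISTINCT ...) join and the NOT EXISTS form select the same rows, here is a small sketch on the question's data. It assumes Python's sqlite3 module instead of MySQL; the helper name run and the alias other are illustrative only.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [("A", 100), ("A", 100), ("B", 101), ("B", 102),
        ("B", 103), ("B", 104), ("C", 400)]

# Keep ids that have exactly one distinct volume.
HAVING_QUERY = """
SELECT t1.id, t1.volume
FROM tbl AS t1
JOIN (
    SELECT id
    FROM tbl
    GROUP BY id
    HAVING COUNT(DISTINCT volume) = 1
) AS t2 ON t1.id = t2.id
ORDER BY t1.id
"""

# Keep rows for which no row with the same id has a different volume.
NOT_EXISTS_QUERY = """
SELECT t.id, t.volume
FROM tbl AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl AS other
    WHERE other.id = t.id
      AND other.volume <> t.volume
)
ORDER BY t.id
"""


def run(query, rows):
    """Execute one of the queries against a fresh in-memory copy of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id TEXT, volume INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(HAVING_QUERY, ROWS))
    print(run(NOT_EXISTS_QUERY, ROWS))
```

Both queries keep the two A rows and the single C row and drop every B row.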

Need to aggregate results of the query for the non-exact matching name

I have a query that calculates the number of mentions of each partner in the database.
The query looks like this:
SELECT partner, COUNT(*) AS Total
FROM database.data
GROUP BY 1;
The output then looks like this:
partner | total
---------+------
X | 5
Y | 3
Z | 2
X aa | 6
aa X | 7
Y aa | 1
What I need is for partners like X, X aa, aa X to be counted together in one row (same for Y and Y aa). I tried adding a HAVING clause at the end, but wasn't able to make it work, and I'm not sure it's actually the right thing to use.
Would appreciate any help! Thanks!
Without optimization:
WITH RECURSIVE
-- rank by length
cte1 AS ( SELECT partner, total, DENSE_RANK() OVER (ORDER BY LENGTH(partner)) rnk
FROM data ),
-- find pairs where partner is a substring of another partner
cte2 AS ( SELECT partner, total, rnk, partner short
FROM cte1
WHERE rnk = 1
UNION ALL
SELECT cte1.partner,
cte1.total,
cte1.rnk,
CASE WHEN LOCATE(cte2.partner, cte1.partner)
THEN cte2.partner
ELSE cte1.partner
END
FROM cte1, cte2
WHERE cte1.rnk = cte2.rnk + 1 ),
-- select shortest
cte3 AS ( SELECT partner,
total,
rnk, short,
ROW_NUMBER() OVER (PARTITION BY partner ORDER BY LENGTH(short)) rn
FROM cte2 )
-- get needed data
SELECT short partner, SUM(total) total
FROM cte3
WHERE rn = 1
GROUP BY short
ORDER BY partner
I think this should do it:
SELECT d1.partner, COUNT(*) AS Total
FROM (SELECT * FROM database.data d1
WHERE CHAR_LENGTH(d1.partner) = 1
GROUP BY d1.partner
) d1
-- i
LEFT JOIN database.data d2 ON
-- get x in aa x and x aa
d2.partner LIKE CONCAT('%', d1.partner, '%')
-- remove x aa and aa x from the count table
GROUP BY d1.partner;
With mock data:
CREATE TABLE IF NOT EXISTS tmp_mock_data
SELECT * FROM (
SELECT 'X' partner
UNION ALL
SELECT 'Y'
UNION ALL
SELECT 'Z'
UNION ALL
SELECT 'Y aa'
UNION ALL
SELECT 'Z aa'
UNION ALL
SELECT 'X aa'
UNION ALL
SELECT 'aa Z'
UNION ALL
SELECT 'aa Y'
UNION ALL
SELECT 'aa X'
) A
CROSS JOIN (SELECT NULL UNION ALL SELECT NULL) B;
SELECT d1.partner, COUNT(*) AS Total
FROM (SELECT * FROM tmp_mock_data d1
WHERE CHAR_LENGTH(d1.partner) = 1
GROUP BY d1.partner
) d1
-- i
LEFT JOIN tmp_mock_data d2 ON
-- get x in aa x and x aa
d2.partner LIKE CONCAT('%', d1.partner, '%')
-- remove x aa and aa x from the count table
GROUP BY d1.partner;
DROP TABLE tmp_mock_data;
SELECT partner,COUNT(*) AS Total
FROM(
SELECT (CASE WHEN ASCII(LEFT(partner,1)) BETWEEN 65 AND 90 THEN LEFT(partner,1) ELSE RIGHT(partner,1) END) AS partner
FROM database.data) AS t
GROUP BY 1;
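The substring-join idea from the answers above can be tried on the numbers shown in the question. The sketch below is only an illustration under several assumptions: it uses Python's sqlite3 module rather than MySQL, it starts from the already aggregated counts in a hypothetical table partner_counts, it treats single-character names as the base partners, and it uses INSTR for a case-sensitive substring test.

```python
import sqlite3

# Per-partner totals copied from the question's output.
COUNTS = [("X", 5), ("Y", 3), ("Z", 2), ("X aa", 6), ("aa X", 7), ("Y aa", 1)]

# Each single-character partner collects every row whose name contains it.
QUERY = """
SELECT base.partner, SUM(d.total) AS total
FROM (
    SELECT partner
    FROM partner_counts
    WHERE LENGTH(partner) = 1
) AS base
JOIN partner_counts AS d
  ON INSTR(d.partner, base.partner) > 0
GROUP BY base.partner
ORDER BY base.partner
"""


def merged_totals(counts):
    """Return totals with name variants folded into their base partner."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE partner_counts (partner TEXT, total INTEGER)")
    conn.executemany("INSERT INTO partner_counts VALUES (?, ?)", counts)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for partner, total in merged_totals(COUNTS):
        print(partner, total)
```

With these numbers X collects 5 + 6 + 7, Y collects 3 + 1, and Z stays at 2. The approach breaks down as soon as one base name is a substring of another, so real data would need a stricter matching rule.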

Join of two tables where result displays on min() field of table 2

I am working on a SQL procedure and have run into a problem I cannot find a solution to.
There is a table A with fields a, b, c.
And a table B with fields a, w, x, y, z.
I want to join the two tables on A.a = B.a and select fields such as c, x, y, z from the row where w is minimal. w is an integer.
The following code helps, but when I try to display more fields from table 2, I get an error saying the fields are not included in the GROUP BY clause.
SELECT OutO.routingSequence,
tbl2.a AS parentOrderNumber,
tbl2.c AS operationNumber,
tbl2.d as headerStatus,
tbl2.e as orderNumber
FROM Operation OutO
JOIN (
SELECT a, MIN(c) c
FROM (
SELECT h.parentordernumber a, o.operationNumber c
FROM header h , operation o
WHERE o.ordernumber=h.Parentordernumber
AND (
SELECT DATEDIFF(day,o.scheduledStartDate, GETDATE()) AS DiffDate
) < 3
AND (
SELECT DATEDIFF(day,o.scheduledStartDate, GETDATE()) AS DiffDate
) > -5
) tbl
GROUP BY a
) tbl2
ON OutO.ordernumber = tbl2.a
WHERE OutO.operationnumber = tbl2.c
Please help on this!!!
Try it with DISTINCT instead of GROUP BY:
Select DISTINCT OutO.routingSequence, tbl2.a as parentOrderNumber, tbl2.c as operationNumber from Operation OutO
Updated SQL: you have to select the fields in the inner select if you want them in the output.
SELECT OutO.routingSequence,
tbl2.a AS parentOrderNumber,
tbl2.c AS operationNumber,
tbl2.d as headerStatus,
tbl2.e as orderNumber
FROM Operation OutO
JOIN (
SELECT a, MIN(c) c,MIN(d) AS d,MIN(e) as e
FROM (
SELECT h.parentordernumber as a, o.operationNumber as c ,h.headerStatus as d, h.orderNumber as e
FROM header h inner join operation o
on o.ordernumber=h.Parentordernumber
AND (
SELECT DATEDIFF(day,o.scheduledStartDate, GETDATE()) AS DiffDate
) < 3
AND (
SELECT DATEDIFF(day,o.scheduledStartDate, GETDATE()) AS DiffDate
) > -5
) tbl
GROUP BY a
) tbl2
ON OutO.ordernumber = tbl2.a
WHERE OutO.operationnumber = tbl2.c
Sorry, but I don't understand where the d, e and w columns are in your SQL.
You can use row_number() with partition by, ordered so that the minimum value comes first, and keep the first row, like this:
SELECT OutO.routingSequence,
tbl2.a AS parentOrderNumber,
tbl2.c AS operationNumber,
tbl2.d as headerStatus,
tbl2.e as orderNumber
FROM Operation OutO
JOIN (
select a,c,e,d from (
SELECT h.parentordernumber a, o.operationNumber c,
something1 as d,
something2 as e,
-- where this = 1 there is your minimum row
row_number() over (partition by h.parentordernumber
order by w ) Rn
FROM header h , operation o
WHERE o.ordernumber=h.Parentordernumber
AND (
SELECT DATEDIFF(day,o.scheduledStartDate, GETDATE()) AS DiffDate
) < 3
AND (
SELECT DATEDIFF(day,o.scheduledStartDate, GETDATE()) AS DiffDate
) > -5
) ranked where Rn = 1 -- keep only the first row in each window
) tbl2
ON OutO.ordernumber = tbl2.a
WHERE OutO.operationnumber = tbl2.c
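Stripped down to the two tables described in the question, the row_number() technique looks like the sketch below. It assumes Python's sqlite3 module with SQLite 3.25 or later instead of SQL Server; the table names tbl_a and tbl_b and the sample values are invented for illustration.

```python
import sqlite3

# Invented sample data: tbl_a(a, b, c) and tbl_b(a, w, x, y, z).
A_ROWS = [(1, "b1", "c1"), (2, "b2", "c2")]
B_ROWS = [
    (1, 5, "x5", "y5", "z5"),
    (1, 2, "x2", "y2", "z2"),  # smallest w for a = 1
    (2, 9, "x9", "y9", "z9"),  # only row for a = 2
]

# Number the tbl_b rows within each a by ascending w, then join on rank 1.
QUERY = """
SELECT ta.c, m.x, m.y, m.z
FROM tbl_a AS ta
JOIN (
    SELECT a, x, y, z,
           ROW_NUMBER() OVER (PARTITION BY a ORDER BY w) AS rn
    FROM tbl_b
) AS m
  ON m.a = ta.a
 AND m.rn = 1
ORDER BY ta.a
"""


def rows_with_min_w(a_rows, b_rows):
    """Join each tbl_a row to the tbl_b row that has the smallest w."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_a (a INTEGER, b TEXT, c TEXT)")
    conn.execute("CREATE TABLE tbl_b (a INTEGER, w INTEGER, x TEXT, y TEXT, z TEXT)")
    conn.executemany("INSERT INTO tbl_a VALUES (?, ?, ?)", a_rows)
    conn.executemany("INSERT INTO tbl_b VALUES (?, ?, ?, ?, ?)", b_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_with_min_w(A_ROWS, B_ROWS):
        print(row)
```

Because the whole tbl_b row is carried through the ranking, any number of its columns can be selected without touching a GROUP BY. If two rows tie on the smallest w, ROW_NUMBER picks one of them arbitrarily.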

Group By value RAND()

Is it possible to get a random value for each group in a GROUP BY?
----------------
nID | val
---------------
A | XXX
A | YYY
B | L
B | M
B | N
B | P
----------------
With this SQL:
SELECT nID, VAL FROM T1 GROUP BY nID
My result always is:
nID val
--------
A XXX
B L
But I want a different, randomly chosen value for every nID. Like:
nID val
--------
A YYY
B N
or
nID val
--------
A XXX
B P
Is it possible?
http://sqlfiddle.com/#!2/357b8/3
Use a sub-query.
SELECT r.nID,
(SELECT r1.val FROM T1 r1 WHERE r.nID=r1.nID ORDER BY rand() LIMIT 1) AS 'val' FROM T1 r
GROUP BY r.nID
http://sqlfiddle.com/#!2/357b8/18
You can order by rand() in a subquery and then group the result, like this:
SELECT nID, VAL FROM (
SELECT nID, VAL
FROM T1
ORDER BY RAND()
)AS subquery
GROUP BY nID
SELECT
t1.nID,
(SELECT
t2.var
FROM your_table t2
WHERE t1.nID = t2.nID ORDER BY rand() LIMIT 1
) AS var
FROM your_table t1
GROUP BY t1.nID ;
Try This
SELECT nID, VAL
FROM (select nID, VAL from T1 order by rand()) as T
group by nID
The following solution is similar in spirit to those from xdazz or jonnyynnoj. But instead of SELECT FROM T1 GROUP BY nID I use a subquery to select all distinct IDs. I believe there is a chance that the performance might differ, so give this one a try as well.
SELECT nID,
(SELECT VAL
FROM T1
WHERE T1.nID = ids.nID
ORDER BY RAND()
LIMIT 1
) AS VAL
FROM (SELECT DISTINCT nID FROM T1) AS ids
rand + rownum
SELECT t.*
, @rownum := @rownum+1 AS rowNum
FROM(
SELECT nID, VAL
FROM T1
ORDER BY RAND()
) AS t, (SELECT @rownum :=0) AS R
GROUP BY nID
ORDER BY nID, rowNum
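The correlated-subquery variant (one random pick per distinct nID) is the easiest of these to reason about, so here is a minimal runnable sketch of it. It assumes Python's sqlite3 module, where the function is RANDOM() rather than MySQL's RAND(); the helper name random_value_per_group is made up.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [("A", "XXX"), ("A", "YYY"), ("B", "L"), ("B", "M"), ("B", "N"), ("B", "P")]

# For each distinct nID, pick one VAL from that group in random order.
QUERY = """
SELECT ids.nID,
       (SELECT VAL
        FROM T1
        WHERE T1.nID = ids.nID
        ORDER BY RANDOM()
        LIMIT 1) AS VAL
FROM (SELECT DISTINCT nID FROM T1) AS ids
ORDER BY ids.nID
"""


def random_value_per_group(rows):
    """Return one (nID, VAL) pair per nID, with VAL chosen at random."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (nID TEXT, VAL TEXT)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Output varies between runs, e.g. A may come with XXX or YYY.
    print(random_value_per_group(ROWS))
```

Each nID appears exactly once and the value always belongs to that nID, because the random ordering happens inside the per-group subquery rather than relying on which row GROUP BY happens to keep.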

check 2 primary key in one table

I have a table psc_Pro_ProfessorPositions(ProfessorID, PositionID, StartDate, EndDate). It has a composite primary key on (ProfessorID, PositionID).
I want to insert a row only when its (ProfessorID, PositionID) pair is not already in the table. I wrote this:
insert into CoreUIs.dbo.psc_Pro_ProfessorPositions
(
ProfessorID,PositionID,StartDate,EndDate
)
select a.MaQuanLy,b.MaQuanLy,convert(smalldatetime,NgayHieuLuc),convert(smalldatetime,NgayHetHieuLuc)
from inserted
inner join GiangVien a on a.MaGiangVien = inserted.MaGiangVien
inner join ChucVu b on b.MaChucVu = inserted.MaChucVu
where a.MaQuanLy not in (select ProfessorID from CoreUIs.dbo.psc_Pro_ProfessorPositions)
and b.MaQuanLy not in (select PositionID from CoreUIs.dbo.psc_Pro_ProfessorPositions)
But it's wrong. Can you help me? Thanks, all.
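One thing that stands out in the query above is that the two NOT IN tests check each column on its own, so a new pair is rejected whenever its ProfessorID or its PositionID already appears anywhere in the table. A common alternative is a single NOT EXISTS on the pair. The sketch below illustrates that idea only; it assumes Python's sqlite3 module instead of a SQL Server trigger, and the table names positions and incoming are stand-ins for the real target table and the inserted pseudo-table.

```python
import sqlite3


def insert_missing_pairs(existing, candidates):
    """Insert candidate pairs that are not yet present and return the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE positions ("
        " ProfessorID INTEGER, PositionID INTEGER,"
        " PRIMARY KEY (ProfessorID, PositionID))"
    )
    conn.execute("CREATE TABLE incoming (ProfessorID INTEGER, PositionID INTEGER)")
    conn.executemany("INSERT INTO positions VALUES (?, ?)", existing)
    conn.executemany("INSERT INTO incoming VALUES (?, ?)", candidates)
    # Compare both key columns together, so only exact duplicate pairs are skipped.
    conn.execute(
        """
        INSERT INTO positions (ProfessorID, PositionID)
        SELECT DISTINCT i.ProfessorID, i.PositionID
        FROM incoming AS i
        WHERE NOT EXISTS (
            SELECT 1
            FROM positions AS p
            WHERE p.ProfessorID = i.ProfessorID
              AND p.PositionID = i.PositionID
        )
        """
    )
    result = conn.execute(
        "SELECT ProfessorID, PositionID FROM positions ORDER BY 1, 2"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # (1, 10) already exists; (1, 20) and (2, 10) are new pairs.
    print(insert_missing_pairs([(1, 10)], [(1, 10), (1, 20), (2, 10)]))
```

With the column-by-column NOT IN tests, (1, 20) and (2, 10) would both be rejected because professor 1 and position 10 already occur in the table; the pair-wise test lets them through and skips only (1, 10).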
;WITH x AS
(
SELECT TeacherID, ClassID, ClassStuID, s = [SUM],
rn = ROW_NUMBER() OVER (PARTITION BY TeacherID ORDER BY ClassID)
FROM dbo.TB1
)
SELECT TeacherID, ClassID, ClassStuID,
[SUM] = CASE rn WHEN 1 THEN s ELSE NULL END
FROM x
ORDER BY TeacherID, [SUM] DESC;
You can employ CTEs with ROW_NUMBER()OVER() to identify the first row of each TeacherID:
; with a as (
select * from TB1
union
select * from TB2
)
, b as (
select *, r=ROW_NUMBER()over(partition by a.TeacherID order by a.TeacherID,
a.ClassID, a.ClassStuID) from a
)
select b.TeacherID, b.ClassID, b.ClassStuID
, [SUM]=case b.r when 1 then b.[SUM] else null end
from b
order by b.TeacherID, b.r
go
Result: