SQL select maximum number of duplicates value in a column - mysql

Here I have this table:
Copies
nInv | Subject | LoanDate | BookCode |MemberCode|
1 |Storia |15/04/2019 00:00:00 |7844455544| 1 |
2 |Geografia |12/09/2020 00:00:00 |8004554785| 4 |
4 |Francese |17/05/2006 00:00:00 |8004894886| 3 |
5 |Matematica |17/06/2014 00:00:00 |8004575185| 3 |
I'm trying to find the value of the highest number of duplicates in the MemberCode column. So in this case I should get 3 as result, as its value appears two times in the table. Also, MemberCode is PK in another table, so ideally I should select all rows of the second table that match the MemberCode in both tables. For the second part I guess I should write something like SELECT * FROM Table2, Copies WHERE Copies.MemberCode = Table2.MemberCode but I'm missing out almost everything on the first part. Can you guys help me?

Use group by and limit:
select membercode, count(*) as num
from t
group by membercode
order by count(*) desc
limit 1;

SELECT MAX(counted) FROM
(SELECT COUNT(MemberCode) AS counted
FROM table_name GROUP BY MemberCode) AS counts;

Using analytic functions, we can assign a rank to each member code based on its count. Then, we can figure out what its count is.
WITH cte AS (
SELECT t2.MemberCode, COUNT(*) AS cnt,
RANK() OVER (ORDER BY COUNT(*) DESC, t2.MemberCode) rnk
FROM Table2 t2
INNER JOIN Copies c ON c.MemberCode = t2.MemberCode
GROUP BY t2.MemberCode
)
SELECT cnt
FROM cte
WHERE rnk = 1;

Something like this (MySQL 8.0 or later, which supports common table expressions):
with top_dupe_member_cte as (
select MemberCode, count(*) as cnt
from MemberTable
group by MemberCode
order by cnt desc
limit 1)
select ot.*
from OtherTable ot
join top_dupe_member_cte dmc on ot.MemberCode=dmc.MemberCode;
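To try the two steps from the question end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the SQL itself is plain GROUP BY / ORDER BY / LIMIT). The Members table and its Name column are hypothetical placeholders for the asker's second table.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Members (MemberCode INTEGER PRIMARY KEY, Name TEXT);  -- hypothetical second table
CREATE TABLE Copies  (nInv INTEGER PRIMARY KEY, Subject TEXT, MemberCode INTEGER);
INSERT INTO Members VALUES (1, 'Anna'), (3, 'Bruno'), (4, 'Carla');
INSERT INTO Copies  VALUES (1, 'Storia', 1), (2, 'Geografia', 4),
                           (4, 'Francese', 3), (5, 'Matematica', 3);
""")

# Step 1: the MemberCode that occurs most often in Copies, with its count.
# The extra MemberCode sort key only makes ties deterministic.
top_code, top_count = conn.execute("""
    SELECT MemberCode, COUNT(*) AS num
    FROM Copies
    GROUP BY MemberCode
    ORDER BY num DESC, MemberCode
    LIMIT 1
""").fetchone()

# Step 2: the matching row(s) of the second table, reusing step 1 as a derived table.
member_rows = conn.execute("""
    SELECT m.MemberCode, m.Name
    FROM Members m
    JOIN (SELECT MemberCode
          FROM Copies
          GROUP BY MemberCode
          ORDER BY COUNT(*) DESC, MemberCode
          LIMIT 1) best ON best.MemberCode = m.MemberCode
""").fetchall()

print(top_code, top_count)
print(member_rows)
```

If several member codes tie for the highest count, LIMIT 1 keeps only one of them; the RANK() variant above is the one to adapt if all tied codes are wanted.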

Related

Which query produces the following output?

Q1: Which query produces the following output from table marks?
Table marks, column rnk: 1, 2, 3, 4
Output, column rnk: 1, 3, 6, 10
select rnk from (select b.rnk as alpha,sum(a.rnk) as rnk from (select * from marks) a join (select * from marks) b on a.rnk <= b.rnk group by 1 )
select rnk from (select b.rnk as alpha,sum(a.rnk) as rnk from (select * from marks) a join (select * from marks) b on a.rnk > b.rnk group by 1 )
select rnk from (select b.rnk as alpha,sum(a.rnk) as rnk from (select * from marks) a join (select * from marks) b on a.rnk = b.rnk group by 1 )
select rnk from (select b.rnk as alpha,avg(a.rnk) as rnk from (select * from marks) a join (select * from marks) b on a.rnk <= b.rnk group by 1 )
This was a question asked in an interview, and I didn't even know the topic it relates to.
I failed the test, but I really want to know which topics I should cover so I can be better prepared in future. I guess the answer is the first one, but I don't understand what's going on in this query.
Sorry for the bad title, but I was unable to even express my thoughts.
Thanks in advance, and sorry for my bad English.
Which query produces the following output from table marks
Correct answer - none.
The subquery must be assigned an alias. The outer subquery has no alias in any of these queries, i.e. all 4 queries will produce a syntax error like 'Every derived table must have its own alias'.
If you fix these errors then there is another problem: the queries do not contain an ORDER BY clause. So the ordering of the output rows is not defined (not deterministic), and even when a query produces the needed rows, their order may not match the one shown.
If you fix this problem too, then query #1 will produce the desired output.
The answer is #1.
(We had to add an alias for the sub-query: Z)
We use 2 aliases for the same table and join them so that each row in b is joined to all rows in a whose rnk is less than or equal to its own.
We then group by b.rnk (aliased alpha), which acts like an id, and return the sum of a.rnk, which is therefore a running total.
Akina is right that there is no ORDER BY in the query, so there is no guarantee that the order will be the same. (The question was not "how can we guarantee this result" but "which query could produce this output".)
As you had a problem with this question I suggest that you need to find an SQL tutorial and start from the basics. There are a number of good tutorials out there.
create table marks (rnk int);
insert into marks values (1),(2),(3),(4);
select rnk from
( select
b.rnk as alpha,
sum(a.rnk) as rnk
from
(select * from marks) a
join (select * from marks) b
on a.rnk <= b.rnk group by 1
)z;
| rnk |
| --: |
| 1 |
| 3 |
| 6 |
| 10 |
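The same running-total self-join can be checked locally. This is a minimal sketch using Python's sqlite3 module in place of MySQL (an assumption); it groups by b.rnk explicitly rather than by position and adds an ORDER BY so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marks (rnk INTEGER);
INSERT INTO marks VALUES (1), (2), (3), (4);
""")

# For every row b, sum all rows a with a.rnk <= b.rnk: a running total.
running = conn.execute("""
    SELECT b.rnk AS alpha, SUM(a.rnk) AS running_total
    FROM marks a
    JOIN marks b ON a.rnk <= b.rnk
    GROUP BY b.rnk
    ORDER BY b.rnk
""").fetchall()

for alpha, total in running:
    print(alpha, total)
```

Swapping the join condition for `a.rnk > b.rnk` or `a.rnk = b.rnk` is a quick way to see why the other options in the question do not give 1, 3, 6, 10.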

SELECT value BEFORE max mysql

So, I have this table, generated annually:
+----+------+
| id | name |
+----+------+
| 1 | 20162|
| 2 | 20162|
| 3 | 20171|
| 4 | 20171| <<< how do I get this value, the one before the max?
| 5 | 20172|
| 6 | 20172|
+----+------+
If i query :
SELECT name FROM table WHERE name=(SELECT max(name) FROM table)
The result is 20172
How do i get the value before that (20171)?
This will get second max.
SELECT *
FROM table
WHERE name NOT IN (SELECT MAX(name) FROM table )
ORDER BY id DESC LIMIT 1
You can use a little trick (DISTINCT is needed here because each name appears more than once):
SELECT DISTINCT name FROM table ORDER BY name DESC LIMIT 1,1
If you want a portable solution (since the above will probably only work in MySQL):
SELECT name FROM table t1 where
1 = (SELECT count(distinct name)
from table t2
where t2.name > t1.name )
This selects the name which has exactly one other name which is greater than it. It will have issues when e.g. all values are the same but that may actually be the intention sometimes.
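A minimal sketch of that counting approach, run through Python's sqlite3 module instead of MySQL (an assumption). The table name periods is hypothetical, since the question only uses the placeholder "table"; DISTINCT is added to the outer query so the duplicated name is reported once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods (id INTEGER PRIMARY KEY, name INTEGER);  -- hypothetical table name
INSERT INTO periods (id, name) VALUES
    (1, 20162), (2, 20162), (3, 20171), (4, 20171), (5, 20172), (6, 20172);
""")

# Keep the names that have exactly one distinct name greater than them.
second_highest = conn.execute("""
    SELECT DISTINCT t1.name
    FROM periods t1
    WHERE 1 = (SELECT COUNT(DISTINCT t2.name)
               FROM periods t2
               WHERE t2.name > t1.name)
""").fetchall()

print(second_highest)
```

Changing the constant 1 to n - 1 gives the nth highest distinct value with the same query shape.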
Also you can try this:
SELECT name
FROM (SELECT DISTINCT name FROM yourtable) TMP
ORDER BY name DESC LIMIT 1,1
how about this:
select b.second_id,b.name from
(select id, max(name) as max from tbl_max) as a
LEFT JOIN
(select id as second_id, name from tbl_max ) as b
on a.id <> second_id where name < a.max order by name desc limit 1;
Another, much better solution:
set @max = 0;
select max(name) into @max from tbl_max;
select * from tbl_max where name < @max ORDER BY id desc limit 1;
It returns:
4 20171
If you want MAX(name):
SELECT MAX(name) FROM table
WHERE name < (SELECT MAX(name) FROM table )
or
SELECT MAX(name) FROM table
WHERE name <> (SELECT MAX(name) FROM table )
There is also an article on finding the nth highest value.
(As the question was changed later, this trick is not useful for this problem.)
Can you use this simple ORDER BY?
SELECT name FROM tablename
group by name
order by name desc
limit 1,1
If the MAX(name) value occurs in several rows you need the GROUP BY; even so, this query is not a general solution!
You can try the code below. It returns the row at offset n in descending order, i.e. the (n+1)th highest record, so use n = 1 for the second highest.
SELECT NAME FROM table ORDER BY name DESC LIMIT n,1
Hope this helps.
SELECT DISTINCT name FROM table ORDER BY name DESC LIMIT 1,1;
The LIMIT clause accepts two values, an offset and a count. When only 1 value is provided, it assumes the offset is 0
DISTINCT is necessary as you have duplicate values for name. If this was unintentional, it's not needed.
Example at http://sqlfiddle.com/#!9/ea6e2/2
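To see why DISTINCT matters with this data, here is a small comparison, again sketched with Python's sqlite3 module rather than MySQL (an assumption) and a hypothetical table name periods. It uses the `LIMIT 1 OFFSET 1` spelling, which means the same as `LIMIT 1,1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods (id INTEGER PRIMARY KEY, name INTEGER);  -- hypothetical table name
INSERT INTO periods (id, name) VALUES
    (1, 20162), (2, 20162), (3, 20171), (4, 20171), (5, 20172), (6, 20172);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Without DISTINCT the second row is simply the duplicate of the maximum.
plain = scalar("SELECT name FROM periods ORDER BY name DESC LIMIT 1 OFFSET 1")

# With DISTINCT the duplicates collapse first, so offset 1 is the value before the max.
distinct = scalar("SELECT DISTINCT name FROM periods ORDER BY name DESC LIMIT 1 OFFSET 1")

# Equivalent formulation without LIMIT: the largest name below the maximum.
below_max = scalar("SELECT MAX(name) FROM periods WHERE name < (SELECT MAX(name) FROM periods)")

print(plain, distinct, below_max)
```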

Fetch 2nd Highest value from MySQL DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient as,
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd last visit of each patient with a query, but it gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance when dealing with large sets. And the statements that tend to have better performance look less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to just those patients that have at least two rows.
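The two-pass idea is easier to follow outside SQL. The sketch below is plain Python, not MySQL: it emulates the per-patient row counter with an ordinary loop over the sample rows from the question, so the variable names are illustrative only.

```python
# Sample rows from the question: (id, patient_id, visit_id, patient_result).
rows = [
    (1, 1, 1, 5),
    (2, 2, 1, 6),
    (3, 2, 3, 7),
    (4, 1, 2, 3),
    (5, 2, 3, 2),
    (6, 1, 3, 9),
]

# First pass: order by patient_id DESC, id DESC and number the rows within
# each patient, restarting at 1 whenever the patient changes.
ordered = sorted(rows, key=lambda r: (r[1], r[0]), reverse=True)
numbered = []
prev_patient = None
rn = 0
for row in ordered:
    rn = rn + 1 if row[1] == prev_patient else 1
    prev_patient = row[1]
    numbered.append((rn, row))

# Second pass: keep only the two most recent rows per patient.
last_two = [row for rn, row in numbered if rn <= 2]

for row in last_two:
    print(row)
```

A patient with a single visit still contributes one row here, which matches the behaviour noted above.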
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
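Here is a minimal runnable sketch of the correlated-subquery variant, using Python's sqlite3 module as a stand-in for MySQL (an assumption) and the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_patient (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER,
    visit_id INTEGER,
    patient_result INTEGER
);
INSERT INTO tbl_patient VALUES
    (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
    (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9);
""")

# For each row t, count the rows of the same patient with an id at least as
# large; the two most recent rows are the ones where that count is 1 or 2.
last_two = conn.execute("""
    SELECT t.id, t.patient_id, t.patient_result
    FROM tbl_patient t
    WHERE (SELECT COUNT(*)
           FROM tbl_patient p
           WHERE p.patient_id = t.patient_id
             AND p.id >= t.id) <= 2
    ORDER BY t.patient_id, t.id
""").fetchall()

for row in last_two:
    print(row)
```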
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) <= 1 -- at most one newer row, i.e. the last two visits
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
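A small sketch of the inequality join, run with Python's sqlite3 module instead of MySQL (an assumption). It counts only the matched newer rows with COUNT(p.id), so the newest visit of a patient gets 0 and the one before it gets 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_patient (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER,
    visit_id INTEGER,
    patient_result INTEGER
);
INSERT INTO tbl_patient VALUES
    (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
    (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9);
""")

# Join every row t to the newer rows p of the same patient and keep the rows
# that have at most one newer row.
last_two = conn.execute("""
    SELECT t.id, t.patient_id, COUNT(p.id) AS newer_rows
    FROM tbl_patient t
    LEFT JOIN tbl_patient p
           ON p.patient_id = t.patient_id
          AND t.id < p.id
    GROUP BY t.id, t.patient_id
    HAVING COUNT(p.id) <= 1
    ORDER BY t.patient_id, t.id
""").fetchall()

for row in last_two:
    print(row)
```

Counting with COUNT(*) instead would count the NULL-extended row of the newest visit as 1, which shifts the threshold by one.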
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?

MySQL sorting by date with GROUP BY

My table titles looks like this
id |group|date |title
---+-----+--------------------+--------
1 |1 |2012-07-26 18:59:30 | Title 1
2 |1 |2012-07-26 19:01:20 | Title 2
3 |2 |2012-07-26 19:18:15 | Title 3
4 |2 |2012-07-26 20:09:28 | Title 4
5 |2 |2012-07-26 23:59:52 | Title 5
I need latest result from each group ordered by date in descending order. Something like this
id |group|date |title
---+-----+--------------------+--------
5 |2 |2012-07-26 23:59:52 | Title 5
2 |1 |2012-07-26 19:01:20 | Title 2
I tried
SELECT *
FROM `titles`
GROUP BY `group`
ORDER BY MAX( `date` ) DESC
but I'm getting the first results from the groups. Like this
id |group|date |title
---+-----+--------------------+--------
3 |2 |2012-07-26 18:59:30 | Title 3
1 |1 |2012-07-26 19:18:15 | Title 1
What am I doing wrong?
Is this query going to be more complicated if I use LEFT JOIN?
This page was very helpful to me; it taught me how to use self-joins to get the max/min/something-n rows per group.
In your situation, it can be applied to the effect you want like so:
SELECT * FROM
(SELECT `group`, MAX(`date`) AS `date` FROM titles GROUP BY `group`)
AS x JOIN titles USING (`group`, `date`);
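The join against the per-group maximum can be tried with the sample data. This is a minimal sketch using Python's sqlite3 module rather than MySQL (an assumption), so the reserved column names are quoted with double quotes instead of backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (id INTEGER PRIMARY KEY, "group" INTEGER, "date" TEXT, title TEXT);
INSERT INTO titles VALUES
    (1, 1, '2012-07-26 18:59:30', 'Title 1'),
    (2, 1, '2012-07-26 19:01:20', 'Title 2'),
    (3, 2, '2012-07-26 19:18:15', 'Title 3'),
    (4, 2, '2012-07-26 20:09:28', 'Title 4'),
    (5, 2, '2012-07-26 23:59:52', 'Title 5');
""")

# x holds one row per group with its latest date; joining back to titles
# recovers the full row that carries that date.
latest = conn.execute("""
    SELECT t.id, t."group", t."date", t.title
    FROM titles t
    JOIN (SELECT "group", MAX("date") AS max_date
          FROM titles
          GROUP BY "group") x
      ON x."group" = t."group" AND x.max_date = t."date"
    ORDER BY t."date" DESC
""").fetchall()

for row in latest:
    print(row)
```

If two rows in a group share the latest date, both come back; a tie-breaker on id would be needed to keep exactly one.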
I found this topic via Google, looked like I had the same issue.
Here's my own solution if, like me, you don't like subqueries :
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table. Duplicates won't be inserted.
INSERT IGNORE INTO titles_tmp
SELECT *
FROM `titles`
ORDER BY `date` DESC;
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
It performs much better. Here's how to increase speed if the date column has the same order as the auto_increment one (you then don't need an ORDER BY clause):
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table, in the natural order. Duplicates will update the temporary table with the freshest information.
INSERT INTO titles_tmp
SELECT *
FROM `titles`
ON DUPLICATE KEY
UPDATE `id` = VALUES(`id`),
`date` = VALUES(`date`),
`title` = VALUES(`title`);
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
Result :
+----+-------+---------------------+---------+
| id | group | date | title |
+----+-------+---------------------+---------+
| 2 | 1 | 2012-07-26 19:01:20 | Title 2 |
| 5 | 2 | 2012-07-26 23:59:52 | Title 5 |
+----+-------+---------------------+---------+
On large tables this method makes a significant difference in performance.
Well, if dates are unique within a group this will work (if not, you'll see several rows that match the max date in a group). (Also, the column names are badly chosen: 'group' and 'date' are keywords, and 'group' in particular is reserved, so quote them with backticks to avoid syntax errors.)
select t1.* from titles t1, (select `group`, max(`date`) `date` from titles group by `group`) t2
where t2.`date` = t1.`date`
and t1.`group` = t2.`group`
order by t1.`date` desc
Another approach is to make use of MySQL user variables to identify a "control break" in the group values.
If you can live with an extra column being returned, something like this will work:
SELECT IF(s.group = @prev_group,0,1) AS latest_in_group
, s.id
, @prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT @prev_group := NULL) p
HAVING latest_in_group = 1
ORDER BY s.group DESC
What this is doing is ordering all the rows by group and by date in descending order. (We specify DESC on all the columns in the ORDER BY, in case there is an index on (group,date,id) that MySQL can do a "reverse scan" on. The inclusion of the id column gets us deterministic (repeatable) behavior in the case when there is more than one row with the latest date value.) That's the inline view aliased as s.
The "trick" we use is to compare the group value to the group value from the previous row. Whenever we have a different value, we know that we are starting a "new" group, and that this row is the "latest" row (we have the IF function return a 1). Otherwise (when the group values match), it's not the latest row (and we have the IF function return a 0).
Then, we filter out all the rows that don't have that latest_in_group set as a 1.
It's possible to remove that extra column by wrapping that query (as an inline view) in another query:
SELECT r.id
, r.group
, r.date
, r.title
FROM ( SELECT IF(s.group = @prev_group,0,1) AS latest_in_group
, s.id
, @prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT @prev_group := NULL) p
HAVING latest_in_group = 1
) r
ORDER BY r.group DESC
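The "control break" logic described above can be illustrated without MySQL user variables. The following is a plain Python sketch over the sample rows (the names are illustrative, not part of the original answer): sort by group, date and id descending, then keep a row only when its group differs from the previous row's group.

```python
# Sample rows from the question: (id, group, date, title).
titles = [
    (1, 1, "2012-07-26 18:59:30", "Title 1"),
    (2, 1, "2012-07-26 19:01:20", "Title 2"),
    (3, 2, "2012-07-26 19:18:15", "Title 3"),
    (4, 2, "2012-07-26 20:09:28", "Title 4"),
    (5, 2, "2012-07-26 23:59:52", "Title 5"),
]

# Same ordering as the inline view: group DESC, date DESC, id DESC.
ordered = sorted(titles, key=lambda r: (r[1], r[2], r[0]), reverse=True)

latest = []
prev_group = None  # plays the role of the user variable, initially NULL
for row in ordered:
    if row[1] != prev_group:
        # Group changed: this is the first, i.e. latest, row of a new group.
        latest.append(row)
    prev_group = row[1]

for row in latest:
    print(row)
```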
If your id field is an auto-incrementing field, and it's safe to say that the highest value of the id field is also the highest value for the date of any group, then this is a simple solution:
SELECT b.*
FROM (SELECT MAX(id) AS maxid FROM titles GROUP BY `group`) a
JOIN titles b ON a.maxid = b.id
ORDER BY b.date DESC
Use the below mysql query to get latest updated/inserted record from table.
SELECT * FROM
(
select * from `titles` order by `date` desc
) as tmp_table
group by `group`
order by `date` desc
Use the following query to get the most recent record from each group
SELECT
T1.* FROM
(SELECT
MAX(ID) AS maxID
FROM
T2
GROUP BY Type) AS aux
INNER JOIN
T2 AS T1 ON T1.ID = aux.maxID ;
Where ID is your auto increment field and Type is the type of records, you wanted to group by.
MySQL uses a dumb extension of GROUP BY which is not reliable if you want to get such results; therefore, you could use
select id, `group`, `date`, title from titles as t where id =
(select id from titles where `group` = t.`group` order by `date` desc limit 1);
In this query the table is scanned in full each time, once per outer row, to find the most recent date in that row's group. I could not find any better alternative. Hope this will help someone.

mysql - query to fetch most popular items which haven't been seen by user

Getting the most popular items is relatively easy. But let us say I have a table with two columns: item_id and viewer_id.
Given a viewer_id, I want to fetch the top X item_id rows which have been viewed the MOST times AND have not been viewed by the given viewer_id. So for example:
item_id | viewer_id
A | 1
A | 3
C | 2
C | 3
C | 4
D | 5
Getting most popular items not seen by viewer 2 should give back A, D.
What is a good way to go about this?
As I understand it you don't just want them listed but ordered from most to least popular; this should work.
SELECT item_id
FROM table_name a
WHERE NOT EXISTS
(
SELECT viewer_id FROM table_name t
WHERE a.item_id=t.item_id AND t.viewer_id=2
)
GROUP BY item_id
ORDER BY COUNT(item_id) DESC;
Try this:
select item_id, count(*) as timesViewed from t2
where item_id not in (
select distinct t1.item_id from t1
where viewer_id = 2
)
group by item_id
order by timesViewed desc
Something like below should work:
SELECT t.item_id, COUNT(t.viewer_id) AS view_count FROM table t
WHERE t.item_id NOT IN (SELECT DISTINCT item_id FROM table t2 WHERE viewer_id = your_viewer_id)
GROUP BY item_id
ORDER BY view_count DESC
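The NOT EXISTS / NOT IN answers can be checked against the sample data. This is a minimal sketch using Python's sqlite3 module in place of MySQL (an assumption); the table name views and the helper popular_unseen are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE views (item_id TEXT, viewer_id INTEGER);  -- hypothetical table name
INSERT INTO views VALUES
    ('A', 1), ('A', 3), ('C', 2), ('C', 3), ('C', 4), ('D', 5);
""")

def popular_unseen(conn, viewer_id, limit):
    """Return up to `limit` (item_id, view_count) pairs, most viewed first,
    skipping every item the given viewer has already seen."""
    return conn.execute("""
        SELECT v.item_id, COUNT(*) AS view_count
        FROM views v
        WHERE NOT EXISTS (SELECT 1
                          FROM views seen
                          WHERE seen.item_id = v.item_id
                            AND seen.viewer_id = ?)
        GROUP BY v.item_id
        ORDER BY view_count DESC, v.item_id
        LIMIT ?
    """, (viewer_id, limit)).fetchall()

print(popular_unseen(conn, 2, 10))
```

The extra `v.item_id` sort key only makes ties between equally popular items deterministic.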