OK, I think the answer to this is out there somewhere, but I can't find it...
(and even my title is bad)
In short, I want to find the fewest groups I can make from part of an association table.
First, keep in mind that this data is already the result of a join across 5 tables (1k+ rows) with filtering and grouping, which I'll have to run many times on a prod server as powerful as a banana...
Second, this is a made-up case that illustrates my problem.
After some querying, I get this result:
+--------------------+
|id_course|id_teacher|
+--------------------+
| 6 | 1 |
| 6 | 4 |
| 6 | 14 |
| 33 | 1 |
| 33 | 4 |
| 34 | 1 |
| 34 | 4 |
| 34 | 10 |
+--------------------+
As you can see, I've got 3 courses, each taught by up to 3 teachers. I need to attend every course once, but I want as few different teachers as possible (I'm shy...).
My first query
It should answer: what is the smallest number of teachers I need to cover every unique course?
With this data the answer is 1, because either Teacher 1 or Teacher 4 teaches all 3 courses.
Second query
Now that I've picked these courses, I want to attend two more courses, 32 and 50, with this schedule:
+--------------------+
|id_course|id_teacher|
+--------------------+
| 32 | 1 |
| 32 | 12 |
| 50 | 12 |
+--------------------+
My question is: for id_course N, will I need one more teacher?
I want to check course by course (e.g. "check course 32"); there is no need to check several at the same time.
The best way I can think of is to count the rows of an inner join against the list of teachers tied for the best rank in the first query. With our data there are only two: Teachers (1, 4).
For Course 32, Teacher 4 doesn't teach it, but since Teacher 1 teaches Courses (6, 33, 34, 32) I don't need another teacher.
For Course 50, the only teacher is Teacher 12, so there is no match among my chosen teachers and I'll need one more (two in total with this data).
Here is a base SQLFiddle.
Best regards, Blag
So you want the smallest distinct count of ID_Teacher: get a distinct count, order by it, and limit the result to 1 record.
So perhaps something like...
SELECT count(Distinct ID_Teacher), Group_concat(ID_Teacher) as TeachersIDs
FROM Table
WHERE ID_Course in ('Your List')
ORDER BY count(Distinct ID_Teacher) ASC Limit 1
However, this picks arbitrarily when there is a tie. Do you want to be able to choose which group of teachers to use when ties exist? That is, there may be multiple ways to cover all classes with the same number of teachers. For example, teachers A, B and teachers A, C might both cover all required classes. Should both records be returned, or is one sufficient?
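For what it's worth, the underlying problem ("fewest teachers that cover every course") is a minimum set cover. Below is a minimal brute-force sketch of it in Python rather than SQL, assuming the result set is small enough to pull into the application. The function name `smallest_teacher_sets` is made up for illustration, and it returns every tied answer instead of picking one.

```python
from itertools import combinations

# Sample rows (id_course, id_teacher) copied from the question.
ROWS = [(6, 1), (6, 4), (6, 14), (33, 1), (33, 4), (34, 1), (34, 4), (34, 10)]


def smallest_teacher_sets(rows):
    """Return every smallest set of teachers that covers all courses."""
    courses = {course for course, _ in rows}
    taught_by = {}
    for course, teacher in rows:
        taught_by.setdefault(teacher, set()).add(course)
    teachers = sorted(taught_by)
    # Try sets of size 1, then 2, ... and stop at the first size that works.
    for size in range(1, len(teachers) + 1):
        covers = [
            combo
            for combo in combinations(teachers, size)
            if set().union(*(taught_by[t] for t in combo)) == courses
        ]
        if covers:
            return covers
    return []


print(smallest_teacher_sets(ROWS))
```

With the first table from the question this lists Teacher 1 and Teacher 4 as the two single-teacher answers. The search is exponential in the number of teachers, so it is only reasonable for small inputs like this one.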
So I've finally found a way to do what I want!
For the first query, since my real underlying need was "is there a single teacher who can do everything", I lowered my expectations a bit and went with this one (58 lines in my real case u_u"):
SELECT
(
SELECT count(s.id_teacher) nb
FROM t AS m
INNER JOIN t AS s
ON m.id_teacher = s.id_teacher
GROUP BY m.id_course, m.id_teacher
ORDER BY nb DESC
LIMIT 1
) AS nbMaxBySingleTeacher,
(
SELECT COUNT(DISTINCT id_course) nb
FROM t
) AS nbTotalCourseToDo
And I get back two values that answer my question "is one teacher enough?":
+--------------------------------------+
|nbMaxBySingleTeacher|nbTotalCourseToDo|
+--------------------------------------+
| 4 | 5 |
+--------------------------------------+
The 2nd query uses the schedule of new courses and takes the id of the one I want to check. It should tell me whether I need one more teacher or whether my current one(s) will do.
SELECT COUNT(*) nb
FROM (
SELECT
z.id_teacher
FROM z
WHERE
z.id_course = 50
) t1
WHERE
FIND_IN_SET(t1.id_teacher, (
SELECT GROUP_CONCAT(t2.id_teacher) lst
FROM (
SELECT DISTINCT COUNT(s.id_teacher) nb, m.id_teacher
FROM t AS m
INNER JOIN t AS s
ON m.id_teacher = s.id_teacher
GROUP BY m.id_course, m.id_teacher
ORDER BY nb DESC
) t2
GROUP BY t2.nb
ORDER BY nb DESC
LIMIT 1
));
This tells me the number of teachers who can teach both the courses I already have AND the new one I want. So if it's above zero, I don't need a new teacher:
+--+
|nb|
+--+
|1 |
+--+
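As a cross-check of the idea, here is a small sketch that can be run without MySQL, using Python's built-in `sqlite3` as a stand-in. It is a simplified variant, not the query above: it assumes `t` holds only the three courses already chosen and `z` holds the new schedule, and it looks for teachers who cover every course in `t`. The helper name `teachers_already_covering` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id_course INTEGER, id_teacher INTEGER);  -- courses already chosen
CREATE TABLE z (id_course INTEGER, id_teacher INTEGER);  -- schedule of new courses
INSERT INTO t VALUES (6,1),(6,4),(6,14),(33,1),(33,4),(34,1),(34,4),(34,10);
INSERT INTO z VALUES (32,1),(32,12),(50,12);
""")

# Count teachers of the new course who already teach every course in t.
CHECK_SQL = """
SELECT COUNT(*)
FROM z
WHERE z.id_course = ?
  AND z.id_teacher IN (
        SELECT id_teacher
        FROM t
        GROUP BY id_teacher
        HAVING COUNT(DISTINCT id_course) =
               (SELECT COUNT(DISTINCT id_course) FROM t)
  )
"""


def teachers_already_covering(course_id):
    """Return how many current 'do everything' teachers also teach course_id."""
    return conn.execute(CHECK_SQL, (course_id,)).fetchone()[0]


for course in (32, 50):
    print(course, teachers_already_covering(course))
```

A result above zero means no extra teacher is needed for that course. The `GROUP BY ... HAVING` part uses no MySQL-specific functions, so it should port back with little change.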
Please note that I'm an absolute n00b in MySQL, but somehow I managed to build some (for me) complex queries that work as they should. My main problem now is that for many of the queries we're working on:
The query is becoming too big and very hard to follow.
The same subqueries get repeated many times, which adds to the complexity (and probably to the time needed to process the query).
We want to expand this query further, but we are reaching a point where we can no longer oversee what we are doing. I've added one of these subqueries at the end of this post, just as an example.
!! You can fast forward to the Problem section if you want to skip the details below. I think the question can also be answered without the additional info.
What we want to do
Create a MySQL query that calculates purchase orders and forecasts for a given supplier based on:
Sales history in a given period (past [x] months = interval)
Current stock
Items already in backorder (from supplier)
Reserved items (for customers)
Supplier ID
I've added an example of a subquery at the bottom of this message. We're showing just this part to keep things simple for now. The output of the subquery is:
Part number
Units sold
Units sold (outliers removed)
Units sold per month (outliers removed)
Number of invoices with the part number in the period (interval)
It works quite OK for us, although I'm sure it can be optimised. It removes outliers from the sales history (e.g. one customer that orders 50 pcs of one product in one order). Unfortunately it can only remove outliers when there is substantial data, so if the first order happens to be 50 pcs it is not considered an outlier. For that reason we take the number of invoices into account in the main query. The number of invoices has to exceed a certain threshold, otherwise the system will revert to a fixed value of "maximum stock" for that product.
As mentioned, this is only a small part of the complete query and we want to expand it even further (so that it takes into account the "sales history" of parts that were used in assembled products).
For example, if we were to build and sell cars and we wanted to place an order with our tyre supplier, the query would calculate the number of tyres we need to order based on the sales history of the various car models (while also taking into account the stock of the cars, reserved cars and stock of the tyres).
Problem
The query is becoming massive and incomprehensible. We are repeating the same subqueries many times, which seems highly inefficient to us and is the main reason the query is becoming so bulky.
What we have tried
(Please note that we are on MySQL 5.5.33. We will update our server soon but for now we are limited to this version.)
Create a VIEW from the subqueries.
The main issue here is that we can't execute the view with parameters like supplier_id and interval period. Our subquery calculates the sum of the sold items for a given supplier within the given period. So even if we would build the VIEW so that it calculates this for ALL products from ALL suppliers we would still have the issue that we can't define the interval period after the VIEW has been executed.
A stored procedure.
Correct me if I'm wrong but as far as I know, MySQL only allows us to perform a Call on a stored procedure so we still can't run it against the parameters (period, supplier id...)
Even this workaround won't help us because we still can't run the SP against the parameters.
Using WITH at the beginning of the query
A common table expression in MySQL is a temporary result whose scope is confined to a single statement. You can refer to this expression multiple times within the statement.
The WITH clause in MySQL is used to specify a common table expression; a WITH clause can have one or more comma-separated subclauses.
Not sure if this would be the solution because we can't test it. WITH is not supported until MySQL version 8.0.
What now?
My last resort would be to put the mentioned subqueries in a temp table before starting the main query. This might not completely eliminate our problems but at least the main query will be more comprehensible and with less repetition of fetching the same data. Would this be our best option or have I overlooked a more efficient way to tackle this?
Many thanks for your kind replies.
SELECT
GREATEST((verkocht_sd/6*((100 + 0)/100)),0) as 'units sold p/month ',
GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad+reserved-backorder),0),0) as 'Order based on units sold',
SUM(b.aantal) as 'Units sold in period',
t4.verkocht_sd as 'Units sold in period, outliers removed',
COUNT(*) as 'Number of invoices in period',
b.art_code as 'Part number'
FROM bongegs b -- Table that has all the sales records for all products
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code) -- Right Join stock data to also include items that are not in table bongegs (no sales history).
LEFT JOIN artcred ON (artcred.art_code = b.art_code) -- add supplier ID to the part numbers.
LEFT JOIN
(
SELECT
SUM(b.aantal) as verkocht_sd,
b.art_code
FROM bongegs b
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code)
LEFT JOIN artcred ON (artcred.art_code = b.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f" -- Selects only invoices
and artcred.vln = 1 -- 1 = Prefered supplier
and artcred.cred_nr = 9117 -- Supplier ID
and b.aantal < (select * from (SELECT AVG(b.aantal)+3*STDDEV(aantal)
FROM bongegs b
WHERE
b.bon_soort = 'f' and
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)) x)
GROUP BY b.art_code
) AS t4
ON (b.art_code = t4.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f"
and artcred.vln = 1
and artcred.cred_nr = 9117
GROUP BY b.art_code
Bongegs | all rows from sales forms (invoices F, offers O, delivery notes V)
| art_code | bon_datum | bon_soort | aantal |
|:---------|:---------: |:---------:|:------:|
| item_1 | 2021-08-21 | f | 6 |
| item_2 | 2021-08-29 | v | 3 |
| item_6 | 2021-09-03 | o | 2 |
| item_4 | 2021-10-21 | f | 6 |
| item_1 | 2021-11-21 | o | 6 |
| item_3 | 2022-01-17 | v | 6 |
| item_1 | 2022-01-21 | o | 6 |
| item_4 | 2022-01-26 | f | 6 |
Artcred | supplier ID's
| art_code | vln | cred_nr |
|:---------|:----:|:-------:|
| item_1 | 1 | 1001 |
| item_2 | 1 | 1002 |
| item_3 | 1 | 1001 |
| item_4 | 1 | 1007 |
| item_5 | 1 | 1004 |
| item_5 | 2 | 1008 |
| item_6 | 1 | 1016 |
| item_7 | 1 | 1567 |
totvrd | stock
| art_code | voorraad | reserved | backorder |
|:---------|:---------: |:--------:|:---------:|
| item_1 | 1 | 0 | 5 |
| item_2 | 0 | 0 | 0 |
| item_3 | 88 | 0 | 0 |
| item_4 | 9 | 0 | 0 |
| item_5 | 67 | 2 | 20 |
| item_6 | 112 | 9 | 0 |
| item_7 | 65 | 0 | 0 |
| item_8 | 7 | 1 | 0 |
Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause, making it an INNER JOIN (requiring both left and right tables) in the result. Was this intended, or are you expecting records in the bongegs table that do NOT exist in artcred?
Well to be honest I was not fully aware that this would essentially form an INNER JOIN but in this case it doesn't really matter. A record that exists in bongegs always exists in artcred as well (every sold product must have a supplier). That doesn't work both ways since a product can be in artcred without ever being sold.
You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct?
Yes it is intended. Otherwise only products with actual sales in the period would end up in the result and we also wanted to include products with zero sales.
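To make the LEFT JOIN point above concrete, here is a small sketch using Python's built-in `sqlite3` as a stand-in for MySQL (the join semantics shown are standard SQL). The three-row tables are a cut-down, illustrative version of `totvrd` and `artcred`. A filter on the right-hand table in WHERE drops the unmatched rows, while the same filter in the ON clause keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE totvrd (art_code TEXT);
CREATE TABLE artcred (art_code TEXT, vln INTEGER, cred_nr INTEGER);
INSERT INTO totvrd VALUES ('item_1'), ('item_2'), ('item_8');
INSERT INTO artcred VALUES ('item_1', 1, 1001), ('item_2', 1, 1002);
""")

# Filter in WHERE: rows where a.cred_nr is NULL are discarded,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT v.art_code, a.cred_nr
    FROM totvrd v
    LEFT JOIN artcred a ON a.art_code = v.art_code
    WHERE a.cred_nr = 1001
    ORDER BY v.art_code
""").fetchall()

# Filter in ON: every totvrd row survives; non-matching rows get NULL.
filter_in_on = conn.execute("""
    SELECT v.art_code, a.cred_nr
    FROM totvrd v
    LEFT JOIN artcred a ON a.art_code = v.art_code AND a.cred_nr = 1001
    ORDER BY v.art_code
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

So if products without a matching supplier row must stay in the result, the supplier conditions belong in the ON clause.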
One simplification:
and b.aantal < ( SELECT * from ( SELECT AVG ...
-->
and b.aantal < ( SELECT AVG ...
A personal problem: my brain hurts when I see RIGHT JOIN; please rewrite as LEFT JOIN.
Check your RIGHTs and LEFTs -- they keep the other table's rows even if there is no match; are you expecting such NULLs? That is, it looks like they can all be plain JOINs (aka INNER JOINs).
These might help performance:
b: INDEX(bon_soort, bon_datum, aantal, art_code)
totvrd: INDEX(art_code)
artcred: INDEX(vln, cred_nr, art_code)
Is b what you keep needing? Build a temp table:
CREATE TEMPORARY TABLE tmp_b
SELECT ...
FROM b
WHERE ...;
But if you need to use tmp_b multiple times in the same query, (and since you are not yet on MySQL 8.0), you may need to make it a non-TEMPORARY table for long enough to run the query. (If you have multiple connections building the same permanent table, there will be trouble.)
Yes, 5.5.33 is rather antique; upgrade soon.
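Here is a rough sketch of the "aggregate once into a temp table, then reuse it" idea, with the period passed in as a parameter. It uses Python's built-in `sqlite3` purely so it can be run as-is; SQLite stands in for MySQL here, and the names `tmp_sales`, `build_tmp_sales` and `report` are made up. Note that SQLite lets a temporary table be referenced several times in one query, which old MySQL versions do not, as mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bongegs (art_code TEXT, bon_datum TEXT, bon_soort TEXT, aantal INTEGER);
CREATE TABLE totvrd (art_code TEXT, voorraad INTEGER, reserved INTEGER, backorder INTEGER);
INSERT INTO bongegs VALUES
  ('item_1', '2021-08-21', 'f', 6), ('item_2', '2021-08-29', 'v', 3),
  ('item_4', '2021-10-21', 'f', 6), ('item_4', '2022-01-26', 'f', 6);
INSERT INTO totvrd VALUES ('item_1', 1, 0, 5), ('item_2', 0, 0, 0), ('item_4', 9, 0, 0);
CREATE TEMPORARY TABLE tmp_sales (
  art_code TEXT PRIMARY KEY, units_sold INTEGER, invoices INTEGER);
""")


def build_tmp_sales(since):
    """Aggregate invoice rows ('f') newer than `since` once, per part number."""
    conn.execute("DELETE FROM tmp_sales")
    conn.execute("""
        INSERT INTO tmp_sales
        SELECT art_code, SUM(aantal), COUNT(*)
        FROM bongegs
        WHERE bon_soort = 'f' AND bon_datum > ?
        GROUP BY art_code
    """, (since,))


def report():
    """Main query: every stock row, with zero sales where nothing was invoiced."""
    return conn.execute("""
        SELECT v.art_code, COALESCE(s.units_sold, 0), COALESCE(s.invoices, 0)
        FROM totvrd v
        LEFT JOIN tmp_sales s ON s.art_code = v.art_code
        ORDER BY v.art_code
    """).fetchall()


build_tmp_sales("2021-08-01")
print(report())
```

The main query stays short because the period filter lives in one place. In MySQL the same shape would be a `CREATE TEMPORARY TABLE ... SELECT` followed by the main query, with the supplier ID handled the same way as the date.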
By putting together what I believe are all the pieces you had, I think this query simplifies things significantly. Let's start with the fact that you were trying to eliminate outliers by using the average plus three standard deviations as the cut-off. On top of that you had the original summation of all sales, also from the bongegs table.
To simplify this, I have ONE internal sub-query that does the summation, count, avg and stddev of all invoices (f) within the last 6 months. It also computes the divide-by-6 per-month figure you wanted at the top.
Since bongegs is now pre-aggregated ONCE and grouped per art_code, the work does not need to be repeated. You can use the totals directly at the top (at least I THINK the output is similar, without having the actual data or understanding your full context).
So the primary table is the stock table (totvrd), LEFT JOINed to the pre-query of bongegs. This gives you all products regardless of whether they have been sold.
Since the one aggregation pre-query has the avg and stddev in it, you can simply apply an additional AND clause when joining, based on the total sold being less than the avg/stddev cut-off.
The resulting query is below.
SELECT
-- appears you are looking for the highest percentage?
-- typically NOT a good idea to name columns starting with numbers,
-- but ok. Typically let interface/output name the columns to end-users
GREATEST((b.verkocht_sdperMonth * ((100 + 0)/100)),0) as 'units sold p/month',
-- appears to be the total sold divided by 6 to get monthly average over 6 months query of data
GREATEST( ROUND(
( (b.verkocht_sdperMonth * 3) - v.voorraad + v.reserved - v.backorder), 0), 0)
as 'Order based on units sold',
b.verkocht_sd as 'Units sold in period',
b.AvgStdDev as 'AvgStdDeviation',
b.NumInvoices as 'Number of invoices in period',
v.art_code as 'Part number'
FROM
    -- stock, master inventory, regardless of supplier
    -- get all products, even though not all may be sold
    totvrd v
        -- LEFT join to pre-query of bongegs pre-grouped by the art_code which appears
        -- to be basis of all other joins, std deviation and average while at it
        LEFT JOIN
        (select
                b.art_code,
                count(*) NumInvoices,
                sum( b.aantal ) verkocht_sd,
                sum( b.aantal ) / 6.0 verkocht_sdperMonth,
                avg( b.aantal ) AvgSale,
                AVG(b.aantal) + 3 * STDDEV( b.aantal) AvgStdDev
            from
                bongegs b
                    JOIN artcred ac
                        on b.art_code = ac.art_code
                        AND ac.vln = 1
                        and ac.cred_nr = 9117
            where
                -- only for invoices ('f') and within last 6 months
                    b.bon_soort = 'f'
                AND b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
            group by
                b.art_code ) b
            -- result is one entry per art_code, thus preventing any Cartesian product
            ON v.art_code = b.art_code
GROUP BY
    v.art_code
I have the following table in Mysql
Name Age Group
abel 7 A
joe 6 A
Rick 7 A
Diana 5 B
Billy 6 B
Pat 5 B
I want to randomize the rows, but they should still remain grouped by the Group column.
For example, I want my result to look something like this.
Name Age Group
joe 6 A
abel 7 A
Rick 7 A
Billy 6 B
Pat 5 B
Diana 5 B
What query should I use to get this result? The entire table should be randomised and then grouped by the "Group" column.
What you describe in your question as GROUPing is more correctly described as sorting. This is a particular issue when talking about SQL databases where "GROUP" means something quite different and determines the scope of aggregation operations.
Indeed "group" is a reserved word in SQL, so although mysql and some other SQL databases can work around this, it is a poor choice as an attribute name.
SELECT *
FROM yourtable
ORDER BY `group`
Using random values also has a lot of semantic confusion. A truly random number would have a different value every time it is retrieved - which would make any sorting impossible (and databases do a lot of sorting which is normally invisible to the user). As long as the implementation uses a finite time algorithm such as quicksort that shouldn't be a problem - but a bubble sort would never finish, and a merge sort could get very confused.
There are also degrees of randomness. There are different algorithms for generating random numbers. For encryption it's critical that the random numbers be evenly distributed and completely unpredictable - often these will use hardware events (sometimes even dedicated hardware) but I don't expect you would need that. But do you want the ordering to be repeatable across invocations?
SELECT *
FROM yourtable
ORDER BY `group`, RAND()
...will give different results each time.
OTOH
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(age, name, `group`))
...would give the results always sorted in the same order. While
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(CURDATE(), age, name, `group`))
...will give different results on different days.
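If you want to try the "sort by group, then randomly" idea without a MySQL server, here is a minimal sketch using Python's built-in `sqlite3`. SQLite is only a stand-in: it spells the function `RANDOM()` instead of `RAND()`, and the reserved column name is quoted with double quotes here. The helper name `shuffled_within_groups` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (name TEXT, age INTEGER, "group" TEXT);
INSERT INTO yourtable VALUES
  ('abel', 7, 'A'), ('joe', 6, 'A'), ('Rick', 7, 'A'),
  ('Diana', 5, 'B'), ('Billy', 6, 'B'), ('Pat', 5, 'B');
""")


def shuffled_within_groups():
    """Rows ordered by group first; the order inside each group is random."""
    return conn.execute(
        'SELECT name, age, "group" FROM yourtable ORDER BY "group", RANDOM()'
    ).fetchall()


for row in shuffled_within_groups():
    print(row)
```

All the A rows always come before the B rows; only the order inside each group changes from run to run.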
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(name VARCHAR(12) NOT NULL
,age INT NOT NULL
,my_group CHAR(1) NOT NULL
);
INSERT INTO my_table VALUES
('Abel',7,'A'),
('Joe',6,'A'),
('Rick',7,'A'),
('Diana',5,'B'),
('Billy',6,'B'),
('Pat',5,'B');
SELECT * FROM my_table ORDER BY my_group,RAND();
+-------+-----+----------+
| name | age | my_group |
+-------+-----+----------+
| Joe | 6 | A |
| Abel | 7 | A |
| Rick | 7 | A |
| Pat | 5 | B |
| Diana | 5 | B |
| Billy | 6 | B |
+-------+-----+----------+
Do the random ordering first, then sort by the Group column (it needs backticks because GROUP is a reserved word).
select Name, Age, `Group`
from (
    select *
    FROM yourtable
    order by RAND()
) t
order by `Group`
Try this:
SELECT * FROM yourtable ORDER BY `Group`, RAND()
I'd appreciate your help with an SQL problem.
I have some student quiz score data in an SQL table and I wish to write a query to extract the information that I want. Candidates can attempt the tests as many times as they wish. Ideally, for each candidate, I wish to find out their highest percentage score on each of the tests. And I wish to get an average percentage of their highest percentage score on each test. Many of the candidates will not have done all of the tests. For example, candidate 1's highest scores on tests 1, 2 and 3 are 50%, 100% and 0%, leaving an overall average of 50%.
The table is named resultsets. The relevant column titles names are: Candidate (this is the student ID number), QuizName (the title of each quiz), and PercentageScore. It looks like this:
Candidate | QuizName | PercentageScore
---------------------------------------
1 | Test1 | 25
1 | Test1 | 50
1 | Test2 | 100
1 | Test3 | 0
2 | Test1 | 50
2 | Test1 | 100
3 | Test3 | 75
I'm hoping to get a table that looks something like this:
Candidate | Test1 | Test 2 | Test 3 | AveragePercentageScore
---------------------------------------
1 | 50 | 100 | 0 | 50
2 | 50 | 100 | 0 | 50
3 | 0 | 0 | 75 | 25
(Thanks Jain) I'd like to know the SQL command that I should enter.
Thank you!
Aside from being a beginner, it would be good to get a handle on basic table / database structures, relationships, use of primary / foreign keys and especially data normalization.
As for learning queries, I have seen other people utilize SQL Zoo as it has sample data and covers samples of how to look for certain things that require different query, joins, left-joins, aggregates etc.
All that said, sometimes it makes things easier if you can understand queries based on YOUR data, not some generic sample database that you have no context on its application to your data.
With all that said, I will help you get started. You need aggregates (min, max, avg, count) that are typically applicable based on a "GROUP BY" column(s). In this first case, you want to find "for each candidate" (the group by), and each "QUIZ" for that CANDIDATE (also part of group by), you want the highest test.
SELECT
Q.candidate,
Q.quizname,
MAX( Q.PercentageScore ) as HighestScore
from
YourQuizTable Q
group by
Q.candidate,
Q.quizname
Will result in the following.
Candidate QuizName HighestScore
1 Test1 50
1 Test2 100
1 Test3 0 (a legit score on file)
2 Test1 100
3 Test3 75
From that, you could create a pivot. Different SQL engines have different pivot syntax, but to keep things clear for the specific quizzes posted, I will use a hard-coded pivot. Since the pivot is derived from it, the first query becomes the inner query of the pivot.
SELECT
    smry.Candidate,
    MAX( if( smry.quizname = 'Test1', smry.HighestScore, 0 ) ) as HiTest1,
    MAX( if( smry.quizname = 'Test2', smry.HighestScore, 0 ) ) as HiTest2,
    MAX( if( smry.quizname = 'Test3', smry.HighestScore, 0 ) ) as HiTest3,
    AVG( smry.HighestScore ) as AvgTest
from
    ( SELECT
            Q.candidate,
            Q.quizname,
            MAX( Q.PercentageScore ) as HighestScore
        from
            YourQuizTable Q
        group by
            Q.candidate,
            Q.quizname ) smry
group by
    smry.Candidate
The IF() is applied to each row as it is processed. Each row has exactly one quizname ("Test1", "Test2" or "Test3"); if it is the test for that column, its highest score is used, otherwise 0. Wrapping each IF() in MAX() collapses the rows of a candidate into one, so the score is kept no matter which row it came from. The last column is a simple average of the tests the candidate actually took; to count untaken tests as 0, divide SUM( smry.HighestScore ) by the number of tests instead.
The final group by now keeps one row per candidate, this time WITHOUT grouping by quiz as the inner query did.
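For anyone who wants to run the pivot against the sample rows from the question, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for MySQL. Two things are my assumptions: `CASE WHEN` replaces `IF()` (which SQLite does not accept in that form), and the average divides by a hard-coded 3 so that untaken tests count as 0, which is how I read the requested output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resultsets (Candidate INTEGER, QuizName TEXT, PercentageScore INTEGER);
INSERT INTO resultsets VALUES
  (1, 'Test1', 25), (1, 'Test1', 50), (1, 'Test2', 100), (1, 'Test3', 0),
  (2, 'Test1', 50), (2, 'Test1', 100), (3, 'Test3', 75);
""")

PIVOT_SQL = """
SELECT smry.Candidate,
       MAX(CASE WHEN smry.QuizName = 'Test1' THEN smry.HighestScore ELSE 0 END) AS HiTest1,
       MAX(CASE WHEN smry.QuizName = 'Test2' THEN smry.HighestScore ELSE 0 END) AS HiTest2,
       MAX(CASE WHEN smry.QuizName = 'Test3' THEN smry.HighestScore ELSE 0 END) AS HiTest3,
       SUM(smry.HighestScore) / 3.0 AS AvgTest  -- 3 = number of tests (hard-coded)
FROM (SELECT Candidate, QuizName, MAX(PercentageScore) AS HighestScore
      FROM resultsets
      GROUP BY Candidate, QuizName) smry
GROUP BY smry.Candidate
ORDER BY smry.Candidate
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

With this data candidate 1 averages 50 and candidate 3 averages 25. Candidate 2 only has attempts at Test1 in the sample rows, so the other two columns come out as 0.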
I asked a similar question earlier today, but I've run into another issue that I need assistance with.
I have a logging system that scans a server and catalogs every user that's online at that given moment. Here is how my table looks like:
-----------------
| ab_logs |
-----------------
| id |
| scan_id |
| found_user |
-----------------
id is an autoincrementing primary key. Has no real value other than that.
scan_id is an integer that is incremented after each successful scan of all users. It lets me separate results from different scans.
found_user. Stores which user was found online during the scan.
The above will generate a table that could look like this:
id | scan_id | found_user
----------------------------
1 | 1 | Nick
2 | 2 | Nick
3 | 2 | John
4 | 3 | John
So on the first scan the system found only Nick online. On the 2nd it found both Nick and John. On the 3rd only John was still online.
My problem is that I want to get the total amount of unique users connected to the server at the time of each scan. In other words, I want the aggregate number of users that have connected at each scan. Think counter.
From the example above, the result I want from the sql is:
1
2
2
EDIT:
This is what I have tried so far, but it's wrong:
SELECT COUNT(DISTINCT(found_user)) FROM ab_logs WHERE DATE(timestamp) = CURDATE() GROUP BY scan_id
What I tried returns this:
1
2
1
The code below should give you the results you are looking for
select s.scan_id, count(*) from
(select distinct
t.scan_id
,t1.found_user
from
tblScans t
inner join tblScans t1 on t.scan_id >= t1.scan_id) s
group by
s.scan_id;
Here is sqlFiddle
It assumes the names are unique and includes the current and every previous scan in the count.
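Here is a runnable sketch of the same "count everyone seen up to and including this scan" idea, written with a correlated subquery instead of the self-join. It uses Python's built-in `sqlite3` as a stand-in for MySQL and the `ab_logs` table and sample rows from the question; the column alias `users_so_far` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ab_logs (id INTEGER PRIMARY KEY, scan_id INTEGER, found_user TEXT);
INSERT INTO ab_logs (scan_id, found_user) VALUES
  (1, 'Nick'), (2, 'Nick'), (2, 'John'), (3, 'John');
""")

# For each scan, count the distinct users seen in that scan or any earlier one.
RUNNING_SQL = """
SELECT s.scan_id,
       (SELECT COUNT(DISTINCT p.found_user)
        FROM ab_logs p
        WHERE p.scan_id <= s.scan_id) AS users_so_far
FROM (SELECT DISTINCT scan_id FROM ab_logs) s
ORDER BY s.scan_id
"""

running = conn.execute(RUNNING_SQL).fetchall()
print(running)
```

This gives 1, 2, 2 for the three scans, matching the result asked for. On a large log table the subquery runs once per scan, so an index on scan_id would probably matter.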
Try with group by clause:
SELECT scan_id, count(*)
FROM mytable
GROUP BY scan_id
I want to return all rows that have a certain value in a column and have more than 5 instances in which a number is that certain value. For example, I would like to return all rows of the condition in which if the value in the column M has the number 1 in it and there are 5 or more instances of M having the number 1 in it, then it will return all rows with that condition.
select *
from tab
where M = 1
group by id -- id is the primary key of the table
having count(M) > 5;
EDIT: Here is my table:
id | M | price
--------+-------------+-------
1 | | 100
2 | 1 | 50
3 | 1 | 30
4 | 2 | 20
5 | 2 | 10
6 | 3 | 20
7 | 1 | 1
8 | 1 | 1
9 | 1 | 1
10 | 1 | 1
11 | 1 | 1
Originally I just want to insert into a trigger so that if the number of M = 1's is greater than 5, then I want to create an exception. The query I asked for would be inserted into the trigger. END EDIT.
But my table is always empty. Can anyone help me out? Thanks!
Try this :
select *
from tab
where M in (select M from tab where M = 1 group by M having count(id) > 5);
SQL Fiddle Demo
please try
select *,count(M) from table where M=1 group by id having count(M)>5
Since you group on your PK (which seems a futile exercise), you are counting per ID, which will indeed always return 1.
As i explain after this code, this query is NOT good, it is NOT the answer, and i also explain WHY. Please do not expect this query to run correctly!
select *
from tab
where M = 1
group by M
having count(*) > 5;
Like this, you group on what you are counting, which makes a lot more sense. At the same time, this will behave unexpectedly, as you are selecting all kinds of columns that are neither in the group by nor in any aggregate. I know MySQL is lenient about that, but I don't even want to know what it will produce.
Try indeed a subquery along these lines:
select *
from tab
where M in
(SELECT M
from tab
group by M
having count(*) > 5)
I've built a SQLFiddle demo (i used 'Test' as table name out of habit) accomplishing this (I don't have a mySQL at hand now to test it).
-- Made up a structure for testing
CREATE TABLE Test (
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
M int
);
SELECT id, M FROM Test
WHERE M IN (
SELECT M
FROM Test
WHERE M = 1
GROUP BY M
HAVING COUNT(M) > 5
)
The sub-query is a common "find the duplicates" kind of query, with the added condition of a specific value for the column M, also stating that there must be at least 5 dupes.
It will spit out a series of values of M which you can use to query the table against, ending with the rows you need.
You shouldn't use SELECT *; it's bad practice in general. Don't retrieve data you aren't actually using, and if you are using it, take the little time needed to type out the list of fields. You'll likely see faster queries, and the code will be far more readable.
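To check the subquery approach against the table posted in the question, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for MySQL. The helper name `rows_if_frequent` and its two parameters are made up. It keeps the strict "more than 5" comparison from the original query, so change `>` to `>=` if "5 or more" is what you need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (id INTEGER PRIMARY KEY, M INTEGER, price INTEGER);
INSERT INTO tab VALUES
  (1, NULL, 100), (2, 1, 50), (3, 1, 30), (4, 2, 20), (5, 2, 10), (6, 3, 20),
  (7, 1, 1), (8, 1, 1), (9, 1, 1), (10, 1, 1), (11, 1, 1);
""")


def rows_if_frequent(value, threshold):
    """Return all rows with M = value, but only if there are more than `threshold` of them."""
    return conn.execute("""
        SELECT id, M, price
        FROM tab
        WHERE M IN (SELECT M
                    FROM tab
                    WHERE M = ?
                    GROUP BY M
                    HAVING COUNT(*) > ?)
        ORDER BY id
    """, (value, threshold)).fetchall()


print([row[0] for row in rows_if_frequent(1, 5)])
```

M = 1 appears seven times in the sample data, so all seven rows come back; for M = 2 (two rows) the result is empty.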