MySQL: limit query or subquery (in left/inner join?) - mysql

Apologies in advance, I'm a novice in (My)SQL. This should be an easy question for expert DBAs, but I don't even know where to start looking for a solution. I'm not even sure I applied LEFT JOIN correctly below.
My (DB) structure is quite simple:
I have testsuites, and several testcases are linked to each testsuite ("logical entities")
During testcase kick-off, I'm creating an entry for each testsuite in the testsuiteinstance table - and one entry in testcaseinstance for each testcase.
My goal is to fetch the last 10 testcaseinstances of all testcases belonging to a certain testsuite
This is the query I use to fetch all testcaseinstances:
SELECT * FROM testcaseinstance AS tcinst
LEFT JOIN testsuiteinstance tsinst ON tsinst.id=tcinst.testsuiteinstance_id
LEFT JOIN testsuite ts ON ts.id=tsinst.testsuite_id
WHERE ts.id = 349 ORDER BY tcinst.id DESC;
So, let's say I have two testcases in a testsuite and both testcases were executed 100 times each. This query gives me 200 rows. If I put "LIMIT 10" at the end, I only get the last 10 rows overall (possibly all from one testcase), but I want 20 rows: the last 10 for each of the two testcases.
I'd appreciate some description beside the solution query or a pointer to a "tutorial" I can start looking at related to the topic (whatever would that be :D)
Thanks in advance!

Here's one approach; consider this (slightly contrived) example...
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
Let's say we want to return the top 3 even numbers and the top 3 odd numbers from this list. Ignoring for the moment that there's another, simpler, solution to this particular example we can instead do something like this...
SELECT x.*
, COUNT(*) `rank`
FROM ints x
JOIN ints y
ON MOD(y.i,2) = MOD(x.i,2)
AND y.i >= x.i
GROUP
BY x.i
ORDER
BY MOD(x.i,2) DESC
, x.i DESC;
+---+------+
| i | rank |
+---+------+
| 9 | 1 |
| 7 | 2 |
| 5 | 3 |
| 3 | 4 |
| 1 | 5 |
| 8 | 1 |
| 6 | 2 |
| 4 | 3 |
| 2 | 4 |
| 0 | 5 |
+---+------+
From here, the process of grabbing just the top 3 from each group becomes trivial...
SELECT x.*
, COUNT(*) `rank`
FROM ints x
JOIN ints y
ON MOD(y.i,2) = MOD(x.i,2)
AND y.i >= x.i
GROUP
BY x.i
HAVING `rank` <=3
ORDER
BY MOD(x.i,2),x.i DESC;
+---+------+
| i | rank |
+---+------+
| 8 | 1 |
| 6 | 2 |
| 4 | 3 |
| 9 | 1 |
| 7 | 2 |
| 5 | 3 |
+---+------+
...and this can be simplified to...
SELECT x.*
FROM ints x
JOIN ints y
ON MOD(y.i,2) = MOD(x.i,2)
AND y.i >= x.i
GROUP
BY x.i
HAVING COUNT(*) <=3;
+---+
| i |
+---+
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
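To try the self-join ranking without a MySQL server, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite is a stand-in here (it uses % instead of MOD), and the alias rnk is my own choice to avoid the reserved word. For the original question the grouping expression would be whatever column identifies the testcase in testcaseinstance, which the question does not show, so that part is left as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ints (i INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO ints VALUES (?)", [(n,) for n in range(10)])

# Each row x is paired with every row y of the same parity that is >= x,
# so COUNT(*) is the position of x within its parity group (1 = largest).
top3 = conn.execute(
    """
    SELECT x.i, COUNT(*) AS rnk
    FROM ints x
    JOIN ints y
      ON y.i % 2 = x.i % 2
     AND y.i >= x.i
    GROUP BY x.i
    HAVING COUNT(*) <= 3
    ORDER BY x.i % 2, x.i DESC
    """
).fetchall()

print(top3)
```

Note that the join compares every row with every other row in its group, so on large tables this can get slow; it is meant to illustrate the idea rather than to be the fastest option.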

Related

Get the newest record from MySQL from 2 tables more optimalized [duplicate]

I have some problems with query in SQL.
I have 2 tables.
people
+----+--------+------+
| id | name | val2 |
+----+--------+------+
| 1 | john | 12 |
| 2 | adam | 5 |
| 3 | alfred | 3 |
+----+--------+------+
data
+----+----+----+-----+---------------------+
| id | v1 | v2 | v3 | date |
+----+----+----+-----+---------------------+
| 1 | 4 | 15 | 18 | 2020-10-16 11:15:53 |
| 1 | 2 | 12 | 17 | 2020-10-16 11:22:53 |
| 1 | 3 | 13 | 16 | 2020-10-16 11:32:53 |
| 2 | 1 | 16 | 15 | 2020-10-16 13:22:53 |
| 2 | 3 | 13 | 25 | 2020-10-16 13:42:53 |
| 2 | 4 | 12 | 35 | 2020-10-16 14:12:53 |
| 3 | 1 | 21 | 12 | 2020-10-16 14:12:53 |
| 3 | 2 | 28 | 42 | 2020-10-16 15:12:53 |
| 3 | 4 | 30 | 72 | 2020-10-16 16:12:53 |
+----+----+----+-----+---------------------+
I need to get, in one result, id, name, v1, v2, v3 and date for the newest date of every object in the first table,
something like this:
RESULT
+----+--------+----+----+-----+---------------------+
| id | name | v1 | v2 | v3 | date |
+----+--------+----+----+-----+---------------------+
| 1 | john | 3 | 13 | 16 | 2020-10-16 11:32:53 |
| 2 | adam | 4 | 12 | 35 | 2020-10-16 14:12:53 |
| 3 | alfred | 4 | 30 | 72 | 2020-10-16 16:12:53 |
+----+--------+----+----+-----+---------------------+
I need the newest record from the second table for every person in the first table.
I tried to do it with this query:
SELECT people.id,
people.name,
data.v1,
data.v2,
data.v3,
max(data.date)
FROM people
JOIN data ON people.id = data.id
GROUP BY people.id
I got the newest date, but v1, v2, v3 are taken from arbitrary rows of the table.
You want entire rows from data, so aggregation is not an option here. In most databases, your query would fail, because the select and group by clauses are not consistent. But MySQL, somewhat unfortunately, gives developers enough rope to hang themselves with. Your query runs (if SQL mode ONLY_FULL_GROUP_BY is disabled), but is actually equivalent to:
SELECT people.id, people.name, ANY_VALUE(data.v1), ANY_VALUE(data.v2), ANY_VALUE(data.v3), MAX(data.date)
FROM people
JOIN data on people.id = data.id
GROUP BY people.id
Now it is plain to see that the database gives you any value of data rows that match the join condition - which may, or may not belong to the row that has the latest date.
Instead of grouping, you actually need to filter. One option uses a subquery:
select p.id, p.name, d.v1, d.v2, d.v3, d.date
from people p
inner join data d on d.id = p.id
where d.date = (select max(d1.date) from data d1 where d1.id = d.id)
The upside of this approach is that it works in all versions of MySQL, including pre-8.0, where window functions are not available.
One simple method uses window functions:
SELECT p.id, p.name, d.v1, d.v2, d.v3, d.date
FROM people p JOIN
(SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY d.date DESC) as seqnum
FROM data d
) d
ON p.id = d.id AND d.seqnum = 1;
Note: It seems strange that the join column in data would be id. I would expect it to be called something like people_id.
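As a quick check of the correlated-subquery filter, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module and runs that query. SQLite stands in for MySQL here, and the dates are stored as ISO-formatted text, which happens to sort correctly; both are assumptions of this sketch, not part of the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER, name TEXT, val2 INTEGER);
    CREATE TABLE data (id INTEGER, v1 INTEGER, v2 INTEGER, v3 INTEGER, date TEXT);
    INSERT INTO people VALUES (1, 'john', 12), (2, 'adam', 5), (3, 'alfred', 3);
    INSERT INTO data VALUES
      (1, 4, 15, 18, '2020-10-16 11:15:53'),
      (1, 2, 12, 17, '2020-10-16 11:22:53'),
      (1, 3, 13, 16, '2020-10-16 11:32:53'),
      (2, 1, 16, 15, '2020-10-16 13:22:53'),
      (2, 3, 13, 25, '2020-10-16 13:42:53'),
      (2, 4, 12, 35, '2020-10-16 14:12:53'),
      (3, 1, 21, 12, '2020-10-16 14:12:53'),
      (3, 2, 28, 42, '2020-10-16 15:12:53'),
      (3, 4, 30, 72, '2020-10-16 16:12:53');
    """
)

# Keep only the data row whose date equals the latest date for that id.
latest = conn.execute(
    """
    SELECT p.id, p.name, d.v1, d.v2, d.v3, d.date
    FROM people p
    JOIN data d ON d.id = p.id
    WHERE d.date = (SELECT MAX(d1.date) FROM data d1 WHERE d1.id = d.id)
    ORDER BY p.id
    """
).fetchall()

for row in latest:
    print(row)
```

If two rows for the same id share the latest date, this filter returns both of them, whereas the ROW_NUMBER() version keeps exactly one.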

How to find duplicate rows with SQL- GROUP BY

I've a table
+----+------------+
| id | day |
+----+------------+
| 1 | 2006-10-08 |
| 2 | 2006-10-08 |
| 3 | 2006-10-09 |
| 4 | 2006-10-09 |
| 5 | 2006-10-09 |
| 5 | 2006-10-09 |
| 6 | 2006-10-10 |
| 7 | 2006-10-10 |
| 8 | 2006-10-10 |
| 9 | 2006-10-10 |
+----+------------+
I want to group by frequency and count the dates at each frequency. For example:
The date 2006-10-08 appears twice, so it has frequency 2; it is the only date that appears twice, so the total number of dates with frequency 2 is 1.
Another example:
2006-10-10 and 2006-10-09 both appear 4 times, so they have frequency 4, and the total number of dates with frequency 4 is 2.
Following is the expected output.
+-----------+--------------------------------+
| Frequency | Total Dates with frequency N |
+-----------+--------------------------------+
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 2 |
+-----------+--------------------------------+
and so on till the maximum frequency.
What I've tried is the following:-
select day, count(*) from test GROUP BY day;
It returns the frequency of each date, ie
+------------+----------+
| day | count(*) |
+------------+----------+
| 2006-10-08 | 2 |
| 2006-10-09 | 4 |
| 2006-10-10 | 4 |
+------------+----------+
Please help with the above problem.
Just use your query as a subquery:
select freq, count(*)
from (select day, count(*) as freq
from test
group by day
) d
group by freq;
If you want to get the 0 values, then you have to work harder. A numbers table is handy (if you have one) or you can do:
select n.freq, count(d.day)
from (select 1 as freq union all select 2 union all select 3 union all select 4
) n left join
(select day, count(*) as freq
from test
group by day
) d
on n.freq = d.freq
group by n.freq;
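If you don't have a numbers table and don't want to hard-code the UNION ALL list, one option is to generate 1..max frequency with a recursive CTE. The sketch below is a variation on the query above, not the same query; it is run through Python's sqlite3 module against the sample rows (SQLite as a stand-in, and MySQL needs 8.0 or later for WITH RECURSIVE). The names per_day, n and k are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [
        (1, "2006-10-08"), (2, "2006-10-08"),
        (3, "2006-10-09"), (4, "2006-10-09"), (5, "2006-10-09"), (5, "2006-10-09"),
        (6, "2006-10-10"), (7, "2006-10-10"), (8, "2006-10-10"), (9, "2006-10-10"),
    ],
)

result = conn.execute(
    """
    WITH RECURSIVE
      -- how often each day occurs
      per_day(day, freq) AS (
        SELECT day, COUNT(*) FROM test GROUP BY day
      ),
      -- the numbers 1 .. highest frequency
      n(k) AS (
        SELECT 1
        UNION ALL
        SELECT k + 1 FROM n WHERE k < (SELECT MAX(freq) FROM per_day)
      )
    SELECT n.k AS freq, COUNT(d.day) AS total_dates
    FROM n
    LEFT JOIN per_day d ON d.freq = n.k
    GROUP BY n.k
    ORDER BY n.k
    """
).fetchall()

print(result)
```

COUNT(d.day) rather than COUNT(*) is what makes the unmatched frequencies come out as 0, because the LEFT JOIN fills d.day with NULL for them.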

max in one column and min in another column

For example, if Column A and Column B have values:
+---+---+
| A | B |
+---+---+
| 2 | 1 |
| 5 | 1 |
| 6 | 1 |
| 1 | 2 |
| 5 | 2 |
| 0 | 2 |
| 2 | 3 |
| 7 | 3 |
| 4 | 3 |
| 5 | 4 |
+---+---+
From each group of B, I want to get the highest number from A. However, I don't want to include results where the number in B is higher, yet has a smaller A value than the previous one. I know this doesn't make sense in words, but this is what I want the final result to look like:
+---+---+
| A | B |
+---+---+
| 6 | 1 |
| 7 | 3 |
+---+---+
So far I have something like "select max(a), b from table1 group by b" but this doesn't omit the ones where B is higher but the max A is smaller. I know that I could just peruse the results of that query in PHP and remove the ones where the A value is smaller than the previous A value, but I want to put it all in the mysql query if possible.
This technique joins the table against the aggregated version of itself, but the join is offset by one, so that every row is joined to the knowledge of the previous-B's MAX(A) value. It then matches rows where the current A is greater than any of those, and if it doesn't find any, it doesn't include the row. We then aggregate the final selection to get the results you are after.
SELECT
MAX(source_row.A) as A,
source_row.B
FROM ab as source_row
LEFT JOIN (SELECT MAX(A) as A, B FROM ab GROUP BY B) AS one_back
ON one_back.B = source_row.B-1
WHERE (one_back.A IS NULL)
OR one_back.A < source_row.A
GROUP BY B
I have tested this :-)
edit: extra insight
I wanted to share a little insight into how I come up with these kind of solutions; 'cause I think it's important for folks to start to "think in sets"... that's the best advice I ever read regarding JOINS, that you need to envision the intermediate "sets" that your query was working with. To illustrate this, here is a representation of the intermediate "set" that is the critical part of this query; it is the table as it exists "joined" to the aggregated version of itself off-by-one.
+------+------+------------+------------+
| A | B | one_back.B | one_back.A |
+------+------+------------+------------+
| 2 | 1 | NULL | NULL |
| 5 | 1 | NULL | NULL |
| 6 | 1 | NULL | NULL |
| 1 | 2 | 1 | 6 |
| 5 | 2 | 1 | 6 |
| 0 | 2 | 1 | 6 |
| 2 | 3 | 2 | 5 |
| 7 | 3 | 2 | 5 |
| 4 | 3 | 2 | 5 |
| 5 | 4 | 3 | 7 |
+------+------+------------+------------+
And then the set as it is actually created in-memory (the fully joined version is never fully in memory, since MySQL can eliminate rows as soon as it knows they are not going to "make the cut"):
+------+------+------------+------------+
| A | B | one_back.B | one_back.A |
+------+------+------------+------------+
| 2 | 1 | NULL | NULL |
| 5 | 1 | NULL | NULL |
| 6 | 1 | NULL | NULL |
| 7 | 3 | 2 | 5 |
+------+------+------------+------------+
And then, of course, it aggregates the results from there into the final form, selecting only the A and B from the original rows.
A simpler solution would be to use a variable to store the value of a from the previous row and make the comparison on each iteration. This also accounts for the case where you might have gaps in the b column, where numbers aren't exactly in perfect sequential order:
SELECT @val:=a.a AS a, a.b
FROM
(
SELECT MAX(a) AS a, b
FROM tbl
GROUP BY b
) a
WHERE a.a > IFNULL(@val,-1)
SELECT Z.a, Z.b FROM
(SELECT a, b, RANK() OVER (ORDER BY b) AS ranker FROM (SELECT MAX(a) a, b FROM table1 GROUP BY b) Y) Z LEFT JOIN
(SELECT a, b, RANK() OVER (ORDER BY b) AS ranker FROM (SELECT MAX(a) a, b FROM table1 GROUP BY b) Y1) Z1
ON Z.ranker = Z1.ranker + 1
WHERE Z.a > COALESCE(Z1.a, -100000)
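Where window functions are available, the comparison with the previous group's maximum can also be sketched with LAG(), which copes with gaps in B without any B-1 arithmetic. This is an alternative formulation, not one of the queries above. It is run here through Python's sqlite3 module (SQLite 3.25+ is assumed for window functions; MySQL would need 8.0+), and the names per_group, with_prev, max_a and prev_max are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ab (A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO ab VALUES (?, ?)",
    [(2, 1), (5, 1), (6, 1), (1, 2), (5, 2), (0, 2), (2, 3), (7, 3), (4, 3), (5, 4)],
)

rows = conn.execute(
    """
    WITH per_group AS (
      -- highest A within each B
      SELECT B AS b, MAX(A) AS max_a FROM ab GROUP BY B
    ),
    with_prev AS (
      -- attach the maximum of the previous B (NULL for the first group)
      SELECT b, max_a, LAG(max_a) OVER (ORDER BY b) AS prev_max
      FROM per_group
    )
    SELECT max_a AS a, b
    FROM with_prev
    WHERE prev_max IS NULL OR max_a > prev_max
    ORDER BY b
    """
).fetchall()

print(rows)
```

Like the offset-by-one join, this compares each group only with the group immediately before it; the user-variable version compares against the last row that was kept, which is a slightly different rule even though both agree on the sample data.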

mysql select sum of rows by comparing two relations

I have data from tests with two lists of parts, called in and out. I need to select SUM of test values for each part after the last test where the part went in but didn't come out.
IN LIST OUT LIST TEST
+--------+-----------+ +--------+------------+ +------+-------+
| testid | in_partid | | testid | out_partid | | test | value |
+--------+-----------+ +--------+------------+ +------+-------+
| 1 | 10 | | 1 | 10 | | 1 | 1 |
| 1 | 20 | | 1 | 20 | | 2 | 10 |
| 2 | 10 | | 2 | 10 | | 3 | 100 |
| 2 | 20 | | | | | | |
| 3 | 10 | | 3 | 10 | | | |
| 3 | 20 | | 3 | 20 | | | |
+--------+-----------+ +--------+------------+ +------+-------+
SUM is pretty straightforward, but can I limit it to those rows where testid is greater than testid for the last inspection where part went in but not out?
In this example, part 10 should SUM all three test values, because it's included in all lists, but part 20 should only return value for test 3, as in test 2 it was not included in both in and out lists.
partid sum(value)
10 111
20 100
Can I do this with MySQL, or do I need to include PHP in the mix?
I think your sample output is incorrect from your logic. I think partid 20 should return 101 as it is present in both lists for both tests 1 and 3. Assuming I'm right in that, this query should return the desired results
SELECT in_partid,SUM(value)
FROM (
SELECT DISTINCT in_partid,inl.testid
FROM in_list inl
INNER JOIN out_list outl ON in_partid=out_partid AND inl.testid=outl.testid
) as tests_passed
INNER JOIN tests ON tests_passed.testid=test
GROUP BY in_partid
EDIT: based on OP's comment my assumption above was wrong and was actually a requirement. Accordingly here is a query that I think fulfils the requirements:
SELECT tests_passed.in_partid,SUM(value)
FROM (
SELECT DISTINCT inl.in_partid,IFNULL(last_failed_test,0) as last_failed_test
FROM in_list inl LEFT JOIN (
SELECT in_partid,MAX(inl.testid) as last_failed_test
FROM in_list inl
LEFT JOIN out_list outl ON in_partid=out_partid AND inl.testid=outl.testid
WHERE outl.testid IS NULL
GROUP BY in_partid
) AS last_passed
ON inl.in_partid=last_passed.in_partid
) as tests_passed
INNER JOIN tests ON tests_passed.last_failed_test<test
GROUP BY tests_passed.in_partid
This returns the sample results given above for the sample data supplied.
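For anyone who wants to reproduce this without a MySQL server, here is a sketch using Python's sqlite3 module and the sample data from the question. The query is a condensed variant of the same idea (find the last test per part where it went in but not out, then sum the later tests), not a copy of the query above; the alias parts and the use of COALESCE with a CASE inside MAX are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE in_list (testid INTEGER, in_partid INTEGER);
    CREATE TABLE out_list (testid INTEGER, out_partid INTEGER);
    CREATE TABLE tests (test INTEGER, value INTEGER);
    INSERT INTO in_list VALUES (1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (3, 20);
    INSERT INTO out_list VALUES (1, 10), (1, 20), (2, 10), (3, 10), (3, 20);
    INSERT INTO tests VALUES (1, 1), (2, 10), (3, 100);
    """
)

sums = conn.execute(
    """
    SELECT parts.in_partid, SUM(t.value)
    FROM (
      -- last test where the part went in but has no matching out row;
      -- 0 if that never happened
      SELECT inl.in_partid,
             COALESCE(MAX(CASE WHEN outl.testid IS NULL THEN inl.testid END), 0)
               AS last_failed_test
      FROM in_list inl
      LEFT JOIN out_list outl
        ON outl.out_partid = inl.in_partid
       AND outl.testid = inl.testid
      GROUP BY inl.in_partid
    ) parts
    JOIN tests t ON t.test > parts.last_failed_test
    GROUP BY parts.in_partid
    ORDER BY parts.in_partid
    """
).fetchall()

print(sums)
```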

Subqueries in MySQL creating duplicate results

I am having a bit of trouble with my query.
As you can see i am running two queries. This all looks very well and mysql takes it like a man. But the result that i get is 5 times the same stuff.
SELECT s.category_id, p.product_id
FROM (
SELECT ros_categories.category_id
FROM ros_categories, ros_variantIndex
WHERE ros_categories.name = ros_variantIndex.variantText
AND ros_categories.group = 'Sizes'
LIMIT 0 , 5
) s, (
SELECT ros_product.product_id
FROM ros_product, ros_variantIndex
WHERE ros_product.vart = ros_variantIndex.vart
LIMIT 0 , 5
) p
Output:
+-------------+------------+
| category_id | product_id |
+-------------+------------+
| 110 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 110 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 110 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 110 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 110 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
+-------------+------------+
25 rows in set (0.01 sec)
What is going on here? Is this my problem or is mysql being strange?
EDIT:
Thanks for explaining me what the problem was. I fixed it using several joins. So thanks for pointing out my error and naming the problem :-) And sorry bout the silly question
What is going on here? Is this my problem or is mysql being strange?
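What you've done is create a Cartesian product, also known as a cross join: every row of s is paired with every row of p, so 5 x 5 = 25 rows. Typically you would join s and p on some column to get what you want, but the join criteria aren't clear from your question.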
Perhaps you want this (guessing at columns on your tables)
SELECT s.category_id, p.product_id
FROM (
SELECT ros_categories.category_id
FROM ros_categories, ros_variantIndex
WHERE ros_categories.name = ros_variantIndex.variantText
AND ros_categories.group = 'Sizes'
LIMIT 0 , 5
)s
INNER JOIN (
SELECT ros_product.product_id, ros_product.category_id
FROM ros_product, ros_variantIndex
WHERE ros_product.vart = ros_variantIndex.vart
LIMIT 0 , 5
)p
on s.category_id = p.category_id
It is just a cross product of two temporary tables representing the respective result sets of the subqueries.
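The effect is easy to reproduce on a small scale. The sketch below uses Python's sqlite3 module with two made-up five-row tables named s and p, standing in for the two subquery results; the category_id column on p and all the values are hypothetical, since the real table layout isn't shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- stand-ins for the two subquery results (hypothetical data)
    CREATE TABLE s (category_id INTEGER);
    CREATE TABLE p (product_id INTEGER, category_id INTEGER);
    INSERT INTO s VALUES (110), (7), (8), (9), (10);
    INSERT INTO p VALUES (1, 110), (2, 7), (3, 8), (4, 9), (5, 10);
    """
)

# Comma join with no condition: every s row is paired with every p row.
cross_count = conn.execute("SELECT COUNT(*) FROM s, p").fetchone()[0]

# With a join condition, only the matching pairs survive.
joined = conn.execute(
    """
    SELECT s.category_id, p.product_id
    FROM s
    INNER JOIN p ON p.category_id = s.category_id
    ORDER BY p.product_id
    """
).fetchall()

print(cross_count)
print(joined)
```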