In query with joins and multi-table/field ORDER BY, how to set LIMIT offset to start from a particular row identified by a unique id field? - mysql

Suppose I have four tables: tbl1 ... tbl4. Each has a unique numerical id field. tbl1, tbl2 and tbl3 each has a foreign key field for the next table in the sequence. E.g. tbl1 has a tbl2_id foreign key field, and so on. Each table also has a field order (and other fields not relevant to the question).
It is straightforward to join all four tables to return all rows of tbl1 together with the corresponding fields from the other three tables. It is also easy to order this result set by a specific ORDER BY combination of the order fields. It is also easy to return just the row that corresponds to some particular id in tbl1, e.g. WHERE tbl1.id = 7777.
QUESTION: what query most efficiently returns (e.g.) 100 rows, starting from the row corresponding to id=7777, in the order determined by the specific combination of order fields?
One approach would be to use ROW_NUMBER (or an emulation of it in MySQL versions < 8) to get the position of the id=7777 row, and then use that position in a new version of the same query to set the offset in the LIMIT clause (with a read lock in between). But can it be done in a single query?
# FIRST QUERY: get row number of result row where tbl1.id = 7777
SELECT x.row_number
FROM
(SELECT @row_number:=@row_number+1 AS row_number, tbl1.id AS id
FROM (SELECT @row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <some conditions>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
WHERE id=7777;
Store the row number from the above query, subtract 1 from it (the row numbers start at 1, while LIMIT offsets start at 0), and bind the result to :offset in the following query.
# SECOND QUERY : Get 100 rows starting from the one with id=7777
SELECT x.field1, x.field2, <etc.>
FROM
(SELECT @row_number:=@row_number+1 AS row_number, field1, field2
FROM (SELECT @row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <same conditions as before>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
LIMIT :offset, 100;

Clarify question
In the general case, you won't ask for WHERE id1 > 7777. Instead, you have a tuple of (11,22,33,44) and you want to "continue where you left off".
Two discussions, with
That is messy, but not impossible. See Iterating through a compound key. It gives an example of doing it with 2 columns; 4 columns coming from 4 tables is an extension of that.
A variation
Here is another discussion of such: https://dba.stackexchange.com/questions/164428/should-i-store-data-pre-ordered-rather-than-ordering-on-the-fly/164755#164755
In actually implementing this, I have found that letting the "100" (LIMIT) be flexible can be easier to think through. The idea is: reach forward 100 rows (with LIMIT 100,1). Let's say you get (111,222,333,444). If you are currently at (111, ...), then deal with id2/3/4. If you are at, say, (113, ...), then do WHERE id1 < 113 and leave off any specification of id2/3/4. This means fetching fewer than 100 rows, but it lands you just shy of starting id1=113.
That is, it involves constructing a WHERE clause with between 1 and 4 conditions.
In all cases, your query says ORDER BY id1, id2, id3, id4. And the only use for LIMIT is in the probe to figure out how far ahead the 100th row is (with LIMIT 100,1).
I think I can dig out some old Perl code for that.
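As a rough single-query sketch of the "continue where you left off" idea applied to the four tables in the question: the derived table start_row looks up the sort key of the row with id 7777, and a row-constructor comparison keeps only rows at or after it. This assumes the four order columns are NOT NULL and that tbl1.id is an acceptable final tie-breaker. The aliases (t1..t4, s1..s4, o1..o4, start_row) are made up for illustration, and any extra filter conditions from the original query would still need to be added.

```sql
-- Sketch only: assumes NOT NULL order columns and tbl1.id as tie-breaker.
SELECT t1.id,
       t4.`order` AS o4, t3.`order` AS o3, t2.`order` AS o2, t1.`order` AS o1
FROM tbl1 AS t1
INNER JOIN tbl2 AS t2 ON t2.id = t1.tbl2_id
INNER JOIN tbl3 AS t3 ON t3.id = t2.tbl3_id
INNER JOIN tbl4 AS t4 ON t4.id = t3.tbl4_id
CROSS JOIN (
    -- Sort key of the starting row (id 7777 is the example from the question).
    SELECT s4.`order` AS o4, s3.`order` AS o3, s2.`order` AS o2,
           s1.`order` AS o1, s1.id
    FROM tbl1 AS s1
    INNER JOIN tbl2 AS s2 ON s2.id = s1.tbl2_id
    INNER JOIN tbl3 AS s3 ON s3.id = s2.tbl3_id
    INNER JOIN tbl4 AS s4 ON s4.id = s3.tbl4_id
    WHERE s1.id = 7777
) AS start_row
-- Keep rows whose sort key is at or after the starting row's sort key.
WHERE (t4.`order`, t3.`order`, t2.`order`, t1.`order`, t1.id)
   >= (start_row.o4, start_row.o3, start_row.o2, start_row.o1, start_row.id)
ORDER BY t4.`order`, t3.`order`, t2.`order`, t1.`order`, t1.id
LIMIT 100;
```

Because the sort columns live in four different tables, no single index can cover the comparison, so whether this beats the two-query approach is something to measure rather than assume.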

Related

Deleting duplicate rows on MySQL, getting a max row error

I am deleting duplicate rows on MySQL and only leaving behind the old row (least id) but I am getting a max row error
DELETE n1
FROM item_audit n1, item_audit n2
WHERE n1.id > n2.id AND n1.description = n2.description
Keep in mind, with that join condition you are joining each row to every row before it (with the same description). This is one of those cases where a subquery will be much more effective than a join.
DELETE a
FROM item_audit a
WHERE (a.id, a.description) NOT IN
(SELECT * FROM
(
SELECT MIN(id), description
FROM item_audit
GROUP BY description
) AS realSubQ
)
Actually, assuming id is unique, it can be even simpler:
DELETE a
FROM item_audit a
WHERE a.id NOT IN
(SELECT * FROM
( SELECT MIN(id)
FROM item_audit
GROUP BY description
) AS realSubQ
)
As you discovered, MySQL needs to be "tricked" into being able to use the delete target in a subquery with the extra select * wrapper.
Alternatively, a join on the subquery could be used to reduce the size of the intermediate result set created behind the scenes.
DELETE a
FROM item_audit a
LEFT JOIN (SELECT MIN(id) AS firstId FROM item_audit GROUP BY description) AS aFirst
ON a.id = aFirst.firstId
WHERE aFirst.firstId IS NULL
;
If that fails, you can insert the first ids into a temporary table and run the delete against that table instead.
CREATE TEMPORARY TABLE `old_ids`
SELECT MIN(ID) AS id
FROM item_audit
GROUP BY description;
DELETE a
FROM item_audit a
LEFT JOIN old_ids ON a.id = old_ids.id
WHERE old_ids.id IS NULL
;
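An incremental delete can be done by placing a LIMIT clause last, but MySQL accepts LIMIT only in the single-table DELETE form (DELETE FROM item_audit WHERE ...), not in the multi-table DELETE a FROM ... forms shown above, so the statement has to be written in the single-table form for that. The temporary-table version has the benefit that the subquery does not need to be re-evaluated after every incremental delete (and the temporary table can be indexed to speed things up as well).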
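An incremental delete along these lines might look like the following sketch. It assumes id is a unique, non-NULL key of item_audit. The helper table name keep_ids and the batch size of 10000 are arbitrary choices for illustration. It uses the single-table DELETE form, since MySQL does not allow ORDER BY or LIMIT in a multi-table DELETE.

```sql
-- Hypothetical helper table holding the id to keep for each description.
-- UNIQUE (id) indexes the column produced by the SELECT.
CREATE TEMPORARY TABLE keep_ids (UNIQUE (id))
SELECT MIN(id) AS id
FROM item_audit
GROUP BY description;

-- Single-table DELETE, so ORDER BY and LIMIT are permitted.
-- Run repeatedly until the statement reports 0 affected rows.
DELETE FROM item_audit
WHERE id NOT IN (SELECT id FROM keep_ids)
ORDER BY id
LIMIT 10000;
```

Each batch only has to consult the small indexed helper table, and the batches can be spaced out to limit lock time.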

Performance in mysql joins with subquery and limit

In a join operation between two subqueries (or a table and a subquery), is it preferred to specify the LIMIT clause in an inner query rather than on the outer query (since the order would determine the amount of rows the DBMS would have to iterate to check the where clause)? like:
((
SELECT id
FROM Table1
WHERE score>=100
LIMIT 10)
AS A NATURAL JOIN Table2
))
would be better than
((
SELECT id
FROM Table1
WHERE score>=100)
AS A NATURAL JOIN Table2
))
LIMIT 10
My thinking is that in the last query, the DBMS first has to iterate (over the full table or an index) through ALL rows in Table1 where score>=100 that can be mapped to Table2 on their common columns (which could be any number of rows), and only after that will it truncate the result to 10 rows, whereas in the first query it will only scan until it has found 10 rows from Table1 that satisfy the where clause, then stop.
The 2 partial statements are not equivalent. When using LIMIT, order matters. If you place the limit on Table1, you might never see rows that you would otherwise have seen with the limit placed on the whole dataset. Given that disclaimer, it seems like applying the limit first and then joining would be more efficient, but as a rule of thumb you should always measure.
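To make the difference concrete, here is a sketch of the two forms written as complete statements. It assumes Table1 and Table2 share the id column that NATURAL JOIN matches on. The ORDER BY id clauses are an addition, there only to make the LIMIT deterministic.

```sql
-- Limit inside: pick 10 qualifying Table1 rows first, then join.
-- May return fewer than 10 rows (no match in Table2) or more than 10
-- (several matches per id).
SELECT *
FROM (
    SELECT id
    FROM Table1
    WHERE score >= 100
    ORDER BY id
    LIMIT 10
) AS A
NATURAL JOIN Table2;

-- Limit outside: join every qualifying row, then keep the first 10
-- rows of the joined result.
SELECT *
FROM (
    SELECT id
    FROM Table1
    WHERE score >= 100
) AS A
NATURAL JOIN Table2
ORDER BY id
LIMIT 10;
```

Prefixing each statement with EXPLAIN is one way to compare how MySQL plans them on real data.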
Also consider that instead of joining the SELECT as table, for which MySQL will have to build an internal temporary table, you could join the table itself, i.e.:
SELECT t0.col0, t1.col1
FROM
Table0 t0
JOIN Table1 t1 ON (t0.col0 = t1.col0 AND t1.score >= 100)
which might be even more efficient if you have good indexes and end up using them. But again, you should measure.

Join tables to query for a random row

I have two tables, one contains data created by an individual user and that data is assigned a unique primary key. My second table contains ratings of the aforementioned data and contains the user that created the rating, the actual rating, and the ID of the data being rated.
Table one, [data][user][data_id]
Table two, [rating][user][data_id]
I want a query that gives me a random row from table one, on the condition that table two contains no ratings of the row being returned.
I think this will require a join, but I keep getting stuck. Is this possible with a single query?
I think this should do it:
SELECT * FROM tableOne
WHERE (SELECT COUNT(*) FROM tableTwo WHERE tableTwo.data_id = tableOne.data_id) = 0
ORDER BY RAND()
LIMIT 1
This should do it. It may be a little faster than doing the COUNT method, depending on your tables:
SELECT *
FROM tableOne t1
LEFT JOIN tableTwo t2
ON t2.user= t1.user
AND t2.data_id = t1.data_id
WHERE t2.user IS NULL
ORDER BY RAND()
LIMIT 1
This assumes that your user column is not nullable.
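Another common way to write this kind of anti-join is with NOT EXISTS. The sketch below matches on data_id alone, on the assumption that a rating by any user should exclude the row. If only ratings by the row's own user matter, the user comparison from the LEFT JOIN version would need to be added.

```sql
-- Random row from tableOne that has no rating at all in tableTwo.
SELECT t1.*
FROM tableOne AS t1
WHERE NOT EXISTS (
    SELECT 1
    FROM tableTwo AS t2
    WHERE t2.data_id = t1.data_id
)
ORDER BY RAND()
LIMIT 1;
```

ORDER BY RAND() still sorts every unrated row, so this is only reasonable while that set stays fairly small.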

How To Get The Number Interval Between Rows via MySQL

We have a table which has two columns: ID and Value. The ID is the index of the table row, and the Value consists of a fixed string followed by a key (a hexadecimal number), stored together as a string in the database. Take 00001810010 as an example: the fixed string is 0000181 and the second part is the key, 0010.
Table
ID Value
0 00001810000
1 00001810010
2 00001810500
3 00001810900
4 0000181090a
What I want to get from the above table is the Number Interval between rows, for above table the result is
[1, F], [11, 4FF], [501, 8FF], [901, 909]
I can read all the records into memory and handle them via C++, but is it possible to implement it through MySQL statements only? How?
I would be tempted to match up a row with the previous row with something like this:-
SELECT sub1.id AS this_row_id,
sub1.value AS this_row_value,
z.id AS prev_row_id,
z.value AS prev_row_value
FROM
(
SELECT a.id, a.value, MAX(b.id) AS bid
FROM some_table a
INNER JOIN some_table b
ON a.id > b.id
GROUP BY a.id, a.value
) sub1
INNER JOIN some_table z
ON z.id = sub1.bid
You might want to use LEFT OUTER JOINs rather than INNER JOINs depending on what you want for the first record (where there is no previous record to match on).
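Building on that previous-row pairing, the gaps themselves could be computed roughly as follows. This is a sketch that assumes the key is always the last 4 characters of value and that keys increase with id. CONV turns the hexadecimal key into a decimal string, and HEX turns the interval bounds back into hexadecimal.

```sql
SELECT HEX(CAST(CONV(RIGHT(z.value, 4), 16, 10) AS SIGNED) + 1)    AS gap_start,
       HEX(CAST(CONV(RIGHT(sub1.value, 4), 16, 10) AS SIGNED) - 1) AS gap_end
FROM
(
    -- Pair each row with the id of the row just before it.
    SELECT a.id, a.value, MAX(b.id) AS bid
    FROM some_table a
    INNER JOIN some_table b
        ON a.id > b.id
    GROUP BY a.id, a.value
) sub1
INNER JOIN some_table z
    ON z.id = sub1.bid
-- Only report pairs that actually have unused keys between them.
WHERE CAST(CONV(RIGHT(sub1.value, 4), 16, 10) AS SIGNED)
    - CAST(CONV(RIGHT(z.value, 4), 16, 10) AS SIGNED) > 1
ORDER BY sub1.id;
```

Consecutive keys (a difference of exactly 1) produce no row, since there is nothing between them.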

Is there any way to mix(change order) rows in mysql table?

I had three tables which are not identical to each other. Because of one of my requirements, I had to copy all the records from these tables into another table.
That part is fine. My problem is that the records I inserted are now in order.
Like
first 100 records from table1
second 100 records from table2
third 100 records from table3
What I want to do is mix up the record positions, so that if I select the first 100 records there are records from all three tables.
Selecting the data with ORDER BY RAND() is not what I want. I just need to select the data and display it.
Is there any way I can solve this? Thanks.
A great post handling several cases, from simple, to gaps, to non-uniform with gaps.
http://jan.kneschke.de/projects/mysql/order-by-rand/
For most general case, here's how you do it:
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1
This supposes that the ids are evenly distributed, and that there can be gaps in the id list. See the article for more advanced examples.
If you don't want to query later on with rand() you could create the table by inserting from a union select ordered by rand() in the first place:
INSERT INTO merged (a, b)
SELECT a, b FROM (
SELECT a, b, rand() AS r FROM t1
UNION ALL
SELECT a, b, rand() AS r FROM t2
) AS u ORDER BY r
However, also consider this post I just came across: INSERT INTO SELECT strange order using UNION, perhaps someone can comment.
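Since the order in which rows come back from a plain SELECT is not guaranteed to match insertion order, one alternative is to store the random value in the table and sort by it explicitly. This is only a sketch. The column name shuffle_key and the index name idx_shuffle are made up for illustration, and merged (a, b) is the table from the example above.

```sql
-- Add a persistent random sort key and index it.
ALTER TABLE merged
    ADD COLUMN shuffle_key DOUBLE,
    ADD INDEX idx_shuffle (shuffle_key);

-- Assign each row a random position once.
UPDATE merged SET shuffle_key = RAND();

-- Later reads get a stable, mixed order without calling RAND() again.
SELECT a, b
FROM merged
ORDER BY shuffle_key
LIMIT 100;
```

The shuffle is computed once, so repeated queries see the same mixed order until the UPDATE is run again.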