Efficient way to merge-sort data from multiple tables - mysql

I have a series of tables that contain data of similar format. I.e. a UNION would work.
Conceptually you can think of it as 1 table partitioned into multiple tables.
I want to get the data from all of these tables sorted.
The problem is that there is too much data to display to the user at once, so I need to display it in portions, i.e. pages, and each page must come from the globally sorted result.
So if I do something like:
SELECT * FROM TABLE_1
UNION
SELECT * FROM TABLE_2
UNION
....
SELECT * FROM TABLE_N
ORDER BY COL
LIMIT OFFSET, RECORDS;
I would be repeating the full UNION and ORDER BY on every request just to get, e.g., the 50 records of the requested page.
So how would I most efficiently handle this?

My first attempt would be UNION'ing just a small number of records from each table:
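( SELECT * FROM table_1 ORDER BY col LIMIT #offset_plus_records )
UNION
...
( SELECT * FROM table_N ORDER BY col LIMIT #offset_plus_records )
ORDER BY col LIMIT #offset, #records
Here #offset_plus_records stands for #offset + #records, computed by the application. All rows of the requested page could come from a single table, so each branch has to return its first #offset + #records rows rather than skip #offset rows itself.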
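The per-table LIMIT idea above can be tried out with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, with made-up integer data in three tables; the function name fetch_page is hypothetical. It uses UNION ALL rather than UNION, assuming the partitions hold no duplicate rows. SQLite does not accept parenthesized branches in a compound SELECT, so each limited branch is wrapped in a derived table instead.

```python
import sqlite3

def fetch_page(conn, tables, offset, records):
    """Return one page of `col` values merged across `tables`, sorted ascending."""
    # Every row of the page could live in one table, so each branch must
    # return its first offset + records rows.
    top_n = offset + records
    # Table names are interpolated directly: only pass trusted names here.
    branches = " UNION ALL ".join(
        f"SELECT * FROM (SELECT col FROM {t} ORDER BY col LIMIT {top_n})"
        for t in tables
    )
    sql = f"SELECT col FROM ({branches}) ORDER BY col LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, (records, offset))]

conn = sqlite3.connect(":memory:")
tables = ["table_1", "table_2", "table_3"]
for i, name in enumerate(tables):
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, col INTEGER)")
    # Made-up data: the values 0..59 spread round-robin over the three tables.
    conn.executemany(
        f"INSERT INTO {name} (col) VALUES (?)", [(v,) for v in range(i, 60, 3)]
    )
    conn.execute(f"CREATE INDEX idx_{name}_col ON {name} (col)")

page = fetch_page(conn, tables, offset=10, records=5)
print(page)
```

With an index on col, each branch reads at most offset + records rows, so the cost grows with the page number rather than with the table size. Deep pages still get expensive.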
If the above proves insufficient, I would build a manual index table (based on David Starkey's clever suggestion).
CREATE TABLE index_table (
table_id INT,
item_id INT,
col DATETIME,
INDEX (col, table_id, item_id)
);
Then populate index_table with a method of your liking (cron job, triggers on tables table_n, ...). Your SELECT statement would then look like this:
SELECT *
FROM ( SELECT * FROM index_table ORDER BY col LIMIT #offset, #records ) AS idx
LEFT JOIN table_1 ON (idx.table_id = 1 AND idx.item_id = table_1.id)
...
LEFT JOIN table_n ON (idx.table_id = n AND idx.item_id = table_n.id)
ORDER BY idx.col
However, I am not sure how such a query would perform with so many LEFT JOINs. It really depends on how many table_n tables there are.
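As a rough illustration of the index-table approach, here is a sketch using Python's sqlite3 module in place of MySQL. The two source tables, the payload column, the sample dates, and the fetch_page helper are all assumptions made for the example. The index table is filled once by hand; in practice it would be kept up to date by triggers or a cron job, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col TEXT, payload TEXT);
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, col TEXT, payload TEXT);
CREATE TABLE index_table (table_id INT, item_id INT, col TEXT);
CREATE INDEX idx_col ON index_table (col, table_id, item_id);

INSERT INTO table_1 (id, col, payload) VALUES (1, '2024-01-01', 'a'), (2, '2024-01-03', 'c');
INSERT INTO table_2 (id, col, payload) VALUES (1, '2024-01-02', 'b'), (2, '2024-01-04', 'd');

-- One-off population of the index table (hypothetical; use triggers or cron in practice).
INSERT INTO index_table SELECT 1, id, col FROM table_1;
INSERT INTO index_table SELECT 2, id, col FROM table_2;
""")

def fetch_page(offset, records):
    """Page through index_table first, then join only the selected rows."""
    sql = """
    SELECT idx.col, COALESCE(t1.payload, t2.payload)
    FROM (SELECT * FROM index_table ORDER BY col LIMIT ? OFFSET ?) AS idx
    LEFT JOIN table_1 AS t1 ON idx.table_id = 1 AND idx.item_id = t1.id
    LEFT JOIN table_2 AS t2 ON idx.table_id = 2 AND idx.item_id = t2.id
    ORDER BY idx.col
    """
    return conn.execute(sql, (records, offset)).fetchall()

page = fetch_page(offset=1, records=2)
print(page)
```

Only the rows of the requested page reach the LEFT JOINs, so the join cost depends on the page size and the number of tables, not on the total row count.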
Finally, I would consider merging all tables into one single table.

Related

A Complex query into Union mysql multiple times - Optimizer

I want to use the same subquery multiple times in a UNION. The subquery is time-consuming, and I think that using it several times may increase the total execution time.
For example
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column1 DESC LIMIT 10)
UNION
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column2 DESC LIMIT 10)
UNION
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column3 DESC LIMIT 10)
Is the (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) executed 3 times?
If MySQL is smart enough, the inner subquery will be executed only once and I don't need any optimization. If not, I have to optimize it some other way (such as a temporary table, which I want to avoid).
Do I have to rewrite this query with different syntax to optimize it? Any suggestions?
In practice I want to filter some data out of a huge number of records and show the result in 3 sections, each in a different order.
Plan A:
A TEMPORARY TABLE cannot be referenced more than once in the same query. So, build a permanent table and DROP it when finished. (If multiple connections might be doing the same thing, it will be a hassle to make sure they are not using the same table name.)
Plan B:
With MySQL 8.0, you can do
WITH T AS ( SELECT ... )
( SELECT ... FROM T ORDER BY col1 DESC LIMIT 10 )
UNION ...
Plan C:
If it is possible to do this:
SELECT id FROM A
ORDER BY col1 LIMIT 10
You could use that as a 'derived' table inside
(SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS)
Something like
SELECT A.*, B.*
FROM ( SELECT id FROM A
ORDER BY col1 LIMIT 10 ) AS x1
JOIN A USING(id)
JOIN B ... AND SOME COMPLEX WHERE CONDITIONS
Similarly for the other two SELECTs, then UNION them together.
Better yet, UNION together the 3 sets of ids, then JOIN to A and B once.
This may have the advantage of dealing with fewer rows.
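To make the shape of Plan B concrete, here is a small sketch run through Python's sqlite3 module instead of MySQL 8.0. Table a, its sample data, and the filter id > 2 are made-up stand-ins for the expensive join and its WHERE conditions. SQLite needs each ordered, limited branch wrapped in a derived table, whereas MySQL would take parenthesized branches. Whether the CTE is evaluated once or once per reference depends on the engine, so this shows the syntax only and is not a performance measurement.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, column1 INT, column2 INT, column3 INT)"
)
# Made-up rows: ids 1..10 with three differently ordered columns.
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?)",
    [(i, i, 10 - i, (i * 3) % 7) for i in range(1, 11)],
)

sql = """
WITH t AS (SELECT * FROM a WHERE id > 2)  -- stand-in for the complex subquery
SELECT 'by_column1' AS section, id FROM (SELECT id FROM t ORDER BY column1 DESC LIMIT 2)
UNION ALL
SELECT 'by_column2', id FROM (SELECT id FROM t ORDER BY column2 DESC LIMIT 2)
UNION ALL
SELECT 'by_column3', id FROM (SELECT id FROM t ORDER BY column3 DESC LIMIT 2)
"""

# Group the returned ids by section label.
sections = defaultdict(set)
for section, item_id in conn.execute(sql):
    sections[section].add(item_id)
print(dict(sections))
```

The extra section column records which ordering produced each row, which the plain UNION in the question would lose.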

Finding item and dependent set of items from the same mysql table in one query

I have a table where I store items and the time at which each one is relevant. For this question, the relevant columns are:
CREATE TABLE my_items
(
id INTEGER,
category INTEGER,
t DOUBLE
);
I want to select all items from a specific category (e.g. 1) and the sets of items that have a time within +- 5 (seconds) from these items.
I will probably do this with two types of queries in a script:
SELECT id,t from my_items where category=1;
then loop over the result set, using each result row's time as t_q1, and do a separate query:
SELECT id from my_items where t >= t_q1-5 AND t <= t_q1+5;
How can I do this in one query?
You can use a join. Take your subquery that selects all category 1 items, and join it with the original table on the condition that the time is within +/- five. It's possible that duplicate rows are returned, so you can group by id to avoid that:
SELECT i.*
FROM my_items i
JOIN (SELECT id, t FROM my_items WHERE category = 1) c
ON i.t BETWEEN (c.t - 5) AND (c.t + 5)
OR i.id = c.id
GROUP BY i.id;
I added the OR i.id = c.id to make sure that the rows of category 1 are still included.
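The self-join can be checked on a few rows with Python's sqlite3 module standing in for MySQL. The sample rows are invented for the example, and the sketch swaps the GROUP BY for SELECT DISTINCT, which removes the same duplicates without depending on how the engine treats non-aggregated columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_items (id INTEGER PRIMARY KEY, category INTEGER, t REAL)")
# Made-up rows: the category 1 items sit at t = 10 and t = 30.
conn.executemany(
    "INSERT INTO my_items VALUES (?, ?, ?)",
    [
        (1, 1, 10.0),
        (2, 2, 12.0),  # within 5 of item 1
        (3, 2, 16.0),  # too far from both category 1 items
        (4, 3, 4.9),   # just outside the window of item 1
        (5, 1, 30.0),
        (6, 2, 35.0),  # exactly 5 away from item 5
        (7, 2, 50.0),
    ],
)

sql = """
SELECT DISTINCT i.id, i.t
FROM my_items AS i
JOIN my_items AS c
  ON c.category = 1
 AND i.t BETWEEN c.t - 5 AND c.t + 5
ORDER BY i.id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Each category 1 row falls inside its own window, so it is returned as long as its t is not NULL.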
You can use a single query with all your criteria if there is only one table
SELECT id,t from my_items where category=1 AND t >= t_q1-5 AND t <= t_q1+5;
If there are two tables, use a right join on the timestamps table for performance.
select id
from my_items i,
(select min(t) min_t, max(t) max_t from my_items where category=1) i2
where i.category = 1 or
i.t between i2.min_t-5 and i2.max_t+5

Quick SELECT WHERE IN like with flag

I have a list of ids and need to check, in one SELECT, whether a user with each id is in the DB. Something like SELECT WHERE IN (), but that doesn't suit my needs: in a single SELECT I need to distinguish the ids that are in the table from those that are not, without looping over multiple SELECTs. Any ideas are welcome!
I'm not sure if this is what you need, but I guess you have table 1 which contains a lot of IDs, and you would like to see which ones occur in table 2 and which ones don't?
select T1.ID, count(T2.ID) as occurrences_in_T2
from table1 T1
left outer join table2 T2
ON T1.ID = T2.ID
group by T1.ID
You should provide more details. Is it a single query, so that the list can be hardcoded into it, or do you want a general solution for any list of ids? How long is your list?
For a single query and a not very long list you can use UNION. For example:
SELECT some_value, EXISTS( SELECT 1 FROM tableName WHERE user_id = some_value )
UNION ALL
SELECT other_value, EXISTS( SELECT 1 FROM tableName WHERE user_id = other_value )
UNION ALL
SELECT other_value2, EXISTS( SELECT 1 FROM tableName WHERE user_id = other_value2 )
UNION ALL
.....
If your list of ids can vary and/or consists of thousands of records, this approach does not scale. The list is a set of values inside one expression, and you want one result row per value. SQL Server has PIVOT and UNPIVOT clauses for this kind of transformation. MySQL has no equivalent, so without explicit UNIONs the ids would have to be loaded into a table first.
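For a list of arbitrary length, one option is to load the ids into a temporary table and flag each one from there. Below is a minimal sketch of that idea with Python's sqlite3 module standing in for MySQL. The users table, the wanted temporary table, and the flag_ids helper are hypothetical names chosen for the example.

```python
import sqlite3

def flag_ids(conn, ids):
    """Map each requested id to True/False depending on whether it exists in users."""
    conn.execute("CREATE TEMP TABLE wanted (user_id INTEGER PRIMARY KEY)")
    try:
        # set() drops duplicate ids so the primary key is not violated.
        conn.executemany("INSERT INTO wanted VALUES (?)", [(i,) for i in set(ids)])
        rows = conn.execute(
            """
            SELECT w.user_id,
                   EXISTS (SELECT 1 FROM users AS u WHERE u.user_id = w.user_id)
            FROM wanted AS w
            """
        ).fetchall()
    finally:
        conn.execute("DROP TABLE wanted")
    return {user_id: bool(found) for user_id, found in rows}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])

flags = flag_ids(conn, [2, 5, 3])
print(flags)
```

The application still sends the ids one row at a time, but the existence check itself is a single SELECT.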

Get last but one row for each ID

I am using a query like
select * from audittable where a_id IN (1,2,3,4,5,6,7,8);
For each ID it returns 5-6 records. I want to get the last but one record for each ID.
Can I do this in one SQL statement?
Try this query
SELECT
*
FROM
(SELECT
@rn:=if(@prv=a_id, @rn+1, 1) as rId,
@prv:=a_id as a_id,
-- Remaining columns
FROM
audittable
JOIN
(SELECT @rn:=0, @prv:=0) t
WHERE
a_id IN (1,2,3,4,5,6,7,8)
ORDER BY
a_id, <column> desc)tmp -- Replace <column> with the column that determines which record is last
WHERE
rId=2; -- rId=1 is the last record for each a_id, rId=2 is the last but one
If your table has a DateCreated column (or any column storing the date and time at which each row was inserted), you can use a query like this, which returns the latest row for each ID:
select at1.* from audittable at1 where
datecreated in( select max(datecreated) from audittable at2
where
at1.a_id = at2.a_id
);
You can also use a LIMIT clause.
Hope this makes sense and works for you.
In SQLite, suppose you have the columns a_id and b, so that each a_id has a set of b values. Say you want
the latest/highest b (maximum in terms of row_id, date, or another naturally increasing index) for each a_id:
SELECT MAX(b), *
FROM audittable
GROUP BY a_id
Here MAX picks the maximum b in each group.
The bad news is that MySQL does not associate the MAX(b) row with the other * columns of the table. It can still be used for a simple table with only the a_id and b columns!
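None of the queries above is written around a row-number filter as a named column, so here is a hedged sketch of the "last but one per ID" case using the ROW_NUMBER() window function. It assumes an engine with window functions (MySQL 8.0 or later, SQLite 3.25 or later) and a datecreated column that defines the order. It is run through Python's sqlite3 module with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audittable (id INTEGER PRIMARY KEY, a_id INTEGER, datecreated TEXT)"
)
# Made-up audit rows: a_id 1 has three rows, a_id 2 has two, a_id 3 has one.
conn.executemany(
    "INSERT INTO audittable (a_id, datecreated) VALUES (?, ?)",
    [
        (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"),
        (2, "2024-01-01"), (2, "2024-01-02"),
        (3, "2024-01-01"),
    ],
)

sql = """
SELECT a_id, datecreated
FROM (
    SELECT a_id, datecreated,
           ROW_NUMBER() OVER (PARTITION BY a_id ORDER BY datecreated DESC) AS rn
    FROM audittable
    WHERE a_id IN (1, 2, 3)
) AS ranked
WHERE rn = 2          -- 1 would be the last row, 2 is the last but one
ORDER BY a_id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

An a_id with only one row has no second-to-last record and is simply absent from the result.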

Is this an inefficient query?

Assuming table1 and table2 both have a large number of rows (i.e. several hundred thousand), is the following an inefficient query?
Edit: Order by field added.
SELECT * FROM (
SELECT title, updated FROM table1
UNION
SELECT title, updated FROM table2
) AS query
ORDER BY updated DESC
LIMIT 25
If you absolutely need distinct results, another possibility is to use union all and a group by clause instead:
SELECT title FROM (
SELECT title FROM table1 group by title
UNION ALL
SELECT title FROM table2 group by title
) AS query
group by title
LIMIT 25;
Testing this without the limit clause on an indexed ID column from two tables with ~920K rows each in a test database (at $work) resulted in a bit over a second with the query above and about 17 seconds via a union.
This should be even faster, but I see no ORDER BY, so which 25 records do you actually want?
SELECT * FROM (
(SELECT title FROM table1 LIMIT 25)
UNION
(SELECT title FROM table2 LIMIT 25)
) AS query
LIMIT 25
UNION must make an extra pass to fetch the distinct records, so you should use UNION ALL.
Yes, use order by and limits in the inner queries.
SELECT * FROM (
(SELECT title FROM table1 ORDER BY title ASC LIMIT C)
UNION
(SELECT title FROM table2 ORDER BY title ASC LIMIT C)
) AS query
LIMIT 25
This will only go through C rows instead of N (hundreds of thousands). The ORDER BY is necessary and should be on an indexed column.
C is a heuristic constant that should be tuned according to the domain. If you only expect a few duplicates, C=50-100 is probably ok.
You can also find this out for yourself by using EXPLAIN.