Search by primary key and then sort - mysql

I'm using MySQL and I have a database that I'm trying to use as the canonical version of data that I will be storing in different indexes. That means that I have to be able to retrieve a set of data quickly from it by primary key, but I also need to sort it on the way out. I can't figure out how to let MySQL efficiently do that.
My table looks like the following:
CREATE TABLE demo.widgets (
id INT AUTO_INCREMENT,
-- lots more information I need
awesomeness INT,
PRIMARY KEY (id),
INDEX IDX_AWESOMENESS (awesomeness),
INDEX IDX_ID_AWESOMENESS (id, awesomeness)
);
And I want to do something along the lines of:
SELECT *
FROM demo.widgets
WHERE id IN (
1,
2,
3,
5,
8,
13 -- you get the idea
)
ORDER BY awesomeness
LIMIT 50;
But unfortunately I can't seem to get good performance out of this. It always has to resort to a filesort. Is there a way to get better performance from this setup, or do I need to consider a different database?

This is explained in the MySQL manual section "ORDER BY Optimization". In particular, if the index used to select rows for the WHERE clause differs from the index that would satisfy the ORDER BY, MySQL cannot use an index for the ORDER BY and falls back to a filesort.

To get an optimized query that both fetches and sorts like that, you need an index whose leading column is the sort column, followed by the primary key column, like so:
create table if not exists test.fetch_sort (
id int primary key,
val int,
key val_id (val, id)
);
insert into test.fetch_sort
values (1, 10), (2, 5), (3, 30);
explain
select *
from test.fetch_sort
where id in (1, 2, 3)
order by val;
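This gives a plan that uses only the index for both searching and sorting: because this table has only id and val, the val_id index covers every selected column, so MySQL can read it in val order and filter on id as it goes, with no filesort.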
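To see why the (val, id) ordering matters, here is a minimal Python sketch (a model, not MySQL's actual implementation) of an ordered index scan: it walks index entries already sorted by val, keeps only the ids from the IN list, and stops once LIMIT rows are found, so no separate sort step is needed. The function name ordered_index_scan and the sample data are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def ordered_index_scan(index_entries: Iterable[tuple[int, int]],
                       wanted_ids: set[int],
                       limit: int) -> list[tuple[int, int]]:
    """Return up to `limit` (val, id) entries whose id is in `wanted_ids`.

    `index_entries` must already be sorted by (val, id), as a B-tree index
    on (val, id) would be, so the output comes out in ORDER BY val order
    without a separate sort.
    """
    result = []
    for val, row_id in index_entries:
        if row_id in wanted_ids:          # the WHERE id IN (...) filter
            result.append((val, row_id))
            if len(result) == limit:
                break                     # early stop: LIMIT reached
    return result


rows = {1: 10, 2: 5, 3: 30, 4: 20, 5: 1}                  # id -> val
index = sorted((val, row_id) for row_id, val in rows.items())
print(ordered_index_scan(index, {1, 2, 3}, limit=2))      # [(5, 2), (10, 1)]
```

The trade-off: if the listed ids are few and happen to have large val values, this scan may read most of the index before finding them, so the optimizer may still judge a filesort over a handful of primary-key lookups to be cheaper. On the real widgets table, SELECT * also needs a lookup into the table row for every match, since the other columns are not in the index.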

Related

Limit and offset (pagination) for rows with related data (with joins)

I have tables Alpha and Beta. Beta belongs to Alpha.
create table Alpha
(
id int auto_increment primary key
);
create table Beta
(
id int auto_increment primary key,
alphaId int null,
orderValue int,
constraint Alpha_ibfk_1 foreign key (alphaId) references Alpha (id)
);
Here are a few test records:
insert into Alpha (id) values (1);
insert into Alpha (id) values (2);
insert into Beta (id, alphaId, orderValue) values (1, 1, 23);
insert into Beta (id, alphaId, orderValue) values (2, 1, 43);
insert into Beta (id, alphaId, orderValue) values (3, 2, 73);
I want to paginate them in a way that makes sense for my application logic. So when I set a limit of 2, for example, I expect to get two Alpha records and all their related records, but in fact, when I run:
select *
from Alpha
inner join Beta on Alpha.id = Beta.alphaId
order by Beta.orderValue
limit 2;
I get only one Alpha record and its related data:
id | id | alphaId | orderValue
1  | 1  | 1       | 23
1  | 2  | 1       | 43
Instead, I want to find a way for LIMIT to count only unique Alpha records and return something like this:
id | id | alphaId | orderValue
1  | 1  | 1       | 23
1  | 2  | 1       | 43
2  | 3  | 2       | 73
Is it possible to do this in MySQL in one query? Maybe in a different RDBMS? Or are multiple queries the only option?
=== EDIT
The reason for this requirement is that I want to build a paginated API that returns Alpha records along with their related Beta records. The problem is that the way LIMIT works does not make sense from the user's standpoint: "Hey, I said I want 2 Alpha records with their related data, not 1. What is that?"
There are a couple of issues with your example:
Your foreign key seems to be wrongly established.
LIMIT almost always needs an explicit ORDER BY. Otherwise the result is unstable and not reproducible.
Anyway, having said that, you can place a limit on rows for table Alpha and then perform the join against table Beta.
For example:
select *
from (
select *
from Alpha
order by id
limit 2 -- this limit only affects table Alpha
) x
join Beta b on b.alphaId = x.id
order by x.id, b.orderValue;
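Both behaviours can be checked side by side with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (the derived-table query has the same shape in both), and the helper name fetch_alpha_page with its page_size/offset parameters is hypothetical:

```python
import sqlite3

# In-memory SQLite stand-in for the schema and data in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Alpha (id INTEGER PRIMARY KEY);
    CREATE TABLE Beta (
        id INTEGER PRIMARY KEY,
        alphaId INTEGER REFERENCES Alpha (id),
        orderValue INTEGER
    );
    INSERT INTO Alpha (id) VALUES (1), (2);
    INSERT INTO Beta (id, alphaId, orderValue) VALUES
        (1, 1, 23), (2, 1, 43), (3, 2, 73);
""")

# LIMIT on the joined result counts Beta rows, not Alpha records.
naive = conn.execute("""
    SELECT Alpha.id, Beta.id, Beta.orderValue
    FROM Alpha JOIN Beta ON Alpha.id = Beta.alphaId
    ORDER BY Beta.orderValue
    LIMIT 2
""").fetchall()


def fetch_alpha_page(conn, page_size, offset):
    """Page over Alpha first, then attach every related Beta row."""
    return conn.execute("""
        SELECT x.id, b.id, b.orderValue
        FROM (SELECT id FROM Alpha ORDER BY id LIMIT ? OFFSET ?) AS x
        JOIN Beta AS b ON b.alphaId = x.id
        ORDER BY x.id, b.orderValue
    """, (page_size, offset)).fetchall()


print(naive)                         # [(1, 1, 23), (1, 2, 43)]
print(fetch_alpha_page(conn, 2, 0))  # [(1, 1, 23), (1, 2, 43), (2, 3, 73)]
print(fetch_alpha_page(conn, 1, 1))  # [(2, 3, 73)]
```

With an inner join, an Alpha record that has no Beta rows disappears from its page; a LEFT JOIN keeps it.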

Is it possible to specify sorting order to MySQL index?

I have an items table with a category_id field.
There is a specific rule to order the items by category_id.
I usually sort the data like this:
SELECT * FROM items ORDER BY FIELD(category_id, 2, 5, 1, 4, 3)
-- In this example, the "rule" is sorting in order of (2, 5, 1, 4, 3)
In this case, simply creating an index on category_id field does not work to speed up sorting items, because the index sorts the category_id just ascending like (1, 2, 3, 4, 5).
Is it possible to specify the sorting rule when I CREATE INDEX on category_id field?
(And then simply SELECT * FROM items ORDER BY category_id works)
Or do I have to create another field like sorted_category_id which is sorted according to the order rule?
Adding the column to the items table, with an index on it, would indeed be a solution focused on speed. Making it a generated column ensures its value always matches category_id, and making it a virtual column means the value is not stored in the table rows; it is materialized only in the index, if you create one. So proceed like this:
ALTER TABLE items ADD (
category_ord int GENERATED ALWAYS AS (FIELD(category_id, 2, 5, 1, 4, 3)) VIRTUAL
);
CREATE INDEX idx_items_category_ord ON items(category_ord);
SELECT * FROM items ORDER BY category_ord;
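The generated column holds whatever FIELD() returns for each row. A small Python emulation of FIELD()'s documented behaviour (1-based position in the list, 0 when the value is absent or NULL) shows one consequence worth checking: a category_id missing from the list gets category_ord = 0 and sorts before every listed category. The helper field() and the extra category 6 are hypothetical, for illustration only.

```python
def field(value, *candidates):
    """Mimic MySQL FIELD(): 1-based position of value in candidates, else 0."""
    if value is None:
        return 0  # NULL never compares equal, so FIELD() returns 0
    for position, candidate in enumerate(candidates, start=1):
        if candidate == value:
            return position
    return 0


ORDER_RULE = (2, 5, 1, 4, 3)
category_ids = [1, 2, 3, 4, 5, 6]  # 6 is a hypothetical, unlisted category

# What the category_ord generated column would contain for each category_id.
ords = {cid: field(cid, *ORDER_RULE) for cid in category_ids}
print(ords)  # {1: 3, 2: 1, 3: 5, 4: 4, 5: 2, 6: 0}
print(sorted(category_ids, key=lambda cid: ords[cid]))  # [6, 2, 5, 1, 4, 3]
```

Because the list is part of the column definition, adding or reordering categories means altering the table, which is what the alternative below avoids.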
Alternative
Alternatively, the normalised way is to add a column to the category table. This can have a slight performance cost (an extra join) if you have many categories, but it keeps the ordering rule in data rather than hard-coding it in a column definition, so adding or reordering categories needs no ALTER TABLE, and it saves space. To implement that idea, proceed as follows:
If you don't have that category table, then create it:
CREATE TABLE category(
id int NOT NULL PRIMARY KEY,
ord int NOT NULL,
name varchar(100)
);
Populate the ord field (or whatever you want to call it) as desired:
INSERT INTO category(id, ord, name) VALUES
(1, 30, 'cat1'),
(2, 10, 'cat2'),
(3, 50, 'cat3'),
(4, 40, 'cat4'),
(5, 20, 'cat5');
And add an index on the ord column:
CREATE INDEX category_ord ON category(ord);
Now the query would be:
SELECT *
FROM items
INNER JOIN category
ON items.category_id = category.id
ORDER BY category.ord;
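As a quick check of the join-based ordering, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the JOIN and ORDER BY are the same in both). The minimal items table with only id and category_id, and the helper categories_in_display_order, are assumptions for the demo. It also shows the maintenance benefit: changing the order is an UPDATE, not a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id   INTEGER NOT NULL PRIMARY KEY,
        ord  INTEGER NOT NULL,
        name VARCHAR(100)
    );
    INSERT INTO category (id, ord, name) VALUES
        (1, 30, 'cat1'), (2, 10, 'cat2'), (3, 50, 'cat3'),
        (4, 40, 'cat4'), (5, 20, 'cat5');
    CREATE INDEX category_ord ON category (ord);

    -- Hypothetical, minimal items table for the demo.
    CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER);
    INSERT INTO items (id, category_id) VALUES
        (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 2);
""")


def categories_in_display_order(conn):
    """Return the items' category_id values in category.ord order."""
    rows = conn.execute("""
        SELECT items.category_id
        FROM items
        INNER JOIN category ON items.category_id = category.id
        ORDER BY category.ord, items.id
    """).fetchall()
    return [category_id for (category_id,) in rows]


print(categories_in_display_order(conn))  # [2, 2, 5, 1, 4, 3]

# Moving category 3 to the front is a data change, not an ALTER TABLE.
conn.execute("UPDATE category SET ord = 5 WHERE id = 3")
print(categories_in_display_order(conn))  # [3, 2, 2, 5, 1, 4]
```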
The database engine can now decide whether to use the index on the ord column, based on its own cost analysis. If you want to force its use, you can use FORCE INDEX:
SELECT *
FROM items
INNER JOIN category FORCE INDEX (category_ord)
ON items.category_id = category.id
ORDER BY category.ord;
Note that the engine can also use your index on items.category_id for the value-by-value lookups of the join.
Like Akina says, I can use Generated Columns.
https://dev.mysql.com/doc/refman/8.0/en/create-index.html

MySQL: Why Select .. IN with subquery could not use index

I have started learning MySQL and am facing some issues with indexing for subqueries and joins. I have two tables, created as follows:
create table User(id integer, poster integer, PRIMARY KEY (id,poster));
insert into User(id, poster) values(1, 123);
insert into User(id, poster) values(1, 345);
insert into User(id, poster) values(2, 123);
create table Feed(id integer, poster integer, c integer, time integer, PRIMARY KEY(id), INDEX(poster),INDEX(time,c));
insert into Feed(id, poster, c,time) values(1, 123, 0, 2);
insert into Feed(id, poster, c,time) values(2, 123,1,1);
insert into Feed(id, poster, c,time) values(3, 345,2,3);
I initially tried some simple queries like
1. Select poster from User where id =1;
2. Select c from Feed where poster = 1;
3. Select c from Feed where poster in (1,2,3) order by time
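The EXPLAIN output for the third query is:
id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra
1  | SIMPLE      | Feed  | NULL       | ALL  | poster        | NULL | NULL    | NULL | 3    | 100.00   | Using where; Using filesort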
I am not sure why it requires a filesort. However, after adding a composite index INDEX(time, poster, c) to the Feed table, the same query uses the index.
Here is the new CREATE TABLE statement:
create table Feed(id integer, poster integer, c integer, time integer, PRIMARY KEY(id),INDEX(time,poster, c));
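Here is the EXPLAIN output with the new composite index:
id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra
1  | SIMPLE      | Feed  | NULL       | index | NULL          | time | 15      | NULL | 3    | 50.00    | Using where; Using index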
My guess is that because ORDER BY takes priority and time is the leftmost column of the index, the index is used for sorting first. Then, because poster is also part of the composite index, the same index can still be used to filter rows, and finally to return c.
Then I tried a subquery:
explain SELECT Feed.c from Feed where Feed.poster IN(select poster from User where id =1) order by Feed.time;
Nothing fancy here; I just replaced the hard-coded (1,2,3) with a subquery. I expected to see the same EXPLAIN result, but instead I got:
id | select_type | table | partitions | type  | possible_keys  | key     | key_len | ref   | rows | filtered | Extra
1  | SIMPLE      | User  | NULL       | ref   | PRIMARY,poster | PRIMARY | 4       | const | 1    | 100.00   | Using index; Using temporary; Using filesort
1  | SIMPLE      | Feed  | NULL       | index | NULL           | time    | 15      | NULL  | 3    | 33.33    | Using where; Using index; Using join buffer (Block Nested Loop)
I am curious why the User table shows "Using temporary; Using filesort". I also tried a LEFT JOIN, and it gives the same EXPLAIN output:
explain SELECT Feed.c
FROM `Feed`
LEFT JOIN `User` on User.poster = Feed.poster where User.id = 1 order by Feed.time;
Based on my reading, we should avoid filesorts and temporary tables.
How can I optimize my indexing and queries?
Thanks
It's not that it can't, it's that there is no benefit.
An index is a bit like another table that can be joined on to first, to help with the join on to the real table.
In your case, it's quicker to scan the table. The alternative would be to use the index to isolate which row(s) in the underlying table are required and then go to the underlying table to get those rows.
That would be different if your table was a million rows long. Then it would be worth the effort of using the index, to reduce the effort in scanning the table.
So, write a testbed that creates a LOT more random data, then you'll be able to see it.
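One way to build such a testbed is a small Python script that generates random Feed rows and emits them as batched multi-row INSERT statements for MySQL. The helper names, the number of distinct posters, and the value ranges below are arbitrary assumptions; after loading the data, running ANALYZE TABLE Feed and then re-running EXPLAIN should give the optimizer realistic statistics to work with.

```python
import random


def make_feed_rows(n_rows, n_posters, seed=0):
    """Generate reproducible random (id, poster, c, time) tuples for Feed."""
    rng = random.Random(seed)
    return [
        (row_id, rng.randint(1, n_posters), rng.randint(0, 100),
         rng.randint(1, 1_000_000))
        for row_id in range(1, n_rows + 1)
    ]


def to_insert_statements(rows, batch_size=1000):
    """Yield one multi-row INSERT statement per batch of rows."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ",\n".join(f"({i}, {p}, {c}, {t})" for i, p, c, t in batch)
        yield f"INSERT INTO Feed (id, poster, c, time) VALUES\n{values};"


# Tiny preview; values depend on the seed.
print(next(to_insert_statements(make_feed_rows(3, n_posters=2))))
```

For a real test, generate something like a million rows, write the statements to a file, and load it with the mysql command-line client.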
Alternatively, use a covering index: one that holds all the columns you need to search AND all the columns you'll include in SELECTs and JOINs.
In the example below I change (for table Feed) INDEX(poster) to INDEX(poster, c). Now, if the query planner reads from the index, it immediately knows the value of c too, without "joining" on to the underlying table.
create table User(id integer, poster integer, PRIMARY KEY (id,poster), INDEX(poster));
insert into User(id, poster) values(1, 123);
insert into User(id, poster) values(1, 345);
insert into User(id, poster) values(2, 123);
create table Feed(id integer, poster integer, c integer, time integer, PRIMARY KEY(id), INDEX(poster, c),INDEX(time,c));
insert into Feed(id, poster, c,time) values(1, 123, 0, 2);
insert into Feed(id, poster, c,time) values(2, 123,1,1);
insert into Feed(id, poster, c,time) values(3, 345,2,3);
Now, compare two queries...
Select c from Feed where poster in (1,2,3)
SELECT c, time FROM Feed WHERE poster IN (1,2,3)
The first can be answered by just the index.
The second needs either to scan the whole table or seek on the index AND join on to the table. Because the table is so small, the optimiser will decide just to scan the whole table, as that will be cheaper.
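The covering-index distinction can be observed directly. The sketch below uses Python's built-in sqlite3 module as a stand-in, because SQLite reports covering indexes explicitly in EXPLAIN QUERY PLAN (MySQL expresses the same idea as "Using index" in the Extra column). The index names are made up, and SQLite's cost model differs from MySQL's: without ANALYZE statistics it tends to pick the index even for a three-row table, so only the covering versus non-covering difference carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Feed (id INTEGER PRIMARY KEY, poster INTEGER,
                       c INTEGER, time INTEGER);
    CREATE INDEX feed_poster_c ON Feed (poster, c);
    CREATE INDEX feed_time_c ON Feed (time, c);
    INSERT INTO Feed (id, poster, c, time) VALUES
        (1, 123, 0, 2), (2, 123, 1, 1), (3, 345, 2, 3);
""")


def plan(sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Only poster and c are needed: answerable from feed_poster_c alone.
print(plan("SELECT c FROM Feed WHERE poster IN (1, 2, 3)"))
# time is not in feed_poster_c, so each match needs a table lookup.
print(plan("SELECT c, time FROM Feed WHERE poster IN (1, 2, 3)"))
```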

MySql Indexes Sort and Where

I would like to know how MySQL prioritizes indexes. I have the following table:
CREATE TABLE table (
colum1 VARCHAR(50),
colum2 VARCHAR(50),
colum3 ENUM('a', 'b', 'c'),
PRIMARY KEY(colum1, colum2, colum3)
);
CREATE INDEX colum1_idx ON table (colum1);
CREATE INDEX coloum2_idx ON table (colum2);
const query = `SELECT * FROM table
WHERE colum1 = ?
ORDER BY colum2
LIMIT ?,?`;
Basically, my PK is composed of all the columns (I need to use INSERT IGNORE), and I query using colum1 in the WHERE clause and ORDER BY colum2.
My question is should I create 2 different indexes or create 1 index with (colum1 and colum2)?
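Thanks to @JuanCarlosOpo.
I found the answer here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#algorithm_step_2c_order_by_
A compound index on both columns performs better than two single-column indexes, because one index can serve both the WHERE on colum1 and the ORDER BY colum2:
CREATE INDEX colum_idx ON table (colum1,colum2);
(In this particular table, the PRIMARY KEY (colum1, colum2, colum3) already begins with colum1 and colum2, so it can serve this query on its own; colum1_idx, and an extra (colum1, colum2) index, would then be redundant.)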
Thanks a lot!
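A compound index helps because its entries are sorted by colum1 and then colum2: all rows for one colum1 value sit next to each other, already in colum2 order, so LIMIT offset, count becomes a slice with no sort. The Python sketch below models that with a sorted list and bisect; it is a simplification of a B-tree, and the names INDEX, LEADING and page are hypothetical. With two single-column indexes, MySQL would typically use one of them (for example colum1_idx) to find the rows and then sort the matches.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index on (colum1, colum2): entries kept sorted, like a B-tree.
INDEX = sorted([
    ("a", "z"), ("b", "m"), ("a", "c"), ("a", "k"), ("b", "a"), ("a", "f"),
])
LEADING = [colum1 for colum1, _ in INDEX]  # leading column, for range lookups


def page(colum1, offset, count):
    """Emulate WHERE colum1 = ? ORDER BY colum2 LIMIT offset, count."""
    lo = bisect_left(LEADING, colum1)   # first entry with this colum1
    hi = bisect_right(LEADING, colum1)  # one past the last such entry
    # Entries in [lo, hi) are already ordered by colum2: just slice.
    start = min(lo + offset, hi)
    end = min(start + count, hi)
    return [colum2 for _, colum2 in INDEX[start:end]]


print(page("a", 1, 2))   # ['f', 'k']
print(page("b", 0, 10))  # ['a', 'm']
```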

Order by field with SQLite

I'm currently working on a Symfony project at work, and we are using Lucene for our search engine.
I was trying to use an in-memory SQLite database for unit tests (we otherwise use MySQL), but I stumbled upon something.
The search engine part of the project uses Lucene indexing. Basically, you query it and get an ordered list of ids, which you can use to query your database with a WHERE ... IN () clause.
The problem is that the query has an ORDER BY FIELD(id, ...) clause, which orders the results in the same order as the results returned by Lucene.
Is there any alternative to ORDER BY FIELD in SQLite? Or is there another way to order the results the same way Lucene does?
Thanks :)
Edit:
A simplified query might look like this:
SELECT i.* FROM item i
WHERE i.id IN(1, 2, 3, 4, 5)
ORDER BY FIELD(i.id, 5, 1, 3, 2, 4)
This is quite nasty and clunky, but it should work. Create a temporary table, and insert the ordered list of IDs, as returned by Lucene. Join the table containing the items to the table containing the list of ordered IDs:
CREATE TABLE item (
id INTEGER PRIMARY KEY ASC,
thing TEXT);
INSERT INTO item (thing) VALUES ('thing 1');
INSERT INTO item (thing) VALUES ('thing 2');
INSERT INTO item (thing) VALUES ('thing 3');
CREATE TEMP TABLE ordered (
id INTEGER PRIMARY KEY ASC,
item_id INTEGER);
INSERT INTO ordered (item_id) VALUES (2);
INSERT INTO ordered (item_id) VALUES (3);
INSERT INTO ordered (item_id) VALUES (1);
SELECT item.thing
FROM item
JOIN ordered
ON ordered.item_id = item.id
ORDER BY ordered.id;
Output:
thing 2
thing 3
thing 1
Yes, it's the sort of SQL that will make people shudder, but I don't know of a SQLite equivalent for ORDER BY FIELD.
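One portable alternative to the temporary table is a CASE expression that maps each id to its position in the Lucene list. CASE is standard SQL, so the same query should also run on MySQL. Below is a sketch using Python's built-in sqlite3 module; the helper name fetch_in_lucene_order is hypothetical, and it binds each id twice as a parameter, so very long id lists can hit SQLite's limit on the number of host parameters.

```python
import sqlite3


def fetch_in_lucene_order(conn, ids):
    """Fetch items whose id is in `ids`, ordered like `ids` (emulates FIELD())."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    # CASE id WHEN ? THEN 0 WHEN ? THEN 1 ... END gives each id its position.
    case_expr = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(ids)))
    sql = (f"SELECT id, thing FROM item WHERE id IN ({placeholders}) "
           f"ORDER BY CASE id {case_expr} END")
    # Parameters are bound in text order: the IN list first, then the CASE.
    return conn.execute(sql, [*ids, *ids]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, thing TEXT)")
conn.executemany("INSERT INTO item (thing) VALUES (?)",
                 [("thing 1",), ("thing 2",), ("thing 3",)])
print(fetch_in_lucene_order(conn, [2, 3, 1]))
# [(2, 'thing 2'), (3, 'thing 3'), (1, 'thing 1')]
```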