I have started learning MySQL and am running into some issues with indexing for subqueries and joins. I created two tables as follows:
create table User(id integer, poster integer, PRIMARY KEY (id,poster));
insert into User(id, poster) values(1, 123);
insert into User(id, poster) values(1, 345);
insert into User(id, poster) values(2, 123);
create table Feed(id integer, poster integer, c integer, time integer, PRIMARY KEY(id), INDEX(poster),INDEX(time,c));
insert into Feed(id, poster, c,time) values(1, 123, 0, 2);
insert into Feed(id, poster, c,time) values(2, 123,1,1);
insert into Feed(id, poster, c,time) values(3, 345,2,3);
I initially tried some simple queries like
1. Select poster from User where id =1;
2. Select c from Feed where poster = 1;
3. Select c from Feed where poster in (1,2,3)
The EXPLAIN output for the third query looks like this:
SIMPLE Feed NULL ALL poster NULL NULL NULL 3 100.00 Using where; Using filesort
I am not sure why it requires a filesort. However, after adding a composite index INDEX(time,poster,c) to the Feed table, the same query uses the index.
Here is the new create table statement:
create table Feed(id integer, poster integer, c integer, time integer, PRIMARY KEY(id),INDEX(time,poster, c));
Here is the EXPLAIN output with the new composite index:
1 SIMPLE Feed NULL index NULL time 15 NULL 3 50.00 Using where; Using index
My guess is that ORDER BY has higher priority and time is the leftmost column of the index, so it is used first. Adding poster to the composite index then lets the same index be used for the filter as well, and finally c is returned from it.
Then I tried a subquery:
explain SELECT Feed.c from Feed where Feed.poster IN(select poster from User where id =1) order by Feed.time;
Nothing fancy here, I just replaced the hardcoded (1,2,3) with a subquery. I expected to see the same EXPLAIN result, but instead I get
1 SIMPLE User NULL ref PRIMARY,poster PRIMARY 4 const 1 100.00 Using index; Using temporary; Using filesort
1 SIMPLE Feed NULL index NULL time 15 NULL 3 33.33 Using where; Using index; Using join buffer (Block Nested Loop)
I am curious why the User table has Using temporary; Using filesort. I also tried a left join, and it has the same EXPLAIN output:
explain SELECT Feed.c
FROM `Feed`
LEFT JOIN `User` on User.poster = Feed.poster where User.id = 1 order by Feed.time;
Based on my reading, we should avoid filesort and temporary tables.
How can I optimize my indexing and queries?
Thanks
It's not that it can't, it's that there is no benefit.
An index is a bit like another table that can be joined on to first, to help with the join on to the real table.
In your case, it's quicker to scan the table. The alternative would be to use the index to isolate which row(s) in the underlying table are required and then go to the underlying table to get those rows.
That would be different if your table was a million rows long. Then it would be worth the effort of using the index, to reduce the effort in scanning the table.
So, write a testbed that creates a LOT more random data, then you'll be able to see it.
Alternatively, use a covering index: one that holds all the columns you need to search and all the columns you include in SELECTs and JOINs.
In the example below I change (for table Feed) INDEX(poster) to INDEX(poster, c). Now, if the query planner reads from the index, it immediately knows the value of c too, without "joining" on to the underlying table.
create table User(id integer, poster integer, PRIMARY KEY (id,poster), INDEX(poster));
insert into User(id, poster) values(1, 123);
insert into User(id, poster) values(1, 345);
insert into User(id, poster) values(2, 123);
create table Feed(id integer, poster integer, c integer, time integer, PRIMARY KEY(id), INDEX(poster, c),INDEX(time,c));
insert into Feed(id, poster, c,time) values(1, 123, 0, 2);
insert into Feed(id, poster, c,time) values(2, 123,1,1);
insert into Feed(id, poster, c,time) values(3, 345,2,3);
Now, compare two queries...
SELECT c FROM Feed WHERE poster IN (1,2,3);
SELECT c, time FROM Feed WHERE poster IN (1,2,3);
The first can be answered by just the index.
The second needs either to scan the whole table or seek on the index AND join on to the table. Because the table is so small, the optimiser will decide just to scan the whole table, as that will be cheaper.
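The covering-index difference can be observed with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server, and SQLite's planner is not MySQL's, so treat the output as an illustration of the idea rather than as what MySQL's EXPLAIN will print. The index name idx_feed_poster_c and the helper plan() are made up for this example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL purely so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Feed (id INTEGER PRIMARY KEY, poster INTEGER, c INTEGER, time INTEGER);
    CREATE INDEX idx_feed_poster_c ON Feed (poster, c);
    INSERT INTO Feed (id, poster, c, time)
    VALUES (1, 123, 0, 2), (2, 123, 1, 1), (3, 345, 2, 3);
""")


def plan(sql):
    """Return the planner's textual description of each step of the query."""
    # The human-readable detail is the fourth column of EXPLAIN QUERY PLAN.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Only poster and c are needed, and both live in the index.
covered_plan = plan("SELECT c FROM Feed WHERE poster IN (1, 2, 3)")

# time is not in the index, so the index alone cannot answer this query.
uncovered_plan = plan("SELECT c, time FROM Feed WHERE poster IN (1, 2, 3)")

print(covered_plan)
print(uncovered_plan)
```

In SQLite the first plan mentions a covering index and the second does not. In MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN.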
I have tables Alpha and Beta. Beta belongs to Alpha.
create table Alpha
(
id int auto_increment primary key
);
create table Beta
(
id int auto_increment primary key,
alphaId int null,
orderValue int,
constraint Alpha_ibfk_1 foreign key (alphaId) references Alpha (id)
);
Here are a few test records:
insert into Alpha (id) values (1);
insert into Alpha (id) values (2);
insert into Beta (id, alphaId, orderValue) values (1, 1, 23);
insert into Beta (id, alphaId, orderValue) values (2, 1, 43);
insert into Beta (id, alphaId, orderValue) values (3, 2, 73);
I want to create a pagination for them, that would make sense in terms of my application logic. So when I set limit 2, for example, I expect to get a list of two Alpha records and their related records, but in fact when I set limit 2:
select *
from Alpha
inner join Beta on Alpha.id = Beta.alphaId
order by Beta.orderValue
limit 2;
I get only one Alpha record and its related data:
What I want is a way for LIMIT to count only unique Alpha records and return something like this:
Is it possible to do it in MySQL in one query? Maybe different RDBMS? Or going with multiple queries is the only option?
=== EDIT
The reason for such requirements is that I want to create an API with paging that returns records of Alpha, and their related Beta records. The problem is that the way limit works does not make sense from the user's standpoint: "Hey, I said I want 2 records of Alpha with its related data, not 1. What is that?"
There are a couple of issues with your example:
Your foreign key seems to be wrongly established.
LIMIT almost always needs an explicit ORDER BY. Otherwise the result is unstable and non-reproducible.
Anyway, having said that, you can place a limit on rows for table Alpha and then perform the join against table Beta.
For example:
select *
from (
select *
from Alpha
order by id
limit 2 -- this limit only affects table Alpha
) x
join Beta b on b.alphaId = x.id
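To see the difference between the two placements of LIMIT, here is a minimal sketch using the test records from the question. It uses Python's sqlite3 module as a stand-in so it can run without a MySQL server (an assumption of this example, along with the variable names); the two SELECT statements themselves are plain SQL that MySQL also accepts.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL purely so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Alpha (id INTEGER PRIMARY KEY);
    CREATE TABLE Beta (
        id INTEGER PRIMARY KEY,
        alphaId INTEGER REFERENCES Alpha (id),
        orderValue INTEGER
    );
    INSERT INTO Alpha (id) VALUES (1), (2);
    INSERT INTO Beta (id, alphaId, orderValue)
    VALUES (1, 1, 23), (2, 1, 43), (3, 2, 73);
""")

# LIMIT after the join counts joined rows, so both rows belong to Alpha 1.
joined_first = conn.execute("""
    SELECT a.id, b.id, b.orderValue
    FROM Alpha a
    JOIN Beta b ON a.id = b.alphaId
    ORDER BY b.orderValue
    LIMIT 2
""").fetchall()

# LIMIT inside the derived table counts Alpha rows; the join then adds
# every related Beta row for those two Alpha records.
limited_first = conn.execute("""
    SELECT x.id, b.id, b.orderValue
    FROM (SELECT id FROM Alpha ORDER BY id LIMIT 2) x
    JOIN Beta b ON b.alphaId = x.id
    ORDER BY x.id, b.orderValue
""").fetchall()

print(joined_first)   # rows for Alpha 1 only
print(limited_first)  # rows for Alpha 1 and Alpha 2
```

Note that the outer query carries its own ORDER BY: the ordering inside the derived table only decides which Alpha rows are kept, not the order of the final result.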
I would like to know how MySQL handles index priority. I have the following table.
CREATE TABLE table (
colum1 VARCHAR(50),
colum2 VARCHAR(50),
colum3 ENUM('a', 'b', 'c'),
PRIMARY KEY(colum1, colum2, colum3)
);
CREATE INDEX colum1_idx ON table (colum1);
CREATE INDEX coloum2_idx ON table (colum2);
const query = `SELECT * FROM table
WHERE colum1 = ?
ORDER BY colum2
LIMIT ?,?`;
Basically my PK is composed of all the fields (I need to use INSERT IGNORE), and I am querying with colum1 in the WHERE clause and ordering by colum2.
My question is: should I create two separate indexes, or one index on (colum1, colum2)?
Thanks to #JuanCarlosOpo
I found the answer here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#algorithm_step_2c_order_by_
A compound index on both columns performs better.
CREATE INDEX colum_idx ON table (colum1,colum2);
Thanks a lot!
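For anyone who wants to check the effect, here is a rough sketch of the comparison. It runs on Python's sqlite3 module rather than MySQL (an assumption made so it works without a server), so the planner messages are SQLite's, not MySQL's. The table is called items because table is a reserved word, and the primary key is left out on purpose, because an index that starts with (colum1, colum2) would otherwise already be available to the planner.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL purely so the sketch is runnable.
conn = sqlite3.connect(":memory:")
# Hypothetical table name; no primary key, so only the secondary index matters.
conn.execute("CREATE TABLE items (colum1 TEXT, colum2 TEXT, colum3 TEXT)")

QUERY = "SELECT * FROM items WHERE colum1 = 'x' ORDER BY colum2 LIMIT 10 OFFSET 0"


def plan():
    """Return the planner's textual description of each step of QUERY."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


# Case 1: index on colum1 only. Rows are found by index, then sorted separately.
conn.execute("CREATE INDEX colum1_idx ON items (colum1)")
single_plan = plan()

# Case 2: compound index. Rows for one colum1 value are already stored in
# colum2 order inside the index, so no separate sort step is needed.
conn.execute("DROP INDEX colum1_idx")
conn.execute("CREATE INDEX colum_idx ON items (colum1, colum2)")
compound_plan = plan()

print(single_plan)
print(compound_plan)
```

With the single-column index SQLite reports an extra temporary B-tree for the ORDER BY, which plays the same role as "Using filesort" in MySQL's EXPLAIN. With the compound index that step disappears.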
I have a table like this:
uuid | username | first_seen | last_seen | score
Before, the table used the primary key of a "player_id" column that ascended. I removed this player_id as I no longer needed it. I want to make the 'uuid' the primary key, but there's a lot of duplicates. I want to remove all these duplicates from the table, but keep the first one (based off the row number, the first row stays).
How can I do this? I've searched everywhere, but every answer shows how to do it when you have a row ID column...
I highly advocate having auto-incremented integer primary keys. So, I would encourage you to go back. These are useful for several reasons, such as:
They tell you the insert order of rows.
They are more efficient for primary keys.
Because primary keys are clustered in MySQL, new rows always go at the end.
But, you don't have to follow that advice. My recommendation would be to insert the data into a new table and reload into your desired table:
create temporary table tt as
select t.*
from t
group by t.uuid;
truncate table t;
alter table t add constraint pk_uuid primary key (uuid);
insert into t
select * from tt;
Note: I am using a (mis)feature of MySQL that allows you to group by one column while pulling columns not in the group by. It only works when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5). I don't like this extension, but you do not specify how to choose the particular row you want. This will give values for the other columns from matching rows. There are other ways to get one row per uuid.
I'm using MySQL and I have a database that I'm trying to use as the canonical version of data that I will be storing in different indexes. That means that I have to be able to retrieve a set of data quickly from it by primary key, but I also need to sort it on the way out. I can't figure out how to let MySQL efficiently do that.
My table looks like the following:
CREATE TABLE demo.widgets (
id INT AUTO_INCREMENT,
-- lots more information I need
awesomeness INT,
PRIMARY KEY (id),
INDEX IDX_AWESOMENESS (awesomeness),
INDEX IDX_ID_AWESOMENESS (id, awesomeness)
);
And I want to do something along the lines of:
SELECT *
FROM demo.widgets
WHERE id IN (
1,
2,
3,
5,
8,
13 -- you get the idea
)
ORDER BY awesomeness
LIMIT 50;
But unfortunately I can't seem to get good performance out of this. It always has to resort to a filesort. Is there a way to get better performance from this setup, or do I need to consider a different database?
This is explained in the "ORDER BY Optimization" section of the MySQL documentation. In particular, if the index used to select rows in the WHERE clause is different from the one used in ORDER BY, it won't be able to use an index for ORDER BY.
To get an optimized query that fetches and sorts like that, you need a key on the sort column first and then the primary key, like so:
create table if not exists test.fetch_sort (
id int primary key,
val int,
key val_id (val, id)
);
insert into test.fetch_sort
values (1, 10), (2, 5), (3, 30);
explain
select *
from test.fetch_sort
where id in (1, 2, 3)
order by val;
This will give a query that only uses the index for searching/sorting.
I am developing an application for my college's website and I would like to pull all the events in ascending date order from the database. There is a total of four tables:
Table Events1
event_id, mediumint(8), Unsigned
date, date,
Index -> Primary Key (event_id)
Index -> (date)
Table events_users
event_id, smallint(5), Unsigned
user_id, mediumint(8), Unsigned
Index -> PRIMARY (event_id, user_id)
Table user_bm
link, varchar(26)
user_id, mediumint(8)
Index -> PRIMARY (link, user_id)
Table user_eoc
link, varchar(8)
user_id, mediumint(8)
Index -> Primary (link, user_id)
Query:
EXPLAIN SELECT * FROM events1 E INNER JOIN event_users EU ON E.event_id = EU.event_id
RIGHT JOIN user_eoc EOC ON EU.user_id = EOC.user_id
INNER JOIN user_bm BM ON EOC.user_id = BM.user_id
WHERE E.date >= '2013-01-01' AND E.date <= '2013-01-31'
AND EOC.link = "E690"
AND BM.link like "1.1%"
ORDER BY E.date
EXPLANATION:
The query above does two things.
1) Searches and filters out all students through the user_bm and user_eoc tables. The "link" columns are denormalized columns to quickly filter students by major/year/campus etc.
2) After applying the filter, MYSQL grabs the user_ids of all matching students and finds all events they are attending and outputs them in ascending order.
QUERY OPTIMIZER EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE EOC ref PRIMARY PRIMARY 26 const 47 Using where; Using index; Using temporary; Using f...
1 SIMPLE BM ref PRIMARY,user_id-link user_id-link 3 test.EOC.user_id 1 Using where; Using index
1 SIMPLE EU ref PRIMARY,user_id user_id 3 test.EOC.user_id 1 Using index
1 SIMPLE E eq_ref PRIMARY,date-event_id PRIMARY 3 test.EU.event_id 1 Using where
QUESTION:
The query works fine but can be optimized. Specifically - using filesort and using temporary is costly and I would like to avoid this. I am not sure if this is possible because I would like to 'Order By' events by date that have a 1:n relationship with the matching users. The Order BY applies to a joined table.
Any help or guidance would be greatly appreciated. Thank you and Happy Holidays!
Ordering can be done in two ways: by index or by temporary table. You are ordering by date in table Events1, but the query uses the PRIMARY KEY, which doesn't contain date, so in this case the result has to be ordered in a temporary table.
It is not necessarily expensive though. If the result is small enough to fit in memory it will not be a temporary table on disk, just in memory and that is not expensive.
Neither is filesort. "Using filesort" doesn't mean it will use any file, it just means it's not sorting by index.
So, if your query executes fast you should be happy. If the result set is small it will be sorted in memory and no files will be created.