Optimize a SELECT query, knowing that we are dealing with a limited range - MySQL

I am trying to add a limitation to a MySQL SELECT query.
My database is structured in such a way that if a record is found in column one, then at most 5000 records with the same name can appear after it.
Example:
mark
..mark repeated 5000 times
john
anna
..other millions of names
So in this table it would be more efficient to find the first Mark and then search at most 5000 rows down from that one.
Is it possible to do something like this?

Just make a btree index on the name column:
CREATE INDEX name ON your_table(name) USING BTREE
and MySQL will silently do exactly what you want each time it looks for a name.
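A quick way to confirm this, assuming a table named your_table (the name is taken from the snippet above; USING BTREE is also the default index type for InnoDB):

```sql
-- Hypothetical verification that the index is actually used:
CREATE INDEX idx_name ON your_table (name) USING BTREE;

EXPLAIN SELECT * FROM your_table WHERE name = 'mark';
-- The plan should show type: ref and key: idx_name with a row estimate
-- near 5000, i.e. MySQL jumps straight to the first 'mark' in the B-tree
-- and reads only the contiguous matching range, never the whole table.
```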

Try with:
SELECT name
FROM table
ORDER BY (name = 'mark') DESC
LIMIT 5000
Basically, the rows matching 'mark' sort first, the rest follow, and the LIMIT is applied.
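For clarity: the comparison (name = 'mark') evaluates to 1 for matching rows and 0 for everything else, so sorting it descending floats the matches to the top. A sketch (table name assumed):

```sql
SELECT name,
       (name = 'mark') AS is_mark   -- 1 for 'mark', 0 for everyone else
FROM your_table
ORDER BY is_mark DESC
LIMIT 5000;
```

Note that without an index on name this still scans and sorts the whole table, so it is best combined with the B-tree index suggested in the other answer.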

It's actually quite difficult to understand your desired output, but I think this might be heading in the right direction:
(SELECT name
FROM table
WHERE name = 'mark'
LIMIT 5000)
UNION
(SELECT name
FROM table
WHERE name != 'mark'
ORDER BY name)
This will first get up to 5000 records where the name is 'mark', then get the remainder via the UNION; you can add a LIMIT to the second query if required.
For performance you should ensure that the columns used by ORDER BY and WHERE are indexed accordingly
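Putting that together with a cap on the second half (the limits and table name are illustrative):

```sql
(SELECT name
 FROM your_table
 WHERE name = 'mark'
 LIMIT 5000)
UNION ALL       -- the two halves are disjoint by construction, so UNION ALL
                -- is safe and skips plain UNION's duplicate-removal pass
(SELECT name
 FROM your_table
 WHERE name <> 'mark'
 ORDER BY name
 LIMIT 5000);
```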

If you make sure that the column is properly indexed, MySQL will take care of the optimisation for you.
Edit:
Thinking about it, I figured that this answer is only useful if I specify how to do that. user nobody beat me to the punch: CREATE INDEX name ON your_table(name) USING BTREE
This is exactly what database indexes are designed to do; this is what they are for. MySQL will use the index itself to optimise the search.

Related

If your table has more selects than inserts, are indexes always beneficial?

I have a mysql innodb table where I'm performing a lot of selects using different columns. I thought that adding an index on each of those fields could help performance, but after reading a bit on indexes I'm not sure if adding an index on a column you select on always helps.
I have far more selects than inserts/updates happening in my case.
My table 'students' looks like:
id | student_name | nickname | team | time_joined_school | honor_roll
and I have the following queries:
# The team column is varchar(32), and only has about 20 different values.
# The honor_roll field is a smallint and is only either 0 or 1.
1. select from students where team = '?' and honor_roll = ?;
# The student_name field is varchar(32).
2. select from students where student_name = '?';
# The nickname field is varchar(64).
3. select from students where nickname like '%?%';
all the results are ordered by time_joined_school, which is a bigint(20).
So I was just going to add an index on each of the columns, does that make sense in this scenario?
Thanks
Indexes help the database more efficiently find the data you're looking for. Which is to say you don't need an index simply because you're selecting a given column, but instead you (generally) need an index for columns you're selecting based on - i.e. using a WHERE clause (even if you don't end up including the searched column in your result).
Broadly, this means you should have indexes on columns that segregate your data in logical ways, and not on extraneous, simply informative columns. Before looking at your specific queries, all of these columns seem like reasonable candidates for indexing, since you could reasonably construct queries around them. Examples of columns that would make less sense would be things like phone_number, address, or student_notes - you could index such columns, but generally you don't need or want to.
Specifically based on your queries, you'll want student_name, team, and honor_roll to be indexed, since you're defining WHERE conditions based on the values of these columns. You'll also benefit from indexing time_joined_school if, as you suggest, you're ORDER BYing your queries based on that column. Your LIKE query is not actually easy for most RDBs to handle, and indexing nickname won't help. Check out How to speed up SELECT .. LIKE queries in MySQL on multiple columns? for more.
Note also that the ratio of SELECT to INSERT is not terribly relevant for deciding whether to use an index or not. Even if you only populate the table once, and it's read-only from that point on, SELECTs will run faster if you index the correct columns.
Yes, indexes help accelerate your queries.
In your case you should have indexes on:
1) team and honor_roll, from query 1 (a single index covering both fields)
2) student_name
3) time_joined_school, for the ORDER BY
For query 3 you can't use an index because of the leading wildcard in the LIKE pattern. Hope this helps.
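A sketch of those indexes against the students table described in the question. Appending time_joined_school to each composite index lets the ORDER BY be served from the index as well; the index names and column order are assumptions:

```sql
-- Query 1: equality on both columns, then the sort column.
ALTER TABLE students ADD INDEX idx_team_honor (team, honor_roll, time_joined_school);

-- Query 2: equality on student_name, then the sort column.
ALTER TABLE students ADD INDEX idx_name (student_name, time_joined_school);

-- Query 3 (nickname LIKE '%?%') cannot use a B-tree index:
-- the leading wildcard rules out a range scan, so it remains a full scan.
```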

Query optimization order by

I have two tables
LangArticles | columns: id (INT) ,de (VARCHAR),en (VARCHAR),count_links(INT)
WikiLinks | columns: article_id,link_id,nr_in_article (all integer)
The name of an article is in the columns de (German) and en (English).
The id in the LangArticles table is the same as the ids article_id and link_id.
I now want to get the names of all articles that link to another article. So I want all articles that link to 'abc'; 'abc' has the id = '1'.
So my normal query (without an order by) looks like:
select distinct(LA.de),W.nr_in_article,LA.count_links from
LangArticles as LA inner join WikiLinks as W on W.article_id = LA.id
where W.link_id in ("1")
This takes maybe 0.001 seconds and gives me 100,000 results, but actually I only want the best 5 hits.
Best means in this case the most relevant ones. I want to sort it like this:
Articles that link to 'abc' near the beginning of the article (nr_in_article) and that themselves contain many links (count_links) should get a high ranking.
I am using an
order by (1-(W.nr_in_article/LA.count_links)) desc
for this.
The problem is that I am not sure how to optimize this order by.
EXPLAIN in MySQL says that it has to use a temporary table and filesort, and that it can't use an index for the ORDER BY. For testing I also tried an "easy" ORDER BY W.nr_in_article, i.e. a normal sort on a single key.
For your information my indices are:
in LangArticles: id (primary),de (unique),en (unique), count_links(index)
in WikiLinks: article_id(index),link_id(index),nr_in_article(index)
But I tried the two composite indexes (link_id, nr_in_article) and (article_id, nr_in_article) as well.
The query with the ORDER BY takes approximately 5.5 seconds. :(
I think I know why MySQL has to use a temporary table and filesort here: all 100,000 entries have to be found via one index (link_id), then sorted afterwards, and in a temporary table no index is available.
But is there any way to make this faster?
Actually I only want the best 5 hits, so there is no need to sort everything. I am not sure whether something like a simple sort (e.g. bubble sort stopped early) would be faster for this than the quicksort that sorts the whole temporary table.
Since you only need the top 5, I think you could split it into two queries that should each return fewer results.
First like Sam pointed out,
order by (W.nr_in_article/LA.count_links) asc
should be equivalent to your
order by (1-(W.nr_in_article/LA.count_links)) desc
unless I'm overlooking some corner case here.
Furthermore, anything where
W.nr_in_article > LA.count_links
will be in the TOP 5 unless that result is empty, so I would try the query
select distinct(LA.de),W.nr_in_article,LA.count_links
from LangArticles as LA
inner join WikiLinks as W on W.article_id = LA.id
and W.nr_in_article > LA.count_links
where W.link_id in ("1")
order by W.nr_in_article/LA.count_links
limit 5
Only if this returns fewer than 5 results do you have to run the query again with the WHERE condition changed.
This however will not bring the runtime down by orders of magnitude, but it should help a little. If you need more performance, I don't see any other way than a materialized view, which I don't think is available in MySQL, but it can be simulated using triggers.
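If the first query does return fewer than 5 rows, the follow-up would simply flip the inequality, e.g. (a sketch against the same schema):

```sql
-- Complement of the first query's join condition; together the two
-- queries partition the result set, so no row is counted twice.
SELECT DISTINCT LA.de, W.nr_in_article, LA.count_links
FROM LangArticles AS LA
INNER JOIN WikiLinks AS W ON W.article_id = LA.id
   AND W.nr_in_article <= LA.count_links
WHERE W.link_id IN ('1')
ORDER BY W.nr_in_article / LA.count_links
LIMIT 5;
```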

mysql regex() alternation match order?

With the following query:
SELECT * FROM people WHERE name REGEXP 'bob|robert'
Am I right in assuming that mysql will scan each row in a fairly random order looking for either 'bob' or 'robert' (rather than bob first, then another scan for robert)?
If so, is there any way to get MySQL to attempt to match the entire table against 'bob' first and then 'robert' without performing two separate queries?
SELECT * FROM people WHERE name REGEXP 'bob|robert' ORDER BY name DESC
It is only one query, and it does the job.
The DBMS can scan the data in whatever order it pleases; the order is not specified by SQL, and it is not random.
An unspecified order may happen to look random, but it isn't.
There is no logical way to match the entire table against bob first (why would you want to?)
You can order the results, though, but it can be slow if the table has high cardinality and/or name is not a key.
SELECT * FROM people WHERE name = 'bob' OR name = 'robert'
ORDER BY name = 'bob' DESC
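If you really need pattern (substring) matching rather than exact names, the same ordering trick works with two LIKE predicates. A sketch; note this still scans the table, since leading wildcards defeat indexes:

```sql
SELECT *
FROM people
WHERE name LIKE '%bob%' OR name LIKE '%robert%'
ORDER BY (name LIKE '%bob%') DESC;   -- rows matching 'bob' first,
                                     -- then the remaining 'robert' matches
```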

MySql Explain ignoring the unique index in a particular query

I started looking into indexes in depth for the first time and began analyzing our db with the users table. I searched SO for a similar question but was not able to frame my search well, I guess.
I was going through a particular concept, and this first observation left me wondering about the difference between these two EXPLAINs [difference: the first query uses 'a%' while the second uses 'ab%']
[Total number of rows in users table = 9193]:
1) explain select * from users where email_address like 'a%';
(actual matching rows = 1240)
2) explain select * from users where email_address like 'ab%';
(actual matching rows = 109)
The index looks like this (screenshot not included):
My question:
Why is the index totally ignored in the first query? Does mySql think that it is a better idea not to use the index in the case 1? If yes, why?
If the probability, based on the statistics MySQL collects on the distribution of values, is above a certain ratio of the total rows (typically around 30%), MySQL deems it more efficient to simply scan the whole table, reading the disk pages sequentially, rather than use the index and jump around the disk pages in random order.
You could try your luck with this query, which may use the index:
where email_address >= 'a' and email_address < 'b'
(a BETWEEN 'a' AND 'az' range would miss addresses sorting after 'az'.) Although doing the full scan may actually be faster.
This is not a direct answer to your question but I still want to point it out (in case you already don't know):
Try:
explain select email_address from users where email_address like 'a%';
explain select email_address from users where email_address like 'ab%';
MySQL would now use indexes in both the queries above since the columns of interest are directly available from the index.
Probably in the case where you do a "select *", index access is more costly, since the optimizer has to go through the index records, find the row ids, and then go back to the table to retrieve the other column values.
But in the query above, where you only do a "select email_address", the optimizer knows all the desired information is available right from the index, and hence it will use the index irrespective of the 30% rule.
Experts, please correct me if I am wrong.
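The difference shows up in EXPLAIN's Extra column. A sketch, assuming the existing index on email_address:

```sql
-- Covering ("index-only") query: every selected column is in the index.
EXPLAIN SELECT email_address FROM users WHERE email_address LIKE 'a%';
-- Extra: Using where; Using index  -> no lookups back into the table.

-- SELECT * forces a table lookup per matching row for the other columns,
-- so past the optimizer's threshold a full scan wins instead:
EXPLAIN SELECT * FROM users WHERE email_address LIKE 'a%';
```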

Mysql table or query optimisation

I am running the following query and however I change it, it still takes almost 5 seconds to run which is completely unacceptable...
The query:
SELECT cat1, cat2, cat3, PRid, title, genre, artist, author, actors, imageURL,
lowprice, highprice, prodcatID, description
from products
where title like '%' AND imageURL <> '' AND cat1 = 'Clothing and accessories'
order by userrating desc
limit 500
I've tried taking out the LIKE '%', and taking out the imageURL <> '', but it's still the same. I've tried returning only one column; still the same.
I have indexes on almost every column in the table, certainly all the columns mentioned in the query.
This is basically for a category listing. If I do a fulltext search for something in the title column which has a fulltext index, it takes less than a second.
Should I add another fulltext index to column cat1 and change the query focus to "match against" on that column?
Am I expecting too much?
The table has just short of 3 million rows.
You said you had an index on every column. Do you have an index like this one?
alter table products add index (cat1, userrating)
If you don't, give it a try. Run the query again and let me know if it runs faster.
Also, I assume you're actually substituting some kind of filter for the % in the title field, right?
You should rather have cat1 as an integer than as a string across these 3 million rows. You must also index correctly; if indexing every column only ever helped, the system would do it by default.
Apart from that, title LIKE '%' doesn't do anything. I guess you use it for searching, so it becomes title LIKE 'search%'.
Do you use any sort of framework to fetch this? Getting 500 rows with a lot of columns can exhaust the system if your framework saves the result into a large array. It may well not be the case here, but:
Try running an ordinary $query = mysql_query() and while ($row = mysql_fetch_object($query)).
I suggest adding an index covering the queried columns: title, imageURL and cat1.
Second improvement: use the SQL server's query cache; it will greatly improve the speed.
Last improvement: if your query always has this shape and only the values change, use prepared statements.
Well, I am quite sure that a % as the first character in a LIKE clause gives you a full table scan for that column (in your case that full scan won't be executed, because you already have restricting predicates in the AND clause).
Besides that, try adding an index on the cat1 column. Also try adding other criteria to your query in order to reduce the size of the working data set (the number of rows that match your query without the LIMIT clause); it might simply be too big.
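Pulling the suggestions above together into one sketch (the index name is illustrative; the title LIKE '%' clause is dropped since it matches everything):

```sql
-- Equality column first, then the sort column, so MySQL can both filter
-- on cat1 and read rows already in userrating order, avoiding a filesort:
ALTER TABLE products ADD INDEX idx_cat_rating (cat1, userrating);

SELECT cat1, cat2, cat3, PRid, title, genre, artist, author, actors, imageURL,
       lowprice, highprice, prodcatID, description
FROM products
WHERE cat1 = 'Clothing and accessories'
  AND imageURL <> ''
ORDER BY userrating DESC
LIMIT 500;
```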