Why does MySQL fulltext search not run as expected? - mysql

I have a table with 700,000 rows. One column, called 'data', is of TEXT type. I added a FULLTEXT index on this column to improve my query speed.
Here are two queries; the second does not perform as expected.
You can see that the first query returns one result with the keywords I specified.
It took 2 seconds.
I thought the second query would run faster since I gave more filter conditions, but it took about one minute.
Giving more conditions should narrow down the data set to search, so why is it slower?
MySQL version is 8.0.16 and the engine is InnoDB. Sorry about the mosaic.

Giving more conditions should narrow down the data set to search, so why is it slower?
FULLTEXT will search for each required string, getting a list of row numbers (or something equivalent).
For multiple required strings, it will get multiple lists and "AND" them together.
Furthermore, when two strings are quoted together as a phrase ("26228 31500733"), their adjacency needs to be verified. This may be the slow part.
Consider this instead:
MATCH(data) AGAINST('+uf8... +26228 +31500733' IN BOOLEAN MODE)
That won't test the adjacency, but that might not matter to the end results. (Note also that I skipped the too-short "i".)
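A minimal sketch of the full boolean-mode statement, assuming the table is named t and using the 'data' column from the question (the 'uf8...' token is redacted in the original screenshots):
-- Boolean mode requires every +term to be present, but does not
-- verify that the terms are adjacent, so the phrase check is skipped.
SELECT *
FROM t
WHERE MATCH(data) AGAINST('+uf8... +26228 +31500733' IN BOOLEAN MODE);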

Related

How does a MySQL `SELECT ... WHERE ... IN` query work? Why is a larger number of parameters faster?

I am doing a query with SELECT ... WHERE ... IN, and found something unexpected:
SELECT a,b,c FROM tbl1 where a IN (a1,a2,a3...,a5000) --> this query took 1.7ms
SELECT a,b,c FROM tbl1 where a IN (a1,a2,a3...,a20) --> this query took 6.4ms
What is the algorithm behind IN? Why is a larger number of parameters faster than a smaller one?
The following is a guess...
For every SQL query, the optimizer analyzes the query to choose which index to use.
For multi-valued range queries (like IN(...)), the optimizer performs an "index dive" for each value in the short list, trying to estimate whether it's a good idea to use the index. If you are searching for values that are too common, it's more efficient to just do a table-scan, so there's no need to use the index.
MySQL 5.6 introduced a special optimization, to skip the index dives if you use a long list. Instead, it just guesses that the index on your a column may be worth using, based on stored index statistics.
You can control how long a list causes the optimizer to skip index dives with the eq_range_index_dive_limit option. The default is 10 in MySQL 5.6 (raised to 200 in 5.7). Your example shows a list of length 20, so I'm not sure why it's more expensive.
Read the manual about this feature here: https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#equality-range-optimization
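If you want to experiment with this, the threshold can be inspected and changed per session, and EXPLAIN shows the resulting plan (a sketch reusing the question's tbl1 and column a; the list values are illustrative):
-- Check the current threshold.
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';
-- Change it for the current session only.
SET SESSION eq_range_index_dive_limit = 5;
-- The rows estimate in the plan may differ depending on whether
-- index dives or stored index statistics were used.
EXPLAIN SELECT a, b, c FROM tbl1 WHERE a IN (1, 2, 3, 4, 5, 6);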

Drawback of using multiple conditions in where clause

As we all know, the WHERE clause of a SQL query executes before the SELECT clause, and we put conditions in the WHERE clause to filter the result according to our requirements.
While writing some queries, a question came to my mind: are there any drawbacks to using multiple conditions in the WHERE clause, and in what order are they applied to filter the result from the selected table?
For example: we have a table
Building(name,height,owner,builder_name,age)
and we have a query:
select * from Building
where height between X and Y and age between A and B
Now, how will this query execute? And what about the order of the conditions, i.e.
X<=height<=Y and A<=age<=B
Will it be something like: first the whole table is searched for heights within the given range, and then the same is done for age?
The database server has multiple options to solve that query. It will choose the option it "thinks" is faster.
The options I see are:
Scan the whole table and filter out rows that don't satisfy the WHERE clause.
Seek a height range on an index on the height column, then filter rows using the age BETWEEN A AND B predicate.
Seek an age range on an index on the age column, then filter rows using the height BETWEEN X AND Y predicate.
Seek both indexes, then perform an index intersection.
The database server does not always use an index that might be applicable; it considers some things before using it, such as:
The index selectivity.
The index coverage.
High selectivity indexes are more likely to be used.
Covering indexes are likely to be used.
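A sketch of the indexes that would enable the seek options above (index names are illustrative):
-- Single-column indexes make options 2-4 possible: a range seek on
-- either column, or an index intersection of both.
CREATE INDEX idx_building_height ON Building (height);
CREATE INDEX idx_building_age ON Building (age);
-- A composite index lets the height range be seeked while age is
-- checked from the index itself, without touching the table rows.
CREATE INDEX idx_building_height_age ON Building (height, age);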
The query planner will try to find the optimal way to search the table and test the WHERE clause. It will start by trying to use an index if possible, which will narrow down the rows that it needs to search. If there are multiple potential indexes, it will try to use the one that it estimates will narrow it down best.
Then it will scan all these rows, and test each of them against all of the remaining conditions. It should never need to make multiple passes over the entire table.
If you want to see how a particular query will be executed, use the EXPLAIN command.
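For example (X, Y, A, B replaced with illustrative numbers; the key and rows columns of the output show the chosen index and the estimated row count):
-- Ask the optimizer how it plans to execute the query.
EXPLAIN
SELECT * FROM Building
WHERE height BETWEEN 100 AND 200 AND age BETWEEN 10 AND 20;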

Whether a SQL query (SELECT) continues or stops reading data from the table when it finds the value

Greetings,
My question: does a SQL query (SELECT) continue or stop reading data (records) from the table when it finds the value I was looking for?
Reference: "In order to return data for this query, mysql must start at the beginning of the disk data file, read in enough of the record to know where the category field data starts (because long_text is variable length), read this value, see if it satisfies the where condition (and so decide whether to add to the return record set), then figure out where the next record set is, then repeat."
Link for reference: http://www.verynoisy.com/sql-indexing-dummies/#how_the_database_finds_records_normally
In general you don't know and you don't care, but you have to adapt when queries take too long to execute. When you do something like
select a,b,c from mytable where a=3 and b=5
then the database engine has a couple of options to optimize. When all these options fail, then it will do a "full table scan" - which means, it will have to examine the entire table to see which rows are eligible. When you have indices on e.g. column a then the database engine can optimize the search because it can pre-select rows where a has value 3. So, in general, make sure that you have indices for the columns that are most searched. (Perversely, some database engines get confused when you have too many indices and will fall back to a full table scan because they've lost their way...)
As to whether or not the scanning stops: in general, the database engine has to examine all data in the table (hopefully aided by indices) and won't stop after having found just one hit. If you want just the first hit, use a limit 1 clause to make sure that your result set has only one outcome. But then again, if you have an order by clause, the database engine cannot stop after the first hit; later rows might take priority given the sorting.
Summarizing, how the db engine does its scan depends on how smart it is, what indices are available etc.. If your select queries take too long then consider re-organizing your indices, writing your select statements differently, or rebuilding the table.
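A sketch of the difference, reusing the answer's example (sorting on column c is illustrative):
-- The engine may stop at the first matching row.
SELECT a, b, c FROM mytable WHERE a = 3 AND b = 5 LIMIT 1;
-- Here it cannot stop early: all matching rows must be found and
-- sorted before the "first" one is known (unless an index on c
-- already provides the order).
SELECT a, b, c FROM mytable WHERE a = 3 AND b = 5 ORDER BY c LIMIT 1;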
How the RDBMS reads data from disk is something you cannot know, should not care about, and must not rely on.
The issue is too broad to get a precise answer. The engine reads data from storage in blocks, and a block can contain records that are not needed by the query at hand. If all the columns needed by the query are available in an index, the RDBMS won't even read the data file; it will only use the index. The data it needs could already be cached in memory (because it was read during the execution of a previous query). The underlying OS and the storage media also keep their own caches.
On a busy system, all these factors could lead to very different storage access patterns while running the same query several times a couple of minutes apart.
Yes, it scans the entire file, unless you put something like
select * from user where id=100 limit 1
This, of course, will still scan all rows if id 100 is the last record.
If id is a primary key, it will automatically be indexed and the search will be optimized.
I'm sorry... I thought the table.
I will change the question and explain it in the following image:
I understand that in CASE 1 all columns must be read with each iteration.
My question is: is it the same in CASE 2, or are columns that are not selected in the query excluded from reading in each iteration?
Also, are both queries the same from a performance perspective?
Clarify:
CASE 1: the first SELECT prints all data.
CASE 2: the second SELECT prints only the columns first_name and last_name.
In CASE 2, does the MySQL server read only the columns first_name and last_name, or does it read the entire row to get that data?
I am interested in how the server reads a table row in CASE 1 and CASE 2.
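The image is not reproduced here; a sketch of the two cases, assuming a table named employees:
-- CASE 1: every column of each row is returned.
SELECT * FROM employees;
-- CASE 2: only two columns are returned. A row-oriented engine such
-- as InnoDB still reads the whole row from its page, but a covering
-- index on (first_name, last_name) would let it skip the row entirely.
SELECT first_name, last_name FROM employees;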

Partitioning of a large MySQL table that uses LIKE for search

I have a table with 80 million records. The structure of the table:
id - autoincrement,
code - alphanumeric code from 5 to 100 characters,
other fields.
The most used query is
SELECT * FROM table
WHERE code LIKE '%{user-defined-value}%'
The number of queries is growing, as well as the record count. Very soon I will have performance problems.
Is there any way to split the table into parts? Or maybe some other ways to optimize the table?
The leading % in the search is the killer here. It negates the use of any index.
The only thing I can think of is to partition the table based on length of code.
For example, if the code that is entered is 10 characters long, then first search the table with 10 character codes, without the leading percent sign, then search the table with 11 character codes, with the leading percent sign, and then the table with 12 character codes, with the leading percent sign, and so on.
This saves you from searching through all the codes that are less than 10 characters long, which can never match. Also, you are able to utilize an index for one of the searches (the first one).
This also will help keep the table sizes somewhat smaller.
You can use a UNION to perform all of the queries at once, though you'll probably want to create the query dynamically.
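A sketch of that dynamic UNION, assuming per-length tables codes_10, codes_11, ... with identical structure and a 10-character search value (all names illustrative):
-- codes_10 can use an index on code: no leading wildcard.
SELECT * FROM codes_10 WHERE code LIKE '{user-defined-value}%'
UNION ALL
-- Longer codes may contain the value anywhere, so the leading % stays.
SELECT * FROM codes_11 WHERE code LIKE '%{user-defined-value}%'
UNION ALL
SELECT * FROM codes_12 WHERE code LIKE '%{user-defined-value}%';
-- ...continue up to the longest code length.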
You should also take a look to see if FULLTEXT indexing might be a better solution.
Some thoughts:
You can split the table into multiple smaller tables based on a certain condition, for example on id, or perhaps code, or any other field. It basically means that you keep a certain type of record in one table and split different types into different tables.
Try MySQL partitioning (see the sketch after this list).
If possible, purge older entries, or at least consider moving them to an archive table.
Instead of LIKE, consider using REGEXP for regular-expression search.
Rather than running SELECT *, try selecting only the columns you need: SELECT id, code, ...
I'm not sure if this query is part of a search within your application where a user-entered value is compared with the code column and the results are echoed back to the user. But if so, you can try adding options to the search query, like asking the user whether they want an exact match or a starts-with match, etc. This way you do not necessarily need to run a LIKE match every time.
This should have been the first point, but I assume you have the right indexes on the table.
Try making more use of the query cache. The best way to use it is to avoid frequent updates to the table, because on each update the query cache is cleared. So the fewer the updates, the more likely it is that MySQL caches the queries, which then means quicker results.
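As promised above, a minimal sketch of the partitioning idea. MySQL cannot partition directly on LENGTH(code), so this uses RANGE partitioning on id instead, assuming the table is named mytable and id is the auto-increment primary key (boundaries are illustrative):
-- Partitioning keeps each physical piece smaller and makes archiving
-- old rows cheap, but it will not speed up the LIKE '%...%' search
-- itself, because no partition can be pruned based on code.
ALTER TABLE mytable
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (20000000),
    PARTITION p1 VALUES LESS THAN (40000000),
    PARTITION p2 VALUES LESS THAN (60000000),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);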
Hope the above helps!

Is it OK to index all the fields in this mysql query?

I have this MySQL query and I am not sure what the implications of indexing all the fields in the query are. I mean, is it OK to index all the fields in the CASE statement, JOIN statement, and WHERE statement? Are there any performance implications of indexing fields?
SELECT roots.id AS root_id, root_words.*,
CASE
    WHEN root_words.title LIKE '%text%' THEN 1
    WHEN root_words.unsigned_title LIKE '%normalised_text%' THEN 2
    WHEN unsigned_source LIKE '%normalised_text%' THEN 3
    WHEN roots.root LIKE '%text%' THEN 4
END AS priorities
FROM roots INNER JOIN root_words ON roots.id = root_words.root_id
WHERE (root_words.unsigned_title LIKE '%normalised_text%') OR (root_words.title LIKE '%text%')
OR (unsigned_source LIKE '%normalised_text%') OR (roots.root LIKE '%text%') ORDER BY priorities
Also, how can I further improve the speed of the query above?
Thanks!
You index columns in tables, not queries.
None of the search criteria you've specified will be able to make use of indexes (since the search terms begin with a wild card).
You should make sure that the id column is indexed, to speed the JOIN. (Presumably, it's already indexed as a PRIMARY KEY in one table and a FOREIGN KEY in the other).
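If the foreign-key side of the join is not indexed yet, a sketch (the index name is illustrative):
-- roots.id is presumably the PRIMARY KEY already; this indexes the
-- other side of the join condition.
ALTER TABLE root_words ADD INDEX idx_root_words_root_id (root_id);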
To speed up this query you will need to use full text search. Adding indexes will not speed up this particular query and will cost you time on INSERTs, UPDATEs, and DELETEs.
Caveat: Indexes speed up retrieval time but cause inserts and updates to run slower.
To answer the implications of indexing every field, there is a performance hit when using indexes whenever the data that is indexed is modified, either through inserts, updates, or deletes. This is because SQL needs to maintain the index. It's a balance between how often the data is read versus how often it is modified.
In this specific query, the only index that could possibly help would be in your JOIN clause, on the fields roots.id and root_words.root_id.
None of the checks in your WHERE clause could be indexed, because of the leading '%'. This causes SQL to scan every row in these tables for a matching value.
If you are able to remove the leading '%', you would then benefit from indexes on these fields... if not, you should look into implementing full-text search; but be warned, this isn't trivial.
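A sketch of the full-text route for two of the searched columns; 'text' stands in for the search term, and note that FULLTEXT matches whole words, so results can differ from a LIKE '%...%' substring match:
-- Add a full-text index over the searched columns.
ALTER TABLE root_words ADD FULLTEXT ft_titles (title, unsigned_title);
-- Search with MATCH ... AGAINST instead of LIKE; the column list
-- must match the full-text index definition.
SELECT * FROM root_words
WHERE MATCH(title, unsigned_title) AGAINST('text' IN BOOLEAN MODE);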
Indexing won't help when used in conjunction with LIKE '%something%'.
It's like looking for words in a dictionary that have ae in them somewhere. The dictionary (or Index in this case) is organised based on the first letter of the word, then the second letter, etc. It has no mechanism to put all the words with ae in them close together. You still end up reading the whole dictionary from beginning to end.
Indexing the fields used in the CASE clause will likely not help you. Indexing helps by making it easy to find records in a table. The CASE clause is about processing the records you have found, not finding them in the first place.
Optimisers can also struggle with optimising multiple unrelated OR conditions such as yours. The optimiser is trying to narrow down the amount of effort to complete your query, but that's hard to do when unrelated conditions could make a record acceptable.
All in all, your query would benefit from indexes on root_words(root_id) and/or roots(id), but not much else.
If you were to index additional fields though, the two main costs are:
- Increased write time (insert, update or delete) due to additional indexes to write to
- Increased space taken up on the disk