What does it mean to have access type "All" in the query optimizer? I found this using the explain=format json statement in MySQL.
"ALL" means it's a full table scan, no index can be used at all, this is not good, you'd better add some indexes.
Check out the Mannul
Here "ALL" means your query's return output after a full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.
One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. Table1.column1 is declared as CHAR(10) and Table2.column2 is CHAR(15), so there is a length mismatch. if both table's both columns have same datatype and length that give more optimize output.
Mostly developer make many indexes on one table, so you can specify which index use for your query by use keyword "USER INDEX"
e.g:
SELECT select_fields
FROM tablename USE INDEX(indexname)
WHERE condition;
Related
I have the following problem when running a mysql query:
Query is very slow and when i use explain the query key is null but possible_keys are avaiable and the order is correct, i also tried adding independent indexes per each row but still key was NULL.
You can see table, index and mysql explain here: https://snag.gy/vcChl6.jpg
The optimizer likely has just decided that there is no reason to use the index.
Since you are using SELECT * that means that means that if it used the index, then it would have to use the primary key from the index to then go back and look up all the necessary data from the clustered index. That is referred to as a double lookup, and is generally bad for performance. As there are so few records in this table, the optimizer likely decided that it can easily do a full table scan instead and get your result faster.
In short, this is expected behavior.
If you want to SELECT just some columns, add them to the t1 index and then just SELECT only the columns you need, with that given WHERE clause. It should use the index then. As your table grows in size, it may start using the index as well, once it estimates that the double lookup is cheaper than the full table scan.
A guess: Most rows are of that 'project' and that 'lang'.
The Optimizer does not understand that fact, so it takes the index that is obviously the best:
(id_project, id_lang)
This one would be equally good: (id_lang, id_project).
No fair... The EXPLAIN mentions indexes named id_project and id_lang (not useful), but the list of indexes shows a composite index t1(id_project, id_lang) (useful).
Then, as Willem suggests, it has to bounce between the index and the table. Normally (that is, when it has adequate statistics), the Optimizer will say "Oh, more than ~20% of the table is being referenced; let's ignore any index."
Things you can do:
Get rid of that index.
Change * to a list of just the columns you need. In particular, if you avoid the 3 TEXT columns, two optimizations kick in. Alternatively, any that will never be longer than 255 characters can be changed to VARCHAR(255).
Use some other filtering, ordering, limiting, etc. If this is a web application, do you really want to get ~534 rows?
In my database, I have a table that contains a companyId, pointing to a company, and some text. I would like to do a FULLTEXT search, but as I always make requests against a specific companyId I'd like to use a composite key that combines my companyId and the fulltext index.
Is there anyway to do that ? As I guess this is not possible, what is the optimal way to create indexes so that the following query is fastest ?
The request will always be
SELECT * FROM textTable
WHERE companyId = ? (Possibly more conditions) AND
MATCH(value) AGAINST("example")
Should I create my indexes on integer columns normally and add one fulltext index ? or should I include the value column in the index ? Maybe both ?
Depending on how large the text field is, MySQL will store the data to disk, so including it in a compound index won't do you much good. I this scenario, the index should simply be the companyId, which will have a reference back to the PK, which will then parse each of the text fields.
The general rule of thumb is to filter early, so filtering first by companyId (int - 4 bytes) is preferable. Now, if the text field is small enough, you can add it as the second value in the index to prevent the second round-trip, but understand that this will impact INSERT performances significantly.
The best option for this type of scenario may be to use a NoSQL database to handle the text lookup, if the text fields are large; they're pretty exceptional with that type of job.
Why you're selecting everything? this is not going to help your query performance.
You must select only the columns that you need. This would reduce the amount of time taking in scanning the table.
For FULLTEXT search, your method would sort the results by relevance and then uses the index lookup based on your WHERE clause. So, you need to revert that into a straight forward lookup by adding the index lookup within your SELECT clause.
The query should be something like this :
SELECT
MATCH(value) AGAINST("example")
FROM
textTable
you can add more filtering based on WHERE clause, or better you may involve IF statement or CASE or any other function if you would.
something like this :
SELECT
IF(MATCH(value) AGAINST("example"), 'Your Returned value if TRUE', 'Your returned value if FALSE')
FROM textTable
This will do a full table scan, and this is faster.
For indexing part,
You can use index on companyId and include value with it. You can do the same with any other columns that you use in your SELECT, WHERE, and ORDER BY.
Sometimes it is okay to create more indexes, each one specific for particular task.
I am trying to understand indexes in MySQL. I know that an index created in a table can speed up executing queries and it can slow down the inserting and updating of rows.
When creating an index, I used this query on a table called authors that contains (AuthorNum, AuthorFName, AuthorLName, ...)
Create index Index_1 on Authors ([What to put here]);
I know I have to put a column name, but which one?
Do I have to put the column name that will be compared in the Where statement when a user query the Table or what?
The Anatomy of an Index
An index is a distinct data structure within a database and is data redundancy. Its primary purpose is to provide an ordered representation of the indexed data through a logical ordering which is independent of the physical ordering. We do this using a doubly linked list and a tree structure known as the balanced search tree (B-tree). B-trees are nice because they keep data sorted and allow searches, access, insertions, and deletions in logarithmic time. Because of the doubly linked list, we are able to go backwards or forwards as needed on the index for various queries easily. Inserts become simple since we only have to rearrange pointers to the different pieces of data. Databases use these doubly linked list to connect leaf nodes (usually in a B+ tree or B-tree), each of which are stored in a page, and to establish logical ordering between the leaf nodes. Operations like UPDATE or INSERT become slower because they are actually two writing operations in the filesystem (one for the table data and one for the index data).
Defining an Optimal Index With WHERE
To define an optimal index you must not only understand how indexes work, but you must also understand how the application queries the data. E.g., you must know the column combinations that appear in the WHERE clause.
A common restriction with queries on LAST_NAME and FIRST_NAME columns deals with case sensitivity. For example, instead of doing an exact search like Hotinger we would prefer to match all results such as HoTingEr and so on. This is very easy to do in a WHERE clause: we just say WHERE UPPER(LAST_NAME) = UPPER('Hotinger')
However, if we define an index of LAST_NAME and query, it will actually run a full table scan because the query is not on LAST_NAME but on UPPER(LAST_NAME). From the database's perspective, this is completely different. So, in this case you should define the index on UPPER(LAST_NAME) instead.
Indexes do not necessarily have to be for one column. For example, if the primary key is a composite key (consisting of multiple columns) it will create a concatenated index also known as a combined index. Note that the ordering of the concatenated index has a significant impact on its usability and scalability so it must be chosen carefully. Basically, the ordering should match the way it is ordered in the WHERE clause.
Defining an Optimal Index With LIKE
The position of the wildcard characters makes a huge difference. LIKE clauses only use the characters before the wildcard during tree traversal; the rest do not narrow the scanned index range. The more selective the prefix of the LIKE clause the more narrow the scanned index becomes. This makes the index lookup faster. As a tip, avoid LIKE clauses which lead with wildcards like "%OTINGER%" For full-text searches, MySQL offers MATCH and AGAINST keywords. Starting with MySQL 5.6, you can have full-text indexes. Look at Full-Text Search Functions from MySQL for more in-depth discussion on indexing these results.
Yes, generally you need an index on the column or columns that you compare in the WHERE clause of your queries to speed up queries.
If you search by AuthorFName, then you create an index on that column. If they search by AuthorLName, then you create an index on that column.
In this case though, maybe what you should be looking at is a FULLTEXT index. That would allow users to enter fuzzy queries, which would return a number of results ordered by relevance.
From the MySQL Manual:
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially. If you need to access most of the rows, it is faster to
read sequentially, because this minimizes disk seeks.
An index usually means a B-Tree. Understand the structure of the B-Tree and you'll understand what index can and cannot do.
In your particular case:
WHERE AuthorLName = 'something' and WHERE AuthorLName LIKE 'something%' can be sped-up by an index on {AuthorLName}.
WHERE AuthorLName = 'something AND AuthorFName = 'something else' can be sped-up by a composite index on {AuthorLName, AuthorFName} or {AuthorFName, AuthorLName}.
WHERE AuthorLName = 'something OR AuthorFName = 'something else' (which doesn't make much sense, but is here as an example) can be sped-up by having two indexes: on {AuthorLName} and on {AuthorFName}.
WHERE AuthorLName LIKE '%something' cannot be sped-up by a B-Tree index (cunsider full-text indexing).
Etc...
See Use The Index, Luke! for a much more thorough treatment of the subject than possible in a simple SO post.
Limited length index:
When using text columns or very large varchar columns you won't be able to create an index over the entire length of the text/varchar, there are some limits (around 1024 ASCII characters in length).
In such a case you specify the length in the index declaration.
CREATE INDEX `my_limited_length_index` ON `my_table`(`long_text_content`(512));
-- please notice the use of the numeric length of the index after the column name
Processed value index (apparently available in PostgreSQL not MySQL):
Indexes are not exclusively built from one column, some may be built from multiple columns and other may be built from just some of the info a column has. For example if you have a full datetime column but you know you're only going to filter records by date you can build an index based on the datetime column but only containing date info.
-- `my_table` has a `created` column of type timestamp
CREATE INDEX `my_date_created` ON `my_table`(DATE(`created`));
-- please notice the use of the DATE function which extracts only
-- the date from the `created` timestamp
index shall span the columns you are going to use in WHERE statement.
To better understand, here is an example:
SELECT * FROM Authors WHERE AuthorNum > 10 AND AuthorLName LIKE 'A%';
SELECT * FROM Authors WHERE AuthorLName LIKE 'Be%';
If you are often using the shown above queries, you are highly adviced to have two indexes:
Create index AuthNum_AuthLName_Index on Authors (AuthorNum, AuthorLName);
Create index AuthLName_Index on Authors (AuthorLName);
The key thing to remember: index shall have the same combiation of columns used in WHERE statements
If a column is of type int, does it still need to be indexed to make select query run faster?
SELECT *
FROM MyTable
WHERE intCol = 100;
Probably yes, unless
The table has very few rows (<1000)
The int column has poor selectivity
a large proportion of the table is being returned (say > 1%)
In which case, a table scan might make more sense anyway, and the optimiser may choose to do one. In most cases having an index anyway is not very harmful, but you should definitely try it and see (in your lab, on production-grade hardware with a production-like data set)
Yes, an index on any column can make the query perform faster regardless of data type. The data itself is what matters -- no point in using an index if there are only two values currently in the system.
Also be aware that:
the presence of an index doesn't ensure it will be used -- table statistics need to be current, but it really depends on the query.
MySQL also only allows one index per SELECT, and has a limited amount of space for indexes (limit dependent on engine).
Yes it does. It does not matter which data type a column has. If you don't specify an index, mysql has to scan the whole table anytime you are searching a value with that column.
Yes, the whole point on indexes is that you create them by primary key or by adding the index manualy. The type of the column does not say anything about the speed of query.
Yes, it has to be indexed. It doesn't make sense to index tinyint columns if they are used as boolean, because this index won't be selective enough.
I have a MySQL table where an indexed INT column is going to be 0 for 90% of the rows. If I change those rows to use NULL instead of 0, will they be left out of the index, making the index about 90% smaller?
http://dev.mysql.com/doc/refman/5.0/en/is-null-optimization.html
MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.
It looks like it does index the NULLs too.
Be careful when you run this because MySQL will LOCK the table for WRITES during the index creation. Building the index can take a while on large tables even if the column is empty (all nulls).
Reference.
Allowing a column to be null will add a byte to the storage requirements of the column. This will lead to an increased index size which is probably not good. That said if a lot of your queries are changed to use "IS NULL" or "NOT NULL" they might be overall faster than doing value comparisons.
My gut would tell me not null, but there's one answer: test!
No, it will continue to include them, but don't make too many assumptions about what the consequences are in either case. A lot depends on the range of other values (google for "cardinality").
MSSQL has a new index type called a "filtered index" for this type of situation (i.e. includes records in the index based on a filter). dBASE-type systems used to have a similar capability, and it was pretty handy.
Each index has a cardinality means how many distinct values are indexed. AFAIK it's not a reasonable idea to say indexes repeat the same value for many rows but the index will only addresses a repeated value to the clustered index of many rows (rows having null value for this field) and keeping the reference ID of the clustered index means : each row with a NULL value indexed field wastes a size as large as the PK (for this reason experts recommend to have a reasonable PK size if you have composite PK).