MySQL: index a field that contains only distinct values? - mysql

Is it useful for SELECT performance to set an index on a field that contains only distinct values?
eg:
order_id
--------
98317490
10928343
82931376
93438473
...

That depends. An index is useful if you often search on this column:
WHERE column=value
WHERE column BETWEEN a AND b
The usefulness of an index is determined by its selectivity. For example, if your column contains a boolean, which is:
false in 99.9% of rows
true in 0.1% of rows
Then you can easily guess that using an index to find "true" values will be a huge boost relative to reading the entire table to search for them.
On the other hand, searching for "false" using an index will be slower than not using an index: since you are going to read nearly the whole table anyway, you might as well not bother to also process the index.
If values are all distinct, then selectivity is at its maximum, and the index will be very useful. That is, assuming you actually search on that column!
An index that is never used only slows down updates.
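As a rough sketch of the point above, the statements below create an index on a column of distinct values and ask MySQL how it would run an equality search and a range search. The table name orders, the customer column, and the index name idx_order_id are assumptions made for illustration; only order_id comes from the question.

```sql
-- Hypothetical table: only the order_id column comes from the question.
CREATE TABLE orders (
  order_id INT UNSIGNED NOT NULL,
  customer VARCHAR(50) NOT NULL
);

-- Secondary index on the column that is searched.
CREATE INDEX idx_order_id ON orders (order_id);

-- Equality search: the "key" column of the EXPLAIN output names the index
-- when the optimizer decides to use it.
EXPLAIN SELECT * FROM orders WHERE order_id = 82931376;

-- Range search on the same column.
EXPLAIN SELECT * FROM orders WHERE order_id BETWEEN 10000000 AND 20000000;
```

On an empty or very small table the optimizer may choose a different plan, so it is worth running EXPLAIN against realistic data.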

Of course it is useful, as with all indexes: it helps if you have SELECT statements with this field in the WHERE clause.
Whether this field has distinct values or not doesn't really matter.
Note that if your field is marked as UNIQUE or PRIMARY KEY in the database, the database will technically already have an index for this field, so adding another index for it will not change anything.
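A small sketch of that last note, assuming a hypothetical table orders_unique and constraint name uq_order_id: declaring the column UNIQUE already creates an index, which SHOW INDEX lists.

```sql
-- Hypothetical table: the UNIQUE constraint is backed by an index.
CREATE TABLE orders_unique (
  order_id INT UNSIGNED NOT NULL,
  UNIQUE KEY uq_order_id (order_id)
);

-- Lists uq_order_id, so a separate CREATE INDEX on order_id is not needed.
SHOW INDEX FROM orders_unique;
```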

Related

MySQL how to index a query that searches for a substring in column while filtering integer columns

I have a table with a billion+ rows. I frequently execute the query below:
SELECT SUM(price) FROM mytable WHERE domain IN ('com') AND url LIKE '%/shop%' AND date BETWEEN '2001-01-01' AND '2007-01-01';
Here domain is varchar(10), url is varchar(255), and price is float. I understand that any query with %..% will not use an index. So, logically, I created an index on price, domain, and date:
create index price_date on mytable(price, domain, date)
The problem persists: this index is not used either, because the query contains url LIKE '%.com/shop%'
On the other hand, a FULLTEXT index still will not work, since I have other non-text filters in the query.
How can I optimise the above query? I have too many rows to go without an index.
UPDATE
Is this an SQL limitation? Could such a query perform better on a NoSQL database?
You have two range conditions, one uses IN() and the other uses BETWEEN. The best you can hope is that the condition on the first column of the index uses the index to examine rows, and the condition on the second column of the index uses index condition pushdown to make the storage engine do some pre-filtering.
Then it's up to you to choose which column should be the first column in the index, based on how well each condition would narrow down the search. If your condition on date is more likely to reduce the set of examined rows, then put that first in the index definition.
The order of terms in the WHERE clause does not have to match the order of columns in the index.
MySQL does not support optimizing with both a fulltext index and a B-tree index on the same table reference in the same query.
You can't use a fulltext index anyway for the pattern you are searching for. Fulltext indexes don't allow searches for punctuation characters, only words.
I vote for this order:
INDEX(domain, -- first because of "="
date, -- then range
url, price) -- "covering"
However, since the constants suggest that most of the billion rows would be hit, I don't expect good performance.
If this is a common query and/or "shop" is one of only a few possible filters, we can discuss whether a summary table would be useful.
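Written out as runnable statements, the suggested column order might look like the sketch below. The index name idx_domain_date_url_price is an assumption; the table, columns, and query come from the question.

```sql
-- Index name is hypothetical; column order follows the suggestion above:
-- equality column first, then the range column, then the remaining columns
-- so that the index covers the whole query.
ALTER TABLE mytable
  ADD INDEX idx_domain_date_url_price (domain, date, url, price);

-- If the index is used as a covering index, the Extra column of the
-- EXPLAIN output includes "Using index".
EXPLAIN
SELECT SUM(price)
FROM mytable
WHERE domain IN ('com')
  AND url LIKE '%/shop%'
  AND date BETWEEN '2001-01-01' AND '2007-01-01';
```

Even with a covering index, the url LIKE '%/shop%' condition still has to be checked for every index entry in the domain and date range.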

Index on mysql partitioned tables

I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
Around 200,000 records are retrieved by the above query, but the query takes 30+ seconds even with the index above. What is the reason for such a long time? Also, if I remove partitionflag from the index, the index is not even used.
Is my understanding correct that,
even with the partitions available, the optimizer needs the partitioning column to be in the index definition so that it only hits the required partition?
Any ideas that help me understand this would be very helpful.
You can optimize your index by reordering its columns. Usually the columns in an index are ordered by cardinality (starting from the highest and going down to the lowest). Cardinality is the uniqueness of the data in a given column. So in your case I suppose there are many distinct values of companyid in the customformattributes table, while partitionflag has a cardinality of 2 (if the only values for this column are 1 and 0).
With your current index, the query first filters all the rows with partitionflag=0, then filters by companyid, and so on.
When you removed partitionflag from the index, the query did not use the index, perhaps because the optimizer decided that a full table scan would be faster than using the index (in most cases the optimizer is right).
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index might be better:
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use an index, the following rule must be met: the leftmost column in the index should be present in the WHERE clause. Depending on the MySQL version you are using, additional query requirements may apply. For example, if you are using an old version of MySQL, you may need to list the columns in the WHERE clause in the same order as they are listed in the index. In recent versions of MySQL the query optimizer is responsible for putting the columns of the WHERE clause in the correct order.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be optimal for the given query.
For the second question, about partitioning: the common rule is that the column you partition by must be part of every UNIQUE key in the table (the primary key is also a unique key by definition, so the column must be added to the PK as well). If the table structure and logic allow you to add the partitioning column to all the UNIQUE indexes in the table, then you add it and partition the table.
When partitioning is done correctly, you can take advantage of partition pruning: a SELECT query then searches only the partitions where the requested data is stored (otherwise it looks in all partitions).
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
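A minimal sketch of the partitioning rule and of partition pruning, assuming a hypothetical table customformattributes_demo with an invented id column and column types. The partition names pactive and pinactive and their values come from the question.

```sql
-- Hypothetical table: the partitioning column is part of the primary key,
-- as the rule above requires.
CREATE TABLE customformattributes_demo (
  id            BIGINT UNSIGNED NOT NULL,
  partitionflag TINYINT NOT NULL,
  companyid     INT NOT NULL,
  activityname  VARCHAR(100) NOT NULL,
  PRIMARY KEY (id, partitionflag)
)
PARTITION BY LIST (partitionflag) (
  PARTITION pactive   VALUES IN (1),
  PARTITION pinactive VALUES IN (0)
);

-- Empty one partition before reloading it, as described in the question.
ALTER TABLE customformattributes_demo TRUNCATE PARTITION pinactive;

-- With pruning, only the partition holding partitionflag = 0 is searched.
-- MySQL 5.7 and later show a "partitions" column in plain EXPLAIN;
-- on 5.6, use EXPLAIN PARTITIONS instead.
EXPLAIN
SELECT partitionflag, companyid, activityname
FROM customformattributes_demo
WHERE companyid = 47
  AND activityname = 'Activity 1'
  AND partitionflag = 0;
```

Pruning depends on the partitionflag condition in the WHERE clause, not on whether partitionflag appears in a secondary index.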
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.
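The covering index described in this answer can be sketched as follows. The index name idx_covering is an assumption; the table, columns, and query are from the question.

```sql
-- Index name is hypothetical. The three columns are exactly the ones
-- the query mentions, so the index alone can answer it.
ALTER TABLE customformattributes
  ADD INDEX idx_covering (companyid, activityname, partitionflag);

-- When the index covers the query, the Extra column of the EXPLAIN
-- output shows "Using index".
EXPLAIN
SELECT partitionflag, companyid, activityname
FROM customformattributes
WHERE companyid = 47
  AND activityname = 'Activity 1'
  AND partitionflag = 0;
```

If the SELECT list later grows to include columns outside the index, the index stops being covering and the optimizer has to weigh index lookups against a table scan again.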

Which is better in MySQL: searching by text or by number?

Which is better in MySQL, searching by text or by number?
E.g.:
$sql = "SELECT * FROM table WHERE country = 'Australia' ";
and
$sql = "SELECT * FROM table WHERE country = '10899' ";
Which one loads faster from the database?
For faster searching, search by primary key; it is fast because the primary key is a unique key. The examples you provided both perform a text search...
Both of your queries search by text, because you surround the value with single quotes. To search by number, do not put quotes around the value.
To answer your question about performance: with a large amount of data, search by primary key or indexed columns for better speed. On a small dataset you won't notice a difference, as a simple SELECT usually finishes in a fraction of a second.
You asked which is better, number or text, but many factors need to be taken into consideration to improve WHERE clause performance.
The biggest concern with a WHERE clause is a full table scan; to avoid one, you have to use an index.
"If the optimizer gets confused or cannot find an appropriate index that matches the WHERE clause, the optimizer will read every row in the table."
When you create a clustered index (primary key) on a table, the clustered index sorts and stores the data rows in the table or view based on their key values.
Now consider text versus number:
You can go with a number, since sorting numbers takes less time than sorting text.
Things to improve the WHERE clause:
1) Use the primary key column in the WHERE clause (uniqueness and NOT NULL come automatically)
2) If that is not possible, at least have an index on the column used in the WHERE clause (prefer a number, as sorting is faster for large amounts of data)
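To make the two points in the list concrete, here is a minimal sketch with a numeric primary key and an indexed text column. The table countries, its columns, and the index name are assumptions; the values 10899 and 'Australia' come from the question.

```sql
-- Hypothetical schema: numeric primary key plus an indexed text column.
CREATE TABLE countries (
  country_id   INT UNSIGNED NOT NULL PRIMARY KEY,
  country_name VARCHAR(64) NOT NULL,
  KEY idx_country_name (country_name)
);

INSERT INTO countries (country_id, country_name) VALUES (10899, 'Australia');

-- Search by number: no quotes around the value; uses the primary key.
SELECT * FROM countries WHERE country_id = 10899;

-- Search by text: quoted value; can use idx_country_name.
SELECT * FROM countries WHERE country_name = 'Australia';
```

Both searches avoid a full table scan because each WHERE column is indexed; which one is faster on real data is best checked with EXPLAIN and measurement.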

Are there any advantage sorting on indexed fields

I want to perform a sort operation on some field. Is it advantageous to create an index on that field? For example:
SELECT * FROM `users` WHERE `age`=33 ORDER BY `name`
In this query, I know that having an index on age is helpful, but would it be better if I also maintained an index on name? Would there be a performance gain from indexing it? The ORDER BY operation is frequently needed for other queries as well.
An index on name alone would not be likely to help significantly with this query, but an index on (age, name) would.
While it's not entirely accurate, it's often instructive to think of an index as a list of rows sorted by the keys in the index (e.g., sorted by age first, then by name). In the case of your sample query, all the rows with age=33 would naturally come out of the composite index sorted by name, saving you from doing a separate sort. Having a separate index for name wouldn't help the same way.
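A sketch of the composite index described above, assuming the users table from the question already exists; the index name idx_age_name is an assumption.

```sql
-- Index name is hypothetical; columns are from the question's query.
ALTER TABLE `users` ADD INDEX idx_age_name (`age`, `name`);

-- If the composite index is used for both filtering and ordering,
-- the Extra column of the EXPLAIN output does not show "Using filesort".
EXPLAIN SELECT * FROM `users` WHERE `age` = 33 ORDER BY `name`;
```

Comparing this EXPLAIN output with the one obtained using an index on age alone shows whether the separate sort step has gone away.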

Indexing SQL database

I have some questions about indexing an SQL database:
1 - Is it better to index a boolean column or not, given that there are only 2 options? I know that if the table is small indexing will not change anything, but I'm asking about a table with 1 million records.
2 - If I have two dates, ValidFrom and ValidTo, is it better to create 1 index with 2 columns or 2 separate indexes? In 90% of queries I use where validfrom < date && validto > date, but there are also a few selects with only validfrom or only validto.
3 - What's the difference between a clustered and a non-clustered index? I can't find any article, so a link would be great.
You tagged both MySQL and SQL Server. This answer is MySQL-oriented.
1 - It depends on many things, but more important than the size is the distribution of values. If about 50% of the values are TRUE, the rest (also about 50%) are FALSE and an index will not help much. If only 2% of the values are TRUE and your queries often need only the TRUE records, this index will be useful!
2 - If your queries often use both columns, put both in the index. If one is used more than the other, put that one FIRST in the index, so the composite index can also be used for that field alone.
3 - A clustered index means that the data actually is inside the index. A non-clustered index just points to the data, which is stored elsewhere. The PRIMARY KEY in InnoDB is a clustered index.
If you want to use indexes in MySQL, EXPLAIN is your friend!
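As a sketch of the composite-index advice for the two date columns, the statements below use a hypothetical table price_list; the id column, the index name, and the date literal are invented for illustration, while ValidFrom and ValidTo come from the question.

```sql
-- Hypothetical table; only ValidFrom and ValidTo come from the question.
CREATE TABLE price_list (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ValidFrom DATE NOT NULL,
  ValidTo   DATE NOT NULL,
  KEY idx_valid (ValidFrom, ValidTo)
);

-- The common case: both columns in the WHERE clause.
EXPLAIN SELECT * FROM price_list
WHERE ValidFrom < '2020-06-01' AND ValidTo > '2020-06-01';

-- ValidFrom alone is the leftmost column, so idx_valid is still a candidate.
EXPLAIN SELECT * FROM price_list
WHERE ValidFrom < '2020-06-01';
```

A query that filters on ValidTo alone cannot use idx_valid to narrow the search by its leftmost column, which is why the column order should follow how the queries are actually written.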
This is all for SQL Server, which is what I know about...
1 - Depends on cardinality, but as a rule an index on a single boolean field (BIT in SQL Server) won't be used since it's not very selective.
2 - Make 2 indexes, one with both, and the other with just the second field from the first index. Then you are covered in both cases.
3 - Clustered indexes contain the data for ALL fields at the leaf level (the entire table basically) ordered by your clustered index field. Non-clustered indexes contain only the key fields and any INCLUDEd fields at the leaf level, with a pointer to the clustered index row if you need any other data from other fields for that row.
If you use a filtered index, a table of up to 2 million records is no problem.
Create 1 non-clustered index instead of 2 filtered indexes.
In practice these two kinds of lookup are unrelated. Searching by primary key (PK) is different from searching for a range of values (a non-clustered index is often used for range searches); in fact, lookups by PK account for less than 1% of queries.