Multi-column database indexes and query speed - mysql

I'm deploying a Rails application that aggregates coupon data from various third-party providers into a searchable database. Searches are conducted across four fields for each coupon: headline, coupon code, description, and expiration date.
Because some of these third-party providers do a rather bad job of keeping their data sorted, and because I don't want duplicate coupons to creep into my database, I've implemented a unique compound index across those four columns. That prevents the same coupon from being inserted into my database more than once.
Given that I'm searching against these columns (via simple WHERE column LIKE %whatever% matching for the time being), I want these columns to each individually benefit from the speed gains to be had by indexing them.
So here's my question: will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column? Or will it only guarantee uniqueness among the rows?
Complicating the matter somewhat is that I'm developing in Rails, so my question pertains both to SQLite3 and MySQL (and whatever we might port over to in the future), rather than one specific RDBMS.
My guess is that the indexes will speed up searching across individual columns, but I really don't have enough "under the hood" database expertise to feel confident in that judgement.
Thanks for lending your expertise.

"will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column?"
Nope. The order of the columns in the index is very important. Let's suppose you have an index like this:
create unique index index_name on table_name (headline, coupon_code, description, expiration_date)
In this case these queries will use the index:
select * from table_name where headline = 1
select * from table_name where headline = 1 and coupon_code = 2
and these queries won't use the unique index:
select * from table_name where coupon_code = 1
select * from table_name where description = 1 and coupon_code = 2
So the rule is something like this: when multiple fields are indexed together, a query has to specify the first k fields of the index (its leftmost columns) to be able to use it.
So if you want to be able to search on any one of these fields, you should create an index on each of them separately (in addition to the combined unique index).
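The leftmost-prefix rule can be checked directly. This is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than MySQL (whose EXPLAIN output looks different). The coupons table name, the idx_coupon_unique index name, and the extra provider column are hypothetical, added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the four searched columns plus an unindexed "provider" column.
conn.execute(
    "CREATE TABLE coupons ("
    " id INTEGER PRIMARY KEY, headline TEXT, coupon_code TEXT,"
    " description TEXT, expiration_date TEXT, provider TEXT)"
)
conn.execute(
    "CREATE UNIQUE INDEX idx_coupon_unique ON coupons"
    " (headline, coupon_code, description, expiration_date)"
)

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Filters on the leading column: the composite index can be searched.
leading = plan("SELECT * FROM coupons WHERE headline = 'x'")
# Filters only on the second column: no usable prefix, so the table is scanned.
non_leading = plan("SELECT * FROM coupons WHERE coupon_code = 'x'")

print(leading)
print(non_leading)
```

In SQLite's plan output, a line starting with SEARCH means an index lookup and a line starting with SCAN means every row is visited.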
Also, be careful with the LIKE operator.
This will use the index: SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
and this will not: SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
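Since the question also mentions SQLite3, here is a hedged sketch of the same prefix-versus-infix comparison using Python's sqlite3 module. One SQLite-specific assumption matters: its LIKE is case-insensitive by default, so the index below is declared with NOCASE collation, which the prefix optimisation requires. The idx_key_col index name and the other column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_name (id INTEGER PRIMARY KEY, key_col TEXT, other TEXT)"
)
# NOCASE collation matches SQLite's default case-insensitive LIKE.
conn.execute("CREATE INDEX idx_key_col ON tbl_name (key_col COLLATE NOCASE)")

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Pattern anchored at the start: can be rewritten as a range search on the index.
prefix = plan("SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%'")
# Leading wildcard: every row has to be examined.
infix = plan("SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%'")

print(prefix)
print(infix)
```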
index usage http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
multiple column index http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

Related

MySQL index key on table with more columns

In my script, I have a lot of SQL WHERE clauses, e.g.:
SELECT * FROM cars WHERE active=1 AND model='A3';
SELECT * FROM cars WHERE active=1 AND year=2017;
SELECT * FROM cars WHERE active=1 AND brand='BMW';
I am using different SQL clauses on same table because I need different data.
I would like to set index key on table cars, but I am not sure how to do it. Should I set separate keys for each column (active, model, year, brand) or should I set keys for groups (active,model and active,year and active,brand)?
WHERE a=1 AND y='m'
is best handled by INDEX(a,y) in either order. The optimal set of indexes is several pairs like that. However, I do not recommend having more than a few indexes. Try to limit it to queries that users actually make.
INDEX(a,b,c,d):
WHERE a=1 AND b=22 -- Index is useful
WHERE a=1 AND d=44 -- Index is less useful
Only the leftmost column(s) of an index are used. Hence the second case uses a but stops there, because b is not in the WHERE.
You might be tempted to also have (active, year, model). That combination works well for active AND year, and for active AND year AND model, but not for active AND model without year.
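To illustrate the "several pairs" suggestion, the sketch below creates one two-column index per query shape and shows which index the planner picks for each. It assumes SQLite via Python's sqlite3 module, not MySQL, and the idx_active_* index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars ("
    " id INTEGER PRIMARY KEY, active INTEGER, model TEXT, year INTEGER, brand TEXT)"
)
# One composite index per query shape: (active, model), (active, year), (active, brand).
for col in ("model", "year", "brand"):
    conn.execute(f"CREATE INDEX idx_active_{col} ON cars (active, {col})")

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

queries = {
    "model": "SELECT * FROM cars WHERE active = 1 AND model = 'A3'",
    "year": "SELECT * FROM cars WHERE active = 1 AND year = 2017",
    "brand": "SELECT * FROM cars WHERE active = 1 AND brand = 'BMW'",
}
# Map each query to the first line of its plan, which names the chosen index.
chosen = {col: plan(sql)[0] for col, sql in queries.items()}

for col, detail in chosen.items():
    print(col, "->", detail)
```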
More on creating indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Since model implies a make, there is little use to put both of those in the same composite index.
year is not very selective, and users might want a range of years. These make it difficult to get an effective index on year.
How many rows will you have? If it is millions, we need to work harder to avoid performance problems. I'm leaning toward this, but only because the lookup would be more compact.
We use a single-column index when we want to query on just one column, the same as in your case, and a multi-column (composite) index when we have multiple conditions in the same WHERE clause.
Go for single indexing.
For a more detailed explanation, refer to this article: https://www.sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

Optimization of WHERE using a single value on multiple columns

I have a MySQL table in which I'm trying to search a pair of columns for a single value. It's quite a large table, so I want the search to be as fast as possible.
I have simplified the tables below for ease of understanding
SELECT * FROM clients WHERE name=? OR sirname=?
VS
SELECT * FROM clients WHERE ? IN (name, sirname)
with indexes on name and sirname
EXPLAIN shows the former using the indexes, but not the latter.
Is this accurate, or is there some optimisation going on under the hood which EXPLAIN doesn't catch?
Strongly related to Checking multiple columns for one value, but cannot discuss there due to age of thread.
Because MySQL generally uses one index per table reference in a query, you will probably have to do it this way:
SELECT * FROM clients WHERE name=?
UNION
SELECT * FROM clients WHERE sirname=?
This will count as two table references for purposes of index selection. The appropriate index will be used in each case.
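The UNION rewrite returns the same rows as the OR form, because UNION removes the duplicate that appears when both columns match. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up sample rows. It checks result equivalence only, not MySQL's index choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, sirname TEXT)")
conn.execute("CREATE INDEX idx_name ON clients (name)")
conn.execute("CREATE INDEX idx_sirname ON clients (sirname)")
# Made-up sample data; the third row matches on both columns.
conn.executemany(
    "INSERT INTO clients (name, sirname) VALUES (?, ?)",
    [("Ann", "Lee"), ("Lee", "Ann"), ("Lee", "Lee"), ("Bob", "Ray")],
)

value = "Lee"

with_or = conn.execute(
    "SELECT * FROM clients WHERE name = ? OR sirname = ? ORDER BY id",
    (value, value),
).fetchall()

# Each branch filters on a single column, so each can use its own index.
with_union = conn.execute(
    "SELECT * FROM clients WHERE name = ?"
    " UNION"
    " SELECT * FROM clients WHERE sirname = ?"
    " ORDER BY id",
    (value, value),
).fetchall()

print(with_or == with_union)
```

UNION ALL would be cheaper but would return the row matching on both columns twice, so plain UNION is the safe equivalent here.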

How will the indexes behave in these two scenarios in relational databases like MySQL?

Let's say I have a table students with the following fields
id,student_id,test_type,score
Consider these two queries
select * from students where student_id = x and score > y
select * from students where student_id = x order by score
Let's say I have indexes on both student_id and score but not a composite index. Which indexes will the database use? Will the query be able to use both indexes, or can at most one index be used?
Let's say the student_id index lets me restrict the results of the query. Will I then be able to use the score index for sorting or filtering?
Or, if the database chooses the score index to sort or filter first, will I be able to use the student_id index for the student_id = x filtering?
MySQL's optimizer would like the composite INDEX(student_id, score) for both queries.
Without the composite index... The optimizer almost never uses two indexes. The optimizer would pick between INDEX(student_id) and INDEX(score).
But there is another wrinkle -- If this table is InnoDB, and if it has PRIMARY KEY(student_id), then INDEX(score) implicitly has student_id tacked on the end. Hence INDEX(score) would be perfect for the first query.
Given two indexes, the optimizer looks at cardinality and various other things to pick between them.
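The benefit of the composite index for both queries can be seen in a query plan. This is a sketch under the assumption of SQLite via Python's sqlite3 module, so the optimizer is not MySQL's, but the B-tree reasoning is the same. The idx_student_score index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY, student_id INTEGER, test_type TEXT, score INTEGER)"
)
# Equality column first, then the range / sort column.
conn.execute("CREATE INDEX idx_student_score ON students (student_id, score)")

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Both conditions are resolved inside the one index.
range_plan = plan("SELECT * FROM students WHERE student_id = 7 AND score > 50")
# Rows for one student_id are already stored in score order, so no sort step is needed.
sort_plan = plan("SELECT * FROM students WHERE student_id = 7 ORDER BY score")

print(range_plan)
print(sort_plan)
```

If the second plan had to sort, SQLite would add a "USE TEMP B-TREE FOR ORDER BY" line. Its absence shows the index supplies the order.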
More on creating the best index.
Well, it definitely depends on your data set and database. Imagine the students table has 100 different ids but the same student_id in every row. The student_id index would then be a poor choice, and the Teradata Query Optimizer would be smart enough to choose a better one, such as score or id (if you are using Teradata, that is, but most databases have smart features like this built in). A composite index certainly wouldn't be selected, because in this table's case I don't think it would help the fetch at all. The best way to select a good index is to ask which column can provide a solid, near-unique value that is inexpensive to compare (an integer) and can eliminate a good partition or chunk of the data when filtered on. But yes, student_id would be the best index in this case. Plus, the query that ends with "and score > y" would be quicker: the WHERE clause is always applied first, so the data set to process will be much smaller.

If your table has more selects than inserts, are indexes always beneficial?

I have a mysql innodb table where I'm performing a lot of selects using different columns. I thought that adding an index on each of those fields could help performance, but after reading a bit on indexes I'm not sure if adding an index on a column you select on always helps.
I have far more selects than inserts/updates happening in my case.
My table 'students' looks like:
id | student_name | nickname | team | time_joined_school | honor_roll
and I have the following queries:
# The team column is varchar(32), and only has about 20 different values.
# The honor_roll field is a smallint and is only either 0 or 1.
1. select * from students where team = '?' and honor_roll = ?;
# The student_name field is varchar(32).
2. select * from students where student_name = '?';
# The nickname field is varchar(64).
3. select * from students where nickname like '%?%';
all the results are ordered by time_joined_school, which is a bigint(20).
So I was just going to add an index on each of the columns, does that make sense in this scenario?
Thanks
Indexes help the database more efficiently find the data you're looking for. Which is to say you don't need an index simply because you're selecting a given column, but instead you (generally) need an index for columns you're selecting based on - i.e. using a WHERE clause (even if you don't end up including the searched column in your result).
Broadly, this means you should have indexes on columns that segregate your data in logical ways, and not on extraneous, simply informative columns. Before looking at your specific queries, all of these columns seem like reasonable candidates for indexing, since you could reasonably construct queries around these columns. Examples of columns that would make less sense would be things like phone_number, address, or student_notes - you could index such columns, but generally you don't need or want to.
Specifically based on your queries, you'll want student_name, team, and honor_roll to be indexed, since you're defining WHERE conditions based on the values of these columns. You'll also benefit from indexing time_joined_school if, as you suggest, you're ORDER BYing your queries based on that column. Your LIKE query is not actually easy for most RDBMSs to handle, because the pattern starts with a wildcard, so indexing nickname won't help. Check out "How to speed up SELECT .. LIKE queries in MySQL on multiple columns?" for more.
Note also that the ratio of SELECT to INSERT is not terribly relevant for deciding whether to use an index or not. Even if you only populate the table once, and it's read-only from that point on, SELECTs will run faster if you index the correct columns.
Yes, indexes help accelerate your queries.
In your case you should have index on:
1) Team and honor_roll from query 1 (only 1 index with 2 fields)
2) student_name
3) time_joined_school from order
For query 3 you can't use an index because the LIKE pattern starts with a wildcard. Hope this helps.

MySQL: For ORDER BY calls (different fields), do I create an index for each field?

A quick question. I have a simple table with this structure:
USERS_TABLE = {id}, name, number_field_1, number_field_2, number_field_3, number_field_4, number_field_5
Sorting is a major feature, one field at a time at the moment. Table can have up to 50 million records. Most selects are done using the "id" field, which is the primary key and it's indexed.
A sample query:
SELECT * FROM USERS_TABLE WHERE id=10 ORDER BY number_field_1 DESC LIMIT 100;
The question:
Do I create separate indexes for each "number_field_*" to optimize ORDER BY statements, or is there a better way to optimize? Thanks
There's no silver bullet here, you will have to test this for your data and your queries.
If all your queries (e.g. WHERE id=10) return few rows, it might not be worth indexing the ORDER BY columns. On the other hand, if they return a lot of rows, it might help greatly.
If you always order by at least one of the fields, consider creating an index on that column, or a compound index on some of these columns. Keep in mind that if you have a compound index on (num_field_1, num_field_2) and you order only by num_field_2, that index will not be used. You have to include the leftmost fields of the index to make use of it.
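The compound-index caveat for ORDER BY can be reproduced as follows. This is a hedged sketch assuming SQLite via Python's sqlite3 module instead of MySQL. The users_table layout is cut down to two numeric fields and the idx_nf1_nf2 index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_table ("
    " id INTEGER PRIMARY KEY, name TEXT,"
    " number_field_1 INTEGER, number_field_2 INTEGER)"
)
conn.execute(
    "CREATE INDEX idx_nf1_nf2 ON users_table (number_field_1, number_field_2)"
)

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Ordering by the leftmost indexed column: the index can be walked in order.
by_first = plan(
    "SELECT * FROM users_table ORDER BY number_field_1 DESC LIMIT 100"
)
# Ordering only by the second column: the index order is useless, so a sort is needed.
by_second = plan(
    "SELECT * FROM users_table ORDER BY number_field_2 DESC LIMIT 100"
)

print(by_first)
print(by_second)
```

A "USE TEMP B-TREE FOR ORDER BY" line in the plan marks an explicit sort, which is the cost the index is meant to avoid.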
On the other hand, you seem to have a lot of different fields you order by. The drawback of creating an index on each and every one of them is that your indexes will take up much more space, and your inserts/deletes/updates might start to get slower.
Basically - there is no shortcut. You'll have to experiment to see which indexes work best for your queries and tune accordingly, while carefully analyzing your queries with EXPLAIN.