News articles database - should I use index on date_publish field for faster SELECTS - mysql

I need some advice on how to optimize my articles table for read operations. I have an articles table where I store the articles editors write. There is a requirement that editors can enter an article with date_publish set in the future. These articles must not be displayed on the cover page until the date_publish has actually arrived.
So my question is: should I have an index on the date_publish field for better performance? I am using a MySQL database with the InnoDB engine. I store dates as Unix timestamps in an unsigned INT(11) field.
When I read the list of articles for the cover page, I do something like this:
SELECT articles.* FROM articles WHERE date_publish < $time

Adding an index on the column date_publish would optimize the following simple query:
SELECT * FROM articles WHERE date_publish < $time
However, if you change the query, for example by adding an ORDER BY clause that sorts on a column other than date_publish, you may need a compound (multi-column) index to optimize the query.
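As a rough illustration of the simple case, here is a small sketch that checks whether an index on date_publish is picked up for the cover page query. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption made only so the example is runnable: MySQL's EXPLAIN output looks different, although the idea is the same. The table layout and the index name idx_articles_date_publish are hypothetical.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, title TEXT, date_publish INTEGER NOT NULL)"
)

now = int(time.time())
conn.executemany(
    "INSERT INTO articles (title, date_publish) VALUES (?, ?)",
    [("published", now - 3600), ("scheduled", now + 3600)],
)


def query_plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


cover_sql = "SELECT * FROM articles WHERE date_publish < ?"

plan_before = query_plan(cover_sql, (now,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_articles_date_publish ON articles (date_publish)")
plan_after = query_plan(cover_sql, (now,))   # range search on the new index

# Only the article whose publish time has passed is returned.
visible = [row[1] for row in conn.execute(cover_sql, (now,))]

print(plan_before)
print(plan_after)
print(visible)
```

On a real MySQL table you would run EXPLAIN on the same SELECT and look at the key column to see whether the index was chosen.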
EDIT
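To get the most out of an index, it should include the columns used in the WHERE, JOIN and ORDER BY clauses, usually in that order. (Strictly speaking, a "covering" index is one that also contains every column the query selects, so the table rows do not have to be read at all.) Be aware that a range condition limits this. If you have a range in your WHERE clause on date_publish and ORDER BY article_name, an index on (date_publish, article_name) lets MySQL narrow the rows by date, but it generally still has to sort them by article_name, because the index is ordered by article_name only within each date_publish value. MySQL can use one index for both selection and sorting when the leading index columns are compared with equality, or when you sort by the range column itself (ORDER BY date_publish).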

Related

Can I use 2 different indexes on a fulltext search?

I'm not very experienced with indexes, so that's why I'm asking this silly question. I searched everywhere but didn't get a clear answer.
I will have a table items with columns: id,name,category,price
Here will be 3 indexes:
id - Primary Index
name - FullText Index
category,price - Composite Index
I estimate my table will eventually grow to about 700,000-1,000,000 rows.
I need to do a fulltext search on name where category is a specified category, ordered by price.
So my query will be this:
SELECT * FROM items
WHERE MATCH(name) AGAINST('my search') AND category='my category' ORDER BY price
My question is:
How many indexes will be used to perform this search?
Will it use 2 indexes?
[fulltext index] & [category,price] index - It will get results for the words and then use the next index to match my category and price order
Or will it use 1 index?
[fulltext index] only - It will get results for the words, but afterwards it will have to manually match my category and price order
I want my query to be fast; what are your opinions? I know the fulltext search is fast, but what happens if I apply clauses like category and order by price? Will it be just as fast?
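MySQL normally uses a single index per table in a query. It can sometimes combine several indexes through the index merge optimization, but index merge does not apply to FULLTEXT indexes. So for this query, expect MySQL to use the FULLTEXT index to find the matching rows, and then filter those rows by category and sort them by price without help from the (category, price) index. You can force MySQL to use a specific index in a query, but this is rarely a good idea.
In summary: for this query MySQL will use only the FULLTEXT index; it cannot combine it with the composite index.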

Instructing MySQL to apply WHERE clause to rows returned by previous WHERE clause

I have the following query:
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
The claim_notes table has about half a million rows, so this query runs very slowly since it has to search against the unindexed note column (which I can't do anything about). I know that when the type_id, dt_stamp, and user_id conditions are applied, I'll be searching against about 60 rows instead of half a million. But MySQL doesn't seem to apply these in order. What I'd like to do is to see if there's a way to tell MySQL to only apply the note LIKE :click_to_call condition to the rows that meet the former conditions so that it's not searching all rows with this condition.
What I've come up with is this:
SELECT dt_stamp
FROM (
SELECT *
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
) AS filtered
WHERE note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
This works and is extremely fast. I'm just wondering if this is the right way to do this, or if there is a more official way to handle it.
It shouldn't be necessary to do this. The MySQL optimizer can handle it if you have multiple terms in your WHERE clause separated by AND. Basically, it knows how to do "apply all the conditions you can using indexes, then apply unindexed expressions only to the remaining rows."
But choosing the right index is important. A multi-column index is better for a series of AND terms than individual indexes. MySQL can apply index intersection, but that's much less effective than finding the same rows with a single index.
A few logical rules apply to creating multi-column indexes:
Conditions on unique columns are preferred over conditions on non-unique columns.
Equality conditions (=) are preferred over ranges (>=, IN, BETWEEN, !=, etc.).
After the first column in the index used for a range condition, subsequent columns won't use an index.
Most of the time, searching the result of a function on a column (e.g. DATE(dt_stamp)) won't use an index. It'd be better in that case to store a DATE data type and use = instead of >=.
If the condition matches > 20% of the table, MySQL probably will decide to skip the index and do a table-scan anyway.
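The rules above can be sketched roughly as follows: put the equality columns first in the index, put the range column last, and replace the function call on the column with a plain range. This is only a sketch under several assumptions. It uses Python's sqlite3 module as a stand-in for MySQL so that it is runnable, stores dt_stamp as text, swaps the named placeholders for ? placeholders, and uses a hypothetical index name (idx_claim_notes_lookup) and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE claim_notes (
        id INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        dt_stamp TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS' sorts chronologically
        note TEXT
    )
""")
conn.executemany(
    "INSERT INTO claim_notes (type_id, user_id, dt_stamp, note) VALUES (?, ?, ?, ?)",
    [
        (0, 7, "2024-03-01 09:00:00", "called customer"),
        (0, 7, "2024-03-02 10:15:00", "Click to call: 555-0100"),
        (0, 7, "2024-03-02 08:30:00", "left voicemail"),
        (1, 7, "2024-03-02 07:00:00", "Click to call: 555-0101"),
        (0, 8, "2024-03-02 06:00:00", "Click to call: 555-0102"),
    ],
)

# Equality columns (type_id, user_id) first, the range column (dt_stamp) last.
conn.execute(
    "CREATE INDEX idx_claim_notes_lookup ON claim_notes (type_id, user_id, dt_stamp)"
)

# The half-open range on dt_stamp stands in for the DATE(dt_stamp) comparison,
# so the condition is on the bare column and can use the index.
sql = """
    SELECT dt_stamp
    FROM claim_notes
    WHERE type_id = 0
      AND user_id = ?
      AND dt_stamp >= ?
      AND dt_stamp < ?
      AND note LIKE ?
    ORDER BY dt_stamp
    LIMIT 1
"""
params = (7, "2024-03-02 00:00:00", "2024-03-03 00:00:00", "Click to call%")

plan = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))
first_hit = conn.execute(sql, params).fetchone()

print(plan)
print(first_hit)
```

The unindexed note LIKE condition is then evaluated only for the rows that the index search returns, and because the index is already ordered by dt_stamp within one (type_id, user_id) pair, no separate sort step is needed.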
Here are some webinars by myself and my colleagues at Percona to help explain index design:
Tools and Techniques for Index Design
MySQL Indexing: Best Practices
Advanced MySQL Query Tuning
Really Large Queries: Advanced Optimization Techniques
You can get the slides for these webinars for free, and view the recording for free, but the recording requires registration.
Don't go for the derived table solution, as it does not perform well. I'm surprised that, given the = and >= conditions, MySQL evaluates the LIKE first.
Anyway, I'd say you could try adding some indexes on those fields and see what happens:
ALTER TABLE claim_notes ADD INDEX(type_id, user_id);
ALTER TABLE claim_notes ADD INDEX(dt_stamp);
The latter index won't actually speed up the filtering; it helps with sorting the results.
Of course, having an EXPLAIN of the query would help.

Multi-column database indexes and query speed

I'm deploying a Rails application that aggregates coupon data from various third-party providers into a searchable database. Searches are conducted across four fields for each coupon: headline, coupon code, description, and expiration date.
Because some of these third-party providers do a rather bad job of keeping their data sorted, and because I don't want duplicate coupons to creep into my database, I've implemented a unique compound index across those four columns. That prevents the same coupon from being inserted into my database more than once.
Given that I'm searching against these columns (via simple WHERE column LIKE %whatever% matching for the time being), I want these columns to each individually benefit from the speed gains to be had by indexing them.
So here's my question: will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column? Or will it only guarantee uniqueness among the rows?
Complicating the matter somewhat is that I'm developing in Rails, so my question pertains both to SQLite3 and MySQL (and whatever we might port over to in the future), rather than one specific RDBMS.
My guess is that the indexes will speed up searching across individual columns, but I really don't have enough "under the hood" database expertise to feel confident in that judgement.
Thanks for lending your expertise.
"will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column?"
Nope. The order of the columns in the index is very important. Let's suppose you have an index like this: create unique index index_name on table_name (headline, coupon_code, description, expiration_date)
In this case these queries will use the index
select * from table_name where headline = 1
select * from table_name where headline = 1 and coupon_code = 2
and these queries won't use the unique index:
select * from table_name where coupon_code = 1
select * from table_name where description = 1 and coupon_code = 2
So the rule is something like this: when you have multiple fields indexed together, a query has to specify the first k fields of the index to be able to use it.
So if you want to be able to search on any one of these fields, you should create an index on each of them separately (besides the combined unique index).
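Here is a small sketch of that leftmost-prefix rule. Since the question mentions SQLite3 as well as MySQL, it uses Python's sqlite3 module and EXPLAIN QUERY PLAN; MySQL's EXPLAIN reports the same idea in a different format. The coupons table, the extra provider column and the index name uniq_coupon are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE coupons (
        id INTEGER PRIMARY KEY,
        provider TEXT,            -- hypothetical extra column, not in the index
        headline TEXT,
        coupon_code TEXT,
        description TEXT,
        expiration_date TEXT
    )
""")
conn.execute("""
    CREATE UNIQUE INDEX uniq_coupon
    ON coupons (headline, coupon_code, description, expiration_date)
""")


def query_plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Starts at the first index column: the index can be used.
plan_prefix = query_plan(
    "SELECT * FROM coupons WHERE headline = ?", ("10% off",)
)
# Uses the first two index columns: the index can still be used.
plan_two_columns = query_plan(
    "SELECT * FROM coupons WHERE headline = ? AND coupon_code = ?",
    ("10% off", "SAVE10"),
)
# Skips the first index column: the table is scanned instead.
plan_no_prefix = query_plan(
    "SELECT * FROM coupons WHERE coupon_code = ?", ("SAVE10",)
)

print(plan_prefix)
print(plan_two_columns)
print(plan_no_prefix)
```

The first two plans mention uniq_coupon, while the last one falls back to scanning the table, which is why a separate index on coupon_code would be needed for that search.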
Also, be careful with the LIKE operator.
This will use the index: SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
and this will not: SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
index usage http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
multiple column index http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

Optimizing query instead of using order by

I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can I get that without using ORDER BY? It is a very big table, and sorting the entire table just to get "n" records does not seem sensible.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemented as a Top-N query supported by an appropriate index. That runs very fast because it doesn't need to sort the entire table, or even the selected rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
Without an appropriate index it will be slow if you have lots of data. However, if you create an index like this:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use them.
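To make the recipe concrete, here is a sketch that compares the plan for a Top-N query before and after creating the (A, creation_date) index. It runs on Python's sqlite3 module purely so that it is self-contained; that choice, the events table, its sample rows and the index name idx_events_a_created are all assumptions, and other databases report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, a INTEGER, creation_date TEXT, payload TEXT)"
)
# 20 rows, alternating a = 0 / a = 1, inserted from newest to oldest.
conn.executemany(
    "INSERT INTO events (a, creation_date, payload) VALUES (?, ?, ?)",
    [(i % 2, f"2024-01-{20 - i:02d}", f"row {i}") for i in range(20)],
)


def query_plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


top_n_sql = "SELECT * FROM events WHERE a = ? ORDER BY creation_date LIMIT 3"

# Without the index the planner scans the table and sorts the matches.
plan_before = query_plan(top_n_sql, (1,))

# WHERE column first, ORDER BY column second.
conn.execute("CREATE INDEX idx_events_a_created ON events (a, creation_date)")

# With the index the rows come back in index order, so no sort step is planned.
plan_after = query_plan(top_n_sql, (1,))

oldest_dates = [row[2] for row in conn.execute(top_n_sql, (1,))]

print(plan_before)
print(plan_after)
print(oldest_dates)
```

In the second plan the separate sort step disappears: the rows are read from the index in creation_date order and the scan can stop once the limit is reached.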
A few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before, but try creating an index on the creation_date column, which keeps the entries sorted in ascending order. Then your select query can use ORDER BY creation_date with LIMIT 20 to get the 20 oldest records (or ORDER BY creation_date DESC for the 20 newest). The database engine should realize that the index is already sorted and won't need to sort again, because the index is kept in order as rows are saved. All it needs to do is read the first (or last) 20 entries from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use GROUP BY if you're grouping some data, and then a HAVING clause to select specific records.

Mysql WHERE statements priority during execution

SELECT * FROM articles WHERE title LIKE '%query%' AND user_id=123
The user_id column is indexed. How will MySQL execute this query? I think that LIKE has the lowest priority, right?
Thank you.
EXPLAIN SELECT * FROM articles WHERE title LIKE '%query%' AND user_id=123
That will tell it all :)
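For a feel of what such a plan shows, here is a sketch of the same check. It uses Python's sqlite3 module and EXPLAIN QUERY PLAN as a stand-in, which is an assumption made so the example is self-contained; in MySQL you would run the EXPLAIN statement above and read the key column instead. The table layout, the sample rows and the index name idx_articles_user_id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
conn.execute("CREATE INDEX idx_articles_user_id ON articles (user_id)")
conn.executemany(
    "INSERT INTO articles (user_id, title) VALUES (?, ?)",
    [
        (123, "a query about indexes"),
        (123, "something else"),
        (456, "another query"),
    ],
)

sql = "SELECT * FROM articles WHERE title LIKE '%query%' AND user_id = 123"

# The plan names the index chosen to find candidate rows; the LIKE with a
# leading wildcard cannot use an index and is checked against those rows only.
plan = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))
titles = [row[2] for row in conn.execute(sql)]

print(plan)
print(titles)
```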
MySQL will almost certainly use the user_id index.
For queries in general, the optimizer will work out the possible access paths and use some pre-computed statistics on the table contents to estimate which will be the quickest.
For example, if your articles table also had an indexed date column article_date and you executed a query with predicates on both user_id and article_date, then MySQL would have to estimate which index would be the quickest for selecting the required rows.
If there are thousands of articles a day by lots of different users, then the user_id index might be best. But if there are only a few articles a day, but most users post a large number of articles, it may be quicker to use article_date index instead.