I've got a table in a MySQL database that has the following fields:
ID | GENDER | BIRTHYEAR | POSTCODE
Users can search the table using any of the fields in any combination (e.g., SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000); or SELECT * FROM table WHERE BIRTHYEAR = 1973;).
According to the MySQL docs, a multi-column index can only be used through its leftmost columns. So if I create an index on all 4 columns, it won't be used when the ID field isn't part of the search. Do I need to create an index for every possible combination of fields (ID; ID/GENDER; ID/BIRTHYEAR; etc.) or will creating one index for all fields be sufficient?
If it makes any difference, there are upwards of 3 million records in this table.
In this situation I typically log the search criteria, the number of results returned, and the time taken to perform the search. Just because you're offering the flexibility to search by any field doesn't mean your users make use of it. I'd normally create indexes on sensible combinations and then, once I've determined the usage patterns, drop the rarely used indexes or create new ones for combinations I hadn't anticipated.
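As a rough illustration of that logging idea, here is a minimal Python sketch. The names search_log, log_search and field_combination_counts are made up for this example, the log lives in memory only, and the query function is a stand-in for a real database call.

```python
import time
from collections import Counter

# Hypothetical in-memory log; a real application would write to a table or a file.
search_log = []

def log_search(criteria, run_query):
    """Run a search and record the fields used, the row count and the elapsed time.

    criteria:  dict mapping column name -> searched value
    run_query: callable that takes the criteria and returns a list of rows
    """
    start = time.perf_counter()
    rows = run_query(criteria)
    elapsed = time.perf_counter() - start
    search_log.append({
        "fields": tuple(sorted(criteria)),  # e.g. ("GENDER", "POSTCODE")
        "result_count": len(rows),
        "seconds": elapsed,
    })
    return rows

def field_combination_counts(log):
    """Count how often each combination of search fields was used."""
    return Counter(entry["fields"] for entry in log)

# Stand-in for a real database call: returns one dummy row per search field.
fake_query = lambda criteria: [("row",)] * len(criteria)

log_search({"GENDER": "M", "POSTCODE": 1000}, fake_query)
log_search({"BIRTHYEAR": 1973}, fake_query)
log_search({"POSTCODE": 2000, "GENDER": "F"}, fake_query)

# Most frequently used combinations first: candidates for composite indexes.
for fields, count in field_combination_counts(search_log).most_common():
    print(fields, count)
```

Combinations that never show up in the counts are the ones whose indexes you can probably drop.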
I'm not sure if MySQL supports statistics or histograms for skewed data, so the index on gender may or may not work. If MySQL supports statistics, they will indicate the selectivity of an index. In a general population, an index on a field with a 50/50 split won't help. If your sample data is computer programmers and 95% of them are male, then a search for females would use the index.
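To make the 50/50 versus 95/5 point concrete, here is a small Python sketch that computes the fraction of rows per value. The function names and the 20% threshold are assumptions for illustration only; they are not MySQL constants, and the real optimizer uses its own cost estimates.

```python
from collections import Counter

def value_fractions(values):
    """Return {value: fraction of rows that hold that value}."""
    counts = Counter(values)
    total = len(values)
    return {value: n / total for value, n in counts.items()}

def likely_to_help(values, searched_value, threshold=0.2):
    """Guess whether an index helps when searching for searched_value.

    threshold is an assumed cut-off for this sketch: the index is treated as
    useful only if the searched value matches at most that fraction of rows.
    """
    return value_fractions(values).get(searched_value, 0.0) <= threshold

general_population = ["M"] * 50 + ["F"] * 50
programmers = ["M"] * 95 + ["F"] * 5

print(likely_to_help(general_population, "F"))  # 50% of rows match
print(likely_to_help(programmers, "F"))         # 5% of rows match
print(likely_to_help(programmers, "M"))         # 95% of rows match
```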
Use EXPLAIN.
(I'd say, use Postgres, too, lol).
It seems recent versions of MySQL can use several indexes in the same query; they call this Index Merge. In this case, one index per column will be enough.
Gender is a special case: since its selectivity is about 50%, you don't need an index on it, and one would be counterproductive.
Indexes on single fields are useful, and most useful when nearly every record has a different value, as is often the case with varchar data. Since birthyear and postcode are numbers, indexes on them are compact and cheap to compare.
You can index birthyear because it differs between many of the records (although there are only about 120 distinct birth years at most, I guess).
Gender in my opinion doesn't need an index.
You can find out what field combinations are most likely to give different results and index those, like: birthyear - postcode, id - birthyear, id - postcode.
Related
I'm using a MySQL database and have to perform some select queries on large/huge tables (e.g. 267,736 rows and 30 columns).
Query details:
Only select queries (the data in the table is fixed, never an update, insert or delete)
Select query on all the columns (business requirement)
Mostly limit the number of rows (LIMIT 10 to all rows -> user can choose)
Could be ordered by one or multiple columns (creation of indexes here will not help since the user can order by any column he likes)
Could be filtered by a value the user chooses (where filter on one or more columns)
Currently the queries take up to 2 seconds, which is too long.
Is there a way to speed them up?
Which storage engine should I use: InnoDB/MyISAM/...
Should I have a primary key, even if I will never use it?
...?
You should (must actually) use indexes.
Create indexes on all columns with which WHERE or ORDER BY is going to be used. Also study and use EXPLAIN to see the impact of the indexes and to optimize your queries.
You don't have to create a primary key if there is no column with unique data in your table, but it is very likely that you do have such a column (id, time...). In that case you should make it the primary key and use it to filter your queries.
Number of columns in the query has close to no impact on SELECT speed.
As long as you run "Only select queries", the storage engine does not matter either. MyISAM might be a bit faster, but InnoDB has many features you will need when you decide that your "Only select queries" rule must be broken.
I have a large database containing more than five million records. The table has three fields (ID, name, text); the ID field is the primary key, and the name field has a FULLTEXT index.
I want to create a search engine for my site that searches the name field. I used the FULLTEXT index, but it has the disadvantage of not accepting keywords shorter than four characters, so I decided to drop it, put a regular INDEX KEY on the name field, and use the following query:
EXPLAIN SELECT * FROM table WHERE locate ('search', name) > 0;
The problem is that this query does not use the index on the name field,
but this query:
EXPLAIN SELECT name FROM table WHERE locate ('search', name) > 0;
does use the index.
I do not know why MySQL does not use the index when I select all fields.
In your opinion, how can I solve this problem, and is there a better alternative?
You can set the minimum word length for full-text indexes in the MySQL configuration. I am not at my computer at the moment to find an example; however, this page might help you: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
Update:
Back at my PC. Why MySQL uses an index for the SELECT name FROM table WHERE locate ('search', name) > 0; statement is very simple. When you create an index on the name field, the index contains the actual value of the name field, so when you select only the name field, MySQL can scan the index and retrieve all the data it needs from there. So in this scenario MySQL does one operation: it reads the index entries, keeps those that match the search, and returns them.
The SELECT * FROM table WHERE locate ('search', name) > 0; statement, however, needs the other data fields as well. Since only the name field's value is stored in the index, MySQL has to read the index and then the table to retrieve the other fields. So in this scenario MySQL has to match the values in the index, then find the rows in the table, and then return them. That is 2 operations, roughly double the amount of work compared to the previous scenario.
Since 5 million rows is still fairly small, it is probably faster for MySQL to just loop through the table and retrieve the rows. As you add more rows, MySQL will probably start using the index once the cost of looping through the table is higher than the cost of reading the index and then looking up the values in the table.
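If you want to compare the two kinds of query side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, not MySQL, so instr() takes the place of locate(), the plan text differs from MySQL's EXPLAIN output, and the optimizer may decide differently. The table name items, the column name body and the index name idx_name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, body TEXT)")
conn.execute("CREATE INDEX idx_name ON items (name)")
conn.executemany(
    "INSERT INTO items (name, body) VALUES (?, ?)",
    [
        ("search engine", "long text 1"),
        ("full text search", "long text 2"),
        ("index", "long text 3"),
        ("primary key", "long text 4"),
    ],
)

def plan(sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Only the indexed column is selected, so the index alone can answer the query.
print(plan("SELECT name FROM items WHERE instr(name, 'search') > 0"))

# All columns are selected, so the table rows have to be read as well.
print(plan("SELECT * FROM items WHERE instr(name, 'search') > 0"))
```

Either way every entry is examined, because the search term can appear anywhere inside the name; the index only saves the extra trip to the table rows.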
Hope that makes sense.
I am trying to find out how to design the indexes for my data when my query is using ranges for 2 fields.
expenses_tbl:
idx      | date | category | amount
auto-inc | INT  | TINYINT  | DECIMAL(7,2)
PK       |      |          |
The column category defines the type of expense. Like, entertainment, clothes, education, etc. The other columns are obvious.
One of my queries on this table finds all the instances where, for a given date range, the expense was more than $50. The query looks like this:
SELECT date, category, amount
FROM expenses_tbl
WHERE date > 120101 AND date < 120811
AND amount > 50.00;
How do I design the index/secondary index on this table for this particular query?
Assumption: The table is very large (It's not currently, but that gives me a scope to learn).
MySQL generally doesn't support ranges on multiple parts of a compound index. Either it will use the index for the date, or an index for the amount, but not both. It might do an index merge if you had two indexes, one on each, but I'm not sure.
I'd check the EXPLAIN before and after adding these indexes:
CREATE INDEX date_idx ON expenses_tbl (date);
CREATE INDEX amount_idx ON expenses_tbl (amount);
Compound index ranges - http://dev.mysql.com/doc/refman/5.5/en/range-access-multi-part.html
Index Merge - http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html
A couple more points that have not been mentioned yet:
The order of the columns in the index can make a difference. You may want to try both of these indexes:
(date, amount)
(amount, date)
Which to pick? Generally you want the most selective condition to be the first column in the index.
If your date ranges are large but few expenses are over $50 then you want amount first in the index.
If you have narrow date ranges and most of the expenses are over $50 then you should put date first.
If both indexes are present then MySQL will choose the index with the lowest estimated cost.
You can try adding both indexes and then look at the output of EXPLAIN SELECT ... to see which index MySQL chooses for your query.
You may also want to consider a covering index. By including the column category in the index (as the last column) it means that all the data required for your query is available in the index, so MySQL does not need to look at the base table at all to get the results for your query.
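The column-order advice above can be tried on a sample of your data before creating anything. This is a rough Python sketch: it measures what fraction of sample rows each condition matches and puts the more selective column first, with category appended last for the covering index. The function names and the sample rows are invented for the example, and the rule of thumb is no substitute for checking EXPLAIN.

```python
def matching_fraction(rows, predicate):
    """Fraction of rows for which predicate(row) is true (0.0 for an empty sample)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if predicate(row)) / len(rows)

def suggest_index_columns(rows, date_low=120101, date_high=120811, min_amount=50.00):
    """Suggest a column order for a covering index on expenses_tbl.

    The condition that matches the smaller fraction of rows goes first;
    category goes last so the index covers the whole query.
    """
    date_fraction = matching_fraction(rows, lambda r: date_low < r["date"] < date_high)
    amount_fraction = matching_fraction(rows, lambda r: r["amount"] > min_amount)
    if amount_fraction < date_fraction:
        leading = ("amount", "date")
    else:
        leading = ("date", "amount")
    return leading + ("category",)

# Made-up sample rows: a wide date range, few expenses over $50.
sample = [
    {"date": 120105, "category": 1, "amount": 12.50},
    {"date": 120210, "category": 2, "amount": 75.00},
    {"date": 120315, "category": 1, "amount": 8.99},
    {"date": 120620, "category": 3, "amount": 20.00},
    {"date": 110101, "category": 2, "amount": 30.00},
]
print(suggest_index_columns(sample))
```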
The general answer to your question is that you want a composite index with two key columns: date first and amount second.
Note that this index will work for queries with restrictions on the date, or on the date and the amount. It will not work for queries with restrictions on the amount only. If you have both types, you might want a second index on amount.
If the table is really, really large, then you might want to partition it by date and build indexes on amount within each partition.
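The leftmost-prefix behaviour behind that advice can be modelled in a few lines of Python. This is a deliberately simplified model, not MySQL's actual optimizer: it assumes equality conditions let the lookup continue to the next index column and that the first range condition is the last column used to narrow the search. The function name usable_prefix is made up.

```python
def usable_prefix(index_columns, equality_columns=(), range_columns=()):
    """Return the leading index columns that can narrow the lookup.

    Simplified model: walk the index columns left to right, continue past
    equality conditions, stop after the first range condition, and stop
    immediately at a column the query does not restrict.
    """
    used = []
    for column in index_columns:
        if column in equality_columns:
            used.append(column)
        elif column in range_columns:
            used.append(column)
            break
        else:
            break
    return used

index = ["date", "amount"]

# Range on date and range on amount: only date narrows the lookup.
print(usable_prefix(index, range_columns={"date", "amount"}))

# Restriction on amount only: the index cannot be used through its prefix.
print(usable_prefix(index, range_columns={"amount"}))

# Equality on date plus range on amount: both columns are used.
print(usable_prefix(index, equality_columns={"date"}, range_columns={"amount"}))
```

In this model the amount condition in a two-range query is still checked, just not used to narrow the part of the index that is read.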
I have a large database that records all the events that occur. It's like a ticketing system. Because I store the different statuses that describe the action taken on a ticket in the same column, I have to use multiple 'or' conditions in a statement to find the current status of a ticket.
For example: 1 is for a ticket opened, 2 for acknowledgement, 3 for event closed. Now the query to select all events with 1,2,3 would be:
SELECT *
FROM tbl_name
WHERE status IN (1, 2, 3)
AND event_id = 1;
I've created an index on the id field, and another index, event_status, on the event_id and status fields.
Now, when I run EXPLAIN on this query, it doesn't use the event_status index; instead it uses another existing index, event_status_dept, which consists of event_id, status and department.
If I use only two values in the IN list, i.e. 'IN (1,2)', it uses the event_status index; otherwise it uses the other index, event_status_dept. I don't know what is wrong with my statement.
I don't think anything is wrong with your query.
The optimizer uses the best index it can find according to the conditions in the query and the statistics it holds.
An index is not effective if more than a certain percent of the records satisfy the condition.
Example:
If the optimizer's statistics say that only 5% of the events in the table are of the types 1,2, this would be an effective index and it will use it.
But if 70% of the events are of types 1,2,3, this index is not effective and the optimizer may use another index or no index at all.
Using multiple indexes per table access is generally very inefficient:
http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/index-merge-performance
A concatenated index, like yours on event_id, status, department is the better solution.
However, MySQL has some kind of Index-Merge:
http://dev.mysql.com/doc/refman/5.5/en/index-merge-optimization.html
Right now, I'm debating whether or not to use COUNT(id) or "count" columns. I heard that InnoDB COUNT is very slow without a WHERE clause because it needs to lock the table and do a full index scan. Is that the same behavior when using a WHERE clause?
For example, if I have a table with 1 million records. Doing a COUNT without a WHERE clause will require looking up 1 million records using an index. Will the query become significantly faster if adding a WHERE clause decreases the number of rows that match the criteria from 1 million to 500,000?
Consider the "Badges" page on SO, would adding a column in the badges table called count and incrementing it whenever a user earned that particular badge be faster than doing a SELECT COUNT(id) FROM user_badges WHERE user_id = 111?
Using MyISAM is not an option because I need the features of InnoDB to maintain data integrity.
SELECT COUNT(*) FROM tablename seems to do a full table scan.
SELECT COUNT(*) FROM tablename USE INDEX (colname) seems to be quite fast if the index available is NOT NULL, UNIQUE, and fixed-length. A non-UNIQUE index doesn't help much, if at all. Variable length indices (VARCHAR) seem to be slower, but that may just be because the index is physically larger. Integer UNIQUE NOT NULL indices can be counted quickly. Which makes sense.
MySQL really should perform this optimization automatically.
Performance of COUNT() is fine as long as you have an index that's used.
If you have a million records and the column in question is NOT NULL then a COUNT() will be a million quite easily. If NULL values are allowed, those aren't indexed so the number of records is easily obtained by looking at the index size.
If you're not specifying a WHERE clause, then the worst case is the primary key index will be used.
If you specify a WHERE clause, just make sure the column(s) are indexed.
I wouldn't say avoid, but it depends on what you are trying to do:
If you only need to provide an estimate, you could do SELECT MAX(id) FROM table. This is much cheaper, since it just needs to read the max value in the index.
If we consider the badges example you gave, InnoDB only needs to count up the number of badges that user has (assuming an index on user_id). I'd say in most cases that's not going to be more than 10-20, and it's not much harm at all.
It really depends on the situation. I probably would keep the count of the number of badges someone has on the main user table as a column (count_badges_awarded) simply because every time an avatar is shown, so is that number. It saves me having to do 2 queries.
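Here is a minimal sketch of the counter-column approach, with the insert and the increment done in one transaction so the stored count cannot drift from the real one. It uses Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB; the users table layout, the badge column and the helper names are assumptions for the example, while count_badges_awarded and user_badges follow the names used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    count_badges_awarded INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_badges (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    badge TEXT NOT NULL
);
CREATE INDEX idx_user_badges_user ON user_badges (user_id);
""")

def award_badge(user_id, badge):
    """Insert the badge row and bump the cached counter in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO user_badges (user_id, badge) VALUES (?, ?)",
            (user_id, badge),
        )
        conn.execute(
            "UPDATE users SET count_badges_awarded = count_badges_awarded + 1 WHERE id = ?",
            (user_id,),
        )

def cached_count(user_id):
    """Read the counter column: one primary-key lookup."""
    row = conn.execute(
        "SELECT count_badges_awarded FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]

def counted(user_id):
    """Count the badge rows directly, relying on the index on user_id."""
    row = conn.execute(
        "SELECT COUNT(*) FROM user_badges WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]

conn.execute("INSERT INTO users (id) VALUES (111)")
award_badge(111, "Teacher")
award_badge(111, "Student")
print(cached_count(111), counted(111))
```

The trade-off is an extra write on every award in exchange for not running the COUNT query each time the number is displayed.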