I have a customer table with the fields customer_id, customer_name, email, password, status, etc.
Currently only customer_id is indexed (it is the primary key).
I have a few queries that select customers as follows:
select * from customer
where status=1 and email<>''
and email is not null
and password<>''
and password is not null
This runs slowly because the table has 1.3 million records.
So I was thinking of adding an index on the email field.
I want to know which kind of index would help more: will a simple index work, or do I have to use a FULLTEXT index?
A FULLTEXT index is helpful for searching for words within a column.
If you really just search for non-empty (and non-null) emails and passwords, then a simple index will suffice.
For this very query, a more relevant index would be:
ALTER TABLE customer ADD INDEX (status, email, password);
[edit]
As correctly pointed out by Dukeling et al., such an index is probably useless if most of your customers do have an e-mail or a password set.
Assuming the above (most of your customers do have an e-mail and a password set), then your query returns many records, and any index will be of little help (as advised by nos and raina77ow).
The only thing one can be sure of, is that a FULLTEXT index is useless in this case.
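Whether the composite index helps depends on how many rows pass the filter, so it may be worth measuring before adding it. The following is only a rough sketch against the customer table from the question; the aliases total_rows and matching_rows are made up for illustration. Note that email <> '' is never true for a NULL email, so the separate IS NOT NULL checks do not change the result and are left out here.

```sql
-- Count how many rows the filter keeps compared with the whole table.
-- In MySQL a comparison evaluates to 1, 0, or NULL, and SUM() skips NULLs.
SELECT COUNT(*) AS total_rows,
       SUM(status = 1 AND email <> '' AND password <> '') AS matching_rows
FROM customer;

-- Ask the optimizer how it plans to run the query.
EXPLAIN
SELECT *
FROM customer
WHERE status = 1
  AND email <> ''
  AND password <> '';
```

If matching_rows is close to total_rows, a full table scan is likely the cheapest plan and no index on these columns will change much.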
Related
How does INDEX work with MYSQL?
Suppose I got 2 tables like this
//customerTable
id auto_increment,
username char(30),
password char(40),
phone int(10)
//profileTable
id auto_increment,
username char(30),
description text
And I created an INDEX on username on both tables, like this
create index username on `customerTable` ( username, password )
create index username on `profileTable` ( username )
Then I run these queries:
select * from `customerTable` where username='abc' limit 1
select * from `customerTable` where username='abc' and password='xyzzzzz' limit 1
select customerTable.*, profileTable.* from
customerTable, profileTable where
customerTable.username='abc'
and customerTable.password='xyzzzzzzz'
and customerTable.username = profileTable.username
limit 1
Which indexes will these 3 queries use? I ask because both indexes have the same name...
Index names must be unique within the same table. That is, you can't have two indexes in the same table and name both indexes username.
You can reuse an index name on a different table, like you have shown. Index names don't have to be unique over multiple tables. In this way, they are like column names. You can use the same column name in more than one table.
Some people like to define a naming convention for their index names, but it doesn't really affect anything as far as the database is concerned.
I'm especially puzzled when I see developers who think they have to use "idx_" as a prefix for every index name. It's not necessary, it's just four extra characters you have to type.
The SQL query optimizer knows which index belongs to each table, even if they have the same name. It will not get confused.
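As a sketch of how to check this yourself, the statements below recreate the two indexes from the question and ask MySQL which one it picks for each table. The (username, password) index should be able to serve both the lookup on username alone (as a leftmost prefix) and the lookup on username plus password, though the exact plan depends on the data and the MySQL version.

```sql
CREATE INDEX username ON customerTable (username, password);
CREATE INDEX username ON profileTable (username);

-- Lists each index together with the table it belongs to.
SHOW INDEX FROM customerTable;
SHOW INDEX FROM profileTable;

-- The "key" column of the EXPLAIN output names the index chosen per table.
EXPLAIN SELECT * FROM customerTable WHERE username = 'abc' LIMIT 1;
EXPLAIN SELECT * FROM customerTable
WHERE username = 'abc' AND password = 'xyzzzzz' LIMIT 1;
```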
You might like my presentation How to Design Indexes, Really, or the video of me presenting it: https://www.youtube.com/watch?v=ELR7-RdU9XU
P.S.: I have a couple of comments that are not directly related to your question, but I have to caution you:
Please don't store passwords in plain text. If a hacker gains access to your database, you'll be sorry. Read You're Probably Storing Passwords Incorrectly.
You're using old-fashioned syntax for your joins. Read Why isn't SQL ANSI-92 standard better adopted over ANSI-89?
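For illustration, here is the third query from the question rewritten with an explicit join. The aliases c and p are introduced only for this sketch, and the plain-text password comparison is kept solely to mirror the original query.

```sql
SELECT c.*, p.*
FROM customerTable AS c
INNER JOIN profileTable AS p
        ON p.username = c.username   -- join condition lives in ON
WHERE c.username = 'abc'             -- filters stay in WHERE
  AND c.password = 'xyzzzzzzz'
LIMIT 1;
```

Keeping the join condition in ON and the filters in WHERE makes it harder to forget a join condition by accident.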
Given the following SQL table:
Employee(ssn, name, dept, manager, salary)
You discover that the following query is significantly slower than expected. There is an index on salary, and you have verified that the query plan is using it.
SELECT *
FROM Employee
WHERE salary = 48000
Please give a possible reason why this query is slower than expected, and provide a tuning solution that addresses that reason.
I have two ideas for why this query is slower than expected. One is that we are trying to SELECT * instead of SELECT Employee.salary which would slow down the query as we must search across all columns instead of one. Another idea is that the index on salary is non-clustered, and we want to use a clustered index, as the company could be very large and it would make sense to organize the table by the salary field.
Would either of those two solutions speed up this query? I.e. either change SELECT * to SELECT Employee.salary or explicitly set the index on salary to be clustered?
What indexes do you have now?
Is it really "slow"? What evidence do you have?
Comments on "SELECT * instead of SELECT Employee.salary" --
* is bad form because tomorrow you might add a column, thereby breaking any code that is expecting a certain number of columns in a certain order.
Dealing with * versus salary does not happen until after the row(s) is located.
Locating the row(s) is the costly part.
On the other hand, if you have INDEX(salary) and only look at salary, then the index is "covering". That means that the "data" (the other columns) does not need to be fetched. Hence, faster. But this is probably beyond what your teacher has covered so far.
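A rough way to see the covering effect, assuming the index from the question is a plain INDEX(salary) on an InnoDB table. The index name salary_name is hypothetical and introduced only for this sketch.

```sql
-- Needs only the indexed column, so the Extra column of EXPLAIN
-- would be expected to show "Using index" (a covering index).
EXPLAIN SELECT salary FROM Employee WHERE salary = 48000;

-- Needs every column, so each matching index entry leads to a row lookup.
EXPLAIN SELECT * FROM Employee WHERE salary = 48000;

-- If only a few columns are really needed, a wider index can cover them.
ALTER TABLE Employee ADD INDEX salary_name (salary, name);
EXPLAIN SELECT name, salary FROM Employee WHERE salary = 48000;
```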
Comments on "the index on salary is non-clustered, and we want to use a clustered index" --
In MySQL (not necessarily in other RDBMSs), InnoDB has exactly one PRIMARY KEY and it is always UNIQUE and "clustered".
That is, "clustered" implies "unique", which seems inappropriate for "salary".
In InnoDB a "secondary key" implicitly includes the column(s) of the PK (ssn?), with which it can reach over into the data.
"verified that the query plan" -- Have you learned about EXPLAIN SELECT ...?
More Tips on creating the optimal index for a given SELECT.
I will try to keep this as simple as I can.
You cannot simply make salary the clustered index unless you make it a unique or primary key, which would be senseless because two people can have the same salary.
According to the MySQL documentation, there can be only one clustered index per table. By default the database uses the primary key as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL locates the
first UNIQUE index where all the key columns are NOT NULL and InnoDB
uses it as the clustered index.
To speed up your query I have a few suggestions based on secondary indexes.
If you want to search for a salary by exact value, then hash-based indexes are a better option where the storage engine supports them (in MySQL the MEMORY engine does; InnoDB builds its indexes as B-trees).
If you want to search using greater than, less than, or some other range, then B-tree indexes are the better choice.
The first option is faster than the second one, but it is limited to the equality operator.
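For reference, a sketch of an explicit hash index. The MEMORY engine supports USING HASH; as far as I know, InnoDB accepts the clause but builds a B-tree anyway. The table salary_lookup and its column sizes are hypothetical.

```sql
CREATE TABLE salary_lookup (
  ssn    CHAR(11) NOT NULL,
  salary INT NOT NULL,
  PRIMARY KEY (ssn),
  INDEX salary_hash (salary) USING HASH
) ENGINE=MEMORY;

-- Equality lookups can use the hash index.
EXPLAIN SELECT ssn FROM salary_lookup WHERE salary = 48000;

-- Range predicates cannot; they would need a B-tree index instead.
EXPLAIN SELECT ssn FROM salary_lookup WHERE salary > 48000;

-- The Index_type column shows what was actually created.
SHOW INDEX FROM salary_lookup;
```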
Hope it helps.
THE INFO
Currently I have two tables I am working with: a POSTS table that holds data for individual posts, and a FAVORITES table that holds data for users who opt to save favorite posts in their profile.
The tables look like this:
On the POSTS table there is only a primary key on id, no indexes that I have set. On Favorites I have a combined index that I was testing of (postid, deviceid).
The POSTS table contains approx. 10,000 entries.
The FAVORITES table contains approx. 4,680,500 entries.
The query I use to grab the favorites from a particular deviceid is:
SELECT post FROM POSTS
WHERE id IN
(SELECT postid FROM favourites WHERE deviceid="12d4a4a4a4a4a4a");
THE PROBLEM:
With the amount of data being returned, and several devices having multiple favorites, the query can take upwards of 7-10 seconds to both COUNT favorites for a particular device and/or SELECT using the above query and subquery. When this happens during peak times, you can obviously imagine the issues that can cause.
Caching the query results is an option, but since the data is pretty specific in that the same user is not calling the query multiple times, but rather unique instances, I think there is a better solution. On another note, caching would need to be short lived, which would nullify its benefit.
I know the method of indexing, and I am familiar with foreign keys, but I'm not sure practically if and how they could be implemented between the query and the subquery to enhance performance.
Any advice/guidance is much appreciated.
Cheers,
Jared
SELECT post FROM POSTS
INNER JOIN favourites ON POSTS.id=favourites.postid
WHERE favourites.deviceid="12d4a4a4a4a4a4a";
Split the index on favourites into two indexes, one on deviceid and one on postid.
Why use a subquery? Have you tried a join?
SELECT post FROM posts INNER JOIN favourites ON posts.id=favourites.postid WHERE deviceid="12d4a4a4a4a4a4a"
You won't be using (only) your indices to retrieve the query results since the post field is not in any index. So you might actually end up saving time by making one query to get all the matching IDs from posts, then a second to get the post values.
Using EXPLAIN SELECT... will also help you optimize this query. Have you tried that?
On MySQL, composite indexes can only be used in the order the keys are defined. So for index (postid, deviceid), you can only use the index if you have a postid and need the deviceid. In your query here you're doing the opposite--you have a constant deviceid and want corresponding postid. So your query is not using any indexes.
More information on mysql composite indexes.
You should either add a deviceid index or reverse the index so that it's (deviceid, postid).
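A minimal sketch of the second option, using the table and column names from the question. The index name device_post is made up, and the name of the existing (postid, deviceid) index is not given in the question, so look it up before dropping anything.

```sql
-- device_post is a hypothetical name for the reversed index.
ALTER TABLE favourites ADD INDEX device_post (deviceid, postid);

-- The row for favourites should now list device_post in the "key" column.
EXPLAIN
SELECT p.post
FROM POSTS AS p
INNER JOIN favourites AS f ON f.postid = p.id
WHERE f.deviceid = '12d4a4a4a4a4a4a';

-- Shows the name of the old (postid, deviceid) index, in case it is
-- no longer needed and can be dropped with ALTER TABLE ... DROP INDEX.
SHOW INDEX FROM favourites;
```

Because the new index contains both deviceid and postid, the subquery side of the lookup can be answered from the index alone.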
By the way, your favorites table looks a lot like a junction table. Consider whether you need the id column at all.
A couple of things you could do to improve performance:
Separate the device_id out to a device table with a surrogate primary key (an int) and a non-clustered index on the device_id varchar. The favorites table should only include the device table surrogate key. This should make the favorites table smaller and should make your favorites table index smaller. The smaller the index and smaller the table, the faster it will be to search.
Your favorites table index is wrong. It should not be (post_id,device_id). It should be (device_id,post_id) as your query needs to search by device_id first. As your favorites table row is so small, I question the value of including the post_id in the index. It just isn't worth the extra space for a possible marginal improvement in query speed.
EDIT: You need the post_id in the index to keep the entries unique (just make sure device_id is first).
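The layout described above could look roughly like this. All table names, column names, and column sizes are assumptions made for the sketch (the question does not show the real definitions), and it assumes POSTS.id is an unsigned integer.

```sql
-- One row per device; the long string is stored only once.
CREATE TABLE device (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  device_id VARCHAR(64) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY device_id (device_id)
) ENGINE=InnoDB;

-- Junction table: the composite primary key keeps entries unique
-- and puts the device first, which is what the lookup filters on.
CREATE TABLE favorite (
  device_key INT UNSIGNED NOT NULL,
  post_id    INT UNSIGNED NOT NULL,
  PRIMARY KEY (device_key, post_id),
  FOREIGN KEY (device_key) REFERENCES device (id)
) ENGINE=InnoDB;

-- Favorites for one device.
SELECT p.post
FROM device AS d
INNER JOIN favorite AS f ON f.device_key = d.id
INNER JOIN POSTS AS p ON p.id = f.post_id
WHERE d.device_id = '12d4a4a4a4a4a4a';
```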
I have read tons of articles about B-tree theory in databases, and things there are always bewildering.
Assume I have a table described as follows:
table userinfo:
(user_id as primary key, username as string, password as string)
As described in some articles, user_id is created as the index for the table userinfo, so I get efficient performance if I select records by user_id.
But if I select by username, it is said that it compares the rows one by one.
I tried this in MySQL, and it is not as slow as expected.
Why?
How does MySQL handle this selection?
Thanks
If your WHERE clause compares by username (which is not indexed), it will probably do a full table scan. But, this may still be fast if the number of rows in the table is small. Computers are very fast these days and DBs are smart about organizing data for efficient table scans.
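To see the difference for yourself, something like the following should work against the userinfo table from the question. The literal values and the index name username_idx are placeholders.

```sql
-- Primary key lookup: the "type" column would be expected to show "const".
EXPLAIN SELECT * FROM userinfo WHERE user_id = 42;

-- No index on username: "type" would be expected to show "ALL",
-- which means a full table scan.
EXPLAIN SELECT * FROM userinfo WHERE username = 'alice';

-- After adding an index, the same query can use an index lookup instead.
ALTER TABLE userinfo ADD INDEX username_idx (username);
EXPLAIN SELECT * FROM userinfo WHERE username = 'alice';
```

On a small table the scan may still finish in milliseconds, which is why the missing index is easy to overlook until the table grows.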
What does the INDEX keyword mean and what function does it serve? I understand that it is meant to speed up querying, but I am not sure how this is done.
How do I choose which column to index?
A sample of INDEX keyword usage is shown below in a CREATE TABLE query:
CREATE TABLE `blog_comment`
(
`id` INTEGER NOT NULL AUTO_INCREMENT,
`blog_post_id` INTEGER,
`author` VARCHAR(255),
`email` VARCHAR(255),
`body` TEXT,
`created_at` DATETIME,
PRIMARY KEY (`id`),
INDEX `blog_comment_FI_1` (`blog_post_id`),
CONSTRAINT `blog_comment_FK_1`
FOREIGN KEY (`blog_post_id`)
REFERENCES `blog_post` (`id`)
) ENGINE=MyISAM
;
I'd recommend reading How MySQL Uses Indexes from the MySQL Reference Manual. It states that indexes are used...
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column.
To sort or group a table (under certain conditions).
To optimize queries using only indexes without consulting the data rows.
Indexes in a database work like an index in a book. You can find what you're looking for in a book quicker, because the index is listed alphabetically. Instead of an alphabetical list, MySQL uses B-trees to organize its indexes, which is quicker for its purposes (but would take a lot longer for a human).
Using more indexes means using up more space (as well as the overhead of maintaining the index), so it's only really worth using indexes on columns that fulfil the above usage criteria.
In your example, the id and blog_post_id columns both use indexes (PRIMARY KEY is an index too) so that the application can find them quicker. In the case of id, it is likely that this allows users to modify or delete a comment quickly, and in the case of blog_post_id, so the application can quickly find all comments for a given post.
You'll notice that there is no index for the email column. This means that searching for all comments by a particular e-mail address would probably take quite a long time. If searching for all comments by a particular e-mail address is something you'd want to add, it might make sense to add an index to that too.
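If that search were added, the index could be created along these lines. The index name blog_comment_email and the sample address are made up for the sketch. Depending on the storage engine and character set, a full 255-character key may exceed the engine's key-length limit, in which case a prefix such as email(100) is a common workaround.

```sql
ALTER TABLE blog_comment ADD INDEX blog_comment_email (email);

-- The "key" column should now show blog_comment_email.
EXPLAIN
SELECT id, blog_post_id, body
FROM blog_comment
WHERE email = 'someone@example.com';
```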
This keyword means that you are creating an index on the column blog_post_id along with the table.
Queries like this:
SELECT *
FROM blog_comment
WHERE blog_post_id = #id
will use this index to search on this field and run faster.
Also, there is a foreign key on this column.
When you decide to delete a blog post, the database will need to check this table to make sure no orphan comments are left behind. The index will also speed up this check, so queries like
DELETE
FROM blog_post
WHERE ...
will also run faster.