I have read many articles about B-tree theory for databases, but one thing still puzzles me.
Assume I have the following table:
table userinfo:
(user_id as primary key, username as string, password as string)
As some articles describe, user_id is indexed for the table userinfo, so I get efficient performance when I select records by user_id.
But if I select by username, the articles say the rows are compared one by one.
I tried this in MySQL, and it is not as slow as expected.
Why?
How does MySQL handle this selection?
Thanks
If your WHERE clause compares by username (which is not indexed), it will probably do a full table scan. But, this may still be fast if the number of rows in the table is small. Computers are very fast these days and DBs are smart about organizing data for efficient table scans.
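To see the difference between an index lookup and a full table scan, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so it runs without a server), and the row data and the ix_username index name are made up. SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, but the scan versus lookup distinction is the same idea.

```python
import sqlite3

# Stand-in for the MySQL table from the question; SQLite is used only so the
# sketch runs without a database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE userinfo ("
    "user_id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
)
conn.executemany(
    "INSERT INTO userinfo VALUES (?, ?, ?)",
    [(i, f"user{i}", "secret") for i in range(1, 1001)],
)

def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # last column is the readable detail

by_id = plan("SELECT * FROM userinfo WHERE user_id = 500")
by_name = plan("SELECT * FROM userinfo WHERE username = 'user500'")
print(by_id)    # a SEARCH step: primary key lookup
print(by_name)  # a SCAN step: every row is examined

# After adding an index on username, the same query becomes a lookup too.
conn.execute("CREATE INDEX ix_username ON userinfo (username)")
by_name_indexed = plan("SELECT * FROM userinfo WHERE username = 'user500'")
print(by_name_indexed)
```

With only 1,000 rows both forms return almost instantly, which matches the point above: a scan of a small table can still feel fast.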
My table stores a user id (primary key) and its relations to other users, so the table contains multiple rows for each user id. Right now I have around 500k users in the system, which together generate millions of rows in that table.
Searching and other operations have become slow because of that. I looked into MySQL partitioning and want to understand how I can use it in this scenario so that a query runs only against the data for one particular user id (every query I run passes a user id in its condition).
Before you look into partitions, have you considered adding indexes? Check your query planner to see if it does a full table scan.
I have a customer table with customer_id, customer_name, email, password, status, etc. fields.
Currently only customer_id is indexed (as it is the primary key).
I have a few queries that select customers as follows:
select * from customer
where status=1 and email<>''
and email is not null
and password<>''
and password is not null
This runs slowly, as the table has 1.3 million records.
So I was thinking of adding an index on the email field.
I want to know which kind of index will help more: will a simple index work, or do I have to use a FULLTEXT index?
A FULLTEXT index is helpful for searching for words within a column.
If you really just filter on whether emails and passwords are non-empty (and non-null), then a simple index will suffice.
For this very query, a more relevant index would be:
ALTER TABLE customer ADD INDEX (status, email, password);
[edit]
As correctly pointed out by Dukeling et al., such an index is probably useless if most of your customers do have an e-mail or a password set.
Assuming the above (most of your customers do have an e-mail and a password set), then your query returns many records, and any index will be of little help (as advised by nos and raina77ow).
The only thing one can be sure of, is that a FULLTEXT index is useless in this case.
I'm still trying to get my head around the best way to use INDEXES in MySQL. How do you know when to merge them together and when to have them separate?
Below are the indexes from the WordPress posts table. See how post_name, post_parent and post_author are separate entries? And then they have type_status_date, which is a mixture of 4 fields?
http://img215.imageshack.us/img215/5976/screenshot20120426at431.png
I don't understand the logic behind this? Can anyone enlighten me?
This is going to be a bit of a long answer, but here we go. Please note I am not going to deal with the differences between database engines here (MyISAM and InnoDB have distinct ways of implementing what I am trying to describe).
The first thing you have to understand about an index is that it is a separate data structure stored on disk. Normally this is a b-tree containing the column(s) that you have indexed along with a pointer to the row in the table (this pointer is normally the primary key).
The only index that is stored with the data is the primary key index. Thus a primary key index IS the table.
Let's assume you have the following table definition.
CREATE TABLE `Student` (
`StudentNumber` INT NOT NULL ,
`Name` VARCHAR(32) NULL ,
`Surname` VARCHAR(32) NULL ,
`StudentEmail` VARCHAR(32) NULL ,
PRIMARY KEY (`StudentNumber`) );
Since we have a primary key on StudentNumber, there will be an index containing the primary key together with the other columns. If you looked at the data in the index, you would probably see something like this.
1 , John ,Doe ,Jdoe#gmail.com
As you can see, this is the table data, once again showing you that the primary key index IS the table.
The StudentNumber column is indexed, which allows you to search on it efficiently; the rest of the data is stored with the key. Thus, if you ran the following query:
SELECT * FROM Student WHERE StudentNumber=1
MySQL would use the primary index to quickly find the row and then read the data stored with the indexed column. Since there is an index, MySQL can use it to do an efficient binary seek operation on the b-tree.
Also, when it comes to retrieving the data after doing the search, MySQL can read the data from the index itself, so we are using 1 operation in the index to retrieve the data. Now, if I ran the following query:
SELECT * FROM Student WHERE Name ='Joe'
MySQL would check if there is an index that it could use to speed the query up. However, in my case there is no index on Name, so MySQL would do a sequential read of the table, one row at a time, from the first row to the last.
At each row it would evaluate the row against the WHERE clause and return the matching rows. So basically it reads the primary key index from top to bottom. Remember, the primary key index is the table.
If I ran the following statements:
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_name` (`Name` ASC) ;
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_surname` (`Surname` ASC) ;
MySQL would create new indexes on the Student table. These are stored separately from the table on disk, and the data inside would look something like this:
Data in ix_Name
John, 1 <--PRIMARY KEY VALUE
Data in ix_Surname
Doe, 1 <--PRIMARY KEY VALUE
Notice that the data in the ix_name index is the name and the primary key value. So if I ran the previous select statement, MySQL would read the ix_name index, get the primary key value for the matching items, and then use the primary key index to get the rest of the data.
So the number of operations needed to get the data is 2. The matching rows are found in the secondary index, and then a lookup happens on the primary key to get the row data out.
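The one-step versus two-step lookup can be modeled in a few lines of Python. This is only a toy sketch: a dict and a sorted list stand in for the b-trees, the rows and email addresses are made up, and the function names are hypothetical.

```python
from bisect import bisect_left

# Clustered primary key index: the key maps straight to the full row.
# (A dict stands in for the b-tree; the rows are made up.)
primary = {
    1: ("John", "Doe", "jdoe@example.com"),
    2: ("Jane", "Doe", "jane@example.com"),
    3: ("John", "Smith", "jsmith@example.com"),
}

# Secondary index on Name: sorted (Name, StudentNumber) pairs, no row data.
ix_name = sorted((row[0], pk) for pk, row in primary.items())

def find_by_student_number(pk):
    """One step: the primary index already holds the row."""
    return primary.get(pk)

def find_by_name(name):
    """Two steps: seek in ix_name, then look each key up in the primary index."""
    pos = bisect_left(ix_name, (name,))       # step 1: seek to the first match
    rows = []
    while pos < len(ix_name) and ix_name[pos][0] == name:
        pk = ix_name[pos][1]
        rows.append((pk,) + primary[pk])      # step 2: fetch the row by key
        pos += 1
    return rows

print(find_by_student_number(1))
print(find_by_name("John"))
```

The secondary index never stores the surname or email, which is why the second step through the primary key is needed for a SELECT *.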
You now have the following query:
SELECT * FROM Student WHERE Name='John' AND surname ='Doe'
Here MySQL will normally not use both indexes, as that would be a waste of operations. If MySQL were to use both indexes in this query, the following would happen (this should not happen).
1 Find in the ix_Name the rows with the value John
2 Read the primary key that matches to get the row data
3 Store the matching results
4 Find in the ix_surname the rows with the value Doe
5 Read the primary key that matches to get row data.
6 Store the matching results
7 Take the Name results and Surname results and merge them
8 Return query results.
This is really a waste of IO, as MySQL would then read the table twice. Basically, using one index would be better than trying to use two (I will explain why in a moment). MySQL will choose one index to use in this simple query.
So how does MySQL decide on which index to use?
MySQL keeps statistics about indexes internally. These statistics basically tell MySQL how unique an index is. So, for the sake of argument, let's say the surname index (ix_surname) was more unique than the name index (ix_name); MySQL would then use the surname index (ix_surname).
Thus query retrieval would be like this:
1 Use the ix_surname and find rows that match the value Doe
2 Read the primary key and apply the filter for the value John on the actual column data in the row.
3 Return the matched row.
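A rough way to picture "how unique" an index is: divide the number of distinct values by the number of rows, then seek on the column with the higher ratio and filter on the other. The sketch below does that on made-up rows. It is a simplification under stated assumptions, not MySQL's actual cost model, and the list comprehension only stands in for an index seek.

```python
# Hypothetical rows: (StudentNumber, Name, Surname).
students = [
    (1, "John", "Doe"),
    (2, "John", "Smith"),
    (3, "John", "Brown"),
    (4, "Jane", "Doe"),
    (5, "John", "Jones"),
]

NAME, SURNAME = 1, 2  # column positions in each row

def selectivity(column):
    """Distinct values divided by row count; closer to 1.0 means 'more unique'."""
    values = [row[column] for row in students]
    return len(set(values)) / len(values)

def pick_index(candidates):
    """Choose the candidate column with the highest selectivity."""
    return max(candidates, key=selectivity)

best = pick_index([NAME, SURNAME])

def query(name, surname):
    """Seek on the chosen column, then filter the fetched rows on the other."""
    wanted = {NAME: name, SURNAME: surname}
    other = SURNAME if best == NAME else NAME
    fetched = [r for r in students if r[best] == wanted[best]]  # stand-in seek
    matched = [r for r in fetched if r[other] == wanted[other]]
    return matched, len(fetched)

rows, rows_fetched = query("John", "Doe")
print(selectivity(NAME), selectivity(SURNAME))  # 0.4 0.8
print(rows, rows_fetched)                       # [(1, 'John', 'Doe')] 2
```

Seeking on Surname fetches 2 rows before filtering; seeking on Name would have fetched 4 for the same final result.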
As you can see, the number of operations in this search is much smaller. I have oversimplified a lot of the technical detail. Indexing is an interesting thing to master, but you have to look at it from the perspective of: how do I get the data with the minimal amount of IO?
Hope it is as clear as mud now!
MySQL cannot normally use more than one index at a time. That means, for instance, that when you have a query that filters or sorts on two fields you put them both into the same index.
WordPress likely has a common query that filters and/or sorts on post_type, post_status and post_date. Making an educated guess as to what they stand for, this would likely be the core query for WordPress's Post listing pages. So the three fields are put into the same index.
THE INFO
Currently I have two tables I am working with: a POSTS table that holds data for individual posts, and a FAVORITES table that holds data for users that opt to save favorite posts in their profile.
The tables look like this:
On the POSTS table there is only a primary key on id, no indexes that I have set. On Favorites I have a combined index that I was testing of (postid, deviceid).
The POSTS table contains approx. 10,000 entries.
The FAVORITES table contains approx. 4,680,500 entries.
The query I use to grab the favorites from a particular deviceid is:
SELECT post FROM POSTS
WHERE id IN
(SELECT postid FROM favourites WHERE deviceid="12d4a4a4a4a4a4a");
THE PROBLEM:
With the amount of data being returned, and several devices having multiple favorites, the query can take upwards of 7-10 seconds to both COUNT favorites for a particular device and/or SELECT using the above query and subquery. When this happens during peak times, you can obviously imagine the issues that can cause.
Caching the query results is an option, but since the data is pretty specific in that the same user is not calling the query multiple times, but rather unique instances, I think there is a better solution. On another note, caching would need to be short lived, which would nullify its benefit.
I know the method of indexing, and I am familiar with foreign keys, but I'm not sure practically if and how they could be implemented between the query and the subquery to enhance performance.
Any advice/guidance is much appreciated.
Cheers,
Jared
SELECT post FROM POSTS
INNER JOIN favourites ON POSTS.id=favourites.postid
WHERE favourites.deviceid="12d4a4a4a4a4a4a";
Split the index on favourites into two indexes, one on deviceid and one on postid.
Why use a subquery? Have you tried a join?
SELECT post FROM posts INNER JOIN favourites ON posts.id=favourites.postid WHERE deviceid="12d4a4a4a4a4a4a"
You won't be using (only) your indices to retrieve the query results since the post field is not in any index. So you might actually end up saving time by making one query to get all the matching IDs from posts, then a second to get the post values.
Using EXPLAIN SELECT... will also help you optimize this query. Have you tried that?
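As a quick check that the join form returns the same rows as the subquery form, here is a small sketch using Python's built-in sqlite3 as a stand-in for MySQL. The column layout of posts and favourites is assumed from the queries in the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, post TEXT);
    CREATE TABLE favourites (
        id INTEGER PRIMARY KEY, postid INTEGER, deviceid TEXT);
    INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO favourites (postid, deviceid) VALUES
        (1, 'dev-a'), (3, 'dev-a'), (2, 'dev-b');
""")

subquery_sql = """
    SELECT post FROM posts
    WHERE id IN (SELECT postid FROM favourites WHERE deviceid = ?)
"""
join_sql = """
    SELECT post FROM posts
    INNER JOIN favourites ON posts.id = favourites.postid
    WHERE favourites.deviceid = ?
"""

via_subquery = sorted(r[0] for r in conn.execute(subquery_sql, ("dev-a",)))
via_join = sorted(r[0] for r in conn.execute(join_sql, ("dev-a",)))
print(via_subquery, via_join)
```

The two forms only agree when a device cannot favourite the same post twice: the join returns one row per matching favourites row, while IN returns each post at most once.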
On MySQL, composite indexes can only be used in the order the keys are defined. So for index (postid, deviceid), you can only use the index if you have a postid and need the deviceid. In your query here you're doing the opposite--you have a constant deviceid and want corresponding postid. So your query is not using any indexes.
More information on mysql composite indexes.
You should either add a deviceid index or reverse the index so that it's (deviceid, postid).
By the way, your favorites table looks a lot like a junction table. Consider whether you need the id column at all.
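Why column order matters in a composite index can be modeled with sorted tuples. In this toy sketch (sorted Python lists stand in for the b-trees; the data and function names are made up), an index sorted by deviceid first lets a lookup seek straight to the device's entries, while an index sorted by postid first has to be read end to end.

```python
from bisect import bisect_left

# Made-up favourites rows: (postid, deviceid).
favourites = [
    (101, "dev-a"), (102, "dev-b"), (103, "dev-a"),
    (104, "dev-c"), (105, "dev-a"),
]

# Composite index (deviceid, postid): entries sorted by deviceid first.
ix_device_post = sorted((dev, post) for post, dev in favourites)
# Composite index (postid, deviceid): entries sorted by postid first.
ix_post_device = sorted(favourites)

def posts_by_seek(device):
    """Seek to the device's first entry, then read its contiguous entries."""
    pos = bisect_left(ix_device_post, (device,))
    posts, examined = [], 0
    for dev, post in ix_device_post[pos:]:
        examined += 1
        if dev != device:
            break  # past the last entry for this device
        posts.append(post)
    return posts, examined

def posts_by_scan(device):
    """deviceid is not the leading column, so every entry must be examined."""
    posts = [post for post, dev in ix_post_device if dev == device]
    return posts, len(ix_post_device)

print(posts_by_seek("dev-a"))  # ([101, 103, 105], 4)
print(posts_by_scan("dev-a"))  # ([101, 103, 105], 5)
```

On five rows the difference is trivial, but the scan cost grows with the whole index while the seek cost grows only with the number of favourites for that device.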
A couple of things you could do to improve performance:
Separate the device_id out to a device table with a surrogate primary key (an int) and a non-clustered index on the device_id varchar. The favorites table should only include the device table surrogate key. This should make the favorites table smaller and should make your favorites table index smaller. The smaller the index and smaller the table, the faster it will be to search.
Your favorites table index is wrong. It should not be (post_id,device_id). It should be (device_id,post_id) as your query needs to search by device_id first. As your favorites table row is so small, I question the value of including the post_id in the index. It just isn't worth the extra space for a possible marginal improvement in query speed.
EDIT: You need the post_id in the index to keep the entries unique (just make sure device_id is first).
I have two tables (MYISAM)
create table A (email varchar(50));
create table B (email varchar(50), key email (email));
Table A has 130K records
Table B has 20K records
Why does this SQL statement take a very long time (more than two minutes, after which I aborted the query with Ctrl+C)?
Statement is:
select count(*) from user A, tmp B where A.email=B.email;
Thanks
I'd guess that the query optimizer has nothing to go on. Why don't you try defining indexes on the email columns?
In general, joining on strings is more expensive than joining on shorter data types like int.
You could speed up this query by making sure both email columns are indexed.
If table A has an int ID field then table B should store that ID instead of storing the email string again. That would decrease the DB size and along with indexes would provide a much faster query speed than a string would ever give you.
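To get a feel for why a usable index matters here: if the join had to compare every row of A with every row of B, the sizes in the question would mean 130,000 × 20,000 = 2,600,000,000 comparisons. The sketch below counts the work for both approaches on much smaller made-up data. It is a model under stated assumptions, not MySQL's actual join code, and a Python dict (a hash lookup) stands in for the index on B.email.

```python
# Small made-up stand-ins for the two email tables.
table_a = [f"user{i}@example.com" for i in range(1300)]
table_b = [f"user{i}@example.com" for i in range(0, 2000, 10)]  # 200 rows

def nested_loop_count(a_rows, b_rows):
    """Join with no usable index: compare every A row with every B row."""
    matches = comparisons = 0
    for a in a_rows:
        for b in b_rows:
            comparisons += 1
            if a == b:
                matches += 1
    return matches, comparisons

def indexed_count(a_rows, b_rows):
    """Join with an index on B.email: one lookup per A row."""
    index = {}
    for b in b_rows:  # build the stand-in index once
        index[b] = index.get(b, 0) + 1
    matches = lookups = 0
    for a in a_rows:
        lookups += 1
        matches += index.get(a, 0)
    return matches, lookups

print(nested_loop_count(table_a, table_b))  # (130, 260000)
print(indexed_count(table_a, table_b))      # (130, 1300)
```

Both approaches find the same 130 matches; only the amount of work differs, and that gap widens as the tables grow.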