How does INDEX work with MySQL?
Suppose I have 2 tables like this
//customerTable
id auto_increment,
username char(30),
password char(40),
phone int(10)
//profileTable
id auto_increment,
username char(30),
description text
And I created an INDEX on username in both tables, like this:
create index username on `customerTable` ( username, password )
create index username on `profileTable` ( username )
Then I run these queries:
select * from `customerTable` where username='abc' limit 1
select * from `customerTable` where username='abc' and password='xyzzzzz' limit 1
select customerTable.*, profileTable.* from
customerTable, profileTable where
customerTable.username='abc'
and customerTable.password='xyzzzzzzz'
and customerTable.username = profileTable.username
limit 1
Which indexes will these 3 queries use? Because the names of both indexes are the same...
Index names must be unique within the same table. That is, you can't have two indexes in the same table and name both indexes username.
You can reuse an index name on a different table, like you have shown. Index names don't have to be unique over multiple tables. In this way, they are like column names. You can use the same column name in more than one table.
Some people like to define a naming convention for their index names, but it doesn't really affect anything as far as the database is concerned.
I'm especially puzzled when I see developers who think they have to use "idx_" as a prefix for every index name. It's not necessary, it's just four extra characters you have to type.
The SQL query optimizer knows which index belongs to each table, even if they have the same name. It will not get confused.
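You can see this for yourself with EXPLAIN: the key column of its output names the index chosen for each table, even when the names collide. A minimal sketch against the tables above:
-- The key column of the EXPLAIN output shows, per table, which index
-- was used, even though both indexes are named `username`.
EXPLAIN SELECT customerTable.*, profileTable.*
FROM customerTable
JOIN profileTable ON customerTable.username = profileTable.username
WHERE customerTable.username = 'abc'
  AND customerTable.password = 'xyzzzzzzz'
LIMIT 1;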
You might like my presentation How to Design Indexes, Really, or the video of me presenting it: https://www.youtube.com/watch?v=ELR7-RdU9XU
P.S.: I have a couple of comments that are not directly related to your question, but I have to caution you:
Please don't store passwords in plain text. If a hacker gains access to your database, you'll be sorry. Read You're Probably Storing Passwords Incorrectly.
You're using old-fashioned syntax for your joins. Read Why isn't SQL ANSI-92 standard better adopted over ANSI-89?
My table schema is
CREATE TABLE ITEMS (Time, Name, Token, PRIMARY KEY (Time, Name));
where Time is the timestamp at which the item was created. When I run the following query
SELECT Name, Token FROM ITEMS WHERE Name = 'shoes'
it takes a while to load the data, as my table has more than a million rows.
Do I need to add an index for faster retrieval of data? I already have an index on this table, since there is a PRIMARY KEY.
You need a separate index for name. The primary key index can handle name, but only in conjunction with time.
If you defined it instead as:
PRIMARY KEY (Name, Time)
Then your query could take advantage of the index.
MySQL has pretty good documentation on composite indexes here.
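For example, a minimal sketch of the separate index suggested above:
-- A secondary index on Name alone lets MySQL seek directly to the
-- matching rows instead of scanning the (Time, Name) primary key.
ALTER TABLE ITEMS ADD INDEX (Name);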
When you create an index using PRIMARY KEY (Time, Name), the index entries are ordered by Time first and Name second. There is no way for MySQL to use this index to search by Name alone.
BTW, you may get a lot of useful hints from the query optimiser if you put the EXPLAIN keyword in front of your query, like this:
EXPLAIN SELECT Name, Token FROM ITEMS WHERE Name = 'shoes'
Keep an eye on the rows column and on "Using where" in the Extra column: they tell you how many records MySQL expects to fetch and examine after the indexes have done their work. No need to wait or test blindly.
I have a customer table with customer_id, customer_name, email, password, status, and other fields.
Currently only customer_id is indexed (as it is the primary key).
I have a few queries that select customers, as follows:
select * from customer
where status=1 and email<>''
and email is not null
and password<>''
and password is not null
This runs slowly, as the table has 1.3 million records.
So I was thinking of adding an index on the email field.
I want to know which kind of index will help: will a simple index work, or do I have to use a FULLTEXT index?
A FULLTEXT index is helpful for searching for words within a column.
If you really just filter on non-empty (and non-null) emails and passwords, then a simple index will suffice.
For this very query, a more relevant index would be:
ALTER TABLE customer ADD INDEX (status, email, password);
[edit]
As correctly pointed out by Dukeling et al., such an index is probably useless if most of your customers do have an e-mail or a password set.
Assuming the above (most of your customers do have an e-mail and a password set), your query returns many records, and any index will be of little help (as advised by nos and raina77ow).
The only thing one can be sure of, is that a FULLTEXT index is useless in this case.
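If you want to verify this before adding anything, you can measure how many rows the predicates actually match; a quick sketch against the table above:
-- If this count is a large fraction of the 1.3 million rows,
-- no index on these columns will speed the query up much.
SELECT COUNT(*)
FROM customer
WHERE status = 1
  AND email <> '' AND email IS NOT NULL
  AND password <> '' AND password IS NOT NULL;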
What index(es) do I need to set to get results as fast as possible for DISTINCT queries on a certain column?
Example table columns:
id INTEGER
name VARCHAR(32)
groupname VARCHAR(16)
Every so often I need to get a list of all groups,
SELECT DISTINCT groupname FROM data ORDER BY groupname
The table can have > 200k entries, but only about a dozen groups. I would prefer not to use a separate table for the group names, because the data is imported frequently from a CSV file.
In this case, an index on groupname should get you the best possible results.
If that's not good enough, there are a couple more options to consider. First, you could cache the results of that query so that you only run it when you absolutely have to. Second, you could create a separate table to store the groupname values and populate it via an insert trigger (this would avoid having to change your CSV import process).
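A rough sketch of that second option, with a hypothetical groups table kept in sync by a trigger:
-- Hypothetical lookup table; the trigger keeps it in sync so the
-- CSV import into `data` does not need to change.
CREATE TABLE groups (
    groupname VARCHAR(16) NOT NULL PRIMARY KEY
);

CREATE TRIGGER data_group_sync
AFTER INSERT ON data
FOR EACH ROW
    INSERT IGNORE INTO groups (groupname) VALUES (NEW.groupname);
Reading the group list then becomes a trivial scan of a dozen-row table instead of a 200k-row one.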
Indexing groupname will solve your issue. If you are very concerned about the performance of inserts and updates, then instead of indexing the whole column, try column prefix indexing.
Adding indexes on VARCHAR columns can slow down inserts and updates, because the index must be maintained on every write. For more information, read about the B-Tree indexing algorithm.
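A sketch of such a prefix index, assuming the first 8 characters are enough to tell groups apart:
-- Indexing only a prefix keeps the index small and cheap to maintain
-- on every write; note that a prefix index cannot act as a covering
-- index for SELECT DISTINCT groupname.
ALTER TABLE data ADD INDEX (groupname(8));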
I am creating a simple comparison script, and I have some questions about the database structure. Firstly, the database will be huge; I am expecting more than 1 million entries in products.
Secondly, there will be a search form whose search term is matched against the name field (LIKE '%$term%') to display the product's related info and the shop's info.
Below you can see my database structure named products.
id int(10) NOT NULL
name varchar(50) NOT NULL
link varchar(50) NOT NULL
description varchar(50) NOT NULL
image varchar(50) NOT NULL
price varchar(50) NOT NULL
My questions are:
Do you suggest that I index a field? Users will not be able to insert or update products; the only queries will be SELECTs to display the results, and I will update the products from XML feeds often to pick up product changes.
I have to store the shop info like name, shipping, link, image... This gives me two options: a) create a new table named shops and join the two tables with a new field shopID in products that looks up the id in shops to display the info, or b) add this info (name, shipping, ...) as extra fields in products for every single product? (I think the answer is obvious, but I need your suggestion.)
Are there any other things I should have in mind, or change?
I am not an advanced programmer, and what I know I have learned through the internet, so maybe the questions are too obvious for you, but for me they are the ticket to learning.
Thank you for your answers.
Indexes are required to fetch records very fast, so yes, they're recommended. But what kind of index should you use?
The MyISAM engine offers a "regular" string index that you can use with a LIKE clause (e.g. LIKE 'hello%'), but it prevents you from using a wildcard at the beginning of the search phrase. In addition, MyISAM has a FULLTEXT index that allows you to search for words anywhere in the string, not just at its beginning. So you could create a FULLTEXT index on the columns description and name - but 2 FULLTEXT indexes seem redundant in this case. Maybe you could join those columns and separate the values with a token or a character? If so, you'll need only 1 FULLTEXT index on the joined column, which can save a lot of fragmentation and disk space.
One of the cons of the MyISAM engine is that writes (UPDATE/DELETE queries) lock the entire table. So, if the table is written to many times a minute, other queries will probably hang. That's why you should see whether the InnoDB engine suits your needs - it allows concurrent read/write operations on the table.
That's probably a good idea, since having an index on the price column seems essential, and a FULLTEXT index can't be combined with other indexes in the same query.
I'd say: Use InnoDB and Sphinx, and have a primary index on id & a regular index on price.
The most important thing for you to understand is that when writing code for specific software, you must be well familiar with that software and its caveats. You should read High Performance MySQL - extremely recommended.
Edit:
If you want to add indexes to the products table, you can do that with
ALTER TABLE /* etc */ when the table is empty or contains a small amount of data. If the table holds a lot of data, then it's recommended to create another table similar to products, alter that new table, and populate it with the data from the old products table, e.g.:
CREATE TABLE `products_new` LIKE `products`;
ALTER TABLE `products_new` ADD FULLTEXT (`name`);
LOCK TABLES `products` READ, `products_new` WRITE;
INSERT INTO `products_new` SELECT * FROM `products`;
LOCK TABLES `products` WRITE, `products_new` WRITE;
ALTER TABLE `products` RENAME TO `products_bad`;
ALTER TABLE `products_new` RENAME TO `products`;
/* The following doesn't work:
RENAME TABLE `products` TO `products_bad`, `products_new` TO `products`;
See: http://bugs.mysql.com/bug.php?id=22246
*/
DROP TABLE `products_bad`;
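Once the FULLTEXT index exists, the search itself should use MATCH ... AGAINST rather than LIKE; a minimal sketch:
-- MATCH must name exactly the column(s) the FULLTEXT index covers.
SELECT id, name, price
FROM products
WHERE MATCH (name) AGAINST ('shoes');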
Nikolai,
The ID should be a primary key. That automatically puts an index on ID, and will speed up any queries that need to fetch specific products.
The shop table should be a second table, but you should also have a 3rd table that joins products with shops. At its most basic, it would have two fields, shop_id and product_id. This lets you have a single product in multiple shops. These two fields should be foreign keys to the product table and the shop table.
If you are ever thinking about having a different price for a product per shop, then the product_shop join table should also contain the price, although the base price could be stored in the products table.
Price should be a DECIMAL, so that you can do calculations on the price field.
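A rough sketch of that layout (the column names are hypothetical, and it assumes products.id is an INT primary key, as recommended above):
-- One row per shop, one per product, and a join table carrying
-- the per-shop price.
CREATE TABLE shops (
    shop_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    shipping VARCHAR(50),
    link VARCHAR(50),
    image VARCHAR(50)
) ENGINE=InnoDB;

CREATE TABLE product_shop (
    product_id INT NOT NULL,
    shop_id INT NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (product_id, shop_id),
    FOREIGN KEY (product_id) REFERENCES products (id),
    FOREIGN KEY (shop_id) REFERENCES shops (shop_id)
) ENGINE=InnoDB;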
1) You should generally index fields that are commonly used. However, since your search on name uses a wildcard at the start, an index will have no effect on this query.
2) Creating a shops table and linking to this would be better.
Index price for sure, because something tells me you will search over this field and do orderings on it.
"Premature optimization is the root of all evil" (c) Donald Knuth. So I suggest you normalize your tables, so YES - create a table for shops. Once your application has grown big and you are facing high load, you will be able to denormalize your database to avoid JOINs (one way to optimize a demanding application).
Then get back to Stack Overflow with your problem ;-)
Generally you should index fields that will be used intensively, but using a leading wildcard in your search means an index won't help much.
Better to use another table with a foreign key.
Also, shouldn't the id field in your products table be defined as a PRIMARY KEY?
Here are my suggestions:
To be able to search for %term%, you need full-text search; an index will not do you any good when the search term starts with a wildcard.
Yes, you should put an index on the id column (and probably make it AUTO_INCREMENT), since that seems to be the unique column in the table. Other than that, there's no point in us suggesting any other indexes, since we don't know which queries you are going to run.
Yes, create another table for shops; otherwise you will have data that is not normalized, such as the shop name (there might be rare cases that "require" denormalization, such as optimization, but you have not reached that point yet). Non-normalized data will cause problems - in your specific case, what will you do when a shop needs to change its name? You will have to update all matching rows in the products table.
There are many things you should keep in mind, but they are out of scope for this answer. I suggest that you get to work and learn as you go, because learning by doing is a great way to become a better developer. Then, when you hit a specific problem, search for it or post it here on Stack Overflow.
Assume that I have one big table with three columns: "user_name", "user_property", "value_of_property". Let's also assume that I have a lot of users (say 100,000) and a lot of properties (say 10,000). Then the table is going to be huge (1 billion rows).
When I extract information from the table, I always need information about a particular user. So I use, for example, where user_name='Albert Gates'. That means that every time, the MySQL server needs to analyze 1 billion rows to find those that contain "Albert Gates" as user_name.
Would it not be wise to split the big table into many small ones corresponding to fixed users?
No, I don't think that is a good idea. A better approach is to add an index on the user_name column - and perhaps another index on (user_name, user_property) for looking up a single property. Then the database does not need to scan all the rows - it just needs to find the appropriate entry in the index, which is stored in a B-Tree, making it possible to locate a record in a very small amount of time.
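A sketch of those indexes, assuming the table is named user_data:
-- The composite index also covers searches on user_name alone
-- (the leftmost-prefix rule), so one index can serve both queries.
ALTER TABLE user_data ADD INDEX (user_name, user_property);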
If your application is still slow even after indexing correctly, it can sometimes be a good idea to partition your largest tables.
One other thing you could consider is normalizing your database so that user_name is stored in a separate table and an integer foreign key is used in its place. This can reduce storage requirements and increase performance. The same may apply to user_property.
You should normalise your design as follows:
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists properties;
create table properties
(
property_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists user_property_values;
create table user_property_values
(
user_id int unsigned not null,
property_id smallint unsigned not null,
value varchar(255) not null,
primary key (user_id, property_id),
key (property_id)
)
engine=innodb;
insert into users (username) values ('f00'),('bar'),('alpha'),('beta');
insert into properties (name) values ('age'),('gender');
insert into user_property_values values
(1,1,'30'),(1,2,'Male'),
(2,1,'24'),(2,2,'Female'),
(3,1,'18'),
(4,1,'26'),(4,2,'Male');
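Fetching everything about one user then becomes an indexed join rather than a scan, for example:
-- both lookups are index seeks: the unique index on users.username
-- and the primary key on user_property_values
select u.username, p.name, v.value
from users u
join user_property_values v on v.user_id = u.user_id
join properties p on p.property_id = v.property_id
where u.username = 'f00';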
From a performance perspective, the InnoDB clustered index works wonders in this similar example (cold run):
select count(*) from product
count(*)
========
1,000,000 (1M)
select count(*) from category
count(*)
========
250,000 (250K)
select count(*) from product_category
count(*)
========
125,431,192 (125M)
select
c.*,
p.*
from
product_category pc
inner join category c on pc.cat_id = c.cat_id
inner join product p on pc.prod_id = p.prod_id
where
pc.cat_id = 1001;
0:00:00.030: Query OK (0.03 secs)
Properly indexing your database will be the number 1 way of improving performance. I once had a query take half an hour (on a large dataset, but nonetheless). Then we came to find out that the tables had no indexes. Once indexed, the query took less than 10 seconds.
Why do you need this table structure? My fundamental problem is that you are going to have to cast the data in value_of_property every time you want to use it. That is bad in my opinion - also, storing numbers as text is wasteful given that it's all binary anyway. For instance, how are you going to have required fields? Or fields that need to have constraints based on other fields, e.g. start and end dates?
Why not simply have the properties as fields rather than some many-to-many relationship?
Have 1 flat table. When your business rules begin to show that properties should be grouped, then you can consider moving them out into other tables and having several 1:0-1 relationships with the users table. But this is not normalization, and it will degrade performance slightly due to the extra join (however, the self-documenting nature of the table names will greatly aid any developers).
One way I regularly see database performance get totally crippled is by having a generic
Id, Property Type, Property Name, Property Value table.
This is really lazy and exceptionally flexible, but it totally kills performance. In fact, on a new job where performance is bad, I actually ask whether they have a table with this structure - it invariably becomes the centre point of the database and is slow. The whole point of relational database design is that the relations are determined ahead of time. This is simply a technique that aims to speed up development at a huge cost in application speed. It also puts a huge reliance on business logic in the application layer to behave - which is not defensive at all. Eventually you find that you want to use a property in a key relationship, which leads to all kinds of casting in the join and further degrades performance.
If data has a 1:1 relationship with an entity, then it should be a field on the same table. If your table gets to be more than 30 fields wide, then consider moving fields into another table, but don't call it normalisation, because it isn't. It is a technique to help developers group fields together, at the cost of performance, in an attempt to aid understanding.
I don't know if MySQL has an equivalent, but SQL Server 2008 has sparse columns - NULL values take no space.
Sparse column datatypes
I'm not saying an EAV approach is always wrong, but I think using a relational database for this approach is probably not the best choice.