What index(es) do I need to set to get results as fast as possible for DISTINCT queries on a certain column?
Example table columns:
id INTEGER
name VARCHAR(32)
groupname VARCHAR(16)
Every so often I need to get a list of all groups,
SELECT DISTINCT groupname FROM data ORDER BY groupname
The table can have > 200k entries, but only about a dozen groups. I would like to not use a separate table for the group names, because the data get imported often from a CSV file.
In this case, an index on groupname should get you the best possible results.
If that's not good enough, a couple more options to consider - first, you could cache the results of that query so that you only run it when you absolutely have to. Second, you could create a separate table to store the groupname values and populate it via an insert trigger (this would avoid having to change your CSV import process)
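A minimal sketch of the trigger idea, assuming an in-memory SQLite database as a stand-in for the real server. The groupnames table and the data_track_group trigger are hypothetical names, and the trigger body uses SQLite's INSERT OR IGNORE (MySQL would spell this differently, for example INSERT IGNORE).

```python
import sqlite3

# SQLite (in-memory) stands in for the real database here.
# The groupnames table and the trigger name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (
        id INTEGER PRIMARY KEY,
        name VARCHAR(32),
        groupname VARCHAR(16)
    );
    CREATE TABLE groupnames (groupname VARCHAR(16) PRIMARY KEY);

    -- Record each group the first time it is seen; repeats are ignored.
    CREATE TRIGGER data_track_group AFTER INSERT ON data
    BEGIN
        INSERT OR IGNORE INTO groupnames (groupname) VALUES (NEW.groupname);
    END;
""")

# Simulated CSV import: the import code itself does not know about groupnames.
rows = [(1, "alice", "red"), (2, "bob", "blue"), (3, "carol", "red"), (4, "dave", "green")]
conn.executemany("INSERT INTO data (id, name, groupname) VALUES (?, ?, ?)", rows)

# The group list now comes from the small table instead of a scan of data.
groups = [r[0] for r in conn.execute("SELECT groupname FROM groupnames ORDER BY groupname")]
print(groups)
```

This sketch only handles inserts. If rows are deleted or a groupname is updated, the small table would need matching triggers or a periodic rebuild.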
Indexing on groupname will solve your issue. If you are very concerned about the performance of inserts and updates, then instead of indexing the whole column, try column prefix indexing.
Adding an index on a VARCHAR column can slow down inserts and updates, because the index has to be updated on every write. For more information, read up on the B-tree indexing algorithm.
Related
I have a database table with 10000 rows in it and I'd like to select a few thousand items using something like the following:
SELECT id FROM models WHERE category_id = 2
EDIT: The id column is the primary index of the table. Also, the table has another index on category_id.
My question is what would be the impact on performance? Would the query run slow? Should I consider splitting my models table into separate tables (one table for each category)?
This is what database engines are designed for. Just make sure you have an index on the column that you're using in the WHERE clause.
You can try this to get the first 100 records
SELECT id FROM models WHERE category_id = 2 LIMIT 100
You can also create an index on that column for fast retrieval of the results
ALTER TABLE `models` ADD INDEX `category_id` (`category_id`)
EDIT:-
If you have an index on the columns you filter by, then you generally don't have to worry about performance here; database engines are smart enough to take care of it.
My question is what would be the impact on performance? Would the
query run slow? Should I consider splitting my models table into
separate tables
No, you don't have to split your tables, as that would not help you gain performance.
I'm fairly new to SQL; however, I would first index the column.
I agree with R.T.'s solution. In addition, I can recommend the link below:
https://indexanalysis.codeplex.com/
Download the SQL code. It's a stored procedure that helps me a lot when I want to analyze the impact of the indexes or see what status they have in my database.
Please check.
My Table Schema is
CREATE TABLE ITEMS(Time , Name, Token) PRIMARY_KEY(Time, NAME).
Where Time is the timestamp at which the item is created. When I run the following query
SELECT Name, Token from ITEMS where name = 'shoes'
it takes a while to load the data, as my table has more than a million rows.
Do I need to add an INDEX for faster retrieval of data? I already have an INDEX for this table, as there is a PRIMARY KEY.
You need a separate index for name. The primary key index can handle name, but only in conjunction with time.
If you defined it instead as:
PRIMARY_KEY(Name, Time)
Then your query could take advantage of the index.
MySQL has pretty good documentation on composite (multiple-column) indexes.
When you create an index using PRIMARY_KEY(Time, NAME), the entries are ordered by Time first and by NAME only within each Time value. MySQL therefore cannot use this index to look up rows by NAME alone.
BTW, you can get a lot of useful hints from the query optimiser if you put the EXPLAIN keyword in front of your query like this:
EXPLAIN SELECT Name, Token from ITEMS where name = 'shoes'
Keep your eye on the "rows" column, which estimates how many records MySQL needs to examine, and on "Using where" in the Extra column, which means rows are filtered after being fetched rather than resolved through an index. No need to wait or test blind.
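To see the leading-column effect in action, here is a small sketch that uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. The column types, the sample data and the idx_items_name index name are assumptions made for illustration, and the exact plan wording depends on the SQLite version.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (time INTEGER, name TEXT, token TEXT, PRIMARY KEY (time, name))"
)
conn.executemany(
    "INSERT INTO items (time, name, token) VALUES (?, ?, ?)",
    [(t, "shoes" if t % 10 == 0 else "hat", f"tok{t}") for t in range(1000)],
)

def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

query = "SELECT name, token FROM items WHERE name = ?"

# With only PRIMARY KEY (time, name), a lookup by name cannot seek in the key.
before = plan(query, ("shoes",))

# A separate index whose leading column is name allows a direct lookup.
conn.execute("CREATE INDEX idx_items_name ON items (name)")
after = plan(query, ("shoes",))

print("before:", before)
print("after: ", after)
```

The second plan should mention idx_items_name, while the first one describes a scan of the table.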
If I have a query with ordering by a string column, like this...
SELECT * FROM foo ORDER BY name
...should I create an index for foo.name? (foo.name may be VARCHAR(255) or VARCHAR(400))
Obviously, you should create an index on name.
If you run queries that order by a column or filter on a column in the WHERE clause, then those columns should be indexed.
This will speed up how quickly you get results from the database.
But indexing should be used with caution. Too many indexes can slow down your database, because every write has to update them.
You should index the columns that are searched or ordered frequently.
It doesn't seem to make a difference in MySQL with my test table. My test table is small, though. EXPLAIN revealed that MySQL does a filesort in both cases, with and without the index.
It's better to check your own query with EXPLAIN to confirm.
What is the best way to store boolean values in a database if you want the best query performance and the least memory use on SELECT statements?
For example:
I have a table with 36 fields, 30 of which hold boolean values (zero or one), and I need to search for records using only the boolean fields that are true.
SELECT * FROM `myTable`
WHERE
`field_5th` = 1
AND `field_12th` = 1
AND `field_20` = 1
AND `field_8` = 1
Is there any solution?
If you want to store boolean values or flags there are basically three options:
Individual columns
This is reflected in your example above. The advantage is that you will be able to put indexes on the flags you intend to use most often for lookups. The disadvantage is that this will take up more space (since the minimum column size that can be allocated is 1 byte.)
However, if your column names are really going to be field_20, field_21, etc., then this is absolutely NOT the way to go. Numbered columns are a sign you should use either of the other two methods.
Bitmasks
As was suggested above you can store multiple values in a single integer column. A BIGINT column would give you up to 64 possible flags.
Values would be something like:
UPDATE `table` SET flags = flags | b'100';
UPDATE `table` SET flags = flags | b'10000';
Then the field would look something like: 10100
That would represent having two flag values set. To query for rows with a particular flag set, you would do
SELECT flags FROM `table` WHERE flags & b'100';
The advantage of this is that your flags are very compact space-wise. The disadvantage is that you can't place indexes on the field which would help improve the performance of searching for specific flags.
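The bitmask approach can be sketched end to end as follows, assuming an in-memory SQLite database in place of MySQL. The things table and the FLAG_A / FLAG_B constants are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical flag constants: one bit per flag.
FLAG_A = 0b00100
FLAG_B = 0b10000

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE things (id INTEGER PRIMARY KEY, flags INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO things (id) VALUES (?)", [(1,), (2,), (3,)])

# Setting a flag ORs it into the current value, so flags already set survive.
conn.execute("UPDATE things SET flags = flags | ? WHERE id IN (1, 2)", (FLAG_A,))
conn.execute("UPDATE things SET flags = flags | ? WHERE id = 1", (FLAG_B,))

row1_flags = conn.execute("SELECT flags FROM things WHERE id = 1").fetchone()[0]
print(format(row1_flags, "b"))

# Rows that have FLAG_A set (any non-zero result of the AND counts as true).
with_a = [r[0] for r in conn.execute(
    "SELECT id FROM things WHERE (flags & ?) <> 0 ORDER BY id", (FLAG_A,))]

# Rows that have both flags set: the masked value must equal the whole mask.
mask = FLAG_A | FLAG_B
with_both = [r[0] for r in conn.execute(
    "SELECT id FROM things WHERE (flags & ?) = ? ORDER BY id", (mask, mask))]

print(with_a, with_both)
```

Both lookups evaluate the expression on every row, which is the indexing drawback described above.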
One-to-many relationship
This is where you create another table, and each row there would have the id of the row it's linked to, and the flag:
CREATE TABLE main (
main_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE flag (
main_id INT UNSIGNED NOT NULL,
name VARCHAR(16)
);
Then you would insert multiple rows into the flag table.
The advantage is that you can use indexes for lookups, and you can have any number of flags per row without changing your schema. This works best for sparse values, where most rows do not have a value set. If every row needs all flags defined, then this isn't very efficient.
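A sketch of how a lookup against the flag table might work, again assuming an in-memory SQLite database in place of MySQL. The flag names and the composite primary key on (name, main_id) are assumptions added for the example; the key keeps each flag unique per row and gives lookups by flag name an index to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main (main_id INTEGER PRIMARY KEY);
    CREATE TABLE flag (
        main_id INTEGER NOT NULL,
        name VARCHAR(16) NOT NULL,
        -- Assumed key: one row per (flag, main row), searchable by flag name.
        PRIMARY KEY (name, main_id)
    );
""")
conn.executemany("INSERT INTO main (main_id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO flag (main_id, name) VALUES (?, ?)",
    [(1, "active"), (1, "verified"),
     (2, "active"),
     (3, "verified"), (3, "active"), (3, "admin")],
)

# Find main rows that have ALL of the wanted flags set.
wanted = ["active", "verified"]
placeholders = ", ".join("?" for _ in wanted)
sql = f"""
    SELECT main_id
    FROM flag
    WHERE name IN ({placeholders})
    GROUP BY main_id
    HAVING COUNT(*) = ?   -- one matching row per wanted flag
    ORDER BY main_id
"""
matches = [r[0] for r in conn.execute(sql, (*wanted, len(wanted)))]
print(matches)
```

Counting with COUNT(*) relies on the assumed primary key; without a uniqueness guarantee, COUNT(DISTINCT name) would be the safer choice.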
For a performance comparison, you can read a blog post I wrote on the topic:
Set Performance Compare
Also, when you ask which is "best", that is a very subjective question. Best at what? It all really depends on what your data looks like, what your requirements are and how you want to query it.
Keep in mind that if you want to do a query like:
SELECT * FROM table WHERE some_flag=true
Indexes will only help you if few rows have that value set. If most of the rows in the table have some_flag=true, then MySQL will ignore the index and do a full table scan instead.
How many rows of data are you querying over? You can store the boolean values in an integer value and use bit operations to test for them. It's not indexable, but storage is very well packed. Using TINYINT fields with indexes would pick one index to use and scan from there.
Assume I have this table:
create table table_a (
id int,
name varchar(25),
address varchar(25),
primary key (id)
) engine = innodb;
When I run this query:
select * from table_a where id >= 'x' and name = 'test';
How will MySQL process it? Will it pull all the id's first (assume 1000 rows) then apply the where clause name = 'test'?
Or while it looks for the ids, it is already applying the where clause at the same time?
As id is the PK (and no index on name) it will load all rows that satisfy the id based criterion into memory after which it will filter the resultset by the name criterion. Adding a composite index containing both fields would mean that it would only load the records that satisfy both criteria. Adding a separate single column index on the name field may not result in an index merge operation, in which case the index would have no effect.
Do you have indexes on either column? That may affect the execution plan. The other thing is that you might cast 'x' to an integer (for example, CAST('x' AS SIGNED) in MySQL) to ensure a numeric comparison instead of a string comparison.
For the best result, you should have a single index which includes both of the columns id and name.
In your case, I can't say what effect the primary index has on that query. That depends on the DBMS and its version. If you really don't want to add another index (because more indexes mean slower writes and updates), just populate your table with something like 10,000,000 random rows, try it and see the effect.
You can compare the execution times by running the query first with id first in the WHERE clause, and then swapping the conditions so that name comes first. To see an example of MySQL performance with indexes, check this out: http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
You can get information on how the query is processed by running EXPLAIN on the query.
If the idea is to optimize that query then you might want to add an index like:
alter table table_a add unique index name_id_idx (name, id);
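As a rough way to try this out, the sketch below builds the table in an in-memory SQLite database (an assumption; the question is about MySQL/InnoDB), adds the same (name, id) index and prints the query plan. The sample data and the threshold of 500 are made up, and the plan text depends on the SQLite version, so treat the printed plan as something to inspect rather than a guaranteed result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table_a (
        id int,
        name varchar(25),
        address varchar(25),
        PRIMARY KEY (id)
    )
""")
# Made-up data: every 50th row is named 'test'.
conn.executemany(
    "INSERT INTO table_a (id, name, address) VALUES (?, ?, ?)",
    [(i, "test" if i % 50 == 0 else f"name{i}", f"addr{i}") for i in range(1, 1001)],
)

# Same shape as the suggested index: equality column first, range column second.
conn.execute("CREATE UNIQUE INDEX name_id_idx ON table_a (name, id)")

query = "SELECT * FROM table_a WHERE id >= ? AND name = ? ORDER BY id"
params = (500, "test")

# Ask the planner how it intends to run the query.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, params):
    print(row[3])

ids = [r[0] for r in conn.execute(query, params)]
print(ids)
```

Putting name first lets the index narrow to the matching name with an equality test and then apply the id range within that slice, which is why the column order in the index matters.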