MySQL questions are some of my favorites on StackOverflow.
Unfortunately, things like this:
SELECT foo, bar, baz, quux, frozzle, lambchops FROM something JOIN somethingelse ON 1=1 JOIN (SELECT * FROM areyouserious) v ON 0=5 WHERE lambchops = 'good';
make my eyes bleed.
Also, attempts at describing your schema often go like this:
I have a table CrazyTable with a column that is a date and it has a primary key of Foo_Key but I want to join on SOMETABLE using a substring of column_bar (which is in CrazyTable) which pertains to the phase of the moon (which I store in moon_phases as a thrice-serialized PHP array).
Here is an example of a question I asked that, had I not followed the steps below, would never have gotten a satisfactory answer from anyone: I have no shame..
I will answer below with what helps me the most with getting the best answer to your question. What helps you?
Use SHOW CREATE TABLE
This tells me more about your tables than your words ever could:
mysql> show create table magic\G
*************************** 1. row ***************************
Table: magic
Create Table: CREATE TABLE `magic` (
`id` int(11) DEFAULT NULL,
`what` varchar(255) DEFAULT NULL,
`the` datetime DEFAULT NULL,
`heck` text,
`soup_is_good` double DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
CAVEAT: If you have 70 columns in your table, omit the unnecessary ones. What's necessary?
Fields JOINed on
Fields SELECTed
Fields WHEREed on
Use EXPLAIN
This allows me to see how best to optimize your currently working, yet presumably slow query:
mysql> explain select * from magic\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: magic
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra:
1 row in set (0.00 sec)
Use \G
Having to scroll right is generally an inconvenience.
Usual:
mysql> select * from magic;
+------------+-------------------------------+---------------------+-------------------+--------------+
| id | what | the | heck | soup_is_good |
+------------+-------------------------------+---------------------+-------------------+--------------+
| 1000000000 | A really long text string yay | 2009-07-29 22:28:17 | OOOH A TEXT FIELD | 100.5 |
+------------+-------------------------------+---------------------+-------------------+--------------+
1 row in set (0.00 sec)
Better:
mysql> select * from magic\G
*************************** 1. row ***************************
id: 1000000000
what: A really long text string yay
the: 2009-07-29 22:28:17
heck: OOOH A TEXT FIELD
soup_is_good: 100.5
1 row in set (0.00 sec)
CAVEAT: \G obviously turns one row of data into several. This becomes equally cumbersome for several rows of data. Do what looks best.
Use an external pastebin for obnoxiously large chunks of data:
Pastie
gist.github
Let us know your expectations
Slow? - We don't know what slow is to you. Seconds, minutes, hours? It helps to know.
Faster - We don't know this either. What's your expectation of fast?
Frequency - Is this a query that you plan to run just once? Daily? Hundreds or thousands of times a day? This helps us know when it's Good Enough.
Procedure Analyse
select * from yourtable procedure analyse()\G
The above will let others know the max and min values stored in the table. That helps. (Note that PROCEDURE ANALYSE() is deprecated as of MySQL 5.7.18 and was removed in MySQL 8.0.)
Knowing which indexes you have on the tables concerned is vital, imo. You state you are using a substring of column_bar in the WHERE clause; you may need to denormalize, store this substring in another column, and index that column. Then again, low cardinality can make an index on a column worthless, for example if only 2 distinct values are present. For a useful video tutorial on Performance Tuning Best Practices, watch this YouTube video by Jay Pipes.
How do SQL engines differ when the equals sign and the IN operator are used with the same single value? Does the execution time change?
1st one using equality check operator
WHERE column_value = 'All'
2nd one using IN operator and single value
WHERE column_value IN ('All')
Does the SQL engine change IN to = if there is only one value?
Is there any difference in this respect between MySQL and PostgreSQL?
There is no difference between those two statements, and the optimiser will transform the IN to the = when IN has just one element in it.
Though when you have a question like this, just run both statements, look at their execution plans, and see the differences. Here, you won't find any.
After a big search online, I found a document on SQL to support this (I assume it applies to all DBMS):
If there is only one value inside the parenthesis, this commend [sic] is equivalent to,
WHERE "column_name" = 'value1'
Here is the execution plan of both queries in Oracle (most DBMS will process this the same):
EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number = '123456789'
Plan hash value: 2312174735
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
| 2 | INDEX UNIQUE SCAN | SYS_C0029838 |
-----------------------------------------------------
And for IN() :
EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number in('123456789');
Plan hash value: 2312174735
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
| 2 | INDEX UNIQUE SCAN | SYS_C0029838 |
-----------------------------------------------------
As you can see, both are identical. This is on an indexed column. Same goes for an unindexed column (just full table scan).
There are no big differences really, but if your column_value is indexed, the IN operator may not use the index.
I encountered this problem once, so be careful.
There is no difference when you use IN with a single value. If you check whether each of the two queries does a table scan, index scan, or index seek, you will find no difference between them.
Is there any difference for same in MySQL and PostgreSQL?
No, there is no difference on either engine (in fact, it is the same for most databases, including SQL Server, Oracle, etc.). Both engines will convert IN to =.
Teach a man to fish, etc. Here's how to see for yourself what variations on your queries will do:
mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id = "AMH"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
And let's try it the other way:
mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id in ("AMH")\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
You can read here about how to interpret the results of a MySQL EXPLAIN request. For now, note that we got identical output for both queries: exactly the same "execution plan" is generated. The type row tells us that the query uses a non-unique index (a foreign key, in this case), and the ref row tells us that the query is executed by comparing a constant value against this index.
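Comparing two EXPLAIN outputs by eye gets tedious once plans grow beyond one row. Below is a minimal Python sketch, assuming you have pasted the \G-style output of both queries into strings. The helper names parse_vertical and plan_diff are made up for illustration, and the sample text is simply the output shown above.

```python
def parse_vertical(text):
    """Parse vertical mysql-client output into a list of dicts, one per row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("***"):
            # "*** 1. row ***" separators start a new result row
            rows.append({})
            continue
        if rows and ":" in line:
            # split on the first colon only, so datetime values survive intact
            key, _, value = line.partition(":")
            rows[-1][key.strip()] = value.strip()
    return rows


def plan_diff(plan_a, plan_b):
    """Return {(row_index, column): (value_a, value_b)} for every difference."""
    diffs = {}
    for i in range(max(len(plan_a), len(plan_b))):
        a = plan_a[i] if i < len(plan_a) else {}
        b = plan_b[i] if i < len(plan_b) else {}
        for col in sorted(set(a) | set(b)):
            if a.get(col) != b.get(col):
                diffs[(i, col)] = (a.get(col), b.get(col))
    return diffs


EQUALS_PLAN = """
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
"""

IN_PLAN = """
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where
"""

# An empty dict means the two plans are identical column for column
print(plan_diff(parse_vertical(EQUALS_PLAN), parse_vertical(IN_PLAN)))
```

An empty result means the optimizer produced the same plan for both spellings; any entry in the dict points at the exact row and column where the plans diverge.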
For a single-value IN clause, there is no difference. Below is a demo using an EMPS table I have:
select * from emps where empid in (1)
select * from emps where empid=1
Predicate for First Query in execution plan:
[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[#1],0)
Predicate for second query in execution plan:
[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[#1],0)
If you have multiple values in the IN clause, it's better to convert them to joins.
Just to add a different perspective, one of the main points of RDBMS systems is that they will rewrite your query for you, and pick the best execution plan for that query and all equivalent ones. This means that as long as two queries are logically identical, they should always generate the same execution plan on a given RDBMS.
That being said, many queries are equivalent (same result set) but only because of constraints the database itself is unaware of, so be careful about those cases (e.g., for a flag field with numbers 1-6, the db doesn't know <3 is the same as IN (1,2)). But at the end of the day, if you're just thinking about legibility of AND and OR statements, it won't make a difference for performance which way you write them.
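The flag example can be checked with a few lines of Python. This is only a sketch: the predicates mirror `flag < 3` and `flag IN (1,2)`, and the "unconstrained" sample values are hypothetical stand-ins for what an integer column could hold if nothing in the schema restricts it to 1-6.

```python
def less_than_three(flag):
    # mirrors: WHERE flag < 3
    return flag < 3


def in_one_or_two(flag):
    # mirrors: WHERE flag IN (1, 2)
    return flag in (1, 2)


# The domain the application guarantees (flag values 1 through 6)
constrained = range(1, 7)

# Hypothetical values an integer column could hold without any constraint
unconstrained = [-1, 0, 1, 2, 3, 6, 7]

same_on_constrained = all(
    less_than_three(f) == in_one_or_two(f) for f in constrained
)
disagreements = [
    f for f in unconstrained if less_than_three(f) != in_one_or_two(f)
]

print(same_on_constrained)  # True
print(disagreements)        # [-1, 0]
```

The two predicates agree only inside the 1-6 range. Since the database cannot assume that range unless it is told about it, it has to treat them as different conditions.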
The MySQL 5.7 documentation states:
The filtered column indicates an estimated percentage of table rows that will be filtered by the table condition. That is, rows shows the estimated number of rows examined and rows × filtered / 100 shows the number of rows that will be joined with previous tables.
To attempt to understand this better, I tried it out on a query using the MySQL Sakila Sample Database. The table in question has the following structure:
mysql> SHOW CREATE TABLE film \G
*************************** 1. row ***************************
Table: film
Create Table: CREATE TABLE `film` (
`film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` text,
`release_year` year(4) DEFAULT NULL,
`language_id` tinyint(3) unsigned NOT NULL,
`original_language_id` tinyint(3) unsigned DEFAULT NULL,
`rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
`rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
`length` smallint(5) unsigned DEFAULT NULL,
`replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
`rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G',
`special_features` set('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`film_id`),
KEY `idx_title` (`title`),
KEY `idx_fk_language_id` (`language_id`),
KEY `idx_fk_original_language_id` (`original_language_id`),
CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE,
CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8
And this is the EXPLAIN plan for the query:
mysql> EXPLAIN SELECT * FROM film WHERE release_year=2006 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1000
filtered: 10.00
Extra: Using where
This table's sample dataset has 1,000 total rows, and all of them have release_year set to 2006. Using the formula in the MySQL documentation:
rows x filtered / 100 = "number of rows that will be joined with previous tables"
So,
1,000 x 10 / 100 = 100 = "100 rows will be joined with previous tables"
Huh? What "previous table"? There is no JOIN going on here.
What about the first portion of the quote from the documentation? "Estimated percentage of table rows that will be filtered by the table condition." Well, the table condition is release_year = 2006, and all records have that value, so shouldn't filtered be either 0.00 or 100.00 (depending on what they mean by "filtered")?
Maybe it's behaving strangely because there's no index on release_year? So I created one:
mysql> CREATE INDEX test ON film(release_year);
The filtered column now shows 100.00. So, shouldn't it have shown 0.00 before I added the index? Hm. What if I make half the table have release_year be 2006, and the other half not?
mysql> UPDATE film SET release_year=2017 ORDER BY RAND() LIMIT 500;
Query OK, 500 rows affected (0.03 sec)
Rows matched: 500 Changed: 500 Warnings: 0
Now the EXPLAIN looks like this:
mysql> EXPLAIN SELECT * FROM film WHERE release_year=2006 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
partitions: NULL
type: ref
possible_keys: test
key: test
key_len: 2
ref: const
rows: 500
filtered: 100.00
Extra: Using index condition
And, since I decided to confuse myself even further:
mysql> EXPLAIN SELECT * FROM film WHERE release_year!=2006 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
partitions: NULL
type: ALL
possible_keys: test
key: NULL
key_len: NULL
ref: NULL
rows: 1000
filtered: 50.10
Extra: Using where
So, an estimate of 501 rows will be filtered by the table condition and "joined with previous tables"?
I simply do not understand.
I realize it's an "estimate", but on what is this estimate based? If an index being present moves the estimate to 100.00, shouldn't its absence be 0.00, not 10.00? And what's with that 50.10 result in the last query?
Is filtered at all useful in determining if a query can be optimized further, or how to optimize it further, or is it generally just "noise" that can be ignored?
…number of rows that will be joined with previous tables…
In the absence of any joins, I believe this can be taken to mean number of rows
UPDATE - the documentation, now at least, says "following tables" but the point still stands, thanks @WilsonHauck
To take each of your examples in turn
1000 rows, all from 2006, no index…
EXPLAIN SELECT * FROM film WHERE release_year = 2006
key: NULL
rows: 1000
filtered: 10.00
Extra: Using where
Here the engine expects to visit 1000 rows, and expects to return around 10% of these
As the query is not using an index, it makes sense to predict that every row will be checked, but unfortunately the filtered estimate is inaccurate. I don't know how the engine makes this prediction, but as it doesn't know all the rows are from 2006 (until it checks them), it's not the craziest thing in the world
Perhaps in the absence of further information, the engine expects any simple = condition to reduce the result set to 10% of the available rows
1000 rows, half from 2006, with index…
EXPLAIN SELECT * FROM film WHERE release_year = 2006
key: test
rows: 500
filtered: 100.00
Extra: Using index condition
Here the engine expects to visit 500 rows and expects to return all of them
Now the query is using the new index, the engine can make more accurate predictions. It can very quickly see that 500 rows match the condition, and will have to visit only and exactly these to satisfy the query
EXPLAIN SELECT * FROM film WHERE release_year != 2006
key: NULL
rows: 1000
filtered: 50.10
Extra: Using where
Here the engine expects to visit 1000 rows and return 50.10% of them
The engine has opted not to use the index, maybe the != operation is not quite as simple as = in this case, and therefore it makes sense to predict that every row will be visited
The engine has, however, made a fairly accurate prediction on how many of these visited rows will be returned. I don't know where the .10% comes from, but perhaps the engine has used the index or the results of previous queries to recognise that around 50% of the rows will match the condition
It's a bit of a dark art, but the filtered value does give you some fairly useful information, and some insight into why the engine has made certain decisions
If the number of rows is high and the filtered rows estimate is low (and accurate), it may be a good indication that a carefully applied index could speed up the query
how can I make use of it?
High numbers (ideally filtered: 100.00) indicate that the query is using a "good" index, or that an index would be useless.
Consider a table with a deleted_at TIMESTAMP NULL column (soft deletion) without an index on it, and like 99% of rows contain NULL (are not deleted). Now with a query like
SELECT * FROM my_table WHERE deleted_at IS NULL
you might see
filtered: 99.00
In this case an index on deleted_at would be useless, due to the overhead of a second lookup (finding the filtered rows in the clustered index). In the worst case the index might even hurt performance, if the optimizer decides to use it.
But if you query for "deleted" rows with
SELECT * FROM my_table WHERE deleted_at IS NOT NULL
you should get something like
filtered: 1.00
The low number indicates that the query could benefit from an index. If you now create the index on (deleted_at), EXPLAIN will show you
filtered: 100.00
I would say: for anything >= 10% it is not worth creating an index. That holds at least for single-column conditions.
It is a different story when you have a condition on multiple columns like
WHERE a=1 AND b=2
Assuming 1M rows in the table and a cardinality of 10 for both columns (each column contains 10 distinct values), randomly distributed, with an index on (a) the engine would analyze 100K rows (10% due to the index on a) and return 10K rows (10% of 10% due to the condition on b). EXPLAIN should show you rows: 100000, filtered: 10.00. In this case extending the single-column index on (a) to a composite index on (a, b) should improve the query time by a factor of 10, and EXPLAIN should show you rows: 10000, filtered: 100.00.
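The arithmetic in the paragraph above can be written down as a small Python sketch. It assumes exactly what the paragraph assumes (uniform, independent value distributions) and uses the documented relation rows × filtered / 100. The function name explain_estimate is made up for illustration; a real optimizer works from table statistics and will often report different numbers.

```python
def explain_estimate(table_rows, indexed_cardinalities, unindexed_cardinalities):
    """Idealized EXPLAIN numbers under uniform, independent distributions.

    indexed_cardinalities:   distinct-value counts of the equality columns
                             that the chosen index covers
    unindexed_cardinalities: distinct-value counts of the equality columns
                             that must be checked after reading each row
    """
    rows = table_rows
    for cardinality in indexed_cardinalities:
        rows //= cardinality          # the index narrows the rows examined
    filtered = 100.0
    for cardinality in unindexed_cardinalities:
        filtered /= cardinality       # remaining conditions filter those rows
    surviving = rows * filtered / 100  # rows x filtered / 100
    return rows, filtered, surviving


# WHERE a=1 AND b=2 with an index on (a) only
print(explain_estimate(1_000_000, [10], [10]))   # (100000, 10.0, 10000.0)

# Same query with a composite index on (a, b)
print(explain_estimate(1_000_000, [10, 10], []))  # (10000, 100.0, 10000.0)
```

Both variants end with the same 10,000 surviving rows; the composite index just gets there after examining a tenth as many rows, which is where the factor of 10 comes from.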
However, all of that is mostly theory. The reason: I often see filtered: 100.00 when it should rather be 1.00, at least for low-cardinality columns and at least on MariaDB. That might be different for MySQL (I can't test that right now), but your example shows a similar behavior (10.00 instead of 100.00).
Actually I don't remember when the filtered value has ever helped me. First things I look at are: The order of the tables (if it's a JOIN), the used key, the used key length and the number of examined rows.
From existing 5.7 documentation today at url
https://dev.mysql.com/doc/refman/5.7/en/explain-output.html
filtered (JSON name: filtered)
The filtered column indicates an estimated percentage of table rows that will be filtered by the table condition. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering. rows shows the estimated number of rows examined and rows × filtered shows the number of rows that will be joined with the following table. For example, if rows is 1000 and filtered is 50.00 (50%), the number of rows to be joined with the following table is 1000 × 50% = 500.
You would have to write one of these engines yourself to understand it perfectly, but the estimate is based not on the contents themselves but on metadata and statistics about the contents.
Let me give you a specific, made-up example. I'm not saying any SQL platform does what I describe here; this is just an illustration:
You have a table with 1000 rows, the max value of the year column is 2010, and the min value is 2000. Without any other information, you can "guess" that WHERE year = 2007 will match about 10% of all rows, assuming an even distribution.
In this case EXPLAIN would report rows: 1000 and filtered: 10.
To answer your final question filtered might be useful if (as shown above) you only have one "default" value that is throwing everything off -- you might decide to use say null instead of a default to get your queries to perform better. Or you might see that statistics needs to be run on your tables more often because the ranges change a lot. This depends a lot on a given platform and your data model.
I find the "filtered" column to be useless.
EXPLAIN (today) uses crude statistics to derive many of the numbers it shows. "Filtered" is an example of how bad they can be.
To get even deeper into numbers, run EXPLAIN FORMAT=JSON SELECT ... This, in newer versions of MySQL, will provide the "cost" for each possible execution plan. Hence, it gives you clues of what options it thought about and the "cost basis" for the plan that was picked. Unfortunately, it uses a constant for fetching a row -- giving no weighting to whether the row came from disk or was already cached.
A more precise metric of what work was done can be derived after the fact via the STATUS "Handler%" values. I discuss that, plus simple optimization techniques in http://mysql.rjweb.org/doc.php/index_cookbook_mysql .
Histograms exist in MySQL 8.0 and MariaDB 10.0; they will provide more precision. They probably help make "filtered" be somewhat useful.
Is there a problem, issue, or performance hit when using duplicate WHERE clauses?
Example SQL code:
SELECT * FROM `table`
WHERE `field` = 1
AND `field` = 1
AND `field2` = 22
AND `field2` = 22
Does the optimizer eliminate the duplicates?
The WHERE clause works like an if condition in any programming language. It is used to compare a given value with the field value stored in a MySQL table.
If the given value is equal to the field value in the table, the row is returned.
You won't face any problems or issues by having duplicate conditions, but at a larger scale this might slightly decrease performance.
EDIT:
You can issue an EXPLAIN statement, which tells MySQL to display some information about how it would execute a SELECT query without actually executing it.
This way you can see exactly what is going to be executed.
To use EXPLAIN, just put the word EXPLAIN in front of the SELECT statement:
mysql> EXPLAIN SELECT * FROM `table` WHERE 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra: Impossible WHERE
Normally, EXPLAIN returns more information than that, including non-NULL information about the indexes that will be used to scan tables, the types of joins that will be used, and estimates of the number of rows that will need to be examined from each table.
I have a query like this:
$query = "SELECT * FROM tbl_comments WHERE id=222 ORDER BY comment_time";
Do I need to add an index on the comment_time field?
Also, if I want to get the data between two dates then how should I build the index?
Yes, an index will help you when using ORDER BY. Because an index is a sorted data structure, the request will be executed faster.
Look at this example: table test2 with 3 rows. I used LIMIT after ORDER BY to show the difference in execution.
DROP TABLE IF EXISTS `test2`;
-- The ix_value index is deliberately left out here; it is created after the first EXPLAIN
CREATE TABLE `test2` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` varchar(10) CHARACTER SET utf8 COLLATE utf8_swedish_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;
-- ----------------------------
-- Records of test2
-- ----------------------------
INSERT INTO `test2` VALUES ('1', '10');
INSERT INTO `test2` VALUES ('2', '11');
-- id 3, not 2: a repeated primary key value would be rejected
INSERT INTO `test2` VALUES ('3', '9');
-- ----------------------------
-- Without INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value LIMIT 1\G
*************************** 1. row *************************
id: 1
select_type: SIMPLE
table: test2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
1 row in set (0.00 sec)
MySQL checked 3 rows to output the result.
After CREATE INDEX, we get this:
mysql> CREATE INDEX ix_value ON test2 (value) USING BTREE;
Query OK, 0 rows affected (0.14 sec)
-- ----------------------------
-- With INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value LIMIT 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test2
type: index
possible_keys: NULL
key: ix_value
key_len: 32
ref: NULL
rows: 1
Extra: Using index
1 row in set (0.00 sec)
Now MySQL examined only 1 row.
In response to the comments I received, I tried the same query without LIMIT:
-- ----------------------------
-- Without INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value\G
*************************** 1. row ******************
id: 1
select_type: SIMPLE
table: test2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
-- ----------------------------
-- With INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value\G
*************************** 1. row *****************
id: 1
select_type: SIMPLE
table: test2
type: index
possible_keys: NULL
key: ix_value
key_len: 32
ref: NULL
rows: 3
Extra: Using index
As we can see, the second query (run with the index in place) uses the index to satisfy the ORDER BY.
To build an index on your field, use this:
CREATE INDEX ix_comment_time ON tbl_comments (comment_time) USING BTREE;
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
An index on the comment_time field might not help at all for a query like this:
SELECT *
FROM tbl_comments
WHERE id=222
ORDER BY comment_time;
The query needs to scan the table to find the matching id values. It can do this by scanning the index, looking up the rows, and doing the test. If there is one row that matches and it has the highest comment_time, then this requires scanning the whole index and reading the table.
Without the index, it would scan the table, find the row, and very quickly sort the 1 row. The sequential scan of the table would typically be faster than an index scan followed by a page lookup (and would definitely be faster on a table larger than available memory).
On the other hand, an index on id, comment_time would be very helpful.
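Why the composite index helps can be illustrated without a database. The Python sketch below models an index on (id, comment_time) as a sorted list of tuples; the sample rows and the function name comments_for are hypothetical, and a real B-tree is of course more involved than a sorted list.

```python
from bisect import bisect_left

# Hypothetical sample rows: (id, comment_time, text), in insertion order
comments = [
    (222, "2013-05-03 09:00:00", "third"),
    (111, "2013-05-01 12:00:00", "other thread"),
    (222, "2013-05-01 08:30:00", "first"),
    (333, "2013-05-02 10:00:00", "another thread"),
    (222, "2013-05-02 17:45:00", "second"),
]

# Model of an index on (id, comment_time): entries sorted by id first,
# then by comment_time, each pointing back at its row position
composite_index = sorted(
    (cid, ts, pos) for pos, (cid, ts, _) in enumerate(comments)
)


def comments_for(cid):
    """Return the texts for one id, already ordered by comment_time."""
    lo = bisect_left(composite_index, (cid,))      # first entry with this id
    hi = bisect_left(composite_index, (cid + 1,))  # first entry of the next id
    # The slice is contiguous and already sorted by comment_time,
    # so no separate sort step is needed.
    return [comments[pos][2] for _, _, pos in composite_index[lo:hi]]


print(comments_for(222))  # ['first', 'second', 'third']
```

All entries for id 222 sit next to each other in the index and are already in comment_time order, so the lookup and the ORDER BY are both satisfied by one contiguous range read.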
Technically you don't need an index on any field; the query will work without one. For performance reasons, however, you might need one or more.
EDIT
This trade-off has been known since the beginning of software design. Typically, if you increase the amount of memory used by a program, you can increase its speed (assuming the program is well written). Adding an index to a field increases the amount of data stored by the db, but makes searching faster. If you do not want to search by this field (in the question you actually do), the index would not be necessary.
Nowadays indices are not that big compared to the size of the data on disk, so adding one or more should not be a bad idea.
Normally it is very difficult to say for sure whether you need an index or not. The EXPLAIN statement provides some help (refer to the manual).
Regarding your first question, you don't have to create an index on comment_time. If the number of records is very large, you'll need indices to speed up retrieval, but for this operation you don't need them.
For your second question using a WHERE Clause like this will help you.
WHERE(comment_time BETWEEN 'startDate' AND 'endDate');
You don't have to put an index on comment_time if the id in your WHERE clause is distinct.
To increase the speed of data retrieval you would need an index, but the query will also work without one. For your second question, you can use a WHERE clause with BETWEEN.
Refer: http://www.w3schools.com/sql/sql_between.asp
The EXPLAIN statement is very useful in situations like that. For your query, you would use it as follows:
EXPLAIN SELECT * FROM tbl_comments WHERE id=222 ORDER BY comment_time
This will output which indexes are being used to execute the query and allows you to experiment with different indexes to find the best configuration. To speed up sorting, you will want a BTREE index, since it stores data in sorted order. To speed up finding items with a certain id, a HASH index is in principle the better option, since it provides quick lookups for equality predicates; note, however, that in MySQL only some storage engines (such as MEMORY) support HASH indexes, and InnoDB uses BTREE even if you ask for HASH. Note also that MySQL might not be able to use a combination of both indexes to execute your query and will instead use just one of them.
Further information: http://dev.mysql.com/doc/refman/5.7/en/using-explain.html
For range predicates, like dates in a range of dates, a BTREE index will perform better than a HASH index.
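The difference between the two index types can be sketched in Python, with a dict standing in for a HASH index and a sorted list standing in for a BTREE index. The sample rows and the function names equal_to and between are hypothetical; this only shows the access pattern, not how MySQL implements either structure.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (row_id, comment_date)
rows = [
    (1, "2013-05-01"),
    (2, "2013-05-04"),
    (3, "2013-05-02"),
    (4, "2013-05-09"),
    (5, "2013-05-03"),
]

# HASH-like index: direct lookup by exact value, no ordering between keys
hash_index = {}
for row_id, day in rows:
    hash_index.setdefault(day, []).append(row_id)

# BTREE-like index: entries kept in sorted order
sorted_index = sorted((day, row_id) for row_id, day in rows)


def equal_to(day):
    """Equality predicate: one hash lookup."""
    return hash_index.get(day, [])


def between(start, end):
    """Range predicate: two binary searches, then a contiguous slice."""
    lo = bisect_left(sorted_index, (start,))
    hi = bisect_right(sorted_index, (end, float("inf")))
    return [row_id for _, row_id in sorted_index[lo:hi]]


print(equal_to("2013-05-04"))               # [2]
print(between("2013-05-02", "2013-05-04"))  # [3, 5, 2]
```

The dict answers the equality lookup directly, but it has no notion of "the next date", so a range query would have to inspect every key. The sorted structure finds both ends of the range by binary search and reads everything in between.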
Further information: http://dev.mysql.com/doc/refman/5.7/en/create-index.html