Dear StackOverflow Members
It's my first post, so please be nice :-)
I have run into some strange SQL behavior which I can't explain, and I can't find any resources that explain it.
I have built a web honeypot which records all accesses and attacks and displays them on a statistics page.
However, as the data has grown, generating the statistics page has become slower and slower.
I narrowed it down to a few SELECT statements which take quite a long time.
The "issue" seems to be an index on a specific column.
*For sure the real issue is my lack of knowledge :-)
Database: mysql
DB schema
Event Table (unrelated columns removed):
Event table size: 30MB
Event table records: 335k
CREATE TABLE `event` (
`EventID` int(11) NOT NULL,
`EventTime` datetime NOT NULL DEFAULT current_timestamp(),
`WEBURL` varchar(50) COLLATE utf8_bin DEFAULT NULL,
`IP` varchar(15) COLLATE utf8_bin NOT NULL,
`AttackID` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
ALTER TABLE `event`
ADD PRIMARY KEY (`EventID`),
ADD KEY `AttackID` (`AttackID`);
ALTER TABLE `event`
ADD CONSTRAINT `event_ibfk_1` FOREIGN KEY (`AttackID`) REFERENCES `attack` (`AttackID`);
Attack Table
Attack table size: 32KB
Attack table records: 11
CREATE TABLE attack (
`AttackID` int(4) NOT NULL,
`AttackName` varchar(30) COLLATE utf8_bin NOT NULL,
`AttackDescription` varchar(70) COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
ALTER TABLE `attack`
ADD PRIMARY KEY (`AttackID`);
SLOW Query:
SELECT COUNT(EventID), IP
FROM event
WHERE AttackID > 0
GROUP BY IP
ORDER BY COUNT(EventID) DESC
LIMIT 5;
RESULT: 5 rows in set (1.220 sec)
(This seems quite long to me, for such a simple query.)
[screenshot: QuerySlow]
Now the strange thing:
If I remove the foreign key relationship, the performance of the query stays the same.
But if I remove the index on event.AttackID, the same SELECT statement is much faster:
(ALTER TABLE `event` DROP INDEX `AttackID`;)
The result of the SQL SELECT query:
5 rows in set (0.242 sec)
[screenshot: QueryFast]
From my understanding, indexes on columns used in a WHERE clause should improve performance.
Why does removing the index have such an impact on this query?
What can I do to keep the relation between the tables and still get faster SELECT execution?
Cheers
Why does removing the index improve performance?
The query optimizer has multiple ways to resolve a query. For instance, two methods for filtering data are:
Look up the rows that match the where clause in the index and then fetch related data from the data pages.
Scan the index.
This doesn't get into the use of indexes for joins or aggregations or alternative algorithms.
Which is better? Under some circumstances, the first method is horribly slower than the second. This occurs when the data for the table does not fit into memory. Under such circumstances, using the index can mean reading a record from page 124, then from page 1068, then from page 124 again -- all sorts of random, intertwined reading of pages. Reading data pages in order is usually faster. And when the data doesn't fit into memory, thrashing occurs: a page in memory is aged out (overwritten) -- and then needed again.
I'm not saying that is occurring in your case. I am simply saying that what optimizers do is not always obvious. The optimizer has to make judgements based on the nature of the data -- and those judgements are not right 100% of the time. They are usually correct. But there are borderline cases. Sometimes, the issue is out-of-date statistics. Sometimes the issue is that what looks best to the optimizer is not best in practice.
Let me emphasize that optimizers usually do a very good job, and a better job than a person would do. Even if they occasionally come up with suboptimal plans, they are still quite useful.
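If you want to see which plan the optimizer picks in each case without actually dropping the index, you can compare EXPLAIN output; a minimal sketch using the table and index names from the question (IGNORE INDEX just hides the index from the optimizer for one query):
EXPLAIN SELECT COUNT(EventID), IP
FROM event
WHERE AttackID > 0
GROUP BY IP
ORDER BY COUNT(EventID) DESC
LIMIT 5;
-- Same query, with the AttackID index hidden from the optimizer:
EXPLAIN SELECT COUNT(EventID), IP
FROM event IGNORE INDEX (AttackID)
WHERE AttackID > 0
GROUP BY IP
ORDER BY COUNT(EventID) DESC
LIMIT 5;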
Get rid of your redundant UNIQUE KEYs. A primary key is a unique key.
Use COUNT(*) rather than COUNT(EventID) in your query. They mean the same thing because EventID is declared NOT NULL.
Your query can be much faster if you stop saying WHERE AttackID > 0. Because that column is a FK to the PK of your other table, those values should all be nonzero anyway. But to get that speedup you'll need an index on event(IP), something like this:
CREATE INDEX IpDex ON event (IP)
But you're still summarizing a large table, and that will always take time.
It looks like you want to display some kind of leaderboard. You could add a top_ips table, and use an EVENT to populate it, using your query, every few minutes. Then you could display it to your users without incurring the cost of the query every time. This of course would display slightly stale data; only you know whether that's acceptable in your app.
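A minimal sketch of that approach, with made-up table and event names (the event scheduler must be on: SET GLOBAL event_scheduler = ON):
-- Hypothetical leaderboard table.
CREATE TABLE top_ips (
  IP varchar(15) NOT NULL,
  EventCount int NOT NULL,
  PRIMARY KEY (IP)
);
-- Rebuild it every few minutes with the query from the question.
-- (In the mysql client, wrap this in DELIMITER $$ ... $$ because of the compound body.)
CREATE EVENT refresh_top_ips
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  DELETE FROM top_ips;
  INSERT INTO top_ips (IP, EventCount)
    SELECT IP, COUNT(*)
    FROM event
    WHERE AttackID > 0
    GROUP BY IP
    ORDER BY COUNT(*) DESC
    LIMIT 5;
END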
Pro Tip: Read https://use-the-index-luke.com by Markus Winand.
Essentially every part of your query, except for the FKey, conspires to make the query slow.
Your query is equivalent to
SELECT Count(*), IP
FROM event
WHERE AttackID >0
GROUP BY IP
ORDER BY Count(*) DESC
LIMIT 5;
Please use COUNT(*) unless you need to avoid NULL.
If AttackID is rarely >0, the optimal index is probably
ADD INDEX(AttackID, -- for filtering
IP) -- for covering
Else, the optimal index is probably
ADD INDEX(IP, -- to avoid sorting
AttackID) -- for covering
You could simply add both indexes and let the Optimizer decide. Meanwhile, get rid of these, if they exist:
ALTER TABLE event DROP INDEX `AttackID`;
ALTER TABLE event DROP INDEX `IpDex`;  -- or whatever the single-column index on IP is named
because any uses of them are handled by the new indexes.
Furthermore, leaving the 1-column indexes around can confuse the Optimizer into using them instead of the covering index. (This seems to be a design flaw in at least some versions of MySQL/MariaDB.)
"Covering" means that the query can be performed entirely in the index's BTree. EXPLAIN will indicate it with "Using index". A "covering" index speeds up a query by 2x -- but there is a very wide variation on this prediction. ("Using index condition" is something different.)
More on index creation: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Related
Hi, I currently have a query which takes 11 seconds to run. I have a report displayed on a website which runs 4 similar queries, each also taking 11 seconds. I don't really want the customer to have to wait a minute for all of these queries to run and display the data.
I am using 4 different AJAX requests to call APIs to get the data I need, and these all start at once, but the queries run one after another. If there were a way to get all of these queries to run at once (in parallel), so the total load time is only 11 seconds, that would also fix my issue, but I don't believe that is possible.
Here is the query I am running:
SELECT device_uuid,
day_epoch,
is_repeat
FROM tracking_daily_stats_zone_unique_device_uuids_per_hour
WHERE day_epoch >= 1552435200
AND day_epoch < 1553040000
AND venue_id = 46
AND zone_id IN (102,105,108,110,111,113,116,117,118,121,287)
I can't think of any way to speed this query up at all; below are pictures of the table indexes and the EXPLAIN statement for this query.
I think the above query is already using the relevant indexes in the WHERE conditions.
If there is anything you can think of to speed this query up, please let me know; I have been working on it for 3 days and can't seem to figure out the problem. It would be great to get the query time down to 5 seconds maximum. If I am wrong about the AJAX issue, please let me know, as this would also fix my issue.
" EDIT "
I have come across something quite strange which might be causing the issue. When I change the day_epoch range to something smaller (5th - 9th), which returns 130,000 rows, the query time is 0.7 seconds; but when I add one more day onto that range (5th - 10th), so it returns over 150,000 rows, the query time is 13 seconds. I have run loads of different ranges and have come to the conclusion that when the number of rows returned exceeds 150,000, it has a huge effect on the query times.
Table Definition -
CREATE TABLE `tracking_daily_stats_zone_unique_device_uuids_per_hour` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`day_epoch` int(10) NOT NULL,
`day_of_week` tinyint(1) NOT NULL COMMENT 'day of week, monday = 1',
`hour` int(2) NOT NULL,
`venue_id` int(5) NOT NULL,
`zone_id` int(5) NOT NULL,
`device_uuid` binary(16) NOT NULL COMMENT 'binary representation of the device_uuid, unique for a single day',
`device_vendor_id` int(5) unsigned NOT NULL DEFAULT '0' COMMENT 'id of the device vendor',
`first_seen` int(10) unsigned NOT NULL DEFAULT '0',
`last_seen` int(10) unsigned NOT NULL DEFAULT '0',
`is_repeat` tinyint(1) NOT NULL COMMENT 'is the device a repeat for this day?',
`prev_last_seen` int(10) NOT NULL DEFAULT '0' COMMENT 'previous last seen ts',
PRIMARY KEY (`id`,`venue_id`) USING BTREE,
KEY `venue_id` (`venue_id`),
KEY `zone_id` (`zone_id`),
KEY `day_of_week` (`day_of_week`),
KEY `day_epoch` (`day_epoch`),
KEY `hour` (`hour`),
KEY `device_uuid` (`device_uuid`),
KEY `is_repeat` (`is_repeat`),
KEY `device_vendor_id` (`device_vendor_id`)
) ENGINE=InnoDB AUTO_INCREMENT=450967720 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (venue_id)
PARTITIONS 100 */
The straightforward solution is to add this query-specific index to the table:
ALTER TABLE tracking_daily_stats_zone_unique_device_uuids_per_hour
ADD INDEX complex_idx (`venue_id`, `day_epoch`, `zone_id`);
Warning: adding this index can take a while on a large table.
And then force it when you call:
SELECT device_uuid,
day_epoch,
is_repeat
FROM tracking_daily_stats_zone_unique_device_uuids_per_hour
USE INDEX (complex_idx)
WHERE day_epoch >= 1552435200
AND day_epoch < 1553040000
AND venue_id = 46
AND zone_id IN (102,105,108,110,111,113,116,117,118,121,287)
It is definitely not universal but should work for this particular query.
UPDATE: When you have a partitioned table, you can profit by targeting a particular PARTITION directly. In our case, since the table is partitioned by venue_id, just force it:
SELECT device_uuid,
day_epoch,
is_repeat
FROM tracking_daily_stats_zone_unique_device_uuids_per_hour
PARTITION (`p46`)
WHERE day_epoch >= 1552435200
AND day_epoch < 1553040000
AND zone_id IN (102,105,108,110,111,113,116,117,118,121,287)
Where p46 is the concatenation of "p" and venue_id = 46.
Another trick if you go this way: you can remove AND venue_id = 46 from the WHERE clause entirely, because there is no other data in that partition.
What happens if you change the order of conditions? Put venue_id = ? first. The order matters.
Now it first checks all rows for:
- day_epoch >= 1552435200
- then, the remaining set for day_epoch < 1553040000
- then, the remaining set for venue_id = 46
- then, the remaining set for zone_id IN (102,105,108,110,111,113,116,117,118,121,287)
When working with heavy queries, you should always try to make the first "selector" the most effective. You can do that with a proper index (single-column or composite) and by making sure that the first selector narrows the result set down the most (at least for integers; in the case of strings you need another tactic).
Sometimes, a query simply is slow. When you have a lot of data (and/or not enough resources) you just can't really do anything about that. That's where you need another solution: make a summary table. I doubt you show 150,000 rows x4 to your visitor. You can sum the data, e.g., hourly or every few minutes, and select from that much smaller table.
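As an illustration only (the table, column names, and rollup granularity are all assumptions; this also assumes one row per device per zone per hour, as the source table's name suggests):
-- Hypothetical hourly rollup table.
CREATE TABLE zone_hourly_summary (
  day_epoch int NOT NULL,
  `hour` tinyint unsigned NOT NULL,
  venue_id int NOT NULL,
  zone_id int NOT NULL,
  device_count int NOT NULL,
  repeat_count int NOT NULL,
  PRIMARY KEY (venue_id, day_epoch, `hour`, zone_id)
);
-- Re-run periodically (from cron or a MySQL EVENT) for the period just ended.
INSERT INTO zone_hourly_summary
SELECT day_epoch, `hour`, venue_id, zone_id,
       COUNT(*), SUM(is_repeat)
FROM tracking_daily_stats_zone_unique_device_uuids_per_hour
WHERE day_epoch = 1552435200        -- the day being rolled up
GROUP BY day_epoch, `hour`, venue_id, zone_id;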
Off-topic: putting an index on everything only slows you down when inserting/updating/deleting. Index the least number of columns, just the ones you actually filter on (e.g. use in a WHERE or GROUP BY).
450M rows is rather large. So, I will discuss a variety of issues that can help.
Shrink data
A big table leads to more I/O, which is the main performance killer. ('Small' tables tend to stay cached, and not have an I/O burden.)
Any kind of INT, even INT(2), takes 4 bytes. An "hour" can easily fit in a 1-byte TINYINT. That saves over 1GB in the data, plus a similar amount in INDEX(hour).
If hour and day_of_week can be derived, don't bother having them as separate columns. This will save more space.
Is there some reason to use a 4-byte day_epoch instead of a 3-byte DATE? Or perhaps you really do need a 5-byte DATETIME or TIMESTAMP.
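A sketch of the first two shrinks (an ALTER like this rebuilds the whole 450M-row table, so test it on a copy first; TINYINT UNSIGNED holds 0-255, plenty for an hour or a weekday):
ALTER TABLE tracking_daily_stats_zone_unique_device_uuids_per_hour
  MODIFY `hour` tinyint unsigned NOT NULL,
  MODIFY `day_of_week` tinyint unsigned NOT NULL COMMENT 'day of week, monday = 1';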
Optimal INDEX (take #1)
If it is always a single venue_id, then this is a good first cut at the optimal index:
INDEX(venue_id, zone_id, day_epoch)
First is the constant, then the IN, then a range. The Optimizer does well with this in many cases. (It is unclear whether the number of items in an IN clause can lead to inefficiencies.)
Better Primary Key (better index)
With AUTO_INCREMENT, there is probably no good reason to include columns after the auto_inc column in the PK. That is, PRIMARY KEY(id, venue_id) is no better than PRIMARY KEY(id).
InnoDB orders the data's BTree according to the PRIMARY KEY. So, if you are fetching several rows and can arrange for them to be adjacent to each other based on the PK, you get extra performance. (cf "Clustered".) So:
PRIMARY KEY(venue_id, zone_id, day_epoch, -- this order, as discussed above;
id) -- to make sure that the entire PK is unique.
INDEX(id) -- to keep AUTO_INCREMENT happy
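Put together, that change might look like the following single ALTER (a sketch only; on a 450M-row table you would want an online schema-change tool, and a test on a copy first):
ALTER TABLE tracking_daily_stats_zone_unique_device_uuids_per_hour
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (venue_id, zone_id, day_epoch, id),
  ADD INDEX (id);   -- keeps AUTO_INCREMENT happy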
And, I agree with DROPping any indexes that are not in use, including the one I recommended above. It is rarely useful to index flags (is_repeat).
UUID
Indexing a UUID can be deadly for performance once the table is really big. This is because of the randomness of UUIDs/GUIDs, leading to ever-increasing I/O burden to insert new entries in the index.
Multi-dimensional
Assuming day_epoch is sometimes multiple days, you seem to have 2 or 3 "dimensions":
A date range
A list of zones
A venue.
INDEXes are 1-dimensional. Therein lies the problem. However, PARTITIONing can sometimes help. I discuss this briefly as "case 2" in http://mysql.rjweb.org/doc.php/partitionmaint .
There is no good way to get 3 dimensions, so let's focus on 2.
You should partition on something that is a "range", such as day_epoch or zone_id.
After that, you should decide what to put in the PRIMARY KEY so that you can further take advantage of "clustering".
Plan A: This assumes you are searching for only one venue_id at a time:
PARTITION BY RANGE(day_epoch) -- see note below
PRIMARY KEY(venue_id, zone_id, id)
Plan B: This assumes you sometimes search for venue_id IN (.., .., ...), hence venue_id does not make a good first column for the PK:
Well, I don't have good advice here; so let's go with Plan A.
The RANGE expression must be numeric. Your day_epoch works fine as is. Changing to a DATE, would necessitate BY RANGE(TO_DAYS(...)), which works fine.
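A sketch of the repartitioning (partition names and boundaries are made up from the question's epoch values; note that MySQL requires the partitioning column to appear in every unique key, so day_epoch would also have to be part of the PRIMARY KEY, as in the "take #1" suggestion above):
ALTER TABLE tracking_daily_stats_zone_unique_device_uuids_per_hour
PARTITION BY RANGE (day_epoch) (
  PARTITION p20190313 VALUES LESS THAN (1552435200),
  PARTITION p20190320 VALUES LESS THAN (1553040000),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);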
You should limit the number of partitions to 50. (The 81 mentioned above is not bad.) The problem is that "lots" of partitions introduces different inefficiencies; "too few" partitions leads to "why bother".
Note that almost always the optimal PK is different for a partitioned table than the equivalent non-partitioned table.
Note that I disagree with partitioning on venue_id since it is so easy to put that column at the start of the PK instead.
Analysis
Assuming you search for a single venue_id and use my suggested partitioning & PK, here's how the SELECT performs:
Filter on the date range. This is likely to limit the activity to a single partition.
Drill into the data's BTree for that one partition to find the one venue_id.
Hopscotch through the data from there, landing on the desired zone_ids.
For each, further filter based on the date.
I'm trying to run what I believe to be a simple query on a fairly large dataset, and it's taking a very long time to execute -- it stalls in the "Sending data" state for 3-4 hours or more.
The table looks like this:
CREATE TABLE `transaction` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`uuid` varchar(36) NOT NULL,
`userId` varchar(64) NOT NULL,
`protocol` int(11) NOT NULL,
... A few other fields: ints and small varchars
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `uuid` (`uuid`),
KEY `userId` (`userId`),
KEY `protocol` (`protocol`),
KEY `created` (`created`)
) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 COMMENT='Transaction audit table'
And the query is here:
select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;
The table has approximately 222 million rows, and the where clause in the query filters down to about 20 million rows. The distinct option will bring it down to about 700,000 distinct rows, and then after grouping, (and when the query finally finishes), 4 to 5 rows are actually returned.
I realize that it's a lot of data, but it seems that 4-5 hours is an awfully long time for this query.
Thanks.
Edit: For reference, this is running on AWS on a db.m2.4xlarge RDS database instance.
Why don't you profile the query and see what exactly is happening?
SET PROFILING = 1;
SET profiling_history_size = 0;
SET profiling_history_size = 15;
/* Your query should be here */
SHOW PROFILES;
SELECT state, ROUND(SUM(duration),5) AS `duration (summed) in sec` FROM information_schema.profiling WHERE query_id = 3 GROUP BY state ORDER BY `duration (summed) in sec` DESC;
SET PROFILING = 0;
EXPLAIN /* Your query again should appear here */;
I think this will help you see where exactly the query spends its time, and based on the result you can perform optimization.
This is a really heavy query. To understand why it takes so long you should understand the details.
You have a range condition on an indexed field: MySQL finds the smallest matching created value in the index, and for each matching index entry it gets the corresponding primary key, retrieves the row from disk, and fetches the required fields (protocol, userId) that are missing from the index record, putting them into a "temporary table" and making the groupings on those 700,000 rows. The index can be, and is, used here -- but only to speed up the range condition.
The only way to speed it up is to have an index that contains all the necessary data, so that MySQL does not need to do on-disk lookups for the rows. That is called a covering index. But you should understand that the index will reside in memory and will contain ~ sizeOf(created+protocol+userId+PK)*rowCount bytes, which may become a burden in itself for queries that update the table, and for the other indexes. It may be easier to create a separate aggregates table and periodically update it using your query.
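For this particular query, the covering index described above might look like this (the name is illustrative; verify with EXPLAIN that the Extra column shows "Using index"):
ALTER TABLE `transaction`
  ADD INDEX created_protocol_user (created, protocol, userId);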
Both distinct and group by will need to sort and store temporary data on the server. With that much data that might take a while.
Indexing different combinations of userId, created and protocol will help, but I can't say how much or what index will help the most.
Starting from a certain version of MariaDB (maybe since 10.5), I noticed that after importing a dump with
mysql dbname < dump.sql
the optimizer's idea of the data differs from how the data really is, and it makes wrong decisions about indexes.
In general, even listing InnoDB tables with phpMyAdmin becomes very, very slow.
I noticed that running
ANALYZE TABLE myTable;
fixes it.
So after each import I run the following, which is equivalent to running ANALYZE on each table:
mysqlcheck -aA
I am having some difficulties finding an answer to this question...
For simplicity, let's use this situation.
I create a table like this:
CREATE TABLE `test` (
`MerchID` int(10) DEFAULT NULL,
KEY `MerchID` (`MerchID`)
) ENGINE=InnoDB AUTO_INCREMENT=32769 DEFAULT CHARSET=utf8;
I will insert some data into this table:
INSERT INTO test
SELECT 1
UNION
SELECT 2
UNION
SELECT null
Now I examine the query using MySQL's EXPLAIN feature:
EXPLAIN
SELECT * FROM test
WHERE merchid IS NOT NULL
Resulting in:
id=1, select_type=SIMPLE, table=test, type=index, possible_keys=MerchID, key=MerchID, key_len=5, ref=NULL, rows=3, Extra=Using where; Using index
In production, in my real procedure, something like this takes a long time with this index. If I re-declare the table with the index line reading "KEY `MerchID` (`MerchID`) USING BTREE" I get much better results. The EXPLAIN feature seems to return the same results too. I have read some basics about the BTREE, HASH and RTREE storage types for indexes/keys. When no storage type is specified, I was under the assumption that BTREE would be assumed. However, I am kinda stumped as to why my procedure seems to fly when I modify my index to use this storage type. Any ideas?
I am using MySQL 5.1 and coding in MySQL Workbench. The part of the procedure that appears to be held up is like the one I illustrated above, where the column of a joined table is tested for NULL.
I think you are on the wrong path. For InnoDB storage the only available index method is BTREE, so you are safe to omit the BTREE keyword from your table create script. Supported index types are listed here, along with other useful information.
The performance issue is coming from a different place.
Whenever testing performance, be sure to always use the SQL_NO_CACHE directive; otherwise, with query caching enabled, the second time you run a query your results may be returned a lot faster simply due to caching.
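For example, with the test table from the question:
SELECT SQL_NO_CACHE * FROM test WHERE MerchID IS NOT NULL;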
With a covering index (all of the selected and filtered columns are in the index), the query is rather efficient. Using index in the EXPLAIN result shows that it's being used as a covering index.
However, if the index were not a covering index, MySQL would have to perform a seek for each row returned by the index in order to grab the actual table data. While this would still be fast for a small result set, with a result set of 1 million rows, that would be 1 million seeks. If the number of NULL rows were a high percentage, MySQL would abandon the index altogether to avoid the seeks.
Ensure that your real "production" index is a covering index as well.
Below is my create table script:
CREATE TABLE [dbo].[PatientCharts](
[PatientChartId] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[FacilityId] [uniqueidentifier] NOT NULL,
[VisitNumber] [varchar](200) NOT NULL,
[MRNNumber] [varchar](100) NULL,
[TimeIn] [time](7) NULL,
[TimeOut] [time](7) NULL,
[DateOfService] [date] NULL,
[DateOut] [date] NULL)
I have one clustered index on PatientChartId and two non-clustered index on VisitNumber and MRNNumber. This table has millions of records.
The following query is doing a clustered index scan:
SELECT *
FROM dbo.PatientCharts
INNER JOIN ( SELECT FacilityID
FROM Facilities
WHERE RemoteClientDB IN (
SELECT SiteID
FROM RemoteClient WITH ( NOLOCK )
WHERE Code = 'IN-ESXI-EDISC14'
)
) AS Filter ON dbo.PatientCharts.FacilityId = Filter.FacilityID
This clustered index scan is taking a lot of time in production because of data volume.
The execution plan is: [execution plan screenshot]
I have even tried adding a non-clustered index on FacilityID including PatientChartID, but I still get the same execution plan.
I am doing DBCC FREEPROCCACHE every time to instruct SQL Server to use a new plan each time.
Is there anything else I should do to prevent the clustered index scan?
The clustered scan will occur since there is no index to support your query. Even if you index FacilityID and PatientChartID you are still potentially asking for sufficient amounts of data to scan due to going past the tipping point (Google Kimberly Tripp Tipping Point)
There is no easy way to say the next part, but for a system with millions of records where such a trivial query is causing you a problem, you are going to have to get a lot more aware of indexing in general and of how the SQL plan engine behaves. I would recommend Kalen Delaney's SQL Internals, and if you search on here for book recommendations, there are questions with a number of good solid recommendations.
Have you tried implementing this as a straight query with inner joins instead of using subqueries for each step?
I would be happy to take a look at the resulting execution plan if you change the query to the following form:
select * from patientcharts...
inner join facilities...
inner join remoteclientdb....
where...
I think the optimizer will choose the correct indexes once you get rid of the subqueries. Try it and share the execution plan.
Also, on another note, do you need all fields in the resultset? You might benefit by switching to specific columns instead of * in the select list.
I hope this helps.
As Andrew mentioned, your clustered index isn't helping you or hurting you here- if you didn't have the clustered index, you'd see a table scan instead (which I assure you would be no more fun than the clustered index scan).
Assuming that this is the most important query on this table, I'd say that you should change the table design so that the clustered index is on FacilityID instead. That would be dramatically faster.
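A sketch of that change in T-SQL (the constraint and index names are made up, and this assumes the existing clustered primary key on PatientChartId has been dropped first; on a table with millions of rows this rebuild needs a maintenance window):
-- New clustered index on the column the query filters by:
CREATE CLUSTERED INDEX IX_PatientCharts_FacilityId
    ON dbo.PatientCharts (FacilityId);

-- PatientChartId stays unique via a nonclustered constraint:
ALTER TABLE dbo.PatientCharts
    ADD CONSTRAINT UQ_PatientCharts_PatientChartId UNIQUE NONCLUSTERED (PatientChartId);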
I think you should avoid doing a SELECT * and instead specify the columns you require. Then you can plan your indexes based on the execution plan you get.
What does the INDEX keyword mean, and what function does it serve? I understand that it is meant to speed up querying, but I am not sure exactly how it does so.
And how do I choose which column to index?
A sample of INDEX keyword usage is shown below in a CREATE TABLE query:
CREATE TABLE `blog_comment`
(
`id` INTEGER NOT NULL AUTO_INCREMENT,
`blog_post_id` INTEGER,
`author` VARCHAR(255),
`email` VARCHAR(255),
`body` TEXT,
`created_at` DATETIME,
PRIMARY KEY (`id`),
INDEX `blog_comment_FI_1` (`blog_post_id`),
CONSTRAINT `blog_comment_FK_1`
FOREIGN KEY (`blog_post_id`)
REFERENCES `blog_post` (`id`)
)Type=MyISAM
;
I'd recommend reading How MySQL Uses Indexes from the MySQL Reference Manual. It states that indexes are used...
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column.
To sort or group a table (under certain conditions).
To optimize queries using only indexes without consulting the data rows.
Indexes in a database work like the index in a book. You can find what you're looking for in a book quicker because the index is listed alphabetically. Instead of an alphabetical list, MySQL uses B-trees to organize its indexes, which is quicker for its purposes (but would take a lot longer for a human).
Using more indexes means using up more space (as well as the overhead of maintaining the index), so it's only really worth using indexes on columns that fulfil the above usage criteria.
In your example, the id and blog_post_id columns both use indexes (PRIMARY KEY is an index too) so that the application can find rows by them quicker. In the case of id, it is likely that this allows users to modify or delete a comment quickly, and in the case of blog_post_id, it lets the application quickly find all comments for a given post.
You'll notice that there is no index on the email column. This means that searching for all comments by a particular e-mail address would probably take quite a long time. If that's a search you'd want to support, it might make sense to add an index to that column too.
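For instance (the index name is illustrative):
ALTER TABLE `blog_comment` ADD INDEX `blog_comment_email_idx` (`email`);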
This keyword means that you are creating an index on the blog_post_id column along with the table.
Queries like this:
SELECT *
FROM blog_comment
WHERE blog_post_id = #id
will use this index to search on this field and run faster.
Also, there is a foreign key on this column.
When you decide to delete a blog post, the database will need to check against this table to make sure there are no orphan comments. The index will also speed up this check, so queries like
DELETE
FROM blog_post
WHERE ...
will also run faster.