I have three tables, each of which may contain millions of rows: an actions table, a reactions table that holds reactions related to actions, and an emotes table linked to reactions. What I would like to do with this particular query is find the most clicked emote for a certain action. The difficulty for me is that the query involves three tables instead of only two.
Table actions (postings):
PKY id
...
Table reactions (comments, emotes etc.):
PKY id
INT action_id (related to actions table)
...
Table emotes:
PKY id
INT react_id (related to reactions table)
INT emote_id (related to a hardcoded list of available emotes)
...
The SQL query I came up with basically seems to work, but it takes 12 seconds once the tables contain millions of rows. The SQL query looks like this:
select emote_id, count(*) as cnt
from emotes
where react_id in (
    select id from reactions where action_id = 2942715
)
group by emote_id
order by cnt desc
limit 1
MySQL explain says the following:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY emotes index NULL react_id_2 21 NULL 4358594 Using where; Using index; Using temporary; Using f...
2 DEPENDENT SUBQUERY reactions unique_subquery PRIMARY,action_id PRIMARY 8 func 1 Using where
...I am grateful for any tips on improving the query. Note that I will NOT call this query every time a list of actions is being built, but only when emotes are being added. Therefore it's no problem if the query takes maybe 0.5 seconds to finish. But 12 is too long!
What about this:
SELECT
emote_id,
count(*) as cnt
FROM emotes a
INNER JOIN reactions r
ON r.id = a.react_id
WHERE action_id = 2942715
GROUP BY emote_id
ORDER BY cnt DESC
LIMIT 1
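If the join alone doesn't get you under your 0.5-second budget, a covering index on emotes might; a hedged sketch (the index name is illustrative, and skip it if an equivalent index already exists):

-- Lets the join probe emotes by react_id and read emote_id straight
-- from the index, so the GROUP BY never touches the base table.
CREATE INDEX emotes_react_emote ON emotes (react_id, emote_id);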
Related
I am using the Laravel query builder to get the desired results from the database. The following query is working perfectly but taking too much time to get results. Can you please help me with this?
select
`amz_ads_sp_campaigns`.*,
SUM(attributedUnitsOrdered7d) as order7d,
SUM(attributedUnitsOrdered30d) as order30d,
SUM(attributedSales7d) as sale7d,
SUM(attributedSales30d) as sale30d,
SUM(impressions) as impressions,
SUM(clicks) as clicks,
SUM(cost) as cost,
SUM(attributedConversions7d) as attributedConversions7d,
SUM(attributedConversions30d) as attributedConversions30d
from
`amz_ads_sp_product_targetings`
inner join `amz_ads_sp_report_product_targetings` on `amz_ads_sp_product_targetings`.`campaignId` = `amz_ads_sp_report_product_targetings`.`campaignId`
inner join `amz_ads_sp_campaigns` on `amz_ads_sp_report_product_targetings`.`campaignId` = `amz_ads_sp_campaigns`.`campaignId`
where
(
`amz_ads_sp_product_targetings`.`user_id` = ?
and `amz_ads_sp_product_targetings`.`profileId` = ?
)
group by
`amz_ads_sp_product_targetings`.`campaignId`
Result of Explain SQL
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE amz_ads_sp_report_product_targetings ALL campaignId NULL NULL NULL 50061 Using temporary; Using filesort
1 SIMPLE amz_ads_sp_campaigns ref campaignId campaignId 8 pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... 1
1 SIMPLE amz_ads_sp_product_targetings ref campaignId campaignId 8 pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... 33 Using where
Your query could benefit from several indices to cover the WHERE clause as well as the join conditions:
CREATE INDEX idx1 ON amz_ads_sp_product_targetings (
user_id, profileId, campaignId);
CREATE INDEX idx2 ON amz_ads_sp_report_product_targetings (
campaignId);
CREATE INDEX idx3 ON amz_ads_sp_campaigns (campaignId);
The first index idx1 covers the entire WHERE clause, which might let MySQL throw away many records on the initial scan of the amz_ads_sp_product_targetings table. It also includes the campaignId column, which is needed for the first join. The second and third indices cover the join columns of each respective table. This might let MySQL do a more rapid lookup during the join process.
Note that selecting amz_ads_sp_campaigns.* is not valid unless campaignId is the primary key of that table. Also, there isn't much else we can do to speed up the query, as SUM, by its nature, requires touching every record in order to come up with the result.
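If campaignId is indeed unique in amz_ads_sp_campaigns, one way to keep the SELECT * valid is to do the aggregation in a derived table keyed by campaignId, then join the campaign columns on afterwards. A sketch under those assumptions (also assuming the summed metric columns live in the report table):

SELECT c.*,
       t.order7d, t.order30d, t.sale7d, t.sale30d,
       t.impressions, t.clicks, t.cost,
       t.attributedConversions7d, t.attributedConversions30d
FROM (
    -- Aggregate first, over the two targeting tables only.
    SELECT r.campaignId,
           SUM(r.attributedUnitsOrdered7d)  AS order7d,
           SUM(r.attributedUnitsOrdered30d) AS order30d,
           SUM(r.attributedSales7d)         AS sale7d,
           SUM(r.attributedSales30d)        AS sale30d,
           SUM(r.impressions)               AS impressions,
           SUM(r.clicks)                    AS clicks,
           SUM(r.cost)                      AS cost,
           SUM(r.attributedConversions7d)   AS attributedConversions7d,
           SUM(r.attributedConversions30d)  AS attributedConversions30d
    FROM amz_ads_sp_product_targetings pt
    INNER JOIN amz_ads_sp_report_product_targetings r
        ON pt.campaignId = r.campaignId
    WHERE pt.user_id = ? AND pt.profileId = ?
    GROUP BY r.campaignId
) t
INNER JOIN amz_ads_sp_campaigns c ON c.campaignId = t.campaignId;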
I am looking at making a query that uses 2 tables faster.
I have the following 2 tables :
Table "logs"
id varchar(36) PK
date timestamp(2)
more varchar fields, and one text field
That table has what the PHP Laravel Framework calls a "polymorphic many to many" relationship with several other objects, so there is a second table "logs_pivot" :
id unsigned int PK
log_id varchar(36) FOREIGN KEY (logs.id)
model_id varchar(40)
model_type varchar(50)
There are one or several entries in logs_pivot per entry in logs. The tables have 20+ million and 10+ million rows, respectively.
We do queries like so:
select * from logs
join logs_pivot on logs.id = logs_pivot.log_id
where model_id = 'some_id' and model_type = 'My\Class'
order by date desc
limit 50;
Obviously we have a compound index on the model_id and model_type fields, but the queries are still slow: several (dozens of) seconds every time.
We also have an index on the date field, but an EXPLAIN shows that the model_id_model_type index is the one being used.
Explain statement:
+----+-------------+-------------+------------+--------+--------------------------------------------------------------------------------+-----------------------------------------------+---------+-------------------------------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+--------+--------------------------------------------------------------------------------+-----------------------------------------------+---------+-------------------------------------------+------+----------+---------------------------------+
| 1 | SIMPLE | logs_pivot | NULL | ref | logs_pivot_model_id_model_type_index,logs_pivot_log_id_index | logs_pivot_model_id_model_type_index | 364 | const,const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | logs | NULL | eq_ref | PRIMARY | PRIMARY | 146 | the_db_name.logs_pivot.log_id | 1 | 100.00 | NULL |
+----+-------------+-------------+------------+--------+--------------------------------------------------------------------------------+-----------------------------------------------+---------+-------------------------------------------+------+----------+---------------------------------+
In other tables, I was able to make a similar request much faster by including the date field in the index. But in that case they are in a separate table.
When we want to access these data, they are typically a few hours/days old.
Our InnoDB pools are much too small to hold all that data (+ all the other tables) in memory, so the data is most probably always queried on disk.
What would be all the ways we could make that query faster?
Ideally only with another index, or by changing how it is done.
Thanks a lot!
Edit 17h05:
Thank you all for your answers so far. I will try something like O Jones suggests, and also try to somehow include the date field in the pivot table, so that I can include it in the index.
Edit 14/10 10h.
Solution :
So I ended up changing how the query is actually done: by sorting on the id field of the pivot table, which indeed allows putting it in an index.
Also, the query that counts the total number of rows was changed to run only on the pivot table, when it is not filtered by date.
Thank you all !
Just a suggestion. Using a compound index is obviously a good thing. Another improvement might be to pre-qualify an ID by date, extending your logs_pivot index to (model_id, model_type, log_id).
If you're querying data and the entire history is 20+ million records, how far back does the data go when you only need a limit of 50 records per given category of model id/type? Say 3 months, versus a log going back, say, 5 years? (Not listed in the post, just a for-instance.) If you can query the minimum log ID where the date is greater than, say, 3 months back, that one ID can limit what else goes on in your logs_pivot table.
Something like
select
lp.*,
l.date
from
logs_pivot lp
JOIN logs l
on lp.log_id = l.id
where
model_id = 'some_id'
and model_type = 'My\Class'
and log_id >= ( select min( id )
from logs
where date >= date_sub( curdate(), interval 3 month ))
order by
l.date desc
limit
50;
So, the where clause for the log_id is done once and returns just an ID from as far back as 3 months and not the entire history of the logs_pivot. Then you query with the optimized two-part key of model id/type, but also jumping to the end of its index with the ID included in the index key to skip over all the historical.
Another thing you MAY want to include is some pre-aggregate tables of record counts, such as per month/year per given model type/id. Use that as a pre-query to present to users, then use it as a drill-down to get more detail. A pre-aggregate table can be built over all the historical data once, since it is static and won't change. The only period you would have to keep updating is the current month, such as on a nightly basis. Or, possibly better, via a trigger that either inserts a record every time an add is done, or updates a count for the given model/type based on year/month aggregations. Again, just a suggestion, as there is no other context on how or why the data will be presented to the end-user.
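A minimal sketch of such a pre-aggregate table, with hypothetical names, counting logs_pivot rows per model/type per month (the nightly job rewrites only the current month; history stays static):

-- Hypothetical pre-aggregate table: one row per model/type per month.
CREATE TABLE logs_pivot_monthly_counts (
    model_id   VARCHAR(40)  NOT NULL,
    model_type VARCHAR(50)  NOT NULL,
    period     CHAR(7)      NOT NULL,   -- e.g. '2021-10'
    row_count  INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (model_id, model_type, period)
);

-- Nightly refresh of the current month only.
INSERT INTO logs_pivot_monthly_counts (model_id, model_type, period, row_count)
SELECT lp.model_id, lp.model_type, DATE_FORMAT(l.date, '%Y-%m'), COUNT(*)
FROM logs_pivot lp
JOIN logs l ON l.id = lp.log_id
WHERE l.date >= DATE_FORMAT(CURDATE(), '%Y-%m-01')
GROUP BY lp.model_id, lp.model_type, DATE_FORMAT(l.date, '%Y-%m')
ON DUPLICATE KEY UPDATE row_count = VALUES(row_count);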
I see two problems:
UUIDs are costly when tables are huge relative to RAM size.
The LIMIT cannot be handled optimally because the WHERE clauses come from one table, but the ORDER BY column comes from another table. That is, it will do all of the JOIN, then sort and finally peel off a few rows.
SELECT columns FROM big table ORDER BY something LIMIT small number is a notorious query-performance antipattern. Why? The server sorts a whole mess of long rows, then discards almost all of them. It doesn't help that one of your columns is a LOB -- a TEXT column.
Here's an approach that can reduce that overhead: Figure out which rows you want by finding the set of primary keys you want, then fetch the content of only those rows.
What rows do you want? This subquery finds them.
SELECT logs.id
FROM logs
JOIN logs_pivot
ON logs.id = logs_pivot.log_id
WHERE logs_pivot.model_id = 'some_id'
AND logs_pivot.model_type = 'My\Class'
ORDER BY logs.date DESC
LIMIT 50
This does all the heavy lifting of working out the rows you want. So, this is the query you need to optimize.
It can be accelerated by this index on logs
CREATE INDEX logs_date_desc ON logs (date DESC);
and this three-column compound index on logs_pivot
CREATE INDEX logs_pivot_lookup ON logs_pivot (model_id, model_type, log_id);
This index is likely to be better, since the Optimizer will see the filtering on logs_pivot but not logs. Hence, it will look in logs_pivot first.
Or maybe
CREATE INDEX logs_pivot_lookup ON logs_pivot (log_id, model_id, model_type);
Try one then the other to see which yields faster results. (I'm not sure how the JOIN will use the compound index.) (Or simply add both, and use EXPLAIN to see which one it uses.)
Then, when you're happy -- or satisfied anyway -- with the subquery's performance, use it to grab the rows you need, like this
SELECT *
FROM logs
WHERE id IN (
SELECT logs.id
FROM logs
JOIN logs_pivot
ON logs.id = logs_pivot.log_id
WHERE logs_pivot.model_id = 'some_id'
AND model_type = 'My\Class'
ORDER BY logs.date DESC
LIMIT 50
)
ORDER BY date DESC
This works because it sorts less data. The covering three-column index on logs_pivot will also help.
Notice that both the sub query and main query have ORDER BY clauses, to make sure the returned detail result set is in the order you need.
Edit Darnit, I've been on MariaDB 10+ so long I forgot about the old limitation: MySQL doesn't support LIMIT inside an IN (...) subquery. Try this instead.
SELECT *
FROM logs
JOIN (
SELECT logs.id
FROM logs
JOIN logs_pivot
ON logs.id = logs_pivot.log_id
WHERE logs_pivot.model_id = 'some_id'
AND model_type = 'My\Class'
ORDER BY logs.date DESC
LIMIT 50
) id_set ON logs.id = id_set.id
ORDER BY date DESC
Finally, if you know you only care about rows newer than some certain time you can add something like this to your subquery.
AND logs.date >= NOW() - INTERVAL 5 DAY
This will help a lot if you have tonnage of historical data in your table.
I have the following SQL query which, when I originally coded it, was exceptionally fast; it now takes over 1 second to complete:
SELECT counted/scount as ratio, [etc]
FROM
playlists
LEFT JOIN (
select AID, PLID FROM (SELECT AID, PLID FROM p_s ORDER BY `order` asc, PLSID desc)as g GROUP BY PLID
) as t USING(PLID)
INNER JOIN (
SELECT PLID, count(PLID) as scount from p_s LEFT JOIN audio USING(AID) WHERE removed='0' and verified='1' GROUP BY PLID
) as g USING(PLID)
LEFT JOIN (
select AID, count(AID) as counted FROM a_p_all WHERE ".time()." - playtime < 2678400 GROUP BY AID
) as r USING(AID)
LEFT JOIN audio USING (AID)
LEFT JOIN members USING (UID)
WHERE scount > 4 ORDER BY ratio desc
LIMIT 0, 20
I have identified the problem, the a_p_all table has over 500k rows. This is slowing down the query. I have come up with a solution:
Create a smaller temporary table, that only stores the data necessary, and deletes anything older than is needed.
However, is there a better method to use? Optimally I wouldn't need a temporary table; what do sites such as YouTube/Facebook do for large tables to keep query times fast?
edit
This is the EXPLAIN table for the query in the answer from @spencer7593
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 20
1 PRIMARY u eq_ref PRIMARY PRIMARY 8 q.AID 1 Using index
1 PRIMARY m eq_ref PRIMARY PRIMARY 8 q.UID 1 Using index
3 DERIVED <derived6> ALL NULL NULL NULL NULL 20
6 DERIVED t ALL NULL NULL NULL NULL 21
5 DEPENDENT SUBQUERY s ALL NULL NULL NULL NULL 49 Using where; Using filesort
4 DEPENDENT SUBQUERY c ALL NULL NULL NULL NULL 49 Using where
4 DEPENDENT SUBQUERY o eq_ref PRIMARY PRIMARY 8 database.c.AID 1 Using where
2 DEPENDENT SUBQUERY a ALL NULL NULL NULL NULL 510594 Using where
Two "big rock" issues stand out to me.
Firstly, this predicate
WHERE ".time()." - playtime < 2678400
(I'm assuming that this isn't the actual SQL being submitted to the database, but that what's being sent is something like this:
WHERE 1409192073 - playtime < 2678400
such that we want only rows where playtime is within the past 31 days, i.e. within 31*24*60*60 seconds of the integer value returned by time().)
This predicate can't make use of a range scan operation on a suitable index on playtime. MySQL evaluates the expression on the left side for every row in the table (every row that isn't excluded by some other predicate), and the result of that expression is compared to the literal on the right.
To improve performance, rewrite the predicate so that the comparison is made on the bare column. Compare the value stored in the playtime column to an expression that needs to be evaluated only once, for example:
WHERE playtime > 1409192073 - 2678400
With a suitable index available, MySQL can perform a "range" scan operation, and efficiently eliminate a boatload of rows that don't need to be evaluated.
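A plausible index for the rewritten predicate (the name is illustrative; adapt if you already have something similar):

-- Range-scans playtime and carries AID, so the GROUP BY AID in the
-- inline view can be computed without touching the base table.
CREATE INDEX a_p_all_playtime_aid ON a_p_all (playtime, AID);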
The second "big rock" is the inline views, or "derived tables" in MySQL parlance. MySQL is much different than other databases in how inline views are processed. MySQL actually runs that innermost query, and stores the result set as a temporary MyISAM table, and then the outer query runs against the MyISAM table. (The name that MySQL uses, "derived table", makes sense when we understand how MySQL processes the inline view.) Also, MySQL does not "push" predicates down, from an outer query down into the view queries. And on the derived table, there are no indexes created. (I believe MySQL 5.7 is changing that, and does sometimes create indexes, to improve performance.) But large "derived tables" can have a significant performance impact.
Also, the LIMIT clause gets applied last in the statement processing; that's after all the rows in the resultset are prepared and sorted. Even if you are returning only 20 rows, MySQL still prepares the entire resultset; it just doesn't transfer them to the client.
Lots of the column references are not qualified with the table name or alias, so we don't know, for example, which table (p_s or audio) contains the removed and verified columns.
(We know it can't be both, since MySQL isn't throwing an "ambiguous column" error.) But MySQL has access to the table definitions, where we don't. MySQL also knows something about the cardinality of the columns, in particular which columns (or combinations of columns) are UNIQUE, and which columns can contain NULL values, etc.
Best practice is to qualify ALL column references with the table name or (preferably) a table alias. (This makes it much easier on the human reading the SQL, and it also keeps a query from breaking when a new column is added to a table.)
Also, the query has a LIMIT clause but no ORDER BY clause (or implied ORDER BY), which makes the resultset indeterminate. We have no guarantee about which rows will be the "first" ones returned.
EDIT
To return only 20 rows from playlists (out of thousands or more), I might try using correlated subqueries in the SELECT list; using a LIMIT clause in an inline view to winnow down the number of rows that I'd need to run the subqueries for. Correlated subqueries can eat your lunch (and your lunchbox too) in terms of performance with large sets, due to the number of times those need to be run.
From what I can gather, you are attempting to return 20 rows from playlists, picking up the related row from member (by the foreign key in playlists), finding the "first" song in the playlist; getting a count of times that "song" has been played in the past 31 days (from any playlist); getting the number of times a song appears on that playlist (as long as it's been verified and hasn't been removed... the outerness of that LEFT JOIN is negated by the predicates on the removed and verified columns, if either of those columns is from the audio table...).
I'd take a shot with something like this, to compare performance:
SELECT q.*
, ( SELECT COUNT(1)
FROM a_p_all a
WHERE a.playtime > 1409192073 - 2678400
AND a.AID = q.AID
) AS counted
FROM ( SELECT p.PLID
, p.UID
, p.[etc]
, ( SELECT COUNT(1)
FROM p_s c
JOIN audio o
ON o.AID = c.AID
AND o.removed='0'
AND o.verified='1'
WHERE c.PLID = p.PLID
) AS scount
, ( SELECT s.AID
FROM p_s s
WHERE s.PLID = p.PLID
ORDER BY s.order ASC, s.PLSID DESC
LIMIT 1
) AS AID
FROM ( SELECT t.PLID
, t.[etc]
FROM playlists t
ORDER BY NULL
LIMIT 20
) p
) q
LEFT JOIN audio u ON u.AID = q.AID
LEFT JOIN members m ON m.UID = q.UID
LIMIT 0, 20
UPDATE
Dude, the EXPLAIN output is showing that you don't have suitable indexes available. To get any decent chance at performance with the correlated subqueries, you're going to want to add some indexes, e.g.
... ON a_p_all (AID, playtime)
... ON p_s (PLID, `order`, PLSID, AID)
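Spelled out with illustrative index names (order must be backtick-quoted, since ORDER is a reserved word):

CREATE INDEX a_p_all_aid_playtime ON a_p_all (AID, playtime);
CREATE INDEX p_s_plid_order_plsid ON p_s (PLID, `order`, PLSID, AID);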
I have a question. The following query is taking upwards of 2-3 seconds to execute and I'm not sure why. There are 2 tables involved: one with a list of items and another with a list of attributes for each item. The items table is indexed with a unique primary key and the attributes table has a foreign key constraint.
The relationship between the items table is ONE TO MANY to the attributes.
I am not sure how else to speed up query and would appreciate any advice.
The database is MySQL InnoDB.
EXPLAIN SELECT * FROM eshop_items AS ite WHERE (SELECT attValue FROM eshop_items_attributes WHERE attItemId=ite.ItemId AND attType=5 AND attValue='20')='20' ORDER BY itemAdded DESC LIMIT 0, 18;
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 PRIMARY ite ALL NULL NULL NULL NULL 57179 Using where; Using filesort
2 DEPENDENT SUBQUERY eshop_items_attributes ref attItemId attItemId 9 gabriel_new.ite.itemId 5 Using where
Index: eshop_items_attributes
Name Fieldnames Index Type Index method
attItemId attItemId Normal BTREE
attType attType Normal BTREE
attValue attValue Normal BTREE
Index: eshop_items
Name Fieldnames Index Type Index method
itemCode itemCode Unique BTREE
itemCodeOrig itemCodeOrig Unique BTREE
itemConfig itemConfig Normal BTREE
itemStatus itemStatus Normal BTREE
Can't use a join because the items_attributes table is a key -> value pair table, so a single item id can have many records in the items_attributes table.
here is a sample
item_id attribute_index attribute_value
12345 10 true
12345 2 somevalue
12345 6 some other value
32456 10 true
32456 11 another value
32456 2 somevalue
So a join wouldn't work because I can't join multiple rows from the items_attributes table to one row in the items table.
I can't write a query where attribute_index = 2 AND attribute_index = 10; I would always get back no results.
:(
Change the query from correlated to IN and see what happens.
SELECT *
FROM eshop_items AS ite
WHERE ItemId IN (
SELECT attItemId
FROM eshop_items_attributes
WHERE attType=5
AND attValue='20')
ORDER BY itemAdded DESC
LIMIT 0, 18
You'll see further gains by changing the B-tree index on eshop_items_attributes to a bitmap index, but be warned: bitmap indexes have consequences for INSERT/UPDATE. (Note, though, that MySQL itself doesn't support bitmap indexes, so this only applies if you move to a database that does.)
The "DEPENDENT SUBQUERY" is what's killing performance in this query. It has to run the subquery once for every distinct ItemId in the outer query. It should be much better as a join:
SELECT ite.* FROM eshop_items AS ite
INNER JOIN eshop_items_attributes AS a ON ite.ItemId = a.attItemId
WHERE a.attType = 5 AND a.attValue = '20' -- quoted: attValue is compared as a string elsewhere
ORDER BY ite.itemAdded DESC LIMIT 0, 18;
I find it much easier to think about such a query as a join:
SELECT ite.*
FROM eshop_items ite join
eshop_items_attributes ia
on ia.attItemId = ite.ItemId and
ia.attType = 5 and
ia.attValue='20'
ORDER BY ite.itemAdded DESC
LIMIT 0, 18;
This works if there is at most one matching attribute for each item. Otherwise, you need select distinct (which could hurt performance, except you are already doing a sort).
To facilitate this join, create the index eshop_items_attributes(attType, attValue, attItemId). The index should satisfy the join without having to read the table, the rest is dealing with the result set.
The same index would probably help with the correlated subquery.
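Spelled out, with an illustrative name:

-- Covers the filter (attType, attValue) and supplies attItemId for the
-- join, so the lookup is satisfied entirely from the index.
CREATE INDEX eshop_items_attributes_lookup
    ON eshop_items_attributes (attType, attValue, attItemId);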
I'm having some weird trouble with a friends feed query - here is the background:
I have 3 tables
checkin - around 13m records
users - around 250k records
friends - around 1.5m records
The checkin table lists activity performed by users. (There are numerous indexes; in particular there are indexes on user_id, created_at, and (user_id, created_at).)
The users table is just the basic user information. There is an index on user_id.
The friends table has a user_id, target_id and is_approved. There is an index on the (user_id, is_approved) fields.
In my query, I am trying to pull down just a basic friends feed of any users - so I have been doing this:
SELECT checkin_id, created_at
FROM checkin
WHERE (user_id IN (SELECT friend_id from friends where user_id = 1 and is_approved = 1) OR user_id = 1)
ORDER by created_at DESC
LIMIT 0, 15
The goal of the query is just to pull the checkin_id and created_at for all of a user's friends plus their own activity. It's a pretty simple query, and when a user's friends have tons of recent activity it is very quick; here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY checkin index user_id,user_id_2 created_at 8 NULL 15 Using where
2 DEPENDENT SUBQUERY friends eq_ref user_id,friend_id,is_approved,friend_looku... PRIMARY 8 const,func 1 Using where
As an explanation, user_id is a simple index on user_id, while user_id_2 is an index on user_id and created_at. On the friends table, friend_lookup is the index on user_id and is_approved.
This is a very simple query and it completes quickly: Showing rows 0 - 14 (15 total, Query took 0.0073 sec).
However, when a user's friends' activity is not very recent and there isn't a lot of data, the same query takes around 5-7 seconds; it has the same EXPLAIN as the previous query, but takes longer.
Having more friends doesn't seem to have an effect; the query seems to speed up with more recent activity.
Does anyone have tips to optimize these queries and make sure they run at the same speed regardless of activity?
Server Setup
This is a dedicated MySQL server running 16GB of RAM. It is running Ubuntu 10.10 and the version of MySQL is 5.1.49
UPDATE
So most people have suggested removing the IN piece and moving it into an INNER JOIN:
SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f ON c.user_id = f.friend_id
WHERE f.user_id =1
AND f.is_approved =1
ORDER BY c.created_at DESC
LIMIT 0 , 15
This query is 10x worse - as reported in the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f ref PRIMARY,user_id,friend_id,is_approved,friend_looku... friend_lookup 5 const,const 938 Using temporary; Using filesort
1 SIMPLE c ref user_id,user_id_2 user_id 4 untappd_prod.f.friend_id 71 Using where
The goal of this query is to get all the friends' activity and yours in the same query (instead of having to create two queries, merge the results together, and sort by created_at). I also can't remove the index on user_id, as it's an important piece of another query.
The interesting part is when I run this query on a user account that doesn't have a lot activity, I get this explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f index_merge PRIMARY,user_id,friend_id,is_approved,friend_looku... user_id,friend_lookup 4,5 NULL 11 Using intersect(user_id,friend_lookup); Using wher...
1 SIMPLE c ref user_id,user_id_2 user_id 4 untappd_prod.f.friend_id 71 Using where
Any advice?
So, you have a few things going on here.
In the explain plan, the optimizer will usually choose what's under "key", not everything in possible_keys. That's why you see it scanning more records when the data is not recent.
On the checkin table, only (user_id, created_at) and created_at are necessary; you don't need another index on user_id alone, since the optimizer will use (user_id, created_at) whenever user_id leads.
Try this:
1. Use a join between friends and checkin and remove the IN clause, so that friends becomes the driving table; you should see it first in the execution path of your explain plan.
2. With 1 done, make sure that checkin is using the (user_id, created_at) index in the execution path.
3. Write another query for the OR condition where user_id from the checkin table is 1, and merge the two result sets (see the sketch just below). I think your two data sets should be mutually exclusive, so that should be OK; otherwise you would not have needed the OR condition after the IN clause in the first place.
4. Remove the user_id index that's by itself, as you already have the (user_id, created_at) index. Your goal is for the index to appear under key, not just possible_keys.
This should take care of older, non-recent checkins as well as recent ones.
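A sketch of item 3, assuming your own checkins and your friends' checkins are disjoint sets (so UNION ALL is safe; use UNION if they can overlap):

-- Each branch gets its own index-friendly plan and its own LIMIT;
-- the outer ORDER BY only merges two 15-row result sets.
(SELECT c.checkin_id, c.created_at
 FROM checkin c
 INNER JOIN friends f ON c.user_id = f.friend_id
 WHERE f.user_id = 1 AND f.is_approved = 1
 ORDER BY c.created_at DESC
 LIMIT 15)
UNION ALL
(SELECT checkin_id, created_at
 FROM checkin
 WHERE user_id = 1
 ORDER BY created_at DESC
 LIMIT 15)
ORDER BY created_at DESC
LIMIT 15;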
My first suggestion is to remove the dependent subquery and turn it into a join. I've found that MySQL is not good at processing these types of queries. Try this:
SELECT c.checkin_id, c.created_at
FROM checkin c
INNER JOIN friends f
ON c.user_id = f.friend_id
WHERE f.user_id = 1
AND f.is_approved = 1
ORDER by c.created_at DESC
LIMIT 0, 15
My second suggestion, since you have a dedicated server, is to use the InnoDB storage engine for all your tables. Make sure that you tweak default InnoDB settings, especially for innodb_buffer_pool_size: http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/
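As a rough rule of thumb for a dedicated 16 GB machine, most of the RAM can go to the buffer pool (70-80% is a common starting point). You can check the current value from SQL; on MySQL 5.1 changing it means editing my.cnf and restarting:

-- Value is in bytes; compare it against your total data + index size.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';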