I have a database that holds the details of various events and all of the odds that bookmakers are offering on those events. I have the following query, which I am using to get the best odds for each type of bet for each event:
SELECT
eo1.id,
eo1.event_id,
eo1.market_id,
IF(markets.display_name IS NULL, markets.name, markets.display_name) AS market_name,
IF(market_values.display_name IS NULL, market_values.name, market_values.display_name) AS market_value_name,
eo2.bookmaker_id,
eo2.value
FROM event_odds AS eo1
JOIN markets ON eo1.market_id = markets.id AND markets.enabled = 1
JOIN market_values ON eo1.market_value_id = market_values.id
JOIN bookmakers ON eo1.bookmaker_id = bookmakers.id AND bookmakers.enabled = 1
JOIN event_odds AS eo2
ON
eo1.event_id = eo2.event_id
AND eo1.market_id = eo2.market_id
AND eo1.market_value_id = eo2.market_value_id
AND eo2.value = (
SELECT MAX(value)
FROM event_odds
WHERE event_odds.event_id = eo1.event_id
AND event_odds.market_id = eo1.market_id
AND event_odds.market_value_id = eo1.market_value_id
)
WHERE eo1.`event_id` = 6708
AND markets.name != '-'
GROUP BY eo1.market_id, eo1.market_value_id
ORDER BY markets.sort_order, market_name, market_values.id
This returns exactly what I want; however, since the database has grown in size it has started to run very slowly. I currently have just over 500,000 records in the event_odds table and the query takes almost 2 minutes to run. The hardware is a decent spec, all of the columns are indexed correctly, and the table engine being used is MyISAM for all tables. How can I optimise this query so it runs quicker?
For this query, you want to be sure you have an index on event_odds(event_id, market_id, market_value_id, value).
In addition, you want indexes on:
markets(id, enabled, name)
bookmakers(id, enabled)
Note that composite indexes are quite different from multiple indexes with one column each.
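As a sketch, the suggested composite indexes would be created like this (the index names are arbitrary):
CREATE INDEX idx_event_odds_best
    ON event_odds (event_id, market_id, market_value_id, value);
CREATE INDEX idx_markets_enabled_name ON markets (id, enabled, name);
CREATE INDEX idx_bookmakers_enabled   ON bookmakers (id, enabled);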
Create a MySQL view for this SQL and fetch the data from that view instead; this can reduce complexity and may help with speed. Also try paginating the listing with LIMIT, which will reduce the load on the server, and add indexes to the commonly queried columns.
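A minimal sketch of that idea, reusing the query from the question (the view name and page size are placeholders, and the ORDER BY is left to the outer query; note that MySQL materializes views containing GROUP BY, so measure whether this actually helps):
CREATE VIEW best_event_odds AS
SELECT
    eo1.id,
    eo1.event_id,
    eo1.market_id,
    IF(markets.display_name IS NULL, markets.name, markets.display_name) AS market_name,
    IF(market_values.display_name IS NULL, market_values.name, market_values.display_name) AS market_value_name,
    eo2.bookmaker_id,
    eo2.value
FROM event_odds AS eo1
JOIN markets ON eo1.market_id = markets.id AND markets.enabled = 1
JOIN market_values ON eo1.market_value_id = market_values.id
JOIN bookmakers ON eo1.bookmaker_id = bookmakers.id AND bookmakers.enabled = 1
JOIN event_odds AS eo2
    ON eo1.event_id = eo2.event_id
    AND eo1.market_id = eo2.market_id
    AND eo1.market_value_id = eo2.market_value_id
    AND eo2.value = (
        SELECT MAX(value)
        FROM event_odds
        WHERE event_odds.event_id = eo1.event_id
          AND event_odds.market_id = eo1.market_id
          AND event_odds.market_value_id = eo1.market_value_id
    )
WHERE markets.name != '-'
GROUP BY eo1.event_id, eo1.market_id, eo1.market_value_id;

-- Paginated fetch from the view:
SELECT * FROM best_event_odds WHERE event_id = 6708 LIMIT 25 OFFSET 0;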
I have a database with several tables; only four are involved in the query I want to optimize:
albums, songs, genres, genre_song
A song can have many genres, and a genre many songs. An album can have many songs. An album is related to genres through songs.
The objective is to be able to recommend albums related to the genre of the album.
That led me to this query:
SELECT *
FROM `albums`
WHERE EXISTS
(SELECT *
FROM `songs`
WHERE `albums`.`id` = `songs`.`album_id`
AND EXISTS
(SELECT *
FROM `genres`
INNER JOIN `genre_song` ON `genres`.`id` = `genre_song`.`genre_id`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genres`.`id` IN (6)))
AND `id` <> 37635
AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6
This query takes me between 1.4s and 1.6s. I would like to reduce it as much as possible. The ideal goal would be less than 10ms 😁
I am already using indexes on several tables, and I have managed to reduce other queries from up to 4 seconds to only 15-20ms. I am willing to try anything to reduce the query time to a minimum.
I am using Laravel, so this would be the query with Eloquent.
$relatedAlbums = Album::whereHas('songs.genres', function ($query) use ($album) {
$query->whereIn('genres.id', $album->genres->pluck('id'));
})->where('id', '<>', $album->id)
->orderByDesc('release_date')
->take(6)
->get();
Note: the genres relation had already been loaded earlier.
If you want to recreate the tables and some fake data in your database, here is the structure
It is hard to guess without seeing the real data... but anyway:
I think the problem is that even if you LIMIT the required rows to 6, you have to read ALL the albums table, because:
You are filtering them by a non-indexed column
You are sorting them by a non-indexed column
You don't know which albums will make the cut (i.e. will have a song of the required genre), so you compute all of them, then order by release_date, and keep the top 6
If you accessed the albums sorted by published status and release date, then once you got the first 6 albums that make the cut, MySQL could stop processing the query. Of course, you may have 'bad luck': perhaps the albums with genre-6 songs are the oldest-published ones, and then you will have to read and process many albums anyway. Still, this optimization should not hurt, so it is worth trying, and one would expect the data to be somewhat evenly distributed.
Also, as stated in other answers, you don't actually need to access the genres table (albeit this is probably not the worst problem of the query). You may just access genre_song, and you may create a new index for the two columns you need.
create index genre_song_id_id on genre_song(genre_id, song_id);
Note that the previous index only makes sense if you change the query (as suggested at the end of this answer).
For the albums table, you may create any of those two indexes:
create index release_date_desc_v1 on albums (published, release_date desc);
create index release_date_desc_v2 on albums (release_date desc, published);
Choose whichever index is better for your data:
If the percentage of published albums is "low" you probably want to use _v1
Else, _v2 index will be better
Please test them both, but don't let the two indexes coexist at the same time. If testing _v1, make sure you have dropped _v2, and vice versa.
Also, change your query so that it does not use the genres table:
SELECT *
FROM `albums`
WHERE EXISTS
(SELECT *
FROM `songs`
WHERE `albums`.`id` = `songs`.`album_id`
AND EXISTS
(SELECT *
FROM `genre_song`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genre_song`.`genre_id` IN (6)))
AND `id` <> 37635
AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6;
One thing I noticed is that you don't have to join the genres table in the following subquery:
AND EXISTS
(SELECT *
FROM `genres`
INNER JOIN `genre_song` ON `genres`.`id` = `genre_song`.`genre_id`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genres`.`id` IN (6))
We can simplify this, and the following could be the whole query:
SELECT *
FROM `albums`
WHERE EXISTS
(SELECT *
FROM `songs`
WHERE `albums`.`id` = `songs`.`album_id`
AND EXISTS
(SELECT *
FROM `genre_song`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genre_song`.`genre_id` IN (6)))
AND `id` <> 37635
AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6
Sure, you have to optimize your query for a quick response time, but here is another tip that can rocket your response time.
I faced a similar problem with slow response times and managed to reduce them substantially by simply using a cache.
You can use the Redis driver for the cache in Laravel. It will save you from querying the database again and again, so your response time will automatically improve, since Redis stores the query and its results as a key-value pair; the next time you make the same API call, the results come back from the cache without querying the database. Using the Redis driver for the cache gives you one brilliant advantage which I love:
You can use cache tags.
Cache tags allow you to tag related items in the cache and then flush all cached values that have been assigned a given tag. For example, if you have an API that retrieves the posts of the user with $id=1, you can dynamically put the data into cache tags so that querying the same record next time is much faster; and when you update the data in the database, you can simply update it in the cache tags as well. You can do something like the following:
public $cacheTag = 'user';

// Check whether the record already exists in the cache; if so, return it
// from there. Otherwise load it from the database and store it in the
// cache as well, to boost the response time of the next request.
$item = Cache::tags([$this->cacheTag])->get($this->cacheTag.$id);
if ($item === null) {
    $row = $this->model->find($id);
    if ($row !== null && $row !== false) {
        $item = (object) $row->toArray();
        Cache::tags([$this->cacheTag])->forever($this->cacheTag.$id, $item);
    }
}
When updating the data in the database, you can delete the stale entry from the cache and rebuild it:
if ($refresh) {
    Cache::tags([$this->cacheTag])->forget($this->cacheTag.$id);
}
You can read more about caching in Laravel's documentation.
FWIW, I find the following easier to understand, so I would want to see the EXPLAIN for this:
SELECT DISTINCT a.*
FROM albums a
JOIN songs s
ON s.album_id = a.id
JOIN genre_song gs
ON gs.song_id = s.id
JOIN genres g
ON g.id = gs.genre_id
WHERE g.id IN (6)
AND a.id <> 37635
AND a.published = 1
ORDER
BY a.release_date DESC
LIMIT 6
In this instance (and assuming the tables are InnoDB), an index on (published, release_date) might help.
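For reference, a sketch of that index (the index name is arbitrary; before MySQL 8.0 a DESC modifier in an index definition is parsed but ignored):
CREATE INDEX idx_albums_published_release_date ON albums (published, release_date);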
I am trying to update multiple tables that share a column called "Team". I created an update statement, but it is very slow and takes way too long. Can I get some tips to optimize it so it runs faster?
update QB, RB, WR, passing, rushing, receiving
set qb.team='GB',
rb.team='GB',
wr.team='GB',
passing.team='GB',
rushing.team='GB',
receiving.team='GB'
where qb.team=('GNB') or
(rb.team='GNB') or
(wr.team='GNB') or
(passing.team='GNB') or
(rushing.team='GNB') or
(receiving.team='GNB');
You're doing a huge cross join on all six of your tables. This means that the criteria in your WHERE clause are scanning through a very large number of joined rows. Specifically you're scanning the product of the number of rows in all six tables.
Instead, you should write your query like this.
update QB
join RB ON QB.something = RB.something
join WR ON QB.something = WR.something ... etc
SET QB.team = 'GB', RB.team='GB' ... etc
WHERE something
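If the six tables are actually independent of each other (no row-level relationship between them, which the question seems to imply), a simpler alternative is one UPDATE per table; each statement can then use an index on its own team column, if one exists:
-- One statement per table; no cross join is ever built.
UPDATE QB        SET team = 'GB' WHERE team = 'GNB';
UPDATE RB        SET team = 'GB' WHERE team = 'GNB';
UPDATE WR        SET team = 'GB' WHERE team = 'GNB';
UPDATE passing   SET team = 'GB' WHERE team = 'GNB';
UPDATE rushing   SET team = 'GB' WHERE team = 'GNB';
UPDATE receiving SET team = 'GB' WHERE team = 'GNB';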
I have a query on a fact table "foo_success" in a star schema, which has about 6 million rows. This table holds (integer) references to dimension tables and nothing else. We use MyISAM as the storage engine.
The query:
SELECT
hierarchy.level0name,
hierarchy.level1name,
hierarchy.level0,
hierarchy.level1,
date.date,
address.city,
user.emailAddress,
foo_object.name,
foo_object.type,
user_group.groupId,
COUNT(user.id) AS count_user_id,
SUM(foo_object_statistic.passes) AS sum_foo_object_statistic_passes,
SUM(foo_object_statistic.starts) AS sum_foo_object_statistic_starts,
SUM(foo_object_statistic.calls) AS sum_foo_object_statistic_calls
FROM
foo_success,
user,
user_group,
address,
hierarchy,
foo_object,
foo_object_statistic,
date
WHERE (foo_success.userDimensionId = user.id)
AND (foo_success.userGroupDimensionId = user_group.id)
AND (foo_success.addressDimensionId = address.id)
AND (foo_success.hierarchyDimensionId = hierarchy.id)
AND (foo_success.fooObjectDimensionId = foo_object.id)
AND (foo_success.fooObjectStatisticDimensionId = foo_object_statistic.id)
AND (foo_success.dateDimensionId=date.id)
AND hierarchy.level0 = 'XYZ'
AND hierarchy.level1 IS NOT NULL
AND hierarchy.level2 IS NOT NULL
AND hierarchy.level3 IS NOT NULL
AND hierarchy.level4 IS NOT NULL
AND hierarchy.level5 IS NOT NULL
AND hierarchy.level6 IS NULL
AND hierarchy.level7 IS NULL
GROUP BY hierarchy.level0, foo_object.fooObjectId
LIMIT 0, 25;
What I've tried so far:
This is the simple join version, which equals the INNER JOIN alternative in speed.
There are indices on all fields which are joined or which are part of a condition.
I did use EXPLAIN on this query and found that the query cost (# of processed rows) is 128596 for the table user and 77 for the table foo_success.
I tried to remove the dependency on the user table, which leads to a # of processed rows of over 6 million in the fact table foo_success.
It takes about 1.5 minutes to finish this query, which is far from my expectations for a data warehouse star schema optimized for read speed. Is there any way I can optimize this monster?
The inefficiency of the query mostly comes from transferring a lot of data you do not actually use: the fields hierarchy.level1name, hierarchy.level0name, hierarchy.level1, date.date, address.city, user.emailAddress, foo_object.name, foo_object.type and user_group.groupId are not included in the GROUP BY clause, which means the information is retrieved for each row, loaded into memory and then just discarded.
What I would recommend is to concentrate the retrieval of all the required ids and the aggregation results in a subquery, and then join to the rest of the tables, so that each join produces no more than a single row (you can even move the LIMIT clause into the subquery to minimize the subsequent JOIN operations). After that, you may discover that you are missing some useful indexes.
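A hedged sketch of that rewrite, using only the column names from the query above (the MIN() wrappers pick a representative dimension id per group, making explicit what the original loose GROUP BY did implicitly; it assumes every dimension id in foo_success matches a dimension row):
SELECT
    h.level0name, h.level1name, h.level0, h.level1,
    d.date, a.city, u.emailAddress,
    o.name, o.type, g.groupId,
    agg.count_user_id,
    agg.sum_foo_object_statistic_passes,
    agg.sum_foo_object_statistic_starts,
    agg.sum_foo_object_statistic_calls
FROM (
    SELECT
        MIN(fs.userDimensionId)      AS userDimensionId,
        MIN(fs.userGroupDimensionId) AS userGroupDimensionId,
        MIN(fs.addressDimensionId)   AS addressDimensionId,
        MIN(fs.hierarchyDimensionId) AS hierarchyDimensionId,
        MIN(fs.fooObjectDimensionId) AS fooObjectDimensionId,
        MIN(fs.dateDimensionId)      AS dateDimensionId,
        COUNT(fs.userDimensionId)    AS count_user_id,
        SUM(fos.passes)              AS sum_foo_object_statistic_passes,
        SUM(fos.starts)              AS sum_foo_object_statistic_starts,
        SUM(fos.calls)               AS sum_foo_object_statistic_calls
    FROM foo_success AS fs
    JOIN hierarchy  AS hi ON fs.hierarchyDimensionId = hi.id
    JOIN foo_object AS fo ON fs.fooObjectDimensionId = fo.id
    JOIN foo_object_statistic AS fos
                          ON fs.fooObjectStatisticDimensionId = fos.id
    WHERE hi.level0 = 'XYZ'
      AND hi.level1 IS NOT NULL
      AND hi.level2 IS NOT NULL
      AND hi.level3 IS NOT NULL
      AND hi.level4 IS NOT NULL
      AND hi.level5 IS NOT NULL
      AND hi.level6 IS NULL
      AND hi.level7 IS NULL
    GROUP BY hi.level0, fo.fooObjectId
    LIMIT 0, 25
) AS agg
JOIN user       AS u ON agg.userDimensionId      = u.id
JOIN user_group AS g ON agg.userGroupDimensionId = g.id
JOIN address    AS a ON agg.addressDimensionId   = a.id
JOIN hierarchy  AS h ON agg.hierarchyDimensionId = h.id
JOIN foo_object AS o ON agg.fooObjectDimensionId = o.id
JOIN `date`     AS d ON agg.dateDimensionId      = d.id;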
I have this MySQL query that is very slow, I presume because of all the JOINs (it seems complicated, but it's a matter of lot of tables):
SELECT DISTINCT doctors.doc_id,
doctors.doc_user,
doctors.doc_first,
doctors.doc_last,
doctors.doc_email,
doctors.doc_notes,
titles.tit_name,
specializations.spe_name,
activities.act_name,
users.use_first,
users.use_last,
(SELECT COUNT(*) FROM locations WHERE locations.loc_doctor = doctors.doc_id) AS loc_count,
(SELECT COUNT(*) FROM reception WHERE reception.rec_doctor = doctors.doc_id) AS rec_count,
(SELECT COUNT(*) FROM visits INNER JOIN reports ON visits.vis_report = reports.rep_id WHERE visits.vis_doctor = doctors.doc_id AND reports.rep_user LIKE '%s') AS vis_count
FROM
doctors
INNER JOIN titles ON titles.tit_id = doctors.doc_title
INNER JOIN specializations ON specializations.spe_id = doctors.doc_specialization
INNER JOIN activities ON activities.act_id = doctors.doc_activity
LEFT JOIN locations ON locations.loc_doctor = doctors.doc_id
INNER JOIN users ON doctors.doc_user = users.use_id
WHERE
((doctors.doc_last LIKE %s) OR (doctors.doc_first LIKE %s) OR (doctors.doc_email LIKE %s))
AND doctors.doc_user LIKE %s
AND locations.loc_province LIKE %s
AND doctors.doc_specialization LIKE %s
AND doctors.doc_activity LIKE %s
ORDER BY %s
All the %s are parameters in a sprintf() PHP function
The most important thing to notice is... that I have NO indexes on MySQL! I presume that I can speed up the process by adding some indexes... but what and where? There are so many joins and search parameters that I am confused about what would be efficient :-)
Please can you help?
Thanks in advance!
You can start with adding indexes on those columns you are using in the where condition.
Further, you should index the fields used in joins, i.e. the primary key and foreign key columns.
I would suggest gradually experimenting with these indexes to find what yields a real performance boost.
Further, I have observed that you are fetching too much data in a single query. If it is not really required, break it up into different reports and pages (if possible), as even with proper indexing the solution will not be very scalable and may not handle large amounts of data.
Note: you might have to create a full-text index on the fields you query with the '%' wildcard (i.e. using the LIKE operator).
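As a sketch, assuming the column names from the query above (index names are arbitrary), a starting set might look like this; note the FULLTEXT index only pays off if the LIKE '%term%' searches are replaced with MATCH ... AGAINST:
-- B-tree indexes for the joins and equality-ish filters:
CREATE INDEX idx_doctors_user           ON doctors (doc_user);
CREATE INDEX idx_doctors_specialization ON doctors (doc_specialization);
CREATE INDEX idx_doctors_activity       ON doctors (doc_activity);
CREATE INDEX idx_locations_doctor       ON locations (loc_doctor, loc_province);
-- Full-text index for the name/email searches:
CREATE FULLTEXT INDEX ft_doctors_search ON doctors (doc_last, doc_first, doc_email);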
LIKE operators are pretty slow. Here is a discussion of applying indexes and FULLTEXT:
mysql like performance boost
So I have a 560 MB database with the largest table at 500 MB (over 10 million rows).
My query has to join 5 tables and takes about 10 seconds to finish...
SELECT DISTINCT trips.tripid AS tripid,
stops.stopdescrption AS "perron",
Date_format(segments.segmentstart, "%H:%i") AS "time",
Date_format(trips.tripend, "%H:%i") AS "arrival",
Upper(routes.routepublicidentifier) AS "lijn",
plcend.placedescrption AS "destination"
FROM calendar
JOIN trips
ON calendar.vsid = trips.vsid
JOIN routes
ON routes.routeid = trips.routeid
JOIN places plcstart
ON plcstart.placeid = trips.placeidstart
JOIN places plcend
ON plcend.placeid = trips.placeidend
JOIN segments
ON segments.tripid = trips.tripid
JOIN stops
ON segments.stopid = stops.stopid
WHERE stops.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
AND calendar.vscdate = Str_to_date('25-10-2011', "%e-%c-%Y")
AND segments.segmentstart >= Str_to_date('15:56', "%H:%i")
AND routes.routeservicetype = 0
AND segments.segmentstart > "00:00:00"
ORDER BY segments.segmentstart
What are some things I can do to speed this up? Any tips are welcome, I'm pretty new to SQL...
But I can't change the structure of the DB because it's not mine...
Use EXPLAIN to find the bottlenecks: http://dev.mysql.com/doc/refman/5.0/en/explain.html
Then perhaps, add indexes.
If you don't need to select ALL rows, use LIMIT to limit returned result count.
Just looking at the query, I would say that you should make sure you have indexes on trips.vsid, calendar.vscdate, segments.segmentstart and routes.routeservicetype. I assume there are already indexes on all the primary keys in the tables.
Using explain as Briedis suggested would show you how well the indexes work.
You might want to add covering indexes for some tables, for example an index on trips.vsid that includes tripid and routeid. That way the database can serve the data it needs from the index alone, without reading the actual table.
Edit:
The execution plan tells you that it successfully uses indexes for everything except the segments table, where it does a table scan and filters by the where condition. You should try to make a covering index for segments.segmentstart by including tripid and stopid.
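A sketch of the two covering indexes suggested above (index names are arbitrary):
ALTER TABLE trips    ADD INDEX idx_trips_cover    (vsid, tripid, routeid);
ALTER TABLE segments ADD INDEX idx_segments_cover (segmentstart, tripid, stopid);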
Try adding a composite index to the routes table on both routeservicetype and routeid.
Depending on the frequency of the data within the routeservicetype field, you may get an improvement by shrinking the amount of data being compared in the join to the trips table.
Looking at the explain plan, you may also want to force the sequence of the table usage by using STRAIGHT_JOIN instead of JOIN (or INNER JOIN), as I've had real improvements with this technique.
Essentially, put the table with the smallest row-count of extracted data at the beginning of the query, and the largest row count table at the end (in this case possibly the segments table?), with the exception of simple lookups (eg. for descriptions).
You may also consider altering the WHERE clause to filter the segments table on stopid instead of the stops table, and creating a composite index on the segments table on (stopid, tripid, segmentstart) - this index will effectively be able to satisfy two joins and two WHERE conditions on its own...
To build the index...
ALTER TABLE segments ADD INDEX idx_qry_helper ( stopid, tripid, segmentstart );
And the altered WHERE clause...
WHERE segments.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
(the rest of the query remains unchanged)
At the end of the day, a 10-second response for what appears to be a complex query on a fairly large dataset isn't all that bad!