I have a database with several tables; only four are involved in the query I want to optimize:
albums, songs, genres, genre_song
A song can have many genres, and a genre many songs. An album can have many songs. An album is related to genres through its songs.
The objective is to be able to recommend albums related to the genre of a given album.
That led me to this query:
SELECT *
FROM `albums`
WHERE EXISTS
    (SELECT *
     FROM `songs`
     WHERE `albums`.`id` = `songs`.`album_id`
       AND EXISTS
           (SELECT *
            FROM `genres`
            INNER JOIN `genre_song` ON `genres`.`id` = `genre_song`.`genre_id`
            WHERE `songs`.`id` = `genre_song`.`song_id`
              AND `genres`.`id` IN (6)))
  AND `id` <> 37635
  AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6
This query takes between 1.4s and 1.6s. I would like to reduce that as much as possible. The ideal goal would be less than 10ms 😁
I am already using indexes in several tables, and I have managed to reduce other queries from up to 4 seconds down to only 15-20ms. I am willing to try anything to get the execution time down to a minimum.
I am using Laravel, so this is the query with Eloquent:
$relatedAlbums = Album::whereHas('songs.genres', function ($query) use ($album) {
        $query->whereIn('genres.id', $album->genres->pluck('id'));
    })
    ->where('id', '<>', $album->id)
    ->orderByDesc('release_date')
    ->take(6)
    ->get();
Note: the album's genres were already loaded beforehand (so $album->genres is available).
If you want to recreate the tables and some fake data in your database, here is the structure
It is hard to guess without seeing the real data... but anyway:
I think the problem is that even if you LIMIT the result to 6 rows, you have to read ALL of the albums table, because:
You are filtering it by a non-indexed column
You are sorting it by a non-indexed column
You don't know in advance which albums will make the cut (i.e. have a song of the required genre), so you evaluate all of them, order them by release_date, and keep the top 6
If you accessed the albums sorted by published status and release date, then once the first 6 albums make the cut, MySQL can stop processing the query. Of course, you may have 'bad luck': perhaps the albums that have genre-6 songs are the oldest-published ones, and then you will have to read and process many albums anyway. Still, this optimization should not hurt, so it is worth trying, and one would expect the data to be somewhat evenly distributed.
Also, as stated in other answers, you don't actually need to access the genres table (albeit this is probably not the worst problem of the query). You may just access genre_song, and you may create a new index for the two columns you need:
create index genre_song_id_id on genre_song(genre_id, song_id);
Note that the previous index only makes sense if you change the query (as suggested at the end of this answer).
For the albums table, you may create either of these two indexes:
create index release_date_desc_v1 on albums (published, release_date desc);
create index release_date_desc_v2 on albums (release_date desc, published);
Choose whichever index is better for your data:
If the percentage of published albums is "low", you probably want _v1
Otherwise, the _v2 index will be better
Please test them both, but don't let both indexes coexist at the same time: if testing _v1, make sure you have dropped _v2, and vice versa.
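A minimal testing loop, as a sketch (this assumes MySQL 8.0, where descending index keys are honored; older versions parse DESC but ignore it):
-- With _v1 in place (and _v2 dropped), check whether the index drives the
-- filter and the sort; repeat the same check after swapping to _v2:
EXPLAIN
SELECT id, release_date
FROM albums
WHERE published = 1
ORDER BY release_date DESC
LIMIT 6;
-- Swapping the indexes:
DROP INDEX release_date_desc_v1 ON albums;
CREATE INDEX release_date_desc_v2 ON albums (release_date DESC, published);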
Also, change your query so it does not use the genres table:
SELECT *
FROM `albums`
WHERE EXISTS
    (SELECT *
     FROM `songs`
     WHERE `albums`.`id` = `songs`.`album_id`
       AND EXISTS
           (SELECT *
            FROM `genre_song`
            WHERE `songs`.`id` = `genre_song`.`song_id`
              AND `genre_song`.`genre_id` IN (6)))
  AND `id` <> 37635
  AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6;
One thing I noticed is that you don't have to join the genres table in the following subquery:
AND EXISTS
    (SELECT *
     FROM `genres`
     INNER JOIN `genre_song` ON `genres`.`id` = `genre_song`.`genre_id`
     WHERE `songs`.`id` = `genre_song`.`song_id`
       AND `genres`.`id` IN (6))
We can simplify this, and the following could be the whole query:
SELECT *
FROM `albums`
WHERE EXISTS
    (SELECT *
     FROM `songs`
     WHERE `albums`.`id` = `songs`.`album_id`
       AND EXISTS
           (SELECT *
            FROM `genre_song`
            WHERE `songs`.`id` = `genre_song`.`song_id`
              AND `genre_song`.`genre_id` IN (6)))
  AND `id` <> 37635
  AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6
Sure, you have to optimize your query for a quick response time, but here is another tip which can rocket your response time.
I faced a similar problem of slow response times, and I managed to reduce them substantially by simply using a cache.
You can use the Redis driver for the cache in Laravel. It will save you from querying the database again and again, so your response time will automatically improve: Redis stores the query results as key-value pairs, so the next time you make the API call, the results are returned from the cache without querying the database. Using the Redis driver for the cache also gives you one brilliant advantage which I love:
You can use cache tags.
Cache tags allow you to tag related items in the cache and then flush all cached values that have been assigned a given tag. So, for example, if you have an API which retrieves posts of the user having $id = 1, you can dynamically put the data into cache tags so that querying the same record next time is faster, and when you update the data in the database you can simply update it in the cache tags as well. You can do something like the following:
public $cacheTag = 'user';

// Check whether the record already exists in the cache and retrieve it from
// there; otherwise fetch it from the database and store it in the cache as
// well, to boost the response time next time.
$item = Cache::tags([$this->cacheTag])->get($this->cacheTag.$id);
if ($item === null) {
    $row = $this->model->find($id);
    if ($row !== null) {
        $item = (object) $row->toArray();
        Cache::tags([$this->cacheTag])->forever($this->cacheTag.$id, $item);
    }
}
When updating data in the database, you can delete the stale entry from the cache and re-store it:
if ($refresh) {
    Cache::tags([$this->cacheTag])->forget($this->cacheTag.$id);
}
You can read more about caching in Laravel's documentation.
FWIW, I find the following easier to understand, so I would want to see the EXPLAIN for this:
SELECT DISTINCT a.*
FROM albums a
JOIN songs s
ON s.album_id = a.id
JOIN genre_song gs
ON gs.song_id = s.id
JOIN genres g
ON g.id = gs.genre_id
WHERE g.id IN (6)
AND a.id <> 37635
AND a.published = 1
ORDER
BY a.release_date DESC
LIMIT 6
In this instance (and assuming the tables are InnoDB), an index on (published, release_date) might help.
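As a sketch (the index name is just illustrative):
CREATE INDEX idx_albums_published_release_date ON albums (published, release_date);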
Related
I have a database that holds the details of various different events and all of the odds that bookmakers are offering on those events. I have the following query which I am using to get the best odds for each different type of bet for each event:
SELECT
eo1.id,
eo1.event_id,
eo1.market_id,
IF(markets.display_name IS NULL, markets.name, markets.display_name) AS market_name,
IF(market_values.display_name IS NULL, market_values.name, market_values.display_name) AS market_value_name,
eo2.bookmaker_id,
eo2.value
FROM event_odds AS eo1
JOIN markets ON eo1.market_id = markets.id AND markets.enabled = 1
JOIN market_values on eo1.market_value_id = market_values.id
JOIN bookmakers on eo1.bookmaker_id = bookmakers.id AND bookmakers.enabled = 1
JOIN event_odds AS eo2
ON
eo1.event_id = eo2.event_id
AND eo1.market_id = eo2.market_id
AND eo1.market_value_id = eo2.market_value_id
AND eo2.value = (
SELECT MAX(value)
FROM event_odds
WHERE event_odds.event_id = eo1.event_id
AND event_odds.market_id = eo1.market_id
AND event_odds.market_value_id = eo1.market_value_id
)
WHERE eo1.`event_id` = 6708
AND markets.name != '-'
GROUP BY eo1.market_id, eo1.market_value_id
ORDER BY markets.sort_order, market_name, market_values.id
This returns exactly what I want; however, since the database has grown in size, it has become very slow. I currently have just over 500,000 records in the event_odds table, and the query takes almost 2 minutes to run. The hardware is a decent spec, all of the columns are indexed correctly, and the table engine is MyISAM for all tables. How can I optimise this query so it runs quicker?
For this query, you want to be sure you have an index on event_odds(event_id, market_id, market_value_id, value).
In addition, you want indexes on:
markets(id, enabled, name)
bookmakers(id, enabled)
Note that composite indexes are quite different from multiple indexes with one column each.
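As a sketch, those indexes could be created like so (the index names here are just illustrative):
CREATE INDEX idx_event_odds_lookup ON event_odds (event_id, market_id, market_value_id, value);
CREATE INDEX idx_markets_enabled ON markets (id, enabled, name);
CREATE INDEX idx_bookmakers_enabled ON bookmakers (id, enabled);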
Create a MySQL view for this SQL and fetch the data from that view instead; this would help increase the speed and can reduce complexity. Try pagination for listings using LIMIT, which will also reduce the load on the server. And try to add indexes on the typical columns.
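A minimal sketch of the idea (the view name is hypothetical, and the view is simplified down to just the best value per market):
CREATE VIEW max_event_odds AS
SELECT event_id, market_id, market_value_id, MAX(value) AS best_value
FROM event_odds
GROUP BY event_id, market_id, market_value_id;

-- Paginated fetch from the view:
SELECT * FROM max_event_odds WHERE event_id = 6708 LIMIT 20 OFFSET 0;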
I have a table of 'posts' that contains a post_type and the primary key of a corresponding table; for each post_type there is a corresponding table.
I would like to retrieve the data from the corresponding tables as if to populate the entries in a social-feed-style wall, e.g. the final data ought to be able to be packaged up as JSON entities.
The database tables involved are posts, games, achievements and videos (screenshots omitted).
I'm new to MySQL queries, so I have a couple of considerations. I am wondering if this is possible using MySQL's CASE statement. I have added a snippet of pseudo code that hopefully illustrates a little of what I have in mind.
SELECT * FROM posts
CASE
WHEN posts.post_type = 'game' THEN
INNER JOIN games ON (games.game_id = posts.origin_id)
WHEN posts.post_type = 'achievement' THEN
INNER JOIN achievements ON (achievements.achievement_id = posts.origin_id)
WHEN posts.post_type = 'event' THEN
INNER JOIN events ON (events.event_id = posts.origin_id)
END;
Alternatively, if this is NOT possible, efficient, or practical, then I would really appreciate a more efficient alternative approach. I created an SQLFiddle with some sample database tables etc. (NOT 100% accurate, just for testing).
An option I have been told about is using LEFT JOINS and I am experimenting with them here:
http://sqlfiddle.com/#!9/d7fdde/2
However, I have not been able to effectively get the entity data using PHP without data loss / corruption. It's clear that there is missing entity data (such as missing created_at and updated_at entries) and even additional data (such as an extra 1000 likes entry in the 2nd entity).
https://gist.github.com/PluginIO/ec9c411f75859570a087c53ca4671f3e
I tried to remove the NULL values from the LEFT JOINS with the following PHP routines:
https://gist.github.com/PluginIO/2444d2fb1098ebb3248f4fb84751d831
Last but not least, a commenter below offered an alternative approach that uses INNER JOINs and UNIONs; however, I have been unable to get it working:
http://sqlfiddle.com/#!9/d7fdde/12
Hopefully it is quite clear that the result I am hoping for is a sequential set of post entities, each with its own relevant table data, in a JSON-formatted list.
Use a subquery instead of the join. But don't use a subquery if the subquery table returns more than one row for a specific id.
Check this SQL Fiddle: Click Here
SELECT *,
       CASE
         WHEN posts.post_type = 'game' THEN
           (SELECT games.name FROM games WHERE games.game_id = posts.origin_id)
         WHEN posts.post_type = 'achievement' THEN
           (SELECT achievements.name FROM achievements WHERE achievements.achievement_id = posts.origin_id)
       END AS value_name
FROM posts;
Hope this query is helpful to you.
Try this; I updated your fiddle: http://sqlfiddle.com/#!9/d6e413/34
SELECT p.*,g.*,a.* from posts p
LEFT JOIN games g
ON
p.post_type = 'game' AND
g.game_id = p.origin_id
LEFT JOIN achievements a
ON
p.post_type='achievement' AND
a.achievement_id = p.origin_id
EDIT
Reading the data from MySQL in PHP:
$post = array();
if ($result = $mysqli->query($query)) {
    /* fetch object array */
    while ($obj = $result->fetch_object()) {
        array_push($post, $obj);
    }
}
var_dump($post);
There will be a few elements that return NULL in every row.
To remove NULL or empty fields from the array in PHP:
foreach ($array as &$value) {
    $value = array_filter($value, function($v) { return !empty($v); });
}
unset($value); // break the reference after a by-reference foreach
// $array = array_map('array_filter', $array);
refer : http://codepad.org/FdfY5aqj
The fact that you think you can retrieve the data with a single SELECT statement implies that the tables games, achievements and events contain a common (sub)set of attributes (although the tables have different prefixes on the '_id' field, the data domain is the same in each case). In the absence of any further information, these should not have been designed as separate tables; rather, there should be a single table with an attribute for 'type', plus separate tables detailing any non-common attributes. Indeed, if all that the 'posts' table contains is an id and an identifier for the table holding the data, it becomes redundant at that point - you should have 1 table, not 4.
Amongst other issues, you would not have the problem you describe above.
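A sketch of that single-table design (named unified_posts here to avoid clashing with the existing posts table; all column names are illustrative):
CREATE TABLE unified_posts (
  id         INT PRIMARY KEY AUTO_INCREMENT,
  post_type  VARCHAR(20) NOT NULL,  -- 'game', 'achievement', 'event', ...
  name       VARCHAR(255),          -- common attributes live here
  created_at DATETIME,
  updated_at DATETIME
);
-- Non-common attributes go into per-type detail tables keyed by unified_posts.id.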
If you have already gone well down the road of implementing a system around this design and fixing the problem is too expensive, then you can apply a workaround like this:
SELECT *
FROM posts p
INNER JOIN (
    SELECT a.achievement_id AS common_id, .....
    FROM achievements a
    UNION
    SELECT g.game_id, .....
    FROM games g
    UNION
    SELECT e.event_id, ....
    FROM events e
) AS ilv
ON p.origin_id = ilv.common_id
Where ... represents the common attributes.
But in addition to not supplying the details of the table structures, you also did not tell us how the query will be filtered - I doubt you want to retrieve the entire data set each and every time you run the query. How you implement the filtering will have a big impact on the performance.
If the number of rows is reduced most by the filtering on the posts table (i.e. using some attribute you've not told us about), then selecting the union of the joins (rather than joining to the union) would be more efficient:
SELECT *
FROM posts p
INNER JOIN
     achievements a
     ON p.origin_id = a.achievement_id
WHERE p.user = ?
UNION
SELECT *
FROM posts p
INNER JOIN
     events e
     ON p.origin_id = e.event_id
WHERE p.user = ?
UNION
SELECT *
FROM posts p
INNER JOIN
     games g
     ON p.origin_id = g.game_id
WHERE p.user = ?
I am trying to write an SQL query which is pretty complex. The requirements are as follows:
I need to return these fields from the query:
track.artist
track.title
track.seconds
track.track_id
track.relative_file
album.image_file
album.album
album.album_id
track.track_number
I can select a random track with the following query:
select
track.artist, track.title, track.seconds, track.track_id,
track.relative_file, album.image_file, album.album,
album.album_id, track.track_number
FROM
track, album
WHERE
album.album_id = track.album_id
ORDER BY RAND() limit 10;
Here is where I am having trouble, though. I also have tables called "trackfilters1" through "trackfilters10". Each row has an auto-incrementing ID field; therefore, row 10 holds data for album_id 10. These fields are populated with 1's and 0's. For example, if album #10 has 10 tracks, then trackfilters1.flags will contain "1111111111" if all tracks are to be included in the search. If track 10 was to be excluded, it would contain "1111111110".
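As a quick illustration of reading one flag out of such a string (a sketch using MySQL's MID() string function):
-- Character 10 of the flags string is the include/exclude flag for track 10:
SELECT MID('1111111110', 10, 1) AS track_10_flag;  -- returns '0', i.e. excluded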
My problem is including this clause.
The latest query I have come up with is the following:
select
track.artist, track.title, track.seconds,
track.track_id, track.relative_file, album.image_file,
album.album, album.album_id, track.track_number
FROM
track, album, trackfilters1, trackfilters2
WHERE
album.album_id = track.album_id
AND
( (album.album_id = trackfilters1.id)
OR
(album.album_id=trackfilters2.id) )
AND
( (mid(trackfilters1.flags, track.track_number,1) = 1)
OR
( mid(trackfilters2.flags, track.track_number,1) = 1))
ORDER BY RAND() limit 2;
However, this is causing SQL to hang. I presume I'm doing something wrong. Does anybody know what it is? I would be open to suggestions if there is an easier way to achieve my end result; I am not set on repairing my broken query if there is a better way to accomplish this.
Additionally, in my trials I have noticed that when I had a working query and added, say, trackfilters2 to the FROM clause without using it anywhere in the query, it would hang as well. This makes me wonder: is this correct behavior? I would think adding to the FROM list without making use of the data would just make the server fetch more data; I wouldn't have expected it to hang.
There's not enough information here to determine what's causing the performance issue.
But here's a few suggestions and comments.
Ditch the old-school comma syntax for the join operations, and use the JOIN keyword instead. And relocate the join predicates to an ON clause.
And for heaven's sake, format the SQL so that it's decipherable by someone trying to read it.
There's some questions here... will there always be a matching row in both trackfilters1 and trackfilters2 for rows you want to return? Or could a row be missing from trackfilters2, and you still want to return the row if there's a matching row in trackfilters1? (The answer to that question determines whether you'd want to use an outer join vs an inner join to those tables.)
For best performance with large sets, having appropriate indexes defined is going to be critical.
Use EXPLAIN to see the execution plan.
I suggest you try writing your query like this:
SELECT track.artist
, track.title
, track.seconds
, track.track_id
, track.relative_file
, album.image_file
, album.album
, album.album_id
, track.track_number
FROM track
JOIN album
ON album.album_id = track.album_id
LEFT
JOIN trackfilters1
ON trackfilters1.id = album.album_id
LEFT
JOIN trackfilters2
ON trackfilters2.id = album.album_id
WHERE MID(trackfilters1.flags, track.track_number, 1) = '1'
OR MID(trackfilters2.flags, track.track_number, 1) = '1'
ORDER BY RAND()
LIMIT 2
And if you want help with performance, provide the output from EXPLAIN, and what indexes are defined.
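As a side note on the "hang" the question mentions: with the comma syntax, a table that is listed in FROM but never constrained by any predicate is combined as a Cartesian product, so the row count multiplies rather than the server simply fetching more data. A minimal illustration, using the question's table names:
-- With no join predicate relating the tables, every row of album is paired
-- with every row of trackfilters2, i.e. rows(album) * rows(trackfilters2):
SELECT COUNT(*) FROM album, trackfilters2;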
SELECT COUNT(*)
FROM song AS s
JOIN user AS u
ON(u.user_id = s.user_id)
WHERE s.is_active = 1 AND s.public = 1
The s.is_active and s.public columns are indexed, as are u.user_id and s.user_id.
song table row count 310k
user table row count 22k
Is there a way to optimize this? We're getting 1 second query times on this.
Ensure that you have a compound "covering" index on song: (user_id, is_active, public). Here, we've named the index covering_index:
SELECT COUNT(s.user_id)
FROM song s FORCE INDEX (covering_index)
JOIN user u
ON u.user_id = s.user_id
WHERE s.is_active = 1 AND s.public = 1
Here, we're ensuring that the JOIN is done with the covering index instead of the primary key, so that the covering index can be used for the WHERE clause as well.
I also changed COUNT(*) to COUNT(s.user_id). Though MySQL should be smart enough to pick the column from the index, I explicitly named the column just in case.
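For reference, the index itself could be created like this (a sketch, using the column order described above):
CREATE INDEX covering_index ON song (user_id, is_active, public);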
Ensure that you have enough memory configured on the server so that all of your indexes can stay in memory.
If you're still having issues, please post the results of EXPLAIN.
Perhaps write it as a stored procedure or view. You could also try selecting all the IDs first and then running the count on the result; if you do it all as one query, it may be faster. Generally, optimisation is done by using nested selects or by making the server do the work, so in this context that is all I can think of.
SELECT COUNT(*) FROM
  (SELECT t.user_id FROM
    (SELECT * FROM song WHERE song.is_active = 1 AND song.public = 1) AS t
   JOIN user AS u
   ON (t.user_id = u.user_id)) AS counted
Also be sure you are using the correct kind of join.
I've got a users table and a votes table. The votes table stores votes toward other users. And for better or worse, a single row in the votes table stores the votes in both directions between the two users.
Now, the problem is when I wanna list for example all people someone has voted on.
I'm no MySQL expert, but from what I've figured out, thanks to the OR condition in the join statement, it needs to look through the whole users table (currently 44,000+ rows), and it creates a temporary table to do so.
Currently, the below query takes about two minutes, yes, two minutes, to complete. If I remove the OR condition and everything after it in the join statement, it runs in less than half a second, as it only needs to look through about 17 of the 44,000 user rows (explain ftw!).
In the example below, the user ID is 9834, and I'm trying to fetch his/her own 'no' votes and join the info of the user who was voted on to the result.
Is there a better and faster way to do this query? Or should I restructure the tables? I seriously hope it can be fixed by modifying the query, because there are already a lot of users (44,000+) and votes (130,000+) in the tables, which I'd have to migrate.
SELECT *, votes.id as vote_id
FROM `votes`
LEFT JOIN users ON (
(
votes.user_id_1 = 9834
AND
users.uid = votes.user_id_2
)
OR
(
votes.user_id_2 = 9834
AND
users.uid = votes.user_id_1
)
)
WHERE (
(
votes.user_id_1 = 9834
AND
votes.vote_1 = 0
)
OR
(
votes.user_id_2 = 9834
AND
votes.vote_2 = 0
)
)
ORDER BY votes.updated_at DESC
LIMIT 0, 10
Instead of the OR, you could do a UNION of 2 queries. I have known instances where this is an order of magnitude faster in at least one other DBMS, and I'm guessing MySQL's query optimizer may share the same "feature".
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_1 = u.uid
WHERE v.user_id_2 = 9834
AND v.vote_2 = 0
UNION
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_2 = u.uid
WHERE v.user_id_1 = 9834
AND v.vote_1 = 0
ORDER BY updated_at DESC
You've answered your own question: yes, you should redesign the table, as it's not working for you. It's too slow and requires overly complicated queries. Fortunately, migrating the data is just a matter of doing essentially the query you're asking about here, but for all users instead of just one. (That is, a sum or count over the unions that the first answer suggested.)
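A hedged sketch of what that migration could look like, restructured to one row per directed vote (the new table and column names are purely illustrative):
-- One row per (voter, votee) direction instead of two directions per row:
CREATE TABLE votes_new (
  voter_id   INT NOT NULL,
  votee_id   INT NOT NULL,
  vote       TINYINT NOT NULL,
  updated_at DATETIME,
  PRIMARY KEY (voter_id, votee_id)
);

INSERT INTO votes_new (voter_id, votee_id, vote, updated_at)
SELECT user_id_1, user_id_2, vote_1, updated_at FROM votes
UNION ALL
SELECT user_id_2, user_id_1, vote_2, updated_at FROM votes;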