How to execute this PHP script updating a field faster? - mysql

<?
include 'database.php';
$user1 = mysql_query("SELECT id,username FROM `users` ORDER BY `id` ASC");
while($user = mysql_fetch_object($user1)) {
mysql_query("UPDATE `pokemon` SET `user`=$user->id WHERE `owner`='$user->username'");
}
?>
Basically, I have over 300,000 users in my game, and there are millions of "Pokemon".
I was thinking of using an integer for the foreign key rather than a string, and so I have created this small little script to do that for me.
Unfortunately, it has only updated about 1000 users in the past hour, so the whole thing would take roughly 300 hours to complete. Is there a way I can make this script more efficient?
user is the new unique identifier, while owner is the old unique identifier (foreign key).
Is there an alternative solution to my method that would require less than 3-4 hours of time? I'm sure there must be some nice little SQL query I can just execute via phpMyAdmin rather than using this code.
Thanks for all the help, it really is appreciated and I will surely return the favour whenever possible.
Edit: Thanks for teaching me this new technique, I'll try it out and update my thread.

I think an UPDATE query with an INNER JOIN would be faster than a loop:
UPDATE `pokemon` a
INNER JOIN `users` b
ON a.`owner` = b.`username`
SET a.`user`= b.`id`

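You can JOIN the tables and then do the update in a single SQL query. Note that with a LEFT JOIN, pokemon rows whose owner has no matching user are updated too, with user set to NULL; use an INNER JOIN if those rows should be left untouched: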
UPDATE pokemon
LEFT JOIN users
ON pokemon.owner = users.username
SET pokemon.user = users.id
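To try the set-based approach on a small scale first, here is a minimal sketch in Python using the standard-library sqlite3 module. It is only a stand-in: SQLite does not accept MySQL's UPDATE ... JOIN syntax, so the same effect is written as a correlated subquery. The table and column names come from the question; the sample rows and the index name ix_pokemon_owner are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
    CREATE TABLE pokemon (id INTEGER PRIMARY KEY, owner TEXT, user INTEGER);
    -- Assumption: an index on the old string key keeps the lookup cheap.
    CREATE INDEX ix_pokemon_owner ON pokemon(owner);
    INSERT INTO users (id, username) VALUES (1, 'ash'), (2, 'misty');
    INSERT INTO pokemon (id, owner) VALUES
        (10, 'ash'), (11, 'ash'), (12, 'misty'), (13, 'ghost');
""")

# One set-based statement instead of one UPDATE per user.
# The EXISTS guard leaves rows without a matching user untouched,
# which mirrors the INNER JOIN variant of the MySQL query.
conn.execute("""
    UPDATE pokemon
    SET user = (SELECT u.id FROM users u WHERE u.username = pokemon.owner)
    WHERE EXISTS (SELECT 1 FROM users u WHERE u.username = pokemon.owner)
""")
conn.commit()

result = conn.execute("SELECT id, user FROM pokemon ORDER BY id").fetchall()
print(result)
```

The row owned by 'ghost' keeps a NULL user because no such username exists, which is a quick way to spot orphaned rows before dropping the old owner column.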

MySQL/Eloquent Query Optimization

I have a database with several tables, the ones involved in this query that I want to optimize are only 4.
albums, songs, genres, genre_song
A song can have many genres, and a genre many songs. An album can have many songs. An album is related to genres through songs.
The objective is to be able to recommend albums related to the genre of the album.
So that led me to have this query.
SELECT *
FROM `albums`
WHERE EXISTS
(SELECT *
FROM `songs`
WHERE `albums`.`id` = `songs`.`album_id`
AND EXISTS
(SELECT *
FROM `genres`
INNER JOIN `genre_song` ON `genres`.`id` = `genre_song`.`genre_id`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genres`.`id` IN (6)))
AND `id` <> 37635
AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6
This query takes me between 1.4s and 1.6s. I would like to reduce it as much as possible. The ideal goal would be less than 10ms 😁
I am already using indexes on several tables, and I have managed to reduce the time of other queries from up to 4 seconds to only 15-20ms. I am willing to use anything to reduce the response time to a minimum.
I am using Laravel, so this would be the query with Eloquent.
$relatedAlbums = Album::whereHas('songs.genres', function ($query) use ($album) {
$query->whereIn('genres.id', $album->genres->pluck('id'));
})->where('id', '<>', $album->id)
->orderByDesc('release_date')
->take(6)
->get();
Note: Previously, the genres were loaded.
If you want to recreate the tables and some fake data in your database, here is the structure
It is hard to guess without seeing the real data, but anyway:
I think the problem is that even though you LIMIT the result to 6 rows, you have to read the whole albums table, because:
You are filtering by a non-indexed column
You are sorting by a non-indexed column
You don't know in advance which albums will make the cut (have a song in the required genre). So you evaluate all of them, then order by release_date, and keep the top 6
If you read the albums through an index on published status and release date, MySQL can stop processing the query as soon as it has found the first 6 albums that make the cut. Of course, you may have bad luck: perhaps the albums that have genre-6 songs are the oldest ones, and then you still have to read and process many albums. Still, this optimization should not hurt, so it is worth trying, and one would expect the data to be somewhat evenly distributed.
Also, as stated in other answers, you don't actually need to access the genres table (although this is probably not the worst problem of the query). You can just use genre_song, and you can create a new index on the two columns you need.
create index genre_song_id_id on genre_song(genre_id, song_id);
Note that the previous index only makes sense if you change the query (as suggested at the end of this answer).
For the albums table, you may create either of these two indexes:
create index release_date_desc_v1 on albums (published, release_date desc);
create index release_date_desc_v2 on albums (release_date desc, published);
Choose whichever index is better for your data:
If the percentage of published albums is "low", you probably want to use _v1
Otherwise, the _v2 index will be better
Please test them both, but don't let both indexes exist at the same time. If you are testing _v1, make sure you have dropped _v2, and vice versa.
Also, change your query so that it does not use the genres table:
SELECT *
FROM `albums`
WHERE EXISTS
(SELECT *
FROM `songs`
WHERE `albums`.`id` = `songs`.`album_id`
AND EXISTS
(SELECT *
FROM `genre_song`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genre_song`.`genre_id` IN (6)))
AND `id` <> 37635
AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6;
One thing I noticed is that you don't have to join the genres table in the following subquery:
AND EXISTS
(SELECT *
FROM `genres`
INNER JOIN `genre_song` ON `genres`.`id` = `genre_song`.`genre_id`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genres`.`id` IN (6))
We can simplify this, and the whole query could then be the following:
SELECT *
FROM `albums`
WHERE EXISTS
(SELECT *
FROM `songs`
WHERE `albums`.`id` = `songs`.`album_id`
AND EXISTS
(SELECT *
FROM `genre_song`
WHERE `songs`.`id` = `genre_song`.`song_id`
AND `genre_song`.`genre_id` IN (6)))
AND `id` <> 37635
AND `published` = 1
ORDER BY `release_date` DESC
LIMIT 6
Of course you should optimize your query for a quick response time, but here is another tip that can speed up your responses considerably.
I faced a similar problem with slow response times and managed to reduce them substantially simply by using a cache.
You can use the Redis driver for the cache in Laravel. It saves you from querying the database again and again: the result is stored under a key, so the next API call returns it from the cache without touching the database. The Redis driver also gives you one advantage that I love:
You can use cache tags.
Cache tags allow you to tag related items in the cache and then flush all cached values that have been assigned a given tag. For example, if you have an API that retrieves the posts of the user with $id = 1, you can store that data under a cache tag so that the next request for the same record is served faster, and when you update the data in the database you can update or remove the cached copy as well. You can do something like the following:
public $cacheTag = 'user';
// Return the record from the cache if it is already there;
// otherwise load it from the database and cache it for next time.
$item = Cache::tags([$this->cacheTag])->get($this->cacheTag.$id);
if ($item === null) {
    // $row may already have been loaded by the caller
    if (empty($row)) {
        $row = $this->model->find($id);
    }
    if ($row !== null && $row !== false) {
        $item = (object) $row->toArray();
        Cache::tags([$this->cacheTag])->forever($this->cacheTag.$id, $item);
    }
}
When you update the data in the database, you can delete the stale entry from the cache so that it is reloaded on the next read:
if ($refresh) {
    Cache::tags([$this->cacheTag])->forget($this->cacheTag.$id);
}
You can read more about cache from Laravel's documentation
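The pattern described in this answer (read from the cache, fall back to the database on a miss, invalidate by tag after a write) can be sketched independently of Laravel. The following is a toy in-memory illustration in Python, not Laravel's API: TaggedCache, load_user and get_user are hypothetical names, and a plain dict stands in for Redis.

```python
from collections import defaultdict


class TaggedCache:
    """Toy in-memory cache with tags (hypothetical; stands in for Redis)."""

    def __init__(self):
        self._values = {}
        self._keys_by_tag = defaultdict(set)

    def get(self, key):
        return self._values.get(key)

    def put(self, tag, key, value):
        self._values[key] = value
        self._keys_by_tag[tag].add(key)

    def forget(self, key):
        self._values.pop(key, None)

    def flush_tag(self, tag):
        # Drop every value that was stored under this tag.
        for key in self._keys_by_tag.pop(tag, set()):
            self._values.pop(key, None)


db_reads = 0


def load_user(user_id):
    """Hypothetical slow database lookup; counts how often it is called."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user{user_id}"}


def get_user(cache, user_id):
    key = f"user{user_id}"
    item = cache.get(key)
    if item is None:  # cache miss: read the database once and remember it
        item = load_user(user_id)
        cache.put("user", key, item)
    return item


cache = TaggedCache()
get_user(cache, 1)
get_user(cache, 1)        # served from the cache, no second database read
reads_before_flush = db_reads
cache.flush_tag("user")   # e.g. after an update that touches user rows
get_user(cache, 1)        # miss again, so the database is read once more
reads_after_flush = db_reads
print(reads_before_flush, reads_after_flush)
```

Note that a cache only helps repeated requests for the same data; the first request for each album still pays the full query cost, so it complements the index and query changes rather than replacing them.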
FWIW, I find the following easier to understand, so I would want to see the EXPLAIN for this:
SELECT DISTINCT a.*
FROM albums a
JOIN songs s
ON s.album_id = a.id
JOIN genre_song gs
ON gs.song_id = s.id
JOIN genres g
ON g.id = gs.genre_id
WHERE g.id IN (6)
AND a.id <> 37635
AND a.published = 1
ORDER BY a.release_date DESC
LIMIT 6
In this instance (and assuming the tables are InnoDB), an index on (published, release_date) might help.

SQL query with nested query on another table in it

I've got two tables, we'll call them email_bounces and master_email_list.
master_email_list is ~3.5m records.
email_bounces is ~100,000 records.
I'm trying to do a query where I update bounce=1 in master_email_list if the email address is found in email_bounces.
Here's what I have.
update `master_email_list` set bounce=1 where email in (select email FROM `email_bounces`)
Except that doesn't seem to work: the query starts, then hangs indefinitely (I left it running overnight, after it had already been running for about 4 hours).
Help is appreciated.
Use
update master_email_list l
inner join email_bounces b on b.email = l.email
set bounce = 1
You could also try to deactivate keys during the update to speed things up (this only has an effect on MyISAM tables, and only for non-unique indexes):
ALTER TABLE master_email_list DISABLE KEYS;
And afterwards
ALTER TABLE master_email_list ENABLE KEYS;
When you use table aliases, qualify the column in the SET clause with the alias.
Here I am using an inner join:
update master_email_list mel
inner join email_bounces eb
on mel.email = eb.email
set mel.bounce = 1
If that simple query takes hours, I can only see two possible reasons:
1. You're missing an index with master_email_list.email as its first column.
CREATE INDEX ix_email ON master_email_list(email);
...should speed things up significantly.
2. You have an active transaction on the table that holds a lock. Check that you have no uncommitted transactions pending, and if you can't find them, check this answer on how to look for them.
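Whether a lookup on email can use an index is easy to check before running the big update. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN purely as a stand-in; in MySQL you would look at the output of EXPLAIN instead, and the exact wording of the plan differs between engines and versions. The table and index names come from the thread; the column types and the sample address are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master_email_list (email TEXT, bounce INTEGER DEFAULT 0)"
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT bounce FROM master_email_list WHERE email = 'a@example.com'"

plan_without_index = plan(lookup)   # full table scan
conn.execute("CREATE INDEX ix_email ON master_email_list(email)")
plan_with_index = plan(lookup)      # lookup through ix_email

print(plan_without_index)
print(plan_with_index)
```

Without the index, every one of the ~100,000 bounced addresses forces a scan of the 3.5m-row table, which is consistent with a query that appears to hang.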

MySQL query help moving data between tables

I've imported my phpbb3 forum into bbpress using the built-in importer. All of the anonymous users from bbpress who didn't have accounts, but were allowed to post, are disconnected from their posts, and everything is showing up as anonymous in bbpress. I grabbed all the post_usernames from phpbb_posts and created users with this query:
INSERT INTO wp_users (user_login)
SELECT DISTINCT post_username
FROM phpbb_posts
Now I'm trying to do a query between the 3 different tables. Something along these lines:
SELECT ID FROM wp_users
INSERT INTO wp_posts(post_author)
WHERE wp_posts(post_date) = phpbb_posts(post_time)
AND phpbb_posts(post_username) = wp_users(user_login)
Obviously this isn't right... probably syntax errors, but I also need to add some way of telling MySQL that the user_login has to be attached to the ID from the first line. Hopefully this makes sense. Thanks in advance for any help!
Updated queries:
SELECT ID FROM wp_users
SELECT post_time FROM phpbb_posts = post_date
SELECT post_username FROM phpbb_posts = user_login
Hopefully this syntax makes more sense. These did work and they select the right information. The problem is that I don't know how to write the WHERE clause properly and, like you said baskint, I think I need to make the last statement a sub-query somehow. Thanks again!
I am still not sure what are the PK's (Primary Key) and FK's (Foreign Key) relationships of each table. However, assuming that wp_users is the primary table and phpbb_posts.post_username is the FK of wp_users.user_login...:
SELECT `wp_users`.`ID`
FROM `wp_users` INNER JOIN
(SELECT `phpbb_posts`.`post_username` FROM `phpbb_posts`, `wp_posts` WHERE `phpbb_posts`.`post_time` = `wp_posts`.`post_date` ) AS `posts`
ON `wp_users`.`user_login` = `posts`.`post_username`;
EDIT (Dec-05-2012):
After chatting and going through the specifics, @sbroways had to change the data types of some fields and make a few other modifications. In turn, the final query turned out to be:
SELECT wp_users.*, ws_posts.*
FROM wp_users INNER JOIN ws_posts
ON wp_users.user_login = ws_posts.user_login
You're right: your syntax is confusing and not correct. I am trying to understand what you want to accomplish. In the second query, why are you selecting and inserting at the same time? Perhaps I am missing something, but can you state in plain English what you are trying to pull out of which tables and how you would like to see the results?
Also, you can think in terms of sub-queries (SELECT * FROM b WHERE id IN (SELECT id FROM a)). You can nest this a few times and perhaps get to your answer.

MySQL: Why does this select query not find row when joined table has multiple results?

Title might be confusing, didn't quite know how to put it. Here's what i need to do. I have two tables, cronjobs and cronjob_seeds. I need to see if a cronjob exists before adding it to the database.
Consider these tables:
cronjobs:
-id- -callback_model- -callback_method-
1 movie_suggestion_model fetch_similar_movies
cronjob_seeds:
-cronjob_id- -seed-
1 seed1
1 seed2
Before adding a new cronjob, I need to see if the exact same cronjob exists in the database. I wrote the following query, but it doesn't work if the cronjob has multiple seeds. It works well if it only has one seed, but every time a cronjob has multiple seeds it returns nothing.
SELECT `id`
FROM (`cronjobs`)
INNER JOIN `cronjob_seeds` ON `cronjob_seeds`.`cronjob_id` = `cronjobs`.`id`
WHERE `cronjobs`.`callback_model` = 'movie_suggestion_model'
AND `cronjobs`.`callback_method` = 'fetch_similar_movies'
AND `cronjob_seeds`.`seed` = '1'
AND `cronjob_seeds`.`seed` = 10
Am i missing something? Should i be using another type of join?
And, off topic, but a seed is a parameter for the callback method, i just named it a little weird.
You should use an IN clause. Note that IN matches a cronjob that has any of the listed seeds, not necessarily all of them.
Change your query to:
SELECT `id`
FROM (`cronjobs`)
INNER JOIN `cronjob_seeds` ON `cronjob_seeds`.`cronjob_id` = `cronjobs`.`id`
WHERE `cronjobs`.`callback_model` = 'movie_suggestion_model'
AND `cronjobs`.`callback_method` = 'fetch_similar_movies'
AND `cronjob_seeds`.`seed` IN ('seed1', 'seed2')
Only to check if a cron entry exists, you don't need a join, unless you need to check if it exists and has specific seeds. To check only if a cron entry exists you need to do something like this:
SELECT `id`
FROM `cronjobs`
WHERE `cronjobs`.`callback_model` = 'movie_suggestion_model'
AND `cronjobs`.`callback_method` = 'fetch_similar_movies'
How did your query return a result even when there was a single row in cronjob_seeds? The last two AND criteria conflict with each other. You should revise it as:
AND `cronjob_seeds`.`seed` BETWEEN '1' AND '10'
Also, are you checking for the existence of a record in cronjob_seeds or in cronjobs? Your query parameters do not seem to be clear. Are you trying to check whether a specific cronjob with specific seeds exists? Tip: try to write the query in plain language before writing it in SQL.
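The original query returns nothing for two seeds because each joined row carries exactly one seed, so a single row can never satisfy seed = X AND seed = Y at the same time. One common way to require all of the seeds is to filter with IN, group by cronjob, and count the distinct matches. Below is a minimal sketch of that idea in Python with sqlite3 as a stand-in for MySQL; the second cronjob row and the helper find_cronjob are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cronjobs (
        id INTEGER PRIMARY KEY, callback_model TEXT, callback_method TEXT);
    CREATE TABLE cronjob_seeds (cronjob_id INTEGER, seed TEXT);
    INSERT INTO cronjobs VALUES
        (1, 'movie_suggestion_model', 'fetch_similar_movies'),
        (2, 'movie_suggestion_model', 'fetch_similar_movies');
    INSERT INTO cronjob_seeds VALUES (1, 'seed1'), (1, 'seed2'), (2, 'seed1');
""")


def find_cronjob(model, method, seeds):
    """Return ids of cronjobs that have every seed in `seeds` (hypothetical helper)."""
    placeholders = ", ".join("?" for _ in seeds)
    sql = f"""
        SELECT c.id
        FROM cronjobs c
        JOIN cronjob_seeds s ON s.cronjob_id = c.id
        WHERE c.callback_model = ?
          AND c.callback_method = ?
          AND s.seed IN ({placeholders})
        GROUP BY c.id
        HAVING COUNT(DISTINCT s.seed) = ?
        ORDER BY c.id
    """
    params = [model, method, *seeds, len(set(seeds))]
    return [row[0] for row in conn.execute(sql, params)]


both = find_cronjob("movie_suggestion_model", "fetch_similar_movies",
                    ["seed1", "seed2"])
one = find_cronjob("movie_suggestion_model", "fetch_similar_movies",
                   ["seed1"])
print(both, one)
```

This matches cronjobs that have at least the given seeds. To require an exact match with no extra seeds, you would additionally compare against the total number of seeds stored for the cronjob.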

Update two tables with a single query in SQL Server

I need to update some information that is stored in two tables. To update one of the tables, a JOIN with the other table is required. I was wondering whether I can update both with a single query. Some posts suggested using a trigger. I was hoping there's another way, since I will have to do it in C#. I also saw other posts saying it's possible using something like this:
update pd, pr
set pd.Name = 'Test',
pr.Date = '2012-07-31',
from prDetail pd
left join pr on pd.ID = pr.ID
where pd.Code = '45007'
and pr = '2019'
and pr.Item = '1'
This is not working for me (it's showing this error: Incorrect syntax near ','). Can this be achieved in some way?
No.
You can only update one table at a time.
Separate your update into two update queries, and if necessary, wrap them in a transaction.
Yes, as @podiluska said, you can only update one table at a time. You can execute both queries within a single transaction.
Alternatively, if your tables have a foreign key/primary key relationship, you can use the cascade options.
Or you can use a trigger.
No, it's not possible to update two at a time but you may consider wrapping it up in a stored procedure and calling that instead. You should try to avoid triggers for something like this if possible.
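The "two updates in one transaction" advice can be sketched as follows. This is an illustration in Python with sqlite3 standing in for SQL Server, not C# code: the table and column names are taken from the question, while the schema details (including the NOT NULL constraint used to force a failure) and the helper update_both are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pr       (ID INTEGER PRIMARY KEY, Date TEXT);
    CREATE TABLE prDetail (ID INTEGER, Code TEXT, Name TEXT NOT NULL);
    INSERT INTO pr VALUES (1, '2012-01-01');
    INSERT INTO prDetail VALUES (1, '45007', 'Old');
""")

READ_BOTH = "SELECT pr.Date, pd.Name FROM pr JOIN prDetail pd ON pd.ID = pr.ID"


def update_both(name, date, code):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so either both tables change or neither does.
    with conn:
        conn.execute(
            "UPDATE pr SET Date = ? WHERE ID IN "
            "(SELECT ID FROM prDetail WHERE Code = ?)",
            (date, code),
        )
        conn.execute(
            "UPDATE prDetail SET Name = ? WHERE Code = ?", (name, code)
        )


update_both("Test", "2012-07-31", "45007")
after_success = conn.execute(READ_BOTH).fetchone()

try:
    # The second UPDATE violates NOT NULL, so the first one is rolled back.
    update_both(None, "1999-01-01", "45007")
except sqlite3.IntegrityError:
    pass
after_failure = conn.execute(READ_BOTH).fetchone()

print(after_success, after_failure)
```

The same structure applies whether the two statements are sent from application code or wrapped in a stored procedure: begin a transaction, run both updates, then commit, rolling back if either one fails.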