How can I find the temp space used by a query? I tried EXPLAIN and EXPLAIN EXTENDED, but neither shows any information about it.
Well, one thing you could do is monitor the temp directory while you are running a query.
However, only specific types of queries/situations will cause temp files to be created (not all of them).
The answer at https://dba.stackexchange.com/a/30635 tells you which kinds.
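If you just want a rough signal rather than exact byte counts, you can also watch the server's own counters before and after running the query (a minimal sketch; these are the stock MySQL/MariaDB status variables):

SHOW VARIABLES LIKE 'tmpdir';
-- the directory to watch with df -h / du while the query runs

SHOW SESSION STATUS LIKE 'Created_tmp%';
-- compare Created_tmp_disk_tables / Created_tmp_files before and after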
I am generating some MySQL queries using PHP. In some cases, my code generates duplicate query code for some of the queries, as a security precaution. For example, let's say I have a table UploadedImages, which contains images uploaded by a user. It is connected to the User table via a reference. When a plain user without admin rights queries that table, I forcefully add a WHERE condition so that the query only retrieves images which belong to him.
Because of this forceful inclusion, the query I generate sometimes ends up with duplicate WHERE conditions:
SELECT * FROM UploadedImages WHERE
accounts_AccountId = '143' AND
DateUploaded > '2017-10-11 21:42:32' AND
accounts_AccountId = '143'
Should I bother cleaning up this query before running it, or will MariaDB clean it up for me? (I.e., will this query run any slower, or could it produce erroneous results, if I don't remove the identical duplicate conditions beforehand?)
If your question is "Should I bother cleaning it up?", then yes, you should clean up the code that produces this, because the fact that it can include the same clause multiple times suggests the database layer is not abstracted to a particularly modern level. The database layer should be able to be rewritten to use a different database provider without having to change the code that depends upon it. That does not appear to be the case here.
If your question is "Does adding the same restriction twice slow the query?" then the answer is no, not significantly.
You can answer the question for yourself: run EXPLAIN SELECT ... on both queries. If the output is the same, then the duplicate was cleaned up.
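For example (a sketch using the table and conditions from the question):

EXPLAIN SELECT * FROM UploadedImages WHERE
accounts_AccountId = '143' AND
DateUploaded > '2017-10-11 21:42:32' AND
accounts_AccountId = '143';

EXPLAIN SELECT * FROM UploadedImages WHERE
accounts_AccountId = '143' AND
DateUploaded > '2017-10-11 21:42:32';

If both show the same plan (same key, same rows estimate), the duplicate condition costs nothing beyond a trivially larger parse.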
I have a set of quite complex SELECT queries which use a lot of disk space (I see this from df -h while running). Is there a way to estimate the temporary disk space required for a query before starting it?
You can use the EXPLAIN keyword to see how your joins will affect the number of rows that will be joined together. It will also help you add proper keys if they are not already present. EXPLAIN will tell you when it thinks it will need to use temp tables (disk space). Based on the size of the rows being joined, you can then roughly estimate your disk space needs.
See the docs on explain here:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
Basically, though, just prepend EXPLAIN to your SELECT query to get the info output. I believe you can also do this programmatically if needed and use the results in your actual code, say, for instance, if you needed to calculate (estimate) a large query's run time and display it to the user before proceeding.
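For instance, a GROUP BY that can't be satisfied from an index is typically flagged in the Extra column (hypothetical table, for illustration only):

EXPLAIN SELECT artist, COUNT(*)
FROM tracks
GROUP BY artist
ORDER BY COUNT(*) DESC;
-- Extra: "Using temporary; Using filesort"

Whether that temporary table stays in memory or spills to disk depends on its size relative to tmp_table_size and max_heap_table_size, so combine the flag with the row estimates to get a rough idea of the disk footprint.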
Update:
I wrote a working script that finishes this job in a reasonable length of time, and seems to be quite reliable. It's coded entirely in PHP and is built around the array_diff() idea suggested by saccharine (so, thanks saccharine!).
You can access the source code here: http://pastebin.com/ddeiiEET
I have a MySQL database that is an index of mp3 files in a certain directory, together with their attributes (ie. title/artist/album).
New files are often being added to the music directory. At the moment it contains about 25,000 MP3 files, but I need to create a cron job that goes through it each day or so, adding any files that it doesn't find in the database.
The problem is that I don't know what is the best / least taxing way of doing this. I'm assuming a MySQL query would have to be run for each file on each cron run (to check if it's already indexed), so the script would unavoidably take a little while to run (which is okay; it's an automated process). However, because of this, my usual language of choice (PHP) would probably not suffice, as it is not designed to run long-running scripts like this (or is it...?).
It would obviously be nice, but I'm not fussed about deleting index entries for deleted files (if files actually get deleted, it's always manual cleaning up, and I don't mind just going into the database by hand to fix the index).
By the way, it would be recursive; the files are mostly situated in an Artist/Album/Title.mp3 structure, however they aren't religiously ordered like this and the script would certainly have to be able to fetch ID3 tags for new files. In fact, ideally, I would like the script to fetch ID3 tags for each file on every run, and either add a new row to the database or update the existing one if it had changed.
Anyway, I'm starting from the ground up with this, so the most basic advice first I guess (such as which programming language to use - I'm willing to learn a new one if necessary). Thanks a lot!
First, a dumb question: would it not be possible to simply order the files by date added and only iterate through the files added in the last day? I'm not very familiar with working with files, but it seems like it should be possible.
If all you want to do is improve the speed of your current code, I would recommend that you check that your data is properly indexed. It makes queries a lot faster if you search through a table's index. If you're searching through columns that aren't the key, you might want to change your setup. You should also avoid using "SELECT *" and instead use "SELECT COUNT(*)", as MySQL will then return an integer instead of full rows.
You can also do everything in a few MySQL queries, but that will increase the complexity of your PHP code. Call the array with information about all the files $files. Select the data from the db where the files in the db match a file in $files. Something like this:
"SELECT id FROM MUSIC WHERE id IN ($files)"
Read the returned array and label it $db_files. Then find all files in the $files array that don't appear in the $db_files array using array_diff(). Label the missing files $missing_files, and insert the files in $missing_files into the db.
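A rough PHP sketch of that flow (the Music table, the path column, and the connection details are just illustrative; $files is assumed to hold the paths found by your directory scan):

<?php
// $files: paths found on disk by the recursive scan
$pdo = new PDO('mysql:host=localhost;dbname=music', 'user', 'pass');

// Fetch the paths that are already indexed, restricted to the ones just scanned
$placeholders = implode(',', array_fill(0, count($files), '?'));
$stmt = $pdo->prepare("SELECT path FROM Music WHERE path IN ($placeholders)");
$stmt->execute($files);
$db_files = $stmt->fetchAll(PDO::FETCH_COLUMN);

// Anything on disk but not in the DB still needs to be inserted
$missing_files = array_diff($files, $db_files);

$insert = $pdo->prepare("INSERT INTO Music (path) VALUES (?)");
foreach ($missing_files as $path) {
    $insert->execute([$path]); // read the ID3 tags here too before inserting
}

With 25,000 files you may want to chunk the IN () list into batches of a few thousand, but the idea is the same.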
What kind of engine are you using? If you're using MyISAM, the whole table will be locked while you update it. Still, 25k rows is not that much, so it should take at most a few minutes to update. If it is InnoDB, just update it: locking is row-level, so you should still be able to use the table while updating it.
By the way, if you're not using any fulltext search on that table, I believe you should convert it to InnoDB, as you can then use foreign keys, which would help you a lot when joining tables. Also, it scales better, AFAIK.
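The conversion itself is a one-liner (table name assumed; the table gets rebuilt, so take a backup first):

ALTER TABLE Music ENGINE=InnoDB;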
I'm looking at having someone do some optimization on a database. If I gave them a similar version of the db with different data, could they create a script file to run all the optimizations on my database (i.e. create indexes, etc.) without them ever seeing or touching the actual database? I'm looking at MySQL but would be open to other DBs if necessary. Thanks for any suggestions.
EDIT:
What if it were an identical copy with transformed data? Along with a couple of sample queries that approximated what the db was used for (i.e. OLAP vs. OLTP)? Would a script be able to contain everything, or would they need hands-on access to the actual db?
EDIT 2:
Could I create a copy of the db, transform the data to make it unrecognizable, create a backup file of the db, give it to the vendor, and have them give me a script file to run on my db?
Why are you concerned that they should not access the database? You will get better optimization if they have the actual data, as they can consider table sizes, which queries run the slowest, whether to denormalise where necessary, putting small tables completely in memory, and so on.
If it is an issue of confidentiality, you can always make the data anonymous by replacing the names.
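Something as crude as overwriting the identifying columns on the copy before handing it over is often enough (hypothetical table and columns):

UPDATE customers
SET name  = CONCAT('Customer ', id),
    email = CONCAT('customer', id, '@example.com');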
If it's just adding indices, then yes. However, there are a number of things to consider when "optimizing". Which are the slowest queries in your database? How large are certain tables? How can things be changed/migrated to make those queries run faster? It could be harder to see this with sparse sample data. You might also include a query log so that this person could see how you're using the tables, what you're trying to get out of them, and how long those operations take.
I am using Joomla 1.5 and VirtueMart 1.1.3.
There is an issue where tmp files that are 1.6 GB are created every time a certain query is executed. Is this normal? I think VirtueMart is using a huge join statement to pull the whole products table and several other tables. I found the file that builds the query, but I don't know where to begin to optimize this. Even if I did, VirtueMart seems to use this one file to build all SQL statements, so I could end up breaking something.
You could look at the MySQL slow query log (and/or enable it) to see the particular query taking time and space. With that in hand, you can use MySQL's EXPLAIN functionality to see why the query is slow.
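Enabling it on the fly takes a couple of statements (assuming a reasonably recent MySQL and the SUPER privilege; the values are just examples):

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;   -- log anything slower than one second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';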
If you're lucky, the VirtueMart developers simply haven't added valid indexes to their tables, which causes MySQL to have to do things the slow way (i.e. filesort, etc.). If you're unlucky, changing the schema won't help and you'll have to take this up with the VirtueMart developers, or fix it yourself.
In any case, if you find a solution, you probably should let the VirtueMart team know.
Best of luck!