My mysql instance has 1700+ tables named "index_*". At 15MB each, this adds up to 25+ gigs.
How can I clean these up? Is it as easy as dropping these tables, or is there some configuration in TikiWiki that cleans up the database with regard to index tables?
Wow, 1700+ of them, never seen that before! Which version are you running? You probably need to upgrade as that sounds like a bug.
Having said that, the good news is that you can safely (but carefully) drop them and then rebuild the search index from the search admin panel or from the command line using console.php; Tiki will make a new one (or two).
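Something like this should do it from the shell (the schema name "tikidb" below is just a placeholder, and the exact console command can vary between Tiki versions):

    # generate DROP statements for the stray search tables ("tikidb" is a placeholder)
    mysql -N -e "SELECT CONCAT('DROP TABLE \`', table_name, '\`;')
                 FROM information_schema.tables
                 WHERE table_schema = 'tikidb'
                   AND table_name LIKE 'index\_%'" > drop_index_tables.sql

    # check the generated file looks sane, then run it
    mysql tikidb < drop_index_tables.sql

    # rebuild the unified search index (run from the Tiki root)
    php console.php index:rebuild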
I guess that 25 GB is too much to back up, but I'd suggest taking a backup of all the other tables if you can, just in case.
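For example, something along these lines would dump everything except the index_* tables (again, "tikidb" is a placeholder):

    # list the non-index tables and dump just those
    TABLES=$(mysql -N -e "SELECT table_name FROM information_schema.tables
                          WHERE table_schema = 'tikidb'
                            AND table_name NOT LIKE 'index\_%'")
    # $TABLES is deliberately left unquoted so each table name becomes its own argument
    mysqldump tikidb $TABLES | gzip > tikidb_backup.sql.gz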
The index_* tables are the storage for the unified search's MySQL engine, and it's usual to have only a couple of them, maybe half a dozen or so, so something sounds like it's going badly wrong. Maybe you have a cron job running a regular rebuild? (It would have to be running every hour or so to pile up that many; usually once a day is plenty.)
Good luck!
I'm studying up on the future of the database I maintain. Right now we have one database server running MySQL with InnoDB and MyISAM tables. I'm watching the metrics closely and I can see that this will not be sustainable forever. Where does one go next? I have reviewed solutions like Cassandra, but I want to stick with an SQL approach, so I'm not sure about that. I have also reviewed NDB Cluster and federated database solutions, but I've noticed no one has anything good to say about those. Basically, I'm looking for advice on intermediate solutions. We do not yet need a vast multi-node array spanning tens of DB servers, but one server is about to reach its limit. I don't want to just throw another server on the pile without making sure that the DB architecture at hand benefits from the extra power. What do you suggest for when it's time to move beyond a single server, and how do I manage that transition? Thank you to anyone who can help.
Edit to better explain: At present, we have about a hundred tables. We run many join operations to gather the data the end user needs to see, such that most of our queries join at least two tables to complete any operation. The data set is not too big yet, only a few hundred megs, but the data is accessed in such a way that each table has a few writes every day, the heaviest of which has about a thousand writes a day. We probably have a few hundred thousand reads a day too, so reads do outnumber writes about 9 to 1.
First Solutions:
Indices go a LONG way
Use profiling software to find your slow queries and optimize them (see the sketch after these lists)
Depending on your hosting company you can usually update the RAM/CPU of the server
Second Solutions:
Split your reads and your writes across two database servers. (I don't know if you're using PHP or not, but PHP has a plugin that will automatically split them for you without you having to change any of your code: http://php.net/manual/en/mysqlnd-ms.rwsplit.php)
Use software like memcached to cache database information that is frequently queried but not frequently updated
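To illustrate the first two points, here's a rough sketch (the database, table and column names are made up):

    # turn on the slow query log to find the problem queries (the 1-second threshold is just an example)
    mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1;"

    # for a query that shows up there, check the plan and add an index where it does a full scan
    mysql mydb -e "EXPLAIN SELECT * FROM orders WHERE customer_id = 42;"
    mysql mydb -e "CREATE INDEX idx_orders_customer ON orders (customer_id);"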
This should really be a community wiki page, but I have to ask this question and see what I might be missing. I'm a moderator on a site and they are going through a new site transition.
They started data migration yesterday around lunch. It's still going on and they say it's going to take 30 more hours. It's a rather large site (700 million records going from SQL Server to MySQL) but I couldn't fathom why it was taking so long.
I just found out that they're indexing on the fly. Are there benefits to this? Would it not be quicker and probably safer to copy and then index? If anyone has links, I'll most likely choose that as the answer. Thanks.
The typical procedure I know is to copy all the tables with constraints disabled and no indexes, recreate the indexes from scratch afterwards, and then re-enable the constraints. Rebuilding an index from scratch is much cheaper than maintaining it row by row while the data is being loaded.
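As a rough sketch of that ordering (the file, database, table and constraint names here are invented):

    # 1. load the data into tables that have no secondary indexes yet, with checks switched off
    ( echo "SET foreign_key_checks = 0; SET unique_checks = 0;"; cat data_only.sql ) | mysql targetdb

    # 2. build the indexes and constraints in one pass afterwards
    mysql targetdb -e "ALTER TABLE orders
                         ADD INDEX idx_customer (customer_id),
                         ADD CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers (id);"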
A minute of googling brought this up for you, straight from the horse's mouth :)
http://www.mysql.com/why-mysql/white-papers/mysql_microsoftsql2mysql_paper.pdf
see e.g. page 5:
"Also you'll want to take the permissions and index statements from the end of each of these files [the generated MySQL DDL], and put them in new files. If these statements are left when migrating, migrating the data will be significantly slower."
I didn't find a benchmark, but you could produce a very representative one yourself: just migrate, say, 1 million of your own records using both strategies. The results should speak for themselves.
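Something as simple as timing the two runs would do (the database and file names are made up, and the two empty target databases are assumed to exist already):

    # strategy 1: indexes defined up front, then load the sample
    time mysql sample_a < schema_with_indexes.sql
    time mysql sample_a < sample_data.sql

    # strategy 2: load first, add the indexes afterwards
    time mysql sample_b < schema_no_indexes.sql
    time mysql sample_b < sample_data.sql
    time mysql sample_b < add_indexes.sql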
I've been doing a lot of research and reading up on replication, etc., but I'm just not sure what MySQL solution would work.
This is what I'm looking at:
When my MySQL server fails for some reason, or certain queries are taking a really long time to execute and locking some tables, I want the other insert/update/select queries to still run at normal speed without having to wait for locks to be released or for the main database to come back up. I'm thinking there should be a second MySQL server for this to happen, but is what I mentioned even possible, and would it involve a lot of change to my existing programming logic?
When my database is being backed up, I would still like my site to function normally; all inserts/selects/updates should work as usual.
When I need to alter a large table, I wouldn't like it to affect my application; there should be a backup server to work from.
So what do I need to do to get all this done, and would it require changing plenty of existing code to suit the new setup? [My site has a lot of reads and writes.]
There's no easy way. You're asking for a highly-available MySQL-based setup, and that requires a lot of work at the server and client ends.
Some issues, for example:
When I need to alter a large table, I wouldn't like it to affect my application; there should be a backup server to work from.
If you're altering the table, you can't trivially create a copy to work from during the update. What about the changes that are made to your copy while the first update is taking place?
Have a search for "High Availability MySQL". It's mostly a solved problem, but the solution depends heavily on your exact requirements. You cannot just ask for "I want my SQL server to run at full speed always forever no matter what I throw at it".
Not a MySQL-specific answer, but a general one: keep a read-only copy of your DB for the site to render from, synced to the master DB regularly. That way your site keeps working even if the master DB is under load or locked by inserts/deletes. For efficiency, keep this copy as denormalized as you can.
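For the MySQL case, pointing a read-only replica at the master looks roughly like this (the hostname, user, password and log coordinates are all placeholders):

    # on the replica, point it at the master and start replicating
    mysql -e "CHANGE MASTER TO
                MASTER_HOST = 'db-master.example.com',
                MASTER_USER = 'repl',
                MASTER_PASSWORD = 'secret',
                MASTER_LOG_FILE = 'mysql-bin.000001',
                MASTER_LOG_POS = 4;
              START SLAVE;"

    # keep the replica read-only so the application can't accidentally write to it
    mysql -e "SET GLOBAL read_only = 1;"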
What's the fastest way to export/import a mysql database using innodb tables?
I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysqldump -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".
From what I can tell from reading "mysqldump --help", mysqldump runs with --opt by default, which looks like it turns on a bunch of the things I can think of that would make imports faster, such as turning off indexes during the load and importing each table with big multi-row insert statements.
Are there better ways to do this, or further optimizations we should be doing?
Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent macbook pro, with lots of ram). Backup time and network transfer time currently aren't big issues.
Update:
To answer some questions posed in the answers:
The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.
We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.
I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize, we want to optimize the import onto the developer workstation.
We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.
It depends on how you define "fastest".
As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.
The pertinent questions are:
How often does your production database schema change?
Note: I'm referring to adding, removing or renaming tables, columns, views and the like, i.e. things that will break actual code.
How often do you need to put production data into a development environment?
In my experience, not very often at all. I've generally found that once a month is more than sufficient.
How long does mysqldump take?
If it's less than 8 hours it can be done overnight as a cron job. Problem solved.
Do you need all the data?
Another way to optimize this is to simply get a relevant subset of the data. Of course, this requires a custom script to be written to fetch a subset of entities and all relevant related entities, but it will yield the quickest end result. The script will also need to be maintained through schema changes, so this is a time-consuming approach that should be used as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and to surface any potential performance problems.
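If you do go the subset route, mysqldump's --where option can get you part of the way before a full custom script is needed (the database name, table names and condition here are made up):

    # pull one customer's rows from the big tables, plus the small lookup tables in full
    mysqldump --where="customer_id = 1234" proddb orders order_items > subset.sql
    mysqldump proddb customers countries currencies >> subset.sql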
Conclusion
Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.
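One dump/import-side tweak that usually helps without abandoning mysqldump (a sketch reusing the backup.gz and dbname from the question; drop the sql_log_bin line if binary logging isn't enabled or you lack the privilege):

    # dump side: --single-transaction avoids locking the InnoDB tables while dumping
    mysqldump --single-transaction --quick dbname | gzip > backup.gz

    # import side: relax checks and skip binary logging for this session only
    ( echo "SET SESSION foreign_key_checks = 0;
            SET SESSION unique_checks = 0;
            SET SESSION sql_log_bin = 0;";
      gunzip -c backup.gz ) | mysql -u root dbname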
Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows for catching up even if you have to shut down the slave. You could also use a parallel MySQL instance on your normal server that replicates the data to a MyISAM table, which supports online backup. MySQL allows for this as long as the tables have the same definition.
Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. I haven't looked at it myself, though, so I won't vouch for its stability or that it's even a workable solution for your problem.
I'm rewriting a PHP+MySQL site that averages 40-50 hits a day using Django.
Is SQLite a suitable database to use here? Are there any advantages/disadvantages between them?
I'm just using the db to store a blog and the users who can edit it. I am using fulltext search for the blog search, but no complex joins anywhere.
40-50 hits per day is very small, and SQLite can be used without any problem.
MySQL might be better once you get more hits, because it handles multiple connections better (locking isn't the same in MySQL and SQLite).
The major problem with SQLite is concurrency. If you expect 40-50 hits a day, that's probably a non-issue. However, if that load increases you should be ready to migrate to a database daemon such as MySQL; it's better to abstract your database-specific code now to make such a switch as painless as possible.
The performance section of the SQLite wiki might be of use to you.
Since you're already using an adequate database, I don't see a reason to migrate to a smaller one.
While SQLite might be perfectly adequate too, changing from a more capable platform to a less capable one doesn't seem like the best choice :)
SQLite will work just fine for you. It sounds as though you're largely using the database as read-only (with occasional writes to update the content). SQLite excels at this kind of access pattern. The only place where SQLite chokes is when you have a lot of writes to a database, because once a process attempts to write, the file is locked until the write is complete. Also, if you do lots of writes (like updating rows in a loop) you should look into putting all those writes into a transaction: while the file is still locked once the transaction hits a write query, the updates themselves take much less time because they're written to the file all at once rather than individually.
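To make the transaction point concrete (the database file, table and column names here are invented):

    # many updates wrapped in one transaction: the file is locked and written once, not per statement
    sqlite3 blog.db "BEGIN;
                     UPDATE posts SET view_count = view_count + 1 WHERE id = 1;
                     UPDATE posts SET view_count = view_count + 1 WHERE id = 2;
                     COMMIT;"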
SQLite would be fine for this level of traffic. It actually performs quite well; the only thing it lacks is caching of data and queries, because it needs to be spun up every time your page is accessed. That said, it is still very quick, and it shouldn't be too hard to migrate to MySQL later if need be.