I often use bash scripts to add large amounts of data to my localhost site databases. Once I see that the new data is working properly in my local website, I export the database from phpMyAdmin and edit the SQL file; granted, with vim it is relatively easy to change all the INSERTs to INSERT IGNORE and so on, to prepare it to be accepted by phpMyAdmin in cPanel and finally add the data to my website. This becomes cumbersome as the database gets bigger and bigger.
I am new to this and I don't know how to do this operation in a professional/optimal way. Is my entire process wrong? How do you do it?
Thank you for your answers.
Ah, I think I understand better. I can't answer for any kind of specific enterprise environment, but I'm sure there are many different systems cobbled together with all sorts of creative baler twine and you could get a wide variety of answers to this question.
I'm actually working on a project right now where we're trying to keep data updated between two systems. The incoming data gets imported to a MySQL database and every now and then, new data is exported to a .sql file. Each row has an auto incrementing primary key "id", so we very simply keep track of the last exported ID and start the export from there (using mysqldump and the --where argument). This is quite simple and doesn't feel like an "enterprise-ready" solution, but it's fine for our needs. That avoids the problem of duplicated inserts.
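As a rough illustration, here is a minimal sketch of that kind of incremental export, assuming credentials in ~/.my.cnf, a hypothetical table orders with an auto-incrementing id column, and a small state file holding the last exported value (all names are placeholders):

#!/bin/bash
# Incremental export sketch: dump only rows added since the last run.
LAST_ID=$(cat last_id.txt)                  # highest id exported last time
NEW_MAX=$(mysql -N -e "SELECT COALESCE(MAX(id), 0) FROM mydb.orders")
mysqldump --no-create-info \
  --where="id > ${LAST_ID} AND id <= ${NEW_MAX}" mydb orders > export_${NEW_MAX}.sql
echo "${NEW_MAX}" > last_id.txt             # remember the cutoff for next time

Recording the cutoff before the dump (and bounding the dump by it) avoids missing or double-exporting rows that arrive while the dump is running.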
Another solution would be to export the entire database from your development system, then through some series of actions import it to the production server while deleting the old database entirely. The feasibility of this depends greatly on the size of your data and how much downtime you're willing to accept. An efficient and robust implementation of this would import into a staging database (and verify there were no import problems) before moving the tables to the "correct" database.
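A hedged sketch of that staging-then-swap idea, with hypothetical database and table names (prod, staging, customer); RENAME TABLE works across databases on the same server, so the final switch-over is quick:

#!/bin/bash
# Load the full dump into a staging database first.
mysql -e "CREATE DATABASE IF NOT EXISTS staging"
mysql staging < full_dump.sql
# ...run whatever verification queries you need against staging here...
# Then swap the table into place; repeat for each table.
mysql -e "RENAME TABLE prod.customer TO prod.customer_old,
                       staging.customer TO prod.customer"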
If you are simply referring to schema changes or very small amounts of data, then version control is probably your best bet. This is what I do for some of my database schemas; basically you start out with the base schema, then any change gets written as a script that can be run incrementally. So for instance, in an inventory system I might have originally started with a customer table, with fields for ID and name. Later a marketing department was added, and they want me to store email addresses. 2-email.sql would contain just this line: ALTER TABLE `customer` ADD `email` VARCHAR(255) NOT NULL AFTER `name`;. Still later, if I decide to handle shipping, I'll need to add mailing addresses, so 3-address.sql adds that to the database. Then on the other end, I just run those through a script (bonus points are awarded for using MySQL logic such as "IF NOT EXISTS" so the script can run as many times as needed without error).
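On the "IF NOT EXISTS" point: MySQL itself doesn't accept IF NOT EXISTS on ADD COLUMN (MariaDB does), so one hedged way to make a script like 2-email.sql safely re-runnable is to check information_schema before applying it, for example from the shell script that drives the migrations (inventory/customer/email are the names from the example above):

#!/bin/bash
# Apply the 2-email.sql change only if the column isn't there yet.
EXISTS=$(mysql -N -e "SELECT COUNT(*) FROM information_schema.COLUMNS
                      WHERE TABLE_SCHEMA = 'inventory'
                        AND TABLE_NAME   = 'customer'
                        AND COLUMN_NAME  = 'email'")
if [ "${EXISTS}" -eq 0 ]; then
  mysql inventory -e "ALTER TABLE customer ADD email VARCHAR(255) NOT NULL AFTER name"
fi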
Finally, you might benefit from setting up a replication system. Your staging database would automatically send all changes to the production database. Depending on your development process, this can be quite useful or might just get in the way.
I am looking into migrating my MySQL DB to Azure Database for MySQL https://azure.microsoft.com/en-us/services/mysql/. It currently resides on a server hosted by another company. The DB is about 100 GB. (It worries me that Azure uses the term "relatively large" for 1GB.)
Is there a way to migrate the DB with no downtime, or very little (a few hours, max)? I obviously can't do a dump and load, as the downtime could be days. Their documentation seems to be aimed at syncing with a MySQL server that is already on a Microsoft server.
Is there a way to export the data out of MS Azure if I later want to use something else, again without significant downtime?
Another approach: use Azure Data Factory to copy the data from your MySQL source to your Azure DB. Set up a sync procedure that updates your Azure database with new rows. Sync, take the MySQL DB offline, sync once more, and switch to the Azure DB.
See the Microsoft online help.
Don't underestimate the complexity of this migration.
With 100GB, it's a good guess that most rows in your tables don't get UPDATEd or DELETEd.
For my suggestion here to work, you will need a way to
SELECT * FROM table WHERE (the rows are new or updated since a certain date)
Some INSERT-only tables will have auto-incrementing ID values. In this case you can figure out the ID cutoff value between old and new. Other tables may be UPDATEd. Unless those tables have timestamps saying when they were updated, you'll have a challenge figuring that out; you need to understand your data to do it. It's OK if your WHERE (new or updated) operation picks up some extra rows that are older. It's NOT OK if it misses INSERTed or UPDATEd rows.
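As a sketch, the two WHERE patterns described above might look like this (table names, the ID cutoff, and the timestamp are placeholders):

# INSERT-only table: use the auto-increment cutoff you recorded earlier.
mysql -e "SELECT * FROM mydb.orders    WHERE id > 123456789"
# UPDATEd table with a modification timestamp: overlap the cutoff a little,
# because extra old rows are OK, but missed rows are not.
mysql -e "SELECT * FROM mydb.customers WHERE updated_at >= '2017-06-01 00:00:00'"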
Once you know how to do this for each large table, you can start migrating.
Mass Migration: Keeping your old system online and active, you can use mysqldump to migrate your data to the new server. You can take as long as you require to do it. Read this for some suggestions: getting Lost connection to mysql when using mysqldump even with max_allowed_packet parameter
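A minimal sketch of that bulk dump and load, assuming the old server is reachable as old-host and the new Azure server as new-host (both are placeholders); --single-transaction gives a consistent InnoDB snapshot without locking, and max-allowed-packet relates to the linked question:

#!/bin/bash
# One-time bulk copy while the old system stays online.
mysqldump --host=old-host --single-transaction --quick \
  --max-allowed-packet=512M mydb > mydb_full.sql
# Keep this file around so a failed load doesn't force a re-extract.
mysql --host=new-host -e "CREATE DATABASE IF NOT EXISTS mydb"
mysql --host=new-host mydb < mydb_full.sql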
Then, you'll have a stale copy of the data on the new server. Make sure the indexes are correctly built. You may want to use OPTIMIZE TABLE on the newly loaded tables.
Update Migration: You can then use your WHERE (the rows are new or updated) queries to migrate the rows that have changed since you migrated the whole table. Again, you can take as long as you want to do this, keeping your old system online. It should take much less time than your first migration, because it will handle far fewer rows.
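Here's a hedged sketch of that update pass, again with placeholder hosts, names, cutoffs, and dates; mysqldump's --insert-ignore (or --replace) keeps rows that already made it across from breaking the load:

#!/bin/bash
# Second pass: only rows that are new or changed since the bulk copy.
mysqldump --host=old-host --single-transaction --no-create-info --insert-ignore \
  --where="id > 123456789" mydb orders > orders_delta.sql
mysqldump --host=old-host --single-transaction --no-create-info --replace \
  --where="updated_at >= '2017-06-01'" mydb customers > customers_delta.sql
mysql --host=new-host mydb < orders_delta.sql
mysql --host=new-host mydb < customers_delta.sql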
Final Migration, offline: Finally, you can take your system offline and migrate the remaining rows, the ones that changed since your last migration. And migrate your small tables in their entirety, again. Then start your new system.
Yeah but, you say, how will I know I did it right?
For best results, you should script your migration steps, and use the scripts. That way your final migration step will go quickly.
You could rehearse this process on a local server on your premises. While 100GiB is big for a database, it's not an outrageous amount of disk space on a desktop or server-room machine.
Save the very large extracted files from your mass migration step so you can re-use them when you flub your first attempts to load them. That way you'll save the repeated extraction load on your old system.
You should stand up a staging copy of your migrated database (at your new cloud provider) and test it with a staging copy of your application. You may be able to do this with a small subset of your rows. But do test your final migration step with this copy to make sure it works.
Be prepared for a fast rollback to the old system if the new one goes wrong.
AND, maybe this is an opportunity to purge out some old data before you migrate. This kind of migration is difficult enough that you could make a business case for extracting and then deleting old rows from your old server, before you start migrating.
I have an application that runs on a MySQL database; the application is somewhat resource-intensive on the DB.
My client wants to connect QlikView to this DB for reporting. I was wondering if someone could point me to a white paper or URL regarding the best way to do this without causing locks etc. on my DB.
I have searched Google to no avail.
QlikView is an in-memory tool working with preloaded data, so your client only has to pull data during periodic reloads, not all the time.
The best approach is for your client to set up a reload once per night and make it incremental. If your tables only ever get new records, then each night load only the records with a primary key greater than the last one loaded.
If your tables have records that get modified, you need to add a last_modified_time field in MySQL, and perhaps also put an index on that field:
last_modified_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
If rows get deleted, it is best to mark them with deleted = 1 in MySQL instead; otherwise your client will need to reload everything from those tables just to find out which rows were deleted.
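A minimal sketch of those two bookkeeping columns and the kind of incremental pull they enable, using a hypothetical table sales and a placeholder reload timestamp (QlikView would issue the SELECT itself during its reload; the mysql client here is only to show the SQL):

#!/bin/bash
# Add the bookkeeping columns once: a change timestamp and a soft-delete flag.
mysql mydb -e "ALTER TABLE sales
  ADD COLUMN last_modified_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ADD COLUMN deleted TINYINT(1) NOT NULL DEFAULT 0,
  ADD INDEX idx_last_modified (last_modified_time)"
# Nightly incremental pull: only rows touched since the last reload.
mysql mydb -e "SELECT * FROM sales WHERE last_modified_time > '2017-06-01 00:00:00'"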
Additionally, to save resources, your client should load the data in a really simple per-table style, without JOINs:
SELECT [fields] FROM TABLE WHERE `id` > $(vLastId);
QlikView is really good and fast at data modelling and joins, so your client can build the whole data model inside QlikView.
Reporting can indeed cause problems on a busy transactional database.
One approach you might want to examine is to have a replica (slave) of your database. MySQL supports this very well, and your replica data can be as up to date as you require. You could then attach any reporting system to your replica to run heavy reports that won't affect your main database. This also gives you a second copy of your data, and the replica can further be used to create offline backups, again without affecting your main database.
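As a rough, hedged sketch of the classic binlog-based setup: it assumes binary logging is already enabled on the primary, a replication user exists, and the replica has been seeded with a consistent snapshot; the host, user, password, and log coordinates below are placeholders you'd take from your own SHOW MASTER STATUS output:

#!/bin/bash
# On the replica, after loading an initial snapshot of the primary's data:
mysql -e "CHANGE MASTER TO
            MASTER_HOST='primary-db',
            MASTER_USER='repl',
            MASTER_PASSWORD='replica-password',
            MASTER_LOG_FILE='mysql-bin.000042',
            MASTER_LOG_POS=4;
          START SLAVE;"
# Point the reporting tool at this replica rather than the primary.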
There's lots of information on the setup of MySQL replicas so that's not too hard.
I hope that helps.
We have a production MySQL database with private user data inside (password hashes, IPs, emails, etc.).
When a developer runs a build job in Jenkins on his developer VM, we want to include a copy of the live database so that he gets an environment very similar to our production one. But we have to clean up the production database before it is copied to the dev server, for two reasons:
Developers shouldn't get a copy of all our user data like hashed passwords or emails
The database is big, so we want to delete some of the contents so that the dev has a few real data sets for testing, but not > 100k; that would have no benefit and would only increase the time the dump takes
I thought about this and tried a few things, but I have found no method that is fast and does the job.
My first idea was to make a dump of all the data with mysqldump, import it on the dev machine, and run some MySQL queries that set placeholders instead of the private data:
UPDATE user_data SET email = "dev@example.com" [...]
On the one hand this is slow, because it has to copy the huge database AND run the queries. On the other hand, I don't like that all of our user data sits on the dev machine, even for a short period of time. I would prefer the data to be cleaned first and then exported to the dev machine. This would be possible by copying the database into a temporary one on the production system, then cleaning the data, exporting it, and deleting the copied database on the production system. But this also creates a lot of overhead.
What is a good and fast method for doing this?
I thought about something like mysqldump with on-the-fly replacement of the data, so that no overhead is created, but I can't find any tool that can do this.
Do you have enough room for two databases on the production server? If so, make a developer db on the same server (or any server, really) which is a nightly dump of production, minus all the sensitive information and bulk.
Developers get access to only this "developer" database, from the production server, which you know has been pruned of anything sensitive. As a bonus, they could connect directly to it, and possibly never need to download it.
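A hedged sketch of that nightly job, run on the production server itself; the database, table, and column names are hypothetical, and the dummy password hash would need to match whatever format your application actually expects. The point is that the data is pruned and masked before any developer (or any dump) ever sees it:

#!/bin/bash
# Rebuild the developer database from scratch each night, on the production server.
mysql -e "DROP DATABASE IF EXISTS devdb; CREATE DATABASE devdb"
mysql -e "CREATE TABLE devdb.user_data LIKE proddb.user_data"
# Copy only a slice of the big table, masking sensitive columns on the way in.
mysql -e "INSERT INTO devdb.user_data (id, name, email, password_hash, last_ip)
          SELECT id, name,
                 CONCAT('dev+', id, '@example.com'),
                 SHA2('devpassword', 256),
                 '127.0.0.1'
          FROM proddb.user_data
          ORDER BY id DESC
          LIMIT 10000"
# Grant developers access to devdb only, never to proddb.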
I'm researching something that I'd like to call replication, but there is probably some other technical word for it - since as far as I know "replication" is a complete replication of structure and its data to slaves. I only want the structure replication. My terminology is probably wrong which is why I can't seem to find answers on my own.
Is it possible to set up a mysql environment that replicates a master structure to multiple local databases when a change, addition or drop has been made? I'm looking for a solution where each user gets its own database instance with their own unique data but with the same structure of tables. When an update is being made to the master structure, the same procedure should be replicated by each user database.
E.g. a column is being added to master.table1 that is replicated by user1.table1 and user2.table1.
My first idea was to write an update procedure in PHP, but it feels like this would be a fairly fundamental function built into the database. My reasoning is that index lookups would be much faster with less data (roughly the total data divided by the number of users) and probably more secure (no unfortunate leaks, if any).
I solved this problem with a simple set of SQL scripts, one for every change in the database, named year-month-day-description.sql, which I run in lexicographical order (that's why the name begins with the date).
Of course you do not want to run them all every time. So, to know which scripts still need to be executed, each script has a simple INSERT at its end, which records the script's filename in a table in the database. The updater PHP script then simply builds the list of scripts, removes the ones already recorded in that table, and runs the rest.
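The updater described here is PHP, but the same bookkeeping fits in a few lines of shell as a sketch, assuming a migrations/ directory of dated .sql files and a hypothetical schema_log table that each script inserts its own filename into, as described above:

#!/bin/bash
# Run only the dated scripts that haven't been recorded in schema_log yet.
for f in migrations/*.sql; do          # glob order is lexicographical
  name=$(basename "$f")
  applied=$(mysql -N mydb -e "SELECT COUNT(*) FROM schema_log WHERE filename = '${name}'")
  if [ "${applied}" -eq 0 ]; then
    mysql mydb < "$f"                  # the script's final INSERT records ${name}
  fi
done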
What's good about this solution is that you can include data transformations too. It can also be fully automatic, and as long as the scripts are OK, nothing bad will happen.
You will probably need to look into incorporating the use of database "migrations", something popularized by the Ruby on Rails framework. This Google search for PHP database migrations might be a good starting point for you.
The concept is that as you develop your application and make schema changes, you create SQL migration scripts to roll the schema changes forward or back. This makes it really easy to "migrate" your database schema to work with a particular code version (for example, if you have branched code being worked on in multiple environments that each need a different version of the database).
That isn't going to make updates automatically like you suggest, but it is certainly a step in the right direction. There are also tools like Toad for MySQL and Navicat which have some level of support for schema synchronization. But again, these would be manual comparisons/syncs.
I'm working on a group project where we all have a mysql database working on a local machine. The table mainly has filenames and stats used for image processing. We all will run some processing, which updates the database locally with results.
I want to know what the best way is to update everyone else's database, once someone has changed theirs.
My idea is to perform a mysqldump after each processing run, and let that file be tracked by git (which we use religiously). I've written a bunch of Python utils for the database, and it would be simple enough to read this dump into the database when we detect that the DB is behind. I don't really want to do this, though, lest it clog up our git repo with unnecessary 10-50 MB files with every commit.
Does anyone know a better way to do this?
I'll also note that we are aerospace students. I have some DB experience, but it only comes out of need. We're busy, and I'm not looking to become an IT networking guru. I just want to keep it hands-off for them, since they are DB noobs and get the glazed-over look of fear whenever I tell them to do anything with the database. I have kept it hands-off for them thus far.
You might want to consider following the Rails-style database migration concept, whereby as you are developing you provide roll-forward and roll-back SQL statements that work as patches, allowing you to roll your database to any particular revision state that is required.
Of course, this is typically meant for dealing with schema changes only (i.e. you don't worry about revisioning data that might be dynamically populated into tables.). For configuration tables or similar tables that are basically static in content, you can certainly add migrations as well.
A Google search for "rails migrations for python" turned up a number of results, including the following tool:
http://pypi.python.org/pypi/simple-db-migrate
I would suggest creating a DEV MySQL server on any shared hosting (no DB experience is required).
Allow remote access to this server (again, no experience required; everything can be done through the control panel).
Then you and your group of developers will have access to the database at any time, from any place and from any device (as long as you have an internet connection).
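For example, granting the whole group remote access to a project database on that server might be nothing more than the following (the user name, password, and database name are placeholders; many shared hosts expose this through the control panel instead):

#!/bin/bash
# Create one shared account that may connect from any host ('%').
mysql -e "CREATE USER 'project'@'%' IDENTIFIED BY 'choose-a-password';
          GRANT ALL PRIVILEGES ON imageproc.* TO 'project'@'%';
          FLUSH PRIVILEGES;"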