I am looking to move some MySQL databases to the cloud in Amazon Redshift. Currently I am creating a Python script to convert the tables to CSVs, encrypt them, put them in S3, then COPY the data into Redshift. However, the way it is set up I would have to copy the data one table at a time. I have read that you can split your data into multiple files and upload them in parallel, however I believe this is still only for loading data into one table. Is there a way to use COPY on multiple tables at once? Having to copy data over from each table individually seems very inefficient.
All of your statements are correct.
The COPY command can load from multiple files in parallel (in fact, that is recommended because it can then spread the load job across multiple nodes), but it only loads one table per COPY command.
You could connect to Redshift via multiple sessions and run a COPY command in each session to load multiple tables simultaneously (but be careful of the impact on production users).
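The multi-session idea can be sketched in Python roughly as follows. This is only a sketch under assumptions: the S3 layout (one prefix per table), the IAM role, and the `execute` callable are all hypothetical. `execute` stands for whatever function opens its own Redshift session and runs one statement; it is passed in so the sketch does not assume a particular driver. Options for encrypted files are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def build_copy_statement(table, s3_prefix, iam_role):
    """Build one COPY statement; Redshift loads a single table per COPY."""
    # Assumed layout: every file for a table sits under <s3_prefix>/<table>/.
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}/{table}/' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV"
    )


def copy_tables_in_parallel(tables, s3_prefix, iam_role, execute, max_sessions=4):
    """Run one COPY per table, each through its own call to `execute`.

    `execute` is a caller-supplied (hypothetical) function that opens a
    separate session, runs the statement, and closes the session.
    """
    statements = [build_copy_statement(t, s3_prefix, iam_role) for t in tables]
    # Cap the number of concurrent sessions to limit the impact on other users.
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        # list() waits for all workers and re-raises any exception they hit.
        list(pool.map(execute, statements))
    return statements
```

Keeping `max_sessions` small is one way to respect the warning about production users.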
If you want to migrate data from an on-premises database to Amazon Redshift, consider using:
AWS Schema Conversion Tool
AWS Database Migration Service
The Database Migration Service can even perform on-going updates of Redshift whenever data is updated in the source database.
How should you go about restoring (and backing) up a MySQL database "safely"? By "safely" I mean: the restore should create/overwrite a desired database, but not risk altering anything outside that database.
I have already read https://dev.mysql.com/doc/refman/5.7/en/backup-types.html.
I have external users. They & I may want to exchange backups for restore. We do not have a commercial MySQL Enterprise Backup, and are not looking for a third-party commercial offering.
In Microsoft SQL Server there are BACKUP and RESTORE commands. BACKUP creates a file containing just the database you want; both its rows and all its schema/structure are included. RESTORE accepts such a file, and creates or overwrites its structure. The user can restore to a same-named database, or specify a different database name. This kind of behaviour is just what I am looking for.
In MySQL I have come across 3 possibilities:
Most people seem to use mysqldump to create a "dump file", and mysql to read that back in. The dump file contains a list of arbitrary MySQL statements, which are simply executed by mysql. This is quite unacceptable: the file could contain any SQL statements. (Limiting access rights of restoring user to try to ensure it cannot do anything "naughty" is not acceptable.) There is also the issue that the user may have created the dump file with the "Include CREATE Schema" option (MySQL Workbench), which hard-codes the original database name for recreation. This "dump" approach is totally unsuitable to me, and I find it surprising that anyone would use it in a production environment.
I have come across MySQL's SELECT ... INTO OUTFILE and LOAD DATA INFILE statements. At least they do not contain SQL code to execute. However, they look like a lot of work, deal with one table at a time rather than the whole database, and don't deal with the structure of the tables; you have to know that yourself for restoring. There is a mysqlimport helper command-line utility, but I don't see anything for the export side, and I don't see it for restoring a complete database.
The last is to use what MySQL refers to as "Physical (Raw)" rather than "Logical" Backups. This works on the database directories and files themselves. It is the equivalent of SQL Server's detach/attach method for backing up/restoring. But, as per https://dev.mysql.com/doc/refman/5.7/en/backup-types.html, it has all sorts of caveats, e.g. "Backups are portable only to other machines that have identical or similar hardware characteristics." (I have no idea about this: e.g. some users are on Linux and others on Windows, and I know nothing about their architecture) and "Backups can be performed while the MySQL server is not running. If the server is running, it is necessary to perform appropriate locking so that the server does not change database contents during the backup." (let alone restores).
So can anything satisfy (what I regard as) my modest requirements, as outlined above, for MySQL backup/restore? Am I really the only person who finds the above 3 as the only, yet unacceptable, possible solutions?
1 - mysqldump - I use this quite a bit, usually in environments where I am handling all the details myself. I do have one configuration where I use that to send copies of a development database - to be dumped/restored in its entirety - to other developers. It is probably the fastest solution, has some reasonable configuration options (e.g., to include/exclude specific tables) and generates very functional SQL code (e.g., each INSERT batch is small enough to avoid locking/speed issues). For a "replace entire database" or "replace key tables in a specific database" solution, it works very well. I am not too concerned about the "arbitrary SQL commands" problem - if that is an issue then you likely have other issues with users trying to "do their own thing".
2 - SELECT ... INTO OUTFILE and LOAD DATA INFILE - The problem with these is that if you have any really big tables then the LOAD DATA INFILE statement can cause problems because it is trying to load everything all at once. You also have to add code to create (if needed) or empty the tables before LOAD DATA.
3 - Physical (raw) file transfer. This can work but under limited circumstances. I had one situation with a multi-gigabyte database and decided to compress the raw files, move them to the new machine, uncompress and just tell MySQL "everything is already there". It mostly worked well. But I would not recommend it for any unattended/end-user process due to the MANY possible problems.
What do I recommend?
1 - mysqldump - live with its limitations and risks, set up a script to call mysqldump and compress the file (I am pretty sure there are options in mysqldump to do the compression automatically), include the date in the file name so that there is less confusion as the files are sent around, and make a simple script for users to load the file.
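A script along those lines might look like the sketch below. It assumes `mysqldump` is on the PATH and that credentials come from an option file; the function names and the file-name pattern are made up for illustration. Compression is done in Python here rather than through any mysqldump option.

```python
import datetime
import gzip
import shutil
import subprocess
from pathlib import Path


def dump_filename(database, day=None):
    """Return a name such as 'shop-2024-01-31.sql.gz' with the dump date in it."""
    day = day or datetime.date.today()
    return f"{database}-{day.isoformat()}.sql.gz"


def dump_database(database, out_dir="."):
    """Run mysqldump for one database and gzip its output as it streams."""
    path = Path(out_dir) / dump_filename(database)
    # Without --databases, the dump has no CREATE DATABASE or USE statement,
    # so it can be loaded into a database with a different name.
    proc = subprocess.Popen(["mysqldump", database], stdout=subprocess.PIPE)
    with gzip.open(path, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.wait() != 0:
        raise RuntimeError(f"mysqldump exited with status {proc.returncode}")
    return path
```

The receiving side would decompress the file and feed it to the `mysql` client with the target database named on the command line.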
2 - Write your own program. I have done this a few times. This is more work initially but allows you to control every aspect of the process and transfer a file that only contains data without any actual SQL code. You can control the specific database, tables, etc. One catch is that if you make any changes to the table structure, indexes, etc. you will need to make sure that information is somehow transmitted to the receiving program so that it can change the structures as needed - that is not a problem with mysqldump as it normally replaces the tables, creating the new structures, indexes, etc. This can be written in any language that can connect to MySQL - it does not have to be the same language as your application.
If you're not going to use third party tools (like innobackupex for example) then you're limited to use ... mysqldump, which is in the mysql package.
I can't understand why it is not acceptable to you, or why you don't like SQL commands in those dumps. Best practice, when restoring a single database into a server that already contains other databases, is to use a separate user with rights to write only to the restored database. Then even if someone changes the SQL commands in the dump and tries to write to another database, the restore will not be able to do so.
When doing raw backup (physical copy of database files) you need to have all the instances down, mysql server not running. Similar hardware means you need to have the same directories as the source server (unless you would change my.cnf before starting the server, and putting all the files to right directories).
When coming to MySQL, try not to compare it to SQL Server - it's a totally different approach and philosophy.
But if you do somehow convince yourself to use a third-party tool - I recommend innobackupex from Percona, which is free, by the way.
The export tool that complements mysqlimport is mysqldump --tab. This outputs delimited text files (tab-separated by default), like SELECT ... INTO OUTFILE. It also outputs the table structure in much smaller .sql files. So there are two files for each table.
Once you recreate your tables from the .sql files, you can use mysqlimport to import all the data files. You can even use the mysqlimport --use-threads option to make it load multiple data files in parallel.
This way you have more control over which schema to load the data into, and it should run a lot faster than loading a large SQL dump.
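As a rough sketch, the two commands could be assembled from Python like this. The helper names, database names, and directory are hypothetical; only the `--tab` and `--use-threads` options come from the tools themselves. The sketch just builds the argument lists, which could then be passed to `subprocess.run(cmd, check=True)`.

```python
from pathlib import Path


def export_command(database, out_dir):
    """mysqldump --tab writes <table>.sql (structure) and <table>.txt (data)."""
    # The server writes the .txt files, so out_dir must be writable by it.
    return ["mysqldump", f"--tab={out_dir}", database]


def import_command(target_database, out_dir, threads=4):
    """mysqlimport loads each data file into the table named after the file."""
    # Only the data files are passed; the .sql files must be applied first.
    data_files = sorted(str(p) for p in Path(out_dir).glob("*.txt"))
    return ["mysqlimport", f"--use-threads={threads}", target_database, *data_files]
```

Because the target database is an argument to mysqlimport, the data can go into a database with a different name from the source.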
For a small application, I need to use a flat-file database with relational capabilities (2 or 3 tables).
Couple questions regarding this schema:
Do such databases exist?
Any performance hits with large datasets? (say, 10k-20k entries)
The reason I want to go with a flat-file database is so the whole thing (root directory) can be copied and pasted, without having to worry about exporting the database, installing & configuring a database on another system, etc.
thanks.
Try SQLite.
It is easy to use, portable, needs no configuration, and has great performance (10k-20k entries is nothing).
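To give a feel for it, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and row counts are invented for illustration; the whole database lives in the single file given by `path`, which can simply be copied to another machine.

```python
import sqlite3


def build_demo_db(path, n_orders=20000):
    """Create a two-table relational database in a single file."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    con.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " amount REAL NOT NULL)"
    )
    con.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer {i}") for i in range(1, 101)],
    )
    # Spread the orders evenly over the 100 customers.
    con.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 100 + 1, float(i)) for i in range(n_orders)],
    )
    con.commit()
    return con


def orders_per_customer(con):
    """Join the two tables and count the orders for each customer."""
    return con.execute(
        "SELECT c.name, COUNT(o.id) FROM customers c"
        " JOIN orders o ON o.customer_id = c.id"
        " GROUP BY c.id ORDER BY c.id"
    ).fetchall()
```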
I need to know how data from databases is stored on a filesystem. I am sure, that different databases use different ways of storing data, but I want to know what the general rule is (if there is one), and what can be changed in settings of a particular DB.
How is the whole database stored? In one big file or one file per table?
What if a table is enormous? Would it be split into a few files?
What is the typical size of a file in that case?
The answer to this question is both database dependent and implementation dependent. Here are some examples of how data can be stored:
As a single file per database. (This is the default for SQL Server.)
Using a separate file system manager, which could be the operating system. (MySQL has several options, with names like InnoDB.)
Using separate files for each table. (If we consider Access a database.)
As multiple physical files, spread across multiple file systems, but represented as a single "file". (HIVE, for instance, that uses a parallel file system to store the data.)
However, these are the default configurations. Real databases typically let you split the data among multiple physical devices. SQL Server and MySQL call this partitions. Oracle calls this table spaces. These are typically set up by knowledgeable DBAs who understand the performance requirements of the system.
The final questions are easy to answer, though. Most databases give you the option of either growing the databases as space is needed or giving the database a fixed (or fixed maximum) size. I have not encountered a database engine that will split the underlying data into multiple files automatically, although it is possible that newer column oriented databases (such as Vertica) do something similar.
I'd like to populate the MySQL timezone tables with the database provided by MySQL. I am using a cloud DB and can't overwrite DB tables and restart the server.
Can someone help me understand how to load these files manually?
Rationale
I loaded the tz tables from the OS, but the OS has a ton of timezone names. I'd like a more concise set of names that I can query for forms. I think the set provided by MySQL might be a better fit. No other apps are running on the database, thus timezone conflicts aren't an issue.
The database provided by MySQL comes as a bunch of MyISAM container files; I don't think you're going to be able to safely drop them into the mysql database directory without bouncing your mysqld.
Do you own this mysqld, or are you one of many tenants in a vendor-owned system?
If you own it, you can load a subset of the /usr/share/zoneinfo time zones. A useful subset might be /usr/share/zoneinfo/posix.
If you're using the mysql.time_zone_name.Name to populate a pick list (a good use for it) you could select an appropriate subset of the admittedly enormous list of names, or create some aliases right in that table.
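One way to pick such a subset, sketched in Python: the function name and the default region list are assumptions, and `names` is expected to be the Name column already fetched from mysql.time_zone_name. Names under the posix/ and right/ prefixes, which can appear when the tables are loaded from the OS zoneinfo directory, are dropped because they do not start with a listed region.

```python
def pick_list_names(
    names,
    regions=("Africa", "America", "Asia", "Australia", "Europe", "Pacific"),
):
    """Keep only 'Region/City' style names from the chosen regions."""
    prefixes = tuple(f"{region}/" for region in regions)
    # str.startswith accepts a tuple and matches any of its prefixes.
    return sorted(name for name in names if name.startswith(prefixes))
```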
I ended up loading the tables into a SQL server on my own local machine, then exporting INSERT statements and manually loading those onto the server over which I don't have direct control. Not a glamorous solution, but it appears to be the only reasonable way to go about it.