Can anybody help me? I want to know if there is a good way to move a large amount of filtered data from a table in an Oracle DB to a table in a MySQL DB.
I know that you can run a query, loop over its results, and insert them into the other database, but the problem is that this may run out of memory. I'm looking for a better solution, such as running jobs or asynchronous tasks.
You are looking for an ETL system (maybe without the T in your case).
Some time ago I used pygrametl, and it handled the data surprisingly fast.
Another option is Django's bulk_create (don't try to insert all the data at once; split it into chunks).
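To illustrate the "split it into chunks" advice, here is a minimal sketch that uses only the plain DB-API. It assumes in-memory SQLite databases as stand-ins for Oracle and MySQL, and the table names (orders, open_orders) and the helper copy_in_chunks are made up for the example. With real drivers the placeholder style differs (for example %s for most MySQL drivers), but the fetchmany/executemany loop is the same idea: only one chunk is held in memory at a time.

```python
import sqlite3

def copy_in_chunks(src, dst, select_sql, insert_sql, chunk_size=1000):
    """Stream rows from src to dst, holding at most chunk_size rows in memory."""
    read_cur = src.cursor()
    write_cur = dst.cursor()
    read_cur.execute(select_sql)
    copied = 0
    while True:
        rows = read_cur.fetchmany(chunk_size)
        if not rows:
            break
        write_cur.executemany(insert_sql, rows)
        dst.commit()  # commit per chunk so a failure loses at most one chunk
        copied += len(rows)
    return copied

# Stand-in for the Oracle source (hypothetical table).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
source.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, "open" if i % 2 else "closed") for i in range(1, 2501)],
)
source.commit()

# Stand-in for the MySQL target (hypothetical table).
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE open_orders (id INTEGER PRIMARY KEY, status TEXT)")

copied = copy_in_chunks(
    source,
    target,
    "SELECT id, status FROM orders WHERE status = 'open'",
    "INSERT INTO open_orders (id, status) VALUES (?, ?)",
    chunk_size=500,
)
```

The filter lives in the SELECT, so only matching rows ever leave the source database.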
Related
I have a very specific problem that involves multiple MySQL DB instances, and I need to "sync" all data from each DB/table into one DB/table.
Basically, [tableA.db1, tableB.db2, tableC.db3] into [TableAll.db4].
Some of the DB instances are on the same machine, and some are on a separate machine.
About 80,000 rows are added to each table per day, and there are 3 tables (one per DB).
So, about 240,000 would be "synced" to a single table per day.
I've just been using the Event Scheduler to copy the data from each DB into the "All-For-One" DB every hour.
However, I've been wondering lately if that's the best solution.
I considered using triggers, but I've been told they put a heavy burden on the DB.
Using a statement-level trigger may be better, but it depends too much on how the statement is formed.
Then I heard about Federated tables (in Oracle terms, a "DBLink"),
and I thought I could use them to link each table and create a VIEW over those tables.
But I don't know much about databases, so I don't really know the implication of each method.
So, my question is..
Considering the "All-For-One" DB only needs to be Read-Only,
which method would be better, performance and resource wise, in order to copy data from multiple databases into one database regularly?
Thanks!
I have a Laravel web app that uses a VueJS front end and MySQL as the RDBMS. I currently have a table that is 23.8 GB and contains 8m+ rows, and it's growing every second.
When querying this data, I'm joining it to 4 other tables so the entire dataset is humongous.
I'm currently only pulling and displaying 1000 rows, as I don't need any more than that. VueJS shows the data in a table, and there are 13 filter options the user can select from to filter the data (date, name, status, etc.).
Using Eloquent and having MySQL indexes in place, I've managed to get the query time down to a respectable time but I need this section of the app to be as responsive as possible.
Some of the WHERE clauses that the filters generate take 13 seconds to execute, which I feel is too long.
I've been doing some reading and think MongoDB or Redis may be an option, but I have very little experience with either.
For this particular scenario, what do you think would be the best option to maximise read performance?
If I were to use MongoDB, I wouldn't migrate the current data; I'd basically have a second database that contains all the new data. This app hasn't gone into production yet, and in most use cases only the last 30 days' worth of data will be required, but the option to query old data is still required, hence keeping both MySQL and MongoDB.
Any feedback will be appreciated.
Try using Elasticsearch. It will speed up reads.
Try converting the query into a stored procedure. In MySQL you can call the stored procedure like this:
DB::select('CALL stored_procedure("Param1", "param2",..)');
or
DB::select('CALL stored_procedure(?,?,..)', array($Param1, $param2));
Try this for a procedure without parameters:
DB::select('CALL stored_procedure()');
Try using EXPLAIN to optimise the performance.
How to optimise MySQL queries based on EXPLAIN plan
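As a rough illustration of what reading a plan tells you, here is a small sketch that compares the plan for the same filter before and after adding an index. It uses SQLite's EXPLAIN QUERY PLAN as a stand-in, so the output format is not what MySQL's EXPLAIN prints, and the events table and index name are invented for the example. The workflow carries over: check whether the filter is served by a full scan or by an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, status TEXT, created TEXT)"
)

def query_plan(conn, sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

filter_sql = "SELECT id FROM events WHERE status = 'failed'"

# Without an index on status, the planner has to scan the whole table.
plan_before = query_plan(conn, filter_sql)

# After adding an index on the filtered column, the plan should mention it.
conn.execute("CREATE INDEX idx_events_status ON events (status)")
plan_after = query_plan(conn, filter_sql)

for step in plan_before + plan_after:
    print(step)
```

In MySQL you would run EXPLAIN on the slow filter queries and look at the key and rows columns for the same kind of information.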
We are running a service where we have to set up a new database for each new site. The database is exactly the same every time, so we can simply restore from a backup file or clone a sample database on the same server (it exists only for cloning; no transactions run against it, so there is no risk of corrupting data). The database itself contains around 100 tables with some data and takes around 1-2 minutes to import, which is too slow.
I'm trying to find a way to do it as fast as possible. My first thought was to copy the files in the sample database's data_dir, but it seems I also need to somehow edit the table list, or MySQL won't be able to read my new database's tables even though it still shows them there.
You're duplicating the database the wrong way. It will be much faster if you do it properly.
Here is how you duplicate a database:
-- Note: CREATE TABLE ... SELECT copies the rows but not indexes or foreign keys.
create database new_database;
create table new_database.table_one select * from source_database.table_one;
create table new_database.table_two select * from source_database.table_two;
create table new_database.table_three select * from source_database.table_three;
...
I just did a performance test: this takes 81 seconds to duplicate 750 MB of data across 7 million table rows. Presumably your database is smaller than that?
I don't think you are going to find anything faster. One thing you could do is keep a queue of duplicate databases on standby, ready to be picked up and used at any time. Then you don't need to create a new database on demand at all; you just take an existing one from the queue of available ones and rename it. Have a cron job running to make sure the queue never runs empty.
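The standby-queue idea could look roughly like this. It is only a sketch of the bookkeeping: StandbyPool and create_clone are hypothetical names, the clone step is faked with a counter instead of real SQL, and a real setup would need to persist the queue somewhere (for example in a table) so the cron job and the web app see the same state.

```python
from collections import deque

class StandbyPool:
    """Keeps a queue of pre-built database names ready to hand out."""

    def __init__(self, create_clone, target_size=3):
        # create_clone is a hypothetical callable that builds one clone
        # of the sample database and returns its name.
        self._create_clone = create_clone
        self._target_size = target_size
        self._ready = deque()

    def refill(self):
        """Run from cron (or similar) to top the queue back up."""
        created = 0
        while len(self._ready) < self._target_size:
            self._ready.append(self._create_clone())
            created += 1
        return created

    def take(self):
        """Hand out a ready database, or build one on demand if none is left."""
        if self._ready:
            return self._ready.popleft()
        return self._create_clone()

    def __len__(self):
        return len(self._ready)

# Demo: the clone step is faked so the example runs without a server.
_counter = iter(range(1, 1000))

def fake_clone():
    return f"site_standby_{next(_counter)}"

pool = StandbyPool(fake_clone, target_size=3)
pool.refill()                 # cron job builds three standby databases
first_site_db = pool.take()   # a new site signs up and gets one immediately
```

The slow clone still happens, but it happens in the background instead of while a new site is waiting.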
Why is MySQL not able to read the tables, and what did you change in the table list?
I think it may be a permissions problem that stops MySQL from reading the copied files; otherwise it should work.
Thanks
I have a MySQL query that takes 8 seconds to execute/fetch (in Workbench).
I won't go into the details of why it may be slow (I don't think the GROUP BY is helping, though).
What I really want to know is how I can cache it so it runs more quickly, because the tables only change about 5-10 times per hour, while users access the site thousands of times per hour.
Is there a way to regenerate/cache the results only when the DB changes, so that they are not constantly regenerated?
I'm quite new to SQL, so any basic thoughts may go a long way.
I am not familiar with such a caching facility in MySQL. There are alternatives.
One mechanism would be to use application level caching. The application would store the previous result and use that if possible. Note this wouldn't really work well for multiple users.
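A minimal sketch of application-level caching, assuming a single process: the result is kept in memory for a fixed time and can be thrown away explicitly after a write. CachedQuery and slow_report are made-up names, and the 300-second lifetime is just a guess based on "5-10 changes per hour". Because the cache lives inside one process, separate workers or servers would each keep their own copy, which is the limitation mentioned above.

```python
import time

class CachedQuery:
    """Caches one query result in process memory for ttl_seconds."""

    def __init__(self, run_query, ttl_seconds=300, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means "nothing cached yet"

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: run the slow query once and keep it.
            self._value = self._run_query()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Call after writing to the underlying tables."""
        self._expires_at = None

# Demo with a fake slow query that records how often it actually runs.
calls = []

def slow_report():
    calls.append(1)
    return [("widgets", 42)]

report = CachedQuery(slow_report, ttl_seconds=300)
report.get()  # runs the query
report.get()  # served from the cache
```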
What you might want to do is store the report in a separate table. Then you can run that every five minutes or so. This would be a simple mechanism using a job scheduler to run the job.
A variation on this would be to have a stored procedure that first checks if the data has changed. If the underlying data has changed, then the stored procedure would regenerate the report table. When the stored procedure is done, the report table would be up-to-date.
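Here is a sketch of the "check first, then regenerate" variation, written as a Python function against SQLite rather than as a MySQL stored procedure. The tables (sales, report_table, report_state) and the updated_at column are assumptions for the example. The change check is a cheap fingerprint (row count plus latest updated_at), which would miss edits that do not touch updated_at, so treat it as one possible check and not a complete one.

```python
import sqlite3

def refresh_report_if_changed(conn):
    """Rebuild report_table only when the source fingerprint has changed."""
    fingerprint = conn.execute(
        "SELECT COUNT(*) || ':' || COALESCE(MAX(updated_at), '') FROM sales"
    ).fetchone()[0]
    stored = conn.execute("SELECT fingerprint FROM report_state").fetchone()
    if stored is not None and stored[0] == fingerprint:
        return False  # nothing changed, keep the existing report
    with conn:  # rebuild report and fingerprint in one transaction
        conn.execute("DELETE FROM report_table")
        conn.execute(
            "INSERT INTO report_table (region, total) "
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        conn.execute("DELETE FROM report_state")
        conn.execute(
            "INSERT INTO report_state (fingerprint) VALUES (?)", (fingerprint,)
        )
    return True

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    id INTEGER PRIMARY KEY, region TEXT, amount INTEGER, updated_at TEXT
);
CREATE TABLE report_table (region TEXT PRIMARY KEY, total INTEGER);
CREATE TABLE report_state (fingerprint TEXT);
INSERT INTO sales (region, amount, updated_at) VALUES
    ('north', 10, '2024-01-01 10:00:00'),
    ('north', 5,  '2024-01-01 11:00:00'),
    ('south', 7,  '2024-01-01 12:00:00');
""")

first = refresh_report_if_changed(conn)   # builds the report
second = refresh_report_if_changed(conn)  # data unchanged, nothing to do
```

Users then query report_table, which is a plain lookup instead of the 8-second aggregate.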
An alternative would be to use triggers that fire whenever the underlying data changes. The trigger could run the query, storing the results in a table (as above). Alternatively, the trigger could just update the rows in the report that would have changed (harder, because it involves understanding the business logic behind the report).
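The "only update the rows that changed" trigger variant might look like the sketch below. It is written with SQLite trigger syntax so it runs stand-alone; MySQL's CREATE TRIGGER syntax differs (FOR EACH ROW is required there, for instance), and the sales/report_table schema is invented. Only inserts and deletes are handled; updates would need a third trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount INTEGER NOT NULL
);
CREATE TABLE report_table (region TEXT PRIMARY KEY, total INTEGER NOT NULL);

-- Keep the per-region total in step with every inserted sale.
CREATE TRIGGER sales_after_insert AFTER INSERT ON sales
BEGIN
    INSERT OR IGNORE INTO report_table (region, total) VALUES (NEW.region, 0);
    UPDATE report_table SET total = total + NEW.amount
    WHERE region = NEW.region;
END;

-- Subtract the amount again when a sale is deleted.
CREATE TRIGGER sales_after_delete AFTER DELETE ON sales
BEGIN
    UPDATE report_table SET total = total - OLD.amount
    WHERE region = OLD.region;
END;
""")

conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10), ("north", 5), ("south", 7)],
)
conn.execute("DELETE FROM sales WHERE region = 'north' AND amount = 5")
conn.commit()

# The report is already up to date without re-running the aggregate query.
totals = dict(conn.execute("SELECT region, total FROM report_table"))
```

The cost moves to write time, which suits a table that changes a few times an hour and is read thousands of times.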
All of these require some change to the application. If your application query is stored in a view (something like vw_FetchReport1) then the change is trivial and all on the server side. If the query is embedded in the application, then you need to replace it with something else. I strongly advocate using views (or in other databases user defined functions or stored procedures) for database access. This defines the API for the database application and greatly facilitates changes such as the ones described here.
EDIT: (in response to comment)
More information about scheduling jobs in MySQL is here. I would expect the SQL code to be something like:
truncate table ReportTable;
insert into ReportTable
select * from <ReportQuery>;
(In practice, you would include column lists in the select and insert statements.)
A simple way to speed up the response time of long-running queries is to periodically generate summary tables, refreshed as often as the underlying data changes or the business requires.
For example, if your business doesn't care about sub-minute "accuracy", you can run the process once a minute and have your user interface query this precomputed table instead of summarizing the raw data on every request.
What's the best way to copy a large MySQL table in terms of speed and memory use?
Option 1. Using PHP, select X rows from old table and insert them into the new table. Proceed to next iteration of select/insert until all entries are copied over.
Option 2. Use MySQL INSERT INTO ... SELECT without row limits.
Option 3. Use MySQL INSERT INTO ... SELECT with a limited number of rows copied over per run.
EDIT: I am not going to use mysqldump. The purpose of my question is to find the best way to write a database conversion program. Some tables have changed, some have not. I need to automate the entire copy over / conversion procedure without worrying about manually dumping any tables. So it would be helpful if you could answer which of the above options is best.
There is a program that was written specifically for this task called mysqldump.
mysqldump is a great tool in terms of simplicity and careful handling of all types of data, but it is not as fast as LOAD DATA INFILE.
If you're copying on the same database, I like this version of Option 2:
a) CREATE TABLE foo_new LIKE foo;
b) INSERT INTO foo_new SELECT * FROM foo;
I've got lots of tables with hundreds of millions of rows (around half a billion), on InnoDB, with several keys and constraints. They take many, many hours to load from a MySQL dump, but only an hour or so with LOAD DATA INFILE. It is correct that copying the raw files with the DB offline is even faster. It is also correct that non-ASCII characters, binary data, and NULLs need to be handled carefully in CSV (or tab-delimited) files, but fortunately I've pretty much only got numbers and text :-). I might take the time to see how long steps a) and b) above take, but I think they are slower than LOAD DATA INFILE, probably because of transactions.
Of the three options listed above:
I would choose the second option if you have a unique constraint on at least one column, so that no duplicate rows are created if the script has to be run several times to finish its task after server timeouts.
Otherwise, your third option would be the way to go, choosing the row limit for each INSERT ... SELECT so that a single run stays within the server timeout.
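A sketch that combines both points: copy in primary-key ranges (Option 3) and rely on a unique key so that re-running after a timeout cannot create duplicates. It assumes SQLite as a stand-in, an integer primary key id with positive values, and a made-up payload column. SQLite's INSERT OR IGNORE plays the role that INSERT IGNORE would play in MySQL.

```python
import sqlite3

def copy_in_batches(conn, batch_size=1000):
    """Copy foo into foo_new in primary-key order, one transaction per batch."""
    last_id = 0  # assumes ids are positive integers
    batches = 0
    while True:
        # Find the highest id that belongs to the next batch.
        upper = conn.execute(
            "SELECT MAX(id) FROM "
            "(SELECT id FROM foo WHERE id > ? ORDER BY id LIMIT ?)",
            (last_id, batch_size),
        ).fetchone()[0]
        if upper is None:
            return batches  # no rows left to copy
        with conn:  # commit each batch separately to keep transactions short
            conn.execute(
                "INSERT OR IGNORE INTO foo_new (id, payload) "
                "SELECT id, payload FROM foo WHERE id > ? AND id <= ?",
                (last_id, upper),
            )
        last_id = upper
        batches += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE foo_new (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO foo (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 2501)],
)
# Simulate an earlier run that timed out after copying the first 300 rows.
conn.executemany(
    "INSERT INTO foo_new (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 301)],
)
conn.commit()

first_run = copy_in_batches(conn)
second_run = copy_in_batches(conn)  # safe to repeat: duplicates are ignored
copied_rows = conn.execute("SELECT COUNT(*) FROM foo_new").fetchone()[0]
```

Ranging over the key instead of using LIMIT with a growing OFFSET keeps each batch cheap, since the database does not have to skip over the rows already copied.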
Use a stored procedure
Option two must be fastest, but it's gonna be a mighty long transaction. You should look into making a stored procedure doing the copy. That way you could offload some of the data parsing/handling from the MySQL engine.
MySQL's LOAD DATA statement is faster than almost anything else; however, it requires exporting each table to a CSV file first.
Pay particular attention to escape characters and representing NULL values/binary data/etc in the CSV to avoid data loss.
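To make the escaping concern concrete, here is a small sketch that writes rows in the tab-delimited layout LOAD DATA reads with its default options (tab between fields, newline between rows, backslash as the escape character, \N for NULL). The helper names are made up, and it only covers numbers and text; binary columns would need separate handling that is not attempted here.

```python
import io

def escape_field(value):
    """Render one value for a tab-delimited LOAD DATA file (default escaping)."""
    if value is None:
        return "\\N"  # backslash-N is how NULL is written in this format
    text = str(value)
    # Escape the backslash first so the escapes added below are not doubled.
    text = text.replace("\\", "\\\\")
    # Tabs and line breaks inside a value would otherwise split the field/row.
    text = text.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")
    return text

def write_rows(rows, out):
    """Write each row as one tab-separated, newline-terminated line."""
    for row in rows:
        out.write("\t".join(escape_field(v) for v in row) + "\n")

rows = [
    (1, None, "plain"),
    (2, "tab\there", "back\\slash"),
]
buffer = io.StringIO()
write_rows(rows, buffer)
lines = buffer.getvalue().splitlines()
```

Without the escaping, the embedded tab in the second row would be read as an extra column and the rest of that row would shift.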
If possible, the fastest way is to take the database offline and simply copy the data files on disk.
Of course, this has some requirements:
you can stop the database while copying.
you are using a storage engine that stores each table in individual files; MyISAM does this.
you have privileged access to the database server (root login or similar).
Ah, I see you have edited your post, then I think this DBA-from-hell approach is not an option... but still, it's fast!
The best way I have found so far is to dump the data to text files (.txt) with SELECT ... INTO OUTFILE, then use LOAD DATA INFILE in MySQL to load the same data into the database.