Talend job truncates records when I kill the job and run it again - mysql

I am using the Talend Open Studio for Data Integration tool to transfer SQL Server table data to a MySQL database.
I have 40 million records in the table.
I created and ran the job, but after inserting approximately 20 million records the connection failed.
When I tried to insert the data again, the Talend job first truncated the table and then started inserting from the beginning.

The question seems to be incomplete, but assuming that you want the table not to be truncated before each load, check the "Action on table" property of your output component. It should be set to "Default" or "Create table if does not exist".
Now, if your question is how to handle restartability of the job, so that it resumes from the 20 million mark on the next run, there are multiple ways you could achieve this. Since you are dealing with a large number of records, a pagination-like mechanism would help: load the data in chunks (let's say 10,000 at a time) inside a loop, with the commit interval also set to 10,000. After each successful batch of 10,000 records reaches the database, write an entry into a log table with the timestamp or the incremental key in your data (to mark the checkpoint). Your job should look something like this:
tLoop--{read checkpoint from table}--tMSSqlInput--tMySqlOutput--{load new checkpoint in table}
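As a rough sketch of the checkpoint mechanism (the table, column, and job names below are hypothetical; the chunk read uses SQL Server syntax while the checkpoint table lives on the MySQL side):

-- Checkpoint table that records the last key successfully loaded
CREATE TABLE etl_checkpoint (
    job_name  VARCHAR(100) PRIMARY KEY,
    last_key  BIGINT       NOT NULL,
    loaded_at DATETIME     NOT NULL
);

-- Read the checkpoint at the start of each tLoop iteration
SELECT last_key FROM etl_checkpoint WHERE job_name = 'mssql_to_mysql';

-- tMSSqlInput: fetch the next chunk after the checkpoint
SELECT TOP 10000 id, col1, col2
FROM source_table
WHERE id > 20000000          -- the last_key value read above
ORDER BY id;

-- After the chunk commits through tMysqlOutput, advance the checkpoint
UPDATE etl_checkpoint
SET last_key = 20010000,     -- the highest id in the chunk just loaded
    loaded_at = NOW()
WHERE job_name = 'mssql_to_mysql';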

You can set a property in a context variable, say 'loadType', which will have the value 'initial' or 'incremental'.
Before truncating the table, use an 'If' trigger to check the value of this variable: if it is 'initial', truncate and do the full load; if it is 'incremental', run the subjob that loads only the new data without truncating.

Related

Table Renaming in an Explicit Transaction

I am extracting a subset of data from a backend system to load into a SQL table for querying by a number of local systems. I do not expect the dataset to ever be very large - no more than a few thousand records. The extract will run every two minutes on a SQL2008 server. The local systems are in use 24 x 7.
In my prototype, I extract the data into a staging table, then drop the live table and rename the staging table to become the live table in an explicit transaction.
-- Build the staging table outside the transaction
SELECT fieldlist
INTO Temp_MyTable_Staging
FROM FOOBAR;

-- Swap staging in for live inside an explicit transaction
BEGIN TRANSACTION
    IF (OBJECT_ID('dbo.MyTable') IS NOT NULL)
        DROP TABLE MyTable;
    EXECUTE sp_rename N'dbo.Temp_MyTable_Staging', N'MyTable';
COMMIT
I have found lots of posts on the theory of transactions and locks, but none that explain what actually happens if a scheduled job tries to query the table in the few milliseconds while the drop/rename executes. Does the scheduled job just wait a few moments, or does it terminate?
Conversely, what happens if the rename starts while a scheduled job is selecting from the live table? Does the transaction fail to get a lock and therefore terminate?

How to achieve zero downtime in ETL

I have an ETL process which takes data from a transaction DB and, after processing, stores the data in another DB. While storing the data we truncate the old data and insert the new data to get better performance, since updates take much longer than a truncate-and-insert. Because of this, counts show as 0 or wrong data for some time (around 2-3 minutes). We run the ETL every 8 hours.
So how can we avoid this problem? How can we achieve zero downtime?
One way we used in the past was to prepare the production data in a table named temp. Then, when finished (and checked, which was the lengthy part in our process), drop prod and rename temp to prod.
This takes almost no time, and the process was successful even when other users were locking the table.
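If the reporting database is MySQL (an assumption here; the table names are only illustrative), the swap can even be done atomically with a single RENAME TABLE, so readers always see either the old or the new table:

-- Build and validate the fresh data in a staging table
CREATE TABLE report_data_staging LIKE report_data;
INSERT INTO report_data_staging
SELECT col1, col2 FROM etl_output;   -- placeholder for the real ETL output

-- Atomic swap: both renames happen in one step
RENAME TABLE report_data         TO report_data_old,
             report_data_staging TO report_data;

DROP TABLE report_data_old;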

Inserting 1 million records in mysql

I have two tables, and into both tables I receive 1 million records. I am using a cron job every night to insert the records. For the first table I truncate the table first and then insert the records; for the second table I update and insert records according to the primary key. I am using MySQL as my database. My problem is that I need to do this task each day, but I am unable to insert all the data in time. What could be a possible solution to this problem?
What's important is to turn off all the actions and checks MySQL wants to perform when posting data, such as autocommit, indexing, etc.
https://dev.mysql.com/doc/refman/5.7/en/optimizing-innodb-bulk-data-loading.html
If you do not do this, MySQL does a lot of work after every record added, and that work adds up as the process proceeds, resulting in very slow processing and importing; in the end it may not complete in one day.
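Following that page, a minimal sketch of the session settings around the nightly load (the table is whatever you are inserting into) might be:

SET autocommit = 0;           -- commit once per batch instead of per row
SET unique_checks = 0;        -- skip secondary uniqueness checks during the load
SET foreign_key_checks = 0;   -- skip foreign key checks during the load

-- ... run the bulk INSERT statements here ...

COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;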
If you must use MySQL: for the first table, disable the indexes, do the inserts, then enable the indexes again. This will work faster.
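For the truncate-and-reload table, that could look like the following sketch (the table name is hypothetical; note that DISABLE KEYS only affects non-unique indexes and only on MyISAM tables):

TRUNCATE TABLE first_table;

ALTER TABLE first_table DISABLE KEYS;   -- stop maintaining non-unique indexes per row

-- bulk INSERTs (or LOAD DATA INFILE) go here

ALTER TABLE first_table ENABLE KEYS;    -- rebuild the indexes once at the end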
Alternatively, MongoDB will be faster, and Redis is very fast.

Best way to insert and query a lot of data in MySQL

I have to read approximately 5000 rows of at most 6 columns from an xls file. I'm using PHP >= 5.3. I'll be using PHPExcel for that task. I haven't tried it yet, but I think it can handle it (if you have other options, they are welcome).
The issue is that every time I read a row, I need to query the database to verify whether that particular row exists; if it does, overwrite it, if not, add it.
I think that's going to take a lot of time and PHP will simply time out (I can't modify the timeout variable since it's a shared server).
Could you give me a hand with this?
Appreciate your help
Since you're using MySQL, all you have to do is insert the data and not worry about whether a row is already there at all.
Here's why and how:
If you query the database from PHP to verify that a row exists, that's bad. The reason it's bad is that you are prone to getting false results: there is a window between the check in PHP and the write in MySQL, so PHP can't be used to verify data integrity. That's the job of the database.
To ensure there are no duplicate rows, we use UNIQUE constraints on our columns.
MySQL extends the SQL standard with the INSERT INTO ... ON DUPLICATE KEY UPDATE syntax. That lets you just insert data, and if there's a duplicate row, update it with the new data instead.
Reading 5000 rows is quick. Inserting 5000 rows is also quick if you wrap it in a transaction. I would suggest reading 100 rows from the Excel file, starting a transaction and just inserting the data (using ON DUPLICATE KEY UPDATE to handle duplicates). That lets you spend one I/O operation of your hard drive to save 100 records. Doing so, you can finish the whole process in a few seconds, so you don't need to worry about performance or timeouts.
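A minimal sketch of that pattern (the table and column names are made up for illustration):

-- The UNIQUE key is what lets MySQL detect and merge duplicates for you
CREATE TABLE imported_rows (
    external_id VARCHAR(50) NOT NULL,
    col_a       VARCHAR(255),
    col_b       VARCHAR(255),
    UNIQUE KEY uq_external_id (external_id)
);

START TRANSACTION;

-- Repeat for each batch of ~100 rows read from the spreadsheet
INSERT INTO imported_rows (external_id, col_a, col_b)
VALUES ('row-1', 'a', 'b'),
       ('row-2', 'c', 'd')
ON DUPLICATE KEY UPDATE
    col_a = VALUES(col_a),
    col_b = VALUES(col_b);

COMMIT;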
First, run this process via exec, so the web request timeout does not matter.
Second, select the existing rows before reading the Excel file. Don't select them in one query; read 2000 rows at a time, for example, and collect them into an array.
Third, use the xlsx format and a chunked reader, which lets you avoid reading the whole file at once.
It's not a 100% guarantee, but I did the same.
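A sketch of the batched pre-select suggested above (keyset pagination; the table and key column are hypothetical):

-- Fetch existing rows 2000 at a time instead of in one huge query
SELECT id, external_id, col_a, col_b
FROM imported_rows
WHERE id > 0                 -- replace 0 with the last id of the previous batch
ORDER BY id
LIMIT 2000;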

What is the most efficient way to insert a large number of rows from MS SQL to MySQL?

I need to transfer a large number of rows from a SQL Server database to a MySQL db (MyISAM) on a daily basis. I have the MySQL instance set up as a linked server.
I have a simple query which returns the rows that need to be transferred. The number of rows will grow to approximately 40,000 rows, each row has only 2 char(11) columns. This should equate to roughly 0.8MB in size. The execution time of the SELECT query is negligible. I'll be truncating the MySQL table prior to inserting the records each day.
I'm using an INSERT...SELECT query into the MySQL linked server to perform the insert. With 40,000 rows it takes approximately 40 seconds. To see how long it would take to move that number of rows from one MySQL table to another I executed a remote query from MSSQL and it took closer to 1 second.
I can't see much of what's going on looking at the execution plan in SSMS but it appears as though an INSERT statement is being executed for every one of the rows rather than a single statement to insert all of the rows.
What can I do to improve the performance? Is there some way I can force the rows to be inserted into MySQL in a single statement if that is what's going on?
LOAD DATA INFILE is much faster in MySQL than INSERT. If you can set up your MS SQL Server to output a temporary CSV file, you can then pull it into MySQL either with the command-line mysqlimport tool or with LOAD DATA INFILE in a MySQL SQL statement.
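For instance, assuming the export landed in /tmp/transfer.csv (the path, table, and column names are placeholders):

LOAD DATA INFILE '/tmp/transfer.csv'
INTO TABLE target_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);

The mysqlimport command-line tool is essentially a wrapper around this same statement.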
The problem is that the table you are selecting from is on the local server and the table you are inserting into is on the remote server, so the linked server has to translate each row into an INSERT INTO Table (Field1, Field2) VALUES ('VALUE1','VALUE2') (or similar) on the MySQL server.
What you could do instead is keep a checksum for each row on the SQL Server side. Rather than truncating and reinserting the entire table, you can then delete and reinsert only changed and new records. Unless most of your records change every day, this should cut the amount of data you have to transfer down enormously, without having to mess about exporting and reimporting text files.
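One way to sketch the checksum idea on the SQL Server side (the snapshot table and column names are assumptions, not something from the original setup):

-- Snapshot of what was pushed last time, with a per-row checksum
CREATE TABLE dbo.PushedSnapshot (
    KeyCol      CHAR(11) NOT NULL PRIMARY KEY,
    RowChecksum INT      NOT NULL
);

-- Rows that are new or whose checksum changed since the last push
SELECT s.KeyCol, s.ValueCol
FROM dbo.SourceRows AS s
LEFT JOIN dbo.PushedSnapshot AS p
       ON p.KeyCol = s.KeyCol
WHERE p.KeyCol IS NULL
   OR p.RowChecksum <> CHECKSUM(s.ValueCol);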
I am not sure whether it makes things faster, but a bulk download and upload would be the alternative.
On the MySQL side you could do a LOAD DATA INFILE.
I don't know how to unload the data on the SQL Server side, but there is probably something similar.
Dump into a file and then use LOAD DATA INFILE;
data inserts from a file are much quicker.