What will happen when LOAD DATA fails - MySQL

I am doing a LOAD DATA from a CSV file into my master database.
My CSV file is 160 GB.
My questions:
Will LOAD DATA load into the slave in parallel or not?
If the disk gets full on the master, will the entire process roll back or fail partially?
Please help.

In my personal experience, when you use the import function it fires a query for each line of your CSV. That means if there is a problem on any of the lines, or the disk gets full, what has already been loaded will not be rolled back, so it fails partially.
One other doubt is whether it will even allow a CSV of that size to be executed at all.
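As a rough sketch of that "partial fail" scenario (the table name, file path, and delimiters below are placeholders, not from the question): if the load stops partway on a non-transactional table, the rows already loaded stay in place, and you can check how far it got and resume with IGNORE n LINES.

    -- initial load; placeholder names
    LOAD DATA INFILE '/var/lib/mysql-files/big.csv'
    INTO TABLE my_table
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;  -- skip the header row

    -- if the load died partway (e.g. disk full), see what actually made it in
    SELECT COUNT(*) FROM my_table;

    -- resume by skipping the header plus the N rows already loaded, e.g.
    -- IGNORE 100001 LINES in a second LOAD DATA on the same file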

MySQL Fastest Way To Import 125000 line CSV?

This is my first time working with MySQL besides a few basic queries on an existing DB, so I'm not great at troubleshooting this.
I have a CSV with 125,000 records that I want to load into MySQL. I have version 8 installed along with Workbench. I used the Import Wizard to load my CSV and it started importing. The problem is that it took ~5 hours to get to 30,000 records. From what I read, this is a long time and there should be a faster way.
I tried LOAD DATA INFILE but got an error regarding secure-file-priv, so I went looking to solve that. The secure-file-priv configuration appears to be off, but it keeps popping up as the error. Now I'm getting "Access denied" errors, so I'm just stuck.
I am the admin on this machine and this data doesn't mean anything to anyone, so security isn't a concern. I just want to learn how to do this.
Is LOAD DATA INFILE the best way to load this amount of data?
Is 20 hours too long for 125000 records?
Anyone have any idea what I'm doing wrong?
You don't need to set secure-file-priv if you use LOAD DATA LOCAL INFILE. This allows the client to read the file content on the computer where the client runs, so you don't have to upload the file to the designated directory on the database server. This is useful if you don't have access to the database server.
But the LOCAL option is disabled by default. You have to enable it on both the server and the client: set the local_infile option in my.cnf on the server, and start the MySQL client with mysql --local-infile.
In addition, to use the server-side LOAD DATA INFILE (without LOCAL), your user must be granted the FILE privilege. See https://dev.mysql.com/doc/refman/8.0/en/privileges-provided.html
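For example (the file path, table name, and CSV layout here are assumptions for illustration, not from the question), the whole setup looks roughly like this:

    -- in my.cnf on the server:
    --   [mysqld]
    --   local_infile=1

    -- then, from a client started with: mysql --local-infile=1
    LOAD DATA LOCAL INFILE '/path/to/records.csv'
    INTO TABLE my_records
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;  -- skip the CSV header row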
Once it's working, LOAD DATA INFILE should be the fastest way to bulk-load data. I did a bunch of comparative speed tests for a presentation Load Data Fast!
You may also have some limiting factors with respect to MySQL Server configuration options, or even performance limitations with respect to the computer hardware.
I think the 5 hours for 30k records is way too long even on modest hardware.
I tested on a MacBook with built-in SSD storage. Even in a test designed to be as inefficient as possible (open connection, save one row using INSERT, disconnect), I was still able to insert 290 rows/second, or 10k rows in 34 seconds. The best result was using LOAD DATA INFILE, at a rate of close to 44k rows/second, loading 1 million rows in 22 seconds.
So something is severely underpowered on your database server, or else the Import Wizard is doing something so inefficient I cannot even imagine what it could be.

Loading 20 million records from SSIS to Snowflake through ODBC

I am trying to load around 20 million records from SSIS to Snowflake using an ODBC connection, and this load is taking forever to complete. Is there any faster method than using ODBC? I can think of loading the data into a flat file and then using the flat file to load into Snowflake, but I'm not sure how to do it.
Update:
I generated a text file using bcp, then put that file on a Snowflake stage using the ODBC connection, and then used the COPY INTO command to load the data into the tables.
Issue: the generated txt file is 2.5 GB, and ODBC is struggling to send the file to the Snowflake stage. Any help on this part?
It should be faster to write compressed objects to the cloud provider's object store (AWS S3, Azure blob, etc.) and then COPY INTO Snowflake. But it is also more complex.
You aren't, by chance, writing one row at a time, for 20,000,000 database calls?
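If that is what's happening, batching many rows per statement already cuts the number of round trips dramatically. A minimal sketch, with made-up table and column names:

    -- 20,000,000 single-row calls (slow):
    --   INSERT INTO sales (id, amount) VALUES (1, 9.99);
    --   INSERT INTO sales (id, amount) VALUES (2, 4.50);
    --   ...

    -- far fewer round trips with multi-row inserts:
    INSERT INTO sales (id, amount) VALUES
      (1, 9.99),
      (2, 4.50),
      (3, 12.00);  -- and so on, a few thousand rows per statement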
ODBC is slow on a database like this. Snowflake (and similar columnar warehouses) also wants to ingest many shredded (split) files, not one large one. The problem with your original approach is that no method of ODBC usage is going to be particularly fast on a system designed to load its nodes in parallel across shredded, staged files.
The problem with your second approach is that no shredding took place. A non-columnar database with a head node (say, Netezza) would happily take your single file and shred it itself, but Snowflake or Redshift will basically ingest it as a single thread on a single node. Thus your ingest of a single 2.5 GB file will take about the same amount of time on an XS 1-node Snowflake warehouse as on an L 8-node cluster; the single node doing the work is not saturated and the rest sit idle with plenty of CPU cycles to spare. Snowflake appears to use up to 8 write threads per node for an extract or ingest operation. You can see some tests here: https://www.doyouevendata.com/2018/12/21/how-to-load-data-into-snowflake-snowflake-data-load-best-practices/
My suggestion would be to make at least 8 files of (2.5 GB / 8) ≈ 315 MB each; for 2 nodes, at least 16. This likely involves some effort in your file-creation process if it does not natively shred and scale horizontally, although as a bonus it breaks your data into bite-sized chunks that are easier to abort/resume/etc. should any problems occur.
Also note that once the data is bulk-inserted into Snowflake, it is unlikely to be optimally placed to take advantage of micro-partition pruning, so I would recommend something like rebuilding the table from the loaded data, sorted on a column you often filter on; for a fact table I would at least rebuild and sort by date. https://www.doyouevendata.com/2018/03/06/performance-query-tuning-snowflake-clustering/
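A sketch of that rebuild (table and column names are placeholders; sale_date stands in for whatever column you filter on most):

    -- build a sorted copy of the freshly loaded table
    CREATE TABLE fact_sales_sorted AS
    SELECT * FROM fact_sales ORDER BY sale_date;

    -- swap the sorted copy in, then drop the old, unsorted data
    ALTER TABLE fact_sales_sorted SWAP WITH fact_sales;
    DROP TABLE fact_sales_sorted;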
Generate the file and then use the SnowSQL CLI to PUT it into the internal stage. Use COPY INTO for stage -> table. There is some coding to do, and you can never avoid transporting GBs over the net, but PUT can compress and transfer the file in chunks.
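Roughly, from SnowSQL that could look like this (stage, file, and table names are made up for illustration):

    -- upload the local file(s) to a named internal stage; PUT compresses by default
    PUT file:///tmp/export/orders_*.txt @my_stage AUTO_COMPRESS=TRUE;

    -- then load from the stage into the target table
    COPY INTO orders
    FROM @my_stage
    FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 0);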

Snowflake - Putting large file into internal Snowflake Stage

I am currently trying to upload a large, unzipped CSV file into an internal Snowflake stage. The file is 500 GB. I ran the PUT command, but it doesn't look like much is happening. There is no status update; it's just kind of hanging there.
Any ideas what's going on here? Will this eventually time out? Will it complete? Anyone have an estimated time?
I am tempted to try and kill it somehow. I am currently splitting the large 500 GB file up into about 1000 smaller files that I'm going to zip up and upload in parallel (after reading more on best practices).
Unless you've specified AUTO_COMPRESS=FALSE, step 1 of the PUT is compressing the file, which may take some time on 500 GB...
Using PARALLEL=<n> will automatically split the file into smaller chunks and upload them in parallel - you don't have to split the source file yourself. (But you can if you want to...)
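For example (the file path and stage are placeholders), those two options on the PUT look like this:

    -- upload one large local file to the user stage with 8 upload threads
    PUT file:///data/big_file.csv @~/staged
        PARALLEL = 8            -- upload threads (default is 4)
        AUTO_COMPRESS = TRUE;   -- gzip before upload (the default)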
Per Snowflake's suggestion, please split the file into multiple smaller files, then stage the files in a Snowflake internal stage (by default Snowflake will compress the files).
Then try running the COPY command with a multi-cluster warehouse, and you will see the performance of Snowflake.
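For instance (stage, file pattern, and table names are placeholders), after staging the split files the load is a single COPY, and the warehouse can work on many files in parallel:

    COPY INTO my_table
    FROM @my_stage
    PATTERN = '.*chunk_.*[.]csv[.]gz'
    FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);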

TIMEOUT in Laravel

So, I have to read an Excel file in which each row contains some data that I want to write to my database. I pass the whole file to Laravel, it reads the file and formats it into an array, and then I make a new insertion (or update) in my database.
The thing is, the input Excel file can contain thousands of rows, and it's taking a while to complete, giving a timeout error in some cases.
When I do this locally I use the set_time_limit(0) function so the timeout doesn't occur, and it works pretty well. But on the remote server this function is disabled for security reasons and my code crashes because of a timeout.
Can somebody help with how to solve this problem? Or maybe suggest a better way to approach it?
A nice way to handle tasks that take a long time is by making use of so-called jobs.
You can make a job called ImportExcel and dispatch it when someone sends you a file.
Take a good look at the docs, they have some great examples on how to do this.
You can take care of this using the following steps:
1. Take the CSV file and store it temporarily in storage:
You can store the large CSV when the user uploads it. If it's something that is not uploaded from the frontend, just make sure you have it saved so it can be processed in the next step.
2. Then dispatch a job which can be queued:
You can create a job which handles this asynchronously. You can use Supervisor to manage queues, timeouts, etc.
3. Use a package like one from thephpleague:
Using this package (or a similar one), you can chunk the records or read them one at a time. It is really helpful for keeping your memory usage under the limit. Plus, it has different methods available for reading the data from files.
4. Once the file is processed, you can delete it from temporary storage:
Just some teardown cleanup activity.

MySQL huge operations

I am currently importing a huge CSV file from my iPhone to a Rails server. The server parses the data and then starts inserting rows into the database. The CSV file is fairly large, so the operation takes a long time to finish.
Since I am doing this asynchronously, my iPhone is able to go to other views and do other stuff in the meantime.
However, when it runs another query against another table, the request HANGS because the first operation is still trying to insert the CSV's data into the database.
Is there a way to resolve this type of issue?
As long as the phone doesn't care when the database insert is complete, you might want to try storing the CSV file in a tmp directory on your server and then have a script write from that file to the database. Or simply store it in memory. That way, once the phone has posted the CSV file, it can move on to other things while the script handles the database inserts asynchronously. And yes, @Barmar is right about using the InnoDB engine rather than MyISAM (which may be the default in some configurations).
Or, you might want to consider enabling "low-priority updates" which will delay write calls until all pending read calls have finished. See this article about MySQL table locking. (I'm not sure what exactly you say is hanging: the update, or reads while performing the update…)
Regardless, if you are posting the data asynchronously from your phone (i.e., not from the UI thread), it shouldn't be an issue as long as you don't try to use more than the maximum number of concurrent HTTP connections.
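For reference, the MySQL knobs mentioned above look roughly like this (the table name is a placeholder):

    -- check which storage engine the table uses
    SHOW TABLE STATUS LIKE 'my_big_table';

    -- switch from MyISAM (table-level locks) to InnoDB (row-level locks)
    ALTER TABLE my_big_table ENGINE = InnoDB;

    -- or, if staying on MyISAM, make writes yield to pending reads
    SET GLOBAL low_priority_updates = 1;
    -- ...or per statement:
    INSERT LOW_PRIORITY INTO my_big_table (col1, col2) VALUES ('a', 'b');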