MySQL Fastest Way To Import 125,000 Line CSV?

This is my first time working with MySQL besides a few basic queries on an existing DB, so I'm not great at troubleshooting this.
I have a CSV with 125,000 records that I want to load into MySQL. I have version 8 installed along with Workbench. I used the Import Wizard to load my CSV and it started importing. The problem is that it took ~5 hours to get to 30,000 records. From what I read, this is a long time and there should be a faster way.
I tried LOAD DATA INFILE but got an error about secure-file-priv, so I went looking to solve that. The configuration appears to have secure-file-priv turned off, but it keeps coming up in the error. Now I'm getting "Access denied" errors, so I'm just stuck.
I am the admin on this machine and this data doesn't mean anything to anyone so security isn't a concern. I just want to learn how to do this.
Is LOAD DATA INFILE the best way to load this amount of data?
Is 20 hours too long for 125,000 records?
Anyone have any idea what I'm doing wrong?

You don't need to set secure-file-priv if you use LOAD DATA LOCAL INFILE. This allows the client to read the file content on the computer where the client runs, so you don't have to upload the file to the designated directory on the database server. This is useful if you don't have access to the database server.
But the LOCAL option is disabled by default. You have to enable it on both the server and the client: set the local_infile option in my.cnf on the server, and start the client with mysql --local-infile.
If you use the server-side form (LOAD DATA INFILE without LOCAL), your user must also be granted the FILE privilege. See https://dev.mysql.com/doc/refman/8.0/en/privileges-provided.html
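Put together, a minimal sketch looks like the following; the table name, column layout, and file path are placeholders, not details from your setup:

-- on the server (or put local_infile=1 under [mysqld] in my.cnf):
SET GLOBAL local_infile = 1;

-- then connect with a client that allows it, e.g.:  mysql --local-infile=1

-- only needed for the non-LOCAL, server-side form:
-- GRANT FILE ON *.* TO 'your_user'@'localhost';

-- load the CSV that sits on the client machine (adjust names to your schema):
LOAD DATA LOCAL INFILE '/path/to/records.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row, if the file has one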
Once it's working, LOAD DATA INFILE should be the fastest way to bulk-load data. I did a bunch of comparative speed tests for a presentation Load Data Fast!
You may also have some limiting factors with respect to MySQL Server configuration options, or even performance limitations with respect to the computer hardware.
I think the 5 hours for 30k records is way too long even on modest hardware.
I tested on a MacBook with built-in SSD storage. Even in the test designed to be as inefficient as possible (open a connection, save one row with INSERT, disconnect), I was still able to insert 290 rows/second, or 10k rows in 34 seconds. The best result was with LOAD DATA INFILE, at a rate of close to 44k rows/second, loading 1 million rows in 22 seconds.
So something is severely underpowered on your database server, or else the Import Wizard is doing something so inefficient I cannot even imagine what it could be.
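If the hardware seems reasonable, a quick way to rule out the configuration side is to look at a few of the usual suspects for slow bulk loads; this is purely illustrative, not a tuning recommendation:

SHOW VARIABLES LIKE 'secure_file_priv';
SHOW VARIABLES LIKE 'local_infile';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'innodb_log_file_size';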

Related

Large data upload using Scala JDBC

I am trying to implement a function that uploads around 40 million records to a MySQL database hosted on AWS. However, the write gets stuck at 94% indefinitely.
This is the command I'm using to upload: df_intermediate.write.mode("append").jdbc(jdbcUrl, "user", connectionProperties), with rewriteBatchedStatements and useServerPrepStmts enabled in the connection properties.
This statement works for a small number of points (50,000) but cannot handle this larger amount. I've also increased the maximum number of connections on the MySQL side.
EDIT: I'm running this on GCP n1-standard-16 machines.
What could be the reasons that the write is stuck at 94%?
I don't think this really has anything to do with Scala; you are just saying you want to add many, many rows to a DB. The quick answer would be not to put all of these in one transaction, but to commit them, say, 100 at a time. Try this on a non-production SQL database first to see if it works.
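At the SQL level, the batching idea amounts to something like the sketch below; the columns on the user table are invented here, and in a Spark job the closest equivalents are the JDBC writer's batch-size and partitioning options rather than hand-written SQL:

SET autocommit = 0;

START TRANSACTION;
-- one multi-row INSERT of roughly 100 rows per chunk
INSERT INTO `user` (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
COMMIT;

-- repeat START TRANSACTION / INSERT / COMMIT for each following chunk,
-- so no single transaction has to cover all 40 million rows at once.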

Seeking a faster way to update a MySQL database

I have a database with data that is read-only as far as the application using it is concerned. However, different groups of tables within the database need to be refreshed weekly, or monthly (the organization generating the data provides entirely new flat files for this purpose). Most of the updates are small but some are large (more than 5 million rows with a large number of fields). I like to load the data to a test database and then just replace the entire database in production. So far I have been doing this by exporting the data using mysqldump and then importing it into production. The problem is that the import takes 6-8 hours and the system is unusable during that time.
I would like to get the downtime as short as possible. I’ve tried all the tips I could find to speed up mysqldump, such as those listed here: http://vitobotta.com/smarter-faster-backups-restores-mysql-databases-with-mysqldump/#sthash.cQ521QnX.hpG7WdMH.dpbs. I know that many people recommend Percona’s XtraBackup, but unfortunately I’m on a Windows 8 Server and Percona does not run on Windows. Other fast backup/restore options are too expensive (e.g., MySQL Enterprise). Since my test server and production server are both 64-bit Windows machines and are both running the same version of MySQL (5.6), I thought I could just zip up the database files and copy them over to swap out the whole database at once (all tables are InnoDB). However, that didn’t work. I saw the tables in MySQL Workbench, but couldn’t access them.
I’d like to know whether copying the database files is a viable option (I may have done it wrong) and, if it is not, what low-cost options are available to reduce my downtime.
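For what it's worth, a plain file copy of InnoDB tables is generally not enough on its own, which may be why the copied tables showed up but couldn't be opened. MySQL 5.6 does support moving individual InnoDB tables by file copy via transportable tablespaces; a rough sketch, with an invented table name:

-- requires innodb_file_per_table on both servers
-- on the test server: make the table's files safe to copy
FLUSH TABLES big_table FOR EXPORT;
-- (copy big_table.ibd and big_table.cfg out of the data directory now)
UNLOCK TABLES;

-- on the production server, with an identical empty table definition in place:
ALTER TABLE big_table DISCARD TABLESPACE;
-- (copy the .ibd and .cfg files into production's data directory)
ALTER TABLE big_table IMPORT TABLESPACE;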

mysql huge operations

I am currently importing a huge CSV file from my iPhone to a Rails server. The server parses the data and then starts inserting rows into the database. The CSV file is fairly large, so the operation takes a long time to finish.
Since I am doing this asynchronously, my iPhone is then able to go to other views and do other stuff.
However, when it issues a query against another table, that request HANGs because the first operation is still inserting the CSV's data into the database.
Is there a way to resolve this type of issue?
As long as the phone doesn't care when the database insert is complete, you might want to try storing the CSV file in a tmp directory on your server and then have a script write from that file to the database. Or simply store it in memory. That way, once the phone has posted the CSV file, it can move on to other things while the script handles the database inserts asynchronously. And yes, @Barmar is right about using the InnoDB engine rather than MyISAM (which may be the default in some configurations).
Or, you might want to consider enabling "low-priority updates" which will delay write calls until all pending read calls have finished. See this article about MySQL table locking. (I'm not sure what exactly you say is hanging: the update, or reads while performing the update…)
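For reference, the two suggestions above (switching engines, and low-priority updates) look roughly like this; the table name is hypothetical:

-- let reads proceed while the bulk insert runs, by moving off MyISAM's table locks:
ALTER TABLE csv_rows ENGINE=InnoDB;

-- or, if you stay on MyISAM, let pending reads jump ahead of writes:
SET GLOBAL low_priority_updates = 1;
-- (the same idea per statement: INSERT LOW_PRIORITY INTO csv_rows ...)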
Regardless, if you are posting the data asynchronously from your phone (i.e., not from the UI thread), it shouldn't be an issue as long as you don't try to use more than the maximum number of concurrent HTTP connections.

what will happen when load data fails

I am doing LOAD DATA from a CSV file into my master database.
The CSV file is 160 GB.
My questions:
Will LOAD DATA run on the slave in parallel or not?
If the disk gets full on the master, will the entire process roll back or only partially fail?
Please help.
In my personal experience, when you use the import function it fires a query for each line of your CSV... that means if there is a problem on any of the lines, or the disk gets full, it will not be rolled back... so it fails partially.
I also have doubts about whether it will let a CSV of that size run at all...

LOAD DATA not available; fgetcsv times out; alternatives?

I have a site where a CSV of racehorse data is to be uploaded once a week. The CSV contains the details of about 19,000 racehorses currently registered in the UK and is about 1.3 MB in size, on average. I have a script that processes that CSV and either updates a horse if it exists and its ratings data has changed, or adds it if it doesn't exist. If a horse is unchanged, it skips to the next one. The script works; it ran fine on the host I use for testing. It took 5 or 6 minutes to run (less than ideal, I know), but it worked.
However, we're now testing on the staging version of the client's host, and it runs for 15 minutes and then returns a 504 timeout. We've tweaked htaccess and php.ini settings as much as we're able... no joy.
The host is in a shared environment, so they tell me that MySQL's LOAD DATA is unavailable to us.
What alternative approaches would you try? Or is there a way of splitting the CSV into chunks and running a process on each one in turn, for example?
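If LOAD DATA really is off limits, one shape the chunked approach could take is to read a few hundred CSV rows at a time and send each chunk as a single multi-row upsert, so each request (or cron run) stays well under the timeout. A minimal sketch, assuming an invented horses table with a unique key on horse_id:

-- one statement per chunk of a few hundred CSV rows:
INSERT INTO horses (horse_id, name, rating)
VALUES
  (101, 'Example Horse A', 82),
  (102, 'Example Horse B', 77)
ON DUPLICATE KEY UPDATE
  name = VALUES(name),
  rating = VALUES(rating);

-- existing horses are updated in place, new ones are inserted, and unchanged
-- rows simply overwrite themselves with the same values.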