I am trying to load data from Excel files into a table in MySQL. There are 400 Excel files in .xlsx format.
I have successfully ingested one file into the table, but the problem is that this involves manually converting the Excel file into a CSV file, saving it to a location, and then running a query to load it using LOAD DATA LOCAL INFILE. How do I do this for the rest of the files?
How can I load all 400 .xlsx files in a folder without converting them manually to .csv files and running the ingestion query on them one by one? Is there a way in MySQL to do that — for example, a FOR loop that goes through all the files and ingests them into the table?
Try bulk converting your XLSXs into CSVs using in2csv as found in csvkit.
## single file
in2csv file.xlsx > file.csv
## multiple files
for file in *.xlsx; do in2csv "$file" > "${file%.xlsx}.csv"; done
Then import the data into MySQL using LOAD DATA LOCAL INFILE...
For loading multiple CSVs, use for file in *.csv; do ... (a sketch follows below) or see How to Import Multiple csv files into a MySQL Database.
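As a concrete sketch — the database mydb, table mytable, and user myuser are placeholders; local_infile must be enabled on both client and server, and credentials in ~/.my.cnf avoid a password prompt per file:
## load every converted CSV into the same table
for file in *.csv; do
  mysql --local-infile=1 -u myuser mydb \
    -e "LOAD DATA LOCAL INFILE '$file' INTO TABLE mytable
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        IGNORE 1 LINES;"
done
The IGNORE 1 LINES clause skips the header row that in2csv writes to each output file.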
Related
I am trying to load a compressed file which contains multiple CSV files into Redshift. I followed the AWS documentation Loading Compressed Data Files from Amazon S3. However, I am not sure if I will be able to do the following:
I have multiple CSV files for a table:
table1_part1.csv
table1_part2.csv
table1_part3.csv
I compressed these three files into one table1.csv.gz.
Can I load this gzip file into Redshift table using COPY command?
No, you cannot; but with the COPY command you can give a folder name (a key prefix matching all the gzip files) instead of a single file. So don't zip them into one file — independent .gz files will work fine.
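For example, a COPY using a key prefix that matches all three part files (the bucket name and IAM role are placeholders, and the format options depend on your files):
copy table1
from 's3://mybucket/table1_part'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
gzip
csv;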
You could also achieve this by creating a manifest file that lists all your CSV files, and specifying the manifest file in your COPY command like:
copy customer
from 's3://mybucket/cust.manifest'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
manifest;
For more details, refer to the Amazon Redshift documentation, section "Using a Manifest to Specify Data Files"; a manifest example appears at the end of that section.
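For the three part files from the question, a manifest would look something like this (the bucket name is carried over from the COPY example above; use the .gz names and add the GZIP option to COPY if the files are individually compressed):
{
  "entries": [
    {"url": "s3://mybucket/table1_part1.csv", "mandatory": true},
    {"url": "s3://mybucket/table1_part2.csv", "mandatory": true},
    {"url": "s3://mybucket/table1_part3.csv", "mandatory": true}
  ]
}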
I am trying to load data from files into MySQL, but the files don't have an extension. How can I load the files without specifying a file extension?
Use LOAD DATA INFILE; it doesn't require any file extension. You just name the file.
Both of the following would work fine:
LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
LOAD DATA INFILE 'data' INTO TABLE db2.my_table;
The command-line equivalent of LOAD DATA INFILE, mysqlimport, says this in its doc:
For each text file named on the command line, mysqlimport strips any extension from the file name and uses the result to determine the name of the table into which to import the file's contents. For example, files named patient.txt, patient.text, and patient all would be imported into a table named patient.
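For example, a minimal sketch — myuser and the path are placeholders, and --local reads the file from the client host:
## imports /path/to/patient.txt into table `patient` of database db2
mysqlimport --local -u myuser -p db2 /path/to/patient.txt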
I believe we can't pass a directory to LOAD DATA INFILE without specifying the complete file name. I merged all the files into a single file and then ingested the data.
I have a folder with 620 files. I need to load them all into Neo4j with one load command. Is this possible?
You can concatenate all the files in that folder into a single file (e.g., cat * > all_data.csv, taking care not to duplicate header rows) and load that single file.
The import-tool supports reading from multiple CSV source files, even using a regex pattern to select them. See http://neo4j.com/docs/operations-manual/current/#import-tool
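If you go the concatenation route, a minimal LOAD CSV sketch — the :Record label and the name column are assumptions, and the file must sit in Neo4j's import directory:
LOAD CSV WITH HEADERS FROM 'file:///all_data.csv' AS row
CREATE (:Record {name: row.name});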
I have a few large zip files on S3. Each of these zip files contains several gz files, which contain data in JSON format. I need to (i) copy the gz files to HDFS and (ii) process the files, preferably with Apache Spark/Impala/Hive. What is the easiest/best way of going about it?
1) Try distcp for copying the files from S3 to HDFS.
2) For processing, use "org.apache.spark.sql.hive.HiveContext"'s read.json to read the JSON data from HDFS and create a DataFrame, then run any operations on it (see the sketch below).
Follow this link: http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes
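For step 1, something like hadoop distcp s3a://mybucket/data hdfs:///staging/data (bucket and paths are placeholders) copies the files across; note that Spark does not read .zip archives natively, so the outer zips need to be extracted first so that only the .gz files remain. For step 2, a minimal Spark 1.x sketch, with the HDFS path and table name as assumptions:
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc is the existing SparkContext
// Spark decompresses .gz files transparently while reading
val df = hiveContext.read.json("hdfs:///staging/data/*.gz")
df.printSchema()
df.registerTempTable("events") // "events" is a placeholder table name
hiveContext.sql("SELECT COUNT(*) FROM events").show()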
I have a CSV file which is not loading data. But it loads when I:
1) open the file and save it in the same location, or
2) delete any of the empty or non-empty records.
I am receiving the file from an FTP source. Is there any reason why the SSIS package is not loading the data without doing either of the above steps?