I am really quite the amateur. I am trying to automate the import of CSV data into a table that resides in Hadoop. The CSV file would reside on a server. From my googling, it seems that I would have to write a shell script to upload the CSV file into HDFS and then write a Hive script to import the CSV into the table. All the scripts could then be combined into an Oozie workflow to automate this. Is this right? Is there a better way? Could someone point me in the right direction?
To put a file into HDFS:
hadoop fs -put /here/the/local/file.csv /here/the/destination/in/HDFS
To create a Hive table based on a CSV:
CREATE TABLE yourTable(Field1 INT, Field2 String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY 'yourSeparator';
And once you have created your table:
LOAD DATA INPATH 'HDFS/Path/To/YourFile.csv' INTO TABLE yourTable;
And yes, you can do it with an Oozie workflow, or in Java for example ...
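To tie the two steps together, a minimal shell sketch like the following could be scheduled or wrapped in an Oozie shell action (the paths, staging directory and database details are placeholders, not from the original question):
#!/bin/bash
# Sketch: upload a local CSV to HDFS, then load it into an existing Hive table.
# All paths below are placeholders - adjust to your environment.
hadoop fs -put -f /local/path/file.csv /user/hive/staging/file.csv
hive -e "LOAD DATA INPATH '/user/hive/staging/file.csv' INTO TABLE yourTable;"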
The way I have been doing it is with an SQL file and a cron job. The SQL consists of loading the data into a table, then doing some other operations on it as needed.
The file consists of the same SQL you would enter in the Hive CLI. You run it from the command line (or as a cron job) with 'hive -f <your-file.sql>'.
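For example, a crontab entry along these lines would run such a script nightly (the script path and schedule are placeholders):
# Run the Hive script every night at 02:00; log output for troubleshooting
0 2 * * * hive -f /home/user/load_data.sql >> /home/user/load_data.log 2>&1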
Hope that helps.
The Sakila database comes with a schema.sql file, a data.sql file and a sakila.mwb file. To load the Sakila dataset in Workbench I first load the schema, then the data, and then open the .mwb file.
Is this the only way you can query in Workbench? Do you always have to import a schema, import data and then open a .mwb file?
For example, I want to be able to go on data.gov, download an XLS, HTML or CSV file (not that I really know how to work with all of these, but ultimately I'd like to) and create a database out of them. Do I have to explore their format on my own, create my own schema/model (do these mean the same thing?), and then somehow tell Workbench how to fill in each table in my schema with entries from the CSV/XLS/HTML file?
You can do something like:
1- Create table
CREATE TABLE test(id INT, value1 VARCHAR(255), value2 DECIMAL(8,4));
2- Maybe you have a text or other file that has a format like:
1 tareq 1.2
2 sarah 2.3
3 Hany 4.5
3- Use the LOAD DATA command to import that file into the MySQL table:
LOAD DATA LOCAL INFILE 'E:/data.txt'
INTO TABLE test COLUMNS TERMINATED BY '\t';
Note that here, the values in the data file should be delimited by a TAB, with a new line after each row.
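To double-check the load afterwards, a quick query against the example table above:
SELECT * FROM test;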
I have multiple CSV files with details of people. I copy these into HDFS using the -copyFromLocal command and view them through a Hive table. But now my new use case is that these CSV files on my local machine get updated daily, and I want the data in HDFS to be updated as well, just like the way Sqoop incremental import copies data from an RDBMS to HDFS. Is there any way to do this? Please suggest how.
Assuming every file contains the same fields.
Create a single top-level HDFS directory, with a date partition for every day:
/daily_import
    /day=20180704
        /file.csv
    /day=20180705
        /file.csv
Then define a table over it
CREATE EXTERNAL TABLE daily_csv (
...
) PARTITIONED BY (`day` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' STORED AS TEXTFILE -- Use CsvSerde instead!
LOCATION '/daily_import'
;
Then, every day after you copy files into the appropriate HDFS location, run a metastore refresh to pick up the new partitions:
MSCK REPAIR TABLE daily_csv;
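A minimal shell sketch of the daily job might look like this (the local source path is a placeholder; the HDFS layout and table name are taken from the example above):
#!/bin/bash
# Sketch: copy today's CSV into a new date partition, then refresh the metastore.
DAY=$(date +%Y%m%d)                              # e.g. 20180705
hadoop fs -mkdir -p /daily_import/day=$DAY
hadoop fs -put -f /local/path/file.csv /daily_import/day=$DAY/
hive -e "MSCK REPAIR TABLE daily_csv;"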
I have a CSV file with 1 crore records, but it does not have a rowkey or auto-increment ID, and I want to upload that file into an HBase table. But I cannot do it with the hbase command. I have used the following command for it:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=cf:mobile test1 hdfs://master.ambari.com:8020/user/root/www/dnd_20170705_0.csv
but I am getting the following error:
ERROR: Must specify exactly one column as HBASE_ROW_KEY
Do you have any idea about this, or any other way I can push the data into the HBase table?
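For reference, ImportTsv requires exactly one of the mapped columns to be HBASE_ROW_KEY, which is what the error is complaining about. A sketch of what the columns option normally looks like, assuming the file had a key field as its first column (which is not the case here, so a key would first have to be generated or a different load path used):
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf:mobile test1 hdfs://master.ambari.com:8020/user/root/www/dnd_20170705_0.csv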
Is there any way that I can upload an xlsx-file to my MySQL database automatically every 12 hours?
I have an xlsx-file with around 600 rows. The target table already exists.
I would like to perform the following steps:
1. Delete the content of the existing table.
2. Insert the data from the xlsx-file.
This should be performed every 12 hours. Is there a way of doing this without using PHP?
Thanks in advance.
Yes. You can use LOAD DATA LOCAL INFILE, provided that the file is in CSV format; otherwise, convert the file to CSV first.
Delete the content of the existing table.
Before you do so, take a backup of the table. You can create an intermediary backup table and insert the data there.
Insert the data from the xlsx-file.
Use LOAD DATA INFILE and import the data.
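A minimal sketch of such a script (table and file names are placeholders; the converted CSV is assumed to have a header row):
-- back up the current contents, then reload the table from the converted CSV
CREATE TABLE IF NOT EXISTS my_table_backup LIKE my_table;
TRUNCATE TABLE my_table_backup;
INSERT INTO my_table_backup SELECT * FROM my_table;
TRUNCATE TABLE my_table;
LOAD DATA LOCAL INFILE 'C:/exports/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
IGNORE 1 LINES;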
This should be performed every 12 hours.
You can create an SQL script with all these steps, and create a scheduled task (Windows) which runs every 12 hours.
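For instance (file paths, credentials and the task name are placeholders; LOAD DATA LOCAL INFILE also needs the local_infile option enabled on the client and server):
rem import.bat - runs the SQL script against MySQL
mysql --local-infile=1 -u myuser -pmypassword mydb < C:\scripts\import.sql

rem register a task that runs the batch file every 12 hours
schtasks /Create /SC HOURLY /MO 12 /TN "CsvImport" /TR "C:\scripts\import.bat"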
You can do it using the Data Import tool in dbForge Studio for MySQL (command-line mode).
How to:
Create a data-import template file: open the Data Import wizard, select the target table, check the Repopulate mode (delete all + insert), and save the template file.
Use the created template to import your file in command-line mode. Use Windows Scheduled Tasks to run it periodically.
I'm a newbie here trying to import some data into my WordPress database (MySQL), and I wonder if any of you SQL experts out there can help?
Database type: MySQL
Table name: wp_loans
I would like to completely replace the data in table wp_loans with the contents of file xyz.csv located on a remote server, for example https://www.mystagingserver.com/xyz.csv
All existing data in the table should be replaced with the contents of the CSV file.
The first row of the CSV file contains the table headings, so it can be ignored.
I'd also like to automate the script to run daily at say 01:00 in the morning if possible.
UPDATE
Here is the SQL I'm using to try and replace the table contents:
LOAD DATA INFILE 'https://www.mystagingserver.com/xyz.csv'
REPLACE
INTO TABLE wp_loans
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
IGNORE 1 LINES
I would recommend a cron job to automate the process, and would probably use BCP (bulk copy) to insert the data into a table... but seeing as you are using MySQL, instead of BCP try LOAD DATA INFILE - https://mariadb.com/kb/en/load-data-infile/
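Note that LOAD DATA INFILE reads from the filesystem, not from a URL, so the CSV has to be downloaded first. A minimal cron-driven sketch (script path, credentials and database name are placeholders):
#!/bin/bash
# Sketch: fetch the remote CSV, then reload wp_loans from it.
# Requires the local_infile option to be enabled; credentials are placeholders.
curl -sSf -o /tmp/xyz.csv https://www.mystagingserver.com/xyz.csv
mysql --local-infile=1 -u wpuser -pwppassword wordpress -e "
  TRUNCATE TABLE wp_loans;
  LOAD DATA LOCAL INFILE '/tmp/xyz.csv'
  INTO TABLE wp_loans
  FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
  IGNORE 1 LINES;"

# crontab entry to run the script daily at 01:00
# 0 1 * * * /home/user/refresh_wp_loans.sh >> /var/log/refresh_wp_loans.log 2>&1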