I have created 4 tables (a, b, c, d) in Hive and created a view (x) on top of those tables by joining them.
-- How can I export x's underlying CSV data from HDFS to local?
-- How can I keep this CSV in HDFS?
For tables, we can do show create table a;
This will show the HDFS location where the underlying CSV is stored.
hadoop fs -get source_path_and_file dest_path_and_file
Similarly, how can I get the CSV data from the view onto my local filesystem?
You can export view data to CSV like this:
insert overwrite local directory '/user/home/dir' row format delimited fields terminated by ',' select * from view;
If you need a single file, concatenate the files in the local directory using cat:
cat /user/home/dir/* > view.csv
Alternatively, if the dataset is small, you can add ORDER BY to the query; this forces a single reducer and produces a single ordered file. It will be slow if the dataset is big.
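A sketch of that single-reducer variant; the directory path and the ordering column (some_column) are placeholders:
insert overwrite local directory '/user/home/dir'
row format delimited fields terminated by ','
select * from view order by some_column;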
1) To write your results to a file you can use INSERT OVERWRITE as below:
insert overwrite local directory '/tmp/output'
row format delimited
fields terminated by '|'
select * from <view>;
2) If you want to write the file into HDFS, use the same INSERT OVERWRITE statement but without the LOCAL keyword.
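A sketch of the HDFS variant (the output path is illustrative):
insert overwrite directory '/tmp/output'
row format delimited
fields terminated by '|'
select * from <view>;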
3) There is no separate HDFS location for views.
Views are purely logical constructs over tables; no separate underlying storage is created for them in HDFS.
Views are used when you want to name an intermediate result and query it directly instead of writing the complex query against the tables again and again, much like WITH blocks (CTEs) in a query.
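For illustration only (the column names are made up), a view over the joined tables might look like this; querying it re-runs the joins each time, and nothing is written to HDFS for the view itself:
create view x as
select a.id, b.name, c.amount, d.status
from a
join b on a.id = b.id
join c on a.id = c.id
join d on a.id = d.id;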
Related
I have created an empty table A in MySQL and inserted data from a local csv file, A.csv, using LOAD DATA LOCAL INFILE. Now I want to create more tables just like A and insert data from other different local csv files (they all have the same fields).
So for table B it will be something like:
CREATE TABLE B LIKE A;
LOAD DATA LOCAL INFILE 'mypath/B.csv'
...
I need to repeat this process for about 20 tables. How can I write a loop procedure to automate the process?
Any help on this would be much appreciated!
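One possible approach is a small shell loop that repeats the CREATE TABLE ... LIKE and LOAD DATA statements for each file; the table names, path, database name, and credentials below are placeholders, and the IGNORE clause assumes a header row:
#!/bin/bash
# Hypothetical sketch: one table per CSV file (B.csv, C.csv, ...), all modeled on table A.
# -p prompts for the MySQL password on each iteration; adjust credentials/config as needed.
for name in B C D E; do
  mysql --local-infile=1 -u myuser -p mydb -e "
    CREATE TABLE IF NOT EXISTS ${name} LIKE A;
    LOAD DATA LOCAL INFILE 'mypath/${name}.csv'
    INTO TABLE ${name}
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;"
done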
I have multiple CSV files with details of people. I copy them into HDFS using the -copyFromLocal command and view them through a Hive table. My new use case is that these local CSV files get updated daily, and I want the data in HDFS to be updated as well, just like the way Sqoop incremental import copies data from an RDBMS to HDFS. Is there a way to do this, and how?
Assuming every file contains the same fields.
Create a single top-level HDFS directory and add a date partition for every day:
/daily_import
/day=20180704
/file.csv
/day=20180705
/file.csv
Then define a table over it
CREATE EXTERNAL TABLE daily_csv (
...
) PARTITIONED BY (`day` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' STORED AS TEXTFILE -- Use CsvSerde instead!
LOCATION '/daily_import'
;
Then every day, after you copy the files into the appropriate HDFS location, run a metastore refresh to pick up the new partitions:
MSCK REPAIR TABLE daily_csv;
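The daily step itself might look like this (the date, local path, and file name are illustrative):
hdfs dfs -mkdir -p /daily_import/day=20180706
hdfs dfs -put /local/path/people_20180706.csv /daily_import/day=20180706/
hive -e 'MSCK REPAIR TABLE daily_csv;'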
I have two CSV files that I uploaded to Azure Blob Storage within HDInsight. I can upload these two files to the cluster without problems. I then create two Hive tables with...
CREATE EXTERNAL TABLE IF NOT EXISTS hive_table1(id int, age string, date string...)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' STORED AS TEXTFILE LOCATION '/user/hive/warehouse'
Similar syntax goes for the other table.
Now I want to load the first CSV file into the first table and the second CSV file into the second table (the two files have non-corresponding columns).
I use...
LOAD DATA INPATH '/file/file1.csv' OVERWRITE INTO TABLE hive_table1;
...and am able to load the CSV data into the first table. But not only is the first data set loaded into the first Hive table, the exact same file's data also shows up in the second Hive table.
Obviously, I only want to have the first data set loaded into one table and the second distinct data set only into the other table.
Can anyone point out the error or suggest a possible solution?
Thanks in advance.
It looks like you just need to specify a different 'LOCATION' for the second table. When you do the 'LOAD DATA', Hive is actually copying data into that path. If both tables have the same 'LOCATION', they will share the same data.
Your location is what is creating the problem. You have given the same location for both tables. Because the tables are external, the files live directly under that path.
Also, LOAD DATA INPATH '/file/file1.csv' OVERWRITE INTO TABLE hive_table1; overwrites any files already at the table's location. This is what is happening with your tables. As Farooque mentioned, the location should be unique for different tables to get the desired results.
I see you are creating external tables, two tables each backed by a single file.
You just have to follow the simple steps below:
Create table
CREATE EXTERNAL TABLE IF NOT EXISTS hive_table1(id int, age string, date string...)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE LOCATION '/user/hive/warehouse/table1_dir/'
Copy file to HDFS location
hdfs dfs -put '/file/file1.csv' '/user/hive/warehouse/table1_dir/'
Similarly, for the second table
Create table
CREATE EXTERNAL TABLE IF NOT EXISTS hive_table2(id int, age string, date string...)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE LOCATION '/user/hive/warehouse/table2_dir/'
Copy file to HDFS location
hdfs dfs -put '/file/file2.csv' '/user/hive/warehouse/table2_dir/'
Note: If you are using more than one table, then their location should be unique.
We want our users to be able to upload CSV files into our software and have the files be put into a temporary table so they can be analyzed and populated with additional data (upload process id, user name, etc).
Currently our application allows users to upload files into the system, but the files end up as TEXT values in a MySQL table (technically BLOB, but for our purposes I will call it TEXT, as the only type of file I am concerned with is CSV).
After a user uploads the CSV and it becomes a TEXT value, I want to take the TEXT value and interpret it as a CSV import, with multiple columns, to populate another table within MySQL without using a file output.
A simple insert-select into won't work as the TEXT is parsed as one big chunk (as it should be) instead of multiple columns.
insert into db2.my_table (select VALUE from db1.FILE_ATTACHMENT where id = 123456)
Most examples I have found export data from the DB as a file, then import it back in, i.e. something like:
SELECT VALUE INTO OUTFILE '/tmp/test.csv'
followed by something like:
LOAD DATA INFILE '/tmp/test.csv' INTO TABLE db2.my_table;
But I would like to do the entire process within MySQL if possible, without using the above "SELECT INTO OUTFILE/LOAD DATA INFILE" method.
Is there a way to have MySQL treat the TEXT value as multiple columns instead of one big block? Or am I stuck exporting to a file and then re-importing?
Thanks!
There is a flaw in your data load approach.
Instead of keeping each row in a single column, keep each value in its respective column.
For example, suppose the CSV file contains n columns; create a table with n columns.
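A hypothetical three-column table, just for illustration (column names and types are made up):
CREATE TABLE table_name (
  id   INT,
  name VARCHAR(100),
  age  INT
);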
LOAD DATA INFILE '/tmp/test.csv'
INTO TABLE table_name
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; -- skip the header row if present in the CSV file
I have tables that are on different MySQL instances. I want to export some data as CSV from one MySQL instance and perform a left join between a table and the exported CSV data. How can I achieve this?
Quite surprisingly, that is possible with MySQL; there are several steps that you need to go through.
1) First create a template table using the CSV engine and the desired table layout. This is the table into which you will import your CSV file. Use CREATE TABLE yourcsvtable (field1 INT NOT NULL, field2 INT NOT NULL) ENGINE=CSV for example. Please note that NULL values are not supported by the CSV engine.
2) Perform your SELECT to extract the CSV file, e.g. SELECT * FROM anothertable INTO OUTFILE 'temp.csv' FIELDS TERMINATED BY ',';
3) Copy temp.csv into your target MySQL data directory as yourcsvtable.CSV. The location and exact name of this file depend on your MySQL setup. You cannot perform the SELECT in step 2 directly into this file as it is already open - you need to handle this in your script.
4) Use FLUSH TABLE yourcsvtable; to reload/import the CSV table.
5) Now you can execute your query against the CSV file as expected.
Depending on your data, you may need to ensure that values are correctly enclosed by quotation marks or escaped - this needs to be taken into account in step 2.
The CSV file can be created by MySQL on another server or by some other application, as long as it is well-formed.
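Once the CSV-engine table is populated, the join is an ordinary query; yourcsvtable, field1, and field2 are from the steps above, while localtable and its columns are placeholders:
SELECT t.id, t.name, c.field2
FROM localtable t
LEFT JOIN yourcsvtable c ON c.field1 = t.id;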
If you export it as CSV, it's no longer SQL, it's just plain row data. Suggest you export as SQL, and import into the second database.