I have a CSV file with contents like this:
"DepartmentID","Name","GroupName","ModifiedDate"
"1","Engineering","Research and Development","2008-04-30 00:00:00"
I have created the table with:
create external table if not exists AdventureWorks2014.Department
(
DepartmentID smallint ,
Name string ,
GroupName string,
rate_code string,
ModifiedDate timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' lines terminated by '\n'
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department' TBLPROPERTIES('skip.header.line.count'='1');
And after loading the data:
LOAD DATA INPATH 'wasb:///ds/Department.csv' INTO TABLE AdventureWorks2014.Department;
The data is not loaded.
select * from AdventureWorks2014.Department;
The above select returns nothing.
I think the double quotes around each field are the issue. Is there a way to load the data from such a file into Hive tables without having to strip out the double quotes?
Try this:
create external table if not exists AdventureWorks2014.Department
(
DepartmentID smallint,
Name string,
GroupName string,
rate_code string,
ModifiedDate timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department';
**Limitation**
This SerDe treats all columns to be of type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output would show string column type. The type information is retrieved from the SerDe. To convert columns to the desired type in a table, you can create a view over the table that does the CAST to the desired type.
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
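For example, a minimal sketch of that view-based workaround, with Department_typed as an assumed, illustrative view name:

-- Hypothetical view over the string-typed table; casts to the intended types on read
create view AdventureWorks2014.Department_typed as
select cast(DepartmentID as smallint) as DepartmentID,
       Name,
       GroupName,
       rate_code,
       cast(ModifiedDate as timestamp) as ModifiedDate
from AdventureWorks2014.Department;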
FIELDS TERMINATED BY '","' is incorrect. Your fields are terminated by a comma (,), not by ",". Change your DDL to FIELDS TERMINATED BY ','.
LOAD DATA LOCAL INPATH '/home/hadoop/hive/log_2013805_16210.log' INTO TABLE table_name;
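Putting that correction back into the question's DDL (otherwise unchanged; note the double quotes will still remain part of each value unless you also switch to the OpenCSVSerde shown above):

create external table if not exists AdventureWorks2014.Department
(
DepartmentID smallint,
Name string,
GroupName string,
rate_code string,
ModifiedDate timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' lines terminated by '\n'
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department' TBLPROPERTIES('skip.header.line.count'='1');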
Related
I have my data in CSV format in the following form:
Id -> tinyint
Name -> String
Id Name
1 Alex
2 Sam
When I export the CSV file to S3 and create an Athena table, the data is transformed into the following format:
Id Name
1 "Alex"
2 "Sam"
How do I get rid of the double quotes while creating the table?
Any help is appreciated.
By default, if no SerDe is specified, Athena uses LazySimpleSerDe, which does not support quoted values and reads the quotes as part of the value. If your CSV file contains quoted values, use OpenCSVSerde (specify the correct separatorChar if it is not a comma):
CREATE EXTERNAL TABLE mytable(
id tinyint,
Name string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
LOCATION 's3://my-bucket/mytable/'
;
Read the manual: https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html
See also this answer about data types in OpenCSVSerDe
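As a sketch of that follow-up, the usual workaround is a view that casts at query time (mytable_typed is an assumed, illustrative view name):

-- OpenCSVSerDe exposes every column as string, so cast on read
CREATE VIEW mytable_typed AS
SELECT CAST(id AS tinyint) AS id, Name
FROM mytable;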
I am using the LOAD DATA syntax to load a CSV file into a table. The file is in the same format Hive accepts. But still, after the LOAD DATA statement is issued, the last two columns return NULL on select.
1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'
create table dattable(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE DATE,
END_DATE DATE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;
The select returns NULL values for the last two columns.
Another question: what if the date format is different from YYYY-MM-DD? Is it possible to make Hive recognize that format? (Right now I am modifying the CSV file into a format Hive accepts.)
Answer to your 2nd question:
You will need an additional temporary table to read your input file, and then you can do the date conversions in your insert-select statement. In your temporary table, store the date fields as string. For example:
create table dattable_ext(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE String,
END_DATE String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Load data into the temporary table:
LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable_ext;
Insert from the temporary table into the managed table:
insert into table dattable select DATANUM, ENTRYNUM,
from_unixtime(unix_timestamp(START_DATE,'yyyy/MM/dd'),'yyyy-MM-dd'),
from_unixtime(unix_timestamp(END_DATE,'yyyy/MM/dd'),'yyyy-MM-dd') from dattable_ext;
You can replace the date format in the unix_timestamp function with your input date format.
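For the sample rows above, whose dates are already in yyyy-MM-dd but wrapped in single quotes, a hedged variant is to strip the quotes and cast directly (the regexp_replace step is an assumption about how the raw values land in the string columns):

-- Sketch: remove the literal single quotes, then cast the yyyy-MM-dd string to date
insert into table dattable
select DATANUM,
       ENTRYNUM,
       cast(regexp_replace(START_DATE, "'", '') as date),
       cast(regexp_replace(END_DATE, "'", '') as date)
from dattable_ext;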
LazySimpleSerDe (the default) does not work with quoted CSV. Use OpenCSVSerde:
create table dattable(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE DATE,
END_DATE DATE )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "'"
)
STORED AS TEXTFILE;
Also read this: CSVSerDe treats all columns to be of type String.
Define your date columns as string and apply the conversion in the select.
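A minimal sketch of that, assuming START_DATE and END_DATE are declared as string in the DDL above (OpenCSVSerde strips the single quotes, so a plain cast works for yyyy-MM-dd values):

-- Sketch: convert the string columns to dates at query time
select DATANUM,
       ENTRYNUM,
       cast(START_DATE as date) as START_DATE,
       cast(END_DATE as date) as END_DATE
from dattable;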
I am using a LOAD command to insert all the data from a CSV file into a MySQL table. The load query sample is:
LOAD DATA LOCAL INFILE 'C:\\path\\to\\windows\\file.CSV'
INTO TABLE table_name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(field1, field2, field3, fieldx);
The data in the file has the following format:
FName || LName || num1 || num2 || num3 || num4 || num5 || date
Here all the num fields are of the Float data type.
The date format in the CSV file is dd-MM-yyyy.
So when loading the complete file into the DB, I store the dates as VARCHAR, because when I store them in a DATE column I get 0000-00-00.
Now, after inserting the data, I have to work with the dates, but I cannot get them sorted correctly because they are stored as VARCHAR.
Is there any way I can specify the default date format at the time of table creation? For example:
create table test (
mydates date(date : dd-mm-yyyy));
Something like this. Or could anyone suggest a different approach to tackle this issue?
Use STR_TO_DATE to convert the string into a datetime value, and use SET to set the column's value manually. Let's say fieldx is your date field:
LOAD DATA LOCAL INFILE 'C:\\path\\to\\windows\\file.CSV'
INTO TABLE table_name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(field1, field2, field3, #fieldx)
SET fieldx = str_to_date(#fieldx, "%d-%m-%Y");
Have a look at the manual page for LOAD DATA for more information, and adjust the format string using the date format specifier table.
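For example, STR_TO_DATE('25-12-2013', '%d-%m-%Y') returns the DATE 2013-12-25, which sorts chronologically (the sample value is illustrative).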
I suggest using the STR_TO_DATE ("string to date") function; with the question's dd-MM-yyyy format that would be:
STR_TO_DATE(table.datestring, '%d-%m-%Y')
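For instance, a hedged usage sketch against the question's table (the table and column names are the question's placeholders):

-- Sort rows on the VARCHAR date column by parsing it on the fly
SELECT *
FROM table_name
ORDER BY STR_TO_DATE(fieldx, '%d-%m-%Y');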
I use the script below to insert data into MySQL from a text file.
#!/bin/bash
mysql -utest -ptest test << EOF
LOAD DATA INFILE 'test.txt'
INTO TABLE content_delivery_process
FIELDS TERMINATED BY ',';
EOF
In my test file I have a format like:
cast , date , name , buy
I can insert the data, but I need a format like the one below:
S.NO | date | name | buy | cast
You can specify the columns you want to import:
From the MySQL manual on LOAD DATA INFILE:
The following example loads all columns of the persondata table:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;
By default, when no column list is provided at the end of the LOAD DATA INFILE statement, input lines are expected to contain a field for each table column.
If you want to load only some of a table's columns, specify a column list:
LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);
You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.
You would include "FIELDS TERMINATED BY '|';" at the end to import data delimited with a '|' symbol.
Hope this helps.
create table [YOUR TABLE] (
  `S.NO` INT AUTO_INCREMENT PRIMARY KEY,
  date DATETIME,
  name VARCHAR(50),
  buy VARCHAR(50),
  `cast` VARCHAR(50)
);
Load data local infile 'test.txt' ignore into table [YOUR TABLE] fields terminated by ',' lines terminated by '\n' (`cast`, date, name, buy);
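A quick verification sketch (an assumed, illustrative query; the AUTO_INCREMENT key fills S.NO, and the column list above reorders the input fields):

-- Check the loaded rows in the desired S.NO | date | name | buy | cast order
SELECT `S.NO`, date, name, buy, `cast`
FROM [YOUR TABLE]
ORDER BY `S.NO`;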
I am interested in importing a CSV file into a Hive table. The first field of the Hive table (ts) is of type BIGINT. After performing the following query, the Hive ams_csv table is successfully created, but the ts values are NULL.
CREATE EXTERNAL TABLE ams_csv (ts BIGINT, id STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/csvFilesDirectory';
I performed the same query but with the following modification, and it worked:
CREATE EXTERNAL TABLE ams_csv (ts STRING, id STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/csvFilesDirectory';
I do not want ts to be of type String. Does anyone know how to perform the cast? I thought it was implicit.
Many thanks!
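One hedged sketch, in line with the view-over-string-columns approach quoted earlier (ams_csv_typed is an assumed, illustrative name): keep ts as STRING in the external table and cast in a view:

-- Hypothetical: cast the string column to BIGINT on read
CREATE VIEW ams_csv_typed AS
SELECT CAST(ts AS BIGINT) AS ts, id
FROM ams_csv;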