I am trying to create a table in MySQL and load data into it from a txt file.
The date format in the txt file is dd/mm/yyyy, with some dates written like 12/12/1988 and some like 1/2/1988.
I am really confused about how to define the date column when creating the table.
Please help, I am a beginner with MySQL.
MySQL stores dates in the format 'YYYY-MM-DD', e.g. '2014-01-28'.
You can load the date strings into user-defined variables, and then use a function to convert them to MySQL dates.
Example
I have a table like the one below:
CREATE TABLE TestTable
(
patentId INT,
USPatentNum INT,
title CHAR(10),
grantDate DATE,
filedDate DATE
);
Now, to load the string dates into the DATE columns, we can do the following:
load data local infile '/home/abdul/Test.csv'
into table TestTable
fields terminated by ','
enclosed by '"'
ignore 1 lines
( patentId, USPatentNum, title, @grantDate, @filedDate)
set grantDate = STR_TO_DATE(@grantDate, '%m/%d/%Y'),
filedDate = STR_TO_DATE(@filedDate, '%m/%d/%Y');
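For the dd/mm/yyyy strings described in the question, the format string would presumably need to be day-first, i.e. '%d/%m/%Y' rather than '%m/%d/%Y'. The quick check below is only a sketch using the two sample values from the question; it suggests that the same format string handles both the padded and the unpadded form, since STR_TO_DATE accepts one- or two-digit days and months for %d and %m.

```sql
-- Day-first format applied to the question's two sample values.
SELECT STR_TO_DATE('12/12/1988', '%d/%m/%Y') AS padded,    -- 1988-12-12
       STR_TO_DATE('1/2/1988',   '%d/%m/%Y') AS unpadded;  -- 1988-02-01
```

If the file really is day-first, the same '%d/%m/%Y' string would go into both SET expressions of the LOAD DATA statement.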
Related
I am using the load data syntax to load a csv file into a table. The file is in the same format that Hive accepts, but after load data is issued, the last 2 columns return NULL on select.
1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'
create table dattable(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE DATE,
END_DATE DATE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;
Select returns NULL values for the last 2 cols
My other question: what if the date format is different from YYYY-MM-DD? Is it possible to make Hive identify the format? (Right now I am modifying the csv file so that Hive accepts it.)
Answer to your 2nd question:
You will need an additional temporary table to read your input file, and then you can do the date conversions in your insert-select statement. In the temporary table, store the date fields as strings. Ex.
create table dattable_ext(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE String,
END_DATE String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Load data into temporary table
LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable_ext;
Insert from temporary table to the managed table.
insert into table dattable select DATANUM, ENTRYNUM,
from_unixtime(unix_timestamp(START_DATE,'yyyy/MM/dd'),'yyyy-MM-dd'),
from_unixtime(unix_timestamp(END_DATE,'yyyy/MM/dd'),'yyyy-MM-dd') from dattable_ext;
You can replace date format in unix_timestamp function with your input date format.
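As an illustration of swapping the pattern, the one-line check below assumes a day-first input such as '11/03/2013' (a made-up value, not taken from the asker's file). The patterns follow Java's SimpleDateFormat conventions, so MM is the month and mm would be minutes.

```sql
-- Parse a dd/MM/yyyy string and re-emit it in the yyyy-MM-dd form Hive expects.
SELECT from_unixtime(unix_timestamp('11/03/2013', 'dd/MM/yyyy'), 'yyyy-MM-dd');
-- 2013-03-11
```

Testing the pattern on a literal like this before running the full insert makes it easier to spot a pattern mismatch, which otherwise shows up only as NULLs in the target table.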
LazySimpleSerDe (the default) does not work with quoted CSV. Use CSVSerDe:
create table dattable(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE DATE,
END_DATE DATE )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "'"
)
STORED AS TEXTFILE;
Also read this: CSVSerDe treats all columns as type String.
Define your date columns as string and apply the conversion in the select.
I have a csv file which has contents like this.
"DepartmentID","Name","GroupName","ModifiedDate"
"1","Engineering","Research and Development","2008-04-30 00:00:00"
I have
create external table if not exists AdventureWorks2014.Department
(
DepartmentID smallint ,
Name string ,
GroupName string,
rate_code string,
ModifiedDate timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' lines terminated by '\n'
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department' TBLPROPERTIES('skip.header.line.count'='1');
And after loading the data
LOAD DATA INPATH 'wasb:///ds/Department.csv' INTO TABLE AdventureWorks2014.Department;
The data is not loaded.
select * from AdventureWorks2014.Department;
The above select returns nothing.
I think the double quotes around each field are the issue. Is there a way to load the data from such a file into Hive tables without having to strip out the double quotes?
Try this (cellphone...)
create external table if not exists AdventureWorks2014.Department ( DepartmentID smallint , Name string , GroupName string, rate_code string, ModifiedDate timestamp )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department'
** Limitation **
This SerDe treats all columns to be of type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output would show string column type. The type information is retrieved from the SerDe. To convert columns to the desired type in a table, you can create a view over the table that does the CAST to the desired type.
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
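A minimal sketch of such a view follows. It assumes the table's columns line up with the four columns in the CSV header (DepartmentID, Name, GroupName, ModifiedDate), and the view name Department_typed is hypothetical.

```sql
-- OpenCSVSerde exposes every column as string, so the typed columns
-- are produced by casting in a view over the raw table.
CREATE VIEW IF NOT EXISTS AdventureWorks2014.Department_typed AS
SELECT CAST(DepartmentID AS SMALLINT)  AS DepartmentID,
       Name,
       GroupName,
       CAST(ModifiedDate AS TIMESTAMP) AS ModifiedDate
FROM AdventureWorks2014.Department;
```

Queries would then read from the view, while the external table keeps pointing at the quoted CSV files unchanged.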
FIELDS TERMINATED BY '","' is incorrect. Your fields are terminated by a , not ",". Change your DDL to FIELDS TERMINATED BY ','.
LOAD DATA LOCAL INPATH '/home/hadoop/hive/log_2013805_16210.log' INTO TABLE table_name;
Here is my code:
CREATE TABLE A
(`ID` INT NULL,
`DATE` DATE NULL,
`NUM` INT NULL
);
LOAD DATA LOCAL INFILE "fakepath/file.csv"
INTO TABLE A
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ID,DATE,NUM)
SET
DATE = str_to_date(@DATE, '%Y%m%d');
The original data in the csv file looks like this: 20160101, 20160102, 20160103 (the dates are all different). After I execute the code, every row in the DATE column of table A ends up with the same value, such as 2016-01-02.
Why does this happen? I have another table that uses the same code (with different column names).
How can I fix it? Thank you!
You have to tell MySQL to load the from-csv data value into a variable first:
LOAD DATA LOCAL INFILE "fakepath/file.csv"
[..snip..]
IGNORE 1 LINES
(ID,DATE,NUM)
^---table field, **NOT** a variable
SET
DATE = str_to_date(@DATE, '%Y%m%d');
^---variable never gets populated
Try
(ID, @DATE, NUM)
^--note this
instead. That'll load the id/num values directly into the table, but put your date value into the variable, which you can then use in the SET portion of the query.
The fact that you actually get a properly formatted date value in the table indicates that somewhere else, in previous code, you did set a @DATE variable, and it's simply being re-used in this query. But since you don't CHANGE that variable's value in this query, you end up using the SAME date value for all records.
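Put together, the corrected statement might look like the sketch below. It reuses the asker's table and placeholder path as they appear in the question, and writes the user variable with MySQL's @ prefix.

```sql
LOAD DATA LOCAL INFILE "fakepath/file.csv"
INTO TABLE A
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ID, @DATE, NUM)                             -- second field is read into a user variable
SET `DATE` = STR_TO_DATE(@DATE, '%Y%m%d');   -- converted per row into the DATE column
```

Because @DATE is now reassigned for every input row, each record gets its own date instead of whatever value the session variable happened to hold.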
I am trying to import data from a csv file into MySQL using LOAD DATA LOCAL INFILE. All columns are imported correctly except the date columns, which have timestamp as their datatype. I am getting error 1265 (data truncated) for the date columns, and 0000-00-00 00:00:00 is inserted for all values. This has been asked before, but I did not find a perfect solution. I have also tried various solutions posted for this type of question, but none have worked for me.
table create statement :
CREATE TABLE MySchema.response
(
`id` int,
`version` int,
`name` varchar(500),
`date_created` timestamp,
`last_updated` timestamp,
`count` int
);
loading data into table:
LOAD DATA LOCAL INFILE 'C:/response.csv'
INTO TABLE MySchema.response
FIELDS TERMINATED BY ',' optionally ENCLOSED by '"'
ignore 1 lines
Sample Data in CSV file
id version name date_created last_updated count
1, 0, xyz, 5/3/2013 1:18, 5/3/2013 1:18, 2
2, 0, abc, 5/3/2013 1:18, 5/3/2013 1:18, 1
Date columns in your sample are not in MySQL's default format and thus are not identified as dates. You need to try something like the following, in order to state how the dates should be interpreted:
LOAD DATA LOCAL INFILE 'C:/response.csv'
INTO TABLE MySchema.response
FIELDS TERMINATED BY ',' optionally ENCLOSED by '"'
IGNORE 1 lines
(id, version, name, @date_created, @last_updated, count)
SET date_created = STR_TO_DATE(@date_created, '%d/%c/%Y %k:%i'),
last_updated = STR_TO_DATE(@last_updated, '%d/%c/%Y %k:%i');
Check MySQL date format documentation for specifiers that suit your case (probably the ones I added in the sample above).
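One way to check the specifiers before running the load is to try them on a single sample value. The sketch below uses the value from the question; whether 5/3/2013 means 5 March or May 3 is not stated there, so both readings are shown and the right one is an assumption to confirm against the source data.

```sql
-- Day-first reading (the one used in the statement above).
SELECT STR_TO_DATE('5/3/2013 1:18', '%d/%c/%Y %k:%i');  -- 2013-03-05 01:18:00

-- Month-first reading, in case the file uses US-style dates.
SELECT STR_TO_DATE('5/3/2013 1:18', '%c/%e/%Y %k:%i');  -- 2013-05-03 01:18:00
```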
I am using a Load command to insert all the data in a CSV file to the mysql table. The load query sample is:
LOAD DATA LOCAL INFILE 'C:\\path\\to\\windows\\file.CSV'
INTO TABLE table_name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(field1, field2, field3, fieldx);
The data in the file has the following format:
FName || LName || num1 || num2 || num3|| num4 || num5 || date
Here all nums are of Float data type.
Here the date format of date in csv file is dd-MM-yyyy.
So when loading the complete file in DB I am storing dates as a varchar, because when I store them in a DATE datatype I get 0000-00-00.
Now after inserting data I have to work on dates but I am not able to get the sorted dates as they are stored as a Varchar.
Is there any way I can specify the default date format at the time of table creation? For example:
create table test (
mydates date(date : dd-mm-yyyy));
something like this.
Or could anyone suggest a different approach to tackle this issue?
Use str_to_date to convert the string into a date, and use SET to assign the column's value manually. Let's say fieldx is your date field:
LOAD DATA LOCAL INFILE 'C:\\path\\to\\windows\\file.CSV'
INTO TABLE table_name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(field1, field2, field3, @fieldx)
SET fieldx = str_to_date(@fieldx, "%d-%m-%Y");
Have a look at the manual page for load data for more information; and adjust the format string using this table.
I suggest using the "string to date" function, with a format string that matches the dd-mm-yyyy data:
STR_TO_DATE(table.datestring, '%d-%m-%Y')
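For rows that are already stored as VARCHAR, the same function can be used at query time. The sketch below assumes a hypothetical VARCHAR column named date_str in table_name holding dd-mm-yyyy strings, and a DATE column fieldx as in the LOAD DATA answer. A DATE column has no per-table display format, so the dd-mm-yyyy form is produced on output with DATE_FORMAT instead.

```sql
-- Sort existing VARCHAR dates chronologically without altering the table.
-- date_str is a hypothetical column name.
SELECT *
FROM table_name
ORDER BY STR_TO_DATE(date_str, '%d-%m-%Y');

-- Display a real DATE column in dd-mm-yyyy form when reading it back.
SELECT DATE_FORMAT(fieldx, '%d-%m-%Y') AS fieldx_display
FROM table_name;
```

Converting at load time and storing a real DATE is usually the better long-term choice, since the ORDER BY version has to parse every string on each query.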