Error when inserting timestamp data from CSV into a Redshift table column of timestamp data type

I am trying to insert data from a UTF-8 encoded CSV file into a Redshift database, but I get an error when attempting to insert a timestamp into a column that has the timestamp data type.
Here's a sample CSV:
employeeId,employeeDept,employeeName,shiftStartTime,shiftEndTime,onPremises
KL214691,John Smith,operations,2023-01-17 09:01:34,2023-01-17 16:52:41,1
KL214692,Samantha Kennedy,operations,2023-01-17 08:31:54,2023-01-17 16:09:10,1
Here's a sample table DDL:
create table historical_metrics_agent_status_time_on_status
(
employeeid varchar(10),
employeename varchar(100),
employeedept varchar(50),
shiftstarttime timestamp encode az64,
shiftendtime timestamp encode az64,
onpremises boolean,
importdatetime timestamp encode az64
)
sortkey (employeeid);
The error message shows that there's an invalid digit at position 4 in column shiftstarttime, whose raw field value is 2023-01-17 09:01:34. It looks like the timestamp is not being read from the CSV file properly. Is there something I'm missing in the CSV?

Check stl_load_errors for the exact row that is failing. My guess is that one of the VARCHAR columns has a comma (,) in it and is throwing off the alignment of the CSV fields with the table columns, e.g. if one of the names is entered as "Smith, Joe".
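A quick way to find the failing row, plus a sketch of a COPY that tolerates quoted commas. The S3 path and IAM role are placeholders, and the CSV/IGNOREHEADER/TIMEFORMAT options are standard COPY parameters rather than anything confirmed by the question:
-- Inspect the most recent load errors; colname, raw_field_value and
-- err_reason usually pinpoint the misaligned field.
SELECT starttime, filename, line_number, colname, raw_field_value, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;

-- The CSV option makes COPY honor double-quoted fields instead of
-- splitting on every comma; the column list follows the CSV header order.
COPY historical_metrics_agent_status_time_on_status
  (employeeid, employeedept, employeename, shiftstarttime, shiftendtime, onpremises)
FROM 's3://your-bucket/shifts.csv'                -- placeholder path
IAM_ROLE 'arn:aws:iam::111122223333:role/load'    -- placeholder role
CSV
IGNOREHEADER 1
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS';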


Convert a column of varchar dates in mm-dd-yyyy into a column with Date datatype [duplicate]

I have a table in a CSV file with dates in MM-DD-YYYY format. I would like to import the whole table into MySQL. To do so, I believe I need to create a table first. Since I was unable to import the table because the date format is not recognized by SQL, I created the field as a varchar field.
Now that I have imported the table, I would like to convert this varchar column into a date column (YYYY-MM-DD) that I can run operations on (such as the YEAR() function). Can someone teach me how? I am new to SQL.
Thanks!
If you "import CSV into MySQL" then you use LOAD DATA INFILE.
If your CSV contains date column in a format which is not recognized by MySQL then you must use SET-variant of LOAD DATA INFILE query.
CREATE TABLE imported__from_csv ( .. ,
datetime_column DATETIME,
.. );
LOAD DATA ..
INFILE ..
INTO TABLE imported__from_csv ( .. , @datetime_variable, .. )
..
SET .. ,
datetime_column = STR_TO_DATE(@datetime_variable, '%m-%d-%Y'),
.. ;
That is, we load the incorrectly formatted value into a user-defined variable, then convert it to a proper date literal, and finally save the value into the table.
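A concrete version of that sketch, with made-up table, column, and file names (only the LOAD DATA ... SET mechanics and STR_TO_DATE come from the answer above):
CREATE TABLE imported__from_csv (
    id INT,
    datetime_column DATETIME
);

LOAD DATA LOCAL INFILE '/tmp/data.csv'    -- hypothetical path
INTO TABLE imported__from_csv
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(id, @datetime_variable)
SET datetime_column = STR_TO_DATE(@datetime_variable, '%m-%d-%Y');

And since the question's table is already imported with a varchar date column, the same STR_TO_DATE conversion also works in an UPDATE (the column names here are hypothetical):
ALTER TABLE mytable ADD COLUMN date_col DATE;
UPDATE mytable SET date_col = STR_TO_DATE(varchar_date_col, '%m-%d-%Y');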

inner join two datasets but return nothing without any error (date format issue)?

I'm new to SQL. Currently I'm working on a task that joins two datasets. One of the datasets was created by myself; here's the query I used:
USE `abcde`;
CREATE TABLE `test_01`(
`ID` varchar(50) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`NUMBER01` bigint(20) NOT NULL DEFAULT '0',
`NUMBER02` bigint(20) NOT NULL,
`date01` date DEFAULT NULL,
PRIMARY KEY (`ID`, `date01`));
Then I load the data from a CSV file into this table; the CSV file looks like this:
ID      NUMBER01  NUMBER02   DATE01
aaa=ee  12345678  235896578  2009-01-01T00:00:00
If I query this newly created table, it looks like this (the format of DATE01 changes):
ID      NUMBER01  NUMBER02   DATE01
aaa=ee  12345678  235896578  2009-01-01
For the other dataset, which I queried and exported to a CSV file, the date01 column looks like 01/12/1979 in the CSV and like 1979-12-01 in SQL.
I also used select * from information_schema.columns to check the datatypes of the columns I need to join, for both the newly created dataset and the other one (the screenshots of that output are omitted here).
The differences are:
1. The format of the date column in the CSV files appears different.
2. The COLUMN_DEFAULT values are different: one is 0000-00-00, the other is NULL.
I suspect the reason I got empty output is the difference in the 'date' format, but I'm not sure how to make them the same so that I get something in the output. Can someone give me a hint? Thank you.
the format of the 'DATE01' changes
Of course. The DATE datatype does not store a time component, so the T00:00:00 part is dropped.
I wonder the reason why I got empty output is probably because the difference in the 'date' format
If an input value has a problem (like a wrong date format), then the corresponding value is truncated or set to NULL. Check for yourself: you must have received a bunch of warnings during the import, similar to "truncated incorrect value".
If the date field in the CSV has the wrong format, then you must use an intermediate user-defined variable to accept the raw value, and apply a proper converting expression to it in the SET clause. Like:
LOAD DATA INFILE ...
INTO TABLE tablename (field1, ..., @date01)
SET date01 = STR_TO_DATE(@date01, '%d/%m/%Y');
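To confirm what actually happened during the import (standard MySQL statements, nothing beyond the table above):
-- Run immediately after LOAD DATA to list the coercion warnings:
SHOW WARNINGS;

-- Then spot-check the loaded dates before attempting the join:
SELECT date01, COUNT(*)
FROM tablename
GROUP BY date01
ORDER BY date01
LIMIT 10;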

Handling corrupt JSON structure with Athena AWS ( HIVE_BAD_DATA)

I need to access a JSON structure from the table "data_test":
id (string)
att (struct<field1:string,field2:string,field3:int>)
SELECT
id,
att.field1,
att.field2,
att.field3
FROM database.data_test as rawdata
I receive the following error:
HIVE_BAD_DATA: Error parsing field value for field 1: For input
string: "2147483648"
So, as I understand it, I have the numeric value 2147483648 in a string field, and that causes the corrupt data. (Notably, 2147483648 is one more than the 32-bit INT maximum of 2147483647, so it cannot be parsed as an int.)
Then I tried to CAST the string fields as varchar, but the result was the same.
SELECT
id,
CAST(att.field1 as VARCHAR) as field1,
CAST(att.field2 as VARCHAR) as field2,
att.field3
FROM database.data_test as rawdata
HIVE_BAD_DATA: Error parsing field value for field 1: For input
string: "2147483648"
When I just select the id, then everything works fine.
SELECT
id
FROM database.data_test as rawdata
Unfortunately, I do not even know the IDs of the corrupt data; otherwise I would just skip them with a WHERE clause. I only have access to the data through Athena, so it is hard for me to get more information.
I asked the AWS admin to add the ignore.malformed option, so that the JSON SerDe does not let corrupt data through. He told me that he cannot do it, because then too much data would be skipped:
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
The admin gave me the DDL:
CREATE EXTERNAL TABLE ${dbName}.${tableName}(
`id` string,
`att` struct<field1:string,field2:string,field3:int>)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('paths'='att')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://${outPutS3BucketLocation}'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'UPDATED_BY_CRAWLER'='create_benchmark_athena_table',
'averageRecordSize'='87361',
'classification'='json',
'compressionType'='gzip',
'objectCount'='1',
'recordCount'='100',
'sizeKey'='315084',
'typeOfData'='file')
I have three questions:
1. Is there a way to SELECT even the corrupt data, e.g. all fields as strings?
2. Can I just skip the corrupt data in the SELECT statement?
3. How can I get more information, e.g. the id field of the corrupt rows, so I can skip them in a WHERE clause?
Thanks!
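A sketch of one possible workaround, assuming the failure really is the INT overflow described above: declare field3 as bigint in a second external table over the same S3 location, so the rows parse and the offending ids become visible. The names reuse the placeholders from the admin's DDL; this is an untested assumption, not a confirmed fix:
-- Same data, wider integer type for field3:
CREATE EXTERNAL TABLE ${dbName}.${tableName}_wide (
  `id` string,
  `att` struct<field1:string,field2:string,field3:bigint>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('paths'='att')
LOCATION 's3://${outPutS3BucketLocation}';

-- Rows whose field3 would not fit in a 32-bit int:
SELECT id, att.field3
FROM ${dbName}.${tableName}_wide
WHERE att.field3 > 2147483647;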

csv file to hive table using load data - how to format the date in the csv so it is accepted by the hive table

I am using the LOAD DATA syntax to load a CSV file into a table. The file is in the same format as Hive accepts. But still, after the LOAD DATA is issued, the last two columns return NULL on select.
1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'
create table dattable(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE DATE,
END_DATE DATE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;
Select returns NULL values for the last two columns.
The other question is: what if the date format is different from YYYY-MM-DD? Is it possible to make Hive recognize the format? (Because right now I am modifying the CSV file into a format Hive accepts.)
Answer to your 2nd question:
You will need an additional temporary table to read your input file, and then you can do the date conversions in your insert-select statement. In your temporary table, store the date fields as strings. For example:
create table dattable_ext(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE String,
END_DATE String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Load data into temporary table
LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable_ext;
Insert from temporary table to the managed table.
insert into table dattable select DATANUM, ENTRYNUM,
from_unixtime(unix_timestamp(START_DATE,'yyyy/MM/dd'),'yyyy-MM-dd'),
from_unixtime(unix_timestamp(END_DATE,'yyyy/MM/dd'),'yyyy-MM-dd') from dattable_ext;
You can replace date format in unix_timestamp function with your input date format.
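Note that for the sample rows above, the raw strings read by the default SerDe still carry their single quotes, so the conversion would also need to strip them. A sketch combining the two steps (the regexp_replace call is my assumption about how to remove the quotes, not part of the original answer):
insert into table dattable
select DATANUM, ENTRYNUM,
       from_unixtime(unix_timestamp(regexp_replace(START_DATE, "'", ''), 'yyyy-MM-dd'), 'yyyy-MM-dd'),
       from_unixtime(unix_timestamp(regexp_replace(END_DATE, "'", ''), 'yyyy-MM-dd'), 'yyyy-MM-dd')
from dattable_ext;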
LazySimpleSerDe (the default) does not work with quoted CSV. Use the OpenCSVSerde:
create table dattable(
DATANUM INT,
ENTRYNUM BIGINT,
START_DATE DATE,
END_DATE DATE )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "'"
)
STORED AS TEXTFILE;
Also read this: CSVSerDe treats all columns as being of type String.
So define your date columns as string and apply the conversion in the select.
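A sketch of that string-columns-plus-convert approach, assuming the table uses the OpenCSVSerde DDL above (where every column is effectively read back as a string, and the quoteChar setting has already stripped the single quotes):
SELECT
  DATANUM,
  ENTRYNUM,
  CAST(START_DATE AS DATE) AS start_date,   -- 'yyyy-MM-dd' strings cast directly
  CAST(END_DATE AS DATE) AS end_date
FROM dattable;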

Importing csv file with null values into phpmyadmin

When I import a CSV file into MySQL (phpMyAdmin), for all integer values that are not specified in the file but have a default of NULL, I get the error message #1366 - Incorrect integer value: '' for column 'id' at row 1. I have these questions:
a. How do I import a CSV file that does not have the row id specified, if the DB table has that id defined as auto-increment?
b. What do I need in the CSV file, or in the table column specification in phpMyAdmin, for an integer column that has a default of NULL?
Here are sample rows from the CSV file.
id,year,month,date,day,description,ranking
,,3,1,,,
,,3,2,,,
,,3,3,,,
,,3,4,,,
,,3,5,,,
,,3,6,,,
,,3,7,,,
,,3,7,,"Saints Perpetua and Felicity, Martyrs",
,,3,8,,,
,,3,8,,"Saint John of God, Religious",
,,3,9,,,
,,3,9,,"Saint Frances of Rome, Religious",
,,3,10,,,
The columns that cause the error are id, year, and ranking. They are all integer columns. The column id is auto-increment. The other columns are INT(11) with a default of NULL. Thanks.
CSV has no concept of "nulls". It's impossible to differentiate between a field that is null and a field that has a legitimately empty value (e.g. an empty string). You'll have to massage the rows as you load them, prior to query insertion, to replace any empty strings with appropriate NULLs,
e.g.
$row = fgetcsv(...);              // read the next CSV line into an array
if ($row[0] === '') {
    $row[0] = 'NULL';             // emit the SQL NULL keyword instead of ''
}
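If you can run the load outside phpMyAdmin, the user-variable trick from the earlier answers applies here as well; a sketch assuming the table is named days and matches the CSV header above (the file path is a placeholder):
LOAD DATA INFILE '/tmp/days.csv'
INTO TABLE days
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
IGNORE 1 LINES
(@id, @year, month, date, day, description, @ranking)
SET id = NULL,                    -- let AUTO_INCREMENT assign the id
    year = NULLIF(@year, ''),     -- empty string becomes SQL NULL
    ranking = NULLIF(@ranking, '');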