problems in copying a csv file from s3 to redshift - csv

i am getting the following error if i run a copy command to copy contents of a .csv file in s3 to a table in redshift.
error:"String length exceeds DDL length".
i am using following copy command:
COPY enjoy from 's3://nmk-redshift-bucket/my_workbook.csv' CREDENTIALS 'aws_access_key_id=”****”;aws_secret_access_key=’**** ' CSV QUOTE '"' DELIMITER ',' NULL AS '\0'
i figured lets open the link given by s3 for my file through was console.
link for the work book is :
link to my s3bucket cvs file
the above file is filled with many weird characters i really don't understand.
the copy command is taking these characters instead of the information i have entered in my csv file.So hence leading to string length exceeded error.
i use sql workbench to query.My 'stl_load_errors' table in redshift has raw_field_values component similar to the chars in the link i mentioned above, thats how i got to know how its taking in the input
i am new to aws and utf-8 configs. so please i appreciate help on this

The link you provide points to a .xlsx file (but has a .csv extension instead of .xlsx), which is actually a zip file.
That is why you see those strange characters, the first 2 being 'PK', which means it is a zip file.
So you will have to export to .csv first, before using the file.

Related

MySQL import - CSV - file refuses to be properly imported

I'm trying to import the following file into a MySQL Db:
https://drive.google.com/drive/folders/1WbRdNgqVre3wN4DpJZ-08jtGkJtCDJNQ?usp=sharing
Using the "data import wizard" on MySql Workbench, for some reason I'm getting "218\223 lines imported successfully", whereas the file contains close to 100K.
I tried looking for special chars around lines 210-230, also removing all of them, but still the same happens.
The file is a CSV of Microsoft Bing's geo locations, used in Microsoft Advertising campaigns, downloaded from Microsoft's website (using an ad account there).
I've been googling, reading, StackOverflowing, playing with the file and different import options...
I tried cutting the file into small bits, and the newly created file was completely corrupt somehow...
Encoding seems to be UTF-8, line breaks all "\n". I tried changing them all into "\r\n" using notepad++, but still the same happens.
File opens normally in Excel, looks normal, passes CSVlint.io...
The only weird thing is that the file contains quotes on some of the values but not on the rest (e.g. line 219. Yeah I know it sounds like this would be the problem, but I removed it, and all the rest of the lines with quotes, and it still happens... Also tried loading with ENCLOSED BY ", see below).
I also tried using SQL statements to import:
LOAD DATA LOCAL INFILE 'c:\\Users\\Gilad\\Downloads\\GeoLocations.csv'
INTO TABLE aw_geo_map_bmsl
FIELDS TERMINATED BY ','
(tried also with: ENCLOSED BY '"')
LINES TERMINATED BY '/n'
IGNORE 1 ROWS;
(had to add OPT_LOCAL_INFILE=1 to the connection on Advanced for MySQL Workbench to be allowed access to local files on my computer)
This gives 0 rows affected.
Help?
Epilogue: In the end I just gave up on all these import wizards and did it the old "make your SQL statements from Excel" way.
I imported the CSV data into Excel. Watch out: in this case I found I needed to use a data import wizard from Excel (but that one worked perfectly) to be able to change the encoding to UTF, which Excel 2010 chose as "windows" which was wrong.
After processing the data a bit to my liking, I used the following Excel code:
=CONCATENATE("INSERT INTO aw_geo_map_bmsl (`Location Id`,Name,`Canonical Name`,`Location Type`,Status,`Adwords Location Id`)
VALUES (",
A2,
",""",B2,"""",
",""",C2,"""",
",""",D2,"""",
",""",E2,"""",
",",F2,");")
to generate INSERT statements for every line, then copy-pasted and pasted only values, then pasted into an editor, removed additional quotes that Excel adds, and ran it in MySQL Workbench, which runs it line by line (takes some time), and you can see the progress.
Saved me hours of unsuccessfully playing around with "automatic tools" which fail for unknown reasons and don't give proper logs ootb.
Warning: do NOT do this for unsanitized code as it's vulnerable to SQL injection. In this case it was data from Microsoft so I know it's fine.

cannot load simple csv file into tableau public 9.3

I am trying to load the following simple csv file into tableau public 9.3:
customers,item1,item2,item3,item4
1,0,0,0,0
2,0,0,0,0
3,0,0,0,0
However, it doesn't read the file as separate columns, despite the field separator being Comma. Instead it treats the whole line as one column. Any help would be greatly appreciated :
If you change your locale settings to English US you will be able to load the file. You should also be able to work around this by creating a schema.ini file.
Go to Data > Manage fields > [Field] Options
You can also control imported CSV behavior post import both by splitting individual columns (which will remain split on update as well), or by the image below at the CSV level.
That doesn`t work for me. So I reopen the .csv file in Excel and save it again in .csv format with ',' as the delimeter.
After that my file looks like .csv with ';' delimeter and works with Tableau.

Redshift COPY - No Errors, 0 Record(s) Loaded Successfully

I'm attempting to COPY a CSV file to Redshift from an S3 bucket. When I execute the command, I don't get any error messages, however the load doesn't work.
Command:
COPY temp FROM 's3://<bucket-redacted>/<object-redacted>.csv'
CREDENTIALS 'aws_access_key_id=<redacted>;aws_secret_access_key=<redacted>'
DELIMITER ',' IGNOREHEADER 1;
Response:
Load into table 'temp' completed, 0 record(s) loaded successfully.
I attempted to isolate the issue via the system tables, but there is no indication there are issues.
Table Definition:
CREATE TABLE temp ("id" BIGINT);
CSV Data:
id
123,
The line endings in your csv file probably don't have a unix new line character at the end, so the COPY command probably sees your file as:
id123,
Given you have the IGNOREHEADER option enabled, and the line endings in the file aren't what COPY is expecting (my assumption based on past experience), the file contents get treated as one line, and then skipped.
I had this occur for some files created from a Windows environment.
I guess one thing to remember is that CSV is not a standard, more a convention, and different products/vendors have different implementations for csv file creation.
I repeated your instructions, and it worked just fine:
First, the CREATE TABLE
Then, the LOAD (from my own text file containing just the two lines you show)
This resulted in:
Code: 0 SQL State: 00000 --- Load into table 'temp' completed, 1 record(s) loaded successfully.
So, there's nothing obviously wrong with your commands.
At first, I thought that the comma at the end of your data line could cause Amazon Redshift to think that there is an additional column of data that it can't map to your table, but it worked fine for me. Nonetheless, you might try removing the comma, or create an additional column to store this 'empty' value.

How to copy csv data file to Amazon RedShift?

I'm trying to migrating some MySQL tables to Amazon Redshift, but met some problems.
The steps are simple:
1. Dump the MySQL table to a csv file
2. Upload the csv file to S3
3. Copy the data file to RedShift
Error occurs in step 3:
The SQL command is:
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS
'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' csv;
The error info:
An error occurred when executing the SQL command: copy TABLE_A from
's3://ciphor/TABLE_A.csv' CREDENTIALS
'aws_access_key_id=xxxx;aws_secret_access_key=xxxx ERROR: COPY CSV is
not supported [SQL State=0A000] Execution time: 0.53s 1 statement(s)
failed.
I don't know if there's any limitations on the format of the csv file, say the delimiters and quotes, I cannot find it in documents.
Any one can help?
The problem is finally resolved by using:
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS
'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ','
removequotes;
More information can be found here http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
Now Amazon Redshift supports CSV option for COPY command. It's better to use this option to import CSV formatted data correctly. The format is shown bellow.
COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV;
The default delimiter is ( , ) and the default quotes is ( " ). Also you can import TSV formatted data with CSV and DELIMITER option like this.
COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV DELIMITER '\t';
There are some disadvantages to use the old way(DELIMITER and REMOVEQUOTES) that REMOVEQUOTES does not support to have a new line or a delimiter character within an enclosed filed. If the data can include this kind of characters, you should use CSV option.
See the following link for the details.
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
If you want to save your self some code/ you have a very basic use case you can use Amazon Data Pipeline.
it stats a spot instance and perform the transformation within amazon network and it's really intuitive tool (but very simple so you can't do complex things with it)
You can try with this
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' csv;
CSV itself means comma separated values, no need to provide delimiter with this. Please refer link.
[http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-format]
I always this code:
COPY clinical_survey
FROM 's3://milad-test/clinical_survey.csv'
iam_role 'arn:aws:iam::123456789123:role/miladS3xxx'
CSV
IGNOREHEADER 1
;
Description:
1- COPY the name of your file store in S3
2- FROM address of file
3- iam_role is a substitution for CREDENTIAL. Note that, iam_role should be defined in iam management menu at your console, and then in trust menu should be assigned to the user as well (That is the hardest part!)
4- CSV uses comma delimiter
5- IGNORHEADER 1 is a must! Otherwise it will throw an error. (skip one row of my CSV and consider it as a header)
Since the resolution has already been provided, I'll not repeat the obvious.
However, in case you receive some more error which you're not able to figure out, simply execute on your workbench while you're connected to any of the Redshift accounts:
select * from stl_load_errors [where ...];
stl_load_errors contains all the Amazon RS load errors in historical fashion where a normal user can view details corresponding to his / her own account but a superuser can have all the access.
The details are captured elaborately at :
Amazon STL Load Errors Documentation
Little late to comment but it can be useful:-
You can use an open source project to copy tables directly from mysql to redshift - sqlshift.
It only requires spark and if you have yarn then it can also be used.
Benefits:- It will automatically decides distkey and interleaved sortkey using primary key.
It looks like you are trying to load local file into REDSHIFT table.
CSV file has to be on S3 for COPY command to work.
If you can extract data from table to CSV file you have one more scripting option. You can use Python/boto/psycopg2 combo to script your CSV load to Amazon Redshift.
In my MySQL_To_Redshift_Loader I do the following:
Extract data from MySQL into temp file.
loadConf=[ db_client_dbshell ,'-u', opt.mysql_user,'-p%s' % opt.mysql_pwd,'-D',opt.mysql_db_name, '-h', opt.mysql_db_server]
...
q="""
%s %s
INTO OUTFILE '%s'
FIELDS TERMINATED BY '%s'
ENCLOSED BY '%s'
LINES TERMINATED BY '\r\n';
""" % (in_qry, limit, out_file, opt.mysql_col_delim,opt.mysql_quote)
p1 = Popen(['echo', q], stdout=PIPE,stderr=PIPE,env=env)
p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE,stderr=PIPE)
...
Compress and load data to S3 using boto Python module and multipart upload.
conn = boto.connect_s3(AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)
k = Key(bucket)
k.key = s3_key_name
k.set_contents_from_file(file_handle, cb=progress, num_cb=20,
reduced_redundancy=use_rr )
Use psycopg2 COPY command to append data to Redshift table.
sql="""
copy %s from '%s'
CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
DELIMITER '%s'
FORMAT CSV %s
%s
%s
%s;""" % (opt.to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,opt.delim,quote,gzip, timeformat, ignoreheader)

Is there a way of LOAD DATA INFILE (import) xlsx file into MySQL database table

I know that this is discussed a lot but I don't find solution of how to do that.
What I need is to import an excel file (xls/xlsx) to my database table. It is a button which does that and the command which is executed is like that:
string cmdText = "LOAD DATA INFILE 'importTest4MoreMore.csv' INTO TABLE management FIELDS TERMINATED BY ',';";
It works great. But I need to import excel file not CSV. As far as I know LOAD DATA command does not support binary files which xls is.
So what's the solution to that? Please help
Thanks a lot
pepys
.xls will never be importable directly into MySQL. it's a compound OLE file, which means its internal layout is not understandable by mere mortals (or even Bill Gates). .xlsx is basically just a .zip file which contains multiple .xml/xslt/etc. files. You can probably extract the relevant .xml that contains the actual spreadsheet data, but again - it's not likely to be in a format that's directly importable by MySQL's load infile.
The simplest solution is to export the .xls/xlsx to a .csv.
How to import 'xlsx' file into MySQL:
1/ Open your '.xlsx' file Office Excel and click on 'Save As' button from menu and select
'CSV (MS-DOS) (*.csv)'
from 'Save as type' list. Finally click 'Save' button.
2/ Copy or upload the .csv file into your installed MySQL server (a directory path like: '/root/someDirectory/' in Linux servers)
3/ Login to your database:
mysql -u root -pSomePassword
4/ Create and use destination database:
use db1
5/ Create a MySQL table in your destination database (e.g. 'db1') with columns like the ones of '.csv' file above.
6/ Execute the following command:
LOAD DATA INFILE '/root/someDirectory/file1.csv' INTO TABLE `Table1` FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
Please note that the option 'IGNORE 1 LINES' says MySQL to ignore the first line of '.csv' file. So, it is just for '.xlsx' files with 1 header column. You can remove this option.
You can load xls or xlsx files with Data Import tool (MS Excel or MS Excel 2007 format) in dbForge Studio for MySQL. This tool opens Excel files directly, COM interface is not used; and command line is supported.