How to bulk load into Cassandra other than the COPY method? - csv

I am using the COPY method to copy .csv files into my Cassandra tables, but I am getting a record-mismatch error:
Record 41 (Line 41) has mismatched number of records (85 instead of 82)
This happens for every .csv file, and all of the .csv files are system generated. Is there any workaround for this error?

Based on your error message, it sounds like the copy command is working for you, until record 41. What are you using as a delimiter? The default delimiter for the COPY command is a comma, and I'll bet that your data has some additional commas in it on line 41.
A few options:
Edit your data and remove the extra commas.
Alter your .csv file to enclose the values of all of your fields in double quotes, since COPY's default QUOTE character is the double quote ("). This will let you keep the in-text commas.
Alter your .csv file to delimit with pipes | instead of a comma, and set the COPY command's DELIMITER option to |.
Try using either the Cassandra bulk loader or json2sstable utility to import your data. I've never used them, but I would bet you'll have similar problems if you have commas in your data set.
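To see which lines are affected before re-running COPY, a minimal sketch along these lines can report every row whose field count is off. The file name and the expected count of 82 are assumptions based on the error message:
import csv

EXPECTED_FIELDS = 82   # assumption: the target table has 82 columns, per the error message
PATH = "export.csv"    # hypothetical file name

with open(PATH, newline="") as f:
    # csv.reader respects quoting, so a quoted "a, b" counts as one field
    for line_no, row in enumerate(csv.reader(f), start=1):
        if len(row) != EXPECTED_FIELDS:
            print(f"line {line_no}: {len(row)} fields (expected {EXPECTED_FIELDS})")
This also tells you whether quoting the fields (option 2 above) would be enough, or whether the extra commas are genuinely stray.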

Related

Structure in getMetadata activity for CSV file dataset ignores an extra comma in the file's first row

I am using a reference CSV file with just the correct number and name of columns and want to compare its structure with that of incoming CSV files before proceeding to use Copy Data to import the incoming CSV data into Azure SQL. Files arriving in blob storage trigger this pipeline.
The need to validate the structure has arisen due to random files arriving with a trailing comma in the header row which causes a failure in the copy data pipeline as it sees the trailing comma as an extra column.
I have set up a getMetadata for both the reference file & the incoming files. Using an If Condition, I compare schemas.
The problem I have is that the output of getMetadata is ignoring the trailing comma.
I have tried 'column count' & 'structure' arguments. The same problem either way as the getMetadata fails to see the trailing comma as an issue.
Any help appreciated
I tried with extra commas in the header of a CSV file, and Get Metadata is not ignoring them; it reads those extra commas as columns as well.
Please check the screenshots below.
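If you want to verify locally what Get Metadata should be seeing, a quick sketch like this (outside ADF; the file names are hypothetical) compares the incoming header against the reference header and flags the extra, empty column a trailing comma produces:
import csv

def read_header(path):
    # Return the first row of the CSV as a list of column names.
    with open(path, newline="") as f:
        return next(csv.reader(f))

reference = read_header("reference.csv")   # hypothetical reference file
incoming = read_header("incoming.csv")     # hypothetical incoming file

# A trailing comma in the header row shows up as an extra, empty column name.
if incoming != reference:
    extras = incoming[len(reference):]
    raise ValueError(f"Header mismatch; trailing/extra columns: {extras!r}")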

IBM ETL/DataStage CSV file missing final comma

My team is using IBM's DataStage ETL tool to read a CSV file into a Salesforce instance. If the last column is blank, the file doesn't have a second comma to close out the record; that is, a line ends with ',' instead of ',,'. That's causing ETL to reject the file. Does anyone know if ETL can be configured to handle the missing record delimiter? Thanks!
You can count the commas in every line and, for any line where the final comma is missing, append it. You would need to implement this in a sequencer Command stage; see the sketch below.
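As a sketch of that idea, a small script called from the Command stage could pad the short lines. The expected comma count and the file names are assumptions; set EXPECTED_COMMAS to (number of columns - 1) for your layout:
# Rough sketch: pad lines that are short of the expected comma count.
# Note: naive comma counting assumes there are no quoted fields containing commas.
EXPECTED_COMMAS = 4
with open("input.csv") as src, open("fixed.csv", "w") as dst:   # hypothetical paths
    for line in src:
        line = line.rstrip("\n")
        missing = EXPECTED_COMMAS - line.count(",")
        dst.write(line + "," * max(missing, 0) + "\n")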

Redshift COPY - No Errors, 0 Record(s) Loaded Successfully

I'm attempting to COPY a CSV file to Redshift from an S3 bucket. When I execute the command, I don't get any error messages; however, the load doesn't work.
Command:
COPY temp FROM 's3://<bucket-redacted>/<object-redacted>.csv'
CREDENTIALS 'aws_access_key_id=<redacted>;aws_secret_access_key=<redacted>'
DELIMITER ',' IGNOREHEADER 1;
Response:
Load into table 'temp' completed, 0 record(s) loaded successfully.
I attempted to isolate the issue via the system tables, but there is no indication there are issues.
Table Definition:
CREATE TABLE temp ("id" BIGINT);
CSV Data:
id
123,
The line endings in your csv file probably don't have a unix new line character at the end, so the COPY command probably sees your file as:
id123,
Given you have the IGNOREHEADER option enabled, and the line endings in the file aren't what COPY is expecting (my assumption based on past experience), the file contents get treated as one line, and then skipped.
I had this occur for some files created from a Windows environment.
I guess one thing to remember is that CSV is not a standard, more a convention, and different products/vendors have different implementations for csv file creation.
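One quick way to confirm this (a sketch; the local file names are hypothetical) is to inspect the raw line endings before uploading to S3, and rewrite them as Unix newlines if needed:
# Inspect the raw line endings, then rewrite them as Unix newlines if needed.
with open("input.csv", "rb") as f:
    raw = f.read()

print("CR count:", raw.count(b"\r"), "LF count:", raw.count(b"\n"))

# Rewrite Windows (\r\n) or old-Mac (\r) endings as plain \n.
normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
with open("input_unix.csv", "wb") as f:
    f.write(normalized)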
I repeated your instructions, and it worked just fine:
First, the CREATE TABLE
Then, the LOAD (from my own text file containing just the two lines you show)
This resulted in:
Code: 0 SQL State: 00000 --- Load into table 'temp' completed, 1 record(s) loaded successfully.
So, there's nothing obviously wrong with your commands.
At first, I thought that the comma at the end of your data line could cause Amazon Redshift to think that there is an additional column of data that it can't map to your table, but it worked fine for me. Nonetheless, you might try removing the comma, or create an additional column to store this 'empty' value.
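If you want to test the trailing-comma theory, a quick sketch like this (file names hypothetical) strips trailing commas from each line before you upload the file again:
# Strip trailing commas from each line before re-uploading to S3.
with open("input.csv") as src, open("no_trailing_comma.csv", "w") as dst:
    for line in src:
        dst.write(line.rstrip("\n").rstrip(",") + "\n")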

How can I load 10,000 rows of a test.xls file into a MySQL db table?

How can I load 10,000 rows of a test.xls file into a MySQL db table?
When I use the query below, it shows this error:
LOAD DATA INFILE 'd:/test.xls' INTO TABLE karmaasolutions.tbl_candidatedetail (candidate_firstname,candidate_lastname);
My primary key is candidateid and has the properties below.
The test.xls contains data like the sample shown below.
I have added rows starting from candidateid 61 because candidates up to 60 are already in the table.
Please suggest a solution.
Export your Excel spreadsheet to CSV format.
Import the CSV file into MySQL using a command similar to the one you are currently trying:
LOAD DATA INFILE 'd:/test.csv'
INTO TABLE karmaasolutions.tbl_candidatedetail
(candidate_firstname,candidate_lastname);
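If you would rather script the Excel-to-CSV step than do it by hand, one option is a short pandas sketch. It assumes pandas and an Excel reader such as xlrd or openpyxl are installed, and the output path is hypothetical:
import pandas as pd

# Read the spreadsheet (first sheet by default) and write it out as a plain,
# comma-delimited CSV that LOAD DATA INFILE can consume.
df = pd.read_excel("d:/test.xls")
df.to_csv("d:/test.csv", index=False)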
Importing data from Excel (or any other program that can produce a text file) is very simple using the LOAD DATA command from the MySQL command prompt.
Save your Excel data as a csv file (in Excel 2007, using Save As). Check the saved file using a text editor such as Notepad to see what it actually looks like, i.e. what delimiter was used etc. Start the MySQL Command Prompt (I'm lazy so I usually do this from the MySQL Query Browser - Tools - MySQL Command Line Client to avoid having to enter username and password etc.). Enter this command:
LOAD DATA LOCAL INFILE 'C:\temp\yourfile.csv'
INTO TABLE database.table
FIELDS TERMINATED BY ';'
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(field1, field2);
[Edit: Make sure to check your single quotes (') and double quotes (") if you copy and paste this code - it seems WordPress is changing them into some similar but different characters.] Done! Very quick and simple once you know it :)
Some notes from my own import (they may not apply to you if you run a different language version, MySQL version, Excel version etc.):
TERMINATED BY - this is why I included step 2. I thought a csv would default to comma separated, but at least in my case semicolon was the default.
ENCLOSED BY - my data was not enclosed by anything, so I left this as an empty string ''.
LINES TERMINATED BY - at first I tried with only '\n' but had to add the '\r' to get rid of a carriage return character being imported into the database.
Also make sure that if you do not import into the primary key field/column, it has auto increment on; otherwise only the first row will be imported.
Original Author reference
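Following the note above about checking which delimiter the export actually used, Python's standard csv.Sniffer can guess it for you instead of eyeballing the file in Notepad (a sketch; the path is hypothetical):
import csv

with open("C:/temp/yourfile.csv", newline="") as f:   # hypothetical path
    dialect = csv.Sniffer().sniff(f.read(4096), delimiters=";,|\t")
print("Detected delimiter:", repr(dialect.delimiter))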

Postgres import file that has columns separated by new lines

I have a large text file that has one column per row and I want to import this data file into Postgres.
I have a working MySQL script.
LOAD DATA LOCAL
INFILE '/Users/Farmor/data.sql'
INTO TABLE tablename
COLUMNS TERMINATED BY '\n';
How can I translate this into Postgres? I've tried, among other things, this command:
COPY tablename
FROM '/Users/Farmor/data.sql'
WITH DELIMITER '\n'
However it complains:
ERROR: COPY delimiter must be a single one-byte character
The immediate error is because '\n' here is just a two-character string, a backslash followed by an n.
You want:
COPY tablename
FROM '/Users/Farmor/data.sql'
WITH DELIMITER E'\n'
The E'' syntax is a PostgreSQL extension.
It still won't work, though, because PostgreSQL's COPY can't understand files with newline column delimiters. I've never even seen that format.
You'll need to load it using another tool and transform the CSV. Use an office suite, the csv module for Python, Text::CSV for Perl, or whatever. Then feed the cleaned up CSV into PostgreSQL.
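As one example of that clean-up step, a sketch using Python's csv module can rewrite the one-value-per-line file as a quoted, comma-delimited CSV that COPY ... CSV will accept (the output file name is hypothetical):
import csv

# Turn a file with one value per line into a standard, quoted CSV
# that `COPY tablename FROM '...' CSV` can load.
with open("/Users/Farmor/data.sql") as src, \
     open("/Users/Farmor/data.csv", "w", newline="") as dst:
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for line in src:
        writer.writerow([line.rstrip("\n")])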
While PostgreSQL will not accept \n as a field delimiter, the original question asked how to import each row as a single column, and this can be accomplished in PostgreSQL by specifying a delimiter that does not appear in the data. For example:
COPY tablename
FROM '/Users/Farmor/data.sql'
WITH DELIMITER '~';
If no ~ is found in the row, postgresql will treat the entire row as one column.
Your delimiter is two characters so it's a valid error message.
I believe the simplest approach would be to modify the file you're importing from and actually change the delimiters to something other than \n but that might not be an option in your situation.
This question addresses the same issue:
ERROR: COPY delimiter must be a single one-byte character