Cassandra COPY command never finishes while loading .csv file - csv

Hello, and thank you for taking the time to read my issue.
I have the following issue with Cassandra cqlsh:
When I use the COPY command to load a .csv file into my table, the command prompt never finishes executing, and nothing is loaded into the table if I stop it with Ctrl+C.
I'm using .csv files from: https://www.kaggle.com/daveianhickey/2000-16-traffic-flow-england-scotland-wales
specifically from ukTrafficAADF.csv.
I put the code below:
CREATE TABLE first_query ( AADFYear int, RoadCategory text,
LightGoodsVehicles text, PRIMARY KEY(AADFYear, RoadCategory));
Then I try this:
COPY first_query (AADFYear, RoadCategory, LightGoodsVehicles) FROM '..\ukTrafficAADF.csv' WITH DELIMITER=',' AND HEADER=TRUE;
This gives me the error below repeatedly:
Failed to import 5000 rows: ParseError - Invalid row length 29 should be 3, given up without retries
And it never finishes.
I should add that the .csv file has more columns than I need, and trying the previous COPY command with the SKIPCOLS option listing the unused columns does the same thing.
Thanks in advance.

With the cqlsh COPY command, every column in the CSV must be present in the table schema.
In your case, your csv ukTrafficAADF has 29 columns but the table first_query has only 3 columns; that's why it throws the parse error.
So you will have to remove all the unused columns from the CSV in some way; then you can load it into the Cassandra table with the cqlsh COPY command. A small preprocessing script like the sketch below is one option.
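This is a minimal Python sketch, assuming the CSV header contains columns named AADFYear, RoadCategory and LightGoodsVehicles (check the actual header names in ukTrafficAADF.csv before running it); it writes a trimmed file containing only those three columns:

import csv

# Columns that exist in the first_query table; assumed to match the CSV header names.
wanted = ["AADFYear", "RoadCategory", "LightGoodsVehicles"]

with open("ukTrafficAADF.csv", newline="") as src, \
     open("ukTrafficAADF_trimmed.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=wanted)
    writer.writeheader()
    for row in reader:
        # Keep only the three columns the table expects.
        writer.writerow({col: row[col] for col in wanted})

After that, point the COPY command at ukTrafficAADF_trimmed.csv with the same three-column list.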

Related

Invalid column count in CSV input on line 1

I'm trying to import a CSV made from an Excel file, but it's not working; I'm receiving the following error: Invalid column count in CSV input on line 1.
Here is the table structure (screenshot).
Here is the CSV file content (screenshot).
Do I need to have an id field in the CSV file? Since it's auto-incremented, do I need to add it?
Here is the phpMyAdmin import config (screenshot).
Another question: for each entry in the MySQL table, I would like to add a timestamp. Is that configuration right? Will a timestamp be added for every entry in the table?
Regards

Difficulties creating CSV table in Google BigQuery

I'm having some difficulties creating a table in Google BigQuery using CSV data that we download from another system.
The goal is to have a bucket in Google Cloud Platform to which we will upload one CSV file per month. These CSV files have around 3,000-10,000 rows of data, depending on the month.
The error I am getting from the job history in the BigQuery API is:
Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2949; errors: 1. Please look into the
errors[] collection for more details.
When I am uploading the CSV files, I am selecting the following:
file format: csv
table type: native table
auto detect: tried automatic and manual
partitioning: no partitioning
write preference: WRITE_EMPTY (cannot change this)
number of errors allowed: 0
ignore unknown values: unchecked
field delimiter: comma
header rows to skip: 1 (also tried 0 and manually deleting the header rows from the csv files).
Any help would be greatly appreciated.
This usually points to an error in the structure of the data source (in this case, your CSV file). Since your CSV file is small, you can run a little validation script to check that the number of columns is exactly the same across all rows of the CSV before running the export.
Maybe something like:
awk -F, '{ a[NF]++ } END { for (n in a) print a[n], "rows have", n, "columns" }' myfile.csv
Or you can bind it to a condition (let's say your number of columns should be 5):
ncols=$(awk -F, '{ a[NF]++ } END { for (n in a) print n }' myfile.csv); if [ "$ncols" = "5" ]; then python myexportscript.py; else echo "number of columns invalid: $ncols"; fi
(If the file contains rows with differing column counts, $ncols will hold more than one value and the check will fail, which is what you want.)
It's impossible to point out the error without seeing an example CSV file, but it's very likely that your file is incorrectly formatted; as a result, one typo can confuse BQ into thinking there are thousands of errors. Let's say you have the following CSV file:
Sally Whittaker,2018,McCarren House,312,3.75
Belinda Jameson 2017,Cushing House,148,3.52 //Missing a comma after the name
Jeff Smith,2018,Prescott House,17-D,3.20
Sandy Allen,2019,Oliver House,108,3.48
With the following schema:
Name(String) Class(Int64) Dorm(String) Room(String) GPA(Float64)
Since the second row is missing a comma, everything in that row is shifted one column over. If you have a large file, this results in thousands of errors as BigQuery attempts to insert Strings into Ints/Floats.
I suggest you run your CSV file through a CSV validator before uploading it to BQ. It might find something that breaks it. It's even possible that one of your fields has a comma inside the value, which breaks everything. A quick check like the sketch below will catch the most common problem.
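As a minimal example of such a check (myfile.csv is a placeholder name), you can parse the file with Python's csv module, which understands quoted commas, and report any row whose field count differs from the header's:

import csv

with open("myfile.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    expected = len(header)
    for lineno, row in enumerate(reader, start=2):
        # Rows that parse to a different number of fields are the ones BQ will reject.
        if len(row) != expected:
            print(f"line {lineno}: {len(row)} fields, expected {expected}: {row}")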
Another thing to investigate is whether all required columns receive an appropriate (non-null) value. A common cause of this error is casting data incorrectly, which returns a null value for a specific field in every row.
As mentioned by Scicrazed, this issue seems to arise because some file rows have an incorrect format, in which case you need to validate the content in order to figure out the specific error that is causing the failure.
I recommend you check the errors[] collection, which might contain additional information about what is making the process fail. You can do this by using the Jobs: get method, which returns detailed information about your BigQuery job, or by referring to the additionalErrors field of the JobStatus Stackdriver logs, which contains the same complete error data reported by the service. A sketch of pulling this from the API is below.
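For example, a rough sketch with the google-cloud-bigquery Python client library (the project, location and job ID are placeholders; substitute the values from your own job history):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")                  # placeholder project
job = client.get_job("my_failed_load_job_id", location="US")    # placeholder job ID and location

print("state:", job.state)
print("error_result:", job.error_result)    # the summary error
for err in job.errors or []:                # the full errors[] collection
    print(err.get("reason"), err.get("location"), err.get("message"))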
I'm probably too late for this, but it seems the file has some errors (it can be a character that cannot be parsed or just a string in an int column) and BigQuery cannot upload it automatically.
You need to understand what the error is and fix it somehow. An easy way to do it is by running this command on the terminal:
bq --format=prettyjson show -j <JobID>
and you will be able to see additional logs for the error to help you understand the problem.
If the error happens only a few times, you can just increase the number of errors allowed.
If it happens many times, you will need to manipulate your CSV file before you upload it. A sketch of raising the allowed error count is below.
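For instance, with the google-cloud-bigquery Python client you can set max_bad_records on the load configuration (the GCS URI and table ID below are placeholders, and 10 is an arbitrary limit):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,     # skip the header row
    max_bad_records=10,      # tolerate up to 10 malformed rows
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/myfile.csv",          # placeholder GCS URI
    "my-project.my_dataset.my_table",     # placeholder table ID
    job_config=job_config,
)
load_job.result()   # waits for completion and raises if the job still fails

The equivalent setting in the web UI is the "number of errors allowed" field from the question.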
Hope it helps

Redshift COPY - No Errors, 0 Record(s) Loaded Successfully

I'm attempting to COPY a CSV file to Redshift from an S3 bucket. When I execute the command, I don't get any error messages; however, the load doesn't work.
Command:
COPY temp FROM 's3://<bucket-redacted>/<object-redacted>.csv'
CREDENTIALS 'aws_access_key_id=<redacted>;aws_secret_access_key=<redacted>'
DELIMITER ',' IGNOREHEADER 1;
Response:
Load into table 'temp' completed, 0 record(s) loaded successfully.
I attempted to isolate the issue via the system tables, but there is no indication there are issues.
Table Definition:
CREATE TABLE temp ("id" BIGINT);
CSV Data:
id
123,
The lines in your CSV file probably don't end with a Unix newline character, so the COPY command probably sees your file as:
id123,
Given you have the IGNOREHEADER option enabled, and the line endings in the file aren't what COPY is expecting (my assumption based on past experience), the file contents get treated as one line, and then skipped.
I had this occur for some files created from a Windows environment.
I guess one thing to remember is that CSV is not a standard, more a convention, and different products/vendors have different implementations for CSV file creation. A quick way to normalize the line endings before loading is sketched below.
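For example, a minimal Python sketch (input.csv and output.csv are placeholder names) that rewrites the file with Unix "\n" endings and makes sure the last line is terminated:

# Read the file without newline translation so Windows "\r\n" endings stay visible.
with open("input.csv", "r", newline="") as src:
    text = src.read()

# Convert CRLF (and stray CR) endings to LF and ensure a trailing newline.
text = text.replace("\r\n", "\n").replace("\r", "\n")
if not text.endswith("\n"):
    text += "\n"

with open("output.csv", "w", newline="") as dst:
    dst.write(text)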
I repeated your instructions, and it worked just fine:
First, the CREATE TABLE
Then, the LOAD (from my own text file containing just the two lines you show)
This resulted in:
Code: 0 SQL State: 00000 --- Load into table 'temp' completed, 1 record(s) loaded successfully.
So, there's nothing obviously wrong with your commands.
At first, I thought that the comma at the end of your data line could cause Amazon Redshift to think that there is an additional column of data that it can't map to your table, but it worked fine for me. Nonetheless, you might try removing the comma, or creating an additional column to store this 'empty' value. A one-off cleanup like the sketch below would strip the trailing comma.
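If you want to try the first option, here is a small Python sketch (temp.csv and temp_clean.csv are placeholder names) that drops a trailing comma from each line:

with open("temp.csv") as src, open("temp_clean.csv", "w") as dst:
    for line in src:
        line = line.rstrip("\r\n")
        if line.endswith(","):
            line = line[:-1]    # drop the single trailing comma
        dst.write(line + "\n")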

Importing csv file in Cassandra Database throws error Record #0 (line 1) has the wrong number of fields (1 instead of 7)

I am using the COPY method to copy a CSV file into my Cassandra table, but I am getting an error that records have the wrong number of fields.
The query is: COPY activity FROM 'Detail.csv' WITH HEADER=TRUE
I have activity as a column family with 7 fields,
but in my CSV file everything is separated by semicolons.
The error is: Record #0 (line 1) has the wrong number of fields (1 instead of 7)
The image above is a screenshot of the CSV file.
"in my csv file everything is separated by semicolon"
The default behavior of the COPY command uses a comma as a delimiter. Since your file is (apparently) semi-colon-delimited, it will see the entire row as one field (unless the data contains commas). Try setting the DELIMITER option in your WITH clause.
COPY activity FROM 'Detail.csv' WITH HEADER=TRUE AND DELIMITER=';';
And as a suggestion, I have always had more luck getting COPY to work properly when listing-out the columns to import:
COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';

Having troubles loading data in InfoBright ICE

ICE Version: infobright-3.5.2-p1-win_32
I’m trying to load a large file but keep running into problems with errors such as:
Wrong data or column definition. Row: 989, field: 5.
This is row 989, field 5:
"(450)568-3***"
Note: the last 3 chars are numbers as well, but I didn't want to post somebody's phone number on here.
It’s really no different to any of the other entries in that field.
The datatype of that field is VARCHAR(255) NOT NULL
Also, if you upgrade to the current release, 4.0.6, we now support row-level error checking during LOAD and support a reject file.
To enable the reject-file functionality, you must specify BH_REJECT_FILE_PATH and one of the associated parameters (BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD). For example, if you want to load data from the file DATAFILE.csv into table T but you expect that 10 rows in this file might be wrongly formatted, you would run the following commands:
set @BH_REJECT_FILE_PATH = '/tmp/reject_file';
set @BH_ABORT_ON_COUNT = 10;
load data infile 'DATAFILE.csv' into table T;
If fewer than 10 rows are rejected, a warning will be output, the load will succeed, and all problematic rows will be written to the file /tmp/reject_file. If the Infobright loader finds a tenth bad row, the load will terminate with an error and all bad rows found so far will be written to the file /tmp/reject_file.
I've run into this issue when the last line of the file is not terminated by the value of --lines-terminated-by="\n".
For example, if I am importing a file with 9000 lines of data, I have to make sure there is a newline at the end of the file.
Depending on the size of the file, you can just open it with a text editor and hit the return key.
I have found this to be consistent with the '\r\n' vs. '\n' difference. Even when running the loader on Windows, '\n' succeeds 100% of the time (assuming you don't have real issues with your data vs. the column definition). A quick way to check which terminator a file uses, and whether its last line is terminated at all, is sketched below.
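As a small Python diagnostic sketch (DATAFILE.csv stands in for your file), this reports which line terminator the file uses and whether the last line is terminated, so you know what to pass to --lines-terminated-by or whether to append a final newline:

with open("DATAFILE.csv", "rb") as f:
    data = f.read()

crlf = data.count(b"\r\n")          # Windows-style endings
lf = data.count(b"\n") - crlf       # bare Unix-style endings
print(f"CRLF (\\r\\n) endings: {crlf}, LF (\\n) endings: {lf}")
print("last line terminated:", data.endswith(b"\n"))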