Error code: Invalid in Loading Data on BigQuery - CSV

I have a large CSV file (nearly 10,000 rows) and I am trying to upload it to BigQuery, but it gives me this error:
File-00000000: CSV table references column position 8, but line starting at position:622 contains only 8 columns. (error code: invalid)
Can anyone please tell me a possible reason for it? I have double-checked my schema and it looks alright.
Thanks

I had this same issue when trying to import a large data set in a csv to a BigQuery table.
The issue turned out to be some ASCII control characters (\b, \t, \r, \n) in the data written to the csv. When the csv was sent to BigQuery, these characters caused the BigQuery csv parser to misinterpret the line and break, because the data no longer matched the number of columns in the header.
Replacing these characters with a space (to preserve formatting as best as possible) allowed me to import the data without further issues.
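In case it helps, here is roughly what that cleanup can look like; a minimal Python sketch, with the input and output filenames as placeholders. It leaves \n alone so row boundaries survive; a newline embedded inside a quoted field would need a CSV-aware pass instead.
import re

with open("data.csv", "rb") as f:
    raw = f.read()

# Replace backspace, tab, and stray carriage-return bytes with a space,
# keeping \n as the row terminator.
clean = re.sub(rb"[\x08\x09\x0d]", b" ", raw)

with open("data_clean.csv", "wb") as f:
    f.write(clean)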

The error message suggests that the load job failed because at least one row has fewer columns than the automatically detected schema dictates.
Add
allow_jagged_rows=true
in the options.
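With the bq command-line tool, the same option is the --allow_jagged_rows flag; a sketch with placeholder dataset, table, and paths:
bq load --source_format=CSV --allow_jagged_rows --skip_leading_rows=1 mydataset.mytable gs://mybucket/myfile.csv ./schema.json
Note that jagged rows are padded with nulls, so this only makes sense if the missing trailing columns are nullable.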

Related

Why aren't my functions working as expected in MySQL?

I am trying to figure out why MySQL isn't working as expected.
I imported my data from a CSV into a table called Products, which is shown in the screenshot. It's a small table of just ID and Name.
But when I run a WHERE clause looking for Name = 'SMS', it returns nothing. I don't understand what the issue is.
My CSV contents in Notepad++ are shown below:
This is what I used to load in my CSV, in case there are any errors here.
Could you share your csv file content?
This has happened to me before, and the problem was some blank space in the data in the csv file.
So maybe you could parse your csv file data first (removing the unneeded blank space) before importing it into the database.
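If the spaces have already made it into the table, a one-off cleanup along these lines may also help; a sketch using the Products/Name names from the question:
update Products set Name = trim(Name);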
This is often caused by spaces or look-alike characters. If caused by spaces or invisible characters at the beginning/end, you can try:
where name like '%SMS%'
You can then make this more general:
where name like '%S%M%S%'
When you get a match, you'll need to investigate further to find the actual cause.
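One way to investigate is to compare the visible text with what is actually stored; a sketch using standard MySQL functions (LENGTH counts bytes, HEX shows the raw stored value, so invisible characters stand out):
select id, name, length(name), hex(name) from Products where name like '%SMS%';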

BigQuery: loading a comma-separated CSV file with quoted data where one column contains commas

I have a csv file like:
"id","name","address"
"10","aparna","hyderabad,hitech-city"
"11","mounika","hyderabad,kukatpally"
"12","raji","hyderabad,madhapur"
If I use --autodetect it works, but if I use it with a schema it gives me an error.
I want to load this file into a BigQuery table like this:
id name address
10 aparna hyderabad,hitech-city
11 mounika hyderabad,kukatpally
12 raji hyderabad,madhapur
For that, I used bq load project:dataset.table gs://filepath schemafile and it is giving me an error like:
.csv: Error
while reading data, error message: Too many values in row starting
at position: 0. Found 35 column(s) while expected 33.
- You are loading data without specifying data format, data will be
treated as CSV format by default. If this is not what you mean,
please specify data format by --source_format.
Can anyone help me out with this?
Thanks in advance.
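The hint at the bottom of that error is worth following literally: spell out the format (and skip the header row) when loading with an explicit schema. A sketch reusing the paths from the question:
bq load --source_format=CSV --skip_leading_rows=1 project:dataset.table gs://filepath schemafile
If the error persists, the schema file itself (rather than the data) is the next thing to check.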

Difficulties creating CSV table in Google BigQuery

I'm having some difficulties creating a table in Google BigQuery using CSV data that we download from another system.
The goal is to have a bucket in the Google Cloud Platform to which we will upload 1 CSV file per month. These CSV files have around 3,000 - 10,000 rows of data, depending on the month.
The error I am getting from the job history in the Big Query API is:
Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2949; errors: 1. Please look into the
errors[] collection for more details.
When I am uploading the CSV files, I am selecting the following:
file format: csv
table type: native table
auto detect: tried automatic and manual
partitioning: no partitioning
write preference: WRITE_EMPTY (cannot change this)
number of errors allowed: 0
ignore unknown values: unchecked
field delimiter: comma
header rows to skip: 1 (also tried 0 and manually deleting the header rows from the csv files).
Any help would be greatly appreciated.
This usually points to an error in the structure of the data source (in this case your CSV file). Since your CSV file is small, you can run a little validation script to check that the number of columns is exactly the same across all rows in the CSV before running the export.
Maybe something like:
cat myfile.csv | awk -F, '{ a[NF]++ } END { for (n in a) print a[n], "rows have", n, "columns" }'
Or, you can bind it to a condition, checking (say) that the first row has the expected 5 columns:
ncols=$(awk -F, '{ print NF; exit }' myfile.csv); if [ "$ncols" -eq 5 ]; then python myexportscript.py; else echo "number of columns invalid: $ncols"; fi
It's impossible to point out the error without seeing an example CSV file, but it's very likely that your file is incorrectly formatted. As a result, one typo can confuse BQ into reporting thousands of errors. Let's say you have the following csv file:
Sally Whittaker,2018,McCarren House,312,3.75
Belinda Jameson 2017,Cushing House,148,3.52 //Missing a comma after the name
Jeff Smith,2018,Prescott House,17-D,3.20
Sandy Allen,2019,Oliver House,108,3.48
With the following schema:
Name(String) Class(Int64) Dorm(String) Room(String) GPA(Float64)
Since the second row is missing a comma, everything after the name is shifted one column over. If you have a large file, this results in thousands of errors as BQ attempts to insert Strings into Ints/Floats.
I suggest you run your csv file through a csv validator before uploading it to BQ. It might find something that breaks it. It's even possible that one of your fields has a comma inside the value, which breaks everything.
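A few lines of Python make a quick validator; the csv module honors quoted fields, so a comma inside a value won't be miscounted (a sketch, with the filename as a placeholder):
import csv

# Report every row whose field count differs from the header's.
with open("myfile.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for rownum, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"row {rownum}: {len(row)} fields, expected {len(header)}")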
Another theory to investigate is whether all required columns receive an appropriate (non-null) value. A common cause of this error is casting data incorrectly, so that a specific field is null in every row.
As mentioned by Scicrazed, this issue seems to arise because some file rows have an incorrect format, in which case you need to validate the content in order to figure out the specific error that is causing the failure.
I recommend you check the errors[] collection, which might contain additional information about what is making the process fail. You can do this by using the Jobs: get method, which returns detailed information about your BigQuery job, or by referring to the additionalErrors field of the JobStatus Stackdriver logs, which contains the same complete error data reported by the service.
I'm probably too late for this, but it seems the file has some errors (it could be a character that cannot be parsed, or just a string in an int column) and BigQuery cannot upload it automatically.
You need to understand what the error is and fix it somehow. An easy way to do it is by running this command on the terminal:
bq --format=prettyjson show -j <JobID>
and you will be able to see additional logs for the error to help you understand the problem.
If the error happens only a few times, you can just increase the number of errors allowed (see the sketch below).
If it happens many times, you will need to manipulate your CSV file before you upload it.
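For the first case, the allowed-error count corresponds to the --max_bad_records flag in the bq CLI; a sketch with placeholder names:
bq load --source_format=CSV --max_bad_records=10 mydataset.mytable gs://mybucket/myfile.csv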
Hope it helps

CSV data with comma values throws an error while processing the file through the BizTalk flat file disassembler

I'm going to pick up a csv file in BizTalk, and after some processing I want to update it with two or more different systems.
To parse the csv file, I'm using the default flat file disassembler to break it apart and construct it as XML with the help of a generated schema. I can do that successfully with consistent data; however, if the data has a comma in it (other than the delimiters), BizTalk fails!
Any other way to do this without using a custom pipeline component?
Expecting a simple configuration within the flat file disassembler component!
So, here's the deal. BizTalk is not failing. Well, it is, but that is the expected and correct behavior.
What you have is an invalid CSV file. The CSV specification disallows the comma in field data unless a wrap character is used. Either way, both are reserved characters.
To accept the comma in field data, you must choose a wrap character and set that in the Wrap Character property in the Flat File Schema.
This is valid:
1/1/01,"Smith, John", $5000
This is not:
1/1/01,Smith, John, $5000
Since your schema definition has ',' as the delimiter, the flat file disassembler will consider the data with a comma as two fields and will fail due to the mismatch in columns.
You have a few options:
Either add a new field to the schema, if you know the comma in the data will only be present in one particular field.
Or change the delimiter in the flat file from , to | (pipe) or some other character, so that the data does not conflict with the delimiter (see the sketch below).
Or, as you mentioned, manipulate the flat file in a custom pipeline component, which should be the last resort if the two options above are not feasible.
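For the second option, the delimiter swap can happen outside BizTalk before the file is picked up; as a rough sketch (not BizTalk-specific, filenames are placeholders), a Python pass that re-reads the file with quote-aware parsing and re-emits it pipe-delimited:
import csv

with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.reader(src)                 # honors quoted commas in field data
    writer = csv.writer(dst, delimiter="|")  # re-emit with a pipe delimiter
    writer.writerows(reader)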

Getting Invalid column count in CSV input on line 1 error

I'm trying to export a CSV from my client's FluidSurvey's account and import it into a database I've created. I've never actually worked with a CSV before, so excuse my ignorance.
I've looked into this error and none of the solutions seem to be working for me, I'm at a loss, I've been trying to import this file for hours now.
Settings are as follows:
There is already a table with columns for this data to be inserted into.
What am I missing here?
You've shown the exported csv file in Excel or Calc, so it is impossible to tell how your columns are enclosed. Probably there is some character other than ' or ". Please show the exported csv in Notepad; this will make the structure of the csv clear.
I found that FluidSurveys CSV files had the first two bytes of the header incorrect.
They are 7F7E instead of the expected Unicode byte-order mark FFFE. Changing them to FFFE works as expected - the files can then be read into Excel with no garbage characters at the start.
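If patching those bytes by hand every month gets old, the fix is easy to script; a sketch that swaps the first two bytes when they match the bad value (filenames are placeholders):
with open("export.csv", "rb") as f:
    data = f.read()

# FluidSurveys writes 7F 7E where the byte-order mark FF FE is expected.
if data[:2] == b"\x7f\x7e":
    data = b"\xff\xfe" + data[2:]

with open("export_fixed.csv", "wb") as f:
    f.write(data)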