Hive query - FAILED SemanticException invalid path - csv

Here is my problem:
I have just had my initial Azure subscription converted to a Pay-As-You-Go subscription (the first was a 30-day trial) after it was shut down when I used up the first set of free credits. Now everything is working again - I still have the same old resource group, under which I created a new cluster. The files with my CSV data are all still present in the container I created last time (not the default container, but one that was established earlier). The only thing I had to recreate was the Hive table the data gets loaded into, and I was able to establish that table again. However, when I then try to run a Hive query to actually load data into the Hive table from the CSV file as follows...
LOAD DATA INPATH '/container1/HdiSamples/user/data-file.csv' OVERWRITE INTO TABLE default.hive_table;
...I am constantly receiving "Failed" as an error message (I use Data Lake Tools for VS to upload blobs and run the queries). In the specific error log, the line beginning with 'FAILED: SemanticException etc.' stands out each time... (this despite using different locations for the file upload).
16/12/01 04:16:25 WARN conf.HiveConf: HiveConf of name hive.log.dir does not exist
FAILED: SemanticException Line 1:17 Invalid path ''/container1/HdiSamples/user/data-file.csv'': No files matching path wasb://container1#resourcegroup.blob.core.windows.net/container1/HdiSamples/user/data-file.csv
Here is my question:
Can anyone tell me why it doesn't find and load the file from the location where it actually resides...?
I just don't get the cause for this error...

Although it's been a while since I asked this question, I worked out a solution to the issue myself, which I thought I'd share with others...
I had problems for about a week, being unable to load data into the Hive tables from Azure Blob Storage. I had two CSV files called data-file.csv and data-file-extended-1.CSV in my blob container. Please note the capitals in the file extension here!
Hive and Hadoop do NOT accept these files unless...
a) the filename is spelled exactly the same way, including the capitals in the file extension
AND
b) the filename is shortened drastically and without the hyphens and numbers (in my case I used only 6 conjoined letters, i.e. "datfil" and "datfix")
Shockingly, there is no mention of these issues in the official Azure documentation, nor did I find anything on the web. However, these two adjustments will resolve the error message.
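For reference, the statement that finally went through looked roughly like this (datfil.csv is just my shortened example name; substitute your own file):
LOAD DATA INPATH '/container1/HdiSamples/user/datfil.csv' OVERWRITE INTO TABLE default.hive_table;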
Just to let people know...

Related

Difficulties creating CSV table in Google BigQuery

I'm having some difficulties creating a table in Google BigQuery using CSV data that we download from another system.
The goal is to have a bucket in Google Cloud Platform to which we will upload 1 CSV file per month. These CSV files have around 3,000 - 10,000 rows of data, depending on the month.
The error I am getting from the job history in the Big Query API is:
Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2949; errors: 1. Please look into the
errors[] collection for more details.
When I am uploading the CSV files, I am selecting the following:
file format: csv
table type: native table
auto detect: tried automatic and manual
partitioning: no partitioning
write preference: WRITE_EMPTY (cannot change this)
number of errors allowed: 0
ignore unknown values: unchecked
field delimiter: comma
header rows to skip: 1 (also tried 0 and manually deleting the header rows from the csv files).
Any help would be greatly appreciated.
This usually points to an error in the structure of the data source (in this case your CSV file). Since your CSV file is small, you can run a little validation script to check that the number of columns is exactly the same across all the rows in the CSV before running the export.
Maybe something like:
awk -F, '{ a[NF]++ } END { for (n in a) print a[n], "rows have", n, "columns" }' myfile.csv
Or, you can bind it to a condition (let's say your number of columns should be 5):
ncols=$(awk -F, '{ a[NF]++ } END { for (n in a) { print n; exit } }' myfile.csv); if [ "$ncols" -eq 5 ]; then python myexportscript.py; else echo "number of columns invalid: $ncols"; fi
It's impossible to point out the error without seeing an example CSV file, but it's very likely that your file is incorrectly formatted. As a result, one typo can confuse BQ into reporting thousands of errors. Let's say you have the following CSV file:
Sally Whittaker,2018,McCarren House,312,3.75
Belinda Jameson 2017,Cushing House,148,3.52 //Missing a comma after the name
Jeff Smith,2018,Prescott House,17-D,3.20
Sandy Allen,2019,Oliver House,108,3.48
With the following schema:
Name(String) Class(Int64) Dorm(String) Room(String) GPA(Float64)
Since the second row is missing a comma, everything in that row is shifted one column over. If you have a large file, this results in thousands of errors as BigQuery attempts to insert Strings into Ints/Floats.
I suggest you run your CSV file through a CSV validator before uploading it to BQ. It might find something that breaks it. It's even possible that one of your fields has a comma inside a value, which breaks everything.
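If you can get the file loaded at all, another way to hunt down the bad rows is to load every column as STRING into a staging table and then look for values that won't cast. This is just a sketch using the example schema above (staging_table is a hypothetical name):
SELECT *
FROM staging_table
WHERE SAFE_CAST(Class AS INT64) IS NULL
   OR SAFE_CAST(GPA AS FLOAT64) IS NULL;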
Another thing to investigate is whether all required columns receive an appropriate (non-null) value. A common cause of this error is casting data incorrectly, which returns a null value for a specific field in every row.
As mentioned by Scicrazed, this issue is usually generated because some file rows have an incorrect format, in which case you need to validate the content in order to figure out the specific error that is causing the failure.
I recommend you check the errors[] collection, which might contain additional information about what is making the process fail. You can do this by using the Jobs: get method, which returns detailed information about your BigQuery job, or by referring to the additionalErrors field of the JobStatus Stackdriver logs, which contains the same complete error data reported by the service.
I'm probably too late for this, but it seems the file has some errors (it can be a character that cannot be parsed or just a string in an int column) and BigQuery cannot upload it automatically.
You need to understand what the error is and fix it somehow. An easy way to do it is by running this command on the terminal:
bq --format=prettyjson show -j <JobID>
and you will be able to see additional logs for the error to help you understand the problem.
If the error happens only a few times, you can just increase the number of errors allowed.
If it happens many times you will need to manipulate your CSV file before you upload it.
Hope it helps

SSIS 2012 extracting bool from .csv failing to insert to db "returned status 2"

Hi all, quick question for you.
I have an SSIS2012 package that is reading a flat file (.csv) and is loading it into a SQL Server database table. However, I am getting an error for one of the columns when loading the OLEDB Destination:
[Flat File Source [32]] Error: Data conversion failed. The data conversion for column "Active_Flag" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
I am wondering if this is because in the flat file (which is comma delimited), the values are literally spelled out "TRUE" or "FALSE". The advanced page on the flat file properties has it set to "DT_BOOL" which I thought was right. It was on DT_STRING originally, and that wasn't working either.
In the SQL server table the column is set up as a bit, and allows nulls. Is this because it is literally typed out TRUE/FALSE? What's the easiest way to fix this?
Thanks for any advice!
It actually turned out there was a blank space in front of "True"/"False" in the file. It was just bad data and I missed it. Fixing that solved my issue. Thank you though, I did try that, and when it didn't work that's when I knew it was something else.

Redshift COPY - No Errors, 0 Record(s) Loaded Successfully

I'm attempting to COPY a CSV file to Redshift from an S3 bucket. When I execute the command, I don't get any error messages, however the load doesn't work.
Command:
COPY temp FROM 's3://<bucket-redacted>/<object-redacted>.csv'
CREDENTIALS 'aws_access_key_id=<redacted>;aws_secret_access_key=<redacted>'
DELIMITER ',' IGNOREHEADER 1;
Response:
Load into table 'temp' completed, 0 record(s) loaded successfully.
I attempted to isolate the issue via the system tables, but there is no indication there are issues.
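For example, checking the load errors system table along these lines comes back empty:
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;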
Table Definition:
CREATE TABLE temp ("id" BIGINT);
CSV Data:
id
123,
The line endings in your csv file probably don't have a unix new line character at the end, so the COPY command probably sees your file as:
id123,
Given you have the IGNOREHEADER option enabled, and the line endings in the file aren't what COPY is expecting (my assumption based on past experience), the file contents get treated as one line, and then skipped.
I had this occur for some files created from a Windows environment.
I guess one thing to remember is that CSV is not a standard, more a convention, and different products/vendors have different implementations for csv file creation.
I repeated your instructions, and it worked just fine:
First, the CREATE TABLE
Then, the LOAD (from my own text file containing just the two lines you show)
This resulted in:
Code: 0 SQL State: 00000 --- Load into table 'temp' completed, 1 record(s) loaded successfully.
So, there's nothing obviously wrong with your commands.
At first, I thought that the comma at the end of your data line could cause Amazon Redshift to think that there is an additional column of data that it can't map to your table, but it worked fine for me. Nonetheless, you might try removing the comma, or creating an additional column to store this 'empty' value.
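Just as a sketch of that second idea (the extra column name here is arbitrary):
CREATE TABLE temp ("id" BIGINT, "extra" VARCHAR(1));
Then re-run the same COPY statement; the empty field after the trailing comma will map into "extra".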

MySQL LOAD DATA Error (Errcode: 2 - "No such file or directory")

I am trying to load data into a table of my MySQL database, and getting this error.
LOAD DATA LOCAL INFILE 'C:\Users\Myself\Desktop\Blah Blah\LOAD DATA\week.txt'
INTO TABLE week;
Reference: this
The path is a hundred percent correct; I copied it by pressing Shift and clicking "copy path as" and checked it many times. So any tips on this will be much appreciated.
My research: after seeing this answer, I tried changing C:\Users to C:\\Users. It did not work for me.
Secondly, is there a way to use some kind of a relative path (rather than an absolute path) here?
I spent 2 days on this and finally found my mistake: just changing the backslashes to forward slashes, as one contributor previously said, finally worked for me.
So it was:
LOAD DATA LOCAL INFILE 'C:/ProgramData/MySQL/MySQL Server 5.7/Data/menagerie/pet.txt' INTO TABLE pet;
I can just say thanks a lot.
p.s. don't waste time on ytb...
I don't know what version of MySQL you are using but a quick Google search found possible answers to both your questions. Below are excerpts from the MySQL 5.1 Reference Manual:
The file name must be given as a literal string. On Windows, specify
backslashes in path names as forward slashes or doubled backslashes
The LOCAL keyword affects where the file is expected to be found:
If LOCAL is specified, the file is read by the client program on the
client host and sent to the server. The file can be given as a full
path name to specify its exact location. If given as a relative path
name, the name is interpreted relative to the directory in which the
client program was started.
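So, for your second question: yes, a relative path should work with LOCAL, keeping in mind that it is resolved from the directory where you started the client. For example (assuming you launch mysql from the folder containing the file):
LOAD DATA LOCAL INFILE 'week.txt'
INTO TABLE week;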
Regards.
If using MySQL Workbench on a local Windows PC to connect to a remote MySQL server,
Add the "LOCAL" keyword
Add double backslashes "\\" to your folder path
If the text file's first row has column names, add "IGNORE 1 LINES".
LOAD DATA LOCAL INFILE 'C:\\MyTabDelimited.txt'
INTO TABLE my_table IGNORE 1 LINES;
Simply replace backslash with slash in the path.
This works for me (MySQL Workbench 6.3 on Win 10):
LOAD DATA LOCAL INFILE 'C:/Users/Myself/Desktop/Blah Blah/LOAD DATA/week.txt'
INTO TABLE week;
Ref. https://dev.mysql.com/doc/refman/5.5/en/loading-tables.html
One more reason for this type of error is another language in the path.
You might have almost the entire path in English, but the username might be auto-filled in another language.
Try removing the word LOCAL from your query.
Try moving the week.txt file to the desktop
then execute in a terminal window:
LOAD DATA LOCAL INFILE 'C:\Users\Myself\Desktop\week.txt'
INTO TABLE week;
Instead of using a double backslash, that slash also worked for me.
I resolved this problem by replacing the path.
Replace the format "C:\Users\Myself\Desktop\week.txt"
with this different format "C:/Users/Myself/Desktop/week.txt".
My computer didn't recognize the ( \ ) symbols.

Junk characters at the beginning of file obtained via column transformations in SSIS

I need to export varbinary data to file. But when I do it using Column Transformations in SSIS, the exported files are corrupt. There are a few junk characters at the start of the file. On removing them, the file opens fine.
A similar post for BCP says that these characters specify the data length.
I would like to know how to address this issue in SSIS.
Thanks
The Export Column transformation is used for converting the varbinary data to files. I have tried something similar using AdventureWorks, which has image-type varbinary data.
The following query is used as the source query. I have modified the query, since the table does not contain the full path to write the image files to.
SELECT [ProductPhotoID]
,[ThumbNailPhoto]
,'D:\SSISTesting\ThumnailPhotos\'+[ThumbnailPhotoFileName]
,[LargePhoto]
,'D:\SSISTesting\LargePhotos\'+[LargePhotoFileName]
,[ModifiedDate]
FROM [Production].[ProductPhoto]
Used the Export Column transformation [also available in 2005 and 2008], configuring the photo columns as the extract columns and the computed path columns as the file path columns.
Mapped the rest of the columns to the destination.
After running the package, all the image files are written into the respective folders [D:\SSISTesting\ThumnailPhotos\ and D:\SSISTesting\LargePhotos].
Hope this helps!