In one of my applications I am importing CSV data into my Access DB using the following bulk insert query.
INSERT INTO Log_134_temp ([DATE],[TIME],CH0,CH1,CH2,CH3) SELECT [DATE],[TIME],CH0,CH1,CH2,CH3 FROM [Text;FMT=CSVDelimited;HDR=Yes;DATABASE=C:\tmp].[SAMPLE_1.csv]
The query executes and all of its parameters are correct. The issue is with just one of the CSV files, which gives the following error when the query is executed:
The field 'Log_134_temp.date' cannot contain a Null value because the
Required property for this field is set to True. Enter a value in
this field.
The other CSV files get imported without any issue.
The file that imports successfully and the file with the issue look identical in format, which has puzzled me for over a day now.
The file that gets imported
https://www.dropbox.com/s/amddhzhi6nr24ex/SAMPLE_1_111.csv?dl=0
The file that doesn't get imported
https://www.dropbox.com/s/2rrgdf7oor5ptbf/SAMPLE_1_112.csv?dl=0
Bad line in row 135169:
2019-02-14,16:57:54,310,837,300,650
It contains a lot of NUL (0x00) characters.
I found this with the help of a simple Python loop:
f = open(r'...SAMPLE_1_112.csv').read()
li = f.split('\n')
prev_len = 1
for l in li:
    if not len(l): continue              # skip empty lines
    if prev_len != len(l): print(l)      # print lines whose length differs from the previous one
    prev_len = len(l)
DATE,TIME,CH0,CH1,CH2,CH3
2018-10-03,11:45:44,246,621,250,600
2019-02-14,16:57:54,310,837,300,650
2019-02-14,16:59:01,309,859,300,650
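If those NUL bytes really are what breaks the import, one workaround is to write a cleaned copy of the file and import that instead. A minimal sketch (the output file name is just an example):
# read the raw bytes, strip NUL (0x00) characters, and write a cleaned copy
with open(r'...SAMPLE_1_112.csv', 'rb') as src:
    data = src.read()
with open(r'...SAMPLE_1_112_clean.csv', 'wb') as dst:   # hypothetical output name
    dst.write(data.replace(b'\x00', b''))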
I am trying to create a STREAMING LIVE TABLE object in my DataBricks environment, using an S3 bucket with a bunch of CSV files as a source.
The syntax I am using is:
CREATE OR REFRESH STREAMING LIVE TABLE t1
COMMENT "test table"
TBLPROPERTIES
(
"myCompanyPipeline.quality" = "bronze"
, 'delta.columnMapping.mode' = 'name'
, 'delta.minReaderVersion' = '2'
, 'delta.minWriterVersion' = '5'
)
AS
SELECT * FROM cloud_files
(
"/input/t1/"
,"csv"
,map
(
"cloudFiles.inferColumnTypes", "true"
, "delimiter", ","
, "header", "true"
)
)
A sample source file content:
ROW_TS,ROW_KEY,CLASS_ID,EVENT_ID,CREATED_BY,CREATED_ON,UPDATED_BY,UPDATED_ON
31/07/2018 02:29,4c1a985c-0f98-46a6-9703-dd5873febbbb,HFK,XP017,test-user,02/01/2017 23:03,,
17/01/2021 21:40,3be8187e-90de-4d6b-ac32-1001c184d363,HTE,XP083,test-user,02/09/2017 12:01,,
08/11/2019 17:21,05fa881e-6c8d-4242-9db4-9ba486c96fa0,JG8,XP083,test-user,18/05/2018 22:40,,
When I run the associated pipeline, I am getting the following error:
org.apache.spark.sql.AnalysisException: Cannot create a table having a column whose name contains commas in Hive metastore.
For some reason, the loader is not recognizing commas as column separators and is trying to load the whole thing into a single column.
I spent a good few hours already trying to find a solution. Replacing commas with semicolons (both in the source file and in the "delimiter" option) does not help.
Trying to manually upload the same file to a regular (i.e. non-streaming) Databricks table works just fine. The issue is solely with a streaming table.
Ideas?
Not exactly the type of solution I would have expected here, but it seems to work, so...
Rather than using SQL to create the DLT, using Python scripting helps:
import dlt

@dlt.table
def t1():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/input/t1/")
    )
Note that the above script needs to be executed via a DLT pipeline (running it directly from a notebook will throw a ModuleNotFoundError exception).
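For completeness, the reader options from the SQL attempt (header, delimiter, column type inference) and the table comment can presumably be passed the same way; a sketch along those lines:
import dlt

@dlt.table(comment="test table")
def t1():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", "true")  # same options as in the SQL version
        .option("header", "true")
        .option("delimiter", ",")
        .load("/input/t1/")
    )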
I am trying to access data from a dataframe in Octave that satisfies some criteria.
Let us say the dataframe name is A, with columns 'Date' and 'Close Price'.
Let us say we intend to access the close prices where Close Price was less than 5.
I am using the command:
A.Close(A.Close<5)
I am getting the following error:
error: subsref.m querying name Close returned positions 5
subsref.m querying name returned positions
I saw an instruction video on YouTube where the same command was used, but no error showed up.
I cannot reproduce the problem. Are you sure your dataframe is correctly initialised?
pkg load dataframe
A = dataframe( [1,2;3,4;5,6;7,8], 'colnames', { 'Date', 'Close Price' } );
A.Close( A.Close < 5 )
% ans =
% 2
% 4
I suspect your error may have to do with the fact that your column name is Close_Price, but you tried to index it via 'Close' (in which case A.Close_Price( A.Close_Price < 5 ) should work). Do you have any other columns that start with the word 'Close' by any chance?
So I want to import a datetime from a txt:
2015-01-22 09:19:59
into a table using a data flow. I have my Flat File Source and my destination DB set up fine. In the advanced settings, and in the input and output properties, I changed the data type of that column of the txt input to:
database timestamp [DT_DBTIMESTAMP]
This is the same data type the DB uses for that column of the table, so this should work.
However, when I execute the package I get an error saying the data conversion failed... How do I make this work?
[Import txt data [1743]] Error: Data conversion failed. The data conversion for column "statdate" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
[Import txt data [1743]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "statdate" (2098)" failed because error code 0xC0209084 occurred, and the error row disposition on "output column "statdate" (2098)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
[Import txt data [1743]] Error: An error occurred while processing file "C:\Program Files\Microsoft SQL Server\MON_Datamart\Sourcefiles\tbl_L30T1.txt" on data row 14939.
On the row where it gives the error, the datetime is filled with spaces. That is why "allow nulls" is checked on the table, but my SSIS package still gives the error for some reason... Can I somewhere tell the package to allow nulls as well?
I suggest you import the data into a character field and then parse it after entry.
The following function should help you:
SELECT IsDate('2015-01-22 09:19:59')
, IsDate(Current_Timestamp)
, IsDate(' ')
, IsDate('')
The IsDate() function returns a 1 when it thinks the value is a date and a 0 when it is not.
This would allow you to do something like:
SELECT value_as_string
, CASE WHEN IsDate(value_as_string) = 1 THEN
Cast(value_as_string As datetime)
ELSE
NULL
END As value_as_datetime
FROM ...
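Applied to the SSIS scenario above, that might look something like the following, staging statdate as text first and parsing it afterwards (staging_table and final_table are placeholder names):
-- parse the staged text value into a datetime, falling back to NULL
INSERT INTO final_table (statdate)
SELECT CASE WHEN IsDate(statdate) = 1 THEN
            Cast(statdate As datetime)
       ELSE
            NULL
       END
FROM staging_table;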
I solved it myself. Thank you for your suggestion gvee, but the way I did it is easier.
In the Flat File Source, when making a new connection, I set all the data types in the Advanced tab according to the table in the database EXCEPT for the column with the timestamp (in my case it was called "statdate"). I changed that data type to a string, because otherwise my Flat File Source would give me a conversion error before anything else could run, and the only way around that was setting the error output to ignore failure, which I don't want. (After setting it to a string in the advanced settings, you still have to change the data type on the output side as well: right-click the Flat File Source -> Show Advanced Editor -> go to the output columns and change the data type there from date to string.)
After the timestamp was set to a string, I added a Derived Column with this expression to trim all the spaces and turn empty values into NULL:
TRIM(<YourColumnName>) == "" ? (DT_STR,4,1252)NULL(DT_STR,4,1252) : <YourColumnName>
Next I added a Data Conversion to set the string back to a timestamp. The Data conversion is finally connected to the OLE DB Destination.
I hope this helps anyone with the same problem in the future.
End result: Picture of data flow
I have a comma-separated CSV file containing hundreds of thousands of records in the following format:
3212790556,1,0.000000,,0
3212790557,2,0.000000,,0
Now, using the SQL Server Import Flat File method works just dandy. I can edit the SQL so that the table name and column names are something meaningful, and I also change the data types from the default varchar(50) to int or decimal. This all works fine and the SQL import succeeds.
However, I am unable to do the same task using a BULK INSERT query, which is as follows:
BULK
INSERT temp1
FROM 'c:\filename.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
This query returns the following 3 errors which I have no idea how to resolve:
Msg 4866, Level 16, State 1, Line 1
The bulk load failed. The column is too long in the data file for row 1, column 5. Verify that the field terminator and row terminator are specified correctly.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
The purpose of my application is that there are multiple CSV files in a folder that all need to go into a single table so that I can query for the sum of values. At the moment I was thinking of writing a program in C# that would execute the BULK INSERT in a loop (once per file) and then return my results. I am guessing I don't need to write code for this and can just write a script that does all of it - can anyone guide me to the right path? :)
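For reference, here is a rough T-SQL sketch of the kind of loop I have in mind (the file names are just placeholders, and it is untested):
-- loop over a hard-coded list of CSV files and BULK INSERT each one
DECLARE @files TABLE (fname NVARCHAR(260));
INSERT INTO @files VALUES ('c:\file1.csv'), ('c:\file2.csv');   -- placeholder file names

DECLARE @fname NVARCHAR(260), @sql NVARCHAR(MAX);
DECLARE file_cursor CURSOR FOR SELECT fname FROM @files;
OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @fname;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BULK INSERT temp1 FROM ''' + @fname
             + N''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'');';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM file_cursor INTO @fname;
END
CLOSE file_cursor;
DEALLOCATE file_cursor;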
Many thanks.
Edit: just added
ERRORFILE = 'C:\error.log'
to the query and I am getting 5221 rows inserted. Sometimes it's 5221 and sometimes it's 5222, but it just fails beyond this point. Don't know what the issue is??? The CSV is perfectly fine.
SOB. WTF!!!
I can't believe that replacing \n with "0x0A" in the ROWTERMINATOR worked!!! I mean, seriously. I just tried it and it worked. Total WTF moment!!
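For anyone hitting the same thing, the working query is the one from the question with only the row terminator changed:
BULK INSERT temp1
FROM 'c:\filename.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A'
)
GO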
However, what is a bit interesting is that the SQL Import wizard took only about 10 seconds to import, while the BULK INSERT query took well over a minute. Any guesses??
I've created an SSIS package that executes inline SQL queries from our database and is supposed to output the contents to a text file. I originally had the text file comma delimited, but changed it to pipe delimited after researching the error further. I also took a substring of the FirstName field and ensured that the SSIS placeholder fields matched in length. The error message is as follows:
[Customers Flat File [196]] Error: Data conversion failed. The data conversion for
column "FirstName" returned status value 4 and status text "Text was truncated or one or more
characters had no match in the target code page.".
The SQL statement I'm using in my OLE DB Source is as follows:
SELECT
dbo.Customer.Email, SUBSTRING(dbo.Customer.FirstName, 1, 100) AS FirstName,
dbo.Customer.LastName, dbo.Customer.Gender,
dbo.Customer.DateOfBirth, dbo.Address.Zip, dbo.Customer.CustomerID, dbo.Customer.IsRegistered
FROM
dbo.Customer INNER JOIN
dbo.Address ON dbo.Customer.CustomerID = dbo.Address.CustomerID
What other fixes should I put in place to ensure the package runs without error?
Have you tried to run this query in SSMS? If so, did you get a successful result?
If you haven't tried it yet, paste this query in a new SSMS window and wait for it to complete.
If the query completes, then we don't have a problem with the query; something could be off inside the package.
But if the query does not finish up and fails, you know where to look.
EDIT
On second thought, is your Customer source a flat file or something? It looks like there is a value in the Customer table/file that does not match the output metadata of the source. Check your source again.
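One quick sanity check for the truncation half of that error message is whether the source data is actually longer than the flat file column allows; the column and table names below are taken from the query above:
-- check the longest FirstName value coming out of the source table
SELECT MAX(LEN(FirstName)) AS MaxFirstNameLength
FROM dbo.Customer;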