PROC IMPORT of a CSV in SAS: empty values in a date column

I need to import a CSV file in SAS and it gets stuck on empty fields that have a date format. My log shows this field is properly recognised with the DATETIME. format and the ANYDTDTM40. informat, just like the other datetime fields. In the first records this field is empty, and the log then gives a NOTE about invalid data. When I enter a date in the first rows that have empty fields, the NOTE moves along to later rows. So it clearly has to do with the missing values. Can someone help me out?

In the future please make sure to show your actual code and the log - feel free to omit the data part of the log if it's confidential information.
PROC IMPORT is a guessing procedure: it guesses at the type of each column. For production processes it's not a good idea to rely on PROC IMPORT.
You can add the GUESSINGROWS=MAX; option to your code to force SAS to scan the entire file before guessing at types. This will increase the run time of the process but will likely fix your issue. Also, ensure your datetime fields are consistent and correct. If the data has mixed date types, i.e. MMDDYY and DDMMYY, then it can be a bit of a pain to manage. Or if it has DDMMYY and SAS guesses MMDDYY (or vice versa) you'll get a bunch of errors. In that case you need to write your own DATA step code to read in the data. You can use the code from the log as a starting point.
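Not SAS, but as a rough illustration of the same idea (scan every row, skip the blanks, and only then decide whether the column is a datetime), here is a Python sketch; the file name, column name and candidate formats are made up:
import csv
from datetime import datetime

CSV_PATH = "input.csv"        # hypothetical file
COLUMN = "event_ts"           # hypothetical datetime column
FORMATS = ["%Y-%m-%d %H:%M:%S", "%d%b%Y:%H:%M:%S"]   # candidate layouts

def parses(value, fmt):
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def looks_like_datetime(path, column):
    # Scan the whole file (the GUESSINGROWS=MAX idea) and ignore empty cells,
    # so leading missing values cannot drive the guess.
    saw_value = False
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            value = (row.get(column) or "").strip()
            if not value:
                continue
            saw_value = True
            if not any(parses(value, fmt) for fmt in FORMATS):
                return False
    return saw_value

print(looks_like_datetime(CSV_PATH, COLUMN))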

Related

Unable to import CSV table DATE columns to BigQuery

I am unable to import a CSV table's DATE columns to BigQuery.
The dates are not recognized, even though they have the correct format (YYYY-MM-DD) according to this documentation:
https://cloud.google.com/bigquery/docs/schema-detect
So the DATE columns are not recognized and are renamed to _2020-0122, 2020-01-23...
Is the issue that the dates are in the first row as column names?
But how can I then import the dates when I want to use them in time series charts (Data Studio)?
Here is a sample of the source CSV:
Province/State,Country/Region,Lat,Long,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26
Anhui,China,31.8257,117.2264,1,9,15,39,60
Beijing,China,40.1824,116.4142,14,22,36,41,68
Chongqing,China,30.0572,107.874,6,9,27,57,75
Here is the result shown in BigQuery (screenshot).
If you have a finite number of day columns, you can try to unpivot the table when querying it. See this blog post.
Otherwise, if you don't know how many day columns the CSV file has:
choose a character that does not appear in the data as the CSV delimiter, then just load the whole file into a single-column staging table and use the SPLIT function; you'll also need UNNEST. This approach requires a full scan and will be more expensive, especially as the file gets bigger.
The issue is that column names cannot have a date type; for this reason, when the CSV is imported, BigQuery takes the dates and transforms them into names with underscores.
The first way to deal with the problem would be to modify the CSV file before loading it, because any import that uses the first row as a header will change the date format and then it will be harder to get back to a date type. If you have experience with any programming language you can do the transformation very easily (see the sketch after this answer). I can help with this, but I do not know your use case, so maybe this is not possible. Where does this CSV come from?
If modifying the CSV beforehand is not possible, then the second option is what ktopcuoglu said: import the whole file as one column and process it using SQL functions. This is considerably harder than the first option, and since you import all the data into a single column, everything ends up with the same data type, which will be a headache too.
If you could explain where the CSV comes from, we may be able to influence it before it is ingested by BigQuery. Otherwise, you'll need to dig into SQL a bit.
Hope it helps!
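To make that first option concrete, here is a hedged Python/pandas sketch of the reshape; the file names are stand-ins, not anything from the question:
import pandas as pd

wide = pd.read_csv("covid_wide.csv")   # hypothetical input: Province/State, Country/Region, Lat, Long, 2020-01-22, ...

# Turn the one-column-per-date layout into rows, so BigQuery sees a real DATE column.
tidy = wide.melt(
    id_vars=["Province/State", "Country/Region", "Lat", "Long"],
    var_name="date",
    value_name="confirmed",
)
tidy["date"] = pd.to_datetime(tidy["date"]).dt.date   # written back out as YYYY-MM-DD

tidy.to_csv("covid_long.csv", index=False)   # load this file; autodetect can now type the date column
Loading the reshaped file also makes the time series chart in Data Studio straightforward, since there is one date column instead of one column per day.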
Hi, now I can help you further.
First, I found some COVID datasets among the public BigQuery datasets. The one you are taking from GitHub is already in BigQuery, but there are many others that may work better for your task, such as the one called “covid19_ecdc”, which is inside bigquery-public-data. This last one has the confirmed cases and deaths per date and country, so it should be easy to make a time series.
Second, I found an interesting link performing what you meant with Python and Data Studio. It’s a Kaggle discussion, so you may not be familiar with it, but it deserves a check for sure. Moreover, it uses the dataset you are trying to use.
Hope it helps. Do not hesitate to ask!

ETL: how to guess data types for messy CSVs with lots of nulls

I often have to cleanse and import messy CSV and Excel files into my MS SQL Server 2014 (but the question would be the same if I were using Oracle or another database).
I have found a way to do this with Alteryx. Can you help me understand if I can do the same with Pentaho Kettle or SSIS? Alternatively, can you recommend another ETL software which addresses my points below?
I often have tables of, say, 100,000 records where the first 90,000 records may be null. Most ETL tools scan only the first few hundred records to guess data types and therefore fail to guess the types of these fields. Can I force Pentaho or SSIS to scan the WHOLE file before guessing types? I understand this may not be efficient for huge files of many GBs, but for the files I handle, scanning the entire file is much better than wasting a lot of time trying to guess each field manually.
As above, but with the length of strings. If the first 10,000 records are, say, a 3-character string but the subsequent ones are longer, SSIS and Pentaho tend to guess nvarchar(3) and the import will fail. Can I force them to scan all rows before guessing the length of the strings? Or, alternatively, can I easily force all strings to be nvarchar(x), where I set x myself?
Alteryx has a multi-field tool, which is particularly convenient when cleansing or converting multiple fields. E.g. I have 10 date columns whose data type was not guessed automatically. I can use the multi-field formula to get Alteryx to convert all 10 fields to date and create new fields called $oldfield_reformatted. Do Pentaho and SSIS have anything similar?
Thank you!
A silly suggestion: in Excel, add a row at the top of the list that has a formula that creates a text string with the same length as the longest value in the column.
This formula, entered as an array formula, would do it:
=REPT("X",MAX(LEN(A:A)))
You could also use a more advanced VBA function to create other dummy values to force datatypes in SSIS.
I've not used SSIS or anything like it, but in the past I would have loaded the file into a table whose columns are all, say, varchar(1000), so that all the data loads, then processed it across into the main table using SQL that casts or removes the data values as I required.
This gives YOU ultimate control, not a package or driver. I was very surprised to hear how this works!
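Outside of SSIS or Pentaho, the full-file pre-scan the question asks for is also easy to script yourself. Here is a rough Python sketch (the file name and the date layout are assumptions) that reads everything as text and reports, per column, the maximum length and whether all non-null values parse as int, float or date:
import csv
from datetime import datetime

CSV_PATH = "messy.csv"   # hypothetical input file

def parses(value, caster):
    try:
        caster(value)
        return True
    except (ValueError, TypeError):
        return False

stats = {}   # column -> running summary over the whole file
with open(CSV_PATH, newline="") as fh:
    for row in csv.DictReader(fh):
        for col, raw in row.items():
            value = (raw or "").strip()
            s = stats.setdefault(col, {"max_len": 0, "non_null": 0,
                                       "all_int": True, "all_float": True, "all_date": True})
            if not value:                       # nulls never narrow the guess
                continue
            s["non_null"] += 1
            s["max_len"] = max(s["max_len"], len(value))
            s["all_int"] = s["all_int"] and parses(value, int)
            s["all_float"] = s["all_float"] and parses(value, float)
            s["all_date"] = s["all_date"] and parses(value, lambda v: datetime.strptime(v, "%Y-%m-%d"))

for col, s in stats.items():
    if not s["non_null"]:
        print(col, "-> unknown (all null)")
    elif s["all_int"]:
        print(col, "-> int")
    elif s["all_float"]:
        print(col, "-> float")
    elif s["all_date"]:
        print(col, "-> date")
    else:
        print(col, "-> nvarchar(%d)" % s["max_len"])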

ABAP TVRO field TRAZTD, Route Customizing Data

A customer of mine is looking to mass create some customizing data related to the routes, and as such I have a small program which reads in a CSV file with all of the fields as they would be in the customizing transaction.
I'm having a particular problem wrapping my head around a field TVRO-TRAZTD for a couple of reasons.
The user is only filling in a number which represents a number of days.
There is a conversion exit on TRAZTD, except it's obsolete; "use CONVERT TIMESTAMP instead", they say.
I don't have a timestamp, I have a decimal number representing a part of a day
For example, TRAZTD would be entered as 0,58 from the CSV file, so why is it represented in the table as 135.512?
I tried it the old-fashioned way and multiplied 0,58 * 24, which gives me 13,92. If I take 13,92 * 10000 I get 139.200, which isn't the same but it's the closest I can get, but I don't get why 10000.
Using the conversion exit, even though it's obsolete, doesn't give me a result either; no matter what number I give it I always get 0 back. I can't use CONVERT TIMESTAMP either because, well, it's not a timestamp, or I didn't look up carefully enough how to use it (I didn't see anything other than strings and characters).
The other thing I tried was just saying "screw it" and placing the data from the CSV directly into the field, hoping the conversion routine would take care of the work, but that doesn't happen either.
Is there anybody out here that can maybe shed some light on where the number after the conversion comes from?
Everybody, I came to a solution, just in case anybody stumbles upon this same problem.
I took the value from the Excel document and multiplied it by 24 to get the number of hours, and then multiplied it by 10000 because, I don't know, I picked it randomly.
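For anyone landing here later: the numbers in the question line up with the transit time being stored as hours, minutes and seconds packed together as HHMMSS (shown with three decimals). That is an assumption about TRAZTD, not documentation, but it reproduces the 135.512 exactly, since 0,58 days is 13 h 55 min 12 s. A quick Python check:
# Assumption: TVRO-TRAZTD stores the transit time packed as HHMMSS,
# so a displayed 135.512 would mean 13 hours 55 minutes 12 seconds.
def days_to_packed_hhmmss(days):
    total_seconds = round(days * 24 * 3600)
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return hours * 10000 + minutes * 100 + seconds

print(days_to_packed_hhmmss(0.58))   # 135512, i.e. 135.512
That would also explain why hours times 10000 looks close but is not right: it only matches when the minutes and seconds are zero.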

SSIS CSV File load to table

I have a problem loading a .CSV file, as the Connection Manager editor settings are beyond my knowledge.
When I load the .CSV file with up to 18 rows I have no problem; it loads into the table.
However, from the 19th column the data is not partitioning correctly.
The row delimiter is {CR}{LF}
The column delimiter is comma {,}
How can I partition the data correctly?
Any help?
Here are some ideas I have with no details.
What happens when you try to import the same .CSV file into Excel? Anything interesting around row 19?
Does there appear to be anything different about row 19?
If you delete row 19, what happens?
See, I bet you've thought of these things as well, and probably more, since you have the details. If you want anything more than superficial bad guesses, you'll have to provide a little detail.
I've found the CSV import to be a bit limited with regard to bad data. If you're having trouble with the 19th column, I would suggest figuring out why that column is failing. You can try to tell the import task's error conditions to ignore errors with data truncation, etc., but that may not fix the issue.
I have often switched complicated or error-prone CSV imports to simply use an SSIS Script Task, then just write my own code to parse out the CSV and handle bad data.
If it's not partitioning correctly, it might be something as trivial as one of your field values on row 19 containing a comma, thus throwing off the import by making that row seem to have more columns. If this is the case, I hope you can get a revised version of the CSV file, this time with a text qualifier set. If possible, use something like | rather than " as the qualifier so that it's less likely to appear in the field values.
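If you do end up parsing it yourself (for example in a Script Task), the part that matters is honouring the text qualifier. A small, made-up Python illustration of how an embedded comma behaves with and without one:
import csv, io

raw = 'id,name,city\n19,"Smith, John",Boston\n'   # made-up sample; the second field contains a comma

# Qualifier-aware parsing keeps three columns per row.
rows = list(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))
print(rows[1])                         # ['19', 'Smith, John', 'Boston']

# A naive split (an import with no text qualifier, in effect) sees four columns
# and shifts everything after the embedded comma.
print(raw.splitlines()[1].split(","))  # ['19', '"Smith', ' John"', 'Boston']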
Put the file in a text editor such as Notepad++ or TextPad and change the view to show control characters. You will probably find your culprit there.
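If you don't have such an editor handy, a few lines of Python will show the same thing by flagging any line that contains unexpected control bytes (the file name is a placeholder):
with open("problem.csv", "rb") as fh:                 # hypothetical file name
    for number, line in enumerate(fh, start=1):
        # Flag control characters other than tab (9), LF (10) and CR (13).
        if any(byte < 32 and byte not in (9, 10, 13) for byte in line):
            print(number, repr(line))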
Nothing unusual. When I paste it into Excel as one column and convert text to columns, there is no problem. But I can see in the SSIS preview that the field value where the problem starts has two square boxes and then the data of the next row.
If anyone wants to see the file, let me know and I will e-mail it to you.

In Access 2007 CSV Export: Disable Scientific Notation

When exporting a CSV from Access 2007, it automatically converts decimals into scientific notation.
Unfortunately the tool that receives them treats these fields as text, and displays them as is.
The values being exported are from a query being run against some Excel linked tables, and they appear perfectly in the query view.
Is there any way to disable the automatic conversion to scientific notation?
I.e. if it appears as 0.007 in the query, it will appear as 0.007 in the output CSV rather than 7E-03?
Note: I'm constrained to use Excel and Access for this. As much as I'd like to switch to SQL Server, my wife would be unhappy if I put it on her work laptop!
You have a couple of choices:
you can use the Format() function directly in your query to force the data in the offending columns to be formatted a certain way, for instance:
SELECT ID, Format([Price],"standard") as Pricing FROM ORDERS;
you can write your own CSV export routine in VBA.
I posted one recently as an answer to this question.
You can easily modify the code to format numeric types a certain way.
If you don't know how, let me know and I'll modify the code and post it here.
You could write a short amount of VBA code in Access to query the data from the linked table or Access query and write it out to a text file, thus creating your own .CSV and foregoing the "Wizard". I never liked Access's export "wizard" much, and just created the files myself.
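Not Access VBA, but the same "write the CSV yourself and control the number format" idea looks like this in Python; the rows and file name are stand-ins for the query output:
import csv

rows = [("A-1", 0.007), ("A-2", 1234.5), ("A-3", 0.00042)]   # stand-in for the query results

with open("export.csv", "w", newline="") as fh:              # hypothetical output file
    writer = csv.writer(fh)
    writer.writerow(["ID", "Price"])
    for record_id, price in rows:
        # Fixed-point formatting writes 0.00700, never scientific notation.
        writer.writerow([record_id, "%.5f" % price])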
One easy way to handle this in a query is to double-convert the value to a long integer and then to a string.
For CSV export it is character data anyway.
myValue: ZString(ZLong(123456789))