CSV to SAS dataset: no line-final comma causes problems

I'm trying to import a .CSV file into a SAS dataset, and am having some trouble. Here are two lines of sample input:
Foo,5,10,3.5
Bar,2,3,1.0
The problem I'm having is that the line-final "3.5" and "1.0" are not being correctly interpreted as variable values (instead SAS complains that they are invalid values, giving me a NOTE: Invalid data for VARIABLE error). However, when I add a comma to the end of the line, like so:
Foo,5,10,3.5,
Bar,2,3,1.0,
Then everything works fine. Is there a way that I can make this import work without modifying the source file?
Currently, my DATA step's INFILE statement has the DSD, DLM=',', and MISSOVER options.

With this data in a .csv file in a Windows environment:
Foo,5,10,1.5
Bar,2,3,2.1
Foo,5,10,3.5
Bar,2,3,4.1
This code works (running SAS locally on a Windows machine):
filename f 'D:\Data\SAS\input.csv';
data input;
infile f delimiter=',';
input char1 $ num1 num2 num3;
run;
As @itzy mentioned, the environment matters; more information about it would help narrow down the solution.
When you are working with data from a different environment, you can use the TERMSTR option on the INFILE statement to tell SAS how the lines of data are terminated (for example, TERMSTR=CRLF for files created on Windows or TERMSTR=LF for files created on Unix).

This most likely has to do with the different codes for line endings in Unix and Windows. I'm guessing your data comes from a different operating system than the one you're running SAS on.
The solution is to convert the line endings to those of the operating system SAS runs on. If you're running SAS on a Unix system, try the dos2unix command. If you're running Windows, you can open the CSV file in a text editor like UltraEdit or Notepad++ and save it in Windows format.
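If neither dos2unix nor a suitable editor is at hand, the conversion can also be scripted. Below is a minimal Python sketch, not part of the original answer; the names normalize_newlines and convert_file are made up for illustration. It works on raw bytes and rewrites every line ending (CRLF, lone CR, or lone LF) to a single target terminator.

```python
import re

# Try CRLF first so a Windows line ending is treated as one terminator,
# then fall back to a lone CR or a lone LF.
_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


def normalize_newlines(data: bytes, target: bytes = b"\n") -> bytes:
    """Return data with every line ending rewritten as `target`."""
    # A function replacement avoids any escape processing of `target`.
    return _ANY_NEWLINE.sub(lambda _match: target, data)


def convert_file(src: str, dst: str, target: bytes = b"\n") -> None:
    """Read src as raw bytes and write a copy with normalized endings to dst."""
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(normalize_newlines(data, target))
```

For a Windows-made file read by SAS on Unix, the target would presumably be b"\n"; for the reverse case, pass b"\r\n".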

Related

file "(...).csv" not Stata file error in using merge command

I use Stata 12.
I want to add some country code identifiers from file df_all_cities.csv onto my working data.
However, this line of code:
merge 1:1 city country using "df_all_cities.csv", nogen keep(1 3)
Gives me the error:
. run "/var/folders/jg/k6r503pd64bf15kcf394w5mr0000gn/T//SD44694.000000"
file df_all_cities.csv not Stata format
r(610);
This is an attempted workaround for an earlier problem: the original .dta file would not open in this version of Stata, so I used R to convert it to .csv, but that doesn't work either. I assume it's because using doesn't work with .csv files, but how should I write the command instead?
Your intuition is right. The command merge cannot read a .csv file directly. (using is technically not a command here; it is a common syntax element indicating that a file path follows.)
You need to read the .csv file with the command insheet. You can use it like this.
* Preserve saves a snapshot of your data which is brought back at "restore"
preserve
* Read the csv file. clear can safely be used as data is preserved
insheet using "df_all_cities.csv", clear
* Create a tempfile where the data can be saved in .dta format
tempfile country_codes
save `country_codes'
* Bring back into working memory the snapshot saved at "preserve"
restore
* Merge your country codes from the tempfile to the data now back in working memory
merge 1:1 city country using `country_codes', nogen keep(1 3)
Note that insheet also takes using, and that this command does accept .csv files.

JSON file without line breaks, can't import file to SAS

I have a large JSON file (250 MB) that shows no line breaks when I open it in Notepad or SAS. If I open it in WordPad, however, I get the correct line breaks. From what I have read, this could mean the file uses Unix line breaks, which Notepad can't read but WordPad can.
I need to import the file into SAS. One way of doing this might be to open the file in WordPad and save it as a text file, which will hopefully retain the correct line breaks, so that I can read the file in SAS. I have tried reading the file as it is, but without line breaks I only get the first observation, and I can't get the program to find the next one.
I have tried getting WordPad to save the file, but WordPad crashes each time, probably because of the file size. I also tried doing this through PowerShell, but I can't figure out how to save the file once it is opened, and I see no reason why it should work, seeing as WordPad crashes when I try it through point and click.
Is there another way to fix this JSON file? Is there a way to view the Unix code for line breaks and replace it with Windows line breaks, or something to that effect?
EDIT:
I have tried adding the TERMSTR=LF option in both the FILENAME and INFILE statements, without any luck:
filename test "C:\path";
data datatest;
infile test lrecl = 32000 truncover scanover TERMSTR=LF;
input @'"Id":' ID $9.;
run;
However, if I manually edit a small portion of the file to have line breaks, it works. The TERMSTR option doesn't seem to do much for me.
EDIT 2:
Solved using RECFM=F
data datatest;
infile test lrecl = 42000 truncover scanover RECFM=F ;
input @'"Id":' ID $9.;
run;
EDIT 3:
It turns out that didn't solve the problem after all. RECFM=F means all records have a fixed length, which they don't, so my data gets mixed up and a lot of information is skipped. I tried RECFM=V (variable) as well, but that doesn't work either.
I guess you're using Windows, so try:
TYPE input_filename | MORE /P > output_filename
This should convert a Unix-style text file into a Windows/DOS-style one.
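Before converting anything, it may help to check which line terminators the file actually contains, since the question asks how to see them. Here is a small Python sketch of such a check; it is not from the original answers, and count_line_endings is a name made up for this example. It reads the file in chunks so a 250 MB file does not have to fit in memory at once, and it handles a CRLF pair that happens to be split across two chunks.

```python
def count_line_endings(path, chunk_size=1 << 20):
    """Count CRLF, lone LF and lone CR terminators in a file."""
    crlf = cr = lf = 0
    ended_with_cr = False  # did the previous chunk end in a CR?
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crlf += chunk.count(b"\r\n")
            # A CRLF pair split across the chunk boundary.
            if ended_with_cr and chunk.startswith(b"\n"):
                crlf += 1
            cr += chunk.count(b"\r")
            lf += chunk.count(b"\n")
            ended_with_cr = chunk.endswith(b"\r")
    # Every CRLF contributed one CR and one LF to the raw totals.
    return {"CRLF": crlf, "LF": lf - crlf, "CR": cr - crlf}
```

If all three counts come back as zero, the file really is one long record, and changing TERMSTR alone cannot split it into observations.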
250 Mbytes is not too long to treat as a single record.
data want ;
infile json lrecl=250000000; *250 Mb ;
input @'"Id":' ID :$9. @@;
run;
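The idea behind the step above is to ignore line breaks entirely and jump from one "Id": marker to the next. As a rough cross-check outside SAS, the same scan can be sketched in Python. This is only an approximation of what the INPUT statement does, and extract_ids is a made-up name: after each marker it skips whitespace and takes up to 9 non-blank characters.

```python
import re
from typing import List


def extract_ids(data: bytes, width: int = 9) -> List[bytes]:
    """Return up to `width` non-blank bytes following each "Id": marker."""
    # Skip optional whitespace after the marker, then capture the value.
    pattern = re.compile(rb'"Id":\s*(\S{1,%d})' % width)
    return pattern.findall(data)


if __name__ == "__main__":
    sample = b'{"Id":"abc1234","x":1}{"Id":"def5678"}'
    for value in extract_ids(sample):
        print(value)
```

Like the SAS version, this captures raw characters, so quotes or a trailing comma can end up in the value; a real JSON parser would be more robust if the file is valid JSON.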

Load csv file with integers in Octave 3.2.4 under Windows

I am trying to import into Octave a file (e.g. data.txt) containing 2 columns of integers, such as:
101448,1077
96906,924
105704,1017
I use the following command:
data = load('data.txt')
However, the "data" matrix that results has a 1 x 1 dimension, with all the content of the data.txt file saved in just one cell. If I adjust the numbers to look like floats:
101448.0,1077.0
96906.0,924.0
105704.0,1017.0
the loading works as expected, and I obtain a matrix with 3 rows and 2 columns.
I looked at the various options that can be set for the load command but none of them seem to help. The data file has no headers, just plain integers, comma separated.
Any suggestions on how to load this type of data? How can I force Octave to cast the data as numeric?
The load function is not meant for reading CSV files. It is meant to load files that were saved from Octave itself and that define variables.
To read a CSV file, use csvread ("data.txt"). Also, 3.2.4 is a very old version that is no longer supported; you should upgrade.

How to import multiline CSV in SAS

I got a file in this format.
abc;def;"ghi
asdasd
asdasd
asd
asd
aas
d
"
Now I want to import it with SAS. How do I handle the multiline values?
The answer might depend on what causes the linefeeds to be there, what kind of linefeeds they are, and possibly also on the OS you're running SAS on as well as the version of SAS you're using. Not knowing any of the answers to these questions, here are a couple of suggestions:
First, you could try this infile statement on your data step:
infile "C:\test.csv" dsd delimiter=';' termstr=crlf;
The termstr=crlf option tells SAS to treat only Windows line endings (CR+LF) as record terminators, so a lone linefeed inside a quoted value no longer starts a new record.
Alternatively, you could have SAS pre-process your file byte by byte to ensure that any linefeeds within paired quotes are replaced (perhaps with spaces):
data _null_;
infile 'C:\test.csv' recfm=n;
file 'C:\testFixed.csv' recfm=n;
input a $char1.;
retain open 0;
if a='"' then open=not open;
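/* Inside quotes, a LF or CR is written out as a null byte ('00'x); use ' ' instead to write a space */
if (a='0A'x or a='0D'x) and open then put '00'x @;
else put a $char1. @;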
run;
This is adapted from here for your reference. You might need to tinker around with this code a bit to get it working. The idea is that you would then read the resulting csv into SAS with a standard data step.
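For readers who want to test the pre-processing idea outside SAS, here is the same quote-tracking logic as a short Python sketch. It is an illustration only, replace_embedded_newlines is a made-up name, and unlike the SAS step above it writes a space rather than a null byte by default.

```python
def replace_embedded_newlines(data: bytes, replacement: bytes = b" ") -> bytes:
    """Replace CR/LF bytes that occur inside double-quoted fields."""
    out = bytearray()
    in_quotes = False
    for byte in data:
        if byte == 0x22:  # a double quote toggles the "inside quotes" state
            in_quotes = not in_quotes
        if in_quotes and byte in (0x0A, 0x0D):
            out += replacement  # embedded line break: substitute it
        else:
            out.append(byte)  # everything else is copied unchanged
    return bytes(out)


if __name__ == "__main__":
    raw = b'abc;def;"ghi\nasd\r\nd\n"\nxyz;1;"ok"\n'
    print(replace_embedded_newlines(raw))
```

A doubled quote ("") inside a field toggles the state twice, so it leaves the state unchanged. Line breaks outside quotes are kept, so the result still has one record per line.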

Can SAS convert CSV files into Binary Format?

The output we need to produce is a standard delimited file but instead of ascii content we need binary. Is this possible using SAS?
Is there a specific Binary Format you need? Or just something non-ascii? If you're using proc export, you're probably limited to whatever formats are available. However, you can always create the csv manually.
If anything will do, you could simply zip the csv file.
Running on a *nix system, for example, you'd use something like:
filename outfile pipe "gzip -c > myfile.csv.gz";
Then create the csv manually:
data _null_;
set mydata;
file outfile dsd dlm=',';
put var1 var2 var3;
run;
If this is PC/Windows SAS, I'm not as familiar, but you'll probably need to install a command-line zip utility.
This link from SAS suggests using winzip, which has a freely downloadable version. Otherwise, the code is similar.
http://support.sas.com/kb/26/011.html
You can actually make a CSV file as a SAS catalog entry; CSV is a valid SAS Catalog entry type.
Here's an example:
filename of catalog "sasuser.test.class.csv";
proc export data=sashelp.class
outfile=of
dbms=dlm;
delimiter=',';
run;
filename of clear;
This little piece of code exports SASHELP.CLASS to a SAS Catalog entry of entry type CSV.
This way you get a binary format you can move between SAS installations on different platforms with PROC CPORT/CIMPORT, not having to worry if the used binary package format is available to your SAS session, since it's an internal SAS format.
Are you saying you have binary data that you want to output to csv?
If so, I don't think there is necessarily a defined standard for how this should be handled.
I suggest trying it (proc export comes to mind) and seeing if the results match your expectations.
Using SAS, output a .csv file, then open it in Excel and Save As whichever format your client wants. You can automate this process with a little bit of scripting in ### as well. (Substitute ### with your favorite scripting language.)