I am pretty new to SAS programming and am trying to find the most efficient way to handle my current task. Basically, I need to modify an existing .csv file stored on the SAS server and save it to my folder on the same server.
Modification required:
keep .csv as the format
use "|" instead of "," as delimiter
have the following output name: filename_YYYYMMDDhhmmss.csv
keep only 4 variables from the original file
rename some of the variables we keep
Here is the script I am currently using, but there are a few issues with it:
PROC IMPORT OUT = libname.original_file (drop=var0)
    FILE = "/.../file_on_server.csv"
    DBMS = CSV
    REPLACE;
RUN;

%PUT date_human = %SYSFUNC(PUTN(%sysevalf(%SYSFUNC(TODAY())-1), datetime20.));

proc export data = libname.original_file (rename=(var1=VAR11 var2=VAR22 Type=VAR33))
    outfile = '/.../filename_&date_human..csv' label dbms=csv replace;
    delimiter='|';
run;
I also have an issue when renaming the variable called "Type", as it looks like there is a conflict with one of the system keywords. The date format is not right either, and I was not able to find the exact format on the SAS forums, unfortunately.
Any advice on how to make this script more efficient is greatly appreciated.
I wouldn't bother with trying to actually read the data into a SAS dataset. Just process it and write it back out. If the input structure is consistent then it is pretty simple. Just read everything as character strings and output the columns that you want to keep.
Let's assume that the data has 12 columns and the last of the four that you want to keep is the 10th column. So you only need to read in 10 of them.
First setup your input and output filenames in macro variables to make it easier to edit. You can use your logic for generating the filename for the new file.
%let infile=/.../file_on_server.csv;
%let outfile=/.../filename_&date_human..csv;
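The question asks for a filename_YYYYMMDDhhmmss.csv name; one sketch of a way to build that timestamp with standard date/time functions is below (the date_human name is just carried over from the question, and this step needs to run before the %let above so that &date_human resolves):

data _null_;
  now = datetime();
  /* YYYYMMDD from the date part, then zero-padded hour, minute and second */
  call symputx('date_human',
    put(datepart(now), yymmddn8.) ||
    put(hour(now), z2.) || put(minute(now), z2.) || put(int(second(now)), z2.));
run;
%put NOTE: the output file will be filename_&date_human..csv;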
Then use a simple DATA _NULL_ step to read the data as character strings and write it back out. You can even change the relative order of the four columns if you want. So this program will copy the 2nd, 5th, 4th and 10th columns and change the column headers to NewName1, NewName2, NewName3 and NewName4.
data _null_;
  infile "&infile" dsd dlm=',' truncover;
  file "&outfile" dsd dlm='|';
  length var1-var10 $200 ;
  input var1-var10;
  if _n_=1 then do;
    var2='NewName1';
    var5='NewName2';
    var4='NewName3';
    var10='NewName4';
  end;
  put var2 var5 var4 var10 ;
run;
If some of the data for the four columns you want to keep are longer than 200 characters then just update the LENGTH statement.
So let's try a little experiment. First let's make a dummy CSV file.
filename example temp;
data _null_;
  file example;
  input;
  put _infile_;
cards4;
a,b,c,d,e,f,g,h,i,j,k,l,m
1,2,3,4,5,6,7,8,9,10,11,12,13
o,p,q,r,s,t,u,v,w,x,y,z
;;;;
Now let's try running it. I will modify the INFILE and FILE statements to read from my temp file and write the result to the log.
infile example /* "&infile" */ dsd dlm=',' truncover;
file log /* "&outfile" */ dsd dlm='|';
Here are the resulting rows written.
NewName1|NewName2|NewName3|NewName4
2|5|4|10
p|s|r|x
I cannot quite figure out how to change the format of a column in my data file. I imported the data set with PROC IMPORT, and it guessed the format of a specific column as numeric; I would like it to be character-based.
This is where I'm currently at, and it does not change the format of my NUMBER column:
proc import
    datafile = 'datapath'
    out = dataname
    dbms = CSV
    replace;
  format NUMBER $8.;
  guessingrows = 20000;
run;
You could import the data and then convert the column afterwards - I believe the following would work.
proc sql;
  create table want as
    select *,
           put(Number, 4.) as CharacterVersion
    from data;
quit;
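If you want the character version to replace the numeric column outright (same name, character type), a data step along these lines should also work; it is only a sketch, reusing the Number column from the question and assuming the 4. width is wide enough:

data want2;
  set data;
  /* make a character copy, drop the numeric original, then give the copy its name */
  CharacterVersion = put(Number, 4.);
  drop Number;
  rename CharacterVersion = Number;
run;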
You cannot change the type/format via PROC IMPORT. However, you can write a data step to read in the file and then customize everything. If you're not sure how to start with that, check the log after you run a PROC IMPORT and it will have the 'skeleton' code. You can copy that code, edit it, and run to get what you need. Writing from scratch also works using an INFILE and INPUT statement.
From the help file (search for "Processing Delimited Files in SAS")
If you need to revise your code after the procedure runs, issue the RECALL command (or press F4) to the generated DATA step. At this point, you can add or remove options from the INFILE statement and customize the INFORMAT, FORMAT, and INPUT statements to your data.
Granted, the grammar in this section is horrific! The idea is that the IMPORT Procedure generates source code that can be recalled and modified for subsequent submission.
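As a rough illustration of that approach, a recalled-and-edited skeleton might end up looking something like this; the second column and both informats are made up, and the point is simply that NUMBER gets a character informat:

data dataname;
  infile 'datapath' dsd dlm=',' firstobs=2 truncover;
  informat NUMBER $8.;        /* read the column as character instead of letting IMPORT guess numeric */
  informat other_var best32.; /* hypothetical second column */
  input NUMBER $ other_var;
run;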
I imported a CSV in SAS; however, the format was incorrect in the original file. I am working with addresses, so for example, the city will be incorrectly concatenated to the street variable, or the zip code will end up in the city variable. How do I set these parameters after importing? When I tried to set the length, it gave me a message saying that the length was already set and that I should work with the DATA step. I do not know where exactly to do this.
Well, you can manually define what is read into SAS and how. Here is an example based on the code PROC IMPORT generates.
Just change the delimiter (';' in this case). Also, depending on whether your data has a header row, set FIRSTOBS properly. Other than that, just list the variables and their attributes.
data WORK.Imported;
  %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
  infile 'c:\input\datafile.csv' delimiter=';' MISSOVER DSD lrecl=13106 firstobs=2;
  informat first_var $15. ;
  informat second_var $24. ;
  informat third_var best32. ;
  /*... add as many as your data has */
  format first_var $15. ;
  format second_var $24. ;
  format third_var best12. ;
  /*... add as many as your data has */
  input
    first_var $
    second_var $
    third_var
    /*... add as many as your data has */
  ;
  if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
A lazy approach is just using PROC IMPORT with the GUESSINGROWS=MAX option:
proc import datafile="c:\input\input.csv" out=imported replace;
  delimiter=";";
  getnames=yes;
  guessingrows=MAX;
run;
Be aware that with very large files this is going to take a long time. It is usually better to set the row count to something 'sufficiently large', like 32000.
For more on importing, see the PROC IMPORT documentation, or more generally the documentation on importing and exporting.
This issue happened to me as well; the problem is that there are some 'line breaks' in the input CSV file. If you replace all the line breaks with spaces, save the file, and then import it in SAS, it will import the data correctly.
Easy way to do this is:
Press Ctrl+H to open the Find & Replace dialog box.
In the Find What field enter Ctrl+J. It will look empty, but you will see a tiny dot.
In the Replace With field, enter the value to replace line breaks with. Usually it is a space, to avoid two words joining accidentally. If all you need is to delete the line breaks, leave the "Replace With" field empty.
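If editing the file by hand is not an option, and the real records happen to end in CRLF while the stray breaks inside fields are bare line feeds, the TERMSTR= option on the INFILE statement can sometimes avoid the problem. This is only a sketch under that assumption, with made-up column names, so check how your file was written first:

data imported;
  infile 'c:\input\datafile.csv' dsd dlm=',' truncover termstr=crlf firstobs=2;
  length var1 var2 var3 $200; /* hypothetical columns and lengths */
  input var1 var2 var3;
run;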
I have a text file and I want to change the delimiter from comma to pipe (|). Here is what the data file look like-
P0020016,450.05,20150818000000,24.1,140,1
P0020016,450.05,20150818010000,24.1,140,1
P0020016,450.05,20150818020000,24.1,140,1
How can I change the commas to pipe in SAS? I tried using ODS CSV but it did not work. Thanks!
I would just use a simple DATA _NULL_ step to make that change.
data _null_;
  length x1-x6 $200 ;
  infile 'old.csv' dsd dlm=',' truncover ;
  file 'new.pipe' dsd dlm='|' ;
  input x1-x6 ;
  put x1-x6;
run;
You could make the 6 into a macro variable. You could even add a step that reads the first line, counts how many columns there are, and sets the macro variable (a sketch follows below). You can change the length of the character variables you read the data into if $200 is too short.
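Here is a small sketch of that counting step, assuming the header row uses the same delimiter; the 'q' modifier makes COUNTW respect quoted fields:

data _null_;
  infile 'old.csv' obs=1;
  input;
  call symputx('ncols', countw(_infile_, ',', 'q'));
run;
%put NOTE: found &ncols columns;

You could then write x1-x&ncols in the LENGTH, INPUT, and PUT statements of the copy step.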
It's not clear exactly what you're looking for but this might help:
data test;
  text = "P0020016,450.05,20150818000000,24.1,140,1";
  text2 = tranwrd(text,',','|');
run;
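If the goal is to convert the whole file rather than a single string, the same idea can be applied line by line in a DATA _NULL_ step. This is only a sketch with assumed file names, and note that a blanket TRANWRD also replaces commas that sit inside quoted fields:

data _null_;
  infile 'old.csv' truncover;
  file 'new.pipe';
  length line $32767;
  input;
  line = tranwrd(_infile_, ',', '|'); /* swap every comma for a pipe */
  put line;
run;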
I would read the file into a dataset, then export with the DATA step below.
This will export your file and declare your delimiter as "|" (or whatever other character you want to specify)
DATA _NULL_;
  FILE "path.txt" DLM='|' TERMSTR=CRLF;
  SET have;
  IF _N_ = 1 THEN
    PUT 'var1|var2|var3';
  PUT var1 var2 var3;
RUN;
CRLF will make sure it does not create one long row.
I have a table which I need to update. From my understanding, you can do something like the following:
data new_table;
  update old_table update_table;
  by some_key;
run;
My issue (well I have a few...) is that I'm importing the "update_table" from a CSV file and the formats aren't matching the "old_table", so the update fails.
I've tried creating the "update_table" from the "old_table" using proc sql create table with zero observations, which created the correct types/formats, but then I was unable to insert data into it without replacing it.
The other major issue I have is that there are a large number of columns (480), and custom formats, and I've run up against a 6000 character limit for the script.
I'm very new to SAS and any help would be greatly appreciated :)
It sounds like you need to use a data step to read in your CSV. There are lots of papers out there explaining how to do this, so I won't cover it here. This will allow you to specify the format (numeric/character) for each field. The nice thing here is you already know what formats they need to be in (from your existing dataset), so you can create this read in fairly easily.
Let's say your data is so:
data have;
  informat x date9.;
  input x y z $;
  datalines;
10JAN2010 1 Base
11JAN2010 4 City
12JAN2010 8 State
;;;;
run;
Now, if you have a CSV of the same format, you can read it in by generating the input code from the above dataset. You can use PROC CONTENTS to do this, or you can generate it by using dictionary.columns, which has the same information as PROC CONTENTS.
proc sql;
  select catx(' ', name, ifc(type='char', '$', ' ')) into :inputlist
    separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='HAVE';
  select catx(' ', name, informat) into :informatlist
    separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='HAVE'
      and not missing(informat);
quit;
The above are two examples; they may or may not be sufficient for your particular needs.
Then you use them like so:
data want;
  infile datalines dlm=',';
  informat &informatlist.;
  input &inputlist.;
  datalines;
13JAN2010,9,REGION
;;;;
run;
(obviously you would use your CSV file instead of datalines, just used here as example).
The point is you can write the data step code using the metadata from your original dataset.
I needed this today, so I made a macro out of it: https://core.sasjs.io/mp__csv2ds_8sas.html
It doesn't wrap the input statement so it may break with a large number of columns if you have batch line length limits. If anyone would like me to fix that, just raise an issue: https://github.com/sasjs/core/issues/new
I have worked with coworkers on this, googled around, edited this code a million times and I cannot get it to work.
Essentially, I am trying to stack multiple CSV files into one SAS dataset. Earlier in my SAS program I built the ability to find all of the names of the files [variable fname inside dirlist1]. I've been trying to get this code to work, but the problem is that some of the observations within these CSV files are blank. So, for example, the column "apples" (see below) will be mostly blank but will occasionally have data. Right now this code reads in the right data, but when an observation is blank (e.g. "apples" is blank for that observation), it shifts my data to the left instead of leaving that part blank. Is there something I am missing in this current code that can solve that?
Basically, given text,text,,text,text it's skipping that blank between the commas and continuing on, and I WANT that blank.
data all_data (drop=fname);
  length bananas $256;
  length apples $25;
  length grapefruit $10;
  length berries $10;
  set dirlist1;
  filepath = "&dirname"||fname;
  infile dummy filevar=filepath length=reclen firstobs=2 dlm=',' end=done missover;
  do while(not done);
    myfilename = fname;
    input bananas apples grapefruit berries;
    output;
  end;
run;
Edit:
To note, I based this code on code published on a UCLA-based site.
Add the DSD modifier to your infile statement.
infile dummy filevar = filepath length=reclen firstobs=2 dlm=',' end=done missover DSD;
That will tell it to change the default treatment of consecutive delimiters (and also allows it to correctly handle quoted fields with embedded delimiters).
See the documentation on INFILE for more information.
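As a tiny standalone illustration of what DSD changes (not the poster's data):

data test;
  infile datalines dsd dlm=',' missover;
  length a b c $10;
  input a b c;
  datalines;
x,,z
;
run;

With DSD, the middle value comes through as missing (a=x, b=blank, c=z); without it, the two consecutive commas are treated as a single delimiter and z lands in b.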