I have a text file and I want to change the delimiter from comma to pipe (|). Here is what the data file looks like:
P0020016,450.05,20150818000000,24.1,140,1
P0020016,450.05,20150818010000,24.1,140,1
P0020016,450.05,20150818020000,24.1,140,1
How can I change the commas to pipe in SAS? I tried using ODS CSV but it did not work. Thanks!
I would just use a simple DATA _NULL_ step to make that change.
data _null_;
length x1-x6 $200 ;
infile 'old.csv' dsd dlm=',' truncover ;
file 'new.pipe' dsd dlm='|' ;
input x1-x6 ;
put x1-x6;
run;
You could make the 6 into a macro variable. You could even add a step to read the first line and count how many columns there are and set the macro variable. You can change the length of the character variables you read the data into if $200 is too short.
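As a minimal sketch of that idea, assuming the first line of old.csv has the same number of fields as the rest of the file, you could count the columns with COUNTW and store the result in a macro variable. The name ncols is made up for this example.

```sas
/* Count the fields on the first line only (obs=1) */
data _null_;
  infile 'old.csv' obs=1;
  input;
  /* 'q' ignores commas inside quoted values, 'm' counts empty fields too */
  call symputx('ncols', countw(_infile_, ',', 'qm'));
run;

/* Same copy step as above, with the hard-coded 6 replaced by &ncols */
data _null_;
  length x1-x&ncols $200;
  infile 'old.csv' dsd dlm=',' truncover;
  file 'new.pipe' dsd dlm='|';
  input x1-x&ncols;
  put x1-x&ncols;
run;
```

If the first line is a header row, this still works as long as the header has one name per column.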
It's not clear exactly what you're looking for but this might help:
data test;
text = "P0020016,450.05,20150818000000,24.1,140,1";
text2 = tranwrd(text,',','|');
run;
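If you want to apply the same idea to a whole file rather than a single string, one possible sketch is to modify the automatic _INFILE_ buffer line by line. This assumes no value contains a comma inside quotes, because TRANWRD replaces every comma blindly. The file names old.csv and new.pipe are placeholders and the LRECL value is a guess at a safe maximum line length.

```sas
data _null_;
  infile 'old.csv' lrecl=32767;
  file 'new.pipe' lrecl=32767;
  /* An empty INPUT statement loads the raw line into _INFILE_ */
  input;
  /* Replace every comma in the raw line with a pipe */
  _infile_ = tranwrd(_infile_, ',', '|');
  put _infile_;
run;
```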
I would read the file into a dataset, then export with the DATA step below.
This will export your file and declare your delimiter as "|" (or whatever other character you want to specify)
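DATA _NULL_;
FILE "path.txt" DLM = '|' TERMSTR=CRLF;
SET have;
/* Write the header row with the same delimiter as the data */
IF _N_ = 1 THEN
PUT 'var1|var2|var3';
PUT var1 var2 var3;
RUN;
TERMSTR=CRLF ends each record with a carriage return and line feed, which makes sure the file is not read as one long row.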
I imported a CSV into SAS, but the original file was formatted incorrectly. I am working with addresses, so, for example, the city is incorrectly concatenated to the street variable, or the zip code ends up in the city variable. How can I set the parameters after importing? When I tried to set the length, I got a message saying that the length had already been set and that I should work with the DATA step. I do not know where exactly to do this.
Well, you can manually define how each line is read into SAS. Here is an example based on the code that PROC IMPORT generates.
Just change the delimiter (; in this case). Also, if your data has a header row, set FIRSTOBS accordingly. Other than that, just list the variables and their attributes.
data WORK.Imported;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'c:\input\datafile.csv' delimiter =';' MISSOVER DSD lrecl=13106 firstobs=2 ;
informat first_var $15. ;
informat second_var $24. ;
informat third_var best32. ;
/*... add as many as your data has */
format first_var $15. ;
format second_var $24. ;
format third_var best12. ;
/*... add as many as your data has */
input
First_var $
Second_var $
Third_var
/*... add as many as your data has */
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
The lazy approach is to use PROC IMPORT with the GUESSINGROWS=MAX option:
proc import datafile="c:\input\input.csv" out=imported dbms=dlm replace;
DELIMITER=";" ;
getnames=yes;
guessingrows=MAX;
run;
Be aware that with very large files this is going to take a long time. It is usually better to set the number of rows to something 'sufficiently large', like 32k.
For more on importing, see PROC IMPORT, or more generally the material on importing/exporting.
This issue happened to me as well. The problem is that there are some line breaks inside values in the input CSV file. If you replace all of those line breaks with spaces, save the file, and then import it into SAS, the data will import correctly.
An easy way to do this is:
Press Ctrl+H to open the Find & Replace dialog box.
In the Find What field enter Ctrl+J. It will look empty, but you will see a tiny dot.
In the Replace With field, enter the value that should replace the line breaks. Usually this is a space, so that two words are not joined accidentally. If all you need is to delete the line breaks, leave the "Replace With" field empty.
See this page for other ways:
I am pretty new to SAS programming and am trying to find the most efficient way to handle my current ongoing initiative. Basically, I need to modify an existing .csv file stored on the SAS server and save it in my folder on the same server.
Modification required:
keep .csv as format
use "|" instead of "," as delimiter
have the following output name: filename_YYYYMMDDhhmmss.csv
keep only 4 variables from the original file
rename some of the variables we keep
Here is the script I am currently using, but there are a few issues with it:
PROC IMPORT OUT = libname.original_file (drop=var0)
FILE = "/.../file_on_server.csv"
DBMS = CSV
REPLACE;
RUN;
%PUT date_human = %SYSFUNC(PUTN(%sysevalf(%SYSFUNC(TODAY())-1), datetime20.));
proc export data = libname.original_file ( rename= ( var1=VAR11 var2=VAR22 Type=VAR33 ))
outfile = '/.../filename_&date_human..csv' label dbms=csv replace;
delimiter='|';
run;
I also have an issue with the variable called "Type" when renaming it as it looks like there is a conflict with some of the system key words. Date format is not good either, and I was not able to find the exact format on the SAS forums, unfortunately.
Any advice on how to make this script more efficient is greatly appreciated.
I wouldn't bother with trying to actually read the data into a SAS dataset. Just process it and write it back out. If the input structure is consistent then it is pretty simple. Just read everything as character strings and output the columns that you want to keep.
Let's assume that the data has 12 columns and the last one of the four that you want to keep is the 10th column. So you only need to read in 10 of them.
First set up your input and output filenames in macro variables to make them easier to edit. You can use your logic for generating the filename for the new file.
%let infile=/.../file_on_server.csv;
%let outfile=/.../filename_&date_human..csv;
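The date logic in the question does not produce the YYYYMMDDhhmmss pattern. One possible way to build it, as a sketch, is a small DATA _NULL_ step that runs before the two %LET statements, so that &date_human already exists when outfile is assigned. The variable names now and stamp are made up for this example, and it uses the current date and time, so adjust it if you need yesterday's date as in the original %PUT line.

```sas
data _null_;
  /* Current datetime, rounded down to whole seconds */
  now = floor(datetime());
  /* YYYYMMDD from the date part, then zero-padded hh, mm and ss */
  stamp = cats(put(datepart(now), yymmddn8.),
               put(hour(now), z2.),
               put(minute(now), z2.),
               put(second(now), z2.));
  /* Store the 14-character stamp in the macro variable used by outfile */
  call symputx('date_human', stamp);
run;

%put NOTE: date_human=&date_human;
```

Note that the file name has to be in double quotes wherever it is used, as in the FILE statement below, because macro variables do not resolve inside single quotes.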
Then use a simple DATA _NULL_ step to read the data as character strings and write it back out. You can even change the relative order of the four columns if you want. So this program will copy the 2nd, 5th, 4th and 10th columns and change the column headers to NewName1, NewName2, NewName3 and NewName4.
data _null_;
infile "&infile" dsd dlm=',' truncover;
file "&outfile" dsd dlm='|';
length var1-var10 $200 ;
input var1-var10;
if _n_=1 then do;
var2='NewName1';
var5='NewName2';
var4='NewName3';
var10='NewName4';
end;
put var2 var5 var4 var10 ;
run;
If some of the data for the four columns you want to keep are longer than 200 characters then just update the LENGTH statement.
So let's try a little experiment. First let's make a dummy CSV file.
filename example temp;
data _null_;
file example ;
input;
put _infile_;
cards4;
a,b,c,d,e,f,g,h,i,j,k,l,m
1,2,3,4,5,6,7,8,9,10,11,12,13
o,p,q,r,s,t,u,v,w,x,y,z
;;;;
Now let's try running it. I will modify the INFILE and FILE statements to read from my temp file and write the result to the log.
infile example /* "&infile" */ dsd dlm=',' truncover;
file log /* "&outfile" */ dsd dlm='|';
Here are the resulting rows written.
NewName1|NewName2|NewName3|NewName4
2|5|4|10
p|s|r|x
I have a particular problem. I have exported a CSV file where, for some columns, I needed to put the data in quotation marks because of leading zeros, and because a long number sometimes ends up with an "E" in it on export. Now I am trying to import the same file into SAS to see if my PROC IMPORT routine works.
When I import the file, all of the data comes through but is compressed into two columns (is something wrong with my delimiter?), although I actually exported 20 columns.
Not all columns are enclosed in quotation-marks, just a couple of them. An example of the data:
CustomerID CustomerName Product Price BillingNR
"01234" Customer 1 Product1 Price1 "03541"
"52465" Customer 2 Product2 Price2 ""
"23454" Customer 3 Product3 Price3 "035411236952154589632154"
CustomerID and BillingNR are then enclosed in quotation marks.
How can I import this dataset when only some of the columns are enclosed in quotation marks while others aren't? Or can I simply remove all double quotes from the data when importing? Here's my code:
%macro import;
%if &exist= "Yes" %then %do;
proc import
datafile= "\\mypath\data.csv"
DBMS=CSV
out=Sales
replace;
getnames=YES;
run;
%end;
%else %do;
%put Nothing happens;
%end;
%mend;
%lesInn;
The IF/ELSE test relies on another macro where I test whether the specified file exists. I have tried researching different methods, and am still looking for similar problems, but nothing has seemed to work.
All answers much appreciated.
Toor
If you read the file using the DSD option, SAS will automatically remove the quotes from around the values, even quotes around values that do not need to be quoted, like most of your example data.
data want ;
infile cards dsd truncover firstobs=2;
length CustomerID $5 CustomerName $20 Product $20 Price $8 BillingNR $30 ;
input CustomerID -- BillingNR ;
cards;
CustomerID,CustomerName,Product,Price,BillingNR
"01234",Customer 1,Product1,Price1,"03541"
"52465",Customer 2,Product2,Price2,""
"23454",Customer 3,Product3,Price3,"035411236952154589632154"
;
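This will result in values with the quotes removed, e.g. CustomerID=01234 and BillingNR=03541 for the first row. The leading zeros are preserved because the variables are read as character.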
CSV -> Comma Separated Values
I don't see commas being used as your delimiters, but pipes.
Specify that your delimiter is a pipe, and increase the GUESSINGROWS option to a large number so it assigns the correct length and type.
Proc import ... DBMS = DLM Replace;
Delimiter='|';
GuessingRows=10000;
....remaining options;
Run;
I'm still not sure Proc Import will work. If it doesn't you'll need to write the data step code and make sure to specify the DSD option which will deal with the quotes.
Edit: Based on question edit, most accurate method is to read via a data step. As mentioned the DSD option will handle the quotes.
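A minimal sketch of such a data step, assuming the file is pipe-delimited with a header row. The path is the placeholder from the question, the dataset name sales and the variable lengths are guesses, and only the five columns shown in the example are listed, so you would extend the LENGTH statement to cover all 20 columns.

```sas
data sales;
  /* DSD strips the surrounding quotes and keeps empty fields as missing */
  infile "\\mypath\data.csv" dsd dlm='|' truncover firstobs=2;
  /* Character variables keep leading zeros; the lengths are assumptions */
  length CustomerID $5 CustomerName $20 Product $20 Price $8 BillingNR $30;
  /* Reads every variable from CustomerID to BillingNR in LENGTH order */
  input CustomerID -- BillingNR;
run;
```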
3 and have a table which I need to update. From my understanding, you can do something like the following:
data new_table;
update old_table update_table;
by some_key;
run;
My issue (well I have a few...) is that I'm importing the "update_table" from a CSV file and the formats aren't matching the "old_table", so the update fails.
I've tried creating the "update_table" from the "old_table" using proc sql create table with zero observations, which created the correct types/formats, but then I was unable to insert data into it without replacing it.
The other major issue I have is that there are a large number of columns (480), and custom formats, and I've run up against a 6000 character limit for the script.
I'm very new to SAS and any help would be greatly appreciated :)
It sounds like you need to use a data step to read in your CSV. There are lots of papers out there explaining how to do this, so I won't cover it here. This will allow you to specify the format (numeric/character) for each field. The nice thing here is you already know what formats they need to be in (from your existing dataset), so you can create this read in fairly easily.
Let's say your data is so:
data have;
informat x date9.;
input x y z $;
datalines;
10JAN2010 1 Base
11JAN2010 4 City
12JAN2010 8 State
;;;;
run;
Now, if you have a CSV of the same format, you can read it in by generating the input code from the above dataset. You can use PROC CONTENTS to do this, or you can generate it by using dictionary.columns, which has the same information as PROC CONTENTS.
proc sql;
select catx(' ',name,ifc(type='char', '$' ,' '))into :inputlist
separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE';
select catx(' ',name,informat) into :informatlist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not missing(informat);
quit;
The above are two examples; they may or may not be sufficient for your particular needs.
Then you use them like so:
data want;
infile datalines dlm=',';
informat &informatlist.;
input &inputlist.;
datalines;
13JAN2010,9,REGION
;;;;
run;
(obviously you would use your CSV file instead of datalines, just used here as example).
The point is you can write the data step code using the metadata from your original dataset.
I needed this today, so I made a macro out of it: https://core.sasjs.io/mp__csv2ds_8sas.html
It doesn't wrap the input statement so it may break with a large number of columns if you have batch line length limits. If anyone would like me to fix that, just raise an issue: https://github.com/sasjs/core/issues/new
I have worked with coworkers on this, googled around, edited this code a million times and I cannot get it to work.
Essentially, I am trying to stack multiple CSV files into one SAS dataset. Earlier in my SAS program I created the ability to find all of the names of the files [variable fname inside dirlist1]. I've been trying to get this code to work, but the problem is that some of the observations within these CSV files are blank. So, for example, column "apples" (see below) will have a majority of the column blank, but will occasionally have data. Right now this code reads in the right data, but when an observation is blank (e.g. for an observation, "apples" is blank), it shifts my data to the left instead of leaving that part blank. Is there something I am missing in this current code that can solve that?
Basically it's skipping texttext,text,,text, text < it's skipping that blank between the commas and continuing on and I WANT that blank.
data all_data (drop=fname);
length bananas $256;
length apples $25;
length grapefruit $10;
length berries $10;
set dirlist1;
filepath = "&dirname"||fname;
infile dummy filevar = filepath length=reclen firstobs=2 dlm=',' end=done missover;
do while(not done);
myfilename = fname;
input bananas apples grapefruit berries;
output;
end;
run;
Edit:
To note, I based this code on code published on a UCLA-based site.
Add the DSD option to your INFILE statement.
infile dummy filevar = filepath length=reclen firstobs=2 dlm=',' end=done missover DSD;
That changes the default treatment of consecutive delimiters, so that two delimiters in a row are read as a missing value instead of being treated as one (it also lets SAS correctly handle quoted fields with embedded delimiters).
See the documentation on INFILE for more information.
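A small self-contained illustration, using made-up in-stream data rather than your files, shows the effect. With DSD, the empty field between two commas is read as a missing value instead of being skipped, and a quoted value may contain a comma.

```sas
data demo;
  infile datalines dsd dlm=',' missover;
  length bananas $256 apples $25 grapefruit $10 berries $10;
  input bananas apples grapefruit berries;
datalines;
text,,text,text
text,"a,b",,text
;
run;

/* Row 1: apples is blank. Row 2: apples is a,b and grapefruit is blank. */
proc print data=demo;
run;
```

If you remove DSD from the INFILE statement and rerun it, the values shift to the left, as described in the question.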