Proc json produces extra blanks after applying a format - json

I would like to export a SAS dataset to JSON. I need to apply the commax10.1 format to make it suitable for some language versions. The problem is that the fmtnumeric option applies the format correctly but inserts extra blanks inside the quotes. I have tried trimblanks and other options but have not been able to get rid of them. How can I delete the blanks inside the quotes? Note: I would like the values to remain inside the quotes.
In addition, is it possible to replace the null values with ""?
Sample data:
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'G:\test.json' pretty fmtnumeric nosastags trimblanks keys;
export testdata_;
run;
[Image: the JSON output, where each quoted value contains leading blanks, e.g. "var1": "       3,2"]

Use a custom format function that strips the leading and trailing spaces.
Example:
proc fcmp outlib=work.custom.formatfunctions;
function stripcommax(number) $;
return (strip(put(number,commax10.1)));
endsub;
run;
options cmplib=(work.custom);
proc format;
value commaxstrip other=[stripcommax()];
run;
data testdata_;
input var1 var2 var3;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'test.json'
pretty
fmtnumeric
nosastags
keys
/* trimblanks */
;
format var: commaxstrip.;
export testdata_;
run;
data _null_;
infile 'test.json';
input;
put _infile_ ;
run;
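On the second part of the question (nulls as ""): the same FCMP function could be extended to return an empty string for missing values. This is an untested sketch; whether PROC JSON then emits "" rather than null under FMTNUMERIC may need checking:

```sas
/* Sketch: same function as above, but a missing value becomes an empty
   string instead of the "." that PUT produces for a missing numeric */
proc fcmp outlib=work.custom.formatfunctions;
function stripcommax(number) $;
if missing(number) then return ('');
return (strip(put(number,commax10.1)));
endsub;
run;
```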

trimblanks only trims trailing blanks. The format itself is adding leading blanks, and proc json has no option that I am aware of to remove leading blanks.
One option would be to convert all of your values to strings, then export.
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
array var[*] var1-var3;
array varc[3] $;
do i = 1 to dim(var);
/* COMPRESS removes the '.' that PUT produces for a missing value
(note: it would also strip COMMAX thousands separators for values >= 1000) */
varc[i] = strip(compress(put(var[i], commax10.1), '.'));
end;
keep varc:;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
This would be a great feature request. I recommend posting it to the SASWare Ballot Ideas or contacting Tech Support to let them know of this issue.

This is not the only issue with proc json - other challenges include line length limits for a SAS 9 _webout destination, proc failures when ingesting invalid characters, and the inability to append (MOD) to a destination.
For that reason in the SASjs framework and Data Controller we tend to revert to a data step approach.
Our macro is open source and available here: https://core.sasjs.io/mp__jsonout_8sas.html
To send formatted values, invoke as follows:
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
filename _dest "/tmp/test.json";
/* open the JSON */
%mp_jsonout(OPEN,jref=_dest)
/* send the data */
%mp_jsonout(OBJ,testdata_,jref=_dest,fmt=Y)
/* close the JSON */
%mp_jsonout(CLOSE,jref=_dest)
/* display result */
data _null_;
infile _dest;
input;
putlog _infile_;
run;
Which gives: [screenshot of the resulting JSON output]

Related

SAS CSV export has an unwanted leading comma in each row

I'm new to SAS and am struggling to export data to CSV. My code successfully exports a file but it has a leading comma on each non-header row that creates a misalignment.
I feel like I am missing something obvious and could really use some help. Thank you!
data _null_;
file "/export.csv";
set "/table" (keep= field1 field2 'field 3'n);
if _n_ = 1 then
do; /* write header row */
put "field1"
','
"field2"
','
"field3";
end;
do;
put (_all_) (',');
end;
run;
My output ends up looking like this...
field1,field2,field3,
,x ,y ,z
,a ,b ,c
,d ,e ,f...
or
Field1
Field2
Field3
x
y
z
a
b
c
d
e
f
Use proc export instead. It'll save you a lot of time and effort compared to coding a manual export of .csv.
proc export
data = table(keep=field1 field2 field3)
file = '/export.csv'
dbms = csv
replace;
run;
Since you appear to already know the variable names (you have hard-coded the header line), just modify the PUT statement so that it does not insert the extra leading comma.
put field1 (field2 'field 3'n) (',');
But really you should just tell SAS you are writing a delimited file by using the DSD option.
/* write header row */
data _null_;
file "/export.csv";
put "field1,field2,field 3";
run;
/* Add the actual data */
data _null_;
file "/export.csv" dsd mod ;
set table;
put field1 field2 'field 3'n ;
run;
If you don't actually know the variable names in advance then just ask SAS to generate them for you.
/* write header row */
proc transpose data=table(obs=0) out=names;
var _all_;
run;
data _null_;
file "/export.csv" dsd;
set names;
put _name_ @;
run;
/* Add the actual data */
data _null_;
file "/export.csv" dsd mod ;
set table ;
put (_all_) (+0);
run;
A general version using my second favorite CALL routine.
data _null_;
file log ls=256 dsd; /*change LOG to your file*/
set sashelp.heart;
if _n_ eq 1 then link names;
put (_all_)(:);
return;
names:
length _name_ $32;
do while(1);
call vnext(_name_);
if _name_ eq '_name_' then leave;
put _name_ @;
end;
put;
return;
run;
In the interim, I had Magoo'd my way into a method that works, too, but these are all good answers.
I used the dsd file statement option.
data _null_;
file "/export.csv"; dsd delimiter=",";
set "/table" (keep= field1 field2 'field 3'n);
if _n_ = 1 then
do; /* write header row */
put "field1"
','
"field2"
','
"field3";
end;
do;
put (_all_) (+0);
end;
run;
One more option, which also works if you want to rename the columns: ODS CSV.
The LABEL option on PROC PRINT prints labels instead of variable names, which allows for renaming. It also supports changing/applying formats while exporting.
ods csv file='/home/fkhurshed/demo.csv';
proc print data=sashelp.class label noobs;
label age= 'Age (years)' sex = 'Sex' weight = 'Weight (lbs)' height = 'Height (inches)';
run;
ods csv close;

How to import a csv file that includes both / and , as delimiters

I have a file with mixed delimiters , and /. When I import it into SAS with the following data step:
data SASDATA.Publications ;
infile 'R:/Lipeng_Wang/PATSTAT/Publications.csv'
DLM = ','
DSD missover lrecl = 32767
firstobs = 3 ;
input pat_publn_id :29.
publn_auth :$29.
publn_nr :$29.
publn_nr_original :$29.
publn_kind :$29.
appln_id :29.
publn_date :YYMMDD10.
publn_lg :$29.
publn_first_grant :29.
publn_claims :29. ;
format publn_date :YYMMDDd10. ;
run ;
the sas log shows that
NOTE: Invalid data for appln_id in line 68262946 33-34.
NOTE: Invalid data for publn_date in line 68262946 36-44.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9
68262946 390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0 62
pat_publn_id=390735978 publn_auth=HK publn_nr=1053433 publn_nr_original=09/465 publn_kind=054
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68262944
NOTE: Invalid data for appln_id in line 68280355 33-34.
NOTE: Invalid data for publn_date in line 68280355 36-44.
68280355 390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26, ,0,0 62
pat_publn_id=390753387 publn_auth=HK publn_nr=1092990 publn_nr_original=60/523 publn_kind=466
appln_id=. publn_date=. publn_lg=2010-03-26 publn_first_grant=. publn_claims=0 _ERROR_=1
_N_=68280353
It seems that I need to get '60/523,466' into the value of 'publn_nr_original'. But how should I do that?
Your program code has two obvious issues.
First your syntax on the FORMAT statement is wrong. The : modifier is a feature of the INPUT or PUT statement syntax and should not be used in a FORMAT statement.
Second you are trying to read 29 digits into a number. You cannot store 29 digits accurately into a number in SAS. If those values are really longer than 15 digits you will need to read them into character variables. And if they really are smaller numbers (that could be stored as numbers) then you don't need to include an informat specification to the INPUT statement. SAS already knows how to read numbers from text files. In list mode the INPUT statement will ignore the width on the informat anyway.
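As an aside (not part of the original answer), the roughly 15-digit limit comes from SAS numerics being 8-byte floating point; the CONSTANT function can report the largest integer below which all integers are exactly representable:

```sas
/* Demonstrate the integer precision limit of an 8-byte SAS numeric */
data _null_;
x = 12345678901234567;    /* 17 digits: cannot be stored exactly */
y = constant('exactint'); /* 2**53 = 9,007,199,254,740,992 */
put x= 17. y= comma25.;
run;
```

The value printed for x will differ from the 17-digit literal assigned, which is the symptom the answer warns about.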
But your error message looks to be caused by an improperly formed file. I suspect that one of the first 6 columns has a comma in its value, but whoever created the data file forgot to add quotes around the value with the comma. If you can figure out which field the comma should be in then you might be able to parse the line in a way that it can be used.
Here is one method that might work assuming that the commas only appear in the publn_nr_original variable and that at most one comma will appear.
data want ;
infile cards dsd truncover firstobs=3;
length
pat_publn_id $30
publn_auth $30
publn_nr $30
publn_nr_original $30
publn_kind $30
appln_id $30
publn_date 8
publn_lg $30
publn_first_grant $30
publn_claims $30
;
informat publn_date YYMMDD10. ;
format publn_date YYMMDDd10. ;
input #;
if countw(_infile_,',','mq')<= 10 then input pat_publn_id -- publn_claims ;
else do ;
list ;
input pat_publn_id -- publn_nr_original xxx :$30. publn_kind -- publn_claims ;
publn_nr_original=catx(',',publn_nr_original,xxx);
drop xxx;
end;
cards4;
Header1
Header2
1,22,333,4444,55,6666,2010-03-26,77,8,9999
390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0
390735978,HK,1053433,"09/465,054",A1,275562685,2010-03-26, ,0,0
390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26, ,0,0
;;;;
But the real solution is to fix the process that created the file. So instead of having a line like this in the file:
390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0
The line should have looked like this:
390735978,HK,1053433,"09/465,054",A1,275562685,2010-03-26, ,0,0
Ok, I see what you mean - you have a field with a comma, in a comma separated file, and that field is not quoted.
For this you will have to read the two parts in separately and add the comma back in, as per the example code below.
It's worth noting that all your values must have commas for this approach to work! This in fact looks like bad data, if your input field is indeed "60/523,466" then it should be "quoted" in your input file to be read in correctly.
%let some_csv=%sysfunc(pathname(work))/some.csv;
data _null_;
file "&some_csv";
put /;
put '390735978,HK,1053433,09/465,054,A1,275562685,2010-03-26, ,0,0';
put '390753387,HK,1092990,60/523,466,A1,275562719,2010-03-26, ,0,0';
run;
data work.Publications ;
infile "&some_csv" DLM = ',' DSD missover lrecl = 32767 firstobs = 3 ;
input pat_publn_id :best. publn_auth :$29. publn_nr :$29.
publn_nr_original1 :$29. publn_nr_original2:$29.
publn_kind :$29. appln_id :best.
publn_date :YYMMDD10. publn_lg :$29. publn_first_grant :best.
publn_claims :best. ;
format publn_date YYMMDDd10. ;
publn_nr_original=cats(publn_nr_original1,',',publn_nr_original2);
run ;

SAS - Reading Raw/Delimited file

I'm having an issue reading a CSV file into a SAS dataset without bringing in every field with my import. I don't want every field imported, but that's the only way I can seem to get this to work. The issue is that I cannot get SAS to read my data correctly, even when it reads the columns correctly. I think part of the issue is that I have data above my actual column headers that I don't want to read in.
My data is laid out like so
somevalue somevalue somevalue...
var1 var2 var3 var4
abc abc abc abc
Where I want to exclude somevalue, only read in select var's and their corresponding data.
Below is a sample file where I've scrambled all the values in my fields. I only want to keep COLUMN H(8), AT(46) and BE(57)
Here's some code I've tried so far...
This was SAS-generated code from a PROC IMPORT. My PROC IMPORT worked fine to read in every field value, so I just deleted the fields that I didn't want, but I don't get the output I expect. The values corresponding to the fields do not match.
A) PROC IMPORT
DATAFILE="C:\Users\dip1\Desktop\TU_&YYMM._FIN.csv"
OUT=TU_&YYMM._FIN
DBMS=csv REPLACE;
GETNAMES=NO;
DATAROW=3;
RUN;
generated this in the SAS log (I cut out the other fields I didn't want)
B) DATA TU_&YYMM._FIN_TEST;
infile 'C:\Users\fip1\Desktop\TU_1701_FIN.csv' delimiter = ',' DSD lrecl=32767
firstobs=3 ;
informat VAR8 16. ;
informat VAR46 $1. ;
informat VAR57 $22. ;
format VAR8 16. ;
format VAR46 $1. ;
format VAR57 $22. ;
input
VAR8
VAR46 $
VAR57 $;
run;
I've also tried this below... I believe I'm just missing something..
C) DATA TU_TEST;
INFILE "C:\Users\fip1\Desktop\TU_&yymm._fin.csv" DLM = "," TRUNCOVER FIRSTOBS = 3;
LABEL ACCOUNT_NUMBER = "ACCOUNT NUMBER";
LENGTH ACCOUNT_NUMBER $16.
E $1.
REJECTSUBCATEGORY $22.;
INPUT ACCOUNT_NUMBER
E
REJECTSUBCATEGORY;
RUN;
As well as trying to have SAS point to the columns I want to read in, modifying the above to:
D) DATA TU_TEST;
INFILE "C:\Users\fip1\Desktop\TU_&yymm._fin.csv" DLM = "," TRUNCOVER FIRSTOBS = 3;
LABEL ACCOUNT_NUMBER = "ACCOUNT NUMBER";
LENGTH ACCOUNT_NUMBER $16.
E $1.
REJECTSUBCATEGORY $22.;
INPUT #8 ACCOUNT_NUMBER
#46 E
#57 REJECTSUBCATEGORY;
RUN;
None of which work. Again, I can do this successfully if I bring in all of the fields with either A) or B), given that B) includes all the fields, but I can't get C) or D) to work, and I want to keep the code to a minimum if I can. I'm sure I'm missing something, but I've never had time to tinker with it so I've just been doing it the "long" way..
Here's a snippet of what the data looks like
A(1) B(2) C(3) D(4) E(5) F(6) G(7)
ABCDEFGHIJ ABCDMCARD 202020 4578917 12345674 457894A (blank)
CRA INTERNALID SUBCODE RKEY SEGT FNM FILEDATE
CREDITBUR 2ABH123 AB2CHE123 A28O5176688 J2 Name 8974561
With a delimited file you need to read all of the fields (or at least all of the fields up to the last one you want to keep) even if you do not want to keep all of those fields. For the ones you want to skip you can just read them into a dummy variable that you drop. Or even one of the variables you want to keep that you will overwrite by reading from a later column.
Also don't model your DATA step after the code generated by PROC IMPORT. You can make much cleaner code yourself. For example there is no need for any FORMAT or INFORMAT statements for the three variables you listed. Although if VAR8 really needs 16 digits you might want to attach a format to it so that SAS doesn't use BEST12. format.
data tu_&yymm._fin_test;
infile 'C:\Users\fip1\Desktop\TU_1701_FIN.csv'
dlm=',' dsd lrecl=32767 truncover firstobs=3
;
length var8 8 var46 $1 var57 $22 ;
length dummy $1 ;
input 7*dummy var8 37*dummy var46 10*dummy var57 ;
drop dummy ;
format var8 16. ;
run;
You can replace the VARxx variable names with more meaningful ones if you want (or add a RENAME statement). Using the position numbers here just makes it clearer in this code that the INPUT statement is reading the 57 columns from the input data.
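For example, to switch to the names the asker used in attempt C) above (a sketch; adjust names as needed):

```sas
/* Rename the positional variables to descriptive names after the fact */
proc datasets lib=work nolist;
modify tu_&yymm._fin_test;
rename var8=account_number var46=e var57=rejectsubcategory;
quit;
```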

import issue with SAS due to large columns headers

I have many csv files with many column headers, up to 2000 for some files.
I'm trying to do an import, but at some point the headers are truncated in a 'random' manner and the rest of the data is ignored and therefore not imported. I'm putting random in quotes because it may not be random, although if it is not, I don't know the reason. But let me give you more insight.
The headers are truncated randomly, some after the 977th variable, others after the 1401st variable.
The headers are like this BAL_RT,ET-CAP,EXT_EA16,IVOL-NSA,AT;BAL_RT,ET-CAP,EXT_EA16,IVOL-NSA,AT;BAL_RT,ET-CAP,EXT_EA16,IVOL-NSA,AT
This the part of the import log
642130 VAR1439
642131 VAR1440
642132 VAR1441
642133 VAR1442 $
642134 VAR1443 $
642135 VAR1444 $
As you can see, some headers are treated as numeric although all the headers are alphanumeric, a mixture of characters and digits.
Please find my code for the import below
%macro lec ;
options macrogen symbolgen;
%let nfic=37 ;
%do i=1 %to &nfic ;
PROC IMPORT OUT= fic&i
DATAFILE= "C:\cygwin\home\appEuro\pot\fic&i..csv"
DBMS=DLM REPLACE;
DELIMITER='3B'x;
guessingrows=500 ;
GETNAMES=no;
DATAROW=1;
RUN;
data dico&i ; set fic&i (drop=var1) ;
if _n_ eq 1 ;
index=0 ;
array v var2-var1000 ;
do over v ;
if v ne "" then index=index+1 ;
end ;
run ;
data dico&i ; set dico&i ;
call symput("nvar&i",trim(left(index))) ;
run ;
%put &&nvar&i ;
%end ;
%mend ;
%lec ;
The code is doing an import and also creating a dictionary with the headers, as some of them are long (e.g. more than 34 characters).
I'm not sure whether these elements are related; however, I would welcome any insights you can give me.
Best.
You need to not use PROC IMPORT, as I mentioned in a previous comment. You need to construct your dictionary from a data step read-in, because with 2000 columns and variable names of 34 or more characters, the header record will exceed a 32767 record length.
An approach like this is necessary.
data headers;
infile "whatever" dlm=';' lrecl=99999 truncover; *or perhaps longer even, if that is needed - look at log;
length name $50; *or longer if 50 is not your real absolute maximum;
input @; * load the header record into _infile_;
do _i=1 to countw(_infile_,';','q');
name=scan(_infile_,_i,';','q');
output;
end;
drop _i;
stop; *only want to read the first line!;
run;
Now you have your variable names. Now, you can read the file in with GETNAMES=NO; in proc import (you'll have to discard the first line), and then you can use that dictionary to generate rename statements (you will have lots of VARxxxx, but in a predictable order).
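One possible sketch of that rename step, assuming PROC IMPORT produced VAR1, VAR2, ... in the same order as the HEADERS dataset, that options validvarname=any is set, and that no real name exceeds the 32-character variable-name limit (longer ones would have to become labels instead):

```sas
/* Build a rename list like: VAR1='first header'n VAR2='second header'n ... */
data _null_;
length stmt $32767;
retain stmt '';
set headers end=last;
stmt = catx(' ', stmt, cats('VAR', _n_, '=', nliteral(name)));
if last then call symputx('renames', stmt);
run;

proc datasets lib=work nolist;
modify fic1;
rename &renames;
quit;
```

NLITERAL wraps a name in 'name'n quoting only when it is not a standard SAS name, so the generated statement stays valid either way.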

SAS set a variable from a filename

I have a number of csv files in a folder. They all have the same structure (3 columns). The SAS code below imports all of them into one dataset. It includes the 3 columns plus their file name.
My challenge is that the filename variable includes the directories and drive letter (e.g. 'Z:\DIRECTORYA\DIRECTORYB\file1.csv'). How can I just list the file name and not the path (e.g. file1.csv)? Thank you
data WORK.mydata;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
length FNAME $80.;
infile 'Z:\DIRECTORYA\DIRECTORYB\*2012*.csv' delimiter = ',' MISSOVER DSD lrecl=32767 filename=fname firstobs=2;
informat A $26. ;
informat B $6. ;
informat C 8. ;
format A $26. ;
format B $6. ;
format C 8. ;
input
A $
B $
C;
if _ERROR_ then call symputx('_EFIERR_',1);
filename=fname;
run;
I think your best bet is to use regular expressions. Add this to your DATA step:
reg1=prxparse("/\\(\w+\.csv)/");
if prxmatch(reg1, filename) then
filename=prxposn(reg1,1,filename);
We can break this into two data steps. We'll extract the filenames into one dataset in the first data step. In the second data step, we'll attach the filenames (incl. the .txt or .csv) to their respective observations in the combined dataset.
We'll use a PIPE filename with the DIR command and its /b switch.
For example, if I have three .txt files: example.txt, example2.txt and example3.txt
%let path = C:\sasdata;
filename my pipe 'dir "C:\sasdata\*.txt"/b ';
data example;
length filename $256;
infile my length=reclen;
input filename $varying256. reclen;
run;
data mydata;
length filename $100;
set example;
location=cat("&path\",filename);
infile dummy filevar=location length=reclen end=done missover;
do while (not done);
input A B C D;
output;
end;
run;
Output of the first data step:
filename
example.txt
example2.txt
example3.txt
Output of second data step:
filename A B C D
example.txt 171 255 61300 79077
example.txt 123 150 10300 13287
example2.txt 250 255 24800 31992
example2.txt 132 207 48200 62178
example2.txt 243 267 25600 33024
example3.txt 171 255 61300 79077
example3.txt 123 150 10300 13287
example3.txt 138 207 47400 61146
In Windows, this reads all the .txt files in the folder. It should work for .csv files as well, as long as you add delimiter=',' to the infile statement in the second data step and change the extension in the filename statement to *.csv. Cheers.
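For reference, a .csv variant of the pipe and the second data step might look like this (a sketch, assuming each csv has one header row, which is skipped manually; the first data step would be re-run with the new pipe):

```sas
filename my pipe 'dir "C:\sasdata\*.csv" /b';

data mydata;
length filename $100;
set example;
location=cat("&path\",filename);
/* dlm and dsd handle the comma-separated values */
infile dummy filevar=location dlm=',' dsd length=reclen end=done missover;
input; /* skip the header row of each file */
do while (not done);
input A B C D;
output;
end;
run;
```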