I am having a problem exporting a SAS file to a CSV file. When exporting to CSV, the assigned variable "values" are exported instead of the raw numbers. For example, dichotomous variables with raw values of 0 or 1 in the SAS file are exported using their assigned "values" of "yes" or "no" instead of the raw values of 0 or 1.
Is there code to ensure that the raw values (e.g. 0 or 1) are exported?
Thanks
You have formats applied to your data then.
You can remove the format ahead of time or apply it as needed to generate the data you want. ODS CSV is a good way to do this as you can easily control the formats within a proc and test it easily.
ods csv file='/folders/myfolders/demo.csv';
proc print data=sashelp.class noobs;
format age 8.2; *formats variable as numeric with 2 decimal places;
run;
ods csv close;
I need to convert a CSV file to JSON file using Python. I used this,
variable = csv.DictReader(file.csv)
It throws this ERROR
csv.Error: line contains NULL byte
I checked the CSV file in Excel and it shows no NULL chars, but when I printed the data in the CSV file using Python, there was some data like SOHNULNULHG (here the last 2 letters, HG, are the data displayed in Excel). I need to remove these ASCII control chars from the CSV file while converting to JSON (i.e. I need only HG from the above string).
I just ran into the same issue. I converted my csv file to csv UTF-8 and ran it again without any errors. That seemed to fix the ASCII char issue.
To convert the csv type, I just opened my file up in Excel, did save as, then selected CSV UTF-8(Comma delimited)(*.csv) in the Save as type.
Hope that helps.
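If re-saving from Excel is not practical, the control characters can also be stripped in Python before the data reaches csv.DictReader. This is only a minimal sketch: the function name csv_bytes_to_json is made up for illustration, and it assumes the NUL/SOH bytes are stray junk and that the file is otherwise UTF-8. If the file is really UTF-16 (a common cause of NUL bytes), opening it with the right encoding is the better fix.

```python
import csv
import io
import json


def csv_bytes_to_json(raw: bytes, encoding: str = "utf-8") -> str:
    """Drop control bytes (NUL, SOH, ...) from raw CSV bytes and return JSON text.

    Hypothetical helper; assumes the control bytes carry no meaning.
    """
    # Keep tab, LF and CR; drop every other byte below 0x20.
    keep = {0x09, 0x0A, 0x0D}
    cleaned = bytes(b for b in raw if b >= 0x20 or b in keep)
    text = cleaned.decode(encoding)
    # DictReader uses the first row as the keys for each record.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return json.dumps(list(reader))


if __name__ == "__main__":
    # Sample row containing SOH NUL NUL in front of the real value "HG".
    sample = b"id,code\n1,\x01\x00\x00HG\n"
    print(csv_bytes_to_json(sample))
```

For a real file, read it with open(path, "rb").read() and pass the bytes to the function.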
I have data in the following json format:
{"metadata1":"val1","metadata2":"val2","data_rows":[{"var1":1,"var2":2,"var3":3},{"var1":4,"var2":5,"var3":6}]}
There are some metadata variables at the start, which only appear once, followed by multiple data records, all on the same line. How can I import this into a SAS dataset?
/*Create json file containing sample data*/
filename json "%sysfunc(pathname(work))\json.txt";
data _null_;
file json;
put '{"metadata1":"val1,","metadata2":"val2}","data_rows":[{"var1":1,"var2":2,"var3":3},{"var1":4,"var2":5,"var3":6}]}';
run;
/*Data step for importing the json file*/
data want;
infile json dsd dlm='},' lrecl = 1000000 n=1;
retain metadata1 metadata2;
if _n_ = 1 then input @'metadata1":' metadata1 :$8. @'metadata2":' metadata2 :$8. @;
input @'var1":' var1 :8. @'var2":' var2 :8. @'var3":' var3 :8. @@;
run;
Notes:
The point for SAS to start reading each variable is set using @'string' logic.
Setting , and } as delimiters and using : format modifiers on the input statement tells SAS to keep reading characters from the specified start point until it's read the maximum requested number or a delimiter has been reached.
Setting dsd on the infile statement removes the double quotes from character data values and prevents any problems from occurring if character variables contain delimiters.
The double trailing @ tells SAS to continue reading more records from the same line using the same logic until it reaches the end of the line.
Metadata variables are handled as a special case using a separate input statement. They could easily be diverted to a single row in a separate file if desired.
lrecl needs to be greater than or equal to the length of your file for this approach to work.
Setting n=1 should help to reduce memory usage if your file is very large, by preventing SAS from attempting to buffer multiple input lines.
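If preprocessing outside SAS is acceptable, the same layout (metadata repeated on every data row, like the retain above) can be produced as a plain CSV first and then read with an ordinary import. A rough sketch in Python, assuming the record list is always under the key data_rows as in the question; flatten_to_csv is a hypothetical name.

```python
import csv
import io
import json


def flatten_to_csv(json_text: str) -> str:
    """Repeat the top-level metadata on every record of data_rows and return CSV text."""
    doc = json.loads(json_text)
    rows = doc.pop("data_rows")  # whatever is left in doc is treated as metadata
    out = io.StringIO()
    # Column order: metadata keys first, then the keys of the first data record.
    fieldnames = list(doc) + list(rows[0])
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**doc, **row})
    return out.getvalue()


if __name__ == "__main__":
    sample = ('{"metadata1":"val1","metadata2":"val2","data_rows":'
              '[{"var1":1,"var2":2,"var3":3},{"var1":4,"var2":5,"var3":6}]}')
    print(flatten_to_csv(sample))
```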
I've got (and will receive in the future) many CSV files that use the semicolon as delimiter and the comma as decimal separator.
So far I could not find out how to import these files into SAS using proc import -- or in any other automated fashion without the need for messing around with the variable names manually.
Create some sample data:
%let filename = %sysfunc(pathname(work))\sap.csv;
data _null_;
file "&filename";
put 'a;b';
put '12345,11;67890,66';
run;
The import code:
proc import out = sap01
datafile= "&filename"
dbms = dlm;
delimiter = ";";
GETNAMES = YES;
run;
After the import a value for the variable "AMOUNT" such as 350,58 (which corresponds to 350.58 in the US format) would look like 35,058 (meaning thirtyfivethousand...) in SAS (and after re-export to the German EXCEL it would look like 35.058,00).
A simple but dirty workaround would be the following:
data sap02; set sap01;
AMOUNT = AMOUNT/100;
format AMOUNT best15.2;
run;
I wonder if there is a simple way to define the decimal separator for the CSV import (similar to the specification of the delimiter), or any other "cleaner" solution compared to my workaround.
Many thanks in advance!
You technically should use dbms=dlm not dbms=csv, though it does figure things out. CSV means "Comma separated values", while DLM means "delimited", which is correct here.
I don't think there's a direct way to make SAS read in with the comma via PROC IMPORT. You need to tell SAS to use the NUMXw.d informat when reading in the data, and I don't see a way to force that setting in SAS. (There's an option for output with a comma, NLDECSEPARATOR, but I don't think that works here.)
Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program; then for each of the read-in records add :NUMX10. or whatever the appropriate maximum width of the field is. It will end up looking something like this:
data want;
infile "whatever.txt" dlm=';' lrecl=32767 missover;
input
firstnumvar :NUMX10.
secondnumvar :NUMX10.
thirdnumvar :NUMX10.
fourthnumvar :NUMX10.
charvar :$15.
charvar2 :$15.
;
run;
PROC IMPORT will also generate lots of informat and format code; alternatively, you can change the generated informats from BEST. to NUMX10. rather than adding the informat to each variable in the input statement. You can also just remove the informats, unless you have date fields.
data want;
infile "whatever.txt" dlm=';' lrecl=32767 missover;
informat firstnumvar secondnumvar thirdnumvar fourthnumvar NUMX10.;
informat charvar $15.;
format firstnumvar secondnumvar thirdnumvar fourthnumvar BEST12.;
format charvar $15.;
input
firstnumvar
secondnumvar
thirdnumvar
fourthnumvar
charvar $
;
run;
"Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program"
This has a drawback. If there is a change in the structure of the csv file, for example a changed column order, then one has to change the code in the SAS program.
So it is safer to change the input, substituting in the numeric fields the comma with dot and passing SAS the modified input.
The first idea was to use a perl program for this, and then use in SAS a filename with a pipe to read the modified input.
Unfortunately there is a SAS restriction in the proc import: The IMPORT procedure does not support device types or access methods for the FILENAME statement except for DISK.
So one has to create a workfile on disk with the adjusted input.
I used the Text::CSV_PP package to read the csv file.
testdata.csv contains the csv data to read.
substitute_commasep.perl is the name of the perl program
perl code:
# use lib "/........"; # specify, if Text::CSV_PP is locally installed. Otherwise error message: Can't locate Text/CSV_PP.pm in ....;
use Text::CSV_PP;
use strict;
my $csv = Text::CSV_PP->new({ binary => 1
,sep_char => ';'
}) or die "Error creating CSV object: ".Text::CSV_PP->error_diag ();
open my $fhi, "<", "$ARGV[0]" or die "Error reading CSV file: $!";
while ( my $colref = $csv->getline( $fhi) ) {
foreach (@$colref) { # analyze each column value
s/,/\./ if /^\s*[\d,]*\s*$/; # substitute, if the field contains only numbers and ,
}
$csv->print(\*STDOUT, $colref);
print "\n";
}
$csv->eof or $csv->error_diag();
close $fhi;
SAS code:
filename readcsv pipe "perl substitute_commasep.perl testdata.csv";
filename dummy "dummy.csv";
data _null_;
infile readcsv;
file dummy;
input;
put _infile_;
run;
proc import datafile=dummy
out=data1
dbms=dlm
replace;
delimiter=';';
getnames=yes;
guessingrows=32767;
run;
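For anyone without Perl, roughly the same preprocessing could be done in Python. This is a sketch, not a drop-in replacement: commas_to_dots is a made-up name, and the pattern is deliberately a bit stricter than the Perl one (it requires digits after the comma and allows a leading minus sign).

```python
import csv
import io
import re

# A field counts as numeric when it is optional sign, digits, one comma, digits.
NUMERIC_WITH_COMMA = re.compile(r"^\s*-?\d*,\d+\s*$")


def commas_to_dots(text: str, delimiter: str = ";") -> str:
    """Replace the decimal comma with a dot in numeric-looking fields only."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [f.replace(",", ".", 1) if NUMERIC_WITH_COMMA.match(f) else f
             for f in row]
        )
    return out.getvalue()


if __name__ == "__main__":
    print(commas_to_dots("a;b\n12345,11;67890,66\n"))
```

Since PROC IMPORT only accepts a DISK file, the result still has to be written to a work file (e.g. dummy.csv) before the import step.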
I just started using SAS 3 days ago and I need to merge ~50 csv files into 1 SAS dataset.
The 50 csv files have multiple variables with only 1 variable in common i.e. "region_id"
I've used SAS enterprise guide drag and drop functionalities to do this but it was too manual and took me half a day to upload and merge 47 csv files into 1 SAS file.
I was wondering whether anyone has a more intelligent way of doing this using base SAS?
Any advice and tips appreciated!
Thank you!
Example filenames:
2011Census_B01_AUST_short
2011Census_B02A_AUST_short
2011Census_B02B_AUST_short
2011Census_B03_AUST_short
.
.
2011Census_xx_AUST_short
I have more than 50 csv files to upload and merge.
The number and type of variables in the csv file varies in each csv file. However, all csv files have 1 common variable = "region_id"
Example variables:
region_id, Tot_P_M, Tot_P_F, Tot_P_P, Age_0_4_yr_F etc...
First, we'll need an automated way to import. The below simple macro takes the location of the file and the name of the file as inputs, and outputs a dataset to the work directory. (I'd use the concatenate function in Excel to create the SAS code 50 times). Also, we are sorting it to make the merge easier later.
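%macro importcsv(location=,filename=);
/* SAS dataset names cannot start with a digit, so the output name gets an underscore prefix */
proc import datafile="&location./&filename..csv"
out=_&filename.
dbms=csv
replace;
getnames=yes;
run;
/* Sort by the common key so the later merge works */
proc sort data=_&filename.; by region_id; run;
%mend;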
%importcsv(location = C:/Desktop,filename = 2011Census_B01_AUST_short)
.
.
.
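Instead of Excel's concatenate function, the repeated macro calls could also be generated with a few lines of script. A small sketch in Python, assuming the macro is called importcsv with the location= and filename= parameters shown above; importcsv_calls is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable


def importcsv_calls(location: str, filenames: Iterable[str]) -> str:
    """Return one %importcsv(...) call per CSV file name, one call per line."""
    return "\n".join(
        # Path(...).stem drops the .csv extension, since the macro appends it again.
        f"%importcsv(location = {location},filename = {Path(name).stem})"
        for name in filenames
    )


if __name__ == "__main__":
    names = ["2011Census_B01_AUST_short.csv", "2011Census_B02A_AUST_short.csv"]
    print(importcsv_calls("C:/Desktop", names))
```

The printed lines can be pasted into the SAS program; in practice the file names would come from listing the folder (for example with Path(folder).glob("*.csv")).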
Then simply merge all of the data together again. I added ellipses simply because I didn't want to write it out 50 times.
data merged;
merge dataseta datasetb datasetc ... datasetax;
by region_id;
run;
Hope this helps.
The output we need to produce is a standard delimited file but instead of ascii content we need binary. Is this possible using SAS?
Is there a specific Binary Format you need? Or just something non-ascii? If you're using proc export, you're probably limited to whatever formats are available. However, you can always create the csv manually.
If anything will do, you could simply zip the csv file.
Running on a *nix system, for example, you'd use something like:
filename outfile pipe "gzip -c > myfile.csv.gz";
Then create the csv manually:
data _null_;
set mydata;
file outfile dsd; /* DSD writes comma-separated values and quotes values that contain commas */
put var1 var2 var3;
run;
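If the compression step ends up being scripted outside SAS, the same idea (write CSV text, gzip it) looks roughly like this in Python. It is only a sketch under that assumption; rows_to_gzipped_csv is a made-up name and the column names are the placeholders from the example above.

```python
import csv
import gzip
import io


def rows_to_gzipped_csv(header, rows) -> bytes:
    """Serialize the rows as CSV text and return the gzip-compressed bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))


if __name__ == "__main__":
    payload = rows_to_gzipped_csv(["var1", "var2", "var3"], [[1, 2, 3], [4, 5, 6]])
    # Write the binary result next to the script, like myfile.csv.gz above.
    with open("myfile.csv.gz", "wb") as fh:
        fh.write(payload)
```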
If this is PC/Windows SAS, I'm not as familiar, but you'll probably need to install a command-line zip utility.
This link from SAS suggests using winzip, which has a freely downloadable version. Otherwise, the code is similar.
http://support.sas.com/kb/26/011.html
You can actually make a CSV file as a SAS catalog entry; CSV is a valid SAS Catalog entry type.
Here's an example:
filename of catalog "sasuser.test.class.csv";
proc export data=sashelp.class
outfile=of
dbms=dlm;
delimiter=',';
run;
filename of clear;
This little piece of code exports SASHELP.CLASS to a SAS Catalog entry of entry type CSV.
This way you get a binary format you can move between SAS installations on different platforms with PROC CPORT/CIMPORT, not having to worry if the used binary package format is available to your SAS session, since it's an internal SAS format.
Are you saying you have binary data that you want to output to csv?
If so, I don't think there is necessarily a defined standard for how this should be handled.
I suggest trying it (proc export comes to mind) and seeing if the results match your expectations.
Using SAS, output a .csv file; Open it in Excel and Save As whichever format your client wants. You can automate this process with a little bit of scripting in ### as well. (Substitute ### with your favorite scripting language.)