Column names missing when exporting files using SAS data step - csv

I have a large SAS dataset raw_data which contains data collected from various countries. This dataset has a column "country" which lists the country where each observation originated. I would like to export a separate .csv file for each country in raw_data. I use the following data step to produce the output:
data _null_;
set raw_data;
length fv $ 200;
fv = "/directory/" || strip(put(country,$32.)) || ".csv";
file write filevar=fv dsd dlm=',';
put (_all_) (:);
run;
However, the resulting .csv files no longer have the column names from raw_data. I have over a hundred columns in my dataset, so typing out all of the column names by hand is impractical. Can anyone provide some guidance on how to modify the above code so that the column names are written to the exported .csv files? Any help is appreciated!

You can create a macro variable that holds the variable names and write it as the first record of each CSV file. Two things to watch: the FILE statement must execute before the PUT so that the header goes to the CSV file rather than the log, and the data must be sorted by country for the BY statement to work.
proc sql noprint;
select name into :var_list separated by ','
from sashelp.vcolumn
where libname="WORK" and memname='RAW_DATA'
order by varnum;
quit;
data _null_;
set raw_data;
by country;
length fv $ 200;
fv = "/directory/" || strip(put(country,$32.)) || ".csv";
file write filevar=fv dsd dlm=',';
if first.country then do;
put "&var_list";
end;
put (_all_) (:);
run;
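For comparison, the same split can be sketched outside SAS. Here is a minimal Python version (the function name and the "country" column name are illustrative, not from the question) that writes the header row into each per-country file:

```python
import csv
import os

def split_by_country(src, out_dir, key="country"):
    """Write one CSV per country, repeating the header row in each
    output file (the effect the macro-variable header achieves)."""
    handles = {}
    try:
        with open(src, newline="") as f:
            reader = csv.DictReader(f)
            for row in reader:
                country = row[key]
                if country not in handles:
                    out = open(os.path.join(out_dir, country + ".csv"),
                               "w", newline="")
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()  # header goes in before any data row
                    handles[country] = (out, writer)
                handles[country][1].writerow(row)
    finally:
        for out, _ in handles.values():
            out.close()
```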

Consider this data step that is very similar to your program. It uses VNEXT to query the PDV and write the variable names as the first record of each file.
proc sort data=sashelp.class out=class;
by age;
run;
data _null_;
set class;
by age;
filevar=catx('\','C:\Users\name\Documents',catx('.',age,'csv'));
file dummy filevar=filevar ls=256 dsd;
if first.age then link names;
put (_all_)(:);
return;
names:
length _name_ $32;
call missing(_name_);
do while(1);
call vnext(_name_);
if _name_ eq: 'FIRST.' then leave; /* stop at the automatic FIRST./LAST. variables */
put _name_ @; /* @ holds the line; DSD supplies the commas */
end;
put;
return;
run;

Related

How compute some arithmetical operation on every row in CSV file

I am starting with Python and I have a question about a CSV file whose rows contain data in the format number;number:
485;16
646;8
920;16
1102;36
My code can import the CSV, but I don't know how to perform an arithmetic operation (e.g. multiplication or division) on every row and save the result in a variable.
import csv
with open('in.csv', newline='') as csvfile:
spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
for row in spamreader:
print(', '.join(row))
Thanks for the help.
First you need to specify the delimiter correctly: delimiter=';'. Then you can access the elements as row[0] and row[1], but they are strings, so you have to convert them to integers with int(row[0]). All together:
import csv
with open('in.csv') as csvfile:
spamreader = csv.reader(csvfile, delimiter=';')
for row in spamreader:
some_variable = int(row[0])*int(row[1])
print(some_variable)

SAS: Export a string containing a comma to a single cell within a CSV

I have a footnote which lists items using a comma. The output needs to be sent to a .csv. Because a .csv is comma delimited, the items of the list are output into different cells when opened in Excel. How does one escape a comma when exporting to .csv?
For example,
ods csvall file = "test.csv";
title 'The first seven letters of the alphabet.';
data test;
input x $ ;
datalines;
a
B
c
D
e
F
G
;
run;
footnote 'Observe that the letters a, c, and e are lowercase.';
proc print data = test;
run;
ods csvall close;
I have tried using ods escapechar "\"; to define an escape symbol and changing the footnote to
footnote 'Observe that the letters a\, c\, and e are lowercase.';
but this does not work. It may be that csvall is not part of ODS. Beyond this, though, I'm not sure what else to try.
This is the text in the file generated by the above code:
The first seven letters of the alphabet.
"Obs","x"
"1","a"
"2","B"
"3","c"
"4","D"
"5","e"
"6","F"
"7","G"
Observe that the letters a, c, and e are lowercase.
UPDATE: Reconsidering my question in light of the responses, the splitting of the footnote into different cells is not done by SAS. It happens when Excel imports the CSV. However, I do not think there is a way to automatically override this behavior.
If you add single quotes to the text of footnote then the commas will be quoted and can be read back into field1.
ods csvall file = "~/test.csv";
title 'The first seven letters of the alphabet.';
data test;
input x $ ;
datalines;
a
B
c
D
e
F
G
;
run;
footnote "'Observe that the letters a, c, and e are lowercase.'";
proc print data = test;
run;
ods csvall close;
data _null_;
infile "~/test.csv" dsd missover;
length field1-field2 $60.;
input (field:)(:);
put (field:)(=);
run;
I presume your goal is to produce a file containing text that looks like this:
"The first seven letters of the alphabet."
"Obs","x"
"1","a"
"2","B"
"3","c"
"4","D"
"5","e"
"6","F"
"7","G"
"Observe that the letters a, c, and e are lowercase."
This should behave the way you are expecting - the commas inside the quoted text of the footer should not be interpreted as field separators by most programs.
I would suggest doing this using a data step rather than ods csvall, as it is relatively simple to do:
data _null_;
file "/tmp/test2.csv" dsd;
set test end = eof;
if _n_ = 1 then put '"The first seven letters of the alphabet."';
put _n_ x;
if eof then put '"Observe that the letters a, c, and e are lowercase."';
run;
N.B. this will only add quotes around character variables if they contain commas or embedded quotes (embedded quotes are doubled), and you have to add the quotes around string literals yourself.
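The quoting behavior described above matches the minimal-quoting convention that Python's csv module uses by default, which makes it easy to see in a few lines:

```python
import csv
import io

# QUOTE_MINIMAL (the default) quotes a field only when it contains the
# delimiter, a quote character, or a newline; embedded quotes are doubled.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerow(["plain", "has, comma", 'has "quote"'])
print(buf.getvalue().strip())
# plain,"has, comma","has ""quote"""
```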

Dealing with currency values in PIG - pigstorage

I have a two-column CSV file loaded in HDFS. Column 1 is a model name, column 2 is a price in $. Example - Model: IE33, Price: $52678.00
When I run the following script, the price values all come back truncated to two digits, for example $52.
ultraPrice = LOAD '/user/maria_dev/UltrasoundPrice.csv' USING PigStorage(',') AS (
Model, Price);
dump ultraPrice;
All my values are between $20000 and $60000. I don't know why it is being cut off.
If I change the CSV file and remove the $ from the price values everything works fine, but I know there has to be a better way.
Note that in your load statement you are not specifying the datatype. By default both Model and Price will be of type bytearray, hence the discrepancy.
You can either remove the $ from the CSV file, or load the data as chararray, strip the $ sign, and cast it to float.
A = LOAD '/user/maria_dev/UltrasoundPrice.csv' USING TextLoader() as (line:chararray);
A1 = FOREACH A GENERATE REPLACE(line,'([^a-zA-Z0-9.,\\s]+)','');
B = FOREACH A1 GENERATE FLATTEN(STRSPLIT($0,','));
B1 = FOREACH B GENERATE $0 as Model,(float)$1 as Price;
DUMP B1;
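The same cleanup step can be sketched in Python (the function name is illustrative): strip everything that is not a digit or decimal point, then cast to float, mirroring what the Pig REPLACE and cast do:

```python
import re

def clean_price(raw):
    """Remove currency symbols and other non-numeric characters
    (e.g. the leading $), then cast the remainder to float."""
    return float(re.sub(r"[^0-9.]", "", raw))

print(clean_price("$52678.00"))  # 52678.0
```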

Date variable is NULL while loading csv data into hive External table

I am trying to load a SAS dataset into a Hive external table. For that, I first converted the SAS dataset into CSV format. In the SAS dataset, the contents of the date variable (i.e. as_of_dt) show:
LENGTH=8 , FORMAT= DATE9. , INFORMAT=DATE9. , LABEL=as_of_dt
And to convert SAS into CSV, I used the code below (I used a RETAIN statement earlier in SAS so that the order of the variables is maintained):
proc export data=input_SASdataset_for_csv_conv
outfile= "/mdl/myData/final_merged_table_201501.csv"
dbms=csv
replace;
putnames=no;
run;
Up to here (i.e. up to the CSV file creation), the date variable is read correctly. But when I load the file into a Hive external table using the command below, the DATE variable (i.e. as_of_dt) comes out as NULL:
CREATE EXTERNAL TABLE final_merged_table_20151(as_of_dt DATE, client_cm_id STRING, cm11 BIGINT, cm_id BIGINT, corp_id BIGINT, iclic_id STRING, mkt_segment_cd STRING, product_type_cd STRING, rated_company_id STRING, recovery_amt DOUBLE, total_bal_amt DOUBLE, write_off_amt DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/mdl/myData';
Also, when I use the Hive command desc formatted final_merged_table_201501, I get the following table parameters:
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1447151851
But even though it shows numRows=-1, I can still see data inside the table using the Hive command SELECT * FROM final_merged_table_20151 limit 10;, with the date variable (as_of_dt) stored as NULL.
Where might be the problem?
Based on madhu's comment, you need to change the format on as_of_dt to yymmdd10., which writes dates as YYYY-MM-DD, the form Hive's DATE type expects.
You can do that with PROC DATASETS. Here is an example:
data test;
/*Test data with AS_OF_DT formatted date9. per your question*/
format as_of_dt date9.;
do as_of_dt=today() to today()+5;
output;
end;
run;
proc datasets lib=work nolist;
/*Modify Test Data Set and set format for AS_OF_DT variable*/
modify test;
attrib as_of_dt format=yymmdd10.;
run;
quit;
/*Create CSV*/
proc export file="C:\temp\test.csv"
data=test
dbms=csv
replace;
putnames=no;
run;
If you open the CSV, you will see the date in YYYY-MM-DD format.
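If you ever need to fix the dates after the CSV has already been written, the same conversion is straightforward in Python (the function name is illustrative): parse the DATE9.-style string and reformat it as YYYY-MM-DD:

```python
from datetime import datetime

def date9_to_iso(s):
    """Convert a SAS DATE9.-style string (e.g. 01JAN2015) to the
    YYYY-MM-DD form that Hive's DATE type expects."""
    return datetime.strptime(s, "%d%b%Y").strftime("%Y-%m-%d")

print(date9_to_iso("01JAN2015"))  # 2015-01-01
```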

Appending csv files in SAS

I have a bunch of csv files. Each has data from a different period:
filename file1 'JAN2011_PRICE.csv';
filename file2 'FEB2011_PRICE.csv';
...
Do I need to manually create intermediate datasets and then append them all together? Is there a better way of doing this?
SOLUTION
From the documentation it is preferable to use:
data allcsv;
length fileloc myinfile $ 300;
input fileloc $ ; /* read instream data */
/* The INFILE statement closes the current file
and opens a new one if FILELOC changes value
when INFILE executes */
infile my filevar=fileloc
filename=myinfile end=done dlm=',';
/* DONE set to 1 when last input record read */
do while(not done);
/* Read all input records from the currently */
/* opened input file */
input col1 col2 col3 ...;
output;
end;
put 'Finished reading ' myinfile=;
datalines;
path-to-file1
path-to-file2
...
run;
To read a bunch of csv files into a single SAS dataset, you can use a single data step as described in the SAS documentation. You want the second example in that section, which uses the filevar= infile option.
There should be no reason to create intermediate datasets.
The easiest method is to use a wildcard.
filename allfiles '*PRICE.csv';
data allcsv;
infile allfiles end=done dlm=',';
input col1 col2 col3 ...;
run;
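For comparison, the wildcard approach looks like this in Python (the function name is illustrative), collecting every matching file's rows into one list:

```python
import csv
import glob

def append_csvs(pattern):
    """Read every file matching the wildcard pattern and return all
    rows in one list -- the same idea as the SAS wildcard fileref."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            rows.extend(csv.reader(f))
    return rows
```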