I'm generating a table in SAS and exporting it to a Microsoft Access database (mdb). Right now, I do this by connecting to the database as a library:
libname plus "C:\...\plus.mdb";
data plus.groupings;
set groupings;
run;
However, SAS doesn't export the variable formats to the database, so I end up with numeric values in a table that I want to be human-readable. I've tried proc sql with the same effect. What can I do to get the formatted data into Access?
What I've tried so far:
Plain libname to mdb, data step (as above)
Plain libname to mdb, proc sql create table
OLE DB libname (as in Rob's reference), data step
OLE DB libname, proc sql create table
There's a bunch of alternative connection types described here; maybe one of those will work:
http://support.sas.com/techsup/technote/ts793.pdf
What's working right now:
Because SAS does preserve formatted values in csv, I'm exporting the table to a csv file that feeds a linked table in the Access database. This seems less than ideal, but it works. It's weird that SAS clearly has the capacity to export formatted values, but doesn't seem to document it.
proc export
data= groupings
outfile= "C:\...\groupings.csv"
dbms= CSV
replace;
putnames= yes;
run;
A seeming disadvantage of this approach is that I have to manually recreate the table if I add a new field. If I could drop/create in proc sql, that wouldn't be an issue.
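To be concrete, the drop/create I have in mind would be something along these lines (just a sketch, reusing the plus libname from the first snippet; it recreates the table structure, but the formats are still lost, which is the original problem):
proc sql;
    drop table plus.groupings;       /* remove the existing Access table */
    create table plus.groupings as   /* recreate it with the current columns */
    select *
    from groupings;
quit;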
From SAS support: if you create a SQL view using put() to define the variables, you can then export the view to the database and maintain the formatted values. Example:
libname plus "C:\...\plus.mdb";
proc sql;
create view groupings_view as (
SELECT put(gender, gender.) AS gender,
put(race, race.) AS race,
... etc.
FROM groupings
);
create table plus.groupings as (
SELECT *
FROM groupings_view
);
quit;
I wasn't able to just create the view directly in Access - it's not entirely clear to me that Jet supports views in a way that SAS understands, so maybe that's the issue. At any rate, the above does the trick for the limited number of variables I need to export. I can imagine automating the writing of such queries with a funky macro working on the output of proc contents, but I'm terribly grateful I don't have to do that...
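For the record, if I ever did have to automate it, I imagine the query could be generated from the column metadata along these lines (an untested sketch: it uses dictionary.columns, which carries the same information as proc contents, and assumes the source dataset is WORK.GROUPINGS with the display formats already attached to the variables):
proc sql noprint;
    /* build "put(var, FMT.) as var" for each formatted variable, */
    /* and pass unformatted variables through unchanged           */
    select case
              when not missing(format)
                 then catx(' ', 'put(' || strip(name) || ',', strip(format) || ')', 'as', name)
              else strip(name)
           end
       into :selectlist separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'GROUPINGS';
quit;

proc sql;
    create table plus.groupings as
    select &selectlist.
    from groupings;
quit;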
I have a CSV file in Azure Data Lake that, when opened with Notepad++, looks something like this:
a,b,c
d,e,f
g,h,i
j,"foo
bar,baz",l
Upon inspection in Notepad++ (View All Symbols) it shows me this:
a,b,c[CR][LF]
d,e,f[CR][LF]
g,h,i[CR][LF]
j,"foo[LF]
[LF]
bar,baz",l[CR][LF]
That is to say, the normal Windows carriage return and line feed after each row.
The exception is that in one of the columns someone inserted a fancy multi-line story like this:
foo
bar, baz
My T-SQL code to ingest the CSV looks like this:
COPY INTO dbo.SalesLine
FROM 'https://mydatalakeblablabla/folders/myfile.csv'
WITH (
    ROWTERMINATOR = '0x0d',     -- Tried \n, \r\n, 0x0d0a here
    FILE_TYPE = 'CSV',
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ',',
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- Used to access the data lake
)
But the query doesn't work. The common error message in SSMS is:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 2 (NAME) in data file
I have no option to correct the faulty rows in the data lake or modify the CSV in any way.
Obviously the real file is much larger and contains real data, but I took a simple example.
How can I modify or rewrite the T-SQL code to correct the CSV as it is being read?
I recreated a similar file and uploaded it to my data lake, and a serverless SQL pool seemed to manage it just fine:
SELECT *
FROM
OPENROWSET(
BULK 'https://somestorage.dfs.core.windows.net/datalake/raw/badFile.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0'
) AS [result]
My results came back looking correct.
It probably seems like a bit of a workaround but if the improved parser in serverless is making light work of problems like this, then why not make use of the whole suite that is Azure Synapse Analytics. You could use serverless query as a source in a Copy activity in a Synapse Pipeline and load it to your dedicated SQL pool and that would have the same outcome as using the COPY INTO command.
In the past I've done things like writing special parsing routines, loading the file as a single column and splitting it in the DB, or using regular expressions, but if there's a simple solution, why not use it.
I also viewed my test file via an online hex editor, in case I'm missing something.
I cannot quite figure out how to change the format of a column in my data file. I have the data set PROC IMPORTed, and it guessed the format of a specific column as numeric; I would like it to be character-based.
This is where I'm currently at, and it does not change the format of my NUMBER column:
proc import
datafile = 'datapath'
out = dataname
dbms = CSV
replace
;
format NUMBER $8.
;
guessingrows = 20000
;
run;
You could import the data and then convert it afterwards using PUT - I believe the following would work.
proc sql;
create table want as
select *,
put(Number, 4.) as CharacterVersion
from data;
quit;
You cannot change the type/format via PROC IMPORT. However, you can write a data step to read in the file and then customize everything. If you're not sure how to start with that, check the log after you run a PROC IMPORT and it will have the 'skeleton' code. You can copy that code, edit it, and run to get what you need. Writing from scratch also works using an INFILE and INPUT statement.
From the help file (search for "Processing Delimited Files in SAS")
If you need to revise your code after the procedure runs, issue the RECALL command (or press F4) to the generated DATA step. At this point, you can add or remove options from the INFILE statement and customize the INFORMAT, FORMAT, and INPUT statements to your data.
Granted, the grammar in this section is horrific! The idea is that the IMPORT Procedure generates source code that can be recalled and modified for subsequent submission.
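If you do write it from scratch, a minimal sketch might look like the following (the 'datapath' literal matches the placeholder in the question; other_var is just a made-up stand-in for whatever other columns the file contains):
data dataname;
    infile 'datapath' dsd dlm=',' firstobs=2 truncover;
    length NUMBER $8 other_var 8;   /* declaring NUMBER with a character length forces it to be read as text */
    input NUMBER $ other_var;
run;
The DSD and TRUNCOVER options handle quoted fields and short records, and FIRSTOBS=2 skips the header row.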
(SQL Server 2008)
So here's my task:
I need to export query results to a file, and then import that file using SSIS into another DB.
Specific to the task, the data contains every awkward Unicode character you can think of, so delimiting with commas, pipes, etc. is out of the question.
Here are the options SSMS gives me for export format:
Column Aligned
Comma/Tab/Space delimited
Custom delimiter
And here are the options SSIS gives me for a flat file data source:
Delimited (custom)
Fixed Width
Ragged Right
So given that a delimiter character is out of the question... I cannot see another method, such as fixed width, that both SSMS and SSIS agree on.
It seems strange that two closely related MS products have such different options.
Or have I missed something here?
Any advice appreciated!
It seems you need to try out different combinations of options while creating the delimited flat file (for your exported query results).
Try setting the code page to UTF-8, with and without Unicode. Also try a text qualifier of " (or anything else you think might work), and experiment with different column delimiters.
Once you are able to create the delimited file, you then have to apply the same settings to the file while importing it into the other DB.
I'm using SAS 9.3 and have a table which I need to update. From my understanding, you can do something like the following:
data new_table;
update old_table update_table;
by some_key;
run;
My issue (well I have a few...) is that I'm importing the "update_table" from a CSV file and the formats aren't matching the "old_table", so the update fails.
I've tried creating the "update_table" from the "old_table" using proc sql create table with zero observations, which created the correct types/formats, but then I was unable to insert data into it without replacing it.
The other major issue I have is that there are a large number of columns (480), and custom formats, and I've run up against a 6000 character limit for the script.
I'm very new to SAS and any help would be greatly appreciated :)
It sounds like you need to use a data step to read in your CSV. There are lots of papers out there explaining how to do this, so I won't cover it here. This will allow you to specify the format (numeric/character) for each field. The nice thing here is you already know what formats they need to be in (from your existing dataset), so you can create this read in fairly easily.
Let's say your data is so:
data have;
informat x date9.;
input x y z $;
datalines;
10JAN2010 1 Base
11JAN2010 4 City
12JAN2010 8 State
;;;;
run;
Now, if you have a CSV of the same format, you can read it in by generating the input code from the above dataset. You can use PROC CONTENTS to do this, or you can generate it by using dictionary.tables which has the same information as PROC CONTENTS.
proc sql;
select catx(' ', name, ifc(type='char', '$', ' ')) into :inputlist
separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE';
select catx(' ',name,informat) into :informatlist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not missing(informat);
quit;
The above are two examples; they may or may not be sufficient for your particular needs.
Then you use them like so:
data want;
infile datalines dlm=',';
informat &informatlist.;
input &inputlist.;
datalines;
13JAN2010,9,REGION
;;;;
run;
(obviously you would use your CSV file instead of datalines; they are just used here as an example).
The point is you can write the data step code using the metadata from your original dataset.
I needed this today, so I made a macro out of it: https://core.sasjs.io/mp__csv2ds_8sas.html
It doesn't wrap the input statement so it may break with a large number of columns if you have batch line length limits. If anyone would like me to fix that, just raise an issue: https://github.com/sasjs/core/issues/new
How do I export SQLite into CSV using RSQLite?
I am asking because I am not familiar with database files, so I want to convert it using R.
It may be very simple, but I haven't figured it out.
Not sure if you have figured this out yet. I am not sure how to do it within R either, but it seems pretty simple to export to CSV using SQLite itself, or by writing out a CSV from the data you have loaded into R.
In SQLite, you can do something like this at your command prompt:
>.headers on
>.mode csv
>.output output.csv
>select * from table_name;
>.exit
SQLite will then write your table out to the output.csv file.
If the table is not too large, you can first read it into a data frame or matrix in R using dbGetQuery, or dbSendQuery and fetch. Then you can write that data frame out as a .csv file.
my.data.frame <- dbGetQuery(My_conn, "SELECT * FROM My_Table")
write.csv(my.data.frame, file = "MyFileName.csv", ...)