Is there a way to convert a .dta file to a .csv file?
I do not have a version of Stata installed on my computer, so I cannot do something like:
File --> "Save as csv"
The frankly incredible Python data-analysis library Pandas has a function to read Stata files.
After installing Pandas you can just do:
>>> import pandas as pd
>>> data = pd.read_stata('my_stata_file.dta')
>>> data.to_csv('my_stata_file.csv')
Amazing!
You could try doing it through R:
For Stata <= 15 you can use the haven package to read the dataset and then simply write it to an external CSV file:
library(haven)
yourData = read_dta("path/to/file")
write.csv(yourData, file = "yourStataFile.csv")
For Stata <= 12 datasets, the foreign package can also be used:
library(foreign)
yourData <- read.dta("yourStataFile.dta")
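As with haven, the imported data frame can then be written out with write.csv (row.names = FALSE keeps R's row names out of the output):
write.csv(yourData, file = "yourStataFile.csv", row.names = FALSE)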
You can do it in StatTransfer, R, or Perl (as mentioned by others), but StatTransfer costs $$$ and R/Perl have a learning curve.
There is a free, menu-driven stats program from AM Statistical Software that can open and convert Stata .dta files from all versions of Stata; see:
http://am.air.org/
I have not tried it, but if you know Perl you can use the Parse-Stata-DtaReader module to convert the file for you.
The module includes a command-line tool, dta2csv, which can "convert Stata 8 and Stata 10 .dta files to csv".
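I haven't verified the exact command line, but based on the module's description the invocation should look something like this (hypothetical usage; check the module's documentation):
dta2csv my_stata_file.dta > my_stata_file.csv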
Another way of converting between pretty much any data format using R is with the rio package.
Install R from CRAN and open R
Install the rio package using install.packages("rio")
Load the rio library, then use the convert() function:
library("rio")
convert("my_file.dta", "my_file.csv")
This method allows you to convert between many formats (e.g., Stata, SPSS, SAS, CSV, etc.). It uses the file extension to infer format and load using the appropriate importing package. More info can be found on the R-project rio page.
The R method works reliably and requires little knowledge of R. Note that the conversion using the foreign package preserves the data but may introduce differences: for example, write.csv writes R's row names as an extra unnamed first column unless you pass row.names = FALSE.
From http://www.r-bloggers.com/using-r-for-stata-to-csv-conversion/ I recommend:
library(foreign)
write.table(read.dta(file.choose()), file=file.choose(), quote = FALSE, sep = ",")
In Python, one can use statsmodels.iolib.foreign.genfromdta to read Stata datasets. There is also a wrapper of that function, statsmodels.datasets.webuse, which reads a Stata file directly from the web.
However, both of the above rely on pandas.io.stata.StataReader.data, which is a legacy function and has been deprecated. The newer pandas.read_stata function should be used instead.
According to the source file of stata.py, as of version 0.23.0, the following are supported:
Stata data file versions: 104, 105, 108, 111, 113, 114, 115, 117, and 118.
Valid encodings: ascii, us-ascii, latin-1, latin_1, iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, and L1.
As others have noted, pandas.DataFrame.to_csv can then be used to save the data to disk. The related function numpy.savetxt can also save the data as a text file.
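A minimal sketch of the numpy.savetxt route (the file names are placeholders; df.values hands savetxt a plain array, and '%s' copes with mixed column types):
import numpy as np
import pandas as pd

df = pd.read_stata('my_stata_file.dta')
# savetxt formats every element with '%s' and joins columns with commas
np.savetxt('my_stata_file.txt', df.values, fmt='%s', delimiter=',')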
EDIT:
The following details come from help dtaversion in Stata 15.1:
Stata version .dta file format
----------------------------------------
1 102
2, 3 103
4 104
5 105
6 108
7 110 and 111
8, 9 112 and 113
10, 11 114
12 115
13 117
14 and 15 118 (# of variables <= 32,767)
15 119 (# of variables > 32,767, Stata/MP only)
----------------------------------------
file formats 103, 106, 107, 109, and 116
were never used in any official release.
StatTransfer is a program that moves data easily between Stata, Excel (or csv), SAS, etc. It is very user-friendly (requires no programming skills). See www.stattransfer.com
If you use the program, just note that you will have to choose "ASCII/Text - Delimited" to work with .csv files rather than .xls.
Some answers mention SPSS and StatTransfer; those are not free. R and Python (also mentioned above) may be your choice. Personally, I recommend Python: the syntax is much more intuitive than R's, and with Pandas you can read and export most commonly used data formats in a few lines:
import pandas as pd
df = pd.read_stata('YourDataName.dta')
df.to_csv('YourDataName.csv')
SPSS can also read .dta files and export them to .csv, but that costs money. PSPP, a rough open-source counterpart of SPSS, might also be able to read and export .dta files.
PYTHON - CONVERT STATA FILES IN DIRECTORY TO CSV
import glob
import os

import pandas

path = r"{Path to Folder}"  # directory that contains the .dta files

# collect all the stata files in the directory
for file in glob.glob(os.path.join(path, "*.dta")):
    # get the file path/name without the ".dta" extension
    file_name, file_extension = os.path.splitext(file)
    # read your data
    df = pandas.read_stata(file, convert_categoricals=False, convert_missing=True)
    # save the data and never think about stata again :)
    df.to_csv(file_name + '.csv')
For those who have Stata (even though the asker does not) you can use this:
outsheet produces a tab-delimited file by default, so you need to specify the comma option as below:
outsheet [varlist] using file.csv , comma
Also, if you want to remove labels (which are included by default):
outsheet [varlist] using file.csv, comma nolabel
hat tip to:
http://www.ats.ucla.edu/stat/stata/faq/outsheet.htm
Related
I'd like to convert a CSV file of float vector data, consisting of 3 million rows and 150 columns like the following, into NetCDF format.
0.3,0.9,1.3,0.5,...,0.9
-5.1,0.1,1.0,8.4,...,6.7
...
First, I tried a cache-all-the-data-and-then-convert approach, but it didn't work because the memory for the cache could not be allocated.
So I need code that converts the data one piece at a time.
Does anyone know of such a solution?
The memory capacity of my machine is 8 MiB, and any programming language such as C, Java, or Python is fine.
With Python you can read the file line by line:
with open("myfile.csv") as infile:
    for line in infile:
        # appendtoNetcdf is a placeholder for your own row-writing function
        appendtoNetcdf(line)
This way you don't have to load all the file contents into memory.
Check out the netCDF4-python library; you can easily create one netcdf4 file or many.
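A minimal sketch of the streaming approach, assuming the netCDF4 package is installed; the file names, variable name, and single-precision dtype are illustrative choices:
from netCDF4 import Dataset

ncols = 150
nc = Dataset("myfile.nc", "w", format="NETCDF4")
nc.createDimension("row", None)   # unlimited dimension, so rows can be appended
nc.createDimension("col", ncols)
var = nc.createVariable("data", "f4", ("row", "col"))

with open("myfile.csv") as infile:
    for i, line in enumerate(infile):
        # one row at a time, so the whole CSV is never held in memory
        var[i, :] = [float(x) for x in line.split(",")]

nc.close()
Writing in chunks of a few thousand rows instead of one at a time would be considerably faster, but the idea is the same.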
I am trying to import in Octave a file (i.e. data.txt) containing 2 columns of integers, such as:
101448,1077
96906,924
105704,1017
I use the following command:
data = load('data.txt')
However, the "data" matrix that results has a 1 x 1 dimension, with all the content of the data.txt file saved in just one cell. If I adjust the numbers to look like floats:
101448.0,1077.0
96906.0,924.0
105704.0,1017.0
the loading works as expected, and I obtain a matrix with 3 rows and 2 columns.
I looked at the various options that can be set for the load command but none of them seem to help. The data file has no headers, just plain integers, comma separated.
Any suggestions on how to load this type of data? How can I force Octave to cast the data as numeric?
The load function is not meant to read csv files. It is meant to load files saved from Octave itself, which define variables.
To read a csv file, use csvread ("data.txt"). Also, Octave 3.2.4 is a very old version that is no longer supported; you should upgrade.
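For the sample data above, that is simply:
data = csvread("data.txt")   % returns the expected 3x2 numeric matrix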
My data set contains 1,300,000 observations with 56 columns. It is a .csv file and I'm trying to import it using proc import. After importing, I find that only 44 of the 56 columns are imported.
I tried increasing the guessing rows but it is not helping.
P.S.: I'm using SAS 9.3.
If you specify the file to load in a filename statement (and only in that case, as far as I am aware), you have to set the lrecl option to a value that is large enough.
If you don't, the default is only 256, so if your csv has lines longer than 256 characters, SAS will not read the full line.
See this link for more information (just search for lrecl): https://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000308090.htm
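A sketch of the fix (the fileref, path, and dataset names are placeholders; lrecl just needs to exceed your longest line):
filename mycsv "path/to/yourfile.csv" lrecl=32767;

proc import datafile=mycsv out=work.mydata dbms=csv replace;
    guessingrows=1000;
run;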
If you have SAS Enterprise Guide (I think it's now included with all desktop licenses), try out the import wizard. It's excellent, and it will generate code you can reuse with a little editing.
It will take a while to run because it will read your entire file before writing the import logic.
I am using Stata 12 and have encountered the following problem. I am importing a bunch of .csv files into Stata using the insheet command. The datasets may contain Russian, Croatian, Turkish, etc., and I think they are encoded in UTF-8. In the .csv files the strings are correct, but after I import them into Stata they become strange characters. Would you please help me with that? Can StatTransfer solve the problem? Does it support the .csv format?
For example, my code is:
insheet using name.csv, c n
save name.dta, replace
The strings display correctly in the original .csv file, but after importing they show up as garbled characters (the screenshots of the original file and of the result are omitted here). I have also tried adjusting the script in the fonts option, which does not work.
As #Nick Cox commented earlier, the problem is that Stata just doesn't support Unicode/UTF-8 encoding.
No, StatTransfer wouldn't resolve the problem (please refer to this explanation).
You can do the trick using an online decoder or MS Word. Let's do it with one language first, say, Russian as in your screenshots. Check out the correct encodings for Croatian, Turkish, and other languages you have.
Save the string variable from your .csv file as plain text (.txt), choosing the UTF-8 encoding option.
Encoding conversion:
Use iconv, as suggested by #Dimitriy V. Masterov (see the one-liner after this list), or
Use an online tool, such as this: upload .txt file, choose source encoding as UTF-8 and output encoding according to the language of interest (for Russian, it must be CP1251), click "convert" button and save the output file, or
If you have MS Office, you can also use MS Word for the same purpose. Right-click on the .txt file, choose "Open with...", and open it with MS Word. In the window that appears, confirm that the file encoding is "Unicode (UTF-8)" and open it, then click "Save as..." and save as plain text. In the new window, choose "Cyrillic (Windows)" and mark "Insert line breaks". Save.
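For the iconv route, the conversion is a one-liner (the file names are placeholders; CP1251 matches the Russian example above):
iconv -f UTF-8 -t CP1251 names.txt > names_cp1251.txt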
Check out your new .txt file - it should still contain strange characters (like ÌßÑÎÊÎÌÁÈÍÀÒ), but now Stata can display them properly.
Copy-paste the new string variable into Stata's Data Editor, right-click on the variable, choose "Font...", and then the script "Cyrillic". You should see the correct names on the screen both in the Data Editor and in the Results window (even though the underlying string itself is unchanged).
Depending on your OS, you might need to install all appropriate languages first.
Hope it helps.
Update answer: As of version 14, all of Stata is Unicode aware: results, help files, do-files, ado-files, data labels, etc.
This does not help users limited to accessing versions of Stata before 14, but is one kind of solution. Using the OP's example:
. insheet using "/home/Alexis/Desktop/data.csv"
(3 vars, 4 obs)
. ed
. list
+------------------------------------------------------------------------------+
| v1 v2 v3 |
|------------------------------------------------------------------------------|
1. | RU00040778 RUS ПРAЙCBOTEРXAУCKУПEРC AУДИT |
2. | RU00044434 RUS КПMГ |
3. | RU00044428 RUS Эрнст энд Янг |
4. | RU00044428 RUS Аудиторско-консулбтационная группа Раэвитие Биэнес-систем |
+------------------------------------------------------------------------------+
The output we need to produce is a standard delimited file, but with binary content instead of ASCII. Is this possible using SAS?
Is there a specific binary format you need, or just something non-ASCII? If you're using proc export, you're probably limited to whatever formats are available. However, you can always create the csv manually.
If anything will do, you could simply zip the csv file.
Running on a *nix system, for example, you'd use something like:
filename outfile pipe "gzip -c > myfile.csv.gz";
Then create the csv manually:
data _null_;
    set mydata;
    file outfile;
    put var1 "," var2 "," var3;
run;
If this is PC/Windows SAS, I'm not as familiar, but you'll probably need to install a command-line zip utility.
This link from SAS suggests using winzip, which has a freely downloadable version. Otherwise, the code is similar.
http://support.sas.com/kb/26/011.html
You can actually make a CSV file as a SAS catalog entry; CSV is a valid SAS Catalog entry type.
Here's an example:
filename of catalog "sasuser.test.class.csv";

proc export data=sashelp.class
    outfile=of
    dbms=dlm;
    delimiter=',';
run;

filename of clear;
This little piece of code exports SASHELP.CLASS to a SAS Catalog entry of entry type CSV.
This way you get a binary format you can move between SAS installations on different platforms with PROC CPORT/CIMPORT, without having to worry whether some binary packaging format is available to your SAS session, since it's an internal SAS format.
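A sketch of that transport step (the catalog name follows the example above; the .stc file name is an arbitrary choice):
proc cport catalog=sasuser.test file='test.stc';
run;

/* and on the receiving installation */
proc cimport catalog=sasuser.test infile='test.stc';
run;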
Are you saying you have binary data that you want to output to csv?
If so, I don't think there is necessarily a defined standard for how this should be handled.
I suggest trying it (proc export comes to mind) and seeing if the results match your expectations.
Using SAS, output a .csv file; Open it in Excel and Save As whichever format your client wants. You can automate this process with a little bit of scripting in ### as well. (Substitute ### with your favorite scripting language.)