Similar language features to compare with Perl and Ruby __END__

Similar language features to compare with Perl and Ruby __END__ - language-agnostic

Background
Perl and Ruby have the __END__ and __DATA__ tokens that allow embedding of arbitrary data directly inside a source code file.
Although this practice may not be well-advised for general-purpose programming use, it is pretty useful for "one-off" quick scripts for routine tasks.
Question:
What other programming languages support this same or similar feature, and how do they do it?

Perl supports the __DATA__ marker, which you can access the contents of as though it were a regular file handle.

Fortran has a DATA statement that sounds like what you're looking for.

Basic on the VIC20 and C64 had a "Data" command that worked something like this
100 DATA 1,2,3
110 DATA 4,5,6
Data could be read via a READ command.
I no longer have a c64 to test my code on.

SAS has the datalines construct which is used for embedding an external data file inside the source program, e.g. in the following program, there are 5 datalines (the terminator is the semi-colon on a line by itself)
data output;
input name $ age;
datalines;
Jim 14
Sarah 11
Hannah 9
Ben 9
Timothy 4
;
run;

Related

Fuzzing command line arguments [argv]

I have a binary I've been trying to fuzz with AFL, the only thing is AFL only fuzzes STDIN, and File inputs and this binary takes input through its arguments pass_read [input1] [input2]. I was wondering if there are any methods/fuzzers that allow fuzzing in this manner?
I don't not have the source code so making a harness is not really applicable.

Michal Zalewski, the creator of AFL, states in this post:
AFL doesn't support argv fuzzing, because TBH, it's just not horribly useful in
practice. There is an example in experimental/argv_fuzzing/ showing how to do it
in a general case if you really want to.
Link to the mentioned example on GitHub: https://github.com/google/AFL/tree/master/experimental/argv_fuzzing
There are some instructions in the file argv-fuzz-inl.h (haven't tried myself).

Bash only Solution
As an example, lets generate 10 random strings and store them in a file
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 10 | head -n 10 > string-file.txt
Next, lets read 2 lines from string-file and pass it into our application
exec handle< string-file.txt
while read string1 <&handle ; do
read string2 <&handle
pass_read $line1 $line2 >> crash_file.txt
done
exec handle<&-
We then have any crashes stored within crash_file.txt for further analysis.
This may not be the most elegant solution, but perhaps you gives you an idea of some other possibilities if no tool necessarily fulfills the current requirements

I looked at the AFLplusplus repo on GitHub. Inside AFLplusplus/utils/argv_fuzzing/, there is a Makefile. If you run it, you will get a .so file (a shared library) that you can use to do argv fuzzing, even if you only have the binary. Obviously, you must use AFL_PRELOAD. You can read more in the README.

What does "the composition of UNIX byte streams" mean?

In the opening page of the book of "Lisp In Small Pieces", there is a paragraph goes like this:
Based on the idea of "function", an idea that has matured over
several centuries of mathematical research, applicative languages are
omnipresent in computing; they appear in various forms, such as the
composition of Un*x byte streams, the extension language for the Emacs
editor, as well as other scripting languages.
Can anyone elaborate a bit on "the composition of unix byte streams"? What does it mean? and how it is related to applicative/functional programming?
Thanks,
/bruin

My guess is that this is a reference to something like a pipe under linux.
cal | wc
the symbol | it's what invokes a pipe between 2 applications, a pipe is a feature provided by the kernel so you can use pipes where the applications are written using this kind of kernel APIs.
In this example cal is just the utility that prints a calendar, wc is an utility that counts words, rows and columns in the input that you pass to it, in this case the input is the result of piping cal to wc which makes things easier for you because it's more functional, you only care about what each applications does, you don't care, for example, about what is the name of the argument or where to allocate a temporary file to store the input/output in between.
Without the pipes you should do something like
cal > temp.txt
wc temp.txt
rm temp.xt
to obtain pretty much the same information. Also this second solution could possibly generate problems, for example what if temp.txt already exists ? Following what kind of rationale you will tell to your script to pick a name for your temporary file ? What if another process modifies your file in between the 2 calls to cal and wc ?

Importing foreign languages from csv file to Stata

I am using Stata 12. I have encountered the following problems. I am importing a bunch of .csv files to Stata using the insheet command. The datasets may conclude Russian, Croatian, Turkish, etc. I think they are encoded in "UTF-8". In .csv files, they are correct. After I imported them into Stata, the original strings are incorrect and become the strange characters. Would you please help me with that? Does Stat-Transfer can solve the problems? Does it support .csv format?
For example,
the original file is like:
My code is like:
insheet using name.csv, c n
save name.dta,replace
The result is like:
And I have tried to adjust the script in the fonts option, which does not work.

As #Nick Cox commented earlier, the problem is that Stata just doesn't support Unicode/UTF-8 encoding.
No, StatTransfer wouldn't resolve the problem (please refer to this explanation).
You can do the trick using an online decoder or MS Word. Let's do it with one language first, say, Russian as in your screenshots. Check out the correct encodings for Croatian, Turkish, and other languages you have.
Save the string variable from your .csv file as plain text (.txt), choosing the UTF-8 encoding option.
Encoding conversion:
Use iconv, suggested by #Dimitriy V. Masterov, or
Use an online tool, such as this: upload .txt file, choose source encoding as UTF-8 and output encoding according to the language of interest (for Russian, it must be CP1251), click "convert" button and save the output file, or
If you have MS Office, you can use also MS Word for the same purpose. Right click on .txt file, choose "Open with...", choose to open with MS Word. In the appeared window, confirm that the file encoding is "Unicode (UTF-8)", open, then click "Save as...", save as plain text. In the newly appeared window, choose "Cyrillic (Windows)" and mark "Insert line breaks". Save.
Check out your new .txt file - it still should have some strange characters (like ÌßÑÎÊÎÌÁÈÍÀÒ) but now Stata can display them properly.
Copy-paste the new string variable in Stata Data Editor, right click on the variable, choose "Font...", and then string "Cyrillic". You should see correct names on the screen both in data editor and in the results window (even though the string itself is intact).
Depending on your OS, you might need to install all appropriate languages first.
Hope it helps.

Update Answer: As of version 14, all of Stata is Unicode aware. That is results, help files, do files, ado files, data labels, etc.
This does not help users limited to accessing versions of Stata before 14, but is one kind of solution. Using the OP's example:
. insheet using "/home/Alexis/Desktop/data.csv"
(3 vars, 4 obs)
. ed
. list
+------------------------------------------------------------------------------+
| v1 v2 v3 |
|------------------------------------------------------------------------------|
1. | RU00040778 RUS ПРAЙCBOTEРXAУCKУПEРC AУДИT |
2. | RU00044434 RUS КПMГ |
3. | RU00044428 RUS Эрнст энд Янг |
4. | RU00044428 RUS Аудиторско-консулбтационная группа Раэвитие Биэнес-систем |
+------------------------------------------------------------------------------+

Edit csv tables in emacs 24

How can I edit csv tables with the comfort I can edit tables in org mode?
I tried csv-mode, but its code is unmaintained since August 2004 and the code says
This package is intended for use with GNU Emacs 21 (only).
Which package will support editing csv files in Emacs 24 best?
One solution could be to use org-mode, but I could not set the column seperator to , yet.
If org-mode is running I could convert the table to the | separated layout with C-c |. But I think it should be possible to make it easier. Something like this (not working example):
tea, price
fruit, 3.45
earl grey, 2.42
ginger, 1.63
# eval: (setq 'org-mode-separator ",")
# eval: (org-mode)
# End:

Work on csv-mode has been restarted. See https://sites.google.com/site/fjwcentaur/emacs. You can install the latest version from elpa with M-x package-install csv-mode. It works perfectly with emacs 24.

This is a bit quick & dirty, but might work for you: You can temporarily replace all commas with pipes (using M-%) and then turn on the org-table minor mode with:
M-x orgtbl-mode
When you're done, you replace all pipes with commas again. Just make sure that there aren't any pipe characters or quotes commas in your cell fields before you begin.

Convert a dta file to csv without Stata software

Is there a way to convert a dta file to a csv?
I do not have a version of Stata installed on my computer, so I cannot do something like:
File --> "Save as csv"

The frankly-incredible data-analysis library for Python called Pandas has a function to read Stata files.
After installing Pandas you can just do:
>>> import pandas as pd
>>> data = pd.io.stata.read_stata('my_stata_file.dta')
>>> data.to_csv('my_stata_file.csv')
Amazing!

You could try doing it through R:
For Stata <= 15 you can use the haven package to read the dataset and then you simply write it to external CSV file:
library(haven)
yourData = read_dta("path/to/file")
write.csv(yourData, file = "yourStataFile.csv")
Alternatively, visit the link pointed by huntaub in a comment below.
For Stata <= 12 datasets foreign package can also be used
library(foreign)
yourData <- read.dta("yourStataFile.dta")

You can do it in StatTransfer, R or perl (as mentioned by others), but StatTransfer costs $$$ and R/Perl have a learning curve.
There is a free, menu-driven stats program from AM Statistical Software that can open and convert Stata .dta from all versions of Stata, see:
http://am.air.org/

I have not tried, but if you know Perl you can use the Parse-Stata-DtaReader module to convert the file for you.
The module has a command-line tool dta2csv, which can "convert Stata 8 and Stata 10 .dta files to csv"

Another way of converting between pretty much any data format using R is with the rio package.
Install R from CRAN and open R
Install the rio package using install.packages("rio")
Load the rio library, then use the convert() function:
library("rio")
convert("my_file.dta", "my_file.csv")
This method allows you to convert between many formats (e.g., Stata, SPSS, SAS, CSV, etc.). It uses the file extension to infer format and load using the appropriate importing package. More info can be found on the R-project rio page.

The R method will work reliably, and it requires little knowledge of R. Note that the conversion using the foreign package will preserve data, but may introduce differences. For example, when converting a table without a primary key, the primary key and associated columns will be inserted during the conversion.
From http://www.r-bloggers.com/using-r-for-stata-to-csv-conversion/ I recommend:
library(foreign)
write.table(read.dta(file.choose()), file=file.choose(), quote = FALSE, sep = ",")

In Python, one can use statsmodels.iolib.foreign.genfromdta to read Stata datasets. In addition, there is also a wrapper of the aforementioned function which can be used to read a Stata file directly from the web: statsmodels.datasets.webuse.
Nevertheless, both of the above rely on the use of the pandas.io.stata.StataReader.data, which is now a legacy function and has been deprecated. As such, the new pandas.read_stata function should now always be used instead.
According to the source file of stata.py, as of version 0.23.0, the following are supported:
Stata data file versions:
104
105
108
111
113
114
115
117
118
Valid encodings:
ascii
us-ascii
latin-1
latin_1
iso-8859-1
iso8859-1
8859
cp819
latin
latin1
L1
As others have noted, the pandas.to_csv function can then be used to save the file into disk. A related function numpy.savetxt can also save the data
as a text file.
EDIT:
The following details come from help dtaversion in Stata 15.1:
Stata version .dta file format
----------------------------------------
1 102
2, 3 103
4 104
5 105
6 108
7 110 and 111
8, 9 112 and 113
10, 11 114
12 115
13 117
14 and 15 118 (# of variables <= 32,767)
15 119 (# of variables > 32,767, Stata/MP only)
----------------------------------------
file formats 103, 106, 107, 109, and 116
were never used in any official release.

StatTransfer is a program that moves data easily between Stata, Excel (or csv), SAS, etc. It is very user friendly (requires no programming skills). See www.stattransfer.com
If you use the program just note that you will have to choose "ASCII/Text - Delimited" to work with .csv files rather than .xls

Some mentioned SPSS, StatTransfer, they are not free. R and Python (also mentioned above) may be your choice. But personally, I would like to recommend Python, the syntax is much more intuitive than R. You can just use several command lines with Pandas in Python to read and export most of the commonly used data formats:
import pandas as pd
df = pd.read_stata('YourDataName.dta')
df.to_csv('YourDataName.csv')

SPSS can also read .dta files and export them to .csv, but that costs money. PSPP, an open source version of SPSS, which is rough, might also be able to read/export .dta files.

PYTHON - CONVERT STATA FILES IN DIRECTORY TO CSV
import glob
import pandas
path=r"{Path to Folder}"
for my_dir in glob.glob("*.dta")[0:1]:
file = path+my_dir # collects all the stata files
# get the file path/name without the ".dta" extension
file_name, file_extension = os.path.splitext(file)
# read your data
df = pandas.read_stata(file, convert_categoricals=False, convert_missing=True)
# save the data and never think about stata again :)
df.to_csv(file_name + '.csv')

For those who have Stata (even though the asker does not) you can use this:
outsheet produces a tab-delimited file so you need to specify the comma option like below
outsheet [varlist] using file.csv , comma
also, if you want to remove labels (which are included by default
outsheet [varlist] using file.csv, comma nolabel
hat tip to:
http://www.ats.ucla.edu/stat/stata/faq/outsheet.htm

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008