Is there an equivalent function of "%in%" from R for Stata?
As already mentioned, it's hard to tell what you need from the question. inlist() might work, or it might not depending on the setting.
I find that Stata's macro list functions are invaluable. Store your list in a macro (local or global) and then a suite of useful commands is available:
local list a b c d d e
local search c
local search_in_list : list search in list
di `search_in_list'
These can be calculated on the fly:
if `: list search in list' {
actions if true
}
Stata does not offer the same flexible tool, but inlist will cover the basic operation that you might be looking for, as in count if inlist(country,"FR","US","DE").
Working with lists proper is one way. You could also treat the right-hand side as a single string and the left-hand side as a regular expression, and test membership with regexm().
Sometimes I accidentally declare variables that have the name of a function.
Here is a constructed example:
max(4:5) % 5
max(1:10)=10*ones(10,1); % oops, should be == instead of =
max(4:5) % [10 10]
At the moment I always find this out the hard way and it especially happens with function names that I don't use frequently.
Is there any way to let matlab give a warning about this? It would be ideal to see this on the right hand side of the screen with the other warnings, but I am open to other suggestions.
Since Matlab allows you to overload built-in functionality, you will not receive any warnings when using existing names.
There are, however, a few tricks to minimize the risk of overloading existing functions:
Use explicitFunctionNames. It is much less likely that there is a function maxIndex instead of max.
Use the "Tab"-key often. Matlab will auto-complete functions on the path (as well as variables that you've declared previously). Thus, if the variable auto-completes, it already exists. In case you don't remember whether it's also a function, hit "F1" to see whether there exists a help page for it.
Use functions rather than scripts, so that "mis-"assigned variables in the workspace won't mess up your code.
I'm pretty sure mlint can also check for that.
Generally I would wrap code into functions as much as possible. That way the range of such an override is limited to the scope of the function - so no lasting problems, besides the accidental assumption of course.
When in doubt, check:
exist max
ans =
5
Looking at help exist, you can see that "max" is a function, and shouldn't be assigned as a variable.
>> help exist
exist Check if variables or functions are defined.
exist('A') returns:
0 if A does not exist
1 if A is a variable in the workspace
2 if A is an M-file on MATLAB's search path. It also returns 2 when
A is the full pathname to a file or when A is the name of an
ordinary file on MATLAB's search path
3 if A is a MEX-file on MATLAB's search path
4 if A is a MDL-file on MATLAB's search path
5 if A is a built-in MATLAB function
6 if A is a P-file on MATLAB's search path
7 if A is a directory
8 if A is a class (exist returns 0 for Java classes if you
start MATLAB with the -nojvm option.)
When writing recursive functions, it sometimes happens that something should happen only on the first pass of the recursive algorithm. When this is true, I have two options,
Have an optional parameter called "first run" which is set to true by default but when called recursively, the argument is false
Have two functions
Which option is preferable? If it is the latter, what should I name these functions? (e.g. if it's a flood fill algorithm, would I choose FloodFill and FloodFillRecursive?)
Thanks in advance, ell.
I might use two functions, and I would say that the function that will be called should be named FloodFill : the user doesn't need to know how that function is implemented, so it should not be named FloodFillRecursive.
Actually, FloodFillRecursive could be the name of the inner function: the one that contains the implementation, the one that is called by the function the user calls, since it is that second function that is recursive.
Ideally, that function should not be visible to users: it should be kind of hidden in your library (be it truly hidden, or using some naming convention that informs users they should not call it directly).
And, this way, if you change the implementation, you will not have your users calling a FloodFillRecursive function that might not be recursive anymore.
It would depend really if the function is intended to be usable by 3rd party developers. If it is it might be preferable to use the two functions approach for neatness's sake, with the second function (FloodFillRecursive) private/internal to your library.
If it's not then the optional parameter approach is fine.
Option 2 is better in every case I can think of. This depends on the language you're using, but you're probably going to see significantly more (entirely avoidable) overhead by passing an additional argument every time.
For the naming convention, use a normal name for the outer function (eg FloodFill). For the inner function I'd say FloodFillRecursive or FloodFillInner are good choices.
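The two-function layout described in these answers could look roughly like the Python sketch below. It assumes the image is a list of lists of colour values; the names flood_fill and _flood_fill_recursive are only illustrative, and the "first pass only" work here is validating the start cell and reading the original colour.

```python
def flood_fill(grid, row, col, new_color):
    """Public entry point: does the first-pass-only work, then recurses."""
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise IndexError("start cell is outside the grid")
    old_color = grid[row][col]
    if old_color != new_color:  # nothing to do if the colour would not change
        _flood_fill_recursive(grid, row, col, old_color, new_color)
    return grid


def _flood_fill_recursive(grid, row, col, old_color, new_color):
    """Internal helper; the leading underscore marks it private by convention."""
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return
    if grid[row][col] != old_color:
        return
    grid[row][col] = new_color
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        _flood_fill_recursive(grid, row + d_row, col + d_col, old_color, new_color)
```

Callers only ever see flood_fill(grid, row, col, new_color), so the helper can later be replaced by an iterative version without changing the public name.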
If the language allows it, then in my opinion the best approach is to have one function with the official "clean" interface and to use a local function (not visible outside) for the recursion.
For example in Common Lisp
(defun n-queens (n)
(let ((result (list)))
(labels ((place-queen (row free-cols free-diagonals free-counter-diagonals)
...))
(place-queen 0 ...)
result)))
or Python
def n_queens(n):
result = []
def place_queen(row, free_cols, free_diags, free_counter_diags):
...
place_queen(0, ...)
return result
In the above example the recursive function requires many parameters (e.g. the still-free columns, diagonals and counter-diagonals), but the official public function accepts only one parameter and the recursion is handled internally.
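For reference, here is one way the elided parts of the Python outline might be filled in. This is only a sketch under my own assumptions: the extra cols parameter, the use of frozensets, and tracking used (rather than free) diagonals are illustrative choices, not part of the original outline.

```python
def n_queens(n):
    """Return every solution as a list of column indices, one per row."""
    result = []

    # Local helper: carries the bookkeeping that callers never need to see.
    def place_queen(row, cols, free_cols, used_diags, used_counter_diags):
        if row == n:
            result.append(list(cols))
            return
        for col in sorted(free_cols):
            diag, counter_diag = row - col, row + col
            if diag in used_diags or counter_diag in used_counter_diags:
                continue
            cols.append(col)
            place_queen(row + 1, cols, free_cols - {col},
                        used_diags | {diag}, used_counter_diags | {counter_diag})
            cols.pop()

    place_queen(0, [], frozenset(range(n)), frozenset(), frozenset())
    return result
```

The public signature stays n_queens(n); all recursion state lives in the nested function's arguments.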
I have an internal wiki and I created a function w(argument), which directly opens the corresponding page on my wiki using browseURL(url, browser). However, instead of w(argument), I'd like to type #argument, similar to ?argument. Does somebody know if such a function definition with a shortcut is possible within R?
Thanks a lot for your help
BR
Martin
No. What you are looking for is to define a new unary operator in R, and that isn't possible. (And # is the comment character in R so is used already anyway, so that wouldn't work.)
This post by Brian Ripley, in response to a similarly motivated question, has a bit more explanation (not much)
'#' starts a comment in R, so that will never get past the parser. You'll have to modify the core and recompile R if you really want #foo to do something other than nothing.
You can change what ?foo does by reassigning it:
> assign("?",function(x){cat("HALP!\n")})
> ?foo
HALP!
Obviously you'd make it fall through to the default help system if the arg isn't what you are interested in, but this is pretty ugly.
You could define a binary operator, then pass anything in to the first argument, e.g.,
"%w%" <- function(x, y) w(y)
1%w%argument
It's 4 keys rather than 1, but that's about as close as you can get without major reworking of R.
Like Smalltalk or Lisp?
EDIT
Where control structures are like:
Java Python
if( condition ) { if cond:
doSomething doSomething
}
Or
Java Python
while( true ) { while True:
print("Hello"); print "Hello"
}
And operators
Java, Python
1 + 2 // + operator
2 * 5 // * op
In Smalltalk ( if I'm correct ) that would be:
condition ifTrue:[
doSomething
]
True whileTrue:[
"Hello" print
]
1 + 2 // + is a method of 1 and the parameter is 2 like 1.add(2)
2 * 5 // same thing
how come you've never heard of lisp before?
You mean without special syntax for achieving the same?
Lots of languages have control structures and operators that are "really" some form of message passing or function call system that can be redefined. Most "pure" object languages and pure functional languages fit the bill. But they are all still going to have your "+" and some form of code block (including Smalltalk!), so your question is a little misleading.
Assembly
Befunge
Prolog*
*I cannot be held accountable for any frustration and/or headaches caused by trying to get your head around this technology, nor am I liable for any damages caused by you due to aforementioned conditions including, but not limited to, broken keyboard, punched-in screen and/or head-shaped dents in your desk.
Pure lambda calculus? Here's the grammar for the entire language:
e ::= x | e1 e2 | \x . e
All you have are variables, function application, and function creation. It's equivalent in power to a Turing machine. There are well-known codings (typically "Church encodings") for such constructs as
If-then-else
while-do
recursion
and such datatypes as
Booleans
integers
records
lists, trees, and other recursive types
Coding in lambda calculus can be a lot of fun—our students will do it in the undergraduate languages course next spring.
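To give a flavour of those Church encodings, here is a small sketch written with nothing but single-argument Python lambdas. The names (TRUE, IF, SUCC and so on) are just labels I chose, and to_int is a helper that deliberately steps outside the pure calculus so the results can be inspected.

```python
# Church booleans: a boolean is a function that picks one of two arguments.
TRUE = lambda t: lambda f: t
FALSE = lambda t: lambda f: f
IF = lambda cond: lambda then: lambda otherwise: cond(then)(otherwise)

# Church numerals: the number n means "apply f to x, n times".
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
MUL = lambda m: lambda n: lambda f: m(n(f))


def to_int(n):
    """Decode a Church numeral into a Python int (not part of the calculus)."""
    return n(lambda k: k + 1)(0)


ONE = SUCC(ZERO)
TWO = SUCC(ONE)
THREE = ADD(ONE)(TWO)
```

Note that Python evaluates arguments eagerly, so IF here evaluates both branches before choosing one; a faithful encoding of if-then-else with side effects or recursion would wrap the branches in extra lambdas.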
Forth may qualify, depending on exactly what you mean by "no control structures or operators". Forth may appear to have them, but really they are all just symbols, and the "control structures" and "operators" can be defined (or redefined) by the programmer.
What about Logo or more specifically, Turtle Graphics? I'm sure we all remember that, PEN UP, PEN DOWN, FORWARD 10, etc.
The SMITH programming language:
http://esolangs.org/wiki/SMITH
http://catseye.tc/projects/smith/
It has no jumps and is Turing complete. I've also made a Haskell interpreter for this bad boy a few years back.
I'll be first to mention brain**** then.
In Tcl, there's no control structures; there's just commands and they can all be redefined. Every last one. There's also no operators. Well, except for in expressions, but that's really just an imported foreign syntax that isn't part of the language itself. (We can also import full C or Fortran or just about anything else.)
How about FRACTRAN?
FRACTRAN is a Turing-complete esoteric programming language invented by the mathematician John Conway. A FRACTRAN program is an ordered list of positive fractions together with an initial positive integer input n. The program is run by updating the integer (n) as follows:
1. for the first fraction f in the list for which nf is an integer, replace n by nf
2. repeat this rule until no fraction in the list produces an integer when multiplied by n, then halt.
Of course there is an implicit control structure in rule 2.
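The update rule is short enough to sketch as a small Python interpreter. The function name run_fractran and the max_steps safety limit are my own additions for illustration; FRACTRAN itself has no step limit.

```python
from fractions import Fraction


def run_fractran(program, n, max_steps=10_000):
    """Run a FRACTRAN program (a list of Fractions) on the integer n."""
    for _ in range(max_steps):
        for f in program:
            product = n * f
            if product.denominator == 1:  # n*f is an integer: take it and restart
                n = product.numerator
                break
        else:
            return n  # no fraction produced an integer, so the program halts
    raise RuntimeError("step limit reached before the program halted")
```

For example, the one-fraction program [Fraction(3, 2)] adds exponents: started on 2**3 * 3**2 = 72 it halts at 3**5 = 243.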
D (used in DTrace)?
APT - (Automatic Programmed Tool) used extensively for programming NC machine tools.
The language also has no IO capabilities.
XSLT (or XSL, some say) has control structures like if and for, but you should generally avoid them and deal with everything by writing rules with the correct level of specificity. So the control structures are there, but are implied by the default thing the translation engine does: apply potentially-recursive rules.
For and if (and some others) do exist, but in many many situations you can and should work around them.
How about Whenever?
Programs consist of a "to-do list": a series of statements which are executed in random order. Each statement can contain a prerequisite, which if not fulfilled causes the statement to be deferred until some (random) later time.
I'm not entirely clear on the concept, but I think PostScript meets the criteria, although it calls all of its functions operators (the way LISP calls all of its operators functions).
Makefile syntax doesn't seem to have any operators or control structures. I'd say it's a programming language but it isn't Turing Complete (without extensions to the POSIX standard anyway)
So... you're looking for a super-simple language? How about Batch programming? If you have any version of Windows, then you already have the command interpreter that runs Batch files (they are interpreted, not compiled). It's also more useful than you'd think, since you can carry out basic file functions (copy, rename, make directory, delete file, etc.)
http://www.csulb.edu/~murdock/dosindex.html
Example
Open notepad and make a .Bat file on your Windows box.
Open the .Bat file with notepad
In the first line, type "echo off"
In the second line, type "echo hello world"
In the third line, type "pause"
Save and run the file.
If you're looking for a way to learn some very basic programming, this is a good way to start. (Just be careful with the Delete and Format commands. Don't experiment with those.)
Seeking a method to:
Take whitespace separated tokens in a String; return a suggested Word
ie:
Google Search can take "fonetic wrd nterpreterr",
and at the top of the results page it shows "Did you mean: phonetic word interpreter"
A solution in any of the C* languages or Java would be preferred.
Are there any existing Open Libraries which perform such functionality?
Or is there a way to Utilise a Google API to request a suggested word?
In his article How to Write a Spelling Corrector, Peter Norvig discusses how a Google-like spellchecker could be implemented. The article contains a 20-line implementation in Python, as well as links to several reimplementations in C, C++, C# and Java. Here is an excerpt:
The full details of an
industrial-strength spell corrector
like Google's would be more confusing
than enlightening, but I figured that
on the plane flight home, in less than
a page of code, I could write a toy
spelling corrector that achieves 80 or
90% accuracy at a processing speed of
at least 10 words per second.
Using Norvig's code and this text as the training set, I get the following results:
>>> import spellch
>>> [spellch.correct(w) for w in 'fonetic wrd nterpreterr'.split()]
['phonetic', 'word', 'interpreters']
You can use the yahoo web service here:
http://developer.yahoo.com/search/web/V1/spellingSuggestion.html
However, it's only a web service (i.e. there are no client libraries for specific languages), but it outputs JSON or XML, so it's pretty easy to adapt to any language.
You can also use the Google APIs to spell check. There is an ASP implementation here (I'm not to credit for this, though).
First off:
Java
C++
C#
Use the one of your choice. I suspect it runs the query against a spell-checking engine with a word suggestion limit of exactly one. It then does nothing if the entire query is valid; otherwise it replaces each word with that word's best match. In other words, the following algorithm (an empty return string means that the query had no problems):
startup()
{
set the spelling engines word suggestion limit to 1
}
option 1()
{
int currentPosition = engine.NextWord(start the search at word 0, queryString);
if(currentPosition == -1)
return empty string; // Query is a-ok.
while(currentPosition != -1)
{
queryString = engine.ReplaceWord(engine.CurrentWord, queryString, the suggestion with index 0);
currentPosition = engine.NextWord(currentPosition, queryString);
}
return queryString;
}
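As a rough, runnable counterpart to the pseudocode above, here is a Python sketch that uses the standard library's difflib in place of the hypothetical spelling engine. The function name suggest_query and the idea of passing the vocabulary as a plain list are my assumptions; difflib ranks candidates by a similarity ratio, which is not necessarily what a real spell-checking engine does.

```python
import difflib


def suggest_query(query, vocabulary):
    """Return a corrected query, or "" if nothing needed correcting."""
    known = set(vocabulary)
    corrected, changed = [], False
    for word in query.split():
        if word in known:
            corrected.append(word)
            continue
        # n=1 mirrors the "word suggestion limit of one" in the pseudocode.
        matches = difflib.get_close_matches(word, vocabulary, n=1)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)  # no candidate found: keep the word as typed
    return " ".join(corrected) if changed else ""
```

With the vocabulary ["phonetic", "word", "interpreter"], the query "fonetic wrd nterpreterr" comes back as "phonetic word interpreter", and a query made only of known words comes back as the empty string.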
Since no one has yet mentioned it, I'll give one more phrase to search for: "edit distance".
That can be used to find closest matches, assuming it's typos where letters are transposed, missing or added.
But usually this is also coupled with some sort of relevancy information: either simple popularity (assuming that the most commonly used close-enough match is most likely the correct word) or contextual likelihood (which words tend to follow the preceding correct word, or come before the next one). This gets into information retrieval; one way to start is to look at bigrams and trigrams (sequences of words seen together). Google has very extensive freely available data sets for these.
For a simple initial solution, though, a dictionary coupled with Levenshtein-based matchers works surprisingly well.
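A minimal sketch of that dictionary-plus-Levenshtein idea in Python follows. The helper names levenshtein and closest_word are mine, and the linear scan over the dictionary is only sensible for small word lists.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def closest_word(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to word."""
    return min(dictionary, key=lambda candidate: levenshtein(word, candidate))
```

Plain Levenshtein distance counts a transposition of two adjacent letters as two edits; the Damerau-Levenshtein variant treats it as one, which may suit typo correction better.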
You could plug in Lucene, which has a dictionary facility implementing the Levenshtein distance method.
Here's an example from the Wiki, where 2 is the maximum number of suggestions to return (not the edit distance).
String[] l=spellChecker.suggestSimilar("sevanty", 2);
//l[0] = "seventy"
http://wiki.apache.org/lucene-java/SpellChecker
An older link http://today.java.net/pub/a/today/2005/08/09/didyoumean.html
The Google SOAP Search APIs do that.
If you have a dictionary stored as a trie, there is a fairly straightforward way to find best-matching entries, where characters can be inserted, deleted, or replaced.
void match(trie t, char* w, string s, int budget){
if (budget < 0) return;
if (*w=='\0') print s;
foreach (char c, subtrie t1 in t){
/* try matching or replacing c */
match(t1, w+1, s+c, (*w==c ? budget : budget-1));
/* try deleting c */
match(t1, w, s, budget-1);
}
/* try inserting *w */
match(t, w+1, s + *w, budget-1);
}
The idea is that first you call it with a budget of zero, and see if it prints anything out. Then try a budget of 1, and so on, until it prints out some matches. The bigger the budget the longer it takes. You might want to only go up to a budget of 2.
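Here is a runnable Python sketch of the same budgeted search, including the "try budget 0, then 1, then 2" loop. It is my own adaptation rather than a line-by-line port: the trie is a nested dict, "$" is an assumed end-of-word marker that must not occur in any word, and matches are collected into a set of dictionary words instead of being printed.

```python
def build_trie(words):
    """Build a nested-dict trie; "$" marks the end of a word (assumed unused)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def match(node, rest, prefix, budget, found):
    """Collect dictionary words reachable from node within the edit budget."""
    if budget < 0:
        return
    if not rest and "$" in node:
        found.add(prefix)
    for ch, child in node.items():
        if ch == "$":
            continue
        if rest:
            # Consume one typed character: a match is free, a replacement costs 1.
            match(child, rest[1:], prefix + ch,
                  budget if rest[0] == ch else budget - 1, found)
        # The typed word is missing ch: follow the trie without consuming input.
        match(child, rest, prefix + ch, budget - 1, found)
    if rest:
        # The typed word has an extra character: skip it, stay at this node.
        match(node, rest[1:], prefix, budget - 1, found)


def suggestions(trie, word, max_budget=2):
    """Try increasing budgets and return the first non-empty set of matches."""
    for budget in range(max_budget + 1):
        found = set()
        match(trie, word, "", budget, found)
        if found:
            return sorted(found)
    return []
```

For a trie built from "phonetic", "word", "ward" and "interpreter", the input "wrd" yields both "ward" and "word" at budget 1, while "fonetic" needs budget 2 to reach "phonetic".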
Added: It's not too hard to extend this to handle common prefixes and suffixes. For example, English prefixes like "un", "anti" and "dis" can be in the dictionary, and can then link back to the top of the dictionary. For suffixes like "ism", "'s", and "ed" there can be a separate trie containing just the suffixes, and most words can link to that suffix trie. Then it can handle strange words like "antinationalizationalization".