Pumping Lemma for Regular Languages - proof

I'm having some trouble with a rather difficult question. I'm being asked to prove the language {0^n 1^m 0^n | m,n >= 0} is irregular using the pumping lemma. In all the examples I've seen, the language is only being raised to the same variable (i.e. a^n b^n). So my question is, how do I pick a suitable string to test if this language is irregular?
Also a follow up to that question is once I have my string, how do you decompose the string into the form xyz where |xy| <= pumping length and |y| >=1?

In the examples you have seen before there were different letters: n as followed by bs. In the given example, the are n Os at the beginning and the end of the word. The language adds 0 or more 1s between those blocks of Os.
W in the pumping lemma is decomposed w = x y z with |xy| <= m and |y| > 0, where m is the pumping length. The way to pick a w is the same as before: you pick it such that the xy is completely inside a block consisting of one letter. For a^n b^n a word in L was selected such that xy would entirely consist of as, such that if it is 'pumped' there will be more as than bs. So you need at least m as and for the word to be in the language that means you need to pick m bs. The shortest is w = a^mb^m. For the new troublesome language, pick a word in this L such that xy consists entirely of Os (in the first block), such that if it is 'pumped' there will be more Os in the first block than the last block -and the number of 1s in the middle was not changed. However, you need to include at least one 1 in your original word otherwise there is only one block of Os - and pumped words in fact are in the language, which means there is no contradiction and thus not proof that L is irregular.

Related

proof L = {a^n b^m | n>=m} is irregular language

I am stuck in finding S for pumping lemma. is there any idea to proof that
L = {a^n b^m | n>=m} is an irregular language?
The pumping lemma states this:
If L is a regular language, then there exists a natural number p such that any string w of length at least p can be written as w = uvx where |uv| <= p, |v| > 0 and for all natural numbers n, u(v^n)x is also in the language.
To prove a language is not regular using the pumping lemma, we need to design a string w such that the rest of the statement fails: that is, there are no valid assignments of u, v and x.
Our language L requires the number of a's to be the same as the number of b's. The shortest string that satisfies the hypothesis that the string w has length at least p is a^(p/2) b^(p/2). We could guess this as our string. If we do, we have a few cases:
v is entirely made of a's. But then, pumping is going to result in a different number of a's and b's, so the resulting string is not in the language; a condtradiction.
v spans a's and b's. But then, pumping is going to cause a's and b's to be mixed up in the middle, whereas our language requires all the a's to come first. This is also a contradiction.
v is entirely made of b's. But then, we have the same contradiction as in case #1.
In all cases, this choice of w led to a contradiction. That means the guess worked.
There was a simpler choice for w here: choose w = a^p b^p, then there is only one case. But our choice worked out fine. If our choice had not worked out, we could have learned from that choice what went wrong and chosen a different candidate.
For the previous comment,(1) doesn't make sense, since we can have more a's then b's. n>=m. I probably bombed a midterm yesterday due to this question, but found that the answer is actually in the pumping part.
The solution is that we can pump down as well as up. The pumping lemma for regular languages says that for all i>=0, w=x(y^i)z.
CASE 1: y = only a's
So by using a^n b^m with w = a^p b^p, if y is some amount of a's then we see:
x = a^p-l
y = a^l
z = b^m
Now if we use y^0, then there will be less a's than b's.
The next two cases should be easy to prove but I'll add them regardless.
CASE 2: y = only b's
x = a^p
y = b^l
z = b^(p-l)
Pumping to xy^2z leaves more b's than a's so that is not an accepted word in L.
CASE 3: y = a's and b's
x = a^(p-l)
y = (a^l)(b^k)
z = b^(p-k)
Pumping x(y^2)z gives a^(p-l) [(a^l)(b^k)(a^l)(b^k)] b^(p-k) which is not included in L.

How to map number in a range to another in the same range with no collisions?

Effectively what I'm looking for is a function f(x) that outputs into a range that is pre-defined. Calling f(f(x)) should be valid as well. The function should be cyclical, so calling f(f(...(x))) where the number of calls is equal to the size of the range should give you the original number, and f(x) should not be time dependent and will always give the same output.
While I can see that taking a list of all possible values and shuffling it would give me something close to what I want, I'd much prefer it if I could simply plug values into the function one at a time so that I do not have to compute the entire range all at once.
I've looked into Minimal Perfect Hash Functions but haven't been able to find one that doesn't use external libraries. I'm okay with using them, but would prefer to not do so.
If an actual range is necessary to help answer my question, I don't think it would need to be bigger than [0, 2^24-1], but the starting and ending values don't matter too much.
You might want to take a look at Linear Congruential Generator. You shall be looking at full period generator (say, m=224), which means parameters shall satisfy Hull-Dobell Theorem.
Calling f(f(x)) should be valid as well.
should work
the number of calls is equal to the size of the range should give you the original number
yes, for LCG with parameters satisfying Hull-Dobell Theorem you'll get full period covered once, and 'm+1' call shall put you back at where you started.
Period of such LCG is exactly equal to m
should not be time dependent and will always give the same output
LCG is O(1) algorithm and it is 100% reproducible
LCG is reversible as well, via extended Euclid algorithm, check Reversible pseudo-random sequence generator for details
Minimal perfect hash functions are overkill, all you've asked for is a function f that is,
bijective, and
"cyclical" (ie fN=f)
For a permutation to be cyclical in that way, its order must divide N (or be N but in a way that's just a special case of dividing N). Which in turn means the LCM of the orders of the sub-cycles must divide N. One way to do that is to just have one "sub"-cycle of order N. For power of two N, it's also really easy to have lots of small cycles of some other power-of-two order. General permutations do not necessarily satisfy the cycle-requirement, of course they are bijective but the LCM of the orders of the sub-cycles may exceed N.
In the following I will leave all reduction modulo N implicit. Without loss of generality I will assume the range starts at 0 and goes up to N-1, where N is the size of the range.
The only thing I can immediately think of for general N is f(x) = x + c where gcd(c, N) == 1. The GCD condition ensures there is only one cycle, which necessarily has order N.
For power-of-two N I have more inspiration:
f(x) = cx where c is odd. Bijective because gcd(c, N) == 1 so c has a modular multiplicative inverse. Also cN=1, because φ(N)=N/2 (since N is a power of two) so cφ(N)=1 (Euler's theorem).
f(x) = x XOR c where c < N. Trivially bijective and trivially cycles with a period of 2, which divides N.
f(x) = clmul(x, c) where c is odd and clmul is carry-less multiplication. Bijective because any odd c has a carry-less multiplicative inverse. Has some power-of-two cycle length (less than N) so it divides N. I don't know why though. This is a weird one, but it has decent special cases such as x ^ (x << k). By symmetry, the "mirrored" version also works.
Eg x ^ (x >> k).
f(x) = x >>> k where >>> is bit-rotation. Obviously bijective, and fN(x) = x >>> Nk, where Nk mod N = 0 so it rotates all the way back to the unrotated position regardless of what k is.

Pollard’s p−1 algorithm: understanding of Berkeley paper

This paper explains about Pollard's p-1 factorization algorithm. I am having trouble understanding the case when factor found is equal to the input we go back and change 'a' (basically page 2 point 2 in the aforementioned paper).
Why we go back and increment 'a'?
Why we not go ahead and keep incrementing the factorial? It it because we keep going into the same cycle we have already seen?
Can I get all the factors using this same algorithm? Such as 49000 = 2^3 * 5^3 * 7^2. Currently I only get 7 and 7000. Perhaps I can use this get_factor() function recursively but I am wondering about the base cases.
def gcd(a, b):
if not b:
return a
return gcd(b, a%b)
def get_factor(input):
a = 2
for factorial in range(2, input-1):
'''we are not calculating factorial as anyway we need to find
out the gcd with n so we do mod n and we also use previously
calculate factorial'''
a = a**factorial % input
factor = gcd(a - 1, input)
if factor == 1:
continue
elif factor == input:
a += 1
elif factor > 1:
return factor
n = 10001077
p = get_factor(n)
q = n/p
print("factors of", n, "are", p, "and", q)
The linked paper is not a particularly good description of Pollard's p − 1 algorithm; most descriptions discuss smoothness bounds that make the algorithm much more practical. You might like to read this page at Prime Wiki. To answer your specific questions:
Why increment a? Because the original a doesn't work. In practice, most implementations don't bother; instead, a different factoring method, such as the elliptic curve method, is tried instead.
Why not increment the factorial? This is where the smoothness bound comes into play. Read the page at Mersenne Wiki for more details.
Can I get all factors? This question doesn't apply to the paper you linked, which assumes that the number being factored is a semi-prime with exactly two factors. The more general answer is "maybe." This is what happens at Step 3a of the linked paper, and choosing a new a may work (or may not). Or you may want to move to a different factoring algorithm.
Here is my simple version of the p − 1 algorithm, using x instead of a. The while loop computes the magical L of the linked paper (it's the least common multiple of the integers less than the smoothness bound b), which is the same calculation as the factorial of the linked paper, but done in a different way.
def pminus1(n, b, x=2):
q = 0; pgen = primegen(); p = next(pgen)
while p < b:
x = pow(x, p**ilog(p,b), n)
q, p = p, next(pgen)
g = gcd(x-1, n)
if 1 < g < n: return g
return False
You can see it in action at http://ideone.com/eMPHtQ, where it factors 10001 as in the linked paper as well as finding a rather spectacular 36-digit factor of fibonacci(522). Once you master that algorithm, you might like to move on to the two-stage version of the algorithm.

Repeated application of functions

Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x))<x for x > 2 would suffice. You could do one iteration -- to arrive at an integer number, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order, that would be well-founded. In case of the functions you mentioned this is not hard: you can take x << y if and only if ceil(x) < ceil(y), where << is a symbol for this new order. This order is of course well-founded on numbers greater then 2, and both sqrt and log are decreasing with respect to it -- so you can apply well-founded induction.
Of course, in general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
There's a general theorem for when then sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it is getting closer to a target. You can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
If we're talking about floats here, that's not true. If for all x > n f(x) is strictly less than x, it will reach n at some point (because there's only a limited number of floating point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x using floating point arithmetic (i.e. proving it is less than x mathematically does not suffice, because then f(x) = x may still be true with floats when the difference is not enough).
There is no general algorithm to determine whether a function f and a variable x will end or not in that loop. The Halting problem is reducible to that problem.
For sqrt and log, we could safely do that because we happen to know the mathematical properties of those functions. Say, sqrt approaches 1, log eventually goes negative. So the condition x < 2 has to be false at some point.
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters xi≤2. That doesn't mean that the sequence will converge, nor does it even mean that it is bounded below 2. It only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than two will (eventually) halt. That is the case for the sequence xi+1 = sqrt(xi), since x converges to 1. In the case of yi+1 = log(yi), it will contain a value less than 2 before becoming undefined for elements of R (though it is well defined on the extended complex plane, C*, but I don't think it will, in general converge except at any stable points that may exist (i.e. where z = log(z)). Ultimately what this means is that you need to perform some upfront analysis on the sequence to better understand its behavior.
The standard test for convergence of a sequence xi to a point z is that give ε > 0, there is an n such that for all i > n, |xi - z| < ε.
As an aside, consider the Mandelbrot Set, M. The test for a particular point c in C for an element in M is whether the sequence zi+1 = zi2 + c is unbounded, which occurs whenever there is a |zi| > 2. Some elements of M may converge (such as 0), but many do not (such as -1).
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x-1 is tangent to the function at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
I suggest reading this wikipedia entry which provides useful pointers. Without additional knowledge about f, nothing can be said.

Algorithm to generate all possible letter combinations of given string down to 2 letters

Algorithm to generate all possible letter combinations of given string down to 2 letters
Trying to create an Anagram solver in AS3, such as this one found here:
http://homepage.ntlworld.com/adam.bozon/anagramsolver.htm
I'm having a problem wrapping my brain around generating all possible letter combinations for the various lengths of strings. If I was only generating permutations for a fixed length, it wouldn't be such a problem for me... but I'm looking to reduce the length of the string and obtain all the possible permutations from the original set of letters for a string with a max length smaller than the original string. For example, say I want a string length of 2, yet I have a 3 letter string of “abc”, the output would be: ab ac ba bc ca cb.
Ideally the algorithm would produce a complete list of possible combinations starting with the original string length, down to the smallest string length of 2. I have a feeling there is probably a small recursive algorithm to do this, but can't wrap my brain around it. I'm working in AS3.
Thanks!
For the purpose of writing an anagram solver the kind of which you linked, the algorithm that you are requesting is not necessary. It is also VERY expensive.
Let's look at a 6-letter word like MONKEY, for example. All 6 letters of the word are different, so you would create:
6*5*4*3*2*1 different 6-letter words
6*5*4*3*2 different 5-letter words
6*5*4*3 different 4-letter words
6*5*4 different 3-letter words
6*5 different 2-letter words
For a total of 1950 words
Now, presumably you're not trying to spit out all 1950 words (e.g. 'OEYKMN') as anagrams (which they are, but most of them are also gibberish). I'm guessing you have a dictionary of legal English words, and you just want to check if any of those words are anagrams of the query word, with the option of not using all letters.
If that is the case, then the problem is simple.
To determine if 2 words are anagrams of each other, all you need to do is count how many times each letters are used, and compare these numbers!
Let's restrict ourself to only 26 letters A-Z, case insensitive. What you need to do is write a function countLetters that takes a word and returns an array of 26 numbers. The first number in the array corresponds to the count of the letter A in the word, second number corresponds to count of B, etc.
Then, two words W1 and W2 are exact anagram if countLetters(W1)[i] == countLetters(W2)[i] for every i! That is, each word uses each letter the exact same number of times!
For what I'd call sub-anagrams (MONEY is a sub-anagram of MONKEY), W1 is a sub-anagram of W2 if countLetters(W1)[i] <= countLetters(W2)[i] for every i! That is, the sub-anagram may use less of certain letters, but not more!
(note: MONKEY is also a sub-anagram of MONKEY).
This should give you a fast enough algorithm, where given a query string, all you need to do is read through the dictionary once, comparing the letter count array of each word against the letter count array of the query word. You can do some minor optimizations, but this should be good enough.
Alternatively, if you want utmost performance, you can preprocess the dictionary (which is known in advance) and create a directed acyclic graph of sub-anagram relationship.
Here's a portion of such a graph for illustration:
D=1,G=1,O=1 ----------> D=1,O=1
{dog,god} \ {do,od}
\
\-------> G=1,O=1
{go}
Basically each node is a bucket for all words that have the same letter count array (i.e. they're exact anagrams). Then there's a node from N1 to N2 if N2's array is <= (as defined above) N1's array (you can perform transitive reduction to store the least amount of edges).
Then to list all sub-anagrams of a word, all you have to do is find the node corresponding to its letter count array, and recursively explore all nodes reachable from that node. All their buckets would contain the sub-anagrams.
The following js code will find all possible "words" in an n letter word. Of course this doesn't mean that they are real words but does give you all the combinations. On my machine it takes about 0.4 seconds for a 7 letter word and 15 secs for a 9 letter word (up to almost a million possibilities if no repeated letters). However those times include looking in a dictionary and finding which are real words.
var getWordsNew=function(masterword){
var result={}
var a,i,l;
function nextLetter(a,l,key,used){
var i;
var j;
if(key.length==l){
return;
}
for(i=0;i<l;i++){
if(used.indexOf(""+i)<0){
result[key+a[i]]="";
nextLetter(a,l,key+a[i],used+i);
}
}
}
a=masterword.split("");
l=a.length;
for (i = 0; i < a.length; i++) {
result[a[i]] = "";
nextLetter(a, l, a[i], "" + i)
}
return result;
}
Complete code at
Code for finding words in words
You want a sort of arrangements. If you're familiar with the permutation algorithm then you know you have a check to see when you've generated enough numbers. Just change that limit:
I don't know AS3, but here's a pseudocode:
st = an array
Arrangements(LettersInYourWord, MinimumLettersInArrangement, k = 1)
if ( k > MinimumLettersInArrangements )
{
print st;
}
if ( k > LettersInYourWord )
return;
for ( each position i in your word that hasn't been used before )
st[k] = YourWord[i];
Arrangements(<same>, <same>, k + 1);
for "abc" and Arrangements(3, 2, 1); this will print:
ab
abc
ac
acb
...
If you want those with three first, and then those with two, consider this:
st = an array
Arrangements(LettersInYourWord, DesiredLettersInArrangement, k = 1)
if ( k > DesiredLettersInArrangements )
{
print st;
return
}
for ( each position i in your word that hasn't been used before )
st[k] = YourWord[i];
Arrangements(<same>, <same>, k + 1);
Then for "abc" call Arrangements(3, 3, 1); and then Arrangements(3, 2, 1);
You can generate all words in an alphabet by finding all paths in a complete graph of the letters. You can find all paths in that graph by doing a depth first search from each letter and returning the current path at each point.
There is simple O(N) where n is size of vocabulary.
Just sort letters in each word in vocabulary or better, create binary mask of them and then compare whit letters you have.