Counting the number of occurrences of each prefix using Knuth Morris - knuth-morris-pratt

I am working on my competitive coding skills I came across an article for counting the number of occurrences of each prefix in a string.
Here is the problem statement
Given a string s of length n. In the first variation of the problem, we want to count the number of appearances of each prefix s[0…i] in the same string. In the second variation of the problem another string t is given and we want to count the number of appearances of each prefix s[0…i] in t.
I found the solution to it.
Code:
for (int i = 0; i < n; i++)
ans[pi[i]]++;
for (int i = n-1; i > 0; i--)
ans[pi[i-1]] += ans[i];
for (int i = 0; i <= n; i++)
ans[i]++;
I am not able to understand the problem statement completely as far as I know what prefix is :
For example: string: geekforgeek
Has prefix as:{g,ge,gee,geek,geekf,geekfo,geekfor,geekforg,geekforge,geekforgee} as proper prefix.
Can somebody help me what this question is trying to compute because only these are the prefix available which are occuring once.
Thanks in advance.

If you have reached so far i am assuming that you know the prefix_function.
so we consider string str = "ABACABA"
we get its prefix array say pi as = {0, 0, 1, 0, 1, 2, 3}
to store the occurance of all the proper prefix (i.e. we dont include the string itself) we create a new array or vector(acc. to you convinience) 'occ' of length str.length()+1 where occ[i] denotes the count of occurance of prefix str[0:i].
So as you quoted the code above, there are three loops. first of all i must explain what those loops are actually computing.
First loop is straightforward it just computes the no of longest prefix which is also suffix in the string of length i. For prefix "A" we have same suffix as prefix for str[0:3] and str[0:5], if noticed carefully it can be said "A" is the longest prefix which is also a suffix in these both strings, Hence we get this from the array pi which we calculated above as it stores the length of longest prefix which is also a suffix. Similarly for prefix "AB" we have it as longest prefix and suffix in str[1:6], so on.
we get occ = {3, 2, 1, 1, 0, 0, 0, 0}.
I hope idea for first loop is clear.
Now when we consider the example above of prefix "A", if we observe "ABACABA", we see that in string str[0:7] we have "A" as a suffix too, but is not the longest where the longest prefix == suffix is "ABA" so in our first loop we missed this occurance of the prefix. Also assume if we have prefix of length l which is also a suffix and ends at index i, and to get some prefix of length l' < l we go for pi[pi[i]-1] or say pi[l-1] as is clear from prefix function. Hence this way using array pi we trace prefixes of length less that were calculated in first loop. If we know that the length prefix i appears exactly occ[i] times, then this number must be added to the number of occurrences of its longest suffix that is also a prefix.
for the third loop is we add the occurance of each prefix. while in other two we just consider the suffix.

the code should be
{ for (int i = 0; i < n; i++)
ans[pi[i]]++;
for (int i = n-1; i > 0; i--)
ans[pi[i]-1] += ans[i]; ///note the index
for (int i = 0; i <= n; i++)
ans[i]++;
}
and the article is https://cp-algorithms.web.app/string/prefix-function.html#counting-the-number-of-occurrences-of-each-prefix
where that index is wrongly mentioned.
please correct me if i am wrong.

Related

Make a good hash algorithm for a Scrabble Cheater

I want to build a Scrabble Cheater which stores Strings in an array of linked lists. In a perfect scenario each linked list would only have words with the same permutations of letters (like POOL and LOOP for example). The user would put a String in there like OLOP and the linked list would be printed out.
I want the task to be explicitly solved using hashing.
I have built a stringHashFunction() for that (Java code):
public int stringHashFunction(String wordToHash) {
int hashKeyValue = 7;
//toLowerCase and sort letters in alphabetical order
wordToHash = normalize(wordToHash);
for(int i = 0; i < wordToHash.length(); i++) {
int charCode = wordToHash.charAt(i) - 96;
//calculate the hash key using the 26 letters
hashKeyValue = (hashKeyValue * 26 + charCode) % hashTable.length;
}
return hashKeyValue;
}
Does it look like an OK-hash-function? I realize that it's far from a perfect hash but how could I improve that?
My code overall works but I have the following statistics for now:
Number of buckets: 24043
All items: 24043
The biggest bucket counts: 11 items.
There are: 10264 empty buckets
On average there are 1.7449016619493432 per bucket.
Is it possible to avoid the collisions so that I only have buckets (linked lists) with the same permutations? I think if you have a whole dictionary in there it might be useful to have that so that you don't have to run an isPermutation() method on each bucket every time you want to get some possible permutations on your String.

Bitwise comparison for 16 bitstrings

I have 16 unrelated binary strings (of the same length). eg. 100000001010, 010100010010 and so on, and I need to find out a bitstring in which position x is a 1 IF position x is 1 for ATLEAST 2 bitstrings out of the 16.
Initially, I tries using bitwise XOR and this works great as long as even number of strings contain a 1, but when odd number of strings contain 1, the answer given is reverse.
A simple example (with 3 strings) would be:
A: 10101010
B: 01010111
C: 11011011
f(A,B,C)= answer
Expected answer: 11011011
Answer I'm getting right now: 11011001
I know I'm wrong somewhere but I'm at a loss on how to proceed
Help much appreciated
You can do something like
unsigned once = x[0], twice = 0;
for (int i = 1; i < 16; ++i) {
twice |= once & x[i];
once |= x[i];
}
(A AND B) OR (A AND C) OR (B AND C)
This is higher complexity than what you had originally.

Strange behavior in a simple for using uint

This works as expected:
for (var i:uint = 5; i >= 1; i-- )
{
trace(i); // output is from 5~1, as expected
}
This is the strange behavior:
for (var i:uint = 5; i >= 0; i-- )
{
trace(i)
}
// output:
5
4
3
2
1
0
4294967295
4294967294
4294967293
...
Below 0, something like a MAX_INT appears and it goes on decrementing forever. Why is this happening?
EDIT
I tested a similar code using C++, with a unsigned int and I have the same result. Probably the condition is being evaluated after the decrement.
The behavior you are describing has little to do with any programming language. This is true for C, C++, actionscript, etc. Let me say this though, what you do see is quite normal behavior and has to do with the way a number is represented (see the wiki article and read about unsigned integers).
Because you are using an uint (unsigned integer). Which can only be a positive number, the type you are using cannot represent negative numbers so if you take a uint like this:
uint i = 0;
And you reduce 1 from the above
i = i - 1;
In this case i does not represent negative numbers, as it is unsigned. Then i will display the maximum value of a uint data type.
Your edit that you posted above,
"...in C++, .. same result..."
Should give you a clue as to why this is happening, it has nothing to do with what language you are using, or when the comparison is done. It has to do with what data type you are using.
As an excercise, fire up that C++ program again and write a program that displays the maximum value of a uint. The program should not display any defined constants :)..it should take you one line of code too!

AS3 math: nearest neighbour in array

So let's say i have T, T = 1200. I also have A, A is an array that contains 1000s of entries and these are numerical entries that range from 1000-2000 but does not include an entry for 1200.
What's the fastest way of finding the nearest neighbour (closest value), let's say we ceil it, so it'll match 1201, not 1199 in A.
Note: this will be run on ENTER_FRAME.
Also note: A is static.
It is also very fast to use Vector.<int>instead of Arrayand do a simple for-loop:
var vector:Vector.<int> = new <int>[ 0,1,2, /*....*/ 2000];
function seekNextLower( searchNumber:int ) : int {
for (var i:int = vector.length-1; i >= 0; i--) {
if (vector[i] <= searchNumber) return vector[i];
}
}
function seekNextHigher( searchNumber:int ) : int {
for (var i:int = 0; i < vector.length; i++) {
if (vector[i] >= searchNumber) return vector[i];
}
}
Using any array methods will be more costly than iterating over Vector.<int> - it was optimized for exactly this kind of operation.
If you're looking to run this on every ENTER_FRAME event, you'll probably benefit from some extra optimization.
If you keep track of the entries when they are written to the array, you don't have to sort them.
For example, you'd have an array where T is the index, and it would have an object with an array with all the indexes of the A array that hold that value. you could also put the closest value's index as part of that object, so when you're retrieving this every frame, you only need to access that value, rather than search.
Of course this would only help if you read a lot more than you write, because recreating the object is quite expensive, so it really depends on use.
You might also want to look into linked lists, for certain operations they are quite a bit faster (slower on sort though)
You have to read each value, so the complexity will be linear. It's pretty much like finding the smallest int in an array.
var closestIndex:uint;
var closestDistance:uint = uint.MAX_VALUE;
var currentDistance:uint;
var arrayLength:uint = A.length;
for (var index:int = 0; index<arrayLength; index++)
{
currentDistance = Math.abs(T - A[index]);
if (currentDistance < closestDistance ||
(currentDistance == closestDistance && A[index] > T)) //between two values with the same distance, prefers the one larger than T
{
closestDistance = currentDistance;
closestIndex = index;
}
}
return T[closestIndex];
Since your array is sorted you could adapt a straightforward binary search (such as explained in this answer) to find the 'pivot' where the left-subdivision and the right-subdivision at a recursive step bracket the value you are 'searching' for.
Just a thought I had... Sort A (since its static you can just sort it once before you start), and then take a guess of what index to start guessing at (say A is length 100, you want 1200, 100*(200/1000) = 20) so guess starting at that guess, and then if A[guess] is higher than 1200, check the value at A[guess-1]. If it is still higher, keep going down until you find one that is higher and one that is lower. Once you find that determine what is closer. if your initial guess was too low, keep going up.
This won't be great and might not be the best performance wise, but it would be a lot better than checking every single value, and will work quite well if A is evenly spaced between 1000 and 2000.
Good luck!
public function nearestNumber(value:Number,list:Array):Number{
var currentNumber:Number = list[0];
for (var i:int = 0; i < list.length; i++) {
if (Math.abs(value - list[i]) < Math.abs(value - currentNumber)){
currentNumber = list[i];
}
}
return currentNumber;
}

Most efficient way to search a sorted matrix?

I have an assignment to write an algorithm (not in any particular language, just pseudo-code) that receives a matrix [size: M x N] that is sorted in a way that all of it's rows are sorted and all of it's columns are sorted individually, and finds a certain value within this matrix. I need to write the most time-efficient algorithm I can think of.
The matrix looks something like:
1 3 5
4 6 8
7 9 10
My idea is to start at the first row and last column and simply check the value, if it's bigger go down and if it's smaller than go left and keep doing so until the value is found or until the indexes are out of bounds (in case the value does not exist). This algorithm works at linear complexity O(m+n). I've been told that it's possible to do so with a logarithmic complexity. Is it possible? and if so, how?
Your matrix looks like this:
a ..... b ..... c
. . . . .
. 1 . 2 .
. . . . .
d ..... e ..... f
. . . . .
. 3 . 4 .
. . . . .
g ..... h ..... i
and has following properties:
a,c,g < i
a,b,d < e
b,c,e < f
d,e,g < h
e,f,h < i
So value in lowest-rigth most corner (eg. i) is always the biggest in whole matrix
and this property is recursive if you divide matrix into 4 equal pieces.
So we could try to use binary search:
probe for value,
divide into pieces,
choose correct piece (somehow),
goto 1 with new piece.
Hence algorithm could look like this:
input: X - value to be searched
until found
divide matrix into 4 equal pieces
get e,f,h,i as shown on picture
if (e or f or h or i) equals X then
return found
if X < e then quarter := 1
if X < f then quarter := 2
if X < h then quarter := 3
if X < i then quarter := 4
if no quarter assigned then
return not_found
make smaller matrix from chosen quarter
This looks for me like a O(log n) where n is number of elements in matrix. It is kind of binary search but in two dimensions. I cannot prove it formally but resembles typical binary search.
and that's how the sample input looks? Sorted by diagonals? That's an interesting sort, to be sure.
Since the following row may have a value that's lower than any value on this row, you can't assume anything in particular about a given row of data.
I would (if asked to do this over a large input) read the matrix into a list-struct that took the data as one pair of a tuple, and the mxn coord as the part of the tuple, and then quicksort the matrix once, then find it by value.
Alternately, if the value of each individual location is unique, toss the MxN data into a dictionary keyed on the value, then jump to the dictionary entry of the MxN based on the key of the input (or the hash of the key of the input).
EDIT:
Notice that the answer I give above is valid if you're going to look through the matrix more than once. If you only need to parse it once, then this is as fast as you can do it:
for (int i = 0; i<M; i++)
for (int j=0; j<N; j++)
if (mat[i][j] == value) return tuple(i,j);
Apparently my comment on the question should go down here too :|
#sagar but that's not the example given by the professor. otherwise he had the fastest method above (check the end of the row first, then proceed) additionally, checking the end of the middlest row first would be faster, a bit of a binary search.
Checking the end of each row (and starting on the end of the middle row) to find a number higher than the checked for number on an in memory array would be fastest, then doing a binary search on each matching row till you find it.
in log M you can get a range of rows able to contain the target (binary search on the first value of rows, binary search on last value of rows, keep only those rows whose first <= target and last >= target) two binary searches is still O(log M)
then in O(log N) you can explore each of these rows, with again, a binary search!
that makes it O(logM x logN)
tadaaaa
public static boolean find(int a[][],int rows,int cols,int x){
int m=0;
int n=cols-1;
while(m<rows&&n>=0){
if(a[m][n]==x)
return1;
else if(a[m][n]>x)
n--;
else m++;
}
}
what about getting the diagonal out, then binary search over the diagonal, start bottom right check if it is above, if yes take the diagonal array position as the column it is in, if not then check if it is below. each time running a binary search on the column once you have a hit on the diagonal (using the array position of the diagonal as the column index). I think this is what was stated by #user942640
you could get the running time of the above and when required (at some point) swap the algo to do a binary search on the initial diagonal array (this is taking into consideration its n * n elements and getting x or y length is O(1) as x.length = y.length. even on a million * million binary search the diagonal if it is less then half step back up the diagonal, if it is not less then binary search back towards where you where (this is a slight change to the algo when doing a binary search along the diagonal). I think the diagonal is better than the binary search down the rows, Im just to tired at the moment to look at the maths :)
by the way I believe running time is slightly different to analysis which you would describe in terms of best/worst/avg case, and time against memory size etc. so the question would be better stated as in 'what is the best running time in worst case analysis', because in best case you could do a brute linear scan and the item could be in the first position and this would be a better 'running time' than binary search...
Here is a lower bound of n. Start with an unsorted array A of length n. Construct a new matrix M according to the following rule: the secondary diagonal contains the array A, everything above it is minus infinity, everything below it is plus infinity. The rows and columns are sorted, and looking for an entry in M is the same as looking for an entry in A.
This is in the vein of Michal's answer (from which I will steal the nice graphic).
Matrix:
min ..... b ..... c
. . .
. II . I .
. . .
d .... mid .... f
. . .
. III . IV .
. . .
g ..... h ..... max
Min and max are the smallest and largest values, respectively. "mid" is not necessarily the average/median/whatever value.
We know that the value at mid is >= all values in quadrant II, and <= all values in quadrant IV. We cannot make such claims for quadrants I and III. If we recurse, we can eliminate one quadrant at each level.
Thus, if the target value is less than mid, we must search quadrants I, II, and III. If the target value is greater than mid, we must search quadrants I, III, and IV.
The space reduces to 3/4 its previous at each step:
n * (3/4)x = 1
n = (4/3)x
x = log4/3(n)
Logarithms differ by a constant factor, so this is O(log(n)).
find(min, max, target)
if min is max
if target == min
return min
else
return not found
else if target < min or target > max
return not found
else
set mid to average of min and max
if target == mid
return mid
else
find(b, f, target), return if found
find(d, h, target), return if found
if target < mid
return find(min, mid, target)
else
return find(mid, max, target)
JavaScript solution:
//start from the top right corner
//if value = el, element is found
//if value < el, move to the next row, element can't be in that row since row is sorted
//if value > el, move to the previous column, element can't be in that column since column is sorted
function find(matrix, el) {
//some error checking
if (!matrix[0] || !matrix[0].length){
return false;
}
if (!el || isNaN(el)){
return false;
}
var row = 0; //first row
var col = matrix[0].length - 1; //last column
while (row < matrix.length && col >= 0) {
if (matrix[row][col] === el) { //element is found
return true;
} else if (matrix[row][col] < el) {
row++; //move to the next row
} else {
col--; //move to the previous column
}
}
return false;
}
this is wrong answer
I am really not sure if any of the answers are the optimal answers. I am going at it.
binary search first row, and first column and find out the row and column where "x" could be. you will get 0,j and i,0. x will be on i row or j column if x is not found in this step.
binary search on the row i and the column j you found in step 1.
I think the time complexity is 2* (log m + log n).
You can reduce the constant, if the input array is a square (n * n), by binary searching along the diagonal.