How to make a good hash function for a Scrabble cheater

I want to build a Scrabble cheater which stores Strings in an array of linked lists. In a perfect scenario each linked list would only contain words that are permutations of the same letters (like POOL and LOOP, for example). The user would enter a String like OLOP and the matching linked list would be printed out.
I want the task to be explicitly solved using hashing.
I have built a stringHashFunction() for that (Java code):
public int stringHashFunction(String wordToHash) {
    int hashKeyValue = 7;
    // toLowerCase and sort letters in alphabetical order
    wordToHash = normalize(wordToHash);
    for (int i = 0; i < wordToHash.length(); i++) {
        int charCode = wordToHash.charAt(i) - 96;
        // calculate the hash key using the 26 letters
        hashKeyValue = (hashKeyValue * 26 + charCode) % hashTable.length;
    }
    return hashKeyValue;
}
Does it look like an OK-hash-function? I realize that it's far from a perfect hash but how could I improve that?
My code overall works but I have the following statistics for now:
Number of buckets: 24043
All items: 24043
The biggest bucket counts: 11 items.
There are: 10264 empty buckets
On average there are 1.7449016619493432 items per bucket.
Is it possible to avoid collisions so that each bucket (linked list) contains only words that are permutations of each other? With a whole dictionary stored, that would be useful: you wouldn't have to run an isPermutation() method on every bucket each time you look up the possible permutations of a String.
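One common way to get exactly that property (a sketch, not the asker's hashTable code) is to make the normalized form itself the key: lower-case the word, sort its letters, and bucket on that. Java's HashMap still hashes the key internally, so the task is still solved with hashing, but two words share a bucket if and only if they are permutations of each other.

```java
import java.util.*;

public class AnagramIndex {
    private final Map<String, List<String>> buckets = new HashMap<>();

    // Normalize: lower-case the word and sort its letters alphabetically.
    static String key(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public void add(String word) {
        buckets.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
    }

    // All stored words that are permutations of the query.
    public List<String> lookup(String query) {
        return buckets.getOrDefault(key(query), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramIndex index = new AnagramIndex();
        for (String w : new String[] {"POOL", "LOOP", "POLO", "STOP", "POTS"}) {
            index.add(w);
        }
        System.out.println(index.lookup("OLOP")); // [POOL, LOOP, POLO]
    }
}
```

With this layout no isPermutation() pass is ever needed: a collision in the map means a genuine anagram, and any other collision is resolved inside HashMap itself.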

Related

Google Apps Script: filter one 2D array based off the closest timestamp of another 2D array

I have two 2D-arrays; both of which contain a timestamp and a float-value. Array1 contains hundreds of timestamp-value pairs, while Array2 will generally only contain ~10 timestamp-value pairs.
I am trying to compare Array1 with Array2, and only want to keep Array1's timestamp-value pairings for the timestamps that are closest to the timestamps in Array2.
Array1 = {[59.696, 2020-12-30T00:00:00.000Z],
[61.381, 2020-12-30T00:15:00.000Z],
[59.25, 2020-12-30T00:30:00.000Z],
[57.313, 2020-12-30T00:45:00.000Z],...}
Array2 = {[78.210, 2020-12-30T00:06:00.000Z],
[116.32, 2020-12-30T00:39:00.000Z],...}
So in these Array examples above, I would want Array1 to be filtered down to:
Array1 = {[59.696, 2020-12-30T00:00:00.000Z],
[57.313, 2020-12-30T00:45:00.000Z],...}
because those timestamps are the closest match for timestamps in Array2.
I have tried implementing the suggested code from Find matches for two columns on two sheets google script but can't get it to work, and I can't otherwise find a neat solution for the timestamp matching.
Any help is much appreciated. Let me know if I need to edit or add to my question with more information.
The goal
Given two arrays:
Array1 = [
[59.696, "2020-12-30T00:00:00.000Z"],
[61.381, "2020-12-30T00:15:00.000Z"],
[59.25, "2020-12-30T00:30:00.000Z"],
[57.313, "2020-12-30T00:45:00.000Z"]
]
Array2 = [
[78.210, "2020-12-30T00:06:00.000Z"],
[116.32, "2020-12-30T00:39:00.000Z"]
]
The goal is for each item in Array2 to be matched with the closest date in Array1. So the resultant array of the above example would be 2 items. If Array2 had 100 items and Array1 had 1000, the resultant array would be of 100 items.
I am assuming that each item in Array1 can only be used once.
I am also assuming that the first item in the array, the float value, is to be ignored in the calculation, but kept together with the date and be included in the output.
The script
function filterClosestTimes(array1, array2) {
  // Tracking arrays:
  // closest       - will contain the final pairs used
  // differences   - used to decide which pair gets used
  // positionsUsed - so the same pair in array1 doesn't get used twice
  const closest = []
  const differences = []
  const positionsUsed = []
  // For each member of array2
  array2.forEach((pair, i) => {
    // Initializing a date object
    const targetDate = new Date(pair[1])
    // Initializing the position of the current index in the tracking arrays
    closest.push(null)
    differences.push(Infinity)
    positionsUsed.push(null)
    // Going through each member of array1
    array1.forEach((subpair, j) => {
      // First checking if the position has already been used
      if (positionsUsed.includes(j)) return
      // Initializing the date to test
      const dateToTest = new Date(subpair[1])
      // Difference between the currently tested dates of array1 and array2
      const difference = Math.abs(targetDate - dateToTest)
      // Checking if it is the date with the smallest difference so far;
      // if so, setting the position in the tracking arrays. These values
      // will likely be overwritten many times until the date with the
      // least difference is found.
      if (differences[i] > difference) {
        differences[i] = difference
        closest[i] = subpair
        positionsUsed[i] = j
      }
    })
  })
  return closest
}
function test(){
Logger.log(filterClosestTimes(Array1, Array2))
}
Running test returns:
[["59.696, 2020-12-30T00:00:00.000Z"], ["57.313, 2020-12-30T00:45:00.000Z"]]
Note
This algorithm, seeing as it involves checking every element of one array against almost every element of the other, can get slow. If you are only dealing with hundreds of values and comparing with ~10 in your Array2 then it will be fine. Just be aware that this approach has O(n^2) time complexity, which means that as the number of comparisons goes up, the time needed to complete the operation grows quadratically. If you try to compare tens of thousands with tens of thousands, there will be a noticeable wait!
References
JS Date object
Time Complexity
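If Array1 stays sorted by timestamp (as in the example data), the quadratic scan can be avoided with a binary search per Array2 entry. A sketch of that idea, written in Java purely for illustration (names are made up, and it deliberately ignores the "each Array1 entry used at most once" assumption of the answer above):

```java
public class ClosestTimestamp {
    // times: sorted epoch millis; target: epoch millis.
    // Returns the index of the element of times closest to target.
    static int closestIndex(long[] times, long target) {
        int lo = 0, hi = times.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (times[mid] < target) lo = mid + 1; else hi = mid;
        }
        // lo is now the first element >= target (or the last element);
        // its left neighbour may be closer.
        if (lo > 0 && target - times[lo - 1] < times[lo] - target) lo--;
        return lo;
    }

    public static void main(String[] args) {
        // minutes past midnight, standing in for the example timestamps
        long[] times = {0, 15, 30, 45};
        System.out.println(closestIndex(times, 6));  // 0 (00:00 is closest to 00:06)
        System.out.println(closestIndex(times, 39)); // 3 (00:45 is closest to 00:39)
    }
}
```

This brings matching m targets against n timestamps down to O(m log n), which only matters once both arrays get large.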

How to optimize finding values in 2D array in verilog

I need to set up a function that determines if a match exists in a 2D array (index).
My current implementation works, but is creating a large chain of LUTs due to if statements checking each element of the array.
function result_type index_search ( index_type index, logic[7:0] address );
    for ( int i = 0; i < 8; i++ ) begin
        if ( index[i] == address ) begin
            result = i;
        end
    end
endfunction
Is there a way to check for matches in a more efficient manner?
Not much to be done, really; at least for the code in hand. Since your code targets hardware, to optimize it think in terms of hardware, not function/verilog code.
For a general purpose implementation, without any known data patterns, you'll definitely need (a) N W-bit equality checks, plus (b) an N:1 FPA (Fixed Priority Arbiter, aka priority encoder, aka leading zero detector) that returns the first match, assuming N W-bit inputs. Something like this:
Not much optimization to be done, but here are some possible general-purpose optimizations:
Pipelining, as shown in the figure, if timing is an issue.
Consider an alternative FPA implementation that makes use of 2's complement characteristics and may result in a more LUT-efficient implementation: assign fpa_out = fpa_in & ((~fpa_in)+1); (the result is in one-hot encoding, not weighted binary as in your code)
Sticking to one-hot encoding can come in handy and reduce some of the logic down your path, but I cannot say for sure until we see some more code.
This is what the implementation would look like:
logic [N-1:0] addr_eq_idx;
logic [N-1:0] result;

for (genvar i = 0; i < N; i++) begin: g_eq_N
    // multiple matches may exist in this vector
    assign addr_eq_idx[i] = (address == index[i]) ? 1'b1 : 1'b0;
    // pipelined version:
    // always_ff @(posedge clk, negedge arstn)
    //     if (!arstn)
    //         addr_eq_idx[i] <= 1'b0;
    //     else
    //         addr_eq_idx[i] <= (address == index[i]) ? 1'b1 : 1'b0;
end

// result has a '1' at the position where the first match is found
assign result = addr_eq_idx & ((~addr_eq_idx) + 1);
Finally, try to think if your design can be simplified due to known run-time data characteristics. For example, let's say you are 100% sure that the address you're looking for may exist within the index 2D array in at most one position. If that is the case, then you do not need an FPA at all, since the first match will be the only match. In that case, addr_eq_idx already points to the matching index, as a one-hot vector.
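The assign above relies on the classic two's-complement identity x & (~x + 1) == x & -x, which isolates the lowest set bit of a word. A quick software check of the same trick (Java, with hypothetical names; in hardware the low bit corresponds to the first index):

```java
public class LowestOneBit {
    // Given a match vector with one bit per index, returns a one-hot
    // value with only the lowest (first) match bit kept.
    static int firstMatchOneHot(int matchVector) {
        return matchVector & (~matchVector + 1); // same as matchVector & -matchVector
    }

    public static void main(String[] args) {
        // matches at bit positions 2, 5 and 7 -> first match is position 2
        int matches = 0b10100100;
        System.out.println(Integer.toBinaryString(firstMatchOneHot(matches))); // 100
    }
}
```

Java's standard library exposes the same operation as Integer.lowestOneBit, which is a handy cross-check.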

Finding Median WITHOUT Data Structures

(my code is written in Java but the question is agnostic; I'm just looking for an algorithm idea)
So here's the problem: I made a method that simply finds the median of a data set (given in the form of an array). Here's the implementation:
public static double getMedian(int[] numset) {
    ArrayList<Integer> anumset = new ArrayList<Integer>();
    for (int num : numset) {
        anumset.add(num);
    }
    anumset.sort(null);
    int half = anumset.size() / 2;
    if (anumset.size() % 2 == 1) {
        // odd count: the single middle element
        return anumset.get(half);
    } else {
        // even count: average the two middle elements (2.0 avoids int division)
        return (anumset.get(half - 1) + anumset.get(half)) / 2.0;
    }
}
A teacher in the school that I go to then challenged me to write a method to find the median again, but without using any data structures. This includes anything that can hold more than one value, so that includes Strings, any forms of arrays, etc. I spent a long while trying to even conceive of an idea, and I was stumped. Any ideas?
The usual algorithm for the task is Hoare's Select algorithm. This is pretty much like a quicksort, except that in quicksort you recursively sort both halves after partitioning, but for select you only do a recursive call in the partition that contains the item of interest.
For example, let's consider an input like this in which we're going to find the fourth element:
[ 7, 1, 17, 21, 3, 12, 0, 5 ]
We'll arbitrarily use the first element (7) as our pivot. We initially split the input like this (with the pivot marked with a *):
[ 1, 3, 0, 5 ] *7 [ 17, 21, 12 ]
We're looking for the fourth element, and 7 is the fifth element, so we then partition (only) the left side. We'll again use the first element as our pivot, giving (using { and } to mark the part of the input we're now just ignoring):
[ 0 ] 1 [ 3, 5 ] { 7, 17, 21, 12 }
1 has ended up as the second element, so we need to partition the items to its right (3 and 5):
{0, 1} 3 [5] {7, 17, 21, 12}
Using 3 as the pivot element, we end up with nothing to the left, and 5 to the right. 3 is the third element, so we need to look to its right. That's only one element, so that (5) is our median.
By ignoring the unused side, this reduces the complexity from O(n log n) for sorting to only O(n) [though I'm abusing the notation a bit -- in this case we're dealing with expected behavior, not worst case, as big-O normally does].
There's also a median of medians algorithm if you want to assure good behavior (at the expense of being somewhat slower on average).
This gives guaranteed O(N) complexity.
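The walkthrough above can be sketched in Java roughly like this (a minimal version using a simple Lomuto-style partition around the first element, to match the example; names are illustrative):

```java
public class QuickSelect {
    // Returns the k-th smallest element of a (k is 0-based); modifies a.
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (true) {
            if (lo == hi) return a[lo];
            int p = partition(a, lo, hi); // pivot lands at its final index p
            if (k == p) return a[p];
            else if (k < p) hi = p - 1;   // recurse only into the left side
            else lo = p + 1;              // or only into the right side
        }
    }

    // Partition a[lo..hi] around a[lo]: smaller values left, larger right.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo;
        for (int j = lo + 1; j <= hi; j++) {
            if (a[j] < pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[lo]; a[lo] = a[i]; a[i] = t; // put the pivot in its final slot
        return i;
    }

    public static void main(String[] args) {
        int[] data = {7, 1, 17, 21, 3, 12, 0, 5};
        // fourth element (k = 3) of the worked example above
        System.out.println(select(data, 3)); // 5
    }
}
```

Note this modifies the input array in place; whether that counts as "no data structures" is between you and your teacher.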
Sort the array in place. Take the element in the middle of the array as you're already doing. No additional storage needed.
That'll take n log n time or so in Java. Best possible time is linear (you've got to inspect every element at least once to ensure you get the right answer). For pedagogical purposes, the additional complexity reduction isn't worthwhile.
If you can't modify the array in place, you have to trade significant additional time complexity to avoid using additional storage proportional to half the input's size. (If you're willing to accept approximations, that's not the case.)
Some not very efficient ideas:
For each value in the array, make a pass through the array counting the number of values lower than the current value. If that count is "half" the length of the array, you have the median. O(n^2) (Requires some thought to figure out how to handle duplicates of the median value.)
You can improve the performance somewhat by keeping track of the min and max values so far. For example, if you've already determined that 50 is too high to be the median, then you can skip the counting pass through the array for every value that's greater than or equal to 50. Similarly, if you've already determined that 25 is too low, you can skip the counting pass for every value that's less than or equal to 25.
In C++:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

int Median(const std::vector<int> &values) {
    assert(!values.empty());
    const std::size_t half = values.size() / 2;
    int min = *std::min_element(values.begin(), values.end());
    int max = *std::max_element(values.begin(), values.end());
    for (auto candidate : values) {
        if (min <= candidate && candidate <= max) {
            const std::size_t count =
                std::count_if(values.begin(), values.end(),
                              [&](int x) { return x < candidate; });
            if (count == half) return candidate;
            else if (count > half) max = candidate;
            else min = candidate;
        }
    }
    return min + (max - min) / 2;
}
Terrible performance, but it uses no data structures and does not modify the input array.

AS3 math: nearest neighbour in array

So let's say I have T, where T = 1200. I also have A, an array containing thousands of numerical entries that range from 1000 to 2000, but with no entry for 1200.
What's the fastest way of finding the nearest neighbour (the closest value)? Let's say we ceil it, so it should match 1201, not 1199, in A.
Note: this will be run on ENTER_FRAME.
Also note: A is static.
It is also very fast to use Vector.<int> instead of Array and do a simple for-loop:
var vector:Vector.<int> = new <int>[ 0, 1, 2, /*....*/ 2000 ];

function seekNextLower( searchNumber:int ) : int {
    for (var i:int = vector.length - 1; i >= 0; i--) {
        if (vector[i] <= searchNumber) return vector[i];
    }
    return vector[0]; // fallback when no lower value exists
}

function seekNextHigher( searchNumber:int ) : int {
    for (var i:int = 0; i < vector.length; i++) {
        if (vector[i] >= searchNumber) return vector[i];
    }
    return vector[vector.length - 1]; // fallback when no higher value exists
}
Using any array methods will be more costly than iterating over Vector.<int> - it was optimized for exactly this kind of operation.
If you're looking to run this on every ENTER_FRAME event, you'll probably benefit from some extra optimization.
If you keep track of the entries when they are written to the array, you don't have to sort them.
For example, you'd have an array indexed by T, where each slot holds an object containing the indexes of the A entries with that value. You could also store the closest value's index in that object, so when you're retrieving this every frame you only need to access that value, rather than search.
Of course this would only help if you read a lot more than you write, because recreating the object is quite expensive, so it really depends on use.
You might also want to look into linked lists; for certain operations they are quite a bit faster (slower on sort, though).
You have to read each value, so the complexity will be linear. It's pretty much like finding the smallest int in an array.
var closestIndex:uint;
var closestDistance:uint = uint.MAX_VALUE;
var currentDistance:uint;
var arrayLength:uint = A.length;
for (var index:int = 0; index < arrayLength; index++)
{
    currentDistance = Math.abs(T - A[index]);
    // between two values with the same distance, prefer the one larger than T
    if (currentDistance < closestDistance ||
        (currentDistance == closestDistance && A[index] > T))
    {
        closestDistance = currentDistance;
        closestIndex = index;
    }
}
return A[closestIndex]; // note: A, not T -- T is the search target
Since your array is sorted you could adapt a straightforward binary search (such as explained in this answer) to find the 'pivot' where the left-subdivision and the right-subdivision at a recursive step bracket the value you are 'searching' for.
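A sketch of that adapted binary search (written in Java for illustration; A must be sorted, which is a one-time cost since it is static, and the tie-break prefers the larger value to match the "ceil" preference in the question):

```java
public class NearestNeighbour {
    // a: sorted ascending. Returns the value of a closest to t,
    // preferring the larger value on a distance tie.
    static int nearest(int[] a, int t) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < t) lo = mid + 1; else hi = mid;
        }
        // a[lo] is the first value >= t (or the largest value if none is).
        // Its left neighbour wins only when it is strictly closer.
        if (lo > 0 && t - a[lo - 1] < a[lo] - t) return a[lo - 1];
        return a[lo];
    }

    public static void main(String[] args) {
        int[] a = {1000, 1100, 1199, 1201, 1300, 2000};
        System.out.println(nearest(a, 1200)); // 1201 (tie broken upward)
    }
}
```

This is O(log n) per lookup, which is comfortable to run on every ENTER_FRAME even for large A.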
Just a thought I had... Sort A (since it's static you can sort it once before you start), then take an initial guess of the index (say A has length 100 and you want 1200: 100*(200/1000) = 20) and start there. If A[guess] is higher than 1200, check the value at A[guess-1]; if it is still higher, keep going down until you find one value that is higher and one that is lower, then determine which is closer. If your initial guess was too low, go up instead.
This won't be great and might not be the best performance wise, but it would be a lot better than checking every single value, and will work quite well if A is evenly spaced between 1000 and 2000.
Good luck!
public function nearestNumber(value:Number, list:Array):Number {
    var currentNumber:Number = list[0];
    for (var i:int = 0; i < list.length; i++) {
        if (Math.abs(value - list[i]) < Math.abs(value - currentNumber)) {
            currentNumber = list[i];
        }
    }
    return currentNumber;
}

How to represent a binomial tree in memory

I've got a structure that is described as a "binomial tree". Here's a drawing: (figure omitted)
Which is the best way to represent this in memory? Just to clarify, it is not a simple binary tree, since node N4 is both the left child of N1 and the right child of N2; the same sharing happens for N7 and N8, and so on. I need a construction algorithm that easily avoids duplicating such nodes and instead just references them.
UPDATE
Many of us do not agree with this "binomial tree" definition, but it comes from finance (especially derivative pricing); have a look here for example: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter45.html So I used the domain-accepted definition.
You could generate the structure level by level. In each iteration, create one level of nodes, put them in an array, and connect the previous level to them. Something like this (C#):
Node GenerateStructure(int levels)
{
    Node root = null;
    Node[] previous = null;
    for (int level = 1; level <= levels; level++)
    {
        int count = level;
        var current = new Node[count];
        for (int i = 0; i < count; i++)
            current[i] = new Node();
        if (level == 1)
            root = current[0];
        for (int i = 0; i < count - 1; i++)
        {
            previous[i].Left = current[i];
            previous[i].Right = current[i + 1];
        }
        previous = current;
    }
    return root;
}
The whole structure requires O(N^2) memory, where N is the number of levels. This approach requires O(N) additional memory for the two arrays. Another approach would be to generate the graph from left to right, but that would require O(N) additional memory too.
The time complexity is obviously O(N^2).
More than a tree (which I would define as a connected graph with N vertices and N-1 edges), that structure looks like a Pascal triangle (or Tartaglia's triangle, as it is taught in Italy). As such, an array with suitable indexing suffices.
Details on construction depend on your data input: please give some more hints.
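The Pascal-triangle observation suggests dropping Node objects entirely and using a flat array: level L has L+1 nodes, levels 0..L-1 occupy L(L+1)/2 slots, so node i of level L lives at a computable offset and shared children fall out of the indexing for free. A sketch of that index arithmetic (Java, hypothetical helper names; levels and positions are 0-based here):

```java
public class RecombiningTree {
    // Flat-array slot of node i on a given level: levels 0..level-1
    // occupy level*(level+1)/2 slots before it.
    static int index(int level, int i) {
        return level * (level + 1) / 2 + i;
    }

    // Children live on the next level; neighbours share one child slot.
    static int leftChild(int level, int i)  { return index(level + 1, i); }
    static int rightChild(int level, int i) { return index(level + 1, i + 1); }

    public static void main(String[] args) {
        // node 1 of level 2 sits at flat index 2*3/2 + 1 = 4
        System.out.println(index(2, 1)); // 4
        // the right child of node (1,0) and the left child of node (1,1)
        // are the same slot, so shared nodes are never duplicated
        System.out.println(rightChild(1, 0) == leftChild(1, 1)); // true
    }
}
```

A tree of N levels then needs exactly N(N+1)/2 array slots and no pointers at all, which also matches how these lattices are laid out for GPU pricing kernels in the linked GPU Gems chapter.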