Google Apps Script: filter one 2D array based off the closest timestamp of another 2D array - google-apps-script

I have two 2D-arrays; both of which contain a timestamp and a float-value. Array1 contains hundreds of timestamp-value pairs, while Array2 will generally only contain ~10 timestamp-value pairs.
I am trying to compare Array1 with Array2, and only want to keep Array1's timestamp-value pairings for the timestamps that are closest to the timestamps in Array2.
Array1 = {[59.696, 2020-12-30T00:00:00.000Z],
[61.381, 2020-12-30T00:15:00.000Z],
[59.25, 2020-12-30T00:30:00.000Z],
[57.313, 2020-12-30T00:45:00.000Z],...}
Array2 = {[78.210, 2020-12-30T00:06:00.000Z],
[116.32, 2020-12-30T00:39:00.000Z],...}
So in these Array examples above, I would want Array1 to be filtered down to:
Array1 = {[59.696, 2020-12-30T00:00:00.000Z],
[57.313, 2020-12-30T00:45:00.000Z],...}
because those timestamps are the closest match for timestamps in Array2.
I have tried implementing the suggested code from Find matches for two columns on two sheets google script but can't get it to work, and I can't otherwise find a neat solution for the timestamp matching.
Any help is much appreciated. Let me know if I need to edit or add to my question with more information.

The goal
Given two arrays:
Array1 = [
[59.696, "2020-12-30T00:00:00.000Z"],
[61.381, "2020-12-30T00:15:00.000Z"],
[59.25, "2020-12-30T00:30:00.000Z"],
[57.313, "2020-12-30T00:45:00.000Z"]
]
Array2 = [
[78.210, "2020-12-30T00:06:00.000Z"],
[116.32, "2020-12-30T00:39:00.000Z"]
]
The goal is for each item in Array2 to be matched with the closest date in Array1. So the resultant array of the above example would be 2 items. If Array2 had 100 items and Array1 had 1000, the resultant array would be of 100 items.
I am assuming that each item in Array1 can only be used once.
I am also assuming that the first item in the array, the float value, is to be ignored in the calculation, but kept together with the date and be included in the output.
The script
function filterClosestTimes(array1, array2) {
  // Tracking arrays, one slot per member of array2:
  // closest       - the final pairs used
  // differences   - the smallest difference found so far
  // positionsUsed - indices of array1 already taken, so the same
  //                 pair in array1 doesn't get used twice.
  const closest = []
  const differences = []
  const positionsUsed = []
  // For each member of array2
  array2.forEach((pair, i) => {
    // Initializing a date object
    const targetDate = new Date(pair[1])
    // Initializing the position of the current index in the
    // tracking arrays.
    closest.push(null)
    differences.push(Infinity)
    positionsUsed.push(null)
    // Going through each member of array1
    array1.forEach((subpair, j) => {
      // First checking if the position has already been used
      if (positionsUsed.includes(j)) return
      // Initializing the date to test
      const dateToTest = new Date(subpair[1])
      // Difference between the currently tested dates of
      // array1 and array2
      const difference = Math.abs(targetDate - dateToTest)
      // Checking if this is the date with the smallest difference
      // so far; if so, setting the position in the tracking arrays.
      // These values will likely be overwritten many times until
      // the date with the least difference is found.
      if (differences[i] > difference) {
        differences[i] = difference
        closest[i] = subpair
        positionsUsed[i] = j
      }
    })
  })
  return closest
}
function test() {
  Logger.log(filterClosestTimes(Array1, Array2))
}
Running test returns:
[[59.696, 2020-12-30T00:00:00.000Z], [57.313, 2020-12-30T00:45:00.000Z]]
Note
This algorithm checks every element of one array against almost every element of the other, so it can get slow. If you are only dealing with hundreds of values in Array1 and ~10 in Array2, it will be fine. Just be aware that this approach has O(n^2) time complexity, which means that as the number of comparisons goes up, the time needed grows quadratically rather than linearly. If you try to compare tens of thousands with tens of thousands, there will be a noticeable wait!
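As an aside, if both arrays are already sorted by timestamp (which the sample data suggests, but is an assumption), the quadratic scan can be avoided with a single forward pass. The function name below is illustrative, and unlike the script above this sketch may reuse an element of Array1 when two targets share the same nearest neighbour:

```javascript
// Sketch: assumes array1 and array2 are both sorted ascending by their
// timestamp (index 1). A single cursor j walks forward through array1;
// for each target in array2 it advances while the next candidate is at
// least as close, then keeps the nearest pair. O(n + m) instead of O(n*m).
function filterClosestTimesSorted(array1, array2) {
  const result = [];
  let j = 0;
  array2.forEach(pair => {
    const target = new Date(pair[1]).getTime();
    while (
      j + 1 < array1.length &&
      Math.abs(new Date(array1[j + 1][1]).getTime() - target) <=
        Math.abs(new Date(array1[j][1]).getTime() - target)
    ) {
      j++;
    }
    result.push(array1[j]);
  });
  return result;
}
```

On the sample data this picks the same two pairs (the 00:00 and 00:45 rows) as the quadratic version.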
References
JS Date object
Time Complexity

Related

read CSV file in Cplex

My question is related to my previous question; I need to make some changes to my code. I have nodes numbered 1 to 100 in a CSV file. I created another CSV file with 20 numbers generated at random from among the 100 nodes, called demand points. Each demand point has a specific demand, a randomly generated number between 1 and 10. I want to read these demand points (their indexes) and their weights. That is the first part of my question: how can I read this?
After that, I need the distance between each of these demand points and all nodes. I don't know how to read just the indexes of the demand points and calculate the distance between them and all the nodes.
Based on the code I provided, I need the indexes of the demand points in a lot of places. My main problem is that I don't know how to get these indexes into CPLEX through the CSV file.
The demand points file (first column is the demand point index, second column is its demand) has 200 rows.
I tried this code for reading the demand points:
tuple demands
{
  int demandpoint;
  int weight;
}
{demands} demand={};
execute
{
  var f=new IloOplInputFile("weight.csv");
  while (!f.eof)
  {
    var data = f.readline().split(",");
    if (ar.length==2)
      demand.add(Opl.intValue(ar[0]),Opl.intValue(ar[1]));
  }
  f.close();
}
execute
{
  writeln(demand);
}
but it's not true.
int n=100;
int p=5;
tuple demands
{
  int demandpointindex;
  int weight;
}
{demands} demand={};
execute
{
  var f=new IloOplInputFile("weight.csv");
  while (!f.eof)
  {
    var data = f.readline().split(",");
    if (ar.length==2)
      demand.add(Opl.intValue(ar[0]),Opl.intValue(ar[1]));
  }
  f.close();
}
execute
{
  writeln(demand);
}
float d[demandpointindexes][facilities];
execute {
  var f = new IloOplInputFile("test1.csv");
  while (!f.eof) {
    var data = f.readline().split(",");
    if (data.length == 3)
      d[Opl.intValue(data[0])][Opl.intValue(data[1])] = Opl.floatValue(data[2]);
  }
  writeln(d);
}
dvar boolean x[demandpointindexe][facilities];
...
I hope I got your explanation right. Assume you have a file weight.csv like this:
1,2,
3,7,
4,9,
Here the first item in each row is the index of a demand point, the second item is its weight. Then you can parse this as before using this scripting block:
tuple demandpoint {
  int index;
  int weight;
}
{demandpoint} demand={};
execute {
  var f = new IloOplInputFile("weight.csv");
  while (!f.eof) {
    var data = f.readline().split(",");
    if (data.length == 3)
      demand.add(Opl.intValue(data[0]), Opl.intValue(data[1]));
  }
  writeln(demand);
}
Next you can create a set that contains the indices of all the demand points:
{int} demandpoints = { d.index | d in demand };
Assume file test1.csv looks like this
1,1,0,
1,2,5,
1,3,6,
1,4,7,
3,1,1,
3,2,1.5,
3,3,0,
3,4,3.5,
4,1,1,
4,2,1.5,
4,3,1.7,
4,4,0,
Here the first item is a demand point index, the second item is a facility index and the third item is the distance between the first and second items. Note that there are no lines that start with 2 since there is no demand point with index 2 in weight.csv. Also note that I assume only 4 facilities here (to keep the file short). You can read the distance between demand points and facilities as follows:
range facilities = 1..4;
float d[demandpoints][facilities];
execute {
  var f = new IloOplInputFile("test1.csv");
  while (!f.eof) {
    var data = f.readline().split(",");
    if (data.length == 4)
      d[Opl.intValue(data[0])][Opl.intValue(data[1])] = Opl.floatValue(data[2]);
  }
  writeln(d);
}
The full script (including a dummy objective and constraints so that it can be run) looks like this:
tuple demandpoint {
  int index;
  int weight;
}
{demandpoint} demand={};
execute {
  var f = new IloOplInputFile("weight.csv");
  while (!f.eof) {
    var data = f.readline().split(",");
    if (data.length == 3)
      demand.add(Opl.intValue(data[0]), Opl.intValue(data[1]));
  }
  writeln(demand);
}
// Create a set that contains all the indices of demand points
// as read from weight.csv
{int} demandpoints = { d.index | d in demand };
range facilities = 1..4;
float d[demandpoints][facilities];
execute {
  var f = new IloOplInputFile("test1.csv");
  while (!f.eof) {
    var data = f.readline().split(",");
    if (data.length == 4)
      d[Opl.intValue(data[0])][Opl.intValue(data[1])] = Opl.floatValue(data[2]);
  }
  writeln(d);
}
minimize 0;
subject to {}
It prints
{<1 2> <3 7> <4 9>}
[[0 5 6 7]
[1 1.5 0 3.5]
[1 1.5 1.7 0]]
Be careful about how many commas you have in your csv! The code posted above assumes that each line ends with a comma. That is, each line has as many commas as it has fields. If the last field is not terminated by a comma then you have to adapt the parser.
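You can see the same splitting behaviour in plain JavaScript (OPL's scripting language is JavaScript-based), which is why the parser above compares against the number of fields plus one:

```javascript
// A line that ends with a comma yields one extra empty field after
// split(","), so a two-field line parses into three items.
const withComma = "1,2,".split(",");   // ["1", "2", ""]
const withoutComma = "1,2".split(","); // ["1", "2"]
console.log(withComma.length, withoutComma.length); // 3 2
```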
If you have in test1.csv the distance between all the nodes then it makes sense to first read the data into an array float distance[facilities][facilities]; and then define the array d based on that as
float d[i in demandpoints][j in facilities] = distance[i][j];
Update for the more detailed specification you gave in the comments:
In order to handle the test1.csv you explained in the comments you could define a new tuple:
tuple Distance {
int demandpoint;
int facility;
float distance;
}
{Distance} distances = {};
and read/parse this exactly as you parsed the weight.csv file (with one additional field, of course).
Then you can create the distance matrix like so:
float d[i in I][j in J] = sum (dist in distances : dist.demandpoint == i && dist.facility == j) dist.distance;
Here I and J are the sets or ranges of demand points and facilities, respectively. See above for how you can get a set of all demand points defined in the tuple set. The created matrix will have an entry for each demandpoint/facility pair. The trick in the definition of d is that there are two cases:
If a pair (i,j) is defined in test1.csv then the sum will match exactly one element in distances: the one that defines the distance between two points.
If a pair (i,j) is not defined in test1.csv then the sum will not match anything and the corresponding entry in the distance matrix will thus be 0.
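As an illustration outside OPL, a hypothetical JavaScript version of the same default-to-zero construction might look like this:

```javascript
// Build a dense distance matrix from sparse (demandpoint, facility, distance)
// triples. Any pair that never appears in the data keeps its default of 0,
// mirroring the empty-sum case in the OPL expression above.
function buildDistanceMatrix(triples, points, facilities) {
  const d = {};
  points.forEach(i => {
    d[i] = {};
    facilities.forEach(j => { d[i][j] = 0; }); // default: no entry -> 0
  });
  triples.forEach(([i, j, dist]) => { d[i][j] = dist; });
  return d;
}
```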

Make a good hash algorithm for a Scrabble Cheater

I want to build a Scrabble Cheater which stores Strings in an array of linked lists. In a perfect scenario each linked list would only have words with the same permutations of letters (like POOL and LOOP for example). The user would put a String in there like OLOP and the linked list would be printed out.
I want the task to be explicitly solved using hashing.
I have built a stringHashFunction() for that (Java code):
public int stringHashFunction(String wordToHash) {
    int hashKeyValue = 7;
    // toLowerCase and sort letters in alphabetical order
    wordToHash = normalize(wordToHash);
    for (int i = 0; i < wordToHash.length(); i++) {
        int charCode = wordToHash.charAt(i) - 96;
        // calculate the hash key using the 26 letters
        hashKeyValue = (hashKeyValue * 26 + charCode) % hashTable.length;
    }
    return hashKeyValue;
}
Does it look like an OK hash function? I realize that it's far from a perfect hash, but how could I improve it?
My code overall works but I have the following statistics for now:
Number of buckets: 24043
All items: 24043
The biggest bucket counts: 11 items.
There are: 10264 empty buckets
On average there are 1.7449016619493432 items per bucket.
Is it possible to avoid the collisions so that I only have buckets (linked lists) containing words with the same permutation of letters? I think that with a whole dictionary loaded this would be useful, since you wouldn't have to run an isPermutation() method on each bucket every time you want the possible permutations of a String.
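One way to get exactly that behaviour is to bucket by a canonical key instead of a numeric hash: since normalize() already lower-cases and sorts the letters, its output can serve directly as the key. A sketch of the idea (names are illustrative) using a map from canonical key to word list:

```javascript
// Canonical form: lower-cased letters in sorted order. Two words get the
// same key exactly when they are anagrams, so every bucket holds only
// permutations of the same letters and no per-bucket check is needed.
function normalize(word) {
  return word.toLowerCase().split("").sort().join("");
}
function buildAnagramIndex(words) {
  const index = new Map();
  for (const w of words) {
    const key = normalize(w);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(w);
  }
  return index;
}
```

A lookup for "OLOP" then returns the POOL/LOOP/POLO bucket directly. (A fixed-size hash table modded by its length can still collide, of course; the map keyed by the full canonical string avoids that at the cost of storing the keys.)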

Finding Median WITHOUT Data Structures

(my code is written in Java but the question is agnostic; I'm just looking for an algorithm idea)
So here's the problem: I made a method that simply finds the median of a data set (given in the form of an array). Here's the implementation:
public static double getMedian(int[] numset) {
    ArrayList<Integer> anumset = new ArrayList<Integer>();
    for (int num : numset) {
        anumset.add(num);
    }
    anumset.sort(null);
    int n = anumset.size();
    if (n % 2 == 1) {
        // odd count: the middle element
        return anumset.get(n / 2);
    } else {
        // even count: average of the two middle elements
        return (anumset.get(n / 2 - 1) + anumset.get(n / 2)) / 2.0;
    }
}
A teacher in the school that I go to then challenged me to write a method to find the median again, but without using any data structures. This includes anything that can hold more than one value, so that includes Strings, any forms of arrays, etc. I spent a long while trying to even conceive of an idea, and I was stumped. Any ideas?
The usual algorithm for the task is Hoare's Select algorithm. This is pretty much like a quicksort, except that in quicksort you recursively sort both halves after partitioning, but for select you only do a recursive call in the partition that contains the item of interest.
For example, let's consider an input like this in which we're going to find the fourth element:
[ 7, 1, 17, 21, 3, 12, 0, 5 ]
We'll arbitrarily use the first element (7) as our pivot. We initially split it like this (with the pivot marked with a *):
[ 1, 3, 0, 5 ] *7 [ 17, 21, 12 ]
We're looking for the fourth element, and 7 is the fifth element, so we then partition (only) the left side. We'll again use the first element as our pivot, giving the following (using { and } to mark the part of the input we're now ignoring):
[ 0 ] 1 [ 3, 5 ] { 7, 17, 21, 12 }
1 has ended up as the second element, so we need to partition the items to its right (3 and 5):
{0, 1} 3 [5] {7, 17, 21, 12}
Using 3 as the pivot element, we end up with nothing to the left, and 5 to the right. 3 is the third element, so we need to look to its right. That's only one element, so that (5) is our median.
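The walk-through above can be sketched as a quickselect. This sketch uses the last element as the pivot (the worked example used the first) and a 0-based k, so the fourth smallest element is k = 3:

```javascript
// Quickselect: partition in place around a pivot, then continue only in
// the side that contains the k-th smallest element (k is 0-based).
function quickselect(a, k, lo = 0, hi = a.length - 1) {
  while (lo < hi) {
    const pivot = a[hi];
    let store = lo;
    for (let i = lo; i < hi; i++) {          // Lomuto partition
      if (a[i] < pivot) {
        [a[i], a[store]] = [a[store], a[i]];
        store++;
      }
    }
    [a[store], a[hi]] = [a[hi], a[store]];   // pivot into its final position
    if (k === store) return a[k];
    if (k < store) hi = store - 1;           // answer is in the left part
    else lo = store + 1;                     // answer is in the right part
  }
  return a[lo];
}
```

On the example input, quickselect([7, 1, 17, 21, 3, 12, 0, 5], 3) returns 5, matching the walk-through.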
By ignoring the unused side, this reduces the complexity from O(n log n) for sorting to only O(N) [though I'm abusing the notation a bit--in this case we're dealing with expected behavior, not worst case, as big-O normally does].
There's also a median of medians algorithm if you want to assure good behavior (at the expense of being somewhat slower on average).
This gives guaranteed O(N) complexity.
Sort the array in place. Take the element in the middle of the array as you're already doing. No additional storage needed.
That'll take n log n time or so in Java. Best possible time is linear (you've got to inspect every element at least once to ensure you get the right answer). For pedagogical purposes, the additional complexity reduction isn't worthwhile.
If you can't modify the array in place, you have to trade significant additional time complexity to avoid using additional storage proportional to half the input's size. (If you're willing to accept approximations, that's not the case.)
Some not very efficient ideas:
For each value in the array, make a pass through the array counting the number of values lower than the current value. If that count is "half" the length of the array, you have the median. O(n^2) (Requires some thought to figure out how to handle duplicates of the median value.)
You can improve the performance somewhat by keeping track of the min and max values so far. For example, if you've already determined that 50 is too high to be the median, then you can skip the counting pass through the array for every value that's greater than or equal to 50. Similarly, if you've already determined that 25 is too low, you can skip the counting pass for every value that's less than or equal to 25.
In C++:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

int Median(const std::vector<int> &values) {
  assert(!values.empty());
  const std::size_t half = values.size() / 2;
  int min = *std::min_element(values.begin(), values.end());
  int max = *std::max_element(values.begin(), values.end());
  for (auto candidate : values) {
    if (min <= candidate && candidate <= max) {
      const std::size_t count =
          std::count_if(values.begin(), values.end(),
                        [&](int x) { return x < candidate; });
      if (count == half) return candidate;
      else if (count > half) max = candidate;
      else min = candidate;
    }
  }
  return min + (max - min) / 2;
}
Terrible performance, but it uses no data structures and does not modify the input array.

How to write arbitrary datatypes into Matlab cell array

This is a general question, not related to a particular operation. I would like to be able to write the results of an arbitrary function into elements of a cell array without regard for the data type the function returns. Consider this pseudocode:
zout = cell(n,m);
myfunc = str2func('inputname'); %assume myfunc puts out m values to match zout dimensions
zout(1,:) = myfunc(x,y);
That will work for "inputname" == "strcat" , for example, given that x and y are strings or cells of strings with appropriate dimension. But if "inputname" == "strcmp" then the output is a logical array, and Matlab throws an error. I'd need to do
zout(1,:) = num2cell(strcmp(x,y));
So my question is: is there a way to fill the cell array zout without having to test for the type of variable generated by myfunc(x,y)? Should I be using a struct in the first place (and if so, what's the best way to populate it)?
(I'm usually an R user, where I could just use a list variable without any pain)
Edit: To simplify the overall scope, add the following "requirement" :
Let's assume for now that, for a function which returns multiple outputs, only the first one need be captured in zout . But when this output is a vector of N values or a vector of cells (i.e. Nx1 cell array), these N values get mapped to zout(1,1:N) .
So my question is: is there a way to fill the cell array zout without having to test for the type of variable generated by myfunc(x,y) ? Should I be using a struct in the first place (and if so, what's the best way to populate it)?
The answer provided by @NotBoStyf is almost there, but not quite. Cell arrays are the right way to go. However, the answer very much depends on the number of outputs from the function.
Functions with only one output
The function strcmp has only one output, which is an array. The reason that
zout{1,:} = strcmp(x,y)
gives you an error message, when zout is dimensioned N x 2, is that the left-hand side (zout{1,:}) expects two outputs from the right-hand side. You can fix this with:
[zout{1,:}] = num2cell(strcmp(x,y)); % notice the square brackets on the LHS
However, there's really no reason to do this. You can simply define zout as an N x 1 cell array and capture the results:
zout = cell(1,1);
x = 'a';
y = { 'a', 'b' };
zout{1} = strcmp(x,y);
% Referring to the results:
x_is_y_1 = zout{1}(1);
x_is_y_2 = zout{1}(2);
There's one more case to consider...
Functions with multiple outputs
If your function produces multiple outputs (as opposed to a single output that is an array), then this will only capture the first output. Functions that produce multiple outputs are defined like this:
function [outA,outB] = do_something( a, b )
    outA = a + 1;
    outB = b + 2;
end
Here, you need to explicitly capture both output arguments. Otherwise, you just get the first output, outA. For example:
outA = do_something( [1,2,3], [4,5,6] ); % outA is [2,3,4]
[outA,outB] = do_something( [1,2,3], [4,5,6] ); % outA is [2,3,4], outB is [6,7,8]
Z1 = cell(1,1);
Z1{1,1} = do_something( [1,2,3], [4,5,6] ); % Z1{1,1} is [2,3,4]
Z2 = cell(1,2);
Z2{1,1:2} = do_something( [1,2,3], [4,5,6] ); % Same error as above.
% NB: You really never want to have a cell expansion that is not surrounded
% by square brackets.
% Do this instead:
[Z2{1,1:2}] = do_something( [1,2,3], [4,5,6] ); % Z2{1,1} is [2,3,4], Z2{1,2} is [6,7,8]
This can also be done programmatically, with some limits. Let's say we're given a function func that takes one input and returns a constant (but unknown) number of outputs. We have a cell array inp that contains the inputs we want to process, and we want to collect the results in a cell array outp:
N = numel(inp);
M = nargout(@func); % number of outputs produced by func
outp = cell(N,M);
for i=1:N
    [ outp{i,:} ] = func( inp{i} );
end
This approach has a few caveats:
It captures all of the outputs. This is not always what you want.
Capturing all of the outputs can often change the behavior of the function. For example, the find function returns linear indices if only one output is used, row/column indices if two outputs are used, and row/column/value if three outputs are used.
It won't work for functions that have a variable number of outputs. These functions are defined as function [a,b,...,varargout] = func( ... ). nargout will return a negative number if the function has varargout declared in its output list, because there's no way for Matlab to know how many outputs will be produced.
Unpacking array and cell outputs into a cell
All true so far, but: what I am hoping for is a generic solution. I can't use num2cell if the function produces cell outputs. So what worked for strcmp will fail for strcat and vice versa. Let's assume for now that, for a function which returns multiple outputs, only the first one need be captured in zout – Carl Witthoft
To provide a uniform output syntax for all functions that return either a cell or an array, use an adapter function. Here is an example that handles numeric arrays and cells:
function [cellOut] = cellify(input)
    if iscell(input)
        cellOut = input;
    elseif isnumeric(input)
        cellOut = num2cell(input);
    else
        error('cellify currently does not support structs or objects');
    end
end
To unpack the output into a 2-D cell array, the size of each output must be constant. Assuming M outputs:
N = numel(inp);
% M is known and constant
outp = cell(N,M);
for i=1:N
    outp(i,:) = cellify( func( inp{i} ) ); % NB: parentheses instead of curlies on LHS
end
The output can then be addressed as outp{i,j}. An alternate approach allows the size of the output to vary:
N = numel(inp);
% M is not necessary here
outp = cell(N,1);
for i=1:N
    outp{i} = cellify( func( inp{i} ) ); % NB: back to curlies on LHS
end
The output can then be addressed as outp{i}{j}, and the size of the output can vary.
A few things to keep in mind:
Matlab cells are basically inefficient pointers. The JIT compiler does not always optimize them as well as numeric arrays.
Splitting numeric arrays into cells can cost quite a bit of memory. Each split value is actually a numeric array, which has size and type information associated with it. In numeric array form, this occurs once for each array. When the array is split, this incurs once for each element.
Use curly braces instead when assigning a value.
Using
zout{1,:} = strcmp(x,y);
instead should work.

Most efficient way to search a sorted matrix?

I have an assignment to write an algorithm (not in any particular language, just pseudo-code) that receives a matrix [size: M x N] that is sorted in a way that all of it's rows are sorted and all of it's columns are sorted individually, and finds a certain value within this matrix. I need to write the most time-efficient algorithm I can think of.
The matrix looks something like:
1 3 5
4 6 8
7 9 10
My idea is to start at the first row and last column and simply check the value: if the target is bigger, go down; if it's smaller, go left; and keep doing so until the value is found or the indexes go out of bounds (in case the value does not exist). This algorithm works in linear time, O(m+n). I've been told that it's possible to do it with logarithmic complexity. Is it possible? And if so, how?
Your matrix looks like this:
a ..... b ..... c
. . . . .
. 1 . 2 .
. . . . .
d ..... e ..... f
. . . . .
. 3 . 4 .
. . . . .
g ..... h ..... i
and has following properties:
a,c,g < i
a,b,d < e
b,c,e < f
d,e,g < h
e,f,h < i
So the value in the lower-right corner (e.g. i) is always the biggest in the whole matrix,
and this property is recursive if you divide the matrix into 4 equal pieces.
So we could try to use binary search:
probe for value,
divide into pieces,
choose correct piece (somehow),
goto 1 with new piece.
Hence algorithm could look like this:
input: X - value to be searched
until found
divide matrix into 4 equal pieces
get e,f,h,i as shown on picture
if (e or f or h or i) equals X then
return found
if X < e then quarter := 1
if X < f then quarter := 2
if X < h then quarter := 3
if X < i then quarter := 4
if no quarter assigned then
return not_found
make smaller matrix from chosen quarter
This looks to me like O(log n), where n is the number of elements in the matrix. It is a kind of binary search but in two dimensions. I cannot prove it formally, but it resembles a typical binary search.
and that's how the sample input looks? Sorted by diagonals? That's an interesting sort, to be sure.
Since the following row may have a value that's lower than any value on this row, you can't assume anything in particular about a given row of data.
I would (if asked to do this over a large input) read the matrix into a list of structs that holds each value in one part of a tuple and its (m,n) coordinate in the other, quicksort the list once by value, and then find entries by value.
Alternately, if the value of each individual location is unique, toss the MxN data into a dictionary keyed on the value, then jump to the dictionary entry of the MxN based on the key of the input (or the hash of the key of the input).
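The dictionary idea can be sketched like this (the function name is illustrative):

```javascript
// One-time O(M*N) pass builds a value -> [row, col] map; every later
// lookup is then O(1). Only worthwhile if values are unique and the
// matrix will be queried many times.
function buildIndex(matrix) {
  const index = new Map();
  matrix.forEach((row, i) =>
    row.forEach((value, j) => index.set(value, [i, j]))
  );
  return index;
}
```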
EDIT:
Notice that the answer I give above is valid if you're going to look through the matrix more than once. If you only need to parse it once, then this is as fast as you can do it:
for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
        if (mat[i][j] == value) return tuple(i, j);
Apparently my comment on the question should go down here too :|
@sagar, but that's not the example given by the professor; otherwise he had the fastest method above (check the end of the row first, then proceed). Additionally, checking the end of the middle row first would be faster, a bit of a binary search.
Checking the end of each row (and starting on the end of the middle row) to find a number higher than the checked for number on an in memory array would be fastest, then doing a binary search on each matching row till you find it.
in log M you can get a range of rows able to contain the target (binary search on the first value of rows, binary search on last value of rows, keep only those rows whose first <= target and last >= target) two binary searches is still O(log M)
then in O(log N) you can explore each of these rows, with again, a binary search!
that makes it O(logM x logN)
tadaaaa
public static boolean find(int a[][], int rows, int cols, int x) {
    int m = 0;
    int n = cols - 1;
    while (m < rows && n >= 0) {
        if (a[m][n] == x)
            return true;
        else if (a[m][n] > x)
            n--;
        else
            m++;
    }
    return false;
}
What about getting the diagonal out, then binary searching over the diagonal? Start at the bottom right and check whether the target is above; if yes, take the diagonal array position as the column it is in; if not, check whether it is below. Once you have a hit on the diagonal, run a binary search on that column (using the array position on the diagonal as the column index). I think this is what was stated by @user942640.
You could measure the running time of the above and, when required (at some point), swap the algorithm to do a binary search on the initial diagonal array (taking into consideration its n * n elements; getting the x or y length is O(1) since x.length = y.length). Even on a million * million matrix, binary search the diagonal: if the target is less, step back up the diagonal; if it is not less, binary search back towards where you were (a slight change to the algorithm when doing a binary search along the diagonal). I think the diagonal is better than the binary search down the rows; I'm just too tired at the moment to look at the maths :)
By the way, I believe running time is slightly different from analysis, which you would describe in terms of best/worst/average case, time against memory size, etc. So the question would be better stated as "what is the best running time in worst-case analysis", because in the best case you could do a brute linear scan and the item could be in the first position, and that would be a better "running time" than binary search...
Here is a lower bound of n. Start with an unsorted array A of length n. Construct a new matrix M according to the following rule: the secondary diagonal contains the array A, everything above it is minus infinity, everything below it is plus infinity. The rows and columns are sorted, and looking for an entry in M is the same as looking for an entry in A.
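A concrete version of this construction, as a sketch (the helper name is illustrative):

```javascript
// Embed an unsorted array A on the anti-diagonal of an n x n matrix:
// everything above the anti-diagonal (i + j < n - 1) is -Infinity,
// everything below it is +Infinity. Every row and column is sorted,
// yet searching the matrix still means examining the arbitrary values
// of A, so no search can beat Omega(n) in general.
function embed(A) {
  const n = A.length;
  return Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => {
      if (i + j === n - 1) return A[i];
      return i + j < n - 1 ? -Infinity : Infinity;
    })
  );
}
```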
This is in the vein of Michal's answer (from which I will steal the nice graphic).
Matrix:
min ..... b ..... c
. . .
. II . I .
. . .
d .... mid .... f
. . .
. III . IV .
. . .
g ..... h ..... max
Min and max are the smallest and largest values, respectively. "mid" is not necessarily the average/median/whatever value.
We know that the value at mid is >= all values in quadrant II, and <= all values in quadrant IV. We cannot make such claims for quadrants I and III. If we recurse, we can eliminate one quadrant at each level.
Thus, if the target value is less than mid, we must search quadrants I, II, and III. If the target value is greater than mid, we must search quadrants I, III, and IV.
The space reduces to 3/4 its previous at each step:
n * (3/4)^x = 1
n = (4/3)^x
x = log_{4/3}(n)
Logarithms differ by a constant factor, so this is O(log(n)).
find(min, max, target)
if min is max
if target == min
return min
else
return not found
else if target < min or target > max
return not found
else
set mid to average of min and max
if target == mid
return mid
else
find(b, f, target), return if found
find(d, h, target), return if found
if target < mid
return find(min, mid, target)
else
return find(mid, max, target)
JavaScript solution:
//start from the top right corner
//if value = el, element is found
//if value < el, move to the next row, element can't be in that row since row is sorted
//if value > el, move to the previous column, element can't be in that column since column is sorted
function find(matrix, el) {
  // some error checking
  if (!matrix[0] || !matrix[0].length) {
    return false;
  }
  if (!el || isNaN(el)) {
    return false;
  }
  var row = 0; // first row
  var col = matrix[0].length - 1; // last column
  while (row < matrix.length && col >= 0) {
    if (matrix[row][col] === el) { // element is found
      return true;
    } else if (matrix[row][col] < el) {
      row++; // move to the next row
    } else {
      col--; // move to the previous column
    }
  }
  return false;
}
This is a wrong answer.
I am really not sure whether any of the answers are optimal, but here is my attempt.
Binary search the first row and the first column to find the row and column where x could be; you will get (0, j) and (i, 0). If x was not found in this step, it will be in row i or column j.
Binary search row i and column j from step 1.
I think the time complexity is 2 * (log m + log n).
You can reduce the constant, if the input array is square (n * n), by binary searching along the diagonal.