What data structures can efficiently store 2-d "grid" data? - language-agnostic

I am trying to write an application that performs operations on a grid of numbers, where each time a function runs the value of each cell is changed, and the value of each cell is dependent on its neighbours. The value of each cell would be a simple integer.
What would be the best way of storing my data here? I've considered both a flat list/array structure, which seems inefficient because I'd have to repeatedly calculate which cell is 'above' the current one (given an arbitrary grid size), and nested lists, which don't seem to be a very good way of representing the data.
I can't help but feel there must be a better way of representing this data in memory for this sort of purpose. Any ideas?
(Note: I don't think this is really a subjective question, though Stack Overflow seems to think it is. I'm hoping there's an accepted way this sort of data is stored.)

Here are a few approaches. I'll (try to) illustrate these examples with a representation of a 3x3 grid.
The flat array
+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+---+---+---+---+---+---+---+---+---+
a[row*width + column]
To access elements on the left or right, subtract or add 1 (take care at the row boundaries). To access elements above or below, subtract or add the row size (in this case 3).
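To make the index arithmetic concrete, here is a minimal sketch in Python (the 3x3 grid size and names are illustrative):

```python
# Flat row-major array with neighbour lookup; boundary checks prevent
# wrapping off the edge of a row or running past the first/last row.
WIDTH, HEIGHT = 3, 3
grid = list(range(WIDTH * HEIGHT))  # [0, 1, ..., 8]

def neighbors(i):
    """Return the values above, below, left and right of index i,
    skipping any that fall outside the grid."""
    row, col = divmod(i, WIDTH)
    out = []
    if row > 0:          out.append(grid[i - WIDTH])  # above
    if row < HEIGHT - 1: out.append(grid[i + WIDTH])  # below
    if col > 0:          out.append(grid[i - 1])      # left
    if col < WIDTH - 1:  out.append(grid[i + 1])      # right
    return out
```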
The two dimensional array (for languages such as C or FORTRAN that support this)
+-----+-----+-----+
| 0,0 | 0,1 | 0,2 |
+-----+-----+-----+
| 1,0 | 1,1 | 1,2 |
+-----+-----+-----+
| 2,0 | 2,1 | 2,2 |
+-----+-----+-----+
a[row,column]
a[row][column]
Accessing adjacent elements is just incrementing or decrementing either the row or column number. The compiler is still doing exactly the same arithmetic as in the flat array.
The array of arrays (eg. Java)
+---+ +---+---+---+
| 0 |-->| 0 | 1 | 2 |
+---+ +---+---+---+
| 1 |-->| 0 | 1 | 2 |
+---+ +---+---+---+
| 2 |-->| 0 | 1 | 2 |
+---+ +---+---+---+
a[row][column]
In this method, a list of "row pointers" (represented on the left) refers to the rows, each of which is a separate, independently allocated array. As with the 2-d array, adjacent elements are accessed by adjusting the appropriate index.
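A short Python sketch of this layout, using lists of lists (names illustrative):

```python
# Array-of-arrays layout: an outer list of independently allocated rows.
# The grid size is arbitrary at construction time.
def make_grid(width, height, fill=0):
    return [[fill] * width for _ in range(height)]

g = make_grid(3, 3)
g[1][1] = 5          # row 1, column 1
above = g[0][1]      # the cell directly above: decrement the row index
```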
Fully linked cells (2-d doubly linked list)
+---+ +---+ +---+
| 0 |-->| 1 |-->| 2 |
| |<--| |<--| |
+---+ +---+ +---+
^ | ^ | ^ |
| v | v | v
+---+ +---+ +---+
| 3 |-->| 4 |-->| 5 |
| |<--| |<--| |
+---+ +---+ +---+
^ | ^ | ^ |
| v | v | v
+---+ +---+ +---+
| 6 |-->| 7 |-->| 8 |
| |<--| |<--| |
+---+ +---+ +---+
This method has each cell containing up to four pointers to its adjacent elements. Access to adjacent elements is through the appropriate pointer. You will still need to keep a structure of pointers into the grid (probably built using one of the above methods) to avoid having to step through each linked list sequentially. This method is a bit unwieldy; however, it has an important application in Knuth's Dancing Links algorithm, where the links are modified during execution of the algorithm to skip over "blank" space in the grid.
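A rough Python sketch of fully linked cells, assuming each cell stores four neighbour pointers that are None at the edges:

```python
# Each cell points directly at its four neighbours; walking the grid is
# pointer-chasing rather than index arithmetic.
class Cell:
    def __init__(self, value):
        self.value = value
        self.up = self.down = self.left = self.right = None

def link_grid(width, height):
    """Build a width x height grid of linked cells, valued 0..n-1."""
    cells = [[Cell(r * width + c) for c in range(width)]
             for r in range(height)]
    for r in range(height):
        for c in range(width):
            cell = cells[r][c]
            if r > 0:          cell.up    = cells[r - 1][c]
            if r < height - 1: cell.down  = cells[r + 1][c]
            if c > 0:          cell.left  = cells[r][c - 1]
            if c < width - 1:  cell.right = cells[r][c + 1]
    return cells

cells = link_grid(3, 3)
centre = cells[1][1]   # the cell holding value 4
```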

If lookup time is important to you, then a 2-dimensional array might be your best choice since looking up a cell's neighbours is a constant time operation given the (x,y) coordinates of the cell.

Further to my comment, you may find the Hashlife algorithm interesting.
Essentially (if I understand it correctly), you store your data in a quad-tree with a hash table pointing to nodes of the tree. The idea here is that the same pattern may occur more than once in your grid, and each copy will hash to the same value, thus you only have to compute it once.
This is true for Life, which is a grid of mostly-false booleans. Whether it's true for your problem, I don't know.

A dynamically allocated array of arrays makes it trivial to point to the cell above the current cell, and supports arbitrary grid sizes as well.

You should abstract away from how you store your data.
If you need to do relative operations inside the array, a Slice is the common pattern for it.
You could have something like this:
using System.Diagnostics; // for Debug.Assert

public interface IArray2D<T>
{
    T this[int x, int y] { get; }
}

public class Array2D<T> : IArray2D<T>
{
    readonly T[] _values;
    public readonly int Width;
    public readonly int Height;

    public Array2D(int width, int height)
    {
        Width = width;
        Height = height;
        _values = new T[width * height];
    }

    public T this[int x, int y]
    {
        get
        {
            Debug.Assert(x >= 0);
            Debug.Assert(x < Width);
            Debug.Assert(y >= 0);
            Debug.Assert(y < Height);
            return _values[y * Width + x];
        }
    }

    public Slice<T> Slice(int x0, int y0)
    {
        return new Slice<T>(this, x0, y0);
    }
}

public class Slice<T> : IArray2D<T>
{
    readonly IArray2D<T> _underlying;
    readonly int _x0;
    readonly int _y0;

    public Slice(IArray2D<T> underlying, int x0, int y0)
    {
        _underlying = underlying;
        _x0 = x0;
        _y0 = y0;
    }

    public T this[int x, int y]
    {
        get { return _underlying[_x0 + x, _y0 + y]; }
    }
}
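For readers not using C#, here is a rough Python translation of the same abstraction; the names mirror the C# sketch and are otherwise arbitrary:

```python
# A flat backing list behind an (x, y) indexer, plus a Slice that
# shifts the origin so callers can work in relative coordinates.
class Array2D:
    def __init__(self, width, height):
        self.width, self.height = width, height
        self._values = [0] * (width * height)

    def __getitem__(self, xy):
        x, y = xy
        assert 0 <= x < self.width and 0 <= y < self.height
        return self._values[y * self.width + x]

    def __setitem__(self, xy, value):
        x, y = xy
        self._values[y * self.width + x] = value

    def slice(self, x0, y0):
        return Slice(self, x0, y0)

class Slice:
    def __init__(self, underlying, x0, y0):
        self._u, self._x0, self._y0 = underlying, x0, y0

    def __getitem__(self, xy):
        x, y = xy
        return self._u[self._x0 + x, self._y0 + y]

a = Array2D(4, 4)
a[2, 3] = 9
s = a.slice(1, 1)    # s[1, 2] now addresses the same cell as a[2, 3]
```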

Related

CUDD: Manipulation of BDDs

I'm working with CUDD C++ interface (https://github.com/ivmai/cudd) but there is almost no information about this library. I would like to know how to remove one variable according to its value.
For example, I have now the next table stored in a bdd:
|-----|-----|-----|
| x1 | x2 | y |
|-----|-----|-----|
| 0 | 0 | 1 |
|-----|-----|-----|
| 0 | 1 | 1 |
|-----|-----|-----|
| 1 | 0 | 1 |
|-----|-----|-----|
| 1 | 1 | 0 |
|-----|-----|-----|
And I want to split the previous table in two separate bdds according to the value of x2 and remove that node afterwards:
If x2 = 0:
|-----|-----|
| x1 | y |
|-----|-----|
| 0 | 1 |
|-----|-----|
| 1 | 1 |
|-----|-----|
If x2 = 1:
|-----|-----|
| x1 | y |
|-----|-----|
| 0 | 1 |
|-----|-----|
| 1 | 0 |
|-----|-----|
Is it possible?
The reason that there is almost no documentation on the C++ interface of the CUDD library is that it is just a wrapper for the C functions, for which there is plenty of documentation.
The C++ wrapper is mainly useful for getting rid of all the Cudd_Ref(...) and Cudd_RecursiveDeref(...) calls that code using the C interface would need to do. Note that you can use the C interface from C++ code as well, if you want.
To do what you want to do, you have to combine the Boolean operators offered by CUDD in a way that you obtain a new Boolean function with the desired properties.
The first step is to restrict s (the BDD for your table) to the x=0 and x=1 cases, where x is the variable you want to remove (x2 in your example):
BDD s0 = s & !x;
BDD s1 = s & x;
As you noticed, the new BDDs are not (yet) oblivious to the value of the x variable. You want them to be "don't care" w.r.t. the value of x. Since you already know that x is restricted to one particular value in s0 and s1, you can use the existential abstraction operator:
s0 = s0.ExistAbstract(x);
s1 = s1.ExistAbstract(x);
Note that x is used here as a so-called cube (see below).
These are now the BDDs that you want.
Cube explanation: If you abstract from multiple variables at the same time, you should first compute such a cube from all the variables to be abstracted from. A cube is mainly used for representing a set of variables. From mathematical logic, it is known that if you existentially or universally abstract away multiple variables, the order in which these variables are abstracted away does not matter. Since the recursive BDD operations in CUDD are implemented over pairs (or triples) of BDDs whenever possible, CUDD internally represents a set of variables as a cube as well, so that an existential abstraction operation can work on (1) the BDD for which existential abstraction is to be performed, and (2) the BDD representing the set of variables to be abstracted from. The internal representation of a cube as a BDD should not be of relevance to a developer just using CUDD (rather than extending CUDD), except that a BDD representing a single variable can also be used as a cube.
An approach using the Cython bindings to CUDD of the Python package dd is the following, which uses substitution of constant values for variable x2.
import dd.cudd as _bdd
bdd = _bdd.BDD()
bdd.declare('x1', 'x2')
# negated conjunction of the variables x1 and x2
u = bdd.add_expr(r'~ (x1 /\ x2)')
let = dict(x2=False)
v = bdd.let(let, u)
assert v == bdd.true, v
let = dict(x2=True)
w = bdd.let(let, u)
w_ = bdd.add_expr('~ x1')
assert w == w_, (w, w_)
The same code runs in pure Python by changing the import statement to import dd.autoref as _bdd. The pure Python version of dd can be installed with pip install dd. Installation of dd with the module dd.cudd is described here.

What's the best way to retrieve array data from MySQL

I'm storing a object / data structure like this inside a MySql (actually a MariaDb) database:
{
idx: 7,
a: "content A",
b: "content B",
c: ["entry c1", "entry c2", "entry c3"]
}
And to store it I'm using 2 tables, very similar to the method described in this answer: https://stackoverflow.com/a/17371729/3958875
i.e.
Table 1:
+-----+---+---+
| idx | a | b |
+-----+---+---+
Table 2:
+------------+-------+
| owning_obj | entry |
+------------+-------+
And then made a view that joins them together, so I get this:
+-----+------------+------------+-----------+
| idx | a | b | c |
+-----+------------+------------+-----------+
| 7 | content A1 | content B1 | entry c11 |
| 7 | content A1 | content B1 | entry c21 |
| 7 | content A1 | content B1 | entry c31 |
| 8 | content A2 | content B2 | entry c12 |
| 8 | content A2 | content B2 | entry c22 |
| 8 | content A2 | content B2 | entry c32 |
+-----+------------+------------+-----------+
My question is what is the best way I can get it back to my object form? (e.g. I want an array of the object type specified above of all entries with idx between 5 and 20)
There are 2 ways I can think of, but both seem to be not very efficient.
Firstly, we can send the whole table back to the server and have it build a hashmap keyed on the primary key (or some other unique index), collecting up the different c columns and rebuilding the objects that way. But that means sending a lot of duplicate data, and it takes more memory and processing time to rebuild on the server. This method also won't scale pleasantly if we have multiple arrays, or arrays within arrays.
The second method would be to do multiple queries: filter Table 1 to get back the list of idx's you want, and then for each idx send a query against Table 2 where owning_obj = current idx. This means sending a whole lot more queries.
Neither of these options seems very good, so I'm wondering if there is a better way. Currently I'm thinking it can be something that utilizes JSON_OBJECT(), but I'm not sure how.
This seems like a common situation, but I can't seem to find the exact wording to search for to get the answer.
PS: The server interfacing with MySql/MariaDb is written in Rust, don't think this is relevant in this question though
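For reference, the first approach amounts to grouping the joined rows by idx; a sketch in Python (the same logic would apply in Rust):

```python
# Rebuild objects from the flattened view rows: one dict per idx,
# with repeated (a, b) values kept once and the c values collected.
rows = [
    (7, "content A1", "content B1", "entry c11"),
    (7, "content A1", "content B1", "entry c21"),
    (8, "content A2", "content B2", "entry c12"),
]

objects = {}
for idx, a, b, c in rows:
    obj = objects.setdefault(idx, {"idx": idx, "a": a, "b": b, "c": []})
    obj["c"].append(c)
```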
You can use GROUP_CONCAT to combine all the c values into a comma-separated string.
SELECT t1.idx, t1.a, t1.b, GROUP_CONCAT(t2.entry) AS c
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.idx = t2.owning_obj
GROUP BY t1.idx, t1.a, t1.b
Then explode the string in PHP:
$result_array = [];
while ($row = $result->fetch_assoc()) {
$row['c'] = explode(',', $row['c']);
$result_array[] = $row;
}
However, if the entries can be long, make sure you increase group_concat_max_len.
If you're using MySQL 8.0 you can also use JSON_ARRAYAGG(). This will create a JSON array of the entry values, which you can convert to a PHP array using json_decode(). This is a little safer, since GROUP_CONCAT() will mess up if any of the values contain a comma. You can change the separator, but you need a separator that will never appear in any value. Unfortunately, this isn't in MariaDB.

postgresql, select multiple json_array_elements works so werid

I want to use json_array_elements to expand a JSON array, but it works weirdly. Please see below.
select json_array_elements('[1, 2]') as a, json_array_elements('[2, 3, 4]') as b;
a | b
---+---
1 | 2
2 | 3
1 | 4
2 | 2
1 | 3
2 | 4
(6 rows)
select json_array_elements('[1, 2]') as a, json_array_elements('[2, 3]') as b;
a | b
---+---
1 | 2
2 | 3
(2 rows)
It seems that when the lengths of the arrays are equal, something goes wrong.
Can anyone tell me why this happens?
PostgreSQL repeats each list until both happen to be at the end simultaneously.
In other words, the length of the result list is the least common multiple of the length of the input lists.
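The row counts in the question match this rule; a quick check in Python:

```python
# Row count of the pre-v10 behaviour: the least common multiple of the
# input array lengths.
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

rows_first  = lcm(2, 3)  # arrays of length 2 and 3 -> 6 rows
rows_second = lcm(2, 2)  # arrays of length 2 and 2 -> 2 rows
```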
This behaviour is indeed weird, and will be changed in PostgreSQL v10:
select json_array_elements('[1, 2]') as a, json_array_elements('[2, 3, 4]') as b;
a | b
---+---
1 | 2
2 | 3
| 4
(3 rows)
From the commit message:
While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e. continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones. We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.

MYSQL Query to replace a section of a string

I have a ton of positions saved in a database in the following format
Examples:
[32.306,[7195.4,9414.24,0.005]]
[219.184,[7197.41,9416.66,-0.003]]
[161.215,[1170.26,4852.79,3.815e-04]]
[37.338,[479.163,3757.15,-0.005]]
[11.719,[12436.5,4780.36,-9.46e-04]]
The coordinates are in the format [DIRECTION,[X,Y,Z]]
I would like to replace all of the Z coordinates with 0. Been struggling with finding the correct way of doing this in an SQL query.
Query:
SQLFIDDLEExample
CONCAT(SUBSTRING_INDEX(col, ',', 3), ',0]]') AS col
Result:
| COL |
|-------------------------------|
| [32.306,[7195.4,9414.24,0]] |
| [219.184,[7197.41,9416.66,0]] |
| [161.215,[1170.26,4852.79,0]] |
| [37.338,[479.163,3757.15,0]] |
| [11.719,[12436.5,4780.36,0]] |
I would chop up your string field using SUBSTRING, cast the values to floats, and store them in four separate fields (direction, x, y and z).
Then you can easily update the Z value (or any other value).
When you need the more complex string representation, just concatenate the four fields, casting back to varchar.
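The SUBSTRING_INDEX/CONCAT query above amounts to this string operation, sketched here in Python for clarity:

```python
# Keep everything up to (but not including) the third comma, then
# append ",0]]" to zero out the Z coordinate.
def zero_z(coord):
    return ",".join(coord.split(",")[:3]) + ",0]]"
```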

Record field overwriting other fields

I'm writing an SDL/input library for my game in Free Pascal, and I'm facing an issue.
I've got a variant record that, when I access an element of it, changes the other elements.
The record type is:
tInput = Record
case Device: TInputDevice of
ID_KeyOnce, ID_KeyCont: (Key: TSDLKey);
ID_MouseButton: (MouseButton: Byte);
ID_MouseAxis, ID_JoyAxis,
ID_JoyBall, ID_JoyHat: (Axis: Byte);
ID_JoyButton, ID_JoyButtonOnce, ID_JoyAxis,
ID_JoyHat, ID_JoyBall: (Which: Byte);
ID_JoyButton, ID_JoyButtonOnce: (Button: Byte);
end;
The code that crashes it is:
with Input do begin
Device := ID_JoyAxis;
Which := 0;
Axis := 1;
end;
When Axis is set to one, all of the other variables in the record go to one too!
Is this a known bug? Or some functionality I'm not aware of? Or something I've screwed up?
This is called a union, and it is the intended behavior of this type of record declaration.
case Device : TInputDevice of
... is the "magic" here.
In a union the storage of members is "shared".
Edit: taking the record you have in terms of byte offsets (... under the assumption that sizeof(TSDLKey) = 4):
------------------------------------------------
00 | Key | MouseButton | Axis | Which | Button |
---| |-------------|------|-------|--------|
01 | | | | | |
---| |-------------|------|-------|--------|
02 | | | | | |
---| |-------------|------|-------|--------|
03 | | | | | |
------------------------------------------------
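The shared storage can be demonstrated with a ctypes.Union in Python (illustrative only; the Pascal variant record behaves the same way):

```python
# All union members start at byte offset 0, so writing one member
# overwrites the others. The field names echo the Pascal record; the
# c_uint32 stands in for TSDLKey under the sizeof = 4 assumption.
import ctypes

class InputUnion(ctypes.Union):
    _fields_ = [
        ("key", ctypes.c_uint32),
        ("axis", ctypes.c_uint8),
        ("which", ctypes.c_uint8),
    ]

u = InputUnion()
u.which = 0
u.axis = 1   # 'which' now also reads 1: both occupy the same byte
```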
By the rules I know, TInputDevice should be an enum type; otherwise you'd have to explicitly give Integer there:
type xyz = record
case integer of
0: (foo: Byte);
1: (bar: Integer);
end;
NB: it is customary for variant types to have one member describe which of the union members is currently picked and valid (a tagged union).