How to separate/partition polygons into existing regions?

I'm facing a problem regarding "partitioning"/subsetting polygons into regions (bigger polygons) so that each region contains a disjoint set of meaningful elements.
For example, we have the following regions/polygons. At a given time, we know only the form of one region (let's say R1 for now).
It is clear that L3 would belong to R1.
How about L1, L2 and P1?
I thought about creating bounding boxes around them and checking whether the south-east coordinate (minX and minY) belongs to R1.
In this way L1 would belong to R2, even though it doesn't even cross R2.
Do you have any concrete idea what I should look into for this sort of algorithm, or how to solve this space-separation problem?

If your regions and polygons are all described as polygons (discrete sequences of vertices), you can resort to the available polygon clipping techniques.
In particular, have a look at the Sutherland–Hodgman and Weiler–Atherton techniques.
Some optimization is possible if preprocessing of the windows is allowed (when there are many subject polygons for the same windows), using scanline techniques. This is a little more sophisticated.
The case of line segment entities is a little easier.
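For a concrete feel of the first technique, here is a minimal Sutherland–Hodgman sketch in Python (illustrative only: it assumes the region/clip window is convex and given counter-clockwise, and the example coordinates are made up). Weiler–Atherton handles concave windows and holes, at the cost of a more involved traversal.

# Minimal Sutherland-Hodgman polygon clipping sketch (illustrative only).
# Assumes the clip region is a convex polygon given counter-clockwise.

def clip_polygon(subject, clip):
    """Return the part of `subject` that lies inside the convex `clip` polygon."""
    def inside(p, a, b):
        # True if point p is on the left side of the directed edge a->b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p1, p2, a, b):
        # Intersection of segment p1-p2 with the infinite line through a-b.
        x1, y1 = p1; x2, y2 = p2; x3, y3 = a; x4, y4 = b
        denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_list, output = output, []
        for j in range(len(input_list)):
            cur, prev = input_list[j], input_list[j - 1]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
    return output

# Example: clip a triangle against a unit-square region.
region = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(clip_polygon([(-0.5, 0.5), (0.5, -0.5), (1.5, 1.5)], region))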

Related

Area of overlap of two circular Gaussian functions

I am trying to write code that finds the overlap of between 3D shapes.
Each shape is defined by two intersecting normal distributions (one in the x direction, one in the y direction).
Do you have any suggestions of existing code that addresses this question or functions that I can utilize to build this code? Most of my programming experience has been in R, but I am open to solutions in other languages as well.
Thank you in advance for any suggestions and assistance!
The longer research context on this question: I am studying the use of acoustic space by insects. I want to know whether randomly assembled groups of insects would have calls that are more or less similar than we observe in natural communities (a randomization test). To do so, I need to randomly select insect species and calculate the similarity between their calls.
For each species, I have a mean and variance for two call characteristics that are approximately normally distributed. I would like to use these two call characteristics to build a 3D probability distribution for the species. I would then like to calculate the amount by which the PDF for one species overlaps with another.
Please accept my apologies if the question is not clear or appropriate for this forum.
I work in small molecule drug discovery, and I frequently use a program (ROCS, by OpenEye Scientific Software) based on algorithms that represent molecules as collections of spherical Gaussian functions and compute intersection volumes. You might look at the following references, as well as the ROCS documentation:
(1) Grant and Pickup, J. Phys. Chem. 1995, 99, 3503-3510
(2) Grant, Gallardo, and Pickup, J. Comp. Chem. 1996, 17, 1653-1666
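Not the Gaussian-overlap algorithm from the ROCS papers, but for the poster's bivariate case a brute-force numerical sketch may be enough. The following snippet is in Python rather than R for illustration; it integrates min(pdf1, pdf2) over a grid, and the means, variances and grid resolution are made-up placeholders:

# Numerical sketch (not the ROCS approach): overlap between two bivariate
# normal PDFs, measured as the integral of min(pdf1, pdf2) over a grid.
import numpy as np
from scipy.stats import multivariate_normal

def pdf_overlap(mean1, var1, mean2, var2, n=400):
    # Independent x/y call characteristics -> diagonal covariance matrices.
    d1 = multivariate_normal(mean1, np.diag(var1))
    d2 = multivariate_normal(mean2, np.diag(var2))
    # Grid wide enough to cover both distributions (about 5 SDs each way).
    lo = np.minimum(np.array(mean1) - 5 * np.sqrt(var1),
                    np.array(mean2) - 5 * np.sqrt(var2))
    hi = np.maximum(np.array(mean1) + 5 * np.sqrt(var1),
                    np.array(mean2) + 5 * np.sqrt(var2))
    x = np.linspace(lo[0], hi[0], n)
    y = np.linspace(lo[1], hi[1], n)
    xx, yy = np.meshgrid(x, y)
    pts = np.dstack([xx, yy])
    overlap = np.minimum(d1.pdf(pts), d2.pdf(pts))
    # Riemann-sum approximation of the overlap integral (1.0 = identical PDFs).
    return overlap.sum() * (x[1] - x[0]) * (y[1] - y[0])

# Made-up example values for two species' call characteristics.
print(pdf_overlap(mean1=[4.0, 2.0], var1=[1.0, 0.5],
                  mean2=[5.0, 2.5], var2=[1.5, 0.5]))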

Efficient representation for growing circles in 2D space?

Imagine there's a 2D space, and in this space there are circles that grow at different constant rates. What's an efficient data structure for storing these circles, such that I can query "Which circles intersect point p at time t?"
EDIT: I do realize that I could store the initial state of the circles in a spatial data structure and do a query where I intersect a circle at point p with a radius of fastest_growth * t, but this isn't efficient when there are a few circles that grow extremely quickly whereas most grow slowly.
Additional edit: I could further augment the above approach by splitting up the circles and grouping them by their growth rate, then applying the above approach to each group, but this requires a bounded time to be efficient.
Represent the circles as cones in 3d, where the third dimension is time. Then use a BSP tree to partition them the best you can.
In general, I think the worst-case for testing for intersection is always O(n), where n is the number of circles. Most spatial data structures work by partitioning the space cleverly so that a fraction of the objects (hopefully close to half) are in each half. However, if the objects overlap then the partitioning cannot be perfect; there will always be cases where more than one object is in a partition. If you just think about the case of two circles overlapping, there is no way to draw a line such that one circle is entirely on one side and the other circle is entirely on the other side. Taken to the logical extreme, assuming arbitrary positioning of the circles and arbitrary radii, there is no way to partition them such that testing for intersection takes O(log(n)).
This doesn't mean that, in practice, you won't get a big advantage from using a tree, but the advantage you get will depend on the configuration of the circles and the distribution of the queries.
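For reference, the O(n) baseline that the tree-based answers are competing against is just a linear scan; a quick sketch, assuming each circle is stored as (center x, center y, initial radius, growth rate):

# O(n) baseline sketch: which growing circles contain point p at time t?
# Assumes each circle is (cx, cy, r0, growth) with radius r0 + growth * t.

def circles_containing(circles, px, py, t):
    hits = []
    for cx, cy, r0, growth in circles:
        r = r0 + growth * t
        if (px - cx) ** 2 + (py - cy) ** 2 <= r * r:
            hits.append((cx, cy, r0, growth))
    return hits

# Made-up example: two circles, query point (2, 0) at time t = 3.
print(circles_containing([(0, 0, 1, 0.5), (10, 10, 2, 0.1)], px=2, py=0, t=3))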
This is a simplified version of another problem I posted about a week ago:
How to find first intersection of a ray with moving circles
I still haven't had the time to describe the solution that was expected there, but I will try to outline it here (for this simpler case).
The approach to solving this problem is to use a kinetic KD-tree. If you are not familiar with KD-trees, it is better to first read about them. You also need to add time as an additional coordinate (you make the space 3D instead of 2D). I have not implemented this idea yet, but I believe this is the correct approach.
I'm sorry this is not completely thought through, but it seems like you might look into multiplicatively-weighted Voronoi Diagrams (MWVDs). It seems like an adversary could force you into computing one with a series of well-placed queries, so I have a feeling they provide a lower-bound to your problem.
Suppose you compute the MWVD on your input data. Then for a query, you would be returned the circle that is "closest" to your query point. You can then determine whether this circle actually contains the query point at the query time. If it doesn't, then you are done: no circle contains your point. If it does, then you should compute the MWVD without that generator and run the same query. You might be able to compute the new MWVD from the old one: the cell containing the generator that was removed must be filled in, and it seems (though I have not proved it) that the only generators that can fill it in are its neighbors.
Some sort of spatial index, such as a quadtree or BSP tree, will give you O(log(n)) access time.
For example, each node in the quadtree could contain a linked list of pointers to all those circles which intersect it.
How many circles, by the way? For small n, you may as well just iterate over them. If you constantly have to update your spatial index and jump all over cache lines, it may end up being faster to brute-force it.
How are the centres of your circles distributed? If they cover the plane fairly evenly you can discretise space and time, then do the following as a preprocessing step:
for (t = 0; t < max_t; t++)
    foreach circle c, with centre and radius (x, y, r) at time t
        for (int X = x - r; X < x + r; X++)
            for (int Y = y - r; Y < y + r; Y++)
                circles_at[X][Y][t].push_back(&c)
(assuming you discretise space and time along integer boundaries, scale and offset however you like of course, and you can add circles later on or amortise the cost by deferring calculation for distant values of t)
Then your query for point (x,y) at time (t) could do a brute-force linear check over circles_at[x][y][ceil(t)]
The trade-off is obvious, increasing the resolution of any of the three dimensions will increase preprocessing time but give you a smaller bucket in circles_at[x][y][t] to test.
People are going to make a lot of recommendations about types of spatial indices to use, but I would like to offer a bit of orthogonal advice.
I think you are best off building a few indices based on time, i.e. t_0 < t_1 < t_2 ...
If a point intersects a circle at t_i, it will also intersect it at t_{i+1}. If you know the point in advance, you can eliminate all circles that intersect the point at t_i for all computation at t_{i+1} and later.
If you don't know the point in advance, then you can keep these time-point trees (built based on how big each circle would be at a given time). At query time (e.g. t_query), find i such that t_{i-1} < t_query <= t_i. If you check all the possible circles at t_i, you will not have any false negatives.
This is sort of a hack for a data structure that is "time dynamics aware", but I don't know of any. If you have a threaded environment, then you only need to maintain one spatial index and be working on the next one in the background. It will cost you a lot of computation for the benefit of being able to respond to queries with low latency. This solution should be compared at the very least to the O(n) solution (go through each circle and check if dist(point, circle.center) < circle.radius).
Instead of considering the circles, you can test on their bounding boxes to filter out the ones which do not contain the point. If your bounding box sides are all sorted, this is essentially four binary searches.
The tricky part is reconstructing the sorted sides for any given time, t. To do that, you can start off with the original points: two lists for the left and right sides with the x coordinates, and two lists for top and bottom with the y coordinates. For any time greater than 0, all the left-side points will move to the left, etc. You only need to compare each location to the one next to it to obtain the time points at which an element and its neighbour are swapped. This should give you a list of time points at which to modify your ordered lists. If you now sort these modification records by time, then for any given starting time and ending time you can extract all the modification records between the two and apply them to your four lists in order. I haven't completely figured out the algorithm, but I think there will be edge cases where three or more successive elements cross over at exactly the same time point, so you may need to modify the algorithm to handle those as well. Perhaps a modification record that contains the position in the list and the number of elements to reorder would suffice.
I think it's possible to create a binary tree that solves this problem.
Each branch should contain a growing circle, a static circle for partitioning, and the latest time at which the partitioning circle cleanly partitions. Furthermore, the growing circle contained within a node should always have a faster growth rate than either of its child nodes' growing circles.
To do a query, take the root node. First check its growing circle: if it contains the query point at the query time, add it to the answer set. Then, if the time you're querying is greater than the time at which the partition is broken, query both children; otherwise, if the point falls within the partitioning circle, query the left node, else query the right node.
I haven't quite completed the details of performing insertions, (the difficult part is updating the partition circle so that the number of nodes on the inside and outside is approximately equal and the time when the partition is broken is maximized).
To combat the case of a few circles that grow quickly, you could sort the circles in descending order by rate of growth and check each of the k fastest growers. To find the proper k given t, I think you can perform a binary search to find the index k such that k*m = (t * growth rate of k)^2, where m is a constant factor you'll need to find by experimentation. This will balance the part that grows linearly with k against the part that falls quadratically with the growth rate.
If you, as already suggested, represent growing circles by vertical cones in 3D, then you can partition the space as a regular (maybe hexagonal) grid of packed vertical cylinders. For each cylinder, calculate the minimal and maximal heights (times) of intersection with every cone. If a circle's center (the vertex of its cone) lies inside the cylinder, then the minimal time is zero. Then sort the cones by minimal intersection time. As a result of such indexing, for each cylinder you'll have an ordered sequence of records with three values: minimal time, maximal time and circle number.
When you check some point in 3D space, you take the cylinder it belongs to and iterate over its sequence until the stored minimal time exceeds the time of the given point. All cones obtained so far whose maximal time is also less than the given time are guaranteed to contain the given point. Only the cones where the given time lies between the minimal and maximal intersection times need to be rechecked.
There is a classical tradeoff between indexing and runtime costs: the smaller the cylinder diameter, the smaller the range of intersection times, so fewer cones need rechecking at each point, but more cylinders have to be indexed. If circle centers are distributed unevenly, it may be worth searching for a better cylinder placement than a regular grid.
P.S. My first answer here - just registered to post it. Hope it isn't too late.

multi-level floor plan graph mapping

There's a collection of buildings, each having multiple floors that are interconnected by stairs and lifts. Currently, I'm attempting to design a system that will find the shortest path between two points across any of the buildings, whether within the same building or in another building.
At the moment each floor is modeled in a graph as follows:
The door of each room is a vertex. The junctions of the edges connecting the rooms to the main edge (corridor) are also vertices.
The stairs between the floors are edges.
The question that remains is: how should I represent the lifts (elevators), which are right next to the stairs?
To have a lift as an edge makes me wonder what weight it should have, given that I'll have to run a graph traversal algorithm afterwards to find the shortest path.
Lift (elevator) as edge or as vertex? That is the question.
Thanks!
Edges
Using an edge is the most immediate answer, as you do that for stairs. However, while stairs can only go from floor X to floor X+1, a lift can go from any floor to any floor, with slightly different times - I usually find the stairs quicker for two floors, but slower for more than 2. To mirror this you'll need an edge from every floor to every other floor, complete with weightings for each.
Vertices
You could instead have some additional vertices as well as edges. If you had a vertex at each floor of the liftshaft, then you'd only need a single path of edges connecting all the floors together, rather than a combinatorial number of edges.
If you also added an additional vertex outside the doors at each level, then you could add the average delay for getting into a lift and so reflect the fact that a lift can pass multiple floors quickly. However, lifts are going to need average timings at best. At busy times, they can end up stopping at almost every floor anyway, so for a busy campus you wouldn't really gain from these extra vertices.
My vote is for a vertex for each floor of the lift and a single edge to link adjacent floors. It should simplify the graph and reduce the effort of any path-optimisation algorithm as there are fewer paths. Plus it is a more accurate reflection of reality and minimises your workload to set up the edge weights.
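To make the vertex-per-floor model concrete, here is a small sketch using networkx; all node names and weights are invented example values, with the lift-lobby edge carrying the average wait:

# Sketch of the "lift vertex per floor" model (node names and weights are
# made-up examples). Stairs and corridors are ordinary weighted edges; the
# lift is a chain of per-floor vertices linked by short edges.
import networkx as nx

G = nx.Graph()
# Corridor junctions and doors on two floors (illustrative weights in seconds).
G.add_edge("room101_door", "corridor1", weight=5)
G.add_edge("corridor1", "stairs_f1", weight=10)
G.add_edge("corridor1", "lift_lobby_f1", weight=12)
G.add_edge("room201_door", "corridor2", weight=5)
G.add_edge("corridor2", "stairs_f2", weight=10)
G.add_edge("corridor2", "lift_lobby_f2", weight=12)
# Stairs: one edge per flight.
G.add_edge("stairs_f1", "stairs_f2", weight=30)
# Lift: lobby -> lift vertex captures the average wait and boarding time,
# lift_f1 -> lift_f2 captures the ride between adjacent floors.
G.add_edge("lift_lobby_f1", "lift_f1", weight=40)
G.add_edge("lift_lobby_f2", "lift_f2", weight=40)
G.add_edge("lift_f1", "lift_f2", weight=5)

print(nx.dijkstra_path(G, "room101_door", "room201_door", weight="weight"))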
If the lifts are a possible shortest path from one floor to the next, then they must be edges with weights. The entrances at each level are vertices. If close enough to the stairs, then they are possibly shared with the stair vertices.
I vote for edge.
Say you choose to use an elevator. You walk to it, press button and wait a bit. You then get in, wait some more, get out and continue your walk. Now, although you are physically not moving much, in time you are moving. Taking a lift between floors is like walking, say, 50 meters.
What I mean is that the time spent standing around the elevator is equivalent to a distance that you travel if walking. So treat the elevator as an edge that you are walking along during the duration that you are using it. Use that distance to compare, say, walking down the stairs.

GIS: Converting multi polygons to multiple features

I am involved in a GIS project. I have a base map file (shapefile) that contains the road layer for a large portion of a town. The problem is that the shapefile contains only two features, each containing around 500,000 points. The features are multipolygons containing a large number of polygons inside. I wish to convert it to numerous features, each containing not more than one polygon. Is it possible? If yes, how?
Seems like what you have here is a multi-part feature. If you are using ArcGIS, you need to add the Advanced Editing toolbar in ArcMap. Start an editing session and use the Explode Multi-part Feature tool, and then you will have one geometry for each record.
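If you are not tied to ArcGIS, a rough equivalent outside it (assuming a reasonably recent GeoPandas; the file names are placeholders) could look like:

# Non-ArcGIS alternative sketch using GeoPandas: split each multi-part
# geometry into one single-part feature per row. File names are placeholders.
import geopandas as gpd

gdf = gpd.read_file("multipart_roads.shp")       # two multi-part features
singles = gdf.explode(index_parts=False)         # one row per part
singles = singles.reset_index(drop=True)
singles.to_file("singlepart_roads.shp")
print(len(gdf), "features ->", len(singles), "features")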
If you have connectivity information (e.g. you have polygons and not just points) it's not too tough to do a decent job of polygon reduction.
What I've done in the past consisted of two steps.
Any vertex that is surrounded by polygons, all of which are coplanar, can be removed. I did this by "sliding" the vertex to a neighbouring vertex: that neighbour inherits all of the test vertex's neighbours, and any triangles that become degenerate (e.g. any triangles shared between the two vertices) are removed.
Any vertex which has two edges leaving it opposite one another, where the polygons on either side are either completely nonexistent or coplanar, can also be collapsed into a neighbouring vertex in the same way, but obviously only into one that lies along one of the parallel edges.
Note:
Two polygons are coplanar if they share at least one point in common and if they have the same normal. Since the candidate polygons are always attached to the candidate vertex, you just need to compare polygon normals. The normal can be computed by taking the cross product of two of the edges of the polygon.
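A small sketch of that normal test, with made-up vertex data (normals from the cross product of two edges, compared for parallel, same-direction orientation):

# Sketch of the coplanarity test from the note above: compute each polygon's
# normal via a cross product of two edges and compare directions.
# The polygons are assumed to already share the candidate vertex.

def normal(poly):
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = poly[0], poly[1], poly[2]
    u = (bx - ax, by - ay, bz - az)
    v = (cx - ax, cy - ay, cz - az)
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def coplanar(poly_a, poly_b, eps=1e-9):
    # Same normal direction: cross product ~zero and positive dot product.
    na, nb = normal(poly_a), normal(poly_b)
    cross = (na[1] * nb[2] - na[2] * nb[1],
             na[2] * nb[0] - na[0] * nb[2],
             na[0] * nb[1] - na[1] * nb[0])
    dot = na[0] * nb[0] + na[1] * nb[1] + na[2] * nb[2]
    return dot > 0 and all(abs(c) < eps for c in cross)

tri1 = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
tri2 = [(1, 0, 0), (2, 0, 0), (1, 1, 0)]   # shares a point with tri1, same plane
print(coplanar(tri1, tri2))                # True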

What are the lesser known but useful data structures?

There are some data structures around that are really useful but are unknown to most programmers. Which ones are they?
Everybody knows about linked lists, binary trees, and hashes, but what about Skip lists and Bloom filters for example. I would like to know more data structures that are not so common, but are worth knowing because they rely on great ideas and enrich a programmer's tool box.
PS: I am also interested in techniques like Dancing links which make clever use of properties of a common data structure.
EDIT:
Please try to include links to pages describing the data structures in more detail. Also, try to add a couple of words on why a data structure is cool (as Jonas Kölker already pointed out). Also, try to provide one data-structure per answer. This will allow the better data structures to float to the top based on their votes alone.
Tries, also known as prefix-trees or crit-bit trees, have existed for over 40 years but are still relatively unknown. A very cool use of tries is described in "TRASH - A dynamic LC-trie and hash data structure", which combines a trie with a hash function.
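For anyone who has never written one, a bare-bones trie sketch (a plain dict-of-dicts, not the LC-trie/hash hybrid from the cited paper):

# Minimal trie (prefix tree) sketch.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True          # end-of-word marker

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "$" in node

    def has_prefix(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        return True

t = Trie()
t.insert("tree")
t.insert("trie")
print(t.contains("trie"), t.contains("tri"), t.has_prefix("tri"))  # True False True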
Bloom filter: Bit array of m bits, initially all set to 0.
To add an item you run it through k hash functions that will give you k indices in the array which you then set to 1.
To check if an item is in the set, compute the k indices and check if they are all set to 1.
Of course, this gives some probability of false-positives (according to wikipedia it's about 0.61^(m/n) where n is the number of inserted items). False-negatives are not possible.
Removing an item is impossible, but you can implement a counting Bloom filter, represented by an array of ints with increment/decrement.
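A toy sketch of the above in Python (the choice of m, k and the salted-SHA-256 "hash functions" are arbitrary placeholders):

# Minimal Bloom filter sketch matching the description above: m bits, k hash
# functions derived from hashlib digests with different salts.
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _indices(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1

    def might_contain(self, item):
        return all(self.bits[idx] for idx in self._indices(item))

bf = BloomFilter(m=1024, k=3)
bf.add("skip list")
print(bf.might_contain("skip list"), bf.might_contain("splay tree"))  # True, (probably) False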
Rope: it's a string that allows for cheap prepends, substrings, middle insertions and appends. I've really only had use for it once, but no other structure would have sufficed. Prepending to regular strings and arrays was just far too expensive for what we needed to do, and reversing everything was out of the question.
Skip lists are pretty neat.
Wikipedia
A skip list is a probabilistic data structure, based on multiple parallel, sorted linked lists, with efficiency comparable to a binary search tree (order log n average time for most operations).
They can be used as an alternative to balanced trees (using probabilistic balancing rather than strict enforcement of balancing). They are easy to implement and faster than, say, a red-black tree. I think they should be in every good programmer's toolchest.
If you want to get an in-depth introduction to skip-lists here is a link to a video of MIT's Introduction to Algorithms lecture on them.
Also, here is a Java applet demonstrating Skip Lists visually.
Spatial Indices, in particular R-trees and KD-trees, store spatial data efficiently. They are good for geographical map coordinate data and VLSI place and route algorithms, and sometimes for nearest-neighbor search.
Bit Arrays store individual bits compactly and allow fast bit operations.
Zippers - derivatives of data structures that modify the structure to have a natural notion of 'cursor' -- the current location. These are really useful as they guarantee indices cannot be out of bounds -- used, e.g., in the xmonad window manager to track which window is focused.
Amazingly, you can derive them by applying techniques from calculus to the type of the original data structure!
Here are a few:
Suffix tries. Useful for almost all kinds of string searching (http://en.wikipedia.org/wiki/Suffix_trie#Functionality). See also suffix arrays; they're not quite as fast as suffix trees, but a whole lot smaller.
Splay trees (as mentioned above). The reason they are cool is threefold:
They are small: you only need the left and right pointers like you do in any binary tree (no node-color or size information needs to be stored)
They are (comparatively) very easy to implement
They offer optimal amortized complexity for a whole host of "measurement criteria" (log n lookup time being the one everybody knows). See http://en.wikipedia.org/wiki/Splay_tree#Performance_theorems
Heap-ordered search trees: you store a bunch of (key, prio) pairs in a tree, such that it's a search tree with respect to the keys, and heap-ordered with respect to the priorities. One can show that such a tree has a unique shape (and it's not always fully packed up-and-to-the-left). With random priorities, it gives you expected O(log n) search time, IIRC.
A niche one is adjacency lists for undirected planar graphs with O(1) neighbour queries. This is not so much a data structure as a particular way to organize an existing data structure. Here's how you do it: every planar graph has a node with degree at most 6. Pick such a node, put its neighbors in its neighbor list, remove it from the graph, and recurse until the graph is empty. When given a pair (u, v), look for u in v's neighbor list and for v in u's neighbor list. Both have size at most 6, so this is O(1).
By the above algorithm, if u and v are neighbors, you won't have both u in v's list and v in u's list. If you need this, just add each node's missing neighbors to that node's neighbor list, but store how much of the neighbor list you need to look through for fast lookup.
I think lock-free alternatives to standard data structures, i.e. lock-free queues, stacks and lists, are much overlooked.
They are increasingly relevant as concurrency becomes a higher priority, and they are a much more admirable goal than using mutexes or locks to handle concurrent reads/writes.
Here's some links
http://www.cl.cam.ac.uk/research/srg/netos/lock-free/
http://www.research.ibm.com/people/m/michael/podc-1996.pdf [Links to PDF]
http://www.boyet.com/Articles/LockfreeStack.html
Mike Acton's (often provocative) blog has some excellent articles on lock-free design and approaches
I think Disjoint Set is pretty nifty for cases when you need to divide a bunch of items into distinct sets and query membership. Good implementations of the Union and Find operations result in amortized costs that are effectively constant (the inverse of Ackermann's function, if I recall my data structures class correctly).
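A compact sketch with path compression and union by rank, which is what gives the near-constant amortized bounds mentioned above:

# Disjoint-set (union-find) sketch.
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression (halving)
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                               # attach smaller rank under larger
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

ds = DisjointSet(5)
ds.union(0, 1)
ds.union(3, 4)
print(ds.find(0) == ds.find(1), ds.find(1) == ds.find(3))  # True False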
Fibonacci heaps
They're used in some of the fastest known algorithms (asymptotically) for a lot of graph-related problems, such as the Shortest Path problem. Dijkstra's algorithm runs in O(E log V) time with standard binary heaps; using Fibonacci heaps improves that to O(E + V log V), which is a huge speedup for dense graphs. Unfortunately, though, they have a high constant factor, often making them impractical in practice.
Anyone with experience in 3D rendering should be familiar with BSP trees. Generally, it's a method of structuring a 3D scene so that it can be rendered efficiently given the camera coordinates and bearing.
Binary space partitioning (BSP) is a method for recursively subdividing a space into convex sets by hyperplanes. This subdivision gives rise to a representation of the scene by means of a tree data structure known as a BSP tree.
In other words, it is a method of breaking up intricately shaped polygons into convex sets, or smaller polygons consisting entirely of non-reflex angles (angles smaller than 180°). For a more general description of space partitioning, see space partitioning.
Originally, this approach was proposed in 3D computer graphics to increase the rendering efficiency. Some other applications include performing geometrical operations with shapes (constructive solid geometry) in CAD, collision detection in robotics and 3D computer games, and other computer applications that involve handling of complex spatial scenes.
Huffman trees - used for compression.
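A quick sketch of building a Huffman code with a binary heap of (frequency, subtree) pairs (the input string is just an example):

# Huffman coding sketch: repeatedly merge the two least frequent subtrees.
import heapq
from collections import Counter

def huffman_codes(text):
    heap = [(freq, i, {"sym": sym})
            for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    counter = len(heap)            # tie-breaker so dict nodes are never compared
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, {"left": left, "right": right}))
        counter += 1
    codes = {}
    def walk(node, prefix):
        if "sym" in node:
            codes[node["sym"]] = prefix or "0"   # single-symbol edge case
        else:
            walk(node["left"], prefix + "0")
            walk(node["right"], prefix + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_codes("abracadabra"))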
Have a look at Finger Trees, especially if you're a fan of the previously mentioned purely functional data structures. They're a functional representation of persistent sequences supporting access to the ends in amortized constant time, and concatenation and splitting in time logarithmic in the size of the smaller piece.
As per the original article:
Our functional 2-3 finger trees are an instance of a general design technique introduced by Okasaki (1998), called implicit recursive slowdown. We have already noted that these trees are an extension of his implicit deque structure, replacing pairs with 2-3 nodes to provide the flexibility required for efficient concatenation and splitting.
A Finger Tree can be parameterized with a monoid, and using different monoids will result in different behaviors for the tree. This lets Finger Trees simulate other data structures.
Circular or ring buffer - used for streaming, among other things.
I'm surprised no one has mentioned Merkle trees (i.e. hash trees).
Used in many cases (P2P programs, digital signatures) where you want to verify the hash of a whole file when you only have part of the file available to you.
<zvrba> Van Emde-Boas trees
I think it'd be useful to know why they're cool. In general, the question "why" is the most important to ask ;)
My answer is that they give you O(log log n) dictionaries with {1..n} keys, independent of how many of the keys are in use. Just like repeated halving gives you O(log n), repeated sqrting gives you O(log log n), which is what happens in the vEB tree.
How about splay trees?
Also, Chris Okasaki's purely functional data structures come to mind.
An interesting variant of the hash table is called Cuckoo Hashing. It uses multiple hash functions instead of just 1 in order to deal with hash collisions. Collisions are resolved by removing the old object from the location specified by the primary hash, and moving it to a location specified by an alternate hash function. Cuckoo Hashing allows for more efficient use of memory space because you can increase your load factor up to 91% with only 3 hash functions and still have good access time.
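A minimal two-table sketch of the idea (the table size, kick limit and salted hashes are arbitrary; a real implementation rehashes with new functions or grows when an insertion cycles):

# Minimal cuckoo hashing sketch with two tables and two hash functions.
import hashlib

class CuckooHash:
    def __init__(self, size=11, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks
        self.tables = [[None] * size, [None] * size]

    def _hash(self, key, i):
        digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.size

    def lookup(self, key):
        return any(self.tables[i][self._hash(key, i)] == key for i in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return True
        i = 0
        for _ in range(self.max_kicks):
            idx = self._hash(key, i)
            if self.tables[i][idx] is None:
                self.tables[i][idx] = key
                return True
            # Evict the occupant and try to place it in the other table.
            key, self.tables[i][idx] = self.tables[i][idx], key
            i = 1 - i
        return False   # insertion cycled; caller should rehash

h = CuckooHash()
for word in ["red-black", "splay", "treap", "rope", "trie"]:
    h.insert(word)
print(h.lookup("treap"), h.lookup("fenwick"))  # True False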
A min-max heap is a variation of a heap that implements a double-ended priority queue. It achieves this by a simple change to the heap property: a tree is said to be min-max ordered if every element on an even (odd) level is less (greater) than all of its children and grandchildren. The levels are numbered starting from 1.
http://internet512.chonbuk.ac.kr/datastructure/heap/img/heap8.jpg
I like cache-oblivious data structures. The basic idea is to lay out a tree in recursively smaller blocks so that caches of many different sizes will take advantage of blocks that conveniently fit in them. This leads to efficient use of caching at everything from the L1 cache to RAM to big chunks of data read off of the disk, without needing to know the specifics of the sizes of any of those caching layers.
Left Leaning Red-Black Trees. A significantly simplified implementation of red-black trees by Robert Sedgewick published in 2008 (~half the lines of code to implement). If you've ever had trouble wrapping your head around the implementation of a Red-Black tree, read about this variant.
Very similar (if not identical) to Andersson Trees.
Work Stealing Queue
Lock-free data structure for dividing the work equally among multiple threads
Implementation of a work stealing queue in C/C++?
Bootstrapped skew-binomial heaps by Gerth Stølting Brodal and Chris Okasaki:
Despite their long name, they provide asymptotically optimal heap operations, even in a functional setting.
O(1) size, union, insert, minimum
O(log n) deleteMin
Note that union takes O(1) rather than O(log n) time unlike the more well-known heaps that are commonly covered in data structure textbooks, such as leftist heaps. And unlike Fibonacci heaps, those asymptotics are worst-case, rather than amortized, even if used persistently!
There are multiple implementations in Haskell.
They were jointly derived by Brodal and Okasaki, after Brodal came up with an imperative heap with the same asymptotics.
Kd-trees, a spatial data structure used (amongst other things) in real-time raytracing, have the downside that triangles crossing the different spaces need to be clipped. Generally BVHs are faster because they are more lightweight.
MX-CIF Quadtrees, store bounding boxes instead of arbitrary point sets by combining a regular quadtree with a binary tree on the edges of the quads.
HAMT, hierarchical hash map with access times that generally exceed O(1) hash-maps due to the constants involved.
Inverted Index, quite well known in the search-engine circles, because it's used for fast retrieval of documents associated with different search-terms.
Most, if not all, of these are documented on the NIST Dictionary of Algorithms and Data Structures
Ball Trees. Just because they make people giggle.
A ball tree is a data structure that indexes points in a metric space. Here's an article on building them. They are often used for finding nearest neighbors to a point or accelerating k-means.
Not really a data structure; more of a way to optimize dynamically allocated arrays, but the gap buffers used in Emacs are kind of cool.
Fenwick Tree. It's a data structure for keeping the sum of the elements of a vector between two given indices i and j. The trivial solution, precalculating prefix sums from the beginning, doesn't allow you to update an item (you have to do O(n) work to keep the sums up to date).
Fenwick Trees allow you to update and query in O(log n), and how it works is really cool and simple. It's really well explained in Fenwick's original paper, freely available here:
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol24/issue3/spe884.pdf
Its father, the RMQ (range minimum query) tree, is also very cool: it allows you to keep info about the minimum element between two indices of the vector, and it also works in O(log n) update and query. I like to teach the RMQ tree first and then the Fenwick tree.
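A short Fenwick tree sketch (point update plus prefix sums; the sample values are arbitrary):

# Fenwick (binary indexed) tree: point update and prefix-sum query in
# O(log n); the sum over [i, j] is prefix(j) - prefix(i - 1).
class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)      # 1-based internally

    def update(self, i, delta):
        i += 1                          # shift to 1-based
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def prefix_sum(self, i):
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)
        return total

    def range_sum(self, i, j):
        return self.prefix_sum(j) - (self.prefix_sum(i - 1) if i > 0 else 0)

ft = FenwickTree(8)
for idx, value in enumerate([3, 1, 4, 1, 5, 9, 2, 6]):
    ft.update(idx, value)
print(ft.range_sum(2, 5))   # 4 + 1 + 5 + 9 = 19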
Van Emde-Boas trees. I have even a C++ implementation of it, for up to 2^20 integers.
Nested sets are nice for representing trees in the relational databases and running queries on them. For instance, ActiveRecord (Ruby on Rails' default ORM) comes with a very simple nested set plugin, which makes working with trees trivial.
It's pretty domain-specific, but half-edge data structure is pretty neat. It provides a way to iterate over polygon meshes (faces and edges) which is very useful in computer graphics and computational geometry.