MySQL writing efficient procedures for geometry queries - mysql

This is a question regarding efficiency because what I want to write is likely to break my machine.
Brief description. I have two sets of data,
Set1 contains ~2500 entries, each entry has a polygon associated.
Set2 contains ~4000 entries, each entry has a point associated.
I want to find out which polygons from set1 enclose which points from set2. All points and polygons are unique and do not overlap.
I was about to embark on writing a procedure using a nested cursor that will look at a point in set2 scroll through all of set1 and find a polygon that encloses the point.
Then I realized how much data I have, that I will want to run it more than once, and this may take quite a while. Is there a better way?

MySQL for Geometry - Fine.
Does not seem to be a lengthy process.
You may or may not have nested cursor.
You can pretty well create a linked view having each point from set2 along with all poygons enclosing the point.
And that view can be called from your code for selection list, etc (replacing your expected cursor). Once the view is defined, MySQL will be almost ready with the output always.

Related

SSAS calculated measure: Access relational database

I recently asked a question about many-to-many relationships and how they can be used to calculate intersections that got answered pretty fine. Now, there is another nice-to-have requirement for our cube to extend that to more data. The general question remains: How many orders contain both product x and y?
However, the measure groups are now much larger, currently about 1.4 billion rows. I tried to implement that using the method described in the other post, with several hidden cross-referenced measure groups. However, this is simply too much for our hardware, the cube is reaching sizes next to 0.5 TB, and querys take several minutes to complete.
Now I would try to use another option: Can I access our relational database in a calculated measure? It seems I can, using UDFs like described in this article. I could write a Function in c# that queries our relational database and returns all the orders that contain the products chosen by the user. But in order to do that, I need to supply all the dimensional data the user has selected to the UDF. I also need the UDF to return the calculated value so it can be output as the result of the calculated member. Is that possible? If yes, how? The example microsoft provides only includes a small deterministic string-function as the UDF.
Here my own results:
It seems to be possible, though with limitations. The class Microsoft.AnalysisServices.AdomdServer.Context can provide you with the currentMember of each Hierarchy, however this does not work with Excel-Style-Subselects. It either contains a single member or the AllMember.
Another option is to get the MDX query using the dmv SELECT * FROM $System.DISCOVER_SESSIONS. There will be a column on that view which contains the last mdx query for a given session. However in order to not overwrite your own last query, you will need to not use the current connection, but to open a new one. The session id can be obtained through Microsoft.AnalysisServices.AdomdServer.Context.CurrentConnection.SessionID.
The second approach is ok for our use-case. It does not allow you to handle axes, since the udf-function has a cell-scope, but you don't know which cell you are in. If anyone of you knows anything about that last bit, please tell me. Thanks!

Statistical Process Control Charts in SQL Server 2008 R2

I'm hoping you can point me in the right direction.
I'm trying to generate a control chart (http://en.wikipedia.org/wiki/Control_chart) using SQL Server 2008. Creating a basic control chart is easy enough. I'd just calculate the mean and standard deviations and then plot them.
The complex bit (for me at least) is that I would like the chart to reset the mean and the control limits when a step change is identified.
Currently I'm only interested in a really simple method of identifying a step change, 5 points appearing consecutively above or below the mean. There are more complex ways of identifying them (http://en.wikipedia.org/wiki/Western_Electric_rules) but I just want to get this off the ground first.
The process I have sort of worked out is:
Aggregate and order by month and year, apply row numbers.
Calculate overall mean
Identify if each data item is higher, lower or the same as the mean, tag with +1, -1 or 0.
Identify when their are 5 consecutive data items which are above or below the mean (currently using a cursor).
Recalculate the mean if 5 points are above or 5 points are below the mean.
Repeat until end of table.
Is this sort of process possible in SQL server? It feels like I maybe need a recursive UDF but recursion is a bit beyond me!
A nudge in the right direction would be much appreciated!
Cheers
Ok, I ended up just using WHILE loops to iterate through. I won't post full code but the steps were:
Set up a user defined table data type in order to pass data into a stored procedure parameter.
Wrote accompanying stored procedure that uses row numbers and while loops to iterate along each data value in the input table and then uses the current row number to do set based processing on a subset of the input data (to check if following 5 points are above/below mean and recalculate the mean and standard deviations when this flag is tripped).
Outputs table with original values, row numbers, months, mean values, upper control limit and lower control limit.
I've also got one up and running that works based on full Nelson rules and will also state which test the data has failed.
Currently it's only been used by me as I develop it further so I've set up an Excel sheet with some VBA to dynamically construct a SQL string which it passes to a pivot table as the command text. That way you can repeatedly ping the USP with different data sets and also change a few of the other parameters on how the procedure runs (such as adjusting control limits and the like).
Ultimately I want to be able to pass the resulting data to Business Objects reports and dashboards that we're working on.

MySQL Find Polygon Nearest to Point

I have a MySQL database that contains geo-tagged objects. The objects are tagged by using a bounding polygon that the user draws and my program exports into the database. The bounding polygon is stored in the database as a Polygon (the MySQL spatial extensions kind).
I can think of a couple ways to do this, but I'm not very pleased with any of them, as this needs to be an efficient process that will execute fairly often, although on probably only < 50,000 records in the pertinent table.
I need a way to, given any point on the earth, find the record that corresponds to the closest geo-tagged/bounded object. It doesn't need to be correct in all cases but, let's say (just to invent a number), 95% of the time. Manual correction is acceptable if it doesn't need to be done very frequently.
It appears as though this question is very similar
Get polygons close to a lat,long in MySQL.
I am going to write some application-level code to do an interatively-widening search on the distance in the linked question.

Mysql Database design - Storing single and multi lat longs

I have a mysql database table that will store locations for buildings and events and a few other things. All locations are stored in one table, and linked to buildings, events etc through their own many to many table. That way I can just display dots on a map, and also allow filtering etc.
However the problem comes with some things having a single location so 1 lat,long but some like a track has a number of lat long positions, and something like a large stadium might have a polygon over it. These are also stored as a list of lat,longs with the first and last being the same.
Im wondering how I should store this in the mysql db though. Originally I just had a column for lat, long and id for the lookup table. Should I have ANOTHER lookup table for the co-ordinates or serialise the data before putting it into the DB in some way or should I just store the whole string in one field
lat1,long1
lat1,long1;lat2,long2;lat1,long1
Any suggestions?
I wouldn't de-normalize the data from the start, by pushing a whole "serialized" polygon into a single field.
Rather, I'd have a Polygons table (with polygon ID and possibly auxiliary per-polygon info, such as whether it's an actual closed polygon or just a polyline -- though that might alternatively be represented in the following table by having the last point equal to the first one for a certain polygon), and a PointsInPolygon table (with coordinates of the point, polygon ID foreign key, vertex number within polygon -- the latter two jointly being unique).
Normalization will make (as usual) your life much simpler for ad-hoc queries (including in this case "polygons near X", point in polygon, etc). Again as usual, you can later add redundant denormalized values if and when you determine that some specific query really needs to get optimized (at some cost to table updates, integrity checks, etc). Geodata are not all that different from other kinds in this regard.
Since you're not doing any lookups on the locations, and you're using (I'm assuming) Google Maps API, the simplest solution would probably be to encode a list of lat/lon as JSON and store in a varchar column.
You can just output the JSON straight from the database for your Google Maps API code to use. I would suggest you to use some simple JSON structure like so: ["point",1.23456,2.34567] or ["line",1.23456,2.34567,3.45678,4.56789] and so on.

Pathing in a non-geographic environment

For a school project, I need to create a way to create personnalized queries based on end-user choices.
Since the user can choose basically any fields from any combination of tables, I need to find a way to map the tables in order to make a join and not have extraneous data (This may lead to incoherent reports, but we're willing to live with that).
For up to two tables, I already managed to design an algorithm that works fine. However, when I add another table, I can't find a way to path through my database. All tables available for the personnalized reports can be linked together so it really all falls down to finding which path to use.
You might be able to try some form of an A* algorithm. Basically this looks at each of the possible next options to choose and applies a heuristic to it, a function that determines roughly how far it is between this node and your goal. It then chooses the one that is closer and repeats. The hardest part of implementing A* is designing a good heuristic.
Without more information on how the tables fit together, or what you mean by a 'path' through the tables, it's hard to recommend something though.
Looks like it didn't like my link, probably the * in it, try:
http://en.wikipedia.org/wiki/A*_search_algorithm
Edit:
If that is the whole database, I'd go with a depth-first exhaustive search.
I thought about using A* or a similar algorithm, but as you said, the hardest part is about designing the heuristic.
My tables are centered around somewhat of a backbone with quite a few branches each leading to at most a single leaf node. Here is the actual map (table names removed because I'm paranoid). Assuming I want to view data from tha A, B and C tables, I need an algorithm to find the blue path.