MySQL GIS and Spatial Extensions - how to map regions and query against them

I am trying to make a smartphone app which will return a list of users within a certain proximity, say 100m. It's easy to get the coordinates of my BlackBerry and write them to a database, but in order to return a list of other users within 100m, I need to pull every other record from the database and compare the distance between the two points, checking to see if it's within range, before outputting that user's information.
This is going to be time consuming if there are many users involved. So I would like to map areas (countries, cities, I'm not yet sure of the resolution I'll need) so that I can first target a smaller subset of all users. This will save on processing time.
I have read the basics of GIS and spatial querying on the mysql website but to be honest the query is over my head and I hate copying and pasting code without understanding it. Plus it only checks for proximity - I want to first check if a coordinate falls within a certain area.
Does anyone have any experience of such matters and feel like giving me some pointers? Resources such as any preexisting databases of points describing countries as polygons would be really helpful too.
Many thanks to anyone who takes the time :)

I would recommend against using MySQL for doing spatial analysis. It has only implemented bounding-box analysis, and most spatial functions are not implemented at all. I would recommend using either PostGIS, or perhaps spatialCouch, or extending MongoDB. Any of these would be much better for what you look to be doing.
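To give a concrete idea, here is a minimal PostGIS sketch (the table and column names are hypothetical): ST_DWithin on a geography column filters by true distance in metres, and ST_Contains answers the "does this coordinate fall inside a certain area" part against stored polygons.

    -- Hypothetical schema: users(id, name, location geography(Point, 4326))
    -- and regions(id, name, boundary geometry(Polygon, 4326)).

    -- All users within 100 m of a given point (geography distances are in metres).
    SELECT u.id, u.name
    FROM users u
    WHERE ST_DWithin(
            u.location,
            ST_SetSRID(ST_MakePoint(-0.1278, 51.5074), 4326)::geography,
            100);

    -- First narrow the search to users whose coordinates fall inside a stored region.
    SELECT u.id, u.name
    FROM users u
    JOIN regions r ON ST_Contains(r.boundary, u.location::geometry)
    WHERE r.name = 'United Kingdom';

For country polygons, free datasets such as Natural Earth provide boundary shapefiles that can be loaded into a table like the regions one above.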

Related

How to store relatively static data?

I'm interested in the following question: what is the best way to store relatively static data - in the database, in the code, or some combination?
For example, we have a list of countries. They do not change every day, so on the one hand they don't have to be saved into the database and could instead be stored as an array in code, keyed manually or even by 2-character country codes.
Next, we also have info from the Google geocoder - a place_id and the country's bounds - which is likewise essentially unchanging data.
A little more complicated: now we need to deploy the application on another machine. Say we have the list of countries, but the information from Google is missing, and making 200+ requests as part of a migration is not really an option either.
And there can be many such examples, some harder, some easier - currencies, languages, different types of something.
I would like to hear some of your approaches, reflections, best practices. Thank you!
As soon as you decide to JOIN the static data to a 'dynamic' table, you will be glad to store it in the database.
And, when Yugoslavia or Czechoslovakia splits again, you don't have to edit your code.
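As a minimal sketch of that (table names are just illustrative): keep the static list in its own table, seed it once, and let the dynamic tables reference it.

    -- Static lookup table, seeded once by a migration.
    CREATE TABLE countries (
        code CHAR(2) PRIMARY KEY,          -- ISO 3166-1 alpha-2
        name VARCHAR(100) NOT NULL
    );

    -- Dynamic data references the lookup by its stable code.
    CREATE TABLE users (
        id           INT AUTO_INCREMENT PRIMARY KEY,
        email        VARCHAR(255) NOT NULL,
        country_code CHAR(2),
        FOREIGN KEY (country_code) REFERENCES countries (code)
    );

    -- The JOIN that makes keeping it in the database pay off.
    SELECT u.email, c.name AS country
    FROM users u
    JOIN countries c ON c.code = u.country_code;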
Definitely in code, unless it grows unmanaged. But even then, you're free to move it out into a *.json/*.xml file or whatever format you like more. Why? Mostly for a few reasons: having the data locally makes you independent of the internet, so your code can still run, and there are no delay penalties from connection slowness or similar problems.
Regarding the deployment problem: you can always keep a default JSON/XML copy stored in the DB as plan "B". If it's really necessary, you can make a single call to retrieve the default list and then cache or memoize it in whatever fashion you like.
To summarize, the database is the last thing you'd like to rely on (unless we're talking about data of a certain kind that has to be gathered and managed from a single place - namely, your SQL server).

Business Intelligence: Live reports from MySQL

I wanted to create a (nearly) live dashboard from MySQL databases. I tried Power BI, SSRS and other similar tools, but they were not as fast as I wanted. What I have in mind is for the data to be updated every minute or even less. Is that possible? And are there any free (or inexpensive) tools for this?
Edit: I want to build a wallboard to show some data on a big TV screen. I need it to be real-time. I tried SSRS auto-refresh as well, but it shows a loading sign and is very slow, and Power BI relies on Azure, which is very complex to configure and blocked in my country.
This topic has many more layers than just asking which tool is best for this case.
You have to consider
Velocity
Veracity
Variety
Kind
Use Case
of the data. Sure, these points are usually only brought up when talking about Big Data, but they will give you a feeling for the size and complexity of your data.
Loading
Is the data already being loaded so that you "just" use it? Or do you also need to load it in real time or near-real time (for clarification read this answer here)?
Polling/Pushing
Do you want to poll data every x seconds or minutes? Or do you want to work event-based? What are the requirements that make you need to show data this fast?
Use case
Do you want to show financial data? Do you need to show error and system logs from servers and applications? Do you want to generate insights as soon as a visitor to a webpage makes a request?
Conclusion
When thinking about those questions, keep in mind that this should just be a hint to point you in one direction or another. Depending on the data and the use case, you might use an ELK stack (for logs), Power BI (for financial data) or even some scripts (for billing).
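If plain polling turns out to be enough, the wallboard can often just re-run a cheap aggregate against MySQL every minute or so. A rough sketch, assuming a hypothetical orders table with an index on created_at:

    -- Hypothetical table: orders(id, amount, created_at).
    -- Poll this every minute; with an index on created_at it stays cheap.
    SELECT DATE_FORMAT(created_at, '%Y-%m-%d %H:%i') AS minute,
           COUNT(*)                                  AS order_count,
           SUM(amount)                               AS revenue
    FROM orders
    WHERE created_at >= NOW() - INTERVAL 60 MINUTE
    GROUP BY minute
    ORDER BY minute;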

Database for counting page accesses

So let's say I have a site with approx. 40,000 articles.
What I'm hoping to do is record the number of page visits for each article over time.
Basically the end goal is to be able to visualize via graph the number of lookups for any article between any period of time.
Here's an example: https://books.google.com/ngrams
I've begun thinking about a MySQL data structure, but my brain tells me it's probably not the right task for MySQL. It almost seems like I'd need to use some specific NoSQL analytics solution.
Could anyone advise what DB is the right fit for this job?
SQL is fine. It supports UPDATE statements that guarantee your count is correct rather than just eventual consistency.
Although most people will just use a log file, and process this on-demand. Unless you are Google scale, that will be fast enough.
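As a sketch of what that UPDATE-based approach could look like in MySQL (the schema is illustrative, not prescriptive): one row per article per day, bumped atomically on each hit, and a simple range scan feeds the graph.

    -- One counter row per article per day; the upsert keeps the count exact.
    CREATE TABLE article_views (
        article_id INT  NOT NULL,
        view_date  DATE NOT NULL,
        views      INT  NOT NULL DEFAULT 0,
        PRIMARY KEY (article_id, view_date)
    );

    -- Run on every page hit (or in batches when processing a log file).
    INSERT INTO article_views (article_id, view_date, views)
    VALUES (42, CURRENT_DATE, 1)
    ON DUPLICATE KEY UPDATE views = views + 1;

    -- Data for the graph: views of one article over an arbitrary period.
    SELECT view_date, views
    FROM article_views
    WHERE article_id = 42
      AND view_date BETWEEN '2023-01-01' AND '2023-12-31'
    ORDER BY view_date;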
Many tools exist for this, often built on very efficient specialized data structures, such as RDDs, that you won't find in any database. Why don't you just use one of them?

neo4j or neo4j+mysql for partial graph dataset

Even though I read another question here advising not to use both Neo4j and MySQL (neo4j - graph database along with a relational database?), I was wondering what approach would be best for a dataset where some of the data can be modeled as a graph and the rest looks relational. For various reasons, I can't post the kind of data I'm using.
I can shoehorn the relational part into neo4j but it looks ugly and complex, something I would want to avoid.
On the other hand, if I use both together, I'll have to run twice as many queries to get the result, decreasing performance (assume the DBs are in the cloud on separate machines).
I can't use mysql alone because one of the queries requires a depth of around 20-30 which I assume can't be handled by mysql.
Have any of you encountered such a situation before? If so, how did you solve it?
As everyone else says: "give us a better idea of what data you are trying to model so we can best give you a suggestion".
That being said, dealing with 2 DBs is not an issue, and it's more common than people think: oftentimes you use a full-text store for searches and get back a list of document IDs, which you then use to hit the relational DB for additional metadata. Or you hit Redis to get a list of IDs and then hit the relational DB for more data.
I proof-of-concepted a Neo4j+MySQL system for targeted searching based on your social network ("show me all restaurants my network has recommended, ordered by depth", e.g. 1st-level friend recs are weighted higher than 2nd-level, and so on), and it didn't feel awkward. But I also didn't take it to scale.
You will have to keep both datastores in sync. So in my case, when a user recommends a place on the web app (which inserts it into MySQL), you then need to turn around and do the same insert into Neo. You probably want to do this asynchronously as well, so you'll need to set up a message queue with workers.

Storing GPX-Files in Database mysql vs. postgis (postgresql)

I'm working on a web platform which analyzes GPX tracks and draws some profiles, speed charts, etc. Currently I just calculate the statistics once (distance, average speed, duration, height gain/loss) and keep the GPX file. The profile is drawn once and stored on disk.
But now I want to switch to a JavaScript-based graph library so you can choose what kind of graph to plot. For this to work I need to access the trkpt elements of the GPX file over and over again, and I started to think about putting the points into the database to access them more easily.
The question I'm asking myself now is: does it make sense to switch over to PostgreSQL and use PostGIS, or should I just stay with MySQL and join the stuff in as I was taught in normalization class? ;)
Are there any other benefits of PostGIS besides calculating distances and the like? Because the only time I calculate distances between lon/lat points is when someone uploads a new file. Afterwards I only want to access the points (coordinates, times, speeds and elevations) in an easy way.
Thanks,
Sven
BostonGIS has an overview comparing MySQL 5.1 and PostgreSQL 8.3/PostGIS 1.3. PostGIS is much more mature, but whether you benefit from that depends entirely on your requirements. Just give it a try and see if you're happy with it.
Have fun!
The BostonGIS cross-comparison is really a good place to get an idea of the benefits you can expect.
I've been using PostgreSQL/PostGIS for a while now, and I'm quite happy with it. Back in my researcher days at the university, one of my research topics involved migrating a deductive database system to PostgreSQL and then to PostGIS, mainly because of the limitations MySQL presented when working with geometric data.
So it's a matter of looking at your needs, looking a bit into the future, and checking whether it would be worth it.
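For what it's worth, here is a minimal sketch of what storing the trackpoints in PostGIS could look like (names are hypothetical): the plain columns cover the "just read the values back for the graphs" case, and the geometry column makes the one-off distance calculation at upload time trivial.

    -- Hypothetical schema: one row per trkpt of an uploaded track.
    CREATE TABLE track_points (
        id        BIGSERIAL PRIMARY KEY,
        track_id  INT              NOT NULL,
        seq       INT              NOT NULL,      -- order within the track
        recorded  TIMESTAMPTZ      NOT NULL,
        elevation DOUBLE PRECISION,
        geom      geometry(Point, 4326) NOT NULL  -- lon/lat from the GPX file
    );

    -- Plain reads for the JavaScript graphs: just the values, in order.
    SELECT seq, recorded, elevation, ST_X(geom) AS lon, ST_Y(geom) AS lat
    FROM track_points
    WHERE track_id = 1
    ORDER BY seq;

    -- Total track length in metres, calculated once at upload time.
    SELECT ST_Length(ST_MakeLine(geom ORDER BY seq)::geography) AS length_m
    FROM track_points
    WHERE track_id = 1;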