Is anyone aware of a free web service that translates UK postcodes to eastings and northings? I found a website I could screen-scrape, but perhaps there is a nice free web service out there. Thanks!
Christian
The Ordnance Survey has open-sourced its list of postcodes, including geographic coordinates (you may have to do some converting, though). I haven't used the data myself yet, but I think it fits the bill.
Code-Point Open is a dataset that contains postcode units, each of which have a precise geographical location.
There are approximately 1.7 million postcode units in England, Scotland and Wales. Each postcode unit, such as KY12 8UP or PO14 2RS, contains an average of fifteen adjoining addresses.
Northern Ireland postcodes are not available with Code-Point Open.
Note that, due to the nature of the British postcode system, updates happen quite frequently throughout the year. Make sure you download new versions as they are released.
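Once you've downloaded and unzipped the CSV files, a lookup table from postcode to easting/northing is only a few lines of Python. This is just a minimal sketch: the column indices and the directory path are assumptions, so check them against the column headers documented in your download.

import csv
import glob

# Assumed Code-Point Open layout: postcode in column 0, eastings/northings
# in columns 2 and 3 -- verify against the documentation in your download.
POSTCODE_COL, EASTING_COL, NORTHING_COL = 0, 2, 3

def load_codepoint(csv_dir):
    lookup = {}
    for path in glob.glob(csv_dir + "/*.csv"):
        with open(path, newline="") as f:
            for row in csv.reader(f):
                postcode = row[POSTCODE_COL].replace(" ", "").upper()
                lookup[postcode] = (float(row[EASTING_COL]), float(row[NORTHING_COL]))
    return lookup

# Usage (path is an assumption):
# codepoint = load_codepoint("codepo_gb/Data/CSV")
# print(codepoint["KY128UP"])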
Check out NPE Map
The Code-Point Open postcode database was released in 2010 as open data, as a zipped series of CSV files.
If you are converting a mass of postcodes, you'll find it more efficient to host the database on your own system, as calling an online API leaves your scraper waiting longer than you would expect for each response.
You can order the postcode data from the official site here:
https://www.ordnancesurvey.co.uk/oswebsite/products/code-point-open/index.html
Or simply download it from this copy:
http://www.freepostcodes.org.uk/
The Northern Ireland data is not included, but there is a copy of the data here:
http://jamiethompson.co.uk/web/2010/05/30/code-point-open-northern-ireland-addendum/
This saves some work.
The Python code for parsing these two files is here:
http://scraperwiki.com/scrapers/uk_postcodes_from_codepoint/edit/
http://scraperwiki.com/scrapers/ni_postcodes_from_codepoint/edit/
The Python CGI script for looking up postcodes in these two databases is here:
http://scraperwikiviews.com/run/uk_postcode_lookup/?
If you are processing a lot of data, it might be better to use your own local solution. Here is a Python script that can do the conversion:
http://webscraping.com/blog/Converting-UK-Easting-Northing-coordinates/
from pyproj import Proj, transform

v84 = Proj(proj="latlong", towgs84="0,0,0", ellps="WGS84")
v36 = Proj(proj="latlong", k=0.9996012717, ellps="airy",
           towgs84="446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894")
vgrid = Proj(init="world:bng")

def ENtoLL84(easting, northing):
    """Returns (longitude, latitude) tuple"""
    vlon36, vlat36 = vgrid(easting, northing, inverse=True)
    return transform(v36, v84, vlon36, vlat36)

def LL84toEN(longitude, latitude):
    """Returns (easting, northing) tuple"""
    vlon36, vlat36 = transform(v84, v36, longitude, latitude)
    return vgrid(vlon36, vlat36)

if __name__ == '__main__':
    # outputs (-1.839032626389436, 57.558101915938444)
    print(ENtoLL84(409731, 852012))
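Note that the init=/transform style above comes from older pyproj releases and has since been deprecated. On pyproj 2.x or later, a roughly equivalent conversion between the British National Grid (EPSG:27700) and WGS84 (EPSG:4326) would look like this (a sketch, not the original author's code):

from pyproj import Transformer

# EPSG:27700 = OSGB36 / British National Grid, EPSG:4326 = WGS84.
# always_xy=True keeps (easting, northing) in and (longitude, latitude) out.
bng_to_wgs84 = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)

lon, lat = bng_to_wgs84.transform(409731, 852012)
print(lon, lat)   # roughly -1.839, 57.558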
I have a lot of sites with coordinates, and I have field people who maintain those sites when something goes wrong.
I'm looking for a way to place the field people so that they are as close as possible to the greatest number of sites.
The idea is something like:
I have 3000 sites with lat&long.
I want to specify how many people I have available and, with that information, get the optimal coordinates at which to distribute them.
I'm not necessarily looking for an existing tool (though if one exists I could look at it), but I don't know how to start with something like this (I can work with MySQL, PHP and Google Maps, and I can learn another language/tool if it helps).
Thank you
The problem of distributing an arbitrary number of people upon a given set of locations is an optimization problem. More specifically, it can be interpreted as a clustering problem. A nice clustering example implemented in JS can be found at the A Curious Animal blog.
As you can see in that example, clustering means grouping neighbouring locations. In other words, it is a computation that yields an optimal distribution of groups of locations (clusters) over a given set of locations. If we say that each cluster corresponds to a person instead of just a group of locations, we arrive at your problem statement.
Since the number of people is your input, I'd suggest the k-means clustering algorithm (see the short explanation and the list of available software on Wikipedia).
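As a minimal sketch of what that could look like in Python with scikit-learn (the library choice and the toy coordinates are mine, not the poster's): the number of available people becomes the number of clusters, and the cluster centres are the suggested positions. Plain lat/long with Euclidean distance is only a rough approximation; for large areas, project the coordinates first.

from sklearn.cluster import KMeans
import numpy as np

# sites: one (latitude, longitude) pair per site -- toy data for illustration.
sites = np.array([
    [40.71, -74.00], [40.73, -73.99], [40.70, -74.01],
    [34.05, -118.24], [34.06, -118.25],
    [41.88, -87.63],
])

people_available = 3
km = KMeans(n_clusters=people_available, n_init=10, random_state=0).fit(sites)

print(km.cluster_centers_)   # one suggested (lat, lon) per person
print(km.labels_)            # which person each site is assigned to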
EDIT:
When working with optimization algorithms in general, there are two caveats:
whether the chosen algorithm is designed to solve your (class of) problem
whether some input parameter combination leads to odd, unacceptable results
The first point requires some knowledge of the algorithm, while the second is a matter of trial and error, as you nicely noticed. Also, subtle differences in input can result in huge differences in output.
The above link states that the k-means algorithm "does not work well with non-globular clusters".
It is easier to start from its opposite, a globular cluster, which is defined as: "A more precise mathematical term is convex, which roughly means that any line you can draw between two cluster members stays inside the boundaries of the cluster."
A non-globular cluster, by contrast, is a non-convex set of points.
Probably your "thin ovoid clusters" are non-convex?
Another important characteristic (also stated in the above link) is that k-means is a non-deterministic algorithm, meaning that it may (and most probably will) yield different outputs for the same input when run multiple times.
This happens because the algorithm makes the initial partitioning of clusters at random, and the final output is highly sensitive to that initial partitioning. Depending on the implementation used, you may have some room for tweaking here.
If that doesn't lead to satisfying results, the only thing left is to try another algorithm (since the locations are given). I'd suggest the QT clustering algorithm, which I use in a commercial product. It is a deterministic clustering algorithm that takes as input the minimum cluster size and the threshold distance (the distance of a point from the centre of the cluster).
With this approach, however, you will need to modify the algorithm itself: it usually stops when "no more clusters can be formed having the minimum cluster size", and you will need it to stop once the wanted number of clusters has been reached (see the sketch below). A minimum cluster size of 1 should be fine, but you might want to try out different values for the threshold distance.
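For illustration, here is a rough Python sketch of that modification. It is a simplified, greedy QT-style variant (not the commercial implementation mentioned above): candidate clusters are grown around each remaining point within the threshold distance, the largest one is kept, and the loop stops once the wanted number of clusters has been reached.

import math

def qt_clusters(points, threshold, wanted_clusters):
    """Simplified QT-style clustering, stopping after wanted_clusters
    clusters instead of at a minimum cluster size.
    points: list of (x, y) tuples in a planar (projected) coordinate system."""
    remaining = list(points)
    clusters = []
    while remaining and len(clusters) < wanted_clusters:
        best = []
        for seed in remaining:
            # Candidate cluster: every remaining point within the threshold of this seed.
            candidate = [p for p in remaining if math.dist(seed, p) <= threshold]
            if len(candidate) > len(best):
                best = candidate
        clusters.append(best)
        chosen = set(best)
        remaining = [p for p in remaining if p not in chosen]
    return clusters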
Here is a code sample in C# that I stumbled upon. Hope this helps.
I need to store GPS tracks that users record into a database. The tracks will consist of a marker every 5 meters of movement, for the purpose of drawing a line on a map. I am estimating 200 km tracks, which means 40,000 lat/long markers each. I estimate a minimum of 50,000 users with 20 of these 200 km tracks each. That means at least 40 billion lat/long markers.
This needs to scale too, so for 1 million users I need capacity for 800 billion GPS markers.
Since each set of 40,000 markers belongs to a single track, we are talking about 1-20 million records/sets of GPS tracks.
Requirements:
Users will request to view these tracks on top of a Google map in a mobile application.
Relations:
I currently have 2 tables. Table one has: [trackid], [userid], [comment], [distance], [time], [top speed].
Table two has: [trackid], [longitude], [latitude], and this is where all GPS markers are stored. What is an efficient way of storing this volume of GPS data while maintaining read performance?
New information:
Storing the GPS data in a KML file for the purpose of displaying it as a track on top of a Google map is a good solution that saves database space. Compressing the KML into a KMZ (basically a zipped KML with a .kmz extension) reduces the file size much further. KMZ loads much quicker than GPX and can be integrated with the Google Maps API as a KML layer. See this information from Google for further assistance. This seems to be the best solution so far for the intended requirement.
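As a sketch of that approach (the function name, file name and coordinates are made up for illustration), a track can be written out as a KML LineString and zipped into a KMZ with nothing more than the standard library:

import zipfile

def track_to_kmz(points, kmz_path, name="track"):
    """points: list of (latitude, longitude) pairs; KML wants 'lon,lat' order."""
    coords = " ".join("%f,%f,0" % (lon, lat) for lat, lon in points)
    kml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        '<Placemark><name>%s</name>'
        '<LineString><coordinates>%s</coordinates></LineString>'
        '</Placemark></Document></kml>' % (name, coords)
    )
    # A KMZ is just a zip archive whose main entry is named doc.kml.
    with zipfile.ZipFile(kmz_path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("doc.kml", kml)

# track_to_kmz([(59.3293, 18.0686), (59.3300, 18.0700)], "track_42.kmz")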
The choice of a particular database, as always, is tied to how you want to store the information and how you want to use it. As such, without knowing the exact requirements of your project, as well as the relationships of the data, the best thing to do would be to do some reading on the topic to determine what particular product or storage model is best suited to you.
A good place to start is reading blogs that compare the performance and uses of the various databases, for example:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
I'm playing with the idea of creating a client that would use the torrent protocol found in today's torrent download clients, such as uTorrent or Vuze, to create:
Client software that would:
Select files you would like to back up
Create torrent-like descriptor files for each file
Offer optional encryption of your files based on a key phrase
Let you select the amount of redundancy you would like to trade with other clients
(Redundancy would be based on a give-and-take principle: if you want to back up 100 MB five times, you would have to offer an extra 500 MB of your own storage space. The backup would not be distributed among only 5 clients; it would utilize as many clients as possible that offer storage in exchange, chosen according to the physical distance specified in the settings.)
Optionally:
I'm thinking of including edge file sharing: you could have non-encrypted files shared from your backup storage and prefer clients that have port 80 open for public HTTP sharing. But this gets tricky, since I have a hard time coming up with a simple scheme by which a visitor would pick the closest backup client.
Include a file manager that would allow FTP-with-a-GUI-style file transfers between two systems using the torrent protocol.
I'm thinking about creating this as a service API project (sort of like http://www.elasticsearch.org ) that could be integrated with any container, such as Tomcat with Spring, or just plain Swing.
This would be a P2P open-source project. Since I'm not completely confident in my understanding of the torrent protocol, the question is:
Is the above feasible with the current state of torrent technology (and where should I look to recruit Java developers for this project)?
If this is the wrong spot to post this, please move it to a more appropriate site.
You are considering the wrong technology for the job. What you want is an erasure code using Vandermonde matrices. This allows you to get the same level of protection against lost data without needing to store nearly as many copies. There's an open-source implementation by Luigi Rizzo that works perfectly.
What this code allows you to do is take an 8 MB chunk of data and cut it into any number of 1 MB chunks such that any eight of them can reconstruct the original data. This gives you the same level of protection as tripling the size of the data stored, without even doubling it.
You can tune the parameters any way you want. With Luigi Rizzo's implementation, there's a limit of 256 chunks. But you can control the chunk size and the number of chunks required to reconstruct the data.
You do not need to generate or store all the possible chunks. If you cut an 80MB chunk of data into 8MB chunks such that any ten can recover the original data, you can construct up to 256 such chunks. You will likely only want 20 or so.
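To make the storage-overhead arithmetic concrete, here is a back-of-the-envelope helper (purely illustrative, not part of Rizzo's library) applied to the 80 MB example above:

def erasure_params(original_mb, k, n):
    """Data is cut into chunks such that any k of the n stored chunks
    can rebuild the original; report the resulting sizes and tolerance."""
    chunk_mb = original_mb / k
    return {
        "chunk_mb": chunk_mb,                  # size of each stored chunk
        "stored_mb": chunk_mb * n,             # total space used
        "overhead": n / k,                     # vs. 3.0 for keeping three full copies
        "chunk_losses_tolerated": n - k,
    }

# 10-of-20 on 80 MB: 8 MB chunks, 160 MB stored (2x overhead),
# and any 10 of the 20 chunks can be lost without losing the data.
print(erasure_params(80, 10, 20))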
You might have great difficulty enforcing the reciprocal storage feature, which I believe is critical to large-scale adoption (finally, a good use for those three terabyte drives that you get in cereal boxes!) You might wish to study the mechanisms of BitCoin to see if there are any tools you can steal or adopt for your own needs for distributed non-repudiable proof of storage.
Is anyone out there using ZooKeeper for their sites? If you do, what do you use it for? I just want to see real-world use cases.
I've just started doing the research on using ZooKeeper for a number of cases in my company's infrastructure.
The one that seems to fit ZK best is where we have an array of 30+ dynamic content servers that rely heavily on file-based caching (memcached is too slow). Each of these servers will have an agent watching a specific ZK path, and when a new node shows up, all servers join a barrier lock; once all of them are present, they all update their configuration at the exact same time. This way we can keep all 30 servers' configurations/run-states consistent.
For the second use case, we receive 45-70 million page views a day in a typical bell-curve-like pattern. The caching strategy falls from client, to CDN, to memcache, and then to file cache before determining when to make a DB call. Even with a series of locks in place, it's pretty typical to get race conditions (I've nicknamed them stampedes) that can strain our backend. The hope is that ZK can provide a tool for developing a consistent and unified locking service across multiple servers and maybe data centres.
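For a feel of what the watch-a-path part looks like, here is a minimal sketch using the kazoo Python client (the library, ensemble address and znode path are my choices for illustration; the barrier and rollout logic around it is up to you):

from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")   # hypothetical ensemble
zk.start()

@zk.DataWatch("/app/config")    # hypothetical znode that each server watches
def on_config_change(data, stat):
    # Called once immediately and again whenever the znode changes;
    # this is where each server would re-read and apply its configuration.
    if data is not None:
        print("config changed, now at version", stat.version)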
You may be interested in the recently published scientific paper on ZooKeeper:
http://research.yahoo.com/node/3280
The paper also describes three use cases and comparable projects.
We do use ZK as a dependency of HBase and have implemented a scheduled work queue for a feed reader (millions of feeds) with it.
The ZooKeeper "PoweredBy" page has some detail that you might find interesting:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy
HBase uses ZK and is open source (Apache) which would allow you to look at actual code.
http://hbase.apache.org/
I want to be able to give a best guess for what city and state a zip code is in via a web application. Is there some web service I can use to figure this out?
Hate to do this to you, but look here...
I believe the USPS has an API for some queries; you should look into that.
However, here are a few from the above resource:
http://www.cedar.buffalo.edu/AdServ/zip-search.html - Includes the database so you can host one yourself
http://www.zipinfo.com/search/zipcode.htm - commercial site, but they have a simple search
http://zip4.usps.com/zip4/citytown_zip.jsp - USPS interface, probably has an API
http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP - SOAP interface
-Adam
The USPS has an API for this, but you have to register:
USPS Web Tools
I found a couple of ways to do this with web-based APIs. I think the US Postal Service would be the most accurate, since ZIP codes are their thing, but Ziptastic looks much easier.
Using the US Postal Service HTTP/XML API
According to this page on the US Postal Service website, which documents their XML-based web API (specifically Section 4.0, page 22, of this PDF document), they have a URL where you can send an XML request containing a 5-digit ZIP code, and they will respond with an XML document containing the corresponding city and state.
According to their documentation, here's what you would send:
http://SERVERNAME/ShippingAPITest.dll?API=CityStateLookup&XML=<CityStateLookupRequest%20USERID="xxxxxxx"><ZipCode ID= "0"><Zip5>90210</Zip5></ZipCode></CityStateLookupRequest>
And here's what you would receive back:
<?xml version="1.0"?>
<CityStateLookupResponse>
<ZipCode ID="0">
<Zip5>90210</Zip5>
<City>BEVERLY HILLS</City>
<State>CA</State>
</ZipCode>
</CityStateLookupResponse>
USPS does require that you register with them before you can use the API, but, as far as I could tell, there is no charge for access. By the way, their API has some other features: you can do Address Standardization and Zip Code Lookup, as well as the whole suite of tracking, shipping, labels, etc.
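Putting that together in Python (a hypothetical helper, not code from the USPS docs; the server host is a placeholder, so substitute the one from your registration material):

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

USPS_SERVER = "SERVERNAME"   # placeholder: use the host USPS gives you

def usps_city_state(zip5, user_id):
    request_xml = ('<CityStateLookupRequest USERID="%s">'
                   '<ZipCode ID="0"><Zip5>%s</Zip5></ZipCode>'
                   '</CityStateLookupRequest>' % (user_id, zip5))
    url = ("http://%s/ShippingAPITest.dll?API=CityStateLookup&XML=%s"
           % (USPS_SERVER, urllib.parse.quote(request_xml)))
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    zip_node = root.find("ZipCode")
    return zip_node.findtext("City"), zip_node.findtext("State")

# print(usps_city_state("90210", "xxxxxxx"))   # -> ('BEVERLY HILLS', 'CA')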
Using the Ziptastic HTTP/JSON API
This is a pretty new service, but according to their documentation, it looks like all you need to do is send a GET request to http://ziptasticapi.com, like so:
GET http://ziptasticapi.com/48867
And they will return a JSON object along the lines of:
{"country": "US", "state": "MI", "city": "OWOSSO"}
Indeed, it works. You can test this from a command line by doing something like:
curl http://ziptasticapi.com/48867
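Or the same lookup from Python, as a tiny convenience wrapper around the endpoint shown above:

import json
import urllib.request

def ziptastic(zip_code):
    # Fetch the JSON response for a ZIP code from the Ziptastic endpoint.
    url = "http://ziptasticapi.com/%s" % zip_code
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

print(ziptastic("48867"))   # {'country': 'US', 'state': 'MI', 'city': 'OWOSSO'}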
These people seem to expose one...
http://www.webservicex.net/uszip.asmx
You can get a database that maps zip code to longitude/latitude, and another database that provides longitude/latitude for all US cities. Then you could just do that on your side, without having to send out to a web service.
I've seen both these databases, but I can't remember where to find them right now. I'll poke around and try to remember to add a comment.