Geographical ontologies ready to use? [closed] - open-source

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I'm looking for an ontology containing geographical knowledge.
In particular I'd like to have these types of information:
political states / regions / cities / city areas
geographical regions (e.g. continents, name of mountains, lakes, etc)
For example, starting from the node "New York" I'd like to be able to find parents like the New York state, the USA etc, and children like Manhattan, Bronx, etc.
I couldn't find anything open-source/free to use.
I know that a lot of researchers extract such information from Wikipedia, but I couldn't find any off-the-shelf packages to use.
I also checked OpenStreetMap, which is great for the amount of data but doesn't seem to contain a proper geographical ontology.
Even a web service would be good!
Any hints?
Mulone

GeoNames maintains a large hierarchical feature list with a corresponding ontology, available as RDF, through web services, and so on. It has everything you list and more.
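To make the parent/child lookups from the question concrete, here is a minimal in-memory sketch of such a hierarchy. It does not use GeoNames or any real dataset: the `Place` class and the handful of place names are illustrative assumptions, and a real ontology would carry identifiers, feature types and far more nodes.

```python
class Place:
    """One node in a hypothetical place hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        # Create a child node, link it both ways, and return it.
        child = Place(name, parent=self)
        self.children.append(child)
        return child

    def ancestors(self):
        # Walk up the parent links, nearest ancestor first.
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


# Illustrative data only, mirroring the example in the question.
usa = Place("USA")
state = usa.add_child("New York State")
city = state.add_child("New York")
city.add_child("Manhattan")
city.add_child("Bronx")

parents_of_city = city.ancestors()
children_of_city = [c.name for c in city.children]
```

Starting from the "New York" node, `parents_of_city` holds the state and the country, and `children_of_city` holds the boroughs that were added.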

I would suggest looking for GIS data. www.geodata.gov has many free datasets. Most States will have a GIS organization that probably has free data sets as well.
If the GIS data is stored in a shapefile (.shp) format, look for the corresponding database file (.dbf). You should be able to just open that up in Excel and extract the required data.
Good luck!
Edit
I forgot to add that, since this data is probably stored in a format suitable for a relational database, perhaps you could write a script that converts it into a suitable schema.
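If you would rather script it than open the .dbf in Excel, the attribute table can be read with the standard library alone. This is only a sketch under assumptions: it expects a plain dBASE III style file (32-byte header, 32-byte field descriptors, fixed-width records), returns every value as a stripped string, and ignores memo fields. The `read_dbf` name and the file name in the comment are made up for illustration.

```python
import struct


def read_dbf(data, encoding="latin-1"):
    """Parse the raw bytes of a dBASE III style .dbf into a list of dicts."""
    # Bytes 4-11 of the header: record count, header size, record size.
    num_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # Field descriptors are 32 bytes each, start at offset 32,
    # and the list ends with a 0x0D terminator byte.
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        length = data[pos + 16]
        fields.append((name, length))
        pos += 32

    rows = []
    for i in range(num_records):
        start = header_len + i * record_len
        record = data[start:start + record_len]
        if record[:1] == b"*":  # first byte flags a deleted record
            continue
        offset = 1
        row = {}
        for name, length in fields:
            raw = record[offset:offset + length]
            row[name] = raw.decode(encoding).strip()
            offset += length
        rows.append(row)
    return rows


# Hypothetical usage:
# with open("counties.dbf", "rb") as f:
#     rows = read_dbf(f.read())
```

Each returned dict maps a column name to its value, which makes it straightforward to insert the rows into whatever relational schema you settle on.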

Yes, there are two notable ones around. The first is the W3C's Geospatial Vocabulary, which is formalised in OWL. There is also the GeoConcepts ontology. Hope this helps point you in the right direction!

How about looking at the Getty institute tags - they maintain an ontology. I am not sure if they are open source and I am pretty positive they have no web service.
Another idea would be to look at Yahoo!'s WOEIDs - they are a web service and they are free to use for non-commercial purposes.
Census geography is an ontology for the US but won't get you the rest of the world. It is also not a web service.
There are some ideas for ya - hope it helps.

Related

Is there an API for receipt reading? [closed]

Closed 5 years ago.
I had an idea for my first mobile application and I was thinking of making it in HTML5 + jQuery Mobile. The core functionality is:
to be able to take a picture of a receipt
digitize all the information.
I've never made a mobile app before and I'm not sure if this is possible. If there is no API available, how would I go about rolling my own receipt reader? Thanks! Please let me know if I am being stupid.
Edit: I found a service that lets me use their application to take a picture of the receipt (or e-mail the picture) and have it extract the necessary information: http://www.proongo.com/b/receipt-reading.php. I'm not exactly sure how to use this service, but I will do more research tomorrow and share what I find.
I found an OCR API service with a number of different pay-per business models called OCRAPISERVICE. They have a number of examples hosted on GitHub using various mobile OSs through PhoneGap. They also have a free-trial model that lets you submit 100 requests.
I guess you need an OCR software solution that can recognize supermarket receipts. There are many open-source OCR solutions, such as Tesseract. However, they target general-purpose OCR, so you will need some additional tools to recognize receipts in a mobile app.
Recently we worked on a web-based app for receipt recognition. Here you may find some details of the research: http://rnd.azoft.com/applying-ocr-technology-receipt-recognition/
Besides Tesseract, all the big players (Google, Microsoft and IBM) now have their own OCR APIs. These APIs provide simple image-to-text OCR with varying degrees of accuracy. I find Google Vision to be the most accurate for pictures of receipts. You would still need to extract the data out of the half-garbage text, though.
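As a rough idea of what that last extraction step can look like, here is a small sketch that pulls a date and a total out of raw OCR text with regular expressions. It is not tied to any of the APIs mentioned: the sample text, the `extract_fields` name and the patterns are assumptions, and real receipts will need more tolerant rules (misread characters, other date formats, currency symbols).

```python
import re

# Amount like 4.63 or 4,63; date like 12/03/2019, 12-03-19 or 12.03.2019.
AMOUNT = re.compile(r"\d+[.,]\d{2}")
DATE = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b")
TOTAL_WORD = re.compile(r"\btotal\b", re.IGNORECASE)


def extract_fields(ocr_text):
    """Return the first date and the first whole-word 'total' amount found."""
    date = None
    total = None
    for line in ocr_text.splitlines():
        if date is None:
            match = DATE.search(line)
            if match:
                date = match.group(0)
        # \btotal\b skips SUBTOTAL, which has no word boundary before "total".
        if total is None and TOTAL_WORD.search(line):
            match = AMOUNT.search(line)
            if match:
                total = float(match.group(0).replace(",", "."))
    return {"date": date, "total": total}


# Made-up OCR output, including a typical misread ("C0RNER").
SAMPLE = """C0RNER MARKET
12/03/2019 14:22
MILK 1L      2.49
BREAD        1.80
SUBTOTAL     4.29
TAX          0.34
TOTAL        4.63
"""

fields = extract_fields(SAMPLE)
```

The date is kept as the matched string on purpose, since day/month order cannot be decided from the text alone.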
If you want an API that returns field metadata (total amount, tax amount, date and merchant information) that your apps can consume directly, check out https://www.taggun.io. I built the APIs specifically for this purpose.

What's the best way to open source data (rather than code)? [closed]

Closed 7 years ago.
As part of a recent programming project I compiled a database, the contents of which may conceivably be of use to someone else one day. I'm looking for the best way to 'open source' the data.
I could (and probably will) upload the SQL onto GitHub, but was wondering if anyone had found a more 'data-centric' way of sharing - maybe a website that makes it easy for users to browse/query/visualise/improve data sets, rather than just giving them a big lump of SQL.
To clarify, I'm looking for a place where I can share the data, rather than a format in which to share it - ideally a data-set equivalent of GitHub/Sourceforge.
The data is relatively small (a few thousand lines of SQL) so the volume should not be an obstacle.
I'm a big fan of Amazon's S3 for stuff like this. And if your data set is interesting enough, maybe you could publish it with InfoChimps.
I have worked with a lot of data from different companies. Most often this data has been in a delimited text format, the most popular being comma-separated or tab-separated. Using commas is often a good choice because MySQL can also export and import CSV. Here is an example:
id, first_name, last_name, address
1, John, Smith, 11222 Street Name
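For what it's worth, a file laid out like that can be read back with Python's standard `csv` module. This is just a sketch using the sample rows above; the only notable choice is `skipinitialspace=True`, which drops the space that follows each comma so the column names and values come out clean.

```python
import csv
import io

# The example above, held in a string instead of a file for illustration.
SAMPLE = (
    "id, first_name, last_name, address\n"
    "1, John, Smith, 11222 Street Name\n"
)

# DictReader takes the first row as the column names.
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
rows = [dict(row) for row in reader]
```

Every value comes back as a string, so numeric columns such as `id` still need converting by whoever consumes the data.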
Google Fusion Tables ticks some of these boxes, although the emphasis seems to be on visualisation (I haven't used it, so this may be unfair). I am also reluctant to commit too heavily to any second-tier Google products these days, since they have a habit of disappearing.
You could export it to XML, that being probably the most compatible data format, although it is rather verbose. Another solution is OData, but this implies hosting the data and the platform that serves the data which may not be desirable.
SparkFun is another possibility. It seems to be mainly targeted at real-time data sources, but they offer free storage and the platform is open-source, so you can host your own server.

Can anyone recommend OCR software to process invoices? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
guys. I need OCR software that can read a variety of types of invoices and extract data. The exported data should be presented in a tabular format, preferably with a link to the source document. It must be able to read the documents in a variety of formats (.pdf, .jpg, .gif, .tiff, etc.).
Wikipedia has a list of OCR software vendors. Professional OCR vendors such as ExperVision, ABBYY or Nuance will be a better choice. As far as I know, invoice formats are complex and variable, so a template has to be designed for each kind of invoice. Standard OCR is therefore not the best choice, and custom development is generally needed. You would be better off choosing an OCR vendor that provides customization services.
How many forms do you want to process per day?
How many different types of invoices and layouts do you want to process?
How many different paper sizes?
How accurate do you need it to be?
How many people will be using the system for data correction / validation?
What system are you exporting the data to?
How many fields do you want to extract automatically?
Expect to pay for such solutions if you want it to read any and every type of invoice. The solutions listed below are not cheap, as they are high-end solutions.
http://www.documation.co.uk/emc_captiva.html
http://www.captaris-dt.com/product/dokustar-capturesuite/en/
http://www.abbyy.com/data_capture_software/
http://www.kofax.com/forms-processing/
http://www.readsoft.com/
The cost will depend on how many invoices you want to process. My best guess is that the ABBYY product will probably be the cheapest option.
If you have a limited number of document types to read, you may get away with a simpler fixed-form OCR solution as opposed to the free-form solutions above.
Also, the scanning solution you need will depend very much on your volumes.
Knowledge Lake makes one that integrates with SharePoint. It is also made to integrate with other kinds of systems.
http://www.knowledgelake.com/

Openlayers commercial application: licensing issues? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
I am planning on creating a commercial fleet/asset tracking web app, but got discouraged when I found out about the high price Google and Microsoft charge to use their services in a commercial setting. I found OpenLayers, and it claims to be free, so I am wondering if anyone has had experience using it commercially?
It looks like the use of the API is free, but does that include the maps as well? OpenLayers also lets you use Google as the mapping provider, but if I do that, would I be breaking Google's TOS since it is commercial?
I apologize if this isn't the correct place to ask such a question as it isn't directly related to a programming problem, but I can't find a definitive answer anywhere else and I imagine someone on SO has had experience creating a commercial mapping application.
OpenLayers has no data. It is an open-source mapping API that can be used with many different data sources.
To be free of all data licensing concerns use OSM data rather than Google - http://www.openstreetmap.org/ with OpenLayers. See some examples at http://wiki.openstreetmap.org/wiki/OpenLayers#Examples
Depending on usage, you'll probably want to provide your own map server rather than rely on (for example) a free OSM one. These can provide the data (including map tiles) that OpenLayers uses to draw its maps.
UMN MapServer and GeoServer are popular. I've found MapServer combined with OpenLayers to be a powerful combination.
I've never used GeoServer, but I think it requires server-side Java. There are other options as well.

Where can I obtain an English dictionary with structured data? [closed]

Closed 8 years ago.
I would like to download an English dictionary -- not just a word list -- in a structured format such as TXT, XML, or SQL.
Specifically, I need phonetic pronunciation and parts of speech (definition is not required).
Surprisingly, I can't find this online anywhere. Wiktionary is available for download, but it is only the MediaWiki articles themselves. Crawling all articles and extracting the phonetics and parts of speech would be a huge exercise.
Is this available anywhere? I don't mind paying.
Edit: a few people have asked what I would like to do. My immediate need is just curiosity, for example "what are the most common two-syllable verbs?". Eventually my hope would be a tool that helps you find available domain names, and does so by pairing the correct parts of speech, with bonus points for phonetic matches.
Note: cross-posted on English Language and Usage.
Go to http://www.speech.cs.cmu.edu/cgi-bin/cmudict and you will find the download page for the pronunciation dictionary at https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
The latest version is currently cmudict.0.7a.
This is what I am currently using to implement the syllable counter for http://www.haikuvillage.com. It's in Ruby and I'd be happy to open source it for you if that helps.
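The Ruby code is not shown here, but the idea behind counting syllables with this dictionary is simple enough to sketch in Python. It assumes the plain-text cmudict layout: one entry per line, the word followed by its phonemes, vowel phonemes ending in a stress digit (0, 1 or 2), comment lines starting with `;;;`, and alternate pronunciations marked like `WORD(1)`. The function names and sample lines are illustrative.

```python
import re


def parse_cmudict(lines):
    """Map each word to the phoneme list of its first listed pronunciation."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blanks and comment lines
        word, *phones = line.split()
        # Fold alternates such as READ(1) into the base word.
        word = re.sub(r"\(\d+\)$", "", word)
        entries.setdefault(word, phones)
    return entries


def count_syllables(phones):
    # Only vowel phonemes carry a trailing stress digit: one per syllable.
    return sum(1 for p in phones if p[-1].isdigit())


# A few lines in the assumed file format, for illustration.
SAMPLE_LINES = [
    ";;; comment line",
    "HELLO  HH AH0 L OW1",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]

entries = parse_cmudict(SAMPLE_LINES)
hello_syllables = count_syllables(entries["HELLO"])
```

Keeping only the first pronunciation is a simplification. Words whose alternates differ in syllable count would need all variants kept.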
Parts of Speech Dictionary in the public domain with highly structured format: http://icon.shef.ac.uk/Moby/mpos.html
Each line is an entry, separated by ×, with the word value on the left and the part-of-speech value (verb, etc.) on the right. Simple text file.
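Given that layout, loading the file takes only a few lines of Python. A sketch under assumptions: the sample entries and their codes are placeholders rather than lines copied from the file, and the real file may need a specific encoding when opened so that the × delimiter decodes correctly.

```python
def parse_mpos(lines, delimiter="×"):
    """Map each word to the raw part-of-speech string on its line."""
    entries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if delimiter not in line:
            continue  # skip blank or malformed lines
        # Split on the last delimiter: word on the left, codes on the right.
        word, pos = line.rsplit(delimiter, 1)
        entries[word] = pos
    return entries


# Placeholder entries in the described format (not taken from the file).
SAMPLE_LINES = ["walk×V", "table×N", ""]

pos_by_word = parse_mpos(SAMPLE_LINES)
```

The part-of-speech value is left as the raw string. Decoding it into names like "verb" depends on the code table documented with the dataset.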
WordNet is one of the best dictionaries I know. Perhaps you will find something there:
http://wordnet.princeton.edu/wordnet/related-projects/
Portman, when I used the SpellChecker tool from DevExpress, I learned that the OpenOffice dictionaries exist, and I'm pretty sure they have a well-defined data structure. I recommend using them in combination with any free or paid text-to-speech tool.
Hope that helps,
This is not a direct answer to your question, but the Double Metaphone algorithm is very good at finding word or phrase matches for search engine application servers (such as Solr and others).
I cannot tell what your intended use of this is, so I can't tell if my suggestion is useful or not. If it is close to your intended use, the Wikipedia page about Double Metaphone has a listing of about a dozen implementations of it which may be worth exploring.
http://en.wikipedia.org/wiki/Double_Metaphone