Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I have a large number of English OCRed documents from the 19th century and want to clean up some of the OCR errors by using a contextual spell-checker such as the one proposed by Peter Norvig at http://norvig.com/spell-correct.html. My main goal is to use a probabilistic model (together with the OCRed text data and an appropriate, large dictionary) to correct misspelled words.
I am happy to use the code that Norvig gives on his website and improve it, but before I do so, I would like to ask if there is an open-source solution for this. Norvig himself suggests looking at aspell, but I don't think that aspell is a contextual spell-checker, and I'm worried it might not work so well on OCR error correction.
So, you're looking for a spell checker that will substitute the most probable choice whenever there is a phrase or word it doesn't understand? That seems like a bad idea on 19c texts unless you have a large corpus of such texts that have already been spell-checked by hand. Words that were commonplace then but are rare now will be replaced without your knowledge. I daresay you may find a contextual spell-checker trained on modern locution to be tetotaciously exflunctified by your 19c phraseology. ☺
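To make that risk concrete, here is a minimal sketch in the spirit of Norvig's corrector. It is not his code: it only looks one edit away, has no context model, and the tiny corpora are made up for illustration. It shows how a corrector built from modern text silently rewrites a period spelling such as "shew", while one built from period text leaves it alone.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def make_corrector(corpus_text):
    """Build a corrector whose word frequencies come from corpus_text."""
    counts = Counter(words(corpus_text))

    def correct(word):
        if word in counts:          # known words are left untouched
            return word
        candidates = [w for w in edits1(word) if w in counts]
        # Pick the most frequent known word one edit away, if there is one.
        return max(candidates, key=counts.get) if candidates else word

    return correct

# Toy corpora (assumptions for illustration only).
modern = make_corrector("show me the show and the shop")
period = make_corrector("i will shew you the shew bread")
```

With these toy corpora, `modern("tbe")` repairs a typical OCR confusion to "the", but `modern("shew")` also becomes "show". `period("shew")` is left as it is, which is why the training corpus matters so much here.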
If you have such a corpus, or you're up for creating one, there is a powerful Python-based tool for OCR and analysis called OCRopus. It uses natural language processing, neural networks and many other buzzwords; I think I saw "deep learning" on the to-do list. It does not appear easy to use, though I admit I've never tried it myself. It seems to require skill at the command line and programming in Python. If you're still not daunted, it may be exactly what you're looking for.
On the other hand, if you are looking for something simpler, consider using a program with a standard spell checker. For example, gImageReader can read in your PDF files, OCR them, and let you correct and add the words it doesn't know. I suggest at least trying a simple spell checker before searching for something more complicated.
Closed 2 years ago.
I'm about to enter the world of programming. I know practically nothing yet, but I'm usually a quick learner when it comes to tech-related stuff.
My brother came up with an idea that I thought I would like to help him with, but I'm not sure what the best way to go about it is.
In order for you to better understand the functionality of the programme/website, here is a little backstory:
My brother spends a lot of time making sourdough for his burger business and optimising his baking all the time.
He keeps track of everything in an Excel spreadsheet, where the amounts of flour, the kinds of flour, cost etc. go into the spreadsheet.
This is fine if it's only for one type of bread, but he bakes several types of bread.
So, what would the best way to go about building a website for this application be?
I'm thinking that this could be applicable to more than just sourdough, but for simplicity's sake, let's just start with that.
The visitor should be able to create a user and that user should be able to store their own recipes, log their changes for future reference and rate the different recipes.
So, off the top of my head, I'm thinking MySQL for the database, HTML/CSS for styling and Python for functionality?
Can Python and HTML be integrated?
Let me know what you guys think! All help is deeply appreciated!
If you are interested in websites, JavaScript with Node.js is a very popular choice for server development. Python is best known for machine learning, although it is also used for server-side development with frameworks such as Django and Flask. Do some more research on what is best to start with. My first language was Lua, which I used to make dedicated servers for Roblox, Rust, GTA FiveM, etc. Whatever entertains you the most in programming is where you should start.
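On the specific question of whether Python and HTML can be integrated: yes. As a rough sketch, the snippet below uses only Python's standard library to build an HTML page from recipe data and serve it. The recipe fields, the `render_recipes` and `serve` names, and the port are all assumptions made up for this example; a real site would more likely use a web framework and load the data from a database such as MySQL.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data; a real site would read this from a database.
RECIPES = [
    {"name": "Sourdough burger bun", "flour": "wheat", "flour_grams": 500},
    {"name": "Rye sourdough", "flour": "rye", "flour_grams": 400},
]

def render_recipes(recipes):
    """Return a complete HTML page listing the given recipes in a table."""
    rows = "".join(
        f"<tr><td>{escape(r['name'])}</td><td>{escape(r['flour'])}</td>"
        f"<td>{r['flour_grams']}</td></tr>"
        for r in recipes
    )
    return (
        "<!DOCTYPE html><html><head><title>Recipes</title></head><body>"
        "<table><tr><th>Recipe</th><th>Flour</th><th>Grams</th></tr>"
        f"{rows}</table></body></html>"
    )

class RecipeHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the rendered recipe page."""

    def do_GET(self):
        body = render_recipes(RECIPES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Start a local development server; blocks until interrupted."""
    HTTPServer(("localhost", port), RecipeHandler).serve_forever()
```

Calling `serve()` and opening http://localhost:8000 in a browser should show the table. The HTML is plain text produced by Python, so CSS can be added to it like on any other page.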
Closed 5 years ago.
I've been working on a framework in AS3 that I want to release, but first I obviously need to prepare some documentation for it.
I've noticed that quite a few sites have the exact same layout, functionality etc. as Adobe Livedocs, which has led me to believe that there's something open source out there for creating online documentation.
Here are some examples:
http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/
http://papervision3d.googlecode.com/svn/trunk/as3/trunk/docs/index.html
http://www.fisixengine.com/api/
Would anyone be able to point me in the right direction for tools that I can use to prepare online documentation?
Ideally the system would be specifically suited for documentation in ActionScript 3. I don't have a requirement in terms of the documentation being automatically generated either - if there's something out there that looks/works nice I'm happy to manually create the documentation (provided it comes with tools for easily adding classes, arguments, etc).
Adobe has a free tool called ASDoc. It generates documentation which follows the official Adobe pattern. Frankly, it isn't worth it though. The ASDoc tool is buggy and unreliable. If it has difficulty finding an import, if an import isn't used, if a comment is not correctly formatted, or if you have your source code spread out in any sort of unexpected way, it simply breaks.
My company lost over 50 developer hours trying to get around these limitations (a few people tried to get a couple of different projects to work and failed). Our solution? We used NaturalDocs (a JavaDoc-style documentation generator). Is it perfect? No. Is it comparable to ASDoc in output? Sort of: it isn't as neat, and it would be nice if it treated things a little differently, but it works to display the documentation.
Closed 7 years ago.
I'm looking for an open-source dictionary project for a language (you have probably never heard of it) that has not been "digitized". The dictionary will translate from this language to several others, and from those others back to it. Since the language has not been "digitized", I need the following features in addition to word search:
1 - Add your own translation to existing words/phrases
2 - Add a new word/phrase and add translation
3 - Request a word/phrase to be translated
4 - Rate the translation (like/dislike, or a rating within a range), with points awarded depending on the rated "correctness"
5 - Possibly relate words (especially nouns) with pictures
6 - A mobile version that is easy to implement
I guess it's more of a "collaboration site" than a dictionary, so the project I'm looking for may not be called a "dictionary".
I know it's possible to design and write this from scratch, but it would be good to begin with something in hand, especially when you are spending your time and effort on non-profit work.
I've been looking around for such a project but haven't found anything useful. In the meantime, I'm designing the architecture in my head.
If you could share some open source projects, it would be really great.
Thanks.
I am unsure what exactly you need, but would Wiktionary be of any help? There are a lot of localized variations to support different languages and there will probably be a way to ask them to support your language of interest, if it is not already there.
Closed 8 years ago.
I would like to download an English dictionary -- not just a word list -- in a structured format such as TXT, XML, or SQL.
Specifically, I need phonetic pronunciation and parts of speech (definition is not required).
Surprisingly, I can't find this online anywhere. Wiktionary is available for download, but it is only the MediaWiki articles themselves. Crawling all articles and extracting the phonetics and parts of speech would be a huge exercise.
Is this available anywhere? I don't mind paying.
Edit: a few people have asked what I would like to do. My immediate need is just curiosity, for example "what are the most common two-syllable verbs?". Eventually my hope would be a tool that helps you find available domain names, and does so by pairing the correct parts of speech, with bonus points for phonetic matches.
Note: cross-posted on English Language and Usage.
Go to http://www.speech.cs.cmu.edu/cgi-bin/cmudict and you will find the download page for the pronunciation dictionary at https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
The latest version is currently cmudict.0.7a.
This is what I am currently using to implement the syllable counter for http://www.haikuvillage.com. It's in Ruby and I'd be happy to open source it for you if that helps.
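For anyone who wants to try the same idea in Python, here is a minimal sketch of a syllable counter over CMUdict-style lines. It assumes the plain-text layout of the 0.7a file: `;;;` comment lines, one entry per line with the word followed by its phonemes, alternate pronunciations marked like `WORD(1)`, and a stress digit (0, 1 or 2) on every vowel phoneme. The sample lines below are illustrative, not copied from the file.

```python
import re

def parse_cmudict(lines):
    """Map each lowercase word to the phoneme list of its first pronunciation."""
    pronunciations = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):   # skip blanks and comments
            continue
        head, *phones = line.split()
        # Strip an alternate-pronunciation marker such as "(1)".
        word = re.sub(r"\(\d+\)$", "", head).lower()
        pronunciations.setdefault(word, phones)  # keep the first one listed
    return pronunciations

def syllable_count(phones):
    """Count vowel phonemes, which are the ones carrying a stress digit."""
    return sum(1 for p in phones if p[-1].isdigit())

# Illustrative entries in the assumed format.
sample = [
    ";;; comment line",
    "HAIKU  HH AY1 K UW0",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]
pron = parse_cmudict(sample)
```

With the sample above, `syllable_count(pron["haiku"])` gives 2 and `syllable_count(pron["read"])` gives 1. For the real file, pass an open file object to `parse_cmudict` instead of the sample list.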
Parts of Speech Dictionary in the public domain with highly structured format: http://icon.shef.ac.uk/Moby/mpos.html
Each line is an entry, separated by ×, with the word value on the left and the part-of-speech value (verb, etc.) on the right. Simple text file.
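A file in that layout takes only a few lines to load. The sketch below assumes the `×` separator described above and treats the right-hand side as a string of single-character part-of-speech codes. The specific verb codes used here (`V`, `t`, `i`) and the sample entries are assumptions; check them against the documentation that ships with the file before relying on them.

```python
def parse_mpos(lines, sep="×"):
    """Map each word to its string of part-of-speech codes."""
    entries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if sep not in line:          # skip blank or malformed lines
            continue
        word, _, codes = line.partition(sep)
        entries[word] = codes
    return entries

def words_with_any_code(entries, wanted_codes):
    """Return the sorted words that carry at least one of the wanted codes."""
    return sorted(
        word for word, codes in entries.items()
        if any(code in wanted_codes for code in codes)
    )

# Illustrative entries; the codes are assumed, not taken from the file.
sample = ["run×NVti", "table×N", "quickly×v"]
entries = parse_mpos(sample)
verbs = words_with_any_code(entries, "Vti")
```

The code comparison is case-sensitive, so under these assumed codes "quickly" (lowercase `v`) is not picked up as a verb. Joining the resulting word list against a pronunciation dictionary would be one way to approach the "two-syllable verbs" question.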
WordNet is one of the best dictionaries I know. Perhaps you will find something there:
http://wordnet.princeton.edu/wordnet/related-projects/
Portman, while using the SpellChecker tool from DevExpress I learned that the OpenOffice dictionaries exist, and I'm pretty sure they have a well-defined data structure. I recommend using them in combination with any free or paid text-to-speech tool.
Hope that helps.
This is not a direct answer to your question, but the Double Metaphone algorithm is very good at finding word or phrase matches for search engine application servers (such as Solr and others).
I cannot tell what your intended use of this is, so I can't tell if my suggestion is useful or not. If it is close to your intended use, the Wikipedia page about Double Metaphone has a listing of about a dozen implementations of it which may be worth exploring.
http://en.wikipedia.org/wiki/Double_Metaphone
Closed 8 years ago.
I'm looking for a Common Lisp implementation I ran across once, sometime in the past year or two. I only remember a few things, and I don't know how to search for it based on these facts, so maybe somebody here can help.
it was open-source, but wasn't one of the big ones (SBCL, CMUCL, MCL, etc.)
it was likely incomplete; it looked almost more like an exercise in writing the simplest possible self-hosted Common Lisp
the main webpage was plain black-on-white, and had 2 columns, where the left column was a link to the source file for a particular area of functionality (loop, format, clos, etc.), and the right column was a link to the tests for that functionality
the source files themselves were pretty-printed for the web, with syntax highlighting that looked kind of like an old Redhat Emacs default config: slate-gray background, etc.
Where can I find this Lisp implementation?
Thanks!
I don't know which one you are referring to, but you can find a list of Common Lisp Implementations here.
Is there any particular reason why this Lisp is grabbing your attention now?
It's hard to pin down, but open-source + minimalistic + incomplete sounds vaguely similar to Paul Graham's Arc programming language.