How much better would commercial OCR software be compared to the stuff that's available online for free?
More specifically: reading text in pictures (things like book covers, etc.).
I work with OCR quite a lot and can definitely vouch that the commercial offerings are much better than what you can find out there for free. Yes, you can make a free one 'work', but it will take a lot of effort for sub-optimal results.
I recommend finding a product that uses the ABBYY FineReader engine: it does a great job with little configuration.
You may want to consider whether you need an SDK provided by the OCR supplier or an end-user application. The SDK will provide position details, etc., of what it finds and offer a lot more in-depth control, but it will be more expensive. The end-user package will basically just read everything it finds; you may be able to set it to automatic or control it in a rudimentary way, and it might be good enough for what you're trying to do, and a lot cheaper.
Get a trial version and give it a go!
Google's OCRopus is free, open source, and one of the best.
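For a sense of how little code a free pipeline can involve, here is a minimal sketch using Tesseract through the pytesseract wrapper (neither tool is mentioned above, and the file name is hypothetical). It prints the recognized text, and also the word positions that the SDK-style interfaces discussed above would expose.

```python
# Minimal sketch of driving a free OCR engine from Python.
# Assumes the Tesseract binary plus the pytesseract and Pillow packages
# are installed; "book_cover.jpg" is a made-up example input.
from PIL import Image
import pytesseract

img = Image.open("book_cover.jpg")

# Plain text, roughly what an end-user package would give you
print(pytesseract.image_to_string(img))

# Word-level positions, closer to what an OCR SDK exposes
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
for word, left, top in zip(data["text"], data["left"], data["top"]):
    if word.strip():
        print(word, left, top)
```

Results on photos of book covers vary a lot with lighting and fonts, which is exactly where the commercial engines tend to pull ahead.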
I am new to the world of MS Access. I have been working with it for a few days now. What I have mostly done is look for solutions on the web, do some relevant research, and get going.
I was wondering if anyone could share:
What is the best way of mastering MS Access?
Any suggestions?
Although it's not the method du jour in a web-based world, my recommendation is NOT to look on the web to try to learn a piece of software, because you'll get a scattergun effect: a bit here, a bit there, some good techniques, some really poor techniques (and at your skill level, no way to tell the difference). More importantly, you won't get the required CONTEXT to help you understand the various parts of the product and how they connect into a whole.
I would therefore get a decent introductory-level textbook, which will normally lead you through the basic concepts, building on each one so that you grow into the product as you go. I don't do much with Access any more, so I can't recommend a specific book, but I have found that O'Reilly ( http://search.oreilly.com/?q=Access&x=0&y=0 ) has a range of good books in both paper and digital formats.
If you do want to learn from the web, I'd recommend a structured training program such as the ones offered by Lynda.com or Total Training. I've subscribed to both at various times and, although I'm currently a Lynda member, TT's courses are usually quite good as well. There are probably others, but I can't give any first-hand recommendations on those.
Good luck with your studies.
By asking for the 'relative popularity' of different languages, rather than asking 'what is the best language?' or 'what is your favorite language', I hope to make this somewhat objective.
I want a language for machine learning / matrices that:
is open-source-friendly (cf. MATLAB)
is fast for inner loops (cf. Python, MATLAB)
is fast for matrices (most languages are about the same, since they can usually use BLAS)
has terse, easy-to-read syntax (cf. Java)
I've currently settled on Java, since it's average at everything but really poor at nothing. Still, I can't help feeling that Java looks more and more dated (e.g. no operator overloading, and the borked generics), so I'm wondering: what is the feeling on the relative popularity of different languages for machine learning?
I think people mostly use C++, MATLAB, and Python, but I'm curious whether there's some language I've missed that everyone's busy using and I just haven't realized it yet.
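To make the "terse syntax for matrices" and BLAS points above concrete, here is a minimal sketch in Python with NumPy (NumPy isn't mentioned in the question; the data is random and purely illustrative). The matrix products are handed off to whatever BLAS library NumPy was built against.

```python
# Minimal sketch: closed-form least squares on random data with NumPy.
# The matrix multiplications are delegated to BLAS/LAPACK under the hood.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))          # 1000 samples, 10 features
true_w = rng.standard_normal(10)
y = X @ true_w + 0.1 * rng.standard_normal(1000)

# w = (X^T X)^{-1} X^T y, solved without forming an explicit inverse
w = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w, true_w, atol=0.05))     # True: the weights are recovered
```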
When I worked on a machine learning project with a friend, I picked up R, which is open source, designed for matrix math, and has extensive library support. It's certainly terser than Java, and I found the syntax pleasant, but that's a subjective judgement.
According to Rexer Analytics, R is the most popular data mining tool, being used by almost half of all of their survey respondents.
(Information on R is hard to search for, given the one-letter name, so there is a dedicated Google-based front end for searching for information about it.)
I am playing around with the idea of starting a specialized wiki. I think having a reputation system would greatly increase users' motivation on such a site. Wikipedia itself does indeed have a reputation system, but it is not comparable to the one used on the Stack Exchange network.
Hence my question:
Is there an open-source reputation system for wikis, like the one used at Stack Exchange?
IMHO it is very hard to calculate reputation for a real wiki in the SO way. If you let users vote on articles, how do you distribute that reputation amongst the editors? By the number of edits? By the number of bytes added? Both variants could be gamed to gain more reputation without actually improving the article. That's why I would be very careful about that.
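To see why both rules reward the wrong behaviour, here is a toy sketch (not taken from any real wiki engine; the edit history and vote count are invented) that splits an article's votes among its editors by edit count and by bytes added.

```python
# Toy illustration of the two naive splitting rules discussed above.
# The edit history and vote count are made up; no real wiki engine here.

# Hypothetical edit history for one article: (editor, bytes_added)
edits = [
    ("alice", 4000),                                      # one substantive edit
    ("bob", 10), ("bob", 10), ("bob", 10), ("bob", 10),   # many trivial edits
]
votes_for_article = 50

def split_by_edit_count(edits, votes):
    share = {}
    for editor, _ in edits:
        share[editor] = share.get(editor, 0) + votes / len(edits)
    return share

def split_by_bytes_added(edits, votes):
    total_bytes = sum(b for _, b in edits)
    share = {}
    for editor, b in edits:
        share[editor] = share.get(editor, 0) + votes * b / total_bytes
    return share

print(split_by_edit_count(edits, votes_for_article))   # bob gets 80% via tiny edits
print(split_by_bytes_added(edits, votes_for_article))  # padding text would game this one
```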
The authors of that paper contributed work to the WikiTrust project. While it is not SO-like in function, its "batch mode" seems like it might be helpful: it demonstrates extraction of a detailed author history for an article. Using WikiTrust in "online mode" would add reputation coding to all the pages.
In 5 days I'm going to an ETL interview. It's my first interview on this subject. What questions might I be asked? Most likely they will be about MS SQL Server Integration Services.
If possible, provide the answers. =)
Keep it high-level if you have to, but don't ask a question that you couldn't answer yourself.
I agree with Brad that syntax is not important; it's the thought process that matters.
Another idea is to ask them how they would pack up and move an office. It gives you insight into the same kinds of decisions needed in ETL (prep, actually moving the stuff, and validation), and they might be more comfortable talking about that than about the details of SSIS.
Think practically. Hand them a printout of a sample file that might need to be imported (possibly simplified to save time). Have them talk about database design, considerations, concerns, and possible ways to improve the data. Then bring out a second printout of somehow-related data and see if they can figure out how to validate the one against the other.
Make sure you talk about how much time is available to perform the ETL processes based on business rules and environment.
Require as much pseudo-code as you like, but I personally subscribe to the idea that syntax can be taught cheaply, while learning how to think is a very expensive thing to teach someone, and sometimes it's not even successful.
Also, ask them what standards they would implement if they were to design the optimum layout of the source data. Make sure you consider data distribution beyond your company (if applicable).
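As a rough idea of what a candidate might sketch for the "sample file" exercise above, here is a toy extract-transform-validate pass in Python (the file names, columns, and rules are all invented for illustration; the load step is only noted in a comment).

```python
# Toy ETL-style pass for the kind of "sample file" exercise described above.
# File names, columns, and validation rules are hypothetical.
import csv

def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": row["customer_id"].strip(),
            "amount": round(float(row["amount"]), 2),   # normalise the numeric field
            "country": row["country"].strip().upper(),  # normalise code casing
        })
    return cleaned

def validate(rows, reference):
    # Cross-check against the second, related file: every customer_id
    # being loaded must already exist in the reference extract.
    known = {r["customer_id"] for r in reference}
    return [r for r in rows if r["customer_id"] not in known]

if __name__ == "__main__":
    staged = transform(extract("orders_sample.csv"))
    reference = extract("customers_sample.csv")
    orphans = validate(staged, reference)
    print(f"{len(staged)} rows staged, {len(orphans)} failed validation")
    # The load step (bulk insert into the target table) would follow here.
```

Talking through where each rule comes from, and what to do with rows that fail, usually reveals more about the candidate than the code itself.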
I was wondering: what is the best open-source software that I can use for non-binary association rule generation? I need a non-binary implementation because converting my currently non-binary data to binary data would not give the desired results.
Thanks, and I can't wait to hear your comments!
Also take a look at Weka
Check out RapidMiner and R with Rattle.
Try the Orange data mining toolkit.
http://www.ailab.si/orange/
Try Data Mining SDK.
These days I like Knime. See http://knime.org.
You could even try another one called Tanagra: http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html
It's mainly for research purposes, but it works well and has good tutorials here:
http://data-mining-tutorials.blogspot.com
I have an open-source software package named SPMF with more than 130 algorithms related to association rule mining, frequent itemset mining, sequential rule mining, and sequential pattern mining. You can check my webpage for more details and to download it:
It is Java source code. It has a simple graphical user interface. It also has many specialized algorithms that you will not find in other data mining software.
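To make "non-binary association rule generation" concrete, here is a toy Python sketch (it does not use any of the packages above; the records and thresholds are invented) that mines single-antecedent rules directly from categorical attribute=value records, without converting them into 0/1 columns.

```python
# Toy sketch: simple association rules over categorical (non-binary) records.
# Data and thresholds are made up; the real tools above do this far more efficiently.
from itertools import permutations
from collections import Counter

records = [
    {"weather": "sunny", "day": "weekend", "activity": "hiking"},
    {"weather": "sunny", "day": "weekend", "activity": "hiking"},
    {"weather": "rainy", "day": "weekend", "activity": "reading"},
    {"weather": "sunny", "day": "weekday", "activity": "work"},
    {"weather": "rainy", "day": "weekday", "activity": "work"},
]
min_support, min_confidence = 0.3, 0.8

item_counts = Counter()   # single attribute=value items
pair_counts = Counter()   # ordered (antecedent, consequent) pairs
for r in records:
    items = list(r.items())
    item_counts.update(items)
    pair_counts.update(permutations(items, 2))

n = len(records)
for (lhs, rhs), count in pair_counts.items():
    support = count / n
    confidence = count / item_counts[lhs]
    if support >= min_support and confidence >= min_confidence:
        print(f"{lhs[0]}={lhs[1]} -> {rhs[0]}={rhs[1]} "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```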