Test patterns for optical flow

We are researching several optical flow implementations. The primary goal is to find which one best fits our needs and which parameter values work better than others. At the moment we run our tests on some self-made photos, but we're not sure that's the best choice.
Are there any specific test images that can be used for this purpose, similar to test functions for optimization problems?

There are a lot of optical flow benchmarks where you can get samples. The most common are the Middlebury, KITTI 2012/2015, Sintel and HAMA datasets/benchmarks. Note that even most recent optical flow methods still struggle with large motions, varying illumination, and low contrast or weak texture.
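If you use those benchmarks you can also reuse their standard metric, the average endpoint error (AEE), to compare implementations and parameter sweeps objectively. Here is a minimal sketch, assuming OpenCV and NumPy, a Middlebury-style frame pair with placeholder file names, and ground truth in the standard .flo format:

```python
# Minimal sketch: score one flow method against Middlebury-style ground truth.
# File names are placeholders for one benchmark frame pair + its .flo file.
import numpy as np
import cv2

def read_flo(path):
    """Read a Middlebury .flo flow file into an (H, W, 2) float32 array."""
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        assert magic == 202021.25, "not a valid .flo file"
        w = int(np.fromfile(f, np.int32, count=1)[0])
        h = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * w * h)
    return data.reshape(h, w, 2)

frame1 = cv2.imread("frame10.png", cv2.IMREAD_GRAYSCALE)
frame2 = cv2.imread("frame11.png", cv2.IMREAD_GRAYSCALE)
gt = read_flo("flow10.flo")

# Farneback dense flow; these numeric parameters are the ones to sweep.
flow = cv2.calcOpticalFlowFarneback(frame1, frame2, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)

# Average endpoint error: mean Euclidean distance between estimated and
# ground-truth flow vectors, the metric the benchmarks themselves report.
epe = np.sqrt(((flow - gt) ** 2).sum(axis=2)).mean()
print(f"average endpoint error: {epe:.3f} px")
```

Note that real benchmark ground truth often marks occluded or unknown pixels with very large values, so in practice you would mask those out before averaging.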

Camera image recognition with small sample set

I need to visually recognise some flat pictures shown to a camera. There are not many of them (maybe 30), but discrimination may depend on details. The input may be partly obscured or shadowed and is subject to lighting changes.
The samples need to be updatable.
There are many existing frameworks for object detection, with the most reliable ones based on deep learning (mostly convolutional networks). However, the pretrained models are of course not optimised to discern flat imagery, and even if I started training from scratch, updating the system with new samples would require a cumbersome retraining process, if I'm right about how this works.
Is it possible to use deep learning while still keeping the sample pool flexible?
Is there any other well-known, reliable method to detect images from a small sample set?
One can use a well-trained network for visual classification, like Inception or SqueezeNet, slice off the last layer(s), and add a simple statistical algorithm (for example k-nearest neighbours) that can be taught directly from the samples in a non-iterative fashion.
Most classification-related work, such as achieving lighting and orientation insensitivity, is then already handled by the pre-trained network, while the network's output keeps enough information to let the statistical algorithm decide the image class.
An implementation using k-nearest neighbour is shown here: https://teachablemachine.withgoogle.com/ , the source is hosted here: https://github.com/googlecreativelab/teachable-machine .
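Here is a minimal sketch of that recipe, assuming torchvision and scikit-learn are available; the training file names and labels are placeholders. SqueezeNet's convolutional trunk acts as a fixed feature extractor, and the k-NN is simply re-fit whenever the sample pool changes, with no gradient training involved:

```python
# Sketch: pretrained conv features + k-NN, so the sample pool stays flexible.
import torch
from torchvision import models, transforms
from sklearn.neighbors import KNeighborsClassifier
from PIL import Image

net = models.squeezenet1_1(weights="DEFAULT").eval()
prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path):
    """Image file -> fixed-length feature vector from the conv trunk."""
    x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    feats = net.features(x)                    # skip the classifier head
    return feats.mean(dim=(2, 3)).squeeze(0).numpy()   # global average pool

# Placeholder sample set; with ~30 real pictures, a larger k is reasonable.
train_paths = ["picture_a_1.jpg", "picture_a_2.jpg", "picture_b_1.jpg"]
train_labels = ["picture_a", "picture_a", "picture_b"]

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit([embed(p) for p in train_paths], train_labels)
print(knn.predict([embed("query.jpg")]))
```

Updating the system for a new sample or class is then just appending to the lists and re-running knn.fit, which is instantaneous at this scale.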
Use transfer learning; you'll still need to build a training set, but you'll get better results than starting from random weights. Try to find a model trained on images similar to yours. You might also do some black-box testing of the selected model with your curated images to baseline its response curve on your images.

How to improve OCR results

I am trying to improve the results of open-source OCR software. I'm using Tesseract, because I find it still produces better results than gocr, but it has huge problems with bad-quality input. So I tried to preprocess the image with various tools I found on the internet:
unpaper
Fred's ImageMagick Scripts: TEXTCLEANER
manually using GIMP
But I was not able to get good results with this bad test document (really just a test, I don't need the content of this file):
http://9gag.com/gag/aBrG8w2/employee-handbook
This online service works surprisingly well with this test document:
http://www.onlineocr.net/
I'm wondering if it is possible to get similar results with Tesseract using smart preprocessing. Are the open-source OCR engines really that bad compared to the commercial ones? Even Google uses Tesseract to scan documents, so I was expecting more...
Tesseract's recognition precision is a little lower than that of the best commercial engine (ABBYY FineReader), but it's more flexible because of its open nature.
This flexibility sometimes entails preprocessing, because Tesseract cannot handle every situation on its own.
It is actually used by Google because Google is its main sponsor!
The first thing you could do is enlarge the text so that the characters are at least 20 pixels wide. Since Tesseract uses the main segments of the characters' borders as features, it needs a larger character size than other algorithms do.
Another thing you could try, again with reference to the test document you mentioned, is to binarize your image with an adaptive thresholding method (you can find some info about that here: https://dsp.stackexchange.com/a/2504), because some changes in illumination are present. Tesseract binarizes the image internally, but this could be a case where it fails to do so (it's similar to the example in Tesseract's "Improving the quality of the output" guide, where you can also find other useful information).
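As a concrete sketch of both suggestions (upscaling plus adaptive thresholding), assuming OpenCV and the pytesseract wrapper are installed and with "scan.jpg" standing in for your test document:

```python
# Sketch: preprocess a badly lit photo before handing it to Tesseract.
import cv2
import pytesseract

img = cv2.imread("scan.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Enlarge: Tesseract's border-segment features want reasonably big glyphs,
#    so upscale until characters are at least ~20 px wide.
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

# 2. Adaptive threshold: each pixel is binarized against its local
#    neighbourhood mean, so a lighting gradient across the page is tolerated.
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, blockSize=31, C=15)

print(pytesseract.image_to_string(binary))
```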

Interesting NLP/machine-learning style project -- analyzing privacy policies

I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify their core characteristics. For example: do they take the user's location? Do they share or sell data with third parties? Etc.
I've talked to a few people, read a lot about privacy policies, and thought about this myself. Here is my current plan of attack:
First, read a lot of privacy policies and find the major "cues" or indicators that a certain characteristic is met. For example, if hundreds of privacy policies contain the same line, "We will take your location.", that line could be a cue with 100% confidence that the policy involves taking the user's location. Other cues would give much smaller degrees of confidence about a characteristic. For example, the presence of the word "location" might increase the likelihood that the user's location is stored by 25%.
The idea would be to keep developing these cues, and their appropriate confidence intervals to the point where I could categorize all privacy policies with a high degree of confidence. An analogy here could be made to email-spam catching systems that use Bayesian filters to identify which mail is likely commercial and unsolicited.
I wanted to ask whether you think this is a good approach to the problem. How exactly would you approach a problem like this? Furthermore, are there any specific tools or frameworks you'd recommend using? Any input is welcome. This is my first time doing a project that touches on artificial intelligence, specifically machine learning and NLP.
This is text classification. Given that you have multiple output categories per document, it's actually multilabel classification. The standard approach is to manually label a set of documents with the classes/labels that you want to predict, then train a classifier on features of the documents; typically word or n-gram occurrences or counts, possibly weighted by tf-idf.
The popular learning algorithms for document classification include naive Bayes and linear SVMs, though other classifier learners may work too. Any classifier can be extended to a multilabel one by the one-vs.-rest (OvR) construction.
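A minimal sketch of that pipeline with scikit-learn (assumed available; the three toy policies and the label names are invented for illustration):

```python
# Sketch: tf-idf n-gram features + one linear SVM per label (one-vs-rest).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "We collect your location to provide local results.",
    "We never share personal data with third parties.",
    "Your location may be shared with our advertising partners.",
]
labels = [["location"], [], ["location", "third_party_sharing"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)        # documents x labels indicator matrix

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word + bigram features
    OneVsRestClassifier(LinearSVC()),      # one binary classifier per label
)
clf.fit(docs, Y)

pred = clf.predict(["We may sell your data to third parties."])
print(mlb.inverse_transform(pred))
```

With a real labeled corpus you would also hold out a test set and look at per-label precision and recall, since the labels will be imbalanced.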
A very interesting problem indeed!
On a higher level, what you want is summarization: a document has to be reduced to a few key phrases. This is far from being a solved problem. A simple approach would be to search for keywords as opposed to key phrases. You can try something like LDA for topic modelling to find what each document is about. You can then search for topics which are present in all documents; I suspect what will come up is material to do with licenses, location, copyright, etc. MALLET has an easy-to-use implementation of LDA.
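If you want to try the LDA idea quickly without MALLET, here is a small sketch using scikit-learn's implementation (assumed available; the four mini-documents are toy stand-ins for real policy texts):

```python
# Sketch: fit LDA on a toy corpus and print the top words per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

policies = [
    "we collect your location and device identifiers",
    "content is provided under a limited license to you",
    "we share aggregate data with third party partners",
    "your location history can be deleted at any time",
]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(policies)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Topics about location, licensing, sharing, etc. should surface on a
# real corpus of hundreds of policies.
words = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-5:][::-1]
    print(f"topic {k}:", ", ".join(words[i] for i in top))
```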
I would approach this as a machine learning problem where you are trying to classify things in multiple ways, i.e. wants location, wants SSN, etc.
You'll need to enumerate the characteristics you want to use (location, SSN), and then for each document say whether it uses that info or not. Choose your features, train on your data, and then classify and test.
I think simple features like words and n-grams would probably get you pretty far, and a dictionary of words related to things like SSNs or location would finish it nicely.
Use the machine learning algorithm of your choice; Naive Bayes is very easy to implement and use, and would work OK as a first stab at the problem.
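For instance, here's a sketch of that first stab: one binary Naive Bayes model per characteristic, with scikit-learn assumed and the tiny training set invented for illustration:

```python
# Sketch: a binary "does this policy use location?" Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "we collect precise location data from your device",
    "we do not collect any geolocation information",
    "we store your email address for account recovery",
]
uses_location = [1, 0, 0]    # manual labels for this one characteristic

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(docs, uses_location)

print(model.predict(["your location is shared with partners"]))
# Train one such model per characteristic (location, ssn, sharing, ...).
```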

Digit Recognition with Bayesian classes

I need to write an OCR program for digits only. I will use the MNIST dataset. The problem is that I don't know where to start. There are a lot of papers which don't really explain the algorithm, and I don't have much knowledge about pattern recognition. So I have a few questions.
Q1: Where can I find the algorithm (or a tutorial)?
Q2: How do I classify digits? I don't need very advanced things. The first thing that comes to my mind is finding the ratio of the upper half to the lower half and of the left side to the right side. Are there more useful and easy classification methods?
Q3: What are backpropagation and the layers shown in most of the papers? Do I need them for my simple OCR?
Note: I know my OCR program won't be accurate. It isn't very important for now.
If the closest engineering library to you has a section on image processing, computer vision, or machine vision, then with luck that library will have a copy of a book I recommend for OCR:
Character Recognition Systems by Cheriet, Kharma, Liu, and Suen
This book provides a fairly comprehensive overview of OCR techniques and recent research. It does not go into great depth on any particular subject, but it does provide references to academic papers.
Make sure you have access to a good introductory textbook on image processing. The book by Gonzalez and Woods is a standard in many universities:
Digital Image Processing by Gonzalez and Woods
Even "simple" OCR gets tricky very quickly. It could be overwhelming if you jump into a class about neural networks, Bayes theorem, etc., before you have a firm grasp of basic image processing principles.
If you can, try writing one or more OCR algorithms for machine-printed characters before you attempt to write an algorithm for handwritten characters.
Q1: Where can I find the algorithm (or a tutorial)?
There are numerous algorithms for OCR. The Cheriet book will give you a good start.
Q2: How do I classify digits? I don't need very advanced things. The first thing that comes to my mind is finding the ratio of the upper half to the lower half and of the left side to the right side. Are there more useful and easy classification methods?
Try implementing that technique and see how well it works. Even if the implementation doesn't work as well as you'd like, lessons learned while implementing it could help you later.
You can also subdivide a character into a 2 x 2 or 3 x 3 grid and check the relative densities of pixels. Unlike machine-printed characters, handwritten characters won't line up nicely in rectilinear grids.
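As a minimal sketch of that zoning idea on a 28x28 MNIST-style digit (NumPy assumed; the blank test image is a placeholder):

```python
# Sketch: split a digit image into a 4x4 grid; each cell's ink density is
# one feature, giving a 16-dimensional vector for any simple classifier.
import numpy as np

def zone_features(img, grid=4):
    """img: 2-D grayscale array (e.g. 28x28). Returns grid*grid densities."""
    h, w = img.shape
    gh, gw = h // grid, w // grid
    feats = []
    for r in range(grid):
        for c in range(grid):
            cell = img[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
            feats.append(cell.mean() / 255.0)   # fraction of "ink" per zone
    return np.array(feats)

digit = np.zeros((28, 28), dtype=np.uint8)   # substitute a real MNIST sample
print(zone_features(digit))                  # 16 numbers, one per zone
```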
Template matching using normalized correlation can work reasonably well for machine-printed characters in a single, known font. It's relatively simple to implement and worth learning:
http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation
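For example, a sketch of that approach with OpenCV (assumed available; the page image and per-digit template files are placeholder names):

```python
# Sketch: find which digit template best matches via normalized correlation.
import cv2

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

best_digit, best_score, best_loc = None, -1.0, None
for d in range(10):
    tmpl = cv2.imread(f"template_{d}.png", cv2.IMREAD_GRAYSCALE)
    # Normalized cross-correlation: near 1.0 means a close match. Robust to
    # uniform brightness changes, but not to a different font or size.
    scores = cv2.matchTemplate(page, tmpl, cv2.TM_CCORR_NORMED)
    _, score, _, loc = cv2.minMaxLoc(scores)
    if score > best_score:
        best_digit, best_score, best_loc = d, score, loc

print(f"best match: digit {best_digit} at {best_loc}, score {best_score:.2f}")
```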
For OCR it's common to thin the characters in your sample as an initial step. Thinning is a technique that reduces a character (or any other shape) to a representation that is one pixel wide. Once you have a thinned character, it can be easier to identify lines and intersections. If you can identify lines (or curves) and intersections, then one technique is to look at the relative position and angle of each line with respect to the others.
Common thinning algorithms include Stentiford and Zhang-Suen. There's a freeware version of WinTopo that demonstrates both of these algorithms:
http://wintopo.com/
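If you'd rather experiment in code than with WinTopo, scikit-image's skeletonize() implements this kind of thinning (its 2-D method follows Zhang-Suen); a toy sketch:

```python
# Sketch: thin a thick stroke down to a 1-pixel-wide skeleton.
import numpy as np
from skimage.morphology import skeletonize

char = np.zeros((20, 20), dtype=bool)   # toy binary image; use a real glyph
char[2:18, 8:12] = True                 # a 4-px-wide vertical stroke

thin = skeletonize(char)                # reduced to a 1-px-wide centerline
print(int(thin.sum()), "skeleton pixels out of", int(char.sum()))
```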
You can look into academic papers about "stroke extraction", but those techniques tend to be more difficult to implement.
Q3: What are backpropagation and the layers shown in most of the papers? Do I need them for my simple OCR?
These terms refer to artificial neural networks. For a simple OCR algorithm you'll either hard-code the recognition logic or use simple training methods. Artificial neural networks can be trained to recognize characters that aren't hard-coded in your software.
http://en.wikipedia.org/wiki/Neural_network
Although you don't need to learn about artificial neural networks to write a simple OCR algorithm, a simple algorithm will have only limited success with handwritten characters.
Above all, keep in mind that OCR for handwritten characters is an extremely difficult problem. If you could achieve a handwritten character read rate of 20% with a simple technique, then consider that a success.

So was that Data Structures & Algorithms course really useful after all?

I remember when I was taking Data Structures & Algorithms thinking "wtf, O(n)?" and wondering where I would use it other than in grad school or as a PhD like Bloch. Somehow uses for it do pop up in business analysis, so I was wondering: when have you had to call up your Big-O skills to work out how to write an algorithm, which data structure did you use to fit the problem, and did you ever have to actually create a new data structure (like your own implementation of a splay tree or trie)?
Understanding Data Structures has been fundamental to many of the projects I've worked on, and that goes beyond the ten minute song 'n dance one does when asked such a question in an interview situation.
Granted that modern environments with all sorts of collection classes can make light work of storing and accessing large amounts of data, but having an understanding that a particular problem is best solved with a particular data structure can be a great timesaver. And by "timesaver" I mean "the difference between something working and not working".
Honestly, being able to answer that stuff is my biggest criterion for taking interviewees seriously in an interview. Knowing how basic data structures work, basic O(n) analysis, and some light theory is really crucial to being able to write large applications successfully.
It's important in the interview because it's important in the job. I've worked with techs in the past who were self-taught, without taking a data structures course or reading a data structures book, and their code is occasionally bad in ways they should have seen coming.
If you don't know that n² is going to run slowly compared to n log n, you've got more to learn.
As for the latter half of the data structures course, it isn't generally applicable to most tech jobs, but if you ever do wind up needing it, you'll wish you had paid more attention.
Big-O notation is one of the basic notations used to describe the algorithms implemented by a particular library. For example, all documentation on the STL that I've seen describes various operations in terms of big-O, so naturally you have to understand, e.g., the difference between O(1), O(log n) and O(n) to understand the implications of your choice of STL containers and algorithms. MSDN does the same for .NET classes, and IIRC the Java documentation does it for the standard Java classes. So I'd say that knowing the notation is pretty much a requirement for understanding the documentation of most popular frameworks out there.
Sure (even though I'm a humble MS in EE; no PhD, no CS, unlike my colleague Joshua Bloch), I write a lot of stuff that needs to be highly scalable (or components that may need to be reused in highly scalable apps), so big-O considerations are almost always at work in my design (and it's not hard to take them into account). The data structures I use are almost always from Python's simple but rich supply (which I did lend a hand in developing ;-), and rarely is a totally custom one needed (rather than building on top of list, dict, etc.); but when it does happen (e.g. the bitvectors in my open-source project gmpy), no big deal.
I was able to use B-trees right when I learned about them in my algorithms class (that was about 15 years ago, when there were far fewer open-source implementations available). And even later, the knowledge about the differences between e.g. container classes came in handy...
Absolutely: even though stacks, queues, etc. are pretty straightforward, it helps to have been introduced to them in a disciplined fashion.
B-trees and more advanced sorting algorithms are a bit more difficult, so learning them early was a big benefit, and I have indeed had to implement each of them at various points.
Finally, I created an algorithm for single-connected components a few years back that was significantly better than the one our signal-processing team was using, but I couldn't convince them that it was better until I could show that it was O(n) rather than O(n log n).
...just to name a few examples.
Of course, if you are content to remain a CRUD-system hacker with no real desire to do more than collect a paycheck, then it may not be necessary...
I found my knowledge of data structures very useful when I needed to implement a customizable event-driven system about ten years ago. That's the biggie, but I use that sort of knowledge fairly frequently in lesser ways.
For me, knowing the exact algorithms has been... nice as background knowledge. However, the thing that's been the most useful is the more general habit of paying attention to how different pieces of an algorithm interact. For instance, there can be places in code where moving one piece of code (e.g., outside a loop) can make a huge difference in both time and space.
It's less the specific knowledge the course taught and more that it acted like several years of experience: the course took all the variations of something that might take years to encounter (and have drilled into you) through pure real-world experience, and condensed them.
The title of your question asks about data structures and algorithms, but the body of your question focuses on complexity analysis, so I'll focus on that too:
There are lots of programming jobs where being able to do complexity analysis is at least occasionally useful. See What career can I hope for if I like algorithms? for some examples of these.
I can think of several instances in my career where either I or a co-worker discovered a piece of code whose (usually time, sometimes space) complexity was higher than it should have been, e.g. something that was quadratic or cubic when it could have been linear or O(n log n). Such code would work fine on small inputs, but on larger inputs would quickly become really slow or consume all available memory. Knowing alternative algorithms and data structures, their complexities, and how to analyze the complexity of new algorithms is vital for correcting these problems (or avoiding them in the first place).
Networking is all I've used it for: in an implementation of the traveling salesman problem.
Unfortunately I do a lot of "line of business" and "forms over data" apps, so most problems I work on can be solved by hammering together arrays, linked lists, and hash tables. However, I've had the chance to work my data structures magic here and there:
Due to weird, complex business rules, I worked on an application which used a custom thread pool implemented as a leftist heap.
My dev team struggled to write a complex multithreaded app. It was plagued with race conditions, deadlocks, and lousy performance due to very fine-grained locking. We re-worked the code to stop sharing mutable state between threads, opting to write a very lightweight wrapper to facilitate message passing. After converting our linked lists and hash tables to immutable stacks and immutable red-black trees, we had no more problems with thread safety or performance. The resulting code was immaculate and surprisingly readable.
Frequently, a business rules engine requires you to roll your own state machine, which is very naturally modelled as a graph where vertices are states and edges are transitions between states.
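A tiny sketch of that modelling, with an invented order-workflow as the example; the whole machine is just a dict of dicts mapping state and event to the next state:

```python
# Sketch: a state machine as a graph; vertices are states, labelled edges
# are transitions. The workflow here is invented for illustration.
transitions = {
    "draft":  {"submit": "review"},
    "review": {"approve": "active", "reject": "draft"},
    "active": {"close": "closed"},
    "closed": {},
}

def step(state, event):
    """Follow the edge labelled `event` out of `state`, if one exists."""
    try:
        return transitions[state][event]
    except KeyError:
        raise ValueError(f"no transition '{event}' from state '{state}'")

state = "draft"
for event in ("submit", "approve", "close"):
    state = step(state, event)
print(state)   # -> closed
```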
If for no other reason, I'm glad I took the time to read about data structures and algorithms, simply to be able to picture novel problems a little differently, especially combinatorial problems and graph problems. Graph theory is no longer a synonym for "scary".