How to implement text selection? - language-agnostic

My question is not language based or OS based. I guess every system offers some sort of TextOut(text, x, y) method. I am looking for some guidelines or articles on how I should implement selection of the rendered text. I could not find any info about this.
The only approach that comes to my mind is this:
When the user clicks some point on the text canvas, I know the coordinates of that point. I need to calculate where exactly it falls in my text buffer. So I traverse from the beginning of the buffer and apply to each character (or block of text) its style (if it has any). After that, I know what size each letter has under the given style. I add its width and height to the previously calculated X,Y coordinates. In this way I traverse the buffer until the calculated position reaches the point the user clicked. Once I get within some offset of that point, I have the starting point for the selection.
This is the basic idea. I don't know if it is a good one; I would like to know how this is done for real, for example in Firefox. I know I could browse the sources, and if I have no other choice I will, but first I am trying to find some article about it...

Selecting text is inherently specific to the control that contains it and the way it stores that text.
A very simple (though arguably inefficient) approach is to run the text flow algorithm you are using when the user clicks on a point and stop the algorithm when you have reached whatever is closest to that point. More advanced controls might cache the text layout to make selection or drawing their content more efficient. Depending on how much you value CPU time or memory, there are ways to use caches and special cases to make this "hit test" cheaper.
If you can make certain assumptions (only one font in the control, so every line has the same height), then you can make these tests cheaper by indexing the layout by line and using simple arithmetic to find out which line was clicked. If your text control also uses a monospace font (every character occupies the same width as well as height), then you are in even more luck, as you can jump straight to the character information via a lookup table and two simple divisions, as in the sketch below.
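A minimal sketch of that monospace shortcut, assuming a fixed character width and line height in pixels and a hypothetical list of already-laid-out lines (all names here are illustrative):

```java
import java.util.List;

final class MonospaceHitTester {
    private final int charWidth;   // width of every glyph, in pixels
    private final int lineHeight;  // height of every line, in pixels
    private final List<String> lines;  // assumed non-empty, already laid out

    MonospaceHitTester(int charWidth, int lineHeight, List<String> lines) {
        this.charWidth = charWidth;
        this.lineHeight = lineHeight;
        this.lines = lines;
    }

    /** Maps a click at pixel (x, y) to {lineIndex, columnIndex}, clamped to the text. */
    int[] hitTest(int x, int y) {
        int line = Math.min(Math.max(y / lineHeight, 0), lines.size() - 1);
        int col  = Math.min(Math.max(x / charWidth, 0), lines.get(line).length());
        return new int[] { line, col };
    }
}
```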
Keep in mind that writing a text control from scratch is obscenely difficult. For best practice, you should keep the content of the document separate from the display information. The reason for this is because the text itself will need to be edited quite often, so algorithms such as Ropes or Gap Buffers may be employed on the data side to provide faster insertion around the caret. Every time text is edited it must also be rendered, which involves taking that data and running it through some kind of formatting / flow algorithms to determine how it needs to be displayed to the user. Both of these sides require a lot of algorithms that may be annoying to get right.
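To make the data-side point concrete, here is a toy gap-buffer sketch (purely illustrative, not a production editor buffer): insertions at the caret stay cheap because the unused "gap" in the backing array follows the caret around.

```java
final class GapBuffer {
    private char[] buf = new char[16];
    private int gapStart = 0;           // caret position
    private int gapEnd = buf.length;    // end of the gap (exclusive)

    /** Inserts a character at the caret in amortized O(1). */
    void insert(char c) {
        if (gapStart == gapEnd) grow();
        buf[gapStart++] = c;
    }

    /** Moves the caret (and the gap with it) to an absolute text position. */
    void moveCaret(int pos) {
        while (gapStart > pos) buf[--gapEnd] = buf[--gapStart];  // shift gap left
        while (gapStart < pos) buf[gapStart++] = buf[gapEnd++];  // shift gap right
    }

    private void grow() {
        char[] bigger = new char[buf.length * 2];
        System.arraycopy(buf, 0, bigger, 0, gapStart);
        int tail = buf.length - gapEnd;
        System.arraycopy(buf, gapEnd, bigger, bigger.length - tail, tail);
        gapEnd = bigger.length - tail;
        buf = bigger;
    }

    @Override public String toString() {
        return new String(buf, 0, gapStart) + new String(buf, gapEnd, buf.length - gapEnd);
    }
}
```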
Unfortunately, the native TextOut functions will not help you. You will need methods that give you the text extents of individual characters, and more advanced (multiline, for example) controls often must do their own rendering of characters using this information. Functions like TextOut are not built to deal with blinking insertion carets, for example, or to perform incremental updates of a text layout. While some TextOut-style functions may support word wrap and alignment for you, they also require re-rendering the entire string, which becomes more undesirable in proportion to the amount of text your control has to handle.

You are thinking at a much lower level than necessary (not an insult; you are assuming you need to do much more work than you actually do). Most (if not all) languages with GUI support will also have some form of selectionRange that gives you either the string that was selected or the start and stop indices within the string.
With a modern language, you should never have to calculate pixels and character widths.
For text selection in JavaScript, see this question: Understanding what goes on with textarea selection with JavaScript
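The same idea outside the browser: in Java Swing, for example, the text component already reports the selection as character indices, so no pixel math is needed. A small sketch (the listener wiring is just one way to observe the selection):

```java
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class SelectionDemo {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JTextArea area = new JTextArea("Select some of this text with the mouse.");
            // The component reports the selection as buffer indices, not pixels.
            area.addCaretListener(e -> {
                int start = area.getSelectionStart();
                int end = area.getSelectionEnd();
                if (start != end) {
                    System.out.println("Selected [" + start + ", " + end + "): "
                            + area.getSelectedText());
                }
            });
            JFrame frame = new JFrame("Selection demo");
            frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
            frame.add(new JScrollPane(area));
            frame.setSize(400, 200);
            frame.setVisible(true);
        });
    }
}
```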

Related

FirebaseVisionImage / ML Toolkit cropRect() support

I am posting this question by request of a Firebase engineer.
I am using the Camera2 API in conjunction with Firebase-mlkit vision. I am using both barcode and on-platform OCR. The things I am trying to decode are mostly labels on equipment. In testing the application I have found that trying to scan the entire camera image produces mixed results. The main problem is that the field of view is too wide.
If there are multiple bar codes in view, firebase returns multiple results. You can sort of work around this by looking at the coordinates and picking the one closest to the center.
When scanning text, it's more or less the same, except that you get multiple Blocks, many times incomplete (you'll get a couple of letters here and there).
You can't just narrow the camera mode, though - for this type of scanning, the user benefits from the "wide" camera view for alignment. The ideal situation would be if you have a camera image (let's say for the sake of argument it's 1920x1080) but only a subset of the image is given to firebase-ml. You can imagine a camera view that has a guide box on the screen, and you orient and zoom the item you want to scan within that box.
You can select what kind of image comes from the Camera2 API, but firebase-ml spits out warnings if you choose anything other than YUV_420_888. The problem is that there's not a great way in the Android API to deal with YUV images unless you do it yourself. That's what I ultimately ended up doing - I solved my problem by writing a RenderScript that takes an input YUV, converts it to RGBA, crops it, then applies any rotation if necessary. The result of this is a Bitmap, which I then feed into either the FirebaseVisionBarcodeDetectorOptions or FirebaseVisionTextRecognizer.
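For concreteness, the crop-then-detect step might look roughly like this with the (now legacy) firebase-ml-vision API, assuming the YUV-to-RGBA conversion has already produced a Bitmap of the frame; the class name and guideBox rectangle are illustrative:

```java
import android.graphics.Bitmap;
import android.graphics.Rect;
import com.google.firebase.ml.vision.FirebaseVision;
import com.google.firebase.ml.vision.barcode.FirebaseVisionBarcodeDetector;
import com.google.firebase.ml.vision.common.FirebaseVisionImage;

final class GuideBoxScanner {
    /**
     * Crops the frame to the on-screen guide box and hands only that region to ML Kit.
     * Assumes 'frame' is already an RGBA Bitmap produced by the YUV-to-RGBA step.
     */
    void scanGuideBox(Bitmap frame, Rect guideBox) {
        Bitmap cropped = Bitmap.createBitmap(
                frame, guideBox.left, guideBox.top, guideBox.width(), guideBox.height());

        FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(cropped);
        FirebaseVisionBarcodeDetector detector =
                FirebaseVision.getInstance().getVisionBarcodeDetector();

        detector.detectInImage(image)
                .addOnSuccessListener(barcodes -> {
                    // Only codes inside the guide box can show up here.
                })
                .addOnFailureListener(e -> {
                    // Handle detection failure.
                });
    }
}
```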
Note that the bitmap itself causes mlkit runtime warnings, urging me to use the YUV format instead. This is possible, but difficult. You would have to read the byte arrays and stride information from the original camera2 YUV image and create your own. The object that comes from camera2 is unfortunately a package-protected class, so you can't subclass it or create your own instance - you'd essentially have to start from scratch. (I'm sure there's a reason Google made this class package-protected, but it's extremely annoying that they did.)
The steps I outlined above all work, but with format warnings from mlkit. What makes it even better is the performance gain - the barcode scanner operating on an 800x300 image takes a tiny fraction as long as it does on the full size image!
It occurs to me that none of this would be necessary if firebase paid attention to cropRect. According to the Image API, cropRect defines what portion of the image is valid. That property seems to be mutable, meaning you can get an Image and change its cropRect after the fact. That sounds perfect. I thought that I could get an Image off of the ImageReader, set cropRect to a subset of that image, and pass it to Firebase and that Firebase would ignore anything outside of cropRect.
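The attempt described here would look roughly like the sketch below; the class and method names are illustrative, and as noted next, the detector still appeared to analyze the full frame:

```java
import android.graphics.Rect;
import android.media.Image;
import com.google.firebase.ml.vision.common.FirebaseVisionImage;

final class CropRectAttempt {
    /** What I expected to work: narrow the valid region, then hand the Image to ML Kit. */
    static FirebaseVisionImage fromCroppedImage(Image image, Rect guideBox, int rotation) {
        image.setCropRect(guideBox);  // android.media.Image lets you change cropRect after the fact
        // 'rotation' is one of the FirebaseVisionImageMetadata.ROTATION_* constants
        return FirebaseVisionImage.fromMediaImage(image, rotation);
    }
}
```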
That does not seem to be the case: Firebase seems to ignore cropRect. In my opinion, Firebase should either support cropRect, or the documentation should explicitly state that it ignores it.
My request to the firebase-mlkit team is:
Define the behavior I should expect with regard to cropRect, and document it more explicitly
Explain at least a little about how images are processed by these recognizers. Why is it so insistent that YUV_420_888 be used? Maybe only the Y channel is used in decoding? Doesn't the recognizer have to convert to RGBA internally? If so, why does it get angry at me when I feed in Bitmaps?
Make these recognizers either pay attention to cropRect, or state that they don't and provide another way to tell these recognizers to work on a subset of the image, so that I can get the performance (reliability and speed) that one would expect out of having to ML correlate/transform/whatever a smaller image.
--Chris

Item matching with domain knowledge

I have various product items that I need to decide if they are the same. A quick example:
Microsoft RS400 mouse with middle button should match Microsoft Red Style 400 three buttoned mouse but not Microsoft Red Style 500 mouse
There isn't anything else nice that I can match on apart from the name, and just doing it on the ratio of matching words isn't good enough (the error rate is far too high).
I do know about the domain and so I can (for example) hand write the fact that a three buttoned mouse is probably the same as a mouse with a middle button. I also know the manufacturers (or can take a very good guess at them).
The only thought I have had so far is to use hand-written rules to reduce the size of the string and then check the matching words, but I wondered if anyone had ideas on a better way of doing this matching with higher accuracy and precision (or where to start looking), and if anyone knew of any work that had been done in this area (papers, examples, etc.).
"I do know about the domain..."
How much exactly do you know about the domain? If you know everything about it, then you might be better off building an index of all your manufacturers' products (basically the description of each product from the manufacturer's web page). Then, instead of trying to match your descriptions to each other, match them against your index of products.
Advantages to this approach:
presumably all words used in the description of the product have been used somewhere in the promotional literature
if, when building the index, you were able to weight some of the information (such as product codes), then you may have more success
Disadvantages:
may take a long time to create the index (especially if done by hand)
If you don't know everything about your domain, then you might consider down-ranking words that are very common (you can get lists of common words off the internet) and up-ranking numbers and words that aren't in a dictionary (you can get word lists off the internet; most Linux/Unix distributions ship them for spell-checking purposes).
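A rough sketch of that kind of heuristic weighting; the word lists and the weight values are hypothetical and only for illustration:

```java
import java.util.Set;

final class TokenWeighter {
    private final Set<String> commonWords;  // e.g. loaded from a stop-word list
    private final Set<String> dictionary;   // e.g. /usr/share/dict/words

    TokenWeighter(Set<String> commonWords, Set<String> dictionary) {
        this.commonWords = commonWords;
        this.dictionary = dictionary;
    }

    /** Heuristic weight: numbers and out-of-dictionary tokens count more, common words less. */
    double weight(String token) {
        String t = token.toLowerCase();
        if (t.matches("\\d+")) return 3.0;        // product codes, model numbers
        if (!dictionary.contains(t)) return 2.0;  // likely brand or model names
        if (commonWords.contains(t)) return 0.2;  // "with", "and", ...
        return 1.0;
    }
}
```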
I don't know how much you know about search, but in the past I've found the book "Search Engines: Information Retrieval in Practice" by W. Bruce Croft, Donald Metzler, and Trevor Strohman to be useful. There are some sample chapters on the publisher's website which will tell you whether the book is for you or not: pearsonhighered.com
Hope that helps.
In addition to hand-written rules, you may try to use supervised learning with feature extraction.
Let the features be the words in the description, then look at each description as a feature vector.
When training the algorithm, have it show you two vectors that look similar by the word-overlap ratio, and if they are the same item, let the algorithm increase the weights for those words.
For example, each pair of words may then carry a larger weight than the simple ratio you have used so far.
[3-button] [middle]
[wheel] [button]
[mouse] [mouse]
By your algorithm, this gives a similarity ratio of 1/3. When you mark the pair as "same item", the algorithm should add more weight to those word pairs the next time it reaches them.
Just tokenize (you should separate numbers from letters in that step as well, so not just a whitespace tokenizer), stem, and filter stopwords and uninteresting words like "mouse". Perhaps you should also have a list of manufacturer names and shorten everything that is not a manufacturer or a number to its first letter (if you do that, the tokenizer also has to split on capital letters); see the sketch after the examples below.
Microsoft RS400 mouse with middle button -> Microsoft R S 400
Microsoft Red Style 400 three buttoned mouse -> Microsoft R S 400
Microsoft Red Style 500 mouse -> Microsoft R S 500
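A sketch of that tokenize-and-shorten step; the manufacturer and stop-word lists are hypothetical and domain-specific, and stemming is omitted for brevity:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ProductNormalizer {
    // Hypothetical, domain-specific lists; extend as needed.
    private static final Set<String> MANUFACTURERS = Set.of("microsoft", "logitech");
    private static final Set<String> STOPWORDS =
            Set.of("with", "mouse", "three", "buttoned", "button", "middle");

    /** "Microsoft RS400 mouse with middle button" -> "microsoft r s 400" */
    static String normalize(String name) {
        // Split on whitespace, letter/digit boundaries, and capital-letter boundaries.
        String[] tokens = name.split(
                "(?<=[a-zA-Z])(?=\\d)|(?<=\\d)(?=[a-zA-Z])|(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z])|\\s+");
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String token = t.toLowerCase();
            if (token.isEmpty() || STOPWORDS.contains(token)) continue;
            if (MANUFACTURERS.contains(token) || token.matches("\\d+")) {
                out.add(token);                  // keep manufacturers and numbers whole
            } else {
                out.add(token.substring(0, 1));  // shorten everything else to its first letter
            }
        }
        return String.join(" ", out);
    }
}
```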
If you want a better solution:
A VSM (vector space model), as used in plagiarism detection, would be nice. (Every word gets a weight according to its discriminative value, those weights are projected into a multidimensional space, and after that you just measure the angle between the two texts.)
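The angle measurement at the end of a VSM approach is just cosine similarity over the weight vectors; a minimal sketch:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CosineSimilarity {
    /** Cosine similarity between two bag-of-words weight vectors (term -> weight). */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        Set<String> terms = new HashSet<>(a.keySet());
        terms.addAll(b.keySet());
        double dot = 0, normA = 0, normB = 0;
        for (String term : terms) {
            double wa = a.getOrDefault(term, 0.0);
            double wb = b.getOrDefault(term, 0.0);
            dot += wa * wb;
            normA += wa * wa;
            normB += wb * wb;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```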
I would suggest something a lot more generally applicable. As I understand it, you want some NLP processing that will deal with things you recognize as synonyms. I think that's a pretty simple implementation right there.
If I were you, I would make a keyword object that has a list of synonyms as a parameter, then write a script that scrapes whatever text you have for words that appear only occasionally (with some capped frequency at which the keyword is actually considered applicable), and add, as a parameter of each keyword, the list of keywords that contain its synonyms. If you were willing to go a step further, I would set weights on the synonym list showing how similar the synonyms are.
With this kind of NLP problem, the chance that you will get to 100% accuracy is zero, but you could well get above 90%. I would suggest adding an element by which you can adjust the weights in an automated way. I have to be fairly vague here, but in my last job I was tasked with a similar problem and was able to get accuracy in the high 90s. My implementation was also probably more complicated than what you need, but even a simple implementation should give you a pretty good return; if you aren't dealing with a fairly large data set (~hundreds+), though, it's probably not worth scripting.
Quick example: in your case the difference can be distilled pretty accurately to just saying that "middle" and "three" are synonyms. You can get more complex if you need to, but that would match a lot.

Extract or crop image from within TIFF

I need to extract/crop the logotype (BEAVER) in the middle from a TIFF file that looks like this: http://i41.tinypic.com/2i7rbie.jpg
And then I need to automate the process so it can be repeated about 9 million times...
My guess is that I would have to use some OCR software. But is it possible for such a software to "crop anything that starts below this point and ends above this point"?
Thoughts?
Typically, OCR software only extracts text from images and converts it into some text-specific format. It does not do cropping. However, you can use OCR technologies to achieve your task. I would recommend the following:
OCR whole page
Get coordinates of recognized text
Apply your magic rules to the recognized text to locate the area to crop: for example, everything in between the "application filled" and "STATEMENT" sentences
Cut that area from the image and export it wherever you want it (a small sketch follows below)
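A minimal sketch of the "locate and cut" step, assuming your OCR engine has already given you the y-coordinates of the text above and below the logo (note that plain ImageIO reads TIFF out of the box only on Java 9+; older JVMs need the JAI ImageIO plugin):

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

final class LogoCropper {
    /**
     * Cuts the horizontal band between two y-coordinates (e.g. the bottom of the text
     * found above the logo and the top of the text found below it) and saves it.
     * The y-coordinates are assumed to come from the OCR engine's word geometry.
     */
    static void cropBand(File inputTiff, int yAbove, int yBelow, File outputPng) throws Exception {
        BufferedImage page = ImageIO.read(inputTiff);
        BufferedImage band = page.getSubimage(0, yAbove, page.getWidth(), yBelow - yAbove);
        ImageIO.write(band, "png", outputPng);
    }
}
```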
The real challenge is the amount of text you would like to process. You have to be very careful when defining your "smart rules" to make sure they don't produce false positives, and always send suspicious images to a separate queue that you will later review manually, updating your rules as needed.
In general it may look like this:
Take the first 10 images, define logo detection rules, test and see if everything works well
Then run on the next 10, see what was processed wrong and what was not processed at all, update the rules, and re-process those 10 to make sure everything works well now
Re-run it on new batches of the same size until it starts working well
Then increase the batch size from 10 to 100, and go through those batches until again everything starts working smoothly
Then continue this way, perfecting your rules and increasing the batch size. At some point you will reach production speed.
Most likely you will encounter some strange images that either contradict existing rules or are just wrong. You don't always have to update your rules to accommodate them. It may happen that there are only a dozen images like that in your whole 9-million collection. It might be better to leave them in an exceptions queue for manual processing and not risk the stability of your magic rules.

Optical character recognition

Hey everyone,
I'm trying to create a program in Java that can read numbers off the screen and also recognise images on the screen. I was wondering how I can achieve this?
The font of the numbers will always be the same. I have never programmed anything like this before, but my idea of how it works is to have the program take a screenshot, then overlay the image of the numbers onto a section of the screenshot and check if they match, repeating this for each number. If this is the correct way to do it, how would I put that into code?
Thanks in advance for any help.
You could always train a neural net to do it for you. They can get pretty accurate sometimes. If you use something like Matlab it actually has capabilities for that already. Apparently there's a neural network library for java (http://neuroph.sourceforge.net/) although I've never used it personally.
Here's a tutorial about using neuroph: http://www.certpal.com/blogs/2010/04/java-neural-networks-and-neuroph-a-tutorial/
You can use a neural network, support vector machine, or other machine learning construct for this. But it will not do the entire job. If you do a screen shot, you are going to be left with a very large image that you will need to find the individual characters on. You also need to deal with the fact that the camera might not be pointed straight at the text that you want to read. You will likely need to use a series of algorithms to lock onto the right parts of the image and then downsample it in a way that size becomes neutral.
Here is a simple Java applet I wrote that does some of this.
http://www.heatonresearch.com/articles/42/page1.html
It lets you draw on a relatively large area and locks in on your char. Then it recognizes it. I am using the alphabet, but digits should be easier. The complete Java source code is included.
One simpler approach could be to use template matching. If the fonts are the same and/or the size (in pixels) is known, then simple template matching can do the job for you. If the size of the input is unknown, you might have to create copies of the images at different scales and do the matching at each scale.
The position with the extreme value (highest or lowest, depending on the method you use for template matching) is your result.
Follow this link for details
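A brute-force sketch of that kind of template matching, using the sum of squared differences on grayscale values (here the lowest score wins); real implementations would normalize and prune, but the idea is this:

```java
import java.awt.image.BufferedImage;

final class TemplateMatcher {
    /**
     * Slides the template over the screen image and returns the {x, y} position
     * with the smallest sum of squared pixel differences (the best match).
     */
    static int[] bestMatch(BufferedImage screen, BufferedImage template) {
        int bestX = 0, bestY = 0;
        long bestScore = Long.MAX_VALUE;
        for (int y = 0; y <= screen.getHeight() - template.getHeight(); y++) {
            for (int x = 0; x <= screen.getWidth() - template.getWidth(); x++) {
                long score = 0;
                for (int ty = 0; ty < template.getHeight(); ty++) {
                    for (int tx = 0; tx < template.getWidth(); tx++) {
                        int a = screen.getRGB(x + tx, y + ty) & 0xFF;   // blue channel as a crude gray proxy
                        int b = template.getRGB(tx, ty) & 0xFF;
                        score += (long) (a - b) * (a - b);
                    }
                }
                if (score < bestScore) { bestScore = score; bestX = x; bestY = y; }
            }
        }
        return new int[] { bestX, bestY };
    }
}
```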

Element point map for html5 canvas element, need algorithm

I'm currently working on a pure HTML5 canvas implementation of the "flying tag cloud sphere", which many of you have undoubtedly seen as a Flash object on some pages.
The tags are drawn fine, and the performance is satisfactory, but there's one thing about the canvas element that's kind of breaking this idea: you can't identify the objects that you've drawn on a canvas, as it's just a simple flat "image".
What I have to do in this case is catch the click event and try to "guess" which element was clicked. So I would have to have some kind of matrix which stores a link to a tag object for each pixel on the canvas, AND I would have to update this matrix on every redraw. That sounds incredibly inefficient, and before I even start trying to implement it, I want to ask the community: is there some well-known algorithm that would help me in this case? Or maybe I'm just missing something, and the answer is right around the corner? :)
This is called the point location problem, and it's one of the basic topics in computational geometry. There are a lot of methods you could use that would be much faster than the approach you're thinking of, but the details depend on what exactly you want to accomplish.
For example, each text string is contained in a bounding box. Do you just want to test whether the user clicked somewhere in that box? Then simply store the minimum and maximum coordinates of each rendered string, and test the point against each bounding box to see if it's contained in that range. If you have a large number of points to test, you can build any number of data structures to speed this up (e.g. R-trees), but for a single point the overhead of constructing such a structure probably isn't worthwhile.
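The bounding-box test itself is tiny; here is a language-independent sketch (written in Java, but it translates directly to JavaScript for the canvas case), where the TagBox record stands in for whatever per-tag layout cache you keep at render time:

```java
import java.util.List;

final class TagCloudHitTester {
    // Hypothetical record of one rendered tag's axis-aligned bounding box.
    record TagBox(String tag, double minX, double minY, double maxX, double maxY) {}

    /** Returns the first tag whose bounding box contains the click, or null if none does. */
    static String hitTest(List<TagBox> boxes, double x, double y) {
        for (TagBox b : boxes) {
            if (x >= b.minX() && x <= b.maxX() && y >= b.minY() && y <= b.maxY()) {
                return b.tag();
            }
        }
        return null;
    }
}
```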
If you care about whether the point actually falls within the opaque area of the stroked characters, the problem is slightly trickier. One solution would be to use the bounding box approach to first eliminate most of the possibilities, and then render the remaining strings one at a time to an offscreen buffer, checking each time to see if the target point has been touched.