Google Cloud Vision Document OCR - keep layout in the resulting text

I use the Google Cloud Vision Document OCR API. The resulting text returned by com.google.cloud.vision.v1.AnnotateImageResponse.getFullTextAnnotation().getText() is a bit messy and loses the formatting present in the original image.
Is there a way, with the Google Cloud Vision Document OCR API, to keep the layout (formatting) in the resulting text?

As of this writing (1/24/2018), there is no rich text output from the Google Vision OCR; you can only differentiate the lines. There is a workaround using the bounding boxes, but it is pretty time-consuming, and I don't think that's the solution you are looking for.
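For reference, a rough sketch of that bounding-box workaround with the Python client library (google-cloud-vision, recent versions): it collects every word from the full_text_annotation hierarchy along with its bounding box, then groups words into rows by the tops of their boxes. The file name and the row tolerance are placeholders that would need tuning per image:

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open('document.png', 'rb') as f:
        image = vision.Image(content=f.read())

    document = client.document_text_detection(image=image).full_text_annotation

    # Collect every word together with the top-left corner of its bounding box
    words = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    text = ''.join(s.text for s in word.symbols)
                    v = word.bounding_box.vertices[0]  # top-left vertex
                    words.append((v.y, v.x, text))

    # Group words into rows: tops within ROW_TOLERANCE pixels count as one
    # line, then print each row left to right
    ROW_TOLERANCE = 10
    words.sort()
    rows = []
    for y, x, text in words:
        if rows and abs(rows[-1][0] - y) <= ROW_TOLERANCE:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))

    for _, row in rows:
        print(' '.join(text for _, text in sorted(row)))

Skewed scans and multi-column pages break this naive row grouping quickly, which is why the approach gets time-consuming.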

Related

Google Firebase MLKit Vision API tweaking block bounds

I am experimenting with Google's new Firebase MLKit for iOS. I am playing with cloud text recognition in images, and am wondering how I can tweak the bounds of the text blocks.
This is the sample I am using: https://firebase.google.com/docs/ml-kit/ios/recognize-text
I can find documentation on doing this for Google's Cloud Vision, but not specifically for the MLKit implementation of Cloud Vision.
Text from images is being automatically placed into blocks. Some returned blocks should be combined, and others should be separated. How can I go about tweaking the rules by which these blocks are generated? Refer to the sample provided.

Doing an Image Search?

Is it possible to perform an image search with Google Maps? For example, if I had a small section of a map showing road configurations, but there were no labels to indicate street names or place names, is there a way to do an image search, similar to what you can do with regular Google, to identify that location? I have tried this with regular Google, and it does not work. Does anyone know of software or an app that can do this? Thanks!
Yes, it may be possible, but I do not think any software or app that does this is currently available. A program would need to be written that takes the small section of map you have (it would need to be a rather good-quality image, preferably in a raw bitmap format), overlays it on the main Google map, and moves it around, scanning and comparing pixel by pixel, frame by frame, as the search picture is swept across the bigger map. A best-fit process would then be used: when the program finds a match, it can report how closely it matches and the confidence level that the location found is actually correct. It would be best to narrow the search area down as much as possible. A bit of artificial intelligence might be very useful here too; a face-recognition A.I. program could be modified to complete that task. But as far as I know, none of that exists in a single readily available app or program. Maybe someday someone will create such a program? It is possible.
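The pixel-by-pixel, best-fit scan described above is essentially template matching, which is straightforward to prototype with OpenCV. A minimal sketch in Python, assuming the crop and the larger map are plain image files at the same scale and rotation (the file names are placeholders):

    import cv2

    # Load the large map and the small crop as grayscale images
    big_map = cv2.imread('big_map.png', cv2.IMREAD_GRAYSCALE)
    crop = cv2.imread('map_section.png', cv2.IMREAD_GRAYSCALE)

    # Slide the crop across the map and score every position (the best-fit process)
    scores = cv2.matchTemplate(big_map, crop, cv2.TM_CCOEFF_NORMED)
    _, confidence, _, top_left = cv2.minMaxLoc(scores)

    print('Best match at', top_left, 'with confidence', confidence)

Template matching breaks down as soon as scale, rotation, or rendering style differ between the two images, which is why the caveats above about image quality matter.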

How can a scanned page be divided into words like the reCaptcha project?

I would like to digitize a book in a similar way to the reCaptcha project. Is there already a system for inputting an image and then outputting little images cropped around words? Any ideas on how to do this?
You should look into the Tesseract OCR project, on which reCaptcha was probably based. It has the capability to output the coordinates of recognized words; you then crop the page to those coordinates and you are done.
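A minimal sketch of that approach using python-tesseract (pytesseract) and Pillow; the file names are placeholders, and a local Tesseract install is assumed:

    import pytesseract
    from PIL import Image

    page = Image.open('scanned_page.png')

    # image_to_data returns one row per recognized item, including word boxes
    data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)

    for i, text in enumerate(data['text']):
        if text.strip():  # skip the empty non-word entries
            left, top = data['left'][i], data['top'][i]
            width, height = data['width'][i], data['height'][i]
            word = page.crop((left, top, left + width, top + height))
            word.save(f'word_{i}.png')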
If you just want to split the image into multiple images, one word each, you could try to find the word bounding boxes and then use those coordinates for the splitting. This can be done by taking histograms/projections of the document in the horizontal direction and then, for each line, in the vertical direction. An example algorithm with some pictures describing the idea can be found in the paper "Document Page Decomposition by the Bounding-Box Projection Technique" (http://haralick.org/conferences/71281119.pdf). You could implement this in OpenCV.
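A rough sketch of that projection idea with NumPy and OpenCV; the binarization threshold and the gap sizes are made-up values that would need tuning per scan:

    import cv2
    import numpy as np

    # Binarize so text pixels are 1 and background is 0
    img = cv2.imread('scanned_page.png', cv2.IMREAD_GRAYSCALE)
    binary = (img < 128).astype(np.uint8)

    def ink_runs(profile, min_gap):
        """Return (start, end) pairs of nonzero runs in a 1-D projection,
        merging runs separated by fewer than min_gap empty entries."""
        runs, start, gap = [], None, 0
        for i, v in enumerate(profile):
            if v:
                if start is None:
                    start = i
                gap = 0
            elif start is not None:
                gap += 1
                if gap >= min_gap:
                    runs.append((start, i - gap + 1))
                    start, gap = None, 0
        if start is not None:
            runs.append((start, len(profile)))
        return runs

    # Horizontal projection -> text lines; vertical projection per line -> words
    for top, bottom in ink_runs(binary.sum(axis=1), min_gap=5):
        line = binary[top:bottom]
        for left, right in ink_runs(line.sum(axis=0), min_gap=10):
            word = img[top:bottom, left:right]
            cv2.imwrite(f'word_{top}_{left}.png', word)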
Alternatively, you can use Tesseract, as mentioned by beppe9000. Perhaps this helps: Getting the bounding box of the recognized words using python-tesseract
But then you take on the whole complexity of training an OCR engine even though you only want the bounding boxes.

Using 3D in Google Maps

I would like to create a webapp where I can display my own geotiffs, with NDVI and other data layers, as well as 3D geometries, providing a seamless rendering of both 2D tiles and textured 3D shapes, exactly like maps.google.com achieves in switching from "map" to "earth" views.
After much research, the closest I came to a viable solution is to build the infrastructure from the ground up based on http://cesiumjs.org/, and while this seems doable, it is extremely low-level and will require an exotic cocktail of libraries and a buttload of man-hours.
Before going down that road, I want to make sure there isn't a cost effective alternative that takes all the heavy lifting out of my app's shoulders and gives me a friendly set of APIs to base my app on.
Mapbox comes close to perfection in this regard, but unfortunately, it only handles 2D.
On the other hand, on the Google side, amid Earth API and Maps Engine deprecation, it's hard to tell what exactly is currently possible and will remain available long term.
Bottom line, for a future-proof Google-centric solution built today, are there Google APIs in place that allow building a webapp that displays custom 2D and 3D data with a seamless rendering experience?
https://cesiumjs.org/ is a library, very similar to Google Maps, that provides support for 3D shapes on top of maps.

Using Google Maps controls for a large image

I have large images for which I would like dragging and zooming controls like Google Maps. I started looking into the Google Maps API and some other related websites, but I could not find something simple and easy.
The MapKi tutorial suggests automatically cutting tiles and adding them as a custom map. This makes sense, but I have so many images on the file server that I don't have time to go through all of them, cut the tiles, and figure out zoom levels for each. One good solution would be writing a script that does this automatically (see the sketch after this question), but that would take a lot of effort and time, which has made me look for another solution if there is any.
Hence, is there a way to get functionality similar to the Google Maps controls for images without creating new images or tiles out of the original image? It would be great if you could either post some code or link to a tutorial or documentation. Or, if you know how to do this with the Google API without making those tiles, please point me down the right path. I'm a total newbie with the Google Maps API.
I have found the DragZoom for Google Maps, but I don't think that's what I'm looking for.
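For what it's worth, the tile-cutting script mentioned in the question is shorter than it might sound. A rough sketch with Pillow, assuming a Google-Maps-style pyramid where zoom level z is a 2^z x 2^z grid of 256x256 tiles (the file names and directory layout are placeholders):

    import os
    from PIL import Image

    TILE = 256

    def cut_tiles(src_path, out_dir, max_zoom):
        """Cut one large image into a Google-Maps-style tile pyramid."""
        src = Image.open(src_path)
        for z in range(max_zoom + 1):
            side = TILE * (2 ** z)
            scaled = src.resize((side, side))  # squashes the image to a square
            for x in range(2 ** z):
                for y in range(2 ** z):
                    tile = scaled.crop((x * TILE, y * TILE,
                                        (x + 1) * TILE, (y + 1) * TILE))
                    os.makedirs(f'{out_dir}/{z}/{x}', exist_ok=True)
                    tile.save(f'{out_dir}/{z}/{x}/{y}.png')

    cut_tiles('large_image.png', 'tiles', max_zoom=3)

Note that this simple scheme distorts non-square images; preserving the aspect ratio would mean padding the source to a square first.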
You're looking for something like Djatoka.
You should take a look at the IIIF protocol used by libraries and museums for zooming extremely large images (tens of thousands of pixels on a side and up), preparing collections of images on canvases, presenting annotations on those images, etc.
http://iiif.io
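For context, an IIIF image server exposes regions, sizes, and rotations of a large image through a fixed URL pattern (the IIIF Image API), so a viewer can fetch only the tiles it needs. Two hypothetical requests, assuming a server at example.org:

    # IIIF Image API pattern: {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    # A 512px-wide rendering of the whole image (hypothetical server and identifier):
    full_view = 'https://example.org/iiif/my-large-image/full/512,/0/default.jpg'

    # One 1024x1024 tile from the region at (2048, 0), scaled to 256px wide:
    tile = 'https://example.org/iiif/my-large-image/2048,0,1024,1024/256,/0/default.jpg'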
…and just for the record, here's an open-source tiling server with a front-end viewer:
http://iipimage.sourceforge.net/
Check out https://github.com/Murtnowski/GMap-JSlicer
    // Create a slicer on the target element, pointing at the source image
    var slicer = new JSlicer(document.getElementById('map'), 'myImage.png');
    slicer.init();
It's super simple, no need for tile pre-cutting. Just point at a single image and go.