Get confidence of Google Vision OCR text annotation result

The results I get from running OCR on an image (based on this tutorial) don't include a confidence score. Is there a way to get this information?
The documentation lists score as one of the values that should be returned, but I don't see it.
This is the output I see:
description: "&"
bounding_poly {
  vertices {
    x: 435
    y: 959
  }
  vertices {
    x: 459
    y: 960
  }
  vertices {
    x: 458
    y: 990
  }
  vertices {
    x: 434
    y: 989
  }
}

Your documentation link points to the "EntityAnnotation" section, which is not relevant for OCR.
You can get a confidence score for OCR results if you set the type of your request to "DOCUMENT_TEXT_DETECTION":
....
"symbols": [
  {
    "property": {
      "detectedLanguages": [
        {
          "languageCode": "en"
        }
      ]
    },
    "boundingBox": {
      "vertices": [....]
    },
    "text": "T",
    "confidence": 0.99
  },
....
The results for the type "TEXT_DETECTION" will NOT give you any confidence values.
You can try the difference easily here:
https://cloud.google.com/vision/docs/ocr
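To pull those per-symbol confidences out of a parsed DOCUMENT_TEXT_DETECTION response, you have to walk pages → blocks → paragraphs → words → symbols. A minimal sketch, assuming the REST JSON field names (`fullTextAnnotation` etc.); the sample response here is fabricated for illustration:

```python
def symbol_confidences(response):
    """Collect (text, confidence) pairs for every symbol in a
    parsed DOCUMENT_TEXT_DETECTION response (REST JSON shape)."""
    out = []
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    for symbol in word.get("symbols", []):
                        out.append((symbol.get("text"), symbol.get("confidence")))
    return out

# Fabricated minimal response, matching the structure shown above:
sample = {"fullTextAnnotation": {"pages": [{"blocks": [{"paragraphs": [
    {"words": [{"symbols": [{"text": "T", "confidence": 0.99}]}]}
]}]}]}}
print(symbol_confidences(sample))  # [('T', 0.99)]
```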


Creating nodes from nested JSON using neo4J query

I'm new to neo4j and I have this JSON file:
{
  "locations_connections": {
    "locations": [
      {
        "id": "aws.us-east-1",
        "longitude": 72.8777,
        "latitude": 19.0760
      },
      {
        "id": "aws.us-east-2",
        "longitude": 126.9780,
        "latitude": 37.5665
      },
      {
        "id": "aws.us-west-1",
        "longitude": 103.8517837,
        "latitude": 1.287950
      }
    ],
    "connections": [
      {
        "aws.us-west-1": [
          { "id": "aws.us-west-1", "latency": 3.16, "cost": 0.02 },
          { "id": "aws.us-east-1", "latency": 53.47, "cost": 0.02 },
          { "id": "aws.us-east-2", "latency": 53.47, "cost": 0.02 }
        ]
      },
      {
        "aws.us-east-1": [
          { "id": "aws.us-east-1", "latency": 3.16, "cost": 0.02 },
          { "id": "aws.us-east-2", "latency": 53.47, "cost": 0.02 }
        ]
      },
      {
        "aws.us-east-2": [
          { "id": "aws.us-east-2", "latency": 53.47, "cost": 0.02 }
        ]
      }
    ]
  }
}
After reading the JSON with the apoc.load.json(URL) procedure, what query do I write to represent this as a graph, where each node holds the name (for example aws.us-east-1) plus the longitude and latitude values, and the edges carry the latency and the cost?
I have this code:
call apoc.load.json("/file.json") yield value
UNWIND value.locations_connections.locations as loc
UNWIND value.locations_connections.connections as con
MERGE (e:Element {id: loc.id}) ON CREATE
  SET e.longitude = loc.longitude, e.latitude = loc.latitude
WITH con
FOREACH (region_source IN KEYS(con) |
  FOREACH (data IN con[region_source] |
    MERGE (e1:Element {id: region_source})
    MERGE (e1)<-[:CONNECT]-(e2:Element {id: data.id, latency: data.latency, cost: data.cost})
  ))
and the execution result is incorrect:
Added 9 labels, created 9 nodes, set 27 properties, created 6 relationships, completed after 60 ms.
I have also seen this one, but it is not what I expected.
You cannot use MATCH inside a FOREACH, so when you put the MERGE with :CONNECT inside the loop, it creates duplicate nodes. This is what I did; let us know whether it works for you.
call apoc.load.json("/file.json") yield value
// read the json file
WITH value, value.locations_connections.locations as locs
// loop over the locations (or regions) and create them
FOREACH (loc IN locs |
  MERGE (e:Element {id: loc.id}) ON CREATE
    SET e.longitude = loc.longitude, e.latitude = loc.latitude
)
// get the data for the connections
WITH value.locations_connections.connections as cons
UNWIND cons as con
// the key and its value are assigned to the variables region_source and dat
WITH KEYS(con)[0] as region_source, con[KEYS(con)[0]] as dat
// unwind is similar to a for loop
UNWIND dat as data
// look for the nodes that we want
MATCH (e1:Element {id: region_source}), (e2:Element {id: data.id})
// create the connection between regions
MERGE (e1)<-[:CONNECT {latency: data.latency, cost: data.cost}]-(e2)
RETURN e1, e2
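The KEYS(con)[0] trick works because each element of "connections" is a single-key map from a source region to its list of targets. To see the reshaping in plain terms, here is the same fold in Python (a sketch over a trimmed-down copy of the data):

```python
# Each entry is a one-key map {source_region: [targets...]},
# mirroring the "connections" array in the JSON above (trimmed).
connections = [
    {"aws.us-west-1": [{"id": "aws.us-east-1", "latency": 53.47, "cost": 0.02}]},
    {"aws.us-east-1": [{"id": "aws.us-east-2", "latency": 53.47, "cost": 0.02}]},
]

edges = []
for con in connections:
    source = next(iter(con))          # same idea as KEYS(con)[0]
    for data in con[source]:
        edges.append((source, data["id"], data["latency"], data["cost"]))

print(edges)
```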

How to read from a JSON with two keys

I have a JSON file that I need to import and then return a certain value from. The JSON has two keys, like
{
  "NUM_High_Objects": {
    "abseta_pt": {
      "field1:[0.0,0.9]": {
        "field2:[15,20]": { "tagIso": 0.00012, "value": 0.99 },
        "field2:[20,25]": { "tagIso": 0.00035, "value": 0.98 }
      },
      "field1:[0.91,1.2]": {
        "field2:[15,20]": { "tagIso": 0.00013, "value": 0.991 },
        "field2:[20,25]": { "tagIso": 0.00036, "value": 0.975 }
      },
      "binning": [
        { "binning": [0.0, 0.9, 1.2, 2.1, 2.4], "variable": "abseta" },
        { "binning": [15, 20, 25, 30, 40, 50, 60, 120], "variable": "pt" }
      ]
    }
  },
What I need is to check whether a pair of values falls within the ranges of "field1" and "field2" and, if so, return the corresponding "value".
I tried following Search nested json / dict for multiple key values matching specified keys but could not make it work.
I've tried something like
class checkJSON():
    def __init__(self, filein):
        self.good, self.bad = 0, 0
        print 'inside json function : will use the JSON', filein
        input_file = open(filein)
        self.json_array = json.load(input_file)

    def checkJSON(self, LS, run):
        try:
            LSlist = self.json_array[str(run)]
            for LSrange in LSlist: print LSrange, run
        except KeyError:
            pass
            self.bad += 1
            return False

CJ = ''
CJ = checkJSON(filein='test.json')
isInJSON = CJ.checkJSON("0.5", "20")
print isInJSON
but this does not work as I am not sure how to loop inside the keys
If I am understanding your question correctly, then the relevant portion of your JSON is:
{
  "field1:[0.0,0.9]": {
    "field2:[15,20]": { "tagIso": 0.00012, "value": 0.99 },
    "field2:[20,25]": { "tagIso": 0.00035, "value": 0.98 }
  },
  "field1:[0.91,1.2]": {
    "field2:[15,20]": { "tagIso": 0.00013, "value": 0.991 },
    "field2:[20,25]": { "tagIso": 0.00036, "value": 0.975 }
  },
  "binning": [
    { "binning": [0.0, 0.9, 1.2, 2.1, 2.4], "variable": "abseta" },
    { "binning": [15, 20, 25, 30, 40, 50, 60, 120], "variable": "pt" }
  ]
}
Then the following code should do what you are trying to achieve. It doesn't look like you need to search for nested keys; you simply need to parse your field1:[...] and field2:[...] keys. The code below is a quick implementation: it returns the value if the first parameter is in the range of a field1:[...] key and the second parameter is in the range of a field2:[...] key, and None otherwise.
import ast
import json

def check_json(jsondict, l1val, l2val):
    def parse_key(keystr):
        level, lrange = keystr.split(':')
        # literal_eval safely parses "[0.0,0.9]" into a list (safer than eval)
        return level, ast.literal_eval(lrange)
    for l1key, l2dict in jsondict.items():
        if 'field' in l1key:
            l1, l1range = parse_key(l1key)
            if l1range[0] <= l1val <= l1range[1]:
                for l2key, vals in l2dict.items():
                    l2, l2range = parse_key(l2key)
                    if l2range[0] <= l2val <= l2range[1]:
                        return vals['value']
    return None
Here is driver code to test the implementation above. Note that check_json expects the dict that holds the field1/field2 keys, so you have to drill down past the two outer keys first.

if __name__ == '__main__':
    with open('data.json', 'r') as f:
        myjson = json.load(f)
    # drill down to the dict that holds the field1/field2 keys
    print(check_json(myjson['NUM_High_Objects']['abseta_pt'], 0.5, 20))

Elastic Search - Even distribution on a map

I'm using Elastic Search 2. I have a big database of locations, all of which have a gps attribute, which is a geopoint.
My frontend application displays a Google Maps component with the results, filtered by my query, let's say pizza. The problem is that the dataset grew a lot, and the client wants evenly spread results on the map.
So if I search for a specific query in New York, I would like to get results all over New York, but I'm currently receiving 400 results concentrated in one populous area of Manhattan.
My naive approach was to just filter by distance:
{
  "size": 400,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "gps": [-73.98502023369585, 40.76195656809083]
        }
      }
    }
  }
}
This doesn't guarantee that the results will be spread across the map.
How can I do it?
I've tried using Geo-Distance Aggregation for this
{
  "size": 400,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "gps": [-73.98502023369585, 40.76195656809083]
        }
      }
    }
  },
  "aggs": {
    "per_ring": {
      "geo_distance": {
        "field": "gps",
        "unit": "km",
        "origin": [-73.98502023369585, 40.76195656809083],
        "ranges": [
          { "from": 0, "to": 100 },
          { "from": 100, "to": 200 }
        ]
      }
    }
  }
}
But I just receive the results list plus the number of elements in each bucket; the results list itself is not guaranteed to be spread out.
"aggregations": {
  "per_ring": {
    "buckets": [
      {
        "key": "*-100.0",
        "from": 0,
        "from_as_string": "0.0",
        "to": 100,
        "to_as_string": "100.0",
        "doc_count": 33821
      },
      {
        "key": "100.0-200.0",
        "from": 100,
        "from_as_string": "100.0",
        "to": 200,
        "to_as_string": "200.0",
        "doc_count": 6213
      }
    ]
  }
}
I would like to grab half of the results from one bucket, half from the other bucket.
I've also attempted to use the Geohash Grid Aggregation, but that doesn't give me sample results for every bucket either; it just provides the areas.
So how do I get an evenly spaced distribution of results across my map with one Elasticsearch query?
Thanks!
I think introducing some randomness may give you the desired result. I am assuming you're seeing the same distribution because of index ordering: you're not scoring by distance, and you're taking the first 400, so you are most likely getting the same result set every time.
{
  "size": 400,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            { "match_all": {} }
          ],
          "filter": {
            "geo_distance": {
              "distance": "200km",
              "gps": [-73.98502023369585, 40.76195656809083]
            }
          }
        }
      },
      "functions": [
        { "random_score": {} }
      ]
    }
  }
}
Random score in elastic
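If random scoring alone isn't even enough, another option (not an Elasticsearch 2 feature, just a client-side technique) is to over-fetch and spread the hits yourself: bucket them by a coarse lat/lon cell and round-robin across the buckets so no single dense area dominates. A minimal sketch with fabricated coordinates:

```python
from collections import defaultdict

def spread_sample(hits, n, cell=1.0):
    """Pick up to n (lat, lon) hits, round-robining across coarse
    grid cells of `cell` degrees so dense areas don't dominate."""
    buckets = defaultdict(list)
    for lat, lon in hits:
        buckets[(int(lat // cell), int(lon // cell))].append((lat, lon))
    picked = []
    iters = [iter(b) for b in buckets.values()]
    while iters and len(picked) < n:
        remaining = []
        for it in iters:
            try:
                picked.append(next(it))
                if len(picked) == n:
                    return picked
                remaining.append(it)
            except StopIteration:
                pass  # this cell is exhausted
        iters = remaining
    return picked

# Three points crowded into one cell, one far away; ask for two:
hits = [(40.1, -73.9), (40.2, -73.8), (40.3, -73.7), (45.0, -70.0)]
print(spread_sample(hits, 2))  # one point from each cell
```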

c3.js generate a stacked bar from JSON payload

I am attempting to generate a stacked bar chart with c3 from a JSON payload (code below). However, when I group the data, instead of stacking, the bars overlay each other. If I use the columns structure I get the intended behavior, but that would mean maintaining different code for a stacked bar chart versus my other visuals (i.e. the timeseries chart).
var chart = c3.generate({
  data: {
    x: "x-axis",
    json: [
      { "x-axis": "0", "data1": 30 },
      { "x-axis": "0", "data2": 40 }
    ],
    keys: {
      x: "x-axis",
      value: ["data1", "data2"]
    },
    groups: [
      ['data1', 'data2']
    ],
    type: 'bar'
  }
});
Here is a fiddle: http://jsfiddle.net/cjrobinson/ozf4fzcb/
It's weird that they overplot each other in your example; I'd report that as a bug to c3.
If you don't want to use the columns[] format, you could do it as below; it would still need some data wrangling though:
var chart = c3.generate({
  data: {
    x: "x-axis",
    json: [
      { "x-axis": "0", "data1": 30, "data2": 40 },
      { "x-axis": "1", "data1": 20, "data2": 60 }
    ],
    // etc etc
    keys: {
      x: "x-axis",
      value: ["data1", "data2"]
    },
    groups: [
      ['data1', 'data2']
    ],
    type: 'bar'
  }
});
http://jsfiddle.net/dhgujwy7/1/
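The "data wrangling" is just collapsing the one-series-per-object rows into one object per x value, which is the shape c3's json option expects. A sketch of that merge (shown in Python here; the same fold is a few lines of JavaScript):

```python
def merge_rows(rows, x_key="x-axis"):
    """Fold rows like [{'x-axis': '0', 'data1': 30}, {'x-axis': '0', 'data2': 40}]
    into one object per x value."""
    merged = {}
    for row in rows:
        # create the entry for this x value on first sight, then fold rows in
        merged.setdefault(row[x_key], {x_key: row[x_key]}).update(row)
    return list(merged.values())

rows = [{"x-axis": "0", "data1": 30}, {"x-axis": "0", "data2": 40}]
print(merge_rows(rows))  # [{'x-axis': '0', 'data1': 30, 'data2': 40}]
```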

Am I duplicating data in my API?

My gradepoints and percents objects hold the same grade keys with different values. Please take a look at my JSON below and let me know if I'm doing it right. Is there a way to optimize this API?
I could provide the percents along with the gradepoints after a comma, like "a1": "10,90", but then I would need to split them on the client side in JS, which I'm trying to avoid.
{
  "gradepoints": [
    { "a1": 10 },
    { "a1": 10 },
    { "c2": 5 },
    { "e1": "eiop" },
    { "d": 4 },
    { "b1": 8 }
  ],
  "percents": [
    { "a1": 90 },
    { "a1": 90 },
    { "c2": 45 },
    { "e1": "eiop" },
    { "d": 36 },
    { "b1": 72 }
  ],
  "gpa": 7.4,
  "overall": 70.3,
  "eiop": 2
}
I would do it something like this:
{
  grades: [
    { name: "a1", gradepoint: 10, percent: 90 },
    { name: "a1", gradepoint: 10, percent: 90 },
    { name: "c2", gradepoint: 5, percent: 45 },
    ...
  ],
  gpa: 7.4,
  overall: 70.3,
  eiop: 2
}
Related data should be kept together in an object.
If it weren't for the duplicate a1 entries, I would probably make grades an object keyed by name. But an object can't have duplicate keys, so the name has to go into the values instead.
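Converting the original payload into this shape is a simple zip over the two parallel arrays. A sketch (assumes the arrays stay aligned index-by-index, as in the example, and that each entry has exactly one key):

```python
def merge_grades(gradepoints, percents):
    """Zip the parallel one-key objects into a single grades list."""
    grades = []
    for gp, pc in zip(gradepoints, percents):
        (name, point), = gp.items()   # each entry holds exactly one key
        grades.append({"name": name, "gradepoint": point, "percent": pc[name]})
    return grades

gradepoints = [{"a1": 10}, {"c2": 5}]
percents = [{"a1": 90}, {"c2": 45}]
print(merge_grades(gradepoints, percents))
```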