Elasticsearch queries on "empty index" - exception

In my application I use several elasticsearch indices which contain no indexed documents in their initial state. I consider that they can be called "empty" :)
The document mapping is correct and working.
The application also has a relational database that contains entities, which MIGHT have documents associated with them in elasticsearch.
In the initial state of the application it is very common that there are only entities without documents, so not a single document has been indexed, hence the "empty index". The index has been created nevertheless, and the document mapping has been put to the index and is present in the index's metadata.
Anyway, when I query elasticsearch with a SearchQuery to find a document for one of the entities (the document contains a unique id from the entity), elasticsearch throws an ElasticSearchException complaining that there is no mapping present for field xy, etc.
BUT if I insert a single blank document into the index first, the query won't fail.
Is there a way to "initialize" an index so that the query does not fail, and to get rid of the silly "dummy document" workaround?
UPDATE:
Plus, the workaround with the dummy doc pollutes the index; for example, a count query now always returns +1... so I added a deletion to the workaround as well...

Your question lacks details and is not clear. If you had provided a gist of your index schema and query, that would have helped. You should also have provided the version of elasticsearch that you are using.
The "no mapping" exception that you mention has nothing to do with initializing the index with some data. Most likely you are sorting on a field which doesn't exist. This is common if you are querying multiple indexes at once.
Solution: The solution depends on the version of elasticsearch. If you are on 1.3.x or lower, use ignore_unmapped. If you are on 1.4.0 or higher, use unmapped_type.
Click here to read the official documentation.
If you find the documentation confusing, this example will make it clear:
Let's create two indexes, testindex1 and testindex2:
curl -XPUT localhost:9200/testindex1 -d '{"mappings":{"type1":{"properties":{"firstname":{"type":"string"},"servers":{"type":"nested","properties":{"name":{"type":"string"},"location":{"type":"nested","properties":{"name":{"type":"string"}}}}}}}}}'
curl -XPUT localhost:9200/testindex2 -d '{"mappings":{"type1":{"properties":{"firstname":{"type":"string"},"computers":{"type":"nested","properties":{"name":{"type":"string"},"location":{"type":"nested","properties":{"name":{"type":"string"}}}}}}}}}'
The only difference between these two indexes is that testindex1 has a "servers" field and testindex2 has a "computers" field.
Now let's insert test data into both indexes.
Index test data on testindex1:
curl -XPUT localhost:9200/testindex1/type1/1 -d '{"firstname":"servertom","servers":[{"name":"server1","location":[{"name":"location1"},{"name":"location2"}]},{"name":"server2","location":[{"name":"location1"}]}]}'
curl -XPUT localhost:9200/testindex1/type1/2 -d '{"firstname":"serverjerry","servers":[{"name":"server2","location":[{"name":"location5"}]}]}'
Index test data on testindex2:
curl -XPUT localhost:9200/testindex2/type1/1 -d '{"firstname":"computertom","computers":[{"name":"computer1","location":[{"name":"location1"},{"name":"location2"}]},{"name":"computer2","location":[{"name":"location1"}]}]}'
curl -XPUT localhost:9200/testindex2/type1/2 -d '{"firstname":"computerjerry","computers":[{"name":"computer2","location":[{"name":"location5"}]}]}'
Query examples:
Using "unmapped_type" for elasticsearch version > 1.3.x
curl -XPOST 'localhost:9200/testindex2/_search?pretty' -d '{"fields":["firstname"],"query":{"match_all":{}},"sort":[{"servers.location.name":{"order":"desc","unmapped_type":"string"}}]}'
Using "ignore_unmapped" for elasticsearch version <= 1.3.5
curl -XPOST 'localhost:9200/testindex2/_search?pretty' -d '{"fields":["firstname"],"query":{"match_all":{}},"sort":[{"servers.location.name":{"order":"desc","ignore_unmapped":"true"}}]}'
Output of query1:
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "testindex2",
      "_type" : "type1",
      "_id" : "1",
      "_score" : null,
      "fields" : {
        "firstname" : [ "computertom" ]
      },
      "sort" : [ null ]
    }, {
      "_index" : "testindex2",
      "_type" : "type1",
      "_id" : "2",
      "_score" : null,
      "fields" : {
        "firstname" : [ "computerjerry" ]
      },
      "sort" : [ null ]
    } ]
  }
}
Output of query2:
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "testindex2",
      "_type" : "type1",
      "_id" : "1",
      "_score" : null,
      "fields" : {
        "firstname" : [ "computertom" ]
      },
      "sort" : [ -9223372036854775808 ]
    }, {
      "_index" : "testindex2",
      "_type" : "type1",
      "_id" : "2",
      "_score" : null,
      "fields" : {
        "firstname" : [ "computerjerry" ]
      },
      "sort" : [ -9223372036854775808 ]
    } ]
  }
}
Note:
These examples were created on elasticsearch 1.4.
These examples also demonstrate how to do sorting on nested fields.
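Relating this back to the "empty index" case from the question: with unmapped_type the same sort also runs cleanly against a freshly created index that has no mapping for the sort field, returning zero hits instead of throwing the "no mapping" exception. A minimal sketch (the index name testindex3 is only for illustration):
# create an empty index: no documents, no mapping for the sort field
curl -XPUT 'localhost:9200/testindex3'
# sort on the unmapped field; unmapped_type prevents the exception
curl -XPOST 'localhost:9200/testindex3/_search?pretty' -d '{"query":{"match_all":{}},"sort":[{"servers.location.name":{"order":"desc","unmapped_type":"string"}}]}'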

Are you doing a sort when you search? I've run into the same issue ("No mapping found for [field] in order to sort on"), but only when trying to sort results. In that case, the solution is simply to add the ignore_unmapped: true property to the sort parameter in your query:
{
  ...
  "body": {
    ...
    "sort": [
      {"field_name": {
        "order": "asc",
        "ignore_unmapped": true
      }}
    ]
    ...
  }
  ...
}
I found my solution here:
No mapping found for field in order to sort on in ElasticSearch

Related

Curl Get Specific value from the output

I have a curl command; if I run it, the output is as below:
{
  "page" : 1,
  "records" : 1,
  "total" : 1,
  "rows" : [ {
    "automated" : true,
    "collectionProtocol" : "MagBead Standard Seq v2",
    "comments" : "",
    "copy" : false,
    "createdBy" : "stest",
    "custom1" : "User Defined Field 1=",
    "custom2" : "User Defined Field 2=",
    "custom3" : "User Defined Field 3=",
    "custom4" : "User Defined Field 4=",
    "custom5" : "User Defined Field 5=",
    "custom6" : "User Defined Field 6=",
    "description" : null,
    "editable" : false,
    "expanded" : false,
    "groupName" : "99111",
    "groupNames" : [ "all" ],
    "inputCount" : 1,
    "instrumentId" : 1,
    "instrumentName" : "42223",
    "jobId" : 11111,
    "jobStatus" : "In Progress",
    "leaf" : true,
    "modifiedBy" : null,
    "name" : "Copy_of_Test_Running2"
  } ]
}
I want to extract only jobId's value.
The output would be:
11111
If there are multiple rows, then there are multiple jobIds:
11111
11112
11113
I want to extract only the jobId values and process them in a while loop, like below:
while read job; do
  echo "$job"
done < <(curl command)
and I want to use that job id in another command.
The curl result could contain multiple rows.
Do you have an idea for an easy way to extract the values from the curl output and loop over them with a while or for loop?
I think jq (thanks to @Mircea) is a nice solution.
Besides, I can provide a simple awk solution, which works only if the curl output format is consistent and does not contain any stray symbols.
So, just be careful when using this:
while IFS= read -r line
do
  echo "$line" | awk -F':' '/jobId/{split($2,a,",");for(i in a){if(a[i]){printf("%d\n",a[i])}}}'
done < "$file"
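For reference, here is a minimal jq sketch of the same loop (keeping the question's "curl command" placeholder; .rows[].jobId pulls every jobId out of the rows array):
while read -r job; do
  # use "$job" in your other command here
  echo "$job"
done < <(curl command | jq -r '.rows[].jobId')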

Need to extract the timestamp from a logstash elasticsearch cluster

I'm trying to determine the freshness of the most recent record in my logstash cluster, but I'm having a bit of trouble digesting the Elasticsearch DSL.
Right now I am doing something like this to extract the timestamp:
curl -sX GET 'http://localhost:9200/logstash-2015.06.02/' -d'{"query": {"match_all": {} } }' | json_pp | grep timestamp
which gets me:
"@timestamp" : "2015-06-02T00:00:28.371+00:00",
I'd like to use an elasticsearch query directly with no grep hackiness.
The raw JSON (snipped for length) looks like this:
{
  "took" : 115,
  "timed_out" : false,
  "hits" : {
    "hits" : [
      {
        "_index" : "logstash-2015.06.02",
        "_source" : {
          "type" : "syslog",
          "@timestamp" : "2015-06-02T00:00:28.371+00:00",
          "tags" : [
            "sys",
            "inf"
          ],
          "message" : " 2015/06/02 00:00:28 [INFO] serf: EventMemberJoin: generichost.example.com 10.1.1.10",
          "file" : "/var/log/consul.log",
          "@version" : 1,
          "host" : "generichost.example.com"
        },
        "_id" : "AU4xcf51cXOri9NL1hro",
        "_score" : 1,
        "_type" : "syslog"
      },
    ],
    "total" : 8605141,
    "max_score" : 1
  },
  "_shards" : {
    "total" : 50,
    "successful" : 50,
    "failed" : 0
  }
}
Any help would be appreciated. I know the query is simple, I just don't know what it is.
You don't need to use the DSL for this. You can simply cram everything into the URL query string, like this:
curl -s -XGET 'localhost:9200/logstash-2015.06.02/_search?_source=@timestamp&size=1&sort=@timestamp:desc&format=yaml'
So:
_source=@timestamp means we're only interested in getting the @timestamp value
size=1 means we only need one result
sort=@timestamp:desc means we want to sort on @timestamp descending (i.e. latest first)
format=yaml will get you the result in YAML format which is a bit more concise than JSON in your case
The output would look like this:
- _index: "logstash-2015.06.02"
  _type: "syslog"
  _id: "AU4xcf51cXOri9NL1hro"
  _score: 1.0
  _source:
    @timestamp: "2015-06-02T00:00:28.371+00:00"
You don't need json_pp anymore; you can still simply grep @timestamp to get the data you need.
Note that in 1.6.0, there will be a way to filter out all the metadata (i.e. _index, _type, _id, _score) and only get the _source for a search result using the filter_path parameter in the URL.
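Once 1.6.0 is out, the call could look something like this (a sketch based on the announced filter_path parameter, not tested against a released version):
# return only the _source of the newest hit, dropping all response metadata
curl -s -XGET 'localhost:9200/logstash-2015.06.02/_search?size=1&sort=@timestamp:desc&filter_path=hits.hits._source'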

Is there any way to preserve the order while generating patch for json files ?

I am new to JSON stuff, i.e. JSON Patch.
I have a scenario where I need to figure out the difference between two versions of a JSON file for the same object; for that I am using json-patch-master.
But unfortunately the generated patch interprets it differently, i.e. orders the operations differently, and hence I am getting unexpected/invalid results.
Could anyone help me with how to preserve the order while generating a JSON Patch?
**Here is the actual example.
Original JSON file:**
[ {
"name" : "name1",
"roolNo" : "1"
}, {
"name" : "name2",
"roolNo" : "2"
}, {
"name" : "name3",
"roolNo" : "3"
}, {
"name" : "name4",
"roolNo" : "4"
} ]
**Modified/New JSON file: i.e. removed the 2nd node of the original file.**
[ {
"name" : "name1",
"roolNo" : "1"
}, {
"name" : "name3",
"roolNo" : "3"
}, {
"name" : "name4",
"roolNo" : "4"
} ]
**Patch/Diff generated:**
[ {"op":"remove","path":"/3"},
{"op":"replace","path":"/1/name","value":"name3"},
{"op":"replace","path":"/1/roolNo","value":"3"},
{"op":"replace","path":"/2/name","value":"name4"},
{"op":"replace","path":"/2/roolNo","value":"4"}]
Every time I generate the diff/patch it gives different path/diff results.
Moreover, the interpretation is different, i.e. the order is not preserved.
**Is there any way to get the expected result, i.e. [ {"op":"remove","path":"/1"} ]? In other words, can the patch/diff be generated based on some order so that I get what is expected?
How do I handle this kind of scenario?**
Please help me.
Thank you so much.
~Shyam
We are currently working on this issue in Starcounter-Jack/JSON-Patch.
It seems to work nicely with native Array.observe: http://jsfiddle.net/tomalec/p4s7aw96/.
Try the Starcounter-Jack/JSON-Patch issues/65_ArrayObserve branch;
we will release it as a new version once the shim and performance have been checked.
Feel free to add your comments on the JSON-Patch issue board.

How to index couchdb from elasticsearch server with the help of elasticsearch river plugin and hence get JSON data

I am working on a graphical representation of data. The graph accepts JSON data, hence I need to fetch the required data from couchdb. I am using an elasticsearch server for indexing couchdb and hence retrieving the required data.
I am using the elasticsearch river plugin to make couchdb and the elasticsearch server work together.
I have created the CouchDB database 'testdb' and created some test documents in it.
I have set up elasticsearch with the database.
On testing this with a curl GET command using the default search criteria, we should get 'total hits' greater than 0 and the 'hits' should contain some response value for the searched criteria.
But we are getting 'total hits' as 0 and 'hits' : [] (i.e. empty).
Procedures I followed.
1) Downloaded and installed couchdb latest version
2) Verified CouchDB is running
curl localhost:5984
I got a response that starts with:
{"couchdb":"Welcome"...
3) Downloaded ElasticSearch and installed service
service.bat install
curl http://127.0.0.1:9200
I got a response like:
{ "ok" : true, "status" : 200,.....
4) Installed the CouchDB River Plugin for ElasticSearch 1.4.2
plugin -install elasticsearch/elasticsearch-river-couchdb/2.4.1
5) To Create the CouchDB Database and ElasticSearch Index
curl -X PUT "http://127.0.0.1:5984/testdb"
6) To Create some test documents:
curl -X PUT "http://127.0.0.1:5984/testdb/1" -d "{\"name\":\"My
Name 1\"}"
curl -X PUT "http://127.0.0.1:5984/testdb/2" -d
"{\"name\":\"My Name 2\"}"
curl -X PUT
"http://127.0.0.1:5984/testdb/3" -d "{\"name\":\"My Name 3\"}"
curl
-X PUT "http://127.0.0.1:5984/testdb/4" -d "{\"name\":\"My Name 4\"}"
7) To Setup ElasticSearch with the Database
curl -X PUT "127.0.0.1:9200/_river/testdb/_meta" -d "{ \"type\" :
\"couchdb\", \"couchdb\" : { \"host\" : \"localhost\", \"port\" :
5984, \"db\" : \"testdb\", \"filter\" : null }, \"index\" : {
\"index\" : \"testdb\", \"type\" : \"testdb\", \"bulk_size\" :
\"100\", \"bulk_timeout\" : \"10ms\" } }"
8) To test it
curl "http://127.0.0.1:9200/testdb/testdb/_search?pretty=true"
On testing it we should get this:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "4",
      "_score" : 1.0, "_source" : {"_rev":"1-7e9376fc8bfa6b8c8788b0f408154584","_id":"4","name":"My Name 4"}
    }, {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"_rev":"1-87386bd54c821354a93cf62add449d31","_id":"1","name":"My Name"}
    }, {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "2",
      "_score" : 1.0, "_source" : {"_rev":"1-194582c1e02d84ae36e59f568a459633","_id":"2","name":"My Name 2"}
    }, {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "3",
      "_score" : 1.0, "_source" : {"_rev":"1-62a53c50e7df02ec22973fc802fb9fc0","_id":"3","name":"My Name 3"}
    } ]
  }
}
But I got something like this:
{
  "error" : "IndexMissingException[[testdb] missing]",
  "status" : 404
}
This curl string doesn't need the additional testdb (the type). This:
curl "http://127.0.0.1:9200/testdb/testdb/_search?pretty=true"
should be this:
curl 'http://localhost:9200/testdb/_search?pretty=true'
You can view all your indices by running the following and ensuring your search is against one of them:
curl -X GET 'localhost:9200/_cat/indices'
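As a side note (an assumption on my part, since your setup uses the couchdb river): when the testdb index is missing entirely, it is also worth checking that the river document you created in step 7 actually exists before searching, e.g.:
# the _meta document PUT in step 7 should come back if the river was registered
curl 'localhost:9200/_river/testdb/_meta?pretty'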

MongoDB AND Comparison Fails

I have a collection named studentCollection with two documents, given below:
> db.studentCollection.find().pretty()
{
  "_id" : ObjectId("52d7c0c744b4dd77efe93df7"),
  "regno" : 101,
  "name" : "Ajeesh",
  "gender" : "Male",
  "docs" : [
    "voterid",
    "passport",
    "drivinglic"
  ]
}
{
  "_id" : ObjectId("52d7c6a144b4dd77efe93df8"),
  "regno" : 102,
  "name" : "Sathish",
  "gender" : "Male",
  "dob" : ISODate("2013-12-09T21:05:00Z")
}
Why does the query below return a document when it doesn't fulfil the criteria I gave in the find command? I know it's a bad and stupid query for an AND comparison. I tried this with MySQL and it doesn't return anything, as expected, so why is it a problem in this NoSQL database? I assume it's considering only the last field for the comparison.
> db.studentCollection.find({regno:101,regno:102}).pretty()
{
  "_id" : ObjectId("52d7c6a144b4dd77efe93df8"),
  "regno" : 102,
  "name" : "Sathish",
  "gender" : "Male",
  "dob" : ISODate("2013-12-09T21:05:00Z")
}
Can anyone briefly explain why MongoDB works this way?
MongoDB leverages JSON/BSON, and names should be unique (http://www.ietf.org/rfc/rfc4627.txt, section 2.2). Found this in another post: How to generate a JSON object dynamically with duplicate keys?. I am guessing the value for 'regno' gets overridden to 102 in your case.
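You can see the overriding behaviour directly from the shell (a quick illustration; assumes the legacy mongo shell is installed, and --nodb just skips connecting to a server):
# the later duplicate key wins when the object literal is parsed
mongo --nodb --quiet --eval 'printjson({regno: 101, regno: 102})'
# prints: { "regno" : 102 }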
If what you want is an OR query, try the following (note that regno is stored as a number, so the values should not be quoted):
db.studentCollection.find( { $or : [ { "regno" : 101 }, { "regno" : 102 } ] } );
Or even better, use $in:
db.studentCollection.find( { "regno" : { $in : [101, 102] } } );
Hope this helps!
Edit: Typo!
MongoDB parses your query as a JavaScript object. Since you have not used an explicit $and condition in your document, the query clause gets overwritten by the last value, which is regno: 102. Hence you get the last document as the result.
If you want to use an $and, you may use either of the following (again with regno as a number):
db.studentCollection.find({ $and : [ { regno : 102 }, { regno : 101 } ] });
db.studentCollection.find({ regno : { $gte : 101, $lte : 102 } });