What is the difference between the Post Tool and Index Handlers? - json

I have a JSON document which needs to be indexed in Solr. The document looks like this:
{
  "id":"1",
  "prop":null,
  "path":"1.parent",
  "_childDocuments_":[
    {
      "id":"2",
      "path":"2.parent.child"
    }
  ]
}
It contains a parent-child relationship structure, denoted by the _childDocuments_ key.
When I insert the document into Solr via the Post Tool, i.e., ./bin/post -c coreName data.json, and then query Solr, I get the following response:
$ curl 'http://localhost:8983/solr/coreName/select?indent=on&q=*:*&wt=json'
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "indent":"on",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"1",
        "path":["1.parent"],
        "_childDocuments_.id":[2],
        "_childDocuments_.path":["2.parent.child"],
        "_version_":1566718833663672320}]
  }}
But when I try to insert the same JSON document via the Index Handler:
$ curl "http://localhost:8983/solr/coreName/update?commit=true" -H 'Content-type:application/json' --data-binary @1.json
{"responseHeader":{"status":400,"QTime":56},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"Unknown command 'id' at [11]","code":400}}
I get a SolrException. But if I wrap the JSON in an array, it shows another error:
[{
  "id":"1",
  "prop": null,
  "path":"1.parent",
  "_childDocuments_":[
    {
      "id":"2",
      "path":"2.parent.child"
    }
  ]
}]
Error:
$ curl 'http://localhost:8983/solr/coreName/update?commit=true' -H 'Content-type:application/json' --data "@/home/knoldus/practice/solr/1.json"
{"responseHeader":{"status":500,"QTime":3},"error":{"trace":"java.lang.NullPointerException\n\tat org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.mapValueClassesToFieldType(AddSchemaFieldsUpdateProcessorFactory.java:370)\n\tat org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:288)\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)\n\tat org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)\n\tat org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)\n\tat org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)\n\tat org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)\n\tat org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)\n\tat org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)\n\tat 
org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:91)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:492)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:139)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:115)\n\tat org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:78)\n\tat org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat java.lang.Thread.run(Thread.java:745)\n","code":500}}
So, I have to remove "prop": null as well, or make it an empty string, like this:
[{
  "id":"1",
  "prop": "",
  "path":"1.parent",
  "_childDocuments_":[
    {
      "id":"2",
      "path":"2.parent.child"
    }
  ]
}]
After making these modifications, when I insert the JSON document into Solr via curl, it works fine.
$ curl 'http://localhost:8983/solr/coreName/update?commit=true' -H 'Content-type:application/json' --data "@/home/knoldus/practice/solr/1.json"
{"responseHeader":{"status":0,"QTime":851}}
And I get following response from Solr query:
$ curl 'http://localhost:8983/solr/coreName/select?indent=on&q=*:*&wt=json'
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"*:*",
      "indent":"on",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"2",
        "path":["2.parent.child"]},
      {
        "id":"1",
        "path":["1.parent"],
        "_version_":1566719240059224064}]
  }}
But here again I see a difference: the _childDocuments_ have been indexed as separate documents.
So, I have the following questions about these two methods of indexing data in Solr:
Why does the Post Tool ./bin/post not index _childDocuments_ separately, as the Request Handler /update does?
Why does the Request Handler /update require the JSON document to be wrapped in an array?
And last, why can't the Request Handler /update handle null values, whereas the Post Tool can?

The Post Tool is just a small Java utility that reads the input you give it and sends it to Solr; it does not run inside Solr. I have not looked at the code, but I am fairly sure that:
it does not detect _childDocuments_ as a special key, and sends it just like any other nested JSON
as explained in the docs, /update requires the array; you can send an individual JSON document to /update/json/docs
the Post Tool converts the null to "" or simply ignores the field
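To make the two accepted body shapes concrete, here is a minimal sketch in Python, using the document from the question; the "add" wrapper follows Solr's documented JSON command format, and this only illustrates payload structure, not a live request:

```python
import json

# The document from the question (prop set to "" as in the working version)
doc = {"id": "1", "prop": "", "path": "1.parent",
       "_childDocuments_": [{"id": "2", "path": "2.parent.child"}]}

# Shape 1: a bare JSON array of documents - the form /update accepted above
array_body = json.dumps([doc])

# Shape 2: a command object; a bare top-level {"id": ...} object is read
# as a command map instead, which is why Solr reports "Unknown command 'id'"
command_body = json.dumps({"add": {"doc": doc}})
```

Either body can then be sent to /update with Content-type: application/json; the array form carries no commands, while the object form names each command explicitly.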

Related

copy contents from sample json file to Elasticsearch index

The Elasticsearch version I am using is 6.6.1.
I have created an index by running the following command:
curl -XPUT http://localhost:9200/incident_422? -H 'Content-Type: application/json' -d @elasticsearch.json
I need to update the index mapping with sample JSON data (sample.json):
{
  "properties": {
    "id185": {
      "type": "byte"
    },
    "id255": {
      "type": "text"
    },
    "id388": {
      "type": "text"
    }
  }
}
I tried running the command:
curl -XPUT http://localhost:9200/incident_422/mapping/_doc? -H 'Content-Type: application/json' -d @sample.json
but I get an error message saying:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Rejecting mapping update to [incident_422] as the final mapping would have more than 1 type: [mapping, doc]"}]
I have read somewhere that Elasticsearch 6 doesn't support more than one mapping type per index.
Could anyone please tell me how this can be achieved without downgrading the version?
This seems to be related to the removal of mapping types; you need to specify the type name while indexing the documents.
Try adding the type to your index request, i.e., http://localhost:9200/incident_422/<your-type-name> in your URL, and it should solve the issue.
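A possible additional wrinkle, if the error's [mapping, doc] list is read literally: the request path uses mapping without the leading underscore, so Elasticsearch parses it as a second type name rather than as the _mapping API endpoint. A small sketch of the URL shape ES 6.x expects for a mapping update (index name from the question; _doc is an assumed type name):

```python
index = "incident_422"
doc_type = "_doc"  # assumed type name; use whatever type the index was created with

# Note the leading underscore: "_mapping" is the API endpoint,
# while a bare "mapping" segment would be parsed as a type name
mapping_url = f"http://localhost:9200/{index}/_mapping/{doc_type}"
```

The resulting URL would then be used with PUT and the sample.json body.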

ampUrls batchGet JSON format

I do not understand the documentation for the JSON package defining the URLs. I am using curl. I am able to pass my key to the API, but I am having trouble with the URL data. Below is what I have tried, without success. Any help would be appreciated.
--data '{
"lookupStrategy": "FETCH_LIVE_DOC",
"urls": [
"originalURL":"https://www.myurl.com/index.html", \
"ampURL":"https://www.myurl.com/index.html", \
"cdnAmpUrl":"https://www-myurl-com.cdn.ampproject.org/c/s/index.html"
]
}'
Your JSON should be:
{
  "lookupStrategy": "FETCH_LIVE_DOC",
  "urls": [
    "https://www.myurl.com/index.html",
    "https://www.myurl.com/index.html",
    "https://www-myurl-com.cdn.ampproject.org/c/s/index.html"
  ]
}
Looks like you are using the return data schema.
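A sketch of how the corrected request body could be assembled, using the placeholder URLs from the question; the point is that "urls" is a flat array of strings, while originalUrl/ampUrl/cdnAmpUrl are field names from the response schema:

```python
import json

# batchGet request body: "urls" holds plain URL strings, not objects
body = {
    "lookupStrategy": "FETCH_LIVE_DOC",
    "urls": [
        "https://www.myurl.com/index.html",
        "https://www-myurl-com.cdn.ampproject.org/c/s/index.html",
    ],
}
payload = json.dumps(body)
```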

CURL Get download link from request and download file

I'm using the Conversocial API:
https://api-docs.conversocial.com/1.1/reports/
Using the sample from the documentation, after all the tweaks I receive this output:
{
  "report": {
    "name": "dump",
    "generation_start_date": "2012-05-30T17:09:40",
    "url": "https://api.conversocial.com/v1.1/reports/5067",
    "date_from": "2012-05-21",
    "generated_by": {
      "url": "https://api.conversocial.com/v1.1/moderators/11599",
      "id": "11599"
    },
    "generated_date": "2012-05-30T17:09:41",
    "channel": {
      "url": "https://api.conversocial.com/v1.1/channels/387",
      "id": "387"
    },
    "date_to": "2012-05-28",
    "download": "https://s3.amazonaws.com/conversocial/reports/70c68360-1234/#twitter-from-may-21-2012-to-may-28-2012.zip",
    "id": "5067"
  }
}
Currently, I can filter this JSON output down to download only and receive this output:
{
  "report": {
    "download": "https://s3.amazonaws.com/conversocial/reports/70c68360-1234/#twitter-from-may-21-2012-to-may-28-2012.zip"
  }
}
Is there any way of automating this process using curl, to make curl download this file?
To download it, I'm planning to use something simple like:
curl URL_LINK > FILEPATH/EXAMPLE.ZIP
Is there a way to replace URL_LINK with the download link, or is there any other way or method to do this?
Give this a try:
curl $(curl -s https://httpbin.org/get | jq ".url" -r) > file
Just replace your URL and the jq parameters based on your JSON; that may be:
jq ".report.download" -r
The -r flag will remove the double quotes.
It works by using command substitution $():
$(curl -s https://httpbin.org/get | jq ".url" -r)
This fetches your URL and extracts the new URL from the returned JSON using jq; the latter is then passed to curl as an argument.
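The extraction the jq filter performs can be sketched in Python as well, using the report JSON from the question (just the extraction step, not the HTTP calls):

```python
import json

# The filtered report JSON from the question, as a string
resp = json.dumps({"report": {"download":
    "https://s3.amazonaws.com/conversocial/reports/70c68360-1234/#twitter-from-may-21-2012-to-may-28-2012.zip"}})

# Equivalent of: jq ".report.download" -r
download_url = json.loads(resp)["report"]["download"]
```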

Error when posting JSON using NiFi vs. curl

I am seeing a very slight difference between how NiFi's InvokeHTTP processor POSTs JSON data and how curl does it.
The problem is that the data appears to be the same when I log it, but it renders differently.
Does anyone have any idea what could be wrong? Thank you!
CURL -- works; correct printout & render
curl -X POST -H "Content-Type: application/json" -d '{ "responseID": "a1b2c3", "responseData": { "signals": [ "a", "b", "c" ] } }' localhost:8998/userInput
WebServer app printout
responseID: a1b2c3
responseData: {signals=[a, b, c]}
Template render
NiFi -- does not work; correct printout BUT incorrect render
Generate FlowFile
UpdateAttributes
AttributesToJSON
InvokeHTTP
WebServer app printout
responseID: a1b2c3
responseData: {signals=[a, b, c]}
Template render
You need this kind of JSON:
{ "responseID": "a1b2c3", "responseData": { "signals": [ "a", "b", "c" ] } }
but in NiFi you are building this:
{ "responseID": "a1b2c3", "responseData": "{ signals=[ a, b, c ] }" }
That means you are creating responseData as just a string, "{ signals=[ a, b, c ] }", but you need an object.
In NiFi, the AttributesToJSON processor creates only a one-level object, so you can build a sequence of AttributesToJSON -> EvaluateJsonPath -> AttributesToJSON to make nested JSON objects.
Or use ExecuteScript with the JavaScript or Groovy language - both have good syntax for building JSON.
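The difference between the two payloads can be sketched like this; the assumption is that the attribute value arrives as the verbatim string shown in the printout:

```python
import json

# What the flat attribute-based flow produces: responseData is a *string*
flat = {"responseID": "a1b2c3", "responseData": "{ signals=[ a, b, c ] }"}

# What the web server expects: responseData is a nested *object*
nested = {"responseID": "a1b2c3", "responseData": {"signals": ["a", "b", "c"]}}

flat_json = json.dumps(flat)      # responseData serialized inside quotes
nested_json = json.dumps(nested)  # responseData serialized as a real object
```

Both payloads may log identically at a glance, but only the nested form deserializes to an object on the server side.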

deleteByQuery working via XML call but not JSON for multi condition query

Is Solr's delete query syntax different when passing JSON data vs. XML data? Solr's docs are rather vague. I'm using Solr 5.0.0 on Mac OS X with Java 1.8.
Here are the curl commands on my local box.
curl -v http://localhost:8983/solr/nZ/update -H "Content-Type: application/json" --data-binary '
[
  {
    "delete": {
      "query": "UserId:5629499534213120 AND SessionId:5066549580791808 AND Kind:event"
    }
  }
]
'
This outputs:
{
  "responseHeader": {
    "status": 400,
    "QTime": 2
  },
  "error": {
    "msg": "Document is missing mandatory uniqueKey field: Id",
    "code": 400
  }
}
Running it via XML works:
curl -v http://localhost:8983/solr/nZ/update -H "Content-Type: text/xml" --data-binary '
<delete>
<query>UserId:5629499534213120 AND SessionId:5066549580791808 AND Kind:event</query>
</delete>
'
This deletes the documents and outputs:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">57</int></lst>
</response>
I also queried the documents I'm trying to delete. There were only two of them, and they both had the Id field. Id is a string and the unique key for the schema. Is the query syntax for a multi-condition query different for JSON than for XML?
The JSON update format in Solr is a bit picky, and it seems you have mixed two very similar-looking structures. The outer array [] structure is for when you are just submitting documents, with no commands.
The outer object {} structure is for when you have commands and documents.
You have mixed those two. Try deleting the outer array structure and see if it helps. Note that Solr does allow/require duplicate keys within the object for this second format, which some JSON libraries can't generate; in that case, you may be limited to only one instance of the same update command per request.
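Following that advice, a sketch of the delete payload with the outer array removed (query string taken from the question):

```python
import json

# Command-style body: an outer {} object, no [] array wrapper
delete_body = json.dumps({
    "delete": {
        "query": "UserId:5629499534213120 AND SessionId:5066549580791808 AND Kind:event"
    }
})
```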