Solr Multivalued Problem - JSON

The following is the JSON response I'm getting from Solr when I use multiValued="true" for the fields:
{
"id":["1","2","3"],
"TS":["2010-06-28 00:00:00.0","2010-06-28 00:00:00.0","2010-06-28 00:00:00.0"],
"Type":["VIDEO","IMAGE","VIDEO"]
}
But I need the response to look like this:
{
"0":["1","2010-06-28 00:00:00.0","VIDEO"],
"1":["2","2010-06-28 00:00:00.0","IMAGE"],
"2":["3","2010-06-28 00:00:00.0","VIDEO"]
}
How can I get this? Any help would be appreciated. Thanks in advance.
**Update:**
Actually, at the first level it's not a problem. The problem only arises when we go more than one level deep. I'm posting the entire response here to make it clear.
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"facet":"true",
"indent":"on",
"start":"0",
"q":"laptop",
"wt":["json",
"json"],
"rows":"200"}},
"response":{"numFound":1,"start":0,"docs":[
{
"createdBy":"0",
"id":194,
"status":"ACTIVE",
"text":"Can i buy Sony laptop?",
"ansTS":["2010-07-01 00:00:00.0","2010-08-06 15:11:55.0","2010-08-11 15:28:13.0","2010-08-11 15:30:49.0","2010-08-12 01:45:48.0","2010-08-12 01:46:18.0"],
"mediaType":["VIDEO","VIDEO","VIDEO"],
"ansId":["59","76","77","78","80","81"],
"mediaId":[24,25,26]}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"catName":[]},
"facet_dates":{}}}
Look at the mediaId, mediaType and ansTS arrays. It's a one-to-many relationship, but the values are grouped by column name. Thanks in advance.

You mentioned that you will consume this JSON from a browser, so you can use jQuery or any other JavaScript library to convert the raw Solr JSON response into the structure that you need.
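A minimal sketch of that client-side conversion in plain JavaScript, assuming the parallel arrays line up by position. The helper name zipFields is hypothetical, and the sample data is the first snippet from the question.

```javascript
// Hypothetical helper: turns parallel multivalued arrays into index-keyed rows.
function zipFields(doc, fieldNames) {
  // Stop at the shortest array so a row never contains undefined entries.
  const length = Math.min(...fieldNames.map((name) => doc[name].length));
  const rows = {};
  for (let i = 0; i < length; i++) {
    rows[i] = fieldNames.map((name) => doc[name][i]);
  }
  return rows;
}

// Sample data copied from the question.
const solrDoc = {
  id: ["1", "2", "3"],
  TS: ["2010-06-28 00:00:00.0", "2010-06-28 00:00:00.0", "2010-06-28 00:00:00.0"],
  Type: ["VIDEO", "IMAGE", "VIDEO"],
};

const rows = zipFields(solrDoc, ["id", "TS", "Type"]);
console.log(JSON.stringify(rows, null, 2));
```

With the full response from the update, only arrays of equal length can be paired this way, for example zipFields(doc, ["mediaId", "mediaType"]) and zipFields(doc, ["ansId", "ansTS"]). Nothing in the response itself ties a media entry to a particular answer.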

If the first snippet is the actual Solr response you're getting, then chances are you have a bug in your feeder (connector, crawler, etc.). It looks like you have only one indexed document that matches your query, and it holds all the values that you expect from 3 documents.
Assuming you have 3 documents, analogous to your expected output, the actual Solr wt=json result would contain:
[{
"id":"1",
"TS":"2010-06-28 00:00:00.0",
"Type":"VIDEO"
},
{
"id":"2",
"TS":"2010-06-28 00:00:00.0",
"Type":"IMAGE"
},
{
"id":"3",
"TS":"2010-06-28 00:00:00.0",
"Type":"VIDEO"
}]
If this assumption is correct, then I would suggest looking over your indexing logic.

This output is produced by Solr's JSONResponseWriter. Its output can't be altered via configuration. But what you can do is create your own version of JSONResponseWriter to produce your desired output. You can register your new ResponseWriter by adding a queryResponseWriter tag in solrconfig.xml.

NiFi JsonRecordSetWriter skipping a nested field

I am pulling messages out of a Kafka topic into NiFi, and am seeing problems with the JsonRecordSetWriter not outputting a nested field. If I switch from that writer to a CSV writer I don't have the same problem, but both the Avro and XML writers have the same problem, so it's a problem with the inferred schema, I believe.
Here is my simplified input:
{
"data": [
{
"object": {
"extensions": {
"field1": "TS",
"field2": "howdy"
}
}
},
{
"object": {
"extensions": {
"field1": "TT",
"field3": "something"
}
}
}
]
}
And the output:
[ {
"data" : [ {
"object" : {
"extensions" : {
"field1" : "TS",
"field2" : "howdy"
}
}
}, {
"object" : {
"extensions" : {
"field1" : "TT",
"field2" : null
}
}
} ]
} ]
If I use the CSV writer I get field1 and field2 for the first record and field1 and field3 for the second record, so the JsonTreeReader is reading the data from Kafka correctly; it's the JsonRecordSetWriter that isn't writing it out right. It looks like the schema inference engine reads the first record in the array for its schema and then outputs based on that: field2 is output even though it doesn't exist in the second record, and field3 is ignored because it didn't exist in the first.
Any suggestions from folks who know more than I?
Thanks in advance for any assistance!
I solved my problem, but unfortunately didn't figure out why it was happening. I solved it by switching from the ConsumeKafkaRecord processor to the ConsumeKafka processor, which doesn't use the JsonTreeReader and JsonRecordSetWriter controller services to process JSON. Since the values I was pulling out of Kafka were already JSON, getting them as a string and going from there (adding a mime.type attribute of application/json to ensure they were treated correctly) worked just fine for me.
The problem could also have been solved by creating a schema which contained every field possible, but that would have resulted in a lot of null fields, as the records are sparsely populated - and it was further complicated by the fact that one of my fields started with a symbol, and NiFi uses Avro-formatted schemas everywhere, so I would have had to work around that (a bug fix in NiFi 1.7 allows it, but it has limitations also).
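To make the trade-off concrete, here is a small plain-JavaScript illustration of the two behaviours: a field list taken from the first record only, versus a field list that is the union of all records. It is a sketch of the idea, not NiFi's inference code, and the function names are hypothetical.

```javascript
// Illustration only: plain JavaScript, not NiFi's schema inference code.
function unionFieldNames(records) {
  const names = new Set();
  for (const record of records) {
    Object.keys(record).forEach((key) => names.add(key));
  }
  return [...names];
}

// Write every record against the same field list, filling gaps with null.
function applyFieldList(records, fieldNames) {
  return records.map((record) =>
    Object.fromEntries(
      fieldNames.map((name) => [name, name in record ? record[name] : null])
    )
  );
}

// The two "extensions" objects from the simplified input in the question.
const extensions = [
  { field1: "TS", field2: "howdy" },
  { field1: "TT", field3: "something" },
];

// Field list taken from the first record only: field3 is lost.
const firstOnly = applyFieldList(extensions, Object.keys(extensions[0]));
// Field list taken from all records: nothing is lost, but nulls appear.
const allFields = applyFieldList(extensions, unionFieldNames(extensions));

console.log(firstOnly);
console.log(allFields);
```

The first result reproduces the output shown in the question (field2 is null and field3 is missing on the second record); the second keeps field3 at the cost of a null in every record that lacks a field.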
So I'm on my way - not sure if this experience will help others, but if it does, great!
I had almost the same problem with a JoltTransformRecord processor and ordinary JSON. In short, the flow listed application event files in S3 and transformed the JSON of those events into 2 forms:
1 - For an athena table.
2 - For an HTTP call.
Each event file contained more than 200 events, and among them was one with a nested object called "traits" whose contents never came through (this kind of event is rare in the set, about 1 to 2% of the events in the file, but extremely important).
I had to modify the beginning of my flow to make it work.
The flow was:
ListS3 -> FetchS3Object -> JoltTransformRecord -> SplitJson -> EvaluateJsonPath
And became:
ListS3 -> FetchS3Object -> ConvertRecord -> SplitJson -> JoltTransformJson -> EvaluateJsonPath
And now it works perfectly.
#Chrick's solution helped me a lot in identifying the error and, as my problem is a little different, I hope this can help someone.

JSON Deserialization on Talend

I'm trying to figure out how to deserialize this kind of JSON with Talend components:
{
"ryan#toofr.com": {
"confidence":119,"email":"ryan#toofr.com","default":20
},
"rbuckley#toofr.com": {
"confidence":20,"email":"rbuckley#toofr.com","default":15
},
"ryan.buckley#toofr.com": {
"confidence":18,"email":"ryan.buckley#toofr.com","default":16
},
"ryanbuckley#toofr.com": {
"confidence":17,"email":"ryanbuckley#toofr.com","default":17
},
"ryan_buckley#toofr.com": {
"confidence":16,"email":"ryan_buckley#toofr.com","default":18
},
"ryan-buckley#toofr.com": {
"confidence":15,"email":"ryan-buckley#toofr.com","default":19
},
"ryanb#toofr.com": {
"confidence":14,"email":"ryanb#toofr.com","default":14
},
"buckley#toofr.com": {
"confidence":13,"email":"buckley#toofr.com","default":13
}
}
This JSON comes from the Toofr API, whose documentation can be found here.
Here is the actual situation:
For each row retrieved from the database, I call the API and get a response like this (the first name, the last name and the company change every time).
Does anyone know how to configure tExtractJSONFields (or use something else) to show the results in tLogRow (for each row in the database)?
Thank you in advance!
EDIT 1:
Here's my tExtractJSONFields:
When using tExtractJSONFields with XPath, you need
1) a valid XPath loop point
2) valid XPath mapping to your structure relative to the loop path
Also, when using XPath with Talend, every value needs a key. The key cannot change if you want to loop over it. Meaning this is invalid:
{
"ryan#toofr.com": {
"confidence":119,"email":"ryan#toofr.com","default":20
},
"rbuckley#toofr.com": {
"confidence":20,"email":"rbuckley#toofr.com","default":15
},
but this structure would be valid:
{
"contact": {
"confidence":119,"email":"ryan#toofr.com","default":20
},
"contact": {
"confidence":20,"email":"rbuckley#toofr.com","default":15
},
So with the correct data the loop point might be /contact.
Then the mapping for Confidence would be confidence (the name from the JSON), the mapping for Email would be email, and likewise for default.
EDIT
JSONPath has a few disadvantages, one of them being you cannot go higher up in the hierarchy. You can try finding out the correct query with jsonpath.com
The loop expression could be $.*. I am not sure if that will satisfy your need, though - it has been a while since I've been using JSONPath in Talend because of the downsides.
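To show what a loop over $.* would be selecting, here is a plain-JavaScript sketch that walks the values of the top-level object and ignores the changing keys. It does not use any Talend API; the function name toRows is hypothetical and the sample data is the first two entries from the question.

```javascript
// Plain JavaScript illustration; it does not use any Talend API.
function toRows(response) {
  // Object.values ignores the changing keys, much like the JSONPath $.* does.
  return Object.values(response).map((entry) => ({
    email: entry.email,
    confidence: entry.confidence,
    default: entry.default,
  }));
}

// First two entries of the JSON from the question.
const response = {
  "ryan#toofr.com": { confidence: 119, email: "ryan#toofr.com", default: 20 },
  "rbuckley#toofr.com": { confidence: 20, email: "rbuckley#toofr.com", default: 15 },
};

// One output line per entry, similar in spirit to what tLogRow would print.
for (const row of toRows(response)) {
  console.log(`${row.email}|${row.confidence}|${row.default}`);
}
```

Because each entry repeats the address in its own email field, nothing is lost by discarding the key.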
I have been ingesting some complex JSON structures, and did this via minimal JSON libraries and tJava components within Talend.

Updating JSON arrays in MarkLogic 9

I'm having trouble working out how to write a bit of XQuery. I have a JSON structure in MarkLogic that looks like:
{
"id": "pres003A10",
"title": "A Course About Something",
"description": "This course teaches people about some things they may not know.",
"author": "A.N. Author",
"updated": "2007-01-19",
"decks": [
{
"id":"really-basic-stuff",
"exclude": ["slide3", "slide12"]
},
{
"id":"cleverer-stuff",
"exclude": []
}
]
}
The exclude array contains the identifiers for slides in decks (presentations are made up of one or more decks of slides). I'm trying to write a piece of code that will look for a slide id in that exclude list and remove it if present or add it if not (a toggle).
I can obtain the array node itself using:
let $exclude := doc('/presentations/presentation.json')/object-node()/decks[id = 'markup-intro']/array-node('exclude')
but I can't for the life of me see how I then update that array to either remove an item or add one. The intention is to call a function something like:
local:toggle-slide($presentation) as object-node()
{
(: xdmp:node-update(...) goes here :)
};
So, how do I update that array?
In memory JSON node trees (and XML trees, for that matter) are immutable.
The way to modify a tree is to construct a new tree, copying the nodes that haven't changed and creating the parent node and ancestor node with the changes.
That said, there's an easier way to modify JSON. If you call xdmp:from-json() on the root node, you will get a mutable in-memory map / array structure.
You can then navigate to the array using map:get() on the maps and [ITEM_NUMBER] on the arrays, and delete or insert items on the appropriate json:array object.
When you're done, call xdmp:to-json() to turn the root map back into a node.
Hoping that helps,
If you need to update the JSON in the database, you can use xdmp:node-replace. The catch with node-replace, though, is that you have to feed it a named node. To do that, you wrap the array-node in an object-node and then grab the array-node inside the object-node on the fly. Here is a working example:
xquery version "1.0-ml";
(: insert test data :)
xdmp:document-insert("/presentations/presentation.json", xdmp:unquote('{
"id": "pres003A10",
"title": "A Course About Something",
"description": "This course teaches people about some things they may not know.",
"author": "A.N. Author",
"updated": "2007-01-19",
"decks": [
{
"id":"markup-intro",
"exclude": ["slide3", "slide12"]
},
{
"id":"cleverer-stuff",
"exclude": []
}
]
}'
))
;
(: node-replace array-node :)
let $exclude := doc('/presentations/presentation.json')/object-node()/decks[id = 'markup-intro']/array-node('exclude')
return xdmp:node-replace($exclude, object-node{
"exclude": array-node{ "other", "slides" }
}/node())
;
(: view if changed :)
doc('/presentations/presentation.json')
Note: consider looking at MarkLogic's Server-side JavaScript (SJS) support. Updating JSON might seem more natural that way, particularly if you need to make multiple changes in one go.
HTH!
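For the toggle itself, the logic can be sketched in plain JavaScript on an ordinary object. This is only the toggle step; the function name toggleSlide is hypothetical. In Server-side JavaScript one would presumably read the document into an object (something like cts.doc(uri).toObject()) and write the result back with xdmp.nodeReplace after declareUpdate(), but treat those calls as assumptions to check against the MarkLogic documentation.

```javascript
// Toggle a slide id in the "exclude" array of one deck.
// Works on a plain JavaScript object and returns a modified copy.
function toggleSlide(presentation, deckId, slideId) {
  return {
    ...presentation,
    decks: presentation.decks.map((deck) => {
      if (deck.id !== deckId) return deck;
      const exclude = deck.exclude.includes(slideId)
        ? deck.exclude.filter((id) => id !== slideId) // present: remove it
        : [...deck.exclude, slideId]; // absent: add it
      return { ...deck, exclude };
    }),
  };
}

// Trimmed-down version of the test document used above.
const presentation = {
  id: "pres003A10",
  decks: [
    { id: "markup-intro", exclude: ["slide3", "slide12"] },
    { id: "cleverer-stuff", exclude: [] },
  ],
};

const removed = toggleSlide(presentation, "markup-intro", "slide3");
const added = toggleSlide(presentation, "markup-intro", "slide7");
console.log(removed.decks[0].exclude);
console.log(added.decks[0].exclude);
```

Returning a copy rather than mutating the input keeps the function easy to test, and mirrors the point made earlier that node trees are rebuilt rather than changed in place.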

Couchbase View not returning array value

I am trying to create a view to group on a particular attribute inside an array. However, the below map function is not returning any result.
JSON Document Structure :
{
"jsontype": "survey_response",
"jsoninstance": "xyz",
"jsonlanguage": "en_us",
"jsonuser": "test#test.com",
"jsoncontact": "test#mayoclinic.com",
"pages": [
{
"q-placeholder": "q1-p1",
"q:localized": "q1-localized-p1",
"q-answer-placeholder": "jawaabu121",
"q-answer-localized": "localized jawaabu1"
},
{
"q-placeholder": "q2-p2",
"q:localized": "q2-localized-p2",
"q-answer-placeholder": "jawaabu221",
"q-answer-localized": "localized jawaabu2"
},
{
"q-placeholder": "q3-p3",
"q:localized": "q3-localized-p3",
"q-answer-placeholder": "jawaabu313",
"q-answer-localized": "localized jawaabu3"
}
]
}
Map Function :
function(doc, meta){
emit(doc.jsoninstance,[doc.pages[0].q-placeholder, doc.pages[0].q-localized,doc.pages[0].q-answer-placeholder,q-answer-localized]);
}
It looks like you made a typo at the end of your emit statement:
doc.pages[0].q-answer-placeholder,q-answer-localized.
The trailing q-answer-localized should be changed to doc.pages[0].q-answer-localized.
In addition, your emit statement refers to a field named q-localized, but according to the sample document you posted it should be q:localized. I assume the mistake is in the document snippet rather than the emit statement, but if not, the emit statement needs amending as well.
Errors like this should be flagged in the view engine's map-reduce error log. Checking that log in future will let you debug such errors yourself.
The location of the mapreduce_errors log can be found in the Couchbase documentation.
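One further point worth checking: in JavaScript a property name containing a hyphen or a colon cannot be read with dot notation. An expression such as doc.pages[0].q-placeholder is parsed as doc.pages[0].q minus a variable named placeholder, so bracket notation is needed. Below is a sketch of a corrected map function. The emit stub and the guard on pages are additions for illustration so that it can be run outside Couchbase, and the field names follow the sample document (q:localized).

```javascript
// Stand-in for the emit() that the Couchbase view engine provides; here it
// only records what was emitted so the function can be run outside Couchbase.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Corrected map function: bracket notation for every hyphenated or colon name.
function map(doc, meta) {
  // Added guard (an assumption): skip documents without a first page.
  if (!Array.isArray(doc.pages) || doc.pages.length === 0) return;
  const page = doc.pages[0];
  emit(doc.jsoninstance, [
    page["q-placeholder"],
    page["q:localized"],
    page["q-answer-placeholder"],
    page["q-answer-localized"],
  ]);
}

// Trimmed-down version of the sample document; the meta id is made up.
map(
  {
    jsontype: "survey_response",
    jsoninstance: "xyz",
    pages: [
      {
        "q-placeholder": "q1-p1",
        "q:localized": "q1-localized-p1",
        "q-answer-placeholder": "jawaabu121",
        "q-answer-localized": "localized jawaabu1",
      },
    ],
  },
  { id: "example-doc" }
);
console.log(JSON.stringify(emitted));
```

In the view definition itself only the body of map is needed, written as function (doc, meta) { ... }; the stub exists purely for local testing.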

Ember-Data: How to get properties from nested JSON

I am getting JSON returned in this format:
{
"status": "success",
"data": {
"debtor": {
"debtor_id": 1301,
"key": value,
"key": value,
"key": value
}
}
}
Somehow, my RESTAdapter needs to provide my debtor model properties from "debtor" section of the JSON.
Currently, I am getting a successful call back from the server, but a console error saying that Ember cannot find a model for "status". I can't find in the Ember Model Guide how to deal with JSON that is nested like this?
So far, I have been able to do a few simple things like extending the RESTSerializer to accept "debtor_id" as the primaryKey, and also remove the pluralization of the GET URL request... but I can't find any clear guide to reach a deeply nested JSON property.
Extending the problem detail for clarity:
I need to somehow alter the default behavior of the Adapter/Serializer, because this JSON convention is being used for many purposes other than my Ember app.
My solution thus far:
With a friend, I was able to dissect the "extract API" (thanks #lame_coder for pointing me to it).
We came up with a way to extend the serializer on a case-by-case basis, but I'm not sure if it is really an "Ember Approved" solution...
// app/serializers/debtor.js
export default DS.RESTSerializer.extend({
primaryKey: "debtor_id",
extract: function(store, type, payload, id, requestType) {
payload.data.debtor.id = payload.data.debtor.debtor_id;
return payload.data.debtor;
}
});
It seems that even though I was able to change my primaryKey for requesting data, Ember was still trying to use a hard coded ID to identify the correct record (rather than the debtor_id that I had set). So we just overwrote the extract method to force Ember to look for the correct primary key that I wanted.
Again, this works for me currently, but I have yet to see if this change will cause any problems moving forward....
I would still be looking for a different solution that might be more stable/reusable/future-proof/etc, if anyone has any insights?
From the description of the problem, it looks like your model definition and JSON structure do not match. You need to make them exactly the same for the Serializer to map them correctly.
If you decide to change your REST API, the return statement would be something like this (I am using mock data):
//your Get method on service
public object Get()
{
return new {debtor= new { debtor_id=1301,key1=value1,key2=value2}};
}
The JSON that Ember expects needs to look like this:
"debtor": {
"id": 1301,
"key": value,
"key": value,
"key": value
}
It sees the status as a model that it needs to load data for. The next problem is it needs to have "id" in there and not "debtor_id".
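If the API cannot change, the same reshaping can be done on the client. The following is a plain-JavaScript sketch, independent of Ember, of turning the wrapped payload into the shape described above. The function name normalizeDebtorPayload and the name field in the sample data are made up for illustration. A function like this could be called from a serializer hook such as the extract override shown in the question, though which hook is appropriate depends on the Ember Data version.

```javascript
// Hypothetical helper, independent of Ember: reshapes the wrapped API payload
// into the { debtor: { id, ... } } form described above.
function normalizeDebtorPayload(payload) {
  // Pull out debtor_id and keep every other property unchanged.
  const { debtor_id, ...rest } = payload.data.debtor;
  // Drop the "status" wrapper and expose the key as "id".
  return { debtor: { id: debtor_id, ...rest } };
}

// Sample payload in the question's format; the "name" field is invented.
const apiResponse = {
  status: "success",
  data: { debtor: { debtor_id: 1301, name: "Example Debtor" } },
};

console.log(JSON.stringify(normalizeDebtorPayload(apiResponse)));
```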
If you need to return several objects you would do this:
"debtors": [{
"id": 1301,
"key": value,
"key": value,
"key": value
},{
"id": 1302,
"key": value,
"key": value,
"key": value
}]
Make sense?