extract-document-data comes as xml string element in json output

extract-document-data comes as xml string element in json output - json

I am trying to enrich my search results with some elements taken from the "matching" documents, using the query option "extract-document-data" like
<options xmlns="http://marklogic.com/appservices/search">
<extract-document-data selected="include">
<extract-path>/language-version/language-version-canonical-model/title</extract-path>
<extract-path>/language-version/language-version-canonical-model/language</extract-path>
</extract-document-data>
(...)
</options>
When I run the search and I ask for Json output (using the header Accept: application/json) I get as mix of json and "strinxml" as a result:
{
"snippet-format": "snippet",
"total": 564,
"start": 1,
"page-length": 10,
"selected": "include",
"results": [
{
"index": 1,
"uri": "ENV/CHEM/NANO(2015)22/ANN5/2",
"path": "fn:doc(\"ENV/CHEM/NANO(2015)22/ANN5/2\")",
(...)
"matches": [
{
"path": "fn:doc(\"ENV/CHEM/NANO(2015)22/ANN5/2\")/ns2:language-version/ns2:language-version-raw-data/*:document/*:page[22]",
(...)
}
],
"extracted": {
"kind": "element",
"content": [
"<language>En</language>",
"<title>ZINC OXIDE DOSSIERANNEX 5</title>",
"<reference>ENV/CHEM/NANO(2015)22/ANN5</reference>",
"<classification>2</classification>",
"<modificationDate>2015-04-16T00:00:00.000+02:00</modificationDate>",
"<subject label_en=\"media\" >media</subject>",
"<subject label_en=\"fish\" ">fish</subject>",
]
}
},
The problem here is with the "extracted" part, as you can see, it looks like the xml elements have been simply copied as string, when I would really expect them to be converted to json.
Does anybody have an idea about this problem?

MarkLogic won’t convert content. So, XML will remain XML when asking for JSON formatted search response. And since you can't really embed XML inside JSON, it gets serialized as a string.
You could try applying a REST transform on your search results, and use something like json:transform-to-json (probably with the custom config) to convert those on the fly. For instance something like this Server-side JavaScript transform:
/* jshint node:true,esnext:true */
/* global xdmp */
var json = require('/MarkLogic/json/json.xqy');
var config = json.config('custom');
function toJson(context, params, content) {
'use strict';
var response = content.toObject();
if (response.results) {
response.results.map(function(result) {
if (result.extracted && result.extracted.content) {
result.extracted.content.map(function(content, index) {
if (content.match(/^</) && !content.match(/^<!/)) {
result.extracted.content[index] = json.transformToJson(xdmp.unquote(content), config);
}
});
}
});
}
return response;
}
exports.transform = toJson;
You could also convert client-side of course.
HTH!

If you're using the Java Client API you can use the correct handle for each result to read the extracted items (see ExtractedResult and ExtractedItem):
SearchHandle results = queryManager.search(query, new SearchHandle());
for (MatchDocumentSummary summary : results.getMatchResults()) {
ExtractedResult extracted = summary.getExtracted();
// here we check to see if this result is XML format, and if so
// we use org.w3c.dom.Document
if ( Format.XML == summary.getFormat() ) {
for (ExtractedItem item : extracted) {
Document extractItem = item.getAs(new DOMHandle()).get();
...
}
// or if the result is JSON we could choose a different handle
} else if ( Format.JSON == summary.getFormat() ) {
for (ExtractedItem item : extracted) {
JsonNode extractItem = item.getAs(JsonNode.class);
...
}
}
}

Related

Remove a specific JSONObject from JSONArray in groovy

Say I have a JSON request payload like
{
"workflow": {
"approvalStore": {
"sessionInfo": {
"user": "baduser"
},
"guardType": "Transaction"
}
}
}
I get the value of user via
def user = req.get("workflow").get("approvalStore").get("sessionInfo").get("user")
Now, I get a RestResponse approvalList which I store as list and return to caller as return approvalList.json as JSON. All well so far.
Suppose the response (approvalList.json) looks like below JSONArray -
[
{
"objId": "abc2",
"maker": "baduser"
},
{
"objId": "abc1",
"maker": "baduser"
},
{
"objId": "abc4",
"maker": "gooduser"
}
]
Question : How may I filter the approvalList.json so that it doesn't contain entries (objects) that have "maker": "baduser" ? The value passed to maker should essentially be the user variable I got earlier.
Ideal required output -

It's not entirely clear if you always want a single object returned or a list of objects but using collect is going to be the key here:
// given this list
List approvalList = [
[objId: "abc2", maker: "baduser"],
[objId: "abc1", maker: "baduser"],
[objId: "abc4", maker: "gooduser"]
]
// you mentioned you wanted to match a specific user
String user = "baduser"
List filteredList = approvalList.findAll{ it.maker != user}
// wasn't sure if you wanted a single object or a list...
if (filteredList.size() == 1) {
return filteredList[0] as JSON
} else {
return filteredList as JSON
}

Pretty simple. First parse the JSON into an object, then walk through and test.
JSONObject json = JSON.parse(text)
json.each(){ it ->
it.each(){ k,v ->
if(v=='baduser'){
// throw exception or something
}
}
}

Unable to parse JSON data due to errors or undefined data

I apologize if this seems similar to other questions asked but I have not been able to find any posts that have resolved this issue for me. Basically, I am getting a JSON object and I am trying to parse it but I can't parse it correctly. Mainly the WordDetails section that I am getting from a Word API. I am able to get everything outside the results section under WordDetails. Basically, when I get to results, I am not able to parse it correctly. Below is an example of the format.
{
"LastIndex": 133,
"SRDWords": [
{
"Domain": {
"URL": "abactinal.com",
"Available": true
},
"WordDetails": "{\"word\":\"abactinal\",\"results\":[{\"definition\":\"(of radiate animals) located on the surface or end opposite to that on which the mouth is situated\",\"partOfSpeech\":null,\"antonyms\":[\"actinal\"]}],\"syllables\":{\"count\":4,\"list\":[\"ab\",\"ac\",\"ti\",\"nal\"]}}"
},
{
"Domain": {
"URL": "aaronical.com",
"Available": true
},
"WordDetails": "{\"word\":\"aaronical\",\"syllables\":{\"count\":4,\"list\":[\"aa\",\"ron\",\"i\",\"cal\"]},\"pronunciation\":{\"all\":\"ɜ'rɑnɪkəl\"}}"
},
...
Here is my code below. Basically, I am getting to the results section of WordDetails but if I try to parse the results section it fails and if I try object.entries on it, it will not return a response according to the alert messages I used. I know there must be a better way but not sure what. Most articles say just JSON.parse then map it but that does not work. Any help would be appreciated!
data.Words.map(word => {
//get data
for (let [key, value] of Object.entries(word)) {
if (key === "Domain") {
url = value.URL;
availability = value.Available;
} else if (key.trim() === "WordDetails") {
alert("value " + value);
wDetails = JSON.parse(value);
for (let [key2, value2] of Object.entries(wDetails)) {
if (key2 === "word") {
//store word
} else if (key2.toString().trim() === "results") {
let test = JSON.parse(value2);
test = Object.entries(value2);
test.map(t => {
alert(t.definition);
});
}
}
}
}
});

You did JSON.parse above, no need to parse value2 again.
And value for results is an array, so no need for Object.entries.
...
} else if (key2.toString().trim() === 'results') {
let test = JSON.parse(value2); // this should be remove
test = Object.entries(value2); // this should be remove, value2 should be an array
// map value2 directly
value2.map(t => {
alert(t.definition);
});
}
...

How to access data inside a complex JSON object in Dart?

I use a WebSocket to communicate to a server in my Flutter app. Let's say I receive a JSON object trough the WebSocket :
{
"action": "getProduct",
"cbackid": 1521474231306,
"datas": {
"product": {
"Actif": 1,
"AfficheQte": 0,
"Article": "6"
},
"result": "success"
},
"deviceID": "4340a8fdc126bb59"
}
I have no idea what the content of datas will be until I read the action, and even then, it's not guaranteed to be the same every time. One example of a changing action/datas is when the product doesn't exist.
I can parse it in a Map<String, Object>, but then, how do I access what's inside the Object?
What's the correct way to read this data?

Not sure what the question is about, but you can check the type of the values and then continue accordingly
if(json['action'] == 'getProduct') {
var datas = json['datas'];
if(datas is List) {
var items = datas as List;
for(var item in items) {
print('list item: $item');
}
} else if (datas is Map) {
var items = datas as Map;
for(var key in items.keys) {
print('map item: $key, ${items[key]}');
}
} else if(datas is String) {
print('datas: $datas');
} // ... similar for all other possible types like `int`, `double`, `bool`, ...
}
You also can make that recursive to check list or map values if they are String, ...

Creating CSV view from CouchDB

I know this should be easy, but I just can't work out how to do it despite having spent several hours looking at it today. There doesn't appear to be a straightforward example or tutorial online as far as I can tell.
I've got several "tables" of documents in a CouchDB database, with each "table" having a different value in a "schema" field in the document. All documents with the same schema contain an identical set of fields. All I want to do is be able to view the different "tables" in CSV format, and I don't want to have to specify the list of fieldnames in each schema.
The CSV output is going to be consumed by an R script, so I don't want any additional headers in the output if I can avoid them; just the list of fieldnames, comma separated, with the values in CSV format.
For example, two records in the "table1" format might look like:
{
"schema": "table1",
"field1": 17,
"field2": "abc",
...
"fieldN": "abc",
"timestamp": "2012-03-30T18:00:00Z"
}
and
{
"schema": "table1",
"field1": 193,
"field2": "xyz",
...
"fieldN": "ijk",
"timestamp": "2012-03-30T19:01:00Z"
}
My view is pretty simple:
"all": "function(doc) {
if (doc.schema == "table1") {
emit(doc.timestamp, doc)
}
}"
as I want to sort my records in timestamp order.
Presumably the list function will be something like:
"csv": "function(head, req) {
var row;
...
// Something here to iterate through the list of fieldnames and print them
// comma separated
for (row in getRow) {
// Something here to iterate through each row and print the field values
// comma separated
}
}"
but I just can't get my head around the rest of it.
If I want to get CSV output looking like
"timestamp", "field1", "field2", ..., "fieldN"
"2012-03-30T18:00:00Z", 17, "abc", ..., "abc"
"2012-03-30T19:01:00Z", 193, "xyz", ..., "ijk"
what should my CouchDB list function look like?
Thanks in advance

The list function that works with your given map should look something like this:
function(head,req) {
var headers;
start({'headers':{'Content-Type' : 'text/csv; charset=utf-8; header=present'}});
while(r = getRow()) {
if(!headers) {
headers = Object.keys(r.value);
send('"' + headers.join('","') + '"\n');
}
headers.forEach(function(v,i) {
send(String(r.value[v]).replace(/\"/g,'""').replace(/^|$/g,'"'));
(i + 1 < headers.length) ? send(',') : send('\n');
});
}
}
Unlike Ryan's suggestion, the fields to include in the list are not configurable in this function, and any changes in order or included fields would have to be written in. You would also have to rewrite any quoting logic needed.

Here some generic code that Max Ogden has written. While it is in node-couchapp form, you probably can get the idea:
var couchapp = require('couchapp')
, path = require('path')
;
ddoc = { _id:'_design/csvexport' };
ddoc.views = {
headers: {
map: function(doc) {
var keys = [];
for (var key in doc) {
emit(key, 1);
}
},
reduce: "_sum"
}
};
ddoc.lists = {
/**
* Generates a CSV from all the rows in the view.
*
* Takes in a url encoded array of headers as an argument. You can
* generate this by querying /_list/urlencode/headers. Pass it in
* as the headers get parameter, e.g.: ?headers=%5B%22_id%22%2C%22_rev%5D
*
* #author Max Ogden
*/
csv: function(head, req) {
if ('headers' in req.query) {
var headers = JSON.parse(unescape(req.query.headers));
var row, sep = '\n', headerSent = false, startedOutput = false;
start({"headers":{"Content-Type" : "text/csv; charset=utf-8"}});
send('"' + headers.join('","') + '"\n');
while (row = getRow()) {
for (var header in headers) {
if (row.value[headers[header]]) {
if (startedOutput) send(",");
var value = row.value[headers[header]];
if (typeof(value) == "object") value = JSON.stringify(value);
if (typeof(value) == "string") value = value.replace(/\"/g, '""');
send("\"" + value + "\"");
} else {
if (startedOutput) send(",");
}
startedOutput = true;
}
startedOutput = false;
send('\n');
}
} else {
send("You must pass in the urlencoded headers you wish to build the CSV from. Query /_list/urlencode/headers?group=true");
}
}
}
module.exports = ddoc;
Source:
https://github.com/kanso/kanso/issues/336

Bitly, Json, and C#

I'm working on something that involved using the Bit.ly API, and allow the user to select theformat (Text, XML, Json) the text & XML are completed. This is the Json result that is returned when you shorten a URL:
{
"status_code": 200,
"status_txt": "OK",
"data":
{
"long_url": "http:\/\/panel.aspnix.com\/Default.aspx?pid={Removed}",
"url": "http:\/\/rlm.cc\/gtYUEd",
"hash": "gtYUEd",
"global_hash": "evz3Za",
"new_hash": 0
}
}
And this C# code works just fine to parse it and get the short URL:
var serializer2 = new JavaScriptSerializer();
var values2 = serializer2.Deserialize<IDictionary<string, object>>(json);
var results2 = values2["data"] as IDictionary<string, object>;
var shortUrl2 = results2["url"];
expandedUrl = results2["url"].ToString();
return results2["url"].ToString();
Now here's the Json sent back when expanding a URL:
{
"status_code": 200,
"status_txt": "OK",
"data":
{
"expand":
[
{
"short_url": "http:\/\/rlm.cc\/gtYUEd",
"long_url": "http:\/\/panel.aspnix.com\/Default.aspx?pid={Removed}",
"user_hash": "gtYUEd",
"global_hash": "evz3Za"
}
]
}
}
Ad that's where my problem begins, how can I change my current C# to be able to handle both scenarios, because as you can see their vastly different from each other. Any ideas?

I usually use Json.NET to cherrypick values out of JSON documents. The syntax is very concise. If you reference NewtonSoft.Json.dll and use Newtonsoft.Json.Linq, you can write the following:
var job = JObject.Parse(jsonString);
if (job["data"]["expand"] == null)
{
Console.WriteLine((string)job["data"]["url"]);
}
else
{
Console.WriteLine((string)job["data"]["expand"][0]["long_url"]);
}
If jsonString is:
string jsonString = #"{""status_code"": 200, ""status_txt"": ""OK"", ""data"": {""long_url"": ""http:\/\/panel.aspnix.com\/Default.aspx?pid={Removed}"", ""url"": ""http:\/\/rlm.cc\/gtYUEd"", ""hash"": ""gtYUEd"", ""global_hash"": ""evz3Za"", ""new_hash"": 0 }}";
the routine will display http://rlm.cc/gtYUEd.
If jsonString is:
string jsonString = #"{""status_code"": 200, ""status_txt"": ""OK"", ""data"": { ""expand"": [ { ""short_url"": ""http:\/\/rlm.cc\/gtYUEd"", ""long_url"": ""http:\/\/panel.aspnix.com\/Default.aspx?pid={Removed}"", ""user_hash"": ""gtYUEd"", ""global_hash"": ""evz3Za"" } ] } }";
the routine will display http://panel.aspnix.com/Default.aspx?pid={Removed}.

Not sure I got your problem. Why aren't you testing, if you got a shortening result or a expanding result? Since they are different, this could easily be done via simple 'if ()' statements:
if (results2.ContainsKey("expand")) {
// handle the expand part
} else {
// handle the shorten part
}

Assuming that the provider is consistent with which form it sends, do you need to have code that handles both? It should be direct to handle each individually.
If you can't know ahead of time which format you will get back, you can do the following:
if (results2.ContainsKey("expand"))
{
//Second example
}
else
{
//First example
}

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

extract-document-data comes as xml string element in json output - json

Related

Remove a specific JSONObject from JSONArray in groovy

Unable to parse JSON data due to errors or undefined data

How to access data inside a complex JSON object in Dart?

Creating CSV view from CouchDB

Bitly, Json, and C#

Categories

Resources