How to do fulltext search on the below document with ArangoDB?

{
  "rootElement": {
    "names": {
      "name": [
        "Haseb",
        "Anil",
        "Ajinkya",
        {
          "city": "mumbai",
          "state": "maharashtra",
          "job": {
            "second": "bosch",
            "first": "infosys"
          }
        }
      ]
    },
    "places": {
      "place": {
        "origin": "INDIA",
        "current": "GERMANY"
      }
    }
  }
}
If I have a document like the example above and I want to search for a value like "mumbai" or "infosys", how would I do the indexing and the search?

As we already discussed in other questions, a fulltext index can only cover one field in the document.
How about putting a yaml dump of the whole structure into another attribute and indexing that?
So, let's say that parallel to rootElement you add a wordTokens attribute with that dump, and put a fulltext index on it.
You would probably want to use some regular expressions to strip the yaml keywords from the dump, and since you don't need to be able to de-serialize it again, you can remove unneeded whitespace and line breaks too.
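A minimal arangosh sketch of that approach might look like the following; the collection name docs and the attribute name wordTokens are assumptions, not fixed names.

// assumption: each document in the "docs" collection carries a flattened
// "wordTokens" string built from the yaml dump described above
var docs = db._collection("docs");
// put a fulltext index on the token attribute
docs.ensureIndex({ type: "fulltext", fields: ["wordTokens"] });
// search for a single token such as "mumbai" or "infosys"
db._query('FOR d IN FULLTEXT(docs, "wordTokens", "mumbai") RETURN d').toArray();

The FULLTEXT() AQL function returns every document whose indexed attribute contains the given word, so as long as "mumbai" and "infosys" survive the dump as separate tokens, both searches run against the single wordTokens attribute.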

Related

How to remove one property from json input message using replace() expression in azure logic app?

{
  "metadata": {
    "id": "2",
    "uri": "3",
    "type": "2"
  },
  "Number": "2323600002913",
  "Date": "04/21/2009",
  "postingDate": "00/00/0000",
  "ata": {
    "results": [
      {
        "metadata": {
          "id": "r",
          "uri": "e2",
          "type": "s2"
        },
        "item": "000010",
        "data": "ad"
      }
    ]
  }
}
I want to remove the metadata property from the above json message, and the output should look like below:
{
  "Number": "2323600002913",
  "Date": "04/21/2009",
  "postingDate": "00/00/0000",
  "ata": {
    "results": [
      {
        "item": "000010",
        "data": "ad"
      }
    ]
  }
}
I tried removeProperty(), which works for the root-level metadata, but the metadata inside is not removed.
How can I use replace() in this case, or anything else, to remove only metadata?
The simplest way is to use inline code, because even with the removeProperty() expression to remove the metadata under results, it will return only the results array, not the whole json data; you would then have to combine them again, which is not convenient.
With inline code, the json variable holds the value from the trigger body; just delete the node or key and return the json variable. This way, even if you want to delete many metadata entries in the array, you can add a for loop to delete them, treating it as plain js code; see the sketch below.
Update: if you want to get the value from a variable, there is no supported expression to read it from inside the inline code, so use the expression below.
var json = workflowContext.actions.Initialize_variable.inputs.variables[0].value;
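A sketch of the inline-code action, following the sample json above (no error handling; the ata and results names are taken from the question):

// read the json from the trigger body
var json = workflowContext.trigger.outputs.body;
// remove the root-level metadata
delete json.metadata;
// loop over the results array and remove the nested metadata entries
for (var i = 0; i < json.ata.results.length; i++) {
    delete json.ata.results[i].metadata;
}
return json;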

Logstash json field removal

We have a heavily nested json document containing server metrics. The document contains more than 1000 fields, some of which are completely irrelevant to us for analytic purposes, so I would like to remove them before indexing the document in Elastic.
However, I am unable to find the correct filter to use, as the fields I want to remove have common names in multiple different objects within the document.
The source document looks like this (reduced in size for brevity):
[
  {
    "server": {
      "is_master": true,
      "name": "MYServer",
      "id": 2111
    },
    "metrics": {
      "Server": {
        "time": {
          "boundary": {},
          "type": "TEXT",
          "display_name": "Time",
          "value": "2018-11-01 14:57:52"
        }
      },
      "Mem_OldGen": {
        "used": {
          "boundary": {},
          "display_name": "Used(mb)",
          "value": 687
        },
        "committed": {
          "boundary": {},
          "display_name": "Committed(mb)",
          "value": 7116
        },
        "cpu_count": {
          "boundary": {},
          "display_name": "Cores",
          "value": 4
        }
      }
    }
  }
]
The data is loaded into logstash using the http_poller input plugin and needs to be processed before being sent to Elastic for indexing.
I am trying to remove the fields that are not relevant for us to track for analytical purposes; these include the "display_name" and "boundary" fields from each json object in the different metrics.
I have tried using the mutate filter to remove the fields, but because they exist in so many different objects it would require too many hard-coded paths in the logstash config.
I have also looked at the ruby filter, which seems promising as it can walk the event, but I am unable to get it to crawl the entire json document or, more importantly, actually remove the fields.
Here is what I was trying as a test:
filter {
  split {
    field => "message"
  }
  ruby {
    code => '
      event.get("[metrics][Mem_OldGen][used]").to_hash.keys.each { |k|
        logger.info("field is:", k)
        if k.include?("display_name")
          event.remove(k)
        end
        if k.include?("boundary")
          event.remove(k)
        end
      }
    '
  }
}
It first splits the input at the message level to create one event per server, then tries to remove the fields from a specific metric.
Any help would be greatly appreciated.
If I get the point, you want to keep just the value key.
So, considering the response hash:
response = {
  "server": {
    "is_master": true,
    "name": "MYServer",
    "id": 2111
  },
  "metrics": {
  ...
You could do:
response[:metrics].transform_values { |hh| hh.transform_values { |h| h.delete_if { |k,v| k != :value } } }
#=> {:server=>{:is_master=>true, :name=>"MYServer", :id=>2111}, :metrics=>{:Server=>{:time=>{:value=>"2018-11-01 14:57:52"}}, :Mem_OldGen=>{:used=>{:value=>687}, :committed=>{:value=>7116}, :cpu_count=>{:value=>4}}}}
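To apply the same idea inside logstash itself, one option might be a ruby filter that prunes the hash and writes it back onto the event; this is a sketch assuming the metrics sit under a top-level metrics field after the split:

ruby {
  code => '
    metrics = event.get("metrics")
    if metrics.is_a?(Hash)
      # inside every metric, keep only the "value" key of each field
      metrics.each_value { |fields|
        fields.each_value { |h| h.keep_if { |k, _| k == "value" } if h.is_a?(Hash) }
      }
      event.set("metrics", metrics)
    end
  '
}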

Elastic Search + JSON import (ELK Stack)

I'm currently trying to do a basic JSON file import into my ELK stack. I tried importing it directly via a POST request like this:
curl -XPOST http://localhost:9200/kwd_results/TS_Cart -d @/home/local/TS_Cart.json
ES says ok for the import, but when I try to view the logs in Kibana, they are not indexed by the nodes of the JSON file. I'm guessing I need something like a template mapping to view it properly.
My JSON file looks like this:
{
  "testResults": {
    "FitNesseVersion": "v20160618",
    "rootPath": "K1System.CountryDe.DriverFirefox.TestCases.MainFolder.TestVariants.SmokeTests_B2C.TS_Cart",
    "result": [
      {
        "counts": {
          "right": "16",
          "wrong": "2",
          "ignores": "3",
          "exceptions": "1"
        },
        "date": "2017-05-10T00:01:11+02:00",
        "runTimeInMillis": "117242",
        "relativePageName": "TestCase_1",
        "pageHistoryLink": "K1System.CountryDe.DriverFirefox.TestCases.MainFolder.TestVariants.SmokeTests_B2C.TS_Cart.B2CFreeCatalogueOrder?pageHistory&resultDate=20170510000111",
        "tags": "de, at"
      },
      {
        "counts": {
          "right": "16",
          "wrong": "0",
          "ignores": "0",
          "exceptions": "0"
        },
        "date": "2017-05-10T00:03:08+02:00",
        "runTimeInMillis": "85680",
        "relativePageName": "TestCase_2",
        "pageHistoryLink": "K1System.CountryDe.DriverFirefox.TestCases.MainFolder.TestVariants.SmokeTests_B2C.TS_Cart.B2CGiftCardOrderWithAdvancePayment?pageHistory&resultDate=20170510000308",
        "tags": "at, de"
      }
    ],
    "finalCounts": {
      "right": "4",
      "wrong": "1",
      "ignores": "0",
      "exceptions": "0"
    },
    "totalRunTimeInMillis": "482346"
  }
}
Basically I would need rootPath to be used as an index, while having the following children: counts, relativePageName, date and tags. Notice that I have two nodes that are children of the result[] array.
Any help would be greatly appreciated!
Thank you.
Well, it's one JSON document, so Elasticsearch treats it as such.
You'll need to (programmatically) split up the document into the right documents, and then you can store them (potentially with one _bulk request).
For the index name:
It must be lowercase, so you'll need to lowercase that value.
Will you have many different root paths with just a few docs each? Then you shouldn't turn each of them into an index, since there is an overhead for each one (actually for the underlying shards).
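A rough Node.js sketch of the split-and-bulk idea; the index name derived from rootPath, the result type name, and the field selection are all assumptions:

// split testResults.result into one document per test case and bulk-index them
const fs = require('fs');

const root = JSON.parse(fs.readFileSync('/home/local/TS_Cart.json', 'utf8')).testResults;
// index names must be lowercase
const index = root.rootPath.toLowerCase();

// the _bulk body is newline-delimited json: one action line, one source line per doc
const bulkBody = root.result.map(r => {
  const action = JSON.stringify({ index: { _index: index, _type: 'result' } }); // _type only on older ES versions
  const source = JSON.stringify({
    rootPath: root.rootPath,
    relativePageName: r.relativePageName,
    date: r.date,
    counts: r.counts,
    tags: r.tags
  });
  return action + '\n' + source;
}).join('\n') + '\n';

// send everything in one request (global fetch is available in Node 18+)
fetch('http://localhost:9200/_bulk', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-ndjson' },
  body: bulkBody
}).then(res => res.json()).then(console.log);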

JSON Query for import.io

I'm using import.io and trying to figure out how to write code that uses multiple inputs to run a connector query. I've never used JSON before, but essentially what I'm trying to do is to expand this existing query:
{
  "input": {
    "name": "Marin Academy",
    "city": "San Rafael"
  }
}
to include multiple names, so that when I run the query, import.io automatically searches a list of organizations. What would be the correct syntax to achieve this?
I tried
{
  "input": {
    "name": "Marin Academy",
    "city": "San Rafael"
  },
  {
    "name": "Mt. Hood Community College",
    "city": "Gresham"
  }
}
But it gives me a syntax error.
Thanks!
In the case of an array, you need square brackets around the list of curly-bracketed objects, like...
{
  "input": [
    {
      "name": "Marin Academy",
      "city": "San Rafael"
    },
    {
      "name": "Mt. Hood Community College",
      "city": "Gresham"
    }
  ]
}
This will solve the JSON syntax error at least.

Elasticsearch mapping of nested structure

I'm looking for some pointers on mapping a somewhat dynamic structure for consumption by Elasticsearch.
The raw structure itself is json, but the problem is that a portion of the structure is keyed by variable values rather than static outer element names.
To provide a somewhat redacted example, my json looks like this:
"stat": {
"state": "valid",
"duration": 5,
},
"12345-abc": {
"content_length": 5,
"version": 2
}
"54321-xyz": {
"content_length": 2,
"version", 1
}
The first block is easy; Elasticsearch does a great job of mapping the "stat" portion of the structure, and if I were to dump a lot of that data into an index it would work as expected. The problem is that the next 2 blocks are essentially the same thing, but the raw json is formatted in such a way that a unique element has crept into the structure, and Elasticsearch wants to map that by default, generating a mapping that looks like this:
"stat": {
"properties": {
"state": {
"type": "string"
},
"duration": {
"type": "double"
}
}
},
"12345-abc": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
},
"54321-xyz": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
}
I'd like the ability to index all of the "content_length" data, but it's getting separated, and with some of the variable names being used, when I drop the data into Kibana I wind up with really long field names that become next to useless.
Is it possible to provide a generic tag to the structure? Or is this more easily addressed at the json generation phase, with our developers hard-coding a generic structure name and adding an identifier field?
Any insight / help greatly appreciated.
Thanks!
If those keys like 12345-abc are generated and can take potentially infinite values, it will get hard (if not impossible) to run useful queries or aggregations. It's not really clear which exact use case you have for analyzing your data, but you should probably have a look at nested objects (https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html) and generate your input json according to what you want to query for. It seems that you will get better aggregation results if you put these additional objects into an array with a special field containing what is currently your key:
{
  "stat": ...,
  "things": [
    {
      "thingkey": "12345-abc",
      "content_length": 5,
      "version": 2
    },
    ...
  ]
}
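A hedged sketch of the matching index mapping; the type name doc is an assumption and the exact field types vary by Elasticsearch version, but nested is the important part:

{
  "mappings": {
    "doc": {
      "properties": {
        "things": {
          "type": "nested",
          "properties": {
            "thingkey": { "type": "keyword" },
            "content_length": { "type": "long" },
            "version": { "type": "long" }
          }
        }
      }
    }
  }
}

With a nested mapping, queries and aggregations can match thingkey and content_length from the same array entry instead of mixing values across entries.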