How can I index JSON in Elasticsearch?

I am starting with Elasticsearch now and I don't know anything about it.
I have the following JSON:
[
  {
    "label": "Admin Law",
    "tags": [
      "#admin"
    ],
    "owner": "generalTopicTagText"
  },
  {
    "label": "Judicial review",
    "tags": [
      "#JR"
    ],
    "owner": "generalTopicTagText"
  },
  {
    "label": "Admiralty/Shipping",
    "tags": [
      "#shipping"
    ],
    "owner": "generalTopicTagText"
  }
]
My mapping is this:
{
  "topic_tax": {
    "properties": {
      "label": {
        "type": "string",
        "index": "not_analyzed"
      },
      "tags": {
        "type": "string",
        "index_name": "tag"
      },
      "owner": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
I need to put the first JSON into Elasticsearch, but it does not work.
All I know is that I am defining only one of these:
{
  "label": "Judicial review",
  "tags": [
    "#JR"
  ],
  "owner": "generalTopicTagText"
}
So when I try to put all of them with my elasticsearch.init, it will not work.
But I really don't know how to declare the mapping.json so it can take the whole JSON array; it feels like I need something like a for loop there.

You have to insert them one JSON document after another. But what you should do is use the bulk API of Elasticsearch to index multiple documents in one request. Check the bulk API documentation to see how it works.
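For reference, here is a minimal sketch of such a bulk request (the index name topics is an assumption; the type topic_tax comes from the mapping above). Each action line is followed by the document source on its own line, and the body must end with a newline:

curl -XPOST 'localhost:9200/topics/topic_tax/_bulk' --data-binary '
{ "index": {} }
{ "label": "Admin Law", "tags": ["#admin"], "owner": "generalTopicTagText" }
{ "index": {} }
{ "label": "Judicial review", "tags": ["#JR"], "owner": "generalTopicTagText" }
{ "index": {} }
{ "label": "Admiralty/Shipping", "tags": ["#shipping"], "owner": "generalTopicTagText" }
'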

You can do something like this to index a single document (the default Elasticsearch port is 9200):
curl -XPUT 'localhost:9200/es/post/1' -d '{
  "text": "your test message!"
}'
Here is the documentation for indexing JSON with Elasticsearch.


Delete json block with jq command

I have a JSON file with multiple domains, formatted as shown below. How can I delete the whole block for a given domain? For example, if I want to delete the whole block in the JSON for the domain domain.tld?
I tried this, but the output is an error:
jq '."http-01"."domain"[]."main"="domain.tld"' acme.json
jq: error (at acme.json:11483): Cannot iterate over null (null)
Formatting of the example file:
{
  "http-01": {
    "Account": {
      "Email": "mail@placeholder.tld",
      "Registration": {
        "body": {
          "status": "valid",
          "contact": [
            "mailto:mail@placeholder.tld"
          ]
        },
        "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/110801506"
      },
      "PrivateKey": "main_priv_key_string",
      "KeyType": "4096"
    },
    "Certificates": [
      {
        "domain": {
          "main": "www.some_domain.tld"
        },
        "certificate": "cert_string",
        "key": "key_string",
        "Store": "default"
      },
      {
        "domain": {
          "main": "some_domain.tld"
        },
        "certificate": "cert_string",
        "key": "key_string",
        "Store": "default"
      },
      {
        "domain": {
          "main": "www.some_domain2.tld"
        },
        "certificate": "cert_string",
        "key": "key_string",
        "Store": "default"
      },
      {
        "domain": {
          "main": "some_domain2.tld"
        },
        "certificate": "cert_string",
        "key": "key_string",
        "Store": "default"
      }
    ]
  }
}
To delete the domain block for "www.some_domain.tld":
jq '."http-01".Certificates |= map(select(.domain.main != "www.some_domain.tld"))' input.json
Your question is quite broad. What is a "block"?
Let's assume you want to delete, from within the object under http-01, each field that is of type array and has at index 0 an object satisfying .domain.main == "domain.tld". Then first navigate to where you want to delete from, and update it (|=) using del and select, which performs the filtered deletion.
jq '
  ."http-01" |= del(
    .[] | select(arrays[0] | objects.domain.main == "domain.tld")
  )
' acme.json
{
  "http-01": {
    "Account": {
      "Email": "email@domain.tld",
      "Registration": {
        "body": {
          "status": "valid",
          "contact": [
            "mailto:email@domain.tld"
          ]
        },
        "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/110801506"
      },
      "PrivateKey": "long_key_string",
      "KeyType": "4096"
    }
  }
}
If your "block" is deeper, go deeper before updating. If it is higher, the whole document for instance, there's no need to update, just start with del.

Json Path Read from a Kafka Message

I have a Kafka message like the one below, and I'm trying to read data from it using JSONPath. However, I'm having a challenge reading some of the attributes. Here is the sample message.
Sample 1:
{
  "header": {
    "bu": "google",
    "id": "12345",
    "bum": "google",
    "originTimestamp": "2021-10-09T15:17:09.842+00:00",
    "batchSize": "0",
    "jobType": "Batch"
  },
  "payload": {
    "derivationdetails": {
      "Id": "6783jhvvh897u31y283y",
      "itemid": "1234567",
      "batchid": 107,
      "attributes": {
        "itemid": "1234567",
        "lineNbr": "1498",
        "cat": "5929",
        "Id": "6783jhvvh897u31y283y",
        "indicator": "false",
        "subcat": "3514"
      },
      "Exception": {
        "values": [
          {
            "type": "PICK",
            "value": "blocked",
            "Reason": [
              "RULE"
            ],
            "rules": [
              "439"
            ]
          }
        ],
        "rulesBagInfo": [
          {
            "Idtype": "XXXX",
            "uniqueid": "7889423rbhevfhjaufdyeuiryeukjbdafvjd",
            "rulesMatch": [
              "439"
            ]
          }
        ]
      }
    }
  }
}
Sample 2: the same message, but note the difference in "payload":
{
  "header": {
    "bu": "google",
    "id": "12345",
    "bum": "google",
    "originTimestamp": "2021-10-09T15:17:09.842+00:00",
    "batchSize": "0",
    "jobType": "Batch"
  },
  "payload": {
    "Id": "6783jhvvh897u31y283y",
    "itemid": "1234567",
    "batchid": 107,
    "attributes": {
      "itemid": "1234567",
      "lineNbr": "1498",
      "cat": "5929",
      "Id": "6783jhvvh897u31y283y",
      "indicator": "false",
      "subcat": "3514"
    },
    "Exception": {
      "values": [
        {
          "type": "PICK",
          "value": "blocked",
          "Reason": [
            "RULE"
          ],
          "rules": [
            "439"
          ]
        }
      ],
      "rulesBagInfo": [
        {
          "Idtype": "XXXX",
          "uniqueid": "7889423rbhevfhjaufdyeuiryeukjbdafvjd",
          "rulesMatch": [
            "439"
          ]
        }
      ]
    }
  }
}
If you observe, sometimes the message has "derivationdetails" and sometimes it doesn't. But irrespective of its existence, I need to read the values of Id, itemid and batchid. I tried using
$.payload[*].id
$.payload[*].itemid
$.payload[*].batchid
But I see that batchid returns null even though it has a value in the message, and the fields under "attributes" also return null when I use the above. For fields under "attributes" I'm using this (example):
$.payload.attributes.itemId
And I'm completely blank on how to read the part below.
"Exception": {
"values": [
{
"type": "PICK",
"value": "blocked",
"Reason": [
"RULE"
],
"rules": [
"439"
]
}
],
"rulesBagInfo": [
{
"Idtype": "XXXX",
"uniqueid": "7889423rbhevfhjaufdyeuiryeukjbdafvjd",
"rulesMatch": [
"439"
]
I'm new to this and need some suggestions on how to read the attributes properly. Any help would be much appreciated. Thanks.
Use .. (recursive descent, a.k.a. deep scan; JSONPath borrows this syntax from E4X) to get the values. But note that it will return a list if there are multiple entries with the same key nested deeper.
The JSONPath expressions below will each return a list with one item, for both sample 1 and sample 2:
$.payload..attributes.Id
$.payload..attributes.itemid
$.payload..batchid
$.payload..Exception
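The individual fields inside the Exception block can be reached the same way, one level deeper. The following are sketches based on the two sample messages above; each again returns a one-item list:

$.payload..Exception.values[*].type
$.payload..Exception.values[*].value
$.payload..rulesBagInfo[*].uniqueid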

Integromat - Dynamically render spec of a collection from an rpc

I am trying to dynamically render a spec (specification) of a collection from an RPC, but I can't get it to work. I have attached the code of both the module's mappable parameters and the remote procedure's communication below.
module -> mappable parameters
[
  {
    "name": "birdId",
    "type": "select",
    "label": "Bird Name",
    "required": true,
    "options": {
      "store": "rpc://selectbird",
      "nested": [
        {
          "name": "variables",
          "type": "collection",
          "label": "Bird Variables",
          "spec": [
            "rpc://birdVariables"
          ]
        }
      ]
    }
  }
]
remote procedure -> communication
{
  "url": "/bird/get-variables",
  "method": "POST",
  "body": {
    "birdId": "{{parameters.birdId}}"
  },
  "headers": {
    "Authorization": "Apikey {{connection.apikey}}"
  },
  "response": {
    "iterate": {
      "container": "{{body.data}}"
    },
    "output": {
      "name": "{{item.name}}",
      "label": "{{item.label}}",
      "type": "{{item.type}}"
    }
  }
}
Thanks in advance.
I just tried the following and it worked. According to Integromat's docs, you can use the wrapper directive for the RPC like so:
{
  "url": "/bird/get-variables",
  "method": "POST",
  "body": {
    "birdId": "{{parameters.birdId}}"
  },
  "headers": {
    "Authorization": "Apikey {{connection.apikey}}"
  },
  "response": {
    "iterate": "{{body.data}}",
    "output": {
      "name": "{{item.name}}",
      "label": "{{item.label}}",
      "type": "{{item.type}}"
    },
    "wrapper": [
      {
        "name": "variables",
        "type": "collection",
        "label": "Bird Variables",
        "spec": "{{output}}"
      }
    ]
  }
}
Your mappable parameters would then look like:
[
  {
    "name": "birdId",
    "type": "select",
    "label": "Bird Name",
    "required": true,
    "options": {
      "store": "rpc://selectbird",
      "nested": "rpc://birdVariables"
    }
  }
]
I'm needing this myself. I'm pulling in custom fields that have different types, but I would like them all to show so the user can update custom fields, or update them when creating a contact. I'm not sure if it's best to have them all show, or to have a select drop-down and then let the user use the map for more than one.
Here is my response from a GET for custom fields. Could you show how my code should look? I got a little confused, as I usually look for adding a value in the output. And do you need two separate RPCs in Integromat? I noticed your store and nested values were different.
{
  "customFields": [
    {
      "id": "5sCdYXDx5QBau2m2BxXC",
      "name": "Your Experience",
      "fieldKey": "contact.your_experience",
      "dataType": "LARGE_TEXT",
      "position": 0
    },
    {
      "id": "RdrFtK2hIzJLmuwgBtAr",
      "name": "Assisted by",
      "fieldKey": "contact.assisted_by",
      "dataType": "MULTIPLE_OPTIONS",
      "position": 0,
      "picklistOptions": [
        "Tom",
        "Jill",
        "Rick"
      ]
    },
    {
      "id": "uyjmfZwo0PCDJKg2uqrt",
      "name": "Is contacted",
      "fieldKey": "contact.is_contacted",
      "dataType": "CHECKBOX",
      "position": 0,
      "picklistOptions": [
        "I would like to be contacted"
      ]
    }
  ]
}

Elasticsearch Custom Analyzer

I'm currently working on a search engine implementation with Scrapy as the crawler and Elasticsearch as the server. Scrapy and Elasticsearch work fine, but what I'm currently struggling with is enabling case-insensitive search with a German analyzer. I have a general structure (match_all query) like this:
"hits": {
"total": 14,
"max_score": 0.40951526,
"hits": [
{
"_index": "uni",
"_type": "items",
"_id": "AVcuHuT6qni1Wq78foIA",
"_score": 0.40951526,
"_source": {
"description": "...",
"tags": [
"...",
"..."
],
"url":"...",
"author": "...",
"content": "...",
"date": "18.09.2015",
"title": "..."
},
"highlight": {
"content": [
"...",
"...",
"..."
]
}
}
]
}
And I tried to add these settings just by
"curl -XPUT localhost:9200/uni -d '{...}'":
{
  "mappings": {
    "_source": {
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "german_lowercase"
        },
        "content": {
          "type": "string",
          "analyzer": "german_lowercase"
        },
        "description": {
          "type": "string",
          "analyzer": "german_lowercase"
        },
        "tags": {
          "type": "array",
          "analyzer": "german_lowercase"
        }
      }
    }
  },
  "settings": {
    "uni": {
      "analysis": {
        "analyzer": {
          "german_lowercase": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "german_stop",
              "german_keywords",
              "german_normalization",
              "german_stemmer"
            ]
          }
        },
        "filter": {
          "german_stop": {
            "type": "stop",
            "stopwords": "_german_"
          },
          "german_keywords": {
            "type": "keyword_marker",
            "keywords": []
          },
          "german_stemmer": {
            "type": "stemmer",
            "language": "light_german"
          }
        }
      }
    }
  }
}
I'm not sure what's going wrong; can anybody help out?
EDIT:
Elasticsearch won't allow me to put those settings into the index (it already exists), and if I try to put the mapping separately I get a "missing mapping type" exception. In the case of the settings, it fails to update the non-dynamic settings. So I'm asking for more general information on how I should update those settings/mappings in order to enable case-insensitive search (other posts describe the same problem).
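For reference, since the analyzer of an already-mapped field cannot be changed in place, the usual way out is to delete and recreate the index with the settings and mapping in one request, then re-run the crawler to reindex. A minimal sketch, reusing the analyzer definition from the question; the differences are that the settings are not wrapped in an extra "uni" key, the mapping is keyed by the type name items (taken from the hits above) rather than _source, and tags is a string field ("array" is not a mapping type; any field may hold multiple values):

curl -XDELETE 'localhost:9200/uni'
curl -XPUT 'localhost:9200/uni' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "german_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "german_stop", "german_keywords", "german_normalization", "german_stemmer"]
        }
      },
      "filter": {
        "german_stop": { "type": "stop", "stopwords": "_german_" },
        "german_keywords": { "type": "keyword_marker", "keywords": [] },
        "german_stemmer": { "type": "stemmer", "language": "light_german" }
      }
    }
  },
  "mappings": {
    "items": {
      "properties": {
        "title":       { "type": "string", "analyzer": "german_lowercase" },
        "content":     { "type": "string", "analyzer": "german_lowercase" },
        "description": { "type": "string", "analyzer": "german_lowercase" },
        "tags":        { "type": "string", "analyzer": "german_lowercase" }
      }
    }
  }
}'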

How to Index & Search Nested Json in Solr 4.9.0

I want to index and search nested JSON in Solr. Here is my JSON:
{
  "id": "44444",
  "headline": "testing US",
  "generaltags": [
    {
      "type": "person",
      "name": "Jayalalitha",
      "relevance": "0.334",
      "count": 1
    },
    {
      "type": "person",
      "name": "Kumar",
      "relevance": "0.234",
      "count": 1
    }
  ],
  "socialtags": {
    "type": "SocialTag",
    "name": "US",
    "importance": 2
  },
  "topic": {
    "type": "Topic",
    "name": "US",
    "score": "0.936"
  }
}
When I try to index it, I get the error "Error parsing JSON field value. Unexpected OBJECT_START".
When we tried to use a multiValued field and index, we couldn't search using the multivalued field; it returns "undefined field".
Also, please advise if I need to make any changes in the schema.xml file.
You are nesting child documents within your document. You need to use the proper syntax for nested child documents in JSON:
[
  {
    "id": "1",
    "title": "Solr adds block join support",
    "content_type": "parentDocument",
    "_childDocuments_": [
      {
        "id": "2",
        "comments": "SolrCloud supports it too!"
      }
    ]
  },
  {
    "id": "3",
    "title": "Lucene and Solr 4.5 is out",
    "content_type": "parentDocument",
    "_childDocuments_": [
      {
        "id": "4",
        "comments": "Lots of new features"
      }
    ]
  }
]
Have a look at this article, which describes JSON child documents and block joins.
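To then search across the nested documents, a block join query parser is typically used; here is a sketch against the example documents above:

q={!parent which="content_type:parentDocument"}comments:SolrCloud

This returns the parent documents whose child documents match comments:SolrCloud; the {!child of="content_type:parentDocument"} parser works in the opposite direction, returning the children of matching parents.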
Using the format mentioned by @qux, you will face "Expected: OBJECT_START but got ARRAY_START at [16]" with "code": 400, since JSON starting with [...] is parsed as a JSON array.
{
  "id": "44444",
  "headline": "testing US",
  "generaltags": [
    {
      "type": "person",
      "name": "Jayalalitha",
      "relevance": "0.334",
      "count": 1
    },
    {
      "type": "person",
      "name": "Kumar",
      "relevance": "0.234",
      "count": 1
    }
  ],
  "socialtags": {
    "type": "SocialTag",
    "name": "US",
    "importance": 2
  },
  "topic": {
    "type": "Topic",
    "name": "US",
    "score": "0.936"
  }
}
The above format is correct.
Regarding searching: kindly use the index to search for the elements of the JSON array.
A workaround can be to keep the whole JSON object inside another JSON object and then index that.
I was suggesting keeping the whole data inside another JSON object. You can try it the following way:
{
  "data": [
    {
      "id": "44444",
      "headline": "testing US",
      "generaltags": [
        {
          "type": "person",
          "name": "Jayalalitha",
          "relevance": "0.334",
          "count": 1
        },
        {
          "type": "person",
          "name": "Kumar",
          "relevance": "0.234",
          "count": 1
        }
      ],
      "socialtags": {
        "type": "SocialTag",
        "name": "US",
        "importance": 2
      },
      "topic": {
        "type": "Topic",
        "name": "US",
        "score": "0.936"
      }
    }
  ]
}
See the syntax in http://yonik.com/solr-nested-objects/:
$ curl http://localhost:8983/solr/demo/update?commitWithin=3000 -d '
[
  {
    id: book1, type_s: book, title_t: "The Way of Kings", author_s: "Brandon Sanderson",
    cat_s: fantasy, pubyear_i: 2010, publisher_s: Tor,
    _childDocuments_: [
      {
        id: book1_c1, type_s: review, review_dt: "2015-01-03T14:30:00Z",
        stars_i: 5, author_s: yonik,
        comment_t: "A great start to what looks like an epic series!"
      },
      {
        id: book1_c2, type_s: review, review_dt: "2014-03-15T12:00:00Z",
        stars_i: 3, author_s: dan,
        comment_t: "This book was too long."
      }
    ]
  }
]'
This is supported from Solr 5.3 onwards.