I'm currently working on a search engine implementation with Scrapy as the crawler and Elasticsearch as the server. Scrapy and Elasticsearch work fine, but I'm struggling to enable case-insensitive search with a German analyzer. The response to a match_all query has this general structure:
"hits": {
"total": 14,
"max_score": 0.40951526,
"hits": [
{
"_index": "uni",
"_type": "items",
"_id": "AVcuHuT6qni1Wq78foIA",
"_score": 0.40951526,
"_source": {
"description": "...",
"tags": [
"...",
"..."
],
"url":"...",
"author": "...",
"content": "...",
"date": "18.09.2015",
"title": "..."
},
"highlight": {
"content": [
"...",
"...",
"..."
]
}
}
]
}
I tried to add the following settings and mappings with "curl -XPUT localhost:9200/uni {...}":
{
"mappings":{
"_source":{
"type":"object",
"properties":{
"title":{
"type":"string",
"analyzer":"german_lowercase"
},
"content":{
"type":"string",
"analyzer":"german_lowercase"
},
"description":{
"type":"string",
"analyzer":"german_lowercase"
},
"tags":{
"type":"array",
"analyzer":"german_lowercase"
}
}
}
},
"settings":{
"uni":{
"analysis":{
"analyzer":{
"german_lowercase":{
"type":"custom",
"tokenizer":"keyword",
"filter":[
"lowercase",
"german_stop",
"german_keywords",
"german_normalization",
"german_stemmer"
]
}
},
"filter":{
"german_stop": {
"type": "stop",
"stopwords": "_german_"
},
"german_keywords": {
"type": "keyword_marker",
"keywords": []
},
"german_stemmer": {
"type": "stemmer",
"language": "light_german"
}
}
}
}
}
}
I'm not sure what's going wrong. Can anybody help?
EDIT:
Elasticsearch won't let me put those settings into the index because it already exists, and if I try to put the mapping separately I get a "missing mapping type" exception. For the settings, it fails to update the non-dynamic settings. So I'm asking more generally how I should update those settings and mappings to enable case-insensitive search (other posts describe the same problem).
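A minimal sketch of one way out, under several assumptions: the index can be deleted and re-crawled, the Elasticsearch version is one that still has the "string" field type and mapping types (as the mapping above suggests), and "items" is the mapping type (taken from "_type" in the query result). The helper name build_index_body is hypothetical. The sketch differs from the body above in four ways: the key under "mappings" is the type name rather than "_source"; "tags" is mapped as a plain string because there is no "array" field type; "analysis" sits directly under "settings" instead of under the index name; and the tokenizer is "standard", since "keyword" keeps the whole field value as a single token. The keyword_marker filter with its empty keyword list is left out.

```python
import json


def build_index_body(type_name, text_fields, analyzer="german_lowercase"):
    """Build a create-index body with a lowercasing German analyzer (sketch)."""
    analysis = {
        "filter": {
            "german_stop": {"type": "stop", "stopwords": "_german_"},
            "german_stemmer": {"type": "stemmer", "language": "light_german"},
        },
        "analyzer": {
            analyzer: {
                "type": "custom",
                # "standard" splits text into words; "keyword" would emit the
                # entire field value as one token.
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "german_stop",
                    "german_normalization",
                    "german_stemmer",
                ],
            }
        },
    }
    # No special type for arrays: a field may simply hold several strings.
    properties = {
        name: {"type": "string", "analyzer": analyzer} for name in text_fields
    }
    return {
        "settings": {"analysis": analysis},
        "mappings": {type_name: {"properties": properties}},
    }


body = build_index_body("items", ["title", "content", "description", "tags"])
print(json.dumps(body, indent=2))
```

The printed JSON would be sent with curl -XPUT localhost:9200/uni only after the existing index has been deleted, because analysis settings cannot be changed on an open index and an analyzer change on an existing field does not affect documents that are already indexed. The documents then have to be indexed again.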
I know I am close on this, since the error messages are getting more helpful. I can already make a similar call via Postman to update the seller's email without issue; now I'm working on updating the amount and its associated objects. Something in my request format is off.
Is my breakdown section in the correct location? In the amount_breakdown documentation, breakdown appears to be on the same level as value and currency_code, so does it need to move there?
Here's my request JSON via Postman:
[
{
"op": "replace",
"path": "/purchase_units/#reference_id=='default'/amount",
"value": {
"currency_code": "CAD",
"value": "2",
"amount": {
"currency_code": "CAD",
"value": "2",
"breakdown": {
"item_total": {
"currency_code": "CAD",
"value": "2"
},
"tax_total": {
"value": "0",
"currency_code": "CAD"
}
}
},
"items": [
{
"name": "First Product Name",
"description": "Optional descriptive text..",
"unit_amount": {
"currency_code": "CAD",
"value": "2"
},
"tax": {
"value": "0",
"currency_code": "CAD"
},
"quantity": "1"
}
]
}
}
]
RESPONSE:
{
"name": "UNPROCESSABLE_ENTITY",
"details": [
{
"field": "/purchase_units/#reference_id=='default'/amount/breakdown/item_total",
"location": "body",
"issue": "ITEM_TOTAL_REQUIRED",
"description": "If item details are specified (items.unit_amount and items.quantity) corresponding amount.breakdown.item_total is required."
}
],
"message": "The requested action could not be performed, semantically incorrect, or failed business validation.",
"debug_id": "acecd3643c994",
"links": [
{
"href": "https://developer.paypal.com/docs/api/orders/v2/#error-ITEM_TOTAL_REQUIRED",
"rel": "information_link",
"method": "GET"
}
]
}
Thanks for any help!
I have tried different variations of the objects.
I can get the other PATCH operation working without issue, but its object structure is much simpler.
There should be no amount key under the /amount path, and the items array does not belong at that /amount path either.
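As a hedged illustration of that advice, the sketch below builds a patch body in which value, currency_code and breakdown sit directly under the /amount path, with no nested amount key and no items array. The helper name build_amount_patch is hypothetical, the path string is copied from the question as-is, and the total is assumed to be item_total plus tax_total. Whether the API accepts the result still depends on the items already stored on the order.

```python
import json
from decimal import Decimal


def build_amount_patch(path, currency, item_total, tax_total):
    """Return a one-operation patch list that replaces the order amount."""

    def money(value):
        # Amounts are passed as strings, as in the request above.
        return {"currency_code": currency, "value": str(value)}

    # Assumption: the overall value is item_total + tax_total.
    total = Decimal(item_total) + Decimal(tax_total)
    return [
        {
            "op": "replace",
            "path": path,
            "value": {
                **money(total),
                "breakdown": {
                    "item_total": money(item_total),
                    "tax_total": money(tax_total),
                },
            },
        }
    ]


patch = build_amount_patch(
    "/purchase_units/#reference_id=='default'/amount", "CAD", "2", "0"
)
print(json.dumps(patch, indent=2))
```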
The problem is: I have two Cucumber test reports in JSON format.
I need to remove redundant key-value pairs from those reports and compare them, but I can't work out how to remove the unnecessary data from the two JSON documents because of their structure after JSON.parse (an array or hash with many nested arrays/hashes). Please advise if there are gems or known solutions for this.
The JSON structure is, e.g.:
[
{
"uri": "features/home_screen.feature",
"id": "as-a-user-i-want-to-explore-home-screen",
"keyword": "Feature",
"name": "As a user I want to explore home screen",
"description": "",
"line": 2,
"tags": [
{
"name": "#home_screen",
"line": 1
}
],
"elements": [
{
"keyword": "Background",
"name": "",
"description": "",
"line": 3,
"type": "background",
"before": [
{
"match": {
"location": "features/step_definitions/support/hooks.rb:1"
},
"result": {
"status": "passed",
"duration": 505329000
}
}
],
"steps": [
{
"keyword": "Given ",
"name": "I click OK button in popup",
"line": 4,
"match": {
"location": "features/step_definitions/registration_steps.rb:91"
},
"result": {
"status": "passed",
"duration": 2329140000
}
},
{
"keyword": "And ",
"name": "I click Allow button in popup",
"line": 5,
"match": {
"location": "features/step_definitions/registration_steps.rb:96"
},
"result": {
"status": "passed",
"duration": 1861776000
}
}
]
},
Since you are asking for a gem, you might try iteraptor, which I created for exactly this kind of task.
It allows iterating, mapping and reducing deeply nested structures. For instance, to filter out all the keys called "name" on all levels, you might do:
input.iteraptor.reject(/name/)
A more detailed description can be found on the GitHub page linked above.
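The underlying idea, a recursive walk that drops unwanted keys at every nesting level, can also be sketched without a gem. The example below is written in Python rather than Ruby and is not how iteraptor is implemented. The function name strip_keys and the choice of "duration" and "line" as redundant keys are assumptions for illustration.

```python
def strip_keys(node, unwanted):
    """Return a copy of a parsed JSON structure without the unwanted keys."""
    if isinstance(node, dict):
        return {
            key: strip_keys(value, unwanted)
            for key, value in node.items()
            if key not in unwanted
        }
    if isinstance(node, list):
        return [strip_keys(item, unwanted) for item in node]
    # Strings, numbers, booleans and None are returned unchanged.
    return node


# A shortened report in the shape shown in the question.
report = [
    {
        "name": "As a user I want to explore home screen",
        "line": 2,
        "elements": [
            {
                "steps": [
                    {
                        "name": "I click OK button in popup",
                        "line": 4,
                        "result": {"status": "passed", "duration": 2329140000},
                    }
                ]
            }
        ],
    }
]

cleaned = strip_keys(report, {"duration", "line"})
print(cleaned)
```

Two reports cleaned this way can then be compared with ordinary equality, since the keys that differ between runs have been removed from both.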
In version 1.7.0 of Orion CB, running the Docker image in Docker for Windows,
if I create a simple object with POST http://localhost:1026/v1/updateContext
and the body:
{
"contextElements": [
{
"type": "Car",
"id": "myNewCar",
"attributes": [
{
"name": "maxSpeed",
"type": "integer",
"value": "220"
}
]
}
],
"updateAction": "APPEND"
}
I get the answer:
{
"contextResponses": [
{
"contextElement": {
"type": "Car",
"isPattern": "false",
"id": "myNewCar",
"attributes": [
{
"name": "maxSpeed",
"type": "integer",
"value": ""
}
]
},
"statusCode": {
"code": "200",
"reasonPhrase": "OK"
}
}
]
}
Then, if I do POST http://localhost:1026/v1/queryContext with the same headers and the same components, using the body:
{
"entities": [
{
"type": "Car",
"isPattern": "false",
"id": "myNewCar"
}
]
}
I get the following:
{
"errorCode": {
"code": "404",
"reasonPhrase": "No context element found"
}
}
This wouldn't be a problem (I can query the entities with the v2 API, for instance) if it weren't needed for integration with data representation tools such as SpagoBI, as documented in http://spagobi.readthedocs.io/en/latest/user/NGSI/README/
What can I do? Am I doing something wrong with the context provision?
Thanks!
My problem was that I was using an imported Postman collection of the API (downloaded from https://github.com/telefonicaid/fiware-orion/tree/develop/doc/apiary/v2) and was accidentally sending the Fiware-Service header.
You are right and your tests work properly.
Thanks for the prompt reply!!
I am just starting with Elasticsearch and don't know anything about it yet.
I have the following JSON:
[
{
"label": "Admin Law",
"tags": [
"#admin"
],
"owner": "generalTopicTagText"
},
{
"label": "Judicial review",
"tags": [
"#JR"
],
"owner": "generalTopicTagText"
},
{
"label": "Admiralty/Shipping",
"tags": [
"#shipping"
],
"owner": "generalTopicTagText"
}
]
My mapping is this:
{
"topic_tax": {
"properties": {
"label": {
"type": "string",
"index": "not_analyzed"
},
"tags": {
"type": "string",
"index_name": "tag"
},
"owner": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
I need to put the first JSON into Elasticsearch, but it does not work.
All I know is that I am defining only one of these:
{
"label": "Judicial review",
"tags": [
"#JR"
],
"owner": "generalTopicTagText"
}
So when I try to put all of them with my elasticsearch.init, it does not work.
But I really don't know how to declare the mapping JSON so that it accepts the whole JSON; it feels like I need something like a for loop there.
You have to insert them one JSON document at a time. Better still, use the Elasticsearch bulk API to insert multiple documents in one request. Check the bulk API documentation to see how it works.
You can do something like this:
curl -XPUT 'localhost:9000/es/post/1?version=2' -d '{
"text" : "your test message!"
}'
Here is the documentation for indexing JSON with Elasticsearch.
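For the bulk API suggestion above, the request body is newline-delimited JSON: one action line followed by one document line per document, and the body must end with a newline. The sketch below only builds that body from the array in the question. The helper name build_bulk_body and the index name "topics" are assumptions, and "topic_tax" is taken from the mapping as the type name, which only applies to Elasticsearch versions that still use mapping types.

```python
import json


def build_bulk_body(docs, index, doc_type=None):
    """Build a newline-delimited body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type
        # Action line first, then the document itself on the next line.
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk body has to be terminated by a final newline.
    return "\n".join(lines) + "\n"


docs = [
    {"label": "Admin Law", "tags": ["#admin"], "owner": "generalTopicTagText"},
    {"label": "Judicial review", "tags": ["#JR"], "owner": "generalTopicTagText"},
    {"label": "Admiralty/Shipping", "tags": ["#shipping"], "owner": "generalTopicTagText"},
]

bulk_body = build_bulk_body(docs, index="topics", doc_type="topic_tax")
print(bulk_body)
```

The resulting text would be posted to the _bulk endpoint of the cluster. The mapping itself does not need a loop, because it describes the shape of one document and every document in the array has that shape.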
I want to index and search nested JSON in Solr. Here is my JSON:
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
When I try to index it, I get the error "Error parsing JSON field value. Unexpected OBJECT_START".
When we tried to use a multivalued field and index it, we were not able to search on the multivalued field; it returns "Undefined Field".
Also, please advise whether I need to make any changes to the schema.xml file.
You are nesting child documents within your document. You need to use the proper syntax for nested child documents in JSON:
[
{
"id": "1",
"title": "Solr adds block join support",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "2",
"comments": "SolrCloud supports it too!"
}
]
},
{
"id": "3",
"title": "Lucene and Solr 4.5 is out",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "4",
"comments": "Lots of new features"
}
]
}
]
Have a look at this article which describes JSON child documents and block joins.
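A rough sketch of how the document from the question could be reshaped into that child-document form before it is sent to Solr. Everything here is an assumption for illustration: the helper name to_block_join, the generated child ids, and the extra "source_field" marker, which, like the child fields type, name, relevance and so on, would have to exist in the schema.

```python
def to_block_join(doc, nested_fields):
    """Move nested objects of a document into a _childDocuments_ list."""
    parent = {k: v for k, v in doc.items() if k not in nested_fields}
    children = []
    for field in nested_fields:
        value = doc.get(field)
        if value is None:
            continue
        # A single nested object is treated like a one-element list.
        items = value if isinstance(value, list) else [value]
        for position, item in enumerate(items):
            child = dict(item)
            # Every child needs its own unique id (hypothetical scheme).
            child["id"] = f"{doc['id']}_{field}_{position}"
            # Hypothetical marker recording which field the child came from.
            child["source_field"] = field
            children.append(child)
    if children:
        parent["_childDocuments_"] = children
    return parent


doc = {
    "id": "44444",
    "headline": "testing US",
    "generaltags": [
        {"type": "person", "name": "Jayalalitha", "relevance": "0.334", "count": 1},
        {"type": "person", "name": "Kumar", "relevance": "0.234", "count": 1},
    ],
    "topic": {"type": "Topic", "name": "US", "score": "0.936"},
}

solr_doc = to_block_join(doc, ["generaltags", "topic"])
print(solr_doc)
```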
Using the format mentioned by #qux, you will get the error "Expected: OBJECT_START but got ARRAY_START at [16]" with "code": 400, because JSON starting with [....] is parsed as a JSON array.
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
The above format is correct.
Regarding searching: use the index to search for the elements of the JSON array.
A workaround is to keep the whole JSON object inside another JSON object and then index it.
I was suggesting keeping the whole data inside another JSON object. You can try it the following way:
{
"data": [
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
]
}
See the syntax at http://yonik.com/solr-nested-objects/:
$ curl http://localhost:8983/solr/demo/update?commitWithin=3000 -d '
[
{id : book1, type_s:book, title_t : "The Way of Kings", author_s : "Brandon Sanderson",
cat_s:fantasy, pubyear_i:2010, publisher_s:Tor,
_childDocuments_ : [
{ id: book1_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
stars_i:5, author_s:yonik,
comment_t:"A great start to what looks like an epic series!"
}
,
{ id: book1_c2, type_s:review, review_dt:"2014-03-15T12:00:00Z",
stars_i:3, author_s:dan,
comment_t:"This book was too long."
}
]
}
]'
This is supported from Solr 5.3.