Elasticsearch completion suggester phrase instead of terms - mysql

I am developing a search engine with Elasticsearch 1.6 and it's all working great. I get the data from my MySQL database with the JDBC importer from Jörg Prante. I would like to use the Elasticsearch completion suggester as documented here. The only problem is that I cannot find out how to do this without having tags like the ones shown in the examples everywhere. All I have is the title of a product, which is quite long.
So I would like to know how to make this work as expected using the full phrase of the title, or alternatively how to split the title phrase into tags and add them.
This is my current mapping for the 'title' field, but it only returns the (not very relevant) whole phrase.
curl -XPUT "http://localhost:9200/jdbc/" -d'
{
"mappings": {
"jdbc": {
"properties": {
"title": {
"type": "completion",
"index_analyzer": "simple",
"search_analyzer": "simple",
"payloads": true
}
}
}
}
}'
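For reference, here is a rough sketch (not from the original post; the product title and split terms are made up) of how a completion field in ES 1.x can be fed multiple input terms while returning the full title as output, together with a matching _suggest call:

# index one product, splitting the title into suggestion inputs (hypothetical data)
curl -XPUT 'http://localhost:9200/jdbc/jdbc/1' -d'
{
  "title": {
    "input": ["logitech", "wireless", "mouse", "m185"],
    "output": "Logitech Wireless Mouse M185"
  }
}'

# ask for suggestions matching the prefix "mou"
curl -XPOST 'http://localhost:9200/jdbc/_suggest' -d'
{
  "title_suggest": {
    "text": "mou",
    "completion": {
      "field": "title"
    }
  }
}'

With this shape, a prefix of any single word in the title can surface the whole phrase as the suggestion.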

Related

Monitoring a JSON or YAML config file and showing the parent of the changed attributes

I have a case in which I need to monitor JSON and YAML config files for changes, but all of the solutions I found only show line-by-line changes.
For example, if I have JSON like this:
{
  "server-1": {
    "name": "A",
    "metadata": {
      "tags": ["X", "Y", "Z"],
      "date-created": "10 March 2021"
    }
  },
  "server-2": {
    "name": "B",
    "metadata": {
      "tags": ["W", "X", "Y"],
      "date-created": "11 March 2021"
    }
  }
}
If server-2's tags were to be changed, I want to get server-2, metadata, tags as the result.
Does anyone have a ready-to-use solution that I can use for monitoring purposes?
I am looking for something ready-made since the JSON and YAML files are quite dynamic and contain a lot of fields. I am considering the ELK stack or Splunk for this, but I am not sure whether it will work.

Elasticsearch dynamic mapping for object within attribute

Wondering if I can create a "dynamic mapping" within an Elasticsearch index. The problem I am trying to solve is the following: I have a schema with an attribute that contains an object that can differ greatly between records. I would like to mirror this data within Elasticsearch if possible, but believe that automatic mapping may get in the way.
Imagine a scenario where I have a schema like the following:
{
  name: string
  origin: string
  payload: object // can be of any type / schema
}
Is it possible to create a mapping that supports this? I do not need to query the records by this payload attribute, but it would be great if I could.
Note that I have checked the documentation but am confused as to whether what Elastic calls dynamic mapping is what I am looking for.
It's certainly possible to specify which queryable fields you expect the payload to contain and what those fields' mappings should be.
Let's say each doc will include the fields payload.livemode and payload.created_at. If these are the only two fields you'll want to perform queries on, and you'd like to disable dynamic, index-time mappings autogenerated by Elasticsearch for the rest of the fields, you can use dynamic templates like so:
PUT my-payload-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "variable_payload": {
          "path_match": "payload",
          "mapping": {
            "type": "object",
            "dynamic": false,
            "properties": {
              "created_at": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss"
              },
              "livemode": {
                "type": "boolean"
              }
            }
          }
        }
      }
    ],
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "origin": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}
Then, as you ingest your docs:
POST my-payload-index/_doc
{
  "name": "abc",
  "origin": "web.dev",
  "payload": {
    "created_at": "2021-04-05 08:00:00",
    "livemode": false,
    "abc": "def"
  }
}

POST my-payload-index/_doc
{
  "name": "abc",
  "origin": "web.dev",
  "payload": {
    "created_at": "2021-04-05 08:00:00",
    "livemode": true,
    "modified_at": "2021-04-05 09:00:00"
  }
}
and verify with
GET my-payload-index/_mapping
no new mappings will be generated for the fields payload.abc or payload.modified_at.
Not only that — the new fields will also be ignored, as per the documentation:
These fields will not be indexed or searchable, but will still appear in the _source field of returned hits.
Side note: if fields are neither stored nor searchable, they're effectively the opposite of enabled.
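As a quick sanity check, and assuming the index above, a query against one of those ignored fields should come back with zero hits even though the value is plainly visible in _source:

GET my-payload-index/_search
{
  "query": {
    "match": {
      "payload.abc": "def"
    }
  }
}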
The Big Picture
Working with variable contents of a single, top-level object is quite standard. Take for instance the stripe event object — each event has an id, an api_version and a few other shared params. Then there's the data object that's analogous to your payload field.
Now, all is fine, until you need to aggregate on the contents of your payload. See, since the content is variable, so are the data paths / accessors. But wildcards in aggregation paths don't work in Elasticsearch. Scripts do but are onerous to maintain.
Back to stripe. They partially solved it through what they call polymorphic, typed hashes, as discussed in their blog post on API design.
A pretty neat approach that's worth emulating.
P.S. I discuss dynamic templates in more detail in the chapter "Mapping Automation" of my ES Handbook.

Mapping format on Elasticsearch

I'm trying to upload a JSON document to my server via Elasticsearch, but I wanted to map it before uploading it, and I keep getting a search phase execution exception error.
The JSON data looks like this:
{"geometry":{"type":"Point","coordinates":[-73.20266100000001,45.573647]},"properties":{"persistent_id":"XVCPFsbsqB7h4PrxEtCU3w==","timestamp":1408216040000,"tower_id":"10.48.66.178"}}
So far I've tried this as my mapping. I'm not sure what I am doing wrong...
curl –XPUT 'http://localhost:9200/carrier/_search?q=coordinates?pretty=true' -d'
{ “geometry”: {
“type” : {“type” : “string”},
“coordinates” : {“type” : “geo_point”}
},
“properties” : {
“persistent_id” : {“type” : “string”},
“timestamp”: { “type” : “long”},
“tower_id” : {“type” : “string”}
}'
There are a few problems here. First of all, you need to use a put mapping request instead of a search request. The body of the request has to start with the name of the type, followed by the list of properties (fields) that you add. The second problem is that you probably copied the example from some documentation where all ASCII quotes (") were replaced with their fancy Unicode versions (“ and ”), and the dash in front of the XPUT parameter looks like an en-dash (–) instead of a normal dash (-). You need to replace all fancy quotes and dashes with their ASCII versions. So, all together, the working statement should look like this (assuming doc as your document type):
curl -XPUT 'http://localhost:9200/carrier/doc/_mapping' -d '{
  "doc": {
    "properties": {
      "geometry": {
        "properties": {
          "type": {
            "type": "string"
          },
          "coordinates": {
            "type": "geo_point"
          }
        }
      },
      "properties": {
        "properties": {
          "persistent_id": {
            "type": "string"
          },
          "timestamp": {
            "type": "long"
          },
          "tower_id": {
            "type": "string"
          }
        }
      }
    }
  }
}'
Then you can add a document like this:
curl -XPUT 'http://localhost:9200/carrier/doc/1' -d '{"geometry":{"type":"Point","coordinates":[-73.20266100000001,45.573647]},"properties":{"persistent_id":"XVCPFsbsqB7h4PrxEtCU3w==","timestamp":1408216040000,"tower_id":"10.48.66.178"}}'
Please note that in order to add the mapping you might need to delete and recreate the index if you already tried to add documents to this index and the mapping was already created.
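If you do end up having to start over, a minimal sketch of that delete-and-recreate cycle (warning: this removes all documents in the index) would be:

# drop the whole index, then recreate it empty
curl -XDELETE 'http://localhost:9200/carrier'
curl -XPUT 'http://localhost:9200/carrier'

after which the mapping request above can be applied before indexing any documents.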
This is because you're using the _search endpoint in order to install your mapping.
You have to use the _mapping endpoint instead, like this:
curl -XPUT 'http://localhost:9200/carrier/_mapping/geometry' -d '{
...your mapping...
}'

Store JSON field as string in Elasticsearch?

I am trying to index a JSON field in Elasticsearch. I have given it an external mapping saying that this field should be treated as a string and not JSON; indexing is also not required for it, so there is no need to analyze it. The mapping for this is the following:
"json_field": {
"type": "string",
"index": "no"
},
Still, at indexing time this field is getting analyzed, and because of that I am getting a MapperParsingException.
In short: how can we store JSON as a string in Elasticsearch without it getting analyzed?
Finally got it. If you want to store JSON as a string without analyzing it, the mapping should be like this:
"json_field": {
"type": "object",
"enabled" : false
},
The enabled flag allows you to disable parsing and indexing of a named object completely. This is handy when a portion of the JSON document contains arbitrary JSON which should not be indexed, nor added to the mapping.
Update: as of Elasticsearch 7.12, "enabled" has been changed to "index", so the mapping should be like this:
"json_field": {
"type": "object",
"index" : false
},
Solution
Set "enabled": false for the field.
curl -X PUT "localhost:9200/{{INDEX-NAME}}/_mapping/doc" -H 'Content-Type: application/json' -d'
{
"properties" : {
"json_field" : {
"type" : "object",
"enabled": false
}
}
}
Note: this cannot be applied to an existing field. Either set it in the mapping during index creation, or create a new field.
Explanation
The enabled setting, which can be applied only to the top-level mapping definition and to object fields, causes Elasticsearch to skip parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way.
Ref: Elasticsearch Docs
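As a quick way to verify the behavior (a sketch only; the field values are made up), index a document and then try to search on a sub-field of the disabled object. The search should return no hits, while the raw JSON remains retrievable via _source:

# index a doc whose json_field contains arbitrary JSON
curl -X POST "localhost:9200/{{INDEX-NAME}}/_doc" -H 'Content-Type: application/json' -d'
{
  "json_field": { "any": "arbitrary", "nested": { "depth": 2 } }
}'

# this match query should find nothing, since json_field is not parsed
curl -X GET "localhost:9200/{{INDEX-NAME}}/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "json_field.any": "arbitrary" } }
}'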

Identifying Duplicates in CouchDB

I'm new to CouchDB and document-oriented databases in general.
I've been playing around with CouchDB, and was able to get familiar with creating documents (with perl) and using the Map/Reduce functions in Futon to query the data and create views.
One of the things I'm still trying to figure out is how to identify duplicate values across documents using Futon's Map/Reduce.
For example, if I have the following documents:
{
  "_id": "123",
  "name": "carl",
  "timestamp": "2012-01-27T17:06:03Z"
}
{
  "_id": "124",
  "name": "carl",
  "timestamp": "2012-01-27T17:07:03Z"
}
And I wanted to get a list of document ids that had duplicate "name" values; is this something I could do with Futon's Map/Reduce?
The result I was hoping to achieve is as follows:
{
  "name": "carl",
  "dupes": [ "123", "124" ]
}
..or..
{
  "carl": [ "123", "124" ]
}
.. which would be the duplicated value together with the ids of the documents that contain it.
I've tried a few different things with Map/Reduce, but so far as I understand, the Map function works with data on a per-document basis, and the Reduce functions only allow you to work with the keys/values from a given document.
I know I could just pull the data I need with perl, work magic there, and get the result I want, but I'm trying to work only with CouchDB for now in order to better understand its benefits / limitations.
Another way I'm thinking about doing this is to use a single document like an RDBMS table:
{
  "_id": "names",
  "rec1": {
    "_id": "123",
    "name": "carl",
    "timestamp": "2012-01-27T17:06:03Z"
  },
  "rec2": {
    "_id": "124",
    "name": "carl",
    "timestamp": "2012-01-27T17:07:03Z"
  }
}
.. which should allow me to use the Map/Reduce functions in the way I originally thought. However I'm not sure if this is ideal.
I understand that my mind is still stuck in RDBMS land, so much of what I'm trying to do above may not be necessary. Any insight on this would be much appreciated.
Thanks!
Edit: Fixed JSON syntax in some of the examples.
If you merely want a list of unique values, that's pretty easy. If you wish to identify the duplicates, then it gets less easy.
In both cases, a map function like this should suffice:
function (doc) {
  // emit each doc's name as the view key (the value defaults to null)
  emit(doc.name);
}
For your reduce function, just enter _count.
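Packaged as a design document (the names _design/dupes and by_name are placeholders of my choosing, not anything CouchDB requires), that could look like:

{
  "_id": "_design/dupes",
  "views": {
    "by_name": {
      "map": "function (doc) { emit(doc.name); }",
      "reduce": "_count"
    }
  }
}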
Your view output will look like: (based on your 2 documents)
{
  "rows": [
    { "key": "carl", "value": 2 }
  ]
}
From there, you will have a list of names as well as their frequency. You can take that list and filter it yourself, or you can take the "all couch" route and use a _list function to perform that final filtering.
function (head, req) {
  var row, duplicates = [];
  // iterate over the grouped view rows
  while (row = getRow()) {
    // a count > 1 means the name appears in more than one document
    if (row.value > 1) {
      duplicates.push(row);
    }
  }
  send(JSON.stringify(duplicates));
}
Read up about _list functions, they're pretty handy and versatile.
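Assuming the placeholder names from the sketch above, with the _list function saved under lists.duplicates in the same design document, the two stages could be queried roughly like this; group=true is what collapses the reduce output to one row per name:

# names with their frequencies
curl 'http://localhost:5984/mydb/_design/dupes/_view/by_name?group=true'

# only the names that occur more than once
curl 'http://localhost:5984/mydb/_design/dupes/_list/duplicates/by_name?group=true'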