Elasticsearch: how to search a user_defined field? - mysql

I use MySQL to store user data and search it with Elasticsearch. In MySQL I have a user-defined field that stores data in JSON format.
The example data looks like this:
data1 =
{
  "name": "test1",
  "age": 10,
  "user_defined": {
    "a": "aaa",
    "b": "bbb",
    "c": "ccc",
    .....
  }
}
data2 =
{
  "name": "test2",
  "age": 20,
  "user_defined": {
    "d": "ddd",
    "e": "eee",
    "f": "fff",
    .....
  }
}
For the user_defined field, the number of keys is not fixed and the values are all strings. I would like every key to be searchable. How should I define the mapping, and how do I search this kind of data with Elasticsearch?
Does anyone have a good idea?

You can define the mapping of the user_defined field as "type": "object", like this:
PUT your_index
{
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string"
        },
        "age": {
          "type": "integer"
        },
        "user_defined": {
          "type": "object"
        }
      }
    }
  }
}
Thereafter, you can index your documents and search them easily with the Query DSL:
POST your_index/_search
{
  "query": {
    "match": {
      "user_defined.a": "aaa"
    }
  }
}

Every field in a document in Elasticsearch is indexed and searchable by default.
Elasticsearch provides a Search Lite API and a Query DSL for searching.
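Since the keys under user_defined are not fixed, Elasticsearch will simply map each new key dynamically the first time it is indexed. If you want to control how those generated sub-fields are mapped, one option is a dynamic template; here is a minimal sketch, reusing the your_index/your_type placeholders from above:
PUT your_index
{
  "mappings": {
    "your_type": {
      "dynamic_templates": [
        {
          "user_defined_as_string": {
            "path_match": "user_defined.*",
            "mapping": {
              "type": "string"
            }
          }
        }
      ],
      "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
      }
    }
  }
}
With this in place, every key that appears under user_defined is indexed as a searchable string field, and the match query shown above works for any of them.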

Related

storing boolean values in elasticsearch: optimization?

I have JSON documents with entries like:
......
{
  "Fieldname": "booked",
  "Fieldvalue": "yes"
}
...
Within the JSON document there are many fields like this, where a Boolean value is expressed indirectly through Fieldname and Fieldvalue: essentially it signifies that booked=true. Would it be more efficient to transform the JSON before storing it in Elasticsearch, i.e. to replace the above with the following?
{
  "booked": true
}
The search use case is that I want to figure out whether a similar JSON document already exists in the system before adding a new one.
Yes, the latter is a much cleaner way to store the data, for both storage and search purposes. Say you want to get all the booked properties from your index; then you can easily do it this way instead of using the extra Fieldname and Fieldvalue:
GET /properties/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "country_code.keyword": "US"
          }
        },
        {
          "match": {
            "booked": true
          }
        },
        {
          "range": {
            "usd_price": {
              "gte": 50,
              "lte": 100000
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "ranking": {
        "order": "desc"
      }
    }
  ],
  "_source": [
    "property_id",
    "property_name",
    "country",
    "country_code",
    "state",
    "state_abbr",
    "city"
  ]
}
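For the duplicate-check use case mentioned in the question, a count over the flattened fields is often enough. A sketch, assuming the same index and field names as above:
GET /properties/_count
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "booked": true } },
        { "term": { "country_code.keyword": "US" } }
      ]
    }
  }
}
If the count comes back greater than zero, a similar document already exists and you can skip indexing the new one.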

How to perform partial matching on _id in Elasticsearch

I am trying to perform partial word matching on the _id field in my Elasticsearch instance.
After searching the official documentation I found that the best way to do this is to create an n-gram analyzer, so using Sense I did this:
PUT /index2
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "partial_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "partial": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "partial_filter"
          ]
        }
      }
    }
  }
}
I have tried to test the analyzer using:
POST /index2/_analyze
{
  "analyzer": "partial",
  "text": "brown fox"
}
It worked as expected, producing the proper combinations.
The next step should be to apply the analyzer to the relevant fields, so I tried this:
PUT /index2/_mapping/type2
{
  "type2": {
    "properties": {
      "_id": {
        "type": "string",
        "analyzer": "partial"
      }
    }
  }
}
But I am getting an error:
"reason": "Field [_id] is defined twice in [type2]"
Probably this is because the _id field is already created during the creation of index2, along with the analyzer.
So my question is: how can I use partial search on the _id field?
Is there any other way to do this?
Thanks in advance!
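One possible workaround, sketched here on the assumption that the built-in _id field cannot be re-mapped with a custom analyzer, is to store a copy of the id in a regular field (id_copy below is just an illustrative name) and attach the partial analyzer to that field instead:
PUT /index2/_mapping/type2
{
  "type2": {
    "properties": {
      "id_copy": {
        "type": "string",
        "analyzer": "partial"
      }
    }
  }
}
PUT /index2/type2/brown-fox-1
{
  "id_copy": "brown-fox-1"
}
POST /index2/_search
{
  "query": {
    "match": { "id_copy": "fox" }
  }
}
Searching on id_copy then gives the n-gram partial matching that _id itself cannot provide.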

How to fetch the data in MongoDB

How can I fetch the data from the JSON file using the mongo shell?
I want to fetch the data by PolicyID.
Say in the JSON file I sent, the PolicyID is 3148.
I tried a couple of ways to write the command, but both say 0 rows fetched:
db.GeneralLiability.find({"properties.id":"21281"})
db.GeneralLiability.find({properties:{_id:"21281"}})
Do I need to set anything else? Indexes, cursors, etc.?
Sample JSON:
{
  "session": {
    "data": {
      "account": {
        "properties": {
          "userName": "abc.com",
          "_dateModified": "2014-10-01",
          "_manuscript": "Carrier_New_Rules_2_1_0",
          "_engineVersion": "2.0.0",
          "_cultureCode": "en-US",
          "_cultureName": "United States [english]",
          "_context": "Underwriter",
          "_caption": "Carrier New Rules (2.1.0)",
          "_id": "p1CEB08012E51477C9CD0E89FE77F5E51"
        },
        "properties": {
          "_xmlns:xsd": "http://www.w3.org/2001/XMLSchema",
          "_xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
          "_id": "3148",
          "_HistoryID": "5922",
          "_Type": "onset",
          "_Datestamp": "2014-10-01T04:46:33",
          "_TransactionType": "New",
          "_EffectiveDate": "2014-01-01",
          "_Charge": "1599",
          "_TransactionGroup": "t4CE4FA751F9C400D9007E692A883DA66",
          "_PolicyID": "3148",
          "_Index": "1",
          "_Count": "1",
          "_Sequence": "1"
        }
      }
    }
  }
}
This will return the document with _PolicyID = "3148" (note that the full dotted path into the nested properties object is needed):
db.GeneralLiability.find({
  "session.data.account.properties._PolicyID": "3148"
}).pretty();
You have some issues in your document formatting. First off, I am pretty sure that field names starting with underscores are reserved for Mongo (I could be wrong); either way, it is bad form. I have restructured your data for you. I am not sure why you wanted to nest your data so deeply, but I am guessing you had a good reason for it.
You will notice that I am using the ObjectId generated by Mongo for my _id:
{
  "_id": ObjectId("56e1c1f53bac31a328e3682b"),
  "session": {
    "data": {
      "account": {
        "properties": {
          "xmlns:xsd": "http://www.w3.org/2001/XMLSchema",
          "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
          "HistoryID": "5922",
          "Type": "onset",
          "Datestamp": "2014-10-01T04:46:33",
          "TransactionType": "New",
          "EffectiveDate": "2014-01-01",
          "Charge": "1599",
          "TransactionGroup": "t4CE4FA751F9C400D9007E692A883DA66",
          "PolicyID": "3148",
          "Index": "1",
          "Count": "1",
          "Sequence": "1"
        }
      }
    }
  }
}
Now if you run this command it will return your document:
db.GeneralLiability.find({ "session.data.account.properties.PolicyID": "3148" })

Nested filter numerical range

I have the following JSON object:
{
  "Title": "Terminator",
  "Purchases": [
    {"Country": "US", "Site": "iTunes", "Price": 4.99},
    {"Country": "FR", "Site": "Google", "Price": 5.99}
  ]
}
I want to be able to find an object by specifying a Country+Site+PriceRange. For example, the above should return True for Country=US&Price<5.00, but should return False for Country=FR&Price<5.00. What would the index and query look like to do this? Here is another answer that this is a follow-up question to: Search within array object.
Simply add a range query to your bool query logic tree. This will return documents whose Country matches US and whose Price field has a numeric value less than 5.
{ "query":
{ "nested" : {
"path" : "Purchases",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"Purchases.Country" : "US"}
},
{
"range" : "Purchases.Price":
{
"lte": 5
}
}
]
}
}
}
}
}
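For this query to work, the Purchases field has to be mapped as nested in the index. A minimal mapping sketch (the index and type names are placeholders, and the field types are guesses based on the sample document):
PUT your_index
{
  "mappings": {
    "your_type": {
      "properties": {
        "Title": { "type": "string" },
        "Purchases": {
          "type": "nested",
          "properties": {
            "Country": { "type": "string" },
            "Site": { "type": "string" },
            "Price": { "type": "double" }
          }
        }
      }
    }
  }
}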

Proper Mapping for dynamic fields

I have the following document structure:
{
  "some_field": "some_data",
  "entries": {
    {"id": "some_id", "type": "some_type", "value": "some_value"},
    {"id": "another_id", "type": "another_type", "value": {"foo": 1, "bar": "two"}}
  }
}
So I would like to map entries based on the "type" field.
Which mapping type or flag should I use?
Or maybe I need to change my document structure?
Could you use this structure instead, with entries as an array?
{
  "some_field": "some_data",
  "entries": [
    {
      "id": "some_id",
      "type": "some_type",
      "value": "some_value"
    },
    {
      "id": "another_id",
      "type": "another_type",
      "value": {
        "foo": 1,
        "bar": "two"
      }
    }
  ]
}
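If you switch to the array form above, one possible mapping is to make entries a nested field. A sketch (the index and type names are placeholders): note that a single value field cannot hold both a plain string and an object, so with documents like these, dynamic mapping will reject whichever shape arrives second; a common workaround is to store the string case under a separate key (for example value_text, a purely illustrative name) and leave the object case as value.
PUT your_index
{
  "mappings": {
    "your_type": {
      "properties": {
        "some_field": { "type": "string" },
        "entries": {
          "type": "nested",
          "properties": {
            "id": { "type": "string" },
            "type": { "type": "string" }
          }
        }
      }
    }
  }
}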