Implementing a search with Elasticsearch using MySQL data

I am new to Elasticsearch. I have been using MySQL's full-text features until now.
I want to keep MySQL as my primary database and use Elasticsearch alongside it as the search engine for my website. I ran into several problems when thinking about this. The main problem is syncing between the MySQL database and Elasticsearch.
Some say to use Logstash. But even if I use it, would I still need to write separate functions in my program for database transactions and for Elasticsearch indexing?

You will need to run a periodic job that does a full reindex and/or send individual document updates for ES indexing. Logstash sounds ill-suited for this purpose. You just need the usual ES API to index your documents.
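For the individual-update path, here is a minimal sketch (the `products` table/index and column names are assumptions, not from the question; it uses the official Node.js Elasticsearch client and mysql2): write to MySQL first, then push the same record to ES.

```typescript
// Dual-write sketch: MySQL stays the source of truth, ES gets a copy for search.
import mysql from 'mysql2/promise';
import { Client } from '@elastic/elasticsearch';

const db = mysql.createPool({ host: 'localhost', user: 'app', database: 'shop' });
const es = new Client({ node: 'http://localhost:9200' });

async function createProduct(name: string, description: string) {
  // 1. Persist to MySQL inside your normal transaction/flow.
  const [result] = await db.query(
    'INSERT INTO products (name, description) VALUES (?, ?)',
    [name, description]
  );
  const id = (result as any).insertId;

  // 2. Index the same document in Elasticsearch
  //    (v8 client shown; the v7 client uses `body` instead of `document`).
  await es.index({
    index: 'products',
    id: String(id),
    document: { name, description },
  });
  return id;
}
```

A periodic full reindex is then just a SELECT over the whole table fed through the same index (or bulk) call.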

Related

Can MySQL interact with Elasticsearch to find a match?

I have a WordPress website with a MySQL database of real estate properties listed for sale, and an Elasticsearch database of addresses I have curated.
Can MySQL query the Elasticsearch database to find whether there is a matching address and then report back to WordPress so it can move the property into an "xyz" category?
If not, does anyone know a way to make this type of process happen?
Thanks
I don't know of any way MySQL can interact directly with ElasticSearch.
Typically you would develop code to interact with both ElasticSearch and MySQL in turn. This would be done with two separate queries. The first to query ElasticSearch to find a matching address, and then a second query against MySQL, using the information returned by the first query as a parameter for the MySQL query.
WordPress in this context is a client application for both MySQL and ElasticSearch. But each has its own protocol and connector. MySQL and ElasticSearch don't share data or queries.
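A minimal sketch of those two queries (the `curated-addresses` index, `wp_properties` table and column names are illustrative assumptions, using the official Node.js Elasticsearch client and mysql2):

```typescript
import mysql from 'mysql2/promise';
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://localhost:9200' });
const db = mysql.createPool({ host: 'localhost', user: 'wp', database: 'wordpress' });

async function categorizeProperty(propertyId: number, address: string) {
  // 1. Ask Elasticsearch whether a curated address matches (v8 client syntax).
  const result = await es.search({
    index: 'curated-addresses',
    query: { match: { address: { query: address, operator: 'and' } } },
    size: 1,
  });
  const matched = result.hits.hits.length > 0;

  // 2. Use that answer as a parameter for the MySQL query.
  if (matched) {
    await db.query(
      'UPDATE wp_properties SET category = ? WHERE id = ?',
      ['xyz', propertyId]
    );
  }
  return matched;
}
```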
Update: I found a way to use an ElasticSearch data source via MySQL's FEDERATED engine through a proxy that translates MySQL's protocol into ODBC to the ElasticSearch node. This is probably way too complex for your needs, but it is possible. See https://www.cdata.com/kb/tech/elasticsearch-odbc-mysql-federated-tables.rst
By analogy, this is like asking "can I drive my car while making my car drive another car?" Yes, you can — just weld your car's steering rods to some long metal struts that operate the hands of a puppet version of you sitting in the other car.
I don't recommend it.

Real-time migration of data from MySQL to Elasticsearch?

I have tons of data in MySQL, in the form of different databases and their respective tables. They are all related to each other. But when I have to analyze the data, I have to create different scripts that combine and merge it and show me a result, and this takes a lot of time and effort. I love Elasticsearch for its speed and for data visualization via Kibana, so I have decided to move my entire MySQL data set to Elasticsearch in real time, while keeping the data in MySQL too. But I want a scalable strategy and process for migrating that data to Elasticsearch.
Please suggest the best tools or methods to do the job.
Thank you.
Prior to Elasticsearch 2.x you could write your own Elasticsearch _river plugin and install it into Elasticsearch. You can control how often the data you've munged with your scripts is pulled in by the _river (note: this is not truly recommended).
You may also use your favourite message queuing/broker tool, such as ActiveMQ, to push your data into Elasticsearch.
If you want full control to meet your need for real-time migration, you can also write a simple app that uses the Elasticsearch REST endpoint, simply writing to it via REST. You can even do bulk POSTs (a minimal sketch follows below).
Make use of the Elasticsearch tools such as Beats and Logstash, which are great at shipping almost any type of data into Elasticsearch.
For other alternatives for munging your data into a flat file, or if you want to maintain relationships, see this post here
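As a rough illustration of the REST/bulk option above, here is a minimal sketch that reads rows from MySQL and ships them through the bulk API (index, table and column names are assumptions):

```typescript
import mysql from 'mysql2/promise';
import { Client } from '@elastic/elasticsearch';

const db = mysql.createPool({ host: 'localhost', user: 'app', database: 'analytics' });
const es = new Client({ node: 'http://localhost:9200' });

async function bulkMigrate() {
  const [rows] = await db.query('SELECT id, name, amount, created_at FROM orders');

  // The bulk API expects alternating action and source entries.
  const operations = (rows as any[]).flatMap((row) => [
    { index: { _index: 'orders', _id: String(row.id) } },
    { name: row.name, amount: row.amount, created_at: row.created_at },
  ]);

  // v8 client uses `operations`; the v7 client calls this parameter `body`.
  const response = await es.bulk({ operations, refresh: true });
  if (response.errors) {
    console.error('Some documents failed to index');
  }
}
```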

Increase app performance with Elasticsearch on NodeJS

I didn't find exactly what I'm looking for, and nobody could explain it to me, so I'm asking here. I hope this is not a duplicate…
I have an application that runs on NodeJS with the Sequelize ORM and a MySQL database. The project started one year ago.
I would now like to improve performance by installing Elasticsearch, but I'm not sure whether I can.
Can I just tell my Elasticsearch server where my database is and, magically, it does the mapping on its own?
Or do I have to insert my data into Elasticsearch myself?
All my infrastructure is hosted on AWS and I'm using RDS. Maybe that will help?
In fact, I don't know where to put the Elasticsearch layer. Do you have any ideas to help me?
I have already worked with Symfony and the FOSElasticaBundle, but it works so well that I don't know how it does it.
UPDATE:
You can use the ES HTTP API.
You can bulk index the existing data in your database.
You have to handle changes in your database, so you can use the observer pattern, e.g. created(), saved(), deleted() hooks, and in these methods perform the corresponding ES actions to create, update or delete a document (see the sketch below).
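A minimal sketch of that observer-pattern approach with Sequelize hooks (the `Article` model and `articles` index are assumptions for illustration; the v8 Elasticsearch JS client is assumed, where older clients use `body` instead of `document`):

```typescript
import { Sequelize, DataTypes } from 'sequelize';
import { Client } from '@elastic/elasticsearch';

const sequelize = new Sequelize('mysql://app:secret@localhost:3306/app');
const es = new Client({ node: 'http://localhost:9200' });

const Article = sequelize.define('Article', {
  title: DataTypes.STRING,
  body: DataTypes.TEXT,
});

// Mirror every write into Elasticsearch via model hooks.
Article.addHook('afterCreate', async (article: any) => {
  await es.index({ index: 'articles', id: String(article.id), document: article.toJSON() });
});
Article.addHook('afterUpdate', async (article: any) => {
  await es.index({ index: 'articles', id: String(article.id), document: article.toJSON() });
});
Article.addHook('afterDestroy', async (article: any) => {
  await es.delete({ index: 'articles', id: String(article.id) });
});
```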
Elasticsearch (ES) is a speed machine with a lot of possibilities (advanced queries, stats, etc.), not a database (read why here: https://www.elastic.co/blog/found-elasticsearch-as-nosql).
It's distributed (can work/share work via multiple servers)
It's document based (json documents, for example with nested, parent/child relationships)
It's mapping based
It's indexing based
More here: https://www.elastic.co/products/elasticsearch
You can use types in ES (like tables in your database).
ES uses indexed data, based on the mapping. Therefore, you still need a database and must index the data (denormalized) into ES. You can then query, filter, aggregate and analyze the data in your index.
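For example, a minimal sketch of defining a mapping, indexing a denormalized document and querying it (index and field names are assumptions, v8 JS client syntax):

```typescript
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://localhost:9200' });

async function setupAndQuery() {
  // Mapping-based: tell ES how each field should be indexed.
  await es.indices.create({
    index: 'orders',
    mappings: {
      properties: {
        customer_name: { type: 'text' },
        total: { type: 'double' },
        created_at: { type: 'date' },
      },
    },
  });

  // Denormalized document: the customer name is embedded instead of joined.
  await es.index({
    index: 'orders',
    id: '1',
    document: { customer_name: 'Jane Doe', total: 99.5, created_at: '2016-01-15' },
  });

  // Query, filter and aggregate against the index.
  const result = await es.search({
    index: 'orders',
    query: { match: { customer_name: 'jane' } },
    aggs: { revenue: { sum: { field: 'total' } } },
  });
  console.log(result.hits.hits, result.aggregations);
}
```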

Sphinx and Big Data

I would like to use a full-text search engine, and I decided on Sphinx. But I am working with Hadoop and a big data platform, and Sphinx Search integrates with a MySQL DB, which cannot handle big data.
So is there a way to use Sphinx with big data environments like Hadoop, HDFS or any other NoSQL database?
Well, it comes with built-in drivers for loading data from RDBMSs, but it is certainly not limited to them.
For starters, there are the 'pipe' indexing options...
http://sphinxsearch.com/docs/current.html#xmlpipe2
http://sphinxsearch.com/docs/current.html#xsvpipe
These just run a script and index the output. That script can fetch the data from just about any system imaginable.
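For example, a minimal xmlpipe2 sketch; `fetchDocs()` is a placeholder for whatever backend you actually pull from (an HDFS export, MongoDB, an HTTP API), and the field names are assumptions. You would point `xmlpipe_command` in sphinx.conf at this script.

```typescript
// Emit the xmlpipe2 docset on stdout for Sphinx's indexer to consume.
type Doc = { id: number; title: string; content: string };

async function fetchDocs(): Promise<Doc[]> {
  // Placeholder: replace with a read from your big-data system.
  return [{ id: 1, title: 'First doc', content: 'Hello from Hadoop land' }];
}

function escapeXml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

async function main() {
  const docs = await fetchDocs();
  const out: string[] = ['<?xml version="1.0" encoding="utf-8"?>', '<sphinx:docset>'];
  out.push('<sphinx:schema>');
  out.push('<sphinx:field name="title"/>', '<sphinx:field name="content"/>');
  out.push('</sphinx:schema>');
  for (const d of docs) {
    out.push(
      `<sphinx:document id="${d.id}">`,
      `<title>${escapeXml(d.title)}</title>`,
      `<content>${escapeXml(d.content)}</content>`,
      '</sphinx:document>'
    );
  }
  out.push('</sphinx:docset>');
  process.stdout.write(out.join('\n'));
}

main();
```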
There are plenty of projects you can use to get started; a random example:
https://github.com/georgepsarakis/mongodb-sphinx
You might also be able to ingest CSV output from Hadoop directly?
There are also real-time indexes, where the data is inserted directly into an index on the fly. I'm not a Hadoop expert, but in theory a Hadoop job could inject its results directly into Sphinx (via the OutputCommitter?), rather than (or in addition to) writing the results to HDFS.
http://sphinxsearch.com/docs/current.html#rt-indexes
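A rough sketch of that, assuming an RT index named `rt_docs` already declared (type = rt) in sphinx.conf; searchd speaks the MySQL wire protocol (port 9306 by default), so an ordinary MySQL driver can usually talk to it:

```typescript
import mysql from 'mysql2/promise';

async function pushToSphinx(id: number, title: string, content: string) {
  // Connect to searchd's SphinxQL listener, not to MySQL itself.
  const sphinx = await mysql.createConnection({ host: '127.0.0.1', port: 9306 });

  // REPLACE behaves like an upsert on RT indexes.
  await sphinx.query(
    'REPLACE INTO rt_docs (id, title, content) VALUES (?, ?, ?)',
    [id, title, content]
  );

  // Full-text search over the freshly inserted data.
  const [rows] = await sphinx.query("SELECT id FROM rt_docs WHERE MATCH('hello')");
  console.log(rows);
  await sphinx.end();
}
```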
You might also be able to use something like
https://www.percona.com/blog/2014/06/02/using-infinidb-engine-mysql-hadoop-cluster-data-analytics/
as a bridge between Hadoop and Sphinx (i.e. Sphinx's indexer builds an index via that MySQL-compatible engine).

With Solr, Do I Need A SQL db as well?

I'm thinking about using Solr to implement spatial and text indexing. At the moment, I have entries going into a MySQL database as well as Solr. When Solr starts, it reads all the data from MySQL. As new entries come in, my web servers write them to MySQL and, at the same time, add documents to Solr. More and more, it seems that my MySQL implementation is becoming a write-only persistent store (more or less a backup for the data in Solr): all of the reading of entries is done via Solr queries. Really, the only data being read from MySQL is user info, which doesn't need to be indexed/searched.
A few questions:
Do I really need the MySQL implementation, or could I simply store all of my data in Solr?
If Solr only, what are the risks associated with this solution?
Thanks!
Almost always, the answer is yes. It needn't necessarily be a database, but you should retain the original data somewhere outside of Solr in case you alter how you index the data in Solr. Unlike most databases (which Solr is not), Solr can't simply re-index itself. You could hypothetically configure your schema so that all your original data is marked as "stored", then perhaps do a CSV dump and re-index that way, but I wouldn't recommend this approach.
Shameless plug: For any information on using Solr, I recommend my book.
I recommend a separate repository. MySQL is one choice. Some people use the filesystem.
You often want a different schema for searching than for storing. That is easy to do with a separate repository.
When you change the Solr schema, you need to reload the content. Unloading all the content from Solr can be slow. If it is already in a separate repository, then you don't need to dump it from Solr, you can overwrite what is there.
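For example, a minimal sketch of reloading a Solr core from the separate MySQL repository (core, table and column names are assumptions; Node 18+ is assumed for the built-in fetch):

```typescript
import mysql from 'mysql2/promise';

// Solr's JSON update handler accepts an array of documents; commit=true makes
// the new content visible immediately (fine for a one-off reindex).
const SOLR_UPDATE = 'http://localhost:8983/solr/listings/update?commit=true';

async function reindexFromMySQL() {
  const db = await mysql.createConnection({ host: 'localhost', user: 'app', database: 'app' });
  const [rows] = await db.query('SELECT id, address, description FROM listings');

  const res = await fetch(SOLR_UPDATE, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(rows),
  });
  if (!res.ok) throw new Error(`Solr update failed: ${res.status}`);
  await db.end();
}
```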
In general, making Solr be both a search engine and a repository really reduces your flexibility and options for making search the best it can be.