We are planning to migrate an application from MySQL to Cassandra. The one major issue we're seeing is that the application makes extensive use of MyISAM's full text search. What can we use as an alternative on Cassandra?
There is an implementation of Solr in Cassandra: Solandra.
Solr (pronounced as /soʊlə/,/soʊlər/, SOH-lər) is an open source
enterprise search platform from the Apache Lucene project. Its major
features include powerful full-text search, hit highlighting, faceted
search, dynamic clustering, database integration, and rich document
(e.g., Word, PDF) handling. Providing distributed search and index
replication, Solr is highly scalable.
You can find some other information here: http://www.datastax.com/docs/datastax_enterprise2.0/search/dse_search_about
Use Elassandra, which runs Elasticsearch as a plugin for Apache Cassandra.
A real-world example of Elassandra can be found here.
We have a microservice-based application developed in Spring Boot. Let us assume there are 3 microservices: A, B, and C. The front-end is written in Angular, and the backend comprises a MySQL database with Hibernate for ORM. We are required to implement full-text search functionality: there will be a search box on the UI where the user can enter text of their choice, and the search must be able to return data from the databases of all 3 microservices. I am facing difficulty finalizing the search technology for this. Some of the technologies in my purview are:
Hibernate Search
Apache Solr
ElasticSearch
Which is the best technology for this problem? If possible, are there any examples of the same?
Hibernate Search depends on an underlying search backend to provide full-text search, which can be either plain Apache Lucene or Elasticsearch. I'm not sure whether its Elasticsearch integration has fully matured yet, as version 6.0 is still in development.
The older/stable version of Hibernate Search, i.e. 5.11, supports Elasticsearch 2.0 to 5.6.
But looking at your questions, it depends on what use cases you have. Perhaps the points below will help you.
What is the size of your data, and what is the expected growth rate of your documents/data?
What would be your write vs. read rates for this application?
What type of search use cases do you have? What search features are you looking for, e.g. autocomplete, autosuggestion, highlighting, faceted search?
Are you looking for distributed search, or do you have hardware limitations?
Is there a requirement to support search in multiple languages?
Would text search alone suffice, or would you also be doing analysis on search logs or click-view data in the future?
What options do you have for ingesting documents into your search engine? If it's Elasticsearch, you can easily make use of Beats or Logstash. Or you can simply dump raw data into ES and then use the Ingest API to do pre-processing/enrichment/filtering before pushing the processed data into a different index in Elasticsearch.
Both Solr and Elasticsearch are great technologies, but if you have to use one of them, I would strongly suggest Elasticsearch: it would help you with all of the questions above, has a much more powerful distributed model, has its own amazing DSL which is very mature and easy to use, has excellent administrative tools/APIs for data management, and is extremely fast and easy to set up. Not to mention its aggregation queries, which give you analytical information about the documents you've ingested.
You would also have the luxury of setting up your own dashboards via Kibana, which helps you quickly create some great visualizations.
A plus point is that it is completely RESTful by nature, which makes your life easier when it comes to deployment of your applications. I'd suggest you start from here and spend some time understanding the technology.
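To give a flavor of the DSL and aggregations mentioned above, here is a minimal sketch in Python of a search request body (the index and field names "products", "description", and "brand.keyword" are hypothetical) combining a full-text match with a terms aggregation:

```python
import json

# Hypothetical example: a query-DSL body combining full-text search
# with a faceting-style aggregation.
query = {
    "query": {
        "match": {
            "description": {
                "query": "wireless headphones",
                "fuzziness": "AUTO",  # tolerate small typos in the user's input
            }
        }
    },
    "aggs": {
        # count matching documents per brand (a faceted-search style breakdown)
        "by_brand": {"terms": {"field": "brand.keyword"}}
    },
    "size": 10,
}

# The body is plain JSON, sent to e.g. POST /products/_search over HTTP
print(json.dumps(query, indent=2))
```

Because the whole interface is JSON over HTTP, the same body works from any language or from curl.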
Hope this helps!
For fulltext search Elasticsearch and Apache Solr are the best choices from the given selection.
However, I strongly disagree with saying that Elasticsearch is better than Solr, or the other way around, without knowing more about your business case. Both technologies will perform equally well for the given problem, since they are both built upon the Apache Lucene search engine.
They both offer great REST clients.
Here you can check out an example implementation of both Apache Solr and Elasticsearch in the same project in Java. You can also check what the differences are and decide which one you prefer.
Also, there are six write-ups on how to use Apache Solr and Elasticsearch written here. The last chapter covers research showing that both engines are almost equal and differ only in very specific business cases. Both have many supporting tools as well.
I love Phalcon, but it does not come with Couchbase support, whereas Couchbase provides an ODM and SDKs (DLLs for XAMPP/WAMP) for PHP. Can you please compile Phalcon with the Couchbase SDK/ODM?
I am looking for Couchbase because so far it is the best of all the NoSQL servers I have worked with. It supports a distributed, memcached-compatible in-memory key-value store as well as a document store. It also allows N1QL (a SQL-like data query language), eventing services using JavaScript, and much more. It supports full-text search with fuzzy search as an optional parameter. It is lightning fast and feature-rich compared to other NoSQL databases. It also offers an Enterprise version with enterprise-level support. Note that Doctrine 2 supports Couchbase, but I want it with Phalcon, not with any thick-layered ORM.
I'm planning out a Rails app that will be hosted on Heroku and will need both geospatial and full text search capabilities.
I know that Heroku offers add-ons like WebSolr and IndexTank that sound like they can do the job, but I was wondering if this could be done in MySQL and/or PostgreSQL without having to pay for any add-ons?
Depending on the scale of your application, you should be able to accomplish both FULLTEXT and SPATIAL indexes in MySQL with ease. Once your application gets massive, i.e. hundreds of millions of rows with high concurrency and many thousands of requests per second, you might need to move to another solution for either FULLTEXT or SPATIAL queries. But I wouldn't recommend optimizing for that early on, since it can be very hard to do properly. For the foreseeable future MySQL should suffice.
You can read about spatial indexes in MySQL here. You can read about fulltext indexes in MySQL here. Finally, I would recommend taking the steps outlined here to make your schema.rb file and rake tasks work with these two index types.
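As a rough sketch of what both index types look like together (the table and column names here are hypothetical, and in MySQL of this era both index types require the MyISAM engine):

```sql
-- Hypothetical `places` table carrying both a FULLTEXT and a SPATIAL index
CREATE TABLE places (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  location POINT NOT NULL,
  FULLTEXT INDEX ft_name_description (name, description),
  SPATIAL INDEX sp_location (location)
) ENGINE=MyISAM;

-- Full-text query:
SELECT id, name FROM places
WHERE MATCH(name, description) AGAINST('coffee shop' IN NATURAL LANGUAGE MODE);

-- Bounding-box spatial query:
SELECT id, name FROM places
WHERE MBRContains(
  GeomFromText('POLYGON((-74.1 40.6, -74.1 40.9, -73.7 40.9, -73.7 40.6, -74.1 40.6))'),
  location
);
```

Both queries use their respective indexes automatically; no add-on is required.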
I have only used MySQL for both, but my understanding is that PostgreSQL has a good geo-spatial index solution as well.
If you have a database at Heroku, you can use Postgres's support for Full Text Search: http://www.postgresql.org/docs/8.3/static/textsearch.html. The oldest servers Heroku runs (for shared databases) are on 8.3 and 8.4. The newest are on 9.0.
A blog post noticing this little fact can be seen here: https://tenderlovemaking.com/2009/10/17/full-text-search-on-heroku.html
Apparently, that "texticle" (heh. cute.) addon works...pretty well. It will even create the right indexes for you, as I understand it.
Here's the underlying story: postgres full-text-search is pretty fast and fuss-free (although Rails-integration may not be great), although it does not offer the bells and whistles of Solr or IndexTank. Make sure you read about how to properly set up GIN and/or GiST indexes, and use the tsvector/tsquery types.
The short version:
Create an (in this case, expression-based) index: CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', body));. In this case "body" is the field being indexed.
Use the @@ operator: SELECT * FROM ... WHERE to_tsvector('english', pgweb.body) @@ to_tsquery('hello & world') LIMIT 30
The hard part may be mapping things back into application land; the blog post previously cited tries to do that.
The dedicated databases can also be requisitioned with PostGIS, which is a very powerful and fully featured system for indexing and querying geographical data. OpenStreetMap uses the PostgreSQL geometry types (built-in) extensively, and many people combine that with PostGIS to great effect.
Both of these (full text search, PostGIS) take advantage of the extensible data type and indexing infrastructure in Postgres, so you should expect them to work with high performance for many, many records (spend a little time carefully reviewing the situation if things look busted). You might also take advantage of the fact that you can use these features in combination with transactions and structured data. For example:
CREATE TABLE products (pk bigserial, price numeric, quantity integer, description text); can just as easily be used with full text search...any text field will do, and it can be in connection with regular attributes (price, quantity in this case).
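Continuing that example, a sketch of what such a combined query might look like (the search terms and thresholds are illustrative):

```sql
-- Expression index so the full-text condition can use an index scan
CREATE INDEX products_fts_idx
  ON products USING gin (to_tsvector('english', description));

-- Full-text search combined with ordinary attribute filters, in one query
SELECT pk, price, description
FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('wireless & headphones')
  AND price < 100
  AND quantity > 0
ORDER BY price
LIMIT 20;
```

The planner can combine the GIN index with the regular attribute conditions, so there is no round-trip to an external search engine.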
I'd use Thinking Sphinx, a full-text search engine that is also deployable on Heroku.
It has geo search built-in: http://freelancing-god.github.com/ts/en/geosearching.html
EDIT:
Sphinx is almost ready for Heroku; see here: http://flying-sphinx.com/
IndexTank is now free up to 100k documents on Heroku, we just haven't updated the documentation. This may not be enough for your needs, but I thought I'd let you know just in case.
For full-text search via Postgres I recommend pg_search; I am using it myself on Heroku at the moment. I have not used texticle, but from what I can see pg_search has had more development activity lately, and it was built upon texticle (it will not add indexes for you; you have to do that yourself).
I cannot find the thread now, but I saw that Heroku offered an option for Postgres geo search, though it was in beta.
My advice: if you are not able to find a Postgres solution, host your own instance of Solr (on an EC2 instance) and use the sunspot_solr gem to integrate it with Rails.
I have implemented my own solution and have used WebSolr as well. Basically, what they give you is your own Solr instance, hassle-free. Is it worth the money? In my opinion, no. For integration they use the sunspot solr client as well, so the question is just whether you are going to pay somebody $20/$40/... a month to host Solr for you. I know you also get backups, maintenance, etc., but call me cheap: I prefer my own instance. Also, WebSolr is locked to the 1.4.x version of Solr.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I'm looking for a stand-alone full-text search server with the following properties:
Must operate as a stand-alone server that can serve search requests from multiple clients
Must be able to do "bulk indexing" by indexing the result of an SQL query: say "SELECT id, text_to_index FROM documents;"
Must be free software and must run on Linux with MySQL as the database
Must be fast (rules out MySQL's internal full-text search)
The alternatives I've found that have these properties are:
Solr (based on Lucene)
ElasticSearch (also based on Lucene)
Sphinx
My questions:
How do they compare?
Have I missed any alternatives?
I know that each use case is different, but are there certain cases where I would definitely not want to use a certain package?
I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased.
However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)
Similarities:
Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
Both have a long list of high-traffic sites using them (Solr, Sphinx)
Both offer commercial support. (Solr, Sphinx)
Both offer client API bindings for several platforms/languages (Sphinx, Solr)
Both can be distributed to increase speed and capacity (Sphinx, Solr)
Here are some differences:
Solr, being an Apache project, is obviously Apache2-licensed. Sphinx is GPLv2. This means that if you ever need to embed or extend (not just "use") Sphinx in a commercial application, you'll have to buy a commercial license (rationale)
Solr is easily embeddable in Java applications.
Solr is built on top of Lucene, which is a proven technology over 8 years old with a huge user base (this is only a small part). Whenever Lucene gets a new feature or speedup, Solr gets it too. Many of the devs committing to Solr are also Lucene committers.
Sphinx integrates more tightly with RDBMSs, especially MySQL.
Solr can be integrated with Hadoop to build distributed applications
Solr can be integrated with Nutch to quickly build a fully-fledged web search engine with crawler.
Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
Solr comes with a spell-checker out of the box.
Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
Sphinx doesn't allow partial index updates for field data.
In Sphinx, all document ids must be unique unsigned non-zero integer numbers. Solr doesn't even require a unique key for many operations, and unique keys can be either integers or strings.
Solr supports field collapsing (currently as an additional patch only) to avoid duplicating similar results. Sphinx doesn't seem to provide any feature like this.
While Sphinx is designed to retrieve only document ids, in Solr you can directly get whole documents with pretty much any kind of data, making it more independent of any external data store and saving the extra roundtrip.
Solr, except when used embedded, runs in a Java web container such as Tomcat or Jetty, which require additional specific configuration and tuning (or you can use the included Jetty and just launch it with java -jar start.jar). Sphinx has no additional configuration.
Related questions:
Full Text Searching with Rails
Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?
Unless you need to extend the search functionality in any proprietary way, Sphinx is your best bet.
Sphinx advantages:
Development and setup is faster
Much better (and faster) aggregation. This was the killer feature for us.
Not XML. This is what ultimately ruled out Solr for us. We had to return rather large result sets (think hundreds of results) and then aggregate them ourselves since Solr aggregation was lacking. The amount of time to serialize to and from XML just absolutely killed performance. For small results sets though, it was perfectly fine.
Best documentation I've seen in an open source app
Solr advantages:
Can be extended.
Can hit it directly from a web app, i.e., you can have autocomplete-like searches hit the Solr server directly via AJAX.
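For instance, a minimal sketch of building such an AJAX request in JavaScript, assuming a hypothetical local Solr core named "items" with the terms component exposed at a /terms handler:

```javascript
// Build an autocomplete request URL against a (hypothetical) local Solr core.
// The terms component handles prefix suggestions on an indexed field.
function suggestUrl(prefix) {
  const params = new URLSearchParams({
    "terms.fl": "title",     // field to draw suggestions from
    "terms.prefix": prefix,  // what the user has typed so far
    wt: "json",              // ask Solr for a JSON response
  });
  return `http://localhost:8983/solr/items/terms?${params}`;
}

// In the browser you would fetch() this URL from a keyup handler
console.log(suggestUrl("sph"));
```

Because the response is plain JSON, the browser can consume it directly with no server-side glue code.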
Note: There are many users with the same question in mind.
So, to answer to the point:
Which and why?
Use Solr if you intend to use it in your web app (for example, a site search engine). It will definitely turn out to be great, thanks to its API. You will definitely need that power for a web app.
Use Sphinx if you want to search through tons of documents/files real quick. It indexes really fast too. I would recommend not using it in an app that involves JSON or parsing XML to get the search results; use it for direct DB searches. It works great with MySQL.
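For illustration, a minimal sphinx.conf sketch (the host, credentials, table/column names, and paths here are all hypothetical) showing how Sphinx pulls documents straight out of MySQL with an ordinary SELECT:

```
# Hypothetical data source: Sphinx runs this query at index time
source documents_src
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = app
    sql_query = SELECT id, title, body FROM documents
}

# Index built from that source
index documents_idx
{
    source = documents_src
    path   = /var/lib/sphinx/documents_idx
}
```

Running the indexer then builds the index directly from the database, with no intermediate export step.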
Alternatives
Although these are the giants, there are plenty more, and there are also those that use these engines to power their own custom frameworks. So I would say that you really haven't missed any, although there is one, Elasticsearch, that has a good user base.
I have been using Sphinx for almost a year now, and it has been amazing.
I can index 1.5 million documents in about a minute on my MacBook, and even quicker on the server. I am also using Sphinx to limit searches to places within specific latitudes & longitudes, and it is very fast.
Also, how results are ranked is very tweakable.
Easy to install & setup, if you read a tutorial or two.
Almost 1.0 status, but their Release Candidates have been rock solid.
Lucene/Solr appears to be more fully featured, with more years in business and a much stronger user community.
IMHO, if you can get past the initial setup issues that some seem to have faced (we didn't), then I would say Lucene/Solr is your best bet.
How do the full-text search systems of PostgreSQL and MySQL compare? Is either clearly better than the other? In which ways are they different?
PostgreSQL 8.3 has built-in full-text search, which is an integrated version of the "tsearch2" module.
Here is the documentation: http://www.postgresql.org/docs/8.3/static/textsearch.html
And the example from the documentation:
SELECT title
FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('friend');
Where body is a text field. You can index specifically for these types of searches and of course they can become more complex than this simple example. The functionality is very solid and worth diving into as you make your decision.
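One caveat worth knowing when you index for this: the one-argument form of to_tsvector depends on a configuration setting and is not immutable, so it cannot be used in an index expression. An indexed version of the example would therefore use the two-argument form in both the index and the query:

```sql
-- The two-argument (explicit configuration) form is immutable, so it can
-- back an expression index; the query must use the same form to match it.
CREATE INDEX pgweb_body_idx ON pgweb USING gin (to_tsvector('english', body));

SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');
```

With the index in place, Postgres can answer the query with an index scan instead of recomputing to_tsvector for every row.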
Best of luck.
Update: Starting in MySQL 5.6, InnoDB supports fulltext search
I'm not well versed in PostgreSQL unfortunately, but if you use FULLTEXT search in MySQL you're immediately tied to MyISAM. If you want to use InnoDB (and if ACID compliance means anything to you, you should be using InnoDB), you're stuck using other solutions.
Two popular alternatives that are often rolled out are Lucene (an Apache project with a Zend module if you're using PHP) and Sphinx.
If you're using Hibernate as an ORM, I highly recommend using Hibernate Search. It's built on top of Lucene, so it's super fast.
Karl
I've had pretty good experience with postgresql/tsearch2, especially since it was rolled into the standard distribution (before version 8.0 - I think - it was an optional contrib feature, and upgrading to tsearch2 involved a bit of work).
If I recall correctly you have to set some properties (fuzzy matching, dictionary stuff) before startup, whereas on other databases those things are flexibly exposed through the fulltext syntax itself (I'm thinking of Oracle Text, here, though I know that's not relevant to your question).
I think you can use Sphinx with both MySQL and Postgres.
Here is an article to explain how to use Sphinx with MySQL (you can add it as a plugin)
MySQL full-text search is very slow. It can't handle more than about 1 million rows of data (several tens of seconds per query).
I have no experience using PostgreSQL full-text search.
I have used Sphinx. It is very fast and easy to use, but its search functionality is not so powerful. For example, it doesn't support wildcard patterns like 'abc?', where '?' stands for any character.
I also know Lucene. It is powerful, but hard to learn.