I have a data source sitting in a relational database. I managed to index and store everything in Solr, and I am thrilled with the search performance and the awesome API (search, admin, etc.).
However, people say that if your data is truly structured, a relational database should be fast as long as you index everything. Even so, if I dump all the data into a relational database like MySQL, what I would miss is the beautiful query API.
I guess my question is:
Is it possible to use only a Solr-like query API and have a relational database as the backend, without using a Solr index at all?
If that is not possible, is there any mature project or product that can build a full-stack query API on top of a relational database?
Document search engines and relational databases serve different usage patterns. If you're using Solr for anything that involves tokenization and analysis chains, replicating that in an RDBMS requires implementing that functionality yourself (or settling for a subset, such as the full-text indices available in certain RDBMSes). I detailed some of these differences and features in "Should I just query the database or use a proper search engine solution?".
It's usually better to use the RDBMS as the main storage for your data and then push it into the search index as required. This also lets you pick up new features from the people who care about search and the problems it tries to solve, without having to wait for a niche product to implement them on top of your RDBMS (there are still quite a few new features in each iteration of Lucene, Elasticsearch and Solr).
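As a rough illustration of that split, the sketch below keeps the rows in a relational table and builds the JSON body that Solr's update handler accepts (a JSON array of documents). It uses Python's built-in sqlite3 purely as a stand-in for the real RDBMS. The `products` table, its columns, and the host and core in the closing comment are assumptions made up for the example, not details from the question.

```python
import json
import sqlite3


def rows_to_solr_docs(conn):
    """Read every row of the (hypothetical) products table as a Solr document."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT id, name, description FROM products ORDER BY id")
    # Cast the integer primary key to a string, a common choice for a Solr unique key.
    return [
        {"id": str(r["id"]), "name": r["name"], "description": r["description"]}
        for r in rows
    ]


# In-memory database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO products (name, description) VALUES (?, ?)",
    [("Kettle", "Stainless steel electric kettle"), ("Mug", "Ceramic coffee mug")],
)
conn.commit()

docs = rows_to_solr_docs(conn)
payload = json.dumps(docs)
# The payload would be POSTed to http://localhost:8983/solr/<core>/update?commit=true
# with Content-Type: application/json (host and core name are placeholders).
```

The database stays the source of truth here, so the index can be dropped and rebuilt from it at any time.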
Related
I am about to create a huge database with at least 200 million entries.
The database needs to be searchable using full text and should be fast.
My database gets data from many different data sources, and I need to import new or updated data regularly.
Is it a good idea to store all my data in a relational database like MySQL and then create a NoSQL document database (e.g. MongoDB or Elasticsearch) just for the purpose of searching, or does that not provide any benefit in terms of reliability and the prevention of redundant information?
I believe that keeping primary records in a SQL database and duplicating them to a NoSQL database is a very common approach.
Elasticsearch has an ongoing status page about its resiliency. Even in the newest version, Elasticsearch can lose data in a number of different situations. A major change in the structure of an Elasticsearch index (such as adding analyzers) requires that you re-index all of the documents. This process is safer if you have another source for the documents. At the end of the day, Elasticsearch isn't designed to consistently store documents. I would only ever choose Elasticsearch as the primary store in situations where occasional data loss isn't a disaster.
Unlike Elasticsearch, MongoDB is designed to be resilient. You should be able to safely store documents in MongoDB. I've found that doing full-text searches in MongoDB can be a little painful, at least compared to Elasticsearch. In my opinion, for text search, the only advantage MongoDB has over MySQL's FULLTEXT is that it is distributed.
We are running Elasticsearch and MySQL right now, and the benefits greatly outweigh the hassles of extra infrastructure and dealing with replication between the two. We had previously attempted to use a NoSQL solution as the primary datastore, with disastrous results. Running ES in conjunction with MySQL gets you the best of both worlds: consistency and safety of data in SQL, with scalable, effective full-text search in ES.
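For a sense of what the replication step between the two can look like, here is a minimal sketch that turns rows already fetched from SQL into a request body for the Elasticsearch `_bulk` endpoint, which takes newline-delimited JSON with an action line followed by a source line per document. The `articles` index name and the `title`/`body` fields are hypothetical. How the rows are selected (full dump, changed-since timestamp, queue) is left out.

```python
import json


def to_bulk_body(index_name, rows):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for row in rows:
        # Action line: index the document under the SQL primary key,
        # so re-running the sync overwrites instead of duplicating.
        lines.append(
            json.dumps({"index": {"_index": index_name, "_id": str(row["id"])}})
        )
        # Source line: only the fields that should be searchable.
        lines.append(json.dumps({"title": row["title"], "body": row["body"]}))
    # The bulk API requires a newline after the last line as well.
    return "\n".join(lines) + "\n"


# Rows as they might come back from a SQL query (made-up sample data).
rows = [
    {"id": 1, "title": "First post", "body": "Hello world"},
    {"id": 2, "title": "Second post", "body": "More text"},
]
bulk_body = to_bulk_body("articles", rows)
```

Using the SQL primary key as the document `_id` is one way to keep the sync idempotent, which matters when a full re-index from the database is needed.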
I don't know how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret and Solr), running some benchmarks.
I am working on a website that will hold all restaurant-related details for a particular country. I was considering which DB would be best suited for this kind of scenario, very similar to this.
I was considering MongoDB just because it would give me a flexible schema and simple queries for data retrieval. I am rethinking that decision, because my data is not going to be very large for now, so data size won't be a blocker for me in MySQL.
What would be the best way to choose between the two?
It truly depends on whether you want the data integrity and ACID features of a relational database. If a relational database is built correctly using the relational model and E. F. Codd's rules, you will never have a problem with duplicate data, inconsistencies, and other maladies. (This assumes you use an RDBMS that is worth its salt, like Oracle or SAP ASE.)
You also have the option of MongoDB, which, as you pointed out, is very flexible. In my experience, though, you will have to do a lot more manual work to ensure data accuracy and integrity.
That said, certain things are easier in it, and it is by no means unsuccessful. I use Mongo as a data back end for simulation servers I run, and it performs beautifully. Where Mongo truly excels is with its atomic documents, and that's where Mongo pulled ahead of other NoSQL systems like CouchDB.
What it truly comes down to is what kind of data you are storing. If you are storing relational data, use an RDBMS. If it is more document based, use Mongo or a similar data storage engine. I do not like the idea of choosing a data storage engine by what is popular or what is new. Use what fits your data.
I hope this answers your question satisfactorily; if not, please comment below.
I'm creating a search engine for deals, discounts and coupons. First, my engine collects deals from some sites and writes those deals into a database. Each record has:
name, discount, price, latitude, longitude
Right now I'm using MySQL, but will my search engine be faster if I use MongoDB, because all the results are in a similar JSON format?
Which is the better solution if I have 1,000,000 records, MySQL or MongoDB? I need faster searching.
http://test.pluspon.com
For your use case MongoDB would indeed be faster.
You can easily implement processing with multiple mongos instances in sharded environments; there would not be any blocking, and you would see even more of a performance gain for your use case.
But keep in mind that speed benchmarks and fast data processing are not the only things you should care about. MongoDB is still very young compared to more mature enterprise databases. For the use case you describe, though, I would advise going with it.
Also, as commented, there are other NoSQL databases that could serve you even better in some cases. Read this blog for more understanding.
I am writing code for a friend list and messaging system for my college website. I need to store interconnected data and search it. It has about 3500 records. Which should I go with, MySQL or XML? Which is fastest, which is best, and why?
I'm going to use one of my professor's favorite answers here: "it depends."
XML and MySQL have very different applications. If you need to run lots of simultaneous queries for all sorts of sophisticated things, MySQL is your clear winner. MySQL can sometimes be hard to use because you must first create a database schema in which to fit your data. It sounds, though, like you have many records with the same structure, and it would be easy enough to put them into a database. With a SQL-based database engine like MySQL, you can also construct queries using the standard SQL language. Database optimizations can help increase the performance of these types of queries; for example, you can use indexes and keys. If your data needs to be updated regularly, then MySQL will likely provide better performance, as it will not have to rewrite the XML file. If you need your application to scale to many simultaneous connections running sophisticated queries, you are definitely going to want some sort of SQL solution.
Depending on your application, though, there are sometimes other ways to store and access your data. I once needed to create a persistent data structure on disk that could be accessed very quickly but was never updated. For that, I used cdb. There are also other database systems out there, like Berkeley DB, and some NoSQL solutions such as CouchDB and MongoDB. I posed a somewhat interesting question here on Stack Overflow about the use of NoSQL solutions a little while back, which you may find interesting as well.
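To make the schema, query and index points concrete, here is a minimal sketch of a friend list stored relationally. It uses Python's built-in sqlite3 as a stand-in so that it runs without a server; the same statements are close to what MySQL would accept. The table, column and index names and the sample users are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendships (
        user_id   INTEGER NOT NULL REFERENCES users(id),
        friend_id INTEGER NOT NULL REFERENCES users(id)
    );
    -- Index so that looking up one user's friends does not scan the whole table.
    CREATE INDEX idx_friendships_user ON friendships (user_id);
""")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben"), (3, "Chen")],
)
conn.executemany(
    "INSERT INTO friendships (user_id, friend_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1)],
)


def friends_of(conn, user_id):
    """Return the names of a user's friends, sorted alphabetically."""
    rows = conn.execute(
        "SELECT u.name FROM friendships f "
        "JOIN users u ON u.id = f.friend_id "
        "WHERE f.user_id = ? ORDER BY u.name",
        (user_id,),
    )
    return [name for (name,) in rows]


print(friends_of(conn, 1))
```

Adding a friendship is a single INSERT here, whereas an XML file would typically have to be rewritten and guarded against concurrent writers.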
This is really just a sampling of different considerations you may want to make when you are choosing how you want to store your data. Think about questions like: How frequently will things be queried? or updated? What will your queries look like? What kinds of applications do you need to access your information from? etc.
I've looked into Doctrine's built-in search, MySQL MyISAM full-text search, Zend_Lucene, and Sphinx, but all the nuances and implementation details are making it hard for me to sort out, given that I don't have experience with anything other than the MyISAM search.
What I really want is something simple that will work with the Zend Framework and Doctrine (MySQL back-end, probably InnoDB). I don't need complex things like word substitutions, auto-complete, and so on (not that I'd be opposed to such things, if it were easy enough and time effective enough to implement).
The main thing is the ability to search for strings across multiple database tables and multiple fields, with some basic search criteria (e.g. user.state = CA AND user.active = 1). The size of the database will start at around 50K+ records (old data being dumped in), the biggest single searchable table would be around 15K records, and it would grow considerably over time.
That said, Zend_Lucene is appealing to me because it is flexible (in case I do need my search solution to grow in the future) and because it can parse MS Office files (which will be uploaded to my application by users). But its flexibility also makes it kind of complicated to set up.
I suppose the most straightforward option would be to just use Doctrine's search capabilities, but I'm not sure if that's going to be able to handle what I need. And I don't know that there is any option out there which is going to combine my desire for simplicity & power.
What search solutions would you recommend I investigate? And why would you think that solution would work well in this situation?
I would recommend using the Solr search engine.
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface (which is really great) and many more features.
It runs in a Java servlet container such as Tomcat.
You can use the solr-php-client to handle queries in PHP.
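Because the API is plain HTTP, the shape of a query is easy to see without any client library. The sketch below (in Python rather than PHP, only to keep it short and self-contained) builds a request URL for Solr's select handler that combines a text query with the kind of basic criteria mentioned in the question, expressed as `fq` filter queries. The base URL, the `users` core and the `state`/`active` field names are assumptions; they depend entirely on how the index is set up.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, text, filters, rows=10):
    """Build a Solr select URL with a main query and filter queries."""
    params = [("q", text), ("wt", "json"), ("rows", str(rows))]
    # Each filter becomes its own fq parameter; Solr intersects them with q.
    params += [("fq", f) for f in filters]
    return f"{base_url}/select?{urlencode(params)}"


url = build_solr_select_url(
    "http://localhost:8983/solr/users",  # placeholder host and core
    "smith",
    ["state:CA", "active:1"],
)
print(url)
```

A client library such as solr-php-client wraps this same request and parses the response for you.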