PHP MySQL Solr auto-updating

I am new to Solr and need to know whether I am thinking correctly about the relationship between Solr and MySQL.
We index data from MySQL into Solr once, and after that all add, edit, delete, and update queries are run against Solr, while MySQL is not changed in the meantime. If we need to keep MySQL up to date as well, we would have to export (or something like that) from Solr back to MySQL.
Am I thinking right?
We only need to index in Solr those MySQL tables that need to be searched, not all of the MySQL data?
Am I thinking right again?

Qn 1. The usual case is the reverse of what you describe: your main data store is MySQL, and Solr is the one that lags. MySQL stays authoritative, and Solr is refreshed from it, not the other way around. You can either use the DataImportHandler or write a custom indexing program to get data from MySQL into Solr.
Qn 2. Along with the fields you want to search (which are indexed fields), you can also keep non-indexed stored fields in Solr. This lets you build your result data from Solr itself, without doing a secondary DB query.
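As a small sketch of that indexed/stored distinction, assuming a hypothetical book index (field names are illustrative, not from the question), a schema.xml fragment could declare:

    <!-- Searchable fields: indexed (and here also stored for display). -->
    <field name="title"     type="text_general" indexed="true"  stored="true"/>
    <field name="author"    type="text_general" indexed="true"  stored="true"/>
    <!-- Display-only field: stored but not indexed, so Solr can return it
         with results without it being searchable or needing a DB lookup. -->
    <field name="cover_url" type="string"       indexed="false" stored="true"/>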

Related

Validation of migrated data for MySQL

I'm migrating a large (approx. 10 GB) MySQL database (InnoDB engine).
I've figured out the migration part: export with mysqldump, import with mysql.
However, I'm trying to figure out the optimal way to validate that the migrated data is correct. I thought of the following approaches, but they don't completely work for me.
One approach could have been using CHECKSUM TABLE. However, I can't use it since the target database would have data continuously written to it(from other sources) even during migration.
Another approach could have been using the combination of MD5(), GROUP_CONCAT, and CONCAT. However, that also won't work for me as some of the columns contain large JSON data.
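For reference, the checksum approach being ruled out here usually looks something like the following (table and column names are hypothetical); besides the JSON issue, GROUP_CONCAT output is capped by group_concat_max_len, which this kind of query runs into quickly:

    -- Hash each row, then aggregate the row hashes into one fingerprint
    -- per table; compare fingerprints between source and target.
    SET SESSION group_concat_max_len = 1024 * 1024;

    SELECT MD5(GROUP_CONCAT(row_hash ORDER BY id SEPARATOR '')) AS fingerprint
    FROM (
      SELECT id, MD5(CONCAT_WS('|', id, title, json_payload)) AS row_hash
      FROM books
    ) AS t;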
So, what would be the best way to validate that the migrated data is correct?
Thanks.
How about this?
Do SELECT ... INTO OUTFILE from each old and new table, writing them into .csv files. Then run diff(1) between the files, eyeball the results, and convince yourself that the new tables' rows are an appropriate superset of the old tables'.
These flat files are modest in size compared to a whole database and diff is fast enough to be practical.
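A sketch of that procedure, assuming a hypothetical books table and a server whose secure_file_priv setting allows writing to /tmp:

    -- Run against the old database (repeat on the new one, changing the
    -- file name to /tmp/books_new.csv), dumping rows in a fixed order.
    SELECT id, title, json_payload
    FROM books
    ORDER BY id
    INTO OUTFILE '/tmp/books_old.csv'
      FIELDS TERMINATED BY ',' ENCLOSED BY '"'
      LINES TERMINATED BY '\n';

Then compare on the shell with diff /tmp/books_old.csv /tmp/books_new.csv.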

Search across two different databases (MySQL and Postgres)

Is it possible to search for something that lives in two databases? For example, I want to do a "starts with" search on a column in Postgres as well as a column in MySQL, where one column is "name" and the other is "email".
Copying over data is not reliable as new data will be created in both databases constantly.
Yes, it is possible. For the "starts with" part, you should be able to use the standard Postgres string functions, of which starts_with is one, and indexing on the desired columns.
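A minimal sketch of the Postgres side, assuming a hypothetical users table (the LIKE form below is the one the text_pattern_ops index accelerates):

    -- A btree index with text_pattern_ops serves left-anchored LIKE matches.
    CREATE INDEX idx_users_name_prefix ON users (name text_pattern_ops);

    -- Two "starts with" spellings:
    SELECT * FROM users WHERE name LIKE 'smi%';
    SELECT * FROM users WHERE starts_with(name, 'smi');  -- PostgreSQL 11+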
Getting the data from MySQL is the more complicated part.
You would most likely want to use a foreign data wrapper (FDW) from Postgres to access the MySQL data, and then union it (or apply whatever other processing you need) with the Postgres data to return a combined result set.
You could write your own FDW if you have particularly specific requirements, or you could try an open-source one, such as EnterpriseDB's mysql_fdw. EnterpriseDB is a Postgres consultancy and offers its own Postgres version, but the documentation on the GitHub page says the wrapper is compatible with base Postgres as well as their own version.
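A hedged sketch of that setup with mysql_fdw (connection details and all table/column names are hypothetical):

    CREATE EXTENSION mysql_fdw;

    CREATE SERVER mysql_srv
      FOREIGN DATA WRAPPER mysql_fdw
      OPTIONS (host '127.0.0.1', port '3306');

    CREATE USER MAPPING FOR CURRENT_USER
      SERVER mysql_srv
      OPTIONS (username 'app', password 'secret');

    -- Expose the MySQL column inside Postgres.
    CREATE FOREIGN TABLE mysql_accounts (email text)
      SERVER mysql_srv
      OPTIONS (dbname 'appdb', table_name 'accounts');

    -- Combined "starts with" search across both databases.
    SELECT name  AS match FROM users          WHERE name  LIKE 'smi%'
    UNION ALL
    SELECT email AS match FROM mysql_accounts WHERE email LIKE 'smi%';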

Solr: continuous migration from MySQL

This may sound like an opinion question, but it's actually a technical one: Is there a standard process for maintaining a simple data set?
What I mean is this: let's say all I have is a list of something (we'll say books). The primary storage engine is MySQL. I see that Solr has a data import handler. I understand that I can use this to pull in book records on a first run - is it possible to use this for continuous migration? If so, would it work as well for updating books that have already been pulled into Solr as it would for pulling in new book records?
Otherwise, if the data import handler isn't the standard way to do it, what other ways are there? Thoughts?
Thank you very much for the help!
If you want to update documents from within Solr, I believe you'll need to use the UpdateRequestHandler as opposed to the DataImportHandler. I've never had need to do this where I work, so I don't know all that much about it. You may find this link of interest: Uploading Data With Index Handlers.
If you want to update Solr with records that have been newly added to your MySQL database, you would use the DataImportHandler's delta-import. Basically, you have some kind of field in MySQL that shows the record is new or changed; if it is, Solr imports it. For example, where I work, we have an "updated" field that Solr uses to determine whether or not it should import a record (a sketch follows below). Here's a good link to visit: DataImportHandler
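A minimal sketch of a delta-import entity, assuming a hypothetical books table with an updated timestamp column (all names illustrative):

    <!-- db-data-config.xml fragment: deltaQuery finds the IDs of new or
         changed rows since the last run; deltaImportQuery fetches them. -->
    <entity name="book"
            query="SELECT id, title FROM books"
            deltaQuery="SELECT id FROM books
                        WHERE updated &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title FROM books
                              WHERE id = '${dataimporter.delta.id}'">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
    </entity>

The delta run is then triggered with a request to /dataimport?command=delta-import, typically on a schedule.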
This question looks similar to something we do, though not with SQL but with HBase (a Hadoop-stack database). There we have the HBase Indexer, which, after mapping the database to Solr, listens for new-row events in HBase and then executes code to fetch those values from the database and add them to Solr. I am not sure whether such a tool exists for SQL, but the concept looks similar: in SQL there are triggers, which can listen for inserts and updates. On such an event you can trigger something that executes the steps of adding the records to Solr in a continuous manner; a sketch follows.
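A hedged sketch of that trigger idea in MySQL (all names hypothetical): since a trigger cannot easily call Solr directly, one common pattern is to queue changed IDs in a table that an external indexer drains:

    -- Queue of changed row IDs, drained by an external indexing process.
    CREATE TABLE solr_index_queue (
      book_id   INT NOT NULL,
      queued_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER books_after_insert
    AFTER INSERT ON books
    FOR EACH ROW
      INSERT INTO solr_index_queue (book_id) VALUES (NEW.id);

    CREATE TRIGGER books_after_update
    AFTER UPDATE ON books
    FOR EACH ROW
      INSERT INTO solr_index_queue (book_id) VALUES (NEW.id);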

Solr search with MySQL database, any utility for data importing?

We are looking at ways of improving the "search" functionality in our large business application, which currently uses SQL LIKE syntax. So we started evaluating the Solr server and were able to index a few of our database tables and search them. But I am a newbie and wanted to know:
1) We have a large number of tables in our application. Is there any utility that generates the Solr schema.xml from the database tables?
2) Our current search lists the database rows that meet the search criteria (this was written using SQL LIKE and takes a lot of time to generate results). We want to replicate that exact functionality using Solr. Is that possible?
For importing a database into Solr, you might want to look into the DataImportHandler.
There will be a fair amount of configuration required: defining which tables and columns to import, what should be stored, and how it should be indexed.
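A hedged sketch of that configuration (file, table, and field names hypothetical). The handler is registered in solrconfig.xml and pointed at a data-config file that maps table columns to Solr fields:

    <!-- solrconfig.xml: register the DIH endpoint. -->
    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">db-data-config.xml</str>
      </lst>
    </requestHandler>

    <!-- db-data-config.xml: which tables/columns to pull and how to map them. -->
    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/appdb"
                  user="app" password="secret"/>
      <document>
        <entity name="product"
                query="SELECT id, name, description FROM products">
          <field column="id"          name="id"/>
          <field column="name"        name="name"/>
          <field column="description" name="description"/>
        </entity>
      </document>
    </dataConfig>

A full import is then triggered with a request to /dataimport?command=full-import.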

What is the best way to make two instances in Solr which use identical schemas?

I have indexed a MySQL database using Solr and everything is perfect. Now I have another database that uses exactly the same schema as my first database, but with different data in it.
What I want is to have Solr index the second database as well, using the same Solr schema I created for the first, since the two are completely identical.
I have read that Solr cores allow you to run multiple instances that use different configuration sets and indexes, but in my case the configuration is exactly the same; the only thing that changes is the database name.
My question is: what is the best way to create two Solr instances that use the same configuration?
Cheers
You could use two cores and share a schema. Just read the Wiki. But in practice you might want to keep the flexibility and just copy the schema for a second core.
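A hedged sketch in the legacy solr.xml core-definition style (names and paths hypothetical): both cores share one instanceDir, and therefore one schema.xml and solrconfig.xml, while keeping separate indexes:

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="db1" instanceDir="shared_books" dataDir="db1_data"/>
        <core name="db2" instanceDir="shared_books" dataDir="db2_data"/>
      </cores>
    </solr>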
How about using only one Solr instance, but with a field in the schema that indicates which database/source each record came from?
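A minimal sketch of that single-instance approach (field name hypothetical):

    <!-- schema.xml addition: tag every document with its origin. -->
    <field name="source_db" type="string" indexed="true" stored="true"/>

Each import sets source_db to, say, db1 or db2, and a query scoped to one database adds a filter such as fq=source_db:db1.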