I am new to the Amazon OpenSearch Service, and I would like to know if there is any way to sync a MySQL database with OpenSearch in real time. I thought of Logstash, but it seems like it doesn't support delete and update operations, which means my OpenSearch cluster might not stay up to date.
I'm going to comment for Elasticsearch as that is the tag used for this question.
You can:
Read from the database (SELECT * FROM TABLE)
Convert each record to a JSON document
Send the JSON document to Elasticsearch, preferably using the _bulk API.
Logstash can help with that. But I'd recommend modifying the application layer if possible and sending data to Elasticsearch in the same "transaction" as you are sending your data to the database.
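Here is a minimal sketch of that application-layer approach, assuming the Node.js @elastic/elasticsearch client (7.x) and a hypothetical products table; the index name, row shape, and error handling are placeholders for whatever your application uses:

// Write to MySQL as usual, then index the same record in Elasticsearch.
const { Client } = require('@elastic/elasticsearch');
const es = new Client({ node: 'http://localhost:9200' });

async function saveProduct(db, product) {
  // 1. The normal relational write.
  await db.execute(
    'INSERT INTO products (id, name, price) VALUES (?, ?, ?)',
    [product.id, product.name, product.price]
  );

  // 2. Send the same record to Elasticsearch. _bulk is used here so the
  //    pattern also works when many rows are saved at once.
  await es.bulk({
    body: [
      { index: { _index: 'products', _id: String(product.id) } },
      { name: product.name, price: product.price },
    ],
  });
}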
I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
Also have a look at this "live coding" recording.
Side note: If you want to run Elasticsearch, have a look at Cloud by Elastic, also available if needed from the AWS Marketplace, Azure Marketplace and Google Cloud Marketplace.
Cloud by Elastic is one way to get access to all the features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, Maps UI, Alerting and the built-in solutions named Observability, Security and Enterprise Search, plus whatever is coming next :) ...
Disclaimer: I'm currently working at Elastic.
Keep a column that indicates when the row was last modified; then you will be able to push updates to OpenSearch. Similarly for deletes, have a column indicating whether the row is deleted (a soft delete) and the date it was deleted.
With this db design, you can send "delete" or "update" actions to OpenSearch/Elasticsearch to update or delete the indexed documents based on the last-modified/deleted date. You can later run a scheduled maintenance job to permanently delete these rows from the database table.
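As a rough illustration of that design, here is a sketch using the OpenSearch JavaScript client; the table, column, and index names (items, last_modified, deleted_at, is_deleted) are assumptions:

// Push rows modified or soft-deleted since the last sync run to OpenSearch.
const { Client } = require('@opensearch-project/opensearch');
const client = new Client({ node: 'https://localhost:9200' });

async function syncSince(db, lastRun) {
  const [rows] = await db.execute(
    'SELECT id, name, is_deleted FROM items WHERE last_modified > ? OR deleted_at > ?',
    [lastRun, lastRun]
  );

  // One _bulk body: "index" actions for updated rows, "delete" actions
  // for soft-deleted rows.
  const body = [];
  for (const row of rows) {
    if (row.is_deleted) {
      body.push({ delete: { _index: 'items', _id: String(row.id) } });
    } else {
      body.push({ index: { _index: 'items', _id: String(row.id) } });
      body.push({ name: row.name });
    }
  }
  if (body.length) await client.bulk({ body });
}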
Lastly, this article might be of help to you: How to keep Elasticsearch synchronized with a relational database using Logstash and JDBC
Related
Currently I am using MySQL as the main database, which is not a real-time database. To synchronize client-side data with the server and keep data available while offline, I am considering adding a real-time database as a slave database in my architecture. Synchronizing data is not easy, so I want to use Cloud Firestore.
From what I have searched so far, it seems there is no practical way to synchronize a non-real-time RDBMS (in my case MySQL) with Cloud Firestore. I cannot migrate the current data to Cloud Firestore because other services depend on it.
If there is no practical solution for this, please suggest the best alternative. Thanks.
Not sure, but it seems there is no ready-made solution to synchronize these. In a case like this, the synchronization has to be implemented manually on the client side.
I have been thinking about this as well. What I have thought up so far is this.
Use Firestore to create a document.
Write a cloud function to listen to create events on the collection that stores that document.
Send the created document over HTTP to a REST endpoint that stores the data in a relational db (mysql, postgres); if the server is down or the status code is anything other than 200, set a boolean flag called syncFailed to true on the document.
Write a worker that periodically fetches documents where syncFailed is true and sends them to the endpoint again until the sync succeeds.
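A minimal sketch of the cloud-function part (listen for create events, POST to the endpoint, flag failures), assuming Cloud Functions for Firebase on a Node 18+ runtime so global fetch is available; the collection name, endpoint URL, and field name are placeholders:

const functions = require('firebase-functions');

exports.pushItemToSql = functions.firestore
  .document('items/{itemId}')                 // placeholder collection
  .onCreate(async (snapshot, context) => {
    try {
      const res = await fetch('https://example.com/api/items', {   // placeholder endpoint
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ id: context.params.itemId, ...snapshot.data() }),
      });
      if (res.status !== 200) throw new Error('HTTP ' + res.status);
    } catch (err) {
      // Server down or non-200 response: flag the document so the periodic
      // worker can retry it later.
      await snapshot.ref.update({ syncFailed: true });
    }
  });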
THINGS TO CONSIDER
Be careful when adopting this approach with update events. You may accidentally create an infinite loop that will cost you all your wealth. If you do add an update listener on a document, make sure you devise a way to avoid such a loop: you may update a document from an update-listening cloud function, which in turn re-triggers the function, which updates the document again, and so on. Make sure you use a boolean flag or some other field on the document as a sentinel value to break out of such a looping situation.
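One way to implement that guard is sketched below; the synced fields (name, price) and the bookkeeping field are placeholders:

const functions = require('firebase-functions');

exports.onItemUpdate = functions.firestore
  .document('items/{itemId}')
  .onUpdate(async (change) => {
    const before = change.before.data();
    const after = change.after.data();

    // Sentinel check: if none of the fields we actually sync changed, this
    // event was caused by our own bookkeeping write below, so exit instead of
    // re-triggering the sync forever.
    if (before.name === after.name && before.price === after.price) {
      return null;
    }

    // ... send the update to the relational database here ...

    // Record that we handled it; this write triggers the function once more,
    // but the check above exits immediately.
    return change.after.ref.update({ lastSyncedAt: Date.now() });
  });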
Is it possible to use Couchbase Sync Gateway in the following way:
1) The mobile client queries Couchbase for data.
2) No data is present in Couchbase, so this triggers an import of the needed data from, for example, a MySQL database into Couchbase.
3) The imported data is then transferred to the mobile client by Couchbase Sync Gateway.
4) The mobile client goes to sleep.
5) After 12 hours of inactivity the data is removed from Couchbase.
6) The next day the mobile client still holds the data offline and syncs again with Sync Gateway.
7) The data is imported into Couchbase Server again and the diffs are synced with the client.
Does Couchbase provide hooks to implement such a flexible use case?
If yes, could somebody point me to the important API calls?
Many Greetings
The preferred way to do this would be to run most things through Sync Gateway (the data imports from the external source in particular should go through Sync Gateway, not directly to Couchbase, and removing the data should go through SG also).
Sync Gateway's sync function runs when SG receives documents. In this sense, there's no way to trigger something based on nothing being there.
One way you might solve this is by having the mobile client push a special purpose document. Your sync function could catch this and react in several ways (fire a webhook request, start a replication, or you could set up something to monitor a changes feed and trigger from that).
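For example, a Sync Gateway sync function could route such a special-purpose document into its own channel that a server-side process watches; the document type and channel name below are made up:

// Sync function: put "import-request" documents into a dedicated channel that
// a changes-feed listener or webhook target watches; route everything else
// as usual.
function (doc, oldDoc) {
  if (doc.type === "import-request") {
    channel("import-requests");
  } else {
    channel(doc.channels);
  }
}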
Next you have the issue of removing the data on the Server side. Here the question is a little unclear. Typically applications write new revisions to SG, and these get synced to the client (and vice versa). If you remove everything on the Server side, you'll actually end up with what are called tombstone revisions showing the document as deleted. (This is a result of the flexible conflict resolution technique used by Couchbase Mobile. It uses multiversion concurrency control.)
It sounds like you don't want to store the data long term on the Server side. If that's right, I think you could do something like:
Delete the data (through SG)
Have the mobile client push data to SG
Trigger SG again with some special document
Update the data from the external source
Have the client pull updates from SG
That's a very rough outline. This is too complicated to really work out in this format. I suggest you post questions through the Couchbase developer forum to get more details.
So, the short answer, yes, this seems feasible, but a full answer needs more detail on what you're doing and what your constraints are.
I want to set a TTL (time to live) on Couchbase Server for all documents that are continuously pushed from mobile devices via replication. I did not find any clear explanation/example in the documentation of how to do this.
What should be the approach to set a TTL for documents coming from mobile devices to the Server through Sync Gateway?
Approach 1:
One approach is to create a view on the server side which would return createdDate as the key. We would query that view for keys matching today's date, which would return today's documents, and we could then set a TTL on those documents. But how and when would we call this view, and is it a good approach?
Approach 2:
Should I do it using webhooks, where a listener picks up document changes (creations) made through Couchbase Lite push replications, sets a TTL on the new documents, and saves them back to Couchbase Server?
Is there any other better way to do it?
Also what is the way to set TTL for only specific documents?
EDIT: My final approach:
I will create the following view on Couchbase Server:
function (doc, meta) {
  emit(dateToArray(doc.createdDate));
}
I will have a job that runs daily at EOD, queries the view to get the documents created today, and sets a TTL on those documents.
In this way I would be able to delete documents regularly.
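A rough sketch of that daily job, assuming the Couchbase Node.js SDK 2.x, a bucket named sync_gateway, and a design document named docs that contains the view above as by_created_date (all of these names are placeholders):

var couchbase = require('couchbase');
var cluster = new couchbase.Cluster('couchbase://localhost');
var bucket = cluster.openBucket('sync_gateway');

// dateToArray() emits keys like [year, month, day, hour, min, sec] with
// 1-based months in UTC, so build today's prefix accordingly.
var now = new Date();
var day = [now.getUTCFullYear(), now.getUTCMonth() + 1, now.getUTCDate()];

var query = couchbase.ViewQuery.from('docs', 'by_created_date')
  .range(day, day.concat([{}]));   // every key that starts with today's date

bucket.query(query, function (err, rows) {
  if (err) throw err;
  rows.forEach(function (row) {
    // Expire each of today's documents 24 hours from now.
    bucket.touch(row.id, 24 * 60 * 60, function (touchErr) {
      if (touchErr) console.error('Could not set TTL for', row.id, touchErr);
    });
  });
});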
Let me know if there is any issue with it or there is any better way.
Hopefully someone from the mobile team can chime in and confirm this, but I don't think the mobile SDK allows setting an expiry at the moment.
I guess you could add the creation/update time to your document and create a batch job that uses the "core" SDKs to periodically find old documents (either via N1QL or a dedicated view) and remove them.
It is not currently possible to set a TTL for documents via Sync Gateway like you can with a Couchbase Server smart-client. There is a conceptual impedance mismatch between Sync Gateway and native-style TTLs on documents. This is because the Sync Gateway protocol functions on the basis of revision trees, and even when a document is 'deleted', there is still a document in place to mark that it has been deleted.
I would also be wary of workloads that might require TTLs (e.g. a cache): Sync Gateway documents take up space even after they've been deleted, so your dataset may continue to grow unbounded.
If you do require TTLs, and if you do not think the dataset size will be an issue, then the best way would be to store a field in the document that represents the time at which the document should expire. Then you would do two things:
When accessing the document, if it has expired, delete it manually.
Periodically iterate over the all-docs endpoint and delete any documents you find with an expiry time in the past.
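A rough sketch of that periodic sweep against the Sync Gateway Admin REST API; the database name, admin port, and the expiresAt field (a millisecond timestamp written by whoever creates the document) are assumptions:

// Delete, through Sync Gateway, every document whose expiry time has passed.
// Deleting via SG creates a tombstone revision, so the deletion also
// replicates down to the mobile clients.
const SG = 'http://localhost:4985/mydb';   // admin API, placeholder db name

async function sweepExpired() {
  const res = await fetch(`${SG}/_all_docs?include_docs=true`);
  const { rows } = await res.json();
  const now = Date.now();

  for (const row of rows) {
    if (row.doc && row.doc.expiresAt && row.doc.expiresAt < now) {
      await fetch(`${SG}/${row.id}?rev=${row.value.rev}`, { method: 'DELETE' });
    }
  }
}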
Couchbase does not delete a document the moment its TTL is reached. Instead, when you access an (expired) document, Couchbase checks the expiry and removes the document at that moment.
http://developer.couchbase.com/documentation/server/4.0/developer-guide/expiry.html
Each day hundreds of thousands of items are inserted, updated and deleted on our service (backend using .NET and a MySQL database).
Now we are integrating our service with another service using their RESTful API. Each time an item is inserted, updated or deleted on our service, we also need to connect to their web service and use POST, PUT or DELETE.
What is a good implementation of this case?
It does not seem like a very good idea to connect to their API each time a user inserts an item on our service, as it would make for a quite slow experience for the user.
Another idea was to update our database as usual, then set up another server that constantly connects to our database and fetches the data that needs to be posted to the RESTful API. Is this the way to go?
How would you solve it? Any guides of implementing stuff like this would be great! Thanks!
It depends on whether a delay in updating the other service is acceptable or not. If not, then create an event and put it in the queue of an event processor that can send it to the second service.
If a delay is acceptable, then a background batch job can run periodically and send the data.
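A minimal sketch of the batch-job variant, assuming a Node.js worker on Node 18+ (for global fetch), the mysql2/promise driver, and a hypothetical outbox table (id, action, payload, synced) that your service writes in the same transaction as the normal insert/update/delete; the partner URL is a placeholder:

const mysql = require('mysql2/promise');

const METHODS = { insert: 'POST', update: 'PUT', delete: 'DELETE' };

async function syncPendingChanges() {
  const db = await mysql.createConnection({ host: 'localhost', user: 'app', database: 'service' });
  const [rows] = await db.execute(
    'SELECT id, action, payload FROM outbox WHERE synced = 0 LIMIT 500'
  );

  for (const row of rows) {
    const res = await fetch(`https://partner.example.com/items/${row.id}`, {
      method: METHODS[row.action],
      headers: { 'Content-Type': 'application/json' },
      body: row.action === 'delete' ? undefined : row.payload,
    });
    if (res.ok) {
      // Mark as synced only after the other service accepted the change.
      await db.execute('UPDATE outbox SET synced = 1 WHERE id = ?', [row.id]);
    }
  }
  await db.end();
}

// Run periodically, e.g. from cron or setInterval.
syncPendingChanges().catch(console.error);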
I want to "replicate" a database to an external service. For doing so I could just copy the entire database (SELECT * FROM TABLE).
If some changes are made (INSERT, UPDATE, DELETE), do I need to upload the entire database again or there is a log file describing these operations?
Thanks!
It sounds like your "external service" is not just another database, so traditional replication might not work for you. More details on that service would be great so we can customize answers. Depending on how quickly you need data in your external service and the performance demands of your application, the main options would be:
Triggers: add INSERT/UPDATE/DELETE triggers that update your external service's data when your data changes (this could be rough on your app's performance but provides near real-time data for your external service).
Log Processing: you can parse changes from the logs and use some level of ETL to make sure they'll run properly on your external service's data storage. I wouldn't recommend getting into this if you're not familiar with their structure for your particular DBMS.
Incremental Diffs: you could run diffs on some interval (maybe 3x a day, for example) and have a cron job or scheduled task run a script that moves all the data in a big chunk. This prioritizes your app's performance over the external service.
If you choose triggers, you may be able to tweak an existing trigger-based replication solution to update your external service. I haven't used these so I have no idea how crazy that would be, just an idea. Some examples are Bucardo and Slony.
There are many ways to replicate a PostgreSQL database. In the current version 9.0, the PostgreSQL Global Development Group introduced two great new features called Hot Standby and Streaming Replication, taking PostgreSQL to a new level and providing a built-in solution.
On the wiki, there is a complete review of the new PostgreSQL 9.0 features:
http://wiki.postgresql.org/wiki/PostgreSQL_9.0
There are other applications like Bucardo, Slony-I, Londiste (Skytools), etc., which you can use too.
Now, what do you want to do about log processing? What exactly do you want? Regards