Is it possible to apply manual retention on a dataset in Palantir Foundry? - palantir-foundry

I have an incremental dataset, and I want to apply a specific retention policy to it. The platform team told me that the current state of the retention service does not let them easily create and maintain custom retention policies.
Is it possible to apply retention manually, for example by making an API call to delete old transactions?
(In this specific case, I would like to delete all transactions older than 6 months.)

Given the answer to the related question "Does allow_retention flag in incremental decorator manage manual delete transaction?", manually deleting a transaction would trigger a snapshot and thus not be efficient. So I don't think it's possible, but I'm open to suggestions.
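To make the intent concrete, here is a purely hypothetical sketch of the kind of call I have in mind. The endpoint paths, payloads, and field names below are assumptions for illustration only, not documented Foundry APIs, and as noted above a delete transaction would likely force the next build to snapshot anyway:

```python
# Hypothetical sketch only: the endpoint paths, payloads, and field names are
# assumptions, not documented Foundry APIs. It illustrates the idea of deleting
# all transactions older than 6 months via an API call.
from datetime import datetime, timedelta, timezone

import requests

FOUNDRY_HOST = "https://foundry.example.com"   # assumption: your stack's hostname
DATASET_RID = "ri.foundry.main.dataset.xxxx"   # assumption: target dataset RID
TOKEN = "<api-token>"

headers = {"Authorization": f"Bearer {TOKEN}"}
cutoff = datetime.now(timezone.utc) - timedelta(days=180)

# Assumption: an endpoint that lists transactions with their commit timestamps.
resp = requests.get(
    f"{FOUNDRY_HOST}/api/catalog/datasets/{DATASET_RID}/transactions",
    headers=headers,
)
resp.raise_for_status()

for txn in resp.json():
    committed = datetime.fromisoformat(txn["committedAt"])
    if committed < cutoff:
        # Assumption: a delete-style call per transaction; in practice this would
        # itself be a new DELETE transaction and break incrementality.
        requests.post(
            f"{FOUNDRY_HOST}/api/catalog/datasets/{DATASET_RID}/transactions/{txn['rid']}/delete",
            headers=headers,
        ).raise_for_status()
```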

Related

Is there a way to watch any table changes in MySQL Workbench and automatically take action upon them?

I have a database shared by two completely separate server applications, and those applications cannot communicate with one another at all. Let's say those two applications are called A and B. Whenever A updates a table in the shared DB, B should quickly know that there was a change somehow (remember, A and B cannot communicate with each other). Also, I want to avoid a setInterval type of approach where I query every x seconds. Initially I thought there would be a way to 'watch' changes within MySQL itself, but it seems like there isn't. What would be the best approach to achieve this? I'm using Node.js, MySQL Workbench, and PHP.
TL;DR:
I'm trying to find the best way to 'watch' any table changes and trigger an action (such as an HTTP request) whenever a change is detected. I'm using MySQL Workbench and Node.js. I really want to avoid a setInterval type of approach. Any recommendations?
What you want is a Change Data Capture (CDC) feature. In MySQL, the feature is the binary log.
Some tools like Debezium are designed to watch and filter the binary log, and transform it into events on a message queue (e.g. Kafka).
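For example, a minimal consumer-side sketch, assuming Debezium's MySQL connector is already publishing change events to Kafka; the topic name, broker address, and webhook URL below are placeholders:

```python
# Minimal sketch: consume Debezium change events from Kafka and react to them.
# Assumes a running Kafka broker and a Debezium MySQL connector publishing to
# the topic below; topic/broker names and the webhook URL are placeholders.
import json

import requests
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "dbserver1.shared_db.orders",              # Debezium topic: <prefix>.<db>.<table>
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
    auto_offset_reset="latest",
)

for message in consumer:
    event = message.value
    if event is None:                          # tombstone records carry no payload
        continue
    payload = event.get("payload", event)
    op = payload.get("op")                     # "c" = insert, "u" = update, "d" = delete
    # Application B reacts here, e.g. by calling its own HTTP endpoint.
    requests.post(
        "http://localhost:3000/table-changed",
        json={"op": op, "after": payload.get("after")},
    )
```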
Some comments above suggest using triggers, but this is a problematic idea, because triggers fire during a data change, when the transaction for that change is not yet committed. If you try to invoke an http request or any other application action when a trigger fires, then you risk having the action execute even if the data change is subsequently rolled back. This will really confuse people.
Also there isn't a good way to run application actions from triggers. They are for making subordinate data changes, not actions that are outside transaction scope.
Using the binary log as a record of changes is safer, because changes are not written to the binary log until they are committed. Also the binary log contains all changes to all tables.
With a trigger solution, by contrast, you would have to create three triggers (INSERT, UPDATE, and DELETE) for each table. Also, MySQL does not support triggers for DDL statements (CREATE, ALTER, DROP, TRUNCATE, etc.).

How to synchronize MySQL database with Amazon OpenSearch service

I am new to the Amazon OpenSearch service, and I wish to know if there's any way I can sync a MySQL DB with OpenSearch in real time. I thought of Logstash, but it seems like it doesn't support delete and update operations, which might leave my OpenSearch cluster out of date.
I'm going to comment for Elasticsearch as that is the tag used for this question.
You can:
Read from the database (SELECT * from TABLE)
Convert each record to a JSON Document
Send the JSON document to Elasticsearch, preferably using the _bulk API.
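A minimal sketch of those three steps, assuming the `pymysql` and official `elasticsearch` Python clients; table, column, and index names and the credentials are placeholders:

```python
# Minimal sketch: read rows from MySQL, convert each to a JSON document, and
# send them to Elasticsearch with the _bulk API. Names/credentials are placeholders.
import pymysql                                      # pip install pymysql
from elasticsearch import Elasticsearch, helpers    # pip install elasticsearch

mysql_conn = pymysql.connect(
    host="localhost", user="app", password="secret", database="shop",
    cursorclass=pymysql.cursors.DictCursor,
)
es = Elasticsearch("http://localhost:9200")

with mysql_conn.cursor() as cursor:
    cursor.execute("SELECT id, name, price FROM products")   # step 1: read
    actions = (
        {   # step 2: one JSON document per row
            "_index": "products",
            "_id": row["id"],
            "_source": {"name": row["name"], "price": float(row["price"])},
        }
        for row in cursor
    )
    helpers.bulk(es, actions)                                 # step 3: send via _bulk
```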
Logstash can help with that. But I'd recommend modifying the application layer if possible and sending data to Elasticsearch in the same "transaction" as you are sending your data to the database.
I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
Also have a look at this "live coding" recording.
Side note: if you want to run Elasticsearch, have a look at Cloud by Elastic, also available if needed from the AWS Marketplace, Azure Marketplace, and Google Cloud Marketplace.
Cloud by Elastic is one way to get access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, Maps UI, and Alerting, plus the built-in solutions named Observability, Security, and Enterprise Search, and what is coming next :) ...
Disclaimer: I'm currently working at Elastic.
Keep a column that indicates when the row was last modified; then you will be able to push updates to OpenSearch. Similarly for deleting, just have a column indicating whether the row is deleted or not (soft delete), and the date it was deleted.
With this DB design, you can send "delete" or "update" actions to OpenSearch/Elasticsearch to update or delete documents in the indexes based on the last-modified / deleted date. You can later have a scheduled maintenance job to delete these rows permanently from the database table.
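A rough sketch of that sync pass; the `last_modified`/`is_deleted` columns, the index name, and the choice of client library are assumptions:

```python
# Sketch: push only rows changed since the last sync, using last_modified /
# is_deleted columns. Table, column, and index names are placeholders.
import pymysql
from elasticsearch import Elasticsearch, helpers

def sync_since(last_sync):
    """Index updated rows and delete soft-deleted ones in OpenSearch/Elasticsearch."""
    conn = pymysql.connect(host="localhost", user="app", password="secret",
                           database="shop", cursorclass=pymysql.cursors.DictCursor)
    es = Elasticsearch("http://localhost:9200")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, name, price, is_deleted FROM products WHERE last_modified > %s",
            (last_sync,),
        )
        actions = []
        for row in cur:
            if row["is_deleted"]:
                actions.append({"_op_type": "delete", "_index": "products", "_id": row["id"]})
            else:
                actions.append({"_op_type": "index", "_index": "products", "_id": row["id"],
                                "_source": {"name": row["name"], "price": float(row["price"])}})
        helpers.bulk(es, actions, raise_on_error=False)  # tolerate deletes of already-missing docs
```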
Lastly, this article might be of help to you: How to keep Elasticsearch synchronized with a relational database using Logstash and JDBC.

Unable to create data through Apache Isis

I'm unable to create entries through the Apache Isis Wicket viewer. Once I fill in the details for the required object and click OK, I receive the following error:
Unable to save changes. Does similar data already exist, or has referenced data been deleted?: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
The database is new and contains no values in it, so the part about similar data existing is likely not the cause.
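For reference, the two server settings named in the error can be checked with a small diagnostic sketch like the following (connection details are placeholders; this only inspects the settings, it does not change them):

```python
# Diagnostic sketch: inspect the two MySQL settings named in the error message.
# Connection details are placeholders.
import pymysql  # pip install pymysql

conn = pymysql.connect(host="localhost", user="isis", password="secret", database="isisdb")
with conn.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'binlog_format'")
    print(cur.fetchone())                  # e.g. ('binlog_format', 'STATEMENT')
    cur.execute("SELECT @@tx_isolation")   # @@transaction_isolation on MySQL 8+
    print(cur.fetchone())                  # e.g. ('READ-COMMITTED',)
```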
My guess is that this is some peculiarity of MySQL/InnoDB in conjunction with our use of JDO. If you can create a small example repo on GitHub, I'll take a look.
Also, can I suggest you subscribe to the Isis users mailing list and consider raising future issues there?
Thx
Dan

How can I get CRM 2011 duplicate detection on new records to look at ALL records?

Right now it only detects duplicates if the rep owns the duplicate record. I don't want to give reps permission to see or change other people's records, but I want those possible duplicates to appear in duplicate detection so the reps don't add another record for the same lead.
Your requirements conflict. If you don't give your reps Read access to those other records, technically they shouldn't even know about duplicates. From a security perspective, that would be allowing the user to Read something they shouldn't Read, e.g. knowing that another record already exists with this name.
You have two out-of-the-box options: Give the reps Read access to each other's Leads or run duplicate detection jobs. Duplicate detection is broken into two methods: Proactive and Reactive. You are speaking of Proactive - you don't want the system to even allow a duplicate to be created. But if you can't relax your security requirements, you'll need to move to Reactive - this entails creating a duplicate detection job that will run on a schedule and you'll need to assign a person to review those jobs and merge duplicate records.
If neither of these are acceptable, you'll need to go with Daryl's option of creating a plugin. But that is a lot of work compared to your other two options.
I'm not aware of the issue you're describing with duplicate detection only being able to check records the user has access to. But if that is indeed a limitation, you'd have to create your own custom plugin in the validation stage that runs as an elevated user account, checks for duplicates, and throws an exception if a duplicate is found.
I think the answers above sum it up nicely. If the workers cannot see the other records then de-dup will not find them.
I would give them access, but with read-only rights to other users' / teams' records.
In places where I've had this issue, I also use the audit trail to keep track of anyone loading records they should not, even if the records are read-only. This does, however, need a simple bit of on-load script to set a field to trigger audits, as the built-in audit tool does not monitor access.

Replicating database changes

I want to "replicate" a database to an external service. For doing so I could just copy the entire database (SELECT * FROM TABLE).
If some changes are made (INSERT, UPDATE, DELETE), do I need to upload the entire database again or there is a log file describing these operations?
Thanks!
It sounds like your "external service" is not just another database, so traditional replication might not work for you. More details on that service would be great so we can customize answers. Depending on how long you have to get data to your external service and performance demands of your application, some main options would be:
Triggers: add INSERT/UPDATE/DELETE triggers that update your external service's data when your data changes (this could be rough on your app's performance but provides near real-time data for your external service).
Log Processing: you can parse changes from the logs and use some level of ETL to make sure they'll run properly on your external service's data storage. I wouldn't recommend getting into this if you're not familiar with their structure for your particular DBMS.
Incremental Diffs: you could run diffs on some interval (maybe 3x a day, for example) and have a cron job or scheduled task run a script that moves all the data in a big chunk. This prioritizes your app's performance over the external service.
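A rough sketch of that incremental-diff option for a PostgreSQL source, intended to run from cron a few times a day; the table and `updated_at` column, the state file, and the external endpoint are assumptions:

```python
# Sketch of the "incremental diff" option: run from cron, pick up rows changed
# since the last run, and push them to the external service in one chunk.
# Table/column names, the state file, and the external endpoint are placeholders.
import json
from datetime import datetime, timezone
from pathlib import Path

import psycopg2                      # pip install psycopg2-binary
import psycopg2.extras
import requests

STATE_FILE = Path("/var/lib/replicator/last_run.txt")

def run_diff():
    last_run = STATE_FILE.read_text().strip() if STATE_FILE.exists() else "1970-01-01T00:00:00+00:00"
    now = datetime.now(timezone.utc).isoformat()

    conn = psycopg2.connect("dbname=app user=replicator")
    with conn, conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        # Assumes each table carries an updated_at column maintained by the app.
        cur.execute("SELECT * FROM orders WHERE updated_at > %s", (last_run,))
        changed = cur.fetchall()

    if changed:
        # Push the whole chunk to the external service in one request.
        requests.post(
            "https://external.example.com/ingest",
            data=json.dumps(changed, default=str),
            headers={"Content-Type": "application/json"},
        ).raise_for_status()

    STATE_FILE.write_text(now)

if __name__ == "__main__":
    run_diff()
```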
If you choose triggers, you may be able to tweak an existing trigger-based replication solution to update your external service. I haven't used these so I have no idea how crazy that would be, just an idea. Some examples are Bucardo and Slony.
There are many ways to replicate a PostgreSQL database. In the current version, 9.0, the PostgreSQL Global Development Group introduced two new features called Hot Standby and Streaming Replication, putting PostgreSQL at a new level and introducing a built-in solution.
On the wiki, there is a complete review of the new PostgreSQL 9.0 features:
http://wiki.postgresql.org/wiki/PostgreSQL_9.0
There are other applications like Bucardo, Slony-I, Londiste (Skytools), etc., which you can use too.
Now, what do you want to do for log processing? What exactly do you want? Regards