Querying a live deployment (modeling objectives) from Workshop or Functions, but not sample-wise (like with scenarios) - palantir-foundry

I have an optimization algorithm deployed as a live deployment. It takes a set of objects and returns a set of objects of a potentially different size. This works just fine when I'm using the REST API.
The problem is, I want to let the user of the Workshop app query the model with a set of objects, and the returned objects need to be written back to the ontology.
I looked into an action-backed function, but it seems like I can't query the model from a function?!
I looked into webhooks, but they don't seem to fit the purpose; I would also need to handle the API key and can't write back to the ontology?!
I know how to query the model with scenarios, but that is per sample, which doesn't fit the purpose, plus I can't write back to the ontology.
My questions:
Is there any way to call the model from a function and write the return back to the ontology?
Is there any way to call a model from workshop with a set of objects and write back to the ontology?
Are modeling objectives just the wrong place for this use case?
Do I need to implement the optimization in Functions itself?

I've answered the questions below, and also tried to address some of the earlier points.
Q: "I looked into an action backed function but it seems like I can't query the model from a function?!"
A: That is correct; at this time you can't query a model from a function. However, there are JavaScript-based linear optimization libraries that can be used in a function.
Q: "I looked into webhooks but it seems to not fit the purpose and I also would need to handle the API key and can't write back to the ontology?!"
A: Webhooks are for hitting resources on networks where a Magritte agent is installed. So if you have, say, a Flask app on your corporate network, you could hit that app to run the optimization. Then set the webhook as "writeback" on an action and use the webhook outputs as inputs for an ontology edit function.
Q: "I know how to query the model with scenarios but it is per sample and that does not fit the purpose, plus I cant write back to the ontology."
A: When querying a model via Workshop, you can pass in a single object as well as any objects linked in a 1:1 relationship with that object. This linking is defined in the modeling objective's modeling API. You are correct that you can't pass in an arbitrary collection of objects. You can write back to the ontology, however: you have to set up an action to apply the scenario back to the ontology (https://www.palantir.com/docs/foundry/workshop/scenarios-apply/).
Q: "Is there any way to call the model from a function and write the return back to the ontology?"
A: Not from an ontology edit function.
Q: "Is there any way to call a model from workshop with a set of objects and write back to the ontology?"
A: Only object sets where the objects have 1:1 links within the ontology. You can write back by applying the scenario (https://www.palantir.com/docs/foundry/workshop/scenarios-apply/).
Q: "Is modeling objectives just the wrong place for this usecase? Do I need to implement the optimization in Functions itself?"
A: If you can write the optimization in an ontology edit function, it will be quite a bit more straightforward. The main limitation is that you have to use TypeScript, which is not as commonly used for this kind of work as Python, but there are some basic linear optimization libraries available for JS/TS.
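As a rough illustration of that last point, here is a minimal sketch using javascript-lp-solver, one such JS/TS library. Treat it as an assumption that the package is available in your Functions environment; the OrderCandidate shape and function name are invented for the example, and the wiring into an actual ontology edit function (and applying the solver output as ontology edits) is left out.

// Sketch only: "javascript-lp-solver" ships without type declarations,
// so a small `declare module` shim (or a require()) may be needed.
import solver from "javascript-lp-solver";

interface OrderCandidate {
  id: string;
  hoursNeeded: number;  // illustrative fields, not from the original question
  profit: number;
}

// Choose quantities per candidate that maximize profit within an hours budget.
export function optimizeAllocation(candidates: OrderCandidate[], hoursAvailable: number) {
  const model = {
    optimize: "profit",
    opType: "max",
    constraints: { hours: { max: hoursAvailable } },
    variables: Object.fromEntries(
      candidates.map((c) => [c.id, { hours: c.hoursNeeded, profit: c.profit }] as const)
    ),
  };
  // Solve() returns feasibility info plus a value per variable name.
  return solver.Solve(model);
}

Something of this shape could then sit behind the function an action calls, with the solver output mapped onto edits for the returned objects.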

Related

Boomi integration - Dynamically inject mapping information

We are now in the process of evaluating integration solutions and comparing Mule and Boomi.
The use case is to read an Excel file, map the columns to a pre-defined set of JSON attributes, and then use the JSON to insert records into a database. The mapping may vary from one Excel template to another, since the column names in one Excel file may differ from another's.
How do I inject the mapping information (source vs. target) from outside the integration flow?
Note: In Mule, I'm able to do that using a mapping variable (value is JSON) that I inject using Mule DataWeave language.
Boomi's mapping component is static in terms of structure but more versatile solutions are certainly possible.
The data processor component opens up Groovy, JavaScript, and XSLT 3.0 as options. These are Turing-complete languages that can be used to bend Boomi to almost any outcome.
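To make the "mapping as data" idea concrete, here is a small platform-agnostic sketch, written in TypeScript rather than in Boomi's scripting dialects; all names below are made up. The only point is that the source-to-target mapping is plain JSON, so it can be injected per template at runtime instead of being baked into a map component.

type ColumnMapping = Record<string, string>; // source column -> target JSON attribute

function applyMapping(row: Record<string, unknown>, mapping: ColumnMapping) {
  const out: Record<string, unknown> = {};
  for (const [sourceCol, targetAttr] of Object.entries(mapping)) {
    if (sourceCol in row) {
      out[targetAttr] = row[sourceCol];
    }
  }
  return out;
}

// The mapping is just data, so it can be supplied from outside the flow:
const invoiceTemplateMapping: ColumnMapping = {
  "Invoice No": "invoiceNumber",
  "Customer": "customerName",
  "Amount (USD)": "amountUsd",
};

applyMapping(
  { "Invoice No": "INV-42", "Customer": "Acme", "Amount (USD)": 99.5 },
  invoiceTemplateMapping
);
// => { invoiceNumber: "INV-42", customerName: "Acme", amountUsd: 99.5 }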
You could make the Boomi UI available to those who would otherwise be writing the maps in JSON; it's a pretty simple interface to learn. Using a route component, there could be one "parent" process that routes to a process for each template, with a map for each template. Such a solution would be pretty easy to build and run, allowing the template-specific processes to be deployed independently of the "parent".
You could map to a generic columnar structure and then write a SQL procedure that dynamically alters the target columns.
I've come across attempts to do what you're describing (not using either Boomi or Mulesoft) which were tragic failures: https://www.zdnet.com/article/uk-rural-payments-agency-rpa-it-failure-and-gross-incompetence-screws-farmers/ I draw your attention to the NAO's points:
ensure the system specifications retain a realistic level of flexibility
and
bespoke software is costly to develop, needs to be thoroughly tested, and takes more time to implement
The general goal behind a requirement like yours is usually to make transformation/ETL available to "non-programmers", which denies the reality that delivering an outcome takes many more skills than "programming".

Question about Domain Objects, A Service Layer, and Using Linq2SQL and ASP.net MVC with the Repository Pattern

First off, apologies for the long description of my brainspace below. I'm still wrapping my head around lots of these new ideas, so I'm sure I'm describing something incorrectly. Please feel free to correct me where I'm wrong.
We are in the R&D phase of a new ASP.net MVC2 site and want to ensure that we can 1) decouple our data store from our application, 2) allow our application to be tested via unit tests, and 3) allow us to change out our data store or use something other than Linq2SQL down the line.
This seemingly simple goal has opened up a whole new world to me that includes the Repository pattern, IoC, DI, and all sorts of other things that are making my head swim. Here's what is so far coming into focus, or at least what I believe is a somewhat correct plan to reach our goals:
We will have a number of ISpecificRepository interfaces that define the contract between users of the interface and the underlying data store.
The SpecificRepository implementations will query specific datastores and return POCO representing our domain objects (or collections of them).
Our Service Layer will perform the application specific business logic using an instance of ISpecificRepository passed to the various service methods and pass these POCO domain objects back to our presentation layer.
As mentioned, we are planning on using Linq2SQL to implement our specific repositories for the application, and have decided to decouple our service layer from this implementation by creating POCOs for our domain objects and mapping to and from these objects and the LINQ-generated entities. In the service layer, we can then write business logic to query the repository, add data, and do whatever else we need for each use case. This seems fine, but my concern is that since we're using Linq2SQL, our specific Linq repository implementation will now have to house all of the many Get queries that the service layer requires to implement the business logic efficiently.
I'm curious as to whether this somehow breaks the Repository pattern since we're now housing application specific logic not in the service layer but in the repository instead.
The reason I feel that we need to do it this way is so that I can write more efficient Linq queries on my specific Linq repository using various DataLoadOptions, etc. without returning IQueryable from my repository up to my service layer, where it would seem that sort of logic actually belongs. Also, all of the example IRepository interfaces I've seen seem very lightweight and only provide a few methods to GetByID, GetAll, Find, Insert, Delete, and SubmitChanges to the underlying data store. In my case, it sounds like my specific repositories will be doing a great deal more than that.
Thanks for reading this far. Any and all help that can clarify my misconceptions would be greatly appreciated.
-Mustafa
our specific Linq repository implementation will now have to house all of the many Get queries that the service layer requires to implement the business logic efficiently. I'm curious as to whether this somehow breaks the Repository pattern
Not at all. A Repository is a collection of domain entities. If I have a Repository of Accounts, it is perfectly reasonable to want Accounts.ThatAreOverdue().
I personally prefer fluent naming. Accounts.ThatAreOverdue() feels better than AccountRepository.GetOverdue()... but I suppose that is a point of preference.
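For illustration, a tiny sketch of what that fluent shape can look like (TypeScript here; the Account fields and the overdue rule are invented for the example):

interface Account {
  id: number;
  balance: number;
  dueDate: Date;
}

// A repository reads as a collection of the domain's entities.
class Accounts {
  constructor(private readonly source: Account[]) {}

  ThatAreOverdue(asOf: Date = new Date()): Account[] {
    return this.source.filter((a) => a.balance > 0 && a.dueDate < asOf);
  }
}

// Usage reads like the domain language:
// const overdue = new Accounts(allAccounts).ThatAreOverdue();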
Also, all of the example IRepository interfaces I've seen seem very lightweight and only provide a few methods to GetByID, GetAll, Find, Insert, Delete, and SubmitChanges to the underlying data store.
A Repository interface can be thin. Find is meant to be used with the Specification pattern. Encapsulate the criteria in another object. The implementation of the criteria can be passed Linq2Sql objects from which to query - but it will be more difficult to re-use the criteria classes against in-memory domain objects (versus in database, where Linq2Sql is involved).
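A rough sketch of that Find-plus-Specification shape, again in TypeScript with the same invented Account type, showing only the in-memory side; a Linq2Sql-backed repository would translate the same criteria object into a query:

interface Account {
  id: number;
  balance: number;
  dueDate: Date;
}

// The criteria are encapsulated in their own object...
interface Specification<T> {
  isSatisfiedBy(candidate: T): boolean;
}

class OverdueAccounts implements Specification<Account> {
  constructor(private readonly asOf: Date) {}
  isSatisfiedBy(a: Account): boolean {
    return a.balance > 0 && a.dueDate < this.asOf;
  }
}

// ...so the repository interface itself stays thin.
interface Repository<T> {
  find(spec: Specification<T>): T[];
}

class InMemoryAccountRepository implements Repository<Account> {
  constructor(private readonly items: Account[]) {}
  find(spec: Specification<Account>): Account[] {
    return this.items.filter((a) => spec.isSatisfiedBy(a));
  }
}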
Our Service Layer will perform the
application specific business logic
using an instance of
ISpecificRepository passed to the
various service methods and pass these
POCO domain objects back to our
presentation layer.
Are you saying that your logic will all be in Services and the "domain objects" will be bags of properties and bound to in the view?
I don't think I'd recommend that.
If the same object that is used in the application logic is also used in the view, then you have tightly coupled the two application layers, and experience says that causes problems. It will be very difficult to maintain coherence in the Services and Domain through changes if the View uses the same objects. The View will need pieces of data, and they will inevitably get stuck in places they don't really belong in the domain.
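A compact sketch of the separation being argued for (invented names, TypeScript): the domain object keeps its behavior, and the view binds to a flat view model built from it.

class Account {
  constructor(
    public readonly id: number,
    private balance: number,
    private dueDate: Date
  ) {}

  get currentBalance(): number {
    return this.balance;
  }

  isOverdue(asOf: Date): boolean {
    return this.balance > 0 && this.dueDate < asOf;
  }
}

// The view model is a bag of exactly what the view needs, nothing more.
interface AccountViewModel {
  id: number;
  displayBalance: string;  // formatted for display
  isOverdue: boolean;
}

function toViewModel(account: Account, asOf: Date): AccountViewModel {
  return {
    id: account.id,
    displayBalance: account.currentBalance.toFixed(2),
    isOverdue: account.isOverdue(asOf),
  };
}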

best practices for writing to a file from multiple methods

I have a class that contains a bunch of methods for checking data I scrape every week (for things like well-formedness and other errors in gathering the data). Each of these methods performs a test, and then prints out a summary of the test.
I want to print out the output from these tests to a file, but I'm not sure what the best way to do it is. For example...
Should the class hold an instance variable for the file, with each method opening/appending to/closing it? (A problem is that methods sometimes call other methods, so this seems kinda messy?)
Should each method get passed the file as a parameter? (Seems messy as well.)
Should each method return a string, and a "central" method that calls all the other tests write all these strings to a file?
I'm not really familiar with using logger libraries -- would that be a solution?
My particular context
I have a scraper that pulls data from various websites and stores them in a database. Websites change all the time, so I'm writing a "scrape checker" program that checks my scrapes for various things, like:
number of empty results
length of results
weird characters in results
and so on
So I have methods like:
check_num_empty_results
check_weird_characters
check_scrape (calls a bunch of other checks)
check_scrape_pair (sometimes I want to check pairs of scrapes together, e.g., to match results against each other, so this is different from checking each one in isolation)
etc.
I want my "scrape checker" program to print out a file that summarizes all the checks.
Separation of concerns. Write code that focuses on the scraping activity and returns the value(s) scraped. Then use aspect-oriented programming for logging, which can simplify the problem greatly, as the aspect holds the reference to the file or logging API.
Ultimately, it depends on what language you're using.
The first solution makes the most sense if your language permits it. For each instance of the logging class, have a field for the file object that you're reading from/writing to. This is basically equivalent to passing the file object as a parameter to every method.
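As an illustration of that first option, here is a minimal sketch (TypeScript on Node; the check logic and names are invented): the checker owns a single sink, and every check, including the composite ones, writes through it. Swapping the stream for a logger object from a logging library changes almost nothing.

import { createWriteStream, WriteStream } from "fs";

class ScrapeChecker {
  private readonly out: WriteStream;

  constructor(reportPath: string) {
    // Append so repeated runs accumulate in the same report file.
    this.out = createWriteStream(reportPath, { flags: "a" });
  }

  private report(line: string): void {
    this.out.write(line + "\n");
  }

  checkNumEmptyResults(results: string[]): void {
    const empty = results.filter((r) => r.trim() === "").length;
    this.report(`empty results: ${empty} of ${results.length}`);
  }

  checkWeirdCharacters(results: string[]): void {
    const weird = results.filter((r) => /[^\x20-\x7E]/.test(r)).length;
    this.report(`results containing non-printable/unusual characters: ${weird}`);
  }

  // Composite checks just call the others; they all share the same sink,
  // so nothing needs to be passed around or reopened.
  checkScrape(results: string[]): void {
    this.checkNumEmptyResults(results);
    this.checkWeirdCharacters(results);
  }

  close(): void {
    this.out.end();
  }
}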
That said, most mature languages have modules that will do a lot of this work for you; off the top of my head, sh/awk, Perl, and Python all come to mind as being suited to this task (though if you want to, you could use Java or something else).
Seems like a logging framework would be a perfect solution for this. If you are using Java or .NET, log4j and log4net are pretty much the de-facto standards for that.

Should persistent objects validate data upon set?

If one has an object which can persist itself across executions (whether to a DB using an ORM, using something like Python's shelve module, etc.), should validation of that object's attributes be placed within the class representing it, or outside?
Or, rather: should the persistent object be dumb and expect whatever is setting its values to be benevolent, or should it be smart and validate the data being assigned to it?
I'm not talking about type validation or user input validation, but rather about things that affect the persistent object, such as ensuring that links/references to other objects exist, that numbers are unsigned, that dates aren't out of scope, etc.
Validation is part of encapsulation: an object is responsible for its internal state, and validation is part of that responsibility.
It's like asking "should I let an object do a function and set its own variables, or should I use getters to get them all, do the work in an external function, and then use setters to set them back?"
Of course you should use a library to do most of the validation: you don't want to implement the "check unsigned values" function in every model, so you implement it in one place and let each model use it in its own code as it sees fit.
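As a sketch of that arrangement (TypeScript; the checks and the Invoice model are invented): the shared helpers live in one place, and each model's setters call them so the object never holds invalid state.

// Shared validation helpers, written once and reused by every model.
function requireUnsigned(value: number, field: string): number {
  if (!Number.isInteger(value) || value < 0) {
    throw new RangeError(`${field} must be a non-negative integer, got ${value}`);
  }
  return value;
}

function requireInPast(value: Date, field: string): Date {
  if (value.getTime() > Date.now()) {
    throw new RangeError(`${field} must not be in the future`);
  }
  return value;
}

class Invoice {
  private _amountCents = 0;
  private _issuedOn = new Date(0);

  set amountCents(value: number) {
    this._amountCents = requireUnsigned(value, "amountCents");
  }

  set issuedOn(value: Date) {
    this._issuedOn = requireInPast(value, "issuedOn");
  }
}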
The object should validate the data input. Otherwise every part of the application which assigns data has to apply the same set of tests, and every part of the application which retrieves the persisted data will need to handle the possibility that some other module hasn't done their checks properly.
Incidentally I don't think this is an object-oriented thang. It applies to any data persistence construct which takes input. Basically, you're talking Design By Contract preconditions.
My policy is that, for the code as a whole to be robust, each object A should check as much as possible, as early as possible. But "as much as possible" needs explanation:
The internal coherence of each field B in A (type, range, etc.) should be checked by the type of B itself. If B is a primitive field, or a reused class you cannot change, that is not possible, so object A should check it.
The coherence of related fields (if that B field is null, then C must also be) is the typical responsibility of object A.
The coherence of a field B with code that is external to A is another matter. This is where the "POJO" approach (from Java, but applicable to any language) comes into play.
The POJO approach says that with all the responsibilities/concerns we have in modern software (persistence and validation are only two of them), domain models end up messy and hard to understand. The problem is that these domain objects are central to understanding the whole application, to communicating with domain experts, and so on. Each time you have to read a domain object's code, you have to wade through the complexity of all these concerns, while you might care about none, or only one, of them.
So, in the POJO approach, your domain objects must not carry code related to these concerns (which usually means an interface to implement, or a superclass to extend).
All concerns except the domain one live outside the object (though some simple information can still be provided, in Java usually via annotations, to parameterize the generic external code that handles each concern).
Also, the domain objects relate only to other domain objects, not to framework classes related to one concern (such as validation or persistence). So the domain model, with all its classes, can be put in a separate "package" (project or whatever) without dependencies on technical or concern-related code. This makes it much easier to understand the heart of a complex application, without all the complexity of these secondary aspects.
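The same idea translated outside Java, as a hedged sketch with invented names: the domain class depends only on other domain types, while the validation concern lives in what would, in a real codebase, be a separate module that depends on the domain and never the other way around.

// Domain: no imports from validation or persistence frameworks.
export interface OrderLine {
  sku: string;
  quantity: number;
  unitPrice: number;
}

export class Order {
  constructor(public readonly id: string, public readonly lines: OrderLine[]) {}

  total(): number {
    return this.lines.reduce((sum, l) => sum + l.quantity * l.unitPrice, 0);
  }
}

// Validation concern: kept outside the domain class, in its own module.
export function validateOrder(order: Order): string[] {
  const problems: string[] = [];
  if (order.lines.length === 0) problems.push("order has no lines");
  for (const line of order.lines) {
    if (line.quantity <= 0) problems.push(`non-positive quantity for ${line.sku}`);
    if (line.unitPrice < 0) problems.push(`negative unit price for ${line.sku}`);
  }
  return problems;
}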

Which layers should speak Domain Models?

Let's say I have a method like this in my business layer:
// This is in the business layer
public Result DeleteSomeDomainObject( ???? )
{
    // Enforce business logic here.
    // Delete records in the database.
    DAL.DeleteSomeDomainObject( ??? );
}

// This is in the data access layer
public Result DeleteSomeDomainObject( ???? )
{
    // Delete records from the database.
}
Should these methods take instances of the domain model or just the primary keys?
I struggle with this often. I usually say that your business/service layer should take domain objects as parameters.
If we are talking web, your web tier will have the ID. It will likely instantiate or retrieve an instance of the object from the service layer, so it makes sense to pass that object to your service layer.
However, there are often times where you would end up duplicating the retrieval of the object. Sometimes your services will be loading an object anyway because of some additional data not captured in the web layer. I've even had times where the data access layer has to load objects for dependencies. Caching can solve some of these issues, and re-architecting your data/model can fix others. But sometimes, in light of performance or other issues, passing an ID just makes more sense.
To summarize, prefer passing domain objects to the business tier. But realize that for other reasons, you might be better off passing an ID and, unfortunately, there needs to be exceptions to your rule.
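To make that trade-off concrete, a small sketch (TypeScript, invented types): the service prefers the domain object, with an ID-based variant as the pragmatic exception that loads the object itself before applying the same rules.

interface DomainObject {
  id: number;
  // ...other domain state
}

interface Repository {
  getById(id: number): DomainObject | undefined;
  delete(obj: DomainObject): void;
}

class SomeService {
  constructor(private readonly repo: Repository) {}

  // Preferred: the caller already holds the domain object.
  deleteSomeDomainObject(obj: DomainObject): void {
    // Enforce business rules against obj here.
    this.repo.delete(obj);
  }

  // Pragmatic exception: the web tier only has the ID, so the service
  // loads the object itself and then reuses the rule-enforcing path.
  deleteSomeDomainObjectById(id: number): void {
    const obj = this.repo.getById(id);
    if (!obj) throw new Error(`no object with id ${id}`);
    this.deleteSomeDomainObject(obj);
  }
}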
Anywhere that's reasonable, it makes sense to decouple the policy from the implementation. I would say that if you plan to use some sort of ORM, pass instances of your business objects.