TopQuadrant SHACL rule engine iterative inference

Does the SHACL API rule engine support sh:order for rule execution, as TopBraid Composer does?
I tested rule ordering in TBC, and it iterates until it reaches a fixed point, i.e. until there are no more rules to execute. I suspect that it is considered one pass, but the rules are prioritized and their results made available to the next rule to be executed in that same pass.
Anyhow, independently of how this is implemented, I wonder whether this is a feature of the SHACL rule engine or an implementation detail specific to TopBraid Composer.
The following thread hints at the answer I am looking for, but falls short:
How to input inferred triples to (other) SHACL rules?

The current SHACL API does not do the iterations out of the box. RuleEngine does a single iteration of all rules, and those rules may access each other's results following the outline at
https://w3c.github.io/shacl/shacl-af/#rules-execution
To do iterative looping, simply call RuleEngine.executeAll until one round has not created any new inferences. Care needs to be taken to avoid infinite loops, as some rules may in theory produce blank nodes, random values etc. TopBraid Composer does this looping automatically.
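To illustrate, here is a minimal sketch of such a loop, assuming the API's RuleUtil.executeRules(data, shapes, inferences, monitor) convenience entry point (a wrapper around the RuleEngine mentioned above) and a Jena dynamic union model so that each round sees the inferences of the previous ones; the round cap and the names FixedPointRules/inferToFixedPoint are invented for illustration:

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.topbraid.shacl.rules.RuleUtil;

    public class FixedPointRules {

        // Runs the rules repeatedly until a round adds no new triples,
        // or until maxRounds is reached (a guard against non-terminating
        // rules that keep producing blank nodes, random values, etc.).
        public static Model inferToFixedPoint(Model data, Model shapes, int maxRounds) {
            Model inferences = ModelFactory.createDefaultModel();
            // Dynamic union: triples added to 'inferences' become visible
            // to the rules in the next round.
            Model union = ModelFactory.createUnion(data, inferences);
            for (int round = 0; round < maxRounds; round++) {
                long before = inferences.size();
                inferences.add(RuleUtil.executeRules(union, shapes, null, null));
                if (inferences.size() == before) {
                    break; // fixed point: the last round inferred nothing new
                }
            }
            return inferences;
        }
    }

The size comparison is a simple convergence test; it works because Jena models store a set of triples, so re-inferring an existing triple does not grow the model.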


Understanding pragmas

I have a few related questions about pragmas. What got me started on this line of questions was trying to determine whether it's possible to disable some warnings without going all the way to no worries (I'd still like to worry, at least a little bit!). And I'm still interested in the answer to that specific question.
But thinking about that issue made me realize that I don't really understand how pragmas work. It's clear that at least some pragmas take arguments (e.g., use isms<Perl5>). But they don't seem to be functions. Where do they fit into the overall MOP? Are they sort of like Traits? Or packages? Is there any way to introspect over them? See what pragmas are currently in effect?
Are pragmas built into the language, or are they something that users can add? When writing a library, I'd love to have some errors/warnings that users can optionally disable with a pragma – is that possible, or are they restricted to use in the compiler? If I can create my own pragmas, is there a practical difference between setting something with a pragma versus with a dynamic variable, aside from the cleaner look of a pragma? For that matter, how do we decide what language features should be set with a pragma versus a variable (e.g., why is $*TOLERANCE not a pragma)?
Basically, I'd be interested in any info about pragmas that you could offer or point me towards – though my specific question is still whether I can selectively turn off certain warnings.
Currently, pragmas are hard-coded in the handling of the use statement. They usually either set some flag in a hash that is associated with the lexical scope of the moment, or change the setting of a dynamic variable in the grammar.
Since use is a compile time construct, you can only use compile time constructs to get at them (currently) (so you'd need BEGIN if it is not part of a use).
I have been in favour of decoupling use from pragmas in the past, as I see them as mostly a holdover from the Perl roots of Raku.
All of this will be changed in the RakuAST branch. I'm not sure what Jonathan Worthington has in mind regarding pragmas in the RakuAST context. For one thing, I think we should be able to "export" a pragma to the scope of a use statement.

Reusing existing functions

When adding a new feature to an existing system, if you come across an existing function that almost does what you need, is it best practice to:
Copy the existing function and make your changes on the new copy (knowing that copying code makes your fellow devs cry).
-or-
Edit the existing function to handle both the existing case and your new case, risking that you may introduce new bugs into existing parts of the system (which makes the QA team cry)?
If you edit the existing function, where do you draw the line before you should just create a new independent function (based on a copy)... 10% of the function? 50% of the function?
You can't always do this, but one solution here would be to split your existing function into smaller parts, allowing you to use the parts you need without editing all the code, and making it easier to edit small pieces of code.
That being said, if you think you could introduce new bugs into existing parts of the system without noticing it, you should probably think about using unit tests.
The rule of thumb I tend to follow is that if I can cover the new behaviour by adding an extra parameter (or a new valid value) to the existing function, while leaving the code more-or-less "obviously the same" in the existing case, then there's not much danger in changing the function.
For example, old code:
def utf8len(s):
    return len(s.encode('utf8'))  # or maybe something more memory-efficient
New use case - I'm writing some code in a style that uses the null object pattern, so I want utf8len(None) to return None instead of throwing an exception. I could define a new function utf8len_nullobjectpattern, but that's going to get quite annoying quite quickly, so:
def utf8len(s):
    if s is not None:
        return len(s.encode('utf8'))  # old code path is untouched
    else:
        return None  # new code path introduced
Then even if the unit tests for utf8len were incomplete, I can bet that I haven't changed the behavior for any input other than None. I also need to check that nobody was ever relying on utf8len to throw an exception for a None input, which is a question of (1) quality of documentation and/or tests; and (2) whether people actually pay any attention to defined interfaces, or just Use The Source. If the latter, I need to look at calling sites, but if things are done well then I pretty much don't.
Whether the old allowed inputs are still treated "obviously the same" isn't really a question of what percentage of code is modified, it's how it's modified. I've picked a deliberately trivial example, since the whole of the old function body is visibly still there in the new function, but I think it's something that you know when you see it. Another example would be making something that was fixed configurable (perhaps by passing a value, or a dependency that's used to get a value) with a default parameter that just provides the old fixed value. Every instance of the old fixed thing is replaced with (a call to) the new parameter, so it's reasonably easy to see on a diff what the change means. You have (or write) at least some tests to give confidence that you haven't broken the old inputs via some stupid typo, so you can go ahead even without total confidence in your test coverage.
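As a hypothetical Java sketch of that last pattern (Java has no default parameters, so an overload supplies the old fixed value instead; ReportFormatter and its methods are invented names):

    class ReportFormatter {

        private static final int DEFAULT_WIDTH = 80; // the old fixed value

        // Old entry point: existing callers are visibly untouched,
        // they just keep getting the old fixed width.
        String format(String text) {
            return format(text, DEFAULT_WIDTH);
        }

        // New entry point: every use of the old fixed width is replaced
        // with (a reference to) the new parameter.
        String format(String text, int width) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < text.length(); i += width) {
                out.append(text, i, Math.min(i + width, text.length())).append('\n');
            }
            return out.toString();
        }
    }

The diff for such a change is small and mechanical, which is exactly what makes it easy to review.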
Of course you want comprehensive testing, but you don't necessarily have it. There are also two competing maintenance imperatives here: 1 - don't duplicate code, since if it has bugs in it, or behavior that might need to change in future, then you're duplicating the bugs / current behavior. 2 - the open/closed principle, which is a bit high-falutin' but basically says, "write stuff that works and then don't touch it". 1 says that you should refactor to share code between these two similar operations, 2 says no, you've shipped the old one, either it's usable for this new thing or it isn't, and if it isn't then leave it alone.
You should always strive to avoid code duplication. Therefore I would suggest that you try to write a new function that modifies the return value of the already existing function to implement your new feature.
I do realize that in some cases it might not be possible to do that. In those cases you definitely should consider rewriting the existing function without changing its interface. And introducing new bugs by doing that should be prevented by unit tests that can be run on the modified function before you add it to the project code.
And if you need only part of the existing function, consider extracting a new function from the existing one and use this new "helper" function in the existing and in your new function. Again confirming everything is working as intended via unit tests.
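A hypothetical sketch of that extraction (all names invented): the shared logic moves into a helper that both the old and the new function call, and the helper can be unit-tested on its own:

    class PriceCalculator {

        // Old public function: its interface is unchanged, it now
        // delegates the shared part to the extracted helper.
        double totalWithTax(double net, double taxRate) {
            return roundToCents(net * (1 + taxRate));
        }

        // New function: reuses the same helper instead of copying it.
        double totalWithDiscount(double net, double discountRate) {
            return roundToCents(net * (1 - discountRate));
        }

        // Extracted "helper" function, shared by old and new code paths.
        private double roundToCents(double amount) {
            return Math.round(amount * 100.0) / 100.0;
        }
    }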

Using one method with constants as parameters versus several methods

In Kent Beck's Implementation Patterns, one can read
"A common use of constants is to
communicate variations of a message in
an interface. For example, to center
text you could invoke
setJustification(Justification.CENTERED).
One advantage of this style of API is
that you can add new variants of
existing methods by adding new
constants without breaking
implementors. However, these messages
don't communicate as well as having a
separate method for each variation. In
this style, the message above would be
justifyCentered(). An interface where
all invocations of a method have
literal constants as arguments can be
improved by giving it separate methods
for each constant value."
Why is this? Generally when I'm coding and I notice that I have a couple of similar parameterless methods that could be reduced to just one, with an argument, like in the following example,
void justifyRight()
void justifyLeft()
void justifyCentered()
I'd generally do just the opposite of what Kent advises, which would be to group it into
setJustification(Justification justification)
How do you usually handle this situation? Is this totally subjective, or is there really a very strong reason that I can't see in favour of Kent's view on this matter?
Thanks
File access methods usually have parameters regarding read/write mode, whether to create non-existing files, security attributes, locking modes and so on. Imagine the number of methods you'd have if you created a separate method for each valid combination of parameters!
I've highlighted the biggest argument in favor of separate methods; it's fail-safe because you have strict control over the API. The caller cannot pass in invalid arguments, or invalid combinations of parameters, if you don't expose such parameters. This also implies less complex parameter validation.
However, I'm not in favor of this practice. APIs should be well-designed and should change as little as possible. Kent Beck on breaking API changes:
One advantage of [parameterized methods] is that you can add new variants of existing methods by adding new constants without breaking implementors.
His argument in favor of separate methods is:
However, [parameterized methods] don't communicate as well as having a separate method for each variation.
I disagree. Method parameters can be just as readable, especially in combination with named parameters, a feature which is supported by several languages. Besides, separate methods would result in a cluttered API.
I suppose it's subjective. Some may argue that justifyLeft is clearer than justify(Justification.LEFT). Collapsing it all into one method may result in a nicer API - less clutter - and the mode can be stored in a variable and simply fed to the single setXY method (with different methods for each, you'd have to decide which one to call depending on the value manually). Therefore I usually prefer this way. Though it's usually just:
void justify(Justification justification) {
    switch (justification) {
        case RIGHT:
            this.justifyRight();
            break;
        case LEFT:
            this.justifyLeft();
            break;
        case CENTERED:
            this.justifyCenter();
            break;
    }
}
Of course this is only advisable when all these methods are very closely related.

Understanding complex post-conditions in DbC

I have been reading over design-by-contract posts and examples, and there is something that I cannot seem to wrap my head around. In all of the examples I have seen, DbC is used on a trivial class testing its own state in the post-conditions (e.g. lots of Bank Accounts).
It seems to me that most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies. I understand how to check for this in a Unit-Test with specific scenarios using dependency inversion and mock objects that focus on the external behavior of the method, but how does this work with DbC and post-conditions?
My second question has to do with understanding complex post-conditions. It seems to me that to write out a post-condition for many functions, you basically have to re-write the body of the function for your post-condition to know what the new state is going to be. What is the point of that?
I really do like the notion of DbC and I think that it has great promise, particularly if I can figure out how to reproduce some failure state once I find a violated contract. Over the past couple of hours I have been reading some neat stuff regarding automatic test generation in Eiffel. I am currently trying to improve my processes in C++ development, but I am open to learning something new if I can figure out how to not lose all of the ground I have made in my current projects. Thanks.
but how does this work with DbC and post-conditions?
Every function is basically one of these:
A sequence of statements
A conditional statement
A loop
The idea is that you should check any postconditions about the results of the function that go beyond the union of the postconditions of all the functions called.
that you basically have to re-write the body of the function for your post-condition to know what the new state is going to be
Think about it the other way round. What made you write the function in the first place? What were you pursuing? Can that be expressed in a postcondition which is simpler than the function body itself? A postcondition will typically use queries (what in C++ are const functions), while the body usually combines commands and queries (methods that modify the object and methods which only get information from it).
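As a hedged Java illustration (Java has no built-in DbC, so plain assert statements stand in for the contract, Eiffel's old is emulated by capturing the prior value, and BankAccount and its members are invented names):

    class BankAccount {

        private int balance;

        // Query (a "const" function): postconditions are phrased in terms of it.
        int balance() {
            return balance;
        }

        // Command: the postcondition states *why* the method exists,
        // without re-writing its body.
        void deposit(int amount) {
            assert amount > 0 : "precondition: amount must be positive";
            int oldBalance = balance(); // emulates Eiffel's 'old balance'
            balance += amount;
            assert balance() == oldBalance + amount : "postcondition: balance grew by amount";
        }
    }

Run with java -ea to enable the assertions. Note that the postcondition is written entirely in terms of the query balance(), not by repeating the body of deposit.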
In some cases, yes, you will find out that you can really add little value with postconditions. In these cases, writing a bunch of tests will typically be enough.
See also:
Bertrand Meyer, Contract Driven Development
Related questions 1, 2
Delegation at the contract level
most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies
As for this first question: the implementation of a function/method may call many other functions/methods, but if the designer of the code had a clear mind, this does not imply that the specification of the caller is the concatenation of the specifications of the callees. For a method that calls many others, the size of the specification can remain contained if the method accomplishes a precise and well-defined task. Which it should, if the whole system was well designed.
You are clearly asking your question from the point of view of run-time assertion checking. In this context, the above would perhaps be expressed as "you don't need to re-check in the post-condition of the caller that all the callees have respected their respective contracts. These checks will already be made on each call. In the post-condition of the caller, only check the functionally visible result of the caller."
Understanding complex post-conditions
You may find this "ACSL by example" document interesting (although probably different from what you're used to). It contains many examples of formal contracts for C functions. The language of the contracts is intended for static verification instead of run-time checking, with all the advantages and the drawbacks that it entails. They are a little more sophisticated than the "Bank Accounts" that you mention — these functions implement real algorithms, although simple ones. The document keeps the contracts short and readable by introducing well-thought-out auxiliary predicates (which would be called queries in Eiffel, as Daniel points out in his answer).

What's the difference between data and code?

To take an example, consider a set of discounts available to a supermarket shopper.
We could define these rules as data in some standard fashion (lists of qualifying items, applicable dates, coupon codes) and write generic code to handle these. Or, we could write each as a chunk of code, which checks for the appropriate things given the customer's shopping list and returns any applicable discounts.
You could reasonably store the rules as objects, serialised into Blobs or stored in code files, so that each rule could choose its own division between data and code, to allow for future rules that wouldn't fit the type of generic processor considered above.
It's often easy to criticise code that mixes data in, via if statements that check for 6 different things that should be in a file or a database, but is there a rule that helps in the edge cases?
Or is this the point of Object Oriented design, to stop us worrying about the line between data and code?
To clarify, the underlying question is this: How would you code the above example? Is there a rule of thumb that made you decide what is data and what is code?
(Note: I know, code can be compiled, but in a world of dynamic languages and JIT compilation, even that is a blurry concept.)
Fundamentally, there is of course no difference between data and code, but for real software infrastructures, there can be a big difference. Apart from obvious things like, as you mentioned, compilation, the biggest issue is this:
Most sufficiently large projects are designed to produce "releases" that are one big bundle, produced in 3-month (or longer) cycles, tested extensively and not changed afterwards except in tightly controlled ways. "Code" most definitely cannot be changed, so anything that does need to be changed has to be factored out and made "configuration data", so that changing it becomes palatable to those whose job it is to ensure that a release works.
Of course, in most cases bad configuration data can break a release just as thoroughly as bad code, so the whole thing is largely an illusion - in reality it doesn't matter whether it's code or "configuration data" that changes, what matters is that the interface between the main system and the parts that change is narrow and well-defined enough to give you a good chance that the person who does the change understands all consequences of what he's doing.
This is already harder than most people think when it's really just a few strings and numbers that are configured (I've personally witnessed a production mainframe system crash because it had one boolean value set differently than another system it was talking to). When your "configuration data" contains complex logic, it's almost impossible to achieve. But the situation isn't going to be any better just because you use a badly-designed ad hoc "rules configuration" language instead of "real" code.
This is a rather philosophical question (which I like) so I'll answer it in a philosophical way: with nothing much to back it up. ;)
Data is the part of a system that can change. Code defines behavior; the way in which data can change into new data.
To put it more accurately: Data can be described by two components: a description of what the datum is supposed to represent (for instance, a variable with a name and a type) and a value.
The value of the variable can change according to rules defined in code. The description does not change, of course, because if it does, we have a whole new piece of information.
The code itself does not change, unless requirements (what we expect of the system) change.
To a compiler (or a VM), code is actually the data on which it performs its operations. However, the to-be-compiled code does not specify behavior for the compiler, the compiler's own code does that.
It all depends on the requirement. If the data is lookup-like data that changes frequently, you don't really want to do it in code; but something like the Day of the Week should not change for the next 200 years or so, so code that.
You might consider changing your topic, as the first thing I thought of when I saw it was the age-old LISP discussion of code vs data. Luckily, in Scheme code and data look the same, but that's about it; you can never accidentally mix code with data, as is very possible in LISP with unhygienic macros.
Data is information that is processed by instructions called Code. I'm not sure I feel there's a blurring in OOD; there are still properties (Data) and methods (Code). OO theory encapsulates both into a gestalt entity called a Class, but they are still discrete within the Class.
How flexible you want to make your code is a matter of choice. Including constant values (what you are doing by using if statements as described above) is inflexible without re-processing your source, whereas using dynamically sourced data is more flexible. Is either approach wrong? I would say it really depends on the circumstances. As leppie said, there are certain 'data' points that are invariant, like the days of the week, that can be hard-coded, but even there it may be advantageous to do it dynamically in certain circumstances.
In Lisp, your code is data, and your data is code.
In Prolog, clauses are terms, and terms are clauses.
The important note is that you want to separate out the part of your code that will execute the same every time, (i.e. applying a discount) from the part of your code which could change (i.e. the products to be discounted, or the % of the discount, etc.)
This is simply for safety. If a discount changes, you won't have to re-write your discount code, you'll only need to go into your discounts repository (DB, or app file, or xml file, or however you choose to implement it) and make a small change to a number.
Also, if the discount code is separated into an XML file, then you can give the entire application to a manager, and with sufficient instructions, they won't need to pester you whenever they want to change the discount rates.
When you mix in data and code, you are exponentially increasing the odds of breaking when anything changes. So, as leppie said, you need to extract the constantly changing parts, and put them in a separate place.
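A hypothetical sketch of that separation (all names invented): the engine is the stable code, while the rules are plain data that could equally be loaded from a DB, an app file, or an XML file:

    import java.util.List;
    import java.util.Map;

    public class DiscountEngine {

        // The changing part, kept as data: which item qualifies, and the rate.
        record DiscountRule(String qualifyingItem, double percentOff) {}

        private final List<DiscountRule> rules;

        public DiscountEngine(List<DiscountRule> rules) {
            this.rules = rules; // e.g. loaded from a DB, app file, or XML file
        }

        // The stable part, kept as code: *how* a discount is applied
        // never changes when the rates or the item lists do.
        public double discountFor(Map<String, Double> basket) {
            double discount = 0.0;
            for (DiscountRule rule : rules) {
                Double price = basket.get(rule.qualifyingItem());
                if (price != null) {
                    discount += price * rule.percentOff() / 100.0;
                }
            }
            return discount;
        }
    }

Changing a rate then means editing one record in the repository, with no change to the engine that has already been tested and shipped.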
Huge difference. Data is a given to the system, while code is a part of the system.
Wrong data is harmless in itself: if our code, the handler, is good, then what you put in is what you get out; it is not the system's fault that you meant something else. But if the code is bad, the system is bad.
As an example, consider some JSON, a bad parser.js written by me, and, let's say, the good V8 engine. For my system, the bad parser.js is code, and my system works incorrectly. But for Google's system, my bad parser is just data that says nothing about the quality of V8.
The question is very practical, not sophistry.
Systems engineering (https://en.wikipedia.org/wiki/Systems_engineering) tries to give a good answer (and make money).
Data is information. It's not about where you decide to put it, be it a db, config file, config through code or inside the classes.
The same happens for behaviors / code. It's not about where you decide to put it or how you choose to represent it.
The line between data and code (program) is blurry. It's ultimately just a question of terminology - for example, you could say that data is everything that is not code. But, as you wrote, they can be happily mixed together (although usually it's better to keep them separate).
Code is any data which can be executed. Since all data is used as input to some program at some point in time, it can be said that all data is eventually executed by a program! Thus your program acts as a virtual machine for your data. Hence, in theory, there is no difference between data and code.
In the end what matters is software engineering/development considerations like performance, efficiency, etc. For example, data-driven programs may not be as efficient as programs with hard-coded (and hence fragile) conditional statements. Hence I choose to define code as any data which can be efficiently executed, and everything else as plain data.
It's a tradeoff between flexibility and efficiency. Executable data (like XML rules) sometimes offers more flexibility, while the same data/rules coded as part of the application will run more efficiently, but changing it frequently becomes cumbersome. In other words, executable data is easy to deploy but inefficient, and vice versa. So ultimately the decision rests with you, the software designer.
Please correct me if I'm wrong.
The relationship between code and data is as follows:
code, after being compiled into a program, processes data during execution
a program can extract data, transform data, load data, generate data ...
Also:
a program can extract code, transform code, load code, and generate code too...
Hence code that has not been compiled or interpreted is useless, while data is always worth something; but code, once compiled, can do all of the above activities.
For example:
a source control system processes source code; here what is processed is itself code
backup scripts process files; here the files are data, and so on...
I would say that the distinction between data, code and configuration is something to be made within the context of a particular component. Sometimes it's obvious, sometimes less so.
For example, to a compiler, the source code it consumes and the object code it creates are both data - and should be separated from the compiler's own code.
In your case you seem to be describing the option of a particularly powerful configuration file, which can contain code. Much as, for example, the GIMP lets you 'configure' plugins using Scheme. As the developer of the component that reads this configuration, you would think of it as data. When working at a different level -- writing the configuration -- you would think of it as code.
This is a very powerful way of designing.
Applying this to the underlying question ("How would you code the above example?"), one option might be to adopt or design a high level Domain Specific Language (DSL) for specifying rules. At startup, or when first required, the server reads the rule and executes it.
Provide an admin interface allowing the administrator to
test a new rule file
replace the current configuration with that from a new rule file
... all of which would happen at runtime.
A DSL might be something as simple as a table parser or an XML parser, or it could be something as sophisticated as a scripting language. From C, it's easy to embed Python or Lua. From Java it's easy to embed Groovy or Clojure.
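For instance, a minimal sketch of the embedding via the standard javax.script (JSR-223) API, assuming an engine such as Groovy is on the classpath; the rule text, RuleRunner, and basketTotal are invented for illustration:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class RuleRunner {

        // Evaluates a rule that arrived as data (a string read at runtime),
        // so rules can be replaced without redeploying the application.
        public static Object runRule(String ruleSource, double basketTotal) throws ScriptException {
            // getEngineByName returns null if no such engine is installed
            ScriptEngine engine = new ScriptEngineManager().getEngineByName("groovy");
            engine.put("basketTotal", basketTotal); // expose application data to the rule
            return engine.eval(ruleSource);         // e.g. "basketTotal > 100 ? 10.0 : 0.0"
        }
    }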
You could switch in compiled code at runtime, with clever linking or classloader tricks. This seems more difficult and less valuable than the embedded DSL option, in my opinion.
The best practical answer to this question I found is this:
Any class that needs to be serialized, now or in any foreseeable future, is data.
Everything else is code.
That's why, for example, Java's HashMap is data, although it has a lot of code, API methods and a specific implementation (i.e., it might look like code at first glance).