Multiple types for each record in Neo4j.rb

I have a database currently represented as a set of YAML files (one record per file). I would like to port it into Neo4j. Each record has a property "type" which stores an array of types. I would like to have a module (that includes ActiveNode) for each type. Each node object would then extend the modules corresponding to its types. The only way I can think of to implement this with neo4j.rb is to generate a class for each existing combination of types and include the corresponding type modules in the class. Is there some better way to accomplish this?
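The approach described in the question (one class per existing combination of types, built from per-type modules) does not have to be written out by hand. The classes can be generated on demand. The sketch below is a Python analogue, not Neo4j.rb code. The type names, the `TYPE_MIXINS` registry and the `Record` base class are all hypothetical. Each class is built and cached the first time its combination of types is seen.

```python
from functools import lru_cache


# Hypothetical per-type mixins (the question's per-type modules).
class Person:
    def greet(self):
        return f"Hello, I am {self.name}"


class Author:
    def byline(self):
        return f"by {self.name}"


class Collaborator:
    def role(self):
        return "collaborator"


TYPE_MIXINS = {"Person": Person, "Author": Author, "Collaborator": Collaborator}


class Record:
    """Base class holding the raw properties loaded from one YAML file."""

    def __init__(self, **props):
        self.__dict__.update(props)


@lru_cache(maxsize=None)
def class_for(types):
    """Build (once) and cache a class that mixes in every listed type."""
    bases = tuple(TYPE_MIXINS[t] for t in types) + (Record,)
    return type("".join(types), bases, {})


def load(record):
    # Sort the type list so that ["Author", "Person"] and ["Person", "Author"]
    # map to the same generated class.
    types = tuple(sorted(record["type"]))
    props = {k: v for k, v in record.items() if k != "type"}
    return class_for(types)(**props)
```

For example, `load({"type": ["Person", "Author"], "name": "Ann"})` returns an instance of a generated `AuthorPerson` class that has both `greet()` and `byline()`.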

More concrete examples might help. Is there a natural hierarchy to the types?
Class hierarchy for multiple labels has been supported for a while, but I just put in some changes to the master branch in the last couple of days to make it work more smoothly. You should be able to do something like this:
class Person
  include Neo4j::ActiveNode
end
class Author < Person
end
class Collaborator < Person
end
class Software
  include Neo4j::ActiveNode
end
class Application < Software
end
class Library < Software
end
If you call ChildType.create, it creates a node with both the ParentType and ChildType labels (e.g. Author.create gives a node labeled both Person and Author). If a query loads a node with both labels, the ChildType model class is used.
We've also talked about supporting multiple labels via modules, though we weren't able to think of a good example, so I'd welcome one.
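A toy Python model of this behaviour may help. It only illustrates the idea and is not Neo4j.rb's implementation. It assumes a hypothetical `Node` base class and derives each model's labels from its class hierarchy, so `Author` carries both `Person` and `Author`. When a node is loaded, it picks the most specific model whose labels are all present on that node.

```python
class Node:
    """Hypothetical stand-in for the ActiveNode base; not a real library class."""

    @classmethod
    def labels(cls):
        # Every class between this one and Node contributes its name as a label.
        return {c.__name__ for c in cls.__mro__ if issubclass(c, Node) and c is not Node}


class Person(Node):
    pass


class Author(Person):
    pass


class Collaborator(Person):
    pass


MODELS = [Person, Author, Collaborator]


def model_for(node_labels):
    """Pick the most specific model whose labels are all present on the node."""
    candidates = [m for m in MODELS if m.labels() <= set(node_labels)]
    # The most specific candidate carries the largest label set.
    return max(candidates, key=lambda m: len(m.labels()), default=None)
```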

Related

textX: How to generate object names with ObjectProcessors?

I have a simple example model where I would like to generate names for the objects of the Position rule that were not given a name with as <NAME>. This is needed so that I can find them later with the built-in FQN scope provider.
My idea would be to do this in the position_name_generator object processor, but that will only be called after the whole model is parsed. I don't really understand the reason for that: by the time I need a Position object in the Project, the objects have already been created, yet the object processor is not called.
Another idea would be to do this in a custom scope provider for Position.location which would then first do the name generation and then use the built-in FQN to find the Location object. Although this would work, I consider this hacky and I would prefer to avoid it.
What would be the textX way of solving this issue?
(Please take into account that this is only a small example. In reality, similar functionality is required for a rather big and complex model. Changing this behaviour with the generated names is not an option, since it is a requirement.)
import textx
import textx.scoping.providers  # makes textx.scoping.providers.FQN available
MyLanguage = """
Model
: (locations+=Location)*
(employees+=Employee)*
(positions+=Position)*
(projects+=Project)*
;
Project
: 'project' name=ID
('{'
('use' use=[Position])*
'}')?
;
Position
: 'define' 'position' employee=[Employee|FQN] '->' location=[Location|FQN] ('as' name=ID)?
;
Employee
: 'employee' name=ID
;
Location
: 'location' name=ID
( '{'
(sub_location+=Location)+
'}')?
;
FQN
: ID('.' ID)*
;
Comment:
/\/\/.*$/
;
"""
MyCode = """
location Building
{
location Entrance
location Exit
}
employee Hans
employee Juergen
// Shall be referred to with the given name: "EntranceGuy"
define position Hans->Building.Entrance as EntranceGuy
// Shall be referred to with the autogenerated name: <Employee>"At"<LastLocation>
define position Juergen->Building.Exit
project SecurityProject
{
use EntranceGuy
use JuergenAtExit
}
"""
def position_name_generator(obj):
    # Generate a name only for positions defined without 'as <NAME>'.
    if "" == obj.name:
        obj.name = obj.employee.name + "At" + obj.location.name


def main():
    meta_model = textx.metamodel_from_str(MyLanguage)
    meta_model.register_scope_providers({
        "Position.location": textx.scoping.providers.FQN(),
    })
    meta_model.register_obj_processors({
        "Position": position_name_generator,
    })
    model = meta_model.model_from_str(MyCode)
    assert model, "Could not create model..."


if "__main__" == __name__:
    main()
What is the textx way to solve this...
The use case you describe is to define the name of an object based on other model elements, including references to other model elements. This is currently not covered by any test or use case in our test suite or in the textX docs.
Object processors are executed at defined stages during model construction (see http://textx.github.io/textX/stable/scoping/#using-the-scope-provider-to-modify-a-model). In the described setup, they are executed after reference resolution. Since the name to be defined/deduced is itself required for reference resolution, object processors cannot be used here. Even if we allowed control over whether object processors run before or after reference resolution, the described setup would still not work.
Given the dynamics of model loading (see http://textx.github.io/textX/stable/scoping/#using-the-scope-provider-to-modify-a-model), the solution lies within a scope provider (as you suggested). There, we allow control over the order of reference resolution, so that references to an object named by a custom procedure are postponed until the references required to deduce/define its name have been resolved.
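The ordering idea can be illustrated without textX. Below is a toy sketch in plain Python. The `Postpone` exception, `run_postponed` and the dict-based model are hypothetical and are not the textX API. Each resolution step either succeeds or asks to be retried later. The loop repeats passes until all steps are done, or fails when a whole pass makes no progress. The steps are deliberately listed in the "wrong" order: the project's `use` references wait for the generated names, and the name generation waits for the location reference.

```python
class Postpone(Exception):
    """Toy signal meaning "cannot resolve yet, try again in a later pass"."""


def run_postponed(steps):
    """Run steps repeatedly, postponing those not ready; return the pass count."""
    pending = list(steps)
    passes = 0
    while pending:
        passes += 1
        still_pending = []
        for step in pending:
            try:
                step()
            except Postpone:
                still_pending.append(step)
        if len(still_pending) == len(pending):
            raise RuntimeError("no progress: cyclic or missing references")
        pending = still_pending
    return passes


def resolve_example():
    locations = {"Building.Entrance": "Entrance", "Building.Exit": "Exit"}  # FQN -> name
    positions = [
        {"employee": "Hans", "loc_ref": "Building.Entrance", "name": "EntranceGuy"},
        {"employee": "Juergen", "loc_ref": "Building.Exit", "name": ""},
    ]
    uses = ["EntranceGuy", "JuergenAtExit"]
    resolved_uses = []

    def resolve_use(ref):
        def step():
            # A still-missing name might turn out to be `ref`, so wait for all names.
            if any(p["name"] == "" for p in positions):
                raise Postpone
            match = next((p for p in positions if p["name"] == ref), None)
            if match is None:
                raise LookupError(f"unknown position {ref!r}")
            resolved_uses.append(match)
        return step

    def generate_name(pos):
        def step():
            if pos["name"]:
                return  # explicitly named with 'as'
            if "location" not in pos:
                raise Postpone  # needs the resolved location first
            pos["name"] = pos["employee"] + "At" + pos["location"]
        return step

    def resolve_location(pos):
        def step():
            pos["location"] = locations[pos["loc_ref"]]
        return step

    steps = ([resolve_use(u) for u in uses]
             + [generate_name(p) for p in positions]
             + [resolve_location(p) for p in positions])
    passes = run_postponed(steps)
    return [p["name"] for p in resolved_uses], passes
```

Here `resolve_example()` needs three passes. The first resolves the locations, the second generates `JuergenAtExit`, and the third resolves both `use` references.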
Possible workaround
A preliminary sketch of how your use case can be solved is discussed in https://github.com/textX/textX/pull/194 (with the associated issue https://github.com/textX/textX/issues/193). This textX PR contains a version of scoping.py that you could probably use for your project (just copy and rename the module). A full-fledged solution could become part of textX TEP-001, where we plan to make scoping more controllable for the end user.
Playing around with this really interesting issue revealed new aspects of the textX framework to me: names that depend on model contents (involving unresolved references). In terms of our reference resolution logic, resolving such a name can be Postponed (in the referenced PR, see below).
Even more interesting are the consequences of that: what happens to references pointing to locations where unresolved names are found? Here, we must postpone the reference resolution process, because we cannot know whether the name might match once it is resolved.
Your example is included: https://github.com/textX/textX/blob/analysis/issue193/tests/functional/test_scoping/test_name_resolver/test_issue193_auto_name.py

Why, when querying an ontology, do we have to provide its namespace in addition to loading it?

I wonder why, when querying an ontology, we have to provide its namespace in addition to loading it. Why is loading the ontology not enough?
To understand my question better, here is a sample code:
g = rdflib.Graph()
g.parse('ppp.owl', format='turtle')
ppp = rdflib.Namespace('http://purl.org/xxx/ont/ppp/')
g.bind('ppp', ppp)
In line 2, we have opened the ontology (ppp.owl), but in line 3 we also provided its namespace. Does the namespace tell the program how to handle the ontology?
Cheers,
RF
To specify an element on the Semantic Web, you need its URI (Uniform Resource Identifier), which is composed of the namespace and the local name. For example, consider Person, an RDF class: how would you differentiate the DBpedia Person class http://dbpedia.org/ontology/Person from Person in some other ontology? You need the namespace http://dbpedia.org/ontology/ and the local name Person. Together, they uniquely identify the class.
Now, coming back to your specific question: when you query the ontology, you might use multiple namespaces, and some of them may not be your ontology's namespace. Even to query your own ontology, you need other namespaces, e.g. rdf, rdfs, and owl. For example, you can rarely write a non-trivial query without the rdf:type property, which belongs to the rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> namespace, not to your ontology's namespace. As a consequence, you need to specify the namespace.
Now that you know why namespaces are used, why repeat the whole namespace string each time it is needed? A prefix is nothing more than a short string that stands for the namespace and is prepended to the local names in the query, so you avoid writing out the full URI. Compare <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> with rdf:type.
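The prefix mechanism is essentially string expansion. As a minimal sketch in plain Python, without rdflib, expanding a prefixed name looks like this. The prefix table is an assumption. Apart from the standard rdf, rdfs and owl namespaces and the ppp namespace from the question, nothing in it comes from a real configuration, and `ppp:Person` is a hypothetical class name. In rdflib, prefixes bound with `Graph.bind` (or declared with SPARQL `PREFIX`) are what let you write the short form.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "ppp": "http://purl.org/xxx/ont/ppp/",  # namespace from the question
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'rdf:type' into its full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    # The URI is simply namespace + local name.
    return prefixes[prefix] + local
```

`expand("rdf:type")` returns `http://www.w3.org/1999/02/22-rdf-syntax-ns#type`, which is exactly the long form the answer contrasts with `rdf:type`.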
Edit
As @AKSW says, in conclusion, there is no need to declare a namespace in order to work with the ontology, but it is more convenient when you frequently work with resources whose URIs share a particular namespace.

Neo4j.rb returns models with incorrect class for nodes with multiple labels

I have two ActiveNode models:
class Company
  include Neo4j::ActiveNode
end
and
class Entity
  include Neo4j::ActiveNode
end
They correspond to the labels "Entity" and "Company", which are attached to the same node, so a node can be both an entity and a company.
In my console, when I attempt the following query:
Entity.where(entity_id: 1).first
It returns a Company object:
#<Company uuid: entity_id: 1>
I don't want that. If I ask for an entity, I want an entity returned. The Entity model has different methods defined than the Company model. Is there any way I can enforce the correct behavior? It seems pretty counterintuitive that it behaves this way.
I am using Neo4j 3.0 and neo4j.rb 7.0.3.
This is a good point. If both labels could be matched, it should use the one for the class which was used to do the find.
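The rule suggested here can be sketched in a few lines of Python. `Entity`, `Company` and the `MODELS_BY_LABEL` registry are hypothetical stand-ins. The sketch shows only the selection logic and is not Neo4j.rb's code or the actual behaviour of version 7.0.3.

```python
class Entity:
    pass


class Company:
    pass


# Hypothetical registry: label -> model class.
MODELS_BY_LABEL = {"Entity": Entity, "Company": Company}


def model_for(node_labels, queried_model):
    """Pick the class used to wrap a loaded node.

    Prefer the class the query started from if the node carries its label;
    otherwise fall back to the first registered label in alphabetical order.
    """
    if queried_model.__name__ in node_labels:
        return queried_model
    for label in sorted(node_labels):
        if label in MODELS_BY_LABEL:
            return MODELS_BY_LABEL[label]
    return None
```

With this rule, a node labeled both Entity and Company comes back as an `Entity` when the query started from `Entity`, and as a `Company` when it started from `Company`.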
I'm curious about your modeling, though. Can a Company node ever not be an Entity, or vice versa? Or is, for example, a Company always a kind of Entity? If so, you might want to use inheritance:
class Entity
  include Neo4j::ActiveNode
end
class Company < Entity
  # No need to include Neo4j::ActiveNode
end
But it partly depends on whether it makes sense for Company nodes to inherit the behavior/logic of Entity.

M2M relationship or 2 FKs?

Which of the following structures would be preferable:
# M2M
from django.db import models

class UserProfile(models.Model):
    ...
    # 'Group' as a string, because the class is defined further down.
    groups = models.ManyToManyField('Group')

class Group(models.Model):
    ...
or -
# 2 FKs
class UserProfile(models.Model):
    ...

class Group(models.Model):
    ...

class GroupMember(models.Model):
    user = models.ForeignKey(UserProfile, on_delete=models.CASCADE)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
Which would be better?
You can also combine these two variants using the through option:
groups = models.ManyToManyField(Group, through='GroupMember')
What do you mean by better? Usually you don't need to create an intermediate model (except when you have to store extra data).
ManyToManyField does its job perfectly, so don't reimplement its functionality yourself.
The two are essentially the same. When you use an M2M field, Django automatically creates an intermediary model, which is pretty much exactly like your GroupMember model. However, it also sets up some API hooks that let you access the Group model directly from the UserProfile model, without having to deal with the intermediary model.
You can get the same hooks back by using through, as @San4ez explains, but you've only made things more complicated. Creating a custom through model is only beneficial if you need to add additional fields to the relationship. Otherwise, stick with the default.
Long story short, #1 is better, simply because it does exactly the same as #2, but more simply and without extraneous code.
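To make "essentially the same" concrete, here is a plain-SQL sketch using Python's sqlite3, with no Django involved, of the junction table that both variants end up mapping to. The table and column names are illustrative and only loosely modeled on Django's defaults. The group table is called `grp` here because GROUP is an SQL keyword. The query in `groups_of` is roughly what accessing a profile's groups boils down to: a join through the middle table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE userprofile (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT);
-- The middle table: two foreign keys, one row per membership,
-- and no duplicate (user, group) pairs.
CREATE TABLE userprofile_groups (
    id INTEGER PRIMARY KEY,
    userprofile_id INTEGER NOT NULL REFERENCES userprofile(id),
    group_id INTEGER NOT NULL REFERENCES grp(id),
    UNIQUE (userprofile_id, group_id)
);
INSERT INTO userprofile (id, name) VALUES (1, 'alice'), (2, 'bob');
INSERT INTO grp (id, name) VALUES (1, 'admins'), (2, 'editors');
INSERT INTO userprofile_groups (userprofile_id, group_id) VALUES (1, 1), (1, 2), (2, 2);
""")


def groups_of(user_name):
    """Return the names of a user's groups via the junction table."""
    rows = conn.execute("""
        SELECT g.name FROM grp g
        JOIN userprofile_groups ug ON ug.group_id = g.id
        JOIN userprofile u ON u.id = ug.userprofile_id
        WHERE u.name = ?
        ORDER BY g.name
    """, (user_name,))
    return [name for (name,) in rows]
```

An explicit GroupMember model gives you this same table under your own name. The M2M field gives it to you implicitly, together with the convenience accessors.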

Create separate classes for insert and save

Is this a good idea? Instead of creating one class with two methods (insert and update) and two validation methods (validateInsert and validateUpdate), create three classes: one called ProductDB, another called ProductInsert (with methods Insert and Validate), and a third called ProductUpdate (with the same methods as ProductInsert).
Is this more readable, flexible and testable?
PaulG's answer leans more towards the traditional domain object pattern, which I'm not in favor of. Personally, I prefer a separate class for each process (like your ProductInsert and ProductUpdate). This is akin to the simple bank example where Deposit is an instance of a class rather than a method on a BankAccount class. Once you start thinking about business processes that involve more, such as rules, actions to be taken, and auditing/persistence of the action itself (say, a ProductInsert table to track insertions), you realize that the business process should be a first-class citizen in its own right.
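A minimal Python sketch of the process-per-class design follows. It assumes ProductDB is just an in-memory store, whereas a real one would talk to a database. The `validate`/`execute` method names and the audit list are hypothetical. Each process owns its own validation rules and records an audit entry, which is the kind of extra responsibility this answer has in mind.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


class ProductDB:
    """Hypothetical in-memory stand-in for the question's ProductDB class."""

    def __init__(self):
        self.rows = {}   # product name -> Product
        self.audit = []  # one (action, name) entry per executed process


class ProductInsert:
    """The 'insert' business process as its own class."""

    def __init__(self, db):
        self.db = db

    def validate(self, product):
        errors = []
        if not product.name:
            errors.append("name is required")
        if product.price < 0:
            errors.append("price must be non-negative")
        if product.name in self.db.rows:
            errors.append("product already exists")
        return errors

    def execute(self, product):
        errors = self.validate(product)
        if errors:
            raise ValueError("; ".join(errors))
        self.db.rows[product.name] = product
        self.db.audit.append(("insert", product.name))


class ProductUpdate:
    """The 'update' business process, with its own validation rules."""

    def __init__(self, db):
        self.db = db

    def validate(self, product):
        errors = []
        if product.price < 0:
            errors.append("price must be non-negative")
        if product.name not in self.db.rows:
            errors.append("product does not exist")
        return errors

    def execute(self, product):
        errors = self.validate(product)
        if errors:
            raise ValueError("; ".join(errors))
        self.db.rows[product.name] = product
        self.db.audit.append(("update", product.name))
```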
This sounds like a language-independent question. I would just create the one class and call it Product, and have the appropriate methods within the class. Think about what a mess it would be when actually instantiating your separate objects (unless you have static methods).
Also, having a concrete Product class will allow you to store object-specific information.
For example:
Product myProduct = new Product();
myProduct.name = "cinnamon toast crunch";
myProduct.price = 3.99;
In my opinion, having separate classes would make your code a lot less readable and testable.
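For contrast, here is a minimal Python sketch of the single-class design this answer recommends. The insert/update methods and the validateInsert/validateUpdate pair from the question (snake_case here) all live on `Product`. The dict passed as `store` is a hypothetical stand-in for the database.

```python
class Product:
    """Product data plus its own insert/update logic in one class.

    `store` is a hypothetical dict (name -> price) standing in for the database.
    """

    def __init__(self, name, price):
        self.name = name
        self.price = price

    def _common_errors(self):
        errors = []
        if not self.name:
            errors.append("name is required")
        if self.price < 0:
            errors.append("price must be non-negative")
        return errors

    def validate_insert(self, store):
        errors = self._common_errors()
        if self.name in store:
            errors.append("product already exists")
        return errors

    def validate_update(self, store):
        errors = self._common_errors()
        if self.name not in store:
            errors.append("product does not exist")
        return errors

    def insert(self, store):
        errors = self.validate_insert(store)
        if errors:
            raise ValueError("; ".join(errors))
        store[self.name] = self.price

    def update(self, store):
        errors = self.validate_update(store)
        if errors:
            raise ValueError("; ".join(errors))
        store[self.name] = self.price
```

The trade-off between the two designs is visible here. The shared checks live in one private helper, but every new process (auditing, notifications, and so on) adds more methods to the same class.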