Using attributes in Chef - json

Just getting started with using chef recently. I gather that attributes are stored in one large monolithic hash named node that's available for use in your recipes and templates.
There seem to be multiple ways of defining attributes
Directly in the recipe itself
Under an attributes file - e.g. attributes/default.rb
In a JSON object that's passed to the chef-solo call. e.g. chef-solo -j web.json
Given the above 3, I'm curious
Are those all the ways attributes can be defined?
What's the order of precedence here? I'm assuming one of these methods supercedes the others
Is #3 (the JSON method) only valid for chef-solo ?
I see both node and default hashes defined. What's the difference? My best guess is that the default hash defined in attributes/default.rb gets merged into the node hash?
Thanks!

Your last question is probably the easiest to answer. In an attributes file you don't have to type 'node' so that this in attributes/default.rb:
default['foo']['bar']['baz'] = 'qux'
Is exactly the same as this in recipes/whatever.rb:
node.default['foo']['bar']['baz'] = 'qux'
In retrospect having different syntaxes for recipes and attributes is confusing, but this design choice dates back to extremely old versions of Chef.
The -j option is available to chef-client or chef-solo and will both set attributes. Note that these will be 'normal' attributes which are persistent in the node object and are generally not recommended to use. However, the 'run_list', 'chef_environment' and 'tags' on servers are implemented this way. It is generally not recommended to use other 'normal' attributes and to avoid node.normal['foo'] = 'bar' or node.set['foo'] = 'bar' in recipe (or attribute) files. The difference is that if you delete the node.normal line from the recipe the old setting on a node will persist, while if you delete a node.default setting out of a recipe then when your run chef-client on the node that setting will get deleted.
What happens in a chef-client run to make this happen is that at the start of the run the client issues a GET to get its old node document from the server. It then wipes the default, override and automatic(ohai) attributes while keeping the 'normal' attributes. The behavior of the default, override and automatic attributes makes the most sense -- you start over at the start of the run and then construct all the state, if its not in the recipe then you don't see a value there. However, normally the run_list is set on the node and nodes do not (often) manage their own run_list. In order to make the run_list persist it is a normal attribute.
The choice of the word 'normal' is unfortunate, as is the choice of 'node.set' setting 'normal' attributes. While those look like obvious choices to use to set attributes users should avoid using those. Again the problem is that they came first and were and are necessary and required for the run_list. Generally stick with default and override attributes only. And typically you can get most of your work done with default attributes, those should be preferred.
There's a big precedence level picture here:
https://docs.chef.io/attributes.html#attribute-precedence
That's the ultimate source of truth for attribute precedence.
That graph describes all the different ways that attributes can be defined.
The problem with Chef Attributes is that they've grown organically and sprouted many options to try to help out users who painted themselves into a corner. In general you should never need to touch automatic, normal, force_default or force_override levels of attributes. You should also avoid setting attributes in recipe code. You should move setting attributes in recipes to attribute files. What this leaves is these places to set attributes:
in the initial -j argument (sets normal attributes, you should limit using this to setting the run_state, over using this is generally smell)
in the role file as default or override precedence levels (careful with this one though because roles are not versioned and if you touch these attributes a lot you will cause production issues)
in the cookbook attributes file as default or override precedence levels (this is where you should set most of your attributes)
in environment files as default or override precedence levels (can be useful for settings like DNS servers in a datacenter, although you can use roles and/or cookbooks for this as well)
You also can set attributes in recipes, but when you do that you invariably wind up getting your next lesson in the two-phase compile-converge parser that runs through the Chef Recipes. If you have recipes that need to communicate with each other its better to use the node.run_state which is just a hash that doesn't get written as node attributes. You can drop node.run_state[:foo] = 'bar' in one recipe and read it in another. You probably will see recipes that set attributes though so you should be aware of that.
Hope That Helps.

When writing a cookbook, I visualize three levels of attributes:
Default values to converge successfully -- attributes/default.rb
Local testing override values -- JSON or .kitchen.yml (have you tried chef_zero using ChefDK and Kitchen?)
Environment/role override values -- link listed in lamont's answer: https://docs.chef.io/attributes.html#attribute-precedence

Related

Blocking in cross validation in mlr with subject id

I have a dataset with multiple observations by participant. Participants are denoted by id. To account for this in the cross validation process, I add blocking = factor(id) to makeClassifTask() and blocking.cv = TRUE to makeResampleDesc(). However, if I leave id in the dataset, it will be used as a predictor. My question is: How do I correctly use blocking? My take would be to create a new variable, e.g. participant.id (outside of the dataset), next to remove id from the original dataset and then to use blocking = factor(participant.id), but I am not sure if this is the correct way to handle blocking.
Rather than supplying a variable for blocking you can provide a custom factor vector that specifies the observations which belong together. This is also shown in the tutorial.
This way you do not need to have the variable "participant.id" in the dataset.
Also make sure that you really want to use "blocking". Did you have a look at "grouping" already? The differences between both are also described in the linked tutorial section.

Chef - expressions in `kitchen.yml` attributes?

In kitchen.yml, I would like to have an expression in the attributes: part. However it seems it is just a static file with literal values.
Is it somehow possible to have the values in attributes: evaluated?
The reason for that need is that I have some node.defaults in defaults.rb, and some of them are URLs at the same host, say, http:foo.org/service. And in the kitchen.yml I want to parametrize the host. So I would have:
...
attributes: { serviceX_baseURL: "http://bar.org/service" }
I want the override to happen with kitchen_*.yml override and not attributes/*.rb (that would be easier) because the override happens later in the process, after the main kitchen.yml file is already generated.
Any practical solutions for that are welcome.
You can use Erb formatting in the .kitchen.yml for some very simplistic templating, but you didn't really give a concrete example. Chances are you should not do this, generally parameterizing both the code and tests on the same input means the tests are brittle or not testing what you think they are.

IBM Websphere scripting ID 'containment paths' explained (I.e. AdminConfig.getid(...) )

I have been digging through IBM knowledge center and have returned frustrated at the lack of definition to some of their scripting interfaces.
IBM keeps talking about a containment path used in their id definitions and I am only assuming that it corresponds to an xml element within file in the websphere config folder hierachy, but that's only from observation. I haven't found this declared in any documentation as yet.
Also, to find an ID, there is a syntax that needs to be used that will retrieve it, yet I cannot find a reference to the possible values of 'types' used in the AdminConfig.getid(...) function call.
So to summarize I have a couple questions:
What is the correct definition of the id "heirachy" both for fetching an id and for the id itself?
e.g. to get an Id of the server I would say something like: AdminConfig.getid('/Cell:MYCOMPUTERNode01Cell/Node:MYCOMPUTERNode01/Server:server1'), which would give me an id something like: server1(cells/MYCOMPUTERNode01Cell/nodes/MYCOMPUTERNode01/servers/server1|server.xml#server_1873491723429)
What are the possible /type values in retrieving the id from the websphere server?
e.g. in the above example, /Cell, /Node and /Server are examples of type values used in queries.
UPDATE: I have discovered this document that outlines the locations of various configuration files in the configuration directory.
UPDATE: I am starting to think that the configuration files represent complex objects with attributes and nested attributes. not every object with an 'id' is query-able through the AdminConfig.getid(<containment_path>). Object attributes (and this is not attributes in the strict xml sense as 'attributes' could be nested nodes with a simple structure within the parent node) may be queried using AdminConfig.showAttribute(<element_id>, <attribute_name>). This function will either return the string value of the inline attribute of the element itself or a string list representation of the ids of the nested attribute node(s).
UPDATE: I have discovered the AdminConfig.types() function that will display a list of all manipulatible object types in the configuration files and the AdminConfig.parent() function that displays what nodes are considered parents of the specified node.
Note: The AdminConfig.parent() function does not reveal the parents in order of their hierachy, rather it seems to just present a list of parents. e.g. AdminConfig.parent('JDBCProvider') gives us this exact list: 'Cell\nDeployment\nNode\nServer\nServerCluster' and even though Server comes before ServerCluster , running AdminConfig.parent('Server') reveals it's parents as: 'Node\nServerCluster'. Although some elements can have no parents - e.g. Cell - Some elements produce an error when running the parent function - e.g. Deployment.
Due to the apparent lack of a reciprocal AdminConfig.children() function, It appears as though obtaining a full hierachical tree of parents/children lies in calling this function on each of the elements returned from the AdminConfig.parent(<>) call and combining the results. That being said, a trial and error approach is mostly fruitful due to the sometimes obvious ordering.

What are the actual advantages of the visitor pattern? What are the alternatives?

I read quite a lot about the visitor pattern and its supposed advantages. To me however it seems they are not that much advantages when applied in practice:
"Convenient" and "elegant" seems to mean lots and lots of boilerplate code
Therefore, the code is hard to follow. Also 'accept'/'visit' is not very descriptive
Even uglier boilerplate code if your programming language has no method overloading (i.e. Vala)
You cannot in general add new operations to an existing type hierarchy without modification of all classes, since you need new 'accept'/'visit' methods everywhere as soon as you need an operation with different parameters and/or return value (changes to classes all over the place is one thing this design pattern was supposed to avoid!?)
Adding a new type to the type hierarchy requires changes to all visitors. Also, your visitors cannot simply ignore a type - you need to create an empty visit method (boilerplate again)
It all just seems to be an awful lot of work when all you want to do is actually:
// Pseudocode
int SomeOperation(ISomeAbstractThing obj) {
switch (type of obj) {
case Foo: // do Foo-specific stuff here
case Bar: // do Bar-specific stuff here
case Baz: // do Baz-specific stuff here
default: return 0; // do some sensible default if type unknown or if we don't care
}
}
The only real advantage I see (which btw i haven't seen mentioned anywhere): The visitor pattern is probably the fastest method to implement the above code snippet in terms of cpu time (if you don't have a language with double dispatch or efficient type comparison in the fashion of the pseudocode above).
Questions:
So, what advantages of the visitor pattern have I missed?
What alternative concepts/data structures could be used to make the above fictional code sample run equally fast?
For as far as I have seen so far there are two uses / benefits for the visitor design pattern:
Double dispatch
Separate data structures from the operations on them
Double dispatch
Let's say you have a Vehicle class and a VehicleWasher class. The VehicleWasher has a Wash(Vehicle) method:
VehicleWasher
Wash(Vehicle)
Vehicle
Additionally we also have specific vehicles like a car and in the future we'll also have other specific vehicles. For this we have a Car class but also a specific CarWasher class that has an operation specific to washing cars (pseudo code):
CarWasher : VehicleWasher
Wash(Car)
Car : Vehicle
Then consider the following client code to wash a specific vehicle (notice that x and washer are declared using their base type because the instances might be dynamically created based on user input or external configuration values; in this example they are simply created with a new operator though):
Vehicle x = new Car();
VehicleWasher washer = new CarWasher();
washer.Wash(x);
Many languages use single dispatch to call the appropriate function. Single dispatch means that during runtime only a single value is taken into account when determining which method to call. Therefore only the actual type of washer we'll be considered. The actual type of x isn't taken into account. The last line of code will therefore invoke CarWasher.Wash(Vehicle) and NOT CarWasher.Wash(Car).
If you use a language that does not support multiple dispatch and you do need it (I can honoustly say I have never encountered such a situation though) then you can use the visitor design pattern to enable this. For this two things need to be done. First of all add an Accept method to the Vehicle class (the visitee) that accepts a VehicleWasher as a visitor and then call its operation Wash:
Accept(VehicleWasher washer)
washer.Wash(this);
The second thing is to modify the calling code and replace the washer.Wash(x); line with the following:
x.Accept(washer);
Now for the call to the Accept method the actual type of x is considered (and only that of x since we are assuming to be using a single dispatch language). In the implementation of the Accept method the Wash method is called on the washer object (the visitor). For this the actual type of the washer is considered and this will invoke CarWasher.Wash(Car). By combining two single dispatches a double dispatch is implemented.
Now to eleborate on your remark of the terms like Accept and Visit and Visitor being very unspecific. That is absolutely true. But it is for a reason.
Consider the requirement in this example to implement a new class that is able to repair vehicles: a VehicleRepairer. This class can only be used as a visitor in this example if it would inherit from VehicleWasher and have its repair logic inside a Wash method. But that ofcourse doesn't make any sense and would be confusing. So I totally agree that design patterns tend to have very vague and unspecific naming but it does make them applicable to many situations. The more specific your naming is, the more restrictive it can be.
Your switch statement only considers one type which is actually a manual way of single dispatch. Applying the visitor design pattern in the above way will provide double dispatch.
This way you do not necessarily need additional Visit methods when adding additional types to your hierarchy. Ofcourse it does add some complexity as it makes the code less readable. But ofcourse all patterns come at a price.
Ofcourse this pattern cannot always be used. If you expect lots of complex operations with multiple parameters then this will not be a good option.
An alternative is to use a language that does support multiple dispatch. For instance .NET did not support it until version 4.0 which introduced the dynamic keyword. Then in C# you can do the following:
washer.Wash((dynamic)x);
Because x is then converted to a dynamic type its actual type will be considered for the dispatch and so both x and washer will be used to select the correct method so that CarWasher.Wash(Car) will be called (making the code work correctly and staying intuitive).
Separate data structures and operations
The other benefit and requirement is that it can separate the data structures from the operations. This can be an advantage because it allows new visitors to be added that have there own operations while it also allows data structures to be added that 'inherit' these operations. It can however be only applied if this seperation can be done / makes sense. The classes that perform the operations (the visitors) do not know the structure of the data structures nor do they have to know that which makes code more maintainable and reusable. When applied for this reason the visitors have operations for the different elements in the data structures.
Say you have different data structures and they all consist of elements of class Item. The structures can be lists, stacks, trees, queues etc.
You can then implement visitors that in this case will have the following method:
Visit(Item)
The data structures need to accept visitors and then call the Visit method for each Item.
This way you can implement all kinds of visitors and you can still add new data structures as long as they consist of elements of type Item.
For more specific data structures with additional elements (e.g. a Node) you might consider a specific visitor (NodeVisitor) that inherits from your conventional Visitor and have your new data structures accept that visitor (Accept(NodeVisitor)). The new visitors can be used for the new data structures but also for the old data structures due to inheritence and so you do not need to modify your existing 'interface' (the super class in this case).
In my personal opinion, the visitor pattern is only useful if the interface you want implemented is rather static and doesn't change a lot, while you want to give anyone a chance to implement their own functionality.
Note that you can avoid changing everything every time you add a new method by creating a new interface instead of modifying the old one - then you just have to have some logic handling the case when the visitor doesn't implement all the interfaces.
Basically, the benefit is that it allows you to choose the correct method to call at runtime, rather than at compile time - and the available methods are actually extensible.
For more info, have a look at this article - http://rgomes-info.blogspot.co.uk/2013/01/a-better-implementation-of-visitor.html
By experience, I would say that "Adding a new type to the type hierarchy requires changes to all visitors" is an advantage. Because it definitely forces you to consider the new type added in ALL places where you did some type-specific stuff. It prevents you from forgetting one....
This is an old question but i would like to answer.
The visitor pattern is useful mostly when you have a composite pattern in place in which you build a tree of objects and such tree arrangement is unpredictable.
Type checking may be one thing that a visitor can do, but say you want to build an expression based on a tree that can vary its form according to a user input or something like that, a visitor would be an effective way for you to validate the tree, or build a complex object according to the items found on the tree.
The visitor may also carry an object that does something on each node it may find on that tree. this visitor may be a composite itself chaining lots of operations on each node, or it can carry a mediator object to mediate operations or dispatch events on each node.
You imagination is the limit of all this. you can filter a collection, build an abstract syntax tree out of an complete tree, parse a string, validate a collection of things, etc.

Deal with undefined values in code or in the template?

I'm writing a web application (in Python, not that it matters). One of the features is that people can leave comments on things. I have a class for comments, basically like so:
class Comment:
user = ...
# other stuff
where user is an instance of another class,
class User:
name = ...
# other stuff
And of course in my template, I have
<div>${comment.user.name}</div>
Problem: Let's say I allow people to post comments anonymously. In that case comment.user is None (undefined), and of course accessing comment.user.name is going to raise an error. What's the best way to deal with that? I see three possibilities:
Use a conditional in the template to test for that case and display something different. This is the most versatile solution, since I can change the way anonymous comments are displayed to, say, "Posted anonymously" (instead of "Posted by ..."), but I've often been told that templates should be mindless display machines and not include logic like that. Also, other people might wind up writing alternate templates for the same application, and I feel like I should be making things as easy as possible for the template writer.
Implement an accessor method for the user property of a Comment that returns a dummy user object when the real user is undefined. This dummy object would have user.name = 'Anonymous' or something like that and so the template could access it and print its name with no error.
Put an actual record in my database corresponding to a user with user.name = Anonymous (or something like that), and just assign that user to any comment posted when nobody's logged in. I know I've seen some real-world systems that operate this way. (phpBB?)
Is there a prevailing wisdom among people who write these sorts of systems about which of these (or some other solution) is the best? Any pitfalls I should watch out for if I go one way vs. another? Whoever gives the best explanation gets the checkmark.
I'd go with the first option, using an if switch in the template.
Consider the case of localization: You'll possibly have different templates for each language. You can easily localize the "anonymous" case in the template itself.
Also, the data model should have nothing to do with the output side. What would you do in the rest of the code if you wanted to test whether a user has a name or not? Check for == 'Anonymous' each time?
The template should indeed only be concerned with outputting data, but that doesn't mean it has to consist solely of output statements. You usually have some sort of if user is logged in, display "Logout", otherwise display "Register" and "Login" case in the templates. It's almost impossible to avoid these.
Personally, I like for clean code, and agree that templates should not have major logic. So in my implementations I make sure that all values have "safe" default values, typically a blank string, pointer to a base class or equivalent. That allows for two major improvements to the code, first that you don't have to constantly test for null or missing values, and you can output default values without too much logic in your display templates.
So in your situation, making a default pointer to a base value sounds like the best solution.
Your 3rd option: Create a regular User entity that represents an anonymous user.
I'm not a fan of None for database integrity reasons.