JSON design for a web service - how to express attributes

On my team there seem to be two schools of thought about designing our RESTful interface. The endpoints are themselves a product we sell, so the customer-facing interface is JSON. The question is how to express the attributes of an object. My preference is to make attributes explicit fields of a resource, like this:
{
  "myThing": [
    {
      "color": "blue",
      "number": "212"
    }
  ]
}
With this approach, you can understand the object by knowing its attributes, and it feels more elegant as well as explicit. Documentation is straightforward for business people who are prospective customers.
However, some of our developers believe the following is the preferred way to express attributes, I think because we have a lot of attributes and it's easier to manage, though there may be other benefits. I don't see this structure as being as user-friendly or explicit, and it requires cross-referencing the documentation more extensively.
{
  "myThing": [
    {
      "attribute": "color",
      "value": "blue"
    },
    {
      "attribute": "number",
      "value": "212"
    }
  ]
}
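For comparison, the two payloads above carry the same information, and converting between them is mechanical. A minimal Python sketch, assuming each EAV entry has exactly the keys "attribute" and "value" (the function names are made up for illustration):

```python
from typing import Dict, List


def eav_to_fields(entries: List[dict]) -> Dict[str, object]:
    """Collapse [{"attribute": k, "value": v}, ...] into {k: v, ...}."""
    fields: Dict[str, object] = {}
    for entry in entries:
        name = entry["attribute"]
        # The EAV shape allows repeats; explicit fields cannot hold two values.
        if name in fields:
            raise ValueError(f"duplicate attribute: {name!r}")
        fields[name] = entry["value"]
    return fields


def fields_to_eav(fields: Dict[str, object]) -> List[dict]:
    """Expand {k: v, ...} into [{"attribute": k, "value": v}, ...]."""
    return [{"attribute": k, "value": v} for k, v in fields.items()]


eav = [
    {"attribute": "color", "value": "blue"},
    {"attribute": "number", "value": "212"},
]
print(eav_to_fields(eav))  # {'color': 'blue', 'number': '212'}
```

One difference the conversion exposes: the EAV list can repeat an attribute, which explicit fields cannot represent, so the sketch rejects duplicates rather than silently keeping the last one.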
My question: aside from feeling that the first approach is more intuitive, I'm finding it difficult to find best practices or a persuasive argument for one approach over the other. Can anyone point me to some JSON design best practices that would favor one over the other? Thanks!

Oh, this is simple to qualify in some basic terms, and both can work. The trick is deciding which one is going to be more successful. It helps to think in terms of successful vs. unsuccessful; the whole right-or-wrong argument is based very much on personal opinion and is a massive waste of time. Never argue with tech guys that something is the right way to do it; you will have an easier time herding cats.
So let's break the two models down into pros and cons.
Approach one is a traditional model:
Pro:
Objects and attributes are well defined.
Attributes can be validated because they are known, e.g. this is a date.
Easier to understand and document.
Cons:
Attributes are static, and expanding the attribute set will require updating the code and can introduce problems with clients, since they are aware of the attributes.
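To make the first pro and the con concrete, here is a minimal Python sketch using a dataclass as the schema for approach one. The colour palette and the digits-only rule are hypothetical rules chosen for illustration, not anything from the question:

```python
from dataclasses import dataclass

# Hypothetical rules, chosen only for illustration.
ALLOWED_COLORS = {"red", "green", "blue"}


@dataclass(frozen=True)
class MyThing:
    """Approach one: every attribute is a declared field."""
    color: str
    number: str

    def __post_init__(self):
        # Each field is known in advance, so each can get its own check.
        if self.color not in ALLOWED_COLORS:
            raise ValueError(f"unsupported color: {self.color!r}")
        if not self.number.isdigit():
            raise ValueError(f"number must be all digits: {self.number!r}")


payload = {"color": "blue", "number": "212"}
thing = MyThing(**payload)
print(thing)  # MyThing(color='blue', number='212')
```

Because the fields are declared, an unknown key such as "size" fails immediately with a TypeError: that is the validation benefit and the "must update the code to add an attribute" cost in one line.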
Approach two is known as the Entity-Attribute-Value (EAV) design pattern (a BIG no-no in relational database design, by the way).
Pro:
Adding attributes to an object does not require many changes, and it's unlikely to break clients.
Cons:
Documentation is harder.
Data validation is much harder. Should the colour attribute in your example be a string like Blue? Or a hex value like #ffae12? Maybe CMYK? It gets harder to validate and document these kinds of things.
Requires developers/customers to understand the mechanism for setting and getting values, i.e. it breaks normal programming paradigms.
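By contrast, validating approach two means maintaining a separate table of rules keyed by attribute name, which is effectively the schema the payload no longer carries. A hedged Python sketch; the registry and the choice of hex colours are assumptions picked to echo the Blue vs. #ffae12 question:

```python
import re

# Hypothetical validator registry: one rule per attribute name the API accepts.
# With the EAV shape this table *is* the schema, and it must be documented
# separately because the payload itself never names the fields.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")

VALIDATORS = {
    "color": lambda v: isinstance(v, str) and HEX_COLOR.fullmatch(v) is not None,
    "number": lambda v: isinstance(v, str) and v.isdigit(),
}


def validate_eav(entries):
    """Return a list of error messages; an empty list means the payload passed."""
    errors = []
    for i, entry in enumerate(entries):
        name = entry.get("attribute")
        value = entry.get("value")
        check = VALIDATORS.get(name)
        if check is None:
            errors.append(f"entry {i}: unknown attribute {name!r}")
        elif not check(value):
            errors.append(f"entry {i}: bad value {value!r} for {name!r}")
    return errors


print(validate_eav([{"attribute": "color", "value": "blue"}]))
# ["entry 0: bad value 'blue' for 'color'"]
```

Note that "blue" is rejected only because this registry happened to choose hex; a client cannot discover that from the payload shape, which is the documentation problem described above.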
How to choose which model to use?
The principle here is to know your data. If the objects have loads of different attributes and they are quite dynamic, i.e. you add or remove attributes often, then approach 2 is better. Social-media-type data, for example, is a good fit.
If, however, you are dealing with financial transaction data, i.e. debits, credits, etc., you want the highest validation possible. Besides, the basics of making and receiving payments have not changed, so it's unlikely that you would introduce a third account number to a credit transaction object. Not saying it's impossible, but highly unlikely.
As always keep an open mind and use the most appropriate tool for the job when you need it. Don't blindly follow a school of thought. Understand your tools, your data and then it's rather easy to innovate.


Does the word "abstraction" have other interpretations in computing?

I am quite new to programming, and what haunts me is not really the coding itself (well, at least not so far!), but some words/concepts that are really important to understand. My doubt is about the word "ABSTRACTION". I have already searched dictionaries and watched some videos of people giving very clear explanations of the word. So I know that abstraction is when you take into consideration only the things that are important and leave out everything else (putting it in very simple and direct language). For instance, if you are going to change a light bulb, you do not need to know the manufacturer of the light bulb or the light socket, nor the materials used to make the bulb. The problem is that when I read some texts or listen to people using the word, it does not seem to fit that meaning, and then I start to wonder whether they misused the word (which I think is very unlikely), whether there is another obscure meaning that I have not found yet, or whether I am just too dumb to understand it. Below are excerpts from articles I was reading, with the part where the word appears capitalized, so you have context and can see where my problem is. Thank you.
"A programming paradigm provides and determines the view that the programmer has on the structuring and execution of the programme. For example, in object-oriented programming, programmers MAY ABSTRACT A PROGRAMME AS A COLLECTION OF OBJECTS that interact with each other, while in functional programming, programmers ABSTRACT THE PROGRAMME as a sequence of functions executed in a stacked fashion."
"A tuple space has the function of creating a SHARED MEMORY ABSTRACTION over a distributed system, where everyone can read and write to it."
It's easy to understand if you replace abstract/abstraction with one of its synonyms conceptualize/conceptualization. In your first two examples "abstract a programme" means "think of a programme as"... or "conceptualize a programme as"... When we make an abstraction we forget about some details, and think about that thing in other terms.
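To illustrate "think of a programme as...", here is the same toy computation (summing deposits into a balance, chosen purely for illustration) conceptualized both ways in Python:

```python
from functools import reduce


# Object-oriented view: the program is objects that hold state and interact.
class Account:
    def __init__(self, balance: int = 0):
        self.balance = balance

    def deposit(self, amount: int) -> None:
        self.balance += amount  # the object's state changes in place


acct = Account()
for amount in [10, 20, 5]:
    acct.deposit(amount)


# Functional view: the program is functions applied to values, no mutation.
def deposit(balance: int, amount: int) -> int:
    return balance + amount


final_balance = reduce(deposit, [10, 20, 5], 0)

print(acct.balance, final_balance)  # 35 35
```

Both produce 35; what differs is the abstraction the programmer reasons with: an object whose state changes, or a value threaded through a function.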
Side advice from a fellow beginner:
As someone who started learning computer science independently less than a year ago, I can tell you right now there will be lots of tricky terms like this. Try not to get too caught up in them. Oftentimes, if you just keep learning, you'll experience first-hand what these terms mean without even realizing it. Bits and pieces will add up. The takeaway: don't let what you don't know slow you down. Sometimes it's OK to keep going and just not know for a while.
These seem to fit the definition you gave earlier. For object-oriented programming, the mindset is to consider "objects" the essential (important) aspect of a program and abstract all other considerations away. The same goes for functional programming, where "functions" are the defining aspect and other considerations are abstracted away as secondary.
The tuple space may be a little trickier but if you consider that variations in memory storage models are abstracted away in favour of a higher level concept focusing on a collection of values, then you see what the abstraction relates to.
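A tuple space can be sketched in a few lines to show what is being abstracted. This toy version runs in a single Python process, so it illustrates only the interface (Linda-style out/rd/in operations, with None as a wildcard), not the distribution across machines that a real tuple space provides. The class and method names are assumptions; in is a Python keyword, hence in_:

```python
import threading


class TupleSpace:
    """Toy, single-process stand-in for a Linda-style tuple space.

    Callers see only a shared bag of tuples and three operations; how the
    tuples are stored (here, a plain list) is hidden behind that interface.
    """

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Write a tuple into the space and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        # Same length, and every non-None pattern field equals the tuple field.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def rd(self, pattern):
        """Block until a matching tuple exists, then return it (not removed)."""
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()
            return tup

    def in_(self, pattern):
        """Block until a matching tuple exists, then remove and return it."""
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()
            self._tuples.remove(tup)
            return tup


space = TupleSpace()
space.out(("temperature", "room1", 21))
print(space.rd(("temperature", "room1", None)))  # ('temperature', 'room1', 21)
```

Because in_ blocks until a match appears, one thread can wait for work that another thread later writes with out, without either knowing where the data lives.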
Abstract
adjective
existing in thought or as an idea but not having a physical or concrete existence.
relating to or denoting art that does not attempt to represent external reality, but rather seeks to achieve its effect using shapes, colours, and textures.
verb
consider something theoretically or separately from (something else).
extract or remove (something).
noun
a summary of the contents of a book, article, or speech.
an abstract work of art.
There you have your answer. Ask 100 people what an abstract painting is, and you will get at least 100 answers. Why should programmers behave differently?
Let's see what Oracle has to say about abstract classes:
Abstract classes are similar to interfaces. You cannot instantiate them, and they may contain a mix of methods declared with or without an implementation. However, with abstract classes, you can declare fields that are not static and final, and define public, protected, and private concrete methods.
Consider using abstract classes if any of these statements apply to your situation:
You want to share code among several closely related classes.
You expect that classes that extend your abstract class have many common methods or fields, or require access modifiers other than public (such as protected and private).
You want to declare non-static or non-final fields. This enables you to define methods that can access and modify the state of the object to which they belong.
Compare that with the definition of abstract in the above section. I think you get a pretty good idea of abstractness in computer programming.
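The Oracle text describes Java. Python's abc module offers a close analogue, and a minimal sketch shows the same ingredients: an abstract method, a shared concrete method, and ordinary instance state. The Shape/Square names are hypothetical:

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Cannot be instantiated; mixes abstract and concrete methods."""

    def __init__(self, name: str):
        self.name = name  # ordinary instance field (not static, not final)

    @abstractmethod
    def area(self) -> float:
        """Each concrete subclass must supply its own implementation."""

    def describe(self) -> str:
        # Concrete method shared by every subclass.
        return f"{self.name} with area {self.area():.2f}"


class Square(Shape):
    def __init__(self, side: float):
        super().__init__("square")
        self.side = side

    def area(self) -> float:
        return self.side * self.side


print(Square(3).describe())  # square with area 9.00
# Shape("blob") raises TypeError because area() is still abstract.
```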

How can I harness Wikidata to build a Siri-like service?

I'd like to discuss the first part of this Siri-like service.
Ideally, I'd like to be able to query for things like:
"the social network"
"beethoven"
"bad blood taylor swift"
And get results like this:
{"type": "film"}
{"type": "composer"}
{"type": "song"}
I care about nothing else, I find descriptions, images and general information utterly useless outside Wikipedia. I see Wikidata as a meta-data service that can provide me with the semantics of the text I search for.
Do all data structures have "types" or some kind of a property that has to do with its meaning? Is there a list of all the types? Is there a suggestions feature for entities that have double meaning like "apple"? Finally, how can I send a text query and read the "type" of the response data structure?
I know I'm not providing any code, but I really can't wrap my head around Wikidata's API. I've searched everywhere, and all I can find are some crippled fetch examples and messed-up Objective-C HTML parsers. I can't even get their "example query" page to work because of an error I don't understand.
It's really not newbie-friendly, and it's full of heavy terminology.
The problem with Wikidata's API is that it does not have a query interface. All it does is return information for a specific data item, if you already know the ID. We have simply not been able to build a query interface yet that is powerful enough and able to scale. There is an early beta of a SPARQL endpoint though: https://tools.wmflabs.org/ppp-sparql/.
Once that is up and running, we hope to provide easier to use services on top of this, like Magnus' WDQ http://magnusmanske.de/wordpress/?p=72.
(Edit to answer the concrete questions about the API:)
I've searched everywhere and all I can find are some crippled fetch examples
Documentation could be nicer, but https://www.wikidata.org/wiki/Wikidata:Data_access is a good start. Also note that https://www.wikidata.org/w/api.php is self-documenting. In particular, have a look at https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities and https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities
Do all data structures have "types" or some kind of a property that has to do with its meaning?
All statements about a data item have to do with its meaning. Many have a statement about the "instance of" (P31) or "subclass of" (P279) property, which is pretty close to what you want, I suppose.
Is there a list of all the types?
No. Wikidata doesn't use a closed, pre-defined ontology to describe the world. It's a platform to describe the world collaboratively, in a machine readable way; from that, a fluid ontology emerges, which is never quite complete or consistent.
Any data item can serve as the class or super-class of another item. An item can be an instance or subclass of multiple classes. The relationships are quite complex.
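Because classes can have several parents, finding every class an item belongs to is a graph walk rather than a single lookup. A minimal Python sketch over a hand-made toy graph; the class names and edges are illustrative, not real Wikidata items, and with real data you would fill the edges from each item's "subclass of" (P279) statements:

```python
from collections import deque
from typing import Dict, List, Set

# Toy "subclass of" edges (illustrative only). One class may have several
# parents, so the structure is a graph, not a tree.
SUBCLASS_OF: Dict[str, List[str]] = {
    "symphony": ["musical composition"],
    "song": ["musical composition", "musical work"],
    "musical composition": ["musical work", "creative work"],
    "musical work": ["creative work"],
    "creative work": [],
}


def all_superclasses(cls: str) -> Set[str]:
    """Breadth-first walk up the subclass-of graph; the seen set stops cycles."""
    seen: Set[str] = set()
    queue = deque(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        queue.extend(SUBCLASS_OF.get(parent, []))
    return seen


print(sorted(all_superclasses("song")))
# ['creative work', 'musical composition', 'musical work']
```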
Is there a suggestions feature for entities that have double meaning like "apple"?
There is a search interface that can list all matching data items for a given term. It's called wbsearchentities, for instance https://www.wikidata.org/w/api.php?action=wbsearchentities&search=apple&language=en (add format=json for machine readable JSON).
However, the ranking in the result is very naive. And without the semantic context of the original sentence, there is no way to find which word sense is meant. This is an interesting area of research called "word sense disambiguation".
Finally, how can I send a text query and read the "type" of the response data structure?
At the moment, you will have to do two API calls: one to wbsearchentities to get the ID of the entity you are interested in, and one to wbgetentities to get the instance-of statement for that entity. It would be nice to combine this in a single call; there's a ticket open for this: https://phabricator.wikimedia.org/T90693
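The two calls described above can be sketched in Python with only the standard library. This assumes the response layout the API currently returns (a "search" list from wbsearchentities, and entities/<id>/claims/P31 from wbgetentities); the helper names are made up, and the top-ranked hit is taken naively:

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def search_url(text: str, language: str = "en") -> str:
    """Call 1: wbsearchentities turns free text into candidate item IDs."""
    params = {"action": "wbsearchentities", "search": text,
              "language": language, "format": "json"}
    return f"{API}?{urllib.parse.urlencode(params)}"


def entities_url(item_id: str) -> str:
    """Call 2: wbgetentities fetches the statements (claims) of one item."""
    params = {"action": "wbgetentities", "ids": item_id,
              "props": "claims", "format": "json"}
    return f"{API}?{urllib.parse.urlencode(params)}"


def first_match_id(search_json: dict):
    """Take the top-ranked hit, if any; the ranking itself is naive."""
    hits = search_json.get("search", [])
    return hits[0]["id"] if hits else None


def instance_of_ids(entities_json: dict, item_id: str) -> list:
    """Collect the target item IDs of every P31 ("instance of") statement."""
    claims = entities_json["entities"][item_id].get("claims", {})
    ids = []
    for claim in claims.get("P31", []):
        snak = claim["mainsnak"]
        if snak.get("snaktype") == "value":  # skip "no value"/"unknown value"
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def fetch_json(url: str) -> dict:
    # Wikimedia asks clients to send a descriptive User-Agent; this is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (needs network access):
# qid = first_match_id(fetch_json(search_url("beethoven")))
# print(instance_of_ids(fetch_json(entities_url(qid)), qid))
```

The P31 values are themselves item IDs (the classes the item is an instance of); turning them into readable labels such as "film" would need a further wbgetentities call with props=labels.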
As to Siri-like services: an early prototype called "wiri" by Magnus Manske has been around for a long time. It uses very simple patterns though: https://tools.wmflabs.org/magnus-toolserver/thetalkpage/
Bene* has been working on a more advanced approach for natural language question answering, see the Platypus Demo: https://projetpp.github.io/demo.html
Just yesterday, he presented a new prototype he has been developing together with Tpt, which generates SPARQL queries from natural language input: https://tools.wmflabs.org/ppp-sparql/
All of these projects are open source, and were created by enthusiastic volunteers. Look at the code and talk to them. :)

When is it worth creating an XML Schema (XSD)?

I'm curious about how one would apply schema to a list of completed projects, for example a listing of projects completed by an architecture firm.
So let's say you have a list of projects that were completed, consisting of information such as the date, location, description, etc.
I don't know if it is necessarily considered a Creative Work, or a place. I'm considering using the general ItemList/Item properties but not sure if there is much value in it. So having said that, would anyone expect this to be beneficial or worth doing?
You tagged your question with seo and html, but those are not reasons to define an XML Schema (XSD). You mention "Creative Work", but that concept is also immaterial to XSDs and their benefits.
The Value of Defining an XSD
There is value in defining an XSD in so far as you, or the software you use, or the partners with whom you exchange data would benefit from the clear definition and automatic validation of structured documents.
If these reasons are not relevant to your work, you probably don't need to define an XSD.

Creative Terminology

I seem to use bland words such as node, property, children, etc. too often, and I fear that someone else would have difficulty understanding my code simply because the parts' names are vague, common words.
How do you find creative names for classes and components to make them more memorable?
I am particularly having trouble with generic tools which have no real description except their rather generic functional purpose. I would like to know if others have found creative ways to name things rather than simply naming them by their utility, such as AnonymousFunctionWrapperCallerExecutorFactory.
It's hard to answer. I find them just because they seem to 'fit'.
What I do know, however, is that I find it basically impossible to move on writing code unless something is named correctly, and it 'feels' good. If it isn't named right, I find it hard to use, and the code is generally confusing.
I'm not too concerned about something being 'memorable', only 'accurate'.
I have been known to sit around thinking out loud about what to name something. Take your time, and make sure you are really happy with the name. Don't be afraid of using common/simple words.
I don't really have an answer, but here are three things for you to think about.
1. The late Phil Karlton famously said: "There are only two hard problems in computer science. Cache Invalidation and Naming Things." So, the fact that you are having trouble coming up with good names is entirely normal and even expected.
2. OTOH, having trouble naming things can also be a sign of bad design. (And yes, I am perfectly aware that #1 and #2 contradict each other. Or maybe one should think of them more as balancing each other.) E.g., if a thing has too many responsibilities, it is pretty much impossible to come up with a good name. (Witness all the "Service", "Util", "Model" and "Manager" classes in bad OO designs; a code search for "ManagerFactoryFactory" makes the point.)
3. Also, your names should map to the domain jargon used by subject matter experts. If you can't find a subject matter expert, that's a sign that you are currently worrying about code that you're not supposed to worry about. (Basically, code that implements your core business domain should be implemented and designed well, code in ancillary domains should be implemented and designed so-so, and all other code should not be implemented or designed at all, but bought from a vendor, where what you are buying is their core business domain. [Please interpret "buy" and "vendor" liberally. Community-developed Free Software is just fine.])
Regarding #3 above, you mentioned in another comment that you are currently working on implementing a tree data structure. Unless your company is in the business of selling tree data structures, that is not a part of your core domain. And the reason that you have trouble finding good names could be that you are working outside your core domain. Now, "selling tree data structures" may sound stupid, but there are actually companies that do that. For example, the BCL team inside Microsoft's developer division: they actually sell (well, for certain definitions of "sell", anyway) the .NET framework's Base Class Libraries, which include, among others, tree data structures. But note that for example Microsoft's C++ compiler team actually (literally) buys their STL from a third-party vendor – they figure that their core domain is writing compilers, and they leave the writing of libraries to a company who considers writing STLs their core domain. (And indeed, AFAIK, that company does nothing but write and sell STL implementations. That's their sole product.)
If, however, selling tree data structures is your core domain, then the names you listed are just fine. They are the names that subject matter experts (programmers, in this case) use when talking about the domain of tree data structures.
Using 'metaphors' is a common theme in agile (and pattern) literature.
'Children' (in your question) is an example of a metaphor that is extensively used and for good reasons.
So, I'd encourage the use of metaphors, provided they are applicable and not a stretch of the imagination.
Metaphors are everywhere in computing. From files to bugs to pointers to streams... you can't avoid them.
I believe that for the purpose of standardization and communication, it's good to use a common vocab, like in the same case for design patterns. I have a problem with a programmer who keeps 'inventing' his own terms and I have trouble understanding him. (He kept using the term 'events orchestrating' instead of 'scripting' or 'FCFS process'. Kudos for creativity though!)
That common vocabulary describes things we are used to. A node is a point somewhere in a graph, a tree, or what-not. One way is to be specific to the domain: if we are working on a mapping problem, instead of 'node' we can use 'location'. That helps, at least for me. So I find there is a need to balance being able to communicate with other programmers against keeping the descriptor specific enough to help me remember what it does.
I think node, children, and property are great names. I can already guess the following about your classes, just by their "bland" names:
Node - this class is part of a graph of objects
children - this variable holds a list of nodes belonging to the containing node.
I don't think "node" is either vague or common, and if you're coding a generic data structure, it's probably ok to have generic names! (With that being said, if you are coding up a tree, you could use something like TreeNode to emphasize that the node is part of a tree.) One way you can make the life of developers who will use your API easier is to follow the naming conventions of your platform's built in libraries. If everyone calls a node a node, and an iterator an iterator, it makes life easy.
Names that reflect the purpose of the class, method or property are more memorable than creative ones. Modern IDEs make it easier to use longer names, so feel free to be descriptive. Getting creative won't help as much as getting accurate.
I recommend to pick nouns from a specific application domain. E.g. if you are putting cars in a tree, call the node class Car - the fact that it is also a node should be apparent from the API. Also, don't try to be too generic in your implementation - don't put all attributes of the car into a hashtable named properties, but create separate attributes for make, color, etc.
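A minimal Python sketch of that advice, using the car example from the answer: the node class is named after the domain, its attributes are explicit, and the tree structure shows up only through children and a walk method. The method names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Car:
    """Domain-named tree node: explicit attributes, not a generic
    `properties` hashtable, plus child links that make it a tree node."""
    make: str
    color: str
    children: List["Car"] = field(default_factory=list)

    def add_child(self, car: "Car") -> "Car":
        self.children.append(car)
        return car

    def walk(self) -> Iterator["Car"]:
        """Yield this car and every descendant, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


root = Car("Volvo", "red")
root.add_child(Car("Saab", "blue")).add_child(Car("Fiat", "white"))
print([car.make for car in root.walk()])  # ['Volvo', 'Saab', 'Fiat']
```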
A lot of languages and coding styles like to use all sorts of descriptive prefixes. PHP variables carry no declared type, so this may help greatly. Instead of doing
$isAvailable = true;
try
$bool_isAvailable = true;
It is admittedly a pain, but usually well worth the time.
I also like to use long names to describe things. It may seem strange, but they are usually easier to remember, especially when I go back to refactor my code:
$leftNode->properties < $leftTreeNode->arrayOfNodeProperties;
And if all else fails, why not fall back on a solid Star Wars-themed program?
$luke->lightsaber($darth[$ewoks]);
And lastly, in college I named my classes after my professor, and my class methods after all the things I wanted to do to that jerk.
$Kube->canEat($myShorts, $withKetchup);

Object Normalization

Along the same lines as database normalization, is there an approach to object normalization? Not a design pattern, but the same mathematical approach applied to normalizing object creation. For example, first normal form: no repeating fields...
Here are some links on database normalization:
http://en.wikipedia.org/wiki/Database_normalization
http://databases.about.com/od/specificproducts/a/normalization.htm
Would this make object creation and self-documentation better?
Here's a link to an article about class normalization (I guess we're really talking about classes):
http://www.agiledata.org/essays/classNormalization.html
Normalization has a mathematical foundation in predicate logic, and a clear and specific goal that the same piece of information never be represented twice in a single model; the purpose of this goal is to eliminate the possibility of inconsistent information in a data model. It can be shown via mathematical proof that if a data model has certain specific properties (that it passes tests for 1st Normal Form (1NF), 2NF, 3NF, etc.) that it is free from redundant data representation, i.e. it is Normalized.
Object orientation has no such underlying mathematical basis, and indeed, no clear and specific goal. It is simply a design idea for introducing more abstraction. The DRY principle, Command-Query Separation, Liskov Substitution Principle, Open-Closed Principle, Tell-Don't-Ask, Dependency Inversion Principle, and other heuristics for improving quality of code (many of which apply to code in general, not just object oriented programs) are not absolute in nature; they are guidelines that programmers have found useful in improving understandability, maintainability, and testability of their code.
With a relational data model, you can say with absolute certainty whether it is "normalized" or not, because it must pass ALL the tests for normal form, and they are quite specific. With an object model, on the other hand, because the goal of "understandable, maintainable, testable, etc" is rather vague, you cannot say with any certainty whether you have met that goal. With many of the design heuristics, you cannot even say for sure whether you have followed them. Have you followed the DRY principle if you're applying patterns to your design? Surely repeated use of a pattern isn't DRY? Furthermore, some of these heuristics or principles aren't always even necessarily good advice all the time. I do try to follow Command-Query Separation, but such useful things as a Stack or a Queue violate that concept in order to give us a rather elegant and useful result.
I guess the Single Responsibility Principle is at least related to this. Or at least, violation of the SRP is similar to a lack of normalization in some ways.
(It's possible I'm talking rubbish. I'm pretty tired.)
Interesting.
You may also be interested in looking at the Law of Demeter.
Another thing you may be interested in is c2's FearOfAddingClasses, as, arguably, the same reasoning that led programmers to denormalise databases also leads to god classes and other code smells. For both OO and DB normalisation, we want to decompose everything. For databases this means more tables; for OO, more classes.
Now, it is worth bearing in mind the object relational impedance mismatch, that is, probably not everything will translate cleanly.
Object-relational models, or 'persistence layers', usually have 1-to-1 mappings between object attributes and database fields. So, can we normalise? Say we have a department object with employee1, employee2, ... attributes. Obviously that should be replaced with a list of employees. So we can say 1NF works.
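The department example can be sketched in Python to show the repeating-field shape that 1NF rules out next to the list-based fix. The class names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DepartmentRepeating:
    """Repeating fields: a fourth employee means a fourth attribute."""
    name: str
    employee1: Optional[str] = None
    employee2: Optional[str] = None
    employee3: Optional[str] = None

    def headcount(self) -> int:
        # Every query has to know about each numbered slot.
        slots = (self.employee1, self.employee2, self.employee3)
        return sum(e is not None for e in slots)


@dataclass
class Department:
    """1NF-style: one attribute holding a collection of employees."""
    name: str
    employees: List[str] = field(default_factory=list)

    def headcount(self) -> int:
        return len(self.employees)
```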
With that in mind, let's go straight for the kill and look at 6NF database design, a good example is Anchor Modeling, (ignore the naming convention). Anchor Modeling/6NF provides highly decomposed and flexible database schemas; how does this translate to OO 'normalisation'?
Anchor Modeling has these kinds of relationships:
Anchors - unique object IDs.
Attributes, which translate to object attributes: (Anchor, value, metadata).
Ties - relationships between two or more objects (themselves anchors): (Anchor, Anchor... , metadata)
Knots, attributed Ties.
Attribute metadata can be anything - who changed an attribute, when, why, etc.
The OO translation of this looks extremely flexible:
Anchors suggest attribute-less placeholders, like a proxy which knows how to deal with the attribute composition.
Attributes suggest classes representing attributes and what they belong to. This suggests applying reuse to how attributes are looked up and dealt with, e.g. automatic constraint checking. From this we have a basis to generically implement the GoF-style structural patterns.
Ties and Knots suggest classes representing relationships between objects. A basis for generic implementation of the Behavioural design patterns?
Interesting and desirable properties of Anchor Modeling that also translate across are:
All this requires replacing inheritance with composition (good) in the exposed objects.
Attributes have owners, rather than owners having attributes. Although this makes attribute lookup more complex, it neatly solves certain aliasing problems, as there can only ever be one owner.
No need for NULL. This translates to clearer NULL handling. Empty-case attribute classes could provide methods for handling the lack of a particular attribute, instead of performing NULL-checking everywhere.
Attribute metadata. Attribute-level full historisation and blaming: 'play' objects back in time, see what changed, when and why, etc. (if required - metadata is entirely optional)
There would probably be a lot of very simple classes (which is good), and a very declarative programming style (also good).
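A rough Python sketch of how the anchor/attribute split might look in code. It is a hypothetical illustration, not part of Anchor Modeling itself: anchors are bare integer IDs, each Attribute object owns the values for every anchor, absence means "no history" instead of NULL, and each change keeps who-and-when metadata. Ties and knots are omitted for brevity:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Any, Dict, List

_next_id = count(1)


def new_anchor() -> int:
    """An anchor is nothing but a fresh identity."""
    return next(_next_id)


@dataclass(frozen=True)
class Version:
    value: Any
    changed_at: datetime
    changed_by: str  # optional metadata: who made the change


class Attribute:
    """One attribute kind (e.g. 'color') that owns its values for all anchors."""

    def __init__(self, name: str):
        self.name = name
        self._history: Dict[int, List[Version]] = {}

    def set(self, anchor: int, value: Any, by: str) -> None:
        # Append, never overwrite, so the full history is kept.
        version = Version(value, datetime.now(timezone.utc), by)
        self._history.setdefault(anchor, []).append(version)

    def get(self, anchor: int, default: Any = None) -> Any:
        # No history for this anchor means "no value"; there is no NULL row.
        versions = self._history.get(anchor)
        return versions[-1].value if versions else default

    def history(self, anchor: int) -> List[Version]:
        return list(self._history.get(anchor, []))


color = Attribute("color")
car = new_anchor()
color.set(car, "red", by="alice")
color.set(car, "blue", by="bob")
print(color.get(car), [v.changed_by for v in color.history(car)])
# blue ['alice', 'bob']
```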
Thanks for such a thought provoking question, I hope this is useful for you.
Perhaps you're taking this from a relational point-of-view, but I would posit that the principles of interfaces and inheritance correspond to normalization in the world of OOP.
For example, a Person abstract class containing FirstName, LastName, Gender and BirthDate can be used by classes such as Employee, User, Member etc. as a valid base class, without a need to repeat the definitions of those attributes in such subclasses.
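In Python, the Person example can be sketched with dataclass inheritance; a plain base class stands in for the abstract class here, and the subclass fields are made up:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    """Shared attributes are declared once here, not in every subclass."""
    first_name: str
    last_name: str
    gender: str
    birth_date: date


@dataclass
class Employee(Person):
    employee_id: str  # hypothetical subclass-specific field


@dataclass
class Member(Person):
    membership_no: int  # hypothetical subclass-specific field


e = Employee("Jane", "Doe", "F", date(1990, 1, 1), "E-001")
print(e.last_name, e.employee_id)  # Doe E-001
```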
The principle of DRY (a core principle of Andy Hunt and Dave Thomas's book The Pragmatic Programmer) and the constant emphasis of object-oriented programming on re-use also correspond to the efficiencies offered by Normalization in relational databases.
At first glance, I'd say that the objectives of Code Refactoring are similar in an abstract way to the objectives of normalization. But that's pretty abstract.
Update: I almost wrote earlier that "we need to get Jon Skeet in on this one." I posted my answer and who beat me? You guessed it...
Object Role Modeling (not to be confused with Object Relational Mapping) is the closest thing I know of to normalization for objects. It doesn't have as mathematical a foundation as normalization, but it's a start.
In a fairly ad-hoc and untutored fashion, that will probably cause purists to scoff, and perhaps rightly, I think of a database table as being a set of objects of a particular type, and vice versa. Then I take my thoughts from there. Viewed this way, it doesn't seem to me like there's anything particularly special you have to do to use normal form in your everyday programming. Each object's identity will do for starters as its primary key, and references (pointers, etc.) will do by way of foreign keys. Then just follow the same rules.
(My objects usually end up in 3NF, or some approximation thereof. I treat this all more as guidelines, and, like I said, "untutored".)
If the rules are followed properly, each bit of information then ends up in one place, the interrelationships are clear, and everything is structured such that introducing inconsistencies takes some work. One could say that this approach produces good results on this basis, and I would agree.
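A small Python sketch of the "identity as primary key, references as foreign keys" idea, with hypothetical Customer and Order classes: the address is stored once, so updating it cannot leave orders inconsistent:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    address: str


@dataclass
class Order:
    number: int
    customer: Customer  # a reference plays the role of a foreign key


alice = Customer("Alice", "1 Old Road")
orders = [Order(1, alice), Order(2, alice)]

# The address lives in exactly one place, so one update reaches every order.
alice.address = "2 New Street"
print({o.customer.address for o in orders})  # {'2 New Street'}
```

A denormalized version would copy customer_address into each Order, and every copy would need updating separately.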
One downside is that the end result can feel a bit like a tangle of spaghetti, particularly after some time away, and it's hard to shake the constant lingering sensation, even though it's usually false, that surely a few of all these links could be removed...
Object oriented design is rational but it does not have the same mathematically well-defined basis as the Relational Model. There is nothing exactly equivalent to the well-defined normal forms of database design.
Whether this is a strength or a weakness of Object oriented design is a matter of interpretation.
I second the SRP. The Open-Closed Principle also applies to "normalization", although I might be stretching the meaning of the word: it should be possible to extend the system by adding new implementations, without modifying the existing code (see Object Mentor's article on the OCP).
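A minimal Python sketch of the Open-Closed Principle as described: the report function depends only on an abstract exporter, so a new format is added as a new class without editing existing code. All names are hypothetical:

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Exporter(ABC):
    @abstractmethod
    def export(self, rows: List[Sequence]) -> str: ...


class CsvExporter(Exporter):
    def export(self, rows):
        return "\n".join(",".join(map(str, r)) for r in rows)


def run_report(rows: List[Sequence], exporter: Exporter) -> str:
    # Closed for modification: this function never changes when formats are added.
    return exporter.export(rows)


# Open for extension: a new format is a new class, not an edit to run_report.
class TsvExporter(Exporter):
    def export(self, rows):
        return "\n".join("\t".join(map(str, r)) for r in rows)


print(run_report([[1, 2], [3, 4]], CsvExporter()))
```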
Good question; sorry I can't answer in depth.
I've been working on object normalization off and on for over 20 years. It's deep and complicated and beautiful, and is the subject of my second planned book, Object Mechanics II. ONF = Object Normal Form, you heard it here first! ;-)
Since potentially patentable technology lurks within, I am not at liberty to say more, except that normalizing the data is the really easy part ;-)
ADDENDUM: change of plans - see https://softwareengineering.stackexchange.com/questions/84598/object-oriented-normalization