When is it worth creating an XML Schema (XSD)?

When is it worth creating an XML Schema (XSD)? - html

I'm curious of thoughts on how one would want to apply schema to a list of completed projects, for example, a listing of projects that were completed by an architecture firm.
So let's say you have a list of projects that were completed, consisting of information such as the date, location, description, etc.
I don't know if it is necessarily considered a Creative Work, or a place. I'm considering using the general ItemList/Item properties but not sure if there is much value in it. So having said that, would anyone expect this to be beneficial or worth doing?

You tagged your question with seo and html, but those are not reasons to define an XML Schema (XSD). You mention "Creative Work", but that concept is also immaterial to XSDs and their benefits.
The Value of Defining an XSD
There is value in defining an XSD in so far as you, or the software you use, or the partners with whom you exchange data would benefit from the clear definition and automatic validation of structured documents.
If these reasons are not relevant to your work, you probably don't need to define an XSD.

Related

JSON design for a web service - how to express attributes

On my team there seem to be two schools of thought in designing our restful interface. This is a series of endpoints that themselves are a product we sell, so the customer interface is JSON. The question is how to express attributes of an object. My thoughts are to have attributes as explicit fields for a resource like this:
{
"myThing": [
{
"color": "blue",
"number": "212"
}
]
}
With this approach, you can understand the object by knowing its attributes, and it feels more elegant as well as explicit. Documentation is straight-forward for business people who are prospective customers.
However, there is a school of thought from some of our developers that the following is the preferred way to express attributes - I believe because we have a lot of attributes and it's easier to manage, though there may be other benefits. However, I don't see the architecture of this information as being as user-friendly or explicit, and requires cross-referencing documentation more extensively.
{
"myThing": [
{
"attribute": "color",
"value": "blue"
},
{
"attribute": "number",
"value": "212"
}
]
}
My question - aside from feeling the first approach is more intuitive, finding best practices or a persuasive argument for one approach over another is difficult. Can anyone point me to some best practice JSON design that would favor one over the other? Thanks!

Oh this is simple to qualify in some basic terms, however both can work.The trick is deciding which one is going to be more successful? It helps to think of successful vs unsuccessful, the whole right or wrong argument is very much based on personal opinions and is such a massive waste of time. Never argue this is the right way to do it with tech guys you will have an easier time herding cats.
So lets break down the model into Pro And Cons.
Approach one is a traditional model:
Pro:
Objects and attributes are well defined.
Attributes can be validated as they are known i.e. this a date.
Easier to understand and document.
Cons:
Attributes are static and expanding the attribute set will require us to update a code and potentially introduce problems with clients as they are aware of the attributes.
Approach two which is known as the Entity Attribute Value design pattern(a BIG no no in relation database design by the way).
Pro:
Adding attributes to an object does not require that much changes. Also its unlikely to break clients etc.
Cons:
Documentation is harder.
Data validation is much harder, the colour attribute in your example should it be a string like Blue? Or a HEX like #ffae12? Maybe CMYK? Its get harder to validate and document these type of things.
Requires developers/customers to understand the mechanism to set and get values i.e. it break the normal programming paradigms.
How to choose which model to use?
The principle here is to know your data. If the objects have loads of different attributes and they are pretty dynamic i.e. you add or remove attributes a lot of the time then approach 2 is better. Social media type data for example is a good fit.
If however you are dealing with financial transaction data i.e. debits, credits etc you want the highest validation possible and besides making payments and receiving payments have not changed it's unlikely that you would introduce a third account number to a credit transaction object. Not saying it is impossible but highly unlikely.
As always keep an open mind and use the most appropriate tool for the job when you need it. Don't blindly follow a school of thought. Understand your tools, your data and then it's rather easy to innovate.

How to analyze information from the comments of users on my site?

Can anybody suggest a way to process the information and analyze the data from the comments users post on a article in my website.
I exactly want to process the comments as follows:
Example: Like on a article on computerization may get the following comments:
I love computerization as it makes the work easier.
Computerization is spreading unemployment as 1 computer can work better than 4 people.
How I process this information -
: I take the comments and try to recognize some predefined[and extensible] keywords in it.

Assuming that you are trying to extract some useful information from the comments, you could apply some machine learning to the comments to classify or categorize the data contained within, the sentiments etc.
There are number of different types of learning you can do on the text, however I personally recommend using support vector machines or a naive bayes classifier to be able to categorize and analyze the comments. You could also possibly use clustering, but there needs to be an element of natural language processing in the solution you choose. There are number of different libraries that you can use to implement the code to use either, i.e. svmlight, javaml, etc. I have personally used javaml and it is a good library.

Using Semantic MediaWiki for tabular data

Am I completely off-track to think about using Semantic MediaWiki to store (and organise, report on, etc.) 'tabular' data such as financial transactions or weather readings that would usually live in a spreadsheet or database?
It seems that one would need a separate, tiny, page for each tuple; but then, that's by design and perhaps it's perfectly okay.
I ask, simply because SMW seems like such a quick and easy way to get a collaborative data repository up and running.

Semantic MediaWiki is better suited for keeping track of Factual or Encyclopedic data, where you can have pages about everything you need to know about a certain topic.
For tabular or numerical data such as measurements, financial, sensor data, you would indeed need to create little pages about each data point, which is not practical in many cases.
However, there are extensions to Media Wiki that allow you to integrate external data sources (in MySQL databases or CSV files somewhere) with MediaWiki pages. This can allow you to have the best of both worlds - dynamic access and queries of tabular data and semantic annotations of pages around them.
Take a look at :
http://www.mediawiki.org/wiki/Extension:External_Data

No, I don't think it's such a bad idea.
Using SemanticForms you could enter lots of little data pages quickly and easily (for example, an invoice might require additional pages for each line item, but they could all be entered from one form using the 'multiple' feature of the for template form tag). So although I've never tried logging weather data in SMW, I think it would be pretty easy. I don't see what the problem would be with storing data across so many pages; it's easy enough to combine it in whatever formats you require.
Give it a go and let us know how it goes!

You can use either the Semantic Internal Objects extension (SIO), or SMW's built in subobjects (the former works well with the already mentioned External Data extension), to store multiple semantic objects (could be the rows of your spreadsheet) in one page.
However, unless you are really looking for a collaborative tool with semantic capabilities, I doubt SMW is the best suited piece of software for your task.
edit (november 2015): Since SMW version 1.9, there nothing that SIO can do that the built-in subobjects can't, so I would recommend the latter.

Creative Terminology

I seem to use bland words such as node, property, children (etc) too often, and I fear that someone else would have difficulty understanding my code simply because the parts' names are vague, common words.
How do you find creative names for classes and components to make them more memorable?
I am particularly having trouble with generic tools which have no real description except their rather generic functional purpose. I would like to know if others have found creative ways to name things rather than simply naming them by their utility, such as AnonymousFunctionWrapperCallerExecutorFactory.

It's hard to answer. I find them just because they seem to 'fit'.
What I do know, however, is that I find it basically impossible to move on writing code unless something is named correctly, and it 'feels' good. If it isn't named right, I find it hard to use, and the code is generally confusing.
I'm not too concerned about something being 'memorable', only 'accurate'.
I have been known to sit around thinking out loud about what to name something. Take your time, and make sure you are really happy with the name. don't be afraid of using common/simple words.

I don't really have an answer, but three things for you to think about.
The late Phil Karlton famously said: "There are only two hard problems in computer science. Cache Invalidation and Naming Things." So, the fact that you are having trouble coming up with good names is entirely normal and even expected.
OTOH, having trouble naming things can also be a sign of bad design. (And yes, I am perfectly aware, that #1 and #2 contradict each other. Or maybe one should think of it more like balancing each other.) E.g., if a thing has too many responsibilities, it is pretty much impossible to come up with a good name. (Witness all the "Service", "Util", "Model" and "Manager" classes in bad OO designs. Here's an example Google Code Search for "ManagerFactoryFactory".)
Also, your names should map to the domain jargon used by subject matter experts. If you can't find a subject matter expert, that's a sign that you are currently worrying about code that you're not supposed to worry about. (Basically, code that implements your core business domain should be implemented and designed well, code in ancillary domains should be implemented and designed so-so, and all other code should not be implemented or designed at all, but bought from a vendor, where what you are buying is their core business domain. [Please interpret "buy" and "vendor" liberally. Community-developed Free Software is just fine.])
Regarding #3 above, you mentioned in another comment that you are currently working on implementing a tree data structure. Unless your company is in the business of selling tree data structures, that is not a part of your core domain. And the reason that you have trouble finding good names could be that you are working outside your core domain. Now, "selling tree data structures" may sound stupid, but there are actually companies that do that. For example, the BCL team inside Microsoft's developer division: they actually sell (well, for certain definitions of "sell", anyway) the .NET framework's Base Class Libraries, which include, among others, tree data structures. But note that for example Microsoft's C++ compiler team actually (literally) buys their STL from a third-party vendor – they figure that their core domain is writing compilers, and they leave the writing of libraries to a company who considers writing STLs their core domain. (And indeed, AFAIK, that company does nothing but write and sell STL implementations. That's their sole product.)
If, however, selling tree data structures is your core domain, then the names you listed are just fine. They are the names that subject matter experts (programmers, in this case) use when talking about the domain of tree data structures.

Using 'metaphors' is a common theme in agile (and pattern) literature.
'Children' (in your question) is an example of a metaphor that is extensively used and for good reasons.
So, I'd encourage the use of metaphors, provided they are applicable and not a stretch of the imagination.
Metaphors are everywhere in computing. From files to bugs to pointers to streams... you can't avoid them.

I believe that for the purpose of standardization and communication, it's good to use a common vocab, like in the same case for design patterns. I have a problem with a programmer who keeps 'inventing' his own terms and I have trouble understanding him. (He kept using the term 'events orchestrating' instead of 'scripting' or 'FCFS process'. Kudos for creativity though!)
Those common vocab describe stuff we are used to. A node is a point, somewhere in a graph, in a tree, or what-not. One way is to be specific to the domain. If we are doing a mapping problem, instead of 'node', we can use 'location'. That helps in a sense, at least for me. So I find there is a need to balance being able to communicate with other programmers, and at the same time keeping the descriptor specific enough to help me remember what it does.

I think node, children, and property are great names. I can already guess the following about your classes, just by their "bland" names:
Node - this class is part of a graph of objects
children - this variable holds a list of nodes belonging to the containing node.
I don't think "node" is either vague or common, and if you're coding a generic data structure, it's probably ok to have generic names! (With that being said, if you are coding up a tree, you could use something like TreeNode to emphasize that the node is part of a tree.) One way you can make the life of developers who will use your API easier is to follow the naming conventions of your platform's built in libraries. If everyone calls a node a node, and an iterator an iterator, it makes life easy.

Names that reflect the purpose of the class, method or property are more memorable than creative ones. Modern IDEs make it easier to use longer names so feel fee to be descriptive. Getting creative won't help as much as getting accurate.

I recommend to pick nouns from a specific application domain. E.g. if you are putting cars in a tree, call the node class Car - the fact that it is also a node should be apparent from the API. Also, don't try to be too generic in your implementation - don't put all attributes of the car into a hashtable named properties, but create separate attributes for make, color, etc.

A lot of languages and coding styles like to use all sorts of descriptive prefixes. In PHP there are no clear types, so this may help greatly. Instead of doing
$isAvailable = true;
try
$bool_isAvailable = true;
It is admittedly a pain, but usually well worth the time.
I also like to use long names to describe things. It may seem strange, but is usually easier to remember, especially when I go back to refactor my code
$leftNode->properties < $leftTreeNode->arrayOfNodeProperties;
And if all else fails. Why not fall back on a solid star wars themed program.
$luke->lightsaber($darth[$ewoks]);
And lastly, in college I named my classes after my professor, and then my class methods all the things I wanted to do to that jerk.
$Kube->canEat($myShorts, $withKetchup);

Object Normalization

In the same line as Database Normalization - is there an approach to object normalization, not design pattern, but the same mathematical like approach to normalizing object creation. For example: first normal form: no repeating fields....
here's some links to DB Normalization:
http://en.wikipedia.org/wiki/Database_normalization
http://databases.about.com/od/specificproducts/a/normalization.htm
Would this make object creation and self-documentation better?
Here's a link to a book about class normalization (guess we're really talking about classes)
http://www.agiledata.org/essays/classNormalization.html

Normalization has a mathematical foundation in predicate logic, and a clear and specific goal that the same piece of information never be represented twice in a single model; the purpose of this goal is to eliminate the possibility of inconsistent information in a data model. It can be shown via mathematical proof that if a data model has certain specific properties (that it passes tests for 1st Normal Form (1NF), 2NF, 3NF, etc.) that it is free from redundant data representation, i.e. it is Normalized.
Object orientation has no such underlying mathematical basis, and indeed, no clear and specific goal. It is simply a design idea for introducing more abstraction. The DRY principle, Command-Query Separation, Liskov Substitution Principle, Open-Closed Principle, Tell-Don't-Ask, Dependency Inversion Principle, and other heuristics for improving quality of code (many of which apply to code in general, not just object oriented programs) are not absolute in nature; they are guidelines that programmers have found useful in improving understandability, maintainability, and testability of their code.
With a relational data model, you can say with absolute certainty whether it is "normalized" or not, because it must pass ALL the tests for normal form, and they are quite specific. With an object model, on the other hand, because the goal of "understandable, maintainable, testable, etc" is rather vague, you cannot say with any certainty whether you have met that goal. With many of the design heuristics, you cannot even say for sure whether you have followed them. Have you followed the DRY principle if you're applying patterns to your design? Surely repeated use of a pattern isn't DRY? Furthermore, some of these heuristics or principles aren't always even necessarily good advice all the time. I do try to follow Command-Query Separation, but such useful things as a Stack or a Queue violate that concept in order to give us a rather elegant and useful result.

I guess the Single Responsible Principle is at least related to this. Or at least, violation of the SRP is similar to a lack of normalization in some ways.
(It's possible I'm talking rubbish. I'm pretty tired.)

Interesting.
You may also be interested in looking at the Law of Demeter.
Another thing you may be interested in is c2's FearOfAddingClasses, as, arguably, the same reasoning that lead programmers to denormalise databases also leads to god classes and other code smells. For both OO and DB normalisation, we want to decompose everything. For databases this means more tables, for OO, more classes.
Now, it is worth bearing in mind the object relational impedance mismatch, that is, probably not everything will translate cleanly.
Object relational models or 'persistence layers', usually have 1-to-1 mappings between object attributes and database fields. So, can we normalise? Say we have department object with employee1, employee2 ... etc. attributes. Obviously that should be replaced with a list of employees. So we can say 1NF works.
With that in mind, let's go straight for the kill and look at 6NF database design, a good example is Anchor Modeling, (ignore the naming convention). Anchor Modeling/6NF provides highly decomposed and flexible database schemas; how does this translate to OO 'normalisation'?
Anchor Modeling has these kinds of relationships:
Anchors - unique object IDs.
Attributes, which translate to object attributes: (Anchor, value, metadata).
Ties - relationships between two or more objects (themselves anchors): (Anchor, Anchor... , metadata)
Knots, attributed Ties.
Attribute metadata can be anything - who changed an attribute, when, why, etc.
The OO translation of this is looks extremely flexible:
Anchors suggest attribute-less placeholders, like a proxy which knows how to deal with the attribute composition.
Attributes suggest classes representing attributes and what they belong to. This suggests applying reuse to how attributes are looked up and dealt with, e.g automatic constraint checking, etc. From this we have a basis to generically implement the GOF-style Structural patterns.
Ties and Knots suggest classes representing relationships between objects. A basis for generic implementation of the Behavioural design patterns?
Interesting and desirable properties of Anchor Modeling that also translate across are:
All this requires replacing inheritance with composition (good) in the exposed objects.
Attribute have owners rather than owners having attributes. Although this make attribute lookup more complex, it neatly solves certain aliasing problems, as there can only ever be one owner.
No need for NULL. This translates to clearer NULL handling. Empty-case attribute classes could provide methods for handling the lack of a particular attribute, instead of performing NULL-checking everywhere.
Attribute metadata. Attribute-level full historisation and blaming: 'play' objects back in time, see what changed, when and why, etc. (if required - metadata is entirely optional)
There would probably be a lot of very simple classes (which is good), and a very declarative programming style (also good).
Thanks for such a thought provoking question, I hope this is useful for you.

Perhaps you're taking this from a relational point-of-view, but I would posit that the principles of interfaces and inheritance correspond to normalization in the world of OOP.
For example, a Person abstract class containing FirstName, LastName, Gender and BirthDate can be used by classes such as Employee, User, Member etc. as a valid base class, without a need to repeat the definitions of those attributes in such subclasses.
The principle of DRY, (a core principle of Andy Hunt and Dave Thomas's book The Pragmatic Programmer), and the constant emphasis of object-oriented programming on re-use, also correspond to the efficiencies offered by Normalization in relational databases.

At first glance, I'd say that the objectives of Code Refactoring are similar in an abstract way to the objectives of normalization. But that's pretty abstract.
Update: I almost wrote earlier that "we need to get Jon Skeet in on this one." I posted my answer and who beat me? You guessed it...

Object Role Modeling (not to be confused with Object Relational Mapping) is the closest thing I know of to normalization for objects. It doesn't have as mathematical a foundation as normalization, but it's a start.

In a fairly ad-hoc and untutored fashion, that will probably cause purists to scoff, and perhaps rightly, I think of a database table as being a set of objects of a particular type, and vice versa. Then I take my thoughts from there. Viewed this way, it doesn't seem to me like there's anything particularly special you have to do to use normal form in your everyday programming. Each object's identity will do for starters as its primary key, and references (pointers, etc.) will do by way of foreign keys. Then just follow the same rules.
(My objects usually end up in 3NF, or some approximation thereof. I treat this all more as guidelines, and, like I said, "untutored".)
If the rules are followed properly, each bit of information then ends up in one place, the interrelationships are clear, and everything is structured such that introducing inconsistencies takes some work. One could say that this approach produces good results on this basis, and I would agree.
One downside is that the end result can feel a bit like a tangle of spaghetti, particularly after some time away, and it's hard to shake the constant lingering sensation, even though it's usually false, that surely a few of all these links could be removed...

Object oriented design is rational but it does not have the same mathematically well-defined basis as the Relational Model. There is nothing exactly equivalent to the well-defined normal forms of database design.
Whether this is a strength or a weakness of Object oriented design is a matter of interpretation.

I second the SRP. The Open Closed Principle applies as well to "normalization" although I might stretch the meaning of the word, in that it should be possible to extend the system by adding new implementations, without modifying the existing code. objectmentor about OCP

good question, sorry i can't answer in depth
I've been working on object normalization off and on for over 20 years. It's deep and complicated and beautiful, and is the subject of my second planned book, Object Mechanics II. ONF = Object Normal Form, you heard it here first! ;-)
since potentially patentable technology lurks within, I am not at liberty to say more, except that normalizing the data is the really easy part ;-)
ADDENDUM: change of plans - see https://softwareengineering.stackexchange.com/questions/84598/object-oriented-normalization

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008