I'm thinking of introducing a strongly typed (read: with a predefined schema) data interchange format for communication between our internal services. For example, something like Thrift or Cap'n Proto.
At least two obvious advantages (to me) of using this over something like JSON are that
you would KNOW the exact format of the data the service can expect (so there is less room for ambiguity and errors while communicating), and
the implementation generally deserializes the raw message for you and provides methods for accessing the objects.
What are the practical disadvantages for going this route, versus something like JSON?
For context - our system consists of services written in Python and Java (and possibly other languages in the future), and communicates via HTTP endpoints between services and via message brokers like RabbitMQ.
As with every strongly typed system, one of the major advantages is without a doubt that if you make mistakes, it fails early in the process, typically at the compilation stage, which is a good thing.
The second biggest advantage, IMHO, is what you already said: because the fields and types are well known, the compiler, libraries and related code know what data to expect and thus can be written/organized in a more efficient manner - or in short: performance.
In contrast, a loosely typed system (like Avro), while allowing for much greater flexibility without the need for recompiling, comes with the other side of the same coin: the downside of being prone to errors regarding the contents of the message at runtime.
This is because a loosely defined system defines only the syntax of a valid document (like, for example, XML) and leaves the message-level semantics of what's in the document up to the upper layers. A strongly typed system has the knowledge about those message-level semantics already built in at compile time. Therefore, it is easy to detect/decide whether a particular document or message is not only well-formed but valid with regard to the message contents. If you need to do the same with the loosely defined system, you need to provide additional information at runtime (like an XML schema) and validate your document against it.
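To make that concrete, here is a minimal Python sketch of what the loosely typed side has to do at runtime, using the third-party lxml library (the file names are placeholders, not anything from the question):
from lxml import etree
# The schema is the extra information the loosely typed side needs at runtime.
schema = etree.XMLSchema(etree.parse("message.xsd"))
doc = etree.parse("incoming_message.xml")
# validate() returns False for a document that is well-formed XML but breaks
# the message-level rules; assertValid() would raise an exception instead.
if not schema.validate(doc):
    raise ValueError("message is well-formed XML but violates the schema")
A strongly typed system performs the equivalent checks in generated code, with most mistakes caught at compile time instead.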
Bottom line
Which system you prefer is more or less a matter of taste in most cases. I'd make the decision based on how variable the data I have to deal with are. If it makes sense to use a strongly typed system, I'd go that way, because I very much like being informed about errors and mistakes early.
However, if there is a need for very flexible data structures, it may make more sense to go the other road. Although designing a loosely typed schema on top of a strongly typed system is surely possible, it is somewhat contradictory, and you'll end up with something overly complicated yet overly generic.
Typed
Incoming messages that are type tagged are very liberating, so long as it's possible to tell what an incoming message is without reading all of it. If so, then you no longer care so much about message order, because it's easy for the recipient to handle whatever it is sent. So you can have an application which just sits there taking whatever it gets, and does whatever is appropriate for each message.
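A rough Python sketch of what that buys you (the "type" field and handler names are invented for illustration, and JSON is used only for brevity; a binary format would hand you the tag without parsing the whole payload):
import json

def handle_order(msg):
    print("processing order", msg)

def handle_invoice(msg):
    print("processing invoice", msg)

# Map type tags to handlers; unknown tags can simply be logged and skipped.
HANDLERS = {
    "order": handle_order,
    "invoice": handle_invoice,
}

def on_message(raw_bytes):
    msg = json.loads(raw_bytes)
    handler = HANDLERS.get(msg.get("type"))
    if handler is not None:
        handler(msg)

on_message(b'{"type": "order", "id": 42}')
The consumer just routes on the tag, regardless of what order the messages arrive in.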
Format
A schema language that allows you to define value and size constraints is very useful. It means that the sender of a message cannot accidentally send an invalid one. Moreover the receiver can automatically tell if an incoming message meets the schema. This is a real bonus in implementing a network service; the vast bulk of the message validation is done for you!
By size constraints, I mean that you can specify in the schema how long an array may be, and the generated code will refuse to handle arrays longer or shorter than that. By value constraints, imagine a message field called "bearing"; you might want to constrain that to be between 0 and 359.
These both allow you to make a clear, unambiguous statement about what the interface is and have it enforced automatically. How many security bugs have there been recently where some network interface data validation has been badly implemented...
Options
One serialisation standard that does all this is ASN.1. The tools I've used take an ASN.1 schema and produce code to serialise and deserialise, automatically checking that the value and size constraints have been met and also telling you what an incoming message type is. The tools for ASN.1 can be quite elderly and are in need of updating. If updated it would be ideal for every purpose, with both binary and text wire formats available.
There are now JSON schemas too, and they seem to support type, value and size constraints. This might be what you're looking for.
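For instance, using Python's third-party jsonschema library (an assumption about your stack; the "waypoints" field is made up), both the "bearing" range from above and an array length limit can be stated in the schema and checked mechanically:
import jsonschema

SCHEMA = {
    "type": "object",
    "required": ["bearing", "waypoints"],
    "properties": {
        # Value constraint: bearing must be between 0 and 359.
        "bearing": {"type": "integer", "minimum": 0, "maximum": 359},
        # Size constraint: at most 10 waypoints.
        "waypoints": {"type": "array", "maxItems": 10,
                      "items": {"type": "string"}},
    },
}

message = {"bearing": 400, "waypoints": ["A", "B"]}
# Raises jsonschema.ValidationError because 400 exceeds the maximum of 359.
jsonschema.validate(message, SCHEMA)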
I'm fairly sure that Google Protocol Buffers doesn't do type tagging very well, and doesn't do value and size constraints. I've seen comments in GPB schemas along the lines of:
// mustn't be greater than 10.
If that's what is being written into a schema, the schema language is arguably inadequate...
As for Thrift, I'm not sure it does value constraints (someone correct me if I'm wrong, please!).
Disadvantages
Can't think of any! It can irritate developers; code they thought was good can be readily revealed to be producing junk messages, which annoys them intensely...
What is the point of interfaces? Even if we implement an interface on a class, we have to declare its functionality again and again each time we implement it on a different class, so why do interfaces exist in AS3, or in any other language that has interfaces?
Thank you
I basically agree with the answers posted so far, just had a bit to add.
First to answer the easy part, yes other languages have interfaces. Java comes to mind immediately but I'm pretty sure all OOP languages (C++, C#, etc.) include some mechanism for creating interfaces.
As stated by Jake, you can write interfaces as "contracts" for what will be fulfilled in order to separate work. To take a hypothetical: say I'm working on A, you're working on C, and Bob is working on B. If we define B' as an interface for B, we can quickly and relatively easily define B' (relative to defining B, the implementation), and all go on our way. I can assume that from A I can code to B', you can assume that from C you can code to B', and when Bob gets done with B we can just plug it in.
This brings us to Jugg1es' point. The ability to swap out a whole functional piece is made easier by "dependency injection" (if you don't know this phrase, please google it). This is exactly what's described: you create an interface that defines generally what something will do, say a database connector. For all database connectors, you want to be able to connect to the database and run queries, so you might define an interface that says the classes must have a connect() method and a doQuery(stringQuery) method. Now let's say Bob writes the implementation for MySQL databases, and then your client says, "Well, we just paid 200,000 for new servers and they'll run Microsoft SQL." To take advantage of that with your software, all you'd need to do is swap out the database connector.
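Sketched in Python rather than AS3, just to show the shape of it (all the class and method names here are invented):
from abc import ABC, abstractmethod

class DatabaseConnector(ABC):
    # The "contract": every connector must offer these two methods.
    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def do_query(self, query): ...

class MySqlConnector(DatabaseConnector):
    def connect(self):
        print("connecting to MySQL")

    def do_query(self, query):
        print("running", query, "on MySQL")

class MsSqlConnector(DatabaseConnector):
    def connect(self):
        print("connecting to Microsoft SQL Server")

    def do_query(self, query):
        print("running", query, "on SQL Server")

def run_report(db):
    # Application code only knows the interface; swapping vendors
    # means injecting a different implementation, nothing more.
    db.connect()
    db.do_query("SELECT * FROM orders")

run_report(MySqlConnector())   # today
run_report(MsSqlConnector())   # after the new servers arrive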
In real life, I have a friend who runs a meat packing/distribution company in Chicago. The company that makes their software/hardware setup for scanning packages and weighing things as they come in and out (inventory) is telling them they have to upgrade to a newer OS/server and newer hardware to keep up with the software. The software is not written in a modular way that allows them to maintain backwards compatibility. I've been in this boat before plenty of times, telling someone xyz needs to be upgraded to get abc functionality that will make doing my job 90% easier. Anyhow, I guess the point is that in the real world people don't always make use of these things, and it can bite you in the ass.
Interfaces are vital to OOP, particularly when developing large applications. One example is if you needed a data layer that returns data on, say, Users. What if you eventually change how the data is obtained - say, you started with XML web service data, but then switched to a flat file or something? If you created an interface for your data layer, you could create another class that implements it and make all the changes to the data layer without ever having to change the code in your application layer. I don't know if you're using Flex or Flash, but when using Flex, interfaces are very useful.
Interfaces are a way of defining the functionality of a class. It might not make a whole lot of sense when you are working alone (especially starting out), but when you start working in a team it helps people understand how your code works and how to use the classes you wrote (while keeping your code encapsulated). That's the best way to think of them at an intermediate level, in my opinion.
While the existing answers are pretty good, I think they miss the chief advantage of using Interfaces in ActionScript, which is that you can avoid compiling the implementation of that Interface into the Main Document Class.
For example, if you have an ISpaceShip Interface, you now have a choice of several ways to populate a variable typed to that Interface. You could load an external swf whose main Document Class implements ISpaceShip. Once the Loader's contentLoaderInfo's COMPLETE event fires, you cast the content to ISpaceShip, and the implementation of that (whatever it is) is never compiled into your loading swf. This allows you to put real content in front of your users while the load process happens.
By the same token, you could have a timeline instance declared in the parent AS class of type ISpaceShip with "Export for ActionScript in Frame N" unchecked. This will compile on the frame where it is first used, so you no longer need to account for it in your preloading time. Do this with enough things and suddenly you don't even need a preloader.
Another advantage of coding to interfaces comes when you're doing unit tests on your code, which you should unless your code is completely trivial. This enables you to make sure that the code is succeeding or failing on its own merits, not based on the merits of a collaborator, or in situations where the collaborator isn't appropriate for a test. For example, if you have a controller that is designed to control a specific type of view, you're not going to want to instantiate the full view for the test, but only the functionality that makes a difference for the test.
If you don't have support in your work situation for writing tests, coding to interfaces helps make sure that your code will be testable once you get to the point where you can write tests.
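A tiny Python illustration of that idea (the controller and view names are invented): the test supplies a minimal stand-in for the view, so only the controller's own logic is exercised.
class StubView:
    # Implements just enough of the view "interface" for the test.
    def __init__(self):
        self.shown = []

    def show_message(self, text):
        self.shown.append(text)

class GreetingController:
    def __init__(self, view):
        self.view = view  # coded against the interface, not a concrete view

    def greet(self, name):
        self.view.show_message("Hello, " + name)

def test_greet():
    view = StubView()
    GreetingController(view).greet("Ada")
    assert view.shown == ["Hello, Ada"]

test_greet()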
The above answers are all very good, the only thing I'd add - and it might not be immediately clear in a language like AS3, where there are several untyped collection classes (Array, Object and Dictionary) and Object/dynamic classes - is that it's a means of grouping otherwise disparate objects by type.
A quick example:
Imagine you had a space shooter, where the player has missiles which lock on to various targets. Suppose, for this purpose, you wanted any type of object which could be locked onto to have internal functions for registering this (aka an interface):
function lockOn():void;//Tells the object something's locked onto it
function getLockData():Object;//Returns information, position, heat, whatever etc
These targets could be anything, a series of totally unrelated classes - enemy, friend, powerup, health.
One solution would be to have them all inherit from a base class which contained these methods - but Enemies and Health Pickups wouldn't logically share a common ancestor (and if you find yourself making bizarre inheritance chains to accommodate your needs then you should rethink your design!), and your missile will also need a reference to the object it's locked onto:
var myTarget:Enemy;//This isn't going to work for the Powerup class!
or
var myTarget:Powerup;//This isn't going to work for the Enemy class!
...but if all lockable classes implement the ILockable interface, you can set this as the type reference:
var myTarget:ILockable;//This can be set as Enemy, Powerup, any class which implements ILockable!
...and have the functions above as the interface itself.
They're also handy when using the Vector class (the name may mislead you; it's just a typed array) - Vectors run much faster than Arrays, but only allow a single type of element - and again, an interface can be specified as the type:
var lockTargets:Vector.<Enemy> = new Vector.<Enemy>();//A Vector that only accepts Enemy instances
lockTargets[0] = new HealthPickup();//Compiler won't like this!
but this...
var lockTargets:Vector.<ILockable> = new Vector.<ILockable>();//A Vector typed to the interface instead
lockTargets[0] = new HealthPickup();
lockTargets[1] = new Enemy();
Will, provided Enemy and HealthPickup implement ILockable, work just fine!
I found it interesting to read about one of the ways you can do functional dynamic dispatch in SICP - using a table of type tag + name -> function that you can fetch from or add to.
I was wondering, is this a typical type dispatch mechanism for a dynamic non OO language?
Also, what would be the typical way to monkey patch this? Using a chained list of tables (if you don't find it in the first table, try the next table recursively)? Rebinding the table within local scope to a modified copy? Etc.?
I believe this is a typical type dispatch mechanism, even for non-dynamic non-OO languages, based on this article about the JHC Haskell compiler and how it implements type classes. The implication in the article is that most Haskell compilers implement type classes (a kind of type dispatch) by passing dictionaries. His alternative is direct case analysis, which likely would not be applicable in dynamically typed languages, since you don't know ahead of time what the types of the constituents of your expression will be. On the other hand, this isn't extensible at run-time either.
As for dynamic non-OO languages, I'm not aware of many examples outside Lisp/Scheme. Common Lisp's CLOS makes Lisp a proper OO language and provides dynamic dispatch as well as multiple dispatch (you can add or remove generics and methods at run-time, and they can key off the type of more than just the first parameter). I don't know how this is usually implemented, but I do know that it is usually an add-on facility rather than a built-in facility, which implies it's using functionality available to the would-be monkey-patcher, and also that certain versions have been criticized for their lack of speed (CLISP, I think, but they may have resolved this). Of course, you could implement this type of parallel dispatch mechanism within an OO language as well, and you can probably find plenty of examples of that.
If you were using purely functional persistent maps or dictionaries, you could certainly implement this facility without even needing the chain of inherited maps; as you "modify" the map, you get a new map back, but all the existing references to the old map would still be valid and see it as the old version. If you were implementing a language with this facility, you could interpret it by putting the type->function map in the Reader monad and wrapping your interpreter in it.
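To make the mechanism from the question concrete, here is a small Python sketch (names invented) of a (operation, type tag) -> function table, with the "chain of tables" style of monkey patching: a child table falls back to its parent on a miss, so the base table is never mutated.
class DispatchTable:
    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}

    def put(self, op, type_tag, fn):
        self.entries[(op, type_tag)] = fn

    def get(self, op, type_tag):
        # Look locally first, then walk up the chain of tables.
        fn = self.entries.get((op, type_tag))
        if fn is None and self.parent is not None:
            return self.parent.get(op, type_tag)
        return fn

base = DispatchTable()
base.put("area", "circle", lambda c: 3.14159 * c["r"] ** 2)

# "Monkey patch" without touching the base table: shadow the entry locally.
patched = DispatchTable(parent=base)
patched.put("area", "circle", lambda c: 3.14159265358979 * c["r"] ** 2)

shape = {"type": "circle", "r": 2.0}
print(base.get("area", shape["type"])(shape))
print(patched.get("area", shape["type"])(shape))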
If one has an object which can persist itself across executions (whether to a DB using an ORM, using something like Python's shelve module, etc.), should validation of that object's attributes be placed within the class representing it, or outside?
Or, rather: should the persistent object be dumb and expect whatever is setting its values to be benevolent, or should it be smart and validate the data being assigned to it?
I'm not talking about type validation or user input validation, but rather things that affect the persistent object, such as ensuring that links/references to other objects exist, that numbers are unsigned, that dates aren't out of range, etc.
Validation is a part of encapsulation: an object is responsible for its internal state, and validation is part of maintaining that state.
It's like asking "should I let an object do a function and set its own variables, or should I use getters to get them all, do the work in an external function, and then use setters to set them back?"
Of course you should use a library to do most of the validation - you don't want to implement the "check unsigned values" function in every model, so you implement it in one place and let each model use it in its own code as it sees fit.
The object should validate the data input. Otherwise every part of the application which assigns data has to apply the same set of tests, and every part of the application which retrieves the persisted data will need to handle the possibility that some other module hasn't done their checks properly.
Incidentally I don't think this is an object-oriented thang. It applies to any data persistence construct which takes input. Basically, you're talking Design By Contract preconditions.
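As a minimal Python sketch of the "object validates its own state" idea (the class and field names are invented; the same applies whatever persistence layer sits underneath):
import datetime

class Invoice:
    def __init__(self, quantity, due_date):
        # Assignments go through the property setters below.
        self.quantity = quantity
        self.due_date = due_date

    @property
    def quantity(self):
        return self._quantity

    @quantity.setter
    def quantity(self, value):
        # The "unsigned" check lives with the object, not with every caller.
        if value < 0:
            raise ValueError("quantity must be non-negative")
        self._quantity = value

    @property
    def due_date(self):
        return self._due_date

    @due_date.setter
    def due_date(self, value):
        if value < datetime.date.today():
            raise ValueError("due date cannot be in the past")
        self._due_date = value
Anything that later persists or restores the object can then rely on its state already being valid.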
My policy is that, for the code as a whole to be robust, each object A should check as much as possible, as early as possible. But the "as much as possible" needs explanation:
The internal coherence of each field B in A (type, range within the type, etc.) should be checked by the field type B itself. If it is a primitive field, or a reused class, that is not possible, so the object A should check it.
The coherence of related fields (if that B field is null, then C must also be) is the typical responsibility of object A.
The coherence of a field B with code that is external to A is another matter. This is where the "POJO" approach (in Java, but applicable to any language) comes into play.
The POJO approach says that with all the responsibilities/concerns that we have in modern software (persistence and validation are only two of them), domain models end up being messy and hard to understand. The problem is that these domain objects are central to the understanding of the whole application, to communicating with domain experts and so on. Each time you have to read a domain object's code, you have to handle the complexity of all these concerns, while you might care about none, or only one, of them...
So, in the POJO approach, your domain objects must not carry code related to any of these concerns (which usually means an interface to implement, or a superclass to extend).
All concerns except the domain one are kept out of the object (but some simple information can still be provided, in Java usually via annotations, to parameterize generic external code that handles one concern).
Also, the domain objects relate only to other domain objects, not to framework classes related to one concern (such as validation, or persistence). So the domain model, with all its classes, can be put in a separate "package" (project or whatever), without dependencies on technical or concern-related code. This makes it much easier to understand the heart of a complex application, without all the complexity of these secondary aspects.
I've read many times, and agree with, the advice to avoid the use of globals to keep code orthogonal. Is using a config file to keep read-only information that your program uses similar to using globals?
If you're using config files in place of globals, then yes, they are similar.
Config files should only be used in cases where the end-user (presumably a computer-savvy user, like a developer) needs to declare settings for an application or piece of code, while keeping their hands out of the code itself.
My first reaction would be that it is not the same. I think the problem with globals is the read+write scenario. Config files are read-only (at least in terms of execution).
In the same way, constants are not considered bad programming behaviour. Config files, at least in the way I use them, are just easily changeable constants.
Well, since a config file and a global variable can both have the effect of propagating changes throughout a system - they are roughly similar.
But... in the case of a configuration file, that change is usually going to take place in a single, highly visible (to the developer) location, while global variables can effect change in very sneaky and hard-to-track-down ways -- so in this way the two concepts are not similar.
Having a configuration file usually helps with DRY concepts, and it shouldn't hurt the orthogonality of the system, either.
Bonus points for using the $25 word 'orthogonal'. I had to look that one up in Wikipedia to find out the non-Euclidean definition.
Configuration files are really meant to be easily editable by the end user as a way of telling the program how to run.
A more specialized form of configuration files, user preferences, are used to remember things between program executions.
A global is related to a unique instance of an object which will never change, whereas a config file is used as a container of reference values for objects within the application that can change.
One "global" object will never change during runtime; the other kind of object is initialized through the config file, but can change later on.
Actually, those objects not only can change during the lifetime of the application, they can also monitor the config file in order to realize "hot-change" (modification of their value without stopping/restarting the application), if that config file is modified.
They are absolutely not the same, or replacements for each other. A config file, or config object, can be used non-globally, i.e. passed explicitly.
You can of course have a global variable that refers to a config object, and that would be defeating the purpose.
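A small Python sketch of that distinction (the file name and keys are invented): the config is read once and handed explicitly to the things that need it, rather than being reached for through a global.
import json

class ReportService:
    def __init__(self, config):
        # The dependency is explicit: no hidden global state.
        self.page_size = config["page_size"]

    def run(self):
        print("rendering", self.page_size, "rows per page")

def main():
    with open("settings.json") as f:
        config = json.load(f)  # treated as read-only from here on
    ReportService(config).run()

main()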