I normally write all parts of my code in C#, and when writing protocols that are serialized I use FastSerializer, which serializes/deserializes the classes fast and efficiently. It is also very easy to use, and fairly straightforward to do "versioning", i.e. to handle different versions of the serialization. The pattern I normally use looks like this:
public override void DeserializeOwnedData(SerializationReader reader, object context)
{
    base.DeserializeOwnedData(reader, context);
    byte serializeVersion = reader.ReadByte(); // keeps track of which version wrote the data

    this.CustomerNumber = reader.ReadString();
    this.PopulationRegistryNumber = reader.ReadString(); // mirrors the unconditional Write below (assumed to be a string)
    this.HomeAddress = reader.ReadString();
    this.ZipCode = reader.ReadString();
    this.HomeCity = reader.ReadString();
    this.CustomerCards = reader.ReadList<uint>();        // mirrors the unconditional Write below

    if (serializeVersion > 0)
        this.HomeAddressObj = reader.ReadUInt32();
    if (serializeVersion > 1)
        this.County = reader.ReadString();
    if (serializeVersion > 2)
        this.Muni = reader.ReadString();
    if (serializeVersion > 3)
        this._AvailableCustomers = reader.ReadList<uint>();
}
and
public override void SerializeOwnedData(SerializationWriter writer, object context)
{
    base.SerializeOwnedData(writer, context);
    byte serializeVersion = 4;
    writer.Write(serializeVersion);

    writer.Write(CustomerNumber);
    writer.Write(PopulationRegistryNumber);
    writer.Write(HomeAddress);
    writer.Write(ZipCode);
    writer.Write(HomeCity);
    if (CustomerCards == null)
        CustomerCards = new List<uint>();
    writer.Write(CustomerCards);

    writer.Write(HomeAddressObj);          // read when serializeVersion > 0
    writer.Write(County);                  // read when serializeVersion > 1
    writer.Write(Muni);                    // read when serializeVersion > 2
    if (_AvailableCustomers == null)
        _AvailableCustomers = new List<uint>();
    writer.Write(_AvailableCustomers);     // read when serializeVersion > 3
}
So it's easy to add new things, or to change the serialization completely if one chooses to.
However, I now want to use JSON, for reasons not relevant right here. =) I am currently using DataContractJsonSerializer, and I am looking for a way to get the same flexibility I have with the FastSerializer above.
So the question is: what is the best way to create a JSON protocol/serialization and to version it as above, so that I do not break the serialization just because another machine hasn't yet updated to the latest version?
The key to versioning JSON is to always add new properties, and never remove or rename existing properties. This is similar to how protocol buffers handle versioning.
For example, if you started with the following JSON:
{
"version": "1.0",
"foo": true
}
And you want to rename the "foo" property to "bar", don't just rename it. Instead, add a new property:
{
"version": "1.1",
"foo": true,
"bar": true
}
Since you are never removing properties, clients based on older versions will continue to work. The downside of this method is that the JSON can get bloated over time, and you have to continue maintaining old properties.
It is also important to clearly define your "edge" cases to your clients. Suppose you have an array property called "fooList". The "fooList" property could take on the following possible values: does not exist/undefined (the property is not physically present in the JSON object, or it exists and is set to "undefined"), null, empty list or a list with one or more values. It is important that clients understand how to behave, especially in the undefined/null/empty cases.
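For example, a consumer could normalize those cases explicitly on read. The following is a minimal sketch assuming Json.NET (Newtonsoft.Json) and a hypothetical "fooList" property; adjust the defaults to whatever your clients have agreed on:
// Sketch: normalizing the missing/null/empty cases of a hypothetical "fooList" array property.
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

static List<string> ReadFooList(string json)
{
    JObject obj = JObject.Parse(json);
    JToken fooList = obj["fooList"];

    // Property missing or explicitly null: here both are treated the same as an empty list.
    if (fooList == null || fooList.Type == JTokenType.Null)
        return new List<string>();

    // Empty list and populated list both deserialize naturally.
    return fooList.ToObject<List<string>>();
}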
I would also recommend reading up on how semantic versioning works. If you introduce a semantic versioning scheme to your version numbers, then backwards compatible changes can be made on a minor version boundary, while breaking changes can be made on a major version boundary (both clients and servers would have to agree on the same major version). While this isn't a property of the JSON itself, this is useful for communicating the types of changes a client should expect when the version changes.
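As a rough illustration of that idea (a sketch only; the "major.minor" version format is an assumption), a consumer could accept any payload whose major version matches its own:
// Sketch: accept data whose major version matches ours, regardless of minor version.
// Assumes versions are plain "major.minor" strings such as "2.1".
static bool CanConsume(string dataVersion, string myVersion)
{
    Version data = Version.Parse(dataVersion);   // System.Version
    Version mine = Version.Parse(myVersion);

    // Minor-version differences are backwards compatible, so only the major part matters.
    return data.Major == mine.Major;
}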
Google's Java-based Gson library has excellent versioning support for JSON. It could prove very handy if you are thinking of going the Java way.
There is a nice and easy tutorial here.
It doesn't matter which serialization format you use; the techniques to version APIs are generally the same.
Generally you need:
a way for the consumer to communicate to the producer the API version it accepts (though this is not always possible)
a way for the producer to embed versioning information to the serialized data
a backward compatible strategy to handle unknown fields
In a web API, the API version that the consumer accepts is generally embedded in the Accept header (e.g. Accept: application/vnd.myapp-v1+json application/vnd.myapp-v2+json means the consumer can handle either version 1 or version 2 of your API) or, less commonly, in the URL (e.g. https://api.twitter.com/1/statuses/user_timeline.json). This is generally used for major versions (i.e. backward-incompatible changes). If the server and the client do not have a matching Accept header, the communication fails (or proceeds on a best-effort basis, or falls back to a default baseline protocol, depending on the nature of the application).
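On the consumer side this might look roughly like the following sketch; the vendor media types and the URL are made-up examples:
// Sketch: a consumer advertising the protocol versions it can handle via the Accept header.
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

static async Task<string> FetchTimelineAsync()
{
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/vnd.myapp-v1+json"));
        client.DefaultRequestHeaders.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/vnd.myapp-v2+json"));

        // The producer picks one of the advertised versions (or fails / falls back).
        return await client.GetStringAsync("https://api.example.com/statuses/user_timeline.json");
    }
}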
The producer then generates serialized data in one of the requested versions and embeds this version info into the serialized data (e.g. as a field named version). The consumer should use the version information embedded in the data to determine how to parse it. The version information in the data should also contain the minor version (i.e. for backward-compatible changes); consumers should generally be able to ignore the minor version information and still process the data correctly, although understanding the minor version may allow the client to make additional assumptions about how the data should be processed.
A common strategy for handling unknown fields is the one used when parsing HTML and CSS: when the consumer sees an unknown field, it should ignore it, and when the data is missing a field that the client is expecting, the client should use a default value. Depending on the nature of the communication, you may also want to specify some fields as mandatory (i.e. a missing field is considered a fatal error). Fields added within minor versions should always be optional; a minor version can add optional fields or change field semantics as long as it remains backward compatible, while a major version can delete fields, add mandatory fields, or change field semantics in a backward-incompatible manner.
In an extensible serialization format (like JSON or XML), the data should be self-descriptive; in other words, the field names should always be stored together with the data. You should not rely on specific data being available at specific positions.
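Putting those points together in C#, a consumer might look roughly like this. It is a sketch assuming Json.NET; the DTO shape and the "Version" field name are illustrative, not part of any particular protocol:
// Sketch of a tolerant consumer. Json.NET ignores unknown JSON properties by default,
// and properties missing from the JSON keep the defaults assigned in the constructor.
using System;
using Newtonsoft.Json;

public class CustomerMessage
{
    public string Version { get; set; }        // embedded version info
    public string CustomerNumber { get; set; } // mandatory field
    public string County { get; set; }         // optional field added in a later minor version

    public CustomerMessage()
    {
        County = "";                            // default used when an older producer omits it
    }
}

static CustomerMessage Parse(string json)
{
    var msg = JsonConvert.DeserializeObject<CustomerMessage>(json);

    var version = Version.Parse(msg.Version ?? "1.0");
    if (version.Major != 1)
        throw new InvalidOperationException("Unsupported major version: " + msg.Version);
    if (msg.CustomerNumber == null)
        throw new InvalidOperationException("Mandatory field CustomerNumber is missing");

    return msg;
}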
Don't use DataContractJsonSerializer; as the name says, the objects that are processed through this class will have to:
a) Be marked with [DataContract] and [DataMember] attributes.
b) Be strictly compliant with the defined "contract", that is, nothing less and nothing more than what is defined. Any extra or missing [DataMember] will make the deserialization throw an exception.
If you want to be flexible, then use JavaScriptSerializer if you want to go for the cheap option... or use this library:
http://json.codeplex.com/
This will give you enough control over your JSON serialization.
Imagine you have an object in its early days.
public class Customer
{
public string Name;
public string LastName;
}
Once serialized it will look like this:
{ "Name": "John", "LastName": "Doe" }
If you change your object definition to add or remove fields, the deserialization will still occur smoothly if you use, for example, JavaScriptSerializer.
public class Customer
{
public string Name;
public string LastName;
public int Age;
}
If you try to deserialize the last JSON into this new class, no error will be thrown; the new fields will simply be set to their defaults. In this example, "Age" will be set to zero.
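To make that concrete, a small sketch using JavaScriptSerializer (System.Web.Extensions) against the v1 JSON shown earlier:
// Deserializing the old (v1) JSON into the new Customer class; no exception is thrown,
// and the field missing from the JSON ("Age") is left at its default value.
using System.Web.Script.Serialization;

var serializer = new JavaScriptSerializer();
Customer c = serializer.Deserialize<Customer>("{ \"Name\": \"John\", \"LastName\": \"Doe\" }");
// c.Age == 0 here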
You can include, as part of your own conventions, a field present in all your objects that contains the version number. That way you can tell the difference between an empty field and a version inconsistency.
So let's say you have your Customer v1 class serialized:
{ "Version": 1, "LastName": "Doe", "Name": "John" }
If you deserialize it into a Customer v2 instance, you will have:
{ "Version": 1, "LastName": "Doe", "Name": "John", "Age": 0 }
You can then detect which fields in your object are reliable and which are not. In this case you know that your v2 object instance came from a v1 payload, so the field Age should not be trusted.
You could also use a custom attribute, e.g. "MinVersion", and mark each field with the minimum supported version number, so you get something like this:
public class Customer
{
    [MinVersion(1)]
    public int Version;

    [MinVersion(1)]
    public string Name;

    [MinVersion(1)]
    public string LastName;

    [MinVersion(2)]
    public int Age;
}
Then later you can access this meta-data and do whatever you might need with that.
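One possible shape for that attribute and the metadata lookup is sketched below; the attribute name comes from the suggestion above, while the reflection code is just one way to consume it:
// Sketch: defining the suggested MinVersion attribute and using reflection to find
// which fields cannot be trusted when the payload came from an older version.
using System;
using System.Collections.Generic;
using System.Reflection;

[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public class MinVersionAttribute : Attribute
{
    public int Version { get; private set; }
    public MinVersionAttribute(int version) { Version = version; }
}

static IEnumerable<string> UntrustedFields(Customer customer)
{
    foreach (FieldInfo field in typeof(Customer).GetFields())
    {
        var attr = (MinVersionAttribute)Attribute.GetCustomAttribute(field, typeof(MinVersionAttribute));
        if (attr != null && attr.Version > customer.Version)
            yield return field.Name; // e.g. "Age" when a Version == 1 payload is read into the v2 class
    }
}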
Related
In my PostgreSQL database I have:
CREATE TABLE category (
    -- ...
    category_name_localization JSON not null
);
In Java, I have a JDO class like so:
@javax.jdo.annotations.PersistenceCapable(table = "category")
public class Category extends _BlueEntity implements Serializable {
    //...
    private org.json.simple.JSONObject category_name_localization;

    @javax.jdo.annotations.Column(name = "category_name_localization")
    public org.json.simple.JSONObject getCategoryNameLocalization() {
        return category_name_localization;
    }
}
When I use this class, DataNucleus gives the following exception:
org.datanucleus.exceptions.NucleusUserException: Field "com.advantagegroup.blue.ui.entity.Category.category_name_localization" is a map that has been specified without a join table and neither the key nor the value has a mapped-by specified. This is invalid!
at org.datanucleus.store.rdbms.RDBMSStoreManager.newJoinTable(RDBMSStoreManager.java:2720)
at org.datanucleus.store.rdbms.mapping.java.AbstractContainerMapping.initialize(AbstractContainerMapping.java:82)
at org.datanucleus.store.rdbms.mapping.MappingManagerImpl.getMapping(MappingManagerImpl.java:680)
at org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:518)
at org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:424)
at org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1250)
at org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:271)
at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3288)
at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2897)
at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:118)
at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1637)
at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:665)
at org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2098)
at org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1278)
at org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3668)
at org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2276)
at org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:482)
at org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:122)
at org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
at org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:1986)
at org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1830)
at org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1685)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:712)
at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:738)
at com.advantagegroup.blue.ui.jdo._BlueJdo.insert(_BlueJdo.java:40)
at ...
This error makes sense in a way, because org.json.simple.JSONObject extends Map. However, this field is not part of any relationship -- it is a JSON column, so it is natural to back it with a JSONObject.
How do I tell JDO / DataNucleus to chill and treat org.json.simple.JSONObject the same way it would a String or a Date?
Thanks!
DC
My understanding of this is that your default attempt is trying to persist a normal Map (since while it doesn't know what a JSONObject is, it does know what a Map is), and it will need a join table for that on an RDBMS.
Since you presumably want the JSONObject persisted into a single column, you need to create a JDO AttributeConverter. I've done similar things with my own types and it works fine (I'm on v5.0.5 IIRC).
I also found this in their docs, covering the case where you have your own Map class that DataNucleus doesn't know how to handle by default in terms of replacing it with a proxy (to intercept the calls to put, putAll, etc.). If you add that line, it will not try to wrap this field with a proxy (which it doesn't know how to do for that type, unless you tell it). If you wanted to auto-detect the JSONObject becoming "dirty", you would need to write a proxy wrapper, as per this page.
This doesn't answer how to map the column for that converter to use a "json" type in PostgreSQL, but I'd guess that if you set the sqlType you may have success in that respect.
I'm learning Dart and was reading the article Using Dart with JSON Web Services, which told me that I could get help with type checking when converting my objects to and from JSON. I used their code snippet but ended up with compiler warnings. I found another Stack Overflow question which discussed the same problem, and the answer was to use the @proxy annotation and implement noSuchMethod. Here's my attempt:
abstract class Language {
  String language;
  List targets;
  Map website;
}

@proxy
class LanguageImpl extends JsonObject implements Language {
  LanguageImpl();

  factory LanguageImpl.fromJsonString(string) {
    return new JsonObject.fromJsonString(string, new LanguageImpl());
  }

  noSuchMethod(i) => super.noSuchMethod(i);
}
I don't know if the noSuchMethod implementation is correct, and @proxy seems redundant now. Regardless, the code doesn't do what I want. If I run
var lang1 = new LanguageImpl.fromJsonString('{"language":"Dart"}');
print(JSON.encode(lang1));
print(lang1.language);
print(lang1.language + "!");
var lang2 = new LanguageImpl.fromJsonString('{"language":13.37000}');
print(JSON.encode(lang2));
print(lang2.language);
print(lang2.language + "!");
I get the output
{"language":"Dart"}
Dart
Dart!
{"language":13.37}
13.37
type 'String' is not a subtype of type 'num' of 'other'.
and then a stacktrace. Hence, although the readability is a little bit better (one of the goals of the article), the strong typing promised by the article doesn't work and the code might or might not crash, depending on the input.
What am I doing wrong?
The article mentions static types in one paragraph but JsonObject has nothing to do with static types.
What you get from JsonObject is that you don't need Map access syntax.
Instead of someMap['language'] = value; you can write someObj.language = value; and you get the fields in the autocomplete list, but Dart is not able to do any type checking, either when you assign a value to a field of the object (someObj.language = value;) or when you use fromJsonString() (as mentioned, because of noSuchMethod/@proxy).
I assume that you want an exception to be thrown on this line:
var lang2 = new LanguageImpl.fromJsonString('{"language":13.37000}');
because 13.37 is not a String. In order for JsonObject to do this it would need to use mirrors to determine the type of the field and manually do a type check. This is possible, but it would add to the dart2js output size.
So barring that, I think that throwing a type error when reading the field is reasonable, and you might have just found a bug-worthy issue here. Since noSuchMethod is being used to implement an abstract method, the runtime can actually do a type check on the arguments and return values. It appears from your example that it's not doing so. Care to file a bug?
If this were addressed, then JsonObject could immediately read a field back after setting it, to force a type check when decoding without mirrors, and it could do that check in an assert() so that it's only done in checked mode. I think that would be a nice solution.
Let's say I have some C# DTOs and I want to convert them to TypeScript interfaces using T4 templates and a neat little library called TypeLite.
On the client side, I have some concrete TypeScript classes (that inherit from Backbone.Model but that's not important) that represent the same DTO defined on the server side.
The intended goal of the interfaces is to act as data contracts and ensure client and server DTOs are kept in sync.
However, this is problematic since TypeScript supports no run-time type checking facilities other than instanceof. The problem with instanceof is that when I fetch my DTOs from the server they are plain JSON objects and not instances of my model. I need to perform run-time type checking on these DTOs that come in from the server as JSON objects.
I know I can do something like this:
collection.fetch({...}).done((baseModels) => {
    baseModels.forEach((baseModel) => {
        if (baseModel && baseModel.SomeProperty && baseModel.SomeOtherProperty) {
            //JSON model has been "type-checked"
        }
    });
});
However, there are obvious problems with this, because now I need to update three places if I change or add a property.
Currently the only thing I have found is this, but it's undocumented, not maintained, and uses Node, which I have zero experience with, so I'll save myself the frustration. Does anybody know of anything similar to perform run-time type checking in TypeScript, or some other way to accomplish what I'm after?
It would be great if this were built into TypeLite, generating the interfaces as well as a JSON schema for type checking at run-time. Being that this project is open source, somebody should really go ahead and extend it. I'd need some pointers at the least if I were to do it myself (thus the question).
More details about my particular problem are here (not necessary, but extra context if needed).
At runtime you are using plain JavaScript, so you could use this answer as it relates to plain JavaScript:
How do I get the name of an object's type in JavaScript?
Here is a TypeScript get-class-name implementation that can supply the name of the enclosing TypeScript class (the link also has a static separate version of this example).
class ShoutMyName {
    getName() {
        var funcNameRegex = /function (.{1,})\(/;
        var anyThis = <any> this;
        var results = (funcNameRegex).exec(anyThis.constructor.toString());
        return (results && results.length > 1) ? results[1] : "";
    }
}

class Example extends ShoutMyName {
}

class AnotherClass extends ShoutMyName {
}

var x = new Example();
var y = new AnotherClass();
alert(x.getName());
alert(y.getName());
This doesn't give you data about the inheritance chain, just the class you are inspecting.
Imagine an instance of some lookup of configuration settings called "configuration", used like this:
if (!string.IsNullOrEmpty(configuration["MySetting"]))
{
    DoSomethingWithTheValue(configuration["MySetting"]);
}
The meaning of the setting is overloaded. It means both "turn this feature on or off" and "here is a specific value to do something with". These can be decomposed into two settings:
if (configuration["UseMySetting"])
{
    DoSomethingWithTheValue(configuration["MySetting"]);
}
The second approach seems to make configuration more complicated, but slightly easier to parse, and it separates out the two sorts of behaviour. The first seems much simpler at first, but it's not clear what we should choose as the default "turn this off" value. "" might actually be a valid value for MySetting.
Is there a general best practice rule for this?
I find the question to be slightly confusing, because it talks about (1) parsing, and (2) using configuration settings, but the code samples are for only the latter. That confusion means that my answer might be irrelevant to what you intended to ask. Anyway...
I suggest an approach that is illustrated by the following pseudo-code API (comments follow afterwards):
class Configuration
{
    void parse(String fileName);
    boolean exists(String name);
    String lookupString(String name);
    String lookupString(String name, String defaultValue);
    int lookupInt(String name);
    int lookupInt(String name, int defaultValue);
    float lookupFloat(String name);
    float lookupFloat(String name, float defaultValue);
    boolean lookupBoolean(String name);
    boolean lookupBoolean(String name, boolean defaultValue);
    ... // more pairs of lookup<Type>() operations for other types
}
The parse() operation parses a configuration file and stores the parsed data in a convenient format, for example, in a map or hash-table. (If you want, parse() can delegate the parsing to a third-party library, for example, a parser for XML, Java Properties, JSON, .ini files or whatever.)
After parsing is complete, your application can invoke the other operations to retrieve/use the configuration settings.
A lookup<Type>() operation retrieves the value of the specified name and parses it into the specified type (throwing an exception if the parsing fails). There are two overloads of each lookup<Type>() operation. The version with one parameter throws an exception if the specified variable does not exist. The version with an extra parameter (denoting a default value) returns that default value if the specified variable does not exist.
The exists() operation can be used to test whether a specified name exists in the configuration file.
The above pseudo-code API offers two benefits. First, it provides type-safe access to configuration data (which wasn't a stated requirement in your question, but I think it is important anyway). Second, it enables you to distinguish between "variable is not defined in configuration" and "variable is defined but its value happens to be an empty string".
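Applied to your original example, the calling code could then look something like the following sketch against the pseudo-code API above (the setting names come from your question; the file name is hypothetical):
// Sketch: using the exists()/lookup<Type>() operations to express both interpretations
// of the original "MySetting" example.
Configuration cfg = new Configuration();
cfg.parse("myApp.cfg");   // hypothetical file name

// Two explicit settings: one boolean switch, one value.
if (cfg.lookupBoolean("UseMySetting", false))
{
    DoSomethingWithTheValue(cfg.lookupString("MySetting"));
}

// Or a single setting, where "not configured" is distinguished from "configured as empty string".
if (cfg.exists("MySetting"))
{
    DoSomethingWithTheValue(cfg.lookupString("MySetting"));
}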
If you have already committed yourself to a particular configuration syntax, then just implement the above Configuration class as a thin wrapper around a parser for the existing configuration syntax. If you haven't already chosen a configuration syntax and if your project is in C++ or Java, then you might want to look at my Config4* library, which provides a ready-to-use implementation of the above pseudo-code class (with a few extra bells and whistles).
While building my DAL repository, I stumbled upon a concept called Pipes and Filters. I read about it here, here and saw a screencast from here. I am still not sure how to go about implementing this pattern. Theoretically it all sounds good, but how do we really implement it in an enterprise scenario?
I would appreciate any resources, tips, or examples or explanations of this pattern in the context of the data mappers/ORMs mentioned in the question.
Thanks in advance!!
Ultimately, LINQ on IEnumerable<T> is a pipes and filters implementation. IEnumerable<T> is a streaming API - meaning that data is lazily returned as you ask for it (via iterator blocks), rather than everything being loaded at once and returned as a big buffer of records.
This means that your query:
var qry = from row in source // IEnumerable<T>
          where row.Foo == "abc"
          select new { row.ID, row.Name };
is:
var qry = source.Where(row => row.Foo == "abc")
                .Select(row => new { row.ID, row.Name });
As you enumerate over this, it will consume the data lazily. You can see this graphically with Jon Skeet's Visual LINQ. The only things that break the pipe are things that force buffering: OrderBy, GroupBy, etc. For high-volume work, Jon and I worked on Push LINQ for doing aggregates without buffering in such scenarios.
IQueryable<T> (exposed by most ORM tools - LINQ-to-SQL, Entity Framework, LINQ-to-NHibernate) is a slightly different beast; because the database engine is going to do most of the heavy lifting, the chances are that most of the steps are already done - all that is left is to consume an IDataReader and project this to objects/values - but that is still typically a pipe (IQueryable<T> implements IEnumerable<T>) unless you call .ToArray(), .ToList() etc.
With regard to use in the enterprise... my view is that it is fine to use IQueryable<T> to write composable queries inside the repository, but they shouldn't leave the repository - as that would make the internal operation of the repository subject to the caller, so you would be unable to properly unit test / profile / optimize / etc. I've taken to doing clever things in the repository, but returning lists/arrays. This also means my repository stays unaware of the implementation.
This is a shame - as the temptation to "return" IQueryable<T> from a repository method is quite large; for example, this would allow the caller to add paging/filters/etc - but remember that they haven't actually consumed the data yet. This makes resource management a pain. Also, in MVC etc you'd need to ensure that the controller calls .ToList() or similar, so that it isn't the view that is controlling data access (otherwise, again, you can't unit test the controller properly).
A safe (IMO) use of filters in the DAL would be things like:
public Customer[] List(string name, string countryCode) {
    using (var ctx = new CustomerDataContext()) {
        IQueryable<Customer> qry = ctx.Customers.Where(x => x.IsOpen);
        if (!string.IsNullOrEmpty(name)) {
            qry = qry.Where(cust => cust.Name.Contains(name));
        }
        if (!string.IsNullOrEmpty(countryCode)) {
            qry = qry.Where(cust => cust.CountryCode == countryCode);
        }
        return qry.ToArray();
    }
}
Here we've added filters on-the-fly, but nothing happens until we call ToArray. At this point, the data is obtained and returned (disposing the data-context in the process). This can be fully unit tested. If we did something similar but just returned IQueryable<T>, the caller might do something like:
var custs = customerRepository.GetCustomers()
                              .Where(x => SomeUnmappedFunction(x));
And all of a sudden our DAL starts failing (cannot translate SomeUnmappedFunction to TSQL, etc). You can still do a lot of interesting things in the repository, though.
The only pain point here is that it might push you to have a few overloads to support different calling patterns (with/without paging, etc.). Until optional/named parameters arrive, I find the best answer here is to use extension methods on the interface; that way, I only need one concrete repository implementation:
class CustomerRepository {
    public Customer[] List(
        string name, string countryCode,
        int? pageSize, int? pageNumber) {...}
}

interface ICustomerRepository {
    Customer[] List(
        string name, string countryCode,
        int? pageSize, int? pageNumber);
}

static class CustomerRepositoryExtensions {
    public static Customer[] List(
        this ICustomerRepository repo,
        string name, string countryCode) {
        return repo.List(name, countryCode, null, null);
    }
}
Now we have virtual overloads (as extension methods) on ICustomerRepository - so our caller can use repo.List("abc","def") without having to specify the paging.
Finally - without LINQ, using pipes and filters becomes a lot more painful. You'll be writing some kind of text based query (TSQL, ESQL, HQL). You can obviously append strings, but it isn't very "pipe/filter"-ish. The "Criteria API" is a bit better - but not as elegant as LINQ.