Json Validation in Apache beam using Google Cloud Dataflow - json

I am trying to write a Filter transformation using Apache beam Java SDK and i need to filter out invalid Json Messages.
If i create a new Gson Object for every element validation, the implementation works fine. However I want to avoid the creation of Gson Objects for every elements(throughput is 1K/Second) and validate the json.
I am creating a constant Gson Object at the start and initializing it in the static block. This approach is not working. Not sure why the same object cannot be used to parse multiple elements as we are not changing the state of the object during the processing?
// Gson object declared as constant
private static final Gson gsonObj=new Gson();
// Initialized GSon object during class loading before main method invocation
static {
gsonObj = new Gson();
}
....
/*
enum to validate json messages.
*/
enum InputValidation implements SerializableFunction<String, Boolean> {
VALID {
#Override
public Boolean apply(String input) {
try {
gsonObj.fromJson(input, Object.class);
return true;
} catch(com.google.gson.JsonSyntaxException ex) {
return false;
}
}
}
}

Use TupleTag to filter-out the record, instead of 'enum InputValidation implements'.
Use the below code to filter out the unparseable json row.
Pipeline p = Pipeline.create(options);
TupleTag<String> successParse = new TupleTag<String>();
TupleTag<String> failParse = new TupleTag<String>();
private static final Gson gsonObj=new Gson();
PCollectionTuple = input.apply(ParDo.of(new DoFn<String, String>(){
#ProcessElement
public void processElement(ProcessContext c) {
try {
gsonObj.fromJson(c.element(), Object.class);
c.output(successParse,c.element());
} catch {
c.output(failParse,c.element());
}
}
}).withOutputTags(successParse, TupleTagList.of(failParse)));
Above piece of code worked in my case and the optimum solution to filter out the records.
Here is the official documentation example.

Related

Is it possible to pass a java.util.Stream to Gson?

I'm currently working on a project where I need to fetch a large amount of data from the Database and parse it into a specific Json format, I already have built my custom Serializers and Its working properly when i pass a List to Gson. But as I was already working with Streams from my JPA Layer, I thought I could pass the Stream down to the Gson parser so that it could transform it directly to my Json data. But I'm getting an empty Json object instead of a correctly populated one.
So, if anyone could point to me a way to make Gson work with Java 8 Streams or if this isn't possible currently.. i could not find anything on Google, so i came to Stackoverflow.
You could use JsonWriter to streaming your data to output stream:
public void writeJsonStream(OutputStream out, Stream<DataObject> data) throws IOException {
try(JsonWriter writer = new JsonWriter(new OutputStreamWriter(out, "UTF-8"))) {
writer.setIndent(" ");
writer.beginArray();
data.forEach(d -> {
d.beginObject();
d.name("yourField").value(d.getYourField());
....
d.endObject();
});
writer.endArray();
}
}
Note that you're in charge of controling the json structure.
That is, if your DataObject contains nested Object, you have to write beginObject()/endObject() respectively. The same goes for nested array.
It is not as trivial as one would expect, but it can be done in a generic way.
When you look into the Javadoc to TypeAdapterFactory, they provide a very simplistic way of writing a TypeAdapterFactory for a custom type. Alas, it does not work as expected because of problems with element type detection. The proper way to do this can be found in Gson-internal CollectionTypeAdapterFactory. It is quite complex, but taking what's necessary one can come up with something like that:
final class StreamTypeAdapterFactory implements TypeAdapterFactory {
#SuppressWarnings("unchecked")
#Override
public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> typeToken) {
Type type = typeToken.getType();
Class<? super T> rawType = typeToken.getRawType();
if (!Stream.class.isAssignableFrom(rawType)) {
return null;
}
Type elementType = ExtraGsonTypes.getStreamElementType(type, rawType);
TypeAdapter<?> elementAdapter = gson.getAdapter(TypeToken.get(elementType));
return (TypeAdapter<T>) new StreamTypeAdapter<>(elementAdapter);
}
private static class StreamTypeAdapter<E> extends TypeAdapter<Stream<E>> {
private final TypeAdapter<E> elementAdapter;
StreamTypeAdapter(TypeAdapter<E> elementAdapter) {
this.elementAdapter = elementAdapter;
}
public void write(JsonWriter out, Stream<E> value) throws IOException {
out.beginArray();
for (E element : iterable(value)) {
elementAdapter.write(out, element);
}
out.endArray();
}
public Stream<E> read(JsonReader in) throws IOException {
Stream.Builder<E> builder = Stream.builder();
in.beginArray();
while (in.hasNext()) {
builder.add(elementAdapter.read(in));
}
in.endArray();
return builder.build();
}
}
private static <T> Iterable<T> iterable(Stream<T> stream) {
return stream::iterator;
}
}
The ExtraGsonTypes is a special class that I used to circumvent package-private access to $Gson$Types.getSupertype method. It's a hack that works if you're not using JDK 9's modules - you simply place this class in the same package as $Gson$Types:
package com.google.gson.internal;
import java.lang.reflect.*;
import java.util.stream.Stream;
public final class ExtraGsonTypes {
public static Type getStreamElementType(Type context, Class<?> contextRawType) {
return getContainerElementType(context, contextRawType, Stream.class);
}
private static Type getContainerElementType(Type context, Class<?> contextRawType, Class<?> containerSupertype) {
Type containerType = $Gson$Types.getSupertype(context, contextRawType, containerSupertype);
if (containerType instanceof WildcardType) {
containerType = ((WildcardType)containerType).getUpperBounds()[0];
}
if (containerType instanceof ParameterizedType) {
return ((ParameterizedType) containerType).getActualTypeArguments()[0];
}
return Object.class;
}
}
(I filed an issue about that in GitHub)
You use it in the following way:
Gson gson = new GsonBuilder()
.registerTypeAdapterFactory(new StreamTypeAdapterFactory())
.create();
System.out.println(gson.toJson(Stream.of(1, 2, 3)));

Struts2 Convert json array to java object array - not LinkedHashmap

First off my question is very similar to below however I'm not sure if the answers are applicable to my specific problem or whether I just need clarification about how to approach it:
Convert LinkedHashMap<String,String> to an object in Java
I am using struts2 json rest plugin to convert a json array into a java array. The array is sent through an ajax post request and the java receives this data. However instead of being the object type I expect it is received as a LinkedHashmap. Which is identical to the json request in structure.
[
{advance_Or_Premium=10000, available=true},
{advance_Or_Premium=10000, available=true},
{advance_Or_Premium=10000, available=true}
]
The data is all present and correct but just in the wrong type. Ideally I want to send the data in my object type or if this is not possible convert the LinkedHashMap from a list of keys and values into the object array. Here is the class I am using, incoming data is received in the create() method:
#Namespace(value = "/rest")
public class OptionRequestAction extends MadeAbstractAction implements ModelDriven<ArrayList<OptionRequestRest>>
{
private String id;
ArrayList<OptionRequestRest> model = new ArrayList<OptionRequestRest>();
public HttpHeaders create()
{
// TODO - need to use model here but it's a LinkedHashmap
return new DefaultHttpHeaders("create");
}
public String getId()
{
return this.id;
}
public ArrayList<OptionRequestRest> getModel()
{
return this.model;
}
public ArrayList<OptionRequestRest> getOptionRequests()
{
#SuppressWarnings("unchecked")
ArrayList<OptionRequestRest> lReturn = (ArrayList<OptionRequestRest>) this.getSession().get("optionRequest");
return lReturn;
}
// Handles /option-request GET requests
public HttpHeaders index()
{
this.model = this.getOptionRequests();
return new DefaultHttpHeaders("index").lastModified(new Date());
}
public void setId(String pId)
{
this.id = pId;
}
public void setModel(ArrayList<OptionRequestRest> pModel)
{
this.model = pModel;
}
// Handles /option-request/{id} GET requests
public HttpHeaders show()
{
this.model = this.getOptionRequests();
return new DefaultHttpHeaders("show").lastModified(new Date());
}
}
One of the things which is confusing me is that this code works fine and returns the correct object type if the model is not an array. Please let me know if my question is not clear enough and needs additional information. Thanks.

Change the json DateTime serialization in WCF 4.0 REST Service

I need to replace the DateTime serialization for JSON in WCF REST Self Hosted service. Right now, I'm using something like the following code to do it, but it's definitely not the way to go since it requires manipulating each class.
[DataContract]
public class Test
{
[IgnoreDataMember]
public DateTime StartDate;
[DataMember(Name = "StartDate")]
public string StartDateStr
{
get { return DateUtil.DateToStr(StartDate); }
set { StartDate = DateTime.Parse(value); }
}
}
where my utility function DateUtil.DateToStr does all the formatting work.
Is there any easy way to do it without having to touch the attributes on my classes which have the DataContract attribute? Ideally, there would be no attributes, but a couple of lines of code in my configuration to replace the serializer with one where I've overridden DateTime serialization.
Everything that I've found looks like I have to replace huge pieces of the pipeline.
This article doesn't appear to apply because in I'm using WebServiceHost not HttpServiceHost, which not part of the 4.5.1 Framework.
JSON.NET Serializer for WCF REST Services
By default WCF uses DataContractJsonSerializer to serialize data into JSON. Unfortunatelly date from this serializer is in very difficult format to parse by human brain.
"DateTime": "\/Date(1535481994306+0200)\/"
To override this behavior we need to write custom IDispatchMessageFormatter. This class will receive all data which should be returned to requester and change it according to our needs.
To make it happen to the operations in the endpoint add custom formatter - ClientJsonDateFormatter:
ServiceHost host=new ServiceHost(typeof(CustomService));
host.AddServiceEndpoint(typeof(ICustomContract), new WebHttpBinding(), Consts.WebHttpAddress);
foreach (var endpoint in host.Description.Endpoints)
{
if (endpoint.Address.Uri.Scheme.StartsWith("http"))
{
foreach (var operation in endpoint.Contract.Operations)
{
operation.OperationBehaviors.Add(new ClientJsonDateFormatter());
}
endpoint.Behaviors.Add(new WebHttpBehavior());
}
}
ClientJsonDateFormatter is simple class which just applies formatter ClientJsonDateFormatter
public class ClientJsonDateFormatter : IOperationBehavior
{
public void AddBindingParameters(OperationDescription operationDescription, BindingParameterCollection bindingParameters) { }
public void ApplyClientBehavior(OperationDescription operationDescription, ClientOperation clientOperation) { }
public void ApplyDispatchBehavior(OperationDescription operationDescription, DispatchOperation dispatchOperation)
{
dispatchOperation.Formatter = new ResponseJsonFormatter(operationDescription);
}
public void Validate(OperationDescription operationDescription) { }
}
In the formatter we took imput and serialize it with the changed Serializer:
public class ResponseJsonFormatter : IDispatchMessageFormatter
{
OperationDescription Operation;
public ResponseJsonFormatter(OperationDescription operation)
{
this.Operation = operation;
}
public void DeserializeRequest(Message message, object[] parameters)
{
}
public Message SerializeReply(MessageVersion messageVersion, object[] parameters, object result)
{
string json=Newtonsoft.Json.JsonConvert.SerializeObject(result);
byte[] bytes = Encoding.UTF8.GetBytes(json);
Message replyMessage = Message.CreateMessage(messageVersion, Operation.Messages[1].Action, new RawDataWriter(bytes));
replyMessage.Properties.Add(WebBodyFormatMessageProperty.Name, new WebBodyFormatMessageProperty(WebContentFormat.Raw));
return replyMessage;
}
}
And to send information to client we need data writer - RawDataWriter. Its implementation is simple:
class RawDataWriter : BodyWriter
{
byte[] data;
public RawDataWriter(byte[] data)
: base(true)
{
this.data = data;
}
protected override void OnWriteBodyContents(XmlDictionaryWriter writer)
{
writer.WriteStartElement("Binary");
writer.WriteBase64(data, 0, data.Length);
writer.WriteEndElement();
}
}
Applying all code will result in returning date in more friendly format:
"DateTime":"2018-08-28T20:56:48.6411976+02:00"
To show it in practice I created example in the github branch DateTimeFormatter.
Please check also this answer as very likely you also will need it.
There is a limitation in JSON to convert DateTime, specially according to your case.
Please see http://msdn.microsoft.com/en-us/library/bb412170(v=vs.110).aspx
and read the section Dates/Times and JSON
To resolve this problem, I simply changed the type of serialization from JSON to XML for all the calls including DateTime.
After long time discussion ,I have find out the solution for it.
Please Use the following Code to Solve serialized date..
[IgnoreDataMember]
public DateTime? PerformanceDate { get; set; }
[DataMember(EmitDefaultValue = false, Name = "PerformanceDate")]
public string UpdateStartDateStr
{
get
{
if (this.PerformanceDate.HasValue)
return this.PerformanceDate.Value.ToUniversalTime().ToString("s", CultureInfo.InvariantCulture);
else
return null;
}
set
{
// should implement this...
}
}

Emit odata.type field with DataContractJsonSerializer?

Is there a way to make DataContractJsonSerializer emit the "odata.type" field required when posting an OData entity into a collection that supports multiple entity types (hierarchy per table)?
If I construct DataContractJsonSerializer with a settings object with EmitTypeInformation set to Always, it emits a "__type" field in the output, but that's not the field name needed for OData and the format of the value is wrong as well.
Is there any way to hook into the DataContractJsonSerializer pipeline to inject the desired "odata.type" field into the serialization output?
It would be such a hack to have to parse the serialization output in order to inject the field. How does WCF Data Services do it? Not using DataContractJsonSerializer is my guess.
Have you considered using Json.Net? Json.Net is much more extensible and the scenario that you have can be done using a custom resolver. sample code
class Program
{
static void Main(string[] args)
{
Console.WriteLine(
JsonConvert.SerializeObject(new Customer { Name = "Raghu" }, new JsonSerializerSettings
{
ContractResolver = new CustomContractResolver()
}));
}
}
public class CustomContractResolver : DefaultContractResolver
{
protected override JsonObjectContract CreateObjectContract(Type objectType)
{
JsonObjectContract objectContract = base.CreateObjectContract(objectType);
objectContract.Properties.Add(new JsonProperty
{
PropertyName = "odata.type",
PropertyType = typeof(string),
ValueProvider = new StaticValueProvider(objectType.FullName),
Readable = true
});
return objectContract;
}
private class StaticValueProvider : IValueProvider
{
private readonly object _value;
public StaticValueProvider(object value)
{
_value = value;
}
public object GetValue(object target)
{
return _value;
}
public void SetValue(object target, object value)
{
throw new NotSupportedException();
}
}
}
public class Customer
{
public string Name { get; set; }
}
I can't answer your first two questions, but for the third question, I found on the OData Team blog a link to the OData WCF Data Services V4 library open source code. Downloading that code, you will see that they perform all serialization and deserialization manually. They have 68 files in their two Json folders! And looking through the code they have comments such as:
// This is a work around, needTypeOnWire always = true for client side:
// ClientEdmModel's reflection can't know a property is open type even if it is, so here
// make client side always write 'odata.type' for enum.
So that to me kind of implies there is no easy, clean, simple, elegant way to do it.
I tried using a JavaScriptConverter, a dynamic type, and other stuff, but most of them ended up resorting to using Reflection which just made for a much more complicated solution versus just using a string manipulation approach.

WCF restful returning JSON by using Entity Framework Complex

recently I have set up a WCF restful service with EF4.
It all worked out when returning XML format response. however when it comes to JSON, I got 504 Error. unable to return json data, WCF Resful Service .NET 4.0
By digging deeper by using Service Trace Viewer:
I found this error:
'The type 'xxx.DataEntity.AppView'
cannot be serialized to JSON because
its IsReference setting is 'True'. The
JSON format does not support
references because there is no
standardized format for representing
references. To enable serialization,
disable the IsReference setting on the
type or an appropriate parent class of
the type.'
The "AppView" is a complex object class which generated by EF4 from a store procedure.
I spend quite a bit time google how to disable the IsReference, very little result so far.
anyone? with any solutions?
thanks in advance
Code:
[OperationContract]
[WebInvoke(Method = "GET",
BodyStyle = WebMessageBodyStyle.Wrapped,
UriTemplate = "App/{id}/{format}")]
AppView FuncDetail(string id, string format);
public AppView FuncDetail(string id, string format)
{
SetResponseFormat(format);
return AppSvcs.GetById(id);
}
private void SetResponseFormat(string format)
{
if (format.ToLower() == "json")
{
ResponseContext.Format = WebMessageFormat.Json;
}
else
{
ResponseContext.Format = WebMessageFormat.Xml;
}
}
I ran into exactly the same issue. It only happened on one of my service methods where I was trying to return JSON serialised Entity objects. For all my other methods I was returning JSON serialised data transfer objects (DTOs), which are stand-alone and not connected to the Entity framework. I am using DTOs for data posted into methods. Often, the data you send out does not need all the data you store in the model or the database e.g. ID values, updated dates, etc. The mapping is done in the model class, like so:
public partial class Location
{
public static LocationDto CreateLocationDto(Location location)
{
LocationDto dto = new LocationDto
{
Accuracy = location.Accuracy,
Altitude = location.Altitude,
Bearing = location.Bearing
};
return dto;
}
It may seem a bit clunky but it works and it ensures that you only send the data fields you intended to send back. It works for me because I only have 5 or 6 entities but I can see that it would get a bit tedious if you have lots of classes.
I was running into the same problem, as caused by using the auto-generated ADO Entity Models. I have not found a direct fix for this issue, but as a work around, I serialize the response as json explicitly.
So in your example, the AppView FuncDetail looks like this:
public object FuncDetail(string id, string format)
{
SetResponseFormat(format);
// where AppSvc is the object type and the enumerable list of this type is returned by the GetById method, cast it to a json string
return JSONSerializer.ToJson<AppSvc>(AppSvcs.GetById(id));
}
Here are the Serializers that I'm using:
public static class GenericSerializer
{
public static DataTable ToDataTable<T>(IEnumerable<T> varlist)
{
DataTable dtReturn = new DataTable();
// column names
PropertyInfo[] oProps = null;
if (varlist == null) return dtReturn;
foreach (T rec in varlist)
{
// Use reflection to get property names, to create table, Only first time, others will follow
if (oProps == null)
{
oProps = ((Type)rec.GetType()).GetProperties();
foreach (PropertyInfo pi in oProps)
{
Type colType = pi.PropertyType;
if ((colType.IsGenericType) && (colType.GetGenericTypeDefinition()
== typeof(Nullable<>)))
{
colType = colType.GetGenericArguments()[0];
}
dtReturn.Columns.Add(new DataColumn(pi.Name, colType));
}
}
DataRow dr = dtReturn.NewRow();
foreach (PropertyInfo pi in oProps)
{
dr[pi.Name] = pi.GetValue(rec, null) == null ? DBNull.Value : pi.GetValue
(rec, null);
}
dtReturn.Rows.Add(dr);
}
return dtReturn;
}
}
public static class JSONSerializer
{
public static string ToJson<T>(IEnumerable<T> varlist)
{
DataTable dtReturn = GenericSerializer.ToDataTable(varlist);
return GetJSONString(dtReturn);
}
static object RowsToDictionary(this DataTable table)
{
var columns = table.Columns.Cast<DataColumn>().ToArray();
return table.Rows.Cast<DataRow>().Select(r => columns.ToDictionary(c => c.ColumnName, c => r[c]));
}
static Dictionary<string, object> ToDictionary(this DataTable table)
{
return new Dictionary<string, object>
{
{ table.TableName, table.RowsToDictionary() }
};
}
static Dictionary<string, object> ToDictionary(this DataSet data)
{
return data.Tables.Cast<DataTable>().ToDictionary(t => "Table", t => t.RowsToDictionary());
}
public static string GetJSONString(DataTable table)
{
JavaScriptSerializer serializer = new JavaScriptSerializer();
return serializer.Serialize(table.ToDictionary());
}
public static string GetJSONString(DataSet data)
{
JavaScriptSerializer serializer = new JavaScriptSerializer();
return serializer.Serialize(data.ToDictionary());
}}
It is a lot clearer to use Entity Metadata instead of Reflection.
The Metadata is pretty extensive.
another way to do this is to use LINQ to create an anonymous type with the subset of fields that you need from your entity and then use JSON.NET to serialize the collection of anon types that you created in the LINQ statement. then persist that collection out as a string by serializing.