Storing json, jsonb, hstore, xml, enum, ipaddr, etc fails with "column "x" is of type json but expression is of type character varying" - json

When using PostgreSQL to store data in a field of a string-like validated type, like xml, json, jsonb, xml, ltree, etc, the INSERT or UPDATE fails with an error like:
column "the_col" is of type json but expression is of type character varying
... or
column "the_col" is of type json but expression is of type text
Why? What can I do about it?
I'm using JDBC (PgJDBC).
This happens via Hibernate, JPA, and all sorts of other abstraction layers.
The "standard" advice from the PostgreSQL team is to use a CAST in the SQL. This is not useful for people using query generators or ORMs, especially if those systems don't have explicit support for database types like json, so they're mapped via String in the application.
Some ORMs permit the implementation of custom type handlers, but I don't really want to write a custom handler for each data type for each ORM, e.g. json on Hibernate, json on EclipseLink, json on OpenJPA, xml on Hibernate, ... etc. There's no JPA2 SPI for writing a generic custom type handler. I'm looking for a general solution.

Why it happens
The problem is that PostgreSQL is overly strict about casts between text and non-text data types. It will not allow an implicit cast (one without a CAST or :: in the SQL) from a text type like text or varchar (character varying) to a text-like non-text type like json, xml, etc.
The PgJDBC driver specifies the data type of varchar when you call setString to assign a parameter. If the database type of the column, function argument, etc, is not actually varchar or text, but instead another type, you get a type error. This is also true of quite a lot of other drivers and ORMs.
PgJDBC: stringtype=unspecified
The best option when using PgJDBC is generally to pass the parameter stringtype=unspecified. This overrides the default behaviour of passing setString values as varchar and instead leaves it up to the database to "guess" their data type. In almost all cases this does exactly what you want, passing the string to the input validator for the type you want to store.
All: CREATE CAST ... WITH FUNCTION ...
You can instead CREATE CAST to define a data-type specific cast to permit this on a type-by-type basis, but this can have side effects elsewhere. If you do this, do not use WITHOUT FUNCTION casts, they will bypass type validation and result in errors. You must use the input/validation function for the data type. Using CREATE CAST is suitable for users of other database drivers that don't have any way to stop the driver specifying the type for string/text parameters.
e.g.
CREATE OR REPLACE FUNCTION json_intext(text) RETURNS json AS $$
SELECT json_in($1::cstring);
$$ LANGUAGE SQL IMMUTABLE;
CREATE CAST (text AS json)
WITH FUNCTION json_intext(text) AS IMPLICIT;
All: Custom type handler
If your ORM permits, you can implement a custom type handler for the data type and that specific ORM. This mostly useful when you're using native Java type that maps well to the PostgreSQL type, rather than using String, though it can also work if your ORM lets you specify type handlers using annotations etc.
Methods for implementing custom type handlers are driver-, language- and ORM-specific. Here's an example for Java and Hibernate for json.
PgJDBC: type handler using PGObject
If you're using a native Java type in Java, you can extend PGObject to provide a PgJDBC type mapping for your type. You will probably also need to implement an ORM-specific type handler to use your PGObject, since most ORMs will just call toString on types they don't recognise. This is the preferred way to map complex types between Java and PostgreSQL, but also the most complex.
PgJDBC: Type handler using setObject(int, Object)
If you're using String to hold the value in Java, rather than a more specific type, you can invoke the JDBC method setObject(integer, Object) to store the string with no particular data type specified. The JDBC driver will send the string representation, and the database will infer the type from the destination column type or function argument type.
See also
Questions:
Mapping postgreSQL JSON column to Hibernate value type
Are JPA (EclipseLink) custom types possible?
External:
http://www.postgresql.org/message-id/54096082.1090009#2ndquadrant.com
https://github.com/pgjdbc/pgjdbc/issues/265
http://www.pateldenish.com/2013/05/inserting-json-data-into-postgres-using-jdbc-driver.html

Related

Dealing with NULLs when migrating from MySQL to Mongo in Go

Setup
I started a project using MySQL and as such, my project has some helper types that assist with dealing with nulls, both when unmarshalling incoming data on the API, inputting data into the DB, and then the inverse of that, pulling data out of the Database and responding with said data to the API.
For the purposes of this question, we'll deal with a struct i have that represents a Character.
type Character struct {
MongoID primitive.ObjectID `bson:"_id" json:"-"`
ID uint64 `bson:"id" json:"id"`
Name string `bson:"name" json:"name"`
CorporationID uint `bson:"corporation_id" json:"corporation_id"`
AllianceID null.Uint `bson:"alliance_id" json:"alliance_id,omitempty"`
FactionID null.Uint `bson:"faction_id" json:"faction_id,omitempty"`
SecurityStatus float64 `bson:"security_status" json:"security_status"`
NotModifiedCount uint `bson:"not_modified_count" json:"not_modified_count"`
UpdatePriority uint `bson:"update_priority" json:"update_priority"`
Etag null.String `bson:"etag" json:"etag"`
CachedUntil time.Time `bson:"cached_until" json:"cached_until"`
CreatedAt time.Time `bson:"created_at" json:"created_at"`
UpdatedAt time.Time `bson:"updated_at" json:"updated_at"`
}
I want to specifically concentrate on the AllianceID property of type null.Uint which is represented with the following struct:
// Uint is an nullable uint.
type Uint struct {
Uint uint
Valid bool
}
In an API setup using JSON and MySQL (i.e. My setup, but this is not exclusive), this structure allows me to easily deal with values that are "nullable" without having to deal with Pointers. I've always heard that it is best to avoid Pointers with the exception of pointers to structures (Slices, Slices of Structs, Map of Structs, etc). If you have a primitive type (int, bool, float, etc), try to avoid using a pointer to that primitive type.
This type has functions like MarshalJSON, UnmarshalJSON, Scan, and Value with logic inside those functions that leverage the Vaild property to determine what type of value to return. This works really really well with this setup.
Question
After some research, I've come to realize that Mongo would suit me better than a relational database, but due to the fluidity of a Mongo Document (Schemaless), I'm having a hard time understanding how to handle scenarios where a field maybe missing, or a property that i have in MySQL that would normally be null and I can easily unmarshal ontop this struct and use the helper functions logically, is handled. Also, when I setup a connection to Mongo and pull a couple of rows from MySQL and created Documents in Mongo from these rows, the BSON layer is marshalling the entire type for Alliance ID and sticking it in the DB.
Example:
"alliance_id" : {
"uint" : NumberLong(99007760),
"valid" : true
},
Where as in MySQL, the Value function implementing Valuer interface would be called and return 99007760 and that is the value in the DB.
Another scenario would be if valid was false. In MySQL this would mean a null value and when the Value function is called, it would return nil and the mysql driver would populate the field with NULL
So my question is how do I do this? Do I need to start from scratch and rebuild my models and redo some of the logic in my application that leverages the Valid property and use *Pointers or can I do what I am attempting to do using these helper types.
I do want to say that I have tried implementing the Marshaller, and Unmarshaller interfaces on the bson package and the alliane_id in the document is still set to the json encoded version of this type as I outlined above. I wanted to point this out to rule out any suggestions of implemeting those interfaces. If what I am attempting to achieve is counter intuitive to Mongo, please link some guides that can help me achieve what im attempting to do.
Thank you to all who can assist with this.
The easiest way to deal with optional fields like this is to use a pointer:
type Character struct {
ID *uint64 `bson:"id,omitempty" json:"id"`
Name string `bson:"name" json:"name"`
...
}
Above, the ID field will be written if it s non-nil. When unmarshaling, it will be set to a non-nil value if database record has a value for it. If you omit the omitempty flag, marshaling this struct will write null to the database.
For strings, you may use omitempty to omit the field completely if it is empty. If you want to store empty strings, omit omitempty.

Excessive use of map[string]interface{} in go development?

The majority of my development experience has been from dynamically typed languages like PHP and Javascript. I've been practicing with Golang for about a month now by re-creating some of my old PHP/Javascript REST APIs in Golang. I feel like I'm not doing things the Golang way most of the time. Or more generally, I'm not use to working with strongly typed languages. I feel like I'm making excessive use of map[string]interface{} and slices of them to box up data as it comes in from http requests or when it gets shipped out as json http output. So what I'd like to know is if what I'm about to describe goes against the philosophy of golang development? Or if I'm breaking the principles of developing with strongly typed languages?
Right now, about 90% of the program flow for REST Apis I've rewritten with Golang can be described by these 5 steps.
STEP 1 - Receive Data
I receive http form data from http.Request.ParseForm() as formvals := map[string][]string. Sometimes I will store serialized JSON objects that need to be unmarshaled like jsonUserInfo := json.Unmarshal(formvals["user_information"][0]) /* gives some complex json object */.
STEP 2 - Validate Data
I do validation on formvals to make sure all the data values are what I expect before using it in SQL queries. I treat everyting as a string, then use Regex to determine if the string format and business logic is valid (eg. IsEmail, IsNumeric, IsFloat, IsCASLCompliant, IsEligibleForVoting,IsLibraryCardExpired etc...). I've written my own Regex and custom functions for these types of validations
STEP 3 - Bind Data to SQL Queries
I use golang's database/sql.DB to take my formvals and bind them to my Query and Exec functions like this Query("SELECT * FROM tblUser WHERE user_id = ?, user_birthday > ? ",formvals["user_id"][0], jsonUserInfo["birthday"]). I never care about the data types I'm supplying as arguments to be bound, so they're all probably strings. I trust the validation in the step immediately above has determined they are acceptable for SQL use.
STEP 4 - Bind SQL results to []map[string]interface{}{}
I Scan() the results of my queries into a sqlResult := []map[string]interface{}{} because I don't care if the value types are null, strings, float, ints or whatever. So the schema of an sqlResult might look like:
sqlResult =>
[0] {
"user_id":"1"
"user_name":"Bob Smith"
"age":"45"
"weight":"34.22"
},
[1] {
"user_id":"2"
"user_name":"Jane Do"
"age":nil
"weight":"22.22"
}
I wrote my own eager load function so that I can bind more information like so EagerLoad("tblAddress", "JOIN ON tblAddress.user_id",&sqlResult) which then populates sqlResult with more information of the type []map[string]interface{}{} such that it looks like this:
sqlResult =>
[0] {
"user_id":"1"
"user_name":"Bob Smith"
"age":"45"
"weight":"34.22"
"addresses"=>
[0] {
"type":"home"
"address1":"56 Front Street West"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
[1] {
"type":"work"
"address1":"5 Kennedy Avenue"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
},
[1] {
"user_id":"2"
"user_name":"Jane Do"
"age":nil
"weight":"22.22"
"addresses"=>
[0] {
"type":"home"
"address1":"56 Front Street West"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
}
STEP 5 - JSON Marshal and send HTTP Response
then I do a http.ResponseWriter.Write(json.Marshal(sqlResult)) and output data for my REST API
Recently, I've been revisiting articles with code samples that use structs in places I would have used map[string]interface{}. For example, I wanted to refactor Step 2 with a more standard approach that other golang developers would use. So I found this https://godoc.org/gopkg.in/go-playground/validator.v9, except all it's examples are with structs . I also noticed that most blogs that talk about database/sql scan their SQL results into typed variables or structs with typed properties, as opposed to my Step 4 which just puts everything into map[string]interface{}
Hence, i started writing this question. I feel the map[string]interface{} is so useful because majority of the time,I don't really care what the data is and it gives me to the freedom in Step 4 to construct any data schema on the fly before I dump it as JSON http response. I do all this with as little code verbosity as possible. But this means my code is not as ready to leverage Go's validation tools, and it doesn't seem to comply with the golang community's way of doing things.
So my question is, what do other golang developers do with regards to Step 2 and Step 4? Especially in Step 4...do Golang developers really encourage specifying the schema of the data through structs and strongly typed properties? Do they also specify structs with strongly typed properties along with every eager loading call they make? Doesn't that seem like so much more code verbosity?
It really depends on the requirements just like you have said you don't require to process the json it comes from the request or from the sql results. Then you can easily unmarshal into interface{}. And marshal the json coming from sql results.
For Step 2
Golang has library which works on validation of structs used to unmarshal json with tags for the fields inside.
https://github.com/go-playground/validator
type Test struct {
Field `validate:"max=10,min=1"`
}
// max will be checked then min
you can also go to godoc for validation library. It is very good implementation of validation for json values using struct tags.
For STEP 4
Most of the times, We use structs if we know the format and data of our JSON. Because it provides us more control over the data types and other functionality. For example if you wants to empty a JSON feild if you don't require it in your JSON. You should use struct with _ json tag.
Now you have said that you don't care if the result coming from sql is empty or not. But if you do it again comes to using struct. You can scan the result into struct with sql.NullTypes. With that also you can provide json tag for omitempty if you wants to omit the json object when marshaling the data when sending a response.
Struct values encode as JSON objects. Each exported struct field
becomes a member of the object, using the field name as the object
key, unless the field is omitted for one of the reasons given below.
The encoding of each struct field can be customized by the format
string stored under the "json" key in the struct field's tag. The
format string gives the name of the field, possibly followed by a
comma-separated list of options. The name may be empty in order to
specify options without overriding the default field name.
The "omitempty" option specifies that the field should be omitted from
the encoding if the field has an empty value, defined as false, 0, a
nil pointer, a nil interface value, and any empty array, slice, map,
or string.
As a special case, if the field tag is "-", the field is always
omitted. Note that a field with name "-" can still be generated using
the tag "-,".
Example of json tags
// Field appears in JSON as key "myName".
Field int `json:"myName"`
// Field appears in JSON as key "myName" and
// the field is omitted from the object if its value is empty,
// as defined above.
Field int `json:"myName,omitempty"`
// Field appears in JSON as key "Field" (the default), but
// the field is skipped if empty.
// Note the leading comma.
Field int `json:",omitempty"`
// Field is ignored by this package.
Field int `json:"-"`
// Field appears in JSON as key "-".
Field int `json:"-,"`
As you can analyze from above information given in Golang spec for json marshal. Struct provide so much control over json. That's why Golang developer most probably use structs.
Now on using map[string]interface{} you should use it when you don't the structure of your json coming from the server or the types of fields. Most Golang developers stick to structs wherever they can.

Apache Drill view.drill json syntax to describe complex data types

Does the JSON fields array in the view.drill file created by create view support describing an ARRAY, STRUCT type?
Want to define views of PARQUET files so they are described by the JDBC driver (DatabaseMetadata.getTables, getColumns...). Want to project the columns of the PARQUET file as the actual type (i.e. INTEGER) versus leaving Drill to describe them as the ANY type. Unfortunately, that requires CAST ( as ) and the target types supported by CAST does not include ARRAY or STRUCT.
Hence, if the view.drill file can be externally defined without requiring a CAST and supported complex types would build them programmatically.

Julia function argument type def

I faced some problem regarding defining the type of arguments for a function in Julia.
From one hand, the code would be faster to run if the type is defined: for example Int64 for an integer number. On the other hand, passing a simple number to the function would needs type cast every time I call the function, e.g. by calling:
convert(a, Int64)
That seems to be an overkill.
what is the advice for good style?
With Julia, it's not generally true that specifying the type for a function's argument(s) will make it faster. If the argument has no type (i.e. Any), or has just an abstract type (for example, Integer instead of Int64, Julia can generate methods for whatever concrete types are actually used to call the function, instead of having to do any conversion.
BTW, the syntax is actually convert(Int64, a), the type you wish to convert to comes first.

Parsing JSON objects of unknown type with AutoBean on GWT

My server returns a list of objects in JSON. They might be Cats or Dogs, for example.
When I know that they'll all be Cats, I can set the AutoBeanCodex to work easily. When I don't know what types they are, though... what should I do?
I could give all of my entities a type field, but then I'd have to parse each entity before passing it to the AutoBeanCodex, which borders on defeating the point. What other options do I have?
Just got to play with this the other day, and fought it for a few hours, trying #Category methods and others, until I found this: You can create a property of type Splittable, which represents the underlying transport type that has some encoding for booleans/Strings/Lists/Maps. In my case, I know some enveloping type that goes over the wire at design time, and based on some other property, some other field can be any number of other autobeans.
You don't even need to know the type of the other bean at compile time, you could get values out using Splittable's methods, but if using autobeans anyway, it is nice to define the data that is wrapped.
interface Envelope {
String getStatus();
String getDataType();
Splittable getData();
}
(Setters might be desired if you sending data as well as recieving - encoding a bean into a `Splittable to send it in an envelope is even easier than decoding it)
The JSON sent over the wire is decoded (probably using AutoBeanCodex) into the Envelope type, and after you've decided what type must be coming out of the getData() method, call something like this to get the nested object out
SpecificNestedBean bean = AutoBeanCodex.decode(factory,
SpecificNestedBean.class,
env.getData()).as();
The Envelope type and the nested types (in factory above) don't even need to be the same AutoBeanFactory type. This could allow you to abstract out the reading/writing of envelopes from the generic transport instance, and use a specific factory for each dataType string property to decode the data's model (and nested models).