Apache Drill view.drill JSON syntax to describe complex data types

Does the JSON fields array in the view.drill file created by CREATE VIEW support describing ARRAY or STRUCT types?
I want to define views of Parquet files so that they are described by the JDBC driver (DatabaseMetaData.getTables, getColumns, ...). I want to project the columns of the Parquet file as their actual types (e.g. INTEGER) rather than leaving Drill to describe them as the ANY type. Unfortunately, that requires CAST(... AS ...), and the target types supported by CAST do not include ARRAY or STRUCT.
Hence the question: if the view.drill file could be defined externally, without requiring a CAST, and it supported complex types, I would build these files programmatically.

Related

Spark from_avro's second argument is a constant string; is there any way to obtain the schema string from a column of each record?

Suppose we are developing an application that pulls Avro records from a source stream (e.g. Kafka or Kinesis), parses them into JSON, and then further processes that JSON with additional transformations. Further assume these records can have a varying schema, which we can look up and fetch from a registry.
We would like to use Spark's built-in from_avro function, but it is pretty clear that from_avro wants you to hard-code a fixed schema into your code. It doesn't seem to allow the schema to vary from one incoming row to the next.
That sort of makes sense if you are parsing the Avro into the internal row format, because one would need a consistent structure for the DataFrame. But what if we wanted something like from_avro that grabbed the bytes from one column in the row, grabbed the string representation of the Avro schema from another column in the same row, and then parsed that Avro into a JSON string?
Does such a built-in method exist? Or is such functionality available in a third-party library?
Thanks!

How can I convert XML to JSON in Oracle?

If I have
<xml><name>himanshu</name><age>24</age></xml>
How can I convert it to
{"name":"himanshu","age":24}
Thanks.
In Oracle 12.2 you should be able to use:
SELECT JSON_OBJECTAGG( id VALUE text )
FROM XMLTABLE(
'/xml/*'
PASSING XMLTYPE( '<xml><name>himanshu</name><age>24</age></xml>' )
COLUMNS id VARCHAR2(200) PATH './name()',
text VARCHAR2(200) PATH './text()'
);
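I'm not on a 12c system, so this is untested. Because both columns are VARCHAR2, every value comes out as a JSON string, so age would be "24" rather than 24.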
In earlier versions you can write a Java function [1] [2] using one of the many Java JSON packages to perform the conversion and then load it into the database using the loadjava utility (or a CREATE JAVA statement) and then use that.
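Here is a minimal sketch of the Java-function approach for versions before 12.2. It uses only the JDK's built-in DOM parser rather than a JSON package. The class name FlatXmlToJson is hypothetical. It handles only a flat, one-level document like the one in the question. Loading it with loadjava and publishing it through a call spec is not shown, so check which Java version your database's embedded JVM supports.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FlatXmlToJson {

    // Converts a one-level document such as <xml><name>x</name><age>24</age></xml>
    // into a JSON object. Nested elements, attributes and repeated names are not handled.
    static String convert(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        StringBuilder json = new StringBuilder("{");
        NodeList children = root.getChildNodes();
        boolean first = true;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text, comments, processing instructions
            }
            if (!first) {
                json.append(',');
            }
            first = false;
            String text = child.getTextContent().trim();
            json.append(quote(child.getNodeName())).append(':');
            // Heuristic: integer-looking text becomes a JSON number, anything else a string.
            json.append(text.matches("-?(0|[1-9][0-9]*)") ? text : quote(text));
        }
        return json.append('}').toString();
    }

    // JSON string escaping for double quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<xml><name>himanshu</name><age>24</age></xml>"));
        // prints {"name":"himanshu","age":24}
    }
}
```

Unlike the 12.2 SQL above, this sketch emits 24 as a JSON number, because of the integer heuristic.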
You can use the XML to JSON filter to convert an XML document to a JavaScript Object Notation (JSON) document. For details on the mapping conventions used, see: GitHub - Mapping convention
Configuration
To configure the XML to JSON filter, specify the following fields:
Name:
Enter a suitable name to reflect the role of this filter.
Automatically insert JSON array boundaries:
Select this option to attempt to automatically reconstruct JSON arrays from the incoming XML document. This option is selected by default.
Note:
If the incoming XML document includes the processing instruction, the JSON array is reconstructed regardless of this option setting. If the XML document does not contain that processing instruction and this option is selected, the filter attempts to guess what should be part of the array by examining the element names.
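The filter's exact guessing rules are not given here. As an assumption, a simple way of "examining the element names" is to treat sibling elements that share a name as one JSON array. The Java sketch below applies that rule. The class name XmlArrayGuess is hypothetical, and leaf values are always emitted as strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlArrayGuess {

    static String convert(String xml) {
        try {
            return toJson(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Leaf elements become strings, elements with child elements become objects,
    // and sibling elements sharing a name are guessed to form one JSON array.
    static String toJson(Element element) {
        Map<String, List<Element>> byName = new LinkedHashMap<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                byName.computeIfAbsent(child.getNodeName(), k -> new ArrayList<>())
                      .add((Element) child);
            }
        }
        if (byName.isEmpty()) {
            return quote(element.getTextContent());
        }
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, List<Element>> entry : byName.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append(quote(entry.getKey())).append(':');
            List<Element> same = entry.getValue();
            if (same.size() == 1) {
                json.append(toJson(same.get(0)));
            } else {
                json.append('[');
                for (int j = 0; j < same.size(); j++) {
                    if (j > 0) {
                        json.append(',');
                    }
                    json.append(toJson(same.get(j)));
                }
                json.append(']');
            }
        }
        return json.append('}').toString();
    }

    // JSON string escaping for double quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<order><item>a</item><item>b</item><id>7</id></order>"));
        // prints {"item":["a","b"],"id":"7"}
    }
}
```

Under this rule a lone <item> still comes out as a scalar rather than a one-element array. That is exactly the ambiguity an explicit hint, such as the processing instruction mentioned in the note, resolves.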

Storing json, jsonb, hstore, xml, enum, ipaddr, etc fails with "column "x" is of type json but expression is of type character varying"

When using PostgreSQL to store data in a field of a string-like validated type, like xml, json, jsonb, ltree, etc., the INSERT or UPDATE fails with an error like:
column "the_col" is of type json but expression is of type character varying
... or
column "the_col" is of type json but expression is of type text
Why? What can I do about it?
I'm using JDBC (PgJDBC).
This happens via Hibernate, JPA, and all sorts of other abstraction layers.
The "standard" advice from the PostgreSQL team is to use a CAST in the SQL. This is not useful for people using query generators or ORMs, especially if those systems don't have explicit support for database types like json, so they're mapped via String in the application.
Some ORMs permit the implementation of custom type handlers, but I don't really want to write a custom handler for each data type for each ORM, e.g. json on Hibernate, json on EclipseLink, json on OpenJPA, xml on Hibernate, ... etc. There's no JPA2 SPI for writing a generic custom type handler. I'm looking for a general solution.
Why it happens
The problem is that PostgreSQL is overly strict about casts between text and non-text data types. It will not allow an implicit cast (one without a CAST or :: in the SQL) from a text type like text or varchar (character varying) to a text-like non-text type like json, xml, etc.
The PgJDBC driver declares the parameter's data type as varchar when you call setString to assign a parameter. If the database type of the column, function argument, etc. is not actually varchar or text but another type, you get a type error. This is also true of quite a lot of other drivers and ORMs.
PgJDBC: stringtype=unspecified
The best option when using PgJDBC is generally to pass the parameter stringtype=unspecified. This overrides the default behaviour of passing setString values as varchar and instead leaves it up to the database to "guess" their data type. In almost all cases this does exactly what you want, passing the string to the input validator for the type you want to store.
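Here is a minimal Java sketch of passing that parameter. stringtype is a PgJDBC connection property. It can also be appended to the URL, e.g. jdbc:postgresql://localhost:5432/mydb?stringtype=unspecified. The docs table, the mydb database and the class name are hypothetical. insertJson also needs the PgJDBC driver on the classpath and a running server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Properties;

final class UnspecifiedStringType {

    // Connection properties asking PgJDBC to send setString() values with no
    // declared type, so the server infers json/xml/... from the target column.
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("stringtype", "unspecified");
        return props;
    }

    // Hypothetical table: CREATE TABLE docs (id int PRIMARY KEY, body json).
    // Needs the PgJDBC driver on the classpath and a reachable server,
    // e.g. url = "jdbc:postgresql://localhost:5432/mydb".
    static void insertJson(String url, String user, String password, String json)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, connectionProperties(user, password));
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO docs (id, body) VALUES (?, ?)")) {
            ps.setInt(1, 1);
            ps.setString(2, json); // no CAST or ::json needed in the SQL
            ps.executeUpdate();
        }
    }
}
```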
All: CREATE CAST ... WITH FUNCTION ...
You can instead use CREATE CAST to define a data-type-specific cast that permits this on a type-by-type basis, but this can have side effects elsewhere. If you do this, do not use WITHOUT FUNCTION casts; they bypass type validation and result in errors. You must use the input/validation function for the data type. CREATE CAST is suitable for users of other database drivers that have no way to stop the driver from specifying the type for string/text parameters.
e.g.
CREATE OR REPLACE FUNCTION json_intext(text) RETURNS json AS $$
SELECT json_in($1::cstring);
$$ LANGUAGE SQL IMMUTABLE;
CREATE CAST (text AS json)
WITH FUNCTION json_intext(text) AS IMPLICIT;
All: Custom type handler
If your ORM permits, you can implement a custom type handler for the data type and that specific ORM. This is mostly useful when you're using a native Java type that maps well to the PostgreSQL type, rather than using String, though it can also work if your ORM lets you specify type handlers using annotations, etc.
Methods for implementing custom type handlers are driver-, language- and ORM-specific. Here's an example for Java and Hibernate for json.
PgJDBC: type handler using PGObject
If you're using a native Java type in Java, you can extend PGObject to provide a PgJDBC type mapping for your type. You will probably also need to implement an ORM-specific type handler to use your PGObject, since most ORMs will just call toString on types they don't recognise. This is the preferred way to map complex types between Java and PostgreSQL, but also the most complex.
PgJDBC: Type handler using setObject(int, Object)
If you're using String to hold the value in Java, rather than a more specific type, you can invoke the JDBC method setObject(int, Object) to store the string with no particular data type specified. The JDBC driver will send the string representation, and the database will infer the type from the destination column type or function argument type.
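Here is a sketch of binding a String without a declared type. It uses the three-argument setObject overload with java.sql.Types.OTHER, which is the form commonly used with PgJDBC for this. Whether the two-argument setObject(int, Object) behaves the same may depend on the driver version, so verify against yours. The recorder method is a hypothetical stand-in built with java.lang.reflect.Proxy, so the snippet runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UntypedStringBinding {

    // Binds a String holding json/xml/... without declaring it as varchar:
    // java.sql.Types.OTHER asks the driver to leave the type to the server.
    static void bindUntyped(PreparedStatement ps, int index, String value) throws SQLException {
        ps.setObject(index, value, Types.OTHER);
    }

    // Test double: a PreparedStatement that only records method names and arguments.
    // Methods returning primitives must not be called on it (null cannot be unboxed).
    static PreparedStatement recorder(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
                UntypedStringBinding.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, methodArgs) -> {
                    calls.add(method.getName() + Arrays.toString(methodArgs));
                    return null;
                });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        bindUntyped(recorder(calls), 2, "{\"a\":1}");
        System.out.println(calls); // [setObject[2, {"a":1}, 1111]]
    }
}
```

With a real connection you would call bindUntyped on the PreparedStatement returned by prepareStatement instead of on the recorder.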
See also
Questions:
Mapping postgreSQL JSON column to Hibernate value type
Are JPA (EclipseLink) custom types possible?
External:
http://www.postgresql.org/message-id/54096082.1090009#2ndquadrant.com
https://github.com/pgjdbc/pgjdbc/issues/265
http://www.pateldenish.com/2013/05/inserting-json-data-into-postgres-using-jdbc-driver.html

How to convert between BSON and JSON, especially for those special objects?

I am not asking for a library; I am writing bson_to_json and json_to_bson myself.
Here is the BSON specification.
For regular doubles, documents, arrays and strings, converting between BSON and JSON is easy.
However, some special types are harder:
Timestamp and UTC datetime: when converting from JSON to BSON, how can I know that a value is a timestamp or a UTC datetime?
Regex (string, string) and JavaScript code with scope (string, doc): their structures have multiple parts, so how can I represent them in JSON?
Binary data (generic, function, etc.): how can I represent the binary subtype in JSON?
int32 and int64: how can I represent them in JSON so that the BSON side knows which is 32-bit and which is 64-bit?
Thanks
JSON cannot express these typed objects directly, so you will need to decide how you want the stringified version of each BSON field type to be represented in the output of your OCaml driver.
Some of the data types are easy. Timestamp is not needed, since it is used only internally for sharding. JavaScript blocks are best left out, because they are best used only within system.js as saved functions for MapReduce (MR) jobs.
You also have to consider direction. Some field types are "in" types, used in input documents that are serialised to BSON. Others are "out" types, part of output documents that need deserialising from BSON into JSON.
Regex will most likely be a field type you send down. As such, you will need to serialise your OCaml object from the /d/ig PCRE representation to the BSON equivalent of {$regex: 'd', $options: 'ig'}.
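Here is a small Java sketch of that conversion. It splits a /pattern/flags literal at its last slash and emits the two parts. The class name is hypothetical, and the key names simply follow the {$regex, $options} shape described above.

```java
final class RegexLiteral {

    // Splits a /pattern/flags literal at its last slash and renders the parts
    // as {"$regex": pattern, "$options": flags}.
    static String toRegexDocument(String literal) {
        int close = literal.lastIndexOf('/');
        if (!literal.startsWith("/") || close <= 0) {
            throw new IllegalArgumentException("not a /pattern/flags literal: " + literal);
        }
        String pattern = literal.substring(1, close);
        String options = literal.substring(close + 1);
        return "{\"$regex\":" + quote(pattern) + ",\"$options\":" + quote(options) + "}";
    }

    // Minimal JSON escaping: backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toRegexDocument("/d/ig"));
        // prints {"$regex":"d","$options":"ig"}
    }
}
```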
Dates can be represented in JSON either as an ISODate string or as a timestamp. The output will be something like {$sec:556675,$usec:6787}, and you can convert $sec to whatever display you need.
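Here is a Java sketch for the date case. It assumes $sec counts seconds and $usec counts microseconds since the Unix epoch. toIso renders the pair as an ISO-8601 UTC string. For the JSON-to-BSON direction, toBsonMillis converts the pair to the int64 milliseconds-since-epoch value that the BSON UTC datetime type stores; sub-millisecond precision is dropped. The class name is hypothetical.

```java
import java.time.Instant;

final class SecUsecDate {

    // ISO-8601 UTC rendering of a {$sec, $usec} pair.
    static String toIso(long sec, long usec) {
        return Instant.ofEpochSecond(sec, usec * 1_000L).toString();
    }

    // BSON UTC datetime value (int64 milliseconds since the Unix epoch);
    // microseconds below one millisecond are truncated.
    static long toBsonMillis(long sec, long usec) {
        return sec * 1_000L + usec / 1_000L;
    }

    public static void main(String[] args) {
        System.out.println(toIso(556675, 6787));        // 1970-01-07T10:37:55.006787Z
        System.out.println(toBsonMillis(556675, 6787)); // 556675006
    }
}
```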
Binary data can be represented in JSON by taking the data property (if I remember right) from the output document, encoding it as base64, and storing it as a string in the field.
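Here is a Java sketch that also keeps the binary subtype the question asks about. The bytes go into a base64 string, and the one-byte BSON subtype (0x00 generic, 0x01 function, and so on) goes into a two-digit hex string. The $binary/$type wrapper keys are an assumption modeled on MongoDB's legacy Extended JSON, not something the BSON spec defines. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryField {

    // Encodes BSON binary data as {"$binary": base64, "$type": two hex digits}.
    static String toJson(byte[] data, int subtype) {
        if (subtype < 0 || subtype > 0xFF) {
            throw new IllegalArgumentException("BSON binary subtype is a single byte");
        }
        String base64 = Base64.getEncoder().encodeToString(data);
        return String.format("{\"$binary\":\"%s\",\"$type\":\"%02x\"}", base64, subtype);
    }

    // Reverse direction: the base64 text back to the raw bytes.
    static byte[] bytesFromBase64(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        byte[] data = "hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(toJson(data, 0x00)); // {"$binary":"aGk=","$type":"00"}
    }
}
```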
JSON has no way to distinguish int32 from int64. The only hint is magnitude: values outside the int32 range (greater than 2147483647 or less than -2147483648) must be int64. So I am unsure whether you can keep the two types distinct.
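One workable convention, sketched in Java, has two halves. Going into BSON, pick the narrowest type that holds the value (element type 0x10 for int32, 0x12 for int64). Coming out of BSON, wrap values that must stay int64 in a marker object so the width survives a round trip. The {"$numberLong": "..."} marker is an assumption borrowed from MongoDB's Extended JSON, and the class name is hypothetical.

```java
final class IntWidth {

    static final int BSON_INT32 = 0x10;
    static final int BSON_INT64 = 0x12;

    // JSON numbers carry no width, so choose the narrowest BSON type that holds the value.
    static int bsonTypeFor(long value) {
        boolean fitsInt32 = value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
        return fitsInt32 ? BSON_INT32 : BSON_INT64;
    }

    // BSON -> JSON: an int64 is wrapped in a marker object so that even small
    // values (e.g. 5L) come back as int64; an int32 is written as a bare number.
    static String toJson(long value, int bsonType) {
        return bsonType == BSON_INT64
                ? "{\"$numberLong\":\"" + value + "\"}"
                : Long.toString(value);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(bsonTypeFor(2147483647L))); // 10
        System.out.println(Integer.toHexString(bsonTypeFor(2147483648L))); // 12
        System.out.println(toJson(5L, BSON_INT64));                       // {"$numberLong":"5"}
    }
}
```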
That should help get you started.

MySQL UDF for working with json?

Are there any good MySQL UDFs for dealing with JSON data that support retrieving a particular value by dot-notation key (e.g. json_get('foo.bar.baz')) as well as setting the value of a particular key (e.g. json_set('foo.bar.baz', 'value'))?
I found http://www.mysqludf.org/lib_mysqludf_json/, but it seems only to create JSON data structures from non-JSON column values, rather than interacting with JSON column values.
This UDF is able to parse JSON and return the value of an attribute:
https://github.com/kazuho/mysql_json
This other one too: https://github.com/webaroo/mysql-json-udf