How to avoid serialization of Postgres JSON result in Sequel

In a web server, I'm running a query in Postgres that performs a SELECT of a json_build_object over an aggregation (json_agg). I'm using Sequel to fetch this data, and the value I get back is an instance of Sequel::Postgres::JSONHash. I then call to_json on it in order to send it to the web client.
The output of this query is very large, and I think it would be more performant if I could grab the raw JSON response from Postgres directly and send it to the client, instead of having Sequel parse it into a JSONHash and then converting it back to JSON.
How could I do it?

Cast the json_build_object from json to text in the query: SELECT json_build_object(...)::text ...
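For example, a minimal sketch in Sequel (the connection URL and the items table are placeholders, not from the question). The ::text cast makes Postgres hand back a plain string, so Sequel's pg_json extension never builds a JSONHash and the value can be passed straight to the client:

require 'sequel'

DB = Sequel.connect('postgres://user:pass@localhost/mydb') # placeholder URL

# ::text makes Postgres return plain text, so Sequel skips JSONHash parsing
raw_json = DB.fetch(<<~SQL).single_value
  SELECT json_build_object('items', json_agg(i))::text AS payload
  FROM items i
SQL

# raw_json is already a JSON string; send it to the client as-is,
# with no to_json round trip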

Related

How to get json output in nvarchar, in SQL Server 2016

I used this query to convert the data to JSON:
SELECT * FROM tbl_subject FOR JSON AUTO
I am getting the response as a link in the results grid, and when I click the response it opens as an XML file.
How do I change this XML to the nvarchar data type?
First of all, in SQL Server, JSON is not a data type in itself (XML is), but just a string representation.
What you see is due to how SQL Server Management Studio handles JSON when it is returned as a resultset. It is NOT XML; SSMS just slaps an .xml file type onto it and prettifies the result. If you were to change how results are returned (Tools|Options|Query Results|SQL Server|General), you'd see something like this:
JSON_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------------------------
[{"RowID":1,"UniversityID":1,"AcademicID":4,"CourseID":1}]
But this is just how SSMS presents the result. If you were to execute your statement from an application, the result would be of a string data type.
You could also change how you execute the query, to something like so:
DECLARE @nres nvarchar(max) = (SELECT * FROM dbo.tbl_subject FOR JSON AUTO)
SELECT @nres
Hope this helps!

Validate JSON data loaded into Hive warehouse

I have JSON files with a volume of approximately 500 TB, and I have loaded the complete set into the Hive data warehouse.
How would I validate or test the data that was loaded into the Hive warehouse? What should my testing strategy be?
The client wants us to validate the JSON data: is the data loaded into Hive correct or not? Is anything missing? If so, which field?
Please help.
How is your data being stored in Hive tables?
One option is to create a Hive UDF that receives the JSON string, validates the data, and returns another string with the error message, or an empty string if the JSON string is well formed.
Here is a Hive UDF tutorial: http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html
With the Hive UDF in place you can execute queries like:
select strjson, validateJson(strjson) from jsonTable where validateJson(strjson) != "";
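For reference, here is a minimal sketch of such a UDF (the class name and the choice of Jackson for parsing are illustrative, not from the tutorial). It returns an empty string for well-formed JSON and the parser's error message otherwise:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ValidateJsonUDF extends UDF {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public Text evaluate(Text json) {
        if (json == null) {
            return new Text("null input");
        }
        try {
            MAPPER.readTree(json.toString()); // throws on malformed JSON
            return new Text("");              // empty string means valid
        } catch (Exception e) {
            return new Text(e.getMessage());
        }
    }
}

Package it into a jar, then register it with ADD JAR and CREATE TEMPORARY FUNCTION validateJson AS 'ValidateJsonUDF'; before running the query above.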

Apache Drill Query PostgreSQL Json

I am trying to query a jsonb field in PostgreSQL from Drill and read it as if it were coming from a JSON storage type, but I am running into trouble. I can convert from text to JSON, but I cannot seem to query the JSON object; at least, I think I can convert it to JSON. My goal is to avoid reading through millions of uneven JSON objects from PostgreSQL one by one, and instead to perform joins and similar operations with text files such as CSV and XML files. Is there a way to query the text field as if it were coming from a JSON storage type without writing large files to disk?
The goal is to generate results implicitly, which neither PostgreSQL nor Pentaho does, and to integrate these data sets with others of any format.
Attempt:
SELECT * FROM (SELECT convert_to(json_field,'JSON') as data FROM postgres.mytable) as q1
Sample Result:
[B@7106dd
Attempt to access an existing field that should be in any JSON object:
SELECT data[field] FROM (SELECT convert_to(json_field,'JSON') as data FROM postgres.mytable) as q1
Result:
null
Attempting to do anything with jsonb results in a Null Pointer Error.

Spark SQL on PostgreSQL JSONB data

The current PostgreSQL version (9.4) supports the json and jsonb data types, as described in http://www.postgresql.org/docs/9.4/static/datatype-json.html
For instance, JSON data stored as jsonb can be queried via SQL query:
SELECT jdoc->'guid', jdoc->'name'
FROM api
WHERE jdoc @> '{"company": "Magnafone"}';
As a Spark user, is it possible to send this query to PostgreSQL via JDBC and receive the result as a DataFrame?
What I have tried so far:
val url = "jdbc:postgresql://localhost:5432/mydb?user=foo&password=bar"
val df = sqlContext.load("jdbc",
Map("url"->url,"dbtable"->"mydb", "driver"->"org.postgresql.Driver"))
df.registerTempTable("table")
sqlContext.sql("SELECT data->'myid' FROM table")
But sqlContext.sql() was unable to understand the data->'myid' part in the SQL.
It is not possible to query json / jsonb fields dynamically through the Spark DataFrame API. Once the data is fetched into Spark it is converted to a string and is no longer a queryable structure (see SPARK-7869).
As you've already discovered, you can use the dbtable / table arguments to pass a subquery directly to the source and use it to extract the fields of interest, as sketched below. Pretty much the same rule applies to any non-standard type, to calling stored procedures, and to any other extensions.
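For example, a minimal sketch of that approach against the api table from the question (the alias tmp and the column names are illustrative). The ->> operator extracts jsonb fields as text inside Postgres, so the DataFrame only ever sees plain string columns:

val url = "jdbc:postgresql://localhost:5432/mydb?user=foo&password=bar"

// The subquery (with an alias) goes where a table name would normally go
val query = """(SELECT jdoc ->> 'guid' AS guid, jdoc ->> 'name' AS name
                FROM api
                WHERE jdoc @> '{"company": "Magnafone"}') AS tmp"""

val df = sqlContext.load("jdbc",
  Map("url" -> url, "dbtable" -> query, "driver" -> "org.postgresql.Driver"))
df.show()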

How to serialise a spark sql row when it contains a comma

I am using Spark Jobserver https://github.com/spark-jobserver/spark-jobserver and Apache Spark for some analytic processing.
I am receiving back the following structure from jobserver when a job finishes
"status": "OK",
"result": [
"[17799.91015625,null,hello there how areyou?]",
"[50000.0,null,Hi, im fine]",
"[0.0,null,All good]"
]
The result doesn't contain valid JSON, as explained here:
https://github.com/spark-jobserver/spark-jobserver/issues/176
So I'm trying to convert the returned structure into a JSON structure. However, I can't simply insert ' (single quotes) into the result string based on the comma delimiter, as the result sometimes contains commas itself.
How can I convert a Spark SQL row into a JSON object in the above situation?
I actually found a better way in the end: from Spark 1.3.0 onwards you can use .toJSON on a DataFrame to convert it to JSON:
df.toJSON.collect()
To output a DataFrame's schema as JSON you can use:
df.schema.json
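Putting the two together for the jobserver case, a short sketch (assuming df is the DataFrame the job computed). Each collected element is a self-contained JSON object, so commas inside field values no longer break the output:

val jsonRows: Array[String] = df.toJSON.collect() // one JSON object per row
val body = jsonRows.mkString("[", ",", "]")       // a valid JSON array string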