Grails - Rendered JSON file too large for client-side operations

I have a Grails controller that renders JSON as follows, to be further used by d3 on my front end (.gsp file):
// at the top of the controller file
import grails.converters.JSON
import groovy.sql.Sql

def dataSource

def salesjson = {
    def sql = new Sql(dataSource)
    def rows = sql.rows("select date_hour, mv, device, department, browser, platform, total_revenue as metric, total_revenue_ly as metric_ly from composite")
    sql.close()
    render rows as JSON
}
I use this response to render my crossfiltered dc-charts on the front end. The problem is that queries such as the one above return a large JSON object, and my client stops working and hangs (100 MB plus on the client side, and still loading!).
I can't think of any alternative to this method that would reduce the payload size (maybe rendering it as a CSV string? Would that help much? If so, how do I go about it? I currently have about 600,000 rows in my JSON).
What other options do I have?

I'd suggest using something like MessagePack to create smaller, more compact binary representations of your JSON. There are a few other options out there, but I think this one is probably the most JVM- and JavaScript-friendly.
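As a rough illustration of the size difference, here is a minimal sketch using the Python msgpack package purely for comparison; on the JVM you would use a MessagePack library for Java/Groovy, and something like msgpack-lite on the JavaScript side. The sample rows are made up for the illustration:

import json
import msgpack  # pip install msgpack; used here only to compare payload sizes

# stand-in for the rows returned by the query
rows = [{"date_hour": "2015-01-01 10", "device": "mobile", "metric": 123.45}] * 1000

as_json = json.dumps(rows).encode("utf-8")
as_msgpack = msgpack.packb(rows)

print(len(as_json), len(as_msgpack))  # the MessagePack payload is typically smaller than the JSON text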

Related

Are there size limits for Flutter json.decode?

I have a dictionary file with 200,000 items in it.
I have a Dictionary model which matches the SQLite db and has the proper methods.
If I try to parse the whole file, it seems to hang. If I do 8,000 items, it parses quite quickly. Is there a size limit, or is it just because there might be some corrupted data somewhere? This JSON was exported from the SQLite db as pretty-printed JSON, so I would imagine it was done correctly. It also works fine with the first 8,000 items.
String peuJson = await getPeuJson();
List<Dictionary> dicts = (json.decode(peuJson) as List)
.map((i) => Dictionary.fromJson(i))
.toList();
JSON is similar to other data formats like XML - if you need to transmit more data, you just send more data. There's no inherent size limitation to the JSON request. Any limitation would be set by the server parsing the request.

Merging and/or Reading 88 JSON Files into a DataFrame - different data types

I basically have a procedure where I make multiple calls to an API, take a token from within the JSON that is returned, and pass that back to a function to call the API again and get a "paginated" file.
In total I have to call and download 88 JSON files totalling 758 MB. The JSON files are all formatted the same way and have the same "schema", or at least they should. I have tried reading each JSON file into a data frame after it has been downloaded, and then unioning that data frame to a master data frame, so essentially I'll have one big data frame with all 88 JSON files read into it.
However, the problem I encounter is that at roughly file 66 the system (Python/Databricks/Spark) decides to change the data type of a field. It is always a string, and I'm guessing that when a value actually appears in that field it changes to a boolean. The problem then is that the unionByName fails because of the different data types.
What is the best way for me to resolve this? I thought about using "extend" to merge all the JSON files into one big file, but reading a 758 MB JSON file would be a huge undertaking.
Could the other solution be to explicitly set the schema that the JSON file is read into so that it is always the same type?
If you know the attributes of those files, you can define the schema before reading them and create an empty df with that schema, so you can do a unionByName with allowMissingColumns=True:
something like:
from pyspark.sql.types import *

my_schema = StructType([
    StructField('file_name', StringType(), True),
    StructField('id', LongType(), True),
    StructField('dataset_name', StringType(), True),
    StructField('snapshotdate', TimestampType(), True)
])

# empty DataFrame with the target schema
output = sqlContext.createDataFrame(sc.emptyRDD(), my_schema)

df_json = spark.read.json("...your JSON file...")  # placeholder path

output = output.unionByName(df_json, allowMissingColumns=True)
I'm not sure if this is exactly what you are looking for, but I hope it helps.
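As for your last question: yes, passing the schema explicitly at read time should also work. A minimal sketch, reusing the my_schema defined above and a placeholder path/glob for wherever the downloaded files live, would be to read all 88 files in one pass:

# reusing my_schema from above; the path/glob is a placeholder
df_all = (spark.read
    .schema(my_schema)   # explicit schema, so Spark never infers the flaky field as boolean
    .json("path/to/downloaded/files/*.json"))

With an explicit schema there is nothing to infer, so the string/boolean flip that breaks unionByName cannot happen.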

Writing data from MATLAB to a Firebase database

I am using MATLAB to write data to Firebase, with the following lines of code:
thingSpeakURL = 'https://hybrid-cabinet-265907.firebaseio.com/Ship A/Time Stamp.json';
lat = num2str(42);
lon = num2str(42);
data = struct('lat',lat,'lon',lon);
webwrite(thingSpeakURL,data)
Data is successfully written to Firebase, but it makes my original JSON data a child of a random string generated at run time.
For example, my JSON string is {lat: '40', lon: '40'}, but instead it creates a random string, let's say "Mxkkllslsll-1112", makes that random string the parent, and writes something like {"Mxkkllslsll-1112": {lat: '40', lon: '40'}} to the Firebase database.
Please have a look at the following image. It shows that for Ship A, the data I have written from MATLAB is not written properly (I am facing the problem discussed above). I want it to look like the data written for Ship B.
I want to write the data without making any random string as a parent. Kindly assist me in that.
This is because webwrite uses the HTTP POST method by default.
As shown in the Firebase Realtime Database REST API documentation, if you do a POST you will push the data and therefore automatically generate a unique key every time a new child is added to the specified Firebase reference (the -MDJVMk..... value we can see in your question).
You need to use the PUT method.
I don't know MATLAB, but a quick look at the documentation shows that you need to use the RequestMethod option with a 'put' value in the weboptions object.
The above pushed me in the right direction (thanks!), and I had success with the following.
CAUTION: The following will overwrite everything in your database!
url = 'https://***.firebaseio.com/.json';
data.users(1) = struct('first','John','last','Locke');
data.users(2) = struct('first','Thomas','last','Hobbes');
data.users(3) = struct('first','Rene','last','Descartes');
headers = {'Content-Type' 'application/json'; 'Accept' 'application/json'};
options = weboptions('RequestMethod', 'put', 'HeaderFields', headers, 'ArrayFormat', 'json');
response = webwrite(url, data, options);
If your data is stored in a .json file (i.e., you don't want to create structures manually in MATLAB), you can read it using fileread and pass the data in as a string (instead of a structure).
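Since the crucial point is the HTTP method rather than anything MATLAB-specific, here is the same PUT request sketched in Python with the requests package, purely for comparison; the URL and path are placeholders, not part of the original answer:

import json
import requests  # third-party package: pip install requests

url = 'https://<your-project>.firebaseio.com/shipA.json'  # placeholder URL
data = {'lat': '42', 'lon': '42'}

# PUT writes the JSON at the reference itself;
# POST would push it under an auto-generated child key
response = requests.put(url, data=json.dumps(data))
print(response.status_code, response.text)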

How to capture incorrect (corrupt) JSON records in (Py)Spark Structured Streaming?

I have an Azure Event Hub which is streaming data (in JSON format).
I read it as a Spark dataframe and parse the incoming "body" with from_json(col("body"), schema), where schema is pre-defined. In code, it looks like:
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import *

schema = StructType().add(...)  # define the incoming JSON schema

df_stream_input = (spark
    .readStream
    .format("eventhubs")
    .options(**ehConfInput)
    .load()
    .select(from_json(col("body").cast("string"), schema))
)
Now, if there is some inconsistency between the incoming JSON's schema and the defined schema (e.g. the source Event Hub starts sending data in a new format without notice), the from_json() function will not throw an error; instead, it will put NULL into the fields that are present in my schema definition but not in the JSON the Event Hub sends.
I want to capture this information and log it somewhere (Spark's log4j, Azure Monitor, warning email, ...).
My question is: what is the best way to achieve this?
Some of my thoughts:
The first thing I can think of is to have a UDF which checks for the NULLs and, if there is any problem, raises an Exception. I believe it is not possible to send logs to log4j via PySpark there, as the "spark" context cannot be initiated within the UDF (on the workers), and one would want to use the default:
log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger('PySpark Logger')
The second thing I can think of is to use the "foreach/foreachBatch" function and put this check logic there.
But both of these approaches feel too custom; I was hoping that Spark has something built-in for this purpose.
tl;dr You have to do this check logic yourself using foreach or foreachBatch operators.
It turns out I was mistaken in thinking that the columnNameOfCorruptRecord option could be the answer. It will not work.
Firstly, it won't work due to this:
case _: BadRecordException => null
And secondly due to this line, which simply disables any other parsing modes (including PERMISSIVE, which seems to be used alongside the columnNameOfCorruptRecord option):
new JSONOptions(options + ("mode" -> FailFastMode.name), timeZoneId.get))
In other words, your only option is to use the second item in your list, i.e. foreach or foreachBatch, and handle corrupted records yourself.
A solution could be to use from_json while keeping the initial body column. Any record with incorrect JSON would end up with the result column being null, and the foreach* handler would catch it, e.g.:
def handleCorruptRecords(row):
    # if row.json is null, the body could not be parsed against the schema
    # handle it here (log it, store it, raise an alert, ...)
    pass

df_stream_input = (spark
    .readStream
    .format("eventhubs")
    .options(**ehConfInput)
    .load()
    .select("body", from_json(col("body").cast("string"), schema).alias("json"))
)

query = df_stream_input.writeStream.foreach(handleCorruptRecords).start()
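If it is easier to reason about whole micro-batches (e.g. to count the corrupt records before logging them), a minimal foreachBatch sketch along the same lines could look like this; the print call is just a stand-in for whatever sink you log to (log4j, Azure Monitor, email, ...):

from pyspark.sql.functions import col

def handle_batch(batch_df, batch_id):
    # rows whose body could not be parsed against the schema have json == null
    corrupt = batch_df.filter(col("json").isNull())
    n_corrupt = corrupt.count()
    if n_corrupt > 0:
        # stand-in for real logging/alerting
        print(f"Batch {batch_id}: {n_corrupt} corrupt record(s)")

query = (df_stream_input
    .writeStream
    .foreachBatch(handle_batch)
    .start())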

Do web2py JSON returns have extraneous whitespace, and if so, how to remove it

Just to check: the default JSON view, which converts Python objects to JSON, seems to include whitespace between the elements, i.e.
"field": [[110468, "Octopus_vulgaris", "common octopus"...
rather than
"field":[[110468,"Octopus_vulgaris","common octopus"...
Is that right? If so, is there an easy way to output the JSON without the extra spaces, and is this for any reason (other than readability) a bad idea?
I'm trying to make some API calls return the fastest and most concise JSON representation, so any other tips are gratefully accepted. For example, I see the view calls from gluon.serializers import json - does that get re-imported every time the view is used, or is Python clever enough to import it only once? I'm hoping the latter.
The generic.json view calls gluon.serializers.json, which ultimately calls json.dumps from the Python standard library. By default, json.dumps inserts spaces after separators. If you want no spaces, you will not be able to use the generic.json view as is. You can instead do:
import json
output = json.dumps(input, separators=(',', ':'))
If input includes some data that are not JSON serializable and you want to take advantage of the special data type conversions implemented in gluon.serializers.json (i.e., datetime objects and various web2py specific objects), you can do the following:
import json
from gluon.serializers import custom_json
output = json.dumps(input, separators=(',', ':'), default=custom_json)
Using the above, you can either edit the generic.json view, create your own custom JSON view, or simply return the JSON directly from the controller.
Also, no need to worry about re-importing modules in Python -- the interpreter only loads the module once.
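For example, a minimal controller action along these lines (the action and table names are placeholders, not part of the original answer) returns the compact JSON directly, bypassing the generic view:

import json
from gluon.serializers import custom_json

def compact_api():
    # hypothetical table name; replace with your own query
    rows = db(db.mytable).select().as_list()
    response.headers['Content-Type'] = 'application/json'
    return json.dumps(rows, separators=(',', ':'), default=custom_json)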