Are there size limits for Flutter's json.decode?

I have a dictionary file with 200,000 items in it.
I have a Dictionary Model which matches the SQLite db and the proper methods.
If I try to parse the whole file, it seems to hang. If I do 8000 items, it parses quite quickly. Is there a size limit, or is it just because there might be some corrupted data somewhere? This JSON was exported from the SQLite db as pretty-printed JSON, so I would imagine it was done correctly. It also works fine with the first 8000 items.
String peuJson = await getPeuJson();
List<Dictionary> dicts = (json.decode(peuJson) as List)
    .map((i) => Dictionary.fromJson(i))
    .toList();

JSON is similar to other data formats like XML: if you need to transmit more data, you just send more data. There is no inherent size limit in the JSON format itself; any limitation comes from whatever is parsing the data.

Related

Merging and/or Reading 88 JSON Files into Dataframe - different datatypes

I basically have a procedure where I make multiple calls to an API and, using a token within the JSON response, pass that back to a function to call the API again and get a "paginated" file.
In total I have to call and download 88 JSON files that total 758 MB. The JSON files are all formatted the same way and have the same "schema", or at least should do. I have tried reading each JSON file after it has been downloaded into a dataframe, and then attempted to union that dataframe to a master dataframe, so essentially I'll have one big dataframe with all 88 JSON files read into it.
However, the problem I encounter is that at roughly file 66 the system (Python/Databricks/Spark) decides to change the data type of a field. It is always a string, and I'm guessing that when a value actually appears in that field it changes to a boolean. The problem is then that the unionByName fails because of the different datatypes.
What is the best way for me to resolve this? I thought about using "extend" to merge all the JSON files into one big file, however a 758 MB JSON file would be a huge read and undertaking.
Could the other solution be to explicitly set the schema that the JSON file is read into so that it is always the same type?
If you know the attributes of those files, you can define the schema before reading them and create an empty df with that schema, so you can do a unionByName with allowMissingColumns=True:
something like:
from pyspark.sql.types import *
my_schema = StructType([
    StructField('file_name', StringType(), True),
    StructField('id', LongType(), True),
    StructField('dataset_name', StringType(), True),
    StructField('snapshotdate', TimestampType(), True)
])
output = sqlContext.createDataFrame(sc.emptyRDD(), my_schema)
df_json = spark.read.[...your JSON file...]
output = output.unionByName(df_json, allowMissingColumns=True)
I'm not sure this is what you are looking for. I hope it helps
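Building on that snippet, a fuller version of the loop over all 88 files might look like the sketch below (the file paths, the list comprehension, and the usual spark session from the notebook are assumptions for illustration; reuse the real schema of your files). Forcing the schema on every read disables inference, so the field that sometimes looks boolean is always read with the declared type and unionByName never sees a mismatch:
from pyspark.sql.types import StructType, StructField, StringType, LongType, TimestampType
# Assumed schema -- replace with the real attributes of the 88 files.
my_schema = StructType([
    StructField('file_name', StringType(), True),
    StructField('id', LongType(), True),
    StructField('dataset_name', StringType(), True),
    StructField('snapshotdate', TimestampType(), True)
])
# Hypothetical paths to the downloaded files.
json_paths = ['/mnt/raw/page_{}.json'.format(i) for i in range(1, 89)]
# Start from an empty dataframe with the fixed schema.
output = spark.createDataFrame([], my_schema)
for path in json_paths:
    # schema(...) turns off inference for each file read.
    df_json = spark.read.schema(my_schema).json(path)
    output = output.unionByName(df_json, allowMissingColumns=True)
Since every read shares the same schema, the same result can also come from a single call such as spark.read.schema(my_schema).json('/mnt/raw/*.json'), which avoids 88 separate unions.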

How do I read a Large JSON Array File in PySpark

Issue
I recently encountered a challenge in Azure Data Lake Analytics when I attempted to read in a large UTF-8 JSON array file, and switched to HDInsight PySpark (v2.x, not 3) to process the file. The file is ~110 GB and has ~150 million JSON objects.
HDInsight PySpark does not appear to support the array-of-JSON file format for input, so I'm stuck. Also, I have "many" such files with different schemas, each containing hundreds of columns, so creating the schemas for those is not an option at this point.
Question
How do I use out-of-the-box functionality in PySpark 2 on HDInsight to enable these files to be read as JSON?
Thanks,
J
Things I tried
I used the approach at the bottom of this page from Databricks, which supplied the below code snippet:
import json
df = sc.wholeTextFiles('/tmp/*.json').flatMap(lambda x: json.loads(x[1])).toDF()
display(df)
I tried the above, not understanding how "wholeTextFiles" works, and of course ran into OutOfMemory errors that killed my executors quickly.
I attempted loading to an RDD and other open methods, but PySpark appears to support only the JSONLines JSON file format, and I have the Array of JSON Objects due to ADLA's requirement for that file format.
I tried reading it in as a text file, stripping the array characters, splitting on the JSON object boundaries, and converting to JSON like the above, but that kept giving errors about being unable to convert unicode and/or str.
I found a way through the above and converted to a dataframe containing one column with rows of strings that were the JSON objects. However, I did not find a way to output only the JSON strings from the dataframe rows to an output file by themselves. They always came out as
{'dfColumnName':'{...json_string_as_value}'}
I also tried a map function that accepted the above rows, parsed as JSON, extracted the values (JSON I wanted), then parsed the values as JSON. This appeared to work, but when I would try to save, the RDD was type PipelineRDD and had no saveAsTextFile() method. I then tried the toJSON method, but kept getting errors about "found no valid JSON Object", which I did not understand admittedly, and of course other conversion errors.
I finally found a way forward. I learned that I could read JSON directly from an RDD, including a PipelineRDD. I found a way to remove the Unicode byte-order header and the wrapping array square brackets, split the JSON objects on a fortunate delimiter, and end up with a distributed dataset for more efficient processing. The resulting dataframe has columns named after the JSON elements, infers the schema, and adapts dynamically to other file formats.
Here is the code - hope it helps!:
# ...Spark considers arrays of JSON objects to be an invalid format,
# and Unicode files are prefixed with a byte-order marker.
thanksMoiraRDD = sc.textFile('/a/valid/file/path', partitions).map(
    lambda x: x.encode('utf-8', 'ignore').strip(u",\r\n[]\ufeff")
)
df = sqlContext.read.json(thanksMoiraRDD)
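As a side note, if the cluster is ever on Spark 2.2 or later, the built-in multiLine option can read a JSON array file directly. It processes each file as a whole, so it will not rescue a single ~110 GB file, but it is a one-liner for smaller array files. A minimal sketch, with the paths assumed:
# Assumes Spark 2.2+ and array files small enough to be read whole.
df = spark.read.option('multiLine', True).json('/a/valid/file/path')
df.printSchema()
# Once parsed (by either approach), writing the data back out as JSON Lines
# makes every later read a plain spark.read.json() call.
df.write.mode('overwrite').json('/an/assumed/output/path')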

Logstash: Handling of large messages

I'm trying to parse a large message with Logstash using a file input, a json filter, and an elasticsearch output. 99% of the time this works fine, but when one of my log messages is too large, I get JSON parse errors, as the initial message is broken up into two partial invalid JSON streams. The size of such messages is about 40,000+ characters long. I've looked to see if there is any information on the size of the buffer, or some max length that I should try to stay under, but haven't had any luck. The only answers I found related to the udp input, and being able to change the buffer size.
Does Logstash have a size limit for each event/message?
https://github.com/elastic/logstash/issues/1505
This could also be similar to this question, but there were never any replies or suggestions: Logstash Json filter behaving unexpectedly for large nested JSONs
As a workaround, I wanted to split my message up into multiple messages, but I'm unable to do this, as I need all the information to be in the same record in Elasticsearch. I don't believe there is a way to call the Update API from logstash. Additionally, most of the data is in an array, so while I can update an Elasticsearch record's array using a script (Elasticsearch upserting and appending to array), I can't do that from Logstash.
The data records look something like this:
{ "variable1":"value1",
......,
"variable30": "value30",
"attachements": [ {5500 charcters of JSON},
{5500 charcters of JSON},
{5500 charcters of JSON}..
...
{8th dictionary of JSON}]
}
Does anyone know of a way to have Logstash process these large JSON messages, or a way that I can split them up and have them end up in the same Elasticsearch record (using Logstash)?
Any help is appreciated, and I'm happy to add any information needed!
If your elasticsearch output has a document_id set, it will update the document (the default action in Logstash is to index the data, which will update the document if it already exists).
In your case, you'd need to include some unique field as part of your json messages and then rely on that to do the merge in elasticsearch. For example:
{"key":"123455","attachment1":"something big"}
{"key":"123455","attachment2":"something big"}
{"key":"123455","attachment3":"something big"}
And then have an elasticsearch output like:
elasticsearch {
    host => "localhost"
    document_id => "%{key}"
}
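If the split has to happen before the events ever reach Logstash, a small pre-processing step can emit one JSON line per attachment, each carrying the shared key that the elasticsearch output's document_id then uses. This is only a sketch with assumed file names and an assumed unique field, not part of the original pipeline:
import json
# Hypothetical input: one oversized record, shaped like the example above.
with open('large_record.json') as src:
    record = json.load(src)
key = record.get('variable1')            # assumed unique field to merge on
attachments = record.pop('attachments', [])
with open('split_records.jsonl', 'w') as dst:
    # The scalar fields (variable1 .. variable30) go out as one small event.
    dst.write(json.dumps(dict(record, key=key)) + '\n')
    # Each attachment becomes its own event, well under the problematic size.
    for i, attachment in enumerate(attachments, start=1):
        dst.write(json.dumps({'key': key, 'attachment{}'.format(i): attachment}) + '\n')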

json parsing error for large data in browser

I am trying to implement export-to-Excel functionality for the data in an HTML table (5000+ rows). I am using json2.js to parse the client-side data into a JSON string called jsonToExport.
The value of this variable is fine for a smaller number of records, and it decodes fine (I checked in the browser in debug mode).
But for a large dataset of 5000+ records the JSON parsing/decoding fails. I can see the encoded string, but the decoded value shows:
jsonToExport: unable to decode
I experimented with the data, for example by increasing the column sizes or replacing large data columns with short ones, and found that I get this error whenever the data exceeds a particular size. So in effect it is not an issue with the format of the encoded JSON string missing anything, since every combination of columns works as long as the number of columns is limited.
It is definitely failing to decode/parse and then pass the JSON string in the request once it is above a particular size limit.
Is there an issue with json2.js, which (I think) does the parsing?
I also tried json3.min.js and received the same error.
Unless you're supporting old browsers, like IE 7, you don't need to use antiquated libraries to parse JSON any longer; it's built in: JSON.parse(jsonString)

Grails - Rendered Json file too huge for client side operations

I have a Grails controller that renders JSON as follows, to be further used by d3 on my front end (.gsp file):
def dataSource
def salesjson = {
    def sql = new Sql(dataSource)
    def rows = sql.rows("select date_hour,mv,device,department,browser,platform,total_revenue as metric, total_revenue_ly as metric_ly from composite")
    sql.close()
    render rows as JSON
}
I use this JSON to render my crossfiltered dc-charts on the front end. The problem is that queries such as the one above return a large JSON file/object, and my client stops working and hangs (100 MB plus on the client side, and still loading!).
I can't think of any alternative to this method that would reduce my file size (maybe rendering as a CSV string? Would that help a lot? If so, how do I go about it? I currently have about 600,000 rows in my JSON).
What other options do I have?
I'd suggest using something like MessagePack to create a more binary-like, smaller representation of your JSON. There are a few other options out there, but I think this one is probably the most JVM/JavaScript friendly.
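The stack in the question is Grails plus JavaScript, but purely to illustrate the kind of size difference the suggestion is about, here is a small Python sketch comparing a JSON payload with its MessagePack encoding (the row values are made up to mirror the query's columns, and the msgpack package is an assumption):
import json
import msgpack  # pip install msgpack
# A made-up sample shaped like the rows the controller returns.
rows = [
    {'date_hour': '2015-06-01 10', 'mv': 'spring-sale', 'device': 'mobile',
     'department': 'shoes', 'browser': 'chrome', 'platform': 'ios',
     'metric': 123.45, 'metric_ly': 98.76}
    for _ in range(10000)  # kept small; the real data has ~600,000 rows
]
as_json = json.dumps(rows).encode('utf-8')
as_msgpack = msgpack.packb(rows)
print('json: {:,} bytes, msgpack: {:,} bytes'.format(len(as_json), len(as_msgpack)))
The structure stays the same, so the d3/crossfilter code only needs a MessagePack decoder on the client (several JavaScript implementations exist) in place of JSON.parse.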