I need to find certain information in a JSON data set that my company acquired. When I try to import it into a variable via the "fromJSON" method, I get the error listed in the title. The data set contains information for over 16,000 files, so searching for the problem manually just isn't an option (especially since it's JSON, so there are tons of colons). Is there a way in R to find the source, or at least the line number, of the problematic character(s)?
Paste the JSON here and validate it. It will tell you where the JSON is invalid:
https://jsonlint.com/
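If you'd rather pinpoint it from inside R, a minimal sketch (assuming the jsonlite package; "data.json" is a placeholder for your file): jsonlite::validate() returns FALSE for malformed input, with an "err" attribute carrying the parser's message, which should quote the text around the position where parsing failed:

    library(jsonlite)

    # Read the whole file into one string and validate it
    txt <- paste(readLines("data.json", warn = FALSE), collapse = "\n")
    ok  <- validate(txt)
    if (!ok) {
      cat(attr(ok, "err"))  # the parser's message, pointing at the offending spot
    }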
I am trying to upload JSON files to BigQuery. The JSON files are outputs from the Lighthouse auditing tool. I have made some changes to them in Python to make the field names acceptable for BigQuery and converted the format to newline-delimited JSON.
I am now testing this process, and I have found that while the upload runs without issue for many web pages, BigQuery is rejecting some of the JSON files. The rejected JSONs always seem to be from the same website; for example, many of the audit JSONs from Topshop have failed on upload (the manipulations in Python run without issue). What confuses me is that I can see no difference in the formatting or structure of the JSONs that succeed and fail.
I have included some examples here of the JSON files: https://drive.google.com/open?id=1x66PoDeQGfOCTEj4l3VqMIjjhdrjqs9w
The error I get from BigQuery when a JSON fails to load is this:
Error while reading table: build_test_2f38f439_7e9a_4206_ada6_ac393e55b8ec4_source, error message: Failed to parse JSON: No active field found.; ParsedString returned false; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Parser terminated before end of string
I have also attempted to upload the failed JSONs to a new table through the interface using the autodetect feature (in an attempt to discover whether the schema was at fault), and these uploads fail too, with the same error.
This makes me think the JSON files must be wrong, but I have copied them into several different JSON validators which all accept them as one row of valid JSON.
Any help understanding this issue would be much appreciated, thank you!
When you load JSON files into BigQuery, it's good to remember that there are some limitations associated with this format; they are listed in the BigQuery documentation. Even though your files might be valid JSON, some of them may not comply with BigQuery's limitations, so I would recommend double-checking that they are actually acceptable to BigQuery.
I hope that helps.
I eventually found the error through a long trial-and-error process in which I uploaded first the first half and then the second half of the JSON file to BigQuery. The second half failed, so I split that in half again to see which part the error occurred in. This continued until I found the line.
At a deep level of nesting there was a field that was always a list of strings, but when there were no values associated with the field it appeared as an empty string (rather than an empty list). This inconsistency was causing the error. The trial-and-error process was long, but given the vague error message and a JSON file thousands of lines long, this seemed like the most efficient way to get there.
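For anyone facing a similar hunt: since the files are newline-delimited (one record per line), you can scan every record programmatically instead of bisecting uploads, once you suspect a field of changing type. A minimal sketch in R (assuming the jsonlite package; the file name and the field path are hypothetical placeholders for the suspect field in your own records):

    library(jsonlite)

    # One JSON record per line; "audits.ndjson" is a hypothetical file name
    lines <- readLines("audits.ndjson", warn = FALSE)

    # Record the class of the suspect field in every record
    classes <- vapply(lines, function(l) {
      rec <- fromJSON(l, simplifyVector = FALSE)
      # Hypothetical path to the field that should always be a list
      class(rec[["audits"]][["some-audit"]][["details"]][["items"]])[1]
    }, character(1), USE.NAMES = FALSE)

    table(classes)                  # a lone "character" among "list"s reveals the inconsistency
    which(classes == "character")   # line numbers of the offending records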
I have some data in CSV format. However, it is already a string, since I got it from an HTTP request.
I would like to use DataFrames in order to view the data.
However, I don't know how to parse it, because the CSV package only accepts files, not strings.
One solution would be to write the content of the string into a file and then read it back in. But there has to be a better way!
Use IOBuffer(your_string), which wraps the string in an in-memory stream that CSV.read can consume:

    using CSV, DataFrames

    df = CSV.read(IOBuffer(your_string), DataFrame)
I'm new to the world of triplets :-) I'm trying to use DotNetRDF to load a SOLR search result into a Graph.
The URL I'm getting data from is:
https://nvv.entryscape.net/store/search?type=solr&query=rdfType:https%5C%3A%2F%2Fnvv.entryscape.net%2Fns%2FDocument+AND+context:https%5C%3A%2F%2Fnvv.entryscape.net%2Fstore%2F1
The format is supposed to be "RDF/JSON". No matter which parser I try, I only get "invalid URI". I have tried loading from the URL, and I have also tried downloading the result to a file and loading from the file; same error.
I'm using VS2017 and have "nugetted" the latest version of DotNetRdf.
Please help me, what am I missing?
Regards,
Lars Siden
It looks like the JSON being returned by that endpoint is not valid RDF/JSON. It does appear to contain some RDF/JSON fragments but they are wrapped up inside another JSON structure. The RDFJSONParser in dotNetRDF requires that your entire JSON document be a single, valid chunk of RDF/JSON.
The value at resource.children[*].metadata is an RDF/JSON object, as is the value at resource.children[*].info. The rest is wrapper structure whose property names are not valid IRIs (hence the parser's "invalid URI" error).
Unfortunately there is no easy way to skip over the rest of the JSON document and parse only the valid bits. To do that you will need to load the JSON document using Newtonsoft.Json, serialize each valid RDF/JSON object you are interested in back to a string, and load that using the RDFJSONParser's Load(IGraph, TextReader) or Parse(IRdfHandler, TextReader) method.
I have this document, which uploads nicely to Robomongo, but in mLab (mlab.com) it shows a JSON validation error.
Specifically, "We encountered an error while parsing your JSON. Please check your syntax (e.g. ensure you are using double quotes around both your field names and values) and try again." is making me nervous.
Please check the document here.
That appears to be an array of JSON documents, not a single JSON document, which is what the mLab JSON document editor expects. In other words, an array is not a valid JSON document, even though its elements may be valid JSON documents.
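To illustrate with hypothetical data, this shape is an array of documents and will be rejected by the document editor:

    [ { "name": "a" }, { "name": "b" } ]

while each element on its own is a single document that the editor will accept:

    { "name": "a" }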
I'm using R to obtain JSON files from a website. In particular, I'm using the fromJSON function of the RJSONIO package. I'm trying to get JSON files from 50,000 website links. However, some of them might not be valid API links. Right now, I'm using a for loop to get the JSON files. However, each time I encounter an invalid link, the whole program comes to a halt.
How can I just pass to the next iteration if I encounter invalid API links?
It turns out that wrapping each iteration in tryCatch will do the job.
If a URL is invalid, some warning messages will be posted, but the iteration will continue.
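A minimal sketch of that pattern (assuming the RJSONIO and RCurl packages; urls is your vector of 50,000 links, and getURL is a placeholder for however you retrieve each file):

    library(RCurl)
    library(RJSONIO)

    results <- vector("list", length(urls))
    for (i in seq_along(urls)) {
      results[[i]] <- tryCatch(
        fromJSON(getURL(urls[i])),
        error = function(e) {
          message("Skipping ", urls[i], ": ", conditionMessage(e))
          NULL  # placeholder so later iterations still run
        },
        warning = function(w) {
          message("Warning for ", urls[i], ": ", conditionMessage(w))
          NULL
        }
      )
    }

Invalid links end up as NULL entries in results, which you can drop afterwards with Filter(Negate(is.null), results).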