Getting error while converting json file to a data frame using jsonlite - json

I am using the tweetscores package of R to get 'tweets list from twitter. The tweets are stored in json format. While converting it to a data frame I get a lexical error
' Error: lexical error: inside a string, '\' occurs before a character which it may not.".
Any solution to the mentioned error.
A part of the json file text
":[{"text":["MUFC"],"indices":[[83],[88]]}],"symbols":[],"user_mentions":[],"urls":[]},"metadata":{"iso_language_code":["en"],"result_type":["recent"]},"source":["http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"],"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":[7.32108114527322e+017],"id_str":["732108114527322112"],"name":["wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww(^o^)/"],"screen_name":["SukiSukinal"],"location":["+6222"],"description":["Alliansi osaosi ngevote kagak. katanya sih fans a.k.a + "],"url":null,"entities":{"description":{"urls":[]}},"protected":[false],"followers_count":[163],"friends_count":[107],"listed_count":[4],"created_at":["Mon May 16 07:19:11 +0000

Json format does not allow backslashes so you need to escape them. replace any '\' character found with '\\'. Refer [here][1]
[1]: http://www.json.org/ for more info

You likely have an incomplete json string, which may be caused by the package or by an interrupted connection to Twitter's API. A complete json string returned from Twitter should look something like the following:
which I got using rtweet's stream_tweets() function. With a complete string returned by Twitter's REST or stream API, you should be able to convert the data using basically any json parser (e.g., jsonlite::fromJSON()).

Related

is there any way where we can load deformed json into python object?

i am getting a json data after hitting an API .
when i try to load that json into python using json.loads(response.text), I am getting a delimiter error .
when checked few fields in json dose not have "," separating them.
{
"id":"142379",
"label":"1182_Mailer_PROD",
"location":"Bangalore, India",
"targetType":"HTTPS performance",
"frequency":"15",
"fails":"2764",
"totalUptime":"85.32"
"tests":[
{"date":"09-24-2019 09:31","status":"Could not resolve: mailer.accenture.com (DNS server returned answer with no data)","responseTime":"0.000","dnsTime":"0.000","connectTime":"0.000","redirectTime":"0.000","firstbyteTime":"0.000","lastbyteTime":"0.000","pingLoss":"0.00","pingMin":"0.000","pingAvg":"0.000","pingMax":"0.000","size":"0","md5hash":"(null)"}
]
}
,
{
"id":"158651",
"label":"11883-GR dd-WSP",
"location":"Chicago, IL",
"targetType":"Performance transaction",
"frequency":"15",
"fails":"5919",
"totalUptime":"35.14"
,"tests":[
{"date":"09-24-2019 09:26","status":"Keywords not found - Working","responseTime":"0.669","stepresults":[
{"stepid":"1","date":"09-24-2019 09:26","status":"OK","responseTime":"0.453","dnsTime":"0.000","connectTime":"0.025","redirectTime":"0.264","firstbyteTime":"0.141","lastbyteTime":"0.024","size":"22351","md5hash":"ca002cf662980511a9faa88286f2ee96"},
{"stepid":"2","date":"09-24-2019 09:26","status":"Keywords not found - Working","responseTime":"0.216","dnsTime":"0.000","connectTime":"0.023","redirectTime":"0.000","firstbyteTime":"0.171","lastbyteTime":"0.022","size":"22457","md5hash":"38327404e4f2392979aa7dfa27118f4e"}
]}]
}
This is a small chunk of data from the response , as you could see "totalUptime":"85.32" doesn't have a comma separating it.
could you please let me know how can we load the data into python object even though the json is deformed
Deformed JSON is not JSON, so obviously you can't load it with a standard procedure. There are only two possibilities to load it:
Create your own parser
Modify the input to conform to the JSON standard
Both possibilities need you to define what format do you want to import. If it is OK for your format not to have commas then you have to define what your delimiters are.
From the example you posted is difficult to make any definitive assessment about how the input format is defined. So you probably will have to write a rudimentary parser and approximate it by try and error to the input you are trying to parse.

Azure Logic Apps - invalid json parameter error

Update: the problem was with the file encoding. See answer.
I've got a json payload that is 15.7 MB coming from blob storage. When I pass the output to a ParseJson action I use the json() converter function, but I get this error:
Unable to process template language expressions in action 'Parse_JSON'
inputs at line '1' and column '2792': 'The template language function
'json' parameter is not valid.
Then I took the same json file and stripped it down to 1 KB and tested with the same Logic App and it worked. So is there a size limit for json()?
The problem was that the stream was written with a byte order mark (BOM) added at the start of the text, so it was not recognized as valid JSON. StreamWriter was used to write to the stream with UTF8 encoding. The fix was to not specify the encoding in the constructor, which defaults to an instance of UTF8 without a BOM:
https://learn.microsoft.com/en-us/dotnet/api/system.io.streamwriter?view=netframework-4.7.2#remarks

Import JSON using WITH in Neo4j

I'm trying to use a JSON, to eventually import it into Neo4j.
I use something like, it's a big JSON string:
WITH [
{"fullname":"Full name","note":"f","addr":[],"phone":[],"email":[{"value":"mail#city.com"}],"first_name":"","last_name":""},
..
] AS contacts
The colors of the first contact is mostly orange, then the other contacts become green, then black.
I get the following error:
Invalid input '"': expected whitespace, an identifier, UnsignedDecimalInteger, a property key name or '}'
I can view my JSON file with http://jsonviewer.stack.hu/ And it looks fine
Do I need to escape some kind of character, so that Neo4j understands it?
Edit:
Based on Martins answer, I removed the quotes using a regex in PHP from:
Remove double-quotes from a json_encoded string on the keys
Remove the quotation marks around the keys. The error message tells you that it expects a property key. Cypher does not use JSON here.
WITH [
{fullname:"Full name",note:"f",addr:[],phone:[],
email:[{value:"mail#city.com"}],
first_name:"",last_name:""}
] AS contacts
RETURN contacts
A neo4j driver or client library will handle data passed from dictionary like structures as parameters: https://neo4j.com/docs/developer-manual/current/cypher/#cypher-parameters
If you want to work with JSON and maybe load it from external sources you should have a look at the APOC procedures: https://neo4j-contrib.github.io/neo4j-apoc-procedures/.
This for example converts a JOSN string to a map that can be used in Cypher: https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_from_tojson
CALL apoc.convert.fromJsonMap(
'{"fullname":"Full name","note":"f","addr":[],"phone":[],
"email":[{"value":"mail#city.com"}],"first_name":"","last_name":""}'
)
YIELD value
RETURN value

R - JSON Returned from RCurl::getURL has Special Characters that make it invalid with fromJSON

How can make sure that the result of my getURL() call is properly formatted to be parsed using from JSON?
Details
If I take the string for api URL and paste that into Chrome, then copy and paste out the resulting JSON, RJSONIO::fromJSON() will parse it. However, if I pass the variable test, as in my code below, to fromJSON(), I get this error:
Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input
In going through the differences between the two, I found some issues encoding escaped character sequences such as "\\\"\\\\\\\"" which I am able to search for and replace. However there are some other things, where for example, the broken JSON will show " " while the working JSON will show "\u00A0".
library(RJSONIO)
library(RCurl)
apiURL= #sorry I can't post the actual URL due to company security policies
test<-getURL(apiURL,userpwd="myusername:mypassword",httpauth=1L)
transcripts1 <- fromJSON(test)

Parsing large JSON file with Scala and JSON4S

I'm working with Scala in IntelliJ IDEA 15 and trying to parse a large twitter record json file and count the total number of hashtags. I am very new to Scala and the idea of functional programming. Each line in the json file is a json object (representing a tweet). Each line in the file starts like so:
{"in_reply_to_status_id":null,"text":"To my followers sorry..
{"in_reply_to_status_id":null,"text":"#victory","in_reply_to_screen_name"..
{"in_reply_to_status_id":null,"text":"I'm so full I can't move"..
I am most interested in a property called "entities" which contains a property called "hastags" with a list of hashtags. Here is an example:
"entities":{"hashtags":[{"text":"thewayiseeit","indices":[0,13]}],"user_mentions":[],"urls":[]},
I've browsed the various scala frameworks for parsing json and have decided to use json4s. I have the following code in my Scala script.
import org.json4s.native.JsonMethods._
var json: String = ""
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line
val data = parse(json)
My logic here is that I am trying to read each line from twitter38.json into a string and then parse the entire string with parse(). The parse function is throwing an error claiming:
"Type mismatch, expected: Nothing, found:String."
I have seen examples that use parse() on strings that hold json objects such as
val jsontest =
"""{
|"name" : "bob",
|"age" : "50",
|"gender" : "male"
|}
""".stripMargin
val data = parse(jsontest)
but I have received the same error. I am coming from an object oriented programming background, is there something fundamentally wrong with the way I am approaching this problem?
You have most likely incorrectly imported dependencies to your Intellij project or modules into your file. Make sure you have the following lines imported:
import org.json4s.native.JsonMethods._
Even if you correctly import this module, parse(String: json) will not work for you, because you have incorrectly formed a json. Your json String will look like this:
"""{"in_reply_...":"someValue1"}{"in_reply_...":"someValues2"}"""
but should look as follows to be a valid json that can be parsed:
"""{{"in_reply_...":"someValue1"},{"in_reply_...":"someValues2"}}"""
i.e. you need starting and ending brackets for the json, and a comma between each line of tweets. Please read the json4s documenation for more information.
Although being almost 6 years old, I think this question deserves another try.
JSON format has a few misunderstandings in people's minds, especially how they are stored and how they are read back.
JSON documents, are stored as either a single object having all the other fields, or an array of multiple object possibly in same format. this second part is important because arrays in almost every programming language are defined by angle brackets and values separated by commas (note here I used a person object as my single value):
[
{"name":"John","surname":"Doe"},
{"name":"Jane","surname":"Doe"}
]
also note that everything except brackets, numbers and booleans are enclosed in quotes when written into file.
however, there is another use that is not official but preferred to transfer datasets easily where every object, or document as in nosql/mongo language, are stored in a new line like this:
{"name":"John","surname":"Doe"}
{"name":"Jane","surname":"Doe"}
so for the question, OP has a document written in this second form, but tries an algorithm written to read the first form. following code has few simple changes to achieve this, and the user must read the file knowing that:
var json: String = "["
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line + ","
json=json.splitAt(json.length()-1)._1
json+= "]"
val data = parse(json)
PS: although #sbrannon, has the correct idea, the example he/she gave has mistakenly curly braces instead of angle brackets to surround the data.
EDIT: I have added json=json.splitAt(json.length()-1)._1 because the code above ends with a trailing comma which will cause parse error per the JSON format definition.