How to solve the "Tree.read(): expected u')' but got u'end-of-string'" error? (NLTK)

I am trying to read a text file and then parse its contents:
import codecs
import re
from nltk.tree import ParentedTree
string1=codecs.open(r'C:\Users\User\Desktop\Tree2.txt','r','utf-8')
txt2=string1.read()
ptree = ParentedTree.fromstring(txt2)
but I got the following error:
Tree.read(): expected u')' but got u'end-of-string'
Note: the content of the file itself seems fine, because the program works
correctly when I pass the same text as a string literal:
txt2=unicode("""(S (w) (VP (VERB_PERFECT g) ))""",'utf-8')
ptree = ParentedTree.fromstring (txt2)
Any help?

This worked for me:
I replaced /, (, ), { and } with "" in the tree string, because these characters have their own meaning in the chunk tree string.
I had the same error in Google Colab, pasted below:
ValueError: Tree.read(): expected ')' but got 'end-of-string' at index 4233. "...N) '/'')"
Have a nice day!
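Before removing characters, it may help to confirm that unbalanced brackets are really the cause: the message says the parser reached the end of the string while at least one "(" was still open. Below is a minimal sketch using only the standard library. count_unclosed is a hypothetical helper, not part of NLTK, and it does not treat brackets inside tokens specially:

```python
def count_unclosed(tree_text):
    """Return how many '(' are still open at the end of tree_text.

    Hypothetical helper (not an NLTK function). It raises ValueError
    if a ')' appears before its matching '('.
    """
    depth = 0
    for position, char in enumerate(tree_text):
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' at index %d" % position)
    return depth

balanced = "(S (w) (VP (VERB_PERFECT g) ))"
truncated = "(S (w) (VP (VERB_PERFECT g) )"

print(count_unclosed(balanced))   # 0
print(count_unclosed(truncated))  # 1
```

A non-zero result for the file's contents would suggest a stray or missing bracket (for example a literal "(" used as a token), which is the situation the replacement above works around.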

Related

Selecting random greeting from reading a JSON file using python

I have a JSON file that looks like the one below. I need to randomise it so that every time an input arrives, one of the 3 greetings in the file is returned at random.
{
"1":"Welcome",
"2":"Hello",
"3":"Hi"
}
I read the JSON file
greeting_template1=readjson(input_file_path+'greeting_template1.json')
and to randomise
greeting_template1 = random.choice(greeting_template1)
But I am getting the error:
greeting_template1 = random.choice(greeting_template1)
File "C:\Users\\AppData\Local\Continuum\anaconda3\envs\lib\random.py", line 262, in choice
return seq[i]
KeyError: 2
Please highlight where I am going wrong
As others have pointed out, your JSON is not valid.
A valid JSON file would be:
{
"1":"Welcome",
"2":"Hello",
"3":"Hi"
}
And the code to pick a random greeting would look something like:
import json
import random
with open('greeting_template1.json') as json_file:
    data = json.load(json_file)
random_greeting = data[random.choice(list(data))]
The reason you are getting the error is that random.choice() needs a sequence as its argument. Parsing a JSON object gives you a Python dictionary, which is not a sequence: random.choice() picks an integer index and evaluates seq[i], and the dictionary has no integer keys, hence the KeyError.
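As a small illustration of the point above, the sketch below parses the same three greetings from a string (standing in for the file, so that it runs on its own) and picks from the dictionary's values, which avoids the second lookup by key:

```python
import json
import random

# Inline copy of the JSON from the question, used instead of the file.
raw = '{"1": "Welcome", "2": "Hello", "3": "Hi"}'
data = json.loads(raw)            # a dict with string keys "1", "2", "3"

# random.choice() needs something indexable by 0..len-1, so pass a list.
greetings = list(data.values())
random_greeting = random.choice(greetings)
print(random_greeting)
```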
Your document has 3 JSON objects in it, not one. Once you close the initial {, that is the end of your JSON. You need to rewrite it as:
{
"1":"Welcome",
"2":"Hello",
"3":"Hi"
}

How to navigate through a json file with Python 3? TypeError: list indices must be integers or slices, not str

I am trying to get as many profile links as I can from khanacademy.org, using their API.
I am struggling to navigate through the JSON file to get the desired data.
Here is my code :
from urllib.request import urlopen
import json
with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()
data = json.loads(source)
for item in data['feedback']:
    print(item['authorKaid'])
    profile_answers = item['answers']['authorKaid']
    print(profile_answers)
My goal is to get as many authorKaid values as possible and then store them (to create a database later).
When I run this code I get this error :
TypeError: list indices must be integers or slices, not str
I don't understand why, because in this tutorial video (https://www.youtube.com/watch?v=9N6a-VLBa2I, at 16:10) the same approach works.
The issue is that item['answers'] is a list, and you are trying to index it with a string rather than an integer. That is why item['answers']['authorKaid'] raises the error.
What you really want is:
print (item['answers'][0]['authorKaid'])
print (item['answers'][1]['authorKaid'])
print (item['answers'][2]['authorKaid'])
etc...
So you're actually wanting to iterate through those lists. Try this:
from urllib.request import urlopen
import json
with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()
data = json.loads(source)
for item in data['feedback']:
    print(item['authorKaid'])
    for each in item['answers']:
        profile_answers = each['authorKaid']
        print(profile_answers)
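To go one step further toward the stated goal of storing the IDs, the loop can collect them instead of printing. This is a sketch on a small hand-made dictionary that only mimics the shape described in the question ('feedback' items with an 'authorKaid' and an 'answers' list). The kaid values and the collect_author_kaids name are made up for illustration:

```python
def collect_author_kaids(data):
    """Return the unique authorKaid values of questions and answers, in first-seen order."""
    seen = []
    for item in data['feedback']:
        candidates = [item['authorKaid']]
        # 'answers' is a list, so iterate over it rather than indexing by name.
        candidates += [answer['authorKaid'] for answer in item['answers']]
        for kaid in candidates:
            if kaid not in seen:
                seen.append(kaid)
    return seen

# Hand-made sample, not real API output.
sample = {
    'feedback': [
        {'authorKaid': 'kaid_1',
         'answers': [{'authorKaid': 'kaid_2'}, {'authorKaid': 'kaid_1'}]},
        {'authorKaid': 'kaid_3', 'answers': []},
    ]
}

kaids = collect_author_kaids(sample)
print(kaids)  # ['kaid_1', 'kaid_2', 'kaid_3']
```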

Solve issue with nested keys in JSON

I am trying to adapt some Python code from an awesome guide for dark web scanning/graph creation.
I have thousands of JSON files created with OnionScan, and I have this code that should wrap everything in a Gephi graph. Unfortunately, the code is old: the JSON files are now formatted differently and it no longer works:
Code (partial):
import glob
import json
import networkx
import shodan
file_list = glob.glob("C:\\test\\*.json")
graph = networkx.DiGraph()
for json_file in file_list:
    with open(json_file,"rb") as fd:
        scan_result = json.load(fd)
    edges = []
    if scan_result['linkedOnions'] is not None:
        edges.extend(scan_result['linkedOnions'])
In fact, at this point I get a "KeyError", because linkedOnions is now nested one level down, like this:
"identifierReport": {
"privateKeyDetected": false,
"foundApacheModStatus": false,
"serverVersion": "",
"relatedOnionServices": null,
"relatedOnionDomains": null,
"linkedOnions": [many urls here]
could you please help me fix the code above?
I would be VERY grateful :)
Lorenzo
This is the correct way to read the nested JSON:
if scan_result['identifierReport']['linkedOnions'] is not None:
    edges.extend(scan_result['identifierReport']['linkedOnions'])
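Since the sample in the question shows other fields set to null, linkedOnions may also be null or missing in some files. Here is a sketch of a more tolerant lookup, assuming the layout shown in the question. linked_onions is a hypothetical helper name and the .onion addresses are placeholders:

```python
import json

def linked_onions(scan_result):
    """Return the linkedOnions list, or [] if the key is missing or null."""
    report = scan_result.get('identifierReport') or {}
    return report.get('linkedOnions') or []

# Three placeholder documents: nested list, nested null, and the old flat layout.
with_links = json.loads('{"identifierReport": {"linkedOnions": ["a.onion", "b.onion"]}}')
null_links = json.loads('{"identifierReport": {"linkedOnions": null}}')
old_format = json.loads('{"linkedOnions": ["c.onion"]}')

edges = []
for scan_result in (with_links, null_links, old_format):
    edges.extend(linked_onions(scan_result))
print(edges)  # ['a.onion', 'b.onion']
```

Using .get() with a fallback means a file in an unexpected layout contributes no edges instead of stopping the whole run with a KeyError.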
Try this; it will work if your JSON file is in the correct format:
try:
    scan_result = json.load(fd)
    edges = []
    if scan_result['identifierReport']['linkedOnions'] is not None:
        edges.extend(scan_result['identifierReport']['linkedOnions'])
except Exception as e:
    # print your message or log it
    print(e)

Copy content from HTML element and export to text file with Python3.x

I am using a python3.x script to save a string to a text file:
nN = "hello"
f = open("file.txt", "w")
f.write(nN)
f.close()
and now I am trying to parse the content of an h2 element from a website (page scraping works fine) and I am getting an error when I am trying this:
nN = driver.find_element_by_id("title")
f = open("file.txt", "w")
f.write(nN)
f.close()
where the html line is:
<h2 id="title">hello</h2>
The error is:
write() argument must be str, not WebElement
I tried converting the nN into a string using the following:
f.write(str(nN))
and the new error is:
invalid syntax
It looks like you are using Selenium and using the webdriver to parse the HTML content?
The string conversion does not help because nN is a Selenium WebElement object, not text: str(nN) would at best give the object's representation, not the content of the h2. You could simply try f.write(nN.text); according to the documentation, the .text attribute of nN holds the element's text.
For the larger task of parsing HTML, though, I would recommend Beautiful Soup. Run pip3 install BeautifulSoup4 and then import it with from bs4 import BeautifulSoup. Then, as an example:
from bs4 import BeautifulSoup
with open('file.html','r') as f:
    htmltext = f.read() # change as necessary, just needs to be a string
soup = BeautifulSoup(htmltext,'lxml') # the 'lxml' parser requires the lxml package
h2found = soup.find('h2',id="title")
print(h2found)
print(h2found.text)
Beautiful Soup has great documentation and is a widely used library for parsing HTML.
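If installing an extra package is not an option, the same idea (extract the text of the element, then write that string) can be sketched with the standard library's html.parser instead of Beautiful Soup or Selenium. The TitleExtractor class and the temporary file are illustrative choices, and the sketch assumes the target element contains no nested element with the same tag:

```python
import os
import tempfile
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside the first element whose id is 'title'."""

    def __init__(self):
        super().__init__()
        self.capturing = False
        self.found_tag = None
        self.title_text = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if self.found_tag is None and dict(attrs).get("id") == "title":
            self.capturing = True
            self.found_tag = tag

    def handle_endtag(self, tag):
        if self.capturing and tag == self.found_tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.title_text += data

parser = TitleExtractor()
parser.feed('<h2 id="title">hello</h2>')

# write() needs a str, so pass the extracted text, not the parser object.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write(parser.title_text)

with open(path) as f:
    written = f.read()
print(written)  # hello
```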

Error parsing JSON file in python 3.4

I am trying to load a JSON file from a URL and parse it in Python 3.4, but I get a few errors and I have no idea what they are pointing to. I verified the JSON file at that URL with jsonlint.com and it seems fine. data.read() returns bytes, so I cast the result to str. The code is:
import urllib.request
import json
inp = input("enter url :")
if len(inp)<1: inp ='http://python-data.dr-chuck.net/comments_42.json'
data=urllib.request.urlopen(inp)
data_str = str(data.read())
print(type(data_str))
parse_data = json.loads(data_str)
print(type(parse_data))
The error that I'm getting is:
The expression str(data.read()) doesn't "cast" your bytes into a string; it just produces a string representation of them. You can see this if you print data_str: it is a str beginning with b'.
To actually decode the bytes before parsing the JSON, you need data_str = data.read().decode('utf-8')
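The difference can be checked without a network call. In this sketch a hard-coded bytes literal stands in for data.read(); the payload is made up and is not the real response from the URL:

```python
import json

# Hypothetical payload standing in for data.read(); not the real response.
raw = b'{"comments": [{"name": "Alice", "count": 97}]}'

as_repr = str(raw)            # the repr of the bytes, starting with b' (not JSON)
try:
    json.loads(as_repr)
    repr_parsed = True
except ValueError:            # json.JSONDecodeError subclasses ValueError (3.5+)
    repr_parsed = False

text = raw.decode('utf-8')    # the actual JSON text
parse_data = json.loads(text)

print(repr_parsed)                          # False
print(parse_data['comments'][0]['count'])   # 97
```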