I am trying to adapt some Python code from an excellent guide on dark web scanning and graph creation.
I have thousands of JSON files created with OnionScan, and this code should wrap everything into a Gephi graph. Unfortunately the code is old: the JSON files are formatted differently now, and it no longer works:
Code (partial):
import glob
import json
import networkx
import shodan

file_list = glob.glob("C:\\test\\*.json")
graph = networkx.DiGraph()

for json_file in file_list:
    with open(json_file, "rb") as fd:
        scan_result = json.load(fd)

    edges = []
    if scan_result['linkedOnions'] is not None:
        edges.extend(scan_result['linkedOnions'])
In fact, at this point I get a KeyError, because linkedOnions is now nested one level down, like this:
"identifierReport": {
    "privateKeyDetected": false,
    "foundApacheModStatus": false,
    "serverVersion": "",
    "relatedOnionServices": null,
    "relatedOnionDomains": null,
    "linkedOnions": [many urls here]
}
Could you please help me fix the code above?
I would be VERY grateful :)
Lorenzo
This is the correct way to read the nested JSON:
if scan_result['identifierReport']['linkedOnions'] is not None:
    edges.extend(scan_result['identifierReport']['linkedOnions'])
Try this; it will work for you if your JSON file is in the correct format:
try:
    scan_result = json.load(fd)
    edges = []
    if scan_result['identifierReport']['linkedOnions'] is not None:
        edges.extend(scan_result['identifierReport']['linkedOnions'])
except Exception as e:
    # print your message or log it
    print(e)
So I've been beating my head against a wall for days now, diving down the google/SO rabbit hole in search of answers. I've been debating how to phrase this question, because the API I am pulling from may contain sensitive information that gets uncomfortably close to HIPAA territory for my liking. For that reason I will not be providing the direct link/auth in my code. That said, I will provide a made-up JSON snippet to help with the explanation.
import requests
import json
import urllib3
r = requests.get('https://madeup.url.com/api/vi/information here', auth=('123456789', '1111111111222222222223333333333444444455555555'))
payload = {'query': 'firstName'}
response = requests.get(r, params=payload)
json_response = response.json()
print(json.dumps(json_response))
The JSON file that I'm trying to parse looks in part like this:
"{\"id\": 123456789, \"firstName\": \"NAME\", \"lastName\": \"NAME\", \"phone\": \"NUMBER\", \"email\": \"EMAIL#gmail.com\", \"date\": \"December 16, 2021\", \"time\": \"9:50am\", \"endTime\": \"10:00am\",.....
When I run the code I get a "urllib3.exceptions.LocationParseError: Failed to parse: <Response [200]>" traceback, and I cannot for the life of me figure out what is going on. urllib3 is installed and up to date according to the console.
Any help would be much appreciated. TIA
That is not a JSON file. It is a string containing escaped characters. It needs to be unescaped before parsing can work.
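To illustrate: a document that was serialised twice has to be decoded twice. A minimal sketch, with made-up field names matching the example:

```python
import json

# A dict dumped twice produces a JSON *string* whose content is itself
# escaped JSON -- exactly the shape shown in the question.
double_encoded = json.dumps(json.dumps({"id": 123456789, "firstName": "NAME"}))
print(double_encoded)  # "{\"id\": 123456789, \"firstName\": \"NAME\"}"

# Decoding once yields a string; decoding that string yields the dict.
inner = json.loads(double_encoded)
record = json.loads(inner)
print(record["firstName"])  # NAME
```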
You're passing r to requests.get() (line 9), but r is the response from another requests.get() (line 5)... shouldn't you pass params=payload in line 5 and get the response from there, in one single request?
import requests
import json
import urllib3
payload = {'query': 'firstName'}
response = requests.get('{YOUR_URL}', auth=('{USER}', '{PASS}'), params=payload)
json_response = response.json()
print(json.dumps(json_response))
Well, now I'm even more confused. I'm trying to teach myself Python and clearly struggling. To get the "JSON" I posted, I used the following code:
r = requests.get('URL', auth=('user', 'pass'))
Data = r.json()
packages_str = json.dumps(Data[0])
with open('Data.json', 'w') as f:
    json.dump(packages_str, f)
So basically I'm even more lost now...
Okay, update: good news! Kinda... my code now reads as follows:
import requests
import json
import urllib3

payload = {
    'query1': 'firstName',
    'query2': 'lastName'
}
response = requests.get("url", auth=("user", "pass"), params=payload)
Data = response.json()
packages_str = json.dumps(Data, ensure_ascii=False, indent=2)
with open('Data.json', 'w') as f:
    json.dump(packages_str, f)
    f.write(packages_str)
And when I then open the JSON file, the first line is the entire API response as one long string, but below that is a properly formatted JSON file. Unfortunately it's the entire API response, not a parsed JSON file containing just the information that I need...
Continuing down the google/youtube/SO rabbit hole; I'll update at a later date if I find a workaround.
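For what it's worth, that duplicated first line comes from writing the same string twice: `json.dump` on a str writes it as one escaped JSON string literal, and the following `f.write` appends the pretty-printed text verbatim. A small sketch using an in-memory buffer (the field names are made up to match the example):

```python
import io
import json

data = {"firstName": "NAME", "lastName": "NAME"}
packages_str = json.dumps(data, indent=2)

# Simulate the question's file with an in-memory buffer.
f = io.StringIO()
json.dump(packages_str, f)  # dumps a *string*, so it is written escaped, on one line
f.write(packages_str)       # then the pretty-printed JSON is appended
content = f.getvalue()
print(content)

# Writing the string once -- or calling json.dump(data, f) on the dict
# itself -- avoids the duplicated, escaped first line.
```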
I have the file log.txt with the following data:
{"__TIMESTAMP":"2020-07-09T19:05:20.858013","__LABEL":"web_channel","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"Port web_channel/diagnose_client not connected!"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229737","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229761","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229793","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229805","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
If I define a single row, I can convert it into a real JSON type:
import json
import datetime
from json import JSONEncoder

log = {
    "__TIMESTAMP": "2020-07-09T19:05:21.229737",
    "__LABEL": "context_logging_addon",
    "__LEVEL": 4,
    "__DIAGNOSE_SLOT": "",
    "msg": "Port web_channel/diagnose_client not connected!"}

class DateTimeEncoder(JSONEncoder):
    # Override the default method
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()

print("Printing to check how it will look like")
print(DateTimeEncoder().encode(log))
I get the following output, whose format is perfect JSON:
Printing to check how it will look like
{"__TIMESTAMP": "2020-07-09T19:05:21.229737", "__LABEL": "context_logging_addon", "__LEVEL": 4, "__DIAGNOSE_SLOT": "", "msg": "Port web_channel/diagnose_client not connected!"}
But I don't know how I should open the log.txt file and read its data to convert it into JSON without any failure.
Could you help me, please? Thanks in advance.
Let us say your log.txt file is in the same directory as your .py file.
Just open it with with open(..., then parse your file according to its syntax to create a list of dictionaries (each item corresponding to a row), and finally parse each dictionary as you're currently doing.
Here is how you could open and parse your file:
with open("log.txt", "r") as file:
    all_text = file.readlines()

parsed_line = list()
for text in all_text:
    parsed_line.append(dict([item.split('":"') for item in text[2:-2].split('","')]))
If you have any question about the parsing let me know. This one is pretty straightforward.
Hope this helped you.
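One caveat: the split-based parse above mishandles unquoted values such as "__LEVEL":4 (the level and the following key get fused into one dict key). Since each line of log.txt is already a complete JSON document (JSON Lines format), a simpler sketch is to let the json module decode each line:

```python
import json

# Each line of the log is a complete JSON document, so json.loads can
# decode it directly -- no manual string splitting needed.
lines = [
    '{"__TIMESTAMP":"2020-07-09T19:05:20.858013","__LABEL":"web_channel","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"Port web_channel/diagnose_client not connected!"}',
    '{"__TIMESTAMP":"2020-07-09T19:05:21.229737","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}',
]

entries = [json.loads(line) for line in lines]
print(entries[0]["__LABEL"])  # web_channel
print(entries[1]["msg"])      # startup component
```

Reading from the file instead of a list is the same idea: `entries = [json.loads(l) for l in open("log.txt") if l.strip()]`.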
Try it this way:
logs = """[your log file above]"""

for log in logs.splitlines():
    print(DateTimeEncoder().encode(log))
Output:
"{\"__TIMESTAMP\":\"2020-07-09T19:05:20.858013\",\"__LABEL\":\"web_channel\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"Port web_channel/diagnose_client not connected!\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229737\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229761\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229793\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229805\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"
I searched all the similar questions and still couldn't resolve the issue below.
Here's my json file content:
https://objectsoftconsultants.s3.amazonaws.com/tweets.json
Code to get a particular element is as below:
import json

testsite_array = []
with open('tweets.json') as json_file:
    testsite_array = json_file.readlines()

for text in testsite_array:
    json_text = json.dumps(text)
    resp = json.loads(json_text)
    print(resp["created_at"])
I keep getting the error below:
print(resp["created_at"])
TypeError: string indices must be integers
Thanks much for your time and help, well in advance.
I have to guess what you're trying to do and can only hope that this will help you:
with open('tweets.json') as f:
    tweets = json.load(f)

print(tweets['created_at'])
It doesn't make sense to read a json file with readlines, because it is unlikely that each line of the file represents a complete json document.
Also I don't get why you're dumping the string only to load it again immediately.
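To see why the original loop fails: `json.dumps` on a str produces a quoted JSON string literal, and `json.loads` simply unwraps it again, so `resp` is still a str and `resp["created_at"]` tries to index a string with a string. A minimal sketch (the sample line is invented):

```python
import json

line = '{"created_at": "Mon Apr 22 2019", "id": 1}'

# dumps() on a str wraps it in quotes; loads() then unwraps it again,
# returning the *same string*, not a dict.
round_tripped = json.loads(json.dumps(line))
print(type(round_tripped).__name__)  # str
print(round_tripped == line)         # True

# Decoding the line itself yields a dict, which supports key access.
tweet = json.loads(line)
print(tweet["created_at"])  # Mon Apr 22 2019
```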
Update:
Try this to parse your file line by line:
with open('tweets.json') as f:
    lines = f.readlines()

for line in lines:
    try:
        tweet = json.loads(line)
        print(tweet['created_at'])
    except json.decoder.JSONDecodeError:
        print('Error')
I want to point out however, that I do not recommend this approach. A file should contain only one json document. If the file does not contain a valid json document, the source for the file should be fixed.
I am trying to get as many profile links as I can on khanacademy.org, using their API.
I am struggling to navigate the JSON file to get the desired data.
Here is my code :
from urllib.request import urlopen
import json

with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()

data = json.loads(source)

for item in data['feedback']:
    print(item['authorKaid'])
    profile_answers = item['answers']['authorKaid']
    print(profile_answers)
My goal is to get as many authorKaid values as possible and then store them (to create a database later).
When I run this code I get this error :
TypeError: list indices must be integers or slices, not str
I don't understand why; in this tutorial video, https://www.youtube.com/watch?v=9N6a-VLBa2I at 16:10, it works.
The issue is that item['answers'] is a list, and you are accessing it with a string key rather than an integer index; that is why item['answers']['authorKaid'] raises the error. What you really want is:
print(item['answers'][0]['authorKaid'])
print(item['answers'][1]['authorKaid'])
print(item['answers'][2]['authorKaid'])
etc...
So you actually want to iterate through those lists. Try this:
from urllib.request import urlopen
import json

with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()

data = json.loads(source)

for item in data['feedback']:
    print(item['authorKaid'])
    for each in item['answers']:
        profile_answers = each['authorKaid']
        print(profile_answers)
I'm trying to convert an http JSON response into a DataFrame, then out to CSV file.
I'm struggling with the JSON into DF.
http line:
http://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1440
JSON response (part of it; the full result holds 720 records in arrays, formatted here with a JSON pretty-printer):
{
    "error": [],
    "result": {
        "XXBTZEUR": [
            [1486252800, "959.7", "959.7", "935.0", "943.6", "945.6", "4423.72544809", 5961],
            [1486339200, "943.8", "959.7", "940.0", "952.9", "953.5", "4464.48492401", 7678],
            [1486425600, "953.6", "990.0", "952.7", "988.5", "977.3", "8123.94462701", 10964],
            [1486512000, "988.4", "1000.1", "963.3", "987.5", "983.7", "10989.31074845", 16741],
            [1486598400, "987.4", "1007.4", "847.9", "926.4", "934.5", "22530.11626076", 52668],
            [1486684800, "926.4", "949.0", "886.0", "939.7", "916.7", "11173.53504917", 12588]
        ],
        "last": 1548288000
    }
}
I get
KeyError: 'XXBTZEUR'
on the json_normalize line. This seems to indicate that json_normalize is trying to build the DF from the "XXBTZEUR" level, not lower down at the record level. How do I get json_normalize to read the records instead, i.e. how do I get it to reference deep enough?
I have read several other posts on this site without understanding what I'm doing wrong.
One post mentions that json.loads() must be used. Does json_string.json() also load the JSON object, or do I need json.loads() instead?
Also tried variations of json_normalize:
BTCEUR_Daily_Table = json_normalize(json_data[[]])
TypeError: unhashable type: 'list'
Can json_normalize not load an array into a DF row?
Code so far:
import requests
from pandas import json_normalize

BTCEUR_Daily_URL = 'http://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1440'
json_string = requests.get(BTCEUR_Daily_URL)
json_data = json_string.json()
BTCEUR_Daily_Table = json_normalize(json_data, record_path=["XXBTZEUR"])
What I need in result:
In my DF, I just want the arrayed records shown in the "body" of the JSON structure. None of the header & footer are needed.
The solution I found was:
BTCEUR_Daily_Table = json_normalize(data=json_data, record_path=[['result','XXBTZEUR']])
The 2nd parameter specifies the full "path" to the parent label of the records.
Apparently the double brackets are needed to specify a full path; otherwise the two labels are taken to mean two top-level names.
Without another post here, I would never have found the solution.
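For completeness, here is a self-contained sketch of that call using `pandas.json_normalize` and a trimmed copy of the response above (the column names are my own labels for the OHLC fields, not something the API returns):

```python
import pandas as pd

# Trimmed version of the Kraken response shown in the question.
json_data = {
    "error": [],
    "result": {
        "XXBTZEUR": [
            [1486252800, "959.7", "959.7", "935.0", "943.6", "945.6", "4423.72544809", 5961],
            [1486339200, "943.8", "959.7", "940.0", "952.9", "953.5", "4464.48492401", 7678],
        ],
        "last": 1548288000,
    },
}

# The nested list of records lives at result -> XXBTZEUR, so the whole
# path is passed as one inner list inside record_path.
df = pd.json_normalize(data=json_data, record_path=[["result", "XXBTZEUR"]])
df.columns = ["time", "open", "high", "low", "close", "vwap", "volume", "count"]
print(df.shape)  # (2, 8)
```

From here `df.to_csv("ohlc.csv", index=False)` would produce the CSV file the question is after.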