JSON web scraper returning empty array

I'm trying to parse a website using the following code:
import requests
r = requests.get('https://www.finn.no/realestate/homes/search.html?sort=PUBLISHED_DESC')
print(r.json())
However, it appears that it just returns an empty array.
I tried putting it in a dict and catching the response using
import sys, json

struct = {}
try:
    dataform = str(r).strip("'<>() ").replace('\'', '\"')
    struct = json.loads(dataform)
except:
    print(repr(r))
    print(sys.exc_info())
struct
And the code returns:
<Response [200]>
(<class 'json.decoder.JSONDecodeError'>, JSONDecodeError('Expecting value: line 1 column 1 (char 0)') ....

You're trying to parse an HTML document as JSON, which is why json.loads fails on the very first character. The page's JSON data is embedded inside a single <script> element, so you can use BeautifulSoup to locate it and the json module to parse it:
import json
import requests
from bs4 import BeautifulSoup
r = requests.get(
    "https://www.finn.no/realestate/homes/search.html?sort=PUBLISHED_DESC"
)
soup = BeautifulSoup(r.content, "html.parser")
data = soup.select_one("#__NEXT_DATA__")
data = json.loads(data.text)
# pretty print the data:
print(json.dumps(data, indent=4))
Prints:
{
    "props": {
        "pageProps": {
            "search": {
                "docs": [
                    {
                        "type": "realestate",
                        "ad_id": 276867609,
                        "main_search_key": "SEARCH_ID_REALESTATE_NEWBUILDINGS",
                        "heading": "Unik anledning! \u00d8nsker du \u00e5 bo med \"leilighetsf\u00f8lelse\" rett ved bysentrum, og likevel ha plass til storfamilien?",
                        "location": "Fauchaldsgate 2, Gj\u00f8vik",
                        "image": {
                            "url": "https://images.finncdn.no/dynamic/default/2022/11/vertical-0/22/9/276/867/609_1157963344.jpg",
                            "path": "2022/11/vertical-0/22/9/276/867/609_1157963344.jpg",
                            "height": 1280,
                            "width": 1920,
                            "aspect_ratio": 1.5
                        },
...and so on.
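The select-then-parse pattern from the answer can be exercised offline. Here is a minimal sketch against a hand-written HTML snippet (the snippet and its keys are invented for illustration; the real page embeds a much larger payload in the same #__NEXT_DATA__ script element):

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: a page embedding its
# data in a <script id="__NEXT_DATA__"> element, like the Finn.no page.
html = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"search": {"docs": [{"heading": "Example ad"}]}}}}
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# select_one finds the element by CSS id; .text is the raw JSON payload.
data = json.loads(soup.select_one("#__NEXT_DATA__").text)

# Walk down to the list of ads and pull each heading.
docs = data["props"]["pageProps"]["search"]["docs"]
print([d["heading"] for d in docs])
```

The same four lines of soup/json code work unchanged on the live response; only the source of the HTML differs.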


Parsing nested json to find value of an object

I am trying to find the value of the statement object in the JSON file below.
I am trying the code below, but it's erroring out.
import cx_Oracle
import json
import pandas as pd
f = open('test.json')
records = json.load(f)
pd.json_normalize(records, record_path = [‘ddl’], meta =['op_type', 'op_ts','pos','xid'])
Above gives SyntaxError: invalid character '‘' (U+2018).
test.json is as follows:
[
    {
        "op_type": "DDL",
        "op_ts": "2023-02-16T05:30:04.000Z",
        "pos": "G-AQAAAMQNAAAAAAAAAAAAAAAAAAAGAAoAAA==8057790.6.13.989",
        "xid": "0.6.13.989",
        "ddl": {
            "object": {
                "catalog": "",
                "schema": "TKGGU1",
                "object": "SRCTAB2"
            },
            "statement": "create table srctab2"
        }
    }
]
You're using the wrong quotes:

pd.json_normalize(records, record_path = [‘ddl’], ...
                                          ^   ^
                                          |   |

Don't use ‘ and ’; use ' instead:
import cx_Oracle
import json
import pandas as pd
f = open('test.json')
records = json.load(f)
pd.json_normalize(records, record_path = ['ddl'], meta =['op_type', 'op_ts', 'pos', 'xid'])
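If you want to see the difference programmatically, a small sketch (the snippets below are invented for illustration) shows that CPython rejects the curly quotes at compile time while the straight-quote version is accepted:

```python
# U+2018/U+2019 are typographic quotes, not Python string delimiters.
bad = "records_path = [\u2018ddl\u2019]"
good = "records_path = ['ddl']"

def compiles(src):
    """Return True if src is syntactically valid Python."""
    try:
        compile(src, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(bad))   # False: SyntaxError: invalid character '‘' (U+2018)
print(compiles(good))  # True
```

Word processors and some websites silently convert straight quotes to curly ones, which is usually how these characters end up in pasted code.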

Python - parsing JSON and f string

I am trying to write a small script that parses JSON files. I need to include multiple variables in the code, but currently I'm stuck since the f-string does not seem to work as I expected. Here is an example:
import json
test = 10
json_data = f'[{"ID": {test},"Name":"Pankaj","Role":"CEO"}]'
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)
The above code returns an error:
json_data = f'[{"ID": { {test} },"Name":"Pankaj","Role":"CEO"}]'
ValueError: Invalid format specifier
Could you, please let me know how can I add variables to the JSON?
Thank you.
You can escape the literal braces by doubling them, {{ and }}:
import json
test = 10
json_data = f'[{{"ID": {test},"Name":"Pankaj","Role":"CEO"}}]'
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)
Prints:
[
  {
    "ID": 10,
    "Name": "Pankaj",
    "Role": "CEO"
  }
]
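An alternative worth noting: instead of escaping braces in a hand-written JSON string, you can build the structure in Python and let json.dumps produce the string, which sidesteps brace-escaping entirely. A sketch using the same sample data:

```python
import json

test = 10
# Serialize a Python list/dict instead of formatting a JSON string by hand;
# json.dumps takes care of quoting and escaping.
json_data = json.dumps([{"ID": test, "Name": "Pankaj", "Role": "CEO"}])
json_object = json.loads(json_data)
print(json.dumps(json_object, indent=2))
```

This also avoids subtle bugs if a variable's value contains quotes or other characters that are significant in JSON.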

Trying to parse access.log

Good afternoon. I'm trying to find the top 10 IPs in access.log (the standard Apache server log).
Here is the code:
import argparse
import json
import re
from collections import defaultdict, Counter

parser = argparse.ArgumentParser(description='parser script')
parser.add_argument('-f', dest='logfile', action='store', default='access.log')
args = parser.parse_args()

regul_ip = (r"^(?P<ips>.*?)")
regul_method = (r"\"(?P<request_method>GET|POST|PUT|DELETE|HEAD)")

def req_by_method():
    dict_ip = defaultdict(lambda: {"GET": 0, "POST": 0, "PUT": 0, "DELETE": 0, "HEAD": 0})
    with open(args.logfile) as file:
        for index, line in enumerate(file.readlines()):
            try:
                ip = re.search(regul_ip, line).group()
                method = re.search(regul_method, line).groups()[0]
                return Counter(dict_ip).most_common(10)
            except AttributeError:
                pass
            dict_ip[ip][method] += 1
    print(json.dumps(dict_ip, indent=4))

with open("final_log.json", "w") as jsonfile:
    json.dump(dict_ip, jsonfile, indent=5)
When the code is executed, I only get: []
How can I fix this code to make it work?
I also need to output to the final JSON file a set of lines like "ip", "method", "status code", "url", and the duration of the request.
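One way to get the top-10 count working, sketched under the assumption of common-log-format lines (the sample lines below are invented): the IP pattern must match something non-empty, the counter must be incremented inside the loop, and the return must happen only after the whole file has been read. The early return on the first matching line is why the original prints an empty list.

```python
import re
from collections import Counter

# The original pattern r"^(?P<ips>.*?)" is non-greedy and happily matches
# the empty string; anchor it to an actual dotted-quad instead.
ip_re = re.compile(r"^(?P<ips>\d{1,3}(?:\.\d{1,3}){3})")

def top_ips(lines, n=10):
    counts = Counter()
    for line in lines:
        m = ip_re.search(line)
        if m:
            counts[m.group("ips")] += 1
    # Return only after every line has been counted, not inside the loop.
    return counts.most_common(n)

sample = [
    '10.0.0.1 - - [01/Jan/2023] "GET / HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2023] "POST /login HTTP/1.1" 302 64',
    '10.0.0.2 - - [01/Jan/2023] "GET /x HTTP/1.1" 404 128',
]
print(top_ips(sample))  # [('10.0.0.1', 2), ('10.0.0.2', 1)]
```

The per-method tallies and the extra fields (status code, URL, duration) can be collected the same way by extending the regex with more named groups and storing a dict per IP.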

After updating json file with groovy, the file data contains extra curly brackets and "content" object

This is the sort of data I have in my JSON file:
{"globals":{"code":"1111","country_code":"8888","hits":80,"extra_hit":1,"keep_money":true},"time_window":{"from":"2020.12.14 08:40:00","to":"2020.12.14 08:45:00"},"car":{"have":"nope"}}
After I run it through this Groovy code in JMeter:
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder
import groovy.json.JsonOutput
def jsonSlurper = new JsonSlurper().parse(new File("C:/pathToFile/test.json"))
log.info(jsonSlurper.toString())
jsonSlurper.globals.hits = 70
jsonSlurper.time_window.from = "2020.12.14 08:42:00"
jsonSlurper.time_window.to = "2020.12.14 08:48:00"
def builder = new JsonBuilder(jsonSlurper)
log.info(builder.toString())
def json_str = JsonOutput.toJson(builder)
def json_beauty = JsonOutput.prettyPrint(json_str)
log.info(json_beauty.toString())
File file = new File("C:/pathToFile/test.json")
file.write(json_beauty)
the JSON file is updated, but all the data is wrapped in a new "content" object:
"content": {
"globals": {
"code":"1111",
"country_code": "8888",
"hits": 70,
"extra_hit": 1,
"keep_money": true
},
"time_window": {
"from": "2020.12.14 08:42:00",
"to": "2020.12.14 08:48:00"
},
"car": {
"have": "nope"
}
}
}
How can I avoid the wrapping in a "content" object?
Copying and pasting code from the Internet without any idea of what it is doing is not the best way to proceed; at some point you will end up running Barmin's patch.
The "content" wrapper appears because JsonOutput.toJson(builder) serializes the JsonBuilder object itself, and a JsonBuilder keeps its parsed data in a property named content. My expectation is that you're looking for the JsonBuilder.toPrettyString() function, so basically everything which goes after this line:
def builder = new JsonBuilder(jsonSlurper)
can be replaced with:
new File("C:/pathToFile/test.json").text = builder.toPrettyString()
More information:
Apache Groovy: Parsing and producing JSON
Apache Groovy - Why and How You Should Use It

Working with JSON and Django

I am new to Python and Django. I am an IT professional who deploys software that monitors computers. The API outputs JSON. I want to create a Django app that reads the API and outputs the data to an HTML page. Where do I get started? I think the idea is to write the JSON feed to a Django model. Any help/advice is greatly appreciated.
Here's a simple single file to extract the JSON data (Python 2 style; the URL and the key in theJSON[""] are left for you to fill in):
import urllib2
import json

def printResults(data):
    theJSON = json.loads(data)
    for i in theJSON[""]:
        print i

def main():
    urlData = ""
    webUrl = urllib2.urlopen(urlData)
    if (webUrl.getcode() == 200):
        data = webUrl.read()
        printResults(data)
    else:
        print "Received error"

if __name__ == '__main__':
    main()
If you have an URL returning a json as response, you could try this:
import requests
import json
url = 'http://....' # Your api url
response = requests.get(url)
json_response = response.json()
Now json_response is a list containing dicts. Let's suppose you have this structure:
[
    {
        'code': 'ABC',
        'avg': 14.5,
        'max': 30
    },
    {
        'code': 'XYZ',
        'avg': 11.6,
        'max': 21
    },
    ...
]
You can iterate over the list and take every dict into a model.
from yourmodels import CurrentModel
...
for obj in json_response:
cm = CurrentModel()
cm.avg = obj['avg']
cm.max = obj['max']
cm.code = obj['code']
cm.save()
Or you could use a bulk method, but keep in mind that bulk_create() does not trigger the model's save() method.