I'm trying to transform a JSON file generated by the Day One journal app into a text file using Python, but I've hit a brick wall.
This is broadly the format:
{'metadata': {'version': '1.0'},
'entries': [{'richText': '{"meta":{"version":1,"small-lines-removed":true,"created":{"platform":"com.bloombuilt.dayone-mac","version":1344}},"contents":[{"attributes":{"line":{"header":1,"identifier":"F78B28DA-488E-489E-9C95-1A0648099792"}},"text":"2022\\n"},{"attributes":{"line":{"header":0,"identifier":"FA8C6594-F43D-4652-B442-DAF72A379799"}},"text":"\\n"},{"attributes":{"line":{"header":0,"identifier":"0923BCC8-B24A-4C0D-963C-73D09561EECD"}},"text":"It’s the beginning of a new year"},{"embeddedObjects":[{"type":"horizontalRuleLine"}]},{"text":"\\n\\n\\n\\n"},{"embeddedObjects":[{"type":"horizontalRuleLine"}]}]}',
'duration': 0,
'creationOSVersion': '12.1',
'weather': {'sunsetDate': '2022-01-12T16:15:28Z',
'temperatureCelsius': 7,
'weatherServiceName': 'HAMweather',
'windBearing': 230,
'sunriseDate': '2022-01-12T08:00:44Z',
'conditionsDescription': 'Mostly Clear',
'pressureMB': 1042,
'visibilityKM': 48.28020095825195,
'relativeHumidity': 81,
'windSpeedKPH': 6,
'weatherCode': 'clear-night',
'windChillCelsius': 6.699999809265137},
'editingTime': 2925.313938140869,
'timeZone': 'Europe/London',
'creationDeviceType': 'Hal 9000',
'uuid': '988D9D9876624FAEB88F9BCC666FD9CD',
'creationDeviceModel': 'MacBookPro15,2',
'starred': False,
'location': {'region': {'center': {'longitude': -0.0095,
'latitude': 51},
'radius': 75},
'localityName': 'London',
'country': 'United Kingdom',
'timeZoneName': 'Europe/London',
'administrativeArea': 'England',
'longitude': -0.0095,
'placeName': 'Somewhere',
'latitude': 51},
'isPinned': False,
'creationDevice': 'somedevice'...,
}
I only want the 'text' (of which there might be a number of 'text' entries) and the 'creationDate', so I've got a daily record.
My code to pull out the data is straightforward:
import json

# Open the JSON file and load it into a dictionary;
# the file is closed automatically when the block exits
with open('files/2022.json') as f:
    data = json.load(f)
I've tried using list comprehensions and then concatenating the Series in Pandas, but the two don't match in length, because multiple entries on one day mix up the dataframe.
I wanted to use this code:
result = []
for i in data['entries']:
    entry = i['creationDate'] + i['text']
    result.append(entry)
but I get this error:
KeyError: 'text'
What do I need to do?
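A minimal sketch of one workaround, assuming entries that lack a top-level 'text' key can simply be skipped (dict.get returns None instead of raising KeyError):

import json

with open('files/2022.json') as f:
    data = json.load(f)

result = []
for entry in data['entries']:
    text = entry.get('text')  # None instead of KeyError when 'text' is missing
    if text is not None:
        result.append(entry['creationDate'] + ' ' + text)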
Update:
{'richText': '{"meta":{"version":1,"small-lines-removed":true,"created":{"platform":"com.bloombuilt.dayone-mac","version":1344}},"contents":[{"text":"Later than I planned\\n"}]}',
'duration': 0,
'creationOSVersion': '12.1',
'weather': {'sunsetDate': '2022-01-12T16:15:28Z',
'temperatureCelsius': 7,
'weatherServiceName': 'HAMweather',
'windBearing': 230,
'sunriseDate': '2022-01-12T08:00:44Z',
'conditionsDescription': 'Mostly Clear',
'pressureMB': 1042,
'visibilityKM': 48.28020095825195,
'relativeHumidity': 81,
'windSpeedKPH': 6,
'weatherCode': 'clear-night',
'windChillCelsius': 6.699999809265137},
'editingTime': 672.3099998235703,
'timeZone': 'Europe/London',
'creationDeviceType': 'Computer',
'uuid': 'F53DCC5E05BB4106A49C76954117DBF4',
'creationDeviceModel': 'xompurwe',
'isPinned': False,
'creationDevice': 'Computer',
'text': 'Later than I planned \n',
'modifiedDate': '2022-01-05T01:01:29Z',
'isAllDay': False,
'creationDate': '2022-01-05T00:39:19Z',
'creationOSName': 'macOS'},
Sort of managed to work out a solution. Thank you to everyone who helped this morning, particularly @Tomer S.
My solution was:
result = []
for i in data['entries']:
    entry = i['creationDate'] + i['text']
    print(entry)
    result.append(entry)
It still won't get me everything I want, though.
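A minimal sketch of a fallback, assuming (as the samples above suggest) that an entry without a top-level 'text' key carries its text inside 'richText', which is itself a JSON string with a 'contents' list:

import json

with open('files/2022.json') as f:
    data = json.load(f)

result = []
for entry in data['entries']:
    text = entry.get('text')
    if text is None and 'richText' in entry:
        # 'richText' is a JSON string; decode it and join its text fragments
        rich = json.loads(entry['richText'])
        text = ''.join(c.get('text', '') for c in rich.get('contents', []))
    if text:
        result.append(entry['creationDate'] + ' ' + text)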
I am trying to use this Colab notebook from this GitHub page to extract the triplet [term, opinion, value] from sentences in my custom dataset.
Here is an overview of the system architecture: [figure omitted]
While I can use the sample offered in the Colab, and can also train the model with my own data, I don't know how I should run it against an unlabelled sample.
If I run the Colab as-is, changing only the test and dev data to unlabelled data, I encounter this error:
DEVICE=0
{
  "names": "sample",
  "seeds": [0],
  "sep": ",",
  "name_out": "results",
  "kwargs": {
    "trainer__cuda_device": 0,
    "trainer__num_epochs": 10,
    "trainer__checkpointer__num_serialized_models_to_keep": 1,
    "model__span_extractor_type": "endpoint",
    "model__modules__relation__use_single_pool": false,
    "model__relation_head_type": "proper",
    "model__use_span_width_embeds": true,
    "model__modules__relation__use_distance_embeds": true,
    "model__modules__relation__use_pair_feature_multiply": false,
    "model__modules__relation__use_pair_feature_maxpool": false,
    "model__modules__relation__use_pair_feature_cls": false,
    "model__modules__relation__use_span_pair_aux_task": false,
    "model__modules__relation__use_span_loss_for_pruners": false,
    "model__loss_weights__ner": 1.0,
    "model__modules__relation__spans_per_word": 0.5,
    "model__modules__relation__neg_class_weight": -1
  },
  "root": "aste/data/triplet_data"
}
{
  "root": "/content/Span-ASTE/aste/data/triplet_data/sample",
  "train_kwargs": {
    "seed": 0,
    "trainer__cuda_device": 0,
    "trainer__num_epochs": 10,
    "trainer__checkpointer__num_serialized_models_to_keep": 1,
    "model__span_extractor_type": "endpoint",
    "model__modules__relation__use_single_pool": false,
    "model__relation_head_type": "proper",
    "model__use_span_width_embeds": true,
    "model__modules__relation__use_distance_embeds": true,
    "model__modules__relation__use_pair_feature_multiply": false,
    "model__modules__relation__use_pair_feature_maxpool": false,
    "model__modules__relation__use_pair_feature_cls": false,
    "model__modules__relation__use_span_pair_aux_task": false,
    "model__modules__relation__use_span_loss_for_pruners": false,
    "model__loss_weights__ner": 1.0,
    "model__modules__relation__spans_per_word": 0.5,
    "model__modules__relation__neg_class_weight": -1
  },
  "path_config": "/content/Span-ASTE/training_config/aste.jsonnet",
  "repo_span_model": "/content/Span-ASTE",
  "output_dir": "model_outputs/aste_sample_c7b00b66bf7ec669d23b80879fda043d",
  "model_path": "models/aste_sample_c7b00b66bf7ec669d23b80879fda043d/model.tar.gz",
  "data_name": "sample",
  "task_name": "aste"
}
# of original triplets: 11
# of triplets for current setup: 11
# of original triplets: 7
# of triplets for current setup: 7
Traceback (most recent call last):
  File "/usr/lib/python3.7/pdb.py", line 1699, in main
    pdb._runscript(mainpyfile)
  File "/usr/lib/python3.7/pdb.py", line 1568, in _runscript
    self.run(statement)
  File "/usr/lib/python3.7/bdb.py", line 578, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/content/Span-ASTE/aste/main.py", line 1, in <module>
    import json
  File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/Span-ASTE/aste/main.py", line 278, in main
    scores = main_single(p, overwrite=True, seed=seeds[i], **kwargs)
  File "/content/Span-ASTE/aste/main.py", line 254, in main_single
    trainer.train(overwrite=overwrite)
  File "/content/Span-ASTE/aste/main.py", line 185, in train
    self.setup_data()
  File "/content/Span-ASTE/aste/main.py", line 177, in setup_data
    data.load()
  File "aste/data_utils.py", line 214, in load
    opinion_offset=self.opinion_offset,
  File "aste/evaluation.py", line 165, in read_inst
    o_output = line[2].split()  # opinion
IndexError: list index out of range
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /content/Span-ASTE/aste/evaluation.py(165)read_inst()
-> o_output = line[2].split()  # opinion
(Pdb)
From my understanding, it seems that it is looking for labels before starting the evaluation. The problem is that I don't have those labels, although I have provided a training set with similar data and associated labels.
I am new to deep learning and to AllenNLP, so I am probably missing something. I have been trying to solve this for the past two weeks but am still stuck, so here I am.
KeyPi, this is a supervised learning model; it needs labelled data for your text corpus. Each line takes the form of a sentence (e.g. "I charge it at night and skip taking the cord with me because of the good battery life .") followed by '#### #### ####' as a separator and a list of labels, where each label gives the aspect/target word indices in the first list and the opinion token indices in the second, followed by 'POS' for positive or 'NEG' for negative: [([16, 17], [15], 'POS')]
Indices 16 and 17 are "battery life", and at index 15 we have the opinion word "good".
I am not sure if you have figured this out already and found some way to label the corpus.
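A minimal sketch of writing one labelled line in roughly that format; the exact separator and the file name here are assumptions, so check them against the sample data files in the repo:

# Format: sentence, separator, then the list of (aspect, opinion, polarity) triplets.
sentence = "I charge it at night and skip taking the cord with me because of the good battery life ."
triplets = [([16, 17], [15], "POS")]  # ([aspect token indices], [opinion token indices], polarity)

with open("train.txt", "a") as f:  # "train.txt" is an illustrative file name
    f.write(f"{sentence}#### #### ####{triplets}\n")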
I found a few SO posts on related issues, but they were unhelpful. I finally figured it out, and here's how to read the contents of a .json file. Say the path is /home/xxx/dnns/test/params.json, and I want to turn the dictionary in the .json file into a Prolog dictionary:
{
"type": "lenet_1d",
"input_channel": 1,
"output_size": 130,
"batch_norm": 1,
"use_pooling": 1,
"pooling_method": "max",
"conv1_kernel_size": 17,
"conv1_num_kernels": 45,
"conv1_stride": 1,
"conv1_dropout": 0.0,
"pool1_kernel_size": 2,
"pool1_stride": 2,
"conv2_kernel_size": 12,
"conv2_num_kernels": 35,
"conv2_stride": 1,
"conv2_dropout": 0.514948804688646,
"pool2_kernel_size": 2,
"pool2_stride": 2,
"fcs_hidden_size": 109,
"fcs_num_hidden_layers": 2,
"fcs_dropout": 0.8559119274655482,
"cost_function": "SmoothL1",
"optimizer": "Adam",
"learning_rate": 0.0001802763794651928,
"momentum": null,
"data_is_target": 0,
"data_train": "/home/xxx/data/20180402_L74_70mm/train_2.h5",
"data_val": "/home/xxx/data/20180402_L74_70mm/val_2.h5",
"batch_size": 32,
"data_noise_gaussian": 1,
"weight_decay": 0,
"patience": 20,
"cuda": 1,
"save_initial": 0,
"k": 4,
"save_dir": "DNNs/20181203090415_11_created/k_4"
}
To read a JSON file with SWI-Prolog, query
?- use_module(library(http/json)). % to enable json_read_dict/2
?- FPath = '/home/xxx/dnns/test/params.json', open(FPath, read, Stream), json_read_dict(Stream, Dicty).
You'll get
FPath = '/home/xxx/dnns/test/params.json',
Stream = <stream>(0x7fa664401750),
Dicty = _12796{batch_norm:1, batch_size:32, conv1_dropout:0.0,
  conv1_kernel_size:17, conv1_num_kernels:45, conv1_stride:1,
  conv2_dropout:0.514948804688646, conv2_kernel_size:12,
  conv2_num_kernels:35, conv2_stride:1, cost_function:"SmoothL1",
  cuda:1, data_is_target:0, data_noise_gaussian:1,
  data_train:"/home/xxx/data/20180402_L74_70mm/train_2.h5",
  data_val:"/home/xxx/data/20180402_L74_70mm/val_2.h5",
  fcs_dropout:0.8559119274655482, fcs_hidden_size:109,
  fcs_num_hidden_layers:2, input_channel:1, k:4,
  learning_rate:0.0001802763794651928, momentum:null,
  optimizer:"Adam", output_size:130, patience:20,
  pool1_kernel_size:2, pool1_stride:2, pool2_kernel_size:2,
  pool2_stride:2, pooling_method:"max",
  save_dir:"DNNs/20181203090415_11_created/k_4", save_initial:0,
  type:"lenet_1d", use_pooling:1, weight_decay:0}.
where Dicty is the desired dictionary.
If you want to define this as a predicate, you could do:
:- use_module(library(http/json)).
get_dict_from_json_file(FPath, Dicty) :-
    open(FPath, read, Stream),
    json_read_dict(Stream, Dicty),
    close(Stream).
Even DEC10 Prolog, released 40 years ago, could handle JSON as a normal term. There should be no need for a specialized library or parser for JSON, because Prolog can just parse it directly.
?- X={"a":3,"b":"hello","c":undefined,"d":null}.
X = {"a":3, "b":"hello", "c":undefined, "d":null}.
?-
I'm accessing an open JSON API like this:
require 'net/http'
require 'rubygems'
require 'json'
require 'uri'
require 'pp'
url = "http://api.turfgame.com/v4/users"
uri = URI.parse(url)
data = [{"name" => "tbone"}]
headers = {"Content-Type" => "application/json"}
http = Net::HTTP.new(uri.host,uri.port)
response = http.post(uri.path,data.to_json,headers)
This gives JSON output like this:
[{"region"=>{"id"=>141, "name"=>"Stockholm"}, "medals"=>[34, 53, 12, 5, 46], "pointsPerHour"=>95, "blocktime"=>24, "zones"=>[275, 42460, 35956, 31926, 24247, 31722, 1097, 26104, 6072, 24283, 289, 325, 22199, 37740, 22198, 37743, 37074, 22845, 22201, 22846, 7477, 7310], "country"=>"se", "id"=>95195, "rank"=>24, "name"=>"tbone", "uniqueZonesTaken"=>178, "taken"=>1170, "points"=>41693, "place"=>377, "totalPoints"=>176654}]
What I want to do is grab some of the tags:
name (not in the region block but "tbone")
blocktime
totalPoints
all the IDs from the zones array
and insert them into a MySQL table. But I don't understand how to iterate over the JSON object and get the parts I want.
doing
puts data["name"]
gives an error like
./headerTest.rb:28:in `[]': can't convert String into Integer (TypeError)
from ./headerTest.rb:28:in `<main>'
And I get that it's because there are two name tags at different depths, but I don't understand how to access either one specifically.
Please?
So, after parsing the response with result = JSON.parse(response.body), you have:
result = [{"region"=>{"id"=>141, "name"=>"Stockholm"}, "medals"=>[34, 53, 12, 5, 46], "pointsPerHour"=>95, "blocktime"=>24, "zones"=>[275, 42460, 35956, 31926, 24247, 31722, 1097, 26104, 6072, 24283, 289, 325, 22199, 37740, 22198, 37743, 37074, 22845, 22201, 22846, 7477, 7310], "country"=>"se", "id"=>95195, "rank"=>24, "name"=>"tbone", "uniqueZonesTaken"=>178, "taken"=>1170, "points"=>41693, "place"=>377, "totalPoints"=>176654}]
This is an array containing a single hash. To obtain the values you want, do (note that Hash#slice is available from Ruby 2.5):
result[0].slice("name", "blocktime", "totalPoints", "zones")
# this returns => {"name"=>"tbone", "blocktime"=>24, "totalPoints"=>176654, "zones"=>[275, 42460, 35956, 31926, 24247, 31722, 1097, 26104, 6072, 24283, 289, 325, 22199, 37740, 22198, 37743, 37074, 22845, 22201, 22846, 7477, 7310]}
I am trying to create JSON data to pass to InfluxDB. I create it using strings but I get errors. What am I doing wrong? I am using json.dumps as has been suggested in various posts.
Here is basic Python code:
json_body = "[{'points':["
json_body += "['appx', 1, 10, 0]"
json_body += "], 'name': 'WS1', 'columns': ['RName', 'RIn', 'SIn', 'OIn']}]"
print("Write points: {0}".format(json_body))
client.write_points(json.dumps(json_body))
The output I get is
Write points: [{'points':[['appx', 1, 10, 0]], 'name': 'WS1', 'columns': ['RName', 'RIn', 'SIn', 'OIn']}]
Traceback (most recent call last):
line 127, in main
client.write_points(json.dumps(json_body))
File "/usr/local/lib/python2.7/dist-packages/influxdb/client.py", line 173, in write_points
return self.write_points_with_precision(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/influxdb/client.py", line 197, in write_points_with_precision
status_code=200
File "/usr/local/lib/python2.7/dist-packages/influxdb/client.py", line 127, in request
raise error
influxdb.client.InfluxDBClientError
I have tried with double quotes too, but I get the same error. This is stub code (to keep the example minimal); I realize the points list in the example contains just one list object, but in reality it contains several. I am generating the JSON by reading through the outputs of various API calls.
json_body = '[{\"points\":['
json_body += '[\"appx\", 1, 10, 0]'
json_body += '], \"name\": \"WS1\", \"columns\": [\"RName\", \"RIn\", \"SIn\", \"OIn\"]}]'
print("Write points: {0}".format(json_body))
client.write_points(json.dumps(json_body))
I understand that if I used the below, things would work:
json_body = [{ "points": [["appx", 1, 10, 0]], "name": "WS1", "columns": ["Rname", "RIn", "SIn", "OIn"]}]
You don't need to create JSON manually. Just pass an appropriate Python structure into the write_points function. Try something like this:
data = [{'points': [['appx', 1, 10, 0]],
         'name': 'WS1',
         'columns': ['RName', 'RIn', 'SIn', 'OIn']}]
client.write_points(data)
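Since the real points come from several API calls, one might build the structure incrementally before a single write_points call; a sketch, where the input rows are purely illustrative and client is the InfluxDB client from the question:

points = []
for row in [("appx", 1, 10, 0), ("appy", 2, 20, 1)]:  # stand-in for rows gathered from API calls
    points.append(list(row))

data = [{'points': points,
         'name': 'WS1',
         'columns': ['RName', 'RIn', 'SIn', 'OIn']}]
client.write_points(data)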
Please visit JSON.org for proper JSON structure. I can see several errors with your self-generated JSON:
The outermost item can be an unordered object, enclosed by curly braces {}, or an ordered array, enclosed by brackets []. Don't use both. Since your data is structured like a dict, the curly braces are appropriate.
All strings need to be enclosed in double quotes, not single. "This is valid JSON". 'This is not valid'.
Your 'points' value array is surrounded by double brackets, which is unnecessary. Only use a single set.
Please check out the documentation of the json module for details on how to use it. Basically, you can feed json.dumps() your Python data structure, and it will output it as valid JSON.
In [1]: my_data = {'points': ["appx", 1, 10, 0], 'name': "WS1", 'columns': ["RName", "RIn", "SIn", "OIn"]}
In [2]: my_data
Out[2]: {'points': ['appx', 1, 10, 0], 'name': 'WS1', 'columns': ['RName', 'RIn', 'SIn', 'OIn']}
In [3]: import json
In [4]: json.dumps(my_data)
Out[4]: '{"points": ["appx", 1, 10, 0], "name": "WS1", "columns": ["RName", "RIn", "SIn", "OIn"]}'
You'll notice the value of using a Python data structure first: because it's Python, you don't need to worry about single vs. double quotes; json.dumps() will automatically convert them. However, building a string with embedded single quotes leads to this:
In [5]: op_json = "[{'points':[['appx', 1, 10, 0]], 'name': 'WS1', 'columns': ['RName', 'RIn', 'SIn', 'OIn']}]"
In [6]: json.dumps(op_json)
Out[6]: '"[{\'points\':[[\'appx\', 1, 10, 0]], \'name\': \'WS1\', \'columns\': [\'RName\', \'RIn\', \'SIn\', \'OIn\']}]"'
since you fed the string to json.dumps(), not the data structure.
So next time, don't attempt to build JSON yourself; rely on the dedicated module to do it.