Dumping a list to a JSON file creates a list within a list ([["x", "y", "z"]]). Why?

I want to append multiple list items to a JSON file, but it creates a list within a list, and therefore I cannot access the list from Python. Since the code overwrites the existing data in the JSON file, there should not be any list there. I also tried it with just some text in the file, without brackets. It still creates a list within a list, so [["x", "y", "z"]] instead of ["x", "y", "z"].
import json
filename = 'vocabulary.json'
print("Reading %s" % filename)
try:
    with open(filename, "rt") as fp:
        data = json.load(fp)
    print("Data: %s" % data)#check
except IOError:
    print("Could not read file, starting from scratch")
    data = []
# Add some data
TEMPORARY_LIST = []
new_word = input("give new word: ")
TEMPORARY_LIST.append(new_word.split())
print(TEMPORARY_LIST)#check
data = TEMPORARY_LIST
print("Overwriting %s" % filename)
with open(filename, "wt") as fp:
    json.dump(data, fp)
Example input and output when appending the list of split words:
Reading vocabulary.json
Data: [['my', 'dads', 'house', 'is', 'nice']]
give new word: but my house is nicer
[['but', 'my', 'house', 'is', 'nicer']]
Overwriting vocabulary.json

So, if I understand what you are trying to accomplish correctly, it looks like you are trying to overwrite a list in a JSON file with a new list created from user input. For easiest data manipulation, set up your JSON file in dictionary form:
{
    "words": [
        "my",
        "dad's",
        "house",
        "is",
        "nice"
    ]
}
You should then set up functions to separate your functionality and make it more manageable:
def load_json(filename):
    with open(filename, "r") as f:
        return json.load(f)
def write_json(filename, data):
    with open(filename, "w") as f:
        json.dump(data, f, indent=4)
Now, we can use those functions to load the JSON, access the words list, and overwrite it with the new word.
data = load_json("vocabulary.json")
new_word = input("Give new word: ").split()
data["words"] = new_word
write_json("vocabulary.json", data)
If the user inputs "but my house is nicer", the JSON file will look like this:
{
    "words": [
        "but",
        "my",
        "house",
        "is",
        "nicer"
    ]
}
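The nested list in the question comes from TEMPORARY_LIST.append(new_word.split()). str.split() already returns a list, and list.append() adds its argument as a single element, so the entire list of words becomes the first (and only) item of TEMPORARY_LIST. A minimal sketch of the difference; the variable names are only for illustration:

```python
words = "but my house is nicer".split()  # split() already returns a list

nested = []
nested.append(words)  # append() adds the whole list as ONE element
print(nested)         # [['but', 'my', 'house', 'is', 'nicer']]

flat = []
flat.extend(words)    # extend() adds each word as its own element
print(flat)           # ['but', 'my', 'house', 'is', 'nicer']

# Simplest fix for the question's code: use the result of split() directly.
data = words
```

Assigning the split() result directly, as data["words"] = new_word does above, avoids the extra level of nesting.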
Edit
Okay, I have a few suggestions before I get into solving the issue. Firstly, it's great that you have delegated much of the program's functionality to separate functions. However, using global variables is generally discouraged because it makes code much harder to debug: any of the functions that use a global variable could have mutated it by accident. To fix this, use function parameters and pass the data around explicitly. In a small program like this, you can think of main() as the point that all data flows to and from: main() passes data to other functions and receives new or edited data back. One final recommendation: only use all-capital variable names for constants. For example, PI = 3.14159 is a constant, so by convention its name is written in all caps.
Without using global, main() will look much cleaner:
def main():
    choice = input("Do you want to start or manage the list? (start/manage)")
    if choice == "start":
        data = load_json("vocabulary.json")
        words = data["words"]
        dictee(words)
    elif choice == "manage":
        manage_list()
You can use the load_json() function from earlier (notice that I deleted write_json(), more on that later) if the user chooses to start the game. If the user chooses to manage the file, we can write something like this:
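# get_new_words() is called below; this version matches the prompt in the example session
def get_new_words():
    return input("Please enter the words you want to add, separated by spaces: ").split()
def manage_list():
    choice = input("Do you want to add or clear the list? (add/clear)")
    if choice == "add":
        words_to_add = get_new_words()
        add_words("vocabulary.json", words_to_add)
    elif choice == "clear":
        clear_words("vocabulary.json")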
We get the user input first and then we can call two other functions, add_words() and clear_words():
def add_words(filename, words):
    with open(filename, "r+") as f:
        data = json.load(f)
        data["words"].extend(words)
        f.seek(0)  # rewind so the updated JSON overwrites the old content
        json.dump(data, f, indent=4)
        f.truncate()  # drop any leftover bytes if the new JSON is ever shorter
def clear_words(filename):
    with open(filename, "w+") as f:
        data = {"words": []}
        json.dump(data, f, indent=4)
I did not use the load_json() function in the two functions above. Calling it would open the file one more time than needed; since these two functions already have to open the file, it is fine to load the JSON data right there, which takes only one line: data = json.load(f). You may also notice that add_words() opens the file in "r+" mode. This is the basic mode for reading and writing, and it does not truncate the file, which is why the function seeks back to the start before dumping the updated data. clear_words() uses "w+", because "w+" not only opens the file for reading and writing, it also truncates the file if it already exists (that is also why we don't need to load the JSON data in clear_words()). Because these two functions take care of writing and overwriting the data, we don't need the write_json() function that I initially suggested.
We can then add to the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)add
>>> Please enter the words you want to add, separated by spaces: these are new words
And the JSON file becomes:
{
    "words": [
        "but",
        "my",
        "house",
        "is",
        "nicer",
        "these",
        "are",
        "new",
        "words"
    ]
}
We can then clear the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)clear
And the JSON file becomes:
{
"words": []
}
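One caveat about "r+" in add_words(): seeking back to the start and writing does not shorten the file. That is harmless while the list only grows, but if a later change ever writes shorter JSON through "r+" (for example, removing words), the tail of the old content stays in the file and the JSON becomes invalid unless f.truncate() is called after writing. A small self-contained demonstration using a temporary file (the file name is arbitrary):

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.json")
    with open(path, "w") as f:
        json.dump({"words": ["a", "b", "c"]}, f)

    # Overwrite in "r+" mode with SHORTER JSON and no truncate()
    with open(path, "r+") as f:
        f.seek(0)
        json.dump({"words": []}, f)
    with open(path) as f:
        without_truncate = f.read()

    # Same again, but cut the file off at the current position afterwards
    with open(path, "r+") as f:
        f.seek(0)
        json.dump({"words": []}, f)
        f.truncate()
    with open(path) as f:
        with_truncate = f.read()

print(repr(without_truncate))  # old bytes are still there after the new JSON
print(repr(with_truncate))     # '{"words": []}'
```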
Great! Now we have implemented the ability for the user to manage the list. Let's move on to creating the functionality for the game: dictee()
You mentioned that you want to randomly select an item from a list and remove it from that list so it doesn't get asked twice. There are a multitude of ways you can accomplish this. For example, you could use random.shuffle:
import random
def dictee(words):
    correct = 0
    incorrect = 0
    random.shuffle(words)  # shuffles the list in place
    for word in words:
        # ask word
        # evaluate response
        # increment correct/incorrect
        pass
    # ask if you want to play again
random.shuffle() shuffles the list in place. Then you can iterate through the list with for word in words: and start the game. You don't need random.choice here: shuffling the list and then iterating through it already gives you the words in random order, each exactly once.
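random.shuffle() reorders the caller's list in place. If you would rather follow your original idea literally, picking a random word and removing it so it is never asked twice, list.pop() combined with random.randrange() does exactly that. A minimal sketch; the generator name ask_in_random_order is hypothetical, and it works on a copy so the words list passed in is left intact:

```python
import random

def ask_in_random_order(words):
    """Yield each word exactly once, in random order (hypothetical helper)."""
    remaining = list(words)  # copy, so the caller's list is not emptied
    while remaining:
        # randrange(n) picks an index in 0..n-1; pop() removes and returns that word
        yield remaining.pop(random.randrange(len(remaining)))

vocabulary = ["but", "my", "house", "is", "nicer"]
for word in ask_in_random_order(vocabulary):
    print(word)  # in dictee() this is where you would ask for the word
```

random.sample(words, len(words)) is another standard way to get a shuffled copy without touching the original list.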
I hope this helped illustrate how powerful functions and function parameters are. They not only help you separate your code, but also make it easier to manage, understand, and write cleaner code.


Is it bad design to print a class's variable names rather than their values (e.g. x.name prints "name" instead of the contents of name)?

The long title also contains a mini-example because I couldn't explain well what I'm trying to do. Nonetheless, the similar-questions window led me to various implementations. But since I read multiple times that it's bad design, I would like to ask whether what I'm trying to do is bad design, rather than asking how to do it. For this reason I will try to explain my use case with minimal functional code.
Suppose I have two classes, each with its own parameters:
class MyClass1:
    def __init__(self,param1=1,param2=2):
        self.param1=param1
        self.param2=param2
class MyClass2:
    def __init__(self,param3=3,param4=4):
        self.param3=param3
        self.param4=param4
I want to print param1...param4 as strings (i.e. "param1"..."param4") and not their values (i.e. 1...4).
Why? Two reasons in my case:
1. I have a GUI where the user is asked to select one of the class types (MyClass1, MyClass2) and is then asked to insert the values for the parameters of that class. The GUI must then show the parameter names ("param1", "param2" if MyClass1 was chosen) as labels next to the Entry widgets that get the values. Now, suppose the number of classes and parameters is very high, like 10 classes and 20 parameters per class. In order to minimize the written code and to make it flexible (add or remove parameters from classes without modifying the GUI code), I would like to cycle through all the parameters of MyClass and create the corresponding widget for each of them, so I need the paramx names in the form of strings. The real application I'm working on is even more complex (for example, parameters are inside other objects), but I used the simplest example. One solution would be to define every parameter as an object where param1.name="param1" and param1.value=1. Then in the GUI I would print param1.name. But this leads to a specific problem in my implementation, which is reason 2:
2. MyClass1..MyClassN will at some point be written to JSON. The JSON will be a huge file, and since it's a complex tree (the example is simple), I want to keep it as simple as possible. To explain why I don't like the solution above, suppose this situation:
class MyClass1:
    def __init__(self,param1,param2,combinations=[]):
        self.param1=param1
        self.param2=param2
        self.combinations=combinations
Suppose param1 and param2 are now lists of variable size, and combinations is a list where each element holds one combination of param1 and param2 together with an output generated by some sort of calculation. Each element of the list combinations is a SingleCombination object, for example (metacode):
param1 = [1, 2]
param2 = [5, 6]
SingleCombination.param1 = 1
SingleCombination.param2 = 5
SingleCombination.output = 1*5
MyInst1.combinations.append(SingleCombination)
In my case I will further encapsulate param1 and param2 in an object called parameters, so every condition will have a nice tree with only two objects, parameters and output, and expanding the parameters node will show all the parameters with their values.
If I use JSON pickle to generate a JSON from the situation above, it is nicely displayed, since the name of each node is the name of the variable ("param1", "param2" as strings in the JSON). But if I use the trick at the end of point 1, creating an object for each paramN with paramN.name and paramN.value, the JSON tree becomes ugly and, above all, huge, because with a large number of conditions every paramN contains two sub-elements. I wrote out both situations and displayed them with a JSON viewer; see the attached image.
I could preprocess the data structure before creating the JSON, but the problem is that I use the JSON to recreate the data structure in another session of the program, so I need all the pieces of the data structure to be in the JSON.
So, given my requirements, it seems that the workaround for printing the variable names creates side effects on the JSON representation that I don't know how to solve without changing the logic of my program...
If you use dataclasses, getting the field names is pretty straightforward:
from dataclasses import dataclass, fields
@dataclass
class MyClass1:
    first:int = 4
>>> fields(MyClass1)
(Field(name='first',type=<class 'int'>,default=4,...),)
This way, you can iterate over the class fields and ask your user to fill them in. Note that each field has a type, which you could use to, e.g., ask the user for several values, as in your example.
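For the GUI use case in the question, a minimal sketch of what iterating over fields() could look like. input_labels() and the user_values dict are hypothetical stand-ins for your widget code: the names give you one label per field, and the types can convert the text typed into each Entry before building the instance.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class MyClass1:
    param1: int = 1
    param2: int = 2

@dataclass
class MyClass2:
    param3: List[str]
    param4: List[int]

def input_labels(cls):
    """Return (field name, field type) pairs, e.g. to create one label + Entry per field."""
    return [(f.name, f.type) for f in fields(cls)]

for name, typ in input_labels(MyClass1):
    print(name, typ)

# Pretend these strings were typed into the Entry widgets, keyed by label
user_values = {"param1": "10", "param2": "20"}
# int("10") -> 10; this simple conversion only works for scalar types like int
obj = MyClass1(**{name: typ(user_values[name]) for name, typ in input_labels(MyClass1)})
print(obj)
```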
You could add functions to programmatically extract the parameter names from the class (_show_inputs below) and the values from instances (_json below):
import json
from dataclasses import fields
def blossom(cls):
    """decorate a class with `_json` (bound method) and `_show_inputs` (classmethod)"""
    def _json(self):
        return json.dumps(self, cls=DataClassPropEncoder)
    def _show_inputs(cls):
        return {
            field.name: field.type.__name__
            for field in fields(cls)
        }
    cls._json = _json
    cls._show_inputs = classmethod(_show_inputs)
    return cls
NOTE 1: There's actually no need to decorate the classes with blossom. You could just use its inner functions directly, as standalone functions.
Using a custom json encoder to dump the dataclass objects, including properties:
import json
from dataclasses import asdict, is_dataclass
class DataClassPropEncoder(json.JSONEncoder): # https://stackoverflow.com/a/51286749/7814595
    def default(self, o):
        if is_dataclass(o):
            cls = type(o)
            # inject instance properties
            props = {
                name: getattr(o, name)
                for name, value in cls.__dict__.items() if isinstance(value, property)
            }
            return {
                **props,
                **asdict(o)
            }
        return super().default(o)
Finally, wrap the computations inside properties so that they are serialized as well when using the decorated class. Full code example:
from dataclasses import asdict
from dataclasses import dataclass
from dataclasses import fields
from dataclasses import is_dataclass
import json
from itertools import product
from typing import List
class DataClassPropEncoder(json.JSONEncoder): # https://stackoverflow.com/a/51286749/7814595
    def default(self, o):
        if is_dataclass(o):
            cls = type(o)
            props = {
                name: getattr(o, name)
                for name, value in cls.__dict__.items() if isinstance(value, property)
            }
            return {
                **props,
                **asdict(o)
            }
        return super().default(o)
def blossom(cls):
    def _json(self):
        return json.dumps(self, cls=DataClassPropEncoder)
    def _show_inputs(cls):
        return {
            field.name: field.type.__name__
            for field in fields(cls)
        }
    cls._json = _json
    cls._show_inputs = classmethod(_show_inputs)
    return cls
@blossom
@dataclass
class MyClass1:
    param1:int
    param2:int
@blossom
@dataclass
class MyClass2:
    param3: List[str]
    param4: List[int]
    def _compute_single(self, values): # TODO: implement this
        return values[0]*values[1]
    @property
    def combinations(self):
        # TODO: cache if used more than once
        # TODO: combinations might explode
        field_names = []
        field_values = []
        cls = type(self)
        for field in fields(cls):
            field_names.append(field.name)
            field_values.append(getattr(self, field.name))
        results = []
        for values in product(*field_values):
            result = {
                **{
                    field_names[idx]: value
                    for idx, value in enumerate(values)
                },
                "output": self._compute_single(values)
            }
            results.append(result)
        return results
>>> print(f"MyClass1:\n{MyClass1._show_inputs()}")
MyClass1:
{'param1': 'int', 'param2': 'int'}
>>> print(f"MyClass2:\n{MyClass2._show_inputs()}")
MyClass2:
{'param3': 'List', 'param4': 'List'}
>>> obj_1 = MyClass1(3,4)
>>> print(f"obj_1:\n{obj_1._json()}")
obj_1:
{"param1": 3, "param2": 4}
>>> obj_2 = MyClass2(["first", "second"],[4,2])
>>> print(f"obj_2:\n{obj_2._json()}")
obj_2:
{"combinations": [{"param3": "first", "param4": 4, "output": "firstfirstfirstfirst"}, {"param3": "first", "param4": 2, "output": "firstfirst"}, {"param3": "second", "param4": 4, "output": "secondsecondsecondsecond"}, {"param3": "second", "param4": 2, "output": "secondsecond"}], "param3": ["first", "second"], "param4": [4, 2]}
NOTE 2: If you need to perform several computations per class, it might be a good idea to abstract away the pattern in the combinations property to avoid repeating code.
NOTE 3: If you need to access the properties several times and not just once, you might want to consider caching their values to avoid recomputation.
Once you have an instance of MyClass1 / MyClass2, you can call vars(instance) or vars(instance).keys(), which gives you the attribute names as strings. Unlike dir(), it does not show all the built-in attributes/methods starting with __.
class MyClass2:
    def __init__(self,param3=3,param4=4):
        self.param3=param3
        self.param4=param4
instance_of_myclass2 = MyClass2(param3="what", param4="ever")
print(vars(instance_of_myclass2))
{'param3': 'what', 'param4': 'ever'}
print(vars(instance_of_myclass2).keys())
dict_keys(['param3', 'param4'])
dir(instance_of_myclass2)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'param3', 'param4']
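vars() only works once you have an instance. If the GUI needs the labels before any object exists, one alternative (a different technique from vars(), using the standard inspect module) is to read the parameter names from the class's __init__ signature. This assumes, as in the question's classes, that each __init__ parameter is stored under an attribute of the same name; init_params() is a hypothetical helper:

```python
import inspect

class MyClass2:
    def __init__(self, param3=3, param4=4):
        self.param3 = param3
        self.param4 = param4

def init_params(cls):
    """Map each __init__ parameter name to its default (inspect.Parameter.empty if none)."""
    # inspect.signature() on a class describes its __init__ without the self parameter
    return {name: p.default for name, p in inspect.signature(cls).parameters.items()}

print(init_params(MyClass2))  # {'param3': 3, 'param4': 4}
```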

Python: import JSON file into SQLAlchemy JSON field

I'm relatively new to Python so I'm hoping that I've just missed something really obvious... But all the similar questions/answers here on StackOverflow seem really overly complex for the simple task that I am trying to achieve.
I have a few hundred text files containing JSON data (the actual data structure isn't important, this block below is just to show you what kind of thing I have, the actual structure of the data could be wildly different but it will always be valid JSON data).
{
    "config": {
        "item1": "value1",
        "item2": "value2"
    },
    "data": [
        {
            "dataA1": "valueA1",
            "itemA2": "valueA2"
        },
        {
            "dataB1": "valueB1",
            "itemB2": "valueB2",
            "itemB3": "valueB3"
        }
    ]
}
My Model is something like this:
class ModelName(db.Model):
    __tablename__ = 'table_name'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(64))
    data1 = db.Column(db.JSON)
    data2 = db.Column(db.JSON)
I have multiple data columns here, data1 and data2, simply so I can do a visual comparison of the inserted data. The final model will only have a single data field.
Here is the data insert where everything seems to be going wrong:
import json
new_record = ModelName(
    name='Foo',
    data1=open('./filename.json').read(),
    data2=json.dumps(open('./filename.json').read(), indent=2)
)
try:
    db.session.add(new_record)
    db.session.commit()
    print('Insert successful')
except:
    print('Insert failed')
The data that ends up in data1 and data2 gets littered with varying numbers of \ characters escaping double quotes and line breaks, plus the whole inserted value is wrapped in a set of double quotes. As a result, the data is simply unusable. So I'm currently having to copy and paste the data into the DB manually, which works fine but is tedious and far from the right thing to have to do.
I don't need to edit, manipulate, or do anything to the data in any way. I simply want to read the JSON string from a given file and then insert its content into a record in the database, that is it, end of story, nothing else.
Is there really no SIMPLE way to achieve this?
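When you read in a file, you need json.loads() to turn the JSON text into Python objects (a dict, in this case); the JSON column then serializes that object itself. json.dumps() goes the other way, so calling it on a string that already contains JSON encodes it a second time, which is where the escaped quotes come from.
And there's no indent keyword argument for json.loads(); indentation only matters when serializing.
So instead do: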
data2=json.loads(open('filename.json').read())
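To see where the backslashes come from: open(...).read() returns a str, and a JSON column serializes whatever Python value it is given, so a str is stored as a JSON string (inner quotes escaped, the whole thing wrapped in quotes), and json.dumps() on that str adds another layer of encoding. A minimal, stdlib-only sketch with no database involved; the sample text is made up:

```python
import json

text = '{"config": {"item1": "value1"}}'  # what open(...).read() gives you: a str

# json.dumps() on a str encodes the *string*: quotes are escaped, and it is wrapped in quotes
double_encoded = json.dumps(text)
print(double_encoded)  # "{\"config\": {\"item1\": \"value1\"}}"

# json.loads() parses the text into Python objects, which is what a JSON column should receive
parsed = json.loads(text)
print(type(parsed), parsed["config"]["item1"])  # <class 'dict'> value1
```

When reading straight from a file, json.load(f) on the open file object does the same parsing in one step, e.g. with open('./filename.json') as f: data = json.load(f).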

How do I search for a string in this JSON with Python

My JSON file looks something like:
{
    "generator": {
        "name": "Xfer Records Serum",
        ....
    },
    "generator": {
        "name": "Lennar Digital Sylenth1",
        ....
    }
}
I ask the user for a search term, and the input is searched for in the name key only. All matching results are returned. For example, if I input just 's', both of the entries above should be returned. Please also explain how to return the names of all the objects that are generators. The simpler the method, the better for me. I use the json library, but if another library is required, that's not a problem.
Before switching to JSON I tried XML but it did not work.
If your goal is just to search all name properties, this will do the trick:
import re
def search_names(term, lines):
    # raw strings keep \s intact; re.escape() treats the search term literally
    name_search = re.compile(r'\s*"name"\s*:\s*"(.*' + re.escape(term) + r'.*)",?$', re.I)
    return [x.group(1) for x in [name_search.search(y) for y in lines] if x]
with open('path/to/your.json') as f:
    lines = f.readlines()
print(search_names('s', lines))
which would return both names you listed in your example.
The search_names() function builds a regular expression that matches any line containing "name": " (with a varying amount of whitespace), followed by your search term with any other characters around it, then terminated by ", an optional comma, and the end of the line. It then applies that expression to each line from the file, filters out the non-matching lines, and returns the value of the name property (the capture group contents) for each match. Note that this relies on each "name" property sitting on its own line, as in your file.
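A caveat with the JSON shown in the question: it repeats the "generator" key inside one object, and Python's json module keeps only the last value for a repeated key, so json.load() alone would silently drop every generator except the last one. The regex approach above sidesteps that by never parsing the JSON. If you'd rather parse it, json.loads() accepts an object_pairs_hook that receives every (key, value) pair, duplicates included. A minimal sketch under that assumption; Pairs, iter_names(), search_names() and generator_names() are hypothetical names, and the sample text is a trimmed, valid version of your file:

```python
import json

class Pairs(list):
    """A JSON object kept as its list of (key, value) pairs, so repeated keys survive."""

def iter_names(node, parent_key=None):
    """Yield (key the object sits under, "name" value) for every object that has a name."""
    if isinstance(node, Pairs):  # check Pairs before list, since Pairs is a list subclass
        for key, value in node:
            if key == "name" and isinstance(value, str):
                yield parent_key, value
            else:
                yield from iter_names(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_names(item, parent_key)

def search_names(term, text):
    tree = json.loads(text, object_pairs_hook=Pairs)
    term = term.lower()  # case-insensitive, like re.I in the regex version
    return [name for _, name in iter_names(tree) if term in name.lower()]

def generator_names(text):
    tree = json.loads(text, object_pairs_hook=Pairs)
    return [name for parent, name in iter_names(tree) if parent == "generator"]

sample = """{
    "generator": {"name": "Xfer Records Serum"},
    "generator": {"name": "Lennar Digital Sylenth1"}
}"""
print(search_names("s", sample))
print(generator_names(sample))
```

If you control the file format, storing the generators as a list ("generators": [{...}, {...}]) avoids the duplicate-key problem altogether.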

Access the leaves of a JSON tree

I have a JSON file of the form:
{"id":442500000116137984, "reply":0, "children":[{"id":442502378957201408, "reply":0, "children":[]}]}
{"id":442500001084612608, "reply":0, "children":[{"id":442500145871990784, "reply":1, "children":[{"id":442500258421952512, "reply":1, "children":[]}]}]}
{"id":442500000258342912, "reply":0, "children":[{"id":442500636668489728, "reply":0, "children":[]}]}
In this each line refers to a separate tree. Now I want to go to the leaves of every tree and do something, basically
import json
f = open("file", 'r')
for line in f:
    tree = json.loads(line)
    #somehow walk through the tree and find leaves
    if isLeaf(child):
        print "Reached Leaf"
How do I walk through this tree object to detect all leaves?
This should work.
import json
def parseTree(obj, leafArray):
    # A node without children is a leaf; otherwise recurse into each child
    if len(obj["children"]) == 0:
        leafArray.append(obj)
    else:
        for child in obj["children"]:
            parseTree(child, leafArray)
with open("file", 'r') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        leafArray = []  # fresh list of leaves for each tree (one tree per line)
        tree = json.loads(line.strip())
        parseTree(tree, leafArray)
        print("")
        for each in leafArray:
            print(each)
You know, I once had to deal with a lot of hypermedia objects out of JSON, so I wrote this library. The problem was that I didn't know the depths of the trees beforehand, so I needed to be able to search around and get what I called the "paths" (the set of keys/indices you would use to reach a leaf) and values.
Anyway, you can mine it for ideas (I wrote it only for Python3.3+, but here's the method inside a class that would do what you want).
The basic idea is that you walk down the tree and check the objects you encounter and if you get more dictionaries (even inside of lists), you keep plunging deeper (I found it easier to write it as a recursive generator mostly by subclassing collections.MutableMapping and creating a class with a custom enumerate).
You keep track of the path you've taken along the way and once you get a value that doesn't merit further exploration (it's not a dict or a list), then you yield your path and the value:
def enumerate(self, path=None):
    """Iterate through the PelicanJson object yielding 1) the full path to
    each value and 2) the value itself at that path.
    """
    if path is None:
        path = []
    for k, v in self.store.items():
        current_path = path[:]
        current_path.append(k)
        if isinstance(v, PelicanJson):
            yield from v.enumerate(path=current_path)
        elif isinstance(v, list):
            for idx, list_item in enumerate(v):
                list_path = current_path[:]
                list_path.append(idx)
                if isinstance(list_item, PelicanJson):
                    yield from list_item.enumerate(path=list_path)
                else:
                    yield list_path, list_item
        else:
            yield current_path, v
Because this is exclusively for Python3, it takes advantage of things like yield from, so it won't work out of the box for you (and I certainly don't mean to offer my solution as the only one). Personally, I just got frustrated with reusing a lot of this logic in various functions, so writing this library saved me a lot of work and I could go back to doing weird things with the Hypermedia APIs I had to deal with.
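The same path-tracking idea works on the plain dicts and lists that json.loads() returns, without the PelicanJson class. A minimal Python 3 sketch (iter_paths() is a hypothetical name, not part of that library); note that it yields only scalar values, so an empty "children" list produces nothing:

```python
import json

def iter_paths(node, path=()):
    """Yield (path, value) for every non-container value; path is a tuple of keys/indices."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, path + (key,))
    elif isinstance(node, list):
        for idx, item in enumerate(node):
            yield from iter_paths(item, path + (idx,))
    else:
        yield path, node

tree = json.loads('{"id": 1, "reply": 0, "children": [{"id": 2, "reply": 0, "children": []}]}')
for path, value in iter_paths(tree):
    print(path, value)
```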
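You can do something like this (I don't know Python syntax well):
temp = tree  # your JSON object from each line
while temp["children"]:  # stop once the current node has no children
    temp = temp["children"][0]  # step down into the first child
Your temp will now be a leaf. Note that this follows only the first child at each level, so it reaches one leaf per tree, not all of them.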
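The loop above walks down a single path, so it reaches only one leaf per tree. To visit every leaf without recursion, you can keep a stack of nodes still to be explored. A minimal sketch (iter_leaves() is a hypothetical name; the ids in the sample line are made up):

```python
import json

def iter_leaves(tree):
    """Yield every node whose "children" list is empty, depth-first, left to right."""
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            # push in reverse so the leftmost child is popped (visited) first
            stack.extend(reversed(children))
        else:
            yield node

line = ('{"id": 1, "reply": 0, "children": ['
        '{"id": 2, "reply": 0, "children": []}, '
        '{"id": 3, "reply": 1, "children": [{"id": 4, "reply": 1, "children": []}]}]}')
print([leaf["id"] for leaf in iter_leaves(json.loads(line))])  # [2, 4]
```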

Using Python's csv.DictReader to search for a specific key and then print its value

BACKGROUND:
I am having issues trying to search through some CSV files.
I've gone through the python documentation: http://docs.python.org/2/library/csv.html
about the csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) object of the csv module.
My understanding is that csv.DictReader assumes the first line/row of the file contains the fieldnames; however, my CSV dictionary file has no header row: every line is simply a "key","value" pair, and it goes on for at least 500,000 lines.
My program will ask the user for the title (thus the key) they are looking for, and present the value (which is the 2nd column) on the screen using the print function. My problem is how to use csv.DictReader to search for a specific key and print its value.
Sample Data:
Below is an example of the csv file and its contents...
"Mamer","285713:13"
"Champhol","461034:2"
"Station Palais","972811:0"
So if I want to find "Station Palais" (input), my output will be 972811:0. I am able to manipulate the string and create the overall program; I just need help with csv.DictReader. I appreciate any assistance.
EDITED PART:
import csv
def main():
    with open('anchor_summary2.csv', 'rb') as file_data:
        list_of_stuff = []
        reader = csv.DictReader(file_data, ("title", "value"))
        for i in reader:
            list_of_stuff.append(i)
    print list_of_stuff
main()
The documentation you linked to provides half the answer:
class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
[...] maps the information read into a dict whose keys are given by the optional fieldnames parameter. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.
It would seem that if the fieldnames parameter is passed, the given file will not have its first record interpreted as headers (the parameter will be used instead).
# file_data is the open file object, not the filename
reader = csv.DictReader(file_data, ("title", "value"))
for i in reader:
    list_of_stuff.append(i)
which will (apparently; I've been having trouble with it) produce the following data structure:
[{"title": "Mamer", "value": "285713:13"},
{"title": "Champhol", "value": "461034:2"},
{"title": "Station Palais", "value": "972811:0"}]
which may need to be further massaged into a title-to-value mapping by something like this:
data = {}
for i in list_of_stuff:
    data[i["title"]] = i["value"]
Now just use the keys and values of data to complete your task.
And here it is as a dictionary comprehension:
data = {row["title"]: row["value"] for row in csv.DictReader(file_data, ("title", "value"))}
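A quick way to check the fieldnames behaviour without the real file is to feed the question's sample rows through io.StringIO, which stands in for the opened file here. A minimal Python 3 sketch:

```python
import csv
import io

# The question's format: no header row, every line is a "key","value" pair
sample = '"Mamer","285713:13"\n"Champhol","461034:2"\n"Station Palais","972811:0"\n'

# Passing fieldnames means the first line is treated as data, not as a header
rows = list(csv.DictReader(io.StringIO(sample), fieldnames=("title", "value")))
print(len(rows))  # 3

data = {row["title"]: row["value"] for row in rows}
print(data["Station Palais"])  # 972811:0
```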
The currently accepted answer is fine, but there's a slightly more direct way of getting at the data. The dict() constructor in Python can take any iterable of key/value pairs, and each row produced by csv.reader is a two-item list.
In addition, your code might have issues on Python 3, because Python 3's csv module expects the file to be opened in text mode, not binary mode. You can make your code compatible with 2 and 3 by using io.open instead of open.
import csv
import io
with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
    data = dict(csv.reader(f))
print(data['Champhol'])
As a warning, if your csv file has two rows with the same value in the first column, the later value will overwrite the earlier value. (This is also true of the other posted solution.)
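A small sketch of both points, using io.StringIO in place of the real file and made-up duplicate data: dict() accepts the two-item rows from csv.reader directly, and a repeated key keeps the value from the later row.

```python
import csv
import io

sample = '"Mamer","285713:13"\n"Champhol","461034:2"\n"Mamer","999:9"\n'

# Each row from csv.reader is a two-item list, which dict() treats as a (key, value) pair
data = dict(csv.reader(io.StringIO(sample)))
print(data["Champhol"])  # 461034:2
print(data["Mamer"])     # 999:9 (the later "Mamer" row overwrote the earlier one)
```

A row with any number of columns other than two makes dict() raise a ValueError.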
If your program really is only supposed to print the result, there's really no reason to build a keyed dictionary.
import csv
import io
# Python 2/3 compat
try:
    input = raw_input
except NameError:
    pass
def main():
    # Case-insensitive & leading/trailing whitespace insensitive
    user_city = input('Enter a city: ').strip().lower()
    with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
        for city, value in csv.reader(f):
            if user_city == city.lower():
                print(value)
                break
        else:
            print("City not found.")
if __name__ == '__main__':
    main()
The advantage of this technique is that the CSV file isn't loaded into memory, and the data is iterated over only once. I also added a little code that calls lower() on both the user input and the keys to make the match case-insensitive. Another advantage is that if the city the user requests is near the top of the file, it returns almost immediately and stops reading the file.
With all that said, if searching performance is your primary consideration, you should consider storing the data in a database.
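As a rough illustration of that last suggestion, the standard-library sqlite3 module can load the key/value rows once into an indexed table and then answer lookups without scanning the file. A minimal sketch; the table name anchors and the helper names are hypothetical, and :memory: should be replaced with a file path to keep the database between runs:

```python
import csv
import io
import sqlite3

def build_table(csv_file, conn):
    """Load "key","value" rows into a table; PRIMARY KEY gives an index on title."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS anchors (title TEXT PRIMARY KEY COLLATE NOCASE, value TEXT)"
    )
    # INSERT OR REPLACE keeps the last value for a repeated title, like dict() does
    conn.executemany("INSERT OR REPLACE INTO anchors (title, value) VALUES (?, ?)",
                     csv.reader(csv_file))
    conn.commit()

def lookup(conn, title):
    """Return the value stored for title (case-insensitive), or None if it is missing."""
    row = conn.execute("SELECT value FROM anchors WHERE title = ?", (title,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
sample = '"Mamer","285713:13"\n"Champhol","461034:2"\n"Station Palais","972811:0"\n'
build_table(io.StringIO(sample), conn)
print(lookup(conn, "station palais"))  # 972811:0
print(lookup(conn, "Nowhere"))         # None
```

With the real file, open it as in the answers above (io.open(..., newline='', encoding='utf-8')) and pass the file object to build_table() once; later runs only need lookup().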