A pickle with jsonpickle (Python 3.7) - json

I have an issue with using jsonpickle. Rather, I believe it to be working correctly but it's not producing the output I want.
I have a class called 'Node'. In 'Node' are four ints (x, y, width, height) and a StringVar called 'NodeText'.
The problem with serialising a StringVar is that there's lots of information in there and for me it's just not necessary. I use it when the program's running, but for saving and loading it's not needed.
So I used a method to change out what jsonpickle saves, using the __getstate__ method for my Node. This way I can do this:
def __getstate__(self):
    state = self.__dict__.copy()
    del state['NodeText']
    return state
This works well so far and NodeText isn't saved. The problem comes on a load. I load the file as normal into an object (in this case a list of nodes).
The problem on loading is this: the items loaded from the JSON are not Nodes as defined in my class. They are almost the same (they have x, y, width and height), but because NodeText wasn't saved in the JSON file, these Node-like objects do not have that attribute. This then causes an error when I create a visual instance of these Nodes on screen, because the StringVar is used as the tkinter Entry textvariable.
I would like to know if there is a way to load this 'almost node' into my actual Nodes. I could just copy every property one at a time into a new instance but this just seems like a bad way to do it.
I could also null the NodeText StringVar before saving (thus saving the space in the file) and then reinitialise it on loading. This would mean I'd have my full object, but somehow it seems like an awkward workaround.
If you're wondering just how much more information there is with the StringVar, my test json file has just two Nodes. Just saving the basic properties (x,y,width,height), the file is 1k. With each having a StringVar, that becomes 8k. I wouldn't care so much in the case of a small increase, but this is pretty huge.
Can I force the load to be to this Node type rather than just some new type that Python has created?
Edit: if you're wondering what the json looks like, take a look here:
{
"1": {
"py/object": "Node.Node",
"py/state": {
"ImageLocation": "",
"TextBackup": "",
"height": 200,
"uID": 1,
"width": 200,
"xPos": 150,
"yPos": 150
}
},
"2": {
"py/object": "Node.Node",
"py/state": {
"ImageLocation": "",
"TextBackup": "",
"height": 200,
"uID": 2,
"width": 100,
"xPos": 50,
"yPos": 450
}
}
}
Since the class name is there, I assumed it would be an instantiation of the class. But when you load the file using jsonpickle, you get the dictionary and can inspect the loaded data and each node. Neither node contains the property 'NodeText'. That is to say, it's not something with 'None' as the value: the attribute simply isn't there.

That's because jsonpickle doesn't know which fields your object normally has. It restores only the fields found in the saved state, and the state doesn't contain the NodeText property, so it is simply missing :)
You can add a __setstate__ magic method to restore that property in your loaded objects. This way you will be able to handle dumps with or without the property.
def __setstate__(self, state):
    state.setdefault('NodeText', None)
    for k, v in state.items():
        setattr(self, k, v)
A small example
from pprint import pprint, pformat
import jsonpickle
class Node:
    def __init__(self) -> None:
        super().__init__()
        self.NodeText = None
        self.ImageLocation = None
        self.TextBackup = None
        self.height = None
        self.uID = None
        self.width = None
        self.xPos = None
        self.yPos = None
    def __setstate__(self, state):
        # Restore a default for the attribute that was dropped on save
        state.setdefault('NodeText', None)
        for k, v in state.items():
            setattr(self, k, v)
    def __getstate__(self):
        # Save everything except the heavy NodeText attribute
        state = self.__dict__.copy()
        del state['NodeText']
        return state
    def __repr__(self) -> str:
        return str(self.__dict__)
obj1 = Node()
obj1.NodeText = 'Some heavy description text'
obj1.ImageLocation = 'test ImageLocation'
obj1.TextBackup = 'test TextBackup'
obj1.height = 200
obj1.uID = 1
obj1.width = 200
obj1.xPos = 150
obj1.yPos = 150
print('Dumping ...')
dumped = jsonpickle.encode({1: obj1})
print(dumped)
print('Restoring object ...')
print(jsonpickle.decode(dumped))
outputs
# > python test.py
Dumping ...
{"1": {"py/object": "__main__.Node", "py/state": {"ImageLocation": "test ImageLocation", "TextBackup": "test TextBackup", "height": 200, "uID": 1, "width": 200, "xPos": 150, "yPos": 150}}}
Restoring object ...
{'1': {'ImageLocation': 'test ImageLocation', 'TextBackup': 'test TextBackup', 'height': 200, 'uID': 1, 'width': 200, 'xPos': 150, 'yPos': 150, 'NodeText': None}}
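If the goal is to get a working text variable back after loading rather than None, the same pair of hooks can save just the plain string and rebuild the runtime object on restore. The sketch below is only an illustration under assumptions: it uses the standard library pickle module instead of jsonpickle (both call __getstate__ and __setstate__), and FakeStringVar is a hypothetical stand-in for tkinter.StringVar so that the example runs without a display.

```python
import pickle


class FakeStringVar:
    """Hypothetical stand-in for tkinter.StringVar (no Tk root needed)."""

    def __init__(self, value=""):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


class Node:
    def __init__(self, x=0, y=0, width=0, height=0, text=""):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.NodeText = FakeStringVar(text)

    def __getstate__(self):
        state = self.__dict__.copy()
        # Save only the plain string, not the variable object itself.
        state["NodeText"] = self.NodeText.get()
        return state

    def __setstate__(self, state):
        # Older dumps may not contain the key at all, so fall back to "".
        text = state.pop("NodeText", "")
        self.__dict__.update(state)
        # Rebuild the runtime-only object from the saved string.
        self.NodeText = FakeStringVar(text)


original = Node(150, 150, 200, 200, text="hello")
restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, restored.NodeText.get())
```

With a real StringVar, a Tk root normally has to exist before the variable is created, so loading would have to happen after the window is set up.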

Related

How to extract value from a dict in a string - Pandas / Python

I have a pandas DataFrame with one feature, df['Computed Data'].
Computed Data
'{"stats":{"TypeCount":{"125":"31","8":"31"}},"plaintsCard":[{"root":"old","plaintsCount":1,"residencyCount":1}],"Count":62,"Status":{"activable":"10","activated":"18","inactivable":"3"},"Counta":0,"invoiCount":"31"}'
'{"Count":33,"invoiCount":"11","stats":{"TypeCount":{"1":"9","4":"22","11":"2"}},"plaintsCard":[],"Count":0,"Status":{"activated":"0","activable":"9","inactivable":"1"}}'
'{"Count":79,"invoiCount":"32","stats":{"TypeCount":{"1":"29","4":"32","18":"3","23":"15"}},"plaintsCard":[],"Count":0,"Status":{"activated":"0","activable":"28","inactivable":"2"}}'
'{"Count":80,"invoiCount":"32","stats":{"TypeCount":{"1":"31","4":"42","13":"1","23":"6"}},"plaintsCard":[],"Count":0,"Status":{"activated":"0","activable":"27","inactivable":"6"}}'
'{"stats": {"TypeCount": {"17": "27"}}, "plaintsCard": [], "parcelsCount": 27, "Status": {"activable": "9", "activated": "2", "inactivable": "16"}, "Count": 0, "invoiCount": "0"}'
I want to extract the "membersStatus", "activable" part from every string and to put it in a new column.
I have tried to use ast.literal_eval() and it is working but only when I apply it to one value
x = ast.literal_eval(df["Computed Data"][0])
x["membersStatus"]["activable"]
'10'
It gives me '10', which is what I want, but I need it for every dict in "Computed Data", stored in a new column.
I tried to do it with a for loop:
for n, i in enumerate(df["Computed Data"]):
    x = ast.literal_eval(df["Computed Data"][n])
ValueError: malformed node or string: <_ast.Name object at 0x13699c610>
I don't know how to change what I did to make it work.
Can you help, please?
You can use:
import json
df['Computed Data']=df['Computed Data'].apply(json.loads)
df['activable']=df['Computed Data'].apply(lambda x: x['membersStatus']['activable'])
If there are NaN values:
df['Computed Data']=df['Computed Data'].apply(lambda x: json.loads(x) if pd.notna(x) else np.nan)
df['activable']=df['Computed Data'].apply(lambda x: x['membersStatus']['activable'] if pd.notna(x) else np.nan)
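As a rough illustration of why json.loads is the better fit here: ast.literal_eval only understands Python literals, so JSON tokens such as null, true or false make it raise the "malformed node or string" ValueError. The sketch below uses only the standard library. The helper name extract_activable and its outer_key parameter are assumptions made for this example, since the sample rows use "Status" while the code above uses "membersStatus".

```python
import ast
import json

# A JSON document containing `null`, which is not a Python literal.
raw = '{"Count": 62, "Status": {"activable": "10"}, "plaintsCard": null}'

literal_eval_error = None
try:
    ast.literal_eval(raw)
except ValueError as exc:
    literal_eval_error = exc  # "malformed node or string: ..."


def extract_activable(raw_json, outer_key="Status"):
    """Return the nested 'activable' value, or None if it cannot be found."""
    if not isinstance(raw_json, str):
        return None  # e.g. NaN cells
    try:
        parsed = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    inner = parsed.get(outer_key)
    return inner.get("activable") if isinstance(inner, dict) else None


rows = [raw, float("nan"), '{"Status": {"activated": "0"}}']
values = [extract_activable(row) for row in rows]
print(values)
```

With pandas, the same helper could be passed to apply, for example df['activable'] = df['Computed Data'].apply(extract_activable).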
You can use the apply method to extract them:
df['member Status']=df['Computed Data'].apply(lambda x:eval(x)['membersStatus']['activable'])

Read and store game state as CSV

Thanks to the great help from Tenfour04, I've got wonderful code for handling CSV files.
However, I still have the following two questions:
How to call these functions?
How to initialize 2-dimensional array variables?
Below is the code that finally worked.
MainActivity.kt
package com.surlofia.csv_tenfour04_1
import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import java.io.File
import java.io.IOException
import com.surlofia.csv_tenfour04_1.databinding.ActivityMainBinding
var chk_Q_Num: MutableList<Int> = mutableListOf (
    0,
    1, 2, 3, 4, 5,
    6, 7, 8, 9, 10,
    11, 12, 13, 14, 15,
    16, 17, 18, 19, 20,
)
var chk_Q_State: MutableList<String> = mutableListOf (
    "z",
    "a", "b", "c", "d", "e",
    "f", "g", "h", "i", "j"
)
class MainActivity : AppCompatActivity() {
    private lateinit var binding: ActivityMainBinding
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // setContentView(R.layout.activity_main)
        binding = ActivityMainBinding.inflate(layoutInflater)
        val view = binding.root
        setContentView(view)
        // Load saved data at game startup. It will be invalid if performed by other activities.
        val filePath = filesDir.path + "/chk_Q.csv"
        val file = File(filePath)
        binding.fileExists.text = isFileExists(file).toString()
        if (isFileExists(file)) {
            val csvIN = file.readAsCSV()
            for (i in 0 .. 10) {
                chk_Q_Num[i] = csvIN[i][0].toInt()
                chk_Q_State[i] = csvIN[i][1]
            }
        }
        // Game Program Run
        val csvOUT = mutableListOf(
            mutableListOf("0","OK"),
            mutableListOf("1","OK"),
            mutableListOf("2","OK"),
            mutableListOf("3","Not yet"),
            mutableListOf("4","Not yet"),
            mutableListOf("5","Not yet"),
            mutableListOf("6","Not yet"),
            mutableListOf("7","Not yet"),
            mutableListOf("8","Not yet"),
            mutableListOf("9","Not yet"),
            mutableListOf("10","Not yet")
        )
        var tempString = ""
        for (i in 0 .. 10) {
            csvOUT[i][0] = chk_Q_Num[i].toString()
            csvOUT[i][1] = "OK"
            tempString = tempString + csvOUT[i][0] + "-->" + csvOUT[i][1] + "\n"
        }
        binding.readFile.text = tempString
        // and save Data
        file.writeAsCSV(csvOUT)
    }
    // https://www.techiedelight.com/ja/check-if-a-file-exists-in-kotlin/
    private fun isFileExists(file: File): Boolean {
        return file.exists() && !file.isDirectory
    }
    @Throws(IOException::class)
    fun File.readAsCSV(): List<List<String>> {
        val splitLines = mutableListOf<List<String>>()
        forEachLine {
            splitLines += it.split(", ")
        }
        return splitLines
    }
    @Throws(IOException::class)
    fun File.writeAsCSV(values: List<List<String>>) {
        val csv = values.joinToString("\n") { line -> line.joinToString(", ") }
        writeText(csv)
    }
}
chk_Q.csv
0,0
1,OK
2,OK
3,Not yet
4,Not yet
5,Not yet
6,Not yet
7,Not yet
8,Not yet
9,Not yet
10,Not yet
1. How to call these functions?
The code below seems to work well.
Did I call these functions the right way?
Or are there better ways to achieve this?
read
if (isFileExists(file)) {
    val csvIN = file.readAsCSV()
    for (i in 0 .. 10) {
        chk_Q_Num[i] = csvIN[i][0].toInt()
        chk_Q_State[i] = csvIN[i][1]
    }
}
write
file.writeAsCSV(csvOUT)
2. How to initialize 2-dimensional array variables?
val csvOUT = mutableListOf(
    mutableListOf("0","OK"),
    mutableListOf("1","OK"),
    mutableListOf("2","OK"),
    mutableListOf("3","Not yet"),
    mutableListOf("4","Not yet"),
    mutableListOf("5","Not yet"),
    mutableListOf("6","Not yet"),
    mutableListOf("7","Not yet"),
    mutableListOf("8","Not yet"),
    mutableListOf("9","Not yet"),
    mutableListOf("10","Not yet")
)
I would like to know a clever way to use a for loop instead of writing specific values one by one.
For example, something like below.
val csvOUT = mutableListOf(mutableListOf())
for (i in 0 .. 10) {
    csvOUT[i][0] = i
    csvOUT[i][1] = "OK"
}
But this gave me the following error message:
Not enough information to infer type variable T
It would be great if you could provide an example of how to execute this for beginners.
----- Added on June 15, 2022. -----
[Question 1]
Regarding initialization, I got an error "keep stopping" when I executed the following code.
The application is forced to terminate.
Why is this?
val csvOUT: MutableList<MutableList<String>> = mutableListOf(mutableListOf())
for (i in 0 .. 10) {
    csvOUT[i][0] = "$i"
    csvOUT[i][1] = "OK"
}
[Error Message]
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.surlofia.csv_endzeit_01/com.surlofia.csv_endzeit_01.MainActivity}: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
In my opinion there are basically two parts to your question. First you need an understanding of the Kotlin type system including generics. Secondly you want some knowledge about approaches to the problem at hand.
type-system and generics
The function mutableListOf you're using is generic and thus needs a single type parameter T, as can be seen from its definition, taken from the documentation:
fun <T> mutableListOf(): MutableList<T>
Most of the time the Kotlin compiler is quite good at type-inference, that is guessing the type used based on the context. For example, I do not need to provide a type explicitly in the following example, because the Kotlin compiler can infer the type from the usage context.
val listWithInts = mutableListOf(3, 7)
The inferred type is MutableList<Int>.
However, sometimes this might not be what one desires. For example, I might want to allow null values in my list above. To achieve this, I have to tell the compiler that it should not only allow Int values to the list but also null values, widening the type from Int to Int?. I can achieve this in at least two ways.
providing a generic type parameter
val listWithNullableInts = mutableListOf<Int?>(3, 7)
defining the expected return type explicitly
val listWithNullableInts: MutableList<Int?> = mutableListOf(3, 7)
In your case the compiler does NOT have enough information to infer the type from the usage context. Thus you either have to provide it that context, e.g. by passing values of a specific type to the function or using one of the two options named above.
initialization of multidimensional arrays
There are questions and answers on creating multi-dimensional arrays in Kotlin on StackOverflow already.
One solution to your problem at hand might be the following.
val csvOUT: MutableList<MutableList<String>> = mutableListOf()
for (i in 0 .. 10) {
    // add a new row instead of indexing into a row that does not exist yet
    csvOUT.add(mutableListOf("$i", "OK"))
}
You help the Kotlin compiler by defining the expected return type explicitly and then add the values as Strings to your 2D list.
If the dimensions are fixed, you might want to use fixed-size Arrays instead.
val csvArray = Array(11) { index -> arrayOf("$index", "OK") }
In both solutions you convert the Int index to a String however.
If the only information you want to store for each level is a String, you might as well use a simple List<String> and use the index of each entry as the level number, e.g.:
val csvOut = List(11) { "OK" }
val levelThree = csvOut[2] // first index of List is 0
This would also work with more complicated data structures instead of Strings. You simply would have to adjust your fun File.writeAsCSV(values: List<List<String>>) to accept a different type as the values parameter.
Assume a simple data class you might end up with something along the lines of:
data class LevelState(val state: String, val timeBeaten: Instant?)
val levelState = List(11) { LevelState("OK", Instant.now()) }
fun File.writeAsCSV(values: List<LevelState>) {
    val csvString = values
        .mapIndexed { index, levelState -> "$index, ${levelState.state}, ${levelState.timeBeaten}" }
        .joinToString("\n")
    writeText(csvString)
}
If you prefer a more "classical" imperative approach, you can populate your 2-dimensional Array / List using a loop like for in.
val list: MutableList<MutableList<String>> = mutableListOf() // list is now []
for (i in 0..10) {
    val innerList: MutableList<String> = mutableListOf()
    innerList.add("$i")
    innerList.add("OK")
    innerList.add("${Instant.now()}")
    list.add(innerList)
    // list is after first iteration [ ["0", "OK", "2022-06-15T07:03:14.315Z"] ]
}
The syntax listName[index] = value is just syntactic sugar for the operator overload of the set operator, see the documentation on MutableList for example.
You cannot access an index that has not been populated before, e.g. during the List's initialization or by using add; otherwise you're greeted with an IndexOutOfBoundsException.
If you want to use the set operator, one option is to use a pre-populated Array as such:
val array: Array<Array<String>> = Array(11) {
    Array(3) { "default" }
} // array is [ ["default", "default", "default"], ...]
array[1][2] = "myValue"
However, I wouldn't recommend this approach, as it might leave behind potentially invalid initial data if one forgets to replace a value.

Is it bad design to try to print a class's variable names and not their values (e.g. x.name prints "name" instead of the content of name)?

The long title also contains a mini-example because I couldn't explain well what I'm trying to do. Nonetheless, the similar-questions window led me to various implementations. But since I have read multiple times that it's bad design, I would like to ask whether what I'm trying to do is bad design, rather than asking how to do it. For this reason I will try to explain my use case with minimal functional code.
Suppose I have two classes, each of them with their own parameters:
class MyClass1:
    def __init__(self,param1=1,param2=2):
        self.param1=param1
        self.param2=param2
class MyClass2:
    def __init__(self,param3=3,param4=4):
        self.param3=param3
        self.param4=param4
I want to print param1...param4 as a string (i.e. "param1"..."param4") and not its value (i.e.=1...4).
Why? Two reasons in my case:
I have a GUI where the user is asked to select one of the class types (MyClass1, MyClass2) and then to enter the values for the parameters of that class. The GUI must then show the parameter names ("param1", "param2" if MyClass1 was chosen) as labels next to the Entry widgets that collect the values. Now suppose the number of classes and parameters is very high, say 10 classes with 20 parameters per class. To minimize the amount of code and keep it flexible (adding or removing parameters from classes without modifying the GUI code), I would like to loop over all the parameters of the chosen class and create the corresponding widget for each of them, so I need the paramX names in the form of strings. The real application I'm working on is even more complex (parameters live inside objects of other classes), but I used the simplest example. One solution would be to define every parameter as an object where param1.name="param1" and param1.value=1, so that the GUI prints param1.name. But this leads to a problem specific to my implementation, and that's reason 2:
MyClass1..MyClassN will at some point be written to a JSON file. The JSON will be a huge file, and since it's a complex tree (the example here is simple) I want to keep it as simple as possible. To explain why I don't like the solution above, suppose this situation:
class MyClass1:
    def __init__(self,param1,param2,combinations=[]):
        self.param1=param1
        self.param2=param2
        self.combinations=combinations
Suppose param1 and param2 are now lists of variable size, and combinations is a list with one element per combination of param1 and param2, each holding an output generated by some sort of calculation. Each element of the list combinations is an object SingleCombination, for example (metacode):
param1=[1,2]
param2=[5,6]
SingleCombination.param1=1
SingleCombination.param2=5
SingleCombination.output=1*5
MyInst1.combinations.append(SingleCombination)
In my case I will further encapsulate param1 and param2 in an object called parameters, so every condition will have a nice tree with only two objects, parameters and output, and expanding the parameters node will show all the parameters with their values.
If I use JSON pickle to generate a JSON from the situation above, it is nicely displayed, since the name of each node will be the name of the variable ("param1", "param2" as strings in the JSON). But if I use the trick from the end of reason 1, turning each paramN into an object with paramN.name and paramN.value, the JSON tree becomes ugly and, above all, huge, because with a large number of conditions every paramN contains 2 sub-elements. I wrote out this situation and displayed it with a JSON viewer (see the attached image).
I could preprocess the data structure before creating the JSON, but the problem is that I use the JSON to recreate the data structure in another session of the program, so I need all the pieces of the data structure to be in the JSON.
So, given my requirements, it seems that the workaround to avoid printing the variable names creates a side effect on the JSON visualization that I don't know how to solve without changing the logic of my program...
If you use dataclasses, getting the field names is pretty straightforward:
from dataclasses import dataclass, fields
@dataclass
class MyClass1:
    first:int = 4
>>> fields(MyClass1)
(Field(name='first',type=<class 'int'>,default=4,...),)
This way, you can iterate over the class fields and ask your user to fill them. Note the field has a type, which you could use to eg ask the user for several values, as in your example.
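A minimal sketch of that idea, assuming simple field types such as int and float whose constructors accept a string. The helper names describe_fields and build_instance are made up for this example, and the dictionary of raw strings stands in for whatever the GUI Entry widgets would return.

```python
from dataclasses import dataclass, fields


@dataclass
class MyClass1:
    param1: int = 1
    param2: float = 2.0


def describe_fields(cls):
    """Return (name, type, default) for each field, in definition order."""
    return [(f.name, f.type, f.default) for f in fields(cls)]


def build_instance(cls, raw_values):
    """Convert user-entered strings using each field's declared type."""
    converted = {f.name: f.type(raw_values[f.name]) for f in fields(cls)}
    return cls(**converted)


# The names can be used directly as label texts in the GUI.
labels = [name for name, _type, _default in describe_fields(MyClass1)]
instance = build_instance(MyClass1, {"param1": "7", "param2": "0.5"})
print(labels)
print(instance)
```

Adding or removing a field on the dataclass then changes the generated labels without touching the GUI code. Fields annotated with typing generics such as List[int] would need their own conversion logic.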
You could add functions to programmatically extract the param names (_show_inputs below) from the class and the values from instances (_json below):
def blossom(cls):
    """decorate a class with `_json` (bound method) and `_show_inputs` (classmethod)"""
    def _json(self):
        return json.dumps(self, cls=DataClassPropEncoder)
    def _show_inputs(cls):
        return {
            field.name: field.type.__name__
            for field in fields(cls)
        }
    cls._json = _json
    cls._show_inputs = classmethod(_show_inputs)
    return cls
NOTE 1: There's actually no need to decorate the classes with blossom. You could just use its internal functions programmatically.
Using a custom json encoder to dump the dataclass objects, including properties:
import json
from dataclasses import asdict, is_dataclass
class DataClassPropEncoder(json.JSONEncoder): # https://stackoverflow.com/a/51286749/7814595
    def default(self, o):
        if is_dataclass(o):
            cls = type(o)
            # inject instance properties
            props = {
                name: getattr(o, name)
                for name, value in cls.__dict__.items() if isinstance(value, property)
            }
            return {
                **props,
                **asdict(o)
            }
        return super().default(o)
Finally, wrap the computations inside properties so they are
serialized as well when using the decorated class. Full code example:
from dataclasses import asdict
from dataclasses import dataclass
from dataclasses import fields
from dataclasses import is_dataclass
import json
from itertools import product
from typing import List
class DataClassPropEncoder(json.JSONEncoder): # https://stackoverflow.com/a/51286749/7814595
    def default(self, o):
        if is_dataclass(o):
            cls = type(o)
            props = {
                name: getattr(o, name)
                for name, value in cls.__dict__.items() if isinstance(value, property)
            }
            return {
                **props,
                **asdict(o)
            }
        return super().default(o)
def blossom(cls):
    def _json(self):
        return json.dumps(self, cls=DataClassPropEncoder)
    def _show_inputs(cls):
        return {
            field.name: field.type.__name__
            for field in fields(cls)
        }
    cls._json = _json
    cls._show_inputs = classmethod(_show_inputs)
    return cls
@blossom
@dataclass
class MyClass1:
    param1:int
    param2:int
@blossom
@dataclass
class MyClass2:
    param3: List[str]
    param4: List[int]
    def _compute_single(self, values): # TODO: implement this
        return values[0]*values[1]
    @property
    def combinations(self):
        # TODO: cache if used more than once
        # TODO: combinations might explode
        field_names = []
        field_values = []
        cls = type(self)
        for field in fields(cls):
            field_names.append(field.name)
            field_values.append(getattr(self, field.name))
        results = []
        for values in product(*field_values):
            result = {
                **{
                    field_names[idx]: value
                    for idx, value in enumerate(values)
                },
                "output": self._compute_single(values)
            }
            results.append(result)
        return results
>>> print(f"MyClass1:\n{MyClass1._show_inputs()}")
MyClass1:
{'param1': 'int', 'param2': 'int'}
>>> print(f"MyClass2:\n{MyClass2._show_inputs()}")
MyClass2:
{'param3': 'List', 'param4': 'List'}
>>> obj_1 = MyClass1(3,4)
>>> print(f"obj_1:\n{obj_1._json()}")
obj_1:
{"param1": 3, "param2": 4}
>>> obj_2 = MyClass2(["first", "second"],[4,2])
>>> print(f"obj_2:\n{obj_2._json()}")
obj_2:
{"combinations": [{"param3": "first", "param4": 4, "output": "firstfirstfirstfirst"}, {"param3": "first", "param4": 2, "output": "firstfirst"}, {"param3": "second", "param4": 4, "output": "secondsecondsecondsecond"}, {"param3": "second", "param4": 2, "output": "secondsecond"}], "param3": ["first", "second"], "param4": [4, 2]}
NOTE 2: If you need to perform several computations per class, it might be a good idea to abstract away the pattern in the combinations property to avoid repeating code.
NOTE 3: If you need access to the properties several times and not just once, you might want to consider caching their values to avoid re-computation.
Once you have an instance of MyClass1 / MyClass2, you can call vars(instance) or vars(instance).keys() and it will give you the attribute names as strings. Unlike dir, it will not show all the builtin attributes/methods starting with __.
class MyClass2:
    def __init__(self,param3=3,param4=4):
        self.param3=param3
        self.param4=param4
instance_of_myclass2 = MyClass2(param3="what", param4="ever")
print(vars(instance_of_myclass2))
{'param3': 'what', 'param4': 'ever'}
print(vars(instance_of_myclass2).keys())
dict_keys(['param3', 'param4'])
dir(instance_of_myclass2)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'param3', 'param4']

dumping list to JSON file creates list within a list [["x", "y","z"]], why?

I want to append multiple list items to a JSON file, but it creates a list within a list, and therefore I cannot access the list from Python. Since the code is overwriting existing data in the JSON file, there should not be any list there. I also tried it with just plain text in the file, without brackets. It still creates a list within a list, so [["x", "y","z"]] instead of ["x", "y","z"].
import json
filename = 'vocabulary.json'
print("Reading %s" % filename)
try:
    with open(filename, "rt") as fp:
        data = json.load(fp)
    print("Data: %s" % data)#check
except IOError:
    print("Could not read file, starting from scratch")
    data = []
# Add some data
TEMPORARY_LIST = []
new_word = input("give new word: ")
TEMPORARY_LIST.append(new_word.split())
print(TEMPORARY_LIST)#check
data = TEMPORARY_LIST
print("Overwriting %s" % filename)
with open(filename, "wt") as fp:
    json.dump(data, fp)
example and output with appending list with split words:
Reading vocabulary.json
Data: [['my', 'dads', 'house', 'is', 'nice']]
give new word: but my house is nicer
[['but', 'my', 'house', 'is', 'nicer']]
Overwriting vocabulary.json
So, if I understand what you are trying to accomplish correctly, it looks like you are trying to overwrite a list in a JSON file with a new list created from user input. For easiest data manipulation, set up your JSON file in dictionary form:
{
"words": [
"my",
"dad's",
"house",
"is",
"nice"
]
}
You should then set up functions to separate your functionality to make it more manageable:
def load_json(filename):
    with open(filename, "r") as f:
        return json.load(f)
def write_json(filename, data):
    with open(filename, "w") as f:
        json.dump(data, f, indent=4)
Now, we can use those functions to load the JSON, access the words list, and overwrite it with the new word.
data = load_json("vocabulary.json")
new_word = input("Give new word: ").split()
data["words"] = new_word
write_json("vocabulary.json", data)
If the user inputs "but my house is nicer", the JSON file will look like this:
{
"words": [
"but",
"my",
"house",
"is",
"nicer"
]
}
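As for why the original script produced [["x", "y", "z"]]: str.split() already returns a list, and list.append adds that whole list as a single element. The short sketch below contrasts this with extend, using the sentence from the example output above as input.

```python
import json

sentence = "but my house is nicer"

nested = []
nested.append(sentence.split())   # one element, which is itself a list

flat = []
flat.extend(sentence.split())     # five separate string elements

print(json.dumps(nested))
print(json.dumps(flat))
```

Assigning the result of split() directly, as in data["words"] = new_word above, avoids the extra level of nesting in the same way.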
Edit
Okay, I have a few suggestions to make before I get into solving the issue. Firstly, it's great that you have delegated much of the functionality of the program over to respective functions. However, using global variables is generally discouraged because it makes things extremely difficult to debug as any of the functions that use that variable could have mutated it by accident. To fix this, use method parameters and pass around the data accordingly. With small programs like this, you can think of the main() method as the point in which all data comes to and from. This means that the main() function will pass data to other functions and receive new or edited data back. One final recommendation, you should only be using all capital letters for variable names if they are going to be constant. For example, PI = 3.14159 is a constant, so it is conventional to make "pi" all caps.
Without using global, main() will look much cleaner:
def main():
    choice = input("Do you want to start or manage the list? (start/manage)")
    if choice == "start":
        data = load_json("vocabulary.json")
        words = data["words"]
        dictee(words)
    elif choice == "manage":
        manage_list()
You can use the load_json() function from earlier (notice that I deleted write_json(), more on that later) if the user chooses to start the game. If the user chooses to manage the file, we can write something like this:
def manage_list():
    choice = input("Do you want to add or clear the list? (add/clear)")
    if choice == "add":
        words_to_add = get_new_words()
        add_words("vocabulary.json", words_to_add)
    elif choice == "clear":
        clear_words("vocabulary.json")
We get the user input first and then we can call two other functions, add_words() and clear_words():
def add_words(filename, words):
    with open(filename, "r+") as f:
        data = json.load(f)
        data["words"].extend(words)
        f.seek(0)
        json.dump(data, f, indent=4)
def clear_words(filename):
    with open(filename, "w+") as f:
        data = {"words":[]}
        json.dump(data, f, indent=4)
I did not utilize the load_json() function in the two functions above. My reasoning is that it would mean opening the file more times than needed, which would hurt performance. Furthermore, in these two functions we already need to open the file, so it is okay to load the JSON data here because it can be done with only one line: data = json.load(f). You may also notice that in add_words(), the file mode is "r+". This is the basic mode for reading and writing. "w+" is used in clear_words(), because "w+" not only opens the file for reading and writing, it overwrites the file if the file exists (that is also why we don't need to load the JSON data in clear_words()). Because we have these two functions for writing and/or overwriting data, we don't need the write_json() function that I had initially suggested.
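One detail about "r+" is worth illustrating: after seek(0), writing only overwrites from the start of the file, so if the new JSON is shorter than the old content, stale bytes remain at the end unless the file is truncated. add_words() only ever grows the data, so it is not affected, but a more general helper would be. The sketch below is an assumption-laden illustration: rewrite_json is a hypothetical helper, and it works on a temporary file rather than the real vocabulary.json.

```python
import json
import os
import tempfile


def rewrite_json(filename, transform):
    """Load JSON, apply transform(data), and write the result back in place."""
    with open(filename, "r+") as f:
        data = transform(json.load(f))
        f.seek(0)                     # go back to the start before writing
        json.dump(data, f, indent=4)
        f.truncate()                  # drop leftover bytes if the new text is shorter
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocabulary.json")
    with open(path, "w") as f:
        json.dump({"words": ["but", "my", "house", "is", "nicer"]}, f, indent=4)

    # Shrink the list: without truncate() the file would no longer be valid JSON.
    rewrite_json(path, lambda d: {"words": d["words"][:1]})
    with open(path) as f:
        shrunk = json.load(f)

print(shrunk)
```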
We can then add to the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)add
>>> Please enter the words you want to add, separated by spaces: these are new words
And the JSON file becomes:
{
"words": [
"but",
"my",
"house",
"is",
"nicer",
"these",
"are",
"new",
"words"
]
}
We can then clear the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)clear
And the JSON file becomes:
{
"words": []
}
Great! Now, we implemented the ability for the user to manage the list. Let's move on to creating the functionality for the game: dictee()
You mentioned that you want to randomly select an item from a list and remove it from that list so it doesn't get asked twice. There are a multitude of ways you can accomplish this. For example, you could use random.shuffle:
def dictee(words):
    correct = 0
    incorrect = 0
    random.shuffle(words)
    for word in words:
        # ask word
        # evaluate response
        # increment correct/incorrect
        # ask if you want to play again
        pass
random.shuffle randomly shuffles the list in place. Then, you can iterate through the list using for word in words: and start the game. You don't necessarily need to use random.choice here because when using random.shuffle and iterating through the result, you are essentially selecting random values.
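Because random.shuffle reorders the list it is given, the caller's list changes too. If that is not wanted, one option is to shuffle a copy instead. The sketch below uses random.sample for this. The function name shuffled_copy and the seed parameter are assumptions made for the example.

```python
import random


def shuffled_copy(words, seed=None):
    """Return a new list with the same words in random order."""
    rng = random.Random(seed)
    # Sampling every element without replacement yields a shuffled copy.
    return rng.sample(words, k=len(words))


words = ["but", "my", "house", "is", "nicer"]
quiz_order = shuffled_copy(words, seed=42)
print(quiz_order)
print(words)  # the original order is untouched
```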
I hope this helped illustrate how powerful functions and function parameters are. They not only help you separate your code, but also make it easier to manage, understand, and write cleaner code.

Getting image URL's from amazon product page

I'm trying to scrape image URL's from Amazon products, for example, this link.
In the page source code, there is a section which contains all the urls for images of different sizes (large, medium, hirez, etc). I can get that part of the script by doing, with scrapy,
imagesString = (response.xpath('//script[contains(., "ImageBlockATF")]/text()').extract_first())
Which gives me a string that looks like this,
P.when('A').register("ImageBlockATF", function(A){
var data = {
'colorImages': { 'initial': [{"hiRes":"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SL1500_.jpg","thumb":"https://images-na.ssl-images-amazon.com/images/I/31HoKqtljqL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/31HoKqtljqL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX355_.jpg":[308,355],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX450_.jpg":[390,450],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX425_.jpg":[369,425],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX466_.jpg":[404,466],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX522_.jpg":[453,522],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX569_.jpg":[494,569],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX679_.jpg":[589,679]},"variant":"MAIN","lowRes":null},{"hiRes":"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SL1500_.jpg","thumb":"https://images-na.ssl-images-amazon.com/images/I/31Y%2B8oE5DtL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/31Y%2B8oE5DtL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX355_.jpg":[308,355],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX450_.jpg":[390,450],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX425_.jpg":[369,425],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX466_.jpg":[404,466],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX522_.jpg":[453,522],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX569_.jpg":[494,569],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX679_.jpg":[589,679]},"variant":"PT01","lowRes":null},{"hiRes":null,"thumb":"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL.jpg","main":{"https://images-na.ssl-images-am
azon.com/images/I/51rORrvh0hL._SX355_.jpg":[236,355],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SX450_.jpg":[300,450],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SX425_.jpg":[283,425],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SX466_.jpg":[310,466],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL.jpg":[333,500]},"variant":"PT02","lowRes":null},{"hiRes":null,"thumb":"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX355_.jpg":[236,355],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX450_.jpg":[300,450],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX425_.jpg":[283,425],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX466_.jpg":[310,466],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL.jpg":[333,500]},"variant":"PT03","lowRes":null},{"hiRes":null,"thumb":"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX355_.jpg":[236,355],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX450_.jpg":[300,450],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX425_.jpg":[283,425],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX466_.jpg":[310,466],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL.jpg":[333,500]},"variant":"PT04","lowRes":null}]},
'colorToAsin': {'initial': {}},
'holderRatio': 1.0,
'holderMaxHeight': 700,
'heroImage': {'initial': []},
'heroVideo': {'initial': []},
'spin360ColorData': {'initial': {}},
'spin360ColorEnabled': {'initial': 0},
'spin360ConfigEnabled': false,
'spin360LazyLoadEnabled': false,
'playVideoInImmersiveView':'false',
'tabbedImmersiveViewTreatment':'T2',
'totalVideoCount':'0',
'videoIngressATFSlateThumbURL':'',
'mediaTypeCount':'0',
'atfEnhancedHoverOverlay' : true,
'winningAsin': 'B00XLSS79Y',
'weblabs' : {},
'aibExp3Layout' : 1,
'aibRuleName' : 'frank-powered',
'acEnabled' : false
};
A.trigger('P.AboveTheFold'); // trigger ATF event.
return data;
});
My goal is to load the data inside colorImages into a dictionary via JSON, so that I can easily get each URL.
I tried doing something like this:
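import json
import re
m = re.search(r'^var data = ({.*};)', imagesString, re.S | re.M)
data = m.group(1)  # group(1) is the captured {...}; part, without the "var data = " prefix
jsonObj = json.loads(data[:-1].replace("'", '"'))  # drop the trailing ';' and switch to JSON double quotes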
But imagesString does not seem to work with re.search: I keep getting errors saying that imagesString is not a string, when it actually is.
I got similar data from an Amazon page using re.findall, something like this (script is a chunk of text I got from the page):
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
and then
variationValuesDict = json.loads(variationValues)
But my knowledge of regular expressions is not that great.
I erased the start and end of the string I pasted above so that only the data remained, which left me with this:
https://jsoneditoronline.org/?id=9ea92643044f4ac88bcc3e76d98425fc
I can't figure out how to get colorImages with re.findall() (or the data in the JSON editor) so that I can load it as JSON and use it like a dictionary. Any ideas on how to achieve this?
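One possible way to pull out colorImages, offered as a minimal sketch rather than a definitive fix: in the text pasted above, the value after 'initial': is already double-quoted JSON, so json.JSONDecoder.raw_decode can parse it in place without converting the rest of the JavaScript. The function name extract_color_images and the shortened sample text are hypothetical. The sketch also guesses at the cause of the "not a string" error: re.search raises TypeError when it is given something other than str or bytes (for example a list of lines or a parser tag object), so the input is converted first.

```python
import json
import re


def extract_color_images(script_text):
    """Return the list stored under 'colorImages' -> 'initial', or [] if absent."""
    # Assumption: the input may be a list of text chunks or some other object
    # rather than a str, which re.search would reject with a TypeError.
    if isinstance(script_text, (list, tuple)):
        script_text = " ".join(str(part) for part in script_text)
    elif not isinstance(script_text, str):
        script_text = str(script_text)

    # Locate the position right after "'colorImages': { 'initial': ".
    match = re.search(r"'colorImages'\s*:\s*\{\s*'initial'\s*:\s*", script_text)
    if match is None:
        return []

    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever JavaScript follows it.
    images, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return images


# Shortened, made-up sample in the same shape as the page script.
sample = """var data = {
'colorImages': { 'initial': [{"hiRes":"https://example.com/a._SL1500_.jpg","thumb":"https://example.com/a._SS40_.jpg","main":{"https://example.com/a._SX355_.jpg":[308,355]},"variant":"MAIN","lowRes":null},{"hiRes":null,"thumb":"https://example.com/b._SS40_.jpg","main":{"https://example.com/b._SX355_.jpg":[236,355]},"variant":"PT01","lowRes":null}]},
'colorToAsin': {'initial': {}},
'acEnabled' : false
};"""

images = extract_color_images(sample)
# Collect every hiRes URL that is not null.
urls = [image["hiRes"] for image in images if image.get("hiRes")]
```

Each element of images is an ordinary dict, so image["thumb"], image["variant"] and the keys of image["main"] can be read the same way as hiRes.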
You just need to convert the var data object into valid JSON first. It is easy: replace every ' character with " and delete the spaces, and you will get a JSON object:
(Here is your data as valid JSON)