Groovy csv to string - csv

I am using Dell Boomi to map data from one system to another. I can use groovy in the maps but have no experience with it. I tried to do this with the other Boomi tools, but have been told that I'll need to use groovy in a script. My inbound data is:
132265,Brown
132265,Gold
132265,Gray
132265,Green
I would like to output:
132265,"Brown,Gold,Gray,Green"
Hopefully this makes sense! Any ideas on the groovy code to make this work?

It can be elegantly solved with groupBy and the spread operator:
#Grapes(
#Grab(group='org.apache.commons', module='commons-csv', version='1.2')
)
import org.apache.commons.csv.*
def csv = '''
132265,Brown
132265,Gold
132265,Gray
132265,Green
'''
def parsed = CSVParser.parse(csv, CSVFormat.DEFAULT.withHeader('code', 'color')
parsed.records.groupBy({ it.code }).each { k,v -> println "$k,\"${v*.color.join(',')}\"" }
The above prints:
132265,"Brown,Gold,Gray,Green"

Well, I don't know how are you getting your data, but here is a general way to achieve your goal. You can use a library, such as the one bellow to parse the csv.
https://github.com/xlson/groovycsv
The example for your data would be:
#Grab('com.xlson.groovycsv:groovycsv:1.1')
import static com.xlson.groovycsv.CsvParser.parseCsv
def csv = '''
132265,Brown
132265,Gold
132265,Gray
132265,Green
'''
def data = parseCsv(csv)
I believe you want to associate the number with various values of colors. So for each line you can create a map of the number and the colors associated with that number, splitting the line by ",":
map = [:]
for(line in data) {
number = line.split(',')[0]
colour = line.split(',')[1]
if(!map[number])
map[number] = []
map[number].add(colour)
}
println map
So map should contain:
[132265:["Brown","Gold","Gray","Green"]]
Well, if it is not what you want, you can extract the general idea.

Assuming your data is coming in as a comma separated string of data like this:
"132265,Brown 132265,Gold 132265,Gray 132265,Green 122222,Red 122222,White"
The following Groovy script code should do the trick.
def csvString = "132265,Brown 132265,Gold 132265,Gray 132265,Green 122222,Red 122222,White"
LinkedHashMap.metaClass.multiPut << { key, value ->
delegate[key] = delegate[key] ?: []; delegate[key] += value
}
def map = [:]
def csv = csvString.split().collect{ entry -> entry.split(",") }
csv.each{ entry -> map.multiPut(entry[0], entry[1]) }
def result = map.collect{ k, v -> k + ',"' + v.join(",") + '"'}.join("\n")
println result
Would print:
132265,"Brown,Gold,Gray,Green"
122222,"Red,White"

Do you HAVE to use scripting for some reason? This can be easily accomplished with out-of-the-box Boomi functionality.
Create a map function that prepends the ID field to a string of your choice (i.e. 222_concat_fields). Then use that value to set a dynamic process prop with that value.
The value of the process prop will contain the result of concatenating the name fields. Simply adding this function to your map should take care of it. Then use the final value to populate your result.

Well it depends upon the data how is it coming.
If the data which you have posted in the question is coming in a single document, then you can easily handle this in a map with groovy scripting.
If the data which you have posted in the question is coming into multiple documents i.e.
doc1: 132265,Brown
doc2: 132265,Gold
doc3: 132265,Gray
doc4: 132265,Green
In that case it cannot be handled into map. You will need to use Data Process Step with Custom Scripting.
For the code which you are asking to create in groovy depends upon the input profile in which you are getting the data. Please provide more information i.e. input profile, fields etc.

Related

It's a bad design to try to print classes' variable name and not value (eg. x.name print "name" instead of content of name)

The long title contain also a mini-exaple because I couldn't explain well what I'm trying to do. Nonethless, the similar questions windows led me to various implementation. But since I read multiple times that it's a bad design, I would like to ask if what I'm trying to do is a bad design rather asking how to do it. For this reason I will try to explain my use case with a minial functional code.
Suppose I have a two classes, each of them with their own parameters:
class MyClass1:
def __init__(self,param1=1,param2=2):
self.param1=param1
self.param2=param2
class MyClass2:
def __init__(self,param3=3,param4=4):
self.param3=param3
self.param4=param4
I want to print param1...param4 as a string (i.e. "param1"..."param4") and not its value (i.e.=1...4).
Why? Two reasons in my case:
I have a GUI where the user is asked to select one of of the class
type (Myclass1, Myclass2) and then it's asked to insert the values
for the parameters of that class. The GUI then must show the
parameter names ("param1", "param2" if MyClass1 was chosen) as a
label with the Entry Widget to get the value. Now, suppose the
number of MyClass and parameter is very high, like 10 classes and 20
parameters per class. In order to minimize the written code and to
make it flexible (add or remove parameters from classes without
modifying the GUI code) I would like to cycle all the parameter of
Myclass and for each of them create the relative widget, thus I need
the paramx names under the form od string. The real application I'm
working on is even more complex, like parameter are inside other
objects of classes, but I used the simpliest example. One solution
would be to define every parameter as an object where
param1.name="param1" and param1.value=1. Thus in the GUI I would
print param1.name. But this lead to a specifi problem of my
implementation, that's reason 2:
MyClass1..MyClassN will be at some point printed in a JSON. The JSON
will be a huge file, and also since it's a complex tree (the example
is simple) I want to make it as simple as possibile. To explain why
I don't like to solution above, suppose this situation:
class MyClass1:
def init(self,param1,param2,combinations=[]):
self.param1=param1
self.param2=param2
self.combinations=combinations
Supposse param1 and param2 are now list of variable size, and
combination is a list where each element is composed by all the
combination of param1 and param2, and generate an output from some
sort of calculation. Each element of the list combinations is an
object SingleCombination,for example (metacode):
param1=[1,2] param2=[5,6] SingleCombination.param1=1
SingleCombination.param2=5 SingleCombination.output=1*5
MyInst1.combinations.append(SingleCombination).
In my case I will further incapsulated param1,param2 in a object
called parameters, so every condition will hace a nice tree with
only two object, parameters and output, and expanding parameters
node will show all the parameters with their value.
If I use JSON pickle to generate a JSON from the situation above, it
is nicely displayed since the name of the node will be the name of
the varaible ("param1", "param2" as strings in the JSON). But if I
do the trick at the end of situation (1), creating an object of
paramN as paramN.name and paramN.value, the JSON tree will become
ugly but especially huge, because if I have a big number of
condition, every paramN contains 2 sub-element. I wrote the
situation and displayed with a JSON Viewer, see the attached immage
I could pre processing the data structure before creating the JSON,
the problem is that I use the JSON to recreate the data structure in
another session of the program, so I need all the pieces of the data
structure to be in the JSON.
So, from my requirements, it seems that the workround to avoid print the variable names creates some side effect on the JSON visualization that I don't know how to solve without changing the logic of my program...
If you use dataclasses, getting the field names is pretty straightforward:
from dataclasses import dataclass, fields
#dataclass
class MyClass1:
first:int = 4
>>> fields(MyClass1)
(Field(name='first',type=<class 'int'>,default=4,...),)
This way, you can iterate over the class fields and ask your user to fill them. Note the field has a type, which you could use to eg ask the user for several values, as in your example.
You could add functions to extract programatically the param names (_show_inputs below ) from the class and values from instances (_json below ):
def blossom(cls):
"""decorate a class with `_json` (classmethod) and `_show_inputs` (bound)"""
def _json(self):
return json.dumps(self, cls=DataClassEncoder)
def _show_inputs(cls):
return {
field.name: field.type.__name__
for field in fields(cls)
}
cls._json = _json
cls._show_inputs = classmethod(_show_inputs)
return cls
NOTE 1: There's actually no need to decorate the classes with blossom. You could just use its internal functions programatically.
Using a custom json encoder to dump the dataclass objects, including properties:
import json
class DataClassPropEncoder(json.JSONEncoder): # https://stackoverflow.com/a/51286749/7814595
def default(self, o):
if is_dataclass(o):
cls = type(o)
# inject instance properties
props = {
name: getattr(o, name)
for name, value in cls.__dict__.items() if isinstance(value, property)
}
return {
**props,
**asdict(o)
}
return super().default(o)
Finally, wrap the computations inside properties so they are
serialized as well when using the decorated class. Full code example:
from dataclasses import asdict
from dataclasses import dataclass
from dataclasses import fields
from dataclasses import is_dataclass
import json
from itertools import product
from typing import List
class DataClassPropEncoder(json.JSONEncoder): # https://stackoverflow.com/a/51286749/7814595
def default(self, o):
if is_dataclass(o):
cls = type(o)
props = {
name: getattr(o, name)
for name, value in cls.__dict__.items() if isinstance(value, property)
}
return {
**props,
**asdict(o)
}
return super().default(o)
def blossom(cls):
def _json(self):
return json.dumps(self, cls=DataClassEncoder)
def _show_inputs(cls):
return {
field.name: field.type.__name__
for field in fields(cls)
}
cls._json = _json
cls._show_inputs = classmethod(_show_inputs)
return cls
#blossom
#dataclass
class MyClass1:
param1:int
param2:int
#blossom
#dataclass
class MyClass2:
param3: List[str]
param4: List[int]
def _compute_single(self, values): # TODO: implmement this
return values[0]*values[1]
#property
def combinations(self):
# TODO: cache if used more than once
# TODO: combinations might explode
field_names = []
field_values = []
cls = type(self)
for field in fields(cls):
field_names.append(field.name)
field_values.append(getattr(self, field.name))
results = []
for values in product(*field_values):
result = {
**{
field_names[idx]: value
for idx, value in enumerate(values)
},
"output": self._compute_single(values)
}
results.append(result)
return results
>>> print(f"MyClass1:\n{MyClass1._show_inputs()}")
MyClass1:
{'param1': 'int', 'param2': 'int'}
>>> print(f"MyClass2:\n{MyClass2._show_inputs()}")
MyClass2:
{'param3': 'List', 'param4': 'List'}
>>> obj_1 = MyClass1(3,4)
>>> print(f"obj_1:\n{obj_1._json()}")
obj_1:
{"param1": 3, "param2": 4}
>>> obj_2 = MyClass2(["first", "second"],[4,2])._json()
>>> print(f"obj_2:\n{obj_2._json()}")
obj_2:
{"combinations": [{"param3": "first", "param4": 4, "output": "firstfirstfirstfirst"}, {"param3": "first", "param4": 2, "output": "firstfirst"}, {"param3": "second", "param4": 4, "output": "secondsecondsecondsecond"}, {"param3": "second", "param4": 2, "output": "secondsecond"}], "param3": ["first", "second"], "param4": [4, 2]}
NOTE 2: If you need to perform several computations per class, it might be a good idea to abstract away the pattern in the combinations property to avoid repeating code.
NOTE 3: If you need access to the properties several times and not ust once, you might want to consider caching their values to avoid re-computation.
Once you have an instance of MyClass / MyClass2, you can call vars() or vars().keys() and it will give you the attributes as a str. Unlike dir, it will not show all the builtin attributes/methods starting with __.
class MyClass2:
def __init__(self,param3=3,param4=4):
self.param3=param3
self.param4=param4
instance_of_myclass2 = MyClass2(param3="what", param4="ever")
print(vars(instance_of_myclass2))
{'param3': 'what', 'param4': 'ever'}
print(vars(instance_of_myclass2).keys())
dict_keys(['param3', 'param4'])
dir(instance_of_myclass2)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'param3', 'param4']

Trying to make a command to store and retreat info from a json file

Im trying to make a command that will store name,description and image of a character and another command to retrieve that data in an embed,but i have trouble working with json files
this is my code to add them:
#client.command()
async def addskillset(ctx):
await ctx.send("Let's add this skillset!")
questions = ["What is the monster name?","What is the monster description?","what is the monster image link?"]
answers = []
#code checking the questions results
embedkra = nextcord.Embed(title = f"{answers[0]}", description = f"{answers[1]}",color=ctx.author.color)
embedkra.set_image(url = f"{answers[2]}")
mess = await ctx.reply(embed=embedkra,mention_author=False)
await mess.add_reaction('✅')
await mess.add_reaction('❌')
def check(reaction, user):
return user == ctx.author and (str(reaction.emoji) == "✅" or "❌")
try:
reaction, user = await client.wait_for('reaction_add', timeout=1000.0, check=check)
except asyncio.TimeoutError:
#giving a message that the time is over
else:
if reaction.emoji == "✅":
monsters = await get_skillsets_data() #this data is added at the end
if str(monster_name) in monsters:
await ctx.reply("the monster is already added")
else:
monsters[str(monster_name)]["monster_name"] = {}
monsters[str(monster_name)]["monster_name"] = answers[0]
monsters[str(monster_name)]["monster_description"] = answers[1]
monsters[str(monster_name)]["monster_image"] = answers[2]
with open('skillsets.json','w') as f:
json.dump(monsters,f)
await mess.delete()
await ctx.reply(f"{answers[0]} successfully added to the list")
Code to get the embed with the asked info:
#client.command()
async def skilltest(ctx,*,monster_name):
data = open('skillsets.json').read()
data = json.loads(data)
if str(monster_name) in data:
name = data["monster_name"]
description = data["monster_description"]
link = data["monster_image"]
embedkra = nextcord.Embed(title = f"{name}", description = f"{description}",color=ctx.author.color)
embedkra.set_image(url = f"{link}")
await ctx.reply(embed=embedkra,mention_author=False)
else:
# otherwise, it is still None meaning we didn't find it
await ctx.reply("monster not found",mention_author=False)
and my json should look like this:
{"katufo": {"monster_name": "Katufo","Monster_description":"Katufo is the best","Monster_image":"#image_link"},
"armor claw":{"monster_name": "Armor Claw","Monster_description":"Armor claw is the best","Monster_image":#image_link}}
The get_skillsets_data used in first command:
async def get_skillsets_data():
with open('skillsets.json','r') as f:
monsters = json.load(f)
return monsters
Well, When you are trying to retrieve data from your json file try using name = data["katufo"]["monster_name"] now here it will only retrieve monster_name of key katufo. If You want to retrieve data for armor claw code must go like this name = data["armor claw"]["monster_name"]. So try this code :
#client.command()
async def skilltest(ctx,*,monster):
data = open('skillsets.json').read()
data = json.loads(data)
if str(monster) in data:
name = data[f"monster"]["monster_name"]
description = data[f"monster"]["Monster_description"]
link = data[f"monster"]["Monster_image"]
embedkra = nextcord.Embed(title = f"{name}", description = f"{description}",color=ctx.author.color)
embedkra.set_image(url = f"{link}")
await ctx.reply(embed=embedkra,mention_author=False)
else:
# otherwise, it is still None meaning we didn't find it
await ctx.reply("monster not found",mention_author=False)
Hope this works for you :)
If your json looks like what you showed above,
{
"katufo":{
"monster_name":"Katufo",
"Monster_description":"Katufo is the best",
"Monster_image":"#image_link"
},
"armor claw":{
"monster_name":"Armor Claw",
"Monster_description":"Armor claw is the best",
"Monster_image":"#image_link"
}
}
then there is no data["monster_name"] the two objects inside of your JSON are named katufo and armor_claw. To get one of them you can simply write data['katufo']['monster_name'] or data.katufo.monster_name.
Your problem stems from looking up the monster name like this:
if str(monster_name) in data:
name = data["monster_name"]
description = data["monster_description"]
link = data["monster_image"]
What you could do instead is loop through data, as it contains several monsters and then on each object, to the check that you do:
for monster in data:
if str(monster_name) in monster.values():
name = monster.monster_name
description = monster.Monster_description
link = monster.Monster_image
One thing to think about, the way the variables are named is not something I personally recommend. Don't be afraid of adding longer descriptive names so things make more sense for you in the code. Also, in the JSON you provided, there are certain attributes starting with a capital letter, something you should think about.
Edit:
Dicts in python are the equivalent of objects in Javascript and are initialized using the same syntax which we can see below:
monster_data = {}
But since you want a specific structure on these monsters we can go further and create a function called add_monster_object():
def add_monster_object(original_dict, new_monster):
new_monster = {
"monster_name": '',
"monster_description": '',
"monster_image": ''
}
#Now we have a new empty object with the correct names.
return original_dict.update(new_monster)
Now every time you run this function with a given name, in the dict there will be an object with that name. Example is if user writes armor_sword as the monster_name attribute, then we can call the function above as add_monster_object(original_dict, monster_name).
This will, if we take your initial dict as an example, return this:
{
"katufo":{
"monster_name":"Katufo",
"Monster_description":"Katufo is the best",
"Monster_image":"#image_link"
},
"armor claw":{
"monster_name":"Armor Claw",
"Monster_description":"Armor claw is the best",
"Monster_image":"#image_link"
},
"armor sword":{
"monster_name":"",
"monster_description":"",
"monster_image":""
}
}
Then you can populate them as you want, or update the function to take more parameters. The important part here is that you take a minute and figure out what you want to keep saved. Then make sure that you can read and write from file and you should have a somewhat simple structure going. Warning: This isn't a slap and dry method, you will also have to think about special cases, such as adding an object that already exists and soforth.
If you decide to go with Replit you could use their database to create similar functionality but you wouldn't have to worry about reading and writing to a file.
As it is right now, I still think you need to proceed with your bot, add some of the changes that I mentioned before the next actual problem arrives as there are many things that arent quite right. I also suggest you break everything into managing parts, 1 would be to read from a file. 2 would be to write. 3 to write a dict to a file. 4 to update a dict and soforth. Good luck!

unable to load csv from GCS bucket to BigQuery table accurately

I am trying to load the airbnb_nyc data set from GCS bucket to BigqueryTable. Link to the dataset.
I am using the following Code:
def parse_file(element):
for line in csv.reader([element],delimiter=','):
return line
class DataIngestion2:
def parse_method2(self, values):
row1 = dict(
zip(('id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude',
'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month',
'calculated_host_listings_count', 'availability_365'),
values))
return row1
with beam.Pipeline(options=pipeline_options) as p:
lines= p | 'Read' >> ReadFromText(known_args.input,skip_header_lines=1)\
| 'parse' >> beam.Map(parse_file)
pipeline2 = lines | 'Format to Dict _ original CSV' >> beam.Map(lambda x: data_ingestion2.parse_method2(x))
pipeline2 | 'Load2' >> beam.io.WriteToBigQuery(table_spec, schema=table_schema,
write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED
)
`
But my output on BigQuery Table is wrong.
I am only getting values for the first two columns and the rest of the 14 columns are showing NULL. I am not able to figure out what I am doing wrong. Can Someone Help me find the error in my logic. I basically want to know how to transfer a csv from GCS bucket to BigQuery through DataFlow pipeline.
Thank you,
You can use the ReadFromText method and then create your own transform by extending beam.DoFn. Attached the code below for reference.
https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromText
Note that you can use gs:// for GCS in file_pattern.
More details about Pardo and DoFn
https://beam.apache.org/documentation/programming-guide/#pardo
import apache_beam as beam
from apache_beam.io.textio import ReadAllFromText,ReadFromText
from apache_beam.io.gcp.bigquery import WriteToBigQuery
from apache_beam.io.gcp.gcsio import GcsIO
import csv
COLUMN_NAMES = ['id','name','host_id','host_name','neighbourhood_group','neighbourhood','latitude','longitude','room_type','price','minimum_nights','number_of_reviews','last_review','reviews_per_month','calculated_host_listings_count','availability_365']
def files(path='gs:/some/path'):
return list(GcsIO(storage_client='<ur storage client>').list_prefix(path=path).keys())
def transform_csv(element):
rows = []
with open(element,newline='\r\n') as f:
itr = csv.reader(f, delimiter = ',',quotechar= '"')
skip_head = next(itr)
for row in itr:
rows.append(row)
return rows
def to_dict(element):
rows = []
for item in element:
row_dict = {}
zipped = zip(COLUMN_NAMES,item)
for key,val in zipped:
row_dict[key] =val
rows.append(row_dict)
yield rows
with beam.Pipeline() as p:
read =(
p
|'read-file'>> beam.Create(files())
|'transform-dict'>>beam.Map(transform_csv)
|'list-to-dict'>>beam.FlatMap(to_dict )
|'print'>>beam.Map(print)
#|'write-to-bq'>>WriteToBigQuery(schema=COLUMN_NAMES,table='ur table',project='',dataset='')
)
EDITED1 The ReadFromText supports \r\n as newline char.But,this fails to consider the condition where column data itself has \r\n. Updating the code below.
EDITED 2 GcsIo error fixed.
Note - I have used GCSIO for getting the list of files.
Details here
Please Up-vote and mark as answer if this helps.
Let me suggest another approch for this use case. BiqQuery offers special feature for uploading from Google Could Storage (GCS) to Bigquery. You can load data in several formats and CSV is among them.
There is nice tutorial on Google documentation explaining how to do it. You do not have to use Dataflow or apache_beam. Such process is available through BigQuery API itself.
This is working in many languages, but you do not have to use any language as such process can be done from console or via Cloud SDK using bq command. Everything can be found in mentioned tutorial.

how to evaluate String to identify duplicate keys

I have an input string in groovy which is not strictly JSON.
String str = "['OS_Node':['eth0':'1310','eth0':'1312']]"
My issue is to identify the duplicate "eth0" . I tried to convert this into map using Eval.me(), but it automatically removes the duplicate key "eth0" and gives me a Map.
What is the best way for me to identify the presence of duplicate key ?
Note: there could be multiple OS_Node1\2\3\ entries.. need to identify duplicates in each of them ?
Is there any JSON api that can be used? or need to use logic based on substring() ?
One way to solve this could be to cheat a little and replace colons with commas which would transform the maps into lists and then do a recursive search for duplicates:
def str = "['OS_Node':['eth0':'1310','eth0':'1312'], 'OS_Node':['eth1':'1310','eth1':'1312']]"
def tree = Eval.me(str.replaceAll(":", ","))
def dupes = findDuplicates(tree)
dupes.each { println it }
def findDuplicates(t, path=[], dupes=[]) {
def seen = [] as Set
t.collate(2).each { k, v ->
if (k in seen) dupes << [path: path + k]
seen << k
if (v instanceof List) findDuplicates(v, path+k, dupes)
}
dupes
}
when run, prints:
─➤ groovy solution.groovy
[path:[OS_Node, eth0]]
[path:[OS_Node]]
[path:[OS_Node, eth1]]
i.e. the method finds all paths to duplicated keys where "path" is defined as the key sequence required to navigate to the duplicate key.
The function returns a list of maps which you can then do whatever you wish with. Should be noted that the "OS_Node" key is with this logic treated as a duplicate but you could easily filter that out as a step after this function call.
First of all, the string you have there is not JSON - not only due to
the duplicate keys, but also by the use of [] for maps. This looks
a lot more like a groovy map literal. So if this is your custom format
and you can not do anything against it, I'd write a small parser for this,
because sooner or later edge cases or quoting problems come around the
corner.
#Grab("com.github.petitparser:petitparser-core:2.3.1")
import org.petitparser.tools.GrammarDefinition
import org.petitparser.tools.GrammarParser
import org.petitparser.parser.primitive.CharacterParser as CP
import org.petitparser.parser.primitive.StringParser as SP
import org.petitparser.utils.Functions as F
class MappishGrammerDefinition extends GrammarDefinition {
MappishGrammerDefinition() {
define("start", ref("map"))
define("map",
CP.of("[" as Character)
.seq(ref("kv-pairs"))
.seq(CP.of("]" as Character))
.map{ it[1] })
define("kv-pairs",
ref("kv-pair")
.plus()
.separatedBy(CP.of("," as Character))
.map{ it.collate(2)*.first()*.first() })
define("kv-pair",
ref('key')9
.seq(CP.of(":" as Character))
.seq(ref('val'))
.map{ [it[0], it[2]] })
define("key",
ref("quoted"))
define("val",
ref("quoted")
.or(ref("map")))
define("quoted",
CP.anyOf("'")
.seq(SP.of("\\''").or(CP.pattern("^'")).star().flatten())
.seq(CP.anyOf("'"))
.map{ it[1].replace("\\'", "'") })
}
// Helper for `def`, which is a keyword in groovy
void define(s, p) { super.def(s,p) }
}
println(new GrammarParser(new MappishGrammerDefinition()).parse("['OS_Node':['eth0':'1310','eth0':'1312'],'OS_Node':['eth0':'42']]").get())
// → [[OS_Node, [[eth0, 1310], [eth0, 1312]]], [OS_Node, [[eth0, 42]]]]

dumping list to JSON file creates list within a list [["x", "y","z"]], why?

I want to append multiple list items to a JSON file, but it creates a list within a list, and therefore I cannot acces the list from python. Since the code is overwriting existing data in the JSON file, there should not be any list there. I also tried it by having just an text in the file without brackets. It just creates a list within a list so [["x", "y","z"]] instead of ["x", "y","z"]
import json
filename = 'vocabulary.json'
print("Reading %s" % filename)
try:
with open(filename, "rt") as fp:
data = json.load(fp)
print("Data: %s" % data)#check
except IOError:
print("Could not read file, starting from scratch")
data = []
# Add some data
TEMPORARY_LIST = []
new_word = input("give new word: ")
TEMPORARY_LIST.append(new_word.split())
print(TEMPORARY_LIST)#check
data = TEMPORARY_LIST
print("Overwriting %s" % filename)
with open(filename, "wt") as fp:
json.dump(data, fp)
example and output with appending list with split words:
Reading vocabulary.json
Data: [['my', 'dads', 'house', 'is', 'nice']]
give new word: but my house is nicer
[['but', 'my', 'house', 'is', 'nicer']]
Overwriting vocabulary.json
So, if I understand what you are trying to accomplish correctly, it looks like you are trying to overwrite a list in a JSON file with a new list created from user input. For easiest data manipulation, set up your JSON file in dictionary form:
{
"words": [
"my",
"dad's",
"house",
"is",
"nice"
]
}
You should then set up functions to separate your functionality to make it more manageable:
def load_json(filename):
with open(filename, "r") as f:
return json.load(f)
Now, we can use those functions to load the JSON, access the words list, and overwrite it with the new word.
data = load_json("vocabulary.json")
new_word = input("Give new word: ").split()
data["words"] = new_word
write_json("vocabulary.json", data)
If the user inputs "but my house is nicer", the JSON file will look like this:
{
"words": [
"but",
"my",
"house",
"is",
"nicer"
]
}
Edit
Okay, I have a few suggestions to make before I get into solving the issue. Firstly, it's great that you have delegated much of the functionality of the program over to respective functions. However, using global variables is generally discouraged because it makes things extremely difficult to debug as any of the functions that use that variable could have mutated it by accident. To fix this, use method parameters and pass around the data accordingly. With small programs like this, you can think of the main() method as the point in which all data comes to and from. This means that the main() function will pass data to other functions and receive new or edited data back. One final recommendation, you should only be using all capital letters for variable names if they are going to be constant. For example, PI = 3.14159 is a constant, so it is conventional to make "pi" all caps.
Without using global, main() will look much cleaner:
def main():
choice = input("Do you want to start or manage the list? (start/manage)")
if choice == "start":
data = load_json()
words = data["words"]
dictee(words)
elif choice == "manage":
manage_list()
You can use the load_json() function from earlier (notice that I deleted write_json(), more on that later) if the user chooses to start the game. If the user chooses to manage the file, we can write something like this:
def manage_list():
choice = input("Do you want to add or clear the list? (add/clear)")
if choice == "add":
words_to_add = get_new_words()
add_words("vocabulary.json", words_to_add)
elif choice == "clear":
clear_words("vocabulary.json")
We get the user input first and then we can call two other functions, add_words() and clear_words():
def add_words(filename, words):
with open(filename, "r+") as f:
data = json.load(f)
data["words"].extend(words)
f.seek(0)
json.dump(data, f, indent=4)
def clear_words(filename):
with open(filename, "w+") as f:
data = {"words":[]}
json.dump(data, f, indent=4)
I did not utilize the load_json() function in the two functions above. My reasoning for this is because it would call for opening the file more times than needed, which would hurt performance. Furthermore, in these two functions, we already need to open the file, so it is okayt to load the JSON data here because it can be done with only one line: data = json.load(f). You may also notice that in add_words(), the file mode is "r+". This is the basic mode for reading and writing. "w+" is used in clear_words(), because "w+" not only opens the file for reading and writing, it overwrites the file if the file exists (that is also why we don't need to load the JSON data in clear_words()). Because we have these two functions for writing and/or overwriting data, we don't need the write_json() function that I had initially suggested.
We can then add to the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)add
>>> Please enter the words you want to add, separated by spaces: these are new words
And the JSON file becomes:
{
"words": [
"but",
"my",
"house",
"is",
"nicer",
"these",
"are",
"new",
"words"
]
}
We can then clear the list like so:
>>> Do you want to start or manage the list? (start/manage)manage
>>> Do you want to add or clear the list? (add/clear)clear
And the JSON file becomes:
{
"words": []
}
Great! Now, we implemented the ability for the user to manage the list. Let's move on to creating the functionality for the game: dictee()
You mentioned that you want to randomly select an item from a list and remove it from that list so it doesn't get asked twice. There are a multitude of ways you can accomplish this. For example, you could use random.shuffle:
def dictee(words):
correct = 0
incorrect = 0
random.shuffle(words)
for word in words:
# ask word
# evaluate response
# increment correct/incorrect
# ask if you want to play again
pass
random.shuffle randomly shuffles the list around. Then, you can iterate throught the list using for word in words: and start the game. You don't necessarily need to use random.choice here because when using random.shuffle and iterating through it, you are essentially selecting random values.
I hope this helped illustrate how powerful functions and function parameters are. They not only help you separate your code, but also make it easier to manage, understand, and write cleaner code.