How to implement a ListGroup of ListGroupItem(s) dynamically in Plotly Dash?

I want to create a list of words that updates dynamically in a Dash application.
For that, I would like to use ListGroup and ListGroupItem from the dash_bootstrap_components library.
Now, my callback below would work if the children attribute of the ListGroup component had the same type on input as on output. However, even though my output is a list, which is clearly an accepted type for the children of ListGroup, what I read back in the function is a dictionary of the form: {'props': {'children': 'sample_word'}, 'type': 'ListGroupItem', 'namespace': 'dash_bootstrap_components'}.
The question then is, how do I get as input from the ListGroup component the list of ListGroupItem components I am returning in the callback?
import dash
import dash_bootstrap_components as dbc
import dash_html_components as html
from dash.dependencies import Input, Output, State

app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
app.layout = html.Div(children=[dbc.Row(dbc.ListGroup(id='list-group-items'))])

@app.callback([Output('list-group-items', 'children')],
              [Input('input-domain-specific-words', 'value'),
               Input('add-button', 'n_clicks'),
               Input('delete-button', 'n_clicks'),
               Input('reset-button', 'n_clicks')],
              [State('list-group-items', 'children')])
def update_list(word, n_clicks_add, n_clicks_delete, n_clicks_reset, listChildren):
    ctx = dash.callback_context
    if not ctx.triggered:
        raise dash.exceptions.PreventUpdate
    else:
        button_id = ctx.triggered[0]['prop_id'].split('.')[0]
        if button_id not in ['add-button', 'delete-button', 'reset-button']:
            raise dash.exceptions.PreventUpdate
        else:
            if button_id == 'delete-button':
                # This comparison fails: items read back from the layout
                # arrive as JSON dicts, not ListGroupItem instances
                for item in listChildren:
                    if item.children == word:
                        listChildren.remove(item)
            elif button_id == 'add-button':
                listChildren.append(dbc.ListGroupItem(word))
            elif button_id == 'reset-button':
                listChildren = []
    return [listChildren]

I expect this might be useful later on, even if only for myself. The answer is that the ListGroup component returns its ListGroupItem children as JSON-serialized dictionaries when they are collected in a callback. Therefore, whenever you append a ListGroupItem component to a ListGroup it gets added as a ListGroupItem() element; however, when you get it back it reads like JSON. Below is an example from my code: string1 and string2 had previously been added to the ListGroup component, whereas string3 was added in the current callback, so it appears as a ListGroupItem element rather than as JSON.
[{'props': {'children': 'string1'}, 'type': 'ListGroupItem', 'namespace': 'dash_bootstrap_components'}, {'props': {'children': 'string2'}, 'type': 'ListGroupItem', 'namespace': 'dash_bootstrap_components'}, ListGroupItem('string3')]
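In practice this means any code that reads the list back has to cope with both shapes. A small helper like the following can normalize them (a sketch; get_item_text is a hypothetical name, not part of dash_bootstrap_components):

def get_item_text(item):
    # Children that round-tripped through the browser come back as plain dicts
    if isinstance(item, dict):
        return item['props']['children']
    # Items appended earlier in the same callback are still component objects
    return item.children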
My final version for my specific callback is:
@app.callback(
    [Output('list-group-items', 'children')],
    [Input('input-domain-specific-words', 'value'),
     Input('add-button', 'n_clicks'),
     Input('delete-button', 'n_clicks'),
     Input('reset-button', 'n_clicks')],
    [State('list-group-items', 'children')],
)
def updateWordList(word, n_clicks_add, n_clicks_delete, n_clicks_reset, listChildren):
    ctx = dash.callback_context
    if not ctx.triggered:
        raise dash.exceptions.PreventUpdate
    else:
        button_id = ctx.triggered[0]['prop_id'].split('.')[0]
        if button_id not in ['add-button', 'delete-button', 'reset-button']:
            raise dash.exceptions.PreventUpdate
        else:
            if not listChildren:
                listChildren = []
            if button_id == 'delete-button':
                # Iterate over a copy so that removing items doesn't skip elements;
                # items read back from the layout are JSON dicts
                for item in list(listChildren):
                    if item['props']['children'] == word:
                        listChildren.remove(item)
            elif button_id == 'add-button':
                listChildren.append(dbc.ListGroupItem(word))
            elif button_id == 'reset-button':
                print('pressed reset')
                return [[]]
    return [listChildren]
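For completeness, the layout shown above omits the input box and buttons the callback listens to; a minimal version carrying them might look like this (only the component IDs are taken from the callback, the rest is a sketch):

import dash_core_components as dcc

app.layout = html.Div(children=[
    dcc.Input(id='input-domain-specific-words', type='text'),
    html.Button('Add', id='add-button'),
    html.Button('Delete', id='delete-button'),
    html.Button('Reset', id='reset-button'),
    dbc.Row(dbc.ListGroup(id='list-group-items')),
])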

Related

scrapy export into single row

I'm trying to scrape store locations into a csv using scrapy. I'm capturing the right data, but the output looks like this (with "name" field as an example)
[image: csv output]
Code:
import scrapy
from xx.items import xxItem

class QuotesSpider(scrapy.Spider):
    name = 'xx_spider'
    allowed_domains = ['www.my.xx.com']
    start_urls = [
        'https://my.xx.com/storefinder/list/a',
    ]

    def parse(self, response):
        rows = response.css('div.col-md-4.col-sm-6')
        for row in rows:
            item = xxItem()
            item['name'] = rows.css('h3::text').extract()
            item['address'] = rows.css('p::text').extract()
        return item
A return statement is used to end the execution of the function call
and “returns” the result (value of the expression following the return
keyword) to the caller.
Reference link.
Hence, when you use the return keyword, your code's execution stops at that point. Instead, you need to use the yield keyword.
What does the “yield” keyword do?
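As a minimal standalone sketch of the difference (not part of the spider):

def with_return():
    for i in range(3):
        return i  # execution ends on the first iteration

def with_yield():
    for i in range(3):
        yield i  # execution pauses here and resumes on the next request

print(with_return())       # 0
print(list(with_yield()))  # [0, 1, 2]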
Solution:
Replace the statement return item with yield item and move it inside the for loop:
Code with changes:
import scrapy
from xx.items import xxItem

class QuotesSpider(scrapy.Spider):
    name = 'xx_spider'
    allowed_domains = ['www.my.xx.com']
    start_urls = [
        'https://my.xx.com/storefinder/list/a',
    ]

    def parse(self, response):
        rows = response.css('div.col-md-4.col-sm-6')
        for row in rows:
            item = xxItem()
            item['name'] = row.css('h3::text').extract()
            item['address'] = row.css('p::text').extract()
            yield item
To store the data in a csv file, run your spider with the command:
scrapy crawl xx_spider -o output_file.csv
Hope it helps :)

A pickle with jsonpickle (Python 3.7)

I have an issue with using jsonpickle. Rather, I believe it to be working correctly but it's not producing the output I want.
I have a class called 'Node'. In 'Node' are four ints (x, y, width, height) and a StringVar called 'NodeText'.
The problem with serialising a StringVar is that there's lots of information in there and for me it's just not necessary. I use it when the program's running, but for saving and loading it's not needed.
So I changed what jsonpickle saves by giving my Node a __getstate__ method. This way I can do this:
def __getstate__(self):
    state = self.__dict__.copy()
    del state['NodeText']
    return state
This works well so far and NodeText isn't saved. The problem comes on a load. I load the file as normal into an object (in this case a list of nodes).
The problem is this: the items loaded from json are not Nodes as defined in my class. They are almost the same (they have x, y, width and height), but because NodeText wasn't saved in the json file, these Node-like objects do not have that property. This then causes an error when I create a visual instance of these Nodes on screen, because the StringVar is used for the tkinter Entry textvariable.
I would like to know if there is a way to load this 'almost node' into my actual Nodes. I could just copy every property one at a time into a new instance but this just seems like a bad way to do it.
I could also null the NodeText StringVar before saving (thus saving the space in the file) and then reinitialise it on loading. This would mean I'd have my full object, but somehow it seems like an awkward workaround.
If you're wondering just how much more information there is with the StringVar, my test json file has just two Nodes. Just saving the basic properties (x,y,width,height), the file is 1k. With each having a StringVar, that becomes 8k. I wouldn't care so much in the case of a small increase, but this is pretty huge.
Can I force the load to be to this Node type rather than just some new type that Python has created?
Edit: if you're wondering what the json looks like, take a look here:
{
    "1": {
        "py/object": "Node.Node",
        "py/state": {
            "ImageLocation": "",
            "TextBackup": "",
            "height": 200,
            "uID": 1,
            "width": 200,
            "xPos": 150,
            "yPos": 150
        }
    },
    "2": {
        "py/object": "Node.Node",
        "py/state": {
            "ImageLocation": "",
            "TextBackup": "",
            "height": 200,
            "uID": 2,
            "width": 100,
            "xPos": 50,
            "yPos": 450
        }
    }
}
Since the class name is there, I assumed it would be an instantiation of the class. But when you load the file using jsonpickle, you get the dictionary back and can inspect the loaded data and each node. Neither node contains the property 'NodeText'. That is to say, it's not something with 'None' as the value; the attribute simply isn't there.
That's because jsonpickle doesn't know which fields your object normally has: it restores only the fields passed in the state, and the state doesn't contain the NodeText property. So it just misses it :)
You can add a __setstate__ magic method to restore that property on your loaded objects. This way you will be able to handle dumps with or without the property.
def __setstate__(self, state):
    state.setdefault('NodeText', None)
    for k, v in state.items():
        setattr(self, k, v)
A small example
import jsonpickle

class Node:
    def __init__(self) -> None:
        super().__init__()
        self.NodeText = None
        self.ImageLocation = None
        self.TextBackup = None
        self.height = None
        self.uID = None
        self.width = None
        self.xPos = None
        self.yPos = None

    def __setstate__(self, state):
        # Fill in the attribute that __getstate__ stripped before dumping
        state.setdefault('NodeText', None)
        for k, v in state.items():
            setattr(self, k, v)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['NodeText']
        return state

    def __repr__(self) -> str:
        return str(self.__dict__)

obj1 = Node()
obj1.NodeText = 'Some heavy description text'
obj1.ImageLocation = 'test ImageLocation'
obj1.TextBackup = 'test TextBackup'
obj1.height = 200
obj1.uID = 1
obj1.width = 200
obj1.xPos = 150
obj1.yPos = 150

print('Dumping ...')
dumped = jsonpickle.encode({1: obj1})
print(dumped)
print('Restoring object ...')
print(jsonpickle.decode(dumped))
outputs
# > python test.py
Dumping ...
{"1": {"py/object": "__main__.Node", "py/state": {"ImageLocation": "test ImageLocation", "TextBackup": "test TextBackup", "height": 200, "uID": 1, "width": 200, "xPos": 150, "yPos": 150}}}
Restoring object ...
{'1': {'ImageLocation': 'test ImageLocation', 'TextBackup': 'test TextBackup', 'height': 200, 'uID': 1, 'width': 200, 'xPos': 150, 'yPos': 150, 'NodeText': None}}

How to use ijson/other to parse this large JSON file?

I have this massive json file (8gb), and I run out of memory when trying to read it in to Python. How would I implement a similar procedure using ijson or some other library that is more efficient with large json files?
import pandas as pd

# There are (say) 1m objects - each its own json object - in this file.
with open('my_file.json') as json_file:
    data = json_file.readlines()

# So I take a list of these json objects
list_of_objs = [obj for obj in data]

# But I only want about 200 of the json objects
desired_data = [obj for obj in list_of_objs if obj['feature'] == "desired_feature"]
How would I implement this using ijson or something similar? Is there a way I can extract the objects I want without reading in the whole JSON file?
The file is a list of objects like:
{
    "review_id": "zdSx_SD6obEhz9VrW9uAWA",
    "user_id": "Ha3iJu77CxlrFm-vQRs_8g",
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",
    "stars": 4,
    "date": "2016-03-09",
    "text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
    "useful": 0,
    "funny": 0
}
The file is a list of objects
This is a little ambiguous. Looking at your code snippet, it appears your file contains a separate JSON object on each line, which is not the same as an actual JSON array that starts with [, ends with ], and has , between items.
In the case of a json-per-line file it's as easy as:
import json
from itertools import islice

with open(filename) as f:
    objects = (json.loads(line) for line in f)
    objects = islice(objects, 200)
Note the differences:
- you don't need .readlines(); the file object itself is an iterable that yields individual lines
- parentheses (..) instead of brackets [..] in (... for line in f) create a lazy generator expression instead of a Python list in memory holding all the lines
- islice(objects, 200) will give you the first 200 items without iterating further; if objects were a list you could just do objects[:200]
Now, if your file is actually a JSON array then you indeed need ijson:
import ijson  # or choose a faster backend if needed
from itertools import islice

with open(filename) as f:
    objects = ijson.items(f, 'item')
    objects = islice(objects, 200)
ijson.items returns a lazy iterator over a parsed array. The 'item' in the second parameter means "each item in a top-level array".
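The original snippet also filtered on a key before taking 200 items; that step slots into either pipeline as another lazy generator step (reusing the hypothetical 'feature' key and file name from the question):

import ijson
from itertools import islice

with open('my_file.json') as f:
    objects = ijson.items(f, 'item')
    matching = (obj for obj in objects if obj.get('feature') == 'desired_feature')
    desired_data = list(islice(matching, 200))  # materialize while the file is still open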
The problem is that not all JSON comes nicely formatted and you cannot rely on line-by-line parsing to extract your objects.
I understood your "acceptance criteria" as "want to collect only those JSON objects whose specified keys contain specified values". For example, only collecting objects about a person if that person's name is "Bob". The following function will provide a list of all objects that fit your criteria. Parsing is done character by character (something that would be much more efficient in C, but Python is still pretty good). This should be more robust because it doesn't care about newlines, formatting etc. I tested this on both formatted and unformatted JSON with 1,000,000 objects.
import json

def parse_out_objects(file, feature, desired_value):
    with open(file) as f:
        compose_object_flag = False
        ignore_characters_flag = False
        object_string = ''
        selected_objects = []
        json_object = None
        while True:
            c = f.read(1)
            if c == '"':
                # Toggle so braces inside string values are not treated as delimiters
                ignore_characters_flag = not ignore_characters_flag
            if c == '{' and ignore_characters_flag == False:
                compose_object_flag = True
            if c == '}' and compose_object_flag == True and ignore_characters_flag == False:
                compose_object_flag = False
                object_string = object_string + '}'
                json_object = json.loads(object_string)
                if json_object[feature] == desired_value:
                    selected_objects.append(json_object)
                object_string = ''
            if compose_object_flag == True:
                object_string = object_string + c
            if not c:
                break
    return selected_objects
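Called against the review data shown in the question, usage might look like this (a sketch; the file name and filter values are assumptions):

# Collect every review object whose 'stars' value is 4
selected_objects = parse_out_objects('my_file.json', 'stars', 4)
print(len(selected_objects))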

filter particular field name and value from field_dict of package django-reversion

I have a function which returns the version history as json from the Version model of reversion.models.
from django.http import HttpResponse
from reversion.models import Version
from django.contrib.admin.models import LogEntry
import json

def history_list(request):
    history_list = Version.objects.all().order_by('-revision__date_created')
    data = []
    for i in history_list:
        data.append({
            'date_time': str(i.revision.date_created),
            'user': str(i.revision.user),
            'object': i.object_repr,
            'field': i.revision.comment.split(' ')[-1],
            'new_value_field': str(i.field_dict),
            'type': i.content_type.name,
            'comment': i.revision.comment
        })
    data_ser = json.dumps(data)
    return HttpResponse(data_ser, content_type="application/json")
When I run the above snippet I get the output json as
[{"type": "fruits", "field": "colour", "object": "anyobject", "user": "anyuser", "new_value_field": "{'price': $23, 'weight': 2kgs, 'colour': 'red'}", "comment": "Changed colour."}]
From the function above,
'comment': i.revision.comment
returns "comment": "Changed colour." in the json, where colour is the field name, which I extract from the comment with
'field': i.revision.comment.split(' ')[-1]
But I assume getting the field name and value from field_dict is a better approach.
Problem: from the above json list I would like to filter out new_value_field and old_value, and within new_value_field only the value of colour.
Getting the changed fields isn't as easy as checking the comment, as this can be overridden.
Django-reversion just takes care of storing each version, not comparing.
Your best option is to look at the django-reversion-compare module and its admin.py code.
The majority of the code in there is designed to produce a neat side-by-side HTML diff page, but the code should be able to be re-purposed to generate a list of changed fields per object (as there can be more than one changed field per version).
The code should* include a view independent way to get the changed fields at some point, but this should get you started:
from django.db import models
from reversion_compare.admin import CompareObjects
from reversion.revisions import default_revision_manager

def changed_fields(obj, version1, version2):
    """
    Create a generic html diff from the obj between version1 and version2:
    A diff of every changed field value.
    This method should be overwritten, to create a nice diff view
    coordinated with the model.
    """
    diff = []
    # Create a list of all normal fields and append many-to-many fields
    fields = [field for field in obj._meta.fields]
    concrete_model = obj._meta.concrete_model
    fields += concrete_model._meta.many_to_many
    # This gathers the related reverse ForeignKey fields, so we can do ManyToOne compares
    reverse_fields = []
    # From: http://stackoverflow.com/questions/19512187/django-list-all-reverse-relations-of-a-model
    changed_fields = []
    for field_name in obj._meta.get_all_field_names():
        f = getattr(
            obj._meta.get_field_by_name(field_name)[0],
            'field',
            None
        )
        if isinstance(f, models.ForeignKey) and f not in fields:
            reverse_fields.append(f.rel)
    fields += reverse_fields
    for field in fields:
        try:
            field_name = field.name
        except:
            # is a reverse FK field
            field_name = field.field_name
        is_reversed = field in reverse_fields
        obj_compare = CompareObjects(field, field_name, obj, version1, version2, default_revision_manager, is_reversed)
        if obj_compare.changed():
            changed_fields.append(field)
    return changed_fields
This can then be called like so:
changed_fields(MyModel, history_list_item1, history_list_item2)
Where history_list_item1 and history_list_item2 correspond to various actual Version items.
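If the side-by-side machinery of django-reversion-compare is more than you need, a rougher sketch is to diff the stored field_dict of two consecutive Version objects directly (diff_field_dicts is a hypothetical helper, not part of either library):

def diff_field_dicts(old_version, new_version):
    """Return {field_name: (old_value, new_value)} for fields that changed."""
    old, new = old_version.field_dict, new_version.field_dict
    return {
        field: (old.get(field), value)
        for field, value in new.items()
        if old.get(field) != value
    }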
*: Said as a contributor, I'll get right on it.

WxPython MultiLine TextCtrl to 2D Array

I'm trying to use a multiline TextCtrl to get user inputted multiline CSV data and convert it to a 2D array. I've not been successful. Any ideas? Thanks in advance.
import wx
import csv
import StringIO

Title = "MultiLine TextCtrl to 2D Array"

class MainFrame(wx.Frame):
    def __init__(self, title):
        wx.Frame.__init__(self, None, title=title, pos=(140, 140), size=(320, 300))
        panel = Panel(self)

class Panel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent)
        self.InputData = wx.TextCtrl(self, value="1,2,3,4", pos=(20, 20), size=(150, 200), style=wx.TE_MULTILINE)
        self.button = wx.Button(self, label="GO", pos=(200, 200))
        self.Bind(wx.EVT_BUTTON, self.OnClick, self.button)

    def OnClick(self, event):
        DataString = self.InputData.GetValue()
        f = StringIO.StringIO(DataString)
        reader = csv.reader(f, delimiter=',')
        x = list(reader)
        print x
        print x[0,0]

if __name__ == "__main__":
    app = wx.App(redirect=False)
    frame = MainFrame(Title)
    frame.Show()
    app.MainLoop()
I would use wxGrid, but I want to be able to paste CSV text into the field, and I don't know of a way to do that with wxGrid. Here is a sample of the data I want to be able to paste into the field:
Point,X,Y,Z
1,-.500,-15.531,.000
2,.000,-15.531,2.354
3,.000,-14.719,2.354
4,.000,-14.719,2.273
5,.000,-14.531,2.273
csv.reader returns a list of lists, so you would print out an element using a term like x[3][2], where 3 would select the row, and 2 would select the column.
Indexing like x[3,2] does not work for a list of lists; that syntax is for multidimensional arrays, such as those used with numpy.
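For example, sticking to the Python 2 style of the question's code (the sample values are made up):

x = [['1', '2', '3', '4'], ['5', '6', '7', '8']]
print x[0][0]  # '1' -- row 0, column 0
print x[1][2]  # '7' -- row 1, column 2
# print x[0,0] would raise: TypeError: list indices must be integers, not tuple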