I have a stacked graph with multiple traces (for priorities High and Low). The request data comes from a CSV file which may or may not contain a given priority (say High). I add the traces to the graph, but it throws an exception if there are no High priority requests in the CSV.
figure_priority = {
    'data': [
        trace2,
        trace1,
What I did was add a conditional check to verify the priority is present in the dataframe and then return the related figure, i.e.
if ('Number', 'High') in pv.columns and ('Number', 'Low') in pv.columns:
    trace2 = go.Bar(x=pv.index, y=pv[('Number', 'High')])
    trace1 = go.Bar(x=pv.index, y=pv[('Number', 'Low')])
    figure_priority = {
        'data': [
            trace2,
            trace1,
        ],
    }
elif ('Number', 'Low') in pv.columns:
    trace1 = go.Bar(x=pv.index, y=pv[('Number', 'Low')])
    figure_priority = {
        'data': [
            trace1,
        ],
    }
But I think there should be an easier way to do this logic; what happens when I get High, Low, and Medium priorities?
How would I check whether a specific group is present in the columns and add traces to the stacked chart based on the content?
As mentioned above, here is the code snippet with colors for the stacked bars; any optimization is more than welcome:
app_types = [('High', '#7a5195'), ('Low', '#ffa600'), ('Medium', '#ffa6CC')]
traces = []
for app_type, color in app_types:
    if app_type in str(pv.columns.tolist()):
        traces.append(
            go.Bar(x=pv.index, y=pv[('Number', app_type)], text=pv[('Number', app_type)],
                   textposition='auto',
                   opacity=0.8,
                   marker={"color": color,
                           "line": {
                               "color": "#cdcdcd",
                               "width": 2,
                           },
                           },
                   name=app_type)
        )
figure_new = {
    "data": traces,
    'layout': go.Layout(
You could use a for loop and fill up a list of traces:
priorities = ['High', 'Medium', 'Low']
traces = []
for priority in priorities:
    if ('Number', priority) in pv.columns:
        traces.append(go.Bar(x=pv.index, y=pv[('Number', priority)]))
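If you prefer not to hard-code the list at all, here is a minimal sketch (assuming, as in the question, that pv has a two-level column index with 'Number' on the first level and the priority names on the second; the colors dict is a hypothetical stand-in for the app_types list above): derive the priorities from the pivot table itself, so any priority present in the CSV gets a trace.

import plotly.graph_objs as go

# Hypothetical colour map; priorities not listed fall back to a default colour.
colors = {'High': '#7a5195', 'Low': '#ffa600', 'Medium': '#ffa6CC'}

traces = [
    go.Bar(x=pv.index, y=pv[('Number', priority)],
           name=priority, marker={'color': colors.get(priority, '#cdcdcd')})
    for priority in pv['Number'].columns  # only priorities actually present in the data
]

figure_priority = {'data': traces, 'layout': go.Layout(barmode='stack')}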
I am using ObservableHQ and the Vega-Lite API to do data visualizations and have run into a problem I can't figure out.
The problem is that I would like to access a data object from the following data structure:
Array
    Array
    Array
        Item
        Item
    Array
As you can see in my bad drawing, I have a multidimensional array and would like to access a specific array from the main array. How can I do that using the Vega-Lite API?
vl.markCircle({
    thickness: 4,
    bandSize: 2
  })
  .data(diff[0])
  .encode(
    vl.x().fieldQ("mins").scale({ domain: [-60, 60] }),
    vl.color().fieldN('type').scale({ range: ['#636363', '#f03b20'] }),
  )
  .config({bandSize: 10})
  .width(600)
  .height(40)
  .render()
Thank you,
Based on your comments, I’m assuming that you’re trying to automatically chart all of the nested arrays (separately), not just one of them. And based on your chart code, I’m assuming that your data looks sorta like this:
const diff = [
  [
    { mins: 38, type: "Type B" },
    { mins: 30, type: "Type B" },
    { mins: 28, type: "Type A" },
    …
  ],
  [
    { mins: 20, type: "Type B" },
    { mins: 17, type: "Type A" },
    { mins: 19, type: "Type A" },
    …
  ],
  …
];
First, flatten all the arrays into one big array, and record which array each item came from in a new array property on the item object, using flatMap. If each child array represents, say, a different city, or a different year, or a different person collecting the data, you could replace array: i with something more meaningful about the data.
const flat = diff.flatMap((arr, i) => arr.map((d) => ({ ...d, array: i })));
Then use Vega-Lite’s “faceting” (documentation, Observable tutorial and examples) to split the chart into sections, one for each value of array: i, with shared scales. This just adds one line to your example:
vl
  .markCircle({
    thickness: 4,
    bandSize: 2
  })
  .data(flat)
  .encode(
    vl.row().fieldN("array"), // this line is new
    vl
      .x()
      .fieldQ("mins")
      .scale({ domain: [-60, 60] }),
    vl
      .color()
      .fieldN("type")
      .scale({ range: ["#636363", "#f03b20"] })
  )
  .config({ bandSize: 10 })
  .width(600)
  .height(40)
  .render()
Here’s an Observable notebook with examples of this working. As I show there at the bottom, you can also map over your array to make a totally separate chart for each nested array.
I have an issue with using jsonpickle. Rather, I believe it to be working correctly but it's not producing the output I want.
I have a class called 'Node'. In 'Node' are four ints (x, y, width, height) and a StringVar called 'NodeText'.
The problem with serialising a StringVar is that there's lots of information in there and for me it's just not necessary. I use it when the program's running, but for saving and loading it's not needed.
So I changed what jsonpickle saves by defining the __getstate__ method on my Node. This way I can do this:
def __getstate__(self):
    state = self.__dict__.copy()
    del state['NodeText']
    return state
This works well so far and NodeText isn't saved. The problem comes on a load. I load the file as normal into an object (in this case a list of nodes).
The problem with what gets loaded is this: the items loaded from JSON are not Nodes as defined in my class. They are almost the same (they have x, y, width and height), but because NodeText wasn't saved in the JSON file, these Node-like objects do not have that attribute. This then causes an error when I create a visual instance of these Nodes on screen, because the StringVar is used as the tkinter Entry textvariable.
I would like to know if there is a way to load this 'almost node' into my actual Nodes. I could just copy every property one at a time into a new instance but this just seems like a bad way to do it.
I could also null the NodeText StringVar before saving (thus saving the space in the file) and then reinitialise it on loading. This would mean I'd have my full object, but somehow it seems like an awkward workaround.
If you're wondering just how much more information there is with the StringVar, my test json file has just two Nodes. Just saving the basic properties (x,y,width,height), the file is 1k. With each having a StringVar, that becomes 8k. I wouldn't care so much in the case of a small increase, but this is pretty huge.
Can I force the load to produce this Node type rather than just some new type that Python has created?
Edit: if you're wondering what the JSON looks like, take a look here:
{
    "1": {
        "py/object": "Node.Node",
        "py/state": {
            "ImageLocation": "",
            "TextBackup": "",
            "height": 200,
            "uID": 1,
            "width": 200,
            "xPos": 150,
            "yPos": 150
        }
    },
    "2": {
        "py/object": "Node.Node",
        "py/state": {
            "ImageLocation": "",
            "TextBackup": "",
            "height": 200,
            "uID": 2,
            "width": 100,
            "xPos": 50,
            "yPos": 450
        }
    }
}
Since the class name is there, I assumed it would be an instantiation of the class. But when you load the file using jsonpickle, you get the dictionary and can inspect the loaded data and each node. Neither node contains the attribute 'NodeText'. That is to say, it's not something with 'None' as the value; the attribute simply isn't there.
That's because jsonpickle doesn't know which fields your object normally has; it restores only the fields passed in the state, and the state doesn't contain the NodeText property. So it just misses it :)
You can add a __setstate__ magic method to put that property back on your restored objects. This way you will be able to handle dumps with or without the property.
def __setstate__(self, state):
    state.setdefault('NodeText', None)
    for k, v in state.items():
        setattr(self, k, v)
A small example
import jsonpickle


class Node:
    def __init__(self) -> None:
        super().__init__()
        self.NodeText = None
        self.ImageLocation = None
        self.TextBackup = None
        self.height = None
        self.uID = None
        self.width = None
        self.xPos = None
        self.yPos = None

    def __setstate__(self, state):
        state.setdefault('NodeText', None)
        for k, v in state.items():
            setattr(self, k, v)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['NodeText']
        return state

    def __repr__(self) -> str:
        return str(self.__dict__)


obj1 = Node()
obj1.NodeText = 'Some heavy description text'
obj1.ImageLocation = 'test ImageLocation'
obj1.TextBackup = 'test TextBackup'
obj1.height = 200
obj1.uID = 1
obj1.width = 200
obj1.xPos = 150
obj1.yPos = 150

print('Dumping ...')
dumped = jsonpickle.encode({1: obj1})
print(dumped)

print('Restoring object ...')
print(jsonpickle.decode(dumped))
outputs
# > python test.py
Dumping ...
{"1": {"py/object": "__main__.Node", "py/state": {"ImageLocation": "test ImageLocation", "TextBackup": "test TextBackup", "height": 200, "uID": 1, "width": 200, "xPos": 150, "yPos": 150}}}
Restoring object ...
{'1': {'ImageLocation': 'test ImageLocation', 'TextBackup': 'test TextBackup', 'height': 200, 'uID': 1, 'width': 200, 'xPos': 150, 'yPos': 150, 'NodeText': None}}
I have a file composed of a single array containing multiple records.
{
    "Client": [
        {
            "ClientNo": 1,
            "ClientName": "Alpha",
            "ClientBusiness": [
                {
                    "BusinessNo": 1,
                    "IndustryCode": "12345"
                },
                {
                    "BusinessNo": 2,
                    "IndustryCode": "23456"
                }
            ]
        },
        {
            "ClientNo": 2,
            "ClientName": "Bravo",
            "ClientBusiness": [
                {
                    "BusinessNo": 1,
                    "IndustryCode": "34567"
                },
                {
                    "BusinessNo": 2,
                    "IndustryCode": "45678"
                }
            ]
        }
    ]
}
I load it with the following code:
create or replace stage stage.test
  url='azure://xxx/xxx'
  credentials=(azure_sas_token='xxx');

create table if not exists stage.client (json_data variant not null);

copy into stage.client
  from @stage.test/client_test.json
  file_format = (type = 'JSON' strip_outer_array = true);
Snowflake imports the entire file as one row.
I would like the COPY INTO command to remove the outer array structure and load the records into separate table rows.
When I load larger files, I hit the size limit for variant and get the error Error parsing JSON: document is too large, max size 16777216 bytes.
If you can import the file into Snowflake, into a single row, then you can use LATERAL FLATTEN on the Client field to generate one row per element in the array.
Here's a blog post on LATERAL and FLATTEN (or you could look them up in the snowflake docs):
https://support.snowflake.net/s/article/How-To-Lateral-Join-Tutorial
If the format of the file is, as specified, a single object with a single property that contains an array with 500 MB worth of elements in it, then perhaps importing it will still work -- if that works, then LATERAL FLATTEN is exactly what you want. But that form is not particularly great for data processing. You might want to use some text processing script to massage the data if that's needed.
RECOMMENDATION #1:
The problem with your JSON is that it doesn't have an outer array. It has a single outer object containing a property with an inner array.
If you can fix the JSON, that would be the best solution, and then STRIP_OUTER_ARRAY will work as you expected.
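For illustration only (this fixed file is not from the original post), the array itself would become the top-level element, so STRIP_OUTER_ARRAY can peel it off and load each client as a separate row:

[
  {"ClientNo": 1, "ClientName": "Alpha", "ClientBusiness": [ ... ]},
  {"ClientNo": 2, "ClientName": "Bravo", "ClientBusiness": [ ... ]}
]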
You could also try to recompose the JSON (an ugly business) after reading it line by line with:
CREATE OR REPLACE TABLE X (CLIENT VARCHAR);
COPY INTO X FROM (SELECT $1 CLIENT FROM @My_Stage/Client.json);
User Response to Recommendation #1:
Thank you. So from what I gather, COPY with STRIP_OUTER_ARRAY can handle a file starting and ending with square brackets, and parse the file as if they were not there.
The real files don't have line breaks, so I can't read the file line by line. I will see if the source system can change the export.
RECOMMENDATION #2:
Also, if you would like to see what the JSON parser does, you can experiment using this code; I have parsed JSON in the COPY command using similar code. Working with your JSON data in a small project can help you shape the COPY command to work as intended.
CREATE OR REPLACE TABLE SAMPLE_JSON
(ID INTEGER,
DATA VARIANT
);
INSERT INTO SAMPLE_JSON(ID,DATA)
SELECT
1,parse_json('{
"Client": [
{
"ClientNo": 1,
"ClientName": "Alpha",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "12345"
},
{
"BusinessNo": 2,
"IndustryCode": "23456"
}
]
},
{
"ClientNo": 2,
"ClientName": "Bravo",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "34567"
},
{
"BusinessNo": 2,
"IndustryCode": "45678"
}
]
}
]
}');
SELECT
C.value:ClientNo AS ClientNo
,C.value:ClientName::STRING AS ClientName
,ClientBusiness.value:BusinessNo::Integer AS BusinessNo
,ClientBusiness.value:IndustryCode::Integer AS IndustryCode
from SAMPLE_JSON f
,table(flatten( f.DATA,'Client' )) C
,table(flatten(c.value:ClientBusiness,'')) ClientBusiness;
User Response to Recommendation #2:
Thank you for the parse_json example!
Trouble is, the real files are sometimes 500 MB, so the parse_json function chokes.
Follow-up on Recommendation #2:
The JSON needs to be in the NDJSON (http://ndjson.org/) format. Otherwise the file can become impossible to parse once it grows large.
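For illustration, the client records from the example above would look like this as NDJSON, one complete JSON object per line:

{"ClientNo": 1, "ClientName": "Alpha", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "12345"}, {"BusinessNo": 2, "IndustryCode": "23456"}]}
{"ClientNo": 2, "ClientName": "Bravo", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "34567"}, {"BusinessNo": 2, "IndustryCode": "45678"}]}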
Hope the above helps others running into similar questions!
Please consider this dataset:
type Deck = JsonProvider<"...">
let dt = Deck.GetSamples()
dt
[{"collectible":true,"health":4,"artist":"Zoltan Boros","type":"MINION","cost":1,"attack":2},
{"collectible":true,"health":8,"artist":"James Ryman","type":"MINION","cost":8,"attack":8},
{"collectible":true,"health":3,"artist":"Warren Mahy", "type":"LAND","cost":2,"attack":2}]
I am trying to build a function capable of extracting certain info from it and, eventually, storing it in a smaller dataset. Given a list-like dataset deck, it should keep only the cards whose given key equals a given value.
let rec filter deck key value =
    let rec aux l1 l2 l3 =
        match l1 with
        | [] -> []
        | x::xs when x.l2 = l3 -> x::(aux xs key value)
    aux deck key value
For example,
filter dt type minion
should subset the deck into a smaller one containing only the first and second cards. I think I made a few steps forward in grasping the concept, but it still does not work, throwing an error of this kind:
FS0072: Lookup on object of indeterminate type based on information prior to
this program point. A type annotation may be needed prior to this program point to
constrain the type of the object. This may allow the lookup to be resolved.
How should I define the type of key? I tried key : string and key : string list, without success.
Are you trying to re-implement filter?
#if INTERACTIVE
#r @"..\packages\FSharp.Data\lib\net40\FSharp.Data.dll"
#endif

open FSharp.Data

[<Literal>]
let jsonFile = @"C:\tmp\test.json"

type Json = JsonProvider<jsonFile>
let deck = Json.Load(jsonFile)

deck |> Seq.filter (fun c -> c.Type = "MINION")
Gives me:
val it : seq<Json.Root> = seq
[{ "collectible": true, "health": 4, "artist": "Zoltan Boros", "type": "MINION", "cost": 1, "attack": 2 };
{ "collectible": true, "health": 8, "artist": "James Ryman", "type": "MINION", "cost": 8, "attack": 8 }]
You actually need to annotate the type of l1. Setting l1 : something list should be what you want.
Annotating key doesn't help, because type inference runs top to bottom and x.l2 is encountered before aux is called with key as an argument.
After retrieving results from the Google Custom Search API and writing them to JSON, I want to parse that JSON to make valid Elasticsearch documents. You can configure a parent-child relationship for nested results. However, this relationship does not seem to be inferred from the data structure itself. I've tried loading it automatically, but with no results.
Below is some example input that doesn't include things like id or index; I'm trying to focus on creating the correct data structure. I've tried modifying graph algorithms like depth-first search, but I am running into problems with the different data structures.
Here's some example input:
# mock data structure
google = {"content": "foo",
          "results": {"result_one": {"persona": "phone",
                                     "personb": "phone",
                                     "personc": "phone"
                                     },
                      "result_two": ["thing1",
                                     "thing2",
                                     "thing3"
                                     ],
                      "result_three": "none"
                      },
          "query": ["Taylor Swift", "Bob Dole", "Rocketman"]
          }
# correctly formatted documents for _source of elasticsearch entry
correct_documents = [
{"content":"foo"},
{"results": ["result_one", "result_two", "result_three"]},
{"result_one": ["persona", "personb", "personc"]},
{"persona": "phone"},
{"personb": "phone"},
{"personc": "phone"},
{"result_two":["thing1","thing2","thing3"]},
{"result_three": "none"},
{"query": ["Taylor Swift", "Bob Dole", "Rocketman"]}
]
Here is my current approach; it is still a work in progress:
def recursive_dfs(graph, start, path=[]):
    '''recursive depth first search from start'''
    path = path + [start]
    for node in graph[start]:
        if node not in path:
            path = recursive_dfs(graph, node, path)
    return path


def branching(google):
    """Get branches as a starting point for dfs"""
    branch = 0
    keys = list(google.keys())
    while branch < len(google):
        if isinstance(google[keys[branch]], dict):
            # recursive_dfs(google, google[keys[branch]])
            pass
        else:
            print("branch {}: result {}\n".format(branch, google[keys[branch]]))
        branch += 1


branching(google)
You can see that recursive_dfs() still needs to be modified to handle string and list data structures.
I'll keep going at this but if you have thoughts, suggestions, or solutions then I would very much appreciate it. Thanks for your time.
Here is a possible answer to your problem.
def myfunk(inHole, outHole):
    for key, value in inHole.items():
        if isinstance(value, list):
            # lists are stored as-is under their key
            outHole.append({key: value})
        elif isinstance(value, dict):
            # store the keys of the nested dict as a document, then recurse into it
            outHole.append({key: list(value.keys())})
            myfunk(value, outHole)
        else:
            # plain scalar value: store it as-is
            outHole.append({key: value})
    return outHole
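As a quick sanity check (assuming Python 3.7+, where dicts preserve insertion order), running this against the mock google structure from the question produces the flat list of single-key documents shown in correct_documents:

documents = myfunk(google, [])
for doc in documents:
    print(doc)
# {'content': 'foo'}
# {'results': ['result_one', 'result_two', 'result_three']}
# {'result_one': ['persona', 'personb', 'personc']}
# ...
# {'query': ['Taylor Swift', 'Bob Dole', 'Rocketman']}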