Grouping raw datasets in a kedro visualization - namespaces

I am looking for a way to group all of the raw datasets in a kedro pipeline visualization into one collapsible/expandable "node", similar to the way that namespaces are collapsible/expandable. In order to do this with a namespace, however, it seems that you need a function with inputs and outputs, which obviously would not be applicable at the raw data stage.
Here is my current visualization:
[Image: current kedro pipeline visualization]
I would like datasets 0-5 to be grouped together into an expandable "node" called "raw data".
I have searched stackoverflow, the kedro docs, and the community forum on github for ways to accomplish this, without finding much that is relevant. The closest concept I found is namespaces, but again it seems these need a function, input, and output.

The datasets at the start and the end of a namespaced visualisation will not be collapsed, and this is not something that is configurable.


Search HTML Tables on Multiple Pages

Hello Stack Overflow Community!
I am making a directory of many thousands of custom mods for a game using HTML tables. When I started this project, I thought one HTML page would be slow, but adequate for the ~4k files I was expecting. As I progressed, I realized there are tens of thousands of files I need to have in these tables, and I need to let the user search through them to find what they are missing to load up a new scenario. Each entry has about 20 text entries and a small image (~3KB). I only need to be able to search through one column.
I'm thinking of dividing the tables across several pages on my website to help loading speeds and improve overall organization. But then a user would have to navigate to each page, and perform a search there. This could take a while and be very cumbersome.
I'm not great at website programming. Can someone advise a way to allow the user to search through several web pages and tables from one location? Ideally this would jump to the location in the table on the new webpage, or maybe highlight the entry like the browser's search function does.
You can see my current setup here : https://www.loco-dat-directory.site/
Hopefully someone can point me in the right direction, as I'm quite confused now :-)
These would be my steps:
Copy all the info into an Excel spreadsheet, convert that to JSON, then load it as a JavaScript array (myarray). Then make an input field and, on click, run an if statement such as input == myarray[0].propertyName.
If you want something more than an exact match, you'd need https://lodash.com/ in your project.
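The conversion step above could be sketched in Python as a small build script, with the search logic shown alongside so the matching rule is explicit. This is only an illustration under assumptions: the column names (name, pack, page), the sample rows, and the helper names csv_to_records and search are all hypothetical, and the page itself would still do the lookup in JavaScript against the generated JSON.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text (e.g. a spreadsheet export) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def search(records, column, term, exact=True):
    """Return records whose `column` matches `term`, ignoring case.

    exact=True requires the whole cell to match; exact=False accepts
    any cell that contains the term as a substring.
    """
    needle = term.strip().lower()
    matches = []
    for record in records:
        value = (record.get(column) or "").strip().lower()
        is_match = (value == needle) if exact else (needle in value)
        if is_match:
            matches.append(record)
    return matches


# Hypothetical sample data; the real directory has about 20 columns.
SAMPLE_CSV = """name,pack,page
Steam Loco A,Pack 1,pack1.html
Diesel Loco B,Pack 2,pack2.html
Steam Tender C,Pack 1,pack1.html
"""

records = csv_to_records(SAMPLE_CSV)

# This JSON string is what the web page would load as its array.
records_json = json.dumps(records, indent=2)

exact_hits = search(records, "name", "diesel loco b")
partial_hits = search(records, "name", "steam", exact=False)
print(exact_hits[0]["page"])
print([r["name"] for r in partial_hits])
```

A plain substring test like the one in search covers simple partial matches; a helper library only becomes relevant for fuzzier matching.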
Hacky Solution
There is a browser tool, called TableCapture, to capture data from html tables and load into excel/spreadsheets - where you are basically deferring to spreadsheet software to manage the searching.
You would have to see if:
This type of tool would solve your problem - maybe you can pull each HTML page's contents manually, then merge these pages into a document with multiple "sheets", and then let people download the "spreadsheet" from your website.
If you do not take on the labor above and just tell other people to do it, then you'd have to see if you can teach the people how to perform the search and do this method on their own. eg. "download this plugin, use it on these pages, search"
Why your question is difficult to answer
The reason why it will be hard for people to answer you in stackoverflow.com (usually code solutions) is that you need a more complicated solution (in my opinion) than hard coded tables and html/css/javascript.
This type of situation is exactly why people use databases and APIs to accept requests ("term": "something") for information and deliver responses ( "results": [...] ).
Thank you everyone for your great advice. I wasn't aware most of these potential solutions existed, and it was good to see how other people were tackling problems of similar scope.
I've decided to go with DataTables for their built-in sorting and filtering : https://datatables.net/
I'm also going to use a javascript array with an input field on the main page to allow users to search for which pack their mod is in. This will lead them to separate pages on my site, each with a unique datatable for a mod pack. Separate pages will load up much quicker than one gigantic page trying to show everything.

Foundry Workshop or Quiver - Aggregation editable by user in view mode, is it possible?

Is it possible in Workshop or Quiver to expose an aggregated bar chart where the aggregation property can be changed by the user (in view mode)? For instance, by offering the user a dropdown that lists the properties to aggregate on.
I guess the function presented in this thread can do the job. It should then be a matter of creating a dropdown widget and a chart plugged into the function result. But I lack experience in Workshop and Quiver. Can someone help me with more detail? Thank you in advance.
I want to move from Slate to Workshop/Quiver applications without losing this feature.
One approach is to use a Function-backed chart and the approach you linked to.
Another option is to change your data structure to a "metrics"-style schema like:
| id | main_object_foreign_key | type | timestamp | value |
And then make a chart based on filtering on the metric type property with the series also set to the Type value.
This effectively lets the user control what "metrics" to show on the chart by providing them with a filter element.
There are some downsides with this approach, but adding a representation of your data in this format can bring a lot of flexibility to certain kinds of visualization or workflow building needs.
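To make the "metrics"-style shape concrete, here is a minimal Python sketch that reshapes wide rows into one row per metric and then aggregates only the metric types a user has selected. It is not Foundry, Workshop, or Quiver code; the input fields (site_id, day, revenue, orders) and the helper names are hypothetical and only the five output columns come from the schema above.

```python
def to_metric_rows(wide_rows, key_field, timestamp_field, metric_fields):
    """Turn each wide row into one row per metric, using the metrics schema."""
    rows = []
    for wide in wide_rows:
        for metric in metric_fields:
            rows.append({
                # Synthetic id built from key, metric name and timestamp.
                "id": "{}:{}:{}".format(wide[key_field], metric, wide[timestamp_field]),
                "main_object_foreign_key": wide[key_field],
                "type": metric,
                "timestamp": wide[timestamp_field],
                "value": wide[metric],
            })
    return rows


def totals_by_type(metric_rows, selected_types):
    """Sum `value` per metric type, keeping only the selected types.

    `selected_types` plays the role of the user's filter selection.
    """
    totals = {}
    for row in metric_rows:
        if row["type"] in selected_types:
            totals[row["type"]] = totals.get(row["type"], 0) + row["value"]
    return totals


# Hypothetical wide data: one row per site and day, one column per metric.
wide_rows = [
    {"site_id": "A", "day": "2024-01-01", "revenue": 100, "orders": 4},
    {"site_id": "B", "day": "2024-01-01", "revenue": 50, "orders": 1},
]

metric_rows = to_metric_rows(wide_rows, "site_id", "day", ["revenue", "orders"])
print(len(metric_rows))
print(totals_by_type(metric_rows, {"revenue"}))
print(totals_by_type(metric_rows, {"revenue", "orders"}))
```

Because the metric name is now data rather than a column, changing what the chart shows is just a change of filter value.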

Early abandon discord searching in Time-Series using saxpy

I am trying to look for discords (the most unusual, least similar shape) in a data-set using time-series. I came across this function in the saxpy package that outputs the discord shape. However, the link above is the only documentation that I could find and the input parameters to the function haven't been explained very well there.
More specifically,
find_best_discord_brute_force(series, win_size, global_registry, z_threshold=0.01)
What do the parameters win_size, global_registry stand for?
Also, does the series parameter require me to input SAX words?
It would be great if someone could clear this up.
Thanks!
You should use the Matrix Profile instead. It is faster and simpler, and there is free code; see this presentation:
http://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part1.pdf
series is a numpy array of the data whose discords you are looking for.
win_size is the size of the sliding window used by sax_via_window for calculating the words representing your array.
Sorry, not sure what global_registry refers to.
For further information, see the documentation on GitHub: https://github.com/seninp/saxpy
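To illustrate what a window size means in a brute-force discord search, here is a self-contained Python sketch. It is not saxpy's implementation and does not use SAX words: it compares z-normalized raw subsequences directly, and the function names and the flat_threshold parameter are assumptions made for this example. The discord is taken to be the window whose nearest non-overlapping neighbour is farthest away.

```python
import math


def znorm(window, flat_threshold=0.01):
    """Z-normalize a window; near-constant windows become all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std < flat_threshold:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, win_size):
    """Return (start_index, distance) of the most unusual window.

    Every window of length win_size is compared with every other window
    that does not overlap it; the window with the largest
    nearest-neighbour distance is reported.
    """
    windows = [znorm(series[i:i + win_size])
               for i in range(len(series) - win_size + 1)]
    best_index, best_distance = -1, -1.0
    for i, candidate in enumerate(windows):
        nearest = math.inf
        for j, other in enumerate(windows):
            if abs(i - j) < win_size:
                continue  # skip overlapping (trivial) matches
            nearest = min(nearest, euclidean(candidate, other))
        if nearest != math.inf and nearest > best_distance:
            best_index, best_distance = i, nearest
    return best_index, best_distance


# Periodic series with two corrupted points at positions 17 and 18.
series = [0.0, 1.0, 2.0, 1.0] * 8
series[17] = 5.0
series[18] = -3.0

discord_index, discord_distance = brute_force_discord(series, win_size=4)
print(discord_index, discord_distance)
```

The nested loop is quadratic in the number of windows, which is the cost that early-abandoning and Matrix Profile approaches aim to reduce.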

Is it possible to embed [bokeh] high level charts?

It seems most Bokeh embedding examples use a bokeh.plotting.figure object. Is it possible to embed a high level chart, like bokeh.charts.Bar or bokeh.charts.Scatter? Or is it possible to convert a high level chart to a bokeh.plotting.figure object?
Thanks a lot.
The User's Guide section on Bokeh APIs has a good run down of how all these parts fit together, that I would suggest reading.
The long-story-short: Regardless of what API you use, bokeh.plotting or bokeh.charts, the end result is always just a collection of the same low-level bokeh.models objects. You can think of bokeh.models as very basic building blocks, and the other higher level APIs as conveniences that help you to assemble the building blocks more efficiently and correctly.
So, in that light, yes, it is perfectly fine to embed a bokeh.charts chart using exactly the same functions described in Embedding Plots and Apps.
The one thing I will add is that if you need to update the plot's data after the fact, in place, then the bokeh.plotting API will probably be more straightforward. The mapping between your data and what gets plotted is more direct. Things generated by bokeh.charts may transform your input data into entirely different forms before plotting (e.g. you give a series, and Histogram has to spit out coordinates for boxes, which are not the data you started with).

How can I harness Wikidata to build a Siri-like service?

I'd like to discuss the first part of this Siri-like service.
Ideally, I'd like to be able to query for things like:
"the social network"
"beethoven"
"bad blood taylor swift"
And get results like this:
{type:"film"}
{type:"composer"}
{type:"song"}
I care about nothing else, I find descriptions, images and general information utterly useless outside Wikipedia. I see Wikidata as a meta-data service that can provide me with the semantics of the text I search for.
Do all data structures have "types" or some kind of a property that has to do with its meaning? Is there a list of all the types? Is there a suggestions feature for entities that have double meaning like "apple"? Finally, how can I send a text query and read the "type" of the response data structure?
I know I'm not providing any code but I really can't wrap my head around Wikidata's API. I've searched everywhere and all I can find are some crippled fetch examples and messed up Objective-C HTML parsers. I can't even get their "example query" page to work because of some error I don't understand.
It's really not newbie-friendly and full of heavy terminology.
The problem with Wikidata's API is that it does not have a query interface. All it does is return information for a specific data item, if you already know the ID. We have simply not been able to build a query interface yet that is powerful enough and able to scale. There is an early beta of a SPARQL endpoint though: https://tools.wmflabs.org/ppp-sparql/.
Once that is up and running, we hope to provide easier to use services on top of this, like Magnus' WDQ http://magnusmanske.de/wordpress/?p=72.
(Edit to answer the concrete questions about the API:)
I've searched everywhere and all I can find are some crippled fetch examples
Documentation could be nicer, but https://www.wikidata.org/wiki/Wikidata:Data_access is a good start. Also note that https://www.wikidata.org/w/api.php is self-documenting. In particular, have a look at https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities and https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities
Do all data structures have "types" or some kind of a property that has to do with its meaning?
All statements about a data item have to do with its meaning. Many have a statement about the "instance of" (P31) or "subclass of" (P279) property, which is pretty close to what you want, I suppose.
Is there a list of all the types?
No. Wikidata doesn't use a closed, pre-defined ontology to describe the world. It's a platform to describe the world collaboratively, in a machine readable way; from that, a fluid ontology emerges, which is never quite complete or consistent.
Any data item can serve as the class or super-class of another item. An item can be an instance or subclass of multiple classes. The relationships are quite complex.
Is there a suggestions feature for entities that have double meaning like "apple"?
There is a search interface that can list all matching data items for a given term. It's called wbsearchentities, for instance https://www.wikidata.org/w/api.php?action=wbsearchentities&search=apple&language=en (add format=json for machine readable JSON).
However, the ranking in the result is very naive. And without the semantic context of the original sentence, there is no way to find which word sense is meant. This is an interesting area of research called "word sense disambiguation".
Finally, how can I send a text query and read the "type" of the response data structure?
At the moment, you will have to do two API calls: one to wbsearchentities to get the ID of the entity you are interested in, and one to wbgetentities to get the instance-of statement for that entity. It would be nice to combine this in a single call; there's a ticket open for this: https://phabricator.wikimedia.org/T90693
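The two-call flow described above could look roughly like this in Python. It is a sketch under assumptions: the helper names and the User-Agent string are made up for this example, only the first search hit is used (so ambiguous terms like "apple" are not disambiguated), and the result is a list of item IDs for "instance of" (P31), not human-readable labels. The parsing helpers are separated from the network calls so they can be tried on a canned response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def search_url(term, language="en"):
    """Build the wbsearchentities URL for a free-text term."""
    params = {"action": "wbsearchentities", "search": term,
              "language": language, "format": "json"}
    return API_URL + "?" + urlencode(params)


def entity_url(entity_id):
    """Build the wbgetentities URL that returns an item's claims."""
    params = {"action": "wbgetentities", "ids": entity_id,
              "props": "claims", "format": "json"}
    return API_URL + "?" + urlencode(params)


def first_match_id(search_response):
    """Return the ID of the top search hit, or None if nothing matched."""
    results = search_response.get("search", [])
    return results[0]["id"] if results else None


def instance_of_ids(entity_response, entity_id):
    """Collect the item IDs from the entity's "instance of" (P31) statements."""
    claims = entity_response["entities"][entity_id].get("claims", {})
    ids = []
    for statement in claims.get("P31", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue"/"somevalue" snaks carry no item
        value = snak["datavalue"]["value"]
        ids.append(value.get("id") or "Q{}".format(value["numeric-id"]))
    return ids


def fetch_json(url):
    """GET a URL and decode the JSON body (performs a network request)."""
    request = Request(url, headers={"User-Agent": "wikidata-type-sketch/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def lookup_types(term):
    """Two API calls: search for the term, then read P31 of the top hit."""
    entity_id = first_match_id(fetch_json(search_url(term)))
    if entity_id is None:
        return []
    return instance_of_ids(fetch_json(entity_url(entity_id)), entity_id)


# Hand-written sample shaped like a wbgetentities response (no network needed).
sample_entity = {"entities": {"Q42": {"claims": {"P31": [
    {"mainsnak": {"snaktype": "value", "property": "P31",
                  "datavalue": {"type": "wikibase-entityid",
                                "value": {"entity-type": "item",
                                          "numeric-id": 5, "id": "Q5"}}}}
]}}}}
print(instance_of_ids(sample_entity, "Q42"))
```

Calling lookup_types("beethoven") would perform the two live requests; turning the returned IDs into names such as "composer" would need a further wbgetentities call for their labels.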
As to Siri-like services: an early prototype called "wiri" by Magnus Manske has been around for a long time. It uses very simple patterns though: https://tools.wmflabs.org/magnus-toolserver/thetalkpage/
Bene* has been working on a more advanced approach for natural language question answering, see the Platypus Demo: https://projetpp.github.io/demo.html
Just yesterday, he presented a new prototype he has been developing together with Tpt, which generates SPARQL queries from natural language input: https://tools.wmflabs.org/ppp-sparql/
All of these projects are open source, and were created by enthusiastic volunteers. Look at the code and talk to them. :)