Plotly Dash callback order

I have a dash app, where I query a database.
Some of my queries are quick, some are slow.
I would like to show the results of these queries in a table in such a way that the table is first populated with the quickly fetchable columns, and the columns produced by the slower queries are added gradually.
My problem is that the rendering callback for the aggregate data only runs after all the queries are done, whereas I would like to see it firing after each query callback result.
Here is a minimal example where I fetch some quick data and then, based on the quick query, fetch a slower one. There is a rendering callback which is supposed to run after each query callback, but in reality runs only once, at the end. (For the sake of simplicity I did not add the table here, just a basic div. I run Dash within a larger Django project using django_plotly_dash, but that is probably not key to the question here.)
from django_plotly_dash import DjangoDash
import time
from dash import html
from dash.dependencies import Output, Input

app = DjangoDash("Minimal")

app.layout = html.Div(
    id='main-container',
    children=[
        html.Div(id='user-id'),
        html.Div(id='quick-data'),
        html.Div(id='slow-data'),
        html.Div(id='aggregate-data'),
    ],
)
@app.callback(
    Output('quick-data', 'children'),
    Input('user-id', 'children'),
)
def query_quick_data(user_id):
    print("--------- query quick data ----------")
    return "quick data"
@app.callback(
    Output('slow-data', 'children'),
    Input('quick-data', 'children'),
)
def query_slow_data(quick_data):
    print("--------- query slow data ----------")
    time.sleep(3)
    return "slow data"
@app.callback(
    Output('aggregate-data', 'children'),
    Input('quick-data', 'children'),
    Input('slow-data', 'children'),
)
def render_data(quick_data, slow_data):
    print("--------- render aggregate data ----------")
    return quick_data + " | " + slow_data
Upon opening the app, the terminal looks as follows, while I would expect the render aggregate data callback to run twice (once straight after the quick query):
backend_1 | --------- query quick data ----------
backend_1 | --------- query slow data ----------
backend_1 | --------- render aggregate data ----------
My guess is that the query_slow_data callback is called first and render_data is only fired afterwards. So the question is: how could I force render_data to be called first?

Your guess is correct; this is related to how Dash actually works. At the initial call, the dash-renderer will recursively look at all the callbacks in your app and order them by the availability of their inputs, to prevent unnecessary re-rendering. This is really important when you have a big dashboard. Imagine a dashboard with many callbacks, many of which are related to each other: if the dash-renderer started calling them randomly, without proper organization, some callbacks might run in a loop, breaking the application or, in the best scenario, making it very heavy.
For your given example, assume that the dash-renderer first sees the render_data callback. When it recursively checks that callback's inputs against the other available callbacks, it finds that query_quick_data's input is ready to use (it is not the output of another callback), so that callback is given priority. query_slow_data has only one input, and that input comes from the callback that has just executed, so everything is ready for it to be called; your render_data callback, however, is asking for two inputs, one of which is available and one of which isn't. Therefore the dash-renderer runs query_slow_data first, as you guessed.
I have been reading the Dash documentation for a while and I have not run into any method that would let you customize the callback order so that a callback fires twice during the initial call.
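That said, one restructuring that may give the effect you describe is sketched below, using plain dash.Dash instead of DjangoDash for simplicity; it is only an illustration under those assumptions, not something taken from the Dash docs. Instead of one aggregate callback waiting on both queries, each query result gets its own child of the aggregate container, so the quick part renders as soon as its callback returns:

from dash import Dash, html
from dash.dependencies import Output, Input

app = Dash(__name__)
app.layout = html.Div([
    html.Div(id='user-id'),
    html.Div(id='quick-data'),
    html.Div(id='slow-data'),
    html.Div(
        id='aggregate-data',
        children=[html.Span(id='aggregate-quick'), html.Span(id='aggregate-slow')],
    ),
])

# The query_quick_data / query_slow_data callbacks from the question are assumed unchanged.

@app.callback(Output('aggregate-quick', 'children'), Input('quick-data', 'children'))
def render_quick_part(quick_data):
    return quick_data            # appears as soon as the quick query finishes

@app.callback(Output('aggregate-slow', 'children'), Input('slow-data', 'children'))
def render_slow_part(slow_data):
    return " | " + slow_data     # filled in later, once the slow query is done

With this layout the quick value shows up as soon as it is available and the slow value is appended a few seconds later, without ever needing a single aggregate callback to fire twice.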

Related

How to fix a query in Functions within Foundry which is hitting ObjectSet:PagingAboveConfiguredLimitNotAllowed?

I have a Phonograph object with billions of rows and we are querying it through the object set service. For example, I want to get all DriverLicences from a certain city.
@Function()
public getDriverLicences(city: string): ObjectSet<DriverLicences> {
    let drivers = Objects.search().DriverLicences().filter(row => row.city.exactMatch(city));
    return drivers;
}
I am facing this error when I try to query it from Slate:
ERROR 400: {"errorCode":"INVALID_ARGUMENT","errorName":"ObjectSet:PagingAboveConfiguredLimitNotAllowed","errorInstanceId":"0000-000","parameters":{}}
I understand that I am probably retrieving more than 100,000 results, but I need all the results, because the logic implemented on the front end is a complex Slate dashboard built by another team that we cannot refactor.
The issue here is that, specifically in the Slate <> Function connector, there is a "translation layer" that serializes the contents of the object set and provides a response data structure that materializes the property:value pairs for each object in the set.
This clearly doesn't work for large object sets where throwing so much data into the browser is likely to overwhelm the resources allocated to the tab.
From context it seems like you might be migrating an existing Slate app over to Functions; in the current version, how is the query limiting the number of results returned? It surely can't be returning several hundred thousand results for further processing on the front end? (And if it is, that might be an anti-pattern worth addressing.)
As for options that you could currently explore, you can sort your object set and then specify a smaller limit to return:
Objects.search().DriverLicences().filter(row => row.city.exactMatch(city)).orderBy(date_of_issue).take(100)
You'll find a few more details in the Functions documentation Reference entry on Ontology API: Object Sets in the section on Ordering and limiting.
You can even work around the (current) lack of paging when returning an ObjectSet to Slate by using the last value of the property you ordered on (i.e. date_of_issue) as a filter in the subsequent request, returning the next N objects.
This can work if you need a Slate table or HTML widget that renders one set of results and then, on a user action, fetches the next page.

How can I make a select function more performant in pyspark?

When I use the following function, it takes up to 10 seconds to execute. Is there any way to make it run quicker?
import pyspark.sql.functions as f

def select_top_20(df, col):
    most_data = df.groupBy(col).count().sort(f.desc("count"))
    top_20_count = most_data.limit(20).drop("count")
    top_20 = [row[col] for row in top_20_count.collect()]
    return top_20
Hard to answer in general; the code seems fine to me.
It depends on how the input DataFrame was created:
if it was directly read from a data source (parquet, database or so), it is an I/O problem and there is not much you can do.
if the DataFrame went through some processing before the function is executed, you might inspect that part. Lazy evaluation in Spark means that all of this processing is redone from scratch when you execute this function (not only the commands listed in the function), i.e. reading the data from disk, processing, everything. Persisting or caching the DataFrame somewhere in between might speed you up considerably (see the sketch below).
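A minimal sketch of that caching idea, where expensive_pipeline is just a placeholder for however your DataFrame is actually built:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder: whatever reads and transforms your data before select_top_20 is called.
df = expensive_pipeline(spark)

# Cache the upstream DataFrame once, so repeated actions (such as the collect()
# inside select_top_20) do not recompute the whole lineage each time.
df = df.cache()          # or df.persist() with an explicit StorageLevel
df.count()               # optional: trigger an action to materialize the cache up front

top_cities = select_top_20(df, "city")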

How to run a Julia function on a specific processor using remotecall(), when the function itself has no return value

I tried to use remotecall() in Julia to distribute work to a specific processor. The function I would like to run does not return anything, but it does produce output. I cannot make it work, as there is no output file after running the code.
This is the test code I am creating:
using DelimitedFiles
addprocs(4) # add 4 processors

@everywhere function test(x) # Define the function
    print("hi")
    writedlm(string("test", string(x), ".csv"), [x], ',')
end

remotecall(test, 2, 2) # To run the function on process 2
remotecall(test, 3, 3) # To run the function on process 3
This is the output I am getting:
Future(3, 1, 67, nothing)
And there is no output file (csv), nor is "hi" shown.
I wonder if anyone can help me with this or tell me whether I did anything wrong. I am fairly new to Julia and have never used parallel processing.
The background is that I need to run a big simulation (a big function with a bunch of includes, but no direct return values) many times, and I would like to split the work across different processors.
Thanks a lot
If you want to use a module function in a worker, you need to import that module locally in that worker first, just like you have to do it in your 'root' process. Therefore your using DelimitedFiles directive needs to occur @everywhere first, rather than just on the 'root' process. In other words:
@everywhere using DelimitedFiles
Btw, I am assuming you're using a relatively recent version of Julia and simply forgot to add the using Distributed directive in your example.
Furthermore, when you perform a remote call, what you get back is a "Future" object, which is a way of allowing you to obtain the 'future results of that computation' from that worker, once they're finished. To get the results of that 'future computation', use fetch.
This is all very simplistic and general information, since you haven't provided a specific example that can be copy / pasted and answered specifically. Have a look at the relevant section in the manual, it's fairly clearly written: https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1
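Putting those pieces together, a minimal sketch of the example from the question (assuming a recent Julia where Distributed has to be loaded explicitly) might look like this:

using Distributed
addprocs(4)

@everywhere using DelimitedFiles   # load the module on every worker, not just the root process

@everywhere function test(x)
    println("hi from worker ", myid())
    writedlm(string("test", x, ".csv"), [x], ',')
end

f2 = remotecall(test, 2, 2)   # returns a Future immediately
f3 = remotecall(test, 3, 3)
fetch(f2); fetch(f3)          # block until both remote computations have finished

fetch is what makes the root process wait for the workers, so by the time it returns the CSV files should have been written.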

How to debug knex.js without having to pollute my db?

Does anyone know anything about debugging with knex.js and MySQL? I'm trying to do a lot of things and test stuff out, and I keep polluting my test database with random data. Ideally, I'd just like to do things and see what the output query would be, instead of running it against the actual database to see if it actually worked.
I can't find anything too helpful in their docs. They mention passing {debug: true} as one of the options in your initialization settings, but they don't really explain what it does.
I am a junior developer, so maybe some of this is not meant to be understood by juniors, but at the end of the day it is just not clear at all what steps I should take to be able to see what queries would have been run, instead of running the real queries and polluting my db.
const result = await db().transaction(trx =>
  trx.insert(mapToSnakeCase(address), 'id').into('addresses')
    .then(addressId =>
      trx.insert({ addresses_id: addressId, display_name: displayName }, 'id')
        .into('chains'))
).toString();
You can build a knex query, but until you attach a .then() or await it (or run .asCallback((error, cb) => {})), the query is just an object.
So you could do
let localVar = 8
let query = knex('table-a').select().where('id', localVar)
console.log(query.toString())
// outputs a string 'select * from table-a where id = 8'
This does not hit the database, and is synchronous. Make as many of these as you want!
As soon as you do await query or query.then(rows => {}) or query.asCallback( (err,rows)=>{} ) you are awaiting the db results, starting the promise chain or defining the callback. That is when the database is hit.
Turning on debug: true when initializing just writes the results of query.toSQL() to the console as the queries run against the actual DB. Sometimes an app might make a lot of queries, and if one goes wrong this is a way to see why a DB call failed (but it is VERY verbose, so it is typically not on all the time).
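For reference, a sketch of where that flag goes when initializing knex (the connection details here are placeholders):

const knex = require('knex')({
  client: 'mysql',
  connection: { host: '127.0.0.1', user: 'dev', password: 'secret', database: 'test' },
  debug: true, // log each query to the console as it runs
});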
In our app's tests, we do actually test against the database because unit testing this type of stuff is a mess. We use knex's migrations on a test database that is brought down and up every time the tests run. So it always starts clean (or with known seed data), and if a test fails the DB is in the same state to be manually inspected. While we create a lot of test data in a testing run, it's cleaned up before the next test.

Maintaining state with def and ref in Clojure

I have the following function:
(defn best-move [tracked-moves]
  (def all-scores (ref tracked-moves))
  @all-scores)
It's being called by a recursive function.
I want to be able to keep passing in tracked-moves, and for them to all exist within @all-scores.
The way it is written right now, @all-scores will only hold onto the last tracked-moves that it is given. How can I get it to hold onto all of the data it receives every time the best-move function is called, and not just return the last of all the data it receives?
The problem is that you're using def incorrectly. Any use of def (and defn) will create a namespace-level var. It doesn't matter where you call def. As you've pointed out, you're continuously redefining all-scores. The short answer is to pull your definition of all-scores to the top level, so you're not constantly invoking it. Then, update the ref as described in the documentation. If you're not using transactions, and don't need to manage multiple refs, you might want to use atoms instead.
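For example, a minimal sketch with an atom at the namespace level (assuming you simply want to accumulate every tracked-moves value into a vector):

(def all-scores (atom []))   ; defined once, at the top level of the namespace

(defn best-move [tracked-moves]
  (swap! all-scores conj tracked-moves)   ; append the new moves to the accumulated state
  @all-scores)                            ; dereference to return everything seen so far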