which storage type is good for large data when using dcc.Store() - plotly-dash

When using dcc.Store() for large data, which storage type is best: memory, local, or session?
I got this error:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
The webpage also crashes.

There are limitations to using dcc.Store. It stores the data in the browser, which is not ideal or even intended for large/very large datasets, as performance issues will arise. Here is a list of those limitations for the component.
If you intend to visualise large datasets using Dash, I recommend a great tool that combines Django with Dash called django-plotly-dash. Essentially, you'll handle the backend entirely in Django using its model-view approach, and then build your dashboard in Dash, integrated with the Django backend. So rather than using dcc.Store to hold any large datasets, you can query your database in the callbacks, hence visualising large datasets more efficiently.
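To make that concrete, here is a minimal sketch of the query-the-database-in-the-callback pattern in plain Dash (no django-plotly-dash needed for the illustration). The SQLite file sales.db and the sales(region, amount) table are hypothetical placeholders, not anything from the question:

```python
# Minimal sketch: query the database inside the callback instead of pushing
# a large dataset through dcc.Store. Assumes a hypothetical local SQLite file
# "sales.db" with a table sales(region TEXT, amount REAL).
import sqlite3

import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(id="region", options=["north", "south"], value="north"),
    dcc.Graph(id="graph"),
])

@app.callback(Output("graph", "figure"), Input("region", "value"))
def update_graph(region):
    # Only the rows needed for this view are pulled from the database,
    # so nothing large ever has to live in the browser.
    with sqlite3.connect("sales.db") as conn:
        df = pd.read_sql_query(
            "SELECT region, amount FROM sales WHERE region = ?",
            conn,
            params=(region,),
        )
    return px.bar(df, x="region", y="amount")

if __name__ == "__main__":
    app.run(debug=True)
```

The same shape works with django-plotly-dash; the only difference is that the query would go through your Django models instead of a raw connection.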

This component is not for storing large datasets. It's a good component for storing some data that will be used throughout the application.
Also, Dash is not a good framework for big-data projects like the one you mention.
If you need some extra help, feel free to message me privately.

Related

JSON vs. file storage? How can I decide when it is worth using a persistent storage API like Azure/Google Storage vs. a MySQL database column?

So I'm building a project and we have fairly large data. My average JSON has a size of 20 KB, sometimes more, sometimes less, but it doesn't fluctuate a lot.
The thing is, I'm using Spring Boot + React with Microsoft Azure, and to render some data I use innerHTML (React's dangerouslySetInnerHTML). My question is: how can I calculate/decide when it is worth putting the data in a JSON file in storage and sending the link through REST, compared to having it as an entity in MySQL? I'm not sure if I'm making myself clear, but I'd appreciate some clarity. Thanks.
This is hard to answer without knowing exactly how all the pieces fit together, frequency, etc.
Aside from the typical "test them and see what works", here are some thoughts that may help you make a choice.
Blob/table storage will be faster than the DB, 100%. I get double-digit-millisecond responses from Azure Storage almost always. You're talking about a round trip in under 100 ms for most items.
You cannot use a query-type lookup. To use blob/table storage you'll need to know the exact URL, both keys (partition/row), or at least one key (partition) if you want to get more than a single record (see the sketch at the end of this answer). This provides super-fast access. If you're going to need SQL-type lookups, stick with the DB.
Azure Storage is way cheaper than the DB.
You need a good storage strategy for Azure Storage: how do you plan on purging, archiving, cleaning up, etc.? There's no good way to say "all records from 2020" unless you also implement a tracking table. This is a good read on some patterns: https://learn.microsoft.com/en-us/azure/storage/tables/table-storage-design-patterns
I really like Azure Storage when it's doable. It's so often cheaper and faster. With some tracking tables, it's workable in more scenarios.
Where it dies: reporting. It's hard to do reporting (to business- and enterprise-level expectations) with data in storage (unless you track it elsewhere).
Hope that helps a bit.
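To make the key-based-lookup point concrete, here is a rough sketch using the azure-data-tables Python package (Python only for illustration; the equivalent point lookups exist in the Java SDK you'd use from Spring Boot). The connection string, table name, and keys are placeholders:

```python
# Rough sketch of point lookups against Azure Table storage.
# Connection string, table name, and keys are placeholders.
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-storage-connection-string>",
    table_name="documents",
)

# Both keys known -> single-entity read, the fast/cheap case described above.
entity = table.get_entity(partition_key="customer-123", row_key="doc-456")

# Only the partition key known -> still efficient, but returns every entity
# in that partition rather than a single record.
for e in table.query_entities("PartitionKey eq 'customer-123'"):
    print(e["RowKey"])
```

Anything more ad hoc than that (arbitrary WHERE clauses, joins, reporting) is where you fall back to the database.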

Data cleansing and migration in a big, in-service database

Hi, I'm a server developer and we have a big MySQL database (the biggest table has about 0.5 billion rows) running 24/7.
And there's a lot of broken data. Most of it is logically wrong and involves multiple sources (multiple tables, S3). And since it's kind of logically complicated, we need the Rails models to clean it (it can't be done with pure SQL queries).
Right now, I am using my own small cleansing framework and an AWS Auto Scaling Group to scale up instances and speed things up. But since the database is in service, I have to be careful (table locks and other things) and limit the number of processes.
So I am curious about:
How do you (or big companies) clean your data while the database is in service?
Do you use temporary tables and swap, or just update/insert/delete against an in-service database?
Do you use a framework, library, or solution to clean data in an efficient way (such as distributed processing)?
How do you detect messed-up data in real time?
Do you use a framework, library, or solution to detect broken data?
So I have faced a problem similar in nature to what you are dealing with, but different in scale. This is how I would approach the situation.
First, assess the infrastructure concerns, like whether the database can be taken offline or restricted from use for a few hours of maintenance. If so, read on.
Next, you need to define what constitutes "broken data".
Once you arrive at your definition of "broken data", translate it into a way to programmatically identify it.
Write a script that leverages your programmatic identification algorithm and run some tests.
Then back up your data records in preparation.
Then, given the scale of your data set, you will probably need to increase your server resources so as not to bottleneck the script.
Run the script.
Test your data to assess the effectiveness of your script.
If needed, adjust and rerun.
It's possible to do this without closing the database for maintenance, but I think you will get better results if you do. Also, since this is a Rails app, I would look at the model validations your app has, and at input-field validations, to prevent "broken data" in real time.
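For the in-service case, the usual shape is: walk the table in small key-ordered batches, validate each row in application code, fix it in a short transaction, and pause between batches so live traffic isn't starved. A rough sketch in Python with mysql-connector-python (the asker would do the same thing with Rails models; the table, columns, connection details, and the is_broken()/fix() helpers are all hypothetical placeholders):

```python
# Sketch: batched cleansing of a live MySQL table with short transactions.
# All names below (orders, payload, credentials, helpers) are placeholders.
import time

import mysql.connector


def is_broken(row):
    # Placeholder for the real application-level validity check.
    return row["payload"] is None


def fix(row):
    # Placeholder for the real repair logic.
    return "{}"


conn = mysql.connector.connect(
    host="db.example.internal", user="cleaner", password="...", database="app"
)

BATCH = 1000
last_id = 0

while True:
    cur = conn.cursor(dictionary=True)
    # Keyset pagination: cheap on a 0.5-billion-row table, unlike OFFSET.
    cur.execute(
        "SELECT id, payload FROM orders WHERE id > %s ORDER BY id LIMIT %s",
        (last_id, BATCH),
    )
    rows = cur.fetchall()
    if not rows:
        break

    for row in rows:
        if is_broken(row):
            cur.execute(
                "UPDATE orders SET payload = %s WHERE id = %s",
                (fix(row), row["id"]),
            )
    conn.commit()          # short transactions keep lock time small
    last_id = rows[-1]["id"]
    cur.close()
    time.sleep(0.5)        # throttle so the cleanup never starves live queries
```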

Live chat application using Node.js, Socket.IO and a JSON file

I am developing a live chat application using Node.js, Socket.IO and a JSON file. I am using the JSON file to read and write the chat data. Now I am stuck on one issue: when I do stress testing, i.e. pushing continuous messages into the JSON file, the JSON format becomes invalid and my application crashes. Although I am using forever.js, which should keep the application up, the application still crashes.
Does anybody have an idea about this?
Thanks in advance for any help.
It is highly recommended that you reconsider your approach to persisting data to disk.
Among other things, one really big issue is that you will likely experience data loss. If we both get the file at the exact same time - {"foo":"bar"} - we both make a change, and you save it before me, my change will overwrite yours, since I started with the same thing as you. Although you saved it before me, I didn't re-open it after you saved.
What you are possibly seeing now, with an append-only approach, is that we're both adding bits and pieces without regard to valid JSON structure (i.e. {"fo"bao":r":"ba"for"o"} from {"foo":"bar"} x 2).
Disk I/O is actually pretty slow. Even with an SSD hard drive. Memory is where it's at.
As recommended, you may want to consider MongoDB, MySQL, or something else. This may be a decent use case for Couchbase, which is an in-memory key/value store based on memcache that persists things to disk ASAP. It is extremely JSON-friendly (it is actually mostly based on JSON), offers great map/reduce support for querying data, is super easy to scale to multiple servers, and has a Node.js module.
This would allow you to very easily migrate your existing data storage routine into a database. Also, it provides CAS support, which will protect you from data loss in the scenarios outlined earlier.
At a minimum, though, you could just modify an in-memory object that you save to disk every so often to prevent permanent data loss. However, this only works well with one server, and then you're back to likely needing a database.
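A minimal sketch of that last suggestion - keep the data in memory and flush it to disk atomically every so often, so a crash mid-write can never leave half-written JSON. The question is about Node.js; Python is used here only to illustrate the pattern, and the file name is a placeholder:

```python
# Sketch: in-memory chat log flushed to disk with an atomic replace.
import json
import os
import tempfile

messages = []            # the in-memory chat log is the source of truth


def append_message(msg):
    messages.append(msg)


def flush_to_disk(path="chat.json"):
    # Write to a temporary file first, then atomically rename it over the
    # old file, so readers never see a partially written JSON document.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(messages, f)
    os.replace(tmp_path, path)


append_message({"user": "alice", "text": "hi"})
flush_to_disk()          # call this on a timer, not on every message
```

This avoids the interleaved-append corruption, but it still doesn't solve the multi-writer problem, which is why a database remains the better answer.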

Options for transferring data between MySQL and SQLite via a web service

I've only recently started dealing with database systems.
I'm developing an iOS app that will have a local database (SQLite) and that will have to periodically update its internal database with the contents of a database stored on a web server (MySQL). My question is: what's the best way to fetch the data from the web server and store it in the local database? These are the options that came to me; I don't know if all of them are possible:
Webserver->XML/JSON->Send it->Locally convert and store in local database
Webserver->backupFile->Send it->Feed it to the SQLite db
Are there any other options? Which one is better in terms of the amount of data transferred?
Thank you
The XML/JSON route is by far the simplest while providing sufficient flexibility to handle updates to the database schema/older versions of the app accessing your web service.
In terms of the second option you mention, there are two approaches - either use an SQL statement dump, or a CSV dump. However:
The "default" (i.e.: mysqldump generated) backup files won't import into SQLite without substantial massaging.
Using a CSV extract/import will mean you have considerably less flexibility in terms of schema changes, etc. so it's probably not a sensible approach if the data format is ever likely to change.
As such, I'd recommend sticking with the tried and tested XML/JSON approach.
In terms of the amount of data transmitted, JSON may be smaller than the equivalent XML, but it really depends on the variable/element names used, etc. (See the existing How does JSON compare to XML in terms of file size and serialisation/deserialisation time? question for more information on this.)
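To make the JSON route concrete, here is a rough sketch of the sync step, written in Python with only the standard library (on iOS you'd do the equivalent with URLSession plus SQLite or Core Data). The URL, table, and field names are placeholders:

```python
# Sketch: fetch a JSON array from the web service and upsert it into SQLite.
# URL, table, and column names are placeholders.
import json
import sqlite3
import urllib.request


def sync(db_path="local.db", url="https://example.com/api/items"):
    with urllib.request.urlopen(url) as resp:
        items = json.load(resp)          # expects a JSON array of objects

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Upsert, so repeated syncs update existing rows instead of duplicating them.
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (:id, :name) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        items,
    )
    conn.commit()
    conn.close()
```

A variant of the same idea is to pass a "last synced" timestamp to the web service so only changed records come back, which keeps the amount of data transferred small.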

Drupal: Many fields/content-types equal *many* tables and after a point make MySQL very slow

As I create more and more fields and content types, I see that Drupal creates a huge number of tables (>1k) in MySQL, and after a point my system becomes very slow.
I have tried several MySQL performance-tuning tips, but nothing has improved performance significantly. Enabling caching makes for good speed on the front end, but if I try to edit a content type from the admin back end, it takes forever!
How do you cope with that? How do you scale Drupal?
If the sheer number of tables has become the database performance bottleneck, I'd have to agree with Rimian. You can define your own content types programmatically and then develop your own content-type model by leveraging the Node API.
API documentation and an example of doing just that are here: http://api.drupal.org/api/drupal/developer--examples--node_example--node_example.module/6
The code flow is basically:
Make Drupal recognize your content type
Define the fields it needs to take using the Forms API
Define how each of the Node API's functions should behave (view, load, save, etc.).
This gives you control over how things are stored, yet still gives you (and all contributed modules) the ability to leverage the hook system for Node API calls.
Obvious drawbacks are missing out on all of the features/modules that directly depend on CCK for their functionality. But at >1k tables (which suggests a gargantuan number of content types and fields), it sounds like you're at that level of custom work already.
I worked on a Drupal 5 site with more than a million nodes and this was a serious issue.
If you're scaling Drupal up to enterprise level, consider not using CCK for your fields and developing your own content model with the node API. It's actually quite easy.
The devel module offers a performance monitoring tool that will show you all queries performed, organized by time, showing which hooks and modules called them, etc.
Just don't run it in production.