Plotly-Dash and MongoDB DuplicateKeyError

Within a plotly-dash application, I am entering some user-specified data into a MongoDB database.
The Issue:
The first insert succeeds; however, any subsequent inserts fail and a pymongo.errors.DuplicateKeyError is raised.
I suspect that, since MongoDB ObjectId generation is done client-side, no fresh id is being generated: all of the insert code runs inside an app.callback decorator in Dash and is likely executed in a thread or separate process.
Shutting the app down and restarting it allows a new record to be inserted.
The question:
Is there a way to manually "refresh" the ObjectId generated within pymongo? I would likely want to do this in the exception handling for DuplicateKeyError.

For anyone out there with this problem:
Simply use a new dict for each insert, placing dict['_id'] = ObjectId() yourself prior to the insert; do not let MongoDB handle it.
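A minimal sketch of that fix, assuming a hypothetical collection named records (the connection string, database, and collection names here are placeholders, not from the question):

from bson.objectid import ObjectId
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
collection = client["mydb"]["records"]             # placeholder db/collection names

def insert_record(payload: dict) -> ObjectId:
    """Insert user-specified data, generating a fresh _id on every call."""
    doc = dict(payload)      # build a new dict each time; never reuse one already inserted
    doc["_id"] = ObjectId()  # explicit, freshly generated id
    try:
        collection.insert_one(doc)
    except DuplicateKeyError:
        # very unlikely with a fresh ObjectId, but retry once with another id
        doc["_id"] = ObjectId()
        collection.insert_one(doc)
    return doc["_id"]

The likely root cause is that insert_one adds the generated _id to the dict it is given, so a dict that lives across callback invocations carries the old _id into the next insert; building a fresh dict (or assigning a fresh ObjectId) each time avoids that.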

Related

ActiveRecord::StaleObjectError on opening each result in a new tab

Recently we added a feature to our RoR application which allows users to open a particular record, let's say each in its own tab. Since doing so, we've started seeing frequent ActiveRecord::StaleObjectError exceptions. On investigating the issue I found that Rails is trying to update the session store first whenever a resource is opened in a tab, and that is where the exception is raised.
We have a lock_version column in our ActiveRecord session store, so Rails applies optimistic locking to it by default. Is there any way we could solve this issue without introducing much complexity, since the application is already live on the client's machines, and without affecting any session data we've stored in our session store DB?
Any suggestions would be much appreciated. Thanks
It sounds like you're using optimistic locking on a db session record and updating the session record when you process an update to other records. Not sure what you'd need to update in the session, but if you're worried about possibly conflicting updates to the session object (and need the locking) then these errors might be desired.
If you don't, you can refresh the session object before saving the session (or disable its optimistic locking) to avoid this error for these session updates.
You also might look into what about the session is being updated and whether it's strictly necessary. If you're updating something like "last_active_on", then you might be better off sending off a background job to do this and/or using the update_column method, which bypasses the rather heavyweight ActiveRecord save callback chain.
--- UPDATE ---
Pattern: Putting side-effects in background jobs
There are several common Rails patterns that start to break down as your app usage grows. One of the most common that I've run into is when a controller endpoint for a specific record also updates a common/shared record (for example, when creating a 'message' also updates the messages_count for a user via a counter cache, or updates a last_active_at on a session). These patterns create bottlenecks because many different types of requests across your application end up competing for write locks on the same database rows unnecessarily.
These tend to creep into your app over time and become hard to refactor later. I'd recommend always handling side-effects of a request in an asynchronous job (using something like Sidekiq). Something like:
class Message < ActiveRecord::Base
  after_commit :enqueue_update_messages_count_job

  def enqueue_update_messages_count_job
    Jobs::UpdateUserMessageCountJob.enqueue(self.id)
  end
end
While this may seem like overkill at first, it creates an architecture that is significantly more scalable. If counting the messages becomes slow, that will make the job slower but won't impact the usability of the product. In addition, if certain activities create lots of objects with the same side-effects (let's say you have a "signup" controller that creates a bunch of objects for a user, all of which trigger an update of user.updated_at), it becomes easy to throw out duplicate jobs and avoid updating the same field 20 times.
Pattern: Skipping the ActiveRecord callback chain
Calling save on an ActiveRecord object runs validations and all the before and after callbacks. These can be slow and (at times) unnecessary. For example, updating a cached message_count value doesn't necessarily care whether the user's email address is valid (or about any other validations), and you may not care about other callbacks running. Similarly if you're just updating a user's updated_at value to clear a cache. You can bypass validations and the ActiveRecord callback chain by calling user.update_column(:message_count, ..) to write that field directly to the database. In theory this shouldn't be necessary for a well designed application, but in practice some larger/legacy codebases may make significant use of the ActiveRecord callback chain to handle business logic that you may not want to invoke.
--- Update #2 ---
On Deadlocks
One reason to avoid updating (or generally locking) a common/shared object from a concurrent request is that it can introduce Deadlock errors.
Generally speaking, a "deadlock" in a database is when two processes each need a lock that the other one holds. Neither can continue, so one of them must error out instead. In practice, detecting this eagerly is expensive, so databases like Postgres only run deadlock detection after a transaction has been waiting on a lock for a configurable deadlock_timeout. While contention for locks is common (e.g. two updates that are both updating a 'session' object), a true deadlock is rarer (where thread A holds a lock on the session that thread B needs, while thread B holds a lock on a different object that thread A needs), so you may be able to partially address the problem by looking at or extending your deadlock/lock timeouts. While this may reduce the errors, it doesn't fix the issue that the threads may be waiting for up to the timeout. An alternative approach is to keep a short timeout and rescue/retry the transaction a few times.
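That rescue/retry pattern is language-agnostic; here is a minimal sketch in Python rather than the Rails code under discussion, with a hypothetical DeadlockDetected exception standing in for whatever error your database driver raises:

import time

class DeadlockDetected(Exception):
    """Stand-in for the deadlock error raised by your database driver."""

def with_deadlock_retry(operation, attempts=3, backoff=0.1):
    # Run a short transactional operation, retrying a few times if the
    # database reports a deadlock; back off briefly so the competing
    # lock holder has a chance to finish.
    for attempt in range(attempts):
        try:
            return operation()
        except DeadlockDetected:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))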

NodeJS cache MySQL data with clustering enabled

I want to cache data that I got from my MySQL DB and for this I am currently storing the data in an object.
Before querying the database, I check whether the needed data exists in the mentioned object. If not, I query for it and insert it.
This works quite well, and my webserver now fetches the data just once and reuses it.
My concern is: do I have to think about concurrent writes/reads for such data structures that live in the object when using Node.js's clustering feature?
Every single line of JavaScript that you write in your Node.js program is thread-safe, so to speak: at any given time, only a single statement is ever executing. The fact that you can do async operations is implemented at a low level that is completely transparent to the programmer. To be precise, code only runs in a "truly parallel" way when you do some input/output operation, e.g. reading a file, doing TCP/UDP communication, or spawning a child process. And even then, the only code that executes in parallel with your application is Node's native C/C++ code.
Since you use a JavaScript object as a cache store, you are guaranteed no one will ever read or write from/to it at the same time.
As for cluster, every worker is created as its own process and thus has its own copy of every JavaScript variable or object that exists in your code.
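That per-process isolation is the key consequence for your cache. A minimal illustration of the general point (in Python rather than Node, purely as a short self-contained demo of a process-per-worker model): each worker process ends up with its own copy of the in-memory cache, and writes in one worker are invisible to the others.

from multiprocessing import Process

cache = {}  # analogous to the in-memory cache object in the question

def worker(name):
    cache[name] = "cached value"   # writes to this process's own copy only
    print(name, "sees", cache)

if __name__ == "__main__":
    workers = [Process(target=worker, args=(f"worker-{i}",)) for i in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("parent sees", cache)    # still {}: the workers' writes never propagate back

The practical consequence in the clustered Node case is the same: each worker fills (and misses) its cache independently, so the workers never share cached data.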

Avoiding loss of MySQL data when a script is terminated abruptly by a server crash or run-time limit

Before converting a project to use MySQL, I have questions regarding the best way to avoid losing a simple record update due to either a server crash or the program being shut down for exceeding a CGI run-time limit.
My project is public and therefore needs to work on any/many hosts where high-level server-side management isn't an option.
I wish to open a list file (or table) and acquire a list of records to parse one at a time.
While parsing each acquired list record, have the program / script perform a task with each record and update a counter (simple table) upon successful completion of each task (alternatively update each record with a success flag).
Do MySQL tables get flushed to the hard drive when rows are updated or added, thus avoiding the loss of all table changes up to the point of the crash if/when the program/script is violently terminated as described?
To have any chance of doing the same with simple text files, the counter file has to be opened and closed for each update (as the contents of open files on most operating systems get clobbered in a crash).
Any outline of the MySQL commands/processes to follow, if any are needed to avoid the losses described, would also be very much appreciated.
Also, are any suggestions applicable to both InnoDB and MyISAM?
A simple answer comes to mind: SQL transactions. A transaction is a batch of SQL statements that (1) has to be explicitly committed and (2) takes effect only if every statement in it executes successfully.
I think this would help:
http://www.sqlteam.com/article/introduction-to-transactions
If my answer wasn't what you needed, please let me know if I misunderstood your intentions.
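A minimal sketch of the idea in Python (the same pattern applies from any client language), assuming the mysql-connector-python driver and hypothetical tasks and counters tables; note this requires InnoDB, since MyISAM tables do not support transactions. Both UPDATEs are applied atomically at commit(), so an abrupt termination mid-way loses only the in-flight batch rather than leaving a half-done update, and anything already committed survives the crash:

import mysql.connector  # assumed driver; any DB-API-style client works the same way

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="mydb"  # placeholder credentials
)

def complete_task(task_id):
    # Mark one task done and bump the counter in a single transaction.
    cursor = conn.cursor()
    try:
        cursor.execute("UPDATE tasks SET done = 1 WHERE id = %s", (task_id,))
        cursor.execute("UPDATE counters SET completed = completed + 1 WHERE name = 'tasks'")
        conn.commit()    # both changes become durable together
    except Exception:
        conn.rollback()  # on error, neither change is applied
        raise
    finally:
        cursor.close()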

Is it possible to do transactions through HTTP requests in Perl?

I'm building a web application, and what I want is: if the user doesn't like the changes or makes a mistake, they can roll the changes back, and if they like them, save them. I'm using Perl with the DBI module and MySQL.
First I send the data to be updated to another Perl script; in that script I perform the update, then return the flow to the first page and show the changes to the user.
So I am wondering whether it's possible to persist or keep the transaction alive across HTTP requests, or how else to do the transaction.
I did the following:
$dbh->{AutoCommit} = 0;
$dbh->do("update ...");
I'm a beginner with Perl and DBI so any answer will be appreciated
How complex a transaction is it? One table, or multiple tables and complex relationships?
If it's a single table, it might be a lot simpler for the confirmation page to show the before (DBI) values and the after (form) values, and perform the transaction following a 'commit' from there.
Apache::DBI and other modules do exist that attempt to persist database connections, but given that each web-server process has its own memory space, you quickly get into some pretty hairy problems. Not for the noob, I would suggest.
I would also recommend that, before you go too far with hand-crafted DBI, you have a look at some of the object-relational mapping modules out there. DBIx::Class is the most popular/actively maintained one.

Rails debugging in production environment

I'm creating a Twitter application, and every time a user refreshes the page it reloads the newest messages from Twitter and saves them to the local database, unless they have already been created before. This works well in the development environment (database: sqlite3), but in the production environment (MySQL) it always creates the messages again, even though they have already been created.
Message creation is checked by the twitter_id that each message has:
msg = Message.find_by_twitter_id(message_hash['id'].to_i)
if msg.nil?
  # creates new message from message_hash (and possibly new user too)
end
msg.save
Apparently, in the production environment it's unable to find the messages by twitter id for some reason (when I look at the database, all the attributes have been saved correctly).
With this long introduction, I guess my main question is: how do I debug this? (Unless you already have an answer to the main problem, of course. :) When I look in production.log, it only shows something like:
Processing MainPageController#feeds (for 91.154.7.200 at 2010-01-16 14:35:36) [GET]
Rendering template within layouts/application
Rendering main_page/feeds
Completed in 9774ms (View: 164, DB: 874) | 200 OK [http://www.tweets.vidious.net/]
...but not the database requests, logger.debug texts, or anything that could help me find the problem.
You can change the log level in production by setting the log level in config/environments/production.rb:
config.log_level = :debug
That will log the SQL and everything else you are used to seeing in dev. It will slow down the app a bit and your logs will be large, so use it judiciously.
But as to the actual problem behind the question...
Could it be because of multiple connections accessing mysql?
If the twitter entries have not yet been committed, then a query for them from another connection will not return them, so if your query for them is called before the commit, then you won't find them, and will instead insert the same entries again. This is much more likely to happen in a production environment with many users than with you alone testing on sqlite.
Since you are using MySQL, you could put a unique key on the twitter id to prevent dupes, then catch the ActiveRecord exception if you try to insert a dupe. But this means handling an error, which is not a pretty way to handle this (though I recommend doing it as a backup means of preventing dupes; MySQL is good at this, use it).
You should also prevent the attempt to insert the dupes. One way is to use a lock on a common record, say the User record which all the tweets are related to, so that another process cannot try to add tweets for the user until it can get that lock (which you will only free once the transaction is done), and so prevent simultaneous commits of the same info.
I ran into a similar issue while saving emails to a database. I agree with Andrew: set the log level to debug for more information on what exactly is happening.
As for the actual problem, you can try adding a unique index to the database that will prevent two items from being saved with the same parameters. This is like validates_uniqueness_of, but at the database level, and is very effective: Mysql Constrain Database Entries in Rails.
For example, if you wanted no message objects in your database with both a duplicate body of text and a duplicate twitter id (which would mean the same person tweeted the same text), you can add this to your migration:
add_index :messages, [:twitter_id, :body], :unique => true
It takes a small amount of time after you tell an object in Rails to save before it actually gets into the database; that's maybe why the query for the id doesn't find anything yet.
For your production server, I would recommend setting up Rollbar to report all of the unhandled errors and exceptions on your production servers.
You can also store a bunch of useful information, like the HTTP request, the requesting user, and the code that raised the error, and have it send email notifications each time an unhandled exception happens on your production server.
Here is a simple article about debugging in Rails that could help you out.