I'm creating a Twitter application, and every time a user refreshes the page it reloads the newest messages from Twitter and saves them to the local database, unless they have already been saved before. This works well in the development environment (database: SQLite3), but in the production environment (MySQL) it always creates the messages again, even though they have already been created.
Whether a message has already been created is checked by the twitter_id that each message has:
msg = Message.find_by_twitter_id(message_hash['id'].to_i)
if msg.nil?
  # creates new message from message_hash (and possibly new user too)
end
msg.save
Apparently, in the production environment it's unable to find the messages by twitter_id for some reason (when I look at the database, it has saved all the attributes correctly before).
With this long introduction, I guess my main question is: how do I debug this? (Unless you already have an answer to the main problem, of course. :) When I look in production.log, it only shows something like:
Processing MainPageController#feeds (for 91.154.7.200 at 2010-01-16 14:35:36) [GET]
Rendering template within layouts/application
Rendering main_page/feeds
Completed in 9774ms (View: 164, DB: 874) | 200 OK [http://www.tweets.vidious.net/]
...but no database queries, logger.debug output, or anything else that could help me find the problem.
You can change the log level in production by setting it in config/environments/production.rb:
config.log_level = :debug
That will log the SQL and everything else you are used to seeing in dev. It will slow down the app a bit, and your logs will be large, so use it judiciously.
But as to the actual problem behind the question...
Could it be because of multiple connections accessing MySQL?
If the twitter entries have not yet been committed, then a query for them from another connection will not return them. So if your query for them runs before the commit, you won't find them, and will instead insert the same entries again. This is much more likely to happen in a production environment with many users than with you alone testing on SQLite.
Since you are using MySQL, you could put a unique key on the twitter id to prevent dupes, then catch the ActiveRecord exception if you try to insert one. But this means handling an error, which is not a pretty way to handle this (though I recommend doing it as a backup means of preventing dupes - MySQL is good at this, use it).
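As a sketch (assuming a messages table; in recent Rails a duplicate key surfaces as ActiveRecord::RecordNotUnique, while older versions raise ActiveRecord::StatementInvalid instead):

# in a migration
add_index :messages, :twitter_id, :unique => true

# when importing
begin
  msg.save!
rescue ActiveRecord::RecordNotUnique
  # another process inserted this tweet first - safe to skip
end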
You should also prevent the attempt to insert the dupes in the first place. One way is to take a lock on a common record, say the User record which all the tweets are related to, so that another process cannot try to add tweets for that user until it can get the lock (which you only free once the transaction is done), preventing simultaneous commits of the same info.
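A sketch of that locking approach (with_lock wraps a transaction around a SELECT ... FOR UPDATE in Rails 3.2+; in older versions call user.lock! inside a transaction instead - the messages association is an assumption here):

user.with_lock do
  # the check and the insert now happen atomically: no other process
  # can run this block for the same user until the lock is released
  unless Message.find_by_twitter_id(message_hash['id'].to_i)
    user.messages.create!(:twitter_id => message_hash['id'].to_i)
  end
end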
I ran into a similar issue while saving emails to a database. I agree with Andrew: set the log level to debug for more information on what exactly is happening.
As for the actual problem, you can try adding a unique index to the database that will prevent two items from being saved with the same parameters. This is like validates_uniqueness_of but at the database level, and is very effective: Mysql Constrain Database Entries in Rails.
For example, say you wanted no message objects in your database with both a duplicate body of text and a duplicate twitter id (which would mean the same person tweeted the same text). Then you can add this to your migration:
add_index :messages, [:twitter_id, :body], :unique => true
It takes a small amount of time after you tell an object in Rails to save before it actually gets into the database; that's maybe why the query for the id doesn't find anything yet.
For your production server, I would recommend setting up Rollbar to report all of the unhandled errors and exceptions on your production servers.
You can also store a bunch of useful information, like the HTTP request, the requesting user, the code which invoked the error, and much more, or send email notifications each time an unhandled exception happens on your production server.
Here is a simple article about debugging in Rails that could help you out.
Related
Recently we added functionality to our RoR application which allows users to open a particular record in, let's say, their own individual tabs. Doing so, we started seeing frequent ActiveRecord::StaleObjectError exceptions. On investigating the issue, I found that Rails is indeed trying to update the session store first whenever a resource is opened in a tab, and that is when the exception is raised.
We have lock_version in our Active Record session store, so Rails applies optimistic locking to it by default. Is there any way we could solve this issue without introducing much complexity, given that the application is already live on the client's machine, and without affecting any session data we've stored in our session store DB?
Any suggestions would be much appreciated. Thanks
It sounds like you're using optimistic locking on a DB session record and updating that session record when you process an update to other records. I'm not sure what you'd need to update in the session, but if you're worried about possibly conflicting updates to the session object (and need the locking), then these errors might be desired.
If you don't, you can refresh the session object before saving the session (or disable its optimistic locking) to avoid this error for these session updates.
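For instance (a sketch - Session stands in for whatever model your session store uses):

# refresh just before saving so the save carries the latest lock_version
session.reload
session.save!

# or switch off optimistic locking for the session model entirely,
# so lock_version is ignored on update
class Session < ActiveRecord::Base
  self.lock_optimistically = false
end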
You might also look into what about the session is being updated and whether it's strictly necessary. If you're updating something like "last_active_on", then you might be better off sending off a background job to do this and/or using the update_column method, which bypasses the rather heavyweight ActiveRecord save callback chain.
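A minimal sketch of that combination, using ActiveJob (the Session model and last_active_on column are assumptions):

class TouchSessionJob < ActiveJob::Base
  queue_as :low_priority

  def perform(session_id)
    session = Session.find(session_id)
    # a single UPDATE by primary key: no validations, no callbacks,
    # and no lock_version check, so it cannot raise StaleObjectError
    session.update_column(:last_active_on, Time.current)
  end
end

# in the controller, off the request path:
# TouchSessionJob.perform_later(session.id)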
--- UPDATE ---
Pattern: Putting side-effects in background jobs
There are several common Rails patterns that start to break down as your app usage grows. One of the most common that I've run into is when a controller endpoint for a specific record also updates a common/shared record (for example, if creating a 'message' also updates the messages_count for a user using counter cache, or updates a last_active_at on a session). These patterns create bottlenecks in your application as multiple different types of requests across your application will compete for write locks on the same database rows unnecessarily.
These tend to creep into your app over time and become hard to refactor later. I'd recommend always handling side-effects of a request in an asynchronous job (using something like Sidekiq). Something like:
class Message < ActiveRecord::Base
  # fire after the transaction commits so the job sees the new record
  after_commit :enqueue_update_messages_count_job

  def enqueue_update_messages_count_job
    Jobs::UpdateUserMessageCountJob.enqueue(self.id)
  end
end
While this may seem like overkill at first, it creates an architecture that is significantly more scalable. If counting the messages becomes slow... that will make the job slower but not impact the usability of the product. In addition, if certain activities create lots of objects with the same side-effects (let's say you have a "signup" controller that creates a bunch of objects for a user that all trigger an update of user.updated_at), it becomes easy to throw out duplicate jobs and prevent updating the same field 20 times.
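For completeness, here's a hypothetical sketch of the job side with Sidekiq (the .enqueue wrapper above is assumed to map to perform_async, and the messages_count column and user association are assumptions too):

require 'sidekiq'

module Jobs
  class UpdateUserMessageCountJob
    include Sidekiq::Worker
    sidekiq_options :queue => :low_priority

    # thin wrapper matching the call in the model above
    def self.enqueue(message_id)
      perform_async(message_id)
    end

    def perform(message_id)
      message = Message.find(message_id)
      user = message.user
      # recount once, off the request path; update_column skips the
      # callback chain (see the next section)
      user.update_column(:messages_count, user.messages.count)
    end
  end
end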
Pattern: Skipping the activerecord callback chain
Calling save on an ActiveRecord object runs validations and all the before and after callbacks. These can be slow and (at times) unnecessary. For example, updating a message_count cached value doesn't necessarily care about whether the user's email address is valid (or any other validations), and you may not care about other callbacks running. Similarly, if you're just updating a user's updated_at value to clear a cache. You can bypass the ActiveRecord callback chain by calling user.update_column(:message_count, ..) to write that field directly to the database (update_attribute skips validations but still runs callbacks). In theory this shouldn't be necessary for a well designed application, but in practice some larger/legacy codebases may make significant use of the ActiveRecord callback chain to handle business logic that you may not want to invoke.
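To make the difference concrete (a sketch; message_count is the column from the paragraph above):

user = User.find(user_id)
user.update!(:message_count => 42)      # runs validations and every callback
user.update_column(:message_count, 42)  # one UPDATE statement; skips both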
--- Update #2 ---
On Deadlocks
One reason to avoid updating (or generally locking) a common/shared object from a concurrent request is that it can introduce Deadlock errors.
Generally speaking, a "deadlock" in a database is when there are two processes that both need a lock the other one has. Neither thread can continue, so one must error instead. In practice, detecting this is hard, so some databases (like Postgres) just throw a "deadlock" error after a thread waits for an exclusive/write lock for x amount of time. While contention for locks is common (e.g. two updates that are both updating a 'session' object), a true deadlock is often rare (where thread A has a lock on the session that thread B needs, but thread B has a lock on a different object that thread A needs), so you may be able to partially address the problem by looking at / extending your deadlock timeout. While this may reduce the errors, it doesn't fix the issue that threads may be waiting for up to the deadlock timeout. An alternative approach is to have a short deadlock timeout and rescue/retry a few times.
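A sketch of the rescue/retry approach (Rails 5.1+ maps detected database deadlocks to ActiveRecord::Deadlocked; older versions raise ActiveRecord::StatementInvalid, so you'd match on that and inspect the message):

attempts = 0
begin
  ActiveRecord::Base.transaction do
    # ... the contended updates (e.g. the session touch) go here ...
  end
rescue ActiveRecord::Deadlocked
  attempts += 1
  raise if attempts >= 3
  sleep(0.1 * attempts) # brief backoff before retrying
  retry
end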
I want to have ONE single MySQL connection, used by EVERY user, that selects the data all the time and updates it if specific conditions are met (like a placed bid) - preferably even when no user is visiting the website, if that's even possible?
So, for the last few days I've been googling all the time, trying so hard to figure out how to solve my issue, but it seems there are no people with enough knowledge to help me with my problem. So I'll try to ask my question as simply as possible without confusing you with my code. (But if you're interested in seeing the code: http://pastebin.com/dRFzWtEH)
However, this is all about an auction website with a live countdown timer, and I just want to run a node.js server that SELECTs the data every second and sends it to a WebSocket, to show all users visiting the website the countdown and price updates (on bids) in realtime.
I accomplished this whole task using single MySQL queries, but then I ran into errors. Then the author of the node-mysql module on GitHub suggested I use a MySQL pool. But there is like no content at all to find about the specific aim stated in the first sentence of this question.
Now I want to ask in general: how could I accomplish this, and is it even possible, or does at least one user have to be on my website?
What would the code/code-structure/logical process look like?
And I guess I won't need to close the connection at all, so I won't need functions like connection.end()?
No, don't worry about connection pooling. It is not a big deal in MySQL.
Furthermore, a "pool" has a problem -- it must clear out all settings, @variables, transaction state, etc., before allowing the next 'client' to use the pooled connection. This can take time, especially if the client is far from the server.
MySQL's connection/disconnection time is very low, unlike competing products.
If you are developing a web product, then keep in mind that HTTP is "stateless". That is, you cannot hang onto a connection from one 'page' to the next 'page'. Hence, no 'state' can be saved.
Edit
If you have "across the pond" latency problems (100-200ms between the US and Europe), a client-side connection pool could be very useful. However, if the pool software is injecting commands to reset things, that could totally defeat the pooling.
If you can turn on the 'general log' (in a hosted service, you may have to use log_output=TABLE), do so to see what extra commands are injected.
Also, consider combining multiple client SQL statements into Stored Procedures to cut down on back-and-forth.
Also consider either moving the MySQL server closer to the client, or moving the client closer to the MySQL server, depending on how the end-user-to-client back-and-forth compares to the client-to-MySQL traffic.
I am writing a multilingual CMS where the admin can add and delete languages. When they add a new language, I would copy all the rows with language_id = 1 from multiple tables and insert them with the newly created language_id.
I'm using PHP, so the database copying and inserting process would probably be done asynchronously. The problem is the user might add new content during the process, and there is a chance both languages would not have the same number of rows in the end.
I could probably lock all the tables involved, but since the users of the CMS are not tech savvy, I don't want them to see a generic error message when they try to create or update a record.
I would much prefer to show them a customized message notifying them that the system is converting the language. But doing so requires me to know that the tables are being locked.
I should add that CRUD operations are mostly done by one person at any time, so there should be fewer difficulties.
Any help would be greatly appreciated.
I would just lock the tables where new rows are inserted while doing the "creating new language" operation. That way any new insert would automatically wait until the "creating new language" operation is finished, and all the user would see is a slight delay.
This assumes that "creating new language" can be done in a few seconds, but I can't imagine that being a problem.
What do you mean by "I should add that CRUD operations are mostly done by one person at any time, so there should be fewer difficulties"? If your assumption is correct, how in the world are concurrent requests being made at the same time?
Generally you will only have one DAO (data access object), and all your code will go through it; it will make a single connection to the database and be synchronized all the time. Did you follow this practice? In PHP, "asynchronous" means an AJAX request is made while pinging the server, not while communicating with the DB. If there are many clients connecting, there will be a connection pooling pattern. Think about it: does this make any sense?
So I'm going to attempt to create a basic monitoring tool in VB.net. I'd like some advice on how to tackle the logging and reporting side of things, so I'd appreciate some responses from users who I'm sure have a better idea than me and can suggest far more efficient ways of doing things.
So my plan is to have a client tool which will read values from a MySQL database and basically update every x interval - I'm thinking 10/15 minutes at the moment. This side of the application is quite easy; I mean, I can get something to read a database every x amount of time and then change labels and display alerts based on the values. This is all well documented and I am probably okay with that.
The second part is to have a client that sits in the system tray of the server gathering the required information. Now the system tray part I think will probably be the trickiest bit of this, however that's not really part of my question.
So I assume I can use the normal information-gathering commands, store the results perhaps as strings, and then connect to the same database and add them to the relevant fields. For example, if I had a MySQL table called "server" and a column titled "Connection", I could check whether the server has an internet connection, store the result as 1 for yes or 0 for no, and then send a MySQL command to the table to update the "Connection" value to 0/1.
Then, in the monitoring tool, I assume I can run a MySQL query to check the "Connection" column and, if the value is 0, change a label or flag an error, and if it is 1, report that connectivity is okay?
My main questions about the above are listed below.
Is using a MySQL database the most efficient way of doing something like this?
Obviously if my database goes down there's no more reporting, I still think that's a con I'll have to live with though.
Is storing everything as values within the code the best way to store my data?
Is there any particular format I should use in the MySQL column? I was thinking maybe tinyint(9)?
Is the above method redundant and pointless?
I assume all these database connections could cause some unwanted server load; however, the 15-minute refresh time should combat that.
Is there a way to properly combat delays, where perhaps the client doesn't update in time for the reporter and it picks up stale data - perhaps a fail-safe such as a column containing the last-updated time?
You probably don't need the tool that gathers information per se. The web app (real time monitor) can do that, since the clients are storing their information in the same database. The web app can access the database every 15 minutes and display the data, without the intermediate step of saving it again. This will provide the web app with the latest information instead of a potential 29-minute delay.
In other words, the clients are saving the connection information once. Don't duplicate it in the database.
MySQL should work just about as well as anything.
It's a bad idea to hard code "everything". You can use application settings or a MySQL table if you need to store IPs, etc.
In an application like this, the conversion work will more than offset the storage savings of a tinyint. I would use the most convenient data type.
I've been asked for a quick turnaround on this. The group I'm assisting has a .MDB database used by offsite workers who don't have internet access all the time. Thus, way back, the team implemented an Access DB which allows for synchronization.
As their team grew bigger they started running into the following issues:
Remote synching – when a user tries to synch from a worksite, more often than not the database will crash, either due to loss of wireless signal, the program timing out, or the Inspector manually shutting it down due to time (i.e., 30 or more minutes)
Multiple synchers – we are unable to have multiple people synch at one time (there are currently 34 users in 3 different territories). If someone is synching and another person tries to synch at the same time, the second user will end up with an error message. They will have to shut down their DB and try to synch at a later time.
Incomplete synchs – sometimes when a worker synchs his/her DB, not all the line items will copy over to the Master file, which can cause confusion during review.
Are there any workarounds or items I can look into to resolve these?
I have few resources and little time, so anything involving a new server might not work.
Thanks
It sounds as though you are mainly adding new data from different field operatives, rather than everyone updating existing data. If this is the case, then that's good, and you could try the following:
Ensure all the tables use Replication IDs for the Primary Keys, as this will ensure no two operatives create conflicting records.
The synchronisation process should then be amended to take a snapshot of said table/tables to a .txt file on the operative's machine, and then transfer this file back to the source machine.
Then, at the end of the day or more often if required, the master copy should be set up to import the new data from all the text files it has received. As there will be no conflicting Primary Keys you should be OK - just remember to insert only those rows where the Primary Key is not already in the table.
Hope all that makes sense : )