Node.js and MySQL duplicate issue

I'm using the mysql module for Node.js, and my program runs multiple workers with the cluster package.
Each worker fetches some tweets and stores them in the database, and each record has to be unique on tweet_id.
However, I can see a lot of duplicate records in my database, even though I check for duplicates before inserting.
Has anyone experienced this? Is there a solution?

I'm a newbie, but I'll try to help.
Since you have multiple workers, under the default transaction isolation level worker B may already have written a tweet while worker A is checking for it, but B's write is not yet visible to A, so A's duplicate check passes and it inserts the tweet again.
One common solution is to put a unique index on the tweet_id column and let the database enforce uniqueness.
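For example, with the node mysql module from the question, the index can be created once and the insert made race-safe. This is only a sketch: the tweets table, its columns, the Twitter field names and the connection settings are assumptions, not taken from the original post.

var mysql = require('mysql');

// Connection settings are placeholders.
var connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'secret',
  database: 'tweets_db'
});

// One-time schema change so MySQL itself enforces uniqueness:
// ALTER TABLE tweets ADD UNIQUE INDEX idx_tweet_id (tweet_id);

function saveTweet(tweet, callback) {
  // INSERT IGNORE silently skips rows that would violate the unique index,
  // so concurrent workers can race without creating duplicates.
  connection.query(
    'INSERT IGNORE INTO tweets (tweet_id, text) VALUES (?, ?)',
    [tweet.id_str, tweet.text],
    function (err, result) {
      if (err) return callback(err);
      // result.affectedRows === 0 means another worker got there first.
      callback(null, result.affectedRows === 1);
    }
  );
}

Alternatively, a plain INSERT can be kept and an error whose code is ER_DUP_ENTRY treated as "tweet already stored".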

Related

MySQL - Is it possible to run multiple synchronous inserts?

I googled and searched on SO, but was not able to find an answer; maybe you could point me to some reference/docs?
This is more about understanding the way MySQL treats table contents while inserting.
I have a table (MyISAM) which has an auto-increment primary key 'autoid'. I am using a simple script to insert thousands of records. What I am trying to do is run multiple instances of this script (you can imagine it as accessing the script from different machines at the same time).
Is MySQL capable of distributing the auto-increment primary keys accordingly without any further action on my side, or do I have to do some sort of table locking for each machine? Maybe I have to choose InnoDB over MyISAM?
What I am trying to achieve is: irrespective of how many machines are simultaneously triggering the script, all inserts should complete without skipping any auto-increment id or throwing errors like "Duplicate value for...".
Thanks a lot
The whole point of using a database is that it can handle situations like this transactionally. So yes, this scenario works fine on every commonly used DBMS, including MySQL.
How do you think the average forum would work with 50 users simultaneously posting replies to a topic, all from forked parallel Apache processes possibly only microseconds apart, or from multiple load-balanced webservers?
Internally it just uses a mutex/semaphore like any other process when accessing and incrementing the shared resource (the autoincrement value of a table in this case) to mitigate the inherent race conditions.
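As a quick illustration of that guarantee (a sketch only; the posts table and pool settings are invented), many parallel inserts through a connection pool each come back with a distinct auto-increment id, with no locking in application code:

var mysql = require('mysql');

// Pool settings are placeholders.
var pool = mysql.createPool({
  host: 'localhost',
  user: 'user',
  password: 'secret',
  database: 'test',
  connectionLimit: 10
});

// Fire 50 inserts "at the same time"; MySQL hands out the
// auto-increment ids itself, so every insertId is unique.
for (var i = 0; i < 50; i++) {
  pool.query('INSERT INTO posts (body) VALUES (?)', ['reply ' + i],
    function (err, result) {
      if (err) throw err;
      console.log('got id', result.insertId);
    });
}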

How to buffer multiple DB insert requests for later delayed bulk execution with mySQL and C#?

I have a situation where the application gathers a long list of DB inserts, roughly one hundred per second, from external devices. The application that receives these requests runs on a different server than the MySQL DB itself.
Since I do not want to open and close the DB connection for every single insert, I thought it would be a good idea to gather them first in some sort of queue and then insert them all together.
While I was thinking about using the DataSet construct with MySQL for this, I ran into one problem.
I need to insert some data into table A with autoincrement key.
And this key I then need to use to insert a subsequent record in table B.
In plain MySQL I would just use LAST_INSERT_ID() for this purpose, but when using a DataSet to gather the insert requests I do not know how to do that.
What would be the correct way to solve this?
PS: Or can I just insert at the time of the request and not worry about the fact that MySQL is running on a remote server, so there is no need to gather these requests for a later bulk insert in the first place?
We are speaking about performance issues here, so it is important to look at alternatives. You are looking at storing persistent data. Therefore, the main options might be:
sessions
files
databases
Sessions are not recommended for operational data and have limitations on the amount of data that can be stored. I would exclude this option.
Files are generally the slowest thing on your computer, so opening, writing to and appending to a file might not be the most efficient approach.
That would leave databases. Personally, I would recommend this option over the other two I identified. I hear your concern about continuously opening database connections; PDO has an answer for this:
$dbh = new PDO('mysql:host=localhost;dbname=mydatabase', $user, $pass,
    array(PDO::ATTR_PERSISTENT => true)
);
For your LAST_INSERT_ID():
SELECT MAX(id) FROM yourdb.yourtbl;
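On the PS: if the inserts are done at request time over a pooled (or persistent) connection, the generated key for table A comes back with the insert result itself, so the dependent insert into table B needs no extra round trip. A sketch in Node.js (the same language as the first question on this page); the table and column names are invented:

var mysql = require('mysql');

// Pooling avoids opening/closing a connection per insert;
// all settings here are placeholders.
var pool = mysql.createPool({
  host: 'db-server',
  user: 'user',
  password: 'secret',
  database: 'mydb',
  connectionLimit: 10
});

// Insert into table A, then use the generated key for table B.
function insertPair(valueA, valueB, done) {
  pool.query('INSERT INTO tableA (value) VALUES (?)', [valueA],
    function (err, resA) {
      if (err) return done(err);
      // resA.insertId is the auto-increment value generated by the first
      // INSERT (what LAST_INSERT_ID() would return on that connection);
      // it is delivered with the result, so it can be reused even if the
      // second query runs on a different pooled connection.
      pool.query('INSERT INTO tableB (a_id, value) VALUES (?, ?)',
        [resA.insertId, valueB], done);
    });
}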

Preventing duplicate database inserts/updates in our Rails app from simultaneous transactions

As our Rails application deals with increasing user activity and load, we're starting to see some issues with simultaneous transactions. We've used JavaScript to disable/remove the buttons after clicks, and this works for the most part, but isn't an ideal solution. In short, users are performing an action multiple times in rapid succession. Because the action results in a row insert into the DB, we can't just lock one row in the table. Given the high level of activity on the affected models, I can't use the usual locking mechanisms ( http://guides.rubyonrails.org/active_record_querying.html#locking-records-for-update ) that you would use for an update.
This question ( Prevent simultaneous transactions in a web application ) addresses a similar issue, but it uses file locking (flock) to provide a solution, so this won't work with multiple application servers, as we have. We could do something similar I suppose with Redis or another data store that is available to all of our application servers, but I don't know if this really solves the problem fully either.
What is the best way to prevent duplicate database inserts from simultaneously executed transactions?
Try adding a unique index to the table where you are having the issue. It won't prevent the system from attempting to insert duplicate data, but it will prevent it from getting stored in the database. You will just need to handle the insert when it fails.

MySQL table locking for a multi user JSP/Servlets site

Hi, I am developing a site with JSP/Servlets running on Tomcat for the front end, and with a MySQL DB for the backend which is accessed through JDBC.
Many users of the site can access and write to the database at the same time. My question is:
Do I need to explicitly take locks before each write/read access to the DB in my code?
Or does Tomcat handle this for me?
Also, do you have any suggestions on how best to implement this? I have written a significant amount of JDBC code already without taking the locks :/
I think you are thinking about transactions when you say "locks". At the lowest level, your database server already ensures that parallel reads and writes won't corrupt your tables.
But if you want to ensure consistency across tables, you need to employ transactions. Simply put, what transactions provide you is an all-or-nothing guarantee. That is, if you want to insert an Order in one table and related OrderItems in another table, what you need is an assurance that if the insertion of OrderItems fails (in step 2), the changes made to the Order table (step 1) will also get rolled back. This way you'll never end up in a situation where a row in the Order table has no associated rows in OrderItems.
This, of course, is a very simplified picture of what a transaction is. You should read more about it if you are serious about database programming.
In Java, you usually do transactions roughly with the following steps (see the sketch after this list):
Set autocommit to false on your jdbc connection
Do several insert and/or updates using the same connection
Call conn.commit() when all the insert/updates that goes together are done
If there is a problem somewhere during step 2, call conn.rollback()
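The same begin/commit/rollback pattern, sketched here with the Node.js mysql module from the first question on this page rather than JDBC (a sketch only; connection settings, table names and values are invented):

var mysql = require('mysql');

var connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'secret',
  database: 'shop'
});

// All-or-nothing insert of an order and one of its items.
connection.beginTransaction(function (err) {            // step 1: no autocommit
  if (err) throw err;
  connection.query('INSERT INTO Orders (customer_id) VALUES (?)', [42],
    function (err, res) {
      if (err) return connection.rollback(function () { throw err; });   // step 4
      connection.query('INSERT INTO OrderItems (order_id, sku) VALUES (?, ?)',
        [res.insertId, 'ABC-1'],                          // step 2: more statements
        function (err) {
          if (err) return connection.rollback(function () { throw err; });
          connection.commit(function (err) {              // step 3: commit together
            if (err) return connection.rollback(function () { throw err; });
            console.log('order stored');
          });
        });
    });
});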

How to cache latest inserted data in MySQL?

Is it possible to cache recently inserted data in MySQL database internally?
I looked at the query cache etc. (http://dev.mysql.com/doc/refman/5.1/en/query-cache.html) but that's not what I am looking for. I know that 'SELECT' queries will be cached.
Details:
I am inserting lots of data to MySQL DB every second.
I have two kinds of users for this data:
Users who query any random data
Users who query recently inserted data
For the 2nd kind of users, my table's primary key is a Unix timestamp, which tells me how new the data is. Is there any way to cache the data at the time of insert?
One option is to write my own caching module which caches data and then INSERTs it.
Users can query this module before going to MySQL DB.
I was just wondering if something similar is available.
PS: I am open to other databases providing a similar feature.
Usually you get the best performance from MySQL if you allow a big index cache (config setting key_buffer_size), at least for MyISAM tables.
If latency is really an issue (as it seems in your case), have a look at Sphinx, which has recently introduced real-time indexes.
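If the "write my own caching module" idea from the question is pursued, a minimal single-process sketch (in Node.js; the samples table, row shape and size limit are all arbitrary assumptions) could populate the cache at insert time:

var mysql = require('mysql');

var pool = mysql.createPool({
  host: 'localhost',
  user: 'user',
  password: 'secret',
  database: 'metrics'
});

// Keep the most recently inserted rows in memory, newest first.
var RECENT_LIMIT = 1000;
var recent = [];

function insertAndCache(row, done) {
  pool.query('INSERT INTO samples (ts, value) VALUES (?, ?)',
    [row.ts, row.value],
    function (err) {
      if (err) return done(err);
      recent.unshift(row);                              // cache at insert time
      if (recent.length > RECENT_LIMIT) recent.pop();
      done(null);
    });
}

// "Recent data" readers are served from memory; everyone else
// still queries MySQL as before.
function getRecent(sinceTs) {
  return recent.filter(function (r) { return r.ts >= sinceTs; });
}

Note this only helps when a single process performs the inserts; with several writers a shared store (e.g. Redis) or the Sphinx real-time indexes mentioned above would be needed.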