How to manage a multi user job list in MySQL? - mysql

I have a list of jobs for multiple users stored in a MySQL table. I'm currently planning to do the following in the client app:
Ask MySQL server for jobs that are not allocated to anybody.
Mark the first job allocated to myself.
But the problem is, if 2 users somehow get the same list of "unallocated" jobs, they will both mark the same job as allocated. So how to manage such a situation, and ensure that each user gets only a unique unallocated job?
I'm trying to avoid using stored procs since I want all code within the app if possible.

Sorry, but the way you describe it, you would need a trigger that allocates jobs at the moment of querying, so that the same list is never sent to two users. Or you can accept a job blindly by allocating it at the moment of querying...
Something like this:
UPDATE jobs SET allocatedto = myid, status = 'allocated' WHERE status = 'notallocated' LIMIT 1;
SELECT * FROM jobs WHERE status = 'allocated' AND allocatedto = myid LIMIT 1;
SELECT * FROM jobs WHERE status = 'notallocated';
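The idea of claiming a job at the moment of querying can be sketched from the client side as an optimistic claim: pick a candidate, then update it only if it is still unallocated, and check the affected row count. This is a minimal sketch, not a definitive implementation. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table layout, the status strings, and the helper names make_db and claim_job are assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Build an in-memory jobs table (hypothetical schema, SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'notallocated',"
        " allocatedto INTEGER)"
    )
    conn.executemany("INSERT INTO jobs (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()
    return conn


def claim_job(conn, user_id, max_attempts=5):
    """Claim one unallocated job for user_id; return its id, or None if none could be claimed."""
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'notallocated' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to claim
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE jobs SET status = 'allocated', allocatedto = ? "
                "WHERE id = ? AND status = 'notallocated'",
                (user_id, row[0]),
            )
        if cur.rowcount == 1:
            return row[0]  # the guarded UPDATE matched, so the job is ours
        # rowcount == 0: another client claimed it first, so try the next candidate
    return None


if __name__ == "__main__":
    db = make_db()
    print(claim_job(db, user_id=10))
```

The guard `AND status = 'notallocated'` in the UPDATE is what prevents two clients from claiming the same row: only one of the competing updates can match it, and the loser sees an affected row count of zero and retries.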

Related

Concurrent inserts mysql - calling same insert stored proc before the first set of inserts is completed

I am working on social networking site, which includes the creation of media content and also records the interaction of users with the created content.
Background of issue - approach used currently
There is a page called news-feed, which displays the content and activity done with the content by the users they are following on site.
The display order of the content changes as users interact with it (e.g. a post with more comments is likely to be shown above one with fewer comments; however, the number of comments is just one of the attributes used to rank a post).
I am using mysql(innodb) database to store the data as follows:
activity_master : activities allowed to be part of news feed(post, comment etc)
activity_set : for aggregation of activities on the same object
activity_feed: details of actual activity
Detailed ER Diagram is at the end of question
Scenario
A user (with 1000 followers) posts something, which initiates an async call to the procedure that inserts the relevant entries (1000 rows for 1000 followers) into the tables mentioned above.
Some followers start commenting (an activity allowed to be part of the news feed) before that call completes, which initiates another call to the same procedure to insert the entries for this activity (x rows, x being the total number of their own followers) for their particular set of followers (e.g. User B commented on this post).
All the insert requests (which seem far too many) will have to be processed in a queue by the InnoDB engine.
Questions
Is there a better and efficient way to do this? (I definitely think there would be one)
How many insert requests can innodb handle in its default configuration?
How to avoid deadlock (or resource congestion at database end) in this case
Or is there any other type of database best suited in this case
Thanks for showing your interest by reading the description, any sort of help in this regard is much appreciated and let me know if any further details are required, thanks in advance!
ER Diagram of tables (not reputed enough to embed the image directly :( )
A rule of thumb: "Don't queue it, just do it".
Inserting 1000 rows is likely to be untenable. Tomorrow, it will be 10000.
Can't you do the processing on the select side instead of the insert side?
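Doing the work on the select side could look like the following: store each activity once, and build a user's feed at read time by joining against the follow relationships. This is only a sketch under stated assumptions. The tables follows and activity, their columns, and the helper names are hypothetical and not taken from the question's schema; ranking is reduced to "newest first"; and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3


def make_db():
    """Create hypothetical follows/activity tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL,
            followee_id INTEGER NOT NULL,
            PRIMARY KEY (follower_id, followee_id)
        );
        CREATE TABLE activity (
            id INTEGER PRIMARY KEY,
            actor_id INTEGER NOT NULL,
            kind TEXT NOT NULL,
            object_id INTEGER NOT NULL
        );
        CREATE INDEX activity_actor ON activity (actor_id, id);
        """
    )
    return conn


def record_activity(conn, actor_id, kind, object_id):
    """Write exactly one row per activity, regardless of how many followers exist."""
    with conn:
        conn.execute(
            "INSERT INTO activity (actor_id, kind, object_id) VALUES (?, ?, ?)",
            (actor_id, kind, object_id),
        )


def news_feed(conn, user_id, limit=20):
    """Assemble the feed at read time from the people user_id follows."""
    return conn.execute(
        "SELECT a.id, a.actor_id, a.kind, a.object_id "
        "FROM activity AS a "
        "JOIN follows AS f ON f.followee_id = a.actor_id "
        "WHERE f.follower_id = ? "
        "ORDER BY a.id DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
```

With this layout a post by a user with 1000 followers costs one insert instead of 1000; the cost moves to the feed query, which is why the index on the actor column matters.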

How do I trigger an event in a database when a column has certain value?

I am working on building a social network application similar to Twitter where users have newsfeeds, followers, posts, etc.
I am trying to implement a feature which would make posts (a post in my application is equivalent to a post on facebook) EXPIRE after a certain amount of time.
What do I mean by expire?
1. Post disappears from news feed
2. The user whose post expires receives a notification alerting them that the post has expired. On a programmatic level this is just an INSERT statement being executed when the post expires.
What have I done so far?
Making posts disappear from the newsfeed was simple: I just adjusted the query which returns the newsfeed so that it compares the date_of_expiration column to NOW().
Creating notifications when the post expired was trickier.
My initial approach was to make a mysql CRON job which ran every 2 minutes, triggering an event which would select all posts where NOW() > date_of_expiration and use the selected data to insert a notification entry into my notification table.
This works; however, I do not want to use a CRON job. The 2 minute delay means a user might have to wait a full 2 minutes after the post actually expired before receiving the notification telling them their post expired. I'm assuming that if the table had many entries this wait time could be even greater, depending on how long it takes to run the select and insert statements.
What am I looking for?
Another way to insert a notification into the notification table when a user's post expires.
I was thinking that if there was a way to create some kind of event that would trigger when the expiration date of a row (in the posts table) has passed, i.e. when NOW() is greater than the expiration date, it would be a very good solution to my problem. Is something like this possible? What is commonly done in this scenario?
FYI my stack is: MYSQL, JAVA with an Android+IOS front end, but I don't mind going out of my stack to accomplish this feature
I am not sure how your application works, but here is a thought: something I have done in an application that interacts with a telephone system, where every second counts.
I implemented a server-sent event where a script will keep checking for new updates every second. Then the script will update the client with any new/expired notifications.
I am not sure if this is what you are looking for but it is worth sharing.
EDITED
Since you are leaning more toward having a table for the notifications, why not create the notifications at run time within a transaction?
START TRANSACTION;
INSERT INTO posts(comment, createdBy....) VALUES('My new comment',123);
SET @lastID := LAST_INSERT_ID();
-- Create a temporary table with all the friends to notify.
-- The MEMORY engine hint should help with performance.
-- Make sure the final userId list is unique, otherwise you will be
-- inserting duplicate notifications, which I am sure you want to avoid.
-- (Assumes a friends(userId, friendId) table; 123 is the author of the post.)
CREATE TEMPORARY TABLE myFriends (KEY(userId)) ENGINE=MEMORY
SELECT DISTINCT s.userId
FROM users AS s
INNER JOIN friends AS f ON f.friendId = s.userId
WHERE f.userId = 123;
-- Insert the notifications all at once.
-- This will also help with performance a little.
INSERT INTO notifications(userId, postId, isRead)
SELECT su.userId, @lastID AS postId, '0' AS isRead
FROM users AS su
INNER JOIN myFriends AS f ON f.userId = su.userId;
-- Commit the transaction if everything passed
COMMIT;
-- or, if something fails, run this instead:
ROLLBACK;
Some more thoughts on things to consider, depending on how busy your application will be:
Make sure your server is built on good hardware: lots of RAM (64GB+) and good drives; SSDs would be great if possible.
Also, you may consider using GTID replication to have more sources for read.
This is hard to answer, since I don't understand your database schema or the clients' access pattern well enough. However, I have some ideas that might help you:
What about marking the posts table as expired with a separate "expired" column? If you do that, you could select the posts that are to be sent to the client by getting all posts that are not marked as expired. This of course will include also the messages that are newly expired (NOW() > date_of_expiration) but are not marked yet. Let your java program sort the freshly expired posts out before sending the reply. At this point in your program you already have the posts that need to be marked and these are the exact same ones that need to be inserted into the notification table. You can just do that at this place in your Java program.
Advantage
No need for EVENTS or Cron jobs at all. This should be fairly efficient if you set indexes correctly in your tables. No need for a JOIN with the notification table.
Disadvantage
You need to store the expired info extra in a column, that may require a schema change.
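The "mark as expired while serving the feed" idea above can be sketched as follows. The answer describes doing this in the Java program; Python with sqlite3 is swapped in here only to keep the sketch short and runnable. The column names other than date_of_expiration, the integer timestamps, and the function names are assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Hypothetical posts/notifications tables; timestamps are plain integers for simplicity."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            body TEXT NOT NULL,
            date_of_expiration INTEGER NOT NULL,
            expired INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE notifications (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            post_id INTEGER NOT NULL,
            message TEXT NOT NULL
        );
        """
    )
    return conn


def fetch_feed(conn, now):
    """Return live posts; mark freshly expired ones and notify their authors exactly once."""
    rows = conn.execute(
        "SELECT id, user_id, body, date_of_expiration FROM posts "
        "WHERE expired = 0 ORDER BY id"
    ).fetchall()
    live = [(r[0], r[2]) for r in rows if r[3] > now]
    newly_expired = [r for r in rows if r[3] <= now]
    with conn:  # mark and notify in one transaction
        for post_id, user_id, _body, _expires in newly_expired:
            cur = conn.execute(
                "UPDATE posts SET expired = 1 WHERE id = ? AND expired = 0",
                (post_id,),
            )
            if cur.rowcount == 1:  # only the request that flips the flag notifies
                conn.execute(
                    "INSERT INTO notifications (user_id, post_id, message) "
                    "VALUES (?, ?, ?)",
                    (user_id, post_id, "Your post has expired"),
                )
    return live
```

One consequence of this design is that the notification is created the first time anyone loads a feed after the expiry, not at the expiry instant itself, so a post nobody requests stays unmarked until the next read.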

Is it better to store list of each user's Blocked users for query exclusion in $_SESSION var, or to exclude in "real-time" with sub-query?

On one of my PHP/MySQL sites, every user can block every other user on the site. These blocks are stored in a Blocked table with each row representing who did the blocking and who is the target of the block. The columns are indexed for faster retrieval of a user's entire "block list".
For each user, we must exclude from any search results any user that appears in their block list.
In order to do that, is it better to:
1) Generate the "block list" whenever the user logs in by querying the Blocked table once at login and saving it to the $_SESSION (and re-querying any time they make a change to their "block list" and re-saving it to the $_SESSION), and then querying as such:
NOT IN ($commaSeparatedListFromSession)
or
2) Exclude the blocked users in "real-time" directly in the query by using a sub-query for each user's search query as such:
NOT IN (SELECT userid FROM Blocked WHERE Blocked.from = $currentUserID) ?
If the website is PHP and the block list is smaller than, say, 100 entries per user, I would store it in a table and load it into $_SESSION when it changes or when the user logs in. You could just as easily load it from SQL into a local variable on each page load, however.
What I would store in $_SESSION is a flag 'has_blocklist_contents' that would decide whether or not you should load or check the blocklist on page load.
Instead of then adding a NOT IN with the list to all of your queries, I think it might be smarter to filter the blocked users out in PHP.
I have two reasons for wanting to implement this way:
Your database can re-use the SQL for all users on the system resulting in a performance boost for retrieving comments and such.
Your block list will most of the time be empty, so you're not adding any processing time for the majority of users.
I think there is a third solution, and in my opinion it would be the better way to go.
If you can write this
NOT IN (SELECT userid FROM Blocked WHERE Blocked.from = $currentUserID)
Then you can surely write this.
....
SomeTable st
LEFT JOIN
Blocked b
ON (st.userid = b.userid AND b.`from` = $currentUserID)
WHERE b.primaryKey IS NULL;
I hope you understand what I mean by the above query.
This way you get the best of both worlds i.e. You don't have to run 2 queries, and you don't have to save data in $_SESSION
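To make the LEFT JOIN exclusion concrete, here is a small runnable sketch. It uses Python's sqlite3 as a stand-in for PHP/MySQL. The Users table, the sample rows, and the function name visible_users are made up for illustration, and the column named from is quoted because it is a reserved word. A bound parameter replaces the interpolated $currentUserID.

```python
import sqlite3


def make_db():
    """Hypothetical Users table plus a Blocked table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Users (userid INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Blocked (
            "from" INTEGER NOT NULL,   -- who did the blocking
            userid INTEGER NOT NULL,   -- who is blocked
            PRIMARY KEY ("from", userid)
        );
        INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'dan');
        INSERT INTO Blocked VALUES (1, 3);
        """
    )
    return conn


def visible_users(conn, current_user_id):
    """Other users that current_user_id has not blocked, via a LEFT JOIN anti-join."""
    return conn.execute(
        'SELECT u.userid, u.name FROM Users AS u '
        'LEFT JOIN Blocked AS b '
        '  ON b.userid = u.userid AND b."from" = ? '
        'WHERE b.userid IS NULL AND u.userid <> ? '
        'ORDER BY u.userid',
        (current_user_id, current_user_id),
    ).fetchall()
```

The join keeps every candidate row and attaches a Blocked row only when the current user has blocked that candidate; filtering on the joined column being NULL keeps exactly the unblocked ones.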
Don't use the $_SESSION as a substitute for a proper caching system. The more junk you pile into $_SESSION, the more you'll have to load for each and every request.
Using a sub-select for exclusions can be brutally slow if you're not careful to keep your database tuned. Make sure your indexes are covering all your WHERE conditions.

Limits of SQL IN statement

I need to maintain a mass email sender application. The last programmer did a nice job, but the boss feels that some small optimizations could be made to the database handling. When a campaign is finished, the report gives the option to save the selected segment. For instance, we send 50000 emails, and we want to save the segment of people who opened the newsletter (2000). The tool currently creates a new segment by duplicating the contacts (with an INSERT), but I think we could improve the tool by saving only the id of each contact.
I would like to know if saving the contacts with an SQL IN statement would increase the performance of the tool, or if there is another way to do this. Something like:
Create a list of the ids of the contacts: SELECT * FROM contacts
SELECT * FROM contacts WHERE idContact IN (all_contacts_comma_separated) --> I would save this
Thanks in advance
PS: It's a production environment, so I need to be sure before making any changes :-(
You didn't say where the list of people who opened the email currently resides. If it's not in the database what code/process will you use to generate your IN statement list? If it is in the database why not JOIN your tables to get the information?
Either way I'd not recommend using IN when you have 2000 items in the list.
It might also be worth you reading the following:
SQL Server: JOIN vs IN vs EXISTS - the logical difference
It's written with SQL server in mind (and I'm not sure it all directly applies to MySQL) but the concepts are interesting and you should perform testing before changing your production environment, as eggyal's comment suggested.
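One way to act on the JOIN suggestion is to store a segment as rows of (segment id, contact id) and join back to the contacts table, instead of duplicating contacts or saving a long IN list. The sketch below assumes a hypothetical link table segment_contacts and an email column, neither of which appears in the question; only idContact comes from it. It runs on SQLite through Python's sqlite3, so the MySQL equivalent of INSERT OR IGNORE would be INSERT IGNORE.

```python
import sqlite3


def make_db():
    """contacts plus a hypothetical link table holding one row per (segment, contact)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE contacts (idContact INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE segment_contacts (
            idSegment INTEGER NOT NULL,
            idContact INTEGER NOT NULL,
            PRIMARY KEY (idSegment, idContact)
        );
        """
    )
    return conn


def save_segment(conn, segment_id, contact_ids):
    """Store only the ids of the contacts in the segment; duplicates are ignored."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO segment_contacts (idSegment, idContact) VALUES (?, ?)",
            [(segment_id, cid) for cid in contact_ids],
        )


def contacts_in_segment(conn, segment_id):
    """Resolve a segment back to contact rows with a JOIN rather than a long IN list."""
    return conn.execute(
        "SELECT c.idContact, c.email FROM contacts AS c "
        "JOIN segment_contacts AS s ON s.idContact = c.idContact "
        "WHERE s.idSegment = ? ORDER BY c.idContact",
        (segment_id,),
    ).fetchall()
```

Whether this is faster than the current duplication should still be measured on a copy of the production data before changing anything.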

MySQL and Scheduled Updates by User Preference?

I'm developing an application that
stores an e-mail address (to a user) in a table.
stores the number of days the user would like to stay in the table.
takes the user off the table when the number of days is up.
I don't really know how to approach this, so here are my questions:
1. Each second, do I have the application check through every table entry for the time that's currently stored in, let's say, the time_left column?
2. Wouldn't (1) be inefficient if I'm expecting a significant number (10,000+) users?
3. If not (2), what's the best algorithm to implement for such a task?
4. What's the name of what I'm trying to do here? I'd like to do some more research on it before and while I'm writing the script, so I need a good search query to start with.
I plan on writing this script in Perl, although I'm open to suggestions with regards to language choice, frameworks, etc... I'm actually new to web development (both on the back-end and front-end), so I'd appreciate it if you could advise me precisely.
Thank you!
*after posting, Topener asked a valid question:
Why would you store users if they won't get requested?
Assume the user is just sitting in the database.
Let's say I'm using the user's e-mail address every 5 minutes from the time the user was added to the database (so if the user's entry was born at 2:00PM-October 18, the user would be accessed at 2:05, 2:10, etc...).
If the user decides that they want out of the database in 10 days, that means their entry is being accessed normally (every 5 minutes from 2:00PM-October 18) until 2:00PM-October 28.
So to clarify, based on this situation:
The system would have to constantly compare the current time with the user's expiration date, wouldn't it?
You should not store the time_left variable; you should store validTo instead. This way, whenever the user is requested from the database, you can check whether it is still valid.
If not, then do whatever you want with it.
This approach won't require any cron jobs and won't cost you extra load.
Hey Mr_spock, I like the above answer from Topener. Instead of storing the number of days the user would like to be valid, store the day the user would like to be removed.
Adding a field like validToDate, which would be a DATETIME field type, you can do a query like
delete from tablename where validToDate <= NOW()
where
the line above is a SQL query
tablename is the name of the table in question
NOW() is a valid sql function that returns the current DATETIME
validToDate is a field of type DATETIME
This has whatever efficiency the SQL server promises; I think it is fairly good.
You could write a separate program/script which makes the delete query on a set interval. If you are on a Linux machine you can create a cron job to do it. Doing it every second may become very resource intensive for slower machines and larger tables, but I don't believe that will become an issue for a simple delete query.
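Both answers can be combined in a small sketch: compute the expiry moment once when the user is added, filter on it when reading, and purge with a periodic delete. The question mentions Perl; Python with sqlite3 is used here only as a compact runnable stand-in. The table name users, the integer Unix-style timestamps, and the function names are assumptions; only validToDate comes from the answer above.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def make_db():
    """Hypothetical users table; validToDate is stored as seconds since an epoch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (email TEXT PRIMARY KEY, validToDate INTEGER NOT NULL)"
    )
    return conn


def add_user(conn, email, now, days):
    """Store the moment of expiry instead of a countdown that needs constant updates."""
    with conn:
        conn.execute(
            "INSERT INTO users (email, validToDate) VALUES (?, ?)",
            (email, now + days * SECONDS_PER_DAY),
        )


def active_users(conn, now):
    """Check validity at read time, so expired rows are never used even before purging."""
    rows = conn.execute(
        "SELECT email FROM users WHERE validToDate > ? ORDER BY email", (now,)
    ).fetchall()
    return [r[0] for r in rows]


def purge_expired(conn, now):
    """The periodic cleanup (e.g. run from cron); returns how many rows were removed."""
    with conn:
        cur = conn.execute("DELETE FROM users WHERE validToDate <= ?", (now,))
    return cur.rowcount
```

Because reads already filter on validToDate, the purge does not need to run every second; running it every few minutes or once a day only affects how long dead rows occupy the table.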