Problem:
I'm working on a project that consists of multiple studies and a group of users, each of whom participates in one of the studies. Every study divides its participants into two groups based on a list generated by a randomization algorithm. Upon signing up, each user is assigned to a study, and their group is determined by the order of signup and the corresponding index in the groups list. For example, if study A has 4 seats in total and the groups list is [0, 1, 1, 0], the first user is assigned to group 0, the second to group 1, and so on until the study is full.
The project also defines other user roles, such as admins, who can be assigned to multiple studies without occupying a seat. That means the relation between users and studies is n:m.
The problem in the current implementation is a race condition when assigning users to studies and study groups. The code below overrides the addUser method of the Study model: whenever a user is added to a study, it counts how many users are already in the study and assigns the user the group at index seatsTaken in the groups list. This works as long as users are added to the study at intervals. But when multiple users are added at the same time, the asynchronous queries interleave, and the seatsTaken count is affected by other users signing up concurrently.
In the example below, the users assigned to study A at intervals get the correct groups, but study B, whose queries run simultaneously, ends up with an incorrect group assignment.
const Sequelize = require('sequelize');
const assert = require('assert');
const sequelize = new Sequelize({
  database: 'database',
  username: 'username',
  password: 'password',
  dialect: process.env.DB_DIALECT || 'sqlite',
  storage: 'db.sqlite',
  logging: false
});
const User = sequelize.define('user', {
  id: {
    type: Sequelize.INTEGER,
    autoIncrement: true,
    primaryKey: true,
  },
  group: {
    type: Sequelize.INTEGER,
    allowNull: true,
    defaultValue: null
  }
});
// Groups list for studies 'A' and 'B'
const groupLists = {
  a: [0, 1, 1, 0],
  b: [1, 0, 1, 0]
};
const Study = sequelize.define('study', {
  id: {
    type: Sequelize.INTEGER,
    autoIncrement: true,
    primaryKey: true,
  },
  name: {
    type: Sequelize.STRING,
    allowNull: false
  },
  seatsTotal: {
    type: Sequelize.INTEGER,
    defaultValue: 0
  }
});
// n:m relation between users and studies
User.belongsToMany(Study, {through: 'UserStudy'});
Study.belongsToMany(User, {through: 'UserStudy'});
// Overridden 'addUser' method for groups assignment
Study.prototype.addUser = async function(user) {
  // Count already occupied seats
  const seatsTaken = await User.count({
    include: [{
      model: Study,
      where: {
        name: this.name
      }
    }]
  });
  // Add the user to the study
  await Study.associations.users.add(this, user);
  // Assign the group of the user based on seatsTaken
  await user.update({ group: groupLists[this.name][seatsTaken] });
};
sequelize.sync({force: true}).then(async () => {
  // Studies 'A' and 'B' with 4 seats
  await Study.bulkCreate([{name: 'a', seatsTotal: 4}, {name: 'b', seatsTotal: 4}]);
  // 8 users
  await User.bulkCreate(new Array(8).fill(0).map(() => ({})));
  const studies = await Study.findAll();
  const users = await User.findAll();
  // Assign half of the users to study 'A' at intervals
  users.filter((_, idx) => idx % 2 === 0).forEach((user, idx) => {
    setTimeout(() => {
      studies[0].addUser(user);
    }, 100*idx);
  });
  // Assign the other half to study 'B' at the same time
  await Promise.all(users.filter((_, idx) => idx % 2 === 1).map(user => {
    return studies[1].addUser(user);
  }));
  setTimeout(async () => {
    // Wait for all queries to finish and assert the results
    const userStudies = await User.findAll({
      include: [Study]
    });
    const studyUsersA = userStudies.filter(u => u.studies.some(s => s.name === 'a'));
    const studyUsersB = userStudies.filter(u => u.studies.some(s => s.name === 'b'));
    // Sort a copy so that groupLists itself is not mutated
    const sorted = list => [...list].sort((a, b) => a - b);
    try {
      console.log('Group list A actual:', studyUsersA.map(u => u.group), 'expected:', groupLists['a']);
      assert.deepStrictEqual(sorted(studyUsersA.map(u => u.group)), sorted(groupLists['a']), 'Group list A is not assigned correctly');
      console.log('Group list B actual:', studyUsersB.map(u => u.group), 'expected:', groupLists['b']);
      assert.deepStrictEqual(sorted(studyUsersB.map(u => u.group)), sorted(groupLists['b']), 'Group list B is not assigned correctly');
      console.log(`Passed: Group lists are assigned correctly.`);
    } catch (e) {
      console.log(`Failed: ${e.message}`);
    }
  }, 500);
});
The related questions I could find are either about incrementing a single value in one table or only mention transactions and locks without providing example code:
Avoiding race condition with Nodejs Sequelize
How to lock table in sequelize, wait until another request to be complete
Addition and Subtraction Assignment Operator With Sequelize
Database race conditions
Limitations:
The project stack is Node.js, Express.js and Sequelize, with a MySQL database for production and SQLite for development and tests.
The solution should work for both sqlite and mysql.
Preferably, the group lists are not stored in the database. The lists are generated by an algorithm from randomization seeds, but in the example code they are hardcoded.
The solution should be based on Sequelize, not on throttling or queuing user requests in the Express server.
For simultaneous requests, the exact signup order does not strictly need to be preserved, since it is not really verifiable which user was added to the study first, but the final result must contain the correct number of 0s and 1s (the assigned groups).
I've tried Sequelize transactions, but I had many issues with SQLite compatibility and with Express requests failing because of database locks, though that may have been due to my lack of knowledge of how to do it properly. The requirement here is that requests must not fail because of database locks.
The provided code is a minimal example reproducing the issue. Please use it as a base.
To run the code:
npm install sequelize sqlite3 mysql2
sqlite:
node index.js
mysql (using docker):
docker run -d --env MYSQL_DATABASE=database --env MYSQL_USER=username --env MYSQL_PASSWORD=password --env MYSQL_RANDOM_ROOT_PASSWORD=yes -p 3306:3306 mysql:5.7
DB_DIALECT=mysql node index.js
Note:
The example code only demonstrates the issue in the current implementation; the intervals and timeouts are there to simulate user interaction with the server. Please do not focus on the patterns in the example being wrong; instead, focus on the problem itself and how it can be approached better while meeting the requirements in the Limitations section.
This is part of a fairly big project, and I might update the requirements based on the actual project requirements and the feedback I receive here.
Please let me know if I should provide any other information. Thank you in advance.
I'm afraid this is expected behavior.
You declare seatsTaken as an asynchronously computed property.
You insert several users asynchronously, too.
You don't isolate each user creation within its own transaction.
Because of this, each addUser call sees a shared state that the other calls are changing at the same time, and it changes rather chaotically because you do not specify any particular order. Eventually the state becomes consistent, but the only way your code reaches that state is by waiting for some time.
I suppose the easiest way to achieve consistency would be to wrap each insertion (the count, the add and the update) in a transaction.
If a transaction per insert is too slow, you can bulk insert all user records in one transaction, and then count seats taken in another, or even just do everything synchronously.
In any case, if you want consistency, you need logical serialization, a clear "before-after" relation. Currently your code lacks it, AFAICT.
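The lack of a "before-after" relation can be reproduced without a database. The following is a minimal in-memory sketch, not Sequelize code: createStudy, addUserRacy, addUserSerialized and sameGroupCounts are hypothetical names, the 5 ms delay stands in for query latency, and a per-study promise chain stands in for the serialization a transaction would provide. Note that an in-process chain like this only orders calls inside a single Node.js process, so on its own it would not meet the question's constraints; in the real application the ordering has to come from the database.

```javascript
// In-memory stand-in (assumption, no database): each study keeps a list of
// member ids, playing the role of the UserStudy join table, plus assigned groups.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function createStudy(groupList) {
  return { groupList, members: [], groups: new Map() };
}

// Mirrors the original addUser: read the count, await some I/O, then write.
// Concurrent calls all read the count before any of them has written.
async function addUserRacy(study, userId) {
  const seatsTaken = study.members.length;               // like User.count(...)
  await delay(5);                                        // simulated query latency
  study.members.push(userId);                            // like associations.users.add(...)
  study.groups.set(userId, study.groupList[seatsTaken]); // like user.update(...)
}

// Serialized variant: each call on the same study waits for the previous one,
// so the count-then-write sequence never interleaves.
const pending = new Map(); // study -> promise of the last queued call

function addUserSerialized(study, userId) {
  const previous = pending.get(study) || Promise.resolve();
  const current = previous.then(() => addUserRacy(study, userId));
  // Ignore errors on the stored chain so one failure does not block later calls.
  pending.set(study, current.catch(() => {}));
  return current;
}

// Compare groups as multisets, since the order of concurrent signups is not verifiable.
function sameGroupCounts(actual, expected) {
  const sorted = (xs) => [...xs].sort((a, b) => a - b);
  return JSON.stringify(sorted(actual)) === JSON.stringify(sorted(expected));
}
```

With four concurrent calls on the list [1, 0, 1, 0], the racy version gives every user group 1, because every call reads seatsTaken = 0, while the serialized version reproduces the list in order.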
I have a table called "bsService" where I save the services I create. Those services have several relations, such as categories, activities and others.
I'm trying to get services whose categories were soft-deleted. For example: service 1 relates to category 1; I soft-deleted category 1, and now this service is not returned by findAll, even if I add 'withDeleted: true' to the query.
Here's my findAll method. I want all the data, even if a relation is soft-deleted.
findAll = async (
  where?: WhereConditions,
  transactionEntityManager: EntityManager = getManager(),
  order?: 'ASC' | 'DESC',
  withDeleted?: boolean,
): Promise<BSService[]> => transactionEntityManager.find(BSService, {
  withDeleted,
  where,
  relations: ['sla', 'activity', 'activity.category', 'department', 'department.company', 'attendance', 'logs', 'logs.user', 'requestingAgent', 'alocatedAgent', 'category', 'requestingAgent.jobs', 'alocatedAgent.jobs'],
  order: {
    updateAt: order,
  },
});
The withDeleted flag is set to true depending on which page the client is using. For this example, it is always true.
withDeleted in .find() only applies to the main entity, not to its relations.
Here's someone with a similar issue.
Here's a PR that shows how you can solve it by using a query builder with .withDeleted() before the relations you want to include. There's some discussion about supporting this with .find(), but it doesn't look likely to happen any time soon.
await manager
  .getRepository(BSService)
  .createQueryBuilder('bsservice')
  .withDeleted() // Must come before the innerJoinAndSelect calls
  .innerJoinAndSelect('bsservice.sla', 'sla')
  .innerJoinAndSelect('bsservice.activity', 'activity')
  // ... more inner joins ...
  .getMany();
I'm working with two tables in particular: Users and Friends. Users holds the information that defines a user, whereas Friends has two columns besides id, user_id and friend_id, both of which reference the Users table.
I'm trying to find all of a user's friends in as few database calls as possible, and I currently have three: one to retrieve the user's id from the request, a second to Friends to get the friend IDs matching that user, and a third that passes the array of friend IDs to find all of them in the Users table. This already feels like overkill, and I think there has to be a better way with associations.
Modifying the tables is unfortunately not an option.
I tried repurposing the code snippet under Relations/Associations in "http://docs.sequelizejs.com/manual/querying.html#relations---associations", but got an interesting error: "user is associated to friends multiple times. To identify the correct association, you must use the 'as' keyword to specify the alias of the association you want to include."
const Sequelize = require("sequelize");
const Op = Sequelize.Op;
// 1st call: the user
const userRecord = await User.findOne({
  where: { id }
});
// 2nd call: the IDs of the user's friends
const friendsIDs = await Friends.findAll({
  attributes: ["friend_id"],
  where: {
    user_id: userRecord.id
  }
}).then(results => results.map(result => result.friend_id));
// 3rd call: the friends themselves
return await User.findAll({
  where: {
    id: { [Op.in]: friendsIDs }
  },
});
The code above works for my use case. I'm just wondering whether there are ways to cut down the number of database calls.
It turns out Sequelize handles this for you if the proper associations are in place, so yes, it was a one-liner for me: user.getFriends().
I think I've done enough research on this subject and I've only got a headache.
Here is what I have done and understood so far: I restructured my MySQL database to keep my users' data in different tables, linked by foreign keys. So far I have concluded that foreign keys are used only for consistency and control and do not automate anything else (for example, to insert data about the same user in two tables I need two separate INSERT statements, and the foreign key does not make this automatic in any way).
Fine. Here is what I want to do: I want to use Sequelize to insert, update and retrieve data from all the related tables at once, and I have absolutely no idea how to do that. For example, when a user registers, I want to insert some user information into table "A" and, in the same operation, insert other data into table "B" (such as the user's settings in a dedicated table). The same goes for retrieval: I want to get an object (or array) with all the related data from different tables matching the criteria I search by.
The Sequelize documentation presents things so that everything depends on the previous section, and Sequelize is bloated with a lot of features I do not need. I do not want to use .sync(). I do not want to use migrations. My database structure already exists, and I want Sequelize to attach to it.
Is it possible to insert and retrieve several related rows at the same time using a single Sequelize command/object? How?
Again, by "related data" I mean data "linked" by sharing the same foreign key.
Is it possible to insert and retrieve several related rows at the same time using a single Sequelize command/object? How?
Yes. What you need is eager loading: pass include when querying, and pass the associated data together with include when creating.
Look at the following example:
const Sequelize = require('sequelize');
// Assumption: any configured instance works; in-memory SQLite is used only for this demo
const sequelize = new Sequelize('sqlite::memory:');
const User = sequelize.define('user', {
  username: Sequelize.STRING,
});
const Address = sequelize.define('add', {
  address: Sequelize.STRING,
});
const Designation = sequelize.define('designation', {
  designation: Sequelize.STRING,
});
User.hasOne(Address);
User.hasMany(Designation);
sequelize.sync({ force: true })
  // Create a user together with its associated rows in one call
  .then(() => User.create({
    username: 'test123',
    add: {
      address: 'this is dummy address'
    },
    designations: [
      { designation: 'designation1' },
      { designation: 'designation2' },
    ],
  }, { include: [Address, Designation] }))
  // Eager-load all users with their associated models
  .then(() => User.findAll({
    include: [Address, Designation],
  }))
  .then((result) => {
    console.log(result);
  });
The console.log output contains all the data, with every associated model you included in the query.
I'm using Sequelize in Node.js with Apollo-Server and Express.js.
When queries are nested more deeply, GraphQL loops over my models and runs a separate query by ID for each of them.
For example, if I get user(userId) > playthroughs > scores, this does a lookup for that user (no problem), then a lookup for all the playthroughs with that userId (still not a big deal), but then, to get the scores, it loops over each playthroughId and runs a completely separate query for each. This is ridiculously inefficient and makes my queries take far longer than they should.
Instead of looping:
SELECT scoreValue
FROM scores
WHERE playthroughId = id
I'd really like to grab the array myself and do that loop like this:
SELECT scoreValue
FROM scores
WHERE playthroughId IN (...ids)
This also happened when I used the reference GraphQL from Facebook last year, so I don't think it's specific to Apollo's implementation.
I'd like to know how I can tweak these queries so they're not taking such a performance hit.
Example resolvers:
const resolvers = {
  Query: {
    user: (_, values) => User.findOne(formatQuery(values))
      .then(getDataValues),
  },
  Playthrough: {
    score: ({ playthroughId }) => Score.findOne(formatQuery({ playthroughId }))
      .then(getDataValues),
  },
  User: {
    playthroughs: ({ userId }, { take }) => Playthrough.findAll(formatQuery({ userId, take, order: 'playthroughId DESC' }))
      .then(getAllDataValues),
  },
};
In addition to GraphQL, Facebook has also released a much lesser-known project, DataLoader.
What it does is batch several requests made in the same tick into one. So your code would look something like this:
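const DataLoader = require('dataloader');
const scoreLoader = new DataLoader(playthroughIds =>
  Score.findAll({ where: { playthroughId: playthroughIds } }).then(scores => {
    // Map the results back so they are in the same order as the keys
    const byPlaythroughId = new Map(scores.map(score => [score.playthroughId, score]));
    return playthroughIds.map(id => byPlaythroughId.get(id) || null);
  })
);
// In the Playthrough resolvers:
score: ({ playthroughId }) => scoreLoader.load(playthroughId).then(getDataValues)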
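To make "batch several requests in the same tick into one" concrete, here is a minimal plain-JavaScript sketch of the mechanism. It is not the dataloader package's implementation (which also caches and handles more scheduling cases): createBatchLoader, findScoresByPlaythroughIds and the in-memory scoresTable are hypothetical stand-ins, with the fake query playing the role of Score.findAll({ where: { playthroughId: ids } }).

```javascript
// Collects load() calls made synchronously in the same tick and resolves them
// with a single call to batchFn(keys).
function createBatchLoader(batchFn) {
  let queue = []; // pending { key, resolve, reject } entries for the current tick

  function flush() {
    const batch = queue;
    queue = [];
    const keys = batch.map((entry) => entry.key);
    Promise.resolve()
      .then(() => batchFn(keys))
      .then((values) => {
        // batchFn must return one value per key, in the same order as keys
        batch.forEach((entry, i) => entry.resolve(values[i]));
      })
      .catch((err) => batch.forEach((entry) => entry.reject(err)));
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        if (queue.length === 0) process.nextTick(flush); // schedule one flush per tick
        queue.push({ key, resolve, reject });
      });
    },
  };
}

// Hypothetical stand-in for the scores table; like "WHERE playthroughId IN (...)",
// the fake query returns matching rows in arbitrary order.
const scoresTable = [
  { playthroughId: 3, scoreValue: 30 },
  { playthroughId: 1, scoreValue: 10 },
  { playthroughId: 2, scoreValue: 20 },
];
let queryCount = 0;
async function findScoresByPlaythroughIds(ids) {
  queryCount += 1;
  return scoresTable.filter((row) => ids.includes(row.playthroughId));
}

// Batch function: one query for all keys, then rows mapped back to key order.
const scoreLoader = createBatchLoader(async (playthroughIds) => {
  const rows = await findScoresByPlaythroughIds(playthroughIds);
  const byId = new Map(rows.map((row) => [row.playthroughId, row]));
  return playthroughIds.map((id) => byId.get(id) || null);
});
```

Four load() calls issued together, for playthrough IDs 1 to 4, result in a single query, and the results come back aligned with the requested keys, with null for the ID that has no row.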
Of course, writing a loader for each field gets tedious. Instead, you can use dataloader-sequelize, which wraps all calls to association getters (e.g. Playthrough.getScores()) and calls to findOne / findById in DataLoader calls, so that several calls are batched into one.
Since you are building a GraphQL API backed by Sequelize, you might also be interested in https://github.com/mickhansen/graphql-sequelize/, which provides Sequelize-specific helpers for GraphQL and uses dataloader-sequelize under the hood.
From the reference below, I understood how to map a many-to-many relation with a relationship table:
http://sequelizejs.com/docs/latest/associations#many-to-many
User = sequelize.define('User', { user_name: Sequelize.STRING })
Project = sequelize.define('Project', { project_name: Sequelize.STRING })
UserProjects = sequelize.define('UserProjects', {
  status: DataTypes.STRING
})
User.hasMany(Project, { through: UserProjects })
Project.hasMany(User, { through: UserProjects })
But how do I query the Projects of a User?
I tried:
User.find({where: {id: 1}, include: [UserProjects]})
User.find({where: {id: 1}, include: [Projects]})
But I don't get any results.
Sequelize created the tables like this:
users(id,name)
projects(id,project_name)
userprojects(id,UserId,ProjectId)
I tried https://github.com/sequelize/sequelize/wiki/API-Reference-Associations#hasmanytarget-options
User.find({where: {id: 1}}).success(function(user) {
  user.getProjects().success(function(projects) {
    var p1 = projects[0]; // this works, but needs 2 queries; I'd expect a single find without getProjects
    p1.userprojects.started; // Is this project started yet?
  });
});
How do I get all the projects of a user?
You should be able to get all of a user's projects in two different ways: using includes, or getting the projects from a user instance.
Using includes, the code you submitted above is almost right. This method makes only one query to the database, using a JOIN. If you want all of the users with their corresponding projects, try:
User.findAll({include: [Project]})
You can also get the projects directly from a user instance. This will take two queries to the database. The code for this looks like
User.find(1).then(function(user) {
  user.getProjects().then(function(projects) {
    // do stuff with projects
  });
});
Does this work for you?