Sharding with sequelize.js

Sharding with sequelize.js - mysql

This is a question about implementing sharding through Sequelize.
Currently I have sharded databases that I need to support in my app, and this is giving me a lot of trouble. I tried to use Sequelize for this but did not succeed. I will explain the architecture:
Suppose I have 3 databases: one is a "meta db" that stores a map of user data (i.e. which shard each user's data is on), and two shards with identical structure that store the user data.
var MetaUser = sequelize.define("MetaUser", {
  shardId: { type: DataTypes.INTEGER(10), allowNull: false }
});
var User = sequelize.define("User", {
  name: { type: DataTypes.STRING, allowNull: false },
  about: { type: DataTypes.STRING(2048) }
});
The problem is that all these databases can be on different servers, so I need to create 3 different connections, i.e. 3 different Sequelize instances. The setup then looks like this (suppose dbConfig is read from a JSON file and the models are described in separate files):
var metaConnection = new Sequelize(dbConfig.meta.database,
dbConfig.meta.username,
dbConfig.meta.password,
dbConfig.meta.options);
module.exports.dbMeta = metaConnection;
var user1Connection = new Sequelize(dbConfig.shard1.database,
dbConfig.shard1.username,
dbConfig.shard1.password,
dbConfig.shard1.options);
module.exports.dbUser1 = user1Connection;
var user2Connection = new Sequelize(dbConfig.shard2.database,
dbConfig.shard2.username,
dbConfig.shard2.password,
dbConfig.shard2.options);
module.exports.dbUser2 = user2Connection;
module.exports.MetaUser = metaConnection.import(__dirname + "/models/MetaUser");
module.exports.User1 = user1Connection.import(__dirname + "/models/User");
module.exports.User2 = user2Connection.import(__dirname + "/models/User");
This gives me two problems:
I cannot reference one model from another inside a model's module, since I don't have access to the other model's connection there.
I want to use the User model in my project and have no need for MetaUser, but I have to go through MetaUser in order to work out which shard connection to use.
If someone has experienced similar problems, please point me to a solution.
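One way to approach both problems, offered as a minimal sketch rather than a definitive answer, is to hide the meta lookup behind a small router so the rest of the app only sees a User model bound to the right shard. The names makeShardRouter, userModelFor and findUser are hypothetical, and the sketch assumes that MetaUser rows are keyed by user id and that shardId values map to the shard models exported above. findByPk is the Sequelize primary-key lookup available in v5 and later.

```javascript
// Hypothetical helper: resolves the shard-specific User model for a user id.
// metaUserModel     - the MetaUser model bound to the meta connection
// userModelsByShard - plain object mapping shardId -> User model on that shard
function makeShardRouter(metaUserModel, userModelsByShard) {
  // Look up the shard for a user and return the User model on that shard.
  async function userModelFor(userId) {
    const meta = await metaUserModel.findByPk(userId);
    if (!meta) {
      throw new Error("No shard mapping for user " + userId);
    }
    const model = userModelsByShard[meta.shardId];
    if (!model) {
      throw new Error("Unknown shard " + meta.shardId);
    }
    return model;
  }

  // Convenience wrapper: fetch the user row from whichever shard holds it.
  async function findUser(userId) {
    const UserOnShard = await userModelFor(userId);
    return UserOnShard.findByPk(userId);
  }

  return { userModelFor, findUser };
}
```

With the exports above, this could be wired up as makeShardRouter(MetaUser, { 1: User1, 2: User2 }), assuming shard ids 1 and 2. Application code would then call findUser(id) and never touch MetaUser directly.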

Related

Do I need to use Prisma's connect & disconnect API? Or is it not safe to just update my Relations via IDs as strings?

I am using prisma + mysql (on planetscale). When I link two items that are in different tables, I normally use connect or disconnect:
const getUser = await prisma.user.update({
where: {
id: 9
},
data: {
posts: {
connect: {
id: 11
},
create: {
title: "My new post title"
}
}
}
})
I am wondering whether that's necessary or why that's necessary?
I also noticed that I can just update records in my database by updating the id (as a plain string), and it will still work. e.g.:
// example for updating a one-to-many relationship:
const getUser = await prisma.user.update({
where: {
id: 9
},
data: {
postId: "123192312i39123123"
}
})
... or if it's an explicit many-to-many relation, I can just edit the row in the relation-table & update the id.
Is this a bad way of doing things? Am I going to break something later down the line in doing it this way?

Your cloud provider is not relevant to this question. It does not affect how your framework (Prisma) handles updates.
I am wondering whether that's necessary or why that's necessary?
You have a user with a one to many relation: user => n posts.
You have an existing post in the db, and you want to add that post to the posts collection of a user.
That posts relation can be either explicit or implicit. The connect clause handles the addition of relation:
{
posts: {
connect: { id: 11 }
}
}
Without using the connect you'd have to create a new post:
{
posts: {
create: {
title: "My new post title"
}
}
}
update records in my database by updating the id (as a plain string)
Not sure what you mean here, mind sharing the schema?
or if it's an explicit many-to-many relation, I can just edit the row in the relation-table & update the id
If it's explicit many-to-many then it's OK to manually edit the id fields. As long as the ids are found and the relation makes sense, there's no problem with manual updates.
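To make the two styles concrete, here is a small sketch. It assumes a hypothetical schema in which Post has a scalar foreign key authorId pointing at User.id; that field name and the two helper functions are not from the question. Prisma generally allows a relation's scalar foreign key to be written directly, which targets the same column that connect sets.

```javascript
// Link an existing post to a user through the relation API.
function linkPostWithConnect(prisma, userId, postId) {
  return prisma.user.update({
    where: { id: userId },
    data: { posts: { connect: { id: postId } } }
  });
}

// Link the same post by writing the foreign key on the post row directly.
// Assumes Post has a scalar field `authorId` (hypothetical name).
function linkPostWithForeignKey(prisma, userId, postId) {
  return prisma.post.update({
    where: { id: postId },
    data: { authorId: userId }
  });
}
```

Both helpers take the Prisma client as a parameter, so either can be called as linkPostWithConnect(prisma, 9, 11) or linkPostWithForeignKey(prisma, 9, 11).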

Race condition in sequelize queries on multiple tables

Problem:
I'm working on a project that consists of multiple studies and a group of users that each participates in one of the studies. Every study divides the participants in two groups based on a list that is generated using some randomization algorithm. Upon signing up each user is assigned to a study and their group is determined by the order of signup and the corresponding index in the groups list. For example if study A has total seats of 4 and the groups list is [0, 1, 1, 0] the first user is assigned to group 0, the second one to 1 and so on until the study is full.
There are other user roles defined in the project which are the admins and can be assigned to multiple studies without occupying a position in the study. That means the relation of users to studies is n:m.
The problem that occurs in the current implementation is a race condition when assigning users to studies and study groups. The code is provided below and the way it works is that it overrides the addUser of Study model and whenever a user is added to a study it checks how many users are already in the study and gives the user the current index of the group list which is the seatsTaken number. This works so long as the users are added to the study in intervals. But whenever multiple users are added at the same time the asynchronous queries cause a race condition and the seatsTaken count is affected by other users signing up at the same time.
In the example below the users which are assigned to study A in intervals have the correct groups assigned but study B with simultaneous queries has incorrect groups assignment.
const Sequelize = require('sequelize');
const assert = require('assert');
const sequelize = new Sequelize({
database: 'database',
username: 'username',
password: 'password',
dialect: process.env.DB_DIALECT || 'sqlite',
storage: 'db.sqlite',
logging: false
});
const User = sequelize.define('user', {
id: {
type: Sequelize.INTEGER,
autoIncrement: true,
primaryKey: true,
},
group: {
type: Sequelize.INTEGER,
allowNull: true,
defaultValue: null
}
});
// Groups list for studies 'A' and 'B'
const groupLists = {
a: [0, 1, 1, 0],
b: [1, 0, 1, 0]
}
const Study = sequelize.define('study', {
id: {
type: Sequelize.INTEGER,
autoIncrement: true,
primaryKey: true,
},
name: {
type: Sequelize.STRING,
allowNull: false
},
seatsTotal: {
type: Sequelize.INTEGER,
defaultValue: 0
}
});
// n:m relation between users and studies
User.belongsToMany(Study, {through: 'UserStudy'});
Study.belongsToMany(User, {through: 'UserStudy'});
// Overridden 'addUser' method for groups assignment
Study.prototype.addUser = async function(user) {
// Count already occupied seats
const seatsTaken = await User.count({
include: [{
model: Study,
where: {
name: this.name
}
}]
});
// Add the user to study
await Study.associations.users.add(this, user);
// Assign the group of the user based on the seatsTaken
await user.update({ group: groupLists[this.name][seatsTaken] });
}
sequelize.sync({force: true}).then(async () => {
// Studies 'A' and 'B' with 4 seats
await Study.bulkCreate([{name: 'a', seatsTotal: 4}, {name: 'b', seatsTotal: 4}]);
// 8 users
await User.bulkCreate(new Array(8).fill(0).map(() => ({})));
const studies = await Study.findAll();
const users = await User.findAll();
// Assign half of the users to study 'A' in intervals
users.filter((_, idx) => idx % 2 === 0).forEach((user, idx) => {
setTimeout(() => {
studies[0].addUser(user);
}, 100*idx);
});
// Assign the other half to study 'B' at the same time
await Promise.all(users.filter((_, idx) => idx % 2 === 1).map(user => {
return studies[1].addUser(user);
}));
setTimeout(async () => {
// Wait for all queries to finish and assert the results
const userStudies = await User.findAll({
include: [Study]
});
const studyUsersA = userStudies.filter(u => u.studies.some(s => s.name === 'a'));
const studyUsersB = userStudies.filter(u => u.studies.some(s => s.name === 'b'));
try {
console.log('Group list A actual:', studyUsersA.map(u => u.group), 'expected:', groupLists['a']);
assert.deepEqual(studyUsersA.map(u => u.group).sort((a, b) => a-b), groupLists['a'].sort((a, b) => a-b), 'Group list A is not assigned correctly');
console.log('Group list B actual:', studyUsersB.map(u => u.group), 'expected:', groupLists['b']);
assert.deepEqual(studyUsersB.map(u => u.group).sort((a, b) => a-b), groupLists['b'].sort((a, b) => a-b), 'Group list B is not assigned correctly');
console.log(`Passed: Group lists are assigned correctly.`);
} catch (e) {
console.log(`Failed: ${e.message}`);
}
}, 500);
});
Related questions that I could find are either about incrementing one value in one table or they just mention transactions and locks without providing example code:
Avoiding race condition with Nodejs Sequelize
How to lock table in sequelize, wait until another request to be complete
Addition and Subtraction Assignment Operator With Sequelize
Database race conditions
Limitations:
The project stack is nodejs, expressjs and sequelize with mysql database for production and sqlite for development and tests.
The solution should work for both sqlite and mysql.
It is preferred that the groups lists are not stored in the database. The lists are generated by an algorithm and randomization seeds but in the example code they're hardcoded.
The solution should be a sequelize solution and not throttling or queuing user requests in express server.
In the case of simultaneous requests it is not strictly required that the exact order of users signup is preserved since it's not really verifiable which user is added to the study first, but the final result must have the correct number of 0s and 1s which are the assigned groups.
I've tried sequelize transactions but I had lots of issues with sqlite compatibilities and also express requests failing because of database locks but that might have been because of my lack of knowledge on how to do it properly. The limitation here is that requests should not fail because of database locks.
The provided code is a minimal example reproducing the issue. Please use it as a base.
To run the code
npm install sequelize sqlite3 mysql2
sqlite:
node index.js
mysql (using docker):
docker run -d --env MYSQL_DATABASE=database --env MYSQL_USER=username --env MYSQL_PASSWORD=password --env MYSQL_RANDOM_ROOT_PASSWORD=yes -p 3306:3306 mysql:5.7
DB_DIALECT=mysql node index.js
Note:
The example code is only for demonstration of the issue in the current implementation and the intervals and timeouts are there to simulate the users interaction with the server. Please do not focus on the patterns in the example being wrong and rather focus on the problem itself and how it can be approached in a better way while meeting the requirements mentioned in the limitations section.
This is part of a fairly big project and I might update the requirements based on the actual project requirements and the feedback that I receive here.
Please let me know if I should provide any other information. Thank you in advance.

I'm afraid this is expected behavior.
You declare seatsTaken as an asynchronously computed property.
You insert several users asynchronously, too.
You don't isolate each user creation within its own transaction.
Because of this, you see the changing state of one transaction, and it's changing rather chaotically because you do not specify any certain order. Eventually the state becomes consistent, but your way to achieve that state was just to wait for some time.
I suppose the easiest way to achieve consistency would be wrapping each insertion in a transaction.
If a transaction per insert is too slow, you can bulk insert all user records in one transaction, and then count seats taken in another, or even just do everything synchronously.
In any case, if you want consistency, you need logical serialization, a clear "before-after" relation. Currently your code lacks it, AFAICT.
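As a rough sketch of the transaction-per-insert suggestion, under several assumptions: the helper name addUserInTransaction is hypothetical, and sequelize, Study and groupLists are the objects from the question, passed in as parameters. The row lock (lock: t.LOCK.UPDATE) is what would make concurrent sign-ups for the same study wait for each other on MySQL. SQLite has no row locks, so the behavior there depends on its database-level locking, and this sketch has not been verified against the question's test script.

```javascript
// Hypothetical helper: counts seats and assigns the group inside one transaction.
async function addUserInTransaction(sequelize, Study, study, user, groupLists) {
  return sequelize.transaction(async (t) => {
    // Lock the study row so concurrent sign-ups for this study are serialized
    // (a SELECT ... FOR UPDATE on MySQL).
    const locked = await Study.findByPk(study.id, {
      transaction: t,
      lock: t.LOCK.UPDATE
    });
    // Count the seats already taken, inside the same transaction.
    const seatsTaken = await locked.countUsers({ transaction: t });
    // Same association call as in the question, now bound to the transaction.
    await Study.associations.users.add(locked, user, { transaction: t });
    // Assign the group that corresponds to this seat index.
    await user.update(
      { group: groupLists[locked.name][seatsTaken] },
      { transaction: t }
    );
    return seatsTaken;
  });
}
```

The overridden Study.prototype.addUser from the question could delegate to this helper. The important part is that the count, the insert and the group update all happen while the study row is locked, which gives the "before-after" relation described above.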

Hooks not triggering when inserting raw queries via sequelize.query()

I have the following Employee model for a MySQL database:
var bcrypt = require('bcrypt');
module.exports = (sequelize, DataTypes) => {
const Employee = sequelize.define(
"Employee",
{
username: DataTypes.STRING,
password: DataTypes.STRING,
}, {}
);
return Employee;
};
Seeding the database is done by reading a .sql file containing 10,000+ employees via raw queries:
sequelize.query(mySeedingSqlFileHere);
The problem is that the passwords in the SQL file are plain text and I'd like to use bcrypt to hash them before inserting into the database. I've never done bulk inserts before so I was looking into Sequelize docs for adding a hook to the Employee model, like so:
hooks: {
beforeBulkCreate: (employees, options) => {
for (employee in employees) {
if (employee.password) {
employee.password = await bcrypt.hash(employee.password, 10);
}
}
}
}
This isn't working as I'm still getting the plain text values after reseeding - should I be looking into another way? I was looking into sequelize capitalize name before saving in database - instance hook

Hooks are only called when you use the model's own functions for DB operations, so if you run a raw query, hooks will never be fired.
Reason: a raw query can contain anything (select/insert/update/delete), so Sequelize has no way of knowing which hooks it should fire. Hooks only fire when you use methods like
Model.create();
Model.bulkCreate();
Model.update();
Model.destroy();
As per the docs, raw queries have no hooks option.
For model queries, you can check that they have an option to enable/disable hooks.
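If the seed data can be loaded as rows rather than as raw SQL, a hook along these lines would hash the passwords during Employee.bulkCreate(). This is a sketch under assumptions, not the asker's final code: makeBeforeBulkCreate is a hypothetical factory, and the hash function is passed in so that bcrypt's promise-returning hash(password, saltRounds) can be plugged in. Compared with the hook in the question, the function is async so that await is legal, and it uses for...of, because for...in would iterate over array indexes rather than the employee instances.

```javascript
// Hypothetical factory: builds a beforeBulkCreate hook around a hash function.
// hash(plainText) must return a promise that resolves to the hashed value.
function makeBeforeBulkCreate(hash) {
  return async function beforeBulkCreate(employees, options) {
    for (const employee of employees) {
      // Only hash rows that actually carry a password.
      if (employee.password) {
        employee.password = await hash(employee.password);
      }
    }
  };
}
```

In the model definition this could be registered as hooks: { beforeBulkCreate: makeBeforeBulkCreate((pw) => bcrypt.hash(pw, 10)) }, with the rows then inserted through Employee.bulkCreate(rows) instead of sequelize.query().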

How do you insert / find rows related by foreign keys from different tables using Sequelize?

I think I've done enough research on this subject and I've only got a headache.
Here is what I have done and understood: I have restructured my MySQL database so that I will keep my user's data in different tables, I am using foreign keys. Until now I only concluded that foreign keys are only used for consistency and control and they do not automatize or do anything else (for example, to insert data about the same user in two tables I need to use two separate insert statements and the foreign key will not help to make this different or automatic in some way).
Fine. Here is what I want to do: I want to use Sequelize to insert, update and retrieve data altogether from all the related tables at once and I have absolutely no idea on how to do that. For example, if a user registers, I want to be able to insert the data in the table "A" containing some user information and in the same task insert in the table B some other data (like the user's settings in the dedicated table or whatever). Same with retrievals, I want to be able to get an object (or array) with all the related data from different tables fitting in the criteria I want to find by.
Sequelize documentation covers the things in a way that every thing depends on the previous one, and Sequelize is pretty bloated with a lot of stuff I do not need. I do not want to use .sync(). I do not want to use migrations. I have the structure of my database created already and I want Sequelize to attach to it.
Is it possible to insert and retrieve several related rows at the same time using a single Sequelize command / object? How?
Again, by "related data" I mean data "linked" by sharing the same foreign key.

Is it possible to insert and retrieve several related rows at the same time using a single Sequelize command / object? How?
Yes. What you need is eager loading.
Look at the following example
const User = sequelize.define('user', {
username: Sequelize.STRING,
});
const Address = sequelize.define('add', {
address: Sequelize.STRING,
});
const Designation = sequelize.define('designation', {
designation: Sequelize.STRING,
});
User.hasOne(Address);
User.hasMany(Designation);
sequelize.sync({ force: true })
.then(() => User.create({
username: 'test123',
add: {
address: 'this is dummy address'
},
designations: [
{ designation: 'designation1' },
{ designation: 'designation2' },
],
}, { include: [Address, Designation] }))
.then(user => {
User.findAll({
include: [Address, Designation],
}).then((result) => {
console.log(result);
});
});
In the console.log output, you will get all the data together with the associated models that you included in the query.

How to get relationship / association in sequelizejs ORM

From the reference below I understood how to map many-to-many with a relationship table:
http://sequelizejs.com/docs/latest/associations#many-to-many
User = sequelize.define('User', { user_name : Sequelize.STRING})
Project = sequelize.define('Project', { project_name : Sequelize.STRING })
UserProjects = sequelize.define('UserProjects', {
status: DataTypes.STRING
})
User.hasMany(Project, { through: UserProjects })
Project.hasMany(User, { through: UserProjects })
But how do I query the Projects of a User?
I tried:
User.find({where:{id:1},include,[UserProjects]})
User.find({where:{id:1},include,[Projects]})
But I don't get results.
Sequelize created tables like below:
users(id,name)
projects(id,project_name)
userprojects(id,UserId,ProjectId)
I tried https://github.com/sequelize/sequelize/wiki/API-Reference-Associations#hasmanytarget-options
User.find({where:{id:1}}).success(function(user){
user.getProjects().success(function (projects) {
var p1 = projects[0] // this works fine but 2 queries required. I expect in single find. without getProjects
p1.userprojects.started // Is this project started yet?
})
})
How do I get all the projects of a user?

You should be able to get all of the properties of the user in two different ways: using includes and getting the projects from a user instance.
Using includes, the code you submitted above is almost right. This method makes only one query to the database, using a JOIN. If you want all of the users with their corresponding projects, try:
User.findAll({include: [Project]})
You can also get the projects directly from a user instance. This will take two queries to the database. The code for this looks like
User.find(1).then(function(user) {
user.getProjects().then(function(projects) {
// do stuff with projects
});
});
Does this work for you?
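For the single-user case from the question, the same include can be combined with a where clause. The sketch below assumes a more recent Sequelize than the question uses, where many-to-many associations are declared with belongsToMany and findOne replaces the older find. The helper name findUserWithProjects is hypothetical, and the models are passed in as parameters.

```javascript
// Assumed association setup in recent Sequelize versions:
//   User.belongsToMany(Project, { through: UserProjects });
//   Project.belongsToMany(User, { through: UserProjects });

// Hypothetical helper: loads one user together with its projects in one JOIN query.
function findUserWithProjects(User, Project, userId) {
  return User.findOne({
    where: { id: userId },
    include: [Project]
  });
}
```

With the model names above, the resolved user should carry its projects as an array property (user.Projects), and each project should expose its join row, including status, under the through model's name.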
