What is the idiomatic, performant way to resolve related objects? - relational-database

How do you write query resolvers in GraphQL that perform well against a relational database?
Using the example schema from this tutorial, let's say I have a simple database with users and stories. Users can author multiple stories but stories only have one user as their author (for simplicity).
When querying for a user, one might also want to get a list of all stories authored by that user. One possible definition of a GraphQL query to handle that (stolen from the tutorial linked above):
const Query = new GraphQLObjectType({
name: 'Query',
fields: () => ({
user: {
type: User,
args: {
id: {
type: new GraphQLNonNull(GraphQLID)
}
},
resolve(parent, {id}, {db}) {
return db.get(`
SELECT * FROM User WHERE id = $id
`, {$id: id});
}
},
})
});
const User = new GraphQLObjectType({
name: 'User',
fields: () => ({
id: {
type: GraphQLID
},
name: {
type: GraphQLString
},
stories: {
type: new GraphQLList(Story),
resolve(parent, args, {db}) {
return db.all(`
SELECT * FROM Story WHERE author = $user
`, {$user: parent.id});
}
}
})
});
This will work as expected; if I query a specific user, I'll be able to get that user's stories as well if needed. However, this does not perform ideally. It requires two trips to the database, when a single query with a JOIN would have sufficed. The problem is amplified if I query multiple users: every additional user results in an additional database query. It gets exponentially worse the deeper I traverse my object relationships.
Has this problem been solved? Is there a way to write a query resolver that won't result in inefficient SQL queries being generated?

There are two approaches to this kind of problem.
One approach, that is used by Facebook, is to enqueue requests happening in one tick and combine them together before sending. This way instead of doing a request for each user, you can do one request to retrieve information about several users. Dan Schafer wrote a good comment explaining this approach. Facebook released Dataloader, which is an example implementation of this technique.
// Pass this to graphql-js context
const storyLoader = new DataLoader((authorIds) => {
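return db.all(
// Use placeholders instead of interpolating the ids into the SQL string
`SELECT * FROM Story WHERE author IN (${authorIds.map(() => '?').join(',')})`,
authorIds
).then((rows) => {
// Group rows by author, then return them in the same order as authorIds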
const result = {};
for (const row of rows) {
const existing = result[row.author] || [];
existing.push(row);
result[row.author] = existing;
}
const array = [];
for (const author of authorIds) {
array.push(result[author] || []);
}
return array;
});
});
// Then use dataloader in your type
const User = new GraphQLObjectType({
name: 'User',
fields: () => ({
id: {
type: GraphQLID
},
name: {
type: GraphQLString
},
stories: {
type: new GraphQLList(Story),
resolve(parent, args, {rootValue: {storyLoader}}) {
return storyLoader.load(parent.id);
}
}
})
});
While this doesn't resolve to efficient SQL, it still might be good enough for many use cases and will make stuff run faster. It's also a good approach for non-relational databases that don't allow JOINs.
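The part of the batch function that is easiest to get wrong is the grouping step: DataLoader expects the returned array to contain one entry per key, in the same order as the keys. Below is a minimal sketch of that step as plain functions, assuming rows carry an `author` column as in the example above. `buildStoryQuery` and `groupStoriesByAuthor` are hypothetical helper names, not part of DataLoader or any database driver.

```javascript
// Hypothetical helpers, not part of DataLoader or any database driver.

// Build a parameterized IN query so ids are never interpolated into the SQL.
function buildStoryQuery(authorIds) {
  const placeholders = authorIds.map(() => '?').join(', ');
  return {
    sql: `SELECT * FROM Story WHERE author IN (${placeholders})`,
    params: [...authorIds],
  };
}

// Return one array of stories per author id, in the same order as authorIds.
function groupStoriesByAuthor(authorIds, rows) {
  const byAuthor = new Map();
  for (const row of rows) {
    // Normalize keys to strings so 1 and '1' refer to the same author.
    const key = String(row.author);
    if (!byAuthor.has(key)) {
      byAuthor.set(key, []);
    }
    byAuthor.get(key).push(row);
  }
  // Authors without stories get an empty array rather than undefined.
  return authorIds.map((id) => byAuthor.get(String(id)) || []);
}
```

Inside the batch function one would run the query from `buildStoryQuery(authorIds)` and pass the resulting rows through `groupStoriesByAuthor(authorIds, rows)`.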
Another approach is to use the information about requested fields in the resolve function to use JOIN when it is relevant. Resolve context has fieldASTs field which has parsed AST of the currently resolved query part. By looking through the children of that AST (selectionSet), we can predict whether we need a join. A very simplified and clunky example:
const User = new GraphQLObjectType({
name: 'User',
fields: () => ({
id: {
type: GraphQLID
},
name: {
type: GraphQLString
},
stories: {
type: new GraphQLList(Story),
resolve(parent, args, {rootValue: {db}}) {
// if stories were pre-fetched use that
if (parent.stories) {
return parent.stories;
} else {
// otherwise request them normally
return db.all(`
SELECT * FROM Story WHERE author = $user
`, {$user: parent.id});
}
}
}
})
});
const Query = new GraphQLObjectType({
name: 'Query',
fields: () => ({
user: {
type: User,
args: {
id: {
type: new GraphQLNonNull(GraphQLID)
}
},
resolve(parent, {id}, {rootValue: {db}, fieldASTs}) {
// find names of all child fields
const childFields = fieldASTs[0].selectionSet.selections.map(
(set) => set.name.value
);
if (childFields.includes('stories')) {
// use join to optimize
return db.all(`
SELECT * FROM User INNER JOIN Story ON User.id = Story.author WHERE User.id = $id
`, {$id: id}).then((rows) => {
if (rows.length > 0) {
return {
id: rows[0].author,
name: rows[0].name,
stories: rows
};
} else {
return db.get(`
SELECT * FROM User WHERE id = $id
`, {$id: id}
);
}
});
} else {
return db.get(`
SELECT * FROM User WHERE id = $id
`, {$id: id}
);
}
}
},
})
});
Note that this could have problems with, e.g., fragments. However, one can handle them too; it's just a matter of inspecting the selection set in more detail.
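As a rough sketch of what "inspecting the selection set in more detail" could look like, the function below walks a selection set and also follows inline fragments and named fragment spreads. It assumes the AST node kinds used by graphql-js (`Field`, `InlineFragment`, `FragmentSpread`) and a `fragments` map from fragment name to definition, such as the one graphql-js exposes on the resolve info object. `collectFieldNames` is a hypothetical helper name.

```javascript
// Collect the names of all fields selected under a node, following inline
// fragments and named fragment spreads. `fragments` is assumed to map a
// fragment name to its definition node.
function collectFieldNames(selectionSet, fragments, names = new Set()) {
  if (!selectionSet) {
    return names;
  }
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      names.add(selection.name.value);
    } else if (selection.kind === 'InlineFragment') {
      collectFieldNames(selection.selectionSet, fragments, names);
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value];
      if (fragment) {
        collectFieldNames(fragment.selectionSet, fragments, names);
      }
    }
  }
  return names;
}
```

The JOIN decision then becomes a check like `collectFieldNames(...).has('stories')`. This sketch ignores type conditions and directives such as `@skip` and `@include`, so it may report a field that ends up not being resolved; that only costs an unnecessary JOIN, not a wrong result.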
There is currently a PR in the graphql-js repository that will allow writing more complex logic for query optimization by providing a 'resolve plan' in the context.

Related

How to update array of objects with same value in reference_id (FK Column) in Sequelize JS?

I have an array of objects which somewhat looks like this:
[
{
id: '5b29c08b-597c-460c-a3c7-ac8852b7a5dc',
option_text: 'njnj',
answer: false
},
{
id: '8ff5bda6-9335-495c-9c72-15ef258b899b',
option_text: 'jnjn',
answer: true
}
]
Here the answer column is interrelated: if any object's answer is set to true, the others will come as false from the frontend. So I have to update all the rows associated with the referenced id.
The problem I am facing is that the update query is not running, yet execution goes into the then block of the code instead of throwing an error. Below is my code:
// UPDATE Option
exports.updateOption = (req, res, next) => {
try {
console.log(req.body);
db.Option.update(req.body, {
where: { question_id: req.params.id }
}).then(() => {
console.log('A');
return res.status(200).send(errors.UPDATED_SUCESSFULLY);
}).catch(err => {
console.log('B');
return res.status(204).send(errors.INTERNAL_SERVER);
});
} catch(err) {
console.log('C');
return res.status(204).send(errors.INTERNAL_SERVER);
}
};
Sample Table Data for the same:
What I am thinking is to first set the answer column to false for all the rows associated with the same question_id, and then update the particular row which has answer set to true.
But is this a good approach, or can anyone suggest a better solution?
You should execute all updates in the same transaction (to avoid inconsistencies in DB):
sequelize.transaction(async transaction => {
const options = req.body;
for (const option of options) {
await db.Option.update(option, {
// match the individual row, not every option of the question
where: { id: option.id, question_id: req.params.id },
transaction
});
}
}).then(...

Sequelize with MYSQL: Raw query returns a "duplicate" result

I have this method that performs a raw query:
Friendship.getFriends= async (userId)=>{
const result = await sequelize.query(`select id,email from users where
users.id in(SELECT friendId FROM friendships where friendships.userId =
${userId})`);
return result;
};
The result seems to contain the same exact data, but twice:
[ [ TextRow { id: 6, email: 'example3@gmail.com' },
TextRow { id: 1, email: 'yoyo@gmail.com' } ],
[ TextRow { id: 6, email: 'example3@gmail.com' },
TextRow { id: 1, email: 'yoyo@gmail.com' } ] ]
Only two records should actually be found by this query (ids 1 and 6), yet it returns an array with the same records twice.
Can somebody explain to me what's going on here?
Edit: the models:
module.exports = (sequelize, DataTypes) => {
const User = sequelize.define('User', {
email: { type: DataTypes.STRING, unique: true },
password: DataTypes.STRING,
isActive:{type:DataTypes.BOOLEAN,defaultValue:true}
});
module.exports = (sequelize, DataTypes) => {
const Friendship = sequelize.define('Friendship', {
userId: DataTypes.INTEGER,
friendId: DataTypes.INTEGER,
});
Try to set query type to SELECT in the second argument.
sequelize.query(queryString, {type: sequelize.QueryTypes.SELECT})
I am not sure, but try the code below.
const result = await sequelize.query(`select id,email from users where
users.id in(SELECT friendId FROM friendships where friendships.userId =
${userId})`, {type: sequelize.QueryTypes.SELECT});
One more thing: use a JOIN instead of IN.
TL;DR;
Use a query type to avoid duplicates being returned when the database is either MySQL or MSSQL
sequelize.query(yourQuery, {type: QueryTypes.SELECT})
Explanation
I'm answering this question because the two answers available up to this day do not explain why this happens.
As per the documentation of Sequelize:
By default the function will return two arguments - a results array, and an object containing metadata (such as amount of affected rows, etc). Note that since this is a raw query, the metadata are dialect specific. Some dialects return the metadata "within" the results object (as properties on an array). However, two arguments will always be returned, but for MSSQL and MySQL it will be two references to the same object.
To avoid this behaviour you may tell Sequelize how to format the results using a query type as shown above.
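To make the quoted behaviour concrete, here is a small sketch that mimics the described result shape without touching a database. `fakeMysqlQuery` is a hypothetical stand-in for calling `sequelize.query` without a query type; it is not a Sequelize API, and it is synchronous only to keep the example short (the real call returns a promise).

```javascript
// Hypothetical stand-in that mimics the result shape the documentation
// describes for MySQL/MSSQL: a two-element array whose entries are two
// references to the same rows array.
function fakeMysqlQuery() {
  const rows = [
    { id: 6, email: 'a@example.com' },
    { id: 1, email: 'b@example.com' },
  ];
  return [rows, rows];
}

const result = fakeMysqlQuery();

// Destructuring keeps only the first element, i.e. the actual rows.
const [rows] = result;
```

With a real connection, the equivalent would be `const [rows] = await sequelize.query(sql)`. This is an alternative to passing a query type, not a replacement for it.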

Nested collection in models Sails.js [duplicate]

I've got a question regarding associations in Sails.js version 0.10-rc5. I've been building an app in which multiple models are associated with one another, and I've arrived at a point where I need to nest associations somehow.
There are three parts:
First there's something like a blog post, written by a user. In the blog post I want to show the associated user's information, like their username. Everything works fine here, until the next step: I'm trying to show the comments associated with the post.
The comments are a separate model, called Comment, each of which also has an author (user) associated with it. I can easily show a list of the comments, but when I want to display the user's information associated with a comment, I can't figure out how to populate the Comment with the user's information.
In my controller I'm trying to do something like this:
Post
.findOne(req.param('id'))
.populate('user')
.populate('comments') // I want to populate this comment with .populate('user') or something
.exec(function(err, post) {
// Handle errors & render view etc.
});
In my Post's 'show' action I'm trying to retrieve the information like this (simplified):
<ul>
<% _.each(post.comments, function(comment) { %>
<li>
<%= comment.user.name %>
<%= comment.description %>
</li>
<% }); %>
</ul>
The comment.user.name will be undefined though. If I try to just access the 'user' property, like comment.user, it'll show its ID, which tells me it's not automatically populating the user's information into the comment when I associate the comment with another model.
Does anyone have any ideas on how to solve this properly? :)
Thanks in advance!
P.S.
For clarification, this is basically how I've set up the associations in the different models:
// User.js
posts: {
collection: 'post'
},
hours: {
collection: 'hour'
},
comments: {
collection: 'comment'
}
// Post.js
user: {
model: 'user'
},
comments: {
collection: 'comment',
via: 'post'
}
// Comment.js
user: {
model: 'user'
},
post: {
model: 'post'
}
Or you can use the built-in Bluebird promise feature to do it. (Working on Sails@v0.10.5)
See the code below:
var _ = require('lodash');
...
Post
.findOne(req.param('id'))
.populate('user')
.populate('comments')
.then(function(post) {
var commentUsers = User.find({
id: _.pluck(post.comments, 'user')
//_.pluck: Retrieves the value of a 'user' property from all elements in the post.comments collection.
})
.then(function(commentUsers) {
return commentUsers;
});
return [post, commentUsers];
})
.spread(function(post, commentUsers) {
commentUsers = _.indexBy(commentUsers, 'id');
//_.indexBy: Creates an object composed of keys generated from the results of running each element of the collection through the given callback. The corresponding value of each key is the last element responsible for generating the key
post.comments = _.map(post.comments, function(comment) {
comment.user = commentUsers[comment.user];
return comment;
});
res.json(post);
})
.catch(function(err) {
return res.serverError(err);
});
Some explanation:
I'm using Lo-Dash to deal with the arrays. For more details, please refer to the official docs.
Notice the return value of the first "then" function: the array "[post, commentUsers]" can contain "promise" objects (commentUsers is one here), which means they don't hold the actual data when the function returns. The "spread" function waits for the actual values to arrive and then continues with the rest.
At the moment, there's no built in way to populate nested associations. Your best bet is to use async to do a mapping:
async.auto({
// First get the post
post: function(cb) {
Post
.findOne(req.param('id'))
.populate('user')
.populate('comments')
.exec(cb);
},
// Then all of the comment users, using an "in" query by
// setting "id" criteria to an array of user IDs
commentUsers: ['post', function(cb, results) {
User.find({id: _.pluck(results.post.comments, 'user')}).exec(cb);
}],
// Map the comment users to their comments
map: ['commentUsers', function(cb, results) {
// Index comment users by ID
var commentUsers = _.indexBy(results.commentUsers, 'id');
// Get a plain object version of post & comments
var post = results.post.toObject();
// Map users onto comments
post.comments = post.comments.map(function(comment) {
comment.user = commentUsers[comment.user];
return comment;
});
return cb(null, post);
}]
},
// After all the async magic is finished, return the mapped result
// (or an error if any occurred during the async block)
function finish(err, results) {
if (err) {return res.serverError(err);}
return res.json(results.map);
}
);
It's not as pretty as nested population (which is in the works, but probably not for v0.10), but on the bright side it's actually fairly efficient.
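The "index users by ID, then map them onto comments" step does not depend on lodash or on Sails itself. A minimal sketch of that step in plain JavaScript, assuming each comment stores its author's id in `user` as in the models above; `collectUserIds` and `attachUsersToComments` are hypothetical helper names:

```javascript
// Unique user ids, suitable for a single "in" query such as User.find({id: ids}).
function collectUserIds(comments) {
  return [...new Set(comments.map((comment) => comment.user))];
}

// Replace each comment's user id with the matching user record.
// Returns new comment objects; the inputs are left untouched.
function attachUsersToComments(comments, users) {
  const usersById = new Map(users.map((user) => [user.id, user]));
  return comments.map((comment) => ({
    ...comment,
    // Keep the raw id if no matching user was fetched.
    user: usersById.has(comment.user) ? usersById.get(comment.user) : comment.user,
  }));
}
```

Building the lookup once keeps the whole operation at one extra query plus a linear pass over the comments, regardless of how many comments share an author.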
I created an NPM module for this called nested-pop. You can find it at the link below.
https://www.npmjs.com/package/nested-pop
Use it in the following way.
var nestedPop = require('nested-pop');
User.find()
.populate('dogs')
.then(function(users) {
return nestedPop(users, {
dogs: [
'breed'
]
}).then(function(users) {
return users
}).catch(function(err) {
throw err;
});
}).catch(function(err) {
throw err;
});
It's worth saying that there's a pull request to add nested population: https://github.com/balderdashy/waterline/pull/1052
The pull request isn't merged at the moment, but you can use it by installing it directly with
npm i Atlantis-Software/waterline#deepPopulate
With it you can do something like .populate('user.comments ...').
Sails v0.11 doesn't support _.pluck and _.indexBy; use sails.util.pluck and sails.util.indexBy instead.
async.auto({
// First get the post
post: function(cb) {
Post
.findOne(req.param('id'))
.populate('user')
.populate('comments')
.exec(cb);
},
// Then all of the comment users, using an "in" query by
// setting "id" criteria to an array of user IDs
commentUsers: ['post', function(cb, results) {
User.find({id:sails.util.pluck(results.post.comments, 'user')}).exec(cb);
}],
// Map the comment users to their comments
map: ['commentUsers', function(cb, results) {
// Index comment users by ID
var commentUsers = sails.util.indexBy(results.commentUsers, 'id');
// Get a plain object version of post & comments
var post = results.post.toObject();
// Map users onto comments
post.comments = post.comments.map(function(comment) {
comment.user = commentUsers[comment.user];
return comment;
});
return cb(null, post);
}]
},
// After all the async magic is finished, return the mapped result
// (or an error if any occurred during the async block)
function finish(err, results) {
if (err) {return res.serverError(err);}
return res.json(results.map);
}
);
You could use the async library, which is very clean and simple to understand. For each comment related to a post you can populate as many fields as you want with dedicated tasks, execute them in parallel and retrieve the results when all tasks are done. Finally, you only have to return the final result.
Post
.findOne(req.param('id'))
.populate('user')
.populate('comments') // I want to populate this comment with .populate('user') or something
.exec(function (err, post) {
// populate each comment in parallel
async.each(post.comments, function (comment, callback) {
// you can populate many elements or only one...
var populateTasks = {
user: function (cb) {
User.findOne({ id: comment.user })
.exec(function (err, result) {
cb(err, result);
});
}
}
async.parallel(populateTasks, function (err, resultSet) {
if (err) { return next(err); }
// attach the populated user to this comment
comment.user = resultSet.user;
// finish
callback();
});
}, function (err) {// final callback
if (err) { return next(err); }
return res.json(post);
});
});
As of sailsjs 1.0 the "deep populate" pull request is still open, but the following async function solution looks elegant enough IMO:
const post = await Post
.findOne({ id: req.param('id') })
.populate('user')
.populate('comments');
if (post && post.comments.length > 0) {
const ids = post.comments.map(comment => comment.id);
post.comments = await Comment
.find({ id: ids })
.populate('user');
}
Granted, this is an old question, but a much simpler solution would be to loop over the comments, replacing each comment's 'user' property (which is an id) with the user's full details using async/await.
async function getPost(postId){
let post = await Post.findOne(postId).populate('user').populate('comments');
for(let comment of post.comments){
comment.user = await User.findOne({id:comment.user});
}
return post;
}
Hope this helps!
In case anyone is looking to do the same but for multiple posts, here's one way of doing it:
1. Find all user IDs in the posts.
2. Query all users from the DB in one go.
3. Update the posts with those users.
Given that the same user can write multiple comments, we make sure we reuse those objects. Also, we only make 1 additional query (whereas if we did it for each post separately, that would be multiple queries).
await Post.find()
.populate('comments')
.then(async (posts) => {
// Collect all comment user IDs
const userIDs = posts.reduce((acc, curr) => {
for (const comment of curr.comments) {
acc.add(comment.user);
}
return acc;
}, new Set());
// Get users
const users = await User.find({ id: Array.from(userIDs) });
const usersMap = users.reduce((acc, curr) => {
acc[curr.id] = curr;
return acc;
}, {});
// Assign users to comments
for (const post of posts) {
for (const comment of post.comments) {
if (comment.user) {
const userID = comment.user;
comment.user = usersMap[userID];
}
}
}
return posts;
});

Find row by UUID stored as binary in SequelizeJS

I have a Sequelize object called Org which represents a row in the organisations table stored in MySQL. This table has a UUID primary key(id) stored as a 16 byte varbinary. If I have the UUID of an object (bfaf1440-3086-11e3-b965-22000af9141e) as a string in my JavaScript code, what is the right way to pass it as a parameter in the where clause in Sequelize?
Following are the options I've tried
Model: (for an existing MySQL table)
var uuid = require('node-uuid');
module.exports = function(sequelize, Sequelize) {
return sequelize.define('Org', {
id: {
type: Sequelize.BLOB, //changing this to Sequelize.UUID does not make any difference
primaryKey: true,
get: function() {
if (this.getDataValue('id')) {
return uuid.unparse(this.getDataValue('id'));
}
}
},
name: Sequelize.STRING,
}, {
tableName: 'organisation',
timestamps: false,
});
};
Option 1: Pass UUID as byte buffer using node-uuid
Org.find({
where: {
id: uuid.parse(orgId)
}
}).then(function(org) {
success(org);
}).catch(function(err) {
next(err);
});
Executing (default): SELECT `id`, `name` FROM `Organisation` AS `Org`
WHERE `Org`.`id` IN (191,175,20,64,48,134,17,227,185,101,34,0,10,249,20,30);
Sequelize treats the byte buffer as multiple values and so I get multiple matches and the top most record (not the one that has the right UUID) gets returned.
Option 2: Write a raw SQL query and pass the UUID as a HEX value
sequelize.query('SELECT * from organisation where id = x:id', Org, {plain: true}, {
id: orgId.replace(/-/g, '')
}).then(function(org) {
success(org);
}).catch(function(err) {
next(err);
});
Executing (default): SELECT * from organisation
where id = x'bfaf1440308611e3b96522000af9141e'
I get the correct record, but this approach is not really useful, as I have more complex relationships in the DB and writing too many queries by hand defeats the purpose of the ORM.
I'm using Sequelize 2.0.0-rc3.
Solved it by supplying a fixed size empty Buffer object to uuid.parse().
Got it working initially using ByteBuffer, but then realised that the same can be achieved using uuid.parse()
Org.find({
where: {
id: uuid.parse(orgId, new Buffer(16))
}
}).then(function(org) {
console.log('Something happened');
console.log(org);
}).catch(function(err) {
console.log(err);
});
Executing (default): SELECT `id`, `name` FROM `Organisation` AS `Org`
WHERE `Org`.`id`=X'bfaf1440308611e3b96522000af9141e';
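For reference, the same 16-byte value can be produced with Node's built-in `Buffer`, without `uuid.parse()`. This is a minimal sketch, assuming (as the log above suggests) that the Sequelize version in use binds a Buffer as a single binary value. `uuidToBuffer` and `bufferToUuid` are hypothetical helper names.

```javascript
// Convert a canonical UUID string into its 16 raw bytes.
function uuidToBuffer(uuidString) {
  const hex = uuidString.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new TypeError('Expected a UUID with 32 hexadecimal digits');
  }
  return Buffer.from(hex, 'hex');
}

// Convert 16 raw bytes back into the dashed string form.
function bufferToUuid(buffer) {
  const hex = buffer.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}
```

The where clause would then read `where: { id: uuidToBuffer(orgId) }`, and `bufferToUuid` could back the model's getter.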
If the accepted answer didn't work for you, here's what worked for me.
Note: My objective is to find an instance of an event based on a column which is not the primary key.
// guard clause
if (!uuid.validate(uuid_code))
return
const _event = await event.findOne({ where: { uuid_secret: uuid_code } })
// yet another guard clause
if (_event === null)
return
// your code here

How to return the number of children for tree nodes in Sequelize?

I have a hierarchical tree data structure defined like this:
module.exports = function(sequelize, DataTypes) {
var Category = sequelize.define('Category', {
id: { type: DataTypes.STRING, primaryKey: true },
parent: DataTypes.STRING,
text: DataTypes.STRING
}
);
Category.belongsTo(Category, { foreignKey: 'parent' });
return Category;
};
I have a service returning the list of children for a given node like this:
exports.categoryChildren = function(req, res) {
var id = req.params.id;
db.Category.findAll({ where: { parent: id }}).success(function(categories){
return res.jsonp(categories);
}).error(function(err){
return res.render('error', { error: err, status: 500 });
});
};
Is there a way to make Sequelize return the number of children for every child of the given node (i.e. a count of the given node's grandchildren, grouped by child)?
The SQL query I'd use for that looks like this:
SELECT *, (select count(*) from Categories chld where chld.parent = cat.id)
FROM `Categories` cat
WHERE cat.`parent`='my_node_id';
However, I can't find a way to force Sequelize to generate a query like that.
I ended up using two nested queries as shown below.
Pros: I'm staying within the ORM paradigm.
Cons: the obvious performance hit.
exports.categoryChildren = function(req, res) {
var id = req.params.id;
db.Category.findAll({ where: { parent: id }}).success(function(categories) {
var categoryIds = [];
_.each(categories, function(element, index, list) {
categoryIds[element.id] = element;
});
db.Category.findAll({
attributes: [['Categories.parent', 'parent'], [Sequelize.fn('count', 'Categories.id'), 'cnt']],
where: { parent: { in: _.keys(categoryIds) }},
group: ['Categories.parent']
}).success(function(counts) {
_.each(counts, function(element, index, list) {
categoryIds[element.dataValues.parent].dataValues.cnt = element.dataValues.cnt;
});
return res.jsonp(categories);
}).error(function(err){
return res.render('error', {
error: err,
status: 500
});
});
}).error(function(err){
return res.render('error', {
error: err,
status: 500
});
});
};
I ran into this same issue. So far the only thing I could figure out was to use raw queries.
There are just some things ORMs can't do, and I think this might be one of them. You can just use query():
sequelize.query('select * from all_counts').success(function(counts) {...})
Just don't use the ORM to calculate that kind of stuff. It is a performance disaster.
Use database queries (like the SQL you specified), define a view for that SQL, and use a model to work with that view.