Knex.js stream large data - MySQL

I have a MySQL table with millions of rows.
For each row I have to apply custom logic and write the modified data to another table.
Using Knex.js I run the query to read the data with the stream() function.
Once I get the stream object, I apply my logic in the data event handler.
Everything works correctly, but at a certain point it stops without giving any errors.
I tried pausing the stream before each update operation on the new table and resuming it after the update completes, but the problem is not solved.
If I put a limit on the query, for example 1000 results, the system works fine.
Sample code:
const readableStream = knex.select('*')
  .from('big_table')
  .stream();

readableStream.on('data', async (data) => {
  readableStream.pause(); // pause stream while this row is processed
  const toUpdate = applyLogic(data); // sync func
  const whereCond = getWhereCondition(data); // sync func
  try {
    await knex('to_update').where(whereCond).update(toUpdate);
    console.log('UPDATED');
  } catch (e) {
    console.log('ERROR', e);
  }
  readableStream.resume(); // resume stream exactly once, whether the update succeeded or failed
}).on('end', () => { // readable streams emit 'end' (not 'finish') when all data has been consumed
  console.log('FINISH');
}).on('error', (err) => {
  console.log('ERROR', err);
});
Thanks!

I solved it.
The problem was not due to Knex.js or the streams but to my development environment.
I use k3d to simulate the production environment on GCP, so to test my script locally I did a port-forward of the MySQL service.
It is not clear to me why that setup breaks, but when I run my script in a container that connects directly to the MySQL service, the algorithm works as I expect.
Thanks
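As an addendum that is not part of the original thread: the stream returned by Knex's stream() is a Node.js Readable, so it can also be consumed with for await...of, which handles pausing and resuming (backpressure) automatically. A minimal sketch, reusing the same hypothetical table and helper names from the question:

async function processBigTable() {
  const stream = knex.select('*').from('big_table').stream();
  for await (const row of stream) {
    // the stream will not deliver the next row until this awaited work completes,
    // so no manual pause()/resume() calls are needed
    const toUpdate = applyLogic(row);
    const whereCond = getWhereCondition(row);
    await knex('to_update').where(whereCond).update(toUpdate);
  }
  console.log('FINISH');
}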

Related

How do I serialize transactions with multiple database queries using Sequelize?

My server keeps track of game instances. If there are no ongoing games when a user hits a certain endpoint, the server creates a new one. If the endpoint is hit twice at the same time, I want to make sure only one new game is created. I'm attempting to do this via Sequelize's transactions:
const t = await sequelize.transaction({
  isolationLevel: Sequelize.Transaction.ISOLATION_LEVELS.SERIALIZABLE,
});
let game = await Game.findOne({
  where: { status: { [Op.ne]: "COMPLETED" } },
  transaction: t,
});
if (game) {
  // ...
} else {
  game = await Game.create({}, {
    transaction: t,
  });
  // ...
}
await t.commit();
Unfortunately, when this endpoint is hit twice at the same time, I get the following error: SequelizeDatabaseError: Deadlock found when trying to get lock; try restarting transaction.
I looked at possible solutions here and here, and I understand why my code throws the error, but I don't understand how to accomplish what I'm trying to do (or whether transactions are the correct tool to accomplish it). Any direction would be appreciated!
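No answer is quoted in this excerpt, but as a hedged sketch of one common pattern (not taken from the original thread): serializable transactions can and do deadlock under contention, and the usual remedy is exactly what the error message suggests, restarting the transaction. Sequelize's managed transaction form makes a retry loop straightforward; the retry count and the deadlock check against the underlying MySQL error code below are illustrative assumptions.

const MAX_RETRIES = 3; // arbitrary choice

async function findOrCreateGame() {
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      // managed transaction: commits if the callback resolves, rolls back if it throws
      return await sequelize.transaction(
        { isolationLevel: Sequelize.Transaction.ISOLATION_LEVELS.SERIALIZABLE },
        async (t) => {
          let game = await Game.findOne({
            where: { status: { [Op.ne]: "COMPLETED" } },
            transaction: t,
          });
          if (!game) {
            game = await Game.create({}, { transaction: t });
          }
          return game;
        }
      );
    } catch (err) {
      // MySQL reports the losing transaction with ER_LOCK_DEADLOCK; retrying it is the documented remedy
      const deadlocked = err.original && err.original.code === 'ER_LOCK_DEADLOCK';
      if (deadlocked && attempt < MAX_RETRIES) continue;
      throw err;
    }
  }
}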

Cloud Functions for Firebase Could not handle the request after a successful request

TLDR: After writing a JSON (successfully) to my Firestore, the next request will give me Internal Server Error (500). I have a suspicion that the problem is that inserting is not yet complete.
So basically, I have this code:
const jsonToDb = express();

exports.jsondb = functions.region('europe-west1').https.onRequest(jsonToDb);

jsonToDb.post('', (req, res) => {
  let doc;
  try {
    doc = JSON.parse(req.body);
  } catch (error) {
    res.status(400).send(error.toString()).end();
    return;
  }
  myDbFuncs.saveMyDoc(doc);
  res.status(201).send("OK").end();
});
The database functions are in another JS file.
module.exports.saveMyDoc = function (myDoc) {
  let newDoc = db.collection('insertedDocs').doc(new Date().toISOString());
  newDoc.set(myDoc).then().catch();
  return;
};
I have several theories; maybe one of them is right. Please help me with this. (Also, if I made some mistakes in this little snippet, just tell me.)
Reproduction:
I send the first request => everything is OK, the JSON is in the database.
I send a second request after the first request has returned an OK status => it does not do anything for a few seconds, then 500: Internal Server Error.
Logs: Function execution took 4345 ms, finished with status: 'connection error'.
I just don't understand. Let's imagine I'm using this as an API with several simultaneous requests. Can't it handle that? (I suppose it can; I'm just doing something stupid.) Deliberately, I'm sending the second request after the first has finished, and this still occurs.
Should I make the saveMyDoc async?
saveMyDoc isn't returning a promise that resolves when all the async work is complete. If you lose track of a promise, Cloud Functions will shut down the work and clean up before the work is complete, making it look like it simply doesn't work. You should only send a response from an HTTP type function after all the work is fully complete.
Minimally, it should look more like this:
module.exports.saveMyDoc = function (myDoc) {
  let newDoc = db.collection('insertedDocs').doc(new Date().toISOString());
  return newDoc.set(myDoc);
};
Then you would use the promise in your main function:
myDbFuncs.saveMyDoc(doc).then(() => {
  res.status(201).send("OK").end();
});
See how the response is only sent after the data is saved.
Read more about async programming in Cloud Functions in the documentation. Also watch this video series that talks about working with promises in Cloud Functions.
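A small addition that is not part of the original answer: in practice you would also handle the failure case, so the client is never left without a response; the 500 status below is an illustrative choice.

myDbFuncs.saveMyDoc(doc)
  .then(() => {
    res.status(201).send("OK").end();
  })
  .catch((error) => {
    // report the failed write instead of leaving the request hanging
    res.status(500).send(error.toString()).end();
  });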

How to parse or stringify in an asynchronous way in JavaScript

I see that JSON.stringify and JSON.parse are both synchronous.
I would like to know if there is a simple npm library that does this in an asynchronous way.
Thank you
You can make anything "asynchronous" by using Promises:
function asyncStringify(str) {
  return new Promise((resolve, reject) => {
    resolve(JSON.stringify(str));
  });
}
Then you can use it like any other promise:
asyncStringify(str).then(ajaxSubmit);
Note that because the code is not actually asynchronous, the promise will be resolved right away (stringifying a JSON value involves no I/O or system calls, so there is nothing to wait on; the work still blocks the event loop while it runs).
You can also use the async/await API if your platform supports it:
async function asyncStringify(str) {
  return JSON.stringify(str);
}
Then you can use it the same way:
asyncStringify(str).then(ajaxSubmit);

// or use the "await" API
const strJson = await asyncStringify(str);
ajaxSubmit(strJson);
Edited: One way of adding truly asynchronous parsing/stringifying (perhaps because we're parsing something very complex) is to pass the job to another process (or service) and wait for the response.
You can do this in many ways (such as creating a separate service that exposes a REST API); here I will demonstrate one way of doing it with message passing between processes.
First, create a file that will take care of the parsing/stringifying. Call it async-json.js for the sake of the example:
// async-json.js
function stringify(value) {
  return JSON.stringify(value);
}

function parse(value) {
  return JSON.parse(value);
}

process.on('message', function(message) {
  let result;
  if (message.method === 'stringify') {
    result = stringify(message.value);
  } else if (message.method === 'parse') {
    result = parse(message.value);
  }
  process.send({ callerId: message.callerId, returnValue: result });
});
All this process does is wait for a message asking it to stringify or parse a JSON value, then respond with the result.
Now, in your code, you can fork this script and send messages back and forth. Whenever a request is sent, you create a new promise; whenever a response to that request comes back, you resolve the promise:
const fork = require('child_process').fork;

const asyncJson = fork(__dirname + '/async-json.js');
const callers = {};

asyncJson.on('message', function(response) {
  callers[response.callerId].resolve(response.returnValue);
});

function callAsyncJson(method, value) {
  const callerId = parseInt(Math.random() * 1000000);
  const callPromise = new Promise((resolve, reject) => {
    callers[callerId] = { resolve: resolve, reject: reject };
    asyncJson.send({ callerId: callerId, method: method, value: value });
  });
  return callPromise;
}

function JsonStringify(value) {
  return callAsyncJson('stringify', value);
}

function JsonParse(value) {
  return callAsyncJson('parse', value);
}

JsonStringify({ a: 1 }).then(console.log.bind(console));
JsonParse('{ "a": "1" }').then(console.log.bind(console));
Note: this is just one example, but knowing this you can figure out other improvements or other ways to do it. Hope this is helpful.
Check out another npm package:
async-json is a library that provides an asynchronous version of the standard JSON.stringify.
Install:
npm install async-json
Example:
var asyncJSON = require('async-json');

asyncJSON.stringify({ some: "data" }, function (err, jsonValue) {
  if (err) {
    throw err;
  }
  jsonValue === '{"some":"data"}'; // => true
});
Note: I didn't test it; you need to check its dependencies and required packages yourself.
By asynchronous I assume you actually mean non-blocking asynchronous - i.e., if you have a large (megabytes-large) JSON value and you stringify or parse it, you don't want your web server to hard-freeze and block newly incoming web requests for 500+ milliseconds while it processes the object.
Option 1
The generic answer is to iterate through your object piece by piece and call setImmediate whenever a threshold is reached. This allows other functions in the event queue to run for a bit; a hand-rolled sketch of this idea follows after the yieldable-json example below.
For JSON (de)serialization, the yieldable-json library does this very well. It does, however, drastically sacrifice JSON processing speed (which is somewhat intentional).
Usage example from the yieldable-json readme:
const yj = require('yieldable-json')

yj.stringifyAsync({key: "value"}, (err, data) => {
  if (!err)
    console.log(data)
})
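Here is a minimal hand-rolled sketch of the same yielding technique (mine, not from the original answer), assuming the input is an array of individually small items; the chunk size of 1000 is an arbitrary choice:

function stringifyArrayAsync(items, chunkSize = 1000) {
  return new Promise((resolve) => {
    const parts = [];
    let i = 0;
    (function next() {
      const end = Math.min(i + chunkSize, items.length);
      for (; i < end; i++) {
        parts.push(JSON.stringify(items[i]));
      }
      if (i < items.length) {
        setImmediate(next); // yield so other queued work can run before the next chunk
      } else {
        resolve('[' + parts.join(',') + ']');
      }
    })();
  });
}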
Option 2
If processing speed is extremely important (such as with real-time data), you may want to consider spawning multiple Node processes instead. I've used the PM2 process manager with great success, although the initial setup was quite daunting. Once it works, however, the result is magic, and it does not require modifying your source code, just your package.json file. It acts as a proxy, load balancer, and monitoring tool for Node applications. It's somewhat analogous to Docker swarm, but bare metal, and it does not require a special client on the server.
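PM2 configuration itself is out of scope here, but as a rough illustration of the same idea (not from the original answer) using only Node's built-in cluster module: one worker process is forked per CPU core, so a heavy JSON job in one worker does not block requests being served by the others.

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // fork one worker per CPU core; the master process only supervises
  os.cpus().forEach(() => cluster.fork());
  cluster.on('exit', () => cluster.fork()); // replace any worker that crashes
} else {
  // each worker has its own event loop, so a blocking JSON.parse here
  // only stalls this worker, not the whole application
  http.createServer((req, res) => {
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      try {
        const parsed = body ? JSON.parse(body) : {};
        res.end(JSON.stringify({ ok: true, keys: Object.keys(parsed).length }));
      } catch (err) {
        res.statusCode = 400;
        res.end(JSON.stringify({ ok: false, error: 'invalid JSON' }));
      }
    });
  }).listen(3000);
}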

Showing that node shell is async and mongo shell is not

So I have a MongoDB setup and I have some test data in it. I want to be able to show that the mongo shell runs our script synchronously and Node runs our script asynchronously. I have set up the following two JS files, which I got while doing a MongoDB University course. This is really more of a test so that I understand what's going on. I am going to cd into the directory where I have mongo installed using npm and where the scripts also live. Then I will call these scripts: I will run mongoshell.js using
>mongo mongoshell.js
and nodeshell.js using:
>node nodeshell.js
Here are the two scripts:
mongoshell.js
//Find one document in our collection
var doc = db.allClasses.findOne();
print('before');
//Print the result
printjson(doc);
print('after');
And the result I get from running that in the shell is:
So my thinking here is that the print command is something that would return quicker than the query to Mongo. By placing a print before and after, and seeing everything come out in the right order, it must be synchronous.
Next I have the nodeshell.js
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://127.0.0.1:27017/test', function(err, db) {
  if (err) throw err;
  // Find one document in our collection
  db.collection('allClasses').findOne({}, function(err, doc) {
    // Print the result
    console.dir(doc);
    // Close the DB
    db.close();
  });
});

setTimeout(function() {
  console.dir("10 Milliseconds!");
}, 10);

setTimeout(function() {
  console.dir("100 Milliseconds!");
}, 100);
And the result from the console is:
My thinking here is that I have determined that Mongo usually takes between 10 and 100 milliseconds to return my data. If I put two print commands with timeouts, one at 10 ms and one at 100 ms, one should fire before the JSON is returned BECAUSE THE NODE SHELL IS ASYNC and the other should fire after.
MY QUESTION:
Does this hillbilly test actually show that each of the shells is what it is, synchronous and asynchronous? If yes, cool; if not, why?
I don't see how that trick with timeouts demonstrates the async nature. How about this?
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://127.0.0.1:27017/test', function(err, db) {
  if (err) throw err;
  // Find one document in our collection
  db.collection('allClasses').findOne({}, function(err, doc) {
    console.log("Got the data!");
    // Print the result
    console.dir(doc);
    // Close the DB
    db.close();
  });
  console.log("Data is being fetched and I do something else");
});

console.log("Mongo connection is being set up and I do something else");
Output
sergio#soviet-russia ‹ master ●● › : ~
[0] % node test.js
Mongo connection is being set up and I do something else
Data is being fetched and I do something else
Got the data!
null

socketstream async call to mysql within rpc actions

First, I need to tell you that I am very new to the wonders of Node.js, SocketStream, AngularJS and JavaScript in general. I come from a Java background, and this might explain my ignorance of the correct way of doing things asynchronously.
To toy around with things I installed the ss-angular-demo from americanyak. My problem now is that the RPC seems to be a synchronous interface, while my call to the MySQL database has an asynchronous interface. How can I return the database results from an RPC call?
Here is what I did so far with SocketStream 0.3:
In app.js I successfully tell ss to expose my MySQL database connection by putting ss.api.add('coolStore', mysqlConn); in there at the right place (as explained in the SocketStream docs). I use the mysql npm package, so I can call MySQL within the RPC.
server/rpc/coolRpc.js
exports.actions = function (req, res, ss) {
  // use session middleware
  req.use('session');

  return {
    get: function(threshold) {
      var sql = "SELECT cool.id, cool.score, cool.data FROM cool WHERE cool.score > " + threshold;
      if (!ss.arbStore) {
        console.log("connecting to mysql arb data store");
        ss.coolStore = ss.coolStore.connect();
      }
      ss.coolStore.query(sql, function(err, rows, fields) {
        if (err) {
          console.log("error fetching stuff", err);
        } else {
          console.log("first row = " + rows[0].id);
        }
      });
      var db_rows = ???
      return res(null, db_rows || []);
    }
  };
};
The console logs the id of my database entry, as expected. However, I am clueless as to how I can make the RPC's return statement return the rows of my query. What is the right way of addressing this sort of problem?
Thanks for your help. Please be friendly with me, because this is also my first question on Stack Overflow.
It's not synchronous. When your results are ready, you can send them back:
exports.actions = function (req, res, ss) {
  // use session middleware
  req.use('session');

  return {
    get: function(threshold) {
      ...
      ss.coolStore.query(sql, function(err, rows, fields) {
        res(err, rows || []);
      });
    }
  };
};
You need to make sure that you always call res(...) from an RPC function, even when an error occurs, otherwise you might get dangling requests (where the client code keeps waiting for a response that's never generated). In the code above, the error is forwarded to the client so it can be handled there.
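One aside that was not raised in the original answer: the question's code builds the SQL string by concatenating threshold directly, which is an injection risk. The mysql package supports placeholder values, so the same query could be written roughly like this (a sketch reusing the question's hypothetical coolStore connection):

var sql = "SELECT cool.id, cool.score, cool.data FROM cool WHERE cool.score > ?";

// the driver escapes each value in the array and substitutes it for the matching ? placeholder
ss.coolStore.query(sql, [threshold], function(err, rows, fields) {
  res(err, rows || []);
});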