Issues when reading a string from TCP socket in Node.js - json

I've implemented a client and server that communicate over a TCP socket. The data that I'm writing to the socket is stringified JSON. Initially everything works as expected; however, as I increase the rate of writes I eventually encounter JSON parse errors where the client receives the beginning of the next write tacked onto the end of the previous one.
Here is the server code:
var data = {};
data.type = 'req';
data.id = 1;
data.size = 2;
var string = JSON.stringify(data);
client.write(string, callback); // pass the callback itself, not its result
Here is how I am receiving this data on the client side:
client.on('data', function(req) {
    var data = req.toString();
    try {
        var json = JSON.parse(data);
    } catch (err) {
        console.log("JSON parse error: " + err);
    }
});
The error that I'm receiving as the rate increases is:
SyntaxError: Unexpected token {
Which appears to be the beginning of the next request being tagged onto the end of the current one.
I've tried using ; as a delimiter on the end of each JSON request and then using:
var data = req.toString().substring(0,req.toString().indexOf(';'));
However, instead of resulting in JSON parse errors, this approach seems to result in completely missing some requests on the client side as I increase the rate of writes above 300 per second.
Are there any best practices or more efficient ways to delimit incoming requests via TCP sockets?
Thanks!

Thanks everyone for the explanations; they helped me to better understand the way in which data is sent and received via TCP sockets. Below is a brief overview of the code that I used in the end:
var chunk = "";
client.on('data', function(data) {
chunk += data.toString(); // Add string on the end of the variable 'chunk'
d_index = chunk.indexOf(';'); // Find the delimiter
// While loop to keep going until no delimiter can be found
while (d_index > -1) {
try {
string = chunk.substring(0,d_index); // Create string up until the delimiter
json = JSON.parse(string); // Parse the current string
process(json); // Function that does something with the current chunk of valid json.
}
chunk = chunk.substring(d_index+1); // Cuts off the processed chunk
d_index = chunk.indexOf(';'); // Find the new delimiter
}
});
Comments welcome...

You're on the right track with using a delimiter. However, you can't just extract the stuff before the delimiter, process it, and then discard what came after it. You have to buffer up whatever you got after the delimiter and then concatenate what comes next to it. This means that you could end up with any number (including 0) of JSON "chunks" after a given data event.
Basically you keep a buffer, which you initialize to "". On each data event you concatenate whatever you receive to the end of the buffer and then split the buffer on the delimiter. The result will be one or more entries, but the last one might not be complete, so you need to test the buffer to make sure it ends with your delimiter. If it doesn't, you pop the last result and set your buffer to it. You then process whatever results remain (which might be none).
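A minimal sketch of that approach, assuming the same client socket and process() function as in the question:
var buffer = "";
client.on('data', function(data) {
    buffer += data.toString();          // Append the new chunk to whatever is left over
    var parts = buffer.split(';');      // Split on the delimiter
    buffer = parts.pop();               // The last piece is incomplete (or ""), keep it for next time
    parts.forEach(function(part) {
        if (part.length === 0) return;  // Skip empty segments
        try {
            process(JSON.parse(part));  // Handle each complete message
        } catch (err) {
            console.log("JSON parse error: " + err);
        }
    });
});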

Be aware that TCP does not make any guarantees about where it divides the chunks of data you receive. All it guarantees is that all the bytes you send will be received in order, unless the connection fails entirely.
I believe Node data events come in whenever the socket says it has data for you. Technically you could get a separate data event for each byte in your JSON data and it would still be within the limits of what the OS is allowed to do. Nobody does that, but to be robust, your code needs to be written as if that could suddenly start happening at any time. It's up to you to combine data events and then re-split the data stream along boundaries that make sense to you.
To do that, you need to buffer any data that isn't "complete", including data appended to the end of a chunk of "complete" data. If you're using a delimiter, never throw away any data after the delimiter -- always keep it around as a prefix until you eventually see another delimiter or the end event.
Another common choice is to prefix all data with a length field. Say you use a fixed 64-bit binary value. Then you always wait for 8 bytes, plus however many more the value in those bytes indicates, to arrive. Say you had a chunk of ten bytes of data incoming. You might get 2 bytes in one event, then 5, then 4 -- at which point you can parse the length and know you need 7 more, since the last 3 bytes of the third chunk were payload. If the next event actually contains 25 bytes, you'd take the first 7 along with the 3 from before and parse that, and look for another length field in bytes 8-15 of that event.
That's a contrived example, but be aware that at low traffic rates, the network layer will generally send your data out in whatever chunks you give it, so this sort of thing only really starts to show up as you increase the load. Once the OS starts building packets from multiple writes at once, it will start splitting on a granularity that is convenient for the network and not for you, and you have to deal with that.
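To make the length-prefix idea concrete, here is a rough sketch that uses a 4-byte big-endian length header rather than the 64-bit one described above; handleMessage() is only a placeholder:
var pending = Buffer.alloc(0);
client.on('data', function(data) {
    pending = Buffer.concat([pending, data]);        // Accumulate whatever arrived
    while (pending.length >= 4) {                    // Keep going while a full header is available
        var bodyLength = pending.readUInt32BE(0);    // 4-byte big-endian length prefix
        if (pending.length < 4 + bodyLength) break;  // The body has not fully arrived yet
        var body = pending.slice(4, 4 + bodyLength);
        pending = pending.slice(4 + bodyLength);     // Keep the remainder for the next message
        handleMessage(JSON.parse(body.toString()));
    }
});
// On the sending side, prepend the length before each write:
var body = Buffer.from(JSON.stringify(data));
var header = Buffer.alloc(4);
header.writeUInt32BE(body.length, 0);
client.write(Buffer.concat([header, body]));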

Following this response:
var chunk = "";
client.on('data', function(data) {
chunk += data.toString(); // Add string on the end of the variable 'chunk'
d_index = chunk.indexOf(';'); // Find the delimiter
// While loop to keep going until no delimiter can be found
while (d_index > -1) {
try {
string = chunk.substring(0,d_index); // Create string up until the delimiter
json = JSON.parse(string); // Parse the current string
process(json); // Function that does something with the current chunk of valid json.
}
chunk = chunk.substring(d_index+1); // Cuts off the processed chunk
d_index = chunk.indexOf(';'); // Find the new delimiter
}
});
I get a problem with the delimiter because ; was part of my sent data.
It is possible to use this update in order to implement a custom delimiter:
var chunk = "";
const DELIMITER = (';;;');
client.on('data', function(data) {
chunk += data.toString(); // Add string on the end of the variable 'chunk'
d_index = chunk.indexOf(DELIMITER); // Find the delimiter
// While loop to keep going until no delimiter can be found
while (d_index > -1) {
try {
string = chunk.substring(0,d_index); // Create string up until the delimiter
json = JSON.parse(string); // Parse the current string
process(json); // Function that does something with the current chunk of valid json.
}
chunk = chunk.substring(d_index+DELIMITER.length); // Cuts off the processed chunk
d_index = chunk.indexOf(DELIMITER); // Find the new delimiter
}
});

I know this question is old but I have an answer for the people still looking at this.
As said in the answers above, the data event will be fired with a nodejs Buffer containing the data received.
res.on('data', function(chunk) {
//chunk contains the data
})
This next part doesn't seem to be commonly known: the end event is fired when all data is consumed, and the close event is fired when the client disconnects.
res.on('end', function() {
//the response body has been consumed
})
The full code to get the entire body is below
var body = Buffer.from('');
res.on('data', function(chunk) {
if (chunk && chunk.byteLength > 0) {
body = Buffer.concat([body, chunk]);
}
})
res.on('end', function() {
var data = JSON.parse(body.toString());
//data contains the response json
})
The end event is fired when the data is all consumed: source
The close event is fired when the request is closed: source

Try with the end event instead of data:
var data = '';
client.on('data', function (chunk) {
data += chunk.toString();
});
client.on('end', function () {
data = JSON.parse(data); // use try/catch, because if someone sends you something else for fun, your server can crash.
});
Hope this helps.

Related

How to do a while loop for a MySQL query in Node.js?

I am currently trying to transform some PHP code into Node.js using TypeScript & Express.
In a script, I am generating a random 6-digit code before querying the database to verify that the code doesn't exist, otherwise I generate a new one.
Here's the original PHP code:
$code = generate_random_int(); // Generate a random code
$existing_codes = exec_sql($db, "SELECT code FROM codes WHERE code = $code;"); // Check if the generated code already exists in the database
while (!empty($existing_codes)) { // While there is (at least) one occurrence in the DB for the generated code
    $code = generate_random_int(); // Generate a new random code
    $existing_codes = exec_sql($db, "SELECT code FROM codes WHERE code = $code;"); // Update the check for the newly generated code.
    // If an occurrence is found, the while loop will be reiterated. Otherwise, the while loop will end with the last generated code.
}
However, the Node.js MySQL library only provides callbacks, because queries are asynchronous, which prevents the behavior I've illustrated above.
I have looked here and there on the internet and haven't found any way to reproduce this behavior in Node.js, so that's why I'm writing here :)
I thought about using for loops with db.query calls in them, with no success, and the same with while loops and an updated boolean flag.
Here's my latest (unsuccessful) attempt :
let code = generateRandomInt()
// query is a simplified function for db.query() from MySQL
query(`SELECT code FROM codes WHERE code = ${code};`, result => {
if (result === []) {
res.send(String(code))
return
} else {
code = generateRandomInt()
// While loop creating a new SQL statement for the new code and regenerating it until the response is []
}
})
res.send(String(code))
Thanks a lot in advance for your help :)
PS: I'm a newbie to Express and I am not that used to posting on StackOverflow, so please don't hesitate to tell me if I did something wrong or if you need any complementary information.
Here is another approach. Once you have an array of existing codes, use a loop to find an unused one:
...
const go = async () => {
    const existingCodes = await getExistingCodes(); // your function that returns a list
                                                    // of existing codes; return a Promise
    // find an unused code
    let code = generateRandomInt();
    while (existingCodes.indexOf(code) !== -1) {
        code = generateRandomInt();
    }
    // use your new code
    console.log(`new code is: ${code}`);
};
go();
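For completeness, getExistingCodes() can just wrap the callback-based query in a Promise. A rough sketch, assuming the mysql package's connection object is available as db (both names are only illustrative):
const util = require('util');
// Promisify the callback-based query so it can be awaited
const query = util.promisify(db.query).bind(db);
const getExistingCodes = async () => {
    const rows = await query('SELECT code FROM codes');
    return rows.map(row => row.code); // plain array of existing codes
};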

Only fetch objects with specified keys in Firebase from large index

I have an array = [ 'something', 'other' ]
And I want to retrieve only the values of those 2 ids from Firebase, which contains more than 2 items (potentially millions), but if I do this:
var questionRef = new Firebase(fireBaseURL+"/morethanamillionitems/");
loadUID.once('value', function (dataSnapshot) {
dataSnapshot.forEach(function(childSnapshot) { // Firebase method
console.log(dataSnapshot.numChildren()); // potentially outputs 1.000.000 +
var uid = childSnapshot.name();
var childData = childSnapshot.val();
console.log(uid.indexOf('something'));
result.push(uid)
});
});
I first basically load the whole database, which is not that efficient
Now I could do:
array.forEach(function(key) {
    var questionRef = new Firebase(fireBaseURL + "/morethanamillionitems/" + key);
    refID = questionRef.val();
    result.push(refID);
});
Or maybe:
questionRef = new Firebase(fireBaseURL + "/morethanamillionitems/");
array.forEach(function(key) {
    if (questionRef.child(key) !== null) {
        refID = questionRef.val();
        result.push(refID);
    }
});
The last one seems the nicest, the previous one seems a bit expensive on the old RAM.
However, I apparently have to call questionRef.once('value', function(){}) each time, hence already loading the whole document-root...
Or am I misunderstanding how Firebase handles these requests? is the .numChildren() just an answer directly from the server?
Is the .forEach actually remotely executed?
I'm wondering if there is any other way to reduce traffic per request. Which brings me to another question: it seems that Firebase searches locally first, but eventually will search remotely, but it's not clear when exactly this happens. Does it periodically check if something has changed? Or will that only happen when I use .on() and not .once()?
Or am I using the wrong backend for this purpose? Any other suggestions? I tried hood.ie which is still very beta, looked at Parse but firebase seemed to have the simplicity I need.
(sorry for the sloppy syntax, but you can see what I intended)
[update]
I now have this:
load: function(uids){
var FB = new Firebase(URL);
uids.map(function(uid) {
var currentRef = FB.child( uid+"/_current" );
currentRef.once('value', function (each) {
var eachVal = each.val();
if (eachVal !== null){
var localSave = {};
localSave[uid] = eachVal;
this.saveLocal(localSave)
} else {
console.error("Not found: [%s]", uid)
}}, function (err) { });
});
}
But I'm still wondering when the request actually happens: on .child(), or in .once()? And if the latter, what is the use of .child() exactly? It seems it's only used for referencing.
Then the second thing: if I want to retrieve an array of a hundred items, would this still mean a hundred separate requests? Or does Firebase have a way of collecting requests and then sending them in a batch?
In that last case .once would be a more 'conservative' option for initial retrieval, then later you could attach a .on listener if you need real-time updates.
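As a rough sketch of that pattern with the legacy Firebase API used in the question (saveLocal() stands in for whatever you do with each value):
var FB = new Firebase(fireBaseURL + "/morethanamillionitems/");
uids.forEach(function(uid) {
    var ref = FB.child(uid + "/_current"); // only builds a reference; nothing is fetched yet
    ref.once('value', function(snapshot) { // the actual read happens when the listener attaches
        if (snapshot.val() !== null) {
            saveLocal(uid, snapshot.val()); // hypothetical local-save helper
        }
    });
    // Later, ref.on('value', ...) can replace once() if real-time updates are needed.
});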

Efficient way to append JSON-object without parse it

I wrote a small script that listens on a UDP port and stores all incoming messages (one JSON object each) inside a single file. The empty file contains an array in JSON format.
I'm looking for an efficient way to store all (concurrently) incoming messages from multiple clients inside this single file.
The file can grow to several hundred megabytes, so parsing it and appending the new object wouldn't be as efficient as needed.
Do you have an approach?
EDIT
My solution, based on #t-j-crowder approach:
var dgram = require("dgram");
var fs = require("fs");
var udp_server = dgram.createSocket("udp4");
var udp_server_port = 5000
udp_server.on("message", function (msg, rinfo) {
var json_part = "{\"message\": " + msg + "}";
fs.open('./data/stats.json','r+',function(err,fd){
if(err) throw err
fs.fstat(fd,function(err,stats){
if(err) throw err
if(stats.size>2){
json_part = new Buffer(','+json_part+']','utf-8');
var pos = parseInt(stats.size)-1;
}else{
json_part = new Buffer('['+json_part+']','utf-8');
var pos = 0;
}
fs.write(fd,json_part, 0, json_part.length, pos, function(err,written,buffer){
if(err) throw err
fs.close(fd,function(){
});
});
});
});
});
udp_server.bind(udp_server_port);
Regards, Marcus
Fundamentally, you'll need to:
Open the file using a seekable, writable stream.
Seek to the end of it.
Back up one character (over the closing ] of the array).
Write out a comma (if this isn't the first entry) and the JSON of your new entry.
Write a closing ].
Close the file.
Looking at the NodeJS docs, it looks like Steps 2-4 (and arguably 5) are done all together, using the position argument of fs.write. (Be sure you open the file using r+, not one of the "append" modes.)
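A compact sketch of those steps, assuming the file already exists and contains at least an empty array ([]):
var fs = require('fs');
function appendEntry(path, obj, callback) {
    var entry = JSON.stringify(obj);
    fs.open(path, 'r+', function(err, fd) {                  // Step 1: open seekable and writable
        if (err) return callback(err);
        fs.fstat(fd, function(err, stats) {
            if (err) return callback(err);
            var isFirst = stats.size <= 2;                   // a bare "[]" means no entries yet
            var chunk = Buffer.from((isFirst ? '' : ',') + entry + ']', 'utf-8');
            // Steps 2-5: overwrite the old closing ] (position size - 1) with ",entry]" (or "entry]" for the first one)
            fs.write(fd, chunk, 0, chunk.length, stats.size - 1, function(err) {
                fs.close(fd, function() { callback(err); }); // Step 6
            });
        });
    });
}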

JSON.parse(dt) doesn't work at all, gives all the errors it can imagine

I have this code at server side (nodejs):
socket.on('data', function(dt){
var rdata = dt;
var msg = JSON.parse(rdata);
broadcast(msg);
});
Also I tried this way: var msg = JSON.parse(dt);
dt gets either:
{"chat":"hey","nickname":"nick_name"} OR
'{"chat":"hey","nickname":"nick_name"}'
Also I have this at the client side (AS3), tried both:
var msg = JSON.stringify({nickname: nname.text, chat: input_txt.text}); OR
var msg = "'" + JSON.stringify({nickname: nname.text, chat: input_txt.text}) + "'";
This is what the console gives:
undefined:1
{"chat":"hey","nickname":"nick_name"}
^
SyntaxError: Unexpected token
DEBUG: Program node app exited with code 8
Also in some other situations, it gives all kinds of messages.
Just have no idea what is going on.
BTW, also tried JSONStream, still doesn't work.
What kind of socket exactly are you using? If you are using a websocket you might have already received an object as a response (I think most frameworks do so). If you are using a plain net.Socket you might be receiving a Buffer, or the data in chunks and not all at once. This seems like an appropriate fix for that situation:
var buffer = '';
socket.setEncoding('utf8');
socket.on('data', function(data) {
    buffer += data;
});
socket.on('end', function() {
    var object = JSON.parse(buffer);
});
The unexpected token at the end of the data string is some ghost symbol that is not whitespace. trim() doesn't work, so substringing off the last symbol does. This is an AS3 symbol, so we have to keep it: first save this symbol in a new variable, then erase it from the line. After that you can parse the string and work with it.
undefined:1
{"chat":"hey","nickname":"nick_name"}
^
SyntaxError: Unexpected token
DEBUG: Program node app exited with code 8
When you finish working with it, stringify the object, then add the ghost symbol back to the end and send it over the socket. Without this symbol AS3 will not parse the data.
I don't know why that is, but it works for me.
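A small sketch of that idea (the trailing symbol is possibly the null byte that AS3 sockets append to each message; broadcast() is the function from the question):
socket.on('data', function(dt) {
    var raw = dt.toString();
    var ghost = raw.charAt(raw.length - 1);                 // save the trailing symbol
    var msg = JSON.parse(raw.substring(0, raw.length - 1)); // parse everything before it
    broadcast(msg);
    // When replying, append the symbol again so the AS3 client will parse it:
    // socket.write(JSON.stringify(reply) + ghost);
});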

How to write and immediately read a file in Node.js

I have to obtain some JSON that is embedded inside a script tag on a certain page... so I can't use regular scraping techniques, like cheerio.
The easy way out: write the file (download the page) to the server and then read it, using string manipulation to extract the JSON (there are several objects), work on them and save them to my DB happily.
The thing is that I'm too new to Node.js and can't get the code to work. I think I'm trying to read the file before it is fully written, and if I read it too early I get [object Object]...
Here's what I have so far...
var http = require('http');
var fs = require('fs');
var request = require('request');
var localFile = 'tmp/scraped_site_.html';
var url = "siteToBeScraped.com/?searchTerm=foobar"
// writing
var file = fs.createWriteStream(localFile);
var request = http.get(url, function(response) {
response.pipe(file);
});
//reading
var readedInfo = fs.readFileSync(localFile, function (err, content) {
callback(url, localFile);
console.log("READING: " + localFile);
console.log(err);
});
So first of all I think you should understand what went wrong.
The http request operation is asynchronous. This means that the callback code in http.get() will run sometime in the future, but fs.readFileSync, due to its synchronous nature, will execute and complete even before the http request is actually sent to the background thread that will execute it, since they are both invoked in what is commonly known as the same tick. Also, fs.readFileSync returns a value and does not use a callback.
Even if you replace fs.readFileSync with fs.readFile, the code still might not work properly, since the readFile operation might execute before the http response is fully read from the socket and written to disk.
I strongly suggest reading: stackoverflow question and/or Understanding the node.js event loop
The correct place to invoke the file read is when the response stream has finished writing to the file, which would look something like this:
var request = http.get(url, function(response) {
    response.pipe(file);
    file.once('finish', function () {
        fs.readFile(localFile, 'utf8' /* or whichever encoding fits */, function(err, data) {
            // do something with the data if there is no error
        });
    });
});
Of course this is a very raw and not recommended way to write asynchronous code but that is another discussion altogether.
Having said that, if you download a file, write it to the disk and then read it all back again to the memory for manipulation, you might as well forgo the file part and just read the response into a string right away. Your code will then look something like so (this can be implemented in several ways):
var request = http.get(url, function(response) {
    var data = '';
    function read() {
        var chunk;
        while ( chunk = response.read() ) {
            data += chunk;
        }
    }
    response.on('readable', read);
    response.on('end', function () {
        console.log('[%s]', data);
    });
});
What you really should do IMO is to create a transform stream that will strip away all the data you need from the response, while not consuming too much memory and yielding this more elegantly looking code:
var request = http.get(url, function(response) {
response.pipe(yourTransformStream).pipe(file)
});
Implementing this transform stream, however, might prove slightly more complex. So if you're a node beginner and you don't plan on downloading big files or lots of small files than maybe loading the whole thing into memory and doing string manipulations on it might be simpler.
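For reference, a bare-bones sketch of what such a transform stream could look like (this only shows the shape; a real implementation would inspect the chunks and push only the parts it wants rather than simply passing everything through):
var Transform = require('stream').Transform;
var yourTransformStream = new Transform();
yourTransformStream._transform = function(chunk, encoding, callback) {
    // Inspect the chunk here and push() only the data you want to keep
    this.push(chunk); // pass-through placeholder
    callback();
};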
For further information about transformation streams:
node.js stream api
this wonderful guide by substack
this post from strongloop
Lastly, see if you can use any of the million node.js crawlers already out there :-) take a look at these search results on npm
According to the http module help, 'get' does not return the response body.
This is modified from the request example on the same page.
What you need to do is process the response within the callback (function) passed into http.request, so it can be called when it is ready (async).
var http = require('http')
var fs = require('fs')
var localFile = 'tmp/scraped_site_.html'
var file = fs.createWriteStream(localFile)
var req = http.request('http://www.google.com.au', function(res) {
    res.pipe(file)
    res.on('end', function(){
        file.end()
        fs.readFile(localFile, function(err, buf){
            console.log(buf.toString())
        })
    })
})
req.on('error', function(e) {
    console.log('problem with request: ' + e.message)
})
req.end();
EDIT
I updated the example to read the file after it is created. This works by having a callback on the end event of the response, which closes the pipe; the file can then be reopened for reading. Alternatively you can use
req.on('data', function(chunk){...})
to process the data as it arrives without putting it into a temporary file
My impression is that you're deserializing a JS object from JSON by reading it from a stream that's downloading a file containing HTML. This is doable yet hard. It's difficult to know when your search expression is found, because if you parse as the chunks come in, you never know whether you received only part of the context, and you could never find what you're looking for because it was split into two or more parts which were never analyzed as a whole.
You could try something like this:
var req = http.request('u/r/l', function(res) {
    res.on('data', function(data) {
        // parse data as it comes in
    });
});
req.end();
This allows you to read the data as it comes in. You can handle it to save it to disk or a DB, or even parse it, if you accumulate the contents within the script tags into a single string and then parse the objects out of that.