I want to cache data that I got from my MySQL DB and for this I am currently storing the data in an object.
Before querying the database, I check whether the needed data already exists in the mentioned object. If not, I query the database and store the result there.
This works quite well, and my webserver now fetches the data only once and reuses it.
My concern is: do I have to think about concurrent writes/reads on such data structures living in the object when I use Node.js's clustering feature?
Every single line of JavaScript that you write in your Node.js program is thread-safe, so to speak - at any given time, only a single statement is ever executed. Asynchronous operations are implemented at a lower level in a way that is completely transparent to the programmer. To be precise, code only runs in a "truly parallel" way when you do some input/output operation, e.g. reading a file, doing TCP/UDP communication, or spawning a child process. And even then, the only code that is executed in parallel to your application is Node's native C/C++ code.
Since you use a JavaScript object as a cache store, you are guaranteed that no two pieces of code will ever read from or write to it at the same time.
As for cluster, every worker is created as its own process and thus has its own copy of every JavaScript variable or object that exists in your code.
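To make that concrete, here is a minimal sketch of the pattern the question describes, assuming the npm mysql module and a users table (both are just placeholders). Nothing needs locking, and under cluster each worker process simply holds its own independent copy of the cache object:

```javascript
// A per-process cache: each cluster worker gets its own independent copy.
const mysql = require('mysql');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'root',
  password: '',
  database: 'mydb'
});

const cache = {}; // lives only in this worker's memory

function getUser(id, callback) {
  // Synchronous check: no other JavaScript can run between this line
  // and the return, so there is nothing to lock.
  if (cache[id] !== undefined) {
    return callback(null, cache[id]);
  }
  pool.query('SELECT * FROM users WHERE id = ?', [id], (err, rows) => {
    if (err) return callback(err);
    cache[id] = rows[0]; // populate the cache for later calls
    callback(null, rows[0]);
  });
}
```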
Related
I've tried many ways but can't find an efficient and performant way to open millions of files in a folder and insert their content into a database with Node.js.
It needs to be memory efficient and asynchronous because of the SQL queries.
Any insight?
I guess you are not creating an app but rather doing a one-time migration, right?
If you just let Node.js read everything at once and insert it into the DB with a simple JS loop, you are probably going to run into errors.
Your DB will either hang due to insufficient memory or choke up because of too many connections at once.
Node.js is lightweight; it will happily read the "millions of files" and fire everything off at once.
My take on this rather vague question is that you need to control the insertion:
You can use a module like https://caolan.github.io/async/v3/ to control whether your calls run one after another or in parallel, e.g. with async.eachSeries() or async.waterfall().
For reading files you can use Node.js's fs module, for example as described here: https://www.tutorialspoint.com/nodejs/nodejs_file_system.htm
Even if you can't control which files Node.js reads, you can:
Read a few files at a time and store them in batches of JSON arrays or objects.
Insert those batches in series (or with limited concurrency) using the methods mentioned above.
The exact implementation depends on how you nest each read and write; a rough sketch follows.
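Here is a rough sketch of that approach, assuming the async and mysql modules; the directory, table, and column names are placeholders:

```javascript
// Read and insert one file at a time so memory stays flat and the DB
// never sees an unbounded burst of inserts.
const fs = require('fs');
const path = require('path');
const async = require('async');
const mysql = require('mysql');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'root',
  password: '',
  database: 'mydb',
  connectionLimit: 10
});

const dir = '/path/to/folder'; // placeholder

fs.readdir(dir, (err, files) => {
  if (err) throw err;
  // Note: with truly millions of entries even readdir gets heavy and you
  // may need to list the directory in chunks instead.
  async.eachSeries(files, (file, done) => {
    fs.readFile(path.join(dir, file), 'utf8', (readErr, content) => {
      if (readErr) return done(readErr);
      pool.query(
        'INSERT INTO documents (name, body) VALUES (?, ?)',
        [file, content],
        done
      );
    });
  }, (finalErr) => {
    if (finalErr) console.error(finalErr);
    pool.end();
  });
});
```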
Cheers
I'm using socket.io to send data from my database to the client. But my code sends data to the client every second, even if the data is the same. How can I send data only when a field in the DB has changed, not every second?
Here is my code: http://pastebin.com/kiTNHgnu
With MySQL there is no easy/simple way to get notified of changes. A couple of options include:
If you have access to the server the database is running on, you could stream MySQL's binlog through some sort of parser and then check for events that modify the desired table/column. There are already such binlog parsers on npm if you want to go this route.
Use a MySQL trigger to call out to a UDF (user-defined function). This is a little tricky because there aren't a whole lot of these for your particular need. There are some that could be useful however, such as mysql2redis which pushes to a Redis queue which would work if you already have Redis installed somewhere. There is a STOMP UDF for various queue implementations that support that wire format. There are also other UDFs such as log_error which writes to a file and sys_exec which executes arbitrary commands on the server (obviously dangerous so it should be an absolute last resort). If none of these work for you, you may have to write your own UDF which does take quite some time (speaking from experience) if you're not already familiar with the UDF C interface.
I should also note that UDFs could introduce delays in triggered queries depending on how long the UDF takes to execute.
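If the binlog or UDF routes are more than you want to take on, a simpler stopgap (not one of the options above) is to keep the 1-second poll from your pastebin code but remember the last value sent and only emit when it actually changed. A minimal sketch, with placeholder table and column names since I can only guess at your schema:

```javascript
// Keep the 1-second poll, but only emit when the value actually changed.
const mysql = require('mysql');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: '',
  database: 'mydb'
});

io.on('connection', (socket) => {            // `io` is your socket.io server
  let lastValue;                             // last value emitted to this client
  const timer = setInterval(() => {
    connection.query('SELECT field FROM my_table WHERE id = ?', [1], (err, rows) => {
      if (err || rows.length === 0) return;
      const current = rows[0].field;
      if (current !== lastValue) {           // only emit on change
        lastValue = current;
        socket.emit('field-changed', current);
      }
    });
  }, 1000);
  socket.on('disconnect', () => clearInterval(timer));
});
```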
I am currently importing a huge CSV file from my iPhone to a Rails server. The server parses the data and then starts inserting rows into the database. The CSV file is fairly large and the operation takes a long time to finish.
Since I am doing this asynchronously, my iPhone is then able to go to other views and do other stuff.
However, when it sends a query against another table, that request HANGS because the first operation is still inserting the CSV's data into the database.
Is there a way to resolve this type of issue?
As long as the phone doesn't care when the database insert is complete, you might want to try storing the CSV file in a tmp directory on your server and then have a script write from that file to the database. Or simply store it in memory. That way, once the phone has posted the CSV file, it can move on to other things while the script handles the database inserts asynchronously. And yes, @Barmar is right about using an InnoDB engine rather than MyISAM (which may be the default in some configurations).
Or, you might want to consider enabling "low-priority updates" which will delay write calls until all pending read calls have finished. See this article about MySQL table locking. (I'm not sure what exactly you say is hanging: the update, or reads while performing the update…)
Regardless, if you are posting the data asynchronously from your phone (i.e., not from the UI thread), it shouldn't be an issue as long as you don't try to use more than the maximum number of concurrent HTTP connections.
So I have a small game in Node.js (only the server, of course) which stores map data and player accounts in a MySQL database. Right now I constructed it in a way that minimizes the number of queries, by loading data from the database, keeping it in JavaScript objects/arrays or whatever seems appropriate, and only writing to the database when needed.
Now I was thinking: is this really worth it? In many cases it would be a lot better (the data would be safer and much more up-to-date) to keep hardly any data in the server and just load it from the database when needed (and write it back when it needs to be changed).
My question is: is it efficient/safe/recommendable to have the server read from and write to the database often, rather than keeping data from the database in JavaScript variables in the server?
Additional info:
-The Node.js server and my MySQL server are on the same machine, and a query usually takes less than 1 ms, or maybe 3 ms for big queries like loading room data.
-I am using a module simply called mysql.
-If needed I will include extra info, just ask in a comment.
It really depends on your use case. Generally speaking, I would not add another layer of caching in Node.js but handle that in your DB with a bigger cache and optimized queries.
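For the on-demand alternative the question describes, here is a minimal sketch using the mysql module and a placeholder rooms table; with both servers on the same machine and queries in the 1-3 ms range, fetching per request is often fast enough:

```javascript
// Query on demand instead of caching in JS; at ~1-3 ms per query on the
// same machine this is often fast enough.
const mysql = require('mysql');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'game',
  password: '',
  database: 'game',
  connectionLimit: 10
});

function loadRoom(roomId, callback) {
  pool.query('SELECT * FROM rooms WHERE id = ?', [roomId], (err, rows) => {
    if (err) return callback(err);
    callback(null, rows[0]); // always the current DB state, never stale
  });
}
```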
I have a C++ application that loads lots of data from a database, then executes algorithms on that data (these algorithms are quite CPU- and data-intensive, which is why I load all the data beforehand), and then saves all the data that has been changed back to the database.
The database part is nicely separated from the rest of the application. In fact, the application does not need to know where the data comes from. The application could even be run from files (in that case a separate file module loads the files into the application and at the end saves all data back to the files).
Now:
the database layer only wants to save the changed instances back to the database (not the full data), therefore it needs to know what has been changed by the application.
on the other hand, the application doesn't need to know where the data comes from, and hence should not be forced to keep a change-state per instance of its data.
To keep my application and its data structures as separate as possible from the layer that loads and saves the data (which could be a database or files), I don't want to pollute the application data structures with information about whether instances were changed since startup or not.
But to make the database layer as efficient as possible, it needs a way to determine which data has been changed by the application.
Duplicating all data and comparing the data while saving is not an option since the data could easily fill several GB of memory.
Adding observers to the application data structures is not an option either, since performance within the application algorithms is very important (and looping over all observers and calling virtual functions could become a significant bottleneck in the algorithms).
Any other solution? Or am I trying to be too 'modular' if I don't want to add logic to my application classes in an intrusive way? Is it better to be pragmatic in these cases?
How do ORM tools solve this problem? Do they also force application classes to keep a kind of change-state, or do they force the classes to have change-observers?
If you can't copy the data and compare, then clearly you need some kind of record somewhere of what has changed. The question, then, is how to update those records.
ORM tools can (if they want) solve the problem by keeping flags in the objects, saying whether the data has been changed or not, and if so what. It sounds as though you're making raw data structures available to the application, rather than objects with neatly encapsulated mutators that could update flags.
So an ORM doesn't normally require applications to track changes in any great detail. The application generally has to say which object(s) to save, but the ORM then works out what needs persisting to the DB in order to do that, and might apply optimizations there.
I guess that means that in your terms, the ORM is adding observers to the data structures in some loose sense. It's not an external observer, it's the object knowing how to mutate itself, but of course there's some overhead to recording what has changed.
One option would be to provide "slow" mutators for your data structures, which update flags, and also "fast" direct access, and a function that marks the object dirty. It would then be the application's choice whether to use the potentially-slower mutators that permit it to ignore the issue, or the potentially-faster mutators which require it to mark the object dirty before it starts (or after it finishes, perhaps, depending what you do about transactions and inconsistent intermediate states).
You would then have two basic situations:
I'm looping over a very large set of objects, conditionally making a single change to a few of them. Use the "slow" mutators, for application simplicity.
I'm making lots of different changes to the same object, and I really care about the performance of the accessors. Use the "fast" mutators, which perhaps directly expose some array in the data. You gain performance in return for knowing more about the persistence model.
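A language-agnostic sketch of those two mutator styles, shown in JavaScript to match the rest of this page (a C++ version would use a plain bool member with inline, non-virtual setters). All names are illustrative:

```javascript
// "Slow" mutator flags the change itself; "fast" path hands out raw data
// and relies on the caller to mark the record dirty.
class Record {
  constructor(values) {
    this.values = values; // the raw application data
    this.dirty = false;   // what the persistence layer inspects on save
  }

  set(key, value) {       // slow mutator: safe and simple
    this.values[key] = value;
    this.dirty = true;
  }

  raw() {                 // fast path: caller promises to call markDirty()
    return this.values;
  }

  markDirty() {
    this.dirty = true;
  }
}

// The persistence layer only writes records whose flag is set.
function saveChanged(records, persist) {
  for (const record of records) {
    if (record.dirty) {
      persist(record.values);
      record.dirty = false;
    }
  }
}
```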
There are only two hard problems in Computer Science: cache invalidation and naming things.
Phil Karlton