I'm looking to store data temporarily on IPFS, probably using a JS library, and I may or may not host my own IPFS node. The technology is new enough that answers are hard to find. I'd appreciate help with both cases, hosted and non-hosted. Thank you.
Data is only stored persistently on nodes that have "pinned" the content hash. Nodes that request data will cache it for an indeterminate amount of time (as the data is immutable and referenced by hash, the cache would always be valid).
Once you are finished with the data you could remove it from your node, and as long as no one else requested it, it would gradually disappear from the network. You could not rely on that happening, though (for instance, if you have a requirement that the data be inaccessible after a certain amount of time).
You would need to be running your own node to initially host the file.
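If it helps, here is a minimal sketch of that flow, assuming the `ipfs-http-client` package and a node (your own, or one provided by a hosting service) exposing the HTTP API on port 5001; the literal content and address are just placeholders:

```js
import { create } from 'ipfs-http-client'

// Connect to a node's HTTP API; this could be a local daemon or a hosted one.
const ipfs = create({ url: 'http://127.0.0.1:5001' })

// Add the data; the returned CID is the content hash other peers will use.
const { cid } = await ipfs.add('temporary payload')

// Pin it so this node keeps the data; unpinned blocks can be garbage collected.
await ipfs.pin.add(cid)

// ...later, when you no longer need it, unpin and let garbage collection reclaim it.
await ipfs.pin.rm(cid)
```

Keep in mind that unpinning only affects your own node; any other node that fetched and cached the data keeps its copy until its own garbage collection runs.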
I am working on designing a little project where I need to use Consul to manage application configuration in a dynamic way, so that all my app machines can get the configuration at the same time without any inconsistency issues. We are already using Consul for service discovery, so I was reading more about it, and it looks like it has a key/value store which I can use to manage my configurations.
All our configurations are JSON files, so we make a zip file containing all of them and store a reference from which this zip file can be downloaded in a particular key in the Consul key/value store. All our app machines then need to download this zip file from that reference (mentioned in a key in Consul) and store it on disk on each app machine. Now I need all app machines to switch to this new config at approximately the same time to avoid any inconsistency issues.
Let's say I have 10 app machines, and all 10 machines need to download the zip file that has all my configs and then switch to the new configs at the same time, atomically, to avoid any inconsistency (since they are taking traffic). Below are the steps I came up with, but I am confused about how loading the new files into memory and switching to the new configs will work:
All 10 machines are already up and running with the default config files as of now, which are also on disk.
Some outside process will update the key in my Consul key/value store with the latest zip file reference.
All 10 machines have a watch on that key, so once someone updates the value of the key, the watch will be triggered and all 10 machines will download the zip file to disk and uncompress it to get all the config files.
(..)
(..)
(..)
Now this is where I am confused about how the remaining steps should work.
How should the apps load these config files into memory and then all switch at the same time?
Do I need to use leader election with Consul or anything else to achieve any of this?
What will the logic around this be, since all 10 apps are already running with the default configs in memory (which are also stored on disk)? Do we need two separate directories, one for the default configs and one for the new configs, and then work with these two directories?
Let's say this is the node I have in Consul, just a rough design (could be wrong here):
{"path":"path-to-new-config", "machines":"ip1:ip2:ip3:ip4:ip5:ip6:ip7:ip8:ip9:ip10", ...}
where path will hold the new zip file reference and machines could be a key where I keep a list of all machines, so each machine can add its IP address to that key as soon as it has downloaded the file successfully. Once the machines list has a size of 10, I can say we are ready to switch. If yes, then how can I atomically update the machines key in that node? Maybe this logic is wrong, but I just wanted to throw out something. I also need to clean up the machines list after the switch, since for the next config update I need to do a similar exercise.
Can someone outline the logic for how I can efficiently manage configuration on all my app machines dynamically while also avoiding inconsistency issues? Maybe I need one more node, say a status node, which can have details about each machine's config: when it downloaded, when it switched, and other details?
I can think of several possible solutions, depending on your scenario.
The simplest solution is not to store your config in memory and in files at all, but to store the config directly in the Consul KV store. And I'm not talking about a single key that maps to the entire JSON (I'm assuming your JSON is big, otherwise you wouldn't zip it), but about extracting smaller key/value sets from the JSON (this way you won't need to pull the whole thing every time you make a query to Consul).
If you get the config directly from Consul, your consistency guarantees match Consul's consistency guarantees. I'm guessing you're worried about performance if you lose your in-memory config; that's something you need to measure. If you can tolerate the performance loss, though, this will save you a lot of pain.
If performance is a problem here, a variation on this might be to use fsconsul. With this, you'll still extract your JSON into multiple key/value sets in Consul, and then fsconsul will map those to files for your apps.
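To make that concrete, here is a rough sketch of reading and watching a single small key using the node `consul` package (the exact API is assumed from that package's docs, and key names like `config/app/feature-flags` are made up):

```js
const consul = require('consul')();

// Read one small value instead of pulling the whole zipped config.
consul.kv.get('config/app/feature-flags', (err, item) => {
  if (err) throw err;
  const flags = item ? JSON.parse(item.Value) : null;
  console.log('current flags:', flags);
});

// Watch the key so every machine is notified when it changes.
const watch = consul.watch({
  method: consul.kv.get,
  options: { key: 'config/app/feature-flags' },
});
watch.on('change', (item) => {
  // reload just this piece of config in memory
});
watch.on('error', (err) => console.error('watch error', err));
```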
If that's off the table, then the question is how much inconsistency you are willing to tolerate.
If you can stand a few seconds of inconsistency, your best bet might be to put a TTL (time-to-live) on your in-memory config. You'll still have the watch on Consul, but you combine it with evicting your in-memory cache every few seconds, as a fallback in case the watch fails (or stalls) for some reason. This should give you a worst case of a few seconds of inconsistency (depending on the value you set for your TTL), but the normal case (I think) should be fast.
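A minimal sketch of that TTL fallback, where `fetchConfigFromConsul()` is a hypothetical helper that pulls the current config out of the KV store:

```js
const TTL_MS = 5000; // worst-case staleness you are willing to accept
let config = null;
let loadedAt = 0;

async function getConfig() {
  // Refresh if nothing is cached or the cached copy is older than the TTL.
  if (!config || Date.now() - loadedAt > TTL_MS) {
    config = await fetchConfigFromConsul(); // hypothetical helper
    loadedAt = Date.now();
  }
  return config;
}

// The watch handler just forces the next read to refresh immediately.
function onWatchChange() {
  loadedAt = 0;
}
```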
If that's not acceptable (does downloading the zip take a lot of time, maybe?), you can go down the route you mentioned. To update a value atomically you can use Consul's CAS (check-and-set) operation. It will give you an error if an update happened between the time you sent the request and the time Consul tried to apply it. Then you need to pull the list of machines again, apply your change, and retry (until it succeeds).
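Sketched out, a CAS retry loop for your `machines` key could look roughly like this (a promise-style `kv.get`/`kv.set` that accepts a `cas` index is assumed, so treat it as pseudocode for whatever client you use):

```js
// Add this machine's IP to the colon-separated `machines` value, retrying on
// CAS conflicts until the write goes through.
async function addMachine(kv, key, myIp) {
  for (;;) {
    const item = await kv.get(key);
    const machines = item && item.Value ? item.Value.split(':') : [];
    if (machines.includes(myIp)) return machines; // already registered

    machines.push(myIp);
    const ok = await kv.set({
      key,
      value: machines.join(':'),
      cas: item ? item.ModifyIndex : 0, // 0 means "only create if the key is absent"
    });
    if (ok) return machines; // false means another machine won the race; retry
  }
}
```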
I don't see why you would need two directories, but maybe I'm misunderstanding the question: when your app starts, before you do anything else, check whether there's a new config; if there is, download it and load it into memory. So you shouldn't have a "default config" if you want to be consistent. After you've downloaded the config on startup, you're up and alive. When your watch signals a key change, you download the config and directly override your old config. This assumes you're running the watch-triggered code on a single thread, so you're not downloading the file multiple times in parallel. If the download fails, you're not going to load a corrupt file into memory. And if you crash mid-download, you'll download again on startup, so you should be fine.
I am in the process of building my first live Node.js web app. It contains a form that accepts data regarding my client's current stock. When submitted, an object is made and saved to an array of current stock. This stock is then permanently displayed on their website until the entry is modified or deleted.
It is unlikely that there will ever be more than 20 objects stored at any time, and these will only be updated perhaps once a week. I am not sure if it is necessary to use MongoDB to store these, or whether there could be a simpler, more appropriate alternative. Perhaps the objects could be stored in a JSON file instead? Or would this have too big an impact on page load times?
You could potentially store it in a JSON file or even in a cache of sorts such as Redis, but I still think MongoDB would be your best bet for a live site.
Storing something in a JSON file is not scalable, so if you end up storing a lot more data than originally planned (this often happens) you may find you run out of storage on your server's hard drive. Also, if you end up scaling and putting your app behind a load balancer, you will need to make sure there are matching copies of that JSON file on each server. Furthermore, it is easy to run into race conditions when updating a JSON file: if two processes are trying to update the file at the same time, you are going to potentially lose data. Technically speaking, a JSON file would work, but it's not recommended.
Storing in memory (e.g. Redis) has similar implications: the data is only available on that one server. Also, unless you configure persistence, the data does not survive a restart, so if your server restarted for whatever reason, you'd lose what was stored in memory.
For all intents and purposes, MongoDB is your best bet.
The only way to know for sure is to test it with a load test. But as you probably read HTML and JS files from the file system when serving web pages anyway, the extra load of reading a few JSON files shouldn't be a problem.
If you want to go the simpler way, i.e. a JSON file, use NeDB, which is plenty fast as well.
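For example, a minimal NeDB sketch for the stock use case (field names like `sku` and `quantity` are just illustrative):

```js
const Datastore = require('nedb');

// One file on disk, queried much like a tiny MongoDB collection.
const db = new Datastore({ filename: 'stock.db', autoload: true });

db.insert({ sku: 'ABC-123', quantity: 5 }, (err) => {
  if (err) throw err;
});

db.find({}, (err, allStock) => {
  if (err) throw err;
  console.log(allStock); // render these on the site
});
```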
For example, consider something like Facebook or Twitter. All the user tweets / posts are retained indefinitely (so they must ultimately be stored within a static database). At the same time, they can rapidly change (e.g. with replies, likes, etc), so some sort of caching layer is necessary (e.g. you obviously can't be writing directly to the database every time a user "likes" a post).
In a case like this, how are the database / caching layers designed and implemented? How are they tied together?
For example, is it typical to begin by implementing the database in its entirety, and then add the caching layer afterward?
What about the other way around? In other words, begin by implementing the majority of functionality into the cache layer, and then write another layer which periodically flushes the cache to the database (at some point when its activity has gone down)? In this scenario, for current / rapidly changing data, the entire application would essentially be stored in cache.
Or perhaps implement some sort of cache-ranking algorithm based on access / update frequency?
How then should it be handled when a user accesses less frequently used data (which isn't currently in the cache)? Simply bypass the cache completely and query the database directly, or should all data be cached before it's sent to users?
In cases like this, does it make sense to design the database schema with the caching layer in mind, or should it be designed independently?
I'm not necessarily asking for direct answers to all these questions, but they're just to give an idea of where I'm coming from.
I've found quite a bit of information / books on implementing the database, and implementing the caching layer independent of one another, but not a whole lot of information on using them in conjunction / tying them together.
Any information, suggestions, general patterns, articles, or books would be much appreciated. It's just difficult to find some direction here.
Thanks
Probably not the best solution, but I worked on a personal project using OpenResty where I used its shared memory zones as a cache, to avoid the overhead of connecting to something like Redis, and then used Redis as the backend DB.
When a user loads a resource, it checks the shared dict; if it misses, it loads the resource from Redis and writes it to the cache on the way back.
If a resource is created or updated, it's written to the cache, and also queued to a shared dict queue.
A background worker ticks away waiting for new items in the queue, writing them to Redis and then sending an event to other servers to either invalidate the resource in their cache if they have it, or even pre-cache it if needed.
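The same read-through / write-behind idea, sketched in Node terms rather than OpenResty/Lua (`redisClient` is assumed to be an already connected client from the `redis` package, and the in-process Map stands in for the shared memory zone):

```js
const cache = new Map();  // stand-in for the shared memory zone
const writeQueue = [];    // stand-in for the shared dict queue

async function getResource(id) {
  if (cache.has(id)) return cache.get(id);              // cache hit
  const raw = await redisClient.get(`resource:${id}`);  // miss: fall back to Redis
  const value = raw ? JSON.parse(raw) : null;
  if (value) cache.set(id, value);                      // fill the cache on the way back
  return value;
}

function saveResource(id, value) {
  cache.set(id, value);            // write to the cache first
  writeQueue.push({ id, value });  // queue it for the background worker
}

// Background worker: drain the queue into Redis every so often.
setInterval(async () => {
  while (writeQueue.length > 0) {
    const { id, value } = writeQueue.shift();
    await redisClient.set(`resource:${id}`, JSON.stringify(value));
    // here you would also notify other servers to invalidate or pre-cache
  }
}, 100);
```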
I am making a new version of an old static website that grew to 50+ static pages.
So I made a JSON file with the old content so the new website can be more CMS-like (with templates for common pages) and the backend gets more DRY.
I wonder if I can serve that content to my views from the JSON or if I should have it in a MySQL database?
I am using Node.js, and in Node I can store that JSON file in memory so no file reading is done when a user asks for data.
Is there a correct practice for this? Are there performance differences between serving a cached JSON file and serving via MySQL?
The file in question is about 400 KB, if the file size is relevant to the choice of one technology over the other.
Why add another layer of indirection? Just serve the views straight from JSON.
Normally, a database is used for serving dynamic content that changes frequently, where records have one-to-many or many-to-many relationships and you need to query the data based on various criteria.
In the case you described, it looks like you will be OK with the JSON file cached in server memory. Just make sure you update the cache whenever the content of the file changes, e.g. by restarting the server, triggering a cache update via an HTTP request, or monitoring the file at the file system level.
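A minimal sketch of that setup, loading the JSON once and refreshing the in-memory copy when the file changes (`content.json` and the route shape are made up for illustration):

```js
const fs = require('fs');
const express = require('express');

const CONTENT_PATH = './content.json';
let content = JSON.parse(fs.readFileSync(CONTENT_PATH, 'utf8'));

// Refresh the in-memory copy whenever the file is edited on disk.
fs.watchFile(CONTENT_PATH, () => {
  content = JSON.parse(fs.readFileSync(CONTENT_PATH, 'utf8'));
});

const app = express();
app.get('/page/:slug', (req, res) => {
  const page = content[req.params.slug];
  if (!page) return res.sendStatus(404);
  res.render('page', page); // assumes a view engine is already configured
});
app.listen(3000);
```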
Aside from that, you should consider caching static files on the server and in the browser for better performance:
Cache and gzip static files (HTML, JS, CSS, JPG) in server memory on startup. This can be done easily using an npm package like connect-static.
Use the browser cache of the client by setting proper response headers. One way to do it is setting the maxAge option on the Express static route definition, e.g.:
app.use("/bower", express.static("bower-components", { maxAge: 31536000 }));
Here is a good article about browser caching
If you are already storing your views as JSON and using Node, it may be worth considering using a MEAN stack (MongoDB, Express, Angular, Node):
http://meanjs.org/
http://mean.io/
This way you can code the whole thing in JS, including the document store in the MongoDB. I should point out I haven't used MEAN myself.
MySQL can store and serve JSON no problem, but as it doesn't parse it, it's very inflexible unless you split it out into components, and indexing within the document is close to impossible.
Whether you 'should' do this depends entirely on your individual project and whether, or how, it is likely to evolve.
As you are implementing a new version (with a CMS) of the website, it would suggest that the site is live and subject to growth or change, and perhaps storing JSON in MySQL is storing up problems for the future. If it really is just one file, pulling it from the file system and caching it in RAM is probably easier for now.
I have stored JSON in MySQL for our projects before, and in all but a few niche cases ended up splitting up the component data.
400KB is tiny. All the data will live in RAM, so I/O won't be an issue.
Dynamically building pages -- All the heavy hitters do that, if for no other reason than inserting ads. (I used to work in the bowels of such a company. There were millions of pages live all the time; only a few were "static".)
Which CMS -- too many to choose from. Pick a couple that sound easy; then see if you can get comfortable with them. Then pick between them.
Linux/Windows; Apache/Tomcat/nginx; PHP/Perl/Java/VB. Again, your comfort level is an important criterion for this tiny web site; any of them can do the task.
Where might it go wrong? I'm sure you have hit web pages that are miserably slow to render. So, it is obviously possible to go the wrong direction. You are already switching gears; be prepared to switch gears a year or two from now if your decision turns out to be less than perfect.
Do avoid any CMS that is too heavy into EAV (key-value) schemas. They might work ok for 400KB of data, but they are ugly to scale.
It's good practice to serve the JSON directly from RAM if your data size will not grow in the future, but if the data is going to grow, this approach will work out badly for the application.
If you are not expecting to add (m)any new pages, I'd go for the simplest solution: read the JSON once into memory, then serve from memory. 400KB is very little memory.
No need to involve a database. Sure, you can do it, but it's overkill here.
I would recommend generating static HTML content at build time (use Grunt or ..). If you would like to apply changes, trigger a build to regenerate the static content and deploy it.
I am developing a live chat application using Node.js, Socket.IO, and a JSON file. I am using the JSON file to read and write the chat data. Now I am stuck on one issue: when I do stress testing, i.e. pushing continuous messages into the JSON file, the JSON format becomes invalid and my application crashes. Although I am using forever.js, which should keep the application up, it still crashes.
Does anybody have idea on this?
Thanks in advance for any help.
It is highly recommended that you re-consider your approach for persisting data to disk.
Among other things, one really big issue is that you will likely experience data loss. If we both read the file at the exact same time - {"foo":"bar"} - we both make a change, and you save yours before mine, my change will overwrite yours, since I started from the same contents as you. Although you saved before me, I didn't re-read the file after you saved.
What you are possibly seeing now in an append-only approach is that we're both adding bits and pieces without regard to valid JSON structure (e.g. ending up with something like {"fo"bao":r":"ba"for"o"} from two interleaved writes of {"foo":"bar"}).
Disk I/O is actually pretty slow. Even with an SSD hard drive. Memory is where it's at.
As recommended, you may want to consider MongoDB, MySQL, or otherwise. This may be a decent use case for Couchbase which is an in-memory key/value store based on memcache that persists things to disk ASAP. It is extremely JSON friendly (it is actually mostly based on JSON), offers great map/reduce support to query data, is super easy to scale to multiple servers, and has a node.js module.
This would allow you to very easily migrate your existing data storage routine into a database. Also, it provides CAS support which will prevent you from data loss in the scenarios outlined earlier.
At minimum, though, you should possibly just modify an in-memory object that you save to disk every so often to prevent permanent data loss. However, this only works well with one server, and then you're back at likely needing to look at a database.
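As a stopgap along those lines, one sketch is to keep the chat log in memory and flush it atomically every few seconds, writing to a temporary file and renaming it so a crash mid-write never leaves a half-written, invalid JSON file behind (file names and the interval are arbitrary):

```js
const fs = require('fs');

const FILE = 'chat.json';
let messages = fs.existsSync(FILE)
  ? JSON.parse(fs.readFileSync(FILE, 'utf8'))
  : [];

// All mutation happens in memory, in a single process.
function addMessage(msg) {
  messages.push(msg);
}

// Periodically write the whole log to a temp file, then rename it into place.
setInterval(() => {
  const tmp = FILE + '.tmp';
  fs.writeFile(tmp, JSON.stringify(messages), (err) => {
    if (err) return console.error('flush failed', err);
    fs.rename(tmp, FILE, (err2) => {
      if (err2) console.error('rename failed', err2);
    });
  });
}, 5000);
```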