I need to do some cleanup work, so I wrote a view in the Couchbase Console that gets my data. How do I update or delete those documents?
Thanks
For deletion, one of the better ways is: for each object ID returned by the view, call the touch() method to set a random TTL somewhere in whatever range of time you choose; hours, days, a month, whatever. Couchbase will then delete the objects gracefully over time with little to no load on the cluster. This is very fast because you are not editing the object itself, only the metadata Couchbase keeps for that object.
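A minimal sketch of what that could look like with the Python SDK (2.x-style API); the bucket, design document and view names are placeholders, and the TTL range is just an example:

```python
import random
from couchbase.bucket import Bucket  # Couchbase Python SDK 2.x

bucket = Bucket('couchbase://localhost/mybucket')

# Walk the view and give each matching document a random TTL so the
# cluster expires them gradually instead of all at once.
for row in bucket.query('my_design_doc', 'my_view'):
    ttl = random.randint(1, 7 * 24 * 3600)   # expire sometime within the next week
    bucket.touch(row.docid, ttl=ttl)         # touches metadata only, not the document body
```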
For updating, the high-level approach is: for each object ID in the view, read the object, make the changes to it, then save it back to the database.
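A rough sketch of that read-modify-write loop, again with the Python SDK 2.x API and placeholder names (the field being changed is purely illustrative):

```python
from couchbase.bucket import Bucket  # Couchbase Python SDK 2.x

bucket = Bucket('couchbase://localhost/mybucket')

for row in bucket.query('my_design_doc', 'my_view'):
    result = bucket.get(row.docid)
    doc = result.value                    # the JSON document as a dict
    doc['some_field'] = 'new value'       # whatever change you need to make
    # Passing the CAS value guards against clobbering a concurrent update.
    bucket.replace(row.docid, doc, cas=result.cas)
```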
If there is something more specific you are looking for, let me know.
Our cloud service deals with chunks of JSON data ("items") that are being manipulated all the time. An item can change as often as every second.
At the moment an item is a JSON object that is modified in place. Now we need to implement versioning of these items as well.
Basically, every time a request to modify the object arrives, the object is modified, saved to the DB, and then we also need to store that version somewhere, so that later on you can say "give me version 345 of this item".
My question is: what would be the ideal way to store this history? Mind you, we do not need to query or alter the data once saved; all we need is to load it if necessary (0.01% of the time). The data is basically an opaque blob.
We are researching multiple approaches:
Simple text files (file system)
Cloud storage (e.g. S3)
Version control (e.g. Git)
Database (any)
Vault (e.g. HashiCorp Vault)
The main problem is that since items are updated every second, we end up with a lot of blobs. Consider 100 items, each updated every second: that's 8,640,000 records in a single day, not to mention a sustained 100 writes per second against the DB.
Do you have any recommendation as to what the optimal approach would be? We need it to be scalable, fast and reliable, and encryption out of the box would be a great plus.
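For illustration only, here is one shape the cloud-storage option from the list above could take: each version becomes an immutable object whose key encodes the item ID and version number, with server-side encryption turned on. The bucket name and key layout are assumptions, not a recommendation:

```python
import json
import boto3

s3 = boto3.client('s3')
BUCKET = 'item-history'   # hypothetical bucket name

def save_version(item_id, version, item):
    """Write one immutable version; the key encodes item ID and version number."""
    key = f'{item_id}/{version:010d}.json'
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(item).encode('utf-8'),
        ServerSideEncryption='AES256',   # encryption at rest without extra setup
    )

def load_version(item_id, version):
    """Fetch a single stored version (the rare 0.01% case)."""
    obj = s3.get_object(Bucket=BUCKET, Key=f'{item_id}/{version:010d}.json')
    return json.loads(obj['Body'].read())
```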
I am currently in a development team that has implemented a search app using Flask-WhooshAlchemy. Admittedly, we did not think this completely through.
The greatest problem we face is being unable to store query results in a Flask session without serializing the data set first. The '__QueryObject' returned by Whoosh can be JSON-serialized using Marshmallow. We have gone down this route and, yes, we are able to store and manipulate the retrieved data, but at a cost: initial searches take a very long time (at least 30 seconds for larger result sets, due to serialization). For the time being, we are stuck re-querying any time there is a change that shouldn't require a fresh search, such as switching between result views or changing the number of results per page. Adding insult to injury, Whoosh is probably not scalable for our purposes; Elasticsearch seems a better contender.
In short:
How can we store Elasticsearch query results in a Django session so that we can manipulate those results later?
Any other guidance will be greatly appreciated.
If anyone cares, we finally got everything up and running, and yes, it is possible to store Elasticsearch query results in a Django session.
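In case it helps others, here is a stripped-down sketch of one way this can look (not necessarily exactly what we shipped); the index name, query and page size are placeholders. The key point is that only plain dicts go into the session, so Django's default JSON session serializer can handle them:

```python
from django.http import JsonResponse
from elasticsearch import Elasticsearch

es = Elasticsearch()

def search(request):
    resp = es.search(
        index='documents',   # hypothetical index name
        body={'query': {'match': {'text': request.GET.get('q', '')}}},
    )
    # Keep only plain dicts so the default JSON session serializer accepts them.
    hits = [hit['_source'] for hit in resp['hits']['hits']]
    request.session['search_results'] = hits

    # Later requests (paging, switching result views) can re-slice the session
    # copy instead of running the search again.
    return JsonResponse({'count': len(hits), 'results': hits[:20]})
```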
I'm working on an application that allows users to edit documents (spreadsheets and other docs) live.
When editing the files, a keyup handler fires an AJAX POST request that auto-saves/updates the file.
What I'd like to do is add a notification, or some way of registering/logging that the user has updated the file, which could then be put into some sort of feed.
The problem is that because there are so many AJAX requests, it would be impractical to log an edit for every AJAX save request.
What would be a good structure to handle this?
I was thinking of using some sort of time-stamping method and only logging an edit if the previous time-stamp is outside a certain range (15 minutes or so).
Does anyone have any experience with this kind of thing? I'm really not sure what the best solution would be. I'm trying to come up with a few ideas for a suitable approach (in terms of table structure and general direction). Perhaps someone here can help.
With so many AJAX requests, it is going to be expensive if the server has to check, on every request, whether a log row already exists for that user within the time window: that is a read on a time field followed by a write, against a table that will grow to millions of rows. It will hurt performance and response time, and put unnecessary load on your database server.
I think an improvement would be to send a "log this" flag along with your AJAX request. The server then never has to check whether a previous row exists; it simply inserts a log row whenever the flag is on. On the client side you keep a time counter that decides when to set the "log this" flag, based on elapsed time, variables, the length of the edit, and so on.
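A minimal sketch of what the server side could look like, here with Flask and SQLite; the route, table and payload field names are all made up for illustration:

```python
import time
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)
db = sqlite3.connect('app.db', check_same_thread=False)

@app.route('/documents/<int:doc_id>/save', methods=['POST'])
def save_document(doc_id):
    payload = request.get_json()
    # ... persist payload['content'] for doc_id, as you already do on every keyup ...

    # No read-before-write: only insert a feed entry when the client says so.
    if payload.get('log_this'):
        db.execute(
            'INSERT INTO edit_log (doc_id, user_id, edited_at) VALUES (?, ?, ?)',
            (doc_id, payload['user_id'], int(time.time())),
        )
        db.commit()
    return jsonify(ok=True)
```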
We have a Meteor-based system that polls data from a third-party REST API, loops through the retrieved data, and inserts or updates each record in a Meteor collection.
But then it hit me: what happens when an entry is deleted from the third party's data?
One answer would be to insert/update the data, then loop through the collection and find out which records aren't in the fetched data. True, that's one way of doing it.
Another would be to clear the collection, and rewrite everything from the fetched data.
But with thousands of entries (currently at 1,500+ records, and that will potentially explode), both approaches seem very slow and CPU-intensive.
What is the optimal procedure to mirror data from a JS object to a Meteor/Mongo collection, such that items deleted from the source data are also deleted from the collection?
I think code is irrelevant here, since this is applicable to any language that can do a similar feat.
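For what it's worth, the diff-and-delete approach described above can be expressed in a few lines against the underlying MongoDB. This sketch uses pymongo and assumes each fetched record carries a stable 'id' field; the database and collection names are made up:

```python
from pymongo import MongoClient, UpdateOne

client = MongoClient()
items = client.mydb.items   # hypothetical database/collection names

def mirror(fetched):
    """Upsert every fetched record, then remove anything no longer present upstream."""
    fetched_ids = [record['id'] for record in fetched]

    # One bulk round-trip for all upserts instead of thousands of single writes.
    ops = [UpdateOne({'_id': record['id']}, {'$set': record}, upsert=True)
           for record in fetched]
    if ops:
        items.bulk_write(ops, ordered=False)

    # Anything whose _id is not in the latest fetch was deleted at the source.
    items.delete_many({'_id': {'$nin': fetched_ids}})
```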
For this kind of usage, try something that's more optimized. The Meteor team is working on having Meteor act as a sort of MongoDB replica-set member to get and set data.
For the moment there is Smart Collections, which uses MongoDB's oplog to significantly boost performance. It can work in a sort of one-size-fits-all scenario without optimizing for specifics; there are benchmarks that show this.
When Meteor 1.0 comes out, I think they'll have optimized their own MongoDB driver.
I think this may help with thousands of entries. If you're changing thousands of documents every second, you need something closer to MongoDB itself. Meteor employs a lot of caching techniques that aren't optimal for this; I think it polls the database every 5 seconds to refresh its cache.
Smart Collections: http://meteorhacks.com/introducing-smart-collections.html
Please do let me know if it helps; I'm interested to hear whether it's useful in this scenario.
If this doesn't work, Redis might be helpful too, since everything is stored in memory. Not sure what your use case is, but if you don't need persistence, Redis would squeeze out more performance than Mongo.
I ran into an interesting problem when working with a web service that uses JSON.
Assume there's a web service that accepts several parameters, each with its own set of possible values. You get different responses by passing different request parameters.
The request is in JSON format. Because there are so many different combinations of request parameters, I want to cache each request/response pair and store it in a local database as a performance optimization. Conceptually it's a big hash table: the request is the key and the response is the value.
I am thinking MongoDB may be a solution, but I am not sure. Is it possible to store request/response as key-value pairs in this kind of database, so I can cache the result and respond to the user immediately?
Thank you.
You won't get any benefit from that level of caching unless your code and database have spectacularly bad performance (in which case you have bigger problems than setting up a cache).
You can use JSON as a key with any key/value store, though it probably makes more sense to use a hash of the JSON as the cache key rather than the raw JSON string, and a non-persistent in-memory cache such as memcached or Redis will work a lot better than a full document database like MongoDB.
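A minimal sketch of that idea in Python with Redis; the key prefix, TTL and the compute_response callable are placeholders, and the TTL also illustrates the time-based expiry discussed below:

```python
import hashlib
import json
import redis

cache = redis.Redis()
TTL_SECONDS = 300   # how long a cached response stays valid

def cache_key(request_params):
    # Canonicalise the JSON (sorted keys, no whitespace) so equivalent
    # requests always hash to the same key.
    canonical = json.dumps(request_params, sort_keys=True, separators=(',', ':'))
    return 'resp:' + hashlib.sha256(canonical.encode('utf-8')).hexdigest()

def get_response(request_params, compute_response):
    key = cache_key(request_params)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    response = compute_response(request_params)           # the expensive call
    cache.setex(key, TTL_SECONDS, json.dumps(response))   # expiry handles staleness
    return response
```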
Where you will run into big problems with this approach is managing cache expiry: to get real-time updates, you need to know exactly which cached objects are affected by a change to a given object. That's easy if the request is a simple get-by-ID, but next to impossible in the scenario you describe.
The other way to manage the cache is expiry: delete objects from the cache after a given time. This assumes it is acceptable to show stale data after an update. Caches usually have expiry support built in; databases generally don't.