Google Apps Script CacheService Optimization / Limit - JSON

I found the CacheService is quite fast (duh), so I decided to create a CacheManager to store all manner of things.
JS Object --> JSON --> Blob --> Zip --> base64-encoded string
If the base64 string is > 1E5 chars (100 KB), I create an MD5 checksum for the base64 string, then split it into 100 KB sections and cache those separately as a multipart zip string.
I was able to store/recall ~3 MB of raw JSON data this way in ~1.2 s (similar in speed to a DriveApp API call).
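For illustration, here is a minimal sketch of that pipeline in Apps Script (this is not the linked source; the header format, key naming and chunk handling are my own assumptions):

var CHUNK = 1e5; // stay under the ~100 KB per-value limit of CacheService
var cache = CacheService.getScriptCache();

function cachePut(key, obj, ttlSec) {
  // JS Object -> JSON -> Blob -> gzip -> base64 string
  var json = JSON.stringify(obj);
  var packed = Utilities.base64Encode(Utilities.gzip(Utilities.newBlob(json)).getBytes());
  if (packed.length <= CHUNK) {
    cache.put(key, packed, ttlSec);
    return;
  }
  // Too big for one entry: store an MD5-stamped header plus numbered chunks
  var md5 = Utilities.base64Encode(Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, packed));
  var n = Math.ceil(packed.length / CHUNK);
  var parts = {};
  for (var i = 0; i < n; i++) {
    parts[key + '_' + i] = packed.substr(i * CHUNK, CHUNK);
  }
  parts[key] = JSON.stringify({multipart: true, chunks: n, md5: md5});
  cache.putAll(parts, ttlSec);
}

function cacheGet(key) {
  var head = cache.get(key);
  if (!head) return null;
  var packed = head;
  if (head.charAt(0) === '{') { // multipart header (base64 never starts with '{')
    var meta = JSON.parse(head);
    var keys = [];
    for (var i = 0; i < meta.chunks; i++) keys.push(key + '_' + i);
    var parts = cache.getAll(keys);
    packed = keys.map(function (k) { return parts[k] || ''; }).join('');
    // a real implementation would recompute the MD5 here and compare it to meta.md5
  }
  var blob = Utilities.newBlob(Utilities.base64Decode(packed), 'application/x-gzip');
  return JSON.parse(Utilities.ungzip(blob).getDataAsString());
}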
I tried searching for an overall limit on how many total cached objects can be created, but didn't find much. Is anyone aware of an overall limit, or of performance degradation with large numbers of cached strings?
Source Code for my "cache manager"
Edit: Fixed source URL

There is no documented value for this, but you can try the tests described in this SO post. I think the total shouldn't exceed 10 MB. Read more on the test conducted here.
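If you want a rough number for your own project, a quick probe along these lines (a sketch, not the exact test from that post) shows how many ~100 KB entries the cache retains:

function probeCacheLimit() {
  var cache = CacheService.getScriptCache();
  var value = new Array(1e5 + 1).join('x'); // one ~100 KB value
  var retained = 0;
  for (var i = 0; i < 200; i++) {           // up to ~20 MB in total
    cache.put('probe_' + i, value, 600);
    if (cache.get('probe_' + i) === null) break; // write dropped or evicted
    retained++;
  }
  Logger.log('Entries retained: %s (~%s MB)', retained, retained / 10);
}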

Related

JSON files that would be used in Elasticsearch

I want to know if the JSON files that would be used in Elasticsearch should have a predefined structure, or whether any JSON document can be uploaded.
I've seen some JSON documents where each record is preceded by something like this:
{"index":{"_index":"plos","_type":"article","_id":0}}
{"id":"10.1371/journal.pone.0007737","title":"Phospholipase C-β4 Is Essential for the Progression of the Normal Sleep Sequence and Ultradian Body Temperature Rhythms in Mice"}
Theoretically you can upload any JSON document. However, be mindful that Elasticsearch can create/change the index mapping based on your create/update actions. So if you send a JSON document that includes a previously unknown field? Congratulations, your index mapping now contains a new field! In the same way, the data type of a field might also be affected by introducing a document with data of a different type. So my advice is to be very careful in constructing your requests to avoid surprises.
FYI, the syntax you posted looks like a bulk request (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html). Those do have some requirements on the syntax to clarify what you want to do to which documents. An index call sending a single document is much less restricted, though.
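For reference, a bulk request like the one quoted above could be sent from JavaScript roughly like this (the host, index name and x-ndjson content type are assumptions and vary by Elasticsearch version):

var bulkBody =
  JSON.stringify({index: {_index: 'plos', _type: 'article', _id: 0}}) + '\n' +
  JSON.stringify({id: '10.1371/journal.pone.0007737', title: 'Phospholipase C-β4 ...'}) + '\n'; // body must end with a newline

fetch('http://localhost:9200/_bulk', {
  method: 'POST',
  headers: {'Content-Type': 'application/x-ndjson'}, // bulk bodies are newline-delimited JSON
  body: bulkBody
}).then(function (res) { return res.json(); }).then(console.log);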

REST interface: design for many parameters and big data

I have a RESTful interface with essentially one function. The function takes one big binary file as input and returns the modified file. Let's assume it's an encryption function (it's not, but it's similar). Nothing gets stored on the server, so repeating the call is no problem. I also need to pass many parameters, such as the type of encryption, some information about the caller, whether the file needs to be converted first, etc.
I was thinking of implementing this with a URL like //server/v1/encrypt, a POST request, and all parameters, including the file (base64 encoded), in a JSON body.
Now the HTTP spec says POST should be used for creation requests and that POST responses cannot be cached. Caching is not really important for me (as the file is always different), but I would like to follow standards, recommendations and best practice.
I would assume GET would be the best fit for this type of request, but GET cannot have a file in the body (as per question 978061). Passing the parameters in the body as JSON is probably not a good idea either. But is it really better to have 50 parameters in the URL (either as part of the path or as query parameters)?
How would you implement this and why?
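One concrete form of the POST-with-JSON-body option described above (the endpoint and parameter names are placeholders, not an existing API):

function encrypt(fileBase64, params) {
  return fetch('https://server/v1/encrypt', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({
      file: fileBase64,               // base64 adds roughly 33% overhead vs. raw bytes
      type: params.type,              // e.g. which "encryption" to apply
      convertFirst: params.convertFirst,
      caller: params.caller
    })
  }).then(function (res) {
    return res.json();                // e.g. { file: '<base64 of the modified file>' }
  });
}
// An alternative that avoids the base64 overhead is multipart/form-data:
// the raw file goes in one part and a JSON blob with the ~50 parameters in another.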

How does a DHT work?

I grabbed the basic idea of DHTs from the wiki:
Store Data:
In a DHT network, every node is responsible for a specific range of the key-space. To store a file in the DHT, first hash the file's name to get the file's key; second, send a message put(key, file-content) to any node of the DHT. The message will be relayed to the node responsible for that key, and that node will store the pair (key, file-content).
Get Data:
When getting a file from the DHT, first hash the file's name to get the key; second, send a message get(key) to any node, and the message is relayed until...
Questions:
To store a file, we can hash the file's name to get its key, but wiki says:
In the real world the key k could be a hash of a file's content rather than a hash of a file's name to provide content-addressable storage, so that renaming of the file does not prevent users from finding it.
Hash the file's content? How am I supposed to know the file's content? If I already know the file's content, then WHY would I search for it in the DHT?
According to the wiki, every participating node will spare some space to store files. So does that mean that, if I participate in a DHT, I have to spare 10 GB of disk space to store the files whose keys fall into the specific key-space I'm responsible for?
If indeed I should spare some disk space to store those files, how should I store those (key, file-content) pairs on the disk? I mean, should the files be arranged into a B-tree or something on my disk?
When a query happens, how does my computer respond? I assume it first checks the queried key and, if it's in my key-space, finds the corresponding file on my disk. Right?
A DHT is just an algorithm. At its base it provides distributed key-value PUT and GET operations. Similar to a normal Map or associative array found in many programming languages.
Due to real-world limitations such as untrustworthy nodes, failure rates and so on, actual DHT implementations don't provide an arbitrary-length PUT(<uint8[]>, <uint8[]>) operation.
Example:
The Kademlia implementation for BitTorrent, for example, provides the following interface:
PUT(uint8[20], uint16)
GET(uint8[20]) -> List<Pair<IP, uint16>> where the list only represents a randomly sampled subset of the actual data
As you can see it actually is a specialized asymmetric interface when compared to more generic associative arrays.
The IP address is always derived from the PUT sender's source address, i.e. cannot be explicitly set.
And the GET returns a list instead of a single value, so it implements a MultiMap or Map<List>, if you want to see it like that.
In BitTorrent's case a hash is used as the content descriptor; peers which have the content announce themselves on the DHT. If someone wants the file(s), they look up IP/port pairs on the DHT, contact the peers via a separate protocol and then download the data.
But other uses for a DHT are also possible, e.g. it could store signed, structured data, tweet-like text snippets or whatever. It always depends on your application's needs.
It's just a basic building block.
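To make the "node responsible for a key range" idea concrete, here is a deliberately naive toy sketch; real DHTs such as Kademlia use XOR distance, routing tables and replication instead of one flat table, and all names here are made up:

function toyHash(s) {                       // stand-in for SHA-1/MD5
  var h = 0;
  for (var i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) >>> 0;
  return h % 1024;                          // 10-bit key space
}

var nodes = [0, 256, 512, 768].map(function (start) {
  return {start: start, store: {}};         // each node owns [start, start + 256)
});

function responsibleNode(key) {
  return nodes[Math.floor(key / 256)];      // "routing": find the owner of the key
}

function put(name, content) {
  var key = toyHash(name);                  // in practice often a hash of the *content*
  responsibleNode(key).store[key] = content;
}

function get(name) {
  var key = toyHash(name);
  return responsibleNode(key).store[key];   // the owner answers the query
}

put('song.mp3', '...bytes...');
get('song.mp3');                            // -> '...bytes...'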

Grails JSON max length

I am aware that JSON strings often have a max length defined in either Apache or PHP; however, where is the max length for JSON strings defined in Grails running on Tomcat?
The JSON string I am sending is 13,636 characters in length; however, I can shorten it a lot (although I don't want to while we're testing). Also, we may be sending images via JSON in the future, which I've read requires base64 encoding and thus considerable overhead. If we were to do such a thing, I am worried this limit could cause problems, so it's something we should overcome now.
If there is no limit, then perhaps I am doing something wrong. I have a finite number of domain objects that I am encoding as JSON using domainInstance as grails.converters.deep.JSON - this is done using a for loop, and each time the JSON string is appended to a StringBuilder.
I then render the StringBuilder in a view using render(stringBuilder.toString()). The first JSON string is fine, but the second is truncated near the end. If I were to guesstimate, I'd say I am getting around 80% of the total length of the StringBuilder.
EDIT/SOLUTION: Apologies guys & girls, I've noticed that when I view the page source I get the complete JSON string; however, when I just view the page it's truncated. It's an odd error; I'll still accept answers on why it's truncated, though. Thanks.
There is a maximum size in Tomcat for POST requests, which you may be concerned with later if you start sending huge JSON / Base64 image requests (like you mentioned).
The default value in Tomcat is 2,097,152 bytes (2 MB); it can be changed by setting the maxPostSize attribute of the <Connector> in server.xml.
From the docs (for maxPostSize in Tomcat 7):
The maximum size in bytes of the POST which will be handled by the container FORM URL parameter parsing. The limit can be disabled by setting this attribute to a value less than or equal to 0. If not specified, this attribute is set to 2097152 (2 megabytes).
This is pretty straightforward to configure if you're deploying a Grails war to a standalone Tomcat instance. However, I can't find much about actually configuring server.xml if you're using the tomcat plugin. If it were me, I'd probably just run large-file tests using the war instead of with grails run-app.
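One way to run such a test is to fire an oversized form-encoded POST at the deployed war and see whether the parameters survive (the URL and parameter name below are placeholders):

var big = new Array(3 * 1024 * 1024).join('x');       // roughly 3 MB, over the 2 MB default
fetch('http://localhost:8080/myapp/test', {
  method: 'POST',
  headers: {'Content-Type': 'application/x-www-form-urlencoded'},
  body: 'payload=' + encodeURIComponent(big)
}).then(function (res) {
  // with the default maxPostSize the container refuses to parse the oversized
  // parameters (exact behaviour varies by Tomcat version), so the controller
  // will typically see an empty or missing 'payload'
  console.log('HTTP', res.status);
});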

Appcelerator: Cache JSON output for a short time

I am developing an iOS app that uses a single-context architecture. I make frequent calls to my API (PHP) and I want to "cache" the output for as long as the session is active. Right now I am saving the output to a variable that is defined in app.js:
var contacts = {
    contactsData: null
};
So I do this to save the output. Is it really a good idea? Will it slow things down?
contacts.contactsData = output;
Thankful for all input!
It depends on how big the JSON file is, in MB. If the device has enough RAM, this is the best way. Also, make sure you save the decoded JSON, not just the raw request response, so you don't have to decode it every time.
If the JSON data is too big, you should think about some kind of local storage. If the JSON is always the same (no need to sync every time), save it locally.
If you need to update it often, you can fetch the most urgently needed part with one limited request (API configuration needed) and the rest of the data with a second background request.
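A minimal sketch of that approach in Titanium, with the decoded JSON kept in memory for the session (property and endpoint names are made up):

var contacts = {contactsData: null};

function getContacts(onReady) {
  if (contacts.contactsData) {              // in-memory hit: no request, no re-decode
    onReady(contacts.contactsData);
    return;
  }
  var client = Ti.Network.createHTTPClient({
    onload: function () {
      contacts.contactsData = JSON.parse(this.responseText);  // keep the decoded object
      // optional: persist across sessions if the payload is small enough
      Ti.App.Properties.setString('contactsCache', this.responseText);
      onReady(contacts.contactsData);
    }
  });
  client.open('GET', 'https://example.com/api/contacts.php');
  client.send();
}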