I'm working on an image processing service that has two layers. The top layer is a REST-based WCF service that takes the image upload, processes it, and then saves it to the file system. Since my top layer has no direct database access (by design), I need to pass the image to my application layer (a wsHttpBinding WCF service), which does have database access. As it stands, the images can be up to 2 MB in size, and I'm trying to figure out the best way to transport the data across the wire.
I am currently sending the image data as a byte array. The object will have to be held in memory at least temporarily in order to be written to the database (in this case, a MySQL server), so I don't know whether using a Stream would help eliminate the potential memory issues, or whether I'm going to have to deal with potentially filling up my memory no matter what I do. Or am I just overthinking this?
Check out the Streaming Data section of this MSDN article: Large Data and Streaming
I've used the exact method described to successfully upload large documents and even stream video content from a WCF service. The keys are passing a Stream object in the message contract and setting the transferMode to Streamed in the client and service configuration.
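To make the memory argument concrete: with streamed transfer, neither side ever holds the full payload in one buffer; each layer reads a small block, forwards it, and discards it. The WCF specifics are C#, but the principle is language-agnostic, so here is a minimal sketch in Python (copy_streamed, CHUNK_SIZE and the file names are purely illustrative, not part of any WCF API) contrasting a buffered copy with a chunked, streaming copy:

```python
import io

CHUNK_SIZE = 64 * 1024  # forward the upload in 64 KB blocks

def copy_buffered(source, destination):
    # Buffered approach: the whole image (up to 2 MB here) sits in memory at once.
    destination.write(source.read())

def copy_streamed(source, destination):
    # Streamed approach: only one block is in memory at any time, so peak
    # memory stays near CHUNK_SIZE no matter how large the upload is.
    while True:
        block = source.read(CHUNK_SIZE)
        if not block:
            break
        destination.write(block)

# Example: stream a fake 2 MB payload to disk in 64 KB blocks.
fake_image = io.BytesIO(b"\xff" * (2 * 1024 * 1024))
with open("image_copy.bin", "wb") as dst:
    copy_streamed(fake_image, dst)
```

That is essentially what streamed transfer mode buys you end to end, provided the receiving code reads the incoming Stream incrementally instead of copying it into a byte array first.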
I saw this post regarding efficiently pushing that stream into MySQL; hopefully it gets you pointed in the right direction.
Related
Is it possible to send large amounts of text (a CSV database export) through an Indy HTTP server to a requesting client in chunks, to avoid hitting memory restrictions?
I am implementing a REST interface in an application server written in Delphi 10.4.2, as I cannot expose the database connection directly for several reasons that are not negotiable.
The data will be consumed by statisticians using R.
As the amount of data can grow to a gigabyte, I am not comfortable building the whole response in a string and writing it to the connection.
@MichaSchumann,
Modern web browsers often receive gzip-compressed data from web servers such as Apache 2.4 (for the Windows 10 version, see XAMPP). The browser unpacks the data and displays the content based on the file and its content type, which can be text (HTML) or binary (images).
Indy 9 sends its packets as-is: the data is neither secured nor compressed before sending.
So you could gzip your CSV file. CSV files usually consist of plain ASCII text, which compresses quickly into a binary file that is a great deal smaller, so transmitting the data stream takes far less time (think about how long a 1 GiB file would otherwise need).
Note: you should send a header with each chunk.
In this header you store the relevant information about the size and position of the chunk currently being transmitted.
That way you can implement your own FTP-like server that remembers its position state.
This is useful when the connection breaks.
Plain (old) FTP servers have the drawback that after a broken connection they have to start again from the position of the first chunk. This means you always end up resending the file from 0 to size.
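To make the chunk-header idea concrete, here is a rough sketch in Python rather than Delphi/Indy (the 12-byte offset+length header layout and the 256 KB chunk size are my own illustrative choices, not an Indy convention): gzip the CSV once, then emit fixed-size chunks, each prefixed with a header carrying its position and size so a broken transfer can resume from the last acknowledged offset.

```python
import gzip
import struct
from typing import Iterator, Tuple

CHUNK_SIZE = 256 * 1024  # 256 KB per chunk; adjust to taste

# Illustrative header layout: 8-byte offset into the compressed stream + 4-byte chunk length.
HEADER_FORMAT = ">QI"

def gzip_file(csv_path: str, gz_path: str) -> None:
    # Compress the CSV once; plain ASCII text typically shrinks dramatically.
    with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        for block in iter(lambda: src.read(CHUNK_SIZE), b""):
            dst.write(block)

def iter_chunks(gz_path: str, start_offset: int = 0) -> Iterator[Tuple[bytes, bytes]]:
    # Yield (header, payload) pairs, resuming from start_offset after a broken connection.
    with open(gz_path, "rb") as f:
        f.seek(start_offset)
        offset = start_offset
        for payload in iter(lambda: f.read(CHUNK_SIZE), b""):
            header = struct.pack(HEADER_FORMAT, offset, len(payload))
            yield header, payload
            offset += len(payload)

# Usage: the receiver records the last offset it persisted; on reconnect the sender
# calls iter_chunks(path, start_offset=last_offset) instead of starting from zero.
```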
Feel free to ask further questions; you are welcome.
I'm looking for efficient ways to handle very large JSON files (i.e., possibly several GB in size, amounting to a few million JSON objects) in requests to a Django 2.0 server (using Django REST Framework). Each row needs to go through some processing and then get saved to a DB.
The biggest pain point so far is the sheer memory consumption of the file itself, plus the fact that memory consumption keeps increasing while the data is being processed in Django, with no way to manually release the memory used.
Is there a recommended way of processing very large JSON files in requests to a Django app without exhausting memory? Could it be combined with compression (gzip)? I'm thinking of uploading the JSON to the API as a regular file, streaming it to disk, and then streaming from the file on disk using ijson or similar. Is there a more straightforward way?
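For what it's worth, here is a minimal sketch of exactly that "upload as a file, spool it to disk, then iterate with ijson" idea, written as a plain Django view; the view name, the 'file' field name, process_row, and the 'item' prefix (which assumes the payload is one top-level JSON array) are all assumptions, not anything from the question's codebase.

```python
import tempfile

import ijson
from django.http import JsonResponse

def import_large_json(request):
    # Spool the upload to disk instead of reading it into memory.
    # If the client gzips the body, wrap the spooled file in gzip.open() before parsing.
    upload = request.FILES["file"]
    with tempfile.NamedTemporaryFile(suffix=".json") as spooled:
        for chunk in upload.chunks():          # Django yields the upload in chunks
            spooled.write(chunk)
        spooled.seek(0)

        processed = 0
        # ijson yields one object at a time, so memory stays flat regardless of file size.
        for obj in ijson.items(spooled, "item"):
            process_row(obj)                   # placeholder for per-row processing + DB save
            processed += 1

    return JsonResponse({"processed": processed})

def process_row(obj):
    # Placeholder: validate, transform, and save the row here, ideally in batches
    # (e.g. bulk_create) rather than one INSERT per object.
    pass
```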
Question: What is the best approach for implementing client-side caching of huge data sets? I am using Angular 4 with ASP.NET Web API 2.
Problem: I am developing a web analytics tool (which also supports mobile browsers) that generates ECharts metrics based on JSON data returned from ASP.NET Web API 2. The page has filters and chart events that recalculate chart measures from the same JSON data on the client side. To optimize speed, I have stored the JSON data (minified) in the browser's localStorage; this avoids the frequent API calls otherwise made on filter changes and chart events. The JSON data is refreshed from the server every 20 minutes, as I have set an expiry on each JSON entry saved in localStorage.
The problem is that localStorage has a size constraint of 10 MB, and the solution above does not work when the JSON data (spread across multiple localStorage keys) exceeds 10 MB.
Since my data size can vary and can exceed 10 MB, what is the best approach to caching such data, given that the same data is reused for recalculating the metrics' measures without making further Web API calls?
I have thought about (but not yet implemented):
a) Client-side in-memory caching (which may cause performance issues for users with less memory).
b) Storing the JSON data in a JavaScript variable and using it directly.
Please let me know a better solution for a large client-side cache.
Working with GWT/GXT, I would like to speed up my app with 'local caching'.
I read about HTML5 session storage, but I was wondering why I shouldn't just use a memory buffer (a big HashMap holding all the incoming data).
What is the pitfall of the memory buffer compared to session storage?
As Thomas Broyer detailed in his comment, the pitfall of using a Map or any similar data structure to hold your data is that all of it is lost on a page refresh.
If that is not a concern in your scenario, I don't see any issue with using a Map/List or anything like that.
In the Errai framework we use a lot of @ApplicationScoped beans to hold data across the whole application, for example the currently logged-in user, the most recently loaded data from the server, etc.
We are about to build an API that deals with an HBase table. Let's say the API has two methods: api.com/get/ to get something out of HBase and api.com/put/ to put a matrix into HBase. We want to put and get matrices around 200 MB in size.
We can't come to a conclusion about how to send data to this API. Do you think it sounds OK to send an HTTPS request with the 200 MB input matrix represented as JSON in a POST parameter?
We can't find any best practices for this case. Thank you.
The payload limit depends on the client's and server's RAM size and processor.
Theoretically there is no limit in the standard (RFC 2616). However, it is not a good idea to construct a huge payload, because it will probably fail for one of these reasons:
lost packets on data transmission
limits on server side
limits on client side
The best approach is to try to split your 200 MB input matrix into smaller pieces and make multiple requests.
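For example, a minimal client-side sketch in Python, assuming the server exposes some chunk-aware endpoint (the /put/chunk URL, the X-* header names, and the 5 MB chunk size are all made up for illustration):

```python
import requests

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB per request keeps each payload comfortably small

def upload_matrix(serialized_matrix: bytes, upload_id: str) -> None:
    # Split the serialized matrix into fixed-size chunks and send one request per chunk.
    total = (len(serialized_matrix) + CHUNK_SIZE - 1) // CHUNK_SIZE
    for index in range(total):
        chunk = serialized_matrix[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]
        response = requests.post(
            "https://api.com/put/chunk",          # hypothetical chunk endpoint
            data=chunk,
            headers={
                "Content-Type": "application/octet-stream",
                "X-Upload-Id": upload_id,         # lets the server reassemble the matrix
                "X-Chunk-Index": str(index),
                "X-Chunk-Count": str(total),
            },
            timeout=60,
        )
        response.raise_for_status()               # a failure costs one chunk, not the whole 200 MB
```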