Delphi Indy HTTPServer: Write text in Chunks possible? - csv

Is it possible to send large amounts of text (a CSV database export) through an Indy HTTP server to a requesting client in chunks, to avoid hitting memory limits?
I am implementing a REST interface in an application server written in Delphi 10.4.2, since I cannot expose the database connection directly for several non-negotiable reasons.
The data will be consumed by statisticians using R.
As the amount of data can grow to a gigabyte, I am not comfortable building the whole response in a single string and writing it to the connection.

@MichaSchumann,
Modern web browsers often receive gzip-compressed data from an Apache 2.4 web server (for a Windows 10 build, see XAMPP). The browser transparently decompresses it and renders the content according to the file and its content type, whether that is text (HTML) or binary (images).
Indy 9 sends its packets one-to-one: the data is neither secured nor compressed before sending.
So you could gzip your CSV file first. CSV files usually consist of plain ASCII text, which compresses quickly into a binary stream that is dramatically smaller, so the transfer finishes much sooner; consider how long an uncompressed 1 GiB file takes on the wire.
Note: you should send a header for each chunk. In this header, record the size and the position of the chunk being transmitted.
That way you can implement your own FTP-style server that remembers its position state, which is useful when the connection breaks. Plain old FTP servers without resume support have exactly this drawback: after a broken connection, the whole file must be sent again from 0 .. size.
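For illustration, here is a minimal sketch of a chunked response with Indy 10's TIdHTTPServer. This is my reading of the API rather than a tested implementation, and FetchNextCsvBlock is a hypothetical routine that pages rows out of your database and returns an empty string when the export is finished:

// uses IdContext, IdCustomHTTPServer, IdGlobal, System.SysUtils
procedure TRestModule.HTTPServerCommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Chunk: string;
  Bytes: TIdBytes;
begin
  AResponseInfo.ContentType := 'text/csv';
  AResponseInfo.TransferEncoding := 'chunked';
  AResponseInfo.WriteHeader; // send status line and headers, no body yet
  Chunk := FetchNextCsvBlock; // hypothetical: returns '' when the export is done
  while Chunk <> '' do
  begin
    Bytes := IndyTextEncoding_UTF8.GetBytes(Chunk);
    // each HTTP chunk is: byte count in hex, CRLF, payload, CRLF
    AContext.Connection.IOHandler.WriteLn(IntToHex(Length(Bytes), 1));
    AContext.Connection.IOHandler.Write(Bytes);
    AContext.Connection.IOHandler.WriteLn('');
    Chunk := FetchNextCsvBlock;
  end;
  // a zero-length chunk terminates the response
  AContext.Connection.IOHandler.WriteLn('0');
  AContext.Connection.IOHandler.WriteLn('');
end;

Because each chunk carries its own size, the client can count bytes as they arrive; a Range-style offset header on top of this would give the resume behaviour described above.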
Feel free to ask further questions; I am open, you are welcome.

Related

Elasticsearch as Image Server vs Apache

I use Elasticsearch to query stock quotes. My browser calls the Elasticsearch cluster, which returns a list of URLs inside <img> tags. The browser then fetches the images (stock charts for the associated quotes). These images are on a separate Apache 2 HTTP server. Both servers are identical: CentOS, quad-core 2.0 GHz, 16 GB RAM, 1 TB HDD.
From reading previous SO posts it seems one can store base64 images in Elasticsearch.
Has anyone created a production image server in elasticsearch and perhaps compared benchmarks to a static web server? In my case the images are 80 to 150 kb.
My specific questions are: (1) Would it be faster to have the image in my document mapping as binary and have Elasticsearch reply with base64 images, rather than with <img> tags that require another call to Apache? (2) Is Elasticsearch comparable as an image server to a static nginx or Apache image server?
Elasticsearch is a search engine (among other things) which excels at providing fast searches for your data. It is not a content server.
The only reason I would store images in ES would be if I needed to search for similar images. In your case, you seem to be willing to use Elasticsearch as a content server to retrieve your images, which would be better stored on a content delivery network (CDN) as you're doing now with your second Apache server.
Pragmatically, though, it's probably ok to store the base64 of your images in ES if you have a few stock quote documents, i.e. not millions.
The best thing to do is always to try it out and see how your cluster handles it. Maybe for your specific use case it's perfectly ok. It's just that you'll be putting an extra load on ES, which it isn't meant to handle in the first place.
For instance, if you return ten results, your response will grow from a few KB to at least 1 MB and your users will need to wait for that transfer to be done in order to see some results, whereas if you stored your images elsewhere, you could at least show the results very quickly to the user and let the browser handle the image retrieval asynchronously without having to care about it.
Although it is possible to store binary data in a search index you should avoid doing so for large binaries.
Storing binaries as in-memory fielddata (FieldCache) can quickly make your system run out of heap space, whereas storing them as disk-based fielddata (DocValues), which makes Elasticsearch behave more like a typical "column store", will load the images of all documents into the file system cache. (DocValues are documented here.)
Therefore, serving and caching images from nginx or Apache still seems the better choice.

Is there any limitation of JSON data size with HTTP Protocol?

I need to split my program into a server and a client (it is currently one program, but it has to be divided).
So here is my plan:
server: queries the database and sends the data as JSON.
client: receives the JSON data from the server.
The thing I worry about is data size. I expect the server will send responses of up to roughly 200 MB. Is that a reasonable size to transfer over HTTP, or should I write the data to a file and send it via FTP? (But I expect the client will not want to open an extra port for this.)
P.S.
Is there any reference on what a proper size for JSON data is?
Thanks ahead.
There is no limit on the amount of data that can be transmitted over HTTP. HTTP also doesn't care what kind of data you are sending or receiving; it can be audio, video or JSON, so you should be safe.
Moreover, HTTP servers and clients can easily use gzip to make the requests/response more compact, and since JSON is text based, the content can be compressed quite a lot.
In short, there is no problem with your approach.
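If you do go with HTTP plus gzip, here is a rough Delphi sketch of compressing a JSON string with the stock System.ZLib unit. I'm assuming the constructor overload that takes windowBits (15 + 16 = 31 asks zlib for a gzip wrapper rather than a raw zlib stream), so verify this against your RTL version:

// uses System.SysUtils, System.Classes, System.ZLib
function GZipText(const Json: string): TBytes;
var
  Src, Dst: TBytesStream;
  Z: TZCompressionStream;
begin
  Src := TBytesStream.Create(TEncoding.UTF8.GetBytes(Json));
  Dst := TBytesStream.Create;
  try
    Z := TZCompressionStream.Create(Dst, zcDefault, 31); // 31 = gzip container
    try
      Z.CopyFrom(Src, 0); // Count = 0 copies the whole source stream
    finally
      Z.Free; // flushes the gzip trailer
    end;
    Result := Copy(Dst.Bytes, 0, Dst.Size);
  finally
    Dst.Free;
    Src.Free;
  end;
end;

Text-heavy JSON commonly shrinks to a fraction of its original size, so a 200 MB payload may well travel as a few tens of megabytes.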

WebSocket progress

I have a case where I may need to send 500 KB to 1 MB of data to the client via WebSockets. I was therefore wondering whether it is possible to track how much of the data the client has received, so that the application does not appear unresponsive over a slower connection.
There is no built-in way to do this. (That is, while the websocket protocol allows for fragmentation of messages, any client using Javascript's websocket API has no access to this and is only informed when a browser receives all message fragments and combines their contents into a single buffer.)
You could, however, indicate progress in application code by breaking your single large message into several smaller ones.
If you do this, you'll also need to define your own simple protocol. At a minimum, this could be an initial message that informs the client that following messages are to be combined and add up to x bytes. Or, if you don't know the size of data in advance, a second message that follows the final data transfer and indicates the end of your fragmented message.
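To sketch that protocol concretely, here is a transport-agnostic Delphi class (the names are invented purely for illustration): the first message announces the total byte count, every following message is appended, and progress falls out of the byte counts:

// uses System.SysUtils, System.Classes
type
  TFragmentAssembler = class
  private
    FExpected: Int64;        // total size announced by the first message
    FBuffer: TMemoryStream;  // payload accumulated so far
  public
    constructor Create(ExpectedBytes: Int64);
    destructor Destroy; override;
    procedure AddFragment(const Data: TBytes); // call once per received message
    function Progress: Double;                 // 0.0 .. 1.0 for the UI
    function Complete: Boolean;
  end;

constructor TFragmentAssembler.Create(ExpectedBytes: Int64);
begin
  inherited Create;
  FExpected := ExpectedBytes;
  FBuffer := TMemoryStream.Create;
end;

destructor TFragmentAssembler.Destroy;
begin
  FBuffer.Free;
  inherited;
end;

procedure TFragmentAssembler.AddFragment(const Data: TBytes);
begin
  if Length(Data) > 0 then
    FBuffer.WriteBuffer(Data[0], Length(Data));
end;

function TFragmentAssembler.Progress: Double;
begin
  if FExpected > 0 then
    Result := FBuffer.Size / FExpected
  else
    Result := 0.0;
end;

function TFragmentAssembler.Complete: Boolean;
begin
  Result := FBuffer.Size >= FExpected;
end;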

Storing image in database directly or as base64 data?

The common method of storing images in a database is to convert the image to base64 before storing it. This increases the size by 33%, since every 3 bytes of binary data become 4 base64 characters. Alternatively, it is possible to store the image directly as a BLOB; for example:
$image = new Imagick("image.jpg");
$data = $image->getImageBlob();             // raw binary JPEG bytes
$data = $mysqli->real_escape_string($data); // escape the binary for interpolation
$mysqli->query("INSERT INTO images (data) VALUES ('$data')");
and then display the image with
echo '<img src="data:image/jpeg;base64,' . base64_encode($data) . '" />';
With the latter method, we avoid the 33% base64 overhead. Why, then, is it more common to store images as base64 in MySQL databases?
UPDATE: There are many debates about advantages and disadvantages of storing images in databases, and most people believe it is not a practical approach. Anyway, here I assume we store image in database, and discussing the best method to do so.
I contend that images (files) are NOT usually stored in a database base64 encoded. Instead, they are stored in their raw binary form in a binary column, blob column, or file.
Base64 is only used as a transport mechanism, not for storage. For example, you can embed a base64 encoded image into an XML document or an email message.
Base64 is also stream friendly. You can encode and decode on the fly (without knowing the total size of the data).
While base64 is fine for transport, do not store your images base64 encoded.
Base64 provides no checksum or anything of any value for storage.
Base64 encoding increases the storage requirement by 33% over raw binary. It also increases the amount of data that must be read from persistent storage, which is still generally the largest bottleneck in computing. It is generally faster to read fewer bytes and encode them on the fly. Only if your system is CPU bound rather than IO bound, and you regularly output the images in base64, should you consider storing them in base64.
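As a side note on the "encode on the fly" point, here is a small Delphi sketch using the stream overload of System.NetEncoding's TNetEncoding.Base64 (I believe this overload exists in recent RTLs, but verify against yours): the image is encoded block by block, so neither the raw image nor its base64 form is ever held in memory whole:

// uses System.Classes, System.NetEncoding
procedure EncodeFileToBase64(const SrcPath, DstPath: string);
var
  Src, Dst: TFileStream;
begin
  Src := TFileStream.Create(SrcPath, fmOpenRead or fmShareDenyWrite);
  try
    Dst := TFileStream.Create(DstPath, fmCreate);
    try
      // streams are processed in blocks, not loaded in one piece
      TNetEncoding.Base64.Encode(Src, Dst);
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
end;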
Inline images (base64 encoded images embedded in HTML) are a bottleneck in themselves: you're sending 33% more data over the wire, and doing it serially (the web browser has to wait for the inline images before it can finish downloading the page's HTML).
On MySQL, and perhaps similar databases, for performance reasons, you might wish to store very small images in binary format in BINARY or VARBINARY columns so that they are on the same page as the primary key, as opposed to BLOB columns, which are always stored on a separate page and sometimes force the use of temporary tables.
If you still wish to store images base64 encoded, please, whatever you do, make sure you don't store base64 encoded data in a UTF8 column and then index it.
Pro base64: the encoded representation you handle is a pretty safe string. It contains neither control characters nor quotes, and the latter point helps against SQL injection attempts. I wouldn't expect any problem just adding the value to a "hand-coded" SQL query string.
Pro BLOB: the database manager knows what type of data to expect and can optimize for it. If you stored base64 in a TEXT field, the database might try to build an index or other data structures for it, which would be useful for "real" text data but is pointless and a waste of time and space for image data. And the BLOB is the smaller representation in terms of bytes.
I just want to give one example of why we decided to store images in the DB rather than in files or on a CDN: storing images of signatures.
We tried CDNs, cloud storage and plain files, and finally decided to store them in the DB. We are happy with the decision; it proved right later when we moved, upgraded our scripts and migrated the sites several times.
In our case, we wanted the signatures to stay with the records belonging to the authors of the documents.
Storing them as files risks their going missing or being deleted by accident.
We first stored them in binary BLOB format in MySQL, and later as base64-encoded images in a text field. We changed to base64 because the result was, for some reason, smaller and loaded faster; the BLOB was slowing down page loads.
In our case, the solution of storing signature images in the DB (whether as BLOB or base64) was driven by the following:
Most signature images are very small.
We don't need to index the signature images stored in the DB.
The index is on the primary key.
We may have to move or switch servers; moving physical image files to different servers can break links and leave images not found.
It would be embarrassing to ask authors to re-sign their signatures.
It is more secure to keep them in the DB than to expose them as files that could be downloaded if security is compromised; storing them in the DB gives us better control over access.
With any future migration, redesign, or change of hosting or servers, we have zero worries about reconciling signature file names against physical files: it is all in the DB!
I recommend looking at NoSQL databases, and I also agree with user1252434's post. For instance, I am storing a few <500 KB PNGs as base64 in my MongoDB with binary set to true, with no performance hit at all. MongoDB can also store large files, such as 10 MB videos, which can offer huge time savings in metadata searches for those videos; see "storing large objects and files in mongodb".

BLOBs, Streams, Byte Arrays and WCF

I'm working on an image processing service that has two layers. The top layer is a REST-based WCF service that takes the image upload, processes it, and then saves it to the file system. Since my top layer doesn't have any direct database access (by design), I need to pass the image to my application layer (a WSHttpBinding WCF service), which does have database access. As it stands right now, the images can be up to 2 MB in size, and I'm trying to figure out the best way to transport the data across the wire.
I am currently sending the image data as a byte array. The object will have to be held in memory at least temporarily in order to be written out to the database (in this case, a MySQL server), so I don't know whether using a Stream would help eliminate the potential memory issues, or whether I will have to deal with filling up my memory no matter what I do. Or am I just overthinking this?
Check out the Streaming Data section of this MSDN article: Large Data and Streaming
I've used the exact method described there to successfully upload large documents and even stream video content from a WCF service. The keys are to pass a Stream object in the message contract and to set transferMode to Streamed in the client and service configuration.
I saw this post regarding efficiently pushing that stream into MySQL; hopefully it points you in the right direction.