Elasticsearch as Image Server vs Apache

I use Elasticsearch to query stock quotes. My browser calls the Elasticsearch cluster, which returns a list of URLs inside <img> tags. The browser then requests the images (stock charts for the associated quotes). These images are on a separate Apache 2 HTTP server. Both servers are identical: CentOS, quad-core 2.0 GHz, 16 GB RAM, 1 TB HD.
From reading previous SO posts, it seems one can store base64-encoded images in Elasticsearch.
Has anyone built a production image server on Elasticsearch and perhaps compared benchmarks against a static web server? In my case the images are 80 to 150 KB.
My specific questions are: (1) Would it be faster to store the image in my document mapping as binary and have Elasticsearch return base64 images, as opposed to <img> tags that require another call to Apache? (2) Is Elasticsearch as an image server comparable to a static nginx or Apache image server?

Elasticsearch is a search engine (among other things) which excels at providing fast searches for your data. It is not a content server.
The only reason I would store images in ES would be if I needed to search for similar images. In your case, you seem willing to use Elasticsearch as a content server for your images, which would be better served from a dedicated static server or a content delivery network (CDN), as you're already doing with your second Apache server.
Pragmatically, though, it's probably OK to store the base64 of your images in ES if you only have a few stock quote documents, i.e. not millions.
As always, the best thing to do is to try it out and see how your cluster handles it. Maybe it's perfectly fine for your specific use case. It's just that you'll be putting extra load on ES that it isn't designed to handle in the first place.
For instance, if you return ten results, your response will grow from a few KB to at least 1 MB and your users will need to wait for that transfer to be done in order to see some results, whereas if you stored your images elsewhere, you could at least show the results very quickly to the user and let the browser handle the image retrieval asynchronously without having to care about it.
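To put rough numbers on that: base64 encodes every 3 bytes as 4 characters, so each chart grows by about a third before any JSON overhead is counted. A small Python sketch of the arithmetic, assuming ten hits per page and the 80 to 150 KB image sizes from the question:

```python
import base64
from typing import Sequence

def base64_len(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    # Every 3 input bytes become 4 output characters; the last group is padded.
    return 4 * ((raw_bytes + 2) // 3)

def response_payload_bytes(image_sizes: Sequence[int]) -> int:
    """Bytes of base64 image data in one search response (JSON overhead ignored)."""
    return sum(base64_len(n) for n in image_sizes)

# Sanity check against the standard library encoder.
assert base64_len(80 * 1024) == len(base64.b64encode(bytes(80 * 1024)))

ten_hits_small = [80 * 1024] * 10   # ten 80 KB charts
ten_hits_large = [150 * 1024] * 10  # ten 150 KB charts
print(response_payload_bytes(ten_hits_small))  # 1092280
print(response_payload_bytes(ten_hits_large))  # 2048000
```

A response that only carries the <img> URLs stays at a few KB, and the browser then fetches the charts from the static server on its own.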

Although it is possible to store binary data in a search index, you should avoid doing so for large binaries.
Storing binaries as in-memory fielddata (FieldCache) can quickly make your system run out of heap space, whereas storing them as disk-based fielddata (DocValues), which makes Elasticsearch behave more like a typical "column store", will load the images of all documents into the filesystem cache (see the Elasticsearch documentation on doc values).
Therefore, serving and caching images from nginx or Apache still seems the better choice.
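A back-of-the-envelope check of the fielddata/DocValues point, as a Python sketch. The document count and average image size are hypothetical, the 16 GB of RAM comes from the question, and "heap at most half of RAM" is Elastic's usual sizing guidance. Per-document overhead is ignored, so the real figure would be higher:

```python
def image_bytes_gib(num_docs: int, avg_image_kib: float) -> float:
    """Raw image data (GiB) that fielddata would keep on the heap,
    or that DocValues would pull into the filesystem cache."""
    return num_docs * avg_image_kib / (1024 * 1024)

RAM_GIB = 16
HEAP_GIB = RAM_GIB / 2  # common guidance: give the Elasticsearch heap at most half of RAM

# Hypothetical index: 100,000 quotes with ~115 KiB charts (middle of the 80-150 KB range).
needed = image_bytes_gib(100_000, 115)
print(f"{needed:.1f} GiB of images vs {HEAP_GIB:.0f} GiB of heap")  # 11.0 GiB of images vs 8 GiB of heap
```

Even with DocValues keeping the images off the heap, roughly 11 GiB of them would compete for the same page cache the rest of the index relies on.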

Related

Effect of loading files from a MySQL database on site performance

All of the images on my website are loaded from a MySQL database.
Sometimes, when many clients connect to the site, it slows down, so I'm doing some optimization on my server and code to improve overall performance.
As a candidate change, I want to know whether moving the files out of the database and serving them as static files, instead of dynamically generated content, would make any significant difference to performance.
If so, are there any benchmarks available?
Storing images in a database is generally a bad idea, yet you see lots of people doing it without any good reason.
In 99% of cases I would recommend only storing file path references to the images in the database and have the images stored statically.
Here are some reasons why:
You don't tie up both the application server and the databases server transmitting images to the browser, you can offload this to web server itself which is more optimized for this.
If you have a sizeable site, you will eventually want to move static images onto a CDN anyway. You can't do this with files stored in the database.
Your application will be slower when inserting images into the database, since you basically have to upload the file to the application server and then turn around and write it into the DB, as opposed to simply writing the path reference.
Your DB itself could grow in size at a significant rate with enough images. You don't want to tie up your DB file system with a bunch of files that can be stored at low cost in other ways (for example, distributed file storage services such as Amazon S3).
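A minimal sketch of the "store only the path" approach, using Python's built-in sqlite3 as a stand-in for MySQL. The table layout and the static host name are hypothetical; the point is that the web server or CDN serves the bytes, and moving to a CDN only means changing the base URL:

```python
import sqlite3
from typing import Optional

STATIC_BASE_URL = "https://static.example.com/"  # hypothetical static host or CDN

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
# Only the relative path is stored; the image itself lives on the static server's disk.
conn.execute("INSERT INTO images (id, path) VALUES (?, ?)", (1, "charts/1.png"))
conn.commit()

def image_url(conn: sqlite3.Connection, image_id: int,
              base_url: str = STATIC_BASE_URL) -> Optional[str]:
    """Build the public URL for an image, or return None if the id is unknown."""
    row = conn.execute("SELECT path FROM images WHERE id = ?", (image_id,)).fetchone()
    return None if row is None else base_url + row[0]

print(image_url(conn, 1))  # https://static.example.com/charts/1.png
```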
I have a similar situation to yours. The solution is simple: cache the content.
The first time you run the query to get an image, i.e.:
SELECT * FROM images WHERE id = 1
simply cache the result to a file:
file_put_contents("image1.png", $row['data']);
Next time, simply check whether the file exists; this avoids querying the database.
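The same idea as a small Python sketch (the original is PHP). sqlite3 stands in for MySQL, and an images table with a data BLOB column is assumed. Writing to a temporary file and renaming it keeps concurrent requests from reading a half-written cache file. Note that this cache is never invalidated, so an image updated in the database would need its cached file deleted:

```python
import os
import sqlite3

def get_image(conn: sqlite3.Connection, image_id: int, cache_dir: str) -> bytes:
    """Return image bytes, querying the database only on a cache miss."""
    path = os.path.join(cache_dir, f"image{image_id}.png")
    if os.path.exists(path):  # cache hit: no database query at all
        with open(path, "rb") as f:
            return f.read()

    row = conn.execute("SELECT data FROM images WHERE id = ?", (image_id,)).fetchone()
    if row is None:
        raise KeyError(image_id)
    data = bytes(row[0])

    # Write to a temp file, then atomically rename it into place.
    tmp_path = f"{path}.{os.getpid()}.tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)
    return data
```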

Can large sets of binary data be stored in a database? [duplicate]

Closed 10 years ago.
Possible Duplicate:
database for huge files like audio and video
I'm looking for the best (or at least a good enough) way of storing large sets of binary data (images, videos, documents, etc.). The solution has to be scalable and can't get stuck after X amount of data.
I would like to have one place, for example a MySQL database, where all the data is kept. When one of the web front ends needs a file (on request), it can fetch it from the DB and cache it permanently for later.
From what I can see on http://dev.mysql.com/doc/refman/5.0/en/table-size-limit.html, a MySQL table can't store more than 4TB. Is there something more appropriate, perhaps a NoSQL database, or is it better to store everything as files on one server and propagate them to all the web front ends?
You typically don't want to store large files in a relational database -- it's not what they're designed for. I would also advise against using a NoSQL solution, since they're also typically not designed for this, although there are a few exceptions (see below).
Your last idea, storing the files on the filesystem (do note that this is what filesystems are designed for ;) is most likely the right approach. This can be somewhat difficult depending on what your scalability requirements are, but you will likely want to go with one of the following:
SAN. SANs provide redundant, highly-available storage solutions within a network. Multiple servers can be attached to storage provided by a SAN and share files between each other. Note that this solution is typically enterprise-oriented and fairly expensive to implement reliably (you'll need physical hardware for it as well as RAID controllers and a lot of disks, at minimum).
CDN. A content delivery network is a remote, globally distributed system for serving files to end users over the Internet. You typically put a file in a location on your server that is then replicated to the CDN for actual distribution. The way a CDN works is that if it doesn't have the file a user is requesting, it automatically tries to fetch it from your server; once it has a copy of the file, it caches it for some period of time. It can be really helpful if you're normally constrained by bandwidth costs or processing overhead from serving a huge number of files concurrently.
Cloud offering (Amazon S3, Rackspace Cloud Files). These are similar to a CDN, but work well with your existing cloud infrastructure, if that's something you're using. You issue a request to the cloud API to store your file, and it subsequently becomes available over the Internet, just like with a CDN. The major difference is that you have to issue any storage requests (create, delete, or update) manually.
If the number of files you're serving is small, you can also go with an in-house solution. Store files on two or three servers (perhaps have a larger set of servers and use a hash calculation for sharding if space becomes an issue). Build a small API for your frontend servers to request files from your storage servers, falling back to alternate servers if one is unavailable.
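A rough Python sketch of that in-house design, under a few assumptions: the server names are made up, each file is replicated to the servers that follow its primary in the list, and the actual transfer is a hypothetical get(server, key) callable you would supply (HTTP, NFS, or whatever you use):

```python
import hashlib
from typing import Callable, List, Sequence

def candidate_servers(key: str, servers: Sequence[str]) -> List[str]:
    """Primary server chosen by hashing the key, followed by the fallbacks in ring order."""
    if not servers:
        raise ValueError("no storage servers configured")
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(len(servers))]

def fetch(key: str, servers: Sequence[str],
          get: Callable[[str, str], bytes]) -> bytes:
    """Try the primary server first and fall back to the others if it is unavailable."""
    errors = []
    for server in candidate_servers(key, servers):
        try:
            return get(server, key)
        except OSError as exc:  # connection refused, timeout, etc.
            errors.append(f"{server}: {exc}")
    raise OSError(f"all storage servers failed for {key!r}: {errors}")
```

One caveat with plain modulo hashing: adding a server changes the primary for most keys, so most files would have to move. Consistent hashing is the usual remedy if the server set changes often.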
One solution that I almost forgot (although I haven't used it beyond research purposes) is Riak's Luwak project. Luwak is an extension of Riak, an efficient distributed key/value store, that provides large file support by breaking large files into consistently-sized segments and then storing those segments in a tree structure for quick access. It might be worth looking into, because it gives you the redundancy, sharding, and API that I mentioned in the last paragraph for free.
I work as a (volunteer) developer on a fairly large website: we have some 2GB across 14000 images [that's clearly nowhere near a "world record"], and a 150MB database. Image files are stored as separate files instead of as database objects, partly because we resize images for different usages: thumbnails, medium and large images are created programmatically from the stored image (which may be larger than the "large" size we use for the site).
Whilst it's possible to store "blobs" (Binary Large Objects) in SQL databases, I don't believe it's the best solution. Storing a reference in the database, so that you can make a path/filename combination for the actual stored file [and possibly hiding the actual image behind some sort of script - php, jsp, ruby or whatever you prefer] would be a better solution.

Store image files or URLs in MySQL database? Which is better? [duplicate]

Closed 11 years ago.
Possible Duplicate:
Storing Images in DB - Yea or Nay?
Images in database vs file system
I've been developing a web application using RIA technologies (Flex + PHP + MySQL + Ajax) and now I'm in a dilemma about image files.
I use some images in my Flex app, so I thought "it could be awesome if I store them in the database and then retrieve them from it; consequently, maintenance should be easier". But here is my dilemma:
Should I store the physical URL of my images, or would it be better to store the image directly?
For example, should my Cars table look like this:
ID (autonumeric) | Source (text)
or like this?
ID (autonumeric) | Image (longblob or blob)
I know there are knowledgeable people here who can answer this question and explain which is better and why :)
I personally recommend storing images in the database. Of course, this has both advantages and disadvantages.
Advantages of storing BLOB data in the database:
It is easier to keep the BLOB data synchronized with the remaining items in the row.
BLOB data is backed up with the database. Having a single storage system can ease administration.
BLOB data can be accessed through XML support in MySQL, which can return a base 64–encoded representation of the data in the XML stream.
MySQL Full Text Search (FTS) operations can be performed against columns that contain fixed or variable-length character (including Unicode) data. You can also perform FTS operations against formatted text-based data contained within image fields—for example, Microsoft Word or Microsoft Excel documents.
Disadvantages of Storing BLOB Data in the Database:
Carefully consider what resources might be better stored on the file system rather than in a database. Good examples are images that are typically referenced via HTTP HREF. This is because:
Retrieving an image from a database incurs significant overhead compared to using the file system.
Disk storage on database SANs is typically more expensive than storage on disks used in Web server farms.
As a general rule you want to keep your databases small, so they perform better (and back up better too). So if you can store only a filesystem reference (path + filename) or URL in the DB, that would be better.
It's probably a question of personal preference.
As a general rule it's better to keep the database small. However, enterprise applications regularly add the images directly to the database. If you place them on the file system, the DB and your file system can get out of sync.
Larger CMSs regularly place those files in the DB. However, be aware that this requires larger DB sizing as everything grows...
When you are saving the url and name only, be sure that these won't change in the future.
With files stored in the database you can implement security easier and you don't have to worry about duplicate filenames.
I used to store the path to the images, but then adding an additional web server to the mix proved less than ideal. For one thing, you'll have to share the directory where the images are stored. We were using NFS and it became slow after a while. We tried syncing the files from one web server to another, but the process became cumbersome.
Having said that, I would store them in the DB. I've since moved all my image/file storage over to MongoDB. I know this doesn't satisfy your needs, but we've tried it all (even S3) and we weren't happy with the other solutions. If I had to, I would definitely put them in MySQL.
Personally, I've always stored the URL.
There's no real reason not to store the image directly in the database, but there are benefits to not storing it in the database.
You get more flexibility when you don't store the image in the database. You can easily move it around and just update the URL in the database. So, if you wanted to move the image from your web server to a service such as Flickr or Amazon Web Services, it would be as easy as updating the link to the new files. That also gives you easy access to content delivery networks, so that the images are delivered to end users more quickly.
I'd store the url, it's less data and that means a smaller database and faster data fetching from it ;)

Storing image files in Mongo database, is it a good idea?

When working with MySQL, it is a bad idea to store images as BLOBs in the database, as it makes the database quite large, which is harmful for normal usage of the database. It is better to save image files on disk and store links to them in the database.
However, I think this is different for MongoDB, as increasing the database file size has a negligible influence on performance (this is the reason that MongoDB can successfully handle billions of records).
Do you think it is better to save image files in MongoDB (as GridFS) to reduce the number of files stored on the server, or is it still better to keep the database as small as possible?
The problem isn't so much that the database gets big; databases can handle that (although MongoDB isn't as good as many others in that respect). The problem is that to send the data to the client, it first has to be moved into RAM by the database, then copied over to the application's memory, then handed off to the kernel to be sent through the socket. That wastes lots of RAM and CPU cycles. The reason it's better to have large files in the filesystem is that it's easier to avoid the copying: you can ask the kernel to stream the file from disk to the socket directly.
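In Python, for example, socket.sendfile() asks the kernel to do exactly that: it uses os.sendfile() where the platform supports it, so the bytes go from the page cache to the socket without passing through the application's memory, and it falls back to plain send() elsewhere. A minimal sketch:

```python
import socket

def send_file(sock: socket.socket, path: str) -> int:
    """Stream a file to a connected socket, letting the kernel copy the bytes
    (zero-copy via os.sendfile() where available). Returns bytes sent."""
    with open(path, "rb") as f:
        return sock.sendfile(f)
```

This is essentially what nginx and Apache do for static files (the sendfile directive in nginx, EnableSendfile in Apache).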
The downside of storing large files in the filesystem is that it's much harder to distribute. Using a database, and something like Mongo's GridFS makes it possible to scale out. You just have to make sure you don't copy the whole file into the application's memory at once, but a chunk at a time. Most web app frameworks have some support for sending chunked HTTP responses nowadays.
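What "a chunk at a time" means in code, as a Python sketch: a generator that reads from any file-like object with a read(size) method and never holds more than one chunk in memory. It assumes your driver's GridFS file object supports read(size) (PyMongo's GridOut does); 255 kB is GridFS's default chunk size. A WSGI application can return this iterator as its response body so the server sends each chunk as it is produced:

```python
import io
from typing import BinaryIO, Iterator

GRIDFS_DEFAULT_CHUNK = 255 * 1024  # GridFS's default chunk size (255 kB)

def iter_chunks(stream: BinaryIO, chunk_size: int = GRIDFS_DEFAULT_CHUNK) -> Iterator[bytes]:
    """Yield the stream's contents chunk by chunk instead of reading it all at once."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Demo with an in-memory stream standing in for a GridFS file.
sizes = [len(c) for c in iter_chunks(io.BytesIO(b"a" * 600_000))]
print(sizes)  # [261120, 261120, 77760]
```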
The answer is yes. Back in the old cave-man days, servers had mutable file systems you could change. This was great till we tried to scale things.
Cave-people nowadays build apps with immutable deployments. Heroku and Dokku are examples of this. Because the web app server has no state, they can be created, upgraded, scaled, and destroyed easily.
Since we still have files, we need to put them somewhere. There are several solutions: NFS, our database, someone else's database.
NFS is a 'network file system', which lets you do file I/O on network resources. If you're dealing with the network anyway, IMHO it doesn't add much value unless it's what you know already.
Our database: for MongoDB there are two options, GridFS for files over 16 MB and BinData for anything smaller.
Someone else's database: some are basic, like Amazon S3, and some offer extra services, like Cloudinary or Dropbox.
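The GridFS-or-BinData choice for MongoDB, as a tiny Python sketch. The 16 MB figure is MongoDB's maximum BSON document size; because that limit applies to the whole document, the sketch reserves some hypothetical headroom for the other fields:

```python
BSON_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB document size limit
FIELD_HEADROOM_BYTES = 64 * 1024            # hypothetical allowance for the other fields

def storage_for(file_size: int) -> str:
    """Pick GridFS for files too big to embed, otherwise an inline BinData field."""
    if file_size > BSON_MAX_DOCUMENT_BYTES - FIELD_HEADROOM_BYTES:
        return "GridFS"
    return "BinData"
```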
If you're on a big-budget enterprise team and someone spends 40 hours a week taking care of servers, then sure, use the file system. If you're building web apps that scale, putting files in the DB makes sense.
If you're concerned about performance:
1) Use a proxy (e.g. nginx) or a CDN to host your content for clients. Your server should only be serving cache misses.
2) Use streaming I/O. Nodeschool has a cool tutorial for Node.js.
Storing images is not a good idea in any DB, because:
read/write to a DB is always slower than a filesystem
your DB backups grow to be huge and more time consuming
access to the files now requires going through your app and DB layers
The last two are the real killers.
Source: Three things you should never put in your database.
So if you can design your application around it, it's better not to upload your pictures to MongoDB.
However, if you are close to a deadline, and the database will be so small that it will not grow much and its size will never exceed the available RAM on the machine running your application, then I think (contrary to the author of the cited article) you may consider storing the images in MongoDB. It's simple, convenient, quick to implement and gives you some flexibility.
MongoDB's GridFS is designed for this sort of storage and is quite handy for storing image files across many different servers in a way that all servers can use them.

Storing image data in a MySQL database?

I am implementing a project that deals with a significant amount of images.
In your opinion, what are the pros and cons of the following two approaches?
I need to store thousands of items; each item has several string properties and an image.
Each item has an ID (integer)
MyISAM tables
How would you store the images:
approach 1: store images into a directory and each image named as ID.jpg
approach 2: store images into the database as a binary BLOB
Using approach 1 I can access the image directly and that's it
<img src="same_directory/10.jpg" />
Using approach 2, I can still use the above HTML, but need to redirect that jpg access to a PHP script which will return the real image from the DB.
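For comparison, here is roughly what that script has to do, sketched as a Python WSGI app rather than PHP. sqlite3 stands in for MySQL, and the images table, the data column, and the /<id>.jpg URL shape are assumptions. Every image request still goes through the application and a database query, which is the overhead approach 1 avoids:

```python
import re
import sqlite3
from typing import Callable, Iterable

def make_image_app(conn: sqlite3.Connection) -> Callable:
    """WSGI app that serves /<id>.jpg from a BLOB column instead of the filesystem."""
    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        match = re.fullmatch(r"/(\d+)\.jpg", environ.get("PATH_INFO", ""))
        row = None
        if match:
            row = conn.execute("SELECT data FROM images WHERE id = ?",
                               (int(match.group(1)),)).fetchone()
        if row is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        data = bytes(row[0])
        start_response("200 OK", [("Content-Type", "image/jpeg"),
                                  ("Content-Length", str(len(data)))])
        return [data]
    return app
```

To try it locally, wsgiref.simple_server.make_server("", 8000, make_image_app(conn)).serve_forever() serves it on port 8000.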
In terms of performance, which one do you think is faster?
I am keen on approach 1.
advantages of approach 1:
Retrieving a flat file from the web server is faster.
Most web hosts are likely to follow this approach.
The file system is faster for flat file storage.
advantages of approach 2:
All your data is kept in one place; if you migrate your website/database, the images will just be there.
It's easier to sort/delete/etc.
Since you have to serve the images via a PHP script, you can do additional things such as security checks, if required, or image processing (obviously you can do this with flat files too, but you have to make sure the security can't be bypassed by leaving the images in a public directory).
Considering performance, approach 1 is the best way to proceed.
Storing on filesystem is faster.
I'd be tempted to use the first approach, as there's no real value in cluttering up the database with image data. (Fetching the data from the database will also be significantly slower than simply loading it off disk.)
However, as a suggestion, you might not want to store the full path on disk to the image in the database table, to aid portability in the future. (i.e.: just store the portion of the path and filename below a 'known' base folder.)
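A minimal Python sketch of that suggestion: the table stores only a path relative to a base folder, and the base is joined at runtime, so relocating the images means changing one setting. The /var/www/images root is hypothetical, and the check that rejects absolute paths and ".." components is an added safeguard, not part of the original suggestion:

```python
from pathlib import PurePosixPath

IMAGE_ROOT = PurePosixPath("/var/www/images")  # hypothetical 'known' base folder

def resolve_image_path(relative: str, root: PurePosixPath = IMAGE_ROOT) -> PurePosixPath:
    """Join a relative path from the database onto the configured base folder."""
    rel = PurePosixPath(relative)
    # Refuse values that would escape the base folder.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe image path: {relative!r}")
    return root / rel

print(resolve_image_path("items/10.jpg"))  # /var/www/images/items/10.jpg
```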
Keep the image files as image files on the server to lower your DB load and allow the server to handle caching etc.
Overall it really depends on the kind of images we're talking about. Small thumbnails (e.g. for file icons) wouldn't be that bad, but I wouldn't store whole images in the DB. In general I guess the file system approach would be faster.
Let's look at the problem from the web browser's side.
When you load a page with 10 pictures saved in the database, your browser sends a new HTTP request to the server for each picture. Each request either starts a DB connection and a server-side script, or just reads a static image from the file system.
Which will be faster?
The other part is getting the data from the file system or the database. Suppose we do not use a cache for the database (for 10 GB of images you would need 10 GB of RAM to cache this data). The database and the HTTP server both read data from the file system in any case, but I think the HTTP server reads data faster than the database server.
The only point in favor of database storage is that it's very easy to migrate data from one server to another, but that doesn't matter for system performance.
And don't forget to use image paths like /a/b/c/abc.jpg; with a large number of images this is faster than putting all the images in one directory.
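A Python sketch of that layout: the first characters of the name become nested directories, so abc.jpg lands in a/b/c/abc.jpg. The hashed variant is an added option, not part of the answer: short or sequential names like 10.jpg can't fill three levels and spread unevenly, so it takes the directories from an MD5 digest of the name instead.

```python
import hashlib
from pathlib import PurePosixPath

def nested_path(filename: str, depth: int = 3) -> PurePosixPath:
    """abc.jpg -> a/b/c/abc.jpg (one directory level per leading character)."""
    stem = PurePosixPath(filename).stem
    if len(stem) < depth:
        raise ValueError(f"name too short for {depth} levels: {filename!r}")
    return PurePosixPath(*stem[:depth], filename)

def hashed_nested_path(filename: str, depth: int = 3) -> PurePosixPath:
    """Same layout, but the directories come from an MD5 digest of the name,
    which spreads files evenly even for short or sequential names."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return PurePosixPath(*digest[:depth], filename)

print(nested_path("abc.jpg"))  # a/b/c/abc.jpg
```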