I'm writing a forum website, but I'm having a problem designing the database to hold the posts. From what I have read, phpBB 3 stores user-posted data as BLOB, while some other software stores it as TEXT. Does one have an advantage over the other? What is the maximum number of characters if I store the data as TEXT? I have never retrieved BLOB data from a database and parsed it into text (or whatever data it is meant to be), but I think learning how to do so would be interesting.
The text data for the post will be UTF-8 encoded. I'm using a MySQL database.
Also, how would you allow uploading images for the forum? Should the images be stored as part of the database, in a BLOB for example, or should they be stored as individual files?
Any recommendations or suggestions are welcome.
TEXT and BLOB are nearly identical; the main difference is that with BLOB you have to take care of the encoding yourself. With a TEXT column, on the other hand, you are tied to the column's character set.
Storing files in the database has some advantages and disadvantages:
Advantages:
Backups and switches to another server are easier to achieve
No need to add another method to access the files beside the database
Disadvantages:
Accessing an image means retrieving its bytes and perhaps writing them to a temporary local file, which can be more complicated
File systems are simply faster due to lower overhead
The decision is yours. Most opinions I have heard so far favor storing the pictures as files, but those are opinions; try to base your decision on your project's needs.
I have a question about the blob data type in MySQL.
I read that the data type can be used to store files. I also read that an alternative is to store the file on disk and include a pointer to its location in the database (via a varchar column).
But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents. So is that any different than storing a pointer to a file on the file system?
I read that the data type can be used to store files.
According to the MySQL manual page on BLOB, a BLOB is a binary large object that can hold a variable amount of data.
Since it is a data type designed for binary data, it is commonly used to store files; storing image files is a very common use in web applications.
For web applications this means you read the file's raw bytes and write them to the column, and every time you need the file you read the bytes back and send them out in their original format.
Besides that, storing large amounts of data in your database MAY slow it down, especially on systems that are not dedicated solely to hosting a database.
I also read that an alternative is to store the file on disk and include a pointer to its location in the database
Bearing in mind the considerations above, a common practice for web applications is to store your files somewhere other than MySQL and simply store their paths in the database. This approach MAY speed up your database when dealing with large amounts of data.
But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents.
In fact, that depends on which storage engine you are using, since every engine stores data differently. For the InnoDB engine, which is well suited to relational workloads, you may want to read the MySQL Performance Blog article on how BLOBs are stored in MySQL.
But in short, from MySQL 5 onward the blob is stored as follows:
Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them.
So you are probably thinking now that the right way to go is to store them as separate files, but there are some advantages to using BLOBs. The first one (in my opinion) is backup. I manage a small server and had to write a separate subroutine just to copy my files stored as paths to another storage disk (we couldn't afford a decent tape backup system). If I had designed my application to use BLOBs, a simple mysqldump would have been all I needed to back up my whole database.
The advantages of storing BLOBs for backups are discussed in more detail in this post, where the person who answered had a problem similar to mine.
Another advantage is security and the ease of managing permissions and access. All the data inside your MySQL server is password protected, and you can easily manage which users can access what.
In an application that relies on the MySQL privilege system for authentication, this is certainly a plus, since it would be a little harder for, say, an intruder to retrieve an image (or a binary file such as a zip archive) from your disk, or for a user without access privileges to get at it.
So I'd say that
If you are going to manage your MySQL server and all the data in it, must do regular backups, or intend to change (or might even consider changing) your OS in the future, and you have decent hardware with MySQL tuned for it, go for BLOB.
If you will not manage your MySQL server (on a web host, for example) and don't intend to change OS or make backups, stick with varchar columns pointing to your files.
I hope it helped. Cheers
If you store data in a BLOB field, you are making it part of your object abstraction.
BLOB advantages:
Should you want to remove a row with a BLOB, or remove it as part of a master/slave table relationship or even a whole table hierarchy, your BLOB is handled automatically and has the same lifetime as any other object in the database.
Your scripts do not need to access anything but the database to get everything they require. In many situations, direct file access opens a whole can of worms around bypassing access or security restrictions. For example, with file access, scripts may have to mount the filesystems that contain the actual files. With a BLOB in the database, you only need to be able to connect to the database, no matter where you are.
If you store it in a file and the file is replaced, removed, or no longer accessible, your database will never know; in effect, you cannot guarantee integrity. It is also difficult to reliably support multiple versions when using files. If you use and depend on transactions, it becomes almost impossible.
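The integrity gap described above (the database not noticing when a file disappears) can at least be detected after the fact. Below is a minimal Python sketch of such a check, assuming the application can produce the list of paths recorded in its table; the function name `audit_files` and the directory layout are hypothetical, not part of any answer here.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Set, Tuple


def audit_files(db_paths: Iterable[str], upload_dir: Path) -> Tuple[Set[str], Set[str]]:
    """Compare paths recorded in the database with files actually on disk."""
    recorded = set(db_paths)
    on_disk = {str(p) for p in upload_dir.rglob("*") if p.is_file()}
    missing = recorded - on_disk    # rows whose file has vanished
    orphaned = on_disk - recorded   # files that no row points to
    return missing, orphaned


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.png").write_bytes(b"a")
        (root / "stray.png").write_bytes(b"s")
        # Pretend these two paths came from a SELECT on the path column.
        recorded_paths = [str(root / "a.png"), str(root / "deleted.png")]
        missing, orphaned = audit_files(recorded_paths, root)
        print("missing:", missing)
        print("orphaned:", orphaned)
```

A check like this only reports drift; it does not give the files the transactional guarantees that a BLOB column has.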
File advantages:
Some databases handle BLOBs rather poorly. For example, while the official LONGBLOB limit in MySQL is 4GB, in practice a single value is capped by max_allowed_packet, which was only 1MB in the default configuration of older versions (newer versions ship with larger defaults). You can increase it to 16-32MB by changing both the client and server configuration, but this has other implications for performance and security.
Even if the database has no odd size limits, it will always have some overhead in storing a BLOB compared to a plain file. Also, if the BLOB is large, some databases provide no interface to access it piece by piece or to stream it, which can be a major impediment to your workflow.
In the end, it is up to you. I typically try to keep data in a BLOB unless this creates unreasonable performance problems.
Yes, MySQL blobs that don't fit within the same page as a row get stored on overflow pages. Note that some blobs are small enough that they're stored with the rest of the row, like any other column. The blob pages are not adjacent to the page their row is stored on, so they may result in extra I/O to read them.
On the other hand, just like with any other page type, blob pages can occupy memory in the InnoDB buffer pool, so reading the blobs subsequently is very fast even if they are on separate pages. Files can be cached by the operating system, but typically they're read from disk.
Here are a few other factors that may affect your decision:
Blobs are stored logically with a row. This means if you DELETE the row, the associated blob is deleted automatically. But if you store the blob outside the database, you end up with orphaned blob files after you delete rows from the database. You have to do manual steps to find and delete these files.
Blobs stored in the row also follow transaction semantics. For instance, a new blob or an updated blob is invisible to other transactions until you commit. You can also roll back a change. Storing blobs in files outside the database makes this a lot harder.
When you back up a database containing blobs, the database is a lot bigger of course, but when you backup, you get all the data and associated blobs in one step. If you store blobs externally, you have to back up the database and also back up the filesystem where you store blob files. If you need to ensure that the data and blobs are captured from one instant in time, you pretty much need to use some kind of filesystem snapshots.
If you use replication, the only automatic way of ensuring the blobs get copied to the replication slave is to store them in the database.
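The points above about row lifetime and transaction semantics can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for MySQL/InnoDB so that it runs without a server, and the table and column names are made up.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL here; the behaviour shown
# (rollback and DELETE covering the blob) is the general transactional idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, image BLOB)")
conn.commit()

# 1. A blob written inside a transaction disappears on rollback.
conn.execute("INSERT INTO posts (title, image) VALUES (?, ?)",
             ("draft", b"\x89PNG fake image bytes"))
conn.rollback()
count_after_rollback = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

# 2. A committed blob lives exactly as long as its row.
conn.execute("INSERT INTO posts (title, image) VALUES (?, ?)",
             ("final", b"\x89PNG fake image bytes"))
conn.commit()
count_after_commit = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

conn.execute("DELETE FROM posts WHERE title = ?", ("final",))
conn.commit()
count_after_delete = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

print(count_after_rollback, count_after_commit, count_after_delete)
```

With files on disk, the equivalent of steps 1 and 2 needs extra application code to undo or clean up the file write.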
Filesystem access will be faster than access through the database. Blob columns also have some disadvantages in terms of indexing, sorting, etc., which you could do with a filename column if you wished to in the future.
The database can also grow quickly with large blobs and then tasks like backing up become slower. I would go with a file location in database with the physical storage on the file system.
The better approach is to store your files in a filesystem folder and point to their paths through a varchar field in the database. One of the drawbacks of saving files in the database is that it can slow the database down.
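As a rough illustration of the path-in-the-database approach described above, here is a minimal Python sketch. It uses sqlite3 as a stand-in for MySQL so it can run anywhere; the `images` table, the `save_upload` and `load_upload` helpers, and the random file naming are all assumptions made for the example.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


def save_upload(conn: sqlite3.Connection, upload_dir: Path,
                original_name: str, data: bytes) -> int:
    """Write the bytes to disk and record only the path in the database."""
    suffix = Path(original_name).suffix.lower()
    stored = upload_dir / f"{uuid.uuid4().hex}{suffix}"  # avoids name clashes
    stored.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO images (original_name, path) VALUES (?, ?)",
        (original_name, str(stored)),
    )
    conn.commit()
    return cur.lastrowid


def load_upload(conn: sqlite3.Connection, image_id: int) -> bytes:
    """Look up the stored path and read the file back."""
    row = conn.execute("SELECT path FROM images WHERE id = ?", (image_id,)).fetchone()
    return Path(row[0]).read_bytes()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, original_name TEXT, path TEXT)")
    with tempfile.TemporaryDirectory() as tmp:
        image_id = save_upload(conn, Path(tmp), "avatar.PNG", b"fake image bytes")
        print(load_upload(conn, image_id))
```

Note that the file write and the INSERT are two separate steps, so a crash between them can leave a file with no row.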
The common method to store images in a database is to convert the image to base64 data before storing the data. This process will increase the size by 33%. Alternatively it is possible to directly store the image as a BLOB; for example:
$image = new Imagick("image.jpg");
$data = $image->getImageBlob();
$data = $mysqli->real_escape_string($data);
$mysqli->query("INSERT INTO images (data) VALUES ('$data')");
and then display the image with
echo '<img src="data:image/jpeg;base64,' . base64_encode($data) . '" />';
With the latter method, we save 1/3 storage space. Why is it more common to store images as base64 in MySQL databases?
UPDATE: There are many debates about the advantages and disadvantages of storing images in databases, and most people believe it is not a practical approach. Anyway, here I assume we store images in the database, and I am asking about the best method to do so.
I contend that images (files) are NOT usually stored in a database base64 encoded. Instead, they are stored in their raw binary form in a binary column, blob column, or file.
Base64 is only used as a transport mechanism, not for storage. For example, you can embed a base64 encoded image into an XML document or an email message.
Base64 is also stream friendly. You can encode and decode on the fly (without knowing the total size of the data).
While base64 is fine for transport, do not store your images base64 encoded.
Base64 provides no checksum or anything of any value for storage.
Base64 encoding increases the storage requirement by 33% over a raw binary format. It also increases the amount of data that must be read from persistent storage, which is still generally the largest bottleneck in computing. It's generally faster to read fewer bytes and encode them on the fly. Only if your system is CPU bound instead of IO bound, and you're regularly outputting the image in base64, should you consider storing in base64.
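The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters. A small Python sketch to check it, using random bytes as a stand-in for real image data (the `base64_overhead` helper is just for illustration):

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return encoded size divided by raw size for padded standard base64."""
    return len(base64.b64encode(raw)) / len(raw)


raw = os.urandom(30_000)          # stand-in for an image's raw bytes
encoded = base64.b64encode(raw)   # what a base64 TEXT column would hold

print(len(raw), len(encoded), round(base64_overhead(raw), 3))
```

Encoding on the fly at output time keeps the stored copy at its raw size while still allowing base64 for transport.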
Inline images (base64 encoded images embedded in HTML) are a bottleneck themselves--you're sending 33% more data over the wire, and doing it serially (the web browser has to wait on the inline images before it can finish downloading the page HTML).
On MySQL, and perhaps similar databases, for performance reasons you might wish to store very small images in BINARY or VARBINARY columns so that they stay on the same page as the primary key, as opposed to BLOB columns, which may be stored on separate pages and sometimes force the use of temporary tables.
If you still wish to store images base64 encoded, please, whatever you do, make sure you don't store base64 encoded data in a UTF8 column then index it.
Pro base64: the encoded representation you handle is a pretty safe string. It contains neither control chars nor quotes. The latter point helps against SQL injection attempts. I wouldn't expect any problem to just add the value to a "hand coded" SQL query string.
Pro BLOB: the database manager software knows what type of data to expect and can optimize for it. If you stored base64 in a TEXT field, it might try to build some index or other data structure for it, which would be really nice and useful for "real" text data but pointless and a waste of time and space for image data. BLOB is also the smaller representation in number of bytes.
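The quoting concern raised under "Pro base64" can also be handled without base64 by passing the raw bytes as a bound parameter instead of building the SQL string by hand. A minimal Python sketch, with sqlite3 standing in for MySQL and a made-up `images` table; base64 appears only when building the data URI for output:

```python
import base64
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB)")

# Every possible byte value, including quotes, backslashes and NUL.
raw = bytes(range(256))

# The driver sends the value separately from the SQL text, so no escaping is needed.
cur = conn.execute("INSERT INTO images (data) VALUES (?)", (raw,))
conn.commit()

stored = conn.execute("SELECT data FROM images WHERE id = ?",
                      (cur.lastrowid,)).fetchone()[0]

# Encode only for transport, at the moment the page is rendered.
data_uri = "data:image/jpeg;base64," + base64.b64encode(stored).decode("ascii")
print(stored == raw, len(data_uri))
```

Most MySQL client libraries offer the same idea through prepared statements or parameter binding.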
I just want to give one example of why we decided to store images in the DB rather than in files or on a CDN: storing images of signatures.
We tried a CDN, cloud storage, and files, and finally decided to store them in the DB. We are happy with the decision; it proved right in subsequent events when we moved, upgraded our scripts, and migrated the sites several times.
In our case, we wanted the signatures to stay with the records that belong to the authors of the documents.
Storing them as files risks losing them or deleting them by accident.
We first stored them in binary BLOB format in MySQL, and later as base64-encoded images in a text field. We switched to base64 because, for some reason, the result was smaller and loaded faster; BLOB was slowing down the page load for some reason.
In our case, the decision to store signature images in the DB (whether as BLOB or base64) was driven by:
Most signature images are very small.
We don't need to index the signature images stored in DB.
Index is done on the primary key.
We may have to move or switch servers, and moving physical image files to different servers may leave images not found because links change.
It is embarrassing to ask authors to re-sign their signatures.
It is more secure to save them in the DB than to expose them as files, which can be downloaded if security is compromised. Storing them in the DB gives us better control over access.
In any future migration or change of web design, hosting, or servers, we have zero worries about reconciling the signature file names against the physical files; it is all in the DB!
AC
I recommend looking at modern databases such as NoSQL stores, and I also agree with user1252434's post. For instance, I am storing a few PNGs under 500KB each as base64 in my Mongo DB with binary set to true, with no performance hit at all. Mongo can be used to store large files such as 10MB videos, which can offer huge time savings in metadata searches for those videos; see storing large objects and files in MongoDB.
I've been developing a web application using RIA technologies (Flex + PHP + MySQL + Ajax) and now I'm in a dilemma about image files.
I use some images in my Flex app, so I thought, "it would be awesome if I stored them in the database and retrieved them from there; consequently, maintenance should be easier". But here is my dilemma:
Should I store the physical URL of my images, or would it be better to store the image directly?
For example, should my Cars table look like:
ID (autonumeric) | Source (text)
or like this?
ID (autonumeric) | Image (longblob or blob)
I know there are cool people here who can answer this question and explain which is better and why :)
I personally recommend storing images in the database. Of course, it has both advantages and disadvantages.
Advantages of storing BLOB data in the database:
It is easier to keep the BLOB data synchronized with the remaining items in the row.
BLOB data is backed up with the database. Having a single storage system can ease administration.
Some database servers can return BLOB data as a base64-encoded representation in an XML stream.
Some database servers (SQL Server, for example) can run full-text search against formatted text-based data held in binary columns, such as Microsoft Word or Microsoft Excel documents. Note that MySQL FULLTEXT indexes only cover CHAR, VARCHAR, and TEXT columns, so this does not apply to MySQL BLOBs.
Disadvantages of Storing BLOB Data in the Database:
Carefully consider what resources might be better stored on the file system rather than in a database. Good examples are images that are typically referenced via HTTP HREF. This is because:
Retrieving an image from a database incurs significant overhead compared to using the file system.
Disk storage on database SANs is typically more expensive than storage on disks used in Web server farms.
As a general rule you want to keep your databases small so they perform better (and back up better too). So if you can store only a filesystem reference (path + filename) or URL in the DB, that would be better.
It's probably a question of personal preference.
As a general rule it's better to keep the database small. However, enterprise applications regularly add images directly to the database. If you place them on the file system, the DB and the file system can get out of sync.
Larger CMSs regularly place those files in the DB. Be aware, however, that this requires larger DB sizing as everything grows...
When you are saving the url and name only, be sure that these won't change in the future.
With files stored in the database you can implement security more easily, and you don't have to worry about duplicate filenames.
I used to store the path as a URL, but then adding an additional web server to the mix proved less than ideal. For one thing, you have to share the path where the images are stored. We were using NFS and it became slow after a while. We tried syncing the files from one web server to another, but the process became cumbersome.
Having said that, I would store them in the DB. I've since moved all my image/file storage over to MongoDB. I know this doesn't satisfy your needs, but we've tried it all (even S3) and we weren't happy with the other solutions. If we had to, I would definitely put them in MySQL.
Personally, I've always stored the URL.
There's no real reason not to store the image directly in the database, but there are benefits to not storing it in the database.
You get more flexibility when you don't store the image in the database. You can easily move it around and just update the URL in the database. So, if you wanted to move the image from your webserver to a service such as Flickr or Amazon Web Services, it would be as easy as updating the link to the new files. That also gives you easy access to content delivery networks so that the images are delivered to end users more quickly.
I'd store the url, it's less data and that means a smaller database and faster data fetching from it ;)
I'm writing a webcrawler in Python that will store the HTML code of a large set of pages in a MySQL database. I'd like to make sure my methods of storage and processing are optimal before I begin processing data. I would like to:
Minimize storage space used in the database - possibly by minifying HTML code, Huffman encoding, or some other form of compression. I'd like to maintain the possibility of fulltext searching the field - I don't know if compression algorithms like Huffman encoding will allow this.
Minimize the processor usage necessary to encode and store large volumes of rows.
Does anyone have any suggestions or experience in this or a similar issue? Is Python the optimal language to be doing this in, given that it's going to require a number of HTTP requests and regular expressions plus whatever compression is optimal?
If you don't mind the HTML being opaque to MySQL, you can use the COMPRESS function to store the data and UNCOMPRESS to retrieve it. You won't be able to use the HTML contents in a WHERE clause (using, e.g., LIKE).
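The same trade-off can be tried out on the Python side before committing to it. The sketch below uses the standard zlib module as an application-side stand-in for COMPRESS/UNCOMPRESS (it does not produce MySQL's exact on-disk format), with made-up sample HTML:

```python
import zlib

# Repetitive sample markup as a stand-in for a crawled page.
html = ("<html><body>" + "<p>Hello, crawler!</p>" * 200 + "</body></html>").encode("utf-8")

packed = zlib.compress(html, 6)   # bytes you would store in a BLOB column
restored = zlib.decompress(packed)

print(len(html), len(packed))
# Substring search only works on the original text, not on the packed bytes.
print(b"crawler" in html, b"crawler" in packed)
```

If searching matters, one option is to keep a separate search index over the uncompressed text and store only the compressed copy in the row.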
Do you actually need to store the source in the database?
Trying to run 'LIKE' queries against the data is going to suck big time anyway.
Store the raw data on the file system, as standard files. Just don't stick them all in one folder. Use hashes of the id to store them in predictable folders.
(While it is of course perfectly possible to store the text in the database, it bloats the database and makes it harder to work with. Backups are (much!) bigger, changing storage engine becomes more painful, etc. Scaling your filesystem is usually just a case of adding another hard disk. That doesn't work so easily with a database; you start needing to shard.)
... To do any sort of searching on the data, you are looking at building an index. I only have experience with SphinxSearch, but that allows you to specify a filename in the input database.
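The hashed-folder layout suggested above could look something like the following Python sketch. The choice of SHA-256, two directory levels, and the `sharded_path` name are assumptions for illustration; any stable hash of the id would do.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, page_id: int, levels: int = 2) -> PurePosixPath:
    """Map an id to root/xx/yy/<id>.html using the leading hex digits of its hash."""
    digest = hashlib.sha256(str(page_id).encode("ascii")).hexdigest()
    shards = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *shards, f"{page_id}.html")


# The same id always maps to the same folder, so no lookup table is needed.
print(sharded_path("/data/pages", 12345))
```

Two hex characters per level gives 256 folders per level, which keeps any single directory from growing too large as the crawl expands.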
Just one little question.
Should I put the HTML textarea content into a DB or a flat file? The textarea content can be very long (like an article or a project).
Thanks for helping.
Silvio.
Use the TEXT type instead of a string/VARCHAR type in the DB.
Note: TEXT fields in MySQL are limited to 65,535 bytes (64KB).
EDIT
Or you could look at MySQL's LONGBLOB or LONGTEXT data types. They can store up to 4 gigabytes of binary or textual data, respectively.
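Because these limits are in bytes rather than characters, the number of characters that fit depends on the encoding. A small Python sketch of the arithmetic for UTF-8 text; the byte limits are the documented MySQL maximums, while the `fits` and `worst_case_chars` helpers are illustrative names:

```python
# Maximum value length in bytes for each MySQL text type.
TEXT_LIMITS_BYTES = {
    "TINYTEXT": 2**8 - 1,
    "TEXT": 2**16 - 1,
    "MEDIUMTEXT": 2**24 - 1,
    "LONGTEXT": 2**32 - 1,
}


def fits(column_type: str, text: str) -> bool:
    """Check whether the UTF-8 encoding of text fits in the given column type."""
    return len(text.encode("utf-8")) <= TEXT_LIMITS_BYTES[column_type]


def worst_case_chars(column_type: str, bytes_per_char: int = 4) -> int:
    """Characters guaranteed to fit if every character needs bytes_per_char bytes."""
    return TEXT_LIMITS_BYTES[column_type] // bytes_per_char


print(worst_case_chars("TEXT"))            # 4-byte characters (utf8mb4 worst case)
print(worst_case_chars("TEXT", 3))         # 3-byte characters
print(fits("TEXT", "a" * 65535), fits("TEXT", "é" * 40000))
```

For long articles that may exceed the TEXT limit, MEDIUMTEXT is usually a comfortable middle ground before reaching for LONGTEXT.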
Here are some pros and cons off the top of my head (I've done it both ways with varying levels of success):
Database Pros
Well known model for reading / writing data.
If the rest of your application is based on a database, this solution fits nicely.
The db has concurrency mechanisms already in place.
As long as you back up your database, your documents are backed up too (and in sync with the state of your database).
Database Cons
Theoretically, it is less efficient to pull a file from a database than directly from the file system. The db server has to read from disk, translate into its network protocol, etc.
The serving of these files from the db is a potential scalability bottleneck if only using one database server.
Flat File Pros
Dirt simple operations: Write, Delete, Read.
Presumably low overhead. If serving from web, the server can just be pointed to the files.
Flat File Cons
You have to deal with concurrency operations (what if one user wants to write to the file while another is reading, etc). This may or may not be an issue in your case.
It's one more type of information to back up / keep in sync / maintain.
You have to deal with security of the files as a separate issue from the rest of the data in the database.
Database without a doubt.
There's not much point in flat files, because when you start expanding (which you will), you have to create more files and it's easier to lose track.
Start with a database; as you grow, you can do things much faster and make more complex selections using Structured Query Language. The name says it all.