From BLOB to filesystem: which is the best location to store images? - MySQL

I have read a lot of discussions about this topic, and we all agree that, in most cases, the best solution is to save the images on the filesystem and record the path in the database.
For this reason, we are redesigning our web application to enjoy all the advantages of this type of solution.
My question, however, is:
what is the best location to save the images, especially to avoid problems with version control and deployment? As with the database, I think it's better that these files are not under version control, right?
So I would like advice on the best location to save the files:
Inside the working copy, telling version control to ignore them?
Outside the working copy?
In the folder where the database stores its data, so you have "all the data together"?
There are 3000 images of about 2 MB each.
Version control is SVN, but we will soon migrate to Git.
The DB is MySQL with InnoDB tables and innodb_file_per_table.

Related

Right way to manage application generated files

tl;dr
In my node.js application I create PDF documents. What is the best/right way to save them? Right now I use the node.js fileserver and shell.js to do it.
I am working on a node.js web application, for learning purposes, that manages apartments and tenants, and at some point I create PDF documents that I want to save under a path like
/documents/building_name/apartment_name/tenant_name/year/example.pdf
Now, if the user wants to change the building, apartment, or tenant name via an HTTP PUT request, I change the database, but I also want to change the path.
Both work, but I can't write good tests for these functions.
Now a friend told me that it's bad practice to save documents on a file server and that I'd better use BLOBs.
On the other hand, what I find on Google doesn't really agree on using BLOBs.
So what is the right way to save documents?
Thanks
Amit
You should first define a source of truth. Unless you're legally obliged to keep copies of those files, and if they are not being accessed very often, I wouldn't even bother storing them at all; just generate them upon request.
If not, keep the DB clean: BLOBs will make it huge. Put the files into cold storage (again assuming they are not accessed too frequently) without those paths. If the paths rely on frequently changing information, that won't be performant for either the file server or your system.
Instead, store a revision number in your DB that the file can be found under, and limit the path structure to information that rarely changes.
Like {building}/{apartment}/{tenant}_{revision}.pdf
That, depending on your backup structure, will allow you to time-travel if necessary and doesn't force a re-index all the time.
Note: I don't know too much about your use case.
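A minimal sketch of that scheme in Node/TypeScript (the type and function names here are hypothetical, not from your code):

```typescript
// Rarely-changing parts plus a revision number kept in the DB.
interface TenantDoc {
  building: string;  // rarely changes
  apartment: string; // rarely changes
  tenant: string;    // rarely changes
  revision: number;  // bumped in the DB whenever a name does change
}

// {building}/{apartment}/{tenant}_{revision}.pdf
function documentPath(doc: TenantDoc): string {
  return `${doc.building}/${doc.apartment}/${doc.tenant}_${doc.revision}.pdf`;
}

// On a rename: write the PDF under the new path with revision + 1,
// then update the row. Old revisions remain readable for time travel.
```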

Using version control (Git) on a MySQL database

I am a WordPress designer/developer who is getting more and more heavily involved with version control, notably Git, though I do use SVN for some projects. I am currently using Beanstalk for my remote repo.
Adding all of the WordPress files to my repo is no problem. I know I could .gitignore the wp-config file if I wanted to, but since I'm currently the only developer and these projects are closed source, it really makes little sense.
WordPress relies heavily on the database, as any CMS does, to keep textual content and many settings, depending on the specific plugin/theme configuration I'm using. I'm wondering what the best way of using version control on the database would be, if it's even possible. I guess I could do a SQL dump (though my MySQL server is running on Windows, read as: I don't know how to do it) and then add the SQL dump to my repository. But when I push something live, that poses huge security threats.
Is there an accepted practice of doing this?
You can back up your database within a git repository. Of course, if you place the data into git in binary form, you will lose all of git's ability to store the data efficiently using diffs (changes). So the number one best practice is this: store the data in a text-serialised format.
mysqldump is a suitable program to help you do this. It isn't perfect, though. If anything disturbs the serialisation order of items (e.g. as a result of creating new tables), artificial breaks will enter the diff, which decreases the efficiency of storage. You could write a custom serialiser that serialises changes only, but then you would be doing the hard work that git is already good at. Just use the SQL dump.
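For example, a small dump script (shown in TypeScript/Node for consistency with the rest of this page; a shell one-liner works just as well). The flags are real mysqldump options; the database name and credentials are placeholders:

```typescript
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Produce a diff-friendly text dump: one INSERT per row, stable row
// order, and no timestamp comment that would change on every dump.
const dump = execFileSync("mysqldump", [
  "--skip-extended-insert", // one INSERT per row => line-level diffs
  "--order-by-primary",     // stable row order between dumps
  "--skip-dump-date",       // drop the "Dump completed on ..." churn
  "--user=wp", "--password=secret", // placeholders
  "wordpress",              // placeholder database name
], { encoding: "utf8", maxBuffer: 256 * 1024 * 1024 });

writeFileSync("db/wordpress.sql", dump);
// then: git add db/wordpress.sql && git commit -m "db snapshot"
```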
That being said, what you want to do isn't what devs normally mean when they talk about putting the database in git. For instance, if you read the link posted by @eggyal (to codinghorror), you will see that what is actually placed in git are the scripts needed to generate the initial database. There may be additional scripts, like those to populate the database with clean-state data, or to populate it with testing data. All such SQL scripts are text files, in pretty much the same format as the SQL dump you would get from mysqldump. So there's no reason you can't do it that way with your day-to-day data as well.
There is not much software available for version-controlling databases like MySQL and MongoDB.
But one is under development, and the beta version is about to launch soon. Check out Klonio - Version Control for databases.
The article How to Sync A Local & Remote WordPress Blog Using Version Control gives advice on how to automate syncing between two instances (development, production) of a WordPress blog using Mercurial. It mentions that for this scenario, Git and Mercurial are very similar.
Step 4 (Synchronizing The Databases) is of interest here.
The database content will be exported to a file that is tracked by revision control. Each time we pull changes, the database content will be replaced by this file, making our database up-to-date.
Then, it elaborates on conflicts and the scripting part of the job.
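The import half of that scripting can be equally small (again a TypeScript/Node sketch; credentials and paths are placeholders):

```typescript
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

// After pulling, replace the local database content with the tracked
// dump by piping it into the mysql client on stdin.
execFileSync("mysql", ["--user=wp", "--password=secret", "wordpress"], {
  input: readFileSync("db/wordpress.sql"),
});
```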
There is a Mercurial version control tutorial out there if you're not familiar with it.
If you are only interested in keeping schema changes under version control, there is a nice tool, SqlRog. It extracts the schema into project files that can be put under git.
Be aware that WordPress stores all news feed content in the database, so even if you don't make any changes, there will be a lot of changing content.

Dealing with mass images, some small and some large, in a Spring/Java application using MySQL

I was wondering what the best pattern is these days for managing images when using Spring/Java and MySQL.
I have several options:
Some of the images are just small avatars for the users. Is it fine to put these directly into MySQL? Or use the file system?
For the larger images, is the file system pretty much the only option, and then use MySQL to store the location on the file system?
Where is a good spot to put them on a Linux server? /var/files/images?
Since the files are hidden from the WAR deployment directory, what is the best way to stream them? Use some kind of a file output stream as the response body for an HTTP request?
Also, do I have to develop all of the file management stuff myself, like cleaning up unused files and the like?
What about image security? Some images should not be accessible to everyone. I think I'd need to use a separate URL, with Spring Security checking the current user, for this.
I'd appreciate advice on all of these questions. Thanks.
You could use MySQL, and that would have the advantage of centralization and easy cleanup, but IMHO it's a waste of the database's resources if you plan to scale.
For data like images where everything is public, consider something like Amazon S3, which allows you to serve images directly from S3's web servers. If you plan to host everything yourself, just serve them from a directory. Just remember to turn directory listings off :)
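On the streaming question: the pattern in any stack is "check permissions, then stream the file from a private directory as the response body" (in Spring you'd return a FileSystemResource after the same checks). A minimal sketch in Node/TypeScript, with the base directory and the permission check as placeholders:

```typescript
import { createReadStream } from "node:fs";
import { createServer } from "node:http";
import { join, normalize } from "node:path";

const BASE = "/var/files/images"; // outside the deployment directory

createServer((req, res) => {
  const allowed = true; // placeholder for a real permission check
  // Resolve the requested file and refuse path traversal attempts.
  const file = normalize(join(BASE, req.url ?? "/"));
  if (!allowed || !file.startsWith(BASE + "/")) {
    res.writeHead(403).end();
    return;
  }
  res.writeHead(200, { "Content-Type": "image/jpeg" });
  createReadStream(file)
    .on("error", () => res.destroy()) // e.g. file not found
    .pipe(res);
}).listen(8080);
```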

What's the best way to create a document archive of images in the database?

What is the best way to create an archive of image documents in the database?
Given that we have about 2-10 million records, and each record includes 2-4 images and about 20 text fields, what is the best way to create this archive so that we have good speed and high security for the data?
Also, what database is a good fit for this project?
Definitely use the file system as Minor suggested.
One option is SQL Server FILESTREAM. See http://msdn.microsoft.com/en-us/library/cc949109.aspx.
Use file system storage for the archive images, and save a link to each image file in the DB. And if you serve the content over HTTP, you can use a caching proxy server such as Squid, Nginx, etc.
More questions for you:
How dynamic is the data? Do you store it once and never change it, or does it get changed frequently?
Do you need versioning for the documents, or does the latest version overwrite the previous one and that's it?
Are the documents always edited using one application, or can they be changed outside it (e.g. using Word)?
Are the documents related to other "non-document" data (database rows), or are they the only thing that you need to store?
The file system won't offer any real security, so I would discount that straight off.
In Oracle there is built-in image support through the ORDImage type.
Check out Marcel's blog, as he and the Piction company do a lot of work in this area, and he has lots of useful material to download.
You can use controlled downloads. Look at http://kovyrin.net/2006/11/01/nginx-x-accel-redirect-php-rails/lang/en/
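The pattern from that link: the application authenticates the request, then sets an X-Accel-Redirect header so nginx performs the actual file transfer from a location marked internal. A sketch in Node/TypeScript, with the auth check and paths as placeholders:

```typescript
import { createServer } from "node:http";

createServer((req, res) => {
  const userMayDownload = true; // placeholder auth check
  if (!userMayDownload) {
    res.writeHead(403).end();
    return;
  }
  // nginx must map /protected/ to the real directory and mark that
  // location `internal;` so clients cannot request it directly.
  res.writeHead(200, {
    "X-Accel-Redirect": "/protected" + (req.url ?? "/"),
    "Content-Type": "application/octet-stream",
  });
  res.end(); // nginx streams the file; the app sends no body
}).listen(8080);
```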