I'll be performing read operations very frequently.
According to some tests I did, MySQL is faster than MongoDB at reading a single row. Is it better to use MySQL?
Also, I heard someone say that it's sometimes better to store sessions in files. Is that true?
If you tested it and it was faster, then why not trust your own research? You can store the data in a relational database like MySQL, a JSON document store like MongoDB, or a flat file (though a flat file may run into I/O contention when opening and reading it). Go with what you are comfortable with, and trust your research.
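If you want to re-run that comparison yourself, a minimal benchmark sketch along these lines is enough to measure single-row reads; the table, collection, and credentials below are placeholders, and it assumes the mysql-connector-python and pymongo drivers:

```python
# Minimal single-row read benchmark sketch. Assumes a local MySQL server
# with a `users` table and a local MongoDB with an equivalent `users`
# collection; names and credentials below are placeholders.
import time

import mysql.connector            # pip install mysql-connector-python
from pymongo import MongoClient   # pip install pymongo

N = 1000

# MySQL: fetch the same row N times by primary key.
conn = mysql.connector.connect(user="app", password="secret", database="test")
cur = conn.cursor()
start = time.perf_counter()
for _ in range(N):
    cur.execute("SELECT name FROM users WHERE id = %s", (1,))
    cur.fetchone()
print("MySQL:  ", time.perf_counter() - start)

# MongoDB: fetch the same document N times by _id.
coll = MongoClient()["test"]["users"]
start = time.perf_counter()
for _ in range(N):
    coll.find_one({"_id": 1})
print("MongoDB:", time.perf_counter() - start)
```

Run it against data that resembles your real workload; driver overhead and caching can swing the results either way.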
For a project we are working with an external partner, and we need access to their MySQL database. The problem is, they can't grant that. Their database is hosted in a managed environment where they don't have many configuration options, and they don't want to give us access to all of their data. So the solution they came up with is the federated storage engine.
We now have one table for each table in their database. The problem is that the amount of data we receive is huge and will only increase in the future, which means a lot of inserts are performed on our database. The optimal solution for us would be to intercept all incoming MySQL traffic, process it, and then store it in bulk. We have also thought about using something like Redis to store the data.
Additionally, we plan to get more data from other partners, who will potentially provide it in different ways. Using Redis would allow us to have all our data in one place.
Copying the data to Redis after it's stored in the MySQL database is not an option: we just can't handle that many inserts, and we need the data as fast as possible.
TL;DR
Is there a way to pretend to be a MySQL server so we can directly process the data received via the federated storage engine?
We have also thought about using the BLACKHOLE engine in combination with binary logging on our side. Incoming data would then only be written to the binary log and wouldn't be stored in the database, but performance would still be limited by disk I/O.
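To make that idea concrete, here is a rough sketch of the BLACKHOLE setup we have in mind, assuming binary logging (log-bin in my.cnf) is enabled on the receiving server; the table name, columns, and credentials are placeholders:

```python
# Rough sketch of the BLACKHOLE idea. Assumes binary logging is enabled
# (log-bin in my.cnf) on the receiving server; table name, columns and
# credentials are placeholders.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(user="app", password="secret", database="staging")
cur = conn.cursor()

# Same columns as the partner's table, but nothing is actually stored.
cur.execute("""
    CREATE TABLE IF NOT EXISTS partner_events (
        id BIGINT NOT NULL,
        payload TEXT
    ) ENGINE=BLACKHOLE
""")

# This insert does no table I/O; it is only recorded in the binary log,
# where a downstream consumer can pick it up and process it in bulk.
cur.execute("INSERT INTO partner_events (id, payload) VALUES (%s, %s)",
            (1, "example"))
conn.commit()
```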
I am going to work on a distributed application. The data will be streamed and analyzed, the end users need access to the most recently streamed data as quickly as possible, and I need to keep a backup of the raw data as well as the processed results.
My initial idea is as follows:
1) Keep Redis as a cache to hold the most recent entries (see the sketch below)
2) MySQL - long-term storage of the data
3) Hadoop/HBase - a convenient way of storing the data in order to analyze it
What do you think of such a setup? Would you recommend anything else?
Thanks!
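To make point 1 concrete, here's a minimal sketch of the Redis cache I have in mind, using the redis-py client; the key name and cap size are placeholders:

```python
# Minimal sketch of the Redis "last entries" cache; key name and cap
# size are placeholders.
import json
import redis  # pip install redis

r = redis.Redis()
LAST_N = 100

def record(entry: dict) -> None:
    """Push the newest entry and trim the list to the last LAST_N items."""
    pipe = r.pipeline()
    pipe.lpush("stream:latest", json.dumps(entry))
    pipe.ltrim("stream:latest", 0, LAST_N - 1)
    pipe.execute()

def latest(count: int = 10) -> list:
    """Return up to `count` of the most recent entries, newest first."""
    return [json.loads(x) for x in r.lrange("stream:latest", 0, count - 1)]
```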
I think a combination of Spark and Cassandra would be an excellent way to go. Cassandra can easily handle the data throughput and storage, and Spark provides lightning-quick analytics.
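As a hedged illustration of that stack, a PySpark job reading from Cassandra could look roughly like this; it assumes the spark-cassandra-connector package is on the classpath, and the keyspace, table, and column names are placeholders:

```python
# Hedged sketch of Spark reading from Cassandra. Assumes the
# spark-cassandra-connector package is on the classpath and that a
# `stream.events` table exists; keyspace, table and column names are
# placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("stream-analytics")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

events = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="stream", table="events")
          .load())

# Example analysis: event counts per type over the stored stream.
events.groupBy("event_type").count().show()
```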
I've only recently started working with database systems.
I'm developing an iOS app that will have a local database (SQLite) and that will have to periodically update this internal database with the contents of a database stored on a web server (MySQL). My question is: what's the best way to fetch the data from the web server and store it in the local database? There are some options that came to mind; I don't know if all of them are possible:
Webserver->XML/JSON->Send it->Locally convert and store in local database
Webserver->backupFile->Send it->Feed it to the SQLite db
Are there any other options? Which one is better in terms of the amount of data transferred?
Thank you
The XML/JSON route is by far the simplest while providing sufficient flexibility to handle updates to the database schema/older versions of the app accessing your web service.
In terms of the second option you mention, there are two approaches - either use an SQL statement dump, or a CSV dump. However:
The "default" (i.e.: mysqldump generated) backup files won't import into SQLite without substantial massaging.
Using a CSV extract/import will mean you have considerably less flexibility in terms of schema changes, etc. so it's probably not a sensible approach if the data format is ever likely to change.
As such, I'd recommend sticking with the tried and tested XML/JSON approach.
In terms of the amount of data transmitted, JSON may be smaller than the equivalent XML, but it really depends on the variable/element names used, etc. (See the existing How does JSON compare to XML in terms of file size and serialisation/deserialisation time? question for more information on this.)
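To illustrate the JSON route, here is a language-agnostic sketch shown in Python for brevity; the endpoint URL, table, and column names are all placeholders:

```python
# Sketch of the JSON route, shown in Python for brevity. The endpoint
# URL, table and column names are placeholders.
import json
import sqlite3
from urllib.request import urlopen

URL = "https://example.com/api/items?since=2024-01-01"  # placeholder endpoint

# Expecting e.g. [{"id": 1, "name": "..."}, ...] from the web service.
rows = json.load(urlopen(URL))

db = sqlite3.connect("local.db")
db.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")

# INSERT OR REPLACE keeps the local copy in sync without having to track
# which rows are new versus updated.
db.executemany("INSERT OR REPLACE INTO items (id, name) VALUES (:id, :name)", rows)
db.commit()
```

The same pattern carries over directly to whatever HTTP and SQLite libraries you use on iOS.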
I am struggling to decide whether I should use the MySQL BLOB field type in an upcoming project.
My basic requirements are: there will be certain database records that can be viewed and can have multiple files uploaded and "attached" to them. Viewing said records can be limited to certain people on a case-by-case basis. Any type of file can be uploaded, with virtually no restrictions.
Looking at it one way, if I go the MySQL route, I don't have to worry about viruses creeping in or random PHP files getting uploaded and somehow executed. I also have a much easier path for permissioning and for keeping the data tied closely to a record.
The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.
Is there a performance hit with using the MySQL BLOB field type? I'm concerned about choosing a solution that will hinder future growth of the website, as well as choosing a solution that won't be easy to maintain.
Is there a performance hit with using the MySQL BLOB field type?
Not inherently, but if you have big BLOBs clogging up your tables and memory cache, that will certainly result in a performance hit.
The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.
Yes, this is a common approach. You'd usually do something like have folders named after each table they're associated with, containing filenames based only on the primary key (ideally an integer; certainly never anything user-submitted).
Is this a better idea? It depends. There are deployment-simplicity advantages to having only a single data store, and not having to worry about giving the web user write access to anything. Also if there might be multiple copies of the app running (eg active-active load balancing) then you need to synchronise the storage, which is much easier with a database than it is with a filesystem.
If you do use the filesystem rather than a blob, the question is then, do you get the web server to serve it by pointing an Alias at the folder?
+ is super fast
+ caches well
- extra server config: virtual directory; needs appropriate file extension to return desired Content-Type
- extra server config: need to add Content-Disposition: attachment/X-Content-Type-Options headers to stop IE sniffing for HTML as part of anti-XSS measures
or do you serve the file manually by having a server-side script spit it out, as you would have to if serving from a MySQL blob?
- is potentially slow
- needs a fair bit of manual If-Modified-Since and ETag handling to cache properly
+ can use application's own access control methods
+ easy to add correct Content-Type and Content-Disposition headers from the serving script
This is a trade-off there's not one globally-accepted answer for.
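As a rough illustration of the "serve it manually" branch, including the ETag and header handling from the lists above, here is a sketch; Flask is used purely as an example, and the upload path and permission check are placeholders:

```python
# Sketch of the "serve it manually" branch with the ETag and header
# handling from the lists above. Flask is used purely as an example;
# the upload path and the permission check are placeholders.
import hashlib
import os

from flask import Flask, Response, abort, request

app = Flask(__name__)
UPLOAD_DIR = "uploads"

@app.route("/files/<int:file_id>")
def serve_file(file_id):
    # Real code would apply the application's own access control here
    # and look up the stored content type / original name in the database.
    path = os.path.join(UPLOAD_DIR, str(file_id))
    if not os.path.exists(path):
        abort(404)

    with open(path, "rb") as f:
        data = f.read()

    etag = hashlib.md5(data).hexdigest()
    if request.headers.get("If-None-Match") == etag:
        return Response(status=304)  # client's cached copy is still valid

    resp = Response(data, mimetype="application/octet-stream")
    resp.headers["ETag"] = etag
    # Force download and stop IE content-sniffing, as discussed above.
    resp.headers["Content-Disposition"] = "attachment"
    resp.headers["X-Content-Type-Options"] = "nosniff"
    return resp
```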
If your web server will be serving these uploaded files over the web, the performance will almost certainly be better if they are stored on the filesystem. The web server will then be able to apply HTTP caching hints such as Last-Modified and ETag which will help performance for users accessing the same file multiple times. Additionally, the web server will automatically set the correct Content-Type for the file when serving. If you store blobs in the database, you'll end up implementing the above mentioned features and more when you should be getting them for free from your web server.
Additionally, pulling large blob data out of your database may end up being a performance bottleneck on your database. Also, your database backups will probably be slower because they'll be backing up more data. If you're doing ad-hoc queries during development, it'll be inconvenient seeing large blobs in result sets for select statements. If you want to simply inspect an uploaded file, it will be inconvenient and roundabout to do so, because it'll be awkwardly stored in a database column.
I would stick with the common practice of storing the files on the filesystem and the path to the file in the database.
In my experience, storing a BLOB in MySQL is OK as long as you store only the blob in one table, with the other fields in another (joined) table. Conversely, searching the fields of a table that has a few standard fields plus one blob field with 100 MB of data can slow queries dramatically.
I had to change the data layer of a mailing app because of this issue: emails were stored with their content in the same table as the date sent, email addresses, etc. It was taking 9 seconds to search 10,000 emails. Now it takes what it should take ;-)
Data should be stored in one consistent place: the database.
This performance and Content-Type concern is not really an issue, because nothing stops you from caching those BLOB fields on the local web server and serving them from there once they are requested for the first time. You do not need to access that table on every page view.
This filesystem cache can be emptied at any moment, which will only impact performance temporarily while it refills automagically. It also lets you use one database and many web servers as your application grows; they will simply each have a local cache on the filesystem.
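A sketch of that cache-on-first-request idea might look like this; the table name, column, and cache directory are placeholders:

```python
# Sketch of the cache-on-first-request idea; table name, column and
# cache directory are placeholders.
import os

import mysql.connector  # pip install mysql-connector-python

CACHE_DIR = "cache"

def attachment_path(conn, attachment_id: int) -> str:
    """Return a local file path for the blob, filling the cache on a miss."""
    path = os.path.join(CACHE_DIR, str(attachment_id))
    if not os.path.exists(path):  # cache miss: pull the blob from MySQL once
        cur = conn.cursor()
        cur.execute("SELECT data FROM attachments WHERE id = %s", (attachment_id,))
        row = cur.fetchone()
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(row[0])
    return path  # hand this to the web server (e.g. via sendfile)
```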
Many people recommend against storing file attachments (usually this applies to images) in blobs in the database. Instead they prefer to store a pathname as a string in the database, and store the file somewhere safe on the filesystem. There are some merits to this:
Database and database backups are smaller.
It's easier to edit files on the filesystem if you need to work with them ad hoc.
Filesystems are good at storing files. Databases are good at storing tuples. Let each one do what it's good at.
There are counter-arguments too, that support putting attachments in a blob:
Deleting a row in a database automatically deletes the associated attachment.
Rollback and transaction isolation work as expected when data is in a row, but not when some part of the data is on the filesystem.
Backups are simpler if all data is in the database. No need to worry about making consistent backups of data that's changing concurrently during the backup procedure.
So the best solution depends on how you're going to be using the data in your application. There's no one-size-fits-all answer.
I know you tagged your question with MySQL, but if folks reading this question use other brands of RDBMS, they might want to look into BFILE when using Oracle, or FILESTREAM when using Microsoft SQL Server 2008. These give you the ability store files outside the database but access them like they're part of a row in a database table (more or less).
Large volumes of data will eventually take their toll on performance. MS SQL Server 2008 has a specialized way of storing binary data in the file system:
http://msdn.microsoft.com/en-us/library/cc949109.aspx
I would employ a similar approach for your project.
You can create a FILES table that keeps information about the files, such as their original names. To safely store the files on disk, rename them using, for example, GUIDs. Store the new file names in your FILES table, and when a user needs to download one you can easily locate it on disk and stream it to the user.
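A minimal sketch of that flow, with SQLite standing in for whatever RDBMS you use; the table and folder names are placeholders:

```python
# Minimal sketch of the FILES-table flow. SQLite stands in for whatever
# RDBMS you use; table and folder names are placeholders.
import os
import shutil
import sqlite3
import uuid

db = sqlite3.connect("app.db")
db.execute("""CREATE TABLE IF NOT EXISTS files (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    original_name TEXT,
    stored_name TEXT
)""")

def save_upload(tmp_path: str, original_name: str) -> int:
    """Move an uploaded temp file into storage under a GUID name."""
    stored_name = uuid.uuid4().hex  # safe on-disk name, never user-supplied
    os.makedirs("uploads", exist_ok=True)
    shutil.move(tmp_path, os.path.join("uploads", stored_name))
    cur = db.execute(
        "INSERT INTO files (original_name, stored_name) VALUES (?, ?)",
        (original_name, stored_name))
    db.commit()
    return cur.lastrowid  # use this id to locate and stream the file later
```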
In my opinion, storing files in the database is a bad idea. What you can store there is the id, name, type, possibly an MD5 hash of the file, and the date inserted. The files themselves can be uploaded to a folder outside the public location. Also, be aware that it is not advisable to keep more than about 1000 files in one folder, so you have to create a new folder each time the file id increases by 1000.
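For example, a tiny sketch of that folder scheme (paths are placeholders):

```python
# Sketch of the 1000-files-per-folder scheme; file id 2537 would land in
# uploads/2/2537.
def shard_path(file_id: int) -> str:
    return "uploads/{}/{}".format(file_id // 1000, file_id)
```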