I'm having trouble getting a clear understanding of what MySQL 5.6 is introducing with respect to memcached.
As I understand it, memcache by itself is essentially a huge, shared, memory-resident hash table that is managed by a server, memcached. In particular, it knows nothing about a persistent data store, and offers no services in that regard. It simply knows about keys and values (like a Perl hash).
What I think MySQL 5.6 introduces is a NoSQL API, whereby MySQL clients can request data from the MySQL server by key rather than via a SELECT statement (and, similarly, perform updates with key=value pairs). MySQL uses memcached to cache these in memory as a performance boost, but also takes care of things like writing updates back to the database before they age out of the cache, etc.
In other words, the use of memcached is an implementation detail of the MySQL 5.6 NoSQL feature, and is not something the application programmer needs to be aware of.
I'd welcome any corrections or amplification to my understanding.
Thanks,
Chap
I think it's quite simple, going by the official documentation.
I disagree with your last sentence, though: the application programmer has to be very aware of the memcached plugin, because having it on board the MySQL server means he can decide (or may even be forced) to access data through a memcached language interface or via the SQL interface.
To better understand the impact of this plugin on an app's design, you should know that there are three configuration tables used by MySQL for proper memcached management; understanding how "cache_policies" works will shed some light on your doubts:
Table cache_policies specifies whether to use InnoDB as the data store of memcached (innodb_only), or to use the traditional memcached engine as the backstore (cache-only), or both (caching). In the last case, if memcached cannot find a key in memory, it searches for the value in an InnoDB table.
here is the link: innodb-memcached-internals
This quote means that, depending on what you decide for a specific key-value mapping, you will have different application scenarios:
innodb_only -> means you can query the data via the SQL interface or via a memcached interface; here is a link to some memcached language interface examples: memcached-interfaces
cache-only -> means you should query the data via the memcached interface only
caching -> means you can use both interfaces (note that the storage mechanism changes slightly)
Of course, this configuration decision is strictly tied to your specific needs.
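To make that concrete, here is roughly what inspecting and changing a cache policy looks like. This is only a sketch based on the manual: the innodb_memcache schema and the column names are as documented for MySQL 5.6, so verify them against your version.

    -- The plugin's configuration lives in the innodb_memcache schema.
    USE innodb_memcache;
    SELECT policy_name, get_policy, set_policy, delete_policy
    FROM cache_policies;

    -- Route all get/set/delete operations to InnoDB-backed storage
    -- (the manual's demo setup has a single policy row):
    UPDATE cache_policies
       SET get_policy    = 'innodb_only',
           set_policy    = 'innodb_only',
           delete_policy = 'innodb_only';
    -- Restart the daemon_memcached plugin for the change to take effect.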
I don't really have a complete answer for you, I'm afraid, as I too am struggling to find the detail I require before toying around with it.
That said, there is one important point I have managed to uncover that you seem to have missed: by accessing the InnoDB storage engine via the new plugin, you are completely bypassing SQL and avoiding all the overhead that comes with it.
This essentially makes it a key/value store, more akin to most NoSQL databases, complete with all the drawbacks associated with them: no joins, etc.
On the flip side, however, for many applications these days this is exactly what we want. I have come across only a handful of real-world performance mentions, but all seem to point to this implementation significantly outperforming MongoDB and other similar NoSQL solutions (how much truth there is in that, I do not know), with one (relatively in-depth) comparison claiming as high as 700k qps on a commodity server (compared with around 100k on a well-tuned MySQL setup), which is incredible if true.
Resource here:
http://yoshinorimatsunobu.blogspot.co.uk/search/label/handlersocket
Anyway, sorry I can't be any more help, but it's food for thought at least!
I'm not sure if this fits Stack Overflow exactly; however, as I'm seeking code rather than a tool, I think it does.
I'm looking for a way to replicate/synchronize different database systems -- in this case, MySQL and MongoDB. We run both for different purposes. We started with a MySQL database and later added MongoDB for special applications. There is data we would like to have in both databases, with constraints in MySQL and DBRefs in MongoDB respectively. For example: we need a user record in MySQL, but also in MongoDB, for references between tables and objects respectively. At the moment we have a cron job which dumps the MySQL data and imports it into MongoDB. Though it works quite well, that's not the solution we would like to have.
I think for the moment a one-way replication would be enough -- MySQL -> MongoDB. The important part is that the replication works in "realtime", much like MySQL master -> slave replication works.
Are there already any solutions for this problem, or does anyone have ideas on how to achieve this?
Thanks!
SymmetricDS is open source, Java-based, web-enabled, database independent, data synchronization/replication software that might do the trick with a few tweaks. It has an extension point called IDataLoaderFilter which you could use to implement a MongodbDataLoader.
This would help with one-way database replication. It might be a little more difficult to synchronize from MongoDB -> relational database, but the SymmetricDS team would be very helpful in trying to find the solution.
What you're looking for is called EAI (Enterprise Application Integration). There are a lot of commercial tools around, but under the provided link you'll also find a couple of OSS solutions. The basis of EAI is that you have data sources and data sinks; the EAI framework offers tools to build custom pumps between the two.
I suggest either using a DB trigger to start the synchronization, or sending a trigger signal from your applications. Note that there is no key-hole solution, since synchronization can become arbitrarily complex (for example: how do you make sure that all rows are copied?).
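As a sketch of the trigger approach (all table and column names here are hypothetical): a trigger writes each change into a queue table, and a small external daemon polls that queue and replays the rows into MongoDB.

    -- Outbox table polled by an external sync daemon.
    CREATE TABLE user_sync_queue (
        id        BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id   INT NOT NULL,
        action    ENUM('insert', 'update', 'delete') NOT NULL,
        queued_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    -- One trigger per operation; the UPDATE case is shown here.
    CREATE TRIGGER users_after_update
    AFTER UPDATE ON users
    FOR EACH ROW
        INSERT INTO user_sync_queue (user_id, action)
        VALUES (NEW.id, 'update');

The daemon deletes queue rows once they have been applied to MongoDB; failure handling (what if MongoDB is down?) is where the arbitrary complexity mentioned above comes in.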
As far as I can see, you need to develop some sort of "control program" that has drivers for each DBMS, and run it as a daemon. The daemon should be trigger-driven, or use a very small recheck interval, to keep the DBs synchronized.
Technically, you could set up a process which parses the binary log of the MySQL server and replicates the relevant SQL queries. I've never done such a thing with a different database as a slave, but maybe it is worth a shot?
Every single book that teaches programming (or almost anything else) starts off with a whole bunch of spiel on why what it's about (C++, MySQL, waterskiing, skydiving, dentistry, whatever) is the greatest thing in the world. So I open the MySQL O'Reilly book, and read the intro, and get the traditional sermon. The main points that the book mentioned were:
MySQL has been shown to have tied Oracle as the fastest and most scalable database software.
It's free and open source.
Sounds pretty convincing, but I know there are always at least two sides to every story. I knew I needed to be disillusioned when I saw someone suggest using Oracle instead of MySQL and thought, "Why in the world would you want to do that?!", just because of the few paragraphs I'd read, with no other justification. So let's investigate the other side of the story:
What are some reasons NOT to use MySQL?
Here's just a random list of stuff that popped into my head. It's CW, so feel free to add to it as necessary.
Oracle provides a top notch ERP built on their database. If your company is subject to Sarbanes-Oxley regulations, this is quite a bit above "crucial."
SQL Server licenses come with Analysis Services, Integration Services, and Reporting Services. If you want to do anything with OLAP, ETL, or reporting, these three are great applications that are built on the SQL Server stack.
SQL Server has native .NET data types (in 2008). Absolutely brilliant for .NET shops dealing with geospatial datasets.
MySQL does not support check constraints (see the sketch after this list).
SQL Server includes the OVER clause, which helps when dealing with the "top n rows in each group" problem: essentially, you can apply aggregate functions partitioned over the dataset any way you'd like (also sketched below).
SQL Server uses Kerberos and Windows authentication natively. MySQL does not tie into Active Directory.
Superior performance on subqueries (almost any database has subquery performance that is superior to MySQL's)
Oracle, SQL Server, PostgreSQL and others have a richer set of join algorithms available to them; this means joins can often be performed faster, especially when large tables are involved.
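To illustrate two of the points above (the table names are made up; MySQL behavior as of the 5.x releases): MySQL parses a CHECK clause and then silently ignores it, while SQL Server's OVER clause answers top-n-per-group directly.

    -- MySQL 5.x: the CHECK constraint is parsed, then ignored.
    CREATE TABLE accounts (balance INT, CHECK (balance >= 0));
    INSERT INTO accounts VALUES (-100);   -- succeeds, no error raised

    -- SQL Server: top 2 orders per customer with a window function.
    SELECT customer_id, order_id, amount
    FROM (
        SELECT customer_id, order_id, amount,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY amount DESC) AS rn
        FROM orders
    ) AS ranked
    WHERE rn <= 2;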
MySQL has been shown to have tied Oracle as the fastest and most scalable database software.
Making that statement about any two database systems is probably enough to throw the book away without reading the rest. Database systems are not commodities that can be compared with a couple lines of information, and will not be for the foreseeable future.
One reason that the statement is obviously false is that MySQL has very limited plan choices available. For instance, MySQL can't use merge join or hash join -- two fundamental algorithms that have useful performance characteristics. That's pretty much the end of the story for many query workloads. It is trivial to show a reasonable query that is orders of magnitude faster with a merge join.
There are plenty of other criticisms of MySQL versus XYZ and vice-versa. My point is that this is a complex issue, and the book is drastically oversimplifying. If you're getting involved in databases at all, you need to spend time diversifying your knowledge and understanding fundamentals.
My personal opinion is that MySQL and SQLite are the worst places to start. Pick something like Oracle (which can be downloaded free of charge for learning/evaluation, which many don't realize), PostgreSQL (BSD license), or MS SQL. FirebirdSQL might be good, too. Once you familiarize yourself with a few systems, you'll be able to make an informed choice about whether the trade-offs MySQL makes are right for you.
Everyone seems to be missing one of the main reasons to stick with Oracle/MS. You've already got a stable full of DBAs that know those products inside and out.
The default collation in MySQL is case-insensitive. This is not a problem per se, but I think this strange default is an indication that it was targeted at hobby developers rather than professionals. That's a big assumption, but I'd think any professional would expect a database to compare strings for identity by default (i.e. using a binary collation).
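A quick demonstration (the default collation in the 5.x era is latin1_swedish_ci):

    SELECT 'abc' = 'ABC';            -- 1: equal under the case-insensitive collation
    SELECT 'abc' = BINARY 'ABC';     -- 0: a binary comparison sees the difference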
Manipulation of tables during transactions causes implicit COMMITs. While this might not look grave at first glance, you will notice that you cannot work under ACID conditions if altering/creating tables is an inherent part of your application.
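For example (a sketch; the table names are hypothetical):

    START TRANSACTION;
    INSERT INTO orders (customer_id) VALUES (42);
    CREATE TABLE scratch (id INT);   -- DDL: triggers an implicit COMMIT here
    ROLLBACK;                        -- too late, the INSERT is already committed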
MySQL can certainly match or beat Oracle in speed. I've done it numerous times myself. Ok, so I had to use various table types like BLACKHOLE, MERGE, InnoDB, and MyISAM in just the right places. And it took me a few days to get everything working just right. The Oracle DBA got things working in an hour or two.
MySQL is fine for 98% of the sites out there, maybe more. But it is fairly easy to bring it to a crawl without a lot of data if you don't know what you are doing. Oracle is quite a bit harder to bring to a crawl, but it can still be done. I've worked with both with datasets in the hundreds of millions of records (tiny by some measures). MySQL takes quite a bit more attention.
No database can scale indefinitely, which is why nosql "databases" are becoming so popular. I think the real question is if MySQL is "good enough" for what you need to do. The price is certainly right. The same could be said about PHP.
Why does Facebook use MySQL? Could you imagine what it would cost them to buy enough Oracle licenses!? It's good enough.
The future of Sun (the company behind MySQL) is unclear, and you don't know whether there will be a company left to back the product.
MySQL is very tolerant of ambiguities -- something you don't want in a database system. Here are a few examples off the top of my head:
As another poster stated, CHAR and VARCHAR columns are case-insensitive, already a pretty bad sign.
You can INSERT into a table that has a column without a default value that is also NOT NULL. Yes, really! Instead of throwing an error, MySQL will pick a value for you based on the data type, e.g. 0 for numbers.
You can use a GROUP BY while some columns neither use an aggregate function nor appear in the GROUP BY clause. The outcome is pretty much arbitrary, with no warnings or errors either, in my experience (both of these behaviors are sketched below).
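Both behaviors are easy to reproduce under the default (non-strict) sql_mode of those versions; the tables here are made up:

    CREATE TABLE t (id INT NOT NULL, name VARCHAR(10) NOT NULL);
    INSERT INTO t (id) VALUES (1);   -- no error: name silently becomes ''

    -- order_date is neither aggregated nor grouped; MySQL returns the
    -- value from an arbitrary row of each group, with no warning.
    SELECT customer_id, order_date, COUNT(*)
    FROM orders
    GROUP BY customer_id;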
MySQL is also far from rock-solid. Just this month, I discovered a bug in the (admittedly old, but a "stable release") version of MySQL used by DreamHost that results in data loss. (Certain conditions when creating a table with variable-length rows.)
I've been using MySQL for many years and still do, but would never dream of using it for anything serious, where data loss would be a big problem. It's great for non-mission-critical web sites and blogs though.
I knew I needed to be disillusioned when I saw someone suggest to someone to use Oracle instead of MySQL and thought, "Why in the world would you want to do that?!"
Because your company has been using Oracle for the past ten years, or because you equate enterprise usage with "must be good" and open source with "free crap". That's just about the only reason. Everyone I know who has worked with Oracle loathes it. Everyone I know who has worked with MySQL, if they don't love it, at least considers it a better alternative to Oracle in almost every regard.
SQL RDBMSs are so complex, though, that in almost every respect there's something one DB does that another doesn't. It is also, unfortunately, a fact of comparing databases that people quote statistics without using properly configured servers. If you have two default configurations for a server, one might be better than the other, but that's about as far as the comparisons usually go. They don't reflect the fact that these gigantic applications have a million little switches and toggles you can use to speed certain things up and increase reliability, which generally makes naive benchmarks bad science.
MySQL tends to be a very general-purpose database system; you can use it for almost anything that you'd use Oracle, SQL Server, PostgreSQL, DB2, etc. for.
However, these systems have different strengths. PostgreSQL has a ton more functionality than MySQL and can handle some very specific tasks that MySQL struggles with. SQL Server usually integrates with Microsoft products very easily, whereas with MySQL you'd have to do some extra work to make them play together. Oracle is MASSIVE; they're not just databases, and when you're dealing with large, expansive systems Oracle probably has the gear to cover everything under one roof, whereas you'd need to tie a bunch of disparate systems together to have MySQL as your database system.
Whether or not to use MySQL should be based upon whether or not it is reasonable to use MySQL.
Disclaimer: I have been using MySQL since 2001 and still love it, but here are a few reasons that make me doubt my fidelity...
There are some false arguments in some of the answers I read (they were true a few years ago). Before making a choice, check the MySQL documentation and its up-to-date list of features. You could be surprised.
Every DB server lacks some functionality. This is not a real blocking issue if you do not specifically need it.
For me, the main issues are elsewhere:
The time needed to get a bug fixed and published in a stable release. It is a shame. (For some bugs it takes years, no kidding!)
The frequency of stable releases.
But since this year, the new issues are:
The increasing number of branches (Percona, Google, Facebook, etc.).
Sun is unclear about its strategy.
Many MySQL employees left the company.
It's free and open source.
True. But keep in mind that MySQL is, in many cases, not free for commercial use. MySQL and the connectors (the official drivers for various languages), are GPL licensed.
If you use, say, Connector/.NET to connect to MySQL, your code has to be GPL compatible. It's dual licensed, though, so you can buy an enterprise version under another license; and I believe they have a (either free or very cheap) program that lets you license the connectors under a different license.
Everyone I know using MySQL is unaware of this :-)
Basically, there are several choices for a database. Frankly, in today's world, DB choice is less important than it was a few years ago. Here are a few issues to consider.
Most of the current database systems in widespread use, such as SQL Server (and SQL Server Express), Oracle, MySQL, SQLite, etc., are relatively standards compliant and can be used somewhat interchangeably. Some serve different niche markets. For example, SQL Server, MySQL, and Oracle are all good choices for large enterprise applications. SQLite is very good for applications which deploy on a client and need a local database with a small footprint and minimal configuration. (In my opinion, Oracle is extremely overpriced and is backed by an arrogant, unresponsive company. It would never be my first choice on any project. I would only use it if it was mandated by the client or by necessity.)
A high percentage of top-end developers use tools such as Hibernate (Java) / NHibernate (.NET) to build their data access layers. Hibernate variants strongly encourage developers to start with the object model rather than the database model. The Hibernate application then generates the data model automatically, and even handles data model updates. Hibernate variants can be used with any of the major database vendors, and changing your database choice can be as simple and painless as selecting a different database type in your configuration. On a side note, while Hibernate and NHibernate are cross-database-compatible, they do not work at the lowest common denominator: the data access code in these applications is often designed to take advantage of special features of a given database engine. For example, NHibernate supports access to the NVARCHAR(MAX) data type in SQL Server, which allows for very long strings.
In most applications, issues with database performance do not derive directly from the speed of reads and writes. Most of the issues relate to how the application manages the caching of frequently accessed data. For example, on a blog site it makes sense to cache blog posts once they have been read, so they are not repeatedly fetched from the database. This caching mechanism is almost always handled primarily by the application code rather than the database server, though database servers do provide some caching. Hibernate/NHibernate have excellent caching support built in, as do Microsoft's ASP.NET and the new MVC framework built on top of it.
Enterprise databases (SQL Server, Oracle, MySQL) are best for situations where functionality such as replication, clustering, huge datasets, etc. is required.
I don't like the MySQL licence: Firebird and PostgreSQL are better.
There is no real hot backup included in MySQL as shipped by Sun.
You can also look here, which is an interesting link with comments!
MySQL is free, but it takes an expert to maintain. Someone who naturally uses the command prompt and is not afraid to experiment. In some cases, MySQL problems are too complex, and the right people to troubleshoot them may not be available for any amount of money.
SQL Server is priced in the middle range. It can be maintained by "normal people", the kind who go home every day at 17:00 and have a natural disinclination toward fifty-page HOW-TOs. SQL Server performs well in most instances but can break down in specific scenarios.
Oracle is the most expensive and requires highly paid operators. If you have the money, Oracle is a "safe" choice, because there's nothing Oracle won't do for money.
Three products, three markets!
A couple of pages listing gotchas (such as this and this) make me want to stay as far away from MySQL as possible. Here's a more neutral comparison of Postgres and MySQL.
As for the open source aspect others mentioned: MySQL is open source and free, only if your application is, too. If it's not, you need a commercial license.
My personal story:
Adding a new index to a table of about 10k rows:
MySQL: about 30 seconds.
Postgres: about 1 second.
I've worked with MySQL for years, and SQL Server only over the past year. I don't really see one being any easier or harder to use than the other in most cases. I do wish, however, that MSSQL had some of the features that MySQL possesses, e.g. being able to insert multiple rows in a single INSERT statement (example below).
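That is, the following works in MySQL (the column and table names are invented for the example; for what it's worth, SQL Server did gain this syntax in 2008):

    INSERT INTO points (x, y)
    VALUES (1, 2),
           (3, 4),
           (5, 6);   -- three rows, one statement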
Also, if you don't have to use an RDBMS, check out Redis. It is basically memcached with persistence, via asynchronous write-through. The performance is not on the same scale as MySQL's.
Well... I guess the comparison isn't really fair to MySQL, since Redis isn't an RDBMS...
I'm building a web app. This app will use MySQL to store all the information associated with each user. However, it will also use MySQL to store sys admin type stuff like error logs, event logs, various temporary tokens, etc. This second set of information will probably be larger than the first set, and it's not as important. If I lost all my error logs, the site would go on without a hiccup.
I am torn on whether to have multiple databases for these different types of information, or to stuff it all into a single database in multiple tables.
The reason to keep it all in one is that I only have to open one connection. I've noticed a measurable time penalty for opening connections, particularly with remote MySQL servers.
What do you guys do?
First, I must say I think storing all your event logs and error logs in the DB is a very bad idea; you may want to store them on the filesystem instead.
You will only need error logs or event logs if something in your web app goes unexpectedly wrong. Then you download the file and examine it, that's all. There's no need to store it in the DB; it will slow down your DB and your web app.
As an answer to your question: if you really want to do that, you should separate them, and you should find a way to keep your page running even when your event log and error log databases are loaded and responding slowly.
Going with two distinct databases (one for your application's "core" data, and another one for "technical" data) might not be a bad idea, at least if you expect your application to have a lot of users:
it'll allow you to put one DB on one server, and the other DB on a second server
and you can think about scaling a bit more later: more servers for the "core" data, and still only one for the "technical" data -- or the opposite
if the "technical" data is not as important, you can (more easily) have two distinct backup processes / policies
having two distinct databases, and two distinct servers, also means you can have heavy calculations on the technical data, without impacting the DB server that hosts the "core" data -- and those calculations can be useful, on logs, or stuff like that.
as a sidenote: if you don't need that kind of "reporting" calculation, maybe storing that data in a DB is not useful, and files would do perfectly?
Maybe opening two connections means a bit more time -- but that difference is probably rather negligible, is it not ?
I've worked a couple of times on applications that used two databases:
One "master" / "write" database, that would be used only for writes
and one "slave" database (a replication of the first one, to several slave servers), that would be used for reads
This way, yes, we sometimes open two connections -- but one server alone would not have been able to handle the load...
Use connection pooling anyway, so the time to get a connection is not a problem. But if you have two connections, transaction handling becomes more complicated. On the other hand, sometimes it's handy to have two connections: if something goes wrong in the business transaction, you can roll it back and still log the failure through the admin connection. But I would still stick to one database.
I would use only one database, mostly for the reason you supply: you only need one connection to reach both the logging and the user data.
Depending on your programming language, some frameworks (J2EE, for example) provide connection pooling; with two databases you would need two pools. In PHP, on the other hand, the setup cost of a connection (or two) is paid on every request, which puts the performance question into perspective.
I see no reason for two databases. It'd be perfectly acceptable to have tables devoted to "technical" and "business" data, but the logical separation should be sufficient.
Physical separation doesn't seem necessary to me, unless you mean an application and data warehouse star schema. In that case, it's either real-time updates or, more typically, a nightly batch ETL.
It makes no difference to MySQL in any way whether you use separate "databases"; they are simply catalogues.
It may make setting permissions easier, which is a legitimate reason to do it. Other than that, it is exactly the same as keeping the tables in the same DB (except that you can have several tables with the same name... but please don't).
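To make that concrete: tables in two separate MySQL databases on the same server can be queried together simply by qualifying their names (the database and table names below are made up):

    SELECT u.name, l.message
    FROM app_core.users AS u
    JOIN app_logs.error_log AS l ON l.user_id = u.id;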
Putting them on separate servers might be a good idea however, as you probably don't want your core critical (user info, for example) data mixed in with your high-volume, unimportant data. This is particularly true for old audit data, debug logs etc.
Also short-lived data, such as search results, sessions etc, could be placed on a different server - it presumably has no high availability[1] requirement.
Having said that, if you don't need to do this, dump it all on one server, where it's easier to manage (backup, providing high availability, managing security, etc.).
It is not generally possible to take a consistent snapshot of data on >1 server. This is a good reason to only have one (or one that you care about for backup purposes)
[1] Of the data, not the database.
In MySQL, InnoDB has an option of storing all tables of a certain database in one file, or having one file per table.
Having one file per table is somewhat recommended anyway, and if you do that, it makes little difference at the storage level whether you have one database or several.
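For reference, the relevant server option (a sketch; in older versions it may only be settable in my.cnf at startup rather than at runtime):

    SET GLOBAL innodb_file_per_table = ON;   -- new tables get their own .ibd file
    SHOW VARIABLES LIKE 'innodb_file_per_table';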
With connection pooling, one database or several is probably not going to matter either.
So, in my opinion, the question is if you'd ever consider separating the "other half" of the database to a separate server - with the separate server having perhaps a very different hardware configuration, such as no RAID. If so, consider using separate databases. If not, use a single database.
Ok, maybe I'm missing something here, but I'm looking at various PHP hosting options and I see things like "10 MySQL databases", or 25, or even unlimited.
Now I've worked on sites with an Oracle backend that have 10,000+ concurrent users and we've had... one database.
The idea of a database is, of course, that you can store whatever you want in it. So why does the number matter for MySQL? Is there some table, row, or overall database limit I'm not aware of (entirely possible)? Or is it a question of concurrent connections? Or some other performance issue (e.g. sharding)? The sharding aspect seems unlikely, because even basic hosting options (i.e. under $5/month) offer 10 databases.
If someone could clue me in on this one, it'd be great.
It's mostly a marketing tactic, although there are some technical and historical considerations.
First, apologies if this is obvious, but SCHEMAs are to Oracle as DATABASEs are to MySQL (in oversimplified terms, a logical collection of tables).
The host is saying you can have XX configured logical databases on a server. Lots of web applications need a database to run. Modern web applications like WordPress, Movable Type, Joomla, etc. will let you name your tables with a custom prefix, but if an application doesn't have this configuration feature, you need one database per install. Similarly, if two applications use the same table name, they can't coexist in a single database. Lots of early web applications started out like this, so early on, the number of databases was an important feature to consider.
There's also access and security. While MySQL (and other databases) can be configured to give users fine grained access-control down to the table and column level, it's often easier to create one user who has full permission on a logical Database. This is important to people who sell services but pass off the actual hosting of completed sites/applications to the shared web-host.
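In practice that coarse-grained permission is a one-liner, which is part of why hosts and resellers like the one-database-per-client model (the user and database names below are invented):

    CREATE USER 'client1'@'localhost' IDENTIFIED BY 'secret';
    GRANT ALL PRIVILEGES ON client1_db.* TO 'client1'@'localhost';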
Some people like one database per app
It's marketing, not technical. They want something to advertise. "10" sounds like a good number.
For development purposes, sometimes it's good to make a copy of your entire database to test new software against. Beats renaming all the tables in your code (although apps like Wordpress let you specify a prefix for all your table names in case you don't have the luxury of multiple DBs).
When I used shared hosting, I set up a separate database for each site/client for custom apps, and if you use Fantastico to install applications it will use a database for each one by default.
I believe the limits are there to prompt you to upgrade to the next tier of service when you outgrow the current level.
Nick is partially correct, but it also has to do with people who try to host multiple sites on one shared account, using a different database for each and a script to serve the correct content, with a little DNS masquerading.
Additionally, it's possibly a marketing decision.
If you're only setting up databases for yourself, the low count is fine, but commercial users, who may want to host multiple sites for multiple clients on the one service while trying to cut corners, are likely to need one database (or more) per client/project.
So putting a limit on the number of databases somewhat controls the variety of services you can offer, and potentially limits the "resale" value, i.e. it stops you buying one plan and then subleasing it to somebody else.
This is mainly for when you are hosting multiple sites on the same box. For me, I buy/sell a lot of websites, so I need to keep each website as detached from the others as possible.