We are developing a payment system for venues.
Now we're considering what DBMS to use for our application.
At the moment we're using Microsoft SQL Server Express to store our data.
But because the system is going to be used at very busy venues, we think we need a failover setup in case the database server goes down.
We have been looking at MS SQL Server replication, but that is going to cost a lot of money for a case that will (hopefully) never occur.
The database only has to be up for a couple of hours (the duration of the event), up to a couple of days at most.
But if the database is down for 30 minutes, no one can order drinks or get access to the venue, and with thousands of people in the venue, that's going to cause a lot of trouble.
So... can anyone share some expertise and/or point me to some reading about failover, replication or anything related?
Thanks in advance
I would suggest that you forget about the DBMS until you have a clear business plan, systems architecture and defined goals for availability. Ideally, find someone who has implemented a system like this before and hire them or use them as a consultant.
You also have to look at how much money you will lose if the system goes down: actual sales, loss of reputation and future business, contractual penalty clauses, etc. Compared to those costs, adding a second server might start looking fairly cheap. But where will the servers be hosted, how does connectivity work, who handles operations, etc.? Having a fully redundant failover database cluster will not help you if your one and only Wi-Fi base station for the POS terminals suddenly dies.
Perhaps you have already considered and answered all these questions, and if so it would be helpful to add some more details to your question about the main constraints and requirements you have.
Related
We're in the process of adding a bank-like sub-system to our own shop.
We already have customers, so each will be given a sort of account, and transactions of some kind will be possible (adding to the account or subtracting from it).
So we need at least an account entity and a transaction entity, and operations will then have to recalculate overall balances.
How would you structure your database to handle this?
Is there any standard schema that banking systems have to use which I could model mine on?
By the way, we're on MySQL, but we will also look at NoSQL solutions for a performance boost.
I don't imagine you would need NoSQL for any speed boost: this workload is unlikely to need much (if any) parallelism, and I'm not sure how schema-free you need to be. The exception is when you get into complex business requirements for analysis across many millions of customers and hundreds of millions of transactions, such as profitability, and even then that's really a data-warehousing-style problem, which you probably wouldn't run on your transactional schema in the first place once it had grown that large.
In relational designs, I tend to avoid anything that requires balance recalculation, because then you end up with balance-repair programs and the like. With proper indexing and a simple enough design, you can do a simple SUM over the transactions (positive and negative) to get a balance. With a good, consistent sign convention on the transactions (no ambiguity about whether to add or subtract: always add the values) and appropriate constraints (with a limited number of transaction types, you can specify via constraints that all deposits are positive and all withdrawals negative), you can let the database ensure there are no anomalies such as negative deposits.
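A minimal sketch of that design (table and column names are illustrative; it assumes a DBMS that actually enforces CHECK constraints, such as PostgreSQL; MySQL of this era parses but ignores them):

```sql
-- Accounts plus signed transactions: the balance is always derivable.
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    opened_at   TIMESTAMP NOT NULL
);

CREATE TABLE account_txn (
    txn_id     INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES account (account_id),
    txn_type   VARCHAR(10) NOT NULL,
    amount     NUMERIC(12,2) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    -- One sign convention: deposits positive, withdrawals negative,
    -- so a balance is always a plain SUM, never add-or-subtract logic.
    CONSTRAINT txn_sign CHECK (
        (txn_type = 'DEPOSIT'  AND amount > 0) OR
        (txn_type = 'WITHDRAW' AND amount < 0)
    )
);

CREATE INDEX txn_by_account ON account_txn (account_id, created_at);

-- Current balance of one account:
SELECT SUM(amount) AS balance FROM account_txn WHERE account_id = 42;

-- Point-in-time balance: just ignore transactions after a given moment.
SELECT SUM(amount) AS balance
FROM account_txn
WHERE account_id = 42 AND created_at <= '2012-06-30 23:59:59';
```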
Even if you want to cache the balance in some way, you can still rely on that simple mechanism, augmented with a trigger on the transaction table that updates an account summary table.
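A sketch of such a trigger, in PostgreSQL syntax and reusing the hypothetical tables above:

```sql
-- Cached balance maintained by a trigger on the transaction table.
CREATE TABLE account_summary (
    account_id INTEGER PRIMARY KEY REFERENCES account (account_id),
    balance    NUMERIC(12,2) NOT NULL DEFAULT 0
);

CREATE FUNCTION apply_txn() RETURNS trigger AS $$
BEGIN
    -- Signed amounts mean the cache update is a plain addition.
    UPDATE account_summary
       SET balance = balance + NEW.amount
     WHERE account_id = NEW.account_id;
    IF NOT FOUND THEN
        INSERT INTO account_summary (account_id, balance)
        VALUES (NEW.account_id, NEW.amount);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER txn_after_insert
AFTER INSERT ON account_txn
FOR EACH ROW EXECUTE PROCEDURE apply_txn();
```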
I'm not a big fan of putting any of this in a middle layer outside the database. Your basic accounting should be simple enough that it can be handled within the database engine at speed, so that anyone, or any part of the application, executing a query gets the same answer without any client-side logic getting involved. The database can then ensure integrity at a level slightly above referential integrity (accounts with a non-zero balance might not be allowed to be closed, balances might not be allowed to go negative, etc.) using a combination of constraints, triggers and stored procedures, in increasing order of complexity as required. I'm not talking about all your business logic, just prohibiting low-level situations you feel the database should never get into due to bad client programming, a failure to do things in the right order, or calling things with the wrong parameters.
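For example, one of those low-level guard rails, again as a PostgreSQL-flavoured sketch on the same hypothetical schema: the engine itself refuses to close an account whose transactions do not sum to zero.

```sql
-- Illustrative only: forbid closing an account with a non-zero balance.
ALTER TABLE account ADD COLUMN closed_at TIMESTAMP;

CREATE FUNCTION forbid_closing_nonzero() RETURNS trigger AS $$
BEGIN
    IF NEW.closed_at IS NOT NULL AND
       COALESCE((SELECT SUM(amount) FROM account_txn
                 WHERE account_id = NEW.account_id), 0) <> 0 THEN
        RAISE EXCEPTION 'account % has a non-zero balance', NEW.account_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER account_close_check
BEFORE UPDATE ON account
FOR EACH ROW EXECUTE PROCEDURE forbid_closing_nonzero();
```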
In real banking (i.e. COBOL apps), the database schema is typically non-relational and non-normalized (a lot of these systems predate SQL), and you see things like 12 monthly buckets of past balances that get updated and shifted when the account rolls over. Some of the databases these systems use are hierarchical, and that is where the code becomes really important, because everything gets done in code. Again, it's old-fashioned and subject to all kinds of problems (probably a lot like what NatWest is going through), and NoSQL is in some ways a trend back towards this code-is-king way of looking at things. After a long time working with these systems, I just don't like balances being cached, and I don't like systems without point-in-time accountability, i.e. where you can't ignore all transactions after a certain date and see EXACTLY what things looked like at a given date/time.
I'm sure someone has "standard" patterns for bank-like database design, but I'm not aware of them despite having built several accounting-like systems over the years. Accounts and transactions are just not that complex, and once you get beyond that concept, everything becomes highly customized.
For instance, in some cases you might recognize earnings on contracts on some kind of schedule according to GAAP, with contracts that are paid over time. In banking you have lots of interest-related concerns, with different rates for cost of funds and so on. Everything becomes unique once you start mixing business needs in with the basic accounting of money in and money out.
You don't say whether or not you have a middle tier in your app, between the UI and the database. If you do, you have a choice as to where you'll mark transactions and recalculate balances. If this database is wholly owned by the one application, you can move the calculations to the middle tier and just use the database for persistence.
Spring is a framework that offers a nice annotation-based way to declare transactions. It's based on POJOs and is an alternative to EJBs: a three-legged stool of dependency injection, aspect-oriented programming, and great libraries. Perhaps it can help you with both structuring and implementing your app.
If you do have a middle tier, and it's written in an object-oriented language, I'd recommend having a look at Martin Fowler's "Analysis Patterns". It's been around for a long time, but the chapter on financial systems is as good today as it was when it was first written.
EDIT: I've learnt that YouTube uses MySQL, and it's probably true, but it is probably the enterprise edition rather than the free one. The only alternative seems to be PostgreSQL. Long question short: can PostgreSQL be used instead of MySQL? Is it a very good alternative in every case?
Firstly, I noticed that these are the most common names when it comes to (relational) database management systems: DB2 (IBM), Oracle Database, Microsoft SQL Server, Ingres, MySQL, PostgreSQL and Firebird. So, should I presume these are the best?
Okay, of the above, DB2 (IBM), Oracle Database and Microsoft SQL Server, the so-called enterprise DBMSs, come with a bill, while MySQL (excluding the enterprise version), PostgreSQL and Firebird are open source and free.
As should be clear from my previous questions here, I plan to build a photo-sharing site (something like Flickr, Picasa), and like any other, it's going to be database-heavy and (hopefully) busy.
Here's what I would love to know: (1) do any of the free DBMSs stand up to the paid enterprise DBMSs? (2) Can any of the free DBMSs scale and perform well for enormous, busy databases without too much headbanging and facepalming?
Things on my mind w.r.t. the DB:
Mature
Fast
Perform great/fine under heavy load
Perform great/fine as database grows
Scalable (smooth transition)
Support for my languages (preferably Python, PHP, JS, C++)
Feature-rich
Etc. (whatever I am missing)
PLEASE NOTE: I know Facebook, Twitter etc. use (or at least used) MySQL, and I see reports from time to time about how their sysadmins cry over that decision. So please don't say "XXX uses it, so why can't you?". They started small; so am I. They made mistakes; I don't want to. I want to keep the scaling transition smooth. I hope I am not asking too much. Thanks.
"Which is the best database" is a huge question and is the subject of much contention. I've noticed on StackOverflow there is a tendency to close such questions; although the question is interesting, it is also quite unresolvable ;-)
FWIW, I would go with this:
Use what you know
If it doesn't conflict too heavily with the first rule, use something that is free of charge
Use what works with other parts of your stack
Use what you can hire for at reasonable cost (so, maybe not Oracle unless you really have to)
Don't optimise too early. A website that works slowly is much better than an efficient one that's unfinished.
Also, scalability is not really about your DB platform; it's about how you design your site. Note also that some platforms scale better when you add more servers (MySQL), while others do better when you increase the resources of a single server (PostgreSQL).
Please note that, as of today, MySQL is not as free a project as PostgreSQL, which is one of the main reasons I had to switch over to PG. (Thanks to Npgsql and pgAdmin III, that was a lot easier than rumoured.)
However, MySQL does have a number of advantages in terms of applications, add-ons and forums, and it looks good on a resume.
PostgreSQL is a much more mature DBMS. It is an object-relational DBMS that has been around for more than 15 years and is not known to have defaulted on any major issue. It is well known to handle transactions over millions of rows successfully. Most important is its high rate of compliance with the SQL standards. In fact, in professional circles it is more the Oracle of free RDBMSs than the MySQL of popular applications.
Ok guys.
I've begun developing a little spare-time project that might become big someday. Before I really get started, I want to be certain that I'm starting with the right setup. So I come to you.
I'm making a service which will work mostly as a to-do list / project planner.
In this system there will be a number of users and a number of tasks. Each task can be assigned to multiple users, and each user can have multiple tasks (a many-to-many relation).
Until now I was planning to use MySQL, but a friend of mine, who is part of the project, suggested MongoDB instead. He tells me that it would increase performance and be more scalable.
On the other hand, I'm thinking that in order to get either all tasks assigned to a specific user or all users assigned to a specific task, one would need joins, which MongoDB doesn't have (or has only in a cumbersome way, as far as I understand).
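For reference, the relational version of this is straightforward; a minimal sketch in MySQL syntax (table and column names are just illustrative):

```sql
-- Classic many-to-many: a join table between users and tasks.
CREATE TABLE users (
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    name    VARCHAR(100) NOT NULL
);

CREATE TABLE tasks (
    task_id INT PRIMARY KEY AUTO_INCREMENT,
    title   VARCHAR(200) NOT NULL
);

CREATE TABLE task_assignments (
    user_id INT NOT NULL,
    task_id INT NOT NULL,
    PRIMARY KEY (user_id, task_id),
    FOREIGN KEY (user_id) REFERENCES users (user_id),
    FOREIGN KEY (task_id) REFERENCES tasks (task_id)
);

-- All tasks assigned to a given user:
SELECT t.*
FROM tasks t
JOIN task_assignments a ON a.task_id = t.task_id
WHERE a.user_id = 42;

-- All users assigned to a given task:
SELECT u.*
FROM users u
JOIN task_assignments a ON a.user_id = u.user_id
WHERE a.task_id = 7;
```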
Now my question to you is: which DB system would you suggest, MySQL, MongoDB or a third option? And why?
Thank you for your time and your assistance.
Morten
We use MySQL at IGN to store person relationships (many-to-many, like your use case) and have about 5M records in the relationship table. We have 4 MySQL servers in a cluster, and reads are distributed across 3 MySQL slaves. By the way, you can always denormalize to optimize reads at the cost of penalizing writes, among other things, depending on how read- or write-heavy your system is.
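To illustrate that trade-off (table and column names here are hypothetical, not IGN's actual schema): a cached counter lets hot reads skip the relationship table entirely, while every write has to maintain it.

```sql
-- Hypothetical denormalization: cache a friend_count on person so common
-- reads avoid scanning the relationship table; every write pays for it.
ALTER TABLE person ADD COLUMN friend_count INT NOT NULL DEFAULT 0;

-- Creating a relationship now maintains both counters in one transaction:
START TRANSACTION;
INSERT INTO relationship (person_a, person_b) VALUES (1, 2);
UPDATE person SET friend_count = friend_count + 1 WHERE person_id IN (1, 2);
COMMIT;
```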
We use the DAO pattern with Spring, so it's fairly easy for us to swap DB providers through configuration (and by writing a Mongo or MySQL DAO implementation as applicable). We moved activities (as in social media) to Mongo almost a year ago, but the person relationships are living happily in MySQL.
The comment on your post by Jonas says it all:
If need be, you can always scale later.
This.
I am very much of the mindset that if you don't have scaling problems, you shouldn't worry too much (if at all) about scaling problems. Why not use whatever is easiest, smartest and cleanest to deliver the features clients pay for (in my case, at least)? This approach saves a lot of time and energy and is the proper one for 9 projects out of 10.
Learning a technology because it scales is great. Being tied to an unlearned, unknown technology in an upcoming project because it scales is not so great. There are many factors other than scalability to weigh when adopting 3rd-party stuff.
MySQL would seem to be a good choice, being more mature and having loads of client libraries, ORMs and other time-saving technologies. It can handle millions of rows (billions if you have the RAM). I have yet to encounter a project it could not handle, and I have seen some pretty impressive datasets!
Of course, when you do need more performance, you may find yourself ripping out the ORM and SQL-generating code to replace it with your own hand-tweaked queries, but that day is way down the line, and chances are it will never even come.
MongoDB, although it is really cool, may (I am sorry to say) bring you issues that have nothing to do with scaling.
My 2 cents, happy coding!
MySQL
Either would likely work for your purposes, but your database seems relatively rigid in its structure, something SQL deals with well. As such, I would recommend MySQL. A many-to-many relationship is relatively easy to implement and query there, as well.
You may take a tiny performance hit, but in my experience this is generally not noticeable in smaller-scale applications (i.e. databases with fewer than millions of entries). I do agree with @Jonas Elfström's comment, however: you should have an abstraction layer between your application and the database, so that if scaling becomes an issue you can address it without too many problems.
Stick with a relational database: it can handle many-to-many relationships, it is fully featured for backup and recovery and for high availability, and, importantly, you will find that every developer you need is familiar with it. There are plenty of documented methods for scaling a relational database.
Pick an open-source database, either MySQL or Postgres, depending on which your team is most familiar with and how it integrates into the rest of your infrastructure stack.
Make sure you design your data model correctly, most importantly the relationships between the entities.
Good luck!
Is MySQL capable of managing the data for a site that holds lots of data (say, with hundreds of millions of users)? Which database would be the most capable/beneficial?
Wikipedia is based on MySQL. I don't think it has 100M users, but it must be close by now.
No database will handle hundreds of millions of users unless you know how to set it up properly. No single server could handle that kind of traffic, so you need to know how to set up replication and load balancing. Once you reach a certain level there is no out-of-the-box solution, only tools you can use, and MySQL is a very capable tool.
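As a sketch of what "setting it up properly" involves, this is classic MySQL binlog-based master/slave replication (host names and credentials are placeholders; my.cnf on each box must set a unique server-id, plus log-bin on the master):

```sql
-- On the master: create a replication account.
CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On each slave: point at the master (file/position come from
-- SHOW MASTER STATUS on the master) and start replicating.
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;
START SLAVE;
```

Load balancing is then up to the application or a proxy: send writes to the master and spread SELECTs across the slaves.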
There are a couple of answers to this.
Yes, MySQL can store hundreds of millions of records. You need to know what you're doing, have a decent database schema and pretty robust hardware, but you're not pushing the limits.
When you talk about "hundreds of millions of users", you're talking about a site on the scale of Wikipedia/Facebook/Google/Amazon. You need a custom, highly cached, distributed architecture to run a site at that scale, and the traditional database-driven application architecture will almost certainly not be enough. You could still store your data in MySQL, but you'd need a whole bunch of additional components to make it all work, and without knowing more about the application nobody could tell you what those might be. At that scale none of the commonly used databases would suffice on its own, so MySQL is no better or worse than any of the other options...
Your question is really irrelevant, because creating a product or service that hundreds of millions of customers actually want is a much bigger and more difficult challenge than choosing a database engine.
If you're starting a business from nothing, pick a technical platform you already know and go with it: productivity and quick implementation will be more important than being scalable to a level you may never reach anyway.
If you do eventually become successful enough to have to deal with hundreds of millions of customers, then you'll certainly be able to raise the cash to buy whatever expertise and hardware you need.
IMPORTANT NOTE: I received many answers and I thank you all, but all the answers are more comments than answers. My question is about the number of roundtrips per RDBMS. An experienced person told me that MySQL makes fewer roundtrips than Firebird, and I would like the answers to stay in that area. I agree that this is not the first thing to consider, and that there are many others (application design, network settings, protocol settings...), but I'd still like to receive an answer to my question, not a comment. By the way, I found the comments all very useful. Thanks.
When latency is high ("when pinging the server takes time"), server roundtrips make the difference.
Now I don't want to focus on the roundtrips created in application code, but on those that occur "under the hood" in the DB engine + protocol + data access layer.
I have been told that Firebird makes more roundtrips than MySQL, but that is the only information I have.
I am currently supporting MS SQL but I'd like to change RDBMS, so to make a wise choice I would like to include this point in my "RDBMS comparison feature matrix" to understand which is the best RDBMS to choose as an alternative to MS SQL.
So the bold sentence above would make me prefer MySQL over Firebird (on the roundtrips criterion, not in general), but can anyone add more information?
And where does MS SQL sit? Can someone "rank" the roundtrip performance of the main RDBMSs, or at least of:
MS SQL, MySQL, PostgreSQL, Firebird (I am not interested in Oracle since it is not free, and if I have to change I will change to a free RDBMS).
Anyway, MySQL (as mentioned several times on Stack Overflow) has an unclear future and a not-100%-free license, so my final choice will probably fall on PostgreSQL or Firebird.
Additional info:
For instance, you could answer my question with a simple list like:
MS SQL: 3;
MySQL: 1;
Firebird: 2;
PostgreSQL: 2
(where 1 is good, 2 is average and 3 is bad). Of course, if you can post some links where the roundtrips of the various RDBMSs are compared, that would be great.
Update:
I use Delphi and I plan to use Devart DAC (UniDAC), so broadly the "same" data access components will be used; if there are significant roundtrip differences, they will be due to the RDBMS itself.
Further update:
I have a 2-tier application (inserting a middle tier is not an option), so by choosing an RDBMS that is optimized roundtrip-wise I have a chance to further improve the application's performance. This kind of "optimization" is like "buy a faster internet connection", "put more memory in the server" or "upgrade the server CPUs", but those "optimizations" matter too.
Why are you concentrating on roundtrips? Normally they shouldn't affect your performance unless you have a very slow and unreliable network. For example, the difference between ODBC and OLE DB drivers for any given database is nearly an order of magnitude, in favor of OLE DB.
If you access either MySQL or Firebird through ODBC instead of OLE DB/ADO.NET drivers, you incur an overhead several orders of magnitude greater than the roundtrips you might save.
How your application is coded, and how and when data is accessed and transferred, have a much greater impact in slow-connection or high-latency situations than the DB network protocol itself. Some database protocols can be tuned to work better in uncommon scenarios, e.g. by increasing or decreasing the data packet size.
You may also encounter slowdowns at the TCP/IP layer itself, which could require TCP/IP tuning as well.
Up to v2.1, Firebird certainly creates more traffic than MS SQL Server. I have a friend who developed an MS SQL client/server application here in Brazil with the DB hosted in a datacenter. The client apps run in many stores, connecting directly to the server over VPN/Internet on end-user broadband connections (1 Mbps, mostly); it has run for 5+ years with no trouble. The distances involved range from a few hundred to thousands of kilometers from the datacenter.
Since v2.1, I can't say whether this remains true, because I haven't made a fair comparison since then, and Firebird's remote protocol has been changed to optimize network traffic on slow connections. More on the FirebirdSQL site.
I can't speak for Postgres or MySQL, since I haven't used either.
I can't give roundtrip details, but I was in a very similar situation a while back when I was trying to find alternatives to MS SQL for budget reasons. Four others and I spent some time comparing MySQL, Postgres and Firebird.
Having worked with MySQL for a long time, we quickly ruled it out for most of our larger projects. The decision came down to Postgres versus Firebird. One thing that struck us straight away was the lack of popular support/documentation for Firebird in contrast to Postgres. Our bench tests always had Postgres either on top of or level with Firebird, never below it. In terms of features, Postgres again answered our needs, while Firebird had us coming up with creative workarounds.
Below is a feature comparison chart. I'll admit it is now a bit dated, but it is still very helpful:
Here is also a long forum thread discussing the differences.
Good luck!
Sometimes the "roundtrips" happen in the protocol or the data access layer, not the "DB engine" itself.
I will not rank the client/server DBMSs on roundtrips. There are a lot of ways to make one DBMS look the best (ask SQL Server to use the default cursor) and another the worst (create an Oracle cursor with nested datasets).
What you are probably looking for is a general approach oriented towards traffic minimization and towards letting the client work independently of the server. That is what middle-tier data access libraries provide.
So, if your application is that sensitive to traffic optimization, look at libraries such as DataAbstract, kbmMW or ThinDAC.