As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I've been asked to screen some candidates for a MySQL DBA / Developer position for a role that requires an enterprise level skill set.
I myself am a SQL Server person so I know what I would be looking for from that point of view with regards to scalability / design etc but is there anything specific I should be asking with regards to MySQL?
I would ideally like to ask them about enterprise level features of MySQL that they would typically only use when working on a big database. Need to separate out the enterprise developers from the home / small website kind of guys.
Thanks.
Although SQL Server and MySQL are both RDBMSs, MySQL has many unique features that can illustrate the difference between novice and expert.
Your first step should be to ensure that the candidate is comfortable using the command line, not just GUI tools such as phpMyAdmin. During the interview, try asking the candidate to write MySQL code to create a database table or add a new index. These are very basic queries, but exactly the type that GUI tools prevent novices from mastering. You can double-check the answers with someone who is more familiar with MySQL.
Can the candidate demonstrate knowledge of how JOINs work? For example, try asking the candidate to construct a query that returns all rows from Table One where no matching entries exist in Table Two. The answer should involve a LEFT JOIN.
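As a rough illustration of the kind of answer to look for, here is a minimal sketch of the LEFT JOIN ... IS NULL pattern. It runs against an in-memory SQLite database purely so that it is self-contained; the table and column names (table_one, table_two, table_one_id) are made up for the example, and the SELECT itself should read the same in MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the anti-join query below
# uses only syntax that both engines accept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_two (id INTEGER PRIMARY KEY, table_one_id INTEGER);
    INSERT INTO table_one (id, name) VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO table_two (id, table_one_id) VALUES (10, 1), (11, 3);
""")

# LEFT JOIN keeps every row of table_one; rows without a partner in
# table_two come back with NULLs on the right side, so we filter on that.
unmatched = conn.execute("""
    SELECT t1.id, t1.name
    FROM table_one AS t1
    LEFT JOIN table_two AS t2 ON t2.table_one_id = t1.id
    WHERE t2.id IS NULL
    ORDER BY t1.id
""").fetchall()

print(unmatched)
```

A candidate who reaches for NOT EXISTS or NOT IN instead is not wrong, but it is worth asking them to explain the differences.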
Ask the candidate to discuss backup strategies, and the various strengths and weaknesses of each. The candidate should know that backing up the database files directly is not an effective strategy unless all the tables are MyISAM. The candidate should definitely mention mysqldump as a cornerstone for backups. More sophisticated backup solutions include ibbackup/innobackup and LVM snapshots. Ideally, the candidate should also discuss how backups can affect performance (a common solution is to use a slave server for taking backups).
Does the candidate have experience with replication? What are some of the common replication configurations and the various advantages of each? The most common setup is master-slave, allowing the application to offload SELECT queries to slave servers, along with taking backups using a slave to prevent performance issues on the master. Another common setup is master-master, the main benefit being the ability to make schema changes without impacting performance. Make sure the candidate discusses common issues such as cloning a slave server (mysqldump + notation of the binlog position), load distribution using a load balancer or MySQL proxy, resolving slave lag by breaking larger queries into chunks, and how to promote a slave to become a new master.
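To make the "breaking larger queries into chunks" point concrete, here is a hedged sketch of a batched delete. It uses an in-memory SQLite database and a hypothetical events table so that it can run anywhere; the batch size is an arbitrary placeholder. Note that the subquery form is only there because stock SQLite lacks DELETE ... LIMIT; in MySQL you would more likely write DELETE FROM events WHERE created_day < ? LIMIT ? directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_day INTEGER)")
conn.executemany(
    "INSERT INTO events (id, created_day) VALUES (?, ?)",
    [(i, i % 10) for i in range(1, 1001)],
)
conn.commit()

CHUNK_SIZE = 100  # hypothetical batch size; tune it against observed lag


def delete_in_chunks(conn, cutoff_day, chunk_size=CHUNK_SIZE):
    """Delete rows older than cutoff_day in small, separately committed batches."""
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "  SELECT id FROM events WHERE created_day < ? LIMIT ?)",
            (cutoff_day, chunk_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break  # nothing left to delete
        batches += 1
    return batches


batches = delete_in_chunks(conn, cutoff_day=5)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(batches, remaining)
```

The idea is that each small statement replicates quickly, so the slave never has to sit behind one huge statement while other changes queue up.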
How would the candidate troubleshoot performance issues? Do they have sufficient knowledge of the underlying operating system and hardware to diagnose whether a bottleneck is CPU bound, IO bound, or network bound? Can they demonstrate how to use EXPLAIN to discover indexing problems? Do they mention the slow query log or configuration options such as the key buffer, tmp table size, innodb buffer pool size, etc?
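For the EXPLAIN part, something like the following shows the before/after effect of adding an index. This is only a sketch: it uses SQLite's EXPLAIN QUERY PLAN because that runs without a server, and the orders table and idx_orders_customer index are invented for the example. MySQL's EXPLAIN output looks different (it has columns such as type, key and rows), but the workflow of checking the plan, adding an index, and checking again is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(500)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def plan(conn, sql, params):
    """Return the human-readable detail column(s) of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index the engine has to scan the whole table.
plan_before = plan(conn, QUERY, (7,))

# After indexing the filtered column the plan switches to an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY, (7,))

print(plan_before)
print(plan_after)
```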
Does the candidate appreciate the subtleties of each storage engine? (MyISAM, InnoDB, and MEMORY are the main ones.) Do they understand how each storage engine optimizes queries, and how locking is handled? At the least, the candidate should mention that MyISAM issues a table-level lock whereas InnoDB uses row-level locking.
What is the safest way to make schema changes to a live database? The candidate should mention master-master replication, as well as avoiding the locking and performance issues of ALTER TABLE by creating a new table with the desired configuration and using mysqldump or INSERT INTO ... SELECT followed by RENAME TABLE.
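The copy-and-rename approach can be sketched roughly as follows. Again this is SQLite standing in for MySQL, with a made-up users table and status column. In MySQL the final swap would typically be a single statement, RENAME TABLE users TO users_old, users_new TO users, which renames both tables atomically. The sketch also ignores writes that arrive while the copy is running, which a real migration has to deal with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO users (id, email) VALUES (1, 'a@example.com'), (2, 'b@example.com');

    -- 1. Build the desired structure next to the live table.
    CREATE TABLE users_new (
        id INTEGER PRIMARY KEY,
        email TEXT,
        status TEXT NOT NULL DEFAULT 'active'
    );

    -- 2. Copy the existing rows across; the new column takes its default.
    INSERT INTO users_new (id, email) SELECT id, email FROM users;

    -- 3. Swap the names so the application sees the new table.
    ALTER TABLE users RENAME TO users_old;
    ALTER TABLE users_new RENAME TO users;
""")

rows = conn.execute("SELECT id, email, status FROM users ORDER BY id").fetchall()
old_count = conn.execute("SELECT COUNT(*) FROM users_old").fetchone()[0]
print(rows, old_count)
```

Keeping users_old around until the change has been verified gives you a cheap way back.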
Lastly, the only true measurement of a pro is experience. If the candidate cannot point to specific experience managing large data sets in a high availability environment, they might not be able to back up any knowledge they possess on a purely intellectual level.
I'd ask about the differences between the various storage engines, and their perceived benefits and drawbacks.
Definitely cover replication, and dig into the drawbacks of replication, especially when using tables with auto-increment keys.
If they are still with you, then ask about replication lag, its effects, and standard patterns for monitoring it.
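On the auto-increment point: the classic problem is two writable masters handing out the same id. One mitigation a candidate might bring up (it is not spelled out above, so treat it as my assumption about a good answer) is MySQL's auto_increment_increment and auto_increment_offset settings, which make each server step through its own slice of the id space. The snippet below is only a toy model of that idea in Python, not MySQL's actual allocation logic.

```python
from itertools import count, islice


def id_sequence(offset, increment):
    """Toy model of the ids one server hands out for a given
    auto_increment_offset / auto_increment_increment pair."""
    return count(start=offset, step=increment)


# Two masters, each stepping by 2 but starting at a different offset.
master_a = list(islice(id_sequence(offset=1, increment=2), 5))
master_b = list(islice(id_sequence(offset=2, increment=2), 5))

# With identical settings on both servers the ids would collide.
naive_a = list(islice(id_sequence(offset=1, increment=1), 5))
naive_b = list(islice(id_sequence(offset=1, increment=1), 5))

print(master_a, master_b)
print(set(naive_a) & set(naive_b))
```

A strong candidate should also be able to say what this does not solve, for example conflicting updates to the same row on both masters.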
I think it would depend on the database type: transactional or data warehouse?
Anyhow, for all types I'd ask about MySQL-specific replication and clustering, performance tuning, and monitoring concepts.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
We are running an application that uses MySQL with the InnoDB engine and we are planning to revamp the application (source code), so I was looking at Postgres, as it seems to be very popular and is suggested by many people around the world. But there is something which has really put me on hold:
Taken from this thread.
When Not To Use PostgreSQL
Speed: If all you require is fast read operations, PostgreSQL is not the tool to go for.
Simple set ups: Unless you require absolute data integrity, ACID compliance or complex designs, PostgreSQL can be an over-kill for simple set-ups.
Replication: Unless you are willing to spend the time, energy and resources, achieving replication with MySQL might be simpler for those who lack the database and system administration experience.
So, about speed, I am not sure what exactly is meant by fast read operations. Does it mean simple read operations or complex ones? I have also read that Postgres optimizes the query before executing it, so I am not sure whether I truly understand the point or am missing something.
In the end, I am not sure which factors exactly I should look at when choosing Postgres or MySQL for the application.
Note: I have read and tried to understand the differences between Postgres and MySQL but couldn't conclude anything, which is why I am posting the question here. Also, I am not a DBA.
PostgreSQL can compress and decompress its data on the fly with a fast compression scheme to fit more data in an allotted disk space. The advantage of compressed data, besides saving disk space, is that reading data takes less IO, resulting in faster data reads.
MySQL: MyISAM tables suffer from table-level locking, and do not support ACID features such as data durability, crash recovery, transactions or foreign keys. Previously MyISAM was claimed to perform better in read-only or read-heavy operations, but this is no longer necessarily the case.
Also see Benchmarking PostgreSQL vs. MySQL performance
It depends heavily on how your table structure is maintained and how you organise your data.
Pinterest, despite using MySQL, manages a huge amount of data with fast reads.
It all depends on your application. If you are creating a web application that may become more complex, with many tables joined together and real-time data, you may prefer PostgreSQL.
PostgreSQL features: it is an ORDBMS with MVCC; it can be accessed by routines from the platform-native C library and has a streaming API for large objects; it supports table inheritance; it is a unified database server with a single storage engine; it is more reliable and fast in complex operations involving many joins; it uses locking to avoid race conditions; and it has many functions, such as to_tsvector() and to_tsquery() for text search, plus functions for returning data in JSON format. It also has a shared buffer cache, indexing, triggers, backup, master-slave replication, and more.
If your application is small or targets a mobile platform, runs similar types of queries, and does not have many users, you may prefer MySQL.
MySQL features: RDBMS, accessible via JDBC and ODBC, fast for similar types of queries, master-master replication.
Closed 10 years ago.
I understand that this is very broad so let me give you the setting and be specific about my focus points.
Setting:
I am working with an existing PHP application using MySQL. Almost all tables use the MyISAM engine, and most contain millions of rows. One of the largest tables uses an EAV design, which is necessary but hurts performance. The application was written to make the best use of the MySQL cache. It makes a fair number of queries per page load (partially because of this) and is complex enough to have to go through most tables of the whole DB on each page load.
Pros:
it's free
MyISAM tables support full-text indexes, which are important to the application
Cons:
With the way things are set up, MySQL is limited to one CPU for the whole of the application. If one very demanding query is run (or the server is under a lot of load), it will queue all others, making the site unresponsive
MySQL caching and the lack of "WITH" or "INTERSECT" mean we have to break our queries down to make better use of the cache, thus multiplying the number of queries made. For instance, using subqueries over multiple tables with millions of rows (even with decent indexing) turns out to be a big issue with the current/upcoming load and the constraint laid out in the point above (CPU usage)
Feeling the need to scale up in the upcoming year, but not necessarily ready to pay for licensing right away, I've been thinking about rewriting the application and switching DBs.
Three options are being considered: continue using MySQL but with the InnoDB engine, so that we can leverage more CPU power; adapt to Oracle XE and get a license when we need to scale beyond the 4 GB database, 1 GB RAM, or 1 CPU limits (none of which we have hit yet); or adapt to PostgreSQL.
So the questions are :
How would losing full-text indexing impact performance in the three cases (does Oracle or PostgreSQL have an equivalent)?
How do Oracle and PostgreSQL leverage cache on subqueries, WITH, and UNION/INTERSECT statements?
How do Oracle and PostgreSQL leverage multicore/CPU power (if/when we get an Oracle license)?
I think that's already a lot to answer so I'll stop here. I don't mind simple/incomplete answers if there are links to complement them.
If you need any more information just let me know
Thanks in advance guys, the help is appreciated.
PostgreSQL supports full text search and indexes. Details here.
And it can use any number of CPU cores. It creates a separate process for every session, plus some additional support processes. Details here.
PostgreSQL doesn't have built in query caching, but there are lots of open source utilities for this purpose.
Closed 11 years ago.
I know that both these databases are better for different scenarios, but for a website where users will log in and enter numerical data into a daily log, which one would be best to use? I read that MySQL is faster to begin with but PostgreSQL is more scalable if the website starts getting a lot of users; is that accurate?
The downside is that my host only offers MySQL, so to use PostgreSQL I would have to purchase VPS hosting, which is more expensive. I have also read people advising others not to worry about it to begin with; however, it concerns me that I would have to rewrite queries and forms if I later moved to PostgreSQL. I would appreciate everyone's thoughts on this.
I don't understand why people have given this question negative marks when I clearly stated that I am from a finance background and only started learning 3 weeks ago. I think you need to remember that everyone has to start somewhere and that we haven't all been doing this as a job/hobby for years. I would love to see some of you come out of your comfort zone and come and do my job for a day as you would be equally as clueless and I can guarantee that I would not be so rude as some of you have been here. You should be trying to create an environment of learning and innovation, rather than an environment of arrogance. If everyone knew everything, what would be the point in this website?
Disclaimer: I have worked a lot more with PostgreSQL than with MySQL
From a performance/scalability point of view both are probably pretty much the same. There are workloads where Postgres is better and there are workloads where MySQL is better. Unless you test it in your environment it's hard to tell which one would work better for you.
Postgres seems (seemed?) to be faster in a workload with a lot of concurrent writes, whereas MySQL seems to be better with heavy read-only workload. But those benchmarks are about 3-4 years old now, so they are probably no longer true - especially since InnoDB in MySQL 5.5 improved a lot in that area.
However PostgreSQL's SQL features are far more advanced than MySQL's and MySQL has a tendency to silently ignore things you tell it to do - especially in a default installation (and if you rely on a foreign key to be created that might be a very unpleasant surprise). MySQL still has an advantage in terms of clustering as far as I can tell.
They are both equal when it comes to High Availability solutions.
I strongly disagree with the opinion that one should avoid any DBMS specific features - utilizing all features of a DBMS will make your application more scalable and will increase performance.
Traditionally MySQL wasn't known for stability and quality of their releases, but that seems to have improved since Oracle has taken over.
I still don't like MySQL's release policy where they introduce major changes and features in minor releases. The PostgreSQL dev team has a much more strict policy about what goes into a minor release. Upgrading a minor release (i.e. bugfix releases) is much less "dangerous" in PostgreSQL than it is in MySQL.
Someone once said the big difference between the PG development and MySQL is: the Postgres team first makes sure your data is safe, then it makes sure everything is working correctly, then it makes it fast. Whereas the MySQL team first makes it fast, then correct and finally stable. But that too might have changed since the Oracle takeover.
Personally I'd always prefer PostgreSQL over MySQL because of the much better SQL feature set and the overall quality of the product.
MySQL is the more popular solution and is used by very large companies for very large databases, so MySQL is far from unscalable.
If you want the ability to move between both databases at a later date in case you decide to switch, I would recommend using an ORM (Look at http://www.doctrine-project.org/); this way you'll only have to write the queries once and if you change to a different database down the road, you only need to change a config variable. Doctrine will also have you build your database structure in a YAML file which it can convert for you as well.
It's also capable of migrating between database types.
You'll also want to take into account the different MySQL engines, which perform differently as well. I was just looking at a comparison between PostgreSQL and MySQL whose conclusion disliked the fact that MySQL wasn't built with transactions; however, InnoDB does provide transactional support for MySQL, as well as speed and memory improvements in some cases.
So the bottom line is this: If you can make your application in such a way that you can use either database (as mentioned above) run your own benchmarks against your application and your databases and see what kind of a difference it makes to you.
There are certainly other things to think about if you have the budget for it, such as hiring DBAs who specialize in the database you're using and having them optimize it.
First, SQL is SQL: if you stick to strict standard SQL, you won't have to rewrite anything. The difference between the two databases is the level of SQL support. PostgreSQL has better support, while MySQL's support depends on the storage engine used.
Yes, you can scale your application better with PostgreSQL, but how much load do you have on your server? 1 GB per day? Less? More?
Closed 10 years ago.
Simple question, when should one not use MySQL?
There are two facets to my curiosity:
When to avoid MySQL in particular?
When to not use relational databases in general?
I wanted to be sure of my choice of MySQL (with PHP on Apache) as my employer was insistent on using something that's not free or open-source but I insisted otherwise. So I just want to be sure.
When your data is not relational, or when (based on your data access pattern and/or data model) you can choose a better model than the relational one, use it. If you are not sure there is a better model for your problem, use an RDBMS; one of the reasons for its popularity is that it fits most problems really well. Facebook and Google use MySQL (although not only MySQL, a major part of Facebook runs on top of MySQL), so think about this when you are considering a NoSQL solution.
There are different types of databases, like graph databases, which are good for specific tasks. If you have such a specific task, research the field of the task.
As for choosing a vendor for an RDBMS, this is more a business decision than a technical one. Sometimes the presence of support, certified professionals, training/consulting, and even fit with the company infrastructure (if it has an extensive Windows network and an experienced Windows administrator, it may prefer Windows Server over a Linux-based one) are the reasons a particular piece of software is chosen.
1. When to avoid MySQL in particular?
When concurrent database sessions are both modifying and querying the database.
MySQL is fine for read-only or read-mostly scenarios (it is no accident that MySQL is frequently used for Web), but more advanced multi-version concurrency control capabilities of Oracle, MS SQL Server, PostgreSQL or even Firebird/Interbase can often handle read-write workloads not just with better performance but with better correctness as well (i.e. they are better at avoiding various concurrency artifacts that may endanger data consistency).
Even traditional "locking" databases such as DB2 or Sybase are likely to handle read-write workloads better than MySQL.
2. When to not use relational databases in general?
In short: when your data is not relational (i.e. it does not fit well in the paradigm of entities, attributes and relationships).
That being said, many modern DBMSes have capabilities outside traditional relational model, such as ability to "understand" hierarchical structure of XML. So even unstructured data that would not normally be stored in the relational DB (or at best would be stored in a BLOB) is no longer necessarily off-limits.
Not a difficult question to answer. Don't use MySQL if another DBMS is going to prove cheaper / better value. Other leading DBMSs like Oracle or SQL Server have many features that MySQL does not. Also if your employer already has a large investment in other DBMSs it may be prohibitively expensive and difficult to support MySQL without good reason. For what reason are you insisting on MySQL?
Also bear in mind that no business buys a DBMS. They buy a complete solution of which the DBMS is part. Consider the return on investment of the whole solution and not just the DBMS.
Closed 10 years ago.
I am working on a web application using Python (Django) and would like to know whether MySQL or PostgreSQL would be more suitable when deploying for production.
In one podcast Joel said that he had some problems with MySQL and the data wasn't consistent.
I would like to know whether anyone has had such problems. Also, when it comes to performance, which one can be tweaked more easily?
A note to future readers: The text below was last edited in August 2008. That's nearly 11 years ago as of this edit. Software can change rapidly from version to version, so before you go choosing a DBMS based on the advice below, do some research to see if it's still accurate.
Check for newer answers below.
Better?
MySQL is much more commonly provided by web hosts.
PostgreSQL is a much more mature product.
There's this discussion addressing your "better" question
Apparently, according to this web page, MySQL is fast when concurrent access levels are low, and when there are many more reads than writes. On the other hand, it exhibits low scalability with increasing loads and write/read ratios. PostgreSQL is relatively slow at low concurrency levels, but scales well with increasing load levels, while providing enough isolation between concurrent accesses to avoid slowdowns at high write/read ratios. It goes on to link to a number of performance comparisons, because these things are very... sensitive to conditions.
So if your decision factor is, "which is faster?" Then the answer is "it depends. If it really matters, test your application against both." And if you really, really care, you get in two DBAs (one who specializes in each database) and get them to tune the crap out of the databases, and then choose. It's astonishing how expensive good DBAs are; and they are worth every cent.
When it matters.
Which it probably doesn't, so just pick whichever database you like the sound of and go with it; better performance can be bought with more RAM and CPU, and more appropriate database design, and clever stored procedure tricks and so on - and all of that is cheaper and easier for random-website-X than agonizing over which to pick, MySQL or PostgreSQL, and specialist tuning from expensive DBAs.
Joel also said in that podcast that comment would come back to bite him because people would be saying that MySQL was a piece of crap - Joel couldn't get a count of rows back. The plural of anecdote is not data. He said:
MySQL is the only database I've ever programmed against in my career that has had data integrity problems, where you do queries and you get nonsense answers back, that are incorrect.
and he also said:
It's just an anecdote. And that's one of the things that frustrates me, actually, about blogging or just the Internet in general. [...] There's just a weird tendency to make anecdotes into truths and I actually as a blogger I'm starting to feel a little bit guilty about this
Just chiming in many months later.
The geographical capabilities of the two databases are very, very different. PostgreSQL has the exceptional PostGIS extension. MySQL's geographical functionality is practically zero in comparison.
If your web service has a location component, choose PostgreSQL.
I haven't used Django, but I have used both MySQL and PostgreSQL. If you'll be using your database only as a backend for Django, it doesn't matter much, because it will abstract away most of the differences. PostgreSQL is a little more scalable because it doesn't hit the brick wall as fast as MySQL as data-size/client-count increase.
The real difference comes in if you are doing a new system. Then I'd recommend PostgreSQL hands down, because it has a lot more features which make your DB layer much more customizable, so that you can fine-tune it to any requirements you might have.
Although it's a bit out of date, it would be worth reading the MySQL Gotchas page. Many of the items listed there are still true, to the best of my knowledge.
I use PostgreSQL.
I use both extensively. My choice for a particular project boils down to:
Licensing - Are you going to distribute your app (IANAL)
Existing Infrastructure and Knowledge Base
Any special sauce you have to have.
By special sauce I mean things like:
Easy/cheap replication = MySQL
Huge dataset problems with small results = PostgreSQL. Use the language extensions, and have very efficient data operations. (PL/Python, PL/TCL, PL/Perl, etc)
Interface with R statistical libraries = PostgreSQL (PL/R is available in Debian/Ubuntu)
Well, I don't think you should be using a different database brand in anything past development (build, staging, prod) as that will come back to bite you.
As I understand it, PostgreSQL is a more 'correct' database implementation, while MySQL is less correct (less compliant) but faster.
So if you are pretty much writing a CRUD application, MySQL is the way to go. If you require certain features out of your database (if you're not sure, then you don't), then you may want to look into PostgreSQL.
If you are writing an application which may get distributed quite a bit on different servers, MySQL carries a lot of weight over PostgreSQL because of the portability. PostgreSQL is difficult to find on less than satisfactory web hosts, albeit there are a few. In most regards, PostgreSQL is slower than MySQL, especially when it comes to fine tuning in the end. All in all, I'd say to give PostgreSQL a shot for a short amount of time, that way you aren't completely avoiding it, and then make a judgement.
Thank you. I've used Django with MySQL and it's fine. Choose your database based on the features you need. It is hard to compare MySQL and Postgres; it is better to compare Postgres to SQL Server.
#WolfmanDragon
PostgreSQL has (tiny) support for objects, but it is, by nature, a relational database. From its about page:
PostgreSQL is a powerful, open source relational database system.
MySQL is a relational database management system while PostgreSQL is an object-relational database management system. PostgreSQL is suited well for C++ or Java developers, as it gives us more control over how queries are written. ORDBMS also gives us Objects and User Defined Types. The SQL queries themselves are much closer to the ISO standards than MySQL.
Do you need an ORDBMS or a RDBMS? That will better answer your question.