Mysql vs Postgres Read Operations [closed] - mysql

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
We are running an application that uses MySQL with the InnoDB engine, and we are planning to revamp the application (source code), so I was looking at Postgres, as it seems to be very popular and is recommended by many people around the world. But something has really put me on hold:
Taken from this thread.
When Not To Use PostgreSQL
Speed: If all you require is fast read operations, PostgreSQL is not the tool to go for.
Simple set ups: Unless you require absolute data integrity, ACID compliance or complex designs, PostgreSQL can be an over-kill for simple set-ups.
Replication: Unless you are willing to spend the time, energy and resources, achieving replication with MySQL might be simpler for those who lack the database and system administration experience.
So, about speed: I am not sure what exactly is meant by fast read operations. Does it mean simple read operations or complex ones? I have also read that Postgres optimizes a query before executing it, so I am not sure whether I truly understand the point or am missing something.
In the end, I am not sure which factors exactly I should look at when choosing between Postgres and MySQL for the application.
Note: I have read and tried to understand the differences between Postgres and MySQL but couldn't conclude anything, which is why I am posting the question here. Also, I am not a DBA.

PostgreSQL can compress and decompress large field values on the fly (through its TOAST mechanism) with a fast compression scheme, fitting more data into the allotted disk space. Besides saving disk space, compressed data takes less IO to read, which can result in faster data reads.
MySQL: MyISAM tables suffer from table-level locking and do not support ACID features such as data durability, crash recovery, transactions or foreign keys. MyISAM was previously claimed to perform better in read-only or read-heavy workloads, but this is no longer necessarily the case.
Also see Benchmarking PostgreSQL vs. MySQL performance
It depends heavily on how your table structure is maintained and how you organise your data.
Pinterest, for example, uses MySQL and still manages huge amounts of data with fast reads.

It all depends on your application. If you are building a web application that may become complex, with many tables, joins and real-time data, you may prefer PostgreSQL.
PostgreSQL features: ORDBMS; MVCC; access through routines from the platform-native C library as well as a streaming API for large objects; table inheritance; a unified database server with a single storage engine; reliable and fast in complex operations that use many joins; locking to avoid race conditions; many functions, such as to_tsvector() and to_tsquery() for text search; returning data in JSON format; a shared buffer cache; indexing; triggers; backup; master-slave replication; and many more.
If your application is small, targets a mobile platform, runs similar types of queries and does not have many users, you may prefer MySQL.
MySQL features: RDBMS, accessible via JDBC and ODBC, fast for similar types of queries, master-master replication.

Related

AWS MySQL RDS vs AWS DynamoDB [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I've been using MySQL for a fair while now and I'm comfortable with its structure & SQL Queries etc.
Currently building a new system in AWS and I've been looking at DynamoDB. Currently I only know a little about it.
Is one better than the other?
What are the advantages of DynamoDB?
What is the transition like from MySQL queries etc. to this flat-style DB?
Really DynamoDB and MySQL are apples and oranges. DynamoDB is a NoSQL storage layer while MySQL is used for relational storage. You should pick what to use based on the actual needs of your application. In fact, some applications might be well served by using both.
If, for example, you are storing data that does not lend itself well to a relational schema (tree structures, schema-less JSON representations, etc.) that can be looked up against a single key or a key/range combination then DynamoDB (or some other NoSQL store) would likely be your best bet.
If you have a well-defined schema for your data that can fit well in a relational structure and you need the flexibility to query the data in a number of different ways (adding indexes as necessary of course), then RDS might be a better solution.
The main benefit for using DynamoDB as a NoSQL store is that you get guaranteed read/write throughput at whatever level you require without having to worry about managing a clustered data store. So if your application requires 1000 reads/writes per second, you can just provision your DynamoDB table for that level of throughput and not really have to worry about the underlying infrastructure.
RDS has much of the same benefit of not having to worry about the infrastructure itself, however if you end up needing to do a significant number of writes to the point where the largest instance size will no longer keep up, you are kind of left without options (you can scale horizontally for reads using read replicas).
Updated note: DynamoDB now supports global secondary indexes, so you can perform optimized lookups on data fields other than the hash key or the combination of hash and range keys.
We have just migrated all of our DynamoDB tables to RDS MySQL.
While using DynamoDB for specific tasks may make sense, building a new system on top of DynamoDB is really a bad idea. Best laid plans etc., you always need that extra flexibility from your DB.
Here are our reasons we moved from DynamoDB:
Indexing - Changing or adding keys on-the-fly is impossible without creating a new table.
Queries - Querying data is extremely limited. Especially if you want to query non-indexed data. Joins are of course impossible so you have to manage complex data relations on your code/cache layer.
Backup - Such a tedious backup procedure is a disappointing surprise compared to the slick backup of RDS
GUI - bad UX, limited search, no fun.
Speed - Response time is problematic compared to RDS. You find yourself building elaborate caching mechanisms to compensate for it in places where you would have settled for RDS's internal caching.
Data Integrity - While the concept of fluid data structure sounds nice to begin with, some of your data is better "set in stone". Strong typing is a blessing when a little bug tries to destroy your database. With DynamoDB anything is possible and indeed anything that can go wrong does.
We now use DynamoDB as a backup for some systems and I'm sure we'll use it in the future for specific, well defined tasks. It's not a bad DB, it's just not the DB to serve 100% of your core system.
As far as advantages go, I'd say Scalability and Durability. It scales incredibly and transparently and it's (sort of) always up. These are really great features, but they do not compensate in any way for the downside aspects.
You can read AWS explanation about it here.
In short, if you have mainly lookup queries (and not join queries), DynamoDB (or another NoSQL DB) is better. If you need to handle a lot of data, you will be limited when using MySQL (or another RDBMS).
You can't reuse your MySQL queries or your data schema, but if you make the effort to learn NoSQL, you will add an important tool to your toolbox. There are many cases where DynamoDB gives the simplest solution.
When using DynamoDB you should also know that items/records in DynamoDB are limited to 400KB (see DynamoDB Limits). For many use cases this will not work. So DynamoDB will be good for a few things but not all. The same goes for many other NoSQL databases.

Is there a high performance difference in a Key-Value db on a single server with MySQL vs. NoSQL

In my PHP application I have a 470M rows table weighing 200GB in a MySQL MyISAM partitioned table on one server. Usage includes 70% Writes/30% Reads.
I'm trying to improve performance. Main problem currently is read/write contentions due to table-level locks. I'm trying to decide between two options:
Changing MySQL to InnoDB. Pros: avoids the table-level locks. Cons: much more disk space; needs bigger HDs, which might not be as fast as the current ones (currently using RAID10 6*300GB SAS 15k).
Moving data to a NoSQL db. Main Con: Learning curve. Have never used NoSQL before.
The question is: while still trying to avoid sharding the data, and considering that I'm using the MySQL RDBMS as simple key-value storage, is there a large performance difference between the two approaches, or does NoSQL's main advantage only appear when moving to a distributed system?
I can only answer your question partially but hopefully more than a comment.
MongoDB is not typically a key-value store and has been known to have certain performance hits when used as one.
MongoDB also has a locking problem that could come back to haunt you. It currently has a database-level lock, which means it could (this would need testing) cause write lock saturation.
It is also heavily designed for an 80%-read app (said to be the most common setup for websites nowadays), so the more writes you do, the more you will notice a performance drop over time. That said, you can tweak MongoDB to be more write-friendly, and its distributed nature does help to reduce write lock saturation a little.
That said, in my personal opinion, the learning curve of MongoDB coming from SQL:
Was next to nil
More natural and simpler to implement into my app than SQL
Query language is simple making it dead easy to get to grips with
Query language has a lot of similarities to SQL
The drivers are standardised so that the syntax you see in the Docs for the JS driver in the console is consistent across the board.
My personal opinion on the general matter is the distributed notion of it. If you get a NoSQL solution designed for key-value stores then it could be really good. A quick search on Google pulled out a small list of NoSQL key-value stores on Wikipedia: http://en.wikipedia.org/wiki/NoSQL#Key-value_stores_on_solid_state_or_rotating_disk

When to use MongoDB [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm writing an application that doesn't necessarily need scaling abilities, as it won't be collecting large amounts of data at the beginning. (However, if I'm lucky, it potentially could down the road.)
I will be running my web server and database on the same box (for now).
That being said, I am looking for performance and efficiency.
The main part of my application will be loading blog articles. Using an RDBMS (MySQL) I will make 6 queries (2 of the queries being joins), just to load a single blog article page.
select blog
select blog_album
select blog_tags
select blog_notes
select blog_comments (join with users)
select blog_author_participants (join with users)
However, with MongoDB I can de-normalize and flatten 6 tables into just 2 tables/collections and reduce my queries to potentially just 1 query:
users
blogs
->blog_album
->blog_tags
->blog_notes
->blog_comments
->blog_author_participants
Now, going with the MongoDB schema, there will be some data redundancy. However, hard drive space is cheaper than CPU/servers.
1.) Would this be a good scenario to use MongoDB?
2.) Do you only benefit in performance using MongoDB when scaling beyond a single server?
3.) Are there any durability risks using MongoDB? I hear that there is potential for loss of data while performing inserts, as inserts are written to memory first, then to the database.
4.) Should this stop me from using MongoDB in production?
You would use MongoDB when you have a use case that matches its strengths.
Do you need a schema-less document store? Nope, you have a stable schema.
Do you need automatic sharding? Nope, you don't have extraordinary data needs or budget for horizontally scaling hardware.
Do you need map/reduce data processing? Not for something like a blog.
So why are you even considering it?
However, with MongoDB I can de-normalize and flatten 6 tables into just 2 tables/collections and reduce my queries to potentially just 1 query
But you can easily query MySQL for 6 tables worth of information related to a single blog post with a single properly crafted SQL statement.
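As a rough illustration of the single-statement point, here is a sketch using Python's built-in sqlite3 module instead of MySQL, so that it runs anywhere. The table and column names are assumptions based on the question's list (only three of the six tables are shown), and correlated subqueries with group_concat are just one way to craft such a statement; MySQL's GROUP_CONCAT is similar but not identical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE blog_tags (blog_id INTEGER, tag TEXT);
    CREATE TABLE blog_comments (blog_id INTEGER, user_id INTEGER, body TEXT);

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO blog VALUES (10, 'Hello world');
    INSERT INTO blog_tags VALUES (10, 'intro'), (10, 'meta');
    INSERT INTO blog_comments VALUES (10, 1, 'Nice'), (10, 2, 'Thanks');
""")

def load_article(conn, blog_id):
    """Fetch one article with its tags and comment authors in one statement."""
    # Each child table is aggregated in its own correlated subquery, so the
    # tags and comments do not multiply each other as a plain join would.
    sql = """
        SELECT b.title,
               (SELECT group_concat(t.tag)
                  FROM blog_tags t
                 WHERE t.blog_id = b.id) AS tags,
               (SELECT group_concat(u.name)
                  FROM blog_comments c
                  JOIN users u ON u.id = c.user_id
                 WHERE c.blog_id = b.id) AS commenters
          FROM blog b
         WHERE b.id = ?
    """
    return conn.execute(sql, (blog_id,)).fetchone()

article = load_article(conn, 10)
```

Whether one larger statement actually beats several small ones depends on the data and indexes, so it is worth measuring both.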
however hard drive space is cheaper than CPU/servers.
If performance and scaling is a priority then you are going to be concerned with having enough RAM to fit everything into main memory and enough CPU cores to run queries. An enterprise grade RAID 10 array is a requirement, don't get me wrong, but as soon as your database software (MongoDB or MySQL) needs to scan an index that can't fit into main memory you'll be in for a world of pain assuming a large active database. :)
I like MongoDB, but its big strength in my mind is map/reduce and its document orientation. You require neither of those features. MySQL is time-tested in large-scale deployments and supports partitioning (but I would argue that your database would have to be on the order of 50-100 GB before you can realize a substantial gain from partitioning versus a single server, plus passive backup, with tons (64 GB+) of RAM). I would also argue that if performance is truly a concern then MySQL would be preferable, as you would have supreme control over your indexes.
That's not to say that MongoDB isn't high performance, but its place probably isn't serving blogs. Your concern with inserts is valid as well. MongoDB is not an ACID system. Google transactions in both systems and compare.
Here is a good explanation: http://mod.erni.st/nosql-if-only-it-was-that-easy/
The last paragraph summarizes it:
What am I going to build my next app on? Probably Postgres. Will I use NoSQL? Maybe. I might also use Hadoop and Hive. I might keep everything in flat files. Maybe I’ll start hacking on Maglev. I’ll use whatever is best for the job. If I need reporting, I won’t be using any NoSQL. If I need caching, I’ll probably use Tokyo Tyrant. If I need ACIDity, I won’t use NoSQL. If I need a ton of counters, I’ll use Redis. If I need transactions, I’ll use Postgres. If I have a ton of a single type of documents, I’ll probably use Mongo. If I need to write 1 billion objects a day, I’d probably use Voldemort. If I need full text search, I’d probably use Solr. If I need full text search of volatile data, I’d probably use Sphinx.
NoSQL vs. RDBMS: Apples and Oranges?
I would advise you to read up a little on what NoSQL is and what it does before you decide whether you can use it. You can't take a normal database and turn it into a NoSQL thing just like that. The way you work with the data is completely different.
NoSQL definitely has its uses. But it's definitely not the answer for everything. The main advantage of NoSQL is the easily changeable data model.
Advantages of using MongoDB (as per Moshe Kaplan's DZone article):
Schema-less design
Scalability in managing terabytes of data
Rapid replica sets with a high-availability feature
Sharding enables linear, scale-out growth without running out of budget
Supports high write loads
Use of data locality for query processing
MongoDB meets the Consistency and Partition tolerance requirements of the CAP theorem (Consistency, Availability, Partition tolerance)
Related SE questions:
What are the advantages of using a schema-free database like MongoDB compared to a relational database?
When to Redis? When to MongoDB?
I can't speak to the performance considerations, but for me, the first consideration of whether you want to use a SQL-DB vs MongoDB is the structure of the data you want to store.
MongoDB is "schema-less" in the sense that you don't need to know what "tables" and "columns" you want beforehand. It is very flexible. So, if you don't know what information you want to store in your "blogs" Collection for example, or if different blog posts may store different information, then MongoDB allows this flexibility. Whereas with SQL relational databases, you have to know your schema upfront.
But it sounds like you already know what information you want to store, in which case I might just stick with a SQL relational database. I don't think performance is the first consideration in your case - you're not building a real-time application where one or two milliseconds matter all that much.

How to compare MySQL database schemas [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I am looking for a tool that will allow me to compare schemas of MySQL databases.
Which is the best tool to do that?
Navicat is able to do that for you. It will also synchronize schema and/or data between two mysql database instances. I've used it with success in the past.
There is a screenshot of the data and structure synchronization tool here:
http://www.navicat.com/en/products/navicat_mysql/mysql_detail_mac.html#7
Perhaps a bit late to the party, but I've just written a simple tool in PHP to compare MySQL database schemas:
PHP script to compare MySQL database schemas
How to use the script
It exports the schema and serialises it before doing the comparison. This is so that databases can be compared that reside on different hosts (where both hosts may not be accessible by the PHP script).
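The linked script itself is not reproduced here, so the following is only a sketch of the export-then-compare idea, written in Python against SQLite (via the standard sqlite3 module) rather than MySQL. The helper names export_schema and diff_schemas are hypothetical. Because the exported schema is plain data, it could be serialised (for example as JSON) on one host and compared on another.

```python
import sqlite3

def export_schema(conn):
    """Return {table: [(column, declared_type), ...]} for every table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]
    return schema

def diff_schemas(a, b):
    """List human-readable differences between two exported schemas."""
    diffs = []
    for table in sorted(set(a) | set(b)):
        if table not in a:
            diffs.append(f"table {table} only in second")
        elif table not in b:
            diffs.append(f"table {table} only in first")
        elif a[table] != b[table]:
            diffs.append(f"table {table} differs: {a[table]} vs {b[table]}")
    return diffs

# Two throwaway databases with deliberately different schemas.
db1 = sqlite3.connect(":memory:")
db1.execute("CREATE TABLE users (id INTEGER, email TEXT)")
db2 = sqlite3.connect(":memory:")
db2.execute("CREATE TABLE users (id INTEGER, email TEXT, name TEXT)")
db2.execute("CREATE TABLE posts (id INTEGER)")

differences = diff_schemas(export_schema(db1), export_schema(db2))
```

For MySQL the export step would read from information_schema or SHOW CREATE TABLE instead of the SQLite pragma; the comparison step stays the same.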
Edit:
Python Script
I use SQLyog:
http://www.webyog.com/en/
It isn't free, but it is a very good tool and has saved the cost of its license many, many times over. I'm in no way affiliated with the company, just someone who has used a number of MySQL tools.
Free trial (30-day) available from here.
The best thing to do is to try out some performance benchmarks that are already out there. It's always better to use tried-and-tested benchmarks, unless you're thoroughly convinced that your data and database loading is going to be significantly different to the traditional usage patterns (but, then, what are you using a database for?). I'm going to steal my own answer from ServerFault:
There are a good number of benchmarks out there for different MySQL database engines. There's a decent one comparing MyISAM, InnoDB and Falcon on the Percona MySQL Performance Blog, see here.
Another thing to consider between the two aforementioned engines (MyISAM and InnoDB) are their approaches to locking. MyISAM performs table-locking, whilst InnoDB performs row-locking. There are a variety of things to consider, not only downright performance figures.

MySQL Interview Questions [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 11 years ago.
I've been asked to screen some candidates for a MySQL DBA / Developer position for a role that requires an enterprise level skill set.
I myself am a SQL Server person so I know what I would be looking for from that point of view with regards to scalability / design etc but is there anything specific I should be asking with regards to MySQL?
I would ideally like to ask them about enterprise level features of MySQL that they would typically only use when working on a big database. Need to separate out the enterprise developers from the home / small website kind of guys.
Thanks.
Although SQL Server and MySQL are both RDBMSs, MySQL has many unique features that can illustrate the difference between novice and expert.
Your first step should be to ensure that the candidate is comfortable using the command line, not just GUI tools such as phpMyAdmin. During the interview, try asking the candidate to write MySQL code to create a database table or add a new index. These are very basic queries, but exactly the type that GUI tools prevent novices from mastering. You can double-check the answers with someone who is more familiar with MySQL.
Can the candidate demonstrate knowledge of how JOINs work? For example, try asking the candidate to construct a query that returns all rows from Table One where no matching entries exist in Table Two. The answer should involve a LEFT JOIN.
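For reference, the kind of answer described above might look like the sketch below. It runs against an in-memory SQLite database through Python's sqlite3 module; the table and column names (table_one, table_two, one_id) are made up for illustration, and the same LEFT JOIN ... IS NULL pattern works in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_two (id INTEGER PRIMARY KEY, one_id INTEGER);

    INSERT INTO table_one VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table_two VALUES (100, 1), (101, 3);
""")

# LEFT JOIN keeps every row of table_one; where no match exists the
# table_two columns are NULL, so filtering on IS NULL leaves the unmatched rows.
unmatched = conn.execute("""
    SELECT t1.id, t1.name
      FROM table_one t1
      LEFT JOIN table_two t2 ON t2.one_id = t1.id
     WHERE t2.id IS NULL
     ORDER BY t1.id
""").fetchall()
```

Here only row 2 of table_one has no partner in table_two, so it is the only row returned.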
Ask the candidate to discuss backup strategies, and the various strengths and weaknesses of each. The candidate should know that backing up the database files directly is not an effective strategy unless all the tables are MyISAM. The candidate should definitely mention mysqldump as a cornerstone for backups. More sophisticated backup solutions include ibbackup/innobackup and LVM snapshots. Ideally, the candidate should also discuss how backups can affect performance (a common solution is to use a slave server for taking backups).
Does the candidate have experience with replication? What are some of the common replication configurations and the various advantages of each? The most common setup is master-slave, allowing the application to offload SELECT queries to slave servers, along with taking backups using a slave to prevent performance issues on the master. Another common setup is master-master, the main benefit being the ability to make schema changes without impacting performance. Make sure the candidate discusses common issues such as cloning a slave server (mysqldump + notation of the binlog position), load distribution using a load balancer or MySQL proxy, resolving slave lag by breaking larger queries into chunks, and how to promote a slave to become a new master.
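The "breaking larger queries into chunks" idea can be sketched as follows. This is an illustration only, using Python's sqlite3 module with a hypothetical events table; it shows the batching loop, not replication itself. The subquery form is used because stock SQLite builds do not accept LIMIT on DELETE, whereas MySQL allows DELETE ... LIMIT directly.

```python
import sqlite3

def delete_in_chunks(conn, cutoff, chunk_size=1000):
    """Delete rows with created < cutoff in small batches; return total deleted."""
    total = 0
    while True:
        # Each batch is its own short transaction, so a replica replays many
        # small statements instead of one long-running one.
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "  SELECT id FROM events WHERE created < ? LIMIT ?)",
            (cutoff, chunk_size))
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany("INSERT INTO events (created) VALUES (?)",
                 [(i,) for i in range(2500)])
conn.commit()

deleted = delete_in_chunks(conn, cutoff=2000, chunk_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

A sensible chunk size depends on the workload; in practice a short pause between batches is also common, to give replicas time to catch up.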
How would the candidate troubleshoot performance issues? Do they have sufficient knowledge of the underlying operating system and hardware to diagnose whether a bottleneck is CPU bound, IO bound, or network bound? Can they demonstrate how to use EXPLAIN to discover indexing problems? Do they mention the slow query log or configuration options such as the key buffer, tmp table size, innodb buffer pool size, etc?
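To make the EXPLAIN point concrete, here is a small sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. MySQL's EXPLAIN output is formatted differently, but the workflow is the same: inspect the plan, add an index, inspect it again. The users table and the idx_users_email index are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany("INSERT INTO users (email, name) VALUES (?, ?)",
                 [(f"user{i}@example.com", f"user{i}") for i in range(1000)])

def query_plan(conn, sql, params=()):
    """Return the plan's detail strings joined into one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)

sql = "SELECT * FROM users WHERE email = ?"
before = query_plan(conn, sql, ("user42@example.com",))

conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, sql, ("user42@example.com",))

print(before)  # expected to describe a full scan of users
print(after)   # expected to mention idx_users_email
```

The exact wording of the plan varies between SQLite versions, so the comments above describe what to look for rather than literal output.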
Does the candidate appreciate the subtleties of each storage engine? (MyISAM, InnoDB, and MEMORY are the main ones). Do they understand how each storage engine optimizes queries, and how locking is handled? At the least, the candidate should mention that MyISAM issues a table-level lock whereas InnoDB uses row-level locking.
What is the safest way to make schema changes to a live database? The candidate should mention master-master replication, as well as avoiding the locking and performance issues of ALTER TABLE by creating a new table with the desired configuration and using mysqldump or INSERT INTO ... SELECT followed by RENAME TABLE.
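The copy-and-rename approach can be sketched like this. The example runs on SQLite via Python's sqlite3 module, so the swap uses ALTER TABLE ... RENAME TO; in MySQL the equivalent step would be a single RENAME TABLE statement that swaps both names at once. The accounts table and the added status column are hypothetical, and the sketch ignores rows written to the old table while the copy is running, which a real migration has to handle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO accounts VALUES (1, 'a@example.com'), (2, 'b@example.com');

    -- 1. Build the new table with the desired definition (extra column).
    CREATE TABLE accounts_new (
        id INTEGER PRIMARY KEY,
        email TEXT,
        status TEXT NOT NULL DEFAULT 'active'
    );

    -- 2. Copy the existing rows across.
    INSERT INTO accounts_new (id, email) SELECT id, email FROM accounts;

    -- 3. Swap the names; the old table is kept until the result is verified.
    ALTER TABLE accounts RENAME TO accounts_old;
    ALTER TABLE accounts_new RENAME TO accounts;
""")

rows = conn.execute(
    "SELECT id, email, status FROM accounts ORDER BY id").fetchall()
```

Keeping accounts_old around until the new table has been checked gives a simple way back if something looks wrong.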
Lastly, the only true measurement of a pro is experience. If the candidate cannot point to specific experience managing large data sets in a high availability environment, they might not be able to back up any knowledge they possess on a purely intellectual level.
I'd ask about the differences between the various storage engines, their perceived benefits and drawbacks.
Definitely cover replication, and dig into the drawbacks of replication, especially when using tables with auto-increment keys.
If they are still with you, then ask about replication lag, its effects and standard patterns for monitoring it.
I think it would depend on the database type: transactional or data warehouse?
Anyhow, for all types I'd ask about MySQL-specific replication and clustering, performance tuning and monitoring concepts.