I have a large MongoDB database, about 25 GB. Because of complex aggregation queries, MongoDB copes worse than MySQL would, but I'm afraid MySQL will take much more disk space. Is there any way to estimate the approximate size of the database once it's in MySQL? Perhaps someone has already compared these databases in terms of size?
The answer depends on a lot of choices specific to your database, such as:
MongoDB storage engine
MySQL storage engine
Number of indexes
Data types of indexed and non-indexed columns
Compression options used in either brand of database
Probably other factors
The best way to get an accurate comparison is to try it yourself using your data and your data model.
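One practical way to "try it yourself" without migrating all 25 GB: load a representative sample (say 1%) of the MongoDB data into MySQL, read the resulting on-disk size from `information_schema`, and extrapolate. A minimal sketch of the arithmetic — the sample size below is a placeholder, not a measurement:

```python
# After loading a 1% sample into MySQL, measure it with:
#   SELECT table_name, (data_length + index_length) AS bytes
#     FROM information_schema.TABLES
#    WHERE table_schema = 'mydb';
# Then extrapolate to the full dataset.

sample_fraction = 0.01
sample_mysql_bytes = 310 * 1024**2   # assumed measured sample size (310 MiB)

estimated_total = sample_mysql_bytes / sample_fraction
print(f"Estimated MySQL size: {estimated_total / 1024**3:.1f} GiB")
```

The estimate is only as good as the sample: make sure it reflects the real distribution of document sizes and includes the indexes you actually plan to create.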
See also my answer to https://stackoverflow.com/a/66873904/20860
Related
MySQL temporary tables are stored in memory as long as the computer has enough RAM (and MySQL is configured accordingly). One can create indexes on any fields.
Redis stores data in memory, indexed by one key at a time, and in my understanding MySQL can do this job too.
Is there anything that makes Redis better for storing a large amount (100-200k rows) of volatile data? I can only explain the appearance of Redis by the fact that not every project has MySQL inside, and some other databases probably don't support temporary tables.
If I already have MySQL in my project, does it make sense to bring in Redis as well?
Redis is like working with indexes directly. There's no ACID, SQL parser and many other things between you and the data.
It provides some basic data structures and they're specifically optimized to be held in memory, and they also have specific operations to read and modify them.
On the other hand, Redis isn't designed to query data (although you can implement very powerful and high-performance filters with SORT, SCAN, intersections and other operations) but to store the data in the form it's going to be consumed later. If you want to get, for example, customers sorted by 3 different criteria, you'll need to maintain 3 different sorted sets. There are a lot of use cases with the other data structures, but I would end up writing a book in an answer...
Also, one of the most powerful features found in Redis is how easily it can be replicated, and since version 3.0 it supports data sharding out of the box.
Whether you should use Redis instead of temporary tables in MySQL (and in other engines that have them too) is up to you. You need to study your case and check whether caching or storing data in a NoSQL store like Redis both outperforms your current approach and gives you a more elegant data architecture.
By using Redis alongside the other database, you're effectively reducing the load on it. Also, when Redis is running on a different server, scaling can be performed independently on each tier.
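As a concrete illustration of the "one sorted set per sort criterion" pattern mentioned above, here is a minimal in-memory stand-in written in plain Python (no Redis server required; with a real Redis you would use `ZADD`/`ZRANGE` via a client library — the customer names and criteria below are invented):

```python
# Invented sample data: customer id -> attributes we may want to sort by.
customers = {
    "c1": {"revenue": 900,  "signup": 2015, "orders": 12},
    "c2": {"revenue": 1500, "signup": 2018, "orders": 3},
    "c3": {"revenue": 400,  "signup": 2012, "orders": 40},
}

# One "sorted set" per criterion: a mapping of member -> score,
# mimicking what ZADD would build in Redis.
zsets = {crit: {cid: attrs[crit] for cid, attrs in customers.items()}
         for crit in ("revenue", "signup", "orders")}

def zrange(zset_name, start=0, stop=-1):
    """Mimic ZRANGE: members ordered by ascending score."""
    members = sorted(zsets[zset_name], key=zsets[zset_name].get)
    stop = len(members) if stop == -1 else stop + 1
    return members[start:stop]

print(zrange("revenue"))  # ['c3', 'c1', 'c2']
print(zrange("orders"))   # ['c2', 'c1', 'c3']
```

The point is the write-side cost: every new customer must be added to all three sets, which is the trade-off Redis makes for O(log N) sorted reads.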
I am planning to use mysql to store my datasets.
I have about 10^8 (hundred million) records:
ID(int), x(float), y(float), z(float), property(float).
Which database engine is suited for this kind of dataset: InnoDB or MyISAM? Or maybe NDB (I have no idea about its scalability or performance)?
I am planning to query the static dataset with the following operations:
getRectangularRegion or getPointsInSphere.
I am assuming you are trying to store points in 3d space and then find all points within a region.
How the underlying database copes with a lot of records matters a lot less to you than having a very good 3D spatial indexing system built into the database. Without spatial indexing you can't do the queries you wish.
You should also consider writing your own data storage, as a simple 3D quadtree (octree) may give you good indexing depending on how your points are clustered – but using an "off the shelf" database would be less work for you.
So I think you need to investigate support for spatial indexing in databases, rather than ask about support for lots of rows. Storing lots of rows is a given for most databases these days…
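To make the spatial-indexing point concrete, here is a minimal uniform-grid index sketch — a much simpler cousin of the octree and of the R-trees real databases use. The cell size and sample points are arbitrary assumptions:

```python
import math
from collections import defaultdict

CELL = 10.0  # arbitrary cell edge length; tune to typical query radius

def cell_of(x, y, z):
    """Map a point to its integer grid-cell coordinates."""
    return (int(x // CELL), int(y // CELL), int(z // CELL))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)

    def insert(self, pid, x, y, z, prop=0.0):
        self.cells[cell_of(x, y, z)].append((pid, x, y, z, prop))

    def points_in_sphere(self, cx, cy, cz, r):
        """Scan only the grid cells overlapping the sphere's bounding box."""
        hits = []
        lo = cell_of(cx - r, cy - r, cz - r)
        hi = cell_of(cx + r, cy + r, cz + r)
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    for pid, x, y, z, _prop in self.cells.get((i, j, k), []):
                        if math.dist((x, y, z), (cx, cy, cz)) <= r:
                            hits.append(pid)
        return hits

idx = GridIndex()
idx.insert(1, 1.0, 1.0, 1.0)
idx.insert(2, 50.0, 50.0, 50.0)
print(idx.points_in_sphere(0, 0, 0, 5))  # [1]
```

A spatial index in the database (e.g. MySQL's SPATIAL index or PostGIS) does the same pruning for you, which is why it matters more than raw row capacity.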
Your table seems to be pretty simple and you won't need transactions or foreign keys, so I guess MyISAM would be better suited than InnoDB. But MEMORY might be your fastest choice.
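Before picking an engine, a back-of-envelope size estimate helps decide whether MEMORY is even feasible for 10^8 rows. Assuming 4-byte INT and FLOAT columns and a rough, guessed per-row overhead:

```python
# Column payload per row: ID (INT, 4 bytes) + x, y, z, property (FLOAT, 4 bytes each).
payload = 4 + 4 * 4          # 20 bytes of column data per row
overhead = 13                # assumed per-row header/bookkeeping (rough guess)
rows = 10**8

data_gib = rows * (payload + overhead) / 1024**3
print(f"~{data_gib:.1f} GiB before indexes")  # ~3.1 GiB
```

So the raw data is on the order of a few GiB; add the primary key and any spatial index on top, and check that against available RAM before committing to MEMORY.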
How much RAM does MongoDB need in comparison with MySQL?
MongoDB does its best to keep as much useful information in RAM. MySQL generally does the same thing.
Both databases will use all of the RAM they have available.
Comparing the two is not easy, because it really depends on a lot of things: your table structure, your data size, and your indexes.
If you give MongoDB and MySQL the same amount of RAM, you will typically find the following:
MongoDB will be very good at finding individual records. (like looking up a user or updating an entry)
MySQL will be very good at loading and using sets of related data.
The performance will really be dictated by your usage of the database.
The short answer is : the same.
Another way to ask is: if I am using MySQL + memcached, how much RAM do I need to use Mongo instead of that combination? The answer would be on the order of the same total amount of memory for both clusters (the MongoDB cluster probably being sharded in this scenario).
For the same data set, with mostly text data, how does the data (table + index) size of PostgreSQL compare to that of MySQL?
PostgreSQL uses MVCC, which would suggest its data size would be bigger.
In this presentation, the largest blog site in Japan talked about their migration from Postgresql to MySQL. One of their reasons for moving away from Postgresql was that data size in Postgresql was too large (p. 41):
Migrating from PostgreSQL to MySQL at Cocolog, Japan's Largest Blog Community
PostgreSQL has data compression, so that should make the data size smaller. But MySQL (via the InnoDB plugin) also has compression.
Does anyone have any actual experience about how the data sizes of Postgresql & MySQL compare to each other?
MySQL uses MVCC as well, just check InnoDB. But in PostgreSQL you can change the FILLFACTOR to make space for future updates. With this, you can create a database that has space for current data but also for some future updates and deletes. When autovacuum and HOT do their things right, the size of your database can be stable.
The blog is about old versions; a lot of things have changed and PostgreSQL does a much better job at compression than it did in the old days.
Compression depends on the datatype, configuration and speed as well. You have to test to see how it works for your situation.
I did a couple of conversions from MySQL to PostgreSQL, and in all these cases PostgreSQL was about 10% smaller (MySQL 5.0 => PostgreSQL 8.3 and 8.4). This 10% was used to change the fillfactor on the most-updated tables; these were set to a fillfactor of 60 to 70. Speed was much better (no more problems with over 20 concurrent users) and data size was stable as well: no MVCC going out of control and no vacuum falling too far behind.
MySQL and PostgreSQL are two different beasts: PostgreSQL is all about reliability, whereas MySQL is popular.
Both have their storage requirements in their respective documentation:
MySQL: http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
Postgres: http://www.postgresql.org/docs/current/interactive/datatype.html
A quick comparison of the two doesn't show any flagrant "zomg Postgres requires 2 megabytes to store a bit field" type differences. I suppose Postgres could have higher metadata overhead than MySQL, or has to extend its data files in larger chunks, but I can't find anything obvious that Postgres "wastes" space for which migrating to MySQL is the cure.
I'd like to add that for large column values, PostgreSQL also takes advantage of compressing them using a "fairly simple and very fast member of the LZ family of compression techniques".
To read more about this, check out http://www.postgresql.org/docs/9.0/static/storage-toast.html
It's rather low-level and probably not necessary to know, but since you're using a blog, you may benefit from it.
About indexes,
MySQL stores the data within the index, which makes indexes huge. Postgres doesn't. This means that the storage size of a b-tree index in Postgres doesn't depend on the number of columns it spans or which data types the columns have.
Postgres also supports partial indexes (e.g. WHERE status=0), which is a very powerful feature to avoid building indexes over millions of rows when only a few hundred are needed.
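The partial-index idea can be tried out with Python's stdlib sqlite3, which shares the `CREATE INDEX ... WHERE` syntax with Postgres (the table and column names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status INTEGER)")
# 5 pending rows (status=0) among 1000 total.
conn.executemany("INSERT INTO jobs (status) VALUES (?)",
                 [(0,)] * 5 + [(1,)] * 995)

# The index covers only the pending rows, not all 1000.
conn.execute("CREATE INDEX pending_jobs ON jobs (id) WHERE status = 0")

pending = conn.execute(
    "SELECT count(*) FROM jobs WHERE status = 0").fetchone()[0]
print(pending)  # 5
```

In Postgres the syntax is identical, and the planner will use the partial index for any query whose WHERE clause implies `status = 0`, keeping the index tiny no matter how large the table grows.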
Since you're going to put a lot of data in Postgres you will probably find it practical to be able to create indexes without locking the table.
Can MySQL handle a dataset of 50 GB (only text) efficiently? If not, what database technologies should I use?
Thanks
Technically, I would say yes. MySQL can handle 50GB of data, efficiently.
If you are looking for a few examples, Facebook moved to Cassandra only after it was storing over 7 Terabytes of inbox data.
Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
Any backend that uses B-trees (like all the popular MySQL engines) gets dramatically slower once the index no longer fits in RAM. Depending on your query needs, Cassandra might be a good fit, or Lucandra (Lucene + Cassandra) -- http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/