Recently I found the two MySQL functions COMPRESS and UNCOMPRESS, and I'm planning to use them in my database to compress some XML documents (around 40 million of them, 600 GB+) to reduce my database size. Unfortunately, I haven't found anything about the performance and RAM/hardware usage of those functions.
How would it perform, in terms of speed and server RAM usage, to run hundreds or thousands of queries like this per minute?
SELECT CAST(UNCOMPRESS(document) AS CHAR(10000))
FROM documents;
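For reference, here is a minimal sketch of how the data would be written and read back (the documents table layout and the sample XML are illustrative assumptions):

-- COMPRESS() returns a binary string, so the column should be a BLOB type, not TEXT.
CREATE TABLE documents (
    id       BIGINT UNSIGNED PRIMARY KEY,
    document LONGBLOB NOT NULL
);

-- Compress on the way in...
INSERT INTO documents (id, document)
VALUES (1, COMPRESS('<xml>...</xml>'));

-- ...and decompress on the way out.
SELECT CAST(UNCOMPRESS(document) AS CHAR(10000)) AS xml_text
FROM documents
WHERE id = 1;

-- Compare stored (compressed) size vs. original size for one row.
SELECT LENGTH(document) AS stored_bytes,
       UNCOMPRESSED_LENGTH(document) AS original_bytes
FROM documents
WHERE id = 1;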
Related
I have a nearly 2.54 GB database, which holds tens of millions of listings.
I have now optimized my MySQL queries as well as I can, but it still takes 10 to 12 seconds to get the data. Can anyone help me with what I should do now?
There are several things you could do:
If it's feasible, optimize your database by choosing the data sizes and types which fit best;
Add indexes to the most searched columns in your queries (see the sketch after this list);
Choose the right configuration parameters for your database. You should use MySQLTuner-perl and/or the database configuration wizard from Percona (free registration required). Remember, tuning MySQL is a trial-and-error process; there is no "right" configuration, only one that works better for you. For instance, you could find that you get better performance with a large query cache, or with the query cache disabled altogether;
You could move your database to an SSD to reduce disk access times.
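For the indexing point above, a minimal sketch (table, column, and index names are purely illustrative):

-- Add an index on a column that appears in your WHERE/JOIN clauses.
ALTER TABLE listings ADD INDEX idx_listings_city (city);

-- Check that the optimizer actually uses it: look for key = idx_listings_city
-- and a low rows estimate in the EXPLAIN output.
EXPLAIN SELECT * FROM listings WHERE city = 'Berlin';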
I have a Drupal site with a MySQL database at my hosting provider. They tell me that my DB is almost at its limit of about 1 GB, but when I export the DB to a dump file, the file is only 80 MB for the whole DB.
It would be logical for the DB to be smaller than my export file, or about the same size, but when the DB on the hosting is almost 10 times larger than the export file, that seems impossible to me.
Can you help me figure out whether this is possible, or whether my hosting provider is manipulating my data and messaging me every day to get me to pay for more DB storage?
Functioning MySQL databases use plenty of data storage for such things as indexes. A 12x ratio between your raw (exported) data size and the disk space your database uses is a large ratio, but it's still very possible.
MySQL makes the tradeoff of using lots of disk space to improve query performance. That's a wise tradeoff these days because the cost of disk space is low, and decreasing very fast.
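To see where that space actually goes, you can compare data, index, and free space per table with a query like this (the figures reported by information_schema are approximate):

SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb,
       ROUND(data_free    / 1024 / 1024, 1) AS free_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY data_length + index_length DESC;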
You may (or may not) recover some disk space by going into phpMyAdmin and optimizing all your tables. This statement will optimize a list of tables:
OPTIMIZE TABLE tablename, tablename, tablename
It's smart to back up your database before doing this.
A 1GiB limit on a MySQL database size is absurdly low. (A GiB of storage at AWS or another cloud provider costs less than $US 0.20 per month these days.) It may be time to get a better hosting service.
Basic info:
My MySQL database uses TokuDB, InnoDB, and MyISAM tables.
Server info:
16 cores, 64 GB RAM, CentOS 6.2, MySQL 5.5
Process:
1. Import a large amount of data from one text file into one TokuDB table.
2. Select data by joining different tables.
When processes 1 and 2 run at the same time, the whole operation is much slower.
Does anyone know the specific reason?
Any suggestions to improve it?
Separate the I/O onto different disks/arrays. Having all I/O on a single partition/array results in horrible performance. If possible, invest in a dedicated drive array such as IBM's DS3524 or an HP Smart Array. Connecting the DB server through Fibre Channel (or better yet SAS2) will give you an incredible performance gain. I stopped putting lots of disks into the server itself a few years ago. I get 5x the performance with MySQL on a drive array compared to disks in the server.
In TokuDB, LOAD DATA INFILE works much faster when importing into empty tables (especially when you have a non-auto-increment primary key or a unique index).
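A rough sketch of that approach (the file path, table, and column names are illustrative): create the table empty with only the primary key, bulk load, then add secondary indexes afterwards.

-- Empty TokuDB table with just the primary key.
CREATE TABLE staging_import (
    id    BIGINT NOT NULL,
    value VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=TokuDB;

-- Bulk load the text file (tab-separated in this example).
LOAD DATA INFILE '/tmp/import.txt'
INTO TABLE staging_import
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

-- Add secondary indexes after the load, not before.
ALTER TABLE staging_import ADD INDEX idx_value (value);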
Are there any tools for MySQL that are equivalent to pgpool-II (which is for PostgreSQL)?
Of particular importance for me is the following feature:
Load Balance
If a database is replicated, executing a SELECT query on any server will return the same result. pgpool-II takes an advantage of the replication feature to reduce the load on each PostgreSQL server by distributing SELECT queries among multiple servers, improving system's overall throughput. At best, performance improves proportionally to the number of PostgreSQL servers. Load balance works best in a situation where there are a lot of users executing many queries at the same time.
I'm developing a database that holds large scientific datasets. Typical usage scenario is that on the order of 5GB of new data will be written to the database every day; 5GB will also be deleted each day. The total database size will be around 50GB. The server I'm running on will not be able to store the entire dataset in memory.
I've structured the database such that the main data table is just a key/value store consisting of a unique ID and a Value.
Queries are typically for around 100 consecutive values,
e.g. SELECT Value WHERE ID BETWEEN 7000000 AND 7000100;
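For context, the table and query look roughly like this (names and the Value column type are simplified):

CREATE TABLE datastore (
    ID    BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    Value BLOB
) ENGINE=MyISAM;

SELECT Value
FROM datastore
WHERE ID BETWEEN 7000000 AND 7000100;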
I'm currently using MySQL / MyISAM, and these queries take on the order of 0.1 - 0.3 seconds, but recently I've come to realize that MySQL is probably not the optimal solution for what is basically a large key/value store.
Before I start doing lots of work installing the new software and rewriting the whole database I wanted to get a rough idea of whether I am likely to see a significant performance boost when using a NoSQL DB (e.g. Tokyo Tyrant, Cassandra, MongoDB) instead of MySQL for these types of retrievals.
Thanks
Please also consider OrientDB. It uses indexes based on an RB+Tree algorithm. In my tests with a 100 GB database, reads of 100 items took 0.001-0.015 seconds on my laptop, but it depends on how the keys/values are distributed inside the index.
Making your own test with it should take less than an hour.
One piece of bad news is that OrientDB doesn't support a clustered configuration yet (planned for September 2010).
I use MongoDB in production for a write-intensive operation where I do well over the rates you are referring to for both WRITE and READ operations. The database is around 90 GB, and a single instance (Amazon m1.xlarge) does 100 QPS. I can tell you that a typical key->value query takes about 1-15 ms on a database with 150M entries, with query times reaching 30-50 ms under heavy load.
At any rate, 200 ms is way too much for a key/value store.
If you only use a single commodity server, I would suggest MongoDB, as it is quite efficient and easy to learn.
If you are looking for a distributed solution, you can try any Dynamo clone:
Cassandra (Facebook) or Project Voldemort (LinkedIn) being the most popular.
Keep in mind that looking for strong consistency slows down these systems quite a bit.
I would expect Cassandra to do better where the dataset does not fit in memory than a b-tree based system like TC, MySQL, or MongoDB. Of course, Cassandra is also designed so that if you need more performance, it's trivial to add more machines to support your workload.