MySQL is slow with large data [closed] - mysql

I have a MySQL database with around 30 GB of data. Some of the tables contain over 40 million rows, and I am using InnoDB. Even a simple "select count(*) from table_name" on my local PC takes around 5 minutes, so I think joining these tables will be impossible. Is there anything I can do to improve the performance, or do I need to switch to another database? I have never dealt with data this large before. Please help.

I have run MySQL instances with over 100 million rows, delivering over 30 million queries per day, so it can be done.
The problems you are experiencing will occur with any other database system if it is similarly configured.
I can only give you a few tips; if this is mission critical, consider hiring a professional to tune your system.
Basics that you need to look at:
A database of this size is best run on a dedicated server with SSD disks and at least 2 cores.
You are going to need a lot of RAM in your server, at least your total database size plus 20% for other system resources.
Make sure MySQL has been configured with enough memory, around 80% of your total RAM. The primary setting that controls this is innodb_buffer_pool_size.
Optimize your queries, and index where needed. This is a fine art but can drastically improve performance; learn to use EXPLAIN ... on your queries (a sketch follows below).
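
As a rough illustration (the table, column, and size values below are invented for the example, not taken from the question), the buffer pool setting and an EXPLAIN-driven indexing pass might look like this:

-- my.cnf, assuming a dedicated server with 64 GB of RAM (adjust to ~80% of yours):
-- [mysqld]
-- innodb_buffer_pool_size = 48G

-- Check how a query will be executed; "type: ALL" means a full table scan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If customer_id is not indexed, add an index and compare the plan again.
ALTER TABLE orders ADD INDEX idx_orders_customer (customer_id);
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;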

MySQL InnoDB tables do not keep a count of rows, thus SELECT COUNT(*) can be slow. It's not an indication of how other queries might perform, but it is an indication of how slow a full table scan might be. Five minutes is really bad for just 40 million rows and might indicate a serious problem with your database or disk.
Here is a performance blog on the subject. Also see this related answer.
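
If you only need a rough row count rather than an exact one, the InnoDB statistics are usually enough; a minimal sketch ('mydb' and 'table_name' are placeholders):

-- Approximate row count from InnoDB statistics (can be off by a noticeable margin).
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'table_name';

-- SHOW TABLE STATUS reports the same estimate.
SHOW TABLE STATUS FROM mydb LIKE 'table_name';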

I have encountered the large data size problem before and hope my experience is useful to you.
First, you need to create indexes for your tables; which kind of index to use depends on your query logic (a small sketch follows after the commands below).
Second, if the query is still slow after indexing, consider dividing the data into a hierarchy, for example source tables, intermediate tables and report tables. The report tables store only the final data, so queries against them are fast; create indexes for them as well.
Third, try something like MemSQL if the above does not meet your requirements.
Besides that, learn the profiling commands:
SET profiling = 1;
-- run some slow queries
SHOW PROFILES;
SHOW PROFILE FOR QUERY N;
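
As a small indexing sketch (the table and column names are invented for illustration): a secondary index on the column you filter on, or a composite index when you filter on two columns together, often makes the difference.

-- Single-column index for queries filtering on created_at.
CREATE INDEX idx_events_created_at ON events (created_at);

-- Composite index for queries filtering on user_id and status together.
CREATE INDEX idx_events_user_status ON events (user_id, status);

-- Verify the index is actually used.
EXPLAIN SELECT COUNT(*) FROM events WHERE user_id = 7 AND status = 'active';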

Related

Choosing Appropriate Database [closed]

I'm working on a website that needs to store a large amount of data in a single table. It will receive over 100K entries per month, kept for a minimum of 5 years, so approximately 100k × 60 months = 6 million entries.
My question is: which DBMS handles this kind of data best? MySQL, Oracle or PostgreSQL?
First of all, 6M records is not very much, so these days it should not be a problem for any mainstream DBMS. However, I see two aspects:
1) Space assessment - approximate how much space will be needed. For this you can insert several records similar to yours into a table and extrapolate to 6M records. E.g. (I used SQL Server, but the same approach works in any other DBMS, such as MySQL):
The record looks like this (4 integers and a varchar):
103 1033 15 0 The %S_MSG that starts with '%.*ls' is too long. Maximum length is %d.
I inserted about 1M rows into a table, and the space usage report returned something like:
rows reserved
1008656 268232 KB
So, it will be about 1.5GB for 6M rows.
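
For MySQL specifically, a rough equivalent of SQL Server's space report can be read from information_schema; a sketch (the schema and table names are placeholders):

-- Approximate row count plus data and index size of one table, in MB.
SELECT TABLE_ROWS,
       ROUND(DATA_LENGTH / 1024 / 1024, 1)  AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'mytable';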
2) Usage assessment - already covered by chanaka wije. If you only do SELECTs or INSERTs, no special features are required (such as support for many transactions per unit of time).
Also, in order to improve SELECT performance, you should take a look at partitioning (by time, in your case) - see here, here or here; a minimal sketch follows below.
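
A minimal range-partitioning sketch, assuming a date column named created_at (all names here are invented for the example):

-- Partition a log-style table by year of created_at.
CREATE TABLE entries (
    id BIGINT NOT NULL AUTO_INCREMENT,
    created_at DATE NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);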
It depends on how you use the table: insert-only, or frequent selects as well. I'm using a table to store web page views, about 4 million records per month, in MySQL, and I trim it every 6 months; no issues so far. If you want to run select queries, use the right storage engine: InnoDB has row-level locking, while MyISAM has table-level locking.
This is a good question. Apart from what has been suggested here, I think one issue to consider is how you connect to the database. Oracle itself only scales well if you use a connection pool (a limited, fixed number of connections). If you connect all the time, pick up some data and disconnect, don't use Oracle. Seriously, go for MySQL.
And if your application is very simple, consider the least expensive option. Don't throw Oracle at it just because it is "the best out there".

How do database views keep queries fast when there are many FK-PK relationships among the tables? [closed]

Recently I've faced performance issues when binding Java objects to the database, especially when mapping data from the database into Java code with a lot of FK-PK relationships involved. I identified the issue and solved the slowdown by creating database views and POJOs that map to those views.
I did some research online but couldn't find a good answer to this: how does the database (I am using MySQL) keep query speed fast for views?
For example, if I create a view over 10 tables joined through FK-PK relationships, querying the view is still pretty fast and displays the result quickly. What exactly happens behind the scenes in the database engine?
Indexes.
MySQL implicitly creates an index on a foreign key (i.e. an index on the columns that make up the foreign key), unless one already exists. Not all database engines do so.
A view is little more than an aliased query. As such, any view, as trivial as it may seem, could kill the server if written poorly. Execution time is proportional not to the number of joined tables but to the quality of the indexes*.
Side effect: the default index might not be the most efficient one.
*table sizes also start to matter when the tables grow large, as in millions of records
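
To see this in action, you can compare the plan of a view with the plan of the equivalent hand-written join; a sketch with made-up table names:

-- The view is just a named query over the join.
CREATE VIEW order_details AS
SELECT o.id AS order_id, c.name AS customer_name, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- With MySQL's default MERGE algorithm the view is expanded into the underlying
-- join, so both plans should match and use the same indexes (the orders primary
-- key here, plus the implicit foreign key index on customer_id in other queries).
EXPLAIN SELECT * FROM order_details WHERE order_id = 1;
EXPLAIN SELECT o.id, c.name, o.total
FROM orders o JOIN customers c ON c.id = o.customer_id
WHERE o.id = 1;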

How to improve speed in a database? [closed]

I am starting to create my first web application of my career, using MySQL.
I am going to make a table containing user information (like id, first name, last name, email, password, phone number).
Which of the following is better?
Put all the data into one single table (userinfo).
Divide the data alphabetically across many tables. For example, if a user's email is joe@gmail.com, put it into table userinfo_j, and if a user's email is kevin@gmail.com, put it into userinfo_k.
I don't want to sound condescending, but I think you should spend some time reading up on database design before tackling this project, especially the concept of normalization, which provides consistent and proven rules for how to store information in a relational database.
In general, my recommendation is to build your database to be easy to maintain and understand first and foremost. On modern hardware, a reasonably well-designed database with indexes running relational queries can support millions of records, often tens or hundreds of millions of records without performance problems.
If your database has a performance problem, tune the query first, add indexes second, and buy better hardware third; if that doesn't work, you may consider a design that makes the application harder to maintain (often called denormalization).
Your second solution will almost certainly be slower for most cases.
Relational databases are really, really fast when searching by indexed fields; searching for "email = 'joe@gmail.com'" on a reasonably designed database will be too fast to measure even with tens of millions of records.
However, including the logic to find the right table in which to search will almost certainly be slower than simply searching a single table.
Especially if you want to search by things other than email address - imagine finding all the users who signed up in the last week. Or who have permission to do a certain thing in your application. Or who have a @gmail.com account.
So, the second solution is bad from a design/maintenance point of view, and will almost certainly be slower.
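
A minimal sketch of the single-table approach with an index on email (the column list and types are just an example):

CREATE TABLE userinfo (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    phone VARCHAR(32),
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY idx_userinfo_email (email)
);

-- Indexed lookup by email: fast even with millions of rows.
SELECT id, firstname, lastname FROM userinfo WHERE email = 'joe@gmail.com';

-- Queries on other criteria still work against the one table.
SELECT COUNT(*) FROM userinfo WHERE created_at >= NOW() - INTERVAL 7 DAY;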
The first one is better. With the second, you will have to write extra logic just to figure out which table to look in. To speed up searches you can add indexes. Since you will probably do equality lookups more often than less-than or greater-than comparisons, you could try a hash index; for comparison operations, B-tree indexes are better.
Like others said, the first one is better, especially if you need to add other tables to your database and link them to the user table; the second approach quickly becomes unworkable for creating relationships as the number of tables grows.

CakePHP Database - MyISAM, InnoDB, or Postgresql [closed]

I've always just used MyISAM for all of my projects, but I am looking for a seasoned opinion before I start this next one.
I'm about to start a project that will be dealing with hundreds of thousands of rows across many tables. (Several tables may even have millions of rows as the years go on). The project will primarily need fast-read access because it is a Web App, but fast-write obviously doesn't hurt. It needs to be very scalable.
The project will also be dealing with sensitive and important information, meaning it needs to be reliable. MySQL seems to be notorious for ignoring validation.
The project is going to be using CakePHP as a framework, and I'm fairly sure it supports MySQL and PostgreSQL equally well, but if anyone disagrees with me on that, please let me know.
I was tempted to go with InnoDB, but I've heard it has terrible performance. PostgreSQL seems to be the most reliable, but it is also not as fast as MyISAM.
If I were able to upgrade the server's version of MySQL to 5.5, would InnoDB be a safer bet than Postgres? Or is MyISAM still a better fit for most needs and more scalable than the others?
The only answer that this really needs is "not MyISAM". Not if you care about your data. After all, /dev/null has truly amazing performance, but it doesn't meet your reliability requirement either ;-)
The rest is the usual MySQL vs PostgreSQL opinion that we close every time someone asks a new flavour because it really doesn't lead to much that's useful.
What's way more important than your DB choice is how you use it:
Do you cache commonly hit data that can afford to be a little stale in something like Redis or Memcached?
Do you avoid "n+1" selects from inefficient ORMs in favour of somewhat sane joins?
Do you avoid selecting lots of data you don't need?
Do you do selective cache invalidation (I use LISTEN and NOTIFY for this), or just flush the whole cache when something changes?
Do you minimize pagination and, when you must paginate, do so based on last-seen ID rather than offset? SELECT ... FROM ... WHERE id > ? ORDER BY id LIMIT 100 can be immensely faster than SELECT ... FROM ... ORDER BY id OFFSET ? LIMIT 100 (a sketch follows after this list).
Do you monitor query performance and hand-tune problem queries, create appropriate indexes, etc?
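
A sketch of the last-seen-ID pagination from the list above (the table and column names are placeholders):

-- Offset pagination: the server still reads and discards all the skipped rows.
SELECT id, title FROM articles ORDER BY id LIMIT 100 OFFSET 100000;

-- Keyset pagination: remember the last id from the previous page (here 100000)
-- and seek to it directly via the primary key.
SELECT id, title FROM articles WHERE id > 100000 ORDER BY id LIMIT 100;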
(Marked community wiki because I close-voted this question and it seems inappropriate to close-vote and answer unless it's CW).

Many Associations Leading to Slow Query [closed]

I currently have a database with a lot of many-to-many associations. I have services, which have many variations, which have many staff who can perform the variation, who in turn have details like name, role, etc.
At 10 services with 3 variations each, and up to 4 out of 20 staff attached to each service, even something as simple as getting all variations and the staff associated with them takes 4 s.
Is there a way I can speed up these queries? I've cut down the number of queries by doing eager loading in my ORM to reduce the problems that arise from N+1 issues, but 4 s is still a long time for a query at the testing stage.
Is there a structure out there that would make such nested many-to-many associations much quicker to select?
Maybe combining everything past the service level into a single table with a 'TYPE' column? I'm just not knowledgeable enough to know the solution that turns this 4 s query into a 300 ms query... Any suggestions would be helpful.
It may be possible to restructure the data to make queries more efficient. This usually implies a trade-off with redundancy (repeated values), which can overly complicate the algorithms for insert/update/delete.
Without seeing the schema, and the query (queries?) you are running, it's impossible to diagnose the problem.
I think the most likely explanation is that MySQL does not have suitable indexes available to efficiently satisfy the query (queries?) being run. Running an EXPLAIN query can be useful to show the access path, and gives insight into whether suitable indexes are available, whether indexes are even being considered, whether statistics are up to date, etc.
But you also mention "N+1" performance issues, and "eager loading", which leads me to believe that you might be using an ORM (like ADO Entity Framework, Hibernate, etc.) These are notorious sources of performance issues, issuing lots of SQL statements (N+1), OR doing a single query that does joins down several distinct paths, that produce a humongous result set, where the query is essentially doing a semi cross join.
To really diagnose the performance issue, you need the actual SQL statements being issued; in a development environment, enabling the MySQL general log will capture the SQL being issued along with rudimentary timing (a sketch follows below).
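
For reference, one way to turn the general log on temporarily in a development environment (a sketch; don't leave it enabled in production, since it records every statement):

-- Write the general log to a table so it is easy to query, then enable it.
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';

-- ... exercise the application, then inspect what the ORM actually sent:
SELECT event_time, argument FROM mysql.general_log ORDER BY event_time DESC LIMIT 50;

-- Turn it back off when done.
SET GLOBAL general_log = 'OFF';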
The table schemas would be nice to see for this question. As far as MySQL performance in general, make sure you research disk alignment, set the proper block sizes and for this particular issue check your execution plans and evaluate adding indexes.
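
As a hedged illustration of the kind of indexes a many-to-many junction table usually needs (this schema is guessed from the question, not given in it):

-- Hypothetical junction table linking variations to staff.
CREATE TABLE staff_variation (
    variation_id INT NOT NULL,
    staff_id INT NOT NULL,
    PRIMARY KEY (variation_id, staff_id),
    KEY idx_staff_variation_staff (staff_id)  -- supports lookups from the staff side
);

-- With both directions indexed, fetching all staff for a set of variations
-- becomes an index-driven join instead of repeated scans.
EXPLAIN SELECT s.name, s.role
FROM staff_variation sv
JOIN staff s ON s.id = sv.staff_id
WHERE sv.variation_id IN (1, 2, 3);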