CakePHP Database - MyISAM, InnoDB, or Postgresql [closed]

I've always just used MyISAM for all of my projects, but I am looking for a seasoned opinion before I start this next one.
I'm about to start a project that will be dealing with hundreds of thousands of rows across many tables. (Several tables may even have millions of rows as the years go on). The project will primarily need fast-read access because it is a Web App, but fast-write obviously doesn't hurt. It needs to be very scalable.
The project will also be dealing with sensitive and important information, meaning it needs to be reliable. MySQL seems to be notorious for ignoring validation.
The project is going to be using CakePHP as a framework, and I'm fairly sure it supports MySQL and Postgresql equally, but if anyone disagrees with me on that, please let me know.
I was tempted to go with InnoDB, but I've heard it has terrible performance. Postgresql seems to be the most reliable, but also is not as fast as MyISAM.
If I were able to upgrade the server's version of MySQL to 5.5, would InnoDB be a safer bet than Postgres? Or is MyISAM still a better fit for most needs and more scalable than the others?

The only answer that this really needs is "not MyISAM". Not if you care about your data. After all, /dev/null has truly amazing performance, but it doesn't meet your reliability requirement either ;-)
The rest is the usual MySQL vs PostgreSQL opinion question that we close every time someone asks a new flavour of it, because it really doesn't lead to much that's useful.
What's way more important than your DB choice is how you use it:
Do you cache commonly hit data that can afford to be a little stale in something like Redis or Memcached?
Do you avoid "n+1" selects from inefficient ORMs in favour of somewhat sane joins?
Do you avoid selecting lots of data you don't need?
Do you do selective cache invalidation (I use LISTEN and NOTIFY for this), or just flush the whole cache when something changes?
Do you minimize pagination, and when you must paginate, do so based on last-seen ID rather than offset? SELECT ... FROM ... WHERE id > ? ORDER BY id LIMIT 100 can be immensely faster than SELECT ... FROM ... ORDER BY id OFFSET ? LIMIT 100 (a sketch follows this list).
Do you monitor query performance and hand-tune problem queries, create appropriate indexes, etc?
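As a rough sketch of the keyset-pagination point (the articles table and its columns are hypothetical placeholders):

-- Offset pagination: the server still has to walk past and discard the first 100000 rows.
SELECT id, title FROM articles ORDER BY id LIMIT 100 OFFSET 100000;

-- Keyset pagination: the index on id seeks straight past the last row of the previous page.
-- :last_seen_id is the highest id returned by that previous page.
SELECT id, title FROM articles WHERE id > :last_seen_id ORDER BY id LIMIT 100;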
(Marked community wiki because I close-voted this question and it seems inappropriate to close-vote and answer unless it's CW).


Mysql is sucking for large data [closed]

I have a MySQL database with around 30 GB of data. Some of the tables contain over 40 million rows. I am using InnoDB. Simply running "select count(*) from table_name" on my local PC takes around 5 minutes, so joining tables seems impossible for me. Is there anything I could do to improve the performance, or do I need to switch to another database? I have never encountered such large data in a DB before. Please help.
I have run MySQL instances with over 100 million entries, delivering over 30 million queries per day. So it can be done.
The problems you are experiencing will occur with any other database system if similarly configured.
I can only give you a few tips; if this is mission-critical, consider hiring a professional to tweak your system.
Basics that you need to look at:
This size of database is best run on a dedicated server with SSD disks and at least 2 cores;
You're going to need a lot of RAM in your server: at least your total database size + 20% for other system resources;
Make sure MySQL has been configured with enough memory, around 80% of your total RAM; the primary setting that controls this is innodb_buffer_pool_size;
Optimize your queries, and add indexes where needed - this is a fine art but can drastically improve performance; learn to use EXPLAIN ... on your queries (a minimal sketch follows this list).
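For example, a minimal sketch (the 24G value assumes a dedicated server with roughly 32 GB of RAM, and the orders table is hypothetical):

-- In my.cnf, under [mysqld]:  innodb_buffer_pool_size = 24G
-- Check what is currently configured:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- See how a slow query is executed and whether an index is used:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If the plan shows a full table scan, add an index on the filtered column:
ALTER TABLE orders ADD INDEX idx_orders_customer_id (customer_id);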
MySQL InnoDB tables do not keep a count of rows, thus SELECT COUNT(*) can be slow. It's not an indication of how other queries might perform, but it is an indication of how slow a full table scan might be. Five minutes is really bad for just 40 million rows and might indicate a serious problem with your database or disk.
Here is a performance blog on the subject. Also see this related answer.
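If an exact count is not strictly needed, here is a rough sketch of two cheaper alternatives (your_database is a placeholder; the estimate from the statistics can be off by a sizeable margin):

-- Approximate row count from InnoDB's table statistics, no table scan:
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = 'your_database' AND table_name = 'table_name';

-- Counting a filtered subset can use an index (assuming one exists on created_at)
-- instead of scanning all 40 million rows:
SELECT COUNT(*) FROM table_name WHERE created_at >= '2016-01-01';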
I have encountered this large-data problem before, and I hope my experience is useful to you.
First, you need to create indexes for your tables; which kind of index to use depends on your query logic.
Second, if the queries are still slow after indexing, you'd better divide the data into a hierarchy, for example source tables, middle tables and report tables. The report tables store only final data, so queries against them are fast; create indexes for them as well (a sketch follows).
Third, try something like MemSQL if the above does not meet your requirements.
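A minimal sketch of the report-table idea from the second point, with hypothetical names: pre-aggregate the expensive work into a small indexed table and refresh it on a schedule.

CREATE TABLE sales_report_daily (
    report_date  DATE NOT NULL,
    product_id   INT NOT NULL,
    total_qty    BIGINT NOT NULL,
    total_amount DECIMAL(14,2) NOT NULL,
    PRIMARY KEY (report_date, product_id)
);

-- Rebuild one day from the large source table (run from a scheduled job):
REPLACE INTO sales_report_daily (report_date, product_id, total_qty, total_amount)
SELECT DATE(sold_at), product_id, SUM(qty), SUM(amount)
FROM sales
WHERE sold_at >= '2016-05-01' AND sold_at < '2016-05-02'
GROUP BY DATE(sold_at), product_id;

-- Reports now read from the small table instead of the 40-million-row source:
SELECT product_id, SUM(total_amount)
FROM sales_report_daily
WHERE report_date BETWEEN '2016-05-01' AND '2016-05-31'
GROUP BY product_id;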
Besides, learn some commands like:
SET profiling = 1;
-- run the slow query here
SHOW PROFILES;
SHOW PROFILE FOR QUERY N;   -- N is the query number reported by SHOW PROFILES

how to improve speed in database? [closed]

I am starting to create my first web application in my career using mysql.
I am going to make table which contain users information (like id, firstname, lastname, email, password, phone number).
Which of the following is better?
Put all data into one single table (userinfo).
Divide the data by the first letter of the email and put it into many tables: for example, if a user's email is joe@gmail.com it goes into table userinfo_j, and if it is kevin@gmail.com it goes into userinfo_k.
I don't want to sound condescending, but I think you should spend some time reading up on database design before tackling this project, especially the concept of normalization, which provides consistent and proven rules for how to store information in a relational database.
In general, my recommendation is to build your database to be easy to maintain and understand first and foremost. On modern hardware, a reasonably well-designed database with indexes running relational queries can support millions of records, often tens or hundreds of millions of records without performance problems.
If your database has a performance problem, tune the query first, add indexes second, buy better hardware third, and if that doesn't work, you may consider a design that makes the application harder to maintain (often called denormalization).
Your second solution will almost certainly be slower for most cases.
Relational databases are really, really fast when searching by indexed fields; searching for "email = 'joe@gmail.com'" on a reasonable database with tens of millions of records will be too fast to measure.
However, including the logic to find the right table in which to search will almost certainly be slower than searching in one table.
Especially if you want to search by things other than email address - imagine finding all the users who signed up in the last week. Or who have permission to do a certain thing in your application. Or who have a @gmail.com account.
So, the second solution is bad from a design/maintenance point of view, and will almost certainly be slower.
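To make the comparison concrete, here is a sketch of the single-table approach (columns follow the question, plus a created_at column for the "signed up in the last week" example):

CREATE TABLE userinfo (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    firstname    VARCHAR(100) NOT NULL,
    lastname     VARCHAR(100) NOT NULL,
    email        VARCHAR(255) NOT NULL,
    password     VARCHAR(255) NOT NULL,   -- store a password hash, never plain text
    phone_number VARCHAR(30),
    created_at   DATETIME NOT NULL,
    UNIQUE KEY idx_userinfo_email (email)
);

-- Lookups by email use the unique index, whatever the first letter:
SELECT id, firstname, lastname FROM userinfo WHERE email = 'joe@gmail.com';

-- Queries that would be painful across 26 per-letter tables stay trivial:
SELECT COUNT(*) FROM userinfo WHERE created_at >= NOW() - INTERVAL 7 DAY;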
The first one is better. With the second you will have to write extra logic just to find out which table to start looking in. To speed up searches you can add indexes. Here I suppose you will do equality lookups more often than less-than or greater-than comparisons, so you could try a hash index; for range comparisons a B-tree is better.
Like others said, the first one is better, especially if you need to add other tables to your database and link them to the users table; the second approach soon becomes impossible to work with and to create relationships for as the number of tables increases.

what type of database should be used to store millions of restaurants and query them [closed]

I am designing a system that will store all the available restaurants in the world, and users should be able to perform generic searches over such a large data set with very low latency. Will a normal RDBMS be enough for this, or should I go for a big-data framework like Cassandra? What should I use to make generic search efficient? What will be the best way to store comments for each restaurant?
You can use any RDBMS to store your data, but for fast search use a search engine like Lucene, which offers a variety of fast search and aggregation features.
Using Lucene directly may involve more effort, so you can use tools already built around it, such as Solr and Elasticsearch.
So the first question is: how much data do you think will be there?
Big-data approaches are more suitable for, let's say, billions of records, but of course if you don't have the proper hardware and database design, even a few million records can cause very poor performance on a MySQL server, for example.
NoSQL is more suitable for non-relational data, and I think in your case there will be many relations between the tables (for example, a restaurants table with a direct relation, via a foreign key, to a restaurant_comments table).
In this case using MySQL (the InnoDB engine) will be very useful: when you delete a restaurant, for example, all its comments can be deleted with it, saving disk space and time (a sketch follows).
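A minimal sketch of that relation with hypothetical table names; the ON DELETE CASCADE clause is what removes a restaurant's comments along with it:

CREATE TABLE restaurants (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    city VARCHAR(100) NOT NULL,
    INDEX idx_restaurants_city (city)
) ENGINE=InnoDB;

CREATE TABLE restaurant_comments (
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    restaurant_id BIGINT UNSIGNED NOT NULL,
    body          TEXT NOT NULL,
    created_at    DATETIME NOT NULL,
    CONSTRAINT fk_comments_restaurant
        FOREIGN KEY (restaurant_id) REFERENCES restaurants (id)
        ON DELETE CASCADE
) ENGINE=InnoDB;

-- Deleting a restaurant also deletes all of its comments:
DELETE FROM restaurants WHERE id = 123;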
If you plan to have no more than 100-200 million restaurants, with proper hardware (a dedicated MySQL server, or multiple servers with a load balancer), and you design your database (tables, relations, data types and indexes) well, then you will have excellent performance.
If you plan on having a lot more data and many users querying that data a lot, then you should probably consider using Apache Hadoop (with HBase or Cassandra).

How often is database optimization NOT possible? [closed]

Currently I am working on a database that requires me to take raw data from a third party and store it. The problem is that the raw data is obviously not optimized, and the people I'm building the database for don't want any data entry involved when uploading the raw data into the database; they pretty much just want to upload the data and be done with it. Some of the raw data files have empty cells all over the place and many instances of duplicate names/numbers/entries. Is there a way to still optimize the data quickly and efficiently without too much data entry or reworking each time data is uploaded, or is this an instance where optimization is impossible due to constraints? Does this happen a lot, or do I need to tell them their dream of just uploading is not workable for long-term success?
There are many ways to optimize data, and a way to optimize it for one use case may be horrible in another. There are tools that will tell you when columns hold multiple values and need to be optimized, but there is no single piece of advice that works in all cases.
Without specific details, the following is always good:
With regards to empty entries, that should not be an issue.
With regards to duplicate data, it may be worth considering a one-to-many relationship (a sketch follows this answer).
Make sure to put an index on any field you are going to search on; this will speed up your queries a lot, no matter the dataset.
As far as changing the database schema goes: rare are the schemas that do not change over time.
My advice is to think through your schema but not to over-optimize, because you cannot plan in advance what the exact usage will be. As long as it is working and there is no bottleneck, focus on other areas. If there is a bottleneck, then by all means rewrite the affected part, making sure indexes are present (consider composite indexes in some cases). Consider avoiding unions when possible, and remember the KISS principle (Keep It Simple and Sweet).
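Here is a sketch of the one-to-many suggestion for the duplicated names in the raw uploads (all table and column names are hypothetical):

-- Each duplicated name is stored once in a lookup table:
CREATE TABLE vendors (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    UNIQUE KEY idx_vendors_name (name)
);

-- The raw rows keep a small foreign key instead of repeating the name;
-- empty cells simply become NULL:
CREATE TABLE raw_entries (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    vendor_id  INT UNSIGNED NOT NULL,
    amount     DECIMAL(12,2) NULL,
    entry_date DATE NULL,
    INDEX idx_raw_entries_vendor (vendor_id),
    FOREIGN KEY (vendor_id) REFERENCES vendors (id)
);

-- During upload, insert a name only if it is new, then reuse its id:
INSERT INTO vendors (name) VALUES ('Acme Ltd')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
-- LAST_INSERT_ID() now holds the vendor id for either the new or the existing row.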

Many Associations Leading to Slow Query [closed]

I currently have a database that has a lot of many-to-many associations. I have services, which have many variations; each variation has many staff who can perform it, and each staff member has details like name, role, etc.
At 10 services with 3 variations each, and up to 4 out of 20 staff attached to each service, even something as simple as getting all variations and the staff associated with them takes 4 seconds.
Is there a way I can reduce these queries that take a while to process? I've cut down the queries by doing eager loading in my DBM to reduce the problems that arise from N+1 issues, but 4 seconds is still a long query for just the testing stage.
Is there a structure out there that would help make such nested many to many associations much quicker to select?
Maybe combining everything past the service level into a single table with a 'TYPE' column? I'm just not knowledgeable enough to know the solution that turns this 4-second query into a 300 ms query... Any suggestions would be helpful.
It may be possible to restructure the data to make queries more efficient. This usually implies a trade-off with redundancy (repeated values), which can overly complicate the algorithms for insert/update/delete.
Without seeing the schema, and the query (queries?) you are running, it's impossible to diagnose the problem.
I think the most likely explanation is that MySQL does not have suitable indexes available to efficiently satisfy the query (queries?) being run. Running an EXPLAIN query can be useful to show the access path and give insight into whether suitable indexes are available, whether indexes are even being considered, whether statistics are up to date, etc.
But you also mention "N+1" performance issues and "eager loading", which leads me to believe that you might be using an ORM (like ADO Entity Framework, Hibernate, etc.). These are notorious sources of performance issues, either issuing lots of SQL statements (N+1), or doing a single query that joins down several distinct paths and produces a humongous result set, where the query is essentially doing a semi cross join.
To really diagnose the performance issue, you would need the actual SQL statements being issued; in a development environment, enabling the MySQL general log will capture the SQL being issued along with rudimentary timing (a sketch follows).
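As a rough sketch of doing that, with hypothetical table names since the real schema isn't shown:

-- In a development environment, capture every statement the application sends:
SET GLOBAL general_log_file = '/tmp/mysql-general.log';
SET GLOBAL general_log = 'ON';

-- Then EXPLAIN the captured query; look for full scans on the join columns:
EXPLAIN
SELECT v.id, v.name, s.id, s.name
FROM variations v
JOIN staff_variations sv ON sv.variation_id = v.id
JOIN staff s             ON s.id = sv.staff_id
WHERE v.service_id = 1;

-- Junction tables usually want composite indexes covering both join directions:
ALTER TABLE staff_variations
    ADD INDEX idx_sv_variation_staff (variation_id, staff_id),
    ADD INDEX idx_sv_staff_variation (staff_id, variation_id);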
The table schemas would be nice to see for this question. As far as MySQL performance in general, make sure you research disk alignment, set the proper block sizes and for this particular issue check your execution plans and evaluate adding indexes.