Determining Best Table Structure for MySQL Performance - mysql

I'm working on a browser-based RPG for one of my websites, and right now I'm trying to determine the best way to organize my SQL tables for performance and maintenance.
Here's my question:
Does the number of columns in an SQL table affect the speed at which it can be queried?
I am not a newbie when it comes to PHP or MySQL. I used to develop things with the common goal of getting them to work, but I've recently advanced to the stage where a functional program is not good enough unless it's fast and reliable.
Anyways, right now I have a members table that has around 15 columns. It contains information such as the player's username, password, email, logins, page views, etcetera. It doesn't contain any information on the player's progress in the game, however. If I added columns for things such as army size, gold, turns, and whatnot, then it could easily rise to around 40 or 50 total columns.
Oh, and my database structure IS normalized.
Will a table with 50 columns that gets constantly queried be a bad idea? Should I split it into two tables; one for the user's general information and one for the user's game statistics?
I know I could check the query time myself, but I haven't actually created the tables yet and I think I'd be better off with some professional advice on this important decision for my game.
Thank you for your time! :)

The number of columns can have a measurable cost if you're relying on table-scans or on caching pages of table data. But the best way to get good performance is to create indexes to assist your queries. If you have indexes in place that benefit your queries, then the width of a row in the table is pretty much inconsequential. You're looking up specific rows through much faster means than scanning through the table.
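To make the point about indexes concrete, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; in MySQL you would use EXPLAIN rather than EXPLAIN QUERY PLAN). The `members` columns and the index name `idx_members_username` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide table: account columns plus game statistics.
conn.execute(
    """
    CREATE TABLE members (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        email TEXT,
        logins INTEGER DEFAULT 0,
        page_views INTEGER DEFAULT 0,
        army_size INTEGER DEFAULT 0,
        gold INTEGER DEFAULT 0,
        turns INTEGER DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO members (username, email) VALUES (?, ?)",
    [(f"player{i}", f"player{i}@example.com") for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's query plan details joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT gold, turns FROM members WHERE username = ?"

# Without an index on username the whole table has to be scanned.
plan_before = plan(query, ("player42",))

# With the index, the row is located directly, however wide the table is.
conn.execute("CREATE INDEX idx_members_username ON members (username)")
plan_after = plan(query, ("player42",))

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the table and the second a search using the index, which is the difference the answer describes.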
Here are some resources for you:
EXPLAIN Demystified
More Mastering the Art of Indexing
Based on your caveat at the end of your question, you already know that you should be measuring performance and only fixing code that has problems. Don't try to make premature optimizations.
Unfortunately, there are no one-size-fits-all rules for defining indexes. The best set of indexes needs to be custom-designed for the queries that you need to be fastest. It's hard work, requiring a lot of analysis, testing, and comparative performance measurements. It also requires a lot of reading to understand how your given RDBMS technology uses indexes.

Related

Tips on designing a MySQL database

I am designing a MySQL database for a listing website. This is my first time and I did some googling on this :)
I wanted to cross-check whether there are any problems in my approach.
So basically, I will have 5 tables. Let's say
Studio (Actual studio data points)
Studio Facilities (Like, transport, water etc)
Studio Images
Studio Reviews. (Name, stars, etc..)
Location & Locality (City,City Direction,Locality e.g Bangalore, North Bangalore, Hebbal).
So Studio table will hold the master record. Facilities, images and reviews will hold data by using studio as FK.
Location ID will be saved on the studio record.
My question now is that if I want to show a listing of all hebbal studios, I will have to perform a join of all 5 tables to show the studio data, images, reviews and facilities.
1) Is this ok? Do you see any potential problems with this approach? Are there any better approaches?
2) will the query execution time increase as the number of records increase?
Thanks
This looks like a solid approach. Databases are designed to use joins, and joins are appropriate here. Denormalization would cause more problems than your current design, especially concerning data integrity.
Will the queries get slower with more data? Of course they will but that is true of any design. Sending 10 gig of data back across the network or internet is going to be slower than sending 10 bytes.
You didn't ask, but are there things you can do to prevent the queries from getting unacceptably slow? Yes. First, make sure that foreign keys and fields that appear frequently in WHERE clauses are indexed, as well as the PKs. This will be enough to prevent horrible performance for most simple queries.
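As a rough illustration of indexing foreign keys and WHERE-clause columns, here is a sketch of the studio listing. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, only three of the five tables, and made-up table, column, and index names, so treat all of them as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE location (
        id INTEGER PRIMARY KEY,
        city TEXT NOT NULL,
        locality TEXT NOT NULL
    );
    CREATE TABLE studio (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location_id INTEGER NOT NULL REFERENCES location (id)
    );
    CREATE TABLE studio_review (
        id INTEGER PRIMARY KEY,
        studio_id INTEGER NOT NULL REFERENCES studio (id),
        stars INTEGER NOT NULL
    );

    -- Index the foreign keys and the column used in the WHERE clause.
    CREATE INDEX idx_studio_location ON studio (location_id);
    CREATE INDEX idx_review_studio ON studio_review (studio_id);
    CREATE INDEX idx_location_locality ON location (locality);

    INSERT INTO location (id, city, locality) VALUES
        (1, 'Bangalore', 'Hebbal'),
        (2, 'Bangalore', 'Indiranagar');
    INSERT INTO studio (id, name, location_id) VALUES
        (1, 'Studio A', 1), (2, 'Studio B', 1), (3, 'Studio C', 2);
    INSERT INTO studio_review (studio_id, stars) VALUES
        (1, 5), (1, 3), (2, 4), (3, 2);
    """
)

# Listing of all studios in one locality with their review summary.
listing_sql = """
    SELECT s.name, COUNT(r.id) AS reviews, AVG(r.stars) AS avg_stars
    FROM location AS l
    JOIN studio AS s ON s.location_id = l.id
    LEFT JOIN studio_review AS r ON r.studio_id = s.id
    WHERE l.locality = ?
    GROUP BY s.id, s.name
    ORDER BY s.name
"""
hebbal = conn.execute(listing_sql, ("Hebbal",)).fetchall()
print(hebbal)
```

Images and facilities would usually be fetched the same way, often in separate queries per listing page rather than in one five-way join, since joining several one-to-many tables at once multiplies rows.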
Another thing you should do is read a good book on performance tuning for the particular database backend you are using. Most of the ones I have read have at least one chapter on designing queries for performance. Read it and re-read it and re-read it until this is sunk into your bones and you would not consider writing queries any other way.
Many databases are considered slow because their developers are incompetent at writing performant queries, and the joins get blamed. Learn about sargability. Understand how indexes work and what the tradeoffs of having them are. Understand how wide your table can be before performance suffers. (Hint: sometimes two tables in a one-to-one relationship are faster than one table that would be very wide.)
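For readers new to the term, a sargable predicate is one the engine can answer by searching an index. The sketch below contrasts a non-sargable filter (the indexed column wrapped in a function) with an equivalent sargable range. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and the `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, total INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_orders_created ON orders (created_at)")
conn.executemany(
    "INSERT INTO orders (created_at, total) VALUES (?, ?)",
    [(f"2023-{month:02d}-15", month * 10) for month in range(1, 13)]
    + [("2024-01-15", 999)],
)


def plan(sql):
    """Return SQLite's query plan details joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Non-sargable: the function call hides created_at from the index.
non_sargable = (
    "SELECT total FROM orders WHERE strftime('%Y', created_at) = '2023'"
)
# Sargable: a plain range on the indexed column.
sargable = (
    "SELECT total FROM orders "
    "WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'"
)

plan_non_sargable = plan(non_sargable)
plan_sargable = plan(sargable)
same_rows = sorted(conn.execute(non_sargable)) == sorted(conn.execute(sargable))

print(plan_non_sargable)
print(plan_sargable)
```

Both queries return the same rows, but only the second lets the engine search the index instead of scanning every row.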
You cannot effectively design a database without a solid grounding in performance tuning.
You can use one master table (here it is Studio) and can use reference keys to all other tables.
It is always better to split up the tables for normalization.
Benefits of normalization
Eliminate data redundancy
Improve performance
Query optimization
Faster updates due to fewer columns in one table
Index improvement
Research normalization and you will get a good idea of how to build your DB.

Storing and analysis of historical data - What kind of Database?

I'm currently designing a system that watches the ranks / views of YouTube videos, LOTS of YouTube videos (> 500,000 and growing), on a daily basis.
I'm currently considering storing this in a MySQL database, but what disturbs me is that the table would grow into billions and trillions of rows, which I don't think would perform well.
I need to analyse this data, for example:
Which videos grew a lot in the time between X and Y
Plot the clicks per day
Plot the clicks per week ...
some more things I don't know yet about
So, what came into my web 2.0 mind was: is there a way a NoSQL database could handle this better? I haven't really learned these (fairly) new databases and don't know what they are capable of.
What would your advice be, what type of database to use?
Relational or not? If not, which NoSQL database?
PS: first priority is the fast evaluation and insertion of the results, second is high availability (or just replication)
It is very difficult to recommend a database system, because it always depends. However, considering that Facebook is built on MySQL, performance is probably not a limiting factor of MySQL for you.
What is helpful, and what you have probably already done, is to sketch what your table structure should look like. Then also think of the queries you would like to run against the tables.
If you have the right indexes (the main and crucial factor that query speed relies on), you will not have to worry much about performance in MySQL. What you should be aware of (as I had to learn from experience) is that MySQL has some interesting quirks in how it uses indexes. Here are a few examples I had to figure out over time:
if an index is used for a range scan on a column, it cannot also be used for ORDER BY on the columns that follow it in the index
a range column has to be the last column in a concatenated index for the full index to be used; the same goes for ORDER BY
For more information, a useful link on mysqlperformanceblog.com: http://www.mysqlperformanceblog.com/2009/09/12/3-ways-mysql-uses-indexes/
In general, if the structure of the database is well thought out and the indexing is good, then in my experience it does not really matter whether you have 10,000 rows or 10 billion: the time for an indexed lookup will be about the same.
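As a sketch of "think of the queries you would like to run", here is one possible layout: one row per video per day holding the cumulative view count. At one row per video per day, 500,000 videos produce about 182.5 million rows per year, far from trillions. The example uses SQLite via Python's sqlite3 module as a stand-in for MySQL, and the table name `video_stats`, its columns, and the sample data are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE video_stats (
        video_id TEXT NOT NULL,
        day TEXT NOT NULL,       -- ISO date of the snapshot
        views INTEGER NOT NULL,  -- cumulative views on that day
        PRIMARY KEY (video_id, day)
    );
    -- Second index for queries that start from a date.
    CREATE INDEX idx_stats_day ON video_stats (day, video_id);
    """
)
conn.executemany(
    "INSERT INTO video_stats VALUES (?, ?, ?)",
    [
        ("vidA", "2024-01-01", 100),
        ("vidA", "2024-01-02", 150),
        ("vidA", "2024-01-03", 400),
        ("vidB", "2024-01-01", 1000),
        ("vidB", "2024-01-02", 1010),
        ("vidB", "2024-01-03", 1020),
    ],
)


def growth_between(start_day, end_day):
    """Views gained per video between two snapshot days, largest first."""
    return conn.execute(
        """
        SELECT e.video_id, e.views - s.views AS gained
        FROM video_stats AS s
        JOIN video_stats AS e ON e.video_id = s.video_id
        WHERE s.day = ? AND e.day = ?
        ORDER BY gained DESC
        """,
        (start_day, end_day),
    ).fetchall()


def views_per_day(video_id):
    """Daily clicks derived from consecutive cumulative snapshots."""
    return conn.execute(
        """
        SELECT cur.day, cur.views - prev.views AS clicks
        FROM video_stats AS cur
        JOIN video_stats AS prev
          ON prev.video_id = cur.video_id
         AND prev.day = date(cur.day, '-1 day')
        WHERE cur.video_id = ?
        ORDER BY cur.day
        """,
        (video_id,),
    ).fetchall()


print(growth_between("2024-01-01", "2024-01-03"))
print(views_per_day("vidA"))
```

Both queries filter on the leading columns of an index, so they stay index lookups as the table grows. Weekly plots can be built from the same daily rows.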

Optimizing of database with multiple JOINs

First, some details about the website and the database structure -
With my website you can learn English words, and you can insert on each word a sentence, an association, an image, in addition - each word has a category, sub category, group...
My database includes about 20 tables. Every user who registers on my website adds something like 4000 rows to the users table, one per word on my website. I have a serious problem when a user filters words (something like a word search, but by character(s), category(ies), group(s), etc.). I have 9 JOINs in my SQL query, and it takes something like 1 minute to display results.
The target of the JOINs: inside the users table (where each user has 4000 rows, one row per word) there are joins in this style:
$this->db->join('users', 'sentences.id = users.sentence_id' ,'left');
The same thing with associations, groups, images, binds between words etc..
The users table includes id of sentences, associations, groups.. and with the JOIN there is a connection.
I don't know what to do; it takes too much time. Maybe the problem is the structure of the database? The multiple joins? Maybe I should use indexing? But how and where? Sometimes it is necessary to retrieve all the words, so indexing wouldn't help there.
I'm using MySQL.
First of all, if you're using that many joins, indexes alone may not save you (an index only helps a join when the join columns are indexed and the optimizer chooses to use it).
There are a few things you can do.
Schema Design
You probably would want to reconsider your schema design/query if you need 9 joins to achieve what you are doing!
From the looks of it, your tables are very normalized, perhaps in 3rd normal form? In that case, consider denormalizing your tables into a larger one to avoid joins (a nine-way join can be more expensive than scanning a single wider table). There is plenty of documentation online about this. However, there is always a cost: it increases development complexity and data redundancy. On the other hand, by denormalizing your tables you avoid joins and may be able to make better use of indexes.
Also, MyISAM was for a long time the only storage engine in MySQL that supported FULLTEXT indexes (InnoDB has supported them since MySQL 5.6). However, MyISAM does not have transactions or MVCC and uses table-level locking, so it depends on what you need.
Resources
I suggest you read the book High Performance MySQL.
It is a truly awesome book on tuning MySQL databases.
I also suggest having a read at the official documentation on your chosen storage engine. This is significant as each storage engine is VERY DIFFERENT! InnoDB is completely different from MyISAM which is also completely different from PBXT. Each engine has its benefits and you will have to consider which one fits your situation.
I would draw out the relational schema and work out the number of operations for the queries you are running, and go from there. Most DBMS's attempt to optimise queries implicitly, but not always optimally. You should look into re-ordering the joins so that the most restrictive are carried out first. Indexes could help, and again, would require some analysis to find which attributes you are searching on.
Building databases to deal with natural language is a very challenging subject and there is a lot of research on the subject. Have you looked into Markov chains? Have you taken a step back and thought about the computational complexity of what you are trying to do? If you arrive at the same conclusion of nine joins, then it may be fair to say that the problem is not scalable enough for a real-time application.
As an aside, I believe Google App Engine's data store attempts to index attributes for you, with implicit scalability. If you're running your database on a small web server, then you may see better results deploying it with a more comprehensive DBMS. I would only look into this as a last resort, however.

Is it better to create tables based on content or views?

I'm learning MySQL and am working on a database for work. Everything's fine so far, but I have a question. I am organizing financial statements for firms (balance sheet table, income statement table, cash flow table, etc.), and most companies have quarterly statements (which are unaudited) and annual statements (which are audited). Right now, for each statement I have a column that flags it as annual or quarterly.
It's not likely that someone will run a report on audited and unaudited statements at the same time, so I was wondering whether it would be worth creating one table for audited and one for unaudited statements. My reasoning is that the data will eventually get fairly large, and I thought the smaller the tables, the faster the performance.
So when I design a database, should I design based on the content (i.e. group everything that's the same regardless) or should I group based on how people will access it?
Another question this raises is whether I should group financial statements by country, since 90% of the analysis done at our firm is within a single country.
This is impossible to answer definitively without knowing the whole problem.
However, usually you want a single table to represent each logical entity in your system. From the sound of it, quarterly and annual statements represent the same logical entity, but differ by a single category column/field. The same holds true for the country question--if the only difference is the country (a categorization), then they likely should all be stored in the same table.
If you were to split your data into separate tables by category, your data would be scattered across multiple tables, and would be very hard to query. For example, if you wanted a count of all statements in the system, you would have to query ALL country tables and add the results together.
Edit: Joe Celko calls this anti-pattern "Attribute Splitting".
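A small sketch of the single-table approach described above: the statement type and the country are ordinary columns, and an index on them serves the common filter without splitting the data. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, and the `statement` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE statement (
        id INTEGER PRIMARY KEY,
        company TEXT NOT NULL,
        country TEXT NOT NULL,
        period_type TEXT NOT NULL
            CHECK (period_type IN ('annual', 'quarterly')),
        period_end TEXT NOT NULL
    );
    -- One index covers the usual "type within a country" filter.
    CREATE INDEX idx_statement_type_country
        ON statement (period_type, country);

    INSERT INTO statement (company, country, period_type, period_end) VALUES
        ('Acme', 'US', 'annual', '2023-12-31'),
        ('Acme', 'US', 'quarterly', '2023-09-30'),
        ('Globex', 'US', 'annual', '2023-12-31'),
        ('Initech', 'CA', 'quarterly', '2023-09-30');
    """
)

# Counting everything is a single query, not one query per split table.
total = conn.execute("SELECT COUNT(*) FROM statement").fetchone()[0]

# The category columns still let reports narrow down to one slice.
annual_us = conn.execute(
    "SELECT company FROM statement "
    "WHERE period_type = ? AND country = ? ORDER BY company",
    ("annual", "US"),
).fetchall()

print(total, annual_us)
```

With separate tables per type or per country, the count would need a UNION across every table, and each new country would mean a schema change.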
First of all I have to point out, I'm not a professional DB designer.
But if I were you, in this case I would create one table, as the entities are basically the same.
If you are worried about MySQL's performance on larger datasets, maybe it would be better to start building your app on Postgres. You can boost MySQL's performance with stored functions/procedures, or maybe views if you have to run complicated queries, and of course you can use memcache or any NoSQL store to give the SQL database a rest.
If you are sure that users will mainly search for only one type of record or the other, you can build three tables: one for all of the records, and one each for the audited and unaudited ones. You can keep them synchronized with MySQL triggers (ON UPDATE/DELETE/INSERT). They would work like views, but I think (not tested) they would be faster than views. In this case you only have to manage the first "large" table. If you insert an audited record, the trigger fires and puts a record into the audited table, and so on...
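The trigger idea can be sketched roughly as follows. This uses SQLite via Python's sqlite3 module as a stand-in, so the trigger syntax is SQLite's (MySQL has no WHEN clause on triggers and would use FOR EACH ROW with an IF inside the body). The table and trigger names are hypothetical, and only the audited side with INSERT and DELETE is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE statement (
        id INTEGER PRIMARY KEY,
        company TEXT NOT NULL,
        audited INTEGER NOT NULL  -- 1 = audited, 0 = unaudited
    );
    CREATE TABLE audited_statement (
        id INTEGER PRIMARY KEY,
        company TEXT NOT NULL
    );

    -- Copy audited rows into the side table as they are inserted.
    CREATE TRIGGER statement_after_insert AFTER INSERT ON statement
    WHEN NEW.audited = 1
    BEGIN
        INSERT INTO audited_statement (id, company)
        VALUES (NEW.id, NEW.company);
    END;

    -- Remove the copy when the master row is deleted.
    CREATE TRIGGER statement_after_delete AFTER DELETE ON statement
    BEGIN
        DELETE FROM audited_statement WHERE id = OLD.id;
    END;
    """
)

conn.execute("INSERT INTO statement (id, company, audited) VALUES (1, 'Acme', 1)")
conn.execute("INSERT INTO statement (id, company, audited) VALUES (2, 'Globex', 0)")
audited_after_insert = conn.execute(
    "SELECT id, company FROM audited_statement"
).fetchall()

conn.execute("DELETE FROM statement WHERE id = 1")
audited_after_delete = conn.execute(
    "SELECT id, company FROM audited_statement"
).fetchall()

print(audited_after_insert, audited_after_delete)
```

Note that every write now touches two tables, which is the maintenance cost of this design compared with a single indexed table.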
Best wishes!
I agree with Phil and Damien - one table is better. What you want is one table per type of real business thing. If you design your tables to resemble real things, even abstract or conceptual things, then your data design is more likely to stand the test of time. Once you've sketched out a schema based on the real things you have data about, then you can go back and apply the rules of normalization to formalize your design.
As a rule, it is a bad idea to design for a performance problem you are worried about, but haven't actually seen. Your intuition about big tables being slower might actually be wrong. Most DBMS systems like bigger tables, at least to a point. When tables are big the query optimizers choose to use indexes. When tables are small they often end up getting full table scans, which can really slow down concurrent access. If your tables get so big that they are beyond the capabilities of your DBMS then it is time to consider either archiving out old data that you aren't using anymore or buying a more scalable DBMS.

Database structure - To join or not to join

We're drawing up the database structure for a new app with the help of MySQL Workbench, and the number of joins required to produce a listing of the data is increasing drastically as the number of many-to-many relationships grows.
The application will be quite read-heavy and have a couple of hundred thousand rows per table.
The questions:
Is it really that bad to merge tables where needed and thereby reduce joins?
Should we start looking at horizontal partitioning? (in conjunction with merging tables)
Is there a better way than pivot tables to take care of many-to-many relationships?
We discussed instead storing all data in serialized text columns and having the application do the sorting instead of the database, but this seems like a very bad idea, even though the database will be heavily cached. What do you think?
Go with the normalized form of the database. For most tasks you won't need more than 3 or 4 joins, and you can still write views for the most common joins. Denormalization will force you to always think about updating fields in multiple places/tables when changing one property, and will surely lead to more problems than benefits.
If you worry about reporting performance then you still can extract the data in timed batches into separate tables to get the desired performance for your reporting queries. If it's for query simplicity you can use views.
In inverse order:
Forget it. Use the database. People saying "make it in the application" are pretty often those ignorant of the amount of work that goes into writing databases.
Depends on exact need.
Depends on exact need. OLTP (transaction processing): go for fifth normal form. OLAP (analytical processing): go for a proper star schema and denormalize to get optimal performance. Mixed: forget it. It does not work for larger installs because the theories are different... except if you make the database OLTP and then use a special OLAP cube database (which MySQL does not have).
Databases are designed to handle lots of joins. Use this feature as it will make many kinds of data manipulation in the database much easier. Otherwise, why not just use a flat file?
As always, it depends on your application, but in general, too much denormalisation can come back and bite you later on. A well normalised database means that you should be able to query your data in most ways that you may need later on, particularly for reporting (which often is an afterthought).
If you stick all your data in serialized text columns and your client asks for a report showing all rows that have a particular attribute, then you're going to have to do a bunch of string manipulation to get this data out.
If you're worried about too many joins for your queries, you could consider exposing certain sets of the data as a view...
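As an illustration of exposing a common join as a view, here is a sketch with a many-to-many relationship resolved through a junction table. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, and the `item`, `tag`, and `item_tag` tables and the `item_listing` view are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Junction table for the many-to-many relationship.
    CREATE TABLE item_tag (
        item_id INTEGER NOT NULL REFERENCES item (id),
        tag_id INTEGER NOT NULL REFERENCES tag (id),
        PRIMARY KEY (item_id, tag_id)
    );
    CREATE INDEX idx_item_tag_tag ON item_tag (tag_id);

    -- The view hides the two joins from the application code.
    CREATE VIEW item_listing AS
    SELECT i.id AS item_id, i.title, t.name AS tag
    FROM item AS i
    JOIN item_tag AS it ON it.item_id = i.id
    JOIN tag AS t ON t.id = it.tag_id;

    INSERT INTO item (id, title) VALUES
        (1, 'Apple'), (2, 'Banana'), (3, 'Cherry');
    INSERT INTO tag (id, name) VALUES (1, 'red'), (2, 'yellow');
    INSERT INTO item_tag (item_id, tag_id) VALUES (1, 1), (2, 2), (3, 1);
    """
)

red_items = conn.execute(
    "SELECT title FROM item_listing WHERE tag = ? ORDER BY title",
    ("red",),
).fetchall()
print(red_items)
```

The data stays normalized, while callers query the view as if it were one flat table.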
If you make sure to index the foreign keys (you did set up foreign keys didn't you?) and have proper where clauses in your queries, 10-15 joins should be easily handled by a database. Especially with so few rows. I have queries with that many joins on tables with millions of rows and they run fine.
Usually it is better to partition data than to denormalize.
As far as denormalizing goes, don't do it unless you also institute a strategy for keeping the denormalized data in sync with the parent table.
As to whether you really need that many tables or if your design is bad, well the only way we could comment on that is if we saw the table structure.
Unless you have clear evidence that performance is suffering because of the joins, stay normalised. Otherwise, as others have said, you'll have to worry about multiple updates.
Especially if the database is heavily cached, as you say, you'll be surprised how quick the DBMS is at doing this kind of thing - it is what it's designed for, after all.
Unless it's the sort of monster application, with huge amounts of data, that demands special performance optimisations, you'll find that keeping down the development, testing, and later, maintenance effort, will be much more important.
Joins are good, usually, not bad. They allow you to keep the data where it should be, which gives you maximum flexibility.
And as has been said many times, premature optimisation is usually bad, not good.