I'm writing a game for a social network, and we have a lot of formulas for weapon stats, item stats, etc. that depend on the player's characteristics. Formulas look like
player.money += 10 * player.level
It looks like a good idea to just store functions like these in the DB and let the game designer enter them through the admin site.
But I'm not sure about this. What problems can occur with this approach?
Thank you.
The issue you need to contend with is not whether you should use a database (you should) but the proper design of one. Consider the object hierarchy you have in the game and how that would be reflected in a database.
Making a database column for each property is a bad idea in that it is too rigid. You want to look at a "property bag" approach where you have look-up tables for most of it that can be indexed for performance.
Think of a model like this:
itemId, propertyId, propertyValue
For higher performance, combine this with something like Memcached.
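A minimal sketch of that property bag in SQL (the table and column names here are illustrative, not from the original post):

-- Hypothetical property-bag table: one row per item property.
CREATE TABLE item_properties (
    itemId        INT          NOT NULL,
    propertyId    INT          NOT NULL,
    propertyValue VARCHAR(255) NOT NULL,
    PRIMARY KEY (itemId, propertyId)   -- doubles as the lookup index
);

-- Fetch every property of one item with a single indexed query.
SELECT propertyId, propertyValue
FROM item_properties
WHERE itemId = 42;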
This would be an OK approach. As Jon pointed out, it's not really "code", more like item/equipment attributes.
Pros
Easy to modify
Cons
Requires a lookup for each item or piece of equipment
The fact that you will be doing a lookup for each item could be a performance bottleneck. If you decide to use a DB in the end, I would suggest pulling data from it once and then caching it for, say, the rest of the session. The item attributes aren't likely to change frequently, and this will limit the queries to the database.
I have a table named 'Customers' which has all the information about users of different types (users, drivers, admins), and I cannot separate this table right now because it's running in production and this is not a proper time to do this.
So suppose I make 3 views: the first has the 'user' type only, the second has drivers, and the third has admins.
My goal is to use 3 models instead of one in the project I'm working on.
So, is this a good solution, and what does it cost in performance?
How big is your table 'Customers'? Judging by the name, it doesn't sound like a heavy one.
How often will these views be queried?
Do you have any indexes or PK constraints on the attribute you're going to use in the WHERE clause for the views?
I cannot separate this table right now because it's running in production and this is not a proper time to do this.
From what you said, it sounds like a temporary solution, so it is probably a good one. Later you can replace the views with three tables, and it will not affect the interface.
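As a rough sketch, assuming the table has a type column that distinguishes the three kinds of rows (the column name is a guess on my part):

CREATE VIEW users_v   AS SELECT * FROM Customers WHERE type = 'user';
CREATE VIEW drivers_v AS SELECT * FROM Customers WHERE type = 'driver';
CREATE VIEW admins_v  AS SELECT * FROM Customers WHERE type = 'admin';

If you later split Customers into three real tables, code that reads from the views keeps working as long as the names and columns stay the same.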
I suggest that it is improper to give end-users logins directly into the database. Instead, all requests should go through a database-access layer (API) that users must log into. This layer can provide the filtering you require without (perhaps) any impact to the user. The layer would, while constructing the needed SELECT, tack on, for example, AND type = 'admin' to achieve the goal.
For performance, you might also need to have type at the beginning of some of the INDEXes.
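For example, a composite index that leads with type lets such filtered queries narrow the scan immediately (the index name and second column are illustrative):

CREATE INDEX idx_customers_type ON Customers (type, name);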
I'm just wondering which solution to choose to implement a follower system?
In MySQL I would have a table like
userID INT NOT NULL,
followID INT NOT NULL,
PRIMARY KEY (userID, followID)
And in Redis I would just use a SET per userID and add all the followIDs to it.
What would be faster for, let's say, someone having 2000 followers when you want to list all the followers? (in a table that has about 1M entries)
What would be faster to find out if two Users follow each other?
Thank you very much!
By modern standards, 1M items are nothing. Any database or NoSQL system will work fine with such volume, so you just have to pick the one you are the most comfortable with.
In terms of absolute performance, Redis will be faster than MySQL for this use case, because:
the whole dataset will be in memory
hash tables are faster than btrees
there is no SQL query to parse or execute
However, please note a relational database is far more flexible than a key/value store like Redis. If you can anticipate all the access paths to your data, then Redis is a good solution. Otherwise you will be better served by a more traditional database.
In my opinion, go with MySQL.
The two biggest points you will think about when making the decision are:
1) Have you thought about your use-cases?
You said you want to implement a follower system. If you're only going to be displaying a list of followers which each user has, then the Redis SET will be enough.
But what if you want to get "a list of users which you are currently following"? You can't dig that up easily from your Redis SET, right? Or how about if you wanted to know whether User-X is following User-A? If User-A had 10,000 followers, this wouldn't be easy either, would it?
MySQL is much more flexible when querying different types of results in different scenarios; a sketch of such queries follows below.
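For illustration, here is what those lookups might look like in SQL against the two-column table from the question (I'm calling it follows, and assuming userID is the follower and followID is the user being followed, which the question doesn't specify):

-- Who follows user 1?
SELECT userID FROM follows WHERE followID = 1;

-- Whom does user 1 follow?
SELECT followID FROM follows WHERE userID = 1;

-- Do users 1 and 2 follow each other?
SELECT COUNT(*) = 2 AS mutual
FROM follows
WHERE (userID = 1 AND followID = 2)
   OR (userID = 2 AND followID = 1);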
2) Do you really need the performance difference?
As you know, Redis IS faster than MySQL in these kinds of cases.
It is a simple Key-Value system, so it will exceed the performance of MySQL.
Check out performance results like these:
http://colinhowe.wordpress.com/2009/04/27/redis-vs-mysql/
http://ruturaj.net/redis-memcached-tokyo-tyrant-and-mysql-comparision/
But the performance difference between Redis and MySQL only really starts to kick in after about 5,000 requests/sec.
Below that, you wouldn't be seeing a difference of more than 50 ms.
The performance difference will not be an issue until you have VERY large traffic.
So, after thinking about these two points, MySQL would be a better answer.
Redis will be good only if:
1) The purpose of the set/list is specific, and there is no need for flexibility in the future
2) You feel that the performance difference will actually have an effect on your architecture.
It depends on what you want to do with the data. You gave some examples, but it does not sound as though you are giving a full definition of what the product needs to do. If all you really want to do is show users whether they follow each other, then either is fine, as you are just talking about two simple queries. However, what if you want to show two users the intersection of the users they share, or to make suggestions from the data based on the users' profiles? Then it becomes more interesting, as Redis has functionality to give you the intersection of sets very, very quickly. We're talking differences of orders of magnitude in speed, not just milliseconds, and the gap grows as there are more users/relationships to parse, because the SQL joins required to get the data can become prohibitive if you want to serve it in real time.
sadd friends:alex george paul bart
sadd friends:alice mary sarah bart
sinterstore friends:alex_alice friends:alex friends:alice
Note that the above can be done with MySQL as well, but your performance will suffer, and it is something you would more likely run as a batch job, storing the results for future use. On the other hand, keep in mind that the largest "friends" network in the world, Facebook, started with MySQL to store relationships. The graphs of those relationships were batched and heavily denormalized for storage in thousands of memcached servers to get decent performance.
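For comparison, a sketch of the same intersection in MySQL, assuming a two-column friends(userID, friendID) table (names invented for the example); this self-join is exactly the part that gets expensive as the network grows:

-- Friends that users 1 and 2 have in common.
SELECT a.friendID
FROM friends AS a
JOIN friends AS b ON b.friendID = a.friendID
WHERE a.userID = 1
  AND b.userID = 2;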
Then, if you are looking for more options beyond MySQL or Redis, you might want to read what Michael Stonebraker (he helped create Postgres and Ingres) has to say about using an RDBMS for graph data such as friend relationships: http://gigaom.com/2011/07/07/facebook-trapped-in-mysql-fate-worse-than-death/. Of course, he's trying to sell his new VoltDB, but it is interesting food for thought.
So I think you really need to map out the requirements for the app (as I assume it will do more than just show you who your friends are) in terms of expected load (did you just throw out 2000, or is that really what you expect to handle?), features, and budget. Then really examine the many different options on the market.
I'm currently designing a web application using PHP, JavaScript, and MySQL. I'm considering two options for the database.
Having a master table for all the tournaments, with basic information stored there along with a tournament id. Then I would create divisions, brackets, matches, etc. tables with the tournament id appended to each table name. Then when accessing that tournament, I would simply do something like "SELECT * FROM BRACKETS_[insert tournamentID here]".
My other option is to just have generic brackets, divisions, matches, etc. tables, with each record linked to the appropriate tournament (or matches to brackets, brackets to divisions, etc.) by a foreign key in the appropriate column.
My concern with the first approach is that it's a bit too on-the-fly for me, and it seems like the database could get messy very quickly. My concern with the second approach is performance. This program will hopefully have a national if not international reach, and I'm concerned that with so many records in a single table, and so many people possibly hitting it at the same time, it could cause problems.
I'm not a complete newb when it comes to database management; however, this is the first one I've done completely solo, so any and all help is appreciated. Thanks!
Do not create tables for each tournament. A table is a type of an entity, not an instance of an entity. Maintainability and scalability would be horrible if you mix up those concepts. You even say so yourself:
This program will hopefully have a national if not international reach, and I'm concerned that with so many records in a single table, and so many people possibly hitting it at the same time, it could cause problems.
How on Earth would you scale to that level if you need to create a whole table for each record?
Regarding the performance of your second approach, why are you concerned? Do you have specific metrics to back up those concerns? Relational databases tend to be very good at querying relational data. So keep your data relational. Don't try to be creative and undermine the design of the database technology you're using.
You've named a few types of entities:
Tournament
Division
Bracket
Match
Competitor
etc.
These sound like tables to me. Manage your indexes based on how you query the data (that is, don't over-index or you'll pay for it on inserts/updates/deletes). Normalize the data appropriately, denormalize where audits and reporting are more prevalent, etc. If you're worried about performance, then keep an eye on the query execution plans for the ways in which you access the data. Slight tweaks can make a big difference.
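A minimal sketch of that schema, trimmed to the keys (all names and columns are illustrative):

CREATE TABLE tournaments (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE divisions (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    tournament_id INT NOT NULL,
    FOREIGN KEY (tournament_id) REFERENCES tournaments (id)
);

CREATE TABLE brackets (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    division_id INT NOT NULL,
    FOREIGN KEY (division_id) REFERENCES divisions (id)
);

CREATE TABLE matches (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    bracket_id INT NOT NULL,
    played_at  DATETIME,
    FOREIGN KEY (bracket_id) REFERENCES brackets (id)
);

-- One parameterized query then serves every tournament:
SELECT b.*
FROM brackets b
JOIN divisions d ON d.id = b.division_id
WHERE d.tournament_id = ?;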
Don't prematurely optimize. It adds complexity without any actual reason.
First, find the entities that you will need to store; things like tournament, event, team, competitor, prize etc. Each of these entities will probably be tables.
It is standard practice to have a primary key for each of them. Sometimes there are columns (or groups of columns) that uniquely identify a row, so you can use those as the primary key. However, it's usually best just to have a numeric column named ID or something similar; it will be faster and easier for the RDBMS to create and use indexes on such a column.
Store the data where it belongs: I expect to see the date and time of an event in the events table, not in the prizes table.
Another crucial point is conforming to the First normal form, since that assures data atomicity. This is important because it will save you a lot of headache later on. By doing this correctly, you will also have the correct number of tables.
Last but not least: add relevant indexes to the columns that appear most often in queries. This will help a lot with performance. Don't worry about tables having too many rows; RDBMSes these days handle tables with hundreds of millions of rows, and they're designed to do that efficiently.
Besides compromising the quality and maintainability of your code (as others have pointed out), it's questionable whether you'd actually gain any performance either.
When you execute...
SELECT * FROM BRACKETS_XXX
...the DBMS needs to find the table whose name matches "BRACKETS_XXX", and that search is done in the DBMS's data dictionary, which is itself a bunch of tables. So you are replacing a search within your tables with a search within the data dictionary tables. You pay the price of the search either way.
(The dictionary tables may or may not be "real" tables, and may or may not have performance characteristics similar to real tables, but I bet those characteristics are unlikely to be better than those of "normal" tables for large numbers of rows. Also, the performance of the data dictionary is unlikely to be documented, and you really shouldn't rely on undocumented features.)
Also, the DBMS would suddenly need to prepare many more SQL statements (since they are now different statements, referring to separate tables), which would put additional pressure on performance.
The idea of creating new tables whenever a new instance of an item appears is really bad, sorry.
A (surely incomplete) list of why this is a bad idea:
Your code will need to automatically add tables whenever a new Division or whatever is created. This is definitely a bad practice and should be limited to extremely niche cases - which yours definitely isn't.
In case you decide to add or revise the table structure later (e.g. adding a new field), you will have to apply the change to hundreds of tables, which will be cumbersome, error-prone, and a big maintenance headache.
An RDBMS is built to scale in terms of rows, not tables and their associated elements (indexes, triggers, constraints), so you are working against your tool instead of with it.
THIS ONE SHOULD BE THE REAL CLINCHER - how do you plan to handle requests like "list all matches which were played on a Sunday" or "find the most recent three brackets where Frank Perry was active"? (see the sketch after this list)
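With a single matches table (like the one sketched in the earlier answer), the first request is one ordinary query; with per-tournament tables it becomes one query per table, stitched together with dynamic SQL. A sketch, assuming a played_at column:

-- All matches played on a Sunday, across every tournament at once.
SELECT *
FROM matches
WHERE DAYOFWEEK(played_at) = 1;   -- in MySQL, 1 means Sunday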
You say:
I'm not a complete newb when it comes to database management; however, this is the first one I've done completely solo...
Can you remember another project where tables were cloned whenever a new set was required? If yes, didn't you notice some problems with that approach? If not, have you considered that this is precisely what a DBA would never ever do for any reason whatsoever?
We need to implement a search filter (Netlog-like) for my social networking site against user profiles; the filters include age range, gender, and interests.
We have approx 1M profiles running on MySQL. MySQL doesn't seem the right option to implement such filters, so we are looking at Cassandra as well.
So what is the best way to implement such a filter? The results need to be very quick,
e.g. age = 18 - 24 and gender = male and interest = Football
Age is stored as a DATE; gender and interests are VARCHAR.
EDITED:
Let me rephrase the problem: how can I get the fastest results for any type of search?
It could be on the basis of profile name, or any other profile attribute, across 1M profile records.
Thanks
It would serve your project well to make an underlying SQL change. You might want to consider changing the interest column from a free-input field (VARCHAR) to a tag (many-to-many via an additional table, for example).
You used the example of Football and having a LIKE operator on it. If you change it to a tag, you will have an initial structural problem of deciding where to place:
football
Football
American Football
Australian-rules football
But once you have done so, the tags will help your SELECT statements run much faster.
Without this change, you will be pushing your data management problem from a database (which is equipped to handle it) to Java (which might not be).
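A sketch of the tag approach, with invented table and column names (profiles is assumed to have id, gender, and birth_date columns):

CREATE TABLE interests (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE
);

-- Many-to-many link between profiles and interests.
CREATE TABLE profile_interests (
    profile_id  INT NOT NULL,
    interest_id INT NOT NULL,
    PRIMARY KEY (profile_id, interest_id)
);

-- Indexed equality joins instead of LIKE scans over free text.
SELECT p.*
FROM profiles p
JOIN profile_interests pi ON pi.profile_id = p.id
JOIN interests i ON i.id = pi.interest_id
WHERE i.name = 'Football'
  AND p.gender = 'male'
  AND p.birth_date BETWEEN CURDATE() - INTERVAL 24 YEAR
                       AND CURDATE() - INTERVAL 18 YEAR;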
It may make sense to try to optimize your query (there may at least be some things you can do). It sounds like you have a large database, and if you are returning a large result set and filtering the results with Java, you may get performance issues because of all the data kept in cache.
If this is the case, one thing you could try is caching the results outside of the database and reading from there. This is something that Hibernate does very well, but you could implement your own version if needed. If this is something you are interested in, Memcached is a good starting place.
I just noticed this for MySQL. I do not know how efficient it is, but MySQL has some built-in full-text search functions that may help speed things up.
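For example (again assuming a profiles table with a name column; note that FULLTEXT indexes were long MyISAM-only, with InnoDB support arriving in MySQL 5.6):

ALTER TABLE profiles ADD FULLTEXT INDEX ft_name (name);

SELECT *
FROM profiles
WHERE MATCH (name) AGAINST ('john' IN NATURAL LANGUAGE MODE);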
Say I have an entity that will have many attributes, some I know about now and others that will be user-defined. What's the best way to model this?
1) Do I have a main table and relate it to a secondary name-value pair table? All the attributes go in the secondary EAV table.
OR -
2) Do I put the most common attributes (not all users will need them, so I expect a lot of NULL entries) in the main table and have the secondary EAV table for the user defined attributes?
OR -
3) Some other approach I have not thought of?
You may use solution two for efficiency reasons, in particular if you often need to select on these quantities. These columns can act as a "cache" of the EAV table, if you want: you introduce duplication but speed up lookups.
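A minimal sketch of that hybrid, with illustrative names:

-- Common, frequently queried attributes live in real columns...
CREATE TABLE entities (
    id     INT AUTO_INCREMENT PRIMARY KEY,
    name   VARCHAR(100) NOT NULL,
    color  VARCHAR(30),       -- common attribute, NULL when unused
    weight DECIMAL(8,2)       -- common attribute, NULL when unused
);

-- ...while user-defined attributes go to the EAV side table.
CREATE TABLE entity_attributes (
    entity_id  INT          NOT NULL,
    attr_name  VARCHAR(50)  NOT NULL,
    attr_value VARCHAR(255) NOT NULL,
    PRIMARY KEY (entity_id, attr_name)
);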
EAV is a good solution for this problem unless you have to perform joins at the DB level. An alternative is to move away from the relational model to an RDF-based model.
Typically, lots of empty cells are cheap and not worth normalizing away. The only drawbacks to #2 are if you have a very large number of rows (millions, where performance problems could arise), a very large number of columns (more than about 20, where it just becomes annoying to look at the data), or a number of unique constraints needed on the EAV table.
With that said, it is now 2011, and it makes sense to use a programming framework with a database abstraction layer these days so that you're not designing database relationships directly. Something like Django's object-relational mapper allows you to focus on the models themselves and lets best practices take care of themselves (95% of the time). This tutorial will help you get started. Django only applies to web-development database modeling; for non-web environments, other frameworks will be better.
I've done a lot of work with the EAV pattern, and it has served the purpose well enough. I find empty columns, or dynamic columns (like col1, col2, etc.), much harder to manage after the fact, but they can be easier to query since you don't need as many joins.
One thing I would very strongly recommend is taking a look at options like MongoDB, which natively handles complex, dynamic data structures.