I'm wondering what the best way is to do this:
1. A person submits a review (name, email, webhost, domain hosted, ranking data [five numerical factors], etc.)
2. PHP inserts that review into the "SubmittedReviews" table
3. I then go over the submitted reviews in the back end and approve the ones I want
4. PHP inserts that info into another table called "LiveReviews" (which has the same table structure as "SubmittedReviews")
(Or it might be better to have PHP create a table for each host, containing that host's reviews, since there will be many hosts and I'm going to make a separate table to pre-calculate ranking data for my "top hosts" table on the site.)
So as PHP publishes the reviews (either into a new table for each host or into the "LiveReviews" table), I will also submit the ranking data into another table that adds up all the ranking data for each host, so it is readily available and I know which hosts rank highest.
Or should I just use PHP to calculate the ranking data from LiveReviews on the spot whenever I want to know which host ranks best? Since the front page will load this data often, it doesn't seem good to calculate it every time. I'd rather have it calculated beforehand.
So if I have this "ranking data" table, it seems I should have a table for each host with all its reviews. Otherwise, just having one large table (LiveReviews) seems better.
Hope this makes sense!
What is the best way? I'm pretty new to MySQL.
I must admit that I don't understand exactly what you're trying to achieve. What exactly are your users supposed to review?
However I do have two pieces of advice:
Never organize data by splitting it across different tables. It's heresy in the world of relational databases! And your queries will end up unnecessarily complicated.
Also, consider doing the calculations in the database, not in PHP.
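A minimal sketch of both pieces of advice, using Python's built-in `sqlite3` as a stand-in for MySQL: one `live_reviews` table for all hosts, with the ranking computed by the database in a single `GROUP BY` query. The column names `f1` to `f5` for the five ranking factors are hypothetical, and scoring a host by the mean of its per-review averages is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for every host's reviews; no per-host tables.
conn.execute("""
    CREATE TABLE live_reviews (
        id INTEGER PRIMARY KEY,
        host TEXT NOT NULL,
        f1 INTEGER, f2 INTEGER, f3 INTEGER, f4 INTEGER, f5 INTEGER
    )
""")
# An index on host keeps per-host lookups fast as the table grows.
conn.execute("CREATE INDEX idx_live_reviews_host ON live_reviews (host)")
conn.executemany(
    "INSERT INTO live_reviews (host, f1, f2, f3, f4, f5) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("hostA", 5, 4, 5, 4, 5),
        ("hostA", 3, 3, 4, 3, 3),
        ("hostB", 4, 4, 4, 4, 4),
    ],
)

def top_hosts(conn, limit=10):
    """Average the five factors per review, then per host, highest first."""
    return conn.execute(
        """
        SELECT host,
               AVG((f1 + f2 + f3 + f4 + f5) / 5.0) AS score,
               COUNT(*) AS reviews
        FROM live_reviews
        GROUP BY host
        ORDER BY score DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

print(top_hosts(conn))
```

If the front page really can't afford this query on every load, the same SELECT could be run when a review is approved and its result cached or stored in a summary table; the reviews themselves still stay in one table.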
Learning database modeling is a good idea. Entity-Relationship diagrams are not hard but very useful.
Good luck.
I've recently built a simple survey with a competition element. To enter the competition, the user was required to enter their email address, under the promise of anonymity.
In an effort to anonymize the data, the emails were stored in a separate table, with no foreign keys linking to survey data.
However, with a little knowledge, you can see it's quite easy to line up the two result sets and work out who owns each set of survey data. Everybody also opted in, which makes this even easier. Even without timestamps and auto-increment columns, MySQL in practice returns the rows in the order they were inserted.
So it got me wondering: is there a clever way of preventing this? Some method of randomising the emails table on insert?
Obviously I know this could probably be done with an app-side callback, but I was looking for something more elegant on the MySQL side.
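One possible approach, sketched under assumptions: give each entrant row a random primary key instead of relying on insertion order, and always read the emails back ordered by that key. Python's built-in `sqlite3` stands in for MySQL here; a `WITHOUT ROWID` table stores rows in primary-key order, much as an InnoDB table is clustered on its primary key. The table and column names (`entrants`, `entry_key`) and the helper `add_entrant` are hypothetical. This only removes the ordering signal from this table; the survey table must not carry a correlating key either, and server logs (such as a binary log) may still record the insert order.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes entry_key the clustered key, so rows are stored in
# random-key order rather than in insertion order.
conn.execute("""
    CREATE TABLE entrants (
        entry_key TEXT PRIMARY KEY,
        email TEXT NOT NULL
    ) WITHOUT ROWID
""")

def add_entrant(conn, email):
    # 128 random bits per row; a collision is vanishingly unlikely.
    conn.execute(
        "INSERT INTO entrants (entry_key, email) VALUES (?, ?)",
        (secrets.token_hex(16), email),
    )

for email in ["a@example.com", "b@example.com", "c@example.com"]:
    add_entrant(conn, email)

# Read with an explicit ORDER BY on the random key; without ORDER BY,
# no database guarantees any particular row order.
emails = [row[0] for row in conn.execute("SELECT email FROM entrants ORDER BY entry_key")]
print(emails)
```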
I will describe a problem using a specific scenario:
Imagine that you create a website to which users can register,
and after they register, they can send Private Messages to each other.
This website enables every user to maintain his own Friends list,
and also maintain a Blocked Users list, from which he prefers not to get messages.
Now the problem:
Imagine this website getting to several millions of users,
and let's also assume that every user has about 10 Friends in the Friends table, and 10 Blocked Users in the Blocked Users table.
The Friends list Table, and the Blocked Users table, will become very long,
but worse than that, every time someone wants to send a message to another person "X",
we need to go over the whole Blocked Users table, and look for records that the user "X" defined - people he blocked.
This "scanning" of a long database table, each time a message is sent from one user to another, seems quite inefficient to me.
So I have 2 questions about it:
What are possible solutions for this problem?
I am not afraid of long database tables,
but I am afraid of database tables that contain data for so many users,
which means that the whole table needs to be scanned every time, just to pull out a few records from it for that specific user.
A specific solution that I have in my mind, and that I would like to ask about:
One solution that I have in mind for this problem is that every user who registers on the website will have his own "mini-database" dynamically (and programmatically) created for him,
that way the Friends table and the Blocked Users table will contain only records for him.
This makes scanning those tables very easy, because all the records are his.
Does this idea exist in databases like MS SQL Server or MySQL? And if so, is it a good solution for the described problem?
(each user will have his own small database created for him, and of course there is also the main (common) database for all other data that is not user specific)
Thank you all
I would hold off on partitioning and on the mini-database idea. Is your database installed with the data, log and temp files on different RAID drives? Do you have clustered indexes on the tables, and indexes on the search and join columns?
Have you tried reading the query plans to see how and where the slowdowns are occurring? Don't just add memory or try advanced features blindly before doing the basics.
Creating separate databases will become a maintenance nightmare and it will be challenging to do the type of queries (for all users....) that you will probably like to do in the future.
Partitioning is a wonderful feature of SQL Server, and while SQL Server 2014 lets you have thousands of partitions, you probably won't see the big performance bump you are looking for (unless you put each partition on a separate drive).
SQL Server has very fast response times even for tables with tens of millions of rows (in your case, the user table). Don't let the main table get too wide, and the response time will be extremely fast.
Right off the bat my first thought is this:
https://msdn.microsoft.com/en-us/library/ms188730.aspx
Partitioning can allow you to break it up into more manageable pieces and in a way that can be scalable. There will be some choices you have to make about how you break it up, but I believe this is the right path for you.
Regarding table scanning: with proper indexing, your queries should be getting index seeks. You will want to look at the execution plans to know for sure, though.
As for a mini-database for each user, that is roughly what you can accomplish with partitioning.
Mini-Database for each user is a definite no-go zone.
Plus, on a side note: with a separate table holding just two columns, UserID and BlockedUserID, both INT and correctly indexed, you cannot go wrong, as long as you write your queries sensibly :)
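A minimal sketch of that two-column table, using Python's built-in `sqlite3` as a stand-in for SQL Server or MySQL; the names `blocked_users`, `CHECK` and `can_send` are hypothetical. The composite primary key gives each (user, blocked user) pair one indexed row, so checking whether recipient X blocked the sender is a single index lookup rather than a scan of the whole table. `EXPLAIN QUERY PLAN` is SQLite's counterpart to reading an execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (user, blocked user) pair. The composite primary key is backed
# by an index, so lookups by user_id (or by both columns) are seeks, not scans.
conn.execute("""
    CREATE TABLE blocked_users (
        user_id INTEGER NOT NULL,
        blocked_user_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, blocked_user_id)
    )
""")
# User 1 blocked users 2 and 3; user 4 blocked user 1.
conn.executemany(
    "INSERT INTO blocked_users (user_id, blocked_user_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (4, 1)],
)

CHECK = "SELECT 1 FROM blocked_users WHERE user_id = ? AND blocked_user_id = ?"

def can_send(conn, sender_id, recipient_id):
    """True unless the recipient has blocked the sender."""
    return conn.execute(CHECK, (recipient_id, sender_id)).fetchone() is None

print(can_send(conn, 2, 1))  # False: user 1 blocked user 2
print(can_send(conn, 1, 2))  # True
plan = conn.execute("EXPLAIN QUERY PLAN " + CHECK, (1, 2)).fetchall()
print(plan[0][-1])  # should describe a SEARCH using the primary-key index
```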
Look into table partitioning; a well-normalized database with decent indexes will also help.
Also, if you can afford an Enterprise license, table partitioning combined with the table schema described in the previous point will make a very good, query-friendly database schema.
I did this once for a social network system. Maybe take a look at your normalization. At the time, I had a [Relationship] table with just these columns:
UserAId Int
UserBId Int
RelationshipFlag Smallint
With 1 million users, each with 10 "friends", that table grew to 10 million rows. That was not a problem, since we put indexes on the columns, and it could retrieve a list of all "related" usersB for a specific userA in no time.
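The same idea as a minimal sketch, keeping the answer's column names and using Python's built-in `sqlite3` in place of the real server. The flag values `FRIEND` and `BLOCKED`, the index name and the helper `related_users` are assumptions; the answer doesn't say what `RelationshipFlag` encodes. With the primary key on (`UserAId`, `UserBId`) and an extra index on (`UserAId`, `RelationshipFlag`), listing one user's friends or blocked users touches only that user's rows.

```python
import sqlite3

FRIEND, BLOCKED = 1, 2  # hypothetical RelationshipFlag values

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Relationship (
        UserAId INTEGER NOT NULL,
        UserBId INTEGER NOT NULL,
        RelationshipFlag SMALLINT NOT NULL,
        PRIMARY KEY (UserAId, UserBId)
    );
    -- Lets "all users related to A with flag F" be an index range seek.
    CREATE INDEX IX_Relationship_A_Flag ON Relationship (UserAId, RelationshipFlag);
""")
conn.executemany(
    "INSERT INTO Relationship (UserAId, UserBId, RelationshipFlag) VALUES (?, ?, ?)",
    [(1, 2, FRIEND), (1, 3, FRIEND), (1, 4, BLOCKED), (2, 1, FRIEND)],
)

def related_users(conn, user_a_id, flag):
    """All UserBId values that user_a_id has the given relationship with."""
    rows = conn.execute(
        "SELECT UserBId FROM Relationship"
        " WHERE UserAId = ? AND RelationshipFlag = ?"
        " ORDER BY UserBId",
        (user_a_id, flag),
    )
    return [row[0] for row in rows]

print(related_users(conn, 1, FRIEND))   # [2, 3]
print(related_users(conn, 1, BLOCKED))  # [4]
```

One table with a flag replaces separate Friends and Blocked Users tables, so both lists come from the same indexed structure.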
Take a good look at your schema and your indexes; if they are OK, your DB will not have problems handling it.
Edit
I agree with @M.Ali
Mini-Database for each user is a definite no-go zone.
IMHO you are fine if you stick with the basics and implement them the right way.
I'm working on a basic restaurant reservation system and was thinking about using Amazon DynamoDB for this project. That said, I'm not even sure DynamoDB is suitable for something like this, or whether I should stick to MySQL on RDS, since some of the queries may be quite complex.
Functionality I need:
User will submit a "Find Table" form with date, time and party size.
Check RESTAURANT table to see whether the date and party size are even allowed.
Check BLOCKED table for blocked dates (holidays and other closures)
Check HOURS table making sure the restaurant is even open.
Check TABLEINFO table for a table based on party size AND compare with RESERVATION table making sure the table is not already reserved for another guest during the same time
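For comparison, the last check in the list above is a single query in a relational database. This is a sketch under assumptions: Python's built-in `sqlite3` stands in for MySQL, and the lowercase table names, the columns `seats`, `starts_at` and `ends_at`, and the helper `free_tables` are hypothetical. A table is free when no existing reservation overlaps the requested slot, and two time ranges overlap exactly when each one starts before the other ends. The RESTAURANT, BLOCKED and HOURS checks would be further simple lookups and are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_info (
        table_id INTEGER PRIMARY KEY,
        restaurant_id INTEGER NOT NULL,
        seats INTEGER NOT NULL
    );
    CREATE TABLE reservation (
        reservation_id INTEGER PRIMARY KEY,
        table_id INTEGER NOT NULL REFERENCES table_info (table_id),
        starts_at TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM' compares correctly as text
        ends_at TEXT NOT NULL
    );
    CREATE INDEX idx_reservation_table_time ON reservation (table_id, starts_at);
    INSERT INTO table_info VALUES (1, 1, 2), (2, 1, 4), (3, 1, 6);
    INSERT INTO reservation (table_id, starts_at, ends_at)
        VALUES (2, '2024-05-01 19:00', '2024-05-01 21:00');
""")

def free_tables(conn, restaurant_id, party_size, starts_at, ends_at):
    """Tables big enough for the party with no overlapping reservation."""
    rows = conn.execute(
        """
        SELECT t.table_id
        FROM table_info AS t
        WHERE t.restaurant_id = ?
          AND t.seats >= ?
          AND NOT EXISTS (
              SELECT 1 FROM reservation AS r
              WHERE r.table_id = t.table_id
                AND r.starts_at < ?   -- existing starts before the new one ends
                AND r.ends_at > ?     -- and ends after the new one starts
          )
        ORDER BY t.seats, t.table_id
        """,
        (restaurant_id, party_size, ends_at, starts_at),
    )
    return [row[0] for row in rows]

print(free_tables(conn, 1, 4, "2024-05-01 20:00", "2024-05-01 22:00"))  # [3]
print(free_tables(conn, 1, 4, "2024-05-01 21:00", "2024-05-01 23:00"))  # [2, 3]
```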
Any suggestions or tips on the DynamoDB database design especially hash & range use for something like this?
Or do you think MySQL database is better suited for this kind of app?
This is a quick DB design to give you a better idea of what I'm trying to do.
I've done a lot with relational databases, and a bit with NoSQL databases (just so you know where I'm coming from). IMHO, NoSQL databases are best suited to scenarios where one or more of the following are true:
The data is essentially flat (not a lot of relations, almost like an old flat file).
There's a definite "parent" type of record with "child" records that are small enough, or accessed with the parent frequently enough, to justify embedding them right in the record.
You need the freedom to add/populate fields within reason. I like to think of it like inheritance, where every item in the table shares some common traits (ID, Name), but different records might have different traits. For example, an online product catalog might have books, bikes, and MP3 songs in it. A record for a "book" item would have stuff like ISBN, number of pages, author, etc. A "bike" might have wheel size and color, and an "MP3" would have length, artist, genre, etc. You'd never get all of those things into an "item" table in an RDS without some serious overloading or leaving fields empty. A NoSQL database would allow you to store all of that info in the table, and only for the items that need it.
You can definitely build the schema included in your question using the indexing abilities of Dynamo, but you'd be trying to make a NoSQL database act like an RDS.
That said: I myself would try it with Dynamo first as a learning experience. :)
I am designing a system which has a database for storing users and information related to the users. More specifically each user in the table has very little information. Something like Name, Password, uid.
Then each user has zero or more containers, and the way I've initially done this is to create a second table in the database which holds containers and have a field referencing the user owning it. So something like containerName, content, owner.
So a query on data from a container would look something like:
SELECT content
FROM containers
WHERE (containerName='someContainer' AND owner='someOwner');
My question is whether this is a good approach. I'm thinking about scalability: say we have thousands of users with about 5 containers each (each user could have a different number of containers, but 5 would probably be typical). My concern is that searching the database will become slow when I only want 5 entries out of 5*1000 in one query. (We may typically only want a specific container's content, so we are looking through the database with an overhead of 4995 entries, am I right?) And what happens if I get a million users? It would become a huge table, which intuitively feels like a bad idea.
A second approach I considered would be to have a table per user; however, that doesn't feel like a very good solution either, since it would give me 1000 tables in the database, which (also intuitively) seems like a bad way to do it.
Any help in understanding how to design this would be greatly appreciated. I hope it's all clear and easy to follow.
The accepted way of handling this is to create an INDEX on the owner field. That way, MySQL can optimize queries with owner = 'some value' conditions.
See also: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
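A minimal sketch of the index suggestion, with Python's built-in `sqlite3` standing in for MySQL and the question's `containers` table filled with 1,000 users with 5 containers each. The index name and sample data are hypothetical, and the index covers (`owner`, `containerName`) rather than `owner` alone, so the query from the question can go straight to the matching row instead of reading all 5,000. `EXPLAIN QUERY PLAN` should report a SEARCH using the index rather than a full SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE containers (
        containerName TEXT NOT NULL,
        content TEXT,
        owner TEXT NOT NULL
    )
""")
# Composite index: owner first (every query filters on it), then containerName.
conn.execute("CREATE INDEX idx_containers_owner_name ON containers (owner, containerName)")
conn.executemany(
    "INSERT INTO containers (containerName, content, owner) VALUES (?, ?, ?)",
    [(f"box{i}", f"data{u}-{i}", f"user{u}") for u in range(1000) for i in range(5)],
)

query = "SELECT content FROM containers WHERE containerName = ? AND owner = ?"
print(conn.execute(query, ("box3", "user42")).fetchone())  # ('data42-3',)
detail = conn.execute("EXPLAIN QUERY PLAN " + query, ("box3", "user42")).fetchall()[0][-1]
print(detail)  # a SEARCH using idx_containers_owner_name, not a full SCAN
```

If container names are unique per user, declaring a UNIQUE constraint on (owner, containerName) would enforce that and provide the same index.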
You're right that 1000 tables is not scalable. Once you start reaching a few million records, you might want to consider sharding (splitting records across several locations based on user attributes)... but by that time you'd already be quite successful, I think ;-)
If it is an RDBMS (like Oracle or MySQL), you can create indexes on frequently queried columns to speed up table traversal and queries. Indexes are created automatically for primary keys and, in some systems (such as MySQL's InnoDB), for foreign keys.
I am working on an app right now which has the potential to grow quite large. The whole application runs through a single domain, with customers being given sub-domains, which means that it all, of course, runs through a common code-base.
What I am struggling with is the database design. I am not sure if it would be better to have a column in each table specifying the customer id, or to create a new set of tables (in the same database), or to create a complete new database per customer.
The nice thing about a "flag" column specifying the customer ID is that everything is in a single location. The downsides are obvious: tables can (will) get huge, and maintenance can become a complete nightmare. If growth occurs, splitting this up over several servers is going to be a huge pain.
The nice thing about creating new tables is that it is easy to do and keeps the tables pretty small. And since customers' data doesn't need to interact, there aren't any problems there. But again, maintenance might become an issue (although I do have a migrations library that will do updates on the fly per customer, so that is no big deal). The other issue is that I have no idea how many tables a single database can hold. Does anyone know what the limit is, and what the performance issues would be?
The nice thing about creating a new database per customer is that when I need to scale, I will be able to do so quite nicely. Several sites use this design (wordpress.com, etc.). It has been shown to be effective, but it also has some downsides.
So, basically I am just looking for some advice on which direction I should (could) go.
Single Database Pros
One database to maintain. One database to rule them all, and in the darkness - bind them...
One connection string
Can use Clustering
Separate Database per Customer Pros
Support for customization on per customer basis
Security: No chance of customers seeing each others data
Conclusion
The separate database approach would be valid if you plan to support customer customization. Otherwise, I don't see the security as a big issue - if someone gets the db credentials, do you really think they won't see what other databases are on that server?
Multiple Databases.
Different customers will have different needs, and it will allow you to serve them better.
Furthermore, if a particular customer is hammering the database, you don't want that to negatively affect the site performance for all your other customers. If everything is on one database, you have no damage control mechanism.
The risk of accidentally sharing data between customers is much smaller with separate database. If you'd like to have all data in one place, for example for reporting, set up a reporting database the customers cannot access.
Separate databases allow you to roll out, and test, a bugfix for just one customer.
MySQL itself has no limit on the number of tables (though the underlying file system may limit the number of files that represent them), so you can create an enormous number of them. I'd call anything above a hundred tables per database a maintenance nightmare, though.
Are you planning to develop a Cloud App?
I think that you don't need to create tables or databases per customer. I recommend using a more scalable relational database management system. Personally, I don't know the capabilities of MySQL, but I'm pretty sure it should support a distributed database model in order to handle the load.
Creating tables or databases per customer can lead you into a maintenance nightmare.
I have worked with multi-company databases where every table contains customer IDs, and we created views per customer to access each customer's data (for reporting purposes).
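A minimal sketch of that shared-table-plus-views setup, using Python's built-in `sqlite3` as a stand-in; the `invoices` table, its columns, the helper `create_customer_view` and the `invoices_customer_<id>` view naming are all hypothetical. Each view filters the shared table on `customer_id`, so reporting queries for one customer never have to repeat the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    CREATE INDEX idx_invoices_customer ON invoices (customer_id);
    INSERT INTO invoices (customer_id, amount_cents)
        VALUES (1, 1000), (1, 2500), (2, 700);
""")

def create_customer_view(conn, customer_id):
    # Views cannot take bound parameters, so the id is forced to int
    # before being embedded in the DDL.
    cid = int(customer_id)
    conn.execute(
        f"CREATE VIEW invoices_customer_{cid} AS "
        f"SELECT invoice_id, amount_cents FROM invoices WHERE customer_id = {cid}"
    )

create_customer_view(conn, 1)
create_customer_view(conn, 2)
print(conn.execute("SELECT SUM(amount_cents) FROM invoices_customer_1").fetchone())  # (3500,)
```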
Good luck,
You can do whatever you want.
If you've got the customer_id in each table, then you've got to write the whole application that way. That's not exactly true, though: it should be enough to add that column to only some tables, and the rest can be handled with simple joins.
If you've got one database per customer, there won't be any additional code in the application, so that could be easier.
If you take the first approach, there won't be a problem moving to many databases later, as you can keep the customer_id column in all those tables. Of course, that column will then hold the same value in every row of a given customer's database, but that's not a problem.
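A rough sketch of why the `customer_id` column makes that move straightforward: each customer's rows can be selected by that column and copied into a database of their own. Python's built-in `sqlite3` with in-memory databases attached via `ATTACH DATABASE` stands in for separate MySQL databases or servers; the `orders` table, the helper `split_by_customer` and the `customer_<id>` naming are hypothetical.

```python
import sqlite3

shared = sqlite3.connect(":memory:")
shared.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        item TEXT NOT NULL
    );
    INSERT INTO orders (customer_id, item) VALUES (1, 'a'), (2, 'b'), (1, 'c');
""")

def split_by_customer(conn):
    """Copy each customer's rows into its own attached database."""
    customer_ids = [
        row[0]
        for row in conn.execute("SELECT DISTINCT customer_id FROM orders ORDER BY customer_id")
    ]
    for cid in customer_ids:
        schema = f"customer_{int(cid)}"  # int() keeps the identifier safe to embed
        # ATTACH is not allowed inside a transaction, hence the commit below.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
        conn.execute(
            f"CREATE TABLE {schema}.orders ("
            " order_id INTEGER PRIMARY KEY,"
            " customer_id INTEGER NOT NULL,"
            " item TEXT NOT NULL)"
        )
        conn.execute(
            f"INSERT INTO {schema}.orders SELECT * FROM main.orders WHERE customer_id = ?",
            (cid,),
        )
        conn.commit()
    return customer_ids

ids = split_by_customer(shared)
print(ids)  # [1, 2]
print(shared.execute("SELECT item FROM customer_1.orders ORDER BY order_id").fetchall())
# [('a',), ('c',)]
```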
Personally, I'd take the simple one-customer-one-database approach. It's easier to use more database servers for all customers, and more difficult to show a customer data that belongs to another customer.