MySQL Article Database Structure? - mysql

I want to create an article database for my users, and I was wondering what my MySQL database structure should look like.
Here is what my table structure looks like so far.
CREATE TABLE users_articles (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
user_id INT UNSIGNED NOT NULL,
PRIMARY KEY (id)
);

Presumably you will store your articles separately from your users (to satisfy 3NF). To that end, I'd start with something like:
Users:
UserId int primary key.
Other user-specific data (name, address, affiliations, ...).
Articles:
ArticleId int primary key.
UserId references Users(UserId).
ArticleText varchar(big-enough-to-hold-article).
The data types for the primary keys are in your hands (they don't affect the 3NF aspect).
Whether you wish to split the article text into paragraphs or add keywords to articles and so on is expansion from that. This is where you should be starting from.
These are the things that come to mind immediately that I'd be looking at beyond the basic structure given above.
Keywords or search terms for articles, kept in another two tables, one to hold the keywords themselves and the other to hold a many-to-many relationship between keywords and articles.
Synopses of the articles, can simply be another column in the Articles table.
Articles that end up needing more than the maximum space allotted, in which case the article text can be split out to another table with a foreign key reference to Articles(ArticleId) and a sequence number to order the article pieces.
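To make the outline above concrete, here is a minimal sketch of the Users / Articles / Keywords layout. It uses Python's built-in sqlite3 module with an in-memory database purely as a runnable stand-in for MySQL; the ArticleKeywords table name, the Name and Synopsis columns, the helper function and the sample rows are my own assumptions, not something from the question.

```python
import sqlite3

# SQLite (in-memory) stands in for MySQL here; names beyond Users/Articles
# and the sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Users (
    UserId INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE Articles (
    ArticleId   INTEGER PRIMARY KEY,
    UserId      INTEGER NOT NULL REFERENCES Users(UserId),
    Synopsis    TEXT,
    ArticleText TEXT NOT NULL
);
CREATE TABLE Keywords (
    KeywordId INTEGER PRIMARY KEY,
    Keyword   TEXT NOT NULL UNIQUE
);
-- many-to-many link between articles and keywords
CREATE TABLE ArticleKeywords (
    ArticleId INTEGER NOT NULL REFERENCES Articles(ArticleId),
    KeywordId INTEGER NOT NULL REFERENCES Keywords(KeywordId),
    PRIMARY KEY (ArticleId, KeywordId)
);
""")
conn.execute("INSERT INTO Users VALUES (1, 'alice')")
conn.execute("INSERT INTO Articles VALUES (10, 1, 'Short intro', 'Full text...')")
conn.execute("INSERT INTO Keywords VALUES (100, 'mysql')")
conn.execute("INSERT INTO ArticleKeywords VALUES (10, 100)")


def articles_for_keyword(keyword):
    """Return (ArticleId, author name) pairs for articles tagged with keyword."""
    return conn.execute(
        """SELECT a.ArticleId, u.Name
           FROM Articles a
           JOIN Users u ON u.UserId = a.UserId
           JOIN ArticleKeywords ak ON ak.ArticleId = a.ArticleId
           JOIN Keywords k ON k.KeywordId = ak.KeywordId
           WHERE k.Keyword = ?""",
        (keyword,),
    ).fetchall()
```

The same DDL should carry over to MySQL with the usual type changes (INT UNSIGNED AUTO_INCREMENT keys, TEXT or MEDIUMTEXT for the article body).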

The answer really depends on the spec of your entire application rather than just the article table.
Looking at the create statement in the question, it looks like it could be a join table, many users have many articles (many to many). In that case you may want to use only the user_id and article_id and make them the primary key together, but then where are the user and article tables and what information do you want to store in those tables?
Do you have at least a rough spec of how the entire application will work?
Are there other data elements that will relate to the articles?
Do you need to consider the possibility of expanding the scope of your application in the future?
This article on Database Normalization may help you further.

The article's text could be made searchable by using MySQL's support for full-text indexing and searching.
But understanding the trade-offs of using such indexes is not necessarily a beginner's topic.

Related

A separate table for the posts which each user has liked - practical or not?

In a social networking site I'm making, I need some way to store which posts a user has 'liked', to ensure they can only 'like' each post one time. I have several ideas.
A separate table for each user, to store all of the different posts' IDs which they've liked as rows in said table.
A space-separated string of post IDs as a field in the table users.
A separate table for each post, storing all of the different users' IDs which have liked said post as rows in the table.
note, users is a table containing all of the site's users, with their ID, username, etc.
Initially I liked the idea of a separate table for each user, but I realised this might be more trouble than it's worth.
So I thought a space-separated string for each row in users might be a good idea, since I wouldn't have to start working with many more tables (which could complicate things), but I have a feeling that using space-separated strings would lower performance significantly more than using additional tables, especially with a larger number of users.
Essentially my question is this: Which, out of the aforementioned methods of making sure a user can only like a post once, is the most practical?
None of these sound like particularly good ideas.
Generally, having to create tables on the fly, be it for users or posts, is a bad idea. It will complicate not only your SQL generation, but also clutter up the data dictionary with loads of objects and make maintaining the database much more complicated than it should be.
A comma-delimited string also isn't a good idea. It breaks 1NF and will complicate your queries (or worse, force you to write code!) to maintain it.
The sane approach is to use a single table to correlate between users and posts. Each row will hold a user ID and the ID of a post he liked, and creating a composite primary key over the two will ensure that a user can't like a post twice:
CREATE TABLE user_post_likes (
user_id INT, -- Or whatever type the id column of your users table uses
post_id INT, -- Or whatever type the id column of your posts table uses
PRIMARY KEY (user_id, post_id),
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (post_id) REFERENCES posts(id) -- assumes the posts table is named posts
);
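To show how the composite primary key enforces "one like per user per post", here is a small sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, and leaves out the foreign keys for brevity; the like() and like_count() helpers are hypothetical names of my own.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; foreign keys omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_post_likes (
    user_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, post_id)
);
""")


def like(user_id, post_id):
    """Record a like; return False if this user already liked this post."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_post_likes (user_id, post_id) VALUES (?, ?)",
                (user_id, post_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected the duplicate pair.
        return False


def like_count(post_id):
    """Number of distinct users who liked the post."""
    row = conn.execute(
        "SELECT COUNT(*) FROM user_post_likes WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0]
```

The point of the design is that the database itself rejects the second like, so the application never has to check first and race against itself. In MySQL, INSERT IGNORE is another way to handle the duplicate case.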

When is it okay not to use a PRIMARY KEY?

Suppose I want to create a simple database which lets users create playlists and add multiple songs to them. I just want to be able to find which songs have been added to a particular playlist.
song table :
`song_id` INT AUTO_INCREMENT PRIMARY KEY, `song_title` VARCHAR
playlist table :
`playlist_id` INT AUTO_INCREMENT PRIMARY KEY, `playlist_title` VARCHAR
What would be the best option to pull this off?
Add another column to the playlist table and insert comma separated ids of songs into that column. I don't think this would be a proper relational way to do it, but it does the job.
or
Create separate table just to store song ids with the playlist id to which it belongs. Like playlist_id INT, song_id INT where both columns are foreign keys.
Now, if the second option is better, should I add another auto-increment column as a primary key, knowing that it won't be useful anywhere? I ask because I read some articles online, and many of them suggest that not having a primary key on a table significantly hurts its performance.
You should strongly lean towards option two, namely creating a table which relates a playlist ID to a song ID. In this case, you can actually create a primary key which is a composite of the playlist and song ID.
CREATE TABLE playlist_songs (
song_id INT,
playlist_id INT,
PRIMARY KEY (song_id, playlist_id)
)
As to whether you also need an auto increment column on playlist_songs, it would depend on your situation. You might not need it from a business logic point of view since you would likely be manipulating the table using the two columns already there.
There are two aspects to your question - the abstract, philosophical view and practical implications.
Philosophically, the way we decide whether a database design is "good" is to see if it's normalized.
You have two entities in your design - song and playlist. You have two relationships - a song can belong to 0..n playlists, and a playlist can contain 0..n songs.
You want to store those facts individually, rather than bundling them together. This means that the bridging table is "best" as it stores a single fact (song x belongs to playlist y), independently of the existence of song or playlist.
The alternative design stores several facts in a single row - "a playlist exists, and has the following songs".
The second philosophical issue is "how do I uniquely identify facts?". In your bridging table, the unique fact is "song x belongs to playlist y". It can only belong to that playlist once (actually, that's probably not true - you may need a column to indicate in which order the song appears).
This means you have a natural, compound key right there in your business domain. Philosophically, this is what you want to use to identify those records, so this should be your primary key.
From a practical point of view, the first question (option one or option two) depends on how your application will work and evolve.
If you ever have to answer the question "in which playlists does this song appear", option 2 is much better - option one would require a where clause like where playlist.songs like '%songid,%', which will be very slow.
If you ever have to delete a song, and make sure all the references are deleted too - option 2 is much better. Option one would be slow to find, and the code to update the comma-separated list would be horrible.
If you ever have to insert songs in the middle of the play list, option 2 is much better.
As for the question "how do I assign my primary key" - I think you may have misunderstood the articles. A primary key is a logical concept, and doesn't need to be an auto-incrementing integer. As long as you have good indexes (and indexes are different to primary keys) your performance will be fine.
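Here is a rough sketch of the bridging table and the three operations discussed above (finding playlists for a song, deleting a song, and keeping an order). It runs against an in-memory SQLite database via Python's sqlite3 module as a stand-in for MySQL; the position column, the index name and the helper functions are assumptions I've added for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; `position` is a hypothetical
# ordering column, not part of the original question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL,
    song_id     INTEGER NOT NULL,
    position    INTEGER NOT NULL,
    PRIMARY KEY (playlist_id, song_id)
);
-- secondary index so lookups by song do not scan the whole table
CREATE INDEX idx_playlist_songs_song ON playlist_songs (song_id);
""")
conn.executemany(
    "INSERT INTO playlist_songs VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 11, 2), (2, 10, 1)],
)


def playlists_containing(song_id):
    """In which playlists does this song appear?"""
    rows = conn.execute(
        "SELECT playlist_id FROM playlist_songs "
        "WHERE song_id = ? ORDER BY playlist_id",
        (song_id,),
    )
    return [r[0] for r in rows]


def songs_in(playlist_id):
    """Songs of one playlist, in playlist order."""
    rows = conn.execute(
        "SELECT song_id FROM playlist_songs "
        "WHERE playlist_id = ? ORDER BY position",
        (playlist_id,),
    )
    return [r[0] for r in rows]


def delete_song(song_id):
    """Remove every reference to a song with a single statement."""
    with conn:
        conn.execute("DELETE FROM playlist_songs WHERE song_id = ?", (song_id,))
```

Note there is no auto-increment column: the (playlist_id, song_id) pair is the primary key, and the extra index on song_id covers the lookup in the other direction.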
The 2nd option is FAR preferable.
As to the extra primary key, while not necessary I tend to use one even if just to make it easier to process rows from that table.
For example, say you want to delete a dozen rows you can use IN (comma separated list of ids) rather than a lot of where clauses checking each pair of fields in the rows.
As an aside, there are many reasons the 2nd option is preferable:-
What happens when you want more items in the comma separated list than will fit in the field?
What happens when you want to search for a value in that list? You cannot index for a value half way through that list.
What happens when you want to hang another value from the item on the play list? For example the number of times that track has been played on that playlist?
Etc.
I would say option two will be the most beneficial to you. You would then have a table such as the following:
playlist_items table
pi_id INT AUTO_INCREMENT PRIMARY KEY
pi_song_id INT
pi_playlist_id INT
With this you could then add functionality in the future if required, such as:
pi_dateadded DATETIME
In InnoDB keep in mind that you access rows by traversing the primary key index in logical order, so you need to ask how you are looking up rows. Traversing an index is O(log(N)) complexity but if you are using a secondary index you are doing that twice.
Usually having a single column pkey in InnoDB is better, but there may be exceptions.
playlist_table
`playlist_id` INT AUTO_INCREMENT PRIMARY KEY, `playlist_title` VARCHAR
songs_table
`song_id` INT AUTO_INCREMENT PRIMARY KEY, `song_title` VARCHAR, `playlist_id` INT, FOREIGN KEY (playlist_id) REFERENCES playlist_table(playlist_id)
When you want to search, use a join to find the songs. Note that with this layout each song can belong to only one playlist.
select * from songs_table left join playlist_table on (songs_table.playlist_id = playlist_table.playlist_id)

Database Structure for Inconsistent Data

I am creating a database for my company that will store many different types of information. The categories are Brightness, Contrast, Chromaticity, etc. Each category has a number of data points which my company would like to start storing.
Normally, I would create a table for each category which would store the corresponding data. (This is how I learned to do it.) However, sometimes these categories have "sub-data" which would change the number of fields required in each table.
My question is then how do people handle the inconsistency of data when structuring their databases? Do they just keep adding more tables for extra data or is it something else altogether?
There are a few (and thank goodness only a few) unbendable rules about relational database models. One of those is, that if you don't know what to store, you have a hard time storing it. Chances are, you'll have an even harder time retrieving it.
That said, the reality of business rules is often less clear cut than the ivory tower of database design. Most importantly, you might want or even need a way to introduce a new property without changing the schema.
Here are two feasible ways to go about this:
Use a datastore that specializes in loose or nonexistent schemas (NoSQL and friends). Explaining this in detail is the subject of a CS thesis, not a Stack Overflow answer.
My recommendation: Use a separate properties table - here is how this goes:
Assuming for the sake of argument that your products always have a (unique string) name, (integer) id, brightness, contrast and chromaticity, plus sometimes (integer) foo and (string) bar, consider these tables:
CREATE TABLE products (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL,
brightness INT,
contrast INT,
chromaticity INT,
UNIQUE INDEX(name)
);
CREATE TABLE properties (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL,
proptype ENUM('null','int','string') NOT NULL default 'null',
UNIQUE INDEX(name)
);
INSERT INTO properties VALUES
(0,'foo','int'),
(0,'bar','string');
CREATE TABLE product_properties (
id INT PRIMARY KEY AUTO_INCREMENT,
products_id INT NOT NULL,
properties_id INT NOT NULL,
intvalue INT NOT NULL,
stringvalue VARCHAR(250) NOT NULL,
UNIQUE INDEX(products_id,properties_id)
);
Now your "standard" properties would be in the products table as usual, while the "optional" properties would be stored in a row of product_properties that references the product id and property id, with the value being in intvalue or stringvalue.
Selecting products including their foo if any would look like
SELECT
products.*,
product_properties.intvalue AS foo
FROM products
LEFT JOIN product_properties
ON products.id=product_properties.products_id
AND product_properties.properties_id=1
or even
SELECT
products.*,
product_properties.intvalue AS foo
FROM products
-- look the property up by name first, so products without a foo still appear
LEFT JOIN properties
ON properties.name='foo'
LEFT JOIN product_properties
ON products.id=product_properties.products_id
AND product_properties.properties_id=properties.id
Please understand, that this incurs a performance penalty - in fact you trade performance against flexibility: Adding another property is nothing more than INSERTing a row in properties, the schema stays the same.
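For anyone who wants to try the properties-table idea end to end, here is a trimmed-down sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, drops the proptype column, allows NULL in the value columns, and uses made-up sample rows; the helper name is my own.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    brightness INTEGER
);
CREATE TABLE properties (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE product_properties (
    products_id   INTEGER NOT NULL,
    properties_id INTEGER NOT NULL,
    intvalue      INTEGER,
    stringvalue   TEXT,
    UNIQUE (products_id, properties_id)
);
INSERT INTO products VALUES (1, 'panel-a', 300), (2, 'panel-b', 250);
INSERT INTO properties VALUES (1, 'foo'), (2, 'bar');
INSERT INTO product_properties VALUES (1, 1, 42, NULL), (2, 2, NULL, 'matte');
""")


def products_with_int_property(prop_name):
    """Every product with the named integer property, or None if it has none."""
    return conn.execute(
        """SELECT p.name, pp.intvalue
           FROM products p
           LEFT JOIN properties pr ON pr.name = ?
           LEFT JOIN product_properties pp
                  ON pp.products_id = p.id
                 AND pp.properties_id = pr.id
           ORDER BY p.id""",
        (prop_name,),
    ).fetchall()
```

Adding a new optional property is then just an INSERT into properties; no ALTER TABLE is needed, which is exactly the flexibility being bought at the cost of the extra joins.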
If you're not bound to MySQL, other databases have table inheritance or arrays to solve some of those niche cases. PostgreSQL is a very nice database that you can use as easily and freely as MySQL.
With MySQL you could:
1. Change your tables: add the extra columns and allow NULL in the subcategory columns that you don't need. This way integrity can still be checked, since you can put constraints on the columns. Unless you really have a lot of subcategory columns, I'd recommend this; otherwise go with option 3.
2. Store subcategory data dynamically in a separate table that has a category_id, a category_row_id, a subcategory identifier (= type of subcategory) and a value column. That way you can retrieve your data by linking it via the category_id (which determines the table) and the category_row_id (which links to the PK of the original category table row). The bad thing: you can't use foreign keys or constraints properly to enforce integrity. You'd need to write hairy insert/update triggers to keep some control there, without which the burden of integrity and referential checking falls solely on the client (in which case you'd probably be better off going the NoSQL route). In short, I wouldn't recommend this.
3. Make a separate subcategory table per category table. Columns can be fixed, or variable via value column(s) plus an optional subcategory identifier, and foreign keys can still be used. Fixed columns are best for maintaining integrity, since you'll have the full range of constraints at your disposal. If you have a lot of subcategory columns that would otherwise hopelessly clutter your regular category table, I'd recommend using this with fixed columns. As with the previous option, I'd never recommend going dynamic for anything but throwaway data.
Alternatively, if your subcategory data is very variable and volatile: use NoSQL with a document database such as MongoDB. Mind you, you can keep all your regular data in a proper RDBMS and just store the side data in the document database, though that's probably not recommended.
If your subcategory data is in a known fixed state and not prone to change, I'd just add the extra columns to the specific category table. Keep in mind that the major feature of a proper DBMS is safeguarding the integrity of your data via checks and constraints; doing away with that is never really a good idea.
If you are not limited to MySQL, you can consider Microsoft SQL Server and its Sparse Columns feature. This will allow you to expand your schema to include however many columns you want, without incurring the storage penalty for columns that are not pertinent for a given row.

Representing News Post in MySQL

I'm currently working on a blog for a college news organization. Each post, though, will represent a full show, with multiple contributors and multiple titles.
For example, a post might have three news stories, each with its own title and some contributors for each:
"Story 1" by (id1) and (id2)
"Story 2" by (id3)
"Story 3" by (id4) and (id5)
So for each post, there would be an index (1, 2, 3...) for each individual story, a VARCHAR for the title, and id's that represent contributors, whose details are stored in another "contributors" table. The problem is that I don't know how many stories there will be, or how many contributors there will be per story. It could range from ~3 at the least to up to 6. In case our show expands in the future, I'd like to have the capability to scale up to even more than 6 posts, too.
I want to represent this structure concisely in a MySQL column, but I'm not sure how to do that. One solution would be to create another MySQL table to save the details for each individual story, but I'd prefer to avoid that hassle. The ideal solution would be if I could somehow create an "array" within a MySQL column, which could store (for each story) an index, a string, and multiple id's to show who the contributors are.
Is this possible, or will I have to create a new table to keep track of each story?
Don't use a column - use a table. It can be a simple InnoDB table which doesn't really hurt performance at all. Define a combined primary key (story_id, contributor_id) and insert all contributions in that table.
What I describe is called an M:N (many-to-many) table. As for storing an array in a column, don't ever go there - it's a very bad thing to do and is, in fact, nearly impossible in relational databases.
Save yourself some future heartburn. Create the extra table. It looks like a table of [Posts] with a one-to-many relationship to [Stories] where [Stories] has a many-to-many relationship to [Contributors].
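As a rough illustration of that Posts / Stories / Contributors shape, here is a sketch using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. All table and column names, the sample rows and the show_rundown() helper are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, published TEXT);
CREATE TABLE contributors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- one post has many stories; idx is the story's position within the post
CREATE TABLE stories (
    id      INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id),
    idx     INTEGER NOT NULL,
    title   TEXT NOT NULL
);
-- many-to-many between stories and contributors
CREATE TABLE story_contributors (
    story_id       INTEGER NOT NULL REFERENCES stories(id),
    contributor_id INTEGER NOT NULL REFERENCES contributors(id),
    PRIMARY KEY (story_id, contributor_id)
);
INSERT INTO posts VALUES (1, '2020-01-01');
INSERT INTO contributors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO stories VALUES (1, 1, 1, 'Story 1'), (2, 1, 2, 'Story 2');
INSERT INTO story_contributors VALUES (1, 1), (1, 2), (2, 3);
""")


def show_rundown(post_id):
    """Map each story title of a post to its list of contributor names."""
    rows = conn.execute(
        """SELECT s.title, c.name
           FROM stories s
           JOIN story_contributors sc ON sc.story_id = s.id
           JOIN contributors c ON c.id = sc.contributor_id
           WHERE s.post_id = ?
           ORDER BY s.idx, c.id""",
        (post_id,),
    )
    rundown = {}
    for title, name in rows:
        rundown.setdefault(title, []).append(name)
    return rundown
```

Nothing here caps the number of stories per post or contributors per story, so growing past six of either needs no schema change.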
You could store a comma-delimited string of contributor ids or story ids in one column, but how exactly would you relate them? Your best bet in that case would seem to be an 'array' of 'arrays', where the main string consists of pairs of strings joined with commas. I (so it's just my opinion, okay?) would avoid this unless totally necessary (and I can't think of one instance at this time).
So create your relationships tables. Just to illustrate one approach to the idea:
-- a story may have multiple contributors
CREATE TABLE story_contributor_rel (
story_id INT NOT NULL
, contributor_id INT NOT NULL
);
-- a post may have multiple stories
CREATE TABLE post_story_rel (
post_id INT NOT NULL
, story_id INT NOT NULL
);
Or cheat it a bit, but I'd recommend against this also(!):
-- a less-normalized way
CREATE TABLE post_relationships (
post_id INT NOT NULL
, story_id INT NOT NULL
, contributor_id INT NOT NULL
);
These are just the simplest approaches. Naturally, you'd want to have additional identity columns and/or proper indexing and primary key settings, but this is the simplest way I can illustrate the point I'm driving at.
Imagine this too: if you were to pack all those relationships into columns with application-defined logic, then without the application it would not be easy for anyone to understand what's going on in your tables. If you keep such logic out of the table structures and track the relationships properly (meaning relationship tables), the design becomes transparent. One look at these tables and it would not take long to understand.
That's just my opinion. :) Cheers!

Is it unreasonable to assign a MySQL database to each user on my site?

I'm creating a user-based website. For each user, I'll need a few MySQL tables to store different types of information (that is, userInfo, quotesSubmitted, and ratesSubmitted). Is it a better idea to:
a) Create one database for the site (that is, "mySite") and then hundreds or thousands of tables inside this (that is, "userInfo_bob", "quotesSubmitted_bob", "userInfo_shelly", and "quotesSubmitted_shelly")
or
b) Create hundreds or thousands of databases (that is, "Bob", "Shelly", etc.) and only a couple tables per database (that is, Inside of "Bob": userInfo, quotesSubmitted, ratesSubmitted, etc.)
Should I use one database, and many tables in that database, or many databases and few tables per database?
Edit:
The problem is that I need to keep track of who has rated what. That means if a user has rated 300 quotes, I need to be able to know exactly which quotes the user has rated.
Maybe I should do this?
One table for quotes. One table to list users. One table to document ALL ratings that have been made (that is, Three columns: User, Quote, rating). That seems reasonable. Is there any problem with that?
Use one database.
Use one table to hold users and one table to hold quotes.
In between those two tables you have a table that contains information to match users to quotes, this table will hold the rating that a user has given a quote.
This simple design will allow you to store a practically unlimited number of quotes, unlimited users, and you will be able to match up each quote to zero or more users and vice versa.
The table in the middle will contain foreign keys to the user and quote tables.
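A minimal sketch of that three-table design, again using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. The table names, column names, sample rows and helper functions are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE quotes (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
-- the table "in the middle": one row per (user, quote) rating
CREATE TABLE ratings (
    user_id  INTEGER NOT NULL REFERENCES users(id),
    quote_id INTEGER NOT NULL REFERENCES quotes(id),
    rating   INTEGER NOT NULL,
    PRIMARY KEY (user_id, quote_id)
);
INSERT INTO users VALUES (1, 'bob'), (2, 'shelly');
INSERT INTO quotes VALUES (1, 'first quote'), (2, 'second quote');
INSERT INTO ratings VALUES (1, 1, 5), (1, 2, 3), (2, 1, 4);
""")


def quotes_rated_by(user_name):
    """Exactly which quotes has this user rated, and with what rating?"""
    return conn.execute(
        """SELECT q.id, r.rating
           FROM ratings r
           JOIN users u ON u.id = r.user_id
           JOIN quotes q ON q.id = r.quote_id
           WHERE u.name = ?
           ORDER BY q.id""",
        (user_name,),
    ).fetchall()


def average_rating(quote_id):
    """Average rating of one quote across all users."""
    row = conn.execute(
        "SELECT AVG(rating) FROM ratings WHERE quote_id = ?", (quote_id,)
    ).fetchone()
    return row[0]
```

Whether a user has rated 3 quotes or 300, the answer is one indexed query on the ratings table; no per-user tables or databases are involved.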
You might find it helpful to review some database design basics; there are plenty of related questions here on Stack Overflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Should I use one database, and many tables in that database, or many databases and few tables per database?
Neither, you should use one database, with a table for users, a table for quotes, a table for rates, etc.
You then have a column in (e.g.) your quotes table which says which user the quote is for.
CREATE TABLE user (
user INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
...
);
CREATE TABLE quote (
quote INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
user INT(10) UNSIGNED NOT NULL,
...
);
CREATE TABLE rate (
rate INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
user INT(10) UNSIGNED NOT NULL,
...
);
You then use SQL JOINs in your SELECT statements to link the tables together.
EDIT - the above was assuming a many-to-one relationship between users and rates - where there are 'many-to-many' relationships you need a table for each sort of data, and then another table with rows for each User <-> Rate pair.
Two problems with many databases (actually many more, but start with these.)
You can't use parameters for database names.
What will you do when you make your first change to a table? (Hint: Work X (# databases) ).
And "many tables" suggests you're thinking about tables per user. That's another equally problematic idea.
Neither. You create a single database with a single table for each type of data, then use foreign keys to link the data for each user together. If you had only a few, fixed users, this wouldn't be so bad, but what you're suggesting is simply not at all scalable.
We've got a similar system, with many users and their relevant data. We've followed a single-database, common-tables approach. This way you would have a single table holding user information and a table holding all their data. Along with the data we store a reference to the userid, which helps us segregate the information.
For comparison, a time when you actually do want separate databases would be when you have multiple webhosting clients that want to do their own things with the database - then you can set up security so they can access only their own data.
But if you are writing the code to interface with the data base, not them, then what you want is normalized tables as described in several other answers here.
Your design is flawed. You should constrain the tuples using foreign keys to the correct user, rather than adding new entities for each account.
What is the difference between many tables in one database or many databases with same tables? Is it for better security or for different types of backups?
I'm not sure about MySQL, but in MSSQL it is like this:
If you need to back up databases in different ways, you need to consider keeping tables in different data files. By default they are all in the PRIMARY file. You can specify different storage.
All transactions are held in tempdb. This is not very good, because if its transaction log becomes full then all databases stop functioning. Then you can end up with separate SQL servers for each user, which is sort of nonsense if you are talking about thousands of clients.
One table with properly created indexes per each required entity set (one table for submitted quotes, one table for submitted rates).
CREATE TABLE quotesSubmitted (
userid INTEGER,
submittime DATETIME,
quote INTEGER,
quotedata INTEGER,
PRIMARY KEY (userid, submittime),
FOREIGN KEY (quote) REFERENCES quotesList (quoteId),
FOREIGN KEY (userid) REFERENCES userList (userId)
);
CREATE INDEX idx1 ON quotesSubmitted (quote);
Remember: the more indexes you create, the slower updates become. So take a closer look at what you use in queries and create indexes for that. A good database optimization tutorial will be of invaluable help in understanding what indexes you need to create (I cannot summarize it in this answer).
I also presume you don't know about JOINs and FOREIGN KEYs so make sure you read about them as well. Quite useful!
Use one database and one table. You will require a table "User" and this table will be linked (Primary--Foreign key) to these tables.