Recommended Database Structure for User's Information - mysql

I am trying to figure out how I should go about building my database structure (tables) for user information on my website. The kinds of information that I will be storing (at this time anyways) are:
About Me
Birthday (January, 1, 1970)
Sex (Male/Female)
Interested In: (Male, Female, Both)
Relationship Status: (Single, In a Relationship, Engaged, Married)
Website: (mywebsite.com)
From: (Cupertino, California)
So this is the type of information I will be storing for now. My question basically is, should I have this be one table only? Or would it be better to split the information up depending on what it was (my users have a unique ID which would go along with each table of information, obviously). So I'm not sure if I should have a table exclusively for Birthdays with the columns: userID, Month, Day, Year; or what.

If a user only needs to store one piece of information for an attribute, then you don't need a separate table for it. For example, a user only has one birthday. The only reason you would need a separate Birthdays table would be if you want to store multiple birthdays for the same userid. Each one of the attributes you've listed looks like it would be fine in one Users table.
As for splitting up Birthdays into the columns: userID, Month, Day, Year, it all depends on how you're going to use that information. Will you ever need to know just the Month, Day, or Year that a user's birthday falls on? If that's a common need, you might want to store them separately. It's usually not, so you probably just want to store it as a single Date value.
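To illustrate the single-table idea with one date column, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name, column names and helper functions are assumptions, not part of the question. In MySQL the birthday column would be a DATE and the month could be extracted with MONTH(birthday).

```python
import sqlite3


def build_users_db():
    """Create an in-memory database with one (hypothetical) users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            about_me TEXT,
            birthday TEXT,             -- ISO 'YYYY-MM-DD'; a DATE column in MySQL
            sex TEXT,
            interested_in TEXT,
            relationship_status TEXT,
            website TEXT,
            hometown TEXT
        )
        """
    )
    return conn


def birthdays_in_month(conn, month):
    """Return the ids of users born in the given month (1-12)."""
    rows = conn.execute(
        "SELECT id FROM users "
        "WHERE CAST(strftime('%m', birthday) AS INTEGER) = ? ORDER BY id",
        (month,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_users_db()
conn.execute("INSERT INTO users (id, birthday, sex) VALUES (1, '1970-01-01', 'Male')")
conn.execute("INSERT INTO users (id, birthday, sex) VALUES (2, '1985-07-15', 'Female')")
print(birthdays_in_month(conn, 1))
```

The point of the sketch is that a single date value still lets you pull out the month, day or year on demand, so separate columns are only worth it if you filter on those parts all the time.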
Note: You can take a look at the schema used by Stack Overflow by checking out the Data Explorer. They keep a similar collection of data in one Users table.

In the vast majority of cases, I've seen what you're asking being stored in one table - usually user or users.
Perhaps including a number of other elements too:
user id (unique)
registration date
status (live/expired/banned)
user hash
plus a variety of others...
Honestly - It's dependent on what you're building and how it's built, but my advice would be to start simple.
On your point about birthdays, just store the date in MySQL's DATE format:
YYYY-MM-DD
That way, you can manipulate it in a variety of ways using MySQL's date functions.
Hope this helps.

Related

Is there a best practice for storing data for a database object (model) that will change or be deleted in the future (Django)?

I am building an order management system for an online store and would like to store information about the Product being ordered.
If I use a Foreign Key relationship to the Product, when someone changes the price, brand, supplier etc. of the Product or deletes it, the Order will be affected as well. I want the order management system to be able to display the state of the Product when it was ordered even if it is altered or deleted from the database afterwards.
I have thought about it long and hard and have come up with ideas such as storing a JSON string representation of the object; creating a duplicate Product whose foreign key I then use for the Order etc. However, I was wondering if there is a best practice or what other people use to handle this kind of situation in commercial software?
PS: I also have other slightly more complex situations, for instance, I would like the data for a User object attached to the Order to change as the User changes but then never get deleted when the User is deleted. An answer to the above question would definitely give me a good starting point.
This price-change problem is commonly handled in RDBMS (SQL) commerce applications by doing two things.
inserting rows into an order_detail table when an order is placed. Each row of that table contains the particulars of the item as sold: item_id, item_count, unit_price, total_price, unit_weight, total_weight, tax_status, and so forth. So, the app captures what actually was sold, and at what price. A later price change doesn't mess up sales records. You really have to do this.
a price table containing item_id, price, start_time, end_time. You retrieve the current price something like this:
SELECT item.item_id, price.price
FROM item
JOIN price ON item.item_id = price.item_id
AND price.start_time <= NOW()
AND (price.end_time > NOW() OR price.end_time IS NULL)
This approach allows you to keep track of historical prices, and also to set up future price changes. But you still copy the price into the order_detail table.
The point is: once you've accepted an order, its details cannot change in the future. You copy the actual customer data (name, shipping address, etc) into a separate order table from your current customer table when you accept the order, and (as mentioned above) the details of each item into an order_detail table.
Your auditors will hate you if you don't do this. Ask me how I know that sometime.
I would recommend adding attributes to the Order model and copying the data you need into those attributes one by one when you save the model. In addition, implement a historical data table where you store a JSONField (or some other serialized version) of the Product whenever it is created or updated; that way people can refer to the historical data table if need be. This is more efficient than storing the full representation of the Product in the Order object, because the time taken to create the historical data is charged to the admin creating the Product rather than to the customer creating the Order. You can even create the historical data objects in the background (using threads, for example) once you get to that level.
While it is hard answering your question without seeing your models.py at least, I will suggest archiving the results. You can add a boolean field called historical which defaults to False. When an order is made you need to set the previous order's (or orders') historical value to True in your view set or function.
Here, historical=True means the record is being archived. You can filter on this historical column to display what you want when. Sorry this is just a high-level outline.

How to store recent usage frequency in MySQL

I'm working on the Product Catalog module of an Invoicing application.
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalog.
How can I store this "usage recency/frequency" in the database?
I'm thinking about adding a new field recency which would be increased by 1 every time the product was used, and decreased by 1/(count of all products), when an other product is used. Then use this recency field for ordering, but it doesn't seem to me the best solution.
Can you help me what is the best practice for this kind of problem?
Solution for the recency calculation:
Create a new column in the products table, named last_used_on for example. Its data type should be TIMESTAMP (the MySQL representation for the Unix-time).
Advantages:
Timestamps contain both date and time parts.
It makes VERY precise calculations and comparisons of dates and times possible.
It lets you format the saved values in the date-time format of your choice.
You can convert from any date-time format into a timestamp.
In regard to your autocomplete fields, it allows you to filter the products list as you wish. For example, to display all products used since [date-time]. Or to fetch all products used between [date-time-1] and [date-time-2]. Or get the products used only on Mondays, at 1:37:12 PM, in the last two years, two months and three days (that is how flexible timestamps are).
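As a rough sketch of the last_used_on idea: the example below uses Python's sqlite3 in place of MySQL, stores the timestamp as an ISO string, and assumes a products table with id and name columns. The helper names mark_used and recent_products are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, last_used_on TEXT)"  # TIMESTAMP in MySQL
)
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(1, "Paper A4"), (2, "Paper A3"), (3, "Pen"), (4, "Pencil")],
)


def mark_used(conn, product_id, when):
    """Record that a product was just put on an invoice."""
    conn.execute(
        "UPDATE products SET last_used_on = ? WHERE id = ?", (when, product_id)
    )


def recent_products(conn, prefix, limit=5):
    """Autocomplete: names starting with `prefix`, most recently used first.

    Never-used products (NULL last_used_on) sort after the used ones.
    """
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? "
        "ORDER BY last_used_on DESC, name ASC LIMIT ?",
        (prefix + "%", limit),
    ).fetchall()
    return [row[0] for row in rows]


mark_used(conn, 3, "2024-05-01 09:00:00")
mark_used(conn, 4, "2024-05-02 13:37:12")
print(recent_products(conn, "pe"))
```

This only captures recency; how often a product is used needs a counter or a history table in addition.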
Resources:
Unix-Time
The DATE, DATETIME, and TIMESTAMP Types
How should unix timestamps be stored in int columns?
How to convert human date to unix timestamp in Mysql?
Solution for the usage rate calculation:
Well, actually, you are not speaking about a frequency calculation, but about a rate - even though one can argue that frequency is a rate, too.
Frequency implies using the time as the reference unit and it's measured in Hertz (Hz = [1/second]). For example, let's say you want to query how many times a product was used in the last year.
A rate, on the other hand, is a comparison, a relation between two related units. Like for example the exchange rate USD/EUR - they are both currencies. If the comparison takes place between two terms of the same type, then the result is a number without measurement units: a percentage. Like: 50 apples / 273 apples = 0.1832 = 18.32%
That said, I suppose you tried to calculate the usage rate: the number of usages of a product in relation with the number of usages of all products. Like, for a product: usage rate of the product = 17 usages of the product / 112 total usages = 0.1517... = 15.17%. And in the autocomplete you'd want to display the products with a usage rate bigger than a given percentage (like 9% for example).
This is easy to implement. In the products table add a column usages of type int or bigint and simply increment its value each time a product is used. And then, when you want to fetch the most used products, just apply a filter like in this sql statement:
SELECT
id,
name,
(usages*100) / (SELECT sum(usages) as total_usages FROM products) as usage_rate
FROM products
GROUP BY id
HAVING usage_rate > 9
ORDER BY usage_rate DESC;
Here's a little study case:
In the end, recency, frequency and rate are three different things.
Good luck.
To allow for future flexibility, I'd suggest the following additional (*) table to store the entire history of product usage by all users:
Name: product_usage
Columns:
id - internal surrogate auto-incrementing primary key
product_id (int) - foreign key to product identifier
user_id (int) - foreign key to user identifier
timestamp (datetime) - date/time the product was used
This would allow the query to be fine tuned as necessary. E.g. you may decide to only order by past usage for the logged in user. Or perhaps total usage within a particular timeframe would be more relevant. Such a table may also have a dual purpose of auditing - e.g. to report on the most popular or unpopular products amongst all users.
(*) assuming something similar doesn't already exist in your database schema
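A minimal sketch of such a usage-history table and one way to query it, with Python's sqlite3 standing in for MySQL. The column used_at corresponds to the timestamp column above (renamed here only to avoid confusion with the type name), and record_usage and top_products are hypothetical helpers.

```python
import sqlite3


def create_schema(conn):
    """Create the product_usage history table (one row per use)."""
    conn.execute(
        """
        CREATE TABLE product_usage (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- AUTO_INCREMENT in MySQL
            product_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            used_at TEXT NOT NULL                  -- DATETIME in MySQL
        )
        """
    )


def record_usage(conn, product_id, user_id, used_at):
    conn.execute(
        "INSERT INTO product_usage (product_id, user_id, used_at) VALUES (?, ?, ?)",
        (product_id, user_id, used_at),
    )


def top_products(conn, since, user_id=None, limit=5):
    """Most used products since `since`, optionally for a single user.

    Ties are broken by the most recent use.
    """
    sql = "SELECT product_id, COUNT(*) AS uses FROM product_usage WHERE used_at >= ?"
    params = [since]
    if user_id is not None:
        sql += " AND user_id = ?"
        params.append(user_id)
    sql += " GROUP BY product_id ORDER BY uses DESC, MAX(used_at) DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
for product_id, user_id, used_at in [
    (10, 1, "2024-01-01 10:00:00"),
    (10, 1, "2024-01-02 10:00:00"),
    (10, 1, "2024-01-03 10:00:00"),
    (20, 1, "2024-01-04 10:00:00"),
    (20, 2, "2024-01-05 10:00:00"),
    (20, 2, "2024-01-06 10:00:00"),
    (30, 2, "2023-12-01 10:00:00"),
]:
    record_usage(conn, product_id, user_id, used_at)
```

Because every use is a separate row, the same table answers "most used by this user", "most used this month" and similar questions without schema changes.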
Your problem is related to many other web-scale search applications, such as showing spell corrections, related searches, or "trending" topics. You recognized correctly that both recency and frequency are important criteria in determining "popular" suggestions. In practice, it is desirable to compromise between the two: recency alone will suffer from random fluctuations, but you also don't want to use only frequency, since some products might have been purchased a lot in the past while their popularity is now declining (or they might have gone out of stock or been replaced by successor models).
A very simple but effective implementation that is typically used in these scenarios is exponential smoothing. First of all, most of the time it suffices to update popularities at fixed intervals (say, once each day). Set a decay parameter α (say, .95) that tells you how much yesterday's orders count compared to today's. Similarly, orders from two days ago will be worth α*α ≈ .9 times as much as today's, and so on. To choose this parameter, note that the value decays to one half after log(.5)/log(α) days (about 14 days for α=.95).
The implementation only requires a single additional field per product, orders_decayed. Then, all you have to do is to update this value each night with the total daily orders:
orders_decayed = α * orders_decayed + (1-α) * orders_today
You can sort your applicable suggestions according to this value.
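The update rule and the half-life estimate can be written out in a few lines of Python. This is just a sketch of the formula above; the function names and the sample numbers are made up for the illustration.

```python
import math


def half_life_days(alpha):
    """Number of daily updates after which an old value has decayed to one half."""
    return math.log(0.5) / math.log(alpha)


def nightly_update(orders_decayed, orders_today, alpha=0.95):
    """One exponential-smoothing step, run once per day for each product."""
    return alpha * orders_decayed + (1 - alpha) * orders_today


def rank_products(scores):
    """Return product names sorted by smoothed popularity, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


# A product that sold 10 units a day for 30 days and then stopped selling.
score = 0.0
for _ in range(30):
    score = nightly_update(score, 10)
score_at_peak = score
for _ in range(14):
    score = nightly_update(score, 0)

print(round(half_life_days(0.95), 2))
print(rank_products({"old favourite": score, "new hit": score_at_peak}))
```

After roughly 14 days without orders, the score has dropped to about half of its peak, so the product gradually moves down the suggestion list instead of disappearing at once.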
To have an individual user experience, you should not rely on a field in the product table, but rather on the history of the user.
The occurrences of the product in past invoices created by the user would be a good starting point. The advantage is that you don't need to add fields or tables for this functionality. You simply rely on data that is already present anyway.
Since it is an auto-complete field, maybe past usage is not really relevant. Display n search results as the user types. If you feel that results are better if you include recency in the calculation of the order, go with it.
Now, the implementation may differ depending on how and when the product should be displayed, and on whether the usage frequency has to be user specific or application specific (overall). In both cases, I would suggest having a history table, which you can later use for other analysis.
You could design your history table with at least the columns below:
Id | ProductId | LastUsed (timestamp) | UserId
Now you can create a view that queries this table for a specific time range (something like product frequency over the last week, last month or last year) and gives you the most-sold products for that range.
The same view can be used for user-specific frequency by adding a condition that filters by UserId.
I'm thinking about adding a new field recency which would be increased
by 1 every time the product was used, and decreased by 1/(count of all
products), when an other product is used. Then use this recency field
for ordering, but it doesn't seem to me the best solution.
Yes, it is not good practice to add a column for this and update it every time. Imagine this is a highly anticipated product that people love to buy. If 1000 people or more request it at the same time, every request updates the same row. To stay consistent under concurrency, the database has to lock that row for each update, which is definitely going to hurt your database and application performance. Instead, you can simply insert a new row for each use.
The other possible solution is to use your existing invoice table, as it will definitely have all the product- and user-specific information, and create a view on it to get the frequently used products as I mentioned above.
Please note that this is just another option to achieve what you are expecting. I would personally recommend having a history table instead.
The scenario
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalogue.
your suggested solution
How can I store this "usage recency/frequency" in the database?
If it is a web application, don't store it in a Database in your server. Each user has different choices.
Store it in the user's browser as Cookie or Localstorage because it will improve the User Experience.
If you still want to store it in MySQL table,
Do the following
Create a column recency as said in question.
Each time the item is used, increase the count by 1 as said in the question.
Don't decrease it when other items get used.
To get the recent most used item,
query
SELECT * FROM products WHERE recency = (SELECT MAX(recency) FROM products);
Side note
Go for the database only if you want to show the most recently used products without depending on the user.
As you aren't certain which measure to choose, and it's more of a user experience problem, I advise you to offer a number of measures and give the user an option to choose the one he/she prefers. For example, the set of available measures could include most popular product last week, last month, last 3 months, last year, and overall total. For the sake of performance, I'd prefer to store those statistics in a separate table which is refreshed by a scheduled job running, for example, every 3 hours.

Proper way to model user groups

So I have this application that I'm drawing up and I start to think about my users. Well, My initial thought was to create a table for each group type. I've been thinking this over though and I'm not sure that this is the best way.
Example:
// Users
Users [id, name, email, age, etc]
// User Groups
Player [id, years playing, etc]
Ref [id, certified, etc]
Manufacturer Rep [id, years employed, etc]
So everyone would be making an account, but each user would have a different group. They can also be in multiple different groups. Each group has its own list of different columns. So what is the best way to do this? Let's say I have 5 groups. Do I need 8 tables + a relational table connecting each one to the user table?
I just want to be sure that this is the best way to organize it before I build it.
Edit:
A player would have columns regarding the gear that they use to play, the teams they've played with, events they've gone to.
A ref would have info regarding the certifications they have and the events they've reffed.
Manufacturer reps would have info regarding their position within the company they rep.
A parent would have information regarding how long they've been involved with the sport, perhaps relations with the users they are parent of.
Just as an example.
Edit 2:
Player Table
id
user id
started date
stopped date
rank
Ref Table
id
user id
started date
stopped date
is certified
certified by
verified
Photographer / Videographer / News Reporter Table
id
user id
started date
stopped date
worked under name
website / channel link
about
verified
Tournament / Big Game Rep Table
id
user id
started date
stopped date
position
tourney id
verified
Store / Field / Manufacturer Rep Table
id
user id
started date
stopped date
position
store / field / man. id
verified
This is what I planned out so far. I'm still new to this so I could be doing it completely wrong. And it's only five groups. It was more until I condensed it some.
I find it weird to have so many entities which are different from each other, but I will ignore this and get to the question.
It depends on the group criteria you need, in the case you described where each group has its own columns and information I guess your design is a good one, especially if you need the information in a readable form in the database. If you need all groups in a single table you will have to save the group relevant information in a kind of object, either a blob, XML string or any other form, but then you will lose the ability to filter on these criteria using the database.
In a relational Database I would do it using the design you described.
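For what it's worth, the one-table-per-group design can be sketched like this. SQLite (through Python's sqlite3) is used only so the example runs as-is; the table names player and referee, the column player_rank, and the helper groups_for_user are assumptions loosely based on the tables in the question, cut down to two groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

    -- One table per group; each row points back at the user it belongs to.
    CREATE TABLE player (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        started_date TEXT,
        stopped_date TEXT,
        player_rank TEXT
    );
    CREATE TABLE referee (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        started_date TEXT,
        stopped_date TEXT,
        is_certified INTEGER,
        certified_by TEXT,
        verified INTEGER
    );

    INSERT INTO users VALUES (1, 'Ann', 'ann@example.com');
    INSERT INTO users VALUES (2, 'Bob', 'bob@example.com');
    INSERT INTO users VALUES (3, 'Cy', 'cy@example.com');
    INSERT INTO player (user_id, started_date, player_rank) VALUES (1, '2010-01-01', 'pro');
    INSERT INTO referee (user_id, started_date, is_certified) VALUES (1, '2015-01-01', 1);
    INSERT INTO referee (user_id, started_date, is_certified) VALUES (2, '2018-01-01', 0);
    """
)


def groups_for_user(conn, user_id):
    """Return the names of the groups a user has a row in."""
    rows = conn.execute(
        "SELECT 'player' AS grp FROM player WHERE user_id = ? "
        "UNION SELECT 'referee' FROM referee WHERE user_id = ? "
        "ORDER BY grp",
        (user_id, user_id),
    ).fetchall()
    return [row[0] for row in rows]


print(groups_for_user(conn, 1))
```

In this layout no extra relational table is needed: the user_id column in each group table is the link, and a user can appear in as many group tables as apply.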
The design of your tables greatly depends on the requirements of your software.
E.g. your description of users led me in a wrong direction, I was at first thinking about a "normal" user of a software. Basically name, login-information and stuff like that. This I would never split over different tables as it really makes tasks like login, session handling, ... really complicated.
Another point which surprised me, was that you want to store the equipment in columns of those user's tables. Usually the relationship between a person and his equipment is not 1 to 1 and in most cases the amount of different equipment varies. Thus you usually have a relationship between users and their equipment (1:n). Thus you would design an equipment table and there refer to the owner's user id.
But once you have an idea of which data you have in your application and which relationships exist between your data, the design of the tables and so on is rather straightforward.
The good news is, that your data model and database design will develop over time. Try to start with a basic model, covering the majority of your use cases. Then slowly add more use cases / aspects.
As long as you are in the planning and early implementation phase, it is rather easy to change your database design.

MYSQL Database Schema Question

I need opinions on the best way to go about creating a table or collection of tables to handle this unique problem. Basically, I'm designing this site with business profiles. The profile table contains all your usual things such as name, uniqueID, address, etc. Now, the whole idea of the site is that it's going to be collecting a small string of informative text. I want to allow the clients to be able to store one per date, with as many as 30 days in advance. The program is only going to show the information from the current date on forward, with expired dates not being shown.
The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text, but this creates pretty extensive queries. Eventually this table is going to be at least 20 times larger than the table of businesses in the first place as these businesses are going to be able to post up to 30 items in this table using their uniqueID.
Now, imagine the search page brings up a list of businesses in the area. It then has to query the new table for all of those IDs to get the block of information I want to show based on the date. I'm pretty sure it would be a rather intensive couple of queries just to show a rather simple block of text, but I imagine this is how status updates work for social networking sites in general? Does Facebook store updates in a table of updates tied to a user's ID number, or have they come up with a better way?
I'm just trying to gain a little more insight into DB design, so throw out any ideas you might have.
The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text...
Assuming you mean the profile uniqueID, and not a unique ID for the text table, you're correct.
As pascal said in his comment, you'd need a primary index on uniqueID and date. A person could only enter one row of text for a given date.
If you want to retrieve the next text row for a person, your SQL query would have the following clauses:
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 1
Since you have an index on uniqueID and date, this should be a fast query.
If you want to retrieve the next 5 texts for a particular person, you'd just have to make one change:
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 5
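Put together, the table and the lookup might look something like the sketch below. It runs on SQLite through Python's sqlite3 rather than MySQL, and the names profile_text, show_date, body, add_text and upcoming_texts are assumptions made for the example. The composite primary key is what enforces "one text per profile per date" and also serves as the index for the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE profile_text (
        unique_id INTEGER NOT NULL,   -- the business profile's id
        show_date TEXT NOT NULL,      -- ISO 'YYYY-MM-DD'; DATE in MySQL
        body TEXT NOT NULL,
        PRIMARY KEY (unique_id, show_date)
    )
    """
)


def add_text(conn, unique_id, show_date, body):
    """Store one text for one date; a second text for the same date fails."""
    conn.execute(
        "INSERT INTO profile_text VALUES (?, ?, ?)", (unique_id, show_date, body)
    )


def upcoming_texts(conn, unique_id, today, limit=5):
    """Return the next `limit` texts from `today` onward, oldest first."""
    return conn.execute(
        "SELECT show_date, body FROM profile_text "
        "WHERE unique_id = ? AND show_date >= ? "
        "ORDER BY show_date LIMIT ?",
        (unique_id, today, limit),
    ).fetchall()


add_text(conn, 1, "2024-04-30", "expired special")
add_text(conn, 1, "2024-05-01", "today's special")
add_text(conn, 1, "2024-05-02", "tomorrow's special")
add_text(conn, 2, "2024-05-01", "other business")
print(upcoming_texts(conn, 1, "2024-05-01", limit=1))
```

Expired rows simply fall outside the date condition, so nothing has to be deleted for them to stop showing up.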

Storing user data as text or ID based using lookups?

Is there any benefit to storing days of the week, months, week numbers, user age, etc. as lookups vs. plain text entries in the database? I am creating a social website with some analytics and am planning to use a Question table, an Answer table and a Question_Answer table to store all the data like Gender, Birth months, Age, etc., so I can give each an ID to use throughout the system. On some older projects I worked on, however, people always stored these as normal text entries only. So I am trying to see which design is better for storing all the system and user fixed-list data, which may or may not be used later for reporting using various metrics. If using lookups, how deep should I go? Do I need to create days of the week, days of the year, weeks of the month, etc. if I want to create a report like: comparing the number of photos shared on the first of every month for a given set of users vs. the last day of every month for the same users?
Given that you are using MySQL, I suggest populating a date helper table, as suggested in the top-rated answer to this question: How to fill date gaps in MySQL?
The question itself should explain why you would want this table.
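As a rough idea of what such a date helper table could look like for the report in the question, here is a sketch. It is not the code from the linked answer: it fills the table from Python and uses SQLite via sqlite3 instead of MySQL, and the names calendar, photos, build_calendar and photos_first_vs_last are assumptions.

```python
import sqlite3
from datetime import date, timedelta


def build_calendar(conn, start, end):
    """Fill a helper table with one row per day from start to end inclusive."""
    conn.execute(
        "CREATE TABLE calendar ("
        "d TEXT PRIMARY KEY, day_of_month INTEGER, "
        "day_of_week INTEGER, is_last_day_of_month INTEGER)"
    )
    day = start
    while day <= end:
        # A day is the last of its month if the next day is in another month.
        is_last = (day + timedelta(days=1)).month != day.month
        conn.execute(
            "INSERT INTO calendar VALUES (?, ?, ?, ?)",
            (day.isoformat(), day.day, day.isoweekday(), int(is_last)),
        )
        day += timedelta(days=1)


def photos_first_vs_last(conn, user_ids):
    """Count photos shared on the first vs. the last day of any month."""
    user_ids = list(user_ids)
    placeholders = ",".join("?" * len(user_ids))
    row = conn.execute(
        "SELECT COALESCE(SUM(c.day_of_month = 1), 0), "
        "COALESCE(SUM(c.is_last_day_of_month), 0) "
        "FROM photos p JOIN calendar c ON c.d = p.shared_on "
        "WHERE p.user_id IN (%s)" % placeholders,
        user_ids,
    ).fetchone()
    return row[0], row[1]


conn = sqlite3.connect(":memory:")
build_calendar(conn, date(2024, 1, 1), date(2024, 3, 31))
conn.execute("CREATE TABLE photos (user_id INTEGER, shared_on TEXT)")
conn.executemany(
    "INSERT INTO photos VALUES (?, ?)",
    [
        (1, "2024-01-01"), (1, "2024-02-01"), (1, "2024-02-29"),
        (2, "2024-01-31"), (2, "2024-01-15"),
        (3, "2024-03-01"),
    ],
)
print(photos_first_vs_last(conn, [1, 2]))
```

With the date attributes precomputed in one place, the report becomes a plain join, and you do not need separate lookup tables for days of the week, days of the year and so on.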