I need to create a table where each user (approx. 60 at the moment) would have a defined task for each day. Right now the database has one column for each user with the task name in it (which is bad in my opinion, as each new user would require a change to the table's schema) and a "date" column.
A solution would be to have a "user" column and add a "task" column but that would mean there would be 60 (number of current users) rows per day.
I don't really know what the best solution is in this case.
Should I use more columns or more rows?
They're two completely different things, so this comparison doesn't make much sense...
Right now the database has one column for each user
Bad idea. Full stop. A user is a record of data, not a structural element of the database itself. For example, a table of users might contain columns like Username, Email, RegistrationDate, etc. It would not be a single row of data in which you add a column for each new user.
This would be a nightmare to maintain, would render things like Foreign Keys useless (and, honestly, render the entire concept of a relational database useless), would reach resource limits very quickly, etc.
Each record of information is a row, not a column (or table). In this case, each row in your table is a "User Task". It defines (or has a Foreign Key to) a User and defines (or has a Foreign Key to) a Task.
but that would mean there would be 60 (number of current users) rows per day
If the number of records in the table starts to become a problem, you can start looking into things like sharding and partitioning, archiving old data, etc. You've got time though, because "dozens of records per day" is sustainable for thousands of years. (And by then I imagine the hardware will be at least twice as good as it is today.)
Right now the database has one column for each user with the task name in it (which is bad in my opinion as each new user would need to change the schema of the table)
You're right, this is very bad. Using one column for user, one for the task and one for the date, will be much better.
60 rows per day is not much. That is 21,900 rows per year and 219,000 rows in ten years. MySQL can handle millions of rows in a table.
If you have two indexes, one for user and one for the date, searching for data will be fast enough.
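As a rough sketch of that layout (not the asker's actual schema), the example below builds a three-column table with one index on the user and one on the date. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the names user_tasks, user_name, task_name and task_date are assumptions made up for the illustration.

```python
import sqlite3

# SQLite in memory stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_tasks (
        task_date TEXT NOT NULL,   -- ISO date such as '2024-01-15'
        user_name TEXT NOT NULL,
        task_name TEXT NOT NULL
    );
    CREATE INDEX idx_user_tasks_user ON user_tasks (user_name);
    CREATE INDEX idx_user_tasks_date ON user_tasks (task_date);
""")

conn.executemany(
    "INSERT INTO user_tasks VALUES (?, ?, ?)",
    [
        ("2024-01-15", "alice", "inventory"),
        ("2024-01-15", "bob", "support"),
        ("2024-01-16", "alice", "support"),
    ],
)

# A new user is just another row; no ALTER TABLE is needed.
conn.execute(
    "INSERT INTO user_tasks VALUES (?, ?, ?)",
    ("2024-01-16", "carol", "inventory"),
)

alice_tasks = conn.execute(
    "SELECT task_date, task_name FROM user_tasks "
    "WHERE user_name = ? ORDER BY task_date",
    ("alice",),
).fetchall()
print(alice_tasks)
```

In a real schema the user column would more likely be a foreign key to a users table than a free-text name.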
Knowing nothing else about your database or schema, why not create a dimension table to store your users and fact table to track your task details?
That way you can more easily add new users and the tasks table would continue to grow as new facts are added. It would also be very easy to denormalize this model for query and/or reporting purposes.
Adding columns is a nuisance and can be slow. Instead have a table with columns (user, task, etc)
Even "60 rows per second" is not a problem. 600/second might be.
See the tag [pivot-table] for how to turn rows into columns for output display.
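One common way to do that pivot is conditional aggregation: one MAX(CASE ...) expression per user, grouped by date. The sketch below shows the idea with Python's sqlite3 standing in for MySQL; the user_tasks table and its column names are hypothetical.

```python
import sqlite3

# SQLite in memory stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_tasks (task_date TEXT, user_name TEXT, task_name TEXT)"
)
conn.executemany(
    "INSERT INTO user_tasks VALUES (?, ?, ?)",
    [
        ("2024-01-15", "alice", "inventory"),
        ("2024-01-15", "bob", "support"),
        ("2024-01-16", "alice", "support"),
    ],
)

# Discover the users at query time, so the output columns follow the data.
users = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT user_name FROM user_tasks ORDER BY user_name"
    )
]

# One MAX(CASE ...) expression per user; user names are bound as parameters.
columns = ", ".join(
    "MAX(CASE WHEN user_name = ? THEN task_name END)" for _ in users
)
pivot = conn.execute(
    f"SELECT task_date, {columns} FROM user_tasks "
    "GROUP BY task_date ORDER BY task_date",
    users,
).fetchall()

print(["date"] + users)
for row in pivot:
    print(row)
```

The stored data stays one row per user per day; only the output is widened.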
Currently, I have 48 fields.
I'm completely new to access. This is how I decided to connect everything together.
It doesn't seem to be very effective. Could somebody help me understand how to normalize this database?
Should I try to put employee information in one table, job information in another table and then have an equipment lookup table?
The current job, last job, and previous job can all be the SAME table. If you sort this table by descending job start date, then you have current, last and previous. You thus don’t need nor want a separate table for each of these, which really amounts to the concept of a “job”. If sorting by date is not enough, then you could add a column called Job Type (current, previous, etc.). Again, we are still only using the one table.
The same goes for Equipment. You really don’t care if the limit is 3 last, or 300 last. By building a normalized table, ONE form can edit all types, and you save MASSIVE amounts of work coding, building tables and user interface software, and building queries to retrieve and show the last 3 jobs in a form.
The fact that your design, with FAR LESS development cost, allows 3 or 300 last jobs is really moot. More important, if some manager comes along and now wants you to save the last 4 jobs, you don’t have some massive re-design here. And you can add new job types on the fly. So in place of current and, say, previous, you can also have un-completed or failed jobs. So adding new business rules means, again, that you don’t add a new type of job table, but only a “type” to the one column you are already using to define the job as current or previous.
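A minimal sketch of the single jobs table, using Python's sqlite3 rather than Access so it can be run anywhere. The table name, the columns and the recent_jobs helper are all assumptions for illustration; the point is only that "last 3" versus "last 4" becomes a query parameter instead of a schema change.

```python
import sqlite3

# SQLite stands in for Access; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        job_id      INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL,
        title       TEXT NOT NULL,
        start_date  TEXT NOT NULL   -- ISO dates sort correctly as text
    )
""")
conn.executemany(
    "INSERT INTO jobs (employee_id, title, start_date) VALUES (?, ?, ?)",
    [
        (1, "Apprentice", "2015-03-01"),
        (1, "Technician", "2018-06-15"),
        (1, "Supervisor", "2021-01-10"),
        (1, "Manager", "2023-09-01"),
    ],
)


def recent_jobs(employee_id, limit=3):
    """Return the most recent job titles for one employee, newest first."""
    rows = conn.execute(
        "SELECT title FROM jobs WHERE employee_id = ? "
        "ORDER BY start_date DESC LIMIT ?",
        (employee_id, limit),
    )
    return [row[0] for row in rows]


print(recent_jobs(1))      # current, last and previous job
print(recent_jobs(1, 4))   # a manager now wants four: no redesign needed
```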
Identify like objects and make one table to store all of them. In your design you have three tables for equipment but each item of equipment has the same fields; they should be one table. Similarly for jobs, each job is pretty much the same; they should be one table. The same for departments.
Figure out one or more columns in each table that can uniquely identify the row in the table (that is, if you know the values for those columns it is impossible for there ever to be two rows with those values). These are your primary keys for your tables.
Identify cases in which an item in one table needs to "point to" (refer to) an item in another table. In this case, make sure that the referring table has a set of columns that match the referred-to table.
When you've done that, you'll have the beginnings of a correctly factored relational database design.
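To make the "points to" step concrete, here is a hedged sketch in Python's sqlite3 (standing in for Access) with one employees table and one equipment table. The names are invented for the example, and note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

# SQLite stands in for Access; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,   -- uniquely identifies a row
        full_name   TEXT NOT NULL
    );
    CREATE TABLE equipment (
        equipment_id INTEGER PRIMARY KEY,
        description  TEXT NOT NULL,
        -- the referring column matches the referred-to primary key
        employee_id  INTEGER NOT NULL REFERENCES employees (employee_id)
    );
""")

conn.execute(
    "INSERT INTO employees (employee_id, full_name) VALUES (1, 'Pat Example')"
)
conn.execute(
    "INSERT INTO equipment (description, employee_id) VALUES ('Laptop', 1)"
)

# Equipment pointing at a non-existent employee is rejected by the database.
try:
    conn.execute(
        "INSERT INTO equipment (description, employee_id) VALUES ('Phone', 99)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan row rejected:", rejected)
```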
Using MySQL I have a table of users, a table of matches (updated with the actual result) and a table called users_picks (at first it's always going to be 10 football matches per gameweek per league because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches per gameweek).
In the users_picks table should I store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with a FK for user and gameweek. All picks in one row would mean I had columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid. Especially since there's going to be lots of potential NULLs in the db. The second option would mean eventually millions of rows. But would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? eg. Calculating how many matches a user has hit correctly in a specific round, how many alltime correct hits etc.
If I'm not making much sense, I'll try to elaborate on anything. I just want my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine if there were to be included later a new league with 11 matches. Or 400.
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_picks` WHERE `user_id`=?;
to get all the picks for a given user.
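On the statistics question, the one-row-per-pick layout makes the counts a plain JOIN plus GROUP BY. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 as a stand-in for MySQL, keeps gameweek_id on a simplified matches table, and invents the home_score and away_score column names for the actual results.

```python
import sqlite3

# SQLite stands in for MySQL; the matches columns are assumed, not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (
        match_id    INTEGER PRIMARY KEY,
        gameweek_id INTEGER NOT NULL,
        home_score  INTEGER,          -- actual result
        away_score  INTEGER
    );
    CREATE TABLE users_picks (
        pick_id        INTEGER PRIMARY KEY,
        user_id        INTEGER NOT NULL,
        match_id       INTEGER NOT NULL REFERENCES matches (match_id),
        hometeam_score INTEGER NOT NULL,
        awayteam_score INTEGER NOT NULL
    );
    CREATE INDEX idx_users_picks_user ON users_picks (user_id);
""")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 1), (2, 1, 0, 0), (3, 2, 1, 3)],
)
conn.executemany(
    "INSERT INTO users_picks (user_id, match_id, hometeam_score, awayteam_score) "
    "VALUES (?, ?, ?, ?)",
    [(7, 1, 2, 1), (7, 2, 1, 0), (7, 3, 1, 3)],
)

# Correct picks per gameweek for one user.
hits_per_gameweek = conn.execute(
    """
    SELECT m.gameweek_id, COUNT(*)
    FROM users_picks AS p
    JOIN matches AS m ON m.match_id = p.match_id
    WHERE p.user_id = ?
      AND p.hometeam_score = m.home_score
      AND p.awayteam_score = m.away_score
    GROUP BY m.gameweek_id
    ORDER BY m.gameweek_id
    """,
    (7,),
).fetchall()

# All-time correct picks is just the sum over gameweeks.
alltime_hits = sum(count for _, count in hits_per_gameweek)
print(hits_per_gameweek, alltime_hits)
```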
I am creating a new DB in MySQL for an application and wondered if anyone could provide some advice on the following set up. I'll try and simplify things as best as I can.
This DB is designed to store alerts which are related to specific items created by a user. In turn there is the need to store notes related to the items and/or alerts. At first I considered the following structure...
USERS table - to store basic app user info (e.g. user_id, name, email) - this is the only bit I'm fairly certain does not need to be changed
ITEMS table: contains info on particular item (4 fields or so). Contains user_id to indicate which user created/owns this item
ALERTS table: contains info on the alert, item_id to indicate which item the alert is related to, contains user_id to indicate which user created alert
NOTES table: contains note info, user_id of note owner, item_id if associated with an item, alert_id if associated with alert
Relationships:
An item does not always have an alert associated with it
An item or alert does not always have a note associated with it
An alert is always associated with an item. More than one alert can be associated with the same item.
A note is always associated with an item or alert. More than one note can be associated with the same item or alert.
Once first created item info is unlikely to be updated by a user.
For argument's sake let's say that each user will create an average of 10 items, each item will have an average of 2 alerts associated with it. There will be an average of 2 notes per item/alert.
Very common queries that will be run:
1) Return all items created by a particular user with any associated alerts and notes. Given a user_id this query would span 3 tables
2) Checking each day for alerts that need to be sent to a user's email address. WHERE alert date==today, return user's email address, item name and any associated notes. This would require a query spanning 4 tables which is why I'm wondering if I need to take a different approach...
Option 1) one table to cover items, alerts and notes. user_id owner for each row. Every time you add a note to an item or alert you are repeating the alert and/or item info. Seems a bit wasteful but item and alert info won't be large.
Option 2) I don't foresee the need to query notes (famous last words?) so how about serializing note data so multiple notes are stored in one row in either the item or alert table (or just a combined alert/item table)
Option 3) Anything else you can think of? I'm asking this question as each option I've considered doesn't feel quite right.
I appreciate this is currently a small project and so performance shouldn't be of great concern and I should just go with the 4 tables. It's more that my common queries will end up being relatively complex that makes me think I need to re-evaluate the structure.
I would say that the common wisdom is to normalize to start and denormalize only when performance data suggest that it's necessary.
Make sure that your tables are indexed properly, with foreign key relationships for JOINs.
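For a sense of how small the four-table query really is, here is a rough sketch of the daily alert lookup. It runs on Python's sqlite3 instead of MySQL, trims each table down to a few assumed columns (alert_date, body and the index names are made up), and only joins notes attached to the alert itself.

```python
import sqlite3

# SQLite stands in for MySQL; columns are a simplified, hypothetical subset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE items  (item_id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users (user_id),
                         name    TEXT NOT NULL);
    CREATE TABLE alerts (alert_id   INTEGER PRIMARY KEY,
                         item_id    INTEGER NOT NULL REFERENCES items (item_id),
                         alert_date TEXT NOT NULL);
    CREATE TABLE notes  (note_id  INTEGER PRIMARY KEY,
                         item_id  INTEGER REFERENCES items (item_id),
                         alert_id INTEGER REFERENCES alerts (alert_id),
                         body     TEXT NOT NULL);

    -- Index the foreign keys used in JOINs and the date used in WHERE.
    CREATE INDEX idx_items_user  ON items (user_id);
    CREATE INDEX idx_alerts_item ON alerts (item_id);
    CREATE INDEX idx_alerts_date ON alerts (alert_date);
    CREATE INDEX idx_notes_alert ON notes (alert_id);

    INSERT INTO users  VALUES (1, 'user@example.com');
    INSERT INTO items  VALUES (10, 1, 'Passport');
    INSERT INTO alerts VALUES (100, 10, '2024-05-01'), (101, 10, '2024-06-01');
    INSERT INTO notes  VALUES (1000, NULL, 100, 'Renew online');
""")


def alerts_due(day):
    """Email address, item name and any note for every alert due on `day`."""
    return conn.execute(
        """
        SELECT u.email, i.name, n.body
        FROM alerts AS a
        JOIN items AS i ON i.item_id = a.item_id
        JOIN users AS u ON u.user_id = i.user_id
        LEFT JOIN notes AS n ON n.alert_id = a.alert_id  -- notes are optional
        WHERE a.alert_date = ?
        ORDER BY a.alert_id, n.note_id
        """,
        (day,),
    ).fetchall()


print(alerts_due("2024-05-01"))
print(alerts_due("2024-06-01"))
```

The LEFT JOIN keeps alerts that have no note, which matches the rule that an alert does not always have one.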
If you think you'll end up with a lot of data, this might be a good time to think about a partitioning strategy. Partitioning your fast-growing tables by time would be a good first step.
Four tables is not complex. I commonly write report queries that hit 15 or more tables in a database structure that has hundreds of tables (most with millions of records) and I wouldn't even say our dbs are anything more than medium sized (a typical db in our system might have around 200 gigs of data, so not large at all as databases go). Because they are properly indexed, they still run fast unless I am doing very complex calculations. Normalize, don't even consider denormalizing until you are an experienced database designer who knows better than to worry about the number of tables.
What would be the best MySQL type to use to store very long CSV data? say half a million integers of 5 digits or less?
Also, what would be the benefit/drawback of adding a new column to the table instead of adding a new value to the CSV string? Can a MySQL table even have half a million columns?
I would be updating the table either way, one 5 digit integer at a time, and I would need to search through either the CSV string or the columns a lot?
Basically what I'm doing is recording which of my users have voted for a certain idea so no one can vote more than once. There is not a set number of ideas, however; it is constantly expanding, and I don't want to add anything to my already fairly big Users table.
Would it be better to create a new table for each idea that will be voted on?
What would be the fastest/least processing intensive route here?
The relationship between users and ideas they've voted on should be represented by a table with a column identifying the user and a column identifying the idea. When a user votes on an idea, you insert a row into this table. The column pair is the primary key of this table, which enforces uniqueness (preventing duplicate votes).
Instead of adding all votes for a user in one row, make one row per vote and user. The table then consists of two columns: 1. the user and 2. the idea the user voted for. This solves your problem and also makes more things easier in the future, e.g. counting the number of votes for a certain idea.
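A small sketch of that votes table, with the (user, idea) pair as the primary key so the database itself refuses a second vote. It uses Python's sqlite3 as a stand-in for MySQL; the votes table, its column names and the cast_vote helper are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; table, columns and helper are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE votes (
        user_id INTEGER NOT NULL,
        idea_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, idea_id)   -- one vote per user per idea
    )
""")


def cast_vote(user_id, idea_id):
    """Record a vote; return False if this user already voted for the idea."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO votes (user_id, idea_id) VALUES (?, ?)",
                (user_id, idea_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = cast_vote(1, 42)
duplicate = cast_vote(1, 42)   # same user, same idea: rejected
other = cast_vote(2, 42)

votes_for_idea = conn.execute(
    "SELECT COUNT(*) FROM votes WHERE idea_id = ?", (42,)
).fetchone()[0]
print(first, duplicate, other, votes_for_idea)
```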
I am going to have a database with several (less than 10) "main" tables. In addition to that I want to have hundreds or thousands of tables of the same type (let's say "user_1", "user_2", "user_3" and so on). Is it possible to put all these tables in a directory/folder? Or is the database itself already considered a "folder" for tables?
ADDED
Since I got a lot of questions about why I want to do that, I want to elaborate on that. I want to have many tables to optimize queries to the database. If I put everything in one table, the table is going to be huge. Then, if I want to extract information about a particular user, I first need to find those rows in the table which have a given user in a given column. And it can be time consuming. I decided to create a table for every user. So, if I need to know something about a user I just read the required information from a "small" table.
To be more specific, I can have 10 000 users and information about a given user can contain 10 000 lines. I do not want to have one table with 100 000 000 lines.
The answer is—you shouldn't be doing this in the first place.
Don't have separate tables for each user—instead, use one table for all your user data, and add a column (e.g. userId) to store information on who it's about.
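A quick sketch of why the single table does not have to be slow: with an index on the user column, the database jumps straight to one user's rows instead of scanning everything. This uses Python's sqlite3 as a stand-in, with made-up names (user_data, user_id, payload) and a much smaller data set than the one in the question.

```python
import sqlite3

# SQLite stands in for the real database; names and sizes are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_data (
        row_id  INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        payload TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_user_data_user ON user_data (user_id)")

# 100 users with 50 lines each, all in the same table.
conn.executemany(
    "INSERT INTO user_data (user_id, payload) VALUES (?, ?)",
    ((user, f"line {n}") for user in range(1, 101) for n in range(50)),
)

rows_for_user = conn.execute(
    "SELECT COUNT(*) FROM user_data WHERE user_id = ?", (42,)
).fetchone()[0]

# Ask the planner how it would fetch one user's rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM user_data WHERE user_id = ?",
    (42,),
).fetchall()
uses_index = any("idx_user_data_user" in row[3] for row in plan)

print(rows_for_user, uses_index)
```

The exact wording of the plan output differs between SQLite versions, so the check only looks for the index name.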
If you want separate tables based on the user, this tends to be done using an owner or schema concept. In other words, you use:
create table pax.table1 ...
and pax is then the owner of that table. Each user can then have their own data.
If you don't mind everyone seeing the data in each other's "folders", you can opt for a single table with a column specifying the particular user, but you tend to lose user-based protection in that case.
Having each user's data in their own schema (or owner) means that you can restrict access based on user name. Keep in mind that these are then separate tables so it becomes harder to consolidate data from them should you wish to do so.
It's pretty unusual to have hundreds or thousands of tables, even in the biggest database setups. You might want to consider the possibility that you're doing something unwise. Posting the "why" of this question instead of the "how" will help us in assisting you further.