I'm setting up a system where for every user (1000+), I want to add a set of values every single day.
Hypotetically:
A system where I can log when Alice and Bob woke up and what they had for dinner on the August 1st 2019 or 2024.
Any suggestions on how to best structure the database tables?
A person table with a primary person ID?
rows: n
A date table with a primary date ID?
rows: m
And a personDate table the person ID and date ID as foreign keys?
rows n x m
I don't think u need a date table unless u want to use it to make specific queries easier. Such as left join against the date to see what days you are missing events. Nevertheless, I would stick to the DATE or DATETIME as the field and avoid making a separate surrogate foreign key. It won't save any space and will potentially perform worse and will be more difficult to use for the developer.
This seems simple and fine to me. I wouldn't worry too much about the performance based upon the number of elements alone. You can insert a billion records with no problem and that implies a very large site.
Just don't insert records if the event didn't happen. In other words you want your database to grow in relation to the real usage. Avoid growth based upon phantom events and you should be okay.
person
person_id
action
action_id
personAction
person_id
action_id
action_datetime
Related
I have a table as such:
id entity_id first_year last_year sessions_attended age
1 2020 1996 2008 3 34.7
2 2024 1993 2005 2 45.1
3 ... ... ...
id is auto-increment primary key, and entity_id is a foreign key that must be unique for the table.
I have a query that calculates first and last year of attendance, and I want to be able to update this table with fresh data each time it is run, only updating the first and last year columns:
This is my insert/update for "first year":
insert into my_table (entity_id, first_year)
( select contact_id, #sd:= year(start_date)
from
( select contact_id, event_id, start_date from participations
join events on participations.event_id = events.id where events.event_type_id = 7
group by contact_id order by event_id ASC) as starter)
ON DUPLICATE KEY UPDATE first_year_85 = #sd;
I have one similar that does "last year", identical except for the target column and the order by.
The queries alone return the desired values, but I am having issues with the insert/update queries. When I run them, I end up with the same values for both fields (the correct first_year value).
Does anything stand out as the cause for this?
Anecdotal Note: This seems to work on MySQL 5.5.54, but when run on my local MariaDB, it just exhibits the above behavior...
Update:
Not my table design to dictate. This is a CRM that allows custom fields to be defined by end-users, I am populating the data via external queries.
The participations table holds all event registrations for all entity_ids, but the start dates are held in a separate events table, hence the join.
The variable is there because the ON DUPLICATE UPDATE will not accept a reference to the column without it.
Age is actually slightly more involved: It is age by the start date of the next active event of a certain type.
Fields are being "hard" updated as the values in this table are being pulled by in-CRM reports and searches, they need to be present, can't be dynamically calculated.
Since you have a 'natural' PK (entity_id), why have the id?
age? Are you going to have to change that column daily, or at least monthly? Not a good design. It would be better to have the constant birth_date in the table, then compute the ages in SELECT.
"calculates first and last year of attendance" -- This implies you have a table that lists all years of attendance (yoa)? If so, MAX(yoa) and MIN(yoa) would probably a better way to compute things.
One rarely needs #variables in queries.
Munch on my comments; come back for more thoughts after you provide a new query, SHOW CREATE TABLE, EXPLAIN, and some sample data.
I am converting a spreadsheet to a database but how do i accommodate multiple values for a field?
This is a database tracking orders with factories.
Import PO# is the unique key. sometimes 1 order will have 0,1,2,3,4 or more customers requiring that we place their price tickets on the product in the factory. every order is different. what's the proper way to accommodate multiple values in 1 field?
Generally, having multiple values in a field is bad database design. Maybe a one to many relationship will work in this scenario.
So you will have an Order table with PO# as the primary key,
Then you will have a OrderDetails table with the PO# as a foriegn key. i.e. it will not be designated as a primary key.
For each row in the Order table you will have a unique PO# that will not repeat across rows.
In the OrderDetails table you will have a customer per row and because the PO# is not a primary key, it can repeat across rows. This will allow you to designate multiple customers per order. Therefore each row will have its own PriceTicketsOrdered field so you can know per customer what the price is.
Note that each customer can repeat across rows in the OrderDetails table as long as its for a different PO# and/or product.
This is the best I can tell you based on the clarity of your question.
Personally, I normally spend time desinging my database on paper or using some drawing software like visio before I start implementing my database in a specific software like MySql pr PostgreSql.
Reading up on ER Diagrams(Entity Relationship diagrams) might help you.
You should also read up on Database normalization. Probably you should read up on database normalization first.
here is a link that might help:
http://code.tutsplus.com/articles/sql-for-beginners-part-3-database-relationships--net-8561
I have 40 students and each have enrolled in five courses. Every course tracks attendance: the date on which each student was present. I'm thinking to create a separate table for each student. Is there any efficient way to do this?
Do NOT create a table for each student.
What you should have is a Student table, a Course table, and an Attendance table. The Attendance table should have columns for the student key, the course key, the date, and the status (present, tardy, absent, etc).
This will be far more efficient than using separate tables for each student.
You could write a stored proc that takes a comma-separated list of your student names and builds the tables from that. However, creating 40 tables that are identical except for the name seems really inefficient. SQL can handle large row sets, no problem. Even if you put all of the students into the same table and they had an attendance row for all 365 days of the year (40 students x 5 courses x 365 days) is only 73,000 rows/year; that's nothing for SQL.
Also, if you ever wanted to do any reporting with multiple student tables, you will need to target EVERY table instead of just targeting your one main table and grouping by studentID.
In short, yes you can write a proc that will dynamically create identical tables for each student. Is it a good (read: efficient) idea? Personally, I don't think so. I think you could get way more performance (losing some because of larger searches but gaining some for not having to JOIN any tables) and reporting ability in a much shorter amount of time/work by keeping it all on one table.
I am making a small system to clean up the database. Every person that visits the site gets put in the db, but if he/she doesn't register, he/she should be removed from the database with a cronjob or so if the time when he/she first visited the site is longer than 2 days. The date is stored in MySQL as a timestamp but looks like this: 2013-06-05 01:18:43.
So what I thought about doing was the following:
$STH = $DBH->query("DELETE FROM user WHERE type=0 AND joindate < ".date('d-m-Y H:i:s',time()-$userLife));
Like this, the format of the timestamp is the same as in MySQL. I'm using $userLife so I can easily adjust this var at the beginning of my script.
The problem is however, that I also need to do queries for other tables containing this user_id. For example the table pages:
id | user_id | level | time | views
In this table, it is possible that there are multiple instances of user_id.
Can this be done in one single query, or do I need to first loop through all the users, for each user then do the DELETE-queries for 3 other tables and after that loop delete all the users?
Ideally, you'd define things with a FOREIGN KEY constraint, and define an ON DELETE CASCADE, which automagically will delete all that related data for you. If that's not possible for some reason (stuck with a MyISAM table for instance), you could simply JOIN the related tables (yes, you can delete from more then 1 table at once). If it's your first time doing that, do it on a testdatabase, and certainly not in production.
I'm designing my first good sized project and I want to be sure I'm on the right path here so I thought I would run it by the community.
I have vendors that submit products to companies. The vendors choose which company they want to submit to and that brings up a page of questions chosen by the company. So far I have a Table of companies, a table of vendors, and table of products. Each with their own primary key, easy enough. My issue is with my table called submissions that starts to tie them all together for each new submission. I am trying to get away from having a submission table with a thousand columns because the companies all want to ask different questions. If I have
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
and
Table Questions
question_id
question
and to bridge the many to many
Table Questions_Submissions
questions_submissions_id
submission_id FK
question_id FK
answer
Would this be the recommended path for normalization and if so is there any harm having the column answer contain boolean and string results or should I somehow break the boolean questions into another table? I'm expecting millions of rows of data over the next few years and want to be sure I dont design this wrong from the beginning. Thanks for any feedback if you see a glaring error or red flag in this design.
So far I have a Table of companies, a table of vendors, and table of products. Each with their own primary key, easy enough.
Each row has its own id number. That's not quite the same thing as you'd get by normalizing a relation. In a relational database, the important thing is not identifying a row, it's identifying what the row represents.
So, for example, this table
Table Questions
question_id
question
could quite easily end up with data that looks like this.
question_id question
--
1 What is your name?
2 What is your name?
3 What is your name?
4 What is your name?
5 What is your name?
Each row is uniquely identified, but each question (the important thing) is not. You need a unique constraint on {question}.
I have vendors that submit products to companies.
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
You need a unique constraint on either {product_id, vendor_id, company_id} or {date, product_id, vendor_id, company_id}.
You also need a table of vendor products. Your table allows a vendor to submit any product--including every product they don't sell--to a company.
The vendors choose which company they want to submit to and that brings up a page of questions chosen by the company. (Emphasis added)
Nothing in your schema stores the questions a company has chosen.
is there any harm having the column answer contain boolean and string results
You can express just about any common data type as a string. But with this structure, you can't constrain boolean values to just two values. If you add the possibility of numeric results, you can't constrain them to sane values, either.
This is certainly one way to go about it, and it looks pretty good.
You can do some clever things with the answer and some if statements in the query to handle the different types of answers, but it does add some complexity to the solution, so you should think about what you are trying to do with the answers.
For Boolean, you can just as easily get away with "true" or "false" in the varchar field, and do a count on them. If you needed to get answers that are numeric or dates, for sums or averages directly in the query, you could split the answer into types.