Database design to allow for time - mysql

So, I want a table of data I have to be represented in time.
That is, I want to collect the data over time and have it record when that is.
Currently, the table has an id, a column with a category name, and a column for a value.
There are about ten categories - each of them has a value associated with the number of occurrences right now.
What's the most efficient way to denote time in this circumstance? Assign a timestamp column to each row? And then when I want to show this information, merely collect via timestamp? I'm really flying blind here.
Would this require having multiple, repeated values of the category column?
This is all in MySQL.
Let me clarify:
Three columns:
ID | Category | Value
I want to record the values over time - I am going to be making a parse once per hour.
The categories will be like:
Happy
Sad
Angry
etc
How do I record time data with this? I want to keep the values per hour.
I was thinking about just repeating all of the categories with timestamp data. Would that be the best idea?
So you'd have
Category | Value | Time
Happy | 0 | 072012
Happy | 2 | 072112
Happy | 1 | 072212
etc - this is what I'm thinking should be done to store constantly changing data.

Is the timestamp data that belongs to the category itself, or to a child of the category? If it belongs to the category, then adding a timestamp field to the category table is probably best. If it is a concept that is now (or will in future be) tied to transactions in a child table (so the child would already hold this information), I'd probably use a query/view (or a materialized view if there were performance issues). It might also depend on how often you expect that timestamp to change and on the likely data access requirements for subordinate tables.

MeasuredAt is the timestamp
You may use FeelingName instead of int ID too, it will just use more space.
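A minimal sketch of the layout this answer describes, written in Python against SQLite only so that it runs without a server (the DDL should port to MySQL with minor type changes). The table names feeling and measurement, the timestamps, and every column other than the MeasuredAt and FeelingName ideas are assumptions for illustration; the Happy values 0, 2, 1 come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feeling (
        feeling_id   INTEGER PRIMARY KEY,
        feeling_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE measurement (
        feeling_id  INTEGER NOT NULL REFERENCES feeling(feeling_id),
        measured_at TEXT    NOT NULL,  -- one row per category per hourly parse
        value       INTEGER NOT NULL,
        PRIMARY KEY (feeling_id, measured_at)
    );
""")
conn.executemany("INSERT INTO feeling VALUES (?, ?)",
                 [(1, "Happy"), (2, "Sad"), (3, "Angry")])
# Hypothetical hourly samples; the category is repeated only as a small integer key.
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", [
    (1, "2012-07-20 10:00:00", 0),
    (1, "2012-07-20 11:00:00", 2),
    (1, "2012-07-20 12:00:00", 1),
    (2, "2012-07-20 10:00:00", 4),
])

def series(feeling_name):
    """Return (measured_at, value) pairs for one category, oldest first."""
    return conn.execute("""
        SELECT m.measured_at, m.value
        FROM measurement m
        JOIN feeling f ON f.feeling_id = m.feeling_id
        WHERE f.feeling_name = ?
        ORDER BY m.measured_at
    """, (feeling_name,)).fetchall()

happy = series("Happy")
```

So yes, the category is repeated once per hour, but only as an integer key, and the composite primary key prevents two readings for the same category and hour.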

How should I store another table's row in order to have statistics data in the frontend?

I have a table full of businesses each with a scannable QR Code, and another table that stores the scans the users make. Right now, the scan table schema looks like this:
id | user_id | business_id | scanned_date
If I want to create charts and analytics in the front-end of my Application for statistics about business scans I'd just get the business_id and get the business info with it, but the problem is that if a business' data is ever changed then the statistical data will also change, and it shouldn't be this way.
The first thing that came to my mind in order to have static data was to store the whole business row as a JSON string in a new column in the scan table, but it doesn't sound like good practice. (Although storing a JSON string is not advised against if the data won't be tampered with, which it won't be, since it's supposed to be static.)
Another thing that I thought of was to make a clone table out of the business table's schema, but that would mean doing the work twice whenever I want to make changes to the original one, since I must also change the cloned one.
You need a way to represent the history of the businesses' data in your database.
You didn't mention what attributes you store in each business's row, so I will guess. Let's say you have these columns
business_id
name
category
qr_code
website
Your problem is this: if you change any attribute of the business, the old value vanishes.
Here's a solution to that problem. Add start and end columns to the table. They should probably have TIMESTAMP data types.
Then, never DELETE rows from the table. When you UPDATE a row, only change the value of its end column; to record any other change, add a new row instead. Let me explain.
For a row to be active at the time NOW(), it must pass these WHERE criteria:
start <= NOW()
AND (end IS NULL OR end > NOW())
Let's say you start with two businesses in the table.
business_id | start | end | name | category | qr_code | website
1 | 2019-05-01 | NULL | Joe's | tavern | lkjhg12 | joes.example.com
2 | 2019-05-01 | NULL | Acme | rockets | sdlfj48 | acme.example.com
Good: You can count QR code scans day by day with this query
SELECT COUNT(*), DATE(s.scanned_date) day, b.name
FROM business b
JOIN scan s ON b.business_id = s.business_id
AND b.start <= s.scanned_date
AND (b.end IS NULL OR b.end > s.scanned_date)
GROUP BY DATE(s.scanned_date), b.name
Now, suppose Joe sells his tavern and its name changes. To represent that change you must UPDATE the existing row for Joe's to set the end date, and then INSERT a new row with the new data. Afterward, your table looks like this
change | business_id | start | end | name | category | qr_code | website
(updated) | 1 | 2019-05-01 | 2019-05-24 | Joe's | tavern | lkjhg12 | joes.example.com
(inserted) | 1 | 2019-05-24 | NULL | Fancy | tavern | lkjhg12 | fancy.example.com
(unchanged) | 2 | 2019-05-01 | NULL | Acme | rockets | sdlfj48 | acme.example.com
The query above still works, because it takes into account the start and end dates of the changes.
This approach works best when you have many more scans than changes to businesses. That seems likely in this case.
Your business table needs a composite primary key (business_id, start).
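The update-then-insert step can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: it uses Python with SQLite instead of MySQL so it runs standalone, keeps only the name attribute, and the helper rename_business and the two sample scans are hypothetical. A version counts as matching a scan when it started on or before the scan and has either no end or an end after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (
        business_id INTEGER NOT NULL,
        start       TEXT    NOT NULL,
        "end"       TEXT,              -- NULL marks the current version
        name        TEXT    NOT NULL,
        PRIMARY KEY (business_id, start)
    );
    CREATE TABLE scan (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL,
        business_id  INTEGER NOT NULL,
        scanned_date TEXT    NOT NULL
    );
    INSERT INTO business VALUES (1, '2019-05-01 00:00:00', NULL, 'Joe''s');
    INSERT INTO scan (user_id, business_id, scanned_date) VALUES
        (7, 1, '2019-05-10 09:00:00'),
        (8, 1, '2019-05-30 18:30:00');
""")

def rename_business(business_id, new_name, changed_at):
    """Close the current version, then insert the new one, in one transaction."""
    with conn:
        conn.execute(
            'UPDATE business SET "end" = ? WHERE business_id = ? AND "end" IS NULL',
            (changed_at, business_id),
        )
        conn.execute(
            'INSERT INTO business (business_id, start, "end", name) VALUES (?, ?, NULL, ?)',
            (business_id, changed_at, new_name),
        )

rename_business(1, "Fancy", "2019-05-24 00:00:00")

# Each scan joins to the single version that was active when it happened.
rows = conn.execute("""
    SELECT COUNT(*), DATE(s.scanned_date) AS day, b.name
    FROM business b
    JOIN scan s ON b.business_id = s.business_id
               AND b.start <= s.scanned_date
               AND (b."end" IS NULL OR b."end" > s.scanned_date)
    GROUP BY DATE(s.scanned_date), b.name
    ORDER BY day
""").fetchall()
```

The scan from before the sale is still reported under the old name, and the later scan under the new one.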
Prof. Richard Snodgrass wrote a book on this subject, Developing Time-Oriented Database Applications in SQL, and generously made a pdf available.
I hope I got your question.
You could try having duplicates in the business table. Instead of editing the business, try adding a new one with a new id. When you are editing your business, instead of updating the existing one, you can INSERT a new one. The stats will use the old id and will not get affected by the changes. When you are trying to get the last business info, try sorting them according to their ids to get the last one. That way you won't need a second table for business data.
Edit: If the business id needs to be specific to a business, then instead of using the business id you can add a column that records when each row was inserted into the table. Again, you can sort on it and limit the query to get the latest one.
Edit 2:
Removing entities that were inserted a certain amount of time ago
If you don't need the stats from a month ago, you could remove the old rows from businesses to save space. You can use the new time column you created to compute the time difference and check whether it is greater than the range you want.

The optimal way to store multiple-selection survey answers in a database

I'm currently working on a survey creation/administration web application with PHP/MySQL. I have gone through several revisions of the database tables, and I once again find that I may need to rethink the storage of a certain type of answer.
Right now, I have a table that looks like this:
survey_answers
id PK
eid
sesid
intvalue Nullable
charvalue Nullable
id = unique value assigned to each row
eid = Survey question that this answer is in reply to
sesid = The survey 'session' (information about the time and date of a survey take) id
intvalue = The value of the answer if it is a numerical value
charvalue = the value of the answer if it is a textual representation
This allowed me to continue using MySQL's mathematical functions to speed up processing.
I have however found a new challenge: storing questions that have multiple responses.
An example would be:
Which of the following do you enjoy eating? (choose all that apply)
Girl Scout Cookies
Bacon
Corn
Whale Fat
Now, when I want to store the result, I'm not sure of the best way to handle it.
Currently, I have a table just for multiple choice options that looks like this:
survey_element_options
id PK
eid
value
id = unique value associated with each row
eid = question/element that this option is associated with
value = textual value of that option
With this setup, I then store my returned multiple selection answers in 'survey_answers' as strings of comma separated id's of the element_options rows that were selected in the survey. (ie something like "4,6,7,9") I'm wondering if that is indeed the best solution, or if it would be more practical to create a new table that would hold each answer chosen, and then reference back to a given answer row which in turn references back to the element and ultimately the survey.
EDIT
for anyone interested, here is the approach I ended up taking (In PhpMyAdmin Relations View):
And a rudimentary query to gather the counts for a multiple select question would look like this:
SELECT e.question AS question, eo.value AS value, COUNT(eo.value) AS count
FROM survey_elements e, survey_element_options eo, survey_answer_options ao
WHERE e.id = 19
AND eo.eid = e.id
AND ao.oid = eo.id
GROUP BY eo.value
This really depends on a lot of things.
Generally, storing lists of comma-separated values in a database is bad, especially if you plan to do anything remotely intelligent with that data, such as any kind of advanced reporting on the answers.
The best relational way to store this is to also define the answers in a second table and then link them to the user's response to a question in a third table (with multiple entries per user-question, or possibly per user-survey-question if the user could take multiple surveys with the same question on them).
This can get slightly complex. A possible scenario, as a simple example:
Example tables:
Users (Username, UserID)
Questions (qID, QuestionsText)
Answers (AnswerText [in this case example could be reusable, but this does cause an extra layer of complexity as well], aID)
Question_Answers ([Available answers for this question, multiple entries per question] qaID, qID, aID),
UserQuestionAnswers (qaID, uID)
Note: Meant as an example, not a recommendation
Convert the primary key to a non-unique index and add the answers to the same question under the same id.
For example.
id | eid | sesid | intval | charval
3 45 30 2
3 45 30 4
You can still add another column for regular unique PK if needed.
Keep things simple. No need for relation here.
It's a horses for courses thing really.
You can store as a comma separated string (But then what happens when you have a literal comma in one of your answers).
You can store as a one-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
intvalue Nullable
charvalue Nullable
And then loop over that table. If you picked one answer, it would create one row in this table. If you pick two answers, it will create two rows in this table, etc. Then you would remove the intvalue and charvalue from the survey_answers table.
Another choice, since you're already storing the element options in their own table, is to create a many-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
survey_element_options_id FK
Again, one row per option selected.
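The many-to-many variant can be sketched like this, in Python with SQLite so that it runs standalone. The table and column names follow the ones above; the question id 19, the session ids, and the sample selections are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_element_options (
        id    INTEGER PRIMARY KEY,
        eid   INTEGER NOT NULL,
        value TEXT    NOT NULL
    );
    CREATE TABLE survey_answers (
        id    INTEGER PRIMARY KEY,
        eid   INTEGER NOT NULL,
        sesid INTEGER NOT NULL
    );
    CREATE TABLE survey_element_answers (
        id                        INTEGER PRIMARY KEY,
        survey_answers_id         INTEGER NOT NULL REFERENCES survey_answers(id),
        survey_element_options_id INTEGER NOT NULL REFERENCES survey_element_options(id)
    );
    INSERT INTO survey_element_options (id, eid, value) VALUES
        (1, 19, 'Girl Scout Cookies'), (2, 19, 'Bacon'),
        (3, 19, 'Corn'), (4, 19, 'Whale Fat');
    -- Two sessions answered question 19.
    INSERT INTO survey_answers (id, eid, sesid) VALUES (1, 19, 100), (2, 19, 101);
    -- Session 100 ticked two options, session 101 ticked one: one row per tick.
    INSERT INTO survey_element_answers (survey_answers_id, survey_element_options_id)
    VALUES (1, 1), (1, 2), (2, 2);
""")

# LEFT JOIN keeps options that nobody selected, with a count of zero.
counts = dict(conn.execute("""
    SELECT o.value, COUNT(a.id)
    FROM survey_element_options o
    LEFT JOIN survey_element_answers a ON a.survey_element_options_id = o.id
    WHERE o.eid = 19
    GROUP BY o.id, o.value
""").fetchall())
```

Because each selection is its own row, counting and filtering stay plain SQL, with no string splitting.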
Another option yet again is to store a bitmask value. This will remove the need for a many-to-many table.
survey_element_options
id PK
eid FK
value Text
optionnumber unique for each eid
optionbitmask 2 ^ optionnumber
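optionnumber should be unique for each eid, and increment starting with zero, so that the first option's bitmask is 1. (Here 2 ^ optionnumber means 2 to the power of optionnumber; in MySQL write 1 << optionnumber or POW(2, optionnumber), because the ^ operator is bitwise XOR.) This will impose a limit of 63 options if you are using bigint, or 31 options if you are using int.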
And then in your survey_answers
id PK
eid
sesid
answerbitmask bigint
answerbitmask is calculated by adding together the optionbitmask values of every option the user selected. For example, if 7 (1 + 2 + 4) were stored in answerbitmask, then the user selected the first three options.
Joins can be done by:
WHERE survey_answers.answerbitmask & survey_element_options.optionbitmask > 0
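A small sketch of the bitmask variant, again in Python with SQLite as a stand-in for MySQL. Option numbers start at zero here, so the first option gets bit value 1 and a stored 7 means the first three options. The helpers store_answer and chosen_options, the question id 19, and the session id are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_element_options (
        id            INTEGER PRIMARY KEY,
        eid           INTEGER NOT NULL,
        value         TEXT    NOT NULL,
        optionnumber  INTEGER NOT NULL,
        optionbitmask INTEGER NOT NULL,
        UNIQUE (eid, optionnumber)
    );
    CREATE TABLE survey_answers (
        id            INTEGER PRIMARY KEY,
        eid           INTEGER NOT NULL,
        sesid         INTEGER NOT NULL,
        answerbitmask INTEGER NOT NULL
    );
""")

OPTIONS = ["Girl Scout Cookies", "Bacon", "Corn", "Whale Fat"]
for number, text in enumerate(OPTIONS):
    # 1 << number is 2 to the power of number: 1, 2, 4, 8
    conn.execute(
        "INSERT INTO survey_element_options (eid, value, optionnumber, optionbitmask) "
        "VALUES (19, ?, ?, ?)",
        (text, number, 1 << number),
    )

def store_answer(sesid, chosen):
    """Combine the bit of every chosen option into one integer and store it."""
    mask = sum(1 << OPTIONS.index(text) for text in set(chosen))
    cur = conn.execute(
        "INSERT INTO survey_answers (eid, sesid, answerbitmask) VALUES (19, ?, ?)",
        (sesid, mask),
    )
    return cur.lastrowid

def chosen_options(answer_id):
    """Expand a stored mask back into option texts with a bitwise AND join."""
    rows = conn.execute("""
        SELECT o.value
        FROM survey_answers a
        JOIN survey_element_options o
          ON o.eid = a.eid AND (a.answerbitmask & o.optionbitmask) > 0
        WHERE a.id = ?
        ORDER BY o.optionnumber
    """, (answer_id,))
    return [row[0] for row in rows]

answer_id = store_answer(100, ["Girl Scout Cookies", "Bacon", "Corn"])
stored_mask = conn.execute(
    "SELECT answerbitmask FROM survey_answers WHERE id = ?", (answer_id,)
).fetchone()[0]
```

The trade-off is compact storage against queries that cannot use an ordinary index on the selected option.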
So yeah, there's a few options to consider.
If you don't use the id as a foreign key in another query, or if you can query results using the sesid, try a many to one relationship.
Otherwise I'd store multiple choice answers as a serialized array, such as JSON or through php's serialize() function.

Is it a bad idea to reference a row from a many-to-many table in another many-to-many table

If I have a schema similar to this:
TABLE 1
id
column
other_column
etc
TABLE 2
id
table1_id
some_other_table_id
Is it a good idea to add a third table like this:
TABLE 3
id
table2_id
row_from_another_table_id
EDIT:
To make things clearer, consider a schema like this:
EVENTS
id
name
other_stuff
RANGES
id
time_from
time_to
max_people
etc
EVENTS_PLACES
id
event_id
place_id
What I want to do is to define a time range for an event. But a specific event in a specific place(EVENTS_PLACES) can 'overwrite' this ranges. Also an event can have multiple ranges.
I hope this makes the question a little bit more clear now.
It's always been my impression that a many-to-many relationship is a violation of Boyce-Codd Normal Form and therefore a violation of a good relational database schema.
Therefore, relating data to a link table is, in fact, necessary to achieve BCNF, and therefore good, if avoiding data update anomalies is good.
On to the specific schema example you presented. I think you want these logical tables (or entities),
-----------------------
EventClass
-----------------------
Id
Name
... Other attributes common to every instance
-
-----------------------
TimeSlot
-----------------------
Id
Start
End
-
-----------------------
Place
-----------------------
Id
Name
Address
MaxAttendance
... etc
-
----------------------
EventInstance
-----------------------
Id
EventClassId
TimeSlotId
PlaceId
PresenterName
...Other attributes specific to the instance
EventInstance is a relationship between EventClass, TimeSlot and Place. Any attributes specific to the EventInstance should be stored on that entity; any attributes common to a related group of events should be stored on the EventClass entity.
It's all a question of database normalization. Generally speaking, the more normalized the data, the better. However, there is a case for compromise when performance is a concern: if the desired data is stored in the output format, it does make a select query simpler and faster, although updates might be hell.
I would counter the case for compromise by suggesting that, with the right indexes, materialized views, and indexes on materialized views, you can get the best of both worlds: the maintainability of fully normalized data with fast queries. It does, however, require some skill and consideration to get the schema right.
So you have a relation between two tables with properties, and you have a subclass of that relation with some more properties. This is rare but possible.
Suppose in your polygamous hetero dating site one or more Woman entities has a relation with one or more Man. These two tables may be coupled with a junction table, Relationship. Now some of them are married, which you consider a special type of relationship. So Marriage is a subclass of Relationship, and the Marriage table has a reference to the id in the Relationship table.
Of course, it may be simpler to solve such situations in another way, for example to simply have two junction tables between Man and Woman. But there are certainly situations in which you would want to extend on the relationship in the junction table.
Another option would be to add a column to your TABLE2 that describes the nature of the connection between "things". For example, with a PERSON table and a RELATIONSHIPS table, you model your "objects" in the first table and the "links" in the second, e.g.
+---------+---------+-------+--------+
| link_id | from_id | to_id | type |
+---------+---------+-------+--------+
| 1 | 2 | 3 | Mother |
| 2 | 8 | 3 | Sister |
+---------+---------+-------+--------+
With appropriate indices, this means you can do things like find all relationships for a given person, or find everyone who has a sister, etc. This is a simple example, but it starts to get interesting when the from_id and to_id can be different types of object i.e. not just people.
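The two lookups mentioned above can be sketched like this, in Python with SQLite so it runs standalone. The person names, the index names, and the reading of a link as "from_id is the type of to_id" (person 8 is the Sister of person 3) are assumptions; the two link rows are the ones shown in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE relationships (
        link_id INTEGER PRIMARY KEY,
        from_id INTEGER NOT NULL REFERENCES person(id),
        to_id   INTEGER NOT NULL REFERENCES person(id),
        type    TEXT    NOT NULL
    );
    -- Indexes so lookups from either end of a link avoid a full scan.
    CREATE INDEX idx_relationships_from ON relationships (from_id);
    CREATE INDEX idx_relationships_to   ON relationships (to_id, type);

    INSERT INTO person VALUES (2, 'Ann'), (3, 'Ben'), (8, 'Cat');
    INSERT INTO relationships VALUES (1, 2, 3, 'Mother'), (2, 8, 3, 'Sister');
""")

def relationships_of(person_id):
    """Every link in which the person appears on either side."""
    return conn.execute("""
        SELECT link_id, from_id, to_id, type
        FROM relationships
        WHERE from_id = ? OR to_id = ?
        ORDER BY link_id
    """, (person_id, person_id)).fetchall()

def people_with(link_type):
    """Names of people who are the target of at least one link of this type."""
    rows = conn.execute("""
        SELECT DISTINCT p.name
        FROM relationships r
        JOIN person p ON p.id = r.to_id
        WHERE r.type = ?
        ORDER BY p.name
    """, (link_type,))
    return [row[0] for row in rows]

ben_links = relationships_of(3)
has_sister = people_with("Sister")
```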
I've used this approach in the past when working with a very generic schema that was aggregating data from a variety of other sources and had to be flexible. Clearly there's a trade-off between flexibility and things like speed and query complexity. You have to decide whether it's useful in your case.

Storing lots of user information into one MySQL table?

I'm working on a application where each user has to fill out an extensive profile for themselves.
The first part of the user profile consists of about 25 or so fields of general information
The next section of the user profile is a section where they evaluate themselves on a set list of criteria, e.g. "Rate how good you are at cooking", and then they tick a radio box from one to five. There is also a check box that they can check if they are 'extra interested' in the activity/subject they rated themselves on.
There are about 40 of these that they rate themselves on.
So my question is, how should I store this information, should there be columns in my users table for every field and item? This would be nearly 70 fields
or should I setup a table for user_profile, and user_self_evaluation, and have the columns for each in there and have a one-one relationship with the users?
Use separate tables. That way, when you update only the self-evaluation, you do not need to update the user_profile table. The idea here is to separate the frequently updated fields into another table, leaving the rarely updated ones in their own location. If the table becomes large, and the username/password is in a separate table, the performance of lookups by userid/username won't be affected by the large number of update queries, nor will you bring the whole site down if you alter the self_evaluation table.
But if you are planning to add new evaluations, I'd suggest a different design:
user_profile table with the 25 profile fields
self_evaluations table, with id and name, and any meta information about the question; with 1 record per evaluation
user_profile_evaluation with userid, evaluationid, score, extra - with one record for each evaluation of the user.
This way your schema will be much more flexible and you won't need to alter the table in order to add another evaluation.
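A minimal sketch of that three-table design, in Python with SQLite as a stand-in for MySQL. The table names and the userid, evaluationid, score and extra columns follow the list above; the display_name column, the composite primary key, the 1 to 5 check, and the helpers add_evaluation and rate are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_profile (
        userid       INTEGER PRIMARY KEY,
        display_name TEXT NOT NULL          -- stands in for the ~25 profile fields
    );
    CREATE TABLE self_evaluations (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE user_profile_evaluation (
        userid       INTEGER NOT NULL REFERENCES user_profile(userid),
        evaluationid INTEGER NOT NULL REFERENCES self_evaluations(id),
        score        INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5),
        extra        INTEGER NOT NULL DEFAULT 0,   -- 1 if 'extra interested'
        PRIMARY KEY (userid, evaluationid)
    );
    INSERT INTO user_profile VALUES (1, 'alice');
    INSERT INTO self_evaluations (name) VALUES ('cooking');
""")

def add_evaluation(name):
    """Adding a new criterion is a row insert, not an ALTER TABLE."""
    return conn.execute(
        "INSERT INTO self_evaluations (name) VALUES (?)", (name,)
    ).lastrowid

def rate(userid, evaluation, score, extra=False):
    """Insert a rating, or overwrite the user's earlier rating for that criterion."""
    conn.execute("""
        INSERT OR REPLACE INTO user_profile_evaluation (userid, evaluationid, score, extra)
        SELECT ?, id, ?, ? FROM self_evaluations WHERE name = ?
    """, (userid, score, int(extra), evaluation))

add_evaluation("singing")
rate(1, "cooking", 5, extra=True)
rate(1, "singing", 3)
rate(1, "cooking", 4)          # re-rating replaces the earlier row

ratings = conn.execute("""
    SELECT e.name, u.score, u.extra
    FROM user_profile_evaluation u
    JOIN self_evaluations e ON e.id = u.evaluationid
    WHERE u.userid = 1
    ORDER BY e.name
""").fetchall()
```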
or should I setup a table for user_profile, and user_self_evaluation,
and have the columns for each in there and have a one-one relationship
with the users?
Yes, this is how you should do it, if you know you won't expand the table in the future. The other option (one table with a column for every field) is a poor choice.
If you think you will expand the evaluations in the future, then you can do it like this:
user_self_evaluation table
user_id | evaluation_type | evaluation_value
1 | cooking | 5
1 | singing | 3
2 | cooking | 2
2 | singing | 5
Make the combined columns (user_id, evaluation_type) a unique key or the primary key, so that each user has at most one value per evaluation type.
I think the latter design is better; a table with 70 columns is unwieldy and only gets worse as you try to manage it.
When every question is multiple choice you can also add numbers in one field for each answer.
Let's say you've got four questions with 4 choices:
You could save them as 1433 in one column called answers (the first question is answer 1, the second answer 4, the third answer 3, and, last but not least, question 4 is answer 3).
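As a rough illustration of that packing, here is a small Python sketch. The function names are hypothetical, and it assumes every question has at most nine choices, since each answer must fit in one digit.

```python
def encode_answers(choices):
    """Pack one single-digit choice per question into a string such as '1433'."""
    if any(not 1 <= choice <= 9 for choice in choices):
        raise ValueError("each choice must be a single digit from 1 to 9")
    return "".join(str(choice) for choice in choices)

def decode_answers(packed):
    """Unpack the string back into a list; position i holds the answer to question i + 1."""
    return [int(digit) for digit in packed]

packed = encode_answers([1, 4, 3, 3])
unpacked = decode_answers(packed)
```

The cost is that the database can no longer filter or aggregate on an individual answer without string functions.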
Just giving you some choices here.
But if I had to choose between one-one relationship and 1 table, I would choose one one relationship because it's easier to manage later on.

MySQL Using Subquery To Get Table Name?

I'm trying to select some data from a MySQL database.
I have a table containing business details, and a separate one containing a list of trades. As we have multiple trades
business_details
id | business_name | trade_id | package_id
1 | Happy News | 12 | 1
This is the main table, contains the business name, the trade ID and the package ID
shop_trades
id | trade
1 | newsagents
This contains the trade type of the business
configuration_packages
id | name_of_trade_table
1 | shop_trades
2 | leisure_trades
This contains the name of the trade table to look in
So, basically, if I want to find the trade type (e.g., newsagent, fast food, etc) I look in the XXXX_trades table. But I first need to look up the name of XXXX from the configuration_packages table.
What I would normally do is 2 SQL queries:
SELECT business_details.*, configuration_packages.name_of_trade_table
FROM business_details, configuration_packages
WHERE business_details.package_id = configuration_packages.id
AND business_details.id = '1'
That gives me the name of the database table to look in for the trade name, so I look up the name of the table
SELECT trade FROM XXXX WHERE id='YYYY'
Where XXXX is the name of the table returned as part of the first query and YYYY is the trade ID (trade_id), again returned from the first query.
Is there a way to combine these two queries so that I only run one?
I've used subqueries before, but only on the SELECT side of the query - not the FROM side.
Typically, this is handled by a union in a single query.
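One way to read the union suggestion, sketched in Python with SQLite as a stand-in for MySQL: each trade table is tagged with its own name inside a UNION ALL, and the result is joined against name_of_trade_table. The leisure_trades rows, the second business, and the trade id 12 in shop_trades are made-up sample data, and the list of tables in the union has to be maintained by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business_details (
        id INTEGER PRIMARY KEY, business_name TEXT, trade_id INTEGER, package_id INTEGER
    );
    CREATE TABLE shop_trades    (id INTEGER PRIMARY KEY, trade TEXT);
    CREATE TABLE leisure_trades (id INTEGER PRIMARY KEY, trade TEXT);
    CREATE TABLE configuration_packages (
        id INTEGER PRIMARY KEY, name_of_trade_table TEXT
    );
    INSERT INTO configuration_packages VALUES (1, 'shop_trades'), (2, 'leisure_trades');
    INSERT INTO shop_trades    VALUES (12, 'newsagents');
    INSERT INTO leisure_trades VALUES (12, 'bowling alley');
    INSERT INTO business_details VALUES
        (1, 'Happy News', 12, 1),
        (2, 'Strike Zone', 12, 2);
""")

def trade_for(business_id):
    """Look up a business and its trade in a single query."""
    return conn.execute("""
        SELECT b.business_name, t.trade
        FROM business_details b
        JOIN configuration_packages p ON p.id = b.package_id
        JOIN (
            -- Tag every row with the table it came from.
            SELECT 'shop_trades' AS source_table, id, trade FROM shop_trades
            UNION ALL
            SELECT 'leisure_trades' AS source_table, id, trade FROM leisure_trades
        ) t ON t.source_table = p.name_of_trade_table
           AND t.id = b.trade_id
        WHERE b.id = ?
    """, (business_id,)).fetchone()

shop = trade_for(1)
leisure = trade_for(2)
```

Both businesses share trade id 12, yet each resolves through its own package's table.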
Normalization gets you to a logical model. This helps better understand the data. It is common to denormalize when implementing the model. Subtypes as you have here are commonly implemented in two ways:
Separate tables as you have, which makes retrieval difficult. This results in your question about how to retrieve the data.
A common table for all subtypes with a subtype indicator. This may result in columns which are always null for certain subtypes. It simplifies data access, and may alter the way that the subtypes are handled in code.
If the extra columns for a subtype are relatively rarely accessed, then you may use a hybrid implementation where the common columns are in the type table, and some or all of the subtype columns are in a subtype table. This is more complex to code.
That's not possible, and it sounds like a problem with your model.
Why don't you put shop_trades and leisure_trades into the same table, with one column to distinguish between the two?
If this is possible, try this
SELECT trade
FROM (SELECT 'TABLE_NAME' FROM 'INFORMATION_SCHEMA'.'TABLES'
WHERE 'TABLE_SCHEMA'='*schema name*')
WHERE id='YYYY'
UPDATE:
I think the code I have above is not possible. :|