I am wondering how to approach this problem.
We have a profile table in our database that will be populated by a process that reads
an uploaded excel document and then dumps the columns and the data in their rows into the
user's profile (the only thing common between all such excel spreadsheets is an email address).
Accordingly, we can't really predict what a given user's profile is going to look like.
How do I create my profile table(s)?
Sorry I have to create another answer, but a comment won't let me draw the ASCII sketch:
----------
| user |
----------
| id pk|
| name |
| ..... |
----------
----------------
| preference |
----------------
| user_id fk | <-- reference user.id
| header |
| value |
----------------
csv_row=1,churk,height,11,weight,500lb,width,22,...
OR csv_row=1,churk,height=11,weight=500lb,width=22,...
This will yield 1 row in the user table: user.id = 1, user.name = Churk,
and at least 3 rows in preference: {[1,height,11],[1,weight,500lb],[1,width,22]}
So when you query the DB, all you need is
SELECT * FROM user JOIN preference on preference.user_id = user.id WHERE user.name = 'Churk';
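A minimal sketch of the layout above, using Python's sqlite3 module so it can run without a server. The helper name load_profile_row and the composite primary key on preference are my own assumptions, not something the question specifies; the row format is the first csv_row variant shown above.

```python
import sqlite3

def load_profile_row(conn, csv_row):
    """Store one 'id,name,header,value,header,value,...' row as header/value records."""
    fields = csv_row.split(",")
    user_id, name, rest = int(fields[0]), fields[1], fields[2:]
    conn.execute("INSERT INTO user (id, name) VALUES (?, ?)", (user_id, name))
    # Pair up the alternating header/value cells.
    pairs = zip(rest[0::2], rest[1::2])
    conn.executemany(
        "INSERT INTO preference (user_id, header, value) VALUES (?, ?, ?)",
        [(user_id, header, value) for header, value in pairs],
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE preference (
        user_id INTEGER NOT NULL REFERENCES user(id),
        header  TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (user_id, header)  -- assumption: one value per header per user
    );
""")

load_profile_row(conn, "1,churk,height,11,weight,500lb,width,22")

# Rebuild the whole profile of one user as a dict.
profile = dict(conn.execute(
    "SELECT p.header, p.value FROM user u "
    "JOIN preference p ON p.user_id = u.id WHERE u.name = ?",
    ("churk",),
).fetchall())
print(profile)
```

Every spreadsheet column becomes a row, so a new column in an upload never requires a schema change.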
If you have some way to process this data in a programming language, rather than blindly doing mappings, then it should be pretty simple.
Use a user table with an ID and some fixed info such as name and so on.
Then add a profile table that has a user_id foreign key and a key/value pair (header / value).
How will this data be queried later? As much as I hate the practice, this may be a case of needing to store csv data in a column.
Update:
This might be a good fit for an Entity Attribute Value schema. I'm not really a fan of EAV either, but at least it's less evil than csv data in a column.
Related
We are creating the database schema for an online quiz. There will be a few types of questions asked of each user, for example:
Type 1 question. The question with an answer as yes/no.
Type 2 question. The question with an answer as yes/no. If the user selects yes, then the user has to write a few words explaining why.
Type 3 question. The question with an answer as yes/no. If the user selects yes, then the user is given a few choices to select from, e.g. Is your car accidental? If the user selects yes, then we show check boxes like engine failed, body repainted, flood damaged, tires damaged etc. The user can select one or more of these choices.
We want to design the database so that it is flexible enough to add/delete/update a particular type of question in the online quiz, and so that data can be retrieved quickly to check the answer of any given user.
We are considering following DB design:
Table User: Id | Name | Email | Mobile.
Table Question: Id | Type | Title. (This table contains list of all questions)
Table QuestionChoice: Id | QuestionId | Value. (This table contains value of choices for type 3 questions i.e. engine failed, body repainted etc.)
Table UserResponse: Id | UserId | QuestionId | IsYes | TextForYes
I have a few concerns. For example, in table "QuestionChoice", should we keep each value in a separate row, or can we save them as comma separated values in a single row?
Is this the best database design, or is there a better alternative?
Users:
id | name | Email | Mobile
Questions:
id | title | type
Options:
id | questionId | value
one [questionId] to many [value].
Depending on how much data you have, and if a choice [value] can be used for multiple questions, you could create a table
Values
id | name
and make options as
id | questionId | valueId
userAnswers:
id | userId | questionId | choice | optionId | input
Either keep all three in one table, or make one table for each kind of questions.
[or even one table for input, another for choice/option]
Wouldn't you also want a table to "link" your questions?
[or that part handled with code (?)]
linkQuestions: [from and to are questionId, choice is yes/no]
from | to | optionId | choice
And for your concerns:
Please try to avoid saving data as comma separated values in a single row.
It is bad practice, for "good" reasons.
[i.e. managing your data will be a pain; SQL is relational and easier to use that way]
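To illustrate the one-row-per-choice point, here is a small sketch in Python with sqlite3 (not MySQL, but the SQL is the same in spirit). The table and column names follow the Questions/Options layout above; the helper choices_for is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT, type INTEGER);
    CREATE TABLE options (
        id         INTEGER PRIMARY KEY,
        questionId INTEGER NOT NULL REFERENCES questions(id),
        value      TEXT NOT NULL
    );
    INSERT INTO questions VALUES (1, 'Is your car accidental?', 3);
    -- one row per choice, never a comma separated list
    INSERT INTO options (questionId, value) VALUES
        (1, 'engine failed'), (1, 'body repainted'),
        (1, 'flood damaged'), (1, 'tires damaged');
""")

def choices_for(conn, question_id):
    """Return the choices of one question in insertion order."""
    rows = conn.execute(
        "SELECT value FROM options WHERE questionId = ? ORDER BY id",
        (question_id,),
    )
    return [row[0] for row in rows]

# Removing a single choice is a plain DELETE; with a comma separated
# column it would mean reading, editing and rewriting a string.
conn.execute(
    "DELETE FROM options WHERE questionId = ? AND value = ?",
    (1, "flood damaged"),
)
remaining = choices_for(conn, 1)
print(remaining)
```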
I'm designing a database (MySQL) that will manage a fleet of vehicles.
The company has many garages across the city; at each garage, vehicles get serviced (an operation). An operation can be any of 3 types of service.
Table Vehicle, Table Garage, Table Operation, Table Operation Type 1, Table Operation Type 2, Table Operation Type 3.
Each Operation has the vehicle ID and garage ID, but how do I link it to the other tables (service tables) depending on which type of service the user chooses?
I would also like to add a billing table, but I'm lost at how to design the relationship between these tables.
If I have fully understood it I would suggest something like this (first of all you shouldn't have three operation tables):
Vehicles Table
- id
- garage_id
Garages Table
- id
Operations/Services Table
- id
- vehicle_id
- garage_id
- type
Customer Table
- id
- service_id
billings Table
- id
- customer_id
You need six tables:
vehicle: id, ...
garage: id, ...
operation: id, vehicle_id, garage_id, operation_type (which can be one of the three options/operations available, with the possibility to be extended)
customer: id, ...
billing: id, customer_id, total_amount
billingoperation: id, billing_id, operation_id, item_amount
You definitely should not create three tables for operations. If you want to introduce a new operation in the future, that would mean creating a new table in the database.
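A rough sketch of how billingoperation ties a bill to the operations it covers, written in Python with sqlite3. The operation type names, the sample amounts and the choice to store amounts as integer cents are my assumptions for illustration; here the bill total is computed from the line items rather than read from billing.total_amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operation (
        id             INTEGER PRIMARY KEY,
        vehicle_id     INTEGER NOT NULL,
        garage_id      INTEGER NOT NULL,
        operation_type TEXT NOT NULL       -- a new type is a new value, not a new table
    );
    CREATE TABLE billing (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    CREATE TABLE billingoperation (
        id           INTEGER PRIMARY KEY,
        billing_id   INTEGER NOT NULL REFERENCES billing(id),
        operation_id INTEGER NOT NULL REFERENCES operation(id),
        item_amount  INTEGER NOT NULL      -- assumption: amounts kept in cents
    );
    -- hypothetical sample data
    INSERT INTO operation VALUES (1, 10, 1, 'oil_change'), (2, 10, 1, 'tire_rotation');
    INSERT INTO billing VALUES (1, 7);
    INSERT INTO billingoperation (billing_id, operation_id, item_amount)
        VALUES (1, 1, 4500), (1, 2, 2000);
""")

# The bill total is the sum of its line items.
total = conn.execute(
    "SELECT SUM(item_amount) FROM billingoperation WHERE billing_id = ?",
    (1,),
).fetchone()[0]
print(total)
```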
For the record, I disagree with everyone who is saying you shouldn't have multiple operation tables. I think that's perfectly fine, as long as it is done properly. In fact, I'm doing that with one of my products right now.
If I understand, at the core of your question, you're asking how to do table inheritance, because Op Type 1 and Op Type 2 (etc.) IS A Operation. The short answer is that you can't. The longer answer is that you can't...at least not without some helper logic.
I assume you have some sort of program that will pull data from the database, rather than you just writing sql commands by hand. Working under that assumption, let's use this as a subset of your database:
Garage
------
GarageId | GarageLocation | etc.
---------|----------------|------
1 | 123 Main St. | XX
Operation
---------
OperationId | GarageId | TimeStarted | TimeEnded | OperationTypeDescId | OperationTypeId
------------|----------|-------------|-----------|---------------------|----------------
2 | 1 | noon | NULL | 2 | 2
OperationTypeDesc
-------------
OperationTypeDescId | Name | Description
--------------------|-------|-------------------------
1 | OpFoo | Do things with the stuff
2 | OpBar | Do stuff with the things
OpFoo
-----
OpID | Thing1 | Thing2
-----|--------|-------
1 | 123 | abc
OpBar
-----
OpID | Stuff1 | Stuff2
-----|--------|-------
1 | 456 | def
2 | 789 | ghi
Using this setup, you have the following information:
A garage has its information, plain and simple.
An operation has a unique ID (OperationId), a garage where it was executed, an ID referencing the description of the operation, and the OperationType ID (more on this in a moment).
A pre-populated table of operation types. Each type has a unique ID (OperationTypeDescId), the name of the operation, and a human-readable description of what that operation is.
One table for each row in OperationTypeDesc. For convenience, the table name should be the same as the Name column.
Now we can begin to see where inheritance comes into play. In the operation table, the OperationTypeId references the OpId of the relevant table...the "relevant table" is determined by the OperationTypeDescId.
An example: Let's say we had the above data set. In this example we know that there is an operation happening in a garage at 123 Main St. We know it started at noon, and has not yet ended. We know the type of operation is "OpBar". Since we know we're doing an OpBar operation instead of an OpFoo operation, we can focus on only the OpBar-relevant attributes, namely Stuff1 and Stuff2. Since the Operation's OperationTypeId is 2, we know that Stuff1 is 789 and Stuff2 is ghi.
Now the tricky part. In your program, this is going to require Reflection. If you don't know what that is, it's the practice of getting a Type from the NAME of that type. In our example, we know what table to look at (OpBar) because of its name in the OperationTypeDesc table. Put another way, you don't automatically know what table to look in; reflection tells you that information.
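Here is one way the name-based lookup could look, sketched in Python with sqlite3 rather than with a reflection API. The function load_details is hypothetical, and I left out the TimeStarted/TimeEnded columns to keep it short. Since a table name cannot be passed as a bound parameter, the sketch checks the name against the tables that actually exist before putting it into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OperationTypeDesc (
        OperationTypeDescId INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
    CREATE TABLE Operation (
        OperationId INTEGER PRIMARY KEY, GarageId INTEGER,
        OperationTypeDescId INTEGER, OperationTypeId INTEGER);
    CREATE TABLE OpFoo (OpID INTEGER PRIMARY KEY, Thing1 INTEGER, Thing2 TEXT);
    CREATE TABLE OpBar (OpID INTEGER PRIMARY KEY, Stuff1 INTEGER, Stuff2 TEXT);
    INSERT INTO OperationTypeDesc VALUES
        (1, 'OpFoo', 'Do things with the stuff'),
        (2, 'OpBar', 'Do stuff with the things');
    INSERT INTO Operation VALUES (2, 1, 2, 2);
    INSERT INTO OpFoo VALUES (1, 123, 'abc');
    INSERT INTO OpBar VALUES (1, 456, 'def'), (2, 789, 'ghi');
""")

def load_details(conn, operation_id):
    """Return (type name, type-specific row) for one operation."""
    row = conn.execute(
        "SELECT d.Name, o.OperationTypeId FROM Operation o "
        "JOIN OperationTypeDesc d "
        "  ON d.OperationTypeDescId = o.OperationTypeDescId "
        "WHERE o.OperationId = ?",
        (operation_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no operation {operation_id}")
    name, type_id = row
    # The Name column must match an existing table; enforce that here,
    # because the database itself cannot easily do it.
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if name not in tables:
        raise ValueError(f"no sub-table named {name!r}")
    cur = conn.execute(f'SELECT * FROM "{name}" WHERE OpID = ?', (type_id,))
    columns = [col[0] for col in cur.description]
    return name, dict(zip(columns, cur.fetchone()))

details = load_details(conn, 2)
print(details)
```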
Edit:
Csaba says "In the future if you would like to introduce a new operation that would involve creating a new table in the database". That is correct. You would also need to add a new row to the OperationTypeDesc table. Csaba implies this is a bad thing, and I disagree - with a few provisions. If you are going to be adding a new operation type frequently, then yes, he makes a very good point. You don't want to be creating new tables constantly. If, however, you know ahead of time what types of operations will be performed, and will very rarely add new types of operations, then I maintain this is the way to go. All of your info common to all operations goes in the Operation table, and all op-specific info goes into the relevant "sub-table".
There is one more very important note regarding this. Because of how this is designed, you, the human, must be aware of the design. Whenever you create a new operation type, it's not as simple as creating the new table. Specifically, you have to make sure that the new table name and the OperationTypeDesc "Name" entry are the same. Think of it as an extra constraint - an "INTEGER" column can only contain ints, otherwise the db won't allow the data. In the same manner, the "Name" column can only contain the name of an existing table. You the human must be aware of that constraint, because it cannot be (easily) automatically enforced.
Sorry if my question seems unclear, I'll try to explain.
I have a column in a row, for example /1/3/5/8/42/239/, and I would like to find a similar one where there are as many corresponding "ids" as possible.
Example:
| My Column |
#1 | /1/3/7/2/4/ |
#2 | /1/5/7/2/4/ |
#3 | /1/3/6/8/4/ |
Now, by running the query on #1 I would like to get row #2 as it's the most similar. Is there any way to do it, or is it just my fantasy? Thanks for your time.
EDIT:
As suggested I'm expanding my question. This column represents the favourite artists of a user on a music site. I'm searching them like this: MyColumn LIKE '%/ID/%', and I remove one by replacing /ID/ with /
Since you did not provide much info about your data, I have to fill the gaps with my guesses.
So you have a users table
users table
-----------
id
name
other_stuff
And you like to store which artists are favorites of a user. So you must have an artists table
artists table
-------------
id
name
other_stuff
And to relate you can add another table called favorites
favorites table
---------------
user_id
artist_id
In that table you add a record for every artist that a user likes.
Example data
users
id | name
1 | tom
2 | john
artists
id | name
1 | michael jackson
2 | madonna
3 | deep purple
favorites
user_id | artist_id
1 | 1
1 | 3
2 | 2
To select the favorites of user tom for instance you can do
select a.name
from artists a
join favorites f on f.artist_id = a.id
join users u on f.user_id = u.id
where u.name = 'tom'
And if you add proper indexing to your table then this is really fast!
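With a favorites table, the "most similar row" from the question becomes a self-join that counts shared artists. A minimal sketch in Python with sqlite3, loading the three example rows from the question as users 1 to 3; the function name most_similar and the tie-break on the lower user_id are my own choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE favorites (
        user_id   INTEGER NOT NULL,
        artist_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, artist_id)
    )
""")

# The question's rows #1..#3, one (user_id, artist_id) record per id.
rows = {1: [1, 3, 7, 2, 4], 2: [1, 5, 7, 2, 4], 3: [1, 3, 6, 8, 4]}
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [(user, artist) for user, artists in rows.items() for artist in artists],
)

def most_similar(conn, user_id):
    """Return (other_user_id, shared_count) for the closest match, or None."""
    return conn.execute("""
        SELECT other.user_id, COUNT(*) AS shared
        FROM favorites AS mine
        JOIN favorites AS other
          ON other.artist_id = mine.artist_id
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY other.user_id
        ORDER BY shared DESC, other.user_id
        LIMIT 1
    """, (user_id,)).fetchone()

print(most_similar(conn, 1))
```

Removing a favorite is then a single DELETE on (user_id, artist_id) instead of a string replace.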
Problem is you're storing this in a really, really awkward way.
I'm guessing you have to deal with an arbitrary number of values. You have two options:
1. Store the multiple IDs in a blob object in JSON format. MySQL versions before 5.7.8 have no built-in JSON functions, but there are user-defined functions that will extract values for you, etc.
See: http://blog.ulf-wendel.de/2013/mysql-5-7-sql-functions-for-json-udf/
Alternatively, switch to PostgreSQL.
2. Add as many columns to your table as the maximum number of IDs you expect to have. So if /1/3/7/2/4/8/ is the longest entry, have 6 columns in your table. The reason this is bad: you'll have sparse columns that will unnecessarily slow your tables.
I'm sure you could write some horrific regex to accomplish the task, but I caution against using complex regexes on enormous tables.
Okay... I am working to create a mobile app that allows two groups of users to do two different things.
Essentially, the goal of the project is this:
Group A users: create account/pswd and can enter THEIR data into the database and/or change THEIR existing data (but ONLY their data)
Group B users: can SEARCH the database for information that is inserted by Group A. Down the track I'd like to set it up so that they can create a user account so they can also SAVE key information to THEIR account for faster recall (so they don't have to look up the info they search for regularly) -- but that is down the track.
I have a relational database set up using the MySQL that is available with my web-hosting account (it seemed to be the easiest way to go).
I'm just trying to work out how to handle the user account creation/authentication bit, because each group should ONLY be able to CHANGE/INSERT data to their own account, but can search for information submitted by anyone else.
Thanks in advance.
Use MySQL's facilities to manage permissions: roles, users and privileges.
Have a look through the official MySQL documentation (e.g. http://dev.mysql.com/doc/workbench/en/wb-adding-roles.html).
You can create two roles: groupA that can INSERT/SELECT/UPDATE one set of tables, groupB that can do the same but in another set of tables.
You can assign INSERT privilege in just the table you want, but SELECT privileges on all the tables.
Hope this info brings you some light...
Firstly this sounds like a huge project, I am sure there are frameworks out there that can do this for you. However, if you are trying to do this on your own continue reading.
This can be done several ways. I will try to be as detailed as possible. This requires SQL as well as application development/Software engineering knowledge.
Step 1: Setup your database
You will need the following tables. All ids are auto-incremented primary keys; the other fields can be varchar, except fields that have date in their name.
sessions [id, uid, random_token, datecreated]
resourcescope [rid, name]
user [uid, first, last, email, username, salted_pwd]
user_type [id, name, description]
user_resourcescope [id, uid, rid] //lookup table between userid and resourcescope
I prefer using Java or python because you can use dependency injection or decorators. As a result, you don't have to write a lot of code when checking if a user has access.
Putting it all into practice.
1. When a user signs up, you save them into a user database. Depending on the user type, you give them different permissions. Next, you save the user permissions inside the user_resourcescope table.
You should now have the following.
User Table
UID | first | last | email | username | salted_pwd | usertype
1 | james | iri | example#isp.com | jiri1928 | klasdjf8$kljs | 1
UserType table
usertype_id | Name
1 | Basic users
2 | Searcher
ResourceScope Table
rid | Name
1 | FindContent
2 | CreateContent
3 | DeleteContent
User_Resourcescope
id | uid | rid
1 | 1 | 1
2 | 1 | 3
Session
id | uid | random_token | datecreated
1 | 1 | ldkjfald882u3u | 1391274870322
Each resource represents a request within the system. For example,
http://api.myapi.com/content/add - This would be associated with the ResourceScope CreateContent
http://api.myapi.com/content/delete - This would be associated with the ResourceScope DeleteContent
http://api.myapi.com/content/search - This would be associated with the ResourceScope FindContent
When someone tries to create content, you check that their credentials are correct by validating their session information, and you check that they have the correct permission by looking in the User_Resourcescope table.
To prevent users from deleting content that is not theirs, add a creator field to the content table and store the user id associated with the content. If someone tries to delete content, you can check their user id against the creator field.
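The two checks described above (scope via User_Resourcescope, ownership via the creator field) could be sketched like this in Python with sqlite3. The function names user_for_scope and delete_content and the content rows are hypothetical, and a real version would also expire sessions based on datecreated, which this sketch ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, uid INTEGER,
                           random_token TEXT, datecreated INTEGER);
    CREATE TABLE resourcescope (rid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_resourcescope (id INTEGER PRIMARY KEY, uid INTEGER, rid INTEGER);
    CREATE TABLE content (id INTEGER PRIMARY KEY, creator INTEGER, body TEXT);
    INSERT INTO sessions VALUES (1, 1, 'ldkjfald882u3u', 1391274870322);
    INSERT INTO resourcescope VALUES
        (1, 'FindContent'), (2, 'CreateContent'), (3, 'DeleteContent');
    INSERT INTO user_resourcescope VALUES (1, 1, 1), (2, 1, 3);
    -- hypothetical content: row 1 belongs to user 1, row 2 to user 2
    INSERT INTO content VALUES (1, 1, 'mine'), (2, 2, 'theirs');
""")

def user_for_scope(conn, token, scope):
    """Return the uid behind a session token if that user holds the scope."""
    row = conn.execute("""
        SELECT s.uid
        FROM sessions s
        JOIN user_resourcescope ur ON ur.uid = s.uid
        JOIN resourcescope r ON r.rid = ur.rid
        WHERE s.random_token = ? AND r.name = ?
    """, (token, scope)).fetchone()
    return row[0] if row else None

def delete_content(conn, token, content_id):
    """Delete a content row only if the caller may delete AND created it."""
    uid = user_for_scope(conn, token, "DeleteContent")
    if uid is None:
        return False
    cur = conn.execute(
        "DELETE FROM content WHERE id = ? AND creator = ?",
        (content_id, uid),
    )
    return cur.rowcount == 1

print(delete_content(conn, "ldkjfald882u3u", 1))  # own content
print(delete_content(conn, "ldkjfald882u3u", 2))  # someone else's content
```

Putting the creator check into the WHERE clause keeps the ownership test and the delete in one statement.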
I have 6 tables. These are simplified for this example.
user_items
ID | user_id | item_name | version
-------------------------------------
1 | 123 | test | 1
data
ID | name | version | info
----------------------------
1 | test | 1 | info
data_emails
ID | name | version | email_id
------------------------
1 | test | 1 | 1
2 | test | 1 | 2
emails
ID | email
-------------------
1 | email#address.com
2 | second#email.com
data_ips
ID | name | version | ip_id
----------------------------
1 | test | 1 | 1
2 | test | 1 | 2
ips
ID | ip
--------
1 | 1.2.3.4
2 | 2.3.4.5
What I am looking to achieve is the following.
The user (123) has the item with name 'test'. This is the basic information we need for a given entry.
There is data in our 'data' table and the current version is 1; as such, the version in our user_items table is also 1. The two tables are linked together by the name and version. The setup is like this because a user could have an item for which we don't have data; likewise there could be an item for which we have data but which no user owns.
For each item there are also 0 or more emails and ips associated. These can be the same for many items so rather than duplicate the actual email varchar over and over we have the data_emails and data_ips tables which link to the emails and ips table respectively based on the email_id/ip_id and the respective ID columns.
The emails and ips are associated with the data version again through the item name and version number.
My first query: is this a good/well optimized database setup?
My next query, and my main question, is about joining this complex data structure.
What I had was:
PHP
- get all the user items
- loop through them and get the most recent data entry (if any)
- if there is one get the respective emails
- get the respective ips
Does that count as 3 queries or essentially infinite depending on the number of user items?
I was made to believe that the above was inefficient and as such I wanted to condense my setup into using one query to get the same data.
I have achieved that with the following code
SELECT user_items.name,GROUP_CONCAT( emails.email SEPARATOR ',' ) as emails, x.ip
FROM user_items
JOIN data AS data ON (data.name = user_items.name AND data.version = user_items.version)
LEFT JOIN data_emails AS data_emails ON (data_emails.name = user_items.name AND data_emails.version = user_items.version)
LEFT JOIN emails AS emails ON (data_emails.email_id = emails.ID)
LEFT JOIN
(SELECT name,version,GROUP_CONCAT( the_ips.ip SEPARATOR ',' ) as ip FROM data_ips
LEFT JOIN ips as the_ips ON data_ips.ip_id = the_ips.ID )
x ON (x.name = data.name AND x.version = user_items.version)
I have done loads of reading to get to this point and worked tirelessly to get here.
This works as I require - this question seeks to clarify what are the benefits of using this instead?
I have had to use a subquery (I believe?) to get the ips as previously it was multiplying results (I believe based on the complex joins). How this subquery works I suppose is my main confusion.
Summary of questions.
-Is my database setup well setup for my usage? Any improvements would be appreciated. And any useful resources to help me expand my knowledge would be great.
-How does the subquery in my sql actually work - what is the query doing?
-Am I correct to keep using left joins? I want to return the user item, and null values if applicable to the right.
-Am I essentially replacing a potentially infinite number of queries with 2? Does this make a REAL difference? Can the above be improved?
-Given that when I update a version of an item in my data table I now have to update the version in the user_items table, I now have a few more update queries to do. Is the tradeoff of this setup in practice worthwhile?
Thanks to anyone who contributes to helping me get a better grasp of this !!
Given your data layout and your objective, the query is correct. If you've only got a small amount of data it shouldn't be a performance problem - that will change quickly as the amount of data grows. However, when you have a large amount of data there are very few circumstances where you should ever see all your data in one go, which implies that the results will be filtered in some way. Exactly how they are filtered has a huge impact on the structure of the query.
How does the subquery in my sql actually work
Currently it doesn't work properly - there is no GROUP BY
Is the tradeoff of this setup in practice worthwhile?
No - it implies that your schema is too normalized.
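On the missing GROUP BY: one possible shape for the query is to aggregate emails and ips each in their own grouped subquery, so that neither join multiplies the rows of the other. This is a sketch in Python with sqlite3, not the asker's MySQL; SQLite writes GROUP_CONCAT(x, ',') where MySQL writes GROUP_CONCAT(x SEPARATOR ','). I used the column name user_items.name as in the question's query, and the second item 'other' is hypothetical data added to show the NULL case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_items (ID INTEGER PRIMARY KEY, user_id INTEGER, name TEXT, version INTEGER);
    CREATE TABLE data (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, info TEXT);
    CREATE TABLE emails (ID INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE data_emails (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, email_id INTEGER);
    CREATE TABLE ips (ID INTEGER PRIMARY KEY, ip TEXT);
    CREATE TABLE data_ips (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, ip_id INTEGER);
    INSERT INTO user_items VALUES (1, 123, 'test', 1), (2, 123, 'other', 1);
    INSERT INTO data VALUES (1, 'test', 1, 'info'), (2, 'other', 1, 'info');
    INSERT INTO emails VALUES (1, 'email#address.com'), (2, 'second#email.com');
    INSERT INTO data_emails VALUES (1, 'test', 1, 1), (2, 'test', 1, 2);
    INSERT INTO ips VALUES (1, '1.2.3.4'), (2, '2.3.4.5');
    INSERT INTO data_ips VALUES (1, 'test', 1, 1), (2, 'test', 1, 2);
""")

QUERY = """
    SELECT ui.name, e.emails, i.ips
    FROM user_items ui
    JOIN data d ON d.name = ui.name AND d.version = ui.version
    LEFT JOIN (
        -- one row per (name, version) with all its emails
        SELECT de.name, de.version, GROUP_CONCAT(em.email, ',') AS emails
        FROM data_emails de
        JOIN emails em ON em.ID = de.email_id
        GROUP BY de.name, de.version
    ) e ON e.name = ui.name AND e.version = ui.version
    LEFT JOIN (
        -- one row per (name, version) with all its ips
        SELECT di.name, di.version, GROUP_CONCAT(addr.ip, ',') AS ips
        FROM data_ips di
        JOIN ips addr ON addr.ID = di.ip_id
        GROUP BY di.name, di.version
    ) i ON i.name = ui.name AND i.version = ui.version
    WHERE ui.user_id = ?
    ORDER BY ui.ID
"""

results = conn.execute(QUERY, (123,)).fetchall()
for row in results:
    print(row)
```

Each user item comes back exactly once, with NULL in the emails and ips columns when nothing is attached to it.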