Use varbinary as foreign key - sql-server-2008

Is it ok to use data type varbinary for foreign keys?
Why?
I have an EvalAnswer table with a FK to a Score table.
The score is sensitive and should be encrypted. The encryption/decryption happens in the ASP.NET (4.0) project, not in SQL Server (2008), so the data type needs to be varbinary.
EDIT: more info
Of course.
I have these columns: Id, Score, ScoreText, Description, Index.
The Id is an incremental counter (PK).
The Score is the score as a number (such as 1).
The ScoreText is the score as a letter (score 1 equals letter A).
The Description is a comment for every score.
One reason I have it like this is that there are special situations,
such as one question that is scored only 1-4 while the rest are scored 1-5.
So every question has a score 1, but its description differs from another question's score 1.
So if I have 5 questions, this gives 5*5 rows in the Score table (all with different descriptions).
On page load I get the correct scoring (with description) for every dropdownlist, normally 1-5.
But once the user has saved the scoring, I need to know the previously saved score for every question on page load.
Therefore I have a relation between EvalAnswer and the scoring.
There are questions whose relation to the Score table is NOT sensitive.
But some are, and for those I need to hide the relation between EvalAnswer and Score.
What might be a bad design is the fact that I use the same table (the Score table)
both to show the available scoring for every question
and to hold what the user has chosen (this is the FK from EvalAnswer to Score).
Please advise.

I suggest adding an Id int column to the Score table and referencing this field from the EvalAnswer table.
This means your table scripts will change to:
CREATE TABLE Score (
    Id int identity not null primary key
    , Code varbinary(max) -- the new field containing the encrypted information
    , Score int
    , ScoreText varchar(5)
    , Description varchar(max)
    , [Index] int -- Index is a reserved word, so it needs brackets
)
CREATE TABLE EvalAnswer (
    Id int identity not null,
    ScoreId int not null references Score(Id)
    ...
)
As you can see, the "old" Id field has now become the Code field; the new Id field is an identity column containing a unique number.
There is nothing against using a varbinary column in a foreign key, but it will make querying and debugging much harder.
Also note there is a 900-byte limit on the width of an index key, which is easy to hit when storing an encrypted value (and a varbinary(max) column cannot be used as an index key at all).
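For completeness, here is a sketch of the page-load lookup against this schema (the column aliases and the @EvalAnswerId parameter are illustrative, not from the original post):
SELECT ea.Id AS EvalAnswerId,
       s.Code AS EncryptedScore, -- decrypted in the ASP.NET layer
       s.Description
FROM EvalAnswer ea
JOIN Score s ON s.Id = ea.ScoreId
WHERE ea.Id = @EvalAnswerId
Joining on the int ScoreId keeps the index narrow and the relationship easy to debug, while the encrypted value only ever appears in the Code column.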

Related

Database design for multiple models?

I have this design.
Table models:
id - primary key
title - varchar(256)
Table model_instances:
id - primary key
model_id - foreign key to models.id
title - varchar(256)
Table model_fields:
id - pk
model_id - foreign key to models.id
instance_id - foreign key to model_instances.id
title - name of the field
type - enum [text, checkbox, radio, select, 'etc']
Table model_field_values:
instance_id - foreign key to model_instances.id
field_id - foreign key to model_fields.id
value - text
Also, there can be many values for some fields (like a multiple-select dropdown).
The problem is: value is always a text field, because I want to store different types of data (text, datetime, integer) and this table contains all values for all instances of all models.
For example, if I have 10 models and every model has 1000 instances with 10 fields, then model_field_values would (at minimum) contain 100,000 rows; if some fields are multiple, it would contain 120,000-150,000 rows.
A SQL select filtering on the value field would be slow.
Solution 1:
For every model, create a new model_field_values table, like:
model.id = 1, model_field_values_1
...
model.id = 10, model_field_values_10
Solution 2:
Because model_fields contains all the fields for a model, we can create model_field_values like this:
model_fields for model.id=1 (by primary key): 1 - text, 2 - integer, 3 - datetime, 4 - smalltext
Fields for model_field_values_1: field_1 text, field_2 integer, field_3 datetime, field_4 varchar(256)
This solution is not good for fields with multiple values, because every multiple value would need another table with a link to the row in model_field_values_1, but it is good for searching through the database because MySQL would use native datatypes in WHERE clauses (not text fields).
Maybe I am missing something? Maybe there is a better design?
This database would be used in a CRM system, where the user can create different models with many instances of these models, so I cannot preconfigure all tables with all columns.
Note: 200,000 rows (two tenths of a megarow) is, in the usual operation of MySQL, a medium sized table. It's generally possible to index such a table fairly efficiently. http://use-the-index-luke.com/
That being said, I think I understand your problem. It is, in the jargon of object-oriented design, polymorphism.
You have this model_field_value table, containing
instance_id
field_id
value
Your problem is, the value's native data type is sometimes VARCHAR(255), sometimes DATETIME or maybe TIMESTAMP, and sometimes INT.
And you'll sometimes need to do queries like this one
SELECT fv.instance_id
FROM model_field_value fv
WHERE fv.field_id = something
AND fv.value >= '2017-01-01'
AND fv.value < '2018-01-01'
to find DATETIME values that happened in calendar year 2017. For example.
This is generally a pain in the neck with key/value storage like what you need. For a query like my example to be sargable, you need to be able to put an index on a DATETIME column. But if you don't have such a column, you can't index it. Duh.
Here's a suggestion. Give your table these columns.
instance_id INT pk fk
field_id INT pk fk
value VARCHAR(255) a text representation of every value.
value_double DOUBLE a numeric representation of every numeric value, or NULL
value_ts TIMESTAMP a timestamp value if possible, or NULL
This table will contain redundant data, and you'll have to be very careful when you're writing it to make sure it's correct. But you will be able to put indexes on the value_ts and value_double columns, so you can make those kinds of queries sargable.
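A minimal sketch of such a table, assuming MySQL and the column names above (the index names are made up for illustration):
CREATE TABLE model_field_value (
    instance_id INT NOT NULL,
    field_id INT NOT NULL,
    value VARCHAR(255) NOT NULL,   -- text representation of every value
    value_double DOUBLE NULL,      -- numeric representation, or NULL
    value_ts TIMESTAMP NULL,       -- timestamp representation, or NULL
    PRIMARY KEY (instance_id, field_id),
    KEY ix_field_double (field_id, value_double),
    KEY ix_field_ts (field_id, value_ts)
);
With those composite indexes, the calendar-year query above can filter on field_id and range-scan value_ts instead of comparing strings.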
Just an idea.

How to store a data whose type can be numeric, date or string in mysql

We're developing a monitoring system. In our system, values are reported by agents running on different servers. The observations reported can be values like:
A numeric value, e.g. "CPU USAGE" = 55 (meaning 55% of the CPU is in use).
A certain event was fired, e.g. "Backup completed".
A status, e.g. SQL Server is offline.
We want to store these observations (which are not known in advance and will be added to the system dynamically, without recompiling).
We are considering adding different columns to the observations table like this:
IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)
So if the value we wish to store is a number, we can use IntMeasure or FloatMeasure according to the type. If the value is a status, we can store the status literal string (or a status id if we decide to add a Statuses(id, name) table).
We suppose a more correct design is possible, but wouldn't it become slow and obscure due to the joins and dynamic table names depending on the types? How would a join work if we can't specify the tables in the query in advance?
I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.
If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.
The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.
With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes and can be things like the value of a measurement, the notification of an event and the change of a status, among others and with the possibility of future observations being added.
This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:
create table Observations(
...,
ObservationKind char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),
...
);
But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.
So have the kinds of observations in their own lookup table:
ID Name Meaning
== =========== =======
M Measurement The value of some system metric (CPU_Usage).
E Event An event has been detected.
S Status A change in a status has been detected.
(The char field could just as well be int or smallint. I use char here for illustration.)
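In other words, the lookup table itself could be as simple as this sketch (the Meaning column length is an assumption):
create table ObservationKinds(
    ID char( 1 ) primary key,
    Name varchar( 32 ) not null,
    Meaning varchar( 255 )
);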
Then fill out the Observations table with a PK and the attributes that would be common to all observations.
create table Observations(
ID int identity primary key,
ObservationKind char( 1 ) not null,
DateEntered date not null,
...,
constraint FK_ObservationKind foreign key( ObservationKind )
references ObservationKinds( ID ),
constraint UQ_ObservationIDKind unique( ID, ObservationKind )
);
It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.
Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not the data type.
create table Measurements(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'M' ),
Name varchar( 32 ) not null, -- Such as "CPU Usage"
Value double not null, -- such as 55.00
..., -- other attributes of Measurement observations
constraint PK_Measurements primary key( ID, ObservationKind ),
constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);
The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.
Let's examine an example tuple that may exist in the Measurements table:
ID ObservationKind Name Value ...
==== =============== ========= =====
1001 M CPU Usage 55.0 ...
In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind value of 'M'. No other entry with an ID value of 1001 can exist in either the Observations table or the Measurements table and cannot exist at all in any other of the "kind" tables (Events, Status). This works the same way for all the kind tables.
I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:
create view MeasurementObservations as
select ...
from Observations o
join Measurements m
on m.ID = o.ID;
Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.
Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:
F Fault A fault or error has been detected.
Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).
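As a sketch of that step, following the same pattern (the ErrorCode and Message attributes are made up for illustration):
insert into ObservationKinds( ID, Name, Meaning )
    values( 'F', 'Fault', 'A fault or error has been detected.' );
create table Faults(
    ID int not null,
    ObservationKind char( 1 ) check( ObservationKind = 'F' ),
    ErrorCode int not null, -- illustrative attribute
    Message varchar( 255 ), -- illustrative attribute
    constraint PK_Faults primary key( ID, ObservationKind ),
    constraint FK_Faults_Observations foreign key( ID, ObservationKind )
        references Observations( ID, ObservationKind )
);
create view FaultObservations as
select o.ID, o.DateEntered, f.ErrorCode, f.Message
from Observations o
join Faults f
    on f.ID = o.ID;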
Just create it as a VARCHAR
This will allow you to store whatever data you require in it. However, it makes it much more difficult to run queries based on the number in the field, such as:
Select * from table where MyVARCHARField > 50 -- get CPU > 50
So if you think you will want to do this, then you either need a field per item or a generalised table such as:
Create Table GeneralisedValues ( -- table name is a placeholder
    Description VARCHAR(255),
    ValueType VARCHAR(10), -- can be 'String', 'Float' or 'Int'
    ValueString VARCHAR(255),
    ValueFloat FLOAT,
    ValueInt INT
)
Then when you are filling in the data you can put each value in the correct field and select like this:
Select Description, ValueInt from GeneralisedValues where Description like '%cpu%' and ValueInt > 50
I used two columns for a similar problem. The first column was for the data type and the second contained the data as a varchar.
The first column held codes (e.g. 1 = integer, 2 = string, 3 = date and so on), which could be combined with the value to compare values (e.g. find the max integer where type = 1).
I did not have joins, but I think you can use this approach. It will also help you if more data types are introduced tomorrow.
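A rough sketch of that two-column approach in MySQL (the table and column names here are placeholders, not from the answer above):
CREATE TABLE observation_value (
    id INT AUTO_INCREMENT PRIMARY KEY,
    value_type TINYINT NOT NULL, -- 1 = integer, 2 = string, 3 = date, ...
    value VARCHAR(255) NOT NULL
);
-- "find the max integer where type = 1": cast the text back to its real type
SELECT MAX(CAST(value AS SIGNED))
FROM observation_value
WHERE value_type = 1;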

Best Practice: find row for unique id from multiple tables

Our database contains 5+ tables:
user
----------
user_id (PK) int NOT NULL
name varchar(50) NOT NULL
photo
--------
photo_id (PK) int NOT NULL
user_id (FK) int NOT NULL
title varchar(50) NOT NULL
comment
-------
comment_id (PK) int NOT NULL
photo_id int NOT NULL
user_id int NOT NULL
message varchar(50) NOT NULL
All primary key ids are unique ids.
All data are linked to http://domain.com/{primary_key_id}.
After a user visits the link with an id, which is unique across all tables,
how should I find out which table this id belongs to?
solution 1
select user_id from user where user_id = {primary_key_id}
-- if not found, then move on to the next table
select photo_id from photo where photo_id = {primary_key_id}
... continue on, until we find which table this primary key belongs to.
solution 2
Create an object table to hold all the unique ids and their data types.
Create an AFTER INSERT trigger on all the tables to add a row to the object table with the type of the table the row was inserted into.
When required, do a select statement to find the table name the id belongs to.
The second solution means a double insert: one insert of the row with its complete data into the actual table, and a second insert of the unique id and table name into the object table created in step 1.
select type from object_table where id = {primary_key_id}
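For illustration, solution 2 could look roughly like this in MySQL (object_table and the trigger name are placeholders):
CREATE TABLE object_table (
    id INT NOT NULL PRIMARY KEY,
    type VARCHAR(20) NOT NULL -- 'user', 'photo', 'comment', ...
);
CREATE TRIGGER photo_after_insert
AFTER INSERT ON photo
FOR EACH ROW
    INSERT INTO object_table (id, type) VALUES (NEW.photo_id, 'photo');
Similar triggers would go on the user and comment tables.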
solution 3
Prepend the table name to the id and encode them into a new unique integer using PHP.
Decode the id to get back the original id with the table name (even if it's just a numeric type).
I don't know how to implement this in PHP, but this solution sounds better!? What are your suggestions?
I don't know what you mean by the Facebook reference in the comments, but I'll explain my comment a little further.
You don't need unique IDs across five DB tables, just one per table. You have a couple of options for how to create your links (you can create the links yourself, can't you?):
using GET variables: http://domain.com/page.html?pk={id}&table={table}
using plain URL: http://domain.com/{id}{table}
Depending on the syntax of the link you choose, pick the function to parse it. You can for example use one or both of the following:
http://php.net/manual/en/function.explode.php
http://www.php.net/manual/en/function.parse-url.php
When you get the simple model working you may add encoding/decoding/hashing functions. But do you really need them? And in what level? (I have no experience in that area so I'll shut up now.)
Is it actually important to maintain uniqueness across tables?
If not, just implement solution 3 if you can (e.g. using URL encoding).
If yes, you'll need the "parent" table in any case, so the DBMS can enforce the uniqueness.
You can still try to implement solution 3 on top of that,
or add a type discriminator¹ there and you'll be able to (quickly) know which table is referenced for any given ID.
¹ Take a look at the lower part of this answer. This is in fact a form of inheritance.
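A minimal sketch of such a parent table with a discriminator (the names are assumptions):
CREATE TABLE object (
    id INT NOT NULL PRIMARY KEY,
    type ENUM('user', 'photo', 'comment') NOT NULL
);
-- Each concrete table's id then references object(id), and the table for
-- any given id can be looked up directly:
SELECT type FROM object WHERE id = 123;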

opinions and advice on database structure

I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
And I have a list of categories to break these rows up into, for example:
Original Cat1 Cat2 Cat3 Cat4 Cat5
---------------------------------------
a:b:c:d:e a b c d e
As of right this second, the category names are known, as well as the number of categories to break the data down by. But this might change over time (for instance, categories added/removed... total number of categories changed).
Okay, so I'm not really looking for help on how to parse the rows or get the data into a db or anything... I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating them into a variable number of categories.
Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.
What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.
Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure... well... I'm not a SQL expert, but I think I should be able to do something like
select * from Generated where original_string = '$string'
-- id is put into $id
and then
select * from Category_Values where generated_id = '$id'
...and then I'll have my data to work with for search results or a form to alter the data. I'm fairly certain I could even combine this into one query with a join or something, but I'm not that great with SQL so I don't know how to actually do that. The point is, I know I can get what I need from this db structure, but am I making this harder than it needs to be? Am I making some obvious noob mistake?
My suggestion:
Table: Generated
id unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method ENUM('manual','auto')
original_string varchar (255)
Table: Categories
id unsigned int autoincrement primary key
category_name varchar(20)
Table: Category_Values
id unsigned int autoincrement primary key
category_id int
generated_id int
category_value varchar (255) - value for the category
FOREIGN KEY `fk_cat` (category_id) REFERENCES Categories(id),
FOREIGN KEY `fk_gen` (generated_id) REFERENCES Generated(id)
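For the combined lookup mentioned in the question, a single join along these lines should work (a sketch against the suggested tables):
SELECT g.id, g.original_string, c.category_name, cv.category_value
FROM Generated g
JOIN Category_Values cv ON cv.generated_id = g.id
JOIN Categories c ON c.id = cv.category_id
WHERE g.original_string = '$string';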
Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
I think this solution is perfect for what you want to do. The Categories list is now flexible, so you can add new categories or retire old ones (I would recommend thinking long and hard before agreeing to delete a category: would you orphan its records or remove them too, etc.).
Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).

How to design a rating system which permits one vote per user considering the performance?

I am using Ruby on Rails/MySQL and I would like to implement a rating system with the following conditions/features, considering a large number of users and articles:
Each article has 3 rating criteria.
Each criterion is binary (examples: good/bad, +1/-1, ...).
Each user can vote only once per criterion per article.
Under those conditions, I would like to know the best approaches/techniques for designing the database (for example in UML) in order to ensure optimal performance (response time of a database query, CPU load, ...).
P.S.: I am thinking of a rating system like the one used on the Stack Overflow website.
Create a table for the ratings:
CREATE TABLE vote
(
userId INT NOT NULL,
articleId INT NOT NULL,
criterion ENUM('language', 'usefulness', 'depth') NOT NULL, -- or whatever
value BOOLEAN NOT NULL,
PRIMARY KEY (userId, articleId, criterion)
)
This will allow each user to cast at most one vote per article per criterion.
criterion has type ENUM, which allows only three different criteria. This constraint is at the metadata level: if you want to add a criterion, you will have to change the table's definition rather than the data.
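For illustration (the literal ids and the extra 'clarity' criterion below are made-up examples), a plain INSERT fails on a second vote for the same user/article/criterion, an upsert overwrites it instead, and adding a criterion means altering the column definition:
-- Cast or update a vote; the composite primary key guarantees at most one
-- row per user, article and criterion.
INSERT INTO vote (userId, articleId, criterion, value)
VALUES (42, 7, 'depth', TRUE)
ON DUPLICATE KEY UPDATE value = VALUES(value);
-- Adding a fourth criterion later requires a DDL change:
ALTER TABLE vote
    MODIFY criterion ENUM('language', 'usefulness', 'depth', 'clarity') NOT NULL;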