compatibility matrix - sql-server-2008

How can I model a compatibility matrix database in SQL Server 2008?
For example, I have 4 groups of design. If I select group 1 and group 4, my choices will influence groups 2 and 3 and make them unselectable, because they are not compatible with my choices.
I'm a beginner, so my explanation might seem vague; please forgive me!

Any kind of "matrix" is essentially an M:N relationship, only in this case both axes of the matrix represent the same thing ("groups of design"). The M:N relationship of a table with itself can be modeled with two tables: GROUP, holding one row per group, and COMPATIBILITY, whose two columns GROUP_ID1 and GROUP_ID2 both reference GROUP and together identify a compatible pair.
Assuming relationships are bidirectional, you should also add CHECK(GROUP_ID1 < GROUP_ID2) to the COMPATIBILITY table. This would, for example, allow (1, 4) but prevent (4, 1) from being in the table, so each pair is stored only once.
The example from your question would be represented by the following data in the database:
GROUP:
GROUP_ID
--------
1
2
3
4
COMPATIBILITY:
GROUP_ID1 GROUP_ID2
--------- ---------
1 4
When the user selects a group X, you'd run the following query to find which groups it is compatible with. The remaining groups are incompatible.
SELECT GROUP_ID1
FROM COMPATIBILITY
WHERE GROUP_ID2 = :X
UNION ALL
SELECT GROUP_ID2
FROM COMPATIBILITY
WHERE GROUP_ID1 = :X
For 1, this would return 4 and for 4 it would return 1. In either of these cases, 2 and 3 would not be returned - a sign they are incompatible.
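The model above can be tried out end to end. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for SQL Server; the table name design_group (GROUP is a reserved word) and the helper functions are illustrative choices, not part of the original answer.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE design_group (group_id INTEGER PRIMARY KEY);
    CREATE TABLE compatibility (
        group_id1 INTEGER NOT NULL REFERENCES design_group(group_id),
        group_id2 INTEGER NOT NULL REFERENCES design_group(group_id),
        PRIMARY KEY (group_id1, group_id2),
        CHECK (group_id1 < group_id2)  -- store each pair once, smaller id first
    );
    INSERT INTO design_group VALUES (1), (2), (3), (4);
    INSERT INTO compatibility VALUES (1, 4);
""")


def compatible_with(x):
    """Return the ids of all groups compatible with group x."""
    rows = conn.execute(
        """
        SELECT group_id1 FROM compatibility WHERE group_id2 = :x
        UNION ALL
        SELECT group_id2 FROM compatibility WHERE group_id1 = :x
        """,
        {"x": x},
    ).fetchall()
    return sorted(r[0] for r in rows)


def reversed_pair_rejected():
    """Try to insert (4, 1); the CHECK constraint should refuse it."""
    try:
        conn.execute("INSERT INTO compatibility VALUES (4, 1)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Because the CHECK constraint guarantees that each pair is stored once, the two halves of the UNION ALL can never return the same group twice, which is why UNION ALL is sufficient here.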
On the other hand, if you want to disable 2 and 3 when 1 and 4 are selected, but not when just 1 or just 4 is selected, this is a different problem that would be more complicated to model in the relational paradigm. Let me know if that's what you actually need...

Related

How to make an inner join while maintaining unique rows

I have a ternary relationship in which I establish the relation between Offers, Profiles, and Skills. The ternary relationship table, called ternary for example, has the IDs of the three tables as primary key. It could look something like this:
id_Offer - id_Profile - id_Skill
1 - 1 - 1
1 - 1 - 2
1 - 1 - 3
1 - 2 - 1
2 - 1 - 1
2 - 3 - 2
2 - 1 - 3
2 - 5 - 1
[and so on, there would be more registers for each id_Offer from Offer but I want to limit the example]
So I have 2 offers in total, with a number of profiles in each one.
The table Offer looks something like this:
Offer - business_name
1 - business-1
2 - business-1
3 - business-1
4 - business-1
5 - business-2
6 - business-2
7 - business-2
8 - business-3
So when I do a query like
select distinct id_offer, business_name, COUNT(*)
FROM Offer
GROUP BY business_name
Order by COUNT(*);
I get that for business-1 I have 4 offers.
Now if I want to take into account the offers for some Profile, I have to make a join with my ternary relationship. But even if I do something as simple as the following
select distinct business_name
from Offer
INNER JOIN ternary ON Offer.id_Offer = ternary.id_Offer
WHERE business_name = 'business-1'
GROUP BY business_name
No matter what I put in the GROUP BY, or whether I write DISTINCT or not, I do not get what I want. In reality, business-1 has 4 offers, but right now only two of them appear in the ternary table. So the query should return 2 unique offers for this name when there is no filtering by profile.
But instead I get 8 offers, because that is how many rows in the ternary table match those id_Offer values.
How should this be done? If I need no filters I can simply look at the Offer table alone. But what if I need to filter by id_Skill or id_Profile AND want to return the business_name?
I have seen solutions such as this but I cannot make them work. I do not understand what the ? is or what it is called, so I cannot learn more about it or find out whether MariaDB works the same way in this sense. When I try to build that query for my data I get:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '? ORDER BY COUNT(*) DESC' at line 1
But as I said, it is kind of hard to look for '?' as an... Operator? Function?

There are two basic solutions.
SELECT
o.business_name,
COUNT(DISTINCT o.id_offer) AS unique_offers
FROM
Offer AS o
INNER JOIN
ternary AS t
ON t.id_Offer = o.id_Offer
WHERE
o.business_name = 'business-1'
AND t.id_profile IN (1, 2, 3, 5)
GROUP BY
o.business_name
That's the simplest to write and think about. But it can also be quite intensive, because you're still joining each row in Offer to 4 rows in ternary, creating 8 rows to aggregate and process through DISTINCT.
The "better" (in my opinion) route is to filter then aggregate the ternary table in a sub-query.
SELECT
o.business_name,
COUNT(*) AS unique_offers
FROM
Offer AS o
INNER JOIN
(
SELECT id_Offer
FROM ternary
WHERE id_profile IN (1, 2, 3, 5)
GROUP BY id_Offer
)
AS t
ON t.id_Offer = o.id_Offer
WHERE
o.business_name = 'business-1'
GROUP BY
o.business_name
This ensures that t only ever has one row for any given offer. This in turn means that each row in Offer only ever joins to one row in t, so there is no duplication. That in turn means there is no need to use COUNT(DISTINCT), which relieves some overhead by moving the de-duplication to the inner query's GROUP BY.
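To see that both approaches agree on the sample data from the question, here is a small sketch, assuming SQLite in place of MariaDB. The helper names are hypothetical, and the profile ids are bound as parameters instead of being written into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Offer (id_Offer INTEGER PRIMARY KEY, business_name TEXT);
    CREATE TABLE ternary (
        id_Offer INTEGER, id_Profile INTEGER, id_Skill INTEGER,
        PRIMARY KEY (id_Offer, id_Profile, id_Skill)
    );
    INSERT INTO Offer VALUES
        (1,'business-1'),(2,'business-1'),(3,'business-1'),(4,'business-1'),
        (5,'business-2'),(6,'business-2'),(7,'business-2'),(8,'business-3');
    INSERT INTO ternary VALUES
        (1,1,1),(1,1,2),(1,1,3),(1,2,1),
        (2,1,1),(2,3,2),(2,1,3),(2,5,1);
""")


def joined_row_count(business):
    """Rows produced by the plain join: one per matching ternary row."""
    sql = """
        SELECT COUNT(*) FROM Offer AS o
        INNER JOIN ternary AS t ON t.id_Offer = o.id_Offer
        WHERE o.business_name = ?
    """
    return conn.execute(sql, (business,)).fetchone()[0]


def offers_count_distinct(business, profiles):
    """First approach: join everything, then COUNT(DISTINCT)."""
    marks = ",".join("?" for _ in profiles)
    sql = f"""
        SELECT COUNT(DISTINCT o.id_Offer) FROM Offer AS o
        INNER JOIN ternary AS t ON t.id_Offer = o.id_Offer
        WHERE o.business_name = ? AND t.id_Profile IN ({marks})
    """
    return conn.execute(sql, (business, *profiles)).fetchone()[0]


def offers_count_subquery(business, profiles):
    """Second approach: reduce ternary to one row per offer before joining."""
    marks = ",".join("?" for _ in profiles)
    sql = f"""
        SELECT COUNT(*) FROM Offer AS o
        INNER JOIN (
            SELECT id_Offer FROM ternary
            WHERE id_Profile IN ({marks})
            GROUP BY id_Offer
        ) AS t ON t.id_Offer = o.id_Offer
        WHERE o.business_name = ?
    """
    return conn.execute(sql, (*profiles, business)).fetchone()[0]
```

The plain join yields 8 rows for business-1, which is the number the question complains about; both counting approaches reduce that to the 2 offers that actually appear in ternary.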

Are you saying that you want to see offers for a particular business, but you want to limit these according to certain profiles or skills?
We limit query results in the WHERE clause. If we want to look up data in another table, we use IN or EXISTS. For instance:
select *
from offer
where business_name = 'business-1'
and id_offer in
(
select id_offer
from ternary
where id_profile = 1
and id_skill = 2
);

Storing csv in MySQL field – bad idea?

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It maybe is worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create quite an overhead when you want to find all items of a user (I would have to iterate through all elements returned by MySQL individually).

You don't want to store the value in the comma separated form.
Consider the case when you decide to join this column with some other table.
Consider you have,
x items
1 1, 2, 3
1 1, 4
2 1
and you want to find distinct values for each x i.e.:
x items
1 1, 2, 3, 4
2 1
or maybe you want to check if it has 3 in it,
or maybe you want to convert them into separate rows:
x items
1 1
1 2
1 3
1 1
1 4
2 1
It will be a HUGE PAIN.
Use at least first normal form: have a separate row for each value.
Now, say originally you had this as your table:
x item
1 1
1 2
1 3
1 1
1 4
2 1
You can easily convert it into csv values:
select x, group_concat(item order by item) items
from t
group by x
If you want to search if x = 1 has item 3. Easy.
select * from t where x = 1 and item = 3
which in earlier case would use horrible find_in_set:
select * from t where x = 1 and find_in_set(3, items);
If you think you can use LIKE with CSV values to search, then first, LIKE '%x%' can't use indexes. Second, it will produce wrong results.
Say you want to check if item ab is present and you search for %ab%: it will also return rows containing abc, abcd, abcde, and so on.
If you have many users and items, then I'd suggest creating a separate users table with a PK userid, an items table with a PK itemid, and lastly a mapping table user_item with userid and itemid columns.
If you know you'll just need to store and retrieve these values and never do any operation on them such as join, search, distinct, conversion to separate rows, etc., then maybe, just maybe, you can (I still wouldn't).
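The three-table layout suggested above can be sketched as follows, assuming SQLite in place of MySQL. The user and item names are made-up sample data; the item ids 32, 3, 98, 56 are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemid INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE user_item (
        userid INTEGER NOT NULL REFERENCES users(userid),
        itemid INTEGER NOT NULL REFERENCES items(itemid),
        PRIMARY KEY (userid, itemid)  -- also prevents duplicate pairs
    );
    -- Hypothetical sample rows
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO items VALUES (3, 'rope'), (32, 'lamp'), (56, 'map'), (98, 'key');
    INSERT INTO user_item VALUES (1, 32), (1, 3), (1, 98), (1, 56), (2, 3);
""")


def items_of(userid):
    """All item ids of one user, fetched with a single query."""
    rows = conn.execute(
        "SELECT itemid FROM user_item WHERE userid = ? ORDER BY itemid",
        (userid,),
    ).fetchall()
    return [r[0] for r in rows]


def has_item(userid, itemid):
    """Exact-match lookup that can use the primary key index."""
    row = conn.execute(
        "SELECT 1 FROM user_item WHERE userid = ? AND itemid = ?",
        (userid, itemid),
    ).fetchone()
    return row is not None


def items_csv(userid):
    """The CSV form is still available whenever the application wants it."""
    return ",".join(str(i) for i in items_of(userid))
```

Fetching all items of a user is a single indexed query here, so the mapping table does not force the application to issue one query per item.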

Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

database schema one column entry references many rows from another table

Let's say we have a table called Workorders and another table called Parts. I would like to have a column in Workorders called parts_required. This column would contain a single item that tells me what parts were required for that workorder. Ideally, this would contain the quantities as well, but a second column could contain the quantity information if needed.
Workorders looks like
WorkorderID date parts_required
1 2/24 ?
2 2/25 ?
3 3/16 ?
4 4/20 ?
5 5/13 ?
6 5/14 ?
7 7/8 ?
Parts looks like
PartID name cost
1 engine 100
2 belt 5
3 big bolt 1
4 little bolt 0.5
5 quart oil 8
6 Band-aid 0.1
Idea 1: create a string like '1-1:2-3:4-5:5-4'. My application would parse this string and show that I need --> 1 engine, 3 belts, 5 little bolts, and 4 quarts of oil.
Pros - simple enough to create and understand.
Cons - will make deep introspection into our data much more difficult. (costs over time, etc)
Idea 2: use a binary number. For example, to reference the above list (engine, belt, little bolts, oil) using an 8-bit integer would be 54, because 54 in binary representation is 110110.
Pros - datatype is optimal concerning size. Also, I am guessing there are tricky math tricks I could use in my queries to search for parts used (don't know what those are, correct me if I'm in the clouds here).
Cons - I do not know how to handle quantity using this method. Also, even a 64-bit BIGINT only allows 64 distinct parts in my table, and I expect many hundreds.
Any ideas? I am using MySQL. I may be able to use PostgreSQL, and I understand that they have more flexible datatypes like JSON and arrays, but I am not familiar with how querying those would perform. Also it would be much easier to stay with MySQL

Why not create a Relationship table?
You can create a table named Workorders_Parts with the following content:
|workorderId, partId|
So when you want to get all parts from a specific workorder you just type:
select p.name
from parts p inner join workorders_parts wp on wp.partId = p.partId
where wp.workorderId = x;
What the query says is:
Give me the names of the parts that belong to workorderId = x and are listed in the workorders_parts table.
Remember that INNER JOIN behaves like an intersection here: a row is returned only if the id it is looking for exists in both tables.
It will give you all part names that are used to build workorder x.
Let's say we have workorderId = 1 with partId = 1, 2, 3; it will be represented in our relationship table as:
workorderId | partId
1 | 1
1 | 2
1 | 3
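The question also asked about quantities. One possible extension, shown here as a sketch and not as part of the answer above, is an extra quantity column on the relationship table. SQLite stands in for MySQL, the quantity column and helper names are assumptions, and the sample workorder uses the part list from the question's Idea 1 (1 engine, 3 belts, 5 little bolts, 4 quarts of oil).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (partId INTEGER PRIMARY KEY, name TEXT, cost REAL);
    CREATE TABLE workorders (workorderId INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE workorders_parts (
        workorderId INTEGER NOT NULL REFERENCES workorders(workorderId),
        partId INTEGER NOT NULL REFERENCES parts(partId),
        quantity INTEGER NOT NULL CHECK (quantity > 0),  -- assumed extra column
        PRIMARY KEY (workorderId, partId)
    );
    INSERT INTO parts VALUES
        (1,'engine',100),(2,'belt',5),(3,'big bolt',1),
        (4,'little bolt',0.5),(5,'quart oil',8),(6,'Band-aid',0.1);
    INSERT INTO workorders VALUES (1, '2/24');
    INSERT INTO workorders_parts VALUES (1,1,1),(1,2,3),(1,4,5),(1,5,4);
""")


def parts_for(workorder_id):
    """Part names and quantities required by one workorder."""
    return conn.execute(
        """
        SELECT p.name, wp.quantity
        FROM parts p
        INNER JOIN workorders_parts wp ON wp.partId = p.partId
        WHERE wp.workorderId = ?
        ORDER BY p.partId
        """,
        (workorder_id,),
    ).fetchall()


def total_cost(workorder_id):
    """Total parts cost of one workorder, computed in the database."""
    return conn.execute(
        """
        SELECT SUM(p.cost * wp.quantity)
        FROM parts p
        INNER JOIN workorders_parts wp ON wp.partId = p.partId
        WHERE wp.workorderId = ?
        """,
        (workorder_id,),
    ).fetchone()[0]
```

With the quantities stored as ordinary rows, questions such as cost over time become plain aggregate queries, which is the kind of introspection the delimited string in Idea 1 makes difficult.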

How to request lists that contain certain items in MySQL

In the application I am developing, the user has to set parameters to define the end product he will get.
My tables look like this :
Categories
-------------
Id Name
1 Material
2 Color
3 Shape
Parameters
-------------
Id CategoryId Name
1 1 Wood
2 1 Plastic
3 1 Metal
4 2 Red
5 2 Green
6 2 Blue
7 3 Round
8 3 Square
9 3 Triangle
Combinations
-------------
Id
1
2
...
ParametersCombinations
----------------------
CombinationId ParameterId
1 1
1 4
1 7
2 1
2 5
2 7
Now only some combinations of parameters are available to the user. In my example, he could get a red round wooden thingy or a green round wooden thingy but not a blue one because I can't produce it.
Let's say the user selected wood and round parameters. How do I make a request to know that there's only red and green available so I can disable the blue option for him ?
Or is there some better way to model my database ?

Let us assume you provide the selected parameter ids in the following format:
// I call this a **parameterList** for convenience.
(1,7) // this is parameter id 1 and id 7.
I am also assuming you are using some scripting language to help you with your app, like Ruby or PHP.
I am also assuming you want to keep as much logic as possible out of your stored procedures or MySQL queries.
Another assumption is that you are using one of the rapid application MVC frameworks like Rails, Symfony or CakePHP.
Your logic would be:
1. Find all the combinations that contain ALL the parameters in your parameterList and put these found combinations in a list called relevantCombinations.
2. Find all the parameters_combinations rows that belong to at least 1 of the combinations in the list relevantCombinations. Retrieve only the unique parameter values.
These two steps can be solved using simple Model::find methods and a for loop in the frameworks I described above.
If you are not using frameworks, it is also fine to use the raw scripting language.
If you require them as MySQL queries, here are some possible queries. Be aware that these are not necessarily the best queries.
First one is
SELECT * FROM (
SELECT `PossibleList`.`CombinationId`, COUNT(`PossibleList`.`CombinationId`) as number
FROM (
SELECT `CombinationId` FROM `ParametersCombinations`
WHERE `ParameterId` IN (1, 7)
) `PossibleList` GROUP BY `PossibleList`.`CombinationId`
) `PossibleGroupedList` WHERE `number` = 2;
-- note that the (1, 7) and the number 2 needs to be supplied by your app.
-- 2 refers to the number of parameters supplied.
-- In this case you supplied 1 and 7 therefore 2.
To confirm, look at http://sqlfiddle.com/#!2/16831/3.
Note how I purposely have a Combination 3 which only has the Parameter 1 but not 7. Therefore the query did not give you back 3, but only 1 and 2. Feel free to tweak the asterisk * in the first line.
Second one is
SELECT DISTINCT(`ParameterID`)
FROM `ParametersCombinations`
WHERE `CombinationId` IN (1, 2);
-- note that (1, 2) is the result we expect from the first step.
-- the one we call relevantCombinations
To confirm, look at http://sqlfiddle.com/#!2/16831/5
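The two steps can also be driven from application code, one simple query per step. The following is a rough sketch, assuming Python with SQLite in place of a PHP or Ruby app on MySQL; the function names are hypothetical, and combination 3 (parameter 1 only) is added to mirror the extra row described for the fiddle. The selected ids are assumed to be distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ParametersCombinations (
        CombinationId INTEGER NOT NULL,
        ParameterId INTEGER NOT NULL,
        PRIMARY KEY (CombinationId, ParameterId)
    );
    INSERT INTO ParametersCombinations VALUES
        (1,1),(1,4),(1,7),
        (2,1),(2,5),(2,7),
        (3,1);  -- contains parameter 1 but not 7
""")


def relevant_combinations(selected):
    """Step 1: combinations that contain ALL selected parameter ids."""
    marks = ",".join("?" for _ in selected)
    rows = conn.execute(
        f"""
        SELECT CombinationId
        FROM ParametersCombinations
        WHERE ParameterId IN ({marks})
        GROUP BY CombinationId
        HAVING COUNT(*) = ?
        """,
        (*selected, len(selected)),
    ).fetchall()
    return sorted(r[0] for r in rows)


def available_parameters(selected):
    """Step 2: other parameters that occur in those combinations."""
    combos = relevant_combinations(selected)
    if not combos:
        return []  # nothing can be combined with this selection
    combo_marks = ",".join("?" for _ in combos)
    sel_marks = ",".join("?" for _ in selected)
    rows = conn.execute(
        f"""
        SELECT DISTINCT ParameterId
        FROM ParametersCombinations
        WHERE CombinationId IN ({combo_marks})
          AND ParameterId NOT IN ({sel_marks})
        """,
        (*combos, *selected),
    ).fetchall()
    return sorted(r[0] for r in rows)
```

For the selection (1, 7), wood and round, this leaves parameters 4 and 5, red and green, so blue (6) is the option to disable.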
I do not recommend being a masochist and attempt to get your answer in a single query.
I also do NOT recommend using the MySQL queries I have supplied. It is less masochistic. But sufficiently masochistic for me NOT to recommend this way.
Since you did not indicate any tag other than mysql, I suspect that you are stronger with mysql. Hence my answer contains mysql.
My strongest suggestion would be my first. Make full use of established frameworks and put your logic in the business logic layer. Not in the data layer. Even if you don't use frameworks and just use raw php and ruby, that is still a better place for you to place your logic in than MySQL.
I saw that T gave an answer in a single MySQL query but I can tell you that (s)he considers only 1 parameter.
See this part:
WHERE ParameterId = 7 -- 7 is the selected parameter
You can adapt his/her answer with some trickery using a forloop and appending OR clauses.
Again, I do NOT recommend that in the big picture of building an app.
I have also tested his/her answer with http://sqlfiddle.com/#!2/2eda4/2. There may be 1 or 2 small bugs.
In summary, my recommendations in descending order of strength:
Use a framework like Rails or CakePHP and the pseudocode step 1 and 2 and as many find as you need. (STRONGEST)
Use raw scripting language and the pseudocode step 1 and 2 and as many simple queries as you need.
Use the raw MySQL queries I created. (LEAST STRONG)
P.S. I left out the part in my queries as to how to get the name of the Parameters. But given that you can get the ParameterIDs from my answer, I think that is trivial. I have also left out how you may need to remove the already selected parameters (1, 7). Again, that should be trivial to you.

Try the following
SELECT p.*, pc.CombinationId
FROM Parameters p
-- get the parameter combinations for all the parameters
JOIN ParametersCombinations pc
ON pc.ParameterId = p.Id
-- filter the parameter combinations to only combinations that include the selected parameter
JOIN (
SELECT CombinationId
FROM ParametersCombinations
WHERE ParameterId = 7 -- 7 is the selected parameter
) f ON f.CombinationId = pc.CombinationId
Or removing the already selected parameters
SELECT p.*, pc.CombinationId
FROM Parameters p
JOIN ParametersCombinations pc
ON pc.ParameterId = p.Id
JOIN (
SELECT CombinationId
FROM ParametersCombinations
WHERE ParameterId IN (7, 1)
) f ON f.CombinationId = pc.CombinationId
WHERE ParameterId NOT IN (7, 1)

Storing Hierarchical Data (MySQL) for Referral Marketing

I need to have a 5 levels hierarchy for the users registered to a website. Every user is invited by another, and I need to know all descendants for a user. And also ancestors for a user.
I have 2 solutions in mind.
Keeping a table with relationships this way. A closure table:
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
Having this table for relationships, which keeps 5 levels of ancestors in one row. An "ancestors" table:
user_id ancestor_level1_id ancestor_level2_id ancestor_level3_id ancestor_level4_id ancestor_level5_id
10 9 7 4 3 2
9 7 4 3 2 1
Are these good ideas?
I know about "the adjacency list model" and "the modified preorder tree traversal algorithm", but are these good solutions for a "referral" system?
The queries that I need to perform on this tree are:
frequently adding new users
when a user buys something, their referrers get a percentage commission
every user should be able to find out how many people they've referred (and how many people were referred by people who they referred....) at each level

Closure Table
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
To add user 10, referred by user 3. (I don't think you need to lock the table between these two insertions):
insert into ancestor_table
select ancestor_id, 10, distance+1
from ancestor_table
where descendant_id=3;
insert into ancestor_table values (10,10,0);
To find all users referred by user 3.
select descendant_id from ancestor_table where ancestor_id=3;
To count those users by depth:
select distance, count(*) from ancestor_table where ancestor_id=3 group by distance;
To find the ancestors of user 10.
select ancestor_id, distance from ancestor_table where descendant_id=10;
The drawback to this method is the amount of storage space this table will take.
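The closure table operations above can be exercised on the sample rows as follows. This is a sketch assuming SQLite in place of MySQL; the helper names are illustrative, and wrapping the two inserts in one transaction is a precaution added here, not something the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ancestor_table (
        ancestor_id INTEGER NOT NULL,
        descendant_id INTEGER NOT NULL,
        distance INTEGER NOT NULL,
        PRIMARY KEY (ancestor_id, descendant_id)
    );
    INSERT INTO ancestor_table VALUES
        (1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0),(6,6,0),(2,3,1);
""")


def add_user(new_id, referrer_id):
    """Copy the referrer's ancestor rows one level deeper, then add the self row."""
    with conn:  # both inserts commit together or roll back together
        conn.execute(
            """
            INSERT INTO ancestor_table
            SELECT ancestor_id, ?, distance + 1
            FROM ancestor_table
            WHERE descendant_id = ?
            """,
            (new_id, referrer_id),
        )
        conn.execute("INSERT INTO ancestor_table VALUES (?, ?, 0)", (new_id, new_id))


def descendants_of(user_id):
    rows = conn.execute(
        "SELECT descendant_id FROM ancestor_table WHERE ancestor_id = ?",
        (user_id,),
    ).fetchall()
    return sorted(r[0] for r in rows)


def ancestors_of(user_id):
    return conn.execute(
        "SELECT ancestor_id, distance FROM ancestor_table "
        "WHERE descendant_id = ? ORDER BY distance",
        (user_id,),
    ).fetchall()


def counts_by_distance(user_id):
    rows = conn.execute(
        "SELECT distance, COUNT(*) FROM ancestor_table "
        "WHERE ancestor_id = ? GROUP BY distance",
        (user_id,),
    ).fetchall()
    return dict(rows)


add_user(10, 3)  # user 10 is referred by user 3
```

Note that, as in the queries above, each user appears as their own ancestor and descendant at distance 0; add a distance > 0 condition if those self rows are not wanted.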

Use the OQGRAPH storage engine.
You probably want to keep track of an arbitrary number of levels, rather than just 5 levels. Get one of the MySQL forks that supports the OQGRAPH engine (such as MariaDB or OurDelta), and use that to store your tree. It implements the adjacency list model, but by using a special column called latch to send a command to the storage engine, telling it what kind of query to perform, you get all of the advantages of a closure table without needing to do the bookkeeping work each time someone registers for your site.
Here are the queries you'd use in OQGRAPH. See the documentation at
http://openquery.com/graph-computation-engine-documentation
We're going to use origid as the referrer, and destid as the referred user.
To add user 11, referred by user 10
insert into ancestors_table (origid,destid) values (10,11)
To find all users referred by user 3.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND origid = 3;
To find the ancestors of user 10.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND destid = 10;
To find the number of users at each level, referred by user 3:
SELECT count(linkid), weight
FROM ancestors_table
WHERE latch = 2 AND origid = 3
GROUP BY weight;

Managing Hierarchical Data in MySQL
In general, I like the "nested set", esp. in MySQL which doesn't really have language support for hierarchical data.
It's fast, but you'll need to make sure your developers read that article if ease of maintenance is a big deal. It's very flexible - which doesn't seem to matter much in your case.
It seems a good fit for your problem: in the referral model, you need to find the tree of referrers, which is fast in the nested set model; you also need to know who the children of a given user are, and the depth of their relationship; this is also fast.

Delimited String of Ancestors
If you're strongly considering the 5-level relationship table, it may simplify things to use a delimited string of ancestors instead of 5 separate columns.
user_id depth ancestors
10 7 9,7,4,3,2,1
9 6 7,4,3,2,1
...
2 2 1
1 1 (empty string)
Here are some SQL commands you'd use with this model:
To add user 11, referred by user 10
insert into ancestors_table (user_id, depth, ancestors)
select 11, depth+1, concat(10,',',ancestors)
from ancestors_table
where user_id=10;
To find all users referred by user 3. (Note that this query can't use an index.)
select user_id
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3';
To find the ancestors of user 10. You need to break up the string in your client program. In Ruby, the code would be ancestorscolumn.split(",").map{|x| x.to_i}. There's no good way to break up the string in SQL.
select ancestors from ancestors_table where user_id=10;
To find the number of users at each level, referred by user 3:
select
depth-(select depth from ancestors_table where user_id=3),
count(*)
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3'
group by depth;
You can avoid SQL injection attacks in the like '%,3,%' parts of these queries by using like concat('%,', ?, ',%') instead and binding an integer for the user number to the placeholder.
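Here is a sketch of the delimited-string model with bound parameters, assuming Python and SQLite in place of Ruby and MySQL (SQLite's || operator plays the role of concat()). The rows for users 7, 4 and 3 are sample data chosen to be consistent with the ancestor chains shown above, and the helper names are hypothetical. The lookup also matches a string that consists of the id alone, which happens when a user's only ancestor is the root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ancestors_table "
    "(user_id INTEGER PRIMARY KEY, depth INTEGER, ancestors TEXT)"
)
conn.executemany(
    "INSERT INTO ancestors_table VALUES (?, ?, ?)",
    [
        (1, 1, ""), (2, 2, "1"), (3, 3, "2,1"), (4, 4, "3,2,1"),
        (7, 5, "4,3,2,1"), (9, 6, "7,4,3,2,1"), (10, 7, "9,7,4,3,2,1"),
    ],
)


def ancestors_of(user_id):
    """Split the stored string in the client; the root's empty string gives []."""
    row = conn.execute(
        "SELECT ancestors FROM ancestors_table WHERE user_id = ?", (user_id,)
    ).fetchone()
    return [int(x) for x in row[0].split(",") if x]


def descendants_of(user_id):
    """Pattern search with the id bound as a parameter, never pasted into SQL."""
    rows = conn.execute(
        """
        SELECT user_id FROM ancestors_table
        WHERE ancestors LIKE '%,' || :u || ',%'
           OR ancestors LIKE :u || ',%'
           OR ancestors LIKE '%,' || :u
           OR ancestors = CAST(:u AS TEXT)
        """,
        {"u": int(user_id)},
    ).fetchall()
    return sorted(r[0] for r in rows)


def add_user(new_id, referrer_id):
    """Build the new chain in the client so a root referrer leaves no trailing comma."""
    depth = conn.execute(
        "SELECT depth FROM ancestors_table WHERE user_id = ?", (referrer_id,)
    ).fetchone()[0]
    chain = [referrer_id] + ancestors_of(referrer_id)
    conn.execute(
        "INSERT INTO ancestors_table VALUES (?, ?, ?)",
        (new_id, depth + 1, ",".join(str(i) for i in chain)),
    )


add_user(11, 10)  # user 11 is referred by user 10
```

Building the chain in the client is a design choice of this sketch: concatenating the referrer id, a comma and an empty ancestors string in SQL would leave a trailing comma for users referred directly by the root.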
