database primary key based on variable number of customer subdivisions - mysql

Doing this in Groovy/Grails and GORM with a MySQL DB.
When storing data for our different customers we need to be able to identify their subdivisions. Some will have 0 levels of subdivisions; others will have 1, 2, 3 or more. We'd like to allow any number of levels of subdivisions, but could settle for a fixed number, such as 5, 7 or 10.
For instance:
Company ABC has 2 subdivision levels:
Company) ABC (the root level)
Subdivision Level 1) ABC->Div1, ABC->Div2, ABC->Div3
Subdivision Level 2) ABC->Div1->SubDiv1, ABC->Div1->SubDiv2, ABC->Div1->SubDiv3
Company DEF has 1 level:
Company) DEF (the root level)
Level 1) DEF->Div1, DEF->Div2, DEF->Div3
We need to define both 1 to 1 and 1 to many unique attributes for each level (for example associate an address with ABC->Div1->SubDiv3)
And Company ABC may want us to display a certain image for all instances for Div2 and all of Div2's subdivisions.
The question is, how is it best to create a variable number of division levels for an identifier or primary key to then use as a foreign key on related data?
Have a fixed number of columns (such as 7, forming a composite key):
ID-Level-1, ID-Level-2, ID-Level-3, ID-Level-4, ID-Level-5, ID-Level-6, ID-Level-7
Or, create some sort of a tree of hierarchical levels and use the various key values as identifiers/foreign keys?

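You could model this like the following:
class Company {
    static hasMany = [subdivisions: Subdivision]
}
class Subdivision {
    // subdivision is the parent; it stays unset for divisions directly under the company
    static belongsTo = [company: Company, subdivision: Subdivision]
    static hasMany = [subdivisions: Subdivision]
    static constraints = {
        subdivision nullable: true
    }
}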

Store it just like you described: Company->Div->Subdiv:
id  subdivision_name  subdivision_level  subdivision_path
-----------------------------------------------------------------
1   ABC               0                  ABC
2   Div1              1                  ABC->Div1
3   Div2              1                  ABC->Div2
4   Div3              1                  ABC->Div3
5   SubDiv1           2                  ABC->Div1->SubDiv1
6   SubDiv2           2                  ABC->Div1->SubDiv2
7   SubDiv3           2                  ABC->Div1->SubDiv3
8   DEF               0                  DEF
9   Div1              1                  DEF->Div1
10  Div2              1                  DEF->Div2
11  Div3              1                  DEF->Div3
Number of levels is only limited by length of subdivision_path - standard datatypes should be more than enough.
So let's associate an address with ABC->Div1->SubDiv3:
CREATE TABLE addresses (id ... , subdivision_id INTEGER, ...);
ALTER TABLE addresses ADD CONSTRAINT fk_addresses_subdivisions FOREIGN KEY (subdivision_id) REFERENCES subdivisions (id);
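... query all instances for Div2 and all of Div2's subdivisions (matching the exact path, or the path followed by the separator, keeps a sibling such as Div20 from matching):
SELECT * FROM subdivisions WHERE subdivision_path = 'ABC->Div2' OR subdivision_path LIKE 'ABC->Div2->%';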
... for all subdivisions of Div2:
SELECT * FROM subdivisions WHERE subdivision_path LIKE 'ABC->Div2->%';
... for all root companies:
SELECT * FROM subdivisions WHERE subdivision_level = 0;
... and more.
Alternatively, subdivision_path can store ids instead of names (maybe that's even better, since renaming a division then leaves the paths unchanged).
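To try the path approach end to end, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption, chosen so the example runs anywhere); the street column and the subtree helper are hypothetical names, not part of the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subdivisions (
    id INTEGER PRIMARY KEY,
    subdivision_name TEXT NOT NULL,
    subdivision_level INTEGER NOT NULL,
    subdivision_path TEXT NOT NULL UNIQUE
);
CREATE TABLE addresses (
    id INTEGER PRIMARY KEY,
    subdivision_id INTEGER NOT NULL REFERENCES subdivisions (id),
    street TEXT NOT NULL  -- hypothetical column
);
""")
conn.executemany(
    "INSERT INTO subdivisions VALUES (?, ?, ?, ?)",
    [
        (1, "ABC", 0, "ABC"),
        (2, "Div1", 1, "ABC->Div1"),
        (3, "Div2", 1, "ABC->Div2"),
        (4, "Div3", 1, "ABC->Div3"),
        (5, "SubDiv1", 2, "ABC->Div1->SubDiv1"),
        (6, "SubDiv2", 2, "ABC->Div1->SubDiv2"),
        (7, "SubDiv3", 2, "ABC->Div1->SubDiv3"),
        (8, "DEF", 0, "DEF"),
        (9, "Div1", 1, "DEF->Div1"),
    ],
)
# Associate an address with ABC->Div1->SubDiv3 (id 7).
conn.execute(
    "INSERT INTO addresses (subdivision_id, street) VALUES (7, '1 Example St')"
)


def subtree(conn, path):
    """Return the paths of the node at `path` and of everything below it."""
    cur = conn.execute(
        "SELECT subdivision_path FROM subdivisions "
        "WHERE subdivision_path = ? OR subdivision_path LIKE ? "
        "ORDER BY subdivision_path",
        (path, path + "->%"),
    )
    return [row[0] for row in cur]


div1_subtree = subtree(conn, "ABC->Div1")
roots = [
    row[0]
    for row in conn.execute(
        "SELECT subdivision_name FROM subdivisions "
        "WHERE subdivision_level = 0 ORDER BY id"
    )
]
print(div1_subtree)
print(roots)
```

The addresses table only ever references the single-column id, so the number of levels never leaks into the foreign keys.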

Related

Multi Ordering 4 SQL columns with a single query

Environment: MySQL 5.6
SqlTable name = CategoryTable
Sql Columns
CATEGORY_ID (INT)
CATEGORY_NAME (VARCHAR)
LEVEL (INT)
MOTHER_CATEGORY (INT)
I've tried with
SELECT
CATEGORY_ID, CATEGORY_NAME , LEVEL , MOTHER_CATEGORY
FROM
CategoryTable
But I don't know how to use the ORDER BY in order to get that result.
The first line below lists the columns; the table content starts on the second line:
CATEGORY_ID  CATEGORY_NAME      LEVEL  MOTHER_CATEGORY
1            MainCategory       0      0
2            -SubCategory1      1      1
3            --SubCategory2     2      2
4            ---SubCategory3    3      3
5            2Nd_Main_Category  0      0
6            -SubCategory1      1      5
7            --SubCategory2     2      6
8            ---SubCategory3    3      7
Is there a way to achieve something like this with a MySQL query?
You aren't very clear about what you are trying to achieve. I'll guess that you want to order rows by a multi-level parent-child structure. There are some very complicated ways of doing that in MySQL 5.6, a DB that isn't really ideal for such a structure, but I have come up with something simple that I use in my own apps: create a special ordering field that holds a path of zero-filled ids for each record.
ordering_path_field
/
/0000000001/
/
/0000000001/0000000002
/0000000003
/0000000003/0000000005
/0000000003/0000000005/0000000006
etc
So each record contains the path of its parents up to the root, using zero-filled ids, and you can simply sort by this field to get the records in the proper order. The drawbacks are that you have to set a maximum number of levels so that the ordering field doesn't overflow, and that moving a record to a new parent, if ever needed, is a big pain because the paths of all its descendants have to be rewritten.
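A minimal sketch of how such an ordering path could be built and used, in plain Python. The rows are the ones from the question plus one hypothetical extra category (id 9, a second child of category 1) so the effect of the sort is visible; the width of 10 digits and the function name ordering_path are assumptions.

```python
# (CATEGORY_ID, CATEGORY_NAME, MOTHER_CATEGORY); 0 means "no parent".
categories = [
    (1, "MainCategory", 0),
    (2, "SubCategory1", 1),
    (3, "SubCategory2", 2),
    (4, "SubCategory3", 3),
    (5, "2Nd_Main_Category", 0),
    (6, "SubCategory1", 5),
    (7, "SubCategory2", 6),
    (8, "SubCategory3", 7),
    (9, "LateAddition", 1),  # hypothetical row, not in the question
]

WIDTH = 10  # digits per id; an assumed width that must fit the largest id


def ordering_path(cat_id, parents):
    """Build '/0000000001/0000000002' style paths by walking up to the root."""
    parts = []
    while cat_id != 0:
        parts.append(str(cat_id).zfill(WIDTH))
        cat_id = parents[cat_id]
    return "/" + "/".join(reversed(parts))


parents = {cat_id: mother for cat_id, _, mother in categories}
ordered = sorted(categories, key=lambda row: ordering_path(row[0], parents))
ordered_ids = [row[0] for row in ordered]

for cat_id, name, _ in ordered:
    print(ordering_path(cat_id, parents), name)
```

In a real table the path would be stored in its own column when the row is inserted, so that ORDER BY on that column does the work.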

Storing csv in MySQL field – bad idea?

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It maybe is worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create quite an overhead when you want to find all items of a user (I would have to iterate through all elements returned by MySQL individually)?
You don't want to store the value in the comma separated form.
Consider the case when you decide to join this column with some other table.
Consider you have,
x items
1 1, 2, 3
1 1, 4
2 1
and you want to find distinct values for each x i.e.:
x items
1 1, 2, 3, 4
2 1
or maybe you want to check whether it has 3 in it,
or maybe you want to convert them into separate rows:
x items
1 1
1 2
1 3
1 1
1 4
2 1
It will be a HUGE PAIN.
Apply at least first normal form: have a separate row for each value.
Now, say originally you had this as your table:
x item
1 1
1 2
1 3
1 1
1 4
2 1
You can easily convert it into csv values:
select x, group_concat(item order by item) items
from t
group by x
If you want to search if x = 1 has item 3. Easy.
select * from t where x = 1 and item = 3
which in earlier case would use horrible find_in_set:
select * from t where x = 1 and find_in_set(3, items);
If you think you can use LIKE to search the CSV values: first, a pattern such as %x% can't use indexes; second, it will produce wrong results.
Say you want to check whether item ab is present and you search for %ab%: it will also return rows containing abc, abcd, abcde and so on.
If you have many users and items, I'd suggest creating a separate table users with a PK userid, another table items with a PK itemid, and lastly a mapping table user_item with userid and itemid columns.
If you know you'll only need to store and retrieve these values, and never operate on them (join, search, distinct, conversion to separate rows and so on), then maybe, just maybe, you can (I still wouldn't).
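For comparison, here is a minimal sketch of the three-table layout, again using Python's built-in sqlite3 as a stand-in for MySQL (an assumption). The user names, item names and the helper functions items_of and has_item are hypothetical; the item ids are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (itemid INTEGER PRIMARY KEY, item_name TEXT NOT NULL);
CREATE TABLE user_item (
    userid INTEGER NOT NULL REFERENCES users (userid),
    itemid INTEGER NOT NULL REFERENCES items (itemid),
    PRIMARY KEY (userid, itemid)  -- one row per user/item pair
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(3, "rope"), (32, "lamp"), (56, "map"), (98, "key")],
)
conn.executemany(
    "INSERT INTO user_item VALUES (?, ?)",
    [(1, 32), (1, 3), (1, 98), (1, 56), (2, 32)],
)


def items_of(conn, userid):
    """All item ids of one user, fetched with a single indexed query."""
    cur = conn.execute(
        "SELECT itemid FROM user_item WHERE userid = ? ORDER BY itemid",
        (userid,),
    )
    return [row[0] for row in cur]


def has_item(conn, userid, itemid):
    """Exact membership test; no substring matching involved."""
    cur = conn.execute(
        "SELECT 1 FROM user_item WHERE userid = ? AND itemid = ?",
        (userid, itemid),
    )
    return cur.fetchone() is not None


user1_items = items_of(conn, 1)
# The CSV form is still easy to produce when it is needed for display.
csv_for_display = ",".join(str(item) for item in user1_items)
# Substring pitfall: user 2 holds only item 32, yet "3" is a substring of "32".
substring_false_positive = "3" in "32"
print(csv_for_display, has_item(conn, 2, 3), substring_false_positive)
```

Fetching a user's items is one query that returns a handful of rows, so the extra table adds little overhead for the "fewer than 5 items" case.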
Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

MySql sorting hierarchical data in a closure table that has repeated nodes

I have a hierarchy that I have represented as a closure table, as described by Bill Karwin. I am trying to write a query that will return the nodes sorted as a depth-first traversal. This reply would solve my problem, except that in my structure some nodes appear more than once because they have multiple parents.
My sample data looks like this:
1
- 2
-- 5
- 3
-- 5
- 4
-- 6
-- 2
--- 5
As you can see, node 2 appears twice, both as a child and a grandchild of the root. Node 5 appears twice as a grandchild of the root (each time with a different parent), and then again as a great-grandchild because its parent, node 2, is repeated.
This will set up the data as a closure table:
CREATE TABLE ancestor_descendant (
ancestor int NOT NULL,
descendant int NOT NULL,
path_length int NOT NULL
);
INSERT INTO ancestor_descendant (ancestor, descendant, path_length) VALUES
(1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0),(6,6,0),(1,2,1),(1,3,1),(1,4,1),
(2,5,1),(3,5,1),(4,6,1),(4,2,1),(1,5,2),(1,6,2),(1,2,2),(1,5,3),(4,5,2);
or as an adjacency list:
CREATE TABLE parent_child (
parent int NOT NULL,
child int NOT NULL
);
INSERT INTO parent_child (parent, child) VALUES
(1,2),(1,3),(1,4),(2,5),(3,5),(4,2),(4,6);
I can produce a breadth-first traversal (although 5 only appears as a grandchild once):
SELECT CONCAT(LPAD('', path_length, '-'), ' ', descendant)
FROM ancestor_descendant
WHERE ancestor = 1
ORDER BY path_length;
1
- 2
- 3
- 4
-- 5
-- 6
-- 2
--- 5
but my attempt at a depth-first traversal using breadcrumbs fails (it shows the repeated nodes only once because of the GROUP BY a.descendant):
SELECT a.descendant, GROUP_CONCAT(b.ancestor ORDER BY b.path_length DESC) AS breadcrumbs
FROM ancestor_descendant a
INNER JOIN ancestor_descendant b ON (b.descendant = a.descendant)
WHERE a.ancestor = 1
GROUP BY a.descendant
ORDER BY breadcrumbs;
1 1
2 1,1,4,1,4,1,2,2
5 1,1,4,1,4,1,3,2,3,2,5,5
3 1,3
4 1,4
6 1,4,6
Is it possible to output a depth-first traversal using a closure table representation?
Should I use an alternative representation? I can't use recursive CTEs, because I'm restricted to MySql (which doesn't implement them).
I would suggest splitting the node id into two concepts. One would be a unique id that is used for the graph properties (i.e. ancestor_descendant list). The second is what you show on output.
1
- 2
-- 5
- 3
-- 50
- 4
-- 6
-- 20
--- 51
Then create a mapping table:
Id Value
1 1
2 2
20 2
3 3
4 4
5 5
50 5
51 5
6 6
You can then get what you want by joining back to the mapping table and using the value column instead of the id column.
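A small sketch of the idea in plain Python, to show that once every position in the tree has its own id, ordering by breadcrumbs gives a depth-first listing with the repeats intact. The ids 20, 50 and 51 come from the mapping table above; which repeated node gets which id (50 under 3, 20 under 4, 51 under 20) is my assumption, and parent_of, value_of and breadcrumbs are hypothetical names.

```python
# Unique id of each tree position -> unique id of its parent (None for the root).
parent_of = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 50: 3, 6: 4, 20: 4, 51: 20}

# Mapping table: unique id -> value shown on output.
value_of = {1: 1, 2: 2, 20: 2, 3: 3, 4: 4, 5: 5, 50: 5, 51: 5, 6: 6}


def breadcrumbs(node):
    """Path of unique ids from the root down to `node`, as a tuple."""
    path = []
    while node is not None:
        path.append(node)
        node = parent_of[node]
    return tuple(reversed(path))


# Sorting by breadcrumbs yields a depth-first order; siblings sort by unique id.
ordered = sorted(parent_of, key=breadcrumbs)
values = [value_of[node] for node in ordered]
lines = ["-" * (len(breadcrumbs(node)) - 1) + " " + str(value_of[node]) for node in ordered]
print("\n".join(lines))
```

In SQL the same ordering would come from the GROUP_CONCAT breadcrumbs query in the question, run against the closure table built on the unique ids and joined to the mapping table for display.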

Storing IDs as comma separated values

How can I select from a database all of the rows with an ID stored in a varchar comma separated. for example, I have a table with this:
, 7, 9, 11
How can I SELECT the rows with those IDs?
Normalize your database. You should be using a lookup table most likely.
You have 2 options:
Use a function to split the string into a temp table and then join the table your selecting from to that temp table.
Use dynamic SQL to query the table where id in (#variable) --- bad choice if you choose this way.
select * from table_name where id in (7, 9, 11)
If you do typically have that comma at the start, you will need to remove it first.
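If the list has to be handled in application code, a minimal sketch could look like this (Python with the built-in sqlite3 module standing in for the real database, which is an assumption; the label column and the parse_id_list helper are hypothetical). It strips the stray leading comma by skipping empty pieces and binds each id as a parameter instead of pasting the string into the SQL.

```python
import sqlite3


def parse_id_list(raw):
    """Turn a string such as ', 7, 9, 11' into [7, 9, 11], skipping empty pieces."""
    return [int(piece) for piece in raw.split(",") if piece.strip()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 13)],
)

ids = parse_id_list(", 7, 9, 11")
# One placeholder per id, so every value is bound rather than concatenated.
placeholders = ", ".join("?" for _ in ids)
rows = conn.execute(
    f"SELECT id, label FROM table_name WHERE id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()
print(rows)
```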
Use match(column) against('7,9,11')
This will show all rows whose varchar id column has 7,9,11 in it.
But you have to be sure that your column has a fulltext index.
Just yesterday I was fixing a bug in an old application here and saw where they handled it like this:
AND (T.ServiceIDs = '#SegmentID#' OR T.ServiceIDs LIKE '#SegmentID#,%'
OR T.ServiceIDs LIKE '%,#SegmentID#,%' OR T.ServiceIDs LIKE '%,#SegmentID#')
I am assuming you are saying something like the value of ServiceIDs from the database might contain 7,9,11 and that the variable SegmentID is one or more values. It was inside a CFIF statement checking that SegmentID in fact had a value (which was always the case due to prior logic that would default it).
Personally, though, I would do as others have suggested and create what I always refer to as a bridging table, which allows you to have 0 to many PKs from one table related to the PK of another.
I had to tackle this problem years ago where I could not change the table structure and I created a custom table type and a set of functions so I could treat the values via SQL as if they were coming from a table. That custom table type solution though was specific to Oracle and I'd not know how to do that in MySQL without some research on my part.
There is a reason querying lists is so difficult: databases are not designed to work with delimited lists. They are optimized to work best with rows (or sets) of data. Creating the proper table structure will result in much better query performance and simpler sql queries. (So while it is technically possible, you should seriously consider normalizing your database as Todd and others suggested.)
Many-to-many relationships are best represented by three (3) tables. Say you are selling "widgets" in a variety of "sizes". Create two tables representing the main entities:
Widget (unique widgets)
WidgetID | WidgetTitle
1 | Widget 1
2 | Widget 2
....
Size (unique sizes)
SizeID | SizeTitle
7 | X-Small
8 | Small
9 | Medium
10 | Large
11 | X-Large
Then create a junction table, to store the relationships between those two entities, ie Which widgets are available in which sizes
WidgetSize (available sizes for each widget)
WidgetID | SizeID
1 | 7 <== Widget 1 + "X-Small"
1 | 8 <== Widget 1 + "Small"
2 | 7 <== Widget 2 + "X-Small"
2 | 9 ....
2 | 10
2 | 11
....
With that structure, you can easily return all widgets having any (or all) of a list of sizes. Not tested, but something similar to the sql below should work.
Find widgets available in any of the sizes: <cfset listOfSizeIds = "7,9,11">
SELECT w.WidgetID, w.WidgetTitle
FROM Widget w
WHERE EXISTS
( SELECT 1
FROM WidgetSize ws
WHERE ws.WidgetID = w.WidgetID
AND ws.SizeID IN (
<cfqueryparam value="#listOfSizeIds#"
cfsqltype="cf_sql_integer" list="true" >
)
)
Find widgets available in all three sizes: <cfset listOfSizeIds = "7,9,11">
SELECT w.WidgetID, w.WidgetTitle, COUNT(*) AS MatchCount
FROM Widget w INNER JOIN WidgetSize ws ON ws.WidgetID = w.WidgetID
WHERE ws.SizeID IN (
<cfqueryparam value="#listOfSizeIds#"
cfsqltype="cf_sql_integer" list="true" >
)
GROUP BY w.WidgetID, w.WidgetTitle
HAVING COUNT(*) = 3
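Since the queries above are marked as untested, here is a small runnable sketch of the same "any" and "all" lookups against the sample widget data, using Python's built-in sqlite3 instead of ColdFusion (an assumption made only so it can be executed). The function names are hypothetical, and the "all" query counts distinct sizes, a slight variation that also tolerates duplicate ids in the input list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Widget (WidgetID INTEGER PRIMARY KEY, WidgetTitle TEXT NOT NULL);
CREATE TABLE Size (SizeID INTEGER PRIMARY KEY, SizeTitle TEXT NOT NULL);
CREATE TABLE WidgetSize (
    WidgetID INTEGER NOT NULL REFERENCES Widget (WidgetID),
    SizeID INTEGER NOT NULL REFERENCES Size (SizeID),
    PRIMARY KEY (WidgetID, SizeID)
);
INSERT INTO Widget VALUES (1, 'Widget 1'), (2, 'Widget 2');
INSERT INTO Size VALUES (7, 'X-Small'), (8, 'Small'), (9, 'Medium'),
                        (10, 'Large'), (11, 'X-Large');
INSERT INTO WidgetSize VALUES (1, 7), (1, 8), (2, 7), (2, 9), (2, 10), (2, 11);
""")


def widgets_with_any(conn, size_ids):
    """Widgets available in at least one of the given sizes."""
    marks = ", ".join("?" for _ in size_ids)
    cur = conn.execute(
        "SELECT w.WidgetID FROM Widget w WHERE EXISTS ("
        "  SELECT 1 FROM WidgetSize ws"
        f"  WHERE ws.WidgetID = w.WidgetID AND ws.SizeID IN ({marks})"
        ") ORDER BY w.WidgetID",
        list(size_ids),
    )
    return [row[0] for row in cur]


def widgets_with_all(conn, size_ids):
    """Widgets available in every one of the given sizes."""
    wanted = sorted(set(size_ids))
    marks = ", ".join("?" for _ in wanted)
    cur = conn.execute(
        "SELECT w.WidgetID FROM Widget w "
        "JOIN WidgetSize ws ON ws.WidgetID = w.WidgetID "
        f"WHERE ws.SizeID IN ({marks}) "
        "GROUP BY w.WidgetID "
        "HAVING COUNT(DISTINCT ws.SizeID) = ? "
        "ORDER BY w.WidgetID",
        wanted + [len(wanted)],
    )
    return [row[0] for row in cur]


any_match = widgets_with_any(conn, [7, 9, 11])
all_match = widgets_with_all(conn, [7, 9, 11])
print(any_match, all_match)
```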

Storing Hierarchical Data (MySQL) for Referral Marketing

I need to have a 5 levels hierarchy for the users registered to a website. Every user is invited by another, and I need to know all descendants for a user. And also ancestors for a user.
I have two solutions in mind.
Keeping a table with relationships this way. A closure table:
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
Having this table for relationships. Keeping in a table 5 levels ancestors. A "ancestors" table:
user_id  ancestor_level1_id  ancestor_level2_id  ancestor_level3_id  ancestor_level4_id  ancestor_level5_id
10       9                   7                   4                   3                   2
9        7                   4                   3                   2                   1
Are these good ideas?
I know about "the adjacency list model" and "the modified preorder tree traversal algorithm", but are these good solutions for a "referral" system?
The queries that I need to perform on this tree are:
frequently adding new users
when a user buys something, their referrers get a percentage commission
every user should be able to find out how many people they've referred (and how many people were referred by people who they referred....) at each level
Closure Table
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
To add user 10, referred by user 3. (I don't think you need to lock the table between these two insertions):
insert into ancestor_table
select ancestor_id, 10, distance+1
from ancestor_table
where descendant_id=3;
insert into ancestor_table values (10,10,0);
To find all users referred by user 3.
select descendant_id from ancestor_table where ancestor_id=3;
To count those users by depth:
select distance, count(*) from ancestor_table where ancestor_id=3 group by distance;
To find the ancestors of user 10.
select ancestor_id, distance from ancestor_table where descendant_id=10;
The drawback to this method is the amount of storage space this table will take.
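Here is a minimal runnable sketch of the closure-table statements above, using Python's built-in sqlite3 as a stand-in for MySQL (an assumption). The add_user, ancestors_of and referral_counts helpers and the extra users 11 and 12 are hypothetical; both inserts run inside one transaction, which is one way to sidestep the locking question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ancestor_table (
    ancestor_id INTEGER NOT NULL,
    descendant_id INTEGER NOT NULL,
    distance INTEGER NOT NULL,
    PRIMARY KEY (ancestor_id, descendant_id)
)
""")


def add_user(conn, user_id, referrer_id=None):
    """Insert the self row plus one row per ancestor of the referrer."""
    with conn:  # both inserts commit together or not at all
        if referrer_id is not None:
            conn.execute(
                "INSERT INTO ancestor_table "
                "SELECT ancestor_id, ?, distance + 1 "
                "FROM ancestor_table WHERE descendant_id = ?",
                (user_id, referrer_id),
            )
        conn.execute(
            "INSERT INTO ancestor_table VALUES (?, ?, 0)", (user_id, user_id)
        )


def ancestors_of(conn, user_id):
    """(ancestor_id, distance) pairs, nearest first, excluding the user itself."""
    cur = conn.execute(
        "SELECT ancestor_id, distance FROM ancestor_table "
        "WHERE descendant_id = ? AND distance > 0 ORDER BY distance",
        (user_id,),
    )
    return cur.fetchall()


def referral_counts(conn, user_id):
    """Number of referred users per level below `user_id`."""
    cur = conn.execute(
        "SELECT distance, COUNT(*) FROM ancestor_table "
        "WHERE ancestor_id = ? AND distance > 0 GROUP BY distance",
        (user_id,),
    )
    return dict(cur.fetchall())


add_user(conn, 2)        # a root user
add_user(conn, 3, 2)     # 2 referred 3, as in the sample table
add_user(conn, 10, 3)    # 3 referred 10
add_user(conn, 11, 10)   # hypothetical further referrals
add_user(conn, 12, 3)

chain_of_10 = ancestors_of(conn, 10)
counts_for_2 = referral_counts(conn, 2)
print(chain_of_10, counts_for_2)
```

For the commission case, ancestors_of gives exactly the referrers to pay, and adding a distance <= 5 condition would cap it at five levels.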
Use the OQGRAPH storage engine.
You probably want to keep track of an arbitrary number of levels, rather than just 5 levels. Get one of the MySQL forks that supports the OQGRAPH engine (such as MariaDB or OurDelta), and use that to store your tree. It implements the adjacency list model, but a special column called latch sends a command to the storage engine telling it what kind of query to perform, so you get all of the advantages of a closure table without needing to do the bookkeeping work each time someone registers for your site.
Here are the queries you'd use in OQGRAPH. See the documentation at
http://openquery.com/graph-computation-engine-documentation
We're going to use origid as the referrer, and destid as the referred user.
To add user 11, referred by user 10
insert into ancestors_table (origid,destid) values (10,11)
To find all users referred by user 3.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND origid = 3;
To find the ancestors of user 10.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND destid = 10;
To find the number of users at each level, referred by user 3:
SELECT count(linkid), weight
FROM ancestors_table
WHERE latch = 2 AND origid = 3
GROUP BY weight;
Managing Hierarchical Data in MySQL
In general, I like the "nested set", esp. in MySQL which doesn't really have language support for hierarchical data.
It's fast, but you'll need to make sure your developers read that article if ease of maintenance is a big deal. It's very flexible - which doesn't seem to matter much in your case.
It seems a good fit for your problem - in the referral model, you need to find the tree of referrers, which is fast in the nested set model; you also need to know who are the children of a given user, and the depth of their relationship; this is also fast.
Delimited String of Ancestors
If you're strongly considering the 5-level relationship table, it may simplify things to use a delimited string of ancestors instead of 5 separate columns.
user_id  depth  ancestors
10       7      9,7,4,3,2,1
9        6      7,4,3,2,1
...
2        2      1
1        1      (empty string)
Here are some SQL commands you'd use with this model:
To add user 11, referred by user 10
insert into ancestors_table (user_id, depth, ancestors)
select 11, depth+1, concat(10,',',ancestors)
from ancestors_table
where user_id=10;
To find all users referred by user 3. (Note that this query can't use an index.)
select user_id
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3';
To find the ancestors of user 10. You need to break up the string in your client program. In Ruby, the code would be ancestorscolumn.split(",").map{|x| x.to_i}. There's no good way to break up the string in SQL.
select ancestors from ancestors_table where user_id=10;
To find the number of users at each level, referred by user 3:
select
depth-(select depth from ancestors_table where user_id=3),
count(*)
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3'
group by depth;
You can avoid SQL injection attacks in the like '%,3,%' parts of these queries by using like concat('%,', ?, ',%') instead and binding an integer for the user number to the placeholder.
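A runnable sketch of the delimited-ancestors model, with Python's built-in sqlite3 standing in for MySQL (an assumption; SQLite spells concat as ||). Note that it swaps in a slightly different match: it wraps the stored string in commas so that a single bound LIKE pattern covers the first, middle, last and only-ancestor cases. The helper names and the users 13 and 14 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ancestors_table (
    user_id INTEGER PRIMARY KEY,
    depth INTEGER NOT NULL,
    ancestors TEXT NOT NULL  -- nearest ancestor first, '' for a root user
)
""")


def add_user(conn, user_id, referrer_id=None):
    """Store the new user's ancestor string, derived from the referrer's."""
    if referrer_id is None:
        conn.execute("INSERT INTO ancestors_table VALUES (?, 1, '')", (user_id,))
        return
    depth, ancestors = conn.execute(
        "SELECT depth, ancestors FROM ancestors_table WHERE user_id = ?",
        (referrer_id,),
    ).fetchone()
    # Avoid a trailing comma when the referrer is itself a root user.
    new_ancestors = f"{referrer_id},{ancestors}" if ancestors else str(referrer_id)
    conn.execute(
        "INSERT INTO ancestors_table VALUES (?, ?, ?)",
        (user_id, depth + 1, new_ancestors),
    )


def descendants(conn, user_id):
    """Everyone who has `user_id` somewhere in their ancestor string."""
    cur = conn.execute(
        "SELECT user_id FROM ancestors_table "
        "WHERE ',' || ancestors || ',' LIKE '%,' || ? || ',%' "
        "ORDER BY user_id",
        (user_id,),
    )
    return [row[0] for row in cur]


def ancestors_of(conn, user_id):
    """Split the stored string client-side, nearest ancestor first."""
    (ancestors,) = conn.execute(
        "SELECT ancestors FROM ancestors_table WHERE user_id = ?", (user_id,)
    ).fetchone()
    return [int(piece) for piece in ancestors.split(",") if piece]


add_user(conn, 1)
add_user(conn, 2, 1)
add_user(conn, 3, 2)
add_user(conn, 4, 3)
add_user(conn, 13, 1)   # hypothetical ids chosen to show 13 is not mistaken for 3
add_user(conn, 14, 13)

under_3 = descendants(conn, 3)
under_1 = descendants(conn, 1)
chain_of_4 = ancestors_of(conn, 4)
print(under_3, under_1, chain_of_4)
```

As with the original LIKE queries, this search cannot use an index, so it only suits small tables.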