saving tree data in database (family tree) - mysql

I am trying to store a family tree.
The platform I am using is Zend Framework, MySQL and Ajax.
I searched Stack Overflow and came across this post, which is very helpful for handling the data in terms of objects:
"Family Tree" Data Structure
I'll illustrate my use case in brief.
A user can create family members or friends based on a few relations defined in the database. I have a Model for relations too. The user can create family members such as a divorced spouse, or friends. We assume the tree is at most as deep as the children of the grandchildren, but it can also expand in width: brothers, sisters and their families.
I am looking for an efficient database design that keeps query time low. If I use the data structures described in the post above, where should I keep them? Do they necessarily have to be a Model?
For representation I am planning to use Visualization: Organizational Chart from
http://code.google.com/apis/chart/interactive/docs/gallery/orgchart.html#Example
To summarize, I need:
Database design
Placement of controllers (Ajax) and models
The people that the user creates will not be other users, just additional data.
That's it! I'll post a complete solution in this thread when I complete the project, with the help of your expertise, of course.
Thanks in advance.
EDIT I: I'll elaborate on my situation.
I have a user table, a relation table and, lastly, a family/family tree table.
The family table must have a structure similar to the following:
ID | userid | relation id | Name
1 | 34 | 3 | ABC // for son
2 | 34 | 4 | XYZ // for wife
3 | 34 | 3 | PQR // for mom
4 | 34 | 3 | THE // for dad
5 | 34 | 3 | GHI // for daughter
6 | 34 | 3 | KLM // for brother
The drawback of this approach is generating relations to other nodes, such as a daughter-in-law, or the wife's brother and his family.
The ideal way is to let a user add parents, siblings and children; any extra relations must be derived from those family members' relations, i.e. a brother-in-law must be derived as the sister's husband or the wife's brother.
This is what I can think of for now. I just need implementation guidelines.
I hope this helps you provide a better solution.

I guess that, from the database point of view, it would be best to implement it like this:
id | name | parent_male | parent_female
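As a rough sketch of how the two-parent-column layout could be queried, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name `person`, the sample rows and the helper names `grandparents` and `siblings` are made up for illustration; only the column layout comes from the line above.

```python
import sqlite3

# SQLite stands in for MySQL here; the SQL is plain enough to port.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    parent_male   INTEGER REFERENCES person(id),
    parent_female INTEGER REFERENCES person(id)
);
-- Invented sample family, not data from the question.
INSERT INTO person VALUES
    (1, 'Grandpa', NULL, NULL),
    (2, 'Grandma', NULL, NULL),
    (3, 'Dad',     1,    2),
    (4, 'Aunt',    1,    2),
    (5, 'Mom',     NULL, NULL),
    (6, 'Me',      3,    5);
""")


def grandparents(person_id):
    """Names of all known grandparents, found by following parent links twice."""
    rows = conn.execute("""
        SELECT DISTINCT gp.name
        FROM person me
        JOIN person p  ON p.id  IN (me.parent_male, me.parent_female)
        JOIN person gp ON gp.id IN (p.parent_male,  p.parent_female)
        WHERE me.id = ?
        ORDER BY gp.name
    """, (person_id,)).fetchall()
    return [r[0] for r in rows]


def siblings(person_id):
    """People who share at least one known parent with the given person."""
    rows = conn.execute("""
        SELECT s.name
        FROM person me
        JOIN person s
          ON s.id <> me.id
         AND (s.parent_male = me.parent_male
              OR s.parent_female = me.parent_female)
        WHERE me.id = ?
        ORDER BY s.name
    """, (person_id,)).fetchall()
    return [r[0] for r in rows]
```

Derived relations such as brother-in-law would then be further joins on top of these (for example, the spouse of a sibling), which needs a separate spouse link that this sketch does not model.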
Another option would be string prefixing:
id | name | prefix
1 | Joe | 0001
2 | Jack | 00010001 // i.e. Joe's son
3 | Marry| 0001 // i.e. Jack's mother
4 | Eve | 0002 // new family tree
5 | Adam | 00020001 // i.e. Eve's son
6 | Mark | 000200010001 // i.e. Adam's son
Other (more effective) algorithms like MPTT assume that the data is a tree, which it is not in this case (the graph contains cycles).
To show that it works, here is how to select Mark's grandparents:
-- Mark's own prefix
SELECT prefix FROM family_tree WHERE id = 6;
-- Build the ancestor prefix: trim N 4-character groups from the end, where N is the generation (2 for grandparents), which gives '0002'
-- grandparents
SELECT * FROM family_tree WHERE prefix = '0002';
-- same for the other side of the family
-- cousins from one side of the family
SELECT * FROM family_tree WHERE prefix LIKE '0002%' AND LENGTH(prefix) = 12;
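The prefix lookups above could be wrapped roughly as follows. This is only a sketch using Python's sqlite3 module in place of MySQL. The helper names `ancestors` and `descendants_at_depth` are made up, and Jack's prefix is entered as 00010001 on the assumption that every generation uses exactly four characters.

```python
import sqlite3

GROUP = 4  # characters per generation in the prefix, as in the sample data

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family_tree (id INTEGER PRIMARY KEY, name TEXT, prefix TEXT);
INSERT INTO family_tree VALUES
    (1, 'Joe',   '0001'),
    (2, 'Jack',  '00010001'),      -- assumed four characters per generation
    (3, 'Marry', '0001'),
    (4, 'Eve',   '0002'),
    (5, 'Adam',  '00020001'),
    (6, 'Mark',  '000200010001');
""")


def ancestors(person_id, generations):
    """Rows whose prefix is the person's prefix minus N trailing groups."""
    (prefix,) = conn.execute(
        "SELECT prefix FROM family_tree WHERE id = ?", (person_id,)
    ).fetchone()
    cut = len(prefix) - GROUP * generations
    if cut <= 0:
        return []  # the stored tree does not go back that far
    rows = conn.execute(
        "SELECT name FROM family_tree WHERE prefix = ? ORDER BY id",
        (prefix[:cut],),
    ).fetchall()
    return [r[0] for r in rows]


def descendants_at_depth(root_prefix, depth):
    """Everyone exactly `depth` generations below the given prefix."""
    rows = conn.execute(
        "SELECT name FROM family_tree "
        "WHERE prefix LIKE ? AND LENGTH(prefix) = ? ORDER BY id",
        (root_prefix + "%", len(root_prefix) + GROUP * depth),
    ).fetchall()
    return [r[0] for r in rows]
```

With the sample rows, `ancestors(6, 2)` finds Eve and `ancestors(2, 1)` finds Joe and Marry. Note that a prefix only follows one line of descent, so the other side of the family needs its own lookup.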

Related

How to separate one column's data into multiple columns?

Here's my situation: I have a table with a large number of records, and I need to pull out a number of these records for each name in the database. Note that TOP will not work for my use case. My end user wants the report formatted so that each user shows up only once, with up to 3 different dates shown for that user.
Table format
AutoID | Enum | TNum | Date | Comments
1 | 25 | 18 | 2/2/22 |
2 | 25 | 18 | 1/2/21 | Blah
3 | 18 | 18 | 1/2/21 |
4 | 18 | 18 | 1/2/20 |
5 | 25 | 17 | 1/2/22 |
6 | 25 | 17 | 1/2/20 |
Now the Enum and TNum fields are foreign keys to other tables, and I have created a join that pulls the correct information from those tables. In the end my query provides this output:
RecordID
Training
CompletedDate
FirstName
LastName
Location
2821
MaP
1/1/21
David
Simpson
123 Sesame St.
2822
1/2/22
Fuller
MaP
Dough
GHI
David
123 Sesame St.
2825
1/1/20
Simpson
The two "Blank fields" represent information that is pulled and may or may not be needed in some future report.
So, to my question: how do I get a report from this query's output that looks like this?
Place | LastName | FirstName | Training | FirstCutoff | SecondCutoff | ThirdCutoff | Comments
123 Sesame St. | David | Simpson | MaP | 1/1/20 | 1/1/21 | |
123 Sesame St. | John | Dough | MaP | 1/1/22 | | |
I was originally planning on joining my query to itself using WHERE clauses, but when I tried that it just added two extra columns with the same date. In addition, the records for one person are not necessarily identical: locations may differ, and the report needs the most recent location along with the name of the trainee. To add more complexity, a number of people in the company have effectively the same name as far as the database is concerned, so rejoining on the name is out. I did pull the Enum in my query, so I can join on that if needed.
Is there an easier way to do this, or do I need to sort out a multiple self-joining query?
I have a project I am working on where I am going to have to do this. Some of the suggestions I received were to use a pivot query. It wouldn't work in my case, but it might in yours. Here is a good example:
Pivot Columns
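One way a pivot of this kind might look is to rank each person's dates and then use conditional aggregation. This is a sketch, not the linked example. It runs on Python's sqlite3 module rather than on the asker's database, the table name `training_log` is made up, the dates are rewritten in ISO form on the assumption that the originals are month/day/year, and it assumes no two rows for the same Enum and TNum share a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE training_log (
    AutoID   INTEGER PRIMARY KEY,
    Enum     INTEGER,   -- employee key
    TNum     INTEGER,   -- training key
    Date     TEXT,      -- ISO yyyy-mm-dd so that text order equals date order
    Comments TEXT
);
INSERT INTO training_log VALUES
    (1, 25, 18, '2022-02-02', NULL),
    (2, 25, 18, '2021-01-02', 'Blah'),
    (3, 18, 18, '2021-01-02', NULL),
    (4, 18, 18, '2020-01-02', NULL),
    (5, 25, 17, '2022-01-02', NULL),
    (6, 25, 17, '2020-01-02', NULL);
""")

PIVOT_SQL = """
SELECT Enum, TNum,
       MAX(CASE WHEN rnk = 1 THEN Date END) AS FirstCutoff,
       MAX(CASE WHEN rnk = 2 THEN Date END) AS SecondCutoff,
       MAX(CASE WHEN rnk = 3 THEN Date END) AS ThirdCutoff
FROM (
    -- rnk is the position of a row among that employee's dates for that
    -- training, oldest first (assumes dates are distinct within a group)
    SELECT t.Enum, t.TNum, t.Date,
           (SELECT COUNT(*) FROM training_log o
             WHERE o.Enum = t.Enum AND o.TNum = t.TNum
               AND o.Date <= t.Date) AS rnk
    FROM training_log t
) AS ranked
GROUP BY Enum, TNum
ORDER BY Enum, TNum
"""

# One row per (Enum, TNum) with up to three date columns filled in.
report = conn.execute(PIVOT_SQL).fetchall()
```

Because the grouping is on Enum rather than on the name, people with identical names stay separate. On a database with window functions, the correlated subquery could be replaced by ROW_NUMBER() OVER (PARTITION BY Enum, TNum ORDER BY Date).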

SQL Architecture design for handling employee competencies

Ninjas, I understand that this is probably a "way too broad" or "wrong portal" type of question, but SO feels like home, so I will give it a try anyway.
I have a table with employees
Table: employee
id, name
1 - John
2 - Jane
3 - Obama
4 - Donald
...nothing fancy. And then there is a competencies table (a classifier of special tasks/responsibilities)
competencies table:
id, name
1 - Janitor
2 - Sysadmin
3 - Programmer
4 - Pilot
...
Each employee can have multiple competencies (relations table)
table: employee_competency
id, employee_id, competency_id
1 - 1 - 1 - John is a Janitor
2 - 1 - 2 - John is also a Sysadmin (imagine that)
3 - 2 - 3 - Jane is a Programmer
4 - 3 - 3 - Obama is a Programmer
5 - 3 - 4 - ...and a Pilot
6 - 4 - 1 - Donald is a Janitor
The existential problem of database architecture, or how to handle such cases:
I want to be able to define an unlimited number of competencies, and these competencies can vary from one customer to another (each installation of the project I am programming can have a different set of competencies).
In the code, I want to be able to select employees with a specific competency (for example, list all employees who are Pilots).
By hard-coding the competency ID when listing employees, I lose the ability to define competencies freely. I could define custom fields in the employee table like is_janitor, is_sysadmin, is_programmer, is_pilot, etc., but then I lose the ability to define an unlimited number of competencies.
Is there a way to solve this rather XY problem with a different DB architecture approach?
The key idea here is that the list from which you pick a competency has to be data-driven as well. So, on the screen/form/page where you select the competency to list, you populate that selection from the table of competencies in the database and pass the ID of the chosen competency back as the value of the selection, so that you can query the list of employees by competency.
You should never hard-code individual IDs into the system. Now, this gets complicated when you have behavior you want to drive based on the competency. This requires thinking at a higher level of abstraction. For example, let's say you have a form where you want to show another tab that allows the customer to select which planes a pilot is certified on. To drive this, I usually create flags that define the driving behaviors (like CAN_SELECT_PLANES) and add them to a related table. This table defines the capabilities of the system, not the capabilities of the competency. It is important to maintain that abstraction because customers will want to vary their competency names, and you will find new uses for that feature later on.
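A minimal sketch of that idea, using Python's sqlite3 module in place of the real database, could look like this. The `competency_capability` table and the three helper functions are hypothetical names chosen for illustration; the other tables and rows follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE competencies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee_competency (
    id INTEGER PRIMARY KEY, employee_id INTEGER, competency_id INTEGER);
-- Hypothetical table: which system behaviours a competency switches on.
CREATE TABLE competency_capability (competency_id INTEGER, capability TEXT);

INSERT INTO employee VALUES (1,'John'),(2,'Jane'),(3,'Obama'),(4,'Donald');
INSERT INTO competencies VALUES
    (1,'Janitor'),(2,'Sysadmin'),(3,'Programmer'),(4,'Pilot');
INSERT INTO employee_competency VALUES
    (1,1,1),(2,1,2),(3,2,3),(4,3,3),(5,3,4),(6,4,1);
INSERT INTO competency_capability VALUES (4, 'CAN_SELECT_PLANES');
""")


def competency_choices():
    """(id, name) pairs used to fill the selection list on the form."""
    return conn.execute(
        "SELECT id, name FROM competencies ORDER BY name"
    ).fetchall()


def employees_with(competency_id):
    """Employees holding the competency the user picked on the form."""
    rows = conn.execute("""
        SELECT e.name
        FROM employee e
        JOIN employee_competency ec ON ec.employee_id = e.id
        WHERE ec.competency_id = ?
        ORDER BY e.name
    """, (competency_id,)).fetchall()
    return [r[0] for r in rows]


def employee_can(employee_id, capability):
    """True if any of the employee's competencies carries the flag."""
    row = conn.execute("""
        SELECT 1
        FROM employee_competency ec
        JOIN competency_capability cc
          ON cc.competency_id = ec.competency_id
        WHERE ec.employee_id = ? AND cc.capability = ?
        LIMIT 1
    """, (employee_id, capability)).fetchone()
    return row is not None
```

The code only ever refers to the capability name, so a customer can rename "Pilot" or attach CAN_SELECT_PLANES to a different competency without any code change.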
To select all programmers from the database, use for example:
SELECT
    e.name AS empl_name,
    c.name AS comp_name
FROM employee_competency ec
JOIN competencies c ON c.id = ec.competency_id
JOIN employee e ON e.id = ec.employee_id
WHERE c.id = 3;

Match between two tables where field is substring of the other

Hi (sorry for the poor title),
I have two tables in a MySQL DB; let's call them CarMake and CarModel. Both tables have two fields, ID:int(11) and Description:varchar(100). For example:
CarMake CarModel
ID Description | ID Description
-----------------------------------------------------
123456 Honda | 12345678 Accord
234567 Toyota | 12345665 Civic
369258 Lexus | 23456789 Prius
Where each car model shares the same first 6 digits of the ID of its Make. In this example, both Accord and Civic share the first 6 digits of the ID with Honda, therefore they are Honda models.
Now, what I want to do is select all rows from CarMake that do not have a record in CarModel where the first 6 digits of the ID match. In this example, my query should return the Lexus row from CarMake, as it does not have a matching row in CarModel.
Nothing I have tried so far has really come close to achieving what I want, so I am posting it here.
Any help would be greatly appreciated!
EDIT: Solved with help from zerkms
SELECT * FROM CarMake
LEFT JOIN CarModel
ON CarModel.ID LIKE CONCAT(CarMake.ID, '%')
WHERE CarModel.ID IS NULL;
Follow up questions:
This solution takes a very long time to run. Is there any way to improve its efficiency?
What would be the best way to delete the records returned by that query? Is there some way I can combine that into the query itself?
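One possible direction for both follow-ups, sketched with Python's sqlite3 module rather than MySQL: compare the IDs as numbers instead of strings, and reuse the same condition in a DELETE. This assumes model IDs always have exactly two more digits than the make ID, which holds for the sample rows but is not stated as a rule in the question. Whether it is actually faster on the real data has not been measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarMake  (ID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE CarModel (ID INTEGER PRIMARY KEY, Description TEXT);
INSERT INTO CarMake  VALUES (123456,'Honda'),(234567,'Toyota'),(369258,'Lexus');
INSERT INTO CarModel VALUES
    (12345678,'Accord'),(12345665,'Civic'),(23456789,'Prius');
""")

# Assumption: a model ID is the make ID followed by exactly two digits, so
# all models of make M lie in the numeric range M*100 .. M*100+99. A range
# test on CarModel.ID can use that column's index, unlike LIKE CONCAT(...).
ORPHAN_MAKES = """
SELECT mk.ID, mk.Description
FROM CarMake mk
WHERE NOT EXISTS (
    SELECT 1 FROM CarModel md
    WHERE md.ID BETWEEN mk.ID * 100 AND mk.ID * 100 + 99
)
"""


def makes_without_models():
    """Makes that have no model in their ID range."""
    return conn.execute(ORPHAN_MAKES).fetchall()


def delete_makes_without_models():
    """Delete the same rows in one statement; returns how many were removed."""
    cur = conn.execute("""
        DELETE FROM CarMake
        WHERE NOT EXISTS (
            SELECT 1 FROM CarModel md
            WHERE md.ID BETWEEN CarMake.ID * 100 AND CarMake.ID * 100 + 99
        )
    """)
    conn.commit()
    return cur.rowcount
```

If the two-digit assumption does not hold, a dedicated make-ID column on CarModel with its own index would be the more robust fix.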

When is it better to flatten out data using comma separated values to improve search query performance?

My question is about search query performance.
I've flattened out data into a read-only Person table (MySQL) that exists purely for search. The table has about 20 columns of data (mostly limited text values, dates and booleans and a few columns containing unlimited text).
Person
=============================================================
id First Last DOB etc (20+ columns)...
1 John Doe 05/02/1969
2 Sara Jones 04/02/1982
3 Dave Moore 10/11/1984
Another two tables support the relationship between Person and Activity.
Activity
===================================
id activity
1 hiking
2 skiing
3 snowboarding
4 bird watching
5 etc...
PersonActivity
===================================
id PersonId ActivityId
1 2 1
2 2 3
3 2 10
4 2 16
5 2 34
6 2 37
7 2 38
8 etc…
Search considerations:
Person table has potentially 200-300k+ rows
Each person potentially has 50+ activities
Search may include Activity filter (e.g., select persons with one and/or more activities)
Returned results are displayed with person details and activities as bulleted list
If the Person table is used only for search, I'm wondering if I should add the activities as comma separated values to the Person table instead of joining to the Activity and PersonActivity tables:
Person
===========================================================================
id First Last DOB Activity
2 Sara Jones 04/02/1982 hiking, snowboarding, golf, etc.
Given the search considerations above, would this help or hurt search performance?
Thanks for the input.
Horrible idea. You will lose the ability to use indexes in querying. Do not under any circumstances store data in a comma-delimited list if you ever want to search on that column. Relational databases are designed to perform well with tables joined together. Your database is relatively small and should have no performance issues at all if you index properly.
You may still want to display the results in a comma-delimited fashion. MySQL has a function called GROUP_CONCAT for that.
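To make that concrete, here is a small sketch that keeps the three tables normalized, filters on an indexed join table and only builds the comma-separated list for display. It uses Python's sqlite3 module as a stand-in; SQLite writes the separator as a second argument, while MySQL writes GROUP_CONCAT(a.activity SEPARATOR ', '). The index name, the sample rows and the function `search_by_activity` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, First TEXT, Last TEXT);
CREATE TABLE Activity (id INTEGER PRIMARY KEY, activity TEXT);
CREATE TABLE PersonActivity (
    id INTEGER PRIMARY KEY, PersonId INTEGER, ActivityId INTEGER);
-- Index so that the activity filter is an index lookup rather than a scan.
CREATE INDEX idx_pa_activity_person ON PersonActivity (ActivityId, PersonId);

INSERT INTO Person VALUES (1,'John','Doe'),(2,'Sara','Jones'),(3,'Dave','Moore');
INSERT INTO Activity VALUES (1,'hiking'),(2,'skiing'),(3,'snowboarding');
INSERT INTO PersonActivity (PersonId, ActivityId) VALUES (2,1),(2,3),(3,2);
""")


def search_by_activity(activity_name):
    """People who do `activity_name`, each with all their activities as one string."""
    return conn.execute("""
        SELECT p.First, p.Last, GROUP_CONCAT(a.activity, ', ') AS activities
        FROM Person p
        JOIN PersonActivity pa ON pa.PersonId = p.id
        JOIN Activity a        ON a.id = pa.ActivityId
        WHERE EXISTS (
            SELECT 1
            FROM PersonActivity f
            JOIN Activity fa ON fa.id = f.ActivityId
            WHERE f.PersonId = p.id AND fa.activity = ?
        )
        GROUP BY p.id, p.First, p.Last
        ORDER BY p.id
    """, (activity_name,)).fetchall()
```

Searching for 'hiking' returns Sara Jones with both of her activities in one string; the order of the names inside that string is not guaranteed unless the query asks for one.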

SQL "shortcut" identifiers or a long string of joins?

QUESTION: Is it okay to have "shortcut" identifiers in a table so that I don't have to do a long string of joins to get the information I need?
To understand what I'm talking about, I'm going to have to lay out an example here that looks pretty complicated, but I've simplified the problem quite a bit, and it should be easily understood (I hope).
The basic setup: A "company" can be an "affiliate", a "client" or both. Each "company" can have multiple "contacts", some of which can be "users" with log in privileges.
`Company` table
----------------------------------------------
ID Company_Name            Address
-- ----------------------- -----------------
1  Acme, Inc.              101 Sierra Vista
2  Spacely Space Sprockets East Mars Colony
3  Cogswell Cogs           West Mars Colony
4  Stark Industries        Los Angeles, CA
We have four companies in our database.
`Affiliates` table
---------------------
ID Company_ID Price Sales
-- ---------- ----- -----
1  1          50    456
2  4          50    222
3  1          75    14
Each company can have multiple affiliate IDs so that it can represent the products at different pricing levels to different markets.
Two of our companies are affiliates (Acme, Inc. and Stark Industries), and Acme has two affiliate IDs.
`Clients` table
--------------------------------------
ID Company_ID Referring_affiliate_id
-- ---------- ----------------------
1  2          1
2  3          1
3  4          3
Each company can only be a client once.
Three of our companies are clients (Spacely Space Sprockets, Cogswell Cogs, and Stark Industries, which is also an affiliate).
In all three cases, they were referred to us by Acme, Inc., using one of its two affiliate IDs.
`Contacts` table
-----------------------------------------
ID Name           Email
-- -------------- ---------------------
1  Wylie Coyote   wcoyote#acme.com
2  Cosmo Spacely  boss#spacely.com
3  H. G. Cogswell ceo#cogs.com
4  Tony Stark     tony#stark.com
5  Homer Simpson  simpson#burnscorp.com
Each company has at least one contact, but in this table, there is no indication of which company each contact works for, and there's also an extra contact (#5). We'll get to that in a moment.
Each of these contacts may or may not have a login account on the system.
`Contacts_type` table
--------------------------------------
contact_id company_id contact_type
---------- ---------- --------------
1          1          Administrative
2          2          Administrative
3          3          Administrative
4          4          Administrative
5          1          Technical
4          2          Technical
Associates a contact with one or more companies.
Each contact is associated with a company. In addition, contact 5 (Homer Simpson) is a technical contact for Acme, Inc., and contact 4 (Tony Stark) is both an administrative contact for company 4 (Stark Industries) and a technical contact for company 3 (Cogswell Cogs).
`Users` table
-------------------------------------------------------------------------------------
ID contact_id company_id client_id affiliate_id user_id  password access_level
-- ---------- ---------- --------- ------------ -------- -------- ------------
1  1          1          1         1            wylie    A03BA951 2
2  2          2          2         NULL         cosmo    BF16DA77 3
3  3          3          3         NULL         cogswell 39F56ACD 3
4  4          4          4         2            ironman  DFA9301A 2
The users table is essentially a list of contacts that are allowed to login to the system.
Zero or one user per contact; one contact per user.
Contact 1 (Wylie Coyote) works for company 1 (Acme) and is a customer (1) and also an affiliate (1)
Contact 2 (Cosmo Spacely) works for company 2 (Spacely Space Sprockets) and is a customer (2) but not an affiliate
etc...
NOW finally onto the problem, if there is one...
Do I have a circular reference via the client_id and affiliate_id columns in the Users table? Is this a bad thing? I'm having a hard time wrapping my head around this.
When someone logs in, it checks their credentials against the users table and uses users.contact_id, users.client_id, and users.affiliate_id to do a quick look up rather than having to join together a string of tables to find out the same information. But this causes duplication of data.
Without client_id in the users table, I would have to find the following information out like this:
affiliate_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `affiliates`.`company_id`
client_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `clients`.`company_id`
company_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `company`.`company_id`
user's name: join `users`.`contact_id` to `contacts_types`.`contact_id` to `contacts`.`contact_id` > `name`
In each case, I wouldn't necessarily know if the user even has an entry in the affiliate table or the clients table, because they likely have an entry in only one of those tables and not both.
Is it better to do these kinds of joins and thread through multiple tables to get the information I want, or is it better to have a "shortcut" field to get me the information I want?
I have a feeling that over all, this is overly complicated in some way, but I don't see how.
I'm using MySQL.
it's better to do the joins. you should only be denormalizing your data when you have timed evidence of a slow response.
having said that, there are various ways to reduce the amount of typing:
use "as" to give shorter names to your fields
create views. these are "virtual tables" that already have your standard joins built-in, so that you don't have to repeat that stuff every time.
use "with" in sql. this lets you define something like a view within a single query.
it's possible mysql doesn't support all the above - you'll need to check the docs [update: ok, mysql supports views (since 5.0), while "with" is only available from mysql 8.0 onward. so you can add views to do the work of affiliate_id, client_id etc and treat them just like tables in your queries, while keeping the underlying data nicely organised.]
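as a rough illustration of the view idea, here is a sketch on python's sqlite3 module (the CREATE VIEW statement itself should carry over to mysql largely unchanged). the view name `user_overview`, the `login` column and the rule that a user's home company is the one where they are the 'Administrative' contact are assumptions made for the example, not part of the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contacts_type (
    contact_id INTEGER, company_id INTEGER, contact_type TEXT);
CREATE TABLE clients (id INTEGER PRIMARY KEY, company_id INTEGER UNIQUE);
-- users keeps only contact_id, the shortcut columns are derived by the view
CREATE TABLE users (
    id INTEGER PRIMARY KEY, contact_id INTEGER UNIQUE, login TEXT);

INSERT INTO contacts VALUES (1,'Wylie Coyote'),(2,'Cosmo Spacely');
INSERT INTO contacts_type VALUES
    (1,1,'Administrative'),(2,2,'Administrative');
INSERT INTO clients VALUES (1,2),(2,3),(3,4);
INSERT INTO users VALUES (1,1,'wylie'),(2,2,'cosmo');

CREATE VIEW user_overview AS
SELECT u.id          AS user_id,
       u.login       AS login,
       c.name        AS contact_name,
       ct.company_id AS company_id,
       cl.id         AS client_id      -- NULL when the company is not a client
FROM users u
JOIN contacts c       ON c.id = u.contact_id
JOIN contacts_type ct ON ct.contact_id = u.contact_id
                     AND ct.contact_type = 'Administrative'  -- assumed rule
LEFT JOIN clients cl  ON cl.company_id = ct.company_id;
""")


def lookup(login):
    """Login-time lookup: one query on the view instead of repeating the joins."""
    return conn.execute(
        "SELECT contact_name, company_id, client_id "
        "FROM user_overview WHERE login = ?",
        (login,),
    ).fetchone()
```

the LEFT JOIN is what handles the "may or may not be a client" case: the row still comes back, with client_id empty. affiliates could be added the same way, though a company with several affiliate ids would then produce several rows per user.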