Anyone used SQl Server 2008 HierarchialID type to store genealogy data

Anyone used SQl Server 2008 HierarchialID type to store genealogy data - sql-server-2008

I have a genealogical database (about sheep actually), that is used by breeders to research genetic information. In each record I store fatherid and motherid. In a seperate table I store complete 'roll up' information so that I can quickly tell the complete family tree of any animal without recursing thru the entire database...
Recently discovered the hierarchicalID type built into SQL server 2008, on the surface it sounds promising, but I and am wondering if anyone has used it enough to know whether or not it would be appropriate in my type of app(i.e. two parents, multiple kids)? All the samples I have found/read so far deal with manager/employee type relationships where a given boss can have multiple employees, and each employee can have a single boss.
The needs of my app are similar, but not quite the same.
I am sure I will dig into this new technology anyway, but it would be nice to shortcut my research if someone already knew that it was not designed in such a fashion that it would allow me to make use of it.
I am also curious what kind of performance people are seeing using this new data type versus other methods that do the same thing.

Assuming each sheep has one male parent and one female parent, and that no sheep can be its own parent (leading to an Ovine Temporal Paradox), then what about using two HierarchyIDs?
CREATE TABLE dbo.Sheep(
MotherHID hierarchyid NOT NULL,
FatherHID hierarchyid NOT NULL,
Name int NOT NULL
)
GO
ALTER TABLE dbo.Sheep
ADD CONSTRAINT PK_Sheep PRIMARY KEY CLUSTERED (
MotherHID,
FatherHID
)
GO
By making them a joint PK, you'd be uniquely identifying each sheep as the product of its maternal hierarchy and it's paternal hierarchy.
There may be some inherent problem lurking here, so proceed cautiously with a couple simple prototypes - but initially it seems like it would work for you.

I can't see how it would work; in a regular hierarchy, there is a single chain to the root, so it can store the path (which is what the binary is) to each node. However, with multiple parents, this isn't possible: even if you split matriarchy and partiarchy, you still have 1 mother, 2 grandmothers, 4 great-grand-mothers, etc (not even getting into some of the more "interesting" scanerios possible, especially with livestock). There is no single logical path to encode, so no: I don't think that this can work in your case.
I'm happy to be corrected, though.

Using two separate HierarchyID to indicate father and mother would work well.
However, you definitely would NOT want to use those as a unique indicator of the row, since it's a 2-to-many situation. (Two sheep can have multiple children.)
I don't see anything inherently wrong with using HierarchyId for ancestry--for Sheep at least. For people, the relationships are much more complicated than "this person begat that person", so obviously the use would be limited to breeding.

SQL Server hierarchyID is not a robust solution for many genealogy analytic questions. It is based on ORDPATH and I've used it for awhile in genealogy; but there are too many scenarios in genealogy that cannot be readily addressed with ORDPATH methods for directed acyclic graphs. A graph database is much more robust and well suited for genealogy. I use Neo4j: http://stumpf.org/genealogy-blog/graph-databases-in-genealogy.

Related

Restructure Inventory Management Database (2 to 3 Tables; Development Stage)

I’m developing a database. I’d appreciate some help restructuring 2 to 3 tables so the database is both compliant with the first 3 normal forms; and practical to use and to expand on / add to in the future. I want to invest time now to reduce effort / and confusion later.
PREAMBLE
Please be aware that I'm both a nube, and an amateur, though I have a certain amount of experience and skill and an abundance of enthusiasm!
BACKGROUND TO PROJECT
I am writing a small (though ambitious!) web application (using PHP and AJAX to a MySQL database). It is essentially an inventory management system, for recording and viewing the current location of each individual piece of equipment, and its maintenance history. If relevant, transactions will be very low (probably less than 100 a day, but with a possibility of simultaneous connections / operations). Row count will also be very low (maybe a few thousand).
It will deal with many completely different categories of equipment, eg bikes and lamps (to take random examples). Each unit of equipment will have its details or specifications recorded in the database. For a bike, an important specification might be frame colour, whereas a lamp it might require information regarding lampshade material.
Since the categories of equipment have so little in common, I think the most logical way to store the information is 1 table per category. That way, each category can have columns specific to that category.
I intend to store a list of categories in a separate table. Each category will have an id which is unique to that category. (Depending on the final design, this may function as a lookup table and / or as a table to run queries against.) There are likely to be very few categories (perhaps 10 to 20), unless the system is particulary succesful and it expands.
A list of bikes will be held in the bikes table.
Each bike will have an id which is unique to that bike (eg bike 0001).
But the same id will exist in the lamp table (ie lamp 0001).
With my application, I want the user to select (from a dropdown list) the category type (eg bike).
They will then enter the object's numeric id (eg 0001).
The combination of these two ids is sufficient information to uniquely identify an object.
Images:
Current Table Design
Proposed Additional Table
PROBLEM
My gut feeling is that there should be an “overarching table” that encompasses every single article of equipment no matter what category it comes from. This would be far simpler to query against than god knows how many mini tables. But when I try to construct it, it seems like it will break various normal forms. Eg introducing redundancy, possibility of inconsistency, referential integrity problems etc. It also begins to look like a domain table.
Perhaps the overarching table should be a query or view rather than an entity?
Could you please have a look at the screenshots and let me know your opinion. Thanks.
For various reasons, I’d prefer to use surrogate keys rather than natural keys if possible. Ideally, I’d prefer to have that surrogate key in a single column.
Currently, the bike (or lamp) table uses just the first column as its primary key. Should I expand this to a composite key including the Equipment_Category_ID column too? Then make the Equipment_Article table into a view joining on these two columns (iteratively for each equipment category). Optionally Bike_ID and Lamp_ID columns could be renamed to something generic like Equipment_Article_ID. This might make the query simpler, but is there a risk of losing specificity? It would / could still be qualified by the table name.
Speaking of redundancy, the Equipment_Category_ID in the current lamp or bike tables seems a bit redundant (if every item / row in that table has the same value in that column).
It all still sounds messy! But surely this must be very common problem for eg online electronics stores, rental shops, etc. Hopefully someone will say oh that old chestnut! Fingers crossed! Sorry for not being concise, but I couldn't work out what bits to leave out. Most of it seems relevant, if a bit chatty. Thanks in advance.
UPDATE 27/03/2014 (Reply to #ElliotSchmelliot)
Hi Elliot.
Thanks for you reply and for pointing me in the right direction. I studied OOP (in Java) but wasn't aware that something similar was possible in SQL. I read the link you sent with interest, and the rest of the site/book looks like a great resource.
Does MySQL InnoDB Support Specialization & Generalization?
Unfortunately, after 3 hours searching and reading, I still can't find the answer to this question. Keywords I'm searching with include: MySQL + (inheritance | EER | specialization | generalization | parent | child | class | subclass). The only positive result I found is here: http://en.wikipedia.org/wiki/Enhanced_entity%E2%80%93relationship_model. It mentions MySQL Workbench.
Possible Redundancy of Equipment_Category (Table 3)
Yes and No. Because this is a lookup table, it currently has a function. However because every item in the Lamp or the Bike table is of the same category, the column itself may be redundant; and if it is then the Equipment_Category table may be redundant... unless it is required elsewhere. I had intended to use it as the RowSource / OptionList for a webform dropdown. Would it not also be handy to have Equipment_Category as a column in the proposed Equipment parent table. Without it, how would one return a list of all Equipment_Names for the Lamp category (ignoring distinct for the moment).
Implementation
I have no way of knowing what new categories of equipment may need to be added in future, so I’ll have to limit attributes included in the superclass / parent to those I am 100% sure would be common to all (or allow nulls I suppose); sacrificing duplication in many child tables for increased flexibility and hopefully simpler maintenance in the long run. This is particulary important as we will not have professional IT support for this project.
Changes really do have to be automated. So I like the idea of the stored procedure. And the CreateBike example sounds familiar (in principle if not in syntax) to creating an instance of a class in Java.
Lots to think about and to teach myself! If you have any other comments, suggestions etc, they'd be most welcome. And, could you let me know what software you used to create your UML diagram. Its styling is much better than those that I've used.
Cheers!

You sound very interested in this project, which is always awesome to see!
I have a few suggestions for your database schema:
You have individual tables for each Equipment entity i.e. Bike or Lamp. Yet you also have an Equipment_Category table, purely for identifying a row in the Bike table as a Bike or a row in the Lamp table as a Lamp. This seems a bit redundant. I would assume that each row of data in the Bike table represents a Bike, so why even bother with the category table?
You mentioned that your "gut" feeling is telling you to go for an overarching table for all Equipment. Are you familiar with the practice of generalization and specialization in database design? What you are looking for here is specialization (also called "top-down".) I think it would be a great idea to have an overarching or "parent" table that represents Equipment. Then, each sub-entity such as Bike or Lamp would be a child table of Equipment. A parent table only has the fields that all child tables share.
With these suggestions in mind, here is how I might alter your schema:
In the above schema, everything starts as Equipment. However, each Equipment can be specialized into Lamp, Bike, etc. The Equipment entity has all of the common fields. Lamp and Bike each have fields specific to their own type. When creating an entity, you first create the Equipment, then you create the specialized entity. For example, say we are adding the "BMX 200 Ultra" bike. We first create a record in the Equipment table with the generic information (equipmentName, dateOfPurchase, etc.) Then we create the specialized record, in this case a Bike record with any additional bike-specific fields (wheelType, frameColor, etc.) When creating the specialized entities, we need to make sure to link them back to the parent. This is why both the Lamp and Bike entities have a foreign key for equipmentID.
An easy and effective way to add specialized entities is to create a stored procedure. For example, lets say we have a stored procedure called CreateBike that takes in parameters bikeName, dateOfPurchase, wheelType, and frameColor. The stored procedure knows we are creating a Bike, and therefore can easily create the Equipment record, insert the generic equipment data, create the bike record, insert the specialized bike data, and maintain the foreign key relationship.
Using specialization will make your transactional life very simple. For example, if you want all Equipment purchased before 1/1/14, no joins are needed. If you want all Bikes with a frameColor of blue, no joins are needed. If you want all Lamps made of felt, no joins are needed. The only time you will need to join a specialized table back to the Equipment table is if you want data both from the parent entity and the specialized entity. For example, show all Lamps that use 100 Watt bulbs and are named "Super Lamp."
Hope this helps and best of luck!
Edit
Specialization and Generalization, as mentioned in your provided source, is part of an Enhanced Entity Relationship (EER) which helps define a conceptual data model for your schema. As such, it does not need to be "supported" per say, it is more of a design technique. Therefore any database schema naturally supports specialization and generalization as long as the designer implements it.
As far as your Equipment_Category table goes, I see where you are coming from. It would indeed make it easy to have a dropdown of all categories. However, you could simply have a static table (only contains Strings that represent each category) to help with this population, and still keep your Equipment tables separate. You mentioned there will only be around 10-20 categories, so I see no reason to have a bridge between Equipment and Equipment_Category. The fewer joins the better. Another option would be to include an "equipmentCategory" field in the Equipment table instead of having a whole table for it. Then you could simply query for all unique equipmentCategory values.
I agree that you will want to keep your Equipment table to guaranteed common values between all children. Definitely. If things get too complicated and you need more defined entities, you could always break entities up again down the road. For example maybe half of your Bike entities are RoadBikes and the other half are MountainBikes. You could always continue the specialization break down to better get at those unique fields.
Stored Procedures are great for automating common queries. On top of that, parametrization provides an extra level of defense against security threats such as SQL injections.
I use SQL Server. The diagram I created is straight out of SQL Server Management Studio (SSMS). You can simply expand a database, right click on the Database Diagrams folder, and create a new diagram with your selected tables. SSMS does the rest for you. If you don't have access to SSMS I might suggest trying out Microsoft Visio or if you have access to it, Visual Paradigm.

CSVs in database columns - not a good idea? [duplicate]

This question already has answers here:
Is storing a delimited list in a database column really that bad?
(10 answers)
Closed 8 years ago.
A while ago, I came to the realization that a way I would like to hold the skills for a player in a game would be through CSV format. On the player's stats, I made a varchar of skills that would be stored as CSV. (1,6,9,10 etc.) I made a 'skills' table with affiliated stats for each skill (name, effect) and when it comes time to see what skills they have, all I have to do is query that single column and use PHP's str_getcsv() to see if a certain skill exists because it'll be in an array.
However, my coworker suggests that a superior system is to have each skill simply be an entry into a master "skills" table that each player will use, and each skill will have an ID foreign key to the player. I just query all rows in this table, and what's returned will be their skills!
At first I thought this wouldn't be very good at all, but it appears the Internet disagrees. I understand that it's less searchable - but it was not my intention to ever say, "does the player have x skill?" or "show me all players with this skill!". At worst if I wanted such data, I'd just make a PHP report for it that would, admittedly, be slow.
But it appears as though this is really faster?! I'm having trouble finding a hard answer extending beyond "yeah it's good and normalized". Can Stack Overflow help me out?
Edit: Thanks, guys! I never realized how bad this was. And sorry about the dupe, but believe me, I didn't type all of that without at least checking for dupes. :P

Putting comma-separated values into a single field in a database is not just a bad idea, it is the incarnation of Satan expressed in a database model.
It cannot represent a great many situations accurately (cases in which the value contains a comma or something else that your CSV-consuming code has trouble with), often has problems with values nested in other values, cannot be properly indexed, cannot be used in database JOINs, is difficult to dedupe, cannot have additional information added to it (number of times the skill was earned, in your case, or a skill level), cannot participate in relational integrity, cannot enforce type constraints, and so on. The list is almost endless.
This is especially true of MySQL which has the very convenient group_concat function that makes it easy to present this data as a comma-separated string when needed while still maintaining the full functionality and speed of a normalized database.
You gain nothing from using the comma-separate approach but lose searchability and performance. Get Satan behind thee, and normalize your data.

Well, there are things such as scaleability to consider. What if you need to add/remove a skill? How about renaming a skill? What happens if the number of skills out grows the size of your field? It's bad practice to have to re-size a field just to accommodate something like this.
What about maintainability? Could another developer come in and understand what you've done? What happens if the same skill is given to a player twice?
You coworker's suggestion is not correct either. You would have 3 tables in this case. A master player table, a skills table, and a table that has a relationship to both, creating a many to many relationship, allowing a single skill to be associated with many players, and many players having the same skill.

Since the database will index the content (assuming that you use index) it will be very very fast to search the content and get the desired contents. Remember: databases are designed to hold a lot of information and a database such as mysql, which is a relational database, is made for relations.
Another matter is the maintainability of the system. It will be much much easier to maintain a system that's normalized. And when you are to remove or add a skill it will be easier.
When you are about to get the information from the database regarding the skills of the player you can easily get information connected to the concerned skills with a simple JOIN.
I say: Let the database do what it does best - handle the data. And let your programming do what it should do ;)

'Many to two' relationship

I am wondering about a 'many to two' relationship. The child can be linked to either of two parents, but not both. Is there any way to reinforce this? Also I would like to prevent duplicate entries in the child.
A real world example would be phone numbers, users and companies. A company can have many phone numbers, a user can have many phone numbers, but ideally the user shouldn't provide the same phone number as the company as there would be duplicate content in the DB.

This question shows that you don't fully understand entity relationships (no rudeness intended). Of which there are four (technically only 3) types below:
One to One
One to Many
Many to One
Many to Many
One to One (1:1):
In this case a table has been broken up into two parts for purposes of complying with normalisation, or more usually the open closed principle.
Normalisation compliance: You might have a business rule that each customer has only one account. Technically, you could in this case say customer and account could all be in the same table, but this breaks the rules of normalisation, so you split them and make a 1:1.
Open-Close principle compliance: A customer table, might have id, first & last names, and address. Later someone decides to add a date of birth and with it the ability to calculate age along with a bunch of other much needed fields. This is an over simplified example of one to one, but you get the main use for it is to extend your database without breaking existing code. Much code written (sadly) is tightly coupled to the database so changes in the structure of a table will break the code. Adding a 1:1 like this will extend the table to meet new requirements without modifying the origional, thereby allowing old code to continue functioning normally and new code to make use of the new db features.
The downside of normalisation and extending tables using 1:1 relationships in this way is performance. Often times on heavly used systems, the first target to increase database performance is de-normalising and combining such tables into a single table, and optimising the indexes thus removing the need to use joins and read from multiple tables. Normalisation / De-Normalisation is neither a good or bad thing, as it depends on the needs of the system. Most systems usually start off normalised changing back when needed, but this change needs to be done very carefully as mentioned, if code is tightly coupled to the DB structure, it will almost definitely cause the system to fail. i.e. When you combine 2 tables, one ceases to exist, all the code that includes that now nonexistant table fails until it is modified (in db terms, imagine connecting relationships to any of the tables in the 1:1, when you remove those tables, this breaks the relationships, and so the structure has to be greatly modified to compensate. Unfortunately, such bad designs are much easier to spot in the DB world than in the software world in most cases and you don't usually notice something went wrong in code until it all falls apart) unless the system is properly designed with separation of concerns in mind.
It the closest thing you can get to inheritance in object oriented programming. But its not quite the same.
One to Many (1:M) / Many to One (M:1):
These two relationships (hense why 4 become 3), are the most popular relationship types. They are both the same type of relationship, the only thing that changes is your point of view. An example A customer has many phone numbers, or alternately, many phone numbers can belong to a customer.
In object oriented programming this would be considered composition. Its not inheritance, but you are saying one item is composed of many parts. This is usually represented with arrays / lists / collections etc. inside of classes as opposed to an inheritance structure.
Many to Many (M:M):
This type of relationship with current technology is impossible. For this reason we need to break it down into two one to many relationships with an "association" table joining them. The many side of the two one to many relationships is always on the association / link table.
For your example, the person who said you need a many to many is correct. Because a two to many is effectively a many (meaning more than one) to many relationship. This is the only way you would get your system to work. Unless you are intending to research the field of relational calculus to find some new type of relationship that would allow this.
Also for such relationships (m2m) you have two choices, either create a compound key in the linker table so the combination of fields become a unique entry (if you are interested in db optimisation this is the slower choice, but takes less space). Alternately, you create a third field with an auto generated id column and make that the primary key (for db optimisation, this is the faster choice, but takes more space).
In your example specifically above...
A real world example would be phone numbers, users and companies. A company can have many phone numbers, a user can have many phone numbers, but ideally the user shouldn't provide the same phone number as the company as there would be duplicate content in the DB.
This would be a many to many relationship with the phone number table as the linker table between companies and users. As explained, to ensure no phone number is repeated, you simply set it as the primary key or use another primary key and set the phone number field to unique.
For those kind of questions, it is really down to how you phrase them. What is causing you to get confused about this, and how you overcome this confusion to see the solution is simple. Rephrase the problem as follows. Start by asking is it a one to one, if the answer is no, move on. Next ask is it a one to many, if the answer is no move on. The only other option remaining is many to many. Be careful though, ensure you have considered the first 2 questions carefully before moving on. Many inexperienced database people often over complicate issues by defining one to many as many to many. Once again, the most popular type of relationship by far is one to many (I would say 90%) with the many to many and one to one spliting the remaining 10% 7/3 respectevely. But those figures are just my personal perspective, so dont go quoting them as industry standard statistics. My point is to make extra extra sure it is definitely not a one to many before choosing many to many. It is worth the extra effort.
So now to find the linker table between the two, decide which two are your main tables, and what fields need to be shared between them. In this case, company and user tables both need to share the phone. Hense you need to make a new phone table as the linker.
The warning alarm of misunderstanding should show as soon as you decide none of the 3 are working for you. This should be enough to tell you that you simply are not phrasing the relationship question correctly. You will get better at it as time passes, but it is an essential skill and really should be mastered as soon as possible for your own sanaty.
Of course you could also go to an object oriented database which will allow a range of other relationships called "Hierarchacal" relationships. Thats great if you are thinking of becomming a programmer too. But I wouldnt recommend this as it going to make your head hurt when you start finding ways to combine the various types of relationships. Especially given there is not much need since nearly all databases in the world consist of just those 3 types of relationships unless they are something super duper special.
Hope this was a reasonable answer. Thanks for taking the time to read it.

Just make phone number a key in your contact numbers table.

For your phone number example, you would put the phone number in a table by itself, with an ID.
Then you link to that phone_id from each of users and companies.
For your parents example, you don't link the child to parent - instead you link the parent to the child. OR, you put both parents in the same table, and the child just links to one of them.

Data Modeling: ethnicities with parent-child relationship?

I have a site with users that I want users to be able to identify their ethnicities. What's the best way to model this if there is only 1 level of hierarchy?
Solution 1 (single table):
Ethnicity
- Id
- Parent Id
- Name
Solution 2 (two tables):
Ethnicity Group
- Id
- Name
Ethnicity
- Id
- Ethnicity Group Id
- Name
I will be using this so that users can search for other users based on ethnicity. Which of the 2 approaches will work better for me? Is there another approach I have not considered? I'm using MySQL.

Well there is such a thing as an Ethnicity Group in the real world, so you do need two tables, not one. The real world has three levels (the top-most would be Race), but I understand that may not be necessary here. If you squash the three levels into two, you have to be careful, and lay them all out properly at the beginning. However, they will be vulnerable to people saying they want the real thing, and you may have to change it, or change the structure to fit more in ... much more work later).
If you do it correctly, as per real world, that problem is eliminated. Let me know if you want Race, and I will change the model.
The tables are far too small, and the keys are too meaningful, to add Id-iot columns to them; leave them as pure Relational keys, otherwise you will lose the power of the Relational engine. If you really want narrow keys, use a CHAR(2) EthnicityCode, rather than a NUMERIC(10,0) or a meaningless number.
Link to Ethnicity Data Model (plus the answer to your other question)
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.

If there is nothing like an "ethnicity group" in the real world, I'd suggest you don't introduce one in your data model.
All the queries you can do with the second one you can also do with the first one, because you can just select FROM ethnicity AS e1 JOIN ethnicity AS es ON (e2.ethnicity_id = e1.parent_id).

I don't want to be awkward, but what are you going to do with people of mixed descent? I think that the best that you can hope for is a simple single-level enumeration like the kind of thing you get on census forms (e.g. 'Black', 'White', 'Asian', 'Hispanic' etc). It's not ideal, but it allows people to fairly easily self-identify. Concepts like race and ethnicity are wooly enough without trying to create additional (largely meaningless) hierarchies on top of them, so my gut feeling is to keep it simple.

How do you know when you need separate tables?

How do you know when to create a new table for very similar object types?
Example:
To learn mysql I'm building a model solar system. For the purposes of my project, planets have many similar attributes to dwarf planets, centaurs, and comets. Dwarf planets are almost completely identical to planets. Centaurs and comets are only different from planets because their orbital path has more variation. Should I have a separate table for each type of object, or should they share tables?
The example is probably too simple, but I'm also interested in best practices. Like should I use separate tables just in case I want to make planets and dwarf planets different in the future, or are their any efficiency reasons for keeping them in the same table.

Normal forms is what you should be interested with. They pretty much are the convention for building tables.
Any design that doesn't break the first, second or third normal form is fine by me. That's a pretty long list of requirement though, so I suggest you go read it off the Wikipedia links above.

It depends on what type of information you want to store about the objects. If the information for all of them is the same, say orbit radius, mass and name, then you can use the same table. However, if there are different properties for each (say atmosphere composition for planets, etc.) then you can either use separate tables for each (not very normalized) or have one table for basic properties like orbit, mass and name and a second table for just the properties that are unique to planets (and a similar table for comets, etc. if needed). All objects would be in the first table but only planets would be in the second table and linked through a foreign key to the first table.

It's called Database Normalization
There are many normal forms. By applying normalization you will go through metadata (tables) and study the relationsships between data more clearly. By using the normalization techniques you will optimize the tables to prevent redundancy. This process will help you understand which entities to create based on the relationsships between the different fields.

You should most likely split the data about a planet etc so that the shared (common) information is in another table.
E.g.
Common (Table)
Diameter (Column)
Mass (Column)
Planet
Population
Comet
Speed
Poor columns I know. Have the Planet and Comet tables link to the Common data with a key.

This is definitely a subjective question. It sounds like you are already on the right lines of thinking. I would ask:
Do these objects share many attributes? If so, it's probably worth considering at the very least a base table to list them all in.
Does one object "extend" another - it has all the attributes of the other, plus some extras? If so, it might be worth adding another table with the extra attributes and a one-to-one mapping back to the base object.
Do both objects have many shared attributes and unshared attributes? If this is the case, maybe you need a single table plus a "data extension" system where each object can have a type or category that specifies any amount of extra attributes that may be associated with it.
Do the objects only share one or two attributes? In this case, they are probably dissimilar enough to separate into multiple tables.
You may also ask yourself how you are going to query the data. Will you ever want to get them all in the same list? It's always a good idea to combine data into tables with other data they will commonly be queried with. For example, an "attachments" table where the file can be an image or a video, instead of images and video tables, if you commonly want to query for all attachments. Don't split into multiple tables unless there is a really good reason.

If you will ever want to get planets and comets in one single query, they will pretty much have to be in the same table if you want the database to work efficiently. Inheritance should be handled inside your app itself :)

Here's my answer to a similar question, which I think applies here as well:
How do you store business activities in a SQL database?

There are many different ways to express inheritance in your relational model. For example you can try to squish everything in to one table and have a field that allows you to distinguish between the different types or have one table for the shared attributes with relationships to a child table with the specific attributes etc... in either choice you're still storing the same information. When going from a domain model to a relational model this is what is called an impedance mismatch. Both choices have different trade offs, for example one table will be easier to query, but multiple tables will have higher data density.
In my experience it's best not to try to answer these questions from a database perspective, but let your domain model, and sometimes your application framework of choice, drive the table structure. Of course this isn't always a viable choice, especially when performance is concerned.
I recommend you start by drawing on paper the relationships you want to express and then go from there. Does the table structure you've chosen represent the domain accurately? Is it possible to query to extract the information you want to report on? Are the queries you've written complicated or slow? Answering these questions and others like them will hopefully guide you towards creating a good relational model.
I'd also suggest reading up on database normalization if you're serious about learning good relational modeling principals.

I'd probably have a table called [HeavenlyBodies] or some such thing. Then have a look up table with the type of body, ie Planet, comet, asteroid, star, etc. All will share similar things such as name, size, weight. Most of the answers I read so far all have good advise. Normalization is good, but I feel you can take it too far sometimes. 3rd normal is a good goal.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008