sql -> relational algebra - relational-database

How do I convert this to relational algebra tree?
What are the logical steps? Do I first need to convert to relational algebra? Or can I go straight from sql to tree?

I would first convert to relational algebra, then convert to the tree.
Look, the SELECT clause only wants three fields. That's a projection.
The FROM clause has three relations. That's a Cartesian product.
The WHERE clause gives a bunch of selections. This is the part where it helps to convert to relational algebra before converting to a tree.
I have no idea what notation you use in class, but you probably want something that has a general form of
projection((things-you-want), selection((criteria), selection((criteria), selection((criteria), aXbXc))))
or projection of selection of selection of ... stuff resulting from cross products.
Note that, depending on how picky your instructor is, you may have to rename fields. Since both Show and Seat have showNo as an attribute, you may not be allowed to take the cross product before giving them unique names (under alternative rules, attributes are uniquely identified by an implicit relation-name prefix).
Furthermore, depending on the purpose of the lesson, you may commute some of these operations. You can do a selection on Booking before taking the cross product as a means of restricting the date range. The end results will be equivalent.
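As a rough illustration of that equivalence, here is a minimal sketch in Python that treats relations as lists of dicts. The query itself is not shown in this thread, so the attribute names (other than showNo), the sample rows, and the helper functions select, project and cross are all made up for the example.

```python
from itertools import product

# Hypothetical sample relations; only the names Show, Booking and showNo
# come from the discussion, everything else is invented for illustration.
show = [
    {"show.showNo": 1, "title": "Hamlet"},
    {"show.showNo": 2, "title": "Macbeth"},
]
booking = [
    {"booking.showNo": 1, "date": "2024-01-05"},
    {"booking.showNo": 2, "date": "2024-03-09"},
]


def select(pred, rel):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(attrs, rel):
    """Projection with set semantics: keep the attributes, drop duplicates."""
    result = []
    for row in rel:
        narrowed = {a: row[a] for a in attrs}
        if narrowed not in result:
            result.append(narrowed)
    return result


def cross(r, s):
    """Cartesian product; attribute names are already unique (prefixed)."""
    return [{**a, **b} for a, b in product(r, s)]


def in_january(row):
    return row["date"].startswith("2024-01")


def same_show(row):
    return row["show.showNo"] == row["booking.showNo"]


# Plan A: cross product first, then all selections.
plan_a = project(["title"],
                 select(in_january, select(same_show, cross(show, booking))))

# Plan B: restrict Booking by date before taking the cross product.
plan_b = project(["title"],
                 select(same_show, cross(show, select(in_january, booking))))
```

Both plans produce the same relation; plan B just builds a smaller intermediate product.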
Anyway, is it really that much extra work to go from sql to relational algebra to tree? I have no doubt that with practice, you could skip the intermediate step. However, since you asked the question in the first place, I would suggest going through the motions. Remember the "show your work" requirement from junior high math teachers for the combining of simple terms that went away in high school? Same rule applies here. I say this as a former grader of CS assignments.

The result of that SQL query is not a relation (it may contain duplicate rows), so it has no exact equivalent in relational algebra. You could try creating a relational algebra version of the same SQL query with DISTINCT added.

MySQL conditional table structure question [duplicate]

I do not have much experience in table design. My goal is to create one or more product tables that meet the requirements below:
Support many kinds of products (TV, Phone, PC, ...). Each kind of product has a different set of parameters, like:
Phone will have Color, Size, Weight, OS...
PC will have CPU, HDD, RAM...
The set of parameters must be dynamic. You can add or edit any parameter you like.
How can I meet these requirements without a separate table for each kind of product?
You have at least these five options for modeling the type hierarchy you describe:
Single Table Inheritance: one table for all Product types, with enough columns to store all attributes of all types. This means a lot of columns, most of which are NULL on any given row.
Class Table Inheritance: one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
Concrete Table Inheritance: no table for common Products attributes. Instead, one table per product type, storing both common product attributes, and product-specific attributes.
Serialized LOB: One table for Products, storing attributes common to all product types. One extra column stores a BLOB of semi-structured data, in XML, YAML, JSON, or some other format. This BLOB allows you to store the attributes specific to each product type. You can use fancy Design Patterns to describe this, such as Facade and Memento. But regardless you have a blob of attributes that can't be easily queried within SQL; you have to fetch the whole blob back to the application and sort it out there.
Entity-Attribute-Value: One table for Products, and one table that pivots attributes to rows, instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "Properties Pattern" mentioned by another answer. See other questions with the eav tag on StackOverflow for some of the pitfalls.
I have written more about this in a presentation, Extensible Data Modeling.
Additional thoughts about EAV: Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex (or worse) than it would have been to solve the original problem in a more conventional way.
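To make the "JOIN for each attribute" cost concrete, here is a small sketch using Python's built-in sqlite3 module. The table names product and product_attribute, their columns, and the sample rows are assumptions for illustration, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical EAV table: one row per (product, attribute) pair.
    CREATE TABLE product_attribute (
        product_id INTEGER NOT NULL REFERENCES product(id),
        attr_name  TEXT NOT NULL,   -- nothing stops 'color' vs 'colour'
        attr_value TEXT,            -- every value is text, no type checking
        PRIMARY KEY (product_id, attr_name)
    );
    INSERT INTO product VALUES (1, 'Phone X'), (2, 'PC Y');
    INSERT INTO product_attribute VALUES
        (1, 'color', 'black'), (1, 'os', 'Android'),
        (2, 'cpu', 'i7'), (2, 'ram', '16GB');
""")

# Getting a conventional tabular layout needs one join per attribute.
eav_rows = conn.execute("""
    SELECT p.name, c.attr_value AS color, o.attr_value AS os
    FROM product AS p
    LEFT JOIN product_attribute AS c
           ON c.product_id = p.id AND c.attr_name = 'color'
    LEFT JOIN product_attribute AS o
           ON o.product_id = p.id AND o.attr_name = 'os'
    ORDER BY p.id
""").fetchall()
```

Each additional attribute in the output adds another join, and products that lack an attribute simply come back with NULL in that column.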
And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.
I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
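A minimal sketch of what Class Table Inheritance could look like, again with Python's sqlite3 module. The tables product, phone and pc and their columns are assumptions based on the product kinds named in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Attributes common to every product type.
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL
    );
    -- One table per product type, keyed by the product it extends.
    CREATE TABLE phone (
        product_id INTEGER PRIMARY KEY REFERENCES product(id),
        color TEXT NOT NULL,
        os    TEXT NOT NULL
    );
    CREATE TABLE pc (
        product_id INTEGER PRIMARY KEY REFERENCES product(id),
        cpu    TEXT NOT NULL,
        ram_gb INTEGER NOT NULL
    );
    INSERT INTO product VALUES (1, 'Phone X', 499.0), (2, 'PC Y', 899.0);
    INSERT INTO phone VALUES (1, 'black', 'Android');
    INSERT INTO pc VALUES (2, 'i7', 16);
""")

# One join yields a fully typed row for a given product type.
phones = conn.execute("""
    SELECT p.name, p.price, ph.color, ph.os
    FROM product AS p
    JOIN phone AS ph ON ph.product_id = p.id
""").fetchall()

# Unlike EAV, the database itself can enforce mandatory columns.
try:
    conn.execute("INSERT INTO pc VALUES (1, NULL, 8)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Adding a new product type means adding one new table, while the shared product table and existing queries stay unchanged.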
Update 2019: The more I see people using JSON as a solution for the "many custom attributes" problem, the less I like that solution. It makes queries too complex, even when using special JSON functions to support them. It takes a lot more storage space to store JSON documents, versus storing in normal rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
What it comes down to is that you have to choose one of the solutions based on which is the least bad for your app. Therefore you need to know how you're going to query the data before you choose a database design. There's no way to choose one solution that is "best" because any of the solutions might be best for a given application.
@StoneHeart
I would go with EAV and MVC all the way.
@Bill Karwin
Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
All those things that you have mentioned here:
data validation
attribute names spelling validation
mandatory columns/fields
handling the destruction of dependent attributes
in my opinion do not belong in a database at all, because no database handles those interactions and requirements as well as an application's programming language does.
In my opinion, using a database this way is like using a rock to hammer a nail. You can do it with a rock, but aren't you supposed to use a hammer, which is more precise and specifically designed for this sort of activity?
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
This problem can be solved by making a few queries on partial data and processing the results into a tabular layout in your application. Even if you have 600GB of product data, you can process it in batches if you require data from every single row in this table.
Going further, if you would like to improve the performance of the queries, you can pick certain operations, e.g. reporting or global text search, and prepare index tables for them that store the required data and are regenerated periodically, let's say every 30 minutes.
You don't even need to be concerned with the cost of extra data storage, because it gets cheaper and cheaper every day.
If you are still concerned with the performance of operations done by the application, you can always use Erlang, C++ or Go to pre-process the data and later on just process the optimised data further in your main app.
Suppose I use Class Table Inheritance, meaning:
one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
-Bill Karwin
This is the one I like best of Bill Karwin's suggestions. I can foresee one drawback, though, and I will try to explain how to keep it from becoming a problem.
What contingency plan should I have in place when an attribute that is only common to 1 type, then becomes common to 2, then 3, etc?
For example: (this is just an example, not my real issue)
If we sell furniture, we might sell chairs, lamps, sofas, TVs, etc. The TV type might be the only type we carry that has a power consumption. So I would put the power_consumption attribute on the tv_type_table. But then we start to carry home theater systems, which also have a power_consumption property. OK, it's just one other product, so I'll add this field to the stereo_type_table as well, since that is probably easiest at this point. But over time, as we start to carry more and more electronics, we realize that power_consumption is broad enough that it should be in the main_product_table. What should I do now?
Add the field to the main_product_table. Write a script to loop through the electronics and put the correct value from each type_table to the main_product_table. Then drop that column from each type_table.
Now, if I always use the same GetProductData class to interact with the database and pull the product info, then any code changes that need refactoring should be confined to that class only.
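The migration described above could be scripted roughly like this with Python's sqlite3 module. The table names come from the example; the product_id key column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_product_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tv_type_table (
        product_id INTEGER PRIMARY KEY REFERENCES main_product_table(id),
        power_consumption INTEGER
    );
    CREATE TABLE stereo_type_table (
        product_id INTEGER PRIMARY KEY REFERENCES main_product_table(id),
        power_consumption INTEGER
    );
    INSERT INTO main_product_table VALUES
        (1, 'TV'), (2, 'Home theater'), (3, 'Chair');
    INSERT INTO tv_type_table VALUES (1, 120);
    INSERT INTO stereo_type_table VALUES (2, 300);
""")

# Step 1: add the field to the main table.
conn.execute("ALTER TABLE main_product_table ADD COLUMN power_consumption INTEGER")

# Step 2: copy the value over from each type table.
# The table names come from this fixed tuple, never from user input.
for type_table in ("tv_type_table", "stereo_type_table"):
    conn.execute(f"""
        UPDATE main_product_table
        SET power_consumption = (
            SELECT t.power_consumption
            FROM {type_table} AS t
            WHERE t.product_id = main_product_table.id
        )
        WHERE id IN (SELECT product_id FROM {type_table})
    """)
conn.commit()

# Step 3 (not run here): drop the old column from each type table, e.g.
#   ALTER TABLE tv_type_table DROP COLUMN power_consumption
# SQLite only supports DROP COLUMN from version 3.35 onwards.

migrated = conn.execute(
    "SELECT id, power_consumption FROM main_product_table ORDER BY id"
).fetchall()
```

Products without a row in any type table (the chair here) keep NULL in the new column.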
You can have a Product table and a separate ProductAdditionalInfo table with 3 columns: product ID, additional info name, additional info value. If color is used by many but not all kinds of Products, you could have it be a nullable column in the Product table, or just put it in ProductAdditionalInfo.
This approach is not a traditional technique for a relational database, but I have seen it used a lot in practice. It can be flexible and have good performance.
Steve Yegge calls this the Properties pattern and wrote a long post about using it.

many to many relation in DB on same table

I'm making a database for a language dictionary. I have a table of definitions with words in different languages.
DEFINITIONS
-----------
Id
Definition
Language
For example some records may be:
1->casa->spanish
2->house->english
3->maison->french
...
And now I have to create another table for the relationships, but I don't know how to do it correctly. My application may have about 10 languages. I can think of two ways of doing this:
RELATIONSHIPS
-------------
Id
Id_Spanish
Id_English
Id_French
...
So that in the same record I have the word in the different languages. Or this other way:
RELATIONSHIPS
-------------
Id
Id_Language_1
Id_Language_2
and linking the words in pairs, for example:
1(Id) -> 1(Id_Language_1) -> 2(Id_Language_2)
1(Id) -> 1(Id_Language_1) -> 3(Id_Language_2)
...
I have read a lot about many-to-many relationships, and in my case I think the first option (one record with all the languages) is better, but I'm not very sure. Can someone say which they think is best? Thanks.
I would add another column to your primary table
DEFINITIONS
-----------
Id
Definition
Language
WordId
and assign a word_id to each group to indicate they are all the same word.
1->casa->Spanish->10
2->house->English->10
3->maison->French->10
Bridge tables only really make sense with dynamic content; your content is static and can easily be defined in rows. Bridge tables require joins, which slow down queries, so I guess one table makes the most sense if what you care about is querying speed.
But what if some words are not defined for all languages? Then you waste a bunch of space in the database, which means it may make sense to use a bridge table.
I am suggesting one really long table with rows like english_word, english_Definition,french_word,french_definition etc...
I've already upvoted Brian's answer but I have enough to add that I think it's worth an additional answer:
Really, stay away from the first idea. It creates a lot of overhead when you add a new language, and means that you need to either hardcode many different queries, or use dynamically-generated SQL, to make use of that table.
Your second idea, storing pairwise relationships, is better. At first it might seem harder to write queries against it, but once you get used to it, they will be more general and more straightforward. However, this design requires you to choose between one of two approaches, either of which has flaws:
Store every possible pair (English/French, English/Spanish, French/Spanish). This makes finding any arbitrary translation relatively simple, but requires more storage and allows for the possibility of inconsistencies if you look at more than two languages at a time. You also need to decide whether to store each pair in both directions; if not, the queries become somewhat more complex.
Store just enough pairs to establish the equivalencies (e.g. store English/French and French/Spanish), then traverse them when necessary to find any given translation. The simplest way to do this is probably to select one language that will always be the first one in each pair (e.g. store Spanish/English and Spanish/French pairs). But even then, you need application logic that is aware of which language is the central one.
If you use the design that Brian suggests, any arbitrary translation from one language to another can be done with the same generic query, just plugging in the desired languages and word.
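A sketch of such a generic query, using Python's sqlite3 module and the DEFINITIONS table with the extra WordId column. The function name translate and the lower-case table name are assumptions; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE definitions (
        Id         INTEGER PRIMARY KEY,
        Definition TEXT NOT NULL,
        Language   TEXT NOT NULL,
        WordId     INTEGER NOT NULL   -- same value for all translations of a word
    );
    INSERT INTO definitions VALUES
        (1, 'casa',   'spanish', 10),
        (2, 'house',  'english', 10),
        (3, 'maison', 'french',  10);
""")


def translate(word, source_lang, target_lang):
    """Return every translation of `word` from one language into another."""
    rows = conn.execute("""
        SELECT t.Definition
        FROM definitions AS s
        JOIN definitions AS t ON t.WordId = s.WordId
        WHERE s.Definition = ?
          AND s.Language = ?
          AND t.Language = ?
    """, (word, source_lang, target_lang)).fetchall()
    return [row[0] for row in rows]
```

For example, translate('casa', 'spanish', 'french') looks up the Spanish row and follows its WordId to the French one. The same statement serves every language pair, and adding a language only adds rows.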
The second one is definitely better.
If you add a language, you don't need to change the structure
Queries for all language pairs are the same. Language ID is a query parameter. This means better query optimization and less code
Simpler update/delete/insert operations
Somewhat better support for synonyms
Brian's idea is also good, as long as your words have one meaning and one translation. If they have multiple, it needs to be extended.

Is there a more efficient way to handle multi-valued attributes other than creating a relationship table?

I have three tables, tbl_school, tbl_courses and tbl_branches.
Each course can be taught in one or more branches of a school.
tbl_school has got:
id
school_name
total_branches
...
tbl_courses:
id
school_id
course_title
....
tbl_branches:
id
school_id
city
area
address
When I want to list all the branches of a school, it is a pretty straightforward JOIN.
However, each course will be taught in one or more branches or all the branches of the school and I need to store this information. Since there is a one-to-many relationship between tbl_courses and tbl_branches, I will have to create a new relationship table that maps each course record to its respective branches.
When my users want to filter a course by city or area, this relationship table will be used.
I would like to know if this is the right approach or is there something better for my problem?
I was planning to store a JSON of branches of courses which would eliminate the relationship table and query would be much easier to find the city or area pattern in JSON string.
I am new to design patterns so kindly bear with me.
Issues
The table description you have given has a few errors, which need to be corrected first, after which my proposal will make more sense.
The use of a table prefix, especially tbl_, is incorrect. All the tables are tbl_s. If you do use a prefix, it is to group tables by Subject Area. Further, SQL allows a table qualifier when referring to any table in the code:
... WHERE table_name.column_name = "something" ...
If you would like some advice re Naming Convention, please review this Answer.
Use singular, because the table name is supposed to refer to a row (relation), not to the content (we know it contains many rows). Then all the English used re the table_name makes sense. (E.g. refer to my Predicates.)
You have some incorrect or extraneous columns. It is easier to give you a Data Model, than to explain each item. A couple of items do need explanation:
school.total_branches is a duplicate, because that value can easily be derived (by COUNT() of the Branches). It breaks Normalisation rules, and introduces an Update Anomaly, which can get "out of synch".
course.school_id is incorrect, given that each Branch may or may not teach a Course. That relation is 1 Course to many Branches, it should be in the new table you are contemplating.
By JSON, if you mean construct an array on the client instead of keeping the relations in the database, then no, definitely not. Data and relationships to data, should be implemented in the database. For many reasons, the most important of which is Integrity. Following that, you may easily drag it into the client, and keep it there for stream-performance purposes.
The table you are thinking about is an Associative Table, an ordinary Relational construct to relate ("map", "link") two parent tables, here Courses to Branches.
Data duplication is not prevented. Refer to the Keys in the Data Model.
ID columns you have do not provide row uniqueness, which the Relational Model demands. If that is not clear to you please read this Answer.
Solution
Here is the model.
Proposed School Data Model
Please review and comment.
I need to ensure that you understand the notation in IDEF1X models: unlike in non-standard diagrams, every little notch, tick and line means something very specific. If not, please go to the IDEF1X Notation link at the bottom right of the model.
Please check the Predicates carefully, they (a) explain the model, and (b) are used to verify it. It is a feedback loop. They have two separate benefits.
If you would like more information on Predicates, why they are relevant, please go to this Answer and read the Predicate section.
If you wish to thoroughly understand Predicates, with a view to understanding Data Modelling, consider that Data Model (latest version is linked at the top of the Answer) against those Predicates. Ie. see if you understand a database that you have never seen before, via the model plus Predicates.
The Relational Keys I have given provide the row uniqueness that is required for Relational databases; duplicate data must be prevented. Note that ID columns are simply not needed. The Relational Keys provide:
Data Integrity
Relational access to data (notice the ease of, and unlimited, joins)
Relational speed
None of which a Record Filing System (characterised by ID columns) has.
Column description:
I have implemented two address_lines. Obviously, that should not include city because that is a separate column.
I presume area means something like borough or county or the area that the school branch operates in. If it is a fixed geographic administrative region (my first two descriptors) then it requires a formal structure. If not (my third descriptor), ie. it is loose, or (eg) it spans counties, then a simple Lookup table is enough.
If you use formal administrative regions, then city must move into that structure.
Your approach with an additional table seems the simplest and most straightforward to me. I would not mix JSON in this.
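A minimal sketch of that additional table, using Python's sqlite3 module. It keeps the table names from the question; the relationship table tbl_course_branches, the helper courses_in_city and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_school (id INTEGER PRIMARY KEY, school_name TEXT NOT NULL);
    CREATE TABLE tbl_courses (
        id           INTEGER PRIMARY KEY,
        school_id    INTEGER NOT NULL REFERENCES tbl_school(id),
        course_title TEXT NOT NULL
    );
    CREATE TABLE tbl_branches (
        id        INTEGER PRIMARY KEY,
        school_id INTEGER NOT NULL REFERENCES tbl_school(id),
        city      TEXT NOT NULL,
        area      TEXT NOT NULL
    );
    -- Hypothetical relationship table: one row per (course, branch) pair.
    -- The composite primary key prevents storing the same pair twice.
    CREATE TABLE tbl_course_branches (
        course_id INTEGER NOT NULL REFERENCES tbl_courses(id),
        branch_id INTEGER NOT NULL REFERENCES tbl_branches(id),
        PRIMARY KEY (course_id, branch_id)
    );
    INSERT INTO tbl_school VALUES (1, 'Example School');
    INSERT INTO tbl_courses VALUES (1, 1, 'Algebra'), (2, 1, 'Painting');
    INSERT INTO tbl_branches VALUES
        (1, 1, 'Pune', 'North'), (2, 1, 'Mumbai', 'West');
    INSERT INTO tbl_course_branches VALUES (1, 1), (1, 2), (2, 2);
""")


def courses_in_city(city):
    """Filter courses by the city of the branches that teach them."""
    rows = conn.execute("""
        SELECT DISTINCT c.course_title
        FROM tbl_courses AS c
        JOIN tbl_course_branches AS cb ON cb.course_id = c.id
        JOIN tbl_branches AS b ON b.id = cb.branch_id
        WHERE b.city = ?
        ORDER BY c.course_title
    """, (city,)).fetchall()
    return [row[0] for row in rows]


# The branch count can be derived on demand instead of being stored
# in a total_branches column.
branch_count = conn.execute(
    "SELECT COUNT(*) FROM tbl_branches WHERE school_id = ?", (1,)
).fetchone()[0]
```

Filtering by area works the same way with b.area in the WHERE clause, and the database can index and validate these columns, which it could not do for values buried in a JSON string.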

qualified relationships in datomic

In a relational DB, I could have a table Person and a table Hobby. Every person can have zero, one or more hobbies, and I also want to record, say, the priority of those hobbies for every person.
I could create a relationship table with the 2 foreign keys PersonFK and HobbyFK, and one plain column Priority.
In datomic, to model a simple n:m relationship (without the priority), I'd probably create an attribute of type Reference with cardinality Many, that I'd use for Person entities.
But how would I go about qualifying that relation to be able to store the priority? Would it have to be done analogously to the relational case, i.e. by creating a new entity type just for that relation? Or is there any better way? Using some meta data facility or something?
A similar question was asked on the Datomic mailing list a few days ago:
https://groups.google.com/d/topic/datomic/7uOl-TISdxA/discussion
In summary, the answer given there is that you are right: you need to create a relation entity on which to store the extra information.
The accepted answer here is no longer the full story, given a new feature added to Datomic in June 2019. Sometimes you will still want to reify the relationship, but there is now another option: heterogeneous tuples.
An attribute value, i.e. the v in the eavto 5-tuple, can now itself be a tuple.
This is a Clojure vector of max length 8. Because of that limit, it isn't a way to store an arbitrary amount of metadata on the relationship.
Official blog post announcement.
Discussion of the release on twitter.
In your case:
{:db/ident :person/hobby
:db/valueType :db.type/tuple
:db/tupleTypes [:db.type/ref :db.type/long] ; hobby, priority
:db/cardinality :db.cardinality/many}
To use this in datalog, you can use the tuple and untuple functions.
It may be best though to use such tuples like arrays, where a tuple really represents compound data. Indeed the example in the docs supposedly for these heterogeneous tuples actually uses homogeneous data, so I think it's really up to the user of datomic what to make of these choices.
In the SQL world, generally if data is of different types it's probably not a good idea to treat it like an array, due, for starters, to the loss of power you'll get when manipulating those data structures from the query language. Datomic might not be completely equivalent, being as it is a graph database, and perhaps this is still relatively uncharted territory.

Data Modeling: ethnicities with parent-child relationship?

I have a site with users that I want users to be able to identify their ethnicities. What's the best way to model this if there is only 1 level of hierarchy?
Solution 1 (single table):
Ethnicity
- Id
- Parent Id
- Name
Solution 2 (two tables):
Ethnicity Group
- Id
- Name
Ethnicity
- Id
- Ethnicity Group Id
- Name
I will be using this so that users can search for other users based on ethnicity. Which of the 2 approaches will work better for me? Is there another approach I have not considered? I'm using MySQL.
Well, there is such a thing as an Ethnicity Group in the real world, so you do need two tables, not one. The real world has three levels (the top-most would be Race), but I understand that may not be necessary here. If you squash the three levels into two, you have to be careful, and lay them all out properly at the beginning. However, they will be vulnerable to people saying they want the real thing, and you may have to change it, or change the structure to fit more in (much more work later).
If you do it correctly, as per real world, that problem is eliminated. Let me know if you want Race, and I will change the model.
The tables are far too small, and the keys are too meaningful, to add Id-iot columns to them; leave them as pure Relational keys, otherwise you will lose the power of the Relational engine. If you really want narrow keys, use a CHAR(2) EthnicityCode, rather than a NUMERIC(10,0) or a meaningless number.
Link to Ethnicity Data Model (plus the answer to your other question)
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
If there is nothing like an "ethnicity group" in the real world, I'd suggest you don't introduce one in your data model.
All the queries you can do with the second one you can also do with the first one, because you can just select FROM ethnicity AS e1 JOIN ethnicity AS e2 ON (e2.ethnicity_id = e1.parent_id).
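As a sketch of that self-join on the single-table design, using Python's sqlite3 module: the column names id, parent_id and name follow Solution 1 in the question (so the key is id rather than ethnicity_id), and the rows are neutral placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ethnicity (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES ethnicity(id),  -- NULL for top-level rows
        name      TEXT NOT NULL
    );
    INSERT INTO ethnicity VALUES
        (1, NULL, 'Group A'),
        (2, 1,    'Ethnicity A1'),
        (3, 1,    'Ethnicity A2');
""")

# e1 is the child row, e2 is its parent (the "group" in the two-table design).
pairs = conn.execute("""
    SELECT e1.name, e2.name
    FROM ethnicity AS e1
    JOIN ethnicity AS e2 ON e2.id = e1.parent_id
    ORDER BY e1.id
""").fetchall()
```

Top-level rows have no parent, so the inner join leaves them out and returns only the child/parent pairs.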
I don't want to be awkward, but what are you going to do with people of mixed descent? I think that the best that you can hope for is a simple single-level enumeration like the kind of thing you get on census forms (e.g. 'Black', 'White', 'Asian', 'Hispanic' etc). It's not ideal, but it allows people to fairly easily self-identify. Concepts like race and ethnicity are wooly enough without trying to create additional (largely meaningless) hierarchies on top of them, so my gut feeling is to keep it simple.