Is it possible to express a not null constraint using relational calculus? - relational-database

I understand that relational calculus is based on first order logic and as such has no concept of null values, however a not null constraint can be expressed in a query in relational algebra using an anti-join. Is there an equivalent mechanism to express such a query using only relational calculus?
For example, could a basic SQL query in the form:
SELECT * from x WHERE y IS NOT NULL
be expressed using relational calculus?

E.F.Codd proposed introducing nulls to the relational model but he never seemed to deal with the consequences. In his book, "Relational Model for Database Management", he proposed using two different kinds of null and a four-value logic. He suggested such a system would need a tautology detection algorithm to make sure the right result (or at least a useful, comprehensible result) would be returned for some queries. It seems to me that such a scheme must be impractical and doomed to fail, although I have no proof. To me it seems unlikely that users would be able to understand tautology detection properly.
Under Codd's scheme, short circuit operations like x=x would presumably evaluate to true, even in the presence of nulls. The authors of SQL did not follow Codd's scheme of course, and there lies the difficulty. There is no single consistent set of rules for the treatment of nulls either in theory or in working software, so unless you explain such a system and its rules your question is unanswerable.
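For what it's worth, mainstream SQL does not treat x=x as a tautology when nulls are present. Here is a minimal sketch using Python's built-in sqlite3 module. The table x and column y come from the question; the id column and the sample rows are assumptions made up for illustration.

```python
import sqlite3

def non_null_rows():
    """Compare IS NOT NULL with the self-comparison y = y in SQLite."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: id is an assumed key, y is the nullable column.
    conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, y TEXT)")
    conn.executemany(
        "INSERT INTO x (id, y) VALUES (?, ?)",
        [(1, "a"), (2, None), (3, "b")],
    )
    is_not_null = conn.execute(
        "SELECT id FROM x WHERE y IS NOT NULL ORDER BY id"
    ).fetchall()
    # Under SQL's three-valued logic, NULL = NULL is unknown, not true,
    # so the row with a null y is filtered out here as well.
    self_equal = conn.execute(
        "SELECT id FROM x WHERE y = y ORDER BY id"
    ).fetchall()
    conn.close()
    return is_not_null, self_equal
```

Both queries return the same two rows here, which is exactly the behaviour that differs from the short-circuit reading of x=x described above.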

Related

For a many to many relationship is it better to use relational database or nosql?

Let's assume you have a bunch of users, and each user can have friends that are from the same users table. So it's essentially a many-to-many relationship to itself. A many-to-many relationship in a relational database will create a third table. Now, assuming this users table is huge, with millions of people in it, this third table would thus be gigantic if each person has, say, more than 10 friends. Wouldn't it be more efficient (and just overall more intuitive) for friends to be stored as a JSON list in a NoSQL store, as shown below?
{"user1": {"friendslist": ["user2", "user3", "user4"]}}
{"user2": {"friendslist": ["user1", "user3", "user4"]}}
{"user3": {"friendslist": ["user1", "user2", "user4"]}}
{"user4": {"friendslist": ["user1", "user2", "user3"]}}
This is also a data structures question, so it would be B-tree vs hash table, if I'm not mistaken.
It does seem more intuitive to the untrained. That's why the network data model is still so prevalent even though the relational model has been around for decades.
"Better" depends on how you want to use it, and "more efficient" depends on the database engine, indexes and various other factors. I prefer the relational model since I can formulate any reasonable question that can be logically derived from the data and get a correct answer. For example, if I wanted to find friends of friends, I could join a relational many-to-many table with itself. I could find cycles and cliques of any particular size. I could easily declare a unique constraint on pairs of friends.
It's possible to do these things without a relational database but I doubt it would be as easy or concise.
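As a rough sketch of the self-join and the unique constraint mentioned above, using Python's built-in sqlite3 module. The table name friended, its columns u and f, and the sample pairs are assumptions chosen for illustration, not anything from the question.

```python
import sqlite3

def build_friends_db():
    """Create a hypothetical many-to-many 'friended' table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE friended (
            u TEXT NOT NULL,
            f TEXT NOT NULL,
            PRIMARY KEY (u, f),
            CHECK (u <> f)
        )
        """
    )
    # Friendship is stored in both directions in this sketch.
    pairs = [
        ("user1", "user2"), ("user2", "user1"),
        ("user2", "user3"), ("user3", "user2"),
        ("user3", "user4"), ("user4", "user3"),
    ]
    conn.executemany("INSERT INTO friended (u, f) VALUES (?, ?)", pairs)
    return conn

def friends_of_friends(conn, user):
    """Join the table with itself; exclude the user and their direct friends."""
    rows = conn.execute(
        """
        SELECT DISTINCT b.f
        FROM friended AS a
        JOIN friended AS b ON a.f = b.u
        WHERE a.u = ?
          AND b.f <> a.u
          AND b.f NOT IN (SELECT f FROM friended WHERE u = ?)
        ORDER BY b.f
        """,
        (user, user),
    ).fetchall()
    return [row[0] for row in rows]
```

The composite primary key is what rejects a repeated pair: inserting ('user1', 'user2') a second time raises sqlite3.IntegrityError.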
The particular data structure used by the database engine has nothing to do with the relational concept, though it is relevant to efficiency. For more info on which data structure would be used, you'll need to look at particular database management systems and their storage engines.
Why would a relational implementation be "gigantic"? Why would your structure be "more efficient"? You are making a lot of unfounded assumptions that it would be good for you to think about. (Learn some relational basics. And the relational take on relational vs NoSQL.)
Re "intuitive", the obvious relational organization for when U friended F is a table Friended holding rows where... "U friended F". Friended(U,F) for short. If you want Us where U friended x, that's the rows where Friended(U,x), ie the rows in PROJECT U RESTRICT F x Friended, ie the rows in PROJECT U (Friended WHERE F=x), depending on whether you want to think in logic, relations or a mix. What's your query for that? Using a relational interface in terms of predicates and tables does not require or preclude any particular implementations. The entire NoSQL movement is a sad consequence of lack of understanding by users and vendors of the relational model as interface to data, not as storage structure. A DBMS for a NoSQL use case needs only to be a relational DBMS better supporting arbitrary types in querying and implementation.
From my answer to Adjustable, versioned graph database:
There is an obvious 1:1 correspondence between your states at a given time and a relational database with a given schema. So there is an obvious 1:1 correspondence between your set of states over time and a changing-schema database, ie a variable whose value is a database plus metadata, manipulated by both DDL and DML update commands. So there is no evidence that you shouldn't just use a relational DBMS.
Relational DBMSs allow generic querying with automated implementation at a certain computational complexity with certain opportunities for optimization. Any application can have specialized queries that make a specialized data structure and operators a better choice. But you must design your application and know about such special aspects to justify this. As it is, with the obvious correspondences between your states and relational states, this has not been justified.
Just because you can draw a picture of your application state as of some time using a graph does not mean that you need a graph database. What matters is what specialized queries/expressions you will be evaluating. You should understand what these are in terms of your problem domain, which is probably most easily expressible per some specialized data structure and operators and relationally. Then you can compare the expressive and computational demands to a specialized data structure, a relational representation, and the models of particular graph databases.
Of course there are specialized applications where we use optimized special operators and storage. But that merits justification, and from a relational perspective should be supported by an extensible relational DBMS.

Theoretical basis for CRUD Operations on data

I know that RDBMSs are based on the Relational Model, supported by Relational Algebra.
Various Relational Algebra theoretical concepts like Selection, Projection, Joins implemented in Query languages like SQL. But these operations are primarily the R (Read) of CRUD (Create, Read, Update, Delete).
CRUD is the holy grail of programming, especially in the enterprise world.
I wanted to know on which programming language independent, theoretical foundation (may or may not be mathematical) are the INSERTS, UPDATES, DELETES modeled on? Does such a theory even exist?
If it did exist, it could probably explain things like constraints on databases, amongst other things.
E.g.:
You cannot update a single row (tuple) without specifying a unique column (a WHERE clause).
Or,
If a one to many relation is deleted, the entity on the many side gets deleted (the table in which the other table's primary key is housed).
For the sake of simplicity let us assume all CRUD is operated on Relational Models only.
The reason I am asking is because I need to do a deep R&D for a product that hopes to automate CRUD. I know I know people have tried and failed, but I'd still like to be pointed to some theoretical foundation please!
EDIT This will also help in the design of ORMs which can produce all CRUD operations independent of the underlying DB Model
EDIT I just found this link -> https://cs.stackexchange.com/questions/43672/a-relational-algebra-extended-to-model-the-full-dml-crud-domain This is similar to what I have to ask unfortunately the OP's question circles into a specific implementation!
In relational terms CREATE, UPDATE and DELETE operations are all assignments. E.g. inserting I into T can be accomplished by:
T = T UNION I;
Any practical relational language ought to have syntax shortcuts for these operations. See Tutorial D for example.
CRUD can be reduced to relations, relational algebra, variables and (optionally) type theory. A database is seen as a set of relation variables, similar to variables in any imperative programming language except that they hold relations rather than scalar values. Queries apply a sequence of relational algebra operators to the values stored in relation variables. Read queries return the result to the caller. Create, Update and Delete queries assign the result back to the original relation variable.
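To make the "everything is assignment" view concrete, here is a small sketch in plain Python, where a relation variable is just a variable holding a set of tuples. The helper names (insert, delete, update) and the (id, name) attribute layout are assumptions for this illustration; this is not Tutorial D syntax.

```python
# A relation value is modeled as a set of (id, name) tuples (assumed layout).

def insert(t, i):
    """INSERT: the new value is T UNION I."""
    return t | i

def delete(t, predicate):
    """DELETE: the new value is T minus the rows matching the predicate."""
    return {row for row in t if not predicate(row)}

def update(t, predicate, change):
    """UPDATE: matching rows are replaced by their changed versions."""
    return {change(row) if predicate(row) else row for row in t}

# Each write is an assignment of a whole new relation value to the variable T.
T = {(1, "Ann"), (2, "Bob")}
T = insert(T, {(3, "Cy")})
T = update(T, lambda r: r[0] == 2, lambda r: (r[0], "Robert"))
T = delete(T, lambda r: r[0] == 1)
```

After these assignments T holds {(2, "Robert"), (3, "Cy")}. A read is simply an expression over T that is returned instead of being assigned back.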
One problem with ORMs is that they confuse rows for entities, tables for entity sets and columns for attributes. Chen's original paper stated that entities are represented by values and attributes are one-to-one relations represented by pairs of values. Another problem is trying to manipulate a row at a time when the underlying system works with sets. Another is trying to abstract over a very high-level declarative data sublanguage.
I don't want ORMs, I want my objects to talk in SQL with each other, but that's a different topic.
This is too long for a comment.
"Relational" databases only loosely implement relational algebra. The "relational" in relational algebra, for instance, refers (among other things) to the relationship between "attributes" (columns) and their values within a "tuple" (rows in a table). In most SQL databases, all rows in a table ("tuples") have the same columns. That is not a requirement for relational algebra. Another example is duplicates within tables. Relational algebra deals with sets of "tuples", where duplicates are not allowed. Yet relational databases allow duplicates in tables unless a primary key is explicitly defined.
The semantics around CRUD are driven more by the ACID properties of databases (atomicity, consistency, isolation, and durability). These properties drive the transactional semantics of relational databases.
In my experience, successful practical applications usually differ from theoretical underpinnings.

Why are BOOLEAN type columns problematic in relational database design?

I've been working mostly with Oracle for the past few years, and am quite used to seeing single character varchar columns used as boolean values.
I can also see (per stack overflow answers), that suggested type for MySQL is TINYINT.
Now I've taken on my little side project - using DerbyDB, and it supports BOOLEAN columns, but not until after version 10 or so.
So, the question is, why is it so hard to incorporate a BOOLEAN column while designing a relational database? Am I missing something, or is it just pushed down the to-do list as unimportant, since you can use another column type meanwhile?
In the case of Derby, specifically, the answer is a bit of strange history: Derby, the open source database, was once called Cloudscape, and was a proprietary product. At that time, it fully supported BOOLEAN.
Subsequently, Cloudscape was purchased by Informix which was purchased by IBM, and IBM engineering decided to make Derby compatible with DB2. The reason for this was that, if the two databases were compatible, it would be easier for users to migrate their applications between Derby databases and DB2 databases. The engineering staff, however, did not remove the non-DB2-compatible features from Derby, they simply disabled them in the SQL grammar, leaving most of the implementation in place.
Subsequently, IBM open-sourced Cloudscape to the Apache Software Foundation, naming it Derby. The open source community, no longer bound by the requirement that Derby be completely compatible with DB2, decided to revive the BOOLEAN datatype support. And so Derby now has BOOLEAN datatype support.
Tom Kyte pretty much echoes your last sentence in this blog entry:
"It just isn't a type we have -- I can say no more and no less. ANSI
doesn't have it -- many databases don't have it (we are certainly not
alone). In the grand scheme of things -- I would say the
priotization of this is pretty "low" (thats my opinion there)."
He's speaking from the Oracle perspective, but it applies to any RDBMS.
PostgreSQL has supported boolean for as long as I can remember.
The oldest online doc I can find is for version 6.3 released 1998-03-01. They mention the boolean type:
http://www.postgresql.org/docs/6.3/static/c0805.htm
In later docs they mention SQL99 as the standard they follow.
Since SQL99 seems to mention this type I would assume, that many DBs did have support for that type quite well before 1999.
I don't know as I haven't designed one, but my guess would be that since RDBMS's are about describing and storing sets of things, boolean fields aren't needed because they would also denote what is in a set, but they are extraneous as the membership of sets will be derived from the actual data or structure of the database.
As an example, take a boolean column for roles given to employees where they're either managers or they're not. You could use a boolean column to describe this, but what you should do is either have a table for managers and a table for non-managers, or (and this would be the more flexible and probably more manageable way) create an extra "look up" table that gives roles (as a single text column) and a key that is then referred to (as a foreign key) in the employees table.
I think I should add that most times you see a boolean field in a table it's a code smell, as it may hurt performance: using a boolean in a WHERE clause could invoke a table scan and make any indexes on the table fairly pointless (but see the comments for a further discussion of this). I'd hazard another guess that boolean data types have been added to most RDBMSs for use in their procedural language extensions (T-SQL, PL/SQL) to help with the odd conditional statement that's required.

sql -> relational algebra

How do I convert this to relational algebra tree?
What are the logical steps? Do I first need to convert to relational algebra? Or can I go straight from sql to tree?
I would first convert to relational algebra, then convert to the tree.
Look, the SELECT clause only wants three fields. That's a projection.
The FROM clause has three relations. That's a Cartesian product.
The WHERE clause gives a bunch of selections. This is the part where it helps to convert to relational algebra before converting to a tree.
I have no idea what notation you use in class, but you probably want something that has a general form of
projection((things-you-want), selection((criteria), selection((criteria),
selection((criteria), aXbXc))))
or projection of selection of selection of ... stuff resulting from cross products.
Note, depending on how picky your instructor is, you may have to rename fields. Since both Show and Seat have showNo as an attribute, you may not be allowed to take the cross product before giving them unique names (alternative rules, attributes are uniquely identified by an implicit relation name prefix).
Furthermore, depending on the purpose of the lesson, you may commute some of these operations. You can do a selection on Booking before taking the cross product as a means of restricting the date range. The end results will be equivalent.
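Since the original query isn't reproduced here, the following is only a sketch of the general shape, written in Python with made-up data. The relation names Show, Seat and Booking and the attribute showNo come from the discussion above; every other attribute, the sample rows, and the helper functions cross, select and project are assumptions. Attributes are prefixed with their relation name so the cross product has no name clashes.

```python
from itertools import product

def cross(*relations):
    """Cartesian product of relations given as lists of dicts."""
    result = []
    for combo in product(*relations):
        merged = {}
        for row in combo:
            merged.update(row)  # attribute names are assumed to be distinct
        result.append(merged)
    return result

def select(predicate, relation):
    """Selection: keep only rows satisfying the predicate."""
    return [row for row in relation if predicate(row)]

def project(attrs, relation):
    """Projection onto attrs; a set, so duplicates disappear as in RA."""
    return {tuple(row[a] for a in attrs) for row in relation}

# Hypothetical sample data.
show = [{"show.showNo": 1, "show.title": "Hamlet"},
        {"show.showNo": 2, "show.title": "Macbeth"}]
seat = [{"seat.showNo": 1, "seat.seatNo": "A1"},
        {"seat.showNo": 2, "seat.seatNo": "B7"}]
booking = [{"booking.showNo": 1, "booking.seatNo": "A1", "booking.date": "2024-01-05"},
           {"booking.showNo": 2, "booking.seatNo": "B7", "booking.date": "2024-03-01"}]

def same_show(r):
    return r["show.showNo"] == r["seat.showNo"]

def same_seat(r):
    return (r["seat.showNo"] == r["booking.showNo"]
            and r["seat.seatNo"] == r["booking.seatNo"])

def in_january(r):
    return "2024-01-01" <= r["booking.date"] <= "2024-01-31"

wanted = ("show.title", "seat.seatNo", "booking.date")

# projection(selection(selection(selection(a X b X c))))
naive = project(wanted, select(in_january, select(same_seat, select(same_show,
                cross(show, seat, booking)))))

# Same query with the date selection applied to Booking before the product.
pushed = project(wanted, select(same_seat, select(same_show,
                 cross(show, seat, select(in_january, booking)))))
```

Both expressions evaluate to the same set, which is the point about commuting operations: the second form just builds a smaller cross product along the way.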
Anyway, is it really that much extra work to go from sql to relational algebra to tree? I have no doubt that with practice, you could skip the intermediate step. However, since you asked the question in the first place, I would suggest going through the motions. Remember the "show your work" requirement from junior high math teachers for the combining of simple terms that went away in high school? Same rule applies here. I say this as a former grader of CS assignments.
The result of that SQL query is not a relation so it has no exact equivalent in the RA. You could try creating an RA version of the same SQL query with DISTINCT added.
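A quick way to see the difference, sketched with Python's built-in sqlite3 module. The booking table, its columns and its rows are hypothetical; the original query is not shown, so this only illustrates why DISTINCT matters.

```python
import sqlite3

def show_numbers(distinct):
    """Return showNo values from a hypothetical booking table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE booking (showNo INTEGER, seatNo TEXT)")
    conn.executemany(
        "INSERT INTO booking (showNo, seatNo) VALUES (?, ?)",
        [(1, "A1"), (1, "A2"), (2, "B7")],
    )
    keyword = "DISTINCT" if distinct else ""
    rows = conn.execute(
        f"SELECT {keyword} showNo FROM booking ORDER BY showNo"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Without DISTINCT the result is the bag [1, 1, 2]; with it, the result is [1, 2], which is what a relational algebra projection onto showNo would give.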

What is the difference between a Relational and Non-Relational Database?

MySQL, PostgreSQL and MS SQL Server are relational database systems, and NoSQL, MongoDB, etc. are non-relational DBMSs.
What are the differences between the two types of system?
Hmm, not quite sure what your question is.
In the title you ask about Databases (DB), whereas in the body of your text you ask about Database Management Systems (DBMS). The two are completely different and require different answers.
A DBMS is a tool that allows you to access a DB.
Other than the data itself, a DB is the concept of how that data is structured.
So just as you can program with an object-oriented methodology using a compiler without OO support, or vice versa, you can set up a relational database without an RDBMS or use an RDBMS to store non-relational data.
I'll focus on what Relational Database (RDB) means and leave the discussion about what systems do to others.
A relational database (the concept) is a data structure that allows you to link information from different 'tables', or different types of data buckets. A data bucket must contain what is called a key or index (that allows to uniquely identify any atomic chunk of data within the bucket). Other data buckets may refer to that key so as to create a link between their data atoms and the atom pointed to by the key.
A non-relational database just stores data without explicit and structured mechanisms to link data from different buckets to one another.
As to implementing such a scheme, if you have a paper file with an index and in a different paper file you refer to the index to get at the relevant information, then you have implemented a relational database, albeit quite a simple one. So you see that you do not even need a computer (of course it can become tedious very quickly without one to help), similarly you do not need an RDBMS, though arguably an RDBMS is the right tool for the job. That said there are variations as to what the different tools out there can do so choosing the right tool for the job may not be all that straightforward.
I hope this is layman terms enough and is helpful to your understanding.
Relational databases have a mathematical basis (set theory, relational theory), which are distilled into SQL == Structured Query Language.
NoSQL's many forms (e.g. document-based, graph-based, object-based, key-value store, etc.) may or may not be based on a single underpinning mathematical theory. As S. Lott has correctly pointed out, hierarchical data stores do indeed have a mathematical basis. The same might be said for graph databases.
I'm not aware of a universal query language for NoSQL databases.
Most of what you "know" is wrong.
First of all, as a few of the relational gurus routinely (and sometimes stridently) point out, SQL doesn't really fit nearly as closely with relational theory as many people think. Second, most of the differences in "NoSQL" stuff has relatively little to do with whether it's relational or not. Finally, it's pretty difficult to say how "NoSQL" differs from SQL because both represent a pretty wide range of possibilities.
The one major difference that you can count on is that almost anything that supports SQL supports things like triggers in the database itself -- i.e. you can design rules into the database proper that are intended to ensure that the data is always internally consistent. For example, you can set things up so your database asserts that a person must have an address. If you do so, anytime you add a person, it will basically force you to associate that person with some address. You might add a new address or you might associate them with some existing address, but one way or another, the person must have an address. Likewise, if you delete an address, it'll force you to either remove all the people currently at that address, or associate each with some other address. You can do the same for other relationships, such as saying every person must have a mother, every office must have a phone number, etc.
Note that these sorts of things are also guaranteed to happen atomically, so if somebody else looks at the database as you're adding the person, they'll either not see the person at all, or else they'll see the person with the address (or the mother, etc.)
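A minimal sketch of the "a person must have an address" rule, using Python's built-in sqlite3 module. Note that it uses declarative NOT NULL and FOREIGN KEY constraints rather than triggers, and that the table names, column names and rows are assumptions for illustration.

```python
import sqlite3

def make_db():
    """Create hypothetical address/person tables with enforced constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL)"
    )
    conn.execute(
        """
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            address_id INTEGER NOT NULL REFERENCES address(id)
        )
        """
    )
    conn.execute("INSERT INTO address (id, street) VALUES (1, '1 Main St')")
    conn.execute("INSERT INTO person (id, name, address_id) VALUES (1, 'Ann', 1)")
    conn.commit()
    return conn

def rejected(conn, sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

With this schema, inserting a person without an address, inserting a person pointing at a nonexistent address, and deleting an address that still has people attached are all refused by the database itself, with no application code involved.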
Most of the NoSQL databases do not attempt to provide this kind of enforcement in the database proper. It's up to you, in the code that uses the database, to enforce any relationships necessary for your data. In most cases, it's also possible to see data that's only partially correct, so even if you have a family tree where every person is supposed to be associated with parents, there can be times that whatever constraints you've imposed won't really be enforced. Some will let you do that at will. Others guarantee that it only happens temporarily, though exactly how long it can/will last can be open to question.
The relational database uses a formal system of predicates to address data. The underlying physical implementation is of no substance and can vary to optimize for certain operations, but it must always assume the relational model. In layman's terms, that's just saying I know exactly how many values (attributes) each row (tuple) in my table (relation) has and now I want to exploit the fact accordingly, thoroughly and to its extreme. That's the true nature of the beast.
Since we're obviously the generation that has had a relational upbringing, if you look at NoSQL database models from the perspective of the relational model, again in layman's terms, the first obvious difference is that no assumptions about the number of values a row can contain is ever made. This is really oversimplifying the matter and does not cleanly apply to the intricacies of the physical models of every NoSQL database, but it's the pinnacle of the relational model and the first assumption we have to leave behind or, if you'd rather, the biggest leap we have to make.
We can agree to two things that are true for every DBMS: it can store any kind of data and has enough mathematical underpinnings to make it possible to manage the data in any way imaginable. The reality is that you'll never want to make the mistake of putting any of the two points to the test, but rather just stick with what the actual DBMS was really made for. In layman's terms: respect the beast within!
(Please note that I've avoided comparing the (obviously) well founded standards revolving around the relational model against the many flavors provided by NoSQL databases. If you'd like, consider NoSQL databases as an umbrella term for any DBMS that does not completely assume the relational model, in exclusion to everything else. The differences are too many, but that's the principal difference and the one I think would be of most use to you to understand the two.)
I'll try to explain this at a slightly more technical level.
Take MongoDB and traditional SQL for comparison, and imagine the scenario of posting a tweet on Twitter. This tweet contains 9 pictures. How do you store this tweet and its corresponding pictures?
In a traditional relational SQL database, you can store the tweets and pictures in separate tables, and represent the connection between them with a third table.
What's more, you can set a field which is an image type, and zip the 9 pictures into a binary document and store it in this field.
Using MongoDB, you could build a document like this (a document is roughly analogous to a row in a relational table, and a collection to the table itself):
{
  "id": "XXX",
  "user": "XXX",
  "date": "xxxx-xx-xx",
  "content": {
    "text": "XXXX",
    "picture": ["p1.png", "p2.png", "p3.png"]
  }
}
Therefore, in my opinion, the main difference is about how do you store the data and the storage level of the relationships between them.
In this example, the data is the tweet and the pictures. The level at which the relationship between them is stored also plays an important role in the difference between the two.
I hope this small example helps show the difference between SQL and NoSQL (ACID and BASE).
Here's a link of picture about the goals of NoSQL from the Internet:
http://icamchuwordpress-wordpress.stor.sinaapp.com/uploads/2015/01/dbc795f6f262e9d01fa0ab9b323b2dd1_b.png
The difference between relational and non-relational is exactly that. The relational database architecture provides constraint objects such as primary keys, foreign keys, etc. that allow one to tie two or more tables together in a relation. This is good because when we normalize our tables, which is to say split the information the database represents into many different tables, one can still keep the integrity of the data.
For example, say you have a series of tables that house information about an employee. You could not delete a record from a table without deleting all the records that pertain to it from the other tables. In this way you implement data integrity. The non-relational database doesn't provide these constraint constructs that allow you to implement data integrity.
Unless you implement this constraint in the front-end application that is used to populate the database's tables, you are creating a mess that can be compared with the wild west.
First up let me start by saying why we need a database.
We need a database to help organise information in such a manner that we can retrieve the stored data in an efficient manner.
Examples of relational database management systems(SQL):
1)Oracle Database
2)SQLite
3)PostgreSQL
4)MySQL
5)Microsoft SQL Server
6)IBM DB2
Examples of non relational database management systems(NoSQL)
1)MongoDB
2)Cassandra
3)Redis
4)Couchbase
5)HBase
6)DocumentDB
7)Neo4j
Relational databases have normalized data: information is stored in tables as rows and columns. Normalization helps reduce data redundancy, and the data in the tables is normally related, so we can retrieve it as needed with join queries. This suits workloads with more writes than reads and modest data volumes, and updating data in tables is relatively easy compared with non-relational databases. Horizontal scaling is typically difficult; vertical scaling is possible to some extent. CAP (Consistency, Availability, Partition tolerance) and ACID (Atomicity, Consistency, Isolation, Durability) compliance.
Let me show entering data to a relational database using PostgreSQL as an example.
First create a product table as follows:
CREATE TABLE products (
product_no integer,
name text,
price numeric
);
then insert the data
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
Let's look at another different example:
In a relational database, we can link the student table and the subject table through a relationship, via a foreign key on subject ID. In a non-relational database there is no need for two documents, since there are no relationships, so we store all the subject details and student details in one document, say a student document. The data then gets duplicated, which makes updating records troublesome.
In non-relational databases there is no fixed schema and data is not normalized. No relationships between data are created, and all data is mostly put in one document. This is well suited to handling lots of data and transferring lots of data at once, and works best with many reads and few writes and updates. Querying is a bit difficult, as there is no fixed schema. Horizontal and vertical scaling is possible. CAP (Consistency, Availability, Partition tolerance) and BASE (Basically Available, Soft state, Eventually consistent) compliance.
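A small sketch of the relational side of this, using Python's built-in sqlite3 module. The table layout, the column names and the sample names are assumptions for illustration. The point is that the lecturer is stored once in the subject table, so a single UPDATE is seen by every student row through the join.

```python
import sqlite3

def lecturer_update_demo():
    """Link students to a subject by foreign key, then update the subject once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subject ("
        "subject_id INTEGER PRIMARY KEY, name TEXT, lecturer TEXT)"
    )
    conn.execute(
        "CREATE TABLE student ("
        "student_id INTEGER PRIMARY KEY, name TEXT, "
        "subject_id INTEGER REFERENCES subject(subject_id))"
    )
    conn.execute(
        "INSERT INTO subject (subject_id, name, lecturer) "
        "VALUES (1, 'Mathematics', 'Mr. Oak')"
    )
    conn.executemany(
        "INSERT INTO student (student_id, name, subject_id) VALUES (?, ?, ?)",
        [(1, "Ash", 1), (2, "Ben", 1)],
    )
    # One update, in one place; no per-student copies to keep in sync.
    conn.execute("UPDATE subject SET lecturer = 'Ms. Elm' WHERE subject_id = 1")
    rows = conn.execute(
        """
        SELECT student.name, subject.name, subject.lecturer
        FROM student
        JOIN subject ON student.subject_id = subject.subject_id
        ORDER BY student.student_id
        """
    ).fetchall()
    conn.close()
    return rows
```

With one document per student that embeds the subject details, the same change would have to be applied to every student document that mentions the subject.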
Let me show an example to enter data to a non relational database using Mongodb
db.users.insertOne({name: 'Mary', age: 28, occupation: 'writer'})
db.users.insertOne({name: 'Ben', age: 21})
So within the database referenced by db there is a collection called users, and insertOne is the method that adds a document to it. There is no fixed schema: our first record has 3 attributes and the second has only 2. This is no problem in non-relational databases, but it cannot be done in relational databases, which have a fixed schema.
Let's look at another different example
({Studname: 'Ash', Subname: 'Mathematics', LecturerName: 'Mr. Oak'})
Here we can see that in a non-relational database we can put both student details and subject details into one document, as no relationships are defined. But this can lead to data duplication, and hence to errors when updating.
Hope this explains everything
In layman terms it's strongly structured vs unstructured, which implies that you have different degrees of adaptability for your DB.
Differences arise in indexing in particular, as you need to ensure that a certain reference index can link to another item; this is a relation. The stricter structure of a relational DB comes from this requirement.
Note that NosDB apparently provides both relational and non-relational DBs and a way to query both: http://www.alachisoft.com/nosdb/sql-cheat-sheet.html