Find the mode in classical relational algebra with bound conditions - relational-database

I'm given a table with two columns: id_ship and date. I need to find the ids of the ships that appear the maximum number of times (that is, the ships that have the maximum number of dates), assuming that a ship cannot have more than three dates.
It's easy to find the ids of the ships that have three dates, just by selecting from the triple Cartesian product of the table. However, I'm not able to check whether the result obtained is empty or not. If it's not empty, I would have to choose this view, while if it is empty I would need to look for the ships that have two dates and check again whether that is empty. How can I proceed? Remember I cannot use extended relational algebra, so aggregate functions and extended projection are not allowed.
EDIT: Sorry, I thought the concept "classical relational algebra" was standard. The operations supported by my RA are:
select, project, rename, union, difference, intersect, cartesian product, natural join, join with condition and division

The purpose of this exercise is to avoid aggregate operators or grouping. It'll be a good demonstration of why those are useful.
So you're saying that for each id_ship in the given table, there might be 3, 2, or 1 occurrences. Presumably for your "triple cartesian product" (which'll be a self-product with renaming) you're going to produce a row with three dates ascending left-to-right (so you can check the self-product doesn't duplicate dates).
Ok then for the 2-occurrence cases, you need a double cartesian product, dates ascending left-to-right. Exclude those id-ships that already appear in the triples.
For the 1-occurrence case, exclude those id-ships that appear in either of the above.
Then UNION together the three results:
SELECT id_ship, 3 AS count
FROM occ3
UNION
SELECT id_ship, 2 AS count
FROM occ2
UNION
SELECT id_ship, 1 AS count
FROM occ1 ;
Now you have a conventional group-style table with a count.
Ah, but you wanted RA not SQL? Then we bump into the difficulties of which version of RA. The versions differ in what operators are available. And for example the 'granddaddy' Codd 1972 version didn't even include rename, so you couldn't even produce that triple Cartesian Product.
All versions support UNION OK; all support projection, which is what you need to get the id-ship out of the 3 separate query results.
Supposing your RA supports relation literals: {{count 3}} is a (singleton) set of tuples, each tuple being a set of attribute name-value pairs
(pi<id-ship>(occ3) x {{count 3}})
UNION
(pi<id-ship>(occ2) x {{count 2}})
UNION
(pi<id-ship>(occ1) x {{count 1}})
If your RA doesn't support relation literals, it might support an EXTEND operation
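As a rough illustration of the steps above, here is a minimal Python sketch that models relations as sets of tuples and uses only product, selection, projection, difference and union. The sample data and the names `ships`, `occ3`, `occ2`, `occ1`, `masked` and `mode` are assumptions made up for illustration. The last step goes slightly beyond the answer: it is one possible way to pick the maximal group without testing for emptiness, relying on the fact that a product with an empty relation is itself empty.

```python
from itertools import product

# Hypothetical sample relation: a set of (id_ship, date) tuples.
ships = {
    (1, "2020-01-01"), (1, "2020-02-01"),
    (2, "2020-01-05"),
    (3, "2020-03-01"), (3, "2020-04-01"),
}

# Triple self-product, same ship, dates strictly ascending: ships with 3 dates.
occ3 = {a[0] for a, b, c in product(ships, repeat=3)
        if a[0] == b[0] == c[0] and a[1] < b[1] < c[1]}

# Double self-product finds ships with at least 2 dates; remove the triples.
occ2 = {a[0] for a, b in product(ships, repeat=2)
        if a[0] == b[0] and a[1] < b[1]} - occ3

# Everything not in either result has exactly one date.
occ1 = {s[0] for s in ships} - occ3 - occ2

# Union of the three results, each paired with its constant count
# (the set-based analogue of crossing with a one-tuple relation literal).
counts = ({(i, 3) for i in occ3}
          | {(i, 2) for i in occ2}
          | {(i, 1) for i in occ1})


def masked(r, s):
    """Project r's ids out of r x s: equals r if s is non-empty, else empty."""
    return {i for i, _ in product(r, s)}


# Keep occ2 only if occ3 is empty; keep occ1 only if occ3 and occ2 are empty.
mode = (occ3
        | (occ2 - masked(occ2, occ3))
        | (occ1 - masked(occ1, occ3) - masked(occ1, occ2)))

print(sorted(counts))
print(sorted(mode))
```

In the sample data no ship has three dates, so `occ3` is empty and the ships with two dates end up in `mode`.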

Related

Why do we use UNION to join rows in SQL?

Why use UNION in SQL at all, when the columns must have the same order and names in both SELECT statements? Couldn't we just update or alter the table to add more rows?
The reason that you would not just keep adding rows to a single table is that they don't belong in that table.
For the same reason that if you're doing arithmetic between two integer variables like x + y, you don't permanently add the value of y to x. You want to preserve x's value as its own thing, even though sometimes you also need the sum.
A book like SQL and Relational Theory: How to Write Accurate SQL Code makes clear that there's a difference between a relation and a relvar. A relvar is like a table. It's a persistent storage of a specific set of rows.
A relation is the result of a SQL expression like SELECT or VALUES. That relation may not be stored in any relvar; it is ephemeral. Perhaps it's the result of a more complex query that uses expressions, joins, and so on.
By analogy, a number like 42 is an integer value. But int x that stores an integer value 42 is an integer variable. They can both be used as an operand for + but they're not the same kind of thing.
You can UNION two relations, if their columns are compatible in number and data type. Those relations aren't necessarily just relvars, they could be the result of other subqueries.
Just like in arithmetic, you can add x and a whole other integer expression.
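To make the relvar/relation distinction concrete, here is a small sketch using Python's built-in `sqlite3` module. The table name `employees` and its rows are hypothetical. It shows a stored table being combined with an ephemeral one-row relation through UNION, while the stored table itself stays unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relvar: a stored table holding a specific set of rows (hypothetical data).
conn.execute("CREATE TABLE employees (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [("Ann",), ("Bob",)])

# UNION the stored table with an ephemeral relation produced by a SELECT.
rows = conn.execute(
    "SELECT name FROM employees UNION SELECT 'Cy' ORDER BY name"
).fetchall()

# The table still holds only its own rows.
stored = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

print(rows)
print(stored)
```

The extra row exists only in the query result, in the same way that computing x + y does not change x.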

Is HAVING ever necessary for non-aggregated grouped columns?

I have some code that generates SQL and need to understand if HAVING is ever necessary (or useful) for non-aggregated grouped columns? I haven't found any examples that suggest it is but wanted to check here.
The MySQL docs have this comment: "The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions."
I know that HAVING is necessary for aggregated conditions on groups, and also understand that WHERE can be used for non-aggregated grouped columns (which can be more efficient than HAVING), but my question is this:
Is HAVING ever necessary (or useful) for non-aggregated grouped columns?
Thanks
HAVING is specifically designed for aggregated columns. MySQL allows non-aggregated columns in the HAVING clause. There are three use cases that I can think of:
An efficiency hack, when all rows in a group have identical values for the column.
An error, which should be avoided.
An efficiency hack, when there is no aggregation.
The first could conceivably be used in a situation like this:
select l.*, sum(x.y)
from list l join
. . .
group by l.listid
having l.foo = 'bar';
This works because all l.foo should have the same value for a given l.listid (assuming l.listid is a primary key). In this case, this filters the data as if you used where.
BUT, if this condition is not true, then the HAVING/WHERE equivalence is not true. The HAVING will choose a value from an indeterminate row and then filter the resulting aggregation column. The WHERE does the filtering before the aggregation. So, if lists could have the same type and you do:
select l.*, sum(x.y)
from list l join
. . .
group by l.type
having l.foo = 'bar';
This is a badly formed query (hence an error in my opinion), but is not equivalent to moving the condition to the WHERE.
The third situation is where there is no aggregation:
select l.*, concat('a', 'b', 'c') as test
from list l
having test = 'abc';
This is a convenience in MySQL. Other dialects would use a subquery. MySQL has traditionally materialized such subqueries, which introduces inefficiency.
No.
WHERE condition is applied before the rows are grouped, HAVING is applied on the grouped rows. If no aggregated column is used, you're selecting based on a single row value anyways, so the only (semantic) difference is whether the rows will be selected before grouping or after it - but the result will be the same.
(Note that this difference may not even be true in practice, the optimizer might reorder the operations.)
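A quick way to check this claim is to run both forms side by side. The sketch below uses Python's `sqlite3` module with a hypothetical `items` table; it filters on the grouping column once with WHERE and once with HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: two rows of type 'a', one row of type 'b'.
conn.execute("CREATE TABLE items (type TEXT, y INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 5)]
)

# Filter on the grouping column before grouping.
with_where = conn.execute(
    "SELECT type, SUM(y) FROM items WHERE type = 'a' GROUP BY type"
).fetchall()

# Filter on the same grouping column after grouping.
with_having = conn.execute(
    "SELECT type, SUM(y) FROM items GROUP BY type HAVING type = 'a'"
).fetchall()

print(with_where, with_having)
```

Both queries return the same single group here, because the condition refers only to a column that appears in the GROUP BY clause.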

Where clause with one column and multiple criteria returning one row instead of 13

I have a simple query with a few rows and multiple criteria in the where clause but it is only returning one row instead of 13. No joins and the syntax was triple checked and appears to be free of errors.
Query:
select column1, column2, column3
from mydb
where onecolumn in (number1, number2....number13)
Results:
returns one row of data associated with a random number in the where clause
I've spent a big part of the day trying to figure this one out and am now out of ideas. Please help...
Absent a more detailed test case, and the actual SQL statement that is actually running, this question cannot be answered. Here are some "ideas"...
Our first guess is that the rows you think are going to satisfy the predicates aren't actually satisfying all of the conditions.
Our second guess is that you've got an aggregate expression (COUNT(), MAX(), SUM()) in the SELECT list that's causing an implicit GROUP BY. This is a common "gotcha"... the non-standard MySQL extension to GROUP BY which allows non-aggregates to appear in the SELECT list, which are not also included as expressions in the GROUP BY clause. This same gotcha appears when the GROUP BY clause is omitted entirely, and an aggregate is included in the SELECT list.
But the question doesn't make any mention of an aggregate expression in the SELECT list.
Our third guess is another issue that beginners frequently overlook: the order of precedence of operations, especially AND and OR. For example, consider the expressions:
a AND b OR c
a AND ( b OR c )
( a AND b ) OR c
consider those while we sing-along, Sesame Street style,...: "One of these things is not like the others, one of these things just doesn't belong..."
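For anyone who would rather test than sing, here is a small sketch that evaluates the three expressions with Python's `sqlite3` module. The values chosen for a, b and c (false, false, true) are an assumption picked so that the grouping matters; SQLite returns 1 for true and 0 for false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical truth values: a is false, b is false, c is true.
params = {"a": 0, "b": 0, "c": 1}

unparenthesized, and_of_or, or_of_and = conn.execute(
    "SELECT :a AND :b OR :c, :a AND (:b OR :c), (:a AND :b) OR :c",
    params,
).fetchone()

print(unparenthesized, and_of_or, or_of_and)
```

AND binds more tightly than OR, so the unparenthesized form agrees with the third expression, and the second one is the odd one out.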
A fourth guess... if it wasn't for the row being returned having a value of onecolumn as a random number in the IN list... if it was instead the first number in the IN list, we'd be very suspicious that the IN list actually contains a single string value that looks like a list of values, but is actually not.
The two expressions in the SELECT list look very similar, but they are very different:
SELECT t.n IN (2,3,5,7) AS n_in_list
, t.n IN ('2,3,5,7') AS n_in_string
FROM ( SELECT 2 AS n
UNION ALL SELECT 3
UNION ALL SELECT 5
) t
The first expression is comparing n to each value in a list of four values.
The second expression is equivalent to t.n IN (2).
This is a frequent trip up when neophytes are dynamically creating SQL text, thinking that they can pass in a string value and that MySQL will see the commas within the string as part of the SQL statement.
(But this doesn't explain how the one row returned matches a seemingly random number in the list.)
Those are all just guesses. Those are some of the most frequent trip ups we see, but we're just guessing. It could be something else entirely. In its current form, there is no definitive "answer" to the question.
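The string-versus-list trap from the fourth guess often shows up when the SQL is built in application code. Below is a minimal sketch with Python's `sqlite3` module, using a hypothetical table `t` and list `wanted`. Note that SQLite does not coerce the string to a number the way MySQL does, so the exact result of the mistaken query differs between the two engines; in neither case does it match all the intended rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(2,), (3,), (5,)])

wanted = [2, 3, 5, 7]  # hypothetical values to look up

# Mistake: one placeholder bound to a single string such as '2,3,5,7'.
as_string = conn.execute(
    "SELECT n FROM t WHERE n IN (?)", (",".join(map(str, wanted)),)
).fetchall()

# Fix: one placeholder per value, so the IN list really has four members.
placeholders = ", ".join("?" for _ in wanted)
as_list = conn.execute(
    f"SELECT n FROM t WHERE n IN ({placeholders}) ORDER BY n", wanted
).fetchall()

print(len(as_string), as_list)
```

Only the placeholders are interpolated into the SQL text; the values themselves are still passed as bound parameters.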

Query for multiple conditions in MySQL

I want to be able to query for multiple conditions when I have a table that connects the ids from two other tables.
My three tables
destination:
id_destination, name_destination
keyword:
id_keyword, name_keyword
destination_keyword:
id_keyword, id_destination
Where the last one connects ids from the destination- and the keyword table, in order to associate destination with keywords.
A query to get the destination based on keyword would then look like
SELECT destination.name_destination FROM destination
NATURAL JOIN destination_keyword
NATURAL JOIN keyword
WHERE keyword.name_keyword like _keyword_
Is it possible to query for multiple keywords? Let's say I wanted to get the destinations that match all or some of the keywords in the list sunny, ocean, fishing and order by number of matches. How would I move forward? Should I restructure my tables? I am sort of new to SQL and would very much like some input.
Order your table joins starting with keyword and use a count of the number of times the destination is joined:
select
d.id_destination,
d.name_destination,
count(d.id_destination) as matches
from keyword k
join destination_keyword dk on dk.id_keyword = k.id_keyword
join destination d on d.id_destination = dk.id_destination
where name_keyword in ('sunny', 'ocean', 'fishing')
group by 1, 2
order by 3 desc
This query assumes that name_keyword values are single words like "sunny".
Using natural joins is not a good idea, because if the table structures change such that two naturally joined tables gain columns with the same name, your query will suddenly stop working. Also, by explicitly declaring the join condition, readers of your code will immediately understand how the tables are joined, and can modify it to add non-key conditions as required.
Requiring that only key columns share the same name is also restrictive, because it requires unnatural column names like "name_keyword" instead of simply "name": the suffix "_keyword" is redundant, adds no value, and exists only because you have to have it when using natural joins.
Natural joins save hardly any typing (and often cause more typing overall), impose limitations on join types and names, and are brittle.
They are to be avoided.
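For reference, the explicit-join approach can be tried end to end with Python's `sqlite3` module. This is only a sketch: the table and column names follow the question, while the sample destinations and keyword assignments are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE destination (id_destination INTEGER, name_destination TEXT);
    CREATE TABLE keyword (id_keyword INTEGER, name_keyword TEXT);
    CREATE TABLE destination_keyword (id_keyword INTEGER, id_destination INTEGER);

    -- Hypothetical sample data.
    INSERT INTO destination VALUES (1, 'Beach'), (2, 'Lake'), (3, 'City');
    INSERT INTO keyword VALUES (1, 'sunny'), (2, 'ocean'), (3, 'fishing'), (4, 'museum');
    INSERT INTO destination_keyword VALUES
        (1, 1), (2, 1), (3, 1),  -- Beach: sunny, ocean, fishing
        (1, 2), (3, 2),          -- Lake: sunny, fishing
        (4, 3);                  -- City: museum
""")

# Count how many of the requested keywords each destination matches.
matches = conn.execute("""
    SELECT d.name_destination, COUNT(*) AS matches
    FROM keyword k
    JOIN destination_keyword dk ON dk.id_keyword = k.id_keyword
    JOIN destination d ON d.id_destination = dk.id_destination
    WHERE k.name_keyword IN ('sunny', 'ocean', 'fishing')
    GROUP BY d.id_destination, d.name_destination
    ORDER BY matches DESC
""").fetchall()

print(matches)
```

Destinations that match none of the keywords drop out of the result entirely, since the inner joins only keep matching rows.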
You can try something like the following:
SELECT dest.name_destination, count(*) FROM destination dest, destination_keyword dest_key, keyword kw
WHERE kw.id_keyword = dest_key.id_keyword
AND dest_key.id_destination = dest.id_destination
AND kw.name_keyword IN ('sunny', 'ocean', 'fishing')
GROUP BY dest.name_destination
ORDER BY count(*) DESC, dest.name_destination
Haven't tested it, but if it is not correct it should show you the way to accomplish it.
You can do multiple LIKE conditions:
Column LIKE 'value1' OR Column LIKE 'value2' OR ...
Or you could do a regular expression match:
Column REGEXP 'something|somtthing|whatever'
The trick to ordering by number of matches has to do with understanding the GROUP BY clause and the ORDER BY clause. You either want one count for everything, or you want one count per something. So for the first case you just use the COUNT function by itself. In the second case you use the GROUP BY clause to "group" somethings/categories that you want counted. ORDER BY should be pretty straight forward.
I think based on the information you have provided your table structure is fine.
Hope this helps.
DISCLAIMER: My syntax isn't accurate.

How to grab rows which contain a dual id reference

I have a messages table:
messages:
id(int)
send_id(int)
receive_id(int)
And I want to be able to select rows from this only when a->b and b->a exist, e.g.:
id send_id receive_id
0, 15, 16
1, 16, 15
So that basically one message has been passed to each person. How would I be able to go about selecting just one of those two rows (either send or receive), and all of those for a specific id?
I want to only return results that have this duality.
My code currently uses a nested SELECT and doesn't work at all as needed.
You can achieve the result by taking advantage of MySQL's LEAST and GREATEST built-in functions.
SELECT *
FROM messages
WHERE (LEAST(send_id, receive_id), GREATEST(send_id, receive_id), id)
IN
(
SELECT LEAST(send_id, receive_id) as x,
GREATEST(send_id, receive_id) as y,
MAX(id) msg_ID
FROM messages
GROUP BY x, y
);
SQLFiddle Demo
MySQL Comparison Operator (LEAST/GREATEST)
You have to define an additional synthesized column for this. Different alternatives: permanent as an index (fast), temporary if just for a selection once a month or on-the-fly inside the actual query.
Whatever alternative, that column should contain both ids, ordered in a numerical way and concatenated, maybe by some separation character like -. Now when you make a uniqueness restriction to that column only one of the two candidates can be entered into the result, the second one is rejected because it would violate that uniqueness rule.
The trick is the ordered concatenation instead of a normal combined index that would allow both variants due to the different order of ids.
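The ordered-pair idea can also be computed on the fly inside the query. The sketch below uses Python's `sqlite3` module, where the two-argument scalar `min()` and `max()` stand in for MySQL's LEAST and GREATEST. The sample rows, the aliases `lo`, `hi` and `last_id`, and the HAVING condition are assumptions added for illustration: the condition keeps only pairs where messages went in both directions, which is one possible reading of the "duality" asked for in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, send_id INTEGER, receive_id INTEGER)"
)

# Hypothetical data: 15 and 16 wrote to each other, 15 wrote to 17 one-way.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(0, 15, 16), (1, 16, 15), (2, 15, 17)],
)

# Normalise each row to an ordered pair (lo, hi) and group on that pair.
# Two distinct senders within a pair means both directions exist.
pairs = conn.execute("""
    SELECT min(send_id, receive_id) AS lo,
           max(send_id, receive_id) AS hi,
           MAX(id) AS last_id
    FROM messages
    GROUP BY lo, hi
    HAVING COUNT(DISTINCT send_id) = 2
""").fetchall()

print(pairs)
```

Each reciprocal pair yields exactly one row, identified by the id of its most recent message; the one-way message from 15 to 17 is filtered out.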