Efficient way to do unions and intersections in MySQL

I have a MySQL table with columns: name and label. If a person, "Bob" has the labels "cool","funny", and "childish", my table would have the corresponding rows: (Bob, cool), (Bob, funny), and (Bob, childish).
Is there an efficient way to select people based on labels with a boolean query? For example, in pseudo-SQL: SELECT name WHERE person IS (COOL OR NOT FUNNY) AND NOT CHILDISH.
I think I could hack something together using UNION, JOIN, maybe some sub-queries, but I was wondering if there was an efficient way to do this.
EDIT:
As of now, I am planning to distribute the AND, i.e. ((COOL OR NOT FUNNY) AND NOT CHILDISH) => (COOL AND NOT CHILDISH) OR (NOT FUNNY AND NOT CHILDISH). Then I can compute each of the OR'd parts with something like:
SELECT DISTINCT a.name
FROM `tags` AS a
JOIN `tags` AS b ON (a.label='cool' AND a.name=b.name AND b.name NOT IN (
SELECT name FROM `tags` WHERE label='funny'))
JOIN `tags` AS c ON (a.name=c.name AND c.label='childish')
# for "COOL AND NOT FUNNY AND CHILDISH"
And then use UNION to join them together.
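Here is a minimal sketch of the distributed plan. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the SQL avoids anything MySQL-specific. Instead of chained self-joins, each OR'd conjunct becomes a set of EXISTS / NOT EXISTS filters over the distinct names, and UNION merges the conjuncts. The tags table, the sample rows, and the helper names are hypothetical.

```python
import sqlite3

def make_db():
    """In-memory tags(name, label) table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (name TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [
            ("Bob", "cool"), ("Bob", "funny"), ("Bob", "childish"),
            ("Ann", "cool"), ("Ann", "funny"),
            ("Cid", "smart"),
            ("Dee", "funny"),
        ],
    )
    return conn

# (COOL AND NOT CHILDISH) OR (NOT FUNNY AND NOT CHILDISH):
# one SELECT per OR'd conjunct, merged (and de-duplicated) by UNION.
DNF_QUERY = """
SELECT name FROM (SELECT DISTINCT name FROM tags) p
WHERE EXISTS (SELECT 1 FROM tags t WHERE t.name = p.name AND t.label = 'cool')
  AND NOT EXISTS (SELECT 1 FROM tags t WHERE t.name = p.name AND t.label = 'childish')
UNION
SELECT name FROM (SELECT DISTINCT name FROM tags) p
WHERE NOT EXISTS (SELECT 1 FROM tags t WHERE t.name = p.name AND t.label = 'funny')
  AND NOT EXISTS (SELECT 1 FROM tags t WHERE t.name = p.name AND t.label = 'childish')
"""

def dnf_names(conn):
    """Run the query and return the matching names in sorted order."""
    return sorted(row[0] for row in conn.execute(DNF_QUERY))

print(dnf_names(make_db()))  # ['Ann', 'Cid']
```

Bob is excluded by NOT CHILDISH. Dee fails both conjuncts: she is funny, so the second fails, and she is not cool, so the first fails.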

For the negative checks, a clean way is a set difference. Oracle calls it MINUS; MySQL spells it EXCEPT from 8.0.31 on, and earlier MySQL versions support neither:
SELECT NAME
FROM NAME_LABEL
WHERE LABEL IN ('COOL') -- use IN for easy matching of multiple labels
UNION
SELECT NAME
FROM NAME_LABEL NL
WHERE NOT EXISTS (SELECT * FROM NAME_LABEL WHERE NAME = NL.NAME AND LABEL IN ('FUNNY'))
EXCEPT
SELECT NAME
FROM NAME_LABEL
WHERE LABEL IN ('CHILDISH');
EXCEPT (MINUS) returns the distinct rows of the first query that do not appear in the second query. UNION and EXCEPT have equal precedence and are evaluated left to right, so this computes (COOL OR NOT FUNNY) AND NOT CHILDISH.
Performance would be better with an index on LABEL:
CREATE INDEX NAME_LABEL_LABEL ON NAME_LABEL(LABEL);
Unfortunately, the "NOT FUNNY" check requires a NOT EXISTS subquery. If you use a join, the MySQL query optimizer turns it into a subselect anyway :(
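The answer's query can be tried with Python's sqlite3 module. Like MySQL 8.0.31+, SQLite spells the set difference EXCEPT. SQLite groups compound operators left to right, so the query computes (COOL UNION NOT FUNNY) EXCEPT CHILDISH. The sample rows are hypothetical.

```python
import sqlite3

def make_db():
    """NAME_LABEL table with an index on LABEL and hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE NAME_LABEL (NAME TEXT, LABEL TEXT)")
    conn.execute("CREATE INDEX NAME_LABEL_LABEL ON NAME_LABEL(LABEL)")
    conn.executemany(
        "INSERT INTO NAME_LABEL VALUES (?, ?)",
        [
            ("Bob", "COOL"), ("Bob", "FUNNY"), ("Bob", "CHILDISH"),
            ("Ann", "COOL"), ("Ann", "FUNNY"),
            ("Cid", "SMART"),
            ("Dee", "FUNNY"),
        ],
    )
    return conn

# (COOL OR NOT FUNNY) AND NOT CHILDISH, evaluated as (A UNION B) EXCEPT C.
QUERY = """
SELECT NAME FROM NAME_LABEL WHERE LABEL IN ('COOL')
UNION
SELECT NAME FROM NAME_LABEL NL
WHERE NOT EXISTS (SELECT * FROM NAME_LABEL WHERE NAME = NL.NAME AND LABEL IN ('FUNNY'))
EXCEPT
SELECT NAME FROM NAME_LABEL WHERE LABEL IN ('CHILDISH')
"""

def matching_names(conn):
    """Return the names selected by the set-operation query, sorted."""
    return sorted(row[0] for row in conn.execute(QUERY))

print(matching_names(make_db()))  # ['Ann', 'Cid']
```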

Related

MySQL: create a named table from values in a SELECT

I have a table, System, with a bunch of fields including System.serial.
I have a list of serial numbers that I want to get the status of.
Simple enough:
Select * from System where System.serial in ('s1','s2', 'sn');
However the list of serial numbers also has serials NOT IN the System table.
Obviously they are not in the results.
I want the missing serials to show in the results also but with no data.
The best way I can think of doing this is to make a temporary table with one column, serial, and then left join System on it.
How can I do this without creating a temporary table?
Something like:
Select listOfSerials.serial, System.*
from (Select ('s1','s2', 'sn') as serial ) as ListOfSerials
left join System on System.serial = ListOfSerials.serial;
You're on the right track with your solution of creating a virtual table with which to do LEFT JOIN against your real data.
You can create a derived table as a series of UNIONed SELECT statements that select literal values with no table reference.
SELECT listOfSerials.serial, System.*
FROM (
SELECT 's1' AS serial
UNION SELECT 's2'
UNION SELECT 'sn'
) AS ListOfSerials
LEFT JOIN System ON System.serial = ListOfSerials.serial;
You only need to define a column alias in the first SELECT in the UNION. The remaining SELECTs inherit that column name, and any aliases you give them are ignored.
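When the list of serials comes from application code, you can generate the derived table with one placeholder per value. The literals are then bound as parameters instead of being pasted into the SQL. The sketch below uses Python's sqlite3 as a stand-in for MySQL. The status column, the sample rows, and the serial_status helper are hypothetical.

```python
import sqlite3

def make_db():
    """System table with two known serials (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE System (serial TEXT PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO System VALUES (?, ?)", [("s1", "ok"), ("s2", "failed")]
    )
    return conn

def serial_status(conn, serials):
    """Return (serial, status) for every requested serial; status is None if unknown."""
    if not serials:
        return []
    # Only the first SELECT needs the alias; the others inherit the column name.
    derived = " UNION ".join(["SELECT ? AS serial"] + ["SELECT ?"] * (len(serials) - 1))
    sql = f"""
        SELECT ListOfSerials.serial, System.status
        FROM ({derived}) AS ListOfSerials
        LEFT JOIN System ON System.serial = ListOfSerials.serial
        ORDER BY ListOfSerials.serial
    """
    return conn.execute(sql, list(serials)).fetchall()

print(serial_status(make_db(), ["s1", "s2", "sn"]))
# [('s1', 'ok'), ('s2', 'failed'), ('sn', None)]
```

Very long lists may hit engine limits on compound SELECTs or bound parameters; SQLite, for instance, caps the number of terms in a compound SELECT. At that point a reference table is the better fit.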
Creating a reference table to store the list of serials is probably your best option. That would allow you to write a query like:
SELECT r.serial reference_serial, s.serial system_serial
FROM reference_table r
LEFT JOIN system_table s ON s.serial = r.serial
With the LEFT JOIN, serials declared in the reference table but missing from the system table will have the second column set to NULL.
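Here is a minimal sketch of the reference-table approach, with Python's sqlite3 standing in for MySQL. The sample serials are hypothetical.

```python
import sqlite3

def make_db():
    """system_table plus a reference_table listing every serial to report."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE system_table (serial TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE reference_table (serial TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO system_table VALUES (?)", [("serial1",), ("serial3",)])
    # The reference table holds every serial we want reported, known or not.
    conn.executemany(
        "INSERT INTO reference_table VALUES (?)",
        [("serial1",), ("serial2",), ("serial3",)],
    )
    return conn

def compare(conn):
    """Pair each reference serial with its system_table match (None if missing)."""
    return conn.execute(
        """
        SELECT r.serial reference_serial, s.serial system_serial
        FROM reference_table r
        LEFT JOIN system_table s ON s.serial = r.serial
        ORDER BY r.serial
        """
    ).fetchall()

print(compare(make_db()))
# [('serial1', 'serial1'), ('serial2', None), ('serial3', 'serial3')]
```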
A quick and dirty workaround is to use UNIONed subqueries to emulate the reference table:
SELECT r.serial reference_serial, s.serial system_serial
FROM (
SELECT 'serial1' AS serial
UNION ALL SELECT 'serial2'
UNION ALL SELECT 'serial3'
...
) r
LEFT JOIN system_table s ON s.serial = r.serial
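This version differs from the UNION version in the previous answer in one way. UNION ALL keeps duplicate literals, so a serial listed twice is reported twice, while UNION de-duplicates the list. The small sketch below (Python's sqlite3, standing in for MySQL) uses a deliberately repeated serial. The data and the lookup helper are hypothetical.

```python
import sqlite3

def make_db():
    """system_table containing only 'serial1' (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE system_table (serial TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO system_table VALUES ('serial1')")
    return conn

def lookup(conn, set_op):
    """Emulate the reference table with literals joined by set_op ('UNION' or 'UNION ALL')."""
    if set_op not in ("UNION", "UNION ALL"):
        raise ValueError("set_op must be 'UNION' or 'UNION ALL'")
    sql = f"""
        SELECT r.serial reference_serial, s.serial system_serial
        FROM (
            SELECT 'serial1' AS serial
            {set_op} SELECT 'serial2'
            {set_op} SELECT 'serial2'
        ) r
        LEFT JOIN system_table s ON s.serial = r.serial
        ORDER BY r.serial
    """
    return conn.execute(sql).fetchall()

conn = make_db()
print(lookup(conn, "UNION ALL"))  # [('serial1', 'serial1'), ('serial2', None), ('serial2', None)]
print(lookup(conn, "UNION"))      # [('serial1', 'serial1'), ('serial2', None)]
```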

Identifying values that are not existent in other table

I've got two tables with a one-to-many association on a pmid: if one table has a pmid, the second table should have multiple rows with the same pmid. However, something went sideways and I'm missing my latest batch of pmids in the second table. These queries should help illustrate the problem, but I can't figure out how to get the ids from the first table that are actually missing in the second table.
select count(*) from abstract_mesh am; #2167101
select count(*) from abstract_mesh am
join abstracts a on am.pmid = a.pmid; #2133848
select 2167101 - 2133848; #33253
select count(*) from abstract_mesh where pmid is NULL; #33253
So as you can see there are 33,253 rows in abstract_mesh that have no pmids. I simply want to identify which pmids I should be interested in from the abstracts table.
You can use NOT EXISTS to filter out the records, e.g.
select *
from table1 t1
where not exists (
select * from table2 t2 where t1.pmid = t2.pmid
);
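You need an anti-join. SQL lacks an explicit anti-join operator. Standard SQL has EXCEPT (relational minus), but MySQL only added it in 8.0.31. Here I'm using NOT IN <table expression> to simulate an anti-join, reading the question as "pmids in abstracts that have no rows in abstract_mesh". The IS NOT NULL filter matters: abstract_mesh contains NULL pmids, and if the subquery returns any NULL, NOT IN matches no rows at all.
SELECT DISTINCT pmid
FROM abstracts
WHERE pmid NOT IN ( SELECT pmid FROM abstract_mesh WHERE pmid IS NOT NULL );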
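The NULL pitfall is easy to reproduce. This sketch uses Python's sqlite3 as a stand-in for MySQL, with hypothetical data shaped like the question. Pmid 3 has no abstract_mesh rows, and one abstract_mesh row has a NULL pmid. The term column is an assumption.

```python
import sqlite3

def make_db():
    """abstracts / abstract_mesh with one missing pmid and one NULL pmid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE abstracts (pmid INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE abstract_mesh (pmid INTEGER, term TEXT)")
    conn.executemany("INSERT INTO abstracts VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO abstract_mesh VALUES (?, ?)",
        [(1, "a"), (1, "b"), (2, "c"), (None, "d")],
    )
    return conn

QUERIES = {
    # Correlated anti-join: unaffected by NULLs in abstract_mesh.pmid.
    "not_exists": """
        SELECT a.pmid FROM abstracts a
        WHERE NOT EXISTS (SELECT 1 FROM abstract_mesh am WHERE am.pmid = a.pmid)
    """,
    # 3 NOT IN (1, 1, 2, NULL) is NULL, not true, so nothing comes back.
    "not_in_naive": """
        SELECT pmid FROM abstracts
        WHERE pmid NOT IN (SELECT pmid FROM abstract_mesh)
    """,
    # Filtering NULLs out of the subquery restores the intended result.
    "not_in_guarded": """
        SELECT pmid FROM abstracts
        WHERE pmid NOT IN (SELECT pmid FROM abstract_mesh WHERE pmid IS NOT NULL)
    """,
}

def run_all(conn):
    """Run every query and collect the returned pmids by query name."""
    return {name: [row[0] for row in conn.execute(sql)] for name, sql in QUERIES.items()}

print(run_all(make_db()))
# {'not_exists': [3], 'not_in_naive': [], 'not_in_guarded': [3]}
```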

Select from a table the ones that don't have a relationship with another table

The specific problem is listing the names of the teachers that never graded.
I have 'teachers' table with the columns 'Name' and 'ID'.
And 'grades' table with the column 'IDTeacher' and 'Grade'.
I don't get why this doesn't work:
Select Name from teachers where not exists(Select * from grades, teachers)
You can just LEFT JOIN it with the grades table and keep the rows where the join returns NULL for the right side:
SELECT t.Name
FROM teachers t
LEFT JOIN grades g ON t.ID = g.IDTeacher
WHERE g.IDTeacher IS NULL;
Edit: I thought about a RIGHT JOIN instead, but it would not work. A teacher with no entry in the grades table would be dropped completely, even though he is in the teachers table.
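Here is a quick check of the LEFT JOIN anti-join, using Python's sqlite3 as a stand-in for MySQL. Column names follow the question (ID, Name, IDTeacher, Grade), and the rows are hypothetical. Ada has two grades, which shows that a teacher who has graded is filtered out no matter how many matches he has.

```python
import sqlite3

def make_db():
    """teachers / grades tables shaped like the question (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teachers (ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE grades (IDTeacher INTEGER, Grade INTEGER)")
    conn.executemany("INSERT INTO teachers VALUES (?, ?)",
                     [(1, "Ada"), (2, "Ben"), (3, "Cy")])
    conn.executemany("INSERT INTO grades VALUES (?, ?)",
                     [(1, 18), (1, 15), (3, 12)])
    return conn

def never_graded(conn):
    """Teachers whose LEFT JOIN to grades found no match (right side is NULL)."""
    sql = """
        SELECT t.Name
        FROM teachers t
        LEFT JOIN grades g ON t.ID = g.IDTeacher
        WHERE g.IDTeacher IS NULL
        ORDER BY t.Name
    """
    return [row[0] for row in conn.execute(sql)]

print(never_graded(make_db()))  # ['Ben']
```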
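You could also use NOT IN for this. Filter NULLs out of the subquery, because a single NULL IDTeacher would make NOT IN return no rows at all:
SELECT Name
FROM teachers
WHERE ID NOT IN (SELECT IDTeacher FROM grades WHERE IDTeacher IS NOT NULL);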
But the MySQL optimizer may rewrite this into essentially the correlated subquery @Gordon Linoff has written. Using WHERE NOT IN is just easier to read, IMHO.
Your query does run; it just doesn't do what you think it should. The subquery creates a Cartesian product between the two tables and is not tied to the outer row. If both tables have rows, the Cartesian product has rows. EXISTS is then always true and NOT EXISTS always false, so the query returns no teachers at all.
You can take this approach, but you need a correlated subquery:
Select Name
from teachers t
where not exists (Select 1 from grades g where g.IDTeacher = t.ID);
Note that this query only has one table in the subquery.
There are other ways to write this query, but this seems to be the approach you are heading toward. And NOT EXISTS is a very reasonable approach.
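The difference between the uncorrelated subquery from the question and the correlated one shows up clearly when they run side by side. The sketch uses Python's sqlite3 as a stand-in for MySQL, and the sample rows are hypothetical.

```python
import sqlite3

def make_db():
    """teachers / grades tables; Ben has never graded (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teachers (ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE grades (IDTeacher INTEGER, Grade INTEGER)")
    conn.executemany("INSERT INTO teachers VALUES (?, ?)",
                     [(1, "Ada"), (2, "Ben"), (3, "Cy")])
    conn.executemany("INSERT INTO grades VALUES (?, ?)", [(1, 18), (3, 12)])
    return conn

# The subquery ignores the outer row: it returns rows whenever both tables do,
# so NOT EXISTS is false for every teacher.
UNCORRELATED = "SELECT Name FROM teachers WHERE NOT EXISTS (SELECT * FROM grades, teachers)"

# The subquery is tied to the outer row through t.ID, so it is checked per teacher.
CORRELATED = """
    SELECT Name FROM teachers t
    WHERE NOT EXISTS (SELECT 1 FROM grades g WHERE g.IDTeacher = t.ID)
"""

def names(conn, sql):
    """Return the first column of every row produced by sql."""
    return [row[0] for row in conn.execute(sql)]

conn = make_db()
print(names(conn, UNCORRELATED))  # []
print(names(conn, CORRELATED))    # ['Ben']
```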

MySQL: select 1 column AS

I want to alias one column in my select statement but leave the rest as is, like so:
SELECT tbl_user.reference AS "reference", * FROM tbl_user JOIN tbl_details ON.....
Is this possible?
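Yes. You can use double quotes like that to create a column alias, and you can SELECT the same column twice (or more) in your SELECT list. One catch: in MySQL, an unqualified * that follows other items in the select list may produce a parse error, so qualify it, e.g. tbl_user.*.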
Try something like this, where you can give each "reference" column its own alias:
SELECT u.reference AS UserReference,
d.reference AS DetailsReference,
u.id /*, etc etc */
FROM tbl_user AS u
JOIN tbl_details AS d ON ....
You mention in the comments that you want all columns from each table, while still being able to distinguish between the reference columns (likely named the same in both tables). I suggest NOT using SELECT *, as it's an anti-pattern. It's better to list the columns explicitly in your SELECT statement, which also saves the query engine from looking up the column list of each table.
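If you still want every tbl_user column despite the advice above, a qualified u.* works after the aliased columns. The sketch below uses Python's sqlite3 as a stand-in for MySQL to show the column names a client actually receives. The user_id join column and the sample rows are hypothetical, because the original ON clause is elided.

```python
import sqlite3

def make_db():
    """tbl_user / tbl_details, both with a 'reference' column (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_user (id INTEGER PRIMARY KEY, reference TEXT, username TEXT)")
    conn.execute("CREATE TABLE tbl_details (user_id INTEGER, reference TEXT)")
    conn.execute("INSERT INTO tbl_user VALUES (1, 'U-1', 'bob')")
    conn.execute("INSERT INTO tbl_details VALUES (1, 'D-9')")
    return conn

QUERY = """
    SELECT u.reference AS UserReference,
           d.reference AS DetailsReference,
           u.*
    FROM tbl_user AS u
    JOIN tbl_details AS d ON d.user_id = u.id
"""

def columns_and_first_row(conn):
    """Return the result-set column names and the first row."""
    cur = conn.execute(QUERY)
    return [col[0] for col in cur.description], cur.fetchone()

names, row = columns_and_first_row(make_db())
print(names)  # ['UserReference', 'DetailsReference', 'id', 'reference', 'username']
print(row)    # ('U-1', 'D-9', 1, 'U-1', 'bob')
```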
If you just want one column, this will work:
SELECT tbl_user.username AS "username" FROM tbl_user JOIN tbl_details ON tbl_user.key = tbl_details.key
What do you mean by "but then leave the rest as is"?

Selecting multiple columns/fields in MySQL subquery

Basically, there is an attribute table and translation table - many translations for one attribute.
I need to select id and value from the translation table for each attribute in a specified language, even if there is no translation record in that language. Either I am missing some join technique, or a join (without involving the language table) doesn't work here. The following does not return attributes that have no translation in the specified language:
select a.attribute, at.id, at.translation
from attribute a left join attributeTranslation at on a.id=at.attribute
where at.language=1;
So I am using subqueries like this. The problem is that it makes two subqueries against the same table with the same parameters. That feels like a performance drain unless MySQL groups them, which I doubt, since it makes you write many similar subqueries:
select attribute,
(select id from attributeTranslation where attribute=a.id and language=1),
(select translation from attributeTranslation where attribute=a.id and language=1)
from attribute a;
I would like to get both id and translation from one query, so I concatenate the columns and parse the id out of the string later. That at least needs only a single subquery, but it still doesn't look right:
select attribute,
(select concat(id,';',title)
from offerAttribute_language
where offerAttribute=a.id and _language=1
)
from offerAttribute a
So the question part.
Is there a way to get multiple columns from a single subquery or should I use two subqueries (MySQL is smart enough to group them?) or is joining the following way to go:
[[attribute to language] to translation] (joining 3 tables seems like a worse performance than subquery).
Yes, you can do this. The knack you need is the concept that there are two ways of getting tables out of the table server. One way is:
FROM TABLE A
The other way is
FROM (SELECT col as name1, col2 as name2 FROM ...) B
Notice that the select clause and the parentheses around it are a table, a virtual table.
So, using your second code example (I am guessing at the columns you are hoping to retrieve here):
SELECT a.attr, b.id, b.trans, b.lang
FROM attribute a
LEFT JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
Notice that your real table attribute is the first table in this join, and that this virtual table I've called b is the second table.
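The sketch below checks the question's requirement that attributes without a language-1 translation must still appear. It uses Python's sqlite3 as a stand-in for MySQL, with a LEFT JOIN and the language filter inside ON. For contrast it also runs the variant with the filter in WHERE, where a condition on the outer-joined side discards the NULL-extended rows. Column names follow the question, and the rows are hypothetical.

```python
import sqlite3

def make_db():
    """attribute / attributeTranslation; 'size' lacks a language-1 translation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attribute (id INTEGER PRIMARY KEY, attribute TEXT)")
    conn.execute(
        "CREATE TABLE attributeTranslation ("
        "id INTEGER PRIMARY KEY, attribute INTEGER, language INTEGER, translation TEXT)"
    )
    conn.executemany("INSERT INTO attribute VALUES (?, ?)", [(1, "color"), (2, "size")])
    conn.executemany(
        "INSERT INTO attributeTranslation VALUES (?, ?, ?, ?)",
        [(10, 1, 1, "Farbe"), (11, 2, 2, "taille")],
    )
    return conn

# The virtual table b from the answer.
DERIVED = """
    SELECT at.id AS id, at.translation AS trans, at.language AS lang,
           at.attribute AS attribute
    FROM attributeTranslation at
"""

# Language filter in ON: unmatched attributes survive with NULLs.
FILTER_IN_ON = f"""
    SELECT a.attribute, b.id, b.trans, b.lang
    FROM attribute a
    LEFT JOIN ({DERIVED}) b ON (a.id = b.attribute AND b.lang = 1)
    ORDER BY a.id
"""

# Same filter in WHERE: NULL-extended rows fail b.lang = 1 and are dropped.
FILTER_IN_WHERE = f"""
    SELECT a.attribute, b.id, b.trans, b.lang
    FROM attribute a
    LEFT JOIN ({DERIVED}) b ON (a.id = b.attribute)
    WHERE b.lang = 1
    ORDER BY a.id
"""

conn = make_db()
print(conn.execute(FILTER_IN_ON).fetchall())
# [('color', 10, 'Farbe', 1), ('size', None, None, None)]
print(conn.execute(FILTER_IN_WHERE).fetchall())
# [('color', 10, 'Farbe', 1)]
```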
This technique comes in especially handy when the virtual table is a summary table of some kind. e.g.
SELECT a.attr, b.id, b.trans, b.lang, c.langcount
FROM attribute a
LEFT JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
LEFT JOIN (
SELECT count(*) AS langcount, at.attribute
FROM attributeTranslation at
GROUP BY at.attribute
) c ON (a.id = c.attribute)
See how that goes? You've generated a virtual table c containing two columns, joined it to the other two, used one of the columns for the ON clause, and returned the other as a column in your result set.
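Here is a runnable version of the summary-table idea, again with Python's sqlite3 standing in for MySQL and hypothetical rows. Both joins are LEFT JOINs, so an attribute with no translations at all still appears. COALESCE turns its missing count into 0, which is an addition to the answer's query.

```python
import sqlite3

def make_db():
    """Three attributes: translated in two languages, in one, and in none."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attribute (id INTEGER PRIMARY KEY, attribute TEXT)")
    conn.execute(
        "CREATE TABLE attributeTranslation ("
        "id INTEGER PRIMARY KEY, attribute INTEGER, language INTEGER, translation TEXT)"
    )
    conn.executemany(
        "INSERT INTO attribute VALUES (?, ?)",
        [(1, "color"), (2, "size"), (3, "weight")],
    )
    conn.executemany(
        "INSERT INTO attributeTranslation VALUES (?, ?, ?, ?)",
        [(10, 1, 1, "Farbe"), (11, 1, 2, "couleur"), (12, 2, 2, "taille")],
    )
    return conn

QUERY = """
    SELECT a.attribute, b.trans, COALESCE(c.langcount, 0) AS langcount
    FROM attribute a
    LEFT JOIN (
        SELECT at.translation AS trans, at.language AS lang, at.attribute AS attribute
        FROM attributeTranslation at
    ) b ON (a.id = b.attribute AND b.lang = 1)
    LEFT JOIN (
        -- Summary table: one row per attribute with its number of translations.
        SELECT count(*) AS langcount, at.attribute AS attribute
        FROM attributeTranslation at
        GROUP BY at.attribute
    ) c ON (a.id = c.attribute)
    ORDER BY a.id
"""

print(make_db().execute(QUERY).fetchall())
# [('color', 'Farbe', 2), ('size', None, 1), ('weight', None, 0)]
```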