I know this question gets asked frequently, but I think my case deserves a more specific answer. I work in phpMyAdmin. I downloaded data from the internet about judges: the regions they work in, the files they handled, their number of decisions, their salaries and similar details, and loaded it into my databases. I created 5 databases, each representing one region (part of the country). Now I want to, for example, find the judge with the highest number of decisions, or the average salary of a judge or group of judges, and things like that. The following view lists each judge as judge_id together with his number of decisions (from all five databases):
select id_sudca, rozhodnutia
from sudy_sk.informacie
INNER JOIN sudcovia on informacie.ID_sudca = sudcovia.ID_sudca
union all
select id_sudcu, rozhodnutie
from nabozny1.sudca
INNER JOIN sudca on m_priznania.id_sudcu = sudca.id_sudcu
union all
select sudca_id, pocet_rozhodnuti
from `kosice`.sudca
INNER JOIN sudca on sudca.SUDCA_ID = sudca.SUDCA_ID
union all
select id_sudca, rozhodnutia
from gerboc.sudcovia
INNER JOIN sudcovia on sudcovia.ID_sudca = sudcovia.ID_sudca
union all
select id_sudcu, rozhodnutia
from biz.sudcovia
INNER JOIN sudcovia on sudcovia.ID_SUDCU = sudcovia.ID_SUDCU
order by rozhodnutia desc;
I get output from this view, but when I click on a judge_id (to find out who is behind that ID), it doesn't take me to the judges' table in that judge's home database. I added an INNER JOIN for this, but I can't come up with the correct syntax to connect it there. One database has a table of judges with the primary key judge_id, which is referenced by foreign keys; another has separate columns called NAME, SURNAME and so on; and another database stores the name and surname together in a single column. I also tried building the view with CONCAT, but the INNER JOIN seemed more suitable to me.
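A minimal sketch of one way to keep the judge's identity in the combined result: inside each UNION ALL branch, join that region's facts table to the judges table of the same database, and return the name as a single column (concatenating first name and surname where they are stored separately). It uses Python's built-in `sqlite3` with two attached in-memory databases standing in for the MySQL databases; every schema name here (`region_a`, `region_b`, `meno`, `priezvisko`, `cele_meno`) is hypothetical. Because IDs are unique only within one database, a `region` column is added so two judges with id 1 stay distinguishable.

```python
import sqlite3

# Two attached in-memory databases stand in for two of the five regional
# MySQL databases. All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS region_a")
conn.execute("ATTACH DATABASE ':memory:' AS region_b")

# region_a stores first name and surname in separate columns;
# region_b stores the full name in a single column.
conn.executescript("""
    CREATE TABLE region_a.sudcovia (id_sudca INTEGER PRIMARY KEY, meno TEXT, priezvisko TEXT);
    CREATE TABLE region_a.informacie (id_sudca INTEGER, rozhodnutia INTEGER);
    CREATE TABLE region_b.sudca (id_sudcu INTEGER PRIMARY KEY, cele_meno TEXT);
    CREATE TABLE region_b.m_priznania (id_sudcu INTEGER, rozhodnutie INTEGER);
    INSERT INTO region_a.sudcovia VALUES (1, 'Jan', 'Novak'), (2, 'Eva', 'Kralova');
    INSERT INTO region_a.informacie VALUES (1, 120), (2, 75);
    INSERT INTO region_b.sudca VALUES (1, 'Peter Horvath');
    INSERT INTO region_b.m_priznania VALUES (1, 200);
""")

# Each branch joins facts to judges of the SAME database and normalizes the
# name into one column, so every output row identifies its judge.
rows = conn.execute("""
    SELECT 'region_a' AS region,
           i.id_sudca AS judge_id,
           s.meno || ' ' || s.priezvisko AS judge_name,
           i.rozhodnutia AS decisions
    FROM region_a.informacie AS i
    JOIN region_a.sudcovia AS s ON s.id_sudca = i.id_sudca
    UNION ALL
    SELECT 'region_b', m.id_sudcu, s.cele_meno, m.rozhodnutie
    FROM region_b.m_priznania AS m
    JOIN region_b.sudca AS s ON s.id_sudcu = m.id_sudcu
    ORDER BY decisions DESC
""").fetchall()

for row in rows:
    print(row)
```

In MySQL the concatenation would be written CONCAT(meno, ' ', priezvisko), since by default MySQL treats || as logical OR, and the cross-database references would be database.table names such as sudy_sk.sudcovia.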
Related
I have millions of customers, and when I use a LEFT JOIN and then sort by a column, the query takes 4-5 seconds. Here is my query:
SELECT c.id AS id, o.description AS office_description, ... , d.type AS document_type, d.number AS document_number
FROM customers c INNER JOIN offices o ON (c.id_office = o.id)
INNER JOIN company cp ON (o.id_company = cp.id)
LEFT JOIN documents d ON (c.id = d.id_customer)
WHERE c.archive = 0
ORDER BY office_description
LIMIT 10
When I remove the documents columns from my SELECT, the query is very fast.
Here is the query's EXPLAIN output:
I have 1 million customers; the other tables (company, offices, documents) have only 1 row each.
I have indexes on c.archive and o.description, plus the primary keys and foreign keys, of course. Here are the structures of these tables: http://sqlfiddle.com/#!9/a222f9
So I tried to build my query like this:
SELECT A.*, d.*
FROM (
SELECT c.id AS id, o.description AS office_description, ...
FROM customers c INNER JOIN offices o ON (c.id_office = o.id)
INNER JOIN company cp ON (o.id_company = cp.id)
WHERE c.archive = 0
ORDER BY o.description
LIMIT 10
) A LEFT JOIN documents d ON (A.id = d.id_customer)
And now, wow, it's very fast.
But I don't know whether this is the best way to reduce the lag, or whether I'm doing something wrong. If you know a better way to do this, I'd like to hear it.
I hope there is an easier way, because this query will be complicated to use in my Phalcon project.
An explanation...
Your faster query can find the 10 rows before looking in documents. So, it needs only 10 probes into that table.
In the original query, the Optimizer was not too smart. It planned to execute the query as if there were no LIMIT. Instead, it decided to optimize the join to documents by fetching the entire table into the "join buffer" in RAM and building a hash index on it. While this would help some queries like yours, it was a big waste for the mere 10 rows that you needed.
So, your reformulation convinced the Optimizer to do it a better way.
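A small correctness check of the rewrite, using Python's built-in `sqlite3` rather than MySQL, so it says nothing about MySQL's plan or timing. The schema and data are a hypothetical reduction of the question's. Two assumptions are added: a `c.id` tie-breaker in ORDER BY so that LIMIT 10 is deterministic, and an ORDER BY on the outer query, since a derived table's order is not guaranteed to survive the outer join. The two forms return the same rows only when each customer has at most one document; with several, the original LIMIT counts joined rows while the rewrite counts customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company   (id INTEGER PRIMARY KEY);
    CREATE TABLE offices   (id INTEGER PRIMARY KEY, id_company INTEGER, description TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, id_office INTEGER, archive INTEGER);
    CREATE TABLE documents (id INTEGER PRIMARY KEY, id_customer INTEGER, type TEXT, number TEXT);
    CREATE INDEX idx_customers_archive ON customers (archive);
""")
conn.execute("INSERT INTO company VALUES (1)")
conn.executemany("INSERT INTO offices VALUES (?, 1, ?)", [(1, "North"), (2, "South")])
# 1,000 customers over two offices; every 7th customer is archived.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, 1 + i % 2, 1 if i % 7 == 0 else 0) for i in range(1, 1001)],
)
# At most one document per customer, so both forms are comparable.
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, 'ID', ?)",
    [(i, i, f"DOC-{i}") for i in range(1, 1001, 3)],
)

original = conn.execute("""
    SELECT c.id, o.description, d.type, d.number
    FROM customers c
    JOIN offices o ON c.id_office = o.id
    JOIN company cp ON o.id_company = cp.id
    LEFT JOIN documents d ON c.id = d.id_customer
    WHERE c.archive = 0
    ORDER BY o.description, c.id
    LIMIT 10
""").fetchall()

# Limit first: pick the 10 customers, then probe documents only for them.
limit_first = conn.execute("""
    SELECT a.id, a.description, d.type, d.number
    FROM (
        SELECT c.id AS id, o.description AS description
        FROM customers c
        JOIN offices o ON c.id_office = o.id
        JOIN company cp ON o.id_company = cp.id
        WHERE c.archive = 0
        ORDER BY o.description, c.id
        LIMIT 10
    ) a
    LEFT JOIN documents d ON a.id = d.id_customer
    ORDER BY a.description, a.id
""").fetchall()

print(original == limit_first)
```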
If you need only one column from d, there is another way:
SELECT ...,
( SELECT col FROM d WHERE ... ) AS col,
... ((without the LEFT JOIN at all))
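A minimal sketch of that correlated-subquery form, again in Python's `sqlite3` with a hypothetical two-table schema. One behavioral difference matters here: MySQL raises "Subquery returns more than 1 row" if a customer has several documents, whereas SQLite silently uses the first row. In MySQL you would therefore add LIMIT 1 or an aggregate such as MAX() whenever that can happen.

```python
import sqlite3

# Hypothetical schema echoing the question's customers/documents tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (id INTEGER PRIMARY KEY, id_customer INTEGER, number TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO documents VALUES (10, 1, 'A-1'), (11, 3, 'C-1');
""")

# The scalar subquery replaces the LEFT JOIN: it is correlated with the outer
# row and yields NULL (None in Python) when no document matches.
rows = conn.execute("""
    SELECT c.id,
           c.name,
           (SELECT d.number FROM documents AS d WHERE d.id_customer = c.id) AS document_number
    FROM customers AS c
    ORDER BY c.id
    LIMIT 10
""").fetchall()

print(rows)  # [(1, 'Ann', 'A-1'), (2, 'Bob', None), (3, 'Cy', 'C-1')]
```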
As for an "easier" way, especially one that can be reverse-engineered into some third-party package, I doubt it. (Packages tend to be crutches for getting started with databases. As you are finding out, you eventually need to learn more than they can teach you.)
A separate inefficiency:
WHERE c.archive = 0
ORDER BY o.description
LIMIT ...
If the archived rows had been removed from c, the optimal execution would be to find the first 10 rows of o. Instead, it must do a lengthy JOIN before sorting and limiting. (This is a common problem with "soft deletes". Neither MySQL nor the third-party package can optimize it away.)
I have a database that contains the following tables I am concerned with.
JobAreas (Base table for which I want to query other tables)
JobSkills (Every Job Skill belongs to a Job Area via foreign key i.e. parent_id)
Jobs (Every job must belong to a Job Area via foreign key i.e. category_id)
UserSkills (This table contains the JobSkill that is related to a Job Area)
I am attaching the table structures.
I am trying to write a SQL query that gives me the number of skills, the number of jobs and the number of people for each Job Area. Counting the users who offer services in a particular Job Area seems harder, because they are connected only indirectly (through UserSkills and JobSkills). I tried to get the number of skills and the number of jobs for all Job Areas using the following query:
select
t.id,
t.title,
count(s.parent_id) as skillsCount,
count(m.category_id) as jobCount
from
job_areas t
left join skill_types s ON s.parent_id = t.id
left join job_requests m ON m.category_id = t.id
group by
t.id
But it does not return the correct data. Can someone point me in the right direction?
You are joining along different dimensions. The quick-and-dirty way to fix this is to use count(distinct):
select t.id, t.title,
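count(distinct s.id) as skillsCount, -- distinct skill rows; distinct s.parent_id is always 1 here
count(distinct m.id) as jobCount -- assumes skill_types and job_requests have an id primary key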
from job_areas t left join
skill_types s
ON s.parent_id = t.id left join
job_requests m
ON m.category_id = t.id
group by t.id;
This works fine if each job area has just a handful of skills and jobs. If there are many, the join first produces (skills × jobs) rows per area, which DISTINCT then has to boil back down, so you want to pre-aggregate before the join or use correlated subqueries.
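To make the fan-out concrete, here is a small sketch in Python's `sqlite3` with hypothetical rows. An area with 2 skills and 3 jobs yields 2 × 3 = 6 joined rows, so both plain COUNTs report 6. Pre-aggregating each child table in a derived table before joining gives the right numbers, and COALESCE turns "no matching group" into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Plumbing has 2 skills and 3 jobs; Design has 1 skill, 0 jobs.
conn.executescript("""
    CREATE TABLE job_areas    (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE skill_types  (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE job_requests (id INTEGER PRIMARY KEY, category_id INTEGER);
    INSERT INTO job_areas VALUES (1, 'Plumbing'), (2, 'Design');
    INSERT INTO skill_types VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO job_requests VALUES (1, 1), (2, 1), (3, 1);
""")

# The question's query: two independent one-to-many joins multiply rows.
naive = conn.execute("""
    SELECT t.id, COUNT(s.parent_id), COUNT(m.category_id)
    FROM job_areas t
    LEFT JOIN skill_types s ON s.parent_id = t.id
    LEFT JOIN job_requests m ON m.category_id = t.id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

# Aggregate each child table on its own, then join one row per area.
pre_aggregated = conn.execute("""
    SELECT t.id,
           COALESCE(s.skills_count, 0),
           COALESCE(m.job_count, 0)
    FROM job_areas t
    LEFT JOIN (SELECT parent_id, COUNT(*) AS skills_count
               FROM skill_types GROUP BY parent_id) s ON s.parent_id = t.id
    LEFT JOIN (SELECT category_id, COUNT(*) AS job_count
               FROM job_requests GROUP BY category_id) m ON m.category_id = t.id
    ORDER BY t.id
""").fetchall()

print(naive)           # [(1, 6, 6), (2, 1, 0)]
print(pre_aggregated)  # [(1, 2, 3), (2, 1, 0)]
```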
I'm trying to write a query for the following task:
Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name.
The database consists of:
Inventory -> DVDs
Rental -> rentals made by customers
Category table:
| Field | Type | Null | Key | Default | Extra |
| category_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(25) | YES | | NULL | |
My question is how to express that the values a customer has in one query must cover all the ids returned by another query (the categories).
I mean, I understand that we can NATURAL JOIN inventory with rental and film, and that if a customer is missing even a single category, we know he hasn't rented from all of them... but I can't complete the query.
I have this solution (but I don't understand it very well):
SELECT first_name, last_name
FROM customer AS C WHERE NOT EXISTS
(SELECT * FROM category AS K WHERE NOT EXISTS
(SELECT * FROM (film NATURAL JOIN inventory) NATURAL JOIN rental
WHERE C.customer_id = customer_id AND K.category_id = category_id));
Are there any other solutions?
On our projects, we NEVER use NATURAL JOIN. That doesn't work for us, because the PRIMARY KEY is always a surrogate column named id, and the foreign key columns are always tablename_id.
A natural join would match id in one table to id in the other table, and that's not what we want. We also frequently have "housekeeping" columns in the tables that are named the same, such as a version column used for the optimistic locking pattern.
And even if our naming conventions were different, and the join columns were named the same, there would be a potential for a join in an existing query to change if we added a column to a table that was named the same as a column in another table.
And, reading a SQL statement that includes a NATURAL JOIN, we can't see which columns are actually being matched without going through the table definitions, looking for columns that are named the same. That seems to put an unnecessary burden on the reader of the statement. (A SQL statement is going to be "read" many more times than it's written... the author of the statement saving keystrokes isn't a beneficial tradeoff for ambiguity that leads to extra work for future readers.)
(I know others have different opinions on this topic. I'm sure that successful software can be written using the NATURAL JOIN pattern. I'm just not smart enough or good enough to work with that. I'll give significant weight to the opinions of DBAs that have years of experience with database modeling, implementing schemas, writing and tuning SQL, supporting operational systems, and dealing with evolving requirements and ongoing maintenance.)
Where was I... oh yes... back to regularly scheduled programming...
The image of the schema is way too small for me to decipher, and I can't seem to copy any text from it. Output from a SHOW CREATE TABLE is much easier to work with.
Did you have a SQL Fiddle setup?
I don't think the query in the question will actually work. I thought there was a limitation on how far "up" a correlated subquery could reference an outer query.
To me, it looks like this predicate
WHERE C.customer_id = customer_id
^^^^^^^^^^^^^
is too deep. The subquery that it's in isn't allowed to reference columns from C; that table is too far up. (Maybe I'm totally wrong about that; maybe it's Oracle or SQL Server or Teradata that has that restriction. Or maybe MySQL used to have that restriction, but a later version has lifted it.)
OTHER APPROACHES
As another approach, we could get each customer and a distinct list of every category that he's rented from.
Then, we could compare that list of "customer rented category" with a complete list of (distinct) categories. One fairly easy way to do that would be to collapse each list into a "count" of distinct categories, and then compare the counts. If a customer's count is less than the total count, then we know he hasn't rented from every category. (There are a few caveats: we need to ensure that the customer's "rented from category" list contains only categories that are in the total category list.)
Another approach would be to take a list of (distinct) customers and perform a cross join (cartesian product) with every possible category. (WARNING: this could be a fairly large set.)
With that set of "customer cross product category", we could then eliminate rows where the customer has rented from that category (probably using an anti-join pattern.)
That would leave us with a set of customers and the categories they haven't rented from.
OP hasn't set up a SQL Fiddle with tables and exemplar data, so I'm not going to bother doing it either.
I would offer some example SQL statements, but the table definitions from the image are unusable; to demonstrate those statements actually working, I'd need some exemplar data in the tables.
(Again, I don't believe the statement in the question actually works. There's no demonstration that it does work.)
I'd be more inclined to test it myself, if it weren't for the NATURAL JOIN syntax. I'm not smart enough to figure that out, without usable table definitions.
If I worked on that, the first thing I would do would be to rewrite it to remove the NATURAL keyword, add actual predicates in an actual ON clause, and qualify all of the column references.
And the query would end up looking something like this:
SELECT c.first_name
, c.last_name
FROM customer c
WHERE NOT EXISTS
( SELECT 1
FROM category k
WHERE NOT EXISTS
( SELECT 1
FROM film f
JOIN inventory i
ON i.film_id = f.film_id
JOIN rental r
ON r.inventory_id = i.inventory_id
WHERE f.category_id = k.category_id
AND r.customer_id = c.customer_id
)
)
(I think that reference to c.customer_id is too deep to be valid.)
EDIT
I stand corrected on my conjecture that the reference to C.customer_id was too many levels "deep". That query doesn't throw an error for me.
But it also doesn't seem to return the resultset that we're expecting; I may have screwed it up somehow. Oh well.
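For what it's worth, here is a quick check of the rewritten NOT EXISTS query on a tiny hypothetical dataset, run with Python's `sqlite3` instead of MySQL. It assumes, as the question's NATURAL JOIN implies, that film carries a category_id column (the real schema isn't visible). Ann rents from both categories, Bob and Dee only from Drama, and Cy rents nothing. On this data the double NOT EXISTS ("there is no category this customer has not rented from") returns only Ann.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category  (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film      (film_id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE rental    (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);
    INSERT INTO category  VALUES (1, 'Drama'), (2, 'Comedy');
    INSERT INTO film      VALUES (10, 1), (20, 2);
    INSERT INTO inventory VALUES (100, 10), (200, 20);
    INSERT INTO customer  VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox'), (4, 'Dee', 'Poe');
    INSERT INTO rental    VALUES (1, 100, 1), (2, 200, 1), (3, 100, 2), (4, 100, 4), (5, 100, 4);
""")

# Relational division: keep customers for whom no category lacks a rental.
rows = conn.execute("""
    SELECT c.first_name, c.last_name
    FROM customer c
    WHERE NOT EXISTS
        ( SELECT 1
          FROM category k
          WHERE NOT EXISTS
              ( SELECT 1
                FROM film f
                JOIN inventory i ON i.film_id = f.film_id
                JOIN rental r ON r.inventory_id = i.inventory_id
                WHERE f.category_id = k.category_id
                  AND r.customer_id = c.customer_id ) )
    ORDER BY c.first_name, c.last_name
""").fetchall()

print(rows)  # [('Ann', 'Lee')]
```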
Here's an example of getting the "count of distinct rental category" for each customer (GROUP BY c.customer_id, just in case we have two customers with the same first and last names) and comparing to the count of category.
SELECT c.last_name
, c.first_name
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.last_name
, c.first_name
, c.customer_id
HAVING COUNT(DISTINCT f.category_id)
= (SELECT COUNT(DISTINCT a.category_id) FROM category a)
ORDER
BY c.last_name
, c.first_name
, c.customer_id
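The same hypothetical toy data (Python `sqlite3` standing in for MySQL) shows why the HAVING clause needs COUNT(DISTINCT ...). Dee rented the same Drama DVD twice, so a plain COUNT gives her 2 rentals, which equals the 2 categories and wrongly qualifies her. Counting distinct categories does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category  (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film      (film_id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE rental    (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);
    INSERT INTO category  VALUES (1, 'Drama'), (2, 'Comedy');
    INSERT INTO film      VALUES (10, 1), (20, 2);
    INSERT INTO inventory VALUES (100, 10), (200, 20);
    INSERT INTO customer  VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox'), (4, 'Dee', 'Poe');
    INSERT INTO rental    VALUES (1, 100, 1), (2, 200, 1), (3, 100, 2), (4, 100, 4), (5, 100, 4);
""")

def customers_in_every_category(count_expr):
    # count_expr is one of two fixed strings below, never user input.
    sql = f"""
        SELECT c.last_name, c.first_name
        FROM customer c
        JOIN rental r ON r.customer_id = c.customer_id
        JOIN inventory i ON i.inventory_id = r.inventory_id
        JOIN film f ON f.film_id = i.film_id
        GROUP BY c.last_name, c.first_name, c.customer_id
        HAVING {count_expr} = (SELECT COUNT(DISTINCT a.category_id) FROM category a)
        ORDER BY c.last_name, c.first_name, c.customer_id
    """
    return conn.execute(sql).fetchall()

with_distinct = customers_in_every_category("COUNT(DISTINCT f.category_id)")
without_distinct = customers_in_every_category("COUNT(f.category_id)")
print(with_distinct)     # [('Lee', 'Ann')]
print(without_distinct)  # [('Lee', 'Ann'), ('Poe', 'Dee')]
```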
EDIT
And here's a demonstration of the other approach: generate a cartesian product of all customers and all categories (WARNING: do NOT do this on LARGE sets!), and find out whether any of those rows doesn't have a match.
-- customers who have rented from EVERY category
-- h = cartesian (cross) product of all customers with all categories
-- g = all categories rented by each customer
-- perform outer join, return all rows from h and matching rows from g
-- if a row from h does not have a "matching" row found in g
-- columns from g will be null, test if any rows have null values from g
SELECT h.last_name
, h.first_name
FROM ( SELECT hi.customer_id
, hi.last_name
, hi.first_name
, hj.category_id
FROM customer hi
CROSS
JOIN category hj
) h
LEFT
JOIN ( SELECT c.customer_id
, f.category_id
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.customer_id
, f.category_id
) g
ON g.customer_id = h.customer_id
AND g.category_id = h.category_id
GROUP
BY h.last_name
, h.first_name
, h.customer_id
HAVING MIN(g.category_id IS NOT NULL)
ORDER
BY h.last_name
, h.first_name
, h.customer_id
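A quick run of the cross-join query on the same hypothetical toy data, with Python's `sqlite3` standing in for MySQL. In SQLite, as in MySQL, `g.category_id IS NOT NULL` evaluates to 0 or 1. MIN(...) is therefore 1 only when every (customer, category) pair found a match, so only Ann survives; Cy, who rented nothing, is excluded because all of his pairs are unmatched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category  (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film      (film_id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE rental    (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);
    INSERT INTO category  VALUES (1, 'Drama'), (2, 'Comedy');
    INSERT INTO film      VALUES (10, 1), (20, 2);
    INSERT INTO inventory VALUES (100, 10), (200, 20);
    INSERT INTO customer  VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox'), (4, 'Dee', 'Poe');
    INSERT INTO rental    VALUES (1, 100, 1), (2, 200, 1), (3, 100, 2), (4, 100, 4), (5, 100, 4);
""")

# h: every (customer, category) pair; g: pairs actually rented.
# A pair in h without a partner in g leaves g.category_id NULL.
rows = conn.execute("""
    SELECT h.last_name, h.first_name
    FROM ( SELECT hi.customer_id, hi.last_name, hi.first_name, hj.category_id
           FROM customer hi CROSS JOIN category hj ) h
    LEFT JOIN ( SELECT c.customer_id, f.category_id
                FROM customer c
                JOIN rental r ON r.customer_id = c.customer_id
                JOIN inventory i ON i.inventory_id = r.inventory_id
                JOIN film f ON f.film_id = i.film_id
                GROUP BY c.customer_id, f.category_id ) g
      ON g.customer_id = h.customer_id
     AND g.category_id = h.category_id
    GROUP BY h.last_name, h.first_name, h.customer_id
    HAVING MIN(g.category_id IS NOT NULL)
    ORDER BY h.last_name, h.first_name, h.customer_id
""").fetchall()

print(rows)  # [('Lee', 'Ann')]
```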
I will take a stab at this, only because I am curious why the answer proposed seems so complex. First, a couple of questions.
So your question is: "Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name."
So, just go through the rental table, joining customer. I am not sure what the category part has to do with this: you are not selecting or displaying any category, so it does not need to be part of the search. It is implied, since when they rent a DVD, that DVD has a category.
SELECT C.first_name, C.last_name
FROM customer as C JOIN rental as R
ON (C.customer_id = R.customer_id)
WHERE R.return_date IS NULL;
So, you are looking for movies that are currently rented, and displaying the first and last names of customers with active rentals.
You can also use SELECT DISTINCT to remove duplicate customers from the list.
Does this help?!
I have an SQL query that needs to perform multiple inner joins, as follows:
SELECT DISTINCT adv.Email, adv.Credit, c.credit_id AS creditId, c.creditName AS creditName, a.Ad_id AS adId, a.adName
FROM placementlist pl
INNER JOIN
(SELECT Ad_id, List_id FROM placements) AS p
ON pl.List_id = p.List_id
INNER JOIN
(SELECT Ad_id, Name AS adName, credit_id FROM ad) AS a
ON ...
(few more inner joins)
My question is the following: How can I optimize this query? I was under the impression that, even though the way I currently query the database creates small temporary tables (inner SELECT statements), it would still be advantageous compared to performing an inner join on the unaltered tables, as those could have about 10,000 - 100,000 entries (not millions). However, I was told that this is not the best way to go about it, but I did not have the opportunity to ask what the recommended approach would be.
What would be the best approach here?
Using derived tables such as
INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p
is not recommended. Let the dbms find out by itself which values it needs from
INNER JOIN placements AS p
instead of telling it (again) by, in effect, forcing it to create a view of the table with only those two columns. (And a plain FROM tablename is much more readable, too.)
With SQL you mainly say what you want to see, not how this is going to be achieved. (Well, of course this is just a rule of thumb.) So if no other columns except Ad_id and List_id are used from table placements, the dbms will find its best way to handle this. Don't try to make it use your way.
The same is true of the IN clause, by the way, where you often see WHERE col IN (SELECT DISTINCT colx FROM ...) instead of simply WHERE col IN (SELECT colx FROM ...). This does exactly the same, but with DISTINCT you tell the dbms "make your subquery's rows distinct before looking for col". But why would you want to force it to do so? Why not have it use just the method the dbms finds most appropriate?
Back to derived tables: Use them when they really do something, especially aggregations, or when they make your query more readable.
Moreover,
SELECT DISTINCT adv.Email, adv.Credit, ...
doesn't look too good either. Yes, sometimes you need SELECT DISTINCT, but usually you don't. Most often it is just a sign that you haven't thought your query through.
An example: you want to select clients that bought product X. In SQL you would say: where a purchase of X EXISTS for the client. Or: where the client is IN the set of the X purchasers.
select * from clients c where exists
(select * from purchases p where p.clientid = c.clientid and product = 'X');
Or
select * from clients where clientid in
(select clientid from purchases where product = 'X');
You don't say: Give me all combinations of clients and X purchases and then boil that down so I just get each client once.
select distinct c.*
from clients c
join purchases p on p.clientid = c.clientid and product = 'X';
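A small check with Python's `sqlite3` and hypothetical data that the three formulations agree, and why the plain join needs DISTINCT at all: Ann bought X twice, so the join alone returns her twice, while EXISTS and IN return each client once by construction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Ann bought X twice, Bob never bought X, Cy bought X once.
conn.executescript("""
    CREATE TABLE clients   (clientid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, clientid INTEGER, product TEXT);
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchases VALUES (1, 1, 'X'), (2, 1, 'X'), (3, 2, 'Y'), (4, 3, 'X');
""")

queries = {
    "exists": """select * from clients c where exists
                 (select * from purchases p where p.clientid = c.clientid and product = 'X')
                 order by c.clientid""",
    "in": """select * from clients where clientid in
             (select clientid from purchases where product = 'X')
             order by clientid""",
    "join_distinct": """select distinct c.* from clients c
                        join purchases p on p.clientid = c.clientid and product = 'X'
                        order by c.clientid""",
    "join_only": """select c.* from clients c
                    join purchases p on p.clientid = c.clientid and product = 'X'
                    order by c.clientid""",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```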
Yes, it is very easy to just join all the tables needed, list the columns to select, and put DISTINCT in front. But it makes the query kind of blurry, because you don't write the query the way you would word the task. And it can make things difficult when it comes to aggregations. The following query is wrong, because it multiplies the money earned by the number of money-spent records, and vice versa.
select
sum(money_spent.value),
sum(money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
And the following may look correct, but is still incorrect (it only works when the values happen to be unique):
select
sum(distinct money_spent.value),
sum(distinct money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
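Here are both broken queries run with Python's `sqlite3` on hypothetical numbers: one user who spent 10 and 20 (true total 30) and earned 100, 100 and 50 (true total 250). The join produces 2 × 3 = 6 rows, so the plain sums come out as 30 × 3 = 90 and 250 × 2 = 500. SUM(DISTINCT) happens to get 30 right, but drops the repeated 100 and reports 150 instead of 250.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userid INTEGER PRIMARY KEY);
    CREATE TABLE money_spent (userid INTEGER, value INTEGER);
    CREATE TABLE money_earned (userid INTEGER, value INTEGER);
    INSERT INTO user VALUES (1);
    INSERT INTO money_spent VALUES (1, 10), (1, 20);
    INSERT INTO money_earned VALUES (1, 100), (1, 100), (1, 50);
""")

joins = """
    from user
    join money_spent on money_spent.userid = user.userid
    join money_earned on money_earned.userid = user.userid
"""
naive = conn.execute(
    "select sum(money_spent.value), sum(money_earned.value)" + joins
).fetchone()
with_distinct = conn.execute(
    "select sum(distinct money_spent.value), sum(distinct money_earned.value)" + joins
).fetchone()

print(naive)          # (90, 500)
print(with_distinct)  # (30, 150)
```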
Again: You would not say: "I want to combine each purchase with each earning and then ...". You would say: "I want the sum of money spent and the sum of money earned per user". So you are not dealing with single purchases or earnings, but with their sums. As in
select
(select sum(value) from money_spent where money_spent.userid = user.userid),
(select sum(value) from money_earned where money_earned.userid = user.userid)
from user;
Or:
select
spent.total,
earned.total
from user
join (select userid, sum(value) as total from money_spent group by userid) spent
on spent.userid = user.userid
join (select userid, sum(value) as total from money_earned group by userid) earned
on earned.userid = user.userid;
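Both correct forms, run with Python's `sqlite3` on the same hypothetical numbers plus a second user who spent 5 but earned nothing. Both give the true totals (30, 250) for user 1. They differ on user 2: the correlated subqueries keep him with a NULL earnings total, while the inner joins to the derived tables drop him. Use LEFT JOIN for the derived tables if such users must appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userid INTEGER PRIMARY KEY);
    CREATE TABLE money_spent (userid INTEGER, value INTEGER);
    CREATE TABLE money_earned (userid INTEGER, value INTEGER);
    INSERT INTO user VALUES (1), (2);
    INSERT INTO money_spent VALUES (1, 10), (1, 20), (2, 5);
    INSERT INTO money_earned VALUES (1, 100), (1, 100), (1, 50);
""")

# One scalar subquery per sum: each total is computed independently.
correlated = conn.execute("""
    select user.userid,
           (select sum(value) from money_spent where money_spent.userid = user.userid),
           (select sum(value) from money_earned where money_earned.userid = user.userid)
    from user
    order by user.userid
""").fetchall()

# Pre-aggregate per user in derived tables, then join one row per user.
derived = conn.execute("""
    select user.userid, spent.total, earned.total
    from user
    join (select userid, sum(value) as total from money_spent group by userid) spent
      on spent.userid = user.userid
    join (select userid, sum(value) as total from money_earned group by userid) earned
      on earned.userid = user.userid
    order by user.userid
""").fetchall()

print(correlated)  # [(1, 30, 250), (2, 5, None)]
print(derived)     # [(1, 30, 250)]
```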
So you see, this is where derived tables come into play.
I have data related as follows:
A table of Houses
A table of Boxes (with an FK back into Houses)
A table of Things_in_boxes (with an FK back to Boxes)
A table of Owners (with an FK back into Houses)
In a nutshell, a House has many Boxes, and each Box has many Things in it. In addition, each House has many Owners.
If I know two Owners (say Peter and Paul), how can I list all the Things that are in the Boxes that are in the Houses owned by these guys?
Also, I'd like to master this SQL stuff. Can anyone recommend a good book/resource? (I'm using MySQL).
Thanks!
Can Peter and Paul co-own the same house?
Then you should use a many-to-many relationship instead of having ownerID in the Houses table,
i.e. a Houses2Owners table with two columns, ownerID and houseID;
then the query would be
select item from houses as h
left join Boxes as b on h.houseID=b.houseID
left join Things as t on b.boxID=t.boxID
left join Houses2Owners as h2o on h.houseID=h2o.houseID
left join Owners as o on h2o.ownerID=o.ownerID
where o.name in ('Peter', 'Paul') -- assumes Owners has a name column
The main question to ask yourself while designing this is whether each object will appear only once, i.e. whether there can be two similar boxes with similar things in them, such as two boxes that both contain ski masks.
If so, you should create tables with no relationship to the parent object, plus a table that connects the two. That way, a ski mask won't appear twice just because two boxes contain it.
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
WHERE
Houses.Owner = 'Peter' OR Houses.Owner = 'Paul'
As for resources to learn from... I can't really suggest anything specific. I learnt how to use (My)SQL gradually and from a number of sources, and can't single any of them out as having been of primary importance. w3schools has OK coverage of the very basic stuff, and MySQL's own documentation (available on the web, google for it) does an OK job and is a reasonable reference for when you want to know the nitty gritty of some topic or other.
EDIT: The above answer is wrong. I had missed the stipulation that a House can have multiple Owners.
New approach: I'll assume that there is a cross-referencing table, HouseOwners, with House and Owner as foreign keys.
My first thought was this:
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
JOIN HouseOwners ON Houses.HouseID = HouseOwners.House
WHERE
HouseOwners.Owner = 'Peter' OR HouseOwners.Owner = 'Paul'
However, this is not quite right. If both Peter and Paul are Owners of a given house, then the things in the boxes in that house would show up twice. I think a subquery is needed.
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
WHERE
Houses.HouseID IN (
SELECT DISTINCT House
FROM HouseOwners
WHERE Owner = 'Peter' OR Owner = 'Paul'
)
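A check of the duplicate problem described above, using Python's `sqlite3` with hypothetical data and the cross-referencing table HouseOwners assumed in this answer. Peter and Paul co-own house 1 (a lamp) and Paul alone owns house 2 (a ski mask). Joining HouseOwners lists the lamp once per matching owner, while the IN subquery lists each thing once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Houses (HouseID INTEGER PRIMARY KEY);
    CREATE TABLE Boxes (BoxID INTEGER PRIMARY KEY, House INTEGER);
    CREATE TABLE Things_in_boxes (ThingID INTEGER PRIMARY KEY, Box INTEGER, Name TEXT);
    CREATE TABLE HouseOwners (House INTEGER, Owner TEXT);
    INSERT INTO Houses VALUES (1), (2), (3);
    INSERT INTO Boxes VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO Things_in_boxes VALUES (1, 10, 'lamp'), (2, 20, 'ski mask'), (3, 30, 'vase');
    INSERT INTO HouseOwners VALUES (1, 'Peter'), (1, 'Paul'), (2, 'Paul'), (3, 'Mary');
""")

# Joining the owners multiplies rows for co-owned houses.
joined = [r[0] for r in conn.execute("""
    SELECT Things_in_boxes.Name
    FROM Houses
    JOIN Boxes ON Houses.HouseID = Boxes.House
    JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
    JOIN HouseOwners ON Houses.HouseID = HouseOwners.House
    WHERE HouseOwners.Owner = 'Peter' OR HouseOwners.Owner = 'Paul'
    ORDER BY Things_in_boxes.Name
""")]

# The IN subquery only filters houses, so nothing is multiplied.
with_in = [r[0] for r in conn.execute("""
    SELECT Things_in_boxes.Name
    FROM Houses
    JOIN Boxes ON Houses.HouseID = Boxes.House
    JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
    WHERE Houses.HouseID IN (
        SELECT House FROM HouseOwners WHERE Owner = 'Peter' OR Owner = 'Paul'
    )
    ORDER BY Things_in_boxes.Name
""")]

print(joined)   # ['lamp', 'lamp', 'ski mask']
print(with_in)  # ['lamp', 'ski mask']
```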
SELECT t.name
FROM Houses h
INNER JOIN Boxes b ON b.houseId = h.id
INNER JOIN Things t ON t.boxId = b.id
INNER JOIN Owners o ON o.houseId = h.id
WHERE o.name = 'Peter' OR o.name = 'Paul'
By using inner joins you can combine these 4 tables with all the linked information. There is also another way, using a subquery:
SELECT t.name
FROM Houses h
INNER JOIN Boxes b ON b.houseId = h.id
INNER JOIN Things t ON t.boxId = b.id
WHERE h.id IN (SELECT o.houseId
FROM Owners o
WHERE o.name = 'Peter' OR o.name = 'Paul')
This query works differently (by first finding the IDs of Peter's and Paul's houses and then performing the join), but it has the same effect, except that things in a house owned by both of them are listed once rather than twice.
Hopefully these examples will help you understand SQL :)
This is untested and written on the spot:
SELECT *
FROM
`things_in_boxes` AS a
LEFT JOIN `boxes` AS x
on ( a.`box_id` = x.`box_id` )
LEFT JOIN `houses` AS b
on ( x.`house_id` = b.`house_id` )
LEFT JOIN `owners` AS c
on ( b.`house_id` = c.`house_id` )
WHERE c.`owner_id` IN( 0, 1 )
That is the general structure I would use, where the "0, 1" in the final IN ( ... ) list are the owner ids for Peter and Paul. If you wanted to do it by name, you could simply make it something like
c.`name` IN( 'Peter', 'Paul' )
As for books, I can't really recommend any; I've learned through tutorials and references.
Here's one approach:
SELECT * FROM Things_in_boxes t
WHERE box_id IN (
SELECT b.id
FROM Boxes b
INNER JOIN Owners o
ON (o.house_id = b.house_id)
WHERE o.name LIKE 'Peter'
OR o.name LIKE 'Paul'
)
Note that you don't need to join on the House table, since both the Boxes and Owners have a house id.
Without knowing the full structure, I will assume one and build the query step by step:
Get the IDs of the houses belonging to the owners
select id from House where owner in ('peter', 'paul')
Get the boxes in those houses
select boxid from boxes where homeid in (select id from House where owner in ('peter', 'paul'))
Get the things in those boxes
select * from things where boxid in (select boxid from boxes where homeid in (select id from House where owner in ('peter', 'paul')))
This should get you what you want, but is very inefficient.
In the above method, the final query from step 3 computes the ids at each step and stores them in temporary storage while it consumes them. This is a very slow operation in most DBMSs.
The better alternative is a join. Combine all the tables and select the desired data.
select things.* from things join boxes on things.boxid = boxes.boxid join houses on boxes.houseid = houses.id join owners on houses.owner = owners.ownerid where owners.name in ('peter', 'paul')