First of all, let me explain my data structure.
I have a model called profile, which is a profile for a service provider. The service provider may choose between two different "travel" modes (or both, or neither). The first is "I can go to the client": the service provider enters all the cities he can travel to (many-to-many). The second is "My client can come to me": he chooses the city where he is located from a list (one-to-many).
For the "I can go to the client" travel mode, I have a locals_profile table with a profile_id and a local_id and a locals table with the city name.
For the "My client can come to me", I have a locations table with profile_id and city_id and a cities table with the name of the city.
Before you ask: because of other constraints in the project's data model, I couldn't use the same table for the city name in both cases (would performance improve if I could?).
Also, a profile belongs to many sub_categories, which brings us to another table called profiles_sub_categories with sub_category_id and profile_id fields.
What I want is, given a sub_category, show how many items I have in each city. Eg:
For the "designers" sub category:
New york (100)
San Francisco (50)
Miami (10)
... (max 10 cities)
I've accomplished what I wanted with the following query:
select q1.city_name, if(q1.city_count is null, 0, q1.city_count) + if(q2.city_count is null, 0, q2.city_count) city_count
from
(select l.name as city_name, count(*) as city_count from locals_profiles lp
inner join locals l on l.id = lp.local_id
inner join profiles p on p.id = lp.profile_id
inner join profiles_sub_categories ps on ps.profile_id = p.id
where ps.sub_category_id = 97 and l.level = 2
group by 1) q1
left join
(select c.name as city_name, count(*) as city_count from cities c
inner join locations lo on lo.city_id = c.id
inner join profiles p on lo.profile_id = p.id
inner join profiles_sub_categories ps on ps.profile_id = p.id
where ps.sub_category_id = 97
group by 1) q2
on q1.city_name = q2.city_name
order by 2 DESC
limit 10;
But the query takes too long to execute. Since it's a web application, I need it to be almost instant. Does anyone know a better way to do what I'm trying to do?
I've found MySQL doesn't do a great job of optimizing queries. The easy solution is to write two queries and merge them in the application layer. Surprisingly, it will probably be faster...
Alternatively, you could try eliminating the derived tables (q1, q2) and aliasing the same tables multiple times instead. Might be tough to get right....
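A minimal sketch of the "merge in the application layer" idea, assuming each of the two subqueries is run separately and returns (city_name, city_count) rows. The function name and the sample rows are hypothetical, and Python stands in for whatever language the web application uses.

```python
from collections import Counter

def merge_city_counts(travel_rows, onsite_rows, limit=10):
    """Sum per-city counts from two result sets and return the top cities."""
    totals = Counter()
    for city_name, city_count in travel_rows:
        totals[city_name] += city_count
    for city_name, city_count in onsite_rows:
        totals[city_name] += city_count
    # most_common() sorts by count, highest first
    return totals.most_common(limit)

# Hypothetical rows, shaped like the output of q1 and q2 in the question
travel_rows = [("New York", 60), ("Miami", 10)]
onsite_rows = [("New York", 40), ("San Francisco", 50)]

top_cities = merge_city_counts(travel_rows, onsite_rows)
print(top_cities)
```

Unlike the LEFT JOIN in the question, this merge also keeps cities that appear only in the second result set, which may or may not be what you want.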
I have asked versions of this frequently asked question several times, but I think restating it here gives the problem a better direction. I work in phpMyAdmin. My goal was to download data from the internet describing judges, the regions in which they work, the files they made, the number of decisions, their salaries and similar things, and to load it into my databases. I created 5 databases, each of which represents a specific region (part of the country). Now my goal is, for example, to find the judge with the highest number of decisions, or the average salary of a judge or judges, and things like that. The following query lists each judge as judge_id with his number of decisions (from all five databases):
select id_sudca, rozhodnutia
from sudy_sk.informacie
INNER JOIN sudcovia on informacie.ID_sudca = sudcovia.ID_sudca
union all
select id_sudcu, rozhodnutie
from nabozny1.sudca
INNER JOIN sudca on m_priznania.id_sudcu = sudca.id_sudcu
union all
select sudca_id, pocet_rozhodnuti
from `kosice`.sudca
INNER JOIN sudca on sudca.SUDCA_ID = sudca.SUDCA_ID
union all
select id_sudca, rozhodnutia
from gerboc.sudcovia
INNER JOIN sudcovia on sudcovia.ID_sudca = sudcovia.ID_sudca
union all
select id_sudcu, rozhodnutia
from biz.sudcovia
INNER JOIN sudcovia on sudcovia.ID_SUDCU = sudcovia.ID_SUDCU
order by rozhodnutia desc;
So I get output from this query, but when I click on a judge_id (to find out who is behind this ID) it does not take me to the judges' home table, because I used an inner join but can't come up with the correct syntax to connect it there. One database has a table of judges with the primary key judge_id, which is referenced by foreign keys; another has columns called NAME, SURNAME and other things; and another has the name and surname combined in one column. I also tried a query with CONCAT, but the inner join seemed more suitable to me.
I currently select a single row (a post):
SELECT s.id AS id,s.date,s.title,s.views,s.image,s.width,s.description,u.id AS userId,u.username,u.display_name,u.avatar,
(select count(*) from comments where item_id = s.id and type = 1) as numComments,
(select count(*) from likes where item_id = s.id and type = 1) as numLikes,
(select avg(value) from ratings where showcase_id = s.id) as average,
(select count(*) from ratings where showcase_id = s.id) as total
FROM showcase AS s
INNER JOIN users AS u ON s.user_id = u.id
WHERE s.id = :id
LIMIT 5
Then get comments for that post in a separate query:
SELECT c.id as c_id,c.text,c.date,u.id as u_id,u.username,u.display_name,u.avatar
FROM comments as c
INNER JOIN users as u ON c.user_id = u.id
WHERE item_id = :item_id AND type = :type
:id and :item_id are the same. However, the comments return multiple rows whereas the first query returns one row - is there a way to join the comments to the first query or is the current way fine?
It really depends on your application.
If we are talking about a few records returned from a small or medium table, and if the query is executed just a few times a day, then it wouldn't matter much if:
you work with two record sets (two different queries are executed and then their results are put together);
you join the two queries, copying the post information for each record from the comments query;
you build an XML document with the comments and join it to the record returned by the first query (the post record).
Another factor to take into consideration is whether the post and its comments are displayed at the same time. If this is NOT the case, and the comments are not visible at first and are displayed only after some action like the click of a button, then you should choose the 1st option above, for performance reasons.
But if both the post information and its comments must be displayed at the same time, then you can choose any of the 3 options above. Which one is mostly a matter of personal preference in how you model your application's data structures and its database access layer.
Now, if the volume of data may get huge, then you should dig a little deeper and run some simulations to find the query (or queries) that gives you the optimal performance.
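To make the first two options concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The tables are cut down to a few columns and the sample rows are hypothetical; only the shape of the results matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE showcase (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, item_id INTEGER, type INTEGER, text TEXT);
INSERT INTO showcase VALUES (1, 'My post');
INSERT INTO comments VALUES (1, 1, 1, 'first'), (2, 1, 1, 'second');
""")

# Option 1: two queries, results put together in the application
post = conn.execute("SELECT id, title FROM showcase WHERE id = ?", (1,)).fetchone()
comments = conn.execute(
    "SELECT id, text FROM comments WHERE item_id = ? AND type = 1 ORDER BY id", (1,)
).fetchall()
page = {"post": post, "comments": comments}

# Option 2: one joined query; the post columns repeat on every comment row
joined = conn.execute("""
    SELECT s.id, s.title, c.id, c.text
      FROM showcase AS s
      LEFT JOIN comments AS c ON c.item_id = s.id AND c.type = 1
     WHERE s.id = ?
     ORDER BY c.id
""", (1,)).fetchall()

print(page)
print(joined)
```

With option 2 every post column is sent once per comment, so the wider the post row and the longer the comment list, the more redundant data travels over the wire.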
I'm trying to answer to the following query:
Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name.
The database consists of:
(better view - open in a new tab)
Inventory -> DVD's
Rental -> Rents customers did
Category table:
| Field | Type | Null | Key | Default | Extra |
| category_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(25) | YES | | NULL | |
What I can't work out is how to require that a field from one query contain all the ids from another query (the categories).
I mean I understand the fact we can natural join inventory with rental and film, and then find an id that fails on a single category, then we know he doesn't contain all... But I can't complete this.
I have this solution (But I can't understand it very well):
SELECT first_name, last_name
FROM customer AS C WHERE NOT EXISTS
(SELECT * FROM category AS K WHERE NOT EXISTS
(SELECT * FROM (film NATURAL JOIN inventory) NATURAL JOIN rental
WHERE C.customer_id = customer_id AND K.category_id = category_id));
Are there any other solutions?
On our projects, we NEVER use NATURAL JOIN. That doesn't work for us, because the PRIMARY KEY is always a surrogate column named id, and the foreign key columns are always tablename_id.
A natural join would match id in one table to id in the other table, and that's not what we want. We also frequently have "housekeeping" columns in the tables that are named the same, such as version column used for optimistic locking pattern.
And even if our naming conventions were different, and the join columns were named the same, there would be a potential for a join in an existing query to change if we added a column to a table that was named the same as a column in another table.
And, reading a SQL statement that includes a NATURAL JOIN, we can't see what columns are actually being matched without running through the table definitions, looking for columns that are named the same. That seems to put an unnecessary burden on the reader of the statement. (A SQL statement is going to be "read" many more times than it's written... the author of the statement saving keystrokes isn't a beneficial tradeoff for ambiguity leading to extra work by future readers.)
(I know others have different opinions on this topic. I'm sure that successful software can be written using the NATURAL JOIN pattern. I'm just not smart enough or good enough to work with that. I'll give significant weight to the opinions of DBAs that have years of experience with database modeling, implementing schemas, writing and tuning SQL, supporting operational systems, and dealing with evolving requirements and ongoing maintenance.)
Where was I... oh yes... back to regularly scheduled programming...
The image of the schema is way too small for me to decipher, and I can't seem to copy any text from it. Output from a SHOW CREATE TABLE is much easier to work with.
Did you have a SQL Fiddle setup?
I don't think the query in the question will actually work. I thought there was a limitation on how far "up" a correlated subquery could reference an outer query.
To me, it looks like this predicate
WHERE C.customer_id = customer_id
^^^^^^^^^^^^^
is too deep. The subquery it's in isn't allowed to reference columns from C; that table is too high up. (Maybe I'm totally wrong about that; maybe it's Oracle or SQL Server or Teradata that has that restriction. Or maybe MySQL used to have that restriction, but a later version has lifted it.)
OTHER APPROACHES
As another approach, we could get each customer and a distinct list of every category that he's rented from.
Then, we could compare that list of "customer rented category" with a complete list of (distinct) category. One fairly easy way to do that would be to collapse each list into a "count" of distinct category, and then compare the counts. If a count for a customer is less than the total count, then we know he's not rented from every category. (There are a few caveats: we need to ensure that the customer "rented from category" list contains only categories in the total category list.)
Another approach would be to take a list of (distinct) customer, and perform a cross join (cartesian product) with every possible category. (WARNING: this could be fairly large set.)
With that set of "customer cross product category", we could then eliminate rows where the customer has rented from that category (probably using an anti-join pattern.)
That would leave us with a set of customers and the categories they haven't rented from.
OP hasn't set up a SQL Fiddle with tables and exemplar data; so, I'm not going to bother doing it either.
I would offer some example SQL statements, but the table definitions from the image are unusable; to demonstrate those statements actually working, I'd need some exemplar data in the tables.
(Again, I don't believe the statement in the question actually works. There's no demonstration that it does work.)
I'd be more inclined to test it myself, if it weren't for the NATURAL JOIN syntax. I'm not smart enough to figure that out, without usable table definitions.
If I worked on that, the first thing I would do would be to re-write it to remove the NATURAL keyword, and add actual predicates in an actual ON clause, and qualify all of the column references.
And the query would end up looking something like this:
SELECT c.first_name
, c.last_name
FROM customer c
WHERE NOT EXISTS
( SELECT 1
FROM category k
WHERE NOT EXISTS
( SELECT 1
FROM film f
JOIN inventory i
ON i.film_id = f.film_id
JOIN rental r
ON r.inventory_id = i.inventory_id
WHERE f.category_id = k.category_id
AND r.customer_id = c.customer_id
)
)
(I think that reference to c.customer_id is too deep to be valid.)
EDIT
I stand corrected on my conjecture that the reference to C.customer_id was too many levels "deep". That query doesn't throw an error for me.
But it also doesn't seem to return the resultset that we're expecting. I may have screwed it up somehow. Oh well.
Here's an example of getting the "count of distinct rental category" for each customer (GROUP BY c.customer_id, just in case we have two customers with the same first and last names) and comparing to the count of category.
SELECT c.last_name
, c.first_name
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.last_name
, c.first_name
, c.customer_id
HAVING COUNT(DISTINCT f.category_id)
= (SELECT COUNT(DISTINCT a.category_id) FROM category a)
ORDER
BY c.last_name
, c.first_name
, c.customer_id
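For anyone who wants to try the two formulations side by side, here is a small self-contained sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The tables are reduced to the columns the queries touch, film carries category_id directly as the statements above assume, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER, inventory_id INTEGER);
INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO category VALUES (1, 'Action'), (2, 'Drama');
INSERT INTO film VALUES (10, 1), (20, 2);
INSERT INTO inventory VALUES (100, 10), (200, 20);
-- Ann rents from both categories (one film twice); Bob rents from only one
INSERT INTO rental VALUES (1, 1, 100), (2, 1, 200), (3, 1, 100), (4, 2, 100);
""")

# Approach 1: compare each customer's distinct category count to the total
by_count = conn.execute("""
    SELECT c.last_name, c.first_name
      FROM customer c
      JOIN rental r    ON r.customer_id = c.customer_id
      JOIN inventory i ON i.inventory_id = r.inventory_id
      JOIN film f      ON f.film_id = i.film_id
     GROUP BY c.customer_id, c.last_name, c.first_name
    HAVING COUNT(DISTINCT f.category_id) = (SELECT COUNT(*) FROM category)
     ORDER BY c.last_name, c.first_name
""").fetchall()

# Approach 2: double NOT EXISTS ("no category exists that the customer has not rented")
by_not_exists = conn.execute("""
    SELECT c.last_name, c.first_name
      FROM customer c
     WHERE NOT EXISTS
           ( SELECT 1 FROM category k
              WHERE NOT EXISTS
                    ( SELECT 1
                        FROM film f
                        JOIN inventory i ON i.film_id = f.film_id
                        JOIN rental r    ON r.inventory_id = i.inventory_id
                       WHERE f.category_id = k.category_id
                         AND r.customer_id = c.customer_id ) )
     ORDER BY c.last_name, c.first_name
""").fetchall()

print(by_count, by_not_exists)
```

On this toy data both statements return only Ann Lee. That says nothing about how MySQL optimizes either form on real data volumes.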
EDIT
And here's a demonstration of the other approach, generating a cartesian product of all customers and all categories (WARNING: do NOT do this on LARGE sets!), and find out if any of those rows don't have a match.
-- customers who have rented from EVERY category
-- h = cartesian (cross) product of all customers with all categories
-- g = all categories rented by each customer
-- perform outer join, return all rows from h and matching rows from g
-- if a row from h does not have a "matching" row found in g
-- columns from g will be null, test if any rows have null values from g
SELECT h.last_name
, h.first_name
FROM ( SELECT hi.customer_id
, hi.last_name
, hi.first_name
, hj.category_id
FROM customer hi
CROSS
JOIN category hj
) h
LEFT
JOIN ( SELECT c.customer_id
, f.category_id
FROM customer c
JOIN rental r
ON r.customer_id = c.customer_id
JOIN inventory i
ON i.inventory_id = r.inventory_id
JOIN film f
ON f.film_id = i.film_id
GROUP
BY c.customer_id
, f.category_id
) g
ON g.customer_id = h.customer_id
AND g.category_id = h.category_id
GROUP
BY h.last_name
, h.first_name
, h.customer_id
HAVING MIN(g.category_id IS NOT NULL)
ORDER
BY h.last_name
, h.first_name
, h.customer_id
I will take a stab at this, only because I am curious why the answer proposed seems so complex. First, a couple of questions.
So your question is: "Select the first name and last name of the clients which rent films (that have DVD's) from all the categories, ordering by first name and last name."
So, just go through the rental table, joining customer. I am not sure what the category part has to do with this: you are not selecting or displaying any category, so it does not need to be part of the search. It is implied, since when they rent a DVD, that DVD has a category.
SELECT C.first_name, C.last_name
FROM customer as C JOIN rental as R
ON (C.customer_id = R.customer_id)
WHERE R.return_date IS NULL;
So, you are looking for movies that are currently rented (no return date yet), and displaying the first and last names of customers with active rentals.
You can also use SELECT DISTINCT to remove duplicate customers from the list.
Does this help?!
So I have the following SQL schema (http://sqlfiddle.com/#!2/b366c) and what I'm trying to achieve is the % of companies that I can consider activated.
In the schema, you can see there are the following tables
organisations (otherwise known as companies)
competitions
competitionmembers
activity_entries
What I would like to do is, figure out the % of companies in the organisations (i.e. total users) that create a competition (competitions table), invite at least another person (competitionmembers table) and have completed at least one activity (activity_entries table)
This may be too complex, but what I'd like to do is also create a funnel, to visualise where most companies drop off. For this, I understand I should create a separate query for each of the steps and then just stack them to see the flow.
Using the sample data provided here (http://sqlfiddle.com/#!2/b366c) you can see that:
1. 4 companies have registered
2. 2 companies have created competitions
3. 1 company has a competition with at least 2 participants (not just the admin)
4. 1 company has registered at least one activity
So 25% of companies became "activated".
I would really appreciate some help in building these queries and visualising percentages!
Maybe not the most efficient way, but the intermediate results ought to be small enough for this not to matter overmuch.
You can run the inner queries on their own to look at the different results:
SELECT COUNT(oid) AS organizations,
SUM(IF(competitions > 0, 1, 0)) AS competing,
SUM(IF(activations > 0, 1, 0)) AS activated,
100.0*SUM(IF(activations > 0, 1, 0))/COUNT(oid) AS actpercent
FROM (
SELECT oid,
SUM(IF(cid IS NULL,0,1)) AS competitions,
SUM(IF(aid IS NULL,0,1)) AS activations
FROM (
SELECT
o.organisationId AS oid,
c.competitionId AS cid,
a.id AS aid
FROM organisations AS o
LEFT JOIN competitions c USING (organisationId)
LEFT JOIN activity_entries AS a USING (competitionId)
) AS situation GROUP BY oid
) AS summary;
First we get the situation list of all organizations, competitions and activities; here you may add a WHERE condition to filter organizations of interest, removed competitions and so on.
From this we get a summary of organizations with the number of competitions and activations for each. Note that these sums count joined rows rather than distinct competitions: if an organization has three competitions, one with three activities and two with none, the situation list holds five rows for it, so you will retrieve five as competitions and three as activations. That is harmless here, because the outer query only tests whether each number is greater than zero.
Then we just get the total count of organizations, and calculate the number of activations as a percentage.
The output of the above would be,
ORGANIZATIONS COMPETING ACTIVATED ACTPERCENT
4 2 1 25
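The same funnel can be checked on toy data. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical rows chosen to match the counts in the question (4 registered, 2 competing, 1 active). It is a compact variation rather than the query above: the two inner levels are collapsed into one, with COUNT in place of SUM(IF(...)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisations (organisationId INTEGER PRIMARY KEY);
CREATE TABLE competitions (competitionId INTEGER PRIMARY KEY, organisationId INTEGER);
CREATE TABLE activity_entries (id INTEGER PRIMARY KEY, competitionId INTEGER);
-- Hypothetical data: 4 organisations, 2 with a competition, 1 with activities
INSERT INTO organisations VALUES (1), (2), (3), (4);
INSERT INTO competitions VALUES (10, 1), (20, 2);
INSERT INTO activity_entries VALUES (100, 10), (101, 10);
""")

funnel = conn.execute("""
    SELECT COUNT(*)                                AS organizations,
           SUM(competitions > 0)                   AS competing,
           SUM(activations > 0)                    AS activated,
           100.0 * SUM(activations > 0) / COUNT(*) AS actpercent
      FROM ( SELECT o.organisationId                AS oid,
                    COUNT(DISTINCT c.competitionId) AS competitions,
                    COUNT(a.id)                     AS activations
               FROM organisations AS o
               LEFT JOIN competitions AS c
                      ON c.organisationId = o.organisationId
               LEFT JOIN activity_entries AS a
                      ON a.competitionId = c.competitionId
              GROUP BY o.organisationId
           ) AS summary
""").fetchone()

print(funnel)
```

Comparisons such as competitions > 0 evaluate to 1 or 0 in both SQLite and MySQL, so SUM over them counts the organisations that pass each step of the funnel.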
Addition
lserni would it be possible to add one more layer to your query, which
is the "inviting" aspect. i.e. if there are more than 2 users in the
competitionmembers table for a competition?
In this case for each competition we need to know how many members there are in another table. So we have to act on the query where the competitionId is available, and we modify situation:
SELECT
o.organisationId AS oid,
c.competitionId AS cid,
a.id AS aid
FROM organisations AS o
LEFT JOIN competitions c USING (organisationId)
LEFT JOIN activity_entries AS a USING (competitionId)
We just add the necessary GROUP BY existing-columns and the new aggregate field, and of course the necessary LEFT JOIN:
SELECT
o.organisationId AS oid,
c.competitionId AS cid,
a.id AS aid,
COUNT(m.id) AS members
FROM organisations AS o
LEFT JOIN competitions c USING (organisationId)
LEFT JOIN activity_entries AS a USING (competitionId)
LEFT JOIN competitionmembers AS m ON (c.competitionId = m.competitionid)
GROUP BY oid, cid, aid;
(which I think illustrates one of the advantages of nested "serialized" queries - they're easier to maintain. That at least is my opinion. Maybe the truth is just that I can't wrap my head around the more complicated, all-in-one queries...).
Now that we have members of competition, we look to the query immediately external to the one above:
SELECT oid,
SUM(IF(cid IS NULL,0,1)) AS competitions,
SUM(IF(aid IS NULL,0,1)) AS activations
FROM v_situation GROUP BY oid
(By the way: you can simplify the writing of these queries by offloading them to VIEWs. CREATE VIEW v_situation AS SELECT o.organisationId AS oid, ... GROUP BY oid, cid, aid; gives you a virtual table v_situation that you can use wherever you would use a table.)
...and rewrite it adding the number of competitions with one member and those with more:
SELECT oid,
SUM(IF(cid IS NULL,0,1)) AS competitions,
SUM(IF(aid IS NULL,0,1)) AS activations,
SUM(IF(members > 1, 1, 0)) AS withmany,
SUM(IF(members = 1, 1, 0)) AS withone
FROM ( ... ) AS situation
GROUP BY oid;
Then you just need to decide what to do with that information. You can pass it through and re-select the withone field in the parent query, or you can calculate its percentage. In that case, remember that the denominator may be zero, so you need to guard against the case when
activations_with_many_members / activations
has zero in the denominator, using IF to change the formula to 0.0 if no activations are present:
IF(activations > 0, <percent formula>, 0.0 ) AS percent_with_many
Also, if you only wanted members wherever an activation is also present, you should do so in the definition of members, so that a member is counted only if its id is not null (we have a member) and the aid is not null (we have activation):
SUM(IF(a.id IS NOT NULL AND m.id IS NOT NULL,1, 0)) AS members
select 1/ count(organisations.organisationId) * 100 *
(select count(distinct(org.organisationId)) from organisations org
inner join competitions cmp on org.organisationId = cmp.organisationId
inner join competitionmembers cmpm on cmpm.competitionid = cmp.competitionid
inner join activity_entries act on act.competitionid = cmpm.competitionid) as pct
from organisations
I have data related as follows:
A table of Houses
A table of Boxes (with an FK back into Houses)
A table of Things_in_boxes (with an FK back to Boxes)
A table of Owners (with an FK back into Houses)
In a nutshell, a House has many Boxes, and each Box has many Things in it. In addition, each House has many Owners.
If I know two Owners (say Peter and Paul), how can I list all the Things that are in the Boxes that are in the Houses owned by these guys?
Also, I'd like to master this SQL stuff. Can anyone recommend a good book/resource? (I'm using MySQL).
Thanks!
Can Peter and Paul own the same house together?
Then you should go for a many-to-many relationship instead of having ownerID inside the Houses table,
i.e. a Houses2Owners table with two columns, ownerID and houseID.
Then the query would be
select item from houses as h
left join Boxes as b on h.houseID=b.houseID
left join Things as t on b.boxID=t.boxID
left join Houses2Owners as h2o on h.houseID=h2o.houseID
left join Owners as o on h2o.ownerID=o.ownerID
where o.name in ('Peter', 'Paul') -- assumes Owners has a name column
The main question to ask yourself while designing this is whether each object appears only once, e.g. whether there can be two similar boxes with similar things in them, such as two boxes that each contain a ski mask.
If so, you should create the tables without a reference to the parent object, plus a table that connects the two tables. This way you avoid storing the ski mask twice for the two boxes that contain it.
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
WHERE
Houses.Owner = 'Peter' OR Houses.Owner = 'Paul'
As for resources to learn from... I can't really suggest anything specific. I learnt how to use (My)SQL gradually and from a number of sources, and can't single any of them out as having been of primary importance. w3schools has OK coverage of the very basic stuff, and MySQL's own documentation (available on the web, google for it) does an OK job and is a reasonable reference for when you want to know the nitty gritty of some topic or other.
EDIT: The above answer is wrong. I had missed the stipulation that a House can have multiple Owners.
New approach: I'll assume that there is a cross-referencing table, HouseOwners, with House and Owner as foreign keys.
My first thought was this:
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
JOIN HouseOwners ON Houses.HouseID = HouseOwners.House
WHERE
HouseOwners.Owner = 'Peter' OR HouseOwners.Owner = 'Paul'
However, this is not quite right. If both Peter and Paul are Owners of a given house, then the things in the boxes in that house would show up twice. I think a subquery is needed.
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
WHERE
Houses.HouseID IN (
SELECT DISTINCT House
FROM HouseOwners
WHERE Owner = 'Peter' OR Owner = 'Paul'
)
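The duplication problem is easy to reproduce on toy data. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The HouseOwners table follows the assumption made above, while the ThingID and Name columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Houses (HouseID INTEGER PRIMARY KEY);
CREATE TABLE Boxes (BoxID INTEGER PRIMARY KEY, House INTEGER);
CREATE TABLE Things_in_boxes (ThingID INTEGER PRIMARY KEY, Box INTEGER, Name TEXT);
CREATE TABLE HouseOwners (House INTEGER, Owner TEXT);
INSERT INTO Houses VALUES (1), (2);
INSERT INTO Boxes VALUES (10, 1), (20, 2);
INSERT INTO Things_in_boxes VALUES (100, 10, 'skis'), (200, 20, 'lamp');
-- House 1 is co-owned by Peter and Paul; house 2 belongs to Mary
INSERT INTO HouseOwners VALUES (1, 'Peter'), (1, 'Paul'), (2, 'Mary');
""")

# Joining HouseOwners directly yields one row per matching owner
joined = conn.execute("""
    SELECT Things_in_boxes.Name
      FROM Houses
      JOIN Boxes ON Houses.HouseID = Boxes.House
      JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
      JOIN HouseOwners ON Houses.HouseID = HouseOwners.House
     WHERE HouseOwners.Owner = 'Peter' OR HouseOwners.Owner = 'Paul'
""").fetchall()

# Filtering with IN (subquery) returns each thing once
filtered = conn.execute("""
    SELECT Things_in_boxes.Name
      FROM Houses
      JOIN Boxes ON Houses.HouseID = Boxes.House
      JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
     WHERE Houses.HouseID IN (
           SELECT House
             FROM HouseOwners
            WHERE Owner = 'Peter' OR Owner = 'Paul'
           )
""").fetchall()

print(joined)
print(filtered)
```

The join version lists the skis twice because house 1 matches both owners, while the IN version lists them once.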
SELECT t.name
FROM Houses h
INNER JOIN Boxes b ON b.houseId = h.id
INNER JOIN Things t ON t.boxId = b.id
INNER JOIN Owners o ON o.houseId = h.id
WHERE o.name = 'Peter' OR o.name = 'Paul'
By using inner joins you can combine these 4 tables with all the linked information. There is also another way, using a subquery:
SELECT t.name
FROM Houses h
INNER JOIN Boxes b ON b.houseId = h.id
INNER JOIN Things t ON t.boxId = b.id
WHERE h.id IN (SELECT o.houseId
FROM Owners o
WHERE o.name = 'Peter' OR o.name = 'Paul')
This query works differently (by first finding the two house ID's of Peter and Paul and then performing the join), but it has the same effect.
Hopefully these examples will help you understand SQL :)
This isn't tested and was written on the spot:
SELECT a.*
FROM
`things_in_boxes` AS a
JOIN `boxes` AS b
on ( a.`box_id` = b.`box_id` )
JOIN `houses` AS h
on ( b.`house_id` = h.`house_id` )
JOIN `owners` AS c
on ( h.`house_id` = c.`house_id` )
WHERE c.`owner_id` IN( 0, 1 )
That is the general structure I would use, where the "0, 1" in the last IN statement are the owner ids for Peter and Paul. If you wanted to do it by name, you could simply make it something like
c.`name` IN( 'Peter', 'Paul' )
As far as books, I can't really tell you, I've learned through tutorials and references.
Here's one approach:
SELECT * FROM Things_in_boxes t
WHERE box_id IN (
SELECT b.id
FROM Boxes b
INNER JOIN Owners o
ON (o.house_id = b.house_id)
WHERE o.name LIKE 'Peter'
OR o.name LIKE 'Paul'
)
Note that you don't need to join on the House table, since both the Boxes and Owners have a house id.
Without knowing the full structure, I will assume a structure to build a query, step by step
Get the IDs of the houses belonging to the owners
select id from House where owner in ('peter', 'paul')
Get the boxes in those houses
select boxid from boxes where homeid in (select id from House where owner in ('peter', 'paul'))
Get the things in those boxes
select * from things where boxid in (select boxid from boxes where homeid in (select id from House where owner in ('peter', 'paul')))
This should get you what you want, but is very inefficient.
In the above method, the final query in step 3 collects the ids produced by each inner step and holds them in temporary storage while it consumes them. This can be a slow operation, particularly on older MySQL versions, which optimized IN subqueries poorly.
The better alternative is a join. Combine all the tables and select the desired data.
select * from things join boxes on things.boxid = boxes.boxid join houses on boxes.houseid = houses.id join owners on houses.owner = owners.ownerid where owners.name in ('peter', 'paul')