I have a database with two tables, 'Warehouses' and 'Boxes'.
Each box has a field with its warehouse code; each warehouse has a 'capacity' field.
The goal is to find only the warehouses that are "overfilled" (the capacity of the warehouse is less than the number of boxes with that warehouse code).
So I count all the boxes and join in the warehouse capacity with this query:
SELECT Warehouses.Code, Warehouses.Capacity, COUNT(Boxes.Code)
FROM `Warehouses` RIGHT JOIN
`Boxes`
on Warehouses.Code = Boxes.Warehouse
GROUP BY Boxes.Warehouse
Result:
Code | Capacity | COUNT
------------------------------
1 | 3 | 4
2 | 4 | 2
3 | 7 | 2
4 | 2 | 1
That returns each warehouse's capacity and the number of boxes in it, but I don't know how or where to compare these numbers.
You do this in a HAVING clause:
SELECT w.Code, w.Capacity, COUNT(b.Code)
FROM `Warehouses` w LEFT JOIN
`Boxes` b
on w.Code = b.Warehouse
GROUP BY w.Code, w.Capacity
HAVING w.Capacity < COUNT(b.Code);
Notes:
LEFT JOIN is generally much easier to understand than RIGHT JOIN ("Keep all rows in the first table" versus "keep all rows in the last table, which I haven't read yet"). However, this query probably only needs an INNER JOIN.
Presumably, Warehouses should be the first table, because your question is about this entity.
The HAVING clause does the comparison after the aggregation.
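To see the HAVING filter at work, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The capacities and per-warehouse box counts come from the question; the individual box codes are made up for illustration. It uses the INNER JOIN suggested in the notes.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Warehouses (Code INTEGER PRIMARY KEY, Capacity INTEGER);
    CREATE TABLE Boxes (Code TEXT PRIMARY KEY, Warehouse INTEGER);
    INSERT INTO Warehouses VALUES (1, 3), (2, 4), (3, 7), (4, 2);
    -- Hypothetical box codes; only the per-warehouse counts (4, 2, 2, 1) come from the question.
    INSERT INTO Boxes VALUES
        ('A1', 1), ('A2', 1), ('A3', 1), ('A4', 1),
        ('B1', 2), ('B2', 2),
        ('C1', 3), ('C2', 3),
        ('D1', 4);
""")

# Aggregate first, then compare the box count with the capacity in HAVING.
overfilled = conn.execute("""
    SELECT w.Code, w.Capacity, COUNT(b.Code) AS box_count
    FROM Warehouses w
    INNER JOIN Boxes b ON w.Code = b.Warehouse
    GROUP BY w.Code, w.Capacity
    HAVING w.Capacity < COUNT(b.Code)
""").fetchall()

print(overfilled)  # [(1, 3, 4)]
```

Only warehouse 1 qualifies, since it holds 4 boxes against a capacity of 3.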
This question is a bit complicated for me, and I can't explain it in one sentence, so the title may seem quite ambiguous.
I have 3 tables in my MySQL database; their structure is shown below:
word_list (5 million rows)
+-----+--------+
| wid | word |
+-----+--------+
| 1 | foo |
| 2 | bar |
| 3 | hello |
+-----+--------+
paper_word_relation (10 million rows)
+-----+-------+
| pid | word |
+-----+-------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 3 |
+-----+-------+
paper_citation_relation (80K rows)
+----------+--------+
| pid_from | pid_to |
+----------+--------+
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 3 |
+----------+--------+
I want to find out how many papers contain word W and cite papers that also contain word W (for each word in the list).
I use two inner joins to do this job, but it is extremely slow when the word is popular (above 50 s), while it is quite fast when the word is rarely used (below 0.1 s). Here is my code:
SELECT COUNT(*) FROM (
SELECT a.pid_from, a.pid_to, b.word FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
WHERE b.word = 2 AND c.word = 2) AS d
How can I do this faster? Is my query not efficient enough, or is the problem the amount of data?
The only solution I can come up with is to delete the words that occur fewer than 2 times in the paper_word_relation table (about 4 million words occur only once).
Thanks!
If you are only concerned with getting the count, you should not first load the results into a derived table and then count its rows. This may create unnecessary temporary tables holding lots of data in memory. You can count the rows directly.
I also think that you need to count the number of unique papers. Because of the many-to-many relationships in the paper_citation_relation table, duplicate rows may be produced for a single paper.
SELECT COUNT(DISTINCT a.pid_from)
FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
WHERE b.word = 2 AND c.word = 2
For performance, you will need the following indexes:
Composite Index on (pid_from, pid_to) in the paper_citation_relation table.
Composite Index on (pid, word) in the paper_word_relation table.
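A small sketch of the direct count with the two composite indexes, run through Python's sqlite3 module as a stand-in for MySQL on the sample rows from the question. The function name count_citing_papers is hypothetical, and the word is passed as a bound parameter rather than hard-coded.

```python
import sqlite3


def build_db():
    """Create the three tables with the question's sample rows and the suggested indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word_list (wid INTEGER PRIMARY KEY, word TEXT);
        CREATE TABLE paper_word_relation (pid INTEGER, word INTEGER);
        CREATE TABLE paper_citation_relation (pid_from INTEGER, pid_to INTEGER);
        INSERT INTO word_list VALUES (1, 'foo'), (2, 'bar'), (3, 'hello');
        INSERT INTO paper_word_relation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
        INSERT INTO paper_citation_relation VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
        -- Composite indexes as suggested in the answer.
        CREATE INDEX idx_citation ON paper_citation_relation (pid_from, pid_to);
        CREATE INDEX idx_paper_word ON paper_word_relation (pid, word);
    """)
    return conn


def count_citing_papers(conn, word_id):
    """Count distinct papers containing word_id that cite at least one paper containing it."""
    (count,) = conn.execute("""
        SELECT COUNT(DISTINCT a.pid_from)
        FROM paper_citation_relation AS a
        INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
        INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
        WHERE b.word = ? AND c.word = ?
    """, (word_id, word_id)).fetchone()
    return count


conn = build_db()
print(count_citing_papers(conn, 1))  # 2: papers 1 and 2 both contain word 1 and cite each other
```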
We may also be able to optimize the query further by removing one join and using conditional AND/OR filtering in HAVING. You will need to benchmark it, though.
SELECT COUNT(*)
FROM (
SELECT a.pid_from
FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b
ON (a.pid_from = b.pid OR
a.pid_to = b.pid)
GROUP BY a.pid_from
HAVING SUM(a.pid_from = b.pid AND b.word = 2) AND
SUM(a.pid_to = b.pid AND b.word = 2)
) AS dt -- MySQL requires an alias for every derived table
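Since the single-join HAVING version needs benchmarking anyway, it helps to first confirm that it returns the same numbers as the COUNT(DISTINCT) query. This sketch does that on the question's sample rows, again using Python's sqlite3 module as a stand-in for MySQL. Like MySQL, SQLite evaluates comparisons to 0 or 1, so the SUM(... AND ...) trick behaves the same way. The run helper is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paper_word_relation (pid INTEGER, word INTEGER);
    CREATE TABLE paper_citation_relation (pid_from INTEGER, pid_to INTEGER);
    INSERT INTO paper_word_relation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
    INSERT INTO paper_citation_relation VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
""")

# Single join; both word conditions are checked per citing paper in HAVING.
HAVING_QUERY = """
    SELECT COUNT(*)
    FROM (
        SELECT a.pid_from
        FROM paper_citation_relation AS a
        INNER JOIN paper_word_relation AS b
            ON (a.pid_from = b.pid OR a.pid_to = b.pid)
        GROUP BY a.pid_from
        HAVING SUM(a.pid_from = b.pid AND b.word = :w)
           AND SUM(a.pid_to = b.pid AND b.word = :w)
    ) AS dt
"""

# Two-join COUNT(DISTINCT) version, used as the reference result.
DISTINCT_QUERY = """
    SELECT COUNT(DISTINCT a.pid_from)
    FROM paper_citation_relation AS a
    INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
    INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
    WHERE b.word = :w AND c.word = :w
"""


def run(query, word_id):
    """Execute a single-value query for one word id."""
    return conn.execute(query, {"w": word_id}).fetchone()[0]


for w in (1, 2, 3):
    print(w, run(HAVING_QUERY, w), run(DISTINCT_QUERY, w))
```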
After the first 1:n join you get the same pid_to multiple times, and your next join is no longer 1:n but n:m, creating a possibly huge intermediate result before the final DISTINCT. It's similar to a CROSS JOIN, and it gets worse for popular words, e.g. 10*10 vs. 1000*1000 rows.
You must remove the duplicates before the join. Note that the query below counts distinct cited papers (pid_to); to count citing papers, as @MadhurBhaiya's answer does, swap pid_from and pid_to:
SELECT Count(*) -- no more DISTINCT needed
FROM
(
SELECT DISTINCT cr.pid_to -- reducing m to 1
FROM paper_citation_relation AS cr
JOIN paper_word_relation AS wr
ON cr.pid_from = wr.pid
WHERE wr.word = 2
) AS dt
JOIN paper_word_relation AS wr
ON dt.pid_to = wr.pid -- 1:n join again
WHERE wr.word = 2
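A minimal check on the question's sample rows, with Python's sqlite3 as a stand-in for MySQL. It runs the deduplicate-first query as written, which counts distinct cited papers (pid_to), and a variant with pid_from and pid_to swapped, which deduplicates the citing papers and should agree with COUNT(DISTINCT a.pid_from). On this tiny data set all three counts happen to coincide; the cited and citing counts can differ when several papers cite the same one. The swapped variant and the run helper are illustrative, not part of the answer. The final join stays 1:1 only if each (pid, word) pair appears once in paper_word_relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paper_word_relation (pid INTEGER, word INTEGER);
    CREATE TABLE paper_citation_relation (pid_from INTEGER, pid_to INTEGER);
    INSERT INTO paper_word_relation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
    INSERT INTO paper_citation_relation VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
""")

# As in the answer: deduplicate the cited papers (pid_to) before the second join.
CITED_QUERY = """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT cr.pid_to
          FROM paper_citation_relation AS cr
          JOIN paper_word_relation AS wr ON cr.pid_from = wr.pid
          WHERE wr.word = :w) AS dt
    JOIN paper_word_relation AS wr ON dt.pid_to = wr.pid
    WHERE wr.word = :w
"""

# Swapped variant: deduplicate the citing papers (pid_from) instead.
CITING_QUERY = """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT cr.pid_from
          FROM paper_citation_relation AS cr
          JOIN paper_word_relation AS wr ON cr.pid_to = wr.pid
          WHERE wr.word = :w) AS dt
    JOIN paper_word_relation AS wr ON dt.pid_from = wr.pid
    WHERE wr.word = :w
"""

# Reference result: the two-join COUNT(DISTINCT) query.
DISTINCT_QUERY = """
    SELECT COUNT(DISTINCT a.pid_from)
    FROM paper_citation_relation AS a
    JOIN paper_word_relation AS b ON a.pid_from = b.pid
    JOIN paper_word_relation AS c ON a.pid_to = c.pid
    WHERE b.word = :w AND c.word = :w
"""


def run(query, word_id):
    """Execute a single-value query for one word id."""
    return conn.execute(query, {"w": word_id}).fetchone()[0]


for w in (1, 2, 3):
    print(w, run(CITED_QUERY, w), run(CITING_QUERY, w), run(DISTINCT_QUERY, w))
```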
If you want to count the number of papers that appear in a citation (cited or citing), you need to get a distinct list of pids (either pid_from or pid_to) from paper_citation_relation first and then join to the specific word.
SELECT Count(*)
FROM
( -- get a unique list of cited or citing papers
SELECT pid_from AS pid -- citing
FROM paper_citation_relation
UNION -- DISTINCT by default
SELECT pid_to -- cited
FROM paper_citation_relation
) AS dt
JOIN paper_word_relation AS wr
ON wr.pid = dt.pid
WHERE wr.word = 2 -- now check for the searched word
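The UNION-based count can be tried the same way. In this sketch (Python's sqlite3 standing in for MySQL, sample rows from the question; union_count is an illustrative helper), word 2 yields 1 because paper 1 contains the word and appears in a citation, even though no citation links two papers that both contain word 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paper_word_relation (pid INTEGER, word INTEGER);
    CREATE TABLE paper_citation_relation (pid_from INTEGER, pid_to INTEGER);
    INSERT INTO paper_word_relation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
    INSERT INTO paper_citation_relation VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 3);
""")

UNION_QUERY = """
    SELECT COUNT(*)
    FROM (SELECT pid_from AS pid FROM paper_citation_relation  -- citing
          UNION                                                -- removes duplicate pids
          SELECT pid_to FROM paper_citation_relation) AS dt    -- cited
    JOIN paper_word_relation AS wr ON wr.pid = dt.pid
    WHERE wr.word = ?
"""


def union_count(word_id):
    """Count papers in any citation (citing or cited) that contain word_id."""
    return conn.execute(UNION_QUERY, (word_id,)).fetchone()[0]


for w in (1, 2, 3):
    print(w, union_count(w))
```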
The number returned by this might be slightly higher (it counts a paper regardless of whether it is cited or citing).
I have two tables, bills (facturascli) and bill lines (lineasfacturascli). I need all the products that a customer has ever bought. I've gotten this to work:
SELECT referencia, codcliente, pvpunitario, t2.fecha FROM
lineasfacturascli T1
INNER JOIN facturascli T2 ON T1.idfactura = T2.idfactura
WHERE T2.codcliente = "000001"
GROUP BY referencia
But I need to get the last price the customer paid for each product. I'm trying to order by "fecha" (date), but it does not work.
Table structure
facturascli
idfactura (bill id),
codcliente (client id),
fecha (date)
lineasfacturascli
referencia (product name),
idfactura (bill id),
pvpunitario (price)
Edit
DRapp's solution works, but I also need to handle the case where a customer buys the same product more than once on the same day: then I need only the lower price.
With the solution provided, the result is:
| Referencia | MostRecentDatePerItem | MostRecentPricePerItem |
| pendrive   | 2017-03-02            | 50                     |
| pendrive   | 2017-03-02            | 10                     |
| samsung    | 2017-03-02            | 50                     |
| linux car  | 2017-04-26            | 9.99                   |
I need:
| Referencia | MostRecentDatePerItem | MostRecentPricePerItem |
| pendrive   | 2017-03-02            | 10                     |
| samsung    | 2017-03-02            | 50                     |
| linux car  | 2017-04-26            | 9.99                   |
Thanks
I would start with an inner pre-query of all line items for a specific person, grouped by item, with the max date per item. So if a person ordered the same 10 things multiple times over, say, 50 orders, you would still get the final list of 10 things, plus the most recent date each thing was ordered.
The following is based on not knowing your structures exactly, nor having sample data (please provide them in future). Also, you should always qualify the columns in a query with the corresponding table alias so readers know which field comes from which table. I have to assume the "pvpunitario" column is the price from the line-item details, although a basic translation suggests "unit" rather than price. You will have to adjust accordingly if my impression is inaccurate.
select
T1.referencia,
max( T2.fecha ) as MostRecentDatePerItem
FROM
lineasfacturascli T1
INNER JOIN facturascli T2
ON T1.idfactura = T2.idfactura
WHERE
T2.codcliente = "000001"
GROUP BY
T1.referencia
So this gives us just the products and the most recent date each was ordered by a single client. Now we take this result as the basis of the original query, re-joined to the line items / order headers that match the corresponding MostRecentDatePerItem.
select
TT1.Referencia,
PQ.MostRecentDatePerItem,
TT1.pvpunitario as MostRecentPricePerItem
from
lineasfacturascli TT1
JOIN
(select
T1.referencia,
max( T2.fecha ) as MostRecentDatePerItem
FROM
lineasfacturascli T1
INNER JOIN facturascli T2
ON T1.idfactura = T2.idfactura
WHERE
T2.codcliente = "000001"
GROUP BY
T1.referencia ) PQ
on TT1.Referencia = PQ.Referencia
JOIN facturascli TT2
ON TT1.idfactura = TT2.idfactura
AND PQ.MostRecentDatePerItem = TT2.Fecha
where
TT2.codcliente = "000001"
To clarify what is going on: the inner query (now aliased "PQ", for PreQuery) returns just the qualifying items for the one client in question, with the most recent date each item was purchased.
Joining the full list of order line items back to this result keeps each product reference linked. Then we go to the order header table again and still apply the same client code, but ALSO join on fecha matching the maximum date found for that item. Only THEN do we grab the detail-level price / unit information for that product.
Hopefully this helps direct your final solution. If I am wrong on any pieces, please EDIT your original question and supply the missing details / alias references / sample data, then reply with a comment for follow-up support.
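To see the pre-query and the join-back together, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The bills and line items are hypothetical, chosen so that customer 000001 buys a pendrive in two bills on the same most-recent day, which reproduces the duplicate pendrive rows from the question's edit. Dates are stored as ISO-8601 text so MAX() orders them correctly, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facturascli (idfactura INTEGER PRIMARY KEY, codcliente TEXT, fecha TEXT);
    CREATE TABLE lineasfacturascli (referencia TEXT, idfactura INTEGER, pvpunitario REAL);
    -- Hypothetical bills: customer 000001 has two bills on 2017-03-02.
    INSERT INTO facturascli VALUES
        (1, '000001', '2017-01-15'), (2, '000001', '2017-03-02'),
        (3, '000001', '2017-03-02'), (4, '000001', '2017-04-26'),
        (5, '000002', '2017-05-01');
    INSERT INTO lineasfacturascli VALUES
        ('pendrive', 1, 12.0), ('pendrive', 2, 50.0), ('samsung', 2, 50.0),
        ('pendrive', 3, 10.0), ('linux car', 4, 9.99), ('pendrive', 5, 5.0);
""")

rows = conn.execute("""
    SELECT TT1.referencia, PQ.MostRecentDatePerItem,
           TT1.pvpunitario AS MostRecentPricePerItem
    FROM lineasfacturascli TT1
    JOIN (SELECT T1.referencia, MAX(T2.fecha) AS MostRecentDatePerItem
          FROM lineasfacturascli T1
          INNER JOIN facturascli T2 ON T1.idfactura = T2.idfactura
          WHERE T2.codcliente = :client
          GROUP BY T1.referencia) PQ
      ON TT1.referencia = PQ.referencia
    JOIN facturascli TT2
      ON TT1.idfactura = TT2.idfactura
     AND PQ.MostRecentDatePerItem = TT2.fecha
    WHERE TT2.codcliente = :client
    ORDER BY TT1.referencia, TT1.pvpunitario  -- only for deterministic output
""", {"client": "000001"}).fetchall()

for row in rows:
    print(row)
```

Both same-day pendrive lines survive the join-back, which is why a further GROUP BY is needed.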
Answer per Comment.
To get the minimum price, just adjust the outer select and add a GROUP BY. Since the item is the same, the GROUP BY only groups the prices on that specific date. Change the above to:
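select
TT1.Referencia,
PQ.MostRecentDatePerItem,
MIN( TT1.pvpunitario ) as LeastPricePerItemOnThisDate
from lineasfacturascli TT1
JOIN (select T1.referencia, max( T2.fecha ) as MostRecentDatePerItem
FROM lineasfacturascli T1
INNER JOIN facturascli T2 ON T1.idfactura = T2.idfactura
WHERE T2.codcliente = "000001"
GROUP BY T1.referencia ) PQ
on TT1.Referencia = PQ.Referencia
JOIN facturascli TT2 ON TT1.idfactura = TT2.idfactura
AND PQ.MostRecentDatePerItem = TT2.Fecha
where TT2.codcliente = "000001"
GROUP BY
TT1.Referencia, PQ.MostRecentDatePerItem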
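Running the MIN() version against hypothetical data (again via Python's sqlite3 as a stand-in for MySQL, with a customer who buys a pendrive twice on the same most-recent day) collapses the two same-day pendrive lines into the lower price:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facturascli (idfactura INTEGER PRIMARY KEY, codcliente TEXT, fecha TEXT);
    CREATE TABLE lineasfacturascli (referencia TEXT, idfactura INTEGER, pvpunitario REAL);
    -- Hypothetical bills: customer 000001 has two bills on 2017-03-02.
    INSERT INTO facturascli VALUES
        (1, '000001', '2017-01-15'), (2, '000001', '2017-03-02'),
        (3, '000001', '2017-03-02'), (4, '000001', '2017-04-26'),
        (5, '000002', '2017-05-01');
    INSERT INTO lineasfacturascli VALUES
        ('pendrive', 1, 12.0), ('pendrive', 2, 50.0), ('samsung', 2, 50.0),
        ('pendrive', 3, 10.0), ('linux car', 4, 9.99), ('pendrive', 5, 5.0);
""")

rows = conn.execute("""
    SELECT TT1.referencia, PQ.MostRecentDatePerItem,
           MIN(TT1.pvpunitario) AS LeastPricePerItemOnThisDate
    FROM lineasfacturascli TT1
    JOIN (SELECT T1.referencia, MAX(T2.fecha) AS MostRecentDatePerItem
          FROM lineasfacturascli T1
          INNER JOIN facturascli T2 ON T1.idfactura = T2.idfactura
          WHERE T2.codcliente = :client
          GROUP BY T1.referencia) PQ
      ON TT1.referencia = PQ.referencia
    JOIN facturascli TT2
      ON TT1.idfactura = TT2.idfactura
     AND PQ.MostRecentDatePerItem = TT2.fecha
    WHERE TT2.codcliente = :client
    GROUP BY TT1.referencia, PQ.MostRecentDatePerItem
    ORDER BY TT1.referencia  -- only for deterministic output
""", {"client": "000001"}).fetchall()

for row in rows:
    print(row)
```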
I have 2 tables. Table A contains a list of people who booked an event; table B has a list of the people each booker from table A brings with him/her. Both tables have many columns with unique data that I need to do certain calculations on in PHP, and currently I do so by querying the tables with a recursive PHP function. I want to simplify the PHP and reduce the number of queries this recursive function issues by writing better MySQL queries, but I'm kind of stuck.
Because the table has way too many columns, I will give an excerpt of table A instead:
booking_id | A_customer | A_insurance
1 | 134 | 4
Excerpt of table B:
id | booking_id | B_insurance
1 | 1 | 0
2 | 1 | 1
3 | 1 | 1
4 | 1 | 3
The booking_id in table A is unique and auto-incremented; the booking_id in table B can occur many times (depending on how many guests the client from table A brings). Let's say I want to know every selected insurance for customer 134 and his guests; then I want output like this:
booking_id | insurance
1 | 4
1 | 0
1 | 1
1 | 1
1 | 3
I have tried a couple of joins, and this is the closest I've come yet; unfortunately, it doesn't show the row from A, only the matching rows in B.
SELECT a.booking_id,a.A_customer,a.A_insurance,b.booking_id,b.insurance FROM b INNER JOIN a ON (b.booking_id = a.booking_id) WHERE a.booking_id = 134
Can someone point me in the right direction?
Please note: I have altered the table and column names for Stack Overflow so they're easy for you to read, so it's possible there is a typo that would break the query as it stands.
I think you need a union all for this:
select a.booking_id, a.A_insurance as insurance
from a
where a.a_customer = 134
union all
select b.booking_id, b.B_insurance as insurance
from a join
b
on a.booking_id = b.booking_id
where a.a_customer = 134;
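A quick sketch of why UNION ALL matters here, using Python's sqlite3 module as a stand-in for MySQL and the excerpt rows for tables A and B (with the excerpt's A_insurance / B_insurance column names). A plain UNION removes duplicate rows, so one of the two 1 | 1 rows would disappear. The rows are sorted only for comparison, since neither form guarantees row order without ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (booking_id INTEGER PRIMARY KEY, A_customer INTEGER, A_insurance INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, booking_id INTEGER, B_insurance INTEGER);
    INSERT INTO a VALUES (1, 134, 4);
    INSERT INTO b VALUES (1, 1, 0), (2, 1, 1), (3, 1, 1), (4, 1, 3);
""")

# {op} is replaced with UNION ALL or UNION to compare the two set operators.
QUERY = """
    SELECT a.booking_id, a.A_insurance AS insurance
    FROM a
    WHERE a.A_customer = :customer
    {op}
    SELECT b.booking_id, b.B_insurance AS insurance
    FROM a JOIN b ON a.booking_id = b.booking_id
    WHERE a.A_customer = :customer
"""

union_all_rows = conn.execute(QUERY.format(op="UNION ALL"), {"customer": 134}).fetchall()
union_rows = conn.execute(QUERY.format(op="UNION"), {"customer": 134}).fetchall()

print(sorted(union_all_rows))  # [(1, 0), (1, 1), (1, 1), (1, 3), (1, 4)]
print(sorted(union_rows))      # [(1, 0), (1, 1), (1, 3), (1, 4)]
```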
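The simplest way I can think of to achieve this is to use UNION ALL (a plain UNION would drop one of the two identical 1 | 1 rows):
SELECT booking_id, A_insurance insurance
FROM A
WHERE booking_id = 1 -- 134 is the customer number; booking 1 belongs to customer 134
UNION ALL
SELECT booking_id, B_insurance insurance
FROM B
WHERE booking_id = 1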
If I understand your issue correctly, this should give you the result you need:
SELECT a.booking_id, a.A_insurance AS insurance FROM a WHERE a.A_customer = 134
UNION ALL
SELECT a.booking_id, b.B_insurance FROM b INNER JOIN a ON (b.booking_id = a.booking_id) WHERE a.A_customer = 134
I have the following tables in a succession of 1-to-many relationships:
company_company, company_portfolio, building_site and statistics_meter. The area I am having difficulty with is the final table, statistics_meter.
For the purposes of this exercise, its structure is as follows:
Records are related within the same table: some are parent meters and some are child meters. Where a record is a child, it has parent_meter_id set and building_id (crucially, the column the table is LEFT JOINed on) set to NULL.
id | parent_meter_id | site_ref | building_id
1 | NULL | some building | 45
2 | NULL | some other building | 45
3 | 1 | and another | NULL
4 | 1 | one another one | NULL
5 | 2 | final one | NULL
I have two requirements:
1 - count the number of parent meters where the building_id is set (which I am doing successfully)
2 - count the number of meters whose parent_meter_id matches the id of those counted in (1)
Thus I would expect (1) = 2 and (2) = 3.
Here is the SQL I've got so far. I tried fiddling around with SUM(CASE WHEN ...), but I think it's totally wrong. Is this even possible within one query?
Thanks for the help.
SELECT
building_site.id as site_id,
building_site.site_ref as building_name,
COUNT(statistics_meter.id) AS meter_count,
SUM(CASE WHEN statistics_meter.parent_meter_id = [???] THEN 1 ELSE 0 END) AS check_meter_count
FROM company_company
LEFT JOIN company_portfolio ON company_portfolio.company_id=company_company.id
LEFT JOIN building_site ON building_site.portfolio_id=company_portfolio.id
LEFT JOIN statistics_meter ON statistics_meter.building_id=building_site.id
WHERE company_company.id=41
GROUP BY building_site.id
Well, if I understand you, you'll need a subquery to get the parent meters with a building id, and then join that to your main table.
select
sm.id,
sm.parent_meter_id,
sm2.id as ID2,
sm.site_ref,
sm.building_id
from
statistics_meter sm
inner join (
select
id,
parent_meter_id
from
statistics_meter
where
building_id is not null) sm2
on sm.parent_meter_id = sm2.id
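The subquery approach can be checked against the sample statistics_meter rows from the question with Python's sqlite3 module as a stand-in for MySQL. It returns the three child meters 3, 4 and 5, matching the expected result for requirement (2). The ORDER BY is added only for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statistics_meter (
        id INTEGER PRIMARY KEY, parent_meter_id INTEGER, site_ref TEXT, building_id INTEGER);
    INSERT INTO statistics_meter VALUES
        (1, NULL, 'some building', 45),
        (2, NULL, 'some other building', 45),
        (3, 1, 'and another', NULL),
        (4, 1, 'one another one', NULL),
        (5, 2, 'final one', NULL);
""")

# Child meters whose parent is a meter attached to a building (building_id IS NOT NULL).
children = conn.execute("""
    SELECT sm.id, sm.parent_meter_id, sm2.id AS ID2, sm.site_ref, sm.building_id
    FROM statistics_meter sm
    INNER JOIN (SELECT id, parent_meter_id
                FROM statistics_meter
                WHERE building_id IS NOT NULL) sm2
      ON sm.parent_meter_id = sm2.id
    ORDER BY sm.id
""").fetchall()

print(len(children))  # 3
```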
Not sure if this is the most efficient way to do it, but in the end I LEFT JOINed a subquery, as below, and performed two counts: COUNT() for the total, answering requirement (2), and COUNT(DISTINCT) answering requirement (1).
SELECT
count(distinct statistics_meter.id) as meter_count,
count(sm2.id) as check_meter_count -- counts only matched child meters
FROM company_company
LEFT JOIN company_portfolio ON company_portfolio.company_id=company_company.id
LEFT JOIN building_site ON building_site.portfolio_id=company_portfolio.id
LEFT JOIN statistics_meter ON statistics_meter.building_id=building_site.id
LEFT JOIN (select * from statistics_meter where parent_meter_id is not NULL) sm2 on sm2.parent_meter_id = statistics_meter.id
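A sketch of the two counts on the sample meter rows, run through Python's sqlite3 as a stand-in for MySQL. The company, portfolio and building rows (company 41, portfolio 7, building site 45) are hypothetical, just enough to connect the chain of LEFT JOINs. For requirement (2) it counts sm2.id: COUNT() skips the NULLs that the LEFT JOIN produces for a parent meter without children, so such a parent would not inflate the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company_company (id INTEGER PRIMARY KEY);
    CREATE TABLE company_portfolio (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE building_site (id INTEGER PRIMARY KEY, portfolio_id INTEGER, site_ref TEXT);
    CREATE TABLE statistics_meter (
        id INTEGER PRIMARY KEY, parent_meter_id INTEGER, site_ref TEXT, building_id INTEGER);
    -- Hypothetical ids linking company 41 -> portfolio 7 -> building site 45.
    INSERT INTO company_company VALUES (41);
    INSERT INTO company_portfolio VALUES (7, 41);
    INSERT INTO building_site VALUES (45, 7, 'HQ');
    INSERT INTO statistics_meter VALUES
        (1, NULL, 'some building', 45),
        (2, NULL, 'some other building', 45),
        (3, 1, 'and another', NULL),
        (4, 1, 'one another one', NULL),
        (5, 2, 'final one', NULL);
""")

meter_count, check_meter_count = conn.execute("""
    SELECT COUNT(DISTINCT sm.id) AS meter_count,
           COUNT(sm2.id) AS check_meter_count
    FROM company_company cc
    LEFT JOIN company_portfolio cp ON cp.company_id = cc.id
    LEFT JOIN building_site bs ON bs.portfolio_id = cp.id
    LEFT JOIN statistics_meter sm ON sm.building_id = bs.id
    LEFT JOIN (SELECT * FROM statistics_meter WHERE parent_meter_id IS NOT NULL) sm2
      ON sm2.parent_meter_id = sm.id
    WHERE cc.id = 41
""").fetchone()

print(meter_count, check_meter_count)  # 2 3
```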
For simplicity, I will give a quick example of what I am trying to achieve:
Table 1 - Members
ID | Name
--------------------
1 | John
2 | Mike
3 | Sam
Table 2 - Member_Selections
ID | planID
--------------------
1 | 1
1 | 2
1 | 1
2 | 2
2 | 3
3 | 2
3 | 1
Table 3 - Selection_Details
planID | Cost
--------------------
1 | 5
2 | 10
3 | 12
When I run my query, I want to return the sum of all member selections, grouped by member. The issue I face, however (see the table 2 data), is that some members may have duplicate information in the system by mistake. While we do our best to filter this data up front, sometimes it slips through the cracks, so when I make the necessary calls to the system to pull information, I also want to filter out these duplicates.
The results SHOULD show:
Results Table
ID | Name | Total_Cost
-----------------------------
1 | John | 15
2 | Mike | 22
3 | Sam | 15
but instead they show John as $20, because he has plan ID #1 inserted twice by mistake.
My query is currently:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
Adding DISTINCT s.planID filters the results incorrectly, as it counts plan ID 1 as sold only once (even though members 1 and 3 both bought it).
Any help is appreciated.
EDIT
There is also another table I forgot to mention: the agent table (the agent who sold the plans to members).
The final GROUP BY groups ALL items sold by the agent ID (which turns the final results into a single row).
Perhaps the simplest solution is to put a unique composite key on the member_selections table:
alter table member_selections add unique key ms_key (ID, planID);
which would prevent any record from being added whose ID/planID combination already exists in the table, so only a single (1,1) row would be allowed.
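A sketch of the unique key in action, using Python's sqlite3 module as a stand-in for MySQL. SQLite cannot add a UNIQUE constraint through ALTER TABLE, so a CREATE UNIQUE INDEX on (ID, planID) plays that role here; the try_insert helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_selections (ID INTEGER, planID INTEGER)")
# Stand-in for MySQL's "alter table member_selections add unique key ms_key (ID, planID)".
conn.execute("CREATE UNIQUE INDEX ms_key ON member_selections (ID, planID)")
# The question's rows without the accidental second (1, 1).
conn.executemany(
    "INSERT INTO member_selections VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (2, 3), (3, 2), (3, 1)],
)


def try_insert(member_id, plan_id):
    """Return True if the row was inserted, False if the unique key rejected it."""
    try:
        conn.execute("INSERT INTO member_selections VALUES (?, ?)", (member_id, plan_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 1))  # False: (1, 1) already exists
```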
Comment follow-up:
I just saw your comment about the 'alter ignore...'. That'll work fine, but you'd still be left with the bad duplicates in the table. I'd suggest manually cleaning up the table first (adding the unique key fails while duplicates still exist), then adding the unique key. The query I put in the comments should find all the duplicates for you, which you can then weed out by hand. Once the table is clean, there'll be no need for the duplicate-handling version of the query.
Use UNIQUE keys to prevent accidental duplicate entries. This will eliminate the problem at the source, instead of when it starts to show symptoms. It also makes later queries easier, because you can count on having a consistent database.
What about:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN
(select distinct ID, PlanID from member_selections) s
USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
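Run against the example tables from the question (Python's sqlite3 as a stand-in for MySQL, with the Cost column in place of premium and the agent grouping replaced by a per-member GROUP BY), the SELECT DISTINCT subquery yields the expected totals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
    CREATE TABLE selection_details (planID INTEGER PRIMARY KEY, Cost INTEGER);
    INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
    -- John's (1, 1) row appears twice, as in the question.
    INSERT INTO member_selections VALUES (1, 1), (1, 2), (1, 1), (2, 2), (2, 3), (3, 2), (3, 1);
    INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
""")

totals = conn.execute("""
    SELECT sq.ID, sq.name, SUM(sq.Cost) AS total_cost
    FROM (
        SELECT m.ID, m.name, g.Cost
        FROM members m
        -- De-duplicate (ID, planID) pairs before joining to the plan costs.
        INNER JOIN (SELECT DISTINCT ID, planID FROM member_selections) s USING (ID)
        INNER JOIN selection_details g USING (planID)
    ) sq
    GROUP BY sq.ID, sq.name
    ORDER BY sq.ID
""").fetchall()

print(totals)  # [(1, 'John', 15), (2, 'Mike', 22), (3, 'Sam', 15)]
```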
By the way, is there a reason you don't have a primary key on member_selections that would prevent these duplicates in the first place?
You can add a GROUP BY clause to the inner query that groups by all three columns, basically returning only unique rows. (I also changed 'premium' to 'cost' to match your example tables, and dropped the agent part.)
SELECT
sq.ID,
sq.name,
SUM(sq.Cost) AS total_cost
FROM
(
SELECT
m.id,
m.name,
g.Cost
FROM
members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
GROUP BY
m.ID,
m.NAME,
g.Cost
) sq
group by
sq.ID,
sq.NAME
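One caveat worth checking: grouping the inner query by cost rather than by plan also merges two different plans that happen to cost the same. The sketch below (Python's sqlite3 as a stand-in for MySQL) adds a hypothetical member Ann and a hypothetical plan 4 that costs the same as plan 2, then compares the answer's GROUP BY with one that also includes s.planID. The member_totals helper is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
    CREATE TABLE selection_details (planID INTEGER PRIMARY KEY, Cost INTEGER);
    INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam'), (4, 'Ann');
    -- Question data plus hypothetical member 4, who buys plans 2 and 4 (same cost).
    INSERT INTO member_selections VALUES
        (1, 1), (1, 2), (1, 1), (2, 2), (2, 3), (3, 2), (3, 1), (4, 2), (4, 4);
    INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12), (4, 10);
""")


def member_totals(inner_group_by):
    """Sum each member's costs after de-duplicating inner rows with the given GROUP BY list."""
    return conn.execute(f"""
        SELECT sq.ID, sq.name, SUM(sq.Cost) AS total_cost
        FROM (
            SELECT m.ID, m.name, g.Cost
            FROM members m
            INNER JOIN member_selections s USING (ID)
            INNER JOIN selection_details g USING (planID)
            GROUP BY {inner_group_by}
        ) sq
        GROUP BY sq.ID, sq.name
        ORDER BY sq.ID
    """).fetchall()


by_cost = member_totals("m.ID, m.name, g.Cost")            # grouping as in the answer
by_plan = member_totals("m.ID, m.name, s.planID, g.Cost")  # also keeps distinct plans apart
print(by_cost[-1], by_plan[-1])  # (4, 'Ann', 10) (4, 'Ann', 20)
```

On the question's original data both groupings give the same totals; they only diverge once two plans share a cost.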