mysql subquery understanding - mysql

I am trying to find all sale_id's that have an entry in sales_item_taxes table, but do NOT have a corresponding entry in the sales_items table.
mysql> describe phppos_sales_items_taxes;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| sale_id | int(10) | NO | PRI | NULL | |
| item_id | int(10) | NO | PRI | NULL | |
| line | int(3) | NO | PRI | 0 | |
| name | varchar(255) | NO | PRI | NULL | |
| percent | decimal(15,3) | NO | PRI | NULL | |
| cumulative | int(1) | NO | | 0 | |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.01 sec)
mysql> describe phppos_sales_items;
+--------------------+----------------+------+-----+--------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+----------------+------+-----+--------------+-------+
| sale_id | int(10) | NO | PRI | 0 | |
| item_id | int(10) | NO | PRI | 0 | |
| description | varchar(255) | YES | | NULL | |
| serialnumber | varchar(255) | YES | | NULL | |
| line | int(3) | NO | PRI | 0 | |
| quantity_purchased | decimal(23,10) | NO | | 0.0000000000 | |
| item_cost_price | decimal(23,10) | NO | | NULL | |
| item_unit_price | decimal(23,10) | NO | | NULL | |
| discount_percent | int(11) | NO | | 0 | |
+--------------------+----------------+------+-----+--------------+-------+
9 rows in set (0.00 sec)
mysql>
Proposed Query:
SELECT DISTINCT sale_id
FROM phppos_sales_items_taxes
WHERE item_id NOT IN
(SELECT item_id FROM phppos_sales_items WHERE sale_id = phppos_sales_items_taxes.sale_id)
The part I am confused by is the subquery. The query seems to work as intended but I am not understanding the subquery part. How does it look for each sale?
For example if I have the following data:
mysql> select * from phppos_sales;
+---------------------+-------------+-------------+---------+-------------------------+---------+--------------------+-----------+-----------+------------+---------+-----------+-----------------------+-------------+---------+
| sale_time | customer_id | employee_id | comment | show_comment_on_receipt | sale_id | payment_type | cc_ref_no | auth_code | deleted_by | deleted | suspended | store_account_payment | location_id | tier_id |
+---------------------+-------------+-------------+---------+-------------------------+---------+--------------------+-----------+-----------+------------+---------+-----------+-----------------------+-------------+---------+
| 2014-08-09 17:53:38 | NULL | 1 | | 0 | 1 | Cash: $12.96<br /> | | | NULL | 0 | 0 | 0 | 1 | NULL |
| 2014-08-09 17:56:59 | NULL | 1 | | 0 | 2 | Cash: $12.96<br /> | | | NULL | 0 | 0 | 0 | 1 | NULL |
+---------------------+-------------+-------------+---------+-------------------------+---------+--------------------+-----------+-----------+------------+---------+-----------+-----------------------+-------------+---------+
mysql> select * from phppos_sales_items;
+---------+---------+-------------+--------------+------+--------------------+-----------------+-----------------+------------------+
| sale_id | item_id | description | serialnumber | line | quantity_purchased | item_cost_price | item_unit_price | discount_percent |
+---------+---------+-------------+--------------+------+--------------------+-----------------+-----------------+------------------+
| 2 | 1 | | | 1 | 1.0000000000 | 10.0000000000 | 12.0000000000 | 0 |
+---------+---------+-------------+--------------+------+--------------------+-----------------+-----------------+------------------+
1 row in set (0.00 sec)
mysql> select * from phppos_sales_items_taxes;
+---------+---------+------+-----------+---------+------------+
| sale_id | item_id | line | name | percent | cumulative |
+---------+---------+------+-----------+---------+------------+
| 1 | 1 | 1 | Sales Tax | 8.000 | 0 |
| 2 | 1 | 1 | Sales Tax | 8.000 | 0 |
+---------+---------+------+-----------+---------+------------+
2 rows in set (0.00 sec)
When I run the query below it does find sale_id 1. But how does the subquery know to filter correctly? I guess I am not understanding how the subquery works.
mysql> SELECT DISTINCT sale_id
-> FROM phppos_sales_items_taxes
-> WHERE item_id NOT IN
-> (SELECT item_id FROM phppos_sales_items WHERE sale_id = phppos_sales_items_taxes.sale_id)
-> ;
+---------+
| sale_id |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)

Duffy356's link to the SQL joins article is good, but sometimes seeing it with your own data makes more sense.
First, your query as written will be very expensive for the engine (understandable, since you are still learning). It knows what to include because it is a correlated subquery: for every record in the sales_items_taxes table it runs a query against the sales_items table, which returns every item_id recorded for that row's sale_id. The main query then checks the tax row's item_id against that list. If the item_id is NOT found, the row's sale_id is included in the result set. Then it moves on to the next record in the sales_items_taxes table.
(Your query reformatted for better readability)
SELECT DISTINCT
sale_id
FROM
phppos_sales_items_taxes
WHERE
item_id NOT IN ( SELECT item_id
FROM phppos_sales_items
WHERE sale_id = phppos_sales_items_taxes.sale_id)
Now, think about this. You have 1 sale with 100 items, so the correlated subquery runs 100 times. Now do this with 1,000 sale_id entries, each with however many items, and it gets expensive quickly.
A better alternative is to let the database do what it is good at and use a LEFT JOIN. The indexes work directly with the LEFT JOIN (or INNER JOIN) and are optimized by the engine. Also, notice I am using aliases for the tables and qualifying each column with its alias for readability. Start with your sales items taxes table (the one you are checking for extra entries) as the basis. Then LEFT JOIN the sales items table on the two key components, sale_id and item_id. I would suggest that each table has an index on (sale_id, item_id) to match the join condition here.
SELECT DISTINCT
sti.sale_id
FROM
phppos_sales_items_taxes sti
LEFT JOIN phppos_sales_items si
ON sti.sale_id = si.sale_id
AND sti.item_id = si.item_id
WHERE
si.sale_id IS NULL
So, from here, think of the two tables as lined up side by side; all you are getting are the rows on the left side (sales items taxes) that DO NOT have an entry on the right side (sales_items).
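To see that the correlated NOT IN and the LEFT JOIN ... IS NULL forms agree on the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the tables are trimmed to the columns that matter.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phppos_sales_items (sale_id INTEGER, item_id INTEGER, line INTEGER);
CREATE TABLE phppos_sales_items_taxes (
    sale_id INTEGER, item_id INTEGER, line INTEGER, name TEXT, percent REAL);
-- Sample rows from the question: sale 1 has a tax row but no item row.
INSERT INTO phppos_sales_items VALUES (2, 1, 1);
INSERT INTO phppos_sales_items_taxes VALUES
    (1, 1, 1, 'Sales Tax', 8.0),
    (2, 1, 1, 'Sales Tax', 8.0);
""")

# Correlated subquery: re-evaluated for each row of the taxes table.
correlated_sql = """
SELECT DISTINCT sale_id
FROM phppos_sales_items_taxes
WHERE item_id NOT IN (SELECT item_id
                      FROM phppos_sales_items
                      WHERE sale_id = phppos_sales_items_taxes.sale_id)
"""

# Anti-join: keep tax rows whose LEFT JOIN partner is missing (NULL).
left_join_sql = """
SELECT DISTINCT sti.sale_id
FROM phppos_sales_items_taxes sti
LEFT JOIN phppos_sales_items si
       ON sti.sale_id = si.sale_id
      AND sti.item_id = si.item_id
WHERE si.sale_id IS NULL
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
left_join_rows = conn.execute(left_join_sql).fetchall()
print(correlated_rows)  # [(1,)]
print(left_join_rows)   # [(1,)]
```

For the tax row of sale 1 the subquery returns no rows, so NOT IN is true; for sale 2 it returns item 1, so the row is filtered out. The LEFT JOIN reaches the same answer in a single pass.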

Your problem can be fixed by using joins.
Read the following article about SQL-Joins and think about your problem -> you will be able to fix it ;)
The IN-clause is not the best solution, because some databases have limits on the number of arguments contained in it.

What you really wanted here is:
SELECT DISTINCT sale_id
FROM phppos_sales_items_taxes
WHERE sale_id NOT IN
(SELECT sale_id FROM phppos_sales_items)
WHERE field NOT IN (SELECT field FROM anothertable WHERE ...) is a perfectly fine query construct.

Your original query:
SELECT DISTINCT sale_id
FROM phppos_sales_items_taxes
WHERE item_id NOT IN
(SELECT item_id FROM phppos_sales_items WHERE sale_id = phppos_sales_items_taxes.sale_id)
Here, for each row of the taxes table, you are pulling all the item_ids from the phppos_sales_items table with the same sale_id, and keeping the row only if its own item_id is not among them.
You can also get the same results in a couple of other ways, which may be easier to understand.
Use IN query with multiple columns:
select distinct sale_id
from phppos_sales_items_taxes
where (sale_id, item_id) not in (select sale_id, item_id from phppos_sales_items)
-- This form of query is easy to read and understand, but performance may not be good for large tables.
Exists / not exists format:
select distinct sale_id
from phppos_sales_items_taxes t1
where not exists (select '1' from phppos_sales_items t2
where t2.sale_id = t1.sale_id
and t2.item_id = t1.item_id
)
I would also have suggested the same solution as 'bwperrin' did; I am not sure why you didn't get any output from running that query. If your criterion is to filter on sale_id alone, that is the best solution. But it looks like you are using (sale_id, item_id) to identify a sales record. Make sure your table structure makes sense.
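The difference between filtering on sale_id alone and filtering on (sale_id, item_id) can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL, and sale 3 (two taxed items, only one of them present in phppos_sales_items) is a hypothetical row added purely to make the two queries disagree.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phppos_sales_items (sale_id INTEGER, item_id INTEGER);
CREATE TABLE phppos_sales_items_taxes (sale_id INTEGER, item_id INTEGER);
INSERT INTO phppos_sales_items VALUES (2, 1), (3, 1);
-- Sale 3 is hypothetical: item 2 is taxed but has no sales_items row.
INSERT INTO phppos_sales_items_taxes VALUES (1, 1), (2, 1), (3, 1), (3, 2);
""")

# Item-level check: a tax row is orphaned if no item row shares both keys.
item_level_sql = """
SELECT DISTINCT t1.sale_id
FROM phppos_sales_items_taxes t1
WHERE NOT EXISTS (SELECT 1
                  FROM phppos_sales_items t2
                  WHERE t2.sale_id = t1.sale_id
                    AND t2.item_id = t1.item_id)
ORDER BY t1.sale_id
"""

# Sale-level check: only sales with no item rows at all are reported.
sale_level_sql = """
SELECT DISTINCT sale_id
FROM phppos_sales_items_taxes
WHERE sale_id NOT IN (SELECT sale_id FROM phppos_sales_items)
ORDER BY sale_id
"""

item_level_rows = conn.execute(item_level_sql).fetchall()
sale_level_rows = conn.execute(sale_level_sql).fetchall()
print(item_level_rows)  # [(1,), (3,)]
print(sale_level_rows)  # [(1,)]
```

Which of the two is right depends on whether an orphaned tax row for a single item should count, which is the table-structure question raised above.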

Related

Sql query performance is varying though they are the same

There are 2 tables and their structure as below:
mysql> desc product;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| brand | varchar(20) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.02 sec)
mysql> desc sales;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| yearofsales | varchar(10) | YES | | NULL | |
| price | int(11) | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
Here id is the foreign key.
And Queries are as follows:
1.
mysql> select brand,sum(price),yearofsales
from product p, sales s
where p.id=s.id
group by s.id,yearofsales;
+-------+------------+-------------+
| brand | sum(price) | yearofsales |
+-------+------------+-------------+
| Nike | 917504000 | 2012 |
| FF | 328990720 | 2010 |
| FF | 328990720 | 2011 |
| FF | 723517440 | 2012 |
+-------+------------+-------------+
4 rows in set (1.91 sec)
2.
mysql> select brand,tmp.yearofsales,tmp.sum
from product p
join (
select id,yearofsales,sum(price) as sum
from sales
group by yearofsales,id
) tmp on p.id=tmp.id ;
+-------+-------------+-----------+
| brand | yearofsales | sum |
+-------+-------------+-----------+
| Nike | 2012 | 917504000 |
| FF | 2011 | 328990720 |
| FF | 2012 | 723517440 |
| FF | 2010 | 328990720 |
+-------+-------------+-----------+
4 rows in set (1.59 sec)
The question is: why does the second query take less time than the first one? I have executed them multiple times, in different orders as well.
You can check the execution plans for the two queries and the indexes on the two tables to see why one query takes longer than the other. Also, you cannot run one simple test and trust the results. Many factors can affect the execution of a query, such as the server being busy with something else at the time, which makes it run slower. You'll have to run both queries a large number of times and then compare the averages.
However, it is highly recommended to use explicit joins instead of implicit joins:
SELECT brand, SUM(price), yearofsales
FROM product p
INNER JOIN sales s ON p.id = s.id
GROUP BY s.id, yearofsales;

mysql join with sub-query

This is my schema:
mysql> describe stocks;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| symbol | varchar(32) | NO | | NULL | |
| date | datetime | NO | | NULL | |
| value | float(10,3) | NO | | NULL | |
| contracts | int(8) | NO | | NULL | |
| open | float(10,3) | NO | | NULL | |
| close | float(10,3) | NO | | NULL | |
| high | float(10,3) | NO | | NULL | |
| low | float(10,3) | NO | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
9 rows in set (0.03 sec)
I added the columns open and close and I want to fill them with data that is already in the table.
The open/close values are per day, so the min/max id within each day should point to the row with the correct value. So my first idea is to get the list of dates and then left join it with the table:
SELECT DISTINCT(DATE(date)) as date FROM stocks
but I'm stuck because I can't get the max/min id or the first/last value. Thanks
You can get the min and max ids for each day with the query below:
SELECT DATE_FORMAT(date, "%d/%m/%Y"),min(id) as min_id,max(id) as max_id FROM stocks group by DATE_FORMAT(date, "%d/%m/%Y")
But the other requirement is not clear.
Solved!
mysql> UPDATE stocks s JOIN
-> (SELECT k.date, k.value as v1, y.value as v2 FROM (SELECT x.date, x.min_id, x.max_id, stocks.value FROM (SELECT DATE(date) as date,min(id) as min_id,max(id) as max_id FROM stocks group by DATE(date)) AS x LEFT JOIN stocks ON x.min_id = stocks.id) AS k LEFT JOIN stocks y ON k.max_id = y.id) sd
-> ON DATE(s.date) = sd.date
-> SET s.open = sd.v1, s.close = sd.v2;
Query OK, 995872 rows affected (1 min 50.38 sec)
Rows matched: 995872 Changed: 995872 Warnings: 0
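The core of that UPDATE is the derived table that maps each day to its first and last row. A minimal sketch of just that part, run as a SELECT through Python's sqlite3 module (a stand-in for MySQL), is shown here. The five sample rows are invented for illustration, and the approach assumes, as the question does, that ids increase with time within a day.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stocks (id INTEGER PRIMARY KEY, symbol TEXT, date TEXT, value REAL);
-- Hypothetical sample data: three ticks on one day, two on the next.
INSERT INTO stocks VALUES
    (1, 'XYZ', '2014-01-02 09:00:00', 10.0),
    (2, 'XYZ', '2014-01-02 12:00:00', 11.5),
    (3, 'XYZ', '2014-01-02 17:00:00', 11.0),
    (4, 'XYZ', '2014-01-03 09:00:00', 12.0),
    (5, 'XYZ', '2014-01-03 17:00:00', 12.5);
""")

# Step 1 (inner query): min and max id per calendar day.
# Step 2 (outer joins): look up the value stored at each of those ids.
per_day_sql = """
SELECT d.trade_day, o.value AS open_value, c.value AS close_value
FROM (SELECT DATE(date) AS trade_day,
             MIN(id)    AS min_id,
             MAX(id)    AS max_id
      FROM stocks
      GROUP BY DATE(date)) AS d
JOIN stocks AS o ON o.id = d.min_id
JOIN stocks AS c ON c.id = d.max_id
ORDER BY d.trade_day
"""

per_day_rows = conn.execute(per_day_sql).fetchall()
for row in per_day_rows:
    print(row)
```

Joining this per-day result back to stocks on DATE(s.date) is what lets the UPDATE above set open and close on every row of the day.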

mysql natural join not working

I have two tables in MySQL server. I use these tables for studying JOINs across multiple tables, but something appears to be incorrect:
mysql> select * from category;
+-------------+-----------+
| category_id | name |
+-------------+-----------+
| 1 | fruit |
| 2 | vegetable |
+-------------+-----------+
2 rows in set (0.00 sec)
mysql> desc category;
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| category_id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | | NULL | |
+-------------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
And:
mysql> select * from goods;
+---------+--------+-------------+------+
| good_id | name | category_id | cost |
+---------+--------+-------------+------+
| 1 | banan | 1 | 1.00 |
| 2 | potato | 2 | 1.00 |
| 3 | peach | 1 | 1.00 |
+---------+--------+-------------+------+
3 rows in set (0.00 sec)
mysql> desc goods;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| good_id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| category_id | int(11) | NO | MUL | NULL | |
| cost | decimal(6,2) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
The second table has foreign key (category_id) and I can join them using INNER JOIN:
mysql> select c.name category, g.name, g.cost from category as c INNER JOIN goods g ON c.category_id = g.category_id;
+-----------+--------+------+
| category | name | cost |
+-----------+--------+------+
| fruit | banan | 1.00 |
| vegetable | potato | 1.00 |
| fruit | peach | 1.00 |
+-----------+--------+------+
3 rows in set (0.00 sec)
I tried to use NATURAL JOIN but it didn't work and I don't know why:
mysql> select c.name, g.name, g.cost from category as c NATURAL JOIN goods g;
Empty set (0.00 sec)
Could somebody explain why NATURAL JOIN does not work?
I was having the exact same thing happen to me, and my Googling led me to this question. I eventually figured it out, so I figured I'd post my answer here.
This was the culprit:
Instead of specifying a join condition through ON, USING or a WHERE clause, the NATURAL keyword tells the server to match up any column names between the two tables, and automatically use those columns to resolve the join.
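Your goods and category tables both have a column called "name". A NATURAL JOIN joins on all columns that share a name, so it requires both category_id = category_id and name = name. The category_id values match, but a category name never equals a goods name, so no rows are returned.
Rename your columns to tablename_column instead, or join with USING (category_id).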
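The effect is easy to reproduce with the tables from the question. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (SQLite applies the same shared-column rule for NATURAL JOIN) and contrasts it with a join restricted to category_id via USING.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE goods (good_id INTEGER PRIMARY KEY, name TEXT,
                    category_id INTEGER, cost REAL);
INSERT INTO category VALUES (1, 'fruit'), (2, 'vegetable');
INSERT INTO goods VALUES (1, 'banan', 1, 1.00),
                         (2, 'potato', 2, 1.00),
                         (3, 'peach', 1, 1.00);
""")

# NATURAL JOIN matches on every shared column name: category_id AND name.
natural_rows = conn.execute(
    "SELECT * FROM category NATURAL JOIN goods"
).fetchall()

# USING names the join column explicitly, so "name" is not compared.
using_sql = """
SELECT c.name, g.name, g.cost
FROM category AS c
JOIN goods AS g USING (category_id)
ORDER BY g.good_id
"""
using_rows = conn.execute(using_sql).fetchall()

print(natural_rows)  # [] because no category name equals a goods name
print(using_rows)
```

The USING form returns the same three rows as the INNER JOIN ... ON query in the question, without requiring any columns to be renamed.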

MySQL merge results into table from count of 2 other tables, matching ids

I've got 3 tables: model, model_views, and model_views2. In an effort to have one column per row to hold aggregated views, I've done a migration to make the model look something like this, with a new column for the views:
+---------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
| [...] | | | | | |
| views | int(20) | YES | | 0 | |
+---------------+---------------+------+-----+---------+----------------+
This is what the columns for model_views and model_views2 look like:
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | smallint(5) | NO | MUL | NULL | |
| model_id | smallint(5) | NO | MUL | NULL | |
| time | int(10) unsigned | NO | | NULL | |
| ip_address | varchar(16) | NO | MUL | NULL | |
+------------+------------------+------+-----+---------+----------------+
model_views and model_views2 are gargantuan, both totalling in the tens of millions of rows each. Each row is representative of one view, and this is a terrible mess for performance. So far, I've got this MySQL command to fetch a count of all the rows representing single views in both of these tables, sorted by model_id added up:
SELECT model_id, SUM(c) FROM (
SELECT model_views.model_id, COUNT(*) AS c FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c FROM model_views2
GROUP BY model_views2.model_id)
AS foo GROUP BY model_id
So that I get a nice big table with the following:
+----------+--------+
| model_id | SUM(c) |
+----------+--------+
| 1 | 1451 |
| [...] | |
+----------+--------+
What would be the safest route from here to merge the values of SUM(c) into the column model.views, matching model.id to the model_ids that I get out of the above SQL query? I want to fill only the rows for models that still exist. There are probably model_views rows referring to rows in the model table which have been deleted.
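You can just use UPDATE with a JOIN on your subquery. Keep the outer SUM ... GROUP BY from your query: without it the UNION ALL can produce two rows per model_id, and a multi-table UPDATE changes each matching row only once, so only one of the two counts would be stored. Because it is an inner JOIN, only models that still exist are updated.
UPDATE model
JOIN (
SELECT model_id, SUM(c) AS c
FROM (
SELECT model_views.model_id, COUNT(*) AS c
FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c
FROM model_views2
GROUP BY model_views2.model_id) foo
GROUP BY model_id) toupdate ON model.id = toupdate.model_id
SET model.views = toupdate.c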

Querying a database of statistics to get counts of different events

I'm making a database of a soccer league that has these tables:
+---------------------+
| Tables_in_league484 |
+---------------------+
| player |
| statevent |
+---------------------+
18 rows in set (0.09 sec)
and the player table in question looks like this:
mysql> desc player;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| pid | int(11) | NO | PRI | NULL | auto_increment |
| lastname | varchar(55) | YES | | NULL | |
| firstname | varchar(85) | YES | | NULL | |
| dob | date | YES | | NULL | |
| posid | int(11) | YES | MUL | NULL | |
| tid | int(11) | YES | MUL | NULL | |
| shirtnum | int(11) | YES | | NULL | |
| email | varchar(85) | YES | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
8 rows in set (0.09 sec)
posid is fk for position table;
tid is fk for team table;
mysql> desc statevent;
+--------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+----------------+
| eid | int(11) | NO | PRI | NULL | auto_increment |
| gid | int(11) | YES | MUL | NULL | |
| pid | int(11) | YES | MUL | NULL | |
| minute | int(11) | YES | | NULL | |
| typeid | int(11) | YES | | NULL | |
+--------+-------------+------+-----+---------+----------------+
5 rows in set (0.09 sec)
where the typeids are:
1 for shot
2 for save
3 for goal
4 for assist
How can I structure a MySQL query that gives me a result that looks like this
+--------+------+------+-------+---------+----------------+
| Name | Team | Shots| Saves | Goals | Assists |
+--------+------+------+-------+---------+----------------+
| Nick | 1| 8| 0| 4| 1|
| Jeff | 4| 5| 0| 5| 6|
| Jim | 7| 7| 0| 6| 3|
+--------+------+------+-------+---------+----------------+
that ends after the 10th result? (limit 10)
I've been trying for hours and I'm knackered thinking about it. What do I count? What do I group by? Can I order by aliases?
EDIT
I failed to mention in my first edit that, while there are 18 helpful tables in this database, they are all empty (thus entirely useless) as they relate to the stat events.
They would have been wonderfully helpful.
However, I have to structure my query on this one table of statevents using only typeid. Is this possible?
Essentially, you're just trying to construct a simple PIVOT TABLE query. Personally I'd advocate just returning a GROUPed result set and handle the data display at the application level, but if you must do the pivoting in MySQL then it might look something like this - I've changed some column/table names to get you thinking a bit...
SELECT p.firstname
, p.team_id
, COUNT(CASE WHEN event_type_id = 1 THEN 'foo' END) Shots
, COUNT(CASE WHEN event_type_id = 2 THEN 'foo' END) Saves
, COUNT(CASE WHEN event_type_id = 3 THEN 'foo' END) Goals
, COUNT(CASE WHEN event_type_id = 4 THEN 'foo' END) Assists
FROM player p
JOIN stat_event e
ON e.player_id = p.player_id
GROUP
BY p.player_id;
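As a rough, runnable version of this pivot using the column names from the question (pid, tid, typeid), here is a sketch that runs through Python's sqlite3 module as a stand-in for MySQL. The players and events are made-up sample data, and the ORDER BY on an alias and the LIMIT 10 are additions that address the question's follow-ups rather than part of the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (pid INTEGER PRIMARY KEY, firstname TEXT, tid INTEGER);
CREATE TABLE statevent (eid INTEGER PRIMARY KEY, pid INTEGER, typeid INTEGER);
-- Hypothetical sample data. typeid: 1 shot, 2 save, 3 goal, 4 assist.
INSERT INTO player VALUES (1, 'Nick', 1), (2, 'Jeff', 4);
INSERT INTO statevent (pid, typeid) VALUES
    (1, 1), (1, 1), (1, 3), (1, 4),
    (2, 1), (2, 2);
""")

# CASE yields NULL for non-matching events, and COUNT ignores NULLs,
# so each COUNT(CASE ...) counts only one event type.
pivot_sql = """
SELECT p.firstname AS name,
       p.tid       AS team,
       COUNT(CASE WHEN e.typeid = 1 THEN 1 END) AS shots,
       COUNT(CASE WHEN e.typeid = 2 THEN 1 END) AS saves,
       COUNT(CASE WHEN e.typeid = 3 THEN 1 END) AS goals,
       COUNT(CASE WHEN e.typeid = 4 THEN 1 END) AS assists
FROM player p
JOIN statevent e ON e.pid = p.pid
GROUP BY p.pid, p.firstname, p.tid
ORDER BY goals DESC, name
LIMIT 10
"""

pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

Note that with an inner JOIN, players who have no events at all are left out; a LEFT JOIN from player would keep them with zero counts.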
You would have to join the player table with the other tables you need counts from (shots, saves, goals etc.).
Once you have the joins in place, you would need to aggregate on player id, player name and team with the help of a GROUP BY clause.
Your final query will look something like this:
SELECT p.firstname, t.team, COUNT(sh.shots), COUNT(sa.saves), COUNT(g.goals),COUNT(a.assists)
FROM player p
INNER JOIN team t
ON p.tid = t.tid
....
GROUP BY p.pid, p.firstname, t.team
LIMIT 10
EDIT:
I am not a DB expert. I have one SUBOPTIMAL way of achieving this.
I would create a temporary table containing information of the form (it would have to contain pid and tid information too):
...
Nick Goals 13
Matt Saves 4
Nick Saves 11
...
This should be simple to achieve.
I would then use a SQL cursor to iterate over all distinct player ids and recover statistics from the temporary table we constructed above.