I'm new to MySQL, and I'm running this query:
SELECT item_id,amount FROM db.invoice_line WHERE item_id = 'xxx'
OR item_id = 'yyy'
...
AND invoice_id IN
(SELECT id_invoices FROM db.invoices
WHERE customer = 'zzzz'
AND transaction_date > DATE_SUB(NOW(), INTERVAL 6 MONTH)
AND sales_rep = 'aaa') ORDER BY item_id;
That is, select some columns from a table where a foreign key is found in another table.
The issue is that I would like to also have, in the results, the customer name. However, the customer name is not found in the invoice line table, it is found in the invoice table.
While I could naively duplicate the customer name into the invoice line table when creating the table and on every insert, I was wondering whether there is a SQL way to select the matching row from the invoice table and include it in the result set.
Is the performance better if I just duplicate data?
Thanks,
Dane
How about something like this?
SELECT
invoice_line.item_id,
invoice_line.amount,
invoices.customer_name
FROM db.invoice_line
INNER JOIN db.invoices
ON invoice_line.invoice_id = invoices.id_invoices
WHERE invoices.customer = 'zzzz'
AND invoices.transaction_date > DATE_SUB(CURRENT_DATE, INTERVAL 6 MONTH)
AND invoices.sales_rep = 'aaa'
AND (invoice_line.item_id = 'xxx' OR invoice_line.item_id = 'yyy')
ORDER BY invoice_line.item_id;
Use a join between the two tables to get this result.
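One detail in the original query is worth spelling out: in MySQL, AND binds more tightly than OR, so item_id = 'xxx' OR item_id = 'yyy' AND invoice_id IN (...) applies the invoice filter only to the last item in the OR chain. The join above handles this with parentheses; an IN list, sketched below, avoids the trap altogether. The aliases il and i are purely illustrative, and the sketch selects the customer column that the question filters on, because the question does not say which column actually holds the customer name.

```sql
-- Sketch only: table and column names come from the question;
-- the aliases il and i are illustrative.
SELECT il.item_id,
       il.amount,
       i.customer
FROM db.invoice_line AS il
INNER JOIN db.invoices AS i
        ON il.invoice_id = i.id_invoices
WHERE il.item_id IN ('xxx', 'yyy')   -- single predicate, no AND/OR precedence issue
  AND i.customer = 'zzzz'
  AND i.transaction_date > DATE_SUB(NOW(), INTERVAL 6 MONTH)
  AND i.sales_rep = 'aaa'
ORDER BY il.item_id;
```

Compared with duplicating the customer data into invoice_line, a join like this keeps a single copy of the data; whether it is fast enough mostly depends on whether invoice_line.invoice_id and invoices.id_invoices are indexed.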
Needless to say, I am not proficient at SQL. Now I have to run a query on a table that looks like this:
id, tp_id, value_1, value_2, value_3, date
This table has 2 entries for each distinct tp_id, with different values. tp_id is an indexed foreign key that references the following table:
id, external_id
I'm trying to retrieve data as follows:
Get the distinct tp_id values where value_2 = 2, value_1 is 1 or 2, value_3 = 1, and date is more than one year old. These conditions must hold for BOTH entries with a matching tp_id.
I have tried the following query, but as I understand it, the SUM function paired with the JOIN makes the query too slow:
SELECT t1.tp_id, t2.external_id
FROM table_1 t1
JOIN table_2 t2 ON t1.tp_id = t2.id
GROUP BY t1.tp_id
HAVING
SUM(
t1.value_2 = 2
AND t1.value_1 IN (1, 2)
AND t1.value_3 = 1
AND t1.date <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
) = 2;
Both tables have roughly 2.5M rows.
I'd like to optimize this query or learn a better way to do this, so any help would be welcome.
Thanks in advance
EDIT: It appears running this query will be altogether unnecessary. I will therefore close the question, thanks for the answers
If I got your requirement correctly, something like this might help.
SELECT tp_id
FROM (
SELECT t1.tp_id,count(*) as count
FROM table_1 t1
WHERE
t1.value_2 = 2
AND (t1.value_1 = 1 OR t1.value_1 = 2)
AND t1.value_3 = 1
AND t1.date <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
GROUP BY tp_id
) as res
WHERE res.count = 2
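Essentially, I made three performance changes:
the WHERE condition is applied before the GROUP BY, so rows are filtered before grouping instead of every row being aggregated and then checked in HAVING
I've used a nested query, but you can also put HAVING COUNT(*) = 2 directly on the grouped query
value_1 is tested with two equality checks instead of an IN clause, although the MySQL optimizer generally treats both forms the same way, so don't expect a measurable gain from this one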
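The query above drops the external_id column that the question asked for. A minimal sketch of how it could be restored: filter and group table_1 first, then join the much smaller result to table_2. It relies on the statement in the question that every tp_id has exactly two rows, so COUNT(*) = 2 means both rows passed the filter. The alias matched is hypothetical.

```sql
-- Sketch only: assumes each tp_id has exactly two rows in table_1,
-- as stated in the question. The alias "matched" is illustrative.
SELECT matched.tp_id,
       t2.external_id
FROM (
    SELECT tp_id
    FROM table_1
    WHERE value_2 = 2
      AND value_1 IN (1, 2)
      AND value_3 = 1
      AND `date` <= DATE_SUB(NOW(), INTERVAL 1 YEAR)
    GROUP BY tp_id
    HAVING COUNT(*) = 2          -- both rows for this tp_id satisfied the filter
) AS matched
INNER JOIN table_2 AS t2
        ON t2.id = matched.tp_id;
```

Joining after the aggregation means table_2 is only probed for the tp_id values that survive the filter, rather than for all 2.5M rows.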
I have 2 tables that look like the following:
TABLE 1 (table1): user_id | date
TABLE 2 (hours): accountID | date | hours
And I'm trying to add up the hours by the week. If I use the following statement I get the correct results:
SELECT
SUM(hours) as totalHours
FROM
hours
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
But when I join the two tables I get a number like 336640 when it should be 12
SELECT
SUM(hours) as totalHours
FROM
hours
JOIN table1 ON
user_id = accountID
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
Does anyone know why this is?
EDIT: Turns out I just needed to add DISTINCT, thanks!
JOIN operations usually generate more rows in the result: the join produces one row for every pair of rows from the two tables that satisfies the condition in the ON clause. If multiple rows in table1 match each row in hours, the result of your join repeats hours.accountID and hours.hours once per match. So, adding up the hours yields an inflated total.
The reason is that the table you are joining to matches multiple rows in the first table. These all get added together.
The solution is to do the aggregation in a subquery before doing the join:
select totalhours
from (SELECT SUM(hours) as totalHours
FROM hours
WHERE accountID = 244 AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY accountID
) h join
table1 t1
on t1.user_id = h.accountID;
I suspect your actual query is more complicated. For instance, table1 is not referenced in this query so the join is only doing filtering/duplication of rows. And the aggregation on hours is irrelevant when you are choosing only one account.
You should probably be specifying LEFT JOIN to be sure that it won't eliminate rows that don't match.
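Also, note that date BETWEEN ? AND ? includes both endpoints, so it is not a drop-in replacement for date >= ? AND date < ?; for week-by-week ranges like this one, the half-open form avoids counting the boundary date in two consecutive weeks.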
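If table1 is only there to confirm that the account exists, the row multiplication described above can also be avoided with EXISTS, which tests for a match without adding rows to the result. This is a sketch under that assumption; the aliases h and t1 are illustrative.

```sql
-- Sketch only: assumes table1 is used purely as an existence check.
SELECT SUM(h.hours) AS totalHours
FROM hours AS h
WHERE h.accountID = 244
  AND h.date >= '2014-02-02'
  AND h.date <  '2014-02-09'
  AND EXISTS (
        SELECT 1
        FROM table1 AS t1
        WHERE t1.user_id = h.accountID   -- a match is checked, but never multiplied
      );
```

Unlike DISTINCT inside the SUM, this does not collapse two genuinely separate entries that happen to have the same number of hours.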
Table temporary_search_table
post_id, property_status, property_address, ... about 30 more fields
Table search_meta
meta_id,search_id,status,created_date
I need the total number of rows whose created_date is yesterday. Each row in temporary_search_table may have multiple entries in search_meta, so we need to pick the latest search_meta entry and check that its created_date is yesterday and that property_status is pending. If so, the row is counted. If a temporary_search_table row has no entry in search_meta, it should not be counted.
Here is my SQL. It works, but for 30,000 rows it takes a long time.
SELECT COUNT(id) FROM temporary_search_table
WHERE property_status = 'pending' AND (1 = (SELECT DATEDIFF(NOW(), created_date)
FROM search_meta WHERE post_id = search_id ORDER BY created_date DESC LIMIT 0,1 ))
Thanks in advance.
Apart from checking the indexes on your table, it would probably be better to not use a correlated sub query and use a straight join instead.
SELECT COUNT(id)
FROM temporary_search_table
INNER JOIN search_meta ON post_id = search_id
WHERE property_status = 'pending' AND DATEDIFF(NOW(), created_date) = 1
ORDER BY created_date DESC
LIMIT 1
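Note that the join above counts every search_meta row created yesterday, not only the latest row per post, and the ORDER BY and LIMIT have no effect on a single COUNT row. A sketch that keeps the "latest entry only" rule from the question is to reduce search_meta to one row per search_id first and join to that. The aliases t and m and the column alias last_created are hypothetical, and the sketch assumes created_date is a DATE or DATETIME column.

```sql
-- Sketch only: aliases and last_created are illustrative names.
SELECT COUNT(*)
FROM temporary_search_table AS t
INNER JOIN (
    SELECT search_id,
           MAX(created_date) AS last_created   -- latest meta entry per post
    FROM search_meta
    GROUP BY search_id
) AS m
        ON m.search_id = t.post_id
WHERE t.property_status = 'pending'
  AND m.last_created >= CURDATE() - INTERVAL 1 DAY   -- start of yesterday
  AND m.last_created <  CURDATE();                   -- start of today
```

The inner join drops rows without any search_meta entry, as the question requires. The grouped subquery is evaluated once instead of once per row, and a composite index on search_meta (search_id, created_date) would likely help it, though that is worth confirming with EXPLAIN on the real data.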
I have the following table:
I'm trying to find a way to get the records for those customers that have expired, and then update the table accordingly (by update I mean add a new record with the entry 'SERVICE EXPIRED' and the customer_id of the relevant customer).
If you look at the bottom of the table, you will notice two records with the entry 'SERVICE EXPIRED' for already existing customers (customer_id 11 and 16).
I'm looking for a SQL Query that will:
Get the last set of distinct records by customer_id
Exclude records for the same customer_id from the resulting resultset that have the entry 'SERVICE EXPIRED' or status_id of 2 appearing later on in the table
If I use the following:
SELECT MAX(id) FROM mytable WHERE status_id != '2' AND expiry < '2012-12-26 19:00:00' GROUP BY customer_id
It will return ids 1, 11, 13, and 16. However, I don't want ids 11 and 16 because the expiry status has already been noted later on in the table (see the last two records of the table), and id 1 has been renewed as can be seen with an updated expiry date in id 3 later. All I want is id 13 because that is the only expired record that does not have a 'SERVICE EXPIRED' entry that appears later in the table.
I'm looking for a SQL Query that will enable me capture this requirement.
Thanks in advance
After some fiddling around I managed to come up with a solution:
SELECT MAX(id)
FROM mytable
WHERE status_id != '2'
AND expiry < '2012-12-26 19:00:00'
AND customer_id NOT IN (SELECT MAX(customer_id) FROM mytable WHERE status_id = '2' GROUP BY customer_id)
GROUP BY customer_id
Thanks @JupiterP5 for pointing me in the right direction.
Regards,
Your requirement is equivalent to finding "n" records after the last expiry on a record. The following query returns all records after the last expiry for a given customer:
select t.*
from t join
(select t.customer_id, MAX(id) as maxid
from t
where status_id = 2
group by t.customer_id
) texp
on t.customer_id = texp.customer_id and
t.id > texp.maxid
By using variables cleverly, you can enumerate these to get the last "n". However, do you really need a fixed number? Why not all of them? Why not just one of them?
It's not efficient, but this should work.
SELECT MAX(id)
FROM mytable
WHERE status_id != '2'
AND expiry < '2012-12-26 19:00:00'
AND id NOT IN (SELECT id FROM mytable where status_id = 2)
GROUP BY customer_id
Edit: Missed the service renewed case. I'll update if I think of something.
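The renewed case mentioned in the edit can be covered by looking only at the most recent row per customer and then checking whether that row is an unflagged expiry. This is a sketch under two assumptions that the question implies but does not state outright: id increases with insertion order, and every 'SERVICE EXPIRED' row carries status_id 2. The aliases m and latest are illustrative.

```sql
-- Sketch only: assumes id grows over time and that
-- 'SERVICE EXPIRED' rows are the ones with status_id = 2.
SELECT m.id,
       m.customer_id
FROM mytable AS m
INNER JOIN (
    SELECT customer_id,
           MAX(id) AS last_id        -- most recent row per customer
    FROM mytable
    GROUP BY customer_id
) AS latest
        ON latest.last_id = m.id
WHERE m.status_id <> 2                       -- latest row is not already an expiry notice
  AND m.expiry < '2012-12-26 19:00:00';      -- and it has actually expired
```

A renewed customer drops out because their latest row has a later expiry, and an already-flagged customer drops out because their latest row has status_id 2.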
My table is reasonably small around 50,000 rows. My schema is as follows:
DAILY
match_id
user_id
result
round
tournament_id
Query:
SELECT user_id
FROM `daily`
WHERE user_id IN (SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND (result = 'Won' OR result = 'Lost'))
Using the IN keyword this way can be very dangerous from a performance perspective. In older MySQL versions (before 5.6), the optimizer may treat the subquery [(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost'))] as a dependent subquery and run it once per outer row, 50,000 times in this case.
You'll want to convert this into a join, something to the effect of:
select user_id from daily a join
(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost')) b on a.user_id = b.user_id
Doing something similar to this will result in only two queries and a join.
As Cybernate pointed out in your specific example you can simply use where clauses, but I went ahead and suggested this in case your query is actually more complex than what you posted.
First, verify and add indexes as suggested earlier.
Also, why use IN at all if you are querying data from the same table?
Change your query to:
SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND ( result = 'Won'
OR result = 'Lost' )
Your query only needs to be:
SELECT d.user_id
FROM DAILY d
WHERE d.round > 25
AND d.tournament_id = 24
AND d.result IN ('Won', 'Lost')
Indexes should be considered on:
DAILY.round
DAILY.tournament_id
DAILY.result
This should return in a millisecond.
SELECT user_id FROM daily WITH(NOLOCK)
where user_id in (select user_id from daily WITH(NOLOCK) where round > 25 and tournament_id = 24 and (result = 'Won' or result = 'Lost'))
Then make sure there is an index on the filter columns.
CREATE NONCLUSTERED INDEX IX_1 ON daily (round ASC, tournament_id ASC, result ASC)
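WITH(NOLOCK) and NONCLUSTERED are SQL Server syntax and are not accepted by MySQL. If the table lives in MySQL, which the backtick-quoted `daily` in the question suggests, a rough equivalent of the index is sketched below. The index name ix_daily_filter is hypothetical, and the column order (equality columns first, the range column last) is a common rule of thumb rather than something measured on this table.

```sql
-- Sketch only: MySQL syntax; the index name is illustrative.
-- tournament_id and result are matched by equality, round by range,
-- so round goes last in the composite index.
CREATE INDEX ix_daily_filter
    ON daily (tournament_id, result, round);

-- Check whether the optimizer actually picks the index:
EXPLAIN
SELECT user_id
FROM daily
WHERE tournament_id = 24
  AND result IN ('Won', 'Lost')
  AND round > 25;
```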