MySQL: replace column value with other column value

I have 2 tables:
table: transaction
====================
id  billed_date  amount
1   2016-09-30   5
2   2016-10-04   15
3   2016-10-06   10
table: report_date
====================
transaction_id  report_date
1               2016-10-01
I want to:
Create a report which sums the amounts of all transactions in October 2016
Base it on the report date, not the billed date
When the report date is not set, base it on billed_date
In the above example, the result I want is 30 (not 25)
Then I wrote:
The First:
SELECT
sum(t.amount),
CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END AS new_date
FROM
transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
WHERE new_date BETWEEN '2016-10-01' AND '2016-10-30'
The Second:
SELECT sum(amount) FROM
(SELECT t.amount,
CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END AS date
FROM transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
) t
WHERE t.date BETWEEN '2016-10-01' AND '2016-10-30'
Result:
The First:
Unknown column 'new_date' in 'where clause'
If I replace 'new_date' with 'date': result = 25 (id=1 is excluded)
The Second:
result = 30 => Correct, but in my case, where the transaction table has about 30k records, the query is too slow.
Can anybody help me?

First of all - the part
CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END
can be written shorter as
COALESCE(d.report_date, t.billed_date)
or as
IFNULL(d.report_date, t.billed_date)
In your first query you are using a column alias in the WHERE clause, which is not allowed. You can fix it by moving the expression behind the alias into the WHERE clause:
SELECT sum(t.amount)
FROM transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
WHERE COALESCE(d.report_date, t.billed_date) BETWEEN '2016-10-01' AND '2016-10-30'
This is almost the same as your own solution.
Your second query is slow because MySQL has to store the subquery result (30K rows) into a temporary table. Trying to optimize it, you will end up with the same solution above.
However, if you have indexes on transaction.billed_date and report_date.report_date, this query still cannot use them, because the filter is applied to the result of COALESCE() rather than to the columns themselves. In order to use the indexes, you can split the query into two parts:
Entries with a report (will use report_date.report_date index):
SELECT sum(amount)
FROM transaction t JOIN report_date d ON id = transaction_id
WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30'
Entries without a report (will use transaction.billed_date index):
SELECT sum(amount)
FROM transaction t LEFT JOIN report_date d ON id = transaction_id
WHERE d.report_date IS NULL AND t.billed_date BETWEEN '2016-10-01' AND '2016-10-30'
Both queries can use an index. You just need to sum the results, which can also be done by combining the two queries:
-- COALESCE(..., 0) keeps the total from becoming NULL when one part matches no rows
SELECT COALESCE((
SELECT sum(amount)
FROM transaction t JOIN report_date d ON id = transaction_id
WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30'
), 0) + COALESCE((
SELECT sum(amount)
FROM transaction t LEFT JOIN report_date d ON id = transaction_id
WHERE d.report_date IS NULL AND t.billed_date BETWEEN '2016-10-01' AND '2016-10-30'
), 0) AS sum_amount
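To check that the single COALESCE filter and the split form agree on the question's sample rows, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, which is an assumption of this example; the table name has to be quoted there because transaction is a keyword in SQLite. The variable names are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, billed_date TEXT, amount INTEGER);
    CREATE TABLE report_date (transaction_id INTEGER, report_date TEXT);
    INSERT INTO "transaction" VALUES
        (1, '2016-09-30', 5), (2, '2016-10-04', 15), (3, '2016-10-06', 10);
    INSERT INTO report_date VALUES (1, '2016-10-01');
""")

# Variant 1: one query, filtering on the effective date.
coalesce_total = conn.execute("""
    SELECT SUM(t.amount)
    FROM "transaction" t LEFT JOIN report_date d ON t.id = d.transaction_id
    WHERE COALESCE(d.report_date, t.billed_date) BETWEEN '2016-10-01' AND '2016-10-30'
""").fetchone()[0]

# Variant 2: rows with a report date plus rows without one, summed.
split_total = conn.execute("""
    SELECT COALESCE((
        SELECT SUM(t.amount)
        FROM "transaction" t JOIN report_date d ON t.id = d.transaction_id
        WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30'
    ), 0) + COALESCE((
        SELECT SUM(t.amount)
        FROM "transaction" t LEFT JOIN report_date d ON t.id = d.transaction_id
        WHERE d.report_date IS NULL
          AND t.billed_date BETWEEN '2016-10-01' AND '2016-10-30'
    ), 0)
""").fetchone()[0]

print(coalesce_total, split_total)  # 30 30
```

Both variants count transaction 1 through its report date, which is why the total is 30 rather than 25. Whether the split form is actually faster depends on the indexes and data in the real MySQL tables; this sketch only checks that the results match.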

I finally found the solution with help from my brother:
SELECT sum(amount)
FROM transaction t LEFT JOIN report_date d ON id = transaction_id
WHERE (report_date BETWEEN '2016-10-01' AND '2016-10-30') OR (report_date IS NULL AND billed_date BETWEEN '2016-10-01' AND '2016-10-30')
Thank you for your help!

Would filling report_date with the missing values from transaction be an option?
-- Copy billed_date into report_date for October transactions that have no report date yet
INSERT INTO report_date (transaction_id, report_date) SELECT t.id, t.billed_date FROM transaction t LEFT JOIN report_date d ON (t.id = d.transaction_id) WHERE d.transaction_id IS NULL AND t.billed_date BETWEEN '2016-10-01' AND '2016-10-30';
-- Every October transaction now has a report date, so the sum can filter on it alone
SELECT sum(t.amount) FROM transaction t LEFT JOIN report_date d ON (t.id = d.transaction_id) WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30';

Your second query is correct; there is no need to rewrite it. But one thing will help you a lot when dealing with thousands or millions of records. When a table contains that much data, queries take time to execute. They may also cause locking, such as a query lock or a "database gone away" kind of issue. To avoid this, create an INDEX on the column that is used in the WHERE clause. In your case you can create an INDEX on the billed_date column of the transaction table, because your result is based on the transaction table. For more details on how to create an index in MySQL/phpMyAdmin, see http://www.yourwebskills.com/dbphpmyadmintable.php.
I faced the same issue at some point and then created an INDEX on the column. Now I am dealing with millions of records using MySQL.

Related

MySQL UNION: how to maintain date order when the date comes from 2 fields?

I have two tables, Transactions and Expenses. I have written a query to get a date-wise transaction statement. Here the Transactions table is the deposit table. The query returns the result I want, except that the rows are not ordered by date.
SELECT IFNULL(date(t1.created), date(ex.created)) as Date , sum(t1.amount) as ReceiveAmount,ex.amount as ExpensesAmount
FROM transactions as t1
LEFT JOIN (
SELECT sum(e.amount) as amount, created
FROM expenses as e
group by date(e.created)
) as ex
ON date(ex.created) = date(t1.created)
GROUP BY date(t1.created)
UNION
SELECT IFNULL(date(t1.created), date(ex.created)) as Date, sum(t1.amount) as Receive,ex.amount as ExpensesAmount
FROM transactions as t1
RIGHT JOIN (
SELECT sum(e.amount) as amount, created
FROM expenses as e
group by date(e.created)
) as ex
ON date(t1.created) = date(ex.created)
GROUP BY date(t1.created)
OUTPUT:
Date        ReceiveAmount  ExpensesAmount
2018-12-04  600            NULL
2019-08-01  500            NULL
2019-10-18  500            NULL
2019-11-18  820            500   <== this should come last
2019-11-04  NULL           100
I need the rows in ascending date order. The last two dates, 2019-11-18 and 2019-11-04, are out of order. How can I solve this problem?
You may add an ORDER BY clause to your union query, after placing both halves of the union in parentheses:
(SELECT IFNULL(DATE(t1.created), DATE(ex.created)) AS Date, SUM(t1.amount) AS ReceiveAmount,
ex.amount AS ExpensesAmount
FROM transactions as t1
LEFT JOIN
...
)
UNION ALL
(SELECT IFNULL(DATE(t1.created), DATE(ex.created)), SUM(t1.amount), ex.amount
FROM transactions as t1
RIGHT JOIN
...
)
ORDER BY Date
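I assume here that you really want a UNION ALL, and not a UNION. Be aware, though, that with UNION ALL any date present in both tables is returned twice, once by each half, so keep UNION if you rely on it to remove those duplicates. The trailing ORDER BY applies to the combined result of the union, not only to the second SELECT.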
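The ordering of a combined result can be tried out with a small sketch. It is not the answer's exact query: it runs on in-memory SQLite via Python's sqlite3 module (an assumption; SQLite does not accept the parenthesized halves shown above), it uses common table expressions, and it replaces the RIGHT JOIN half with a LEFT JOIN plus an IS NULL filter so that UNION ALL cannot return a date twice. The sample rows are invented to reproduce the totals in the question's output.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (amount INTEGER, created TEXT);
    CREATE TABLE expenses (amount INTEGER, created TEXT);
    INSERT INTO transactions VALUES
        (600, '2018-12-04 09:00:00'), (500, '2019-08-01 10:00:00'),
        (500, '2019-10-18 11:00:00'), (820, '2019-11-18 12:00:00');
    INSERT INTO expenses VALUES
        (100, '2019-11-04 08:00:00'), (500, '2019-11-18 13:00:00');
""")

rows = conn.execute("""
    WITH t AS (SELECT date(created) AS day, SUM(amount) AS amount
               FROM transactions GROUP BY date(created)),
         e AS (SELECT date(created) AS day, SUM(amount) AS amount
               FROM expenses GROUP BY date(created))
    -- every day that has deposits, with that day's expenses if any
    SELECT t.day AS day, t.amount AS received, e.amount AS expenses
    FROM t LEFT JOIN e ON e.day = t.day
    UNION ALL
    -- days that only have expenses
    SELECT e.day, NULL, e.amount
    FROM e LEFT JOIN t ON t.day = e.day
    WHERE t.day IS NULL
    -- one ORDER BY at the end sorts the whole combined result
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

With these rows, 2019-11-04 sorts before 2019-11-18, and 2019-11-18 appears once with both amounts filled in.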

Need help with a MySQL query: I need to get the starting balance and the ending balance by date, grouped by stock_id

I need to get the starting balance from the earliest date and the ending balance from the month end, grouped by stock_id.
My table:
id  stock_id  balance  transact_at
1   1         100      2018-06-15
2   1         70       2018-06-16
3   1         30       2018-06-31
4   2         50       2018-06-01
5   2         10       2018-03-31
I want output:
stock_id  start_balance  ending_balance
1         100            30
2         50             10
Try this one. Two inner queries fetch the starting and closing balances by finding the minimum and maximum transact_at for each stock_id, and the parent query then combines them to return the starting and closing balance in a single row. I have also shared a fiddle link below to try it.
select
    tabledata1.stock_id,
    startBalance,
    closingBalance
from (
    -- balance on the earliest transact_at of each stock_id
    select
        table1.stock_id,
        balance as startBalance
    from table1 join
    (
        select stock_id,
            min(transact_at) as transact_at
        from table1 group by stock_id
    ) startTransaction
    on table1.stock_id = startTransaction.stock_id and
        table1.transact_at = startTransaction.transact_at
) tabledata1
join (
    -- balance on the latest transact_at of each stock_id
    select
        table1.stock_id,
        balance as closingBalance
    from table1 join
    (
        select stock_id,
            max(transact_at) as transact_at
        from table1 group by stock_id
    ) endTransaction
    on table1.stock_id = endTransaction.stock_id
        and table1.transact_at = endTransaction.transact_at
) tabledata2
on tabledata1.stock_id = tabledata2.stock_id;
Demo
One approach in MySQL would be to aggregate by stock_id once and find the opening and closing dates. Then, self-join twice to pull in the actual balances which occurred on those opening and closing dates.
SELECT
t1.stock_id,
t2.balance AS start_balance,
t3.balance AS ending_balance
FROM
(
SELECT
stock_id,
MIN(transact_at) AS min_transact_at,
MAX(transact_at) AS max_transact_at
FROM my_table
GROUP BY stock_id
) t1
INNER JOIN my_table t2
ON t1.stock_id = t2.stock_id AND t2.transact_at = t1.min_transact_at
INNER JOIN my_table t3
ON t1.stock_id = t3.stock_id AND t3.transact_at = t1.max_transact_at;
Demo
Note: For posterity's sake, when MySQL 8+ becomes the norm, we could make use of things like ROW_NUMBER here, which might make it easier to get the result we want.
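For the ROW_NUMBER idea mentioned in the note, a rough sketch might look like the following. It runs on in-memory SQLite through Python's sqlite3 module as a stand-in for MySQL 8+ (an assumption; window functions need SQLite 3.25 or newer). The sample rows are hypothetical: they follow the question's table, but with the dates adjusted so that each one is valid and the last row per stock is the month end.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8+; dates are adjusted, hypothetical values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, stock_id INTEGER, balance INTEGER, transact_at TEXT);
    INSERT INTO my_table VALUES
        (1, 1, 100, '2018-06-15'), (2, 1, 70, '2018-06-16'), (3, 1, 30, '2018-06-30'),
        (4, 2, 50, '2018-06-01'), (5, 2, 10, '2018-06-30');
""")

balances = conn.execute("""
    WITH ranked AS (
        SELECT stock_id, balance,
               -- 1 marks the earliest row of each stock
               ROW_NUMBER() OVER (PARTITION BY stock_id ORDER BY transact_at ASC) AS rn_first,
               -- 1 marks the latest row of each stock
               ROW_NUMBER() OVER (PARTITION BY stock_id ORDER BY transact_at DESC) AS rn_last
        FROM my_table
    )
    SELECT stock_id,
           MAX(CASE WHEN rn_first = 1 THEN balance END) AS start_balance,
           MAX(CASE WHEN rn_last = 1 THEN balance END) AS ending_balance
    FROM ranked
    GROUP BY stock_id
    ORDER BY stock_id
""").fetchall()

print(balances)  # [(1, 100, 30), (2, 50, 10)]
```

Each CASE expression matches exactly one row per stock_id, so MAX() only serves to collapse the group to that single value.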
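Try this one. Note that it only returns the right values if the balance never increases over time, so that the largest balance is the starting one and the smallest is the ending one.
SELECT stock_id, MAX(balance) as start_balance, MIN(balance) as ending_balance FROM tbl_balance GROUP BY stock_id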

MySQL join tables and use two different records from joined table in two different columns of main table

I'm trying to create a new table, new_data, in which I'm going to store data about campaigns from two different tables. I have a table, campaign_revenue, with a column data_date to indicate the data date, and another column revenue.
In another table, campaign_manager, I have the columns revenue and revenue_yesterday. So what I want is to join the two tables and take revenue and revenue_yesterday from campaign_revenue into new_data.
A new record in the result table should look something like:
campaign_id | campaign_name | revenue | revenue_yesterday
43243242 | testing name | 109.02 | 159.43
What we see is actually two records from campaign_revenue, one for each date, plus the campaign id and name from campaign_manager.
I've been trying quite a few variations, but based on this answer
my last attempt was this:
SELECT campaign_id, campaign_name
FROM campaign_manager
UNION
SELECT
revenue
FROM campaign_revenue
WHERE data_date = '2018-02-13'
UNION
SELECT
revenue AS revenue_yesterday
FROM campaign_revenue
WHERE data_date = '2018-02-12'
It clearly didn't work, but I hope it helps to understand what I'm trying to achieve. Thanks.
A self join would seem to be what you have in mind. You may join twice to the campaign_revenue table, once for today's revenue, and once for yesterday's revenue.
SELECT
cm.campaign_id,
cm.campaign_name,
cr1.data_date,
cr1.revenue AS revenue_today,
cr2.revenue AS revenue_yesterday
FROM campaign_manager cm
INNER JOIN campaign_revenue cr1
ON cm.campaign_id = cr1.campaign_id
LEFT JOIN campaign_revenue cr2
ON cm.campaign_id = cr2.campaign_id AND
cr1.data_date = DATE_ADD(cr2.data_date, INTERVAL 1 DAY)
-- WHERE cr1.data_date = CURDATE()
This answer assumes that your dates are contiguous, that is, there are no missing dates.
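To see the double join produce one row with both revenues, here is a minimal sketch using the values from the question. It runs on in-memory SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption), so DATE_ADD(..., INTERVAL 1 DAY) is written as SQLite's date(..., '+1 day'). The campaign_id column in campaign_revenue is assumed, as in the answer, and the date filter is hard-coded instead of using CURDATE().

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema is assumed from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaign_manager (campaign_id INTEGER, campaign_name TEXT);
    CREATE TABLE campaign_revenue (campaign_id INTEGER, data_date TEXT, revenue REAL);
    INSERT INTO campaign_manager VALUES (43243242, 'testing name');
    INSERT INTO campaign_revenue VALUES
        (43243242, '2018-02-13', 109.02),
        (43243242, '2018-02-12', 159.43);
""")

report = conn.execute("""
    SELECT cm.campaign_id,
           cm.campaign_name,
           cr1.revenue AS revenue,
           cr2.revenue AS revenue_yesterday
    FROM campaign_manager cm
    INNER JOIN campaign_revenue cr1
        ON cm.campaign_id = cr1.campaign_id
    -- cr2 is the same campaign's row dated one day before cr1
    LEFT JOIN campaign_revenue cr2
        ON cm.campaign_id = cr2.campaign_id
       AND cr1.data_date = date(cr2.data_date, '+1 day')
    WHERE cr1.data_date = '2018-02-13'
""").fetchall()

print(report)  # [(43243242, 'testing name', 109.02, 159.43)]
```

Because the second join is a LEFT JOIN, a campaign with no row for the previous day would still be returned, with revenue_yesterday as NULL.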
Try the following query:
SELECT campaign_id, campaign_name FROM campaign_manager WHERE data_date IN ('2018-02-13', '2018-01-14', '2017-02-15')
UNION
SELECT title FROM data_table WHERE title IN ('2018-02-13', '2018-01-14')
You need to use two queries with the UNION keyword.
For reference, see 'UNION Syntax'.

SQL get one time customers by email field

I have a database with over 100,000 records. I'm trying to get all customers who ordered only once searching by customer's email field (OrderEmail).
The SQL query is running for 10 minutes and then times out.
If I use short date ranges, I can get results but it still takes over 3 minutes.
How can I optimize the query to get it to work?
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate
FROM
tblOrders
LEFT JOIN tblOrders AS orders_join ON orders_join.OrderEmail = tblOrders.OrderEmail
AND NOT orders_join.OrderID = tblOrders.OrderID
WHERE
orders_join.OrderID IS NULL
AND (tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01')
AND tblOrders.OrderDelivered = - 1
ORDER BY
tblOrders.OrderID ASC;
I would expect the query below to work, but I can't test it as you don't provide sample data. I added a temporary table definition that could be used for the query.
But if you could change the data model to use an INTEGER id for the entity who placed the order (instead of a VARCHAR() email address), the query would get considerably faster.
CREATE TEMPORARY TABLE IF NOT EXISTS
tblorders(orderid,ordername,orderemail,orderphone,ordercountry,orderdate) AS (
SELECT 1,'ORD01','adent#hog.com' ,'9-991' ,'UK', DATE '2017-01-01'
UNION ALL SELECT 2,'ORD02','tricia#hog.com','9-992' ,'UK', DATE '2017-01-02'
UNION ALL SELECT 3,'ORD03','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-03'
UNION ALL SELECT 4,'ORD04','zaphod#hog.com','9-9943','UK', DATE '2017-01-04'
UNION ALL SELECT 5,'ORD05','marvin#hog.com','9-9942','UK', DATE '2017-01-05'
UNION ALL SELECT 6,'ORD06','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-06'
UNION ALL SELECT 7,'ORD07','tricia#hog.com','9-992' ,'UK', DATE '2017-01-07'
UNION ALL SELECT 8,'ORD08','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-08'
UNION ALL SELECT 9,'ORD09','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-09'
UNION ALL SELECT 10,'ORD10','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-10'
)
;
SELECT
tblOrders.OrderID
, tblOrders.OrderName
, tblOrders.OrderEmail
, tblOrders.OrderPhone
, tblOrders.OrderCountry
, tblOrders.OrderDate
FROM tblOrders
JOIN (
SELECT
OrderEmail
FROM tblOrders
GROUP BY
OrderEmail
HAVING COUNT(*) = 1
) singleOrders
ON singleOrders.OrderEmail = tblOrders.OrderEmail
ORDER BY OrderID
;
OrderID|OrderName|OrderEmail |OrderPhone|OrderCountry|OrderDate
1|ORD01 |adent#hog.com |9-991 |UK |2017-01-01
4|ORD04 |zaphod#hog.com|9-9943 |UK |2017-01-04
5|ORD05 |marvin#hog.com|9-9942 |UK |2017-01-05
As you can see, it returns Mr. Dent, Zaphod and Marvin, who all occur only once in the example data.
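The same GROUP BY ... HAVING COUNT(*) = 1 join can be reproduced with a short sketch. It uses in-memory SQLite via Python's sqlite3 module as a stand-in (an assumption) and only three of the columns from the sample rows above, to keep it short.

```python
import sqlite3

# Sample rows copied from the temporary table above (id, name, email only).
orders = [
    (1, "ORD01", "adent#hog.com"), (2, "ORD02", "tricia#hog.com"),
    (3, "ORD03", "ford#hog.com"), (4, "ORD04", "zaphod#hog.com"),
    (5, "ORD05", "marvin#hog.com"), (6, "ORD06", "ford#hog.com"),
    (7, "ORD07", "tricia#hog.com"), (8, "ORD08", "benji#hog.com"),
    (9, "ORD09", "benji#hog.com"), (10, "ORD10", "ford#hog.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOrders (OrderID INTEGER, OrderName TEXT, OrderEmail TEXT)")
conn.executemany("INSERT INTO tblOrders VALUES (?, ?, ?)", orders)

one_time = conn.execute("""
    SELECT o.OrderID, o.OrderName, o.OrderEmail
    FROM tblOrders o
    JOIN (
        -- email addresses that occur exactly once
        SELECT OrderEmail
        FROM tblOrders
        GROUP BY OrderEmail
        HAVING COUNT(*) = 1
    ) singleOrders ON singleOrders.OrderEmail = o.OrderEmail
    ORDER BY o.OrderID
""").fetchall()

for row in one_time:
    print(row)
```

This prints orders 1, 4 and 5. In the real table, the date range and OrderDelivered filters from the question would still need to be added, and where they go (inside the grouped subquery or outside it) changes what "ordered only once" means.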
Another approach that might work is to group by email address and keep only those with one entry. It may behave unpredictably if you want to get customers with multiple orders, but it should be fine for this particular case:
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate,
count(tblOrders.OrderID) as OrderCount
FROM
tblOrders
WHERE
tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01'
AND tblOrders.OrderDelivered = - 1
GROUP BY
tblOrders.OrderEmail
HAVING
OrderCount = 1
ORDER BY
tblOrders.OrderID ASC;
Also, I suspect that if you're seeing such long query times with just 100k records, you probably don't have an index on the OrderEmail column. I suggest setting one up; it might help with your original query as well.
This does not work in Oracle or SQL Server, but it does work in SQLite and in MySQL when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5). So, while the code is not portable between different RDBMS, it works for this particular case.

MySQL SUM not working correctly after JOIN

I have 2 tables that look like the following:
TABLE 1: user_id | date
TABLE 2: accountID | date | hours
And I'm trying to add up the hours by the week. If I use the following statement I get the correct results:
SELECT
SUM(hours) as totalHours
FROM
hours
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
But when I join the two tables I get a number like 336640 when it should be 12
SELECT
SUM(hours) as totalHours
FROM
hours
JOIN table1 ON
user_id = accountID
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
Does anyone know why this is?
EDIT: Turns out I just needed to add DISTINCT, thanks!
JOIN operations usually generate more rows in the result table: the join's result is a row for every possible pair of rows in the two joined tables that meets the criterion in the ON clause. If there are multiple rows in table1 that match each row in hours, the result of your join will repeat hours.accountID and hours.hours many times. So, adding up the hours yields a high result.
The reason is that the table you are joining to matches multiple rows in the first table. These all get added together.
The solution is to do the aggregation in a subquery before doing the join:
select totalhours
from (SELECT SUM(hours) as totalHours
FROM hours
WHERE accountID = 244 AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY accountID
) h join
table1 t1
on t1.user_id = h.accountID;
I suspect your actual query is more complicated. For instance, table1 is not referenced in this query so the join is only doing filtering/duplication of rows. And the aggregation on hours is irrelevant when you are choosing only one account.
You should probably be specifying LEFT JOIN to be sure that it won't eliminate rows that don't match.
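Also, date BETWEEN ? AND ? can be used instead of date >= ? AND date < ?, but note that BETWEEN includes both endpoints, so for a DATE column the upper bound would have to be '2014-02-08' rather than '2014-02-09'.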
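The row multiplication described in this thread can be reproduced with a small sketch. It uses in-memory SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption) and invented rows: three days of 4 hours each for account 244 and two matching rows in table1. The date column is qualified with an alias because both tables have one.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user_id INTEGER, date TEXT);
    CREATE TABLE hours (accountID INTEGER, date TEXT, hours INTEGER);
    INSERT INTO table1 VALUES (244, '2014-02-03'), (244, '2014-02-04');
    INSERT INTO hours VALUES
        (244, '2014-02-03', 4), (244, '2014-02-04', 4), (244, '2014-02-05', 4);
""")

WEEK = "h.accountID = 244 AND h.date >= '2014-02-02' AND h.date < '2014-02-09'"

# Each hours row is paired with both table1 rows, so every value is counted twice.
inflated = conn.execute(
    f"SELECT SUM(h.hours) FROM hours h JOIN table1 t1 ON t1.user_id = h.accountID WHERE {WEEK}"
).fetchone()[0]

# SUM(DISTINCT ...) collapses equal values, so three days of 4 hours count as 4.
distinct_sum = conn.execute(
    f"SELECT SUM(DISTINCT h.hours) FROM hours h JOIN table1 t1 ON t1.user_id = h.accountID WHERE {WEEK}"
).fetchone()[0]

# Aggregating before the join keeps the total intact on every joined row.
pre_aggregated = conn.execute(f"""
    SELECT agg.totalHours
    FROM (SELECT h.accountID, SUM(h.hours) AS totalHours
          FROM hours h WHERE {WEEK} GROUP BY h.accountID) agg
    JOIN table1 t1 ON t1.user_id = agg.accountID
""").fetchall()

print(inflated, distinct_sum, pre_aggregated)  # 24 4 [(12,), (12,)]
```

With these rows the plain join doubles the total, while SUM(DISTINCT hours) undercounts because equal values are merged, so the DISTINCT fix from the question's edit is only safe when no two rows in the range have the same number of hours.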