mysql query inner join on same table with different conditions - mysql

I have a table transactions and I am trying to figure out our new customers in a given month. That means that if a customer didn't have a transaction at any time before the month, he/she counts as a new customer.
I have figured out a way, but it is seriously inefficient and takes ages. I then came across this article, which compares different methods. I have tried to adapt that approach to my case without success.
To visualise my problem:
|--------------------------- time period with all transactions -----------------------|
|----- period before month (transactions = 0) --|--- curr month (transactions > 0) ---|
The table looks like this:
transactions
id, email, state, date_paid
My query:
SELECT
l.email
FROM
transactions as l
LEFT JOIN transactions as r ON r.email = l.email
WHERE
r.email IS NULL
AND l.state = 'paid'
AND r.state = 'paid'
AND l.date_paid <= '2013-12-31 23:59:59'
AND r.date_paid < '2013-12-01 00:00:00'
What am I doing wrong?

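Try this. The conditions on r belong in the ON clause of the LEFT JOIN. In the WHERE clause they are evaluated after the join, and r.state = 'paid' can never be true on a row where r.email IS NULL, so your query returns nothing.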
SELECT l.email
FROM transactions AS l
LEFT JOIN transactions AS r ON r.email = l.email AND r.state = 'paid' AND r.date_paid < '2013-12-01 00:00:00'
WHERE r.email IS NULL AND l.state = 'paid' AND l.date_paid <= '2013-12-31 23:59:59'
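To see the anti-join behave as intended, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for MySQL here, the sample rows are made up, and DISTINCT is my addition so that a customer with several December payments is listed only once.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, email TEXT, state TEXT, date_paid TEXT)"
)
conn.executemany(
    "INSERT INTO transactions (email, state, date_paid) VALUES (?, ?, ?)",
    [
        ("old@example.com", "paid", "2013-10-05 10:00:00"),   # paid before December
        ("old@example.com", "paid", "2013-12-03 09:00:00"),
        ("new@example.com", "paid", "2013-12-10 12:00:00"),   # first payment in December
        ("new@example.com", "paid", "2013-12-20 12:00:00"),
        ("pending@example.com", "open", "2013-12-11 08:00:00"),  # never paid
    ],
)

# The filters on r sit in the ON clause, so "no earlier paid transaction"
# shows up as r.email IS NULL after the LEFT JOIN.
NEW_CUSTOMERS_SQL = """
SELECT DISTINCT l.email
FROM transactions AS l
LEFT JOIN transactions AS r
       ON r.email = l.email
      AND r.state = 'paid'
      AND r.date_paid < '2013-12-01 00:00:00'
WHERE r.email IS NULL
  AND l.state = 'paid'
  AND l.date_paid <= '2013-12-31 23:59:59'
ORDER BY l.email
"""

new_customers = [row[0] for row in conn.execute(NEW_CUSTOMERS_SQL)]
print(new_customers)
```

On a large table this shape would probably benefit from an index covering email, state and date_paid, but that is a guess without seeing the real schema.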

Try this:
SELECT l.email
FROM transactions l
WHERE NOT l.email IN (SELECT r.email
FROM transactions r
WHERE r.state = 'paid' AND r.date_paid < '2013-12-01 00:00:00')
AND l.state = 'paid' AND l.date_paid <= '2013-12-31 23:59:59'
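One thing to watch with NOT IN: if the subquery can return a NULL email, the NOT IN comparison is never true and the query comes back empty. The sketch below shows this with made-up rows in an in-memory SQLite database (a stand-in for MySQL), next to a NOT EXISTS variant that does not have the problem. The NOT EXISTS form and the extra `l.email IS NOT NULL` filter are my own suggestion, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, email TEXT, state TEXT, date_paid TEXT)"
)
conn.executemany(
    "INSERT INTO transactions (email, state, date_paid) VALUES (?, ?, ?)",
    [
        ("old@example.com", "paid", "2013-10-05 10:00:00"),
        ("old@example.com", "paid", "2013-12-03 09:00:00"),
        ("new@example.com", "paid", "2013-12-10 12:00:00"),
        (None, "paid", "2013-09-01 00:00:00"),  # hypothetical row with a missing email
    ],
)

NOT_IN_SQL = """
SELECT DISTINCT l.email
FROM transactions l
WHERE l.email NOT IN (SELECT r.email
                      FROM transactions r
                      WHERE r.state = 'paid'
                        AND r.date_paid < '2013-12-01 00:00:00')
  AND l.state = 'paid'
  AND l.date_paid <= '2013-12-31 23:59:59'
"""

NOT_EXISTS_SQL = """
SELECT DISTINCT l.email
FROM transactions l
WHERE NOT EXISTS (SELECT 1
                  FROM transactions r
                  WHERE r.email = l.email
                    AND r.state = 'paid'
                    AND r.date_paid < '2013-12-01 00:00:00')
  AND l.email IS NOT NULL
  AND l.state = 'paid'
  AND l.date_paid <= '2013-12-31 23:59:59'
"""

# The NULL email returned by the subquery makes every NOT IN test unknown.
not_in_result = [row[0] for row in conn.execute(NOT_IN_SQL)]
not_exists_result = [row[0] for row in conn.execute(NOT_EXISTS_SQL)]
print(not_in_result, not_exists_result)
```

If email is declared NOT NULL in your table, the two forms should return the same rows.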

Related

SQL calculate new column based on other queries with inner join

I'm building up a large SQL query to calculate everything in one go based on a table. I need a column called total_revenue, which should be the sum of total_commission and total_profit together.
Right now, I'm constructing a new inner join to do this, but this inner join that calculates total_revenue is an overhead.
How can I instead build on the result of my upper query and compute total_commission + total_profit without another inner join on another table? Here's my current SQL:
SELECT
# Affiliate
Conversion.aff_id,
# Revenue
JoinedRevenue.total_revenue,
# Commission
JoinedCommission.total_commission,
# Profit
JoinedProfit.total_profit
FROM
tlp_conversions AS Conversion
INNER JOIN
(
SELECT
Commission.aff_id,
Commission.created,
SUM(Commission.amount) AS total_commission
FROM
tlp_commissions AS Commission
WHERE
Commission.created >= '2022-10-03 00:00:00'
AND
Commission.created <= '2022-10-03 23:59:59'
GROUP BY
Commission.aff_id
) AS JoinedCommission
ON JoinedCommission.aff_id = Conversion.aff_id
INNER JOIN
(
SELECT
Application.tlp_aff_id,
ApplicationResponse.application_id,
ApplicationResponse.result,
Commission.seller_code,
Commission.application_response_id,
Commission.created,
SUM(Commission.amount) AS total_profit
FROM
tlp_commissions AS Commission
INNER JOIN tlp_application_responses AS ApplicationResponse
ON ApplicationResponse.id = Commission.application_response_id
INNER JOIN tlp_applications AS Application
ON Application.id = ApplicationResponse.application_id
WHERE
Commission.created >= '2022-10-03 00:00:00'
AND
Commission.created <= '2022-10-03 23:59:59'
AND
ApplicationResponse.result = 'Accepted'
AND
Commission.seller_code = 44
GROUP BY
Application.tlp_aff_id
) AS JoinedProfit
ON JoinedProfit.tlp_aff_id = Conversion.aff_id
INNER JOIN
(
SELECT
Application.tlp_aff_id,
ApplicationResponse.application_id,
Commission.application_response_id,
Commission.created,
SUM(Commission.amount) AS total_revenue
FROM
tlp_commissions AS Commission
INNER JOIN tlp_application_responses AS ApplicationResponse
ON ApplicationResponse.id = Commission.application_response_id
INNER JOIN tlp_applications AS Application
ON Application.id = ApplicationResponse.application_id
WHERE
Commission.created >= '2022-10-03 00:00:00'
AND
Commission.created <= '2022-10-03 23:59:59'
GROUP BY
Application.tlp_aff_id
) AS JoinedRevenue
ON JoinedRevenue.tlp_aff_id = Conversion.aff_id
WHERE
Conversion.conversion_time >= '2022-10-03 00:00:00'
AND
Conversion.conversion_time <= '2022-10-03 23:59:59'
AND
Conversion.aff_id IS NOT NULL
GROUP BY
Conversion.aff_id
I was hoping I could just do another SQL join that loops over each returned result in my query and appends total_revenue based on the row column of each?
INNER JOIN
(
SELECT SUM(JoinedProfit.total_profit + JoinedCommission.total_commission) AS total_revenue
) AS JoinedRevenue
But this isn't the right syntax here.
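One possible direction, sketched under heavy assumptions: if total_revenue really is just total_commission plus total_profit per affiliate, the sum can be written as an expression in the outer SELECT list over the two derived tables that are already joined, so the third derived table is not needed. The snippet below demonstrates the idea from Python against an in-memory SQLite database. The two tables `commissions` and `profits` and their rows are hypothetical simplifications, not the real tlp_* schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commissions (aff_id INTEGER, amount REAL);  -- hypothetical
CREATE TABLE profits     (aff_id INTEGER, amount REAL);  -- hypothetical
INSERT INTO commissions VALUES (1, 10.0), (1, 5.0), (2, 7.5);
INSERT INTO profits     VALUES (1, 20.0), (2, 2.5), (2, 10.0);
""")

# Each derived table aggregates once per affiliate; the outer SELECT
# adds the two aggregated columns instead of joining a third subquery.
REVENUE_SQL = """
SELECT c.aff_id,
       c.total_commission,
       p.total_profit,
       c.total_commission + p.total_profit AS total_revenue
FROM (SELECT aff_id, SUM(amount) AS total_commission
      FROM commissions
      GROUP BY aff_id) AS c
INNER JOIN (SELECT aff_id, SUM(amount) AS total_profit
            FROM profits
            GROUP BY aff_id) AS p
        ON p.aff_id = c.aff_id
ORDER BY c.aff_id
"""

revenue_rows = conn.execute(REVENUE_SQL).fetchall()
print(revenue_rows)
```

In the original query that would mean selecting `JoinedCommission.total_commission + JoinedProfit.total_profit AS total_revenue` and dropping the JoinedRevenue join, provided the sum matches what that subquery currently computes.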

SQL - GROUP BY - HAVING - MISSING ROWS

The situation is as follows. I need to join an order table with a message table, but I'm only interested in the first message (lowest message id). The tables are connected by the order id.
$result = $this->db->executeS('
SELECT o.*, c.iso_code AS currency, s.name AS shippingMethod, m.message AS note
FROM '._DB_PREFIX_.'orders o
LEFT JOIN '._DB_PREFIX_.'currency c ON c.id_currency = o.id_currency
LEFT JOIN '._DB_PREFIX_.'message m ON m.id_order = o.id_order
LEFT JOIN '._DB_PREFIX_.'carrier s ON s.id_carrier = o.id_carrier
LEFT JOIN jtl_connector_link l ON o.id_order = l.endpointId AND l.type = 4
WHERE l.hostId IS NULL AND o.date_add BETWEEN DATE_SUB(NOW(), INTERVAL 1 WEEK) AND NOW()
GROUP BY o.id_order
HAVING MIN(m.id_message)
LIMIT '.$limit
);
This query works so far. But now orders without a message are missing.
Thank you for your help!
Markus
You want to select several orders and, per order, the first message. This is difficult in MySQL versions before 8.0, which lack window functions (e.g. ROW_NUMBER() OVER). But as you are interested in just one column from the message table, you can use a subquery in the SELECT clause. An order without messages then simply gets NULL as its note.
SELECT
o.*,
c.iso_code AS currency,
s.name AS shippingMethod,
(
SELECT m.message
FROM message m
WHERE m.id_order = o.id_order
ORDER BY m.id_message
LIMIT 1
) AS note
FROM orders o
JOIN currency c ON c.id_currency = o.id_currency
JOIN carrier s ON s.id_carrier = o.id_carrier
WHERE o.date_add BETWEEN DATE_SUB(NOW(), INTERVAL 1 WEEK) AND NOW()
AND NOT EXISTS
(
SELECT *
FROM jtl_connector_link l
WHERE l.endpointId = o.id_order
AND l.type = 4
);
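A small runnable sketch of the correlated-subquery idea, using Python with an in-memory SQLite database in place of MySQL. The three orders and their messages are made up; the point is only that the order without any message still appears, with a NULL note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders  (id_order INTEGER PRIMARY KEY);
CREATE TABLE message (id_message INTEGER PRIMARY KEY,
                      id_order   INTEGER,
                      message    TEXT);
INSERT INTO orders  VALUES (1), (2), (3);
INSERT INTO message VALUES (10, 1, 'first for 1'),
                           (11, 1, 'second for 1'),
                           (12, 3, 'only for 3');
""")

# The scalar subquery picks the message with the lowest id per order.
# When an order has no messages, the subquery yields NULL and the
# order row is kept.
FIRST_MESSAGE_SQL = """
SELECT o.id_order,
       (SELECT m.message
        FROM message m
        WHERE m.id_order = o.id_order
        ORDER BY m.id_message
        LIMIT 1) AS note
FROM orders o
ORDER BY o.id_order
"""

first_messages = conn.execute(FIRST_MESSAGE_SQL).fetchall()
print(first_messages)
```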

Buyers structure by registration date query optimisation

I would like to show buyers structure by their registration date e.g.:
H12016 10.000 buyers
from which
2.000 registered in H12014
4.000 registered in H22014
etc.
I have two queries for that:
Number 1 (buyers from H12016 (about 50k records)):
SELECT DISTINCT
r.idUsera as id_usera
FROM
rezerwacje r
WHERE
r.dataZalozenia between '2016-01-01' and '2016-07-01'
and r.`status` = 'zabookowana'
ORDER BY
id_usera
Number 2 (users_ids and their registration (insert) date (about 3,8M users)):
SELECT
m.user_id,
date(m.action_date) as data_insert
FROM
mwids m
WHERE
m.`type` = 'insert'
Both queries separately run fine, but when I try to combine them like so:
SELECT DISTINCT
r.idUsera as id_usera,
t1.data_insert
FROM
rezerwacje r
LEFT JOIN
(
SELECT
m.user_id,
date(m.action_date) as data_insert
FROM
mwids m
WHERE
m.`type` = 'insert'
) t1 ON t1.user_id = r.idUsera
WHERE
r.dataZalozenia between '2016-01-01' and '2016-07-01'
and r.`status` = 'zabookowana'
ORDER BY
id_usera
this query runs "indefinitely" and I have to kill it after some time.
I do not believe it should run that long. If query Number 2 were smaller, i.e. about 1M users, I could combine the results in Excel in a matter of seconds. So why is it not possible inside the database? What am I doing wrong?
SELECT DISTINCT
r.idUsera as id_usera,
t1.data_insert
FROM
rezerwacje r
INNER JOIN
(
SELECT
m.user_id,
date(m.action_date) as data_insert
FROM
mwids m
WHERE
m.`type` = 'insert'
) t1 ON t1.user_id = r.idUsera
WHERE
r.dataZalozenia between '2016-01-01' and '2016-07-01'
and r.`status` = 'zabookowana'
ORDER BY
id_usera
Try with INNER JOIN.
Query 1 needs
INDEX(status, dataZalozenia, idUsera)
Query 3: Rewrite thus:
If there is only one row in mwids for 'insert' per user:
SELECT r.idUsera as id_usera, DATE(m.action_date) AS data_insert
FROM rezerwacje r
LEFT JOIN mwids m ON m.user_id = r.idUsera
AND m.`type` = 'insert'
WHERE r.dataZalozenia >= '2016-01-01'
AND r.dataZalozenia < '2016-01-01' + INTERVAL 6 MONTH
and r.`status` = 'zabookowana'
ORDER BY r.idUsera
with
INDEX(status, dataZalozenia, idUsera) -- on r
INDEX(type, user_id, action_date) -- on m
If there can be multiple rows, do this:
SELECT r.idUsera as id_usera,
( SELECT DATE(m.action_date)
FROM mwids m
WHERE m.user_id = r.idUsera
AND m.`type` = 'insert'
LIMIT 1
) AS data_insert
FROM rezerwacje r
WHERE r.dataZalozenia >= '2016-01-01'
AND r.dataZalozenia < '2016-01-01' + INTERVAL 6 MONTH
and r.`status` = 'zabookowana'
ORDER BY r.idUsera
But you will be getting a random action_date. So maybe you want MIN() or MAX()?
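If the earliest insert date is what you want, a sketch with MIN() could look like the following. It runs from Python against an in-memory SQLite database as a stand-in for MySQL, and the rows are invented for illustration. DISTINCT is kept from the question so that a user with several bookings appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rezerwacje (idUsera INTEGER, dataZalozenia TEXT, status TEXT);
CREATE TABLE mwids      (user_id INTEGER, type TEXT, action_date TEXT);
INSERT INTO rezerwacje VALUES
    (1, '2016-02-10', 'zabookowana'),
    (1, '2016-03-01', 'zabookowana'),
    (2, '2016-05-20', 'zabookowana'),
    (3, '2016-08-01', 'zabookowana');   -- outside H1 2016
INSERT INTO mwids VALUES
    (1, 'insert', '2014-03-05 08:00:00'),
    (1, 'insert', '2014-01-15 09:30:00'),
    (1, 'update', '2013-01-01 00:00:00'),
    (2, 'insert', '2014-09-09 10:00:00');
""")

# MIN() makes the chosen registration date deterministic when a user
# has more than one 'insert' row.
BUYERS_SQL = """
SELECT DISTINCT
       r.idUsera AS id_usera,
       (SELECT MIN(DATE(m.action_date))
        FROM mwids m
        WHERE m.user_id = r.idUsera
          AND m.type = 'insert') AS data_insert
FROM rezerwacje r
WHERE r.dataZalozenia >= '2016-01-01'
  AND r.dataZalozenia <  '2016-07-01'
  AND r.status = 'zabookowana'
ORDER BY r.idUsera
"""

buyers = conn.execute(BUYERS_SQL).fetchall()
print(buyers)
```

Swap MIN for MAX if the latest insert row is the one that counts as the registration.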

How to merge these results into one row

I have this code below. I am trying to merge the rows based on date.
SELECT
TICKETS.TICKETID,
RECEIPTS.DATENEW,
TAXCATEGORIES.NAME = 'GCT' as GCT,
TAXCATEGORIES.NAME = 'Tax 25%' as Tax25,
TAXLINES.AMOUNT,
SUM(TAXLINES.AMOUNT) AS TOTAL,
SUM(CASE WHEN taxcategories.NAME = 'GCT' THEN taxlines.AMOUNT ELSE 0 END) AS GCTTOTAL,
SUM(CASE WHEN taxcategories.NAME = 'Tax 25%' THEN taxlines.AMOUNT ELSE 0 END) AS TAX25TOTAL
FROM
RECEIPTS,
TAXLINES,
TAXES,
TAXCATEGORIES,
TICKETS,
PAYMENTS
WHERE
PAYMENTS.RECEIPT = RECEIPTS.ID
AND RECEIPTS.ID = TAXLINES.RECEIPT
AND RECEIPTS.ID = TICKETS.ID
AND TAXLINES.TAXID = TAXES.ID
AND TAXES.CATEGORY = TAXCATEGORIES.ID
AND DATENEW >= '2016-07-14 00:00:00' AND DATENEW <= '2016-07-14 23:00:00'
GROUP BY gct, Tax25, CAST(RECEIPTS.DATENEW AS DATE)
The Result of the query is attached in the screenshot below:
Now I need help to merge those rows that have the same date into one row. I am not sure where I am going wrong, I've tried a series of joins but I'm coming up blank.
It appears that the records with the same date actually differ in their values for the GCT and Tax25 columns. If you remove these columns from the GROUP BY clause, and instead aggregate them in the SELECT list, you would be left with a single record for the duplicate dates.
Note that in the query below I have replaced your implicit joins (using a comma-separated list of tables in the FROM clause) with explicit inner joins, with the join criteria in the ON clause. This is the standard way of writing queries now, and it makes them much easier to read. If INNER JOIN is too restrictive, then perhaps you intended to use LEFT JOIN instead.
SELECT TICKETS.TICKETID,
RECEIPTS.DATENEW,
MAX(TAXCATEGORIES.NAME) = 'GCT' as GCT, -- one value per date
MAX(TAXCATEGORIES.NAME) = 'Tax 25%' as Tax25, -- one value per date
TAXLINES.AMOUNT,
SUM(TAXLINES.AMOUNT) AS TOTAL,
SUM(CASE WHEN taxcategories.NAME = 'GCT' THEN taxlines.AMOUNT ELSE 0 END) AS GCTTOTAL,
SUM(CASE WHEN taxcategories.NAME = 'Tax 25%' THEN taxlines.AMOUNT ELSE 0 END) AS TAX25TOTAL
FROM RECEIPTS
INNER JOIN TAXLINES
ON RECEIPTS.ID = TAXLINES.RECEIPT
INNER JOIN TAXES
ON TAXLINES.TAXID = TAXES.ID
INNER JOIN TAXCATEGORIES
ON TAXES.CATEGORY = TAXCATEGORIES.ID
INNER JOIN TICKETS
ON RECEIPTS.ID = TICKETS.ID
INNER JOIN PAYMENTS
ON PAYMENTS.RECEIPT = RECEIPTS.ID
WHERE DATENEW >= '2016-07-14 00:00:00' AND
DATENEW <= '2016-07-14 23:00:00'
GROUP BY CAST(RECEIPTS.DATENEW AS DATE)
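The core of the fix is grouping by the date alone and letting SUM(CASE ...) split the amounts into columns. Here is a stripped-down sketch of just that part, run from Python against an in-memory SQLite database. The single table `tax_lines` and its rows are hypothetical; in the real schema the category name comes from the join through TAXES and TAXCATEGORIES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tax_lines (date_new TEXT, category TEXT, amount REAL);  -- hypothetical
INSERT INTO tax_lines VALUES
    ('2016-07-14 09:00:00', 'GCT',     10.0),
    ('2016-07-14 11:30:00', 'Tax 25%', 25.0),
    ('2016-07-14 15:45:00', 'GCT',      5.0);
""")

# Grouping only by the day gives one row per date; each CASE expression
# routes an amount into its own total column.
PER_DAY_SQL = """
SELECT DATE(date_new) AS day,
       SUM(amount) AS total,
       SUM(CASE WHEN category = 'GCT'     THEN amount ELSE 0 END) AS gct_total,
       SUM(CASE WHEN category = 'Tax 25%' THEN amount ELSE 0 END) AS tax25_total
FROM tax_lines
GROUP BY DATE(date_new)
"""

per_day = conn.execute(PER_DAY_SQL).fetchall()
print(per_day)
```

Note that columns such as TICKETID or AMOUNT that are neither grouped nor aggregated are only accepted by MySQL when ONLY_FULL_GROUP_BY is disabled, and the value shown for them is then arbitrary.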

Help calculating average per day

The daily_average column is always returning zero. The default timestamp values are for the past week. Any thoughts on what I'm doing wrong here in getting the average order value per day?
SELECT
SUM(price+shipping_price) AS total_sales,
COUNT(id) AS total_orders,
AVG(price+shipping_price) AS order_total_average,
(SELECT
SUM(quantity)
FROM `order_product`
INNER JOIN `order` ON (
`order`.id = order_product.order_id AND
`order`.created >= '.$startTimestamp.' AND
`order`.created <= '.$endTimestamp.' AND
`order`.type_id = '.$type->getId().' AND
`order`.fraud = 0
)
) as total_units,
SUM(price+shipping_price)/DATEDIFF('.$endTimestamp.', '.$startTimestamp.') as daily_average
FROM `order`
WHERE created >= '.$startTimestamp.' AND
created <= '.$endTimestamp.' AND
fraud = 0 AND
type_id = '.$type->getId().'
You're using aggregate functions (SUM, COUNT, AVG) without an aggregate command (group by). I think your SQL is more complicated than it needs to be (no need for the inner select).
Here's a SQL command that should work (hard to test without test data ;))
SELECT
COUNT(id) total_orders,
SUM(finalprice) total_sales,
AVG(finalprice) order_average,
SUM(units) total_units,
SUM(finalprice)/DATEDIFF('.$endTimestamp.', '.$startTimestamp.') daily_average
FROM (
SELECT
o.id id,
o.price+o.shipping_price finalprice,
SUM(p.quantity) units
FROM `order` o INNER JOIN order_product p ON p.order_id=o.id
WHERE o.created>='.$startTimestamp.'
AND o.created<='.$endTimestamp.'
AND o.fraud=0
AND o.type_id='.$type->getId().'
GROUP BY p.order_id
) t;
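The reason for the inner SELECT is that each order is first collapsed to one row (its final price plus its unit count), so the price is not counted once per product line when the totals are summed. A minimal sketch of that shape, run from Python against an in-memory SQLite database: the table is called `orders` here because `order` is a reserved word, the rows are made up, and the number of days is passed in as a plain parameter instead of DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, price REAL,
                     shipping_price REAL, fraud INTEGER);   -- hypothetical
CREATE TABLE order_product (order_id INTEGER, quantity INTEGER);
INSERT INTO orders VALUES (1, 100.0, 10.0, 0), (2, 50.0, 5.0, 0);
INSERT INTO order_product VALUES (1, 2), (1, 3), (2, 1);
""")

# Inner query: one row per order. Outer query: totals over those rows.
SUMMARY_SQL = """
SELECT COUNT(id)            AS total_orders,
       SUM(finalprice)      AS total_sales,
       AVG(finalprice)      AS order_average,
       SUM(units)           AS total_units,
       SUM(finalprice) / ?  AS daily_average
FROM (
    SELECT o.id AS id,
           o.price + o.shipping_price AS finalprice,
           SUM(p.quantity) AS units
    FROM orders o
    INNER JOIN order_product p ON p.order_id = o.id
    WHERE o.fraud = 0
    GROUP BY o.id, o.price, o.shipping_price
) t
"""

period_days = 7.0  # assumed length of the reporting period in days
summary = conn.execute(SUMMARY_SQL, (period_days,)).fetchone()
print(summary)
```

Because of the INNER JOIN, an order with no rows in order_product would drop out of the totals. Also worth checking in the original: DATEDIFF expects date or datetime values, so if the two timestamps are Unix integers they may need converting first.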
Does casting one of the elements in the division work for you?
SELECT
SUM(price+shipping_price) AS total_sales,
COUNT(id) AS total_orders,
AVG(price+shipping_price) AS order_total_average,
(SELECT
SUM(quantity)
FROM `order_product`
INNER JOIN `order` ON (
`order`.id = order_product.order_id AND
`order`.created >= '.$startTimestamp.' AND
`order`.created <= '.$endTimestamp.' AND
`order`.type_id = '.$type->getId().' AND
`order`.fraud = 0
)
) as total_units,
CAST(SUM(price+shipping_price) AS DECIMAL(12,2))/DATEDIFF('.$endTimestamp.', '.$startTimestamp.') as daily_average
FROM `order`
WHERE created >= '.$startTimestamp.' AND
created <= '.$endTimestamp.' AND
fraud = 0 AND
type_id = '.$type->getId().'