Looking for optimizations for huge table queries - mysql

I have a table of transactions between multiple parties and want to query it to create visualizations and perform basic accounting.
The table looks like this:
+----+-----------+--------+----------------------------+-------------+------------+-------+-------+
| id |transaction| amount | logged_at | buyer | seller | b_pos | s_pos |
+----+-----------+--------+----------------------------+-------------+------------+-------+-------+
| 1 | 1 | 125000 | 2017-05-28 21:54:53.069000 | store2 | store1 | 4 | 5 |
| 2 | 1 | 109375 | 2017-05-28 21:54:53.069000 | store3 | store2 | 3 | 4 |
| 3 | 1 | 75000 | 2017-05-28 21:54:53.069000 | store4 | store3 | 2 | 3 |
| 4 | 1 | 100000 | 2017-05-28 21:54:53.069000 | store5 | store4 | 1 | 2 |
| 5 | 2 | 50000 | 2017-05-28 21:55:53.069000 | store5 | store3 | 1 | 2 |
b_pos and s_pos are the positions of the buyer and the seller in the chain of transactions between them, so a chain looks like store1 -> store2 -> store3 and so on.
I am trying to do several things with my SQL:
Create all distinct paths between parties for visualization.
Calculate the total amount of goods sold between parties in those paths.
Count the total number of transactions performed in those distinct paths.
Here's my SQL. The only problem is that this query gets pretty slow on a table with over 1 million records (30 sec), and my table could grow to 700 million records. How should I approach this problem?
I can restrict queries to certain time intervals, but it needs to be reasonably fast.
SELECT seller, CONCAT(seller, s_pos - offset) seller_id, buyer,
CONCAT(buyer, b_pos - offset) buyer_id, SUM(amount), cnt as transactions, amount/cnt as ROI
FROM transaction_table
JOIN (SELECT DISTINCT transaction,
CASE
WHEN seller = 'store3' THEN s_pos
WHEN buyer = 'store3' THEN b_pos
END AS offset
FROM
transaction_table
WHERE buyer = 'store3' OR seller = 'store3'
AND logged_at >= '2014-06-23 17:34:20'
AND logged_at <= '2018-06-23 17:34:00'
) ck_offset
ON transaction_table.transaction = ck_offset.transaction
JOIN
(SELECT transaction, count(transaction) as cnt from (select * from transaction_table
WHERE buyer = 'store3' OR seller = 'store3'
AND logged_at >= '2014-06-23 17:34:20'
AND logged_at <= '2018-06-23 17:34:00' group by transaction, logged_at) AS dist_chainkeys
group BY transaction) key_counts
ON key_counts.transaction = ck_offset.transaction
WHERE logged_at >= '2014-06-23 17:34:20'
AND logged_at <= '2018-06-23 17:34:00'
GROUP BY seller, seller_id, buyer, buyer_id;

Does your table use indexes? Indexes on the columns you filter and join on are the usual first step for speeding up a query like this. Have a look at this:
What is an index in SQL?
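As a rough sketch of what that could look like, here is a small Python script that uses SQLite (via the standard sqlite3 module) as a stand-in for MySQL. The index names idx_seller_logged and idx_buyer_logged and the helper plan_for_seller_range are made up for illustration, and the query is a simplified filter on seller plus a logged_at range rather than the full query from the question. In MySQL you would create the same composite indexes and inspect the plan with EXPLAIN instead of EXPLAIN QUERY PLAN.

```python
import sqlite3


def plan_for_seller_range(conn):
    """Return the query-plan detail strings for a seller + time-range filter."""
    rows = conn.execute(
        """
        EXPLAIN QUERY PLAN
        SELECT "transaction", amount
        FROM transaction_table
        WHERE seller = ? AND logged_at >= ? AND logged_at <= ?
        """,
        ("store3", "2014-06-23 17:34:20", "2018-06-23 17:34:00"),
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transaction_table (
        id INTEGER PRIMARY KEY,
        "transaction" INTEGER,
        amount INTEGER,
        logged_at TEXT,
        buyer TEXT,
        seller TEXT,
        b_pos INTEGER,
        s_pos INTEGER
    );
    """
)

# Without any secondary index the filter has to scan the whole table.
plan_before = plan_for_seller_range(conn)

# Composite indexes: equality column first, range column second (assumed names).
conn.executescript(
    """
    CREATE INDEX idx_seller_logged ON transaction_table (seller, logged_at);
    CREATE INDEX idx_buyer_logged ON transaction_table (buyer, logged_at);
    """
)
plan_after = plan_for_seller_range(conn)

print(plan_before)
print(plan_after)
```

Putting the equality column (seller or buyer) before the range column (logged_at) lets one index serve both parts of the filter. Whether this is enough at 700 million rows is something only the real query plan can tell you.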

Related

How to select rows with the latest date and calculate another field based on the row

I have two tables: vehicle and vehicle_maintenance.
vehicle
-----------------------------------
| v_id | v_name | v_no |
-----------------------------------
| 1 | car1 | car123 |
-----------------------------------
| 2 | car2 | car456 |
-----------------------------------
vehicle_maintenance
-----------------------------------------------------------------------
| v_main_id | v_id | v_main_date | v_main_remainder |
-----------------------------------------------------------------------
| 1 | 1 | 2020/10/10 | 1 |
| 2 | 1 | 2020/10/20 | 2 |
| 3 | 2 | 2020/10/04 | 365 |
| 4 | 2 | 2020/10/15 | 5 |
-----------------------------------------------------------------------
I want to get the maintenance details for each car. For example, car2's latest maintenance date is 2020/10/15, and I want to compute the next maintenance date from the v_main_remainder field, so the next maintenance date will be 2020/10/20 (5 days added to the maintenance date). I also want to calculate the number of days left until the next maintenance date: if today is 2020/10/10, it should show 10 days left.
Here is my query:
SELECT
v.v_id,
v.v_name,
v.v_no,
max(vm.v_main_date) as renewal_date,
datediff(
DATE_ADD(
max(vm.v_main_date), INTERVAL +vm.v_main_remainder day
),
now()
) as day_left
FROM vehicle as v, vehicle_maintenance as vm
GROUP BY v.v_id
But the problem is that vm.v_main_remainder in the date_add function is taken from the first row.
Here is the result:
-----------------------------------------------------------------------
| v_id | v_name | v_no | renewal_date | day_left |
-----------------------------------------------------------------------
| 1 | car1 | car123 | 2020/10/20 | 11 |
-----------------------------------------------------------------------
| 2 | car2 | car456 | 2020/10/15 | 370 |
-----------------------------------------------------------------------
As a starter, your query is obviously missing a join condition between the two tables, so that's a cartesian product. This type of problem is much easier to spot when using explicit joins.
Then: you want to filter on the latest maintenance record per car, so aggregation is not appropriate.
One option uses window functions, available in MySQL 8.0:
select v.v_id, v.v_name, v.v_no, vm.v_main_date as renewal_date,
    datediff(vm.v_main_date + interval vm.v_main_remainder day, current_date) as day_left
from vehicle as v
inner join (
    select vm.*, row_number() over(partition by v_id order by v_main_date desc) rn
    from vehicle_maintenance as vm
) as vm on vm.v_id = v.v_id
where vm.rn = 1
Note that I changed now() to current_date, so datediff() works consistently on dates rather than datetimes.
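For anyone who wants to try the idea without a MySQL 8.0 server, here is a minimal sketch of the same "latest row per car" technique in SQLite through Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). The function name days_until_next_maintenance and the today parameter are assumptions for illustration, the dates are stored as ISO strings, and SQLite's date() and julianday() stand in for MySQL's interval arithmetic and datediff().

```python
import sqlite3


def days_until_next_maintenance(conn, today):
    """Latest maintenance row per vehicle plus days left until the next one."""
    return conn.execute(
        """
        SELECT v.v_id, v.v_name, vm.v_main_date AS renewal_date,
               CAST(julianday(date(vm.v_main_date,
                                   '+' || vm.v_main_remainder || ' days'))
                    - julianday(?) AS INTEGER) AS day_left
        FROM vehicle AS v
        JOIN (
            -- rn = 1 marks the most recent maintenance row of each vehicle
            SELECT m.*,
                   ROW_NUMBER() OVER (PARTITION BY v_id
                                      ORDER BY v_main_date DESC) AS rn
            FROM vehicle_maintenance AS m
        ) AS vm ON vm.v_id = v.v_id
        WHERE vm.rn = 1
        ORDER BY v.v_id
        """,
        (today,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vehicle (v_id INTEGER PRIMARY KEY, v_name TEXT, v_no TEXT);
    CREATE TABLE vehicle_maintenance (
        v_main_id INTEGER PRIMARY KEY,
        v_id INTEGER,
        v_main_date TEXT,
        v_main_remainder INTEGER
    );
    INSERT INTO vehicle VALUES (1, 'car1', 'car123'), (2, 'car2', 'car456');
    INSERT INTO vehicle_maintenance VALUES
        (1, 1, '2020-10-10', 1),
        (2, 1, '2020-10-20', 2),
        (3, 2, '2020-10-04', 365),
        (4, 2, '2020-10-15', 5);
    """
)

# "Today" is passed in explicitly so the result is reproducible.
result = days_until_next_maintenance(conn, "2020-10-10")
print(result)
```

With today fixed at 2020-10-10, car2 comes out with 10 days left, which matches the figure given in the question.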

How to select sum of specific id in select query MySQL, Beego

I want to get a result like
result
-------------------------------------------------------
id | uuid | user_id |created_date | amount | name
-------------------------------------------------------
1 | ABC | 1 | 2019/5/1 | 5 | xa
2 | PQR | 2 | 2019/5/5 | 150 | xb
The query that I am trying to use:
SELECT(SELECT SUM(paid_amount) WHERE ID = t1.**HERE**) AS sub1,
(t1.amount - sub1) AS sub2
FROM invoice t1 CROSS JOIN
invoice_paid t2;
Table structure in my DB:
table invoice_paid
------------------------------------
id | uuid | paid_date | paid_amount
------------------------------------
1 | ABC | 2019/5/1 | 15
2 | ABC | 2019/5/5 | 80
table invoice
-------------------------------------------------------
id | uuid | user_id |created_date | amount | name
-------------------------------------------------------
1 | ABC | 1 | 2019/5/1 | 100 | xa
2 | PQR | 2 | 2019/5/5 | 150 | xb
I can use SUM with a single fixed condition such as WHERE id = 1, but how do I combine it with a join inside one SELECT query?
I use beego (golang) and MariaDB.
You can use this query. It JOINs the invoice table to a derived table of SUMs of all the amounts paid per invoice from invoice_paid, subtracting that total from the invoice amount to get the outstanding amount:
SELECT i.id, i.uuid, i.user_id, i.created_date, i.amount - COALESCE(p.amount, 0) AS amount, i.name
FROM invoice i
LEFT JOIN (SELECT uuid, SUM(paid_amount) AS amount
FROM invoice_paid
GROUP BY uuid) p ON p.uuid = i.uuid
ORDER BY i.id
Output:
id uuid user_id created_date amount name
1 ABC 1 2019-05-01 00:00:00 5 xa
2 PQR 2 2019-05-05 00:00:00 150 xb
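If you want to check the derived-table approach locally, here is a small sketch that runs the same kind of query against the sample data using SQLite through Python's sqlite3 module. This is an assumption made for convenience: the question uses MariaDB, and the function name outstanding_amounts is invented for the example.

```python
import sqlite3


def outstanding_amounts(conn):
    """Invoice amount minus the total paid per uuid (0 when nothing was paid)."""
    return conn.execute(
        """
        SELECT i.id, i.uuid, i.amount - COALESCE(p.paid, 0) AS amount
        FROM invoice AS i
        LEFT JOIN (
            SELECT uuid, SUM(paid_amount) AS paid
            FROM invoice_paid
            GROUP BY uuid
        ) AS p ON p.uuid = i.uuid
        ORDER BY i.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY, uuid TEXT, user_id INTEGER,
        created_date TEXT, amount INTEGER, name TEXT
    );
    CREATE TABLE invoice_paid (
        id INTEGER PRIMARY KEY, uuid TEXT, paid_date TEXT, paid_amount INTEGER
    );
    INSERT INTO invoice VALUES
        (1, 'ABC', 1, '2019-05-01', 100, 'xa'),
        (2, 'PQR', 2, '2019-05-05', 150, 'xb');
    INSERT INTO invoice_paid VALUES
        (1, 'ABC', '2019-05-01', 15),
        (2, 'ABC', '2019-05-05', 80);
    """
)

remaining = outstanding_amounts(conn)
print(remaining)
```

The LEFT JOIN together with COALESCE is what keeps invoice PQR in the result even though it has no payments yet.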

select data from multiple tables in one mysql query

I have three tables
1 Policies
| date | policy_No | client_no | premium | policy_type |
| 2019-01-23 | 10002 | 1570 | 4000 | New policy |
| 2019-03-15 | 10003 | 1570 | 16000 | Renewal policy|
2 Endorsements
|date |client_no | policy_no| premium| endorsement_type|
|2019-02-17 | 1570 | 10002 | 2000 | Debit
|2019-03-17 | 1570 | 10003 | -4000 | Credit
3 Payment
| date | client_id | policy_no| amount|
| 2019-03-16| 1570 | 10003 | 10000 |
expected result
| date | type | amount|
| 2019-01-23 | New Policy | 4000 |
| 2019-02-17 | Debit endorsement | 2000 |
| 2019-03-15 | Renewal policy | 16000 |
| 2019-03-16 | Payment | 10000 |
| 2019-03-17 | Credit endorsement| -4000 |
How do I achieve this in one MySQL query?
We can try using a union query here:
SELECT date, policy_type AS type, premium AS amount FROM Policies
UNION ALL
SELECT
date,
CASE WHEN endorsement_type = 'Debit' THEN 'Debit endorsement'
WHEN endorsement_type = 'Credit' THEN 'Credit endorsement' END,
premium
FROM Endorsements
UNION ALL
SELECT date, 'Payment', amount FROM Payment
ORDER BY date;
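Here is a runnable sketch of that union against the sample rows, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; the function name combined_ledger is also invented for the example).

```python
import sqlite3


def combined_ledger(conn):
    """Stack the three tables into one (date, type, amount) list ordered by date."""
    return conn.execute(
        """
        SELECT date, policy_type AS type, premium AS amount FROM Policies
        UNION ALL
        SELECT date,
               CASE WHEN endorsement_type = 'Debit' THEN 'Debit endorsement'
                    WHEN endorsement_type = 'Credit' THEN 'Credit endorsement'
               END,
               premium
        FROM Endorsements
        UNION ALL
        SELECT date, 'Payment', amount FROM Payment
        ORDER BY date
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Policies (
        date TEXT, policy_no INTEGER, client_no INTEGER,
        premium INTEGER, policy_type TEXT
    );
    CREATE TABLE Endorsements (
        date TEXT, client_no INTEGER, policy_no INTEGER,
        premium INTEGER, endorsement_type TEXT
    );
    CREATE TABLE Payment (
        date TEXT, client_id INTEGER, policy_no INTEGER, amount INTEGER
    );
    INSERT INTO Policies VALUES
        ('2019-01-23', 10002, 1570, 4000, 'New policy'),
        ('2019-03-15', 10003, 1570, 16000, 'Renewal policy');
    INSERT INTO Endorsements VALUES
        ('2019-02-17', 1570, 10002, 2000, 'Debit'),
        ('2019-03-17', 1570, 10003, -4000, 'Credit');
    INSERT INTO Payment VALUES ('2019-03-16', 1570, 10003, 10000);
    """
)

ledger = combined_ledger(conn)
for row in ledger:
    print(row)
```

UNION ALL is used rather than UNION because the rows come from different sources and should never be de-duplicated against each other.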
Your tables have two fields in common (the client number and the policy number), and it's not easy to tell which one serves as the primary key or foreign key. However, to select fields from different tables, you can use LEFT JOIN, RIGHT JOIN or INNER JOIN (MySQL itself has no FULL OUTER JOIN).
You should read this post from W3schools to decide which is the best option for you.
The typical syntax looks like this:
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
Hopefully all three tables share a foreign key, in which case we can also use a JOIN, as in the code below:
SELECT * FROM Policies PS
INNER JOIN Endorsements E ON E.policy_no = PS.policy_no
INNER JOIN Payment P ON P.client_id = PS.client_no
ORDER BY PS.date ASC
Thanks,
Try this:
SELECT date, policy_type type, premium amount
FROM Policies
UNION ALL
SELECT date,
CASE WHEN endorsement_type = 'Debit' THEN 'Debit endorsement'
WHEN endorsement_type = 'Credit' THEN 'Credit endorsement'
END type,
premium amount
FROM Endorsements
UNION ALL
SELECT date, 'Payment', amount FROM Payment

Get the balance of my users in the same table

Help please, I have a table like this:
| ID | userId | amount | type |
-------------------------------------
| 1 | 10 | 10 | expense |
| 2 | 10 | 22 | income |
| 3 | 3 | 25 | expense |
| 4 | 3 | 40 | expense |
| 5 | 3 | 63 | income |
I'm looking for a way to use one query and retrieve the balance of each user.
The hard part is that the amounts have to be added for incomes and subtracted for expenses.
This would be the result table:
| userId | balance |
--------------------
| 10 | 12 |
| 3 | -2 |
You need to get the totals of income and expense separately using subqueries, then join them so you can subtract the expenses from the income:
SELECT a.UserID,
(b.totalIncome - a.totalExpense) `balance`
FROM
(
SELECT userID, SUM(amount) totalExpense
FROM myTable
WHERE type = 'expense'
GROUP BY userID
) a INNER JOIN
(
SELECT userID, SUM(amount) totalIncome
FROM myTable
WHERE type = 'income'
GROUP BY userID
) b on a.userID = b.userid
This is easiest to do with a single group by:
select userId,
    sum(case when type = 'income' then amount else - amount end) as balance
from t
group by userId
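To see the single group by in action on the sample rows, here is a minimal sketch using SQLite through Python's sqlite3 module. SQLite instead of MySQL, the table name t and the function name user_balances are all assumptions for illustration.

```python
import sqlite3


def user_balances(conn):
    """Per-user balance: incomes count as positive, everything else as negative."""
    return conn.execute(
        """
        SELECT userId,
               SUM(CASE WHEN type = 'income' THEN amount ELSE -amount END) AS balance
        FROM t
        GROUP BY userId
        ORDER BY userId
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (ID INTEGER PRIMARY KEY, userId INTEGER, amount INTEGER, type TEXT);
    INSERT INTO t VALUES
        (1, 10, 10, 'expense'),
        (2, 10, 22, 'income'),
        (3, 3, 25, 'expense'),
        (4, 3, 40, 'expense'),
        (5, 3, 63, 'income');
    """
)

balances = user_balances(conn)
print(balances)
```

Unlike an inner join of two subqueries, this form also keeps users who have only incomes or only expenses, since every row contributes to its own group.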
You could have 2 sub-queries, each grouped by id: one sums the incomes, the other the expenses. Then you could join these together, so that each row had an id, the sum of the expenses and the sum of the income(s), from which you can easily compute the balance.

Doing complex ordering with a MySQL query

I'm having trouble with a MySQL query that requires some "complex" ordering.
I have 2 tables:
Training
+--------------+------------------+
| training_id | training_name |
+--------------+------------------+
| 1 | test1 |
| 2 | test2 |
| 3 | test3 |
+--------------+------------------+
Training_venue
+----------+--------------+------------+
| venue_id | training_id | venue_date |
+----------+--------------+------------+
| 1 | 2 | 2009-06-01 |
| 2 | 2 | 2012-06-01 |
| 3 | 2 | 2011-06-01 |
| 4 | 1 | 2009-09-01 |
| 5 | 1 | 2011-09-01 |
| 6 | 1 | 2012-09-01 |
| 7 | 3 | 2009-01-01 |
+----------+--------------+------------+
And I'm expecting the following results:
+--------------+------------------+------------+--------------+
| training_id | training_name | venue_id | venue_date |
+--------------+------------------+------------+--------------+
| 2 | test2 | 2 | 2011-06-01 |
| 2 | test2 | 3 | 2012-06-01 |
| 1 | test1 | 6 | 2011-09-01 |
| 1 | test1 | 5 | 2012-09-01 |
+--------------+------------------+------------+--------------+
As you can see, the requirements for the result are:
A training with no future venue is discarded.
Expired venues are discarded.
The trainings are "grouped" together.
The training with the soonest venue is first; the training with the "latest soonest venue" is last.
Inside each training, the venues are ordered from the soonest to the latest.
What MySQL query will return that result set?
Edit:
Here's what I've tried so far:
SELECT *
FROM `training` AS t
LEFT JOIN `training_venue` AS v USING ( `training_id` )
WHERE `venue_date` >= NOW()
ORDER BY `training_id`;
The ORDER BY training_id takes care of keeping each training's rows "grouped" together, but it doesn't let me order the trainings from the one with the soonest venue to the one with the latest venue.
I also edited the data table to illustrate the problem. See how the results are ordered: not by training_id, but by soonest venue.
Edit:
Corrected the dates.
SELECT t1.training_id, t1.training_name, t2.venue_id, t2.venue_date
FROM Training t1
INNER JOIN Training_venue t2 ON t1.training_id = t2.training_id
WHERE t2.venue_date >= NOW()
ORDER BY t1.training_id ASC, t2.venue_date ASC
I don't know what your last point is getting at: "Inside the training, the venues are ordered from the soonest to the latest." But the above query seems to match the rest of your needs.
EDIT: I now sort of understand better what you are after. And it is a tad complicated I think. I'll have another think about it.
EDIT: I think I have it!
SELECT t1.training_id, t1.training_name, t2.venue_id, t2.venue_date
FROM Training t1
INNER JOIN (SELECT training_id, MIN(venue_date) AS first_venue
            FROM Training_venue
            WHERE venue_date >= NOW()
            GROUP BY training_id) t3 ON t1.training_id = t3.training_id
INNER JOIN Training_venue t2 ON t1.training_id = t2.training_id
WHERE t2.venue_date >= NOW()
ORDER BY t3.first_venue ASC, t1.training_id ASC, t2.venue_date ASC
Try it!
EDIT: Was using '2010-01-01' instead of NOW() as NOW() would lose the 2010 dates that you seemed to want included.
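For reference, here is a small sketch of the "order groups by their earliest upcoming date" idea that can be run against the sample data. It uses SQLite through Python's sqlite3 module instead of MySQL, and it passes a fixed cutoff date ('2010-01-01') in place of NOW() so the result is reproducible. The function name upcoming_venues and the alias first_venue are made up for the example.

```python
import sqlite3


def upcoming_venues(conn, cutoff):
    """Future venues, trainings ordered by their earliest future venue date."""
    return conn.execute(
        """
        SELECT t.training_id, t.training_name, v.venue_id, v.venue_date
        FROM training AS t
        JOIN training_venue AS v ON v.training_id = t.training_id
        JOIN (
            -- earliest non-expired venue per training
            SELECT training_id, MIN(venue_date) AS first_venue
            FROM training_venue
            WHERE venue_date >= ?
            GROUP BY training_id
        ) AS f ON f.training_id = t.training_id
        WHERE v.venue_date >= ?
        ORDER BY f.first_venue ASC, t.training_id ASC, v.venue_date ASC
        """,
        (cutoff, cutoff),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE training (training_id INTEGER PRIMARY KEY, training_name TEXT);
    CREATE TABLE training_venue (
        venue_id INTEGER PRIMARY KEY, training_id INTEGER, venue_date TEXT
    );
    INSERT INTO training VALUES (1, 'test1'), (2, 'test2'), (3, 'test3');
    INSERT INTO training_venue VALUES
        (1, 2, '2009-06-01'), (2, 2, '2012-06-01'), (3, 2, '2011-06-01'),
        (4, 1, '2009-09-01'), (5, 1, '2011-09-01'), (6, 1, '2012-09-01'),
        (7, 3, '2009-01-01');
    """
)

rows = upcoming_venues(conn, "2010-01-01")
for row in rows:
    print(row)
```

The training_id term in the ORDER BY is only a tie-breaker: it keeps two trainings that share the same earliest date from being interleaved.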