I am working with the tpch database and have a query that I want to make run faster.
I tried adding indexes and views, but they did not improve performance. Could someone please provide suggestions? Thanks.
Connection:
conn = mysql.connect(host = 'relational.fit.cvut.cz', port = int(3306), user = 'guest', passwd = 'relational', db = 'tpch')
Query:
WITH customer_lifetime_value AS (
SELECT
c_custkey,
c_name,
c_address,
c_nationkey,
c_phone,
c_acctbal,
c_mktsegment,
c_comment,
SUM(o_totalprice) AS ltv
FROM customer
JOIN orders
ON o_custkey = c_custkey
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8
)
SELECT
r_name,
MAX(ltv) AS best_customer_value
FROM region
JOIN nation
ON n_regionkey = r_regionkey
JOIN customer_lifetime_value clv
ON clv.c_nationkey = n_nationkey
GROUP BY 1;
Could you try this? It should read less data and give you the same output:
WITH customer_lifetime_value AS
(
SELECT o_custkey
,SUM(o_totalprice) AS ltv
FROM orders
GROUP BY o_custkey
)
SELECT
r_name,
MAX(ltv) AS best_customer_value
FROM customer_lifetime_value
JOIN customer
ON o_custkey = c_custkey
JOIN nation
ON c_nationkey = n_nationkey
JOIN region
ON n_regionkey = r_regionkey
GROUP BY r_name
If it is correct, you can create simple indexes:
on orders table including only the o_custkey and o_totalprice
on customer table including only c_custkey and c_nationkey
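To check that aggregating orders first gives the same output as the original query, here is a minimal sketch using Python's built-in sqlite3 module. The rows, the in-memory database, the index names and the trimmed-down column list are all assumptions for illustration; this is not the real tpch data, and timings on MySQL will differ.

```python
import sqlite3

# Tiny made-up stand-in for the tpch tables (not the real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region   (r_regionkey INTEGER PRIMARY KEY, r_name TEXT);
CREATE TABLE nation   (n_nationkey INTEGER PRIMARY KEY, n_regionkey INTEGER);
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT, c_nationkey INTEGER);
CREATE TABLE orders   (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER, o_totalprice REAL);

INSERT INTO region   VALUES (1, 'EUROPE'), (2, 'ASIA');
INSERT INTO nation   VALUES (10, 1), (20, 2);
INSERT INTO customer VALUES (100, 'a', 10), (101, 'b', 10), (102, 'c', 20);
INSERT INTO orders   VALUES (1, 100, 50.0), (2, 100, 25.0), (3, 101, 60.0), (4, 102, 10.0);

-- The two narrow indexes suggested above (index names are hypothetical).
CREATE INDEX idx_orders_cust_price   ON orders (o_custkey, o_totalprice);
CREATE INDEX idx_customer_key_nation ON customer (c_custkey, c_nationkey);
""")

# Shape of the original query: join first, then group by the customer columns.
original = conn.execute("""
WITH customer_lifetime_value AS (
    SELECT c_custkey, c_name, c_nationkey, SUM(o_totalprice) AS ltv
    FROM customer
    JOIN orders ON o_custkey = c_custkey
    GROUP BY c_custkey, c_name, c_nationkey
)
SELECT r_name, MAX(ltv) AS best_customer_value
FROM region
JOIN nation ON n_regionkey = r_regionkey
JOIN customer_lifetime_value clv ON clv.c_nationkey = n_nationkey
GROUP BY r_name
ORDER BY r_name
""").fetchall()

# Shape of the rewrite: aggregate orders alone, then join the small result.
rewritten = conn.execute("""
WITH customer_lifetime_value AS (
    SELECT o_custkey, SUM(o_totalprice) AS ltv
    FROM orders
    GROUP BY o_custkey
)
SELECT r_name, MAX(ltv) AS best_customer_value
FROM customer_lifetime_value
JOIN customer ON o_custkey = c_custkey
JOIN nation ON c_nationkey = n_nationkey
JOIN region ON n_regionkey = r_regionkey
GROUP BY r_name
ORDER BY r_name
""").fetchall()

print(original)
print(rewritten)
```

Both queries return the same rows on this toy data. The rewrite only has to group on one integer column instead of eight customer columns, which is where the saving is expected to come from.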
My tables are
TRANSACTION TABLE
transaction_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
customer_id INT,
inventory_id INT,
kiosk_id INT,
rental_out DATETIME,
rental_proposal INT,
rental_due DATETIME,
rental_cost FLOAT,
rental_in DATETIME,
rental_period INT,
rental_past_due INT,
late_fee INT
INVENTORY TABLE
inventory_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
title_id INT,
title_name VARCHAR(255),
genre_id INT,
genre_name VARCHAR(255),
qty INT
I'm trying to write a query that uses the inventory_id from the transaction table to count the transactions for each genre. My current query finds the number of transactions for a genre, but only for one genre at a time.
SELECT COUNT(genre_id)
FROM inventory
INNER JOIN transactions
ON inventory.title_id = transactions.inventory_id
WHERE transactions.customer_id = 1 and inventory.genre_id = 1;
I'd like to figure out a way to join the table multiple times to display the number of times each genre has been rented. The currently existing genres are 1, 2, 3, 4, 5, 6.
So far, I've come up with these queries, but I don't see a logical way to the solution.
SELECT COUNT(A.genre_id) as GENRE_A, COUNT(A.genre_id) as GENRE_B, COUNT(A.genre_id) as GENRE_C FROM inventory A
INNER JOIN transactions D ON A.title_id = D.inventory_id
INNER JOIN transactions E ON A.title_id = E.inventory_id
INNER JOIN transactions F ON A.title_id = F.inventory_id
WHERE A.genre_id = 1 AND A.genre_id = 2 and A.genre_id = 3;
SELECT COUNT(A.inventory_id), COUNT(B.inventory_id), COUNT(C.inventory_id) FROM transactions A, transactions B, transactions C
INNER JOIN inventory D ON A.inventory_id = D.title_id
INNER JOIN inventory E ON A.inventory_id = E.title_id
INNER JOIN inventory F ON A.inventory_id = F.title_id
WHERE A.genre_id = 1 AND B.genre_id = 2 and C.genre_id = 3;
I've tried multiple variations, some of which I've deleted and haven't posted, but I can't seem to figure it out. Is there any solution? Any help would be greatly appreciated. Thank you!
Just use conditional aggregation.
Only 1 join needed.
This will count the transactions for the 3 genres
SELECT
-- trans.customer_id,
COUNT(CASE WHEN inv.genre_id = 1 THEN trans.transaction_id END) AS genre1,
COUNT(CASE WHEN inv.genre_id = 2 THEN trans.transaction_id END) AS genre2,
COUNT(CASE WHEN inv.genre_id = 3 THEN trans.transaction_id END) AS genre3
FROM transactions trans
JOIN inventory inv ON inv.inventory_id = trans.inventory_id
WHERE inv.genre_id IN (1, 2, 3)
-- GROUP BY trans.customer_id
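As a quick illustration of conditional aggregation, here is a small sketch run through Python's sqlite3 module with the customer_id lines uncommented. The sample rows are invented, and the tables are cut down to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, genre_id INTEGER);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id    INTEGER,
    inventory_id   INTEGER
);
-- Invented sample rows.
INSERT INTO inventory    VALUES (1, 1), (2, 2), (3, 3), (4, 1);
INSERT INTO transactions VALUES (1, 1, 1), (2, 1, 4), (3, 1, 2), (4, 2, 3), (5, 2, 1);
""")

# CASE yields NULL when the genre does not match, and COUNT skips NULLs,
# so each column counts only the transactions of its own genre.
rows = conn.execute("""
SELECT trans.customer_id,
       COUNT(CASE WHEN inv.genre_id = 1 THEN trans.transaction_id END) AS genre1,
       COUNT(CASE WHEN inv.genre_id = 2 THEN trans.transaction_id END) AS genre2,
       COUNT(CASE WHEN inv.genre_id = 3 THEN trans.transaction_id END) AS genre3
FROM transactions trans
JOIN inventory inv ON inv.inventory_id = trans.inventory_id
WHERE inv.genre_id IN (1, 2, 3)
GROUP BY trans.customer_id
ORDER BY trans.customer_id
""").fetchall()

for row in rows:
    print(row)
```

Each output row is (customer_id, genre1, genre2, genre3), one row per customer, with a single pass over the join.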
Why not put the results on multiple rows?
SELECT genre_id, COUNT(*)
FROM inventory i INNER JOIN
transactions t
ON i.title_id = t.inventory_id
WHERE t.customer_id = 1 and i.genre_id IN (1, 2, 3)
GROUP BY genre_id;
I have a MySQL query I'm working on that pulls data from multiple joins.
select students.studentID, students.firstName, students.lastName, userAccounts.userID, userstudentrelationship.userID, userstudentrelationship.studentID, userAccounts.getTexts, reports.pupID, contacts.pfirstName, contacts.plastName, reports.timestamp
from userstudentrelationship
join userAccounts on (userstudentrelationship.userID = userAccounts.userID)
join students on (userstudentrelationship.studentID = students.studentID)
join reports on (students.studentID = reports.studentID)
join contacts on (reports.pupID = contacts.pupID)
where userstudentrelationship.studentID = "10000005" AND userAccounts.getTexts = 1 ORDER BY reports.timestamp DESC LIMIT 1
I have a unique situation where I would like one of the joins (the reports join) to be limited to the latest result only for that table (order by reports.timestamp desc limit 1 is what I use), while not limiting the result quantities for the overall query.
By running the above query I get the data I would expect, but only one record when it should return several.
My question:
How can I modify this query to ensure that I receive all possible records available, while ensuring that only the latest record from the reports join is used? I expect that each record may contain different data from the other joins, but all records returned by this query will share the same report record.
Provided I understand the issue, one could add a join to a derived table (aliased Z below) that holds the max timestamp for each student, thereby limiting the result to one report record (the most recent) for each student.
SELECT students.studentID
, students.firstName
, students.lastName
, userAccounts.userID
, userstudentrelationship.userID
, userstudentrelationship.studentID
, userAccounts.getTexts
, reports.pupID
, contacts.pfirstName
, contacts.plastName
, reports.timestamp
FROM userstudentrelationship
join userAccounts
on userstudentrelationship.userID = userAccounts.userID
join students
on userstudentrelationship.studentID = students.studentID
join reports
on students.studentID = reports.studentID
join contacts
on reports.pupID = contacts.pupID
join (SELECT max(timestamp) mts, studentID
FROM reports
GROUP BY studentID) Z
on reports.studentID = Z.studentID
and reports.timestamp = Z.mts
WHERE userstudentrelationship.studentID = "10000005"
AND userAccounts.getTexts = 1
ORDER BY reports.timestamp
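The "join to the max timestamp per student" idea can be tried on a toy dataset. This is only a sketch: the tables are reduced to a few columns, the rows are invented, and it runs on SQLite through Python rather than on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userstudentrelationship (userID INTEGER, studentID INTEGER);
CREATE TABLE reports (studentID INTEGER, pupID INTEGER, timestamp INTEGER);

-- Invented rows: student 5 is linked to two users and has two reports.
INSERT INTO userstudentrelationship VALUES (1, 5), (2, 5), (3, 6);
INSERT INTO reports VALUES (5, 7, 100), (5, 8, 200), (6, 9, 150);
""")

# z holds one row per student with that student's newest timestamp;
# joining reports to z keeps only the newest report per student.
rows = conn.execute("""
SELECT usr.userID, r.studentID, r.pupID, r.timestamp
FROM userstudentrelationship usr
JOIN reports r ON r.studentID = usr.studentID
JOIN (SELECT studentID, MAX(timestamp) AS mts
      FROM reports
      GROUP BY studentID) z
  ON z.studentID = r.studentID
 AND z.mts = r.timestamp
WHERE usr.studentID = 5
ORDER BY usr.userID
""").fetchall()

for row in rows:
    print(row)
```

Both users linked to student 5 come back, and each row carries the same (latest) report, so the overall row count is not limited. If two reports for a student share the same maximum timestamp, both would be returned.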
To get all the records, drop the LIMIT 1 at the end of the query.
To join only one row from the reports table, you could use a subquery that picks the latest report for that student, such as:
select
students.studentID
, students.firstName
, students.lastName
, userAccounts.userID
, userstudentrelationship.userID
, userstudentrelationship.studentID
, userAccounts.getTexts
, t.pupID
, contacts.pfirstName
, contacts.plastName
, t.timestamp
from userstudentrelationship
join userAccounts on userstudentrelationship.userID = userAccounts.userID
join students on userstudentrelationship.studentID = students.studentID
join (
select * from reports
where reports.studentID = "10000005"
order by reports.timestamp desc limit 1
) t on students.studentID = t.studentID
join contacts on t.pupID = contacts.pupID
where userstudentrelationship.studentID = "10000005"
AND userAccounts.getTexts = 1
I have this query:
SELECT `assemblies`.`id`,
`assemblies`.`type`,
`assemblies`.`champion`,
`assemblies`.`name`,
`assemblies`.`author`,
`assemblies`.`githublastmod`,
( assemblies.forum IS NOT NULL ) AS forumExists,
Count(votes.id) AS votesCount,
Count(install_clicks.id) AS installCount,
Count(github_clicks.id) AS githubCount,
Count(forum_clicks.id) AS forumCount
FROM `assemblies`
INNER JOIN `votes`
ON `votes`.`assembly` = `assemblies`.`id`
INNER JOIN `install_clicks`
ON `install_clicks`.`assembly` = `assemblies`.`id`
INNER JOIN `github_clicks`
ON `github_clicks`.`assembly` = `assemblies`.`id`
INNER JOIN `forum_clicks`
ON `forum_clicks`.`assembly` = `assemblies`.`id`
WHERE `assemblies`.`type` = 'utility'
AND Unix_timestamp(Date(assemblies.githublastmod)) > '1419536536'
GROUP BY `assemblies`.`id`
ORDER BY `votescount` DESC,
`githublastmod` DESC
For some reason this query is very slow. I'm using the MyISAM database engine. I hope someone can help me out here :)
I believe this is a case where making the subqueries for the counts will make it run a lot faster (and the values will be correct).
The problem with the original query is the explosion of the number of intermediate rows: For each 'assembly', there were n1 votes, n2 installs, etc. That led to n1*n2*... rows per assembly.
SELECT `assemblies`.`id`, `assemblies`.`type`, `assemblies`.`champion`,
`assemblies`.`name`, `assemblies`.`author`, `assemblies`.`githublastmod`,
( assemblies.forum IS NOT NULL ) AS forumExists,
( SELECT Count(*)
FROM votes
WHERE `assembly` = `assemblies`.`id`
) AS votesCount,
( SELECT Count(*)
FROM install_clicks
WHERE `assembly` = `assemblies`.`id`
) AS installCount,
( SELECT Count(*)
FROM github_clicks
WHERE `assembly` = `assemblies`.`id`
) AS githubCount,
( SELECT Count(*)
FROM forum_clicks
WHERE `assembly` = `assemblies`.`id`
) AS forumCount
FROM `assemblies`
WHERE `assemblies`.`type` = 'utility'
AND Unix_timestamp(Date(assemblies.githublastmod)) > '1419536536'
ORDER BY `votescount` DESC, `githublastmod` DESC
Each secondary table needs an INDEX starting with assembly.
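To see the row explosion concretely, here is a small sketch with one assembly that has 2 votes and 3 install clicks. The data is invented, only two of the four secondary tables are included, and it runs on SQLite via Python rather than MyISAM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assemblies     (id INTEGER PRIMARY KEY);
CREATE TABLE votes          (id INTEGER PRIMARY KEY, assembly INTEGER);
CREATE TABLE install_clicks (id INTEGER PRIMARY KEY, assembly INTEGER);

-- Invented data: assembly 1 has 2 votes and 3 install clicks.
INSERT INTO assemblies     VALUES (1);
INSERT INTO votes          VALUES (1, 1), (2, 1);
INSERT INTO install_clicks VALUES (1, 1), (2, 1), (3, 1);

CREATE INDEX idx_votes_assembly   ON votes (assembly);
CREATE INDEX idx_install_assembly ON install_clicks (assembly);
""")

# Joining both child tables produces 2 * 3 = 6 rows for the assembly,
# so both COUNTs report 6.
joined = conn.execute("""
SELECT a.id, COUNT(v.id) AS votesCount, COUNT(i.id) AS installCount
FROM assemblies a
JOIN votes v          ON v.assembly = a.id
JOIN install_clicks i ON i.assembly = a.id
GROUP BY a.id
""").fetchall()

# Correlated subqueries count each child table independently.
subqueries = conn.execute("""
SELECT a.id,
       (SELECT COUNT(*) FROM votes          WHERE assembly = a.id) AS votesCount,
       (SELECT COUNT(*) FROM install_clicks WHERE assembly = a.id) AS installCount
FROM assemblies a
""").fetchall()

print(joined)
print(subqueries)
```

The joined version reports 6 and 6, while the subquery version reports the real 2 and 3. With four child tables the intermediate row count multiplies the same way, which is where the time goes.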
Your problem should be fixed using the right indices:
CREATE INDEX index_name_1 ON `votes`(`assembly`);
CREATE INDEX index_name_2 ON `install_clicks`(`assembly`);
CREATE INDEX index_name_3 ON `github_clicks`(`assembly`);
CREATE INDEX index_name_4 ON `forum_clicks`(`assembly`);
Try your query again after creating these indices and it should be noticeably faster.
I have been struggling with this for a while now so maybe someone can shed some insight.
We have a practice query that comes from a TVShow database that for this problem has 4 tables.
This is the query: Sponsors that sponsor all tv shows by ABC
What I have tried so far is this but it doesn't seem to be working:
SELECT DISTINCT RSPONSOR.SPONSOR_NAME
FROM RSPONSOR
WHERE NOT EXISTS (
SELECT *
FROM RTVSHOW
WHERE NOT EXISTS (
SELECT *
FROM RSPONSORBY
WHERE RSPONSOR.SPONSOR_NAME = RSPONSORBY.SPONSOR_NAME
AND RSPONSORBY.SHOW_NUM = RTVSHOW.SHOW_NUM
AND RTVSHOW.NETWORK_ID = 'ABC'
)
);
Would love any help! Thanks in advance.
Here are the tables for reference
--RTVSHOW--
SHOW_NUM NUMBER
SHOW_NAME VARCHAR2(20 BYTE)
START_MONTH NUMBER
START_YEAR NUMBER
END_MONTH NUMBER
END_YEAR NUMBER
NETWORK_ID VARCHAR2(20 BYTE)
DISTR_NAME VARCHAR2(20 BYTE)
--RSPONSOR--
SPONSOR_NAME
PARENT_NAME
--RSPONSORBY--
SHOW_NUM
SPONSOR_NAME
--RNETWORK--
NETWORK_ID
NETWORK_HQ
PARENT_NAME
select rSponsor.sponsor_name, rSponsor.parent_name
from rSponsor
join rSponsorBy
on rSponsor.sponsor_name = rSponsorBy.sponsor_name
join rTVShow
on rSponsorBy.show_num = rTVShow.show_num
join rNetwork
on rTVShow.network_id = rNetwork.network_id
where rNetwork.network_id = 'ABC'
group by rSponsor.sponsor_name, rSponsor.parent_name
having count(distinct rTVShow.show_num) = --ABC shows sponsored by this sponsor
(
select count(distinct rTVShow.show_num) --ABC shows
from rTVShow
join rNetwork
on rTVShow.network_id = rNetwork.network_id
where
rNetwork.network_id = 'ABC'
);
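For comparison, the double NOT EXISTS form from the question can also be made to work. In the attempt above, the NETWORK_ID = 'ABC' test sits in the innermost subquery, so every non-ABC show counts as "not sponsored" and every sponsor is rejected as soon as any non-ABC show exists. Moving the test to the middle query fixes that. The sketch below, with invented rows and only the columns needed, runs both forms on SQLite through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rtvshow    (show_num INTEGER PRIMARY KEY, network_id TEXT);
CREATE TABLE rsponsor   (sponsor_name TEXT PRIMARY KEY);
CREATE TABLE rsponsorby (show_num INTEGER, sponsor_name TEXT);

-- Invented data: shows 1 and 2 are on ABC, show 3 is not.
INSERT INTO rtvshow    VALUES (1, 'ABC'), (2, 'ABC'), (3, 'NBC');
INSERT INTO rsponsor   VALUES ('Acme'), ('Bolt'), ('Core');
INSERT INTO rsponsorby VALUES (1, 'Acme'), (2, 'Acme'),
                              (1, 'Bolt'), (3, 'Bolt'),
                              (1, 'Core'), (2, 'Core'), (3, 'Core');
""")

# "There is no ABC show that this sponsor does not sponsor."
not_exists = conn.execute("""
SELECT s.sponsor_name
FROM rsponsor s
WHERE NOT EXISTS (
    SELECT 1
    FROM rtvshow t
    WHERE t.network_id = 'ABC'      -- network filter belongs here
      AND NOT EXISTS (
          SELECT 1
          FROM rsponsorby sb
          WHERE sb.sponsor_name = s.sponsor_name
            AND sb.show_num = t.show_num
      )
)
ORDER BY s.sponsor_name
""").fetchall()

# Counting form: sponsored ABC shows must equal the total number of ABC shows.
having_count = conn.execute("""
SELECT sb.sponsor_name
FROM rsponsorby sb
JOIN rtvshow t ON t.show_num = sb.show_num
WHERE t.network_id = 'ABC'
GROUP BY sb.sponsor_name
HAVING COUNT(DISTINCT t.show_num) =
       (SELECT COUNT(*) FROM rtvshow WHERE network_id = 'ABC')
ORDER BY sb.sponsor_name
""").fetchall()

print(not_exists)
print(having_count)
```

On this data both forms return Acme and Core. One difference to keep in mind: if there are no ABC shows at all, the NOT EXISTS form returns every sponsor, while the counting form returns none.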
I tried to simplify my question to a basic example I wrote down below, the actual problem is much more complex so the below queries might not make much sense but the basic concepts are the same (data from one query as argument to another).
Query 1:
SELECT Ping.ID as PingID, Base.ID as BaseID FROM
(SELECT l.ID, mg.DateTime from list l
JOIN mygroup mg ON mg.ID = l.MyGroup
WHERE l.Type = "ping"
ORDER BY l.ID DESC
) Ping
INNER JOIN
(SELECT l.ID, mg.DateTime from list l
JOIN mygroup mg ON mg.ID = l.MyGroup
WHERE l.Type = "Base"
ORDER BY l.ID DESC
) Base
ON Base.DateTime < Ping.DateTime
GROUP BY Ping.ID
ORDER BY Ping.ID DESC;
+--------+--------+
| PingID | BaseID |
+--------+--------+
| 11 | 10 |
| 9 | 8 |
| 7 | 6 |
| 5 | 3 |
| 4 | 3 |
+--------+--------+
// In Query 2 below, I need to replace 11 with the PingID above and 10 with the BaseID above, then show the result as a third column above (0 if no results, 1 if results)
Query 2:
SELECT * FROM
(SELECT sl.Data FROM list l
JOIN sublist sl ON sl.ParentID = l.ID
WHERE l.Type = "ping" AND l.ID = 11) Ping
INNER JOIN
(SELECT sl.Data FROM list l
JOIN sublist sl ON sl.ParentID = l.ID
WHERE l.Type = "base" AND l.ID = 10) Base
ON Base.Data < Ping.Data;
How can I do this? Again I'm not sure what kind of advice I will receive but please understand that the Query 2 is in reality over 200 lines and I basically can't touch it so I don't have so much flexibility as I'd like and ideally I'd like to get this working all in SQL without having to script this.
CREATE DATABASE lookback;
use lookback;
CREATE TABLE mygroup (
ID BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
DateTime DateTime
) ENGINE=InnoDB;
CREATE TABLE list (
ID BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
Type VARCHAR(255),
MyGroup BIGINT NOT NULL,
Data INT NOT NULL
) ENGINE=InnoDB;
CREATE TABLE sublist (
ID BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
ParentID BIGINT NOT NULL,
Data INT NOT NULL
) ENGINE=InnoDB;
INSERT INTO mygroup (DateTime) VALUES ("2012-03-09 22:33:19"), ("2012-03-09 22:34:19"), ("2012-03-09 22:35:19"), ("2012-03-09 22:36:19"), ("2012-03-09 22:37:19"), ("2012-03-09 22:38:19"), ("2012-03-09 22:39:19"), ("2012-03-09 22:40:19"), ("2012-03-09 22:41:19"), ("2012-03-09 22:42:19"), ("2012-03-09 22:43:19");
INSERT INTO list (Type, MyGroup, Data) VALUES ("ping", 1, 4), ("base", 2, 2), ("base", 3, 4), ("ping", 4, 7), ("ping", 5, 8), ("base", 6, 7), ("ping", 7, 8), ("base", 8, 3), ("ping", 9, 10), ("base", 10, 2), ("ping", 11, 3);
INSERT INTO sublist (ParentID, Data) VALUES (1, 2), (2, 3), (3, 6), (4, 8), (5, 4), (6, 5), (7, 1), (8, 9), (9, 11), (10, 4), (11, 6);
The simplest way of dealing with this is temporary tables. If you create an empty table to store your results (let's call it tbl_temp1), you can do this:
INSERT INTO tbl_temp1 (PingID, BaseID)
SELECT Ping.ID as PingID, Base.ID as BaseID
FROM ...
Then you can query it however you like:
SELECT PingID, BaseID from tbl_temp1 ...
Edited to add:
From the docs for CREATE TEMPORARY TABLE:
You can use the TEMPORARY keyword when creating a table. A TEMPORARY
table is visible only to the current connection, and is dropped
automatically when the connection is closed. This means that two
different connections can use the same temporary table name without
conflicting with each other or with an existing non-TEMPORARY table of
the same name. (The existing table is hidden until the temporary table
is dropped.)
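A rough sketch of the temporary-table approach with the sample data from the question follows. It runs on SQLite through Python, so CREATE TEMP TABLE stands in for MySQL's CREATE TEMPORARY TABLE. It also assumes that the BaseID wanted for each ping is the highest base ID with an earlier DateTime (the question's GROUP BY leaves that choice to the server), and the has_match column name and the EXISTS test are stand-ins for the real Query 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mygroup (ID INTEGER PRIMARY KEY, DateTime TEXT);
CREATE TABLE list    (ID INTEGER PRIMARY KEY, Type TEXT, MyGroup INTEGER, Data INTEGER);
CREATE TABLE sublist (ID INTEGER PRIMARY KEY, ParentID INTEGER, Data INTEGER);
""")

# Same sample rows as in the question.
types = ["ping", "base", "base", "ping", "ping", "base",
         "ping", "base", "ping", "base", "ping"]
list_data = [4, 2, 4, 7, 8, 7, 8, 3, 10, 2, 3]
sub_data = [2, 3, 6, 8, 4, 5, 1, 9, 11, 4, 6]
for i in range(1, 12):
    conn.execute("INSERT INTO mygroup (DateTime) VALUES (?)",
                 (f"2012-03-09 22:{32 + i}:19",))
    conn.execute("INSERT INTO list (Type, MyGroup, Data) VALUES (?, ?, ?)",
                 (types[i - 1], i, list_data[i - 1]))
    conn.execute("INSERT INTO sublist (ParentID, Data) VALUES (?, ?)",
                 (i, sub_data[i - 1]))

# Step 1: store the (PingID, BaseID) pairs in a temporary table.
conn.executescript("""
CREATE TEMP TABLE tbl_temp1 (PingID INTEGER, BaseID INTEGER);
INSERT INTO tbl_temp1 (PingID, BaseID)
SELECT lp.ID, MAX(lb.ID)
FROM list lp
JOIN mygroup mgp ON mgp.ID = lp.MyGroup
JOIN list lb     ON lb.Type = 'base'
JOIN mygroup mgb ON mgb.ID = lb.MyGroup AND mgb.DateTime < mgp.DateTime
WHERE lp.Type = 'ping'
GROUP BY lp.ID;
""")

# Step 2: for each stored pair, flag whether base data < ping data exists.
rows = conn.execute("""
SELECT t.PingID, t.BaseID,
       EXISTS (SELECT 1
               FROM sublist sp
               JOIN sublist sb ON sb.Data < sp.Data
               WHERE sp.ParentID = t.PingID
                 AND sb.ParentID = t.BaseID) AS has_match
FROM tbl_temp1 t
ORDER BY t.PingID DESC
""").fetchall()

for row in rows:
    print(row)
```

The PingID and BaseID pairs match the result table in the question, and the third column is 1 or 0 depending on whether the comparison finds a row.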
If this were a more flattened query, then there would be a straightforward answer.
It is certainly possible to use a derived table as the input to outer queries. A simple example would be:
select
data1,
(select data3 from howdy1 where howdy1.data1 = greetings.data1) data3_derived
from
(select data1 from hello1 where hello1.data2 < 4) as greetings;
where the derived table greetings is used in the inline query. (SQL Fiddle for this simplistic example: http://sqlfiddle.com/#!3/49425/2 )
Following this logic would lead us to assume that you could cast your first query as a derived table of query1 and then recast query2 into the select statement.
For that I constructed the following:
select query1.pingId, query1.baseId,
(SELECT ping.Data pingData FROM
(SELECT sl.Data FROM list l
JOIN sublist sl ON sl.ParentID = l.ID
WHERE l.Type = "ping" AND l.ID = query1.pingId
) Ping
INNER JOIN
(SELECT sl.Data FROM list l
JOIN sublist sl ON sl.ParentID = l.ID
WHERE l.Type = "base" AND l.ID = query1.baseId
) Base
ON Base.Data < Ping.Data)
from
(SELECT Ping.ID as PingID, Base.ID as BaseID FROM
(SELECT l.ID, mg.DateTime from list l
JOIN mygroup mg ON mg.ID = l.MyGroup
WHERE l.Type = "ping"
ORDER BY l.ID DESC
) Ping
INNER JOIN
(SELECT l.ID, mg.DateTime from list l
JOIN mygroup mg ON mg.ID = l.MyGroup
WHERE l.Type = "Base"
ORDER BY l.ID DESC
) Base
ON Base.DateTime < Ping.DateTime
GROUP BY Ping.ID
) query1
order by pingId desc;
where I have inserted query2 into a select clause from query1 and inserted query1.pingId and query1.baseId in place of 11 and 10, respectively. If 11 and 10 are left in place, this query works (but obviously only generates the same data for each row).
But when this is executed, I'm given an error: Unknown column 'query1.pingId'. Obviously, query1 cannot be seen inside the nested derived tables.
Since, in general, this type of query is possible, when the nesting is only 1 level deep (as per my greeting example at the top), there must be logical restrictions as to why this level of nesting isn't possible. (Time to pull out the database theory book...)
If I were faced with this, I'd rewrite and flatten the queries to get the real data that I wanted. And eliminate a couple things including that really nasty group by that is used in query1 to get the max baseId for a given pingId.
You say that's not possible, due to external constraints. So, this is, ultimately, a non-answer answer. Not very useful, but maybe it'll be worth something.
(SQL Fiddle for all this: http://sqlfiddle.com/#!2/bac74/35 )
If you cannot modify query 2 then there is nothing we can suggest. Here is a combination of your two queries with a reduced level of nesting. I suspect this would be slow with a large dataset -
SELECT tmp1.PingID, tmp1.BaseID, IF(slb.Data, 1, 0) AS third_col
FROM (
SELECT lp.ID AS PingID, MAX(lb.ID) AS BaseID
FROM mygroup mgp
INNER JOIN mygroup mgb
ON mgb.DateTime < mgp.DateTime
INNER JOIN list lp
ON mgp.ID = lp.MyGroup
AND lp.Type = 'ping'
INNER JOIN list lb
ON mgb.ID = lb.MyGroup
AND lb.Type = 'base'
GROUP BY lp.ID DESC
) AS tmp1
LEFT JOIN sublist slp
ON tmp1.PingID = slp.ParentID
LEFT JOIN sublist slb
ON tmp1.BaseID = slb.ParentID
AND slb.Data < slp.Data;