I have a query like below
select sum(ARRAY_SUM(DailyCampaignUsage.`data`[*].cost)) revenue from
Inheritx DailyCampaignUsage WHERE DailyCampaignUsage._type='DailyCampaignUsage'
It is taking 12.3s.
Here, count(ARRAY_SUM(DailyCampaignUsage.data[*].cost)) is 51k.
How can I improve its performance?
I have index like below
CREATE INDEX `abc` ON `Inheritx`(`_type`) USING GSI
Use a covering partial index.
CREATE INDEX idx_covering ON Inheritx(_type, data[*].cost) WHERE _type = 'DailyCampaignUsage';
select sum(ARRAY_SUM(DailyCampaignUsage.`data`[*].cost)) revenue
from Inheritx DailyCampaignUsage USE INDEX ( idx_covering )
WHERE DailyCampaignUsage._type='DailyCampaignUsage'
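Not part of the original answer, but as a quick sanity check you can run EXPLAIN on the same statement; if the index is covering, the IndexScan section of the plan should list the covered expressions (a "covers" array) and there should be no Fetch step:
EXPLAIN SELECT SUM(ARRAY_SUM(DailyCampaignUsage.`data`[*].cost)) AS revenue
FROM Inheritx DailyCampaignUsage USE INDEX ( idx_covering )
WHERE DailyCampaignUsage._type = 'DailyCampaignUsage';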
Need help with MySQL query.
I have indexed the mandatory columns but am still getting results in 160 seconds.
I know I have a problem with the CONCAT condition; without it, results come back in 15s.
Any kind of help is appreciated.
My Query is :
SELECT `order`.invoicenumber, `order`.lastupdated_by AS processed_by, `order`.lastupdated_date AS LastUpdated_date,
`trans`.transaction_id AS trans_id,
GROUP_CONCAT(`trans`.subscription_id) AS subscription_id,
GROUP_CONCAT(`trans`.price) AS trans_price,
GROUP_CONCAT(`trans`.quantity) AS prod_quantity,
`user`.id AS id, `user`.businessname AS businessname,
`user`.given_name AS given_name, `user`.surname AS surname
FROM cdp_order_transaction_master AS `order`
INNER JOIN `cdp_order_transaction_detail` AS trans ON `order`.transaction_id=trans.transaction_id
INNER JOIN cdp_user AS user ON (`order`.user_id=user.id OR CONCAT( user.id , '_CDP' ) = `order`.lastupdated_by)
WHERE `order`.xero_invoice_status='Completed' AND `order`.order_date > '2021-01-01'
GROUP BY `order`.transaction_id
ORDER BY `order`.lastupdated_date
DESC LIMIT 100
1. Index the columns used in the JOIN and WHERE clauses so that SQL does not scan the entire table but only the rows it needs. A full table scan performs extremely badly.
Create indexes for the cdp_order_transaction_master table:
CREATE INDEX idx_cdp_order_transaction_master_transaction_id ON cdp_order_transaction_master(transaction_id);
CREATE INDEX idx_cdp_order_transaction_master_user_id ON cdp_order_transaction_master(user_id);
CREATE INDEX idx_cdp_order_transaction_master_lastupdated_by ON cdp_order_transaction_master(lastupdated_by);
CREATE INDEX idx_cdp_order_transaction_master_xero_invoice_status ON cdp_order_transaction_master(xero_invoice_status);
CREATE INDEX idx_cdp_order_transaction_master_order_date ON cdp_order_transaction_master(order_date);
Create an index for the cdp_order_transaction_detail table:
CREATE INDEX idx_cdp_order_transaction_detail_transaction_id ON cdp_order_transaction_detail(transaction_id);
Create an index for the cdp_user table:
CREATE INDEX idx_cdp_user_id ON cdp_user(id);
2. Use Owner/Schema Name
If the owner (schema) name is not specified, the SQL Server engine has to search all schemas to find the object.
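A minimal illustration of a schema-qualified reference (mydb is a placeholder database/schema name, not from the original question):
SELECT `order`.invoicenumber
FROM mydb.cdp_order_transaction_master AS `order`
WHERE `order`.xero_invoice_status = 'Completed';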
I am using EXPLAIN to get a performance analysis of my query below:
SELECT `wf_cart_items` . `id`
FROM `wf_cart_items`
WHERE (`wf_cart_items` . `docket_number` = '405-2844' OR
match( `wf_cart_items` . `multi_docket_number` ) against ( '405-2844' )
)
The problem is that it shows 597151 rows to be searched, while each OR condition run on its own examines only 1 row. How is it possible that when I use OR it does a full table scan?
P.S.: I have a FULL-TEXT index on multi_docket_number and a BTREE index on docket_number.
OR is quite tricky for SQL optimizers -- both in the WHERE clause and in ON clauses.
The recommendation is to switch this to union all:
SELECT ci.id
FROM wf_cart_items ci
WHERE ci.docket_number = '405-2844'
UNION ALL
SELECT ci.id
FROM wf_cart_items ci
WHERE MATCH(ci.multi_docket_number) AGAINST ( '405-2844' ) AND
ci.docket_number <> '405-2844';
Based on the naming of your columns, I fear that multi_docket_number actually contains multiple docket numbers. If that is the case, you probably want to fix the data model, but that is another conversation.
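If you do decide to fix it, here is a rough sketch of one possible shape (the table and column names are invented, and it assumes wf_cart_items.id is the primary key and that every docket number of a cart item, including the primary one, gets its own row):
CREATE TABLE wf_cart_item_dockets (
    cart_item_id INT NOT NULL,
    docket_number VARCHAR(32) NOT NULL,
    PRIMARY KEY (cart_item_id, docket_number),
    KEY idx_docket_number (docket_number),
    FOREIGN KEY (cart_item_id) REFERENCES wf_cart_items (id)
);

-- with one row per (cart item, docket number), the OR disappears:
SELECT DISTINCT ci.id
FROM wf_cart_items ci
JOIN wf_cart_item_dockets d ON d.cart_item_id = ci.id
WHERE d.docket_number = '405-2844';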
I am using spring-data-jpa & postgresql-9.4.
There is a table, tbl_oplog. This table has about seven million rows of data, and the data needs to be displayed on the front end (paged).
I use Spring's PagingAndSortingRepository, and I found that the data query was very slow. From the logs, I found that two SQL queries were issued:
select
oplog0_.id as id1_8_,
oplog0_.deleted as deleted2_8_,
oplog0_.result_desc as result_d3_8_,
oplog0_.extra as extra4_8_,
oplog0_.info as info5_8_,
oplog0_.login_ipaddr as login_ip6_8_,
oplog0_.level as level7_8_,
oplog0_.op_type as op_type8_8_,
oplog0_.user_name as user_nam9_8_,
oplog0_.op_obj as op_obj10_8_,
oplog0_.op as op11_8_,
oplog0_.result as result12_8_,
oplog0_.op_time as op_time13_8_,
oplog0_.login_name as login_n14_8_
from
tbl_oplog oplog0_
where
oplog0_.deleted=false
order by
oplog0_.op_time desc limit 10
And:
select
count(oplog0_.id) as col_0_0_
from
tbl_oplog oplog0_
where
oplog0_.deleted=?
(The second SQL statement is used to populate the page object, which is necessary.)
I found the second statement to be very time-consuming. Why does it take so long?
How can I optimize it? Does this happen with MySQL as well?
Or is there any other way I can optimize this requirement? (It seems that the SELECT count is inevitable.)
EDIT:
I'll use another table (with the same issue) for the demonstration:
Table:
select count(*) from tbl_gather_log; -- count is 6300931, takes 5.408 s
EXPLAIN select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0)
EXPLAIN ANALYSE select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0) (actual time=6697.102..6697.102 rows=1 loops=1)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0) (actual time=0.173..4622.674 rows=6300936 loops=1)
Heap Fetches: 298
Planning time: 0.312 ms
Execution time: 6697.267 ms
EDIT2:
TABLE:
create table tbl_gather_log (
id bigserial not null primary key,
event_level int,
event_time timestamp,
event_type int,
event_dis_type int,
event_childtype int,
event_name varchar(64),
dev_name varchar(32),
dev_ip varchar(32),
sys_type varchar(16),
event_content jsonb,
extra jsonb
);
And:
There are probably many filtering criteria supported, so I can't simply do special operations on deleted. For example, a query might be issued as select * from tbl_oplog where name like xxx and type = xxx limit 10, so there will also be a query select count(*) from tbl_oplog where name like xxx and type = xxx. Furthermore, I have to know exact counts, because I need to show how many pages there are on the front end.
The second statement takes a long time because it has to scan the whole table in order to count the rows.
One thing you can do is use an index:
CREATE INDEX ON tbl_oplog (deleted) INCLUDE (id);
VACUUM tbl_oplog; -- so you get an index only scan
Assuming that id is the primary key, it would be much better to use count(*) and omit the INCLUDE clause from the index.
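A minimal sketch of that simpler variant (still assuming id is the primary key):
CREATE INDEX ON tbl_oplog (deleted);
VACUUM tbl_oplog;  -- keep the visibility map fresh so an index-only scan is possible

SELECT count(*) FROM tbl_oplog WHERE deleted = false;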
But the best is probably to use an estimate:
SELECT t.reltuples * freq.f AS estimated_rows
FROM pg_stats AS s
JOIN pg_namespace AS n
ON s.schemaname = n.nspname
JOIN pg_class AS t
ON s.tablename = t.relname
AND n.oid = t.relnamespace
CROSS JOIN LATERAL
unnest(s.most_common_vals::text::boolean[]) WITH ORDINALITY AS val(v,id)
JOIN LATERAL
unnest(s.most_common_freqs) WITH ORDINALITY AS freq(f,id)
USING (id)
WHERE s.tablename = 'tbl_oplog'
AND s.attname = 'deleted'
AND val.v = ?;
This uses the distribution statistics to estimate the desired count.
If it is just about pagination, you don't need exact counts.
Read my blog for more on the topic of counting in PostgreSQL.
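If an approximate count is acceptable for arbitrary filter combinations as well (as in your EDIT), one common trick, sketched here and not part of the original answer, is to read the planner's row estimate from EXPLAIN instead of counting:
CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS
$$
DECLARE
    rec  record;
    rows bigint;
BEGIN
    -- parse the "rows=" estimate out of the plain EXPLAIN output
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql;

-- usage: an estimate instead of an exact count
SELECT count_estimate('SELECT * FROM tbl_oplog WHERE deleted = false');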
SELECT `f`.*
FROM `files_table` `f`
WHERE f.`application_id` IN(6)
AND `f`.`project_id` IN(130418)
AND `f`.`is_last_version` = 1
AND `f`.`temporary` = 0
AND f.deleted_by is null
ORDER BY `f`.`date` DESC
LIMIT 5
When I remove the ORDER BY, the query executes in 0.1 seconds. With the ORDER BY it takes 3 seconds.
There is an index on every WHERE column and there is also an index on ORDER BY field (date).
What can I do to make this query faster? Why is ORDER BY slowing it down so much? Table has 3M rows.
Instead of an index on each column in the WHERE clause, be sure you have a composite index that covers all the columns in the WHERE clause,
e.g.:
create index idx1 on files_table (application_id, project_id,is_last_version,temporary,deleted_by)
Avoid the IN clause for single values; use = instead:
SELECT `f`.*
FROM `files_table` `f`
WHERE f.`application_id` = 6
AND `f`.`project_id` = 130418
AND `f`.`is_last_version` = 1
AND `f`.`temporary` = 0
AND f.deleted_by is null
ORDER BY `f`.`date` DESC
LIMIT 5
Adding the date (or other selected columns) to the index could let the query retrieve everything from the index, avoiding access to the table data. But since you select all columns (SELECT *), you probably need several columns, so the table data will be accessed anyway; still, you can try it and evaluate the performance.
Be careful to place the columns not involved in the WHERE clause to the right of all the columns that are in the WHERE clause:
create index idx1 on files_table (application_id, project_id,is_last_version,temporary,deleted_by, date)
I have a large table, crumbs (about 100M+ rows, 100 GB). It's just a collection of JSON stored as text. It has an index on the column run_id, which has about 10K unique values, so each run is small (1K - 1M rows).
For a simple query:
explain analyze verbose select * from crumbs c
where c.run_id='2016-04-26T19_02_01_015Z' limit 10
Plan is good:
Limit (cost=0.56..36.89 rows=10 width=2262) (actual time=1.978..2.016 rows=10 loops=1)
Output: id, robot_id, run_id, content, created_at, updated_at, table_id, fork_id, log, err
-> Index Scan using index_crumbs_on_run_id on public.crumbs c (cost=0.56..5533685.73 rows=1523397 width=2262) (actual time=1.975..1.996 rows=10 loops=1)
Output: id, robot_id, run_id, content, created_at, updated_at, table_id, fork_id, log, err
Index Cond: ((c.run_id)::text = '2016-04-26T19_02_01_015Z'::text)
Planning time: 0.117 ms
Execution time: 2.048 ms
But if I try to look inside the JSON stored in one of the columns, it wants to do a full scan:
explain verbose select x from crumbs c,
lateral json_array_elements(c.content::json) x
where c.run_id='2016-04-26T19_02_01_015Z'
limit 10
Plan:
Limit (cost=0.01..0.69 rows=10 width=32)
Output: x.value
-> Nested Loop (cost=0.01..10332878.67 rows=152343800 width=32)
Output: x.value
-> Seq Scan on public.crumbs c (cost=0.00..7286002.66 rows=1523438 width=895)
Output: c.id, c.robot_id, c.run_id, c.content, c.created_at, c.updated_at, c.table_id, c.fork_id, c.log, c.err
Filter: ((c.run_id)::text = '2016-04-26T19_02_01_015Z'::text)
-> Function Scan on pg_catalog.json_array_elements x (cost=0.01..1.01 rows=100 width=32)
Output: x.value
Function Call: json_array_elements((c.content)::json)
I tried:
analyze crumbs
but it made no difference.
Update 1
Disabling sequential scans for the whole database works, but this is not an option in our application; in many other places the seq scan should stay:
set enable_seqscan=false;
Plan:
Limit (cost=0.57..1.14 rows=10 width=32) (actual time=0.120..0.294 rows=10 loops=1)
Output: x.value
-> Nested Loop (cost=0.57..8580698.45 rows=152343400 width=32) (actual time=0.118..0.273 rows=10 loops=1)
Output: x.value
-> Index Scan using index_crumbs_on_run_id on public.crumbs c (cost=0.56..5533830.45 rows=1523434 width=895) (actual time=0.087..0.107 rows=10 loops=1)
Output: c.id, c.robot_id, c.run_id, c.content, c.created_at, c.updated_at, c.table_id, c.fork_id, c.log, c.err
Index Cond: ((c.run_id)::text = '2016-04-26T19_02_01_015Z'::text)
-> Function Scan on pg_catalog.json_array_elements x (cost=0.01..1.01 rows=100 width=32) (actual time=0.011..0.011 rows=1 loops=10)
Output: x.value
Function Call: json_array_elements((c.content)::json)
Planning time: 0.124 ms
Execution time: 0.337 ms
Update 2:
Schema is:
CREATE TABLE crumbs
(
id serial NOT NULL,
run_id character varying(255),
content text,
created_at timestamp without time zone,
updated_at timestamp without time zone,
CONSTRAINT crumbs_pkey PRIMARY KEY (id)
);
CREATE INDEX index_crumbs_on_run_id
ON crumbs
USING btree
(run_id COLLATE pg_catalog."default");
Update 3
Rewriting the query like this:
select json_array_elements(c.content::json) x
from crumbs c
where c.run_id='2016-04-26T19_02_01_015Z'
limit 10
This gets the correct plan. It is still unclear why the wrong plan is chosen for the second query.
Rewriting the query so that the limit is applied first and then the cross join against the function should make Postgres use the index:
Using a derived table:
select x
from (
select *
from crumbs
where run_id='2016-04-26T19_02_01_015Z'
limit 10
) c
cross join lateral json_array_elements(c.content::json) x
Alternatively using a CTE:
with c as (
select *
from crumbs
where run_id='2016-04-26T19_02_01_015Z'
limit 10
)
select x
from c
cross join lateral json_array_elements(c.content::json) x
Or use json_array_elements() directly in the select list:
select json_array_elements(c.content::json)
from crumbs c
where c.run_id='2016-04-26T19_02_01_015Z'
limit 10
However, this is something different than the other two queries, because it applies the limit after "unnesting" the JSON array, not to the number of rows returned from the crumbs table (which is what your first query is doing).
You've got three different problems going on. First, the limit 10 in the first query is tipping the planner in favor of the index scan, which would otherwise be pretty expensive to get all rows matching that run_id. For the sake of comparison you might want to see what the first (un-joined) query plan looks like if you remove the limit. My guess is the planner switches to a table scan.
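For reference, that comparison is just the first query without its LIMIT (same filter, otherwise unchanged):
explain analyze verbose
select * from crumbs c
where c.run_id = '2016-04-26T19_02_01_015Z';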
Second, that lateral join is unnecessary and throwing off the planner. You can expand the elements of the content array in your select clause like so:
select json_array_elements(content::json)
from crumbs
where run_id = '2016-04-26T19_02_01_015Z'
;
This is more likely to use the index scan to pick off rows for that run_id, then "unnest" the array elements for you.
But the third hidden problem is what you're actually trying to get. If you run this last query as is then you're in the same boat as the first (un-joined) query without a limit, which means you'll likely not get an index scan (not that that's inherently bad if you're reading such a large chunk of the table).
Do you want just the first few arbitrary array elements from all content arrays in that run? If so then tacking on a limit clause here should be the end of the story. If you want all array elements for this particular run then you may just have to accept a table scan, although without the lateral join you're potentially in a much better situation than the original query.
Data modelling suggestions:
-- Suggest replacing the column run_id (low cardinality, and rather fat)
-- by a reference to a domain table, like:
-- ------------------------------------------------------------------
CREATE TABLE runs
( run_seq serial NOT NULL PRIMARY KEY
, run_id character varying UNIQUE
);
-- Grab all the distinct values occurring in crumbs.run_id
-- -------------------------------------------------------
INSERT INTO runs (run_id)
SELECT DISTINCT run_id FROM crumbs;
-- Add an FK column
-- -----------------
ALTER TABLE crumbs
ADD COLUMN run_seq integer REFERENCES runs(run_seq)
;
UPDATE crumbs c
SET run_seq = r.run_seq
FROM runs r
WHERE r.run_id = c.run_id
;
VACUUM ANALYZE runs;
-- Drop old column and set new column to not nullable
-- ---------------------------------------------------
ALTER TABLE crumbs
DROP COLUMN run_id
;
ALTER TABLE crumbs
ALTER COLUMN run_seq SET NOT NULL
;
-- Recreate the supporting index for the FK
-- adding id to support index-only lookups
-- (and enforce uniqueness)
-- -------------------------------------
CREATE UNIQUE INDEX index_crumbs_run_seq_id ON crumbs (run_seq,id)
;
-- Refresh statistics
-- ------------------
VACUUM ANALYZE crumbs; -- this may take some time ...
-- and then: join the runs table to your original crumbs table
-- -----------------------------------------------------------
-- explain analyze
SELECT x FROM crumbs c
JOIN runs r ON r.run_seq = c.run_seq
, lateral json_array_elements(c.content::json) x
WHERE r.run_id='2016-04-26T19_02_01_015Z'
LIMIT 10
;
Or: use the other answerer's suggestion with a similar join.
But possibly even better: replace the ugly run_id text string by an actual timestamp.
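A rough sketch of what that could look like (run_ts is an invented column name, and the conversion of the existing run_id strings is left out):
-- hypothetical variant of the runs table keyed by a real timestamp
CREATE TABLE runs
( run_seq serial NOT NULL PRIMARY KEY
, run_ts  timestamp with time zone NOT NULL UNIQUE
);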