How can I optimize the 'IN' query? - mysql

**** EDIT ****
14 ms may not seem like a lot; however, as you can see below in the PostgreSQL explain, PostgreSQL is doing a Seq Scan on 80,000 rows. There must be a way to avoid this scan and do a couple of index lookups instead.
**** EDIT END ****
I am playing around with the schemaless idea and I have the following three tables, each populated with 100,000 random entries:
entities(_primary_key SERIAL PRIMARY KEY, _id CHAR(32) UNIQUE,
data BYTEA)
index_users_profile_name(_id CHAR(32) PRIMARY KEY,
key VARCHAR UNIQUE)
index_users_email(_id CHAR(32) PRIMARY KEY, key VARCHAR)
with a non-unique index on index_users_email(key)
My SQL query is:
SELECT data FROM entities WHERE
_id IN (SELECT _id FROM index_users_email WHERE key = 'test')
OR
_id in (SELECT _id FROM index_users_profile_name WHERE key = 'test')
This takes a whopping 14 ms, although 'test' doesn't exist in either of the 'index' tables, no matter if I use PostgreSQL or MySQL, so it must be something that I am doing wrong.
Any idea how I can optimize it, or what I am doing wrong?
Thanks!
Postgres explain:
Seq Scan on entities (cost=16.88..4776.15 rows=80414 width=163) (actual time=15.169..15.169 rows=0 loops=1)
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
Rows Removed by Filter: 107218
SubPlan 1
-> Index Scan using index_users_email_key_idx1 on index_users_email (cost=0.42..8.44 rows=1 width=33) (actual time=0.039..0.039 rows=0 loops=1)
Index Cond: ((key)::text = 'test'::text)
SubPlan 2
-> Index Scan using index_users_profile_name_key_idx1 on index_users_profile_name (cost=0.42..8.44 rows=1 width=33) (actual time=0.071..0.071 rows=0 loops=1)
Index Cond: ((key)::text = 'test'::text)
Planning time: 0.202 ms
Execution time: 15.216 ms

OR'ed (join) conditions are usually bad; try UNION instead:
SELECT data FROM entities
WHERE _id IN
( SELECT _id
FROM index_users_email
WHERE key = 'test'
)
UNION
SELECT data FROM entities
WHERE _id in
( SELECT _id
FROM index_users_profile_name
WHERE key = 'test'
)
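If the two index tables can never return the same _id for one entity, UNION ALL is a possible variant (a sketch under that assumption); it skips the sort/deduplication step that plain UNION needs:
SELECT data FROM entities
WHERE _id IN
( SELECT _id
FROM index_users_email
WHERE key = 'test'
)
UNION ALL
SELECT data FROM entities
WHERE _id IN
( SELECT _id
FROM index_users_profile_name
WHERE key = 'test'
)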

14 milliseconds is quite fine. I have no idea why you think the query should run in less than a millisecond. There is "a lot" of work to set up a query, validate that the data is in memory, identify where the indexes are, and so on. I put that in quotes, because for most queries, this is trivial. But it can easily add up to milliseconds.
Second, if you are doing real timings, keep the following in mind:
Computers (as we use them) are not deterministic. You need to run the timings multiple times. For something that takes milliseconds, this would normally be thousands of times to get a stable reading.
Initialize the system to be in the same state for each timing. You need to decide if you want a cold cache or warm cache, but the timings should all be on the same system.
Isolate the system from any other work. Background tasks (even moving a mouse) can affect performance.
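For example (a minimal sketch, not from the original post), in psql you can enable client-side timing and re-run the statement several times to see the spread; EXPLAIN (ANALYZE, BUFFERS) additionally separates planning from execution time and shows whether the data was already cached:
\timing on
EXPLAIN (ANALYZE, BUFFERS)
SELECT data FROM entities WHERE
_id IN (SELECT _id FROM index_users_email WHERE key = 'test')
OR
_id IN (SELECT _id FROM index_users_profile_name WHERE key = 'test');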
In terms of the query, the one thing I can think of is to use = and exists:
SELECT e.data
FROM entities e
WHERE _id = (SELECT _id FROM index_users_email WHERE key = 'test') OR
EXISTS (SELECT 1 FROM index_users_profile_name iupn WHERE iupn._id = e._id AND iupn.key = 'test');
At best, though, I'm guessing these would shave a millisecond or two off the query.

SELECT data
FROM entities e
LEFT OUTER JOIN index_users_email iue ON e._id=iue._id and iue.key = 'test'
LEFT OUTER JOIN index_users_profile_name iupn ON e._id=iupn._id and iupn.key = 'test'
WHERE
iue._id IS NOT NULL or iupn._id IS NOT NULL

Looks like one potential answer is union and join:
explain
select data
from entities as t
join (
    select _id from index_users_email where key = 'test'
    union
    select _id from index_users_profile_name where key = 'test'
) u on t._id = u._id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=17.32..33.82 rows=2 width=163)
-> Unique (cost=16.90..16.91 rows=2 width=33)
-> Sort (cost=16.90..16.91 rows=2 width=33)
Sort Key: index_users_email._id
-> Append (cost=0.42..16.89 rows=2 width=33)
-> Index Scan using index_users_email_key_idx1 on index_users_email (cost=0.42..8.44 rows=1 width=33)
Index Cond: ((key)::text = 'test'::text)
-> Index Scan using index_users_profile_name_key_idx1 on index_users_profile_name (cost=0.42..8.44 rows=1 width=33)
Index Cond: ((key)::text = 'test'::text)
-> Index Scan using entities__id_key on entities t (cost=0.42..8.44 rows=1 width=196)
Index Cond: (_id = index_users_email._id)
(11 rows)
Time: 0.714 ms

This will have better performance as it takes advantage of the primary key in those tables:
select data
from entities
where
exists (
select _id
from index_users_email
where key = 'test' and _id = entities._id
) or exists (
select _id
from index_users_profile_name
where key = 'test' and _id = entities._id
)

Why is MySQL performing a scan operation if an index exists on the channel column

I have 3 tables:
testdata1: id (pri) -> 1000000 rows
testdata2: id (pri), channel (indexed) -> 10000 rows
testdata3: id (pri) -> 1000 rows
On performing the following query, I get a scan on testdata2.
explain format=tree
select *
from testdata1
inner join testdata2 on testdata1.id = testdata2.channel
inner join testdata3 on testdata2.channel = testdata3.id
where testdata1.id < 100;
EXPLAIN: -> Nested loop inner join (cost=8014.20 rows=9984)
-> Nested loop inner join (cost=4519.80 rows=9984)
-> Table scan on testdata2 (cost=1025.40 rows=9984)
-> Filter: ((testdata1.id < 100) and (testdata1.id = testdata2.`channel`)) (cost=0.25 rows=1)
-> Single-row index lookup on testdata1 using PRIMARY (id=testdata2.`channel`) (cost=0.25 rows=1)
-> Filter: (testdata2.`channel` = testdata3.id) (cost=0.25 rows=1)
-> Single-row index lookup on testdata3 using PRIMARY (id=testdata2.`channel`) (cost=0.25 rows=1)
Why is MySQL not utilising the index on the testdata2(channel) column?
UPDATE
After running ANALYZE TABLE testdata2, MySQL used the index.
Is it necessary to use the ANALYZE TABLE command after creating an index?
EXPLAIN: -> Nested loop inner join (cost=241.34 rows=156)
-> Nested loop inner join (cost=186.85 rows=156)
-> Filter: (testdata1.id < 100) (cost=20.09 rows=99)
-> Index range scan on testdata1 using PRIMARY over (id < 100) (cost=20.09 rows=99)
-> Index lookup on testdata2 using a_temp_index (channel=testdata1.id), with index condition: (testdata1.id = testdata2.`channel`) (cost=1.53 rows=2)
-> Filter: (testdata2.`channel` = testdata3.id) (cost=0.25 rows=1)
-> Single-row index lookup on testdata3 using PRIMARY (id=testdata2.`channel`) (cost=0.25 rows=1)
https://dev.mysql.com/doc/refman/8.0/en/create-index.html says:
When the innodb_stats_persistent setting is enabled, run the ANALYZE TABLE statement for an InnoDB table after creating an index on that table.
This setting is on by default, so yes, it's recommended to run ANALYZE TABLE after creating an index.
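A minimal sketch of the recommended sequence, using the table and index names from the question:
CREATE INDEX a_temp_index ON testdata2 (channel);
ANALYZE TABLE testdata2;  -- refreshes the persistent InnoDB statistics so the optimizer can cost the new index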

PostgreSQL select count(*) time-consuming

I am using spring-data-jpa & postgresql-9.4.
There is a table: tbl_oplog. This table has about seven million rows of data, and the data needs to be displayed on the front end (paged).
I use Spring's PagingAndSortingRepository, and then I found that the data query was very slow. From the logs, I found that two SQL queries were issued:
select
oplog0_.id as id1_8_,
oplog0_.deleted as deleted2_8_,
oplog0_.result_desc as result_d3_8_,
oplog0_.extra as extra4_8_,
oplog0_.info as info5_8_,
oplog0_.login_ipaddr as login_ip6_8_,
oplog0_.level as level7_8_,
oplog0_.op_type as op_type8_8_,
oplog0_.user_name as user_nam9_8_,
oplog0_.op_obj as op_obj10_8_,
oplog0_.op as op11_8_,
oplog0_.result as result12_8_,
oplog0_.op_time as op_time13_8_,
oplog0_.login_name as login_n14_8_
from
tbl_oplog oplog0_
where
oplog0_.deleted=false
order by
oplog0_.op_time desc limit 10
And:
select
count(oplog0_.id) as col_0_0_
from
tbl_oplog oplog0_
where
oplog0_.deleted=?
(The second SQL statement is used to populate the page object, which is necessary.)
I found the second statement to be very time-consuming. Why does it take so long?
How can I optimize it? Does this happen with MySQL?
Or is there any other way I can optimize this requirement? (It seems that the select count is inevitable.)
EDIT:
I'll use another table for the demonstration (same issue):
Table:
select count(*) from tbl_gather_log; -- count is 6300931, cost 5.408 s
EXPLAIN select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0)
EXPLAIN ANALYSE select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0) (actual time=6697.102..6697.102 rows=1 loops=1)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0) (actual time=0.173..4622.674 rows=6300936 loops=1)
Heap Fetches: 298
Planning time: 0.312 ms
Execution time: 6697.267 ms
EDIT2:
TABLE:
create table tbl_gather_log (
id bigserial not null primary key,
event_level int,
event_time timestamp,
event_type int,
event_dis_type int,
event_childtype int,
event_name varchar(64),
dev_name varchar(32),
dev_ip varchar(32),
sys_type varchar(16),
event_content jsonb,
extra jsonb
);
And:
There are probably many filtering criteria supported, so I can't simply do special operations on deleted. For example, a query might be issued like select * from tbl_oplog where name like xxx and type = xxx limit 10, so there will be a corresponding query: select count(*) from tbl_oplog where name like xxx and type = xxx. Furthermore, I have to know exact counts, because I need to show how many pages there are on the front end.
The second statement takes a long time because it has to scan the whole table in order to count the rows.
One thing you can do is use an index:
CREATE INDEX ON tbl_oplog (deleted) INCLUDE (id);
VACUUM tbl_oplog; -- so you get an index only scan
Assuming that id is the primary key, it would be much better to use count(*) and omit the INCLUDE clause from the index.
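A sketch of that variant (assuming, as above, that id is the primary key):
CREATE INDEX ON tbl_oplog (deleted);
VACUUM tbl_oplog;
SELECT count(*) FROM tbl_oplog WHERE deleted = false;  -- can now be answered by an index-only scan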
But the best is probably to use an estimate:
SELECT t.reltuples * freq.f AS estimated_rows
FROM pg_stats AS s
JOIN pg_namespace AS n
ON s.schemaname = n.nspname
JOIN pg_class AS t
ON s.tablename = t.relname
AND n.oid = t.relnamespace
CROSS JOIN LATERAL
unnest(s.most_common_vals::text::boolean[]) WITH ORDINALITY AS val(v,id)
JOIN LATERAL
unnest(s.most_common_freqs) WITH ORDINALITY AS freq(f,id)
USING (id)
WHERE s.tablename = 'tbl_oplog'
AND s.attname = 'deleted'
AND val.v = ?;
This uses the distribution statistics to estimate the desired count.
If it is just about pagination, you don't need exact counts.
Read my blog for more on the topic of counting in PostgreSQL.
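If a rough total for the whole table is good enough for the pager, the planner's own estimate is nearly free (a sketch; unlike the statistics query above, it ignores the deleted filter):
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'tbl_oplog';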

PostgreSQL doesn't use index

I have a large table crumbs (about 100M+ rows, 100GB). It's just a collection of JSON stored as text. It has an index on column run_id, which has about 10K unique values. So each run is small (1K - 1M rows).
For simple query:
explain analyze verbose select * from crumbs c
where c.run_id='2016-04-26T19_02_01_015Z' limit 10
Plan is good:
Limit (cost=0.56..36.89 rows=10 width=2262) (actual time=1.978..2.016 rows=10 loops=1)
Output: id, robot_id, run_id, content, created_at, updated_at, table_id, fork_id, log, err
-> Index Scan using index_crumbs_on_run_id on public.crumbs c (cost=0.56..5533685.73 rows=1523397 width=2262) (actual time=1.975..1.996 rows=10 loops=1)
Output: id, robot_id, run_id, content, created_at, updated_at, table_id, fork_id, log, err
Index Cond: ((c.run_id)::text = '2016-04-26T19_02_01_015Z'::text)
Planning time: 0.117 ms
Execution time: 2.048 ms
But if I try to look inside the JSON stored in one of the columns, it then wants to do a full scan:
explain verbose select x from crumbs c,
lateral json_array_elements(c.content::json) x
where c.run_id='2016-04-26T19_02_01_015Z'
limit 10
Plan:
Limit (cost=0.01..0.69 rows=10 width=32)
Output: x.value
-> Nested Loop (cost=0.01..10332878.67 rows=152343800 width=32)
Output: x.value
-> Seq Scan on public.crumbs c (cost=0.00..7286002.66 rows=1523438 width=895)
Output: c.id, c.robot_id, c.run_id, c.content, c.created_at, c.updated_at, c.table_id, c.fork_id, c.log, c.err
Filter: ((c.run_id)::text = '2016-04-26T19_02_01_015Z'::text)
-> Function Scan on pg_catalog.json_array_elements x (cost=0.01..1.01 rows=100 width=32)
Output: x.value
Function Call: json_array_elements((c.content)::json)
Tried:
analyze crumbs
but it made no difference.
Update 1
Disabling sequential scans for the whole database works, but this is not an option in our application. In many other places the seq scan should stay:
set enable_seqscan=false;
Plan:
Limit (cost=0.57..1.14 rows=10 width=32) (actual time=0.120..0.294 rows=10 loops=1)
Output: x.value
-> Nested Loop (cost=0.57..8580698.45 rows=152343400 width=32) (actual time=0.118..0.273 rows=10 loops=1)
Output: x.value
-> Index Scan using index_crumbs_on_run_id on public.crumbs c (cost=0.56..5533830.45 rows=1523434 width=895) (actual time=0.087..0.107 rows=10 loops=1)
Output: c.id, c.robot_id, c.run_id, c.content, c.created_at, c.updated_at, c.table_id, c.fork_id, c.log, c.err
Index Cond: ((c.run_id)::text = '2016-04-26T19_02_01_015Z'::text)
-> Function Scan on pg_catalog.json_array_elements x (cost=0.01..1.01 rows=100 width=32) (actual time=0.011..0.011 rows=1 loops=10)
Output: x.value
Function Call: json_array_elements((c.content)::json)
Planning time: 0.124 ms
Execution time: 0.337 ms
Update 2:
Schema is:
CREATE TABLE crumbs
(
id serial NOT NULL,
run_id character varying(255),
content text,
created_at timestamp without time zone,
updated_at timestamp without time zone,
CONSTRAINT crumbs_pkey PRIMARY KEY (id)
);
CREATE INDEX index_crumbs_on_run_id
ON crumbs
USING btree
(run_id COLLATE pg_catalog."default");
Update 3
Rewriting query like so:
select json_array_elements(c.content::json) x
from crumbs c
where c.run_id='2016-04-26T19_02_01_015Z'
limit 10
Gets the correct plan. It is still unclear why the wrong plan is chosen for the second query.
Rewriting the query so that the limit is applied first and then the cross join against the function should make Postgres use the index:
Using a derived table:
select x
from (
select *
from crumbs
where run_id='2016-04-26T19_02_01_015Z'
limit 10
) c
cross join lateral json_array_elements(c.content::json) x
Alternatively using a CTE:
with c as (
select *
from crumbs
where run_id='2016-04-26T19_02_01_015Z'
limit 10
)
select x
from c
cross join lateral json_array_elements(c.content::json) x
Or use json_array_elements() directly in the select list:
select json_array_elements(c.content::json)
from crumbs c
where c.run_id='2016-04-26T19_02_01_015Z'
limit 10
However, this is something different from the other two queries, because it applies the limit after "unnesting" the JSON array, not to the number of rows returned from the crumbs table (which is what your first query is doing).
You've got three different problems going on. First, the limit 10 in the first query is tipping the planner in favor of the index scan, which would otherwise be pretty expensive to get all rows matching that run_id. For the sake of comparison you might want to see what the first (un-joined) query plan looks like if you remove the limit. My guess is the planner switches to a table scan.
Second, that lateral join is unnecessary and throwing off the planner. You can expand the elements of the content array in your select clause like so:
select json_array_elements(content::json)
from crumbs
where run_id = '2016-04-26T19_02_01_015Z'
;
This is more likely to use the index scan to pick off rows for that run_id, then "unnest" the array elements for you.
But the third hidden problem is what you're actually trying to get. If you run this last query as is then you're in the same boat as the first (un-joined) query without a limit, which means you'll likely not get an index scan (not that that's inherently bad if you're reading such a large chunk of the table).
Do you want just the first few arbitrary array elements from all content arrays in that run? If so then tacking on a limit clause here should be the end of the story. If you want all array elements for this particular run then you may just have to accept a table scan, although without the lateral join you're potentially in a much better situation than the original query.
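For the first case, a sketch of that limited variant (the limit now counts unnested array elements, not crumbs rows):
select json_array_elements(content::json)
from crumbs
where run_id = '2016-04-26T19_02_01_015Z'
limit 10;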
Data modelling suggestions:
-- Suggest replacing the column run_id (low cardinality, and rather fat)
-- by a reference to a domain table, like:
-- ------------------------------------------------------------------
CREATE TABLE runs
( run_seq serial NOT NULL PRIMARY KEY
, run_id character varying UNIQUE
);
-- Grab all the distinct values occurring in crumbs.run_id
-- -------------------------------------------------------
INSERT INTO runs (run_id)
SELECT DISTINCT run_id FROM crumbs;
-- Add an FK column
-- -----------------
ALTER TABLE crumbs
ADD COLUMN run_seq integer REFERENCES runs(run_seq)
;
UPDATE crumbs c
SET run_seq = r.run_seq
FROM runs r
WHERE r.run_id = c.run_id
;
VACUUM ANALYZE runs;
-- Drop old column and set new column to not nullable
-- ---------------------------------------------------
ALTER TABLE crumbs
DROP COLUMN run_id
;
ALTER TABLE crumbs
ALTER COLUMN run_seq SET NOT NULL
;
-- Recreate the supporting index for the FK
-- adding id to support index-only lookups
-- (and enforce uniqueness)
-- -------------------------------------
CREATE UNIQUE INDEX index_crumbs_run_seq_id ON crumbs (run_seq,id)
;
-- Refresh statistics
-- ------------------
VACUUM ANALYZE crumbs; -- this may take some time ...
-- and then: join the runs table to your original crumbs table
-- -----------------------------------------------------------
-- explain analyze
SELECT x FROM crumbs c
JOIN runs r ON r.run_seq = c.run_seq
, lateral json_array_elements(c.content::json) x
WHERE r.run_id='2016-04-26T19_02_01_015Z'
LIMIT 10
;
Or: use the other answerers' suggestion with a similar join.
But possibly even better: replace the ugly run_id text string by an actual timestamp.
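If every run_id follows the 'YYYY-MM-DDTHH_MI_SS_mmmZ' pattern seen in the question, a sketch of that conversion on the runs table from above might look like this (run_ts is an illustrative column name, not from the question):
ALTER TABLE runs ADD COLUMN run_ts timestamptz;
UPDATE runs
SET run_ts = regexp_replace(
        run_id,
        '^(\d{4}-\d{2}-\d{2})T(\d{2})_(\d{2})_(\d{2})_(\d{3})Z$',
        '\1 \2:\3:\4.\5+00'
    )::timestamptz;  -- '2016-04-26T19_02_01_015Z' becomes 2016-04-26 19:02:01.015+00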

Indexes don't affect execution time in MS SQL 2014 vs MySQL (MariaDB 10)

I'm porting a statistics analyzer system from MySQL (MariaDB 10) to MS SQL 2014, and I found a strange thing. Normally I used to use single- and multi-field indexes for most operations: the statistics database holds about 60 million events on a 4-core PC, and analysis includes funnels, event segmentation, cohort analysis, KPIs and more, so it may be slow sometimes.
But I was quite surprised when I executed several query sequences on MS SQL and then removed all indexes (except the main clustered id): I saw that execution time even decreased! I restarted the server (so the cache was cleared), but after each restart the result was similar - my queries work faster without indexes (actually the speed is the same, but no time is spent on manual index creation).
I suppose MS SQL creates implicit indexes for me, but in this case it looks like I should remove all index creation from my queries? In MySQL you can clearly see that adding indexes really works. Does this MS SQL behaviour mean that I don't need to care about indexes anymore? I've made several tests with my queries and it seems that indexes almost don't affect execution time. The last time I dealt with MS SQL was long ago and it was MS SQL 2000, so maybe MSFT developed f**n' AI during the last 15 years? :)
Just in case, the test SQL code (generated by the back-end for the front-end) is below.
In short, it produces graph data for a particular type of event for the last 3 months over time, then does segmentation by one parameter. It creates a temp table from the main events table with user-set constraints (time period, parameters), creates several more temp tables and indexes, does several joins and returns the final select result:
select min(tmstamp), max(tmstamp)
from evt_db.dbo.events
where ( ( source = 3 )
and ( event_id=24 )
and tmstamp > 1451606400
AND tmstamp < 1458000000
);
select min(param1), max(param1), count(DISTINCT(param1))
from evt_db.dbo.events
WHERE ( ( source = 3 )
AND ( event_id=24 )
AND tmstamp > 1451606400
AND tmstamp < 1458000000
);
create table #_tmp_times_calc_analyzer_0_0 (
tm_start int,
tm_end int,
tm_origin int,
tm_num int
);
insert into #_tmp_times_calc_analyzer_0_0 values
( 1451606400, 1452211200, 1451606400, 0 ),
( 1452211200, 1452816000, 1452211200, 1 ),
( 1452816000, 1453420800, 1452816000, 2 ),
( 1453420800, 1454025600, 1453420800, 3 ),
( 1454025600, 1454630400, 1454025600, 4 ),
( 1454630400, 1455235200, 1454630400, 5 ),
( 1455235200, 1455840000, 1455235200, 6 ),
( 1455840000, 1456444800, 1455840000, 7 ),
( 1456444800, 1457049600, 1456444800, 8 ),
( 1457049600, 1457654400, 1457049600, 9 ),
( 1457654400, 1458259200, 1457654400, 10 );
And...
CREATE INDEX tm_num ON #_tmp_times_calc_analyzer_0_0 (tm_num);
SELECT id, t1.uid, tmstamp, floor((tmstamp - 1451606400) / 604800) period_num,
param1 into #_tmp_events_view_analyzer_0_0
FROM evt_db.dbo.events t1
WHERE ( ( source = 3 )
AND ( event_id=24 )
AND tmstamp > 1451606400
AND tmstamp < 1458000000
);
CREATE INDEX uid ON #_tmp_events_view_analyzer_0_0 (uid);
CREATE INDEX period_num ON #_tmp_events_view_analyzer_0_0 (period_num);
CREATE INDEX tmstamp ON #_tmp_events_view_analyzer_0_0 (tmstamp);
CREATE INDEX _index_param1 ON #_tmp_events_view_analyzer_0_0 (param1);
create table #_tmp_median_analyzer_0_0 (ts int );
insert into #_tmp_median_analyzer_0_0
select distinct(param1) v
from #_tmp_events_view_analyzer_0_0
where param1 is not null
order by v ;
select tm_origin, count(distinct uid), count(distinct id)
from #_tmp_times_calc_analyzer_0_0
left join #_tmp_events_view_analyzer_0_0 ON period_num = tm_num
GROUP BY tm_origin;
select top 600 (param1) seg1, count(distinct uid), count(distinct id)
from #_tmp_events_view_analyzer_0_0
GROUP BY param1
order by 1 asc;
And...
select seg1, tm_origin, count(distinct uid), count(distinct id)
from
( SELECT (param1) seg1, tm_origin, uid, id
from #_tmp_times_calc_analyzer_0_0
left join #_tmp_events_view_analyzer_0_0 ON period_num = tm_num
group by param1, tm_origin, uid, id
) t
GROUP BY seg1, tm_origin;
select min(param1), max(param1), round(avg(param1),0)
from #_tmp_events_view_analyzer_0_0;
DECLARE @c BIGINT = (SELECT COUNT(*) FROM #_tmp_median_analyzer_0_0);
SELECT round(AVG(1.0 * ts),0)
FROM
( SELECT ts
FROM #_tmp_median_analyzer_0_0
ORDER BY ts OFFSET (@c - 1) / 2 ROWS
FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS median_val;
evt_db.dbo.events needs INDEX(source, event_id, tmstamp), with tmstamp 3rd. In the case of MySQL, those first 2 SELECTs will run entirely in the index (because it is a "covering" index). source and event_id can be in either order.
Later, you have a similar SELECT but it also has id, t1.uid. You could make this covering index for it: INDEX(source, event_id, tmstamp, uid, id). Again, tmstamp must be third in the list.
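A sketch of that covering index in both dialects (the index name is illustrative; on the MySQL side the table would simply be events in the statistics database):
-- MySQL / MariaDB
CREATE INDEX idx_events_cover ON events (source, event_id, tmstamp, uid, id);
-- MS SQL Server: the non-searched columns can go into INCLUDE instead of the key
CREATE INDEX idx_events_cover ON evt_db.dbo.events (source, event_id, tmstamp) INCLUDE (uid, id);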
select top 600 (param1) seg1, count(distinct uid), count(distinct id) ... might benefit from INDEX(param1, uid, id), where param1 must be first.
The other indexes you list are possibly not useful at all. What indexes did you try?
One difference between MySQL and other databases: MySQL almost never uses more than one index in a query. And, in my experience, MySQL's choice is 'wise'. Perhaps MS SQL is trying too hard to use two indexes when simply scanning the table would be less work.

MySQL -- Better way to do this query?

This query will be done in a cached autocomplete text box, possibly by thousands of users at the same time. What I have below works, but I feel there may be a better way to do what I am doing.
Any advice?
UPDATED -- it can be 'something%':
SELECT a.`object_id`, a.`type`,
IF( b.`name` IS NOT NULL, b.`name`,
IF( c.`name` IS NOT NULL, c.`name`,
IF( d.`name` IS NOT NULL, d.`name`,
IF ( e.`name` IS NOT NULL, e.`name`, f.`name` )
)
)
) AS name
FROM `user_permissions` AS a
LEFT JOIN `divisions` AS b
ON ( a.`object_id` = b.`division_id`
AND a.`type` = 'division'
AND b.`status` = 1 )
LEFT JOIN `departments` AS c
ON ( a.`object_id` = c.`department_id`
AND a.`type` = 'department'
AND c.`status` = 1 )
LEFT JOIN `sections` AS d
ON ( a.`object_id` = d.`section_id`
AND a.`type` = 'section'
AND d.`status` = 1 )
LEFT JOIN `units` AS e
ON ( a.`object_id` = e.`unit_id`
AND a.`type` = 'unit'
AND e.`status` = 1 )
LEFT JOIN `positions` AS f
ON ( a.`object_id` = f.`position_id`
AND a.`type` = 'position'
AND f.`status` = 1 )
WHERE a.`user_id` = 1 AND (
b.`name` LIKE '?%' OR
c.`name` LIKE '?%' OR
d.`name` LIKE '?%' OR
e.`name` LIKE '?%' OR
f.`name` LIKE '?%'
)
Two simple, fast queries are often better than one huge, inefficient query.
Here's how I'd design it:
First, create a table for all your names, in MyISAM format with a FULLTEXT index. That's where your names are stored. Each of the respective object types (e.g. departments, divisions, etc.) is a dependent table whose primary key references the primary key of the main named-objects table.
Now you can search for names with this much simpler query, which runs blazingly fast:
SELECT a.`object_id`, a.`type`, n.name, n.object_type
FROM `user_permissions` AS a
JOIN `named_objects` AS n ON a.`object_id` = n.`object_id`
WHERE MATCH(n.name) AGAINST ('name-to-be-searched')
Using the fulltext index will run hundreds of times faster than using LIKE in the way you're doing.
Once you have the object id and type, if you want any other attributes of the respective object type you can do a second SQL query joining to the table for the appropriate object type:
SELECT ... FROM {$object_type} WHERE object_id = ?
This will also go very fast.
Re your comment: Yes, I'd create the table with names even if it's redundant.
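A minimal sketch of that named-objects table, keeping the column names used in the query above (object_id, object_type, name); the rest is illustrative:
CREATE TABLE named_objects (
  object_id INT UNSIGNED NOT NULL PRIMARY KEY,
  object_type ENUM('division','department','section','unit','position') NOT NULL,
  name VARCHAR(255) NOT NULL,
  FULLTEXT KEY ft_name (name)
) ENGINE=MyISAM;
-- divisions, departments, sections, units and positions keep their own attributes
-- and reference named_objects.object_id as their primary key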
Other than changing the nested IFs to use a COALESCE() function (MySQL has COALESCE(), doesn't it?), there is not much you can do as long as you need to filter on that input parameter with a LIKE expression. Putting a filter on a column using a LIKE expression, where the LIKE parameter has a wildcard at the beginning, as you do, makes the query argument non-SARG-able, which means that the query processor must do a complete table scan of all the rows in the table to evaluate the filter predicate.
It cannot use an index, because an index is based on the column values, and with your LIKE parameter it doesn't know which index entries to read from (since the parameter starts with a wildcard).
If MySQL has COALESCE, you can replace your SELECT with:
SELECT a.`object_id`, a.`type`,
COALESCE(b.`name`, c.`name`, d.`name`, e.`name`, f.`name`) AS name
If you can replace the search argument parameter so that it does not start with a wildcard, then just ensure that there is an index on the name column in each of the tables; if there are no indexes on that column now, the query performance will increase enormously.
There are 500 things you can do. Optimize once you know where your bottlenecks are. Until then, work on getting those users onto your app. It's a much higher priority.