Why is my SQL query slow? - MySQL

I'm trying to create a view that joins 4 tables (tb_user has 200 rows, tb_transaction has 250,000 rows, tb_transaction_detail has 250,000 rows, tb_ms_location has 50 rows).
When I render it with DataTables server-side processing, it takes 13 seconds, even when I filter it.
I don't know why it takes so long...
Here is my SQL query:
CREATE VIEW `vw_cashback` AS
SELECT
`tb_user`.`nik` AS `nik`,
`tb_user`.`full_name` AS `nama`,
`tb_ms_location`.`location_name` AS `lokasi`,
`tb_transaction`.`date_transaction` AS `tanggal_setor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=1 THEN 1 ELSE 0 END) AS `mobil`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=2 THEN 1 ELSE 0 END) AS `motor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=3 THEN 1 ELSE 0 END) AS `truck`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=4 THEN 1 ELSE 0 END) AS `speda`,
sum(`tb_transaction_detail`.`total`) AS `total_global`,
(sum(`tb_transaction_detail`.`total`) * 0.8) AS `total_user`,
(sum(`tb_transaction_detail`.`total`) * 0.2) AS `total_tgr`,
((sum(`tb_transaction_detail`.`total`) * 0.2) / 2) AS `total_cashback`,
(curdate() - cast(`tb_user`.`created_at` AS date)) AS `status`
FROM `tb_user`
JOIN `tb_transaction` ON `tb_user`.`id` = `tb_transaction`.`user_id`
JOIN `tb_transaction_detail` ON `tb_transaction`.`id` = `tb_transaction_detail`.`transaction_id`
JOIN `tb_ms_location` ON `tb_ms_location`.`id` = `tb_transaction`.`location_id`
GROUP BY
`tb_user`.`id`,
`tb_transaction`.`date_transaction`,
`tb_user`.`nik`,
`tb_user`.`full_name`,
`tb_user`.`created_at`,
`tb_ms_location`.`location_name`
thanks

The unfiltered query must be slow, because it takes all records from all tables, joins and aggregates them.
But you say the view is still slow when you filter. The question is: how do you filter? As you are aggregating by user, location and transaction date, it should be one of these. However, you don't have the user ID or the transaction ID in your result list. This doesn't feel natural, and I'd suggest you add them, so that a query like
select * from vw_cashback where user_id = 5
or
select * from vw_cashback where transaction_id = 12345
would be possible.
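For example, exposing the user ID is essentially free, because the view already groups by `tb_user`.`id`. A sketch, not a drop-in replacement (only the changed part is spelled out):
CREATE OR REPLACE VIEW `vw_cashback` AS
SELECT
    `tb_user`.`id` AS `user_id`,  -- newly exposed; it is already part of the GROUP BY
    `tb_user`.`nik` AS `nik`,
    -- ... the remaining columns and SUMs exactly as in the current definition ...
FROM `tb_user`
JOIN `tb_transaction` ON `tb_user`.`id` = `tb_transaction`.`user_id`
JOIN `tb_transaction_detail` ON `tb_transaction`.`id` = `tb_transaction_detail`.`transaction_id`
JOIN `tb_ms_location` ON `tb_ms_location`.`id` = `tb_transaction`.`location_id`
GROUP BY
    `tb_user`.`id`,
    `tb_transaction`.`date_transaction`,
    `tb_user`.`nik`,
    `tb_user`.`full_name`,
    `tb_user`.`created_at`,
    `tb_ms_location`.`location_name`;
Exposing `tb_transaction`.`id` works the same way, but you would also have to add it to the GROUP BY, which changes the granularity to one row per transaction.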
As is, you'd have to filter by location name or by user nik / name. If you want to keep it that way, then create indexes for the lookup:
CREATE INDEX idx_location_name ON tb_ms_location(location_name, id);
CREATE INDEX idx_user_name ON tb_user(full_name, id);
CREATE INDEX idx_user_nik ON tb_user(nik, id);
The latter two can even be turned into covering indexes (i.e. indexes containing all columns used in the query), which may speed up the process further:
CREATE INDEX idx_user_nik ON tb_user(nik, id, full_name, created_at);
CREATE INDEX idx_user_name ON tb_user(full_name, id, nik, created_at);
For the access via the ID columns, you may also want covering indexes:
CREATE INDEX idx_location_id ON tb_ms_location(id, location_name);
CREATE INDEX idx_user_id ON tb_user(id, nik, full_name, created_at);

Related

How to fetch multiple values from two tables in an SQL query

There are two tables: one is the egg table and the other is the rate_disabled table.
Below I have shared a screenshot so you can understand.
I want to fetch the egg table fields whose value is greater than 0 and whose rate_status is not disabled for that particular field.
The output should come like this:
desi_egg = 108, small_egg = 55
(only these two fields should come, because double_keshar_egg and medium_egg are not greater than 0 and large_egg's rate_status is disabled)
Here merchant_id is common to both tables.
Does anyone have an idea how to solve this problem using an SQL or HQL query?
I am using a MySQL database.
What you describe calls for a somewhat cumbersome query like this:
select concat_ws(', ',
       (case when e.desi_egg > 0 and
                  not exists (select 1
                              from testdb.rate_disabled rd
                              where rd.merchant_id = e.merchant_id and
                                    rd.productName = 'desi_egg'
                             )
             then concat('desi_egg=', e.desi_egg)
        end),
       (case when e.double_kesher_egg > 0 and
                  not exists (select 1
                              from testdb.rate_disabled rd
                              where rd.merchant_id = e.merchant_id and
                                    rd.productName = 'double_kesher_egg'
                             )
             then concat('double_kesher_egg=', e.double_kesher_egg)
        end),
       . . .
      ) as all_my_eggs
from testdb.egg e;

Get Multiple Column Counts in a Single Query

I am working on an application where I need to write a query on a table that returns counts for multiple columns in a single query.
After some research I was able to develop a query for a single sourceId, but what happens if I want results for multiple sourceIds?
select '3' as sourceId,
(select count(*) from event where sourceId = 3 and plateCategoryId = 3) as TotalNewCount,
(select count(*) from event where sourceId = 3 and plateCategoryId = 4) as TotalOldCount;
I need to get TotalNewCount and TotalOldCount for several source IDs, for example (3, 4, 5, 6).
Can anyone help? How can I revise my query to return a result set of three columns including the data for all sources in the list (3, 4, 5, 6)?
Thanks
You can do all source ids at once:
select sourceId,
sum(case when plateCategoryId = 3 then 1 else 0 end) as TotalNewCount,
sum(case when plateCategoryId = 4 then 1 else 0 end) as TotalOldCount
from event
group by sourceId;
Use a where (before the group by) if you want to limit the source ids.
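For example, to restrict the results to the source ids mentioned in the question:
select sourceId,
sum(case when plateCategoryId = 3 then 1 else 0 end) as TotalNewCount,
sum(case when plateCategoryId = 4 then 1 else 0 end) as TotalOldCount
from event
where sourceId in (3, 4, 5, 6)
group by sourceId;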
Note: The above works in both Vertica and MySQL, and being standard SQL should work in any database.

Searching large (6 million) rows MySQL with stored queries?

I have a database with roughly 6 million entries - and it will grow - where I'm running queries to return data for HighCharts charting functionality. I need to read longitudinally over years, so I'm running queries like this:
foreach($states as $state_id) { //php code
SELECT //mysql pseudocode
sum(case when mytable.Year = '2003' then 1 else 0 end) Year_2003,
sum(case when mytable.Year = '2004' then 1 else 0 end) Year_2004,
sum(case when mytable.Year = '2005' then 1 else 0 end) Year_2005,
sum(case when mytable.Year = '2006' then 1 else 0 end) Year_2006,
sum(case when mytable.Year = '2007' then 1 else 0 end) Year_2007,
sum(case when mytable.Year = '$more_years' then 1 else 0 end) Year_$whatever_year
FROM mytable
WHERE State='$state_id'
AND Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
AND other_filters IN (etc, etc, etc)
} //end php code
But this runs for various states at once... so it returns, let's say, 5 states, each with the above statement but with a different state ID substituted. Meanwhile the years can be any number of years, and the Sex (male/female/other), Age_segment and other modifiers keep changing based on filters. The queries are slow (at minimum 30-40 seconds apiece). So a thought I had - unless I'm totally doing it wrong - is to store the above query in a second table together with its results, check that "meta query" first to see whether it was "cached", and then return the results without reading the main table (which won't be updated very often). A rough sketch of what I mean is shown after the table structure below.
Is this a good method or are there potential problems I'm not seeing?
EDIT: changed to table, not db (duh).
Table structure is:
id | Year | Sex | Age_segment | Another_filter | Etc
Nothing more complicated than that and no joining anything else. There are keys on id, Year, Sex, and Age_segment right now.
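Roughly what I have in mind for the cache table (all names are purely illustrative):
CREATE TABLE query_cache (
    query_hash CHAR(32) NOT NULL PRIMARY KEY,  -- e.g. MD5 of the normalized filter set
    filters TEXT NOT NULL,                     -- the filter combination the hash was built from
    result MEDIUMTEXT NOT NULL,                -- the stored aggregate result, e.g. serialized as JSON
    created_at DATETIME NOT NULL
);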
Proper indexing is what is needed to speed up the query. Start by doing an "EXPLAIN" on the query and post the results here.
I would suggest the following to start off. It avoids the for loop and returns the data in one query. Not knowing the number of rows or the cardinality of each column, I suggest a composite index on State and Year.
SELECT mytable.State, mytable.Year, count(*)
FROM mytable
WHERE Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
AND other_filters IN (etc, etc, etc)
GROUP BY mytable.State, mytable.Year
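The composite index could be created along these lines (the index name is just illustrative):
CREATE INDEX idx_state_year ON mytable (State, Year);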
The query can be further optimised by checking the cardinality of some of the columns. Run the following to list the distinct values (the number of rows returned is the cardinality):
SELECT Age_segment FROM mytable GROUP BY Age_segment;
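Or, to get the count of distinct values directly:
SELECT COUNT(DISTINCT Age_segment) AS age_segment_cardinality FROM mytable;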
Pseudo code...
SELECT Year
, COUNT(*) total
FROM my_its_not_a_database_its_a_table
WHERE State = $state_id
AND Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
GROUP
BY Year;

Query optimization for MySQL

I have the following query which takes about 28 seconds on my machine. I would like to optimize it and know if there is any way to make it faster by creating some indexes.
select rr1.person_id as person_id, rr1.t1_value, rr2.t0_value
from (select r1.person_id, avg(r1.avg_normalized_value1) as t1_value
from (select ma1.person_id, mn1.store_name, avg(mn1.normalized_value) as avg_normalized_value1
from matrix_report1 ma1, matrix_normalized_notes mn1
where ma1.final_value = 1
and (mn1.normalized_value != 0.2
and mn1.normalized_value != 0.0 )
and ma1.user_id = mn1.user_id
and ma1.request_id = mn1.request_id
and ma1.request_id = 4 group by ma1.person_id, mn1.store_name) r1
group by r1.person_id) rr1
,(select r2.person_id, avg(r2.avg_normalized_value) as t0_value
from (select ma.person_id, mn.store_name, avg(mn.normalized_value) as avg_normalized_value
from matrix_report1 ma, matrix_normalized_notes mn
where ma.final_value = 0 and (mn.normalized_value != 0.2 and mn.normalized_value != 0.0 )
and ma.user_id = mn.user_id
and ma.request_id = mn.request_id
and ma.request_id = 4
group by ma.person_id, mn.store_name) r2
group by r2.person_id) rr2
where rr1.person_id = rr2.person_id
Basically, it aggregates data depending on the request_id and final_value (0 or 1). Is there a way to simplify it for optimization? And it would be nice to know which columns should be indexed. I created an index on user_id and request_id, but it doesn't help much.
There are about 4,907,424 rows in matrix_report1 and 335,740 rows in the matrix_normalized_notes table. These tables will grow as we get more requests.
First, the others are right about learning to format your samples better. Also, explaining in plain language what you are trying to do is a benefit; sample data and expected results are even better.
That said, I think the query can be significantly simplified. Your two derived queries are almost completely identical, with the exception of the final_value condition (= 1 or = 0 respectively). Since each of them results in one record per person_id, you can just compute the averages with CASE/WHEN and remove the rest.
To help optimize the query, your matrix_report1 table should have an index on ( request_id, final_value, user_id ). Your matrix_normalized_notes table should have an index on ( request_id, user_id, store_name, normalized_value ).
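Those indexes could be created like so (index names are just illustrative):
CREATE INDEX idx_mr1_request_final_user ON matrix_report1 (request_id, final_value, user_id);
CREATE INDEX idx_mnn_request_user_store_value ON matrix_normalized_notes (request_id, user_id, store_name, normalized_value);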
Since your outer query averages the per-store averages, you do need to keep the nesting. The following should help.
SELECT
r1.person_id,
avg(r1.ANV1) as t1_value,
avg(r1.ANV0) as t0_value
from
( select
ma1.person_id,
mn1.store_name,
avg( case when ma1.final_value = 1
then mn1.normalized_value end ) as ANV1,
avg( case when ma1.final_value = 0
then mn1.normalized_value end ) as ANV0
from
matrix_report1 ma1
JOIN matrix_normalized_notes mn1
ON ma1.request_id = mn1.request_id
AND ma1.user_id = mn1.user_id
AND NOT mn1.normalized_value in ( 0.0, 0.2 )
where
ma1.request_id = 4
AND ma1.final_Value in ( 0, 1 )
group by
ma1.person_id,
mn1.store_name) r1
group by
r1.person_id
Notice the inner query pulls all transactions whose final value is either zero OR one, but each AVG is based on a CASE/WHEN for the respective final value. When the condition is not the 1 or 0 respectively, the result is NULL and is thus not considered when the average is computed.
So at this point the data is already grouped per person and store, with ANV1 and ANV0 set. The outer query then rolls these values up per person, regardless of the store. Again, NULL values are not considered as part of the average computation, so if store "A" doesn't have a value for ANV1, it does not skew the results; similarly if store "B" doesn't have a value for ANV0.
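A minimal, standalone illustration of AVG() skipping NULLs (not tied to the tables above):
SELECT AVG(v) AS avg_non_null  -- returns 2.0000; the NULL row is ignored
FROM (SELECT 1 AS v UNION ALL SELECT 3 UNION ALL SELECT NULL) t;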

SELECT CASE, COUNT(*)

I want to select the number of users that have marked some content as a favorite and also return whether the current user has "voted" or not. My table looks like this:
CREATE TABLE IF NOT EXISTS `favorites` (
`user` int(11) NOT NULL DEFAULT '0',
`content` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`user`,`content`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Say I have 3 rows containing
INSERT INTO `favorites` (`user`, `content`) VALUES
(11, 26977),
(22, 26977),
(33, 26977);
Using this
SELECT COUNT(*), CASE
WHEN user='22'
THEN 1
ELSE 0
END as has_voted
FROM favorites WHERE content = '26977'
I expect to get has_voted=1 and COUNT(*)=3 but
I get has_voted=0 and COUNT(*)=3. Why is that? How to fix it?
This is because you mixed aggregated and non-aggregated expressions in a single SELECT. Aggregated expressions work on many rows; non-aggregated expressions work on a single row. An aggregated expression (i.e. COUNT(*)) and a non-aggregated one (i.e. the CASE) should only appear in the same SELECT when you have a GROUP BY, which does not make sense in your situation.
You can fix your query by aggregating the second expression - i.e. adding a SUM around it, like this:
SELECT
COUNT(*) AS FavoriteCount
, SUM(CASE WHEN user=22 THEN 1 ELSE 0 END) as has_voted
FROM favorites
WHERE content = 26977
Now both expressions are aggregated, so you should get the expected results.
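With the three sample rows from the question, this should return:
| FavoriteCount | has_voted |
|---------------|-----------|
| 3 | 1 |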
Try this with SUM() and without CASE
SELECT
COUNT(*),
SUM(USER = '22') AS has_voted
FROM
favorites
WHERE content = '26977'
See Fiddle Demo
Try this:
SELECT COUNT(*), MAX(USER=22) AS has_voted
FROM favorites
WHERE content = 26977;
Check the SQL FIDDLE DEMO
OUTPUT
| COUNT(*) | HAS_VOTED |
|----------|-----------|
| 3 | 1 |
You need the sum of the votes:
SELECT COUNT(*), SUM(CASE
WHEN user='22'
THEN 1
ELSE 0
END) as has_voted
FROM favorites WHERE content = '26977'
You are inadvertently relying on a MySQL quirk here: you aggregate your results to get only one result record showing the number of matches (aggregate function COUNT), but you also show the user (or rather an expression built on it) in your result line, without any aggregate function. So the question is: which user? Another DBMS would have given you an error, asking you to either put the user in a GROUP BY or aggregate the users. MySQL instead picks an arbitrary one.
What you want to do here is aggregate users (or rather have your expression aggregated). Use SUM to sum all votes the user has given on the requested content:
SELECT
COUNT(*),
SUM(CASE WHEN user='22' THEN 1 ELSE 0 END) as sum_votes
FROM favorites
WHERE content = '26977';
You forgot to wrap the CASE expression inside an aggregate function. As it stands, has_voted will contain unexpected results, since you are effectively doing a "partial group by". Here is what you need to do:
SELECT COUNT(*), SUM(CASE WHEN USER = 22 THEN 1 ELSE 0 END) AS has_voted
FROM favorites
WHERE content = 26977
Or:
SELECT COUNT(*), COUNT(CASE WHEN USER = 22 THEN 1 ELSE NULL END) AS has_voted
FROM favorites
WHERE content = 26977