Google Cloud Storage MySQL query reads only a few hundred rows per second - mysql

I have set up a database with a table of approximately 100 GB, on a GCS instance with 4 CPUs and 15 GB of RAM. When I run a query, the read/write operations dashboards show that only a few hundred rows are read per second, which will take forever to complete.
The query is run from DBeaver, and it is not a complicated query, as can be seen from the attached picture. I simply do not understand why it is so slow.
The query is simply this:
"""INSERT INTO analytics_agg (
hash,
product,
interface,
click_time_0,
click_time_5,
click_time_10,
click_time_30,
click_time_60)
SELECT
hash,
product,
interface,
count(case when click_time=0 then 1 else 0 end) ,
count(case when click_time=5 then 1 else 0 end) ,
count(case when click_time=10 then 1 else 0 end) ,
count(case when click_time=30 then 1 else 0 end) ,
count(case when click_time=60 then 1 else 0 end)
FROM analytics
group by hash,product,interface """

As you're selecting from one table and inserting into another, the speed of the SELECT on the first table has a large impact on the final insertion rate and performance.
Try creating the following index, which should optimize the SELECT and improve the overall performance of the insertion:
ALTER TABLE `analytics` ADD INDEX `analytics_idx_hash_product_interface` (`hash`,`product`,`interface`);
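Since the SELECT also reads click_time, you could go one step further and make the index cover every column the query touches, letting MySQL answer the whole aggregation from the index without visiting the table rows. A hedged variant of the same statement (the index name is just a suggestion):
ALTER TABLE `analytics` ADD INDEX `analytics_idx_hash_product_interface_click` (`hash`,`product`,`interface`,`click_time`);
With a covering index in place, EXPLAIN should show "Using index" in the Extra column for the SELECT part.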

Related

Get Multi Columns Count in Single Query

I am working on an application where I need to write a query on a table which will return multiple column counts in a single query.
After some research I was able to develop a query for a single sourceId, but what happens if I want results for multiple sourceIds?
select '3'as sourceId,
(select count(*) from event where sourceId = 3 and plateCategoryId = 3) as TotalNewCount,
(select count(*) from event where sourceId = 3 and plateCategoryId = 4) as TotalOldCount;
I need to get TotalNewCount and TotalOldCount for several source IDs, for example (3,4,5,6).
Can anyone help? How can I revise my query to return a result set of three columns including the data of all sources in the list (3,4,5,6)?
Thanks
You can do all source IDs at once:
select sourceId,
sum(case when plateCategoryId = 3 then 1 else 0 end) as TotalNewCount,
sum(case when plateCategoryId = 4 then 1 else 0 end) as TotalOldCount
from event
group by sourceId;
Use a where (before the group by) if you want to limit the source ids.
Note: The above works in both Vertica and MySQL, and being standard SQL should work in any database.
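For example, to limit the result to the source IDs from the question:
select sourceId,
sum(case when plateCategoryId = 3 then 1 else 0 end) as TotalNewCount,
sum(case when plateCategoryId = 4 then 1 else 0 end) as TotalOldCount
from event
where sourceId in (3, 4, 5, 6)
group by sourceId;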

Why is my SQL query slow?

I'm trying to create a view that joins 4 tables (tb_user has 200 rows, tb_transaction 250,000 rows, tb_transaction_detail 250,000 rows, tb_ms_location 50 rows).
When I render it with DataTables server-side it takes 13 seconds, even when I filter it.
I don't know why it takes so long...
Here is my SQL query:
CREATE VIEW `vw_cashback` AS
SELECT
`tb_user`.`nik` AS `nik`,
`tb_user`.`full_name` AS `nama`,
`tb_ms_location`.`location_name` AS `lokasi`,
`tb_transaction`.`date_transaction` AS `tanggal_setor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=1 THEN 1 ELSE 0 END) AS `mobil`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=2 THEN 1 ELSE 0 END) AS `motor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=3 THEN 1 ELSE 0 END) AS `truck`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=4 THEN 1 ELSE 0 END) AS `speda`,
sum(`tb_transaction_detail`.`total`) AS `total_global`,
(sum(`tb_transaction_detail`.`total`) * 0.8) AS `total_user`,
(sum(`tb_transaction_detail`.`total`) * 0.2) AS `total_tgr`,
((sum(`tb_transaction_detail`.`total`) * 0.2) / 2) AS `total_cashback`,
(curdate() - cast(`tb_user`.`created_at` AS date)) AS `status`
FROM `tb_user`
JOIN `tb_transaction` ON `tb_user`.`id` = `tb_transaction`.`user_id`
JOIN `tb_transaction_detail` ON `tb_transaction`.`id` = `tb_transaction_detail`.`transaction_id`
JOIN `tb_ms_location` ON `tb_ms_location`.`id` = `tb_transaction`.`location_id`
GROUP BY
`tb_user`.`id`,
`tb_transaction`.`date_transaction`,
`tb_user`.`nik`,
`tb_user`.`full_name`,
`tb_user`.`created_at`,
`tb_ms_location`.`location_name`
Thanks
The unfiltered query must be slow, because it takes all records from all tables, joins and aggregates them.
But you say the view is still slow when you filter. The question is: How do you filter? As you are aggregating by user, location and transaction date, it should be one of these. However, you don't have the user ID or the transaction ID in your result list. This doesn't feel natural and I'd suggest you add them, so a query like
select * from vw_cashback where user_id = 5
or
select * from vw_cashback where transaction_id = 12345
would be possible.
As is, you'd have to filter by location name or by user nik/name. If that's what you want, create the following indexes for the lookups:
CREATE INDEX idx_location_name ON tb_ms_location(location_name, id);
CREATE INDEX idx_user_name ON tb_user(full_name, id);
CREATE INDEX idx_user_nik ON tb_user(nik, id);
The latter two can even be turned into covering indexes (i.e. indexes containing all columns used in the query) that may speed up the process further:
CREATE INDEX idx_user_name ON tb_user(full_name, id, nik, created_at);
CREATE INDEX idx_user_nik ON tb_user(nik, id, full_name, created_at);
As for access via the ID columns, you may also want covering indexes:
CREATE INDEX idx_location_id ON tb_ms_location(id, location_name);
CREATE INDEX idx_user_id ON tb_user(id, nik, full_name, created_at);
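One more point the answer doesn't cover: with 250,000 rows each in tb_transaction and tb_transaction_detail, the join columns themselves matter most. If they aren't already indexed (InnoDB only creates such indexes automatically when foreign-key constraints were declared), indexing them is usually the biggest win for a join like this. A hedged sketch, assuming no such indexes exist yet:
CREATE INDEX idx_transaction_user ON tb_transaction(user_id);
CREATE INDEX idx_detail_transaction ON tb_transaction_detail(transaction_id);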

Searching a large (6 million row) MySQL table with stored queries?

I have a database with roughly 6 million entries - and it will grow - where I'm running queries whose results feed a HighCharts chart. I need to read longitudinally over years, so I'm running queries like this:
foreach($states as $state_id) { //php code
SELECT //mysql pseudocode
sum(case when mydatabase.Year = '2003' then 1 else 0 end) Year_2003,
sum(case when mydatabase.Year = '2004' then 1 else 0 end) Year_2004,
sum(case when mydatabase.Year = '2005' then 1 else 0 end) Year_2005,
sum(case when mydatabase.Year = '2006' then 1 else 0 end) Year_2006,
sum(case when mydatabase.Year = '2007' then 1 else 0 end) Year_2007,
sum(case when mydatabase.Year = '$more_years' then 1 else 0 end) Year_$whatever_year
FROM mytable
WHERE State='$state_id'
AND Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
AND "other_filters IN (etc, etc, etc)
} //end php code
But for various states at once... so returning, let's say, 5 states, each with the above statement but a different state ID substituted. Meanwhile the years can be any number of years, and the Sex (male/female/other), Age_segment, and other modifiers keep changing based on filters. The queries are slow (at minimum 30-40 seconds apiece). So a thought I had - unless I'm totally doing it wrong - is to store the above query in a second table together with its results, check that "meta query" table first, and if the query was "cached", return the results without reading the main table (which won't be updated very often).
Is this a good method or are there potential problems I'm not seeing?
EDIT: changed to table, not db (duh).
Table structure is:
id | Year | Sex | Age_segment | Another_filter | Etc
Nothing more complicated than that and no joining anything else. There are keys on id, Year, Sex, and Age_segment right now.
Proper indexing is what is needed to speed up the query. Start by doing an "EXPLAIN" on the query and post the results here.
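For instance (with a placeholder state value, since the real filters vary):
EXPLAIN SELECT sum(case when Year = '2003' then 1 else 0 end) Year_2003
FROM mytable
WHERE State = 'some_state'
AND Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1);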
I would suggest the following to start off. It avoids the for loop and returns the data in one query. Not knowing the number of rows or the cardinality of each column, I suggest a composite index on State and Year (see the statement after the query below).
SELECT mytable.State, mytable.Year, count(*)
FROM mytable
WHERE Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
AND other_filters IN (etc, etc, etc)
GROUP BY mytable.State, mytable.Year
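The composite index suggested above would look something like this (the index name is just a suggestion):
ALTER TABLE mytable ADD INDEX idx_state_year (State, Year);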
The above query can be further optimised by checking the cardinality of some of the columns. Run the following to see a column's distinct values and how often each occurs, e.g. for Age_segment:
SELECT Age_segment, COUNT(*) FROM mytable GROUP BY Age_segment;
Pseudo code...
SELECT Year
, COUNT(*) total
FROM my_its_not_a_database_its_a_table
WHERE State = $state_id
AND Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
GROUP BY Year;

What is a better way to process data?

I have a table with around 10 million rows and 47 columns in Oracle. I do some processing on the rows before converting the data to JSON and sending it to the view layer. The processing is mostly SELECTs grouping by various columns; this SELECT is run 5 times, each time with a different set of grouping columns. This is taking a lot of time. Is there any way to speed up the process?
I was thinking about pumping the data from the table into a CSV file, processing that, and then converting the result to JSON to send it. Am I thinking in the right direction? Please help.
The 5 queries I use are below for better understanding.
select sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end)/count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime');
select column2, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end)/count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column2;
select column2, column3, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end)/count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column2, column3;
select column4, column3, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end)/count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column4, column3;
select column5, column4, column3, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end)/count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column5, column4, column3;
The result sets are combined into JSON and sent to the view layer.
EDIT1: There are going to be multiple connections (5-20) to this database, each executing these same queries.
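Since all five queries scan the same time range and differ only in their grouping columns, one option worth testing is computing them in a single pass with Oracle's GROUPING SETS. A hedged sketch, using the placeholder names from the question:
select column2, column3, column4, column5,
sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end)/count(*) as success_ratio
from tablename
where TIME_STAMP between 'startTime' and 'endTime'
group by grouping sets (
(),                          -- overall ratio
(column2),
(column2, column3),
(column4, column3),
(column5, column4, column3)
);
Rows from the different grouping sets can be told apart by the NULLs in the grouping columns, or more reliably with the GROUPING() function.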

Dividing count(*) sql statements to get ratios

I have a table with one column for a web site name and one column for an action that can be done on that website. So, one row will look like this -
site action
----- ------
Yahoo View
I am trying to find the ratio of one action to another. I know how I can do this for the entire table with the statement below, but I was wondering if there is a statement I could write that would return the ratios at the site level. That way I would get something like Yahoo 15%, Google 20%, etc., all listed out, so I wouldn't have to write a different statement for each site. Thanks
select (select count(*) from practice
where action='Like') / (select count(*)
from practice where action='View') from dual;
The keyword you need is group by. Not knowing your field and table names makes this a little hard, but something like this:
Select
sum(case when Action = 'like' then 1 else 0 end) as CountLike,
sum(case when Action = 'view' then 1 else 0 end) as CountView,
(sum(case when Action = 'like' then 1 else 0 end)/count(action)) as RatioLikeTotal,
(sum(case when Action = 'view' then 1 else 0 end)/count(action)) as RatioViewTotal,
Site
from
tblLinks
group by
Site
With the breakdown by site you can calculate the ratios in your app or against a total of all hits.
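If you specifically want the ratio of one action to another per site (e.g. likes to views), the same group by works; a sketch with NULLIF guarding against sites that have no views:
select Site,
sum(case when Action = 'like' then 1 else 0 end) /
nullif(sum(case when Action = 'view' then 1 else 0 end), 0) as RatioLikeToView
from tblLinks
group by Site;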