SQL locking of table in an inner join - mysql

Every day, I run a SQL statement to set an estimate of an aggregated value. This is a MySQL server and the code looks like this:
UPDATE users
INNER JOIN (
SELECT user_id, COUNT(*) AS action_count
FROM action_log
GROUP BY user_id
) AS action_log
ON users.id = action_log.user_id
SET users.action_count = action_log.action_count
As my database has grown, this is taking longer to run and it seems to be affecting other queries. As the code stands now, is there a lock held on my action_log table for the entirety of the update?
Since this is an estimate and doesn't need to be totally accurate, I'm considering splitting this up into multiple SQL statements: one that does the select to get the aggregated counts per user, and then single updates for each user row.
I'm hoping that would help. Running EXPLAIN on this query didn't give me much info and I'm not sure if breaking things up in this fashion will actually help.

Yep, it's probably going to hold onto that lock. If you can separate out the SELECT your problem will go away, something like this:
CREATE TEMPORARY TABLE the_counts (
    user_id INT PRIMARY KEY,
    action_count INT
);

INSERT INTO the_counts
SELECT user_id, COUNT(*) AS action_count
FROM action_log
GROUP BY user_id;

UPDATE users
INNER JOIN the_counts ON users.id = the_counts.user_id
SET users.action_count = the_counts.action_count;

DROP TEMPORARY TABLE the_counts;
You could also dump the results into a permanent staging table that you truncate on each run, but I like using a temporary table since it cleans itself up. Another option is to play with the transaction isolation level (such as READ UNCOMMITTED, the rough equivalent of SQL Server's NOLOCK hint), but I wouldn't suggest it; that can get you into all kinds of other trouble.
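If you do end up splitting the write into multiple statements, as you suggested, here is a minimal sketch of a batched variant (the id range is illustrative and assumes users.id is numeric):
UPDATE users
INNER JOIN (
    SELECT user_id, COUNT(*) AS action_count
    FROM action_log
    WHERE user_id BETWEEN 1 AND 10000  -- illustrative batch bounds
    GROUP BY user_id
) AS action_log ON users.id = action_log.user_id
SET users.action_count = action_log.action_count;
-- repeat with the next id range (10001-20000, and so on)
Each statement then touches fewer rows and holds its locks for a shorter time.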

SQL Temporary Table or Select

I've got a problem with a MySQL select statement.
I have a table with different departments and statuses. There are 4 statuses for every department, but in a given month not every status necessarily occurs, and I would like the analytics graph to show '0' for the missing ones.
My problem is that the select statement shows only the statuses that actually exist (of course :D).
Is it possible to create a temporary table with all of the departments and statuses, with the count of each status set to 0, and then update it with values from another select?
Here is the select statement (screenshots showed how it looks in the perfect situation and in the bad situation):
SELECT utd.Departament,uts.statusDef as statusoforder,Count(uts.statusDef) as Ilosc_Statusow
FROM ur_tasks_details utd
INNER JOIN ur_tasks_status uts on utd.StatusOfOrder = uts.statusNR
WHERE month = 'Sierpien'
GROUP BY uts.statusDef,utd.Departament
I've tried with "union" statements, but I don't know if there is a way to take only "the highest value" for every department.
I've also heard about CTEs (WITH queries), but I don't really get how to use them. Would love to get some tips on it!
Thanks for your help.
Use a cross join to generate the rows you want. Then use a left join and aggregation to bring in the data:
select d.Departament, uts.statusDef as statusoforder,
       count(utd.StatusOfOrder) as Ilosc_Statusow
from (select distinct utd.Departament
      from ur_tasks_details utd
     ) d cross join
     ur_tasks_status uts left join
     ur_tasks_details utd
     on utd.Departament = d.Departament and
        utd.StatusOfOrder = uts.statusNR and
        utd.month = 'Sierpien'
group by uts.statusDef, d.Departament;
The first subquery is your source of all the departments. Note that the count is taken on a column from the left-joined table (utd.StatusOfOrder), so department/status combinations with no matching detail rows come out as 0 rather than 1. I also suspect that month is in the details table, so that condition belongs in the on clause.
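Since you asked about CTEs: on MySQL 8.0+ the same query can be phrased with WITH, which simply gives the departments subquery a name. This is purely a readability rewrite of the query above, not a different algorithm:
with depts as (
    select distinct Departament
    from ur_tasks_details
)
select d.Departament, uts.statusDef as statusoforder,
       count(utd.StatusOfOrder) as Ilosc_Statusow
from depts d
cross join ur_tasks_status uts
left join ur_tasks_details utd
       on utd.Departament = d.Departament
      and utd.StatusOfOrder = uts.statusNR
      and utd.month = 'Sierpien'
group by uts.statusDef, d.Departament;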

Is CROSS APPLY bad for a larger database, or do alternatives perform better?

So, two (really three) questions: is my query just badly coded or badly thought out? (Be kind, I only just discovered CROSS APPLY and I'm relatively new.) And is CROSS APPLY even the best sort of join to be using here, or why is it slow?
I have a database table (Test_Table) of around 66 million records. I also have a global temp table (##MyTempTable) with one column, Order_nbr (nchar(13)). These are the orders I wish to find.
Test_Table has 4 columns (Order_nbr, destination, shelf_no, dte_bought).
This is my current query, which works exactly the way I want it to, but its performance seems quite slow:
select ##MyTempTable.Order_nbr, test_table1.destination, test_table1.shelf_no, test_table1.dte_bought
from ##MyTempTable
cross apply (
    select top 1 Test_Table.destination, Test_Table.shelf_no, Test_Table.dte_bought
    from Test_Table
    where ##MyTempTable.Order_nbr = Test_Table.Order_nbr
    order by dte_bought desc
) test_table1
If ##MyTempTable has only 17 orders to search for, it takes around 2 minutes. As you can see, I'm trying to get just the most recent dte_bought (some max(dte_bought)) for each order.
In terms of indexes, I ran the Database Engine Tuning Advisor and it says the table is optimized for the query; I have all the relevant indexes created, such as a clustered index on Test_Table on dte_bought desc including Order_nbr, etc.
The execution plan uses an index scan (on a nonclustered index) and a key lookup (on the clustered index).
The end result should be all the Order_nbrs in ##MyTempTable along with the destination, shelf_no, and dte_bought for each Order_nbr, but only the most recently bought ones.
Sorry if I explained this awfully; any info you need, just ask. I'm not asking for a downright "give me code" answer, more for guidance, advice, and learning. Thank you in advance.
UPDATE
I have now tried a sort of left join. It runs reasonably quicker, but is still not fast (about 30 seconds), and it also doesn't return just the most recent dte_bought. Any ideas? See below for the left join code.
select a.Order_nbr, b.destination, b.shelf_no, b.dte_bought
from ##MyTempTable a
left join Test_Table b
on a.Order_nbr = b.Order_nbr
where b.destination is not null
UPDATE 2
Attempted another left join with a max(dte_bought). It runs quickly, but it only returns the Order_nbr; the other columns are NULL. Any suggestions?
select a.Order_nbr, b.Destination, b.Shelf_no, b.Dte_bought
from ##MyTempTable a
left join (
    select * from Test_Table
    where Dte_bought = (select max(dte_bought) from Test_Table)
) b on b.Order_nbr = a.Order_nbr
order by Dte_bought asc
K.M
Instead of CROSS APPLY you can use an INNER JOIN with a subquery. Check the following query:
SELECT
    TempT.Order_nbr
    ,TestT.destination
    ,TestT.shelf_no
    ,TestT.dte_bought
FROM ##MyTempTable TempT
INNER JOIN (
    SELECT T.Order_nbr
        ,T.destination
        ,T.shelf_no
        ,T.dte_bought
        ,ROW_NUMBER() OVER (PARTITION BY T.Order_nbr ORDER BY T.dte_bought DESC) ID
    FROM Test_Table T
) TestT
ON TestT.ID = 1 AND TempT.Order_nbr = TestT.Order_nbr
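If this is still slow, a covering index tailored to the window function usually helps. A sketch (the index name is made up; adjust to your conventions):
CREATE NONCLUSTERED INDEX IX_TestTable_Order_Date
ON Test_Table (Order_nbr, dte_bought DESC)
INCLUDE (destination, shelf_no);
With that index, the ROW_NUMBER() can be computed from an ordered index scan without key lookups.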

SQL not exists to find some issues

I'm practicing some SQL (new to this).
I have the following tables:
screening_occapancy(idscreening, row, col, idclient)
screening(screeningid, idmovie, idtheater, screening_time)
I'm trying to create a query to find which clients watched all the movies in the "screening" table and show their ID (idclient).
This is what I've written (which doesn't work):
select idclient from screening_occapancy p where not exists
(select screeningid from screening where screeningid=p.idscreening)
I know it's probably not that good, so please also try to explain what I'm doing wrong.
P.S. My mission is to use not/exists while doing it...
Thanks!
Your query is syntactically fine, although selecting a real column in the subquery is unnecessary:
select p.idclient
from screening_occapancy p
where not exists (select 1
from screening s
where s.screeningid = p.idscreening
);
Notes:
You can select anything in the exists subquery. Selecting a column is misleading.
Use table aliases and use them for all column references, particularly in a correlated subquery.
If you are designing the tables, I would advise you to give the primary key and foreign key the same name (screeningid or idscreening, but not both).
EDIT:
If you want clients who attended all screenings, then I would approach this as:
select p.idclient
from screening_occapancy p
group by p.idclient
having count(distinct p.idscreening) = (select count(*) from screening);
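And since your mission is to use not exists, the same "attended every screening" requirement can be written as a double NOT EXISTS (the classic relational-division pattern): keep a client if there is no screening for which no attendance row of theirs exists.
select distinct p.idclient
from screening_occapancy p
where not exists (select 1
                  from screening s
                  where not exists (select 1
                                    from screening_occapancy p2
                                    where p2.idclient = p.idclient and
                                          p2.idscreening = s.screeningid
                                   )
                 );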
Why don't you count the number of movies in the screening table, load it into a variable, and check the results of your query against the variable?
Load the number of movies (identified by idmovie) into a variable:
SELECT COUNT(DISTINCT idmovie) INTO @number_of_movies FROM screening;
check the results of your query against the variable:
SELECT A.idclient,
       COUNT(DISTINCT B.idmovie) AS number_of_movies_watched
FROM screening_occapancy A
INNER JOIN screening B
ON (A.idscreening = B.screeningid)
GROUP BY A.idclient
HAVING number_of_movies_watched = @number_of_movies;
If you want to find all clients that attended all screenings, replace idmovie with screeningid.
Even someone relatively new to MySQL can get their head around this query. The "not exists" approach is more difficult to understand.

Lazily evaluate MySQL view

I have some MySQL views which define a number of extra columns based on some relatively straightforward subqueries. The database is also multi-tenanted so each row has a company ID against it.
The problem I have is that my views are evaluated for every row before being filtered by the company ID, giving huge performance issues. Is there any way to lazily evaluate the view so that the 'where' clause in the outer query applies to the subqueries in the view? Or is there something similar to views that I can use to add the extra fields? I want to calculate them in SQL so the calculated fields can be used for filtering/searching/sorting/pagination.
I've taken a look at the MySQL docs that explain the algorithms available and am aware that the views can't be processed as a 'merge' since they contain subqueries.
view
create view companies_view as
select *,
(
    select count(id) from company_user where company_user.company_id = companies.id
) as user_count,
(
    select count(company_user.user_id)
    from company_user join users on company_user.user_id = users.id
    where company_user.company_id = companies.id
    and users.active = 1
) as active_user_count,
(
    select count(company_user.user_id)
    from company_user join users on company_user.user_id = users.id
    where company_user.company_id = companies.id
    and users.active = 0
) as inactive_user_count
from companies;
query
select * from companies_view where company_id = 123;
I want the subqueries in the view to be evaluated AFTER applying the 'where company_id = 123' from the main query scope. I can't hard code the company ID in the view since I want the view to be usable for any company ID.
You cannot change the order of evaluation; that is decided by the MySQL server.
However, in this particular case you could rewrite the whole sql statement to use joins and conditional counts instead of subqueries:
select c.*,
       count(u.id) as user_count,
       count(if(u.active = 1, 1, null)) as active_user_count,
       count(if(u.active = 0, 1, null)) as inactive_user_count
from companies c
left join company_user cu on c.id = cu.company_id
left join users u on cu.user_id = u.id
group by c.id, ...
If you have MySQL v5.7, then you may not need to add any further fields to the group by clause, since the other fields in the companies table are functionally dependent on the primary key. In earlier versions you may have to list all fields of the companies table (it depends on the sql mode settings).
Another way to optimise such a query would be denormalisation. Your users and company_user tables probably have a lot more records than your companies table. You could add user_count, active_user_count, and inactive_user_count fields to the companies table, add after insert / update / delete triggers to the company_user table and an after update trigger to the users table, and maintain the three counts there. This way you would not need the joins and conditional counts in the view.
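As a sketch, one of those triggers could look like this (the trigger name is made up; the delete and update triggers follow the same pattern):
CREATE TRIGGER company_user_after_insert
AFTER INSERT ON company_user
FOR EACH ROW
UPDATE companies
SET user_count = user_count + 1
WHERE id = NEW.company_id;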
It is possible to convince the optimizer to handle a view with scalar subqueries using the MERGE algorithm... you just have to beat the optimizer at its own game.
This will seem quite unorthodox to some, but it is a pattern I use with success in cases where this is needed.
Create a stored function to encapsulate each subquery, then reference the stored function in the view. The optimizer remains blissfully unaware that the functions will invoke the subqueries.
CREATE FUNCTION user_count (_cid INT) RETURNS INT
DETERMINISTIC
READS SQL DATA
RETURN (SELECT count(id) FROM company_user WHERE company_user.company_id = _cid);
Note that a stored function with a single statement does not need BEGIN/END or a change of DELIMITER.
Then in the view, replace the subquery with:
user_count(id) AS user_count,
And repeat the process for each subquery.
The optimizer will then process the view as a MERGE view, select the one appropriate row from the companies table based on the outer WHERE, invoke the functions, and... problem solved.
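Putting it together, the rewritten view might look like this, assuming active_user_count() and inactive_user_count() functions defined along the same lines as user_count() above:
CREATE VIEW companies_view AS
SELECT companies.*,
    user_count(id) AS user_count,
    active_user_count(id) AS active_user_count,
    inactive_user_count(id) AS inactive_user_count
FROM companies;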

Join on 3 tables insanely slow on giant tables

I have a query which goes like this:
SELECT insanlyBigTable.description_short,
insanlyBigTable.id AS insanlyBigTable,
insanlyBigTable.type AS insanlyBigTableLol,
catalogpartner.id AS catalogpartner_id
FROM insanlyBigTable
INNER JOIN smallerTable ON smallerTable.id = insanlyBigTable.catalog_id
INNER JOIN smallerTable1 ON smallerTable1.catalog_id = smallerTable.id
AND smallerTable1.buyer_id = 'xxx'
WHERE smallerTable1.cont = 'Y' AND insanlyBigTable.type IN ('111','222','33')
GROUP BY smallerTable.id;
Now, when I run the query the first time, it copies the giant table into a temp table... I want to know how I can prevent that. I am considering a nested query, or even reversing the join (I'm not sure it would actually run faster), but that is, well, not nice. Any other suggestions?
To figure out how to optimize your query, we first have to boil down exactly what it is selecting so that we can preserve that information while we change things around.
What your query does
So, it looks like we need the following:
The GROUP BY clause limits the results to at most one row per catalog_id.
smallerTable1.cont = 'Y', insanlyBigTable.type IN ('111','222','33'), and buyer_id = 'xxx' appear to be the filters on the query.
And we want data from insanlyBigTable and ... catalogpartner? I would guess that catalogpartner is smallerTable1, due to the id of smallerTable being linked to the catalog_id of the other tables.
I'm not sure what the purpose of including the buyer_id filter in the ON clause was, but unless you tell me differently, I'll assume its placement there is unimportant.
The point of the query
I am unsure about the intent of the query, based on that GROUP BY clause. You will obtain just one row per catalog_id from insanlyBigTable, but you don't appear to care which row it is. Indeed, the fact that you can run this query at all is due to a special non-standard feature in MySQL that lets you SELECT columns that do not appear in the GROUP BY clause... however, you don't get to choose which rows those values come from. This means you could have information from 4 different rows combined into each of your result rows.
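To illustrate the extension described above (with ONLY_FULL_GROUP_BY disabled, as in older MySQL defaults), a query like this runs, but description_short is taken from an arbitrary row within each catalog_id group:
SELECT catalog_id, description_short
FROM insanlyBigTable
GROUP BY catalog_id;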
My best guess, based on column names, is that you are trying to bring back a list of items that are in the same catalog as something that was purchased by a given buyer, but without any more than one item per catalog. In addition, you want something to connect back to the purchased item in that catalog, via the catalogpartner table's id.
So, something probably akin to amazon's "You may like these items because you purchased these other items" feature.
The new query
We want 1 row per insanlyBigTable.catalog_id, based on which catalog_id exists in smallerTable1, after filtering.
SELECT
    ibt.description_short,
    ibt.id AS insanlyBigTable,
    ibt.type AS insanlyBigTableLol,
    (
        SELECT st.id FROM smallerTable1 st
        WHERE st.buyer_id = 'xxx'
          AND st.cont = 'Y'
          AND st.catalog_id = ibt.catalog_id
        LIMIT 1
    ) AS catalogpartner_id
FROM insanlyBigTable ibt
WHERE ibt.id IN (
    SELECT (
        SELECT ibt2.id
        FROM insanlyBigTable ibt2
        WHERE ibt2.catalog_id = sti.catalog_id
          AND ibt2.type IN ('111','222','33')
        LIMIT 1
    ) AS ibt_id
    FROM (
        SELECT DISTINCT st.catalog_id FROM smallerTable1 st
        WHERE st.buyer_id = 'xxx'
          AND st.cont = 'Y'
          AND EXISTS (
              SELECT * FROM insanlyBigTable ibt3
              WHERE ibt3.type IN ('111','222','33')
                AND ibt3.catalog_id = st.catalog_id
          )
    ) AS sti
)
This query should generate the same result as your original query, but it breaks things down into smaller queries to avoid the use (and abuse) of the GROUP BY clause on the insanlyBigTable.
Give it a try and let me know if you run into problems.