I'm trying to build a user network based on the call detail records in my CDR table.
To keep things simple, let's say I've got a CDR table:
CDRid
UserAId
UserBId
There are more than 100 million records, so the table is quite big.
I created a user2user table:
UserAId
UserBId
NumberOfConnections
Then, using a cursor, I iterate through each row in the table and run a select statement:
if the user2user table has a record where UserAId = UserAId from the CDR record and UserBId = UserBId from the CDR record, then increase NumberOfConnections;
otherwise insert such a row with NumberOfConnections = 1.
It's quite a simple task, and it works as described using the cursor, but performance is very bad (estimated time on my computer: ~60 h).
I've heard that SQL Server Integration Services has better performance when we are talking about tables this big.
The problem is that I have no idea how to build an SSIS package for such a task.
If anyone has any idea how to help me, or any good resources, I would be really thankful.
Maybe there is some other good solution to make it work faster. I have used indexes, table variables and so on, and performance is still poor.
Thanks for any help.
P.S.
This is the script I wrote; executing it takes something like 40-50 h.
DECLARE CDR_cursor CURSOR FOR
SELECT CDRId, SubscriberAId, BNumber
FROM dbo.CDR
OPEN CDR_cursor;
FETCH NEXT FROM CDR_cursor
INTO @CdrId, @SubscriberAId, @BNumber;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Here I check whether there is a user with this number. The CDR only has SubscriberAId
    -- and BNumber, so I need to look up which user BNumber belongs to. I only store users from
    -- my own network, so whenever I can't find the user I add one marked as outside the network.
    SELECT @UserBId = (SELECT UserID FROM dbo.Number WHERE Number = @BNumber);
    IF (@UserBId IS NULL)
    BEGIN
        INSERT INTO dbo.[User] (ID, Marked, InNetwork)
        VALUES (@OutUserId, 0, 0);
        INSERT INTO dbo.Number (Number, UserId)
        VALUES (@BNumber, @OutUserId);
        INSERT INTO dbo.User2User
        VALUES (@SubscriberAId, @OutUserId, 1);
        SET @OutUserId = @OutUserId - 1;
    END
    ELSE
    BEGIN
        UPDATE dbo.User2User
        SET NumberOfConnections = NumberOfConnections + 1
        WHERE User1ID = @SubscriberAId AND User2ID = @UserBId;
        -- Insert the row if the UPDATE statement affected nothing.
        IF (@@ROWCOUNT = 0)
        BEGIN
            INSERT INTO dbo.User2User
            VALUES (@SubscriberAId, @UserBId, 1);
        END
    END
    SET @Counter = @Counter + 1;
    IF ((@Counter % 100000) = 0)
    BEGIN
        PRINT CAST(@Counter AS NVARCHAR(12));
    END
    FETCH NEXT FROM CDR_cursor
    INTO @CdrId, @SubscriberAId, @BNumber;
END
CLOSE CDR_cursor;
DEALLOCATE CDR_cursor;
The thing about SSIS is that it probably won't be much faster than a cursor. It's pretty much doing the same thing: reading the table record by record, processing the record and then moving to the next one. There are some advanced techniques in SSIS like sharding the data input that will help if you have heavy duty hardware, but without that it's going to be pretty slow.
A better solution would be to write an INSERT and an UPDATE statement that will give you what you want. With that you'll be better able to take advantage of indices on the database. They would look something like:
WITH SummaryCDR (UserAId, UserBId, Conns) AS
(
SELECT UserAId, UserBId, COUNT(1) FROM CDR
GROUP BY UserAId, UserBId)
UPDATE user2user
SET NumberOfConnections = NumberOfConnections + SummaryCDR.Conns
FROM SummaryCDR
WHERE SummaryCDR.UserAId = user2user.UserAId
AND SummaryCDR.UserBId = user2user.UserBId
INSERT INTO user2user (UserAId, UserBId, NumberOfConnections)
SELECT CDR.UserAId, CDR.UserBId, Count(1)
FROM CDR
LEFT OUTER JOIN user2user
ON user2user.UserAId = CDR.UserAId
AND user2user.UserBId = CDR.UserBId
WHERE user2user.UserAId IS NULL
GROUP BY CDR.UserAId, CDR.UserBId
(NB: I don't have time to test this code, you'll have to debug it yourself)
Is this what you need?
select
UserAId, UserBId, count(CDRid) as count_connections
from cdr
group by UserAId, UserBId
Could you break the conditional update/insert into two separate statements and get rid of the cursor?
Do the INSERT for all the NULL rows and the UPDATE for all the NOT NULL rows.
Why are you even considering doing row-by-row processing on a table that size? You know you can use the MERGE statement to insert or update, and it will be faster. Or you could write one set-based UPDATE for all the rows that need updating and one set-based INSERT for all the rows that don't exist yet.
Stop using the VALUES clause and use an INSERT with joins instead. Same thing with updates. If you need extra complexity, a CASE statement will probably give you all you need.
In general, stop thinking in terms of row-by-row processing. If you can write a select for the cursor, you can write a set-based statement to do the work 99.9% of the time.
You may still want a cursor with a table this large, but one that processes batches of data (for instance, 1000 records at a time), not one that runs row by row.
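For illustration, here is a minimal MERGE sketch using the table and column names from the question (untested, so treat it as a starting point rather than a drop-in solution):
;WITH SummaryCDR AS (
    -- Aggregate the 100M-row CDR table once instead of touching it row by row.
    SELECT UserAId, UserBId, COUNT(*) AS Conns
    FROM dbo.CDR
    GROUP BY UserAId, UserBId
)
MERGE dbo.User2User AS target
USING SummaryCDR AS source
    ON target.UserAId = source.UserAId
   AND target.UserBId = source.UserBId
WHEN MATCHED THEN
    UPDATE SET NumberOfConnections = target.NumberOfConnections + source.Conns
WHEN NOT MATCHED THEN
    INSERT (UserAId, UserBId, NumberOfConnections)
    VALUES (source.UserAId, source.UserBId, source.Conns);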
Related
Hello guys,
I have an issue with a simple query. Here is the code:
UPDATE user_resources AS ures
LEFT JOIN user_buildings as ub
ON ub.city_id = ures.city_id
INNER JOIN building_consumption AS bcons
ON bcons.resource_id = ures.resource_id
SET ures.quantity = ures.quantity - abs(FORMULA HERE that requires
building level and consumption at lvl 1 [default])
WHERE
(SELECT COUNT(id) FROM building_consumption AS bc2
WHERE bc2.building_id=ub.building_id) =
(SELECT COUNT(bc3.id) FROM building_consumption AS bc3
LEFT JOIN tmp_user_resources AS ures
ON ures.resource_id = bc3.resource_id
WHERE ures.city_id = ub.city_id
AND bc3.building_id=ub.building_id
AND bc3.quantity>0
AND IFNULL(ures.quantity, 0) - abs(FORMULA AGAIN);
I'll try to explain a bit.
As you can imagine, this is for a game.
Users (players) can have different buildings in different cities.
tab user_buildings
|id, city_id, buildings_id, level, usage|
A building can produce different resources
tab building_production
|id, building_id, resource_id, quantity_h|
but it can consume some resources too:
tab building_consumption
|id, building_id, resource_id, quantity_h|
Obviously a building cannot produce if there are not enough resources for it to consume.
That's why I'm using the WHERE SELECT COUNT comparison: how many resources the building has to consume versus how many it can actually consume.
MySQL does NOT allow a subquery that selects from the same table being updated inside an UPDATE statement.
Using a cursor + loop is much too slow; I would prefer a different solution.
A temp table could be a solution, but my problem then is how to update the temp table without firing triggers (UPDATE + SELECT fires triggers, and to avoid endless loops MySQL blocks the query; and I can't pause/resume triggers because
IF ((#TRIGGER_CHECKS = FALSE)
OR (#TRIGGER_BEFORE_INSERT_CHECKS = FALSE))
AND (USER() = 'root@localhost')
THEN
LEAVE thisTrigger;
END IF;
is inside the trigger itself).
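For reference, the derived-table workaround I have seen mentioned would look roughly like this (untested, and FORMULA stands for the real expression):
UPDATE user_resources AS ures
JOIN (
    -- MySQL materializes this derived table, so referencing the
    -- same table that is being updated becomes legal here.
    SELECT city_id, resource_id, quantity
    FROM user_resources
) AS snapshot
    ON snapshot.city_id = ures.city_id
   AND snapshot.resource_id = ures.resource_id
SET ures.quantity = ures.quantity - ABS(FORMULA);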
I am open to all your suggestions!
Thanks
P.S. The code must be inside a scheduled event.
I'm having an issue with my project: I need to insert an auto-incrementing value into my MySQL view. It would be nice if you guys could help me solve this obstacle. Here is the code, into which I want an auto-incrementing serial number (say S.No) inserted as the first column.
CREATE
ALGORITHM = UNDEFINED
DEFINER = `srems_admin`@`localhost`
SQL SECURITY DEFINER
VIEW `emp_elec_consumption_view` AS
SELECT
`t1`.`PFNUMBER` AS `PFNUMBER`,
`emp`.`EMPNAME` AS `EMPNAME`,
`t1`.`MonthAndYear` AS `MonthAndYear`,
`qt`.`QTRSCODE` AS `QTRSCODE`,
`t1`.`UNITS_CONSUMED` AS `UNITS_CONSUMED`,
(`t2`.`FIXED_COMPONENT` + (`t1`.`UNITS_CONSUMED` * `t2`.`RATE_COMPONENT`)) AS `Amount`
FROM
(((`srems`.`mstqtroccu` `qt`
JOIN `srems`.`mstemp` `emp`)
JOIN `srems`.`msttariffrate` `t2`)
JOIN (SELECT
`srems`.`tranmeterreading`.`PFNUMBER` AS `PFNUMBER`,
(`srems`.`tranmeterreading`.`CLOSINGREADING` - `srems`.`tranmeterreading`.`OPENINGREADING`) AS `UNITS_CONSUMED`,
CONCAT(CONVERT( IF((LENGTH(MONTH(`srems`.`tranmeterreading`.`READINGDATE`)) > 1), MONTH(`srems`.`tranmeterreading`.`READINGDATE`), CONCAT('0', MONTH(`srems`.`tranmeterreading`.`READINGDATE`))) USING UTF8), '/', RIGHT(YEAR(`srems`.`tranmeterreading`.`READINGDATE`), 2)) AS `MonthAndYear`,
(SELECT
`t`.`TRANSACTIONID`
FROM
`srems`.`msttariffrate` `t`
WHERE
(`t`.`TORANGE` > (`srems`.`tranmeterreading`.`CLOSINGREADING` - `srems`.`tranmeterreading`.`OPENINGREADING`))
LIMIT 1) AS `tariffplanid`
FROM
`srems`.`tranmeterreading`) `t1`)
WHERE
((`t1`.`tariffplanid` = `t2`.`TRANSACTIONID`)
AND (`t1`.`PFNUMBER` = `qt`.`PFNUMBER`)
AND (`t1`.`PFNUMBER` = `emp`.`PFNUMBER`))
Please show what to insert, and where, so that S.No auto-increments starting from 1 and appears as the first column. Thanks in advance.
Your view has no chance of working in MySQL as written, so you might as well give up.
MySQL does not allow subqueries in the FROM clause of a view definition, and your query is pretty complicated, with lots of subqueries.
A view also cannot use user-defined variables, so getting a row number is rather complicated.
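The usual workaround is to number the rows when selecting from the view rather than inside it. A minimal sketch, assuming the view itself has first been rewritten into a form MySQL accepts:
SET @sno := 0;
SELECT (@sno := @sno + 1) AS S_No, v.*
FROM emp_elec_consumption_view AS v;
On MySQL 8.0+ you could use ROW_NUMBER() OVER () instead of the user variable.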
Does calling the Laravel increment() on an Eloquent model lock the row?
For example:
$userPoints = UsersPoints::where('user_id','=',\Auth::id())->first();
if(isset($userPoints)) {
$userPoints->increment('points', 5);
}
If this is called from two different locations in a race condition, will the second call override the first increment and we still end up with only 5 points? Or will they add up and we end up with 10 points?
To answer this (helpful for future readers): the problem you are asking about depends on database configuration.
Most MySQL storage engines (MyISAM, InnoDB, etc.) use locking when inserting, updating, or altering the table unless this feature is explicitly turned off. (This is the only correct and understandable implementation for most cases anyway.)
So you can feel comfortable with what you've got, because it will work correctly with any number of concurrent calls:
-- this is roughly what the laravel query builder translates to
UPDATE users SET points = points + 5 WHERE user_id = 1
and calling this twice with a starting value of zero will end up with 10.
The answer is actually a tiny bit different for the specific case with ->increment() in Laravel:
If one would call $user->increment('credits', 1), the following query will be executed:
UPDATE `users`
SET `credits` = `credits` + 1
WHERE `id` = 2
This means that the query can be regarded as atomic, since the actual credits amount is retrieved in the query, and not retrieved using a separate SELECT.
So you can execute this query without running any DB::transaction() wrappers or lockForUpdate() calls because it will always increment it correctly.
To show what can go wrong, a BAD query would look like this:
# Assume this retrieves "5" as the amount of credits:
SELECT `credits` FROM `users` WHERE `id` = 2;
# Now, execute the UPDATE statement separately:
UPDATE `users`
SET `credits` = 5 + 1, `users`.`updated_at` = '2022-04-15 23:54:52'
WHERE `id` = 2;
Or in a Laravel equivalent (DON'T DO THIS):
$user = User::find(2);
// $user->credits will be 5.
$user->update([
// Shown as "5 + 1" in the query above, but it would be just "6" of course.
'credits' => $user->credits + 1
]);
Now, THIS can easily go wrong, since you are 'assigning' the credits value, which depends on when the SELECT statement took place. So two queries could update the credits to the same value even though the intention was to increment twice. However, you CAN correct this Laravel code the following way:
DB::transaction(function() {
$user = User::query()->lockForUpdate()->find(2);
$user->update([
'credits' => $user->credits + 1,
]);
});
Now, since the two queries are wrapped in a transaction and the user record with id 2 is locked using lockForUpdate() (which issues a SELECT ... FOR UPDATE and takes an exclusive row lock), any second (or third, or n-th) instance of this transaction running in parallel cannot read the row for update until the locking transaction completes.
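For reference, the transaction above corresponds roughly to this SQL (a sketch; the literal 6 assumes credits was 5 when the row was read):
START TRANSACTION;
SELECT * FROM users WHERE id = 2 FOR UPDATE; -- blocks concurrent FOR UPDATE reads of this row
UPDATE users SET credits = 6 WHERE id = 2;   -- 5 + 1, computed in PHP
COMMIT;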
I have the following query, written inside a perl script:
insert into #temp_table
select distinct bv.port,bv.sip,avg(bv.bv) bv, isnull(avg(bv.book_sum),0) book_sum,
avg(bv.book_tot) book_tot,
check_null = case when bv.book_sum is null then 0 else 1 end
from table_bv bv, table_group pge, table_master sm
where pge.a_p_g = '$val'
and pge.p_c = bv.port
and bv.r = '$r'
and bv.effective_date = '$date'
and sm.sip = bv.sip
query continued -- need help below (can someone help me make this efficient or rewrite it? I think it's wrong)
and ((sm.s_g = 'FE')OR(sm.s_g='CH')OR(sm.s_g='FX')
OR(sm.s_g='SH')OR(sm.s_g='FD')OR(sm.s_g='EY')
OR ((sm.s_t = 'TA' OR sm.s_t='ON')))
query continued below
group by bv.port,bv.sip
query ends
Explanation: some $val that contain sip with
s_g ('FE','CH','FX','SH','FD','EY') and
s_t ('TA','ON') have book_sum as null. The temp_table does not take null values,
hence I am inserting them as zero ( isnull(avg(bv.book_sum),0) ) wherever a null is encountered, for the above s_g and s_t ONLY.
I have tried making the query as follows, but it made my script stop working:
and sm.s_g in ('FE', 'CH','FX','SH','FD','EY')
or sm.s_t in ('TA','ON')
I know this should be a comment, but I don't have the rep. To me, it looks like it's hanging because you lost your grouping at the end. I think it should be:
and (
sm.s_g in ('FE', 'CH','FX','SH','FD','EY')
or
sm.s_t in ('TA','ON')
)
Note the parentheses. Otherwise, you're asking for all of the earlier conditions, OR that sm.s_t is one of 'TA' or 'ON', which is a much larger set than you're anticipating and may cause the query to spin.
What I have is a table with a bunch of products (books, in this case). My point-of-sale system generates me a report that has the ISBN (unique product number) and perpetual sales.
I basically need to do an update that matches the ISBN from one table with the ISBN from the other and then add the sales from the one table to the other.
This needs to be done for about 30,000 products.
Here is the SQL statement that I am using:
UPDATE `inventory`,`sales`
SET `inventory`.`numbersold` = `sales`.`numbersold`
WHERE `inventory`.`isbn` = `sales`.`isbn`;
I am getting MySQL Error:
#1317 SQLSTATE: 70100 (ER_QUERY_INTERRUPTED) Query execution was interrupted
I am using phpMyAdmin provided by GoDaddy.com
I've probably come to this a bit late, but... It certainly looks like the query is being interrupted by an execution time limit. There may be no easy way around this, but here's a couple of ideas:
Make sure that inventory.isbn and sales.isbn are indexed. If they aren't, adding an index will reduce your execution time dramatically (a sketch follows the second idea below).
If that doesn't work, break the query down into blocks and run it several times:
UPDATE `inventory`,`sales`
SET `inventory`.`numbersold` = `sales`.`numbersold`
WHERE `inventory`.`isbn` = `sales`.`isbn`
AND substring(`inventory`.`isbn`, 1, 1) = '1';
The AND clause restricts the search to ISBNs starting with the digit 1. Run the query for each digit from '0' to '9'. For ISBNs you might find that selecting on the last character gives better results; use substring(`inventory`.`isbn`, -1).
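As promised, a sketch of the index creation for the first idea (the index names here are my own choice, not from the question):
CREATE INDEX idx_inventory_isbn ON `inventory` (`isbn`);
CREATE INDEX idx_sales_isbn ON `sales` (`isbn`);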
Try using an INNER JOIN between the two tables, like this:
UPDATE `inventory`
INNER JOIN `sales`
ON `inventory`.`isbn` = `sales`.`isbn`
SET `inventory`.`numbersold` = `sales`.`numbersold`
UPDATE inventory,sales
SET inventory.numbersold = sales.numbersold
WHERE inventory.isbn = sales.isbn
AND inventory.id < 5000
UPDATE inventory,sales
SET inventory.numbersold = sales.numbersold
WHERE inventory.isbn = sales.isbn
AND inventory.id >= 5000 AND inventory.id < 10000
...
If the error recurs, you can try reducing the batch size to 1000, for example.