I'm using PowerShell v4 on W2K12 R2 (fully patched) to insert a large number (100+ million) of records into a MySQL database. I've run into a bit of a problem where memory usage keeps growing and growing despite aggressively removing variables and garbage collecting. Note that the memory usage is growing on the box that I'm running the script on, not the DB server.
The insertion speed is good and the job runs fine. However, I have a memory leak and have been beating my head against a wall for a week trying to figure out why. I know from testing that the memory accumulates when calling the MySQL portion of the script and not anywhere else.
I've noticed that after every insertion the memory grows by anywhere between 1 MB and 15 MB.
Here is the basic flow of the process (code at the bottom).
- Records are added to an array until there are 1,000 records in the array.
- Once there are a thousand records, they are inserted, as a batch, into the DB.
- The array is then emptied using the .Clear() method (I've verified that 0 records remain in the array).
- I've tried aggressively garbage collecting after every insert (no luck there).
- I've also tried removing variables and then garbage collecting, roughly as sketched below. Still no luck.
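The cleanup I tried after each insert looked roughly like this (reconstructed from memory, not the exact code):
Remove-Variable -Name query, command -ErrorAction SilentlyContinue   # drop references to per-insert variables
[System.GC]::Collect()                                               # force a full garbage collection
[System.GC]::WaitForPendingFinalizers()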
The code below is simplified for the sake of brevity. But, it shows how I'm iterating over the records and doing the insert:
$reader = [IO.File]::OpenText($filetoread)
$lineCount = 1
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()   # read the next record
    if ($lineCount -ge 1000 -or $reader.Peek() -lt 0) {
        insert_into_db
        $lineCount = 0
    }
    $lineCount++
}
$reader.Close()
$reader.Dispose()
One call to establish the connection:
[void][system.reflection.Assembly]::LoadFrom("C:\Program Files (x86)\MySQL\MySQL Connector Net 6.8.3\Assemblies\v4.5\MySql.Data.dll")
$connection = New-Object MySql.Data.MySqlClient.MySqlConnection($connectionString)
And here is the call to MySQL to do the actual inserts for each 1,000 records:
function insert_into_db {
    $command = $connection.CreateCommand()            # Create command object
    $command.CommandText = $query                     # Load query into object
    $script:RowsInserted = $command.ExecuteNonQuery() # Execute command
    $command.Dispose()                                # Dispose of command object
    $command = $null
    $query = $null
}
If anyone has any ideas or suggestions I'm all ears!
Thanks,
Jeremy
My initial conclusion that the problem was related to the PowerShell -join operator appears to be wrong.
Here is what I was doing. Note that I'm adding each line to an array, which I un-roll later when I form my SQL. (On a side note, adding items to an array tends to be more performant than concatenating strings.)
$dataForInsertion = New-Object System.Collections.Generic.List[String]
$reader = [IO.File]::OpenText($filetoread)
$lineCount = 1
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    $dataForInsertion.Add($line)
    if ($lineCount -ge 1000 -or $reader.Peek() -lt 0) {
        insert_into_db -insertthis $dataForInsertion
        $dataForInsertion.Clear()   # empty the batch once it has been inserted
        $lineCount = 0
    }
    $lineCount++
}
$reader.Close()
$reader.Dispose()
Calling the insert function:
sql_query -query "SET autocommit=0;INSERT INTO ``$table`` ($columns) VALUES $($dataForInsertion -join ',');COMMIT;"
The improved insert function now looks like this:
function insert_into_db {
    $command.CommandText = $query                     # Load query into object
    $script:RowsInserted = $command.ExecuteNonQuery() # Execute command
    $command.Dispose()                                # Dispose of command object
    $query = $null
}
So, it turns out my initial conclusion about the source of the problem was wrong. The PowerShell -join operator had nothing to do with the issue.
In my SQL insert function I was repeatedly calling $connection.CreateCommand() on every insert. Once I moved that into the function that handles setting up the connection (which is only called once, or when needed), the memory leak disappeared.
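A rough sketch of that arrangement (simplified, not the exact code from my script): the command object is created once, right next to the connection, and every batch just reuses it:
function open_db_connection {
    [void][System.Reflection.Assembly]::LoadFrom("C:\Program Files (x86)\MySQL\MySQL Connector Net 6.8.3\Assemblies\v4.5\MySql.Data.dll")
    $script:connection = New-Object MySql.Data.MySqlClient.MySqlConnection($connectionString)
    $script:connection.Open()
    # Create the command object a single time and reuse it for every batch
    $script:command = $script:connection.CreateCommand()
}

function insert_into_db {
    param([string]$query)
    $script:command.CommandText = $query                        # Load the current batch's query
    $script:RowsInserted = $script:command.ExecuteNonQuery()    # Execute command
}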
Related
I'm writing a script to back up existing BitLocker keys to the associated device in Azure AD. I've created a function which goes through the BitLocker-enabled volumes and backs up the key to Azure; however, I'd like to know how I can check that the function has completed successfully without any errors. Here is my code. I've added a try and catch into the function to catch any errors in the function itself, but how can I check that the function has completed successfully? Currently I have an if statement checking that the last command has run, "$?" - is this correct, or how else can I verify, please?
function Invoke-BackupBDEKeys {
    ##Get all current Bit Locker volumes - this will ensure keys are backed up for devices which may have additional data drives
    $BitLockerVolumes = Get-BitLockerVolume | select-object MountPoint
    foreach ($BDEMountPoint in $BitLockerVolumes.mountpoint) {
        try {
            #Get key protectors for each of the BDE mount points on the device
            $BDEKeyProtector = Get-BitLockerVolume -MountPoint $BDEMountPoint | select-object -ExpandProperty keyprotector
            #Get the Recovery Password protector - this will be what is backed up to AAD and used to recover access to the drive if needed
            $KeyId = $BDEKeyProtector | Where-Object {$_.KeyProtectorType -eq 'RecoveryPassword'}
            #Backup the recovery password to the device in AAD
            BackupToAAD-BitLockerKeyProtector -MountPoint $BDEMountPoint -KeyProtectorId $KeyId.KeyProtectorId
        }
        catch {
            Write-Host "An error has occurred" $Error[0]
        }
    }
}
#Run function
Invoke-BackupBDEKeys
if ($? -eq $true) {
    $ErrorActionPreference = "Continue"
    #No errors occurred running the last command - reg key can be set as keys have been backed up successfully
    $RegKeyPath = 'custom path'
    $Name = 'custom name'
    New-ItemProperty -Path $RegKeyPath -Name $Name -Value 1 -Force
    Exit
}
else {
    Write-Host "The backup of BDE keys was not successful"
    #Exit
}
Unfortunately, as of PowerShell 7.2.1, the automatic $? variable has no meaningful value after calling a written-in-PowerShell function (as opposed to a binary cmdlet). (More immediately, even inside the function, $? only reflects $false at the very start of the catch block, as Mathias notes.)
If PowerShell functions had feature parity with binary cmdlets, then emitting at least one (non-script-terminating) error, such as with Write-Error, would set $? in the caller's scope to $false, but that is currently not the case.
You can work around this limitation by using $PSCmdlet.WriteError() from an advanced function or script, but that is quite cumbersome. The same applies to $PSCmdlet.ThrowTerminatingError(), which is the only way to create a statement-terminating error from PowerShell code. (By contrast, the throw statement generates a script-terminating error, i.e. terminates the entire script and its callers - unless a try / catch or trap statement catches the error somewhere up the call stack).
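To illustrate the workaround (a minimal sketch of my own, not code from the question): relaying a caught error via $PSCmdlet.WriteError() from an advanced function does set $? to $false for the caller:
function Test-WriteError {
    [CmdletBinding()]
    param()
    try {
        Get-Item NoSuchFile -ErrorAction Stop   # provoke an error
    }
    catch {
        # Relay the caught error as a non-terminating error via the cmdlet API.
        $PSCmdlet.WriteError($_)
    }
}

Test-WriteError   # prints the (non-terminating) error
$?                # $false in the caller's scope, unlike with Write-Error from a function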
See this answer for more information and links to relevant GitHub issues.
As a workaround, I suggest:
Make your function an advanced one, so as to enable support for the common -ErrorVariable parameter - it allows you to collect all non-terminating errors emitted by the function in a self-chosen variable.
Note: The self-chosen variable name must be passed without the $; e.g., to collect errors in variable $errs, use -ErrorVariable errs; do NOT use Error / $Error, because $Error is the automatic variable that collects all errors that occur in the entire session.
You can combine this with the common -ErrorAction parameter to initially silence the errors (-ErrorAction SilentlyContinue), so you can emit them later on demand. Do NOT use -ErrorAction Stop, because it will render -ErrorVariable useless and instead abort your script as a whole.
You can let the errors simply occur - no need for a try / catch statement: since there is no throw statement in your code, your loop will continue to run even if errors occur in a given iteration.
Note: While it is possible to trap terminating errors inside the loop with try / catch and then relay them as non-terminating ones with $_ | Write-Error in the catch block, you'll end up with each such error twice in the variable passed to -ErrorVariable. (If you didn't relay, the errors would still be collected, just not printed.)
After invocation, check if any errors were collected, to determine whether at least one key wasn't backed up successfully.
As an aside: Of course, you could alternatively make your function output (return) a Boolean ($true or $false) to indicate whether errors occurred, but that wouldn't be an option for functions designed to output data.
Here's the outline of this approach:
function Invoke-BackupBDEKeys {
    # Make the function an *advanced* function, to enable
    # support for -ErrorVariable (and -ErrorAction)
    [CmdletBinding()]
    param()
    # ...
    foreach ($BDEMountPoint in $BitLockerVolumes.mountpoint) {
        # ... Statements that may cause errors.
        # If you need to short-circuit a loop iteration immediately
        # after an error occurred, check each statement's return value; e.g.:
        # if (-not $BDEKeyProtector) { continue }
    }
}
# Call the function and collect any
# non-terminating errors in variable $errs.
# IMPORTANT: Pass the variable name *without the $*.
Invoke-BackupBDEKeys -ErrorAction SilentlyContinue -ErrorVariable errs
# If $errs is an empty collection, no errors occurred.
if (-not $errs) {
"No errors occurred"
# ...
}
else {
"At least one error occurred during the backup of BDE keys:`n$errs"
# ...
}
Here's a minimal example, which uses a script block in lieu of a function:
& {
[CmdletBinding()] param() Get-Item NoSuchFile
} -ErrorVariable errs -ErrorAction SilentlyContinue
"Errors collected:`n$errs"
Output:
Errors collected:
Cannot find path 'C:\Users\jdoe\NoSuchFile' because it does not exist.
As stated elsewhere, the try/catch you're using is what is preventing the relay of the error condition. That is by design and the very intentional reason for using try/catch.
What I would do in your case is either create a variable or a file to capture the error info. My apologies to anyone named 'Bob'. It's the variable name that I always use for quick stuff.
Here is a basic sample that works:
$bob = (1,2,"blue",4,"notit",7)
$bobout = @{} #create a hashtable for errors
foreach ($tempbob in $bob) {
    $tempbob
    try {
        $tempbob - 2 #this will fail for a string
    } catch {
        $bobout.Add($tempbob,"not a number") #store a key/value pair (current,msg)
    }
}
$bobout #output the errors
Here we created an array just to use a foreach. Think of it like your $BDEMountPoint variable.
Go through each one and do what you want. In the catch block, we just want to say "not a number" when it fails. Here's the output of that:
-1
0
2
5
Name Value
---- -----
notit not a number
blue not a number
All the numbers worked (you can obviously suppress output; this is just for a demo).
More importantly, we stored custom text on failure.
Now, you might want a more informative error. You can grab the actual error that happened like this:
$bob = (1,2,"blue",4,"notit",7)
$bobout = @{} #create a hashtable for errors
foreach ($tempbob in $bob) {
    $tempbob
    try {
        $tempbob - 2 #this will fail for a string
    } catch {
        $bobout.Add($tempbob,$PSItem) #store a key/value pair (current,error)
    }
}
$bobout
Here we stored the current error record, $PSItem, also commonly referenced as $_.
-1
0
2
5
Name Value
---- -----
notit Cannot convert value "notit" to type "System.Int32". Error: "Input string was not in ...
blue Cannot convert value "blue" to type "System.Int32". Error: "Input string was not in a...
You can also parse the actual error and take action based on it or store custom messages. But that's outside the scope of this answer. :)
I'm trying to make a script which takes data from several places in our network and centralizes it in one database. At the moment I'm trying to take data from AD and put it in my database, but I get some weird outcome.
function Set-ODBC-Data{
    param(
        [string]$query=$(throw 'query is required.')
    )
    $cmd = new-object System.Data.Odbc.OdbcCommand($query,$DBConnection)
    $cmd.ExecuteNonQuery()
}
$DBConnection = $null
$DBConnected = $FALSE
try{
    $DBConnection = New-Object System.Data.Odbc.OdbcConnection
    $DBConnection.ConnectionString = "Driver={MySQL ODBC 8.0 Unicode Driver};Server=127.0.0.1;Database=pcinventory;User=uSR;Password=PpPwWwDdD;Port=3306"
    $DBConnection.Open()
    $DBConnected = $TRUE
    Write-Host "Connected to the MySQL database."
}
catch{
    Write-Host "Unable to connect to the database..."
}
$ADEMEA = "ADSERVER.SERVER.WORK"
$addata = Get-ADComputer -filter * -property Name,CanonicalName,LastLogonDate,IPv4Address,OperatingSystem,OperatingSystemVersion -Server $ADEMEA | Select-Object Name,CanonicalName,LastLogonDate,IPv4Address,OperatingSystem,OperatingSystemVersion
ForEach($aditem in $addata){
    Set-ODBC-Data -query "INSERT INTO ad VALUES( '$aditem.Name', '','','','','' )"
}
The result in my database looks something like this
This happens because $aditem is a custom PowerShell object, and the string interpolation in the SQL insert doesn't quite know how to handle it: inside a double-quoted string, PowerShell expands $aditem to its full property list (which looks like a hashtable, i.e. a key-value store of attribute names and values) and then appends the literal text ".Name". To expand a single property inside a string you would need the subexpression syntax, "$($aditem.Name)".
As for a fix, the proper one is to use parameterized queries.
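For example, a parameterized version of that insert might look roughly like this (a sketch only: the explicit column list is an assumption, and ODBC uses positional ? placeholders, so the parameter names below are just labels and the order matters):
$cmd = New-Object System.Data.Odbc.OdbcCommand("INSERT INTO ad (Name, CanonicalName) VALUES (?, ?)", $DBConnection)
[void]$cmd.Parameters.AddWithValue("@Name", [string]$aditem.Name)                       # bound, not interpolated into the SQL text
[void]$cmd.Parameters.AddWithValue("@CanonicalName", [string]$aditem.CanonicalName)
[void]$cmd.ExecuteNonQuery()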
As for a quick and dirty workaround that leaves SQL injection wide open, build the insert string in a few parts. Using string formatting with {} placeholders and the -f operator makes it quite simple. Like so,
$q = "INSERT INTO ad VALUES( '{0}', '{1}', '{2}' )" -f $aditem.name, "more", "stuff"
write-host "Query: $q" # For debugging purposes
Set-ODBC-Data -query $q
The problem with the quick and dirty approach is, as mentioned, SQL injection. Consider what happens if the input is
$aditem.name, "more", "'); drop database pcinventory; --"
If the syntax is about right and permissions are adequate, it will execute the insertion. Right after that, it will drop your pcinventory database. So don't be tempted to use the fast approach, unless you are sure about what you are doing.
I'm a bit new to PowerShell, and I've got a new requirement to get Data out of a MySQL database and into an Oracle one. The strategy I chose was to output to a CSV and then import the CSV into Oracle.
I wanted to get a progress bar for the export from MySQL into CSV, so I used the data reader to achieve this. It works, and begins to export, but somewhere during the export (around record 5,000 of 4.5mil -- not consistent) it will throw an error:
Exception calling "Read" with "0" argument(s): "Fatal error encountered during data read." Exception calling "Close" with "0" argument(s): "Timeout in IO operation" Method invocation failed because [System.Management.Automation.PSObject] does not contain a method named 'op_Addition'. Exception calling "ExecuteReader" with "0" argument(s): "The CommandText property has not been properly initialized."
Applicable code block is below. I'm not sure what I'm doing wrong here, and would appreciate any feedback possible. I've been pulling my hair out on this for days.
Notes: $tableObj is a custom object with a few string fields to hold table name and SQL values. Not showing those SQL queries here, but they work.
Write-Host "[INFO]: Gathering data from MySQL select statement..."
$conn = New-Object MySql.Data.MySqlClient.MySqlConnection
$conn.ConnectionString = $MySQLConnectionString
$conn.Open()
#
# Get Count of records in table
#
$countCmd = New-Object MySql.Data.MySqlClient.MySqlCommand($tableObj.SqlCount, $conn)
$recordCount = 0
try{
$recordCount = $countCmd.ExecuteScalar()
} Catch {
Write-Host "[ERROR]: (" $tableObj.Table ") Error getting Count."
Write-Host "---" $_.Exception.Message
Exit
}
$recordCountString = $recordCount.ToString('N0')
Write-Host "[INFO]: Count for table '" $tableObj.Table "' is " $recordCountString
#
# Compose the command
#
$cmd = New-Object MySql.Data.MySqlClient.MySqlCommand($tableObj.SqlExportInit, $conn)
#
# Write to CSV using DataReader
#
Write-Host "[INFO]: Data gathered into memory. Writing data to CSV file '" $tableObj.OutFile "'"
$counter = 0 # Tracks items selected
$reader=$cmd.ExecuteReader()
$dataRows = @()
# Read all rows into a hash table
while ($reader.Read())
{
    $counter++
    $percent = ($counter/$recordCount)*100
    $percentString = [math]::Round($percent,3)
    $counterString = $counter.ToString('N0')
    Write-Progress -Activity '[INFO]: CSV Export In Progress' -Status "$percentString% Complete" -CurrentOperation "($($counterString) of $($recordCountString))" -PercentComplete $percent
    $row = @{}
    for ($i = 0; $i -lt $reader.FieldCount; $i++)
    {
        $row[$reader.GetName($i)] = $reader.GetValue($i)
    }
    # Convert hashtable into an array of PSObjects
    $dataRows += New-Object psobject -Property $row
}
$conn.Close()
$dataRows | Export-Csv $tableObj.OutFile -NoTypeInformation
EDIT: I also added this to my connection string, per MySQL timeout in powershell: defaultcommandtimeout=600;connectiontimeout=25. It didn't work.
Using @Carl Ardiente's thinking, the query is timing out, and you have to set the timeout to something insane to fully execute. You simply have to set the timeout value for your session before you start getting data.
Write-Host "[INFO]: Gathering data from MySQL select statement..."
$conn = New-Object MySql.Data.MySqlClient.MySqlConnection
$conn.ConnectionString = $MySQLConnectionString
$conn.Open()
# Set timeout on MySql
$cmd = New-Object MySql.Data.MySqlClient.MySqlCommand("set net_write_timeout=99999; set net_read_timeout=99999", $conn)
$cmd.ExecuteNonQuery()
#
# Get Count of records in table
#
...Etc....
Not that I've found the root cause, but none of the connection string changes worked. Manually setting the timeout didn't seem to help either. It seemed to be caused by too many rows being returned, so I broke the function up to run in batches, appending to the CSV as it goes. This gets rid of the IO / timeout error.
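The batched version looks roughly like this (a simplified sketch, not the exact final code); rows are flushed to the CSV with Export-Csv -Append every N records instead of being held in $dataRows until the end:
$batchSize = 50000   # flush to disk every 50k rows (arbitrary value)
$batch = New-Object System.Collections.Generic.List[object]
$firstBatch = $true
while ($reader.Read())
{
    $row = @{}
    for ($i = 0; $i -lt $reader.FieldCount; $i++)
    {
        $row[$reader.GetName($i)] = $reader.GetValue($i)
    }
    $batch.Add((New-Object psobject -Property $row))
    if ($batch.Count -ge $batchSize)
    {
        # The first write creates the file (with headers); subsequent writes append
        $batch | Export-Csv $tableObj.OutFile -NoTypeInformation -Append:(-not $firstBatch)
        $firstBatch = $false
        $batch.Clear()
    }
}
# Flush whatever is left over
if ($batch.Count -gt 0)
{
    $batch | Export-Csv $tableObj.OutFile -NoTypeInformation -Append:(-not $firstBatch)
}
$conn.Close()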
I coded an Active Directory logging system a couple of years ago...
It never got beyond beta status, but it's still in use...
I got an issue reported and found out what's happening...
There are several fields in such an Active Directory event which are user inputs, so I have to validate them! -- of course I didn't...
So after the first user got the brilliant idea to use single quotes in a specific folder name, it crashed my scripts - easy injection possible...
So I'd like to make an update using prepared statements, like I'm using in PHP and elsewhere.
Now this is a PowerShell script. I'd like to do something like this:
$MySQL-OBJ.CommandText = "INSERT INTO `table-name` (i1,i2,i3) VALUES (@k1,@k2,@k3)"
$MySQL-OBJ.Parameters.AddWithValue("@k1","value 1")
$MySQL-OBJ.Parameters.AddWithValue("@k2","value 2")
$MySQL-OBJ.Parameters.AddWithValue("@k3","value 3")
$MySQL-OBJ.ExecuteNonQuery()
This would work fine - once.
My script runs endlessly as a service and loops everything within a while($true) loop.
PowerShell complains that the parameter is already set...
Exception calling "AddWithValue" with "2" argument(s): "Parameter '@k1' has already been defined."
How can I reset this "bind" without closing the database connection?
I'd like to leave the connection open because the script is faster without closing and opening connections each time an event is fired (10+ / sec).
Example Code
(shortened and not tested)
##start
function db_prepare(){
    $MySqlConnection = New-Object MySql.Data.MySqlClient.MySqlConnection
    $MySqlConnection.ConnectionString = "server=$MySQLServerName;user id=$Username;password=$Password;database=$MySQLDatenbankName;pooling=false"
    $MySqlConnection.Open()
    $MySqlCommand = New-Object MySql.Data.MySqlClient.MySqlCommand
    $MySqlCommand.Connection = $MySqlConnection
    $MySqlCommand.CommandText = "INSERT INTO `whatever` (col1,col2...) VALUES (@va1,@va2...)"
}
while($true){
    if($MySqlConnection.State -eq 'closed'){ db_prepare() }
    ## do the event reading and data formatting stuff
    ## build some variables to set as sql param values
    $MySQLCommand.Parameters.AddWithValue("@va1",$variable_for_1)
    $MySQLCommand.Parameters.AddWithValue("@va2",$variable_for_2)
    .
    .
    .
    Try{ $MySqlCommand.ExecuteNonQuery() | Out-Null }
    Catch{ <# error handling #> }
}
Change your logic so that the db_prepare() function initializes a MySql connection and a MySql command with named parameters. Then set the values for the pre-declared parameters inside the loop. Like so,
function db_prepare(){
    # ...
    # Add named parameters once, with their data types
    $MySQLCommand.Parameters.Add("@val1", <datatype>)
    $MySQLCommand.Parameters.Add("@val2", <datatype>)
}
while($true) {
    # ...
    # Assign values to the pre-declared named parameters
    $MySQLCommand.Parameters["@val1"].Value = <value>
    $MySQLCommand.Parameters["@val2"].Value = <value>
    $MySqlCommand.ExecuteNonQuery()
    # ...
}
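Alternatively (my addition, not strictly required by the approach above), you could keep using AddWithValue and simply clear the command's parameter collection at the top of each loop iteration, so the same parameter names can be re-added without the "already been defined" error:
while($true) {
    # ... build $variable_for_1, $variable_for_2 ...
    $MySqlCommand.Parameters.Clear()   # drop the previous iteration's bindings
    [void]$MySqlCommand.Parameters.AddWithValue("@va1", $variable_for_1)
    [void]$MySqlCommand.Parameters.AddWithValue("@va2", $variable_for_2)
    Try{ $MySqlCommand.ExecuteNonQuery() | Out-Null }
    Catch{ <# error handling #> }
}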
I am populating a table in MySQL from an XML file (containing more than a billion lines) using a Perl script to find the lines of interest. The script runs very smoothly until about line 15M, but after that the time taken starts increasing somewhat exponentially.
For the first 1,000,000 lines it took ~12s to parse and write them to the database, but after 15M lines the time required to parse and write the same number of lines is ~43s.
I increased the innodb_buffer_pool_size from 128M to 1024M, as suggested at
Insertion speed slowdown as the table grows in mysql answered by Eric Holmberg
The time requirements came down to ~7s and ~32s respectively but it is still slow as I have a huge file to process and its time requirements keep on increasing.
I also removed the creation of any primary key and index, thinking that it might be causing some problem (not sure though).
Below is the code snippet:
$dbh = DBI->connect('dbi:mysql:dbname','user','password') or die "Connection Error: $DBI::errstr\n";
$stmt = "DROP TABLE IF EXISTS dbname";
$sth = $dbh->do($stmt);
$sql = "create table db(id INTEGER not null, type_entry VARCHAR(30) not null, entry VARCHAR(50))";
$sth = $dbh->prepare($sql);
$sth->execute or die "SQL Error: $DBI::errstr\n";
open my $fh1, '<', "file.xml" or die $!;
while (<$fh1>)
{
    if ($_ =~ m/some pattern/g)
    {
        $_ =~ s/some pattern//gi;
        $id = $_;
    }
    elsif ($_ =~ m/some other pattern/)
    {
        $_ =~ s/\s|some other pattern//gi;
        $type = $_;
    }
    elsif ($_ =~ m/still some other pattern/)
    {
        $_ =~ s/still some other pattern//gi;
        $entry = $_;
    }
    if ($id ne "" && $type ne "" && $entry ne "")
    {
        $dbh->do('INSERT INTO dbname (id, type_entry, species) VALUES (?, ?, ?)', undef, $id, $type, $entry);
    }
}
The database would contain around 1.7 million entries. What more can be done to reduce the time?
Thanks in Advance
EDIT 1:
Thank you all for help
Since this morning I have been trying to implement everything that was suggested and checking whether I get any significant reduction in time.
So what I did:
I removed matching the pattern twice, as suggested by @ikegami, but yes, I do need the substitution.
I made use of a hash (as suggested by @ikegami).
I used LOAD DATA LOCAL INFILE (as suggested by @ikegami, @ysth and @ThisSuitIsBlackNot). But I have embedded it into my code to take the file and then process it into the database. The file here is dynamically written by the script, and when it reaches 1000 entries it is written to the db.
The timings of the run for consecutive 1000000 lines are
13 s
11 s
24 s
22 s
35 s
34 s
47 s
45 s
58 s
57 s .....
(Wanted to post the image but... reputation)
Edit 2:
I checked the timings again and tracked the time required by the script to write to the database; to my surprise, it is linear. Now what I conclude from this is that there is some issue with the while loop, which I believe increases the time exponentially, as it has to go to the line number for every iteration, and as it gets deeper into the file it has to count more lines to reach the next line.
Any comments on that?
EDIT 3
$start_time = time();
$line = 0;
open my $fh1, '<', "file.xml" or die $!;
while (<$fh1>)
{
    $line++;
    %values;
    if ($_ =~ s/foo//gi)
    {
        $values{'id'} = $_;
    }
    elsif ($_ =~ s/foo//gi)
    {
        $values{'type'} = $_;
    }
    elsif ($_ =~ s/foo//gi)
    {
        $values{'pattern'} = $_;
    }
    if (keys(%values) == 3)
    {
        $no_lines++;
        open FILE, ">>temp.txt" or die $!;
        print FILE "$values{'id'}\t$values{'type'}\t$values{'pattern'}\n";
        close FILE;
        if ($no_lines == 1000)
        {
            # write it to database using `LOAD DATA LOCAL INFILE` and unlink the temp.txt file
        }
        undef %values;
    }
    if ($line == ($line1 + 1000000))
    {
        $line1 = $line;
        $read_time = time();
        $processing_time = $read_time - $start_time - $processing_time;
        print "xml file parsed till line $line, time taken $processing_time sec\n";
    }
}
ANSWER:
First, I would like to apologize for taking so long to reply; I started over from the ground up with Perl and this time wrote it with use strict, which helped me maintain linear time. Also, using an XML parser is a good thing to do when handling large XML files.
To add to that, there is nothing wrong with the speed of MySQL inserts; it is always linear.
Thanks all for help and suggestions
I'm guessing the bottleneck is the actual insertion. It will surely be a bit faster to generate the INSERT statements, place them in a file, then execute the file using the mysql command line tool.
You can experiment with creating INSERT statements that insert a large number of rows vs individual statements.
Or maybe it's best to avoid INSERT statements entirely. I think the mysql command line tool has a facility to populate a database from a CSV file. That might possibly yield a little more speed.
Better yet, you can use LOAD DATA INFILE if you have access to the file system of the machine hosting the database.
Your Perl code could also use some cleaning up.
You search for every pattern twice? Change
if (/foo/) { s/foo//gi; $id = $_ }
to
if (s/foo//gi) { $id = $_ }
Actually, do you need a substitution at all? This might be faster
if (/foo (.*)/) { $id = $1 }
Looks like you might be able to do something more along the lines of
my ($k, $v) = split(/:\s*/);
$row{$k} = $v;
instead of that giant if.
Also, if you use a hash, then you can use the following for the last check:
if (keys(%row) == 3)