I've run into a performance issue with PowerShell and generating parameterized queries. I'm pretty sure that I'm going about it wrong, and hopefully someone can help.
Here is the problem: inserting 1000 rows (with about 15 columns each) takes around 3 minutes when using parameterized queries, while non-parameterized batch inserts take less than a second. The problem seems to lie in how I'm generating the parameters, as that is the part of the code that eats most of the time; the looping is killing the performance. This wouldn't be a big deal if all I had to deal with were 1000 rows; however, there are millions.
Here is what I have so far. I've omitted quite a bit for the sake of brevity, but don't get me wrong: it certainly works. It's just painfully slow.
$values = @() # some big array of values
$sqlCommand.CommandText = "INSERT INTO ``my_db``.``$table`` (dataseries_code,dataseries_text) VALUES (?,?),(?,?)" # abbreviated for the sake of sanity
for($i = 0; $i -lt $values.Count; $i++) {
$sqlCommand.Parameters.AddWithValue($i, $values[$i]) #this is the slow part
}
$insertedRows = $sqlCommand.ExecuteNonQuery()
Write-Host "$insertedRows"
$sqlCommand.Parameters.Clear()
There must be a better way to generate the .Parameters.AddWithValue calls? I took a look at the .AddRange method but couldn't figure out how to make it work, or whether it was even intended for what I'm trying to do.
EDIT:
I should have mentioned that I also tried creating the parameters first and then adding the values. It took 3 minutes to create the parameters and another 3 minutes to add the values (see the example below): 6 minutes total for 30,000 rows! Something is not right.
I can't help but think that there has to be a faster method to do parameterized inserts?!
#THIS PIECE WILL ONLY RUN ONCE
#this is where we create the parameters first
for($i = 0; $i -lt @($values).Count; $i++) {
$command.Parameters.Add((New-Object MySql.Data.MySqlClient.MySqlParameter($i, [MySql.Data.MySqlClient.MySqlDbType]::VarChar))) | Out-Null
}
#THIS PIECE WILL RUN MULTIPLE TIMES
#and this is where we add the values
for($i = 0; $i -lt @($values).Count; $i++) {
$command.Parameters[$i].Value = $values[$i]
}
Thanks!
What about building the command object and adding the parameters once, then looping through the values?
Instead of clearing the parameters each time, you could just change the value. That would avoid a lot of object creation/disposal.
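A rough sketch of that idea, staying with the MySql.Data connector used in the question ($sqlConnection, $rows, and the .Code/.Text property names below are placeholders, not from the original code):
$sqlCommand.CommandText = "INSERT INTO ``my_db``.``$table`` (dataseries_code,dataseries_text) VALUES (@code,@text)"
[void]$sqlCommand.Parameters.Add("@code", [MySql.Data.MySqlClient.MySqlDbType]::VarChar)
[void]$sqlCommand.Parameters.Add("@text", [MySql.Data.MySqlClient.MySqlDbType]::VarChar)
# Reuse the same two parameter objects for every row, inside one transaction
$transaction = $sqlConnection.BeginTransaction()
$sqlCommand.Transaction = $transaction
foreach ($row in $rows) {
    $sqlCommand.Parameters["@code"].Value = $row.Code
    $sqlCommand.Parameters["@text"].Value = $row.Text
    [void]$sqlCommand.ExecuteNonQuery()
}
$transaction.Commit()
With the parameters created once and a single commit at the end, the per-row work is just two value assignments and one ExecuteNonQuery call.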
I've seen different posts about parallelizing or running functions simultaneously, but the code in the answers hasn't quite worked for me. I'm doing patching automation, and the functions I have all work and do their thing separately, but since we work with more than 200 computers, waiting for each function to finish with its batch of computers kind of defeats the purpose. I have the code in one script, and in summary it's structured like this:
define global variables
$global:varN
define sub-functions
function sub-function1()
function sub-functionN()
define main functions
function InstallGoogleFunction($global:varN)
{
$var1
$result1 = sub-function1
$resultN = sub-functionN
}
function InstallVLCFunction($global:varN)
{
"similar code as above"
}
function InstallAppFunction($global:varN)
{
"similar code as above"
}
The functions will all install a different app/software and will write output to a file. The only thing is I cannot seem to run all of the installation functions without waiting for the first one to finish. I tried Start-Job; it executed and displayed a table-like output, but when I checked the computers, none of them had anything running in Task Manager. Is there a way PowerShell can run these installation functions at the same time? If I have to resort to running them one by one, I'll order the functions by the least time taken or the fewest computers they need to install on, but I just wanted someone to explain whether this can be done.
You can use PowerShell background jobs to achieve this. The example below shows how you can trigger tasks on multiple machines in parallel.
Replace the list of machines in $Computers with your own machines and give it a try. The example gets the disk details of the given machines.
# This is your script you want to execute
$wmidiskblock =
{
Param($ComputerName = "LocalHost")
Get-WmiObject -ComputerName $ComputerName -Class win32_logicaldisk | Where-Object {$_.drivetype -eq 3}
}
# List of servers
$Computers = @("Machine1", "Machine2", "Machine3")
#Start all jobs in parallel
ForEach($Computer in $Computers)
{
Write-Host $Computer
Start-Job -scriptblock $wmidiskblock -ArgumentList $Computer
}
Get-Job | Wait-Job
$out = Get-Job | Receive-Job
$out | Export-Csv 'c:\temp\wmi.csv'
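One caveat when adapting this to the install functions above: Start-Job runs each script block in a separate process, so functions defined earlier in your script are not visible inside the job unless you pass them in, for example via -InitializationScript. A hedged sketch (the function name comes from the question; its body and parameter are assumptions):
# Definitions to load into every job's session
$functionDefs = {
    function InstallGoogleFunction {
        Param($ComputerName)
        # ... real install logic goes here ...
    }
}
# Work each job performs
$installBlock = {
    Param($ComputerName)
    InstallGoogleFunction $ComputerName
}
foreach ($Computer in $Computers) {
    Start-Job -InitializationScript $functionDefs -ScriptBlock $installBlock -ArgumentList $Computer
}
Get-Job | Wait-Job | Receive-Job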
Okay, so I am just wondering why, in the block of code below, the commented-out code only outputs one file: the last file that would be created. I don't have much experience with sed or writing shell scripts, but the two loops seem identical to me, except that the second one doesn't use variables to hold the replacements used in the sed command. I'm guessing it has something to do with how I edit the strings stored in the variables with the while loop iterators.
i=32;
j=2;
SEPORIG="ghsep1";
#SEPHN="ghsep"$j"";
SEPIP="10.84.194.31";
#NEWIP="10.84.194."$i"";
SEPORGN=ghsep1.json;
#SEPNEW=ghsep"$j".json;
#while [ $i -lt 90 ];
#do
# sed "s/$SEPORIG/$SEPHN/; s/$SEPIP/$NEWIP/" $SEPORGN > $SEPNEW;
# i=$(( $i + 1 ));
# j=$(( $j + 1 ));
#done;
while [ $i -lt 90 ];
do
sed "s/$SEPORIG/"ghsep"$j""/; s/$SEPIP/"10.84.194."$i""/"$SEPORGN > ghsep"$j".json;
i=$(( $i + 1 ));
j=$(( $j + 1 ));
done;
Basically this code just edits the hostname and IP address of a JSON file used to create specifications for a server. The iterators and conditionals are hardcoded because we know how many servers we will deploy using these JSON configuration files. I think the code is probably very ugly due to my limited knowledge with either JSON or shell scripts.
Can anyone provide any insight into how the two blocks of code differ? Maybe I'm just missing something and need a fresh set of eyes. Also, if anyone has suggestions on how I can improve the script based on what I've told you, that would be great! I feel like there are probably things I can do to clean up or shorten the code, or maybe not, since it requires a decent amount of hardcoding. This is especially true because not every hostname will be "ghsep##"; there are at least 4 other types of hostnames, for example "gmtip##".
Taking your commented code:
sed "s/$SEPORIG/$SEPHN/; s/$SEPIP/$NEWIP/" $SEPORGN > $SEPNEW
will overwrite the file $SEPNEW (which has the fixed value ghsep2.json) on every iteration; that is why you only get one file, containing the result of the last iteration.
In your new code the output filename is built from $j inside the loop, so you get a new file on each iteration: ghsep"$j".json. To make the commented version behave the same way, recompute SEPHN, NEWIP, and SEPNEW inside the loop body so they pick up the current values of $i and $j.
I wrote code like this (I use MySQL, PDO, InnoDB, Laravel 4, localhost & Mac):
$all_queue = Queue1::all()->toArray(); //count about 10000
ob_end_clean();
foreach($all_queue as $key=>$value) {
$priceCreate=array(...);
Price::create($priceCreate);
Queue1::where('id',$value['id'])->delete();
}
This worked for me (about 65 MB of RAM usage), but while it was running, other parts of my program (such as queries against other tables) didn't work. I couldn't even open my database in MySQL. My program and MySQL just wait, and only once the process is completed do they work again.
I don't know what I'm supposed to do.
I think this is not a Laravel issue but rather something in my PHP or MySQL configuration.
Here are my php.ini and MySQL configs.
I assume
$all_foreach($all_queue as $key=>$value) {
is meant to be
foreach($all_queue as $key=>$value) {
and that you have no errors (you have debug set to true in your app config).
Try to set no time limit for your script.
In your php.ini
max_execution_time = 3600 ; this is one hour; set to 0 for no limit
Or in code
set_time_limit(0);
And if it's a memory problem, try to free memory by unsetting variables you no longer need. It's good practice in long scripts to free space:
...
}//end foreach loop
unset($all_queue); //no longer needed, so unset it to free memory
When working on large SSIS projects containing several packages, the packages can start to get a bit messy with variables that were created but never used or have been made redundant by other changes. I need something that will scan several SSIS packages and list all unused variables.
I have managed to answer my own question by employing some PowerShell. The script below uses XPath to get the variable names and then uses a regex to count occurrences; if a name occurs only once, it must have been defined but never used.
The only caveat is that if you name variables with words that would naturally be present in a dtsx file, the script will not pick them up. I probably need to expand the script to run the regex search only against specific nodes in the package.
$results = @()
Get-ChildItem -Filter *.dtsx |
% {
$xml = [xml](Get-Content $_)
$Package = $_.Name
$ns = [System.Xml.XmlNamespaceManager]($xml.NameTable)
$ns.AddNamespace("DTS", "www.microsoft.com/SqlServer/Dts")
$var_list = @($xml.SelectNodes("//DTS:Variable/DTS:Property[@DTS:Name = 'ObjectName']", $ns) | % {$_.'#text'})
$var_list | ? {@([Regex]::Matches($xml.InnerXml, "\b$($_)\b")).Count -eq 1} |
% { $results += New-Object PSObject -Property @{
Package = $Package
Name = $_}
}
}
$results
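If it's useful, the collected results can be piped straight into a report; the CSV path here is just an illustration:
$results | Sort-Object Package, Name | Export-Csv -Path 'UnusedSsisVariables.csv' -NoTypeInformation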
This is a great question, as I have the same concern with a few rather large SSIS packages. Unfortunately, outside of the SSIS packages themselves there isn't any built-in feature that provides this. See the discussion at the attached link:
CodePlex
But by opening an SSIS package, you can determine the variable usage by applying the steps outlined in the following link:
find-ssis-variable-dependencies-with-bi-xpress
Hope this helps.
I'd like to write a script that traverses a file tree, calculates a hash for each file, and inserts the hash into an SQL table together with the file path, such that I can then query and search for files that are identical.
What would be the recommended hash function or command line tool to create hashes that are extremely unlikely to be identical for different files?
Thanks
B
I've been working on this problem for much too long. I'm on my third (and hopefully final) rewrite.
Generally speaking, I recommend SHA1 because it has no known collisions (whereas MD5 collisions can be found in minutes), and SHA1 doesn't tend to be a bottleneck when working with hard disks. If you're obsessed with getting your program to run fast in the presence of a solid-state drive, either go with MD5, or waste days and days of your time figuring out how to parallelize the operation. In any case, do not parallelize hashing until your program does everything you need it to do.
Also, I recommend using sqlite3. When I made my program store file hashes in a PostgreSQL database, the database insertions were a real bottleneck. Granted, I could have tried using COPY (I forget if I did or not), and I'm guessing that would have been reasonably fast.
If you use sqlite3 and perform the insertions in a BEGIN/COMMIT block, you're probably looking at about 10000 insertions per second in the presence of indexes. However, what you can do with the resulting database makes it all worthwhile. I did this with about 750000 files (85 GB). The whole insert and SHA1 hash operation took less than an hour, and it created a 140MB sqlite3 file. However, my query to find duplicate files and sort them by ID takes less than 20 seconds to run.
In summary, using a database is good, but note the insertion overhead. SHA1 is safer than MD5, but takes about 2.5x as much CPU power. However, I/O tends to be the bottleneck (CPU is a close second), so using MD5 instead of SHA1 really won't save you much time.
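For what it's worth, before committing to a database you can sketch the same hash-then-group idea in a few lines of PowerShell (this assumes the built-in Get-FileHash cmdlet from PowerShell 4+, and the path is a placeholder):
Get-ChildItem -Path 'C:\data' -Recurse -File |
    Get-FileHash -Algorithm SHA1 |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group.Path }   # each group is one set of identical files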
You can use an MD5 or SHA1 hash.
function process_dir($path) {
    if ($handle = opendir($path)) {
        while (false !== ($file = readdir($handle))) {
            if ($file != "." && $file != "..") {
                if (is_dir($path . "/" . $file)) {
                    process_dir($path . "/" . $file);
                } else {
                    // you can change md5 to sha1
                    // you could store this hash in a database
                    $hash = md5(file_get_contents($path . "/" . $file));
                }
            }
        }
        closedir($handle);
    }
}
If you're working on Windows, change the slashes to backslashes.
Here's a solution I figured out. I didn't do all of it in PHP, though it would be easy enough to do if you wanted:
$fh = popen('find /home/admin -type f | xargs sha1sum', 'r');
$files = array();
while ($line = fgets($fh)) {
list($hash, $file) = preg_split('/\s+/', trim($line), 2); // sha1sum separates the hash and the path with whitespace
$files[$hash][] = $file;
}
$dupes = array_filter($files, function($a) { return count($a) > 1; });
I realise I've not used databases here. How many files are you going to be indexing? Do you need to put that data into a database and then search for the dupes there?