Where-Object with complex evaluation - csv

I have a PowerShell script where I read in a CSV file, and if the date in a certain column is greater than a parameter date, I output that row to a new file.
Currently I read the CSV file and pipe it to ForEach-Object, where each row that "passes" is stored in an ArrayList. Once all rows are processed, I write the ArrayList to an output CSV file. My starting CSV file is 225MB with over a quarter million rows, so this process is slow.
Is there a way I can add a filter function to my pipeline so that only the passing rows are passed to the output CSV in one fell swoop? The Where-Object examples I have seen only use simple operators such as -like or -contains, not more complex forms of evaluation.
For best practices, I've got my code below:
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip(1) |
ForEach-Object {
#Skip the header
if( $lineCounter -eq 1)
{
return
}
$newDate = if ([string]::IsNullOrEmpty($_.1) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.1,"yyyyMMdd",$null) }
$updateDate = if ([string]::IsNullOrEmpty($_.2) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.2,"yyyyMMdd",$null) }
$distanceDate = (Get-Date).AddDays($daysBack * -1)
if( $newDate -gt $distanceDate -or $updateDate -gt $distanceDate )
{
[void]$filteredArrayList.Add($_)
}
}
...
$filteredArrayList |
ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
select -Skip 1 |
% { $_ -replace '"', ""} |
out-file $ouputFile -fo -en unicode -Append

I've added ConvertToDate as a function so the date parsing doesn't clutter the Where-Object block.
$distanceDate is moved out of the pipeline because it only needs to be calculated once.
ExportCsv is a little filter that writes pipeline input to a file.
I haven't tested it, so bugs are quite likely unless I got lucky.
function ConvertToDate {
param(
[String]$DateString
)
if ($DateString -eq '') {
return [DateTime]::MinValue
} else {
return [DateTime]::ParseExact($DateString, "yyyyMMdd", $null)
}
}
filter ExportCsv {
param(
[Parameter(Position = 1)]
[String]$Path
)
$csv = $_ | ConvertTo-Csv -Delimiter "`t" | Select-Object -Last 1
$csv -replace '"' | Out-File $Path -Append -Encoding Unicode -Force
}
$distanceDate = (Get-Date).AddDays($daysBack * -1)
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip 1 |
Where-Object { (ConvertToDate $_.1) -gt $distanceDate -or (ConvertToDate $_.2) -gt $distanceDate } |
ExportCsv $OutputFile
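To illustrate that the Where-Object script block can hold any expression, here is a minimal self-contained sketch. The sample rows, the column names Created and Updated, the helper name ConvertTo-DateOrMin and the fixed cutoff date are all assumptions made up for this example; they are not from the original script.

```powershell
# Hypothetical sample data; the column names Created and Updated are assumptions.
$rows = @'
Id,Created,Updated
1,20200101,
2,20240115,20240301
3,,20231231
'@ | ConvertFrom-Csv

# Parse a yyyyMMdd string, treating an empty value as the earliest possible date.
function ConvertTo-DateOrMin {
    param([string]$Text)
    if ([string]::IsNullOrEmpty($Text)) { return [datetime]::MinValue }
    [datetime]::ParseExact($Text, 'yyyyMMdd', [cultureinfo]::InvariantCulture)
}

# Fixed cutoff (an assumption) so the result is reproducible.
$cutoff = [datetime]'2023-06-01'

# The script block may contain arbitrary logic; rows stream through one at a time.
$kept = $rows | Where-Object {
    (ConvertTo-DateOrMin $_.Created) -gt $cutoff -or
    (ConvertTo-DateOrMin $_.Updated) -gt $cutoff
}

$kept | ConvertTo-Csv -NoTypeInformation
```

With this sample data, rows 2 and 3 pass the filter because at least one of their dates is later than the cutoff.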

Sure, just add a function that takes a value from the pipeline and pipe the result of Import-Csv to it. Within the function you check whether you want to keep the current item or not. Here is a simple example which uses a string list and keeps only the strings that start with h:
$x = @('hello', 'world', 'hello', 'tree')
filter Filter-CsvByMyRequirements
{
Param(
[Parameter(Mandatory=$true,
ValueFromPipeline=$true)]
$InputObject
)
Process
{
if ($_ -match '^h.*')
{
$_
}
}
}
$x | Filter-CsvByMyRequirements | Write-Host
Output:
hello
hello

Related

What is the good way to read data from CSV and converting them to JSON?

I am trying to read data from a CSV file with 2,200,000 records using PowerShell and store each record in a JSON file, but this takes almost 12 hours.
Sample CSV Data:
We are only concerned with the values in the first column.
Code:
function Read-IPData
{
$dbFilePath = Get-ChildItem -Path $rootDir -Filter "IP2*.CSV" | ForEach-Object{ $_.FullName }
Write-Host "file path - $dbFilePath"
Write-Host "Reading..."
$data = Get-Content -Path $dbFilePath | Select-Object -Skip 1
Write-Host "Reading data finished"
$count = $data.Count
Write-host "Total $count records found"
return $data
}
function Convert-NumbetToIP
{
param(
[Parameter(Mandatory=$true)][string]$number
)
try
{
$w = [int64]($number/16777216)%256
$x = [int64]($number/65536)%256
$y = [int64]($number/256)%256
$z = [int64]$number%256
$ipAddress = "$w.$x.$y.$z"
Write-Host "IP Address - $ipAddress"
return $ipAddress
}
catch
{
Write-Host "$_"
continue
}
}
Write-Host "Getting IP Addresses from $dbFileName"
$data = Read-IPData
Write-Host "Checking whether output.json file exist, if not create"
$outputFile = Join-Path -Path $rootDir -ChildPath "output.json"
if(!(Test-Path $outputFile))
{
Write-Host "$outputFile does not exist, creating..."
New-Item -Path $outputFile -type "file"
}
foreach($item in $data)
{
$row = $item -split ","
$ipNumber = $row[0].trim('"')
Write-Host "Converting $ipNumber to ipaddress"
$toIpAddress = Convert-NumbetToIP -number $ipNumber
Write-Host "Preparing document JSON"
$object = [PSCustomObject]@{
"ip-address" = $toIpAddress
"is-vpn" = "true"
"@timestamp" = (Get-Date).ToString("o")
}
$document = $object | ConvertTo-Json -Compress -Depth 100
Write-Host "Adding document - $document"
Add-Content -Path $outputFile $document
}
Could you please help me optimize the code? Is there a better way to do it, for example with multi-threading?
Here is a possible optimization:
function Get-IPDataPath
{
$dbFilePath = Get-ChildItem -Path $rootDir -Filter "IP2*.CSV" | ForEach-Object FullName | Select-Object -First 1
Write-Host "file path - $dbFilePath"
$dbFilePath # implicit output
}
function Convert-NumberToIP
{
param(
[Parameter(Mandatory=$true)][string]$number
)
[Int64] $numberInt = 0
if( [Int64]::TryParse( $number, [ref] $numberInt ) ) {
if( ($numberInt -ge 0) -and ($numberInt -le 0xFFFFFFFFl) ) {
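# Convert to IP address like '192.168.23.42'.
# [IPAddress] treats the lowest byte of the number as the first octet, so reverse the bytes.
$bytes = ([IPAddress] $numberInt).GetAddressBytes()
[array]::Reverse($bytes)
$bytes -join '.'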
}
}
# In case TryParse() returns $false or the number is out of range for an IPv4 address,
# the output of this function will be empty, which converts to $false in a boolean context.
}
$dbFilePath = Get-IPDataPath
$outputFile = Join-Path -Path $rootDir -ChildPath "output.json"
Write-Host "Converting CSV file $dbFilePath to $outputFile"
$object = [PSCustomObject]@{
'ip-address' = ''
'is-vpn' = 'true'
'@timestamp' = ''
}
# Enclose foreach loop in a script block to be able to pipe its output to Set-Content
& {
foreach( $item in [Linq.Enumerable]::Skip( [IO.File]::ReadLines( $dbFilePath ), 1 ) )
{
$row = $item -split ','
$ipNumber = $row[0].trim('"')
if( $ip = Convert-NumberToIP -number $ipNumber )
{
$object.'ip-address' = $ip
$object.'@timestamp' = (Get-Date).ToString('o')
# Implicit output
$object | ConvertTo-Json -Compress -Depth 100
}
}
} | Set-Content -Path $outputFile
Remarks for improving performance:
Avoid Get-Content, which tends to be slow, especially for line-by-line processing. A much faster alternative is the File.ReadLines method. To skip the header line, use the Linq.Enumerable.Skip() method.
There is no need to read the whole CSV into memory first. Using ReadLines in a foreach loop does lazy enumeration, i.e. it reads only one line per loop iteration. This works because it returns a lazily evaluated enumerable instead of a collection of lines.
Avoid try and catch if exceptions occur often, because the "exceptional" code path is very slow. Instead use Int64.TryParse(), which returns a boolean indicating successful conversion.
Instead of "manually" converting the IP number to bytes, use the IPAddress class, which has a constructor that takes an integer number. Its .GetAddressBytes() method returns the address as an array of bytes. Note that the constructor treats the lowest byte of the number as the first octet, so the array has to be reversed before joining it with the PowerShell -join operator to create a string of the expected format.
Don't allocate a [pscustomobject] for each row, which has some overhead. Create it once before the loop and inside the loop only assign the values.
Avoid Write-Host (or any output to the console) within inner loops.
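The first three remarks can be tried in isolation with the small sketch below. It writes a throwaway temp file (the file contents are invented for illustration), enumerates it lazily with ReadLines, skips the header with Skip(), and uses TryParse() so that the malformed line is dropped without an exception.

```powershell
# Build a small throwaway file; the contents are made up for illustration.
$path = [IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'ip_number', '16777216', 'not-a-number', '3232235777'

# ReadLines yields one line per iteration; Skip(1) drops the header line.
$parsed = foreach ($line in [Linq.Enumerable]::Skip([IO.File]::ReadLines($path), 1)) {
    [int64]$n = 0
    # TryParse returns $false for bad input instead of throwing.
    if ([int64]::TryParse($line, [ref]$n)) { $n }
}

Remove-Item -Path $path
$parsed
```

Only the two numeric lines end up in $parsed; the header and the malformed line are skipped.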
Unrelated to performance:
I've removed the New-Item call to create the output file, which isn't necessary because Set-Content automatically creates the file if it doesn't exist.
Note that the output is in NDJSON format, where each line is a self-contained JSON document. In case you actually want this to be a regular JSON file, enclose the output in [ ] and insert a comma between consecutive rows.
Modified processing loop to write a regular JSON file instead of NDJSON file:
& {
'[' # begin array
$first = $true
foreach( $item in [Linq.Enumerable]::Skip( [IO.File]::ReadLines( $dbFilePath ), 1 ) )
{
$row = $item -split ','
$ipNumber = $row[0].trim('"')
if( $ip = Convert-NumberToIP -number $ipNumber )
{
$object.'ip-address' = $ip
$object.'@timestamp' = (Get-Date).ToString('o')
$row = $object | ConvertTo-Json -Compress -Depth 100
# write array element delimiter if necessary
if( $first ) { $row; $first = $false } else { ",$row" }
}
}
']' # end array
} | Set-Content -Path $outputFile
You can optimize the function Convert-NumberToIP like below:
function Convert-NumberToIP {
param(
[Parameter(Mandatory=$true)][uint32]$number
)
# either do the math yourself like this:
# $w = ($number -shr 24) -band 255
# $x = ($number -shr 16) -band 255
# $y = ($number -shr 8) -band 255
# $z = $number -band 255
# '{0}.{1}.{2}.{3}' -f $w, $x, $y, $z # output the dotted IP string
# or use .Net:
$n = ([IPAddress]$number).GetAddressBytes()
[array]::Reverse($n)
([IPAddress]$n).IPAddressToString
}
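As a worked check of the arithmetic, the sketch below applies the shift-and-mask approach to a known value. The function name Convert-NumberToDottedIP and the sample number are assumptions chosen for illustration; the last line shows why the bytes need reversing when the number is passed straight to [IPAddress].

```powershell
# Hypothetical helper: converts a big-endian IPv4 number to dotted notation.
function Convert-NumberToDottedIP {
    param([Parameter(Mandatory = $true)][int64]$Number)
    if ($Number -lt 0 -or $Number -gt 4294967295) {
        throw "Value $Number is outside the IPv4 range."
    }
    # Take the four bytes from most significant to least significant.
    $octets = foreach ($shift in 24, 16, 8, 0) { ($Number -shr $shift) -band 255 }
    $octets -join '.'
}

# 3232235777 = 192*16777216 + 168*65536 + 1*256 + 1
$dotted = Convert-NumberToDottedIP -Number 3232235777
$dotted                                                  # 192.168.1.1

# Passing the same number directly to [IPAddress] yields the octets in reverse order.
$reversed = ([IPAddress][int64]3232235777).IPAddressToString
$reversed                                                # 1.1.168.192
```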

Modifying JSON file using data from CSV in PowerShell

I'm trying to modify some specific values in a .json file based on two columns in a .csv file. If the current value in the .json file is identical to the one in the left column, I want to change it to the one in the right column.
This is my first time with PowerShell though, so I'm struggling to figure out how to go about doing this. I feel like my solution is not only wrong, but is using a double for loop when it might not need to. Here's what I have so far.
$jsonData = Get-Content -Path $jsonFile | ConvertFrom-Json
$csvData = Get-Content -Path $csvFile | Select-Object -Skip 1 # Skipping the header
foreach ($jsonItem in $jsonData.'Placeable List') {
foreach ($csvRow in $csvData) {
$splitRow = $csvRow -split ","
$lCol = $splitRow[0]
$rCol = $splitRow[1]
$currentItem = $jsonItem.'value'.'Appearance'.'value'
if ($currentItem -eq $lCol) {
$currentItem -eq $rCol
}
}
}
I managed to figure it out.
$csvData = Get-Content -Path $csvFile | Select-Object -Skip 1 # Skipping the header
$jsonData = Get-Content -Path $jsonFile -raw | ConvertFrom-Json
foreach($csvRow in $csvData) {
$splitRow = $csvRow -split ","
$lCol = $splitRow[0]
$rCol = $splitRow[1]
foreach($item in $jsonData.'Placeable List'.value) {
$item.Appearance | % {
if ($_.value -eq $lCol) {
$_.value = $rCol
}
}
}
}
$jsonData | ConvertTo-Json -depth 32 | Set-Content $jsonFile
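The nested loop can be avoided by building a lookup table from the CSV once and then making a single pass over the JSON items. This is only a sketch: the inline JSON and CSV samples, the column names Old and New, and the variable names are assumptions standing in for the real files, which are not shown in the question.

```powershell
# Hypothetical sample data standing in for the JSON and CSV files.
$json = '{"Placeable List":{"value":[{"Appearance":{"value":"Old_A"}},{"Appearance":{"value":"Keep"}}]}}' |
    ConvertFrom-Json
$csv = @'
Old,New
Old_A,New_A
Old_B,New_B
'@ | ConvertFrom-Csv

# Build the old -> new lookup once (hashtable keys are case-insensitive, like -eq).
$map = @{}
foreach ($row in $csv) { $map[$row.Old] = $row.New }

# Single pass over the JSON items; each lookup is a constant-time hashtable access.
foreach ($item in $json.'Placeable List'.value) {
    $current = $item.Appearance.value
    if ($map.ContainsKey($current)) {
        $item.Appearance.value = $map[$current]
    }
}

$json | ConvertTo-Json -Depth 32 -Compress
```

With real files, the two sample assignments would be replaced by Get-Content -Raw | ConvertFrom-Json and Import-Csv respectively.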

Compare 2 CSV files and write all differences

I have 3 CSV files that contain user information. CSV1 is a "master" list of all inactive users. CSV2 is a current list of users that need to be deactivated and CSV3 is a list of users that need to be activated.
What I want is to have a PowerShell script that can be called from another script (the one that creates CSV2/3) to have it compare CSV1/2 and write all unique records back to CSV1. Then I want it to compare CSV1/3 and remove all records in CSV1 that exist in CSV3. CSV2/3 can change daily and it is possible to have no data in them, other than the header.
There are several unique fields, but I would want to compare on 'EmployeeID'.
All 3 CSV files have headers (same headers in all of them, so the data is consistent).
What I have ended up with so far will add the records from CSV2 to CSV1, but it adds both headers.
$ICM= Import-Csv inactiveicmaster.csv -Header 'StudentDistrictID', 'StudentSiteCode', 'StudentLastName', 'StudentFirstName', 'StudentGradeLevel', 'GraduationYr', 'Masterck', 'Homeroom', 'MiddleName', 'Birthday', 'Gender', 'Email'
$IC = Import-Csv csv\inactiveic.csv -Header 'StudentDistrictID', 'StudentSiteCode', 'StudentLastName', 'StudentFirstName', 'StudentGradeLevel', 'GraduationYr', 'Masterck', 'Homeroom', 'MiddleName', 'Birthday', 'Gender', 'Email'
$DIS = Import-Csv csv\disinad.csv -Header 'StudentDistrictID', 'StudentSiteCode', 'StudentLastName', 'StudentFirstName', 'StudentGradeLevel', 'GraduationYr', 'Masterck', 'Homeroom', 'MiddleName', 'Birthday', 'Gender', 'Email'
foreach ($f in $ic) {
$found = $false
foreach ($g in $icm) {
if ($g.StudentDistrictID -eq $f.StudentDistrictID) {
$found = $true
}
}
if ($found -eq $false) {
$icm += $f
if ($f.masterck -eq "") {
$f.masterck = "IM"
}
}
}
<#
foreach ($h in $dis) {
$found = $false
foreach ($g in $icm) {
if ($g.studentdistrictid -eq $h.studentdistrictid) {
$found = $true
}
if ($found -ne $false) {
#don't know what to do here to remove the duplicate
}
}
}
#>
$icm | select * | Export-Csv master.csv -NoTypeInformation
I don't know the exact answer but can't you do something like this?
$file1 = import-csv -Path "C:\temp\Test1.csv"
$file2 = import-csv -Path "C:\temp\Test2.csv"
Compare-Object $file1 $file2 -property MPFriendlyName
See this link for a complete example and result: Compare csv with same headers
If you know the differences it is easy enough to write them in the other csv.
Edit:
I don't have much experience with Compare-Object, but since it is a CSV you can simply drop the columns you don't need like this:
Import-Csv C:\fso\csv1.csv | select ColumnYouWant1,ColumnYouWant2 | Export-Csv -Path c:\fso\csvResult.csv -NoTypeInformation
This command reads your last CSV, selects the columns you want to keep, and exports them to a new CSV.
Add a Remove-Item command to remove any CSVs you don't need anymore and you're done.
I know this is old, but I wanted to answer for others looking for this solution. I am trying to use Compare-Object myself on two tables, but I am running into a problem: if one is larger than the other, it runs forever and produces a very large result with lots of duplicates.
Anyway, regarding the above solution, you may want to consider using break when you nest loops for this purpose. It allows the comparison to finish much faster: break tells the inner foreach loop to stop and move on to the next item.
Sorry, this is my first time posting here, so I'm not sure how to format this well.
$ICM= Import-Csv InactiveICMaster.csv
$IC = Import-Csv csv\InactiveIC.csv
$DIS = Import-Csv csv\DisinAD.csv
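foreach ($f in $ic) {
$found = $false
foreach ($g in $icm) {
if ($g.StudentDistrictID -eq $f.StudentDistrictID) {
$found = $true
break # stop scanning the master list once a match is found
}
}
if (-not $found) {
if ($f.masterck -eq "") {
$f.masterck = "IM"
}
$icm += $f
}
}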
$icm | select * | export-csv InactiveICMaster.csv -NoTypeInformation
$icma = import-csv InactiveICMaster.csv
compare-object $icma $dis -property studentdistrictid -passthru|Where-Object {$_.SideIndicator -eq "<="}|select StudentDistrictID,StudentSiteCode,StudentLastName,StudentFirstName,StudentGradeLevel,GraduationYr,Masterck,Homeroom,MiddleName,Birthday,Gender,Email |export-csv inactiveicmastertest.csv -NoTypeInformation
remove-item inactiveicmaster.csv
import-csv inactiveicmastertest.csv|sort StudentDistrictID|export-csv InactiveICMaster.csv -NoTypeInformation
remove-item InactiveICMasterTest.csv
Solution:
$ICM= Import-Csv InactiveICMaster.csv
$IC = Import-Csv csv\InactiveIC.csv
$DIS = Import-Csv csv\DisinAD.csv
foreach ($f in $ic)
{
$found = $false
foreach($g in $icm)
{
if ($g.StudentDistrictID -eq $f.StudentDistrictID)
{
$found = $true
}
}
if ($found -eq $false)
{
$icm += $f
if ($f.masterck -eq "")
{
$f.masterck = "IM"
}
}
}
$icm | select * | export-csv InactiveICMaster.csv -NoTypeInformation
$icma = import-csv InactiveICMaster.csv
compare-object $icma $dis -property studentdistrictid -passthru|Where-Object {$_.SideIndicator -eq "<="}|select StudentDistrictID,StudentSiteCode,StudentLastName,StudentFirstName,StudentGradeLevel,GraduationYr,Masterck,Homeroom,MiddleName,Birthday,Gender,Email |export-csv inactiveicmastertest.csv -NoTypeInformation
remove-item inactiveicmaster.csv
import-csv inactiveicmastertest.csv|sort StudentDistrictID|export-csv InactiveICMaster.csv -NoTypeInformation
remove-item InactiveICMasterTest.csv
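For larger files, an alternative to nested loops and Compare-Object is to key the master list by ID in a dictionary, which makes both the merge and the removal a single pass each. The sketch below is an illustration under assumptions: it uses inline sample data with only three of the columns, and the variable names are invented. A file containing only a header yields no rows, so the loops simply do nothing in that case.

```powershell
# Hypothetical sample data; only three of the real columns are used here.
$master = @'
StudentDistrictID,StudentLastName,Masterck
100,Adams,IM
200,Baker,IM
'@ | ConvertFrom-Csv
$toDeactivate = @'
StudentDistrictID,StudentLastName,Masterck
200,Baker,
300,Clark,
'@ | ConvertFrom-Csv
$toActivate = @'
StudentDistrictID,StudentLastName,Masterck
100,Adams,
'@ | ConvertFrom-Csv

# Index the master list by ID; [ordered] keeps the original row order.
$byId = [ordered]@{}
foreach ($row in $master) { $byId[$row.StudentDistrictID] = $row }

# Add rows that are not in the master list yet.
foreach ($row in $toDeactivate) {
    if (-not $byId.Contains($row.StudentDistrictID)) {
        if ($row.Masterck -eq '') { $row.Masterck = 'IM' }
        $byId[$row.StudentDistrictID] = $row
    }
}

# Remove rows for users that need to be activated again.
foreach ($row in $toActivate) { $byId.Remove($row.StudentDistrictID) }

$result = @($byId.Values)
$result | ConvertTo-Csv -NoTypeInformation
```

With real files, the three sample assignments would be replaced by Import-Csv calls and the last line by Export-Csv with -NoTypeInformation.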

How can I combine fields in a .csv based off of a shared value in powershell?

I have two files in identical formats, one containing destination IP addresses and URLs, and one that contains only the destination IP addresses. I am attempting to write a powershell script to add the URL field from the first file to the second file for that row if the destination IP addresses are equal. Here is an example of the two files:
File Containing URLs:
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:07;168.254.25.6;10.0.1.27;80;jsmith;abcnet
File to add URLs to:
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:09;168.254.25.6;10.0.1.27;80;;
Whenever I run the code below, it appears to be caught in an infinite loop because it does not run to completion, but it throws no errors. My data set is thousands of lines long, but it works when I test it with a sample set that is only a few lines long.
$noURLs = Import-Csv C:\Path\to\noURLs.csv
$containsURLs = Import-Csv C:\Path\to\containsURLs.csv | Select-Object Destination, URL
$outputFile = "C:\Path\to\output.csv"
if(Test-Path $outputFile){
Remove-Item $outputFile
}
foreach($line in $noURLs){
$cpDest = $line.Destination
$destURL = $containsURLs | Where-Object {$_.Destination -eq $cpDest} | Select-Object -ExpandProperty URL | Select-Object -Unique
if($destURL -ne $null){
if( $destURL.Count -gt 1) {
$destURL = $destURL -join ';'
}
}
$line.URL = $destURL
}
$noURLs | Export-Csv $outputFile
I forgot to add the -Unique switch to my Select-Object call, so for every record in the first CSV it was looping through every single line of the second CSV, duplicates included. The fixed code looks like this:
$noURLs = Import-Csv C:\Path\to\noURLs.csv
$containsURLs = Import-Csv C:\Path\to\containsURLs.csv | Select-Object -Unique Destination, URL
$outputFile = "C:\Path\to\output.csv"
if(Test-Path $outputFile){
Remove-Item $outputFile
}
foreach($line in $noURLs){
$cpDest = $line.Destination
$destURL = $containsURLs | Where-Object {$_.Destination -eq $cpDest} | Select-Object -ExpandProperty URL | Select-Object -Unique
if($destURL -ne $null){
if( $destURL.Count -gt 1) {
$destURL = $destURL -join ';'
}
}
$line.URL = $destURL
}
$noURLs | Export-Csv $outputFile -NoTypeInformation
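If the script is still slow on thousands of lines, a further option is to avoid the Where-Object scan per row altogether by building a lookup table keyed on Destination. The following is a sketch with inline sample data modeled on the example rows above; the extra rows, the explicit semicolon delimiter and the variable name $urlsByDest are assumptions for illustration.

```powershell
# Hypothetical sample data modeled on the rows shown in the question.
$containsURLs = @'
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:07;168.254.25.6;10.0.1.27;80;jsmith;abcnet
3/7/2016;0:00:08;168.254.25.6;10.0.1.27;80;jsmith;xyznet
3/7/2016;0:00:08;168.254.25.6;10.0.1.27;80;jsmith;abcnet
'@ | ConvertFrom-Csv -Delimiter ';'
$noURLs = @'
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:09;168.254.25.6;10.0.1.27;80;;
3/7/2016;0:00:10;168.254.25.6;10.0.1.99;80;;
'@ | ConvertFrom-Csv -Delimiter ';'

# One pass over the URL file: Destination -> list of distinct URLs.
$urlsByDest = @{}
foreach ($row in $containsURLs) {
    if (-not $urlsByDest.ContainsKey($row.Destination)) {
        $urlsByDest[$row.Destination] = [System.Collections.Generic.List[string]]::new()
    }
    if (-not $urlsByDest[$row.Destination].Contains($row.URL)) {
        $urlsByDest[$row.Destination].Add($row.URL)
    }
}

# One pass over the target file: each lookup is a hashtable access, not a scan.
foreach ($line in $noURLs) {
    if ($urlsByDest.ContainsKey($line.Destination)) {
        $line.URL = $urlsByDest[$line.Destination] -join ';'
    }
}

$noURLs | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

This replaces the scan of the whole URL list for every row with one hashtable lookup per row.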

PowerShell: Function doesn't have proper return value

I wrote a powershell script to compare the content of two folders:
$Dir1 ="d:\TEMP\Dir1"
$Dir2 ="d:\TEMP\Dir2"
function Test-Diff($Dir1, $Dir2) {
$fileList1 = Get-ChildItem $Dir1 -Recurse | Where-Object {!$_.PsIsContainer} | Get-Item | Sort-Object -Property Name
$fileList2 = Get-ChildItem $Dir2 -Recurse | Where-Object {!$_.PsIsContainer} | Get-Item | Sort-Object -Property Name
if($fileList1.Count -ne $fileList2.Count) {
Write-Host "Following files are different:"
Compare-Object -ReferenceObject $fileList1 -DifferenceObject $fileList2 -Property Name -PassThru | Format-Table FullName
return $false
}
return $true
}
$i = Test-Diff $Dir1 $Dir2
if($i) {
Write-Output "Test OK"
} else {
Write-Host "Test FAILED" -BackgroundColor Red
}
If I set a break point on Compare-Object, and I run this command in console, I get the list of differences. If I run the whole script, I don't get any output. Why?
I'm working in PowerGUI Script Editor, but I tried the normal ps console too.
EDIT:
The problem is the check at the end of the script.
$i = Test-Diff $Dir1 $Dir2
if($i) {
Write-Output "Test OK"
...
If I call Test-Diff without $i = check, it works!
Test-Diff returns with an array of objects and not with an expected bool value:
[DBG]: PS D:\>> $i | ForEach-Object { $_.GetType() } | Format-Table -Property Name
Name
----
FormatStartData
GroupStartData
FormatEntryData
GroupEndData
FormatEndData
Boolean
If I comment out the line with Compare-Object, the return value is a boolean value, as expected.
The question is: why?
I've found the answer here: http://martinzugec.blogspot.hu/2008/08/returning-values-from-fuctions-in.html
A function like this:
Function bar {
[System.Collections.ArrayList]$MyVariable = @()
$MyVariable.Add("a")
$MyVariable.Add("b")
Return $MyVariable
}
returns objects the PowerShell way: @(0,1,"a","b") and not @("a","b"), because the index returned by each Add() call is also written to the output.
To make this function work as expected, you will need to redirect output to null:
Function bar {
[System.Collections.ArrayList]$MyVariable = @()
$MyVariable.Add("a") | Out-Null
$MyVariable.Add("b") | Out-Null
Return $MyVariable
}
In our case, the function has to be refactored as suggested by Koliat.
An alternative to adding Out-Null after every command but the last is doing this:
$i = (Test-Diff $Dir1 $Dir2 | select -last 1)
PowerShell functions return the output of every command executed in the function (as an Object[] when there is more than one item), unless you pipe the command to Out-Null or store the result in a variable. The value of the expression following the return statement is always the last item, so it can be extracted with select -last 1.
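The behavior is easy to reproduce in isolation. In the sketch below, the function names Get-Noisy and Get-Quiet are made up for illustration; the only difference between them is whether the return value of ArrayList.Add() is discarded.

```powershell
function Get-Noisy {
    $list = New-Object System.Collections.ArrayList
    $list.Add('a')          # Add() returns the index 0, which becomes function output
    return $true
}

function Get-Quiet {
    $list = New-Object System.Collections.ArrayList
    [void]$list.Add('a')    # discard the index so it does not leak into the output
    return $true
}

$noisy = Get-Noisy
$quiet = Get-Quiet

"Noisy returned $(@($noisy).Count) objects; last one: $($noisy | Select-Object -Last 1)"
"Quiet returned $(@($quiet).Count) object: $quiet"
```

Because $noisy is a two-element array, if ($noisy) is true regardless of the boolean at the end, which is why discarding unwanted output inside the function is usually the more robust fix.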
I have modified your script a bit to make it run the way you want. I'm not sure you would want to compare files only by the .Count property, but that is outside the scope of this question. If that wasn't what you were looking for, please comment and I'll try to edit this answer. From what I understand, you wanted to run a condition check after the function, but it can easily be implemented inside the function.
$Dir1 ="C:\Dir1"
$Dir2 ="C:\Users\a.pawlak\Desktop\Dir2"
function Test-Diff($Dir1,$Dir2)
{
$fileList1 = Get-ChildItem $Dir1 -Recurse | Where-Object {!$_.PsIsContainer} | Get-Item | Sort-Object -Property Name
$fileList2 = Get-ChildItem $Dir2 -Recurse | Where-Object {!$_.PsIsContainer} | Get-Item | Sort-Object -Property Name
if ($fileList1.Count -ne $fileList2.Count)
{
Write-Host "Following files are different:"
Compare-Object -ReferenceObject $fileList1 -DifferenceObject $fileList2 -Property FullName -PassThru | Format-Table FullName
Write-Host "Test FAILED" -BackgroundColor Red
}
else
{
Write-Host "Test OK" # must come before return; statements after return never run
return $true
}
}
Test-Diff $Dir1 $Dir2
If there is anything unclear, let me know
AlexP