How To Access Specific Rows in an Import-Csv Array?

I need to split a large file upload into many parallel processes and want to use a single CSV file as input.
Is it possible to access blocks of rows from an Import-Csv object, something like this:
$SODAData = Import-Csv $CSVPath -Delimiter "|" |
Where $_.Rownum == 20,000..29,999 |
Foreach-Object { ... }
What is the syntax for such an extraction?
I'm using PowerShell 5.

Import-Csv imports the file as an array of objects, so you could do something like this (using the range operator):
$csv = Import-Csv $CSVPath -Delimiter '|'
$SODAData = $csv[20000..29999] | ForEach-Object { ... }
An alternative would be to use Select-Object:
$offset = 20000
$count = 10000
$csv = Import-Csv $CSVPath -Delimiter '|'
$SODAData = $csv |
Select-Object -Skip $offset -First $count |
ForEach-Object { ... }
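Since the goal is to feed several parallel processes, here is a minimal sketch that cuts one imported array into fixed-size blocks with the range operator. The sample rows, `$chunkSize` and `$chunks` are hypothetical; in practice `$csv` would come from `@(Import-Csv $CSVPath -Delimiter '|')`, and each block would be handed to a separate job or process.

```powershell
# Hypothetical sample data standing in for: @(Import-Csv $CSVPath -Delimiter '|')
$csv = 1..25 | ForEach-Object { [pscustomobject]@{ Id = $_; Name = "row$_" } }

$chunkSize = 10   # rows per block (assumed value)
$chunks = New-Object System.Collections.Generic.List[object]

for ($start = 0; $start -lt $csv.Count; $start += $chunkSize) {
    # Clamp the end index to the last valid row so the final block is not overrun.
    $end = [Math]::Min($start + $chunkSize, $csv.Count) - 1
    # Add the whole slice as one element of the list.
    $chunks.Add(@($csv[$start..$end]))
}

# Summarize each block; here they would be dispatched to workers instead.
$chunks | ForEach-Object { "{0} rows, first Id {1}" -f $_.Count, $_[0].Id }
```

Reading the file once and slicing in memory avoids re-reading it per chunk, at the cost of holding the whole file in memory.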
If you want to avoid reading the entire file into memory you can change the above to a single pipeline:
$offset = 20000
$count = 10000
$SODAData = Import-Csv $CSVPath -Delimiter '|' |
Select-Object -Skip $offset -First $count |
ForEach-Object { ... }
Beware, though, that with this approach you need to read the file multiple times for processing multiple chunks of data.
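The extra reads are partly mitigated for early chunks: in PowerShell 3.0 and later, Select-Object -First stops the upstream commands once it has emitted enough objects, so the rest of the file is not read. A later chunk still has to read and discard every row before its offset. This sketch uses a counting ForEach-Object as a hypothetical stand-in for Import-Csv to make the early stop visible.

```powershell
# Count how many objects the upstream stage actually produces.
$script:produced = 0

$chunk = 1..100000 |
    ForEach-Object { $script:produced++; $_ } |   # stands in for Import-Csv
    Select-Object -Skip 20 -First 10

"Chunk: $($chunk[0])..$($chunk[-1]); items produced upstream: $script:produced"
```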

Related

Where-Object with complex evaluation

I have a PowerShell script where I read in a CSV file, and if the date in a certain column is greater than a parameter date, I output that row to a new file.
Currently, I read the CSV file and pipe it to ForEach-Object; if a row "passes", I store it in an ArrayList. When all the rows are processed, I write the ArrayList to an output CSV file. My input CSV file is 225 MB with over a quarter million rows, so this process is slow.
Is there a way I can add a filter function to my pipeline so that only the passing rows are written to the output CSV in one fell swoop? As far as I can tell, Where-Object only supports things like -like and -contains, not more complex forms of evaluation.
For reference, here is my code:
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip 1 |
ForEach-Object {
#Skip the header
if( $lineCounter -eq 1)
{
return
}
$newDate = if ([string]::IsNullOrEmpty($_.1) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.1,"yyyyMMdd",$null) }
$updateDate = if ([string]::IsNullOrEmpty($_.2) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.2,"yyyyMMdd",$null) }
$distanceDate = (Get-Date).AddDays($daysBack * -1)
if( $newDate -gt $distanceDate -or $updateDate -gt $distanceDate )
{
[void]$filteredArrayList.Add($_)
}
}
...
$filteredArrayList |
ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
select -Skip 1 |
% { $_ -replace '"', ""} |
out-file $ouputFile -fo -en unicode -Append
I've added ConvertToDate as a function so the date parsing doesn't clutter the Where-Object block.
$distanceDate is calculated before the pipeline because it only needs to be calculated once.
ExportCsv is a little function that writes pipeline input to a file.
I haven't tested it, so bugs are quite likely unless I got lucky.
function ConvertToDate {
param(
[String]$DateString
)
if ($DateString -eq '') {
return [DateTime]::MinValue
} else {
return [DateTime]::ParseExact($DateString, "yyyyMMdd", $null)
}
}
filter ExportCsv {
param(
[Parameter(Position = 1)]
[String]$Path
)
$csv = $_ | ConvertTo-Csv -Delimiter "`t" | Select-Object -Last 1
$csv -replace '"' | Out-File $Path -Append -Encoding Unicode -Force
}
$distanceDate = (Get-Date).AddDays($daysBack * -1)
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip 1 |
Where-Object { (ConvertToDate $_.1) -gt $distanceDate -or (ConvertToDate $_.2) -gt $distanceDate } |
ExportCsv $OutputFile
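To try the Where-Object approach without a real file, here is a self-contained sketch that pushes a few hypothetical tab-delimited rows through ConvertFrom-Csv (in place of Import-Csv) and applies the same date test. The column names 0, 1 and 2 and the value of $daysBack are assumptions modelled on the question; the properties are accessed as $_.'1' to avoid any ambiguity with numeric names.

```powershell
function ConvertToDate {
    param([String]$DateString)
    # Empty cells are treated as "no date", so they never pass the filter.
    if ($DateString -eq '') { return [DateTime]::MinValue }
    return [DateTime]::ParseExact($DateString, 'yyyyMMdd', $null)
}

$daysBack = 30   # assumed value
$distanceDate = (Get-Date).AddDays(-$daysBack)
$recent = (Get-Date).AddDays(-1).ToString('yyyyMMdd')
$old = (Get-Date).AddDays(-400).ToString('yyyyMMdd')

# Hypothetical rows: id, "new" date, "update" date (tab-separated).
$lines = @(
    "a`t$recent`t"
    "b`t$old`t$recent"
    "c`t$old`t$old"
    "d`t`t"
)

$passing = $lines |
    ConvertFrom-Csv -Delimiter "`t" -Header '0', '1', '2' |
    Where-Object { (ConvertToDate $_.'1') -gt $distanceDate -or (ConvertToDate $_.'2') -gt $distanceDate }

# Show the ids of the rows that passed.
$passing | ForEach-Object { $_.'0' }
```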
Sure, just add a function that takes a value from the pipeline, and pipe the result of Import-Csv to it. Within the function, you check whether the current item should be passed on or not. Here is a simple example that uses a list of strings and passes on only the strings that start with h:
$x = @('hello', 'world', 'hello', 'tree')
function Filter-CsvByMyRequirements
{
Param(
[Parameter(Mandatory=$true,
ValueFromPipeline=$true)]
$InputObject
)
Process
{
# Emit the current item only if it starts with "h"
if ($InputObject -match '^h.*')
{
$InputObject
}
}
}
$x | Filter-CsvByMyRequirements | Write-Host
Output:
hello
hello
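The same idea works for CSV rows, since Import-Csv emits objects: the function receives each row through $InputObject and decides whether to pass it on. Where-Object's script block is not limited to -like or -contains either; any expression that evaluates to true or false works. In this sketch the Name and Size columns, the threshold, and the function name Select-LargeRow are hypothetical.

```powershell
function Select-LargeRow {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $InputObject,
        [int]$MinSize = 100   # hypothetical threshold
    )
    process {
        # Any PowerShell logic can go here; emit the row only if it passes.
        if ([int]($InputObject.Size) -ge $MinSize -and $InputObject.Name -match '^h') {
            $InputObject
        }
    }
}

# Hypothetical CSV content standing in for Import-Csv output.
$rows = @'
Name,Size
hello,150
world,300
help,20
hat,100
'@ | ConvertFrom-Csv

$viaFunction = $rows | Select-LargeRow
# The same test written directly as a Where-Object script block.
$viaWhere = $rows | Where-Object { [int]($_.Size) -ge 100 -and $_.Name -match '^h' }

$viaFunction.Name
```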

PowerShell: Delete multiple entries from a CSV file

I have a script that works intensively with CSV files, which store different kinds of data. At some point I want to delete entries from these files.
When I want to delete a single entry I do it like this:
$csv = Import-Csv -Path $path -Delimiter ";"
$selectedEntry = $csv | Out-GridView -Title $title -OutputMode Single
$csv = $csv -notmatch $selectedEntry
$csv | Export-Csv $path -NoTypeInformation -Delimiter ";"
This approach works reliably, but if I change the -OutputMode parameter from Single to Multiple, the following line doesn't work anymore:
$csv = $csv -notmatch $selectedEntry
Why is this so? How can I delete multiple entries from a CSV file?
You should be able to do this with an array differencing Where-Object clause.
$selectedEntry = $csv | Out-GridView -Title $title -OutputMode Multiple
$filteredCsv = $csv | where { $selectedEntry -notcontains $_ }
This iterates over each element of the $csv array, and only produces the ones that are not in the $selectedEntry array.
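Because Out-GridView is interactive, this sketch simulates a multiple selection by picking two rows by index; it assumes, as the answer does, that Out-GridView returns the same row objects it was given, so -notcontains can match them. As for why -notmatch stopped working: it most likely treats its right-hand operand as a single regex pattern, so one selected row's string form happens to match itself, while an array of rows is converted to one combined string that matches none of the individual rows. The sample data is hypothetical.

```powershell
# Hypothetical data standing in for: Import-Csv -Path $path -Delimiter ";"
$csv = @'
Name;Id
alpha;1
beta;2
gamma;3
delta;4
'@ | ConvertFrom-Csv -Delimiter ';'

# Simulated multiple selection; Out-GridView -OutputMode Multiple would return
# the chosen row objects themselves.
$selectedEntry = $csv[1], $csv[3]

# Keep only the rows that are not among the selected objects.
$filteredCsv = $csv | Where-Object { $selectedEntry -notcontains $_ }

# $filteredCsv | Export-Csv $path -NoTypeInformation -Delimiter ";" would save it.
$filteredCsv.Name
```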

Loop through a CSV file and verify column count for each row

I'm new to PowerShell and have been trying to loop through a CSV file and return the column count of each row, compare that count to the first row's, and have something happen if they are not equal (in this case, remove the commas). Then create a new file with the changes.
$csvColumnCount = (import-csv "a CSV file" | get-member -type NoteProperty).count
$CurrentFile = Get-Content "a CSV file" |
ForEach-Object { $CurrentLineCount = (import-csv "a CSV file" | get-member -type NoteProperty).count
$Line = $_
if ($csvColumnCount -ne $CurrentLineCount)
{ $Line -Replace "," , "" }
else
{ $Line } ;
$CurrentLineCount++} |
Set-Content ($CurrentFile+".out")
Copy-Item ($CurrentFile+".out") $ReplaceCSVFile
If your intention is to check which rows of a CSV file are invalid then just use a simple split and count, something like so:
$csv = Get-Content 'your_file.csv'
$count = ($csv[0] -split ',').count
$csv | Select -Skip 1 | % {
if(($_ -split ',').count -eq $count) {
# ...do valid stuff
} else {
# ...do invalid stuff
}
}
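Applied to the original goal (strip the commas from any row whose field count differs from the header's), the split-and-count check could look like this. The in-memory lines stand in for Get-Content and the output file name is hypothetical; like the snippet above, it does not handle quoted fields that contain commas.

```powershell
# Hypothetical lines standing in for: Get-Content 'your_file.csv'
$csv = @(
    'a,b,c'
    '1,2,3'
    '4,5,6,7'
    '8,9'
)

$count = ($csv[0] -split ',').Count

# Keep the header, then fix up each data row.
$fixed = @($csv[0]) + @($csv | Select-Object -Skip 1 | ForEach-Object {
    if (($_ -split ',').Count -eq $count) {
        $_                   # valid row: keep unchanged
    } else {
        $_ -replace ',', ''  # invalid row: remove the commas, as the question asks
    }
})

# $fixed | Set-Content 'your_file.out.csv' would write the result.
$fixed
```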
For CSV validation, avoid the CSV cmdlets, because they tend to silently work around problems rather than report them. For example, the extra fourth value here is simply dropped:
$x = @"
a,b,c
1,2,3,4
"@
$x | ConvertFrom-Csv
a b c
- - -
1 2 3
Also, I think the flow of your code is a little confused. You are trying to assign the results of a pipeline to a variable called $CurrentFile, while at the other end of that pipeline you are trying to use the same variable as a file name for Set-Content.
If your CSV has quoted fields that could contain commas, a simple split will not work. In that case, a better option is to use a regex to break each line into fields, which can then be counted. Something like this:
$re = '(?:^|,)(?:\"(?:[^\"]+|\"\")*\"|[^,]*)'
$csv = Get-Content 'your_file.csv'
$count = [regex]::matches($csv[0], $re).groups.count
$csv | Select -Skip 1 | % {
if([regex]::matches($_, $re).groups.count -eq $count) {
# ...do valid stuff
} else {
# ...do invalid stuff
}
}
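To check the quoted-field regex on its own, here is a small helper (the name Get-CsvFieldCount is hypothetical) that counts the regex matches on one line. It reads MatchCollection.Count directly, which for this pattern equals the number of fields, since every match is exactly one field.

```powershell
function Get-CsvFieldCount {
    param([string]$Line)
    # Each match is one field: a leading comma (or start of line), then either
    # a quoted value (with "" as an escaped quote) or an unquoted run of non-commas.
    $re = '(?:^|,)(?:\"(?:[^\"]+|\"\")*\"|[^,]*)'
    return [regex]::Matches($Line, $re).Count
}

Get-CsvFieldCount 'a,b,c'              # plain fields
Get-CsvFieldCount 'a,"b,c",d'          # quoted comma stays inside one field
Get-CsvFieldCount 'x,"say ""hi""",z'   # doubled quotes inside a quoted field
```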

How can I combine fields in a .csv based on a shared value in PowerShell?

I have two files in identical formats, one containing destination IP addresses and URLs, and one that contains only the destination IP addresses. I am attempting to write a PowerShell script that adds the URL field from the first file to the matching row of the second file when the destination IP addresses are equal. Here is an example of the two files:
File Containing URLs:
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:07;168.254.25.6;10.0.1.27;80;jsmith;abcnet
File to add URLs to:
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:09;168.254.25.6;10.0.1.27;80;;
Whenever I run the code below, it appears to be stuck in an infinite loop: it never runs to completion, but it throws no errors. My data set is thousands of lines long, but the code works when I test it with a sample set that is only a few lines long.
$noURLs = Import-Csv C:\Path\to\noURLs.csv
$containsURLs = Import-Csv C:\Path\to\containsURLs.csv | Select-Object Destination, URL
$outputFile = "C:\Path\to\output.csv"
if(Test-Path $outputFile){
Remove-Item $outputFile
}
foreach($line in $noURLs){
$cpDest = $line.Destination
$destURL = $containsURLs | Where-Object {$_.Destination -eq $cpDest} | Select-Object -ExpandProperty URL | Select-Object -Unique
if($destURL -ne $null){
if( $destURL.Count -gt 1) {
$destURL = $destURL -join ';'
}
}
$line.URL = $destURL
}
$noURLs | Export-Csv $outputFile
I forgot to add the -Unique switch to Select-Object, so for every record in the first CSV, it was looping through every line of the second CSV. The fixed code looks like this:
$noURLs = Import-Csv C:\Path\to\noURLs.csv
$containsURLs = Import-Csv C:\Path\to\containsURLs.csv | Select-Object -Unique Destination, URL
$outputFile = "C:\Path\to\output.csv"
if(Test-Path $outputFile){
Remove-Item $outputFile
}
foreach($line in $noURLs){
$cpDest = $line.Destination
$destURL = $containsURLs | Where-Object {$_.Destination -eq $cpDest} | Select-Object -ExpandProperty URL | Select-Object -Unique
if($destURL -ne $null){
if( $destURL.Count -gt 1) {
$destURL = $destURL -join ';'
}
}
$line.URL = $destURL
}
$noURLs | Export-Csv $outputFile -NoTypeInformation
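The remaining cost is that Where-Object scans all of $containsURLs once per row of $noURLs. A common alternative, swapped in here rather than taken from the answer, is to build a hashtable keyed on Destination once and then look each row up directly. This sketch uses small in-memory samples in the question's semicolon-delimited format; the second URL (defnet) and the address 10.0.1.99 are hypothetical.

```powershell
$containsURLs = @'
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:07;168.254.25.6;10.0.1.27;80;jsmith;abcnet
3/7/2016;0:00:08;168.254.25.6;10.0.1.27;80;jsmith;defnet
3/7/2016;0:00:08;168.254.25.6;10.0.1.27;80;jsmith;abcnet
'@ | ConvertFrom-Csv -Delimiter ';'

$noURLs = @'
Date;Time;Source;Destination;Port;User;URL
3/7/2016;0:00:09;168.254.25.6;10.0.1.27;80;;
3/7/2016;0:00:10;168.254.25.6;10.0.1.99;80;;
'@ | ConvertFrom-Csv -Delimiter ';'

# Build the lookup once: Destination -> list of distinct URLs.
$urlsByDest = @{}
foreach ($row in $containsURLs) {
    if (-not $urlsByDest.ContainsKey($row.Destination)) {
        $urlsByDest[$row.Destination] = New-Object System.Collections.Generic.List[string]
    }
    if (-not $urlsByDest[$row.Destination].Contains($row.URL)) {
        $urlsByDest[$row.Destination].Add($row.URL)
    }
}

# One hashtable lookup per row instead of a full scan of $containsURLs.
foreach ($line in $noURLs) {
    if ($urlsByDest.ContainsKey($line.Destination)) {
        $line.URL = $urlsByDest[$line.Destination] -join ';'
    }
}

$noURLs | Select-Object Destination, URL
```

Building the table is one pass over the URL file, so the total work grows roughly with the sum of the two file sizes rather than their product.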

Import-Csv TrimEnd Column Header

I need to import a CSV and run it through a foreach loop. I want to trim the end of the column header DeviceName to avoid any potential issues. I have tried the following, but it is not working as expected.
$Import = Import-CSV $csv
foreach ($i in ($import.DeviceName).TrimEnd())
{do something}
Any help? Thank you!
If you need to trim both the header and the contents of the DeviceName column, which has extra spaces, I have come up with this forgiving code:
$csvData = import-csv $csv
$properties = $csvData[0].psobject.Properties.name
$csvHeader = "`"$(($properties | ForEach-Object{$_.Trim()}) -join '","')`""
$deviceHeader = $properties -match "DeviceName"
$csvHeader
$csvHeader | Set-Content $file
$csvData | ForEach-Object{
$_.$deviceHeader = ($_.$deviceHeader).trim()
$_
} | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Add-Content $file
This reads in the CSV as normal and parses the property names of the object in the order they appear. We find the one that contains DeviceName no matter how many spaces it has (if more than one property matches, you could have a problem) and keep it so we can use it to access the correct property of each "row".
We then write the cleaned header to the file and go through each "row", removing all leading and trailing whitespace from the DeviceName value. Once that is done, the CSV rows are written back to the original file.
The best solution would be to tell the other team to fix their generation procedure. However, if for some reason that's not an option, I'd recommend pre-processing the file before you import it as a CSV.
$filename = 'C:\path\to\your.csv'
(Get-Content $filename -Raw) -replace '^(.*DeviceName)[ ]*(.*)', '$1$2' |
Set-Content $filename
Reading the file as a single string (-Raw) and anchoring the expression at the beginning of the string (^) ensures that only the column title is replaced.
For large input files you may want to consider a different approach, though, since the above reads the entire file into memory before replacing the first line.
$infile = 'C:\path\to\input.csv'
$outfile = 'C:\path\to\output.csv'
$firstLine = $true
Get-Content $infile | % {
if ($firstLine) {
$_ -replace '(DeviceName)[ ]*', '$1'
$firstLine = $false
} else {
$_
}
} | Set-Content $outfile
Thinking about it some more and taking inspiration from a comment to @Zeek's answer, you could also extract the headers first and then convert the rest of the file.
$infile = 'C:\path\to\input.csv'
$outfile = 'C:\path\to\output.csv'
$header = (Get-Content $infile -First 1) -split '\s*,\s*'
Get-Content $infile |
select -Skip 1 |
ConvertFrom-Csv -Header $header |
Export-Csv $outfile -NoType
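The header-first idea can also be tried in memory. In this sketch the sample lines, with stray spaces after DeviceName, are hypothetical and stand in for Get-Content $infile; ConvertTo-Csv is used only to display the cleaned result instead of writing $outfile.

```powershell
# Hypothetical file contents; note the trailing spaces after DeviceName.
$lines = @(
    'DeviceName   ,Owner'
    'PC01,alice'
    'PC02,bob'
)

# Split the first line into column names, dropping whitespace around the commas.
$header = $lines[0] -split '\s*,\s*'

# Parse the remaining lines using the cleaned names as the header.
$rows = $lines |
    Select-Object -Skip 1 |
    ConvertFrom-Csv -Header $header

$rows | ConvertTo-Csv -NoTypeInformation
```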
Is this all you're trying to do? This will give you a collection of objects imported from your CSV file, with trailing whitespace trimmed from the DeviceName property of each object.
$items = Import-CSV -Path $csv
$items.ForEach({ $_.DeviceName = $_.DeviceName.TrimEnd() })