Compare 2 CSV files and write all differences

Compare 2 CSV files and write all differences - csv

I have 3 CSV files that contain user information. CSV1 is a "master" list of all inactive users. CSV2 is a current list of users that need to be deactivated and CSV3 is a list of users that need to be activated.
What I want is to have a PowerShell script that can be called from another script (the one that creates CSV2/3) to have it compare CSV1/2 and write all unique records back to CSV1. Then I want it to compare CSV1/3 and remove all records in CSV1 that exist in CSV3. CSV2/3 can change daily and it is possible to have no data in them, other than the header.
There are several unique fields, but I would want to compare on 'EmployeeID'.
All 3 CSV files have headers (same headers in all of them, so the data is consistent).
What I have ended up with so far will add the records from CSV2 to CSV1, but it adds both headers.
$ICM= Import-Csv inactiveicmaster.csv -Header 'StudentDistrictID', 'StudentSiteCode', 'StudentLastName', 'StudentFirstName', 'StudentGradeLevel', 'GraduationYr', 'Masterck', 'Homeroom', 'MiddleName', 'Birthday', 'Gender', 'Email'
$IC = Import-Csv csv\inactiveic.csv -Header 'StudentDistrictID', 'StudentSiteCode', 'StudentLastName', 'StudentFirstName', 'StudentGradeLevel', 'GraduationYr', 'Masterck', 'Homeroom', 'MiddleName', 'Birthday', 'Gender', 'Email'
$DIS = Import-Csv csv\disinad.csv -Header 'StudentDistrictID', 'StudentSiteCode', 'StudentLastName', 'StudentFirstName', 'StudentGradeLevel', 'GraduationYr', 'Masterck', 'Homeroom', 'MiddleName', 'Birthday', 'Gender', 'Email'
foreach ($f in $ic) {
$found = $false
foreach ($g in $icm) {
if ($g.StudentDistrictID -eq $f.StudentDistrictID) {
$found = $true
}
}
if ($found -eq $false) {
$icm += $f
if ($f.masterck -eq "") {
$f.masterck = "IM"
}
}
}
<#
foreach ($h in $dis) {
$found = $false
foreach ($g in $icm) {
if ($g.studentdistrictid -eq $h.studentdistrictid) {
$found = $true
}
if ($found -ne $false) {
#don't know what to do here to remove the duplicate
}
}
}
#>
$icm | select * | Export-Csv master.csv -NoTypeInformation

I don't know the exact answer but can't you do something like this?
$file1 = import-csv -Path "C:\temp\Test1.csv"
$file2 = import-csv -Path "C:\temp\Test2.csv"
Compare-Object $file1 $file2 -property MPFriendlyName
look at this link for complete example and result : Compare csv with same headers
If you know the differences it is easy enough to write them in the other csv.
Edit:
I don't have much experience with compare-objects but since it is a csv you can just delete the column with this.
Import-Csv C:\fso\csv1.csv | select ColumnYouWant1,ColumnYouWant2| Export-Csv -Path c:\fso\csvResult.csv –NoTypeInformation
This command will read your last csv and select the columns you want to keep and export it to a new csv.
Add a remote-item command to remove any csv's you don't need anymore and your done.

I know this is old but wanted to answer for others looking for this solution. I am trying to use Compare-Object myself because the two matrices but am running into a problem where if one is larger than the other it runs forever making a very larger matrix with lots of dupes.
Any who, to the above solution, you may want to consider using a break when you nest loops for this purpose. It'll allow you to compare much faster. Break will tell the 2nd for-each loop to stop and move on to the next item.
Sorry, first time posting on here. not sure how to format well and I gotta get back to action.
$ICM= Import-Csv InactiveICMaster.csv
$IC = Import-Csv csv\InactiveIC.csv
$DIS = Import-Csv csv\DisinAD.csv
foreach ($f in $ic)
foreach($g in $icm){
if ($g.StudentDistrictID -eq $f.StudentDistrictID){
break
}else{
$icm += $f
if ($f.masterck -eq ""){
$f.masterck = "IM"
}
}
}
$icm | select * | export-csv InactiveICMaster.csv -NoTypeInformation
$icma = import-csv InactiveICMaster.csv
compare-object $icma $dis -property studentdistrictid -passthru|Where-Object {$_.SideIndicator -eq "<="}|select StudentDistrictID,StudentSiteCode,StudentLastName,StudentFirstName,StudentGradeLevel,GraduationYr,Masterck,Homeroom,MiddleName,Birthday,Gender,Email |export-csv inactiveicmastertest.csv -NoTypeInformation
remove-item inactiveicmaster.csv
import-csv inactiveicmastertest.csv|sort StudentDistrictID|export-csv InactiveICMaster.csv -NoTypeInformation
remove-item InactiveICMasterTest.csv

Solution:
$ICM= Import-Csv InactiveICMaster.csv
$IC = Import-Csv csv\InactiveIC.csv
$DIS = Import-Csv csv\DisinAD.csv
foreach ($f in $ic)
{
$found = $false
foreach($g in $icm)
{
if ($g.StudentDistrictID -eq $f.StudentDistrictID)
{
$found = $true
}
}
if ($found -eq $false)
{
$icm += $f
if ($f.masterck -eq "")
{
$f.masterck = "IM"
}
}
}
$icm | select * | export-csv InactiveICMaster.csv -NoTypeInformation
$icma = import-csv InactiveICMaster.csv
compare-object $icma $dis -property studentdistrictid -passthru|Where-Object {$_.SideIndicator -eq "<="}|select StudentDistrictID,StudentSiteCode,StudentLastName,StudentFirstName,StudentGradeLevel,GraduationYr,Masterck,Homeroom,MiddleName,Birthday,Gender,Email |export-csv inactiveicmastertest.csv -NoTypeInformation
remove-item inactiveicmaster.csv
import-csv inactiveicmastertest.csv|sort StudentDistrictID|export-csv InactiveICMaster.csv -NoTypeInformation
remove-item InactiveICMasterTest.csv

Related

Where-Object with complex evaluation

I have a PowerShell script where I read in a CSV file, and if the date in a certain column is greater than a parameter date, I output that row to a new file.
As of now, I read the CSV file and then pipe to a ForEach-Object where if the row "passes" I store it in an Arraylist. Then when all the rows are processed, I output the Arraylist to an output CSV file. My starting CSV file is 225MB with over a quarter million rows, meaning that this process is slow.
Is there a way I can add a filter function to my piping so that only the passing rows are passed to the output CSV in one fell swoop? The current Where-Object just uses things like -like, -contains... and not more complex forms of evaluation.
For best practices, I've got my code below:
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip(1) |
ForEach-Object {
#Skip the header
if( $lineCounter -eq 1)
{
return
}
$newDate = if ([string]::IsNullOrEmpty($_.1) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.1,”yyyyMMdd”,$null) }
$updateDate = if ([string]::IsNullOrEmpty($_.2) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.2,”yyyyMMdd”,$null) }
$distanceDate = (Get-Date).AddDays($daysBack * -1)
if( $newDate -gt $distanceDate -or $updateDate -gt $distanceDate )
{
[void]$filteredArrayList.Add($_)
}
}
...
$filteredArrayList |
ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
select -Skip 1 |
% { $_ -replace '"', ""} |
out-file $ouputFile -fo -en unicode -Append

I've added ConvertToDate as a function to stop that confusing the Where block.
DistanceDate is out because it appears to be calculated only once.
ExportCsv is a little function that writes pipeline input to a file.
I haven't tested it, so bugs are quite likely unless I got lucky.
function ConvertToDate {
param(
[String]$DateString
)
if ($DateString -eq '') {
return [DateTime]::MinValue
} else {
return [DateTime]::ParseExact($DateString, ”yyyyMMdd”, $null)
}
}
filter ExportCsv {
param(
[Parameter(Position = 1)]
[String]$Path
)
$csv = $_ | ConvertTo-Csv -Delimiter "`t" | Select-Object -Last 1
$csv -replace '"' | Out-File $Path -Append -Encoding Unicode -Force
}
$distanceDate = (Get-Date).AddDays($daysBack * -1)
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip 1 |
Where-Object { (ConvertToDate $_.1) -gt $distanceDate -or (ConvertToDate $_.2) -gt $distanceDate } |
ExportCsv $OutputFile

Sure, just add a function that takes a value from the pipeline and pipe the result of Import-Csv to it. Within the function you check whether you want to filter the current item or not. Here a simple example which uses a string list and filter all strings that starts with h:
$x = #('hello', 'world', 'hello', 'tree')
filter Filter-CsvByMyRequirements
{
Param(
[Parameter(Mandatory=$true,
ValueFromPipeline=$true)]
$InputObject
)
Process
{
if ($_ -match '^h.*')
{
$_
}
}
}
$x | Filter-CsvByMyRequirements | Write-Host
Output:
hello
hello

Convert tab delimiter to semicolon

I have updated a piece of software for our T&A system, this produces a CSV file in tab-delimited format. The payroll software needs this in the older format which was semicolon-delimited. I have been in touch with both vendors and neither one has a way to accommodate the other so I need to convert the CSV file to suit the payroll software. I have tried to do this with PowerShell with mixed results.
First I tried
Import-Csv ".\desktop\new version.csv" -Delimiter `t |
Export-Csv ".\converted.csv" -NoTypeInf
which removed the tab delimiter but didn't do the ;. So I then tried
Import-Csv ".\desktop\new version.csv" -Delimiter `t |
Export-Csv ".\desktop\converted.csv" -NoTypeInformation -Delimiter ";"
which did convert it from tabbed to ;, but only for the headers. It totally ignored the rest of the data. I then tried a different approach and used
$path = ".\desktop\new.csv"
$outPath = ".\desktop\converted.csv"
Get-Content -path $path |
ForEach-Object {$_ -replace "`t",";" } |
Out-File -filepath $outPath
which formatted the file correctly, but put an extra empty row between each row of data. I'm not sure what I'm doing wrong!

I'm pretty sure you are having an encoding issue with your last example. Get-Content reads in as Ascii whereas Out-File defaults to Unicode. Either set the -Encoding on Out-File or just use Set-Content.
Get-Content -path $path |
ForEach-Object {$_ -replace "`t",";" } |
Set-Content -filepath $outPath
You could even trim this down a bit if need be.
(Get-Content -path $path) -replace "`t",";" | Set-Content -filepath $outPath
However your 2nd code example...
Import-Csv ".\desktop\new version.csv" -Delimiter `t | Export-Csv ".\desktop\converted.csv" -NoTypeInformation -Delimiter ";"
should have worked just fine to replacing the tabs to semicolons. If it is not working then I would think your source data has an issue.
About the source file
Based on comments the code above is creating a trailing column. Most likely reason for that is trailing tabs on each row that are being converted. If that is the case then a little more manipulation would be required. Easier to use the foreach loop in this case.
Get-Content -path $path |
ForEach-Object {$_.Trim() -replace "`t",";" } |
Set-Content -filepath $outPath
That would remove the last tab/whitespace of each line. There is a potential enormous caveat doing it this way though. I think it has the potential to drop data if you have empty columns on the end. However if those columns were already empty it should not matter as long as the header is formed well and the input program can account for this. Else you are looking at reading in the file with Import-CSV and dropping the last column which can be done.

Here's a function I used to replace strings in text files like you're doing. This is assuming there's no tabs inside the text file other than those that are delimiting the columns. I'm assuming there's not. You can use it like this:
Find-InTextFile -FilePath C:\MyFile.csv -Find "`t" -Replace ';'
function Find-InTextFile
{
<#
.SYNOPSIS
Performs a find (or replace) on a string in a text file or files.
.EXAMPLE
PS> Find-InTextFile -FilePath 'C:\MyFile.txt' -Find 'water' -Replace 'wine'
Replaces all instances of the string 'water' into the string 'wine' in
'C:\MyFile.txt'.
.EXAMPLE
PS> Find-InTextFile -FilePath 'C:\MyFile.txt' -Find 'water'
Finds all instances of the string 'water' in the file 'C:\MyFile.txt'.
.PARAMETER FilePath
The file path of the text file you'd like to perform a find/replace on.
.PARAMETER Find
The string you'd like to replace.
.PARAMETER Replace
The string you'd like to replace your 'Find' string with.
.PARAMETER UseRegex
Use this switch parameter if you're finding strings using regex else the Find string will
be escaped from regex characters
.PARAMETER NewFilePath
If a new file with the replaced the string needs to be created instead of replacing
the contents of the existing file use this param to create a new file.
.PARAMETER Force
If the NewFilePath param is used using this param will overwrite any file that
exists in NewFilePath.
#>
[CmdletBinding(DefaultParameterSetName = 'NewFile')]
param (
[Parameter(Mandatory = $true)]
[ValidateScript({ Test-Path -Path $_ -PathType 'Leaf' })]
[string[]]$FilePath,
[Parameter(Mandatory = $true)]
[string]$Find,
[Parameter()]
[string]$Replace,
[Parameter()]
[switch]$UseRegex,
[Parameter(ParameterSetName = 'NewFile')]
[ValidateScript({ Test-Path -Path ($_ | Split-Path -Parent) -PathType 'Container' })]
[string]$NewFilePath,
[Parameter(ParameterSetName = 'NewFile')]
[switch]$Force
)
begin
{
if (!$UseRegex.IsPresent)
{
$Find = [regex]::Escape($Find)
}
}
process
{
try
{
foreach ($File in $FilePath)
{
if ($Replace)
{
if ($NewFilePath)
{
if ((Test-Path -Path $NewFilePath -PathType 'Leaf') -and $Force.IsPresent)
{
Remove-Item -Path $NewFilePath -Force
(Get-Content $File) -replace $Find, $Replace | Add-Content -Path $NewFilePath -Force
}
elseif ((Test-Path -Path $NewFilePath -PathType 'Leaf') -and !$Force.IsPresent)
{
Write-Warning "The file at '$NewFilePath' already exists and the -Force param was not used"
}
else
{
(Get-Content $File) -replace $Find, $Replace | Add-Content -Path $NewFilePath -Force
}
}
else
{
(Get-Content $File) -replace $Find, $Replace | Add-Content -Path "$File.tmp" -Force
Remove-Item -Path $File
Rename-Item -Path "$File.tmp" -NewName $File
}
}
else
{
Select-String -Path $File -Pattern $Find
}
}
}
catch
{
Write-Error -Message $_.Exception.Message
}
}
}

Replace blank characters from a file line by line

I would like to be able to find all blanks from a CSV file and if a blank character is found on a line then should appear on the screen and I should be asked if I want to keep the entire line which contains that white space or remove it.
Let's say the directory is C:\Cr\Powershell\test. In there there is one CSV file abc.csv.
Tried doing it like this but in PowerShell ISE the $_.PSObject.Properties isn't recognized.
$csv = Import-Csv C:\Cr\Powershell\test\*.csv | Foreach-Object {
$_.PSObject.Properties | Foreach-Object {$_.Value = $_.Value.Trim()}
}
I apologize for not includding more code and what I tried more so far but they were silly attempts since I just begun.
This looks helpful but I don't know exactly how to adapt it for my problem.

Ok man here you go:
$yes = New-Object System.Management.Automation.Host.ChoiceDescription "&Yes", "Retain line."
$no = New-Object System.Management.Automation.Host.ChoiceDescription "&No", "Delete line."
$n = #()
$f = Get-Content .\test.csv
foreach($item in $f) {
if($item -like "* *"){
$res = $host.ui.PromptForChoice("Title", "want to keep this line? `n $item", [System.Management.Automation.Host.ChoiceDescription[]]($yes, $no), 0)
switch ($res)
{
0 {$n+=$item}
1 {}
}
} else {
$n+=$item
}
}
$n | Set-Content .\test.csv
if you have questions please post in the comments and i will explain

Get-Content is probably a better approach than Import-Csv, because that'll allow you to check an entire line for spaces instead of having to check each individual field. For fully automated processing you'd just use a Where-Object filter to remove non-matching lines from the output:
Get-Content 'C:\CrPowershell\test\input.csv' |
Where-Object { $_ -notlike '* *' } |
Set-Content 'C:\CrPowershell\test\output.csv'
However, since you want to prompt for each individual line that contains spaces you need a ForEach-Object (or a similiar construct) and a nested conditional, like this:
Get-Content 'C:\CrPowershell\test\input.csv' | ForEach-Object {
if ($_ -notlike '* *') { $_ }
} | Set-Content 'C:\CrPowershell\test\output.csv'
The simplest way to prompt a user for input is Read-Host:
$answer = Read-Host -Prompt 'Message'
if ($answer -eq 'y') {
# do one thing
} else {
# do another
}
In your particular case you'd probably do something like this for any matching line:
$anwser = Read-Host "$_`nKeep the line? [y/n] "
if ($answer -ne 'n') { $_ }
The above checks if the answer is not n to make removal of the line a conscious decision.
Other ways to prompt for user input are choice.exe (which has the additional advantage of allowing a timeout and a default answer):
choice.exe /c YN /d N /t 10 /m "$_`nKeep the line"
if ($LastExitCode -ne 2) { $_ }
or the host UI:
$title = $_
$message = 'Keep the line?'
$yes = New-Object Management.Automation.Host.ChoiceDescription '&Yes'
$no = New-Object Management.Automation.Host.ChoiceDescription '&No'
$options = [Management.Automation.Host.ChoiceDescription[]]($yes, $no)
$answer = $Host.UI.PromptForChoice($title, $message, $options, 1)
if ($answer -ne 1) { $_ }
I'm leaving it as an exercise for you to integrate whichever prompting routine you chose with the rest of the code.

Powershell script poor performance when creating a CSV file from JSON

I have an performance issue with the below code. I want to parse some information from a JSON file to a CSV. The JSON itself has around 200k lines. The performance of this conversion is not good as it takes over 1h to process such a file.
I think the problem might be with the Add-Content function as I'm using a normal HDD for it. Could you please let me know if you see any improvements of the code or any changes that I could do?
$file = "$disk\TEMP\" + $mask
$res = (Get-Content $file) | ConvertFrom-Json
$file = "$disk\TEMP\result.csv"
Write-Host "Creating CSV from JSON" -ForegroundColor Green
Add-Content $file ("{0},{1},{2},{3},{4}" -f "TargetId", "EventType", "UserId", "Username", "TimeStamp")
$l = 0
foreach ($line in $res) {
if($line.EventType -eq 'DirectDownloadCompleted' -and $line.TargetDefinition -eq 'GOrder') {
#nothing here
} elseif($line.EventType -eq 'DirectDownloadCompleted' -and $line.TargetDefinition -eq 'GFile') {
Add-Content $file ("{0},{1},{2},{3},{4}" -f
$line.AssetId, $line.EventType, $line.UserId, $line.UserName, $line.TimeStamp)
$l = $l + 1
} else {
Add-Content $file ("{0},{1},{2},{3},{4}" -f $line.TargetId, $line.EventType, $line.UserId, $line.UserName, $line.TimeStamp)
$l = $l + 1
}
}

Ok, a few lessons here I think. First off, don't re-write the Export-CSV cmdlet. Instead convert your info into an array of objects, and output it all at once. This will make it so that you only have to write to the file once, which should increase your speed dramatically. Also, don't do ForEach>If>IfElse>Else when this function already exists in the Switch cmdlet. Try something like this:
$Results = Switch($res){
{$_.EventType -eq 'DirectDownloadCompleted' -and $_.TargetDefinition -eq 'GOrder'}{Continue}
{$_.EventType -eq 'DirectDownloadCompleted' -and $_.TargetDefinition -eq 'GFile'}{$_ | Select #{l='TargetId';e={$_.AssetId}},EventType,UserId,Username,TimeStamp;Continue}
Default {$_ | Select TargetId,EventType,UserId,Username,TimeStamp}
}
$Results | Export-CSV $file -NoType
$l = $Results.Count

Import-Csv TrimEnd Column Header

I need import a CSV and run it through a foreach loop. I want to trim the end on the column header DeviceName to avoid any potential issues. I have tried the following but it is not working as expected.
$Import = Import-CSV $csv
foreach ($i in ($import.DeviceName).TrimEnd())
{do something}
Any help? Thank you!

If you need to change both the header and the content in the column for devicename which has spaces I have come up with this forgiving code.
$csvData = import-csv $csv
$properties = $csvData[0].psobject.Properties.name
$csvHeader = "`"$(($properties | ForEach-Object{$_.Trim()}) -join '","')`""
$deviceHeader = $properties -match "DeviceName"
$csvHeader
$csvHeader | Set-Content $file
$csvData | ForEach-Object{
$_.$deviceHeader = ($_.$deviceHeader).trim()
$_
} | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Add-Content $file
What this does is read in the CSV like normal. Parse the property names of the object in the order they appear. We find the one that has DeviceName no matter how many spaces (if there is more that one you could have a problem). Keep that so we can use it to call the correct property of each "row".
Export the new cleaned header to the file. Then we go through each "row" removing all the leading and trailing space from the DeviceName. Once that is done write back the CSV to the original file.

The best solution would be to tell the other team to fix their generation procedure. However, if for some reason that's not an option, I'd recommend pre-processing the file before you import it as a CSV.
$filename = 'C:\path\to\your.csv'
(Get-Content $filename -Raw) -replace '^(.*DeviceName)[ ]*(.*)', '$1$2' |
Set-Content $filename
Reading the file as a single string (-Raw) and anchoring the expression at the beginning of the string (^) ensures that only the column title is replaced.
For large input files you may want to consider a different approach, though, since the above reads the entire file into memory before replacing the first line.
$infile = 'C:\path\to\input.csv'
$outfile = 'C:\path\to\output.csv'
$firstLine = $true
Get-Content $infile | % {
if ($firstLine) {
$_ -replace '(DeviceName)[ ]*', '$1'
$firstLine = $false
} else {
$_
}
} | Set-Content $outfile
Thinking about it some more and taking inspiration from a comment to #Zeek's answer, you could also extract the headers first and then convert the rest of the file.
$infile = 'C:\path\to\input.csv'
$outfile = 'C:\path\to\output.csv'
$header = (Get-Content $infile -First 1) -split '\s*,\s*'
Get-Content $infile |
select -Skip 1 |
ConvertFrom-Csv -Header $header |
Export-Csv $outfile -NoType

Is this all you're trying to do? This will give you a collection of objects imported from your csv file but trim the end of the DeviceName property on each object.
$items = Import-CSV -Path $csv
$items.ForEach({ $_.DeviceName = $_.DeviceName.TrimEnd() })

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008