How to compare, match, and append multiple values in multiple CSV files? - csv

I'm trying to figure out the best way to do this, and I'm not sure how to use Import-Csv with 2 different files through the same pipeline and export a value that's found...
So let's start with CSV file 1: I only want the values for LoginNumber where Type = H and (ContractorDomain -ne $null -and ContractorDomain -ne ""). For example, this should only pull the values 0031482 and 2167312 from below.
Note: I only added spaces and arrows here to make the columns easier to read. The actual CSV files have no padding spaces between the column values and no arrows.
"LoginNumber","Type","ContractorDomain"
"0031482" ,"H" ,"P12345" <<
"1251632" ,"P" ,"A52671"
"2167312" ,"H" ,"425126" <<
"0598217" ,"L" ,""
"1405735" ,"H" ,""
"2058194" ,"A" ,"L21514"
When a LoginNumber value matching the conditions above is found, search for it in CSV file 2, then grab the AccountStatus and SamAccountName values for the row whose UserIDNumber matches.
"SamAccountName","UserIDNumber","AccountDescriptionDetails","AccountStatus"
"jd12395" ,"0052142" ,"Company CEO" ,"Enabled"
"jwet" ,"2167312" ,"Software Developer" ,"Disabled" <<
"1b3gas5" ,"1385293" ,"Project Manager" ,"Disabled"
"632g1fsa" ,"0031482" ,"QA Tester" ,"Enabled" <<
"4126hs" ,"0000418" ,"Program Manager" ,"Disabled"
"axv" ,"1840237" ,"Accountant Administrator" ,"Disabled"
For the 3rd CSV file we have the following:
"domainName","SameAccountName","DateExpired"
"TempDomain","jwet" ,"20151230" <<
"PermDomain","p21942" ,""
"PermDomain","qz231034" ,""
"TempDomain","632g1fsa" ,"20151231" <<
"TempDomain","ru20da2bb22" ,"20160425"
Next, for the 3rd file, I want to add a column to hold the Disabled and Enabled values (or a User Match Not Found value):
"domainName","SameAccountName","DateExpired","UserStatus"
"TempDomain","jwet" ,"20151230" ,"Disabled" <<
"PermDomain","p21942" ,"" ,"User Match Not Found"
"PermDomain","qz231034" ,"" ,"User Match Not Found"
"TempDomain","632g1fsa" ,"20151231" ,"Enabled" <<
"TempDomain","ru20da2bb22" ,"20160425" ,"User Match Not Found"
I learned how to import-csv and create new columns with something like this...
Import-Csv $file | Select-Object -Property *, @{Name="UserStatus";Expression={
    if ($true) {"fill value in here"}
}} | Export-Csv $newFile -NoType
So I'm thinking something like this. I'm just not sure how to search/find/pass the values of multiple CSV files through the pipeline.
Note: some of these CSV files have around 15 columns before and after the columns we are searching for. Also, some of the column values contain a comma, so I can't simply rely on -Delimiter ','. And some of the column values aren't wrapped in " (if you open the CSV in a text editor).

Columns containing commas shouldn't be an issue if the values are properly quoted (i.e. if the CSV is valid). Import-Csv will correctly import a record 42,"a,b",c as three values 42, a,b and c. If your CSV isn't well-formed: fix that first.
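A quick way to convince yourself in a console (column names A, B, C are made up for the demo):
# Hypothetical three-column record with an embedded comma in the second field
'A,B,C', '42,"a,b",c' | ConvertFrom-Csv | Format-List
# A : 42
# B : a,b   <- the quoted comma is data, not a separator
# C : c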
Fetch the login IDs from the first CSV file:
$logins = Import-Csv 'C:\path\to\file1.csv' |
          Where-Object { $_.Type -eq 'H' -and $_.ContractorDomain } |
          Select-Object -Expand LoginNumber
You can simplify the ContractorDomain property check to just $_.ContractorDomain, because PowerShell interprets both an empty string and $null as a boolean value $false in that context. The same would happen for other zero or empty values (0, 0.0, empty array, etc.), but that shouldn't be an issue in your scenario.
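For a quick sanity check of that coercion:
[bool]''        # False - empty string
[bool]$null     # False - missing value
[bool]'P12345'  # True  - any non-empty string
[bool]0         # False - the zero/empty caveat mentioned above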
Next create a hashtable mapping account names to their respective status. Filter the imported second CSV by the list of IDs you created before, so the hashtable contains only relevant mappings.
$accountStatus = @{}
Import-Csv 'C:\path\to\file2.csv' | Where-Object {
    $logins -contains $_.UserIDNumber
} | ForEach-Object {
    $accountStatus[$_.SamAccountName] = $_.AccountStatus
}
With that hashtable you can now add the UserStatus column to your third CSV:
(Import-Csv 'C:\path\to\file3.csv') |
    Select-Object -Property *, @{n='UserStatus';e={
        if ($accountStatus.ContainsKey($_.SameAccountName)) {
            $accountStatus[$_.SameAccountName]
        } else {
            'User Match Not Found'
        }
    }} | Export-Csv 'C:\path\to\file3.csv' -NoType
The parentheses around the Import-Csv statement ensure that the file is completely read and closed before Export-Csv starts writing to it. They're only required if you're writing the modified data back to the same file and can be omitted otherwise. The asterisk selects all imported columns, and the additional calculated property adds the new column you want to include.
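If the parentheses feel too subtle, an equivalent and arguably more explicit variant is to buffer the import in a variable first (same hypothetical paths as above):
# Read everything into memory first, so the input file is closed before writing
$rows = Import-Csv 'C:\path\to\file3.csv'
$rows | Select-Object -Property *, @{n='UserStatus';e={
        if ($accountStatus.ContainsKey($_.SameAccountName)) { $accountStatus[$_.SameAccountName] }
        else { 'User Match Not Found' }
    }} | Export-Csv 'C:\path\to\file3.csv' -NoType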

Related

use where-object to find data, but want to add data to every row before exporting to csv

Hi, I have a script that reads a CSV file, creates a JSON file, checks the users in the file against a service, and then gets the result back as a JSON file.
I take that result, find those users in the CSV file, and create a new file.
I do that with a Where-Object.
But I need to add some extra values to every user before I export it to CSV.
These are my two lines for finding users and then exporting:
$matches = $users | where-object { $_.number -in $response.allowedItemIds } |
           Select-Object -Property Number,Surname,Forename,Emailaddress
$matches | Export-Csv -Path $Saved$savefile -NoTypeInformation -Append
Is that possible, or do I need to do a foreach?
Cheers
Assuming I've interpreted your question correctly, you should be able to use PowerShell's calculated properties for this purpose.
For example, if you wanted to add a field called "Date" and set the current Date/Time to each user row, you could do the following:
$matches = $users | where-object { $_.number -in $response.allowedItemIds } |
           Select-Object -Property Number,Surname,Forename,Emailaddress, @{Name="Date";Expression={Get-Date}}
The Expression value can either be a static value such as "StaticValue", a variable such as $i (useful as part of a loop, for example), or a more complex value returned from other cmdlets (as in my example above).
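To illustrate the three flavors side by side ($users and the Number column come from the question; the added columns and the $batch variable are made up for illustration):
$batch = 'Batch7'   # hypothetical variable
$users | Select-Object -Property Number,
    @{Name='Source'; Expression={'StaticValue'}},   # static value
    @{Name='Batch';  Expression={$batch}},          # variable from the surrounding scope
    @{Name='Checked';Expression={Get-Date}}         # value returned from another cmdlet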

Transform exported CSV (includes embedded JSON) and save relevant columns and keys in new CSV file - Powershell

I am currently writing a script for the following problem:
Initial Problem
Data was exported from an audit system into a CSV. The CSV itself consists of several columns, of which one column has JSON data inside. Sadly there aren't many options to influence the export / structure of the export. Since the amount of data included there is tough to filter and to analyse, the exported CSVs (when needed) need to be transformed so that only relevant columns and JSON keys remain within the new to-be-exported CSV. It has to be a new CSV as the file needs to potentially be shared. A to-be-imported textfile contains the relevant JSON keys that should remain in the to-be-exported CSV.
About the JSON: the keys can vary based on the events that are exported. Let's say there are 3-4 different variants, but the textfile to be imported contains for all 3-4 keys the relevant subkeys that need to be included as new columns in the export. If a subkey does not exist, it's okay for that particular column to be empty in the export.
Initial Thoughts
a) Import the CSV and the file that is listing the relevant JSON keys that should be kept
b) Expand the JSON
c) Select the JSON entries that are relevant
d) Merge everything into new columns
e) Export again into a new file
What Questions are open / Where are the Problems?
I was writing a piece of code (I started my PS experience just 2 days ago) and encountered/wondered the following:
Are there any recommendations to improve the code? Given my very recent PS adventure, I am sure there are many obvious things that could be improved.
Is there a way to make the export straight into CSV format without the manual -join and then using Out-File? I noticed that for my final test cases (I cannot share those because the data is extremely hard to anonymize) I didn't manage to come up with a delimiter (tried ",", ";" and "`t") that isn't also included in parts of the imported cells. Excel (when importing from text) doesn't seem to have an issue, though, loading and parsing the data as CSV and recognizing the columns and boundaries correctly.
Happy to hear any tips!
Current Code
### Configure variables
$inputPath = "C:\Users\test\Downloads\inputTest.csv"
$formatTemplate = "C:\Users\test\Downloads\templateTest.txt"
$outputPath = "C:\Users\test\Downloads\outputTest.csv"
### Load the columns from template file to perform transformation depending on the required AuditData Fields. The file contains a list of relevant JSON keys from the Audit Data columns
$selectedAuditDataFields = Get-Content $formatTemplate
### Load CSV, select needed columns and expand the JSON entries within the AuditData column
$importCsvCompact = Import-Csv -Path $inputPath -Delimiter "," |
    Select-Object -Property CreationDate, UserIds, Operations, @{name = "AuditData"; Expression = {$_.AuditData | ConvertFrom-Json }}
### Calculate the number of Rows (import and export CSV have same number of rows) and Columns (3 standard columns plus template columns) for the CSV to be exported
$exportCsvNumberOfRows = $importCsvCompact.Count
$exportCsvNumberOfColumns = $selectedAuditDataFields.Length + 3
### Add header to to-be-exported-CSV
$header = [object[]]::new($exportCsvNumberOfColumns);
$header[0] = "CreationDate"
$header[1] = "UserIds"
$header[2] = "Operations"
for ($columnIncrement = 3; $columnIncrement -ne $exportCsvNumberOfColumns; $columnIncrement++) {
    $header[$columnIncrement] = $selectedAuditDataFields[$columnIncrement - 3]
}
$toAppend = $header -join ","
$toAppend | Out-File -FilePath $outputPath -Append
### initiate array for each transformed row and initiate counter of current row
$processingRowCounter = 0
### traverse each row of the CSV import and setup the new column structure
### connect the 3 standard columns with a subset of the expanded JSON entries (based on the imported template)
$importCsvCompact | ForEach-Object {
    $csvArrayColumn = [object[]]::new($exportCsvNumberOfColumns);
    $csvArrayColumn[0] = $importCsvCompact.CreationDate[$processingRowCounter]
    $csvArrayColumn[1] = $importCsvCompact.UserIds[$processingRowCounter]
    $csvArrayColumn[2] = $importCsvCompact.Operations[$processingRowCounter]
    for ($columnIncrement = 3; $columnIncrement -ne $exportCsvNumberOfColumns; $columnIncrement++) {
        $csvArrayColumn[$columnIncrement] = $importCsvCompact.AuditData.($selectedAuditDataFields[$columnIncrement - 3])[$processingRowCounter]
    }
    $processingRowCounter++
    $directExport = $csvArrayColumn -join ","
    $directExport | Out-File -FilePath $outputPath -Append
    Write-Host "Processed $processingRowCounter Rows..."
}
Testfiles
templateTest.txt
https://easyupload.io/vx7k75
inputTest.csv
https://easyupload.io/ab77q9
Current Version based on Feedback
### Configure variables
$inputPath = "C:\Users\forstchr\Downloads\inputTest.csv"
$formatTemplate = "C:\Users\forstchr\Downloads\templateTest.txt"
$outputPath = "C:\Users\forstchr\Downloads\outputTest.csv"
### Load the columns from template file to perform transformation depending on the required AuditData Fields. The file contains a list of relevant JSON keys from the Audit Data columns
$selectedAuditDataFields = Get-Content $formatTemplate
### Calculate the number of Rows (import and export CSV have same number of rows) and Columns (3 standard columns plus template columns) for the CSV to be exported
$exportCsvNumberOfRows = $importCsvCompact.Count
$exportCsvNumberOfAuditColumns = $selectedAuditDataFields.Length
###Load CSV, select needed columns and expand the JSON entries within the AuditData column
Import-Csv -Path $inputPath -Delimiter "," |
    Select-Object -Property CreationDate, UserIds, Operations, @{name = "AuditData"; Expression = {$_.AuditData | ConvertFrom-Json }} | % {
        [pscustomobject]@{
            'CreationDate' = $_.CreationDate
            'UserIds'      = $_.UserIds
            'Operations'   = $_.Operations
            # the next part is not correct but hopefully displays what I am trying to achieve with the JSON in the AuditData column
            for ($auditFieldIncrement = 0; $auditFieldIncrement -ne $exportCsvNumberOfAuditColumns; $auditFieldIncrement++) {
                '$selectedAuditDataFields[$auditFieldIncrement]' = $_.AuditData.($selectedAuditDataFields[$auditFieldIncrement])
            }
        }
    } | Export-Csv $outputPath
I have had to produce a "cleansed" CSV file in one project. My general approach was as follows: import the existing CSV data and send it through the pipeline.
In a ForEach-Object, do some processing, storing the results in variables. The last step of the processing creates a hashtable typecast as a pscustomobject, and this result is passed down the pipeline. The output of the pipeline is fed to Export-Csv. Export-Csv does all the joining and the commas for me, and also encloses the output fields in quotes, making them strings.
Here is a code snippet that illustrates the approach. The cleansing consists of reformatting dates so that they use a standard 14 digit format, and reformatting currency amounts so that they don't contain dollar signs. But that is not relevant to you.
Import-Csv checking.csv | % {
    $balance += [decimal]$(Get-Amount $_.AMOUNT)
    [pscustomobject]@{
        'TRNTYPE'  = Get-Type $_.AMOUNT
        'DTPOSTED' = (Get-Date $_.DATE).ToString('yyyyMMddHHmmss')
        'TRNAMT'   = Get-Amount $_.AMOUNT
        'FITID'    = $fitid++ # this is a stopgap
        'NAME'     = $_.DESCRIPTION
        'MEMO'     = $memo
    }
} |
Export-Csv transactions.csv
Get-Type is a function that yields 'CREDIT' or 'DEBIT' depending on the sign of the amount. Get-Amount is a function that gives a numeric amount without commas and dollar signs. Those functions are defined at the beginning of the script. Note that when you call a PowerShell function, there are no parentheses around the arguments. That was a big jolt to me, but it's actually a feature of PowerShell.
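The helper functions themselves aren't shown above; here is a minimal sketch of what they might look like (the exact cleansing rules are my assumption):
# Hypothetical implementations - the real ones may differ
function Get-Amount ($raw) {
    # strip dollar signs and thousands separators: "$1,234.56" -> "1234.56"
    $raw -replace '[$,]'
}
function Get-Type ($raw) {
    if ([decimal](Get-Amount $raw) -lt 0) { 'DEBIT' } else { 'CREDIT' }
}
# Call with a space, not parentheses - Get-Type($a, $b) would pass ONE array argument
Get-Amount '-$1,234.56'   # returns -1234.56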

Matching values in hash tables while ignoring case sensitivity between two CSV files

I'm trying to append a column by seeing if the value of the column from CSV file2 is contained in CSV file1.
I have a CSV file1 (test1.csv):
csv1ColumnOne,csv1ColumnTwo
1,dF3aWv
2,
3,ka21p
4,NAE31
5,dsafl
6,nv02k
7,qng02
8,xcw3r
9,dF3aW
I have a CSV file2 (test2.csv):
csv2ColumnOne,csv2ColumnTwo
bbetfe,DF3AW
asdf,dsafl
qwer,
zxcv,NAE31
poiu,nbrwp1
Given the following code...
$hashTable = @{}
Import-Csv C:\path\test1.csv | ForEach-Object {
    $hashTable[$_.csv1ColumnOne] = $_.csv1ColumnTwo
}
(Import-Csv C:\path\test2.csv) |
    Select-Object -Property *, @{n='csv1ColumnThree';e={
        if ($hashTable.ContainsKey($_.csv2ColumnTwo)) {
            $_.csv2ColumnTwo
        } elseif (-not ($_.csv2ColumnTwo)) {
            'No value found from csv file2'
        } else {
            'No value found from csv file1'
        }
    }} | Export-Csv "C:\path\testresults.csv" -NoType
The results look like this:
csv2ColumnOne,csv2ColumnTwo,csv1ColumnThree
bbetfe,DF3AW,"No value found from csv file1"
asdf,dsafl,dsafl
qwer,,No value found from csv file2
zxcv,NAE31,NAE31
poiu,nbrwp1,"No value found from csv file1"
When instead it should look like this:
csv2ColumnOne,csv2ColumnTwo,csv1ColumnThree
bbetfe,DF3AW,dF3aW
asdf,dsafl,dsafl
qwer,,"No value found from csv file2"
zxcv,NAE31,NAE31
poiu,nbrwp1,"No value found from csv file1"
The reason I see bbetfe,DF3AW,"No value found from csv file1" instead of bbetfe,DF3AW,dF3aW is the case sensitivity of the value. Is there any way to ignore case sensitivity with alphanumeric values?
Lookups with ContainsKey() already are case-insensitive. You're just using the wrong data structure, and using it in the wrong way, too.
If you want to look up a key in a hashtable you need to actually use the data you want to look up as the key of the hashtable:
$hashTable[$_.csv1ColumnTwo] = $_.csv1ColumnOne
For looking up something in the values of a hashtable use ContainsValue().
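You can verify the case-insensitivity quickly; @{} hashtables use a case-insensitive key comparer by default:
$ht = @{ 'dF3aWv' = 1 }
$ht.ContainsKey('DF3AWV')   # True - keys compare case-insensitively
$ht.ContainsValue(1)        # True - the value-side counterpart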
However, since you just want to check if the second column of the first CSV contains the value from the second column of the second CSV, you don't need a hashtable in the first place. A simple array will suffice.
$list = Import-Csv 'C:\path\test1.csv' | Select-Object -Expand csv1ColumnTwo
Import-Csv 'C:\path\test2.csv' |
    Select-Object -Property *, @{n='csv1ColumnThree';e={
        if ($list -contains $_.csv2ColumnTwo) {
            $_.csv2ColumnTwo
        } elseif (-not ($_.csv2ColumnTwo)) {
            'No value found from csv file2'
        } else {
            'No value found from csv file1'
        }
    }} | Export-Csv 'C:\path\testresults.csv' -NoType
If you don't want empty strings "found" in the second CSV simply exclude the element from $list:
$list = Import-Csv 'C:\path\test1.csv' |
        Select-Object -Expand csv1ColumnTwo |
        Where-Object { $_ } # allow only non-empty values
Not every problem is a nail, so don't try to fix everything with a hammer.
To avoid having to convert the string to lower case, just use the -icontains comparison operator (the "i" means case-insensitive comparison):
So instead of
If ($hashTable.ContainsKey($_.csv2ColumnTwo)){
try this:
If ($hashTable.keys -icontains $_.csv2ColumnTwo){
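Note that the desired output above keeps the casing from the first file (dF3aW, not DF3AW). Neither echoing $_.csv2ColumnTwo nor a key check gives you that; here is a sketch of one way to recover the original-cased value, reusing the $list array from the earlier answer:
Import-Csv 'C:\path\test2.csv' |
    Select-Object -Property *, @{n='csv1ColumnThree';e={
        $needle = $_.csv2ColumnTwo
        # -ieq compares case-insensitively, but the result keeps the casing stored in $list
        $match = $list | Where-Object { $_ -ieq $needle } | Select-Object -First 1
        if ($match) { $match }
        elseif (-not $needle) { 'No value found from csv file2' }
        else { 'No value found from csv file1' }
    }} | Export-Csv 'C:\path\testresults.csv' -NoType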
can you just make them all lowercase?
$a = ipcsv 'C:\path\test1.csv'
$a | % {$_.csv1columntwo = $_.csv1columntwo.tolower()}
$a
$b = ipcsv 'C:\path\test2.csv'
$b | % {$_.csv2ColumnOne = $_.csv2ColumnOne.tolower(); $_.csv2ColumnTwo = $_.csv2ColumnTwo.tolower()}
$b
Ansgar basically had the right answer, but with a bug: it was printing the row in the second file as qwer,, when instead it should have printed qwer,,No value found from csv file2. Another condition needed to be added to the first if statement, as shown below.
$list = Import-Csv 'C:\path\test1.csv' | Select-Object -Expand csv1ColumnTwo
Import-Csv 'C:\path\test2.csv' |
    Select-Object -Property *, @{n='csv1ColumnThree';e={
        if (($list -contains $_.csv2ColumnTwo) -and ($_.csv2ColumnTwo)) {
            $_.csv2ColumnTwo
        } elseif (-not ($_.csv2ColumnTwo)) {
            'No value found from csv file2'
        } else {
            'No value found from csv file1'
        }
    }} | Export-Csv 'C:\path\testresults.csv' -NoType
The empty values in the 2nd file were matching the empty entries in $list, so the if branch fired and the elseif was never reached.

powershell: compare specific columns of CSV files and return entire row of differences

I have 2 CSV files I'd like to compare. They both have multiple columns of different data but they also both have a column with IP addresses. Call them $log1 and $log2
I am trying to compare the files. If an IP from $log1 is found in $log2, I would like an output file that has the entire matching row of data from $log2...
When I use:
Compare-Object -Property 'IP address' -ReferenceObject $log1 -DifferenceObject $log2
It returns only the 'IP address' column and the SideIndicator.
I think I'm barking up the wrong tree here, can anyone offer some advice?
I would try another approach:
$log1 + $log2 | Group-Object -Property 'IP address' | Where-Object { $_.Count -eq 2 }
In the result you will find a Group property containing the two matching rows.
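A sketch of how you might pull the full $log2 row out of each group and export it (note the + concatenation, so the rows of both logs flow through the pipeline individually; the output path is hypothetical):
$dupes = $log1 + $log2 |
         Group-Object -Property 'IP address' |
         Where-Object { $_.Count -eq 2 }
# Group preserves input order: index 0 is the $log1 row, index 1 the $log2 row
$dupes | ForEach-Object { $_.Group[1] } |
    Export-Csv 'C:\path\to\log2_matches.csv' -NoTypeInformation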
"Try adding the -PassThru flag to your command."
- Dave Sexton
This works. I exported the results to CSV and sorted by the SideIndicator when I opened the file (don't think you can get PS to sort by SideIndicator).
Thanks Dave.
There's probably more ways to accomplish this, as noted by others but this achieved my goal.
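For reference, a sketch of the -PassThru version (output path hypothetical). Since -PassThru attaches SideIndicator to the output objects as a NoteProperty, Sort-Object SideIndicator should actually work inside PowerShell too:
Compare-Object -Property 'IP address' -ReferenceObject $log1 -DifferenceObject $log2 -PassThru |
    Sort-Object SideIndicator |
    Export-Csv 'C:\path\to\diff.csv' -NoTypeInformation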
This script will compare both of your CSV files and write output for each duplicate IP address found.
# import your csv files
$csv1 = Import-Csv "C:\Users\Admin\Desktop\csv\csv1.csv"
$csv2 = Import-Csv "C:\Users\Admin\Desktop\csv\csv2.csv"
# compare both csv files (.ip corresponds to your column name for the ip address in the csv file)
$compare = Compare-Object $csv1.ip $csv2.ip -IncludeEqual | ? {$_.SideIndicator -eq "=="}
if ($compare) {
    foreach ($ip in $compare.InputObject) {
        Write-Output "IP $ip exists in both csv files"
    }
}
else {
    Write-Output "Double IP not found"
}

delete the last column from a csv file in powershell

I am new to PowerShell. Currently we are in need of a PowerShell script to compare two large CSV files (100000 rows and n columns, where n > 300 and the column headers are dates corresponding to each Wednesday). The value of n keeps incrementing each week. We need to compare the files (current week and last week) and make sure that the only difference between the two files is the last column.
I have gone through some forums and blogs, but due to my ignorance I could only get a little way.
If there is a way to drop the last column from a CSV file in PowerShell, we may be able to make use of the script below to compare the previous week's file and the current week's file after dropping the last column from the current week's file.
It would be really helpful if someone could help me here with your hard-earned knowledge.
[System.Collections.ArrayList]$file1Array = Get-Content "C:\Risk Management\ref_previous.csv" | Sort-Object
[System.Collections.ArrayList]$file2Array = Get-Content "C:\Risk Management\ref_current.csv" | Sort-Object
$matchingEntries = @()
foreach ($entry in $file1Array) {
    if ($file2Array.Contains($entry)) {
        $matchingEntries += $entry
    }
}
foreach ($entry in $matchingEntries) {
    $file1Array.Remove($entry)
    $file2Array.Remove($entry)
}
Cheers,
Anil
Assuming that the column name you want to exclude is LastCol (adjust to your actual column name):
$previous = Import-csv "C:\Risk Management\ref_previous.csv" | Select-Object -Property * -ExcludeProperty LastCol | Sort-Object;
$current = Import-csv "C:\Risk Management\ref_current.csv" | Sort-Object;
Compare-Object $previous $current;
This will drop the last column from each of the input files and indicate whether the remaining content differs.
Based on the answer that alroc gave, you should be able to get the last column name using a split operation on the first line of the CSV file, and then using that on the -ExcludeProperty parameter.
However, the Compare-Object command on this doesn't work for me, but it does pull back the right data into each variable.
$CurrentFile = "C:\Temp\Current.csv"
$PreviousFile = "C:\Temp\Previous.csv"
$CurrentHeaders = gc $CurrentFile | Select -First 1
$CurrentHeadersSplit = $CurrentHeaders.Split(",")
$LastColumn = $CurrentHeadersSplit[-1] -Replace '"'
$Current = Import-Csv $CurrentFile | Select -Property * -ExcludeProperty $LastColumn | Sort-Object
$Previous = Import-Csv $PreviousFile | Sort-Object
Compare-Object $Current $Previous
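One possible reason Compare-Object appears not to work here: without -Property it falls back to comparing the objects' string representations, which rarely reflects the actual columns. A sketch that names every remaining column explicitly, reusing the $Current and $Previous variables from above:
# Build the property list from the first row's column names
$props = $Current[0].psobject.Properties.Name
Compare-Object $Current $Previous -Property $props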
The import-csv and export-csv both give the opportunity to exclude columns.
Import-Csv has the -Header option: you simply name the incoming headers and leave out the last column's header. If there are 10 columns, only name 9; the last column will be excluded.
For export-csv, select the columns you'd like to write out ( |select col1,col2,col3|export-csv... ) and don't select the column you're trying to exclude.
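A sketch of the -Header idea on a small hypothetical file (with 300+ weekly columns you would generate the name list programmatically rather than typing it; the output path is made up):
# Hypothetical file with columns Col1..Col4 - naming only three headers drops the 4th
Import-Csv 'C:\Risk Management\ref_current.csv' -Header 'Col1','Col2','Col3' |
    Select-Object -Skip 1 |   # with -Header, the file's own header row arrives as data; skip it
    Export-Csv 'C:\Risk Management\ref_current_trimmed.csv' -NoTypeInformation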