Compare two CSV files and subtract matches from the original

Given two CSV files:
File1.csv
SKU,Description,UPC
101,Saw,101010103
102,Drill,101010102
103,Screw,101010101
104,Nail,101010104
File2.csv
SKU,Description,UPC
100,Light,101010105
101,Saw,101010103
104,Nail,101010104
106,Battery,101010106
108,Bucket,101010114
I'd like to create a new CSV file, which we'll call UpdatedList.csv, that has every entry from File1.csv minus any rows whose SKU appears in both File1.csv and File2.csv. In this case UpdatedList.csv will look like:
UpdatedList.csv
"SKU","Description","UPC"
"102","Drill","101010102"
"103","Screw","101010101"
The following code does what I want, but I believe there is a more efficient way. How can I do this without loops? My code is as follows.
#### Create a third file that has all elements of file 1 minus those in file 2 ###
$FileName1 = Get-FileName "C:\LowInventory"
$FileName2 = Get-FileName "C:\LowInventory"
$f1 = ipcsv $FileName1
$f2 = ipcsv $FileName2
$f3 = ipcsv $FileName1
For ($i = 0; $i -lt $f1.Length; $i++) {
    For ($j = 0; $j -lt $f2.Length; $j++) {
        if ($f1[$i].SKU -eq $f2[$j].SKU) { $f3[$i].SKU = 0 }
    }
}
$f3 | Where-Object { $_.SKU -ne "0" } | epcsv "C:\LowInventory\UpdatedList.csv" -NoTypeInformation
Invoke-Item "C:\LowInventory\UpdatedList.csv"
################################

You can do this without loops by taking advantage of the Group-Object cmdlet:
$f1 = ipcsv File1.csv
$f2 = ipcsv File2.csv
$f1.ForEach({ Add-Member -InputObject $_ 'X' 0 })  # tag File1 rows so we can select them afterwards
$f1 + $f2 |                    # merge our lists
    group SKU |                # group by SKU
    where { $_.Count -eq 1 } | # keep SKUs that appear in only one file
    select -expand Group |     # ungroup
    where { $_.X -eq 0 }       # keep only the rows that came from File1
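An alternative sketch (mine, not from the original answer; it reuses the question's ipcsv/epcsv aliases and assumes PowerShell 4+ for the .ForEach() method) is to index File2's SKUs in a hashtable for constant-time lookups, which also avoids the helper X property:
$f2skus = @{}
(ipcsv File2.csv).ForEach({ $f2skus[$_.SKU] = $true })   # index File2's SKUs
ipcsv File1.csv |
    Where-Object { -not $f2skus.ContainsKey($_.SKU) } |  # keep rows unique to File1
    epcsv UpdatedList.csv -NoTypeInformation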

Related

Remove mismatch and add missing json jq

Hi, I have two files, for example:
file1.json
{
"id": "001",
"name" : "my_policy",
"list_1": ["111111111", "22222222","33333333"],
"list_2": ["a", "b","c"],
.....
}
Then I have file2.json (it does not always have the same fields as file1):
{
"list_1": ["111111111","111111122","33333333"],
"list_2": ["a", "b","c","d","e"],
.....
}
How can I use jq to merge the values of matching keys across the two JSON files and, in addition to the merge, remove from file1's keys any values that are not present in file2?
So I'd get this result:
{
"id": "001",
"policy" : "my_policy",
"list_1": ["111111111","111111122","33333333"],
"list_2": ["a", "b","c","d","e"],
.....
}
I solved the merge operation via:
jq -s 'reduce .[] as $item ({}; reduce ($item | keys_unsorted[]) as $key (.; $item[$key] as $val | ($val | type) as $type | .[$key] = if ($type == "array") then (.[$key] + $val | unique) elif ($type == "object") then (.[$key] + $val) else $val end))' file1.json file2.json
How can I solve this? Or is it impossible via jq?
It is quite simple once you figure out how to take the difference between two lists and add/unique the results. The key step is that $l1 - ($l1 - $s[].list_1) yields the intersection of the two lists (the values of list_1 that are also present in file2); appending file2's list and piping through unique then produces the merged result. One way would be to run
jq --slurpfile s 2.json -f script.jq 1.json
Where my script contents are
#!/usr/bin/env jq -f
# backup list_1, list_2 from 1.json
.list_1 as $l1 |
.list_2 as $l2 |
# Drop values from 1.json's lists that are not present in 2.json,
# then merge in 2.json's values, for both list_1 and list_2
( ( $l1 - ( $l1 - $s[].list_1 ) ) + $s[].list_1 | unique ) as $f1 |
( ( $l2 - ( $l2 - $s[].list_2 ) ) + $s[].list_2 | unique ) as $f2 |
# Update the original result 1.json with the modified content
.list_1 |= $f1 |
.list_2 |= $f2
or directly from the command line as
jq --slurpfile s 2.json '
.list_1 as $l1 |
.list_2 as $l2 |
( ( $l1 - ( $l1 - $s[].list_1 ) ) + $s[].list_1 | unique ) as $f1 |
( ( $l2 - ( $l2 - $s[].list_2 ) ) + $s[].list_2 | unique ) as $f2 |
.list_1 |= $f1 |
.list_2 |= $f2
' 1.json

Powershell - CSV - Header - Save

I'm reading a fixed-width file with 4000 rows through substrings and assigning each substring to a header in a CSV, but I'm not sure how to save the CSV.
An example row I am reading:
$line = ABC 7112123207/24/16Smith Timpson Head Coach 412-222-0000 00011848660 ELl CAAN HIGH SCHOOL 325 N Peal AVE. Smith Timpson Head Coach COLORADO CITY AZ 86021 01 FALL MALE 07/29/16EQ15031 1977904 BUDDY'S ALL STARS INC. BUDDY ALL STARS N V12V70R16 1.00V12V70R16
I have the CSV with the headers.
$csvheaders = Import-Csv temp.csv
foreach ($line in (Get-Content $FILE.FullName))
{
    foreach ($csh in $csvheaders)
    {
        $csh.GROUP = $line.Substring(0,10).Trim()
        $csh.NUMBER = $line.Substring(10,8).Trim()
        $csh.DATE = $line.Substring(18,8).Trim()
        $csh.CONTACT_FIRST = $line.Substring(26,35).Trim()
        $csh.CONTACT_LAST = $line.Substring(61,35).Trim()
    }
}
I would need the CSV output as:
Group Number Date Contact_First Contact_Last
ABC 71121232 07/24/16 Smith Timpson
There is an Export-Csv cmdlet:
Get-Content $FILE.FullName | ForEach-Object {
    [PSCustomObject]@{
        Group         = $_.Substring(0,10).Trim()
        Number        = $_.Substring(10,8).Trim()
        Date          = $_.Substring(18,8).Trim()
        Contact_First = $_.Substring(26,35).Trim()
        Contact_Last  = $_.Substring(61,35).Trim()
    }
} | Export-Csv -Path 'Your_Output_Path.csv' -NoTypeInformation
Note: You probably need to specify a tab delimiter for the Export-Csv cmdlet.
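For instance, a minimal sketch (the demo object and the output path are placeholders, not from the original answer):
# -Delimiter "`t" makes Export-Csv write tab-separated output
[PSCustomObject]@{ Group = 'ABC'; Number = '71121232' } |
    Export-Csv -Path 'Your_Output_Path.csv' -Delimiter "`t" -NoTypeInformation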

System Object Error when exporting results to CSV

I am trying to export my results from a Compare-Object into a CSV, but I get an error when I export it. It looks OK when I just open it in Excel. My guess is that whenever there is an output of more than one value, the error is written instead of the value.
Here are my CSVs:
past.csv
VKEY
V-12345
V-23456
V-1111
current.csv
VKEY
V-12345
V-6789
V-23456
V-256
My new CSV should say:
Past, Current
V-6789,V-1111
V-256
What I am getting now is:
Past, Current
System.Object[],@{vkey=V-1111}
$Past = Import-CSV "past.csv"
$Current = Import-CSV "Current.csv"
$pastchange = Compare-Object $Past $Current -Property vkey | Where-Object {$_.SideIndicator -eq '=>'} | Select-Object VKEY
$currentchange = Compare-Object $Past $Current -Property vkey | Where-Object {$_.SideIndicator -eq '<='} | Select-Object VKEY
$obj = New-Object PSObject
$obj | Add-Member NoteProperty Past $pastchange
$obj | Add-Member NoteProperty Current $currentchange
$obj | Export-Csv "ChangeResults.csv" -NoTypeInformation
That System.Object[] displayed in the $obj.Past column is simply an array of custom objects, similar to the @{vkey=V-1111} in the $obj.Current column. Proof:
PS D:\PShell> $obj
$obj.Past.Gettype() | Format-Table
$obj.Current.Gettype()
"---"
$obj.Past | ForEach-Object { $_.Gettype() }
Past Current
---- -------
{@{vkey=V-6789}, @{vkey=V-256}} @{vkey=V-1111}
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True False PSCustomObject System.Object
---
True False PSCustomObject System.Object
True False PSCustomObject System.Object
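As an aside, a minimal sketch of my own (not part of the original answer): if one delimited cell per column is acceptable, joining each array into a string before export sidesteps the System.Object[] serialization entirely:
$obj = New-Object PSObject
# .vkey enumerates the property across the whole array (PowerShell 3+ member enumeration)
$obj | Add-Member NoteProperty Past ($pastchange.vkey -join ';')
$obj | Add-Member NoteProperty Current ($currentchange.vkey -join ';')
$obj | Export-Csv "ChangeResults.csv" -NoTypeInformation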
My solution makes use of the ArrayList class (.NET Framework):
$csvOutFile = "d:\test\ChangeResults.csv" # change to fit your circumstances
$PastInFile = "d:\test\past.csv"
$CurrInFile = "d:\test\curr.csv"
$Past = Import-CSV $PastInFile
$Curr = Import-CSV $CurrInFile
# compare CSV files and convert results to arrays
$PastCh = @(, <# always return an array #>
    $( Compare-Object $Past $Curr -Property vkey |
        Where-Object { $_.SideIndicator -eq '=>' } ) |
    ForEach-Object { $_ | Select-Object -ExpandProperty vkey }
)
$CurrCh = @(, <# even if no SideIndicator matches #>
    $( Compare-Object $Past $Curr -Property vkey |
        Where-Object { $_.SideIndicator -eq '<=' } ) |
    ForEach-Object { $_ | Select-Object -ExpandProperty vkey }
)
[System.Collections.ArrayList]$csvout = New-Object System.Collections.ArrayList($null)
$auxHash = @{} # an auxiliary hash object
$max = ($CurrCh.Count, $PastCh.Count | Measure-Object -Maximum).Maximum
for ($i = 0; $i -lt $max; $i++) {
    Try { $auxHash.Past = $PastCh.GetValue($i) } Catch { $auxHash.Past = '' }
    Try { $auxHash.Curr = $CurrCh.GetValue($i) } Catch { $auxHash.Curr = '' }
    $csvout.Add((New-Object PSObject -Property $auxHash)) > $null
}
$csvout | Format-Table -AutoSize # show content: 'variable $csvout'
$csvout | Export-Csv $csvOutFile -NoTypeInformation
Get-Content $csvOutFile # show content: "output file $csvOutFile"
Output:
PS D:\PShell> D:\PShell\SO\37753277.ps1
Past Curr
---- ----
V-6789 V-1111
V-256
"Past","Curr"
"V-6789","V-1111"
"V-256",""
PS D:\PShell>
Here is an alternative to the Try … Catch blocks:
<# another approach instead of `Try..Catch`:
if ($i -lt $PastCh.Count) { $auxHash.Past = $PastCh.GetValue($i)
} else { $auxHash.Past = '' }
if ($i -lt $CurrCh.Count) { $auxHash.Curr = $CurrCh.GetValue($i)
} else { $auxHash.Curr = '' }
#>
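A further simplification (a sketch of my own, relying on the fact that indexing a PowerShell array past its end returns $null rather than throwing, and assuming PowerShell 3+ for [PSCustomObject]):
for ($i = 0; $i -lt $max; $i++) {
    $csvout.Add([PSCustomObject]@{
        Past = "$($PastCh[$i])"  # an out-of-range index yields $null, which stringifies to ''
        Curr = "$($CurrCh[$i])"
    }) > $null
}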

Getting blanks while reading a CSV

I am trying to read a CSV file and store it in a hashtable. Below is the code I am using.
$data | ForEach-Object {
    $ht = @{}
    $_.psobject.Properties |
        # Get only the grouped properties (that have a number at the end)
        Where-Object { $_.Name -match '\d+$' } |
        # Group properties by param/group number
        Group-Object { $_.Name -replace '\w+(\d+)$', '$1' } | ForEach-Object {
            $param = $_.Group | Where-Object { $_.Name -match 'param' }
            $value = $_.Group | Where-Object { $_.Name -match 'value' }
            # If the property has a value
            if ($value.Value -ne "") {
                # Add to the hashtable
                $ht.Add($param.Value, $value.Value)
            }
        }
    $ht
}
Below is the output for $ht. I am getting one $null value for one of the fields, orgId.
Name Value
---- -----
{orgId, } {1000002, $null}
type CSVFile
codepage MS1252
agentId 00000208000000000002
name infa_param_file_Pravakar
dateFormat MM/dd/yyyy HH:mm:ss
database C:\\Program Files\\Informatica Cloud Secure A
Sample CSV:
"param1","value1","param2","value2","param3","value3","param4","value4","param5","value5","param6","value6","param7","value7","param8","value8","param9","value9","param10","value10","param11","value11"
"orgId","000002","name","infa_param_file_Pravakar","agentId","00000208000000000002","dateFormat","MM/dd/yyyy HH:mm:ss","database","C:\\Program Files\\Informatica Cloud Secure Agent\\main\\rdtmDir\\userparameters","codepage","MS1252","type","CSVFile","","","","","","","",""

Powershell Script to Delete Blank Columns from CSV

I have a spreadsheet which I'm importing into a MySQL database; the import fails because of blank columns in the spreadsheet.
Is there a PowerShell script I can run / create that will check any given CSV file and remove blank columns?
Col1,Col2,Col3,Col4,,,,
Val1,Val2,Val3,Val4
How about something like this:
$x = Import-Csv YourFile.csv
$f = $x[0] | Get-Member -MemberType NoteProperty | Select Name
$f | Add-Member -Name Count -Type NoteProperty -Value 0
$f | % {
    $n = $_.Name
    $_.Count = @($x | Select $n -ExpandProperty $n | ? { $_ -ne '' }).Count
}
$f = @($f | ? { $_.Count -gt 0 } | Select Name -ExpandProperty Name)
$x | Select $f | Export-Csv NewFile.csv -NoTypeInformation
It uses Get-Member to get the column names, cycles through each one to check how many values are not blank, and then uses the results in a select.
When I run Dave Sexton's code, I get:
Select-Object : Cannot convert System.Management.Automation.PSObject to one of the following
types {System.String, System.Management.Automation.ScriptBlock}.
At line:15 char:12
+ $x | Select <<<< $f | Export-Csv ColsRem.test.$time.csv -NoTypeInformation
+ CategoryInfo : InvalidArgument: (:) [Select-Object], NotSupportedException
+ FullyQualifiedErrorId :
DictionaryKeyUnknownType,Microsoft.PowerShell.Commands.SelectObjectCommand
I corrected this issue by adding one more line, to force each array element to be a string.
$x = Import-Csv YourFile.csv
$f = $x[0] | Get-Member -MemberType NoteProperty | Select Name
$f | Add-Member -Name Count -Type NoteProperty -Value 0
$f | % {
    $n = $_.Name
    $_.Count = @($x | Select $n -ExpandProperty $n | ? { $_ -ne '' }).Count
}
$f = @($f | ? { $_.Count -gt 0 } | Select Name -ExpandProperty Name)
# I could get the select to work with strings separated by commas, but the array
# would always produce the error until I added the following line, explicitly
# changing the values to strings.
$f = $f | Foreach-Object { "$_" }
$x | Select $f | Export-Csv NewFile.csv -NoTypeInformation
My import CSV contains a few hundred columns and about half likely won't be populated, so getting rid of the extra columns was necessary. Now I just need to figure out how to counteract the unintended re-ordering of the columns into alphabetical order by name, without changing the names.
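One way to keep the original order (a sketch, assuming PowerShell 3+): read the column names from psobject.Properties, which preserves file order, instead of Get-Member, which sorts members alphabetically:
$x = Import-Csv YourFile.csv
# psobject.Properties lists the columns in file order, unlike Get-Member
$keep = $x[0].psobject.Properties.Name | Where-Object {
    $n = $_
    @($x | Where-Object { $_.$n -ne '' }).Count -gt 0
}
$x | Select-Object $keep | Export-Csv NewFile.csv -NoTypeInformation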