I have a CSV which looks something like this:
No,BundleNo,Grossweight,Pieces,Tareweight,Netweight
1,Q4021317/09,193700,1614,646,193054
2,Q4021386/07,206400,1720,688,205712
First I need to do some maths with the Netweight column to get two values, $x and $y.
Then I need to split each line in the CSV into $x lines of data, where each line will look like "AL,$y".
In the example below, I get the values of $x and $y for each row successfully. The trouble comes when trying to then split each row into $x rows:
$fileContent = Import-Csv $File
$x = ( $fileContent | ForEach-Object { ( [System.Math]::Round( $_.Netweight /25000 ) ) } )
$y = ( $fileContent | ForEach-Object { $_.Netweight / ( [System.Math]::Round( $_.Netweight /25000 ) ) } )
$NumRows = ( $fileContent | ForEach-Object { (1..$x) } )
$rows = ( $NumRows | ForEach-Object { "Al,$y" } )
This code works as I would hope when there is one row of data in the CSV. I.e. if $x = 8 and $y = 20, it returns 8 rows of data which look like "AL,20".
However, I get an error message when there is more than one row in the CSV:
Cannot convert the "System.Object[]" value of type "System.Object[]" to type "System.Int32".
I hope I've explained that OK and any help is appreciated.
Thanks,
John
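The root cause: $x is built by piping every CSV row through ForEach-Object, so with two rows it holds an array like @(8, 8), and the range operator 1..$x needs a single integer. A minimal repro of the failure (values made up):
$x = 8, 8        # what $x ends up holding with two CSV rows
1..$x            # throws: cannot convert "System.Object[]" to "System.Int32"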
Instead of using ForEach-Object over and over again, just iterate over the CSV once and generate your $x and $y results one at a time:
$fileContent = @'
No,BundleNo,Grossweight,Pieces,Tareweight,Netweight
1,Q4021317/09,193700,1614,646,193054
2,Q4021386/07,206400,1720,688,205712
'@ | ConvertFrom-Csv

foreach($line in $fileContent){
    $x = [System.Math]::Round( $line.Netweight / 25000 )
    if($x -ne 0){
        $y = $line.Netweight / $x
        1..$x | ForEach-Object {"AL,$y"}
    }
}
Resulting in $x number of "AL,$y" strings per row:
AL,24131.75
AL,24131.75
AL,24131.75
AL,24131.75
AL,24131.75
AL,24131.75
AL,24131.75
AL,24131.75
AL,25714
AL,25714
AL,25714
AL,25714
AL,25714
AL,25714
AL,25714
AL,25714
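If you need those strings in a file rather than on the console, you could capture the loop output and write it out; a sketch, assuming an output path of your choosing:
$lines = foreach($line in $fileContent){
    $x = [System.Math]::Round( $line.Netweight / 25000 )
    if($x -ne 0){
        $y = $line.Netweight / $x
        1..$x | ForEach-Object {"AL,$y"}
    }
}
$lines | Set-Content 'C:\temp\output.txt'   # hypothetical path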
I need help accelerating the run of the Perl code below. A runtime of 4 to 6 hours would be fine; faster is better. :)
The CSV file is about 14 to 14.5 million rows and around 1100 to 1500 columns; roughly 62 GB.
What it does:
do a count (like COUNTIF in Excel)
get the percent (based on 14 million rows)
get the rank based on the count
My current code:
use List::MoreUtils qw(uniq);
$x="Room_reserve";
$in = "D:\\package properties\\${x}.csv";
$out = "D:\\package properties\\output\\${x}_output.csv";
open($fh, '<', $in) or die "Could not open file '$in' $!";
@data = <$fh>;
close($fh);
%counts;
@columns;
$first = 1;
#counter
foreach $dat (@data) {
    chomp($dat);
    @rows = split(',',$dat);
    if ($first == 1) {
        $first = 0;
        next;
    }
    else {
        $count = 1;
        foreach $i (0..$#rows) {
            if ( exists($columns[$i]{$rows[$i]}) ) {
                $columns[$i]{$rows[$i]}++;
            }
            else {
                $columns[$i]{$rows[$i]} = int($count);
            }
        }
    }
}
#output
$first = 1;
open($fh, '>', $out) or die "Could not open file '$out' $!";
foreach $dat (@data) {
    chomp($dat);
    @rows = split(',',$dat);
    foreach $i (0..$#rows) {
        if ($i > 6) {
            #for modifying name
            if ( $first == 1 ) {
                $line = join( ",", "Rank_$rows[$i]", "Percent_$rows[$i]",
                    "Count_$rows[$i]", $rows[$i]);
                print $fh "$line,";
                if ( $i == $#rows ) {
                    $first = 0;
                }
            }
            else {
                @dat_val = reverse sort { $a <=> $b } values %{$columns[$i]};
                %ranks = ();
                $rank_cnt = 0;
                foreach $val (@dat_val) {
                    if ( ! exists($ranks{$val}) ) {
                        $rank_cnt++;
                    }
                    $ranks{$val} = $rank_cnt;
                }
                $rank = $ranks{$columns[$i]{$rows[$i]}};
                $cnt = $columns[$i]{$rows[$i]};
                $ave = ($cnt / 14000000) * 100;
                $line = join( ",", $rank, $ave, $cnt, $rows[$i]);
                print $fh "$line,";
            }
        }
        else {
            print $fh "$rows[$i],";
        }
    }
    print $fh "\n";
}
close($fh);
Thanks in advance.
My table:
Col,_1,_2,_3,_4,_5,_6,Col2,Col3,Col,Col5
FALSE,1,2,3,4,5,6,6,6,1,4
FALSE,1,2,3,4,5,6,6,6,1,4
FALSE,1,2,3,4,5,7,6,6,1,3
Edited to show a sample table and to correct $x.
Sample output:
Col,_1,_2,_3,_4,_5,_6,Col2,rank_Col2,percent_rank_Col2,count_Col2,Col3,rank_Col3,percent_rank_Col3,count_Col3,Col,rank_Col,percent_rank_Col,count_Col,Col5,rank_Col5,percent_rank_Col5,count_Col5
FALSE,1,2,3,4,5,6,9,2,0.33,1,6,1,0.67,2,1,1,0.67,2,11,1,0.33,1
FALSE,1,2,3,4,5,6,6,1,0.67,2,6,1,0.67,2,2,2,0.33,1,4,1,0.33,1
FALSE,1,2,3,4,5,7,6,1,0.67,2,4,2,0.33,1,1,1,0.67,2,3,1,0.33,1
Presume you have this file:
% ls -lh file
-rw-r--r-- 1 dawg wheel 57G Jul 24 13:15 file
% time wc -l file
14000000 file
wc -l file 29.24s user 7.27s system 99% cpu 36.508 total
% awk -F, 'FNR==1{print $NF; exit}' file
1099
So we have a 57 GB CSV file with 14,000,000 lines and 1099 columns of random numbers.
It only takes 20 to 30 SECONDS to read the entire file in a line-by-line fashion.
How long in Perl?
% time perl -lnE '' file
perl -lnE '' file 5.70s user 9.67s system 99% cpu 15.381 total
So only 15 seconds in Perl line by line. How long to 'gulp' it?
% time perl -0777 -lnE '' file
perl -0777 -lnE '' file 12.13s user 23.86s system 98% cpu 36.688 total
But that is on THIS computer which has 255GB of RAM...
It took this Python script approximately 23 minutes to write that file:
total = 0
cols = 1100
with open('/tmp/file', 'w') as f_out:
    for cnt in range(14_000_000):
        line = ','.join(f'{x}' for x in range(cols))
        total += cols
        f_out.write(f'{cnt},{total},{line}\n')
The issue with your Perl script is that you are gulping the whole file (with @data = <$fh>;) and the OS may be running out of RAM. It is likely swapping, and this is very slow. (How much RAM do you have?)
Rewrite the script to process the file line by line. You should be able to do your entire analysis in less than 1 hour.
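A minimal sketch of the line-by-line pattern, reusing the counting step from your script (paths and names taken from your code):
use strict;
use warnings;

my $x  = "Room_reserve";
my $in = "D:\\package properties\\${x}.csv";
open(my $fh, '<', $in) or die "Could not open file '$in': $!";
my @columns;               # per-column frequency counts
my $header = <$fh>;        # read and discard the header line
while (my $line = <$fh>) { # only one line in memory at a time
    chomp $line;
    my @fields = split(',', $line);
    $columns[$_]{$fields[$_]}++ for 0 .. $#fields;
}
close($fh);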
Can you try the following script? It precomputes the ranks instead of recalculating them for each row. It also avoids saving the first 7 columns in @columns since they appear not to be used for anything.
use feature qw(say);
use strict;
use warnings;

{
    my $x = "Room_reserve";
    my $in = "D:\\package properties\\${x}.csv";
    my $out = "D:\\package properties\\output\\${x}_output.csv";
    my $min_col = 7;
    my ($data1, $data2) = read_input_file($in, $min_col);
    my $columns = create_counter($data2);
    my $ranks = create_ranks( $columns );
    write_output($data1, $data2, $columns, $ranks, $min_col, $out);
}

sub create_counter {
    my ($data) = @_;
    print "Creating counter..";
    my @columns;
    my $first = 1;
    for my $row (@$data) {
        if ( $first ) {
            $first = 0; next;  # skip header
        }
        for my $i (0..$#$row) {
            $columns[$i]{$row->[$i]}++;
        }
    }
    say "done.";
    return \@columns;
}

sub create_ranks {
    my ( $columns ) = @_;
    print "Creating ranks..";
    my @ranks;
    for my $col (@$columns) {
        # sort the column values according to highest frequency.
        my @freqs = sort { $b <=> $a } values %$col;
        my $idx = 1;
        my %ranks = map {$_ => $idx++} @freqs;
        push @ranks, \%ranks;
    }
    say "done.";
    return \@ranks;
}

sub read_input_file {
    my ($in, $min_col) = @_;
    print "Reading input file $in ..";
    open(my $fh, '<', $in) or die "Could not open file '$in' $!";
    my @data1;
    my @data2;
    while( my $line = <$fh> ) {
        chomp $line;
        my @fields = split(',', $line);
        next if @fields == 0;
        die "Unexpected column count at line $.\n" if @fields < $min_col;
        push @data1, [@fields[0..($min_col-1)]];
        push @data2, [@fields[$min_col..$#fields]];
    }
    say " $. lines.";
    close($fh);
    return \@data1, \@data2;
}

sub write_output {
    my ($data1, $data2, $columns, $ranks, $min_col, $out) = @_;
    say "Saving output file $out..";
    open(my $fh, '>', $out) or die "Could not open file '$out': $!";
    for my $i (0..$#$data1) {
        my $row1 = $data1->[$i];
        my $row2 = $data2->[$i];
        if ($i == 0) {
            print $fh join ",", @$row1;
            print $fh ",";
            for my $j (0..$#$row2) {
                print $fh join ",", $row2->[$j],
                    "Rank_" . $row2->[$j],
                    "Percent_". $row2->[$j],
                    "Count_" . $row2->[$j];
                print $fh "," if $j < $#$row2;
            }
            print $fh "\n";
            next;
        }
        print $fh join ",", @$row1;
        print $fh ",";
        for my $j (0..$#$row2) {
            my $cnt = $columns->[$j]{$row2->[$j]};
            my $rank = $ranks->[$j]{$cnt};
            my $ave = ($cnt / 14000000) * 100;
            print $fh join ",", $row2->[$j], $rank, $ave, $cnt;
            print $fh "," if $j < $#$row2;
        }
        print $fh "\n";
    }
    close $fh;
    say "Done.";
}
I have a third-party API that gives me the output below:
puts [GetDesc $desc " "]  ;# prints the data below
# A_Name 9023212134(M) emp#121 M { 41 423 }
How can I access each token of the printed value, and the list { 41 423 }?
The output is a list of 5 items, where the last is a list of two elements. You extract elements in a list using lindex:
set var {A_Name 9023212134(M) emp#121 M { 41 423 }}; # A_Name 9023212134(M) emp#121 M { 41 423 }
lindex $var 0; # A_Name
lindex $var 4; # 41 423 (Note: leading and trailing spaces are preserved)
lindex $var 4 0; # 41
lindex $var 4 1; # 423
You can iterate over the values in the result with foreach:
foreach value [GetDesc $desc " "] {
puts ">>> $value <<<"
}
This will print something like this (note the extra spaces with the last item; they're part of the value):
>>> A_Name <<<
>>> 9023212134(M) <<<
>>> emp#121 <<<
>>> M <<<
>>> 41 423 <<<
Another approach is to use lassign to put those values into variables:
lassign [GetDesc $desc " "] name code1 code2 code3 pair_of_values
Then you can work with $pair_of_values easily enough on its own.
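For instance, a short sketch (the variable names are illustrative):
lassign [GetDesc $desc " "] name code1 code2 code3 pair_of_values
lassign $pair_of_values first second  ;# first = 41, second = 423
puts "$name: $first / $second"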
I am working in PowerShell with a hashtable like so:
When I convert it to JSON, notice that the "oranges" key does not contain brackets:
I have tried to account for this when I create my hashtable by doing something like this:
foreach ($Group in ($input | Group fruit)) {
    if ($Group.Count -eq 1) {
        $hashtable[$Group.Name] = "{" + ($Group.Group | Select -Expand number) + "}"
    } else {
        $hashtable[$Group.Name] = ($Group.Group | Select -Expand number)
    }
}
Which looks fine when I output it as a hashtable but then when I convert to JSON I get this:
I am trying to get that single item also surrounded in []. I found a few things here and one of them took me to this:
https://superuser.com/questions/414650/why-does-powershell-silently-convert-a-string-array-with-one-item-to-a-string
But I don't know how to target just that one key when it only contains a single item.
You want to ensure that all hashtable values are arrays (that is what the curly brackets in the hashtable output and the square brackets in the JSON mean).
Change this code:
if ($Group.Count -eq 1) {
    $hashtable[$Group.Name] = "{" + ($Group.Group | Select -Expand number) + "}"
} else {
    $hashtable[$Group.Name] = ($Group.Group | Select -Expand number)
}
into this:
$hashtable[$Group.Name] = @($Group.Group | Select -Expand number)
and the problem will disappear.
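As a quick demonstration of why the @(...) wrapper matters (the keys and values here are made up):
$hashtable = @{}
$hashtable['oranges'] = @('23')        # single value, forced into an array
$hashtable['apples']  = @('1', '42')   # already two values
$hashtable | ConvertTo-Json
# "oranges" now serializes as ["23"] instead of a bare "23"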
I can't seem to get this function quite right. I want to pass it an object and, if the object is empty, return 1; otherwise count the items in the object and increment by 1.
Assuming the following function "New-Test":
function New-Test
{
    [cmdletbinding()]
    Param
    (
        [Parameter(ValueFromPipeline=$true,ValueFromPipelineByPropertyName=$true)]
        [object[]]$Object
        #[object]$object
    )
    Begin
    {
        $oData = @()
    }
    Process
    {
        "Total objects: $($object.count)"
        if($Object.count -gt 0)
        {
            $oData += [pscustomobject]@{
                Name = $_.Name
                Value = $_.Value
            }
        }
        Else
        {
            Write-Verbose "No existing object to increment. Assuming first entry."
            $oData = [pscustomobject]@{Value = 0}
        }
    }
    End
    {
        $LatestName = ($oData | Sort-Object -Descending -Property Value | Select -First 1).Value
        [int]$intNum = [convert]::ToInt32($LatestName, 10)
        $NextNumber = "{0:00}" -f ($intNum+1)
        $NextNumber
    }
}
And the following test hashtable:
#Create test hashtable:
$a = 00..08
$obj = @()
$a | ForEach-Object{
    $obj += [pscustomobject]@{
        Name = "TestString" + "{0:00}" -f $_
        Value = "{0:00}" -f $_
    }
}
As per the function above, if I pass it $Obj, I get:
$obj | New-Test -Verbose
Total objects: 1
Total objects: 1
Total objects: 1
Total objects: 1
Total objects: 1
Total objects: 1
Total objects: 1
Total objects: 1
Total objects: 1
09
Which is as expected. However, if I pass it $Obj2:
#Create empty hash
$obj2 = $null
$obj2 = @{}
$obj2 | New-Test -Verbose
I get:
Total objects: 1
Exception calling "ToInt32" with "2" argument(s): "Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: startIndex"
At line:33 char:9
+ [int]$intNum = [convert]::ToInt32($LatestName, 10)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ArgumentOutOfRangeException
01
I don't understand why $object.count is 1, when there's nothing in the hashtable.
If I change the parameter $object's type from [object[]] to [object], the empty hashtable test results in:
$obj2 | New-Test -Verbose
Total objects: 0
VERBOSE: No existing object to increment. Assuming first entry.
01
Which is what I'd expect, however, if I run the first test, it results in:
$obj | New-Test -Verbose
Total objects:
VERBOSE: No existing object to increment. Assuming first entry.
Total objects:
VERBOSE: No existing object to increment. Assuming first entry.
This time $object has nothing in it.
I'm sure it's simple, but I can't fathom this one out. Any help is appreciated.
P.S. PowerShell 5.1
$obj2 is a hashtable, not an array. Hashtables are not enumerated by default, so the hashtable itself is the one object. If you want to loop through a hashtable using the pipeline, you need to use $obj2.GetEnumerator().
#{"hello"="world";"foo"="bar"} | Measure-Object | Select-Object Count
Count
-----
1
#{"hello"="world";"foo"="bar"}.GetEnumerator() | Measure-Object | Select-Object Count
Count
-----
2
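For example, enumerating the entries lets you process each key/value pair on its own (hashtable order is not guaranteed):
@{"hello"="world";"foo"="bar"}.GetEnumerator() |
    ForEach-Object { "$($_.Key) = $($_.Value)" }
# foo = bar
# hello = world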
Given two csv files:
File1.csv
SKU,Description,UPC
101,Saw,101010103
102,Drill,101010102
103,Screw,101010101
104,Nail,101010104
File2.csv
SKU,Description,UPC
100,Light,101010105
101,Saw,101010103
104,Nail,101010104
106,Battery,101010106
108,Bucket,101010114
I'd like to create a new CSV file, which we'll call UpdatedList.csv, that has every entry from File1.csv minus any rows where the SKU is in both File1.csv and File2.csv. In this case, UpdatedList.csv will look like this:
UpdatedList.csv
"SKU","Description","UPC"
"102","Drill","101010102"
"103","Screw","101010101"
The following code will do what I want but I believe there is a more efficient way. How can I do this without loops? My code is as follows.
#### Create a third file that has all elements of file 1 minus those in file 2 ###
$FileName1 = Get-FileName "C:\LowInventory"
$FileName2 = Get-FileName "C:\LowInventory"

$f1 = ipcsv $FileName1
$f2 = ipcsv $FileName2
$f3 = ipcsv $FileName1

For($i=0; $i -lt $f1.length; $i++){
    For($j=0; $j -lt $f2.length; $j++){
        if ($f1[$i].SKU -eq $f2[$j].SKU){$f3[$i].SKU = 0}
    }
}

$f3 | Where-Object {$_.SKU -ne "0"} | epcsv "C:\LowInventory\UpdatedList.csv" -NoTypeInformation
Invoke-Item "C:\LowInventory\UpdatedList.csv"
################################
You can do this without loops by taking advantage of the Group-Object cmdlet:
$f1 = ipcsv File1.csv;
$f2 = ipcsv File2.csv;
$f1.ForEach({Add-Member -InputObject $_ 'X' 0}) # So we can select these after
$f1 + $f2 | # merge our lists
group SKU | # group by SKU
where {$_.Count -eq 1} | # select ones with unique SKU
select -expand Group | # ungroup
where {$_.X -eq 0} # where from file1
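To mirror the original script's output step, you could then drop the helper property and export the result; a sketch using the output path from the question:
$f1 + $f2 |
    group SKU |
    where {$_.Count -eq 1} |
    select -expand Group |
    where {$_.X -eq 0} |
    select SKU,Description,UPC |   # drop the helper X property
    epcsv "C:\LowInventory\UpdatedList.csv" -NoTypeInformation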