How to read contents of a csv file inside zip file using PowerShell - csv

I have a zip file which contains several CSV files inside it. How do I read the contents of those CSV files without extracting the zip files using PowerShell?
I have been using the Read-Archive cmdlet, which is included in the PowerShell Community Extensions (PSCX).
This is what I have tried so far.
$path = "$env:USERPROFILE\Downloads\"
$fullpath = Join-Path $path filename.zip
Read-Archive $fullpath | Foreach-Object {
Get-Content $_.Name
}
But when I run the code, I get this error message:
Get-Content : An object at the specified path filename.csv does not exist, or has been filtered by the -Include or -Exclude parameter.
However, when I run Read-Archive $fullpath on its own, it lists all the files inside the zip file.

There are multiple ways of achieving this:
1. Here's an example using Ionic.zip dll:
clear
Add-Type -Path "E:\sw\NuGet\Packages\DotNetZip.1.9.7\lib\net20\Ionic.Zip.dll"
$zip = [Ionic.Zip.ZipFile]::Read("E:\E.zip")
$file = $zip | where-object { $_.FileName -eq "XMLSchema1.xsd"}
$stream = new-object IO.MemoryStream
$file.Extract($stream)
$stream.Position = 0
$reader = New-Object IO.StreamReader($stream)
$text = $reader.ReadToEnd()
$text
$reader.Close()
$stream.Close()
$zip.Dispose()
This picks the entry by name (XMLSchema1.xsd) and extracts it into a memory stream. You then read the memory stream into whatever form you need (a string in my example).
2. In Powershell 5, you could use Expand-Archive, see: https://technet.microsoft.com/en-us/library/dn841359.aspx?f=255&MSPPError=-2147217396
It extracts the entire archive into a folder:
Expand-Archive "E:\E.zip" "e:\t"
Keep in mind that extracting the entire archive takes time, and you will then have to clean up the temporary files.
3. And one more way to extract just 1 file:
$shell = new-object -com shell.application
$zip = $shell.NameSpace("E:\E.zip")
$file = $zip.items() | Where-Object { $_.Name -eq "XMLSchema1.xsd"}
$shell.Namespace("E:\t").copyhere($file)
4. And one more way using native means:
Add-Type -assembly "system.io.compression.filesystem"
$zip = [io.compression.zipfile]::OpenRead("e:\E.zip")
$file = $zip.Entries | where-object { $_.Name -eq "XMLSchema1.xsd"}
$stream = $file.Open()
$reader = New-Object IO.StreamReader($stream)
$text = $reader.ReadToEnd()
$text
$reader.Close()
$stream.Close()
$zip.Dispose()

Based on Andrey's solution 4, I propose the following function
(keep in mind that the ZipFile class is only available from .NET Framework 4.5 onward):
Add-Type -assembly "System.IO.Compression.FileSystem"
function Read-FileInZip($ZipFilePath, $FilePathInZip) {
    try {
        if (![System.IO.File]::Exists($ZipFilePath)) {
            throw "Zip file ""$ZipFilePath"" not found."
        }
        $Zip = [System.IO.Compression.ZipFile]::OpenRead($ZipFilePath)
        # Match on the full path inside the archive, not just the file name
        $ZipEntries = [array]($Zip.Entries | where-object {
            return $_.FullName -eq $FilePathInZip
        });
        if (!$ZipEntries -or $ZipEntries.Length -lt 1) {
            throw "File ""$FilePathInZip"" couldn't be found in zip ""$ZipFilePath""."
        }
        if ($ZipEntries.Length -gt 1) {
            throw "More than one file ""$FilePathInZip"" found in zip ""$ZipFilePath""."
        }
        $ZipStream = $ZipEntries[0].Open()
        $Reader = [System.IO.StreamReader]::new($ZipStream)
        return $Reader.ReadToEnd()
    }
    finally {
        # Disposing the reader also closes the underlying entry stream
        if ($Reader) { $Reader.Dispose() }
        if ($Zip) { $Zip.Dispose() }
    }
}
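To tie this back to the original question (CSV files inside a zip), the text read from an entry can be passed to ConvertFrom-Csv. This is only a minimal sketch, assuming PowerShell 5 or later; the temporary archive, the people.csv entry, and its contents are made up for the demo and are not from the posts above.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Build a small demo archive in the temp folder (hypothetical names and data).
$zipPath = Join-Path ([System.IO.Path]::GetTempPath()) ("demo_{0}.zip" -f [guid]::NewGuid())
$zip = [System.IO.Compression.ZipFile]::Open($zipPath, [System.IO.Compression.ZipArchiveMode]::Create)
try {
    $entry  = $zip.CreateEntry('people.csv')
    $writer = [System.IO.StreamWriter]::new($entry.Open())
    $writer.Write("Name,Age`nAnn,31`nBob,27")
    $writer.Dispose()
}
finally { $zip.Dispose() }

# Read every CSV entry straight from the archive, without extracting to disk.
$rows = @()
$zip = [System.IO.Compression.ZipFile]::OpenRead($zipPath)
try {
    foreach ($e in ($zip.Entries | Where-Object { $_.Name -like '*.csv' })) {
        $reader = [System.IO.StreamReader]::new($e.Open())
        try     { $rows += $reader.ReadToEnd() | ConvertFrom-Csv }
        finally { $reader.Dispose() }
    }
}
finally { $zip.Dispose() }

Remove-Item -LiteralPath $zipPath
$rows   # objects with Name and Age properties
```

ConvertFrom-Csv returns every field as a string, so cast columns such as Age yourself if you need numbers.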

Related

Powershell not returning correct value

As some background, this should take an excel file, and convert it to PDF (and place the PDF into a temporary folder).
E.g. 'C:\Users\gjacobs\Desktop\test\stock.xlsx'
becomes
'C:\Users\gjacobs\Desktop\test\pdf_merge_tmp\stock.pdf'
However, the new file path does not return correctly.
If I echo the string $export_name from within the function, I can see that it has the correct value: "C:\Users\gjacobs\Desktop\test\pdf_merge_tmp\stock.pdf".
But once $export_name is returned, it has a different (incorrect value): "C:\Users\gjacobs\Desktop\test\pdf_merge_tmp C:\Users\gjacobs\Desktop\test\pdf_merge_tmp\stock.pdf".
function excel_topdf{
param(
$file
)
#Get the parent path
$parent = Split-Path -Path $file
#Get the filename (no ext)
$leaf = (Get-Item $file).Basename
#Add them together.
$export_name = $parent + "\pdf_merge_tmp\" + $leaf + ".pdf"
echo ($export_name) #prints without issue.
#Create tmp dir
New-Item -Path $parent -Name "pdf_merge_tmp" -ItemType "Directory" -Force
$objExcel = New-Object -ComObject excel.application
$objExcel.visible = $false
$workbook = $objExcel.workbooks.open($file, 3)
$workbook.Saved = $true
$xlFixedFormat = "Microsoft.Office.Interop.Excel.xlFixedFormatType" -as [type]
$workbook.ExportAsFixedFormat($xlFixedFormat::xlTypePDF, $export_name)
$objExcel.Workbooks.close()
$objExcel.Quit()
return $export_name
}
$a = excel_topdf -file 'C:\Users\gjacobs\Desktop\test\stock.xlsx'
echo ($a)
The issue you're experiencing is caused by the way PowerShell returns values from functions: everything a function writes to the output stream becomes part of its result, not just the value passed to return. This isn't limited to the New-Item cmdlet. Any command inside the function that emits output adds that output to the function's result.
As an example, take a function with a single cmdlet that returns an object:
function a {
Get-Item -Path .
}
$outputA = a
$outputA
#### RESULT ####
Directory:
Mode LastWriteTime Length Name
---- ------------- ------ ----
d--hs- 12/01/2021 10:47 C:\
If you want to avoid that, these are the most popular options (as pointed out by Lasse V. Karlsen in the comments):
# Assignment to $null (or any other variable)
$null = Get-Item -Path .
# Piping to Out-Null
Get-Item -Path . | Out-Null
NOTE: The behavior described above doesn't apply to Write-Host:
function b {
Write-Host "bbbbbb"
}
$outputB = b
$outputB
# Nothing displayed
Interesting thread to check if you want to learn more.
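Applied to a function like the one in the question, discarding the New-Item output leaves only the path string in the result. The following is a minimal sketch, not the asker's full script: Get-ExportPath is a hypothetical helper, the Excel part is left out, and the temp folder stands in for the real parent directory.

```powershell
function Get-ExportPath {
    param([string]$Folder, [string]$BaseName)

    # New-Item emits a DirectoryInfo object; assigning it to $null keeps it
    # out of the function's output.
    $null = New-Item -Path $Folder -Name 'pdf_merge_tmp' -ItemType Directory -Force

    # This string is now the only thing the function outputs.
    return Join-Path (Join-Path $Folder 'pdf_merge_tmp') ($BaseName + '.pdf')
}

$result = Get-ExportPath -Folder ([System.IO.Path]::GetTempPath()) -BaseName 'stock'
$result.GetType().Name   # String rather than Object[]

# Clean up the demo directory.
Remove-Item -LiteralPath (Split-Path $result) -Recurse
```

Without the `$null =` assignment, $result would be a two-element array holding the directory object followed by the path.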

log file variables in function eat up all memory

I have the following simple function, which is used several times in a script that iterates through a directory and checks the age of the files in it.
function log($Message) {
$logFilePath = 'C:\logPath\myLog.txt'
$date = Get-Date -Format 'yyyyMMddHHmmss'
$logMessage = "{0}_{1}" -f $date,$Message
if(Test-Path -Path $logFilePath) {
$logFileContent = Get-Content -Path $logFilePath
} else {
$logFileContent = ''
}
$logMessage,$logFileContent | Out-File -FilePath $logFilePath
}
I figured out that this eats up all the RAM, and I don't understand why. I thought the variables in a function's scope are destroyed once the function has run. I fixed the RAM issue by adding Remove-Variable logMessage,logFileContent,logFilePath,date to the very end of the function, but I would like to know how this could be solved otherwise, and why the variables within the function are not destroyed automatically.
PowerShell runs on .NET, which has a garbage collector, so memory isn't freed instantly (see "Garbage Collection in Powershell to Speed Scripts"). Memory management is also probably better in PowerShell 7. I tried repeating your function many times, but the memory usage didn't go above a few hundred megs.
There's probably a more efficient .NET way to prepend a line to a file; see "Prepend "!" to the beginning of the first line of a file".
I have a weird way to prepend a line. I'm not sure how well this would work with a large file. With a 700 meg file, the Working Set memory stayed at 76 megs.
$c:file
one
two
three
$c:file = 'pre',$c:file
$c:file
pre
one
two
three
ps powershell
Handles NPM(K) PM(K) WS(K) CPU(s) Id SI ProcessName
------- ------ ----- ----- ------ -- -- -----------
607 27 67448 76824 1.14 3652 3 powershell
Although as commented, I can hardly believe that this function gobbles up your memory, you could optimize it:
function log ([string]$Message) {
$logFilePath = 'C:\logPath\myLog.txt'
# prefix the message with the current date
$Message = "{0:yyyyMMddHHmmss}_{1}" -f (Get-Date), $Message
if (Test-Path -Path $logFilePath -PathType Leaf) {
# newest log entry on top: append current content
$Message = "{0}`r`n{1}" -f $Message, (Get-Content -Path $logFilePath -Raw)
}
Set-Content -Path $logFilePath -Value $Message
}
I just want to rule out that the RAM usage is caused by prepending to the file. Have you tried not storing the log contents in a variable? For example:
$logMessage,(Get-Content -Path $logFilePath) | Out-File -FilePath $logFilePath
Edit 5/8/20 - It turns out that prepending (when done efficiently) isn't as slow as I thought; it is on the same order as using -Append. The code (that js2010 pointed to) is long and ugly, but if you really need to prepend to the file, this is the way to do it.
I modified the OP's code a bit to automatically insert a new line.
function log-prepend{
param(
$content,
$filePath = 'C:\temp\myLogP.txt'
)
$file = get-item $filePath
if(!$file.exists){
write-error "$file does not exist";
return;
}
$filepath = $file.fullname;
$tmptoken = (get-location).path + "\_tmpfile" + $file.name;
write-verbose "$tmptoken created as buffer";
$tfs = [System.io.file]::create($tmptoken);
$fs = [System.IO.File]::Open($file.fullname,[System.IO.FileMode]::Open,[System.IO.FileAccess]::ReadWrite);
try{
$date = Get-Date -Format 'yyyyMMddHHmmss'
$logMessage = "{0}_{1}`r`n" -f $date,$content
# FileStream.Write expects bytes, so encode the message explicitly
$msg = [System.Text.Encoding]::UTF8.GetBytes($logMessage);
$tfs.write($msg,0,$msg.length);
$fs.position = 0;
$fs.copyTo($tfs);
}
catch{
write-verbose $_.Exception.Message;
}
finally{
$tfs.close();
$fs.close();
if($error.count -eq 0){
write-verbose ("updating $filepath");
[System.io.File]::Delete($filepath);
[System.io.file]::Move($tmptoken,$filepath);
}
else{
$error.clear();
[System.io.file]::Delete($tmptoken);
}
}
}
Here was my original answer that shows how to test the timing using a stopwatch.
When you prepend to a log file, you're reading the entire log file into memory, then writing it back.
You really should be using append - that would make the script run a lot faster.
function log($Message) {
$logFilePath = 'C:\logPath\myLog.txt'
$date = Get-Date -Format 'yyyyMMddHHmmss'
$logMessage = "{0}_{1}" -f $date,$Message
$logMessage | Out-File -FilePath $logFilePath -Append
}
Edit: To convince you that prepending to a log file is a bad idea, here's a test you can do on your own system:
function logAppend($Message) {
$logFilePath = 'C:\temp\myLogA.txt'
$date = Get-Date -Format 'yyyyMMddHHmmss'
$logMessage = "{0}_{1}" -f $date,$Message
$logMessage | Out-File -FilePath $logFilePath -Append
}
function logPrepend($Message) {
$logFilePath = 'C:\temp\myLogP.txt'
$date = Get-Date -Format 'yyyyMMddHHmmss'
$logMessage = "{0}_{1}" -f $date,$Message
if(Test-Path -Path $logFilePath) {
$logFileContent = Get-Content -Path $logFilePath
} else {
$logFileContent = ''
}
$logMessage,$logFileContent | Out-File -FilePath $logFilePath
}
$processes = Get-Process
$stopwatch = [system.diagnostics.stopwatch]::StartNew()
foreach ($p in $processes)
{
logAppend($p.ProcessName)
}
$stopwatch.Stop()
$stopwatch.Elapsed
$stopwatch = [system.diagnostics.stopwatch]::StartNew()
foreach ($p in $processes)
{
logPrepend($p.ProcessName)
}
$stopwatch.Stop()
$stopwatch.Elapsed
I've run this several times until I got a few thousand lines in the log file.
Going from: 1603 to 1925 lines, my results were:
Append: 7.0167008 s
Prepend: 21.7046793 s

Unable to combine all csv files using powershell

I would like to combine all the csv files in my local folder but it shows empty results. I am trying to take the header of the first file and skip all the headers in the rest of the files in the folder and join them.
get-childItem "C:\Users\*.csv" | foreach {[System.IO.File]::AppendAllText
("C:\Users\finalCSV.csv", [System.IO.File]::ReadAllText($_.FullName))}
$getFirstLine = $true
get-childItem "C:\Users\*.csv" | foreach {
$filePath = $_
$lines = $lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "C:\Users\finalCSV.csv" $linesToWrite
}
My end result is that when I open finalCSV.csv it shows no results.
I think you are trying to overwork your solution. Just use Import-Csv and append to an array. Something like this:
$a = @(); ls *.csv | % {$a += (Import-Csv $_.FullName)}; $a
Works even if the columns are in a different order.
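To end up with a single combined file rather than an array, the imported rows can be piped into Export-Csv, which writes the header once. The sketch below runs against two throwaway files in the temp folder; the file names and contents are invented for the demo. The output is written outside the source folder so that a later run does not pick up the combined file as input.

```powershell
# Create a demo folder with two small CSV files (hypothetical data).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("csvdemo_{0}" -f [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $dir
Set-Content -Path (Join-Path $dir 'a.csv') -Value 'Id,Name', '1,Ann'
Set-Content -Path (Join-Path $dir 'b.csv') -Value 'Id,Name', '2,Bob'

# Write the result outside the source folder.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) ("combined_{0}.csv" -f [guid]::NewGuid())

Get-ChildItem -Path $dir -Filter '*.csv' |
    Sort-Object Name |
    ForEach-Object { Import-Csv -LiteralPath $_.FullName } |
    Export-Csv -Path $outFile -NoTypeInformation

# Read the combined file back to check it, then clean up.
$combined = @(Import-Csv -LiteralPath $outFile)
$combined
Remove-Item -LiteralPath $dir -Recurse
Remove-Item -LiteralPath $outFile
```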

Import CSV and updating specific lines

So I have a script that runs at logon to search for PSTs on a user's machine, then copies them to a holding area to await migration.
When the search/copy is complete it outputs to a CSV that looks something like this:
Hostname,User,Path,Size_in_MB,Creation,Last_Access,Copied
COMP1,user1,\\comp1\c$\Test PST.pst,20.58752,08/12/2015,08/12/2015,Client copied
COMP1,user1,\\comp1\c$\outlook\outlook.pst,100,08/12/2015,15,12,2015,In Use
The same logon script has an IF that imports the CSV when the copied status is "In Use" and makes further attempts at copying the PST into the holding area. If an attempt is successful, it exports the results to the CSV file.
My question is: is there any way of getting it to amend the existing CSV and change the copy status? I can get it to add a new line to the end, but not to update the existing one.
This is my 'try again' script:
# imports line of csv where PST file is found to be in use
$PST_IN_USE = Import-CSV "\\comp4\TEMPPST\PST\$HOSTNAME - $USER.csv" | where { $_.copied -eq "In Use" }
ForEach ( $PST_USE in $PST_IN_USE )
{ $NAME = Get-ItemProperty $PST_IN_USE.Path | select -ExpandProperty Name
$NEW_NAME = $USER + "_" + $PST_IN_USE.Size_in_MB + "_" + $NAME
# attempts to copy the file to the pst staging area then rename it.
TRY { Copy-Item $PST_IN_USE.Path "\\comp4\TEMPPST\PST\$USER" -ErrorAction SilentlyContinue
Rename-Item "\\comp4\TEMPPST\PST\$USER\$NAME" -NewName $NEW_NAME
# edits the existing csv file replacing "In Use" with "Client Copied"
$PST_IN_USE.Copied -replace "In Use","Client Copied"
} # CLOSES TRY
# silences any errors.
CATCH { }
$PST_IN_USE | Export-Csv "\\comp4\TEMPPST\PST\$HOSTNAME - $USER.csv" -NoClobber -NoTypeInformation -Append
} # CLOSES ForEach ( $PST_USE in $PST_IN_USE )
This is the resulting CSV
Hostname,User,Path,Size_in_MB,Creation,Last_Access,Copied
COMP1,user1,\\comp1\c$\Test PST.pst,20.58752,08/12/2015,08/12/2015,Client copied
COMP1,user1,\\comp1\c$\outlook\outlook.pst,100,08/12/2015,15,12,2015,In Use
COMP1,user1,\\comp1\c$\outlook\outlook.pst,100,08/12/2015,15,12,2015,Client copied
It's almost certainly something really simple, but if it is, it's something I've yet to come across in my scripting. I'm mostly working in IF / ELSE land at the moment!
If you want to change the CSV file, you have to write it out again completely rather than just appending new lines. In your case this means:
# Get the data
$data = Import-Csv ...
# Get the 'In Use' entries
$inUse = $data | where Copied -eq 'In Use'
foreach ($x in $inUse) {
...
$x.Copied = 'Client Copied'
}
# Write the file again
$data | Export-Csv ...
The point here is, you grab all the lines from the CSV, modify those that you process and then write the complete collection back to the file again.
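Filled in with sample data, the pattern looks roughly like this. It is a sketch only: the temp file, the shortened column set, and the rows are invented for the demo, and the actual copy attempt is reduced to a comment.

```powershell
# Demo CSV in the temp folder (hypothetical rows, fewer columns than the real file).
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) ("pst_{0}.csv" -f [guid]::NewGuid())
Set-Content -Path $csvPath -Value @(
    'Hostname,User,Path,Copied'
    'COMP1,user1,C:\a.pst,Client copied'
    'COMP1,user1,C:\b.pst,In Use'
)

# Load ALL rows; @() makes sure we have an array and the file is fully read.
$data = @(Import-Csv -LiteralPath $csvPath)

foreach ($row in ($data | Where-Object { $_.Copied -eq 'In Use' })) {
    # ... retry the copy here; on success, update the row object in place ...
    $row.Copied = 'Client copied'
}

# Rewrite the whole file from the modified collection.
$data | Export-Csv -LiteralPath $csvPath -NoTypeInformation

$after = @(Import-Csv -LiteralPath $csvPath)
$after
Remove-Item -LiteralPath $csvPath
```

Because the filtered rows are the same objects that $data holds, changing $row.Copied also changes the entry that gets exported.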
I've cracked it. It's almost certainly a long-winded way of doing it, but it works and is relatively clean too.
#imports line of csv where PST file is found to be in use
$PST_IN_USE = Import-CSV "\\comp4\TEMPPST\PST\$HOSTNAME - $USER.csv" | where { $_.copied -eq "In Use" }
$PST_IN_USE | select -ExpandProperty path | foreach {
# name of pst
$NAME = Get-ItemProperty $_ | select -ExpandProperty Name
# size of pst in MB without decimals
$SIZE = Get-ItemProperty $_ | select -ExpandProperty length | foreach { $_ / 1000000 }
# path of pst
$PATH = $_
# new name of pst when copied to the destination
$NEW_NAME = $USER + "_" + $SIZE + "_" + $NAME
TRY { Copy-Item $_ "\\comp4\TEMPPST\PST\$USER" -ErrorAction SilentlyContinue
TRY { Rename-Item "\\comp4\TEMPPST\PST\$USER\$NAME" -NewName $NEW_NAME -ErrorAction SilentlyContinue | Out-Null }
CATCH { $NEW_NAME = "Duplicate exists" }
$COPIED = "Client copied" }
CATCH { $COPIED = "In use" ; $NEW_NAME = " " }
$NEW_FILE = Test-Path "\\comp4\TEMPPST\PST\$HOSTNAME - $USER 4.csv"
IF ( $NEW_FILE -eq $FALSE )
{ "Hostname,User,Path,Size_in_MB,Creation,Last_Access,Copied,New_Name" |
Set-Content "\\lccfp1\TEMPPST\PST\$HOSTNAME - $USER 4.csv" }
"$HOSTNAME,$USER,$PATH,$SIZE,$CREATION,$LASTACCESS,$COPIED,$NEW_NAME" |
Add-Content "\\comp4\TEMPPST\PST\$HOSTNAME - $USER 4.csv"
} # CLOSES FOREACH #
$a = Import-CSV "\\comp4\TEMPPST\PST\$HOSTNAME - $USER.csv" | where { $_.copied -ne "in use" }
$b = Import-Csv "\\comp4\TEMPPST\PST\$HOSTNAME - $USER 4.csv"
$a + $b | export-csv "\\comp4\TEMPPST\PST\$HOSTNAME - $USER 8.csv" -NoClobber -NoTypeInformation
Thanks for the help. Sometimes it takes a moment's break and a large cup of coffee to see things a different way.

Powershell WebClient DownloadFile Exception Illegal Characters in Path

I am trying to download zip files from an FTP site, based off retrieving a directory list to find file names.
Download Portion:
$folderPath='ftp://11.111.11.11/'
$target = "C:\Scripts\ps\ftpdl\"
Foreach ($file in ($array | where {$_ -like "data.zip"})) {
$Source = $folderPath+$file
$Path = $target+$file
#$Source = "ftp://11.111.11.11/data.zip"
#$Path = "C:\Scripts\ps\ftpdl\data.zip"
$source
Write-Verbose -Message $Source -verbose
$path
Write-Verbose -message $Path -verbose
$U = "User"
$P = "Pass"
$WebClient2 = New-Object System.Net.WebClient
$WebClient2.Credentials = New-Object System.Net.Networkcredential($U, $P)
$WebClient2.DownloadFile( $source, $path )
}
If I use the commented-out lines and hard-code the strings, it downloads correctly. But if I run it as shown, I get an "illegal characters in path" exception. Interestingly, the Write-Verbose output differs from the plain output.
Output when run as shown:
ftp://11.111.11.11/data.zip
data.zip
C:\Scripts\ps\ftpdl\data.zip
data.zip
Exception calling "DownloadFile" with "2" .........
Output when run with hard coded path & source
ftp://11.111.11.11/data.zip
VERBOSE: ftp://11.111.11.11/data.zip
C:\Scripts\ps\ftpdl\data.zip
VERBOSE: C:\Scripts\ps\ftpdl\data.zip
And the file downloads nicely.
Well, of course, as soon as I posted the question I figured it out. My $array contained `n and `r characters, and I needed to strip both of them out.
$array=$array -replace "`n",""
$array=$array -replace "`r",""
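The two replacements can also be folded into one regex character class. A small sketch with a made-up listing (the file names are hypothetical) that makes the hidden characters visible through the string lengths:

```powershell
# Hypothetical directory listing whose entries still carry CR/LF line endings.
$array = "data.zip`r`n", "readme.txt`n"

# -replace works element by element on an array; [\r\n] matches either character.
$clean = $array -replace '[\r\n]', ''

# Print each name in brackets with its length so stray characters would show up.
$clean | ForEach-Object { '[{0}] length {1}' -f $_, $_.Length }
```

Calling .Trim() on each element would work too, but it also removes leading and trailing spaces, which may or may not be what you want.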