PowerShell: Get 2 strings into a hashtable and out to .csv

PowerShell newbie here,
I need to:
1. Get text files in recursive local directories that have a common string, students.txt, in them.
2. Get another string, gc.student="name,name", in the resulting file set from #1 and get the name(s).
3. Put the filename from #1, and just the name,name from #2 (not gc.student=""), into a hashtable where the filename is paired with its corresponding name,name.
4. Output the hashtable to an Excel spreadsheet with 2 columns: File and Name.
I've figured out, having searched and learned here and elsewhere, how to output #1 to the screen, but not how to put it into a hashtable with #2:
$documentsfolder = "C:\records\"
foreach ($file in Get-ChildItem $documentsfolder -recurse | Select-String -pattern "students.txt" ) {$file}
I'm thinking to get name in #2 I'll need to use a RegEx since there might only be 1 name sometimes.
And for the output to Excel, this: | Export-Csv -NoType output.csv
Any help moving me on is appreciated.

I think this should get you started. The explanations are in the code comments.
# base directory
$documentsfolder = 'C:\records\'
# get files with names ending with students.txt
$files = Get-ChildItem $documentsfolder -Recurse | Where-Object {$_.Name -like "*students.txt"}
# process each of the files and collect the objects the loop emits
$results = foreach ($file in $files)
{
# use FullName so files in subdirectories resolve regardless of the current directory
$fileContents = Get-Content $file.FullName
$fileName = $file.Name
# series of matches to clean up different parts of the content
# first find the gc.... pattern (the dot is escaped so it matches literally)
$fileContents = ($fileContents | Select-String -Pattern 'gc\.student=".*"').Matches.Value
# then select the string with double quotes
$fileContents = ($fileContents | Select-String '".*"').Matches.Value
# then remove the leading and trailing double quotes
$fileContents = $fileContents -replace '^"','' -replace '"$',''
# emit a custom object so that your CSV headers will have nice column names
[pscustomobject]@{File=$fileName;Name=$fileContents}
}
# a foreach statement cannot be piped directly, so pipe the collected results to Export-Csv
$results | Export-Csv -NoType output.csv
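As a side note, the three clean-up steps in the answer above could probably be collapsed into one regular expression with a capture group. This is only a minimal sketch: the sample line and the variable names `$sample`, `$pattern` and `$names` are made up for illustration, and it assumes each file contains a line of exactly this shape.

```powershell
# Hypothetical sample line; the real files are assumed to contain a line shaped like this.
$sample = 'gc.student="Smith,Jones"'

# Escape the dot so it matches literally, and capture only what sits between the quotes.
$pattern = 'gc\.student="([^"]*)"'

$names = $null
if ($sample -match $pattern) {
    # $Matches[1] is the first capture group: the name(s) without the gc.student="" wrapper.
    $names = $Matches[1]
}
$names
```

Because the capture group takes everything up to the closing quote, the same pattern covers a single name as well as a comma-separated pair.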

Related

What is the most efficient way to replace all \ with \\, within a huge JSON File?

I have to replace all occurrences of \ with \\ within a huge JSON Lines File. I wanted to use Powershell, but there might be other options too.
The source file is 4.000.000 lines and is about 6GB.
The Powershell script I was using took too much time, I let it run for 2 hours and it wasn't done yet. A performance of half an hour would be acceptable.
$Importfile = "C:\file.jsonl"
$Exportfile = "C:\file2.jsonl"
(Get-Content -Path $Importfile) -replace "[\\]", "\\" | Set-Content -Path $Exportfile
If the replacement is simply a conversion of a single backslash to a double backslash, the file can be processed row by row.
A StringBuilder collects the processed rows in a memory buffer, which is flushed to disk every now and then. Like so,
$src = "c:\path\MyBigFile.json"
$dst = "c:\path\MyOtherFile.json"
$sb = New-Object Text.StringBuilder
$reader = [IO.File]::OpenText($src)
$i = 0
$MaxRows = 10000
while($null -ne ($line = $reader.ReadLine())) {
# Replace slashes
$line = $line.replace('\', '\\')
# ' markdown coloring is confused by backslash-apostrophe
# so here is an extra one just for looks
[void]$sb.AppendLine($line)
++$i
# Write builder contents into file every now and then
if($i -ge $MaxRows) {
add-content $dst $sb.ToString() -NoNewline
[void]$sb.Clear()
$i = 0
}
}
# Flush the builder after the while loop if there's data
if($sb.Length -gt 0) {
add-content $dst $sb.ToString() -NoNewline
}
$reader.close()
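A possible variation on the same row-by-row idea is to let a .NET StreamWriter do the buffering instead of managing a StringBuilder by hand. The sketch below is not taken from the answer above: the temp-file paths and the two sample rows are hypothetical and exist only so the snippet runs on its own, and it has not been timed against a 6GB file.

```powershell
# Hypothetical paths; a small sample file is created so the sketch is self-contained.
$src = Join-Path ([IO.Path]::GetTempPath()) 'sample-in.jsonl'
$dst = Join-Path ([IO.Path]::GetTempPath()) 'sample-out.jsonl'
Set-Content -Path $src -Value '{"path":"C:\temp\a"}', '{"path":"D:\b"}'

$reader = [IO.File]::OpenText($src)
$writer = New-Object IO.StreamWriter($dst)   # creates or overwrites $dst
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        # String.Replace is a literal replacement, so no regex escaping is involved.
        $writer.WriteLine($line.Replace('\', '\\'))
    }
}
finally {
    # Dispose flushes the writer's buffer and releases both file handles.
    $reader.Dispose()
    $writer.Dispose()
}
```

Use full paths with the .NET classes, since they resolve relative paths against the process working directory rather than the current PowerShell location.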
Use -ReadCount parameter for Get-Content cmdlet (and set it to 0).
-ReadCount
Specifies how many lines of content are sent through the pipeline at a
time. The default value is 1. A value of 0 (zero) sends all of the
content at one time.
This parameter does not change the content displayed, but it does
affect the time it takes to display the content. As the value of
ReadCount increases, the time it takes to return the first line
increases, but the total time for the operation decreases. This can
make a perceptible difference in large items.
Example (runs about 17× faster for a file of about 20 MB):
$file = 'D:\bat\files\FileTreeLista.txt'
(Measure-Command {
$xType = (Get-Content -Path $file ) -replace "[\\]", "\\"
}).TotalSeconds, $xType.Count -join ', '
(Measure-Command {
$yType = (Get-Content -Path $file -ReadCount 0) -replace "[\\]", "\\"
}).TotalSeconds, $yType.Count -join ', '
Get-Item $file | Select-Object FullName, Length
13,3288848, 338070
0,7557814, 338070
FullName Length
-------- ------
D:\bat\files\FileTreeLista.txt 20723656
Based on your earlier question, "How can I optimize this Powershell script, converting JSON to CSV?", you should try to use the PowerShell pipeline for this, especially as it concerns large input and output files.
The point is that you shouldn't focus on single parts of the solution to determine the performance, because this usually leaves the wrong impression: the performance of a complete (PowerShell) pipeline solution is supposed to be better than the sum of its parts. Besides, it saves a lot of memory and results in lean PowerShell syntax...
In your specific case, if set up correctly, the CPU will be replacing the slashes, rebuilding the JSON strings and converting them to objects while the hard disk is busy reading and writing the data...
To implement the replacement of the slashes into the PowerShell pipeline together with the ConvertFrom-JsonLines cmdlet:
Get-Content .\file.jsonl | ForEach-Object { $_.replace('\', '\\') } |
ConvertFrom-JsonLines | ForEach-Object { $_.events.items } |
Export-Csv -Path $Exportfile -NoTypeInformation -Encoding UTF8

PowerShell to prettify multiple JSON files in one folder

My goal is to prettify all the JSON files (about 70K of them) in one folder.
Say I have one folder named "juicero" and in there there are about 70K .JSON files with different names -
a.json, b.json
This is what I tried -
PS> $files = Get-ChildItem C:\users\gamer\desktop\juicero\*.json
PS> $json_test = (Get-Content $files -raw | ConvertFrom-Json)
PS> foreach ($file in $files) { ConvertTo-Json | Set-Content $files }
I thought it would iterate through the path and prettify these (pretty straight-forward logic) but for some reason, this is deleting the content of the files. If I don't iterate and just use this function on one .json file it works - so I'm guessing there is something wrong with the iteration logic?
You need to work on the files inside the loop, like this:
foreach ($file in $files) {
$content = Get-Content $file.FullName -Raw | ConvertFrom-Json
$newFilePath = $file.FullName.Replace("OldFolder","NewFolder")
ConvertTo-Json -InputObject $content | Set-Content $newFilePath
}
Notice that I'm putting the output files into a new folder, for easier debugging in case of any issues.
There's one more issue with your code. Here you're converting all the files at once:
$json_test = (Get-Content $files -raw | ConvertFrom-Json)
However, later on, PowerShell has no information about source file name (file name is not included in $json_test).
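If the files should be rewritten in place rather than copied to a new folder, a per-file loop along these lines might do. Treat it as a sketch under assumptions: the sample folder and file are created only so the snippet is self-contained, and `-Depth 20` is an arbitrary value picked here because ConvertTo-Json serializes only 2 levels deep by default.

```powershell
# Hypothetical folder; a tiny nested sample file is created so the sketch runs on its own.
$folder = Join-Path ([IO.Path]::GetTempPath()) 'juicero-sample'
New-Item -ItemType Directory -Path $folder -Force | Out-Null
Set-Content -Path (Join-Path $folder 'a.json') -Value '{"a":{"b":{"c":{"d":1}}}}'

foreach ($file in Get-ChildItem -Path $folder -Filter *.json) {
    # Read and parse one file at a time, so the file name stays attached to its content.
    $object = Get-Content -Path $file.FullName -Raw | ConvertFrom-Json
    # -Depth 20 is an assumption for this sketch; the default depth of 2 would flatten deeper nesting.
    ConvertTo-Json -InputObject $object -Depth 20 | Set-Content -Path $file.FullName
}
```

Try it on a copy of the folder first, since a file that fails to parse would otherwise be overwritten with empty output.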

DOS to UNIX conversion

I have this code to remove carriage return (^M) characters so the files can be consumed on Unix. The code below works, but I can't figure out how to:
loop through a number of CSVs (5), effectively using the for loop
replace the existing files with the new files
$csv = (Get-Content -Raw *.csv) -replace "`r`n","`n"
[io.file]::WriteAllText('C:\Powershell\test.csv', $csv)
The code you posted will take all CSV files and concatenate them into a single output file. You need to enumerate and process the files individually. There's also no need to collect the content in a variable. Just pipe the modified content into Set-Content.
Get-ChildItem 'C:\some\folder' -Filter *.csv | ForEach-Object {
# -NoNewline (PowerShell 5.0+) stops Set-Content from appending a platform line ending (CRLF on Windows)
(Get-Content -Raw $_.FullName) -replace "`r`n", "`n" | Set-Content $_.FullName -NoNewline
}

Find and Replace many Items with Powershell from Data within a CSV, XLS or two txt documents

So I recently found the need to do a find and replace of multiple items within an XML document. Currently I have found the code below, which allows me to do multiple find and replaces, but these are hard-coded within the PowerShell.
(get-content c:\temp\report2.xml) | foreach-object {$_ -replace "192.168.1.1", "Server1"} | foreach-object {$_ -replace "192.168.1.20", "RandomServername"} | set-content c:\temp\report3.xml
Ideally, instead of hard-coding the values, I would like to find and replace from a list, ideally in a CSV or an XLSX. Maybe two txt files would be easier.
If it was from a CSV it could grab the value to find from A1 and the value to replace it with from B1 and keep looping down until the values are empty.
I understand I would have to use Get-Content and the foreach command. I was just wondering if this is possible and how to go about it, if anybody could help me.
Thanks in advance.
SG
#next line is to clear output file
$null > c:\temp\report3.xml
$replacers = Import-Csv c:\temp\replaceSource.csv
gc c:\temp\aip.xml | ForEach-Object {
$output = $_
foreach ($r in $replacers) {
$output = $output -replace $r.ReplaceWhat, $r.ReplaceTo
}
#the output has to be appended, not to rewrite everything
return $output | Out-File c:\temp\report3.xml -Append
}
Content of replaceSource.csv looks like:
ReplaceWhat,ReplaceTo
192.168.1.1,server1
192.168.1.20,SERVER2
Note the headers
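One thing to watch with the loop above: -replace treats the search text as a regular expression, so the dots in an IP address match any character. If literal matching is what you want, something like the following sketch may help. The `$replacers` row and the `$line` text are made-up sample data standing in for the imported CSV and one line of the XML file.

```powershell
# Hypothetical replacement row, shaped like the objects Import-Csv would return.
$replacers = @(
    [pscustomobject]@{ ReplaceWhat = '192.168.1.1'; ReplaceTo = 'server1' }
)
# Hypothetical input line: the first token only looks like the IP if "." matches any character.
$line = 'host 192x168y1z1 and host 192.168.1.1'

foreach ($r in $replacers) {
    # [regex]::Escape turns each dot into a literal dot, so only the real address matches.
    $line = $line -replace [regex]::Escape($r.ReplaceWhat), $r.ReplaceTo
}
$line
```

Escaping does not stop 192.168.1.1 from matching the start of a longer address such as 192.168.1.10, so the order of the rows in the CSV still matters.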

Import-Csv TrimEnd Column Header

I need to import a CSV and run it through a foreach loop. I want to trim the end of the column header DeviceName to avoid any potential issues. I have tried the following but it is not working as expected.
$Import = Import-CSV $csv
foreach ($i in ($import.DeviceName).TrimEnd())
{do something}
Any help? Thank you!
If you need to change both the header and the content of the DeviceName column, which has spaces, I have come up with this forgiving code.
$csvData = import-csv $csv
$properties = $csvData[0].psobject.Properties.name
$csvHeader = "`"$(($properties | ForEach-Object{$_.Trim()}) -join '","')`""
$deviceHeader = $properties -match "DeviceName"
$csvHeader
$csvHeader | Set-Content $file
$csvData | ForEach-Object{
$_.$deviceHeader = ($_.$deviceHeader).trim()
$_
} | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Add-Content $file
What this does is read in the CSV like normal and parse the property names of the object in the order they appear. We find the one that contains DeviceName no matter how many spaces it has (if there is more than one you could have a problem). We keep that so we can use it to call the correct property of each "row".
Export the new cleaned header to the file. Then we go through each "row", removing all the leading and trailing space from the DeviceName. Once that is done, write the CSV back to the file.
The best solution would be to tell the other team to fix their generation procedure. However, if for some reason that's not an option, I'd recommend pre-processing the file before you import it as a CSV.
$filename = 'C:\path\to\your.csv'
(Get-Content $filename -Raw) -replace '^(.*DeviceName)[ ]*(.*)', '$1$2' |
Set-Content $filename
Reading the file as a single string (-Raw) and anchoring the expression at the beginning of the string (^) ensures that only the column title is replaced.
For large input files you may want to consider a different approach, though, since the above reads the entire file into memory before replacing the first line.
$infile = 'C:\path\to\input.csv'
$outfile = 'C:\path\to\output.csv'
$firstLine = $true
Get-Content $infile | % {
if ($firstLine) {
$_ -replace '(DeviceName)[ ]*', '$1'
$firstLine = $false
} else {
$_
}
} | Set-Content $outfile
Thinking about it some more and taking inspiration from a comment to @Zeek's answer, you could also extract the headers first and then convert the rest of the file.
$infile = 'C:\path\to\input.csv'
$outfile = 'C:\path\to\output.csv'
$header = (Get-Content $infile -First 1) -split '\s*,\s*'
Get-Content $infile |
select -Skip 1 |
ConvertFrom-Csv -Header $header |
Export-Csv $outfile -NoType
Is this all you're trying to do? This will give you a collection of objects imported from your csv file but trim the end of the DeviceName property on each object.
$items = Import-CSV -Path $csv
$items.ForEach({ $_.DeviceName = $_.DeviceName.TrimEnd() })