Thursday, 10 November 2016

PowerShell v3: Find Duplicate Lines in a File

On occasion I need to track down duplicate entries in a file. Without going through a bunch of mechanics, I found this approach useful, and, most importantly, easy. First, we will create a dummy array and store the contents in a temp file:
# Create temp file with dummy data including duplicate lines
1,2,3,4,1,2,3,1,2,1 |
Out-File -FilePath ($tempfile = [IO.Path]::GetTempFileName()) -Encoding ASCII -Append
Next, we get the data into an array. Conveniently, Get-Content does this without any extra work, returning one array element per line of the file:
# Get file contents into an array
$filecontents = Get-Content -Path $tempfile
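One caveat is worth a quick sketch: Get-Content returns an array only when the file has more than one line. For a single-line file it returns a plain string. Wrapping the call in the array subexpression operator @() is a common way to get an array in both cases. The file and the variable names $oneLineFile, $single and $lines below are illustrative only and are not part of the example in this post.

```powershell
# Hypothetical one-line file, created only to illustrate the caveat
$oneLineFile = [IO.Path]::GetTempFileName()
Set-Content -Path $oneLineFile -Value 'only line' -Encoding ASCII

# Without @(), a single-line file yields a String rather than an array
$single = Get-Content -Path $oneLineFile
$single.GetType().Name

# With @(), the result is an array even when there is only one line
$lines = @(Get-Content -Path $oneLineFile)
$lines.GetType().Name
$lines.Count

# Clean up the illustrative file
Remove-Item -Path $oneLineFile
```

For the ten-line dummy file used here the distinction does not matter, but it can if the same steps are pointed at an arbitrary file.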
Once we have an array, which is verifiable by using this command:
$filecontents.GetType()

IsPublic IsSerial Name                                     BaseType                                                                     
-------- -------- ----                                     --------                                                                      
True     True     Object[]                                 System.Array
we can pipe it to Group-Object (alias group) and then to Where-Object (alias where) to keep only the groups that contain more than one entry. Each group that survives the filter corresponds to a line (array entry) that appears more than once in the file:
# Find duplicates
$filecontents |
Group |
Where {$_.count -gt 1}
Running this produces the following results:
Count Name                      Group                                                                                                   
----- ----                      -----                                                                                                   
    4 1                         {1, 1, 1, 1}                                                                                            
    3 2                         {2, 2, 2}                                                                                               
    2 3                         {3, 3} 
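If only the duplicated values and their counts are of interest, the Group column can be dropped. The following is a minimal sketch of that variation. It spells out the full cmdlet names instead of the aliases and, as an assumption made for brevity, uses the dummy data in memory rather than the temp file. The variable name $duplicates is illustrative.

```powershell
# Same dummy data as above, held in memory for brevity
$filecontents = 1,2,3,4,1,2,3,1,2,1

# Keep only groups with more than one member, most frequent first,
# and show just the value and how often it occurs
$duplicates = $filecontents |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    Sort-Object -Property Count -Descending |
    Select-Object -Property Name, Count

$duplicates
```

Group-Object also has a -NoElement switch that omits the Group member at the source, which may be preferable for large files.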
To finalize this sample, remove the temp file:
# Clean up
Remove-Item -Path $tempfile
Such a simple example may seem artificial. I am also working on a way to reference the actual lines where duplicates appear, which may "break" the simple Group cmdlet usage shown above. Still, if you are in a hurry, these steps get the job done with minimal effort.
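As a rough illustration of what reporting line positions could look like, here is one possible sketch. It is not the approach referred to above, only an assumption about how it might be done: each line is paired with its 1-based line number before grouping, so the grouping itself stays simple. The names $lines, $numbered, $report, LineNumber and Text are all hypothetical.

```powershell
# Recreate the dummy file so the sketch is self-contained
$tempfile = [IO.Path]::GetTempFileName()
1,2,3,4,1,2,3,1,2,1 | Out-File -FilePath $tempfile -Encoding ASCII

# Read the file; @() guarantees an array even for a one-line file
$lines = @(Get-Content -Path $tempfile)

# Pair every line with its 1-based line number
$numbered = for ($i = 0; $i -lt $lines.Count; $i++) {
    [pscustomobject]@{ LineNumber = $i + 1; Text = $lines[$i] }
}

# Group on the text, keep duplicates, and list where each one occurs
$report = $numbered |
    Group-Object -Property Text |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        [pscustomobject]@{
            Text        = $_.Name
            LineNumbers = ($_.Group | ForEach-Object { $_.LineNumber }) -join ', '
        }
    }

$report

# Clean up
Remove-Item -Path $tempfile
```

With the dummy data, the value 1 should be reported on lines 1, 5, 8 and 10, the value 2 on lines 2, 6 and 9, and the value 3 on lines 3 and 7.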

lamsim
