PowerShell v3: Find Duplicate Lines in a File

On occasion I need to track down duplicate entries in a file. Without going through a bunch of mechanics, I found this approach useful and, most importantly, easy. First, we create some dummy data and store it in a temp file:

# Create temp file with dummy data including duplicate lines
1,2,3,4,1,2,3,1,2,1 |
Out-File -FilePath ($tempfile = [IO.Path]::GetTempFileName()) -Encoding ASCII -Append

Next, we get the data into an array. Conveniently, Get-Content does this for you without any extra work: for a file with more than one line, it returns an array with one element per line:

# Get file contents into an array
$filecontents = Get-Content -Path $tempfile
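One caveat worth sketching: when a file holds only a single line, Get-Content returns a plain string rather than an array. Wrapping the call in the array subexpression operator @() gives an array in both cases. The file and the variable names $single, $plain and $wrapped below are hypothetical and exist only for this illustration.

```powershell
# Hypothetical one-line file, used only to illustrate the single-line case
$single = [IO.Path]::GetTempFileName()
'only line' | Out-File -FilePath $single -Encoding ASCII

# Without @(), a one-line file comes back as a single string
$plain = Get-Content -Path $single

# With @(), the result is an array regardless of the line count
$wrapped = @(Get-Content -Path $single)

$plain.GetType().Name     # String
$wrapped.GetType().Name   # Object[]

# Clean up the illustration file
Remove-Item -Path $single
```

For the ten-line dummy file in this post the wrapper makes no difference, so it is only a safeguard for files that might be very short.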

Once we have an array, which we can verify with this command:

$filecontents.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

we can use the Group-Object cmdlet (alias group) together with the Where-Object cmdlet (alias where) to find groups with more than 1 entry. In essence, each group is a set of identical lines (array entries), and any group holding more than 1 entry marks a duplicate:

# Find duplicates
$filecontents |
Group |
Where {$_.count -gt 1}

Running this produces the following results:

Count Name                      Group
----- ----                      -----
    4 1                         {1, 1, 1, 1}
    3 2                         {2, 2, 2}
    2 3                         {3, 3}
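If only the duplicated values are of interest, the same pipeline can be extended to pull out the Name property of each group. This is a minimal sketch rather than part of the original steps; it uses an in-memory array of strings standing in for the file contents, and the variable name $duplicates is an assumption made for the example.

```powershell
# Same data as the temp file, held in memory as strings
$filecontents = '1','2','3','4','1','2','3','1','2','1'

# Keep only groups with more than one member, then return just their names
$duplicates = $filecontents |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name

$duplicates
```

With this data, $duplicates holds the values 1, 2 and 3; the value 4 appears only once and is filtered out.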

To finalize this sample, remove the temp file:

# Clean up
Remove-Item -Path $tempfile

While such a simple example may seem artificial, I am working on a way to reference the actual lines where duplicates appear. This may "break" the simple Group cmdlet usage shown above, but if you are in a hurry, these steps can get you an answer with minimal effort.
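One possible way to get at the line numbers, offered as a sketch and not as the finished approach mentioned above, relies on the ReadCount note property that Get-Content attaches to each line it emits. With the default read count of 1, ReadCount is the 1-based line number, and it stays attached to the items inside each group. The variable $report and the property names Line and LineNumbers are assumptions made for this example.

```powershell
# Recreate the dummy data file
$tempfile = [IO.Path]::GetTempFileName()
1,2,3,4,1,2,3,1,2,1 | Out-File -FilePath $tempfile -Encoding ASCII

# Group the lines, keep the duplicates, and list the line numbers of each
$report = Get-Content -Path $tempfile |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        [pscustomobject]@{
            Line        = $_.Name
            Count       = $_.Count
            # ReadCount is added by Get-Content to every line it outputs
            LineNumbers = ($_.Group | ForEach-Object { $_.ReadCount }) -join ', '
        }
    }

$report

# Clean up
Remove-Item -Path $tempfile
```

For the dummy data, the entry for the value 1 should list lines 1, 5, 8, 10, matching the positions of 1 in the original sequence.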

Computer Blog

Thursday, 10 November 2016

About lamsim