How to Remove Duplicate Lines: Step-by-Step Guide

Duplicate lines in text files, code, and data exports create noise that slows down analysis and causes errors in downstream processing. This guide covers several ways to remove duplicate lines, from a free online tool to command-line utilities, scripts, and editor commands.

Remove Duplicate Lines Online

The fastest method for one-off deduplication is Format Pilot’s text utilities. Paste your content, click Remove Duplicates, and copy the result. No setup is required, and it works for any plain-text content, including keyword lists, email lists, log extracts, and code imports.

Remove Duplicate Lines Using Linux/Mac Command Line

# Sort first, then remove the now-adjacent duplicates (output is sorted; original order is lost)
sort file.txt | uniq > clean.txt

# Remove duplicates while preserving order (awk method)
awk '!seen[$0]++' file.txt > clean.txt

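The sort | uniq combination sorts the file first because uniq only removes duplicate lines that are adjacent to each other; sort -u file.txt produces the same result in a single command. The awk '!seen[$0]++' pattern keeps the first occurrence of each line and preserves the original line order, which is often more useful for real-world data.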
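To make the difference between the two approaches concrete, here is a minimal Python sketch of the same two behaviors. The sample list is made up for illustration, and Python's default sort order may differ from the locale-aware ordering that sort uses.

```python
# Made-up sample lines, standing in for the contents of file.txt
lines = ["pear", "apple", "pear", "banana", "apple"]

# Similar in spirit to `sort file.txt | uniq`:
# duplicates are removed, but the output is sorted and the original order is lost.
sorted_unique = sorted(set(lines))

# Similar in spirit to awk '!seen[$0]++':
# the first occurrence of each line is kept, in its original position.
ordered_unique = list(dict.fromkeys(lines))

print(sorted_unique)   # ['apple', 'banana', 'pear']
print(ordered_unique)  # ['pear', 'apple', 'banana']
```

The order-preserving version relies on Python dictionaries keeping insertion order, which is guaranteed from Python 3.7 onward.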

Remove Duplicate Lines in Python

seen = set()

with open('input.txt') as src, open('output.txt', 'w') as dst:
    for line in src:
        # Strip the newline so a final line without one still matches earlier copies
        text = line.rstrip('\n')
        if text not in seen:
            seen.add(text)
            dst.write(text + '\n')

Remove Duplicate Lines in VS Code

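Install the Sort Lines extension, select the lines you want to clean, then run the Sort lines (unique) command from the Command Palette. This sorts the selected lines and removes duplicates in one step. For deduplication without sorting, recent versions of VS Code include a built-in Delete Duplicate Lines command in the Command Palette. A Find and Replace with a regex pattern can also collapse repeated lines, though typically only when the repeats are consecutive. For simpler workflows, use Format Pilot’s online tool.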
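The regex approach is easier to judge with a concrete pattern. The sketch below shows one possible pattern in Python; it is an illustrative assumption, not a pattern taken from VS Code's documentation, and it only collapses duplicates that sit on consecutive lines. Python writes the replacement as \1, where many editors use $1 instead.

```python
import re

# Illustrative pattern (an assumption, not an official VS Code recipe):
# a whole line, followed by one or more exact copies of itself on the next lines.
ADJACENT_DUPES = re.compile(r"^(.*)(?:\n\1)+$", re.MULTILINE)

def collapse_adjacent(text: str) -> str:
    """Replace each run of identical consecutive lines with a single copy."""
    return ADJACENT_DUPES.sub(r"\1", text)

sample = "alpha\nalpha\nbeta\nalpha\n"
result = collapse_adjacent(sample)
print(result)
```

In this sample the two leading alpha lines collapse into one, but the final alpha survives because beta separates it from the others. That is the main limitation of the regex route compared with a set-based script.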

Frequently Asked Questions

How do I remove duplicate lines in Excel?

Select your data range, go to Data → Remove Duplicates, choose which columns to check, and click OK. For a single column of text, select only that column before running Remove Duplicates. Excel keeps the first occurrence of each unique value and removes subsequent duplicates.
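For data that has been exported to CSV, the same keep-the-first-occurrence logic can be sketched in Python. This is a rough illustration, not a reproduction of Excel's behavior: the sample data, the drop_duplicate_rows helper, and the choice of the email column are all hypothetical, and values are compared exactly, which may not match how Excel compares cells.

```python
import csv
import io

# Hypothetical sample data standing in for an exported spreadsheet
raw = """email,name
a@example.com,Ann
b@example.com,Bob
a@example.com,Ann B.
"""

def drop_duplicate_rows(rows, key_columns):
    """Yield only rows whose values in key_columns have not been seen before."""
    seen = set()
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            yield row

reader = csv.DictReader(io.StringIO(raw))
# Check only the email column, like ticking a single column in the dialog
kept = list(drop_duplicate_rows(reader, ["email"]))
names = [row["name"] for row in kept]
print(names)  # ['Ann', 'Bob']
```

Passing more column names to key_columns makes a row count as a duplicate only when all of those columns match.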