Quickstart Guide

This guide will help you get started using the Dup File Finder library in your own Python projects.

Installation

First, install the library and its dependencies:

pip install dup-file-finder

Basic Usage

You can use Dup File Finder as a command-line tool or import it as a library in your Python code.

Command-Line Usage

See the CLI documentation for details.

Library Usage

Import the DuplicateFileFinder class, scan a directory, and iterate over the duplicate groups it finds:

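from dup_file_finder import DuplicateFileFinder

finder = DuplicateFileFinder()
# scan_directory returns the number of files it scanned
files_scanned = finder.scan_directory("/path/to/your/directory")
print(f"Scanned {files_scanned} files.")

# find_duplicates maps a hash to each group of duplicate files
# (named file_hash here to avoid shadowing the built-in hash())
for file_hash, duplicates in finder.find_duplicates().items():
    print(f"Duplicate group: {file_hash}")
    for filepath in duplicates.file_paths:
        print(f"  {filepath}")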
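This guide does not describe how the library computes its hashes. As a rough illustration of the general idea behind grouping files by content hash, here is a minimal standard-library sketch. The helper names hash_file and group_duplicates and the choice of SHA-256 are assumptions made for this example, not part of the Dup File Finder API.

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root):
    """Map each content hash to the sorted paths that share it.

    Only groups with two or more files are returned.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[hash_file(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```

Files with identical bytes produce the same digest, so any hash that maps to more than one path marks a duplicate group.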

Example: Scanning Multiple Directories

import os

from dup_file_finder import DuplicateFileFinder

directories = ["/path/to/your/directory", "~/other/path/to/scan"]

finder = DuplicateFileFinder()
for d in directories:
    # Expand "~" explicitly; this guide does not say whether the library does so itself
    files_scanned = finder.scan_directory(os.path.expanduser(d))
    print(f"Scanned {files_scanned} files in {d}.")

# Duplicates are reported across all scanned directories
for file_hash, duplicates in finder.find_duplicates().items():
    print(f"Duplicate group: {file_hash}")
    for filepath in duplicates.file_paths:
        print(f"  {filepath}")

Filtering by File Extension

You can scan only specific file types by passing a list of extensions:

from dup_file_finder import DuplicateFileFinder

finder = DuplicateFileFinder()
files_scanned = finder.scan_directory("/path/to/dir", extensions=[".jpg", ".png"])
print(f"Scanned {files_scanned} image files.")
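This guide does not say whether the library matches extensions case-insensitively or requires the leading dot. The sketch below shows one plausible way such a filter can work, assuming dotted extensions as in the example above and case-insensitive matching. The helpers has_allowed_extension and filter_by_extension are hypothetical names, not library functions.

```python
from pathlib import Path


def has_allowed_extension(path, extensions):
    """Return True if the final suffix of path is in extensions, ignoring case."""
    allowed = {ext.lower() for ext in extensions}
    return Path(path).suffix.lower() in allowed


def filter_by_extension(paths, extensions):
    """Keep only the paths whose final suffix is in extensions."""
    return [p for p in paths if has_allowed_extension(p, extensions)]
```

Note that only the final suffix is compared, so a name such as archive.tar.gz is treated as a .gz file in this sketch.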

Ignoring Hidden Files

To skip hidden files and directories during scanning:

from dup_file_finder import DuplicateFileFinder

finder = DuplicateFileFinder(ignore_hidden=True)
files_scanned = finder.scan_directory("/path/to/dir")
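What counts as "hidden" is not defined in this guide. A common convention on Unix-like systems is a name that starts with a dot, and the following sketch assumes exactly that (it does not handle the Windows hidden attribute). The function names are illustrative only and are not part of the library.

```python
import os


def is_hidden(name):
    """Treat any name starting with a dot as hidden (Unix convention)."""
    return name.startswith(".")


def walk_visible_files(root):
    """Yield paths of non-hidden files under root, skipping hidden directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into hidden directories
        dirnames[:] = [d for d in dirnames if not is_hidden(d)]
        for name in filenames:
            if not is_hidden(name):
                yield os.path.join(dirpath, name)
```

Pruning dirnames in place matters here: it keeps the walk out of directories such as .git entirely instead of visiting them and discarding their files afterwards.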

Example: Deleting Duplicate Files

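import os

from dup_file_finder import DuplicateFileFinder

directories = ["/path/to/your/directory", "~/other/path/to/scan"]

finder = DuplicateFileFinder()
for d in directories:
    # Expand "~" explicitly; this guide does not say whether the library does so itself
    files_scanned = finder.scan_directory(os.path.expanduser(d))
    print(f"Scanned {files_scanned} files in {d}.")

for _, duplicates in finder.find_duplicates().items():
    # Deletes all but the first file listed in each group.
    # dry_run=False removes files for real, so review the groups first.
    duplicates.delete_duplicates(0, dry_run=False)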
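Because deletion cannot be undone, it helps to see what a "keep one, delete the rest" step with a dry-run switch amounts to. The sketch below is a standalone standard-library illustration of that pattern. The function delete_all_but is a hypothetical helper and does not claim to mirror how the library's delete_duplicates is implemented.

```python
import os


def delete_all_but(paths, keep_index=0, dry_run=True):
    """Delete every path except paths[keep_index] and return the targeted paths.

    With dry_run=True nothing is removed; the return value is only a preview.
    """
    targets = [p for i, p in enumerate(paths) if i != keep_index]
    if not dry_run:
        for p in targets:
            os.remove(p)
    return targets
```

Defaulting dry_run to True in this sketch means a caller has to opt in to real deletion, which makes it easy to print and review the preview list first.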

Next Steps

  • See the API Reference for more details.
  • Check the Home page for a project overview and feature list.