Text Extractor is a simple, versatile GUI application that lets users extract text from multiple files and consolidate it into a single output file.

VoxDroid/Text-Extractor

Text Extractor


Welcome to Text Extractor, a Python desktop application built with PyQt6 that extracts and consolidates text from multiple files into a single output file. It is aimed at students, researchers, developers, and anyone who needs to merge text content from various sources, and it includes duplicate removal, customizable output formatting, and a themeable interface.

Features

  • File Selection and Management: Easily select, add, sort, and clear multiple files for text extraction.
  • Duplicate Removal: Use hash-based detection to eliminate duplicate files, ensuring a clean output.
  • Customizable Output: Add line numbers, timestamps, and file names to the consolidated output file.
  • Encoding Support: Choose from multiple encoding options (UTF-8, ASCII, Latin-1, UTF-16) for file reading and writing.
  • File Preview: Preview file contents in the app before processing, with customizable preview length.
  • Search Functionality: Search within files to filter the file list based on content.
  • File Information: View detailed file metadata, including size and last modified date.
  • Themeable Interface: Switch between dark and light themes for a personalized user experience.
  • Progress Monitoring: Track extraction progress with a progress bar and status updates.
  • Automatic Updates: Check for new versions on startup with optional notifications.
  • Cross-Platform UI: Built with PyQt6 for a modern, intuitive interface compatible with Windows, macOS, and Linux.
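
The hash-based duplicate detection listed above can be illustrated with a short sketch. This is not taken from the application's source: the README does not say which hash function is used, so SHA-256 is an assumption, and the function names `file_digest` and `drop_duplicates` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()  # assumption: the app's actual hash is not documented
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def drop_duplicates(paths: list[Path]) -> list[Path]:
    """Keep the first file seen for each distinct content hash, preserving order."""
    seen: set[str] = set()
    unique: list[Path] = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

Hashing file contents rather than comparing names means two files with different names but identical bytes are treated as duplicates, which matches the goal of a clean consolidated output.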

Supported File Formats

Text Extractor supports the following readable file formats:

  • Text Files: .txt
  • Java Files: .java
  • Python Files: .py
  • Markdown Files: .md
  • HTML Files: .html
  • CSV Files: .csv
  • XML Files: .xml
  • JSON Files: .json
  • Rich Text Format: .rtf
  • Microsoft Word Documents: .docx
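
As a rough illustration, a file list could be filtered against the extensions above like this. The names `SUPPORTED_EXTENSIONS`, `is_supported`, and `filter_supported` are hypothetical and not taken from the application's code.

```python
from pathlib import Path

# Extensions copied from the list above; the constant name is hypothetical.
SUPPORTED_EXTENSIONS = {
    ".txt", ".java", ".py", ".md", ".html",
    ".csv", ".xml", ".json", ".rtf", ".docx",
}


def is_supported(path) -> bool:
    """Return True if the file extension (case-insensitive) is in the list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def filter_supported(paths):
    """Return only the paths whose extension is supported, in original order."""
    return [p for p in paths if is_supported(p)]
```

Note that this only checks the extension. Formats such as .rtf and .docx are not plain text, so reading them needs format-specific handling beyond a simple text read.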

Installation

Text Extractor is packaged using Briefcase, making it easy to run or distribute as a native application across platforms. You can either build from source or use pre-compiled binaries where available.

Building with Briefcase (All Platforms)

  1. Ensure you have Python 3.9+ installed on your system (Windows, macOS, Linux).

  2. Clone this repository:

    git clone https://github.com/VoxDroid/Text-Extractor.git
    cd Text-Extractor
  3. Install Briefcase and dependencies:

    pip install briefcase
    pip install -r requirements.txt
  4. Initialize the Briefcase project (if not already set up):

    briefcase create
  5. Build the application:

    • Windows: briefcase build windows
    • macOS: briefcase build macos
    • Linux: briefcase build linux
  6. Run the application:

    • Windows: briefcase run windows
    • macOS: briefcase run macos
    • Linux: briefcase run linux

Pre-Compiled Binaries

  • Windows: Download the latest .exe (portable) or .msi (installer) tagged with [W] for Windows, from the Releases section. Run the MSI installer, or use the portable version, which needs no setup.
  • macOS: Download the latest universal .dmg (x86_64 and Apple Silicon) tagged with [M] for macOS, from the Releases section. Open the DMG, drag the app to Applications, and launch it.
  • Linux: Download the latest .rpm (for Fedora/Red Hat), .deb (for Debian/Ubuntu), or .pkg.tar.zst (for Arch/Pacman) tagged with [L] for Linux, from the Releases section. Run the installer and launch the app.

Usage

Upon launching, you’ll see the main interface featuring three tabs: Extractor, Settings, and Help. The in-app Help tab contains a comprehensive user manual.

Getting Started

  • Launch the application and explore the Extractor tab to begin selecting files.
  • Configure settings such as encoding, duplicate removal, and output formatting in the Settings tab.
  • Refer to the Help tab for detailed instructions and examples.

Extractor Tab

  • Purpose: Select and process files for text extraction.
  • How to Use:
    1. Click "Select Files" to choose files or "Add Files" to append more files.
    2. Use "Sort" to organize the file list or "Clear" to reset it.
    3. Click a file to preview its contents (up to the specified preview length).
    4. Use "Search in Files" to filter files by content.
    5. Click "File Info" to view metadata for all selected files.
    6. Specify an output filename and click "Save As" to consolidate the text into a single file.
    7. Monitor progress with the progress bar and cancel if needed.
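
The consolidation step can be pictured with a minimal sketch, assuming plain-text inputs. The function name `consolidate`, its parameters, and the header format (`=== name ===`) are illustrative assumptions, not the application's actual output format.

```python
from datetime import datetime
from pathlib import Path


def consolidate(paths, output_path, *, encoding="utf-8",
                line_numbers=False, timestamp=False, file_names=True):
    """Write the text of each input file into one output file.

    The layout below (header lines, "N: " prefixes) is a hypothetical
    format chosen for illustration.
    """
    with open(output_path, "w", encoding=encoding) as out:
        if timestamp:
            stamp = datetime.now().isoformat(timespec="seconds")
            out.write(f"Extracted: {stamp}\n\n")
        for path in paths:
            path = Path(path)
            if file_names:
                out.write(f"=== {path.name} ===\n")
            text = path.read_text(encoding=encoding)
            for number, line in enumerate(text.splitlines(), start=1):
                prefix = f"{number}: " if line_numbers else ""
                out.write(f"{prefix}{line}\n")
            out.write("\n")  # blank line between files
```

Called as `consolidate(["a.txt", "b.txt"], "merged.txt", line_numbers=True)`, this writes each file's lines in order under a header naming the file. Line numbers restart for each file in this sketch; the application may number them differently.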

Settings Tab

  • Purpose: Customize the application’s behavior.
  • How to Use:
    1. Enable or disable options like duplicate removal, line numbers, and timestamps.
    2. Select the desired encoding (e.g., UTF-8, ASCII).
    3. Set the preview length for file previews.
    4. Switch between dark and light themes.
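
To show why the encoding setting matters, here is a small sketch of reading a file with a chosen encoding. The helper name `read_with_encoding` and the fallback to replacement characters are assumptions for illustration; the README does not describe how the application handles decoding errors.

```python
from pathlib import Path

# Encodings named in the feature list, written as Python codec names.
ENCODINGS = ("utf-8", "ascii", "latin-1", "utf-16")


def read_with_encoding(path, encoding="utf-8"):
    """Read a text file with the selected encoding.

    If the bytes are not valid in that encoding, fall back to substituting
    U+FFFD for undecodable bytes (a hypothetical policy for this sketch).
    """
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    try:
        return Path(path).read_text(encoding=encoding)
    except UnicodeDecodeError:
        return Path(path).read_text(encoding=encoding, errors="replace")
```

A file saved as Latin-1 that contains "café" reads correctly with `latin-1`, but the byte for "é" is not valid ASCII, so reading it with `ascii` yields a replacement character in that position.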

Help Tab

  • Purpose: Access the embedded user manual.
  • How to Use: Navigate to the Help tab to read detailed guides, view example output, and find support information.

Screenshots

Here are previews of the main tabs in Text Extractor:

  • Extractor Tab (screenshot)
  • Settings Tab (screenshot)
  • Help Tab (screenshot)

Releases

  • Windows: Pre-compiled .exe available in the Releases section.
  • macOS: Pre-compiled universal .dmg (x86_64 and Apple Silicon) available in the Releases section.
  • Linux: Pre-compiled .rpm (for Fedora/Red Hat), .deb (for Debian/Ubuntu), or .pkg.tar.zst (for Arch/Pacman) available in the Releases section.
  • Check release notes for details on new features, bug fixes, and version updates.
  • Building from source with Briefcase remains the primary installation method and works on all platforms with proper setup.

Support

For ways to get help, report issues, or support the project’s development, please see the Support page.

Contributing

Text Extractor is open-source, and contributions are encouraged! Please read our Contributing Guidelines, Code of Conduct, and Security Policy before submitting issues or pull requests. Use the appropriate issue templates for reporting bugs, suggesting features, or other contributions, and the Pull Request template for code submissions.

Security

If you discover a security vulnerability, please follow our Security Policy by emailing izeno.contact@gmail.com or using the Security Report issue template for non-sensitive issues.

License

This project is licensed under the MIT License. Use, modify, and distribute it freely per the license terms.

Dependencies

To build from source, install the following Python packages:

  • PyQt6 (for the GUI)
  • requests (for HTTP requests)
  • packaging (for version parsing)
  • qtawesome (for icons)
  • briefcase (for packaging the app)

Create a requirements.txt file with these dependencies and run pip install -r requirements.txt.
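
Before building, it can help to confirm the packages are importable. The sketch below is a convenience check, not part of the project; it assumes each package's import name matches the name listed above, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumption: import names match the package names in the list above.
REQUIRED_MODULES = ("PyQt6", "requests", "packaging", "qtawesome", "briefcase")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```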


Developed by VoxDroid
GitHub | Ko-fi

