Text Extractor

Text Extractor is a Python desktop application built with PyQt6 that extracts text from multiple files and consolidates it into a single output file. It is aimed at students, researchers, developers, and anyone who needs to merge text content from several sources. It offers duplicate removal, customizable output formatting, and a themeable interface.

Features

File Selection and Management: Easily select, add, sort, and clear multiple files for text extraction.
Duplicate Removal: Use hash-based detection to eliminate duplicate files, ensuring a clean output.
Customizable Output: Add line numbers, timestamps, and file names to the consolidated output file.
Encoding Support: Choose from multiple encoding options (UTF-8, ASCII, Latin-1, UTF-16) for file reading and writing.
File Preview: Preview file contents in the app before processing, with customizable preview length.
Search Functionality: Search within files to filter the file list based on content.
File Information: View detailed file metadata, including size and last modified date.
Themeable Interface: Switch between dark and light themes for a personalized user experience.
Progress Monitoring: Track extraction progress with a progress bar and status updates.
Automatic Updates: Check for new versions on startup with optional notifications.
Cross-Platform UI: Built with PyQt6 for a modern, intuitive interface compatible with Windows, macOS, and Linux.
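
As an illustration of the hash-based duplicate removal listed above, here is a minimal sketch. It is not taken from the application's source: the function names `file_digest` and `remove_duplicates` are hypothetical, and SHA-256 is an assumed choice of hash, since the README does not say which algorithm is used.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def remove_duplicates(paths: list[Path]) -> list[Path]:
    """Keep the first file seen for each distinct content hash."""
    seen: set[str] = set()
    unique: list[Path] = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

Comparing content hashes rather than file names means two files with different names but identical bytes count as duplicates, and the original selection order is preserved.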

Supported File Formats

Text Extractor supports the following readable file formats:

Text Files: .txt
Java Files: .java
Python Files: .py
Markdown Files: .md
HTML Files: .html
CSV Files: .csv
XML Files: .xml
JSON Files: .json
Rich Text Format: .rtf
Microsoft Word Documents: .docx
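
A file picker could restrict its input to these formats with a simple extension check. The sketch below is an assumption about how such a filter might look, not the application's actual code; `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names. Note that .rtf and .docx are not plain text, so reading them needs dedicated parsing that this sketch does not cover.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {
    ".txt", ".java", ".py", ".md", ".html",
    ".csv", ".xml", ".json", ".rtf", ".docx",
}


def is_supported(path: Path) -> bool:
    """Return True if the file extension is in the supported set (case-insensitive)."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS
```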

Installation

Text Extractor is packaged using Briefcase, making it easy to run or distribute as a native application across platforms. You can either build from source or use pre-compiled binaries where available.

Building with Briefcase (All Platforms)

Ensure you have Python 3.9+ installed on your system (Windows, macOS, Linux).

Clone this repository:

git clone https://github.com/VoxDroid/Text-Extractor.git
cd Text-Extractor

Install Briefcase and dependencies:

pip install briefcase
pip install -r requirements.txt

Initialize the Briefcase project (if not already set up):
```
briefcase create
```
Build the application:
- Windows: briefcase build windows
- macOS: briefcase build macOS
- Linux: briefcase build linux
Run the application:
- Windows: briefcase run windows
- macOS: briefcase run macOS
- Linux: briefcase run linux

Pre-Compiled Binaries

Windows: Download the latest .exe (portable) or .msi (installer), tagged [W], from the Releases section. Run the MSI installer, or use the portable version to run without installing.
macOS: Download the latest universal .dmg (x86_64 and Apple Silicon), tagged [M], from the Releases section. Open the DMG, drag the app to Applications, and launch it.
Linux: Download the latest .rpm (Fedora/Red Hat), .deb (Debian/Ubuntu), or .pkg.tar.zst (Arch/Pacman) package, tagged [L], from the Releases section. Install the package with your distribution's package manager and launch the app.

Usage

Upon launching, you’ll see the main interface featuring three tabs: Extractor, Settings, and Help. The in-app Help tab contains a comprehensive user manual.

Getting Started

Launch the application and explore the Extractor tab to begin selecting files.
Configure settings such as encoding, duplicate removal, and output formatting in the Settings tab.
Refer to the Help tab for detailed instructions and examples.

Extractor Tab

Purpose: Select and process files for text extraction.
How to Use:
1. Click "Select Files" to choose files or "Add Files" to append more files.
2. Use "Sort" to organize the file list or "Clear" to reset it.
3. Click a file to preview its contents (up to the specified preview length).
4. Use "Search in Files" to filter files by content.
5. Click "File Info" to view metadata for all selected files.
6. Specify an output filename and click "Save As" to consolidate the text into a single file.
7. Monitor progress with the progress bar and cancel if needed.
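
The consolidation in step 6 can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `consolidate`, its parameters, and the exact header and line-number layout are hypothetical, and the real application may format its output differently.

```python
from datetime import datetime
from pathlib import Path


def consolidate(paths, output_path, encoding="utf-8",
                line_numbers=False, timestamp=False, file_names=True):
    """Write the text of each input file into one output file."""
    with open(output_path, "w", encoding=encoding) as out:
        if timestamp:
            # One timestamp at the top of the output (assumed placement).
            stamp = datetime.now().isoformat(timespec="seconds")
            out.write(f"Extracted on {stamp}\n\n")
        for path in map(Path, paths):
            if file_names:
                out.write(f"===== {path.name} =====\n")
            text = path.read_text(encoding=encoding)
            for number, line in enumerate(text.splitlines(), start=1):
                prefix = f"{number}: " if line_numbers else ""
                out.write(f"{prefix}{line}\n")
            # Blank line between files.
            out.write("\n")
```

Line numbers restart at 1 for each input file in this sketch, which keeps them meaningful when checking a line against its source file.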

Settings Tab

Purpose: Customize the application’s behavior.
How to Use:
1. Enable or disable options like duplicate removal, line numbers, and timestamps.
2. Select the desired encoding (e.g., UTF-8, ASCII).
3. Set the preview length for file previews.
4. Switch between dark and light themes.
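
To show how the encoding and preview-length settings might interact, here is a minimal sketch. The name `read_preview`, the default length of 1000 characters, and the choice to replace undecodable bytes rather than raise an error are all assumptions, not documented behavior of the application.

```python
from pathlib import Path

# Encodings named in the Features list, as Python codec names.
ENCODINGS = ("utf-8", "ascii", "latin-1", "utf-16")


def read_preview(path, encoding="utf-8", preview_length=1000):
    """Return up to preview_length characters decoded with the chosen encoding."""
    if encoding not in ENCODINGS:
        raise ValueError(f"Unsupported encoding: {encoding}")
    # errors="replace" substitutes a placeholder for bytes that cannot be decoded.
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read(preview_length)
```

Reading only the first `preview_length` characters keeps the preview responsive even for large files.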

Help Tab

Purpose: Access the embedded user manual.
How to Use: Navigate to the Help tab to read detailed guides, view example output, and find support information.

Screenshots

Here are previews of the main tabs in Text Extractor:

[Screenshots: Extractor Tab, Settings Tab, Help Tab]

Releases

Windows: Pre-compiled .exe (portable) and .msi (installer) available in the Releases section.
macOS: Pre-compiled universal .dmg (x86_64 and Apple Silicon) available in the Releases section.
Linux: Pre-compiled .rpm (for Fedora/Red Hat), .deb (for Debian/Ubuntu), or .pkg.tar.zst (for Arch/Pacman) available in the Releases section.
Check release notes for details on new features, bug fixes, and version updates.
Building from source with Briefcase remains the primary installation method and works on all supported platforms, provided Python and the dependencies are set up.

Support

For ways to get help, report issues, or support the project’s development, please see the Support page.

Contributing

Text Extractor is open-source, and contributions are encouraged! Please read our Contributing Guidelines, Code of Conduct, and Security Policy before submitting issues or pull requests. Use the appropriate issue templates for reporting bugs, suggesting features, or other contributions, and the Pull Request template for code submissions.

Security

If you discover a security vulnerability, please follow our Security Policy by emailing izeno.contact@gmail.com or using the Security Report issue template for non-sensitive issues.

License

This project is licensed under the MIT License. Use, modify, and distribute it freely per the license terms.

Dependencies

To build from source, install the following Python packages:

PyQt6 (for the GUI)
requests (for HTTP requests)
packaging (for version parsing)
qtawesome (for icons)
briefcase (for packaging the app)

Create a requirements.txt file with these dependencies and run pip install -r requirements.txt.
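
A minimal requirements.txt matching the list above might look like this. Versions are left unpinned here as an assumption; the repository's own file may pin specific releases.

```
PyQt6
requests
packaging
qtawesome
briefcase
```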

Developed by VoxDroid
GitHub | Ko-fi
