63 commits
3612c15
Rewrote build script and wrapper and tests.
JSv4 Jan 15, 2024
612a74c
Added great readme docs and quick start.
JSv4 Jan 15, 2024
730ce50
Update docs.
JSv4 Jan 15, 2024
91ad620
Removed duplicative binary.
JSv4 Jan 15, 2024
2bfad25
Update packaging.
JSv4 Jan 15, 2024
604a024
Update project config.
JSv4 Jan 15, 2024
5265dcd
Different tact to include data files.
JSv4 Jan 15, 2024
8244fce
Revise project conf.
JSv4 Jan 15, 2024
bda7a2b
Rename dist to data
JSv4 Jan 15, 2024
60438bc
Tweak packaging.
JSv4 Jan 15, 2024
70fbae1
Tweak packaging.
JSv4 Jan 15, 2024
e36bf4c
Still not getting zip file but am getting folder.
JSv4 Jan 15, 2024
b43e44f
Tweaking packaging again.
JSv4 Jan 15, 2024
76cb452
Change archive extension.
JSv4 Jan 15, 2024
846a5c6
Trying another packaging approach.
JSv4 Jan 15, 2024
841bd82
Trying another packaging approach.
JSv4 Jan 15, 2024
f8692a1
Update code due to packaging change.
JSv4 Jan 15, 2024
8b95bc2
Updated readme further with install instructions. Fixed packaging iss…
JSv4 Jan 15, 2024
f1c19c4
Fixed minor typo in README.
JSv4 Jan 15, 2024
2e51942
Updated readme for new repo name.
JSv4 Jan 15, 2024
b53bcd7
Fix typos
rishabh-sagar-20 Mar 21, 2024
660bae3
Merge pull request #1 from SpotDraft/fix-typos
rishabh-sagar-20 Mar 21, 2024
37b1d1c
Create python-publish.yml
JSv4 Mar 22, 2024
5f58999
Update quickstart.md
JSv4 Mar 22, 2024
7f01c86
Update README.md
JSv4 Mar 22, 2024
c98e153
Update __about__.py
JSv4 Mar 22, 2024
624990d
Add support for macOS in build and extraction process (#2)
rishabh-sagar-20 Mar 27, 2024
6e67f6e
Merge pull request #4 from SpotDraft/main
JSv4 Apr 1, 2024
b479de7
Bump version and add notes to README.
JSv4 Apr 1, 2024
e362a65
Merge pull request #5 from JSv4/JSv4/bump-version-and-add-note-to-docs
JSv4 Apr 1, 2024
83a7e25
Build custom build action yaml and improve packaging script.
JSv4 Apr 1, 2024
00d3ae6
Remove unwanted binary in VCS
JSv4 Apr 1, 2024
2025e84
Changed build hook invocation to not rely on hatch.
JSv4 Apr 1, 2024
2314e5f
Add log lines.
JSv4 Apr 1, 2024
af22191
Verified build hook working via github pip dist. Unpackage script has…
JSv4 Apr 1, 2024
3c833e4
Updated unzip function to not overwrite existing binaries unless a fl…
JSv4 Apr 1, 2024
995fb8d
Add support for macOS in build and extraction process
rishabh-sagar-20 Apr 2, 2024
342c214
Fix numbering in developer guide documentation
rishabh-sagar-20 Apr 2, 2024
4ee2e99
Merge pull request #7 from SpotDraft/fix-binary-extraction-flow
JSv4 Apr 2, 2024
bd8545b
Added a workflow to build the latest differs.
JSv4 Apr 2, 2024
841a1bb
Update name of GitHub action workflow.
JSv4 Apr 2, 2024
803c8af
Updated dist .gitignore.
JSv4 Apr 2, 2024
29c746d
Tweaked .gitignore and commit the latest builds.
JSv4 Apr 2, 2024
b47ee1b
Fix typo.
JSv4 Apr 2, 2024
f9087c9
Revised build yaml
JSv4 Apr 2, 2024
ff165aa
Went back to build hook to try to get builds to be cleaner and keep b…
JSv4 Apr 2, 2024
54dd769
Drop the build workflow.
JSv4 Apr 2, 2024
0cd7a75
Merge pull request #6 from JSv4/JSv4/work-on-build-and-dist-infra
JSv4 Apr 2, 2024
13d7061
Bump version tag
JSv4 Apr 2, 2024
19a7d5a
Drop binaries.
JSv4 Apr 2, 2024
01eb785
Drop binaries.
JSv4 Apr 2, 2024
f7133bb
Merge branch 'main' of https://github.com/JSv4/Python-Docx-Redlines
JSv4 Apr 2, 2024
35a0fef
Updated publish workflow.
JSv4 Apr 2, 2024
91c945f
Tweak pyproject include.
JSv4 Apr 2, 2024
a834070
Drop dependency on hatch from build hook (again... lost this in the P…
JSv4 Apr 2, 2024
cc042da
add arm support
ross-mcnairn-dev May 29, 2024
10fa4f9
add windows arm64, patch READMEs
ross-mcnairn-dev May 30, 2024
3987569
parametrise target path
ross-mcnairn-dev May 31, 2024
3998921
Merge pull request #9 from ross-mcnairn-dev/main
JSv4 Jun 4, 2024
67b97a3
Merge pull request #2 from JSv4/main
ross-mcnairn-dev Jun 4, 2024
30d2a45
Merge branch 'main' into parametrise_target
ross-mcnairn-dev Jun 4, 2024
254e742
Merge pull request #10 from ross-mcnairn-dev/parametrise_target
JSv4 Jun 24, 2024
86bc4d3
feat: Replace archived Open-XML-PowerTools with Clippit
cmartin303 Jan 19, 2026
29 changes: 29 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,29 @@
name: Upload Python Package

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install hatch hatchling
    - name: Build package
      run: hatch build
    - name: Publish package
      run: |
        hatch publish -u "__token__" -a ${{ secrets.PYPI_API_TOKEN }}
2 changes: 1 addition & 1 deletion .gitignore
@@ -14,7 +14,7 @@ csproj/obj/*
.Python
build/
develop-eggs/
dist/
src/python_redlines/data/
downloads/
eggs/
.eggs/
8 changes: 8 additions & 0 deletions .idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml


7 changes: 7 additions & 0 deletions .idea/misc.xml


8 changes: 8 additions & 0 deletions .idea/modules.xml


10 changes: 10 additions & 0 deletions .idea/python-redlines.iml


6 changes: 6 additions & 0 deletions .idea/vcs.xml


9 changes: 9 additions & 0 deletions LICENSE.md
@@ -0,0 +1,9 @@
MIT License

Copyright (c) 2024-present U.N. Owen <void@some.where>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
167 changes: 167 additions & 0 deletions README.md
@@ -0,0 +1,167 @@
# Python-Redlines: Docx Redlines (Tracked Changes) for the Python Ecosystem

## Project Goal - Democratizing DOCX Comparisons

The main goal of this project is to address the significant gap in the open-source ecosystem around `.docx` document
comparison tools. Currently, the process of comparing and generating redline documents (documents that highlight
changes between versions) is complex and largely dominated by commercial software. These
tools, while effective, often come with cost barriers and limitations in terms of accessibility and integration
flexibility.

`Python-redlines` aims to democratize the ability to run tracked change redlines for .docx, providing the
open-source community with a tool to create `.docx` redlines without the need for commercial software. This will let
more legal hackers and hobbyist innovators experiment and create tooling for enterprise and legal.

## Project Roadmap

### Step 1. Open-XML-PowerTools `WmlComparer` Wrapper

The [Open-XML-PowerTools](https://github.com/OpenXmlDev/Open-Xml-PowerTools) project historically offered a solid
foundation for working with `.docx` files and has an excellent (if imperfect) comparison engine in its `WmlComparer`
class. However, Microsoft archived the repository almost five years ago, and a forked repo is not being actively
maintained: its most recent commits date from two years ago and its issues list is disabled.

As a first step, our project aims to bring the existing capabilities of `WmlComparer` into the Python world. Thankfully,
Open-XML-PowerTools is fully cross-platform, as it is written in .NET and compiles with the still-maintained .NET 8. The
resulting binaries can be compiled for the latest versions of Windows, OSX and Linux (Ubuntu specifically, though other
distributions should work fine too). We have included an OSX build but do not have an OSX machine to test on. Please
report any issues by opening a new Issue.

The initial release has a single engine `XmlPowerToolsEngine`, which is just a Python wrapper for a simple C# utility
written to leverage WmlComparer for 1-to-1 redlines. We hope this provides a stop-gap capability to Python developers
seeking .docx redline capabilities.

**Note**: we don't plan to fork or maintain Open-XML-PowerTools. [Version 4.4.0](https://www.nuget.org/packages/Open-Xml-PowerTools/),
which appears to be compatible only with [Open XML SDK < 3.0.0](https://www.nuget.org/packages/DocumentFormat.OpenXml), works
for now, but it needs to be made compatible with the latest versions of the Open XML SDK to extend its life. **There are
also some [issues](https://github.com/dotnet/Open-XML-SDK/issues/1634)** that the only maintainer of
Open-XML-PowerTools probably won't fix, and understanding the existing code base is no small task. Please be aware that
**Open XML PowerTools is not a perfect comparison engine, but it will work for many purposes. Use at your own risk.**

### Step 2. Pure Python Comparison Engine

Looking towards the future, rather than reverse engineer `WmlComparer` and maintain a C# codebase, we envision a
comparison engine written in Python. We've done some experimentation with [`xmldiff`](https://github.com/Shoobx/xmldiff)
as the engine to compare the underlying XML of docx files. Specifically, we've built a prototype to unzip `.docx` files,
execute an XML comparison using `xmldiff`, and then reconstruct a tracked changes docx with the proper Open XML
(ooxml) tracked change tags. Preliminary experimentation with this approach has shown promise, indicating its
feasibility for handling modifications such as simple span inserts and deletes.
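
To make the "unzip, then compare" idea concrete, here is a minimal sketch that uses only the standard library. It is not the prototype described above and does not use `xmldiff`; it swaps in `difflib` and compares paragraph text only, so it ignores formatting and does not write tracked change tags. The function names `paragraph_texts` and `diff_paragraphs` are hypothetical and exist purely for illustration.

```python
import difflib
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Return the plain text of every paragraph in word/document.xml."""
    # A .docx file is a zip archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Concatenate the text of all <w:t> elements in this paragraph.
        runs = paragraph.iter(f"{{{W_NS}}}t")
        texts.append("".join(run.text or "" for run in runs))
    return texts


def diff_paragraphs(original: bytes, modified: bytes) -> list[tuple[str, list[str], list[str]]]:
    """Return (operation, old paragraphs, new paragraphs) for each changed span."""
    old, new = paragraph_texts(original), paragraph_texts(modified)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

A paragraph-level diff like this is far coarser than what `WmlComparer` produces; turning the reported operations into valid tracked change markup is the hard part that the roadmap item refers to.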

However, this ambitious endeavor is not without its challenges. The intricacies of `.docx` files and the potential for
complex, corner-case scenarios necessitate a thoughtful and thorough development process. In the interim, `WmlComparer`
is a great solution as it has clearly been built to account for many such corner cases, through a development process
that clearly was influenced by issues discovered by a large user base. The XMLDiff engine will take some time to reach
a level of maturity similar to WmlComparer. At the moment it is NOT included.

## Getting started

### Install .NET Core 8

The Open-XML-PowerTools engine we're using in the initial releases requires .NET to run (don't worry, this is very
well-supported cross-platform at the moment). Our builds are targeting x86-64 Linux and Windows, however, so you'll
need to modify the build script and build new binaries if you want to target another runtime / architecture.

#### On Linux

You can follow [Microsoft's instructions for your Linux distribution](https://learn.microsoft.com/en-us/dotnet/core/install/linux)

#### On Windows

You can follow [Microsoft's instructions for your Windows version](https://learn.microsoft.com/en-us/dotnet/core/install/windows?tabs=net80)

### Install the Library

At the moment, we are not distributing via pypi. You can easily install directly from this repo, however.

```commandline
pip install git+https://github.com/JSv4/Python-Redlines
```

You can add this as a dependency like so

```requirements
python_redlines @ git+https://github.com/JSv4/Python-Redlines@v0.0.1
```

### Use the Library

If you just want to use the tool, jump into our [quickstart guide](docs/quickstart.md).

## Architecture Overview

`XmlPowerToolsEngine` is a Python wrapper class for the `redlines` C# command-line tool, source of which is available in
[./csproj/Program.cs](./csproj/Program.cs). The redlines utility and wrapper let you compare two docx files and
show the differences in tracked changes (a "redline" document).

### C# Functionality

The `redlines` C# utility is a command line tool that requires four arguments:
1. `author_tag` - A tag to identify the author of the changes.
2. `original_path.docx` - Path to the original document.
3. `modified_path.docx` - Path to the modified document.
4. `redline_path.docx` - Path where the redlined document will be saved.

The Python wrapper, `XmlPowerToolsEngine` and its main method `run_redline()`, simplifies the use of `redlines` by
orchestrating its execution with Python and letting you pass in bytes or file paths for the original and modified
documents.
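
As a rough illustration of what the wrapper has to orchestrate, the sketch below builds the four-argument command line described above and runs it with `subprocess`. It is not the code in `engines.py`: the helper names `build_redlines_command` and `run_redlines` and the binary location passed in are assumptions made for this example.

```python
import subprocess


def build_redlines_command(binary, author_tag, original_path, modified_path, redline_path):
    """Return the argument list for the `redlines` utility, in the order it expects."""
    return [
        str(binary),          # path to the extracted redlines executable (assumed location)
        author_tag,           # 1. tag identifying the author of the changes
        str(original_path),   # 2. path to the original document
        str(modified_path),   # 3. path to the modified document
        str(redline_path),    # 4. path where the redlined document is written
    ]


def run_redlines(binary, author_tag, original_path, modified_path, redline_path):
    """Run the utility and return the completed process; raises if it exits non-zero."""
    command = build_redlines_command(
        binary, author_tag, original_path, modified_path, redline_path
    )
    # Passing a list (rather than a shell string) avoids quoting problems in file paths.
    return subprocess.run(command, capture_output=True, text=True, check=True)
```

To accept bytes as well as file paths, a wrapper along these lines would first write the bytes to temporary `.docx` files and pass those paths to the command.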

### Packaging

The project is structured as follows:
```
python-redlines/
├── csproj/
│   ├── bin/
│   ├── obj/
│   ├── Program.cs
│   ├── redlines.csproj
│   └── redlines.sln
├── docs/
│   ├── developer-guide.md
│   └── quickstart.md
├── src/
│   └── python_redlines/
│       ├── bin/
│       │   └── .gitignore
│       ├── dist/
│       │   ├── .gitignore
│       │   ├── linux-x64-0.0.1.tar.gz
│       │   └── win-x64-0.0.1.zip
│       ├── __about__.py
│       ├── __init__.py
│       └── engines.py
├── tests/
│   ├── fixtures/
│   ├── test_openxml_differ.py
│   └── __init__.py
├── .gitignore
├── build_differ.py
├── extract_version.py
├── LICENSE.md
├── pyproject.toml
└── README.md
```

- `src/python_redlines/`: Contains the Python wrapper code.
- `dist/`: Contains the zipped C# binaries for different platforms.
- `bin/`: Target directory for extracted binaries.
- `tests/`: Contains test cases and fixtures for the wrapper.
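
The archives in `dist/` follow a `<os>-<arch>-<version>` naming pattern, with `.zip` for Windows and `.tar.gz` elsewhere. As a hedged sketch of how a wrapper could pick the right archive for the current machine, the hypothetical helper `archive_name` below maps the values reported by `platform` onto that pattern. The mapping tables are assumptions for illustration; the library's real selection logic lives in `engines.py`, which is not shown here.

```python
import platform

# Assumed mappings from platform.system() / platform.machine() values
# to the names used in the archive file names.
_OS_NAMES = {"Linux": "linux", "Windows": "win", "Darwin": "osx"}
_ARCH_NAMES = {
    "x86_64": "x64",
    "AMD64": "x64",
    "arm64": "arm64",
    "aarch64": "arm64",
    "ARM64": "arm64",
}


def archive_name(version, system=None, machine=None):
    """Return the expected archive file name for a platform, e.g. linux-x64-0.0.1.tar.gz."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        target = f"{_OS_NAMES[system]}-{_ARCH_NAMES[machine]}"
    except KeyError as exc:
        raise RuntimeError(f"Unsupported platform: {system} {machine}") from exc
    # Windows builds are zipped; the other targets are gzipped tarballs.
    extension = "zip" if system == "Windows" else "tar.gz"
    return f"{target}-{version}.{extension}"
```

Calling `archive_name("0.0.1")` with no other arguments resolves the name for the machine the code is running on; the extracted contents would then go into `bin/`.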

### Detailed Explanation and Dev Setup

If you want to contribute to the library or want to dive into some of the C# packaging architecture, go to our
[developer guide](docs/developer-guide.md).

## Additional Information

- **Contributing**: Contributions to the project should follow the established coding and documentation standards.
- **Issues and Support**: For issues, feature requests, or support, please use the project's issue tracker on GitHub.

## License

MIT
109 changes: 109 additions & 0 deletions build_differ.py
@@ -0,0 +1,109 @@
import subprocess
import os
import tarfile
import zipfile


def get_version():
    """
    Extracts the version from the specified __about__.py file.
    """
    about = {}
    with open('./src/python_redlines/__about__.py') as f:
        exec(f.read(), about)
    return about['__version__']


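def run_command(command):
    """
    Runs a shell command, prints its output, and raises if it exits with a non-zero status.
    """
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for line in process.stdout:
        print(line.decode().strip())
    # Wait for the process to finish so that a failed build does not go unnoticed.
    if process.wait() != 0:
        raise subprocess.CalledProcessError(process.returncode, command)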


def compress_files(source_dir, target_file):
    """
    Compresses files in the specified directory into a tar.gz or zip file.
    """
    if target_file.endswith('.tar.gz'):
        with tarfile.open(target_file, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
    elif target_file.endswith('.zip'):
        with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, dirs, files in os.walk(source_dir):
                for file in files:
                    zipf.write(os.path.join(root, file),
                               os.path.relpath(os.path.join(root, file),
                                               os.path.join(source_dir, '..')))


def cleanup_old_builds(dist_dir, current_version):
    """
    Deletes any build files ending in .zip or .tar.gz in the dist_dir with a different version tag.
    """
    for file in os.listdir(dist_dir):
        # Only touch build archives, so unrelated files (e.g. .gitignore) are never removed.
        is_archive = file.endswith(('.zip', '.tar.gz'))
        is_current = file.endswith((f'{current_version}.zip', f'{current_version}.tar.gz'))
        if is_archive and not is_current:
            file_path = os.path.join(dist_dir, file)
            os.remove(file_path)
            print(f"Deleted old build file: {file}")


def main():
    version = get_version()
    print(f"Version: {version}")

    dist_dir = "./src/python_redlines/dist/"

    # Build for Linux x64
    print("Building for Linux x64...")
    run_command('dotnet publish ./csproj -c Release -r linux-x64 --self-contained')

    # Build for Linux ARM64
    print("Building for Linux ARM64...")
    run_command('dotnet publish ./csproj -c Release -r linux-arm64 --self-contained')

    # Build for Windows x64
    print("Building for Windows x64...")
    run_command('dotnet publish ./csproj -c Release -r win-x64 --self-contained')

    # Build for Windows ARM64
    print("Building for Windows ARM64...")
    run_command('dotnet publish ./csproj -c Release -r win-arm64 --self-contained')

    # Build for macOS x64
    print("Building for macOS x64...")
    run_command('dotnet publish ./csproj -c Release -r osx-x64 --self-contained')

    # Build for macOS ARM64
    print("Building for macOS ARM64...")
    run_command('dotnet publish ./csproj -c Release -r osx-arm64 --self-contained')

    # Compress the Linux x64 build
    linux_x64_build_dir = './csproj/bin/Release/net8.0/linux-x64'
    compress_files(linux_x64_build_dir, f"{dist_dir}/linux-x64-{version}.tar.gz")

    # Compress the Linux ARM64 build
    linux_arm64_build_dir = './csproj/bin/Release/net8.0/linux-arm64'
    compress_files(linux_arm64_build_dir, f"{dist_dir}/linux-arm64-{version}.tar.gz")

    # Compress the Windows x64 build
    windows_build_dir = './csproj/bin/Release/net8.0/win-x64'
    compress_files(windows_build_dir, f"{dist_dir}/win-x64-{version}.zip")

    # Compress the Windows ARM64 build (it is published above but was never archived)
    windows_arm64_build_dir = './csproj/bin/Release/net8.0/win-arm64'
    compress_files(windows_arm64_build_dir, f"{dist_dir}/win-arm64-{version}.zip")

    # Compress the macOS x64 build
    macos_x64_build_dir = './csproj/bin/Release/net8.0/osx-x64'
    compress_files(macos_x64_build_dir, f"{dist_dir}/osx-x64-{version}.tar.gz")

    # Compress the macOS ARM64 build
    macos_arm64_build_dir = './csproj/bin/Release/net8.0/osx-arm64'
    compress_files(macos_arm64_build_dir, f"{dist_dir}/osx-arm64-{version}.tar.gz")

    cleanup_old_builds(dist_dir, version)

    print("Build and compression complete.")


if __name__ == "__main__":
    main()