Vulnerabilities

| Version | Suggest | Low | Medium | High | Critical |
|---|---|---|---|---|---|
| 1.27.2.2 | 0 | 0 | 0 | 0 | 0 |
| 1.27.2 | 0 | 0 | 0 | 0 | 0 |
| 1.27.1 | 0 | 0 | 0 | 0 | 0 |
| 1.26.7 | 0 | 0 | 0 | 0 | 0 |
| 1.26.6 | 0 | 0 | 0 | 1 | 0 |
| 1.26.5 | 0 | 0 | 0 | 1 | 0 |
| 1.26.4 | 0 | 0 | 0 | 0 | 0 |
| 1.26.3 | 0 | 0 | 0 | 0 | 0 |
| 1.26.1 | 0 | 0 | 0 | 0 | 0 |
| 1.26.0 | 0 | 0 | 0 | 0 | 0 |
| 1.25.5 | 0 | 0 | 0 | 0 | 0 |
| 1.25.4 | 0 | 0 | 0 | 0 | 0 |
| 1.25.3 | 0 | 0 | 0 | 0 | 0 |
| 1.25.2 | 0 | 0 | 0 | 0 | 0 |
| 1.25.1 | 0 | 0 | 0 | 0 | 0 |
| 1.25.0 | 0 | 0 | 0 | 0 | 0 |
| 1.24.14 | 0 | 0 | 0 | 0 | 0 |
| 1.24.13 | 0 | 0 | 0 | 0 | 0 |
| 1.24.12 | 0 | 0 | 0 | 0 | 0 |
| 1.24.11 | 0 | 0 | 0 | 0 | 0 |
| 1.24.10 | 0 | 0 | 0 | 0 | 0 |
| 1.24.9 | 0 | 0 | 0 | 0 | 0 |
| 1.24.8 | 0 | 0 | 0 | 0 | 0 |
| 1.24.7 | 0 | 0 | 0 | 0 | 0 |
| 1.24.6 | 0 | 0 | 0 | 0 | 0 |
| 1.24.5 | 0 | 0 | 0 | 0 | 0 |
| 1.24.4 | 0 | 0 | 0 | 0 | 0 |
| 1.24.3 | 0 | 0 | 0 | 0 | 0 |
| 1.24.2 | 0 | 0 | 0 | 0 | 0 |
| 1.24.1 | 0 | 0 | 0 | 0 | 0 |
| 1.24.0 | 0 | 0 | 0 | 0 | 0 |
| 1.23.26 | 0 | 0 | 0 | 0 | 0 |
| 1.23.25 | 0 | 0 | 0 | 0 | 0 |
| 1.23.24 | 0 | 0 | 0 | 0 | 0 |
| 1.23.23 | 0 | 0 | 0 | 0 | 0 |
| 1.23.22 | 0 | 0 | 0 | 0 | 0 |
| 1.23.21 | 0 | 0 | 0 | 0 | 0 |
| 1.23.20 | 0 | 0 | 0 | 0 | 0 |
| 1.23.19 | 0 | 0 | 0 | 0 | 0 |
| 1.23.18 | 0 | 0 | 0 | 0 | 0 |
| 1.23.17 | 0 | 0 | 0 | 0 | 0 |
| 1.23.16 | 0 | 0 | 0 | 0 | 0 |
| 1.23.15 | 0 | 0 | 0 | 0 | 0 |
| 1.23.14 | 0 | 0 | 0 | 0 | 0 |
| 1.23.13 | 0 | 0 | 0 | 0 | 0 |
| 1.23.12 | 0 | 0 | 0 | 0 | 0 |
| 1.23.11 | 0 | 0 | 0 | 0 | 0 |
| 1.23.10 | 0 | 0 | 0 | 0 | 0 |
| 1.23.9 | 0 | 0 | 0 | 0 | 0 |
| 1.23.9rc2 | 0 | 0 | 0 | 0 | 0 |
| 1.23.9rc1 | 0 | 0 | 0 | 0 | 0 |
| 1.23.8 | 0 | 0 | 0 | 0 | 0 |
| 1.23.7 | 0 | 0 | 0 | 0 | 0 |
| 1.23.6 | 0 | 0 | 0 | 0 | 0 |
| 1.23.5 | 0 | 0 | 0 | 0 | 0 |
| 1.23.4 | 0 | 0 | 0 | 0 | 0 |
| 1.23.3 | 0 | 0 | 0 | 0 | 0 |
| 1.23.2 | 0 | 0 | 0 | 0 | 0 |
| 1.23.2rc1 | 0 | 0 | 0 | 0 | 0 |
| 1.23.1 | 0 | 0 | 0 | 0 | 0 |
| 1.23.0rc2 | 0 | 0 | 0 | 0 | 0 |
| 1.23.0rc1 | 0 | 0 | 0 | 0 | 0 |
| 1.23.0 | 0 | 0 | 0 | 0 | 0 |
| 1.22.5 | 0 | 0 | 0 | 0 | 0 |
| 1.22.3 | 0 | 0 | 0 | 0 | 0 |
| 1.22.2 | 0 | 0 | 0 | 0 | 0 |
| 1.22.1 | 0 | 0 | 0 | 0 | 0 |
| 1.22.0 | 0 | 0 | 0 | 0 | 0 |
| 1.21.1 | 0 | 0 | 0 | 0 | 0 |
| 1.21.0 | 0 | 0 | 0 | 0 | 0 |
| 1.20.2 | 0 | 0 | 0 | 0 | 0 |
| 1.20.1 | 0 | 0 | 0 | 0 | 0 |
| 1.20.0 | 0 | 0 | 0 | 0 | 0 |
| 1.19.6 | 0 | 0 | 0 | 0 | 0 |
| 1.19.5 | 0 | 0 | 0 | 0 | 0 |
| 1.19.4 | 0 | 0 | 0 | 0 | 0 |
| 1.19.3 | 0 | 0 | 0 | 0 | 0 |
| 1.19.2 | 0 | 0 | 0 | 0 | 0 |
| 1.19.1 | 0 | 0 | 0 | 0 | 0 |
| 1.19.0 | 0 | 0 | 0 | 0 | 0 |
| 1.18.19 | 0 | 0 | 0 | 0 | 0 |
| 1.18.18 | 0 | 0 | 0 | 0 | 0 |
| 1.18.17 | 0 | 0 | 0 | 0 | 0 |
| 1.18.16 | 0 | 0 | 0 | 0 | 0 |
| 1.18.15 | 0 | 0 | 0 | 0 | 0 |
| 1.18.14 | 0 | 0 | 0 | 0 | 0 |
| 1.18.13 | 0 | 0 | 0 | 0 | 0 |
| 1.18.12 | 0 | 0 | 0 | 0 | 0 |
| 1.18.11 | 0 | 0 | 0 | 0 | 0 |
| 1.18.10 | 0 | 0 | 0 | 0 | 0 |
| 1.18.9 | 0 | 0 | 0 | 0 | 0 |
| 1.18.8 | 0 | 0 | 0 | 0 | 0 |
| 1.18.7 | 0 | 0 | 0 | 0 | 0 |
| 1.18.6 | 0 | 0 | 0 | 0 | 0 |
| 1.18.5 | 0 | 0 | 0 | 0 | 0 |
| 1.18.4 | 0 | 0 | 0 | 0 | 0 |
| 1.18.3 | 0 | 0 | 0 | 0 | 0 |
| 1.18.2 | 0 | 0 | 0 | 0 | 0 |
| 1.18.1 | 0 | 0 | 0 | 0 | 0 |
| 1.18.0 | 0 | 0 | 0 | 0 | 0 |
| 1.17.7 | 0 | 0 | 0 | 0 | 0 |
| 1.17.6 | 0 | 0 | 0 | 0 | 0 |
| 1.17.5 | 0 | 0 | 0 | 0 | 0 |
| 1.17.4 | 0 | 0 | 0 | 0 | 0 |
| 1.17.3 | 0 | 0 | 0 | 0 | 0 |
| 1.17.2 | 0 | 0 | 0 | 0 | 0 |
| 1.17.1 | 0 | 0 | 0 | 0 | 0 |
| 1.17.0 | 0 | 0 | 0 | 0 | 0 |
| 1.16.18 | 0 | 0 | 0 | 0 | 0 |
| 1.16.17 | 0 | 0 | 0 | 0 | 0 |
| 1.16.16 | 0 | 0 | 0 | 0 | 0 |
| 1.16.15 | 0 | 0 | 0 | 0 | 0 |
| 1.16.14 | 0 | 0 | 0 | 0 | 0 |
| 1.16.13 | 0 | 0 | 0 | 0 | 0 |
| 1.16.12 | 0 | 0 | 0 | 0 | 0 |
| 1.16.11 | 0 | 0 | 0 | 0 | 0 |
| 1.16.10 | 0 | 0 | 0 | 0 | 0 |
| 1.16.9 | 0 | 0 | 0 | 0 | 0 |
| 1.16.8 | 0 | 0 | 0 | 0 | 0 |
| 1.16.7 | 0 | 0 | 0 | 0 | 0 |
| 1.16.6 | 0 | 0 | 0 | 0 | 0 |
| 1.16.5 | 0 | 0 | 0 | 0 | 0 |
| 1.16.4 | 0 | 0 | 0 | 0 | 0 |
| 1.16.3 | 0 | 0 | 0 | 0 | 0 |
| 1.16.2 | 0 | 0 | 0 | 0 | 0 |
| 1.16.1 | 0 | 0 | 0 | 0 | 0 |
| 1.16.0 | 0 | 0 | 0 | 0 | 0 |
| 1.14.21 | 0 | 0 | 0 | 0 | 0 |
| 1.14.20 | 0 | 0 | 0 | 0 | 0 |
| 1.14.19 | 0 | 0 | 0 | 0 | 0 |
| 1.13.20 | 0 | 0 | 0 | 0 | 0 |
| 1.12.5 | 0 | 0 | 0 | 0 | 0 |
| 1.11.2 | 0 | 0 | 0 | 0 | 0 |
| 1.10.0 | 0 | 0 | 0 | 0 | 0 |
| 1.9.2 | 0 | 0 | 0 | 0 | 0 |
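
As a rough illustration of how the table above might be used, the sketch below checks an installed version against the two releases that show a non-zero count (1.26.5 and 1.26.6, each with one High finding). The names `FLAGGED_VERSIONS`, `is_flagged`, and `installed_version` are illustrative and not part of any tool, and the set is hand-copied, so it would need updating as new findings are published.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Releases with a non-zero count in the table above (hand-copied, illustrative).
FLAGGED_VERSIONS = {"1.26.5", "1.26.6"}


def is_flagged(release: str) -> bool:
    """Return True if the given release string is in the flagged set."""
    return release.strip() in FLAGGED_VERSIONS


def installed_version(dist: str = "pymupdf") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("pymupdf is not installed")
elif is_flagged(current):
    print(f"pymupdf {current} has a High finding in the table; consider upgrading")
else:
    print(f"pymupdf {current} is not in the flagged set")
```

An exact string match is used deliberately here; comparing version ranges would need a proper version parser.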

1.27.2.2 - This version is safe to use because it has no known security vulnerabilities at this time.
UNKNOWN - Dual Licensed - GNU AFFERO GPL 3.0 or Artifex Commercial License

The PDF engine behind over 50 million monthly downloads, powering AI pipelines worldwide.
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, rendering and manipulation of PDF (and other) documents. Built on top of MuPDF — a lightweight, fast C engine — PyMuPDF gives you precise, low-level control over documents alongside high-level convenience APIs. No mandatory external dependencies.
`pip install pymupdf` and you're done.

```bash
pip install pymupdf
```

Wheels are available for Windows, macOS, and Linux on Python 3.10–3.14. If no pre-built wheel exists for your platform, pip will compile from source (requires a C/C++ toolchain).
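
Before installing, it can help to confirm that the running interpreter falls inside the Python range quoted above, since outside it pip may fall back to a source build. A minimal sketch, where `in_wheel_range` is a hypothetical helper and the bounds are simply the ones stated in this document:

```python
import sys

# Range quoted above for pre-built wheels (as of v1.27.x).
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 14)


def in_wheel_range(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version lies within the quoted range."""
    return MIN_SUPPORTED <= tuple(version_info[:2]) <= MAX_SUPPORTED


if in_wheel_range():
    print("This interpreter is within the range quoted for pre-built wheels.")
else:
    print("Outside the quoted range: pip may need a C/C++ toolchain to build.")
```

Being inside the range does not guarantee a wheel for every platform; it only rules out the most common cause of a source build.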
| Package | Purpose |
|---|---|
| `pymupdf-fonts` | Extended font collection for text output |
| `pymupdf4llm` | LLM/RAG-optimised Markdown and JSON extraction |
| `pymupdfpro` | Adds Office document support |
| `tesseract-ocr` | OCR for scanned pages and images (separate install) |

```bash
# More fonts
pip install pymupdf-fonts

# LLM-ready extraction
pip install pymupdf4llm

# Office support
pip install pymupdfpro

# OCR (Tesseract must be installed separately)
# macOS
brew install tesseract
# Ubuntu / Debian
sudo apt install tesseract-ocr
```

| Category | Formats |
|---|---|
| PDF & derivatives | PDF, XPS, EPUB, CBZ, MOBI, FB2, SVG, TXT |
| Images | PNG, JPEG, BMP, TIFF, GIF, and more |
| Microsoft Office (Pro) | DOC, DOCX, XLS, XLSX, PPT, PPTX |
| Korean Office (Pro) | HWP, HWPX |

| Format | Notes |
|---|---|
| PDF | Full fidelity conversion from Office formats |
| SVG | Vector page rendering |
| Image (PNG, JPEG, …) | Page rasterisation at any DPI |
| Markdown | Structure-aware, LLM-ready |
| JSON | Bounding boxes, layout data, per-element detail |
| Plain text | Fast, lightweight extraction |
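
The "any DPI" entry follows from PDF page sizes being expressed in points (1/72 inch): rendering at a given DPI scales the page by `dpi / 72`. The sketch below shows only that arithmetic. `zoom_for_dpi` and `pixel_size` are hypothetical helpers, not PyMuPDF functions, and the rounding may differ by a pixel from what the renderer actually produces.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def zoom_for_dpi(dpi: float) -> float:
    """Scale factor relative to the 72 dpi baseline."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    zoom = zoom_for_dpi(dpi)
    return round(width_pt * zoom), round(height_pt * zoom)


# A4 is roughly 595 x 842 points.
print(pixel_size(595, 842, 150))  # (1240, 1754)
```

With PyMuPDF, the same factor can be passed as `pymupdf.Matrix(zoom, zoom)` to `page.get_pixmap(matrix=...)`, which should give a result comparable to using `dpi=` directly.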

```python
# Extract plain text from every page
import pymupdf

doc = pymupdf.open("document.pdf")
for page in doc:
    print(page.get_text())
```

```python
# Inspect fonts and sizes via the structured "dict" output
import pymupdf

doc = pymupdf.open("document.pdf")
page = doc[0]
blocks = page.get_text("dict")["blocks"]
for block in blocks:
    if block["type"] == 0:  # text block
        for line in block["lines"]:
            for span in line["spans"]:
                print(f"{span['text']!r} font={span['font']} size={span['size']:.1f}")
```

```python
# Detect tables and export them
import pymupdf

doc = pymupdf.open("spreadsheet.pdf")
page = doc[0]
tables = page.find_tables()
for table in tables:
    print(table.to_markdown())
    # or get as Pandas DataFrame
    df = table.to_pandas()
```

```python
# Render a page to a PNG image
import pymupdf

doc = pymupdf.open("document.pdf")
page = doc[0]
pixmap = page.get_pixmap(dpi=150)
pixmap.save("page_0.png")
```

```python
# OCR a scanned page
import pymupdf

doc = pymupdf.open("scanned.pdf")
page = doc[0]
# Requires Tesseract installed and on PATH
text = page.get_textpage_ocr(language="eng").extractText()
print(text)
```

```python
# Convert a PDF to LLM-ready Markdown
import pymupdf4llm

md = pymupdf4llm.to_markdown("report.pdf")
# Pass directly to your LLM or vector store
print(md)
```

```python
# Annotate and redact
import pymupdf

doc = pymupdf.open("contract.pdf")
page = doc[0]
# Add a highlight annotation
rect = pymupdf.Rect(72, 100, 400, 120)
page.add_highlight_annot(rect)
# Add a redaction and apply it
page.add_redact_annot(rect)
page.apply_redactions()
doc.save("contract_redacted.pdf")
```

```python
# Merge several PDFs
import pymupdf

merger = pymupdf.open()
for path in ["part1.pdf", "part2.pdf", "part3.pdf"]:
    merger.insert_pdf(pymupdf.open(path))
merger.save("merged.pdf")
```

```python
# Convert an Office document to PDF (PyMuPDF Pro)
import pymupdf.pro

pymupdf.pro.unlock("YOUR-LICENSE-KEY")
doc = pymupdf.open("presentation.pptx")
pdf_bytes = doc.convert_to_pdf()
with open("output.pdf", "wb") as f:
    f.write(pdf_bytes)
```

```python
# Markdown from an Office document (PyMuPDF Pro with PyMuPDF4LLM)
import pymupdf4llm
import pymupdf.pro

pymupdf.pro.unlock("YOUR-LICENSE-KEY")
md = pymupdf4llm.to_markdown("document.docx")
print(md)
```

| Feature | Description |
|---|---|
| Text extraction | Plain text, rich dict (font, size, color, bbox), HTML, XML, raw blocks |
| Table detection | `find_tables()`: locate, extract, and export tables as Markdown or structured data |
| Image extraction | Extract embedded images and render any page to a high-resolution `Pixmap` |
| Rendering | Render PDF pages to images or Pixmap data for use in UI or other workflows |
| OCR | Tesseract integration — full-page or partial OCR, configurable language |
| Annotations | Read and write highlights, underlines, squiggly lines, sticky notes, free text, ink, stamps |
| Redaction | Add and permanently apply redaction annotations |
| Forms | Read and fill PDF AcroForm fields |
| PDF editing | Insert, delete, and reorder pages; set metadata; merge and split documents |
| Drawing | Draw lines, curves, rectangles, and circles; insert HTML boxes |
| Encryption | Open password-protected PDFs; save with RC4 or AES encryption |
| Links | Extract hyperlinks, internal cross-references, and URI targets |
| Bookmarks | Read and write the outline / table of contents tree |
| Metadata | Title, author, creation date, producer, subject, and custom entries |
| Color spaces | RGB, CMYK, greyscale; color space conversion |

| Output | API |
|---|---|
| Markdown | `pymupdf4llm.to_markdown(path)` |
| JSON | `pymupdf4llm.to_json(path)` |
| Plain text | `pymupdf4llm.to_text(path)` |
Supports multi-column layouts, natural reading order and page chunking.
Python 3.10 – 3.14 (as of v1.27.x). Wheels ship for:

- manylinux x86_64 and aarch64
- musllinux x86_64

PyMuPDF is built on MuPDF, one of the fastest PDF rendering engines available. Typical benchmarks against pure-Python PDF libraries show 10–50× speed improvements for text extraction and 100× or more for page rendering, with a minimal memory footprint.
For AI workloads, PyMuPDF4LLM processes documents without a GPU, cutting infrastructure costs significantly compared to vision-based LLM approaches.
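
The speed figures quoted above depend heavily on the documents involved, so it can be worth measuring on your own files. The helper below is a minimal, standard-library-only timing sketch. `best_time` is a hypothetical name, and the lambda shown in the comment assumes a `doc` opened elsewhere.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeat: int = 5) -> float:
    """Run fn() `repeat` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs on its own; swap in something like
#   lambda: [page.get_text() for page in doc]
elapsed = best_time(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed * 1000:.2f} ms")
```

Taking the minimum of several runs reduces noise from other processes; for a fair comparison each library should be timed on the same file and the same operation.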

```python
# Extract embedded images
import pymupdf
from pathlib import Path

doc = pymupdf.open("document.pdf")
out = Path("images")
out.mkdir(exist_ok=True)
for page_index, page in enumerate(doc):
    for img_index, img in enumerate(page.get_images()):
        xref = img[0]
        pix = pymupdf.Pixmap(doc, xref)
        if pix.n - pix.alpha > 3:  # CMYK: convert to RGB before saving as PNG
            pix = pymupdf.Pixmap(pymupdf.csRGB, pix)
        pix.save(out / f"page{page_index}_img{img_index}.png")
```

```python
# Search for text and highlight every hit
import pymupdf

doc = pymupdf.open("document.pdf")
needle = "confidential"
for page in doc:
    hits = page.search_for(needle)
    if hits:
        print(f"Page {page.number}: {len(hits)} occurrence(s)")
        for rect in hits:
            page.add_highlight_annot(rect)
doc.save("highlighted.pdf")
```

```python
# Split a document into single-page files
import pymupdf

doc = pymupdf.open("document.pdf")
for i, page in enumerate(doc):
    out = pymupdf.open()
    out.insert_pdf(doc, from_page=i, to_page=i)
    out.save(f"page_{i + 1}.pdf")
```

```python
# Add a diagonal "DRAFT" watermark
import pymupdf

doc = pymupdf.open("document.pdf")
for page in doc:
    point = pymupdf.Point(72, page.rect.height / 2)
    page.insert_text(
        point,
        "DRAFT",
        fontsize=72,
        color=(0.8, 0.8, 0.8),
        # rotate= only accepts multiples of 90, so use morph for 45 degrees
        morph=(point, pymupdf.Matrix(45)),
    )
doc.save("watermarked.pdf")
```

PyMuPDF can be extended with PyMuPDF Pro. This adds a conversion layer that handles Microsoft and Korean Office formats natively: no Office installation, no COM interop, no LibreOffice subprocess.
Once unlocked, pymupdf.open() accepts Office files exactly like PDFs:

```python
import pymupdf.pro

pymupdf.pro.unlock("YOUR-LICENSE-KEY")

# Works identically regardless of format
for fmt in ["contract.docx", "data.xlsx", "deck.pptx", "report.hwpx"]:
    doc = pymupdf.open(fmt)
    for page in doc:
        print(page.get_text())
```

Get a trial license key for PyMuPDF Pro.
What you can do with Office documents:

- Convert them to PDF with `doc.convert_to_pdf()`

When pymupdf.pro.unlock() is called without a key, the following restrictions apply:
| Restriction | Detail |
|---|---|
| Page limit | Only the first 3 pages of any document are accessible |
| Time limit | Evaluation period — functionality expires after a set duration |
All other Pro features work normally within these constraints, making it straightforward to prototype before purchasing a license.
Yes, absolutely — and this is one of PyMuPDF's most significant advantages.
PyMuPDF runs entirely locally. It is a native Python library built on top of the MuPDF C engine. When you call pymupdf.open(), page.get_text(), page.find_tables(), or any other method, everything executes in-process on your own machine. No data is transmitted anywhere.
There are no telemetry calls, no licence validation callbacks, no cloud dependencies of any kind in the open-source AGPL build or the commercial build. Once the package is installed, it works fully air-gapped.
This makes PyMuPDF well suited to air-gapped environments and to workflows where documents must not leave the machine.
The only thing you need an internet connection for is the initial pip install. After that, the package and all its capabilities are entirely self-contained.
Use `import pymupdf`. The `fitz` name is a legacy alias that still works as of v1.24.0+, but `import pymupdf` is the recommended and future-proof approach. The two are interchangeable in existing code:

```python
import pymupdf  # recommended
# import fitz   # legacy alias, still works but avoid for new code
```

Yes, PyMuPDF has solid CJK support.
Let PyMuPDF4LLM do everything (recommended for RAG).
PyMuPDF4LLM is a high-level wrapper that outputs standard text and table content together in a single Markdown-formatted string covering all document pages. Tables are detected, converted to GitHub-compatible Markdown, and interleaved with surrounding text in the correct reading order. This is the best starting point for feeding an LLM or building a RAG pipeline.

```python
import pymupdf4llm

md = pymupdf4llm.to_markdown("report.pdf")
print(md)
# Tables appear as Markdown | col1 | col2 | ... inline with the text
```

Garbled extracted text usually means the PDF uses custom font encodings without a proper character map (CMap). The font's glyphs are present but cannot be mapped back to Unicode. In these cases:

- Fall back to OCR (`page.get_textpage_ocr()`)

Pass a clip rectangle to `get_text()`:

```python
import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]
# Define the area you want (x0, y0, x1, y1) in points
clip = pymupdf.Rect(50, 100, 400, 300)
text = page.get_text("text", clip=clip)
```

```python
import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]
# Returns a list of Rect objects surrounding each match
locations = page.search_for("invoice number")
for rect in locations:
    print(rect)  # e.g. Rect(72.0, 120.5, 210.0, 134.0)
```

Charts and diagrams created by tools like matplotlib, Excel, or R are typically rendered as vector graphics (PDF drawing commands), not raster images. `get_images()` only lists embedded raster image objects and will not detect vector graphics. To capture these, rasterise the entire page with `page.get_pixmap()`.
PyMuPDF uses MuPDF's built-in Tesseract-based OCR support, so there is no Python-level pytesseract dependency. However, PyMuPDF still needs access to the Tesseract language data files (tessdata), and automatic tessdata discovery may invoke the tesseract executable (for example, to list available languages) if you do not explicitly provide a tessdata path. In practice, the recommended setup is to either install Tesseract so discovery works automatically, or configure the tessdata location yourself via the tessdata parameter or the TESSDATA_PREFIX environment variable. Over 100 languages are supported.
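
The two configuration routes described above (an explicit `tessdata` argument or the `TESSDATA_PREFIX` environment variable) can be wrapped in a small lookup. This is only a sketch: `resolve_tessdata` and `ocr_page_text` are hypothetical helpers, and the directory in the usage note is a placeholder rather than a known location.

```python
import os
from typing import Optional


def resolve_tessdata(explicit: Optional[str] = None) -> Optional[str]:
    """Pick a tessdata folder: explicit argument first, then TESSDATA_PREFIX.

    Returns None when neither is set, which leaves discovery to PyMuPDF.
    """
    return explicit or os.environ.get("TESSDATA_PREFIX") or None


def ocr_page_text(path: str, page_number: int = 0, language: str = "eng",
                  tessdata: Optional[str] = None) -> str:
    """OCR one page and return its text (needs PyMuPDF and Tesseract data)."""
    import pymupdf  # imported here so the lookup helper works without it

    doc = pymupdf.open(path)
    page = doc[page_number]
    tp = page.get_textpage_ocr(language=language,
                               tessdata=resolve_tessdata(tessdata))
    return page.get_text(textpage=tp)
```

A call might look like `ocr_page_text("scanned.pdf", tessdata="/path/to/tessdata")`. Passing the folder explicitly avoids relying on automatic discovery.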

```python
import pymupdf

doc = pymupdf.open("scanned.pdf")
page = doc[0]
# Get a text page using OCR
tp = page.get_textpage_ocr(language="eng")
text = page.get_text(textpage=tp)
print(text)
```

```python
import pymupdf

pix = pymupdf.Pixmap("image.png")
if pix.alpha:
    pix = pymupdf.Pixmap(pix, 0)  # remove alpha channel, required for OCR
# Wrap in a 1-page PDF and OCR it
doc = pymupdf.open()
page = doc.new_page(width=pix.width, height=pix.height)
page.insert_image(page.rect, pixmap=pix)
tp = page.get_textpage_ocr()
text = page.get_text(textpage=tp)
```

```python
import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]
# Use quads=True for accurate highlights on non-horizontal text
quads = page.search_for("important term", quads=True)
page.add_highlight_annot(quads)
doc.save("highlighted.pdf")
```

PyMuPDF supports all standard PDF text markers: highlight, underline, strikeout, and squiggly.
Redaction is a deliberate two-step process so you can review before committing:

```python
import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]
# Step 1: Mark the area(s) to redact
rect = page.search_for("confidential")[0]
page.add_redact_annot(rect, fill=(1, 1, 1))  # white fill
# Step 2: Apply, which permanently removes the underlying content
page.apply_redactions()
doc.save("redacted.pdf")
```

After `apply_redactions()`, the original content is gone. It cannot be recovered from the saved file.

```python
import pymupdf

doc = pymupdf.open("form.pdf")
page = doc[0]
for field in page.widgets():
    print(f"{field.field_name}: {field.field_value}")
```

```python
import pymupdf

doc = pymupdf.open("form.pdf")
page = doc[0]
for field in page.widgets():
    if field.field_name == "First Name":
        field.field_value = "Ada"
        field.update()
doc.save("filled_form.pdf")
```

No. PyMuPDF does not support multithreaded use, even with Python's newer free-threading mode. The underlying MuPDF library only provides partial thread safety, and a fully thread-safe PyMuPDF implementation would still impose a single-threaded overhead, negating the benefit.
Use multiprocessing instead. Each process opens the file independently and works on its own page range:

```python
from multiprocessing import Pool

import pymupdf


def process_pages(args):
    path, start, end = args
    doc = pymupdf.open(path)  # each process opens its own handle
    results = []
    for i in range(start, end):
        results.append(doc[i].get_text())
    return results


if __name__ == "__main__":  # guard needed where worker processes are spawned
    # One (path, start, end) tuple per chunk; extend the list to cover all pages
    chunks = [("input.pdf", 0, 25), ("input.pdf", 25, 50)]
    with Pool(4) as pool:
        all_results = pool.map(process_pages, chunks)
```

Reuse a `TextPage` object. Creating a `TextPage` is the expensive part; once created, switching between extraction formats is cheap:

```python
import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]
tp = page.get_textpage()  # create once
text = page.get_text("text", textpage=tp)
words = page.get_text("words", textpage=tp)
data = page.get_text("dict", textpage=tp)
```

This can reduce execution time by 50–95% for repeated extractions on the same page.
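
Returning to the multiprocessing approach above: rather than hard-coding page ranges, they can be derived from the page count and the number of workers. A minimal sketch, where `page_ranges` is a hypothetical helper (not a PyMuPDF function) that returns half-open `(start, end)` pairs matching the `range(start, end)` loop in the worker:

```python
def page_ranges(page_count: int, workers: int) -> list:
    """Split range(page_count) into at most `workers` contiguous (start, end) pairs."""
    if page_count <= 0 or workers <= 0:
        return []
    workers = min(workers, page_count)
    base, extra = divmod(page_count, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` chunks take one additional page each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


print(page_ranges(50, 4))  # [(0, 13), (13, 26), (26, 38), (38, 50)]
```

The chunks for the pool would then be `[(path, s, e) for s, e in page_ranges(len(doc), 4)]`, with `len(doc)` read once in the parent process.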

```python
import pymupdf

doc = pymupdf.open("input.pdf")
# Read
print(doc.metadata)
# {'title': '...', 'author': '...', 'subject': '...', 'keywords': '...', ...}
# Write
doc.set_metadata({
    "title": "Annual Report 2025",
    "author": "Finance Team",
    "keywords": "annual, finance, 2025"
})
doc.save("output.pdf")
```

```python
import pymupdf

doc = pymupdf.open("input.pdf")
# Read: returns a list of [level, title, page_number] entries
toc = doc.get_toc()
for level, title, page in toc:
    print("  " * level, title, "→ page", page)
# Write
new_toc = [
    [1, "Introduction", 1],
    [1, "Methods", 5],
    [2, "Data sources", 6],
]
doc.set_toc(new_toc)
doc.save("output.pdf")
```

Full installation guide, API reference, cookbook, and tutorial at pymupdf.readthedocs.io.

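
For the outline example above, note that `set_toc()` expects a well-formed hierarchy: the first entry should be at level 1, and a level should not be more than one deeper than the entry before it. The sketch below checks a list before it is written. `check_toc` is a hypothetical helper, not a PyMuPDF function, and it covers only these two rules.

```python
def check_toc(toc: list) -> list:
    """Return a list of problems found in a [level, title, page, ...] outline."""
    problems = []
    previous_level = 0  # so that the first entry must be level 1
    for index, entry in enumerate(toc):
        level, title = entry[0], entry[1]
        if level < 1:
            problems.append(f"entry {index} ({title!r}): level must be >= 1")
        elif level > previous_level + 1:
            problems.append(
                f"entry {index} ({title!r}): level jumps from "
                f"{previous_level} to {level}"
            )
        previous_level = level
    return problems


new_toc = [
    [1, "Introduction", 1],
    [1, "Methods", 5],
    [2, "Data sources", 6],
]
print(check_toc(new_toc))  # []
```

An empty list means the outline passes both checks; any messages point at the entry that would likely be rejected.
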
| Project | Description |
|---|---|
| PyMuPDF4LLM | LLM/RAG-optimised Markdown and JSON extraction |
| PyMuPDF Pro | Adds Office and HWP document support |
| pymupdf-fonts | Extended font collection for PyMuPDF text output |
PyMuPDF and MuPDF are maintained by Artifex Software, Inc.
Contributions are welcome. Please open an issue before submitting large pull requests.
If you find this useful, please consider giving it a star — it helps others discover it!