How to split a PDF into multiple pages in Python using PyPDF2

Python is a versatile language that lets us automate countless tasks. In this tutorial, I will show you how to split a PDF into multiple pages in Python using the PyPDF2 library.

What is PyPDF2

PyPDF2 is a PDF management library that offers us the following functions, among others:

  • Extract information from a document (title, author, …).
  • Split documents by pages.
  • Merge multiple documents into one.
  • Crop pages from a document.
  • Merge multiple pages into one.
  • Encrypt and decrypt PDF files.

Install PyPDF2

PyPDF2 is a third-party library, so you need to install it before you can use it. Create a virtual environment, activate it and run the following command:

pip install PyPDF2
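
To confirm that the package is visible from the active virtual environment, a quick check along these lines can help. This is only a sketch: installed_version is a hypothetical helper name chosen for this example, and it relies on the standard-library importlib.metadata module (Python 3.8 or later).

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    # installed_version is a hypothetical helper, not part of PyPDF2.
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("PyPDF2")
    if version is None:
        print("PyPDF2 is not installed in this environment")
    else:
        print(f"PyPDF2 {version} is installed")
```

If the script reports that the package is missing, the virtual environment is probably not activated or the pip command ran in a different environment.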

Get a page from a PDF document

Next, I’ll show you how to extract a page from a PDF into a new document using this library. I have defined a function called extract_page(doc_name, page_num). The parameter doc_name is the full path of the document. If the document is in the same directory as the Python program, the file name alone is enough. The parameter page_num is the number of the page to extract (be careful: pages are numbered from 0, so the first page has index 0):

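# PdfFileReader and PdfFileWriter are the class names used by PyPDF2
# releases before 3.0 (3.0 and later call them PdfReader and PdfWriter).
from PyPDF2 import PdfFileWriter, PdfFileReader


def extract_page(doc_name, page_num):
    # Open the source in binary read mode and keep it open until the
    # new document has been written.
    with open(doc_name, 'rb') as src_file:
        pdf_reader = PdfFileReader(src_file)

        pdf_writer = PdfFileWriter()
        # page_num is zero-based: 0 is the first page.
        pdf_writer.addPage(pdf_reader.getPage(page_num))
        with open(f'document-page{page_num}.pdf', 'wb') as doc_file:
            pdf_writer.write(doc_file)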
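
Because page_num is zero-based, it is easy to be off by one when you work from the page numbers shown in a PDF viewer. A small helper like the one below can convert and validate the number before it is passed to extract_page. The name to_page_index and its total_pages argument are assumptions made for this sketch; total_pages would come from the reader, for example from pdf_reader.getNumPages().

```python
def to_page_index(page_number, total_pages):
    """Convert a 1-based page number into the 0-based index used by extract_page."""
    # to_page_index is a hypothetical helper, not part of PyPDF2.
    if not 1 <= page_number <= total_pages:
        raise ValueError(
            f"page {page_number} is outside the range 1..{total_pages}"
        )
    return page_number - 1


if __name__ == "__main__":
    # Page 3 of a 10-page document corresponds to index 2.
    print(to_page_index(3, 10))
```

Validating up front gives a clear error message instead of a failure deep inside the library when the page does not exist.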

Split a PDF in Python based on a page number

In this section we will see a new function that splits a PDF in two at a given page number. I called the function split_pdf(doc_name, page_num). As before, the parameter doc_name is the path or the name of the document. page_num is the index of the page at which the document is split (the first page has index 0): pages before page_num go to the first document, and page_num and all following pages go to the second.

def split_pdf(doc_name, page_num):
    # Keep the source open (binary read mode) until both parts are written.
    with open(doc_name, 'rb') as src_file:
        pdf_reader = PdfFileReader(src_file)
        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()

        # Pages 0 .. page_num - 1 go to the first document.
        for page in range(page_num):
            pdf_writer1.addPage(pdf_reader.getPage(page))

        # Pages page_num .. last go to the second document.
        for page in range(page_num, pdf_reader.getNumPages()):
            pdf_writer2.addPage(pdf_reader.getPage(page))

        with open("doc1.pdf", 'wb') as file1:
            pdf_writer1.write(file1)

        with open("doc2.pdf", 'wb') as file2:
            pdf_writer2.write(file2)

As a result, the function generates two documents called doc1.pdf and doc2.pdf. You can tweak it to your liking, for example to print the names of the resulting documents or to make further subdivisions.
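
For further subdivisions, one possible approach is to compute the page ranges first and then run the same loop as in split_pdf once per range. The sketch below only does the range computation, so it runs without any PDF at hand. page_ranges and its split_points parameter are hypothetical names chosen for this example.

```python
def page_ranges(total_pages, split_points):
    """Return (start, end) index pairs, with end exclusive, one per part.

    split_points lists the zero-based page indexes at which a new part begins.
    """
    # page_ranges is a hypothetical helper, not part of PyPDF2.
    points = sorted(set(split_points))
    for point in points:
        if not 0 < point < total_pages:
            raise ValueError(
                f"split point {point} is outside 1..{total_pages - 1}"
            )
    # Add the start and the end of the document as outer boundaries,
    # then pair each boundary with the next one.
    bounds = [0] + points + [total_pages]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    for part, (start, end) in enumerate(page_ranges(10, [3, 7]), start=1):
        print(f"doc{part}.pdf gets pages {start}..{end - 1}")
```

Each (start, end) pair can then drive a "for page in range(start, end)" loop that fills its own PdfFileWriter, just as split_pdf does for its two halves.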