
Python is a very versatile language that allows us to automate countless tasks. In this tutorial, I will show you how to extract a page from a PDF and how to split a PDF into separate documents in Python using the PyPDF2 library.
What is PyPDF2
PyPDF2 is a PDF management library that offers, among others, the following features:
- Extract information from a document (title, author, …).
- Split documents by pages.
- Merge multiple documents into one.
- Crop pages from a document.
- Merge multiple pages into one.
- Encrypt and decrypt PDF files.
Install PyPDF2
The first thing you need to do to use PyPDF2 is to install it since it is a third-party library. So create a virtual environment, activate it and run the following command:
pip install PyPDF2
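The examples in this tutorial use the class names PdfFileReader and PdfFileWriter. To my knowledge these belong to PyPDF2 releases before 3.0, while later releases use PdfReader and PdfWriter instead, so it can be worth checking which version pip installed. The following is only a sketch using the standard library: installed_version and has_legacy_names are hypothetical helper names, and the "before 3.0" rule is an assumption you should verify against the PyPDF2 changelog.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def has_legacy_names(version_string):
    """Assumption: releases before 3.0 still provide PdfFileReader/PdfFileWriter."""
    major = int(version_string.split(".")[0])
    return major < 3


if __name__ == "__main__":
    found = installed_version("PyPDF2")
    if found is None:
        print("PyPDF2 is not installed in this environment")
    else:
        print(f"PyPDF2 {found}, legacy class names: {has_legacy_names(found)}")
```

Run it inside the activated virtual environment so that it inspects the same packages your program will use.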
Get a page from a PDF document
Next, I’ll show you how to extract a page from a PDF into a new document using this library. I have defined a function called extract_page(doc_name, page_num). The parameter doc_name is the full path of the document; if the document is in the same directory as the Python program, the file name alone is enough. The parameter page_num is the number of the page to extract (be careful: pages are counted from 0, so the first page is page 0):
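from PyPDF2 import PdfFileWriter, PdfFileReader

def extract_page(doc_name, page_num):
    # Open the source PDF in binary read mode and keep it open while writing
    with open(doc_name, 'rb') as src_file:
        pdf_reader = PdfFileReader(src_file)
        pdf_writer = PdfFileWriter()
        # page_num is zero-based: 0 is the first page
        pdf_writer.addPage(pdf_reader.getPage(page_num))
        # Save the single-page document in binary write mode
        with open(f'document-page{page_num}.pdf', 'wb') as doc_file:
            pdf_writer.write(doc_file)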
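Since extract_page counts pages from 0 while people usually count them from 1, a small conversion helper can prevent off-by-one mistakes. The sketch below is only an illustration: to_page_index is a hypothetical name that is not part of PyPDF2, and it assumes you already know how many pages the document has.

```python
def to_page_index(page_number, total_pages):
    """Convert a 1-based page number into the 0-based index extract_page expects.

    Hypothetical helper for illustration; it is not part of PyPDF2.
    """
    if not 1 <= page_number <= total_pages:
        raise ValueError(
            f"page_number must be between 1 and {total_pages}, got {page_number}"
        )
    return page_number - 1


if __name__ == "__main__":
    # The first page of a 5-page document maps to index 0
    print(to_page_index(1, 5))
```

You could then call extract_page(doc_name, to_page_index(3, total_pages)) to extract what a reader would call "page 3".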
Split a PDF in Python based on a page number
In this section we will see a new function that splits a PDF in two at a given page number. I called the function split_pdf(doc_name, page_num). As before, the parameter doc_name is the path or the name of the document. page_num is the page at which the document is split, counted from 0: the pages before page_num go to the first document, and page_num and all following pages go to the second. For example, with page_num equal to 3, the first document contains pages 0 to 2 and the second starts at page 3:
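from PyPDF2 import PdfFileWriter, PdfFileReader

def split_pdf(doc_name, page_num):
    # Keep the source open in binary read mode while both halves are written
    with open(doc_name, 'rb') as src_file:
        pdf_reader = PdfFileReader(src_file)
        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()
        # Pages 0 to page_num - 1 go to the first document
        for page in range(page_num):
            pdf_writer1.addPage(pdf_reader.getPage(page))
        # Pages page_num to the last one go to the second document
        for page in range(page_num, pdf_reader.getNumPages()):
            pdf_writer2.addPage(pdf_reader.getPage(page))
        with open("doc1.pdf", 'wb') as file1:
            pdf_writer1.write(file1)
        with open("doc2.pdf", 'wb') as file2:
            pdf_writer2.write(file2)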
As a result, the function generates two documents called doc1.pdf and doc2.pdf. You can tweak it to your liking, for example to print the names of the resulting documents or to make further subdivisions.
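As one possible way to make further subdivisions, the sketch below splits a document into consecutive chunks of a fixed number of pages. It is an illustration rather than a finished tool: chunk_ranges, split_pdf_in_chunks and the part1.pdf, part2.pdf, ... output names are hypothetical, and the code assumes a PyPDF2 release that still provides the PdfFileReader and PdfFileWriter names used in this tutorial.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) page-index pairs that cover total_pages in chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf_in_chunks(doc_name, chunk_size):
    """Write doc_name out as part1.pdf, part2.pdf, ... and return the part count."""
    # Imported here so chunk_ranges can be used without PyPDF2 installed.
    # Assumption: the installed PyPDF2 still offers these legacy class names.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(doc_name, "rb") as src_file:
        pdf_reader = PdfFileReader(src_file)
        ranges = chunk_ranges(pdf_reader.getNumPages(), chunk_size)
        for part, (start, stop) in enumerate(ranges, start=1):
            pdf_writer = PdfFileWriter()
            # stop is exclusive, matching Python's range()
            for page in range(start, stop):
                pdf_writer.addPage(pdf_reader.getPage(page))
            with open(f"part{part}.pdf", "wb") as part_file:
                pdf_writer.write(part_file)
    return len(ranges)
```

Keeping the page arithmetic in chunk_ranges separate from the file handling makes it easy to check the boundaries on their own: a 10-page document with a chunk size of 4 gives the ranges (0, 4), (4, 8) and (8, 10), so the last part simply holds the remaining pages.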