Introduction
PDFio is a simple C library for reading and writing PDF files. The primary goals of pdfio are:
- Read any PDF file with or without encryption or linearization
- Write PDF files without encryption or linearization
- Extract or embed useful metadata (author, creator, page information, etc.)
- "Filter" PDF files, for example to extract a range of pages or to embed fonts that are missing from a PDF
- Provide access to objects used for each page
PDFio is not concerned with rendering or viewing a PDF file, although a PDF RIP or viewer could be written using it.
PDFio is Copyright © 2021 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.
Requirements
PDFio requires the following to build the software:
- A C99 compiler such as Clang, GCC, or MS Visual C
- A POSIX-compliant make program
- ZLIB (https://www.zlib.net) 1.0 or higher
IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.
Installing pdfio
PDFio comes with a portable makefile that will work on any POSIX-compliant system with ZLIB installed. To make it, run:
make all
To test it, run:
make test
To install it, run:
make install
If you want a shared library, run:
make all-shared
make install-shared
The default installation location is "/usr/local". Pass the prefix variable to make to install it to another location:
make install prefix=/some/other/directory
The makefile installs the pdfio header to "${prefix}/include", the library to "${prefix}/lib", the pkg-config file to "${prefix}/lib/pkgconfig", the man page to "${prefix}/share/man/man3", and the documentation to "${prefix}/share/doc/pdfio".
The makefile supports the following variables that can be specified in the make command or as environment variables:
- AR: the library archiver (default "ar")
- ARFLAGS: options for the library archiver (default "cr")
- CC: the C compiler (default "cc")
- CFLAGS: options for the C compiler (default "")
- CODESIGN_IDENTITY: the identity to use when code signing the shared library on macOS (default "Developer ID")
- COMMONFLAGS: options for the C compiler and linker (typically architecture and optimization options, default is "-Os -g")
- CPPFLAGS: options for the C preprocessor (default "")
- DESTDIR and DSTROOT: specify a root directory when installing (default is "", specify only one)
- DSOFLAGS: options for the C compiler when linking the shared library (default "")
- LDFLAGS: options for the C compiler when linking the test programs (default "")
- LIBS: library options when linking the test programs (default "-lz")
- RANLIB: program that generates a table-of-contents in a library (default "ranlib")
- prefix: specifies the installation directory (default "/usr/local")
Visual Studio Project
The Visual Studio solution ("pdfio.sln") is provided for Windows developers and generates both a static library and DLL.
Xcode Project
There is also an Xcode project ("pdfio.xcodeproj") you can use on macOS which generates a static library that will be installed under "/usr/local" with:
sudo xcodebuild install
You can reproduce this with the makefile using:
sudo make COMMONFLAGS="-Os -mmacosx-version-min=10.14 -arch x86_64 -arch arm64" install
Detecting PDFio
PDFio can be detected using the pkg-config command, for example:
if pkg-config --exists pdfio; then
...
fi
In a makefile you can add the necessary compiler and linker options with:
CFLAGS += `pkg-config --cflags pdfio`
LIBS += `pkg-config --libs pdfio`
On Windows, you need to link to the PDFIO.LIB (static) or PDFIO1.LIB (DLL) libraries and include the "zlib" NuGet package dependency.
Header Files
PDFio provides a primary header file that is always used:
#include <pdfio.h>
PDFio also provides helper functions for producing PDF content that are defined in a separate header file:
#include <pdfio-content.h>
API Overview
PDFio exposes several types:
- pdfio_file_t: A PDF file (for reading or writing)
- pdfio_array_t: An array of values
- pdfio_dict_t: A dictionary of key/value pairs in a PDF file, object, etc.
- pdfio_obj_t: An object in a PDF file
- pdfio_stream_t: An object stream
Reading PDF Files
You open an existing PDF file using the pdfioFileOpen function:
pdfio_file_t *pdf = pdfioFileOpen("myinputfile.pdf", error_cb, error_data);
where the three arguments to the function are the filename ("myinputfile.pdf"), an optional error callback function (error_cb), and an optional pointer value for the error callback function (error_data). The error callback is called for both errors and warnings and accepts the pdfio_file_t pointer, a message string, and the callback pointer value, for example:
bool
error_cb(pdfio_file_t *pdf, const char *message, void *data)
{
  (void)data; // This callback does not use the data pointer
  fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);
  // Return false to treat warnings as errors
  return (false);
}
The default error callback (NULL) does the equivalent of the above.
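Because error_data is handed back to the callback unchanged, one option is to collect diagnostics in a caller-owned structure instead of printing them. The sketch below assumes a hypothetical error_log_t type and log_error_cb function (neither is part of PDFio). The typedef for pdfio_file_t is only a stand-in so the fragment compiles without <pdfio.h>; delete it when you include the real header. Returning true is an assumed policy that lets processing continue after warnings, the opposite of the example above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Stand-in for the opaque file type from <pdfio.h> so this sketch compiles
// on its own; delete this line when including the real header.
typedef struct pdfio_file_stub pdfio_file_t;

// Hypothetical context passed to pdfioFileOpen as error_data
typedef struct error_log_s
{
  size_t count;       // Number of messages received
  char   last[256];   // Most recent message, truncated if needed
} error_log_t;

// Error callback with the same parameters as error_cb above; it records
// each message in the error_log_t instead of printing it
bool
log_error_cb(pdfio_file_t *pdf, const char *message, void *data)
{
  error_log_t *elog = (error_log_t *)data;

  (void)pdf;          // The file pointer is not used here

  elog->count ++;
  strncpy(elog->last, message, sizeof(elog->last) - 1);
  elog->last[sizeof(elog->last) - 1] = '\0';

  // Return true so that warnings do not stop processing (assumed policy)
  return (true);
}
```

A caller would declare error_log_t elog = { 0 };, pass log_error_cb and &elog as the last two arguments to pdfioFileOpen, and check elog.count afterwards.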
Each PDF file contains one or more pages. The pdfioFileGetNumPages function returns the number of pages in the file while the pdfioFileGetPage function gets the specified page in the PDF file:
pdfio_file_t *pdf; // PDF file
size_t i; // Looping var
size_t count; // Number of pages
pdfio_obj_t *page; // Current page
// Iterate the pages in the PDF file
for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
{
  page = pdfioFileGetPage(pdf, i);
  // do something with page
}
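As the loop shows, pdfioFileGetPage takes a zero-based index, while people usually number pages from 1. When extracting a range of pages, a small helper can translate a user-supplied, 1-based, inclusive range into those indices. The sketch below uses a hypothetical page_range_to_indices function; the clamping rules are assumptions, not PDFio behavior.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: converts a 1-based, inclusive page range
// [first, last] into 0-based indices for pdfioFileGetPage, clamping the
// range to the number of pages.  Returns false if no pages remain.
bool
page_range_to_indices(size_t first,   // I - First page number (1-based)
                      size_t last,    // I - Last page number (1-based)
                      size_t count,   // I - Number of pages in the file
                      size_t *start,  // O - First page index (0-based)
                      size_t *end)    // O - Last page index (0-based)
{
  if (first == 0)
    first = 1;                        // Treat page 0 as page 1
  if (last > count)
    last = count;                     // Clamp to the final page

  if (first > last)
    return (false);                   // Empty, inverted, or out of range

  *start = first - 1;
  *end   = last - 1;

  return (true);
}
```

The caller would then loop with for (i = start; i <= end; i ++) and call pdfioFileGetPage(pdf, i) for each index, in the same way as the loop above.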
Each page is represented by a "page tree" object (what pdfioFileGetPage returns) that specifies information about the page and one or more "content" objects that contain the images, fonts, text, and graphics that appear on the page.
The pdfioFileClose function closes a PDF file and frees all memory that was used for it:
pdfioFileClose(pdf);
Writing PDF Files
You create a new PDF file using the pdfioFileCreate function:
pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 }; // US Letter
pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // 0.5" margins
pdfio_file_t *pdf = pdfioFileCreate("myoutputfile.pdf", "2.0", &media_box, &crop_box, error_cb, error_data);
where the six arguments to the function are the filename ("myoutputfile.pdf"), PDF version ("2.0"), media box (media_box), crop box (crop_box), an optional error callback function (error_cb), and an optional pointer value for the error callback function (error_data).
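PDF coordinates are measured in points, 72 to the inch, so the 0.5" margins in the crop box above are 36 points on each side of the US Letter media box. The sketch below derives such a crop box from a media box. The rect_t type is a hypothetical stand-in assumed to have the same x1/y1/x2/y2 (lower-left, upper-right) layout as pdfio_rect_t, and inset_rect is an illustrative helper, not a PDFio function.

```c
// Stand-in for pdfio_rect_t: lower-left (x1, y1) and upper-right (x2, y2)
// corners in points.  The field layout is assumed to match PDFio's.
typedef struct rect_s
{
  double x1, y1, x2, y2;
} rect_t;

// Returns a copy of media inset by margin_in inches on every side
rect_t
inset_rect(rect_t media, double margin_in)
{
  double margin_pt = margin_in * 72.0;   // 1 inch = 72 points
  rect_t crop;

  crop.x1 = media.x1 + margin_pt;
  crop.y1 = media.y1 + margin_pt;
  crop.x2 = media.x2 - margin_pt;
  crop.y2 = media.y2 - margin_pt;

  return (crop);
}
```

Insetting { 0.0, 0.0, 612.0, 792.0 } by 0.5 inch yields { 36.0, 36.0, 576.0, 756.0 }, the crop box used in the pdfioFileCreate example.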
Once the file is created, use the pdfioFileCreateObj, pdfioFileCreatePage, and pdfioPageCopy functions to create objects and pages in the file.
Finally, the pdfioFileClose function writes the PDF cross-reference and "trailer" information, closes the file, and frees all memory that was used for it.
PDF Objects
PDF objects are identified using two numbers: the object number (1 to N) and the object generation (0 to 65535) that specifies a particular version of an object. An object's numbers are returned by the pdfioObjGetNumber and pdfioObjGetGeneration functions. You can find a numbered object using the pdfioFileFindObj function.
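Inside a PDF file, one object refers to another with an indirect reference made of the object number, the generation, and the letter R, for example "12 0 R". The hypothetical format_obj_ref helper below checks the ranges described above and formats such a reference from values like those returned by pdfioObjGetNumber and pdfioObjGetGeneration. It only illustrates the numbering scheme and is not part of the PDFio API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Formats an indirect reference such as "12 0 R" into buffer.  Returns
// false if the numbers are out of range or the buffer is too small.
bool
format_obj_ref(size_t   number,      // I - Object number (1 to N)
               unsigned generation,  // I - Generation (0 to 65535)
               char     *buffer,     // O - Output buffer
               size_t   bufsize)     // I - Size of buffer
{
  int len;                           // Length of formatted string

  if (number < 1 || generation > 65535)
    return (false);

  len = snprintf(buffer, bufsize, "%zu %u R", number, generation);

  // snprintf returns the untruncated length; reject truncated output
  return (len > 0 && (size_t)len < bufsize);
}
```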
Objects contain values (typically dictionaries) and usually an associated data stream containing images, fonts, ICC profiles, and page content. PDFio provides several accessor functions to get the value(s) associated with an object:
- pdfioObjGetArray returns an object's array value, if any
- pdfioObjGetDict returns an object's dictionary value, if any
- pdfioObjGetLength returns the length of the data stream, if any
- pdfioObjGetSubtype returns the sub-type name of the object, for example "Image" for an image object
- pdfioObjGetType returns the type name of the object, for example "XObject" for an image object
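The type and sub-type names can be combined to classify an object; an image, for example, has the type "XObject" and the sub-type "Image". The hypothetical is_image_xobject helper below takes the two strings as they would come from pdfioObjGetType and pdfioObjGetSubtype. Treating a NULL name as "no match" is an assumption about how a missing value is reported.

```c
#include <stdbool.h>
#include <string.h>

// Returns true when the type/sub-type pair identifies an image XObject.
// Either name may be NULL when the object has no such entry (assumed).
bool
is_image_xobject(const char *type,     // I - Value from pdfioObjGetType
                 const char *subtype)  // I - Value from pdfioObjGetSubtype
{
  return (type != NULL && subtype != NULL &&
          strcmp(type, "XObject") == 0 && strcmp(subtype, "Image") == 0);
}
```

With the real library this would be called as is_image_xobject(pdfioObjGetType(obj), pdfioObjGetSubtype(obj)).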