mirror of
https://github.com/michaelrsweet/pdfio.git
synced 2024-11-08 14:38:27 +01:00
272 lines
8.2 KiB
Markdown
272 lines
8.2 KiB
Markdown
Introduction
|
|
============
|
|
|
|
PDFio is a simple C library for reading and writing PDF files. The primary
|
|
goals of pdfio are:
|
|
|
|
- Read any PDF file with or without encryption or linearization
|
|
- Write PDF files without encryption or linearization
|
|
- Extract or embed useful metadata (author, creator, page information, etc.)
|
|
- "Filter" PDF files, for example to extract a range of pages or to embed fonts
|
|
that are missing from a PDF
|
|
- Provide access to objects used for each page
|
|
|
|
PDFio is *not* concerned with rendering or viewing a PDF file, although a PDF
|
|
RIP or viewer could be written using it.
|
|
|
|
PDFio is Copyright © 2021 by Michael R Sweet and is licensed under the Apache
|
|
License Version 2.0 with an (optional) exception to allow linking against
|
|
GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.
|
|
|
|
|
|
Requirements
|
|
------------
|
|
|
|
PDFio requires the following to build the software:
|
|
|
|
- A C99 compiler such as Clang, GCC, or MS Visual C
|
|
- A POSIX-compliant `make` program
|
|
- ZLIB (<https://www.zlib.net>) 1.0 or higher
|
|
|
|
IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.
|
|
|
|
|
|
Installing pdfio
|
|
----------------
|
|
|
|
PDFio comes with a portable makefile that will work on any POSIX-compliant
|
|
system with ZLIB installed. To make it, run:
|
|
|
|
make all
|
|
|
|
To test it, run:
|
|
|
|
make test
|
|
|
|
To install it, run:
|
|
|
|
make install
|
|
|
|
If you want a shared library, run:
|
|
|
|
make all-shared
|
|
make install-shared
|
|
|
|
The default installation location is "/usr/local". Pass the `prefix` variable
|
|
to make to install it to another location:
|
|
|
|
make install prefix=/some/other/directory
|
|
|
|
The makefile installs the pdfio header to "${prefix}/include", the library to
|
|
"${prefix}/lib", the `pkg-config` file to "${prefix}/lib/pkgconfig", the man
|
|
page to "${prefix}/share/man/man3", and the documentation to
|
|
"${prefix}/share/doc/pdfio".
|
|
|
|
The makefile supports the following variables that can be specified in the make
|
|
command or as environment variables:
|
|
|
|
- `AR`: the library archiver (default "ar")
|
|
- `ARFLAGS`: options for the library archiver (default "cr")
|
|
- `CC`: the C compiler (default "cc")
|
|
- `CFLAGS`: options for the C compiler (default "")
|
|
- `CODESIGN_IDENTITY`: the identity to use when code signing the shared library
|
|
on macOS (default "Developer ID")
|
|
- `COMMONFLAGS`: options for the C compiler and linker (typically architecture
|
|
and optimization options, default is "-Os -g")
|
|
- `CPPFLAGS`: options for the C preprocessor (default "")
|
|
- `DESTDIR" and "DSTROOT`: specifies a root directory when installing
|
|
(default is "", specify only one)
|
|
- `DSOFLAGS`: options for the C compiler when linking the shared library
|
|
(default "")
|
|
- `LDFLAGS`: options for the C compiler when linking the test programs
|
|
(default "")
|
|
- `LIBS`: library options when linking the test programs (default "-lz")
|
|
- `RANLIB`: program that generates a table-of-contents in a library
|
|
(default "ranlib")
|
|
- `prefix`: specifies the installation directory (default "/usr/local")
|
|
|
|
|
|
Visual Studio Project
|
|
---------------------
|
|
|
|
The Visual Studio solution ("pdfio.sln") is provided for Windows developers and
|
|
generates both a static library and DLL.
|
|
|
|
|
|
Xcode Project
|
|
-------------
|
|
|
|
There is also an Xcode project ("pdfio.xcodeproj") you can use on macOS which
|
|
generates a static library that will be installed under "/usr/local" with:
|
|
|
|
sudo xcodebuild install
|
|
|
|
You can reproduce this with the makefile using:
|
|
|
|
sudo make 'COMMONFLAGS="-Os -mmacosx-version-min=10.14 -arch x86_64 -arch arm64"' install
|
|
|
|
|
|
Detecting PDFio
|
|
---------------
|
|
|
|
PDFio can be detected using the `pkg-config` command, for example:
|
|
|
|
if pkg-config --exists pdfio; then
|
|
...
|
|
fi
|
|
|
|
In a makefile you can add the necessary compiler and linker options with:
|
|
|
|
```make
|
|
CFLAGS += `pkg-config --cflags pdfio`
|
|
LIBS += `pkg-config --libs pdfio`
|
|
```
|
|
|
|
On Windows, you need to link to the `PDFIO.LIB` (static) or `PDFIO1.LIB` (DLL)
|
|
libraries and include the "zlib" NuGet package dependency.
|
|
|
|
|
|
Header Files
|
|
------------
|
|
|
|
PDFio provides a primary header file that is always used:
|
|
|
|
```c
|
|
#include <pdfio.h>
|
|
```
|
|
|
|
PDFio also provides helper functions for producing PDF content that are defined
|
|
in a separate header file:
|
|
|
|
```c
|
|
#include <pdfio-content.h>
|
|
```
|
|
|
|
|
|
API Overview
|
|
============
|
|
|
|
PDFio exposes several types:
|
|
|
|
- `pdfio_file_t`: A PDF file (for reading or writing)
|
|
- `pdfio_array_t`: An array of values
|
|
- `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
|
|
- `pdfio_obj_t`: An object in a PDF file
|
|
- `pdfio_stream_t`: An object stream
|
|
|
|
|
|
Reading PDF Files
|
|
-----------------
|
|
|
|
You open an existing PDF file using the `pdfioFileOpen` function:
|
|
|
|
```c
|
|
pdfio_file_t *pdf = pdfioFileOpen("myinputfile.pdf", error_cb, error_data);
|
|
|
|
```
|
|
|
|
where the three arguments to the function are the filename ("myinputfile.pdf"),
|
|
an optional error callback function (`error_cb`), and an optional pointer value
|
|
for the error callback function (`error_data`). The error callback is called
|
|
for both errors and warnings and accepts the `pdfio_file_t` pointer, a message
|
|
string, and the callback pointer value, for example:
|
|
|
|
```c
|
|
bool
|
|
error_cb(pdfio_file_t *pdf, const char *message, void *data)
|
|
{
|
|
(void)data; // This callback does not use the data pointer
|
|
|
|
fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);
|
|
|
|
// Return false to treat warnings as errors
|
|
return (false);
|
|
}
|
|
```
|
|
|
|
The default error callback (`NULL`) does the equivalent of the above.
|
|
|
|
Each PDF file contains one or more pages. The `pdfioFileGetNumPages` function
|
|
returns the number of pages in the file while the `pdfioFileGetPage` function
|
|
gets the specified page in the PDF file:
|
|
|
|
```c
|
|
pdfio_file_t *pdf; // PDF file
|
|
size_t i; // Looping var
|
|
size_t count; // Number of pages
|
|
pdfio_obj_t *page; // Current page
|
|
|
|
// Iterate the pages in the PDF file
|
|
for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
|
|
{
|
|
page = pdfioFileGetPage(pdf, i);
|
|
// do something with page
|
|
}
|
|
```
|
|
|
|
Each page is represented by a "page tree" object (what `pdfioFileGetPage`
|
|
returns) that specifies information about the page and one or more "content"
|
|
objects that contain the images, fonts, text, and graphics that appear on the
|
|
page.
|
|
|
|
The `pdfioFileClose` function closes a PDF file and frees all memory that was
|
|
used for it:
|
|
|
|
```c
|
|
pdfioFileClose(pdf);
|
|
```
|
|
|
|
Writing PDF Files
|
|
-----------------
|
|
|
|
You create a new PDF file using the `pdfioFileCreate` function:
|
|
|
|
```c
|
|
pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 }; // US Letter
|
|
pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // 0.5" margins
|
|
|
|
pdfio_file_t *pdf = pdfioFileCreate("myoutputfile.pdf", "2.0", &media_box, &crop_box, error_cb, error_data);
|
|
```
|
|
|
|
where the six arguments to the function are the filename ("myoutputfile.pdf"),
|
|
PDF version ("2.0"), media box (`media_box`), crop box (`crop_box`), an optional
|
|
error callback function (`error_cb`), and an optional pointer value for the
|
|
error callback function (`error_data`).
|
|
|
|
Once the file is created, use the `pdfioFileCreateObj`, `pdfioFileCreatePage`,
|
|
and `pdfioPageCopy` functions to create objects and pages in the file.
|
|
|
|
Finally, the `pdfioFileClose` function writes the PDF cross-reference and
|
|
"trailer" information, closes the file, and frees all memory that was used for
|
|
it.
|
|
|
|
|
|
PDF Objects
|
|
-----------
|
|
|
|
PDF objects are identified using two numbers - the object number (1 to N) and
|
|
the object generation (0 to 65535) that specifies a particular version of an
|
|
object. An object's numbers are returned by the `pdfioObjGetNumber` and
|
|
`pdfioObjGetGeneration` functions. You can find a numbered object using the
|
|
`pdfioFileFindObj` function.
|
|
|
|
Objects contain values (typically dictionaries) and usually an associated data
|
|
stream containing images, fonts, ICC profiles, and page content. PDFio provides several accessor functions to get the value(s) associated with an object:
|
|
|
|
- `pdfioObjGetArray` returns an object's array value, if any
|
|
- `pdfioObjGetDict` returns an object's dictionary value, if any
|
|
- `pdfioObjGetLength` returns the length of the data stream, if any
|
|
- `pdfioObjGetSubtype` returns the sub-type name of the object, for example
|
|
"Image" for an image object.
|
|
- `pdfioObjGetType` returns the type name of the object, for example "XObject"
|
|
for an image object.
|
|
|
|
|
|
PDF Streams
|
|
-----------
|
|
|
|
|
|
PDF Content Helper Functions
|
|
----------------------------
|
|
|