Introduction
============

PDFio is a simple C library for reading and writing PDF files. The primary
goals of PDFio are:

- Read and write any version of PDF file
- Provide access to pages, objects, and streams within a PDF file
- Support reading and writing of encrypted PDF files
- Extract or embed useful metadata (author, creator, page information, etc.)
- "Filter" PDF files, for example to extract a range of pages or to embed fonts
  that are missing from a PDF
- Provide access to objects used for each page

PDFio is *not* concerned with rendering or viewing a PDF file, although a PDF
RIP or viewer could be written using it.

PDFio is Copyright © 2021-2024 by Michael R Sweet and is licensed under the
Apache License Version 2.0 with an (optional) exception to allow linking against
GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.


Requirements
------------

PDFio requires the following to build the software:

- A C99 compiler such as Clang, GCC, or MS Visual C
- A POSIX-compliant `make` program
- A POSIX-compliant `sh` program
- ZLIB (<https://www.zlib.net>) 1.0 or higher

IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.


Installing PDFio
----------------

PDFio comes with a configure script that creates a portable makefile that will
work on any POSIX-compliant system with ZLIB installed. To make it, run:

    ./configure
    make

To test it, run:

    make test

To install it, run:

    sudo make install

If you want a shared library, run:

    ./configure --enable-shared
    make
    sudo make install

The default installation location is "/usr/local". Pass the `--prefix` option
to the configure script to install it to another location:

    ./configure --prefix=/some/other/directory

Other configure options can be found using the `--help` option:

    ./configure --help


Visual Studio Project
---------------------

The Visual Studio solution ("pdfio.sln") is provided for Windows developers and
generates both a static library and DLL.


Xcode Project
-------------

There is also an Xcode project ("pdfio.xcodeproj") you can use on macOS which
generates a static library that will be installed under "/usr/local" with:

    sudo xcodebuild install


Detecting PDFio
---------------

PDFio can be detected using the `pkg-config` command, for example:

    if pkg-config --exists pdfio; then
        ...
    fi

In a makefile you can add the necessary compiler and linker options with:

```make
CFLAGS  +=      `pkg-config --cflags pdfio`
LIBS    +=      `pkg-config --libs pdfio`
```

On Windows, you need to link to the `PDFIO1.LIB` (DLL) library and include the
`zlib_native` NuGet package dependency. You can also use the published
`pdfio_native` NuGet package.


Header Files
------------

PDFio provides a primary header file that is always used:

```c
#include <pdfio.h>
```

PDFio also provides [PDF content helper functions](@) for producing PDF content
that are defined in a separate header file:

```c
#include <pdfio-content.h>
```


Understanding PDF Files
-----------------------

A PDF file provides data and commands for displaying pages of graphics and text,
and is structured in a way that allows it to be displayed in the same way across
multiple devices and platforms. The following is a PDF which shows "Hello,
World!" on one page:

```
%PDF-1.0 % Header starts here
%âãÏÓ
1 0 obj % Body starts here
<<
/Kids [2 0 R]
/Count 1
/Type /Pages
>>
endobj
2 0 obj
<<
/Rotate 0
/Parent 1 0 R
/Resources 3 0 R
/MediaBox [0 0 612 792]
/Contents [4 0 R]/Type /Page
>>
endobj
3 0 obj
<<
/Font
<<
/F0
<<
/BaseFont /Times-Italic
/Subtype /Type1
/Type /Font
>>
>>
>>
endobj
4 0 obj
<<
/Length 65
>>
stream
1. 0. 0. 1. 50. 700. cm
BT
/F0 36. Tf
(Hello, World!) Tj
ET
endstream
endobj
5 0 obj
<<
/Pages 1 0 R
/Type /Catalog
>>
endobj
xref % Cross-reference table starts here
0 6
0000000000 65535 f
0000000015 00000 n
0000000074 00000 n
0000000192 00000 n
0000000291 00000 n
0000000409 00000 n
trailer % Trailer starts here
<<
/Root 5 0 R
/Size 6
>>
startxref
459
%%EOF
```


### Header

The header is the first line of a PDF file that specifies the version of the PDF
format that has been used, for example `%PDF-1.0`.

Since PDF files almost always contain binary data, they can become corrupted if
line endings are changed, for example when the file is transferred using FTP in
text mode or is edited in Notepad on Windows. To allow legacy file transfer
programs to determine that the file is binary, the PDF standard recommends
including some bytes with character codes higher than 127 in the header, for
example:

```
%âãÏÓ
```

The percent sign indicates a comment line while the other few bytes are
arbitrary character codes in excess of 127. So, the whole header in our example
is:

```
%PDF-1.0
%âãÏÓ
```


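
As a rough illustration of the header layout described above, the following
sketch reads the version from the start of a buffer. It uses only the standard
C library; `pdf_header_version` is a hypothetical helper written for this
example and is not part of the PDFio API, which parses the header for you when
a file is opened.

```c
#include <stdio.h>
#include <string.h>

// Parse the "%PDF-M.N" header at the start of a buffer.
// Returns 1 and fills *major and *minor on success, 0 otherwise.
// NOTE: hypothetical helper for illustration, not a PDFio function.
static int
pdf_header_version(const char *buf, size_t len, int *major, int *minor)
{
  char   line[16];                      // NUL-terminated copy of the first bytes
  size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

  memcpy(line, buf, n);
  line[n] = '\0';

  // The file must begin with the "%PDF-" comment...
  if (strncmp(line, "%PDF-", 5) != 0)
    return (0);

  // ...followed by the major and minor version numbers
  return (sscanf(line + 5, "%d.%d", major, minor) == 2);
}
```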
### Body

The file body consists of a sequence of objects, each preceded by an object
number, generation number, and the `obj` keyword on one line, and followed by
the `endobj` keyword on another. For example:

```
1 0 obj
<<
/Kids [2 0 R]
/Count 1
/Type /Pages
>>
endobj
```

In this example, the object number is 1 and the generation number is 0, meaning
it is the first version of the object. The content for object 1 is between the
initial `1 0 obj` and trailing `endobj` lines. In this case, the content is the
dictionary `<</Kids [2 0 R] /Count 1 /Type /Pages>>`.


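
To make the `N G obj` line format concrete, here is a minimal sketch that
splits such a line into its object and generation numbers using `sscanf`. The
function name `pdf_parse_obj_line` is an assumption made for this example; it
is not a PDFio function, and a real parser must also cope with arbitrary
whitespace and comments.

```c
#include <stdio.h>
#include <string.h>

// Parse an "N G obj" line into an object number and generation number.
// Returns 1 on success, 0 if the line does not start an object.
// NOTE: hypothetical helper for illustration, not a PDFio function.
static int
pdf_parse_obj_line(const char *line, unsigned long *number,
                   unsigned *generation)
{
  char keyword[8];                      // Keyword following the two numbers

  if (sscanf(line, "%lu %u %7s", number, generation, keyword) != 3)
    return (0);

  // Object numbers start at 1 and generations range from 0 to 65535
  return (strcmp(keyword, "obj") == 0 && *number > 0 && *generation <= 65535);
}
```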
### Cross-Reference Table

The cross-reference table lists the byte offset of each object in the file body.
This allows random access to objects, meaning they don't have to be read in
order. Objects that are not used are never read, making the process efficient.
Operations like counting the number of pages in a PDF document are fast, even in
large files.

Each object has an object number and a generation number. Generation numbers
are used when a cross-reference table entry is reused. For simplicity, we will
assume generation numbers to be always zero and ignore them. The
cross-reference table consists of a header line that indicates the number of
entries, a free entry line for object 0, and a line for each of the objects in
the file body. For example:

```
0 6 % Six entries in table, starting at 0
0000000000 65535 f % Free entry for object 0
0000000015 00000 n % Object 1 is at byte offset 15
0000000074 00000 n % Object 2 is at byte offset 74
0000000192 00000 n % etc...
0000000291 00000 n
0000000409 00000 n % Object 5 is at byte offset 409
```


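
Each entry line above has three fields: a 10-digit byte offset, a 5-digit
generation number, and a type letter (`n` for in-use, `f` for free). The
sketch below shows one way to decode a single entry with the standard C
library. The `xref_entry_t` type and `pdf_parse_xref_entry` function are
illustrative names only and do not exist in PDFio.

```c
#include <stdio.h>

// One decoded cross-reference entry (illustrative type, not from PDFio)
typedef struct
{
  unsigned long offset;                 // Byte offset of the object
  unsigned      generation;             // Generation number
  char          type;                   // 'n' = in use, 'f' = free
} xref_entry_t;

// Parse a line such as "0000000015 00000 n".
// Returns 1 on success, 0 if the line is not a valid entry.
static int
pdf_parse_xref_entry(const char *line, xref_entry_t *entry)
{
  if (sscanf(line, "%10lu %5u %c", &entry->offset, &entry->generation,
             &entry->type) != 3)
    return (0);

  return (entry->type == 'n' || entry->type == 'f');
}
```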
### Trailer

The first line of the trailer is just the `trailer` keyword. This is followed
by the trailer dictionary which contains at least the `/Size` entry specifying
the number of entries in the cross-reference table and the `/Root` entry which
references the object for the document catalog, which is the root element of the
graph of objects in the body.

There follows a line with just the `startxref` keyword, a line with a single
number specifying the byte offset of the start of the cross-reference table
within the file, and then the line `%%EOF` which signals the end of the PDF
file.

```
trailer % Trailer keyword
<< % The trailer dictionary
/Root 5 0 R
/Size 6
>>
startxref % startxref keyword
459 % Byte offset of cross-reference table
%%EOF % End-of-file marker
```


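
Because the `startxref` value sits at the very end of the file, a reader can
locate the cross-reference table by scanning backwards from the end. The
following sketch does this for a buffer already held in memory. It is only an
illustration under that assumption: `pdf_find_startxref` is a hypothetical
name, the offset is not checked for overflow, and PDFio performs the
equivalent work internally when it opens a file.

```c
#include <string.h>

// Find the last "startxref" keyword in a buffer and return the offset that
// follows it, or -1 if the keyword or the number is missing.
// NOTE: hypothetical helper for illustration, not a PDFio function.
static long
pdf_find_startxref(const char *buf, size_t len)
{
  static const char keyword[] = "startxref";
  size_t            klen = sizeof(keyword) - 1;
  size_t            i;                  // Candidate position + 1

  if (len < klen)
    return (-1);

  // Scan backwards so that the final trailer in the file wins
  for (i = len - klen + 1; i > 0; i --)
  {
    const char *p = buf + i - 1;

    if (memcmp(p, keyword, klen) == 0)
    {
      const char *q = p + klen;         // First byte after the keyword
      const char *end = buf + len;      // One past the last byte
      long       offset = 0;
      int        digits = 0;

      // Skip the whitespace between the keyword and the number
      while (q < end && (*q == ' ' || *q == '\t' || *q == '\r' || *q == '\n'))
        q ++;

      // Accumulate the decimal byte offset
      while (q < end && *q >= '0' && *q <= '9')
      {
        offset = offset * 10 + (*q - '0');
        digits ++;
        q ++;
      }

      return (digits > 0 ? offset : -1);
    }
  }

  return (-1);
}
```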
API Overview
============

PDFio exposes several types:

- `pdfio_file_t`: A PDF file (for reading or writing)
- `pdfio_array_t`: An array of values
- `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
- `pdfio_obj_t`: An object in a PDF file
- `pdfio_stream_t`: An object stream


Reading PDF Files
-----------------

You open an existing PDF file using the [`pdfioFileOpen`](@@) function:

```c
pdfio_file_t *pdf =
    pdfioFileOpen("myinputfile.pdf", password_cb, password_data, error_cb,
                  error_data);
```

where the five arguments to the function are the filename ("myinputfile.pdf"),
an optional password callback function (`password_cb`) and data pointer value
(`password_data`), and an optional error callback function (`error_cb`) and data
pointer value (`error_data`). The password callback is called for encrypted PDF
files that are not using the default password, for example:

```c
const char *
password_cb(void *data, const char *filename)
{
  (void)data;     // This callback doesn't use the data pointer
  (void)filename; // This callback doesn't use the filename

  // Return a password string for the file...
  return ("Password42");
}
```

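
The data pointer becomes useful when passwords differ per file. As a sketch
under stated assumptions, the callback below has the same shape as
`password_cb` above but looks the filename up in a caller-supplied table.
The `password_entry_t` type and the `lookup_password_cb` name are invented for
this example, and returning `NULL` for an unknown file is an assumption about
how a missing password should be reported; check the PDFio reference before
relying on it. You would pass `lookup_password_cb` and the table as the second
and third arguments of `pdfioFileOpen`.

```c
#include <stddef.h>
#include <string.h>

// One filename/password pair; a NULL filename terminates the table.
// NOTE: illustrative type, not part of PDFio.
typedef struct
{
  const char *filename;                 // File the password applies to
  const char *password;                 // Password for that file
} password_entry_t;

// Password callback that uses its data pointer as a lookup table
static const char *
lookup_password_cb(void *data, const char *filename)
{
  const password_entry_t *entry;        // Current table entry

  if (!data || !filename)
    return (NULL);

  for (entry = (const password_entry_t *)data; entry->filename; entry ++)
  {
    if (strcmp(entry->filename, filename) == 0)
      return (entry->password);
  }

  // Assumption: NULL signals that no password is available
  return (NULL);
}
```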
The error callback is called for both errors and warnings and accepts the
`pdfio_file_t` pointer, a message string, and the callback pointer value, for
example:

```c
bool
error_cb(pdfio_file_t *pdf, const char *message, void *data)
{
  (void)data; // This callback does not use the data pointer

  fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);

  // Return false to treat warnings as errors
  return (false);
}
```

The default error callback (`NULL`) does the equivalent of the above.

Each PDF file contains one or more pages. The [`pdfioFileGetNumPages`](@@)
function returns the number of pages in the file while the
[`pdfioFileGetPage`](@@) function gets the specified page in the PDF file:

```c
pdfio_file_t *pdf;   // PDF file
size_t       i;      // Looping var
size_t       count;  // Number of pages
pdfio_obj_t  *page;  // Current page

// Iterate the pages in the PDF file
for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
{
  page = pdfioFileGetPage(pdf, i);
  // do something with page
}
```

Each page is represented by a "page tree" object (what [`pdfioFileGetPage`](@@)
returns) that specifies information about the page and one or more "content"
objects that contain the images, fonts, text, and graphics that appear on the
page. Use the [`pdfioPageGetNumStreams`](@@) and [`pdfioPageOpenStream`](@@)
functions to access the content streams for each page, and
[`pdfioObjGetDict`](@@) to get the associated page object dictionary. For
example, if you want to display the media and crop boxes for a given page:

```c
pdfio_file_t  *pdf;            // PDF file
size_t        i;               // Looping var
size_t        count;           // Number of pages
pdfio_obj_t   *page;           // Current page
pdfio_dict_t  *dict;           // Current page dictionary
pdfio_array_t *media_box;      // MediaBox array
double        media_values[4]; // MediaBox values
pdfio_array_t *crop_box;       // CropBox array
double        crop_values[4];  // CropBox values

// Iterate the pages in the PDF file
for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
{
  page = pdfioFileGetPage(pdf, i);
  dict = pdfioObjGetDict(page);

  media_box       = pdfioDictGetArray(dict, "MediaBox");
  media_values[0] = pdfioArrayGetNumber(media_box, 0);
  media_values[1] = pdfioArrayGetNumber(media_box, 1);
  media_values[2] = pdfioArrayGetNumber(media_box, 2);
  media_values[3] = pdfioArrayGetNumber(media_box, 3);

  crop_box       = pdfioDictGetArray(dict, "CropBox");
  crop_values[0] = pdfioArrayGetNumber(crop_box, 0);
  crop_values[1] = pdfioArrayGetNumber(crop_box, 1);
  crop_values[2] = pdfioArrayGetNumber(crop_box, 2);
  crop_values[3] = pdfioArrayGetNumber(crop_box, 3);

  printf("Page %u: MediaBox=[%g %g %g %g], CropBox=[%g %g %g %g]\n",
         (unsigned)(i + 1),
         media_values[0], media_values[1], media_values[2], media_values[3],
         crop_values[0], crop_values[1], crop_values[2], crop_values[3]);
}
```

Page object dictionaries have several (mostly optional) key/value pairs,
including:

- "Annots": An array of annotation dictionaries for the page; use
  [`pdfioDictGetArray`](@@) to get the array
- "CropBox": The crop box as an array of four numbers for the left, bottom,
  right, and top coordinates of the target media; use [`pdfioDictGetArray`](@@)
  to get a pointer to the array of numbers
- "Dur": The number of seconds the page should be displayed; use
  [`pdfioDictGetNumber`](@@) to get the page duration value
- "Group": The dictionary of transparency group values for the page; use
  [`pdfioDictGetDict`](@@) to get a pointer to the transparency group dictionary
- "LastModified": The date and time when this page was last modified; use
  [`pdfioDictGetDate`](@@) to get the Unix `time_t` value
- "Parent": The parent page tree node object for this page; use
  [`pdfioDictGetObj`](@@) to get a pointer to the object
- "MediaBox": The media box as an array of four numbers for the left, bottom,
  right, and top coordinates of the target media; use [`pdfioDictGetArray`](@@)
  to get a pointer to the array of numbers
- "Resources": The dictionary of resources for the page; use
  [`pdfioDictGetDict`](@@) to get a pointer to the resources dictionary
- "Rotate": A number indicating the number of degrees of clockwise rotation
  (a multiple of 90) to apply to the page when viewing; use
  [`pdfioDictGetNumber`](@@) to get the rotation angle
- "Thumb": A thumbnail image object for the page; use [`pdfioDictGetObj`](@@)
  to get a pointer to the thumbnail image object
- "Trans": The page transition dictionary; use [`pdfioDictGetDict`](@@) to get
  a pointer to the dictionary

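
The "MediaBox" and "Rotate" values are often combined to work out how large a
page appears when viewed. The sketch below assumes you have already copied the
four box numbers into an array (like `media_values` above) and read the
rotation with `pdfioDictGetNumber`. The `page_size_t` type and
`page_display_size` function are hypothetical names for this example and are
not provided by PDFio.

```c
// Displayed page size in points (illustrative type, not from PDFio)
typedef struct
{
  double width;                         // Width after rotation
  double height;                        // Height after rotation
} page_size_t;

// Compute the displayed size of a page from its media box
// ([left bottom right top]) and its rotation in degrees.
static page_size_t
page_display_size(const double media_box[4], double rotate)
{
  page_size_t size;                     // Resulting size
  long        quarter;                  // Number of quarter turns, 0 to 3

  size.width  = media_box[2] - media_box[0];
  size.height = media_box[3] - media_box[1];

  // Box corners are not guaranteed to be ordered, so normalize the sign
  if (size.width < 0.0)
    size.width = -size.width;
  if (size.height < 0.0)
    size.height = -size.height;

  // Rotation is a multiple of 90 degrees and may be negative
  quarter = (long)(rotate / 90.0) % 4;
  if (quarter < 0)
    quarter += 4;

  // A 90 or 270 degree rotation swaps width and height
  if (quarter & 1)
  {
    double temp = size.width;

    size.width  = size.height;
    size.height = temp;
  }

  return (size);
}
```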
The [`pdfioFileClose`](@@) function closes a PDF file and frees all memory that
was used for it:

```c
pdfioFileClose(pdf);
```


Writing PDF Files
-----------------

You create a new PDF file using the [`pdfioFileCreate`](@@) function:

```c
pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 };  // US Letter
pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // w/0.5" margins

pdfio_file_t *pdf = pdfioFileCreate("myoutputfile.pdf", "2.0", &media_box, &crop_box,
                                    error_cb, error_data);
```

where the six arguments to the function are the filename ("myoutputfile.pdf"),
PDF version ("2.0"), media box (`media_box`), crop box (`crop_box`), an optional
error callback function (`error_cb`), and an optional pointer value for the
error callback function (`error_data`). The units for the media and crop boxes
are points (1/72nd of an inch).

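
Since the boxes are expressed in points, sizes given in inches need to be
multiplied by 72. The following sketch shows that arithmetic for a page with
uniform margins. It deliberately uses its own `rect_t` type with the same
left, bottom, right, top order as the initializers above so that it compiles
without PDFio; both `rect_t` and `page_rect_inches` are names made up for this
example.

```c
// Rectangle in points: left, bottom, right, top
// NOTE: stand-in type for illustration; PDFio code would use pdfio_rect_t.
typedef struct
{
  double x1, y1, x2, y2;
} rect_t;

// Build a rectangle for a page of the given size in inches, inset by a
// uniform margin in inches.  Use a margin of 0 for the media box.
static rect_t
page_rect_inches(double width_in, double height_in, double margin_in)
{
  rect_t rect;                          // Resulting rectangle in points

  rect.x1 = margin_in * 72.0;
  rect.y1 = margin_in * 72.0;
  rect.x2 = (width_in - margin_in) * 72.0;
  rect.y2 = (height_in - margin_in) * 72.0;

  return (rect);
}
```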
Alternatively, you can stream a PDF file using the
[`pdfioFileCreateOutput`](@@) function:

```c
pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 };  // US Letter
pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // w/0.5" margins

pdfio_file_t *pdf = pdfioFileCreateOutput(output_cb, output_ctx, "2.0", &media_box,
                                          &crop_box, error_cb, error_data);
```

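
The `output_cb` and `output_ctx` arguments above are supplied by your program.
As a hedged sketch, the callback below collects the streamed PDF data in a
growable memory buffer. It assumes the output callback receives the context
pointer, a data pointer, and a byte count and returns the number of bytes
written (or a negative value on error); confirm the exact `pdfio_output_cb_t`
prototype in `<pdfio.h>` before using it. The `membuf_t` type and
`membuf_output_cb` name are invented for this example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>                  // ssize_t on POSIX systems

// Growable memory buffer used as the callback context (illustrative type)
typedef struct
{
  unsigned char *data;                  // Collected bytes
  size_t        used;                   // Number of bytes stored
  size_t        alloc;                  // Number of bytes allocated
} membuf_t;

// Append the bytes PDFio hands us to the buffer.
// ASSUMPTION: signature and return convention mirror pdfio_output_cb_t.
static ssize_t
membuf_output_cb(void *ctx, const void *data, size_t datalen)
{
  membuf_t *buf = (membuf_t *)ctx;      // Our buffer

  if (datalen == 0)
    return (0);

  if (buf->used + datalen > buf->alloc)
  {
    // Grow geometrically so repeated small writes stay cheap
    size_t        newalloc = buf->alloc ? buf->alloc : 4096;
    unsigned char *temp;

    while (newalloc < buf->used + datalen)
      newalloc *= 2;

    if ((temp = realloc(buf->data, newalloc)) == NULL)
      return (-1);

    buf->data  = temp;
    buf->alloc = newalloc;
  }

  memcpy(buf->data + buf->used, data, datalen);
  buf->used += datalen;

  return ((ssize_t)datalen);
}
```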
Once the file is created, use the [`pdfioFileCreateObj`](@@),
[`pdfioFileCreatePage`](@@), and [`pdfioPageCopy`](@@) functions to create
objects and pages in the file.

Finally, the [`pdfioFileClose`](@@) function writes the PDF cross-reference and
"trailer" information, closes the file, and frees all memory that was used for
it.


PDF Objects
-----------

PDF objects are identified using two numbers - the object number (1 to N) and
the object generation (0 to 65535) that specifies a particular version of an
object. An object's numbers are returned by the [`pdfioObjGetNumber`](@@) and
[`pdfioObjGetGeneration`](@@) functions. You can find a numbered object using
the [`pdfioFileFindObj`](@@) function.

Objects contain values (typically dictionaries) and usually an associated data
stream containing images, fonts, ICC profiles, and page content. PDFio
provides several accessor functions to get the value(s) associated with an
object:

- [`pdfioObjGetArray`](@@) returns an object's array value, if any
- [`pdfioObjGetDict`](@@) returns an object's dictionary value, if any
- [`pdfioObjGetLength`](@@) returns the length of the data stream, if any
- [`pdfioObjGetSubtype`](@@) returns the sub-type name of the object, for
  example "Image" for an image object
- [`pdfioObjGetType`](@@) returns the type name of the object, for example
  "XObject" for an image object


PDF Streams
-----------

Some PDF objects have an associated data stream, such as for pages, images, ICC
color profiles, and fonts. You access the stream for an existing object using
the [`pdfioObjOpenStream`](@@) function:

```c
pdfio_file_t *pdf = pdfioFileOpen(...);
pdfio_obj_t *obj = pdfioFileFindObj(pdf, number);
pdfio_stream_t *st = pdfioObjOpenStream(obj, true);
```

The first argument is the object pointer. The second argument is a boolean
value that specifies whether you want to decode (typically decompress) the
stream data or return it as-is.

When reading a page stream you'll use the [`pdfioPageOpenStream`](@@) function
instead:

```c
pdfio_file_t *pdf = pdfioFileOpen(...);
pdfio_obj_t *obj = pdfioFileGetPage(pdf, number);
pdfio_stream_t *st = pdfioPageOpenStream(obj, 0, true);
```

Once you have the stream open, you can use one of several functions to read
from it:

- [`pdfioStreamConsume`](@@) reads and discards a number of bytes in the stream
- [`pdfioStreamGetToken`](@@) reads a PDF token from the stream
- [`pdfioStreamPeek`](@@) peeks at the next stream data without advancing or
  "consuming" it
- [`pdfioStreamRead`](@@) reads a buffer of data

When you are done reading from the stream, call the [`pdfioStreamClose`](@@)
function:

```c
pdfioStreamClose(st);
```

To create a stream for a new object, call the [`pdfioObjCreateStream`](@@)
function:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *obj = pdfioFileCreateObj(pdf, ...);
pdfio_stream_t *st = pdfioObjCreateStream(obj, PDFIO_FILTER_FLATE);
```

The first argument is the newly created object. The second argument is either
`PDFIO_FILTER_NONE` to specify that any encoding is done by your program or
`PDFIO_FILTER_FLATE` to specify that PDFio should Flate compress the stream.

To create a page content stream, call the [`pdfioFileCreatePage`](@@) function:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_dict_t *dict = pdfioDictCreate(pdf);
// ... set page dictionary keys and values ...
pdfio_stream_t *st = pdfioFileCreatePage(pdf, dict);
```

Once you have created the stream, use any of the following functions to write
to the stream:

- [`pdfioStreamPrintf`](@@) writes a formatted string to the stream
- [`pdfioStreamPutChar`](@@) writes a single character to the stream
- [`pdfioStreamPuts`](@@) writes a C string to the stream
- [`pdfioStreamWrite`](@@) writes a buffer of data to the stream

The [PDF content helper functions](@) provide additional functions for writing
specific PDF page stream commands.

When you are done writing the stream, call [`pdfioStreamClose`](@@) to close
both the stream and the object.


PDF Content Helper Functions
----------------------------

PDFio includes many helper functions for embedding or writing specific kinds of
content to a PDF file. These functions can be roughly grouped into five
categories:

- [Color Space Functions](@)
- [Font Object Functions](@)
- [Image Object Functions](@)
- [Page Stream Functions](@)
- [Page Dictionary Functions](@)


### Color Space Functions

PDF color spaces are specified using well-known names like "DeviceCMYK",
"DeviceGray", and "DeviceRGB" or using arrays that define so-called calibrated
color spaces. PDFio provides several functions for embedding ICC profiles and
creating color space arrays:

- [`pdfioArrayCreateColorFromICCObj`](@@) creates a color array for an ICC color
  profile object
- [`pdfioArrayCreateColorFromMatrix`](@@) creates a color array using a CIE XYZ
  color transform matrix, a gamma value, and a CIE XYZ white point
- [`pdfioArrayCreateColorFromPalette`](@@) creates an indexed color array from
  an array of sRGB values
- [`pdfioArrayCreateColorFromPrimaries`](@@) creates a color array using CIE XYZ
  primaries and a gamma value
- [`pdfioArrayCreateColorFromStandard`](@@) creates a color array for a standard
  color space

You can embed an ICC color profile using the
[`pdfioFileCreateICCObjFromFile`](@@) function:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *icc = pdfioFileCreateICCObjFromFile(pdf, "filename.icc");
```

where the first argument is the PDF file and the second argument is the filename
of the ICC color profile.

PDFio also includes predefined constants for creating a few standard color
spaces:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);

// Create an AdobeRGB color array
pdfio_array_t *adobe_rgb =
    pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_ADOBE);

// Create a Display P3 color array
pdfio_array_t *display_p3 =
    pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_P3_D65);

// Create an sRGB color array
pdfio_array_t *srgb =
    pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_SRGB);
```


### Font Object Functions

PDF supports many kinds of fonts, including PostScript Type1, PDF Type3,
TrueType/OpenType, and CID. PDFio provides two functions for creating font
objects. The first is [`pdfioFileCreateFontObjFromBase`](@@) which creates a
font object for one of the base PDF fonts:

- "Courier"
- "Courier-Bold"
- "Courier-BoldItalic"
- "Courier-Italic"
- "Helvetica"
- "Helvetica-Bold"
- "Helvetica-BoldOblique"
- "Helvetica-Oblique"
- "Symbol"
- "Times-Bold"
- "Times-BoldItalic"
- "Times-Italic"
- "Times-Roman"
- "ZapfDingbats"

Except for Symbol and ZapfDingbats (which use a custom 8-bit character set),
PDFio always uses the Windows CP1252 subset of Unicode for these fonts.

The second function is [`pdfioFileCreateFontObjFromFile`](@@) which creates a
font object from a TrueType/OpenType font file, for example:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *opensans =
    pdfioFileCreateFontObjFromFile(pdf, "OpenSans-Regular.ttf", false);
```

will embed an OpenSans Regular TrueType font using the Windows CP1252 subset of
Unicode. Pass `true` for the third argument to embed it as a Unicode CID font
instead, for example:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *notosans =
    pdfioFileCreateFontObjFromFile(pdf, "NotoSansJP-Regular.otf", true);
```

will embed the NotoSansJP Regular OpenType font with full support for Unicode.

> Note: Not all fonts support Unicode, and most do not contain a full
> complement of Unicode characters. `pdfioFileCreateFontObjFromFile` does not
> perform any character subsetting, so the entire font file is embedded in the
> PDF file.


### Image Object Functions

PDF supports images with many different color spaces and bit depths with
optional transparency. PDFio provides two helper functions for creating image
objects that can be referenced in page streams. The first function is
[`pdfioFileCreateImageObjFromData`](@@) which creates an image object from data
in memory, for example:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);
unsigned char data[1024 * 1024 * 4]; // 1024x1024 RGBA image data
pdfio_obj_t *img =
    pdfioFileCreateImageObjFromData(pdf, data, /*width*/1024, /*height*/1024,
                                    /*num_colors*/3, /*color_data*/NULL,
                                    /*alpha*/true, /*interpolate*/false);
```

will create an object for a 1024x1024 RGBA image in memory, using the default
color space for 3 colors ("DeviceRGB"). We can use one of the
[color space functions](@) to use a specific color space for this image, for
example:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);

// Create an AdobeRGB color array
pdfio_array_t *adobe_rgb =
    pdfioArrayCreateColorFromMatrix(pdf, 3, pdfioAdobeRGBGamma,
                                    pdfioAdobeRGBMatrix, pdfioAdobeRGBWhitePoint);

// Create a 1024x1024 RGBA image using AdobeRGB
unsigned char data[1024 * 1024 * 4]; // 1024x1024 RGBA image data
pdfio_obj_t *img =
    pdfioFileCreateImageObjFromData(pdf, data, /*width*/1024, /*height*/1024,
                                    /*num_colors*/3, /*color_data*/adobe_rgb,
                                    /*alpha*/true, /*interpolate*/false);
```

The "interpolate" argument specifies whether the colors in the image should be
smoothed/interpolated when scaling. This is most useful for photographs but
should be `false` for screenshot and barcode images.

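
The `data` arrays above occupy 4 MiB each, which can exceed the default stack
size on some platforms, so real programs often allocate image buffers on the
heap. The sketch below fills such a buffer with a simple gradient. It assumes
8 bits per component with the red, green, blue, and alpha bytes interleaved
per pixel, as the "RGBA" comments above suggest; the top-to-bottom row order
and the `make_rgba_gradient` name are assumptions made for this example. The
returned pointer could then be passed as the `data` argument and released with
`free` after the image object has been created.

```c
#include <stdlib.h>

// Allocate a width x height RGBA buffer (4 bytes per pixel) filled with a
// red ramp from left to right and a green ramp from top to bottom.
// Returns NULL on allocation failure; the byte count is stored in *size.
// NOTE: hypothetical helper for illustration, not a PDFio function.
static unsigned char *
make_rgba_gradient(size_t width, size_t height, size_t *size)
{
  size_t        x, y;                   // Current pixel position
  size_t        bytes = width * height * 4;
  unsigned char *data = malloc(bytes);  // Image buffer
  unsigned char *ptr = data;            // Pointer into buffer

  if (!data)
    return (NULL);

  for (y = 0; y < height; y ++)
  {
    for (x = 0; x < width; x ++)
    {
      *ptr++ = (unsigned char)(width > 1 ? x * 255 / (width - 1) : 0);
      *ptr++ = (unsigned char)(height > 1 ? y * 255 / (height - 1) : 0);
      *ptr++ = 0;                       // Blue
      *ptr++ = 255;                     // Alpha (opaque)
    }
  }

  *size = bytes;

  return (data);
}
```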
If you have a JPEG or PNG file, use the [`pdfioFileCreateImageObjFromFile`](@@)
function to copy the image into a PDF image object, for example:

```c
pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *img =
    pdfioFileCreateImageObjFromFile(pdf, "myphoto.jpg", /*interpolate*/true);
```

> Note: Currently `pdfioFileCreateImageObjFromFile` does not support 12 bit JPEG
> files or PNG files with an alpha channel.


### Page Dictionary Functions

PDF pages each have an associated dictionary to specify the images, fonts, and
color spaces used by the page. PDFio provides functions to add these resources
to the dictionary:

- [`pdfioPageDictAddColorSpace`](@@) adds a named color space to the page
  dictionary
- [`pdfioPageDictAddFont`](@@) adds a named font to the page dictionary
- [`pdfioPageDictAddImage`](@@) adds a named image to the page dictionary


### Page Stream Functions
|
|
|
|
PDF page streams contain textual commands for drawing on the page. PDFio
|
|
provides many functions for writing these commands with the correct format and
|
|
escaping, as needed:
|
|
|
|
- [`pdfioContentClip`](@@) clips future drawing to the current path
|
|
- [`pdfioContentDrawImage`](@@) draws an image object
|
|
- [`pdfioContentFill`](@@) fills the current path
|
|
- [`pdfioContentFillAndStroke`](@@) fills and strokes the current path
|
|
- [`pdfioContentMatrixConcat`](@@) concatenates a matrix with the current
|
|
transform matrix
|
|
- [`pdfioContentMatrixRotate`](@@) concatenates a rotation matrix with the
|
|
current transform matrix
|
|
- [`pdfioContentMatrixScale`](@@) concatenates a scaling matrix with the
|
|
current transform matrix
|
|
- [`pdfioContentMatrixTranslate`](@@) concatenates a translation matrix with the
|
|
current transform matrix
|
|
- [`pdfioContentPathClose`](@@) closes the current path
|
|
- [`pdfioContentPathCurve`](@@) appends a Bezier curve to the current path
|
|
- [`pdfioContentPathCurve13`](@@) appends a Bezier curve with 2 control points
|
|
to the current path
|
|
- [`pdfioContentPathCurve23`](@@) appends a Bezier curve with 2 control points
|
|
to the current path
|
|
- [`pdfioContentPathLineTo`](@@) appends a line to the current path
|
|
- [`pdfioContentPathMoveTo`](@@) moves the current point in the current path
|
|
- [`pdfioContentPathRect`](@@) appends a rectangle to the current path
|
|
- [`pdfioContentRestore`](@@) restores a previous graphics state
|
|
- [`pdfioContentSave`](@@) saves the current graphics state
|
|
- [`pdfioContentSetDashPattern`](@@) sets the line dash pattern
|
|
- [`pdfioContentSetFillColorDeviceCMYK`](@@) sets the current fill color using a
|
|
device CMYK color
|
|
- [`pdfioContentSetFillColorDeviceGray`](@@) sets the current fill color using a
|
|
device gray color
|
|
- [`pdfioContentSetFillColorDeviceRGB`](@@) sets the current fill color using a
|
|
device RGB color
|
|
- [`pdfioContentSetFillColorGray`](@@) sets the current fill color using a
|
|
calibrated gray color
|
|
- [`pdfioContentSetFillColorRGB`](@@) sets the current fill color using a
|
|
calibrated RGB color
|
|
- [`pdfioContentSetFillColorSpace`](@@) sets the current fill color space
|
|
- [`pdfioContentSetFlatness`](@@) sets the flatness for curves
|
|
- [`pdfioContentSetLineCap`](@@) sets how the ends of lines are stroked
|
|
- [`pdfioContentSetLineJoin`](@@) sets how connections between lines are stroked
|
|
- [`pdfioContentSetLineWidth`](@@) sets the width of stroked lines
|
|
- [`pdfioContentSetMiterLimit`](@@) sets the miter limit for stroked lines
|
|
- [`pdfioContentSetStrokeColorDeviceCMYK`](@@) sets the current stroke color
|
|
using a device CMYK color
|
|
- [`pdfioContentSetStrokeColorDeviceGray`](@@) sets the current stroke color
|
|
using a device gray color
|
|
- [`pdfioContentSetStrokeColorDeviceRGB`](@@) sets the current stroke color
  using a device RGB color
- [`pdfioContentSetStrokeColorGray`](@@) sets the current stroke color
  using a calibrated gray color
- [`pdfioContentSetStrokeColorRGB`](@@) sets the current stroke color
  using a calibrated RGB color
- [`pdfioContentSetStrokeColorSpace`](@@) sets the current stroke color space
- [`pdfioContentSetTextCharacterSpacing`](@@) sets the spacing between
  characters for text
- [`pdfioContentSetTextFont`](@@) sets the font and size for text
- [`pdfioContentSetTextLeading`](@@) sets the line height for text
- [`pdfioContentSetTextMatrix`](@@) concatenates a matrix with the current text
  matrix
- [`pdfioContentSetTextRenderingMode`](@@) sets the text rendering mode
- [`pdfioContentSetTextRise`](@@) adjusts the baseline for text
- [`pdfioContentSetTextWordSpacing`](@@) sets the spacing between words for text
- [`pdfioContentSetTextXScaling`](@@) sets the horizontal scaling for text
- [`pdfioContentStroke`](@@) strokes the current path
- [`pdfioContentTextBegin`](@@) begins a block of text
- [`pdfioContentTextEnd`](@@) ends a block of text
- [`pdfioContentTextMoveLine`](@@) moves to the next line with an offset in a
  text block
- [`pdfioContentTextMoveTo`](@@) moves within the current line in a text block
- [`pdfioContentTextNewLine`](@@) moves to the beginning of the next line in a
  text block
- [`pdfioContentTextNewLineShow`](@@) moves to the beginning of the next line in a
  text block and shows literal text with optional word and character spacing
- [`pdfioContentTextNewLineShowf`](@@) moves to the beginning of the next line in a
  text block and shows formatted text with optional word and character spacing
- [`pdfioContentTextShow`](@@) draws a literal string in a text block
- [`pdfioContentTextShowf`](@@) draws a formatted string in a text block
- [`pdfioContentTextShowJustified`](@@) draws an array of literal strings with
  offsets between them


Examples
========


Read PDF Metadata
-----------------

The `pdfioinfo.c` example program opens a PDF file and prints the title, author,
creation date, and number of pages:

```c
#include <pdfio.h>
#include <time.h>


int                                     // O - Exit status
main(int  argc,                         // I - Number of command-line arguments
     char *argv[])                      // Command-line arguments
{
  const char    *filename;              // PDF filename
  pdfio_file_t  *pdf;                   // PDF file
  time_t        creation_date;          // Creation date
  struct tm     *creation_tm;           // Creation date/time information
  char          creation_text[256];     // Creation date/time as a string


  // Get the filename from the command-line...
  if (argc != 2)
  {
    fputs("Usage: ./pdfioinfo FILENAME.pdf\n", stderr);
    return (1);
  }

  filename = argv[1];

  // Open the PDF file with the default callbacks...
  pdf = pdfioFileOpen(filename, /*password_cb*/NULL, /*password_cbdata*/NULL,
                      /*error_cb*/NULL, /*error_cbdata*/NULL);
  if (pdf == NULL)
    return (1);

  // Get the creation date and convert to a string...
  creation_date = pdfioFileGetCreationDate(pdf);
  creation_tm   = localtime(&creation_date);
  strftime(creation_text, sizeof(creation_text), "%c", creation_tm);

  // Print file information to stdout...
  printf("%s:\n", filename);
  printf(" Title: %s\n", pdfioFileGetTitle(pdf));
  printf(" Author: %s\n", pdfioFileGetAuthor(pdf));
  printf(" Created On: %s\n", creation_text);
  printf(" Number Pages: %u\n", (unsigned)pdfioFileGetNumPages(pdf));

  // Close the PDF file...
  pdfioFileClose(pdf);

  return (0);
}
```
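
The program above passes the metadata strings straight to `printf`. Many PDF
files have no title, author, or creation date, so a more defensive version
might substitute placeholder text. The following is a minimal sketch that uses
only the standard C library. The helper names `text_or_default` and
`format_date` are hypothetical and are not part of PDFio, and the sketch
assumes (without confirmation from the text above) that a missing string is
reported as `NULL` and a missing date as `0`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

// Hypothetical helper: return "s", or "fallback" when "s" is NULL or empty.
static const char *
text_or_default(const char *s, const char *fallback)
{
  return ((s && *s) ? s : fallback);
}

// Hypothetical helper: format a date with "%c", or write a placeholder when
// the date is 0 (assumed to mean "not set") or cannot be converted.
static const char *
format_date(time_t t, char *buffer, size_t bufsize)
{
  struct tm *tm;                        // Broken-down local time

  assert(buffer != NULL && bufsize > 0);

  if (t == 0 || (tm = localtime(&t)) == NULL ||
      strftime(buffer, bufsize, "%c", tm) == 0)
  {
    // strftime leaves the buffer undefined on failure, so overwrite it
    strncpy(buffer, "-- not set --", bufsize - 1);
    buffer[bufsize - 1] = '\0';
  }

  return (buffer);
}
```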


Extract Text from PDF File
--------------------------

The `pdf2text.c` example code extracts non-Unicode text from a PDF file by
scanning each page for strings and text drawing commands. Since it doesn't
look at the font encoding or support Unicode text, it is really only useful to
extract plain ASCII text from a PDF file. And since it writes text in the order
it appears in the page stream, the text may not come out in the same order as it
appears on the page.

The [`pdfioStreamGetToken`](@@) function is used to read individual tokens from
the page streams. Tokens starting with the open parenthesis are text strings,
while PDF operators are left as-is. We use some simple logic to make sure that
we include spaces between text strings and add newlines for the text operators
that start a new line in a text block:

```c
pdfio_stream_t *st;                     // Page stream
bool           first = true;            // First string on line?
char           buffer[1024];            // Token buffer

// Read PDF tokens from the page stream...
while (pdfioStreamGetToken(st, buffer, sizeof(buffer)))
{
  if (buffer[0] == '(')
  {
    // Text string using an 8-bit encoding
    if (first)
      first = false;
    else if (buffer[1] != ' ')
      putchar(' ');

    fputs(buffer + 1, stdout);
  }
  else if (!strcmp(buffer, "Td") || !strcmp(buffer, "TD") || !strcmp(buffer, "T*") ||
           !strcmp(buffer, "\'") || !strcmp(buffer, "\""))
  {
    // Text operators that advance to the next line in the block
    putchar('\n');
    first = true;
  }
}

if (!first)
  putchar('\n');
```
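
To see the spacing and newline rules in isolation, the same logic can be run
over an in-memory list of tokens instead of a page stream. This is only a
sketch for experimentation: `tokens_to_text`, `append`, and `is_newline_op` are
hypothetical names, and the token array stands in for what
`pdfioStreamGetToken` would return (string tokens keep their leading `(`, as
in the loop above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Append "s" to "out" without overflowing; returns the new length.
static size_t
append(char *out, size_t len, size_t outsize, const char *s)
{
  while (*s && len + 1 < outsize)
    out[len ++] = *s++;

  out[len] = '\0';

  return (len);
}

// Text operators that advance to the next line in a text block.
static bool
is_newline_op(const char *token)
{
  return (!strcmp(token, "Td") || !strcmp(token, "TD") || !strcmp(token, "T*") ||
          !strcmp(token, "'") || !strcmp(token, "\""));
}

// Convert a list of tokens to plain text using the rules shown above.
static char *
tokens_to_text(const char * const *tokens, size_t num_tokens,
               char *out, size_t outsize)
{
  bool   first = true;                  // First string on line?
  size_t i, len = 0;                    // Looping var and output length

  assert(out != NULL && outsize > 0);
  out[0] = '\0';

  for (i = 0; i < num_tokens; i ++)
  {
    const char *token = tokens[i];

    if (token[0] == '(')
    {
      // Separate strings on the same line with a single space
      if (first)
        first = false;
      else if (token[1] != ' ')
        len = append(out, len, outsize, " ");

      len = append(out, len, outsize, token + 1);
    }
    else if (is_newline_op(token))
    {
      len   = append(out, len, outsize, "\n");
      first = true;
    }
  }

  if (!first)
    append(out, len, outsize, "\n");

  return (out);
}
```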


Create a PDF File With Text and an Image
----------------------------------------

The `image2pdf.c` example code creates a PDF file containing a JPEG or PNG
image file and an optional caption on a single page. The `create_pdf_image_file`
function creates the PDF file, embeds a base font and the named JPEG or PNG
image file, and then creates a page with the image centered on the page with any
text centered below:

```c
#include <pdfio.h>
#include <pdfio-content.h>
#include <string.h>


bool                                    // O - True on success, false on failure
create_pdf_image_file(
    const char *pdfname,                // I - PDF filename
    const char *imagename,              // I - Image filename
    const char *caption)                // I - Caption text
{
  pdfio_file_t   *pdf;                  // PDF file
  pdfio_obj_t    *font;                 // Caption font
  pdfio_obj_t    *image;                // Image
  pdfio_dict_t   *dict;                 // Page dictionary
  pdfio_stream_t *page;                 // Page stream
  double         width, height;         // Width and height of image
  double         swidth, sheight;       // Scaled width and height on page
  double         tx, ty;                // Position on page


  // Create the PDF file...
  pdf = pdfioFileCreate(pdfname, /*version*/NULL, /*media_box*/NULL, /*crop_box*/NULL,
                        /*error_cb*/NULL, /*error_cbdata*/NULL);
  if (!pdf)
    return (false);

  // Create a Courier base font for the caption
  font = pdfioFileCreateFontObjFromBase(pdf, "Courier");

  if (!font)
  {
    pdfioFileClose(pdf);
    return (false);
  }

  // Create an image object from the JPEG/PNG image file...
  image = pdfioFileCreateImageObjFromFile(pdf, imagename, true);

  if (!image)
  {
    pdfioFileClose(pdf);
    return (false);
  }

  // Create a page dictionary with the font and image...
  dict = pdfioDictCreate(pdf);
  pdfioPageDictAddFont(dict, "F1", font);
  pdfioPageDictAddImage(dict, "IM1", image);

  // Create the page and its content stream...
  page = pdfioFileCreatePage(pdf, dict);

  // Position and scale the image on the page...
  width  = pdfioImageGetWidth(image);
  height = pdfioImageGetHeight(image);

  // Default media_box is "universal" 595.28x792 points (8.27x11in or 210x279mm).
  // Use margins of 36 points (0.5in or 12.7mm) with another 36 points for the
  // caption underneath...
  swidth  = 595.28 - 72.0;
  sheight = swidth * height / width;
  if (sheight > (792.0 - 36.0 - 72.0))
  {
    sheight = 792.0 - 36.0 - 72.0;
    swidth  = sheight * width / height;
  }

  tx = 0.5 * (595.28 - swidth);
  ty = 0.5 * (792 - 36 - sheight);

  pdfioContentDrawImage(page, "IM1", tx, ty + 36.0, swidth, sheight);

  // Draw the caption in black...
  pdfioContentSetFillColorDeviceGray(page, 0.0);

  // Compute the starting point for the text - Courier is monospaced with a
  // nominal width of 0.6 times the text height...
  tx = 0.5 * (595.28 - 18.0 * 0.6 * strlen(caption));

  // Position and draw the caption underneath...
  pdfioContentTextBegin(page);
  pdfioContentSetTextFont(page, "F1", 18.0);
  pdfioContentTextMoveTo(page, tx, ty);
  pdfioContentTextShow(page, /*unicode*/false, caption);
  pdfioContentTextEnd(page);

  // Close the page stream and the PDF file...
  pdfioStreamClose(page);
  pdfioFileClose(pdf);

  return (true);
}
```
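
The scaling arithmetic in `create_pdf_image_file` can be checked separately
from the PDF output. The sketch below restates it as two small functions with
the page size, margin, and caption space passed as parameters rather than
hard-coded. The names `image_layout_t`, `layout_image`, and `caption_x` are
hypothetical, and treating the second 36-point value as "caption space" is an
interpretation of the comments above. With a page of 595.28x792 points and 36
points for both the margin and the caption space, a 1000x500 image scales to
523.28x261.64 points at x = 36:

```c
#include <assert.h>
#include <string.h>

typedef struct                          // Hypothetical layout result
{
  double x, y;                          // Lower-left corner of the image
  double width, height;                 // Scaled size of the image
  double caption_y;                     // Baseline of the caption
} image_layout_t;

// Scale an image to fit between the margins, leaving room for a caption.
static image_layout_t
layout_image(double width, double height, double page_width,
             double page_height, double margin, double caption_space)
{
  image_layout_t l;                     // Layout
  double         max_height = page_height - 2.0 * margin - caption_space;
                                        // Tallest image that fits

  assert(width > 0.0 && height > 0.0);

  // Start with the full width between the margins...
  l.width  = page_width - 2.0 * margin;
  l.height = l.width * height / width;

  // ...and shrink to the available height if the image is too tall
  if (l.height > max_height)
  {
    l.height = max_height;
    l.width  = l.height * width / height;
  }

  l.x         = 0.5 * (page_width - l.width);
  l.caption_y = 0.5 * (page_height - caption_space - l.height);
  l.y         = l.caption_y + caption_space;

  return (l);
}

// Center a caption set in a monospaced font whose glyphs are 0.6 em wide.
static double
caption_x(const char *caption, double page_width, double font_size)
{
  return (0.5 * (page_width - font_size * 0.6 * (double)strlen(caption)));
}
```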


Generate a Code 128 Barcode
---------------------------

One-dimensional barcodes are often rendered using special fonts that map ASCII
characters to sequences of bars that can be read. The `examples` directory
contains such a font (`code128.ttf`) to create "Code 128" barcodes, with an
accompanying bit of example code in `code128.c`.

The first thing you need to do is prepare the barcode string to use with the
font. Each barcode begins with a start pattern followed by the characters or
digits you want to encode, a weighted sum digit, and a stop pattern. The
`make_code128` function creates this string:

```c
static char *                           // O - Output string
make_code128(char       *dst,           // I - Destination buffer
             const char *src,           // I - Source string
             size_t     dstsize)        // I - Size of destination buffer
{
  char          *dstptr,                // Pointer into destination buffer
                *dstend;                // End of destination buffer
  int           sum;                    // Weighted sum
  static const char *code128_chars =    // Code 128 characters
                " !\"#$%&'()*+,-./0123456789:;<=>?"
                "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
                "`abcdefghijklmnopqrstuvwxyz{|}~\303"
                "\304\305\306\307\310\311\312";
  static const char code128_start_code_b = '\314';
                                        // Start code B
  static const char code128_stop = '\316';
                                        // Stop pattern


  // Start a Code B barcode...
  dstptr = dst;
  dstend = dst + dstsize - 3;

  *dstptr++ = code128_start_code_b;

  // Convert through unsigned char so the result is 204 - 100 = 104 even where
  // plain char is signed...
  sum = (unsigned char)code128_start_code_b - 100;

  while (*src && dstptr < dstend)
  {
    if (*src >= ' ' && *src < 0x7f)
    {
      sum += (dstptr - dst) * (*src - ' ');
      *dstptr++ = *src;
    }

    src ++;
  }

  // Add the weighted sum modulo 103
  *dstptr++ = code128_chars[sum % 103];

  // Add the stop pattern and return...
  *dstptr++ = code128_stop;
  *dstptr   = '\0';

  return (dst);
}
```
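
The weighted sum is the standard Code 128 check value: the start code's value
(104 for Code Set B) plus each symbol's value multiplied by its 1-based
position, taken modulo 103. In Code Set B a printable ASCII character has the
value `ch - ' '`. As a worked example, for the message "AB" the sum is
104 + 1 * 33 + 2 * 34 = 205, and 205 mod 103 = 102. The standalone sketch below
computes just that check value so it can be verified by hand.
`code128b_check_value` is a hypothetical helper and is not part of the example
program:

```c
#include <assert.h>
#include <stddef.h>

// Compute the Code 128 check value (0 to 102) for a Code Set B message.
// Characters outside the printable ASCII range are skipped, which mirrors
// the filtering done by make_code128.
static int
code128b_check_value(const char *s)
{
  int sum = 104;                        // Value of the Start Code B symbol
  int pos = 1;                          // 1-based position of the next symbol

  assert(s != NULL);

  for (; *s; s ++)
  {
    if (*s >= ' ' && *s < 0x7f)
    {
      sum += pos * (*s - ' ');
      pos ++;
    }
  }

  return (sum % 103);
}
```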

The `main` function does the rest of the work. The barcode font is imported
using the [`pdfioFileCreateFontObjFromFile`](@@) function. We pass `false`
for the "unicode" argument since we just want the (default) ASCII encoding:

```c
barcode_font = pdfioFileCreateFontObjFromFile(pdf, "code128.ttf", /*unicode*/false);
```

Since barcodes usually have the number or text represented by the barcode
printed underneath it, we also need a regular text font, for which we can choose
one of the standard 14 PostScript base fonts using the
[`pdfioFileCreateFontObjFromBase`](@@) function:

```c
text_font = pdfioFileCreateFontObjFromBase(pdf, "Helvetica");
```

Once we have these fonts we can measure the barcode and regular text labels
using the [`pdfioContentTextMeasure`](@@) function to determine how large the
PDF page needs to be to hold the barcode and text:

```c
// Compute sizes of the text...
const char *barcode = argv[1];
char barcode_temp[256];

if (!(barcode[0] & 0x80))
  barcode = make_code128(barcode_temp, barcode, sizeof(barcode_temp));

double barcode_height = 36.0;
double barcode_width =
    pdfioContentTextMeasure(barcode_font, barcode, barcode_height);

const char *text = argv[2];
double text_height = 0.0;
double text_width = 0.0;

if (text && text_font)
{
  text_height = 9.0;
  text_width  = pdfioContentTextMeasure(text_font, text, text_height);
}

// Compute the size of the PDF page...
pdfio_rect_t media_box;

media_box.x1 = 0.0;
media_box.y1 = 0.0;
media_box.x2 = (barcode_width > text_width ? barcode_width : text_width) + 18.0;
media_box.y2 = barcode_height + text_height + 18.0;
```

Finally, we just need to create a page of the specified size that references the
two fonts:

```c
// Start a page for the barcode...
page_dict = pdfioDictCreate(pdf);

pdfioDictSetRect(page_dict, "MediaBox", &media_box);
pdfioDictSetRect(page_dict, "CropBox", &media_box);

pdfioPageDictAddFont(page_dict, "B128", barcode_font);
if (text_font)
  pdfioPageDictAddFont(page_dict, "TEXT", text_font);

page_st = pdfioFileCreatePage(pdf, page_dict);
```

With the barcode font called "B128" and the text font called "TEXT", we can
use them to draw two strings:

```c
// Draw the page...
pdfioContentSetFillColorGray(page_st, 0.0);

pdfioContentSetTextFont(page_st, "B128", barcode_height);
pdfioContentTextBegin(page_st);
pdfioContentTextMoveTo(page_st, 0.5 * (media_box.x2 - barcode_width),
                       9.0 + text_height);
pdfioContentTextShow(page_st, /*unicode*/false, barcode);
pdfioContentTextEnd(page_st);

if (text && text_font)
{
  pdfioContentSetTextFont(page_st, "TEXT", text_height);
  pdfioContentTextBegin(page_st);
  pdfioContentTextMoveTo(page_st, 0.5 * (media_box.x2 - text_width), 9.0);
  pdfioContentTextShow(page_st, /*unicode*/false, text);
  pdfioContentTextEnd(page_st);
}

pdfioStreamClose(page_st);
```
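
The page size and the two text positions follow from four measurements, with
a 9-point border on every side (18 points in total per axis). The sketch below
collects that arithmetic in one place so it can be checked with sample numbers.
`label_layout_t` and `layout_label` are hypothetical names, and the widths
passed in stand for values that `pdfioContentTextMeasure` would return. For a
100-point-wide barcode at 36 points and a 40-point-wide label at 9 points, the
page is 118x63 points, the barcode starts at (9, 18), and the text starts at
(39, 9):

```c
#include <assert.h>

typedef struct                          // Hypothetical layout result
{
  double page_width, page_height;       // Size of the media box
  double barcode_x, barcode_y;          // Start of the barcode string
  double text_x, text_y;                // Start of the text label
} label_layout_t;

// Compute the page size and centered positions for a barcode and its label.
static label_layout_t
layout_label(double barcode_width, double barcode_height,
             double text_width, double text_height)
{
  label_layout_t l;                     // Layout
  const double   border = 9.0;          // Border on each side in points

  assert(barcode_width >= 0.0 && text_width >= 0.0);

  // The page is as wide as the wider string plus the border on both sides
  l.page_width  = (barcode_width > text_width ? barcode_width : text_width) +
                  2.0 * border;
  l.page_height = barcode_height + text_height + 2.0 * border;

  // Center both strings, with the text label underneath the barcode
  l.barcode_x = 0.5 * (l.page_width - barcode_width);
  l.barcode_y = border + text_height;
  l.text_x    = 0.5 * (l.page_width - text_width);
  l.text_y    = border;

  return (l);
}
```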


Convert Markdown to PDF
-----------------------

Markdown is a simple plain text format that supports things like headings,
links, character styles, tables, and embedded images. The `md2pdf.c` example
code uses the [mmd](https://www.msweet.org/mmd/) library to convert markdown to
a PDF file that can be distributed.

> Note: The md2pdf example is by far the most complex example code included with
> PDFio and shows how to lay out text, add headers and footers, add links, embed
> images, format tables, and add an outline (table of contents) for navigation.

### Managing Document State

The `md2pdf` program needs to maintain three sets of state - one for the
markdown document, which is represented by nodes of type `mmd_t`, and the others
for the PDF document and current PDF page, which are contained in the
`docdata_t` structure:

```c
typedef struct docdata_s                // Document formatting data
{
  // State for the whole document
  pdfio_file_t  *pdf;                   // PDF file
  pdfio_rect_t  media_box;              // Media (page) box
  pdfio_rect_t  crop_box;               // Crop box (for margins)
  pdfio_rect_t  art_box;                // Art box (for markdown content)
  pdfio_obj_t   *fonts[DOCFONT_MAX];    // Embedded fonts
  double        font_space;             // Unit width of a space
  size_t        num_images;             // Number of embedded images
  docimage_t    images[DOCIMAGE_MAX];   // Embedded images
  const char    *title;                 // Document title
  char          *heading;               // Current document heading
  size_t        num_actions;            // Number of actions for this document
  docaction_t   actions[DOCACTION_MAX]; // Actions for this document
  size_t        num_targets;            // Number of targets for this document
  doctarget_t   targets[DOCTARGET_MAX]; // Targets for this document
  size_t        num_toc;                // Number of table-of-contents entries
  doctoc_t      toc[DOCTOC_MAX];        // Table-of-contents entries

  // State for the current page
  pdfio_stream_t *st;                   // Current page stream
  double        y;                      // Current position on page
  docfont_t     font;                   // Current font
  double        fsize;                  // Current font size
  doccolor_t    color;                  // Current color
  pdfio_array_t *annots_array;          // Annotations array (for links)
  pdfio_obj_t   *annots_obj;            // Annotations object (for links)
  size_t        num_links;              // Number of links for this page
  doclink_t     links[DOCLINK_MAX];     // Links for this page
} docdata_t;
```


#### Document State

The output is fixed to the "universal" media size (the intersection of US Letter
and ISO A4) with 1/2 inch margins - the `PAGE_` constants can be changed to
select a different size or margins. The `media_box` member contains the
"MediaBox" rectangle for the PDF pages, while the `crop_box` and `art_box`
members contain the "CropBox" and "ArtBox" values, respectively.
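
One way the three rectangles could relate to each other is sketched below:
the crop box is the media box inset by the margins, and the art box is the
crop box inset further to leave room for the header and footer. This is an
illustration only. `rect_t` stands in for `pdfio_rect_t`, the functions
`inset_rect` and `make_page_boxes` are hypothetical, and the amount reserved
for the header and footer is an assumed parameter, since the actual `PAGE_`
constants are not listed here:

```c
#include <assert.h>

typedef struct                          // Stand-in for pdfio_rect_t
{
  double x1, y1;                        // Lower-left corner
  double x2, y2;                        // Upper-right corner
} rect_t;

// Shrink a rectangle by the given amounts on each side.
static rect_t
inset_rect(rect_t r, double left, double bottom, double right, double top)
{
  rect_t out;                           // Inset rectangle

  out.x1 = r.x1 + left;
  out.y1 = r.y1 + bottom;
  out.x2 = r.x2 - right;
  out.y2 = r.y2 - top;

  assert(out.x1 <= out.x2 && out.y1 <= out.y2);

  return (out);
}

// Derive media, crop, and art boxes from a page size, a margin, and the
// space reserved for the header and footer (an assumed value).
static void
make_page_boxes(double page_width, double page_height, double margin,
                double headfoot, rect_t *media, rect_t *crop, rect_t *art)
{
  media->x1 = 0.0;
  media->y1 = 0.0;
  media->x2 = page_width;
  media->y2 = page_height;

  *crop = inset_rect(*media, margin, margin, margin, margin);
  *art  = inset_rect(*crop, 0.0, headfoot, 0.0, headfoot);
}
```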

Four embedded fonts are used:

- `DOCFONT_REGULAR`: the default font used for text,
- `DOCFONT_BOLD`: a boldface font used for headings and strong text,
- `DOCFONT_ITALIC`: an italic/oblique font used for emphasized text, and
- `DOCFONT_MONOSPACE`: a fixed-width font used for code.

By default the code uses the base PostScript fonts Helvetica, Helvetica-Bold,
Helvetica-Oblique, and Courier. The `USE_TRUETYPE` define can be used to
replace these with the Roboto TrueType fonts.

Embedded JPEG and PNG images are copied into the PDF document, with the `images`
array containing the list of the images and their objects.

The `title` member contains the document title, while the `heading` member
contains the current heading text.

The `actions` array contains a list of action dictionaries for interior document
links that need to be resolved, while the `targets` array keeps track of the
location of the headings in the PDF document.

The `toc` array contains a list of headings and is used to construct the PDF
outlines dictionaries/objects, which provide a table of contents for navigation
in most PDF readers.


#### Page State

The `st` member provides the stream for the current page content. The `color`,
`font`, `fsize`, and `y` members provide the current graphics state on the page.

The `annots_array`, `annots_obj`, `num_links`, and `links` members contain a
list of hyperlinks on the current page.


### Creating Pages

The `new_page` function is used to start a new page. Aside from creating the
new page object and stream, it adds a standard header and footer to the page.
It starts by closing the current page if it is open:

```c
// Close the current page...
if (dd->st)
{
  pdfioStreamClose(dd->st);
  add_links(dd);
}
```

The new page needs a dictionary containing any link annotations, the media and
art boxes, the four fonts, and any images:

```c
// Prep the new page...
page_dict = pdfioDictCreate(dd->pdf);

dd->annots_array = pdfioArrayCreate(dd->pdf);
dd->annots_obj   = pdfioFileCreateArrayObj(dd->pdf, dd->annots_array);
pdfioDictSetObj(page_dict, "Annots", dd->annots_obj);

pdfioDictSetRect(page_dict, "MediaBox", &dd->media_box);
pdfioDictSetRect(page_dict, "ArtBox", &dd->art_box);

for (fontface = DOCFONT_REGULAR; fontface < DOCFONT_MAX; fontface ++)
  pdfioPageDictAddFont(page_dict, docfont_names[fontface], dd->fonts[fontface]);

for (i = 0; i < dd->num_images; i ++)
  pdfioPageDictAddImage(page_dict, pdfioStringCreatef(dd->pdf, "I%u", (unsigned)i),
                        dd->images[i].obj);
```

Once the page dictionary is initialized, we create a new page and initialize
the current graphics state:

```c
dd->st    = pdfioFileCreatePage(dd->pdf, page_dict);
dd->color = DOCCOLOR_BLACK;
dd->font  = DOCFONT_MAX;
dd->fsize = 0.0;
dd->y     = dd->art_box.y2;
```

The header consists of a dark gray separating line and the document title. We
don't show the header on the first page:

```c
// Add header/footer text
set_color(dd, DOCCOLOR_GRAY);
set_font(dd, DOCFONT_REGULAR, SIZE_HEADFOOT);

if (pdfioFileGetNumPages(dd->pdf) > 1 && dd->title)
{
  // Show title in header...
  width = pdfioContentTextMeasure(dd->fonts[DOCFONT_REGULAR], dd->title,
                                  SIZE_HEADFOOT);

  pdfioContentTextBegin(dd->st);
  pdfioContentTextMoveTo(dd->st,
                         dd->crop_box.x1 + 0.5 * (dd->crop_box.x2 -
                                                  dd->crop_box.x1 - width),
                         dd->crop_box.y2 - SIZE_HEADFOOT);
  pdfioContentTextShow(dd->st, UNICODE_VALUE, dd->title);
  pdfioContentTextEnd(dd->st);

  pdfioContentPathMoveTo(dd->st, dd->crop_box.x1,
                         dd->crop_box.y2 - 2 * SIZE_HEADFOOT * LINE_HEIGHT +
                             SIZE_HEADFOOT);
  pdfioContentPathLineTo(dd->st, dd->crop_box.x2,
                         dd->crop_box.y2 - 2 * SIZE_HEADFOOT * LINE_HEIGHT +
                             SIZE_HEADFOOT);
  pdfioContentStroke(dd->st);
}
```

The footer contains the same dark gray separating line with the current heading
and page number on opposite sides. The page number is always positioned on the
outer edge for a two-sided print - right justified on odd numbered pages and
left justified on even numbered pages:

```c
// Show page number and current heading...
pdfioContentPathMoveTo(dd->st, dd->crop_box.x1,
                       dd->crop_box.y1 + SIZE_HEADFOOT * LINE_HEIGHT);
pdfioContentPathLineTo(dd->st, dd->crop_box.x2,
                       dd->crop_box.y1 + SIZE_HEADFOOT * LINE_HEIGHT);
pdfioContentStroke(dd->st);

pdfioContentTextBegin(dd->st);
snprintf(temp, sizeof(temp), "%u", (unsigned)pdfioFileGetNumPages(dd->pdf));
if (pdfioFileGetNumPages(dd->pdf) & 1)
{
  // Page number on right...
  width = pdfioContentTextMeasure(dd->fonts[DOCFONT_REGULAR], temp, SIZE_HEADFOOT);
  pdfioContentTextMoveTo(dd->st, dd->crop_box.x2 - width, dd->crop_box.y1);
}
else
{
  // Page number on left...
  pdfioContentTextMoveTo(dd->st, dd->crop_box.x1, dd->crop_box.y1);
}

pdfioContentTextShow(dd->st, UNICODE_VALUE, temp);
pdfioContentTextEnd(dd->st);

if (dd->heading)
{
  pdfioContentTextBegin(dd->st);

  if (pdfioFileGetNumPages(dd->pdf) & 1)
  {
    // Current heading on left...
    pdfioContentTextMoveTo(dd->st, dd->crop_box.x1, dd->crop_box.y1);
  }
  else
  {
    width = pdfioContentTextMeasure(dd->fonts[DOCFONT_REGULAR], dd->heading,
                                    SIZE_HEADFOOT);
    pdfioContentTextMoveTo(dd->st, dd->crop_box.x2 - width, dd->crop_box.y1);
  }

  pdfioContentTextShow(dd->st, UNICODE_VALUE, dd->heading);
  pdfioContentTextEnd(dd->st);
}
```
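
The odd/even placement rule can be reduced to a small function that returns
the starting x position of the page number and of the heading. The sketch
below is a standalone restatement of the footer logic for checking the
arithmetic. `footer_pos_t` and `footer_positions` are hypothetical names, and
the widths are passed in where the real code calls `pdfioContentTextMeasure`:

```c
#include <assert.h>
#include <stddef.h>

typedef struct                          // Hypothetical footer positions
{
  double number_x;                      // Start of the page number
  double heading_x;                     // Start of the current heading
} footer_pos_t;

// Put the page number on the outer edge and the heading on the inner edge.
static footer_pos_t
footer_positions(size_t page_number, double left, double right,
                 double number_width, double heading_width)
{
  footer_pos_t pos;                     // Positions

  assert(page_number > 0 && left <= right);

  if (page_number & 1)
  {
    // Odd page: number right justified, heading on the left
    pos.number_x  = right - number_width;
    pos.heading_x = left;
  }
  else
  {
    // Even page: number on the left, heading right justified
    pos.number_x  = left;
    pos.heading_x = right - heading_width;
  }

  return (pos);
}
```

For example, with a crop box running from 36 to 559.28 points, a page number
5 points wide, and a heading 100 points wide, page 1 places the number at
554.28 and the heading at 36, while page 2 places the number at 36 and the
heading at 459.28.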


### Formatting the Markdown Document

Four functions handle the formatting of the markdown document:

- `format_block`: formats a single paragraph, heading, or table cell,
- `format_code`: formats a block of code,
- `format_doc`: formats the document as a whole, and
- `format_table`: formats a table.

Formatted content is organized into arrays of `linefrag_t` and `tablerow_t`
structures for a line of content or row of table cells, respectively.


#### High-Level Formatting

The `format_doc` function iterates over the block nodes in the markdown
document. We map a "thematic break" (horizontal rule) to a page break, which
is implemented by moving the current vertical position to the bottom of the
page:

```c
case MMD_TYPE_THEMATIC_BREAK :
    // Force a page break
    dd->y = dd->art_box.y1;
    break;
```

A block quote is indented and uses the italic font by default:

```c
case MMD_TYPE_BLOCK_QUOTE :
    format_doc(dd, current, DOCFONT_ITALIC, left + BQ_PADDING, right - BQ_PADDING);
    break;
```

Lists have a leading blank line and are indented:

```c
case MMD_TYPE_ORDERED_LIST :
case MMD_TYPE_UNORDERED_LIST :
    if (dd->st)
      dd->y -= SIZE_BODY * LINE_HEIGHT;

    format_doc(dd, current, deffont, left + LIST_PADDING, right);
    break;
```

List items do not have a leading blank line and make use of leader text that is
shown in front of the list text. The leader text is either the current item
number or a bullet, which then is directly formatted using the `format_block`
function:

```c
case MMD_TYPE_LIST_ITEM :
    if (doctype == MMD_TYPE_ORDERED_LIST)
    {
      snprintf(leader, sizeof(leader), "%d. ", i);
      format_block(dd, current, deffont, SIZE_BODY, left, right, leader);
    }
    else
    {
      format_block(dd, current, deffont, SIZE_BODY, left, right, /*leader*/"• ");
    }
    break;
```

Paragraphs have a leading blank line and are likewise directly formatted:

```c
case MMD_TYPE_PARAGRAPH :
    // Add a blank line before the paragraph...
    dd->y -= SIZE_BODY * LINE_HEIGHT;

    // Format the paragraph...
    format_block(dd, current, deffont, SIZE_BODY, left, right, /*leader*/NULL);
    break;
```

Tables have a leading blank line and are formatted using the `format_table`
function:

```c
case MMD_TYPE_TABLE :
    // Add a blank line before the table...
    dd->y -= SIZE_BODY * LINE_HEIGHT;

    // Format the table...
    format_table(dd, current, left, right);
    break;
```

Code blocks have a leading blank line, are indented slightly (to account for the
padded background), and are formatted using the `format_code` function:

```c
case MMD_TYPE_CODE_BLOCK :
    // Add a blank line before the code block...
    dd->y -= SIZE_BODY * LINE_HEIGHT;

    // Format the code block...
    format_code(dd, current, left + CODE_PADDING, right - CODE_PADDING);
    break;
```

Headings get some extra processing. First, the current heading is remembered in
the `docdata_t` structure so it can be used in the page footer:

```c
case MMD_TYPE_HEADING_1 :
case MMD_TYPE_HEADING_2 :
case MMD_TYPE_HEADING_3 :
case MMD_TYPE_HEADING_4 :
case MMD_TYPE_HEADING_5 :
case MMD_TYPE_HEADING_6 :
    // Update the current heading
    free(dd->heading);
    dd->heading = mmdCopyAllText(current);
```

Then we add a blank line and format the heading with the boldface font at a
larger size using the `format_block` function:

```c
    // Add a blank line before the heading...
    dd->y -= heading_sizes[curtype - MMD_TYPE_HEADING_1] * LINE_HEIGHT;

    // Format the heading...
    format_block(dd, current, DOCFONT_BOLD,
                 heading_sizes[curtype - MMD_TYPE_HEADING_1], left, right,
                 /*leader*/NULL);
```

Once the heading is formatted, we record it in the `toc` array as a PDF outline
item object/dictionary:

```c
    // Add the heading to the table-of-contents...
    if (dd->num_toc < DOCTOC_MAX)
    {
      doctoc_t *t = dd->toc + dd->num_toc;
                                        // New TOC
      pdfio_array_t *dest;              // Destination array

      t->level = curtype - MMD_TYPE_HEADING_1;
      t->dict  = pdfioDictCreate(dd->pdf);
      t->obj   = pdfioFileCreateObj(dd->pdf, t->dict);
      dest     = pdfioArrayCreate(dd->pdf);

      pdfioArrayAppendObj(dest,
          pdfioFileGetPage(dd->pdf, pdfioFileGetNumPages(dd->pdf) - 1));
      pdfioArrayAppendName(dest, "XYZ");
      pdfioArrayAppendNumber(dest, PAGE_LEFT);
      pdfioArrayAppendNumber(dest,
          dd->y + heading_sizes[curtype - MMD_TYPE_HEADING_1] * LINE_HEIGHT);
      pdfioArrayAppendNumber(dest, 0.0);

      pdfioDictSetArray(t->dict, "Dest", dest);
      pdfioDictSetString(t->dict, "Title", pdfioStringCreate(dd->pdf, dd->heading));

      dd->num_toc ++;
    }
```

Finally, we also save the heading's target name and its location in the
`targets` array to allow interior links to work:

```c
    // Add the heading to the list of link targets...
    if (dd->num_targets < DOCTARGET_MAX)
    {
      doctarget_t *t = dd->targets + dd->num_targets;
                                        // New target

      make_target_name(t->name, dd->heading, sizeof(t->name));
      t->page = pdfioFileGetNumPages(dd->pdf) - 1;
      t->y    = dd->y + heading_sizes[curtype - MMD_TYPE_HEADING_1] * LINE_HEIGHT;

      dd->num_targets ++;
    }
    break;
```
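
The body of `make_target_name` is not shown above. As a rough illustration of
what such a function might do, the sketch below turns heading text into an
anchor-style name by lowercasing letters, replacing spaces with hyphens, and
dropping most punctuation, similar to the anchors generated by common markdown
renderers. This is an assumption about the naming scheme, not the code from
`md2pdf.c`, and `sketch_target_name` is a hypothetical name chosen to avoid
implying otherwise:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: derive a link target name from heading text.
static char *
sketch_target_name(char *dst, const char *src, size_t dstsize)
{
  char *dstptr,                         // Pointer into destination buffer
       *dstend;                         // End of destination buffer

  assert(dst != NULL && src != NULL && dstsize > 0);

  dstptr = dst;
  dstend = dst + dstsize - 1;           // Leave room for the nul terminator

  for (; *src && dstptr < dstend; src ++)
  {
    unsigned char ch = (unsigned char)*src;

    if (isalnum(ch) || ch == '-' || ch == '_' || ch == '.')
      *dstptr++ = (char)tolower(ch);    // Keep, folding to lowercase
    else if (ch == ' ')
      *dstptr++ = '-';                  // Spaces become hyphens
    // Anything else is dropped
  }

  *dstptr = '\0';

  return (dst);
}
```

Under this scheme a heading such as "Formatting Code Blocks" would be reachable
through a link to `#formatting-code-blocks`, provided the link resolver strips
the leading `#` before comparing names.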


#### Formatting Paragraphs, Headings, List Items, and Table Cells

Paragraphs, headings, list items, and table cells all use the same basic
formatting algorithm. Text, checkboxes, and images are collected until the
nodes in the current block are used up or the content reaches the right margin.

In order to keep adjacent blocks of text together, the formatting algorithm
makes sure that at least 3 lines of text can fit before the bottom edge of the
page:

```c
if (mmdGetNextSibling(block))
  need_bottom = 3.0 * SIZE_BODY * LINE_HEIGHT;
else
  need_bottom = 0.0;
```

Leader text (used for list items) is right justified to the left margin and
becomes the first fragment on the line when present.

```c
if (leader)
{
  // Add leader text on first line...
  frags[0].type     = MMD_TYPE_NORMAL_TEXT;
  frags[0].width    = pdfioContentTextMeasure(dd->fonts[deffont], leader, fsize);
  frags[0].height   = fsize;
  frags[0].x        = left - frags[0].width;
  frags[0].imagenum = 0;
  frags[0].text     = leader;
  frags[0].url      = NULL;
  frags[0].ws       = false;
  frags[0].font     = deffont;
  frags[0].color    = DOCCOLOR_BLACK;

  num_frags  = 1;
  lineheight = fsize * LINE_HEIGHT;
}
else
{
  // No leader text...
  num_frags  = 0;
  lineheight = 0.0;
}

frag = frags + num_frags;
```

If the current content fragment won't fit, we call `render_line` to draw what we
have, adjusting the left margin as needed for table cells:

```c
// See if this node will fit on the current line...
if ((num_frags > 0 && (x + width + wswidth) >= right) || num_frags == LINEFRAG_MAX)
{
  // No, render this line and start over...
  if (blocktype == MMD_TYPE_TABLE_HEADER_CELL ||
      blocktype == MMD_TYPE_TABLE_BODY_CELL_CENTER)
    margin_left = 0.5 * (right - x);
  else if (blocktype == MMD_TYPE_TABLE_BODY_CELL_RIGHT)
    margin_left = right - x;
  else
    margin_left = 0.0;

  render_line(dd, margin_left, need_bottom, lineheight, num_frags, frags);

  num_frags   = 0;
  frag        = frags;
  x           = left;
  lineheight  = 0.0;
  need_bottom = 0.0;
}
```
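
The fit test above amounts to greedy line breaking: fragments are added to the
current line until the next one would reach the right margin, at which point
the line is rendered and a new one is started. The standalone sketch below
applies the same `>=` test to an array of word widths and counts the resulting
lines. `count_wrapped_lines` is a hypothetical name, a single `space_width` is
assumed between words, and the `LINEFRAG_MAX` limit is left out for brevity:

```c
#include <assert.h>
#include <stddef.h>

// Count the lines needed to place words of the given widths between "left"
// and "right" using greedy line breaking.
static size_t
count_wrapped_lines(const double *widths, size_t num_words, double space_width,
                    double left, double right)
{
  size_t i;                             // Looping var
  size_t lines = 0;                     // Completed lines
  size_t on_line = 0;                   // Fragments on the current line
  double x = left;                      // Current horizontal position

  assert(widths != NULL || num_words == 0);

  for (i = 0; i < num_words; i ++)
  {
    // No leading whitespace at the start of a line
    double wswidth = on_line > 0 ? space_width : 0.0;

    if (on_line > 0 && (x + widths[i] + wswidth) >= right)
    {
      // Won't fit: finish this line and start over at the left margin
      lines ++;
      on_line = 0;
      x       = left;
      wswidth = 0.0;
    }

    // A fragment wider than the line is still placed on a line of its own
    x += widths[i] + wswidth;
    on_line ++;
  }

  if (on_line > 0)
    lines ++;

  return (lines);
}
```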

Block quotes (blocks that use a default font of italic) have an orange bar to
the left of the block:

```c
if (deffont == DOCFONT_ITALIC)
{
  // Add an orange bar to the left of block quotes...
  set_color(dd, DOCCOLOR_ORANGE);
  pdfioContentSave(dd->st);
  pdfioContentSetLineWidth(dd->st, 3.0);
  pdfioContentPathMoveTo(dd->st, left - 6.0, dd->y - (LINE_HEIGHT - 1.0) * fsize);
  pdfioContentPathLineTo(dd->st, left - 6.0, dd->y + fsize);
  pdfioContentStroke(dd->st);
  pdfioContentRestore(dd->st);
}
```

Finally, we add the current content fragment to the array:

```c
// Add the current node to the fragment list
if (num_frags == 0)
{
  // No leading whitespace at the start of the line
  ws      = false;
  wswidth = 0.0;
}

frag->type     = type;
frag->x        = x;
frag->width    = width + wswidth;
frag->height   = text ? fsize : height;
frag->imagenum = imagenum;
frag->text     = text;
frag->url      = url;
frag->ws       = ws;
frag->font     = font;
frag->color    = color;

num_frags ++;
frag ++;
x += width + wswidth;
if (height > lineheight)
  lineheight = height;
```


#### Formatting Code Blocks

Code blocks consist of one or more lines of plain monospaced text. We draw a
light gray background behind each line with a small bit of padding at the top
and bottom:

```c
// Draw the top padding...
set_color(dd, DOCCOLOR_LTGRAY);
pdfioContentPathRect(dd->st, left - CODE_PADDING, dd->y + SIZE_CODEBLOCK,
                     right - left + 2.0 * CODE_PADDING, CODE_PADDING);
pdfioContentFillAndStroke(dd->st, false);

// Start a code text block...
set_font(dd, DOCFONT_MONOSPACE, SIZE_CODEBLOCK);
pdfioContentTextBegin(dd->st);
pdfioContentTextMoveTo(dd->st, left, dd->y);

for (code = mmdGetFirstChild(block); code; code = mmdGetNextSibling(code))
{
  set_color(dd, DOCCOLOR_LTGRAY);
  pdfioContentPathRect(dd->st, left - CODE_PADDING,
                       dd->y - (LINE_HEIGHT - 1.0) * SIZE_CODEBLOCK,
                       right - left + 2.0 * CODE_PADDING, lineheight);
  pdfioContentFillAndStroke(dd->st, false);

  set_color(dd, DOCCOLOR_RED);
  pdfioContentTextShow(dd->st, UNICODE_VALUE, mmdGetText(code));
  dd->y -= lineheight;

  if (dd->y < dd->art_box.y1)
  {
    // End the current text block...
    pdfioContentTextEnd(dd->st);

    // Start a new page...
    new_page(dd);
    set_font(dd, DOCFONT_MONOSPACE, SIZE_CODEBLOCK);

    dd->y -= lineheight;

    pdfioContentTextBegin(dd->st);
    pdfioContentTextMoveTo(dd->st, left, dd->y);
  }
}

// End the current text block...
pdfioContentTextEnd(dd->st);
dd->y += lineheight;

// Draw the bottom padding...
set_color(dd, DOCCOLOR_LTGRAY);
pdfioContentPathRect(dd->st, left - CODE_PADDING,
                     dd->y - CODE_PADDING - (LINE_HEIGHT - 1.0) * SIZE_CODEBLOCK,
                     right - left + 2.0 * CODE_PADDING, CODE_PADDING);
pdfioContentFillAndStroke(dd->st, false);
```


#### Formatting Tables

Tables are the most difficult to format. We start by scanning the entire table
and measuring every cell with the `measure_cell` function:

```c
for (num_cols = 0, num_rows = 0, rowptr = rows, current = mmdGetFirstChild(table);
     current && num_rows < TABLEROW_MAX;
     current = next)
{
  next = mmd_walk_next(table, current);
  type = mmdGetType(current);

  if (type == MMD_TYPE_TABLE_ROW)
  {
    // Parse row...
    for (col = 0, current = mmdGetFirstChild(current);
         current && num_cols < TABLECOL_MAX;
         current = mmdGetNextSibling(current), col ++)
    {
      rowptr->cells[col] = current;

      measure_cell(dd, current, cols + col);

      if (col >= num_cols)
        num_cols = col + 1;
    }

    rowptr ++;
    num_rows ++;
  }
}
```

The `measure_cell` function also updates the minimum and maximum width needed
for each column. To this we add the cell padding to compute the total table
width:

```c
// Figure out the width of each column...
for (col = 0, table_width = 0.0; col < num_cols; col ++)
{
  cols[col].max_width += 2.0 * TABLE_PADDING;

  table_width += cols[col].max_width;
  cols[col].width = cols[col].max_width;
}
```

If the calculated width is more than the available width, we need to adjust the
width of the columns. The algorithm used here breaks the available width into
N equal-width columns; any column wider than this is scaled proportionately.
This works out as two steps: one to calculate the base width of the "narrow"
columns and a second to distribute the remaining width amongst the wider
columns:

```c
format_width = right - left - 2.0 * TABLE_PADDING * num_cols;

if (table_width > format_width)
{
  // Content too wide, try scaling the widths...
  double avg_width,              // Average column width
         base_width,             // Base width
         remaining_width,        // Remaining width
         scale_width;            // Width for scaling
  size_t num_remaining_cols = 0; // Number of remaining columns

  // First mark any columns that are narrower than the average width...
  avg_width = format_width / num_cols;

  for (col = 0, base_width = 0.0, remaining_width = 0.0; col < num_cols; col ++)
  {
    if (cols[col].width > avg_width)
    {
      remaining_width += cols[col].width;
      num_remaining_cols ++;
    }
    else
    {
      base_width += cols[col].width;
    }
  }

  // Then proportionately distribute the remaining width to the other columns...
  format_width -= base_width;

  for (col = 0, table_width = 0.0; col < num_cols; col ++)
  {
    if (cols[col].width > avg_width)
      cols[col].width = cols[col].width * format_width / remaining_width;

    table_width += cols[col].width;
  }
}
```
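
To make the two steps concrete, the following is a minimal, self-contained
sketch of the same scaling idea applied to a plain array of widths. The
`fit_columns` function and its parameters are hypothetical names used only for
illustration (they are not part of PDFio or `md2pdf`), and the sketch leaves
out the cell padding:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of PDFio): shrink `widths` in place so that
// their sum fits in `avail`, and return the resulting total width.
double
fit_columns(double *widths, size_t num_cols, double avail)
{
  double avg_width,             // Average (equal-share) column width
         base_width = 0.0,      // Total width of the "narrow" columns
         remaining_width = 0.0, // Total width of the "wide" columns
         total = 0.0;           // Total table width
  size_t col;                   // Current column

  assert(widths != NULL && num_cols > 0 && avail > 0.0);

  for (col = 0; col < num_cols; col ++)
    total += widths[col];

  if (total <= avail)
    return (total);             // Already fits, nothing to scale

  // Step 1: split the columns into "narrow" and "wide" using the average...
  avg_width = avail / (double)num_cols;

  for (col = 0; col < num_cols; col ++)
  {
    if (widths[col] > avg_width)
      remaining_width += widths[col];
    else
      base_width += widths[col];
  }

  // Step 2: scale the wide columns into the space the narrow ones leave...
  avail -= base_width;

  for (col = 0, total = 0.0; col < num_cols; col ++)
  {
    if (widths[col] > avg_width)
      widths[col] = widths[col] * avail / remaining_width;

    total += widths[col];
  }

  return (total);
}
```

For example, with column widths of 50, 100, and 300 and an available width of
300, the average width is 100, so the first two columns keep their widths and
the third is scaled down to the remaining 150.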

Now that we have the widths of the columns, we can calculate the left and right
margins of each column for formatting the cell text:

```c
// Calculate the margins of each column in preparation for formatting
for (col = 0, x = left + TABLE_PADDING; col < num_cols; col ++)
{
  cols[col].left  = x;
  cols[col].right = x + cols[col].width;

  x += cols[col].width + 2.0 * TABLE_PADDING;
}
```
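
As a worked example of the margin calculation, the sketch below runs the same
loop over a reduced column structure and returns the right edge of the table.
The `colbox_t` type and the `place_columns` function are hypothetical stand-ins
for illustration only:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, reduced column record: only the fields the loop touches.
typedef struct
{
  double width;                 // Final column width
  double left;                  // Left margin of the cell text
  double right;                 // Right margin of the cell text
} colbox_t;

// Hypothetical helper: assign text margins and return the table's right edge.
double
place_columns(colbox_t *cols, size_t num_cols, double left, double padding)
{
  double x = left + padding;    // Left margin of the current column
  size_t col;                   // Current column

  assert(cols != NULL);

  for (col = 0; col < num_cols; col ++)
  {
    cols[col].left  = x;
    cols[col].right = x + cols[col].width;

    // Skip this cell's right padding and the next cell's left padding...
    x += cols[col].width + 2.0 * padding;
  }

  // `x` has stepped one padding too far past the last cell's right padding
  return (x - padding);
}
```

With a left edge of 36, a padding of 4, and column widths of 100 and 50, the
text margins come out as 40 to 140 and 148 to 198, and the table's right edge
is at 202.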

Then we re-measure the cells using the final column widths to determine the
height of each cell and row:

```c
// Calculate the height of each row and cell in preparation for formatting
for (row = 0, rowptr = rows; row < num_rows; row ++, rowptr ++)
{
  for (col = 0; col < num_cols; col ++)
  {
    height = measure_cell(dd, rowptr->cells[col], cols + col) + 2.0 * TABLE_PADDING;
    if (height > rowptr->height)
      rowptr->height = height;
  }
}
```

Finally, we render each row in the table:

```c
// Render each table row...
for (row = 0, rowptr = rows; row < num_rows; row ++, rowptr ++)
  render_row(dd, num_cols, cols, rowptr);
```


### Rendering the Markdown Document

The formatted content, held in arrays of `linefrag_t` and `tablerow_t`
structures, is passed to the `render_line` and `render_row` functions,
respectively, to produce content in the PDF document.


#### Rendering a Line in a Paragraph, Heading, or Table Cell

The `render_line` function adds content from the `linefrag_t` array to a PDF
page. It starts by determining whether a new page is needed:

```c
if (!dd->st)
{
  new_page(dd);
  margin_top = 0.0;
}

dd->y -= margin_top + lineheight;
if ((dd->y - need_bottom) < dd->art_box.y1)
{
  new_page(dd);

  dd->y -= lineheight;
}
```

We then loop through the fragments for the current line, drawing checkboxes,
images, and text as needed. When a hyperlink is present, we add the link to the
`links` array in the `docdata_t` structure, mapping "@" and "@@" to an internal
link corresponding to the linked text:

```c
if (frag->url && dd->num_links < DOCLINK_MAX)
{
  doclink_t *l = dd->links + dd->num_links;
                                    // Pointer to this link record

  if (!strcmp(frag->url, "@"))
  {
    // Use mapped text as link target...
    char targetlink[129];           // Targeted link

    targetlink[0] = '#';
    make_target_name(targetlink + 1, frag->text, sizeof(targetlink) - 1);

    l->url = pdfioStringCreate(dd->pdf, targetlink);
  }
  else if (!strcmp(frag->url, "@@"))
  {
    // Use literal text as anchor...
    l->url = pdfioStringCreatef(dd->pdf, "#%s", frag->text);
  }
  else
  {
    // Use URL as-is...
    l->url = frag->url;
  }

  l->box.x1 = frag->x;
  l->box.y1 = dd->y;
  l->box.x2 = frag->x + frag->width;
  l->box.y2 = dd->y + frag->height;

  dd->num_links ++;
}
```

These link records are later written as annotations in the `add_links`
function.
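
The "@" mapping relies on `make_target_name`, which is not shown here. As a
rough illustration of the kind of mapping involved, the sketch below lowercases
letters and digits, turns spaces into hyphens, and drops most other characters.
The `sketch_target_name` name and the exact character rules are assumptions
made for illustration; the real function may behave differently:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical stand-in for make_target_name(): copy `src` to `dst` as a
// lowercase, hyphenated anchor name, never writing more than `dstsize` bytes.
void
sketch_target_name(char *dst, const char *src, size_t dstsize)
{
  char *dstptr = dst,           // Pointer into destination
       *dstend;                 // Last usable byte, reserved for the nul

  assert(dst != NULL && src != NULL && dstsize > 0);

  dstend = dst + dstsize - 1;

  for (; *src && dstptr < dstend; src ++)
  {
    unsigned char ch = (unsigned char)*src;

    if (isalnum(ch) || ch == '.' || ch == '-' || ch == '_')
      *dstptr++ = (char)tolower(ch);    // Keep, folded to lowercase
    else if (ch == ' ')
      *dstptr++ = '-';                  // Spaces become hyphens

    // (assumed rule) any other character is dropped
  }

  *dstptr = '\0';
}
```

Under these assumed rules, link text of "Formatting Tables" yields the name
`formatting-tables`, so the stored link would be `#formatting-tables`. Passing
`targetlink + 1` and `sizeof(targetlink) - 1`, as the code above does, leaves
room for the leading `#`.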


#### Rendering a Table Row

The `render_row` function takes a row of cells and the corresponding column
definitions. It starts by drawing the border boxes around body cells:

```c
if (mmdGetType(row->cells[0]) == MMD_TYPE_TABLE_HEADER_CELL)
{
  // Header row, no border...
  deffont = DOCFONT_BOLD;
}
else
{
  // Regular body row, add borders...
  deffont = DOCFONT_REGULAR;

  set_color(dd, DOCCOLOR_GRAY);
  pdfioContentPathRect(dd->st, cols[0].left - TABLE_PADDING, dd->y - row->height,
                       cols[num_cols - 1].right - cols[0].left +
                           2.0 * TABLE_PADDING, row->height);
  for (col = 1; col < num_cols; col ++)
  {
    pdfioContentPathMoveTo(dd->st, cols[col].left - TABLE_PADDING, dd->y);
    pdfioContentPathLineTo(dd->st, cols[col].left - TABLE_PADDING, dd->y - row->height);
  }
  pdfioContentStroke(dd->st);
}
```

Then it formats each cell using the `format_block` function described
previously. The page `y` value is reset before formatting each cell:

```c
row_y = dd->y;

for (col = 0; col < num_cols; col ++)
{
  dd->y = row_y;

  format_block(dd, row->cells[col], deffont, SIZE_TABLE, cols[col].left,
               cols[col].right, /*leader*/NULL);
}

dd->y = row_y - row->height;
```