PDFio is Copyright \[co] 2021\-2024 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.
.PP
PDFio comes with a configure script that creates a portable makefile that will work on any POSIX\-compliant system with ZLIB installed. To make it, run:
.PP
There is also an Xcode project ("pdfio.xcodeproj") you can use on macOS which generates a static library that will be installed under "/usr/local" with:
.nf
sudo xcodebuild install
.fi
.SS Detecting PDFio
.PP
PDFio can be detected using the pkg\-config command, for example:
.PP
On Windows, you need to link to the PDFIO1.LIB (DLL) library and include the zlib_native NuGet package dependency. You can also use the published pdfio_native NuGet package.
.PP
A PDF file provides data and commands for displaying pages of graphics and text, and is structured in a way that allows it to be displayed in the same way across multiple devices and platforms. The following is a PDF which shows "Hello, World!" on one page:
.PP
The header is the first line of a PDF file that specifies the version of the PDF format that has been used, for example %PDF\-1.0\.
.PP
Since PDF files almost always contain binary data, they can become corrupted if line endings are changed, for example when the file is transferred using FTP in text mode or edited in Notepad on Windows. To allow legacy file transfer programs to determine that the file is binary, the PDF standard recommends including some bytes with character codes higher than 127 in the header, for example:
.nf
%âãÏÓ
.fi
.PP
The percent sign indicates a comment line while the other few bytes are arbitrary character codes in excess of 127. So, the whole header in our example is:
.nf
%PDF\-1.0
%âãÏÓ
.fi
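.PP
As a rough illustration of the header rules above, the following sketch checks the two header lines of a buffer. The helper names parse_pdf_header and has_binary_marker are hypothetical and are not part of the PDFio API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not a PDFio function): parse "%PDF-M.N" at the
// start of a NUL-terminated buffer and return the version numbers.
bool
parse_pdf_header(const char *buf, int *major, int *minor)
{
  if (strncmp(buf, "%PDF-", 5) != 0)
    return (false);

  return (sscanf(buf + 5, "%d.%d", major, minor) == 2);
}

// Hypothetical helper: report whether a comment line contains at least one
// byte above 127, which marks the file as binary for transfer programs.
bool
has_binary_marker(const unsigned char *line, size_t len)
{
  size_t i;                             // Looping var

  if (len < 2 || line[0] != '%')
    return (false);

  for (i = 1; i < len && line[i] != '\n' && line[i] != '\r'; i ++)
  {
    if (line[i] > 127)
      return (true);
  }

  return (false);
}
```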
.PP
Body
.PP
The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another. For example:
.PP
In this example, the object number is 1 and the generation number is 0, meaning it is the first version of the object. The content for object 1 is between the initial 1 0 obj and trailing endobj lines. In this case, the content is the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>\.
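.PP
A minimal sketch of reading the first line of an object, assuming the line is available as a NUL\-terminated string. The parse_obj_header name is hypothetical and only illustrates the "number generation obj" layout described above:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not a PDFio function): parse an object header line
// such as "1 0 obj" into its object and generation numbers.
bool
parse_obj_header(const char *line, unsigned long *number, unsigned *generation)
{
  char keyword[8];                      // Keyword following the two numbers

  if (sscanf(line, "%lu %u %7s", number, generation, keyword) != 3)
    return (false);

  // Only the "obj" keyword starts an object; "R" would be a reference
  return (strcmp(keyword, "obj") == 0);
}
```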
.PP
Cross\-Reference Table
.PP
The cross\-reference table lists the byte offset of each object in the file body. This allows random access to objects, meaning they don't have to be read in order. Objects that are not used are never read, making the process efficient. Operations like counting the number of pages in a PDF document are fast, even in large files.
.PP
Each object has an object number and a generation number. Generation numbers are used when a cross\-reference table entry is reused. For simplicity, we will assume generation numbers to be always zero and ignore them. The cross\-reference table consists of a header line that indicates the number of entries, a free entry line for object 0, and a line for each of the objects in the file body. For example:
.nf
0 6 % Six entries in table, starting at 0
0000000000 65535 f % Free entry for object 0
0000000015 00000 n % Object 1 is at byte offset 15
0000000074 00000 n % Object 2 is at byte offset 74
0000000192 00000 n % etc...
0000000291 00000 n
0000000409 00000 n % Object 5 is at byte offset 409
.fi
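.PP
The entry layout shown above (a 10\-digit byte offset, a 5\-digit generation number, and an "n" or "f" flag) can be read with a few lines of C. This is only a sketch; the xref_entry_t type and parse_xref_entry function are hypothetical names, not PDFio types:

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical representation of one cross-reference table entry
typedef struct xref_entry_s
{
  unsigned long offset;                 // Byte offset of the object
  unsigned      generation;             // Generation number
  bool          in_use;                 // true for "n", false for "f"
} xref_entry_t;

// Hypothetical helper: parse one entry line; any trailing text is ignored
bool
parse_xref_entry(const char *line, xref_entry_t *entry)
{
  char type;                            // Entry type character

  if (sscanf(line, "%10lu %5u %c", &entry->offset, &entry->generation, &type) != 3)
    return (false);

  if (type != 'n' && type != 'f')
    return (false);

  entry->in_use = type == 'n';

  return (true);
}
```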
.PP
Trailer
.PP
The first line of the trailer is just the trailer keyword. It is followed by the trailer dictionary, which contains at least two entries: /Size, which specifies the number of entries in the cross\-reference table, and /Root, which references the document catalog object. The catalog is the root element of the graph of objects in the body.
.PP
There follows a line with just the startxref keyword, a line with a single number specifying the byte offset of the start of the cross\-reference table within the file, and then the line %%EOF which signals the end of the PDF file.
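.PP
A reader typically starts from the end of the file, so the startxref value is the first thing it needs. The sketch below assumes the last part of the file has already been loaded into a NUL\-terminated string; find_startxref is a hypothetical name and the trailer contents in the usage note are made up for illustration:

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: return the byte offset that follows the last
// "startxref" keyword in the tail of a file, or -1 if it is missing.
long
find_startxref(const char *tail)
{
  const char *ptr,                      // Current match
             *last = NULL;              // Last match found
  long       offset;                    // Offset of cross-reference table

  for (ptr = strstr(tail, "startxref"); ptr; ptr = strstr(ptr + 9, "startxref"))
    last = ptr;

  if (!last || sscanf(last + 9, "%ld", &offset) != 1)
    return (-1);

  return (offset);
}
```

For a tail such as "startxref", a line containing 459, and "%%EOF", the function returns 459.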
where the five arguments to the function are the filename ("myinputfile.pdf"), an optional password callback function (password_cb) and data pointer value (password_data), and an optional error callback function (error_cb) and data pointer value (error_data). The password callback is called for encrypted PDF files that are not using the default password, for example:
.nf
const char *
password_cb(void *data, const char *filename)
{
  (void)data;     // This callback doesn't use the data pointer
  (void)filename; // This callback doesn't use the filename

  // Return a password string for the file...
  return ("Password42");
}
.fi
.PP
The error callback is called for both errors and warnings and accepts the pdfio_file_t pointer, a message string, and the callback pointer value, for example:
.PP
The default error callback (NULL) does the equivalent of the above.
.PP
Each PDF file contains one or more pages. The pdfioFileGetNumPages function returns the number of pages in the file while the pdfioFileGetPage function gets the specified page in the PDF file:
.PP
Each page is represented by a "page tree" object (what pdfioFileGetPage returns) that specifies information about the page and one or more "content" objects that contain the images, fonts, text, and graphics that appear on the page. Use the pdfioPageGetNumStreams and pdfioPageOpenStream functions to access the content streams for each page, and pdfioObjGetDict to get the associated page object dictionary. For example, if you want to display the media and crop boxes for a given page:
.nf
pdfio_file_t *pdf; // PDF file
size_t i; // Looping var
size_t count; // Number of pages
pdfio_obj_t *page; // Current page
pdfio_dict_t *dict; // Current page dictionary
pdfio_array_t *media_box; // MediaBox array
double media_values[4]; // MediaBox values
pdfio_array_t *crop_box; // CropBox array
double crop_values[4]; // CropBox values
// Iterate the pages in the PDF file
for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
.fi
.PP
Page object dictionaries have several (mostly optional) key/value pairs, including:
.IP \(bu 5
.PP
"Annots": An array of annotation dictionaries for the page; use pdfioDictGetArray to get the array
.IP \(bu 5
.PP
"CropBox": The crop box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use pdfioDictGetArray to get a pointer to the array of numbers
.IP \(bu 5
.PP
"Dur": The number of seconds the page should be displayed; use pdfioDictGetNumber to get the page duration value
.IP \(bu 5
.PP
"Group": The dictionary of transparency group values for the page; use pdfioDictGetDict to get a pointer to the group dictionary
.IP \(bu 5
.PP
"LastModified": The date and time when this page was last modified; use pdfioDictGetDate to get the Unix time_t value
.IP \(bu 5
.PP
"Parent": The parent page tree node object for this page; use pdfioDictGetObj to get a pointer to the object
.IP \(bu 5
.PP
"MediaBox": The media box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use pdfioDictGetArray to get a pointer to the array of numbers
.IP \(bu 5
.PP
"Resources": The dictionary of resources for the page; use pdfioDictGetDict to get a pointer to the resources dictionary
.IP \(bu 5
.PP
"Rotate": A number indicating the number of degrees of clockwise rotation (a multiple of 90) to apply to the page when viewing; use pdfioDictGetNumber to get the rotation angle
.IP \(bu 5
.PP
"Thumb": A thumbnail image object for the page; use pdfioDictGetObj to get a pointer to the thumbnail image object
.IP \(bu 5
.PP
"Trans": The page transition dictionary; use pdfioDictGetDict to get a pointer to the dictionary
.PP
where the six arguments to the function are the filename ("myoutputfile.pdf"), PDF version ("2.0"), media box (media_box), crop box (crop_box), an optional error callback function (error_cb), and an optional pointer value for the error callback function (error_data). The units for the media and crop boxes are points (1/72nd of an inch).
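.PP
Because the boxes are given in points, sizes in inches or millimeters need to be converted first. The following sketch shows the arithmetic; example_rect_t is a hypothetical stand\-in that assumes lower\-left and upper\-right corner members, and the helper names are illustrative only:

```c
// Hypothetical rectangle type using lower-left and upper-right corners
typedef struct example_rect_s
{
  double x1, y1;                        // Lower-left corner in points
  double x2, y2;                        // Upper-right corner in points
} example_rect_t;

// Convert inches to points (72 points per inch)
double
inches_to_points(double inches)
{
  return (inches * 72.0);
}

// Convert millimeters to points (25.4 millimeters per inch)
double
mm_to_points(double mm)
{
  return (mm * 72.0 / 25.4);
}

// Build a box for a page of the given size, inset by a margin on all sides
example_rect_t
make_page_box(double width, double height, double margin)
{
  example_rect_t box;                   // Resulting box

  box.x1 = margin;
  box.y1 = margin;
  box.x2 = width - margin;
  box.y2 = height - margin;

  return (box);
}
```

With these helpers a US Letter media box is make_page_box(inches_to_points(8.5), inches_to_points(11.0), 0.0), and a crop box with 1/2 inch margins passes inches_to_points(0.5) as the last argument.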
Once the file is created, use the pdfioFileCreateObj, pdfioFileCreatePage, and pdfioPageCopy functions to create objects and pages in the file.
.PP
Finally, the pdfioFileClose function writes the PDF cross\-reference and "trailer" information, closes the file, and frees all memory that was used for it.
.PP
PDF objects are identified using two numbers \- the object number (1 to N) and the object generation (0 to 65535) that specifies a particular version of an object. An object's numbers are returned by the pdfioObjGetNumber and pdfioObjGetGeneration functions. You can find a numbered object using the pdfioFileFindObj function.
.PP
Objects contain values (typically dictionaries) and usually an associated data stream containing images, fonts, ICC profiles, and page content. PDFio provides several accessor functions to get the value(s) associated with an object:
.IP \(bu 5
.PP
pdfioObjGetArray returns an object's array value, if any
.IP \(bu 5
.PP
pdfioObjGetDict returns an object's dictionary value, if any
.IP \(bu 5
.PP
pdfioObjGetLength returns the length of the data stream, if any
.IP \(bu 5
.PP
pdfioObjGetSubtype returns the sub\-type name of the object, for example "Image" for an image object
.IP \(bu 5
.PP
pdfioObjGetType returns the type name of the object, for example "XObject" for an image object
.PP
Some PDF objects have an associated data stream, such as for pages, images, ICC color profiles, and fonts. You access the stream for an existing object using the pdfioObjOpenStream function:
.PP
The first argument is the object pointer. The second argument is a boolean value that specifies whether you want to decode (typically decompress) the stream data or return it as\-is.
.PP
The first argument is the newly created object. The second argument is either PDFIO_FILTER_NONE to specify that any encoding is done by your program or PDFIO_FILTER_FLATE to specify that PDFio should Flate compress the stream.
.PP
PDFio includes many helper functions for embedding or writing specific kinds of content to a PDF file. These functions can be roughly grouped into five categories:
.PP
PDF color spaces are specified using well\-known names like "DeviceCMYK", "DeviceGray", and "DeviceRGB" or using arrays that define so\-called calibrated color spaces. PDFio provides several functions for embedding ICC profiles and creating color space arrays:
.IP \(bu 5
.PP
pdfioArrayCreateColorFromICCObj creates a color array for an ICC color profile object
.IP \(bu 5
.PP
pdfioArrayCreateColorFromMatrix creates a color array using a CIE XYZ color transform matrix, a gamma value, and a CIE XYZ white point
.IP \(bu 5
.PP
pdfioArrayCreateColorFromPalette creates an indexed color array from an array of sRGB values
.IP \(bu 5
.PP
pdfioArrayCreateColorFromPrimaries creates a color array using CIE XYZ primaries and a gamma value
.PP
PDF supports many kinds of fonts, including PostScript Type1, PDF Type3, TrueType/OpenType, and CID. PDFio provides two functions for creating font objects. The first is pdfioFileCreateFontObjFromBase which creates a font object for one of the base PDF fonts:
.PP
will embed an OpenSans Regular TrueType font using the Windows CP1252 subset of Unicode. Pass true for the third argument to embed it as a Unicode CID font instead, for example:
.PP
Note: Not all fonts support Unicode, and most do not contain a full complement of Unicode characters. pdfioFileCreateFontObjFromFile does not perform any character subsetting, so the entire font file is embedded in the PDF file.
.PP
PDF supports images with many different color spaces and bit depths with optional transparency. PDFio provides two helper functions for creating image objects that can be referenced in page streams. The first function is pdfioFileCreateImageObjFromData which creates an image object from data in memory, for example:
.PP
will create an object for a 1024x1024 RGBA image in memory, using the default color space for 3 colors ("DeviceRGB"). We can use one of the color space functions to use a specific color space for this image, for example:
.PP
The "interpolate" argument specifies whether the colors in the image should be smoothed/interpolated when scaling. This is most useful for photographs but should be false for screenshot and barcode images.
.PP
If you have a JPEG or PNG file, use the pdfioFileCreateImageObjFromFile function to copy the image into a PDF image object, for example:
.PP
PDF pages each have an associated dictionary to specify the images, fonts, and color spaces used by the page. PDFio provides functions to add these resources to the dictionary:
.IP \(bu 5
.PP
pdfioPageDictAddColorSpace adds a named color space to the page dictionary
.IP \(bu 5
.PP
pdfioPageDictAddFont adds a named font to the page dictionary
.IP \(bu 5
.PP
pdfioPageDictAddImage adds a named image to the page dictionary
.PP
PDF page streams contain textual commands for drawing on the page. PDFio provides many functions for writing these commands with the correct format and escaping, as needed:
.IP \(bu 5
.PP
pdfioContentClip clips future drawing to the current path
.IP \(bu 5
.PP
pdfioContentDrawImage draws an image object
.IP \(bu 5
.PP
pdfioContentFill fills the current path
.IP \(bu 5
.PP
pdfioContentFillAndStroke fills and strokes the current path
.IP \(bu 5
.PP
pdfioContentMatrixConcat concatenates a matrix with the current transform matrix
.IP \(bu 5
.PP
pdfioContentMatrixRotate concatenates a rotation matrix with the current transform matrix
.IP \(bu 5
.PP
pdfioContentMatrixScale concatenates a scaling matrix with the current transform matrix
.IP \(bu 5
.PP
pdfioContentMatrixTranslate concatenates a translation matrix with the current transform matrix
.IP \(bu 5
.PP
pdfioContentPathClose closes the current path
.IP \(bu 5
.PP
pdfioContentPathCurve appends a Bezier curve to the current path
.IP \(bu 5
.PP
pdfioContentPathCurve13 appends a Bezier curve with 2 control points to the current path
.IP \(bu 5
.PP
pdfioContentPathCurve23 appends a Bezier curve with 2 control points to the current path
.IP \(bu 5
.PP
pdfioContentPathLineTo appends a line to the current path
.IP \(bu 5
.PP
pdfioContentPathMoveTo moves the current point in the current path
.IP \(bu 5
.PP
pdfioContentPathRect appends a rectangle to the current path
.IP \(bu 5
.PP
pdfioContentRestore restores a previous graphics state
.IP \(bu 5
.PP
pdfioContentSave saves the current graphics state
.IP \(bu 5
.PP
pdfioContentSetDashPattern sets the line dash pattern
.IP \(bu 5
.PP
pdfioContentSetFillColorDeviceCMYK sets the current fill color using a device CMYK color
.IP \(bu 5
.PP
pdfioContentSetFillColorDeviceGray sets the current fill color using a device gray color
.IP \(bu 5
.PP
pdfioContentSetFillColorDeviceRGB sets the current fill color using a device RGB color
.IP \(bu 5
.PP
pdfioContentSetFillColorGray sets the current fill color using a calibrated gray color
.IP \(bu 5
.PP
pdfioContentSetFillColorRGB sets the current fill color using a calibrated RGB color
.IP \(bu 5
.PP
pdfioContentSetFillColorSpace sets the current fill color space
.IP \(bu 5
.PP
pdfioContentSetFlatness sets the flatness for curves
.IP \(bu 5
.PP
pdfioContentSetLineCap sets how the ends of lines are stroked
.IP \(bu 5
.PP
pdfioContentSetLineJoin sets how connections between lines are stroked
.IP \(bu 5
.PP
pdfioContentSetLineWidth sets the width of stroked lines
.IP \(bu 5
.PP
pdfioContentSetMiterLimit sets the miter limit for stroked lines
.IP \(bu 5
.PP
pdfioContentSetStrokeColorDeviceCMYK sets the current stroke color using a device CMYK color
.IP \(bu 5
.PP
pdfioContentSetStrokeColorDeviceGray sets the current stroke color using a device gray color
.IP \(bu 5
.PP
pdfioContentSetStrokeColorDeviceRGB sets the current stroke color using a device RGB color
.IP \(bu 5
.PP
pdfioContentSetStrokeColorGray sets the current stroke color using a calibrated gray color
.IP \(bu 5
.PP
pdfioContentSetStrokeColorRGB sets the current stroke color using a calibrated RGB color
.IP \(bu 5
.PP
pdfioContentSetStrokeColorSpace sets the current stroke color space
.IP \(bu 5
.PP
pdfioContentSetTextCharacterSpacing sets the spacing between characters for text
.IP \(bu 5
.PP
pdfioContentSetTextFont sets the font and size for text
.IP \(bu 5
.PP
pdfioContentSetTextLeading sets the line height for text
.IP \(bu 5
.PP
pdfioContentSetTextMatrix concatenates a matrix with the current text matrix
.IP \(bu 5
.PP
pdfioContentSetTextRenderingMode sets the text rendering mode
.IP \(bu 5
.PP
pdfioContentSetTextRise adjusts the baseline for text
.IP \(bu 5
.PP
pdfioContentSetTextWordSpacing sets the spacing between words for text
.IP \(bu 5
.PP
pdfioContentSetTextXScaling sets the horizontal scaling for text
.IP \(bu 5
.PP
pdfioContentStroke strokes the current path
.IP \(bu 5
.PP
pdfioContentTextBegin begins a block of text
.IP \(bu 5
.PP
pdfioContentTextEnd ends a block of text
.IP \(bu 5
.PP
pdfioContentTextMoveLine moves to the next line with an offset in a text block
.IP \(bu 5
.PP
pdfioContentTextMoveTo moves within the current line in a text block
.PP
The pdf2text.c example code extracts non\-Unicode text from a PDF file by scanning each page for strings and text drawing commands. Since it doesn't look at the font encoding or support Unicode text, it is really only useful for extracting plain ASCII text from a PDF file. And since it writes text in the order it appears in the page stream, the text may not come out in the same order as it appears on the page.
.PP
The pdfioStreamGetToken function is used to read individual tokens from the page streams. Tokens starting with the open parenthesis are text strings, while PDF operators are left as\-is. We use some simple logic to make sure that we include spaces between text strings and add newlines for the text operators that start a new line in a text block:
.nf
pdfio_stream_t *st; // Page stream
bool first = true; // First string on line?
char buffer[1024]; // Token buffer
// Read PDF tokens from the page stream...
while (pdfioStreamGetToken(st, buffer, sizeof(buffer)))
.fi
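.PP
The spacing logic described above can be sketched without PDFio by working on an array of tokens that have already been read. This sketch assumes string tokens arrive with a leading open parenthesis and no closing one, and it treats the PDF operators Td, TD, T*, the single quote, and the double quote as the ones that start a new line. The tokens_to_text and append_text names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Append src to the NUL-terminated string in dst, truncating if needed
static void
append_text(char *dst, size_t dstsize, const char *src)
{
  size_t len = strlen(dst);             // Current length of dst

  if (len + 1 < dstsize)
    strncat(dst, src, dstsize - len - 1);
}

// Hypothetical helper: turn page stream tokens into plain text.
// The output buffer must hold at least one byte.
void
tokens_to_text(const char **tokens, size_t num_tokens, char *out, size_t outsize)
{
  bool   first = true;                  // First string on line?
  size_t i;                             // Looping var

  out[0] = '\0';

  for (i = 0; i < num_tokens; i ++)
  {
    const char *token = tokens[i];      // Current token

    if (token[0] == '(')
    {
      // Text string: separate it from the previous string with a space
      if (first)
        first = false;
      else
        append_text(out, outsize, " ");

      append_text(out, outsize, token + 1);
    }
    else if (!strcmp(token, "Td") || !strcmp(token, "TD") ||
             !strcmp(token, "T*") || !strcmp(token, "'") ||
             !strcmp(token, "\""))
    {
      // Text operator that moves to the next line in the text block
      append_text(out, outsize, "\n");
      first = true;
    }
  }
}
```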
The image2pdf.c example code creates a PDF file containing a JPEG or PNG image file and optional caption on a single page. The create_pdf_image_file function creates the PDF file, embeds a base font and the named JPEG or PNG image file, and then creates a page with the image centered on the page with any text centered below:
.PP
One\-dimensional barcodes are often rendered using special fonts that map ASCII characters to sequences of bars that can be read. The examples directory contains such a font (code128.ttf) to create "Code 128" barcodes, with an accompanying bit of example code in code128.c\.
.PP
The first thing you need to do is prepare the barcode string to use with the font. Each barcode begins with a start pattern followed by the characters or digits you want to encode, a weighted sum digit, and a stop pattern. The make_code128 function creates this string:
.nf
static char * // O \- Output string
make_code128(char *dst, // I \- Destination buffer
const char *src, // I \- Source string
size_t dstsize) // I \- Size of destination buffer
.fi
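.PP
The weighted sum itself is simple arithmetic. The sketch below assumes Code Set B, where the start symbol has the value 104 and each printable ASCII character has the value of its code minus 32; the function name code128b_checksum is hypothetical, and mapping the resulting value to a glyph in the barcode font is left out:

```c
// Hypothetical helper: compute the Code 128 check value for a string that
// is encoded with Code Set B (an assumption of this sketch).
int
code128b_checksum(const char *src)
{
  int sum = 104;                        // Value of the "Start Code B" symbol
  int pos = 1;                          // Weight of the next data symbol

  for (; *src; src ++)
  {
    // Only printable ASCII characters are part of Code Set B
    if (*src >= ' ' && *src < 0x7f)
    {
      sum += pos * (*src - ' ');
      pos ++;
    }
  }

  // The check value is the weighted sum modulo 103
  return (sum % 103);
}
```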
The main function does the rest of the work. The barcode font is imported using the pdfioFileCreateFontObjFromFile function. We pass false for the "unicode" argument since we just want the (default) ASCII encoding:
.PP
Since barcodes usually have the number or text represented by the barcode printed underneath it, we also need a regular text font, for which we can choose one of the standard 14 PostScript base fonts using the pdfioFileCreateFontObjFromBase function:
.PP
Once we have these fonts we can measure the barcode and regular text labels using the pdfioContentTextMeasure function to determine how large the PDF page needs to be to hold the barcode and text:
.PP
Markdown is a simple plain text format that supports things like headings, links, character styles, tables, and embedded images. The md2pdf.c example code uses the mmd library to convert markdown to a PDF file that can be distributed.
.PP
Note: The md2pdf example is by far the most complex example code included with PDFio and shows how to lay out text, add headers and footers, add links, embed images, format tables, and add an outline (table of contents) for navigation.
.PP
The md2pdf program needs to maintain three sets of state \- one for the markdown document which is represented by nodes of type mmd_t and the others for the PDF document and current PDF page which are contained in the docdata_t structure:
.nf
typedef struct docdata_s // Document formatting data
{
// State for the whole document
pdfio_file_t *pdf; // PDF file
pdfio_rect_t media_box; // Media (page) box
pdfio_rect_t crop_box; // Crop box (for margins)
pdfio_rect_t art_box; // Art box (for markdown content)
size_t num_links; // Number of links for this page
doclink_t links[DOCLINK_MAX]; // Links for this page
} docdata_t;
.fi
.PP
Document State
.PP
The output is fixed to the "universal" media size (the intersection of US Letter and ISO A4) with 1/2 inch margins \- the PAGE_ constants can be changed to select a different size or margins. The media_box member contains the "MediaBox" rectangle for the PDF pages, while the crop_box and art_box members contain the "CropBox" and "ArtBox" values, respectively.
.PP
Four embedded fonts are used:
.IP \(bu 5
.PP
DOCFONT_REGULAR: the default font used for text,
.IP \(bu 5
.PP
DOCFONT_BOLD: a boldface font used for heading and strong text,
.IP \(bu 5
.PP
DOCFONT_ITALIC: an italic/oblique font used for emphasized text, and
.IP \(bu 5
.PP
DOCFONT_MONOSPACE: a fixed\-width font used for code.
.PP
By default the code uses the base PostScript fonts Helvetica, Helvetica\-Bold, Helvetica\-Oblique, and Courier. The USE_TRUETYPE define can be used to replace these with the Roboto TrueType fonts.
.PP
Embedded JPEG and PNG images are copied into the PDF document, with the images array containing the list of the images and their objects.
.PP
The title member contains the document title, while the heading member contains the current heading text.
.PP
The actions array contains a list of action dictionaries for interior document links that need to be resolved, while the targets array keeps track of the location of the headings in the PDF document.
.PP
The toc array contains a list of headings and is used to construct the PDF outlines dictionaries/objects, which provides a table of contents for navigation in most PDF readers.
.PP
Page State
.PP
The st member provides the stream for the current page content. The color, font, fsize, and y members provide the current graphics state on the page.
.PP
The annots_array, annots_obj, num_links, and links members contain a list of hyperlinks on the current page.
.PP
Creating Pages
.PP
The new_page function is used to start a new page. Aside from creating the new page object and stream, it adds a standard header and footer to the page. It starts by closing the current page if it is open:
.nf
// Close the current page...
if (dd\->st)
{
  pdfioStreamClose(dd\->st);
  add_links(dd);
}
.fi
.PP
The new page needs a dictionary containing any link annotations, the media and art boxes, the four fonts, and any images:
.PP
The footer contains the same dark gray separating line with the current heading and page number on opposite sides. The page number is always positioned on the outer edge for a two\-sided print \- right justified on odd numbered pages and left justified on even numbered pages:
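.PP
The outer\-edge rule for the page number comes down to a parity test. The following is a minimal sketch; page_number_x is a hypothetical helper that returns the horizontal starting position of the page number text, given the left and right edges of the content area and the measured width of the text:

```c
#include <stddef.h>

// Hypothetical helper: choose the X position for the page number so that it
// sits on the outer edge of a two-sided print.
double
page_number_x(size_t page_number, double left, double right, double text_width)
{
  if (page_number & 1)
    return (right - text_width);        // Odd page: right justified
  else
    return (left);                      // Even page: left justified
}
```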
Four functions handle the formatting of the markdown document:
.IP \(bu 5
.PP
format_block: formats a single paragraph, heading, or table cell,
.IP \(bu 5
.PP
format_code: formats a block of code,
.IP \(bu 5
.PP
format_doc: formats the document as a whole, and
.IP \(bu 5
.PP
format_table: formats a table.
.PP
Formatted content is organized into arrays of linefrag_t and tablerow_t structures for a line of content or row of table cells, respectively.
.PP
High\-Level Formatting
.PP
The format_doc function iterates over the block nodes in the markdown document. We map a "thematic break" (horizontal rule) to a page break, which is implemented by moving the current vertical position to the bottom of the page:
.nf
case MMD_TYPE_THEMATIC_BREAK :
    // Force a page break
    dd\->y = dd\->art_box.y1;
    break;
.fi
.PP
A block quote is indented and uses the italic font by default:
.nf
case MMD_TYPE_BLOCK_QUOTE :
    format_doc(dd, current, DOCFONT_ITALIC, left + BQ_PADDING,
               right \- BQ_PADDING);
    break;
.fi
.PP
Lists have a leading blank line and are indented:
.nf
case MMD_TYPE_ORDERED_LIST :
case MMD_TYPE_UNORDERED_LIST :
    if (dd\->st)
      dd\->y \-= SIZE_BODY * LINE_HEIGHT;

    format_doc(dd, current, deffont, left + LIST_PADDING, right);
    break;
.fi
.PP
List items do not have a leading blank line and make use of leader text that is shown in front of the list text. The leader text is either the current item number or a bullet, which then is directly formatted using the format_block function:
.PP
Formatting Paragraphs, Headings, List Items, and Table Cells
.PP
Paragraphs, headings, list items, and table cells all use the same basic formatting algorithm. Text, checkboxes, and images are collected until the nodes in the current block are used up or the content reaches the right margin.
.PP
In order to keep adjacent blocks of text together, the formatting algorithm makes sure that at least 3 lines of text can fit before the bottom edge of the page:
.nf
if (mmdGetNextSibling(block))
  need_bottom = 3.0 * SIZE_BODY * LINE_HEIGHT;
else
  need_bottom = 0.0;
.fi
.PP
Leader text (used for list items) is right justified to the left margin and becomes the first fragment on the line when present.
.nf
pdfioContentPathLineTo(dd\->st, left \- 6.0, dd\->y + fsize);
pdfioContentStroke(dd\->st);
pdfioContentRestore(dd\->st);
}
.fi
.PP
Finally, we add the current content fragment to the array:
.nf
// Add the current node to the fragment list
if (num_frags == 0)
{
// No leading whitespace at the start of the line
ws = false;
wswidth = 0.0;
}
frag\->type = type;
frag\->x = x;
frag\->width = width + wswidth;
frag\->height = text ? fsize : height;
frag\->imagenum = imagenum;
frag\->text = text;
frag\->url = url;
frag\->ws = ws;
frag\->font = font;
frag\->color = color;
num_frags ++;
frag ++;
x += width + wswidth;
if (height > lineheight)
lineheight = height;
.fi
.PP
Formatting Code Blocks
.PP
Code blocks consist of one or more lines of plain monospaced text. We draw a light gray background behind each line with a small bit of padding at the top and bottom:
.nf
// Draw the top padding...
set_color(dd, DOCCOLOR_LTGRAY);
pdfioContentPathRect(dd\->st, left \- CODE_PADDING, dd\->y + SIZE_CODEBLOCK,
right \- left + 2.0 * CODE_PADDING, CODE_PADDING);
pdfioContentFillAndStroke(dd\->st, false);
// Start a code text block...
set_font(dd, DOCFONT_MONOSPACE, SIZE_CODEBLOCK);
pdfioContentTextBegin(dd\->st);
pdfioContentTextMoveTo(dd\->st, left, dd\->y);
for (code = mmdGetFirstChild(block); code; code = mmdGetNextSibling(code))
{
set_color(dd, DOCCOLOR_LTGRAY);
pdfioContentPathRect(dd\->st, left \- CODE_PADDING,
right \- left + 2.0 * CODE_PADDING, CODE_PADDING);
pdfioContentFillAndStroke(dd\->st, false);
.fi
.PP
Formatting Tables
.PP
Tables are the most difficult to format. We start by scanning the entire table and measuring every cell with the measure_cell function:
.nf
for (num_cols = 0, num_rows = 0, rowptr = rows, current = mmdGetFirstChild(table);
     current && num_rows < TABLEROW_MAX;
     current = next)
{
  next = mmd_walk_next(table, current);
  type = mmdGetType(current);

  if (type == MMD_TYPE_TABLE_ROW)
  {
    // Parse row...
    for (col = 0, current = mmdGetFirstChild(current);
         current && num_cols < TABLECOL_MAX;
         current = mmdGetNextSibling(current), col ++)
    {
      rowptr\->cells[col] = current;

      measure_cell(dd, current, cols + col);

      if (col >= num_cols)
        num_cols = col + 1;
    }

    rowptr ++;
    num_rows ++;
  }
}
.fi
.PP
The measure_cell function also updates the minimum and maximum width needed for each column. To this we add the cell padding to compute the total table width:
.nf
// Figure out the width of each column...
for (col = 0, table_width = 0.0; col < num_cols; col ++)
{
  cols[col].max_width += 2.0 * TABLE_PADDING;

  table_width += cols[col].max_width;
  cols[col].width = cols[col].max_width;
}
.fi
.PP
If the calculated width is more than the available width, we need to adjust the width of the columns. The algorithm used here breaks the available width into N equal\-width columns \- any columns wider than this will be scaled proportionately. This works out as two steps \- one to calculate the base width of "narrow" columns and a second to distribute the remaining width amongst the wider columns:
.nf
format_width = right \- left \- 2.0 * TABLE_PADDING * num_cols;
if (table_width > format_width)
{
// Content too wide, try scaling the widths...
double avg_width, // Average column width
base_width, // Base width
remaining_width, // Remaining width
scale_width; // Width for scaling
size_t num_remaining_cols = 0; // Number of remaining columns
// First mark any columns that are narrower than the average width...
avg_width = format_width / num_cols;
for (col = 0, base_width = 0.0, remaining_width = 0.0; col < num_cols; col ++)
{
if (cols[col].width > avg_width)
{
remaining_width += cols[col].width;
num_remaining_cols ++;
}
else
{
base_width += cols[col].width;
}
}
// Then proportionately distribute the remaining width to the other columns...
format_width \-= base_width;
for (col = 0, table_width = 0.0; col < num_cols; col ++)
.fi
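.PP
The two steps can be sketched as a self\-contained function that works on a plain array of widths. This is an illustration of the approach under stated assumptions rather than the md2pdf.c code itself; the fit_columns name and its parameters are hypothetical, and cell padding is ignored:

```c
#include <stddef.h>

// Hypothetical helper: shrink column widths in place so that they fit in
// the available width, and return the new total width.
double
fit_columns(double *widths, size_t num_cols, double available)
{
  double avg_width,                     // Average column width
         base_width = 0.0,              // Total width of narrow columns
         remaining_width = 0.0,         // Total width of wide columns
         total = 0.0;                   // Total table width
  size_t col;                           // Current column

  if (num_cols == 0)
    return (0.0);

  for (col = 0; col < num_cols; col ++)
    total += widths[col];

  if (total <= available)
    return (total);                     // Already fits, nothing to do

  // Step 1: columns no wider than the average keep their width...
  avg_width = available / num_cols;

  for (col = 0; col < num_cols; col ++)
  {
    if (widths[col] > avg_width)
      remaining_width += widths[col];
    else
      base_width += widths[col];
  }

  // Step 2: wider columns share what is left in proportion to their width
  available -= base_width;

  for (col = 0, total = 0.0; col < num_cols; col ++)
  {
    if (widths[col] > avg_width)
      widths[col] *= available / remaining_width;

    total += widths[col];
  }

  return (total);
}
```

For widths of 50, 300, and 150 points and 300 points of available width, the first column keeps its 50 points and the other two are scaled by the same factor so that the total comes to 300 points.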
The formatted content in arrays of linefrag_t and tablerow_t structures is passed to the render_line and render_row functions, respectively, to produce content in the PDF document.
.PP
Rendering a Line in a Paragraph, Heading, or Table Cell
.PP
The render_line function adds content from the linefrag_t array to a PDF page. It starts by determining whether a new page is needed:
.nf
if (!dd\->st)
{
  new_page(dd);
  margin_top = 0.0;
}

dd\->y \-= margin_top + lineheight;
if ((dd\->y \- need_bottom) < dd\->art_box.y1)
{
  new_page(dd);

  dd\->y \-= lineheight;
}
.fi
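.PP
The page break test can be isolated into a small pure function, which makes the interaction between the line height and the keep\-together space easier to see. The needs_page_break name and its parameters are hypothetical; page_bottom plays the role of the bottom edge of the art box:

```c
#include <stdbool.h>

// Hypothetical helper: decide whether a line must start on a new page.
// "y" is the current vertical position, measured upward from the bottom.
bool
needs_page_break(double y, double margin_top, double lineheight,
                 double need_bottom, double page_bottom)
{
  // Move down by the top margin and the height of the new line...
  y -= margin_top + lineheight;

  // ...then make sure the space reserved for following lines still fits
  return ((y - need_bottom) < page_bottom);
}
```

For example, with a 36 point bottom edge, a 15.4 point line, and 46.2 points reserved for the following lines, a line starting at y = 100 still fits while one starting at y = 90 does not.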
.PP
We then loop through the fragments for the current line, drawing checkboxes, images, and text as needed. When a hyperlink is present, we add the link to the links array in the docdata_t structure, mapping "@" and "@@" to an internal link corresponding to the linked text: