dany/pdfio

mirror of https://github.com/michaelrsweet/pdfio.git synced 2024-11-08 14:38:27 +01:00

Introduction

PDFio is a simple C library for reading and writing PDF files. The primary goals of PDFio are:

Read and write any version of PDF file
Provide access to pages, objects, and streams within a PDF file
Support reading and writing of encrypted PDF files
Extract or embed useful metadata (author, creator, page information, etc.)
"Filter" PDF files, for example to extract a range of pages or to embed fonts that are missing from a PDF
Provide access to objects used for each page

PDFio is not concerned with rendering or viewing a PDF file, although a PDF RIP or viewer could be written using it.

PDFio is Copyright © 2021-2024 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.

Requirements

PDFio requires the following to build the software:

A C99 compiler such as Clang, GCC, or MS Visual C
A POSIX-compliant make program
A POSIX-compliant sh program
ZLIB (https://www.zlib.net) 1.0 or higher

IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.

Installing pdfio

PDFio comes with a configure script that creates a portable makefile that will work on any POSIX-compliant system with ZLIB installed. To make it, run:

./configure
make

To test it, run:

make test

To install it, run:

sudo make install

If you want a shared library, run:

./configure --enable-shared
make
sudo make install

The default installation location is "/usr/local". Pass the --prefix option to configure to install it to another location:

./configure --prefix=/some/other/directory

Other configure options can be found using the --help option:

./configure --help

Visual Studio Project

The Visual Studio solution ("pdfio.sln") is provided for Windows developers and generates both a static library and DLL.

Xcode Project

There is also an Xcode project ("pdfio.xcodeproj") you can use on macOS which generates a static library that will be installed under "/usr/local" with:

sudo xcodebuild install

Detecting PDFio

PDFio can be detected using the pkg-config command, for example:

if pkg-config --exists pdfio; then
    ...
fi

In a makefile you can add the necessary compiler and linker options with:

CFLAGS  +=      `pkg-config --cflags pdfio`
LIBS    +=      `pkg-config --libs pdfio`

On Windows, you need to link to the PDFIO1.LIB (DLL) library and include the zlib_native NuGet package dependency. You can also use the published pdfio_native NuGet package.

Header Files

PDFio provides a primary header file that is always used:

#include <pdfio.h>

PDFio also provides PDF content helper functions for producing PDF content that are defined in a separate header file:

#include <pdfio-content.h>

API Overview

PDFio exposes several types:

pdfio_file_t: A PDF file (for reading or writing)
pdfio_array_t: An array of values
pdfio_dict_t: A dictionary of key/value pairs in a PDF file, object, etc.
pdfio_obj_t: An object in a PDF file
pdfio_stream_t: An object stream

Understanding PDF Files

A PDF file provides data and commands for displaying pages of graphics and text, and is structured in a way that allows it to be displayed in the same way across multiple devices and platforms.
The following is a PDF which shows "Hello, World!" on one page:

%PDF-1.0                              %Header starts here
%âãÏÓ
1 0 obj                               %Body starts here
<<
/Kids [2 0 R]
/Count 1
/Type /Pages
>>
endobj
2 0 obj
<<
/Rotate 0
/Parent 1 0 R
/Resources 3 0 R
/MediaBox [0 0 612 792]
/Contents [4 0 R]/Type /Page
>>
endobj
3 0 obj
<<
/Font
<<
/F0
<<
/BaseFont /Times-Italic
/Subtype /Type1
/Type /Font
>>
>>
>>
endobj
4 0 obj
<<
/Length 65
>>
stream
1. 0. 0. 1. 50. 700. cm
BT
  /F0 36. Tf
  (Hello, World!) Tj
ET
endstream
endobj
5 0 obj
<<
/Pages 1 0 R
/Type /Catalog
>>
endobj
xref                               %Cross-reference table starts here
0 6
0000000000 65535 f
0000000015 00000 n
0000000074 00000 n
0000000192 00000 n
0000000291 00000 n
0000000409 00000 n
trailer                            %Trailer starts here
<<
/Root 5 0 R
/Size 6
>>
startxref
459
%%EOF

Header

The header is the first line of a PDF file and specifies the version of the PDF format used, for example '%PDF-1.0'.

Since PDF files almost always contain binary data, they can become corrupted if line endings are changed (for example, if the file is transferred over FTP in text mode). To allow legacy file transfer programs to determine that the file is binary, it is usual to include some bytes with character codes higher than 127 in the header.

For example: %âãÏÓ
The percent sign indicates another header line; the other few bytes are arbitrary character codes in excess of 127. So, the whole header in our example is:

%PDF-1.0  
%âãÏÓ
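
As a rough illustration of how a reader might validate this header, the following sketch checks for the "%PDF-" prefix and extracts the version number. The function name parse_pdf_version is hypothetical and is not part of the PDFio API; it only assumes the first bytes of the file have already been read into memory.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not part of PDFio): parse "%PDF-M.N" from the start
// of a buffer.  Returns 1 and fills in major/minor on success, 0 otherwise.
int
parse_pdf_version(const char *buf, size_t len, int *major, int *minor)
{
  char line[16];  // Nul-terminated copy of the first bytes for sscanf

  // The shortest valid header is 8 bytes, for example "%PDF-1.0"
  if (len < 8 || memcmp(buf, "%PDF-", 5) != 0)
    return (0);

  if (len > sizeof(line) - 1)
    len = sizeof(line) - 1;

  memcpy(line, buf, len);
  line[len] = '\0';

  // Read the "major.minor" version that follows the prefix
  return (sscanf(line + 5, "%d.%d", major, minor) == 2);
}
```

The second header line with the high-bit bytes is a comment as far as the PDF syntax is concerned, so this sketch does not inspect it.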

Body

The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another. For example:

1 0 obj
<<
/Kids [2 0 R]
/Count 1
/Type /Pages
>>
endobj

Here, the object number is 1, and the generation number is 0 (it almost always is). The content for object 1 is in between the two lines 1 0 obj and endobj.
In this case, it’s the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>
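
To make the "object number, generation number, obj keyword" layout concrete, here is a minimal sketch that parses such a line with sscanf. The name parse_obj_header is an assumption made for this example and does not exist in PDFio.

```c
#include <stdio.h>

// Hypothetical helper (not part of PDFio): parse an "N G obj" line.
// Returns 1 and fills in number/generation on success, 0 otherwise.
int
parse_obj_header(const char *line, unsigned long *number, unsigned *generation)
{
  int consumed = 0;  // Set by %n only if the "obj" keyword was matched

  if (sscanf(line, "%lu %u obj%n", number, generation, &consumed) != 2)
    return (0);

  return (consumed > 0);
}
```

The same two numbers followed by R instead of obj (as in 2 0 R) form an indirect reference to an object, which this sketch deliberately rejects.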

Cross-Reference Table

The cross-reference table lists the byte offset of each object in the file body. This allows random access to objects, meaning they don't have to be read in order.
Objects that are not used are never read, making the process efficient. Operations like counting the number of pages in a PDF document are fast, even in large files. Each object has an object number and a generation number.

Generation numbers are used when a cross-reference table entry is reused.
For simplicity, we will assume generation numbers to be always zero and ignore them.
The cross-reference table consists of:
A header line that indicates the number of entries.
A special entry (the first entry).
One line for each object in the file body.

0 6                  %Six entries in table, starting at 0
0000000000 65535 f   %Special entry
0000000015 00000 n   %Object 1 is at byte offset 15
0000000074 00000 n   %Object 2 is at byte offset 74
0000000192 00000 n   %etc...
0000000291 00000 n  
0000000409 00000 n   %Object 5 is at byte offset 409
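
Each entry line can be decoded with a few lines of C. The sketch below is only an illustration: the xref_entry_t type and parse_xref_entry function are hypothetical names, not PDFio types or functions.

```c
#include <stdio.h>

// Hypothetical record for one cross-reference entry (not a PDFio type)
typedef struct
{
  unsigned long offset;      // Byte offset of the object ('n' entries)
  unsigned      generation;  // Generation number
  int           in_use;      // 1 for 'n' (in use), 0 for 'f' (free)
} xref_entry_t;

// Parse a line such as "0000000015 00000 n".
// Returns 1 on success, 0 if the line is not a valid entry.
int
parse_xref_entry(const char *line, xref_entry_t *entry)
{
  char type;  // Entry type character, 'n' or 'f'

  if (sscanf(line, "%10lu %5u %c", &entry->offset, &entry->generation, &type) != 3)
    return (0);

  if (type != 'n' && type != 'f')
    return (0);

  entry->in_use = type == 'n';

  return (1);
}
```

For the table above, the second line yields offset 15, generation 0, and in_use set to 1, while the special first entry is reported as free with generation 65535.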

Trailer

The first line of the trailer is just the trailer keyword. This is followed by the trailer dictionary, which contains at least the /Size entry (the number of entries in the cross-reference table) and the /Root entry (the object number of the document catalog, which is the root element of the graph of objects in the body).
There follows a line with just the startxref keyword, a line with a single number (the byte offset of the start of the cross-reference table within the file), and then the line %%EOF, which signals the end of the PDF file.

trailer          %Trailer keyword
<<               %The trailer dictionary
/Root 5 0 R
/Size 6
>>
startxref        %startxref keyword
459              %Byte offset of cross-reference table
%%EOF            %End-of-file marker
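
A reader typically locates the cross-reference table by scanning backward from the end of the file for the startxref keyword. The following sketch shows one way to do that on a buffer holding the tail of the file. The function find_startxref is a hypothetical name (not part of PDFio), and the sketch assumes the buffer is nul-terminated.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of PDFio): search backward through a
// nul-terminated buffer for "startxref" and return the byte offset that
// follows it, or -1 if the keyword or the number is missing.
long
find_startxref(const char *buf, size_t len)
{
  static const char keyword[] = "startxref";
  size_t            klen = sizeof(keyword) - 1;
  size_t            i;

  if (len < klen)
    return (-1);

  // i - 1 is the candidate start position, from the last possible one down to 0
  for (i = len - klen + 1; i > 0; i --)
  {
    if (memcmp(buf + i - 1, keyword, klen) == 0)
    {
      const char *start = buf + i - 1 + klen;
      char       *end;
      long       offset = strtol(start, &end, 10);  // Skips leading whitespace

      return (end == start ? -1 : offset);
    }
  }

  return (-1);
}
```

For the trailer shown above this returns 459, the offset of the xref keyword in the example file.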

How a PDF File is Read

To read a PDF file, converting it from a flat series of bytes into a graph of objects in memory, the following steps might typically occur:

Read the PDF header from the beginning of the file, checking that this is, indeed, a PDF document and retrieving its version number.
The end-of-file marker is now found, by searching backward from the end of the file. The trailer dictionary can now be read, and the byte offset of the start of the cross-reference table retrieved.
The cross-reference table can now be read. We now know where each object in the file is.
At this stage, all the objects can be read and parsed, or we can leave this process until each object is actually needed, reading it on demand.
We can now use the data, extracting the pages, parsing graphical content, extracting metadata, and so on.
This is not an exhaustive description, since there are many possible complications (encryption, linearization, object streams, and cross-reference streams).

How a PDF File is Written

Writing a PDF document to a series of bytes in a file is much simpler than reading it—we don’t need to support all of the PDF format, just the subset we intend to use. Writing a PDF file is very fast, since it amounts to little more than flattening the object graph to a series of bytes.

Output the header.
Remove any objects which are not referenced by any other object in the PDF. This avoids writing objects which are no longer needed.
Renumber the objects so they run from 1 to n where n is the number of objects in the file.
Output the objects one by one, starting with object number one, recording the byte offset of each for the cross-reference table.
Write the cross-reference table.
Write the trailer, trailer dictionary, and end-of-file marker.
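
The steps above can be sketched in plain C as follows. This is a simplified illustration and not how PDFio itself is implemented: it skips the pruning and renumbering steps, writes into a memory buffer, and assumes the last object body passed in is the document catalog. The names append, write_simple_pdf, and MAX_OBJS are all hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

#define MAX_OBJS 16  // Arbitrary limit for this sketch

// Append formatted text to buf at *pos; returns 0 if it would not fit
static int
append(char *buf, size_t bufsize, size_t *pos, const char *format, ...)
{
  va_list ap;
  int     n;

  va_start(ap, format);
  n = vsnprintf(buf + *pos, bufsize - *pos, format, ap);
  va_end(ap);

  if (n < 0 || (size_t)n >= bufsize - *pos)
    return (0);

  *pos += (size_t)n;
  return (1);
}

// Serialize object bodies as objects 1..n and return the number of bytes
// written, or 0 on error.  Assumes the last body is the catalog.
size_t
write_simple_pdf(char *buf, size_t bufsize, const char *bodies[], size_t num_bodies)
{
  size_t pos = 0;            // Current write position
  size_t offsets[MAX_OBJS];  // Byte offset of each object
  size_t xref_offset;        // Byte offset of the cross-reference table
  size_t i;

  if (num_bodies == 0 || num_bodies > MAX_OBJS || bufsize == 0)
    return (0);

  // Output the header
  if (!append(buf, bufsize, &pos, "%%PDF-1.0\n"))
    return (0);

  // Output the objects one by one, recording the byte offset of each
  for (i = 0; i < num_bodies; i ++)
  {
    offsets[i] = pos;
    if (!append(buf, bufsize, &pos, "%u 0 obj\n%s\nendobj\n", (unsigned)(i + 1), bodies[i]))
      return (0);
  }

  // Write the cross-reference table; each entry is exactly 20 bytes
  xref_offset = pos;
  if (!append(buf, bufsize, &pos, "xref\n0 %u\n0000000000 65535 f \n", (unsigned)(num_bodies + 1)))
    return (0);

  for (i = 0; i < num_bodies; i ++)
  {
    if (!append(buf, bufsize, &pos, "%010lu 00000 n \n", (unsigned long)offsets[i]))
      return (0);
  }

  // Write the trailer, trailer dictionary, and end-of-file marker
  if (!append(buf, bufsize, &pos, "trailer\n<<\n/Root %u 0 R\n/Size %u\n>>\nstartxref\n%lu\n%%%%EOF\n", (unsigned)num_bodies, (unsigned)(num_bodies + 1), (unsigned long)xref_offset))
    return (0);

  return (pos);
}
```

Because the offsets are recorded while writing, the cross-reference table never has to be computed separately, which is why writing is so much simpler than reading.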

Reading PDF Files

You open an existing PDF file using the pdfioFileOpen function:

pdfio_file_t *pdf = pdfioFileOpen("myinputfile.pdf", password_cb, password_data,
                                  error_cb, error_data);

where the five arguments to the function are the filename ("myinputfile.pdf"), an optional password callback function (password_cb) and data pointer value (password_data), and an optional error callback function (error_cb) and data pointer value (error_data). The password callback is called for encrypted PDF files that are not using the default password, for example:

const char *
password_cb(void *data, const char *filename)
{
  (void)data;     // This callback doesn't use the data pointer
  (void)filename; // This callback doesn't use the filename

  // Return a password string for the file...
  return ("Password42");
}

The error callback is called for both errors and warnings and accepts the pdfio_file_t pointer, a message string, and the callback pointer value, for example:

bool
error_cb(pdfio_file_t *pdf, const char *message, void *data)
{
  (void)data; // This callback does not use the data pointer

  fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);

  // Return false to treat warnings as errors
  return (false);
}

The default error callback (NULL) does the equivalent of the above.

Each PDF file contains one or more pages. The pdfioFileGetNumPages function returns the number of pages in the file while the pdfioFileGetPage function gets the specified page in the PDF file:

pdfio_file_t *pdf;   // PDF file
size_t       i;      // Looping var
size_t       count;  // Number of pages
pdfio_obj_t  *page;  // Current page

// Iterate the pages in the PDF file
for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
{
  page = pdfioFileGetPage(pdf, i);
  // do something with page
}

Each page is represented by a "page tree" object (what pdfioFileGetPage returns) that specifies information about the page and one or more "content" objects that contain the images, fonts, text, and graphics that appear on the page. Use the pdfioPageGetNumStreams and pdfioPageOpenStream functions to access the content streams for each page, and pdfioObjGetDict to get the associated page object dictionary. For example, if you want to display the media and crop boxes for a given page:

pdfio_file_t  *pdf;             // PDF file
size_t        i;                // Looping var
size_t        count;            // Number of pages
pdfio_obj_t   *page;            // Current page
pdfio_dict_t  *dict;            // Current page dictionary
pdfio_array_t *media_box;       // MediaBox array
double        media_values[4];  // MediaBox values
pdfio_array_t *crop_box;        // CropBox array
double        crop_values[4];   // CropBox values

// Iterate the pages in the PDF file
for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
{
  page = pdfioFileGetPage(pdf, i);
  dict = pdfioObjGetDict(page);

  media_box       = pdfioDictGetArray(dict, "MediaBox");
  media_values[0] = pdfioArrayGetNumber(media_box, 0);
  media_values[1] = pdfioArrayGetNumber(media_box, 1);
  media_values[2] = pdfioArrayGetNumber(media_box, 2);
  media_values[3] = pdfioArrayGetNumber(media_box, 3);

  crop_box       = pdfioDictGetArray(dict, "CropBox");
  crop_values[0] = pdfioArrayGetNumber(crop_box, 0);
  crop_values[1] = pdfioArrayGetNumber(crop_box, 1);
  crop_values[2] = pdfioArrayGetNumber(crop_box, 2);
  crop_values[3] = pdfioArrayGetNumber(crop_box, 3);

  printf("Page %u: MediaBox=[%g %g %g %g], CropBox=[%g %g %g %g]\n",
         (unsigned)(i + 1),
         media_values[0], media_values[1], media_values[2], media_values[3],
         crop_values[0], crop_values[1], crop_values[2], crop_values[3]);
}

Page object dictionaries have several (mostly optional) key/value pairs, including:

"Annots": An array of annotation dictionaries for the page; use pdfioDictGetArray to get the array
"CropBox": The crop box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use pdfioDictGetArray to get a pointer to the array of numbers
"Dur": The number of seconds the page should be displayed; use pdfioDictGetNumber to get the page duration value
"Group": The dictionary of transparency group values for the page; use pdfioDictGetDict to get a pointer to the group dictionary
"LastModified": The date and time when this page was last modified; use pdfioDictGetDate to get the Unix time_t value
"Parent": The parent page tree node object for this page; use pdfioDictGetObj to get a pointer to the object
"MediaBox": The media box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use pdfioDictGetArray to get a pointer to the array of numbers
"Resources": The dictionary of resources for the page; use pdfioDictGetDict to get a pointer to the resources dictionary
"Rotate": A number indicating the number of degrees of clockwise rotation (a multiple of 90) to apply to the page when viewing; use pdfioDictGetNumber to get the rotation angle
"Thumb": A thumbnail image object for the page; use pdfioDictGetObj to get a pointer to the thumbnail image object
"Trans": The page transition dictionary; use pdfioDictGetDict to get a pointer to the dictionary

The pdfioFileClose function closes a PDF file and frees all memory that was used for it:

pdfioFileClose(pdf);

Writing PDF Files

You create a new PDF file using the pdfioFileCreate function:

pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 };  // US Letter
pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // w/0.5" margins

pdfio_file_t *pdf = pdfioFileCreate("myoutputfile.pdf", "2.0", &media_box, &crop_box, error_cb, error_data);

where the six arguments to the function are the filename ("myoutputfile.pdf"), PDF version ("2.0"), media box (media_box), crop box (crop_box), an optional error callback function (error_cb), and an optional pointer value for the error callback function (error_data). The units for the media and crop boxes are points (1/72nd of an inch).
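
Since the boxes are given in points, it can help to derive them from physical units. The sketch below is a self-contained illustration: rect_t is a stand-in type that uses the same left, bottom, right, top order as the initializers above, and the function names are hypothetical rather than part of PDFio.

```c
// Stand-in rectangle type for this sketch (left, bottom, right, top)
typedef struct
{
  double x1, y1, x2, y2;
} rect_t;

// Convert inches to points (72 points per inch)
double
inches_to_points(double inches)
{
  return (inches * 72.0);
}

// Convert millimeters to points (25.4 mm per inch)
double
mm_to_points(double mm)
{
  return (mm * 72.0 / 25.4);
}

// Shrink a rectangle by the same margin (in points) on all four sides
rect_t
inset_rect(rect_t rect, double margin)
{
  rect.x1 += margin;
  rect.y1 += margin;
  rect.x2 -= margin;
  rect.y2 -= margin;

  return (rect);
}
```

Insetting the US Letter media box by inches_to_points(0.5) gives the crop box used above, and mm_to_points(210.0) is roughly 595.28, the width of an A4 page in points.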

Alternatively, you can stream a PDF file using the pdfioFileCreateOutput function:

pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 };  // US Letter
pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // w/0.5" margins

pdfio_file_t *pdf = pdfioFileCreateOutput(output_cb, output_ctx, "2.0", &media_box, &crop_box, error_cb, error_data);

Once the file is created, use the pdfioFileCreateObj, pdfioFileCreatePage, and pdfioPageCopy functions to create objects and pages in the file.

Finally, the pdfioFileClose function writes the PDF cross-reference and "trailer" information, closes the file, and frees all memory that was used for it.

PDF Objects

PDF objects are identified using two numbers - the object number (1 to N) and the object generation (0 to 65535) that specifies a particular version of an object. An object's numbers are returned by the pdfioObjGetNumber and pdfioObjGetGeneration functions. You can find a numbered object using the pdfioFileFindObj function.

Objects contain values (typically dictionaries) and usually an associated data stream containing images, fonts, ICC profiles, and page content. PDFio provides several accessor functions to get the value(s) associated with an object:

pdfioObjGetArray returns an object's array value, if any
pdfioObjGetDict returns an object's dictionary value, if any
pdfioObjGetLength returns the length of the data stream, if any
pdfioObjGetSubtype returns the sub-type name of the object, for example "Image" for an image object.
pdfioObjGetType returns the type name of the object, for example "XObject" for an image object.

PDF Streams

Some PDF objects have an associated data stream, such as for pages, images, ICC color profiles, and fonts. You access the stream for an existing object using the pdfioObjOpenStream function:

pdfio_file_t *pdf = pdfioFileOpen(...);
pdfio_obj_t *obj = pdfioFileFindObj(pdf, number);
pdfio_stream_t *st = pdfioObjOpenStream(obj, true);

The first argument is the object pointer. The second argument is a boolean value that specifies whether you want to decode (typically decompress) the stream data or return it as-is.

When reading a page stream you'll use the pdfioPageOpenStream function instead:

pdfio_file_t *pdf = pdfioFileOpen(...);
pdfio_obj_t *obj = pdfioFileGetPage(pdf, number);
pdfio_stream_t *st = pdfioPageOpenStream(obj, 0, true);

Once you have the stream open, you can use one of several functions to read from it:

pdfioStreamConsume reads and discards a number of bytes in the stream
pdfioStreamGetToken reads a PDF token from the stream
pdfioStreamPeek peeks at the next stream data without advancing or "consuming" it
pdfioStreamRead reads a buffer of data

When you are done reading from the stream, call the pdfioStreamClose function:

pdfioStreamClose(st);

To create a stream for a new object, call the pdfioObjCreateStream function:

pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *obj = pdfioFileCreateObj(pdf, ...);
pdfio_stream_t *st = pdfioObjCreateStream(obj, PDFIO_FILTER_FLATE);

The first argument is the newly created object. The second argument is either PDFIO_FILTER_NONE to specify that any encoding is done by your program or PDFIO_FILTER_FLATE to specify that PDFio should Flate compress the stream.

To create a page content stream call the pdfioFileCreatePage function:

pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_dict_t *dict = pdfioDictCreate(pdf);
... set page dictionary keys and values ...
pdfio_stream_t *st = pdfioFileCreatePage(pdf, dict);

Once you have created the stream, use any of the following functions to write to the stream:

pdfioStreamPrintf writes a formatted string to the stream
pdfioStreamPutChar writes a single character to the stream
pdfioStreamPuts writes a C string to the stream
pdfioStreamWrite writes a buffer of data to the stream

The PDF content helper functions provide additional functions for writing specific PDF page stream commands.

When you are done writing the stream, call pdfioStreamClose to close both the stream and the object.

PDF Content Helper Functions

PDFio includes many helper functions for embedding or writing specific kinds of content to a PDF file. These functions can be roughly grouped into five categories:

Color Space Functions

PDF color spaces are specified using well-known names like "DeviceCMYK", "DeviceGray", and "DeviceRGB" or using arrays that define so-called calibrated color spaces. PDFio provides several functions for embedding ICC profiles and creating color space arrays:

pdfioArrayCreateColorFromICCObj creates a color array for an ICC color profile object
pdfioArrayCreateColorFromMatrix creates a color array using a CIE XYZ color transform matrix, a gamma value, and a CIE XYZ white point
pdfioArrayCreateColorFromPalette creates an indexed color array from an array of sRGB values
pdfioArrayCreateColorFromPrimaries creates a color array using CIE XYZ primaries and a gamma value
pdfioArrayCreateColorFromStandard creates a color array for a standard color space

You can embed an ICC color profile using the pdfioFileCreateICCObjFromFile function:

pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *icc = pdfioFileCreateICCObjFromFile(pdf, "filename.icc");

where the first argument is the PDF file and the second argument is the filename of the ICC color profile.

PDFio also includes predefined constants for creating a few standard color spaces:

pdfio_file_t *pdf = pdfioFileCreate(...);

// Create an AdobeRGB color array
pdfio_array_t *adobe_rgb = pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_ADOBE);

// Create a Display P3 color array
pdfio_array_t *display_p3 = pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_P3_D65);

// Create an sRGB color array
pdfio_array_t *srgb = pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_SRGB);

Font Object Functions

PDF supports many kinds of fonts, including PostScript Type1, PDF Type3, TrueType/OpenType, and CID. PDFio provides two functions for creating font objects. The first is pdfioFileCreateFontObjFromBase which creates a font object for one of the base PDF fonts:

"Courier"
"Courier-Bold"
"Courier-BoldItalic"
"Courier-Italic"
"Helvetica"
"Helvetica-Bold"
"Helvetica-BoldOblique"
"Helvetica-Oblique"
"Symbol"
"Times-Bold"
"Times-BoldItalic"
"Times-Italic"
"Times-Roman"
"ZapfDingbats"

PDFio always uses the Windows CP1252 subset of Unicode for these fonts.

The second function is pdfioFileCreateFontObjFromFile which creates a font object from a TrueType/OpenType font file, for example:

pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *opensans = pdfioFileCreateFontObjFromFile(pdf, "OpenSans-Regular.ttf", false);

will embed an OpenSans Regular TrueType font using the Windows CP1252 subset of Unicode. Pass true for the third argument to embed it as a Unicode CID font instead, for example:

pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *notosansjp = pdfioFileCreateFontObjFromFile(pdf, "NotoSansJP-Regular.otf", true);

will embed the NotoSansJP Regular OpenType font with full support for Unicode.

Note: Not all fonts support Unicode.

Image Object Functions

PDF supports images with many different color spaces and bit depths with optional transparency. PDFio provides two helper functions for creating image objects that can be referenced in page streams. The first function is pdfioFileCreateImageObjFromData which creates an image object from data in memory, for example:

pdfio_file_t *pdf = pdfioFileCreate(...);
unsigned char data[1024 * 1024 * 4]; // 1024x1024 RGBA image data
pdfio_obj_t *img = pdfioFileCreateImageObjFromData(pdf, data, /*width*/1024, /*height*/1024, /*num_colors*/3, /*color_data*/NULL, /*alpha*/true, /*interpolate*/false);

will create an object for a 1024x1024 RGBA image in memory, using the default color space for 3 colors ("DeviceRGB"). We can use one of the color space functions to use a specific color space for this image, for example:

pdfio_file_t *pdf = pdfioFileCreate(...);

// Create an AdobeRGB color array
pdfio_array_t *adobe_rgb = pdfioArrayCreateColorFromMatrix(pdf, 3, pdfioAdobeRGBGamma, pdfioAdobeRGBMatrix, pdfioAdobeRGBWhitePoint);

// Create a 1024x1024 RGBA image using AdobeRGB
unsigned char data[1024 * 1024 * 4]; // 1024x1024 RGBA image data
pdfio_obj_t *img = pdfioFileCreateImageObjFromData(pdf, data, /*width*/1024, /*height*/1024, /*num_colors*/3, /*color_data*/adobe_rgb, /*alpha*/true, /*interpolate*/false);

The "interpolate" argument specifies whether the colors in the image should be smoothed/interpolated when scaling. This is most useful for photographs but should be false for screenshot and barcode images.

If you have a JPEG or PNG file, use the pdfioFileCreateImageObjFromFile function to copy the image into a PDF image object, for example:

pdfio_file_t *pdf = pdfioFileCreate(...);
pdfio_obj_t *img = pdfioFileCreateImageObjFromFile(pdf, "myphoto.jpg", /*interpolate*/true);

Page Dictionary Functions

PDF pages each have an associated dictionary to specify the images, fonts, and color spaces used by the page. PDFio provides functions to add these resources to the dictionary:

pdfioPageDictAddColorSpace adds a named color space to the page dictionary
pdfioPageDictAddFont adds a named font to the page dictionary
pdfioPageDictAddImage adds a named image to the page dictionary

Page Stream Functions

PDF page streams contain textual commands for drawing on the page. PDFio provides many functions for writing these commands with the correct format and escaping, as needed:

pdfioContentClip clips future drawing to the current path
pdfioContentDrawImage draws an image object
pdfioContentFill fills the current path
pdfioContentFillAndStroke fills and strokes the current path
pdfioContentMatrixConcat concatenates a matrix with the current transform matrix
pdfioContentMatrixRotate concatenates a rotation matrix with the current transform matrix
pdfioContentMatrixScale concatenates a scaling matrix with the current transform matrix
pdfioContentMatrixTranslate concatenates a translation matrix with the current transform matrix
pdfioContentPathClose closes the current path
pdfioContentPathCurve appends a Bezier curve to the current path
pdfioContentPathCurve13 appends a Bezier curve with 2 control points to the current path
pdfioContentPathCurve23 appends a Bezier curve with 2 control points to the current path
pdfioContentPathLineTo appends a line to the current path
pdfioContentPathMoveTo moves the current point in the current path
pdfioContentPathRect appends a rectangle to the current path
pdfioContentRestore restores a previous graphics state
pdfioContentSave saves the current graphics state
pdfioContentSetDashPattern sets the line dash pattern
pdfioContentSetFillColorDeviceCMYK sets the current fill color using a device CMYK color
pdfioContentSetFillColorDeviceGray sets the current fill color using a device gray color
pdfioContentSetFillColorDeviceRGB sets the current fill color using a device RGB color
pdfioContentSetFillColorGray sets the current fill color using a calibrated gray color
pdfioContentSetFillColorRGB sets the current fill color using a calibrated RGB color
pdfioContentSetFillColorSpace sets the current fill color space
pdfioContentSetFlatness sets the flatness for curves
pdfioContentSetLineCap sets how the ends of lines are stroked
pdfioContentSetLineJoin sets how connections between lines are stroked
pdfioContentSetLineWidth sets the width of stroked lines
pdfioContentSetMiterLimit sets the miter limit for stroked lines
pdfioContentSetStrokeColorDeviceCMYK sets the current stroke color using a device CMYK color
pdfioContentSetStrokeColorDeviceGray sets the current stroke color using a device gray color
pdfioContentSetStrokeColorDeviceRGB sets the current stroke color using a device RGB color
pdfioContentSetStrokeColorGray sets the current stroke color using a calibrated gray color
pdfioContentSetStrokeColorRGB sets the current stroke color using a calibrated RGB color
pdfioContentSetStrokeColorSpace sets the current stroke color space
pdfioContentSetTextCharacterSpacing sets the spacing between characters for text
pdfioContentSetTextFont sets the font and size for text
pdfioContentSetTextLeading sets the line height for text
pdfioContentSetTextMatrix concatenates a matrix with the current text matrix
pdfioContentSetTextRenderingMode sets the text rendering mode
pdfioContentSetTextRise adjusts the baseline for text
pdfioContentSetTextWordSpacing sets the spacing between words for text
pdfioContentSetTextXScaling sets the horizontal scaling for text
pdfioContentStroke strokes the current path
pdfioContentTextBegin begins a block of text
pdfioContentTextEnd ends a block of text
pdfioContentTextMoveLine moves to the next line with an offset in a text block
pdfioContentTextMoveTo moves within the current line in a text block
pdfioContentTextNewLine moves to the beginning of the next line in a text block
pdfioContentTextNewLineShow moves to the beginning of the next line in a text block and shows literal text with optional word and character spacing
pdfioContentTextNewLineShowf moves to the beginning of the next line in a text block and shows formatted text with optional word and character spacing
pdfioContentTextShow draws a literal string in a text block
pdfioContentTextShowf draws a formatted string in a text block
pdfioContentTextShowJustified draws an array of literal strings with offsets between them

Examples

Read PDF Metadata

The following example function will open a PDF file and print the title, author, creation date, and number of pages:

#include <pdfio.h>
#include <stdio.h>
#include <time.h>


void
show_pdf_info(const char *filename)
{
  pdfio_file_t *pdf;
  time_t       creation_date;
  struct tm    *creation_tm;
  char         creation_text[256];


  // Open the PDF file with the default callbacks...
  pdf = pdfioFileOpen(filename, /*password_cb*/NULL, /*password_cbdata*/NULL, /*error_cb*/NULL, /*error_cbdata*/NULL);
  if (pdf == NULL)
    return;

  // Get the creation date and convert to a string...
  creation_date = pdfioFileGetCreationDate(pdf);
  creation_tm   = localtime(&creation_date);
  strftime(creation_text, sizeof(creation_text), "%c", creation_tm);

  // Print file information to stdout...
  printf("%s:\n", filename);
  printf("         Title: %s\n", pdfioFileGetTitle(pdf));
  printf("        Author: %s\n", pdfioFileGetAuthor(pdf));
  printf("    Created On: %s\n", creation_text);
  printf("  Number Pages: %u\n", (unsigned)pdfioFileGetNumPages(pdf));

  // Close the PDF file...
  pdfioFileClose(pdf);
}

Create PDF File With Text and Image

The following example function will create a PDF file, embed a base font and the named JPEG or PNG image file, and then create a page with the image centered on the page and the caption text centered below it:

#include <pdfio.h>
#include <pdfio-content.h>
#include <string.h>


void
create_pdf_image_file(const char *pdfname, const char *imagename, const char *caption)
{
  pdfio_file_t   *pdf;
  pdfio_obj_t    *font;
  pdfio_obj_t    *image;
  pdfio_dict_t   *dict;
  pdfio_stream_t *page;
  double         width, height;
  double         swidth, sheight;
  double         tx, ty;


  // Create the PDF file...
  pdf = pdfioFileCreate(pdfname, /*version*/NULL, /*media_box*/NULL, /*crop_box*/NULL, /*error_cb*/NULL, /*error_cbdata*/NULL);

  // Create a Courier base font for the caption
  font = pdfioFileCreateFontObjFromBase(pdf, "Courier");

  // Create an image object from the JPEG/PNG image file...
  image = pdfioFileCreateImageObjFromFile(pdf, imagename, true);

  // Create a page dictionary with the font and image...
  dict = pdfioDictCreate(pdf);
  pdfioPageDictAddFont(dict, "F1", font);
  pdfioPageDictAddImage(dict, "IM1", image);

  // Create the page and its content stream...
  page = pdfioFileCreatePage(pdf, dict);

  // Position and scale the image on the page...
  width  = pdfioImageGetWidth(image);
  height = pdfioImageGetHeight(image);

  // Default media_box is "universal" 595.28x792 points (8.27x11in or 210x279mm)
  // Use margins of 36 points (0.5in or 12.7mm) with another 36 points for the
  // caption underneath...
  swidth  = 595.28 - 72.0;
  sheight = swidth * height / width;
  if (sheight > (792.0 - 36.0 - 72.0))
  {
    sheight = 792.0 - 36.0 - 72.0;
    swidth  = sheight * width / height;
  }

  tx = 0.5 * (595.28 - swidth);
  ty = 0.5 * (792 - 36 - sheight);

  pdfioContentDrawImage(page, "IM1", tx, ty + 36.0, swidth, sheight);

  // Draw the caption in black...
  pdfioContentSetFillColorDeviceGray(page, 0.0);

  // Compute the starting point for the text - Courier is monospaced with a
  // nominal width of 0.6 times the text height...
  tx = 0.5 * (595.28 - 18.0 * 0.6 * strlen(caption));

  // Position and draw the caption underneath...
  pdfioContentTextBegin(page);
  pdfioContentSetTextFont(page, "F1", 18.0);
  pdfioContentTextMoveTo(page, tx, ty);
  pdfioContentTextShow(page, /*unicode*/false, caption);
  pdfioContentTextEnd(page);

  // Close the page stream and the PDF file...
  pdfioStreamClose(page);
  pdfioFileClose(pdf);
}
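
The scaling arithmetic in the example can be isolated into a small helper. The following is a sketch of the same "fit inside a box while preserving the aspect ratio" calculation. The fit_size_t type and fit_image function are hypothetical names used only for illustration.

```c
// Hypothetical result type for this sketch (not a PDFio type)
typedef struct
{
  double width, height;
} fit_size_t;

// Scale an image of the given size to fit within max_width x max_height
// while preserving its aspect ratio.
fit_size_t
fit_image(double width, double height, double max_width, double max_height)
{
  fit_size_t size;

  // Start by using the full available width...
  size.width  = max_width;
  size.height = max_width * height / width;

  // ...and fall back to the full available height if that is too tall
  if (size.height > max_height)
  {
    size.height = max_height;
    size.width  = max_height * width / height;
  }

  return (size);
}
```

With the page geometry used above (595.28 - 72.0 points wide and 792.0 - 36.0 - 72.0 points tall), a 500x1000 pixel image is limited by height and comes out at 342x684 points.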
