47 Commits

Author SHA1 Message Date
458f366d78 Fix some Unicode font embedding issues:
- Reworked Widths array compression for CID fonts to require at least 4 repeated
  widths.
- Fixed the embedded CMap for Unicode fonts.
2025-03-06 17:09:27 -05:00
4165cd23ba Fix some issues discovered by some PDF checking tools:
- Extremely small floating point numbers would be written with exponential
  notation by the pdfioContent functions.  They are now written with up to 6
  decimal places of precision with excess trailing 0's removed.
- 8-bit (simple) TrueType fonts were embedded without a Widths array, which
  made Acrobat Reader sad but nobody else...
- Switched to using the WinANSI base encoding, which is CP1252.
2025-03-06 16:04:00 -05:00
7e56d26ff8 Prep for release. 2025-03-06 14:41:34 -05:00
712b213ec6 Enable libpng tests in testpdfio, too. 2025-03-06 14:41:38 -05:00
b7b6655db0 Update dependencies on Windows to include libpng. 2025-03-06 14:37:44 -05:00
e9debcd169 Add some more range checking to the cmap code. 2025-03-06 14:16:38 -05:00
2f925ccd3c Update documentation and pdf2text example (Issue #95) 2025-03-06 12:40:19 -05:00
89c2a75376 Fix a potential heap overflow in the TrueType cmap code. 2025-02-24 10:55:28 -05:00
1237599dea Clean up some compiler warnings. 2025-02-22 19:48:09 -05:00
6e2e4bbcc6 Remove unnecessary length remaining check. 2025-02-22 11:04:31 -05:00
d535067c91 Fix pkg-config dependencies. 2025-02-22 08:30:38 -05:00
e996898b57 Back out object stream changes, as they would require much more significant
reworking of the "write value" private API that I don't want to do right now.
2025-02-21 16:57:01 -05:00
aa6a20c042 Lay the groundwork for object streams. 2025-02-21 15:33:27 -05:00
f09105dd3f Add support for writing the PCLm subset of PDF (Issue #99) 2025-02-20 18:18:53 -05:00
5be5552b2b Turn write_obj_header into private API. 2025-02-20 17:37:31 -05:00
492a4f51b2 Allocate stream compression buffer. 2025-02-16 13:20:51 -05:00
44827bac1a Cleanup. 2025-02-16 12:40:39 -05:00
3fad0d6f15 Support xref streams with encrypted output. 2025-02-16 12:35:45 -05:00
aeee24b856 Add xref stream support (Issue #10) 2025-02-15 21:54:16 -05:00
8d72f22efe Add support for 'repairing' damaged PDF files (Issue #45) 2025-02-15 17:26:23 -05:00
77117ac789 Update MD5 code with proper coding style/documentation for this project. 2025-02-15 13:35:54 -05:00
fceb5a807d Update AES code with proper coding style/documentation for this project. 2025-02-15 12:56:27 -05:00
4f123c2a01 Update makesrcdist script to report all issues before exiting and fix major/minor version checks. 2025-02-15 12:30:19 -05:00
c4c8fa6036 Make sure we have all the version numbers in pdfio.h. 2025-02-15 12:25:09 -05:00
9a5c5ec65d Add support for the sRGB chunk in PNG files in addition to the cHRM and gAMA
chunks.
2025-02-14 14:51:06 -05:00
3f4308b68d Add ICC support to PNG files. 2025-02-14 14:37:08 -05:00
9e930a7c5d Add new pdfioFileCreateICCObjFromData API to DLL exports. 2025-02-14 13:23:01 -05:00
afa010cea2 Add ICC color profile support for JPEG files (Issue #7) 2025-02-14 13:22:30 -05:00
c26b200a83 Add missing symbol to DLL. 2025-02-13 19:27:04 -05:00
eff02198ab Clean up pdfioinfo example changes. 2025-02-13 19:25:44 -05:00
5f98c7838c Rename pdfioFileGetModDate to pdfioFileGetModificationDate.
Add pdfioFileSetModificationDate API.

Update DLL exports file.

Update docos and changelog.
2025-02-13 18:56:43 -05:00
4f880bc0c1 Merge pull request #88 from tlaronde/info
Extend by adding pdfioGetModDate and extend the pdfioinfo example
2025-02-13 18:47:28 -05:00
d032483ed4 Merge branch 'michaelrsweet:master' into info 2025-02-12 15:54:47 +01:00
b2fc82f3a8 Update CI dependencies.
Add libpng_native to VC++ projects.
2025-02-12 09:25:57 -05:00
b81d01f319 Fix builds without libpng. 2025-02-11 22:59:23 -05:00
1b35321615 Add PngSuite to testpdfio (Issue #90) 2025-02-11 22:54:59 -05:00
990342f2a5 Add masking, color space, and variable bit depth support (Issue #90) 2025-02-11 22:07:02 -05:00
7f5fc456bc Fix image dictionary for new libpng-based PNG image support (Issue #90) 2025-02-11 20:23:59 -05:00
7c527cc908 Fix pdfio-512.png file. 2025-02-11 20:23:28 -05:00
41d17fc4e3 Update version number in NuGet files. 2025-02-11 20:23:17 -05:00
4e89137689 Use pkg-config for compiler options.
Fix some issues with the image2pdf example code.
2025-02-11 20:22:36 -05:00
e686669b9d Save work on libpng PNG loader (Issue #90) 2025-02-10 21:25:59 -05:00
1e5cc6ffd5 Do cleanup of PNG loading code, in preparation of adding full support (Issue #90) 2025-02-10 15:54:29 -05:00
4f1b373232 Add PngSuite from http://www.schaik.com/pngsuite/ for testing PNG image
support (Issue #90)
2025-02-10 11:04:39 -05:00
6f4bfe107f Refactor pdfioFileCreateImageObjFromData to do the image writing in a separate
function (Issue #90)
2025-02-10 10:28:28 -05:00
5b5de3aff6 Update pdf2txt example to support font encodings. 2025-01-28 14:26:33 -05:00
8b2b013b36 Extend by adding pdfioGetModDate and extend the pdfioinfo example
When exploring a PDF, it may be convenient to have the typical
information delivered by some "Document Properties"---and some more
about the MediaBox(es).

So just add the function to get the ModDate and extend the
pdfioinfo example, both as an example of what the library has
and as a debugging tool.

Signed-off-by: Thierry LARONDE <tlaronde@kergis.com>
2025-01-18 11:25:36 +01:00
118 changed files with 4319 additions and 1175 deletions
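
Commit 4165cd23ba above describes writing very small floating point numbers with up to 6 decimal places and no trailing zeros instead of exponential notation. A minimal sketch of that idea follows; `format_number` is a hypothetical helper written for illustration, not PDFio's actual code, and it assumes the default "C" locale so that the decimal separator is a period.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not part of PDFio): format "value" with at most 6
// decimal places, never using exponential notation, and strip any trailing
// zeros.  Assumes "buffer" is large enough for the formatted value.
void
format_number(double value, char *buffer, size_t bufsize)
{
  char *ptr;                            // Pointer into buffer

  assert(buffer != NULL && bufsize > 0);

  // "%.6f" never produces an exponent, unlike "%g"...
  snprintf(buffer, bufsize, "%.6f", value);

  if (strchr(buffer, '.') != NULL)
  {
    // Remove trailing zeros, then a dangling decimal point...
    ptr = buffer + strlen(buffer) - 1;

    while (*ptr == '0')
      *ptr-- = '\0';

    if (*ptr == '.')
      *ptr = '\0';
  }

  // Tiny negative values round to "-0"; write plain "0" instead...
  if (!strcmp(buffer, "-0"))
    strcpy(buffer, "0");
}
```

With this sketch a value such as 1e-10 is written as `0` and 0.5 as `0.5`, which keeps the number within the plain decimal syntax that PDF content streams expect.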


@ -17,7 +17,7 @@ jobs:
- name: Update Build Environment
run: sudo apt-get update --fix-missing -y
- name: Install Prerequisites
run: sudo apt-get install -y cppcheck zlib1g-dev
run: sudo apt-get install -y cppcheck zlib1g-dev libpng-dev
- name: Configure PDFio
run: ./configure --enable-debug --enable-sanitizer --enable-maintainer
- name: Build PDFio


@ -32,7 +32,7 @@ jobs:
run: sudo apt-get update --fix-missing -y
- name: Install Prerequisites
run: sudo apt-get install -y zlib1g-dev
run: sudo apt-get install -y zlib1g-dev libpng-dev
- name: Initialize CodeQL
uses: github/codeql-action/init@v2


@ -12,7 +12,7 @@ jobs:
- name: Update Build Environment
run: sudo apt-get update --fix-missing -y
- name: Install Prerequisites
run: sudo apt-get install -y zlib1g-dev
run: sudo apt-get install -y zlib1g-dev libpng-dev
- name: Download Coverity Build Tool
run: |
wget -q https://scan.coverity.com/download/linux64 --post-data token="$TOKEN&project=$GITHUB_REPOSITORY" -O cov-analysis-linux64.tar.gz


@ -1,6 +1,26 @@
Changes in PDFio
================
v1.5.0 - 2025-03-06
-------------------
- Added support for embedded color profiles in JPEG images (Issue #7)
- Added `pdfioFileCreateICCObjFromData` API.
- Added support for writing cross-reference streams for PDF 1.5 and newer files
(Issue #10)
- Added `pdfioFileGetModificationDate()` API (Issue #88)
- Added support for using libpng to embed PNG images in PDF output (Issue #90)
- Added support for writing the PCLm subset of PDF (Issue #99)
- Now support opening damaged PDF files (Issue #45)
- Updated documentation (Issue #95)
- Updated the pdf2txt example to support font encodings.
- Fixed potential heap/integer overflow issues in the TrueType cmap code.
- Fixed an output issue for extremely small `double` values with the
`pdfioContent` APIs.
- Fixed a missing Widths array issue for embedded TrueType fonts.
- Fixed some Unicode font embedding issues.
v1.4.1 - 2025-01-24
-------------------


@ -15,7 +15,7 @@
.SILENT:
# Version number...
# Version numbers...
PDFIO_VERSION = @PDFIO_VERSION@
PDFIO_VERSION_MAJOR = @PDFIO_VERSION_MAJOR@
PDFIO_VERSION_MINOR = @PDFIO_VERSION_MINOR@

77
configure vendored

@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.71 for pdfio 1.4.1.
# Generated by GNU Autoconf 2.71 for pdfio 1.5.0.
#
# Report bugs to <https://github.com/michaelrsweet/pdfio/issues>.
#
@ -610,8 +610,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='pdfio'
PACKAGE_TARNAME='pdfio'
PACKAGE_VERSION='1.4.1'
PACKAGE_STRING='pdfio 1.4.1'
PACKAGE_VERSION='1.5.0'
PACKAGE_STRING='pdfio 1.5.0'
PACKAGE_BUGREPORT='https://github.com/michaelrsweet/pdfio/issues'
PACKAGE_URL='https://www.msweet.org/pdfio'
@ -653,6 +653,7 @@ WARNINGS
CSFLAGS
LIBPDFIO_STATIC
LIBPDFIO
PKGCONFIG_LIBPNG
PKGCONFIG_REQUIRES
PKGCONFIG_LIBS_PRIVATE
PKGCONFIG_LIBS
@ -729,6 +730,7 @@ SHELL'
ac_subst_files=''
ac_user_opts='
enable_option_checking
enable_libpng
enable_static
enable_shared
enable_debug
@ -1293,7 +1295,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures pdfio 1.4.1 to adapt to many kinds of systems.
\`configure' configures pdfio 1.5.0 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@ -1359,7 +1361,7 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of pdfio 1.4.1:";;
short | recursive ) echo "Configuration of pdfio 1.5.0:";;
esac
cat <<\_ACEOF
@ -1367,6 +1369,8 @@ Optional Features:
--disable-option-checking ignore unrecognized --enable/--with options
--disable-FEATURE do not include FEATURE (same as --enable-FEATURE=no)
--enable-FEATURE[=ARG] include FEATURE [ARG=yes]
--enable-libpng use libpng for pdfioFileCreateImageObjFromFile,
default=auto
--disable-static do not install static library
--enable-shared install shared library
--enable-debug turn on debugging, default=no
@ -1456,7 +1460,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
pdfio configure 1.4.1
pdfio configure 1.5.0
generated by GNU Autoconf 2.71
Copyright (C) 2021 Free Software Foundation, Inc.
@ -1612,7 +1616,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by pdfio $as_me 1.4.1, which was
It was created by pdfio $as_me 1.5.0, which was
generated by GNU Autoconf 2.71. Invocation command line was
$ $0$ac_configure_args_raw
@ -2368,9 +2372,9 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
PDFIO_VERSION="1.4.1"
PDFIO_VERSION_MAJOR="`echo 1.4.1 | awk -F. '{print $1}'`"
PDFIO_VERSION_MINOR="`echo 1.4.1 | awk -F. '{printf("%d\n",$2);}'`"
PDFIO_VERSION="1.5.0"
PDFIO_VERSION_MAJOR="`echo 1.5.0 | awk -F. '{print $1}'`"
PDFIO_VERSION_MINOR="`echo 1.5.0 | awk -F. '{printf("%d\n",$2);}'`"
@ -4099,6 +4103,55 @@ fi
fi
# Check whether --enable-libpng was given.
if test ${enable_libpng+y}
then :
enableval=$enable_libpng;
fi
PKGCONFIG_LIBPNG=""
if test "x$PKGCONFIG" != x -a x$enable_libpng != xno
then :
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for libpng-1.6.x" >&5
printf %s "checking for libpng-1.6.x... " >&6; }
if $PKGCONFIG --exists libpng16
then :
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
printf "%s\n" "yes" >&6; };
printf "%s\n" "#define HAVE_LIBPNG 1" >>confdefs.h
CPPFLAGS="$($PKGCONFIG --cflags libpng16) -DHAVE_LIBPNG=1 $CPPFLAGS"
LIBS="$($PKGCONFIG --libs libpng16) -lz $LIBS"
PKGCONFIG_LIBS_PRIVATE="$($PKGCONFIG --libs libpng16) $PKGCONFIG_LIBS_PRIVATE"
PKGCONFIG_REQUIRES="libpng >= 1.6,$PKGCONFIG_REQUIRES"
else $as_nop
{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
printf "%s\n" "no" >&6; };
if test x$enable_libpng = xyes
then :
as_fn_error $? "libpng-dev 1.6 or later required for --enable-libpng." "$LINENO" 5
fi
fi
elif test x$enable_libpng = xyes
then :
as_fn_error $? "libpng-dev 1.6 or later required for --enable-libpng." "$LINENO" 5
fi
# Check whether --enable-static was given.
if test ${enable_static+y}
then :
@ -4935,7 +4988,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by pdfio $as_me 1.4.1, which was
This file was extended by pdfio $as_me 1.5.0, which was
generated by GNU Autoconf 2.71. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@ -4991,7 +5044,7 @@ ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config='$ac_cs_config_escaped'
ac_cs_version="\\
pdfio config.status 1.4.1
pdfio config.status 1.5.0
configured by $0, generated by GNU Autoconf 2.71,
with options \\"\$ac_cs_config\\"


@ -1,7 +1,7 @@
dnl
dnl Configuration script for PDFio
dnl
dnl Copyright © 2023-2024 by Michael R Sweet
dnl Copyright © 2023-2025 by Michael R Sweet
dnl
dnl Licensed under Apache License v2.0. See the file "LICENSE" for more
dnl information.
@ -21,7 +21,7 @@ AC_PREREQ([2.70])
dnl Package name and version...
AC_INIT([pdfio], [1.4.1], [https://github.com/michaelrsweet/pdfio/issues], [pdfio], [https://www.msweet.org/pdfio])
AC_INIT([pdfio], [1.5.0], [https://github.com/michaelrsweet/pdfio/issues], [pdfio], [https://www.msweet.org/pdfio])
PDFIO_VERSION="AC_PACKAGE_VERSION"
PDFIO_VERSION_MAJOR="`echo AC_PACKAGE_VERSION | awk -F. '{print $1}'`"
@ -121,6 +121,32 @@ AS_IF([$PKGCONFIG --exists zlib], [
])
dnl libpng...
AC_ARG_ENABLE([libpng], AS_HELP_STRING([--enable-libpng], [use libpng for pdfioFileCreateImageObjFromFile, default=auto]))
PKGCONFIG_LIBPNG=""
AC_SUBST([PKGCONFIG_LIBPNG])
AS_IF([test "x$PKGCONFIG" != x -a x$enable_libpng != xno], [
AC_MSG_CHECKING([for libpng-1.6.x])
AS_IF([$PKGCONFIG --exists libpng16], [
AC_MSG_RESULT([yes]);
AC_DEFINE([HAVE_LIBPNG], 1, [Have PNG library?])
CPPFLAGS="$($PKGCONFIG --cflags libpng16) -DHAVE_LIBPNG=1 $CPPFLAGS"
LIBS="$($PKGCONFIG --libs libpng16) -lz $LIBS"
PKGCONFIG_LIBS_PRIVATE="$($PKGCONFIG --libs libpng16) $PKGCONFIG_LIBS_PRIVATE"
PKGCONFIG_REQUIRES="libpng >= 1.6,$PKGCONFIG_REQUIRES"
], [
AC_MSG_RESULT([no]);
AS_IF([test x$enable_libpng = xyes], [
AC_MSG_ERROR([libpng-dev 1.6 or later required for --enable-libpng.])
])
])
], [test x$enable_libpng = xyes], [
AC_MSG_ERROR([libpng-dev 1.6 or later required for --enable-libpng.])
])
dnl Library target...
AC_ARG_ENABLE([static], AS_HELP_STRING([--disable-static], [do not install static library]))
AC_ARG_ENABLE([shared], AS_HELP_STRING([--enable-shared], [install shared library]))

Binary file not shown.



@ -1,4 +1,4 @@
.TH pdfio 3 "pdf read/write library" "2025-01-24" "pdf read/write library"
.TH pdfio 3 "pdf read/write library" "2025-03-06" "pdf read/write library"
.SH NAME
pdfio \- pdf read/write library
.SH Introduction
@ -34,7 +34,7 @@ PDFio is
.I not
concerned with rendering or viewing a PDF file, although a PDF RIP or viewer could be written using it.
.PP
PDFio is Copyright \[co] 2021\-2024 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.
PDFio is Copyright \[co] 2021\-2025 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.
.SS Requirements
.PP
PDFio requires the following to build the software:
@ -52,9 +52,11 @@ A POSIX\-compliant sh program
.IP \(bu 5
.PP
ZLIB (https://www.zlib.net) 1.0 or higher
ZLIB (https://www.zlib.net/) 1.0 or higher
.PP
PDFio will also use libpng 1.6 or higher (https://www.libpng.org/) to provide enhanced PNG image support.
.PP
IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.
.SS Installing PDFio
@ -1097,28 +1099,83 @@ The pdfioinfo.c example program opens a PDF file and prints the title, author, c
.fi
.SS Extract Text from PDF File
.PP
The pdf2text.c example code extracts non\-Unicode text from a PDF file by scanning each page for strings and text drawing commands. Since it doesn't look at the font encoding or support Unicode text, it is really only useful to extract plain ASCII text from a PDF file. And since it writes text in the order it appears in the page stream, it may not come out in the same order as appears on the page.
The pdf2text.c example code extracts text from a PDF file and writes it to the standard output. Unlike some other PDF tools, it outputs the text in the order it is seen in each page stream so the output might appear "jumbled" if the PDF producer doesn't output text in reading order. The code is able to handle different font encodings and produces UTF\-8 output.
.PP
The pdfioStreamGetToken function is used to read individual tokens from the page streams. Tokens starting with the open parenthesis are text strings, while PDF operators are left as\-is. We use some simple logic to make sure that we include spaces between text strings and add newlines for the text operators that start a new line in a text block:
The pdfioStreamGetToken function is used to read individual tokens from the page streams:
.nf
pdfio_stream_t *st; // Page stream
char buffer[1024], // Token buffer
*bufptr, // Pointer into buffer
name[256]; // Current (font) name
bool first = true; // First string on line?
char buffer[1024]; // Token buffer
int encoding[256]; // Font encoding to Unicode
bool in_array = false; // Are we in an array?
// Read PDF tokens from the page stream...
while (pdfioStreamGetToken(st, buffer, sizeof(buffer)))
{
if (buffer[0] == '(')
.fi
.PP
Justified text can be found inside arrays ("[ ... ]"), so we look for the array delimiter tokens and any (spacing) numbers inside an array. Experimentation has shown that numbers greater than 100 can be treated as whitespace:
.nf
if (!strcmp(buffer, "["))
{
// Start of an array for justified text...
in_array = true;
}
else if (!strcmp(buffer, "]"))
{
// End of an array for justified text...
in_array = false;
}
else if (!first && in_array && (isdigit(buffer[0]) || buffer[0] == '\-') && fabs(atof(buffer)) > 100)
{
// Whitespace in a justified text block...
putchar(' ');
}
.fi
.PP
Tokens starting with \'(' or \'<' are text fragments. 8\-bit text starting with \'(' needs to be mapped to Unicode using the current font encoding while hex strings starting with \'<' are UTF\-16 (Unicode) that need to be converted to UTF\-8:
.nf
else if (buffer[0] == '(')
{
// Text string using an 8\-bit encoding
if (first)
first = false;
else if (buffer[1] != ' ')
putchar(' ');
first = false;
fputs(buffer + 1, stdout);
for (bufptr = buffer + 1; *bufptr; bufptr ++)
put_utf8(encoding[*bufptr & 255]);
}
else if (buffer[0] == '<')
{
// Unicode text string
first = false;
puts_utf16(buffer + 1);
}
.fi
.PP
Simple (8\-bit) fonts include an encoding table that maps the 8\-bit characters to one of 1051 Unicode glyph names. Since each font can use a different encoding, we look for font names starting with \'/' and the "Tf" (set text font) operator token and load that font's encoding using the load_encoding function:
.nf
else if (buffer[0] == '/')
{
// Save name...
strncpy(name, buffer + 1, sizeof(name) \- 1);
name[sizeof(name) \- 1] = '\\0';
}
else if (!strcmp(buffer, "Tf") && name[0])
{
// Set font...
load_encoding(obj, name, encoding);
}
.fi
.PP
Finally, some text operators start a new line in a text block, so when we see their tokens we output a newline:
.nf
else if (!strcmp(buffer, "Td") || !strcmp(buffer, "TD") || !strcmp(buffer, "T*") ||
!strcmp(buffer, "\\'") || !strcmp(buffer, "\\""))
{
@ -1127,9 +1184,150 @@ The pdfioStreamGetToken function is used to read individual tokens from the page
first = true;
}
}
.fi
.PP
The load_encoding Function
.PP
The load_encoding function looks up the named font in the page's "Resources" dictionary. Every PDF simple font contains an "Encoding" dictionary with a base encoding ("WinANSI", "MacRoman", or "MacExpert") and a differences array that lists character indexes and glyph names for an 8\-bit font.
.PP
We start by initializing the encoding array to the default WinANSI encoding and looking up the font object for the named font:
.nf
static void
load_encoding(
pdfio_obj_t *page_obj, // I \- Page object
const char *name, // I \- Font name
int encoding[256]) // O \- Encoding table
{
size_t i, j; // Looping vars
pdfio_dict_t *page_dict, // Page dictionary
*resources_dict, // Resources dictionary
*font_dict; // Font dictionary
pdfio_obj_t *font_obj, // Font object
*encoding_obj; // Encoding object
static int win_ansi[32] = // WinANSI characters from 128 to 159
{
...
};
static int mac_roman[128] = // MacRoman characters from 128 to 255
{
...
};
if (!first)
putchar('\\n');
// Initialize the encoding to be the "standard" WinAnsi...
for (i = 0; i < 128; i ++)
encoding[i] = i;
for (i = 160; i < 256; i ++)
encoding[i] = i;
memcpy(encoding + 128, win_ansi, sizeof(win_ansi));
// Find the named font...
if ((page_dict = pdfioObjGetDict(page_obj)) == NULL)
return;
if ((resources_dict = pdfioDictGetDict(page_dict, "Resources")) == NULL)
return;
if ((font_dict = pdfioDictGetDict(resources_dict, "Font")) == NULL)
{
// Font resources not a dictionary, see if it is an object...
if ((font_obj = pdfioDictGetObj(resources_dict, "Font")) != NULL)
font_dict = pdfioObjGetDict(font_obj);
if (!font_dict)
return;
}
if ((font_obj = pdfioDictGetObj(font_dict, name)) == NULL)
return;
.fi
.PP
Once we have found the font we see if it has an "Encoding" dictionary:
.nf
pdfio_dict_t *encoding_dict; // Encoding dictionary
if ((encoding_obj = pdfioDictGetObj(pdfioObjGetDict(font_obj), "Encoding")) == NULL)
return;
if ((encoding_dict = pdfioObjGetDict(encoding_obj)) == NULL)
return;
.fi
.PP
Once we have the encoding dictionary we can get the "BaseEncoding" and "Differences" values:
.nf
const char *base_encoding; // BaseEncoding name
pdfio_array_t *differences; // Differences array
// OK, have the encoding object, build the encoding using it...
base_encoding = pdfioDictGetName(encoding_dict, "BaseEncoding");
differences = pdfioDictGetArray(encoding_dict, "Differences");
.fi
.PP
If the base encoding is "MacRomanEncoding", we need to reset the upper 128 characters in the encoding array to match it:
.nf
if (base_encoding && !strcmp(base_encoding, "MacRomanEncoding"))
{
// Map upper 128
memcpy(encoding + 128, mac_roman, sizeof(mac_roman));
}
.fi
.PP
Then we loop through the differences array, keeping track of the current index within the encoding array. A number indicates a new index while a name is the Unicode glyph for the current index:
.nf
typedef struct name_map_s
{
const char *name; // Character name
int unicode; // Unicode value
} name_map_t;
static name_map_t unicode_map[1051]; // List of glyph names
if (differences)
{
// Apply differences
size_t count = pdfioArrayGetSize(differences);
// Number of differences
const char *name; // Character name
size_t idx = 0; // Index in encoding array
for (i = 0; i < count; i ++)
{
switch (pdfioArrayGetType(differences, i))
{
case PDFIO_VALTYPE_NUMBER :
// Get the index of the next character...
idx = (size_t)pdfioArrayGetNumber(differences, i);
break;
case PDFIO_VALTYPE_NAME :
// Lookup name and apply to encoding...
if (idx < 0 || idx > 255)
break;
name = pdfioArrayGetName(differences, i);
for (j = 0; j < (sizeof(unicode_map) / sizeof(unicode_map[0])); j ++)
{
if (!strcmp(name, unicode_map[j].name))
{
encoding[idx] = unicode_map[j].unicode;
break;
}
}
idx ++;
break;
default :
// Do nothing for other values
break;
}
}
}
}
.fi
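
The text loop above calls a put_utf8 helper that this excerpt does not show. As a rough sketch of what such a helper has to do, the hypothetical `encode_utf8` function below converts one Unicode code point into its UTF-8 byte sequence. It writes into a caller-supplied buffer rather than to the standard output, which is an assumption made here so that the result is easy to check.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for a put_utf8-style helper: encode the Unicode
// code point "ch" as UTF-8 in "out" and return the number of bytes used,
// or 0 if "ch" is not a valid Unicode scalar value.
size_t
encode_utf8(int ch, unsigned char out[4])
{
  assert(out != NULL);

  // Reject out-of-range values and UTF-16 surrogates...
  if (ch < 0 || ch > 0x10ffff || (ch >= 0xd800 && ch <= 0xdfff))
    return (0);

  if (ch < 0x80)
  {
    // 1 byte: plain ASCII
    out[0] = (unsigned char)ch;
    return (1);
  }
  else if (ch < 0x800)
  {
    // 2 bytes: 110xxxxx 10xxxxxx
    out[0] = (unsigned char)(0xc0 | (ch >> 6));
    out[1] = (unsigned char)(0x80 | (ch & 0x3f));
    return (2);
  }
  else if (ch < 0x10000)
  {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out[0] = (unsigned char)(0xe0 | (ch >> 12));
    out[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3f));
    out[2] = (unsigned char)(0x80 | (ch & 0x3f));
    return (3);
  }
  else
  {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out[0] = (unsigned char)(0xf0 | (ch >> 18));
    out[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (ch & 0x3f));
    return (4);
  }
}
```

For example, the Euro sign (U+20AC), one of the extra characters in the CP1252 range, becomes the three bytes E2 82 AC. A caller could pass the returned length to fwrite to emit the bytes.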
.SS Create a PDF File With Text and an Image
.PP
@ -3561,7 +3759,8 @@ This function creates a new PDF file. The "filename" argument specifies the
name of the PDF file to create.
.PP
The "version" argument specifies the PDF version number for the file or
\fBNULL\fR for the default ("2.0").
\fBNULL\fR for the default ("2.0"). The value "PCLm-1.0" can be specified to
produce the PCLm subset of PDF.
.PP
The "media_box" and "crop_box" arguments specify the default MediaBox and
CropBox for pages in the PDF file - if \fBNULL\fR then a default "Universal" size
@ -3643,8 +3842,19 @@ This function embeds a TrueType/OpenType font into a PDF file. The
characters (potentially full Unicode, but more typically a subset)
or to only support the Windows CP1252 (ISO-8859-1 with additional
characters such as the Euro symbol) subset of Unicode.
.SS pdfioFileCreateICCObjFromData
Add ICC profile data to a PDF file.
.PP
.nf
pdfio_obj_t * pdfioFileCreateICCObjFromData (
pdfio_file_t *pdf,
const unsigned char *data,
size_t datalen,
size_t num_colors
);
.fi
.SS pdfioFileCreateICCObjFromFile
Add an ICC profile object to a PDF file.
Add an ICC profile file to a PDF file.
.PP
.nf
pdfio_obj_t * pdfioFileCreateICCObjFromFile (
@ -3767,7 +3977,9 @@ written:
.fi
The "version" argument specifies the PDF version number for the file or
\fBNULL\fR for the default ("2.0").
\fBNULL\fR for the default ("2.0"). Unlike \fIpdfioFileCreate\fR and
\fIpdfioFileCreateTemporary\fR, it is generally not safe to pass the
"PCLm-1.0" version string.
.PP
The "media_box" and "crop_box" arguments specify the default MediaBox and
CropBox for pages in the PDF file - if \fBNULL\fR then a default "Universal" size
@ -3880,6 +4092,14 @@ const char * pdfioFileGetKeywords (
pdfio_file_t *pdf
);
.fi
.SS pdfioFileGetModificationDate
Get the most recent modification date for a PDF file.
.PP
.nf
time_t pdfioFileGetModificationDate (
pdfio_file_t *pdf
);
.fi
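
Since pdfioFileGetModificationDate returns a plain time_t, a caller will usually want to format it before display. The sketch below uses only the standard C library; `format_utc_date` is a hypothetical helper name, and the ISO 8601 UTC output format is an assumption chosen for illustration rather than anything PDFio prescribes.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

// Hypothetical helper (not part of PDFio): format "t" as an ISO 8601 UTC
// timestamp in "buffer" and return the length of the result, or 0 if the
// time could not be converted or the buffer is too small.
size_t
format_utc_date(time_t t, char *buffer, size_t bufsize)
{
  struct tm *utc;                       // Broken-down UTC time

  assert(buffer != NULL && bufsize > 0);

  buffer[0] = '\0';

  if ((utc = gmtime(&t)) == NULL)
    return (0);

  // strftime returns 0 when the result does not fit...
  if (strftime(buffer, bufsize, "%Y-%m-%dT%H:%M:%SZ", utc) == 0)
    buffer[0] = '\0';

  return (strlen(buffer));
}
```

Given an open pdfio_file_t pointer named pdf, a call along the lines of format_utc_date(pdfioFileGetModificationDate(pdf), buffer, sizeof(buffer)) would then yield a printable date.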
.SS pdfioFileGetName
Get a PDF's filename.
.PP
@ -4027,6 +4247,15 @@ void pdfioFileSetKeywords (
const char *value
);
.fi
.SS pdfioFileSetModificationDate
Set the modification date for a PDF file.
.PP
.nf
void pdfioFileSetModificationDate (
pdfio_file_t *pdf,
time_t value
);
.fi
.SS pdfioFileSetPermissions
Set the PDF permissions, encryption mode, and passwords.
.PP
@ -4334,12 +4563,13 @@ bool pdfioStreamGetToken (
);
.fi
.PP
This function reads a single PDF token from a stream. Operator tokens,
boolean values, and numbers are returned as-is in the provided string buffer.
String values start with the opening parenthesis ('(') but have all escaping
resolved and the terminating parenthesis removed. Hexadecimal string values
start with the opening angle bracket ('<') and have all whitespace and the
terminating angle bracket removed.
This function reads a single PDF token from a stream, skipping all whitespace
and comments. Operator tokens, boolean values, and numbers are returned
as-is in the provided string buffer. String values start with the opening
parenthesis ('(') but have all escaping resolved and the terminating
parenthesis removed. Hexadecimal string values start with the opening angle
bracket ('<') and have all whitespace and the terminating angle bracket
removed.
.SS pdfioStreamPeek
Peek at data in a stream.
.PP


@ -1,13 +1,13 @@
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>PDFio Programming Manual v1.4.1</title>
<title>PDFio Programming Manual v1.5.0</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<meta name="generator" content="codedoc v3.8">
<meta name="author" content="Michael R Sweet">
<meta name="language" content="en-US">
<meta name="copyright" content="Copyright © 2021-2025 by Michael R Sweet">
<meta name="version" content="1.4.1">
<meta name="version" content="1.5.0">
<style type="text/css"><!--
body {
background: white;
@ -251,7 +251,7 @@ span.string {
<body>
<div class="header">
<p><img class="title" src="pdfio-512.png"></p>
<h1 class="title">PDFio Programming Manual v1.4.1</h1>
<h1 class="title">PDFio Programming Manual v1.5.0</h1>
<p>Michael R Sweet</p>
<p>Copyright © 2021-2025 by Michael R Sweet</p>
</div>
@ -400,6 +400,7 @@ span.string {
<li><a href="#pdfioFileCreateArrayObj">pdfioFileCreateArrayObj</a></li>
<li><a href="#pdfioFileCreateFontObjFromBase">pdfioFileCreateFontObjFromBase</a></li>
<li><a href="#pdfioFileCreateFontObjFromFile">pdfioFileCreateFontObjFromFile</a></li>
<li><a href="#pdfioFileCreateICCObjFromData">pdfioFileCreateICCObjFromData</a></li>
<li><a href="#pdfioFileCreateICCObjFromFile">pdfioFileCreateICCObjFromFile</a></li>
<li><a href="#pdfioFileCreateImageObjFromData">pdfioFileCreateImageObjFromData</a></li>
<li><a href="#pdfioFileCreateImageObjFromFile">pdfioFileCreateImageObjFromFile</a></li>
@ -417,6 +418,7 @@ span.string {
<li><a href="#pdfioFileGetCreator">pdfioFileGetCreator</a></li>
<li><a href="#pdfioFileGetID">pdfioFileGetID</a></li>
<li><a href="#pdfioFileGetKeywords">pdfioFileGetKeywords</a></li>
<li><a href="#pdfioFileGetModificationDate">pdfioFileGetModificationDate</a></li>
<li><a href="#pdfioFileGetName">pdfioFileGetName</a></li>
<li><a href="#pdfioFileGetNumObjs">pdfioFileGetNumObjs</a></li>
<li><a href="#pdfioFileGetNumPages">pdfioFileGetNumPages</a></li>
@ -432,6 +434,7 @@ span.string {
<li><a href="#pdfioFileSetCreationDate">pdfioFileSetCreationDate</a></li>
<li><a href="#pdfioFileSetCreator">pdfioFileSetCreator</a></li>
<li><a href="#pdfioFileSetKeywords">pdfioFileSetKeywords</a></li>
<li><a href="#pdfioFileSetModificationDate">pdfioFileSetModificationDate</a></li>
<li><a href="#pdfioFileSetPermissions">pdfioFileSetPermissions</a></li>
<li><a href="#pdfioFileSetSubject">pdfioFileSetSubject</a></li>
<li><a href="#pdfioFileSetTitle">pdfioFileSetTitle</a></li>
@ -522,7 +525,7 @@ span.string {
</li>
</ul>
<p>PDFio is <em>not</em> concerned with rendering or viewing a PDF file, although a PDF RIP or viewer could be written using it.</p>
<p>PDFio is Copyright © 2021-2024 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files &quot;LICENSE&quot; and &quot;NOTICE&quot; for more information.</p>
<p>PDFio is Copyright © 2021-2025 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files &quot;LICENSE&quot; and &quot;NOTICE&quot; for more information.</p>
<h3 class="title" id="requirements">Requirements</h3>
<p>PDFio requires the following to build the software:</p>
<ul>
@ -532,9 +535,10 @@ span.string {
</li>
<li><p>A POSIX-compliant <code>sh</code> program</p>
</li>
<li><p>ZLIB (<a href="https://www.zlib.net">https://www.zlib.net</a>) 1.0 or higher</p>
<li><p>ZLIB (<a href="https://www.zlib.net/">https://www.zlib.net/</a>) 1.0 or higher</p>
</li>
</ul>
<p>PDFio will also use libpng 1.6 or higher (<a href="https://www.libpng.org/">https://www.libpng.org/</a>) to provide enhanced PNG image support.</p>
<p>IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.</p>
<h3 class="title" id="installing-pdfio">Installing PDFio</h3>
<p>PDFio comes with a configure script that creates a portable makefile that will work on any POSIX-compliant system with ZLIB installed. To make it, run:</p>
@ -1212,26 +1216,69 @@ main(<span class="reserved">int</span> argc, <span clas
}
</code></pre>
<h3 class="title" id="extract-text-from-pdf-file">Extract Text from PDF File</h3>
<p>The <code>pdf2text.c</code> example code extracts non-Unicode text from a PDF file by scanning each page for strings and text drawing commands. Since it doesn't look at the font encoding or support Unicode text, it is really only useful to extract plain ASCII text from a PDF file. And since it writes text in the order it appears in the page stream, it may not come out in the same order as appears on the page.</p>
<p>The <a href="#pdfioStreamGetToken"><code>pdfioStreamGetToken</code></a> function is used to read individual tokens from the page streams. Tokens starting with the open parenthesis are text strings, while PDF operators are left as-is. We use some simple logic to make sure that we include spaces between text strings and add newlines for the text operators that start a new line in a text block:</p>
<p>The <code>pdf2text.c</code> example code extracts text from a PDF file and writes it to the standard output. Unlike some other PDF tools, it outputs the text in the order it is seen in each page stream so the output might appear &quot;jumbled&quot; if the PDF producer doesn't output text in reading order. The code is able to handle different font encodings and produces UTF-8 output.</p>
<p>The <a href="#pdfioStreamGetToken"><code>pdfioStreamGetToken</code></a> function is used to read individual tokens from the page streams:</p>
<pre><code class="language-c">pdfio_stream_t *st; <span class="comment">// Page stream</span>
<span class="reserved">char</span> buffer[<span class="number">1024</span>], <span class="comment">// Token buffer</span>
*bufptr, <span class="comment">// Pointer into buffer</span>
name[<span class="number">256</span>]; <span class="comment">// Current (font) name</span>
<span class="reserved">bool</span> first = <span class="reserved">true</span>; <span class="comment">// First string on line?</span>
<span class="reserved">char</span> buffer[<span class="number">1024</span>]; <span class="comment">// Token buffer</span>
<span class="reserved">int</span> encoding[<span class="number">256</span>]; <span class="comment">// Font encoding to Unicode</span>
<span class="reserved">bool</span> in_array = <span class="reserved">false</span>; <span class="comment">// Are we in an array?</span>
<span class="comment">// Read PDF tokens from the page stream...</span>
<span class="reserved">while</span> (pdfioStreamGetToken(st, buffer, <span class="reserved">sizeof</span>(buffer)))
{
<span class="reserved">if</span> (buffer[<span class="number">0</span>] == <span class="string">'('</span>)
</code></pre>
<p>Justified text can be found inside arrays (&quot;[ ... ]&quot;), so we look for the array delimiter tokens and any (spacing) numbers inside an array. Experimentation has shown that numbers with an absolute value greater than 100 can be treated as whitespace:</p>
<pre><code class="language-c"> <span class="reserved">if</span> (!strcmp(buffer, <span class="string">&quot;[&quot;</span>))
{
<span class="comment">// Start of an array for justified text...</span>
in_array = <span class="reserved">true</span>;
}
<span class="reserved">else</span> <span class="reserved">if</span> (!strcmp(buffer, <span class="string">&quot;]&quot;</span>))
{
<span class="comment">// End of an array for justified text...</span>
in_array = <span class="reserved">false</span>;
}
<span class="reserved">else</span> <span class="reserved">if</span> (!first &amp;&amp; in_array &amp;&amp; (isdigit(buffer[<span class="number">0</span>]) || buffer[<span class="number">0</span>] == <span class="string">'-'</span>) &amp;&amp; fabs(atof(buffer)) &gt; <span class="number">100</span>)
{
<span class="comment">// Whitespace in a justified text block...</span>
putchar(<span class="string">' '</span>);
}
</code></pre>
<p>Tokens starting with '(' or '&lt;' are text fragments. 8-bit text starting with '(' needs to be mapped to Unicode using the current font encoding while hex strings starting with '&lt;' are UTF-16 (Unicode) that need to be converted to UTF-8:</p>
<pre><code class="language-c"> <span class="reserved">else</span> <span class="reserved">if</span> (buffer[<span class="number">0</span>] == <span class="string">'('</span>)
{
<span class="comment">// Text string using an 8-bit encoding</span>
<span class="reserved">if</span> (first)
first = <span class="reserved">false</span>;
<span class="reserved">else</span> <span class="reserved">if</span> (buffer[<span class="number">1</span>] != <span class="string">' '</span>)
putchar(<span class="string">' '</span>);
first = <span class="reserved">false</span>;
fputs(buffer + <span class="number">1</span>, stdout);
<span class="reserved">for</span> (bufptr = buffer + <span class="number">1</span>; *bufptr; bufptr ++)
put_utf8(encoding[*bufptr &amp; <span class="number">255</span>]);
}
<span class="reserved">else</span> <span class="reserved">if</span> (!strcmp(buffer, <span class="string">&quot;Td&quot;</span>) || !strcmp(buffer, <span class="string">&quot;TD&quot;</span>) || !strcmp(buffer, <span class="string">&quot;T*&quot;</span>) ||
<span class="reserved">else</span> <span class="reserved">if</span> (buffer[<span class="number">0</span>] == <span class="string">'&lt;'</span>)
{
<span class="comment">// Unicode text string</span>
first = <span class="reserved">false</span>;
puts_utf16(buffer + <span class="number">1</span>);
}
</code></pre>
<p>Simple (8-bit) fonts include an encoding table that maps the 8-bit characters to one of 1051 Unicode glyph names. Since each font can use a different encoding, we look for font names starting with '/' and the &quot;Tf&quot; (set text font) operator token and load that font's encoding using the <a href="#the-loadencoding-function">load_encoding</a> function:</p>
<pre><code class="language-c"> <span class="reserved">else</span> <span class="reserved">if</span> (buffer[<span class="number">0</span>] == <span class="string">'/'</span>)
{
<span class="comment">// Save name...</span>
strncpy(name, buffer + <span class="number">1</span>, <span class="reserved">sizeof</span>(name) - <span class="number">1</span>);
name[<span class="reserved">sizeof</span>(name) - <span class="number">1</span>] = <span class="string">'\0'</span>;
}
<span class="reserved">else</span> <span class="reserved">if</span> (!strcmp(buffer, <span class="string">&quot;Tf&quot;</span>) &amp;&amp; name[<span class="number">0</span>])
{
<span class="comment">// Set font...</span>
load_encoding(obj, name, encoding);
}
</code></pre>
<p>Finally, some text operators start a new line in a text block, so when we see their tokens we output a newline:</p>
<pre><code class="language-c"> <span class="reserved">else</span> <span class="reserved">if</span> (!strcmp(buffer, <span class="string">&quot;Td&quot;</span>) || !strcmp(buffer, <span class="string">&quot;TD&quot;</span>) || !strcmp(buffer, <span class="string">&quot;T*&quot;</span>) ||
!strcmp(buffer, <span class="string">&quot;\'&quot;</span>) || !strcmp(buffer, <span class="string">&quot;\&quot;&quot;</span>))
{
<span class="comment">// Text operators that advance to the next line in the block</span>
@ -1239,9 +1286,133 @@ main(<span class="reserved">int</span> argc, <span clas
first = <span class="reserved">true</span>;
}
}
</code></pre>
<h4 id="the-loadencoding-function">The <code>load_encoding</code> Function</h4>
<p>The <code>load_encoding</code> function looks up the named font in the page's &quot;Resources&quot; dictionary. A PDF simple font can contain an &quot;Encoding&quot; dictionary with a base encoding (&quot;WinAnsiEncoding&quot;, &quot;MacRomanEncoding&quot;, or &quot;MacExpertEncoding&quot;) and a differences array that lists character indexes and glyph names for an 8-bit font.</p>
<p>We start by initializing the encoding array to the default WinANSI encoding and looking up the font object for the named font:</p>
<pre><code class="language-c"><span class="reserved">static</span> <span class="reserved">void</span>
load_encoding(
pdfio_obj_t *page_obj, <span class="comment">// I - Page object</span>
<span class="reserved">const</span> <span class="reserved">char</span> *name, <span class="comment">// I - Font name</span>
<span class="reserved">int</span> encoding[<span class="number">256</span>]) <span class="comment">// O - Encoding table</span>
{
size_t i, j; <span class="comment">// Looping vars</span>
pdfio_dict_t *page_dict, <span class="comment">// Page dictionary</span>
*resources_dict, <span class="comment">// Resources dictionary</span>
*font_dict; <span class="comment">// Font dictionary</span>
pdfio_obj_t *font_obj, <span class="comment">// Font object</span>
*encoding_obj; <span class="comment">// Encoding object</span>
<span class="reserved">static</span> <span class="reserved">int</span> win_ansi[<span class="number">32</span>] = <span class="comment">// WinANSI characters from 128 to 159</span>
{
...
};
<span class="reserved">static</span> <span class="reserved">int</span> mac_roman[<span class="number">128</span>] = <span class="comment">// MacRoman characters from 128 to 255</span>
{
...
};
<span class="reserved">if</span> (!first)
putchar(<span class="string">'\n'</span>);
<span class="comment">// Initialize the encoding to be the &quot;standard&quot; WinAnsi...</span>
<span class="reserved">for</span> (i = <span class="number">0</span>; i &lt; <span class="number">128</span>; i ++)
encoding[i] = i;
<span class="reserved">for</span> (i = <span class="number">160</span>; i &lt; <span class="number">256</span>; i ++)
encoding[i] = i;
memcpy(encoding + <span class="number">128</span>, win_ansi, <span class="reserved">sizeof</span>(win_ansi));
<span class="comment">// Find the named font...</span>
<span class="reserved">if</span> ((page_dict = pdfioObjGetDict(page_obj)) == NULL)
<span class="reserved">return</span>;
<span class="reserved">if</span> ((resources_dict = pdfioDictGetDict(page_dict, <span class="string">&quot;Resources&quot;</span>)) == NULL)
<span class="reserved">return</span>;
<span class="reserved">if</span> ((font_dict = pdfioDictGetDict(resources_dict, <span class="string">&quot;Font&quot;</span>)) == NULL)
{
<span class="comment">// Font resources not a dictionary, see if it is an object...</span>
<span class="reserved">if</span> ((font_obj = pdfioDictGetObj(resources_dict, <span class="string">&quot;Font&quot;</span>)) != NULL)
font_dict = pdfioObjGetDict(font_obj);
<span class="reserved">if</span> (!font_dict)
<span class="reserved">return</span>;
}
<span class="reserved">if</span> ((font_obj = pdfioDictGetObj(font_dict, name)) == NULL)
<span class="reserved">return</span>;
</code></pre>
<p>Once we have found the font we see if it has an &quot;Encoding&quot; dictionary:</p>
<pre><code class="language-c"> pdfio_dict_t *encoding_dict; <span class="comment">// Encoding dictionary</span>
<span class="reserved">if</span> ((encoding_obj = pdfioDictGetObj(pdfioObjGetDict(font_obj), <span class="string">&quot;Encoding&quot;</span>)) == NULL)
<span class="reserved">return</span>;
<span class="reserved">if</span> ((encoding_dict = pdfioObjGetDict(encoding_obj)) == NULL)
<span class="reserved">return</span>;
</code></pre>
<p>Once we have the encoding dictionary we can get the &quot;BaseEncoding&quot; and &quot;Differences&quot; values:</p>
<pre><code class="language-c"> <span class="reserved">const</span> <span class="reserved">char</span> *base_encoding; <span class="comment">// BaseEncoding name</span>
pdfio_array_t *differences; <span class="comment">// Differences array</span>
<span class="comment">// OK, have the encoding object, build the encoding using it...</span>
base_encoding = pdfioDictGetName(encoding_dict, <span class="string">&quot;BaseEncoding&quot;</span>);
differences = pdfioDictGetArray(encoding_dict, <span class="string">&quot;Differences&quot;</span>);
</code></pre>
<p>If the base encoding is &quot;MacRomanEncoding&quot;, we need to reset the upper 128 characters in the encoding array to match it:</p>
<pre><code class="language-c"> <span class="reserved">if</span> (base_encoding &amp;&amp; !strcmp(base_encoding, <span class="string">&quot;MacRomanEncoding&quot;</span>))
{
<span class="comment">// Map upper 128</span>
memcpy(encoding + <span class="number">128</span>, mac_roman, <span class="reserved">sizeof</span>(mac_roman));
}
</code></pre>
<p>Then we loop through the differences array, keeping track of the current index within the encoding array. A number indicates a new index while a name is the Unicode glyph for the current index:</p>
<pre><code class="language-c"> <span class="reserved">typedef</span> <span class="reserved">struct</span> name_map_s
{
<span class="reserved">const</span> <span class="reserved">char</span> *name; <span class="comment">// Character name</span>
<span class="reserved">int</span> unicode; <span class="comment">// Unicode value</span>
} name_map_t;
<span class="reserved">static</span> name_map_t unicode_map[<span class="number">1051</span>]; <span class="comment">// List of glyph names</span>
<span class="reserved">if</span> (differences)
{
<span class="comment">// Apply differences</span>
size_t count = pdfioArrayGetSize(differences);
<span class="comment">// Number of differences</span>
<span class="reserved">const</span> <span class="reserved">char</span> *name; <span class="comment">// Character name</span>
size_t idx = <span class="number">0</span>; <span class="comment">// Index in encoding array</span>
<span class="reserved">for</span> (i = <span class="number">0</span>; i &lt; count; i ++)
{
<span class="reserved">switch</span> (pdfioArrayGetType(differences, i))
{
<span class="reserved">case</span> PDFIO_VALTYPE_NUMBER :
<span class="comment">// Get the index of the next character...</span>
idx = (size_t)pdfioArrayGetNumber(differences, i);
<span class="reserved">break</span>;
<span class="reserved">case</span> PDFIO_VALTYPE_NAME :
<span class="comment">// Lookup name and apply to encoding...</span>
<span class="reserved">if</span> (idx &gt; <span class="number">255</span>)
<span class="reserved">break</span>;
name = pdfioArrayGetName(differences, i);
<span class="reserved">for</span> (j = <span class="number">0</span>; j &lt; (<span class="reserved">sizeof</span>(unicode_map) / <span class="reserved">sizeof</span>(unicode_map[<span class="number">0</span>])); j ++)
{
<span class="reserved">if</span> (!strcmp(name, unicode_map[j].name))
{
encoding[idx] = unicode_map[j].unicode;
<span class="reserved">break</span>;
}
}
idx ++;
<span class="reserved">break</span>;
<span class="reserved">default</span> :
<span class="comment">// Do nothing for other values</span>
<span class="reserved">break</span>;
}
}
}
}
</code></pre>
<h3 class="title" id="create-a-pdf-file-with-text-and-an-image">Create a PDF File With Text and an Image</h3>
<p>The <code>image2pdf.c</code> example code creates a PDF file containing a JPEG or PNG image file and optional caption on a single page. The <code>create_pdf_image_file</code> function creates the PDF file, embeds a base font and the named JPEG or PNG image file, and then creates a page with the image centered on the page with any text centered below:</p>
@ -3812,7 +3983,8 @@ have been iterated.</p>
name of the PDF file to create.<br>
<br>
The &quot;version&quot; argument specifies the PDF version number for the file or
<code>NULL</code> for the default (&quot;2.0&quot;).<br>
<code>NULL</code> for the default (&quot;2.0&quot;). The value &quot;PCLm-1.0&quot; can be specified to
produce the PCLm subset of PDF.<br>
<br>
The &quot;media_box&quot; and &quot;crop_box&quot; arguments specify the default MediaBox and
CropBox for pages in the PDF file - if <code>NULL</code> then a default &quot;Universal&quot; size
@ -3907,8 +4079,25 @@ Unicode.</p>
characters (potentially full Unicode, but more typically a subset)
or to only support the Windows CP1252 (ISO-8859-1 with additional
characters such as the Euro symbol) subset of Unicode.</p>
<h3 class="function"><a id="pdfioFileCreateICCObjFromData">pdfioFileCreateICCObjFromData</a></h3>
<p class="description">Add ICC profile data to a PDF file.</p>
<p class="code">
<a href="#pdfio_obj_t">pdfio_obj_t</a> *pdfioFileCreateICCObjFromData(<a href="#pdfio_file_t">pdfio_file_t</a> *pdf, <span class="reserved">const</span> <span class="reserved">unsigned</span> <span class="reserved">char</span> *data, size_t datalen, size_t num_colors);</p>
<h4 class="parameters">Parameters</h4>
<table class="list"><tbody>
<tr><th>pdf</th>
<td class="description">PDF file</td></tr>
<tr><th>data</th>
<td class="description">ICC profile buffer</td></tr>
<tr><th>datalen</th>
<td class="description">Length of ICC profile</td></tr>
<tr><th>num_colors</th>
<td class="description">Number of color components (1, 3, or 4)</td></tr>
</tbody></table>
<h4 class="returnvalue">Return Value</h4>
<p class="description">Object</p>
<h3 class="function"><a id="pdfioFileCreateICCObjFromFile">pdfioFileCreateICCObjFromFile</a></h3>
<p class="description">Add an ICC profile object to a PDF file.</p>
<p class="description">Add an ICC profile file to a PDF file.</p>
<p class="code">
<a href="#pdfio_obj_t">pdfio_obj_t</a> *pdfioFileCreateICCObjFromFile(<a href="#pdfio_file_t">pdfio_file_t</a> *pdf, <span class="reserved">const</span> <span class="reserved">char</span> *filename, size_t num_colors);</p>
<h4 class="parameters">Parameters</h4>
@ -4069,7 +4258,9 @@ output_cb(void *output_cbdata, const void *buffer, size_t bytes)
</pre>
The &quot;version&quot; argument specifies the PDF version number for the file or
<code>NULL</code> for the default (&quot;2.0&quot;).<br>
<code>NULL</code> for the default (&quot;2.0&quot;). Unlike <a href="#pdfioFileCreate"><code>pdfioFileCreate</code></a> and
<a href="#pdfioFileCreateTemporary"><code>pdfioFileCreateTemporary</code></a>, it is generally not safe to pass the
&quot;PCLm-1.0&quot; version string.<br>
<br>
The &quot;media_box&quot; and &quot;crop_box&quot; arguments specify the default MediaBox and
CropBox for pages in the PDF file - if <code>NULL</code> then a default &quot;Universal&quot; size
@ -4137,8 +4328,19 @@ You must call <a href="#pdfioObjClose"><code>pdfioObjClose</code></a> to write t
<p class="description">Create a temporary PDF file.</p>
<p class="discussion">This function creates a PDF file with a unique filename in the current
temporary directory. The temporary file is stored in the string &quot;buffer&quot; and
will have a &quot;.pdf&quot; extension. Otherwise, this function works the same as
the <a href="#pdfioFileCreate"><code>pdfioFileCreate</code></a> function.
will have a &quot;.pdf&quot; extension.<br>
<br>
The &quot;version&quot; argument specifies the PDF version number for the file or
<code>NULL</code> for the default (&quot;2.0&quot;). The value &quot;PCLm-1.0&quot; can be specified to
produce the PCLm subset of PDF.<br>
<br>
The &quot;media_box&quot; and &quot;crop_box&quot; arguments specify the default MediaBox and
CropBox for pages in the PDF file - if <code>NULL</code> then a default &quot;Universal&quot; size
of 8.27x11in (the intersection of US Letter and ISO A4) is used.<br>
<br>
The &quot;error_cb&quot; and &quot;error_cbdata&quot; arguments specify an error handler callback
and its data pointer - if <code>NULL</code> the default error handler is used that
writes error messages to <code>stderr</code>.
</p>
<h3 class="function"><a id="pdfioFileFindObj">pdfioFileFindObj</a></h3>
@ -4223,6 +4425,17 @@ time_t pdfioFileGetCreationDate(<a href="#pdfio_file_t">pdfio_file_t</a> *pdf);<
</tbody></table>
<h4 class="returnvalue">Return Value</h4>
<p class="description">Keywords string or <code>NULL</code> for none</p>
<h3 class="function"><a id="pdfioFileGetModificationDate">pdfioFileGetModificationDate</a></h3>
<p class="description">Get the most recent modification date for a PDF file.</p>
<p class="code">
time_t pdfioFileGetModificationDate(<a href="#pdfio_file_t">pdfio_file_t</a> *pdf);</p>
<h4 class="parameters">Parameters</h4>
<table class="list"><tbody>
<tr><th>pdf</th>
<td class="description">PDF file</td></tr>
</tbody></table>
<h4 class="returnvalue">Return Value</h4>
<p class="description">Modification date or <code>0</code> for none</p>
<h3 class="function"><a id="pdfioFileGetName">pdfioFileGetName</a></h3>
<p class="description">Get a PDF's filename.</p>
<p class="code">
@ -4418,6 +4631,17 @@ writes error messages to <code>stderr</code>.</p>
<tr><th>value</th>
<td class="description">Value</td></tr>
</tbody></table>
<h3 class="function"><a id="pdfioFileSetModificationDate">pdfioFileSetModificationDate</a></h3>
<p class="description">Set the modification date for a PDF file.</p>
<p class="code">
<span class="reserved">void</span> pdfioFileSetModificationDate(<a href="#pdfio_file_t">pdfio_file_t</a> *pdf, time_t value);</p>
<h4 class="parameters">Parameters</h4>
<table class="list"><tbody>
<tr><th>pdf</th>
<td class="description">PDF file</td></tr>
<tr><th>value</th>
<td class="description">Value</td></tr>
</tbody></table>
<h3 class="function"><a id="pdfioFileSetPermissions">pdfioFileSetPermissions</a></h3>
<p class="description">Set the PDF permissions, encryption mode, and passwords.</p>
<p class="code">
@ -4818,14 +5042,15 @@ size_t pdfioPageGetNumStreams(<a href="#pdfio_obj_t">pdfio_obj_t</a> *page);</p>
<td class="description">Size of string buffer</td></tr>
</tbody></table>
<h4 class="returnvalue">Return Value</h4>
<p class="description"><code>true</code> on success, <code>false</code> on EOF</p>
<p class="description"><code>true</code> on success, <code>false</code> on end-of-stream or error</p>
<h4 class="discussion">Discussion</h4>
<p class="discussion">This function reads a single PDF token from a stream. Operator tokens,
boolean values, and numbers are returned as-is in the provided string buffer.
String values start with the opening parenthesis ('(') but have all escaping
resolved and the terminating parenthesis removed. Hexadecimal string values
start with the opening angle bracket ('&lt;') and have all whitespace and the
terminating angle bracket removed.</p>
<p class="discussion">This function reads a single PDF token from a stream, skipping all whitespace
and comments. Operator tokens, boolean values, and numbers are returned
as-is in the provided string buffer. String values start with the opening
parenthesis ('(') but have all escaping resolved and the terminating
parenthesis removed. Hexadecimal string values start with the opening angle
bracket ('&lt;') and have all whitespace and the terminating angle bracket
removed.</p>
<h3 class="function"><a id="pdfioStreamPeek">pdfioStreamPeek</a></h3>
<p class="description">Peek at data in a stream.</p>
<p class="code">


@ -15,7 +15,7 @@ goals of PDFio are:
PDFio is *not* concerned with rendering or viewing a PDF file, although a PDF
RIP or viewer could be written using it.
PDFio is Copyright © 2021-2024 by Michael R Sweet and is licensed under the
PDFio is Copyright © 2021-2025 by Michael R Sweet and is licensed under the
Apache License Version 2.0 with an (optional) exception to allow linking against
GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.
@ -28,7 +28,10 @@ PDFio requires the following to build the software:
- A C99 compiler such as Clang, GCC, or MS Visual C
- A POSIX-compliant `make` program
- A POSIX-compliant `sh` program
- ZLIB (<https://www.zlib.net>) 1.0 or higher
- ZLIB (<https://www.zlib.net/>) 1.0 or higher
PDFio will also use libpng 1.6 or higher (<https://www.libpng.org/>) to provide
enhanced PNG image support.
IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.
@ -941,37 +944,98 @@ main(int argc, // I - Number of command-line arguments
Extract Text from PDF File
--------------------------
The `pdf2text.c` example code extracts non-Unicode text from a PDF file by
scanning each page for strings and text drawing commands. Since it doesn't
look at the font encoding or support Unicode text, it is really only useful to
extract plain ASCII text from a PDF file. And since it writes text in the order
it appears in the page stream, it may not come out in the same order as appears
on the page.
The `pdf2text.c` example code extracts text from a PDF file and writes it to the
standard output. Unlike some other PDF tools, it outputs the text in the order
it is seen in each page stream so the output might appear "jumbled" if the PDF
producer doesn't output text in reading order. The code is able to handle
different font encodings and produces UTF-8 output.
The [`pdfioStreamGetToken`](@@) function is used to read individual tokens from
the page streams. Tokens starting with the open parenthesis are text strings,
while PDF operators are left as-is. We use some simple logic to make sure that
we include spaces between text strings and add newlines for the text operators
that start a new line in a text block:
the page streams:
```c
pdfio_stream_t *st; // Page stream
char buffer[1024], // Token buffer
*bufptr, // Pointer into buffer
name[256]; // Current (font) name
bool first = true; // First string on line?
char buffer[1024]; // Token buffer
int encoding[256]; // Font encoding to Unicode
bool in_array = false; // Are we in an array?
// Read PDF tokens from the page stream...
while (pdfioStreamGetToken(st, buffer, sizeof(buffer)))
{
if (buffer[0] == '(')
```
Justified text can be found inside arrays ("[ ... ]"), so we look for the array
delimiter tokens and any (spacing) numbers inside an array. Experimentation has
shown that numbers with an absolute value greater than 100 can be treated as
whitespace:
```c
if (!strcmp(buffer, "["))
{
// Start of an array for justified text...
in_array = true;
}
else if (!strcmp(buffer, "]"))
{
// End of an array for justified text...
in_array = false;
}
else if (!first && in_array && (isdigit(buffer[0]) || buffer[0] == '-') && fabs(atof(buffer)) > 100)
{
// Whitespace in a justified text block...
putchar(' ');
}
```
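The spacing test above can be pulled out into a small predicate that is easy to try in isolation. The following is only a sketch: the `is_word_gap` name and the `threshold` parameter are hypothetical and not part of `pdf2text.c`, and the value 100 comes from the experimentation mentioned above rather than from the PDF specification.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper (not part of pdf2text.c): return true when a token found
// inside a "[ ... ]" array looks like a spacing adjustment that is large
// enough to be treated as a gap between words.
static bool
is_word_gap(const char *token,    // I - Token from the page stream
            double     threshold) // I - Minimum absolute adjustment
{
  double value;                   // Numeric value of the token

  // Only tokens that start like a number are candidates...
  if (!isdigit((unsigned char)token[0]) && token[0] != '-')
    return (false);

  // Compare the absolute value against the threshold...
  value = atof(token);
  if (value < 0.0)
    value = -value;

  return (value > threshold);
}
```

With a threshold of 100, a token such as "-250" counts as a word gap while a small kerning adjustment such as "-20" does not.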
Tokens starting with '(' or '<' are text fragments. 8-bit text starting with
'(' needs to be mapped to Unicode using the current font encoding while hex
strings starting with '<' are UTF-16 (Unicode) that need to be converted to
UTF-8:
```c
else if (buffer[0] == '(')
{
// Text string using an 8-bit encoding
if (first)
first = false;
else if (buffer[1] != ' ')
putchar(' ');
first = false;
fputs(buffer + 1, stdout);
for (bufptr = buffer + 1; *bufptr; bufptr ++)
put_utf8(encoding[*bufptr & 255]);
}
else if (buffer[0] == '<')
{
// Unicode text string
first = false;
puts_utf16(buffer + 1);
}
```
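The `put_utf8` and `puts_utf16` helpers are not shown in this excerpt. As a rough illustration of the conversion they need to perform, the sketch below encodes Unicode code points as UTF-8 and decodes a hex string into code points. All names here (`encode_utf8`, `hex_unit`, `hex_utf16_to_utf8`) are hypothetical, the code writes to a caller-supplied buffer instead of the standard output, and it assumes the hex digits hold big-endian UTF-16 code units with the leading '<' already skipped.

```c
#include <stddef.h>

// Hypothetical helper: encode one Unicode code point as UTF-8 into "out"
// (room for 4 bytes) and return the number of bytes written, 0 if invalid.
static size_t
encode_utf8(int ch, unsigned char *out)
{
  if (ch < 0 || ch > 0x10ffff)
    return (0);

  if (ch < 0x80)
  {
    out[0] = (unsigned char)ch;
    return (1);
  }
  else if (ch < 0x800)
  {
    out[0] = (unsigned char)(0xc0 | (ch >> 6));
    out[1] = (unsigned char)(0x80 | (ch & 0x3f));
    return (2);
  }
  else if (ch < 0x10000)
  {
    out[0] = (unsigned char)(0xe0 | (ch >> 12));
    out[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3f));
    out[2] = (unsigned char)(0x80 | (ch & 0x3f));
    return (3);
  }

  out[0] = (unsigned char)(0xf0 | (ch >> 18));
  out[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3f));
  out[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3f));
  out[3] = (unsigned char)(0x80 | (ch & 0x3f));
  return (4);
}

// Hypothetical helper: read 4 hex digits as one UTF-16 code unit, advancing
// "*s"; returns -1 when fewer than 4 hex digits remain.
static int
hex_unit(const char **s)
{
  int i, digit, unit = 0;

  for (i = 0; i < 4; i ++)
  {
    char c = **s;

    if (c >= '0' && c <= '9')
      digit = c - '0';
    else if (c >= 'a' && c <= 'f')
      digit = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
      digit = c - 'A' + 10;
    else
      return (-1);

    unit = (unit << 4) | digit;
    (*s) ++;
  }

  return (unit);
}

// Hypothetical helper: convert hex-encoded UTF-16 (assumed big-endian) to a
// nul-terminated UTF-8 string in "out"; returns the UTF-8 length in bytes.
static size_t
hex_utf16_to_utf8(const char *hex, char *out, size_t outsize)
{
  size_t len = 0;
  int    ch, low;

  // Keep room for one 4-byte sequence plus the terminating nul...
  while (len + 4 < outsize && (ch = hex_unit(&hex)) >= 0)
  {
    if (ch >= 0xd800 && ch <= 0xdbff)
    {
      // High surrogate, combine with the low surrogate that must follow...
      if ((low = hex_unit(&hex)) < 0xdc00 || low > 0xdfff)
        break;

      ch = 0x10000 + ((ch - 0xd800) << 10) + (low - 0xdc00);
    }
    else if (ch >= 0xdc00 && ch <= 0xdfff)
    {
      // Unpaired low surrogate, stop converting...
      break;
    }

    len += encode_utf8(ch, (unsigned char *)out + len);
  }

  out[len] = '\0';
  return (len);
}
```

For 8-bit strings the same `encode_utf8` step would be applied to each `encoding[]` entry instead of to decoded UTF-16 values.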
Simple (8-bit) fonts include an encoding table that maps the 8-bit characters to
one of 1051 Unicode glyph names. Since each font can use a different encoding,
we look for font names starting with '/' and the "Tf" (set text font) operator
token and load that font's encoding using the
[load_encoding](#the-loadencoding-function) function:
```c
else if (buffer[0] == '/')
{
// Save name...
strncpy(name, buffer + 1, sizeof(name) - 1);
name[sizeof(name) - 1] = '\0';
}
else if (!strcmp(buffer, "Tf") && name[0])
{
// Set font...
load_encoding(obj, name, encoding);
}
```
Finally, some text operators start a new line in a text block, so when we see
their tokens we output a newline:
```c
else if (!strcmp(buffer, "Td") || !strcmp(buffer, "TD") || !strcmp(buffer, "T*") ||
!strcmp(buffer, "\'") || !strcmp(buffer, "\""))
{
@ -980,9 +1044,160 @@ while (pdfioStreamGetToken(st, buffer, sizeof(buffer)))
first = true;
}
}
```
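The chain of `strcmp` calls for the line-advancing operators could also be written as a table lookup, which may be easier to extend. This is only a sketch of that alternative: the `starts_new_line` name is hypothetical and the operator list is copied from the comparison above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of pdf2text.c): return true when the token is
// one of the text operators that advance to the next line in a text block.
static bool
starts_new_line(const char *token)
{
  static const char * const ops[] = // Same operators as the strcmp chain
  {
    "Td", "TD", "T*", "'", "\""
  };
  size_t i;                         // Looping var

  for (i = 0; i < (sizeof(ops) / sizeof(ops[0])); i ++)
  {
    if (!strcmp(token, ops[i]))
      return (true);
  }

  return (false);
}
```

Operator names are case-sensitive, so "Td" matches but "td" does not.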
if (!first)
putchar('\n');
### The `load_encoding` Function
The `load_encoding` function looks up the named font in the page's "Resources"
dictionary. A PDF simple font can contain an "Encoding" dictionary with a base
encoding ("WinAnsiEncoding", "MacRomanEncoding", or "MacExpertEncoding") and a
differences array that lists character indexes and glyph names for an 8-bit
font.
We start by initializing the encoding array to the default WinANSI encoding and
looking up the font object for the named font:
```c
static void
load_encoding(
pdfio_obj_t *page_obj, // I - Page object
const char *name, // I - Font name
int encoding[256]) // O - Encoding table
{
size_t i, j; // Looping vars
pdfio_dict_t *page_dict, // Page dictionary
*resources_dict, // Resources dictionary
*font_dict; // Font dictionary
pdfio_obj_t *font_obj, // Font object
*encoding_obj; // Encoding object
static int win_ansi[32] = // WinANSI characters from 128 to 159
{
...
};
static int mac_roman[128] = // MacRoman characters from 128 to 255
{
...
};
// Initialize the encoding to be the "standard" WinAnsi...
for (i = 0; i < 128; i ++)
encoding[i] = i;
for (i = 160; i < 256; i ++)
encoding[i] = i;
memcpy(encoding + 128, win_ansi, sizeof(win_ansi));
// Find the named font...
if ((page_dict = pdfioObjGetDict(page_obj)) == NULL)
return;
if ((resources_dict = pdfioDictGetDict(page_dict, "Resources")) == NULL)
return;
if ((font_dict = pdfioDictGetDict(resources_dict, "Font")) == NULL)
{
// Font resources not a dictionary, see if it is an object...
if ((font_obj = pdfioDictGetObj(resources_dict, "Font")) != NULL)
font_dict = pdfioObjGetDict(font_obj);
if (!font_dict)
return;
}
if ((font_obj = pdfioDictGetObj(font_dict, name)) == NULL)
return;
```
Once we have found the font we see if it has an "Encoding" dictionary:
```c
pdfio_dict_t *encoding_dict; // Encoding dictionary
if ((encoding_obj = pdfioDictGetObj(pdfioObjGetDict(font_obj), "Encoding")) == NULL)
return;
if ((encoding_dict = pdfioObjGetDict(encoding_obj)) == NULL)
return;
```
Once we have the encoding dictionary we can get the "BaseEncoding" and
"Differences" values:
```c
const char *base_encoding; // BaseEncoding name
pdfio_array_t *differences; // Differences array
// OK, have the encoding object, build the encoding using it...
base_encoding = pdfioDictGetName(encoding_dict, "BaseEncoding");
differences = pdfioDictGetArray(encoding_dict, "Differences");
```
If the base encoding is "MacRomanEncoding", we need to reset the upper 128
characters in the encoding array to match it:
```c
if (base_encoding && !strcmp(base_encoding, "MacRomanEncoding"))
{
// Map upper 128
memcpy(encoding + 128, mac_roman, sizeof(mac_roman));
}
```
Then we loop through the differences array, keeping track of the current index
within the encoding array. A number indicates a new index while a name is the
Unicode glyph for the current index:
```c
typedef struct name_map_s
{
const char *name; // Character name
int unicode; // Unicode value
} name_map_t;
static name_map_t unicode_map[1051]; // List of glyph names
if (differences)
{
// Apply differences
size_t count = pdfioArrayGetSize(differences);
// Number of differences
const char *name; // Character name
size_t idx = 0; // Index in encoding array
for (i = 0; i < count; i ++)
{
switch (pdfioArrayGetType(differences, i))
{
case PDFIO_VALTYPE_NUMBER :
// Get the index of the next character...
idx = (size_t)pdfioArrayGetNumber(differences, i);
break;
case PDFIO_VALTYPE_NAME :
// Lookup name and apply to encoding...
if (idx > 255)
break;
name = pdfioArrayGetName(differences, i);
for (j = 0; j < (sizeof(unicode_map) / sizeof(unicode_map[0])); j ++)
{
if (!strcmp(name, unicode_map[j].name))
{
encoding[idx] = unicode_map[j].unicode;
break;
}
}
idx ++;
break;
default :
// Do nothing for other values
break;
}
}
}
}
```
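To experiment with the differences logic without a PDF file, the same loop can be run over a plain C array. The sketch below is an assumption-laden stand-in: `diff_item_t`, `glyph_map`, and `apply_differences` are hypothetical names, the three glyph names are a tiny subset chosen for illustration rather than the 1051-entry table, and each array entry is modeled as either a number (a new starting index) or a glyph name.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for one entry of a "Differences" array
typedef struct diff_item_s
{
  const char *name;    // Glyph name, or NULL when the entry is a number
  int        number;   // New starting index when "name" is NULL
} diff_item_t;

typedef struct name_map_s
{
  const char *name;    // Character name
  int        unicode;  // Unicode value
} name_map_t;

// Tiny illustrative subset of glyph names, not the full table
static const name_map_t glyph_map[] =
{
  { "Euro",   0x20ac },
  { "bullet", 0x2022 },
  { "eacute", 0x00e9 }
};

// Apply a list of differences to an existing 256-entry encoding table
static void
apply_differences(const diff_item_t *items, size_t count, int encoding[256])
{
  size_t i, j;         // Looping vars
  size_t idx = 0;      // Current index in encoding array

  for (i = 0; i < count; i ++)
  {
    if (!items[i].name)
    {
      // A number sets the index for the names that follow; out-of-range
      // values park the index at 256 so those names are ignored...
      if (items[i].number >= 0 && items[i].number < 256)
        idx = (size_t)items[i].number;
      else
        idx = 256;

      continue;
    }

    if (idx > 255)
      continue;

    // Look up the glyph name; unknown names leave the entry unchanged...
    for (j = 0; j < (sizeof(glyph_map) / sizeof(glyph_map[0])); j ++)
    {
      if (!strcmp(items[i].name, glyph_map[j].name))
      {
        encoding[idx] = glyph_map[j].unicode;
        break;
      }
    }

    // Each name consumes one slot whether or not it was recognized...
    idx ++;
  }
}
```

A differences list equivalent to `[128 /Euro /bullet 233 /eacute]` would map index 128 to U+20AC, 129 to U+2022, and 233 to U+00E9 while leaving every other entry as it was. Range-checking the number before converting it to an index also avoids relying on how a negative value converts to `size_t`.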


@ -14,8 +14,8 @@
# Common options
CFLAGS = -g $(CPPFLAGS)
#CFLAGS = -g -fsanitize=address $(CPPFLAGS)
CPPFLAGS = -I.. -I/usr/local/include
LIBS = -L.. -L/usr/local/lib -lpdfio -lz -lm
CPPFLAGS = -I.. $(shell PKG_CONFIG_PATH="..:$(PKG_CONFIG_PATH)" pkg-config pdfio --cflags)
LIBS = -L.. $(shell PKG_CONFIG_PATH="..:$(PKG_CONFIG_PATH)" pkg-config pdfio --libs)
# Targets


@ -1,7 +1,7 @@
//
// Image example for PDFio.
//
// Copyright © 2023-2024 by Michael R Sweet.
// Copyright © 2023-2025 by Michael R Sweet.
//
// Licensed under Apache License v2.0. See the file "LICENSE" for more
// information.
@ -22,8 +22,8 @@
bool // O - True on success, false on failure
create_pdf_image_file(
const char *pdfname, // I - PDF filename
const char *imagename, // I - Image filename
const char *pdfname, // I - PDF filename
const char *caption) // I - Caption filename
{
pdfio_file_t *pdf; // PDF file
@ -36,6 +36,15 @@ create_pdf_image_file(
double tx, ty; // Position on page
// Default the caption...
if (!caption)
{
if ((caption = strrchr(imagename, '/')) != NULL)
caption ++;
else
caption = imagename;
}
// Create the PDF file...
pdf = pdfioFileCreate(pdfname, /*version*/NULL, /*media_box*/NULL,
/*crop_box*/NULL, /*error_cb*/NULL,

File diff suppressed because it is too large


@ -13,6 +13,7 @@
#include <pdfio.h>
#include <time.h>
#include <math.h>
//
@ -25,11 +26,26 @@ main(int argc, // I - Number of command-line arguments
{
const char *filename; // PDF filename
pdfio_file_t *pdf; // PDF file
const char *author; // Author name
time_t creation_date; // Creation date
struct tm *creation_tm; // Creation date/time information
char creation_text[256]; // Creation date/time as a string
const char *title; // Title
pdfio_dict_t *catalog; // Catalog dictionary
const char *author, // Author name
*creator, // Creator name
*producer, // Producer name
*title; // Title
time_t creation_date, // Creation date
modification_date; // Modification date
struct tm *creation_tm, // Creation date/time information
*modification_tm; // Modification date/time information
char creation_text[256], // Creation date/time as a string
modification_text[256], // Modification date/time human fmt string
range_text[255]; // Page range text
size_t num_pages; // PDF number of pages
bool has_acroform; // Does the file have an AcroForm?
pdfio_obj_t *page; // Object
pdfio_dict_t *page_dict; // Object dictionary
size_t cur, // Current page index
prev; // Previous page index
pdfio_rect_t cur_box, // Current MediaBox
prev_box; // Previous MediaBox
// Get the filename from the command-line...
@ -48,9 +64,14 @@ main(int argc, // I - Number of command-line arguments
if (pdf == NULL)
return (1);
// Get the title and author...
author = pdfioFileGetAuthor(pdf);
title = pdfioFileGetTitle(pdf);
// Get the title, author, etc...
catalog = pdfioFileGetCatalog(pdf);
author = pdfioFileGetAuthor(pdf);
creator = pdfioFileGetCreator(pdf);
has_acroform = pdfioDictGetObj(catalog, "AcroForm") != NULL ? true : false;
num_pages = pdfioFileGetNumPages(pdf);
producer = pdfioFileGetProducer(pdf);
title = pdfioFileGetTitle(pdf);
// Get the creation date and convert to a string...
if ((creation_date = pdfioFileGetCreationDate(pdf)) > 0)
@ -63,12 +84,76 @@ main(int argc, // I - Number of command-line arguments
snprintf(creation_text, sizeof(creation_text), "-- not set --");
}
// Get the modification date and convert to a string...
if ((modification_date = pdfioFileGetModificationDate(pdf)) > 0)
{
modification_tm = localtime(&modification_date);
strftime(modification_text, sizeof(modification_text), "%c", modification_tm);
}
else
{
snprintf(modification_text, sizeof(modification_text), "-- not set --");
}
// Print file information to stdout...
printf("%s:\n", filename);
printf(" Title: %s\n", title ? title : "-- not set --");
printf(" Author: %s\n", author ? author : "-- not set --");
printf(" Created On: %s\n", creation_text);
printf(" Number Pages: %u\n", (unsigned)pdfioFileGetNumPages(pdf));
printf(" Title: %s\n", title ? title : "-- not set --");
printf(" Author: %s\n", author ? author : "-- not set --");
printf(" Creator: %s\n", creator ? creator : "-- not set --");
printf(" Producer: %s\n", producer ? producer : "-- not set --");
printf(" Created On: %s\n", creation_text);
printf(" Modified On: %s\n", modification_text);
printf(" Version: %s\n", pdfioFileGetVersion(pdf));
printf(" AcroForm: %s\n", has_acroform ? "Yes" : "No");
printf(" Number of Pages: %u\n", (unsigned)num_pages);
// Report the MediaBox for all of the pages
prev_box.x1 = prev_box.x2 = prev_box.y1 = prev_box.y2 = 0.0;
for (cur = 0, prev = 0; cur < num_pages; cur ++)
{
// Find the MediaBox for this page in the page tree...
for (page = pdfioFileGetPage(pdf, cur);
page != NULL;
page = pdfioDictGetObj(page_dict, "Parent"))
{
cur_box.x1 = cur_box.x2 = cur_box.y1 = cur_box.y2 = 0.0;
page_dict = pdfioObjGetDict(page);
if (pdfioDictGetRect(page_dict, "MediaBox", &cur_box))
break;
}
// If this MediaBox is different from the previous one, show the range of
// pages that have that size...
if (cur == 0 ||
fabs(cur_box.x1 - prev_box.x1) > 0.01 ||
fabs(cur_box.y1 - prev_box.y1) > 0.01 ||
fabs(cur_box.x2 - prev_box.x2) > 0.01 ||
fabs(cur_box.y2 - prev_box.y2) > 0.01)
{
if (cur > prev)
{
snprintf(range_text, sizeof(range_text), "Pages %u-%u",
(unsigned)(prev + 1), (unsigned)cur);
printf("%16s: [%g %g %g %g]\n", range_text,
prev_box.x1, prev_box.y1, prev_box.x2, prev_box.y2);
}
// Start a new series of pages with the new size...
prev = cur;
prev_box = cur_box;
}
}
// Show the last range as needed...
if (cur > prev)
{
snprintf(range_text, sizeof(range_text), "Pages %u-%u",
(unsigned)(prev + 1), (unsigned)cur);
printf("%16s: [%g %g %g %g]\n", range_text,
prev_box.x1, prev_box.y1, prev_box.x2, prev_box.y2);
}
// Close the PDF file...
pdfioFileClose(pdf);
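
The page-size report added above groups consecutive pages whose MediaBox values agree to within 0.01 and prints one line per run. The following is a minimal standalone sketch of that grouping logic only, assuming the boxes have already been collected into an array. The names `box_t`, `box_differs`, and `report_ranges` are hypothetical and are not part of PDFio; `box_t` merely stands in for `pdfio_rect_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical stand-in for pdfio_rect_t (illustration only)
typedef struct
{
  double x1, y1, x2, y2;
} box_t;

// Absolute difference, used in place of fabs() to avoid needing libm
static double
abs_diff(double a, double b)
{
  return (a > b ? a - b : b - a);
}

// Return true when any corner differs by more than the 0.01 tolerance
static bool
box_differs(const box_t *a, const box_t *b)
{
  return (abs_diff(a->x1, b->x1) > 0.01 || abs_diff(a->y1, b->y1) > 0.01 ||
          abs_diff(a->x2, b->x2) > 0.01 || abs_diff(a->y2, b->y2) > 0.01);
}

// Print one line per run of same-sized pages and return the number of runs
static size_t
report_ranges(const box_t *boxes, size_t num_pages)
{
  size_t cur, prev = 0;                  // Current and run-start page indices
  size_t num_ranges = 0;                 // Number of runs printed
  box_t  prev_box = { 0.0, 0.0, 0.0, 0.0 };
  char   range_text[255];                // "Pages N-M" label

  assert(boxes != NULL || num_pages == 0);

  for (cur = 0; cur < num_pages; cur ++)
  {
    if (cur == 0 || box_differs(boxes + cur, &prev_box))
    {
      if (cur > prev)
      {
        // Close out the previous run before starting a new one
        snprintf(range_text, sizeof(range_text), "Pages %u-%u", (unsigned)(prev + 1), (unsigned)cur);
        printf("%16s: [%g %g %g %g]\n", range_text, prev_box.x1, prev_box.y1, prev_box.x2, prev_box.y2);
        num_ranges ++;
      }

      prev     = cur;
      prev_box = boxes[cur];
    }
  }

  if (cur > prev)
  {
    // Show the last run
    snprintf(range_text, sizeof(range_text), "Pages %u-%u", (unsigned)(prev + 1), (unsigned)cur);
    printf("%16s: [%g %g %g %g]\n", range_text, prev_box.x1, prev_box.y1, prev_box.x2, prev_box.y2);
    num_ranges ++;
  }

  return (num_ranges);
}
```

Calling `report_ranges()` with three US Letter boxes, two A4 boxes, and one more US Letter box prints three lines, one per run. The `cur == 0` test starts the first run without printing anything, which is why the final run has to be flushed after the loop.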


@ -21,40 +21,60 @@ if test $# != 1; then
exit 1
fi
status=0
version=$1
version_major=$(echo $1 | awk -F. '{print $1}')
version_minor=$(echo $1 | awk -F. '{print $2}')
# Check that version number has been updated everywhere...
if test $(grep AC_INIT configure.ac | awk '{print $2}') != "[$version],"; then
echo "Still need to update AC_INIT version in 'configure.ac'."
exit 1
status=1
fi
if test $(head -4 CHANGES.md | tail -1 | awk '{print $1}') != "v$version"; then
echo "Still need to update CHANGES.md version number."
exit 1
status=1
fi
if test $(head -4 CHANGES.md | tail -1 | awk '{print $3}') = "YYYY-MM-DD"; then
echo "Still need to update CHANGES.md release date."
exit 1
status=1
fi
if test $(grep PDFIO_VERSION= configure | awk -F \" '{print $2}') != "$version"; then
echo "Still need to run 'autoconf -f'."
exit 1
status=1
fi
if test $(grep '<version>' pdfio_native.nuspec | sed -E -e '1,$s/^.*<version>([0-9.]+).*$/\1/') != "$version"; then
echo "Still need to update version in 'pdfio_native.nuspec'."
exit 1
status=1
fi
if test $(grep '<version>' pdfio_native.redist.nuspec | sed -E -e '1,$s/^.*<version>([0-9.]+).*$/\1/') != "$version"; then
echo "Still need to update version in 'pdfio_native.redist.nuspec'."
exit 1
status=1
fi
if test $(grep PDFIO_VERSION pdfio.h | awk -F \" '{print $2}') != "$version"; then
echo "Still need to update PDFIO_VERSION in 'pdfio.h'."
status=1
fi
if test $(grep PDFIO_VERSION_MAJOR pdfio.h | awk '{print $4}') != "$version_major"; then
echo "Still need to update PDFIO_VERSION_MAJOR in 'pdfio.h'."
status=1
fi
if test $(grep PDFIO_VERSION_MINOR pdfio.h | awk '{print $4}') != "$version_minor"; then
echo "Still need to update PDFIO_VERSION_MINOR in 'pdfio.h'."
status=1
fi
if test $(grep VERSION pdfio1.def | awk '{print $2}') != "$version_major.$version_minor"; then
echo "Still need to update VERSION in 'pdfio1.def'."
status=1
fi
if test $status = 1; then
exit 1
fi


@ -1,5 +1,7 @@
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="libpng_native" version="1.6.30" targetFramework="native" />
<package id="libpng_native.redist" version="1.6.30" targetFramework="native" />
<package id="zlib_native" version="1.2.11" targetFramework="native" />
<package id="zlib_native.redist" version="1.2.11" targetFramework="native" />
</packages>


@ -1,7 +1,7 @@
//
// AES functions for PDFio.
//
// Copyright © 2021 by Michael R Sweet.
// Copyright © 2021-2025 by Michael R Sweet.
//
// Licensed under Apache License v2.0. See the file "LICENSE" for more
// information.
@ -76,18 +76,18 @@ static const uint8_t Rcon[11] = // Round constants
// Local functions...
//
static void AddRoundKey(size_t round, state_t *state, const uint8_t *RoundKey);
static void SubBytes(state_t *state);
static void ShiftRows(state_t *state);
static void add_round_key(size_t round, state_t *state, const uint8_t *round_key);
static void sub_bytes(state_t *state);
static void shift_rows(state_t *state);
static uint8_t xtime(uint8_t x);
static void MixColumns(state_t *state);
static uint8_t Multiply(uint8_t x, uint8_t y);
static void InvMixColumns(state_t *state);
static void InvSubBytes(state_t *state);
static void InvShiftRows(state_t *state);
static void Cipher(state_t *state, const _pdfio_aes_t *ctx);
static void InvCipher(state_t *state, const _pdfio_aes_t *ctx);
static void XorWithIv(uint8_t *buf, const uint8_t *Iv);
static void mix_columns(state_t *state);
static uint8_t multiply(uint8_t x, uint8_t y);
static void inv_mix_columns(state_t *state);
static void inv_sub_bytes(state_t *state);
static void inv_shift_rows(state_t *state);
static void cipher(state_t *state, const _pdfio_aes_t *ctx);
static void inv_cipher(state_t *state, const _pdfio_aes_t *ctx);
static void xor_with_iv(uint8_t *buf, const uint8_t *Iv);
//
@ -106,7 +106,6 @@ _pdfioCryptoAESInit(
*rkptr, // Current round_key values
*rkend, // End of round_key values
tempa[4]; // Used for the column/row operations
// size_t roundlen = keylen + 24; // Length of round_key
size_t nwords = keylen / 4; // Number of 32-bit words in key
@ -188,8 +187,8 @@ _pdfioCryptoAESDecrypt(
while (len > 15)
{
memcpy(next_iv, outbuffer, 16);
InvCipher((state_t *)outbuffer, ctx);
XorWithIv(outbuffer, ctx->iv);
inv_cipher((state_t *)outbuffer, ctx);
xor_with_iv(outbuffer, ctx->iv);
memcpy(ctx->iv, next_iv, 16);
outbuffer += 16;
len -= 16;
@ -231,8 +230,8 @@ _pdfioCryptoAESEncrypt(
while (len > 15)
{
XorWithIv(outbuffer, iv);
Cipher((state_t*)outbuffer, ctx);
xor_with_iv(outbuffer, iv);
cipher((state_t*)outbuffer, ctx);
iv = outbuffer;
outbuffer += 16;
len -= 16;
@ -242,10 +241,10 @@ _pdfioCryptoAESEncrypt(
if (len > 0)
{
// Pad the final buffer with (16 - len)...
memset(outbuffer + len, 16 - len, 16 - len);
memset(outbuffer + len, (int)(16 - len), 16 - len);
XorWithIv(outbuffer, iv);
Cipher((state_t*)outbuffer, ctx);
xor_with_iv(outbuffer, iv);
cipher((state_t*)outbuffer, ctx);
iv = outbuffer;
outbytes += 16;
}
@ -257,24 +256,32 @@ _pdfioCryptoAESEncrypt(
}
// This function adds the round key to state.
//
// 'add_round_key()' - Add the round key to state.
//
// The round key is added to the state by an XOR function.
//
static void
AddRoundKey(size_t round, state_t *state, const uint8_t *RoundKey)
add_round_key(size_t round, // I - Which round
state_t *state, // I - Current state
const uint8_t *round_key) // I - Key
{
unsigned i; // Looping var
uint8_t *sptr = (*state)[0]; // Pointer into state
for (RoundKey += round * 16, i = 16; i > 0; i --, sptr ++, RoundKey ++)
*sptr ^= *RoundKey;
for (round_key += round * 16, i = 16; i > 0; i --, sptr ++, round_key ++)
*sptr ^= *round_key;
}
// The SubBytes Function Substitutes the values in the
// state matrix with values in an S-box.
//
// 'sub_bytes()' - Substitute the values in the state matrix with values in an S-box.
//
static void
SubBytes(state_t *state)
sub_bytes(state_t *state) // I - Current state
{
unsigned i; // Looping var
uint8_t *sptr = (*state)[0]; // Pointer into state
@ -284,11 +291,16 @@ SubBytes(state_t *state)
*sptr = sbox[*sptr];
}
// The ShiftRows() function shifts the rows in the state to the left.
//
// 'shift_rows()' - Shift the rows in the state to the left.
//
// Each row is shifted with different offset.
// Offset = Row number. So the first row is not shifted.
//
static void
ShiftRows(state_t *state)
shift_rows(state_t *state) // I - Current state
{
uint8_t *sptr = (*state)[0]; // Pointer into state
uint8_t temp; // Temporary value
@ -319,21 +331,29 @@ ShiftRows(state_t *state)
}
static uint8_t
xtime(uint8_t x)
//
// 'xtime()' - Compute the AES xtime function.
//
static uint8_t // O - xtime(x)
xtime(uint8_t x) // I - Column value
{
return ((uint8_t)((x << 1) ^ ((x >> 7) * 0x1b)));
}
// MixColumns function mixes the columns of the state matrix
//
// 'mix_columns()' - Mix the columns of the state matrix.
//
static void
MixColumns(state_t *state)
mix_columns(state_t *state) // I - Current state
{
unsigned i; // Looping var
uint8_t *sptr = (*state)[0]; // Pointer into state
uint8_t Tmp, Tm, t; // Temporary values
for (i = 4; i > 0; i --, sptr += 4)
{
t = sptr[0];
@ -357,11 +377,15 @@ MixColumns(state_t *state)
}
// Multiply is used to multiply numbers in the field GF(2^8)
//
// 'multiply()' - Multiply numbers in the field GF(2^8)
//
// Note: The last call to xtime() is unneeded, but often ends up generating a smaller binary.
// The compiler seems to be able to vectorize the operation better this way.
// See https://github.com/kokke/tiny-AES-c/pull/34
static uint8_t Multiply(uint8_t x, uint8_t y)
//
static uint8_t multiply(uint8_t x, uint8_t y)
{
return (((y & 1) * x) ^
((y>>1 & 1) * xtime(x)) ^
@ -371,11 +395,15 @@ static uint8_t Multiply(uint8_t x, uint8_t y)
}
// MixColumns function mixes the columns of the state matrix.
//
// 'inv_mix_columns()' - Reverse the mixing of the columns of the state matrix.
//
// The method used to multiply may be difficult to understand for the inexperienced.
// Please use the references to gain more information.
//
static void
InvMixColumns(state_t *state)
inv_mix_columns(state_t *state) // I - Current state
{
unsigned i; // Looping var
uint8_t *sptr = (*state)[0]; // Pointer into state
@ -389,18 +417,20 @@ InvMixColumns(state_t *state)
c = sptr[2];
d = sptr[3];
*sptr++ = Multiply(a, 0x0e) ^ Multiply(b, 0x0b) ^ Multiply(c, 0x0d) ^ Multiply(d, 0x09);
*sptr++ = Multiply(a, 0x09) ^ Multiply(b, 0x0e) ^ Multiply(c, 0x0b) ^ Multiply(d, 0x0d);
*sptr++ = Multiply(a, 0x0d) ^ Multiply(b, 0x09) ^ Multiply(c, 0x0e) ^ Multiply(d, 0x0b);
*sptr++ = Multiply(a, 0x0b) ^ Multiply(b, 0x0d) ^ Multiply(c, 0x09) ^ Multiply(d, 0x0e);
*sptr++ = multiply(a, 0x0e) ^ multiply(b, 0x0b) ^ multiply(c, 0x0d) ^ multiply(d, 0x09);
*sptr++ = multiply(a, 0x09) ^ multiply(b, 0x0e) ^ multiply(c, 0x0b) ^ multiply(d, 0x0d);
*sptr++ = multiply(a, 0x0d) ^ multiply(b, 0x09) ^ multiply(c, 0x0e) ^ multiply(d, 0x0b);
*sptr++ = multiply(a, 0x0b) ^ multiply(b, 0x0d) ^ multiply(c, 0x09) ^ multiply(d, 0x0e);
}
}
// The SubBytes Function Substitutes the values in the
// state matrix with values in an S-box.
//
// 'inv_sub_bytes()' - Substitute the values in the state matrix with values in the inverse S-box.
//
static void
InvSubBytes(state_t *state)
inv_sub_bytes(state_t *state) // I - Current state
{
unsigned i; // Looping var
uint8_t *sptr = (*state)[0]; // Pointer into state
@ -411,8 +441,12 @@ InvSubBytes(state_t *state)
}
//
// 'inv_shift_rows()' - Shift the rows in the state to the right.
//
static void
InvShiftRows(state_t *state)
inv_shift_rows(state_t *state) // I - Current state
{
uint8_t *sptr = (*state)[0]; // Pointer into state
uint8_t temp; // Temporary value
@ -443,40 +477,52 @@ InvShiftRows(state_t *state)
}
// Cipher is the main function that encrypts the PlainText.
//
// 'cipher()' - Encrypt the PlainText.
//
static void
Cipher(state_t *state, const _pdfio_aes_t *ctx)
cipher(state_t *state, // I - Current state
const _pdfio_aes_t *ctx) // I - AES context
{
size_t round = 0;
size_t round = 0; // Current round
// Add the First round key to the state before starting the rounds.
AddRoundKey(0, state, ctx->round_key);
add_round_key(0, state, ctx->round_key);
// There will be Nr rounds.
// The first Nr-1 rounds are identical.
// These Nr rounds are executed in the loop below.
// Last one without MixColumns()
// Last one without mix_columns()
for (round = 1; round < ctx->round_size; round ++)
{
SubBytes(state);
ShiftRows(state);
MixColumns(state);
AddRoundKey(round, state, ctx->round_key);
sub_bytes(state);
shift_rows(state);
mix_columns(state);
add_round_key(round, state, ctx->round_key);
}
// Add round key to last round
SubBytes(state);
ShiftRows(state);
AddRoundKey(ctx->round_size, state, ctx->round_key);
sub_bytes(state);
shift_rows(state);
add_round_key(ctx->round_size, state, ctx->round_key);
}
//
// 'inv_cipher()' - Decrypt the CipherText.
//
static void
InvCipher(state_t *state, const _pdfio_aes_t *ctx)
inv_cipher(state_t *state, // I - Current state
const _pdfio_aes_t *ctx) // I - AES context
{
size_t round;
size_t round; // Current round
// Add the First round key to the state before starting the rounds.
AddRoundKey(ctx->round_size, state, ctx->round_key);
add_round_key(ctx->round_size, state, ctx->round_key);
// There will be Nr rounds.
// The first Nr-1 rounds are identical.
@ -484,20 +530,25 @@ InvCipher(state_t *state, const _pdfio_aes_t *ctx)
// Last one without InvMixColumn()
for (round = ctx->round_size - 1; ; round --)
{
InvShiftRows(state);
InvSubBytes(state);
AddRoundKey(round, state, ctx->round_key);
inv_shift_rows(state);
inv_sub_bytes(state);
add_round_key(round, state, ctx->round_key);
if (round == 0)
break;
InvMixColumns(state);
inv_mix_columns(state);
}
}
//
// 'xor_with_iv()' - XOR a block with the initialization vector.
//
static void
XorWithIv(uint8_t *buf, const uint8_t *Iv)
xor_with_iv(uint8_t *buf, // I - Block
const uint8_t *Iv) // I - Initialization vector
{
// 16-byte block...
*buf++ ^= *Iv++;
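
The `xtime()` and `multiply()` helpers documented in this file implement arithmetic in GF(2^8) with the AES reduction polynomial. As a rough standalone illustration of what they compute, the sketch below uses a plain shift-and-add loop rather than the unrolled expression in the diff. The names `gf_xtime`, `gf_multiply`, and `gf_self_check` are hypothetical, and the check values are the worked examples from FIPS 197.

```c
#include <assert.h>
#include <stdint.h>

// Multiply by {02} in GF(2^8); a carry out of bit 7 is reduced with 0x1b
static uint8_t
gf_xtime(uint8_t x)
{
  return ((uint8_t)((x << 1) ^ ((x >> 7) * 0x1b)));
}

// Shift-and-add multiply: for every set bit in y, XOR in the matching
// power-of-two multiple of x (illustrative loop, not the unrolled form)
static uint8_t
gf_multiply(uint8_t x, uint8_t y)
{
  uint8_t product = 0;                   // Running product

  while (y)
  {
    if (y & 1)
      product ^= x;

    x = gf_xtime(x);
    y >>= 1;
  }

  return (product);
}

// Check against the worked examples in FIPS 197
static void
gf_self_check(void)
{
  assert(gf_xtime(0x57) == 0xae);
  assert(gf_multiply(0x57, 0x13) == 0xfe);
  assert(gf_multiply(0x57, 0x83) == 0xc1);
}
```

The inverse MixColumns step only ever multiplies by the constants 0x09, 0x0b, 0x0d, and 0x0e, so a fixed four-term expansion covers every case the cipher needs; the loop form here is just easier to read.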


@ -98,7 +98,7 @@ _pdfioFileFlush(pdfio_file_t *pdf) // I - PDF file
if (!write_buffer(pdf, pdf->buffer, (size_t)(pdf->bufptr - pdf->buffer)))
return (false);
pdf->bufpos += pdf->bufptr - pdf->buffer;
pdf->bufpos += (off_t)(pdf->bufptr - pdf->buffer);
}
pdf->bufptr = pdf->buffer;
@ -216,7 +216,7 @@ _pdfioFilePeek(pdfio_file_t *pdf, // I - PDF file
PDFIO_DEBUG("_pdfioFilePeek: Sliding buffer, total=%ld\n", (long)total);
memmove(pdf->buffer, pdf->bufptr, total);
pdf->bufpos += pdf->bufptr - pdf->buffer;
pdf->bufpos += (off_t)(pdf->bufptr - pdf->buffer);
pdf->bufptr = pdf->buffer;
pdf->bufend = pdf->buffer + total;
@ -317,14 +317,14 @@ _pdfioFileRead(pdfio_file_t *pdf, // I - PDF file
// Advance current position in file as needed...
if (pdf->bufend)
{
pdf->bufpos += pdf->bufend - pdf->buffer;
pdf->bufpos += (off_t)(pdf->bufend - pdf->buffer);
pdf->bufptr = pdf->bufend = NULL;
}
// Read directly from the file...
if ((rbytes = read_buffer(pdf, bufptr, bytes)) > 0)
{
pdf->bufpos += rbytes;
pdf->bufpos += (off_t)rbytes;
continue;
}
else if (rbytes < 0 && (errno == EINTR || errno == EAGAIN))
@ -361,14 +361,14 @@ _pdfioFileSeek(pdfio_file_t *pdf, // I - PDF file
// Adjust offset for relative seeks...
if (whence == SEEK_CUR)
{
offset += pdf->bufpos + (pdf->bufptr - pdf->buffer);
offset += pdf->bufpos + (off_t)(pdf->bufptr - pdf->buffer);
whence = SEEK_SET;
}
if (pdf->mode == _PDFIO_MODE_READ)
{
// Reading, see if we already have the data we need...
if (whence != SEEK_END && offset >= pdf->bufpos && pdf->bufend && offset < (pdf->bufpos + pdf->bufend - pdf->buffer))
if (whence != SEEK_END && offset >= pdf->bufpos && pdf->bufend && offset < (off_t)(pdf->bufpos + pdf->bufend - pdf->buffer))
{
// Yes, seek within existing buffer...
pdf->bufptr = pdf->buffer + (offset - pdf->bufpos);

File diff suppressed because it is too large


@ -129,6 +129,7 @@ extern bool pdfioContentTextShowJustified(pdfio_stream_t *st, bool unicode, siz
// Resource helpers...
extern pdfio_obj_t *pdfioFileCreateFontObjFromBase(pdfio_file_t *pdf, const char *name) _PDFIO_PUBLIC;
extern pdfio_obj_t *pdfioFileCreateFontObjFromFile(pdfio_file_t *pdf, const char *filename, bool unicode) _PDFIO_PUBLIC;
extern pdfio_obj_t *pdfioFileCreateICCObjFromData(pdfio_file_t *pdf, const unsigned char *data, size_t datalen, size_t num_colors) _PDFIO_PUBLIC;
extern pdfio_obj_t *pdfioFileCreateICCObjFromFile(pdfio_file_t *pdf, const char *filename, size_t num_colors) _PDFIO_PUBLIC;
extern pdfio_obj_t *pdfioFileCreateImageObjFromData(pdfio_file_t *pdf, const unsigned char *data, size_t width, size_t height, size_t num_colors, pdfio_array_t *color_data, bool alpha, bool interpolate) _PDFIO_PUBLIC;
extern pdfio_obj_t *pdfioFileCreateImageObjFromFile(pdfio_file_t *pdf, const char *filename, bool interpolate) _PDFIO_PUBLIC;


@ -1,7 +1,7 @@
//
// Cryptographic support functions for PDFio.
//
// Copyright © 2021-2023 by Michael R Sweet.
// Copyright © 2021-2025 by Michael R Sweet.
//
// Licensed under Apache License v2.0. See the file "LICENSE" for more
// information.
@ -466,6 +466,7 @@ _pdfioCryptoMakeReader(
if (memcmp(pdf->password, pdf_user_key, 32) && memcmp(own_user_key, pdf_user_key, 16))
{
_pdfioFileError(pdf, "Unable to unlock file.");
*ivlen = 0;
return (NULL);
}
@ -483,6 +484,7 @@ _pdfioCryptoMakeReader(
switch (pdf->encryption)
{
default :
_pdfioFileError(pdf, "Unsupported encryption algorithm.");
*ivlen = 0;
return (NULL);


@ -25,6 +25,7 @@ static struct lconv *get_lconv(void);
static bool load_obj_stream(pdfio_obj_t *obj);
static bool load_pages(pdfio_file_t *pdf, pdfio_obj_t *obj, size_t depth);
static bool load_xref(pdfio_file_t *pdf, off_t xref_offset, pdfio_password_cb_t password_cb, void *password_data);
static bool repair_xref(pdfio_file_t *pdf, pdfio_password_cb_t password_cb, void *password_data);
static bool write_pages(pdfio_file_t *pdf);
static bool write_trailer(pdfio_file_t *pdf);
@ -164,7 +165,8 @@ pdfioFileClose(pdfio_file_t *pdf) // I - PDF file
// name of the PDF file to create.
//
// The "version" argument specifies the PDF version number for the file or
// `NULL` for the default ("2.0").
// `NULL` for the default ("2.0"). The value "PCLm-1.0" can be specified to
// produce the PCLm subset of PDF.
//
// The "media_box" and "crop_box" arguments specify the default MediaBox and
// CropBox for pages in the PDF file - if `NULL` then a default "Universal" size
@ -396,7 +398,9 @@ _pdfioFileCreateObj(
// ```
//
// The "version" argument specifies the PDF version number for the file or
// `NULL` for the default ("2.0").
// `NULL` for the default ("2.0"). Unlike @link pdfioFileCreate@ and
// @link pdfioFileCreateTemporary@, it is generally not safe to pass the
// "PCLm-1.0" version string.
//
// The "media_box" and "crop_box" arguments specify the default MediaBox and
// CropBox for pages in the PDF file - if `NULL` then a default "Universal" size
@ -531,8 +535,19 @@ pdfioFileCreateStringObj(
//
// This function creates a PDF file with a unique filename in the current
// temporary directory. The temporary file is stored in the string "buffer" and
// will have a ".pdf" extension. Otherwise, this function works the same as
// the @link pdfioFileCreate@ function.
// will have a ".pdf" extension.
//
// The "version" argument specifies the PDF version number for the file or
// `NULL` for the default ("2.0"). The value "PCLm-1.0" can be specified to
// produce the PCLm subset of PDF.
//
// The "media_box" and "crop_box" arguments specify the default MediaBox and
// CropBox for pages in the PDF file - if `NULL` then a default "Universal" size
// of 8.27x11in (the intersection of US Letter and ISO A4) is used.
//
// The "error_cb" and "error_cbdata" arguments specify an error handler callback
// and its data pointer - if `NULL` the default error handler is used that
// writes error messages to `stderr`.
//
// @since PDFio v1.1@
//
@ -801,6 +816,18 @@ pdfioFileGetKeywords(pdfio_file_t *pdf) // I - PDF file
}
//
// 'pdfioFileGetModificationDate()' - Get the most recent modification date for a PDF file.
//
time_t // O - Modification date or `0` for none
pdfioFileGetModificationDate(
pdfio_file_t *pdf) // I - PDF file
{
return (pdf && pdf->info_obj ? pdfioDictGetDate(pdfioObjGetDict(pdf->info_obj), "ModDate") : 0);
}
//
// 'pdfioFileGetName()' - Get a PDF's filename.
//
@ -1058,7 +1085,10 @@ pdfioFileOpen(
xref_offset = (off_t)strtol(ptr + 9, NULL, 10);
if (!load_xref(pdf, xref_offset, password_cb, password_cbdata))
goto error;
{
if (!repair_xref(pdf, password_cb, password_cbdata))
goto error;
}
return (pdf);
@ -1126,6 +1156,20 @@ pdfioFileSetKeywords(
}
//
// 'pdfioFileSetModificationDate()' - Set the modification date for a PDF file.
//
void
pdfioFileSetModificationDate(
pdfio_file_t *pdf, // I - PDF file
time_t value) // I - Value
{
if (pdf && pdf->info_obj)
pdfioDictSetDate(pdf->info_obj->value.value.dict, "ModDate", value);
}
//
// 'pdfioFileSetPermissions()' - Set the PDF permissions, encryption mode, and passwords.
//
@ -1366,7 +1410,7 @@ create_common(
pdf->output_cb = output_cb;
pdf->output_ctx = output_cbdata;
pdf->filename = strdup(filename);
pdf->version = strdup(version);
pdf->version = strdup(!strncmp(version, "PCLm-", 5) ? "1.4" : version);
pdf->mode = _PDFIO_MODE_WRITE;
pdf->error_cb = error_cb;
pdf->error_data = error_cbdata;
@ -1397,8 +1441,15 @@ create_common(
}
// Write a standard PDF header...
if (!_pdfioFilePrintf(pdf, "%%PDF-%s\n%%\342\343\317\323\n", version))
if (!strncmp(version, "PCLm-", 5))
{
if (!_pdfioFilePrintf(pdf, "%%PDF-1.4\n%%%s\n", version))
goto error;
}
else if (!_pdfioFilePrintf(pdf, "%%PDF-%s\n%%\342\343\317\323\n", version))
{
goto error;
}
// Create the pages object...
if ((dict = pdfioDictCreate(pdf)) == NULL)
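
The header change in this hunk can be read in isolation: a "PCLm-" version string yields a PDF 1.4 header followed by a comment line carrying the PCLm version, while any other version gets the usual header plus the four-byte binary marker comment. Below is a minimal sketch of just that selection. `format_header` is a hypothetical helper that writes into a caller-supplied buffer instead of calling `_pdfioFilePrintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Format the two header lines for "version" into "buffer" and return the
// snprintf() result (hypothetical helper, illustration only)
static int
format_header(const char *version, char *buffer, size_t bufsize)
{
  assert(version != NULL && buffer != NULL);

  if (!strncmp(version, "PCLm-", 5))
  {
    // PCLm subset: PDF 1.4 header, then the PCLm version as a comment
    return (snprintf(buffer, bufsize, "%%PDF-1.4\n%%%s\n", version));
  }
  else
  {
    // Regular PDF: version header, then a comment with four high-bit bytes
    return (snprintf(buffer, bufsize, "%%PDF-%s\n%%\342\343\317\323\n", version));
  }
}
```

With this sketch, `format_header("PCLm-1.0", ...)` produces `%PDF-1.4` on the first line and `%PCLm-1.0` on the second. That matches the companion change in `create_common()` that stores "1.4" as the file's version for PCLm output.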
@ -1748,7 +1799,7 @@ load_xref(
return (false);
}
if (_pdfioFileSeek(pdf, line_offset + ptr + 3 - line, SEEK_SET) < 0)
if (_pdfioFileSeek(pdf, line_offset + (off_t)(ptr + 3 - line), SEEK_SET) < 0)
{
_pdfioFileError(pdf, "Unable to seek to xref object %lu %u.", (unsigned long)number, (unsigned)generation);
return (false);
@ -1890,7 +1941,7 @@ load_xref(
if (w[0] == 0 || buffer[0] == 1)
{
// Location of object...
current->offset = offset;
current->offset = (off_t)offset;
}
else if (number != offset)
{
@ -1927,7 +1978,7 @@ load_xref(
else if (!current)
{
// Add this object...
if (!add_obj(pdf, (size_t)number, (unsigned short)generation, offset))
if (!add_obj(pdf, (size_t)number, (unsigned short)generation, (off_t)offset))
return (false);
}
@ -2054,7 +2105,7 @@ load_xref(
if (pdfioFileFindObj(pdf, (size_t)number))
continue; // Don't replace newer object...
if (!add_obj(pdf, (size_t)number, (unsigned short)generation, offset))
if (!add_obj(pdf, (size_t)number, (unsigned short)generation, (off_t)offset))
return (false);
}
@ -2139,6 +2190,159 @@ load_xref(
}
//
// 'repair_xref()' - Try to "repair" a PDF file and its cross-references...
//
static bool // O - `true` on success, `false` on failure
repair_xref(
pdfio_file_t *pdf, // I - PDF file
pdfio_password_cb_t password_cb, // I - Password callback or `NULL` for none
void *password_data) // I - Password callback data, if any
{
char line[16384], // Line from file
*ptr; // Pointer into line
off_t line_offset; // Offset in file
intmax_t number; // Object number
int generation; // Generation number
size_t i; // Looping var
size_t num_sobjs = 0; // Number of object streams
pdfio_obj_t *sobjs[16384]; // Object streams to load
// Read from the beginning of the file, looking for objects and trailers...
if ((line_offset = _pdfioFileSeek(pdf, 0, SEEK_SET)) < 0)
return (false);
while (_pdfioFileGets(pdf, line, sizeof(line)))
{
// See if this is the start of an object...
if (line[0] >= '1' && line[0] <= '9')
{
// Maybe, look some more...
if ((number = strtoimax(line, &ptr, 10)) >= 1 && (generation = (int)strtol(ptr, &ptr, 10)) >= 0 && generation < 65536)
{
while (isspace(*ptr & 255))
ptr ++;
if (!strncmp(ptr, "obj", 3))
{
// Yes, start of an object...
pdfio_obj_t *obj; // Object
_pdfio_token_t tb; // Token buffer/stack
PDFIO_DEBUG("OBJECT %ld %d at offset %ld\n", (long)number, generation, (long)line_offset);
if ((obj = add_obj(pdf, (size_t)number, (unsigned short)generation, line_offset)) == NULL)
{
_pdfioFileError(pdf, "Unable to allocate memory for object.");
return (false);
}
_pdfioTokenInit(&tb, pdf, (_pdfio_tconsume_cb_t)_pdfioFileConsume, (_pdfio_tpeek_cb_t)_pdfioFilePeek, pdf);
if (!_pdfioValueRead(pdf, obj, &tb, &obj->value, 0))
{
_pdfioFileError(pdf, "Unable to read cross-reference stream dictionary.");
return (false);
}
if (_pdfioTokenGet(&tb, line, sizeof(line)) && !strcmp(line, "stream"))
{
const char *type = pdfioObjGetType(obj);
// Object type
_pdfioTokenFlush(&tb);
obj->stream_offset = _pdfioFileTell(pdf);
if (type && !strcmp(type, "ObjStm") && num_sobjs < (sizeof(sobjs) / sizeof(sobjs[0])))
{
sobjs[num_sobjs] = obj;
num_sobjs ++;
}
if (type && !strcmp(type, "XRef") && !pdf->trailer_dict)
{
// Save the trailer dictionary...
pdf->trailer_dict = pdfioObjGetDict(obj);
pdf->encrypt_obj = pdfioDictGetObj(pdf->trailer_dict, "Encrypt");
pdf->id_array = pdfioDictGetArray(pdf->trailer_dict, "ID");
}
}
}
}
}
else if (!strncmp(line, "trailer", 7) && (!line[7] || isspace(line[7] & 255) || line[7] == '<'))
{
// Trailer dictionary
_pdfio_token_t tb; // Token buffer/stack
_pdfio_value_t trailer; // Trailer
if (line[7])
{
// Probably the start of the trailer dictionary, rewind the file so
// we can read it...
_pdfioFileSeek(pdf, line_offset + 7, SEEK_SET);
}
PDFIO_DEBUG("TRAILER at offset %ld\n", (long)line_offset);
_pdfioTokenInit(&tb, pdf, (_pdfio_tconsume_cb_t)_pdfioFileConsume, (_pdfio_tpeek_cb_t)_pdfioFilePeek, pdf);
if (!_pdfioValueRead(pdf, NULL, &tb, &trailer, 0))
{
_pdfioFileError(pdf, "Unable to read trailer dictionary.");
return (false);
}
else if (trailer.type != PDFIO_VALTYPE_DICT)
{
_pdfioFileError(pdf, "Trailer is not a dictionary.");
return (false);
}
_pdfioTokenFlush(&tb);
if (!pdf->trailer_dict)
{
// Save the trailer dictionary and grab the root (catalog) and info
// objects...
pdf->trailer_dict = trailer.value.dict;
pdf->encrypt_obj = pdfioDictGetObj(pdf->trailer_dict, "Encrypt");
pdf->id_array = pdfioDictGetArray(pdf->trailer_dict, "ID");
}
}
// Get the offset for the next line...
line_offset = _pdfioFileTell(pdf);
}
// If the trailer contains an Encrypt key, try unlocking the file...
if (pdf->encrypt_obj && !_pdfioCryptoUnlock(pdf, password_cb, password_data))
return (false);
// Load any stream objects...
for (i = 0; i < num_sobjs; i ++)
{
if (!load_obj_stream(sobjs[i]))
return (false);
}
// Once we have all of the xref tables loaded, get the important objects and
// build the pages array...
pdf->info_obj = pdfioDictGetObj(pdf->trailer_dict, "Info");
if ((pdf->root_obj = pdfioDictGetObj(pdf->trailer_dict, "Root")) == NULL)
{
_pdfioFileError(pdf, "Missing Root object.");
return (false);
}
PDFIO_DEBUG("repair_xref: Root=%p(%lu)\n", pdf->root_obj, (unsigned long)pdf->root_obj->number);
// Load pages...
return (load_pages(pdf, pdfioDictGetObj(pdfioObjGetDict(pdf->root_obj), "Pages"), 0));
}
//
// 'write_pages()' - Write the PDF pages objects.
//
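
The repair pass above recognizes the start of an object by checking each line for a leading "N G obj" pattern: a nonzero first digit, an object number of at least 1, a generation in the range 0 to 65535, and then the keyword "obj". Here is a minimal standalone sketch of that check, assuming the line is already in memory. `parse_obj_header` is a hypothetical name and is not a PDFio function.

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Return true and fill in "number" and "generation" when "line" starts with
// an object header such as "12 0 obj" (hypothetical helper)
static bool
parse_obj_header(const char *line, intmax_t *number, int *generation)
{
  char *ptr;                             // Pointer into line
  long gen;                              // Generation number

  assert(line != NULL && number != NULL && generation != NULL);

  // Object numbers start at 1, so the first character must be 1-9
  if (line[0] < '1' || line[0] > '9')
    return (false);

  if ((*number = strtoimax(line, &ptr, 10)) < 1)
    return (false);

  gen = strtol(ptr, &ptr, 10);
  if (gen < 0 || gen >= 65536)
    return (false);

  // Skip whitespace before the keyword
  while (isspace(*ptr & 255))
    ptr ++;

  if (strncmp(ptr, "obj", 3))
    return (false);

  *generation = (int)gen;

  return (true);
}
```

A line such as `12 0 obj` is accepted with number 12 and generation 0, while `12 0 R` (an indirect reference) and `xref` are rejected. A scan like this only locates candidate objects; the repair code still has to parse each object's value and find a trailer or XRef dictionary before the file is usable.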
@ -2157,7 +2361,7 @@ write_pages(pdfio_file_t *pdf) // I - PDF file
for (i = 0; i < pdf->num_pages; i ++)
pdfioArrayAppendObj(kids, pdf->pages[i]);
pdfioDictSetNumber(pdf->pages_obj->value.value.dict, "Count", pdf->num_pages);
pdfioDictSetNumber(pdf->pages_obj->value.value.dict, "Count", (double)pdf->num_pages);
pdfioDictSetArray(pdf->pages_obj->value.value.dict, "Kids", kids);
// Write the Pages object...
@ -2175,59 +2379,221 @@ write_trailer(pdfio_file_t *pdf) // I - PDF file
bool ret = true; // Return value
off_t xref_offset; // Offset to xref table
size_t i; // Looping var
pdfio_obj_t *obj; // Current object
// Write the xref table...
// TODO: Look at adding support for xref streams...
xref_offset = _pdfioFileTell(pdf);
if (!_pdfioFilePrintf(pdf, "xref\n0 %lu \n0000000000 65535 f \n", (unsigned long)pdf->num_objs + 1))
if (strcmp(pdf->version, "1.5") >= 0 && !pdf->output_cb)
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
// Write a cross-reference stream...
pdfio_dict_t *xref_dict; // Object dictionary
pdfio_array_t *w_array; // W array
pdfio_obj_t *xref_obj; // Object
pdfio_stream_t *xref_st; // Stream
int offsize; // Size of object offsets
unsigned char buffer[10]; // Buffer entry
pdfio_encryption_t encryption; // PDF encryption mode
for (i = 0; i < pdf->num_objs; i ++)
{
pdfio_obj_t *obj = pdf->objs[i]; // Current object
// Disable encryption while we write the xref stream...
encryption = pdf->encryption;
pdf->encryption = PDFIO_ENCRYPTION_NONE;
if (!_pdfioFilePrintf(pdf, "%010lu %05u n \n", (unsigned long)obj->offset, obj->generation))
// Figure out how many bytes are needed for the object offsets
if (xref_offset < 0xff)
offsize = 1;
else if (xref_offset < 0xffff)
offsize = 2;
else if (xref_offset < 0xffffff)
offsize = 3;
else if (xref_offset < 0xffffffff)
offsize = 4;
else if (xref_offset < 0xffffffffff)
offsize = 5;
else if (xref_offset < 0xffffffffffff)
offsize = 6;
else if (xref_offset < 0xffffffffffffff)
offsize = 7;
else
offsize = 8;
// Create the object...
if ((w_array = pdfioArrayCreate(pdf)) == NULL)
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
pdfioArrayAppendNumber(w_array, 1);
pdfioArrayAppendNumber(w_array, offsize);
pdfioArrayAppendNumber(w_array, 1);
if ((xref_dict = pdfioDictCreate(pdf)) == NULL)
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
pdfioDictSetName(xref_dict, "Type", "XRef");
pdfioDictSetNumber(xref_dict, "Size", (double)(pdf->num_objs + 2));
pdfioDictSetArray(xref_dict, "W", w_array);
pdfioDictSetName(xref_dict, "Filter", "FlateDecode");
pdfioDictSetObj(xref_dict, "Info", pdf->info_obj);
pdfioDictSetObj(xref_dict, "Root", pdf->root_obj);
if (pdf->encrypt_obj)
pdfioDictSetObj(xref_dict, "Encrypt", pdf->encrypt_obj);
if (pdf->id_array)
pdfioDictSetArray(xref_dict, "ID", pdf->id_array);
if ((xref_obj = pdfioFileCreateObj(pdf, xref_dict)) == NULL)
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
if ((xref_st = pdfioObjCreateStream(xref_obj, PDFIO_FILTER_FLATE)) == NULL)
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
// Write the "free" 0 object...
memset(buffer, 0, sizeof(buffer));
pdfioStreamWrite(xref_st, buffer, offsize + 2);
// Then write the "allocated" objects...
buffer[0] = 1;
for (i = 0; i < pdf->num_objs; i ++)
{
obj = pdf->objs[i]; // Current object
switch (offsize)
{
case 1 :
buffer[1] = obj->offset & 255;
break;
case 2 :
buffer[1] = (obj->offset >> 8) & 255;
buffer[2] = obj->offset & 255;
break;
case 3 :
buffer[1] = (obj->offset >> 16) & 255;
buffer[2] = (obj->offset >> 8) & 255;
buffer[3] = obj->offset & 255;
break;
case 4 :
buffer[1] = (obj->offset >> 24) & 255;
buffer[2] = (obj->offset >> 16) & 255;
buffer[3] = (obj->offset >> 8) & 255;
buffer[4] = obj->offset & 255;
break;
case 5 :
buffer[1] = (obj->offset >> 32) & 255;
buffer[2] = (obj->offset >> 24) & 255;
buffer[3] = (obj->offset >> 16) & 255;
buffer[4] = (obj->offset >> 8) & 255;
buffer[5] = obj->offset & 255;
break;
case 6 :
buffer[1] = (obj->offset >> 40) & 255;
buffer[2] = (obj->offset >> 32) & 255;
buffer[3] = (obj->offset >> 24) & 255;
buffer[4] = (obj->offset >> 16) & 255;
buffer[5] = (obj->offset >> 8) & 255;
buffer[6] = obj->offset & 255;
break;
case 7 :
buffer[1] = (obj->offset >> 48) & 255;
buffer[2] = (obj->offset >> 40) & 255;
buffer[3] = (obj->offset >> 32) & 255;
buffer[4] = (obj->offset >> 24) & 255;
buffer[5] = (obj->offset >> 16) & 255;
buffer[6] = (obj->offset >> 8) & 255;
buffer[7] = obj->offset & 255;
break;
default :
buffer[1] = (obj->offset >> 56) & 255;
buffer[2] = (obj->offset >> 48) & 255;
buffer[3] = (obj->offset >> 40) & 255;
buffer[4] = (obj->offset >> 32) & 255;
buffer[5] = (obj->offset >> 24) & 255;
buffer[6] = (obj->offset >> 16) & 255;
buffer[7] = (obj->offset >> 8) & 255;
buffer[8] = obj->offset & 255;
break;
}
if (!pdfioStreamWrite(xref_st, buffer, offsize + 2))
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
}
pdfioStreamClose(xref_st);
pdf->encryption = encryption;
}
// Write the trailer...
if (!_pdfioFilePuts(pdf, "trailer\n"))
else
{
_pdfioFileError(pdf, "Unable to write trailer.");
ret = false;
goto done;
}
// Write a cross-reference table...
if (!_pdfioFilePrintf(pdf, "xref\n0 %lu \n0000000000 65535 f \n", (unsigned long)pdf->num_objs + 1))
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
if ((pdf->trailer_dict = pdfioDictCreate(pdf)) == NULL)
{
_pdfioFileError(pdf, "Unable to create trailer.");
ret = false;
goto done;
}
for (i = 0; i < pdf->num_objs; i ++)
{
obj = pdf->objs[i]; // Current object
if (pdf->encrypt_obj)
pdfioDictSetObj(pdf->trailer_dict, "Encrypt", pdf->encrypt_obj);
if (pdf->id_array)
pdfioDictSetArray(pdf->trailer_dict, "ID", pdf->id_array);
pdfioDictSetObj(pdf->trailer_dict, "Info", pdf->info_obj);
pdfioDictSetObj(pdf->trailer_dict, "Root", pdf->root_obj);
pdfioDictSetNumber(pdf->trailer_dict, "Size", pdf->num_objs + 1);
if (!_pdfioFilePrintf(pdf, "%010lu %05u n \n", (unsigned long)obj->offset, obj->generation))
{
_pdfioFileError(pdf, "Unable to write cross-reference table.");
ret = false;
goto done;
}
}
if (!_pdfioDictWrite(pdf->trailer_dict, NULL, NULL))
{
_pdfioFileError(pdf, "Unable to write trailer.");
ret = false;
goto done;
// Write the trailer...
if (!_pdfioFilePuts(pdf, "trailer\n"))
{
_pdfioFileError(pdf, "Unable to write trailer.");
ret = false;
goto done;
}
if ((pdf->trailer_dict = pdfioDictCreate(pdf)) == NULL)
{
_pdfioFileError(pdf, "Unable to create trailer.");
ret = false;
goto done;
}
if (pdf->encrypt_obj)
pdfioDictSetObj(pdf->trailer_dict, "Encrypt", pdf->encrypt_obj);
if (pdf->id_array)
pdfioDictSetArray(pdf->trailer_dict, "ID", pdf->id_array);
pdfioDictSetObj(pdf->trailer_dict, "Info", pdf->info_obj);
pdfioDictSetObj(pdf->trailer_dict, "Root", pdf->root_obj);
pdfioDictSetNumber(pdf->trailer_dict, "Size", (double)(pdf->num_objs + 1));
if (!_pdfioDictWrite(pdf->trailer_dict, NULL, NULL))
{
_pdfioFileError(pdf, "Unable to write trailer.");
ret = false;
goto done;
}
}
if (!_pdfioFilePrintf(pdf, "\nstartxref\n%lu\n%%EOF\n", (unsigned long)xref_offset))
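Reviewer note: the cross-reference stream code above picks a byte width for object offsets and then writes each offset most-significant byte first through an unrolled `switch`. The sketch below restates that idea with loops. It is a minimal illustration, not PDFio code: `offset_width`, `put_be`, and `make_xref_entry` are hypothetical names, and `offset_width` treats 0xff as fitting in one byte, whereas the comparisons in the diff (`xref_offset < 0xff`) move that boundary value to the next width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: smallest number of bytes (1 to 8) that can hold 'value'.
static size_t
offset_width(uint64_t value)
{
  size_t width = 1;                     // Candidate width in bytes

  while (width < 8 && (value >> (8 * width)) != 0)
    width ++;

  return (width);
}

// Hypothetical helper: store 'value' in 'width' bytes, most-significant first.
static void
put_be(unsigned char *dst, size_t width, uint64_t value)
{
  size_t i;                             // Looping var

  assert(width >= 1 && width <= 8);

  for (i = 0; i < width; i ++)
    dst[i] = (unsigned char)(value >> (8 * (width - 1 - i)));
}

// Hypothetical helper: build one "in use" entry for a W array of
// [1 width 1]; returns the number of bytes stored in 'entry'.
static size_t
make_xref_entry(unsigned char *entry, size_t width, uint64_t offset,
                unsigned generation)
{
  entry[0] = 1;                         // Field 1: type 1 = in-use object
  put_be(entry + 1, width, offset);     // Field 2: byte offset
  entry[1 + width] = (unsigned char)generation;
                                        // Field 3: generation number
  return (width + 2);
}
```

With a 10-byte entry buffer, as in the diff, the widest case (1 + 8 + 1 bytes) still fits. The loop form trades the explicit per-width cases for a single shift expression, which is easier to audit but otherwise writes the same kind of big-endian bytes.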


@ -1,7 +1,7 @@
//
// MD5 functions for PDFio.
//
// Copyright © 2021 by Michael R Sweet.
// Copyright © 2021-2025 by Michael R Sweet.
// Copyright © 1999 Aladdin Enterprises. All rights reserved.
//
// This software is provided 'as-is', without any express or implied
@ -108,231 +108,285 @@
#define T63 0x2ad7d2bb
#define T64 0xeb86d391
//
// Use the unoptimized (big-endian) implementation if we don't know the
// endian-ness of the platform.
//
#ifdef __BYTE_ORDER__
# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define ARCH_IS_BIG_ENDIAN 0 // Use little endian optimized version
# else
# define ARCH_IS_BIG_ENDIAN 1 // Use generic version
# endif // __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#elif !defined(ARCH_IS_BIG_ENDIAN)
# define ARCH_IS_BIG_ENDIAN 1 // Use generic version
#endif // !ARCH_IS_BIG_ENDIAN
//
// 'md5_process()' - Hash a block of data.
//
static void
md5_process(_pdfio_md5_t *pms, const uint8_t *data /*[64]*/)
md5_process(_pdfio_md5_t *pms, // I - MD5 state
const uint8_t *data/*[64]*/)// I - Data
{
uint32_t
a = pms->abcd[0], b = pms->abcd[1],
c = pms->abcd[2], d = pms->abcd[3];
uint32_t t;
uint32_t a = pms->abcd[0], // First word of state
b = pms->abcd[1], // Second word of state
c = pms->abcd[2], // Third word of state
d = pms->abcd[3]; // Fourth word of state
uint32_t t; // Temporary state
#ifndef ARCH_IS_BIG_ENDIAN
# define ARCH_IS_BIG_ENDIAN 1 /* slower, default implementation */
#endif
#if ARCH_IS_BIG_ENDIAN
// On big-endian machines, we must arrange the bytes in the right
// order. (This also works on machines of unknown byte order.)
uint32_t X[16]; // Little-endian representation
const uint8_t *xp; // Pointer into data
int i; // Looping var
/*
* On big-endian machines, we must arrange the bytes in the right
* order. (This also works on machines of unknown byte order.)
*/
uint32_t X[16];
const uint8_t *xp = data;
int i;
for (i = 0; i < 16; ++i, xp += 4)
X[i] = xp[0] + (unsigned)(xp[1] << 8) + (unsigned)(xp[2] << 16) + (unsigned)(xp[3] << 24);
for (i = 0, xp = data; i < 16; i ++, xp += 4)
X[i] = xp[0] + (unsigned)(xp[1] << 8) + (unsigned)(xp[2] << 16) + (unsigned)(xp[3] << 24);
#else /* !ARCH_IS_BIG_ENDIAN */
// On little-endian machines, we can process properly aligned data without copying it.
uint32_t xbuf[16]; // Aligned buffer
const uint32_t *X; // Pointer to little-endian representation
/*
* On little-endian machines, we can process properly aligned data
* without copying it.
*/
uint32_t xbuf[16];
const uint32_t *X;
if (!((data - (const uint8_t *)0) & 3)) {
/* data are properly aligned */
X = (const uint32_t *)data;
} else {
/* not aligned */
memcpy(xbuf, data, 64);
X = xbuf;
}
#endif
if (!((data - (const uint8_t *)0) & 3))
{
// data is properly aligned, use it directly...
X = (const uint32_t *)data;
}
else
{
// data is not aligned, copy to the aligned buffer...
memcpy(xbuf, data, 64);
X = xbuf;
}
#endif // ARCH_IS_BIG_ENDIAN
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
/* Round 1. */
/* Let [abcd k s i] denote the operation
a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define SET(a, b, c, d, k, s, Ti)\
t = a + F(b,c,d) + X[k] + Ti;\
a = ROTATE_LEFT(t, s) + b
/* Do the following 16 operations. */
SET(a, b, c, d, 0, 7, T1);
SET(d, a, b, c, 1, 12, T2);
SET(c, d, a, b, 2, 17, T3);
SET(b, c, d, a, 3, 22, T4);
SET(a, b, c, d, 4, 7, T5);
SET(d, a, b, c, 5, 12, T6);
SET(c, d, a, b, 6, 17, T7);
SET(b, c, d, a, 7, 22, T8);
SET(a, b, c, d, 8, 7, T9);
SET(d, a, b, c, 9, 12, T10);
SET(c, d, a, b, 10, 17, T11);
SET(b, c, d, a, 11, 22, T12);
SET(a, b, c, d, 12, 7, T13);
SET(d, a, b, c, 13, 12, T14);
SET(c, d, a, b, 14, 17, T15);
SET(b, c, d, a, 15, 22, T16);
// Round 1.
// Let [abcd k s i] denote the operation
// a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s).
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define SET(a, b, c, d, k, s, Ti) t = a + F(b,c,d) + X[k] + Ti; a = ROTATE_LEFT(t, s) + b
// Do the following 16 operations.
SET(a, b, c, d, 0, 7, T1);
SET(d, a, b, c, 1, 12, T2);
SET(c, d, a, b, 2, 17, T3);
SET(b, c, d, a, 3, 22, T4);
SET(a, b, c, d, 4, 7, T5);
SET(d, a, b, c, 5, 12, T6);
SET(c, d, a, b, 6, 17, T7);
SET(b, c, d, a, 7, 22, T8);
SET(a, b, c, d, 8, 7, T9);
SET(d, a, b, c, 9, 12, T10);
SET(c, d, a, b, 10, 17, T11);
SET(b, c, d, a, 11, 22, T12);
SET(a, b, c, d, 12, 7, T13);
SET(d, a, b, c, 13, 12, T14);
SET(c, d, a, b, 14, 17, T15);
SET(b, c, d, a, 15, 22, T16);
#undef SET
/* Round 2. */
/* Let [abcd k s i] denote the operation
a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
#define G(x, y, z) (((x) & (z)) | ((y) & ~(z)))
#define SET(a, b, c, d, k, s, Ti)\
t = a + G(b,c,d) + X[k] + Ti;\
a = ROTATE_LEFT(t, s) + b
/* Do the following 16 operations. */
SET(a, b, c, d, 1, 5, T17);
SET(d, a, b, c, 6, 9, T18);
SET(c, d, a, b, 11, 14, T19);
SET(b, c, d, a, 0, 20, T20);
SET(a, b, c, d, 5, 5, T21);
SET(d, a, b, c, 10, 9, T22);
SET(c, d, a, b, 15, 14, T23);
SET(b, c, d, a, 4, 20, T24);
SET(a, b, c, d, 9, 5, T25);
SET(d, a, b, c, 14, 9, T26);
SET(c, d, a, b, 3, 14, T27);
SET(b, c, d, a, 8, 20, T28);
SET(a, b, c, d, 13, 5, T29);
SET(d, a, b, c, 2, 9, T30);
SET(c, d, a, b, 7, 14, T31);
SET(b, c, d, a, 12, 20, T32);
// Round 2.
// Let [abcd k s i] denote the operation
// a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s).
#define G(x, y, z) (((x) & (z)) | ((y) & ~(z)))
#define SET(a, b, c, d, k, s, Ti) t = a + G(b,c,d) + X[k] + Ti; a = ROTATE_LEFT(t, s) + b
// Do the following 16 operations.
SET(a, b, c, d, 1, 5, T17);
SET(d, a, b, c, 6, 9, T18);
SET(c, d, a, b, 11, 14, T19);
SET(b, c, d, a, 0, 20, T20);
SET(a, b, c, d, 5, 5, T21);
SET(d, a, b, c, 10, 9, T22);
SET(c, d, a, b, 15, 14, T23);
SET(b, c, d, a, 4, 20, T24);
SET(a, b, c, d, 9, 5, T25);
SET(d, a, b, c, 14, 9, T26);
SET(c, d, a, b, 3, 14, T27);
SET(b, c, d, a, 8, 20, T28);
SET(a, b, c, d, 13, 5, T29);
SET(d, a, b, c, 2, 9, T30);
SET(c, d, a, b, 7, 14, T31);
SET(b, c, d, a, 12, 20, T32);
#undef SET
/* Round 3. */
/* Let [abcd k s t] denote the operation
a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define SET(a, b, c, d, k, s, Ti)\
t = a + H(b,c,d) + X[k] + Ti;\
a = ROTATE_LEFT(t, s) + b
/* Do the following 16 operations. */
SET(a, b, c, d, 5, 4, T33);
SET(d, a, b, c, 8, 11, T34);
SET(c, d, a, b, 11, 16, T35);
SET(b, c, d, a, 14, 23, T36);
SET(a, b, c, d, 1, 4, T37);
SET(d, a, b, c, 4, 11, T38);
SET(c, d, a, b, 7, 16, T39);
SET(b, c, d, a, 10, 23, T40);
SET(a, b, c, d, 13, 4, T41);
SET(d, a, b, c, 0, 11, T42);
SET(c, d, a, b, 3, 16, T43);
SET(b, c, d, a, 6, 23, T44);
SET(a, b, c, d, 9, 4, T45);
SET(d, a, b, c, 12, 11, T46);
SET(c, d, a, b, 15, 16, T47);
SET(b, c, d, a, 2, 23, T48);
// Round 3.
// Let [abcd k s t] denote the operation
// a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s).
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define SET(a, b, c, d, k, s, Ti) t = a + H(b,c,d) + X[k] + Ti; a = ROTATE_LEFT(t, s) + b
// Do the following 16 operations.
SET(a, b, c, d, 5, 4, T33);
SET(d, a, b, c, 8, 11, T34);
SET(c, d, a, b, 11, 16, T35);
SET(b, c, d, a, 14, 23, T36);
SET(a, b, c, d, 1, 4, T37);
SET(d, a, b, c, 4, 11, T38);
SET(c, d, a, b, 7, 16, T39);
SET(b, c, d, a, 10, 23, T40);
SET(a, b, c, d, 13, 4, T41);
SET(d, a, b, c, 0, 11, T42);
SET(c, d, a, b, 3, 16, T43);
SET(b, c, d, a, 6, 23, T44);
SET(a, b, c, d, 9, 4, T45);
SET(d, a, b, c, 12, 11, T46);
SET(c, d, a, b, 15, 16, T47);
SET(b, c, d, a, 2, 23, T48);
#undef SET
/* Round 4. */
/* Let [abcd k s t] denote the operation
a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
#define I(x, y, z) ((y) ^ ((x) | ~(z)))
#define SET(a, b, c, d, k, s, Ti)\
t = a + I(b,c,d) + X[k] + Ti;\
a = ROTATE_LEFT(t, s) + b
/* Do the following 16 operations. */
SET(a, b, c, d, 0, 6, T49);
SET(d, a, b, c, 7, 10, T50);
SET(c, d, a, b, 14, 15, T51);
SET(b, c, d, a, 5, 21, T52);
SET(a, b, c, d, 12, 6, T53);
SET(d, a, b, c, 3, 10, T54);
SET(c, d, a, b, 10, 15, T55);
SET(b, c, d, a, 1, 21, T56);
SET(a, b, c, d, 8, 6, T57);
SET(d, a, b, c, 15, 10, T58);
SET(c, d, a, b, 6, 15, T59);
SET(b, c, d, a, 13, 21, T60);
SET(a, b, c, d, 4, 6, T61);
SET(d, a, b, c, 11, 10, T62);
SET(c, d, a, b, 2, 15, T63);
SET(b, c, d, a, 9, 21, T64);
// Round 4.
// Let [abcd k s t] denote the operation
// a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s).
#define I(x, y, z) ((y) ^ ((x) | ~(z)))
#define SET(a, b, c, d, k, s, Ti) t = a + I(b,c,d) + X[k] + Ti; a = ROTATE_LEFT(t, s) + b
// Do the following 16 operations.
SET(a, b, c, d, 0, 6, T49);
SET(d, a, b, c, 7, 10, T50);
SET(c, d, a, b, 14, 15, T51);
SET(b, c, d, a, 5, 21, T52);
SET(a, b, c, d, 12, 6, T53);
SET(d, a, b, c, 3, 10, T54);
SET(c, d, a, b, 10, 15, T55);
SET(b, c, d, a, 1, 21, T56);
SET(a, b, c, d, 8, 6, T57);
SET(d, a, b, c, 15, 10, T58);
SET(c, d, a, b, 6, 15, T59);
SET(b, c, d, a, 13, 21, T60);
SET(a, b, c, d, 4, 6, T61);
SET(d, a, b, c, 11, 10, T62);
SET(c, d, a, b, 2, 15, T63);
SET(b, c, d, a, 9, 21, T64);
#undef SET
/* Then perform the following additions. (That is increment each
of the four registers by the value it had before this block
was started.) */
pms->abcd[0] += a;
pms->abcd[1] += b;
pms->abcd[2] += c;
pms->abcd[3] += d;
// Then perform the following additions. (That is increment each of the four
// registers by the value it had before this block was started.)
pms->abcd[0] += a;
pms->abcd[1] += b;
pms->abcd[2] += c;
pms->abcd[3] += d;
}
//
// '_pdfioCryptoMD5Init()' - Initialize an MD5 hash.
//
void
_pdfioCryptoMD5Init(_pdfio_md5_t *pms)
_pdfioCryptoMD5Init(_pdfio_md5_t *pms) // I - MD5 state
{
pms->count[0] = pms->count[1] = 0;
pms->abcd[0] = 0x67452301;
pms->abcd[1] = 0xefcdab89;
pms->abcd[2] = 0x98badcfe;
pms->abcd[3] = 0x10325476;
pms->abcd[0] = 0x67452301;
pms->abcd[1] = 0xefcdab89;
pms->abcd[2] = 0x98badcfe;
pms->abcd[3] = 0x10325476;
}
//
// '_pdfioCryptoMD5Append()' - Append bytes to the MD5 hash.
//
void
_pdfioCryptoMD5Append(_pdfio_md5_t *pms, const uint8_t *data, size_t nbytes)
_pdfioCryptoMD5Append(
_pdfio_md5_t *pms, // I - MD5 state
const uint8_t *data, // I - Data to add
size_t nbytes) // I - Number of bytes
{
const uint8_t *p = data;
size_t left = nbytes;
size_t offset = (pms->count[0] >> 3) & 63;
uint32_t nbits = (uint32_t)(nbytes << 3);
const uint8_t *p = data; // Pointer into data
size_t left = nbytes; // Remaining bytes
size_t offset = (pms->count[0] >> 3) & 63;
// Offset into state
uint32_t nbits = (uint32_t)(nbytes << 3);
// Number of bits to add
if (nbytes == 0)
return;
/* Update the message length. */
pms->count[1] += (unsigned)(nbytes >> 29);
pms->count[0] += nbits;
if (pms->count[0] < nbits)
pms->count[1]++;
if (nbytes == 0)
return;
/* Process an initial partial block. */
if (offset) {
size_t copy = (offset + nbytes > 64 ? 64 - offset : nbytes);
// Update the message length.
pms->count[1] += (unsigned)(nbytes >> 29);
pms->count[0] += nbits;
if (pms->count[0] < nbits)
pms->count[1] ++;
memcpy(pms->buf + offset, p, copy);
if (offset + copy < 64)
return;
p += copy;
left -= copy;
md5_process(pms, pms->buf);
}
// Process an initial partial block.
if (offset)
{
size_t copy = ((offset + nbytes) > 64 ? 64 - offset : nbytes);
// Number of bytes to copy
/* Process full blocks. */
for (; left >= 64; p += 64, left -= 64)
md5_process(pms, p);
memcpy(pms->buf + offset, p, copy);
/* Process a final partial block. */
if (left)
memcpy(pms->buf, p, left);
if ((offset + copy) < 64)
return;
p += copy;
left -= copy;
md5_process(pms, pms->buf);
}
// Process full blocks.
for (; left >= 64; p += 64, left -= 64)
md5_process(pms, p);
// Copy a final partial block.
if (left)
memcpy(pms->buf, p, left);
}
//
// '_pdfioCryptoMD5Finish()' - Finalize the MD5 hash.
//
void
_pdfioCryptoMD5Finish(_pdfio_md5_t *pms, uint8_t digest[16])
_pdfioCryptoMD5Finish(
_pdfio_md5_t *pms, // I - MD5 state
uint8_t digest[16]) // O - Digest value
{
static const uint8_t pad[64] = {
0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
uint8_t data[8];
int i;
int i; // Looping var
uint8_t data[8]; // Digest length data
static const uint8_t pad[64] = // Padding bytes
{
0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
/* Save the length before padding. */
for (i = 0; i < 8; ++i)
data[i] = (uint8_t)(pms->count[i >> 2] >> ((i & 3) << 3));
/* Pad to 56 bytes mod 64. */
_pdfioCryptoMD5Append(pms, pad, ((55 - (pms->count[0] >> 3)) & 63) + 1);
/* Append the length. */
_pdfioCryptoMD5Append(pms, data, 8);
for (i = 0; i < 16; ++i)
digest[i] = (uint8_t)(pms->abcd[i >> 2] >> ((i & 3) << 3));
// Save the length before padding.
for (i = 0; i < 8; ++i)
data[i] = (uint8_t)(pms->count[i >> 2] >> ((i & 3) << 3));
// Pad to 56 bytes mod 64.
_pdfioCryptoMD5Append(pms, pad, ((55 - (pms->count[0] >> 3)) & 63) + 1);
// Append the length.
_pdfioCryptoMD5Append(pms, data, 8);
// Copy the digest from the state...
for (i = 0; i < 16; ++i)
digest[i] = (uint8_t)(pms->abcd[i >> 2] >> ((i & 3) << 3));
}
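Reviewer note: two pieces of arithmetic in the MD5 code above are easy to misread, namely the padding length `((55 - (count >> 3)) & 63) + 1` and the byte-by-byte assembly of little-endian words. The sketch below isolates both so they can be checked by hand. It is an illustration under assumptions: `md5_pad_length` and `load_le32` are hypothetical names that do not exist in PDFio, and `load_le32` widens each byte to `uint32_t` before shifting, which is a choice made for this sketch rather than what the diff does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: number of pad bytes (1 to 64) needed so that the
// message length becomes 56 modulo 64, leaving room for the 8 length bytes.
static size_t
md5_pad_length(uint64_t nbytes)
{
  // Unsigned subtraction wraps, and the mask keeps only the low 6 bits.
  return ((size_t)((55 - nbytes) & 63) + 1);
}

// Hypothetical helper: read a 32-bit little-endian word from 4 bytes,
// independent of the host byte order and of pointer alignment.
static uint32_t
load_le32(const uint8_t *p)
{
  assert(p != NULL);

  return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
          ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}
```

A message that already ends at 56 bytes modulo 64 still receives a full 64 bytes of padding, because at least the 0x80 marker byte must always be appended.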


@ -1,7 +1,7 @@
//
// PDF object functions for PDFio.
//
// Copyright © 2021-2024 by Michael R Sweet.
// Copyright © 2021-2025 by Michael R Sweet.
//
// Licensed under Apache License v2.0. See the file "LICENSE" for more
// information.
@ -10,13 +10,6 @@
#include "pdfio-private.h"
//
// Local functions...
//
static bool write_obj_header(pdfio_obj_t *obj);
//
// 'pdfioObjClose()' - Close an object, writing any data as needed to the PDF
// file.
@ -42,7 +35,7 @@ pdfioObjClose(pdfio_obj_t *obj) // I - Object
if (!obj->offset)
{
// Write the object value
if (!write_obj_header(obj))
if (!_pdfioObjWriteHeader(obj))
return (false);
// Write the "endobj" line...
@ -195,7 +188,7 @@ pdfioObjCreateStream(
}
}
if (!write_obj_header(obj))
if (!_pdfioObjWriteHeader(obj))
return (NULL);
if (!_pdfioFilePuts(obj->pdf, "stream\n"))
@ -205,7 +198,7 @@ pdfioObjCreateStream(
obj->pdf->current_obj = obj;
// Return the new stream...
return (_pdfioStreamCreate(obj, length_obj, filter));
return (_pdfioStreamCreate(obj, length_obj, 0, filter));
}
@ -582,11 +575,11 @@ _pdfioObjSetExtension(
//
// 'write_obj_header()' - Write the object header...
// '_pdfioObjWriteHeader()' - Write the object header...
//
static bool // O - `true` on success, `false` on failure
write_obj_header(pdfio_obj_t *obj) // I - Object
bool // O - `true` on success, `false` on failure
_pdfioObjWriteHeader(pdfio_obj_t *obj) // I - Object
{
obj->offset = _pdfioFileTell(obj->pdf);


@ -1,7 +1,7 @@
//
// Private header file for PDFio.
//
// Copyright © 2021-2024 by Michael R Sweet.
// Copyright © 2021-2025 by Michael R Sweet.
//
// Licensed under Apache License v2.0. See the file "LICENSE" for more
// information.
@ -107,7 +107,7 @@ typedef enum _pdfio_mode_e // Read/write mode
typedef enum _pdfio_predictor_e // PNG predictor constants
{
_PDFIO_PREDICTOR_NONE = 1, // No predictor (default)
_PDFIO_PREDICTOR_TIFF2 = 2, // TIFF2 predictor (???)
_PDFIO_PREDICTOR_TIFF2 = 2, // TIFF predictor 2 (difference from left neighbor)
_PDFIO_PREDICTOR_PNG_NONE = 10, // PNG None predictor (same as `_PDFIO_PREDICTOR_NONE`)
_PDFIO_PREDICTOR_PNG_SUB = 11, // PNG Sub predictor
_PDFIO_PREDICTOR_PNG_UP = 12, // PNG Up predictor
@ -313,8 +313,9 @@ struct _pdfio_stream_s // Stream
z_stream flate; // Flate filter state
_pdfio_predictor_t predictor; // Predictor function, if any
size_t pbpixel, // Size of a pixel in bytes
pbsize; // Predictor buffer size, if any
unsigned char cbuffer[4096], // Compressed data buffer
pbsize, // Predictor buffer size, if any
cbsize; // Compressed data buffer size
unsigned char *cbuffer, // Compressed data buffer
*prbuffer, // Raw buffer (previous line), as needed
*psbuffer; // PNG filter buffer, as needed
_pdfio_crypto_cb_t crypto_cb; // Encryption/decryption callback, if any
@ -383,8 +384,9 @@ extern void _pdfioObjDelete(pdfio_obj_t *obj) _PDFIO_INTERNAL;
extern void *_pdfioObjGetExtension(pdfio_obj_t *obj) _PDFIO_INTERNAL;
extern bool _pdfioObjLoad(pdfio_obj_t *obj) _PDFIO_INTERNAL;
extern void _pdfioObjSetExtension(pdfio_obj_t *obj, void *data, _pdfio_extfree_t datafree) _PDFIO_INTERNAL;
extern bool _pdfioObjWriteHeader(pdfio_obj_t *obj) _PDFIO_INTERNAL;
extern pdfio_stream_t *_pdfioStreamCreate(pdfio_obj_t *obj, pdfio_obj_t *length_obj, pdfio_filter_t compression) _PDFIO_INTERNAL;
extern pdfio_stream_t *_pdfioStreamCreate(pdfio_obj_t *obj, pdfio_obj_t *length_obj, size_t cbsize, pdfio_filter_t compression) _PDFIO_INTERNAL;
extern pdfio_stream_t *_pdfioStreamOpen(pdfio_obj_t *obj, bool decode) _PDFIO_INTERNAL;
extern bool _pdfioStringIsAllocated(pdfio_file_t *pdf, const char *s) _PDFIO_INTERNAL;


@ -1,7 +1,7 @@
//
// PDF stream functions for PDFio.
//
// Copyright © 2021-2024 by Michael R Sweet.
// Copyright © 2021-2025 by Michael R Sweet.
//
// Licensed under Apache License v2.0. See the file "LICENSE" for more
// information.
@ -50,7 +50,7 @@ pdfioStreamClose(pdfio_stream_t *st) // I - Stream
while ((status = deflate(&st->flate, Z_FINISH)) != Z_STREAM_END)
{
size_t bytes = sizeof(st->cbuffer) - st->flate.avail_out,
size_t bytes = st->cbsize - st->flate.avail_out,
// Bytes to write
outbytes; // Actual bytes written
@ -89,13 +89,13 @@ pdfioStreamClose(pdfio_stream_t *st) // I - Stream
}
st->flate.next_out = (Bytef *)st->cbuffer + bytes;
st->flate.avail_out = (uInt)(sizeof(st->cbuffer) - bytes);
st->flate.avail_out = (uInt)(st->cbsize - bytes);
}
if (st->flate.avail_out < (uInt)sizeof(st->cbuffer))
if (st->flate.avail_out < (uInt)st->cbsize)
{
// Write any residuals...
size_t bytes = sizeof(st->cbuffer) - st->flate.avail_out;
size_t bytes = st->cbsize - st->flate.avail_out;
// Bytes to write
if (st->crypto_cb)
@ -140,7 +140,7 @@ pdfioStreamClose(pdfio_stream_t *st) // I - Stream
// Update the length as needed...
if (st->length_obj)
{
st->length_obj->value.value.number = st->obj->stream_length;
st->length_obj->value.value.number = (double)st->obj->stream_length;
pdfioObjClose(st->length_obj);
}
else if (st->obj->length_offset)
@ -172,6 +172,7 @@ pdfioStreamClose(pdfio_stream_t *st) // I - Stream
st->pdf->current_obj = NULL;
free(st->cbuffer);
free(st->prbuffer);
free(st->psbuffer);
free(st);
@ -190,6 +191,7 @@ pdfio_stream_t * // O - Stream or `NULL` on error
_pdfioStreamCreate(
pdfio_obj_t *obj, // I - Object
pdfio_obj_t *length_obj, // I - Length object, if any
size_t cbsize, // I - Size of compression buffer
pdfio_filter_t compression) // I - Compression to apply
{
pdfio_stream_t *st; // Stream
@ -302,8 +304,21 @@ _pdfioStreamCreate(
else
st->predictor = _PDFIO_PREDICTOR_NONE;
if (cbsize == 0)
cbsize = 4096;
st->cbsize = cbsize;
if ((st->cbuffer = malloc(cbsize)) == NULL)
{
_pdfioFileError(st->pdf, "Unable to allocate %lu bytes for Flate output buffer: %s", (unsigned long)cbsize, strerror(errno));
free(st->prbuffer);
free(st->psbuffer);
free(st);
return (NULL);
}
st->flate.next_out = (Bytef *)st->cbuffer;
st->flate.avail_out = (uInt)sizeof(st->cbuffer);
st->flate.avail_out = (uInt)cbsize;
if ((status = deflateInit(&(st->flate), 9)) != Z_OK)
{
@ -362,15 +377,16 @@ pdfioStreamConsume(pdfio_stream_t *st, // I - Stream
//
// 'pdfioStreamGetToken()' - Read a single PDF token from a stream.
//
// This function reads a single PDF token from a stream. Operator tokens,
// boolean values, and numbers are returned as-is in the provided string buffer.
// String values start with the opening parenthesis ('(') but have all escaping
// resolved and the terminating parenthesis removed. Hexadecimal string values
// start with the opening angle bracket ('<') and have all whitespace and the
// terminating angle bracket removed.
// This function reads a single PDF token from a stream, skipping all whitespace
// and comments. Operator tokens, boolean values, and numbers are returned
// as-is in the provided string buffer. String values start with the opening
// parenthesis ('(') but have all escaping resolved and the terminating
// parenthesis removed. Hexadecimal string values start with the opening angle
// bracket ('<') and have all whitespace and the terminating angle bracket
// removed.
//
bool // O - `true` on success, `false` on EOF
bool // O - `true` on success, `false` on end-of-stream or error
pdfioStreamGetToken(
pdfio_stream_t *st, // I - Stream
char *buffer, // I - String buffer
@ -425,14 +441,14 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
if ((st->remaining = pdfioObjGetLength(obj)) == 0)
{
free(st);
return (NULL);
_pdfioFileError(obj->pdf, "No stream data.");
goto error;
}
if (_pdfioFileSeek(st->pdf, obj->stream_offset, SEEK_SET) != obj->stream_offset)
{
free(st);
return (NULL);
_pdfioFileError(obj->pdf, "Unable to seek to stream data.");
goto error;
}
type = pdfioObjGetType(obj);
@ -445,11 +461,7 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
ivlen = (size_t)_pdfioFilePeek(st->pdf, iv, sizeof(iv));
if ((st->crypto_cb = _pdfioCryptoMakeReader(st->pdf, obj, &st->crypto_ctx, iv, &ivlen)) == NULL)
{
// TODO: Add error message?
free(st);
return (NULL);
}
goto error;
PDFIO_DEBUG("_pdfioStreamOpen: ivlen=%d\n", (int)ivlen);
if (ivlen > 0)
@ -480,8 +492,7 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
{
// TODO: Implement compound filters...
_pdfioFileError(st->pdf, "Unsupported compound stream filter.");
free(st);
return (NULL);
goto error;
}
// No filter, read as-is...
@ -514,8 +525,7 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
else if (bpc < 1 || bpc == 3 || (bpc > 4 && bpc < 8) || (bpc > 8 && bpc < 16) || bpc > 16)
{
_pdfioFileError(st->pdf, "Unsupported BitsPerColor value %d.", bpc);
free(st);
return (NULL);
goto error;
}
if (colors == 0)
@ -525,8 +535,7 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
else if (colors < 0 || colors > 4)
{
_pdfioFileError(st->pdf, "Unsupported Colors value %d.", colors);
free(st);
return (NULL);
goto error;
}
if (columns == 0)
@ -536,15 +545,13 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
else if (columns < 0)
{
_pdfioFileError(st->pdf, "Unsupported Columns value %d.", columns);
free(st);
return (NULL);
goto error;
}
if ((predictor > 2 && predictor < 10) || predictor > 15)
{
_pdfioFileError(st->pdf, "Unsupported Predictor function %d.", predictor);
free(st);
return (NULL);
goto error;
}
else if (predictor > 1)
{
@ -558,28 +565,31 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
if ((st->prbuffer = calloc(1, st->pbsize - 1)) == NULL || (st->psbuffer = calloc(1, st->pbsize)) == NULL)
{
_pdfioFileError(st->pdf, "Unable to allocate %lu bytes for Predictor buffers.", (unsigned long)st->pbsize);
free(st->prbuffer);
free(st->psbuffer);
free(st);
return (NULL);
goto error;
}
}
else
{
st->predictor = _PDFIO_PREDICTOR_NONE;
}
st->cbsize = 4096;
if ((st->cbuffer = malloc(st->cbsize)) == NULL)
{
_pdfioFileError(st->pdf, "Unable to allocate %lu bytes for Flate compression buffer.", (unsigned long)st->cbsize);
goto error;
}
PDFIO_DEBUG("_pdfioStreamOpen: pos=%ld\n", (long)_pdfioFileTell(st->pdf));
if (sizeof(st->cbuffer) > st->remaining)
if (st->cbsize > st->remaining)
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->remaining);
else
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, sizeof(st->cbuffer));
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->cbsize);
if (rbytes <= 0)
{
_pdfioFileError(st->pdf, "Unable to read bytes for stream.");
free(st->prbuffer);
free(st->psbuffer);
free(st);
return (NULL);
goto error;
}
if (st->crypto_cb)
@ -593,10 +603,7 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
if ((status = inflateInit(&(st->flate))) != Z_OK)
{
_pdfioFileError(st->pdf, "Unable to start Flate filter: %s", zstrerror(status));
free(st->prbuffer);
free(st->psbuffer);
free(st);
return (NULL);
goto error;
}
st->remaining -= st->flate.avail_in;
@ -610,8 +617,7 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
{
// Something else we don't support
_pdfioFileError(st->pdf, "Unsupported stream filter '/%s'.", filter);
free(st);
return (NULL);
goto error;
}
}
else
@ -621,6 +627,16 @@ _pdfioStreamOpen(pdfio_obj_t *obj, // I - Object
}
return (st);
// If we get here something went wrong...
error:
free(st->cbuffer);
free(st->prbuffer);
free(st->psbuffer);
free(st);
return (NULL);
}
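Reviewer note: the hunks above replace repeated `free(...); return (NULL);` sequences with a single `error:` label and move the compression buffer from a fixed array to a heap allocation whose size defaults to 4096 bytes. The sketch below shows that pattern in isolation. It is a minimal stand-in under assumptions: `demo_stream_t`, `demo_stream_new`, and `demo_stream_free` are hypothetical and much smaller than the real stream structure, and the sketch relies on `calloc` zero-filling the structure so that the cleanup path can call `free` on pointers that were never allocated.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_stream_s            // Hypothetical stand-in stream
{
  size_t        cbsize;                 // Compressed data buffer size
  unsigned char *cbuffer,               // Compressed data buffer
                *prbuffer,              // Previous-line buffer, if any
                *psbuffer;              // Filter buffer, if any
} demo_stream_t;

// Create a stream; a 'cbsize' of 0 selects the 4096-byte default.
static demo_stream_t *
demo_stream_new(size_t cbsize, size_t pbsize)
{
  demo_stream_t *st;                    // New stream

  // calloc leaves every pointer NULL, so the cleanup below is always safe
  if ((st = calloc(1, sizeof(demo_stream_t))) == NULL)
    return (NULL);

  if (cbsize == 0)
    cbsize = 4096;

  st->cbsize = cbsize;

  if ((st->cbuffer = malloc(cbsize)) == NULL)
    goto error;

  if (pbsize > 0)
  {
    if ((st->prbuffer = calloc(1, pbsize)) == NULL ||
        (st->psbuffer = calloc(1, pbsize)) == NULL)
      goto error;
  }

  return (st);

  // If we get here something went wrong...
  error:

  free(st->cbuffer);
  free(st->prbuffer);
  free(st->psbuffer);
  free(st);

  return (NULL);
}

// Free a stream and all of its buffers.
static void
demo_stream_free(demo_stream_t *st)
{
  if (!st)
    return;

  free(st->cbuffer);
  free(st->prbuffer);
  free(st->psbuffer);
  free(st);
}
```

Keeping one cleanup path means a newly added buffer only has to be released in two places (the error label and the normal free function) instead of at every early return.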
@ -1045,10 +1061,10 @@ stream_read(pdfio_stream_t *st, // I - Stream
if (st->flate.avail_in == 0)
{
// Read more from the file...
if (sizeof(st->cbuffer) > st->remaining)
if (st->cbsize > st->remaining)
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->remaining);
else
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, sizeof(st->cbuffer));
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->cbsize);
if (rbytes <= 0)
return (-1); // End of file...
@ -1101,10 +1117,10 @@ stream_read(pdfio_stream_t *st, // I - Stream
if (st->flate.avail_in == 0)
{
// Read more from the file...
if (sizeof(st->cbuffer) > st->remaining)
if (st->cbsize > st->remaining)
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->remaining);
else
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, sizeof(st->cbuffer));
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->cbsize);
if (rbytes <= 0)
return (-1); // End of file...
@ -1171,10 +1187,10 @@ stream_read(pdfio_stream_t *st, // I - Stream
if (st->flate.avail_in == 0)
{
// Read more from the file...
if (sizeof(st->cbuffer) > st->remaining)
if (st->cbsize > st->remaining)
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->remaining);
else
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, sizeof(st->cbuffer));
rbytes = _pdfioFileRead(st->pdf, st->cbuffer, st->cbsize);
if (rbytes <= 0)
return (-1); // End of file...
@ -1278,10 +1294,10 @@ stream_write(pdfio_stream_t *st, // I - Stream
while (st->flate.avail_in > 0)
{
if (st->flate.avail_out < (sizeof(st->cbuffer) / 8))
if (st->flate.avail_out < (st->cbsize / 8))
{
// Flush the compression buffer...
size_t cbytes = sizeof(st->cbuffer) - st->flate.avail_out,
size_t cbytes = st->cbsize - st->flate.avail_out,
outbytes;
if (st->crypto_cb)
@ -1310,7 +1326,7 @@ stream_write(pdfio_stream_t *st, // I - Stream
}
st->flate.next_out = (Bytef *)st->cbuffer + cbytes;
st->flate.avail_out = (uInt)(sizeof(st->cbuffer) - cbytes);
st->flate.avail_out = (uInt)(st->cbsize - cbytes);
}
// Deflate what we can this time...


@ -20,10 +20,12 @@ extern "C" {
//
// Version number...
// Version numbers...
//
# define PDFIO_VERSION "1.4.1"
# define PDFIO_VERSION "1.5.0"
# define PDFIO_VERSION_MAJOR 1
# define PDFIO_VERSION_MINOR 5
//
@ -201,6 +203,7 @@ extern time_t pdfioFileGetCreationDate(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern const char *pdfioFileGetCreator(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern pdfio_array_t *pdfioFileGetID(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern const char *pdfioFileGetKeywords(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern time_t pdfioFileGetModificationDate(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern const char *pdfioFileGetName(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern size_t pdfioFileGetNumObjs(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern size_t pdfioFileGetNumPages(pdfio_file_t *pdf) _PDFIO_PUBLIC;
@ -216,6 +219,7 @@ extern void pdfioFileSetAuthor(pdfio_file_t *pdf, const char *value) _PDFIO_PUB
extern void pdfioFileSetCreationDate(pdfio_file_t *pdf, time_t value) _PDFIO_PUBLIC;
extern void pdfioFileSetCreator(pdfio_file_t *pdf, const char *value) _PDFIO_PUBLIC;
extern void pdfioFileSetKeywords(pdfio_file_t *pdf, const char *value) _PDFIO_PUBLIC;
extern void pdfioFileSetModificationDate(pdfio_file_t *pdf, time_t value) _PDFIO_PUBLIC;
extern bool pdfioFileSetPermissions(pdfio_file_t *pdf, pdfio_permission_t permissions, pdfio_encryption_t encryption, const char *owner_password, const char *user_password) _PDFIO_PUBLIC;
extern void pdfioFileSetSubject(pdfio_file_t *pdf, const char *value) _PDFIO_PUBLIC;
extern void pdfioFileSetTitle(pdfio_file_t *pdf, const char *value) _PDFIO_PUBLIC;
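
The new `PDFIO_VERSION_MAJOR` and `PDFIO_VERSION_MINOR` macros make compile-time version checks possible. The sketch below shows one way calling code might gate use of `pdfioFileGetModificationDate()` and `pdfioFileSetModificationDate()`, assuming (as this diff suggests, but does not state) that they first appear in 1.5. `EXAMPLE_PDFIO_AT_LEAST` and `example_have_mod_date()` are hypothetical names, and the macro values are copied in as stand-ins so the sketch compiles without `<pdfio.h>`.

```c
#include <stdbool.h>

// Stand-ins copied from the hunk above so this compiles without <pdfio.h>.
// Real code would include the header instead of defining these.
#ifndef PDFIO_VERSION_MAJOR
#  define PDFIO_VERSION_MAJOR 1
#  define PDFIO_VERSION_MINOR 5
#endif // !PDFIO_VERSION_MAJOR

// Hypothetical helper macro (not part of PDFio): true when the header
// version is at least major.minor.
#define EXAMPLE_PDFIO_AT_LEAST(major, minor) \
  (PDFIO_VERSION_MAJOR > (major) || \
   (PDFIO_VERSION_MAJOR == (major) && PDFIO_VERSION_MINOR >= (minor)))

// Hypothetical helper: report whether the modification date accessors are
// expected to be declared (assumption: they arrive with version 1.5).
static bool
example_have_mod_date(void)
{
#if EXAMPLE_PDFIO_AT_LEAST(1, 5)
  return (true);
#else
  return (false);
#endif // EXAMPLE_PDFIO_AT_LEAST(1, 5)
}
```

Headers older than this change only define `PDFIO_VERSION`, so a real check would likely also test `defined(PDFIO_VERSION_MAJOR)` before comparing the numbers.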

@ -7,7 +7,7 @@ Name: pdfio
Description: PDF read/write library
Version: @PDFIO_VERSION@
URL: https://www.msweet.org/pdfio
Requires: @PKGCONFIG_REQUIRES@
Cflags: @PKGCONFIG_CFLAGS@
Libs: @PKGCONFIG_LIBS@
Libs.private: @PKGCONFIG_LIBS_PRIVATE@
Cflags: @PKGCONFIG_CFLAGS@
Requires: @PKGCONFIG_REQUIRES@

@ -115,7 +115,7 @@
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<SDLCheck>true</SDLCheck>
<PreprocessorDefinitions>_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PreprocessorDefinitions>HAVE_LIBPNG;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
</ClCompile>
<Link>
@ -130,7 +130,7 @@
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<SDLCheck>true</SDLCheck>
<PreprocessorDefinitions>NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PreprocessorDefinitions>HAVE_LIBPNG;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
</ClCompile>
<Link>
@ -172,6 +172,8 @@
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
<Import Project="packages\libpng_native.redist.1.6.30\build\native\libpng_native.redist.targets" Condition="Exists('packages\libpng_native.redist.1.6.30\build\native\libpng_native.redist.targets')" />
<Import Project="packages\libpng_native.1.6.30\build\native\libpng_native.targets" Condition="Exists('packages\libpng_native.1.6.30\build\native\libpng_native.targets')" />
<Import Project="packages\zlib_native.redist.1.2.11\build\native\zlib_native.redist.targets" Condition="Exists('packages\zlib_native.redist.1.2.11\build\native\zlib_native.redist.targets')" />
<Import Project="packages\zlib_native.1.2.11\build\native\zlib_native.targets" Condition="Exists('packages\zlib_native.1.2.11\build\native\zlib_native.targets')" />
</ImportGroup>

@ -1,73 +1,20 @@
LIBRARY pdfio1
VERSION 1.4
VERSION 1.5
EXPORTS
_pdfioArrayDebug
_pdfioArrayDecrypt
_pdfioArrayDelete
_pdfioArrayGetValue
_pdfioArrayRead
_pdfioArrayWrite
_pdfioCryptoAESDecrypt
_pdfioCryptoAESEncrypt
_pdfioCryptoAESInit
_pdfioCryptoLock
_pdfioCryptoMD5Append
_pdfioCryptoMD5Finish
_pdfioCryptoMD5Init
_pdfioCryptoMakeRandom
_pdfioCryptoMakeReader
_pdfioCryptoMakeWriter
_pdfioCryptoRC4Crypt
_pdfioCryptoRC4Init
_pdfioCryptoSHA256Append
_pdfioCryptoSHA256Finish
_pdfioCryptoSHA256Init
_pdfioCryptoUnlock
_pdfioDictDebug
_pdfioDictDecrypt
_pdfioDictDelete
_pdfioDictGetValue
_pdfioDictRead
_pdfioDictSetValue
_pdfioDictWrite
_pdfioFileAddMappedObj
_pdfioFileAddPage
_pdfioFileConsume
_pdfioFileCreateObj
_pdfioFileDefaultError
_pdfioFileError
_pdfioFileFindMappedObj
_pdfioFileFlush
_pdfioFileGetChar
_pdfioFileGets
_pdfioFilePeek
_pdfioFilePrintf
_pdfioFilePuts
_pdfioFileRead
_pdfioFileSeek
_pdfioFileTell
_pdfioFileWrite
_pdfioObjDelete
_pdfioObjGetExtension
_pdfioObjLoad
_pdfioObjSetExtension
_pdfioStreamCreate
_pdfioStreamOpen
_pdfioStringIsAllocated
_pdfioTokenClear
_pdfioTokenFlush
_pdfioTokenGet
_pdfioTokenInit
_pdfioTokenPush
_pdfioTokenRead
_pdfioValueCopy
_pdfioValueDebug
_pdfioValueDecrypt
_pdfioValueDelete
_pdfioValueRead
_pdfioValueWrite
_pdfio_strtod
_pdfio_vsnprintf
pdfioArrayAppendArray
pdfioArrayAppendBinary
pdfioArrayAppendBoolean
@ -187,6 +134,7 @@ pdfioFileCreate
pdfioFileCreateArrayObj
pdfioFileCreateFontObjFromBase
pdfioFileCreateFontObjFromFile
pdfioFileCreateICCObjFromData
pdfioFileCreateICCObjFromFile
pdfioFileCreateImageObjFromData
pdfioFileCreateImageObjFromFile
@ -204,6 +152,7 @@ pdfioFileGetCreationDate
pdfioFileGetCreator
pdfioFileGetID
pdfioFileGetKeywords
pdfioFileGetModificationDate
pdfioFileGetName
pdfioFileGetNumObjs
pdfioFileGetNumPages
@ -219,6 +168,7 @@ pdfioFileSetAuthor
pdfioFileSetCreationDate
pdfioFileSetCreator
pdfioFileSetKeywords
pdfioFileSetModificationDate
pdfioFileSetPermissions
pdfioFileSetSubject
pdfioFileSetTitle

@ -3,7 +3,7 @@
<metadata>
<id>pdfio_native</id>
<title>PDFio Library for VS2019+</title>
<version>1.4.1</version>
<version>1.5.0</version>
<authors>Michael R Sweet</authors>
<owners>michaelrsweet</owners>
<projectUrl>https://github.com/michaelrsweet/pappl</projectUrl>
@ -16,7 +16,8 @@
<copyright>Copyright © 2019-2025 by Michael R Sweet</copyright>
<tags>pdf file native</tags>
<dependencies>
<dependency id="pdfio_native.redist" version="1.4.1" />
<dependency id="pdfio_native.redist" version="1.5.0" />
<dependency id="libpng_native.redist" version="1.6.30" />
<dependency id="zlib_native.redist" version="1.2.11" />
</dependencies>
</metadata>

@ -3,7 +3,7 @@
<metadata>
<id>pdfio_native.redist</id>
<title>PDFio Library for VS2019+</title>
<version>1.4.1</version>
<version>1.5.0</version>
<authors>Michael R Sweet</authors>
<owners>michaelrsweet</owners>
<projectUrl>https://github.com/michaelrsweet/pappl</projectUrl>
@ -16,6 +16,7 @@
<copyright>Copyright © 2019-2025 by Michael R Sweet</copyright>
<tags>pdf file native</tags>
<dependencies>
<dependency id="libpng_native.redist" version="1.6.30" />
<dependency id="zlib_native.redist" version="1.2.11" />
</dependencies>
</metadata>

@ -7,6 +7,8 @@
:: Copy dependent DLLs to the named build directory
echo Copying DLLs
copy packages\libpng_native.redist.1.6.30\build\native\bin\x64\Debug\*.dll %1
copy packages\libpng_native.redist.1.6.30\build\native\bin\x64\Release\*.dll %1
copy packages\zlib_native.redist.1.2.11\build\native\bin\x64\Debug\*.dll %1
copy packages\zlib_native.redist.1.2.11\build\native\bin\x64\Release\*.dll %1

@ -0,0 +1,9 @@
PngSuite
--------
Permission to use, copy, modify and distribute these images for any
purpose and without fee is hereby granted.
(c) Willem van Schaik, 1996, 2011

Binary files not shown: 65 files, each between 104 B and 1.7 KiB.

Some files were not shown because too many files have changed in this diff.