Compare commits


10 Commits

Author SHA1 Message Date
Thierry LARONDE
269160d745
Merge 8b2b013b36ff05b3f98afed416fe3e87023b121a into 48fe8d1bc9d56189001600065cad84cc05851886 2025-01-25 12:41:16 +04:00
Michael R Sweet
48fe8d1bc9
Bump version. 2025-01-24 15:31:31 -05:00
Michael R Sweet
a4026bfe00
Prep for release. 2025-01-24 15:30:59 -05:00
Michael R Sweet
1e945cb750
Add LICENSE files to example install list. 2025-01-24 14:44:44 -05:00
Michael R Sweet
4cb4ceaadd
Update docos with fixed codedoc. 2025-01-24 14:42:41 -05:00
Michael R Sweet
cca7383c73
Fix support for UTF-16 string values in dictionaries (Issue #92)
Specifically to support Unicode Title and Author values.
2025-01-24 10:43:41 -05:00
Michael R Sweet
6c68b9fa5a
Add URLs and copyrights for Code 128 font and ProPhoto ICC profile (Issue #91) 2025-01-24 09:56:51 -05:00
Michael R Sweet
dd7ed67ec1
Update makesrcdist to validate CHANGES.md. 2025-01-23 15:34:43 -05:00
Michael R Sweet
9e2f3aba10
Fix reading of compressed object streams (Issue #92) 2025-01-23 15:27:22 -05:00
Thierry LARONDE
8b2b013b36
Add pdfioFileGetModDate and extend the pdfioinfo example
When exploring a PDF, it may be convenient to have the typical
information shown in a "Document Properties" dialog, plus some details
about the MediaBox(es).

So add the function to get the ModDate and extend the pdfioinfo
example, both to show what the library can do and to make pdfioinfo
useful as a debugging tool.

Signed-off-by: Thierry LARONDE <tlaronde@kergis.com>
2025-01-18 11:25:36 +01:00
13 changed files with 274 additions and 32 deletions

@ -1,8 +1,7 @@
Changes in PDFio
================
v1.4.1 - YYYY-MM-DD
v1.4.1 - 2025-01-24
-------------------
- Added license files for the example fonts now bundled with PDFio (Issue #91)
@ -10,6 +9,8 @@ v1.4.1 - YYYY-MM-DD
- Fixed handling of the Info object (Issue #87)
- Fixed opening of PDF files less than 1024 bytes in length (Issue #87)
- Fixed potential `NULL` dereference when reading (Issue #89)
- Fixed reading of compressed object streams (Issue #92)
- Fixed reading of UTF-16 string values (Issue #92)
v1.4.0 - 2024-12-26

@ -117,12 +117,14 @@ DOCFILES = \
NOTICE
EXAMPLES = \
examples/Makefile \
examples/Roboto-LICENSE.txt \
examples/Roboto-Bold.ttf \
examples/Roboto-Italic.ttf \
examples/Roboto-Regular.ttf \
examples/RobotoMono-Regular.ttf \
examples/code128.c \
examples/code128.ttf \
examples/code128-LICENSE.txt \
examples/image2pdf.c \
examples/md2pdf.c \
examples/md2pdf.md \

@ -1,4 +1,4 @@
.TH pdfio 3 "pdf read/write library" "2025-01-17" "pdf read/write library"
.TH pdfio 3 "pdf read/write library" "2025-01-24" "pdf read/write library"
.SH NAME
pdfio \- pdf read/write library
.SH Introduction
@ -2074,7 +2074,7 @@ The render_line function adds content from the linefrag_t array to a PDF page. I
}
.fi
.PP
We then loops through the fragments for the current line, drawing checkboxes, images, and text as needed. Whan a hyperlink is present, we add the link to the links array in the docdata_t structure, mapping "@" and "@@" to an internal link corresponding to the linked text:
We then loop through the fragments for the current line, drawing checkboxes, images, and text as needed. When a hyperlink is present, we add the link to the links array in the docdata_t structure, mapping "@" and "@@" to an internal link corresponding to the linked text:
.nf
if (frag\->url && dd\->num_links < DOCLINK_MAX)
@ -4567,4 +4567,4 @@ typedef enum pdfio_valtype_e pdfio_valtype_t;
Michael R Sweet
.SH COPYRIGHT
.PP
Copyright (c) 2021-2024 by Michael R Sweet
Copyright (c) 2021-2025 by Michael R Sweet

@ -1,13 +1,13 @@
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>PDFio Programming Manual v1.4.0</title>
<title>PDFio Programming Manual v1.4.1</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<meta name="generator" content="codedoc v3.8">
<meta name="author" content="Michael R Sweet">
<meta name="language" content="en-US">
<meta name="copyright" content="Copyright © 2021-2024 by Michael R Sweet">
<meta name="version" content="1.4.0">
<meta name="copyright" content="Copyright © 2021-2025 by Michael R Sweet">
<meta name="version" content="1.4.1">
<style type="text/css"><!--
body {
background: white;
@ -251,9 +251,9 @@ span.string {
<body>
<div class="header">
<p><img class="title" src="pdfio-512.png"></p>
<h1 class="title">PDFio Programming Manual v1.4.0</h1>
<h1 class="title">PDFio Programming Manual v1.4.1</h1>
<p>Michael R Sweet</p>
<p>Copyright © 2021-2024 by Michael R Sweet</p>
<p>Copyright © 2021-2025 by Michael R Sweet</p>
</div>
<div class="contents">
<h2 class="title">Contents</h2>
@ -2034,7 +2034,7 @@ dd-&gt;y -= margin_top + lineheight;
dd-&gt;y -= lineheight;
}
</code></pre>
<p>We then loops through the fragments for the current line, drawing checkboxes, images, and text as needed. WhÑn a hyperlink is present, we add the link to the <code>links</code> array in the <code>docdata_t</code> structure, mapping &quot;@&quot; and &quot;@@&quot; to an internal link corresponding to the linked text:</p>
<p>We then loop through the fragments for the current line, drawing checkboxes, images, and text as needed. When a hyperlink is present, we add the link to the <code>links</code> array in the <code>docdata_t</code> structure, mapping &quot;@&quot; and &quot;@@&quot; to an internal link corresponding to the linked text:</p>
<pre><code class="language-c"><span class="reserved">if</span> (frag-&gt;url &amp;&amp; dd-&gt;num_links &lt; DOCLINK_MAX)
{
doclink_t *l = dd-&gt;links + dd-&gt;num_links;

@ -2005,7 +2005,7 @@ if ((dd->y - need_bottom) < dd->art_box.y1)
}
```
We then loops through the fragments for the current line, drawing checkboxes,
We then loop through the fragments for the current line, drawing checkboxes,
images, and text as needed. When a hyperlink is present, we add the link to the
`links` array in the `docdata_t` structure, mapping "@" and "@@" to an internal
link corresponding to the linked text:

@ -1,3 +1,7 @@
Copyright 2003 Grandzebu, All Rights Reserved
http://grandzebu.net/informatique/codbar-en/code128.htm
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

@ -25,11 +25,18 @@ main(int argc, // I - Number of command-line arguments
{
const char *filename; // PDF filename
pdfio_file_t *pdf; // PDF file
const char *author; // Author name
time_t creation_date; // Creation date
struct tm *creation_tm; // Creation date/time information
char creation_text[256]; // Creation date/time as a string
const char *author, // Author name
*creator, // Creator name
*producer; // Producer name
time_t creation_date, // Creation date
mod_date; // Modification date
struct tm *creation_tm, // Creation date/time information
*mod_tm; // Mod. date/time information
char creation_text[256], // Creation date/time as a string
mod_text[256]; // Mod. date/time human fmt string
const char *title; // Title
size_t num_pages; // PDF number of pages
bool has_acroform; // AcroForm or not
// Get the filename from the command-line...
@ -48,9 +55,12 @@ main(int argc, // I - Number of command-line arguments
if (pdf == NULL)
return (1);
// Get the title and author...
// Get the title, author, creator, producer, and page count...
author = pdfioFileGetAuthor(pdf);
title = pdfioFileGetTitle(pdf);
creator = pdfioFileGetCreator(pdf);
producer = pdfioFileGetProducer(pdf);
num_pages = pdfioFileGetNumPages(pdf);
// Get the creation date and convert to a string...
if ((creation_date = pdfioFileGetCreationDate(pdf)) > 0)
@ -63,12 +73,82 @@ main(int argc, // I - Number of command-line arguments
snprintf(creation_text, sizeof(creation_text), "-- not set --");
}
// Get the modification date and convert to a string...
if ((mod_date = pdfioFileGetModDate(pdf)) > 0)
{
mod_tm = localtime(&mod_date);
strftime(mod_text, sizeof(mod_text), "%c", mod_tm);
}
else
{
snprintf(mod_text, sizeof(mod_text), "-- not set --");
}
// Detect whether the catalog has an AcroForm entry (direct or indirect)
{
pdfio_dict_t *dict; // Catalog dictionary
dict = pdfioFileGetCatalog(pdf);
has_acroform = dict != NULL && pdfioDictGetType(dict, "AcroForm") != PDFIO_VALTYPE_NONE;
}
// Print file information to stdout...
printf("%s:\n", filename);
printf(" Title: %s\n", title ? title : "-- not set --");
printf(" Author: %s\n", author ? author : "-- not set --");
printf(" Creator: %s\n", creator ? creator : "-- not set --");
printf(" Producer: %s\n", producer ? producer : "-- not set --");
printf(" Created On: %s\n", creation_text);
printf(" Number Pages: %u\n", (unsigned)pdfioFileGetNumPages(pdf));
printf(" Modified On: %s\n", mod_text);
printf(" Version: %s\n", pdfioFileGetVersion(pdf));
printf(" AcroForm: %s\n", has_acroform ? "Yes" : "No");
printf(" Number Pages: %u\n", (unsigned)num_pages);
printf(" MediaBoxes:");
// There can be a different MediaBox per page
// Loop and report MediaBox and number of consecutive pages of this size
{
pdfio_obj_t *obj; // Object
pdfio_dict_t *dict; // Object dictionary
pdfio_rect_t prev, // MediaBox previous
now; // MediaBox now
size_t n, // Page index
nprev; // Pages in the current run of identical MediaBoxes
// MediaBox may be inherited from a parent Pages node, so walk up the Parent chain
for (n = nprev = 0; n < num_pages; n++)
{
obj = pdfioFileGetPage(pdf, n);
while (obj != NULL)
{
dict = pdfioObjGetDict(obj);
if (pdfioDictGetRect(dict, "MediaBox", &now))
{
if (
nprev == 0
|| (
now.x1 != prev.x1 || now.y1 != prev.y1
|| now.x2 != prev.x2 || now.y2 != prev.y2
)
)
{
if (nprev) printf("(%zu) ", nprev);
prev = now;
printf("[%.7g %.7g %.7g %.7g]", now.x1, now.y1, now.x2, now.y2);
nprev = 1;
}
else
++nprev;
obj = NULL;
}
else
obj = pdfioDictGetObj(dict, "Parent");
}
}
printf("(%zu)", nprev);
}
printf("\n");
// Close the PDF file...
pdfioFileClose(pdf);
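The MediaBox loop above prints one box per run of consecutive pages that share the same size, followed by the run length. The run-length logic can be exercised on its own with a sketch like the following. It is not part of PDFio: `rect_t` stands in for `pdfio_rect_t`, and `appendf`, `same_box`, and `format_box_runs` are hypothetical helpers. The boxes are passed in as an array instead of being looked up (with Parent inheritance) through the library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

// Hypothetical stand-in for pdfio_rect_t so the sketch compiles without PDFio.
typedef struct
{
  double x1, y1, x2, y2;
} rect_t;

// Append printf-style output to out, never writing past outsize bytes.
static void
appendf(char *out, size_t outsize, size_t *len, const char *fmt, ...)
{
  va_list ap;
  int     n;

  if (*len >= outsize)                  // Buffer full (or already truncated): drop the rest
    return;

  va_start(ap, fmt);
  n = vsnprintf(out + *len, outsize - *len, fmt, ap);
  va_end(ap);

  if (n > 0)
    *len += (size_t)n;                  // May pass outsize on truncation; caught above
}

// Compare boxes field by field (memcmp would treat 0.0 and -0.0 as different).
static int
same_box(const rect_t *a, const rect_t *b)
{
  return (a->x1 == b->x1 && a->y1 == b->y1 && a->x2 == b->x2 && a->y2 == b->y2);
}

// Write runs of identical page boxes as "[x1 y1 x2 y2](count)", space-separated.
static void
format_box_runs(const rect_t *boxes, size_t num_boxes, char *out, size_t outsize)
{
  size_t i, run = 0, len = 0;

  assert(out != NULL && outsize > 0);
  out[0] = '\0';

  for (i = 0; i < num_boxes; i ++)
  {
    if (run > 0 && same_box(boxes + i, boxes + i - 1))
    {
      run ++;                           // Same size as the previous page: extend the run
      continue;
    }

    if (run > 0)                        // Close the previous run
      appendf(out, outsize, &len, "(%zu) ", run);

    appendf(out, outsize, &len, "[%.7g %.7g %.7g %.7g]", boxes[i].x1, boxes[i].y1, boxes[i].x2, boxes[i].y2);
    run = 1;
  }

  if (run > 0)
    appendf(out, outsize, &len, "(%zu)", run);
}
```

For two US Letter pages followed by one A4 page, this produces `[0 0 612 792](2) [0 0 595.28 841.89](1)`. Unlike pdfioinfo, the sketch prints nothing for an empty document instead of `(0)`, and it uses `%zu`, the conversion that matches `size_t`.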

@ -29,6 +29,15 @@ if test $(grep AC_INIT configure.ac | awk '{print $2}') != "[$version],"; then
exit 1
fi
if test $(head -4 CHANGES.md | tail -1 | awk '{print $1}') != "v$version"; then
echo "Still need to update CHANGES.md version number."
exit 1
fi
if test $(head -4 CHANGES.md | tail -1 | awk '{print $3}') = "YYYY-MM-DD"; then
echo "Still need to update CHANGES.md release date."
exit 1
fi
if test $(grep PDFIO_VERSION= configure | awk -F \" '{print $2}') != "$version"; then
echo "Still need to run 'autoconf -f'."
exit 1

@ -465,10 +465,134 @@ pdfioDictGetString(pdfio_dict_t *dict, // I - Dictionary
else if (value && value->type == PDFIO_VALTYPE_BINARY && value->value.binary.datalen < 4096)
{
// Convert binary string to regular string...
char temp[4096]; // Temporary string
char temp[4096], // Temporary string
*tempptr; // Pointer into temporary string
unsigned char *dataptr; // Pointer into the data string
memcpy(temp, value->value.binary.data, value->value.binary.datalen);
temp[value->value.binary.datalen] = '\0';
if (!(value->value.binary.datalen & 1) && !memcmp(value->value.binary.data, "\376\377", 2))
{
// Copy UTF-16 BE (byte order mark FE FF)
int ch; // Unicode character
size_t remaining; // Remaining bytes
for (dataptr = value->value.binary.data + 2, remaining = value->value.binary.datalen - 2, tempptr = temp; remaining > 1 && tempptr < (temp + sizeof(temp) - 5); dataptr += 2, remaining -= 2)
{
ch = (dataptr[0] << 8) | dataptr[1];
if (ch >= 0xd800 && ch <= 0xdbff && remaining > 3)
{
// Surrogate pair (multi-word UTF-16 char)...
int lch; // Low surrogate
lch = (dataptr[2] << 8) | dataptr[3];
if (lch < 0xdc00 || lch > 0xdfff)
break;
ch = (((ch & 0x3ff) << 10) | (lch & 0x3ff)) + 0x10000;
dataptr += 2;
remaining -= 2;
}
else if (ch >= 0xfffe)
{
continue;
}
if (ch < 128)
{
// ASCII
*tempptr++ = (char)ch;
}
else if (ch < 2048)
{
// 2-byte UTF-8
*tempptr++ = (char)(0xc0 | (ch >> 6));
*tempptr++ = (char)(0x80 | (ch & 0x3f));
}
else if (ch < 65536)
{
// 3-byte UTF-8
*tempptr++ = (char)(0xe0 | (ch >> 12));
*tempptr++ = (char)(0x80 | ((ch >> 6) & 0x3f));
*tempptr++ = (char)(0x80 | (ch & 0x3f));
}
else
{
// 4-byte UTF-8
*tempptr++ = (char)(0xf0 | (ch >> 18));
*tempptr++ = (char)(0x80 | ((ch >> 12) & 0x3f));
*tempptr++ = (char)(0x80 | ((ch >> 6) & 0x3f));
*tempptr++ = (char)(0x80 | (ch & 0x3f));
}
}
*tempptr = '\0';
}
else if (!(value->value.binary.datalen & 1) && !memcmp(value->value.binary.data, "\377\376", 2))
{
// Copy UTF-16 LE (byte order mark FF FE)
int ch; // Unicode character
size_t remaining; // Remaining bytes
for (dataptr = value->value.binary.data + 2, remaining = value->value.binary.datalen - 2, tempptr = temp; remaining > 1 && tempptr < (temp + sizeof(temp) - 5); dataptr += 2, remaining -= 2)
{
ch = (dataptr[1] << 8) | dataptr[0];
if (ch >= 0xd800 && ch <= 0xdbff && remaining > 3)
{
// Surrogate pair (multi-word UTF-16 char)...
int lch; // Low surrogate
lch = (dataptr[3] << 8) | dataptr[2];
if (lch < 0xdc00 || lch > 0xdfff)
break;
ch = (((ch & 0x3ff) << 10) | (lch & 0x3ff)) + 0x10000;
dataptr += 2;
remaining -= 2;
}
else if (ch >= 0xfffe)
{
continue;
}
if (ch < 128)
{
// ASCII
*tempptr++ = (char)ch;
}
else if (ch < 2048)
{
// 2-byte UTF-8
*tempptr++ = (char)(0xc0 | (ch >> 6));
*tempptr++ = (char)(0x80 | (ch & 0x3f));
}
else if (ch < 65536)
{
// 3-byte UTF-8
*tempptr++ = (char)(0xe0 | (ch >> 12));
*tempptr++ = (char)(0x80 | ((ch >> 6) & 0x3f));
*tempptr++ = (char)(0x80 | (ch & 0x3f));
}
else
{
// 4-byte UTF-8
*tempptr++ = (char)(0xf0 | (ch >> 18));
*tempptr++ = (char)(0x80 | ((ch >> 12) & 0x3f));
*tempptr++ = (char)(0x80 | ((ch >> 6) & 0x3f));
*tempptr++ = (char)(0x80 | (ch & 0x3f));
}
}
*tempptr = '\0';
}
else
{
// Copy as-is...
memcpy(temp, value->value.binary.data, value->value.binary.datalen);
temp[value->value.binary.datalen] = '\0';
}
free(value->value.binary.data);
value->type = PDFIO_VALTYPE_STRING;
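The UTF-16 handling above comes down to three rules: the byte order mark selects the byte order (FE FF is big-endian, FF FE is little-endian), a high surrogate (D800 to DBFF) must be followed by a low surrogate (DC00 to DFFF), and each code point is re-encoded as 1 to 4 UTF-8 bytes with boundaries at 0x80, 0x800, and 0x10000. The standalone sketch below applies those rules; `put_utf8` and `utf16_to_utf8` are hypothetical names, not PDFio functions. As a design choice it rejects malformed surrogates with an error, where the library code simply stops converting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Append code point cp to dst as UTF-8; returns the number of bytes written (1-4).
static size_t
put_utf8(char *dst, unsigned cp)
{
  if (cp < 0x80)                        // 0xxxxxxx
  {
    dst[0] = (char)cp;
    return (1);
  }
  else if (cp < 0x800)                  // 110xxxxx 10xxxxxx
  {
    dst[0] = (char)(0xc0 | (cp >> 6));
    dst[1] = (char)(0x80 | (cp & 0x3f));
    return (2);
  }
  else if (cp < 0x10000)                // 1110xxxx 10xxxxxx 10xxxxxx
  {
    dst[0] = (char)(0xe0 | (cp >> 12));
    dst[1] = (char)(0x80 | ((cp >> 6) & 0x3f));
    dst[2] = (char)(0x80 | (cp & 0x3f));
    return (3);
  }
  else                                  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  {
    dst[0] = (char)(0xf0 | (cp >> 18));
    dst[1] = (char)(0x80 | ((cp >> 12) & 0x3f));
    dst[2] = (char)(0x80 | ((cp >> 6) & 0x3f));
    dst[3] = (char)(0x80 | (cp & 0x3f));
    return (4);
  }
}

// Convert BOM-prefixed UTF-16 bytes to NUL-terminated UTF-8, truncating if dst
// is too small. Returns 0 on success, -1 on a missing BOM, odd length, or bad surrogate.
static int
utf16_to_utf8(const unsigned char *src, size_t srclen, char *dst, size_t dstsize)
{
  int    big_endian;
  size_t i, o = 0;

  assert(src != NULL && dst != NULL);
  if (dstsize == 0 || srclen < 2 || (srclen & 1))
    return (-1);

  if (src[0] == 0xfe && src[1] == 0xff)
    big_endian = 1;                     // FE FF = UTF-16BE
  else if (src[0] == 0xff && src[1] == 0xfe)
    big_endian = 0;                     // FF FE = UTF-16LE
  else
    return (-1);

  for (i = 2; i + 1 < srclen; i += 2)
  {
    unsigned cp = big_endian ? (unsigned)((src[i] << 8) | src[i + 1])
                             : (unsigned)((src[i + 1] << 8) | src[i]);

    if (cp >= 0xd800 && cp <= 0xdbff)   // High surrogate: a low surrogate must follow
    {
      unsigned lo;

      if (i + 3 >= srclen)
        return (-1);
      lo = big_endian ? (unsigned)((src[i + 2] << 8) | src[i + 3])
                      : (unsigned)((src[i + 3] << 8) | src[i + 2]);
      if (lo < 0xdc00 || lo > 0xdfff)
        return (-1);
      cp = 0x10000 + (((cp & 0x3ff) << 10) | (lo & 0x3ff));
      i += 2;
    }
    else if (cp >= 0xdc00 && cp <= 0xdfff)  // Unpaired low surrogate
      return (-1);

    if (o + 4 >= dstsize)               // Keep room for 4 bytes plus the NUL
      break;
    o += put_utf8(dst + o, cp);
  }

  dst[o] = '\0';
  return (0);
}
```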

@ -801,6 +801,18 @@ pdfioFileGetKeywords(pdfio_file_t *pdf) // I - PDF file
}
//
// 'pdfioFileGetModDate()' - Get the most recent modification date for a PDF file.
//
time_t // O - Modification date or `0` for none
pdfioFileGetModDate(
pdfio_file_t *pdf) // I - PDF file
{
return (pdf && pdf->info_obj ? pdfioDictGetDate(pdfioObjGetDict(pdf->info_obj), "ModDate") : 0);
}
//
// 'pdfioFileGetName()' - Get a PDF's filename.
//
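pdfioFileGetModDate returns the Info dictionary's ModDate as a `time_t` (or `0` for none) by way of pdfioDictGetDate. In the file itself, that value is a PDF date string (ISO 32000-1, section 7.9.4) such as `D:20250124153131-05'00'`. The sketch below shows one way such a string could map to seconds since the epoch. It is not PDFio's implementation: `pdf_date_to_time`, `get_digits`, and `days_from_civil` are hypothetical helpers, and the calendar arithmetic uses Howard Hinnant's days-from-civil algorithm so that no `timegm()` or time zone setting is needed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <time.h>

// Days since 1970-01-01 for a proleptic Gregorian date (days_from_civil).
static long
days_from_civil(long y, unsigned m, unsigned d)
{
  long     era;
  unsigned yoe, doy, doe;

  y  -= m <= 2;                         // Jan/Feb count as months 13/14 of the prior year
  era = (y >= 0 ? y : y - 399) / 400;
  yoe = (unsigned)(y - era * 400);      // Year of era [0, 399]
  doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // Day of year [0, 365]
  doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // Day of era [0, 146096]

  return (era * 146097 + (long)doe - 719468);
}

// Read exactly count decimal digits; returns -1 if any of them is not a digit.
static int
get_digits(const char **ptr, int count)
{
  int value = 0;

  while (count-- > 0)
  {
    if (!isdigit((unsigned char)**ptr))
      return (-1);
    value = value * 10 + (**ptr - '0');
    (*ptr) ++;
  }

  return (value);
}

// Convert "D:YYYYMMDDHHmmSSOHH'mm'" to seconds since the epoch (UTC). Only the
// year is required, missing fields take their minimum value, and a missing
// offset is treated as UTC. Returns 0 on error.
static time_t
pdf_date_to_time(const char *s)
{
  const char *ptr = s;
  int        year, mon = 1, day = 1, hour = 0, min = 0, sec = 0, tzoff = 0;
  int        *fields[] = { &mon, &day, &hour, &min, &sec };
  size_t     i;

  assert(s != NULL);
  if (!strncmp(ptr, "D:", 2))           // Skip the "D:" prefix if present
    ptr += 2;

  if ((year = get_digits(&ptr, 4)) < 0)
    return (0);

  for (i = 0; i < 5 && isdigit((unsigned char)*ptr); i ++)
  {
    if ((*fields[i] = get_digits(&ptr, 2)) < 0)
      return (0);
  }

  if (mon < 1 || mon > 12 || day < 1 || day > 31 || hour > 23 || min > 59 || sec > 59)
    return (0);

  if (*ptr == '+' || *ptr == '-')       // Offset of local time from UT
  {
    int sign = (*ptr == '-') ? -1 : 1, tzh, tzm = 0;

    ptr ++;
    if ((tzh = get_digits(&ptr, 2)) < 0)
      return (0);
    if (*ptr == '\'')
      ptr ++;
    if (isdigit((unsigned char)*ptr) && (tzm = get_digits(&ptr, 2)) < 0)
      return (0);
    tzoff = sign * (tzh * 3600 + tzm * 60);
  }                                     // 'Z' or no offset: already UT

  return ((time_t)days_from_civil(year, (unsigned)mon, (unsigned)day) * 86400 + hour * 3600 + min * 60 + sec - tzoff);
}
```

For example, `D:20250124153131-05'00'` and `D:20250124203131Z` describe the same instant, so both convert to the same `time_t`.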
@ -1517,6 +1529,7 @@ load_obj_stream(pdfio_obj_t *obj) // I - Object to load
cur_obj, // Current object
num_objs = 0; // Number of objects
pdfio_obj_t *objs[16384]; // Objects
int count; // Count of objects
PDFIO_DEBUG("load_obj_stream(obj=%p(%d))\n", obj, (int)obj->number);
@ -1528,12 +1541,17 @@ load_obj_stream(pdfio_obj_t *obj) // I - Object to load
return (false);
}
count = (int)pdfioDictGetNumber(pdfioObjGetDict(obj), "N");
PDFIO_DEBUG("load_obj_stream: N=%d\n", count);
_pdfioTokenInit(&tb, obj->pdf, (_pdfio_tconsume_cb_t)pdfioStreamConsume, (_pdfio_tpeek_cb_t)pdfioStreamPeek, st);
// Read the object numbers from the beginning of the stream...
while (_pdfioTokenGet(&tb, buffer, sizeof(buffer)))
while (count > 0 && _pdfioTokenGet(&tb, buffer, sizeof(buffer)))
{
// Stop if this isn't an object number...
PDFIO_DEBUG("load_obj_stream: %s\n", buffer);
if (!isdigit(buffer[0] & 255))
break;
@ -1556,21 +1574,19 @@ load_obj_stream(pdfio_obj_t *obj) // I - Object to load
// Skip offset
_pdfioTokenGet(&tb, buffer, sizeof(buffer));
PDFIO_DEBUG("load_obj_stream: %ld at offset %s\n", (long)number, buffer);
// One less compressed object...
count --;
}
if (!buffer[0])
{
pdfioStreamClose(st);
return (false);
}
_pdfioTokenPush(&tb, buffer);
PDFIO_DEBUG("load_obj_stream: num_objs=%lu\n", (unsigned long)num_objs);
// Read the objects themselves...
for (cur_obj = 0; cur_obj < num_objs; cur_obj ++)
{
if (!_pdfioValueRead(obj->pdf, obj, &tb, &(objs[cur_obj]->value), 0))
{
_pdfioFileError(obj->pdf, "Unable to read compressed object.");
pdfioStreamClose(st);
return (false);
}
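An object stream (ISO 32000-1, section 7.5.7) begins with N pairs of integers, each giving an object number and that object's byte offset relative to the first object (/First). The stream dictionary's /N entry gives the count. The change above stops reading the header after `count` pairs instead of reading until a non-numeric token. Presumably this is so that a compressed object which is itself a number is not consumed as another header entry. The standalone sketch below illustrates the same bounded loop on a plain string. `objref_t` and `parse_objstm_header` are hypothetical, and the sketch does not use PDFio's tokenizer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

// One header entry: object number and byte offset (relative to /First).
typedef struct
{
  long number;
  long offset;
} objref_t;

// Parse exactly n "number offset" pairs from the start of an object stream.
// Stopping after n pairs means a compressed object that starts with a digit
// (e.g. the integer 42) is never mistaken for another header pair.
// Returns n on success or -1 on malformed input.
static int
parse_objstm_header(const char *data, int n, objref_t *refs, int maxrefs)
{
  const char *ptr = data;
  char       *end;
  int        i;

  assert(data != NULL && refs != NULL);
  if (n < 0 || n > maxrefs)
    return (-1);

  for (i = 0; i < n; i ++)
  {
    long values[2];                     // [0] = object number, [1] = offset
    int  j;

    for (j = 0; j < 2; j ++)
    {
      while (isspace((unsigned char)*ptr))
        ptr ++;
      if (!isdigit((unsigned char)*ptr))
        return (-1);
      values[j] = strtol(ptr, &end, 10);
      ptr       = end;
    }

    refs[i].number = values[0];
    refs[i].offset = values[1];
  }

  return (n);
}
```

With the data `10 0 11 3 42 (hi)` and N = 2, the header yields objects 10 and 11, and the remaining text `42 (hi)` is left for the object reader: object 10 is the integer 42 at offset 0 and object 11 is the string at offset 3.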
@ -1720,7 +1736,7 @@ load_xref(
pdfio_stream_t *st; // Stream
unsigned char buffer[32]; // Read buffer
size_t num_sobjs = 0, // Number of object streams
sobjs[8192]; // Object streams to load
sobjs[16384]; // Object streams to load
pdfio_obj_t *current; // Current object
if ((number = strtoimax(line, &ptr, 10)) < 1)

@ -1,7 +1,7 @@
//
// Public header file for PDFio.
//
// Copyright © 2021-2024 by Michael R Sweet.
// Copyright © 2021-2025 by Michael R Sweet.
//
// Licensed under Apache License v2.0. See the file "LICENSE" for more
// information.
@ -23,7 +23,7 @@ extern "C" {
// Version number...
//
# define PDFIO_VERSION "1.4.0"
# define PDFIO_VERSION "1.4.1"
//
@ -201,6 +201,7 @@ extern time_t pdfioFileGetCreationDate(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern const char *pdfioFileGetCreator(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern pdfio_array_t *pdfioFileGetID(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern const char *pdfioFileGetKeywords(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern time_t pdfioFileGetModDate(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern const char *pdfioFileGetName(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern size_t pdfioFileGetNumObjs(pdfio_file_t *pdf) _PDFIO_PUBLIC;
extern size_t pdfioFileGetNumPages(pdfio_file_t *pdf) _PDFIO_PUBLIC;

@ -204,6 +204,7 @@ pdfioFileGetCreationDate
pdfioFileGetCreator
pdfioFileGetID
pdfioFileGetKeywords
pdfioFileGetModDate
pdfioFileGetName
pdfioFileGetNumObjs
pdfioFileGetNumPages

@ -1,3 +1,7 @@
https://www.color.org/chardata/rgb/rommrgb.xalter
Copyright © 2006 Hewlett-Packard
Terms of use
This profile is made available by ICC, and may be copied, distributed, embedded, made, used, and sold without restriction. Altered versions of this profile shall have the original identification and copyright information removed and shall not be misrepresented as the original profile.