Update documentation (Issue #77)

- Explain pdfioObjGetSubtype and pdfioObjGetType values
- Provide example code and documentation for accessing common page object values
2025-07-13 06:24:25 +02:00 · 2024-10-09 15:07:57 -04:00
parent 206f75403a
commit 74dfefdcc1
5 changed files with 336 additions and 12 deletions
--- a/doc/pdfio.html
+++ b/doc/pdfio.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <html lang="en-US">
 <head>
-<title>PDFio Programming Manual v1.3.0</title>
+<title>PDFio Programming Manual v1.3.2</title>
 <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
 <meta name="generator" content="codedoc v3.7">
 <meta name="author" content="Michael R Sweet">
 <meta name="language" content="en-US">
 <meta name="copyright" content="Copyright © 2021-2024 by Michael R Sweet">
-<meta name="version" content="1.3.0">
+<meta name="version" content="1.3.2">
 <style type="text/css"><!--
 body {
  background: white;
@@ -251,7 +251,7 @@ span.string {
 <body>
 <div class="header">
 <p><img class="title" src="pdfio-512.png"></p>
-<h1 class="title">PDFio Programming Manual v1.3.0</h1>
+<h1 class="title">PDFio Programming Manual v1.3.2</h1>
 <p>Michael R Sweet</p>
 <p>Copyright © 2021-2024 by Michael R Sweet</p>
 </div>
@@ -628,7 +628,66 @@ pdfio_obj_t  *page;  <span class="comment">// Current page</span>
  <span class="comment">// do something with page</span>
 }
 </code></pre>
-<p>Each page is represented by a &quot;page tree&quot; object (what <a href="#pdfioFileGetPage"><code>pdfioFileGetPage</code></a> returns) that specifies information about the page and one or more &quot;content&quot; objects that contain the images, fonts, text, and graphics that appear on the page. Use the <a href="#pdfioPageGetNumStreams"><code>pdfioPageGetNumStreams</code></a> and <a href="#pdfioPageOpenStream"><code>pdfioPageOpenStream</code></a> functions to access the content streams for each page.</p>
+<p>Each page is represented by a &quot;page tree&quot; object (what <a href="#pdfioFileGetPage"><code>pdfioFileGetPage</code></a> returns) that specifies information about the page and one or more &quot;content&quot; objects that contain the images, fonts, text, and graphics that appear on the page. Use the <a href="#pdfioPageGetNumStreams"><code>pdfioPageGetNumStreams</code></a> and <a href="#pdfioPageOpenStream"><code>pdfioPageOpenStream</code></a> functions to access the content streams for each page, and <a href="#pdfioObjGetDict"><code>pdfioObjGetDict</code></a> to get the associated page object dictionary. For example, if you want to display the media and crop boxes for a given page:</p>
+<pre><code class="language-c">pdfio_file_t  *pdf;             <span class="comment">// PDF file</span>
+size_t        i;                <span class="comment">// Looping var</span>
+size_t        count;            <span class="comment">// Number of pages</span>
+pdfio_obj_t   *page;            <span class="comment">// Current page</span>
+pdfio_dict_t  *dict;            <span class="comment">// Current page dictionary</span>
+pdfio_array_t *media_box;       <span class="comment">// MediaBox array</span>
+<span class="reserved">double</span>        media_values[<span class="number">4</span>];  <span class="comment">// MediaBox values</span>
+pdfio_array_t *crop_box;        <span class="comment">// CropBox array</span>
+<span class="reserved">double</span>        crop_values[<span class="number">4</span>];   <span class="comment">// CropBox values</span>
+
+<span class="comment">// Iterate the pages in the PDF file</span>
+<span class="reserved">for</span> (i = <span class="number">0</span>, count = pdfioFileGetNumPages(pdf); i &lt; count; i ++)
+{
+  page = pdfioFileGetPage(pdf, i);
+  dict = pdfioObjGetDict(page);
+
+  media_box       = pdfioDictGetArray(dict, <span class="string">&quot;MediaBox&quot;</span>);
+  media_values[<span class="number">0</span>] = pdfioArrayGetNumber(media_box, <span class="number">0</span>);
+  media_values[<span class="number">1</span>] = pdfioArrayGetNumber(media_box, <span class="number">1</span>);
+  media_values[<span class="number">2</span>] = pdfioArrayGetNumber(media_box, <span class="number">2</span>);
+  media_values[<span class="number">3</span>] = pdfioArrayGetNumber(media_box, <span class="number">3</span>);
+
+  crop_box       = pdfioDictGetArray(dict, <span class="string">&quot;CropBox&quot;</span>);
+  crop_values[<span class="number">0</span>] = pdfioArrayGetNumber(crop_box, <span class="number">0</span>);
+  crop_values[<span class="number">1</span>] = pdfioArrayGetNumber(crop_box, <span class="number">1</span>);
+  crop_values[<span class="number">2</span>] = pdfioArrayGetNumber(crop_box, <span class="number">2</span>);
+  crop_values[<span class="number">3</span>] = pdfioArrayGetNumber(crop_box, <span class="number">3</span>);
+
+  printf(<span class="string">&quot;Page %u: MediaBox=[%g %g %g %g], CropBox=[%g %g %g %g]\n&quot;</span>,
+         (<span class="reserved">unsigned</span>)(i + <span class="number">1</span>),
+         media_values[<span class="number">0</span>], media_values[<span class="number">1</span>], media_values[<span class="number">2</span>], media_values[<span class="number">3</span>],
+         crop_values[<span class="number">0</span>], crop_values[<span class="number">1</span>], crop_values[<span class="number">2</span>], crop_values[<span class="number">3</span>]);
+}
+</code></pre>
+<p>Page object dictionaries have several (mostly optional) key/value pairs, including:</p>
+<ul>
+<li><p>&quot;Annots&quot;: An array of annotation dictionaries for the page; use <a href="#pdfioDictGetArray"><code>pdfioDictGetArray</code></a> to get the array</p>
+</li>
+<li><p>&quot;CropBox&quot;: The crop box as an array of four numbers for the left, bottom, right, and top coordinates of the visible region of the page; use <a href="#pdfioDictGetArray"><code>pdfioDictGetArray</code></a> to get a pointer to the array of numbers</p>
+</li>
+<li><p>&quot;Dur&quot;: The number of seconds the page should be displayed; use <a href="#pdfioDictGetNumber"><code>pdfioDictGetNumber</code></a> to get the page duration value</p>
+</li>
+<li><p>&quot;Group&quot;: The dictionary of transparency group values for the page; use <a href="#pdfioDictGetDict"><code>pdfioDictGetDict</code></a> to get a pointer to the transparency group dictionary</p>
+</li>
+<li><p>&quot;LastModified&quot;: The date and time when this page was last modified; use <a href="#pdfioDictGetDate"><code>pdfioDictGetDate</code></a> to get the Unix <code>time_t</code> value</p>
+</li>
+<li><p>&quot;Parent&quot;: The parent page tree node object for this page; use <a href="#pdfioDictGetObj"><code>pdfioDictGetObj</code></a> to get a pointer to the object</p>
+</li>
+<li><p>&quot;MediaBox&quot;: The media box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use <a href="#pdfioDictGetArray"><code>pdfioDictGetArray</code></a> to get a pointer to the array of numbers</p>
+</li>
+<li><p>&quot;Resources&quot;: The dictionary of resources for the page; use <a href="#pdfioDictGetDict"><code>pdfioDictGetDict</code></a> to get a pointer to the resources dictionary</p>
+</li>
+<li><p>&quot;Rotate&quot;: A number indicating the number of degrees of clockwise rotation (a multiple of 90) to apply to the page when viewing; use <a href="#pdfioDictGetNumber"><code>pdfioDictGetNumber</code></a> to get the rotation angle</p>
+</li>
+<li><p>&quot;Thumb&quot;: A thumbnail image object for the page; use <a href="#pdfioDictGetObj"><code>pdfioDictGetObj</code></a> to get a pointer to the thumbnail image object</p>
+</li>
+<li><p>&quot;Trans&quot;: The page transition dictionary; use <a href="#pdfioDictGetDict"><code>pdfioDictGetDict</code></a> to get a pointer to the dictionary</p>
+</li>
+</ul>
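
Because most of these keys are optional, code that reads them usually needs fallbacks. The following is a minimal sketch, not part of PDFio: the helper names `effective_crop_box`, `box_width`, `box_height`, and `normalize_rotation` are hypothetical. It assumes the caller has already copied the four numbers out of the "MediaBox" and "CropBox" arrays (for example with `pdfioArrayGetNumber`) and passes `NULL` for the crop box when the page dictionary has none, in which case the media box is used, which is the default the PDF specification gives for "CropBox".

```c
#include <stddef.h>

// Hypothetical helper (not part of PDFio): copy the box a viewer would
// display into "out".  "crop" may be NULL when the page has no CropBox,
// in which case the MediaBox values are used instead.
static void
effective_crop_box(const double media[4], const double *crop, double out[4])
{
  const double *src = crop ? crop : media;  // Box to copy from
  size_t       i;                           // Looping var

  for (i = 0; i < 4; i ++)
    out[i] = src[i];
}

// Width of a [left bottom right top] box in points, tolerating boxes whose
// corners are listed in the opposite order.
static double
box_width(const double box[4])
{
  double w = box[2] - box[0];

  return (w < 0.0 ? -w : w);
}

// Height of a [left bottom right top] box in points.
static double
box_height(const double box[4])
{
  double h = box[3] - box[1];

  return (h < 0.0 ? -h : h);
}

// Reduce a "Rotate" value to the range 0 to 359 degrees; valid PDF files
// only use multiples of 90, so the result is normally 0, 90, 180, or 270.
static int
normalize_rotation(double rotate)
{
  int degrees = (int)rotate % 360;

  if (degrees < 0)
    degrees += 360;

  return (degrees);
}
```

This sketch only looks at the page's own values. "MediaBox", "CropBox", "Resources", and "Rotate" can also be inherited from a "Parent" page tree node, so a more complete reader would walk up the parents when a key is missing.
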
 <p>The <a href="#pdfioFileClose"><code>pdfioFileClose</code></a> function closes a PDF file and frees all memory that was used for it:</p>
 <pre><code class="language-c">pdfioFileClose(pdf);
 </code></pre>
@@ -3490,7 +3549,30 @@ size_t pdfioObjGetNumber(<a href="#pdfio_obj_t">pdfio_obj_t</a> *obj);</p>
 <td class="description">Object</td></tr>
 </tbody></table>
 <h4 class="returnvalue">Return Value</h4>
-<p class="description">Object subtype</p>
+<p class="description">Object subtype name or <code>NULL</code> for none</p>
+<h4 class="discussion">Discussion</h4>
+<p class="discussion">This function returns an object's PDF subtype name, if any.  Common subtype
+names include:
+
+</p><ul>
+<li>&quot;CIDFontType0&quot;: A CID Type0 font
+</li>
+<li>&quot;CIDFontType2&quot;: A CID TrueType font
+</li>
+<li>&quot;Image&quot;: An image or image mask
+</li>
+<li>&quot;Form&quot;: A form XObject (reusable page content, not a fillable form)
+</li>
+<li>&quot;OpenType&quot;: An OpenType font
+</li>
+<li>&quot;Type0&quot;: A composite font
+</li>
+<li>&quot;Type1&quot;: A PostScript Type1 font
+</li>
+<li>&quot;Type3&quot;: A PDF Type3 font
+</li>
+<li>&quot;TrueType&quot;: A TrueType font</li>
+</ul>
 <h3 class="function"><a id="pdfioObjGetType">pdfioObjGetType</a></h3>
 <p class="description">Get an object's type.</p>
 <p class="code">
@@ -3501,7 +3583,28 @@ size_t pdfioObjGetNumber(<a href="#pdfio_obj_t">pdfio_obj_t</a> *obj);</p>
 <td class="description">Object</td></tr>
 </tbody></table>
 <h4 class="returnvalue">Return Value</h4>
-<p class="description">Object type</p>
+<p class="description">Object type name or <code>NULL</code> for none</p>
+<h4 class="discussion">Discussion</h4>
+<p class="discussion">This function returns an object's PDF type name, if any. Common type names
+include:
+
+</p><ul>
+<li>&quot;CMap&quot;: A character map for composite fonts
+</li>
+<li>&quot;Font&quot;: A font, embedded or not (<a href="#pdfioObjGetSubtype"><code>pdfioObjGetSubtype</code></a> will tell you the
+  font format)
+</li>
+<li>&quot;FontDescriptor&quot;: A font descriptor
+</li>
+<li>&quot;Page&quot;: A (visible) page
+</li>
+<li>&quot;Pages&quot;: A page tree node
+</li>
+<li>&quot;Template&quot;: An invisible template page
+</li>
+<li>&quot;XObject&quot;: An image, image mask, or form (<a href="#pdfioObjGetSubtype"><code>pdfioObjGetSubtype</code></a> will
+  tell you which)</li>
+</ul>
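
Since both `pdfioObjGetType` and `pdfioObjGetSubtype` can return `NULL`, code that dispatches on these names has to check for that before comparing strings. The following is a minimal sketch of one way to combine the two values. The helper `describe_object` and its labels are hypothetical and not part of PDFio; it takes the two strings as plain arguments, so it could be called as `describe_object(pdfioObjGetType(obj), pdfioObjGetSubtype(obj))`.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of PDFio): map an object's type and subtype
// names to a short label.  Either argument may be NULL.
static const char *
describe_object(const char *type, const char *subtype)
{
  if (!type)
    return ("untyped object");

  if (!strcmp(type, "Page"))
    return ("page");

  if (!strcmp(type, "Pages"))
    return ("page tree node");

  if (!strcmp(type, "Font"))
  {
    // The subtype names the font format
    if (!subtype)
      return ("font");
    else if (!strcmp(subtype, "Type0"))
      return ("composite font");
    else if (!strcmp(subtype, "Type1"))
      return ("Type1 font");
    else if (!strcmp(subtype, "TrueType"))
      return ("TrueType font");
    else
      return ("other font");
  }

  if (!strcmp(type, "XObject"))
  {
    // The subtype distinguishes images from form XObjects
    if (subtype && !strcmp(subtype, "Image"))
      return ("image");
    else if (subtype && !strcmp(subtype, "Form"))
      return ("form XObject");
    else
      return ("other XObject");
  }

  return ("other object");
}
```

Checking the type before the subtype matters because subtype names are only meaningful within a type: "Type1" under "Font" is a font format, while the same string elsewhere could mean something unrelated.
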
 <h3 class="function"><a id="pdfioObjOpenStream">pdfioObjOpenStream</a></h3>
 <p class="description">Open an object's (data) stream for reading.</p>
 <p class="code">