Update documentation (Issue #77)

- Explain pdfioObjGetSubtype and pdfioObjGetType values
- Provide example code and documentation for accessing common page object values
Michael R Sweet
2024-10-09 15:07:57 -04:00
parent 206f75403a
commit 74dfefdcc1
5 changed files with 336 additions and 12 deletions

@ -1,13 +1,13 @@
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>PDFio Programming Manual v1.3.0</title>
<title>PDFio Programming Manual v1.3.2</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<meta name="generator" content="codedoc v3.7">
<meta name="author" content="Michael R Sweet">
<meta name="language" content="en-US">
<meta name="copyright" content="Copyright © 2021-2024 by Michael R Sweet">
<meta name="version" content="1.3.0">
<meta name="version" content="1.3.2">
<style type="text/css"><!--
body {
background: white;
@ -251,7 +251,7 @@ span.string {
<body>
<div class="header">
<p><img class="title" src="pdfio-512.png"></p>
<h1 class="title">PDFio Programming Manual v1.3.0</h1>
<h1 class="title">PDFio Programming Manual v1.3.2</h1>
<p>Michael R Sweet</p>
<p>Copyright © 2021-2024 by Michael R Sweet</p>
</div>
@ -628,7 +628,66 @@ pdfio_obj_t *page; <span class="comment">// Current page</span>
<span class="comment">// do something with page</span>
}
</code></pre>
<p>Each page is represented by a &quot;page tree&quot; object (what <a href="#pdfioFileGetPage"><code>pdfioFileGetPage</code></a> returns) that specifies information about the page and one or more &quot;content&quot; objects that contain the images, fonts, text, and graphics that appear on the page. Use the <a href="#pdfioPageGetNumStreams"><code>pdfioPageGetNumStreams</code></a> and <a href="#pdfioPageOpenStream"><code>pdfioPageOpenStream</code></a> functions to access the content streams for each page.</p>
<p>Each page is represented by a &quot;page tree&quot; object (what <a href="#pdfioFileGetPage"><code>pdfioFileGetPage</code></a> returns) that specifies information about the page and one or more &quot;content&quot; objects that contain the images, fonts, text, and graphics that appear on the page. Use the <a href="#pdfioPageGetNumStreams"><code>pdfioPageGetNumStreams</code></a> and <a href="#pdfioPageOpenStream"><code>pdfioPageOpenStream</code></a> functions to access the content streams for each page, and <a href="#pdfioObjGetDict"><code>pdfioObjGetDict</code></a> to get the associated page object dictionary. For example, if you want to display the media and crop boxes for a given page:</p>
<pre><code class="language-c">pdfio_file_t *pdf; <span class="comment">// PDF file</span>
size_t i; <span class="comment">// Looping var</span>
size_t count; <span class="comment">// Number of pages</span>
pdfio_obj_t *page; <span class="comment">// Current page</span>
pdfio_dict_t *dict; <span class="comment">// Current page dictionary</span>
pdfio_array_t *media_box; <span class="comment">// MediaBox array</span>
<span class="reserved">double</span> media_values[<span class="number">4</span>]; <span class="comment">// MediaBox values</span>
pdfio_array_t *crop_box; <span class="comment">// CropBox array</span>
<span class="reserved">double</span> crop_values[<span class="number">4</span>]; <span class="comment">// CropBox values</span>
<span class="comment">// Iterate the pages in the PDF file</span>
<span class="reserved">for</span> (i = <span class="number">0</span>, count = pdfioFileGetNumPages(pdf); i &lt; count; i ++)
{
  page = pdfioFileGetPage(pdf, i);
  dict = pdfioObjGetDict(page);
  media_box = pdfioDictGetArray(dict, <span class="string">&quot;MediaBox&quot;</span>);
  media_values[<span class="number">0</span>] = pdfioArrayGetNumber(media_box, <span class="number">0</span>);
  media_values[<span class="number">1</span>] = pdfioArrayGetNumber(media_box, <span class="number">1</span>);
  media_values[<span class="number">2</span>] = pdfioArrayGetNumber(media_box, <span class="number">2</span>);
  media_values[<span class="number">3</span>] = pdfioArrayGetNumber(media_box, <span class="number">3</span>);
  crop_box = pdfioDictGetArray(dict, <span class="string">&quot;CropBox&quot;</span>);
  crop_values[<span class="number">0</span>] = pdfioArrayGetNumber(crop_box, <span class="number">0</span>);
  crop_values[<span class="number">1</span>] = pdfioArrayGetNumber(crop_box, <span class="number">1</span>);
  crop_values[<span class="number">2</span>] = pdfioArrayGetNumber(crop_box, <span class="number">2</span>);
  crop_values[<span class="number">3</span>] = pdfioArrayGetNumber(crop_box, <span class="number">3</span>);
  printf(<span class="string">&quot;Page %u: MediaBox=[%g %g %g %g], CropBox=[%g %g %g %g]\n&quot;</span>,
         (<span class="reserved">unsigned</span>)(i + <span class="number">1</span>),
         media_values[<span class="number">0</span>], media_values[<span class="number">1</span>], media_values[<span class="number">2</span>], media_values[<span class="number">3</span>],
         crop_values[<span class="number">0</span>], crop_values[<span class="number">1</span>], crop_values[<span class="number">2</span>], crop_values[<span class="number">3</span>]);
}
</code></pre>
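<p>Because &quot;CropBox&quot; is optional, a page may not carry one, and the PDF format then treats the crop box as equal to the media box. The following is a minimal sketch of that fallback, not part of PDFio: <code>effective_crop_box</code> is a hypothetical helper that takes plain arrays of four numbers, assumes both boxes are already normalized (left &lt; right, bottom &lt; top), and also limits an oversized crop box to the media box.</p>

```c
#include <stddef.h>

// Hypothetical helper (not a PDFio function): compute the crop box to use
// for a page.  "crop" may be NULL when the page has no CropBox entry, in
// which case the media box is used.  Assumes normalized boxes in the order
// left, bottom, right, top.
void
effective_crop_box(const double media[4], const double *crop, double out[4])
{
  size_t i;                             // Looping var

  if (!crop)
  {
    // No CropBox: fall back to the MediaBox values
    for (i = 0; i < 4; i ++)
      out[i] = media[i];
    return;
  }

  // Limit the crop box to its intersection with the media box
  out[0] = crop[0] > media[0] ? crop[0] : media[0];
  out[1] = crop[1] > media[1] ? crop[1] : media[1];
  out[2] = crop[2] < media[2] ? crop[2] : media[2];
  out[3] = crop[3] < media[3] ? crop[3] : media[3];
}
```

<p>In the loop above, one way to use such a helper would be to pass <code>media_values</code> together with either <code>crop_values</code> or <code>NULL</code>, depending on whether the &quot;CropBox&quot; lookup produced an array.</p>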
<p>Page object dictionaries have several (mostly optional) key/value pairs, including:</p>
<ul>
<li><p>&quot;Annots&quot;: An array of annotation dictionaries for the page; use <a href="#pdfioDictGetArray"><code>pdfioDictGetArray</code></a> to get the array</p>
</li>
<li><p>&quot;CropBox&quot;: The crop box as an array of four numbers for the left, bottom, right, and top coordinates of the visible region of the page; use <a href="#pdfioDictGetArray"><code>pdfioDictGetArray</code></a> to get a pointer to the array of numbers</p>
</li>
<li><p>&quot;Dur&quot;: The number of seconds the page should be displayed; use <a href="#pdfioDictGetNumber"><code>pdfioDictGetNumber</code></a> to get the page duration value</p>
</li>
<li><p>&quot;Group&quot;: The dictionary of transparency group values for the page; use <a href="#pdfioDictGetDict"><code>pdfioDictGetDict</code></a> to get a pointer to the transparency group dictionary</p>
</li>
<li><p>&quot;LastModified&quot;: The date and time when this page was last modified; use <a href="#pdfioDictGetDate"><code>pdfioDictGetDate</code></a> to get the Unix <code>time_t</code> value</p>
</li>
<li><p>&quot;Parent&quot;: The parent page tree node object for this page; use <a href="#pdfioDictGetObj"><code>pdfioDictGetObj</code></a> to get a pointer to the object</p>
</li>
<li><p>&quot;MediaBox&quot;: The media box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use <a href="#pdfioDictGetArray"><code>pdfioDictGetArray</code></a> to get a pointer to the array of numbers</p>
</li>
<li><p>&quot;Resources&quot;: The dictionary of resources for the page; use <a href="#pdfioDictGetDict"><code>pdfioDictGetDict</code></a> to get a pointer to the resources dictionary</p>
</li>
<li><p>&quot;Rotate&quot;: The number of degrees of clockwise rotation (a multiple of 90) to apply to the page when viewing; use <a href="#pdfioDictGetNumber"><code>pdfioDictGetNumber</code></a> to get the rotation angle</p>
</li>
<li><p>&quot;Thumb&quot;: A thumbnail image object for the page; use <a href="#pdfioDictGetObj"><code>pdfioDictGetObj</code></a> to get a pointer to the thumbnail image object</p>
</li>
<li><p>&quot;Trans&quot;: The page transition dictionary; use <a href="#pdfioDictGetDict"><code>pdfioDictGetDict</code></a> to get a pointer to the dictionary</p>
</li>
</ul>
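<p>The &quot;MediaBox&quot; and &quot;Rotate&quot; values are often combined to work out the size of a page as displayed. The sketch below illustrates that arithmetic using plain numbers only; <code>normalize_rotation</code> and <code>displayed_size</code> are hypothetical helper names, not PDFio functions, and the code assumes the rotation is a multiple of 90 degrees.</p>

```c
// Hypothetical helper (not a PDFio function): reduce a rotation angle to
// the range 0 to 359 degrees, so that -90 becomes 270 and 450 becomes 90.
int
normalize_rotation(double rotate)
{
  long degrees = (long)rotate % 360;    // Remainder may be negative

  if (degrees < 0)
    degrees += 360;

  return ((int)degrees);
}

// Hypothetical helper: compute the displayed width and height of a box
// (left, bottom, right, top) after applying the page rotation.  Quarter
// turns swap the width and height.
void
displayed_size(const double box[4], double rotate, double *width, double *height)
{
  double w = box[2] - box[0];           // Unrotated width
  double h = box[3] - box[1];           // Unrotated height
  int    r = normalize_rotation(rotate);// Rotation in degrees

  if (r == 90 || r == 270)
  {
    *width  = h;
    *height = w;
  }
  else
  {
    *width  = w;
    *height = h;
  }
}
```

<p>With values read from a page dictionary, the box would come from the four &quot;MediaBox&quot; numbers and the angle from <code>pdfioDictGetNumber(dict, &quot;Rotate&quot;)</code>.</p>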
<p>The <a href="#pdfioFileClose"><code>pdfioFileClose</code></a> function closes a PDF file and frees all memory that was used for it:</p>
<pre><code class="language-c">pdfioFileClose(pdf);
</code></pre>
@ -3490,7 +3549,30 @@ size_t pdfioObjGetNumber(<a href="#pdfio_obj_t">pdfio_obj_t</a> *obj);</p>
<td class="description">Object</td></tr>
</tbody></table>
<h4 class="returnvalue">Return Value</h4>
<p class="description">Object subtype</p>
<p class="description">Object subtype name or <code>NULL</code> for none</p>
<h4 class="discussion">Discussion</h4>
<p class="discussion">This function returns an object's PDF subtype name, if any. Common subtype
names include:
</p><ul>
<li>&quot;CIDFontType0&quot;: A CID Type0 font
</li>
<li>&quot;CIDFontType2&quot;: A CID TrueType font
</li>
<li>&quot;Image&quot;: An image or image mask
</li>
<li>&quot;Form&quot;: A form XObject (a reusable, self-contained content stream, not a fillable form)
</li>
<li>&quot;OpenType&quot;: An OpenType font
</li>
<li>&quot;Type0&quot;: A composite font
</li>
<li>&quot;Type1&quot;: A PostScript Type1 font
</li>
<li>&quot;Type3&quot;: A PDF Type3 font
</li>
<li>&quot;TrueType&quot;: A TrueType font</li>
</ul>
<h3 class="function"><a id="pdfioObjGetType">pdfioObjGetType</a></h3>
<p class="description">Get an object's type.</p>
<p class="code">
@ -3501,7 +3583,28 @@ size_t pdfioObjGetNumber(<a href="#pdfio_obj_t">pdfio_obj_t</a> *obj);</p>
<td class="description">Object</td></tr>
</tbody></table>
<h4 class="returnvalue">Return Value</h4>
<p class="description">Object type</p>
<p class="description">Object type name or <code>NULL</code> for none</p>
<h4 class="discussion">Discussion</h4>
<p class="discussion">This function returns an object's PDF type name, if any. Common type names
include:
</p><ul>
<li>&quot;CMap&quot;: A character map for composite fonts
</li>
<li>&quot;Font&quot;: An embedded font (<a href="#pdfioObjGetSubtype"><code>pdfioObjGetSubtype</code></a> will tell you the
font format)
</li>
<li>&quot;FontDescriptor&quot;: A font descriptor
</li>
<li>&quot;Page&quot;: A (visible) page
</li>
<li>&quot;Pages&quot;: A page tree node
</li>
<li>&quot;Template&quot;: An invisible template page
</li>
<li>&quot;XObject&quot;: An image, image mask, or form (<a href="#pdfioObjGetSubtype"><code>pdfioObjGetSubtype</code></a> will
tell you which)</li>
</ul>
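<p>Since both the type and subtype names can be <code>NULL</code>, code that branches on them needs to check for that before comparing strings. The following is a minimal sketch of such a check; <code>describe_object</code> is a hypothetical helper, not a PDFio function, and it covers only a few of the names listed above.</p>

```c
#include <string.h>

// Hypothetical helper (not a PDFio function): map a type/subtype name pair
// to a short description.  Either argument may be NULL.
const char *
describe_object(const char *type, const char *subtype)
{
  if (!type)
    return ("untyped object");

  if (!strcmp(type, "Page"))
    return ("page");

  if (!strcmp(type, "Pages"))
    return ("page tree node");

  if (!strcmp(type, "Font"))
  {
    // The subtype gives the font format
    if (subtype && !strcmp(subtype, "TrueType"))
      return ("TrueType font");

    return ("font");
  }

  if (!strcmp(type, "XObject"))
  {
    // The subtype distinguishes images from forms
    if (subtype && !strcmp(subtype, "Image"))
      return ("image");

    if (subtype && !strcmp(subtype, "Form"))
      return ("form");

    return ("XObject");
  }

  return ("other object");
}
```

<p>A caller would typically pass the results of <code>pdfioObjGetType(obj)</code> and <code>pdfioObjGetSubtype(obj)</code> directly.</p>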
<h3 class="function"><a id="pdfioObjOpenStream">pdfioObjOpenStream</a></h3>
<p class="description">Open an object's (data) stream for reading.</p>
<p class="code">