Merge 0ab291a78bd977570ce1f9fbd78439f3f1b38c20 into 9c04d1dc209cb081799f4c787ebe712b2d93d460

Update pdfio.md
2025-06-15 17:54:21 +02:00 · 2024-10-21 15:07:28 +00:00 · 2024-10-21 20:37:25 +05:30 · 2024-10-21 19:52:34 +05:30 · 2024-10-21 17:09:38 +05:30
1 changed file with 10 additions and 45 deletions
--- a/doc/pdfio.md
+++ b/doc/pdfio.md
@@ -118,20 +118,6 @@ that are defined in a separate header file:
 ```c
 #include <pdfio-content.h>
 ```
 API Overview
 ============
 PDFio exposes several types:
 - `pdfio_file_t`: A PDF file (for reading or writing)
 - `pdfio_array_t`: An array of values
 - `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
 - `pdfio_obj_t`: An object in a PDF file
 - `pdfio_stream_t`: An object stream
 Understanding PDF Files
 -----------------------
@@ -286,40 +272,19 @@ startxref        %startxref keyword
 %%EOF            %End-of-file marker
 ```
 How a PDF File is Read
 ----------------------
-To read a PDF file, converting it from a flat series of bytes into a graph of objects in memory, 
-the following steps might typically occur:
-1. Read the PDF header from the beginning of the file, checking that this is, indeed, a PDF
-document and retrieving its version number.
-3. The end-of-file marker is now found, by searching backward from the end of the file.
-The trailer dictionary can now be read, and the byte offset of the start of the cross-reference
-table retrieved.
-5. The cross-reference table can now be read. We now know where each object in the file is.
-6. At this stage, all the objects can be read and parsed, or we can leave this process until each
-object is actually needed, reading it on demand.
+API Overview
+============
+
+PDFio exposes several types:
+
+- `pdfio_file_t`: A PDF file (for reading or writing)
+- `pdfio_array_t`: An array of values
+- `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
+- `pdfio_obj_t`: An object in a PDF file
+- `pdfio_stream_t`: An object stream
 8. We can now use the data, extracting the pages, parsing graphical content, extracting metadata,
 and so on.  
 This is not an exhaustive description, since there are many possible complications
 (encryption, linearization, objects, and cross reference streams).
 How a PDF File is Written
 -------------------------
 Writing a PDF document to a series of bytes in a file is much simpler than
 reading it—we don’t need to support all of the PDF format, just the subset
 we intend to use. Writing a PDF file is very fast, since it amounts to little
 more than flattening the object graph to a series of bytes.
 1. Output the header.
 2. Remove any objects which are not referenced by any other object in the
 PDF. This avoids writing objects which are no longer needed.
 3. Renumber the objects so they run from 1 to n where n is the number of
 objects in the file.
 4. Output the objects one by one, starting with object number one,
 recording the byte offset of each for the cross-reference table.
 5. Write the cross-reference table.
 6. Write the trailer, trailer dictionary, and end-of-file marker.
 Reading PDF Files
 -----------------
Author	SHA1	Message	Date
ThePhatak	6f4c36d804	Merge 0ab291a78bd977570ce1f9fbd78439f3f1b38c20 into 9c04d1dc209cb081799f4c787ebe712b2d93d460	2024-10-21 15:07:28 +00:00
ThePhatak	0ab291a78b	Update pdfio.md	2024-10-21 20:37:25 +05:30
ThePhatak	cac6d4891c	Update pdfio.md	2024-10-21 19:52:34 +05:30
ThePhatak	4f29ad89da	Merge branch 'michaelrsweet:master' into master	2024-10-21 17:09:38 +05:30
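
Note on the removed "How a PDF File is Read" text: the step that searches backward from the end of the file for the cross-reference offset can be sketched in plain C as below. This is only an illustration under stated assumptions, not PDFio's implementation: `find_startxref` is a hypothetical helper, the whole file is assumed to be in a NUL-terminated memory buffer, and encryption, linearization, and cross-reference streams are ignored.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of PDFio): scan backward through an
// in-memory, NUL-terminated PDF for the last "startxref" keyword and parse
// the decimal byte offset that follows it.
// Returns the offset, or -1 if the keyword or the number is missing.
static long
find_startxref(const char *buf, size_t len)
{
  static const char keyword[] = "startxref";
  const size_t      klen = sizeof(keyword) - 1;
  size_t            i;

  if (len < klen)
    return (-1);

  // Try every candidate start position, last one first.
  for (i = len - klen + 1; i > 0; i --)
  {
    const char *p = buf + i - 1;

    if (!memcmp(p, keyword, klen))
    {
      char *end;
      long offset;

      p += klen;
      offset = strtol(p, &end, 10);   // strtol skips the newline after the keyword

      return (end == p ? -1 : offset);
    }
  }

  return (-1);
}
```

Called on a buffer that ends with `startxref`, a newline, `1234`, and the `%%EOF` marker, the helper returns 1234; a reader would then seek to that offset to load the cross-reference table.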