Merge 0ab291a78bd977570ce1f9fbd78439f3f1b38c20 into 9c04d1dc209cb081799f4c787ebe712b2d93d460

Update pdfio.md
2025-07-07 03:24:32 +02:00 · 2024-10-21 15:07:28 +00:00 · 2024-10-21 20:37:25 +05:30 · 2024-10-21 19:52:34 +05:30 · 2024-10-21 17:09:38 +05:30
1 changed file with 10 additions and 45 deletions
--- a/doc/pdfio.md
+++ b/doc/pdfio.md
@@ -118,20 +118,6 @@ that are defined in a separate header file:
 ```c
 #include <pdfio-content.h>
 ```
-
-
-API Overview
-============
-
-PDFio exposes several types:
-
-- `pdfio_file_t`: A PDF file (for reading or writing)
-- `pdfio_array_t`: An array of values
-- `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
-- `pdfio_obj_t`: An object in a PDF file
-- `pdfio_stream_t`: An object stream
-
-
 Understanding PDF Files
 -----------------------

@@ -286,40 +272,19 @@ startxref        %startxref keyword
 %%EOF            %End-of-file marker
 ```

-How a PDF File is Read
----------------------

-To read a PDF file, converting it from a flat series of bytes into a graph of objects in memory, 
-the following steps might typically occur:
-1. Read the PDF header from the beginning of the file, checking that this is, indeed, a PDF
-document and retrieving its version number.
-2. The end-of-file marker is now found, by searching backward from the end of the file.
-The trailer dictionary can now be read, and the byte offset of the start of the cross-reference
-table retrieved.
-3. The cross-reference table can now be read. We now know where each object in the file is.
-4. At this stage, all the objects can be read and parsed, or we can leave this process until each
-object is actually needed, reading it on demand.
-5. We can now use the data, extracting the pages, parsing graphical content, extracting metadata,
-and so on.  
-This is not an exhaustive description, since there are many possible complications
-(encryption, linearization, object streams, and cross-reference streams).
+API Overview
+============
+
+PDFio exposes several types:
+
+- `pdfio_file_t`: A PDF file (for reading or writing)
+- `pdfio_array_t`: An array of values
+- `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
+- `pdfio_obj_t`: An object in a PDF file
+- `pdfio_stream_t`: An object stream

-How a PDF File is Written
-------------------------

-Writing a PDF document to a series of bytes in a file is much simpler than
-reading it—we don’t need to support all of the PDF format, just the subset
-we intend to use. Writing a PDF file is very fast, since it amounts to little
-more than flattening the object graph to a series of bytes.
-1. Output the header.
-2. Remove any objects which are not referenced by any other object in the
-PDF. This avoids writing objects which are no longer needed.
-3. Renumber the objects so they run from 1 to n where n is the number of
-objects in the file.
-4. Output the objects one by one, starting with object number one,
-recording the byte offset of each for the cross-reference table.
-5. Write the cross-reference table.
-6. Write the trailer, trailer dictionary, and end-of-file marker.

 Reading PDF Files
 -----------------
Author	SHA1	Message	Date
ThePhatak	6f4c36d804	Merge 0ab291a78bd977570ce1f9fbd78439f3f1b38c20 into 9c04d1dc209cb081799f4c787ebe712b2d93d460	2024-10-21 15:07:28 +00:00
ThePhatak	0ab291a78b	Update pdfio.md	2024-10-21 20:37:25 +05:30
ThePhatak	cac6d4891c	Update pdfio.md	2024-10-21 19:52:34 +05:30
ThePhatak	4f29ad89da	Merge branch 'michaelrsweet:master' into master	2024-10-21 17:09:38 +05:30
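The first reading steps in the removed "How a PDF File is Read" section can be illustrated with a small standalone C sketch: check the `%PDF-` header and extract the version, then search backward from the end of the file to find where the cross-reference table starts. This is not PDFio's implementation. `read_header_version` and `find_startxref` are hypothetical names, the sketch assumes the whole file is already in memory, and instead of locating `%%EOF` first it looks directly for the last `startxref` keyword, which precedes that marker. Complications such as cross-reference streams are ignored, although taking the *last* `startxref` is also what makes incrementally updated files work.

```c
#include <string.h>

/* Step 1 (sketch): check that the buffer starts with "%PDF-" and copy the
   version digits (for example "1.7") into version. Returns 1 on success. */
static int read_header_version(const char *buf, size_t len,
                               char *version, size_t vsize)
{
  size_t i = 5, j = 0;

  if (len < 8 || vsize == 0 || memcmp(buf, "%PDF-", 5) != 0)
    return 0;

  while (i < len && j + 1 < vsize &&
         (buf[i] == '.' || (buf[i] >= '0' && buf[i] <= '9')))
    version[j++] = buf[i++];
  version[j] = '\0';

  return j > 0;
}

/* Step 2 (sketch): search backward from the end of the buffer for the last
   "startxref" keyword and return the byte offset written after it, which is
   where the cross-reference table starts. Returns -1 if none is found. */
static long find_startxref(const char *buf, size_t len)
{
  static const char keyword[] = "startxref";
  const size_t      kwlen     = sizeof(keyword) - 1;
  const char        *p, *end = buf + len;
  long              offset = 0;
  size_t            i;

  if (len < kwlen)
    return -1;

  for (i = len - kwlen + 1; i-- > 0;)
  {
    if (memcmp(buf + i, keyword, kwlen) != 0)
      continue;

    /* Skip the whitespace between the keyword and the offset. */
    p = buf + i + kwlen;
    while (p < end && (*p == ' ' || *p == '\r' || *p == '\n'))
      p ++;

    if (p >= end || *p < '0' || *p > '9')
      return -1;

    while (p < end && *p >= '0' && *p <= '9')
      offset = offset * 10 + (*p++ - '0');

    return offset;
  }

  return -1;
}
```

A real reader would then seek to the returned offset and parse the `xref` table and trailer dictionary, which is where most of the complications listed in the removed text begin.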
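The writing steps in the removed "How a PDF File is Written" section can likewise be sketched in C. The hypothetical `write_pdf` function below is not part of PDFio: it assumes steps 2 and 3 (pruning unreferenced objects and renumbering) have already been done, so the caller passes object bodies as plain strings numbered 1 to n, with object 1 as the document catalog. It then writes the header, each object while recording its byte offset, the cross-reference table, and the trailer with `startxref` and `%%EOF`. The 16-object limit and the `outbuf_t` helper type are arbitrary simplifications for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

#define MAX_OBJS 16 /* arbitrary limit for this sketch */

/* Output buffer that tracks the current byte offset. */
typedef struct
{
  char   *data;  /* destination buffer */
  size_t size;   /* capacity of data in bytes */
  size_t len;    /* bytes written so far (the current offset) */
  int    ok;     /* set to 0 once the buffer overflows */
} outbuf_t;

/* Append formatted text to the buffer, marking it failed on overflow. */
static void out_printf(outbuf_t *b, const char *format, ...)
{
  va_list ap;
  int     n;

  if (!b->ok)
    return;

  va_start(ap, format);
  n = vsnprintf(b->data + b->len, b->size - b->len, format, ap);
  va_end(ap);

  if (n < 0 || (size_t)n >= b->size - b->len)
    b->ok = 0;
  else
    b->len += (size_t)n;
}

/* Write objects already numbered 1..n (bodies[0] is object 1, the catalog)
   plus the header, cross-reference table, and trailer. Returns the number
   of bytes written, or 0 if the buffer is too small. */
static size_t write_pdf(char *data, size_t size, const char *const *bodies, size_t n)
{
  outbuf_t b = { data, size, 0, 1 };
  size_t   offsets[MAX_OBJS];
  size_t   i, xref_offset;

  if (n == 0 || n > MAX_OBJS || size == 0)
    return 0;

  /* Step 1: the header. */
  out_printf(&b, "%%PDF-1.7\n");

  /* Step 4: each object in order, recording its byte offset. */
  for (i = 0; i < n; i ++)
  {
    offsets[i] = b.len;
    out_printf(&b, "%zu 0 obj\n%s\nendobj\n", i + 1, bodies[i]);
  }

  /* Step 5: the cross-reference table; every entry is exactly 20 bytes. */
  xref_offset = b.len;
  out_printf(&b, "xref\n0 %zu\n0000000000 65535 f \n", n + 1);
  for (i = 0; i < n; i ++)
    out_printf(&b, "%010zu 00000 n \n", offsets[i]);

  /* Step 6: trailer dictionary, startxref offset, and end-of-file marker. */
  out_printf(&b, "trailer\n<< /Size %zu /Root 1 0 R >>\nstartxref\n%zu\n%%%%EOF\n",
             n + 1, xref_offset);

  return b.ok ? b.len : 0;
}
```

Because each cross-reference entry has a fixed width (a 10-digit offset, a 5-digit generation number, the `n` or `f` flag, and a two-character end of line), a reader can locate an object's entry by arithmetic; the space before the newline in `"%010zu 00000 n \n"` is what pads each entry to 20 bytes.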