mirror of
https://github.com/michaelrsweet/pdfio.git
synced 2024-12-27 05:48:20 +01:00
Compare commits
4 Commits
99da340c86...6f4c36d804
| Author | SHA1 | Date |
|---|---|---|
| | 6f4c36d804 | |
| | 0ab291a78b | |
| | cac6d4891c | |
| | 4f29ad89da | |
55
doc/pdfio.md
@@ -118,20 +118,6 @@ that are defined in a separate header file:
 ```c
 #include <pdfio-content.h>
 ```
-
-
-API Overview
-============
-
-PDFio exposes several types:
-
-- `pdfio_file_t`: A PDF file (for reading or writing)
-- `pdfio_array_t`: An array of values
-- `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
-- `pdfio_obj_t`: An object in a PDF file
-- `pdfio_stream_t`: An object stream
-
-
 Understanding PDF Files
 -----------------------
 
@@ -286,40 +272,19 @@ startxref %startxref keyword
 %%EOF %End-of-file marker
 ```
 
-How a PDF File is Read
-----------------------
 
-To read a PDF file, converting it from a flat series of bytes into a graph of objects in memory,
-the following steps might typically occur:
-1. Read the PDF header from the beginning of the file, checking that this is, indeed, a PDF
-document and retrieving its version number.
-3. The end-of-file marker is now found, by searching backward from the end of the file.
-The trailer dictionary can now be read, and the byte offset of the start of the cross-reference
-table retrieved.
-5. The cross-reference table can now be read. We now know where each object in the file is.
-6. At this stage, all the objects can be read and parsed, or we can leave this process until each
-object is actually needed, reading it on demand.
-8. We can now use the data, extracting the pages, parsing graphical content, extracting metadata,
-and so on.
-This is not an exhaustive description, since there are many possible complications
-(encryption, linearization, objects, and cross reference streams).
-
-How a PDF File is Written
--------------------------
-
-Writing a PDF document to a series of bytes in a file is much simpler than
-reading it—we don’t need to support all of the PDF format, just the subset
-we intend to use. Writing a PDF file is very fast, since it amounts to little
-more than flattening the object graph to a series of bytes.
-1. Output the header.
-2. Remove any objects which are not referenced by any other object in the
-PDF. This avoids writing objects which are no longer needed.
-3. Renumber the objects so they run from 1 to n where n is the number of
-objects in the file.
-4. Output the objects one by one, starting with object number one,
-recording the byte offset of each for the cross-reference table.
-5. Write the cross-reference table.
-6. Write the trailer, trailer dictionary, and end-of-file marker.
+API Overview
+============
+
+PDFio exposes several types:
+
+- `pdfio_file_t`: A PDF file (for reading or writing)
+- `pdfio_array_t`: An array of values
+- `pdfio_dict_t`: A dictionary of key/value pairs in a PDF file, object, etc.
+- `pdfio_obj_t`: An object in a PDF file
+- `pdfio_stream_t`: An object stream
+
+
 
 Reading PDF Files
 -----------------
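For reviewers: the "How a PDF File is Read" and "How a PDF File is Written" text touched by this diff can be illustrated with a minimal sketch in plain C. It is not PDFio code and does not use the PDFio API. The names `sketch_write`, `sketch_read`, and `SKETCH_MAX_OBJS` are hypothetical. The writer skips the "remove unreferenced objects" and "renumber" steps, and the reader ignores encryption, linearization, object streams, and cross-reference streams.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_OBJS 16  // Arbitrary limit for this sketch

// Write a header, `count` object bodies, a cross-reference table, and a
// trailer. Returns the byte offset of the "xref" keyword, or -1 on error.
long sketch_write(const char *path, const char *const *bodies, int count)
{
  long offsets[SKETCH_MAX_OBJS];
  long xref_offset;
  FILE *fp;
  int  i;

  if (count < 1 || count > SKETCH_MAX_OBJS || (fp = fopen(path, "wb")) == NULL)
    return (-1);

  fputs("%PDF-1.7\n", fp);              // Output the header

  for (i = 0; i < count; i ++)
  {
    offsets[i] = ftell(fp);             // Record each object's byte offset
    fprintf(fp, "%d 0 obj\n%s\nendobj\n", i + 1, bodies[i]);
  }

  xref_offset = ftell(fp);              // Write the cross-reference table
  fprintf(fp, "xref\n0 %d\n", count + 1);
  fputs("0000000000 65535 f \n", fp);   // Entry 0 is always free
  for (i = 0; i < count; i ++)
    fprintf(fp, "%010ld 00000 n \n", offsets[i]);

  // Write the trailer dictionary, startxref offset, and end-of-file marker
  fprintf(fp, "trailer\n<</Size %d/Root 1 0 R>>\nstartxref\n%ld\n%%%%EOF\n",
          count + 1, xref_offset);

  return (fclose(fp) ? -1 : xref_offset);
}

// Check the header, copy the version string, then search the last 1024 bytes
// for "startxref". Returns the cross-reference table offset, or -1 on error.
long sketch_read(const char *path, char *version, size_t versize)
{
  char   buffer[1025];
  char   *ptr, *found = NULL;
  long   filesize, start, xref_offset;
  size_t bytes;
  FILE   *fp = fopen(path, "rb");

  if (!fp)
    return (-1);

  // Read the header and confirm this is a PDF file
  if (!fgets(buffer, sizeof(buffer), fp) || strncmp(buffer, "%PDF-", 5))
    goto error;
  buffer[strcspn(buffer, "\r\n")] = '\0';
  snprintf(version, versize, "%s", buffer + 5);

  // Load the tail of the file (assumes no NUL bytes in the tail)
  if (fseek(fp, 0, SEEK_END) || (filesize = ftell(fp)) < 0)
    goto error;
  start = filesize > 1024 ? filesize - 1024 : 0;
  if (fseek(fp, start, SEEK_SET))
    goto error;
  bytes         = fread(buffer, 1, sizeof(buffer) - 1, fp);
  buffer[bytes] = '\0';

  // Keep the last "startxref" keyword, i.e. the one nearest the end
  for (ptr = buffer; (ptr = strstr(ptr, "startxref")) != NULL; ptr += 9)
    found = ptr;
  if (!found || (xref_offset = strtol(found + 9, NULL, 10)) <= 0)
    goto error;

  // Confirm the offset really points at a classic cross-reference table
  if (fseek(fp, xref_offset, SEEK_SET) || !fgets(buffer, sizeof(buffer), fp) ||
      strncmp(buffer, "xref", 4))
    goto error;

  fclose(fp);
  return (xref_offset);

error:
  fclose(fp);
  return (-1);
}
```

Calling `sketch_write()` and then `sketch_read()` on the same path should return the same offset, which shows that the writer only has to record offsets as it goes while the reader has to locate them again from the end of the file.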