mirror of
https://github.com/michaelrsweet/pdfio.git
synced 2024-12-26 13:28:22 +01:00
Clean up updated docos (Issue #78)
parent
21b8e3b06f
commit
21ac2b52d1
123
doc/pdfio.md
@ -118,17 +118,20 @@ that are defined in a separate header file:
|
|||||||
```c
|
```c
|
||||||
#include <pdfio-content.h>
|
#include <pdfio-content.h>
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
Understanding PDF Files
|
Understanding PDF Files
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
A PDF file provides data and commands for displaying pages of graphics and text,
|
A PDF file provides data and commands for displaying pages of graphics and text,
|
||||||
and is structured in a way that allows it to be displayed in the same way across
|
and is structured in a way that allows it to be displayed in the same way across
|
||||||
multiple devices and platforms.
|
multiple devices and platforms. The following is a PDF which shows "Hello,
|
||||||
The following is a PDF which shows "Hello, World!" on one page:
|
World!" on one page:
|
||||||
|
|
||||||
```
|
```
|
||||||
%PDF-1.0 %Header starts here
|
%PDF-1.0 % Header starts here
|
||||||
%âãÏÓ
|
%âãÏÓ
|
||||||
1 0 obj %Body starts here
|
1 0 obj % Body starts here
|
||||||
<<
|
<<
|
||||||
/Kids [2 0 R]
|
/Kids [2 0 R]
|
||||||
/Count 1
|
/Count 1
|
||||||
@ -175,7 +178,7 @@ endobj
|
|||||||
/Type /Catalog
|
/Type /Catalog
|
||||||
>>
|
>>
|
||||||
endobj
|
endobj
|
||||||
xref %Cross-reference table starts here
|
xref % Cross-reference table starts here
|
||||||
0 6
|
0 6
|
||||||
0000000000 65535 f
|
0000000000 65535 f
|
||||||
0000000015 00000 n
|
0000000015 00000 n
|
||||||
@ -183,7 +186,7 @@ xref %Cross-reference table starts here
|
|||||||
0000000192 00000 n
|
0000000192 00000 n
|
||||||
0000000291 00000 n
|
0000000291 00000 n
|
||||||
0000000409 00000 n
|
0000000409 00000 n
|
||||||
trailer %Trailer starts here
|
trailer % Trailer starts here
|
||||||
<<
|
<<
|
||||||
/Root 5 0 R
|
/Root 5 0 R
|
||||||
/Size 6
|
/Size 6
|
||||||
@ -193,27 +196,38 @@ startxref
|
|||||||
%%EOF
|
%%EOF
|
||||||
```
|
```
|
||||||
|
|
||||||
### Header
|
|
||||||
This is the first line of a PDF File. This specifies the version of PDF Format used.
|
|
||||||
For Example: '%PDF-1.0'
|
|
||||||
|
|
||||||
Since PDF files almost always contain binary data, they can become corrupted if line
|
### Header
|
||||||
endings are changed (for example, if the file is transferred over FTP in text mode).
|
|
||||||
To allow legacy file transfer programs to determine that the file is binary, it is
|
The header is the first line of a PDF file that specifies the version of the PDF
|
||||||
usual to include some bytes withcharacter codes higher than 127 in the header.
|
format that has been used, for example `%PDF-1.0`.
|
||||||
- For example: %âãÏÓ
|
|
||||||
- The percent sign indicates another header line, the other few bytes are arbitrary
|
Since PDF files almost always contain binary data, they can become corrupted if
|
||||||
character codes in excess of 127. So, the whole header in our example is:
|
line endings are changed. For example, if the file is transferred using FTP in
|
||||||
|
text mode or is edited in Notepad on Windows. To allow legacy file transfer
|
||||||
|
programs to determine that the file is binary, the PDF standard recommends
|
||||||
|
including some bytes with character codes higher than 127 in the header, for
|
||||||
|
example:
|
||||||
|
|
||||||
|
```
|
||||||
|
%âãÏÓ
|
||||||
|
```
|
||||||
|
|
||||||
|
The percent sign indicates a comment line while the other few bytes are
|
||||||
|
arbitrary character codes in excess of 127. So, the whole header in our example
|
||||||
|
is:
|
||||||
|
|
||||||
```
|
```
|
||||||
%PDF-1.0
|
%PDF-1.0
|
||||||
%âãÏÓ
|
%âãÏÓ
|
||||||
```
|
```
|
||||||
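The header rules above can be checked with a few lines of C. The following is only a minimal sketch, not code taken from PDFio: the function names `example_parse_header` and `example_has_binary_comment` are hypothetical, and the sketch assumes a single-digit major and minor version as in `%PDF-1.0`.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: return 1 and store the version digits if `buf` starts
// with a "%PDF-M.N" header, 0 otherwise.  Assumes single-digit M and N.
int
example_parse_header(const char *buf, size_t len, int *major, int *minor)
{
  if (len < 8 || memcmp(buf, "%PDF-", 5) != 0)
    return (0);

  if (buf[5] < '0' || buf[5] > '9' || buf[6] != '.' || buf[7] < '0' || buf[7] > '9')
    return (0);

  *major = buf[5] - '0';
  *minor = buf[7] - '0';

  return (1);
}

// Hypothetical helper: return 1 if the second line is a comment ("%...") that
// contains at least one byte with a character code above 127, 0 otherwise.
int
example_has_binary_comment(const char *buf, size_t len)
{
  size_t i = 0;

  // Skip the first line and its line ending...
  while (i < len && buf[i] != '\n' && buf[i] != '\r')
    i ++;
  while (i < len && (buf[i] == '\n' || buf[i] == '\r'))
    i ++;

  // The second line must be a comment...
  if (i >= len || buf[i] != '%')
    return (0);

  // Look for a byte above 127 before the end of the line...
  for (i ++; i < len && buf[i] != '\n' && buf[i] != '\r'; i ++)
  {
    if ((unsigned char)buf[i] > 127)
      return (1);
  }

  return (0);
}
```

Both helpers take an explicit length because the start of a PDF file is binary data and is not guaranteed to be nul-terminated.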
|
|
||||||
|
|
||||||
### Body
|
### Body
|
||||||
The file body consists of a sequence of objects, each preceded by an object number,
|
|
||||||
generation number, and the obj keyword on one line, and followed by the endobj keyword
|
The file body consists of a sequence of objects, each preceded by an object
|
||||||
on another. For Example:
|
number, generation number, and the obj keyword on one line, and followed by the
|
||||||
|
endobj keyword on another. For example:
|
||||||
|
|
||||||
```
|
```
|
||||||
1 0 obj
|
1 0 obj
|
||||||
@ -225,51 +239,60 @@ on another. For Example:
|
|||||||
endobj
|
endobj
|
||||||
```
|
```
|
||||||
|
|
||||||
Here, the object number is 1, and the generation number is 0 (it almost always is).
|
In this example, the object number is 1 and the generation number is 0, meaning
|
||||||
The content for object 1 is in between the two lines 1 0 obj and endobj.
|
it is the first version of the object. The content for object 1 is between the
|
||||||
In this case, it’s the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>
|
initial `1 0 obj` and trailing `endobj` lines. In this case, the content is the
|
||||||
|
dictionary `<</Kids [2 0 R] /Count 1 /Type /Pages>>`.
|
||||||
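To make the "object number, generation number, `obj` keyword" line concrete, here is a small sketch that reads those two numbers from a line such as `1 0 obj`. The function name `example_parse_obj_line` is hypothetical and the parsing is deliberately lenient about whitespace; it is an illustration, not the parser PDFio uses.

```c
#include <stdio.h>

// Hypothetical helper: return 1 and store the object and generation numbers
// if `line` begins with "N G obj", 0 otherwise.
int
example_parse_obj_line(const char *line, unsigned *number, unsigned *generation)
{
  int consumed = 0;			// Set by %n only if "obj" was matched

  if (sscanf(line, "%u %u obj%n", number, generation, &consumed) != 2)
    return (0);

  return (consumed > 0);
}
```

The `%n` conversion is only reached when the literal `obj` matches, which is how the sketch tells `1 0 obj` apart from an indirect reference such as `2 0 R`.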
|
|
||||||
|
|
||||||
### Cross-Reference Table
|
### Cross-Reference Table
|
||||||
|
|
||||||
The cross-reference table lists the byte offset of each object in the file body.
|
The cross-reference table lists the byte offset of each object in the file body.
|
||||||
This allows random access to objects, meaning they don't have to be read in order.
|
This allows random access to objects, meaning they don't have to be read in
|
||||||
Objects that are not used are never read, making the process efficient.
|
order. Objects that are not used are never read, making the process efficient.
|
||||||
Operations like counting the number of pages in a PDF document are fast, even in large files.
|
Operations like counting the number of pages in a PDF document are fast, even in
|
||||||
Each object has an object number and a generation number.
|
large files.
|
||||||
- Generation numbers are used when a cross-reference table entry is reused.
|
|
||||||
- For simplicity, we will assume generation numbers to be always zero and ignore them.
|
Each object has an object number and a generation number. Generation numbers
|
||||||
The cross-reference table consists of:
|
are used when a cross-reference table entry is reused. For simplicity, we will
|
||||||
- Header line that indicates the number of entries.
|
assume generation numbers to be always zero and ignore them. The
|
||||||
- Special entry (the first entry).
|
cross-reference table consists of a header line that indicates the number of
|
||||||
- One line for each of the object in the file body.
|
entries, a free entry line for object 0, and a line for each of the objects in
|
||||||
|
the file body. For example:
|
||||||
|
|
||||||
```
|
```
|
||||||
0 6 %Six entries in table, starting at 0
|
0 6 % Six entries in table, starting at 0
|
||||||
0000000000 65535 f %Special entry
|
0000000000 65535 f % Free entry for object 0
|
||||||
0000000015 00000 n %Object 1 is at byte offset 15
|
0000000015 00000 n % Object 1 is at byte offset 15
|
||||||
0000000074 00000 n %Object 2 is at byte offset 74
|
0000000074 00000 n % Object 2 is at byte offset 74
|
||||||
0000000192 00000 n %etc...
|
0000000192 00000 n % etc...
|
||||||
0000000291 00000 n
|
0000000291 00000 n
|
||||||
0000000409 00000 n %Object 5 is at byte offset 409
|
0000000409 00000 n % Object 5 is at byte offset 409
|
||||||
```
|
```
|
||||||
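Each line of the table shown above has a fixed layout: a 10-digit byte offset, a 5-digit generation number, and either `n` (in use) or `f` (free). A minimal sketch of reading one entry follows; the type `example_xref_t` and the function `example_parse_xref_entry` are hypothetical names used only for illustration.

```c
#include <stdio.h>

typedef struct example_xref_s		// Hypothetical cross-reference entry
{
  unsigned long	offset;			// Byte offset of the object in the file
  unsigned	generation;		// Generation number
  char		type;			// 'n' = in use, 'f' = free
} example_xref_t;

// Hypothetical helper: return 1 if `line` is a valid entry, 0 otherwise.
int
example_parse_xref_entry(const char *line, example_xref_t *entry)
{
  // %lu and %u read decimal values, so the leading zeros are harmless...
  if (sscanf(line, "%10lu %5u %c", &entry->offset, &entry->generation, &entry->type) != 3)
    return (0);

  return (entry->type == 'n' || entry->type == 'f');
}
```

With the offset in hand, a reader can seek directly to an object instead of scanning the whole file, which is the random access described above.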
|
|
||||||
|
|
||||||
### Trailer
|
### Trailer
|
||||||
The first line of the trailer is just the trailer keyword. This is followed by the trailer dictionary,
|
|
||||||
which contains at least the /Size entry (Number of entries in the cross-reference table)
|
The first line of the trailer is just the `trailer` keyword. This is followed
|
||||||
and the /Root entry (Object number of the document catalog, which is the root element
|
by the trailer dictionary which contains at least the `/Size` entry specifying
|
||||||
of the graph of objects in the body).
|
the number of entries in the cross-reference table and the `/Root` entry which
|
||||||
There follows a line with just the startxref keyword, a line with a single number (the byte offset of
|
references the object for the document catalog which is the root element of the
|
||||||
the start of the cross-reference table within the file), and then the line %%EOF, which signals the
|
graph of objects in the body.
|
||||||
end of the PDF file.
|
|
||||||
|
There follows a line with just the `startxref` keyword, a line with a single
|
||||||
|
number specifying the byte offset of the start of the cross-reference table
|
||||||
|
within the file, and then the line `%%EOF` which signals the end of the PDF
|
||||||
|
file.
|
||||||
|
|
||||||
```
|
```
|
||||||
trailer %Trailer keyword
|
trailer % Trailer keyword
|
||||||
<< %The trailer dictionary
|
<< % The trailer dictionary
|
||||||
/Root 5 0 R
|
/Root 5 0 R
|
||||||
/Size 6
|
/Size 6
|
||||||
>>
|
>>
|
||||||
startxref %startxref keyword
|
startxref % startxref keyword
|
||||||
459 %Byte offset of cross-reference table
|
459 % Byte offset of cross-reference table
|
||||||
%%EOF %End-of-file marker
|
%%EOF % End-of-file marker
|
||||||
```
|
```
|
||||||
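Because the `startxref` keyword sits at the very end of the file, a reader typically loads the last part of the file and searches backwards for it. The sketch below assumes the tail of the file is already in memory; the function name `example_find_startxref` is hypothetical and the code is an illustration rather than PDFio's implementation.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: search `buf` backwards for "startxref" and store the
// number that follows it.  Returns 1 on success, 0 otherwise.
int
example_find_startxref(const char *buf, size_t len, unsigned long *offset)
{
  static const char keyword[] = "startxref";
  size_t	klen = sizeof(keyword) - 1;
  size_t	i, j;

  if (len < klen)
    return (0);

  // Start at the last position where the keyword could fit...
  for (i = len - klen + 1; i > 0; i --)
  {
    if (memcmp(buf + i - 1, keyword, klen) == 0)
    {
      unsigned long value = 0;
      int	    digits = 0;

      // Skip whitespace after the keyword, then read the decimal offset...
      j = i - 1 + klen;
      while (j < len && (buf[j] == ' ' || buf[j] == '\r' || buf[j] == '\n'))
        j ++;

      while (j < len && buf[j] >= '0' && buf[j] <= '9')
      {
        value = value * 10 + (unsigned long)(buf[j] - '0');
        digits ++;
        j ++;
      }

      if (!digits)
        return (0);

      *offset = value;
      return (1);
    }
  }

  return (0);
}
```

For the example trailer above, the stored offset would be 459, the position of the `xref` keyword, from which the cross-reference table can be read.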
|
|
||||||
|
|
||||||
|