mirror of
https://github.com/michaelrsweet/pdfio.git
synced 2024-12-26 05:18:21 +01:00
Update pdfio.md
parent
2cadfd8a1e
commit
853fa4fe8f
48
doc/pdfio.md
@ -146,13 +146,16 @@ Since PDF files almost always contain binary data, they can become corrupted if
- For example: %âãÏÓ
- The percent sign indicates another header line; the remaining bytes are arbitrary character codes greater than 127. So, the whole header in our example is:

```
%PDF-1.0
%âãÏÓ
```

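As a rough illustration of the header rules above, the following Python sketch checks that a byte string starts with a `%PDF-` version line and that the second line is a comment containing bytes above 127. The function name `check_header` is hypothetical and is not part of PDFio; the sketch assumes the header occupies the first two lines.

```python
import re


def check_header(data: bytes):
    """Return (version, has_binary_comment) for the start of a PDF file.

    Hypothetical helper for illustration only; not part of PDFio.
    """
    lines = data.splitlines()
    if not lines:
        raise ValueError("empty file")
    # First header line: "%PDF-" followed by a version such as 1.0.
    match = re.fullmatch(rb"%PDF-(\d+\.\d+)", lines[0].strip())
    if match is None:
        raise ValueError("missing %PDF-x.y header line")
    version = match.group(1).decode("ascii")
    # Second header line: a percent sign followed by bytes above 127.
    second = lines[1] if len(lines) > 1 else b""
    has_binary = second.startswith(b"%") and any(b > 127 for b in second[1:])
    return version, has_binary


# The example header from the text, with the second line as raw bytes.
header = b"%PDF-1.0\n%\xe2\xe3\xcf\xd3\n"
print(check_header(header))  # prints ('1.0', True)
```
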
### Body

The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another.
- For example:

```
1 0 obj
<<
/Kids [2 0 R]
/Count 1
/Type /Pages
>>
endobj
```

Here, the object number is 1, and the generation number is 0 (it almost always is). The content for object 1 is in between the two lines 1 0 obj and endobj.
In this case, it’s the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>

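The object framing described above can be sketched in Python as follows. This is a minimal illustration, assuming the object content never contains the word `endobj` itself (a real parser must handle streams and nested syntax); `OBJECT_RE` and `read_objects` are hypothetical names, not PDFio functions.

```python
import re

# Matches "<object number> <generation number> obj", then lazily captures
# everything up to the next "endobj" keyword.
OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)


def read_objects(body: bytes):
    """Yield (object_number, generation_number, content) for each object."""
    for m in OBJECT_RE.finditer(body):
        yield int(m.group(1)), int(m.group(2)), m.group(3).strip()


# The example object from the text.
body = b"1 0 obj\n<<\n/Kids [2 0 R]\n/Count 1\n/Type /Pages\n>>\nendobj\n"
for number, generation, content in read_objects(body):
    print(number, generation, content)
```

The reference `2 0 R` inside the dictionary is not mistaken for an object header because the pattern requires the `obj` keyword after the two numbers.
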
### Cross-Reference Table

The cross-reference table lists the byte offset of each object in the file body.

@ -170,35 +175,36 @@ Objects that are not used are never read, making the process efficient.

Operations like counting the number of pages in a PDF document are fast, even in large files.
Each object has an object number and a generation number.
- Generation numbers are used when a cross-reference table entry is reused.
- For simplicity, we assume that generation numbers are always zero and ignore them.

The cross-reference table consists of:
- A header line that indicates the number of entries.
- A special entry (the first entry).
- One line for each object in the file body.

```
0 6                   Six entries in table, starting at 0
0000000000 65535 f    Special entry
0000000015 00000 n    Object 1 is at byte offset 15
0000000074 00000 n    Object 2 is at byte offset 74
0000000192 00000 n    etc...
0000000291 00000 n
0000000409 00000 n    Object 5 is at byte offset 409
```

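To make the table layout concrete, here is a small Python sketch that turns the entries shown above (without the annotations) into a mapping from object number to byte offset. The name `parse_xref` is hypothetical, and the sketch assumes a single well-formed table whose first line holds the starting object number and the entry count.

```python
def parse_xref(table: str):
    """Parse a cross-reference table into {object_number: byte_offset}.

    Hypothetical helper for illustration; free ("f") entries are skipped.
    """
    lines = table.strip().splitlines()
    # Header line: first object number and number of entries.
    first, count = (int(x) for x in lines[0].split())
    offsets = {}
    for index, line in enumerate(lines[1:1 + count]):
        offset, generation, kind = line.split()
        if kind == "n":  # "n" marks an entry that is in use
            offsets[first + index] = int(offset)
    return offsets


table = """\
0 6
0000000000 65535 f
0000000015 00000 n
0000000074 00000 n
0000000192 00000 n
0000000291 00000 n
0000000409 00000 n
"""
offsets = parse_xref(table)
print(offsets[1], offsets[5])  # prints 15 409
```

With such a mapping, a reader can seek directly to the offset of an object instead of scanning the whole file body.
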
### Trailer

- The first line of the trailer is just the trailer keyword. It is followed by the trailer dictionary, which contains at least the /Size entry (the number of entries in the cross-reference table) and the /Root entry (the object number of the document catalog, which is the root element of the graph of objects in the body).
- Next come a line with just the startxref keyword, a line with a single number (the byte offset of the start of the cross-reference table within the file), and the line %%EOF, which signals the end of the PDF file.

```
trailer       Trailer keyword
<<            The trailer dictionary
/Root 5 0 R
/Size 6
>>
startxref     startxref keyword
459           Byte offset of cross-reference table
%%EOF         End-of-file marker
```

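The trailer fields described above can be pulled out with a few regular expressions. The following Python sketch is only an illustration under the assumption that the trailer is well formed and contains each field exactly once; `parse_trailer` is a hypothetical name, not a PDFio function.

```python
import re


def parse_trailer(tail: bytes):
    """Return (size, root_object_number, xref_offset) from the end of a file.

    Hypothetical helper; raises AttributeError if a field is missing,
    because re.search() then returns None.
    """
    # /Size gives the number of entries in the cross-reference table.
    size = int(re.search(rb"/Size\s+(\d+)", tail).group(1))
    # /Root is an indirect reference "<number> <generation> R".
    root = int(re.search(rb"/Root\s+(\d+)\s+\d+\s+R", tail).group(1))
    # The number between startxref and %%EOF is the table's byte offset.
    xref_offset = int(
        re.search(rb"startxref\s+(\d+)\s+%%EOF", tail).group(1)
    )
    return size, root, xref_offset


# The example trailer from the text, without the annotations.
tail = b"trailer\n<<\n/Root 5 0 R\n/Size 6\n>>\nstartxref\n459\n%%EOF\n"
print(parse_trailer(tail))  # prints (6, 5, 459)
```
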
Reading PDF Files
-----------------