mirror of
https://github.com/michaelrsweet/pdfio.git
synced 2024-12-26 13:28:22 +01:00
Update pdfio.md
parent
2cadfd8a1e
commit
853fa4fe8f
48
doc/pdfio.md
@ -146,13 +146,16 @@ Since PDF files almost always contain binary data, they can become corrupted if
|
|||||||
- For example: %âãÏÓ
|
- For example: %âãÏÓ
|
||||||
- The percent sign marks another header line; the remaining bytes are arbitrary character codes above 127. So, the whole header in our example is:
|
- The percent sign marks another header line; the remaining bytes are arbitrary character codes above 127. So, the whole header in our example is:
|
||||||
|
|
||||||
|
```
|
||||||
%PDF-1.0
|
%PDF-1.0
|
||||||
%âãÏÓ
|
%âãÏÓ
|
||||||
|
```
|
||||||
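The two header lines described above can be checked with a few lines of Python. This is a minimal sketch, not part of PDFio: the function name `parse_pdf_header` is hypothetical, and it assumes the header occupies the first two lines of the file.

```python
def parse_pdf_header(data: bytes):
    """Return (version, has_binary_marker) for the first two lines of a PDF.

    Hypothetical helper for illustration; not a PDFio function.
    """
    lines = data.splitlines()
    if not lines or not lines[0].startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header line")
    # Everything after "%PDF-" on the first line is the version, e.g. "1.0".
    version = lines[0][len(b"%PDF-"):].decode("ascii").strip()
    # The second line is a comment (starts with %) holding bytes above 127.
    second = lines[1] if len(lines) > 1 else b""
    has_binary_marker = second.startswith(b"%") and any(b > 127 for b in second[1:])
    return version, has_binary_marker


# The example header from the text, with the second line as Latin-1 bytes.
example = b"%PDF-1.0\n%\xe2\xe3\xcf\xd3\n"
print(parse_pdf_header(example))
```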
|
|
||||||
### Body
|
### Body
|
||||||
The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another.
|
The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another.
|
||||||
- For example:
|
- For example:
|
||||||
'''
|
|
||||||
|
```
|
||||||
1 0 obj
|
1 0 obj
|
||||||
<<
|
<<
|
||||||
/Kids [2 0 R]
|
/Kids [2 0 R]
|
||||||
@ -160,8 +163,10 @@ The file body consists of a sequence of objects, each preceded by an object numb
|
|||||||
/Type /Pages
|
/Type /Pages
|
||||||
>>
|
>>
|
||||||
endobj
|
endobj
|
||||||
'''
|
```
|
||||||
- Here, the object number is 1, and the generation number is 0 (it almost always is). The content for object 1 is in between the two lines 1 0 obj and endobj. In this case, it’s the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>
|
|
||||||
|
Here, the object number is 1, and the generation number is 0 (it almost always is). The content for object 1 is in between the two lines 1 0 obj and endobj.
|
||||||
|
In this case, it’s the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>
|
||||||
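To make the `obj` ... `endobj` framing concrete, here is a rough Python sketch that pulls the object number, generation number, and content out of a byte string. The names `OBJ_HEADER` and `find_objects` are hypothetical, and the scan is deliberately naive: it assumes the keywords never occur inside strings or stream data, which a real parser cannot assume.

```python
import re

# "<object number> <generation number> obj", as described in the text.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def find_objects(data: bytes):
    """Yield (object_number, generation, content) for each obj/endobj pair.

    Naive illustration only: ignores streams and strings that might
    contain the keywords themselves.
    """
    for match in OBJ_HEADER.finditer(data):
        end = data.find(b"endobj", match.end())
        if end == -1:
            break  # unterminated object; stop scanning
        content = data[match.end():end].strip()
        yield int(match.group(1)), int(match.group(2)), content


sample = b"1 0 obj\n<<\n/Kids [2 0 R]\n/Count 1\n/Type /Pages\n>>\nendobj\n"
for number, generation, content in find_objects(sample):
    print(number, generation, content)
```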
|
|
||||||
### Cross-Reference Table
|
### Cross-Reference Table
|
||||||
The cross-reference table lists the byte offset of each object in the file body.
|
The cross-reference table lists the byte offset of each object in the file body.
|
||||||
@ -170,35 +175,36 @@ Objects that are not used are never read, making the process efficient.
|
|||||||
Operations like counting the number of pages in a PDF document are fast, even in large files.
|
Operations like counting the number of pages in a PDF document are fast, even in large files.
|
||||||
Each object has an object number and a generation number.
|
Each object has an object number and a generation number.
|
||||||
- Generation numbers are used when a cross-reference table entry is reused.
|
- Generation numbers are used when a cross-reference table entry is reused.
|
||||||
- For simplicity, we will assume generation numbers are always zero and ignore them.
|
- For simplicity, we will assume generation numbers are always zero and ignore them.
|
||||||
- The cross-reference table consists of:
|
The cross-reference table consists of:
|
||||||
- Header line that indicates the number of entries.
|
- Header line that indicates the number of entries.
|
||||||
- Special entry (the first entry).
|
- Special entry (the first entry).
|
||||||
- One line for each object in the file body.
|
- One line for each object in the file body.
|
||||||
|
|
||||||
'''
|
```
|
||||||
**0 6 Six entries in table, starting at 0**
|
0 6 **Six entries in table, starting at 0**
|
||||||
0000000000 65535 **f Special entry**
|
0000000000 65535 **f Special entry**
|
||||||
0000000015 00000 **n Object 1 is at byte offset 15**
|
0000000015 00000 **n Object 1 is at byte offset 15**
|
||||||
0000000074 00000 **n Object 2 is at byte offset 74**
|
0000000074 00000 **n Object 2 is at byte offset 74**
|
||||||
0000000192 00000 **n etc...**
|
0000000192 00000 **n etc...**
|
||||||
0000000291 00000 **n**
|
0000000291 00000 **n**
|
||||||
0000000409 00000 **n Object 5 is at byte offset 409**
|
0000000409 00000 **n Object 5 is at byte offset 409**
|
||||||
'''
|
```
|
||||||
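The table layout above maps naturally onto a small lookup structure. The following Python sketch parses the header line and the entries from the example (with the bold annotations removed) into a dictionary keyed by object number. The function name `parse_xref_section` is hypothetical, and the sketch assumes each entry is exactly three whitespace-separated fields: offset, generation number, and `n` or `f`.

```python
def parse_xref_section(lines):
    """Map object number -> (byte_offset, generation, in_use).

    `lines[0]` is the header line ("first_object count"); the next
    `count` lines are the entries. Hypothetical helper for illustration.
    """
    first, count = (int(field) for field in lines[0].split())
    table = {}
    for index, line in enumerate(lines[1:1 + count]):
        offset, generation, kind = line.split()
        # "n" marks an object in use; "f" marks a free (special) entry.
        table[first + index] = (int(offset), int(generation), kind == "n")
    return table


xref_lines = [
    "0 6",
    "0000000000 65535 f",
    "0000000015 00000 n",
    "0000000074 00000 n",
    "0000000192 00000 n",
    "0000000291 00000 n",
    "0000000409 00000 n",
]
table = parse_xref_section(xref_lines)
print(table[1])  # offset, generation, and in-use flag for object 1
```

With such a table in hand, a reader can seek straight to the offset of the one object it needs instead of scanning the whole body.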
|
|
||||||
### Trailer
|
### Trailer
|
||||||
- The first line of the trailer is just the trailer keyword. This is followed by the trailer dictionary, which contains at least the /Size entry (which gives the number of entries in the cross-reference table) and the /Root entry (which gives the object number of the document catalog, which is the root element of the graph of objects in the body).
|
- The first line of the trailer is just the trailer keyword. This is followed by the trailer dictionary, which contains at least the /Size entry (which gives the number of entries in the cross-reference table) and the /Root entry (which gives the object number of the document catalog, which is the root element of the graph of objects in the body).
|
||||||
- There follows a line with just the startxref keyword, a line with a single number (the byte offset of the start of the cross-reference table within the file), and then the line %%EOF, which signals the end of the PDF file.
|
- There follows a line with just the startxref keyword, a line with a single number (the byte offset of the start of the cross-reference table within the file), and then the line %%EOF, which signals the end of the PDF file.
|
||||||
'''
|
|
||||||
trailer **Trailer keyword**
|
```
|
||||||
<< **The trailer dictionary**
|
trailer **Trailer keyword**
|
||||||
|
<< **The trailer dictionary**
|
||||||
/Root 5 0 R
|
/Root 5 0 R
|
||||||
/Size 6
|
/Size 6
|
||||||
>>
|
>>
|
||||||
startxref **startxref keyword**
|
startxref **startxref keyword**
|
||||||
459 **Byte offset of cross-reference table**
|
459 **Byte offset of cross-reference table**
|
||||||
%%EOF **End-of-file marker**
|
%%EOF **End-of-file marker**
|
||||||
'''
|
```
|
||||||
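Because the trailer sits at the very end of the file, a reader usually starts there. The Python sketch below extracts `/Size`, the object number from `/Root`, and the `startxref` offset from the example trailer. The name `read_trailer` is hypothetical, and the sketch assumes the whole trailer fits in the last 1024 bytes and that the dictionary is as simple as the one shown; regular expressions are a shortcut here, not a real dictionary parser.

```python
import re


def read_trailer(data: bytes):
    """Return (size, root_object_number, xref_offset) from the file's tail.

    Hypothetical helper; assumes the trailer lies in the last 1024 bytes.
    """
    tail = data[-1024:]
    start = tail.rfind(b"trailer")
    if start == -1:
        raise ValueError("no trailer keyword found")
    trailer = tail[start:]
    size = re.search(rb"/Size\s+(\d+)", trailer)
    root = re.search(rb"/Root\s+(\d+)\s+\d+\s+R", trailer)
    xref = re.search(rb"startxref\s+(\d+)\s+%%EOF", trailer)
    if not (size and root and xref):
        raise ValueError("incomplete trailer")
    return int(size.group(1)), int(root.group(1)), int(xref.group(1))


example = b"trailer\n<<\n/Root 5 0 R\n/Size 6\n>>\nstartxref\n459\n%%EOF\n"
print(read_trailer(example))
```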
|
|
||||||
Reading PDF Files
|
Reading PDF Files
|
||||||
-----------------
|
-----------------
|
||||||
|