Update pdfio.md

ThePhatak 2024-10-14 13:14:59 +05:30 committed by GitHub
parent 2cadfd8a1e
commit 853fa4fe8f

@ -146,13 +146,16 @@ Since PDF files almost always contain binary data, they can become corrupted if
- For example: %âãÏÓ
- The percent sign marks this as another header line; the remaining bytes are arbitrary character codes above 127. So the whole header in our example is:
```
%PDF-1.0
%âãÏÓ
```
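As a rough illustration of the two header lines described above, here is a minimal Python sketch. The function name `parse_pdf_header` is hypothetical, and the check assumes the second line consists of a percent sign followed only by bytes above 127, as in the example.

```python
import re

def parse_pdf_header(data: bytes):
    """Return (version, has_binary_marker) for the first two header lines."""
    lines = data.splitlines()
    match = re.match(rb"%PDF-(\d+)\.(\d+)", lines[0]) if lines else None
    if match is None:
        raise ValueError("not a PDF header")
    version = (int(match.group(1)), int(match.group(2)))

    # Second line: a percent sign followed by bytes that are all above 127.
    second = lines[1] if len(lines) > 1 else b""
    has_binary_marker = (
        second.startswith(b"%")
        and len(second) > 1
        and all(byte > 127 for byte in second[1:])
    )
    return version, has_binary_marker

sample = b"%PDF-1.0\n%\xe2\xe3\xcf\xd3\n"
print(parse_pdf_header(sample))  # ((1, 0), True)
```

The sample bytes `\xe2\xe3\xcf\xd3` are the characters âãÏÓ from the example, written in a single-byte encoding.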
### Body
The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another.
- For example:
```
1 0 obj
<<
/Kids [2 0 R]
@ -160,8 +163,10 @@ The file body consists of a sequence of objects, each preceded by an object numb
/Type /Pages
>>
endobj
```
Here, the object number is 1 and the generation number is 0 (it almost always is). The content of object 1 lies between the two lines 1 0 obj and endobj.
In this case, it is the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>.
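The object framing described above can be sketched with a regular expression. This is a simplified illustration, not a real PDF parser: the name `find_objects` is hypothetical, and the pattern would be fooled by binary stream data that happens to contain the word endobj.

```python
import re

# "<object number> <generation number> obj ... endobj"
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

def find_objects(body: bytes):
    """Map (object number, generation number) to the raw bytes between obj and endobj."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in OBJ_RE.finditer(body)
    }

body = b"1 0 obj\n<<\n/Kids [2 0 R]\n/Count 1\n/Type /Pages\n>>\nendobj\n"
objects = find_objects(body)
print(objects[(1, 0)])  # the dictionary bytes for object 1, generation 0
```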
### Cross-Reference Table
The cross-reference table lists the byte offset of each object in the file body.
@ -170,35 +175,36 @@ Objects that are not used are never read, making the process efficient.
Operations like counting the number of pages in a PDF document are fast, even in large files.
Each object has an object number and a generation number.
- Generation numbers are used when a cross-reference table entry is reused.
- For simplicity, we will assume that generation numbers are always zero and ignore them.
The cross-reference table consists of:
- A header line that gives the first object number and the number of entries.
- A special entry (the first entry).
- One line for each object in the file body.
```
0 6                  Six entries in table, starting at 0
0000000000 65535 f   Special entry
0000000015 00000 n   Object 1 is at byte offset 15
0000000074 00000 n   Object 2 is at byte offset 74
0000000192 00000 n   etc...
0000000291 00000 n
0000000409 00000 n   Object 5 is at byte offset 409
```
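To make the table layout concrete, the following Python sketch parses the example above (without the annotations) into a lookup from object number to byte offset. The function name `parse_xref_section` and the tuple layout are assumptions made for illustration.

```python
def parse_xref_section(lines):
    """Parse a header line plus entry lines into {object number: (offset, generation, in_use)}."""
    first, count = (int(x) for x in lines[0].split())
    entries = {}
    for number, line in enumerate(lines[1:1 + count], start=first):
        offset, generation, kind = line.split()[:3]
        # "n" marks an object in use; "f" marks a free entry such as the special first one.
        entries[number] = (int(offset), int(generation), kind == "n")
    return entries

table = """0 6
0000000000 65535 f
0000000015 00000 n
0000000074 00000 n
0000000192 00000 n
0000000291 00000 n
0000000409 00000 n""".splitlines()

entries = parse_xref_section(table)
in_use = {n: off for n, (off, gen, used) in entries.items() if used}
print(in_use)  # {1: 15, 2: 74, 3: 192, 4: 291, 5: 409}
```

With such a lookup, a reader can seek straight to the byte offset of the object it needs instead of scanning the whole body.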
### Trailer
- The first line of the trailer is just the trailer keyword. It is followed by the trailer dictionary, which contains at least the /Size entry (the number of entries in the cross-reference table) and the /Root entry (the object number of the document catalog, the root element of the graph of objects in the body).
- Next comes a line with just the startxref keyword, a line with a single number (the byte offset of the start of the cross-reference table within the file), and finally the line %%EOF, which signals the end of the PDF file.
```
trailer       Trailer keyword
<<            The trailer dictionary
/Root 5 0 R
/Size 6
>>
startxref     startxref keyword
459           Byte offset of cross-reference table
%%EOF         End-of-file marker
```
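Because the trailer sits at the end of the file, a reader can start there. The sketch below pulls the three values described above out of the last bytes of a file. The name `read_trailer` and the 1024-byte window are assumptions, and the code only handles a single plain trailer like the one in the example.

```python
import re

def read_trailer(data: bytes):
    """Return (xref_offset, root_object_number, size) from the end of a PDF."""
    tail = data[-1024:]  # assumed window; the trailer is near the end of the file
    start = re.search(rb"startxref\s+(\d+)\s+%%EOF", tail)
    root = re.search(rb"/Root\s+(\d+)\s+\d+\s+R", tail)
    size = re.search(rb"/Size\s+(\d+)", tail)
    if not (start and root and size):
        raise ValueError("trailer not found")
    return int(start.group(1)), int(root.group(1)), int(size.group(1))

sample = b"trailer\n<<\n/Root 5 0 R\n/Size 6\n>>\nstartxref\n459\n%%EOF\n"
print(read_trailer(sample))  # (459, 5, 6)
```

The first value is the byte offset at which to find the cross-reference table, and the second is the object number of the document catalog to look up in it.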
Reading PDF Files
-----------------