diff --git a/doc/pdfio.md b/doc/pdfio.md
index f948bf4..24d21c8 100644
--- a/doc/pdfio.md
+++ b/doc/pdfio.md
@@ -146,13 +146,16 @@ Since PDF files almost always contain binary data, they can become corrupted if
 - For example: %âãÏÓ
 - The percent sign indicates another header line, the other few bytes are arbitrary character codes in excess of 127. So, the whole header in our example is:
+```
 %PDF-1.0
 %âãÏÓ
+```
 
 ### Body
 The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another.
 - For Example:
-'''
+
+```
 1 0 obj
 <<
 /Kids [2 0 R]
@@ -160,8 +163,10 @@ The file body consists of a sequence of objects, each preceded by an object numb
 /Type /Pages
 >>
 endobj
-'''
-- Here, the object number is 1, and the generation number is 0 (it almost always is). The content for object 1 is in between the two lines 1 0 obj and endobj. In this case, it’s the dictionary <>
+```
+
+Here, the object number is 1, and the generation number is 0 (it almost always is). The content for object 1 is in between the two lines 1 0 obj and endobj.
+In this case, it’s the dictionary <>
 
 ### Cross-Reference Table
 The cross-reference table lists the byte offset of each object in the file body.
@@ -170,35 +175,36 @@ Objects that are not used are never read, making the process efficient.
 Operations like counting the number of pages in a PDF document are fast, even in large files.
 Each object has an object number and a generation number.
 - Generation numbers are used when a cross-reference table entry is reused.
- - For simplicity, we would assume generation numbers to be always zero and ignore them.
-- The cross-reference table consists of:
+ - For simplicity, we would assume generation numbers to be always zero and ignore them.
+The cross-reference table consists of:
 - Header line that indicates the number of entries.
 - Special entry (the first entry).
 - One line for each of the object in the file body.
-'''
-**0 6 Six entries in table, starting at 0**
-0000000000 65535 **f Special entry**
-0000000015 00000 **n Object 1 is at byte offset 15**
-0000000074 00000 **n Object 2 is at byte offset 74**
-0000000192 00000 **n etc...**
-0000000291 00000 **n**
-0000000409 00000 **n Object 5 is at byte offset 409**
-'''
+```
+0 6 **Six entries in table, starting at 0**
+0000000000 65535 **f Special entry**
+0000000015 00000 **n Object 1 is at byte offset 15**
+0000000074 00000 **n Object 2 is at byte offset 74**
+0000000192 00000 **n etc...**
+0000000291 00000 **n**
+0000000409 00000 **n Object 5 is at byte offset 409**
+```
 
 ### Trailer
 - The first line of the trailer is just the trailer keyword. This is followed by the trailer dictionary, which contains at least the /Size entry (which gives the number of entries in the cross-reference table) and the /Root entry (which gives the object number of the document catalog, which is the root element of the graph of objects in the body).
 - There follows a line with just the startxref keyword, a line with a single number (the byte offset of the start of the cross-reference table within the file), and then the line %%EOF, which signals the end of the PDF file.
-'''
-trailer **Trailer keyword**
-<< **The trailer dictinonary**
+
+```
+trailer **Trailer keyword**
+<< **The trailer dictinonary**
 /Root 5 0 R
 /Size 6
 >>
-startxref **startxref keyword**
-459 **Byte offset of cross-reference table**
-%%EOF **End-of-file marker**
-'''
+startxref **startxref keyword**
+459 **Byte offset of cross-reference table**
+%%EOF **End-of-file marker**
+```
 
 Reading PDF Files
 -----------------