Mirror of https://github.com/michaelrsweet/pdfio.git, synced 2025-10-31 10:26:22 +01:00

Clean up updated docos (Issue #78)
doc/pdfio.md

| @@ -118,17 +118,20 @@ that are defined in a separate header file: | ||||
| ```c | ||||
| #include <pdfio-content.h> | ||||
| ``` | ||||
|  | ||||
|  | ||||
| Understanding PDF Files | ||||
| ----------------------- | ||||
|  | ||||
| A PDF file provides data and commands for displaying pages of graphics and text,  | ||||
| and is structured in a way that allows it to be displayed in the same way across  | ||||
| multiple devices and platforms.   | ||||
| The following is a PDF which shows "Hello, World!" on one page: | ||||
| A PDF file provides data and commands for displaying pages of graphics and text, | ||||
| and is structured in a way that allows it to be displayed in the same way across | ||||
| multiple devices and platforms.  The following is a PDF which shows "Hello, | ||||
| World!" on one page: | ||||
|  | ||||
| ``` | ||||
| %PDF-1.0                              %Header starts here | ||||
| %PDF-1.0                        % Header starts here | ||||
| %âãÏÓ | ||||
| 1 0 obj                               %Body starts here | ||||
| 1 0 obj                         % Body starts here | ||||
| << | ||||
| /Kids [2 0 R] | ||||
| /Count 1 | ||||
| @@ -175,7 +178,7 @@ endobj | ||||
| /Type /Catalog | ||||
| >> | ||||
| endobj | ||||
| xref                               %Cross-reference table starts here | ||||
| xref                            % Cross-reference table starts here | ||||
| 0 6 | ||||
| 0000000000 65535 f | ||||
| 0000000015 00000 n | ||||
| @@ -183,7 +186,7 @@ xref                               %Cross-reference table starts here | ||||
| 0000000192 00000 n | ||||
| 0000000291 00000 n | ||||
| 0000000409 00000 n | ||||
| trailer                            %Trailer starts here | ||||
| trailer                         % Trailer starts here | ||||
| << | ||||
| /Root 5 0 R | ||||
| /Size 6 | ||||
| @@ -193,29 +196,40 @@ startxref | ||||
| %%EOF | ||||
| ``` | ||||
|  | ||||
| ### Header | ||||
| This is the first line of a PDF File. This specifies the version of PDF Format used.   | ||||
| For Example: '%PDF-1.0' | ||||
|  | ||||
| Since PDF files almost always contain binary data, they can become corrupted if line  | ||||
| endings are changed (for example, if the file is transferred over FTP in text mode).  | ||||
| To allow legacy file transfer programs to determine that the file is binary, it is  | ||||
| usual to include some bytes withcharacter codes higher than 127 in the header. | ||||
| - For example: %âãÏÓ | ||||
| - The percent sign indicates another header line, the other few bytes are arbitrary | ||||
| character codes in excess of 127. So, the whole header in our example is: | ||||
| ### Header | ||||
|  | ||||
| The header is the first line of a PDF file that specifies the version of the PDF | ||||
| format that has been used, for example `%PDF-1.0`. | ||||
|  | ||||
| Since PDF files almost always contain binary data, they can become corrupted if | ||||
| line endings are changed, for example if the file is transferred using FTP in | ||||
| text mode or is edited in Notepad on Windows.  To allow legacy file transfer | ||||
| programs to determine that the file is binary, the PDF standard recommends | ||||
| including some bytes with character codes higher than 127 in the header, for | ||||
| example: | ||||
|  | ||||
| ``` | ||||
| %PDF-1.0   | ||||
| %âãÏÓ | ||||
| ``` | ||||
|  | ||||
| ### Body | ||||
| The file body consists of a sequence of objects, each preceded by an object number,  | ||||
| generation number, and the obj keyword on one line, and followed by the endobj keyword  | ||||
| on another. For Example: | ||||
| The percent sign indicates a comment line while the other few bytes are | ||||
| arbitrary character codes in excess of 127.  So, the whole header in our example | ||||
| is: | ||||
|  | ||||
| ```   | ||||
| ``` | ||||
| %PDF-1.0 | ||||
| %âãÏÓ | ||||
| ``` | ||||
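The header rules above can be illustrated with a small sketch in C. The helper names `parse_pdf_header` and `count_high_bytes` are hypothetical and are not part of the PDFio API; the code only assumes the two header lines have already been read into NUL-terminated strings.

```c
#include <stdio.h>

// Hypothetical helper (not a PDFio function): parse a header line such as
// "%PDF-1.0" into major and minor version numbers.  Returns 1 on success
// and 0 if the line does not look like a PDF header.
int parse_pdf_header(const char *line, int *major, int *minor)
{
  // "%%" in the format matches the literal '%' that starts the header
  return (sscanf(line, "%%PDF-%d.%d", major, minor) == 2);
}

// Hypothetical helper: count the bytes above 127 in a comment line.
// Returns 0 if the line does not start with '%'.
int count_high_bytes(const char *line)
{
  int count = 0;

  if (*line != '%')
    return (0);

  for (line ++; *line; line ++)
  {
    if ((unsigned char)*line > 127)
      count ++;
  }

  return (count);
}
```

For the example header, `parse_pdf_header("%PDF-1.0", &major, &minor)` yields version 1.0, and the second line contains four bytes above 127.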
|  | ||||
|  | ||||
| ### Body | ||||
|  | ||||
| The file body consists of a sequence of objects, each preceded by an object | ||||
| number, generation number, and the `obj` keyword on one line, and followed by the | ||||
| `endobj` keyword on another.  For example: | ||||
|  | ||||
| ``` | ||||
| 1 0 obj | ||||
| << | ||||
| /Kids [2 0 R] | ||||
| @@ -225,51 +239,60 @@ on another. For Example: | ||||
| endobj | ||||
| ``` | ||||
|  | ||||
| Here, the object number is 1, and the generation number is 0 (it almost always is).  | ||||
| The content for object 1 is in between the two lines 1 0 obj and endobj.   | ||||
| In this case, it’s the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>> | ||||
| In this example, the object number is 1 and the generation number is 0, meaning | ||||
| it is the first version of the object.  The content for object 1 is between the | ||||
| initial `1 0 obj` and trailing `endobj` lines.  In this case, the content is the | ||||
| dictionary `<</Kids [2 0 R] /Count 1 /Type /Pages>>`. | ||||
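As a rough illustration of the object syntax described above, the following C sketch recognizes the first line of an object. The function `parse_obj_header` is a hypothetical name used only for this example and is not how PDFio itself reads objects.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not a PDFio function): parse an object header line
// such as "1 0 obj".  Returns 1 and fills in the object and generation
// numbers on success, or 0 if the line is not an object header.
int parse_obj_header(const char *line, unsigned long *number,
                     unsigned *generation)
{
  char keyword[8];                      // Room for the keyword plus a NUL

  if (sscanf(line, "%lu %u %7s", number, generation, keyword) != 3)
    return (0);

  // Only the exact "obj" keyword introduces an object
  return (strcmp(keyword, "obj") == 0);
}
```

With the example object, the line `1 0 obj` gives object number 1 and generation number 0, while lines such as `endobj` or the reference `2 0 R` are rejected.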
|  | ||||
|  | ||||
| ### Cross-Reference Table | ||||
|  | ||||
| The cross-reference table lists the byte offset of each object in the file body. | ||||
| This allows random access to objects, meaning they don't have to be read in order.   | ||||
| Objects that are not used are never read, making the process efficient. | ||||
| Operations like counting the number of pages in a PDF document are fast, even in large files. | ||||
| Each object has an object number and a generation number. | ||||
|   - Generation numbers are used when a cross-reference table entry is reused. | ||||
|   - For simplicity, we will assume generation numbers to be always zero and ignore them.   | ||||
| The cross-reference table consists of: | ||||
|   - Header line that indicates the number of entries. | ||||
|   - Special entry (the first entry). | ||||
|   - One line for each of the object in the file body. | ||||
| This allows random access to objects, meaning they don't have to be read in | ||||
| order.  Objects that are not used are never read, making the process efficient. | ||||
| Operations like counting the number of pages in a PDF document are fast, even in | ||||
| large files. | ||||
|  | ||||
| Each object has an object number and a generation number.  Generation numbers | ||||
| are used when a cross-reference table entry is reused.  For simplicity, we will | ||||
| assume generation numbers to be always zero and ignore them.  The | ||||
| cross-reference table consists of a header line that indicates the number of | ||||
| entries, a free entry line for object 0, and a line for each of the objects in | ||||
| the file body.  For example: | ||||
|  | ||||
| ``` | ||||
| 0 6                  %Six entries in table, starting at 0 | ||||
| 0000000000 65535 f   %Special entry | ||||
| 0000000015 00000 n   %Object 1 is at byte offset 15 | ||||
| 0000000074 00000 n   %Object 2 is at byte offset 74 | ||||
| 0000000192 00000 n   %etc... | ||||
| 0000000291 00000 n   | ||||
| 0000000409 00000 n   %Object 5 is at byte offset 409 | ||||
| 0 6                             % Six entries in table, starting at 0 | ||||
| 0000000000 65535 f              % Free entry for object 0 | ||||
| 0000000015 00000 n              % Object 1 is at byte offset 15 | ||||
| 0000000074 00000 n              % Object 2 is at byte offset 74 | ||||
| 0000000192 00000 n              % etc... | ||||
| 0000000291 00000 n | ||||
| 0000000409 00000 n              % Object 5 is at byte offset 409 | ||||
| ``` | ||||
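To make the entry layout concrete, here is a minimal C sketch that decodes one cross-reference line. The `xref_entry_t` type and `parse_xref_entry` function are assumptions made for this example, not PDFio definitions, and the sketch ignores the fixed-width requirements a strict reader would enforce.

```c
#include <stdio.h>

// Hypothetical type for this sketch (not a PDFio structure)
typedef struct
{
  unsigned long offset;                 // Byte offset (or next free object)
  unsigned      generation;             // Generation number
  int           in_use;                 // 1 for 'n' entries, 0 for 'f' entries
} xref_entry_t;

// Hypothetical helper: parse a line such as "0000000015 00000 n".
// Returns 1 on success and 0 if the line is not a valid entry.
int parse_xref_entry(const char *line, xref_entry_t *entry)
{
  char type;                            // 'n' (in use) or 'f' (free)

  // %lu and %u always read decimal, so the leading zeros are harmless
  if (sscanf(line, "%10lu %5u %c", &entry->offset, &entry->generation,
             &type) != 3)
    return (0);

  if (type != 'n' && type != 'f')
    return (0);

  entry->in_use = (type == 'n');

  return (1);
}
```

Applied to the table above, the second line decodes to offset 15, generation 0, in use, while the first line decodes to a free entry with generation 65535. The subsection line `0 6` is rejected because it has no type character.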
|  | ||||
|  | ||||
| ### Trailer | ||||
| The first line of the trailer is just the trailer keyword. This is followed by the trailer dictionary,  | ||||
| which contains at least the /Size entry (Number of entries in the cross-reference table)  | ||||
| and the /Root entry (Object number of the document catalog, which is the root element  | ||||
| of the graph of objects in the body).   | ||||
| There follows a line with just the startxref keyword, a line with a single number (the byte offset of  | ||||
| the start of the cross-reference table within the file), and then the line %%EOF, which signals the  | ||||
| end of the PDF file. | ||||
|  | ||||
| The first line of the trailer is just the `trailer` keyword.  This is followed | ||||
| by the trailer dictionary, which contains at least the `/Size` entry specifying | ||||
| the number of entries in the cross-reference table and the `/Root` entry | ||||
| referencing the document catalog object, which is the root element of the | ||||
| graph of objects in the body. | ||||
|  | ||||
| There follows a line with just the `startxref` keyword, a line with a single | ||||
| number specifying the byte offset of the start of the cross-reference table | ||||
| within the file, and then the line `%%EOF` which signals the end of the PDF | ||||
| file. | ||||
|  | ||||
| ``` | ||||
| trailer          %Trailer keyword | ||||
| <<               %The trailer dictinonary | ||||
| trailer                         % Trailer keyword | ||||
| <<                              % The trailer dictionary | ||||
| /Root 5 0 R | ||||
| /Size 6 | ||||
| >> | ||||
| startxref        %startxref keyword | ||||
| 459              %Byte offset of cross-reference table | ||||
| %%EOF            %End-of-file marker | ||||
| startxref                       % startxref keyword | ||||
| 459                             % Byte offset of cross-reference table | ||||
| %%EOF                           % End-of-file marker | ||||
| ``` | ||||
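A reader typically starts at the end of the file, so the trailer description above can be sketched as a search for the last `startxref` keyword. The function `find_startxref` is a hypothetical helper, not a PDFio function, and it assumes the last part of the file has been copied into a NUL-terminated buffer with no embedded NUL bytes and no comments between the keyword and the number.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not a PDFio function): return the byte offset that
// follows the last "startxref" keyword in the tail of a file, or -1 if the
// keyword or the number is missing.
long find_startxref(const char *tail)
{
  const char *found = NULL;             // Last keyword seen so far
  const char *next  = tail;             // Where to resume searching
  long       offset;                    // Offset of cross-reference table

  while ((next = strstr(next, "startxref")) != NULL)
  {
    found = next;
    next  += 9;                         // Skip past "startxref"
  }

  if (!found || sscanf(found + 9, "%ld", &offset) != 1 || offset < 0)
    return (-1);

  return (offset);
}
```

For the example trailer without the annotations, the function returns 459, which is where a reader would seek to find the `xref` keyword.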
|  | ||||
|  | ||||
|   | ||||