Multiple fixes to allow PDFio to read more edge-case PDFs.

- Update _pdfioFileGets to allow for really long lines where it doesn't matter if we lose the end of the line.
- Update "startxref" detection at the end of the file.
- Refactor repair logic so that you just get a single WARNING about the repair (debug messages available for testing).
- Allow whitespace after the "obj" in the object header.
- Make sure to close xref stream on error.
- Update predictor code to support Colors <= 32 (some implementations set Colors to the number of bytes per record in the xref stream, which prevents the predictor from doing anything...)
- Allow CR CR in xref table.
- Clear old trailer/root/pages/etc. objects when repairing, update existing objects that were already found in load_xref.
- Don't set current object in pdfioObjectCreate/OpenStream if the stream can't be created/opened.
2025-09-01 16:52:01 +02:00 · 2025-04-24 11:09:54 -04:00
parent 278ddb7fa7
commit cad8f450ab
6 changed files with 148 additions and 84 deletions
--- a/test-corpus.sh
+++ b/test-corpus.sh
@@ -18,12 +18,22 @@ if test $# = 0; then
 fi

 for file in $(find "$@" -name \*.pdf -print); do
+    # Don't worry about test files containing MIME garbage...
+    (head -4 $file | grep -q Content-Type) && continue;
+
+    # Or test files containing MacBinary garbage...
+    (file $file | grep -q MacBinary) && continue;
+
+    # Don't worry about test files that Xpdf can't handle...
    pdfinfo $file >/dev/null 2>&1 || continue;

+    # Run testpdfio to test loading the file...
    ./testpdfio $file >$file.log 2>&1
    if test $? = 0; then
+    	# Passed
        rm -f $file.log
    else
+    	# Failed, preserve log and write filename to stdout...
        echo $file
    fi
 done
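
For illustration only, the MIME pre-filter added in this hunk can be pulled out into a standalone helper. This is a minimal sketch, not part of the commit: the names `has_mime_header` and `list_testable` are hypothetical, and quoting the path is an addition here so that filenames containing spaces survive (the loop in the diff passes `$file` unquoted).

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): succeeds when one of the first
# four lines of the file contains "Content-Type", which suggests the PDF
# is wrapped in a MIME envelope instead of starting with PDF data.
has_mime_header() {
    head -4 "$1" | grep -q Content-Type
}

# Hypothetical wrapper: print every argument that is NOT MIME-wrapped,
# one path per line. Quoting "$f" keeps paths with spaces intact.
list_testable() {
    for f in "$@"; do
        if has_mime_header "$f"; then
            continue
        fi
        printf '%s\n' "$f"
    done
}
```

A caller could then feed the surviving paths to `./testpdfio` one at a time. The MacBinary and `pdfinfo` checks from the diff are left out of this sketch because their results depend on the locally installed `file` and Xpdf tools.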