Broken & Edge-Case File Library
A curated collection of intentionally broken and edge-case files for testing parsers, validators, and error handling. Each example includes an explanation of why it fails, expected error behavior, and how to fix it.
Encoding Issues
Character encoding mismatches and BOM problems
CSV with UTF-8 BOM
A CSV file starting with a UTF-8 Byte Order Mark (EF BB BF) that causes the first column header to be misread or include invisible characters.
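The effect of the BOM on the first header can be reproduced in a few lines. This is a minimal sketch using an invented two-column sample; the helper name `read_header` is hypothetical, and the fix shown (decoding with Python's `utf-8-sig` codec) is one option among several.

```python
import csv
import io

# Hypothetical sample: UTF-8 BOM (EF BB BF) followed by a small CSV.
raw = b"\xef\xbb\xbfid,name\r\n1,Ada\r\n"

def read_header(data: bytes, encoding: str) -> list:
    """Decode the bytes with the given codec and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))

# Plain "utf-8" keeps the BOM as U+FEFF glued to the first header.
naive_header = read_header(raw, "utf-8")

# "utf-8-sig" strips a leading BOM if one is present.
clean_header = read_header(raw, "utf-8-sig")
```

With the plain codec, a lookup of the `id` column by name would fail because the actual key is `"\ufeffid"`.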
XML with Wrong Encoding Declaration
An XML file whose XML declaration specifies UTF-8 encoding but whose content is actually encoded as ISO-8859-1 (Latin-1), causing parsers to fail on the first non-ASCII character.
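A rough illustration of the mismatch, assuming Python's standard `xml.etree.ElementTree` parser and an invented one-element document. The helper `parse_ok` is hypothetical. Correcting the declaration is shown as one possible fix; re-encoding the content as UTF-8 would be the other.

```python
import xml.etree.ElementTree as ET

# "café" stored as Latin-1 (byte 0xE9) under a declaration that claims UTF-8.
bad = '<?xml version="1.0" encoding="UTF-8"?><name>caf\xe9</name>'.encode("latin-1")

def parse_ok(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

declared_wrong = parse_ok(bad)

# One possible fix: make the declaration match the real encoding.
fixed = bad.replace(b'encoding="UTF-8"', b'encoding="ISO-8859-1"')
declared_right = parse_ok(fixed)
recovered_text = ET.fromstring(fixed).text
```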
Structural Errors
Malformed syntax, missing tags, invalid grammar
JSON with Trailing Comma
A JSON file containing a trailing comma after the last element in an array or object. This is valid in JavaScript but invalid in strict JSON (RFC 8259).
JSON with Single Quotes
A JSON file using single quotes instead of double quotes for strings. This is valid JavaScript syntax but violates the JSON specification.
JSON with Comments
A JSON file containing JavaScript-style comments (// or /* */). Comments are not part of the JSON specification despite being common in config files.
JSON with Unquoted Keys
A JSON file where object keys are not wrapped in double quotes. This is valid in JavaScript but invalid in JSON.
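The four JSON cases above can be checked against a strict parser in one pass. The sketch below uses Python's standard `json` module; the sample strings and the helper `is_strict_json` are invented for illustration.

```python
import json

# Hypothetical samples, one per JSON case described above, plus a valid control.
samples = {
    "trailing comma": '{"a": [1, 2, 3,]}',
    "single quotes": "{'a': 1}",
    "comment": '{"a": 1} // note',
    "unquoted key": "{a: 1}",
    "valid": '{"a": [1, 2, 3]}',
}

def is_strict_json(text: str) -> bool:
    """Return True if json.loads accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

results = {name: is_strict_json(text) for name, text in samples.items()}
```

Only the control sample should be accepted; the other four are JavaScript-flavored syntax that the JSON grammar does not include.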
CSV with Mixed Delimiters
A CSV file that uses both commas and semicolons as delimiters across different rows, causing parsers to misalign columns.
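One simple way to flag this case is to guess a delimiter per row and see whether the guesses disagree. The sketch below is a naive heuristic on invented data: it only counts characters, so a comma or semicolon inside a quoted field would mislead it. The function name `dominant_delimiter` is hypothetical.

```python
# Hypothetical sample: the third row switches from commas to semicolons.
sample = "id,name,city\n1,Ada,London\n2;Grace;New York\n"

def dominant_delimiter(line: str, candidates: str = ",;"):
    """Return the candidate that occurs most often in the line, or None."""
    counts = {d: line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

per_line = [dominant_delimiter(line) for line in sample.splitlines()]

# More than one distinct delimiter across rows suggests a mixed file.
is_mixed = len({d for d in per_line if d is not None}) > 1
```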
CSV with Inconsistent Column Count
A CSV file where different rows have different numbers of columns, causing import failures and data misalignment.
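Ragged rows can be reported before import by comparing each row's width to the header's. This is a minimal sketch on invented data, assuming the header row defines the expected column count; `ragged_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample: row 3 is short by one field, row 4 has one extra.
sample = "id,name,city\n1,Ada,London\n2,Grace\n3,Alan,Bletchley,extra\n"

def ragged_rows(text: str) -> list:
    """Return (row_number, column_count) for rows that differ from the header."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return []
    expected = len(rows[0])
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]

problems = ragged_rows(sample)
```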
XML with Unclosed Tags
An XML file with opening tags that are never closed, violating the well-formedness requirement of XML.
XML with Mismatched Tags
An XML file where opening and closing tags have different names, violating XML well-formedness rules.
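Both well-formedness violations surface as parse errors in a conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree` on invented snippets; the helper `well_formedness_error` is hypothetical, and the exact error text depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples for the two XML cases above, plus a valid control.
unclosed = "<root><item>one</item>"           # <root> is never closed
mismatched = "<root><item>one</itme></root>"  # </itme> does not match <item>
valid = "<root><item>one</item></root>"

def well_formedness_error(text: str):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)

unclosed_error = well_formedness_error(unclosed)
mismatched_error = well_formedness_error(mismatched)
valid_error = well_formedness_error(valid)
```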
PDF with Broken Cross-Reference Table
A PDF file with a damaged or inconsistent cross-reference (xref) table, causing readers to fail when locating objects within the file.
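A very rough consistency check is to read the byte offset that follows the last `startxref` keyword and confirm that the `xref` keyword actually sits there. The sketch below assumes a classic cross-reference table; files that use cross-reference streams (PDF 1.5 and later) would point at an object instead and need a different check. The skeleton bytes and the helper names `build_pdf` and `startxref_points_at_xref` are invented and do not form a renderable PDF.

```python
import re

def startxref_points_at_xref(data: bytes) -> bool:
    """Check that the last startxref offset lands on the 'xref' keyword."""
    matches = list(re.finditer(rb"startxref\s+(\d+)", data))
    if not matches:
        return False
    offset = int(matches[-1].group(1))
    return data[offset:offset + 4] == b"xref"

# Hypothetical skeleton: header, one object, xref table, trailer.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref = b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"

def build_pdf(offset: int) -> bytes:
    """Assemble the skeleton with the given startxref offset."""
    return body + xref + b"startxref\n" + str(offset).encode("ascii") + b"\n%%EOF\n"

consistent = build_pdf(len(body))   # offset points at the table
broken = build_pdf(len(body) + 7)   # offset is off by a few bytes
```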
Truncated Files
Incomplete downloads and missing end markers
Truncated PNG File
A PNG file that was cut short during download or transfer, missing its IEND chunk and possibly some of its image data (IDAT) chunks.
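A quick completeness check is to look for the PNG signature at the start and the IEND chunk at the very end. The sketch below builds a tiny 1x1 grayscale image in memory so it stays self-contained; the helper names `chunk` and `png_looks_complete` are hypothetical. It assumes nothing follows IEND, so a file with trailing bytes would be reported as incomplete.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def chunk(kind: bytes, payload: bytes = b"") -> bytes:
    """Build a PNG chunk: length, type, payload, CRC-32 over type and payload."""
    crc = zlib.crc32(kind + payload)
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)

IEND = chunk(b"IEND")  # always the same 12 bytes

def png_looks_complete(data: bytes) -> bool:
    """Check for the signature at the start and the IEND chunk at the end."""
    return data.startswith(PNG_SIGNATURE) and data.endswith(IEND)

# Hypothetical 1x1, 8-bit grayscale image.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # filter byte plus one pixel
full_png = PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + IEND

# Simulate an interrupted transfer by dropping the last 20 bytes.
truncated_png = full_png[:-20]
```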
Truncated JPEG File
A JPEG file that was cut short, missing its End of Image (EOI) marker (FF D9) and potentially scan data.
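The same idea applies to JPEG: check for the Start of Image marker (FF D8) at the beginning and the EOI marker (FF D9) at the end. The sketch below uses synthetic bytes rather than a decodable image, and `jpeg_looks_complete` is a hypothetical helper. Some real files carry padding or metadata after EOI, so a failed check is a hint rather than proof of truncation.

```python
SOI = b"\xff\xd8"  # Start of Image marker
EOI = b"\xff\xd9"  # End of Image marker

def jpeg_looks_complete(data: bytes) -> bool:
    """Check that the data starts with SOI and ends with EOI."""
    return data[:2] == SOI and data[-2:] == EOI

# Synthetic stand-in for scan data; not a decodable image.
fake_scan = bytes(range(16))
full_jpeg = SOI + fake_scan + EOI

# Simulate a transfer that stopped before the final bytes arrived.
truncated_jpeg = full_jpeg[:-5]
```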
Truncated ZIP Archive
A ZIP archive that was cut short during download, missing its central directory and end-of-central-directory record, both of which are stored at the end of the file.
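Because ZIP readers locate entries through the records at the end of the file, a truncated archive usually cannot be opened at all. The sketch below builds a small archive in memory with Python's standard `zipfile` module and cuts it in half; the helper names `make_zip` and `can_open` and the file contents are invented.

```python
import io
import zipfile

def make_zip() -> bytes:
    """Create a small in-memory archive with one text file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello world")
    return buffer.getvalue()

def can_open(data: bytes) -> bool:
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False

full_zip = make_zip()

# Simulate an interrupted download: only the first half arrived.
truncated_zip = full_zip[: len(full_zip) // 2]
```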
Content Issues
Invalid values, unsupported types, encoding in data
JSON with NaN / Infinity
A JSON file containing NaN, Infinity, or -Infinity as number values. These are valid IEEE 754 floating-point values but are not valid JSON.
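This case is easy to miss because some parsers accept these tokens by default. Python's standard `json` module is one of them, so the sketch below shows both the lenient default and a stricter variant built on the `parse_constant` hook. The helper names `reject_constant`, `loads_strict`, and `strict_accepts` are hypothetical.

```python
import json
import math

def reject_constant(name: str):
    """Hook called by json.loads for NaN, Infinity, and -Infinity."""
    raise ValueError(f"{name} is not valid JSON")

def loads_strict(text: str):
    """Parse JSON, refusing the non-standard numeric constants."""
    return json.loads(text, parse_constant=reject_constant)

def strict_accepts(text: str) -> bool:
    try:
        loads_strict(text)
        return True
    except ValueError:
        return False

# The default parser is lenient and returns a float NaN.
lenient_value = json.loads('{"x": NaN}')["x"]

strict_results = {
    token: strict_accepts('{"x": %s}' % token)
    for token in ("NaN", "Infinity", "-Infinity", "1.5")
}
```

On the output side, `json.dumps(value, allow_nan=False)` raises `ValueError` instead of emitting these tokens.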
CSV with Unescaped Quotes
A CSV file containing double-quote characters inside fields that are not properly escaped, causing parsers to misinterpret field boundaries.
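Lenient CSV readers often accept such input silently and return a mangled field, so a strict mode is useful for detection. The sketch below relies on the `strict=True` option of Python's `csv.reader`; the sample rows and helper names `parse_strict` and `strict_accepts` are invented. The fix follows RFC 4180: double each quote inside a quoted field.

```python
import csv
import io

# Hypothetical rows: the first has bare quotes inside a quoted field,
# the second escapes them by doubling.
unescaped = 'id,quote\n1,"She said "hi" to me"\n'
escaped = 'id,quote\n1,"She said ""hi"" to me"\n'

def parse_strict(text: str) -> list:
    """Parse CSV text, raising csv.Error on malformed quoting."""
    return list(csv.reader(io.StringIO(text, newline=""), strict=True))

def strict_accepts(text: str) -> bool:
    try:
        parse_strict(text)
        return True
    except csv.Error:
        return False

unescaped_ok = strict_accepts(unescaped)
escaped_rows = parse_strict(escaped)
```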
Why Test with Broken Files?
Robust Error Handling
Discover how your application handles malformed input before your users do. Edge-case testing prevents crashes and data corruption in production.
Security Hardening
Malformed files are a common attack vector. Testing with intentionally broken files helps identify injection points and parser vulnerabilities.
Compliance & QA
Many standards (RFC 8259 for JSON, RFC 4180 for CSV, XML 1.0) define strict rules. Edge-case files verify that your parser correctly rejects invalid input.