The byte order mark is a tiny invisible character that causes outsized trouble, especially in code and data files. Here is what it is and how to get rid of it.
What the BOM is
The byte order mark, Unicode U+FEFF, is an invisible character that can appear at the very start of a text file. It was designed to signal the byte order of UTF-16 and UTF-32 files, and in practice it also hints at the encoding. UTF-8 has no byte-order ambiguity, so the BOM is unnecessary there, but many editors and tools add it anyway, and it sits silently at position zero.
You cannot see it, but it is there, before your first visible character.
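To make the invisible visible, here is a minimal Python sketch that shows what U+FEFF turns into when encoded as UTF-8. The variable names are illustrative only.

```python
import codecs

BOM = "\ufeff"  # the byte order mark as a single Unicode code point

# In UTF-8 the BOM is encoded as three bytes: EF BB BF.
bom_bytes = BOM.encode("utf-8")
print(bom_bytes.hex())               # efbbbf
print(bom_bytes == codecs.BOM_UTF8)  # True

# A string that starts with a BOM is one character longer than it looks.
text = BOM + "hello"
print(len(text))  # 6
```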
Why the BOM causes problems
- Broken JSON. A BOM before the opening brace makes many JSON parsers fail with a confusing error, because the first character is not what they expect.
- Stray characters in output. A BOM can appear as ï»¿ (its three UTF-8 bytes misread in a legacy encoding) or as other odd characters at the top of a web page or a printed file.
- Failed string comparisons. A file or string with a BOM does not match the same content without one, breaking diffs, tests, and find-and-replace.
- Shell and script errors. A BOM at the top of a script can stop it from running, for example because the #! line is no longer the very first thing in the file.
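Two of the problems above, the JSON failure and the failed comparison, can be reproduced in a few lines. This is a sketch using Python's standard json module; other parsers may behave differently, and the helper name parses is made up for the example.

```python
import json

BOM = "\ufeff"
clean = '{"ok": true}'
with_bom = BOM + clean

# The two strings look identical when printed but do not compare equal.
print(clean == with_bom)  # False


def parses(text: str) -> bool:
    """Return True if json.loads accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(clean))     # True
print(parses(with_bom))  # False
```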
How to remove the BOM
Save as UTF-8 without BOM. Most editors let you choose the encoding when saving:
- VS Code: click the encoding in the status bar, choose "Save with Encoding," then "UTF-8" (not "UTF-8 with BOM").
- Notepad++: Encoding menu > "UTF-8" (the option without BOM).
- Sublime Text: File > Save with Encoding > UTF-8.
Strip it from the text directly. If you are working with copied text rather than a file, paste it into a cleaner that removes U+FEFF. textscrubr strips the BOM along with other invisible characters and shows you that it found one, so you know it is gone.
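If you would rather strip the BOM in your own code, the following Python sketch shows two common approaches: removing a leading U+FEFF from a string that is already decoded, and decoding bytes with the standard utf-8-sig codec, which drops a leading BOM if there is one. The function names are hypothetical.

```python
BOM = "\ufeff"


def strip_bom(text: str) -> str:
    """Remove a single leading U+FEFF from a string, if present."""
    return text[1:] if text.startswith(BOM) else text


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes; the 'utf-8-sig' codec skips a leading BOM."""
    return data.decode("utf-8-sig")


print(strip_bom("\ufeffhello"))            # hello
print(decode_utf8(b"\xef\xbb\xbfhello"))   # hello
```

Both functions leave text without a BOM unchanged, so they are safe to call on every input.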
How to check for a BOM
- Open the file in an editor that shows the encoding; "UTF-8 with BOM" means it is present.
- In a script, check whether the text starts with \uFEFF.
- Paste the text into a cleaner and look for the BOM in its report.
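The script check can be sketched in Python as follows. It works on raw bytes, so it only detects the UTF-8 form of the BOM (EF BB BF); the function names are illustrative.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes begin with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of a file and compare them to the BOM."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


print(has_utf8_bom(b"\xef\xbb\xbf{}"))  # True
print(has_utf8_bom(b"{}"))              # False
```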
Prevent it from coming back
Set your editor's default encoding to UTF-8 without BOM. For files generated by other tools, add a strip-BOM step to your pipeline so the BOM never reaches the code or parsers that choke on it.
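A strip-BOM pipeline step could look something like the Python sketch below. It assumes the generated files are UTF-8 and small enough to read into memory; the function names and the default *.json pattern are placeholders to adapt to your own setup.

```python
import codecs
from pathlib import Path
from typing import List


def strip_bom_in_place(path: Path) -> bool:
    """Rewrite the file without its leading UTF-8 BOM. Return True if it changed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, leave the file untouched
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_bom_in_tree(root: Path, pattern: str = "*.json") -> List[Path]:
    """Strip the BOM from every matching file under root; return the changed paths."""
    return [
        p for p in sorted(root.rglob(pattern))
        if p.is_file() and strip_bom_in_place(p)
    ]
```

Returning the list of changed paths lets the pipeline log which tool is still emitting a BOM, which helps when fixing the problem at its source.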