Magic Number
Magic numbers are common in programs across many operating systems. Magic numbers implement strongly typed data and are a form of in-band signaling to the controlling program that reads the data type(s) at program run-time. Many file formats include such constants to identify the data they contain. Detecting these constants is a simple and effective way of distinguishing between many file formats, and it can yield further run-time information.
Compared with identifying a file by its filename extension or by directory metadata, the magic number approach gives stronger assurance that the format is identified correctly, and it can often reveal more precise information about the file. However, reasonably reliable magic number tests can be fairly complex, and each file must effectively be tested against every entry in the magic database. The approach is therefore relatively inefficient, especially when displaying large lists of files (filename- and metadata-based methods, by contrast, need to check only one piece of data and match it against a sorted index). The data must also be read from the file itself, which adds latency compared with reading metadata stored in the directory. Where file types do not lend themselves to recognition in this way, the system must fall back to metadata.
Magic numbers are, however, the best way for a program to check that a file it has been told to process is in the correct format. A file's name or metadata can be changed independently of its content, but failing a well-designed magic number test is a strong sign that the file is either corrupt or of the wrong type. Conversely, a valid magic number does not guarantee that the file is intact or of the expected type.
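As a minimal sketch of the validation check described above, assuming Python and a hypothetical helper named has_magic, a program can read only the few bytes where the signature should sit and compare them with the expected constant. The PDF and PNG signatures in the demo come from the examples below; the demo file is deliberately misnamed to show that the name and the content can disagree.

```python
import os
import tempfile


def has_magic(path: str, magic: bytes, offset: int = 0) -> bool:
    """Return True if the bytes at `offset` in the file equal `magic`.

    Hypothetical helper for illustration. Only len(magic) bytes are read,
    so the check costs one small read per file rather than a full parse.
    """
    with open(path, "rb") as f:  # binary mode: no newline translation
        f.seek(offset)
        return f.read(len(magic)) == magic  # a short (truncated) read fails


if __name__ == "__main__":
    # A file named like a PNG whose content is actually PDF: the name
    # says one thing, the magic number says another.
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
        tmp.write(b"%PDF-1.4\n")
    try:
        print(has_magic(tmp.name, b"%PDF"))               # True
        print(has_magic(tmp.name, b"\x89PNG\r\n\x1a\n"))  # False
    finally:
        os.remove(tmp.name)
```

A True result only says the signature matches; the rest of the file may still be damaged, so a real reader would go on to validate the structure that follows.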
So-called shebang lines in script files are a special case of magic numbers. Here, the magic number is human-readable text that identifies a specific command interpreter and the options to pass to it.
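To illustrate how a shebang line carries both the interpreter and its options, here is a small parser sketch. The function name parse_shebang is hypothetical, and the splitting rule is an assumption that follows the Linux convention, where everything after the interpreter path is passed as a single argument; other Unix-like systems have handled this differently.

```python
from typing import Optional, Tuple


def parse_shebang(first_line: bytes) -> Optional[Tuple[str, Optional[str]]]:
    """Return (interpreter, argument) from a script's first line, or None.

    Assumes Linux-style handling: the text after the interpreter path,
    if any, is kept together as one argument rather than split further.
    """
    if not first_line.startswith(b"#!"):  # the two-byte magic number 23 21
        return None
    rest = first_line[2:].rstrip(b"\r\n").strip(b" \t")
    if not rest:
        return None
    parts = rest.split(None, 1)  # split once, at the first whitespace run
    interpreter = parts[0].decode("utf-8", "surrogateescape")
    argument = (
        parts[1].strip().decode("utf-8", "surrogateescape") if len(parts) > 1 else None
    )
    return interpreter, argument
```

For example, a first line of "#!/usr/bin/env python3" yields the interpreter "/usr/bin/env" with the single argument "python3".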
Another operating system that uses magic numbers is AmigaOS, where they were called "Magic Cookies". They were adopted as a standard way to recognize executables in the Hunk executable file format, and to let individual programs, tools and utilities handle their saved data files, or any other kind of file, automatically when saving and loading data. This system was later enhanced with the Amiga standard Datatype recognition system. Another method was the FourCC method, which originated in OSType on the Macintosh and was later adapted by the Interchange File Format (IFF) and its derivatives.
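The FourCC idea can be sketched with the IFF container layout: an IFF file starts with the four ASCII characters FORM, then a 32-bit big-endian length, then a second FourCC naming the form type (for example ILBM for Amiga bitmap images or AIFF for audio). The function name parse_iff_header is hypothetical, and the sketch checks only the outer header, not the chunks inside it.

```python
import struct
from typing import Optional, Tuple


def parse_iff_header(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (form_type, declared_size) for an IFF FORM header, or None."""
    if len(data) < 12 or data[:4] != b"FORM":  # outer FourCC must be "FORM"
        return None
    (size,) = struct.unpack(">I", data[4:8])  # big-endian 32-bit chunk length
    # The form type is another FourCC; latin-1 maps every byte, so decoding
    # never raises even if the bytes are not the usual printable ASCII.
    form_type = data[8:12].decode("latin-1")
    return form_type, size
```

RIFF, a derivative of IFF, uses the same layout with the tag RIFF and little-endian lengths, so this sketch deliberately does not match it.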
Examples
Some examples:
- Compiled Java class files (bytecode) start with hex CA FE BA BE. When compressed with Pack200, the bytes are changed to CA FE D0 0D.
- GIF image files have the ASCII code for "GIF89a" (47 49 46 38 39 61) or "GIF87a" (47 49 46 38 37 61).
- JPEG image files begin with FF D8 and end with FF D9. JPEG/JFIF files contain the ASCII code for "JFIF" (4A 46 49 46) as a null-terminated string. JPEG/Exif files contain the ASCII code for "Exif" (45 78 69 66), also as a null-terminated string, followed by more metadata about the file.
- PNG image files begin with an 8-byte signature which identifies the file as a PNG file and allows detection of common file transfer problems: \211 P N G \r \n \032 \n (89 50 4E 47 0D 0A 1A 0A). That signature contains various newline characters to permit detecting unwanted automated newline conversions, such as transferring the file using FTP in ASCII transfer mode instead of binary mode.[5]
- Standard MIDI music files have the ASCII code for "MThd" (4D 54 68 64) followed by more metadata.
- Unix script files usually start with a shebang, "#!" (23 21), followed by the path to an interpreter.
- PostScript files and programs start with "%!" (25 21).
- PDF files start with "%PDF" (25 50 44 46).
- MS-DOS EXE files and the EXE stub of Microsoft Windows PE (Portable Executable) files start with the characters "MZ" (4D 5A), the initials of the designer of the file format, Mark Zbikowski. The definition allows "ZM" (5A 4D) as well, but this is quite uncommon.
- The Berkeley Fast File System superblock format is identified as either 19 54 01 19 or 01 19 54 depending on version; both represent the birthday of the author, Marshall Kirk McKusick.
- The Master Boot Record of bootable storage devices on almost all IA-32 IBM PC compatibles has the bytes 55 AA as its last two bytes (read as the little-endian 16-bit value AA55).
- Executables for the Game Boy and Game Boy Advance handheld video game systems have a 48-byte or 156-byte magic number, respectively, at a fixed spot in the header. This magic number encodes a bitmap of the Nintendo logo.
- Amiga software executable Hunk files running on classic Amiga 68000 machines all start with the hexadecimal number $000003F3, nicknamed the "Magic Cookie."
- In its first version, the Amiga's black screen of death, called Guru Meditation, appeared when the machine hung for unknown reasons and showed the hexadecimal number 48454C50, which spells "HELP" in ASCII (48=H, 45=E, 4C=L, 50=P).
- On the Amiga, the only absolute address in the system is hex $00000004 (memory location 4), which contains the start location called SysBase, a pointer to exec.library, the so-called kernel of the Amiga.
- PEF files, used by Mac OS and BeOS for PowerPC executables, contain the ASCII code for "Joy!" (4A 6F 79 21) as a prefix.
- TIFF files begin with either II or MM followed by 42 as a two-byte integer in little- or big-endian byte ordering. II is for Intel, which uses little-endian byte ordering, so the magic number is 49 49 2A 00. MM is for Motorola, which uses big-endian byte ordering, so the magic number is 4D 4D 00 2A.
- Unicode text files encoded in UTF-16 often start with the Byte Order Mark to detect endianness (FE FF for big endian and FF FE for little endian). UTF-8 text files often start with the UTF-8 encoding of the same character, EF BB BF.
- LLVM Bitcode files start with "BC" (42 43).
- WAD files start with IWAD or PWAD (for Doom), WAD2 (for Quake) and WAD3 (for Half-Life).
- Microsoft Office document files start with D0 CF 11 E0, which is visually suggestive of the word "DOCFILE0".
- Headers in ZIP files begin with "PK" (50 4B), the initials of Phil Katz, author of the DOS compression utility PKZIP.
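The examples above lend themselves to a small lookup table. The sketch below, assuming Python and hypothetical names (SIGNATURES, identify, tiff_byte_order, is_mbr), matches a file's leading bytes against prefixes taken from the list, decodes the TIFF value 42 using the byte order that II or MM announces, and checks the MBR boot signature at its fixed offset (byte 510 is 55, byte 511 is AA). It is an illustration, not a replacement for a full magic database.

```python
import struct
from typing import Optional

# Hypothetical signature table built from the examples above.
# Entries are tried longest-first, so a more specific prefix wins
# if two prefixes ever overlap.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8", "JPEG image"),
    (b"\xca\xfe\xba\xbe", "Java class file"),
    (b"\xca\xfe\xd0\x0d", "Pack200-compressed Java archive"),
    (b"MThd", "Standard MIDI file"),
    (b"%PDF", "PDF document"),
    (b"%!", "PostScript"),
    (b"#!", "script with shebang"),
    (b"MZ", "MS-DOS/Windows executable"),
    (b"Joy!", "PEF executable"),
    (b"II*\x00", "TIFF image (little endian)"),
    (b"MM\x00*", "TIFF image (big endian)"),
    (b"\xef\xbb\xbf", "UTF-8 text with BOM"),
    (b"\xfe\xff", "UTF-16 text (big endian)"),
    (b"\xff\xfe", "UTF-16 text (little endian)"),
    (b"BC", "LLVM bitcode"),
    (b"IWAD", "Doom WAD"),
    (b"PWAD", "Doom WAD"),
    (b"WAD2", "Quake WAD"),
    (b"WAD3", "Half-Life WAD"),
    (b"\xd0\xcf\x11\xe0", "Microsoft Office document"),
    (b"PK", "ZIP archive"),
]
_BY_LENGTH = sorted(SIGNATURES, key=lambda entry: len(entry[0]), reverse=True)


def identify(header: bytes) -> Optional[str]:
    """Return a format name for the first matching prefix, or None."""
    for magic, name in _BY_LENGTH:
        if header.startswith(magic):
            return name
    return None


def tiff_byte_order(header: bytes) -> Optional[str]:
    """Return the struct byte-order code ('<' or '>') for a TIFF header, else None."""
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None or len(header) < 4:
        return None
    (value,) = struct.unpack(order + "H", header[2:4])  # must decode to 42
    return order if value == 42 else None


def is_mbr(sector: bytes) -> bool:
    """Check the boot signature at a fixed offset: byte 510 is 0x55, byte 511 is 0xAA."""
    return len(sector) >= 512 and sector[510:512] == b"\x55\xaa"
```

Short prefixes such as "BC", "PK" or "#!" can also occur at the start of ordinary text, so a two-byte match is weak evidence on its own; longer signatures like PNG's eight bytes are far less likely to match by accident.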