Magic number (programming)

In computer programming, a magic number is a constant used to identify a file format or data type. The term first appeared in a comment in the Sixth Edition source code of the Unix operating system and, although it has lost its original meaning, it has become part of the computer industry lexicon.

Today's magic numbers are often chosen based on (among other factors):


Magic number origin

Deep in the Sixth Edition source code of the Unix program manager, the exec() service read the executable (binary) image from the file system. The first 20 bytes or so were a header containing the sizes of the program (text) and initialized (global) data areas. The first 16-bit word of the header was also compared to two constants to determine whether the executable image contained absolute memory address references, relocatable memory references, or the newly implemented paged executable image. Comments in the code referred to these constants as magic numbers without further explanation. Given that Unix contained over 10,000 lines of code and a great many constants, this was a curious comment, almost as curious as the "You are not expected to understand this"[1] comment in the context-switching section of the program manager.

However, if one spent time examining DEC PDP-11 assembly language listings and debugging PDP-11 programs, the constants had a familiar look. The high order byte was, in fact, the operation code for the PDP-11 branch instruction. Calculating branch offsets revealed that if the magic numbers were executed, they would branch the exec() Unix service over the executable image header data and start the program. In this way these special constants provided an illusion and met the criteria for magic.
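
The branch arithmetic described above can be checked with a short sketch. The magic values 0o407, 0o410 and 0o411 are an assumption taken from historical a.out documentation rather than from this article, and branch_target is a hypothetical helper. The decoding follows the PDP-11 BR instruction format: the opcode sits in the high byte and a signed word offset in the low byte.

```python
# Decode a 16-bit PDP-11 word as an unconditional branch (BR) instruction.
# Assumption: the magic values below come from historical a.out headers;
# the article itself does not list them.

BR_OPCODE = 0o000400   # BR covers 0o000400..0o000777, so the high byte is 0x01
WORD_SIZE = 2          # a PDP-11 word is two bytes


def branch_target(word, address=0):
    """Return the byte address reached by a BR instruction stored at `address`."""
    if word & 0xFF00 != BR_OPCODE:
        raise ValueError(f"{word:#o} is not a BR instruction")
    offset = word & 0xFF
    if offset >= 0x80:          # the low byte is a signed count of words
        offset -= 0x100
    # The program counter has already moved past the instruction
    # when the offset is applied.
    return address + WORD_SIZE + WORD_SIZE * offset


for magic in (0o407, 0o410, 0o411):
    print(f"{magic:#o} branches to byte {branch_target(magic)}")
```

Under these assumptions the first value decodes to a branch that lands 16 bytes past the start of the file, and each increment of the magic number moves the target one word further.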

The Sixth Edition had paging code and the magic number illusion was further preserved since the exec() service read the file header (meta) data into kernel space and read the executable image into user space, thereby not using the magic number branching feature. The magic number concept was implemented in the Unix linker and loader and magic number branching was probably used in the suite of stand-alone diagnostic programs that came with the Sixth Edition.

The first PDP-11/20 did not have memory protection and, therefore, the absolute address reference model was used.[1] Thus, the pre-Sixth Edition Unix versions read the executable file, with header, into memory and used the branch instruction (the initial magic number) to start the program. As more executable formats were developed, new magic numbers were added by incrementing the branch offset value by one. Magic numbers were also kept in the Sixth Edition kernel as a debugging safety feature.[2]

Magic numbers in files


"Magic numbers" were first used to identify file types, and later file system types. Use of the term has expanded over time, and it is now applied by many other programs across many operating systems. It is a form of in-band signaling.

Many other types of files have content that identifies the type. Detecting such numbers in file content is therefore an effective way of distinguishing between file formats, and it can yield further run-time information.

Some examples:

The Unix utility named file reads and interprets magic numbers in files, and the data file it uses to interpret them is itself called magic. The Windows utility TrID has a similar purpose.
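
Signature-based detection of this kind can be sketched in a few lines. The table below is a hypothetical, deliberately tiny selection of widely documented leading byte sequences; it is not the database used by file or TrID, and the names SIGNATURES, sniff and sniff_file are invented for illustration.

```python
# Hypothetical signature table: a few well-known leading byte sequences.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
]


def sniff(data):
    """Guess a format from the first bytes of `data`."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path):
    """Read only as many bytes as the longest signature needs, then guess."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with open(path, "rb") as handle:
        return sniff(handle.read(longest))


print(sniff(b"%PDF-1.7\n"))
print(sniff(b"plain text"))
```

A match only suggests a format. Because the signature travels in-band with the data, any file can begin with the same bytes by coincidence or by design.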

Magic numbers in protocols


Magic numbers in code

The term magic number also refers to the bad programming practice of using numbers directly in source code without explanation. In most cases this makes programs harder to read, understand, and maintain. Although most guides make an exception for the numbers zero and one, it is a good idea to define all other numbers in code as named constants.

For example, to shuffle the values in an array randomly, this pseudocode will do the job:

   for i from 1 to 52
       j := i + randomInt(53 - i) - 1
       a.swapEntries(i, j)

where a is an array object, the function randomInt(x) chooses a random integer between 1 and x, inclusive, and swapEntries(i, j) swaps the ith and jth entries in the array. In this example, 52 is a magic number. It is considered better programming style to write:

   constant int deckSize := 52
   for i from 1 to deckSize
       j := i + randomInt(deckSize + 1 - i) - 1
       a.swapEntries(i, j)

This is preferable for several reasons:

   function shuffle (int deckSize)
      for i from 1 to deckSize
          j := i + randomInt(deckSize + 1 - i) - 1
          a.swapEntries(i, j)
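
For comparison, the same shuffle can be written as a runnable sketch in Python. The names DECK_SIZE and shuffle, and the optional rng parameter, are illustrative choices rather than part of the pseudocode above. Indices start at 0 here, so the arithmetic differs slightly from the 1-based version.

```python
import random

DECK_SIZE = 52  # the only place the size of the deck is written down


def shuffle(items, rng=None):
    """Shuffle `items` in place; the size is taken from the list itself."""
    if rng is None:
        rng = random.Random()
    size = len(items)
    for i in range(size):
        # Pick a position from i to the end; randint includes both ends.
        j = rng.randint(i, size - 1)
        items[i], items[j] = items[j], items[i]


deck = list(range(1, DECK_SIZE + 1))
shuffle(deck)
print(deck[:5])
```

Changing the deck to 78 cards then means editing one line, and no stray 53 is left behind to be missed.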

Disadvantages are:

Allowed use of magic numbers

Although the point is somewhat controversial, most programmers would concede that 0 (zero) and 1 are the only two magic numbers allowable in general code. There are several reasons for this.

   for index := 0 to list.count-1 do
       DoSomething(index);

Magic debug values

Magic debug values are specific values written to memory during allocation or deallocation, so that it will later be possible to tell whether or not they have become corrupted and to make it obvious when values taken from uninitialized memory are being used.

Memory is usually viewed in hexadecimal, so common values used are often repeated digits or hexspeak.
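
The idea can be illustrated with a small simulation. It is a sketch only: the fill pattern 0xDEADBEEF is chosen for illustration, a bytearray stands in for real heap memory, and debug_alloc and untouched_words are hypothetical helpers, not the interface of any actual allocator.

```python
import struct

DEBUG_FILL = 0xDEADBEEF      # illustrative pattern; real allocators pick their own
WORD = struct.Struct("<I")   # one 32-bit little-endian word


def debug_alloc(n_words):
    """Return a buffer of `n_words` 32-bit words pre-filled with DEBUG_FILL."""
    return bytearray(WORD.pack(DEBUG_FILL) * n_words)


def untouched_words(buffer):
    """Indices of the words that still hold the fill pattern."""
    return [index
            for index, (value,) in enumerate(WORD.iter_unpack(buffer))
            if value == DEBUG_FILL]


block = debug_alloc(4)
WORD.pack_into(block, 4, 42)   # the program initializes only the second word
print(untouched_words(block))  # words the program never wrote
print([f"{value:08X}" for (value,) in WORD.iter_unpack(block)])
```

A value that the program legitimately stores may happen to equal the pattern, so a check like this flags likely uninitialized reads rather than proving them.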

Famous and common examples include:

Note that most of these are 8 nibbles (32 bits) long, as most modern computers are designed to work on 32-bit values at a time.

The prevalence of these values in Microsoft technology is no coincidence; they are discussed in detail in Steve Maguire's well-known book Writing Solid Code from Microsoft Press. He gives a variety of criteria for these values, such as:

Since they were often used to mark areas of memory that were essentially empty, some of these terms came to be used in phrases meaning "gone, aborted, flushed from memory"; e.g. "Your program is DEADBEEF".

Pietr Brandehörst's ZUG programming language initialized memory to 0x0000, 0xDEAD or 0xFFFF in the development environment and to 0x0000 in the live environment, on the basis that uninitialized variables should be encouraged to misbehave under development so that they can be trapped, but encouraged to behave in a live environment to reduce errors.

References

  1. Odd Comments and Strange Doings in Unix
  2. Personal communication with Dennis M. Ritchie
