
Wednesday, May 9, 2018

Numeral systems, measurement units and capacity

On September 23rd, 1999, NASA lost contact with its Mars Climate Orbiter (MCO) when it burned up unexpectedly on the day that should have ended in celebration of it entering Mars’ orbit. The failure was due to one team using English units (e.g. inches, feet, and pounds) while the other used metric units (e.g. centimeters, meters, and kilograms) for key spacecraft operations that steered the MCO through space. Instead of putting the MCO into Mars’ orbit, the mismatch put it on a trajectory too close to the planet, and it burned up in the Martian atmosphere.

This may be one of the most extreme examples of what can happen when organizations do not clarify which measurement units are being used, but it highlights how important it is to get this right. To draw a parallel with IT, organizations that don’t understand which measurement units have been expressed could end up with too much or too little capacity for data in transit, in storage, or both.

Numeral and measurement systems have had internationally defined standards as far back as the 19th century. Organizations such as the International Organization for Standardization (ISO) and the Bureau International des Poids et Mesures (BIPM, known in English as the International Bureau of Weights and Measures) maintain and update these in conjunction with many other global organizations. Other organizations such as the International Electrotechnical Commission (IEC) and the Institute of Electrical and Electronics Engineers (IEEE) have worked with the ISO and the BIPM to define specifics for the field of IT.




The standards used in IT for decimal multiples are called the International System of Units (SI) prefixes and, at time of writing (December 4th, 2017), are documented in the 8th edition (2006) of the International System of Units brochure, available at the following link: https://www.bipm.org/utils/common/pdf/si_brochure_8.pdf#page=127
This edition recognizes that the SI prefixes should not be used when expressing binary multiples. Instead, the prefixes for binary multiples as defined in IEC 60027-2 and by the IEEE should be used in the field of IT to avoid incorrect usage of the SI prefixes. IEC 60027-2 has since been superseded by IEC 80000-13:2008 (see the IEC SI Zone page, "Prefixes for binary multiples").

Tables 1 & 2 detail the SI Unit (Decimal) Prefixes and IEC (Binary) Prefixes, including their names, factors, and origins.


| Value (No. bytes) | Unit Prefix Name | Unit Prefix Symbol | Name | Factor | Origin |
|---|---|---|---|---|---|
| 1,000 | kilo | kB | Kilobyte | 10^3 | thousand |
| 1,000,000 | mega | MB | Megabyte | 10^6 | million |
| 1,000,000,000 | giga | GB | Gigabyte | 10^9 | billion |
| 1,000,000,000,000 | tera | TB | Terabyte | 10^12 | trillion |
| 1,000,000,000,000,000 | peta | PB | Petabyte | 10^15 | quadrillion |
| 1,000,000,000,000,000,000 | exa | EB | Exabyte | 10^18 | quintillion |

Table 1 SI Unit Prefixes (Decimal)




| Value (No. bytes) | Unit Prefix Name | Unit Prefix Symbol | Name | Factor | Origin |
|---|---|---|---|---|---|
| 1,024 | kibi | KiB | Kibibyte | 2^10 | Kilobinary ~thousand |
| 1,048,576 | mebi | MiB | Mebibyte | 2^20 | Megabinary ~million |
| 1,073,741,824 | gibi | GiB | Gibibyte | 2^30 | Gigabinary ~billion |
| 1,099,511,627,776 | tebi | TiB | Tebibyte | 2^40 | Terabinary ~trillion |
| 1,125,899,906,842,624 | pebi | PiB | Pebibyte | 2^50 | Petabinary ~quadrillion |
| 1,152,921,504,606,846,976 | exbi | EiB | Exbibyte | 2^60 | Exabinary ~quintillion |

Table 2 IEC Prefixes (Binary)
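
The byte values in Tables 1 and 2 can be regenerated exactly rather than copied by hand, which matters for the larger binary values where spreadsheet tools tend to round the last digits. The following is a minimal Python sketch; the names SI_PREFIXES, IEC_PREFIXES and bytes_per_unit are illustrative choices, not part of any standard.

```python
# Decimal (SI) and binary (IEC) byte units as (symbol, exponent) pairs.
# The list and function names here are illustrative assumptions.
SI_PREFIXES = [("kB", 3), ("MB", 6), ("GB", 9), ("TB", 12), ("PB", 15), ("EB", 18)]
IEC_PREFIXES = [("KiB", 10), ("MiB", 20), ("GiB", 30), ("TiB", 40), ("PiB", 50), ("EiB", 60)]


def bytes_per_unit(base: int, exponent: int) -> int:
    """Return the exact number of bytes in one unit (Python ints are arbitrary precision)."""
    return base ** exponent


if __name__ == "__main__":
    for symbol, exp in SI_PREFIXES:
        print(f"{symbol:>3} = 10^{exp:<2} = {bytes_per_unit(10, exp):,} bytes")
    for symbol, exp in IEC_PREFIXES:
        print(f"{symbol:>3} =  2^{exp:<2} = {bytes_per_unit(2, exp):,} bytes")
```

Because the calculation uses integers only, the PiB and EiB rows come out to the last digit.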


Using the data from Table 1 and Table 2, let’s say we express a new capacity requirement as 512 TB when the actual requirement is 512 TiB. 512 TB equals 512,000,000,000,000 bytes, whereas 512 TiB equals 562,949,953,421,312 bytes.

Just like the error that led to the failure of the MCO, using the incorrect measurement unit in this example would result in a shortfall of 50,949,953,421,312 bytes, or 46.34 TiB.
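
The 512 TB versus 512 TiB arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the constant and variable names are illustrative.

```python
# Bytes per unit, from Table 1 (decimal) and Table 2 (binary).
TB = 10 ** 12
TIB = 2 ** 40

requested = 512 * TB    # what was asked for: 512 TB
required = 512 * TIB    # what was actually needed: 512 TiB

shortfall_bytes = required - requested
shortfall_tib = shortfall_bytes / TIB

print(f"Requested: {requested:,} bytes")
print(f"Required:  {required:,} bytes")
print(f"Shortfall: {shortfall_bytes:,} bytes ({shortfall_tib:.2f} TiB)")
# Shortfall: 50,949,953,421,312 bytes (46.34 TiB)
```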
Now, there is some good news: at the capacity ranges that are common on today’s storage platforms, a mix-up may not result in something as catastrophic as the Mars example! Typical primary storage requirements are measured in or around trillions of bytes (TB or TiB), with a growing number creeping into the quadrillions (PB or PiB).

At the terabyte range, the margin of error if the units are mistaken is about 9.95%: a TiB is 9.95% larger than a TB, so you either gain that much over the decimal figure or fall about 9.05% short of the binary one. At the petabyte range the gap widens to about 12.59%. On its own, this may not necessarily represent a problem, as some (but not all) solutions will provision marginally more than what has been requested to cover things like pool reserved capacities or drive protection overheads. However, if you’re on the side of the shortfall and this gets compounded with an assumption on data reduction technologies that is beyond what the system can do for the workload in question, you may find the solution short on capacity right out of the gate.
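
To see how the margin of error grows with each prefix, the gap between a binary unit and its decimal counterpart can be computed directly. The sketch below is illustrative only; gap_percent is a hypothetical helper name, and it reports the gap both relative to the decimal figure (the gain) and relative to the binary figure (the shortfall).

```python
def gap_percent(decimal_exp: int, binary_exp: int) -> tuple[float, float]:
    """Return (gain %, shortfall %) between 2**binary_exp and 10**decimal_exp.

    gain:      how much larger the binary unit is, relative to the decimal unit
    shortfall: how much smaller the decimal unit is, relative to the binary unit
    """
    decimal = 10 ** decimal_exp
    binary = 2 ** binary_exp
    difference = binary - decimal
    return difference / decimal * 100, difference / binary * 100


if __name__ == "__main__":
    pairs = [("kB/KiB", 3, 10), ("MB/MiB", 6, 20), ("GB/GiB", 9, 30),
             ("TB/TiB", 12, 40), ("PB/PiB", 15, 50), ("EB/EiB", 18, 60)]
    for label, dec_exp, bin_exp in pairs:
        gain, shortfall = gap_percent(dec_exp, bin_exp)
        print(f"{label}: gain {gain:.2f}%, shortfall {shortfall:.2f}%")
```

The gap starts at 2.4% for kB versus KiB and increases with every step up, so the cost of confusing the two systems rises as capacities grow.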
