pipersbytes: May 2018

Wednesday, May 9, 2018

The Pipersbytes Library: Practical Storage Area Networking

When I got a date for an interview with EMC back in late 2006, I started reading a book I had bought from Amazon called “Practical Storage Area Networking” by Daniel Pollack, published in 2002. Pollack was a systems administrator at America Online at the time, and I only bought the book because I had glanced at the acknowledgements page online and spotted that “EMC Corp.” was mentioned. I thought I should really read it, as I had an interview with EMC in a few weeks’ time!

Pollack’s book turned out to be one of the best sources of information I had on storage and workload prior to joining EMC and it still sits on my desk to this day if I need a gentle reminder of the principles he described. They are as useful now as they were then. At the time, I hadn’t read a lot of performance books or articles and storage was only something I’d tinkered with in the form of a few DAS boxes and an occasional login to a CX300. Therefore, what Pollack described in his book was new knowledge and I loved every word of it.

If I were to summarize the salient points of chapters two and three of his book, it would go something like this: storage workloads aren’t simply a number of IOPS; those IOPS also have a size, which Pollack called bandwidth (years later, I prefer the term throughput to bandwidth when describing the size of moving data). The number and throughput of IOPS, combined with their read/write ratio, random-to-sequential ratio and cache hit ratio, determine the type and configuration of solution required to move the customer’s workload within a given response time objective.
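To make the relationship between the number of IOPS and their size concrete, here is a minimal Python sketch. It assumes a single, fixed block size for every IO (real workloads mix block sizes) and decimal units (1 MB = 1,000 kB); the function names and the 20,000 IOPS workload are hypothetical.

```python
from typing import Tuple


def throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """Throughput in MB/s for a workload with a single, fixed block size in kB."""
    # Decimal (SI) units: 1 MB = 1,000 kB
    return iops * block_size_kb / 1_000


def split_read_write(iops: float, read_percent: float) -> Tuple[float, float]:
    """Split total IOPS into (read IOPS, write IOPS) from a read percentage."""
    if not 0 <= read_percent <= 100:
        raise ValueError("read_percent must be between 0 and 100")
    reads = iops * read_percent / 100
    return reads, iops - reads


if __name__ == "__main__":
    # Hypothetical workload: 20,000 IOPS of 8 kB blocks with a 70/30 read/write ratio
    print(throughput_mb_s(20_000, 8))    # 160.0
    print(split_read_write(20_000, 70))  # (14000.0, 6000.0)
```

The same 20,000 IOPS at 64 kB blocks would need eight times the throughput (1,280 MB/s), which is why the IOPS count alone doesn't size a solution.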

By the time I joined EMC in 2007, many companies, including Compellent and EMC, were well on the way to developing technologies that would take advantage of two other key workload characteristics not covered by Pollack: IO density and skew. I’ve added these to the summary table below.

Term	Definition
IOPS	Workload requests sent to the array, measured as input/output operations per second (IOPS). The number, size and other characteristics of IOs put pressure on different components of a storage system, and through innovation we have learned to use some of these characteristics to our advantage.
Throughput (MB/s or GB/s), which Pollack called bandwidth	Throughput is the collective size of all IOs per second, typically measured in megabytes per second (MB/s) or, on today's larger platforms, gigabytes per second (GB/s). The size of an individual IO is known as its block size, typically measured in kilobytes (kB). I like to call throughput “the size of moving data”. When modelling a solution, the challenge is to find a series of pipes with a large enough diameter to let the moving data through.
Bandwidth	The maximum throughput of a device or component; think of this as the maximum diameter of a tube or pipe. Be careful, as this can be expressed in MB/s or in megabits per second (Mb/s), the latter typically being used to describe the maximum capability of a port, lane or bus.
Read-write ratio	The ratio of read IOs to write IOs in the workload.
Random-sequential ratio	The ratio of random IOs to sequential IOs in the workload.
Cache hit ratio	The proportion of IOs that are served from cache rather than from the back-end drives.
Response time	The amount of time it takes a storage system to respond to a request.
Skew	A measurement of how activity is distributed between active and non-active data. Skew enables data to be placed on different drive technologies for cost and performance optimization as the temperature of the data changes. The data is typically grouped into extents or chunks whose size varies depending on the technology used on the primary storage system.
IO density	Workload to capacity ratio, typically measured as the number of IOPS to either front-end or back-end capacity.
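The bandwidth, IO density and skew rows can also be expressed as small calculations. The sketch below is a hedged illustration rather than any vendor's method: it ignores encoding and protocol overheads when converting Mb/s to MB/s, measures IO density as IOPS per TB, and approximates skew as the smallest share of equally sized extents that serves a target share of IOs. All function names and figures are hypothetical.

```python
from typing import List


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second (Mb/s) to megabytes per second (MB/s)."""
    # 8 bits per byte; line encoding and protocol overheads are ignored
    return mbit_per_s / 8


def io_density(iops: float, capacity_tb: float) -> float:
    """IOPS per TB of capacity (front-end or back-end, as the caller chooses)."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return iops / capacity_tb


def skew_capacity_fraction(extent_iops: List[int], io_share_percent: float) -> float:
    """Smallest fraction of equally sized extents that serves io_share_percent of all IOs."""
    total = sum(extent_iops)
    if total == 0:
        raise ValueError("no IO activity recorded")
    target = total * io_share_percent / 100
    running = 0
    # Visit the busiest ("hottest") extents first
    for count, iops in enumerate(sorted(extent_iops, reverse=True), start=1):
        running += iops
        if running >= target:
            return count / len(extent_iops)
    return 1.0


if __name__ == "__main__":
    print(mbit_to_mbyte(10_000))    # 1250.0 (a 10 Gb/s port, before overheads)
    print(io_density(10_000, 200))  # 50.0 IOPS per TB
    # Ten hypothetical extents of equal size with their observed IOPS
    extents = [500, 300, 100, 50, 20, 10, 10, 5, 5, 0]
    print(skew_capacity_fraction(extents, 80))  # 0.2
```

In this made-up example, 20% of the capacity serves 80% of the IOs, which is the kind of skew that makes placing hot extents on faster drives worthwhile.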

Numeral systems, measurement units and capacity

On September 23rd, 1999, NASA lost contact with its Mars Climate Orbiter (MCO), which burned up unexpectedly on the day that should have ended in celebration of its entry into Mars’ orbit. The failure was due to one team using English units (e.g. inches, feet and pounds) while the other used metric units (e.g. centimeters, meters and kilograms) for key spacecraft operations that steered the MCO through space. Instead of entering Mars’ orbit, the MCO was put on a trajectory too close to the planet, and as a result it burned up in the Martian atmosphere.

This may be one of the most extreme examples of what can happen when organizations do not clarify which measurement units are being used, but it serves to highlight the importance of understanding this critical factor. To draw a parallel with IT, organizations that don’t understand which measurement units have been used could end up with too much or too little capacity for data in transit, storage or both.

Numeral and measurement systems have had internationally defined standards since as far back as the 19th century. Organizations such as the International Organization for Standardization (ISO) and the Bureau International des Poids et Mesures (BIPM), known in English as the International Bureau of Weights and Measures, maintain and update these in conjunction with many other global organizations. Other organizations, such as the International Electrotechnical Commission (IEC) and the Institute of Electrical and Electronics Engineers (IEEE), have worked with the ISO and the BIPM to define specifics for the field of IT.

The standards used in IT for decimal multiples are the International System of Units (SI) prefixes, which at the time of writing (December 4th, 2017) are documented in the 8th edition (2006) of the International System of Units brochure, available at the following link: https://www.bipm.org/utils/common/pdf/si_brochure_8.pdf#page=127
This edition recognizes that the SI prefixes should not be used to express binary multiples. Instead, the prefixes for binary multiples defined in IEC 60027-2 and by the IEEE should be used in the field of IT to avoid incorrect usage of the SI prefixes. IEC 60027-2 has since been superseded by IEC 80000-13:2008 (see “Prefixes for binary multiples” in the IEC SI Zone).

Tables 1 & 2 detail the SI Unit (Decimal) Prefixes and IEC (Binary) Prefixes, including their names, factors, and origins.

Value (No. bytes)	Prefix	Unit symbol	Unit name	Factor	Origin
1,000	kilo	kB	Kilobyte	10³	thousand
1,000,000	mega	MB	Megabyte	10⁶	million
1,000,000,000	giga	GB	Gigabyte	10⁹	billion
1,000,000,000,000	tera	TB	Terabyte	10¹²	trillion
1,000,000,000,000,000	peta	PB	Petabyte	10¹⁵	quadrillion
1,000,000,000,000,000,000	exa	EB	Exabyte	10¹⁸	quintillion

Table 1 SI Unit Prefixes (Decimal)

Value (No. bytes)	Prefix	Unit symbol	Unit name	Factor	Origin
1,024	kibi	KiB	Kibibyte	2¹⁰	Kilobinary ~thousand
1,048,576	mebi	MiB	Mebibyte	2²⁰	Megabinary ~million
1,073,741,824	gibi	GiB	Gibibyte	2³⁰	Gigabinary ~billion
1,099,511,627,776	tebi	TiB	Tebibyte	2⁴⁰	Terabinary ~trillion
1,125,899,906,842,624	pebi	PiB	Pebibyte	2⁵⁰	Petabinary ~quadrillion
1,152,921,504,606,846,976	exbi	EiB	Exbibyte	2⁶⁰	Exabinary ~quintillion

Table 2 IEC Prefixes (Binary)
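The values in both tables follow directly from the factors: each SI step multiplies by 10³ and each IEC step by 2¹⁰. Because Python integers have arbitrary precision, the short sketch below reproduces the exact byte counts; tools that store numbers as floating point, such as spreadsheets, can round away the trailing digits of the larger binary values. The function name is hypothetical.

```python
from typing import List, Tuple

SI_SYMBOLS = ["kB", "MB", "GB", "TB", "PB", "EB"]
IEC_SYMBOLS = ["KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def prefix_table() -> List[Tuple[str, int, str, int]]:
    """Return (SI symbol, bytes, IEC symbol, bytes) for kilo/kibi through exa/exbi."""
    rows = []
    for n, (si, iec) in enumerate(zip(SI_SYMBOLS, IEC_SYMBOLS), start=1):
        # Exact integer arithmetic: 10**(3n) for SI, 2**(10n) for IEC
        rows.append((si, 10 ** (3 * n), iec, 2 ** (10 * n)))
    return rows


if __name__ == "__main__":
    for si, si_bytes, iec, iec_bytes in prefix_table():
        print(f"{si:>2} = {si_bytes:>25,}    {iec:>3} = {iec_bytes:>25,}")
```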

Using the data from Table 1 and Table 2, let’s say we express a new capacity requirement as 512 TB when the actual capacity requirement is 512 TiB. 512 TB equals 512,000,000,000,000 bytes, whereas 512 TiB equals 562,949,953,421,312 bytes.

Just like the error that led to the failure of the MCO, in this example using the incorrect measurement unit would result in a shortfall of 50,949,953,421,312 bytes, or 46.34 TiB.
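The shortfall arithmetic can be checked with exact integers. A minimal sketch; the constant and function names are hypothetical.

```python
TB = 10 ** 12   # terabyte (SI, decimal)
TIB = 2 ** 40   # tebibyte (IEC, binary)


def unit_mixup_shortfall(amount: int) -> int:
    """Bytes missing when `amount` TiB is needed but `amount` TB is requested."""
    return amount * TIB - amount * TB


if __name__ == "__main__":
    print(f"{512 * TB:,}")          # 512,000,000,000,000
    print(f"{512 * TIB:,}")         # 562,949,953,421,312
    gap = unit_mixup_shortfall(512)
    print(f"{gap:,} bytes")         # 50,949,953,421,312 bytes
    print(f"{gap / TIB:.2f} TiB")   # 46.34 TiB
```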
Now, there is some good news: at the capacity ranges common on today’s storage platforms, a mix-up may not result in anything as catastrophic as the Mars example! Typical primary storage requirements are measured in trillions of bytes (TB or TiB), with a growing number creeping into the quadrillions (PB or PiB).

At these ranges, mixing up the units means you’ll either lose or gain capacity: a binary quantity is about 9.95% larger than the same number of decimal units at the tera scale (TiB versus TB), and about 12.59% larger at the peta scale (PiB versus PB). On its own, this may not necessarily be a problem, as some (but not all) solutions will provide marginally more capacity than was requested to cover things like pool reserved capacity or drive protection overheads. However, if you’re on the side of the shortfall and this is compounded by an assumption about data reduction that is beyond what the system can achieve for the workload in question, you may find the solution short on capacity right out of the gate.
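The size of the gap depends on the prefix step, and it compounds with any data reduction assumption. The sketch below is illustrative only: the 512 TB usable figure, the 1,536 TiB logical requirement and the 3:1 and 2.5:1 data reduction ratios are hypothetical, not measured results from any platform.

```python
TB = 10 ** 12   # terabyte (SI, decimal)
TIB = 2 ** 40   # tebibyte (IEC, binary)


def binary_over_decimal_percent(step: int) -> float:
    """How much larger (%) the binary unit is than the decimal one.

    step=4 compares TiB with TB; step=5 compares PiB with PB.
    """
    return (2 ** (10 * step) / 10 ** (3 * step) - 1) * 100


def effective_tib(usable_bytes: int, reduction_ratio: float) -> float:
    """Logical capacity in TiB once a data reduction ratio (e.g. 3.0 for 3:1) is applied."""
    return usable_bytes * reduction_ratio / TIB


if __name__ == "__main__":
    print(f"TiB vs TB: {binary_over_decimal_percent(4):.2f}%")  # TiB vs TB: 9.95%
    print(f"PiB vs PB: {binary_over_decimal_percent(5):.2f}%")  # PiB vs PB: 12.59%
    # Hypothetical: 1,536 TiB of logical data planned on 512 "TB" at an assumed 3:1
    usable = 512 * TB
    for ratio in (3.0, 2.5):
        print(f"{ratio}:1 -> {effective_tib(usable, ratio):,.2f} TiB of 1,536 TiB needed")
```

Planning on 512 TiB at 3:1 would cover the 1,536 TiB exactly; ordering 512 TB instead already leaves roughly 1,397 TiB, and a workload that only reduces at 2.5:1 drops that to roughly 1,164 TiB.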
