Today is my last post at Blogspot as I'm transitioning to another site over at pipersbytes.blog.
Farewell blogspot!
pipersbytes
bits and bytes from the tartan data centre
Wednesday, November 20, 2019
Saturday, July 7, 2018
Thoughts on overload and redundancy
| Term | Definition |
| --- | --- |
| Overload | To load something to excess. |
| Redundancy | The inclusion of extra components which are not strictly necessary to functioning, in case of failure of other components. |
Introduction
Earlier in the year, I read an article on the many different types of failures encountered during the building of the Humber Bridge in the UK. This article put me in search of other bridge-related failures, but this time I looked for examples where overload was a contributing factor in the collapse of the bridge itself. Some accounts go into great depth, detailing how particular bridges were sized, the materials, systems and processes used to build them, what caused them to fail, and the various governmental and engineering innovations that followed failure events to mitigate future risk. There is one bridge failure that I explored over a number of books and various online materials, as this event proved pivotal: it acted as a catalyst for governments and engineers to come up with new approaches to the design, regulation and lifecycle management of bridges in the United States.
The Silver Bridge Collapse
At approximately 5pm on Friday, December 15th, 1967, the Silver Bridge collapsed into the Ohio River, tragically killing forty-six people.
When the bridge was built in 1928, the typical car was a Ford Model T that weighed around 1,600 lbs, and local law prohibited any truck heavier than 20,000 lbs. The bridge was designed with a safety factor of 1.5, so it was capable of supporting cars up to 2,400 lbs and trucks up to 30,000 lbs.
As the years passed, vehicles got heavier, and by the sixties a typical car weighed 4,000 lbs, with trucks weighing 60,000 to 70,000 lbs. The loads had almost tripled.
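To make those figures a little more concrete, here's a back-of-the-envelope sketch in Python using the approximate numbers quoted above (they are the rough values from the post, not engineering data):

```python
# Back-of-the-envelope check of the Silver Bridge loads quoted above.
# All figures are the approximate ones from the post, not engineering data.
design_car_lbs   = 1_600      # typical 1928 car (Ford Model T)
design_truck_lbs = 20_000     # legal truck limit at the time
safety_factor    = 1.5

max_car_lbs   = design_car_lbs * safety_factor     # 2,400 lbs
max_truck_lbs = design_truck_lbs * safety_factor   # 30,000 lbs

car_1960s_lbs   = 4_000                             # typical 1960s car
truck_1960s_lbs = 70_000                            # heaviest trucks by the 1960s

print(f"Car:   designed for {max_car_lbs:,.0f} lbs, seeing up to {car_1960s_lbs:,} lbs")
print(f"Truck: designed for {max_truck_lbs:,.0f} lbs, seeing up to {truck_1960s_lbs:,} lbs")
```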
In the years before its collapse, back-to-back traffic was typical, and the tools and processes used for bridge inspection at the time were not capable of identifying a fracture that had appeared in one of the eye-bars. Stress corrosion and corrosion fatigue were listed as the probable cause, with loading being a major contributing factor in the collapse. The following year, then-President Lyndon Johnson and Congress brought into law the National Bridge Inspection Standards, and the United States Department of Transportation enforced tighter regulations around weight limits. The regulations also enforced regular inspection and brought with them the start of a national database holding construction and inspection information for all federal bridges.
Inspection, loading and redundancy are key elements associated with bridge engineering, and indeed they are also crucial elements of modern information technology infrastructure systems.
Taking Dell EMC's VMAX & PowerMax product range as an example (other products have similar tooling), some great tools enable the presales sizing function and post-sale performance management:
- Sizing: VMAX Sizer gives subject matter experts from Dell EMC and selected partners the ability to size a solution based on your business objectives, capacity and workload requirements.
- Performance Management: Unisphere offers real-time component health, response time alerting, component performance utilization and capacity utilization, and enables proactive decision making so you can make informed decisions about when it's time to scale up or scale out.
- Maintenance: Secure Remote Services (formerly ESRS) gives subject matter experts from Dell EMC secure access to systems during support activities and regular maintenance.
Talk to your local sales team!
Thanks for reading and please feel free to leave comments or feedback.
Wednesday, May 9, 2018
The Pipersbytes Library: Practical Storage Area Networking
When I got a date for an interview with EMC back in late 2006, I started reading a book I'd bought from Amazon called “Practical Storage Area Networking” by Daniel Pollack, published in 2002. Pollack was a systems administrator with America Online at the time, and I only bought the book because I had glanced at the acknowledgements page online and spotted that “EMC Corp.” was mentioned. I thought I should really read this, as I had an interview with EMC in a few weeks' time!
Pollack’s book turned out to be one of the best sources of information I had on storage and workload prior to joining EMC and it still sits on my desk to this day if I need a gentle reminder of the principles he described. They are as useful now as they were then. At the time, I hadn’t read a lot of performance books or articles and storage was only something I’d tinkered with in the form of a few DAS boxes and an occasional login to a CX300. Therefore, what Pollack described in his book was new knowledge and I loved every word of it.
If I were to summarize the salient pieces of chapters two and three of his book, it would go something like this: storage workloads aren't simply a number of IOPS; each of those IOPS also has a size, which Pollack called bandwidth (years later, I prefer the term throughput rather than bandwidth when describing the size of moving data). Combined, the number and throughput of IOPS, in conjunction with their read/write ratio, random-to-sequential ratio and cache hit ratio, is what determines the type and configuration of solution required to move the customer's workload within a certain response time objective.
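As a rough illustration of how those characteristics combine, here's a minimal Python sketch; the workload figures are purely hypothetical and only there to show the arithmetic:

```python
# Hypothetical workload: how IOPS, IO size and read/write ratio
# roll up into the headline figures a sizing exercise needs.
iops       = 20_000    # total I/O operations per second
io_size_kb = 16        # average I/O size in KB
read_ratio = 0.7       # 70% reads / 30% writes

throughput_mb_s = iops * io_size_kb / 1024   # MB/s of data on the move
read_iops  = iops * read_ratio
write_iops = iops - read_iops

print(f"Throughput: {throughput_mb_s:.1f} MB/s")
print(f"Reads: {read_iops:,.0f} IOPS, writes: {write_iops:,.0f} IOPS")
```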
By the time I joined EMC in 2007, many companies, including Compellent and EMC, were well on the way to developing technologies that would take advantage of two other very key workload characteristics not covered by Pollack: IO density and skew. I've added these to the summary table below.
| Term | Definition |
| --- | --- |
| IOPS | The number of I/O operations performed per second. |
| Throughput (MB/s or GB/s) | The volume of data moved per second (Pollack called this bandwidth). |
| Bandwidth | The maximum rate at which a link or component can move data. |
| Read/write ratio | The proportion of read operations to write operations in the workload. |
| Random/sequential ratio | The proportion of random I/O to sequential I/O in the workload. |
| Cache hit ratio | The percentage of I/O requests serviced from cache rather than the backend. |
| Response time | The time taken to complete an individual I/O operation. |
| Skew | The degree to which a small portion of the capacity services the majority of the I/O. |
| IO density | The number of IOPS per unit of capacity (e.g. IOPS per GB or TB). |
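To show how IO density and skew fall out of the same kind of numbers, here's a small Python sketch over a set of hypothetical volumes; the capacities and IOPS figures are invented purely for illustration:

```python
# Hypothetical per-volume stats: (capacity in GB, IOPS).
# A sketch of how IO density and skew might be derived.
volumes = [(2048, 9000), (4096, 600), (1024, 300), (8192, 100)]

total_gb = sum(gb for gb, _ in volumes)
total_iops = sum(io for _, io in volumes)

io_density = total_iops / total_gb        # IOPS per GB across the whole estate
print(f"IO density: {io_density:.2f} IOPS/GB")

# Skew: what fraction of the IOPS is served by the busiest slice of capacity.
busiest = sorted(volumes, key=lambda v: v[1] / v[0], reverse=True)
hot_gb = hot_iops = 0
for gb, io in busiest:
    hot_gb += gb
    hot_iops += io
    if hot_gb >= 0.2 * total_gb:          # stop once ~20% of capacity is counted
        break
print(f"~{hot_iops / total_iops:.0%} of IOPS served by ~{hot_gb / total_gb:.0%} of capacity")
```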
Labels:
Bandwidth,
Daniel Pollack,
IO density,
IOPS,
Response Time,
Skew,
Throughput
Numeral systems, measurement units and capacity
On September 23rd, 1999, NASA lost contact with its Mars Climate Orbiter (MCO) as it burned up unexpectedly on the day that should have ended in celebration of it entering Mars' orbit. The failure was due to one team using English units (e.g. inches, feet and pounds) while the other used metric units (e.g. centimeters, meters and kilograms) for key spacecraft operations that steered the MCO through space. Instead of putting the MCO into Mars' orbit, the error put it on a trajectory too close to the planet, with the result that it burned up in the Martian atmosphere.
This may be one of the most radical examples of what can happen when organizations do not clarify which measurement units are being used but it serves the purpose of highlighting the importance of understanding this critical factor. To draw a parallel with IT, organizations that don’t understand which measurement units have been expressed could end up with too much or too little capacity for data in transit, storage or both.
Numeral and measurement systems have had internationally defined standards as far back as the 19th century. Organizations such as the International Organization for Standardization (ISO) and the Bureau International des Poids et Mesures (BIPM), known in English as the International Bureau of Weights and Measures, maintain and update these in conjunction with many other global organizations. Other bodies such as the International Electrotechnical Commission (IEC) and the Institute of Electrical and Electronics Engineers (IEEE) have worked with the ISO and the BIPM to define specifics for the field of IT.
The standards used in IT for decimal multiples are the International System of Units (SI) prefixes, which at the time of writing (December 4th, 2017) are documented in the 8th edition (2006) of the International System of Units brochure, available at the following link: https://www.bipm.org/utils/common/pdf/si_brochure_8.pdf#page=127
This edition recognizes that the SI prefixes should not be used when expressing binary multiples. Instead, the prefixes for binary multiples defined by IEC 60027-2 and the IEEE should be used in the field of IT to avoid incorrect usage of the SI prefixes. IEC 60027-2 has since been superseded by IEC 80000-13:2008 (see the IEC SI Zone page on prefixes for binary multiples).
Tables 1 & 2 detail the SI Unit (Decimal) Prefixes and IEC (Binary) Prefixes, including their names, factors, and origins.
Table 1: SI (Decimal) Prefixes

| Value (No. bytes) | Prefix | Symbol | Name | Factor | Origin |
| --- | --- | --- | --- | --- | --- |
| 1,000 | kilo | kB | Kilobyte | 10³ | thousand |
| 1,000,000 | mega | MB | Megabyte | 10⁶ | million |
| 1,000,000,000 | giga | GB | Gigabyte | 10⁹ | billion |
| 1,000,000,000,000 | tera | TB | Terabyte | 10¹² | trillion |
| 1,000,000,000,000,000 | peta | PB | Petabyte | 10¹⁵ | quadrillion |
| 1,000,000,000,000,000,000 | exa | EB | Exabyte | 10¹⁸ | quintillion |
Table 2: IEC (Binary) Prefixes

| Value (No. bytes) | Prefix | Symbol | Name | Factor | Origin |
| --- | --- | --- | --- | --- | --- |
| 1,024 | kibi | KiB | Kibibyte | 2¹⁰ | kilobinary (~thousand) |
| 1,048,576 | mebi | MiB | Mebibyte | 2²⁰ | megabinary (~million) |
| 1,073,741,824 | gibi | GiB | Gibibyte | 2³⁰ | gigabinary (~billion) |
| 1,099,511,627,776 | tebi | TiB | Tebibyte | 2⁴⁰ | terabinary (~trillion) |
| 1,125,899,906,842,624 | pebi | PiB | Pebibyte | 2⁵⁰ | petabinary (~quadrillion) |
| 1,152,921,504,606,846,976 | exbi | EiB | Exbibyte | 2⁶⁰ | exabinary (~quintillion) |
Using the data from Tables 1 and 2, let's say we express a new capacity requirement as 512 TB, as opposed to the actual requirement of 512 TiB. 512 TB equals 512,000,000,000,000 bytes, whereas 512 TiB equals 562,949,953,421,312 bytes.
Just like the error that led to the loss of the MCO, using the incorrect measurement unit in this example would result in a shortfall of 50,949,953,421,312 bytes, or 46.34 TiB.
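If you want to sanity-check those figures, a couple of lines of Python reproduce them:

```python
# Reproduce the 512 TB vs 512 TiB shortfall from the example above.
TB  = 10**12    # terabyte  (SI, decimal)
TiB = 2**40     # tebibyte  (IEC, binary)

requested = 512 * TB     # what was asked for
required  = 512 * TiB    # what was actually needed

shortfall = required - requested
print(f"Shortfall: {shortfall:,} bytes ({shortfall / TiB:.2f} TiB)")
print(f"Relative to the requested figure: {shortfall / requested:.2%}")
```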
Now, there is some good news: at the capacity ranges that are common on today's storage platforms, a mix-up may not result in something as catastrophic as the Mars example! Typical primary storage requirements are measured in or around trillions of bytes (TB or TiB), with a growing number creeping into the quadrillions (PB or PiB).
At these ranges, the margin of error if the measurement units are mixed up is that you'll either lose or gain up to 9.95%. On its own, this may not necessarily represent a problem, as some (but not all) solutions will provide marginally more than what has been requested to cover things like pool reserved capacities or drive protection overheads. However, if you're on the side of the 9.95% shortfall and this gets compounded with an assumption about data reduction technologies that is beyond what the system can achieve for the workload in question, you may find the solution short on capacity right out of the gate.