Cộng đồng thành viên bong88

You are not logged in.

- Topics: Active | Unanswered

**RosalynQui****Member**- From: Switzerland, Butschwil
- Registered: 2020-08-19
- Posts: 1
- Website

What is Erasure Coding By Bill Jones, Enterprise Architect One of the newest features to arrive in large-scale storage is erasure coding.

What catches people off-guard is that erasure coding protects against data loss.

â€œErasureâ€ anything as â€œdata protectionâ€ makes for an unexpected combination of words.

So, Iâ€™d like to take a few minutes to tell you about how we got here and why.

Why Isnâ€™t Parity Enough.

Storage and server admins are often familiar with the word â€œparity.â€ RAID arrays can use parity to protect against a drive failure (or two).

As an admin, I had known those details for a few years before I learned what parity is.

With odd parity, the total number of ones must be odd; with even parity, the total number of ones must be even.

So, as an example, letâ€™s use even parity to protect an 8-bit piece of information â€“ 01101001.

When we combine the data and the parity bit, we want an even number of 1s.

The data contains four 1s.

Four is an even number.

Therefore the parity bit is 0.

If someone were to erase one of the bits from the sequence 01101001 and if we had access to the parity bit value 0, we can calculate the value of the missing bit.

If the bit that is erased is a 1, then there will be an odd number of ones.

The parity bit of 0 tells us that the number of 1s should be even.

So, the missing bit is a 1.

If the number of remaining bits is even, then the missing bit must be 0.

If the parity bit is missing, we just recalculate it.

In storage systems, .

Each block will reside on a different physical disk.

The size of the block will vary from manufacturer to manufacturer.

But, .

The 8+1 indicates that each stripe contains 8 data blocks and 1 parity block.

The parity calculation is done bit-by-bit across the stripe.

By that, I mean that the first bit of each data block in the stripe is used to calculate the first bit in the parity block; the second bit in each data block is used to calculate the second bit in the parity block; and so on, and so on, until all the bits in the block are protected.

One of the really cool things about this method is that the even parity calculation is a bitwise exclusive OR (a.k.a.

XOR) operation, and bitwise operations like AND, OR, and XOR are pretty fast, and that helps to keep rebuild times relatively low.

There are two problems with this method.

First, a single parity block per stripe can only protect against failure of a single drive.

Using our example above of 0110110 with parity of 0, if we lose two data bits and have an even number of ones remaining, we cannot be certain whether we lost two 1s or two 0s.

If we have an odd number of 1s, did we lose a 0 and a 1 or a 1 and a 0.

Second, as drives get larger and larger, it takes longer and longer to rebuild data when a drive fails.

And, if we havenâ€™t finished rebuilding before a second drive fails, we have the same problem we would have from losing two drives at once.

To address this, storage devices starting using dual-drive protection methods.

The most commonly known is RAID6; however, there are multiple methods available, and the exact method will vary based on manufacturer.

For example, if the first parity block is calculated using a traditional horizontal parity calculation, and the second parity block is calculated using a diagonal pattern (within the stripe), this gives us two independent parity calculations, and we can now recover from a dual drive failure.

With dual-drive protection, we can reduce the risk of a second drive failure occurring while the first drive is rebuilding.

Unfortunately, modern mechanical hard drives are large enough that rebuilds can take multiple days, and dual drive protection methods cannot protect against a triple drive failure.

(And I am one of those admins who has experienced a triple drive failure on a dual drive protected array.) Okay, .

The problem with coming up with new patterns for protecting parity isâ€¦ that it gets complicated.

For the calculations to work, we must ensure that each parity calculation is mathematically independent of the others.

This is difficult enough when doing it with two parity patterns.

But, with each additional parity pattern, careful analysis must be done to ensure that this independence exists and that it exists for every supported RAID group geometry.

For example, a diagonal pattern is relatively easy to envision with 2n data drives within each RAID group, but it is less easy to visualize when there are an odd number of drives.

Add in the complexity of multiple, non-horizontal parity calculations, and it quickly becomes a solution set that makes peopleâ€™s heads swim.

If data protection calculations are complicated with 2 parity blocks and really complicated with 3 parity blocks, what happens when hard drives get large enough that we need 4 or 5 parity blocks.

Erasure coding provides a more scalable method for protecting against larger and larger numbers of drive failures.

Wait, What.

I know this is a lot to take in.

I was a mathlete in school.

I have two patents in video encryption technology.

I do math for fun.

And it still took a lot of work for me to wrap my head around all of this.

So, if youâ€™re feeling lost, .

Parity uses XOR, which is fast.

Using parity to protect against a triple drive failure is really complicated.

Letâ€™s not do that.

Please hang in there.

Weâ€™re almost finished.

So, Erasure Coding Does What.

Erasure coding is a mathematical calculation for protecting data.

It uses a coding method to protect data from unwanted â€œerasuresâ€ â€“ like hard drive failures.

And thatâ€™s where it gets its name.

Erasure coding works like this: Divide the data into m segments Use a system of independent equations to calculate n data protection segments Itâ€™s two steps.

Thatâ€™s pretty simple.

If we want to protect against the â€œerasureâ€ of 3 pieces of data, we need 3 independent equations.

If we want to protect against 4, we need 4 equations.

If we want to have better storage efficiency, we use a higher ratio of data segments to data protection segments.

Erasure coding with a 9+3 algorithm data protection consumes 25% of the raw capacity.

With an erasure coding algorithm of 12+3, the data protection only consumes 20%.

What makes erasure coding so powerful is that it gives us a ready method for scaling our data protection levels as hard drives continue to grow.

In case youâ€™re wondering, a system of equations is a group of equations which share the same variables.

The equations are independent each equation cannot be created from the other equations in the solution set.

For example, these two equations are not independent: X + YÂ Â = 10 2X + 2Y = 20 The second equation can be created by multiplying each side of the first equation by two.

Every value of X and Y that solves the first equation will also solve the second equation.

there cannot be more than one set of values for X and Y that solves both equations.

In this case, the only valid solution is (X = -3) and (Y = 6).

Is There Anything Else I Should Know.

Yes.

Depending on which solution you go with, the erasure coding may occur on a per file (or per object) basis or may occur as part of the RAID implementation.

A given workload or data type may be better suited to one of these methods than it is to another method.

But, that topic is big enough that it probably deserves to be in its own blog post.

erasure coding and parity are both the names of mathematical algorithms which are designed to recover data that is erased or lost.

As hard drives get bigger, the rebuild times have grown from hours to days, and bigger hard drives are on the horizon.

but scaling has proven difficult.

Erasure coding gives us a ready model to scale beyond double-drive failures.

If you would like to learn more about enterprise storage solutions, SAN storage, object storage, etc., please contact [email protected] to setup an appointment.

I do math for fun.

So, in the hopes that somebody wants to see the math, here you goâ€¦ Letâ€™s look at Erasure Coding 9+3 â€“ that is 9 data segments and 3 data protection segments.

To make the math easier to follow, weâ€™ll use integers instead of 4KB data blocks.

Integers are a lot easier to type, and the principles are largely the same.

Also, weâ€™ll solve these equations using algebraic substitutions.

Computers would probably use multi-dimensional matrix-based math, and while matrices can make computation faster, they make explaining what is going on much more difficult.

For our 9 data values, weâ€™ll use A=1; B=3; C=5; D=7; E=2; F=4; G=6; H=8; & I=0 For our system of n independent equations, weâ€™ll use: P1 = A+B+C+D+E+F+G+H+I P2 = A+2B+3C+4D+5E+6F+7G+8H+9I P3 = A+4B+9C+16D+25E+36F+49G+64H+81I So, for our data: P1 = 36 P2 = 190 P3 = 1170 Now, what happens when the data for A, B, and C is â€œerasedâ€.

If we recalculate P1, P2, & P3, we get: P1 = A+B+C+27 P2 = A+2B+3C+168 P3 = A+3B+9C+1112 Substituting our previous calculated values for P1, P2, & P3, we get: 36 Â Â = A+B+C+27 190 Â = A+2B+3C+168 1170 = A+4B+9C+1112 Subtracting the integers from each side, we get: 9 Â = A+B+C 22 = A+2B+3C 58 = A+4B+9C If we subtract (9=A+B+C) from the bottom two equations, we get: 13 = B+2C 49 = 3B+8C If we multiply (13=B+2C) by 3 and then subtract it from the bottom equation, we get: 10 = 2C If we then divide both sides by 2, we get: C = 5 Having solved for C, we can then substitute in 5 for C and get: 9Â = A+B+5 22 = A+2B+15 Subtracting the integers from each side, we get: 4 = A+B 7 = A+2B If we subtract (4=A+B) from the bottom equations, we get: 3 = B Having solved for B, we can then substitute in 3 for B and get: 4 = A+3 Subtracting the integers from each side, we get: 1 = A The math would be different depending on which pieces of data were lost.

The point of the exercise above is to show that we can recover from the â€œerasureâ€ of three pieces of data using an erasure coding of 9+3.

The equations we use to calculate the data protection values do not need to be the ones I supplied here.

They can be any set of mathematically independent equations.

This post is powered by Mix Digital Marketing The post What is Erasure Coding appeared first on Dasher.

.

Also visit my webpage :: w88

Offline