Here is a fairly robust way to make sure a drive is safe to put into service. I have used this before and caught drives that would have failed shortly after being put into prod, and some that would have failed once they were more than half full.

  1. Check S.M.A.R.T info: confirm a count of zero (0) for Seek Error Rate, Read Error Rate, Reallocated Sector Count, and Uncorrectable Sector Count

  2. Run Short S.M.A.R.T test

  3. Repeat Step 1

  4. Run Conveyance S.M.A.R.T test

  5. Repeat Step 1

  6. Run Destructive Badblocks test (read and write)

  7. Repeat Step 1

  8. Perform a FULL Format (Overwrite with Zeros)

  9. Repeat Step 1

  10. Run Extended S.M.A.R.T test

  11. Repeat Step 1
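
As a rough sketch, the eleven steps above could map onto the commands below. This assumes a Linux host with smartmontools, e2fsprogs and GNU coreutils installed; the tool choices (`smartctl`, `badblocks`, `dd`) and the `burn_in_plan` helper are my own assumptions, not something the steps prescribe, and `/dev/sdX` is a placeholder. The script only prints the plan and runs nothing, because steps 6 and 8 erase the drive.

```python
import shlex

def burn_in_plan(device):
    """Return the burn-in steps as a list of argv lists (nothing is executed)."""
    check = ["smartctl", "-A", device]             # step 1: dump SMART attributes
    tests = [
        ["smartctl", "-t", "short", device],       # step 2: short self-test
        ["smartctl", "-t", "conveyance", device],  # step 4: conveyance self-test
        ["badblocks", "-wsv", device],             # step 6: DESTRUCTIVE write+read test
        ["dd", "if=/dev/zero", f"of={device}",     # step 8: overwrite with zeros
         "bs=1M", "status=progress"],
        ["smartctl", "-t", "long", device],        # step 10: extended self-test
    ]
    plan = [check]
    for test in tests:
        plan.append(test)
        plan.append(check)  # "Repeat Step 1" after every test
    return plan

if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the real device node.
    for number, argv in enumerate(burn_in_plan("/dev/sdX"), start=1):
        print(f"{number:2d}. {shlex.join(argv)}")
```

Note that `smartctl -t` only starts a self-test and returns immediately, so the following check should wait until the test has finished (`smartctl -l selftest` lists the results).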

Return the drive if either of the following is true:

A) The formatting speed falls more than 10MB/s below 80MB/s (my defective one was ~40MB/s from first power-on)

B) The S.M.A.R.T tests show an error count increasing at any step
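
The two return criteria could be checked with something like the sketch below. It assumes the attribute table printed by `smartctl -A` (raw value in the tenth column) and the attribute names smartmontools commonly reports; both vary by vendor, so treat `WATCHED` as a starting point. The 70MB/s floor is my reading of criterion A, the `SAMPLE` text is made up for illustration, and `parse_raw_values` and `should_return` are hypothetical helpers.

```python
# Attribute names as commonly shown by smartctl; vendor-specific, adjust as needed.
WATCHED = (
    "Raw_Read_Error_Rate",
    "Seek_Error_Rate",
    "Reallocated_Sector_Ct",
    "Offline_Uncorrectable",
)

# Made-up excerpt in the layout of `smartctl -A` output, for illustration only.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
"""

def parse_raw_values(smartctl_output):
    """Extract {attribute_name: raw_value} from `smartctl -A` table text."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values

def should_return(before, after, speed_mb_s=None, floor_mb_s=70.0):
    """Return a list of reasons to send the drive back (empty list = keep)."""
    reasons = []
    for name in WATCHED:
        old, new = before.get(name, 0), after.get(name, 0)
        if new > old:  # criterion B: a watched counter went up
            reasons.append(f"{name} rose from {old} to {new}")
    if speed_mb_s is not None and speed_mb_s < floor_mb_s:  # criterion A
        reasons.append(f"speed {speed_mb_s:.0f}MB/s is below {floor_mb_s:.0f}MB/s")
    return reasons

if __name__ == "__main__":
    before = parse_raw_values(SAMPLE)
    after = dict(before, Reallocated_Sector_Ct=8)  # pretend 8 sectors got remapped
    for reason in should_return(before, after, speed_mb_s=40):
        print(reason)
```

Taking a snapshot after every "Repeat Step 1" and comparing it with the previous one keeps the comparison relative, which matters on drives whose raw error-rate values are not plain counts.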

It is also highly advisable to stagger the testing (and repeat some of it) if you plan on using multiple drives in a pool/raid config. This way the wear on the drives differs, which reduces the likelihood of them failing at the same time. For example, I re-ran either the full format or the badblocks test on some of the drives, so some drives have 48 hours of testing, some have 72, and some have 96. This way, the chance of multiple drive failures during a rebuild is lower.
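
A minimal sketch of that staggering, assuming you simply cycle through the 48/72/96 hour totals mentioned above (the `stagger` helper and the drive names are hypothetical):

```python
from itertools import cycle

def stagger(drives, hours=(48, 72, 96)):
    """Assign each drive a total test duration, cycling through `hours`."""
    return dict(zip(drives, cycle(hours)))

if __name__ == "__main__":
    for drive, total in stagger(["sda", "sdb", "sdc", "sdd"]).items():
        print(f"{drive}: {total}h of testing")
```

With more drives than durations some totals repeat, so in a small pool it may be worth placing drives with the same total in different mirrors or vdevs.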

  • C-3H_gjP@alien.top
    1 year ago

    Jeez, you’re burning through so much of the drive’s lifespan just checking the damn thing. If a failed drive will cause problems worthy of this amount of burn-in time, you need a more robust setup.

    I run all used eBay drives. Except for a glance at the SMART data before adding them to the array, I don’t test them at all. Just keep an extra drive or two on hand as spares. Life’s easier when you plan for failure instead of fighting it.

    • GolemancerVekk@alien.top
      1 year ago

      Same, except I also use Scrutiny to flag drives for my attention. It makes educated guesses for a pass/fail mark, using analysis of vendor-specific interpretations of SMART values, matched against the failure thresholds from the BackBlaze survey. It can tell you things like “the current value for the Command Timeout attribute for this drive falls into the 1-10% bracket of probability of failure according to BackBlaze”.

      It helps me to plan ahead. If, for example, I have 3 drives that Scrutiny says “smell funny”, it would be nice to have 2-3 spares on hand rather than just 1. Or if two of those drives happen to be together in a 2-pair mirror, perhaps I can swap one somewhere else.