Music Database Maintenance


#1

I think I started seriously ripping my CDs in 2008 or 2009. Since I waited so long, I had the benefit of some good insight. I’ve only used dBpoweramp with its AccurateRip database lookup. I also saved all files in FLAC format using dBpoweramp’s recommended compression level of 5. For those who don’t know, each dBpoweramp rip is compared against a database and is either given or not given an AccurateRip label, along with a confidence number (how many other rips in the AccurateRip database it matches).
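A rough model of the AccurateRip confidence number: checksum your rip, then count how many rips already in the database produced the same checksum. This is an illustrative sketch, not AccurateRip’s real algorithm. The actual service uses its own CRC over the audio samples, so `zlib.crc32` here is a stand-in, and the function names are hypothetical.

```python
import zlib
from collections import Counter

def track_checksum(pcm: bytes) -> int:
    # Stand-in only: real AccurateRip uses its own CRC over the audio samples.
    return zlib.crc32(pcm) & 0xFFFFFFFF

def accuraterip_confidence(my_pcm: bytes, submitted: list[int]) -> int:
    """How many previously submitted rips match this one (0 = not verified)."""
    return Counter(submitted)[track_checksum(my_pcm)]

# Toy data: five earlier submissions for the same track, four of which agree.
good = b"\x00\x01" * 1000
bad = b"\x00\x01" * 999 + b"\x00\x02"   # one sample differs
database = [track_checksum(good)] * 4 + [track_checksum(bad)]

print(accuraterip_confidence(good, database))  # 4
print(accuraterip_confidence(bad, database))   # 1
```

A rip that matches nobody gets confidence 0, which is why obscure pressings often can’t be verified at all.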

I also set up a backup plan to two separate hard drives. I use a program called GoodSync to make the backups. Until a couple of years ago, I didn’t know much about checksums or GoodSync’s ability to compare the checksums of copies (a more rigorous way of comparing a digital copy against its original). Also during this period (before I used checksums), my main music hard drive crashed hard. No problem: I just bought another hard drive and copied everything from one of my backups.
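Checksum verification of a backup boils down to hashing each file in the source and in the copy and comparing the digests. Matching sizes or dates alone won’t catch a flipped bit. The post doesn’t describe how GoodSync does this internally, so this is a generic sketch using SHA-256 from Python’s standard library, with hypothetical function names.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large FLAC/ISO files don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(source: Path, backup: Path) -> dict[str, list[str]]:
    """List source files that are missing from the backup or whose contents differ."""
    report: dict[str, list[str]] = {"missing": [], "mismatch": []}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(src) != sha256_of(dst):
            report["mismatch"].append(rel.as_posix())
    return report
```

Usage: `compare_trees(Path("D:/Music"), Path("E:/MusicBackup"))`. A file in the "mismatch" list means one of the two copies changed; which one is good still needs a third reference.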

Just a little more background. I play all my digital music through the Bridge II to a DS Senior. I exclusively use JRiver (technically JRiver Media Center) playing over DLNA and control the music selection using JRemote.

I started to wonder about the integrity of my data. I had heard errors on some songs (I initially chalked them up to distortion in the original recording). Luckily, dBpoweramp sells a program called PerfectTUNES that will scan your entire music library and again compare the files to those stored in AccurateRip (the files must have been saved in a lossless format). To my surprise, I had 50 “corrupted” files and another 50 rips with errors. Luckily, I still have all the CDs. Unluckily, I’m swimming in CDs (around 2000) and most aren’t stored alphabetically. Finding all of them was a serious project.

The bottom line: I was able to replace all but one of the corrupted files and most of the inaccurate rips (some of the discs just won’t rip properly). Note that I do have a rather sizeable directory of HD downloads, which PerfectTUNES is unable to check. Plus, HDTracks doesn’t offer the ability to re-download. Oh well, in this case ignorance is bliss and life will go on.

If you’re at all curious about the integrity of your music files, I strongly recommend PerfectTUNES. The full price is only $36, and discounts are offered if you’ve already purchased other products. The cost seems minimal compared to other tweaks in this nutty hobby. Who said the digital world was easy?


#2

Did you compare your corrupted files to your newly ripped files for sound quality?

In 2013 I made the conscious decision not to buy any more CDs or LPs. The decision was based on ecological reasons (CD packaging is awful) and practical ones: as I age, I want to accumulate less stuff, not more. So now I mostly listen to new recordings (classical genre) through Tidal.

But I do wonder about my ripped CD collection. I don’t have the patience you did to recheck my CD rips, but I do wonder whether data integrity problems in stored files affect sound quality. Or do they make the rip unplayable?


#3

Excellent tip. Even files that are ripped properly can get corrupted over time. It’s one of the reasons I keep multiple archive copies separate from my working drives. The archive drives are only used when I add new music whereas the working drives are running all the time (although they may be allowed to sleep when not in active use).


#4

I didn’t compare the sound quality of re-ripped songs. I also never noticed any files that wouldn’t play, just audio distortion on a few songs (keep in mind the corruption could occur in the metadata or be very minor).

It took most of the weekend just to repair the library. Running the analysis was easy: it took PerfectTUNES about an hour to go through 2000 CDs, and subsequent runs take far less time since the initial results are stored in a cache file. The patience required to fix the files was something else entirely. Making sure all the metadata was consistent was a PITA, to be honest.
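PerfectTUNES’s cache format isn’t described in this thread. A common way to make repeat runs fast is to key stored results on each file’s path, size and modification time, so only new or changed files get re-analyzed. Here is a sketch under that assumption, with the hypothetical `analyse` callback standing in for the expensive per-file check.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

def file_key(path: Path) -> str:
    """Identify one version of a file by absolute path, size and modification time."""
    st = path.stat()
    return f"{path.resolve()}|{st.st_size}|{st.st_mtime_ns}"

def scan(paths: Iterable[Path], cache_file: Path,
         analyse: Callable[[Path], str]) -> tuple[dict[str, str], int]:
    """Run analyse() only on files not seen before; return results and how many were analysed."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    results: dict[str, str] = {}
    analysed = 0
    for path in paths:
        key = file_key(path)
        if key not in cache:          # new or changed file: do the slow work
            cache[key] = analyse(path)
            analysed += 1
        results[str(path)] = cache[key]
    cache_file.write_text(json.dumps(cache))
    return results, analysed
```

Re-ripping a track changes its size or modification time, so it is re-analyzed on the next run. Entries for deleted files stay in the cache; a fuller version would prune them.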


#5

Thanks to amsco15 for mentioning PerfectTUNES – good tool to know about. Digital media do experience problems as they age, a phenomenon sometimes called “bit rot.” (See the Wikipedia page for a good overview of the types of damage to various media.) Not something we like to think about, but it happens.

Linux devices can use a file system called BTRFS which helps avoid this problem through the use of checksums (see here for details). I will be replacing my Synology NAS soon with a newer model that can run BTRFS since I am committed to long-term storage of my music files.

As to rodrigaj’s question, I think it all depends on the exact type and amount of damage.


#6

I don’t know that a digital file can be corrupted over time. It may have been saved with errors, in which case the errors will always be there. But computers today verify all data being read and written whether to disk or in transmission from source to target.

However, audio CDs are often treated differently from data CD-Rs. Data discs must be bit-perfect and often carry checksums to verify their content; music CDs, on the other hand, will play despite errors, and Windows, for instance, will not tell you there were any.

Usually the best place for error correction is at the CD drive itself, with EAC/XLD and other methods. Since dBpoweramp records all user rips in its own database, it can provide a useful list of the drives with the most accurate rip rates: https://forum.dbpoweramp.com/showthread.php?37706-CD-DVD-Drive-Accuracy-List-2016

Secondly, a program like dBpoweramp or Exact Audio Copy should be a requirement for any serious ripping. dBpoweramp is good because it can verify the rip against an external reference: a database of previous rips of the same track. The error correction in these programs is also very robust, and they will report any errors they find.

I too have used PerfectTUNES to quickly find the 20 or so bad tracks in my library of 1,700 CDs (20,000 tracks) that needed to be re-ripped. Pretty handy tool!


#7
JohnInToronto said

I don’t know that a digital file can be corrupted over time.


No physical medium – digital or analog – remains perfect over time, especially if it’s used. It’s just not possible. Think about the fact that a hard disk has zillions of microscopic pieces that can be magnetized one way or another to determine whether it’s a zero or a one. With repeated use, over a long period (especially if stored in poor environmental conditions), it’s hardly surprising that some get their charge flipped to read the wrong value or lose the ability to hold a charge. Error correction can deal with small errors and listeners may never notice. But that’s not to say it doesn’t happen.
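To make “error correction can deal with small errors” concrete, here is a toy Hamming(7,4) code: 4 data bits plus 3 parity bits, which is enough to locate and fix any single flipped bit. Real drives use much stronger codes (Reed-Solomon or LDPC-style), so this only illustrates the principle. The function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits; codeword positions 1..7 are p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, 1-based position of the fixed bit or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # the syndrome spells out the bad position
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                          # simulate one bit going bad (position 5)
print(hamming74_decode(word))         # ([1, 0, 1, 1], 5)
```

Flip two bits in the same codeword and this code silently “corrects” the wrong one, which is why error correction covers small errors but not heavy damage.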

Now don’t get me wrong: I am not an alarmist about such things. Our hardware today is quite reliable over the medium term. I began ripping CDs and purchasing digital downloads several years ago and I intend to keep these for the rest of my life. So I think it’s good to do what I can to avoid problems down the line, especially with downloads where I can’t go back to the original CD.


#8

Yeah, there is a whole school of thought around bit rot over time. Funny thing is, on some systems like QNAP with RAID, it was self-inflicted and they issued specific patches to fix it.

Unfortunately, regardless of the platform, unless the file has a checksum/signature generated when it was created, it’s not possible to tell if it’s corrupt unless you have a backup(s) as a reference.
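Here is a minimal sketch of “a checksum generated when it was created”: write an MD5 manifest when files are added, then re-verify it later. The lines use the hash, two spaces, relative path layout that `md5sum -c` reads (run from the music folder). The function names and the choice of MD5 are assumptions for illustration.

```python
import hashlib
from pathlib import Path

def md5_of(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(folder: Path, manifest: Path) -> None:
    """Record a checksum for every file under folder (run when music is added)."""
    lines = [f"{md5_of(p)}  {p.relative_to(folder).as_posix()}"
             for p in sorted(folder.rglob("*")) if p.is_file() and p != manifest]
    manifest.write_text("\n".join(lines) + "\n")

def verify_manifest(folder: Path, manifest: Path) -> list[str]:
    """Return files that are missing or no longer match their recorded checksum."""
    bad = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split("  ", 1)
        target = folder / name
        if not target.is_file() or md5_of(target) != digest:
            bad.append(name)
    return bad
```

Keeping the manifest with the backups means a mismatch tells you which copy is damaged without needing a third copy to break the tie.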

Synology seems to be making progress: Re - Btrfs file self-healing feature in DSM 6.1.

"In order to provide better stability of the file system, Synology chose Linux RAID over Btrfs RAID by implementing the layers between Btrfs file system and the disks.

This implementation ensures that we have full control of the communication for the highest stability.

In DSM 6.1, we’ve further strengthen the features on Btrfs file system with the ability to recover bit-rot corruption, also featured as “Btrfs file self-healing.”

In order to achieve this goal, all disk read/write operations go through checksum verification to ensure the data consistency.

If checksum mismatch is detected, the file system layer instructs MD layer to read redundant copy (parity or mirror) of the damaged block.

It will then automatically recover the damaged block with the good redundant copy.

With this enhanced protection, we believe general bit-rot corruptions can be recovered automatically by the file system."
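The self-healing flow Synology describes (verify the checksum on every read, fetch the redundant copy on a mismatch, rewrite the damaged block) can be modeled in a few lines. This is a toy two-way mirror with per-block CRC32 checksums, not how Btrfs or Linux MD actually lay out data. The class name is hypothetical.

```python
import zlib

class MirroredVolume:
    """Toy RAID 1 with a checksum per block, stored when the block is written."""

    def __init__(self, blocks: list[bytes]):
        self.copies = [list(blocks), list(blocks)]           # two mirrors
        self.checksums = [zlib.crc32(b) for b in blocks]
        self.repairs = 0

    def read(self, i: int) -> bytes:
        damaged = []
        for copy in self.copies:
            data = copy[i]
            if zlib.crc32(data) == self.checksums[i]:
                for bad_copy in damaged:                      # heal mirrors that failed
                    bad_copy[i] = data
                    self.repairs += 1
                return data
            damaged.append(copy)                              # mismatch: try the next copy
        raise OSError(f"block {i}: every copy fails its checksum")

vol = MirroredVolume([b"track-01", b"track-02"])
vol.copies[0][1] = b"track-0X"       # simulate bit rot on mirror 0
assert vol.read(1) == b"track-02"    # good data comes from mirror 1, mirror 0 is rewritten
```

Without the stored checksum, the volume would have two different copies and no way to know which one is correct. That is the gap the Btrfs checksums fill.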


#9

Some thoughts:

Most personal computers these days have no ECC memory, so bits that flip in RAM probably won’t be noticed. How often this happens varies with RAM density and the prevailing technology. Modern RAM is almost always better than it was in the ’80s and ’90s, but it has never been perfect, and we have so much of it these days that errors during normal use are close to guaranteed on any given system.

One advantage of FLAC is that it stores a checksum of the audio data, so files can be checked for corruption with an integrity scan (foobar2000, dBpoweramp, JRiver, etc. all have some sort of read-and-check-for-errors function). I also keep MD5 checksums for each of my SACD ISOs and use ad hoc batch scripts to recheck them all periodically.
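FLAC’s audio checksum lives in the STREAMINFO block at the start of every file: an MD5 of the decoded audio samples. Integrity scans (or `flac -t file.flac` with the reference tools) decode the audio and compare it against that value. The sketch below only reads the stored fields with the Python standard library. It doesn’t decode audio, so on its own it can’t verify anything. The function names are hypothetical; the byte layout follows the FLAC format specification.

```python
from pathlib import Path

def read_streaminfo(data: bytes) -> dict:
    """Parse the STREAMINFO block that every FLAC file must start with."""
    if len(data) < 8 or data[:4] != b"fLaC":
        raise ValueError("no 'fLaC' marker (not FLAC, or a non-standard tag precedes it)")
    block_type = data[4] & 0x7F                  # low 7 bits; top bit = 'last block' flag
    length = int.from_bytes(data[5:8], "big")    # 24-bit block length
    body = data[8:8 + 34]
    if block_type != 0 or length != 34 or len(body) != 34:
        raise ValueError("first metadata block is not a complete STREAMINFO")
    # Bytes 10-17: sample rate (20 bits), channels-1 (3), bits/sample-1 (5), total samples (36)
    packed = int.from_bytes(body[10:18], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
        # MD5 of the decoded samples; all zeros means the encoder didn't store one
        "md5": body[18:34].hex(),
    }

def streaminfo_from_file(path: Path) -> dict:
    with path.open("rb") as f:
        return read_streaminfo(f.read(42))       # 4-byte marker + 4-byte header + 34-byte body
```

An all-zero `md5` field is worth flagging in a library scan, since integrity checks on those files have nothing to compare the decoded audio against.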

Bit rot is real, and as drives get bigger, the probability of an error per drive grows. Still, most of the data corruption on my drives in the past was caused by drives that were failing faster and faster (they all turned out to be external USB drives with their own power supplies; none of my bus-powered USB drives or my NAS drives have ever failed).
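Here is why bigger drives raise the per-drive odds. If each bit read has a small, independent chance p of an unrecoverable read error, the chance of at least one error when reading n bits is 1 - (1 - p)^n, which grows with n. The rate of 1 in 10^14 bits used below is an assumption (a worst-case figure often printed on consumer drive spec sheets). Real-world rates are usually far lower, so treat the numbers as illustrative.

```python
import math

def p_at_least_one_ure(capacity_tb: float, ure_rate: float = 1e-14) -> float:
    """Chance of >= 1 unrecoverable read error when reading a whole drive once.

    Assumes independent errors at a constant per-bit rate.
    """
    bits = capacity_tb * 1e12 * 8                    # decimal TB -> bits
    # 1 - (1 - p)^n, written with log1p/expm1 so a tiny p doesn't round away
    return -math.expm1(bits * math.log1p(-ure_rate))

for tb in (1, 4, 8, 16):
    print(f"{tb:>2} TB: {p_at_least_one_ure(tb):.0%}")
```

Under that assumption, a full read of a 1 TB drive comes out around 8%, and an 8 TB drive around 47%.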

I periodically get semi-consistent data corruption on my NAS (which has RAID with two parity stripes). When I hear bad data, I do a scrub/integrity scan on the NAS, which has always “fixed” the errors. I know that they get fixed because I have multiple levels of backup. So, ironically for me, a quality NAS with RAID is less reliable than simple bus-powered USB drives…

Speaking of backup, one should always have multiple backups, at least one off site, etc. I used to use a safety deposit box and rotated USB drives through it. Now I use CrashPlan online backups. I chose them because they understand security and they have a low flat rate no matter how much you use (I’ve got a little over 7TB backed up; it would take a while to reload that even with high-speed internet access.)

PerfectTUNES and the AccurateRip database aren’t panaceas. They don’t cover a non-trivial part of my CD collection, nor any of my SACD ISOs. Also, at least when I was beta testing PerfectTUNES and using the AccurateRip check functions, they often couldn’t tell which of the many versions of a disc I have. Over time they’ve reported false errors and missed some bad rips. Both happened rarely, but often enough that I don’t bother with them and rely on integrity scans instead.


#10

@johnintoronto I have a Synology NAS running DSM 6.1. Is there anything special I need to do to implement the btrfs self healing process? Or is this an automatic feature? In all honesty, I never read the info on what Synology DSM upgrades are doing.


#11

It’s not just a question of the Synology DSM version. I’ve got 6.1 but my old (in computer terms – 2013) NAS won’t use BTRFS. The one I’m planning to get (DS 2018+) can. I’m not sure at what point Synology introduced BTRFS support.

If your hardware does support BTRFS, you need to have set up that file system rather than one of the earlier ones when you got the NAS. It may be possible to move from an earlier file system to BTRFS on an existing machine; I don’t know details about that.

If you are running BTRFS now, AFAIK the self-healing is automatic but someone correct me if I’m wrong about that.


#12

Funny you mention ECC memory. I just consolidated 3 machines onto a SuperMicro 1’ Cube with ECC. It has 4 x hard drive + 2 x SSD bays. I’m trying out RAID 10 versus the RAID 6 I have on the Synology. Very quiet box, btw.

I ended up rebuilding the volumes on the Synology to support checksums at the source. Hopefully that helps with the RAID scrubbing, which now seems to be a fact of life. Before installing drives, I run them through an initialization routine for a few days and toss the ones with higher bad sector counts.

As with hard drives, if you follow the premise of avoiding errors at the source, you buy a CD drive ripper with hardware error correction ($35?). The overall success of products like dBpoweramp certainly depends on whether your music is in their library or not, but I also found their disc recovery software good for recovering scratched and damaged discs. It recovered old discs that appeared to have been used as beer coasters at one time or another :wink:

Love CrashPlan and have it headless on the Synology so all machines have a local and cloud backup destination.

Very sad that CrashPlan is out of the consumer market as of October 23, 2018. I’m looking for similar alternatives, since CrashPlan was unique in its peer-to-peer approach and support for multiple destinations.


#13

Here is the list of compatible Synology’s that support BTRFS:

https://www.synology.com/en-us/knowledgebase/DSM/tutorial/General/Which_Synology_NAS_models_support_the_Btrfs_file_system

As you can see, most of them are the + models, which have better CPUs/memory.

You specify BTRFS when you create the volume.

So, for an existing NAS you would need to offload the data, delete the current volume and then recreate it in BTRFS.

Perhaps not so trivial… but I’m sure you have a backup already for the NAS? :wink:


#14

Re CrashPlan out of the consumer market: I’m glad that they have no problem with me being a single person business. The price is a little higher for businesses than it was for individuals, but still only a handful of $'s per month and it’s always been worth it.

CUETools is great for fixing the bad bits in a rip if the percentage of errors isn’t too high. I’ve used it to fix rips of discs that dBpoweramp and EAC couldn’t quite read.


#15

@johnintoronto Thank you for the Synology link. I have a DS1513+ which is not on the list. Oh well…


#16

Couldn’t agree more. The main thing I liked about CrashPlan was its free peer-to-peer ability, since I had ‘a bunch’ of devices and only paid for the family cloud access. CrashPlan Small Business took away the peer-to-peer option, so it’s a bit of an ouch now with pricing at $10 per.

With what I would pay for CrashPlan Small Business, I bought another NAS, gave it to a friend offsite for his local backups, and do my main backups to it during off hours.

Personal cloud lite, lol.

Haven’t tried CUETools. Will check it out. Thanks


#17
Ted Smith said - when I hear bad data I do a scrub / integrity scan on the NAS which has always "fixed" the errors. I know that they get fixed because I have multiple levels of backup.
As always, so much to learn. Thanks for the CUETools tip. I'm going to see if it can fix some of my file errors tonight. Second, how do you do a scrub/integrity scan? Is there a good enough "simple" method?

Thanks!!!


#18

Most (all?) NAS’s and other RAID-like storage units support something like scrub. It’s a single command and (on most systems) doesn’t take the system down, so you can just start it when you want or need it and not worry about it (even if it takes days, which is typical). I’ve discovered that I need to do a scrub on my NAS if I get power glitches/outages.

The scrub operation reads all of the data (and perhaps all of the disk drives at a low level) and if anything doesn’t read, doesn’t parity check, doesn’t match the redundant copies, etc., it’s rebuilt with the system’s redundant data and rewritten (probably in different blocks.) You are using some kind of redundant data storage aren’t you? :slight_smile: In addition if the hardware read is what fails, that block of the disk drive is spared out (or simply ignored in the future.)
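Here is a scrub of one single-parity stripe, reduced to its core: an unreadable block is rebuilt by XOR-ing the surviving blocks with the parity. This is a simplified RAID 5-style model. Ted’s NAS uses two parity stripes, which can also pin down a block that reads fine but is silently wrong. The names are hypothetical.

```python
from functools import reduce
from typing import Optional

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def scrub_stripe(stripe: list[Optional[bytes]], parity: bytes) -> tuple[list[bytes], bytes, str]:
    """Check one stripe; None marks a block the drive could not read."""
    unreadable = [i for i, block in enumerate(stripe) if block is None]
    if len(unreadable) > 1:
        raise OSError("more than one unreadable block: single parity cannot rebuild")
    if unreadable:
        i = unreadable[0]
        survivors = [block for block in stripe if block is not None]
        fixed = list(stripe)
        fixed[i] = xor_blocks(survivors + [parity])   # missing block = XOR of everything else
        return fixed, parity, f"rebuilt block {i}"
    if xor_blocks(stripe) != parity:
        # Everything read, but data and parity disagree. Single parity can't say
        # which block is wrong, so this sketch just recomputes the parity.
        return list(stripe), xor_blocks(stripe), "parity mismatch: parity rewritten"
    return list(stripe), parity, "ok"
```

The parity-mismatch branch is where filesystem checksums (as in Btrfs) or a second parity earn their keep: they identify the bad block instead of guessing.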


#19

To expand on Ted’s comments:

These NAS’s support various levels of hard disk testing & diagnostics and all of them can be scheduled.

The main types are SMART testing and RAID scrubbing.

SMART testing occurs on the physical hard drive and can do a simple functional check of the drive or an extended test that tests all the sectors of the drive whether they contain data or not.

If during an extended test any read errors occur then the hard drive will ‘mark’ the sector as bad and ignore it. Extended SMART tests are often used to initialize new drives before they are added to the RAID array.

Note that NAS’s also allow for bad sector notifications. i.e. if the number of bad sectors hits 10 then issue a notification via email, etc.
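On Linux-based NASes and PCs, the smartmontools package exposes the same SMART data. `smartctl -t long /dev/sdX` starts an extended self-test, and `smartctl -A /dev/sdX` prints the attribute table. Below is a hedged sketch of a bad-sector notification using the 10-sector threshold from the post. Several details are assumptions: which attributes to sum, and how the table columns are parsed (layouts can vary by drive and smartctl version). The function names are hypothetical.

```python
import subprocess

# Attributes that (on most drives) count reallocated or suspect sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_bad_sectors(attr_table: str) -> dict[str, int]:
    """Pull raw values for WATCHED attributes out of `smartctl -A` text output."""
    counts: dict[str, int] = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[1] in WATCHED and fields[9].isdigit()):
            counts[fields[1]] = int(fields[9])
    return counts

def needs_alert(counts: dict[str, int], threshold: int = 10) -> bool:
    return sum(counts.values()) >= threshold

def check_drive(device: str, threshold: int = 10) -> bool:
    """Run smartctl (needs smartmontools, usually root) and print an alert."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    counts = parse_bad_sectors(out)
    if needs_alert(counts, threshold):
        print(f"ALERT {device}: {counts}")    # stand-in for an email notification
        return True
    return False
```

Scheduled from cron, something like `check_drive("/dev/sda")` roughly mirrors what the NAS notification settings do. The built-in NAS feature is the simpler choice where it exists.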

As Ted mentioned, RAID scrubbing occurs at the data level and ensures file integrity. Scrubs can also be scheduled (e.g., monthly) or run on demand.

As with all NAS’s: use a battery backup (UPS) and always back up your NAS. A NAS RAID array is for large data processing and drive-failure redundancy; it is not a substitute for a backup. If your NAS is hit by a power failure and you lose your RAID array, your data is gone.


#20

I tried CUETools on a couple “Inaccurate” files last night. It was a mixed bag but did work. The best thing is it fixed a couple files where I no longer have access to the original CD. It fixed another inaccurate rip on a Merle Haggard box set. I tried ripping that song on no less than 4 different CD drives with no luck. I was pleasantly surprised. A nice tool to add to the toolbox.