Size Matters in Email Archiving
by Thomas Bookwalter, CEO FMDC
Size, capacity, throughput, and scalability are different words for the same problem. Email is expanding at an unimaginable rate. Messages are getting larger. Mail message volume per user is expanding. Archive retention periods are increasing. The number of investigations and lawsuits that require email searches is on the rise.
There is a major shift in how records are being stored. In the past, the focus of storage was disaster recovery (DR) and regulatory retention. For DR, the design was to capture all the data, fast, and be able to restore a system in the event of a failure. The concern was getting the entire system up and running again. Individual records could be retrieved only after the system was rebuilt. Regulatory retention was about keeping the records for the required retention period. Individual record retrieval was rare. The use of email as a fundamental business communications tool and the increased frequency with which both regulators and litigators request emails in discovery have caused a major shift in the features that are important in email archives. Accessing individual emails in a DR backup environment is prohibitively expensive. There have been numerous incidents of organizations restoring years of backup tapes in order to search for just a few emails. The cost has run into the millions.
All of this has led to a crisis in email archiving. Archives are becoming inaccessible because of their mass. Storage requirements continue to expand. Searches are bogging down, sometimes even stopping. There is no end in sight.
Size matters. Archives that were a real asset when they were of a manageable size are fast becoming a real liability. Newer email archives are designed to provide access to individual emails at all times, without the need to rebuild a system first. In some cases, regulations require indexed archives. But even with indexing, poorly designed archives cannot survive the demands of massive capacity. New scalability requirements place new demands on users and on system designers. Scalability has two dimensions: wide (horizontal) and deep (vertical).
Horizontal scalability is characterized by being able to expand the archive with new processors and storage units, adding both processing and storage capacity. Because the data must be accessible in a reasonable time, processing and storage have to stay in balance. If the storage load on a processor set is too great, performance suffers. As the storage requirements expand, the processing power must expand with them in order to maintain reasonable accessibility. For massive archives with long retention periods, like those found in healthcare, securities, pharmaceuticals, and government, removable storage is an essential element in an electronic records retention plan.
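The balance between processing and storage can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the per-node storage ceiling, and the growth figures are assumptions chosen for the example, not measurements from any particular product. It estimates how many processing nodes an archive needs if each node can serve only a fixed amount of storage before search performance suffers, and how that count rises as the archive grows.

```python
import math


def nodes_needed(archive_tb: float, max_tb_per_node: float) -> int:
    """Smallest node count that keeps every node at or below its storage ceiling.

    max_tb_per_node is an assumed figure: the most storage one processing
    node can serve while still answering searches in a reasonable time.
    """
    if archive_tb < 0 or max_tb_per_node <= 0:
        raise ValueError("archive size must be >= 0 and the per-node ceiling > 0")
    # Even an empty archive needs one node to accept new mail.
    return max(1, math.ceil(archive_tb / max_tb_per_node))


def plan_growth(start_tb: float, annual_growth: float, years: int,
                max_tb_per_node: float) -> list[tuple[int, float, int]]:
    """Project archive size and required nodes for each year.

    annual_growth is a fraction (0.5 means the archive grows 50% per year).
    Returns a list of (year, size_tb, nodes) tuples, starting at year 0.
    """
    plan = []
    size = start_tb
    for year in range(years + 1):
        plan.append((year, round(size, 2), nodes_needed(size, max_tb_per_node)))
        size *= 1 + annual_growth
    return plan


if __name__ == "__main__":
    # Hypothetical inputs: 10 TB today, 50% annual growth, 4 TB per node.
    for year, size_tb, nodes in plan_growth(10, 0.5, 5, 4):
        print(f"year {year}: {size_tb} TB -> {nodes} nodes")
```

With these assumed inputs the node count doubles in two years, from three nodes to six. That is the point of the paragraph above: storage cannot be added on its own without eventually adding processing to match.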
Vertical scalability is characterized by being able to reach into the archive and retrieve individual records in a reasonable period of time with accuracy and reliability. Well-designed archive indices are the key to vertical scalability.
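To show why an index matters for reaching into an archive, here is a minimal sketch of an inverted index, a common structure for keyword search. It is a toy under stated assumptions: the class name, the message IDs, and the simple word-splitting rule are hypothetical, and a production archive index would also handle attachments, metadata, and on-disk storage. The idea it illustrates is that a search looks up each word's list of messages directly instead of scanning every stored email.

```python
import re
from collections import defaultdict


class ArchiveIndex:
    """Toy inverted index: maps each word to the IDs of messages containing it."""

    def __init__(self) -> None:
        self._messages: dict[str, str] = {}
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> set[str]:
        # Assumed tokenizer: lowercase runs of letters and digits.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, message_id: str, text: str) -> None:
        """Store a message and record every word it contains."""
        self._messages[message_id] = text
        for token in self._tokens(text):
            self._postings[token].add(message_id)

    def search(self, query: str) -> list[str]:
        """Return sorted IDs of messages containing every word in the query."""
        terms = self._tokens(query)
        if not terms:
            return []
        # Intersect the smallest posting sets first to keep the work low.
        sets = sorted((self._postings.get(t, set()) for t in terms), key=len)
        result = set(sets[0])
        for posting in sets[1:]:
            result &= posting
        return sorted(result)


if __name__ == "__main__":
    index = ArchiveIndex()
    index.add("m1", "Quarterly audit report")
    index.add("m2", "Lunch on Friday")
    index.add("m3", "Audit follow-up for the quarterly filing")
    print(index.search("quarterly audit"))
```

The cost of a lookup here depends on how many messages contain the query words, not on the total size of the archive, which is what lets retrieval time stay reasonable as the archive deepens.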
Before investing in an email archive solution, have a clear understanding of the long-term requirements and the capabilities of the systems available. Test each system's ability to meet those requirements. Plan for the future now to avoid having to buy a second system later.
