Why media matters
In the 1982 feature film Blade Runner, starring Harrison Ford and Rutger Hauer, a distant future world was envisioned where synthetic human robots fought for their humanity. You see, they had a built-in decommission date that they wanted extended; at the end of the movie they say that their experiences will be lost when they are “retired”. That envisioned future was November 2019 (a distant world in 1982), which humorously is now the past as I write this in 2020. An interesting plot line in that movie was how we tell humanity from machine. Is it our dreams? Our hopes or our aspirations? Memories, especially childhood memories, are specifically referenced as “being human”, since the robots were never born and never grew up; they were created as adults with synthetic pasts, complete with synthetic memories and photographs. Photographs of loved ones were cherished by these synthetic humans. Curiously, the movie used printed photographs; obviously the future of cloud computing and interconnected personal digital devices was unthinkable in 1982.
The film did correctly identify that media and culture (music, movies, photographs) are key bits of data worth saving and archiving. Computers, as we know, are ephemeral. A new model will come out with more capabilities, and your data and media will need to be migrated to the new platform. In 2020, we are led to believe that the cloud will solve many problems and that our data is “safe” in the cloud. Nintendo, Facebook, Instagram and other modern companies have shown that we can’t entirely trust companies with our media: we must be wary of vendor lock-in and product abandonment (Nintendo) and of the sharing of our private information with third-party retailers (Facebook and Instagram). Also, do we truly own the content saved on these services, and how do we prevent unauthorized access? How do we get the freedom of having our media accessible and safe while also protecting it from unwanted sharing and vendor lock-in?
Answer: Let’s build a network attached storage (NAS) device.
In depth on Mass Storage and Storage types
Most modern electronics have some form of writable mass storage: an SSD (solid-state drive), a magnetic platter hard disk, a USB flash drive, or even old-school magnetic tape or floppy drives. These mass storage devices are the primary place we store our digital media assets. In the past, folks would keep offline collections of media (think DVDs, floppy disks or magnetic tapes). These offline collections were relatively inexpensive, which was a plus, but they also suffered from bit rot (data that becomes unreadable or altered over time) and could easily be misplaced or left unorganized.
Storage capacities have increased over the decades, prices have plummeted, and performance (seek, read and write times) has improved. This has pushed manufacturing to new limits, which ultimately makes older forms of media obsolete in favor of new technology. For instance, back in 2008, when Apple introduced the “World’s Thinnest Computer”, aka the MacBook Air, the hefty price didn’t include the cost of a solid-state drive (SSD). MacBook Air buyers could add a 64GB SSD for a whopping $1,300 premium over the ultra-thin computer’s normal price, and there was no consumer 1TB drive capable of fitting inside a laptop; a desktop 1TB SSD sold for $4,000. It has been said that a form of media lasts 10-20 years before it is obsolete (think floppy disks, CD-RWs and even DVDs and Blu-rays). In 2020, most modern desktop computers and all personal electronics (phones, watches, cameras) use solid-state mass storage for its compact size, durability and quick performance.
For the rest of this video, let’s focus on desktop computing and the needs of mass storage solutions on the desktop (we’ll come back to mobile devices at the end of this presentation). SSDs are the correct solution for most modern desktop computers, especially for anyone involved in gaming, media creation or archival.
Rewrite in my own words
Magnetic disks are the predominant storage media in personal computers. Optical discs, however, are almost exclusively used in the large-scale distribution of retail software, music and movies because of the cost and manufacturing efficiency of the molding process used to produce DVD and compact discs and the nearly-universal presence of reader drives in personal computers and consumer appliances. Flash memory (in particular, NAND flash) has an established and growing niche as a replacement for magnetic hard disks in high performance enterprise computing installations due to its robustness stemming from its lack of moving parts, and its inherently much lower latency when compared to conventional magnetic hard drive solutions. Flash memory has also long been popular as removable storage such as USB sticks, where it de facto makes up the market.
File systems In depth
Any mass storage device needs a commonly understood system for reading and writing data to be useful. How many times have you declared a USB drive defective only because your computer couldn’t understand its file system? A file system, quite simply, enables the storage and retrieval of data. The name comes from the paper-based systems of the past: each named piece of data is called a file, and the entire collection of data is called a file system. All file systems are abstractions that remove the need for a typical user to know the specifics of how data is written to and retrieved from mass storage devices. Some file systems also incorporate additional features such as security and redundancy. Some file systems are specific to a particular type of media; for instance, Red Book audio (need detail) is used for compact disc audio recordings.
Rewrite this in my own words
File systems allocate space in a granular manner, usually in multiples of a physical unit on the device. The file system is responsible for organizing files and directories, and for keeping track of which areas of the media belong to which file and which are not being used. For example, in Apple DOS of the early 1980s, the 256-byte sectors of a 140-kilobyte floppy disk were tracked with a track/sector map.
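To make the sector bookkeeping concrete, here is a minimal sketch in Python. It assumes the Apple DOS 3.3 disk geometry of 35 tracks with 16 sectors each, which is not stated above, and the `SectorMap` class is a hypothetical toy rather than the real on-disk format (real DOS also reserved some tracks for itself and for the catalog).

```python
import math

# Assumed geometry (Apple DOS 3.3 style); these numbers are not from the text above.
TRACKS = 35
SECTORS_PER_TRACK = 16
SECTOR_BYTES = 256


def disk_capacity_bytes() -> int:
    """Total raw capacity of the disk in bytes."""
    return TRACKS * SECTORS_PER_TRACK * SECTOR_BYTES


def sectors_needed(file_bytes: int) -> int:
    """Space is handed out in whole sectors, so round up."""
    return math.ceil(file_bytes / SECTOR_BYTES)


class SectorMap:
    """Toy allocation map: remembers which (track, sector) pairs each file owns."""

    def __init__(self) -> None:
        self.free = [(t, s) for t in range(TRACKS) for s in range(SECTORS_PER_TRACK)]
        self.files = {}

    def allocate(self, name: str, file_bytes: int) -> list:
        """Give a file the first free sectors and record them under its name."""
        count = sectors_needed(file_bytes)
        if count > len(self.free):
            raise OSError("disk full")
        self.files[name], self.free = self.free[:count], self.free[count:]
        return self.files[name]


if __name__ == "__main__":
    print(disk_capacity_bytes() // 1024, "KiB on the disk")
    disk = SectorMap()
    print(disk.allocate("HELLO", 1000))  # a 1,000-byte file still occupies whole sectors
```

Note that the 1,000-byte file takes four full sectors (1,024 bytes); the unused tail of the last sector is the price of allocating in fixed-size units.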
Rewrite this in my own words
A filename (or file name) is used to identify a storage location in the file system. Most file systems have restrictions on the length of filenames. In some file systems, filenames are not case sensitive (i.e., the names MYFILE and myfile refer to the same file in a directory); in others, filenames are case sensitive (i.e., the names MYFILE, MyFile, and myfile refer to three separate files that are in the same directory).
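The difference between the two behaviors can be sketched with a toy directory in Python. The `Directory` class and its methods are hypothetical names made up for this illustration; real file systems implement name lookup quite differently.

```python
class Directory:
    """Toy directory mapping filenames to file contents."""

    def __init__(self, case_sensitive: bool) -> None:
        self.case_sensitive = case_sensitive
        self._entries = {}

    def _key(self, name: str) -> str:
        # A case-insensitive directory folds every name to one canonical form.
        return name if self.case_sensitive else name.casefold()

    def write(self, name: str, data: bytes) -> None:
        self._entries[self._key(name)] = data

    def file_count(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    for sensitive in (True, False):
        d = Directory(case_sensitive=sensitive)
        for name in ("MYFILE", "MyFile", "myfile"):
            d.write(name, b"hello")
        print("case sensitive:", sensitive, "files stored:", d.file_count())
```

In the case-insensitive directory, each later write silently replaces the earlier file, which is exactly the surprise you can hit when copying files between file systems with different rules.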
Rewrite in my own words
File systems typically have directories (also called folders) which allow the user to group files into separate collections. This may be implemented by associating the file name with an index in a table of contents or an inode in a Unix-like file system. Directory structures may be flat (i.e. linear), or allow hierarchies where directories may contain subdirectories.
There are some file system buzzwords (or jargon) that you might use and not fully understand. Many USB mass storage devices are formatted with FAT32, an industry-standard file system whose roots go back to floppy disks in the 1970s and 80s. The maximum FAT32 volume size is 16 TiB (approximately 17.6 TB) with a sector size of 4,096 bytes. Windows operating systems through Windows 10, however, only create new FAT32 volumes up to 32 GB in size. Beyond 32 GB on Windows, we typically format hard disks as NTFS (aka the Windows New Technology (NT) File System). NTFS has several technical improvements over the file systems that it superseded, File Allocation Table (FAT) and High Performance File System (HPFS), such as improved support for metadata and advanced data structures to improve performance, reliability, and disk space use. Additional extensions are a more elaborate security system based on access control lists (ACLs) and file system journaling.
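The 16 TiB figure can be checked with a little arithmetic. The sketch below assumes the limit comes from a 32-bit sector count (so at most 2^32 sectors per volume); that assumption and the helper names are mine, not part of the text above.

```python
# Assumption: a FAT32 volume can address at most 2**32 sectors.
MAX_SECTORS = 2 ** 32


def max_volume_bytes(sector_bytes: int) -> int:
    """Largest volume, in bytes, for a given sector size."""
    return MAX_SECTORS * sector_bytes


def to_tib(n_bytes: int) -> float:
    """Binary terabytes (1 TiB = 2**40 bytes)."""
    return n_bytes / 2 ** 40


def to_tb(n_bytes: int) -> float:
    """Decimal terabytes (1 TB = 10**12 bytes), the unit on drive packaging."""
    return n_bytes / 10 ** 12


if __name__ == "__main__":
    for sector_bytes in (512, 4096):
        size = max_volume_bytes(sector_bytes)
        print(f"{sector_bytes}-byte sectors: {to_tib(size):g} TiB = {to_tb(size):.1f} TB")
```

The gap between TiB and TB is only a difference in units: the same number of bytes looks about 10% bigger when counted in powers of ten.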
Rewrite in my own words
A network file system is a file system that acts as a client for a remote file access protocol, providing access to files on a server. Programs using local interfaces can transparently create, manage and access hierarchical directories and files in remote network-connected computers. Examples of network file systems include clients for the NFS, AFS, SMB protocols, and file-system-like clients for FTP and WebDAV.
Redundancy (at some point things will fail)
Now that we have established that we need to consume media to be human 🙂, that we must use mass storage devices on personal (and enterprise) computers to store and retrieve said media, and that we need an organized and agreed-upon system for writing and reading information on those devices, we next need to discuss the inevitable failures that could wipe out our precious media (and therefore, by the facts we’ve established, make us less human). Mass storage devices will fail; it is a fact. Magnetic media such as platter hard drives can suffer heat or component failures. USB flash media can sustain only a limited number of writes before it becomes unusable. What will happen when your mass storage device fails? Will you copy the data to multiple locations so that you still have one copy if another fails? This is a good solution to the problem of mass storage failure (for instance, having two USB sticks and storing pictures on both drives independently). But it creates a new problem for us: how do we keep those two independent drives synchronized? What if, for instance, you touched up a few of the photos in Adobe Photoshop? You would need to remember to copy the updated photos to the other USB drive, which of course you would probably forget. You would also need to label the drives effectively to prevent confusion. Enter a solution called RAID, or Redundant Array of Inexpensive Disks (or Drives).
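To see why keeping two independent copies in sync by hand is tedious, here is a rough Python sketch that compares two folders by content hash and lists the files that differ or exist on only one side. The function names and the example mount points are hypothetical; this is an illustration of the problem, not a replacement for a real sync tool.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root) -> dict:
    """Map each file's path (relative to root) to its content hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def out_of_sync(drive_a, drive_b) -> list:
    """Names of files that are missing on one side or differ in content."""
    a, b = snapshot(drive_a), snapshot(drive_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__":
    # Hypothetical mount points for the two USB sticks.
    for name in out_of_sync("/media/usb1/photos", "/media/usb2/photos"):
        print("needs attention:", name)
```

Even with a script like this, someone still has to run it and decide which copy is the right one, which is the bookkeeping that mirroring takes off your hands.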
RAID is a solution, implemented in both hardware and software, that combines multiple physical mass storage devices into one or more logical mass storage devices for data redundancy and performance. RAID was created in the 1980s by researchers at the University of California at Berkeley to prove that a small collection of inexpensive disks could outperform a large, fault-tolerant drive, also known as a “Single Large Expensive Disk” or SLED. We love our acronyms in Computer Science, don’t we!?!
RAID as a concept can distribute data across drives in multiple ways, called RAID levels, depending on the needed performance or redundancy. There are many esoteric RAID levels, numbered RAID0 (called RAID Zero), RAID1, 2, … 5, 10, etc. At a very high level, all you need to know is that RAID0, also called striping, provides no redundancy. A single volume is created across two or more physical disks, and all data written is divided, or striped, between them. RAID0 is very fast, but it’s also very dangerous: in this configuration, if one mass storage device is lost, the entire data pool is lost!!! There goes our humanity. Typical applications for RAID0 would be a temporary file system or a scratch area (you are prioritizing speed over redundancy). RAID1, aka mirroring, is the exact opposite: it copies the same data to both drives and keeps them in sync. Read performance is still improved over a single drive, since either of the mass storage devices can service a read request (for instance, if one is busy), but writes take a penalty since all data must be written twice. Skipping the more esoteric RAID levels, we finish with RAID5, or striping with parity: you get the performance improvement of writing data across multiple drives, like RAID0, plus a distributed set of parity data, which means that after the failure of a single drive the missing data can be recalculated from the parity so that nothing is lost.
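The parity trick behind RAID5 is easier to believe once you see it run. The Python sketch below uses XOR parity on one stripe of equally sized blocks; the function names are made up for this example, and it keeps the parity block in a fixed position, whereas a real RAID5 array rotates parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks) -> list:
    """One stripe: the data blocks plus one parity block, one block per drive."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index: int) -> bytes:
    """Recompute the block on the failed drive from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"PHOTOS__", b"MUSIC___", b"MOVIES__"])
    lost = 1  # pretend the second drive died
    print("recovered:", rebuild(stripe, lost))
    print("matches original:", rebuild(stripe, lost) == stripe[lost])
```

Because XOR is its own inverse, the same `rebuild` call works whether the lost drive held data or parity, but only for one lost drive per stripe; lose two and there is not enough information left.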
RAID as a technology is not owned by any company and as such can be implemented by anyone. In modern systems, a RAID-enabled motherboard will have an onboard hardware controller to manage RAID volumes. Hardware RAID is typically faster than software RAID, but that speed typically comes with specific hardware requirements. Software RAID technologies like FreeNAS or UnRAID are more forgiving on hardware but typically require an adequately sized CPU to be performant.
Let’s say that we want to create a RAID5 volume to store our precious media. We would need at minimum three mass storage disks, but we can grow the volume to a larger number of disks: 5, 7, 9, etc. Typical modern RAID implementations don’t recommend growing a volume beyond about 20 disks. Note that every disk we add increases the chance that at least one of them fails; many disks together also run hotter than a solo disk drive, and heat is a major cause of media failure. Note also that in RAID5 a failed drive can be hot-swapped and its contents rebuilt from parity information; this process is called rebuilding (or resilvering, in ZFS-based systems such as FreeNAS). The chance of a second drive failure during a rebuild can be high, and therefore there are RAID configurations (RAID6, for example) that allow for more than one drive failure before the volume is lost. RAID can also be configured with hot-spare drives, which sit idle until a drive fails and are then immediately built into the volume to restore resiliency.
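A quick back-of-the-envelope calculation shows how the risk and the usable space change with the number of disks. This Python sketch assumes that drives fail independently and uses a made-up 2% yearly failure chance per drive; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def chance_any_drive_fails(drives: int, per_drive_chance: float) -> float:
    """Probability that at least one of the drives fails, assuming independence."""
    return 1.0 - (1.0 - per_drive_chance) ** drives


def raid5_usable_capacity(drives: int, drive_size_tb: float) -> float:
    """RAID5 spends one drive's worth of space on parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drives - 1) * drive_size_tb


if __name__ == "__main__":
    yearly_failure_chance = 0.02  # hypothetical figure, not a measured one
    for drives in (3, 5, 9, 20):
        risk = chance_any_drive_fails(drives, yearly_failure_chance)
        space = raid5_usable_capacity(drives, 4.0)
        print(f"{drives:2d} drives: {space:5.1f} TB usable, {risk:.1%} chance of a failure this year")
```

The trade-off is visible in the output: each added drive buys more usable space but also raises the odds that you will be doing a rebuild this year.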
All of these redundancy concepts have been used for decades in the enterprise (think large corporations, the military and government). Many of these technologies are well within the capabilities of commodity personal computers and typical computer users.
An Introduction to FreeNAS
The $$$ per GB of storage for SSDs is dropping every year, but it’s still at a level where long-term archival of digital media on SSDs is out of the price range of most desktop computer users. Large computer and media companies know this, so they market technologies like “cloud computing” to save all your digital assets online FOR A FEE. The music, television and movie industries have followed suit and offered their own “cloud-based” or “streaming” content services like Netflix, Hulu, Disney+, Amazon Music, etc. These systems seem affordable and they are typically very easy to use (a win for the mega corporations), but over a lifetime they add up to a much higher total cost of ownership than direct purchase of media for most users.
Application of what we’ve learned: Create a Plex server on FreeNAS