mkroot: ping route tar wget vi tar sh
NFS patch (NFS mounting with toybox)
mips linker patch (ld version)
initmpfs verbosity patch:
  Extracting cpio to tmpfs/ramfs... XX files in XX dirs.
  Running rootfs /init: errno rootfsflags=

0BSD data in license.txt
https://lists.spdx.org/pipermail/spdx-legal/2015-December/001574.html
https://lists.spdx.org/pipermail/spdx-legal/2015-December/001576.html
https://lists.spdx.org/pipermail/spdx-legal/2015-December/001581.html
https://lists.spdx.org/pipermail/spdx-legal/2015-December/001600.html
Acknowledgement:
https://lists.spdx.org/pipermail/spdx-legal/2016-January/001607.html
Submission:
https://lists.spdx.org/pipermail/spdx-legal/2015-June/001443.html
First objection:
https://lists.spdx.org/pipermail/spdx-legal/2015-June/001456.html

On 03/14/2016 11:52 AM, Jake Swensen wrote:
> Hey Rob,
>
> Phil and Ernie had mentioned that you may be interested in helping us
> out with our file system corruption issues. If you've got some time this
> week, I'd like to discuss your ideas and what you need from our end.

Sure.

What it sounds like you're having is filesystem corruption due to using a writeable non-flash filesystem on a flash device.

Conventional filesystems are based on the assumption that blocks the filesystem didn't write to won't change, and the standard block size on Linux has been 4096 bytes for many years. [1] Hard drives used 512 byte blocks, and the newer ones use 4096 byte blocks, so you could update the underlying storage with fine granularity, and the parts you weren't writing to stayed the same.

But flash erase blocks are enormous by conventional storage device standards: I've seen anywhere from 128k to 2 megabytes. If the flash hardware is interrupted between the block erase and the corresponding block write (power loss or reset both do this), then the contents of the entire erase block are lost.
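To put numbers on that, here is a minimal Python sketch. The function names are hypothetical, and it assumes a naive device that erases and rewrites whole erase blocks in place, ignoring whatever remapping an FTL may do behind the scenes. Given a write's byte offset and length plus an erase block size, it reports which erase blocks get erased and how many bytes the filesystem never asked to change are at risk if power drops between the erase and the rewrite.

```python
from typing import Tuple


def erase_span(offset: int, length: int, erase_size: int) -> Tuple[int, int]:
    """Byte range [start, end) covering every erase block a write touches.

    Assumes erase blocks start at byte 0 of the device and are all
    erase_size bytes long (a simplification; real parts vary).
    """
    if length <= 0 or erase_size <= 0:
        raise ValueError("length and erase_size must be positive")
    first = offset // erase_size
    last = (offset + length - 1) // erase_size
    return first * erase_size, (last + 1) * erase_size


def bytes_at_risk(offset: int, length: int, erase_size: int) -> int:
    """Bytes outside the write that share its erase block(s).

    If power fails after the erase but before the rewrite, these are lost
    along with the data actually being written.
    """
    start, end = erase_span(offset, length, erase_size)
    return (end - start) - length


if __name__ == "__main__":
    MiB = 1024 * 1024
    # One 4096 byte filesystem block, entirely inside one 1 MiB erase block.
    print(erase_span(5 * MiB + 4096, 4096, MiB))     # (5242880, 6291456)
    print(bytes_at_risk(5 * MiB + 4096, 4096, MiB))  # 1044480
    # The same size write straddling an erase block boundary touches two.
    print(erase_span(6 * MiB - 2048, 4096, MiB))     # (5242880, 7340032)
    print(bytes_at_risk(6 * MiB - 2048, 4096, MiB))  # 2093056
```

The second case is the worst one: a single 4 KiB write that crosses a boundary puts two whole erase blocks, about 2 MiB, at risk.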
So you could lose a megabyte of data on each _SIDE_ of the area you wrote, which can knock out entire directories and allocation tables or even take out your superblock. Knocking a 1 megabyte hole in a conventional filesystem tends to render it unmountable, and filesystems designed for use on conventional hard drives don't know this can happen.

FAT is especially vulnerable to this: the file allocation table is an array of block pointers all next to each other at the start of the partition. A single failure to rewrite the data after erasing an erase block will take out the entire FAT and trash the whole filesystem unrecoverably.

It's a small race window, but the results are catastrophic.

This is why there are "log-structured" filesystems designed specifically for flash, which cycle through all the available erase blocks and make a tree pointing back to the data that's still valid in the previous ones:

https://en.wikipedia.org/wiki/Log-structured_file_system

Linux has several implementations of this concept:

https://en.wikipedia.org/wiki/Flash_file_system#Linux_flash_filesystems

This technique is sometimes confused with "journaling", because it provides many of the same benefits, but it's implemented differently. Log filesystems are organized into an array of erase blocks. To format one, you have to know the flash erase block size, and the filesystem must be aligned to the start of an erase block. Because of this you usually _can't_ use them on a non-flash device: the filesystem driver will try to query the flash hardware to determine the erase block size, and if that fails it doesn't know how to arrange itself. They're designed ONLY to work on flash.

In operation, they cycle through all the available erase blocks and make a tree pointing back to the data that's still valid on the previous ones. Each new erase block contains both new data and any existing data collated out of the oldest block in the filesystem, I.E.
the one which will be overwritten next. If there are free erase blocks, the filesystem can just write new data (often leaving most of that erase block blank) without deleting an old block. If there are sparsely used erase blocks, it copies the data from the oldest one to a new one and adds its new data to the extra space.

When a log-structured filesystem is nearly full, writes get slower because it has to cycle through a lot of blocks to find enough free space, copying the oldest data to the new one and collating the free space until it has enough room to write the new data. (The smarter ones can skip entirely full blocks and just replace blocks that had some free space in them.)

Mounting them can also be a bit slow, because the driver has to read the signature at the start of each erase block to figure out which one has the newest timestamp, I.E. the one that contains the current root of the tree.

The advantage of doing this (other than automatic wear-leveling) is that if writing is interrupted after an erase, the single erase block that got trashed can be ignored (each erase block is checksummed, so detecting invalid data is easy). The previous block still has a root node describing the contents of the filesystem as it was before the last attempted write, and the oldest block never gets trashed until after the newest block is written. (That means it always needs one free block between the oldest block still in use and the newest block, to accommodate these failures. So you're never erasing a block that still contains valid data: the data had to be copied out to a new block first.)

Note: read-only filesystems don't have this problem. You can stick a squashfs or read-only ext2 image in flash and it's fine, because it never erases blocks, so the granularity difference between what the filesystem was designed to expect and what the hardware actually does never comes up. It's only when _writing_ to flash that you need a filesystem designed for flash to avoid data corruption.
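The mount-time recovery described above can be sketched in a few lines of Python. This is a toy model, not the on-flash format of jffs2, UBIFS, or any other real filesystem: each erase block is assumed to carry a hypothetical sequence number (standing in for the timestamp) and a CRC-32 over its contents, and the scan picks the newest block whose checksum verifies, so a block torn by an interrupted write is simply skipped.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EraseBlock:
    sequence: int   # hypothetical write counter; higher means newer
    payload: bytes  # block contents, including that generation's tree root
    crc: int        # CRC-32 recorded when the block was written


def write_block(sequence: int, payload: bytes) -> EraseBlock:
    """Model a write that completed: contents stored along with their CRC."""
    return EraseBlock(sequence, payload, zlib.crc32(payload))


def find_current_root(blocks: Sequence[EraseBlock]) -> Optional[EraseBlock]:
    """Scan every erase block and return the newest one that verifies.

    A block whose contents don't match its stored CRC (erased, then only
    partly rewritten) is ignored, so the previous generation's root wins.
    Sequence-number wraparound is not handled in this sketch.
    """
    newest = None
    for blk in blocks:
        if zlib.crc32(blk.payload) != blk.crc:
            continue  # torn write: skip this erase block
        if newest is None or blk.sequence > newest.sequence:
            newest = blk
    return newest


if __name__ == "__main__":
    log = [write_block(1, b"root v1"), write_block(2, b"root v2")]
    # Power fails during generation 3: the CRC covers the full contents,
    # but only part of them made it to flash.
    torn = EraseBlock(3, b"root v3 (part", zlib.crc32(b"root v3 (partial rewrite)"))
    print(find_current_root(log + [torn]).payload)
```

With the torn block present the scan falls back to generation 2, which is the "previous block still has a root node" property. The same "newest copy with a valid checksum wins" rule is what the two-partition firmware update trick in the follow-up email relies on.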
[1] It used to be 1024 bytes, but the largest an individual file could be on ext2 with 1024 byte blocks is 16 gigs, and the largest with 4096 byte blocks is 4 terabytes, so everybody switched years ago. (The difference is so big because ext2 uses a 3 level tree to store metadata, and each level can hold more branches in a 4096 byte block than in a 1024 byte block.)

> Additionally, I'll be in Austin March 29 - April 2 working with Phil
> directly. If you're available one of those days, it might be beneficial
> for the three of us to get together to work on the system.

I fly out to speak at a conference in Chicago on the 31st, but the 29th and 30th aren't spoken for.

> I can be reached at
>
> Office: 651-737-4591
>
> Cell: 320-333-8507

My cell phone is 512-297-3474

> Thanks,
>
> Jake Swensen

Happy to help,

Rob

On 03/16/2016 09:42 AM, Jake Swensen wrote:
> Rob,
>
> Thanks for the info! Would jffs2 be suitable for SD cards and eMMCs?

Seems like it:

https://lwn.net/Articles/528617/

> If so, I'll configure our OE build to use jffs2 (instead of ext)
> and test it out for the next few weeks.

The real question seems to be "can you disable the FTL (Flash Translation Layer) and enable MTD (Memory Technology Device) mode":

http://free-electrons.com/blog/managing-flash-storage-with-linux/

> Two types of NAND flash storage are available today. The first type
> emulates a standard block interface, and contains a hardware “Flash
> Translation Layer” that takes care of erasing blocks, implementing
> wear leveling and managing bad blocks. This corresponds to USB flash
> drives, media cards, embedded MMC (eMMC) and Solid State Disks (SSD).
> The operating system has no control on the way flash sectors are
> managed, because it only sees an emulated block device. This is
> useful to reduce software complexity on the OS side. However,
> hardware makers usually keep their Flash Translation Layer algorithms
> secret.
> This leaves no way for system developers to verify and tune
> these algorithms, and I heard multiple voices in the Free Software
> community suspecting that these trade secrets were a way to hide poor
> implementations. For example, I was told that some flash media
> implemented wear leveling on 16 MB sectors, instead of using the
> whole storage space. This can make it very easy to break a flash
> device.

If you can figure out what erase block size your SD card is using, you can theoretically use block2mtd:

http://raspberrypi.stackexchange.com/questions/11932/how-to-use-jffs2-or-ubifs-to-avoid-data-corruption-and-increase-life-of-the-sd-c
http://aldea.de/linux/debianonflash.html

That takes a block device and adds manually supplied flash erase block information. This only works if the FTL implementation, when receiving an aligned erase-block-sized write, won't break it up and do silly things with it behind the scenes. (Depends on your SD card vendor, apparently?)

According to block2mtd.c in the kernel source, you either insmod block2mtd from initramfs with "block2mtd=[,]" or else write to /sys/module/block2mtd/parameters/block2mtd for the built-in (static) version. In theory you should be able to pass this on the kernel command line too, but I'm not sure they wired that up in this module? (I'd have to examine further...)

Another alternative is to make sure your partitions are aligned to a nice big power of 2 size (so whatever your erase block size is, you're not crossing them) and be prepared to lose the writeable partition. If you're logging to FAT, and the FAT partition is toast when you try to mount it, reformat the thing. That way the system can at least boot and start logging new data.
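For the "align your partitions to a big power of 2" suggestion, here is a small Python check. It assumes 512-byte partition-table sectors and an alignment you choose to be at least as large as the biggest erase block you expect; the 4 MiB used below is a guess, not a measured value for any particular card, and the helper names are hypothetical. If a partition starts and ends on such a boundary, no power-of-2 erase block of that size or smaller can straddle its edges, though this says nothing about what the FTL does internally.

```python
SECTOR = 512  # bytes per sector as counted by the partition table


def _sectors_per_unit(align_bytes: int) -> int:
    """Validate the alignment and return it expressed in sectors."""
    if align_bytes < SECTOR or align_bytes & (align_bytes - 1):
        raise ValueError("alignment must be a power of two >= 512")
    return align_bytes // SECTOR


def is_aligned(start_sector: int, num_sectors: int, align_bytes: int) -> bool:
    """True if the partition both starts and ends on an align_bytes boundary."""
    per = _sectors_per_unit(align_bytes)
    end_sector = start_sector + num_sectors
    return start_sector % per == 0 and end_sector % per == 0


def align_up(sector: int, align_bytes: int) -> int:
    """Round a proposed starting sector up to the next aligned sector."""
    per = _sectors_per_unit(align_bytes)
    return -(-sector // per) * per  # ceiling division, then scale back up


if __name__ == "__main__":
    FOUR_MIB = 4 * 1024 * 1024
    print(is_aligned(63, 8192, FOUR_MIB))     # old DOS-style start: False
    print(is_aligned(2048, 8192, FOUR_MIB))   # 1 MiB aligned, not 4 MiB: False
    print(align_up(63, FOUR_MIB))             # 8192 (4 MiB in sectors)
    print(is_aligned(8192, 32768, FOUR_MIB))  # starts at 4 MiB, 16 MiB long: True
```

Rounding up to a generous alignment wastes less than one alignment unit at each partition edge, which is cheap insurance when the card's real erase block size is unknown.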
Also, you only lose data while writing it; once it becomes read-only it should be safe. So when doing firmware updates you can have two partitions, write the update over the "old" one, and then have your boot software do the same "check the partitions, see which has a valid checksum and the newest date stamp, use that" logic. You should always have one valid partition even if the other is toast. (This is a common approach in the embedded world; it reduces the chance of bricking the device on update.)

> Let's plan on meeting up at 3M Austin on March 30th. What time would work best for you?

Sure, how does 10 AM sound?

> Thanks again,
> Jake

Rob

> ________________________________________
> From: Rob Landley
> Sent: Monday, March 14, 2016 3:57 PM
> To: Jake Swensen
> Cc: Phillip Bergeron; David Badzinski; Ernesto Rodriguez; Russell Lait; Jen@SE-Instruments.com; rlarsen@se-instruments.com
> Subject: [EXTERNAL] Re: Spartan: Linux Debugging