mkroot: ping route tar wget vi sh

NFS patch (NFS mounting with toybox)
mips linker patch (ld version)
initmpfs verbosity patch:
  Extracting cpio to tmpfs/ramfs... XX files in XX dirs.
  Running rootfs /init: errno
  rootfsflags=

0BSD data in license.txt

  https://lists.spdx.org/pipermail/spdx-legal/2015-December/001574.html
  https://lists.spdx.org/pipermail/spdx-legal/2015-December/001576.html
  https://lists.spdx.org/pipermail/spdx-legal/2015-December/001581.html
  https://lists.spdx.org/pipermail/spdx-legal/2015-December/001600.html

Acknowledgement
https://lists.spdx.org/pipermail/spdx-legal/2016-January/001607.html

Submission:
https://lists.spdx.org/pipermail/spdx-legal/2015-June/001443.html
First objection:
https://lists.spdx.org/pipermail/spdx-legal/2015-June/001456.html

On 03/14/2016 11:52 AM, Jake Swensen wrote:
> Hey Rob,
>
>
> Phil and Ernie had mentioned that you may be interested in helping us
> out with our file system corruption issues. If you've got some time this
> week, I'd like to discuss your ideas and what you need from our end.

Sure.

What it sounds like you're having is filesystem corruption due to using
a writeable non-flash filesystem on a flash device.

Conventional filesystems are based on the assumption that blocks the
filesystem didn't write to won't change, and the standard block size on
Linux has been 4096 bytes for many years. [1] Hard drives used 512 byte
blocks, and the newer ones use 4096 byte blocks, so you could update the
underlying storage with fine granularity, and parts you weren't writing
to stayed the same.

But flash erase blocks are enormous by conventional storage device
standards: I've seen anywhere from 128k to 2 megabytes. If the flash
hardware is interrupted between the block erase and the corresponding
block write (power loss or reset both do this), then the contents of the
entire erase block are lost. Meaning you could lose a megabyte of data on
each _SIDE_ of the area you wrote, which can knock out entire
directories and allocation tables or even take out your superblock.
Blowing a 1 megabyte hole in a conventional filesystem tends to render
it unmountable, and filesystems designed for use on conventional hard
drives don't know this is an option.
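
To put numbers on that, here is a rough Python sketch of which byte
range is at risk when a small write is interrupted. The function name
and the example offsets are made up for illustration; the only thing
taken from the text above is that every erase block the write touches
gets erased as a unit.

```python
def collateral_range(write_offset, write_len, erase_block_size):
    """Return (start, end) of the bytes at risk if a write is interrupted.

    Every erase block the write touches is erased first, so the whole
    block is at risk, not just the bytes being written.
    """
    first = write_offset // erase_block_size
    last = (write_offset + write_len - 1) // erase_block_size
    return first * erase_block_size, (last + 1) * erase_block_size


MEG = 1024 * 1024

# A 4096 byte filesystem block in the middle of a 2 megabyte erase block
# (hypothetical numbers).
start, end = collateral_range(write_offset=3 * MEG, write_len=4096,
                              erase_block_size=2 * MEG)
print("bytes at risk:", end - start)
print("before the write:", 3 * MEG - start, "after it:", end - (3 * MEG + 4096))
```

With those numbers the 4096 byte write puts a full 2 megabytes at risk,
roughly a megabyte on each side of it.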

FAT is especially vulnerable to this: the file allocation table is an
array of block pointers all next to each other at the start of the
partition. A single failure to rewrite the data after erasing an erase
block will take out the entire FAT and trash the whole filesystem
unrecoverably.

It's a small race window, but the results are catastrophic.

This is why there are "log-structured" filesystems designed specifically
for flash, which cycle through all the available erase blocks and make a
tree pointing back to the data that's still valid in the previous ones:

https://en.wikipedia.org/wiki/Log-structured_file_system

Linux has several implementations of this concept:

https://en.wikipedia.org/wiki/Flash_file_system#Linux_flash_filesystems

This technique is sometimes confused with "journaling", because it
provides many of the same benefits, but it's implemented differently.
Log filesystems are organized into an array of erase blocks. To format
one, you have to know the flash erase block size, and they must
be aligned to the start of an erase block. Because of this you usually
_can't_ use them on non-flash devices because the filesystem driver will
try to query the flash hardware to determine the erase block size, and
if that fails it doesn't know how to arrange itself. They're
designed ONLY to work on flash.

In operation, they cycle through all the available erase blocks and make
a tree pointing back to the data that's still valid on the previous
ones. Each new erase block contains both new data and any existing data
collated out of the oldest block in the filesystem, I.E. the one which
will be overwritten next. If there are free erase blocks the filesystem
can just write new data (often leaving most of that erase block blank)
without deleting an old block. If there are sparsely used erase blocks
it copies the data from the oldest one to a new one and adds its new
data to the extra space.
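
A toy Python model of that cycle, to make the ordering concrete. This
is only a sketch, not how any real flash filesystem lays out data: each
erase block is a dict, block capacity is ignored, and the names
append_block and lookup are invented for the example.

```python
from collections import deque


def append_block(log, new_items, total_blocks):
    """Write one new erase block to the log (toy model).

    log is a deque of dicts with the oldest erase block on the left.
    One block is always kept free, so at most total_blocks - 1 are in use.
    """
    if total_blocks < 2:
        raise ValueError("need at least one block in use plus one spare")
    block = dict(new_items)
    if len(log) >= total_blocks - 1:
        # No free block left: collate whatever is still valid out of the
        # oldest block into the new one.
        oldest = log[0]
        superseded = set(block)
        for newer in list(log)[1:]:
            superseded.update(newer)
        for key, value in oldest.items():
            if key not in superseded:
                block[key] = value
        log.append(block)   # the new block is written first...
        log.popleft()       # ...and only then is the oldest one erased
    else:
        log.append(block)
    return block


def lookup(log, key):
    """The newest copy of a key wins, so search from the newest block back."""
    for block in reversed(log):
        if key in block:
            return block[key]
    raise KeyError(key)


log = deque()
append_block(log, {"a": 1, "b": 2}, total_blocks=3)
append_block(log, {"a": 10}, total_blocks=3)
append_block(log, {"c": 3}, total_blocks=3)   # recycles the oldest block
print(len(log), lookup(log, "a"), lookup(log, "b"), lookup(log, "c"))
```

The third write recycles the oldest block: "b" was still valid there so
it gets carried forward, while the stale copy of "a" is dropped.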

When a log-structured filesystem is near full, writes get slower because
it has to cycle through a lot of blocks to find enough free space,
copying the oldest data to the new one and collating the free space
until it has enough space to write the new data. (The smarter ones can
skip entirely full blocks and just replace blocks that had some free
space in them.)

Mounting them can also be a bit slow because the driver has to read the
signature at the start of each erase block to figure out which one has
the newest timestamp, I.E. the one that contains the current root of the
tree.

The advantage of doing this (other than automatic wear-leveling) is that
if writing is interrupted after an erase, the single erase block that
got trashed can be ignored (each erase block is checksummed, so detecting
invalid data is easy). The previous block still has a root node
describing the contents of the filesystem as it was before the last
attempted write, and the oldest block never gets trashed until after the
newest block is written. (That means it always needs one free block
between the oldest block still in use and the newest block, to
accommodate these failures. So you're never erasing a block that still
contains valid data: the data had to be copied out to a new block first.)
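
The mount-time scan and the recovery behavior can be sketched the same
way. This is an illustration under assumptions, not any real on-flash
format: the EraseBlock layout is invented, a sequence number stands in
for the timestamp, and CRC32 stands in for whatever checksum a given
filesystem actually uses.

```python
import zlib
from dataclasses import dataclass


@dataclass
class EraseBlock:
    sequence: int     # stand-in for the timestamp in the block signature
    payload: bytes    # stand-in for the root node and data
    checksum: int     # CRC32 of the payload as it was meant to be written


def make_block(sequence, payload):
    return EraseBlock(sequence, payload, zlib.crc32(payload))


def find_current_root(blocks):
    """Return the newest block whose checksum verifies, or None."""
    valid = [b for b in blocks if zlib.crc32(b.payload) == b.checksum]
    return max(valid, key=lambda b: b.sequence, default=None)


blocks = [make_block(1, b"root v1"), make_block(2, b"root v2")]

# Simulate an interrupted write: the last byte of block 3 never made it.
intended = b"root v3"
blocks.append(EraseBlock(3, intended[:-1] + b"\xff", zlib.crc32(intended)))

print(find_current_root(blocks).payload)   # b'root v2'
```

The torn block 3 fails its checksum and is ignored, so the scan falls
back to block 2, the filesystem as it was before the last attempted
write.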

Note: read-only filesystems don't have this problem. You can stick a
squashfs or read only ext2 image in flash and it's fine, because it
never erases blocks so the granularity difference between what the
filesystem was designed to expect and what the hardware actually does
never comes up. It's only when _writing_ to flash that you need a
filesystem designed for flash to avoid data corruption.

[1] It used to be 1024 bytes, but the longest an individual file could
be on ext2 with 1024 byte blocks is 16 gigs, and the largest with 4096
byte blocks is 4 terabytes, so everybody switched years ago. (It uses
a 3 level tree to store metadata, and each level can hold more branches
in a 4096 byte block than in a 1024 byte block, which is why the
difference is so big.)
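
The arithmetic behind those two numbers, as a small Python sketch. It
assumes the classic ext2 inode layout (12 direct block pointers plus
single, double, and triple indirect blocks, with 4 byte block pointers)
and only counts what the tree can address. Other fields in the inode can
cap the real limit lower: the usually documented ext2 maximum with 4096
byte blocks is 2 terabytes because of a 32-bit sector count.

```python
def ext2_tree_limit(block_size, pointer_size=4, direct=12):
    """Bytes addressable through an ext2-style direct + 3 level indirect tree."""
    per_block = block_size // pointer_size     # pointers per indirect block
    blocks = direct + per_block + per_block ** 2 + per_block ** 3
    return blocks * block_size


for size in (1024, 4096):
    limit = ext2_tree_limit(size)
    print(size, "byte blocks:", round(limit / 2 ** 30, 2), "GiB")
```

Going from 1024 to 4096 byte blocks multiplies the pointers per level by
4 and the block size by 4, so the triple indirect term grows by a factor
of 4*4*4*4 = 256, which is how 16 gigs turns into roughly 4 terabytes.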

> Additionally, I'll be in Austin March 29 - April 2 working with Phil
> directly. If your available one of those days, it might be beneficial
> for the three of us to get together to work on the system.

I fly out to speak at a conference in Chicago on the 31st, but the 29th
and 30th aren't spoken for.

> I can be reached at
>
> Office: 651-737-4591
>
> Cell: 320-333-8507

My cell phone is 512-297-3474

> Thanks,
>
> Jake Swensen

Happy to help,

Rob


On 03/16/2016 09:42 AM, Jake Swensen wrote:
> Rob,
>
> Thanks for the info! Would jffs2 be suitable for SD cards and eMMCs?

Seems like it: https://lwn.net/Articles/528617/

> If so, I'll configure our OE build to use jffs2 (instead of ext)
> and test it out for the next few weeks.

The real question seems to be "can you disable the FTL (Flash
Translation Layer) and enable MTD (Memory Technology Device) mode":

  http://free-electrons.com/blog/managing-flash-storage-with-linux/

> Two types of NAND flash storage are available today. The first type
> emulates a standard block interface, and contains a hardware “Flash
> Translation Layer” that takes care of erasing blocks, implementing
> wear leveling and managing bad blocks. This corresponds to USB flash
> drives, media cards, embedded MMC (eMMC) and Solid State Disks (SSD).
> The operating system has no control on the way flash sectors are
> managed, because it only sees an emulated block device. This is
> useful to reduce software complexity on the OS side. However,
> hardware makers usually keep their Flash Translation Layer algorithms
> secret. This leaves no way for system developers to verify and tune
> these algorithms, and I heard multiple voices in the Free Software
> community suspecting that these trade secrets were a way to hide poor
> implementations. For example, I was told that some flash media
> implemented wear leveling on 16 MB sectors, instead of using the
> whole storage space. This can make it very easy to break a flash
> device.

If you can figure out what erase block size your sd card is using,
you can theoretically use block2mtd:

  http://raspberrypi.stackexchange.com/questions/11932/how-to-use-jffs2-or-ubifs-to-avoid-data-corruption-and-increase-life-of-the-sd-c

  http://aldea.de/linux/debianonflash.html

That takes a block device and adds manually supplied flash erase block
information. This only works if the FTL implementation, when receiving
an aligned erase block sized write, won't break it up and do silly
things with it behind the scenes. (Depends on your sd card vendor,
apparently?)

According to block2mtd.c in the kernel source, you either insmod
block2mtd from initramfs with "block2mtd=<dev>[,<erasesize>]" or else
write to /sys/module/block2mtd/parameters/block2mtd for the static
version. In theory you should be able to provide this on the kernel
command line too, but I'm not sure they wired that up in this module.
(I'd have to examine further...)
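
A small Python sketch of the sysfs route, untested against real
hardware. The parameter format and the sysfs path are the ones quoted
above; the helper names, the device node /dev/mmcblk0p3, and the 128k
erase size are placeholders, and I'm assuming the erase size is given in
plain bytes.

```python
SYSFS_PARAM = "/sys/module/block2mtd/parameters/block2mtd"


def block2mtd_param(device, erase_size=None):
    """Build the "<dev>[,<erasesize>]" string block2mtd expects."""
    if erase_size is None:
        return device
    if erase_size <= 0:
        raise ValueError("erase size must be positive")
    return f"{device},{erase_size}"


def register(device, erase_size=None, path=SYSFS_PARAM):
    """Hand the device to a built-in block2mtd (needs root on the target)."""
    with open(path, "w") as param:
        param.write(block2mtd_param(device, erase_size))


# Placeholder device and erase block size: substitute your own.
print(block2mtd_param("/dev/mmcblk0p3", 128 * 1024))
```

The same string is what would go after "block2mtd=" on the insmod line.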

Another alternative is to make sure your partitions are aligned to a
nice big power of 2 size (so whatever your erase block size is, no
partition shares an erase block with its neighbor) and be prepared to
lose the writeable partition. If you're logging to FAT, and the FAT
partition is toast when you try to mount it, reformat the thing. That
way the system can at least boot and start logging new data.
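
A Python sketch of the alignment check. The 4 megabyte figure is an
assumption: it's just a power of 2 comfortably bigger than the erase
block sizes mentioned earlier, not something read from the card.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of a power of 2 alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2")
    return (offset + alignment - 1) & ~(alignment - 1)


def is_aligned(start, length, alignment):
    """True if a partition starts and ends on alignment boundaries."""
    return start % alignment == 0 and (start + length) % alignment == 0


ALIGN = 4 * 1024 * 1024   # assumed safe upper bound on the erase block size

# A partition that would otherwise start just past the 1 megabyte mark.
start = align_up(1024 * 1024 + 512, ALIGN)
print(start, is_aligned(start, 64 * 1024 * 1024, ALIGN))
```

If both ends of every partition pass this check, an interrupted write in
the writeable partition can't take out an erase block that also holds
part of a read-only one.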

Also, you only lose data while writing it. Once it becomes read only it
should be safe, so when doing firmware updates you can have two
partitions, write the update over the "old" one, and then have your boot
software do the same "check the partitions, whichever has a valid
checksum and the newest date stamp, use that" test. You should always
have one valid copy even if the other is toast. (This is a common
approach in the embedded world; it reduces the risk of bricking the
device on update.)
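
The boot-time check might look something like this in Python. It's a
sketch only: the Slot fields, the CRC32 checksum, and the integer date
stamp are all invented for the example, and a real bootloader would
read them from wherever your image header keeps them.

```python
import zlib
from typing import NamedTuple, Optional, Sequence


class Slot(NamedTuple):
    name: str
    image: bytes
    stored_crc: int   # checksum recorded when the image was written
    stamp: int        # date stamp, larger means newer


def pick_boot_slot(slots: Sequence[Slot]) -> Optional[Slot]:
    """Newest slot whose image still matches its recorded checksum."""
    good = [s for s in slots if zlib.crc32(s.image) == s.stored_crc]
    return max(good, key=lambda s: s.stamp, default=None)


def slot_to_overwrite(slots: Sequence[Slot]) -> Optional[Slot]:
    """The update goes over whichever slot we are NOT booting from."""
    current = pick_boot_slot(slots)
    others = [s for s in slots if current is None or s.name != current.name]
    return others[0] if others else None


old = Slot("a", b"fw1", zlib.crc32(b"fw1"), stamp=100)
# Update to slot b was interrupted: last byte wrong, checksum won't match.
torn = Slot("b", b"fw\xff", zlib.crc32(b"fw2"), stamp=200)

print(pick_boot_slot([old, torn]).name)      # a
print(slot_to_overwrite([old, torn]).name)   # b
```

Even though slot b has the newer date stamp, its checksum fails, so the
device boots slot a and the next update attempt goes over slot b again.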

> Let's plan on meeting up at 3M Austin on March 30th. What time would work best for you?

Sure, how does 10 AM sound?

> Thanks again,
> Jake

Rob

> ________________________________________
> From: Rob Landley <rlandley@se-instruments.com>
> Sent: Monday, March 14, 2016 3:57 PM
> To: Jake Swensen
> Cc: Phillip Bergeron; David Badzinski; Ernesto Rodriguez; Russell Lait; Jen@SE-Instruments.com; rlarsen@se-instruments.com
> Subject: [EXTERNAL] Re: Spartan: Linux Debugging
>