v9fs: Plan 9 Resource Sharing for Linux
=======================================

ABOUT
=====

v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol, designed by Ken Thompson, the original author of Unix. Just as UTF-8 was Ken's Plan 9 solution to character encoding, 9p was his network filesystem protocol.

v9fs provides a simple network filesystem that lets you run a server and go:

  mount -t 9p 127.0.0.1:/path /mnt

Like Samba, 9p uses one connection per mount point and the server is a normal userspace process. Unlike Samba, the protocol going across the wire is simple and easily supports POSIX semantics.

Like NFS, this offers a way to remotely mount filesystems. Unlike NFS, v9fs does not reimplement half the VFS layer, the server is not running in kernel space, it doesn't have any sort of RPC layer, re-exporting a v9fs mount with another v9fs server should just work, you shouldn't get strange errors from things like "mkdir sub; rm -rf sub > sub/log.txt", it doesn't care what the underlying filesystem format is, and so on.

The 9p protocol can operate across any bidirectional serial transport. The v9fs driver includes support for TCP/IP, virtio, RDMA, named pipe, and file descriptor transports. A TCP/IP server named "Distributed I/O Daemon" is available at https://code.google.com/p/diod and the "virtfs" subsystem of QEMU and KVM is a virtio 9p server.

PROTOCOL:
=========

This driver supports three versions of the 9p protocol: the legacy 9p2000 and 9p2000.u versions, and the current (default) 9p2000.L.

In 2000 Ken Thompson retired from Bell Labs, and the last version of the Plan 9 file sharing protocol he worked on was called "9p2000".
This version had 12 basic operations, and is documented at:

  http://plan9.bell-labs.com/magic/man2html/5/0intro

Around 2003 Plan 9 was open sourced, which led to the development of an extended version of the protocol, "9p2000.u" (for Unix), providing full POSIX semantics (by adding concepts the Plan 9 OS didn't have, such as the suid bit and UID numbers instead of names). The first v9fs driver for Linux was merged in 2005, based on 9p2000.u and described in the paper "Grave Robbers From Outer Space":

  http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html

The current protocol version, "9p2000.L" (for Linux), was developed for v9fs after a few years of real world experience. It tailors the protocol to Linux with support for ACLs, file locking, more efficient directory listing, corner cases in things like file deletion, and so on. This new version was introduced around 2009 and became the new default in 2011. The 9p2000.L protocol is described at:

  https://code.google.com/p/diod/wiki/protocol

QUICK START:
============

If you just want to play around and get a feel for 9p, grab the diod (9p over IPv4) server from https://code.google.com/p/diod/ , compile it (./configure; make) and run it as a normal user (this invocation does not require root access):

  diod/diod -f -n -S -l 127.0.0.1:9999 -e $PWD

Then fire up a qemu instance with the standard emulated network (where 10.0.2.2 tunnels through to the host's loopback) and run:

  mount -t 9p -o port=9999,aname=$PWD,version=9p2000.L 10.0.2.2 /mnt

where $PWD is the same path diod served. (Note that diod currently requires absolute paths for exports.) There's a diod.8 man page in the same directory as the executable, and man pages for diod.conf and diodmount (allowing user/password logins) elsewhere in the source.

The QEMU/KVM documentation for setting up virtfs (their built-in 9p server over the virtio transport) is at:

  http://wiki.qemu.org/Documentation/9psetup

QEMU also usually runs its server as a normal user on the host.
It offers a "mapped" mode that saves ownership and device node information in extended attributes.

CACHE MODES
===========

By default, 9p operates uncached, letting the server handle concurrency. On a modern LAN this is as fast or faster than talking to local disk, and scales surprisingly well (about as well as web servers). Back before cache support was even implemented, v9fs was tested with ~2000 simultaneous clients running against the same server, and performed acceptably. (Run the server as root so setrlimit can remove the per-process filehandle restriction, give the server plenty of RAM, and plug it into a gigabit switch.)

The "-o cache=loose" mode enables Linux's local VFS cache but makes no attempt to handle multi-user concurrency beyond not panicking the kernel: updates are written back when the client's VFS gets around to flushing them, and the last writer wins. File locking and fsync/fdatasync are available if an application wants to handle its own concurrency.

Loose caching works well for read-only mounts (allowing scalable fanout in clusters, with intermediate servers re-exporting read-only v9fs mounts to more clients), or for mounts with nonconcurrent users (including only one client mounting a directory, or user home directories under a common directory).

The "-o cache=fscache" mode uses Linux's fscache subsystem to provide persistent local caching (which doesn't help concurrency at all). See Documentation/filesystems/caching/fscache.txt for details.

This code makes no attempt to handle the full range of caching corner cases other protocols wrestle with; v9fs just doesn't go there.
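As a concrete sketch of the read-only fanout case described above, an intermediate machine could mount a tree with loose caching and re-export it. This is only an illustration; the server address 10.0.0.1, port 9999, and the /srv/tree path are hypothetical:

```shell
# hypothetical fanout sketch: mount an upstream 9p export read-only with
# loose caching (safe here because nothing on this host writes to it) ...
mount -t 9p -o version=9p2000.L,cache=loose,ro 10.0.0.1 /srv/tree

# ... then re-export it to downstream clients with diod
diod -f -n -S -l 0.0.0.0:9999 -e /srv/tree
```

Both commands need root (mount) and a reachable upstream server, so treat this as a shape to adapt rather than something to paste verbatim.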
The old saying is "reliability, performance, concurrency: pick two" for a reason. Uncached mode provides reliability and concurrency; cached mode provides performance and one other (your choice which). Even with caching, multiple users of the same mount on a client are fine; the potential conflict is that if multiple client systems mount the same directory from a server and modify the same files under it, a client's cache won't be notified of updates from other clients before naturally expiring. The client pulls data from the server; the server cannot asynchronously push unrequested updates to the client. In 9p the server only responds to client requests, it never initiates transactions.

SECURITY
========

Security is implemented as a wrapper around the serial protocol; once the connection is established, v9fs supplies a username or UID (-o uname=) to the server and operations succeed or fail based on that user's credentials. Diod's diodmount uses trans=fd under the covers, allowing the mount and server programs to verify credentials first, then hand the connection off to the kernel to mount the directory. QEMU's virtfs does not implement any login mechanism because virtio data isn't visible from userspace, mounting a directory requires root access, and the client OS is the only user of the virtfs server.

You can use ssh port forwarding to encrypt the protocol. To forward the remote system's port 564 to localhost's port 9999:

  ssh -L 9999:localhost:564 user@server sleep 999999999 &

Then mount from 127.0.0.1:9999.

USING AUTOMOUNT TO RECONNECT
============================

If a Samba server crashes and is restarted, clients will automatically reconnect. The v9fs driver doesn't do this, in part because the driver is not involved in any security aspects of the login process, so it doesn't have enough information to restart the session. In theory you can use automount to do this. I have no idea how.
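The ssh port-forwarding approach from the SECURITY section, written out end to end as one sketch. The user@server login and the /mnt mount point are placeholders, and 9p2000.L is assumed:

```shell
# forward the remote system's 9p port (564) to local port 9999;
# the long sleep just keeps the tunnel process alive in the background
ssh -L 9999:localhost:564 user@server sleep 999999999 &

# mount through the encrypted tunnel (requires root on the client)
mount -t 9p -o port=9999,version=9p2000.L 127.0.0.1 /mnt
```

When you're done, unmount and kill the backgrounded ssh process to tear the tunnel down.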
USAGE
=====

For a remote file server:

  mount -t 9p 10.10.1.2 /mnt/9

For Plan 9 From User Space applications (http://swtch.com/plan9):

  mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER

For a server running on a QEMU host with virtio transport:

  mount -t 9p -o trans=virtio <mount_tag> /mnt/9

where mount_tag is the tag associated by the server to each of the exported mount points. Each 9P export is seen by the client as a virtio device with an associated "mount_tag" property. Available mount tags can be seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.

OPTIONS
=======

  trans=name    select an alternative transport. Valid options are currently:
                  unix   - specifying a named pipe mount point
                  tcp    - specifying a normal TCP/IP connection
                  fd     - use passed file descriptors for connection
                           (see rfdno and wfdno)
                  virtio - connect to the next virtio channel available
                           (from QEMU with trans_virtio module)
                  rdma   - connect to a specified RDMA channel

  uname=name    user name to attempt mount as on the remote server. The
                server may override or ignore this value. Certain user
                names may require authentication.

  aname=name    aname specifies the file tree to access when the server is
                offering several exported file systems.

  cache=mode    specifies a caching policy. By default, no caches are used.
                  loose   = no attempts are made at consistency, intended
                            for exclusive, read-only mounts
                  fscache = use FS-Cache for a persistent, read-only cache
                            backend

  debug=n       specifies debug level. The debug level is a bitmask.
                  0x01  = display verbose error messages
                  0x02  = developer debug (DEBUG_CURRENT)
                  0x04  = display 9p trace
                  0x08  = display VFS trace
                  0x10  = display marshalling debug
                  0x20  = display RPC debug
                  0x40  = display transport debug
                  0x80  = display allocation debug
                  0x100 = display protocol message debug
                  0x200 = display Fid debug
                  0x400 = display packet debug
                  0x800 = display fscache tracing debug

  rfdno=n       the file descriptor for reading with trans=fd

  wfdno=n       the file descriptor for writing with trans=fd

  msize=n       the number of bytes to use for 9p packet payload

  port=n        port to connect to on the remote server

  noextend      force legacy mode (no 9p2000.u or 9p2000.L semantics)

  version=name  select 9P protocol version. Valid options are:
                  9p2000   - legacy mode (same as noextend)
                  9p2000.u - use 9P2000.u protocol
                  9p2000.L - use 9P2000.L protocol

  dfltuid       attempt to mount as a particular uid

  dfltgid       attempt to mount with a particular gid

  afid          security channel - used by Plan 9 authentication protocols

  nodevmap      do not map special files - represent them as normal files.
                This can be used to share devices/named pipes/sockets
                between hosts. This functionality will be expanded in later
                versions.

  access        there are four access modes.
                  user   = if a user tries to access a file on a v9fs
                           filesystem for the first time, v9fs sends an
                           attach command (Tattach) for that user. This is
                           the default mode.
                  <uid>  = allows only the user with uid=<uid> to access
                           the files on the mounted filesystem
                  any    = v9fs does a single attach and performs all
                           operations as one user
                  client = ACL based access check on the 9p client side
                           for access validation

  cachetag      cache tag to use the specified persistent cache. Cache tags
                for existing cache sessions can be listed at
                /sys/fs/9p/caches. (applies only to cache=fscache)

RESOURCES
=========

This software was originally developed by Ron Minnich and Maya Gokhale. Additional development by Greg Watson and most recently Eric Van Hensbergen, Latchesar Ionkov and Russ Cox.
Protocol specifications are maintained on github:

  http://ericvh.github.com/9p-rfc/

9p client and server implementations are listed on:

  http://9p.cat-v.org/implementations

A 9p2000.L server is being developed by LLNL and can be found at:

  http://code.google.com/p/diod/

There are user and developer mailing lists available through the v9fs project on sourceforge (http://sourceforge.net/projects/v9fs). News and other information is maintained on a wiki (http://sf.net/apps/mediawiki/v9fs/index.php). Bug reports may be issued through the kernel.org bugzilla (http://bugzilla.kernel.org).

For more information on the Plan 9 Operating System check out:

  http://plan9.bell-labs.com/plan9

For information on Plan 9 from User Space (Plan 9 applications and libraries ported to Linux/BSD/OSX/etc) check out:

  http://swtch.com/plan9

FURTHER READING:
================

  * XCPU & Clustering
    http://xcpu.org/papers/xcpu-talk.pdf

  * KVMFS: control file system for KVM
    http://xcpu.org/papers/kvmfs.pdf

  * CellFS: A New Programming Model for the Cell BE
    http://xcpu.org/papers/cellfs-talk.pdf

  * PROSE I/O: Using 9p to enable Application Partitions
    http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf

  * VirtFS: A Virtualization Aware File System pass-through
    http://goo.gl/3WPDg

STATUS
======

9p2000.L is feature complete as of 2.6.38.

PLEASE USE THE KERNEL BUGZILLA TO REPORT PROBLEMS. (http://bugzilla.kernel.org)