增长的分类
今天看到一篇文章,来自“A study of Linux File System Evolution”, pdf文件在这里: https://www.usenix.org/system/files/login/articles/03_lu_010-017_final.pdf
他们将Linux的补丁进行了分类,筛选出针对文件系统的各种措施。
分类如下:
- bug fixes
- memory: incorrect handling of memory objects
- concurrency: incorrect concurrent behaviors
- semantic: incorrect design or implementation
- error code: missing or wrong error code handling
- performance improvements
- reliability enhancements
- new features
- maintenance and refactoring
针对的文件系统包括:ext3,JFS,ReiserFS,XFS,Btrfs
这个研究做的真的很厉害,先不说他们需要采集这些Patch,然后读它,还要分类,还要看看到底做了什么,这时间成本就不是一点点。再有,他们是针对不同的人开发的文件系统的研究,这样涵盖了这些做这些人的通病的问题,或者高概率出现问题的地方,通过对这些数据的了解和研究,可以为后来的人带来很多的经验,避免重复犯错。
补:文章最后说了这个项目耗时1年半。
文件系统分成几个逻辑组成部分,包括inodes,superblocks,journals.
更细致的划分为9个组件:
- data block allocation(balloc)
- directory management(dir)
- extent mapping(extent)
- file read and write operations(file)
- inode metadata(inode)
- transactional support(trans)
- superblock metadata(super)
- generic tree procedures(e.g:insert an entry, tree)
- other supporting components
关于性能问题的技术分类:
- inefficient usage of synchronization methods(sync)
- smarter access strategies (access)
- I/O scheduling improvement(sched)
- scale on-disk and in-memory data structures(scale)
- data block allocation optimization(locality)
- other performance techniques(other)
sync problem is more than a quarter of all performance patches across file systems. typical solutions used include removing a pair of unnecessary locks, using finer-grained locking, and replacing write locks with read/write locks.
access patches use smart strategies to optimize performance, including caching and work avoidance. for example, ext3 caches metadata stats in memory, avoiding I/O. an example Btrfs, before searching for free blocks, the patch first checks whether there is enough free space, avoiding unnecessary work.
sched, improve I/O scheduling for better performance, such as batching of writes, opportunistic readahead, and avoiding unnecessary synchrony in I/O. sched patches utilize scalable on-disk and in-memory data structures, such as hash tables, trees, and per block-group structures.