增长的分类

今天看到一篇文章,来自“A study of Linux File System Evolution”, pdf文件在这里: https://www.usenix.org/system/files/login/articles/03_lu_010-017_final.pdf

他们将Linux的补丁进行了分类,筛选出针对文件系统的各种措施。

分类如下:

针对的文件系统包括:ext3,JFS,ReiserFS,XFS,Btrfs

这个研究做的真的很厉害,先不说他们需要采集这些Patch,然后读它,还要分类,还要看看到底做了什么,这时间成本就不是一点点。再有,他们是针对不同的人开发的文件系统的研究,这样涵盖了这些做这些人的通病的问题,或者高概率出现问题的地方,通过对这些数据的了解和研究,可以为后来的人带来很多的经验,避免重复犯错。

补:文章最后说了这个项目耗时1年半。

文件系统分成几个逻辑组成部分,包括inodes,superblocks,journals.

更细致的划分为9个组件:

  1. data block allocation(balloc)
  2. directory management(dir)
  3. extent mapping(extent)
  4. file read and write operations(file)
  5. inode metadata(inode)
  6. transactional support(trans)
  7. superblock metadata(super)
  8. generic tree procedures(e.g:insert an entry, tree)
  9. other supporting components

关于性能问题的技术分类:

  1. inefficient usage of synchronization methods(sync)
  2. smarter access strategies (access)
  3. I/O scheduling improvement(sched)
  4. scale on-disk and in-memory data structures(scale)
  5. data block allocation optimization(locality)
  6. other performance techniques(other)

sync problem is more than a quarter of all performance patches across file systems. typical solutions used include removing a pair of unnecessary locks, using finer-grained locking, and replacing write locks with read/write locks.

access patches use smart strategies to optimize performance, including caching and work avoidance. for example, ext3 caches metadata stats in memory, avoiding I/O. an example Btrfs, before searching for free blocks, the patch first checks whether there is enough free space, avoiding unnecessary work.

sched, improve I/O scheduling for better performance, such as batching of writes, opportunistic readahead, and avoiding unnecessary synchrony in I/O. sched patches utilize scalable on-disk and in-memory data structures, such as hash tables, trees, and per block-group structures.

| 访问量:
Table of Contents