432 字
2 分钟
增长的分类
2024-03-27

今天看到一篇文章,来自“A study of Linux File System Evolution”, pdf文件在这里: https://www.usenix.org/system/files/login/articles/03_lu_010-017_final.pdf

他们将Linux的补丁进行了分类,筛选出针对文件系统的各种措施。

分类如下:

  • bug fixes
    • memory: incorrect handling of memory objects
    • concurrency: incorrect concurrent behaviors
    • semantic: incorrect design or implementation
    • error code: missing or wrong error code handling
  • performance improvements
  • reliability enhancements
  • new features
  • maintenance and refactoring

针对的文件系统包括:ext3,JFS,ReiserFS,XFS,Btrfs

这个研究做的真的很厉害,先不说他们需要采集这些Patch,然后读它,还要分类,还要看看到底做了什么,这时间成本就不是一点点。再有,他们是针对不同的人开发的文件系统的研究,这样涵盖了这些做这些人的通病的问题,或者高概率出现问题的地方,通过对这些数据的了解和研究,可以为后来的人带来很多的经验,避免重复犯错。

补:文章最后说了这个项目耗时1年半。

文件系统分成几个逻辑组成部分,包括inodes,superblocks,journals.

更细致的划分为9个组件:

  1. data block allocation(balloc)
  2. directory management(dir)
  3. extent mapping(extent)
  4. file read and write operations(file)
  5. inode metadata(inode)
  6. transactional support(trans)
  7. superblock metadata(super)
  8. generic tree procedures(e.g an entry, tree)
  9. other supporting components

关于性能问题的技术分类:

  1. inefficient usage of synchronization methods(sync)
  2. smarter access strategies (access)
  3. I/O scheduling improvement(sched)
  4. scale on-disk and in-memory data structures(scale)
  5. data block allocation optimization(locality)
  6. other performance techniques(other)

sync problem is more than a quarter of all performance patches across file systems. typical solutions used include removing a pair of unnecessary locks, using finer-grained locking, and replacing write locks with read/write locks.

access patches use smart strategies to optimize performance, including caching and work avoidance. for example, ext3 caches metadata stats in memory, avoiding I/O. an example Btrfs, before searching for free blocks, the patch first checks whether there is enough free space, avoiding unnecessary work.

sched, improve I/O scheduling for better performance, such as batching of writes, opportunistic readahead, and avoiding unnecessary synchrony in I/O. sched patches utilize scalable on-disk and in-memory data structures, such as hash tables, trees, and per block-group structures.

增长的分类
https://dididudu998.github.io/posts/增长的分类/
作者
滴滴嘟嘟
发布于
2024-03-27
许可协议
CC BY-NC-SA 4.0