2024 March 27 学习

增长的分类

今天看到一篇文章，来自“A study of Linux File System Evolution”, pdf文件在这里： https://www.usenix.org/system/files/login/articles/03_lu_010-017_final.pdf

他们将Linux的补丁进行了分类，筛选出针对文件系统的各种措施。

分类如下：

bug fixes
- memory: incorrect handling of memory objects
- concurrency: incorrect concurrent behaviors
- semantic: incorrect design or implementation
- error code: missing or wrong error code handling
performance improvements
reliability enhancements
new features
maintenance and refactoring

针对的文件系统包括：ext3,JFS,ReiserFS,XFS,Btrfs

这个研究做的真的很厉害，先不说他们需要采集这些Patch，然后读它，还要分类，还要看看到底做了什么，这时间成本就不是一点点。再有，他们是针对不同的人开发的文件系统的研究，这样涵盖了这些做这些人的通病的问题，或者高概率出现问题的地方，通过对这些数据的了解和研究，可以为后来的人带来很多的经验，避免重复犯错。

补：文章最后说了这个项目耗时1年半。

文件系统分成几个逻辑组成部分，包括inodes,superblocks,journals.

更细致的划分为9个组件:

data block allocation(balloc)
directory management(dir)
extent mapping(extent)
file read and write operations(file)
inode metadata(inode)
transactional support(trans)
superblock metadata(super)
generic tree procedures(e.g:insert an entry, tree)
other supporting components

关于性能问题的技术分类：

inefficient usage of synchronization methods(sync)
smarter access strategies (access)
I/O scheduling improvement(sched)
scale on-disk and in-memory data structures(scale)
data block allocation optimization(locality)
other performance techniques(other)

sync problem is more than a quarter of all performance patches across file systems. typical solutions used include removing a pair of unnecessary locks, using finer-grained locking, and replacing write locks with read/write locks.

access patches use smart strategies to optimize performance, including caching and work avoidance. for example, ext3 caches metadata stats in memory, avoiding I/O. an example Btrfs, before searching for free blocks, the patch first checks whether there is enough free space, avoiding unnecessary work.

sched, improve I/O scheduling for better performance, such as batching of writes, opportunistic readahead, and avoiding unnecessary synchrony in I/O. sched patches utilize scalable on-disk and in-memory data structures, such as hash tables, trees, and per block-group structures.