
Linux-Lab7-File system drivers

File system drivers

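Lab objectives:

  • Gain knowledge about the Virtual File System (VFS) in Linux and understand the concepts of "inode", "dentry", "file", superblock and data block
  • Understand the process of mounting a file system inside the VFS
  • Learn about various file system types and understand the difference between file systems with physical support (on disk) and those without physical support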

Virtual File System (VFS)

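The Virtual File System (VFS) is the kernel component that handles all system calls related to files and file systems.

  • The VFS is a generic interface between the user and a particular file system
  • This abstraction simplifies the implementation of file systems and makes it easy to integrate multiple file systems
  • A file system is implemented by using the API provided by the VFS; the generic parts, such as communication with the hardware and the I/O subsystem, are handled by the VFS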

From a functional point of view, file systems can be grouped into:

  • disk file systems (ext3, ext4, xfs, fat, ntfs, ...)
  • network file systems (nfs, smbfs/cifs, ncp, ...)
  • virtual file systems (procfs, sysfs, sockfs, pipefs, ...)

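The Linux kernel uses the VFS to manage the hierarchy (tree) of directories and files; a new file system is added as a VFS subtree through the mount operation.

The VFS can use a regular file as a virtual block device, so a disk file system can be mounted on top of a regular file. In this way, stacks of file systems can be built.

The basic idea of the VFS is to provide a single file model that can represent files from any file system. Each file system driver is responsible for reducing its own files to this common denominator. This lets the kernel build a single directory structure covering the whole system: one file system acts as the root, and the others are mounted on its various directories.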

The general file system model

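The general file system model, to which every implemented file system must be reduced, consists of several well-defined entities:

  • superblock
    • The superblock stores the information needed for a mounted file system:
      • inode and block locations
      • file system block size
      • maximum filename length
      • maximum file size
      • the location of the root inode
    • For disk file systems, the superblock has a counterpart in the first block of the disk (the file system control block)
  • inode
    • Keeps information about a file in the general sense: regular file, directory, special file (pipe, fifo), block device, character device, link, or anything else that can be abstracted as a file
    • An inode stores information such as:
      • file type
      • file size
      • access rights
      • access or modification times
      • location of the data on disk (pointers to the disk blocks that contain the data)
    • Like the superblock, each inode has an on-disk counterpart. On-disk inodes are usually grouped in a dedicated region (the inode area, separate from the data block area); in some file systems the inode equivalents are spread across the file system structures (FAT)
  • file
    • The file is the component of the file system model closest to the user; it exists only as a VFS entity in memory and has no physical counterpart on disk
    • A file object represents a file opened by a process and keeps information such as:
      • the file cursor position
      • the permissions the file was opened with
      • a pointer to the associated inode (ultimately, its index)
  • dentry
    • A dentry (directory entry) associates an inode with a file name
    • Generally, a dentry contains two fields:
      • an integer that identifies the inode
      • a string that represents its name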

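These entities are file system metadata (they contain information about data or about other metadata). The two that most need to be kept apart are the inode and the file.

From the file system's point of view, an inode represents a file:

  • the attributes of an inode are the size, rights and times associated with the file
  • an inode uniquely identifies a file within a file system

From the user's point of view, a file represents a file:

  • the attributes of a file are the inode, the file name, the file opening attributes and the file position
  • every open file has a struct file associated with it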
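The inode/file distinction can be observed from user space. The sketch below uses two helpers written for this note (`make_temp_file` and `same_inode_independent_pos` are hypothetical, not part of the lab): it opens the same path twice, so each open() creates its own struct file with its own position, while fstat() reports the same inode number for both descriptors.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file holding `data`; mkstemp() writes the path into tmpl. */
static int make_temp_file(char *tmpl, const char *data, size_t len)
{
    int fd = mkstemp(tmpl);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Open `path` twice (two struct file objects) and check that both refer
 * to the same inode while keeping separate file positions.
 * The file must contain at least one byte.
 * Returns 1 if so, 0 if not, -1 on error.
 */
static int same_inode_independent_pos(const char *path)
{
    struct stat st1, st2;
    char c;
    int ret = -1;
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);

    if (fd1 < 0 || fd2 < 0)
        goto out;
    if (fstat(fd1, &st1) < 0 || fstat(fd2, &st2) < 0)
        goto out;

    /* Advance only the first open file by one byte. */
    if (read(fd1, &c, 1) != 1)
        goto out;

    ret = st1.st_dev == st2.st_dev &&     /* same file system ...       */
          st1.st_ino == st2.st_ino &&     /* ... and same inode         */
          lseek(fd1, 0, SEEK_CUR) == 1 && /* first file's f_pos moved   */
          lseek(fd2, 0, SEEK_CUR) == 0;   /* second file's f_pos did not */
out:
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return ret;
}
```

In kernel terms, the two descriptors point to two struct file objects whose f_inode fields are the same struct inode.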

Register and unregister filesystems

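A single system is unlikely to use more than 5-6 file systems.

For this reason, file systems (or, more precisely, file system types) are implemented as modules and can be loaded or unloaded at any time.

  • The structure that describes a particular file system is struct file_system_type: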
struct file_system_type {
const char *name;
int fs_flags;
#define FS_REQUIRES_DEV 1
#define FS_BINARY_MOUNTDATA 2
#define FS_HAS_SUBTYPE 4
#define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */
#define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */
#define FS_THP_SUPPORT 8192 /* Remove once all fs converted */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
int (*init_fs_context)(struct fs_context *);
const struct fs_parameter_spec *parameters;
struct dentry *(*mount) (struct file_system_type *, int,
const char *, void *);
void (*kill_sb) (struct super_block *);
struct module *owner;
struct file_system_type * next;
struct hlist_head fs_supers;

struct lock_class_key s_lock_key;
struct lock_class_key s_umount_key;
struct lock_class_key s_vfs_rename_key;
struct lock_class_key s_writers_key[SB_FREEZE_LEVELS];

struct lock_class_key i_lock_key;
struct lock_class_key i_mutex_key;
struct lock_class_key i_mutex_dir_key;
};

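To be able to load/unload file system modules dynamically, an API for registering and unregistering file systems is needed.

A file system is registered with the kernel, usually in the module's initialization function. To register a file system:

  • fill in a struct file_system_type (at least name, mount, kill_sb and fs_flags; a module should also set owner = THIS_MODULE)
  • call the register_filesystem function

When the module is unloaded, the file system must be unregistered by calling unregister_filesystem.

An example of registering a file system (from ramfs):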

static struct file_system_type ramfs_fs_type = {
.name = "ramfs",
.mount = ramfs_mount,
.kill_sb = ramfs_kill_sb,
.fs_flags = FS_USERNS_MOUNT,
};

static int __init init_ramfs_fs(void)
{
static unsigned long once; /* guard so the type is registered only once */

if (test_and_set_bit(0, &once))
return 0;
return register_filesystem(&ramfs_fs_type);
}
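To make the registration rules concrete, here is a userspace model (not kernel code) of what register_filesystem and unregister_filesystem do with the `next` field of struct file_system_type: registered types form a singly linked list searched by name. The names `model_fs_type`, `find_fs`, `model_register` and `model_unregister` are hypothetical; the error codes mirror the kernel's (-EBUSY for a duplicate, -EINVAL for an unknown type), and the real kernel additionally protects the list with a lock (file_systems_lock).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Cut-down, hypothetical stand-in for struct file_system_type. */
struct model_fs_type {
    const char *name;
    struct model_fs_type *next;
};

/* Head of the list of registered file system types. */
static struct model_fs_type *file_systems;

/* Return the link that points at the type called `name`,
 * or the terminating NULL link if no such type is registered. */
static struct model_fs_type **find_fs(const char *name)
{
    struct model_fs_type **p;

    for (p = &file_systems; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

static int model_register(struct model_fs_type *fs)
{
    struct model_fs_type **p;

    if (fs->next)          /* already linked into the list */
        return -EBUSY;
    p = find_fs(fs->name);
    if (*p)                /* a type with this name already exists */
        return -EBUSY;
    *p = fs;               /* append at the tail */
    return 0;
}

static int model_unregister(struct model_fs_type *fs)
{
    struct model_fs_type **p;

    for (p = &file_systems; *p; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next; /* unlink it */
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}
```

This is also why unregister_filesystem must be called in the module's exit function: otherwise the list would keep pointing into unloaded module memory.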

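When a file system is mounted, the kernel calls file_system_type->mount, which performs a set of initializations and returns the dentry that represents the mount point directory. The simplest approach is to call one of the following helpers from mount: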

struct dentry *mount_bdev(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data,
int (*fill_super)(struct super_block *, void *, int)); /* mounts a file system stored on a block device */

struct dentry *mount_single(struct file_system_type *fs_type,
int flags, void *data,
int (*fill_super)(struct super_block *, void *, int)); /* mounts a file system whose single instance is shared by all mount operations */

struct dentry *mount_nodev(struct file_system_type *fs_type,
int flags, void *data,
int (*fill_super)(struct super_block *, void *, int)); /* mounts a file system that is not on a physical device */

struct dentry *mount_pseudo(struct file_system_type *fs_type, char *name,
const struct super_operations *ops,
const struct dentry_operations *dops, unsigned long magic); /* helper for pseudo file systems (e.g. sockfs, pipefs; generally file systems that cannot be mounted) */
  • The first three helpers take a pointer to a fill_super function, which is called after the superblock has been allocated so that the driver can finish initializing it
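The practical difference between mount_nodev and mount_single is whether each mount gets its own superblock. The toy model below (hypothetical types `toy_fs_type` and `toy_sb`; in the kernel both helpers are built on sget()) only reproduces that observable behaviour: every nodev mount allocates a new superblock, while single mounts reuse the first one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down superblock and file system type. */
struct toy_sb {
    int id;
};

struct toy_fs_type {
    struct toy_sb *single_sb; /* the shared instance used by "single" mounts */
    int next_id;
};

/* Like mount_nodev(): every mount gets a brand-new superblock. */
static struct toy_sb *toy_mount_nodev(struct toy_fs_type *t)
{
    struct toy_sb *sb = malloc(sizeof(*sb));

    if (sb)
        sb->id = ++t->next_id;
    return sb;
}

/* Like mount_single(): the first mount creates the superblock,
 * every later mount reuses it. */
static struct toy_sb *toy_mount_single(struct toy_fs_type *t)
{
    if (!t->single_sb)
        t->single_sb = toy_mount_nodev(t);
    return t->single_sb;
}
```

For myfs this means that two `mount -t myfs` calls give two independent trees, because it uses mount_nodev.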

When a file system is unmounted, the kernel calls kill_sb, which performs cleanup operations and calls one of the following functions:

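void kill_block_super(struct super_block *sb); /* unmounts a file system on a block device */
void kill_anon_super(struct super_block *sb); /* unmounts a virtual file system (no backing device) */
void kill_litter_super(struct super_block *sb); /* unmounts a file system not on a physical device whose dentries stay pinned in memory (e.g. ramfs): drops them, then calls kill_anon_super */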

Superblock in VFS

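The superblock exists both as a physical entity (on disk) and as a VFS entity (struct super_block). It contains only metadata and is used to read and write metadata (inodes, directory entries) from and to the disk.

Superblock operations are described by the following structure: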

struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb); /* allocates the resources associated with an inode */
void (*destroy_inode)(struct inode *); /* releases the resources associated with an inode */
void (*free_inode)(struct inode *);

void (*dirty_inode) (struct inode *, int flags);
int (*write_inode) (struct inode *, struct writeback_control *wbc); /* writes an inode back to disk */
int (*drop_inode) (struct inode *);
void (*evict_inode) (struct inode *);
void (*put_super) (struct super_block *); /* called when the superblock is released at umount */
int (*sync_fs)(struct super_block *sb, int wait);
int (*freeze_super) (struct super_block *);
int (*freeze_fs) (struct super_block *);
int (*thaw_super) (struct super_block *);
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *); /* called for the statfs system call */
int (*remount_fs) (struct super_block *, int *, char *); /* called when the kernel detects a remount attempt */
void (*umount_begin) (struct super_block *);

int (*show_options)(struct seq_file *, struct dentry *);
int (*show_devname)(struct seq_file *, struct dentry *);
int (*show_path)(struct seq_file *, struct dentry *);
int (*show_stats)(struct seq_file *, struct dentry *);
#ifdef CONFIG_QUOTA
ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
struct dquot **(*get_dquots)(struct inode *);
#endif
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
long (*nr_cached_objects)(struct super_block *,
struct shrink_control *);
long (*free_cached_objects)(struct super_block *,
struct shrink_control *);
};

Some important functions used together with super_operations, for example to read the on-disk superblock through the buffer cache:

struct buffer_head *__bread(struct block_device *bdev, sector_t block, unsigned size); /* reads the block with number block and size size from bdev; returns a pointer to its buffer_head on success, NULL otherwise */
struct buffer_head *sb_bread(struct super_block *sb, sector_t block); /* same as __bread, but the block size and the device are taken from the superblock */
void mark_buffer_dirty(struct buffer_head *bh); /* marks the buffer as dirty so it will be written back */
void brelse(struct buffer_head *bh); /* releases (drops the reference to) the buffer */
void map_bh(struct buffer_head *bh, struct super_block *sb, sector_t block); /* associates the buffer head with the corresponding sector */
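sb_bread(sb, n) reads block n, i.e. the s_blocksize bytes starting at byte offset n * s_blocksize on the device. A userspace approximation over a disk image file can use pread(). Everything here is an assumption for illustration: `read_block` and `block_has_magic` are hypothetical helpers, the "magic in the first 4 bytes, host byte order" layout is invented, and unlike the buffer cache nothing is cached.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Userspace stand-in for sb_bread(): read block `block` of size
 * `blocksize` from the image open on `fd`. Returns a malloc'ed buffer
 * (the caller frees it, much as brelse() releases a buffer_head) or
 * NULL if the block could not be read completely.
 */
static void *read_block(int fd, unsigned long block, size_t blocksize)
{
    void *buf = malloc(blocksize);
    off_t off = (off_t)block * (off_t)blocksize; /* byte offset of the block */

    if (!buf)
        return NULL;
    if (pread(fd, buf, blocksize, off) != (ssize_t)blocksize) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical check: does the block start with the 4-byte magic? */
static int block_has_magic(const void *block, uint32_t magic)
{
    uint32_t m;

    memcpy(&m, block, sizeof(m)); /* memcpy avoids an unaligned access */
    return m == magic;
}
```

A fill_super for a disk file system typically does the equivalent of read_block(fd, 0, ...) followed by a magic check before trusting anything else on the device.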

An example of filling in a superblock (from ramfs):

#include <linux/pagemap.h>

#define RAMFS_MAGIC 0x858458f6

static const struct super_operations ramfs_ops = {
.statfs = simple_statfs,
.drop_inode = generic_delete_inode,
.show_options = ramfs_show_options,
};

static int ramfs_fill_super(struct super_block *sb, void *data, int silent)
{
struct ramfs_fs_info *fsi;
struct inode *inode;
int err;

save_mount_options(sb, data);

fsi = kzalloc(sizeof(struct ramfs_fs_info), GFP_KERNEL);
sb->s_fs_info = fsi;
if (!fsi)
return -ENOMEM;

err = ramfs_parse_options(data, &fsi->mount_opts);
if (err)
return err;

sb->s_maxbytes = MAX_LFS_FILESIZE;
sb->s_blocksize = PAGE_SIZE;
sb->s_blocksize_bits = PAGE_SHIFT;
sb->s_magic = RAMFS_MAGIC;
sb->s_op = &ramfs_ops;
sb->s_time_gran = 1;

inode = ramfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
sb->s_root = d_make_root(inode);
if (!sb->s_root)
return -ENOMEM;

return 0;
}
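  • The kernel provides generic functions that implement common operations on file system structures
  • For example, generic_delete_inode and simple_statfs used in the code above (their names usually start with generic_ or simple_)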
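simple_statfs copies sb->s_magic into the f_type field that user space sees through statfs(2). The Linux-specific sketch below (`fs_magic` is a hypothetical helper) reads that value for any path; on a mounted myfs, which uses simple_statfs and sets s_magic to MYFS_MAGIC, it should report 0xbeefcafe.

```c
#include <assert.h>
#include <sys/vfs.h> /* statfs(), Linux-specific */

/*
 * Store the magic number (f_type) of the file system containing `path`
 * into *magic. Returns 0 on success, -1 on error.
 */
static int fs_magic(const char *path, unsigned long *magic)
{
    struct statfs st;

    if (statfs(path, &st) != 0)
        return -1;
    *magic = (unsigned long)st.f_type;
    return 0;
}
```

For example, inside the VM one could print the value with printf("%#lx\n", m) after fs_magic("/mnt/myfs", &m), where the mount point is an assumption.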

Buffer cache

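The buffer cache is a kernel subsystem that handles caching (reading and writing) of blocks from block devices. The basic entity used by the buffer cache is struct buffer_head: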

struct buffer_head {
unsigned long b_state; /* buffer state */
struct buffer_head *b_this_page;/* circular list of page's buffers */
struct page *b_page; /* the page this bh is mapped to */

sector_t b_blocknr; /* number of the block on the device that was loaded or must be written back to disk */
size_t b_size; /* buffer size */
char *b_data; /* pointer to the memory area the block is read into / written from (the buffer body) */

struct block_device *b_bdev; /* the block device */
bh_end_io_t *b_end_io; /* I/O completion */
void *b_private; /* reserved for b_end_io */
struct list_head b_assoc_buffers; /* associated with another mapping */
struct address_space *b_assoc_map; /* mapping this buffer is
associated with */
atomic_t b_count; /* users using this buffer_head */
spinlock_t b_uptodate_lock; /* Used by the first bh in a page, to
* serialise IO completion of other
* buffers in the page */
};

Useful functions and macros:

unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size); /* finds the first zero bit in a memory area (size is the number of bits to search) */
int test_and_set_bit(int nr, unsigned long *addr); /* sets a bit and returns its old value */
int test_and_clear_bit(int nr, unsigned long *addr); /* clears a bit and returns its old value */
int test_and_change_bit(unsigned int nr, volatile unsigned long *p); /* flips a bit and returns its old value */

#define S_ISDIR(mode) (((mode) & S_IFMT) == S_IFDIR) // is the inode a directory?
#define S_ISCHR(mode) (((mode) & S_IFMT) == S_IFCHR) // is the inode a character device?
#define S_ISBLK(mode) (((mode) & S_IFMT) == S_IFBLK) // is the inode a block device?
#define S_ISREG(mode) (((mode) & S_IFMT) == S_IFREG) // is the inode a regular file?
#define S_ISFIFO(mode) (((mode) & S_IFMT) == S_IFIFO) // is the inode a FIFO?
#define S_ISLNK(mode) (((mode) & S_IFMT) == S_IFLNK) // is the inode a (symbolic) link?
#define S_ISSOCK(mode) (((mode) & S_IFMT) == S_IFSOCK) // is the inode a socket?
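The file type is a multi-bit field selected by S_IFMT, and some type codes share bits: S_IFBLK (0060000) contains the S_IFDIR bit (0040000), so a test like `(mode & S_IFDIR) == S_IFDIR` would wrongly classify block devices as directories. The sketch below uses the standard <sys/stat.h> macros on stat results; `file_type_char` and `path_type_char` are hypothetical helpers that mimic the first column of `ls -l`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>

/* Return the ls -l style type character for a mode, or '?' if unknown. */
static char file_type_char(mode_t mode)
{
    if (S_ISREG(mode))  return '-';
    if (S_ISDIR(mode))  return 'd';
    if (S_ISCHR(mode))  return 'c';
    if (S_ISBLK(mode))  return 'b';
    if (S_ISFIFO(mode)) return 'p';
    if (S_ISLNK(mode))  return 'l';
    if (S_ISSOCK(mode)) return 's';
    return '?';
}

/* Type character of the file at `path` (symlinks not followed), 0 on error. */
static char path_type_char(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return 0;
    return file_type_char(st.st_mode);
}
```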

The Inode Structure

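The inode is an essential part of a UNIX file system and, at the same time, an essential part of the VFS.

An inode is metadata (it holds information about information):

  • an inode uniquely identifies a file on disk and holds information about it (uid, gid, access rights, access times, pointers to data blocks, etc.)
  • an inode refers to a file on disk; any number of file structures can be associated with one inode (several processes can open the same file, or one process can open the same file several times)
  • like the other VFS structures, it is a generic structure that covers the options of all supported file types, including file systems that have no on-disk inode entity (such as FAT)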
struct inode {
umode_t i_mode; /* file type and access rights */
unsigned short i_opflags;
kuid_t i_uid; /* owner uid */
kgid_t i_gid; /* owner gid */
unsigned int i_flags;

#ifdef CONFIG_FS_POSIX_ACL
struct posix_acl *i_acl;
struct posix_acl *i_default_acl;
#endif

const struct inode_operations *i_op; /* callbacks that operate on the inode */
struct super_block *i_sb; /* superblock of the file system the inode belongs to */
struct address_space *i_mapping;

#ifdef CONFIG_SECURITY
void *i_security;
#endif

/* Stat data, not accessed from path walking */
unsigned long i_ino; /* inode number (uniquely identifies the inode within the file system) */
union {
const unsigned int i_nlink; /* link count (number of hard links); file systems without hard links keep it at 1 */
unsigned int __i_nlink;
};
dev_t i_rdev; /* for device special files: the device number the inode represents */
loff_t i_size; /* size in bytes */
struct timespec64 i_atime; /* access time: last time the file was accessed */
struct timespec64 i_mtime; /* modify time: last time the file contents were modified */
struct timespec64 i_ctime; /* change time: last time the inode status changed */
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
unsigned short i_bytes;
u8 i_blkbits; /* number of bits in the block size (block size = 1 << i_blkbits) */
u8 i_write_hint;
blkcnt_t i_blocks; /* number of 512-byte blocks used by the file */

#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
#endif

/* Misc */
unsigned long i_state;
struct rw_semaphore i_rwsem;

unsigned long dirtied_when; /* jiffies of first dirtying */
unsigned long dirtied_time_when;

struct hlist_node i_hash;
struct list_head i_io_list; /* backing dev IO list */
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *i_wb; /* the associated cgroup wb */

/* foreign inode detection, see wbc_detach_inode() */
int i_wb_frn_winner;
u16 i_wb_frn_avg_time;
u16 i_wb_frn_history;
#endif
struct list_head i_lru; /* inode LRU list */
struct list_head i_sb_list;
struct list_head i_wb_list; /* backing dev writeback list */
union {
struct hlist_head i_dentry;
struct rcu_head i_rcu;
};
atomic64_t i_version;
atomic_t i_count; /* usage counter (how many kernel components are using the inode) */
atomic_t i_dio_count;
atomic_t i_writecount;
#ifdef CONFIG_IMA
atomic_t i_readcount; /* struct files open RO */
#endif
const struct file_operations *i_fop; /* file operations, copied into file->f_op when the file is opened */
struct file_lock_context *i_flctx;
struct address_space i_data;
struct list_head i_devices;
union {
struct pipe_inode_info *i_pipe;
struct block_device *i_bdev;
struct cdev *i_cdev;
char *i_link;
unsigned i_dir_seq;
};

__u32 i_generation;

#ifdef CONFIG_FSNOTIFY
__u32 i_fsnotify_mask; /* all events this inode cares about */
struct fsnotify_mark_connector __rcu *i_fsnotify_marks;
#endif

#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
struct fscrypt_info *i_crypt_info;
#endif

void *i_private; /* fs or device private pointer */
} __randomize_layout;

Related functions:

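struct inode *new_inode(struct super_block *sb); /* creates a new inode: sets i_nlink to 1 and initializes i_blkbits and i_sb, among others */
void insert_inode_hash(struct inode *inode); /* adds the inode to the inode hash table; a dirty inode is only written back to disk once it is hashed */
void mark_inode_dirty(struct inode *inode); /* marks the inode as dirty; it will be written back to disk later */
struct inode * iget_locked(struct super_block *, unsigned long); /* returns the inode with the given number from the inode cache, or allocates a new locked one (flagged I_NEW) that the file system must fill in from disk */
void unlock_new_inode(struct inode *); /* used together with iget_locked: clears I_NEW and releases the lock on the inode */
void iput(struct inode *); /* tells the kernel the work on the inode is done; if nobody else uses it, it is destroyed (written back to disk first if dirty) */
void make_bad_inode(struct inode *); /* tells the kernel the inode cannot be used */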

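Getting an inode:

  • Typically, this function calls iget_locked to get the inode structure from the VFS. If the inode is newly allocated, the file system must read the on-disk inode (using sb_bread on the block that contains it) and fill in the relevant fields
  • For example, the minix_iget function of the minix file system: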
struct inode *minix_iget(struct super_block *sb, unsigned long ino)
{
struct inode *inode;
inode = iget_locked(sb, ino);
if (!inode)
return ERR_PTR(-ENOMEM);
if (!(inode->i_state & I_NEW))
return inode;

if (INODE_VERSION(inode) == MINIX_V1)
return V1_minix_iget(inode);
...
}

static struct inode *V1_minix_iget(struct inode *inode)
{
struct buffer_head * bh;
struct minix_inode * raw_inode;
struct minix_inode_info *minix_inode = minix_i(inode);
int i;

raw_inode = minix_V1_raw_inode(inode->i_sb, inode->i_ino, &bh);
if (!raw_inode) {
iget_failed(inode);
return ERR_PTR(-EIO);
...
}
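  • minix_iget first calls iget_locked to get the inode with the given number
  • If the inode came from the cache (I_NEW not set), it is returned as is; otherwise the inode is new, so minix_iget calls V1_minix_iget, which calls minix_V1_raw_inode to read the inode from disk and then fills in the VFS inode with the information read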
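The iget_locked / I_NEW pattern used by minix_iget can be modelled in user space with a tiny fixed-size cache. Everything here is hypothetical (`model_inode`, `model_iget_locked`, `model_fs_iget`, `MODEL_I_NEW`), and the "disk read" only fills in a fake size; the point is that the file system reads from disk only when the inode is new, and clears the flag afterwards as unlock_new_inode does.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_I_NEW 1UL /* stands in for the kernel's I_NEW state bit */
#define CACHE_SIZE  16

struct model_inode {
    unsigned long ino;
    unsigned long state;
    int in_use;
    long size; /* a field the "file system" fills in from disk */
};

static struct model_inode cache[CACHE_SIZE];

/*
 * Toy iget_locked(): return the cached inode with number `ino`, or claim
 * a free slot and flag it MODEL_I_NEW so the caller knows it must fill
 * it in. Returns NULL if the cache is full.
 */
static struct model_inode *model_iget_locked(unsigned long ino)
{
    struct model_inode *free_slot = NULL;

    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (cache[i].in_use && cache[i].ino == ino)
            return &cache[i]; /* cache hit: already filled in */
        if (!cache[i].in_use && !free_slot)
            free_slot = &cache[i];
    }
    if (free_slot) {
        free_slot->in_use = 1;
        free_slot->ino = ino;
        free_slot->state = MODEL_I_NEW;
    }
    return free_slot;
}

/* The pattern of minix_iget(): fill the inode only if it is new. */
static struct model_inode *model_fs_iget(unsigned long ino, int *disk_reads)
{
    struct model_inode *inode = model_iget_locked(ino);

    if (!inode)
        return NULL;
    if (!(inode->state & MODEL_I_NEW))
        return inode; /* found in the cache, nothing to read */

    (*disk_reads)++;              /* a real fs would sb_bread() the raw inode here */
    inode->size = (long)ino * 100; /* pretend on-disk data */
    inode->state &= ~MODEL_I_NEW; /* like unlock_new_inode() */
    return inode;
}
```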

The File Structure

The file structure corresponds to a file opened by a process; it exists only in memory and is associated with an inode.

struct file {
union {
struct llist_node fu_llist; /* list of file objects */
struct rcu_head fu_rcuhead; /* RCU head used when the file is freed */
} f_u;
struct path f_path; /* path of the file (mount + dentry) */
struct inode *f_inode; /* the associated inode */
const struct file_operations *f_op; /* callbacks that operate on the file */

spinlock_t f_lock; /* spinlock protecting the file */
enum rw_hint f_write_hint;
atomic_long_t f_count;
unsigned int f_flags; /* file flags: O_RDONLY, O_NONBLOCK, O_SYNC */
fmode_t f_mode; /* read/write mode: FMODE_READ, FMODE_WRITE */
struct mutex f_pos_lock;
loff_t f_pos; /* current read/write position */
struct fown_struct f_owner;
const struct cred *f_cred;
struct file_ra_state f_ra;

u64 f_version;
#ifdef CONFIG_SECURITY
void *f_security;
#endif
/* needed for tty driver, and maybe others */
void *private_data; /* private data */

#ifdef CONFIG_EPOLL
/* Used by fs/eventpoll.c to link all the hooks to this file */
struct list_head f_ep_links;
struct list_head f_tfile_llink;
#endif /* #ifdef CONFIG_EPOLL */
struct address_space *f_mapping; /* address space (page cache mapping) of the file */
errseq_t f_wb_err;
} __randomize_layout
__attribute__((aligned(4))); /* lest something weird decides that 2 is OK */
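  • A file's operations, file->f_op, are initialized from inode->i_fop when the file is opened, so subsequent system calls use the value stored in file->f_op
  • Another interesting member of struct file is the address_space pointer f_mapping, which deserves a closer look (it is initialized from inode->i_mapping, which normally points to inode->i_data)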

Address space operations

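There is a close link between a process's address space and files:

  • program execution is done almost entirely by mapping files into the process address space (e.g. execve)
  • since this approach works well and is very general, it is also used for regular system calls such as read and write

The structure that describes an address space is struct address_space (also called the address space descriptor), and the operations on it are described by struct address_space_operations. To initialize the address space operations, inode->i_mapping->a_ops must be filled in.

struct address_space manages the mapping from an inode to the memory pages that cache its contents:

  • each file has one address_space structure
  • an address_space together with an offset identifies one page in the page cache or the swap cache
  • the members of struct address_space are: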
struct address_space {
struct inode *host; /* owning inode */
struct xarray i_pages; /* cached pages */
gfp_t gfp_mask;
atomic_t i_mmap_writable; /* number of VM_SHARED mappings */
struct rb_root_cached i_mmap; /* tree of private and shared mappings */
struct rw_semaphore i_mmap_rwsem;
unsigned long nrpages; /* total number of pages */
unsigned long nrexceptional;
pgoff_t writeback_index; /* writeback starts here */
const struct address_space_operations *a_ops; /* operations table */
unsigned long flags; /* gfp mask and error flags */
errseq_t wb_err;
spinlock_t private_lock; /* private address_space lock */
struct list_head private_list; /* private address_space list */
void *private_data; /* private data */
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
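"An address_space together with an offset identifies one page" comes down to simple arithmetic: the page index of a file offset is pos >> PAGE_SHIFT (the generic read path computes it this way), and the position inside the page is the low bits. The sketch below assumes 4 KiB pages (`MODEL_PAGE_SHIFT` = 12, as on x86-64); the helper names are hypothetical.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SHIFT / PAGE_SIZE. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Index of the page (within the file's address_space) holding byte `pos`. */
static unsigned long page_index(unsigned long long pos)
{
    return (unsigned long)(pos >> MODEL_PAGE_SHIFT);
}

/* Offset of byte `pos` inside that page. */
static unsigned long page_offset_in(unsigned long long pos)
{
    return (unsigned long)(pos & (MODEL_PAGE_SIZE - 1));
}

/* Number of pages touched by the byte range [pos, pos + len), len > 0. */
static unsigned long pages_spanned(unsigned long long pos, unsigned long long len)
{
    return page_index(pos + len - 1) - page_index(pos) + 1;
}
```

For instance, a 2-byte write at offset 4095 crosses a page boundary and touches two pages of the address_space.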

The Dentry Structure

The main job of a dentry is to link an inode to a file name. The important fields of the structure are:

struct dentry {
/* RCU lookup touched fields */
unsigned int d_flags; /* protected by d_lock */
seqcount_t d_seq; /* per dentry seqlock */
struct hlist_bl_node d_hash; /* lookup hash list */
struct dentry *d_parent; /* dentry of the parent directory */
struct qstr d_name; /* name of the dentry and its length */
struct inode *d_inode; /* the inode this dentry refers to */
unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */

/* Ref lookup also touches following */
struct lockref d_lockref; /* per-dentry lock and refcount */
const struct dentry_operations *d_op; /* callbacks that operate on the dentry */
struct super_block *d_sb; /* The root of the dentry tree */
unsigned long d_time; /* used by d_revalidate */
void *d_fsdata; /* reserved for the file system implementing the dentry operations */

union {
struct list_head d_lru; /* LRU list */
wait_queue_head_t *d_wait; /* in-lookup ones only */
};
struct list_head d_child; /* child of parent list */
struct list_head d_subdirs; /* our children */
/*
* d_alias and d_rcu can share memory
*/
union {
struct hlist_node d_alias; /* inode alias list */
struct hlist_bl_node d_in_lookup_hash; /* only for in-lookup ones */
struct rcu_head d_rcu;
} d_u;
} __randomize_layout;
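  • The kernel uses dentries to build and manage the directory tree of a file system
  • Dentries link inodes to one another (each child to its parent), which is how the directory tree is maintained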
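Because a dentry is just a (name, inode) association, one inode can have several names. A hard link created with link(2) adds a second directory entry for the same inode and raises its link count (i_nlink, seen as st_nlink); removing one name leaves the inode reachable through the other. The helpers `create_empty` and `add_second_name` are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `path` as a new empty file: one dentry pointing at a new inode. */
static int create_empty(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

    return fd < 0 ? -1 : close(fd);
}

/*
 * Add a second name (dentry) `newpath` for the inode behind `oldpath`
 * with link(2). Stores the inode's link count in *nlink and returns 1
 * if both names resolve to the same inode, 0 if not, -1 on error.
 */
static int add_second_name(const char *oldpath, const char *newpath,
                           nlink_t *nlink)
{
    struct stat a, b;

    if (link(oldpath, newpath) != 0)
        return -1;
    if (stat(oldpath, &a) != 0 || stat(newpath, &b) != 0)
        return -1;
    *nlink = a.st_nlink;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

In myfs, link and unlink requests like these are served by simple_link and simple_unlink.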

Bitmap operations

In a file system, management information (which block is free or busy, which inode is free or busy) is stored in bitmaps. For this we often need bit operations such as:

unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size); /* returns the index of the first zero bit among the first size bits (size if there is none) */
unsigned long find_first_bit(const unsigned long *addr, unsigned long size); /* returns the index of the first set bit among the first size bits (size if there is none) */
void set_bit(int nr, volatile unsigned long *addr); /* sets bit nr to 1 */
void clear_bit(int nr, volatile unsigned long *addr); /* sets bit nr to 0 */
int test_and_set_bit(int nr, volatile unsigned long *addr); /* sets bit nr to 1 and returns its previous value */
int test_and_clear_bit(int nr, volatile unsigned long *addr); /* sets bit nr to 0 and returns its previous value */

Some usage examples:

#define NUM_BITS 128

unsigned long map; /* a single-word bitmap (the bitops work on unsigned long) */
DECLARE_BITMAP(array_map, NUM_BITS); /* unsigned long array_map[BITS_TO_LONGS(NUM_BITS)] */
unsigned long idx;
int old;

/* Find the first zero bit in a single word (BITS_PER_LONG bits). */
idx = find_first_zero_bit(&map, BITS_PER_LONG);
printk(KERN_ALERT "The %lu-th bit is the first zero bit.\n", idx);

/* Find the first set bit in the NUM_BITS-bit array (NUM_BITS if none is set). */
idx = find_first_bit(array_map, NUM_BITS);
printk(KERN_ALERT "The %lu-th bit is the first one bit.\n", idx);

/*
* Clear the idx-th bit in the word.
* It is assumed idx is less than BITS_PER_LONG.
*/
clear_bit(idx, &map);

/*
* Test and set the idx-th bit in the array (non-atomic variant).
* It is assumed idx is less than NUM_BITS.
*/
old = __test_and_set_bit(idx, array_map);
if (old)
printk(KERN_ALERT "%lu-th bit was already set\n", idx);
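Put together, find_first_zero_bit plus test_and_set_bit is the usual way to allocate an inode number from an inode bitmap (minfs keeps one in the imap field of its minfs_sb_info). The sketch below re-implements both operations in user space; `my_find_first_zero_bit`, `my_test_and_set_bit` and `alloc_ino` are hypothetical names, and unlike the kernel's test_and_set_bit the set here is not atomic (it corresponds to __test_and_set_bit).

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Userspace find_first_zero_bit(): returns `size` if bits [0, size) are all set. */
static unsigned long my_find_first_zero_bit(const unsigned long *addr,
                                            unsigned long size)
{
    for (unsigned long i = 0; i < size; i++)
        if (!(addr[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD))))
            return i;
    return size;
}

/* Non-atomic test-and-set, like the kernel's __test_and_set_bit(). */
static int my_test_and_set_bit(unsigned long nr, unsigned long *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long *word = &addr[nr / BITS_PER_WORD];
    int old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/*
 * Allocate the lowest free inode number from a bitmap of `count` inodes.
 * Returns the number, or -1 if every inode is in use.
 */
static long alloc_ino(unsigned long *imap, unsigned long count)
{
    unsigned long ino = my_find_first_zero_bit(imap, count);

    if (ino >= count)
        return -1;
    my_test_and_set_bit(ino, imap);
    return (long)ino;
}
```

In a real driver the updated bitmap would also have to reach the disk, e.g. by calling mark_buffer_dirty on the buffer_head that holds it.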

Exercises

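To solve the exercises, you need to carry out the following steps:

  • prepare the skeletons from the templates
  • build the modules
  • copy the modules to the virtual machine
  • start the VM and test the modules in the VM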
make clean
LABS=filesystems make skels
make build

1. myfs, complete code:

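First, we get familiar with the interface exposed by the Linux kernel and the Virtual File System (VFS) component:

  • design and use a simple virtual file system (i.e. without physical disk support)
  • the file system is called myfs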
/*
* SO2 Lab - Filesystem drivers
* Exercise #1 (no-dev filesystem)
*/

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/pagemap.h>

MODULE_DESCRIPTION("Simple no-dev filesystem");
MODULE_AUTHOR("SO2");
MODULE_LICENSE("GPL");

#define MYFS_BLOCKSIZE 4096
#define MYFS_BLOCKSIZE_BITS 12
#define MYFS_MAGIC 0xbeefcafe
#define LOG_LEVEL KERN_ALERT

/* declarations of functions that are part of operation structures */

static int myfs_mknod(struct inode *dir,
struct dentry *dentry, umode_t mode, dev_t dev);
static int myfs_create(struct inode *dir, struct dentry *dentry,
umode_t mode, bool excl);
static int myfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);

/* TODO 2: define super_operations structure */
static const struct super_operations myfs_ops = {
.statfs = simple_statfs,
.drop_inode = generic_drop_inode,
};

static const struct inode_operations myfs_dir_inode_operations = {
/* TODO 5: Fill dir inode operations structure. */
.create = myfs_create,
.lookup = simple_lookup,
.link = simple_link,
.unlink = simple_unlink,
.mkdir = myfs_mkdir,
.rmdir = simple_rmdir,
.mknod = myfs_mknod,
.rename = simple_rename,
};

static const struct file_operations myfs_file_operations = {
/* TODO 6: Fill file operations structure. */
.read_iter = generic_file_read_iter,
.write_iter = generic_file_write_iter,
.mmap = generic_file_mmap,
.llseek = generic_file_llseek,
};

static const struct inode_operations myfs_file_inode_operations = {
/* TODO 6: Fill file inode operations structure. */
.getattr = simple_getattr,
};

static const struct address_space_operations myfs_aops = {
/* TODO 6: Fill address space operations structure. */
.readpage = simple_readpage,
.write_begin = simple_write_begin,
.write_end = simple_write_end,
};

struct inode *myfs_get_inode(struct super_block *sb, const struct inode *dir,
int mode)
{
struct inode *inode = new_inode(sb);

if (!inode)
return NULL;

/* TODO 3: fill inode structure
* - mode
* - uid
* - gid
* - atime,ctime,mtime
* - ino
*/
inode_init_owner(inode, dir, mode);
inode->i_atime = current_time(inode);
inode->i_mtime = current_time(inode);
inode->i_ctime = current_time(inode);
inode->i_ino = 1;

/* TODO 5: Init i_ino using get_next_ino */
inode->i_ino = get_next_ino();
/* TODO 6: Initialize address space operations. */
inode->i_mapping->a_ops = &myfs_aops;

if (S_ISDIR(mode)) {
/* TODO 3: set inode operations for dir inodes. */
inode->i_op = &simple_dir_inode_operations;
inode->i_fop = &simple_dir_operations;
/* TODO 5: use myfs_dir_inode_operations for inode
* operations (i_op).
*/
inode->i_op = &myfs_dir_inode_operations;
/* TODO 3: directory inodes start off with i_nlink == 2 (for "." entry).
* Directory link count should be incremented (use inc_nlink).
*/
inc_nlink(inode);
}

/* TODO 6: Set file inode and file operations for regular files
* (use the S_ISREG macro).
*/
if (S_ISREG(mode)) {
inode->i_op = &myfs_file_inode_operations;
inode->i_fop = &myfs_file_operations;
}
return inode;
}

/* TODO 5: Implement myfs_mknod, myfs_create, myfs_mkdir. */
static int myfs_mknod(struct inode *dir,
struct dentry *dentry, umode_t mode, dev_t dev)
{
struct inode *inode = myfs_get_inode(dir->i_sb, dir, mode);

if (inode == NULL)
return -ENOSPC;

d_instantiate(dentry, inode);
dget(dentry);
dir->i_mtime = dir->i_ctime = current_time(inode);

return 0;
}

static int myfs_create(struct inode *dir, struct dentry *dentry,
umode_t mode, bool excl)
{
return myfs_mknod(dir, dentry, mode | S_IFREG, 0);
}

static int myfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
int ret;

ret = myfs_mknod(dir, dentry, mode | S_IFDIR, 0);
if (ret != 0)
return ret;
inc_nlink(dir);
return 0;
}

static int myfs_fill_super(struct super_block *sb, void *data, int silent)
{
struct inode *root_inode;
struct dentry *root_dentry;

/* TODO 2: fill super_block
* - blocksize, blocksize_bits
* - magic
* - super operations
* - maxbytes
*/
sb->s_maxbytes = MAX_LFS_FILESIZE;
sb->s_blocksize = MYFS_BLOCKSIZE;
sb->s_blocksize_bits = MYFS_BLOCKSIZE_BITS;
sb->s_magic = MYFS_MAGIC;
sb->s_op = &myfs_ops;

/* mode = directory & access rights (755) */
root_inode = myfs_get_inode(sb, NULL,
S_IFDIR | S_IRWXU | S_IRGRP |
S_IXGRP | S_IROTH | S_IXOTH);

if (!root_inode)
return -ENOMEM;

printk(LOG_LEVEL "root inode has %u link(s)\n", root_inode->i_nlink);

/* d_make_root() calls iput() on the inode itself if it fails,
* so the inode must not be put again here. */
root_dentry = d_make_root(root_inode);
if (!root_dentry)
return -ENOMEM;
sb->s_root = root_dentry;

return 0;
}

static struct dentry *myfs_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
/* TODO 1: call superblock mount function */
return mount_nodev(fs_type, flags, data, myfs_fill_super);
}

/* TODO 1: define file_system_type structure */
static struct file_system_type my_fs_type = {
.owner = THIS_MODULE,
.name = "myfs",
.mount = myfs_mount,
.kill_sb = kill_litter_super,
};

static int __init myfs_init(void)
{
int err;

/* TODO 1: register */
err = register_filesystem(&my_fs_type);
if (err) {
printk(LOG_LEVEL "register_filesystem failed\n");
return err;
}

return 0;
}

static void __exit myfs_exit(void)
{
/* TODO 1: unregister */
unregister_filesystem(&my_fs_type);
}

module_init(myfs_init);
module_exit(myfs_exit);
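  • This was my first time writing a file system driver and there was a lot I didn't understand, so I relied heavily on the reference solution
  • When writing it myself, my main problem was unfamiliarity with the APIs: the right API is not always easy to find online, and some APIs are meant for specific situations and can't be used arbitrarily
  • The parameters and return values help to judge whether an API fits the situation, but eventually I got tired of trying them one by one and just looked at the solution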
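Once myfs is loaded and mounted in the VM (for example with `mount -t myfs none /mnt/myfs`; the mount point is an assumption and the lab's own test scripts may differ), the operations wired up above can be exercised from user space. The hypothetical helper `exercise_fs` runs create, write, read, mkdir, rmdir and unlink inside a given directory, so it can also be tried on an ordinary directory before pointing it at the myfs mount.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Exercise create/write/read/mkdir/rmdir/unlink inside `dir`.
 * Returns 0 if everything behaved as expected, -1 otherwise.
 */
static int exercise_fs(const char *dir)
{
    char file[4096], sub[4096], buf[16] = { 0 };
    const char msg[] = "hello myfs"; /* 11 bytes including the NUL */
    int fd;

    snprintf(file, sizeof(file), "%s/file", dir);
    snprintf(sub, sizeof(sub), "%s/subdir", dir);

    /* .create -> myfs_create(); .write_iter -> generic_file_write_iter() */
    fd = open(file, O_CREAT | O_EXCL | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        goto fail;

    /* .llseek + .read_iter: read the data back through the page cache */
    if (lseek(fd, 0, SEEK_SET) != 0 ||
        read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(msg) ||
        strcmp(buf, msg) != 0)
        goto fail;
    close(fd);

    /* .mkdir -> myfs_mkdir(); .rmdir -> simple_rmdir() */
    if (mkdir(sub, 0755) != 0 || rmdir(sub) != 0)
        return -1;

    /* .unlink -> simple_unlink() */
    return unlink(file);

fail:
    close(fd);
    unlink(file);
    return -1;
}
```

Calling exercise_fs("/mnt/myfs") from a small main() inside the VM should return 0 if the myfs operations are wired up correctly.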

2. minfs, complete code:

/*
* SO2 Lab - Filesystem drivers
* Exercise #2 (dev filesystem)
*/

#include <linux/buffer_head.h>
#include <linux/cred.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/sched.h>
#include <linux/slab.h>

#include "minfs.h"

MODULE_DESCRIPTION("Simple filesystem");
MODULE_AUTHOR("SO2");
MODULE_LICENSE("GPL");

#define LOG_LEVEL KERN_ALERT


struct minfs_sb_info {
__u8 version;
unsigned long imap;
struct buffer_head *sbh;
};

struct minfs_inode_info {
__u16 data_block;
struct inode vfs_inode;
};

/* declarations of functions that are part of operation structures */

static int minfs_readdir(struct file *filp, struct dir_context *ctx);
static struct dentry *minfs_lookup(struct inode *dir,
struct dentry *dentry, unsigned int flags);
static int minfs_create(struct inode *dir, struct dentry *dentry,
umode_t mode, bool excl);

/* dir and inode operation structures */

static const struct file_operations minfs_dir_operations = {
.read = generic_read_dir,
.iterate = minfs_readdir,
};

static const struct inode_operations minfs_dir_inode_operations = {
.lookup = minfs_lookup,
/* TODO 7: Use minfs_create as the create function. */
.create = minfs_create,
};

static const struct address_space_operations minfs_aops = {
.readpage = simple_readpage,
.write_begin = simple_write_begin,
.write_end = simple_write_end,
};

static const struct file_operations minfs_file_operations = {
.read_iter = generic_file_read_iter,
.write_iter = generic_file_write_iter,
.mmap = generic_file_mmap,
.llseek = generic_file_llseek,
};

static const struct inode_operations minfs_file_inode_operations = {
.getattr = simple_getattr,
};

static struct inode *minfs_iget(struct super_block *s, unsigned long ino)
{
struct minfs_inode *mi;
struct buffer_head *bh;
struct inode *inode;
struct minfs_inode_info *mii;

/* Allocate VFS inode. */
inode = iget_locked(s, ino);
if (inode == NULL) {
printk(LOG_LEVEL "error acquiring inode\n");
return ERR_PTR(-ENOMEM);
}

/* Return inode from cache */
if (!(inode->i_state & I_NEW))
return inode;

/* TODO 4: Read block with inodes. It's the second block on
* the device, i.e. the block with the index 1. This is the index
* to be passed to sb_bread().
*/
bh = sb_bread(s,1);
if(bh==NULL){
goto out_bad_sb;
}
/* TODO 4: Get inode with index ino from the block. */
mi = ((struct minfs_inode *)bh->b_data) + ino;
/* TODO 4: fill VFS inode */
inode->i_mode = mi->mode;
inode->i_size = mi->size;
i_uid_write(inode, mi->uid);
i_gid_write(inode, mi->gid);
inode->i_mtime = current_time(inode);
inode->i_atime = current_time(inode);
inode->i_ctime = current_time(inode);

/* TODO 7: Fill address space operations (inode->i_mapping->a_ops) */
inode->i_mapping->a_ops = &minfs_aops;
if (S_ISDIR(inode->i_mode)) {
/* TODO 4: Fill dir inode operations. */
inode->i_op = &simple_dir_inode_operations;
inode->i_fop = &simple_dir_operations;
/* TODO 5: Use minfs_dir_inode_operations for i_op
* and minfs_dir_operations for i_fop. */
inode->i_op = &minfs_dir_inode_operations;
inode->i_fop = &minfs_dir_operations;
/* TODO 4: Directory inodes start off with i_nlink == 2.
* (use inc_link) */
inc_nlink(inode);
}

/* TODO 7: Fill inode and file operations for regular files
* (i_op and i_fop). Use the S_ISREG macro.
*/
if (S_ISREG(inode->i_mode)) {
inode->i_op = &minfs_file_inode_operations;
inode->i_fop = &minfs_file_operations;
}

/* fill data for mii */
mii = container_of(inode, struct minfs_inode_info, vfs_inode);
/* TODO 4: uncomment after the minfs_inode is initialized */
mii->data_block = mi->data_block;
/* Free resources. */
/* TODO 4: uncomment after the buffer_head is initialized */
brelse(bh);
unlock_new_inode(inode);

return inode;

out_bad_sb:
iget_failed(inode);
return NULL;
}

static int minfs_readdir(struct file *filp, struct dir_context *ctx)
{
struct buffer_head *bh;
struct minfs_dir_entry *de;
struct minfs_inode_info *mii;
struct inode *inode;
struct super_block *sb;
int over;
int err = 0;

/* TODO 5: Get inode of directory and container inode. */
inode = file_inode(filp);
mii = container_of(inode, struct minfs_inode_info, vfs_inode);
/* TODO 5: Get superblock from inode (i_sb). */
sb = inode->i_sb;
/* TODO 5: Read data block for directory inode. */
bh = sb_bread(sb, mii->data_block);
if (bh == NULL) {
err = -ENOMEM;
goto out_bad_sb;
}
for (; ctx->pos < MINFS_NUM_ENTRIES; ctx->pos++) {
/* TODO 5: Data block contains an array of
* "struct minfs_dir_entry". Use `de' for storing.
*/
de = (struct minfs_dir_entry *) bh->b_data + ctx->pos;
/* TODO 5: Step over empty entries (de->ino == 0). */
if (de->ino == 0) {
continue;
}
/*
 * dir_emit() returns false once the caller's buffer is full; stop
 * then without advancing ctx->pos so the entry is retried next time.
 * Pass the real name length instead of the whole MINFS_NAME_LEN field.
 */
over = !dir_emit(ctx, de->name, strnlen(de->name, MINFS_NAME_LEN),
de->ino, DT_UNKNOWN);
if (over)
break;
printk(KERN_DEBUG "Read %s from folder %s, ctx->pos: %lld\n",
de->name,
filp->f_path.dentry->d_name.name,
ctx->pos);
}

brelse(bh);
out_bad_sb:
return err;
}

/*
* Find dentry in parent folder. Return parent folder's data buffer_head.
*/

static struct minfs_dir_entry *minfs_find_entry(struct dentry *dentry,
struct buffer_head **bhp)
{
struct buffer_head *bh;
struct inode *dir = dentry->d_parent->d_inode;
struct minfs_inode_info *mii = container_of(dir,
struct minfs_inode_info, vfs_inode);
struct super_block *sb = dir->i_sb;
const char *name = dentry->d_name.name;
struct minfs_dir_entry *final_de = NULL;
struct minfs_dir_entry *de;
int i;

/* TODO 6: Read parent folder data block (contains dentries).
* Fill bhp with return value.
*/
bh = sb_bread(sb, mii->data_block);
if (bh == NULL) {
return NULL;
}
*bhp = bh;
for (i = 0; i < MINFS_NUM_ENTRIES; i++) {
/* TODO 6: Traverse all entries, find entry by name
* Use `de' to traverse. Use `final_de' to store dentry
* found, if existing.
*/
de = ((struct minfs_dir_entry *) bh->b_data) + i;
if (de->ino != 0) {
/* found it */
if (strcmp(name, de->name) == 0) {
printk(KERN_DEBUG "Found entry %s on position: %d\n",
name, i);
final_de = de;
break;
}
}
}

/* bh needs to be released by caller. */
return final_de;
}

static struct dentry *minfs_lookup(struct inode *dir,
struct dentry *dentry, unsigned int flags)
{
struct super_block *sb = dir->i_sb;
struct minfs_dir_entry *de;
struct buffer_head *bh = NULL;
struct inode *inode = NULL;

dentry->d_op = sb->s_root->d_op;

/* TODO 6: the simple_lookup() placeholder is gone; search the
 * parent's data block instead. */
de = minfs_find_entry(dentry, &bh);
if (de != NULL) {
printk(KERN_DEBUG "getting entry: name: %s, ino: %d\n",
de->name, de->ino);
inode = minfs_iget(sb, de->ino);
if (IS_ERR(inode)) {
brelse(bh);
return ERR_CAST(inode);
}
}

d_add(dentry, inode);
brelse(bh);

printk(KERN_DEBUG "looked up dentry %s\n", dentry->d_name.name);

return NULL;
}

static struct inode *minfs_alloc_inode(struct super_block *s)
{
struct minfs_inode_info *mii;

/* TODO 3: Allocate minfs_inode_info. */
mii = kzalloc(sizeof(*mii), GFP_KERNEL);
if (mii == NULL)
return NULL;
/* TODO 3: init VFS inode in minfs_inode_info */
inode_init_once(&mii->vfs_inode);
return &mii->vfs_inode;
}

static void minfs_destroy_inode(struct inode *inode)
{
/* TODO 3: free minfs_inode_info */
struct minfs_inode_info *mii = container_of(inode, struct minfs_inode_info, vfs_inode);
kfree(mii);
}

/*
* Create a new VFS inode. Do basic initialization and fill imap.
*/

static struct inode *minfs_new_inode(struct inode *dir)
{
struct super_block *sb = dir->i_sb;
struct minfs_sb_info *sbi = sb->s_fs_info;
struct inode *inode;
int idx;

/* TODO 7: Find first available inode. */
idx = find_first_zero_bit(&sbi->imap, MINFS_NUM_INODES);
if (idx == MINFS_NUM_INODES) {
printk(LOG_LEVEL "no space left in imap\n");
return NULL;
}
/* TODO 7: Mark the inode as used in the bitmap and mark
* the superblock buffer head as dirty.
*/
__test_and_set_bit(idx, &sbi->imap);
mark_buffer_dirty(sbi->sbh);

/* TODO 7: Call new_inode(), fill inode fields
* and insert inode into inode hash table.
*/
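inode = new_inode(sb);
if (inode == NULL) {
/* Give the inode number back before failing. */
__clear_bit(idx, &sbi->imap);
return NULL;
}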
inode->i_uid = current_fsuid();
inode->i_gid = current_fsgid();
inode->i_ino = idx;
inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
inode->i_blocks = 0;
insert_inode_hash(inode);
/* Actual writing to the disk will be done in minfs_write_inode,
* which will be called at a later time.
*/

return inode;
}

/*
* Add dentry link on parent inode disk structure.
*/

static int minfs_add_link(struct dentry *dentry, struct inode *inode)
{
struct buffer_head *bh;
struct inode *dir;
struct super_block *sb;
struct minfs_inode_info *mii;
struct minfs_dir_entry *de;
int i;
int err = 0;

/* TODO 7: Get: directory inode (in dir); containing inode (in mii); superblock (in sb). */
dir = dentry->d_parent->d_inode;
mii = container_of(dir, struct minfs_inode_info, vfs_inode);
sb = dir->i_sb;
/* TODO 7: Read dir data block (use sb_bread). */
bh = sb_bread(sb, mii->data_block);
if (bh == NULL)
return -EIO;
/* TODO 7: Find first free dentry (de->ino == 0). */
for (i = 0; i < MINFS_NUM_ENTRIES; i++) {
de = (struct minfs_dir_entry *) bh->b_data + i;
if (de->ino == 0)
break;
}

if (i == MINFS_NUM_ENTRIES) {
err = -ENOSPC;
goto out;
}
/* TODO 7: Place new entry in the available slot. Mark buffer_head
* as dirty. */
de->ino = inode->i_ino;
/* strncpy() stops at the name's NUL and zero-pads the fixed-width field. */
strncpy(de->name, dentry->d_name.name, MINFS_NAME_LEN);
dir->i_mtime = dir->i_ctime = current_time(dir);
mark_buffer_dirty(bh);

out:
brelse(bh);

return err;
}

/*
* Create a VFS file inode. Use minfs_file_... operations.
*/

static int minfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
bool excl)
{
struct inode *inode;
struct minfs_inode_info *mii;
int err;

inode = minfs_new_inode(dir);
if (inode == NULL) {
printk(LOG_LEVEL "error allocating new inode\n");
err = -ENOMEM;
goto err_new_inode;
}

inode->i_mode = mode;
inode->i_op = &minfs_file_inode_operations;
inode->i_fop = &minfs_file_operations;
mii = container_of(inode, struct minfs_inode_info, vfs_inode);
mii->data_block = MINFS_FIRST_DATA_BLOCK + inode->i_ino;

err = minfs_add_link(dentry, inode);
if (err != 0)
goto err_add_link;

d_instantiate(dentry, inode);
mark_inode_dirty(inode);

printk(KERN_DEBUG "new file inode created (ino = %lu)\n",
inode->i_ino);

return 0;

err_add_link:
inode_dec_link_count(inode);
iput(inode);
err_new_inode:
return err;
}

/*
* Write VFS inode contents to disk inode.
*/

static int minfs_write_inode(struct inode *inode,
struct writeback_control *wbc)
{
struct super_block *sb = inode->i_sb;
struct minfs_inode *mi;
struct minfs_inode_info *mii = container_of(inode,
struct minfs_inode_info, vfs_inode);
struct buffer_head *bh;
int err = 0;

bh = sb_bread(sb, MINFS_INODE_BLOCK);
if (bh == NULL) {
printk(LOG_LEVEL "could not read block\n");
err = -ENOMEM;
goto out;
}

mi = (struct minfs_inode *) bh->b_data + inode->i_ino;

/* fill disk inode */
mi->mode = inode->i_mode;
mi->uid = i_uid_read(inode);
mi->gid = i_gid_read(inode);
mi->size = inode->i_size;
mi->data_block = mii->data_block;

printk(KERN_DEBUG "mode is %05o; data_block is %d\n", mi->mode,
mii->data_block);

mark_buffer_dirty(bh);
brelse(bh);

printk(KERN_DEBUG "wrote inode %lu\n", inode->i_ino);

out:
return err;
}

static void minfs_put_super(struct super_block *sb)
{
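struct minfs_sb_info *sbi = sb->s_fs_info;
struct minfs_super_block *ms = (struct minfs_super_block *) sbi->sbh->b_data;

/* Copy the in-memory inode bitmap back into the disk superblock,
 * then write and free the superblock buffer head and sbi. */
ms->imap = sbi->imap;
mark_buffer_dirty(sbi->sbh);
brelse(sbi->sbh);
sb->s_fs_info = NULL;
kfree(sbi);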

printk(KERN_DEBUG "released superblock resources\n");
}

static const struct super_operations minfs_ops = {
.statfs = simple_statfs,
.put_super = minfs_put_super,
/* TODO 4: add alloc and destroy inode functions */
.alloc_inode = minfs_alloc_inode,
.destroy_inode = minfs_destroy_inode,
/* TODO 7: = set write_inode function. */
.write_inode = minfs_write_inode,
};

static int minfs_fill_super(struct super_block *s, void *data, int silent)
{
struct minfs_sb_info *sbi;
struct minfs_super_block *ms;
struct inode *root_inode;
struct dentry *root_dentry;
struct buffer_head *bh;
int ret = -EINVAL;

sbi = kzalloc(sizeof(struct minfs_sb_info), GFP_KERNEL);
if (!sbi)
return -ENOMEM;
s->s_fs_info = sbi;

/* Set block size for superblock. */
if (!sb_set_blocksize(s, MINFS_BLOCK_SIZE))
goto out_bad_blocksize;

/* TODO 2: Read block with superblock. It's the first block on
* the device, i.e. the block with the index 0. This is the index
* to be passed to sb_bread().
*/
bh = sb_bread(s, 0);
if (bh == NULL)
goto out_bad_sb;
/* TODO 2: interpret read data as minfs_super_block */
ms = (struct minfs_super_block *) bh->b_data;
/* TODO 2: check magic number with value defined in minfs.h. jump to out_bad_magic if not suitable */
if (ms->magic != MINFS_MAGIC)
goto out_bad_magic;
/* TODO 2: fill super_block with magic_number, super_operations */
s->s_magic = MINFS_MAGIC;
s->s_op = &minfs_ops;
/* TODO 2: Fill sbi with rest of information from disk superblock
* (i.e. version).
*/
sbi->version = ms->version;
sbi->imap = ms->imap;
/* allocate root inode and root dentry */
root_inode = minfs_iget(s, MINFS_ROOT_INODE);
if (IS_ERR_OR_NULL(root_inode))
goto out_bad_inode;

/* d_make_root() already drops root_inode when it fails, so no iput() here. */
root_dentry = d_make_root(root_inode);
if (!root_dentry)
goto out_bad_inode;
s->s_root = root_dentry;

/* Store superblock buffer_head for further use. */
sbi->sbh = bh;

return 0;

out_bad_inode:
printk(LOG_LEVEL "bad inode\n");
out_bad_magic:
printk(LOG_LEVEL "bad magic number\n");
brelse(bh);
out_bad_sb:
printk(LOG_LEVEL "error reading buffer_head\n");
out_bad_blocksize:
printk(LOG_LEVEL "bad block size\n");
s->s_fs_info = NULL;
kfree(sbi);
return ret;
}

static struct dentry *minfs_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
/* TODO 1: call superblock mount function */
return mount_bdev(fs_type, flags, dev_name, data, minfs_fill_super);
}

static struct file_system_type minfs_fs_type = {
.owner = THIS_MODULE,
.name = "minfs",
/* TODO 1: add mount, kill_sb and fs_flags */
.mount = minfs_mount,
/* mount_bdev() pairs with kill_block_super(), which releases the block device. */
.kill_sb = kill_block_super,
.fs_flags = FS_REQUIRES_DEV,
};

static int __init minfs_init(void)
{
int err;

err = register_filesystem(&minfs_fs_type);
if (err) {
printk(LOG_LEVEL "register_filesystem failed\n");
return err;
}

return 0;
}

static void __exit minfs_exit(void)
{
unregister_filesystem(&minfs_fs_type);
}

module_init(minfs_init);
module_exit(minfs_exit);
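minfs_new_inode() above hands out inode numbers from the one-word bitmap sbi->imap: find_first_zero_bit() locates a free slot and __test_and_set_bit() claims it. The following is a user-space model of that allocation, offered only as a sketch. The demo_ helpers and DEMO_NUM_INODES are hypothetical stand-ins, and unlike the kernel function the helper only scans a single unsigned long.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical inode count; the real limit is MINFS_NUM_INODES in minfs.h. */
#define DEMO_NUM_INODES 32

/* Return the index of the first clear bit among the low nbits of *map,
 * or nbits if all of them are set. Single-word stand-in for the
 * kernel's find_first_zero_bit(). */
static int demo_find_first_zero_bit(const unsigned long *map, int nbits)
{
	assert(nbits <= (int)(sizeof(*map) * CHAR_BIT));
	for (int i = 0; i < nbits; i++)
		if (!(*map & (1UL << i)))
			return i;
	return nbits;
}

/* Allocate an inode number the way minfs_new_inode() does: take the
 * first free bit and mark it used; return -1 when the bitmap is full. */
static int demo_alloc_ino(unsigned long *imap)
{
	int idx = demo_find_first_zero_bit(imap, DEMO_NUM_INODES);

	if (idx == DEMO_NUM_INODES)
		return -1;		/* "no space left in imap" */
	*imap |= 1UL << idx;		/* what __test_and_set_bit() does */
	return idx;
}
```

Freeing an inode number is the reverse operation: clear the same bit, which is what the kernel's __clear_bit() does.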
  • Result:
root@qemux86:~/skels/filesystems/minfs/user# set -ex
root@qemux86:~/skels/filesystems/minfs/user# insmod ../kernel/minfs.ko
+ insmod ../kernel/minfs.ko
minfs: loading out-of-tree module taints kernel.
root@qemux86:~/skels/filesystems/minfs/user# mkdir -p /mnt/minfs
+ mkdir -p /mnt/minfs
root@qemux86:~/skels/filesystems/minfs/user# ./mkfs.minfs /dev/vdb
+ ./mkfs.minfs /dev/vdb
root@qemux86:~/skels/filesystems/minfs/user# mount -t minfs /dev/vdb /mnt/minfs
+ mount -t minfs /dev/vdb /mnt/minfs
root@qemux86:~/skels/filesystems/minfs/user# cat /proc/filesystems | grep minfs
+ cat /proc/filesystems
+ grep minfs
nodev minfs
root@qemux86:~/skels/filesystems/minfs/user# cat /proc/mounts | grep minfs
+ + grep minfs
cat /proc/mounts
/dev/vdb /mnt/minfs minfs rw,relatime 0 0
root@qemux86:~/skels/filesystems/minfs/user# stat -f /mnt/minfs
+ stat -f /mnt/minfs
File: "/mnt/minfs"
ID: 0 Namelen: 255 Type: UNKNOWN
Block size: 4096
Blocks: Total: 0 Free: 0 Available: 0
Inodes: Total: 0 Free: 0
root@qemux86:~/skels/filesystems/minfs/user# cd /mnt/minfs
+ cd /mnt/minfs
root@qemux86:/mnt/minfs# ls -la
Read a.txt from folder /, ctx->pos: 0
ls: ./a.txt: No such file or directory
root@qemux86:/mnt/minfs# mode is 40755; data_block is 2
wrote inode 0

root@qemux86:/mnt/minfs# cd ..
root@qemux86:/mnt# umount /mnt/minfs
released superblock resources
root@qemux86:/mnt# rmmod minfs
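  • This lab really comes down to two calls:
    • register_filesystem(&minfs_fs_type) and unregister_filesystem(&minfs_fs_type)
    • everything else fills in the filesystem type that these two calls register and remove
  • Of the operations in the newly registered filesystem type, the only one we have to write ourselves is minfs_mount: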
static struct file_system_type minfs_fs_type = {
.owner = THIS_MODULE,
.name = "minfs",
.mount = minfs_mount,
.kill_sb = kill_block_super,
.fs_flags = FS_REQUIRES_DEV,
};
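Conceptually, register_filesystem() adds the file_system_type to a kernel-wide list keyed by name (the same list that /proc/filesystems prints), and unregister_filesystem() unlinks it again. Below is a rough user-space model, loosely based on fs/filesystems.c. The demo_ names are hypothetical, and the real code also takes a lock and runs additional sanity checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file_system_type: just the name and
 * the link that chains registered types together. */
struct demo_fs_type {
	const char *name;
	struct demo_fs_type *next;
};

static struct demo_fs_type *demo_file_systems;	/* head of the registry */

/* Return the slot pointing at the type called `name`, or the empty tail slot. */
static struct demo_fs_type **demo_find(const char *name)
{
	struct demo_fs_type **p = &demo_file_systems;

	while (*p && strcmp((*p)->name, name) != 0)
		p = &(*p)->next;
	return p;
}

/* Append fs to the registry; a second type with the same name gets -EBUSY. */
static int demo_register(struct demo_fs_type *fs)
{
	struct demo_fs_type **p = demo_find(fs->name);

	if (*p)
		return -EBUSY;
	fs->next = NULL;
	*p = fs;
	return 0;
}

/* Unlink fs from the registry; -EINVAL if it was never registered. */
static int demo_unregister(struct demo_fs_type *fs)
{
	struct demo_fs_type **p = &demo_file_systems;

	while (*p) {
		if (*p == fs) {
			*p = fs->next;
			fs->next = NULL;
			return 0;
		}
		p = &(*p)->next;
	}
	return -EINVAL;
}
```

The model shows why the name in file_system_type matters: `mount -t minfs` looks the type up by that string.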
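  • Inside minfs_mount, the only thing we have to supply is minfs_fill_super, the function passed to mount_bdev that fills in the superblock
  • This part closely follows the template: read the superblock with sb_bread, verify the magic number, and copy the superblock information into minfs_sb_info. We also have to implement the functions referenced by minfs_ops: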
static const struct super_operations minfs_ops = {
.statfs = simple_statfs,
.put_super = minfs_put_super,
.alloc_inode = minfs_alloc_inode,
.destroy_inode = minfs_destroy_inode,
.write_inode = minfs_write_inode,
};
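The fill_super steps described above (read block 0, check the magic number, copy the fields into minfs_sb_info) can be modeled over an in-memory disk image. This is a sketch under stated assumptions: the struct layouts, DEMO_MAGIC and DEMO_BLOCK_SIZE are hypothetical and do not reproduce minfs.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 4096		/* hypothetical; minfs uses MINFS_BLOCK_SIZE */
#define DEMO_MAGIC      0x12345678u	/* hypothetical; minfs uses MINFS_MAGIC */

/* Hypothetical on-disk superblock; the real layout is in minfs.h. */
struct demo_super_block {
	uint32_t magic;
	uint8_t version;
	uint32_t imap;
};

/* Hypothetical in-memory copy, playing the role of minfs_sb_info. */
struct demo_sb_info {
	uint8_t version;
	uint32_t imap;
};

/* Core of minfs_fill_super(): the superblock sits in block 0, so read it
 * from offset 0 of the image, check the magic, and copy the fields. */
static int demo_fill_super(const unsigned char *disk, struct demo_sb_info *sbi)
{
	struct demo_super_block ms;

	assert(sizeof(ms) <= DEMO_BLOCK_SIZE);
	memcpy(&ms, disk, sizeof(ms));	/* memcpy avoids alignment concerns */
	if (ms.magic != DEMO_MAGIC)
		return -EINVAL;		/* fill_super jumps to out_bad_magic here */
	sbi->version = ms.version;
	sbi->imap = ms.imap;
	return 0;
}
```

In the module itself, sb_bread() plays the role of the memcpy and bh->b_data is cast directly to struct minfs_super_block.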
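  • We also need to implement minfs_iget, which reads an inode from disk
  • In minfs_iget:
    • first, iget_locked(s, ino) gets the VFS inode for the mounted filesystem; if it is already cached (I_NEW not set), it is returned as-is
    • next, the on-disk inode is read and used to initialize the VFS inode, including its address_space_operations
    • then, depending on the inode type, the matching inode_operations and file_operations are assigned
    • finally, unlock_new_inode() is called and the inode is returned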
static const struct file_operations minfs_dir_operations = {
.read = generic_read_dir,
.iterate = minfs_readdir,
};

static const struct file_operations minfs_file_operations = {
.read_iter = generic_file_read_iter,
.write_iter = generic_file_write_iter,
.mmap = generic_file_mmap,
.llseek = generic_file_llseek,
};

static const struct inode_operations minfs_dir_inode_operations = {
.lookup = minfs_lookup,
.create = minfs_create,
};

static const struct inode_operations minfs_file_inode_operations = {
.getattr = simple_getattr,
};

static const struct address_space_operations minfs_aops = {
.readpage = simple_readpage,
.write_begin = simple_write_begin,
.write_end = simple_write_end,
};
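  • Among these, the functions we have to implement ourselves are minfs_create, minfs_lookup, and minfs_readdir
  • With the reference solution and a fair amount of trial and error, the overall flow now feels clear, although the details still need polishing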
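minfs_readdir, minfs_lookup (through minfs_find_entry) and minfs_create (through minfs_add_link) all scan the same structure: a directory's data block is a fixed array of struct minfs_dir_entry, where ino == 0 marks a free slot. Here is a minimal user-space sketch of the three scans. The sizes DEMO_NAME_LEN and DEMO_NUM_ENTRIES and the entry layout are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NAME_LEN    16	/* hypothetical; minfs uses MINFS_NAME_LEN */
#define DEMO_NUM_ENTRIES 32	/* hypothetical; minfs uses MINFS_NUM_ENTRIES */

/* Hypothetical directory entry; ino == 0 marks an empty slot. */
struct demo_dir_entry {
	uint32_t ino;
	char name[DEMO_NAME_LEN];
};

/* readdir: return the next used entry at or after *pos and advance *pos
 * past it, or return NULL at the end of the block. */
static const struct demo_dir_entry *demo_readdir(const struct demo_dir_entry *blk,
						 int *pos)
{
	for (; *pos < DEMO_NUM_ENTRIES; (*pos)++)
		if (blk[*pos].ino != 0)
			return &blk[(*pos)++];
	return NULL;
}

/* lookup: find a used entry by name (compared within the fixed-width field). */
static struct demo_dir_entry *demo_find_entry(struct demo_dir_entry *blk,
					      const char *name)
{
	for (int i = 0; i < DEMO_NUM_ENTRIES; i++)
		if (blk[i].ino != 0 &&
		    strncmp(blk[i].name, name, DEMO_NAME_LEN) == 0)
			return &blk[i];
	return NULL;
}

/* create: store (name, ino) in the first free slot, or fail with -ENOSPC. */
static int demo_add_link(struct demo_dir_entry *blk, const char *name,
			 uint32_t ino)
{
	for (int i = 0; i < DEMO_NUM_ENTRIES; i++) {
		if (blk[i].ino == 0) {
			blk[i].ino = ino;
			/* Zero-pads short names; a name of exactly
			 * DEMO_NAME_LEN chars is left unterminated. */
			strncpy(blk[i].name, name, DEMO_NAME_LEN);
			return 0;
		}
	}
	return -ENOSPC;
}
```

Because a deleted entry only has its ino cleared, readdir and lookup must skip empty slots, and create can reuse the first free one.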