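Implementing a Simple Shell
Workflow
The shell performs a sequence of read/evaluate steps and then terminates.
- Read step: read a command line from the user (fgets)
- Evaluate step: parse the command line and run it on the user's behalf (eval)
- 2.1 Foreground or background? (parseline)
- 2.2 Shell builtin command or executable file? (builtin_command)
- 2.3 Not a builtin: fork a child and execve the program; builtin: execute it immediately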
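The two decisions in steps 2.1 and 2.2 can be sketched in C as follows. This is a minimal illustration rather than the lab's actual parseline or builtin_command: the helper names strip_background_flag and is_builtin are hypothetical, and the builtin list (quit, jobs, bg, fg) is taken from the commands that appear in the traces.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 and strips the trailing '&' when the
 * command line asks for a background job, 0 for a foreground job. */
int strip_background_flag(char *cmdline)
{
    size_t len = strlen(cmdline);

    /* Drop trailing whitespace, including the newline that fgets keeps. */
    while (len > 0 && (cmdline[len - 1] == '\n' ||
                       cmdline[len - 1] == ' ' ||
                       cmdline[len - 1] == '\t'))
        cmdline[--len] = '\0';

    if (len > 0 && cmdline[len - 1] == '&') {
        cmdline[len - 1] = '\0';
        return 1;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if name is a tsh builtin, 0 otherwise. */
int is_builtin(const char *name)
{
    static const char *const builtins[] = { "quit", "jobs", "bg", "fg" };
    size_t i;

    for (i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcmp(name, builtins[i]) == 0)
            return 1;
    return 0;
}
```

A caller would run strip_background_flag on the line returned by fgets, split the rest into argv, and use is_builtin(argv[0]) to choose between running the command directly and forking a child.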
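Flowchart of how tsh works (not drawn rigorously, but it gets the idea across)
A few points worth noting.
- Each job is run by the process group of a child forked by tsh, and all jobs are managed through the job list, jobs. Each job struct records the job's pid (the pid of the process group leader, which is also the job's pid), jid, state, and cmdline.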
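A job record with the four fields named above might look like the sketch below. The constants MAXLINE and MAXJOBS, the state names, and the lookup helper find_job_by_pid are assumptions made for illustration, not definitions quoted from the lab code.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAXLINE 1024   /* assumed maximum command-line length */
#define MAXJOBS 16     /* assumed capacity of the job list */

/* Assumed state names: unused slot, foreground, background, stopped. */
enum job_state { UNDEF = 0, FG, BG, ST };

struct job_t {
    pid_t pid;               /* pid of the process group leader */
    int jid;                 /* job id, printed as [1], [2], ... */
    int state;               /* one of enum job_state */
    char cmdline[MAXLINE];   /* command line that started the job */
};

/* Returns the job whose leader has the given pid, or NULL if none. */
struct job_t *find_job_by_pid(struct job_t *jobs, pid_t pid)
{
    int i;

    if (pid < 1)
        return NULL;
    for (i = 0; i < MAXJOBS; i++)
        if (jobs[i].pid == pid)
            return &jobs[i];
    return NULL;
}
```

Because each job is its own process group, the leader's pid doubles as the process group id, which is why a single pid field is enough to signal the whole job.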
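The foreground process group of tsh is the process group that currently holds tsh's input.
What waitfg does is hold the terminal.
- It is implemented as a busy loop that keeps spinning.
- It spins until it detects that the foreground process group leader has changed.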
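The busy loop can be sketched as below. The variable foreground_pid is a hypothetical stand-in for "the pid of the current foreground job"; in a real shell that information would come from the job list and be updated by the SIGCHLD handler. The returned nap count exists only to make the sketch observable.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the pid of the current foreground job. A
 * SIGCHLD handler would reset it to 0 when that job terminates or stops. */
static volatile sig_atomic_t foreground_pid = 0;

/* Spins until pid is no longer the foreground job.
 * Returns how many times it slept. */
int waitfg_sketch(pid_t pid)
{
    int naps = 0;

    while (foreground_pid == pid) {
        sleep(1);   /* sleep returns early when a signal handler runs */
        naps++;
    }
    return naps;
}
```

The loop re-checks the condition after every wake-up, so it does not matter whether sleep ended because of a timeout or because a handler ran.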
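Bug when running the script: "bad interpreter: No such file or directory"
- On Windows every line ends with \r\n, while on Linux every line ends with \n. A file edited on Windows therefore shows one extra character, \r, at the end of every line when opened on Linux. cat -A yourfile displays this \r as ^M; deleting that character fixes the problem.
- This happened while connected to Linux over ssh
- but still editing the files in VS Code on Windows.
- Remember to strip the extra carriage return \r (ASCII 13) from the trace files (otherwise the number of arguments passed differs from what is expected) and from sdriver.pl.
sed -i 's/\r$//' yourfile
The number of characters compared can also be limited with strncmp.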
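Signal handling
main (parent)
- For SIGINT and SIGTSTP typed at the terminal, tsh's main (the parent) is responsible for forwarding them to the foreground job (through sigint_handler and sigtstp_handler).
- For the SIGCHLD raised when a child dies, tsh's main (the parent) is responsible for waiting on and reaping the child (through sigchld_handler).
child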
- As the two quotes below state, the child inherits the parent's handlers, but execve resets them, so there is no need to restore the defaults by hand: the child's dispositions for SIGINT, SIGTSTP, and SIGCHLD are either the default or whatever the program run by execve chooses.
// no need to restore these by hand
Signal(SIGINT, SIG_DFL); // change to default (so the child process would be terminated instead of forwarding the signal)
Signal(SIGTSTP, SIG_DFL); // change to default (so the child process would be stopped instead of forwarding the signal)
- A child created via fork(2) inherits a copy of its parent's signal dispositions.
- During an execve(2), the dispositions of handled signals are reset to the default; the dispositions of ignored signals are left unchanged.
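The reset on execve can be observed with a small experiment. The sketch below is an illustration under assumptions, not part of tsh: it uses SIGUSR1 instead of SIGINT so that nothing depends on the terminal, and it assumes /bin/sh exists and supports kill -USR1 as a builtin. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A handler that does nothing; while it is installed, SIGUSR1 is harmless. */
static void swallow_usr1(int sig) { (void)sig; }

/* Installs a SIGUSR1 handler, forks, and has the child execve a shell that
 * sends SIGUSR1 to itself. Returns the number of the signal that killed the
 * child, 0 if the child exited normally, or -1 on error. */
int signal_that_killed_execed_child(void)
{
    struct sigaction sa;
    pid_t pid;
    int status;

    sa.sa_handler = swallow_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The child inherited the handler through fork; execve resets the
         * disposition to the default action, which is to terminate. */
        char *argv[] = { "sh", "-c", "kill -USR1 $$", NULL };
        char *envp[] = { NULL };
        execve("/bin/sh", argv, envp);
        _exit(127);   /* reached only if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

If the handler had survived execve, the shell would have exited normally and the function would return 0; with the disposition reset to the default, the child is killed by SIGUSR1.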
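Why should the parent forward SIGINT and SIGTSTP?
- Under the signal mechanism, a signal typed at the terminal is delivered to every process in the foreground process group.
- A forked child belongs to its parent's process group by default. Our tsh runs in the foreground of the Linux shell, so tsh and every child it forks would all belong to the Linux shell's foreground process group.
- So a signal typed at the keyboard would be sent to every child forked by tsh, and to tsh's main as well.
- Therefore every child forked by tsh must move into its own process group before execve (setpgid(0, 0)).
- This guarantees that tsh is the only process in the Linux shell's foreground process group.
- When a signal is typed, tsh can then forward it to the process group that should receive it.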
- This ensures that there will be only one process, your shell, in the foreground process group. When you type ctrl-c, the shell should catch the resulting SIGINT and then forward it to the appropriate foreground job (or more precisely, the process group that contains the foreground job)
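The two halves of this argument, isolating each child in its own process group and then forwarding a signal to that group, can be sketched as follows. The function names are hypothetical. The child only pauses instead of calling execve, and the extra setpgid call in the parent is an addition of this sketch (a common way to make sure the group exists whichever side runs first), not something the notes above require.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that becomes the leader of its own process group and then
 * waits for signals. Returns the child's pid (which is also its pgid),
 * or -1 if fork failed. */
pid_t spawn_in_own_group(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);       /* child: leave the parent's process group */
        for (;;)
            pause();         /* stand-in for execve of the user's program */
    }
    setpgid(pid, pid);       /* parent: same request, harmless if already done */
    return pid;
}

/* Forwards sig to every process in the group led by pgid.
 * A negative pid argument makes kill address a whole process group. */
int forward_to_group(pid_t pgid, int sig)
{
    return kill(-pgid, sig);
}
```

A sigint_handler built on this idea would look up the foreground job's pid and call forward_to_group(pid, SIGINT), so that the job and anything it forked receive the signal while the shell itself keeps running.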
- return value of waitpid
- on success, returns the process ID of the child whose state has changed;
- if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned.
- On error, -1 is returned.
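waitpid(-1, &status, 0)
- Blocks until a child of the parent terminates; called in a loop, it reaps every child.
- A return value of -1 with errno == ECHILD means every child has already been reaped.
waitpid(-1, &status, WNOHANG)
- Does not block while waiting for the parent's children to die.
- Returns 0 if no process in the wait set has terminated (changed state) yet.
waitpid(-1, &status, WNOHANG | WUNTRACED)
- Does not block while waiting for a process in the wait set to terminate or stop.
- WUNTRACED differs from the default: by default waitpid reports only terminated children, while WUNTRACED also reports children that have stopped.
- == 0: none of the remaining children has terminated or stopped
- == -1: error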
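The three outcomes of the non-blocking call can be told apart as in the sketch below. The function name and the single-character result codes are hypothetical, chosen only to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Performs one non-blocking check on all children.
 * Stores waitpid's return value in *changed (if changed is not NULL) and
 * returns:
 *   '0' no child has terminated or stopped yet
 *   '!' waitpid failed (for example ECHILD: no children at all)
 *   'T' a child stopped
 *   'K' a child was killed by a signal
 *   'E' a child exited normally */
char poll_children_once(pid_t *changed)
{
    int status;
    pid_t pid = waitpid(-1, &status, WNOHANG | WUNTRACED);

    if (changed != NULL)
        *changed = pid;
    if (pid == 0)
        return '0';
    if (pid < 0)
        return '!';
    if (WIFSTOPPED(status))
        return 'T';
    if (WIFSIGNALED(status))
        return 'K';
    return 'E';
}
```

A SIGCHLD handler could call this in a loop until it returns '0' or '!', updating the job list for each 'T', 'K', or 'E' result.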
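- Hard to track down: this check can only be used when the preceding waitpid was a blocking one! (It tests whether any children remain; running, stopped, and zombie children all count as remaining, that is, as still existing.) With WNOHANG, the while loop can exit while the wait set still contains unwaited-for children, and printing errno then shows Success, not ECHILD.
ECHILD : The calling process does not have any unwaited-for children.
if (errno != ECHILD) {
unix_error("waitpid error");
}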
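Put together, a blocking reap loop followed by the ECHILD check could look like this sketch. The function name is hypothetical, and the sketch reports an error by return value instead of calling unix_error so that it stays self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reaps every child of the calling process, blocking until each one has
 * terminated. Returns the number of children reaped, or -1 if waitpid
 * stopped for a reason other than "no children left" (ECHILD). */
int reap_all_blocking(void)
{
    int reaped = 0;

    while (waitpid(-1, NULL, 0) > 0)
        reaped++;

    /* The loop ended because waitpid returned -1. After a blocking wait
     * the only expected cause is ECHILD. */
    if (errno != ECHILD)
        return -1;
    return reaped;
}
```

The check is valid here precisely because the wait is blocking: the loop can only end with -1, never with the 0 that WNOHANG returns while children are still running.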
- Should SIGCHLD be waited on by waitfg or by sigchld_handler? Answer from the hint:
- One of the tricky parts of the assignment is deciding on the allocation of work between the waitfg and sigchld_handler functions. We recommend the following approach:
– In waitfg, use a busy loop around the sleep function.
– In sigchld_handler, use exactly one call to waitpid.
- While other solutions are possible, such as calling waitpid in both waitfg and sigchld_handler, these can be very confusing. It is simpler to do all reaping in the handler. (Indeed: I wrote my first version without reading the hint, and in most cases the SIGCHLD was picked up by the sigchld handler.)
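If the parent dies after a child has already been stopped, the child does not die. In the listing below the child was stopped, and after tsh died the child is still stopped.
2 53179 0 0 ? -1 I 0 0:00 [kworker/u256:2-]
1 53235 53234 40687 ? -1 R 1000 1:38 ./tsh -p
**53235 53239 53239 40687 ? -1 T 1000 0:00 ./myspin 5**
4522 53604 4522 4522 ? -1 S 1000 0:00 sleep 180
When operating on the global data structure jobs, it is advisable to "lock" it by blocking all signals, so that a handler triggered by an interrupting signal cannot modify jobs in the middle of the operation.
- In tsh, the code that runs concurrently with main and can also touch jobs is the set of handlers registered by main. Both sides may modify the jobs structure, so block all signals first to rule out interruption, and only then operate on jobs.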
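Blocking all signals around an update of jobs can be wrapped in a pair of helpers like the ones below. The names jobs_lock and jobs_unlock are hypothetical; the calls they make (sigfillset and sigprocmask) are standard POSIX. SIGKILL and SIGSTOP cannot be blocked, which does not matter here because they never run a handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Blocks every blockable signal and stores the previous mask in *saved.
 * Returns 0 on success, -1 on error. */
int jobs_lock(sigset_t *saved)
{
    sigset_t all;

    sigfillset(&all);
    return sigprocmask(SIG_BLOCK, &all, saved);
}

/* Restores the mask that was in effect before jobs_lock.
 * Returns 0 on success, -1 on error. */
int jobs_unlock(const sigset_t *saved)
{
    return sigprocmask(SIG_SETMASK, saved, NULL);
}
```

A typical use is to call jobs_lock(&saved), then addjob or deletejob, then jobs_unlock(&saved). Restoring the saved mask rather than unblocking everything keeps any signals blocked that the caller had already blocked.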
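The data race between main's addjob and the child
- We cannot assume that addjob runs before the child's execve.
- If the child exits first, the handler calls deletejob on a job that was never recorded, and main then calls addjob for a process that no longer exists.
- So the parent should block SIGCHLD before forking. The child inherits the blocked mask and should unblock SIGCHLD itself before execve.
- Once addjob has run, the parent unblocks SIGCHLD again.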
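The ordering can be sketched with a toy one-slot job list. Everything named here (tracked_pid, reaped_unknown, on_sigchld, install_sigchld_handler, spawn_tracked) is hypothetical, and the child exits immediately instead of calling execve, which is the worst case for the race.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy one-slot job list: the pid being tracked, or 0 when empty. */
static volatile sig_atomic_t tracked_pid = 0;
/* Set to 1 if the handler ever reaps a child that was not recorded yet. */
static volatile sig_atomic_t reaped_unknown = 0;

/* SIGCHLD handler: reaps terminated children and "deletes the job". */
static void on_sigchld(int sig)
{
    int saved_errno = errno;
    pid_t pid;

    (void)sig;
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
        if (pid == tracked_pid)
            tracked_pid = 0;
        else
            reaped_unknown = 1;
    }
    errno = saved_errno;
}

/* Installs on_sigchld. Returns 0 on success, -1 on error. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Forks a child with SIGCHLD blocked, records it, then unblocks.
 * Returns the child's pid, or -1 if fork failed. */
pid_t spawn_tracked(void)
{
    sigset_t chld, prev;
    pid_t pid;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &prev);      /* block before fork */

    pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &prev, NULL); /* child: undo the inherited block */
        _exit(0);                              /* stand-in for execve */
    }
    if (pid > 0)
        tracked_pid = pid;                     /* the "addjob" step */
    sigprocmask(SIG_SETMASK, &prev, NULL);     /* a pending SIGCHLD is delivered now */
    return pid;
}
```

Because SIGCHLD stays pending while blocked, the handler cannot run until the job has been recorded, so it always finds the pid it reaps.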
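In the Linux shell, bg and fg work by sending the SIGCONT signal
- The bg and fg commands send SIGCONT; kill -SIGCONT PID does the same by hand.
- Compared with bg, fg additionally takes over the input by calling waitfg (a spin-style busy loop).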
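The shared core of bg and fg can be sketched as one function. The names resume_job and resume_mode, and the callback parameter that stands in for waitfg, are hypothetical; a real implementation would also update the job's state in the job list.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum resume_mode { RESUME_BG, RESUME_FG };

/* Sends SIGCONT to the whole process group led by pgid. For RESUME_FG it
 * then calls wait_foreground (a stand-in for waitfg) so the shell does not
 * return to the prompt until the job leaves the foreground.
 * Returns 0 on success, -1 if the signal could not be sent. */
int resume_job(pid_t pgid, enum resume_mode mode,
               void (*wait_foreground)(pid_t))
{
    if (kill(-pgid, SIGCONT) < 0)
        return -1;
    if (mode == RESUME_FG && wait_foreground != NULL)
        wait_foreground(pgid);
    return 0;
}
```

The only difference between the two modes is the final wait, which matches the observation that fg is bg plus taking over the input.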
Results
- A few of the traces
shc@shc-virtual-machine:~/Shared_ln/csapp/shlab$ ./sdriver.pl -t trace03.txt -s ./tsh -a "-p"
#
# trace03.txt - Run a foreground job.
#
tsh> quit
shc@shc-virtual-machine:~/Shared_ln/csapp/shlab$ ./sdriver.pl -t trace04.txt -s ./tsh -a "-p"
#
# trace04.txt - Run a background job.
#
tsh> ./myspin 1 &
[1] (66506) ./myspin 1 &
shc@shc-virtual-machine:~/Shared_ln/csapp/shlab$ ./sdriver.pl -t trace05.txt -s ./tsh -a "-p"
#
# trace05.txt - Process jobs builtin command.
#
tsh> ./myspin 2 &
[1] (66540) ./myspin 2 &
tsh> ./myspin 3 &
[2] (66542) ./myspin 3 &
tsh> jobs
[1] (66540) Running ./myspin 2 &
[2] (66542) Running ./myspin 3 &
shc@shc-virtual-machine:~/Shared_ln/csapp/shlab$ ./sdriver.pl -t trace15.txt -s ./tsh -a "-p"
#
# trace15.txt - Putting it all together
#
tsh> ./bogus
./bogus: Command not found
tsh> ./myspin 10
Job [0] (0) killed by signal 2
tsh> ./myspin 3 &
[1] (66596) ./myspin 3 &
tsh> ./myspin 4 &
[2] (66598) ./myspin 4 &
tsh> jobs
[1] (66596) Running ./myspin 3 &
[2] (66598) Running ./myspin 4 &
tsh> fg %1
Job [1] (66596) stopped by signal 20
tsh> jobs
[1] (66596) Stopped ./myspin 3 &
[2] (66598) Running ./myspin 4 &
tsh> bg %3
%3: No such job
tsh> bg %1
[1] (66596) ./myspin 3 &
tsh> jobs
[1] (66596) Running ./myspin 3 &
[2] (66598) Running ./myspin 4 &
tsh> fg %1
tsh> quit