Tuesday, June 7, 2022

------------

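I no longer have a Windows environment, so I could only read a few articles online.

Even in 2019 people were still working on this. More than ten years ago the attack went through the FireWire interface; now it goes through PCI-E. I haven't read it carefully yet, but it seems to use a PCI-E FPGA development board that searches memory once it is plugged in.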


The good news is that the Win10 authentication system is basically unchanged from XP; msv1_0.dll is still there.

https://docs.microsoft.com/en-us/windows/win32/secauthn/msv1-0-authentication-package


The core function is still MsvpPasswordValidate. I don't have an environment here, so see the figures in the article:

https://www.synacktiv.com/en/publications/practical-dma-attack-on-windows-10.html


The final comparison of the rc4 hash is done with RtlCompareMemory.

I asked a friend to disassemble it (u) for me:

0:003> uf RtlCompareMemory
ntdll!RtlCompareMemory:
76ff6970 56 push esi
76ff6971 57 push edi
76ff6972 fc cld
76ff6973 8b74240c mov esi,dword ptr [esp+0Ch]
76ff6977 8b7c2410 mov edi,dword ptr [esp+10h]
76ff697b 8b4c2414 mov ecx,dword ptr [esp+14h]
76ff697f c1e902 shr ecx,2
76ff6982 7404 je ntdll!RtlCompareMemory+0x18 (76ff6988)

ntdll!RtlCompareMemory+0x14:
76ff6984 f3a7 repe cmps dword ptr [esi],dword ptr es:[edi]
76ff6986 7516 jne ntdll!RtlCompareMemory+0x2e (76ff699e)

ntdll!RtlCompareMemory+0x18:
76ff6988 8b4c2414 mov ecx,dword ptr [esp+14h]
76ff698c 83e103 and ecx,3
76ff698f 7404 je ntdll!RtlCompareMemory+0x25 (76ff6995)

ntdll!RtlCompareMemory+0x21:
76ff6991 f3a6 repe cmps byte ptr [esi],byte ptr es:[edi]
76ff6993 7516 jne ntdll!RtlCompareMemory+0x3b (76ff69ab)

ntdll!RtlCompareMemory+0x25:
76ff6995 8b442414 mov eax,dword ptr [esp+14h]
76ff6999 5f pop edi
76ff699a 5e pop esi
76ff699b c20c00 ret 0Ch

ntdll!RtlCompareMemory+0x2e:
76ff699e 83ee04 sub esi,4
76ff69a1 83ef04 sub edi,4
76ff69a4 b904000000 mov ecx,4
76ff69a9 f3a6 repe cmps byte ptr [esi],byte ptr es:[edi]

ntdll!RtlCompareMemory+0x3b:
76ff69ab 4e dec esi
76ff69ac 2b74240c sub esi,dword ptr [esp+0Ch]
76ff69b0 8bc6 mov eax,esi
76ff69b2 5f pop edi
76ff69b3 5e pop esi


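If the length being compared is a multiple of 4 bytes, execution takes the part of the code marked in red.

The rc4 hash is 16 bytes, and this is a 32-bit system.

So the instruction to deal with is repe cmps byte ptr [esi],byte ptr es:[edi]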
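
As a reading aid, here is a minimal Python sketch of the control flow in the 32-bit listing above. The function name is mine and the code is only a model of the disassembly, not anything taken from ntdll. In this model a 16-byte compare first runs the dword loop (+0x14); the byte-wise repe cmps is reached only to pinpoint a mismatch inside a dword (+0x2e) or to handle a tail of 1 to 3 bytes (+0x21).

```python
def rtl_compare_memory_x86(src1: bytes, src2: bytes, length: int) -> int:
    """Return the number of leading bytes that match (model of the 32-bit listing)."""
    pos = 0
    # +0x14: repe cmps dword, executed length >> 2 times at most
    for _ in range(length >> 2):
        if src1[pos:pos + 4] != src2[pos:pos + 4]:
            # +0x2e: step back to the start of the differing dword and
            # rescan it byte by byte to find the exact offset
            for i in range(4):
                if src1[pos + i] != src2[pos + i]:
                    return pos + i
        pos += 4
    # +0x21: repe cmps byte for the remaining length & 3 bytes
    for _ in range(length & 3):
        if src1[pos] != src2[pos]:
            return pos
        pos += 1
    # +0x25: everything matched, return the full length
    return length
```

A caller that checks a hash would compare the return value with the length: equal means the two buffers match.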

For example, the rc4 hash of the password 123 is

97e6bd3d a79016d7 eb4b2069 78362812 
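
The four groups above look like a dword dump, such as a debugger's dd command would print on a little-endian machine. That is my assumption; the note does not say how the value was obtained. Under that assumption, a small Python sketch (the helper name is hypothetical) recovers the byte order in which the 16 bytes actually sit in memory, which is also the order hash tools usually print:

```python
import struct

def dwords_to_bytes(dump: str) -> bytes:
    """Convert space-separated 32-bit hex words (little-endian view) to memory byte order."""
    return b"".join(struct.pack("<I", int(word, 16)) for word in dump.split())

dump = "97e6bd3d a79016d7 eb4b2069 78362812"
print(dwords_to_bytes(dump).hex())  # 3dbde697d71690a769204beb12283678
```

This matters when searching memory for the hash: the search pattern has to be the byte sequence, not the dword text.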


repe should be a prefix, so first I'll look at how QEMU TCG handles it.


The 64-bit RtlCompareMemory:

0:000> uf RtlCompareMemory
ntdll!RtlCompareMemory:
00007ff9`1cbb06f0 57              push    rdi
00007ff9`1cbb06f1 56              push    rsi
00007ff9`1cbb06f2 488bf1          mov     rsi,rcx
00007ff9`1cbb06f5 488bfa          mov     rdi,rdx
00007ff9`1cbb06f8 33d1            xor     edx,ecx
00007ff9`1cbb06fa 83e207          and     edx,7
00007ff9`1cbb06fd 7553            jne     ntdll!RtlCompareMemory+0x62 (00007ff9`1cbb0752)

ntdll!RtlCompareMemory+0xf:
00007ff9`1cbb06ff 4983f808        cmp     r8,8
00007ff9`1cbb0703 724d            jb      ntdll!RtlCompareMemory+0x62 (00007ff9`1cbb0752)

ntdll!RtlCompareMemory+0x15:
00007ff9`1cbb0705 4c8bcf          mov     r9,rdi
00007ff9`1cbb0708 f7d9            neg     ecx
00007ff9`1cbb070a 83e107          and     ecx,7
00007ff9`1cbb070d 7407            je      ntdll!RtlCompareMemory+0x26 (00007ff9`1cbb0716)

ntdll!RtlCompareMemory+0x1f:
00007ff9`1cbb070f 4c2bc1          sub     r8,rcx
00007ff9`1cbb0712 f3a6            repe cmps byte ptr [rsi],byte ptr [rdi]
00007ff9`1cbb0714 7530            jne     ntdll!RtlCompareMemory+0x56 (00007ff9`1cbb0746)

ntdll!RtlCompareMemory+0x26:
00007ff9`1cbb0716 498bc8          mov     rcx,r8
00007ff9`1cbb0719 4883e1f8        and     rcx,0FFFFFFFFFFFFFFF8h
00007ff9`1cbb071d 741b            je      ntdll!RtlCompareMemory+0x4a (00007ff9`1cbb073a)

ntdll!RtlCompareMemory+0x2f:
00007ff9`1cbb071f 4c2bc1          sub     r8,rcx
00007ff9`1cbb0722 48c1e903        shr     rcx,3
00007ff9`1cbb0726 f348a7          repe cmps qword ptr [rsi],qword ptr [rdi]
00007ff9`1cbb0729 740f            je      ntdll!RtlCompareMemory+0x4a (00007ff9`1cbb073a)

ntdll!RtlCompareMemory+0x3b:
00007ff9`1cbb072b 48ffc1          inc     rcx
00007ff9`1cbb072e 4883ee08        sub     rsi,8
00007ff9`1cbb0732 4883ef08        sub     rdi,8
00007ff9`1cbb0736 48c1e103        shl     rcx,3

ntdll!RtlCompareMemory+0x4a:
00007ff9`1cbb073a 4c03c1          add     r8,rcx
00007ff9`1cbb073d 740a            je      ntdll!RtlCompareMemory+0x59 (00007ff9`1cbb0749)

ntdll!RtlCompareMemory+0x4f:
00007ff9`1cbb073f 498bc8          mov     rcx,r8
00007ff9`1cbb0742 f3a6            repe cmps byte ptr [rsi],byte ptr [rdi]
00007ff9`1cbb0744 7403            je      ntdll!RtlCompareMemory+0x59 (00007ff9`1cbb0749)

ntdll!RtlCompareMemory+0x56:
00007ff9`1cbb0746 48ffcf          dec     rdi

ntdll!RtlCompareMemory+0x59:
00007ff9`1cbb0749 492bf9          sub     rdi,r9
00007ff9`1cbb074c 488bc7          mov     rax,rdi
00007ff9`1cbb074f 5e              pop     rsi
00007ff9`1cbb0750 5f              pop     rdi
00007ff9`1cbb0751 c3              ret

ntdll!RtlCompareMemory+0x62:
00007ff9`1cbb0752 4d85c0          test    r8,r8
00007ff9`1cbb0755 740d            je      ntdll!RtlCompareMemory+0x74 (00007ff9`1cbb0764)

ntdll!RtlCompareMemory+0x67:
00007ff9`1cbb0757 498bc8          mov     rcx,r8
00007ff9`1cbb075a f3a6            repe cmps byte ptr [rsi],byte ptr [rdi]
00007ff9`1cbb075c 7406            je      ntdll!RtlCompareMemory+0x74 (00007ff9`1cbb0764)

ntdll!RtlCompareMemory+0x6e:
00007ff9`1cbb075e 48ffc1          inc     rcx
00007ff9`1cbb0761 4c2bc1          sub     r8,rcx

ntdll!RtlCompareMemory+0x74:
00007ff9`1cbb0764 498bc0          mov     rax,r8
00007ff9`1cbb0767 5e              pop     rsi
00007ff9`1cbb0768 5f              pop     rdi
00007ff9`1cbb0769 c3              ret

Sunday, June 5, 2022

Installing Debian on the Loongson 8089d

 https://mirrors.cloud.tencent.com/loongson/install/

 

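This is about the only place left to find a Debian ISO that works on the Loongson 8089d.

I tried several, and loongson2_debian6_20111010.tar.lzma works best.

Put it directly in the root directory of a USB stick, then press Tab at boot and run the recovery install. Download vmlinux as well and put it in the root directory too.

Networking can be configured and Wi-Fi works. The desktop is GNOME.

GNOME was too slow, so I switched to i3, which makes the machine just about usable.


The package sources are the ones described at the link below. They still work, but I never managed to solve the key problem.

Only apt-get update fails; installing packages still works.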

https://www.jianshu.com/p/5cdb7fb4b6a8

 

 

 



Monday, April 4, 2022

QEMU tcg configure

 current

 

# Configured with: '../configure' '--target-list=i386-softmmu' '--enable-tcg-interpreter' '--enable-debug-tcg' '--enable-debug-info'

 

then make

 

 

try 

# Configured with: '../configure' '--target-list=x86_64-softmmu' '--enable-tcg-interpreter' '--enable-debug-tcg' '--enable-debug-info'

works but slow


../configure --target-list=x86_64-softmmu

This option alone is enough; the resulting build runs on TCG.

Tuesday, March 15, 2022

How integer instructions pass through issue, ex1, and ex2

I want to trace how the "add" instruction moves through the ex1 and ex2 stages, because a few things look odd to me. For one, there are ALUs in both ex1 and ex2, yet a simple arithmetic operation such as add takes only one cycle to finish.

I traced ex1_port0_a, ex1_port0_b, ex1_port0_op, ex1_port0_double, ex1_port0_c, ex1_port0_ignore_a, ex1_port0_ignore_b, ex1_port0_b_get_a. They come from the issue stage and go to ex1. The ALU computes the result ex1_alu0_res from them, but that result goes back to the issue stage; I guess this is for forwarding. The ex1 stage then passes the same signals on to the ex2 stage as ex2_port0_a, ex2_port0_b, etc.

I guess the result is computed again in ex2. That confuses me, because an ALU is not a tiny functional unit.

Let's look at the forwarding-related code in ex1_stage.

////forwarding related
//forwarding check
assign r1_1_w1_fw = ex2_port0_valid && (ex1_raddr0_0 == ex2_port0_rf_target) && (ex2_port0_rf_target != 5'd0);
assign r1_2_w1_fw = ex2_port0_valid && (ex1_raddr0_1 == ex2_port0_rf_target) && (ex2_port0_rf_target != 5'd0);
assign r1_1_w2_fw = ex2_port1_valid && (ex1_raddr0_0 == ex2_port1_rf_target) && (ex2_port1_rf_target != 5'd0);
assign r1_2_w2_fw = ex2_port1_valid && (ex1_raddr0_1 == ex2_port1_rf_target) && (ex2_port1_rf_target != 5'd0);


First of all, the naming is inconsistent.
r1_1_w1_fw means that the register-file read on raddr0_0 conflicts with the register-file write of the instruction one stage ahead (in ex2). To match the raddr numbering, this signal should be named r0_0_w1_fw.
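
To make the condition concrete, here is a minimal Python sketch of what each of these assigns computes. The function names and the operand-select helper are hypothetical; only the three-term condition comes from the code above, and how the result is actually muxed in the RTL is not shown in this note.

```python
def needs_forward(ex2_valid: bool, ex1_raddr: int, ex2_rf_target: int) -> bool:
    """Same three terms as r1_1_w1_fw: valid writer, matching register, not r0."""
    return bool(ex2_valid and ex1_raddr == ex2_rf_target and ex2_rf_target != 0)

def select_operand(rf_value: int, ex2_res: int, ex2_valid: bool,
                   ex1_raddr: int, ex2_rf_target: int) -> int:
    """Hypothetical mux: take the in-flight ex2 result instead of the stale regfile value."""
    if needs_forward(ex2_valid, ex1_raddr, ex2_rf_target):
        return ex2_res
    return rf_value
```

The check against register 0 is there because r0 always reads as zero, so a write targeting it must never be forwarded.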

I'll follow ex2_port0_a later.


In the port list of the issue module instance:

    //forwarding related
    .ex1_raddr0_0       (ex1_raddr0_0       ),
    .ex1_raddr0_1       (ex1_raddr0_1       ),
    .ex1_raddr1_0       (ex1_raddr1_0       ),
    .ex1_raddr1_1       (ex1_raddr1_1       ),
    .ex1_raddr2_0       (ex1_raddr2_0       ),
    .ex1_raddr2_1       (ex1_raddr2_1       ),

    .ex1_alu0_res       (ex1_alu0_res       ),
    .ex1_alu1_res       (ex1_alu1_res       ),
    .ex1_bru_res        (bru_link_pc        ),
    .ex1_none0_res      (ex1_none0_result   ),
    .ex1_none1_res      (ex1_none1_result   ),

    .ex2_port0_src       (ex2_port0_src      ),
    .ex2_port0_valid     (ex2_port0_valid    ),
    .ex2_port0_rf_target (ex2_port0_rf_target),
    .ex2_port1_src       (ex2_port1_src      ),
    .ex2_port1_valid     (ex2_port1_valid    ),
    .ex2_port1_rf_target (ex2_port1_rf_target),

    .ex2_alu0_res       (ex2_alu0_res       ),
    .ex2_alu1_res       (ex2_alu1_res       ),
    .ex2_lsu_res        (ex2_lsu_res        ),
    .ex2_bru_res        (ex2_bru_link_pc    ),
    .ex2_none0_res      (ex2_none0_result   ),
    .ex2_none1_res      (ex2_none1_result   ),
    .ex2_mul_res        (ex2_mul_res        ),
    .ex2_div_res        (ex2_div_res        )
);

 

You can see ex1_alu0_res and ex2_alu0_res: the ALU results from both ex stages are sent back to issue for forwarding. Something here does not add up.

 



Wednesday, February 23, 2022

What lsoc1000_stage_is does

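Issue, naturally, means dispatching instructions to the functional units downstream for execution. gs232 has three ports in total: port0, port1, and port2.

port0 and port1 can execute any instruction. port2 seems to be restricted and apparently only takes bru instructions, but I haven't confirmed this yet.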


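Here instructions fall into several classes: lsu, bru, mul, div, and none. none mainly covers CSR- and TLB-related instructions.

A few open questions to come back to.

The ex1_stage that follows contains the alu0, alu1, lsu_s1, and branch modules. The ex2_stage after that contains alu0, alu1, lsu_s2, and bru_s2. The question is why the ALUs appear twice. mul and div have already been split out into their own modules, so the ALU should only be left with add/subtract, bitwise operations, and shifts, all of which finish in a single cycle.

It makes sense for the lsu to have two stages. But why is branch split into branch and bru_s2?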



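With multiple issue, the natural question is whether instructions issued together conflict. For example, if port0 currently holds an lsu instruction, port1 cannot issue another lsu instruction. Likewise, mul and div cannot be issued together, because the functional units downstream are limited.

In chiplab, however, this check is implemented not in issue but in decode (lsoc1000_stage_de2.v), where it is called a crash, for example the data crash: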

wire data_crash_01 =   (((raddr1_0 == waddr0) && raddr1_0_valid) || ((raddr1_1 == waddr0) && raddr1_1_valid) || ((stx_read == waddr0) && triple_read_1) )
                    && de_port0_valid && rf_wen0 && (waddr0 != 5'd0);

wire data_crash_02 =   (((raddr2_0 == waddr0) && raddr2_0_valid) || ((raddr2_1 == waddr0) && raddr2_1_valid))
                    && de_port0_valid && rf_wen0 && (waddr0 != 5'd0);

wire data_crash_12 =   (((raddr2_0 == waddr1) && raddr2_0_valid) || ((raddr2_1 == waddr1) && raddr2_1_valid))
                    && de_port1_valid && rf_wen1 && (waddr1 != 5'd0);


Once there is a crash, port1's valid is not set and the stage falls back to single issue:

wire crash          = unit_crash_01 || data_crash_01 || lsu_protect_01 || single_issue || csr_read_crash;


always @(posedge clk) begin // internal valid
    if (rst) valid1 <= 1'd0;
    else if (exception || eret || bru_cancel || wb_cancel) valid1 <= 1'b0;
    else if (de_allow_in) valid1 <= (de_port1_valid&& !crash) || (de_port0_valid && port0_op[`LSOC1K_RDTIME]);
end
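
A minimal Python sketch of the two pieces above, to check my reading of them. The function names and keyword parameters are mine. The model ignores rst, the cancel conditions, and de_allow_in, so it only covers the normal case in which decode accepts a new pair of instructions.

```python
def data_crash_01(raddr1_0, raddr1_0_valid, raddr1_1, raddr1_1_valid,
                  stx_read, triple_read_1, waddr0, de_port0_valid, rf_wen0):
    """True when the port1 instruction reads the register that port0 writes."""
    reads_waddr0 = ((raddr1_0 == waddr0 and raddr1_0_valid)
                    or (raddr1_1 == waddr0 and raddr1_1_valid)
                    or (stx_read == waddr0 and triple_read_1))
    return bool(reads_waddr0 and de_port0_valid and rf_wen0 and waddr0 != 0)

def next_valid1(de_port1_valid, crash, de_port0_valid=False, port0_is_rdtime=False):
    """Value loaded into valid1 when decode accepts new instructions."""
    return bool((de_port1_valid and not crash)
                or (de_port0_valid and port0_is_rdtime))

# port0 writes r5 and port1 reads r5 as its first source: single issue
crash = data_crash_01(raddr1_0=5, raddr1_0_valid=True, raddr1_1=0, raddr1_1_valid=False,
                      stx_read=0, triple_read_1=False,
                      waddr0=5, de_port0_valid=True, rf_wen0=True)
print(crash, next_valid1(de_port1_valid=True, crash=crash))  # True False
```

The second term of valid1 shows that port1 is also used when port0 holds an RDTIME instruction, regardless of crash.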


Back to the issue module. Its main jobs are:


1. Read the registers to prepare the operands for issue.

2. Handle data forwarding.

3. type_crash, which I haven't figured out yet.

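Monday, January 24, 2022

Instruction-fetch signals in chiplab

Chiplab ifu

In chiplab the CPU exposes a group of instruction-fetch signals to the outer layer, which is responsible for the cache and for the AXI communication with DRAM.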

In the ifu:

// group inst
output wire [31 :0] inst_addr ,
input wire inst_addr_ok ,
output wire inst_cancel ,
input wire [1 :0] inst_count ,
input wire inst_ex ,
input wire [5 :0] inst_exccode ,
input wire [127:0] inst_rdata ,
output wire inst_req ,
input wire inst_uncache ,
input wire inst_valid ,

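Here inst_addr is the fetch address and inst_req issues the fetch request. For example, the value of pcbf is assigned to inst_addr and inst_req is set to 1. Once the instruction has been fetched successfully, the inst_valid input becomes 1. With no cache in the current chiplab setup, inst_addr_ok and inst_valid return 1 at the same time. Because there is no cache, inst_uncache also returns 1 at that point. The data comes back on inst_rdata, which can carry up to 128 bits (16 bytes), but for now each fetch returns only 4 bytes and the upper bits are empty. I don't yet know which signal controls this.

I haven't looked at what inst_ex and inst_exccode are for yet.

inst_cancel is interesting. After it is set to 1, the fetch in flight does not stop: inst_rdata and inst_addr_ok still arrive as usual, but inst_valid no longer goes to 1. If a new inst_addr is supplied with inst_req after inst_cancel, the next inst_valid delivers the newly requested instruction.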
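
The behavior described above can be captured in a toy Python model. This is only a sketch of my observations, not of the real interface: the class and method names are hypothetical, the latency is a fixed number of cycles, and the model keeps a single outstanding fetch, so what happens when a new request overlaps a cancelled one is not modeled.

```python
class FetchPort:
    """Toy model of the fetch handshake: one outstanding request, fixed latency."""

    def __init__(self, memory, latency=6):
        self.memory = memory      # dict: address -> instruction word
        self.latency = latency    # cycles from request to response
        self.pending = None       # [addr, cycles_left, cancelled]

    def request(self, addr):
        """Drive inst_addr and raise inst_req: start a fetch."""
        self.pending = [addr, self.latency, False]

    def cancel(self):
        """Raise inst_cancel: the fetch keeps running, only inst_valid is suppressed."""
        if self.pending is not None:
            self.pending[2] = True

    def tick(self):
        """Advance one cycle and return (inst_valid, inst_rdata)."""
        if self.pending is None:
            return (False, None)
        self.pending[1] -= 1
        if self.pending[1] > 0:
            return (False, None)
        addr, _, cancelled = self.pending
        self.pending = None
        # The data still comes back, but a cancelled fetch never reports valid.
        return (not cancelled, self.memory[addr])
```

In use, a cancelled fetch drains with inst_valid held at 0, and a fetch requested afterwards returns its data with inst_valid at 1.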


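I currently use inst_valid and other signals to control the transfer from pc_bf to pc_f. When a branch instruction is encountered and no delay slot is used, the pipeline registers have to be flushed. br_cancel comes from brucancel_ex2 in ex2. That is already deep in the pipeline, past de, is, and ex1. If the IPC were 1, the if stage would already have passed three more instructions, and de, is, and ex1 would all need to be flushed. But since chiplab currently has no cache and fetches directly over AXI, each fetch takes 6 cycles.


 

Only from inst_valid onward are de_port0_valid, de_port0_pc, and de_port0_inst valid and ready to enter the de pipeline register. On the next clock the instruction enters the de pipeline and is decoded, and only then do is_port0_valid, is_port0_pc, is_port0_inst, and is_port0_op become valid. By this point pc_bf and pc_f have each been incremented by 4 because of that inst_valid. On the clock after that, br_cancel arrives (I still find it odd that br_cancel seems to come a bit early; shouldn't the instruction have only just reached ex1?). At this moment pcbf has already been incremented by 4, so the next sequential instruction is being fetched. But because of the brcancel feedback, pcbf has to switch to br_target immediately. As a result, when the next inst_valid arrives, pc_f and inst no longer match.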


 

 

 

 

Thursday, January 6, 2022

How does OpenSPARC T1 flush the pipeline?

I came across this passage in ifu/rtl/sparc_ifu_fcl.v:

//-------------------------
// Rollback
//-------------------------

   // 04/05/02
   // Looks like we made a mistake with rollback.  Should never
   // rollback to S.  In the event of a dmiss or mul contention, just
   // kill all the instructions and rollback to F.  This adds one
   // cycle to the dmiss penalty and to the mul latency if we have to
   // wait, both not a very high price to pay.  This would have saved
   // lots of hours of design and verif time.
   //    
   assign rb2_inst_d = thr_match_dw & inst_vld_d & dtu_fcl_rollback_g;
   assign rb1_inst_s = thr_match_fw & inst_vld_s & dtu_fcl_rollback_g;
   assign rb0_inst_bf = thr_match_nw & switch_bf & dtu_fcl_rollback_g;


   assign retract_iferr_d1 = erb_dtu_ifeterr_d1 & inst_vld_d1;

   assign retract_inst_d = retract_iferr_d1 & thr_match_de &
                           fcl_dtu_inst_vld_d |
                           mark4rb_d |
                           dtu_fcl_retract_d;

   assign rt1_inst_s = thr_match_fd & inst_vld_s & dtu_fcl_retract_d |
                       mark4rb_s;


So it seems there is both rollback (rb) and kill.



   // determine rollback amount
   assign rb_frome = {4{(rb2_inst_e | rt2_inst_e) &
                        (inst_vld_e | intr_vld_e)}} & thr_e;
   assign rb_fromd = {4{(rb1_inst_d | rt1_inst_d) &
                        (inst_vld_d | intr_vld_d)}} & thr_d;
   assign rb_froms = {4{rb_stg_s & inst_vld_s_crit}} & thr_f;
   assign rb_w2 = rb_frome | rb_fromd;
   assign rb_for_iferr_e = {4{retract_iferr_e}} & thr_e;



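I feel there should be a part that controls the pipeline registers, but I haven't found it yet.

Previously I flushed the pipeline by resetting the pipeline registers, clearing them to 0. That is equivalent to inserting a pipeline bubble, since NOP is 00000000.

 

It turns out OpenSPARC T1 takes a different approach. What it really means for an instruction to have executed is that it modified a register, modified memory, or affected some other system state. Take an add instruction, add x2 x0 x3, which copies the value of register x2 into x3. This instruction can be cleared to 0 in the inst pipeline register, or it can simply not write the regfile at the end; either way it is as if the instruction never executed (it did execute, but had no effect).

OpenSPARC appears to work the second way. Take the ifu_exu_kill_e signal, whose comment says it clearly

(generated by the ifu and sent to the exu; kills the instruction currently in the E stage of the pipeline)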

input        ifu_exu_kill_e;         // kill instruction in e-stage


It enters sparc_exu() -> sparc_exu_ecl()

From there ifu_exu_kill_e goes into each of

sparc_exu_eclccr()
sparc_exu_ecl_wb()
sparc_exu_eclbyplog_rs1()
sparc_exu_eclbyplog byplog_rs2()
sparc_exu_eclbyplog byplog_rs3()
sparc_exu_eclbyplog byplog_rs3h()


All of the register bypassing logic needs this signal; presumably almost every place that updates a register has to use it.

Look at the main one, sparc_exu_ecl_wb(), the writeback control logic:

//  Module Name: sparc_exu_ecl_wb
//      Description:  Implements the writeback logic for the exu.
//              This includes the control signals for the w1 and w2 input
//      muxes as well as keeping track of the wen signal for ALU ops.

"keeping track of the wen signal for ALU ops" is presumably referring to this ifu_exu_kill_e.


   assign wen_w_inst_vld = valid_w | inst_vld_noflush_wen_w;
   assign ecl_irf_wen_w = ifu_exu_inst_vld_w & wen_w_inst_vld | wen_no_inst_vld_w;

   // bypass valid logic and flops
   dff_s dff_wb_d2e(.din(ifu_exu_wen_d), .clk(clk), .q(wb_e), .se(se),
                  .si(), .so());
   dff_s dff_wb_e2m(.din(valid_e), .clk(clk), .q(wb_m), .se(se),
                  .si(), .so());
   dffr_s dff_wb_m2w(.din(valid_m), .clk(clk), .q(wb_w), .se(se),
                  .si(), .so(), .rst(reset));
   assign  valid_e = wb_e & ~ifu_exu_kill_e & ~restore_e & ~wrsr_e;// restore doesn't finish on time
   assign  bypass_m = wb_m;// bypass doesn't need to check for traps or sehold
   assign  valid_m = bypass_m & ~rml_ecl_kill_m & ~sehold;// sehold turns off writes from this path
   assign  valid_w = (wb_w & ~early_flush_w & ~ifu_tlu_flush_w);// check inst_vld later
   // don't check flush for bypass
   assign  bypass_w = wb_w | inst_vld_noflush_wen_w | wen_no_inst_vld_w;

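In the end ifu_exu_kill_e is combined with other signals, passes through a few pipeline stages, and finally affects ecl_irf_wen_w. That signal is an output of sparc_exu_ecl() and goes into bw_r_irf irf(), the integer register file.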
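
To check that reading, here is a minimal Python sketch that follows one instruction's write enable through the assigns above. It is a simplification under stated assumptions: the function name is mine, the three flops are collapsed into plain assignments because only a single instruction is tracked, and inst_vld_noflush_wen_w and wen_no_inst_vld_w are treated as 0.

```python
def irf_write_enable(wen_d, kill_e=False, restore_e=False, wrsr_e=False,
                     kill_m=False, sehold=False,
                     early_flush_w=False, flush_w=False, inst_vld_w=True):
    """Follow ifu_exu_wen_d from D to W and return the modeled ecl_irf_wen_w."""
    wb_e = wen_d                                   # dff_wb_d2e
    valid_e = wb_e and not kill_e and not restore_e and not wrsr_e
    wb_m = valid_e                                 # dff_wb_e2m
    valid_m = wb_m and not kill_m and not sehold   # kill_m stands for rml_ecl_kill_m
    wb_w = valid_m                                 # dff_wb_m2w
    valid_w = wb_w and not early_flush_w and not flush_w
    # ecl_irf_wen_w with the two extra wen sources assumed to be 0
    return bool(inst_vld_w and valid_w)

print(irf_write_enable(True))               # True: the result is written
print(irf_write_enable(True, kill_e=True))  # False: killed in E, never written
```

The instruction still travels down the pipe after the kill; it just arrives at W with its write enable already cleared.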


//  Module Name: bw_r_irf
//      Description: Register file with 3 read ports and 2 write ports.  Has
//                              32 registers per thread with 4 threads.  Reading and writing
//                              the same register concurrently produces x.



module bw_r_irf (/*AUTOARG*/
   // Outputs
   so, irf_byp_rs1_data_d_l, irf_byp_rs2_data_d_l,
   irf_byp_rs3_data_d_l, irf_byp_rs3h_data_d_l,
   // Inputs
   rclk, reset_l, si, se, sehold, rst_tri_en, ifu_exu_tid_s2,
   ifu_exu_rs1_s, ifu_exu_rs2_s, ifu_exu_rs3_s, ifu_exu_ren1_s,
   ifu_exu_ren2_s, ifu_exu_ren3_s, ecl_irf_wen_w, ecl_irf_wen_w2,
   ecl_irf_rd_m, ecl_irf_rd_g, byp_irf_rd_data_w, byp_irf_rd_data_w2,
   ecl_irf_tid_m, ecl_irf_tid_g, rml_irf_old_lo_cwp_e,
   rml_irf_new_lo_cwp_e, rml_irf_old_e_cwp_e, rml_irf_new_e_cwp_e,
   rml_irf_swap_even_e, rml_irf_swap_odd_e, rml_irf_swap_local_e,
   rml_irf_kill_restore_w, rml_irf_cwpswap_tid_e, rml_irf_old_agp,
   rml_irf_new_agp, rml_irf_swap_global, rml_irf_global_tid
   ) ;


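This register file is somewhat complex and has a lot of ports. Still, it is clear that without wen (ecl_irf_wen_w) asserted, the register file is not written.

 

But why do it this way instead of using a pipeline bubble? I haven't figured that out yet.

However, ifu_exu_kill_e does not seem to reach the lsu. For example: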

ld [%L1],%L2 
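ld invalid address (raises an exception)

On an exception or interrupt the pipeline has to be flushed. With the OpenSPARC T1 approach, the preceding instruction would still be sent to the lsu; its result just would not be written to the register at writeback. But that step would presumably already have triggered cache-related operations.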