OpenJDK 17的C1和C2编译器实现中,方法返回前插入安全点(Safepoint Poll)的机制

OpenJDK 17 JIT编译器堆栈分析-CSDN博客

在OpenJDK 17的C1和C2编译器实现中,方法返回前插入安全点(Safepoint Poll)的机制主要涉及以下关键步骤,结合源代码进行分析:

1. 安全点轮询桩(Safepoint Poll Stub)的生成

  • C1编译器

    通过C1SafepointPollStub::emit_code()生成桩代码:

    cpp

    复制代码
    void C1SafepointPollStub::emit_code(LIR_Assembler* ce) {
      __ bind(_entry);
      InternalAddress safepoint_pc(ce->masm()->pc() - ce->masm()->offset() + safepoint_offset());
      // ... 保存安全点PC到线程对象
      address stub = SharedRuntime::polling_page_return_handler_blob()->entry_point();
      __ jump(RuntimeAddress(stub)); // 跳转到VM的安全点处理例程
    }
    • 计算安全点PC地址(safepoint_pc)并保存到线程对象的saved_exception_pc_offset位置。

    • 跳转到VM预生成的polling_page_return_handler_blob入口点处理安全点。

  • C2编译器

    通过C2SafepointPollStubTable::emit_stub_impl()生成桩代码:

    cpp

    复制代码
    void C2SafepointPollStubTable::emit_stub_impl(...) {
      __ bind(entry->_stub_label);
      InternalAddress safepoint_pc(...); // 计算安全点PC
      // ... 保存PC到线程对象
      __ jump(callback_addr); // 跳转到VM的安全点处理例程
    }
    • 逻辑与C1类似,但通过C2SafepointPollStubTable管理多个桩代码。

2. 在方法返回前插入桩调用

  • 代码生成阶段PhaseOutput::fill_buffer()):

    编译器遍历基本块(Block)生成机器码时,在返回指令(Return)前插入安全点检查:

    cpp

    复制代码
    for (uint i = 0; i < nblocks; i++) {
      Block* block = C->cfg()->get_block(i);
      for (uint j = 0; j < last_inst; j++) {
        Node* n = block->get_node(j);
        if (n->is_MachSafePoint()) { // 安全点节点(含返回前检查)
          non_safepoints.observe_safepoint(...); // 记录安全点位置
          Process_OopMap_Node(...);             // 生成OopMap
        }
        // ... 正常发射指令
      }
    }
    safepoint_poll_table()->emit(*cb); // 发射安全点桩代码
    • 当遇到MachSafePoint节点(代表安全点,包括方法返回前的检查点)时:

      • 记录安全点的位置(non_safepoints.observe_safepoint())。

      • 生成OopMap(用于GC时定位寄存器/栈帧中的对象)。

    • 最终通过safepoint_poll_table()->emit()将桩代码写入代码缓冲区。

3. 桩代码与返回指令的关联

  • 在方法返回指令(如ret之前 ,编译器生成一条跳转指令 到桩代码标签(_stub_label):

    cpp

    复制代码
    // 在fill_buffer()中处理分支指令时:
    else if (mach->is_MachBranch()) {
      mach->as_MachBranch()->label_set(&blk_labels[target_block], target_block);
    }
  • 桩代码标签由C2SafepointPollStubTable管理,每个桩对应一个唯一的标签和偏移量。

4. 运行时处理

  • 当线程执行到跳转指令时,会跳转到桩代码:

    • 桩代码保存当前PC(用于安全点恢复)。

    • 跳转到VM的公共处理例程polling_page_return_handler_blob

  • VM例程检查安全点状态:

    • 无需安全点:恢复执行,继续方法返回。

    • 需要安全点:挂起线程,等待所有线程到达安全点后再恢复。

关键设计要点

  1. 共享桩代码

    C2使用C2SafepointPollStubTable管理多个桩,避免重复代码。每个返回点共享相同的处理逻辑,仅PC地址不同。

  2. 延迟绑定

    桩代码地址在最终生成机器码时确定(PhaseOutput::fill_buffer()末尾的safepoint_poll_table()->emit())。

  3. 平台适配

    • #ifdef _LP64区分64/32位实现:64位直接用r15_thread寄存器,32位需临时保存rcx/rdx

    • 使用rscratch1(平台无关的临时寄存器)保证可移植性。

示例流程

假设方法返回前的机器码逻辑:

text

复制代码
0x01: ...          // 方法正常代码
0x0F: jmp <stub>   // 编译器插入的跳转(到桩代码)
0x10: ret           // 返回指令(跳转后执行)

桩代码执行:

asm

复制代码
stub:
  lea rscratch1, [safepoint_pc]  // 计算安全点PC
  mov [r15_thread + saved_exception_pc], rscratch1 // 保存PC
  jmp VM_handler                 // 跳转到VM公共处理例程

这样确保线程在返回前必经过安全点检查,且能准确报告PC位置供VM处理。


总结:C1/C2通过在方法返回指令前插入跳转到共享桩代码的指令,由桩代码触发VM的安全点检查机制。这既减少了代码膨胀,又保证了安全点的精确性。

##源码

cpp 复制代码
void C1SafepointPollStub::emit_code(LIR_Assembler* ce) {
  __ bind(_entry);
  InternalAddress safepoint_pc(ce->masm()->pc() - ce->masm()->offset() + safepoint_offset());
#ifdef _LP64
  __ lea(rscratch1, safepoint_pc);
  __ movptr(Address(r15_thread, JavaThread::saved_exception_pc_offset()), rscratch1);
#else
  const Register tmp1 = rcx;
  const Register tmp2 = rdx;
  __ push(tmp1);
  __ push(tmp2);

  __ lea(tmp1, safepoint_pc);
  __ get_thread(tmp2);
  __ movptr(Address(tmp2, JavaThread::saved_exception_pc_offset()), tmp1);

  __ pop(tmp2);
  __ pop(tmp1);
#endif /* _LP64 */
  assert(SharedRuntime::polling_page_return_handler_blob() != NULL,
         "polling page return stub not created yet");

  address stub = SharedRuntime::polling_page_return_handler_blob()->entry_point();
  __ jump(RuntimeAddress(stub));
}



#define __ masm.
void C2SafepointPollStubTable::emit_stub_impl(MacroAssembler& masm, C2SafepointPollStub* entry) const {
  assert(SharedRuntime::polling_page_return_handler_blob() != NULL,
         "polling page return stub not created yet");
  address stub = SharedRuntime::polling_page_return_handler_blob()->entry_point();

  RuntimeAddress callback_addr(stub);

  __ bind(entry->_stub_label);
  InternalAddress safepoint_pc(masm.pc() - masm.offset() + entry->_safepoint_offset);
#ifdef _LP64
  __ lea(rscratch1, safepoint_pc);
  __ movptr(Address(r15_thread, JavaThread::saved_exception_pc_offset()), rscratch1);
#else
  const Register tmp1 = rcx;
  const Register tmp2 = rdx;
  __ push(tmp1);
  __ push(tmp2);

  __ lea(tmp1, safepoint_pc);
  __ get_thread(tmp2);
  __ movptr(Address(tmp2, JavaThread::saved_exception_pc_offset()), tmp1);

  __ pop(tmp2);
  __ pop(tmp1);
#endif
  __ jump(callback_addr);
}
#undef __


 void emit_stub_impl(MacroAssembler& masm, C2SafepointPollStub* entry) const;

  // The selection logic below relieves the need to add dummy files to unsupported platforms.
  template <bool enabled>
  typename EnableIf<enabled>::type
  select_emit_stub(MacroAssembler& masm, C2SafepointPollStub* entry) const {
    emit_stub_impl(masm, entry);
  }
  
void emit_stub(MacroAssembler& masm, C2SafepointPollStub* entry) const {
    select_emit_stub<VM_Version::supports_stack_watermark_barrier()>(masm, entry);
  }
  
  
void C2SafepointPollStubTable::emit(CodeBuffer& cb) {
  MacroAssembler masm(&cb);
  for (int i = _safepoints.length() - 1; i >= 0; i--) {
    // Make sure there is enough space in the code buffer
    if (cb.insts()->maybe_expand_to_ensure_remaining(PhaseOutput::MAX_inst_size) && cb.blob() == NULL) {
      ciEnv::current()->record_failure("CodeCache is full");
      return;
    }

    C2SafepointPollStub* entry = _safepoints.at(i);
    emit_stub(masm, entry);
  }
}


//------------------------------fill_buffer------------------------------------
void PhaseOutput::fill_buffer(CodeBuffer* cb, uint* blk_starts) {
  // blk_starts[] contains offsets calculated during short branches processing,
  // offsets should not be increased during following steps.

  // Compute the size of first NumberOfLoopInstrToAlign instructions at head
  // of a loop. It is used to determine the padding for loop alignment.
  Compile::TracePhase tp("fill buffer", &timers[_t_fillBuffer]);

  compute_loop_first_inst_sizes();

  // Create oopmap set.
  _oop_map_set = new OopMapSet();

  // !!!!! This preserves old handling of oopmaps for now
  C->debug_info()->set_oopmaps(_oop_map_set);

  uint nblocks  = C->cfg()->number_of_blocks();
  // Count and start of implicit null check instructions
  uint inct_cnt = 0;
  uint* inct_starts = NEW_RESOURCE_ARRAY(uint, nblocks+1);

  // Count and start of calls
  uint* call_returns = NEW_RESOURCE_ARRAY(uint, nblocks+1);

  uint  return_offset = 0;
  int nop_size = (new MachNopNode())->size(C->regalloc());

  int previous_offset = 0;
  int current_offset  = 0;
  int last_call_offset = -1;
  int last_avoid_back_to_back_offset = -1;
#ifdef ASSERT
  uint* jmp_target = NEW_RESOURCE_ARRAY(uint,nblocks);
  uint* jmp_offset = NEW_RESOURCE_ARRAY(uint,nblocks);
  uint* jmp_size   = NEW_RESOURCE_ARRAY(uint,nblocks);
  uint* jmp_rule   = NEW_RESOURCE_ARRAY(uint,nblocks);
#endif

  // Create an array of unused labels, one for each basic block, if printing is enabled
#if defined(SUPPORT_OPTO_ASSEMBLY)
  int* node_offsets      = NULL;
  uint node_offset_limit = C->unique();

  if (C->print_assembly()) {
    node_offsets = NEW_RESOURCE_ARRAY(int, node_offset_limit);
  }
  if (node_offsets != NULL) {
    // We need to initialize. Unused array elements may contain garbage and mess up PrintOptoAssembly.
    memset(node_offsets, 0, node_offset_limit*sizeof(int));
  }
#endif

  NonSafepointEmitter non_safepoints(C);  // emit non-safepoints lazily

  // Emit the constant table.
  if (C->has_mach_constant_base_node()) {
    if (!constant_table().emit(*cb)) {
      C->record_failure("consts section overflow");
      return;
    }
  }

  // Create an array of labels, one for each basic block
  Label* blk_labels = NEW_RESOURCE_ARRAY(Label, nblocks+1);
  for (uint i = 0; i <= nblocks; i++) {
    blk_labels[i].init();
  }

  // Now fill in the code buffer
  Node* delay_slot = NULL;
  for (uint i = 0; i < nblocks; i++) {
    Block* block = C->cfg()->get_block(i);
    _block = block;
    Node* head = block->head();

    // If this block needs to start aligned (i.e, can be reached other
    // than by falling-thru from the previous block), then force the
    // start of a new bundle.
    if (Pipeline::requires_bundling() && starts_bundle(head)) {
      cb->flush_bundle(true);
    }

#ifdef ASSERT
    if (!block->is_connector()) {
      stringStream st;
      block->dump_head(C->cfg(), &st);
      MacroAssembler(cb).block_comment(st.as_string());
    }
    jmp_target[i] = 0;
    jmp_offset[i] = 0;
    jmp_size[i]   = 0;
    jmp_rule[i]   = 0;
#endif
    int blk_offset = current_offset;

    // Define the label at the beginning of the basic block
    MacroAssembler(cb).bind(blk_labels[block->_pre_order]);

    uint last_inst = block->number_of_nodes();

    // Emit block normally, except for last instruction.
    // Emit means "dump code bits into code buffer".
    for (uint j = 0; j<last_inst; j++) {
      _index = j;

      // Get the node
      Node* n = block->get_node(j);

      // See if delay slots are supported
      if (valid_bundle_info(n) && node_bundling(n)->used_in_unconditional_delay()) {
        assert(delay_slot == NULL, "no use of delay slot node");
        assert(n->size(C->regalloc()) == Pipeline::instr_unit_size(), "delay slot instruction wrong size");

        delay_slot = n;
        continue;
      }

      // If this starts a new instruction group, then flush the current one
      // (but allow split bundles)
      if (Pipeline::requires_bundling() && starts_bundle(n))
        cb->flush_bundle(false);

      // Special handling for SafePoint/Call Nodes
      bool is_mcall = false;
      if (n->is_Mach()) {
        MachNode *mach = n->as_Mach();
        is_mcall = n->is_MachCall();
        bool is_sfn = n->is_MachSafePoint();

        // If this requires all previous instructions be flushed, then do so
        if (is_sfn || is_mcall || mach->alignment_required() != 1) {
          cb->flush_bundle(true);
          current_offset = cb->insts_size();
        }

        // A padding may be needed again since a previous instruction
        // could be moved to delay slot.

        // align the instruction if necessary
        int padding = mach->compute_padding(current_offset);
        // Make sure safepoint node for polling is distinct from a call's
        // return by adding a nop if needed.
        if (is_sfn && !is_mcall && padding == 0 && current_offset == last_call_offset) {
          padding = nop_size;
        }
        if (padding == 0 && mach->avoid_back_to_back(MachNode::AVOID_BEFORE) &&
            current_offset == last_avoid_back_to_back_offset) {
          // Avoid back to back some instructions.
          padding = nop_size;
        }

        if (padding > 0) {
          assert((padding % nop_size) == 0, "padding is not a multiple of NOP size");
          int nops_cnt = padding / nop_size;
          MachNode *nop = new MachNopNode(nops_cnt);
          block->insert_node(nop, j++);
          last_inst++;
          C->cfg()->map_node_to_block(nop, block);
          // Ensure enough space.
          cb->insts()->maybe_expand_to_ensure_remaining(MAX_inst_size);
          if ((cb->blob() == NULL) || (!CompileBroker::should_compile_new_jobs())) {
            C->record_failure("CodeCache is full");
            return;
          }
          nop->emit(*cb, C->regalloc());
          cb->flush_bundle(true);
          current_offset = cb->insts_size();
        }

        bool observe_safepoint = is_sfn;
        // Remember the start of the last call in a basic block
        if (is_mcall) {
          MachCallNode *mcall = mach->as_MachCall();

          // This destination address is NOT PC-relative
          mcall->method_set((intptr_t)mcall->entry_point());

          // Save the return address
          call_returns[block->_pre_order] = current_offset + mcall->ret_addr_offset();

          observe_safepoint = mcall->guaranteed_safepoint();
        }

        // sfn will be valid whenever mcall is valid now because of inheritance
        if (observe_safepoint) {
          // Handle special safepoint nodes for synchronization
          if (!is_mcall) {
            MachSafePointNode *sfn = mach->as_MachSafePoint();
            // !!!!! Stubs only need an oopmap right now, so bail out
            if (sfn->jvms()->method() == NULL) {
              // Write the oopmap directly to the code blob??!!
              continue;
            }
          } // End synchronization

          non_safepoints.observe_safepoint(mach->as_MachSafePoint()->jvms(),
                                           current_offset);
          Process_OopMap_Node(mach, current_offset);
        } // End if safepoint

          // If this is a null check, then add the start of the previous instruction to the list
        else if( mach->is_MachNullCheck() ) {
          inct_starts[inct_cnt++] = previous_offset;
        }

          // If this is a branch, then fill in the label with the target BB's label
        else if (mach->is_MachBranch()) {
          // This requires the TRUE branch target be in succs[0]
          uint block_num = block->non_connector_successor(0)->_pre_order;

          // Try to replace long branch if delay slot is not used,
          // it is mostly for back branches since forward branch's
          // distance is not updated yet.
          bool delay_slot_is_used = valid_bundle_info(n) &&
                                    C->output()->node_bundling(n)->use_unconditional_delay();
          if (!delay_slot_is_used && mach->may_be_short_branch()) {
            assert(delay_slot == NULL, "not expecting delay slot node");
            int br_size = n->size(C->regalloc());
            int offset = blk_starts[block_num] - current_offset;
            if (block_num >= i) {
              // Current and following block's offset are not
              // finalized yet, adjust distance by the difference
              // between calculated and final offsets of current block.
              offset -= (blk_starts[i] - blk_offset);
            }
            // In the following code a nop could be inserted before
            // the branch which will increase the backward distance.
            bool needs_padding = (current_offset == last_avoid_back_to_back_offset);
            if (needs_padding && offset <= 0)
              offset -= nop_size;

            if (C->matcher()->is_short_branch_offset(mach->rule(), br_size, offset)) {
              // We've got a winner.  Replace this branch.
              MachNode* replacement = mach->as_MachBranch()->short_branch_version();

              // Update the jmp_size.
              int new_size = replacement->size(C->regalloc());
              assert((br_size - new_size) >= (int)nop_size, "short_branch size should be smaller");
              // Insert padding between avoid_back_to_back branches.
              if (needs_padding && replacement->avoid_back_to_back(MachNode::AVOID_BEFORE)) {
                MachNode *nop = new MachNopNode();
                block->insert_node(nop, j++);
                C->cfg()->map_node_to_block(nop, block);
                last_inst++;
                nop->emit(*cb, C->regalloc());
                cb->flush_bundle(true);
                current_offset = cb->insts_size();
              }
#ifdef ASSERT
              jmp_target[i] = block_num;
              jmp_offset[i] = current_offset - blk_offset;
              jmp_size[i]   = new_size;
              jmp_rule[i]   = mach->rule();
#endif
              block->map_node(replacement, j);
              mach->subsume_by(replacement, C);
              n    = replacement;
              mach = replacement;
            }
          }
          mach->as_MachBranch()->label_set( &blk_labels[block_num], block_num );
        } else if (mach->ideal_Opcode() == Op_Jump) {
          for (uint h = 0; h < block->_num_succs; h++) {
            Block* succs_block = block->_succs[h];
            for (uint j = 1; j < succs_block->num_preds(); j++) {
              Node* jpn = succs_block->pred(j);
              if (jpn->is_JumpProj() && jpn->in(0) == mach) {
                uint block_num = succs_block->non_connector()->_pre_order;
                Label *blkLabel = &blk_labels[block_num];
                mach->add_case_label(jpn->as_JumpProj()->proj_no(), blkLabel);
              }
            }
          }
        }
#ifdef ASSERT
          // Check that oop-store precedes the card-mark
        else if (mach->ideal_Opcode() == Op_StoreCM) {
          uint storeCM_idx = j;
          int count = 0;
          for (uint prec = mach->req(); prec < mach->len(); prec++) {
            Node *oop_store = mach->in(prec);  // Precedence edge
            if (oop_store == NULL) continue;
            count++;
            uint i4;
            for (i4 = 0; i4 < last_inst; ++i4) {
              if (block->get_node(i4) == oop_store) {
                break;
              }
            }
            // Note: This test can provide a false failure if other precedence
            // edges have been added to the storeCMNode.
            assert(i4 == last_inst || i4 < storeCM_idx, "CM card-mark executes before oop-store");
          }
          assert(count > 0, "storeCM expects at least one precedence edge");
        }
#endif
        else if (!n->is_Proj()) {
          // Remember the beginning of the previous instruction, in case
          // it's followed by a flag-kill and a null-check.  Happens on
          // Intel all the time, with add-to-memory kind of opcodes.
          previous_offset = current_offset;
        }

        // Not an else-if!
        // If this is a trap based cmp then add its offset to the list.
        if (mach->is_TrapBasedCheckNode()) {
          inct_starts[inct_cnt++] = current_offset;
        }
      }

      // Verify that there is sufficient space remaining
      cb->insts()->maybe_expand_to_ensure_remaining(MAX_inst_size);
      if ((cb->blob() == NULL) || (!CompileBroker::should_compile_new_jobs())) {
        C->record_failure("CodeCache is full");
        return;
      }

      // Save the offset for the listing
#if defined(SUPPORT_OPTO_ASSEMBLY)
      if ((node_offsets != NULL) && (n->_idx < node_offset_limit)) {
        node_offsets[n->_idx] = cb->insts_size();
      }
#endif
      assert(!C->failing(), "Should not reach here if failing.");

      // "Normal" instruction case
      DEBUG_ONLY(uint instr_offset = cb->insts_size());
      n->emit(*cb, C->regalloc());
      current_offset = cb->insts_size();

      // Above we only verified that there is enough space in the instruction section.
      // However, the instruction may emit stubs that cause code buffer expansion.
      // Bail out here if expansion failed due to a lack of code cache space.
      if (C->failing()) {
        return;
      }

      assert(!is_mcall || (call_returns[block->_pre_order] <= (uint)current_offset),
             "ret_addr_offset() not within emitted code");

#ifdef ASSERT
      uint n_size = n->size(C->regalloc());
      if (n_size < (current_offset-instr_offset)) {
        MachNode* mach = n->as_Mach();
        n->dump();
        mach->dump_format(C->regalloc(), tty);
        tty->print_cr(" n_size (%d), current_offset (%d), instr_offset (%d)", n_size, current_offset, instr_offset);
        Disassembler::decode(cb->insts_begin() + instr_offset, cb->insts_begin() + current_offset + 1, tty);
        tty->print_cr(" ------------------- ");
        BufferBlob* blob = this->scratch_buffer_blob();
        address blob_begin = blob->content_begin();
        Disassembler::decode(blob_begin, blob_begin + n_size + 1, tty);
        assert(false, "wrong size of mach node");
      }
#endif
      non_safepoints.observe_instruction(n, current_offset);

      // mcall is last "call" that can be a safepoint
      // record it so we can see if a poll will directly follow it
      // in which case we'll need a pad to make the PcDesc sites unique
      // see  5010568. This can be slightly inaccurate but conservative
      // in the case that return address is not actually at current_offset.
      // This is a small price to pay.

      if (is_mcall) {
        last_call_offset = current_offset;
      }

      if (n->is_Mach() && n->as_Mach()->avoid_back_to_back(MachNode::AVOID_AFTER)) {
        // Avoid back to back some instructions.
        last_avoid_back_to_back_offset = current_offset;
      }

      // See if this instruction has a delay slot
      if (valid_bundle_info(n) && node_bundling(n)->use_unconditional_delay()) {
        guarantee(delay_slot != NULL, "expecting delay slot node");

        // Back up 1 instruction
        cb->set_insts_end(cb->insts_end() - Pipeline::instr_unit_size());

        // Save the offset for the listing
#if defined(SUPPORT_OPTO_ASSEMBLY)
        if ((node_offsets != NULL) && (delay_slot->_idx < node_offset_limit)) {
          node_offsets[delay_slot->_idx] = cb->insts_size();
        }
#endif

        // Support a SafePoint in the delay slot
        if (delay_slot->is_MachSafePoint()) {
          MachNode *mach = delay_slot->as_Mach();
          // !!!!! Stubs only need an oopmap right now, so bail out
          if (!mach->is_MachCall() && mach->as_MachSafePoint()->jvms()->method() == NULL) {
            // Write the oopmap directly to the code blob??!!
            delay_slot = NULL;
            continue;
          }

          int adjusted_offset = current_offset - Pipeline::instr_unit_size();
          non_safepoints.observe_safepoint(mach->as_MachSafePoint()->jvms(),
                                           adjusted_offset);
          // Generate an OopMap entry
          Process_OopMap_Node(mach, adjusted_offset);
        }

        // Insert the delay slot instruction
        delay_slot->emit(*cb, C->regalloc());

        // Don't reuse it
        delay_slot = NULL;
      }

    } // End for all instructions in block

    // If the next block is the top of a loop, pad this block out to align
    // the loop top a little. Helps prevent pipe stalls at loop back branches.
    if (i < nblocks-1) {
      Block *nb = C->cfg()->get_block(i + 1);
      int padding = nb->alignment_padding(current_offset);
      if( padding > 0 ) {
        MachNode *nop = new MachNopNode(padding / nop_size);
        block->insert_node(nop, block->number_of_nodes());
        C->cfg()->map_node_to_block(nop, block);
        nop->emit(*cb, C->regalloc());
        current_offset = cb->insts_size();
      }
    }
    // Verify that the distance for generated before forward
    // short branches is still valid.
    guarantee((int)(blk_starts[i+1] - blk_starts[i]) >= (current_offset - blk_offset), "shouldn't increase block size");

    // Save new block start offset
    blk_starts[i] = blk_offset;
  } // End of for all blocks
  blk_starts[nblocks] = current_offset;

  non_safepoints.flush_at_end();

  // Offset too large?
  if (C->failing())  return;

  // Define a pseudo-label at the end of the code
  MacroAssembler(cb).bind( blk_labels[nblocks] );

  // Compute the size of the first block
  _first_block_size = blk_labels[1].loc_pos() - blk_labels[0].loc_pos();

#ifdef ASSERT
  for (uint i = 0; i < nblocks; i++) { // For all blocks
    if (jmp_target[i] != 0) {
      int br_size = jmp_size[i];
      int offset = blk_starts[jmp_target[i]]-(blk_starts[i] + jmp_offset[i]);
      if (!C->matcher()->is_short_branch_offset(jmp_rule[i], br_size, offset)) {
        tty->print_cr("target (%d) - jmp_offset(%d) = offset (%d), jump_size(%d), jmp_block B%d, target_block B%d", blk_starts[jmp_target[i]], blk_starts[i] + jmp_offset[i], offset, br_size, i, jmp_target[i]);
        assert(false, "Displacement too large for short jmp");
      }
    }
  }
#endif

  BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
  bs->emit_stubs(*cb);
  if (C->failing())  return;

  // Fill in stubs for calling the runtime from safepoint polls.
  safepoint_poll_table()->emit(*cb);
  if (C->failing())  return;

#ifndef PRODUCT
  // Information on the size of the method, without the extraneous code
  Scheduling::increment_method_size(cb->insts_size());
#endif

  // ------------------
  // Fill in exception table entries.
  FillExceptionTables(inct_cnt, call_returns, inct_starts, blk_labels);

  // Only java methods have exception handlers and deopt handlers
  // class HandlerImpl is platform-specific and defined in the *.ad files.
  if (C->method()) {
    // Emit the exception handler code.
    _code_offsets.set_value(CodeOffsets::Exceptions, HandlerImpl::emit_exception_handler(*cb));
    if (C->failing()) {
      return; // CodeBuffer::expand failed
    }
    // Emit the deopt handler code.
    _code_offsets.set_value(CodeOffsets::Deopt, HandlerImpl::emit_deopt_handler(*cb));

    // Emit the MethodHandle deopt handler code (if required).
    if (C->has_method_handle_invokes() && !C->failing()) {
      // We can use the same code as for the normal deopt handler, we
      // just need a different entry point address.
      _code_offsets.set_value(CodeOffsets::DeoptMH, HandlerImpl::emit_deopt_handler(*cb));
    }
  }

  // One last check for failed CodeBuffer::expand:
  if ((cb->blob() == NULL) || (!CompileBroker::should_compile_new_jobs())) {
    C->record_failure("CodeCache is full");
    return;
  }

#if defined(SUPPORT_ABSTRACT_ASSEMBLY) || defined(SUPPORT_ASSEMBLY) || defined(SUPPORT_OPTO_ASSEMBLY)
  if (C->print_assembly()) {
    tty->cr();
    tty->print_cr("============================= C2-compiled nmethod ==============================");
  }
#endif

#if defined(SUPPORT_OPTO_ASSEMBLY)
  // Dump the assembly code, including basic-block numbers
  if (C->print_assembly()) {
    ttyLocker ttyl;  // keep the following output all in one block
    if (!VMThread::should_terminate()) {  // test this under the tty lock
      // This output goes directly to the tty, not the compiler log.
      // To enable tools to match it up with the compilation activity,
      // be sure to tag this tty output with the compile ID.
      if (xtty != NULL) {
        xtty->head("opto_assembly compile_id='%d'%s", C->compile_id(),
                   C->is_osr_compilation() ? " compile_kind='osr'" : "");
      }
      if (C->method() != NULL) {
        tty->print_cr("----------------------- MetaData before Compile_id = %d ------------------------", C->compile_id());
        C->method()->print_metadata();
      } else if (C->stub_name() != NULL) {
        tty->print_cr("----------------------------- RuntimeStub %s -------------------------------", C->stub_name());
      }
      tty->cr();
      tty->print_cr("------------------------ OptoAssembly for Compile_id = %d -----------------------", C->compile_id());
      dump_asm(node_offsets, node_offset_limit);
      tty->print_cr("--------------------------------------------------------------------------------");
      if (xtty != NULL) {
        // print_metadata and dump_asm above may safepoint which makes us loose the ttylock.
        // Retake lock too make sure the end tag is coherent, and that xmlStream->pop_tag is done
        // thread safe
        ttyLocker ttyl2;
        xtty->tail("opto_assembly");
      }
    }
  }
#endif
}