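Linux Interrupt Management

The Concept of Interrupts

An interrupt is an event, raised inside or outside the CPU or scheduled in advance by a program, that makes the CPU suspend the program it is running, switch to a routine that services the event, and return to the suspended program once the service is complete. Linux usually distinguishes external interrupts (also called hardware interrupts) from internal interrupts (also called exceptions).

After software has configured a piece of hardware, it usually has to wait for the hardware to reach some state (for example, data has arrived). There are two ways to do this. One is polling: the CPU reads the hardware status over and over. The other is to let the hardware raise an interrupt when the event completes, so that the CPU stops what it is doing and handles the interrupt. Because the CPU can do other work while it waits, the interrupt-driven approach generally improves system throughput.

When the CPU receives an interrupt request (IRQ), it runs the interrupt service routine (ISR) for that interrupt. Typically there is an interrupt vector table that defines the ISR entry point for each interrupt source; when an interrupt fires, the CPU jumps straight to that entry. Code reached this way runs in interrupt context. (Note: code in interrupt context must not block or sleep.)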

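Linux Interrupts: Top Half and Bottom Half

Anyone who has worked with MCUs knows that an ISR should finish its work quickly and return, because the rest of the system is held up while it runs. Yet some things have to be done inside the ISR, such as clearing the interrupt flag, reading or writing data, and accessing registers.

Linux has the same requirement: finish the ISR as soon as possible. In practice some ISRs have a lot of work to do and take a long time, which hurts responsiveness. To deal with this, Linux splits interrupt handling into two parts:

  1. Top half: runs immediately when the interrupt arrives and is under strict time constraints. It does only the essential work, such as acknowledging or resetting the hardware, and it does so with interrupts disabled.

  2. Bottom half: work that can be deferred is done here. At a suitable later time the bottom half runs with interrupts enabled. (The concrete mechanisms, namely softirqs, tasklets, and workqueues, are analyzed in later chapters.)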
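The split described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct deferred_queue`, `top_half()`, and `bottom_half()` are hypothetical names invented for the illustration, not kernel APIs, and the "slow work" is reduced to summing the queued values.

```c
#include <stddef.h>

/* Hypothetical userspace model: the top half only records the event,
 * the bottom half does the slow work later. */
#define QUEUE_LEN 8

struct deferred_queue {
    int items[QUEUE_LEN];
    size_t head;   /* next slot to read */
    size_t tail;   /* next slot to write */
    size_t count;  /* entries currently queued */
};

/* Top half: grab the data and queue it, nothing else.
 * Returns 0 on success, -1 if the queue is full (event dropped). */
int top_half(struct deferred_queue *q, int data)
{
    if (q->count == QUEUE_LEN)
        return -1;
    q->items[q->tail] = data;
    q->tail = (q->tail + 1) % QUEUE_LEN;
    q->count++;
    return 0;
}

/* Bottom half: drain the queue and do the expensive processing.
 * The sum of the items stands in for "work done". */
long bottom_half(struct deferred_queue *q)
{
    long sum = 0;

    while (q->count > 0) {
        sum += q->items[q->head];
        q->head = (q->head + 1) % QUEUE_LEN;
        q->count--;
    }
    return sum;
}
```

The point of the design is that `top_half()` has a small, bounded cost, while everything whose cost depends on the amount of data lives in `bottom_half()`.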

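Interrupt Handlers

A driver can use the following interface:

static inline int __must_check request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags, const char *name, void *dev)

to register an interrupt handler with the system. Its parameters are:

Parameter  Meaning

irq  The interrupt number. The interrupt numbers of a CPU are normally defined in advance.

handler  The ISR to run when the interrupt occurs.

flags  Interrupt flags (IRQF_DISABLED / IRQF_SAMPLE_RANDOM / IRQF_TIMER / IRQF_SHARED).

name  ASCII text naming the device associated with the interrupt, for example "keyboard". The name is shown in /proc/irq and /proc/interrupts.

dev  Used for shared interrupt lines; pass the driver's device structure. For a non-shared interrupt it can simply be NULL.

Meaning of the interrupt flags (IRQF_DISABLED and IRQF_SAMPLE_RANDOM belong to older kernels and have since been removed):

Flag  Meaning

IRQF_DISABLED  When set, the kernel disables other interrupts while this ISR runs (not used in most cases).

IRQF_SAMPLE_RANDOM  Interrupts from this device contribute to the kernel entropy pool.

IRQF_TIMER  Reserved for the system timer.

IRQF_SHARED  The interrupt line is shared among several handlers. Every handler registered on the same line must set this flag.

request_irq returns 0 on success. A common error is -EBUSY, which means the requested interrupt line is already in use (or IRQF_SHARED was not specified).

Note: request_irq may sleep, so it must not be called from interrupt context or from any other code that is not allowed to sleep.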
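To make the sharing rule and the -EBUSY case concrete, here is a userspace model of the bookkeeping. It is a sketch, not the kernel implementation: `model_request_irq()`, `struct model_irq_line`, and `MODEL_IRQF_SHARED` are hypothetical names, the -EINVAL check for a shared request with a NULL cookie reflects my understanding of the kernel's rule and should be treated as an assumption, and -ENOSPC is purely a limit of the model.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_IRQF_SHARED 0x80u   /* stand-in for IRQF_SHARED */
#define MODEL_MAX_ACTIONS 4       /* model-only limit */

struct model_irq_line {
    unsigned int nr_actions;
    unsigned int flags[MODEL_MAX_ACTIONS];
    void *dev_id[MODEL_MAX_ACTIONS];
};

/* Returns 0 on success or a negative errno value, like request_irq(). */
int model_request_irq(struct model_irq_line *line, unsigned int flags,
                      void *dev_id)
{
    /* A shared handler needs a cookie so it can be told apart later. */
    if ((flags & MODEL_IRQF_SHARED) && dev_id == NULL)
        return -EINVAL;

    if (line->nr_actions > 0) {
        /* Line already in use: old and new users must both agree to share. */
        if (!(flags & MODEL_IRQF_SHARED) ||
            !(line->flags[0] & MODEL_IRQF_SHARED))
            return -EBUSY;
        if (line->nr_actions == MODEL_MAX_ACTIONS)
            return -ENOSPC;
    }

    line->flags[line->nr_actions] = flags;
    line->dev_id[line->nr_actions] = dev_id;
    line->nr_actions++;
    return 0;
}
```

In a real driver the same pattern appears as a check of the return value of request_irq() followed by error handling, with a matching free_irq() on the teardown path.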

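Releasing an interrupt:

const void *free_irq(unsigned int irq, void *dev_id) // releases the interrupt handler

Note: interrupt handlers in Linux do not need to be reentrant. While a given handler is running, its interrupt line is masked on all processors, so a new interrupt cannot arrive on the same line. In the classic model all other interrupts remain enabled, which means interrupts on other lines can still be handled, but the current line is always disabled. A handler is therefore never nested inside itself. In addition, Linux on ARM does not use interrupt priorities for this purpose and does not use FIQ for ordinary device interrupts, so interrupt handlers do not nest on ARM.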

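Interrupt Context

Unlike process context, the kernel is in interrupt context while it runs an ISR. In the classic model the handler has no stack of its own and borrows the kernel stack, whose size is limited (typically 8 KB on 32-bit machines), so the handler must be short and lean. (Some configurations give interrupts a separate stack; the arm64 init_IRQ() code later in this article calls init_irq_stacks(), for example.) An ISR also interrupts the normal flow of execution, which is another reason it must run quickly. Finally, sleeping and blocking are not allowed in interrupt context.

Interrupt context must not sleep for the following reasons:

1. No process switch should happen while an interrupt is being handled. In interrupt context the only thing that can preempt the current handler is a higher-priority interrupt; a process cannot. If the handler went to sleep there would be no way to wake it up, because every wake_up_xxx function targets a process, and interrupt context has no notion of a process: there is no task_struct that belongs to it (the same holds for softirqs and tasklets). So if a handler really sleeps, for example by calling a routine that blocks, the kernel will almost certainly crash.

2. When schedule() switches processes it saves the context of the current process (CPU register values, process state, and stack contents) so that the process can be resumed later. When an interrupt occurs, the kernel first saves the context of the interrupted process (and restores it after the handler returns). Inside the handler, however, the CPU registers have already changed (most importantly the program counter PC and the stack pointer SP). If a sleeping or blocking operation called schedule() at this point, the context it saved would no longer be that of the interrupted process. That is why schedule() must not be called from an interrupt handler.

3. In the kernel version discussed here, schedule() itself checks on entry whether it is running in interrupt context:

if (unlikely(in_interrupt()))

BUG();

So forcing a call to schedule() results in a kernel BUG.

4. The handler uses the kernel stack of the interrupted process but has no lasting effect on it: when the handler finishes, it has released the part of the stack it used and the stack is exactly as it was before the interrupt.

5. The kernel is not preemptible in interrupt context, so if the handler sleeps, the kernel hangs.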

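Interrupt Handling Flow

When an interrupt occurs, the CPU executes the code at the exception vector vector_irq (the name used on 32-bit ARM), that is, the IRQ entry of the exception vector table. It is a branch instruction that jumps to the real interrupt handling code, and vector_irq eventually calls the top-level interrupt entry function.

On ARM64, exception levels 1, 2, and 3 each have their own exception vector table, whose starting virtual address is held in the register VBAR_ELn (Vector Base Address Register). Each table has 16 entries in 4 groups of 4, and each entry is 128 bytes long (room for 32 instructions). The vector table for exception level n is laid out as follows.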

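Exception vector table for exception level n

Address  Exception type  Description

VBAR_ELn + 0x000  Synchronous  Exception taken from the current exception level, using the EL0 stack pointer SP_EL0

VBAR_ELn + 0x080  IRQ

VBAR_ELn + 0x100  FIQ

VBAR_ELn + 0x180  SError (system error)

VBAR_ELn + 0x200  Synchronous  Exception taken from the current exception level, using the current level's stack pointer SP_ELn

VBAR_ELn + 0x280  IRQ

VBAR_ELn + 0x300  FIQ

VBAR_ELn + 0x380  SError (system error)

VBAR_ELn + 0x400  Synchronous  Exception raised by a 64-bit program at exception level (n-1)

VBAR_ELn + 0x480  IRQ

VBAR_ELn + 0x500  FIQ

VBAR_ELn + 0x580  SError (system error)

VBAR_ELn + 0x600  Synchronous  Exception raised by a 32-bit program at exception level (n-1)

VBAR_ELn + 0x680  IRQ

VBAR_ELn + 0x700  FIQ

VBAR_ELn + 0x780  SError (system error)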
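The layout above is regular enough to compute: each group is 4 entries of 0x80 bytes, so a group spans 0x200 bytes. The helper below is a small sketch of that arithmetic; the enum and function names are made up for the illustration.

```c
#include <stdint.h>

/* Which group of four entries: where the exception came from. */
enum vec_source {
    CUR_EL_SP0   = 0,  /* current EL, using SP_EL0 */
    CUR_EL_SPX   = 1,  /* current EL, using SP_ELn */
    LOWER_EL_A64 = 2,  /* lower EL, 64-bit */
    LOWER_EL_A32 = 3   /* lower EL, 32-bit */
};

/* Which entry inside the group. */
enum vec_type { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

#define VEC_ENTRY_SIZE 0x80u                   /* 128 bytes = 32 instructions */
#define VEC_GROUP_SIZE (4u * VEC_ENTRY_SIZE)   /* 0x200 */

/* Address of one vector, given the table base held in VBAR_ELn. */
uint64_t vector_address(uint64_t vbar, enum vec_source src, enum vec_type type)
{
    return vbar + (uint64_t)src * VEC_GROUP_SIZE
                + (uint64_t)type * VEC_ENTRY_SIZE;
}
```

For example, an IRQ taken while the kernel is already running at EL1 on its own stack pointer lands at offset 0x280, and an IRQ taken from a 64-bit user program lands at 0x480.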

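The ARM64 kernel defines its exception vector table as follows.

This was covered in the system call chapter of "Interaction Between the Linux Application Layer and the Kernel"; only the interrupt-related parts are listed here.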

arch/arm64/kernel/entry.S:

/*
 * Exception vectors.
 */
	.pushsection ".entry.text", "ax"

	.align	11
ENTRY(vectors)
	kernel_ventry	1, sync_invalid			// synchronous exception from EL1, using SP_EL0
	kernel_ventry	1, irq_invalid			// IRQ from EL1, using SP_EL0
	kernel_ventry	1, fiq_invalid			// FIQ from EL1, using SP_EL0
	kernel_ventry	1, error_invalid		// system error from EL1, using SP_EL0

	kernel_ventry	1, sync				// synchronous exception from EL1, using SP_EL1
	kernel_ventry	1, irq				// IRQ from EL1, using SP_EL1
	kernel_ventry	1, fiq_invalid			// FIQ from EL1, using SP_EL1
	kernel_ventry	1, error_invalid		// system error from EL1, using SP_EL1

	kernel_ventry	0, sync				// synchronous exception from a 64-bit program at EL0
	kernel_ventry	0, irq				// IRQ from a 64-bit program at EL0
	kernel_ventry	0, fiq_invalid			// FIQ from a 64-bit program at EL0
	kernel_ventry	0, error_invalid		// system error from a 64-bit program at EL0

#ifdef CONFIG_COMPAT
	kernel_ventry	0, sync_compat, 32		// synchronous exception from a 32-bit program at EL0
	kernel_ventry	0, irq_compat, 32		// IRQ from a 32-bit program at EL0
	kernel_ventry	0, fiq_invalid_compat, 32	// FIQ from a 32-bit program at EL0
	kernel_ventry	0, error_invalid_compat, 32	// system error from a 32-bit program at EL0
#else
	kernel_ventry	0, sync_invalid, 32		// synchronous exception from a 32-bit program at EL0
	kernel_ventry	0, irq_invalid, 32		// IRQ from a 32-bit program at EL0
	kernel_ventry	0, fiq_invalid, 32		// FIQ from a 32-bit program at EL0
	kernel_ventry	0, error_invalid, 32		// system error from a 32-bit program at EL0
#endif
END(vectors)

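kernel_ventry is a macro whose arguments include the branch target, i.e. the label of the exception handler. It is defined as follows (arch/arm64/kernel/entry.S):

	.macro kernel_ventry, el, label, regsize = 64
	.align 7
	sub	sp, sp, #S_FRAME_SIZE	// reserve S_FRAME_SIZE bytes on the stack, the size of struct pt_regs
#ifdef CONFIG_VMAP_STACK
	// ... stack overflow check omitted here
#endif
	b	el\()\el\()_\label	// branch to the handler for that level; "kernel_ventry 1, irq" becomes el1_irq
	.endm

".align 7" aligns the address of the next instruction to 2^7, i.e. 128 bytes. For the entry "kernel_ventry 1, irq" in vectors, "b el\()\el\()_\label" branches to el1_irq. The first argument says which exception level the exception came from: 0 for user -> kernel, 1 for kernel -> kernel.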
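Two small details of the macro can be checked with ordinary C: what aligning to 2^7 does to an address, and which label the pasted name expands to. This is only a sketch; `align_up_pow2()` and `ventry_target()` are hypothetical helpers written for this note, and the assembler of course does both steps at build time rather than at run time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Round addr up to the next multiple of 2^order, as ".align order" does
 * on arm64. */
uint64_t align_up_pow2(uint64_t addr, unsigned int order)
{
    uint64_t mask = ((uint64_t)1 << order) - 1;

    return (addr + mask) & ~mask;
}

/* Build "el<el>_<label>", the name that "b el\()\el\()_\label" branches to.
 * Returns the length of the name, as snprintf() does. */
int ventry_target(char *buf, size_t len, int el, const char *label)
{
    return snprintf(buf, len, "el%d_%s", el, label);
}
```

With order 7, an address that is already a multiple of 128 stays where it is, and anything else moves up to the next 128-byte boundary, which is what keeps every vector entry exactly 0x80 bytes apart.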

Each CPU sets the exception vector base address during its initialization.

arch/arm64/kernel/head.S

__primary_switched:

adrp x4, init_thread_union

add sp, x4, #THREAD_SIZE

adr_l x5, init_task

msr sp_el0, x5 // Save thread_info

adr_l x8, vectors // load VBAR_EL1 with virtual

msr vbar_el1, x8 // vector table address

isb

stp xzr, x30, [sp, #-16]!

mov x29, sp

str_l x21, __fdt_pointer, x5 // Save FDT pointer

ldr_l x4, kimage_vaddr // Save the offset between

sub x4, x4, x0 // the kernel virtual and

str_l x4, kimage_voffset, x5 // physical mappings

// Clear BSS

adr_l x0, __bss_start

mov x1, xzr

adr_l x2, __bss_stop

sub x2, x2, x0

bl __pi_memset

dsb ishst // Make zero page visible to PTW

#ifdef CONFIG_KASAN

bl kasan_early_init

#endif

#ifdef CONFIG_RANDOMIZE_BASE

tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?

b.ne 0f

mov x0, x21 // pass FDT address in x0

bl kaslr_early_init // parse FDT for KASLR options

cbz x0, 0f // KASLR disabled? just proceed

orr x23, x23, x0 // record KASLR offset

ldp x29, x30, [sp], #16 // we must enable KASLR, return

ret // to __primary_switch()

0:

#endif

add sp, sp, #16

mov x29, #0

mov x30, #0

b start_kernel

ENDPROC(__primary_switched)

__secondary_switched:

adr_l x5, vectors // set the exception vector base address

msr vbar_el1, x5

isb

adr_l x0, secondary_data

ldr x1, [x0, #CPU_BOOT_STACK] // get secondary_data.stack

mov sp, x1

ldr x2, [x0, #CPU_BOOT_TASK]

msr sp_el0, x2

mov x29, #0

mov x30, #0

b secondary_start_kernel

ENDPROC(__secondary_switched)

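When an interrupt is raised, the GIC signals the target CPU. The CPU detects the signal and, through the exception vector table, jumps to el1_irq (the entry used when the interrupt is taken while the kernel itself is running).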

arch/arm64/kernel/entry.S

el1_irq:

kernel_entry 1

enable_dbg

#ifdef CONFIG_TRACE_IRQFLAGS

bl trace_hardirqs_off

#endif

irq_handler

#ifdef CONFIG_PREEMPT

get_thread_info tsk

ldr w24, [tsk, #TI_PREEMPT] // get preempt count

cbnz w24, 1f // preempt count != 0

ldr x0, [tsk, #TI_FLAGS] // get flags

tbz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling?

bl el1_preempt

1:

#endif

#ifdef CONFIG_TRACE_IRQFLAGS

bl trace_hardirqs_on

#endif

kernel_exit 1

ENDPROC(el1_irq)

/*

* Interrupt handling.

*/

.macro irq_handler

#ifdef CONFIG_STRICT_MEMORY_RWX

ldr x1, =handle_arch_irq

ldr x1, [x1]

#else

ldr x1, handle_arch_irq

#endif

mov x0, sp

blr x1

.endm

.text

arch/arm64/kernel/irq.c

void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))

{

if (handle_arch_irq)

return;

handle_arch_irq = handle_irq;

}

During initialization the GICv2 interrupt controller driver calls set_handle_irq(gic_handle_irq):

dtb:

gic: interrupt-controller@1400000 {

compatible = "arm,gic-400";

#interrupt-cells = <3>;

interrupt-controller;

reg = <0x0 0x1401000 0 0x1000>, /* GICD */

<0x0 0x1402000 0 0x2000>, /* GICC */

<0x0 0x1404000 0 0x2000>, /* GICH */

<0x0 0x1406000 0 0x2000>; /* GICV */

interrupts = <1 9 0xf08>;

};

IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);

Call path that installs the handler: gic_of_init() -> __gic_init_bases() -> set_handle_irq(gic_handle_irq);

static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)

{

u32 irqstat, irqnr;

struct gic_chip_data *gic = &gic_data[0];

void __iomem *cpu_base = gic_data_cpu_base(gic);

do {

irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);

irqnr = irqstat & GICC_IAR_INT_ID_MASK;

if (likely(irqnr > 15 && irqnr < 1020)) {

if (static_key_true(&supports_deactivate))

writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);

isb();

handle_domain_irq(gic->domain, irqnr, regs); // call the handler for this interrupt

continue;

}

if (irqnr < 16) {

writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);

if (static_key_true(&supports_deactivate))

writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE);

#ifdef CONFIG_SMP

/*

* Ensure any shared data written by the CPU sending

* the IPI is read after we've read the ACK register

* on the GIC.

*

* Pairs with the write barrier in gic_raise_softirq

*/

smp_rmb();

handle_IPI(irqnr, regs); // SMP inter-processor interrupt

#endif

continue;

}

break;

} while (1);

}
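The two branches in gic_handle_irq() follow from how GICv2 interrupt IDs are partitioned. The sketch below classifies a value read from the acknowledge register in the same way; the 0x3ff mask corresponds to GICC_IAR_INT_ID_MASK, while the enum and function names are invented for the illustration.

```c
#include <stdint.h>

#define IAR_INT_ID_MASK 0x3ffu  /* low 10 bits of the value hold the interrupt ID */

enum irq_id_class {
    ID_CLASS_SGI,      /* 0..15: software-generated, used for IPIs */
    ID_CLASS_PPI,      /* 16..31: private per-CPU peripherals */
    ID_CLASS_SPI,      /* 32..1019: shared peripherals */
    ID_CLASS_SPECIAL   /* 1020..1023: special IDs, e.g. spurious */
};

/* Classify the raw value read from the interrupt acknowledge register. */
enum irq_id_class classify_irqstat(uint32_t irqstat)
{
    uint32_t irqnr = irqstat & IAR_INT_ID_MASK;

    if (irqnr < 16)
        return ID_CLASS_SGI;
    if (irqnr < 32)
        return ID_CLASS_PPI;
    if (irqnr < 1020)
        return ID_CLASS_SPI;
    return ID_CLASS_SPECIAL;
}
```

In terms of the loop above, PPIs and SPIs take the handle_domain_irq() path, SGIs take the handle_IPI() path, and a special ID ends the loop.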

gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()

static inline int handle_domain_irq(struct irq_domain *domain,

unsigned int hwirq, struct pt_regs *regs)

{

return __handle_domain_irq(domain, hwirq, true, regs);

}

/**

* __handle_domain_irq - Invoke the handler for a HW irq belonging to a domain

* @domain: The domain where to perform the lookup

* @hwirq: The HW irq number to convert to a logical one

* @lookup: Whether to perform the domain lookup or not

* @regs: Register file coming from the low-level handling code

*

* Returns: 0 on success, or -EINVAL if conversion has failed

*/

int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,

bool lookup, struct pt_regs *regs)

{

struct pt_regs *old_regs = set_irq_regs(regs);

unsigned int irq = hwirq;

int ret = 0;

irq_enter();

#ifdef CONFIG_IRQ_DOMAIN

if (lookup)

irq = irq_find_mapping(domain, hwirq);

#endif

/*

* Some hardware gives randomly wrong interrupts. Rather

* than crashing, do something sensible.

*/

if (unlikely(!irq || irq >= nr_irqs)) {

ack_bad_irq(irq);

ret = -EINVAL;

} else {

generic_handle_irq(irq);

}

irq_exit();

set_irq_regs(old_regs);

return ret;

}

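Note the following:

irq_enter is called first to mark entry into hard interrupt context:

irq_enter updates some system statistics and, in the __irq_enter macro, disables preemption. When an IRQ is taken, the ARM core automatically sets the I bit in CPSR, which masks further IRQs until interrupt handling reaches the flow-control layer and IRQs are enabled again with local_irq_enable(). So why disable preemption as well? Because of interrupt nesting. If the flow-control layer or a driver enables IRQs with local_irq_enable() before the current interrupt has been fully handled, and a new IRQ arrives, the code enters irq_enter again. When that nested interrupt returns, the kernel does not want to reschedule; it wants to wait until the outermost interrupt has finished. That is why preemption is disabled.

Then generic_handle_irq() is called, and finally irq_exit removes the hard-interrupt mark.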
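The nesting argument can be modelled with a simple counter. This is a deliberately simplified sketch: `struct cpu_state`, `model_irq_enter()`, and `model_irq_exit()` are hypothetical names, the counter stands in for the hardirq part of preempt_count, and the reschedule decision is folded into the exit function, whereas in the real kernel that check sits on the interrupt return path (for example the CONFIG_PREEMPT block in el1_irq).

```c
struct cpu_state {
    unsigned int hardirq_depth;  /* stand-in for the hardirq bits of preempt_count */
    int need_resched;            /* set when a handler made a task runnable */
    unsigned int reschedules;    /* how many times the "scheduler" ran */
};

/* Mark entry into hard interrupt context; nested entries stack up. */
void model_irq_enter(struct cpu_state *cpu)
{
    cpu->hardirq_depth++;
}

/* Leave hard interrupt context. Only the outermost exit may reschedule. */
void model_irq_exit(struct cpu_state *cpu)
{
    cpu->hardirq_depth--;
    if (cpu->hardirq_depth == 0 && cpu->need_resched) {
        cpu->need_resched = 0;
        cpu->reschedules++;
    }
}
```

With two nested entries, the first exit leaves the depth at 1 and does nothing even if a reschedule is pending; the second exit brings the depth to 0 and lets the reschedule happen.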

gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()->generic_handle_irq()

/**

* generic_handle_irq - Invoke the handler for a particular irq

* @irq: The irq number to handle

*

*/

int generic_handle_irq(unsigned int irq)

{

struct irq_desc *desc = irq_to_desc(irq);

if (!desc)

return -EINVAL;

generic_handle_irq_desc(desc);

return 0;

}

irq_to_desc first looks up the irq_desc descriptor for the interrupt number that fired, and then generic_handle_irq_desc is called:

gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()->generic_handle_irq()->generic_handle_irq_desc()

/*

* Architectures call this to let the generic IRQ layer

* handle an interrupt.

*/

static inline void generic_handle_irq_desc(struct irq_desc *desc)

{

desc->handle_irq(desc);

}

This calls the handle_irq function. To complete the picture we still need to look at irq_to_desc:

struct irq_desc *irq_to_desc(unsigned int irq)

{

return (irq < NR_IRQS) ? irq_desc + irq : NULL;

}

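NR_IRQS is the total number of interrupts supported, and irq must be smaller than it; in that case irq_desc + irq is returned. (This is the variant built when CONFIG_SPARSE_IRQ is not set.)

irq_desc is a global array: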

struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {

[0 ... NR_IRQS-1] = {

.handle_irq = handle_bad_irq,

.depth = 1,

.lock = __RAW_SPIN_LOCK_UNLOCKED(irq_desc->lock),

}

};

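This is where the array is initialized. Every handle_irq is initialized to handle_bad_irq.

Attentive readers may have noticed that desc->handle_irq(desc) is not the interrupt handler we registered: the two functions do not even have the same prototype. handle_irq has type irq_flow_handler_t, while the service routine we registered has type irq_handler_t. They are clearly different things, so we need to dig further.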
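The relationship between the two function types can be sketched in userspace as a two-level dispatch: the descriptor holds one flow handler, and the flow handler walks a chain of registered actions. All names here (`model_desc`, `model_action`, `model_flow_handler`, `model_driver_isr`) are hypothetical, and the flow handler is far simpler than the real ones, which also deal with masking, acknowledging, and locking.

```c
#include <stddef.h>

struct model_desc;

/* Same shapes as irq_flow_handler_t and irq_handler_t, simplified. */
typedef void (*flow_handler_t)(struct model_desc *desc);
typedef int (*action_handler_t)(unsigned int irq, void *dev_id);

struct model_action {
    action_handler_t handler;   /* what the driver registered */
    void *dev_id;               /* cookie passed back to the driver */
    struct model_action *next;  /* chain for shared lines */
};

struct model_desc {
    unsigned int irq;
    flow_handler_t handle_irq;  /* installed by the interrupt controller driver */
    struct model_action *action;
    unsigned int handled;       /* number of actions that reported "handled" */
};

/* Flow handler: walks the action chain and calls each registered handler. */
void model_flow_handler(struct model_desc *desc)
{
    for (struct model_action *a = desc->action; a != NULL; a = a->next)
        desc->handled += (unsigned int)a->handler(desc->irq, a->dev_id);
}

/* Example driver handler: counts events in the int that dev_id points to. */
int model_driver_isr(unsigned int irq, void *dev_id)
{
    (void)irq;
    (*(int *)dev_id)++;
    return 1;  /* plays the role of IRQ_HANDLED */
}
```

Calling `desc.handle_irq(&desc)` is the analogue of desc->handle_irq(desc): the caller never sees the driver's handler directly, only the flow handler that sits in front of it.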

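1.5.1 Interrupt Data Structures

Linux uses three main interrupt-related data structures:

Structure  Purpose

irq_desc  Software-level description of the resources of an IRQ

irqaction  Generic actions for an IRQ

irq_chip  Chip-specific implementation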

1.5.1.1 struct irq_desc

struct irq_desc is defined as follows:

/**

* struct irq_desc - interrupt descriptor

* @irq_common_data: per irq and chip data passed down to chip functions

* @kstat_irqs: irq stats per cpu

* @handle_irq: highlevel irq-events handler

* @preflow_handler: handler called before the flow handler (currently used by sparc)

* @action: the irq action chain

* @status: status information

* @core_internal_state__do_not_mess_with_it: core internal status information

* @depth: disable-depth, for nested irq_disable() calls

* @wake_depth: enable depth, for multiple irq_set_irq_wake() callers

* @irq_count: stats field to detect stalled irqs

* @last_unhandled: aging timer for unhandled count

* @irqs_unhandled: stats field for spurious unhandled interrupts

* @threads_handled: stats field for deferred spurious detection of threaded handlers

* @threads_handled_last: comparator field for deferred spurious detection of theraded handlers

* @lock: locking for SMP

* @affinity_hint: hint to user space for preferred irq affinity

* @affinity_notify: context for notification of affinity changes

* @pending_mask: pending rebalanced interrupts

* @threads_oneshot: bitfield to handle shared oneshot threads

* @threads_active: number of irqaction threads currently running

* @wait_for_threads: wait queue for sync_irq to wait for threaded handlers

* @nr_actions: number of installed actions on this descriptor

* @no_suspend_depth: number of irqactions on a irq descriptor with

* IRQF_NO_SUSPEND set

* @force_resume_depth: number of irqactions on a irq descriptor with

* IRQF_FORCE_RESUME set

* @rcu: rcu head for delayed free

* @kobj: kobject used to represent this struct in sysfs

* @request_mutex: mutex to protect request/free before locking desc->lock

* @dir: /proc/irq/ procfs entry

* @debugfs_file: dentry for the debugfs file

* @name: flow handler name for /proc/interrupts output

*/

struct irq_desc {

struct irq_common_data irq_common_data;

struct irq_data irq_data;

unsigned int __percpu *kstat_irqs;

irq_flow_handler_t handle_irq;

#ifdef CONFIG_IRQ_PREFLOW_FASTEOI

irq_preflow_handler_t preflow_handler;

#endif

struct irqaction *action; /* IRQ action list */

unsigned int status_use_accessors;

unsigned int core_internal_state__do_not_mess_with_it;

unsigned int depth; /* nested irq disables */

unsigned int wake_depth; /* nested wake enables */

unsigned int irq_count; /* For detecting broken IRQs */

unsigned long last_unhandled; /* Aging timer for unhandled count */

unsigned int irqs_unhandled;

atomic_t threads_handled;

int threads_handled_last;

raw_spinlock_t lock;

struct cpumask *percpu_enabled;

const struct cpumask *percpu_affinity;

#ifdef CONFIG_SMP

const struct cpumask *affinity_hint;

struct irq_affinity_notify *affinity_notify;

#ifdef CONFIG_GENERIC_PENDING_IRQ

cpumask_var_t pending_mask;

#endif

#endif

unsigned long threads_oneshot;

atomic_t threads_active;

wait_queue_head_t wait_for_threads;

#ifdef CONFIG_PM_SLEEP

unsigned int nr_actions;

unsigned int no_suspend_depth;

unsigned int cond_suspend_depth;

unsigned int force_resume_depth;

#endif

#ifdef CONFIG_PROC_FS

struct proc_dir_entry *dir;

#endif

#ifdef CONFIG_GENERIC_IRQ_DEBUGFS

struct dentry *debugfs_file;

#endif

#ifdef CONFIG_SPARSE_IRQ

struct rcu_head rcu;

struct kobject kobj;

#endif

struct mutex request_mutex;

int parent_irq;

struct module *owner;

const char *name;

} ____cacheline_internodealigned_in_smp;

1.5.1.2 struct irqaction

struct irqaction is defined as follows:

/**

* struct irqaction - per interrupt action descriptor

* @handler: interrupt handler function

* @name: name of the device

* @dev_id: cookie to identify the device

* @percpu_dev_id: cookie to identify the device

* @next: pointer to the next irqaction for shared interrupts

* @irq: interrupt number

* @flags: flags (see IRQF_* above)

* @thread_fn: interrupt handler function for threaded interrupts

* @thread: thread pointer for threaded interrupts

* @secondary: pointer to secondary irqaction (force threading)

* @thread_flags: flags related to @thread

* @thread_mask: bitmask for keeping track of @thread activity

* @dir: pointer to the proc/irq/NN/name entry

*/

struct irqaction {

irq_handler_t handler;

void *dev_id;

void __percpu *percpu_dev_id;

struct irqaction *next;

irq_handler_t thread_fn;

struct task_struct *thread;

struct irqaction *secondary;

unsigned int irq;

unsigned int flags;

unsigned long thread_flags;

unsigned long thread_mask;

const char *name;

struct proc_dir_entry *dir;

} ____cacheline_internodealigned_in_smp;

1.5.1.3 struct irq_chip

struct irq_chip is defined as follows:

/**

* struct irq_chip - hardware interrupt chip descriptor

*

* @parent_device: pointer to parent device for irqchip

* @name: name for /proc/interrupts

* @irq_startup: start up the interrupt (defaults to ->enable if NULL)

* @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL)

* @irq_enable: enable the interrupt (defaults to chip->unmask if NULL)

* @irq_disable: disable the interrupt

* @irq_ack: start of a new interrupt

* @irq_mask: mask an interrupt source

* @irq_mask_ack: ack and mask an interrupt source

* @irq_unmask: unmask an interrupt source

* @irq_eoi: end of interrupt

* @irq_set_affinity: Set the CPU affinity on SMP machines. If the force

* argument is true, it tells the driver to

* unconditionally apply the affinity setting. Sanity

* checks against the supplied affinity mask are not

* required. This is used for CPU hotplug where the

* target CPU is not yet set in the cpu_online_mask.

* @irq_retrigger: resend an IRQ to the CPU

* @irq_set_type: set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ

* @irq_set_wake: enable/disable power-management wake-on of an IRQ

* @irq_bus_lock: function to lock access to slow bus (i2c) chips

* @irq_bus_sync_unlock:function to sync and unlock slow bus (i2c) chips

* @irq_cpu_online: configure an interrupt source for a secondary CPU

* @irq_cpu_offline: un-configure an interrupt source for a secondary CPU

* @irq_suspend: function called from core code on suspend once per

* chip, when one or more interrupts are installed

* @irq_resume: function called from core code on resume once per chip,

* when one ore more interrupts are installed

* @irq_pm_shutdown: function called from core code on shutdown once per chip

* @irq_calc_mask: Optional function to set irq_data.mask for special cases

* @irq_print_chip: optional to print special chip info in show_interrupts

* @irq_request_resources: optional to request resources before calling

* any other callback related to this irq

* @irq_release_resources: optional to release resources acquired with

* irq_request_resources

* @irq_compose_msi_msg: optional to compose message content for MSI

* @irq_write_msi_msg: optional to write message content for MSI

* @irq_get_irqchip_state: return the internal state of an interrupt

* @irq_set_irqchip_state: set the internal state of a interrupt

* @irq_set_vcpu_affinity: optional to target a vCPU in a virtual machine

* @ipi_send_single: send a single IPI to destination cpus

* @ipi_send_mask: send an IPI to destination cpus in cpumask

* @flags: chip specific flags

*/

struct irq_chip {

struct device *parent_device;

const char *name;

unsigned int (*irq_startup)(struct irq_data *data);

void (*irq_shutdown)(struct irq_data *data);

void (*irq_enable)(struct irq_data *data);

void (*irq_disable)(struct irq_data *data);

void (*irq_ack)(struct irq_data *data);

void (*irq_mask)(struct irq_data *data);

void (*irq_mask_ack)(struct irq_data *data);

void (*irq_unmask)(struct irq_data *data);

void (*irq_eoi)(struct irq_data *data);

int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);

int (*irq_retrigger)(struct irq_data *data);

int (*irq_set_type)(struct irq_data *data, unsigned int flow_type);

int (*irq_set_wake)(struct irq_data *data, unsigned int on);

void (*irq_bus_lock)(struct irq_data *data);

void (*irq_bus_sync_unlock)(struct irq_data *data);

void (*irq_cpu_online)(struct irq_data *data);

void (*irq_cpu_offline)(struct irq_data *data);

void (*irq_suspend)(struct irq_data *data);

void (*irq_resume)(struct irq_data *data);

void (*irq_pm_shutdown)(struct irq_data *data);

void (*irq_calc_mask)(struct irq_data *data);

void (*irq_print_chip)(struct irq_data *data, struct seq_file *p);

int (*irq_request_resources)(struct irq_data *data);

void (*irq_release_resources)(struct irq_data *data);

void (*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);

void (*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);

int (*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);

int (*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);

int (*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);

void (*ipi_send_single)(struct irq_data *data, unsigned int cpu);

void (*ipi_send_mask)(struct irq_data *data, const struct cpumask *dest);

unsigned long flags;

};

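irq_chip is a set of chip-specific function pointers. The definition is comprehensive and covers practically every operation that may be needed on an IRQ. Each interrupt controller driver fills in this structure for its own hardware, and the generic IRQ code calls through it, which keeps the generic handling logic fully separate from the chip-specific logic.

Let us move on.

1.5.2 Initializing the Chip-Specific IRQ Code

As is well known, C code starts at start_kernel during boot, and start_kernel calls the machine-specific IRQ initialization function init_IRQ():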

1.5.2.1 init_IRQ()

asmlinkage __visible void __init start_kernel(void)

{

char *command_line;

char *after_dashes;

.....

early_irq_init();

init_IRQ();

.....

}

1.5.2.1.1 irqchip_init()

init_IRQ calls irqchip_init():

void __init init_IRQ(void)

{

init_irq_stacks();

irqchip_init();

if (!handle_arch_irq)

panic("No interrupt controller found.");

}

void __init irqchip_init(void)

{

of_irq_init(__irqchip_of_table);

acpi_probe_device_table(irqchip);

}

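__irqchip_of_table is the start address of the kernel's irq chip table, which holds the ID information of every interrupt controller the kernel supports (used for matching against device nodes). By the time of_irq_init runs, the device tree has already been initialized, so all device nodes form a tree in which each node represents one device. of_irq_init finds the interrupt controller nodes among all device nodes and arranges them into a tree (a system can have several interrupt controllers; the tree exists so that their drivers are initialized in a well-defined order). Then, starting from the root interrupt controller, the device node of each interrupt controller is matched against the irq chip table. On a match, the initialization function of that controller is called, with the controller's device node and its parent controller's device node passed to the irq chip driver as arguments. The matching code itself belongs to the Device Tree module; see the Device Tree code analysis document for details.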

1.5.2.1.1.1 of_irq_init()

/**

* of_irq_init - Scan and init matching interrupt controllers in DT

* @matches: 0 terminated array of nodes to match and init function to call

*

* This function scans the device tree for matching interrupt controller nodes,

* and calls their initialization functions in order with parents first.

*/

void __init of_irq_init(const struct of_device_id *matches)

{

const struct of_device_id *match;

struct device_node *np, *parent = NULL;

struct of_intc_desc *desc, *temp_desc;

struct list_head intc_desc_list, intc_parent_list;

INIT_LIST_HEAD(&intc_desc_list);

INIT_LIST_HEAD(&intc_parent_list);

for_each_matching_node_and_match(np, matches, &match) {

if (!of_property_read_bool(np, "interrupt-controller") ||

!of_device_is_available(np))

continue;

if (WARN(!match->data, "of_irq_init: no init function for %s\n",

match->compatible))

continue;

/*

* Here, we allocate and populate an of_intc_desc with the node

* pointer, interrupt-parent device_node etc.

*/

desc = kzalloc(sizeof(*desc), GFP_KERNEL);

if (WARN_ON(!desc)) {

of_node_put(np);

goto err;

}

desc->irq_init_cb = match->data;

desc->dev = of_node_get(np);

desc->interrupt_parent = of_irq_find_parent(np);

if (desc->interrupt_parent == np)

desc->interrupt_parent = NULL;

list_add_tail(&desc->list, &intc_desc_list);

}

/*

* The root irq controller is the one without an interrupt-parent.

* That one goes first, followed by the controllers that reference it,

* followed by the ones that reference the 2nd level controllers, etc.

*/

while (!list_empty(&intc_desc_list)) {

/*

* Process all controllers with the current 'parent'.

* First pass will be looking for NULL as the parent.

* The assumption is that NULL parent means a root controller.

*/

list_for_each_entry_safe(desc, temp_desc, &intc_desc_list, list) {

int ret;

if (desc->interrupt_parent != parent)

continue;

list_del(&desc->list);

of_node_set_flag(desc->dev, OF_POPULATED);

pr_debug("of_irq_init: init %pOF (%p), parent %p\n",

desc->dev,

desc->dev, desc->interrupt_parent);

ret = desc->irq_init_cb(desc->dev,

desc->interrupt_parent);

if (ret) {

of_node_clear_flag(desc->dev, OF_POPULATED);

kfree(desc);

continue;

}

/*

* This one is now set up; add it to the parent list so

* its children can get processed in a subsequent pass.

*/

list_add_tail(&desc->list, &intc_parent_list);

}

/* Get the next pending parent that might have children */

desc = list_first_entry_or_null(&intc_parent_list,

typeof(*desc), list);

if (!desc) {

pr_err("of_irq_init: children remain, but no parents\n");

break;

}

list_del(&desc->list);

parent = desc->dev;

kfree(desc);

}

list_for_each_entry_safe(desc, temp_desc, &intc_parent_list, list) {

list_del(&desc->list);

kfree(desc);

}

err:

list_for_each_entry_safe(desc, temp_desc, &intc_desc_list, list) {

list_del(&desc->list);

of_node_put(desc->dev);

kfree(desc);

}

}
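The "parents first" ordering implemented by the while loop in of_irq_init() can be reproduced with a few lines of userspace C. This is a sketch of the idea rather than the kernel algorithm itself: controllers are identified by array index instead of device_node pointers, `intc_init_order()` is a hypothetical function, and `MAX_INTC` is a limit of the model.

```c
#include <stddef.h>

#define MAX_INTC 8  /* model-only limit */

/* parent[i] is the index of controller i's interrupt parent, or -1 for a
 * root controller. Writes the initialization order into order[] and
 * returns how many controllers were initialized. */
size_t intc_init_order(const int *parent, size_t n, size_t *order)
{
    int done[MAX_INTC] = {0};
    size_t queue[MAX_INTC];   /* initialized controllers whose children are pending */
    size_t qhead = 0, qtail = 0, count = 0;
    int current = -1;         /* the first pass looks for roots */

    if (n > MAX_INTC)
        return 0;

    for (;;) {
        /* Initialize every controller whose parent is the current one. */
        for (size_t i = 0; i < n; i++) {
            if (!done[i] && parent[i] == current) {
                done[i] = 1;
                order[count++] = i;
                queue[qtail++] = i;
            }
        }
        if (qhead == qtail)
            break;            /* no initialized parent left to try */
        current = (int)queue[qhead++];
    }
    return count;
}
```

A controller whose parent never gets initialized is simply left out of the order, which corresponds to the "children remain, but no parents" message in the kernel code.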

dtb:

gic: interrupt-controller@1400000 {

compatible = "arm,gic-400";

#interrupt-cells = <3>;

interrupt-controller;

reg = <0x0 0x1401000 0 0x1000>, /* GICD */

<0x0 0x1402000 0 0x2000>, /* GICC */

<0x0 0x1404000 0 0x2000>, /* GICH */

<0x0 0x1406000 0 0x2000>; /* GICV */

interrupts = <1 9 0xf08>;

};

IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);

IRQCHIP_DECLARE(arm11mp_gic, "arm,arm11mp-gic", gic_of_init);

IRQCHIP_DECLARE(arm1176jzf_dc_gic, "arm,arm1176jzf-devchip-gic", gic_of_init);

IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init);

IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);

IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init);

IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);

IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);

IRQCHIP_DECLARE(pl390, "arm,pl390", gic_of_init);

#define IRQCHIP_DECLARE(name, compat, fn) OF_DECLARE_2(irqchip, name, compat, fn)

#define OF_DECLARE_2(table, name, compat, fn) \

_OF_DECLARE(table, name, compat, fn, of_init_fn_2)

#define _OF_DECLARE(table, name, compat, fn, fn_type) \

static const struct of_device_id __of_table_##name \

__used __section(__##table##_of_table) \

= { .compatible = compat, \

.data = (fn == (fn_type)NULL) ? fn : fn }
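The effect of IRQCHIP_DECLARE is to place one { compatible, init function } entry per controller type into a table that of_irq_init() later scans. The userspace sketch below imitates that with an ordinary array and a string comparison. `struct model_match`, `model_irqchip_table`, `model_find_match()`, and `model_gic_init()` are hypothetical names; the real table is assembled by the linker from a dedicated section, and real device tree matching considers more than a single string.

```c
#include <stddef.h>
#include <string.h>

typedef int (*intc_init_fn)(int node, int parent);

struct model_match {
    const char *compatible;  /* string to compare with the DT "compatible" */
    intc_init_fn init;       /* plays the role of the .data init callback */
};

/* Stand-in for gic_of_init(); returns the node id so a caller can see
 * which node was initialized. */
int model_gic_init(int node, int parent)
{
    (void)parent;
    return node;
}

const struct model_match model_irqchip_table[] = {
    { "arm,gic-400",        model_gic_init },
    { "arm,cortex-a15-gic", model_gic_init },
    { NULL, NULL }           /* sentinel, like the zero-terminated kernel table */
};

/* Return the first entry whose compatible string matches, or NULL. */
const struct model_match *model_find_match(const struct model_match *table,
                                           const char *compatible)
{
    for (; table->compatible != NULL; table++)
        if (strcmp(table->compatible, compatible) == 0)
            return table;
    return NULL;
}
```

For the device tree node shown above, the string "arm,gic-400" selects the first entry, and the stored function pointer is what of_irq_init() calls as desc->irq_init_cb.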

Analysis of the GIC driver initialization code:

1.5.2.1.1.1.1 gic_of_init()

int __init

gic_of_init(struct device_node *node, struct device_node *parent)

{

struct gic_chip_data *gic;

int irq, ret;

if (WARN_ON(!node))

return -ENODEV;

if (WARN_ON(gic_cnt >= CONFIG_ARM_GIC_MAX_NR))

return -EINVAL;

gic = &gic_data[gic_cnt];

ret = gic_of_setup(gic, node);

if (ret)

return ret;

/*

* Disable split EOI/Deactivate if either HYP is not available

* or the CPU interface is too small.

*/

if (gic_cnt == 0 && !gic_check_eoimode(node, &gic->raw_cpu_base))

static_key_slow_dec(&supports_deactivate);

ret = __gic_init_bases(gic, -1, &node->fwnode);

if (ret) {

gic_teardown(gic);

return ret;

}

if (!gic_cnt) {

gic_init_physaddr(node);

gic_of_setup_kvm_info(node);

}

if (parent) {

irq = irq_of_parse_and_map(node, 0);

gic_cascade_irq(gic_cnt, irq);

}

if (IS_ENABLED(CONFIG_ARM_GIC_V2M))

gicv2m_init(&node->fwnode, gic_data[gic_cnt].domain);

gic_cnt++;

return 0;

}

1.5.2.1.1.1.1.1 gic_init_bases()

__gic_init_bases()->gic_init_bases()

static int gic_init_bases(struct gic_chip_data *gic, int irq_start,

struct fwnode_handle *handle)

{

irq_hw_number_t hwirq_base;

int gic_irqs, irq_base, ret;

if (IS_ENABLED(CONFIG_GIC_NON_BANKED) && gic->percpu_offset) {

/* Frankein-GIC without banked registers... */

unsigned int cpu;

gic->dist_base.percpu_base = alloc_percpu(void __iomem *);

gic->cpu_base.percpu_base = alloc_percpu(void __iomem *);

if (WARN_ON(!gic->dist_base.percpu_base ||

!gic->cpu_base.percpu_base)) {

ret = -ENOMEM;

goto error;

}

for_each_possible_cpu(cpu) {

u32 mpidr = cpu_logical_map(cpu);

u32 core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);

unsigned long offset = gic->percpu_offset * core_id;

*per_cpu_ptr(gic->dist_base.percpu_base, cpu) =

gic->raw_dist_base + offset;

*per_cpu_ptr(gic->cpu_base.percpu_base, cpu) =

gic->raw_cpu_base + offset;

}

gic_set_base_accessor(gic, gic_get_percpu_base);

} else {

/* Normal, sane GIC... */

WARN(gic->percpu_offset,

"GIC_NON_BANKED not enabled, ignoring %08x offset!",

gic->percpu_offset);

gic->dist_base.common_base = gic->raw_dist_base;

gic->cpu_base.common_base = gic->raw_cpu_base;

gic_set_base_accessor(gic, gic_get_common_base);

}

/*

* Find out how many interrupts are supported.

* The GIC only supports up to 1020 interrupt sources.

*/

gic_irqs = readl_relaxed(gic_data_dist_base(gic) + GIC_DIST_CTR) & 0x1f;

gic_irqs = (gic_irqs + 1) * 32;

if (gic_irqs > 1020)

gic_irqs = 1020;

gic->gic_irqs = gic_irqs;

if (handle) { /* DT/ACPI */

gic->domain = irq_domain_create_linear(handle, gic_irqs,

&gic_irq_domain_hierarchy_ops,

gic);

} else { /* Legacy support */

/*

* For primary GICs, skip over SGIs.

* For secondary GICs, skip over PPIs, too.

*/

if (gic == &gic_data[0] && (irq_start & 31) > 0) {

hwirq_base = 16;

if (irq_start != -1)

irq_start = (irq_start & ~31) + 16;

} else {

hwirq_base = 32;

}

gic_irqs -= hwirq_base; /* calculate # of irqs to allocate */

irq_base = irq_alloc_descs(irq_start, 16, gic_irqs,

numa_node_id());

if (irq_base < 0) {

WARN(1, "Cannot allocate irq_descs @ IRQ%d, assuming pre-allocated\n",

irq_start);

irq_base = irq_start;

}

gic->domain = irq_domain_add_legacy(NULL, gic_irqs, irq_base,

hwirq_base, &gic_irq_domain_ops, gic);

}

if (WARN_ON(!gic->domain)) {

ret = -ENODEV;

goto error;

}

gic_dist_init(gic);

ret = gic_cpu_init(gic);

if (ret)

goto error;

ret = gic_pm_init(gic);

if (ret)

goto error;

return 0;

error:

if (IS_ENABLED(CONFIG_GIC_NON_BANKED) && gic->percpu_offset) {

free_percpu(gic->dist_base.percpu_base);

free_percpu(gic->cpu_base.percpu_base);

}

return ret;

}
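The "Find out how many interrupts are supported" step in gic_init_bases() is plain arithmetic on the low five bits of the distributor type register. The helper below repeats that calculation in isolation; `gic_irq_count()` is a name made up for this sketch, and the argument is whatever value was read from GIC_DIST_CTR.

```c
#include <stdint.h>

/* Number of interrupt IDs supported, derived from the distributor type
 * register: (low 5 bits + 1) * 32, capped at 1020. */
unsigned int gic_irq_count(uint32_t dist_ctr)
{
    unsigned int lines = dist_ctr & 0x1f;
    unsigned int irqs = (lines + 1) * 32;

    if (irqs > 1020)
        irqs = 1020;
    return irqs;
}
```

A field value of 4, for instance, means 160 interrupt IDs, and the maximum field value of 31 would give 1024, which is why the result is capped at 1020.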

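The main job of this code is to register an irq domain with the system. Why is struct irq_domain needed? From the Linux kernel's point of view, an interrupt from any external device is an asynchronous event that the kernel must identify. The kernel uses an IRQ number to identify a particular interrupt request of a particular device, and the IRQ number leads to the interrupt descriptor (struct irq_desc). The interrupt controller, however, knows nothing about IRQ numbers. It only knows the hardware interrupt number (the controller numbers the interrupt sources it supports, and that number is called the hardware interrupt number). Since different software modules identify an interrupt source with different IDs, a mapping is required. How is a hardware interrupt number mapped to an IRQ number? That takes a translation object, which the kernel defines as struct irq_domain.

Each interrupt controller forms an irq domain that is responsible for decoding its downstream interrupt sources. When interrupt controllers are cascaded, a non-root interrupt controller is also an ordinary interrupt source in its parent's irq domain. struct irq_domain is defined as follows: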

struct irq_domain {

......

const struct irq_domain_ops *ops;

void *host_data;

......

};
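The translation role of an irq domain can be shown with a tiny linear map from hardware interrupt number to IRQ number. This is a sketch with invented names (`struct model_domain`, `model_create_mapping()`, `model_find_mapping()`); it follows the convention that IRQ number 0 means "no mapping" and hands out IRQ numbers sequentially, which is a simplification of what the kernel's allocator does.

```c
#include <stddef.h>

#define DOMAIN_SIZE 64  /* number of hwirqs this model domain covers */

struct model_domain {
    unsigned int linear_revmap[DOMAIN_SIZE];  /* hwirq -> IRQ number, 0 = unmapped */
    unsigned int next_virq;                   /* next IRQ number to hand out */
};

/* first_virq must be non-zero, because 0 is used for "unmapped". */
void model_domain_init(struct model_domain *d, unsigned int first_virq)
{
    for (size_t i = 0; i < DOMAIN_SIZE; i++)
        d->linear_revmap[i] = 0;
    d->next_virq = first_virq;
}

/* Create (or reuse) the mapping for hwirq; returns the IRQ number or 0. */
unsigned int model_create_mapping(struct model_domain *d, unsigned int hwirq)
{
    if (hwirq >= DOMAIN_SIZE)
        return 0;
    if (d->linear_revmap[hwirq] == 0)
        d->linear_revmap[hwirq] = d->next_virq++;
    return d->linear_revmap[hwirq];
}

/* Look up an existing mapping, as irq_find_mapping() does in the
 * interrupt path; returns 0 if there is none. */
unsigned int model_find_mapping(const struct model_domain *d, unsigned int hwirq)
{
    return hwirq < DOMAIN_SIZE ? d->linear_revmap[hwirq] : 0;
}
```

The lookup side is what __handle_domain_irq() relies on: it receives a hwirq from the controller and needs the IRQ number to find the right irq_desc.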

Registering the GIC irq domain also involves another important data structure of type struct irq_domain_ops. For the GIC, the irq domain operations are gic_irq_domain_ops, defined as follows:

static const struct irq_domain_ops gic_irq_domain_ops = {

.map = gic_irq_domain_map,

.unmap = gic_irq_domain_unmap,

};

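The irq domain is a concept of the generic interrupt subsystem.

Analysis of the irq domain callbacks: gic_irq_domain_map is the callback invoked when a mapping between an IRQ number and a GIC hardware interrupt ID is created. The code is as follows: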

static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,

irq_hw_number_t hw)

{

struct gic_chip_data *gic = d->host_data;

if (hw < 32) {

irq_set_percpu_devid(irq);

irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,

handle_percpu_devid_irq, NULL, NULL);

irq_set_status_flags(irq, IRQ_NOAUTOEN);

} else {

irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,

handle_fasteoi_irq, NULL, NULL);

irq_set_probe(irq);

irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq)));

}

return 0;

}

This is where desc->handle_irq gets its value: it is set to handle_percpu_devid_irq or to handle_fasteoi_irq. Taking handle_percpu_devid_irq as an example:

/**

* handle_percpu_devid_irq - Per CPU local irq handler with per cpu dev ids

* @desc: the interrupt description structure for this irq

*

* Per CPU interrupts on SMP machines without locking requirements. Same as

* handle_percpu_irq() above but with the following extras:

*

* action->percpu_dev_id is a pointer to percpu variables which

* contain the real device id for the cpu on which this handler is

* called

*/

void handle_percpu_devid_irq(struct irq_desc *desc)

{

struct irq_chip *chip = irq_desc_get_chip(desc);

struct irqaction *action = desc->action;

unsigned int irq = irq_desc_get_irq(desc);

irqreturn_t res;

kstat_incr_irqs_this_cpu(desc);

if (chip->irq_ack)

chip->irq_ack(&desc->irq_data);

if (likely(action)) {

trace_irq_handler_entry(irq, action);

res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));

trace_irq_handler_exit(irq, action, res);

} else {

unsigned int cpu = smp_processor_id();

bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);

if (enabled)

irq_percpu_disable(desc, cpu);

pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",

enabled ? " and unmasked" : "", irq, cpu);

}

if (chip->irq_eoi)

chip->irq_eoi(&desc->irq_data);

}
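The order of operations in handle_percpu_devid_irq() is: optional chip ack, the registered handler, optional chip EOI. The sketch below records that order in a small trace buffer. Everything here is hypothetical (`struct trace`, `struct model_chip`, `model_percpu_flow()`), and the chip callbacks take the trace instead of struct irq_data so that the example stays self-contained.

```c
#include <stddef.h>

struct trace {
    char buf[8];
    size_t len;
};

/* Append one character, leaving room for the terminating NUL. */
static void trace_put(struct trace *t, char c)
{
    if (t->len + 1 < sizeof(t->buf)) {
        t->buf[t->len++] = c;
        t->buf[t->len] = '\0';
    }
}

struct model_chip {
    void (*irq_ack)(struct trace *t);  /* optional, may be NULL */
    void (*irq_eoi)(struct trace *t);  /* optional, may be NULL */
};

void model_ack(struct trace *t) { trace_put(t, 'A'); }
void model_eoi(struct trace *t) { trace_put(t, 'E'); }
void model_isr(struct trace *t) { trace_put(t, 'H'); }

/* Same call order as the flow handler: ack, handler, eoi. */
void model_percpu_flow(const struct model_chip *chip,
                       void (*handler)(struct trace *t), struct trace *t)
{
    if (chip->irq_ack)
        chip->irq_ack(t);
    if (handler)
        handler(t);
    if (chip->irq_eoi)
        chip->irq_eoi(t);
}
```

A chip that provides only the EOI callback produces the sequence handler then EOI, while a chip with both callbacks produces ack, handler, EOI. The NULL checks are what let one flow handler serve controllers with different capabilities.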

And this is how the service routine we registered finally gets called.
