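The concept of an interrupt
An interrupt occurs when, while the CPU is running normally, an internal event, an external event, or an event scheduled in advance by the program makes the CPU suspend the program it is executing and switch to a routine that services that event. When the service routine finishes, the CPU returns to the suspended program and continues. Linux usually distinguishes external interrupts (also called hardware interrupts) from internal interrupts (also called exceptions).
After software has configured a piece of hardware, it usually waits for the hardware to reach some state (for example, data has arrived). There are two ways to wait. One is polling: the CPU reads the hardware status over and over. The other is for the hardware to raise an interrupt once the event has completed, so that the CPU stops what it is doing and handles the interrupt. Because the CPU is free to do other work while it waits, the interrupt-driven approach improves system throughput.
When the CPU receives an interrupt request (IRQ), it runs the handler for that interrupt, the interrupt service routine (ISR). Typically there is an interrupt vector table that holds the entry point of the handler for each peripheral interrupt source; when an interrupt occurs, the CPU jumps straight to the matching entry and starts executing. Code that runs this way is in interrupt context. (Note: code in interrupt context must not block or sleep.)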
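The lookup through a vector table can be pictured with a small user-space model. This is only a sketch: `vector_table`, `register_isr`, `raise_irq` and `uart_isr` are hypothetical names invented for the illustration, and on real hardware the CPU performs the jump itself rather than calling a C function.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS 8                    /* hypothetical table size */

typedef void (*isr_t)(void);

static isr_t vector_table[NR_VECTORS];  /* one entry per interrupt source */
static int uart_events;                 /* state touched by the example ISR */

/* Example ISR: just counts how often it ran. */
static void uart_isr(void)
{
    uart_events++;
}

/* Install a handler for one interrupt number; returns 0 on success. */
static int register_isr(unsigned int irq, isr_t isr)
{
    if (irq >= NR_VECTORS || isr == NULL)
        return -1;
    vector_table[irq] = isr;
    return 0;
}

/* Model of "the CPU jumps to the entry": look up the ISR and call it. */
static int raise_irq(unsigned int irq)
{
    if (irq >= NR_VECTORS || vector_table[irq] == NULL)
        return -1;                      /* nothing installed for this source */
    vector_table[irq]();
    return 0;
}
```

Calling `register_isr(3, uart_isr)` followed by `raise_irq(3)` runs the handler once; raising a source with no installed handler is reported as an error instead of jumping to a null pointer.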
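Linux interrupts: top half and bottom half
Anyone who has worked with an MCU knows that an interrupt service routine should do its job quickly and return, because the rest of the system is held up while it runs. Still, some things have to be done inside the ISR, for example clearing the interrupt flag, reading or writing data, and accessing registers.
Linux has the same requirement: the ISR should finish as soon as possible. In practice, though, some ISRs have a lot of work to do and take a long time, which hurts responsiveness. To deal with this, Linux splits interrupt handling into two parts: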
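- Top half: runs immediately when the interrupt arrives and is under strict time limits. It does only the essential work, such as acknowledging or resetting the hardware, and it does so with interrupts disabled.
- Bottom half: work that can be postponed is done here. At a suitable later time the bottom half runs with interrupts enabled. (The specific mechanisms, softirqs, tasklets and workqueues, are analyzed in the following chapters.)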
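The split can be sketched as a producer and a consumer sharing a small queue. This is a user-space model under stated assumptions, not kernel code: `top_half`, `bottom_half`, `rx_queue` and the queue size are hypothetical, and real bottom halves are scheduled through softirqs, tasklets or workqueues rather than called by hand.

```c
#include <assert.h>

#define QUEUE_LEN 16                 /* hypothetical ring size */

static unsigned char rx_queue[QUEUE_LEN];
static unsigned int head, tail;      /* head: producer index, tail: consumer index */
static unsigned int processed_sum;   /* result of the deferred work */

/* Top half: only grabs the data from the "hardware" and queues it. */
static int top_half(unsigned char hw_byte)
{
    if (head - tail == QUEUE_LEN)
        return -1;                   /* queue full, byte dropped */
    rx_queue[head % QUEUE_LEN] = hw_byte;
    head++;
    return 0;
}

/* Bottom half: runs later and does the time-consuming part.
 * Returns the number of queued items it handled. */
static unsigned int bottom_half(void)
{
    unsigned int n = 0;

    while (tail != head) {
        processed_sum += rx_queue[tail % QUEUE_LEN];
        tail++;
        n++;
    }
    return n;
}
```

The design point is that `top_half` has a fixed, small cost per interrupt, while the cost of `bottom_half` grows with the amount of queued work and can therefore be paid when interrupts are enabled again.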
Interrupt handlers
A driver can use the following interface:
static inline int __must_check request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,const char *name, void *dev)
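to ask the system to register an interrupt handler. The parameters are:
Parameter   Meaning
irq         The interrupt number. The interrupt numbers of a given CPU are normally defined in advance.
handler     The ISR to run when the interrupt occurs.
flags       Interrupt flags (IRQF_DISABLED / IRQF_SAMPLE_RANDOM / IRQF_TIMER / IRQF_SHARED).
name        ASCII name of the device associated with the interrupt, for example "keyboard". These names are used under /proc/irq and in /proc/interrupts.
dev         Used for shared interrupt lines; pass the driver's device structure. For a non-shared interrupt it can simply be NULL.
Meaning of the interrupt flags:
Flag                  Meaning
IRQF_DISABLED         The kernel disables other interrupts while this ISR runs (rarely used). This is a historical flag that is no longer present in recent kernels.
IRQF_SAMPLE_RANDOM    Interrupts from this device contribute to the kernel entropy pool. This is also a historical flag that is no longer present in recent kernels.
IRQF_TIMER            Reserved for the system timer.
IRQF_SHARED           The interrupt line is shared among several handlers. Every handler registered on the same line must set this flag.
On success request_irq returns 0. A common error is -EBUSY, which means the given interrupt line is already in use and sharing was not requested with IRQF_SHARED (by the new handler or by the one already installed).
Note: request_irq may sleep, so it must not be called from interrupt context or from any other code that is not allowed to sleep.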
Releasing an interrupt:
const void *free_irq(unsigned int irq, void *dev_id) // releases a previously registered interrupt handler
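The sharing rule behind -EBUSY can be illustrated with a user-space model of a single interrupt line. Everything here is an assumption made for the sketch: `model_request_irq`, `model_free_irq`, `struct model_line`, `MODEL_IRQF_SHARED` and the per-line limit are hypothetical and only imitate the behaviour described above; a real driver calls request_irq() and free_irq() from kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_IRQF_SHARED 0x80UL   /* stand-in for IRQF_SHARED */
#define MAX_ACTIONS 4              /* hypothetical limit of this model */

struct model_line {
    unsigned long flags[MAX_ACTIONS];
    void *dev_id[MAX_ACTIONS];
    int count;                     /* handlers currently installed */
};

/* Register a handler on the line, following the -EBUSY rule. */
static int model_request_irq(struct model_line *l, unsigned long flags, void *dev)
{
    if ((flags & MODEL_IRQF_SHARED) && dev == NULL)
        return -EINVAL;            /* a shared handler needs a unique cookie */
    if (l->count > 0) {
        /* Line already in use: old and new users must both agree to share. */
        if (!(flags & MODEL_IRQF_SHARED) ||
            !(l->flags[0] & MODEL_IRQF_SHARED))
            return -EBUSY;
        if (l->count == MAX_ACTIONS)
            return -ENOMEM;        /* model-only limit */
    }
    l->flags[l->count] = flags;
    l->dev_id[l->count] = dev;
    l->count++;
    return 0;
}

/* Remove the handler identified by dev; returns 0 if it was found. */
static int model_free_irq(struct model_line *l, void *dev)
{
    for (int i = 0; i < l->count; i++) {
        if (l->dev_id[i] == dev) {
            /* Fill the hole with the last entry. */
            l->flags[i] = l->flags[l->count - 1];
            l->dev_id[i] = l->dev_id[l->count - 1];
            l->count--;
            return 0;
        }
    }
    return -ENOENT;
}
```

The `dev` cookie is what lets `model_free_irq` pick the right handler out of several on a shared line, which is why a shared registration with a NULL cookie is rejected.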
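Note: interrupt handlers in Linux do not need to be reentrant. While a given handler is running, its interrupt line is masked on all processors, so a second interrupt on the same line cannot be delivered. Normally all other interrupts stay enabled, so interrupts on other lines can still be handled, but the current line is always disabled. A handler is therefore never nested inside itself. In addition, Linux on ARM does not make use of interrupt priorities and does not use FIQ, so interrupt nesting is not supported there.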
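Interrupt context
Unlike process context, the kernel is in interrupt context while it executes an interrupt service routine. In the traditional design the handler has no stack of its own and uses the kernel stack of the interrupted task, whose size is limited (8 KB on 32-bit machines), so the handler has to be short and lean. (On arm64, the init_irq_stacks() call that appears later in init_IRQ() sets up separate IRQ stacks.) An ISR also interrupts the normal flow of the program, which is a second reason it must execute quickly. Finally, sleeping and blocking are not allowed in interrupt context.
The reasons interrupt context cannot sleep are:
1. No process switch should happen during interrupt handling. In interrupt context the only thing that can preempt the current handler is a higher-priority interrupt; it is never preempted by a process. If the handler went to sleep there would be no way to wake it, because every wake_up_xxx function operates on a process, and interrupt context has no notion of a process and no task_struct of its own (the same holds for softirqs and tasklets). So if it really did sleep, for example by calling a routine that blocks, the kernel would almost certainly die.
2. When schedule() switches processes it saves the current process context (CPU register values, process state and stack contents) so the process can be resumed later. When an interrupt occurs, the kernel first saves the context of the interrupted process and restores it after the handler returns. Inside the handler, however, the CPU registers have already changed (most importantly the program counter PC and the stack pointer SP). If schedule() were called at this point because of a sleeping or blocking operation, the context it saved would no longer be that of the current process. schedule() therefore must not be called from an interrupt handler.
3. The kernel's schedule() function itself checks on entry whether it is running in interrupt context: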
if(unlikely(in_interrupt()))
BUG();
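So forcing a call to schedule() simply results in a kernel BUG.
4. The interrupt handler uses the kernel stack of the interrupted process but leaves it unaffected: when the handler finishes, it has completely released the part of the stack it used, and the stack looks exactly as it did before the interrupt.
5. While in interrupt context the kernel is not preemptible, so if the handler sleeps, the kernel is certain to hang.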
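Reason 3 can be modelled in a few lines of user-space C. This is a sketch with hypothetical names (`hardirq_depth`, `model_in_interrupt`, `model_schedule`, `model_run_in_irq`); the real in_interrupt() inspects the per-task preempt count, and the real schedule() calls BUG() rather than returning an error code.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int hardirq_depth;   /* hypothetical stand-in for the hardirq count */

/* True while some handler is running. */
static bool model_in_interrupt(void)
{
    return hardirq_depth != 0;
}

/* Returns 0 where scheduling is allowed, -1 where the kernel would BUG(). */
static int model_schedule(void)
{
    if (model_in_interrupt())
        return -1;
    return 0;
}

/* Run a handler body in "interrupt context" and report what it returned. */
static int model_run_in_irq(int (*body)(void))
{
    int ret;

    hardirq_depth++;                 /* entering interrupt context */
    ret = body();
    hardirq_depth--;                 /* leaving interrupt context */
    return ret;
}
```

Calling `model_schedule()` directly succeeds, while `model_run_in_irq(model_schedule)` fails, which mirrors the check shown in reason 3.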
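Interrupt handling flow
When an interrupt occurs, the CPU executes the code at the exception vector vector_irq, that is, the IRQ entry in the exception vector table. This entry is a branch instruction that jumps to the real interrupt handling code, and vector_irq eventually calls the main entry function for interrupt handling.
On ARM64, exception levels 1, 2 and 3 each have their own exception vector table. The starting virtual address of the table is held in the register VBAR_ELn (Vector Base Address Register). Each table has 16 entries in 4 groups of 4, and each entry is 128 bytes long (room for 32 instructions). The table for exception level n is laid out as follows.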
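Exception vector table for exception level n
Address            Exception type   Description
VBAR_ELn + 0x000   Synchronous      Exception generated at the current exception level, using SP_EL0 (the stack pointer of exception level 0)
VBAR_ELn + 0x080   IRQ              same as above
VBAR_ELn + 0x100   FIQ              same as above
VBAR_ELn + 0x180   SError           same as above
VBAR_ELn + 0x200   Synchronous      Exception generated at the current exception level, using SP_ELn (the stack pointer of the current exception level)
VBAR_ELn + 0x280   IRQ              same as above
VBAR_ELn + 0x300   FIQ              same as above
VBAR_ELn + 0x380   SError           same as above
VBAR_ELn + 0x400   Synchronous      Exception generated by a 64-bit program at exception level (n-1)
VBAR_ELn + 0x480   IRQ              same as above
VBAR_ELn + 0x500   FIQ              same as above
VBAR_ELn + 0x580   SError           same as above
VBAR_ELn + 0x600   Synchronous      Exception generated by a 32-bit program at exception level (n-1)
VBAR_ELn + 0x680   IRQ              same as above
VBAR_ELn + 0x700   FIQ              same as above
VBAR_ELn + 0x780   SError           same as above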
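The offsets in the table follow a simple rule: 4 groups of 4 entries, 128 bytes each. The helper below reproduces them; the enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* The four groups of the table, in address order. */
enum vec_group { CUR_EL_SP0, CUR_EL_SPX, LOWER_EL_A64, LOWER_EL_A32 };

/* The four exception types inside each group, in address order. */
enum vec_type { VEC_SYNC, VEC_IRQ, VEC_FIQ, VEC_SERROR };

#define VEC_ENTRY_SIZE 0x80u    /* 128 bytes = 32 instructions of 4 bytes */
#define VEC_TABLE_SIZE (16u * VEC_ENTRY_SIZE)

/* Offset of one entry relative to VBAR_ELn. */
static unsigned int vector_offset(enum vec_group g, enum vec_type t)
{
    return ((unsigned int)g * 4u + (unsigned int)t) * VEC_ENTRY_SIZE;
}
```

For example, an IRQ taken while the kernel itself is running at EL1 on SP_EL1 lands at `vector_offset(CUR_EL_SPX, VEC_IRQ)`, which is 0x280. The whole table occupies 16 × 128 = 2048 bytes, which matches the `.align 11` (2^11) in front of the kernel's `vectors` symbol.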
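The exception vector table defined by the ARM64 kernel is shown below.
This material was covered in the system call chapter of "Linux应用层和内核交互" (interaction between Linux user space and the kernel); only the parts related to interrupts are listed here.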
arch/arm64/kernel/entry.S:
/*
* Exception vectors.
*/
.pushsection ".entry.text", "ax"
.align 11
ENTRY(vectors)
kernel_ventry 1, sync_invalid // synchronous exception from EL1, using SP_EL0
kernel_ventry 1, irq_invalid // IRQ from EL1, using SP_EL0
kernel_ventry 1, fiq_invalid // FIQ from EL1, using SP_EL0
kernel_ventry 1, error_invalid // SError from EL1, using SP_EL0
kernel_ventry 1, sync // synchronous exception from EL1, using SP_EL1
kernel_ventry 1, irq // IRQ from EL1, using SP_EL1
kernel_ventry 1, fiq_invalid // FIQ from EL1, using SP_EL1
kernel_ventry 1, error_invalid // SError from EL1, using SP_EL1
kernel_ventry 0, sync // synchronous exception from a 64-bit program at EL0
kernel_ventry 0, irq // IRQ taken while a 64-bit program runs at EL0
kernel_ventry 0, fiq_invalid // FIQ taken while a 64-bit program runs at EL0
kernel_ventry 0, error_invalid // SError taken while a 64-bit program runs at EL0
#ifdef CONFIG_COMPAT
kernel_ventry 0, sync_compat, 32 // synchronous exception from a 32-bit program at EL0
kernel_ventry 0, irq_compat, 32 // IRQ taken while a 32-bit program runs at EL0
kernel_ventry 0, fiq_invalid_compat, 32 // FIQ taken while a 32-bit program runs at EL0
kernel_ventry 0, error_invalid_compat, 32 // SError taken while a 32-bit program runs at EL0
#else
kernel_ventry 0, sync_invalid, 32 // synchronous exception from a 32-bit program at EL0
kernel_ventry 0, irq_invalid, 32 // IRQ taken while a 32-bit program runs at EL0
kernel_ventry 0, fiq_invalid, 32 // FIQ taken while a 32-bit program runs at EL0
kernel_ventry 0, error_invalid, 32 // SError taken while a 32-bit program runs at EL0
#endif
#endif
END(vectors)
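kernel_ventry is a macro whose arguments select the branch target, that is, the label of the exception handler. It is defined as follows (arch/arm64/kernel/entry.S):
.macro kernel_ventry, el, label, regsize = 64
.align 7
sub sp, sp, #S_FRAME_SIZE // reserve one frame on the stack; S_FRAME_SIZE is the size of struct pt_regs
#ifdef CONFIG_VMAP_STACK
.... (stack overflow check omitted here)
#endif
b el\()\el\()_\label // branch to the handler for this level; for "kernel_ventry 1, irq" that is el1_irq
.endm
".align 7" aligns the address of the next instruction to 2^7, that is, to 128 bytes. For the entry "kernel_ventry 1, irq" in the vectors table, "b el\()\el\()_\label" branches to el1_irq. The first argument says which exception level the exception came from: 0 for user -> kernel, 1 for kernel -> kernel.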
Every CPU sets the exception vector base address during its initialization.
arch/arm64/kernel/head.S
__primary_switched:
adrp x4, init_thread_union
add sp, x4, #THREAD_SIZE
adr_l x5, init_task
msr sp_el0, x5 // Save thread_info
adr_l x8, vectors // load VBAR_EL1 with virtual
msr vbar_el1, x8 // vector table address
isb
stp xzr, x30, [sp, #-16]!
mov x29, sp
str_l x21, __fdt_pointer, x5 // Save FDT pointer
ldr_l x4, kimage_vaddr // Save the offset between
sub x4, x4, x0 // the kernel virtual and
str_l x4, kimage_voffset, x5 // physical mappings
// Clear BSS
adr_l x0, __bss_start
mov x1, xzr
adr_l x2, __bss_stop
sub x2, x2, x0
bl __pi_memset
dsb ishst // Make zero page visible to PTW
#ifdef CONFIG_KASAN
bl kasan_early_init
#endif
#ifdef CONFIG_RANDOMIZE_BASE
tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?
b.ne 0f
mov x0, x21 // pass FDT address in x0
bl kaslr_early_init // parse FDT for KASLR options
cbz x0, 0f // KASLR disabled? just proceed
orr x23, x23, x0 // record KASLR offset
ldp x29, x30, [sp], #16 // we must enable KASLR, return
ret // to __primary_switch()
0:
#endif
add sp, sp, #16
mov x29, #0
mov x30, #0
b start_kernel
ENDPROC(__primary_switched)
__secondary_switched:
adr_l x5, vectors // set the exception vector base address
msr vbar_el1, x5
isb
adr_l x0, secondary_data
ldr x1, [x0, #CPU_BOOT_STACK] // get secondary_data.stack
mov sp, x1
ldr x2, [x0, #CPU_BOOT_TASK]
msr sp_el0, x2
mov x29, #0
mov x30, #0
b secondary_start_kernel
ENDPROC(__secondary_switched)
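When an interrupt is raised, the GIC signals the target CPU. The CPU detects the interrupt signal and, through the exception vector table, jumps to el1_irq.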
arch/arm64/kernel/entry.S
el1_irq:
kernel_entry 1
enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
#endif
irq_handler
#ifdef CONFIG_PREEMPT
get_thread_info tsk
ldr w24, [tsk, #TI_PREEMPT] // get preempt count
cbnz w24, 1f // preempt count != 0
ldr x0, [tsk, #TI_FLAGS] // get flags
tbz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling?
bl el1_preempt
1:
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_on
#endif
kernel_exit 1
ENDPROC(el1_irq)
/*
* Interrupt handling.
*/
.macro irq_handler
#ifdef CONFIG_STRICT_MEMORY_RWX
ldr x1, =handle_arch_irq
ldr x1, [x1]
#else
ldr x1, handle_arch_irq
#endif
mov x0, sp
blr x1
.endm
.text
arch/arm64/kernel/irq.c
void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
if (handle_arch_irq)
return;
handle_arch_irq = handle_irq;
}
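The "first caller wins" behaviour of set_handle_irq() and the way the irq_handler macro passes the saved registers to the installed hook can be modelled in user space. All names below (`model_handle_arch_irq`, `model_set_handle_irq`, `model_irq_entry`, `struct model_regs`, the two example handlers) are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

struct model_regs { unsigned long pc; };        /* stand-in for struct pt_regs */
typedef void (*arch_irq_fn)(struct model_regs *);

static arch_irq_fn model_handle_arch_irq;       /* plays the role of handle_arch_irq */
static int root_calls, other_calls;

static void root_handler(struct model_regs *r)  { (void)r; root_calls++; }
static void other_handler(struct model_regs *r) { (void)r; other_calls++; }

/* Same shape as set_handle_irq(): a hook that is already set is kept. */
static void model_set_handle_irq(arch_irq_fn fn)
{
    if (model_handle_arch_irq)
        return;
    model_handle_arch_irq = fn;
}

/* What the irq_handler macro does: hand the saved registers to the hook. */
static void model_irq_entry(struct model_regs *regs)
{
    if (model_handle_arch_irq)
        model_handle_arch_irq(regs);
}
```

Because a later registration is silently ignored, only the root interrupt controller, which is initialized first, ends up handling the low-level IRQ entry.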
When the GICv2 interrupt controller is initialized, it calls set_handle_irq(gic_handle_irq);
dtb:
gic: interrupt-controller@1400000 {
compatible = "arm,gic-400";
#interrupt-cells = <3>;
interrupt-controller;
reg = <0x0 0x1401000 0 0x1000>, /* GICD */
<0x0 0x1402000 0 0x2000>, /* GICC */
<0x0 0x1404000 0 0x2000>, /* GICH */
<0x0 0x1406000 0 0x2000>; /* GICV */
interrupts = <1 9 0xf08>;
};
IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);
Call path that installs the handler: gic_of_init()->__gic_init_bases()->set_handle_irq(gic_handle_irq);
static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
u32 irqstat, irqnr;
struct gic_chip_data *gic = &gic_data[0];
void __iomem *cpu_base = gic_data_cpu_base(gic);
do {
irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);
irqnr = irqstat & GICC_IAR_INT_ID_MASK;
if (likely(irqnr > 15 && irqnr < 1020)) {
if (static_key_true(&supports_deactivate))
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
isb();
handle_domain_irq(gic->domain, irqnr, regs); // call the handler for this interrupt
continue;
}
if (irqnr < 16) {
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
if (static_key_true(&supports_deactivate))
writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE);
#ifdef CONFIG_SMP
/*
* Ensure any shared data written by the CPU sending
* the IPI is read after we've read the ACK register
* on the GIC.
*
* Pairs with the write barrier in gic_raise_softirq
*/
smp_rmb();
handle_IPI(irqnr, regs); // SMP inter-processor interrupt (IPI)
#endif
continue;
}
break;
} while (1);
}
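The two branches in gic_handle_irq() depend on the interrupt ID read from the acknowledge register. The sketch below decodes such a value in user space. `MODEL_INT_ID_MASK` mirrors the 10-bit ID field that GICC_IAR_INT_ID_MASK selects; the enum and the function names are hypothetical, and the split of 16..1019 into PPIs (16..31) and SPIs (32..1019) is the usual GIC numbering rather than something the code above spells out.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_INT_ID_MASK 0x3ffu   /* low 10 bits hold the interrupt ID */

enum gic_class { GIC_SGI, GIC_PPI, GIC_SPI, GIC_SPECIAL };

/* Extract the interrupt ID from a raw acknowledge-register value. */
static uint32_t iar_to_irqnr(uint32_t irqstat)
{
    return irqstat & MODEL_INT_ID_MASK;
}

/* Classify an interrupt ID the way the handler's two branches do. */
static enum gic_class classify_irqnr(uint32_t irqnr)
{
    if (irqnr < 16)
        return GIC_SGI;      /* 0..15: software-generated, used for IPIs */
    if (irqnr < 32)
        return GIC_PPI;      /* 16..31: private to one CPU */
    if (irqnr < 1020)
        return GIC_SPI;      /* 32..1019: shared peripheral interrupts */
    return GIC_SPECIAL;      /* 1020..1023: special IDs, e.g. spurious */
}
```

IDs 16..1019 take the `handle_domain_irq()` branch, IDs below 16 take the `handle_IPI()` branch, and anything else ends the loop.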
gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()
static inline int handle_domain_irq(struct irq_domain *domain,
unsigned int hwirq, struct pt_regs *regs)
{
return __handle_domain_irq(domain, hwirq, true, regs);
}
/**
* __handle_domain_irq - Invoke the handler for a HW irq belonging to a domain
* @domain: The domain where to perform the lookup
* @hwirq: The HW irq number to convert to a logical one
* @lookup: Whether to perform the domain lookup or not
* @regs: Register file coming from the low-level handling code
*
* Returns: 0 on success, or -EINVAL if conversion has failed
*/
int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
bool lookup, struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
unsigned int irq = hwirq;
int ret = 0;
irq_enter();
#ifdef CONFIG_IRQ_DOMAIN
if (lookup)
irq = irq_find_mapping(domain, hwirq);
#endif
/*
* Some hardware gives randomly wrong interrupts. Rather
* than crashing, do something sensible.
*/
if (unlikely(!irq || irq >= nr_irqs)) {
ack_bad_irq(irq);
ret = -EINVAL;
} else {
generic_handle_irq(irq);
}
irq_exit();
set_irq_regs(old_regs);
return ret;
}
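Note the following points.
First, irq_enter is called to mark that a hardware interrupt is being handled.
irq_enter updates some system statistics, and its __irq_enter macro disables preemption. When an IRQ is taken, ARM automatically sets the I bit in CPSR (PSTATE.I on ARM64), which masks new IRQ requests until control reaches the flow-control layer and local_irq_enable() turns them back on. So why disable preemption as well? Because of nested interrupts. Once the flow-control layer or a driver re-enables IRQs with local_irq_enable while the current interrupt is still being processed, a new IRQ can arrive and the code enters irq_enter again. When that nested interrupt returns, the kernel does not want to perform a preemptive reschedule; it wants to wait until the outermost interrupt has finished. That is why preemption is disabled here.
Then generic_handle_irq() is called, and finally irq_exit removes the mark that a hardware interrupt is being handled.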
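The "only the outermost interrupt may trigger a reschedule" idea can be sketched as a nesting counter. This is a simplified user-space model with hypothetical names (`irq_depth`, `need_resched`, `model_irq_enter`, `model_irq_exit`); in the real kernel the count lives in the preempt counter and the reschedule decision is taken on the interrupt return path, not inside irq_exit() itself.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int irq_depth;    /* hypothetical nesting counter */
static bool need_resched;         /* set by a handler that woke up a task */
static unsigned int reschedules;  /* how often the model "called the scheduler" */

/* Entering a hardware interrupt: one more level of nesting. */
static void model_irq_enter(void)
{
    irq_depth++;
}

/* Leaving a hardware interrupt: act on a pending request only
 * when the outermost level is being left. */
static void model_irq_exit(void)
{
    irq_depth--;
    if (irq_depth == 0 && need_resched) {
        need_resched = false;
        reschedules++;
    }
}
```

With two nested entries and a reschedule request raised by the inner one, the first `model_irq_exit()` does nothing and the second one performs the single reschedule.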
gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()->generic_handle_irq()
/**
* generic_handle_irq - Invoke the handler for a particular irq
* @irq: The irq number to handle
*
*/
int generic_handle_irq(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq);
if (!desc)
return -EINVAL;
generic_handle_irq_desc(desc);
return 0;
}
irq_to_desc first uses the number of the interrupt that fired to fetch its interrupt descriptor, struct irq_desc, and then generic_handle_irq_desc is called:
gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()->generic_handle_irq()->generic_handle_irq_desc()
/*
* Architectures call this to let the generic IRQ layer
* handle an interrupt.
*/
static inline void generic_handle_irq_desc(struct irq_desc *desc)
{
desc->handle_irq(desc);
}
This calls the handle_irq function. To complete the picture of the flow above, irq_to_desc still needs to be examined:
struct irq_desc *irq_to_desc(unsigned int irq)
{
return (irq < NR_IRQS) ? irq_desc + irq : NULL;
}
NR_IRQS is the total number of interrupts supported, so irq must be smaller than NR_IRQS; in that case the function returns irq_desc + irq.
irq_desc is a global array:
struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
	[0 ... NR_IRQS-1] = {
		.handle_irq = handle_bad_irq,
		.depth = 1,
		.lock = __RAW_SPIN_LOCK_UNLOCKED(irq_desc->lock),
	}
};
This is where the array is initialized. Every handle_irq pointer starts out as handle_bad_irq.
Attentive readers may have noticed that desc->handle_irq(desc) is not the interrupt handler we registered, since the two functions do not even have the same prototype. handle_irq has the type irq_flow_handler_t, whereas the service routine we register has the type irq_handler_t. The two are clearly different things, so the analysis has to continue.
1.5.1 Interrupt-related data structures
Linux has 3 main interrupt-related data structures:
Structure   Purpose
irq_desc    Software-level description of the resources of an IRQ
irqaction   Generic actions of an IRQ
irq_chip    Chip-specific implementation
1.5.1.1 struct irq_desc
The irq_desc structure is as follows:
/**
 * struct irq_desc - interrupt descriptor
 * @irq_common_data: per irq and chip data passed down to chip functions
 * @kstat_irqs: irq stats per cpu
 * @handle_irq: highlevel irq-events handler
 * @preflow_handler: handler called before the flow handler (currently used by sparc)
 * @action: the irq action chain
 * @status: status information
 * @core_internal_state__do_not_mess_with_it: core internal status information
 * @depth: disable-depth, for nested irq_disable() calls
 * @wake_depth: enable depth, for multiple irq_set_irq_wake() callers
 * @irq_count: stats field to detect stalled irqs
 * @last_unhandled: aging timer for unhandled count
 * @irqs_unhandled: stats field for spurious unhandled interrupts
 * @threads_handled: stats field for deferred spurious detection of threaded handlers
 * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
 * @lock: locking for SMP
 * @affinity_hint: hint to user space for preferred irq affinity
 * @affinity_notify: context for notification of affinity changes
 * @pending_mask: pending rebalanced interrupts
 * @threads_oneshot: bitfield to handle shared oneshot threads
 * @threads_active: number of irqaction threads currently running
 * @wait_for_threads: wait queue for sync_irq to wait for threaded handlers
 * @nr_actions: number of installed actions on this descriptor
 * @no_suspend_depth: number of irqactions on a irq descriptor with
 *			IRQF_NO_SUSPEND set
 * @force_resume_depth: number of irqactions on a irq descriptor with
 *			IRQF_FORCE_RESUME set
 * @rcu: rcu head for delayed free
 * @kobj: kobject used to represent this struct in sysfs
 * @request_mutex: mutex to protect request/free before locking desc->lock
 * @dir: /proc/irq/ procfs entry
 * @debugfs_file: dentry for the debugfs file
 * @name: flow handler name for /proc/interrupts output
 */
struct irq_desc {
	struct irq_common_data irq_common_data;
	struct irq_data irq_data;
	unsigned int __percpu *kstat_irqs;
	irq_flow_handler_t handle_irq;
#ifdef CONFIG_IRQ_PREFLOW_FASTEOI
	irq_preflow_handler_t preflow_handler;
#endif
	struct irqaction *action; /* IRQ action list */
	unsigned int status_use_accessors;
	unsigned int core_internal_state__do_not_mess_with_it;
	unsigned int depth; /* nested irq disables */
	unsigned int wake_depth; /* nested wake enables */
	unsigned int irq_count; /* For detecting broken IRQs */
	unsigned long last_unhandled; /* Aging timer for unhandled count */
	unsigned int irqs_unhandled;
	atomic_t threads_handled;
	int threads_handled_last;
	raw_spinlock_t lock;
	struct cpumask *percpu_enabled;
	const struct cpumask *percpu_affinity;
#ifdef CONFIG_SMP
	const struct cpumask *affinity_hint;
	struct irq_affinity_notify *affinity_notify;
#ifdef CONFIG_GENERIC_PENDING_IRQ
	cpumask_var_t pending_mask;
#endif
#endif
	unsigned long threads_oneshot;
	atomic_t threads_active;
	wait_queue_head_t wait_for_threads;
#ifdef CONFIG_PM_SLEEP
	unsigned int nr_actions;
	unsigned int no_suspend_depth;
	unsigned int cond_suspend_depth;
	unsigned int force_resume_depth;
#endif
#ifdef CONFIG_PROC_FS
	struct proc_dir_entry *dir;
#endif
#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
	struct dentry *debugfs_file;
#endif
#ifdef CONFIG_SPARSE_IRQ
	struct rcu_head rcu;
	struct kobject kobj;
#endif
	struct mutex request_mutex;
	int parent_irq;
	struct module *owner;
	const char *name;
} ____cacheline_internodealigned_in_smp;
1.5.1.2 struct irqaction
The irqaction structure is as follows:
/**
 * struct irqaction - per interrupt action descriptor
 * @handler: interrupt handler function
 * @name: name of the device
 * @dev_id: cookie to identify the device
 * @percpu_dev_id: cookie to identify the device
 * @next: pointer to the next irqaction for shared interrupts
 * @irq: interrupt number
 * @flags: flags (see IRQF_* above)
 * @thread_fn: interrupt handler function for threaded interrupts
 * @thread: thread pointer for threaded interrupts
 * @secondary: pointer to secondary irqaction (force threading)
 * @thread_flags: flags related to @thread
 * @thread_mask: bitmask for keeping track of @thread activity
 * @dir: pointer to the proc/irq/NN/name entry
 */
struct irqaction {
	irq_handler_t handler;
	void *dev_id;
	void __percpu *percpu_dev_id;
	struct irqaction *next;
	irq_handler_t thread_fn;
	struct task_struct *thread;
	struct irqaction *secondary;
	unsigned int irq;
	unsigned int flags;
	unsigned long thread_flags;
	unsigned long thread_mask;
	const char *name;
	struct proc_dir_entry *dir;
} ____cacheline_internodealigned_in_smp;
1.5.1.3 struct irq_chip
irq_chip is described as follows:
/**
 * struct irq_chip - hardware interrupt chip descriptor
 *
 * @parent_device: pointer to parent device for irqchip
 * @name: name for /proc/interrupts
 * @irq_startup: start up the interrupt (defaults to ->enable if NULL)
 * @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL)
 * @irq_enable: enable the interrupt (defaults to chip->unmask if NULL)
 * @irq_disable: disable the interrupt
 * @irq_ack: start of a new interrupt
 * @irq_mask: mask an interrupt source
 * @irq_mask_ack: ack and mask an interrupt source
 * @irq_unmask: unmask an interrupt source
 * @irq_eoi: end of interrupt
 * @irq_set_affinity: Set the CPU affinity on SMP machines. If the force
 *			argument is true, it tells the driver to
 *			unconditionally apply the affinity setting. Sanity
 *			checks against the supplied affinity mask are not
 *			required. This is used for CPU hotplug where the
 *			target CPU is not yet set in the cpu_online_mask.
 * @irq_retrigger: resend an IRQ to the CPU
 * @irq_set_type: set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ
 * @irq_set_wake: enable/disable power-management wake-on of an IRQ
 * @irq_bus_lock: function to lock access to slow bus (i2c) chips
 * @irq_bus_sync_unlock:function to sync and unlock slow bus (i2c) chips
 * @irq_cpu_online: configure an interrupt source for a secondary CPU
 * @irq_cpu_offline: un-configure an interrupt source for a secondary CPU
 * @irq_suspend: function called from core code on suspend once per
 *			chip, when one or more interrupts are installed
 * @irq_resume: function called from core code on resume once per chip,
 *			when one ore more interrupts are installed
 * @irq_pm_shutdown: function called from core code on shutdown once per chip
 * @irq_calc_mask: Optional function to set irq_data.mask for special cases
 * @irq_print_chip: optional to print special chip info in show_interrupts
 * @irq_request_resources: optional to request resources before calling
 *			any other callback related to this irq
 * @irq_release_resources: optional to release resources acquired with
 *			irq_request_resources
 * @irq_compose_msi_msg: optional to compose message content for MSI
 * @irq_write_msi_msg: optional to write message content for MSI
 * @irq_get_irqchip_state: return the internal state of an interrupt
 * @irq_set_irqchip_state: set the internal state of a interrupt
 * @irq_set_vcpu_affinity: optional to target a vCPU in a virtual machine
 * @ipi_send_single: send a single IPI to destination cpus
 * @ipi_send_mask: send an IPI to destination cpus in cpumask
 * @flags: chip specific flags
 */
struct irq_chip {
	struct device *parent_device;
	const char *name;
	unsigned int (*irq_startup)(struct irq_data *data);
	void (*irq_shutdown)(struct irq_data *data);
	void (*irq_enable)(struct irq_data *data);
	void (*irq_disable)(struct irq_data *data);
	void (*irq_ack)(struct irq_data *data);
	void (*irq_mask)(struct irq_data *data);
	void (*irq_mask_ack)(struct irq_data *data);
	void (*irq_unmask)(struct irq_data *data);
	void (*irq_eoi)(struct irq_data *data);
	int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
	int (*irq_retrigger)(struct irq_data *data);
	int (*irq_set_type)(struct irq_data *data, unsigned int flow_type);
	int (*irq_set_wake)(struct irq_data *data, unsigned int on);
	void (*irq_bus_lock)(struct irq_data *data);
	void (*irq_bus_sync_unlock)(struct irq_data *data);
	void (*irq_cpu_online)(struct irq_data *data);
	void (*irq_cpu_offline)(struct irq_data *data);
	void (*irq_suspend)(struct irq_data *data);
	void (*irq_resume)(struct irq_data *data);
	void (*irq_pm_shutdown)(struct irq_data *data);
	void (*irq_calc_mask)(struct irq_data *data);
	void (*irq_print_chip)(struct irq_data *data, struct seq_file *p);
	int (*irq_request_resources)(struct irq_data *data);
	void (*irq_release_resources)(struct irq_data *data);
	void (*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
	void (*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
	int (*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);
	int (*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
	int (*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);
	void (*ipi_send_single)(struct irq_data *data, unsigned int cpu);
	void (*ipi_send_mask)(struct irq_data *data, const struct cpumask *dest);
	unsigned long flags;
};
irq_chip is a set of chip-specific function pointers. The definition is very thorough: practically every operation that could come up for an IRQ is included. Each chip initializes this structure in its own code, and the structure is then plugged into the generic IRQ handling code, so that the software handling logic and the chip logic are kept completely separate.
Let us move on.
1.5.2 Initializing the chip-specific IRQ code
As is well known, at boot the C code starts at start_kernel, which calls the machine-specific IRQ initialization function init_IRQ():
1.5.2.1 init_IRQ()
asmlinkage __visible void __init start_kernel(void)
{
	char *command_line;
	char *after_dashes;
	.....
	early_irq_init();
	init_IRQ();
	.....
}
1.5.2.1.1 irqchip_init()
init_IRQ calls irqchip_init():
void __init init_IRQ(void)
{
	init_irq_stacks();
	irqchip_init();
	if (!handle_arch_irq)
		panic("No interrupt controller found.");
}
void __init irqchip_init(void)
{
	of_irq_init(__irqchip_of_table);
	acpi_probe_device_table(irqchip);
}
__irqchip_of_table is the start address of the kernel's irq chip table. The table holds the ID information of every interrupt controller the kernel supports and is used for matching against device nodes. By the time of_irq_init runs, the system has finished initializing the device tree, so all device nodes already form a tree in which each node represents one device. of_irq_init looks for interrupt controller nodes among all device nodes and arranges them into a tree of their own. (A system can have several interrupt controllers; the tree exists so that all interrupt controller drivers are initialized in a defined order.) Then, starting from the root interrupt controller node, it scans the irq chip table for the device node of each interrupt controller. As soon as a match is found it calls the init function of that interrupt controller, passing the device node of the controller and the device node of its parent controller to the irq chip driver. The matching code itself belongs to the Device Tree module; see the Device Tree code analysis document for details.
1.5.2.1.1.1 of_irq_init()
/**
 * of_irq_init - Scan and init matching interrupt controllers in DT
 * @matches: 0 terminated array of nodes to match and init function to call
 *
 * This function scans the device tree for matching interrupt controller nodes,
 * and calls their initialization functions in order with parents first.
 */
void __init of_irq_init(const struct of_device_id *matches)
{
	const struct of_device_id *match;
	struct device_node *np, *parent = NULL;
	struct of_intc_desc *desc, *temp_desc;
	struct list_head intc_desc_list, intc_parent_list;
	INIT_LIST_HEAD(&intc_desc_list);
	INIT_LIST_HEAD(&intc_parent_list);
	for_each_matching_node_and_match(np, matches, &match) {
		if (!of_property_read_bool(np, "interrupt-controller") ||
		    !of_device_is_available(np))
			continue;
		if (WARN(!match->data, "of_irq_init: no init function for %s\n",
			 match->compatible))
			continue;
		/*
		 * Here, we allocate and populate an of_intc_desc with the node
		 * pointer, interrupt-parent device_node etc.
		 */
		desc = kzalloc(sizeof(*desc), GFP_KERNEL);
		if (WARN_ON(!desc)) {
			of_node_put(np);
			goto err;
		}
		desc->irq_init_cb = match->data;
		desc->dev = of_node_get(np);
		desc->interrupt_parent = of_irq_find_parent(np);
		if (desc->interrupt_parent == np)
			desc->interrupt_parent = NULL;
		list_add_tail(&desc->list, &intc_desc_list);
	}
	/*
	 * The root irq controller is the one without an interrupt-parent.
	 * That one goes first, followed by the controllers that reference it,
	 * followed by the ones that reference the 2nd level controllers, etc.
	 */
	while (!list_empty(&intc_desc_list)) {
		/*
		 * Process all controllers with the current 'parent'.
		 * First pass will be looking for NULL as the parent.
		 * The assumption is that NULL parent means a root controller.
		 */
		list_for_each_entry_safe(desc, temp_desc, &intc_desc_list, list) {
			int ret;
			if (desc->interrupt_parent != parent)
				continue;
			list_del(&desc->list);
			of_node_set_flag(desc->dev, OF_POPULATED);
			pr_debug("of_irq_init: init %pOF (%p), parent %p\n",
				 desc->dev, desc->dev, desc->interrupt_parent);
			ret = desc->irq_init_cb(desc->dev, desc->interrupt_parent);
			if (ret) {
				of_node_clear_flag(desc->dev, OF_POPULATED);
				kfree(desc);
				continue;
			}
			/*
			 * This one is now set up; add it to the parent list so
			 * its children can get processed in a subsequent pass.
			 */
			list_add_tail(&desc->list, &intc_parent_list);
		}
		/* Get the next pending parent that might have children */
		desc = list_first_entry_or_null(&intc_parent_list, typeof(*desc), list);
		if (!desc) {
			pr_err("of_irq_init: children remain, but no parents\n");
			break;
		}
		list_del(&desc->list);
		parent = desc->dev;
		kfree(desc);
	}
	list_for_each_entry_safe(desc, temp_desc, &intc_parent_list, list) {
		list_del(&desc->list);
		kfree(desc);
	}
err:
	list_for_each_entry_safe(desc, temp_desc, &intc_desc_list, list) {
		list_del(&desc->list);
		of_node_put(desc->dev);
		kfree(desc);
	}
}
dtb:
gic: interrupt-controller@1400000 {
	compatible = "arm,gic-400";
	#interrupt-cells = <3>;
	interrupt-controller;
	reg = <0x0 0x1401000 0 0x1000>, /* GICD */
	      <0x0 0x1402000 0 0x2000>, /* GICC */
	      <0x0 0x1404000 0 0x2000>, /* GICH */
	      <0x0 0x1406000 0 0x2000>; /* GICV */
	interrupts = <1 9 0xf08>;
};
IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);
IRQCHIP_DECLARE(arm11mp_gic, "arm,arm11mp-gic", gic_of_init);
IRQCHIP_DECLARE(arm1176jzf_dc_gic, "arm,arm1176jzf-devchip-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init);
IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
IRQCHIP_DECLARE(pl390, "arm,pl390", gic_of_init);
#define IRQCHIP_DECLARE(name, compat, fn) OF_DECLARE_2(irqchip, name, compat, fn)
#define OF_DECLARE_2(table, name, compat, fn) \
	_OF_DECLARE(table, name, compat, fn, of_init_fn_2)
#define _OF_DECLARE(table, name, compat, fn, fn_type) \
	static const struct of_device_id __of_table_##name \
		__used __section(__##table##_of_table) \
		= { .compatible = compat, \
		    .data = (fn == (fn_type)NULL) ? fn : fn }
Analysis of the GIC driver initialization code:
1.5.2.1.1.1.1 gic_of_init()
int __init gic_of_init(struct device_node *node, struct device_node *parent)
{
	struct gic_chip_data *gic;
	int irq, ret;
	if (WARN_ON(!node))
		return -ENODEV;
	if (WARN_ON(gic_cnt >= CONFIG_ARM_GIC_MAX_NR))
		return -EINVAL;
	gic = &gic_data[gic_cnt];
	ret = gic_of_setup(gic, node);
	if (ret)
		return ret;
	/*
	 * Disable split EOI/Deactivate if either HYP is not available
	 * or the CPU interface is too small.
	 */
	if (gic_cnt == 0 && !gic_check_eoimode(node, &gic->raw_cpu_base))
		static_key_slow_dec(&supports_deactivate);
	ret = __gic_init_bases(gic, -1, &node->fwnode);
	if (ret) {
		gic_teardown(gic);
		return ret;
	}
	if (!gic_cnt) {
		gic_init_physaddr(node);
		gic_of_setup_kvm_info(node);
	}
	if (parent) {
		irq = irq_of_parse_and_map(node, 0);
		gic_cascade_irq(gic_cnt, irq);
	}
	if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
		gicv2m_init(&node->fwnode, gic_data[gic_cnt].domain);
	gic_cnt++;
	return 0;
}
1.5.2.1.1.1.1.1 gic_init_bases()
__gic_init_bases()->gic_init_bases()
static int gic_init_bases(struct gic_chip_data *gic, int irq_start,
			  struct fwnode_handle *handle)
{
	irq_hw_number_t hwirq_base;
	int gic_irqs, irq_base, ret;
	if (IS_ENABLED(CONFIG_GIC_NON_BANKED) && gic->percpu_offset) {
		/* Frankein-GIC without banked registers... */
		unsigned int cpu;
		gic->dist_base.percpu_base = alloc_percpu(void __iomem *);
		gic->cpu_base.percpu_base = alloc_percpu(void __iomem *);
		if (WARN_ON(!gic->dist_base.percpu_base ||
			    !gic->cpu_base.percpu_base)) {
			ret = -ENOMEM;
			goto error;
		}
		for_each_possible_cpu(cpu) {
			u32 mpidr = cpu_logical_map(cpu);
			u32 core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
			unsigned long offset = gic->percpu_offset * core_id;
			*per_cpu_ptr(gic->dist_base.percpu_base, cpu) =
				gic->raw_dist_base + offset;
			*per_cpu_ptr(gic->cpu_base.percpu_base, cpu) =
				gic->raw_cpu_base + offset;
		}
		gic_set_base_accessor(gic, gic_get_percpu_base);
	} else {
		/* Normal, sane GIC... */
		WARN(gic->percpu_offset,
		     "GIC_NON_BANKED not enabled, ignoring %08x offset!",
		     gic->percpu_offset);
		gic->dist_base.common_base = gic->raw_dist_base;
		gic->cpu_base.common_base = gic->raw_cpu_base;
		gic_set_base_accessor(gic, gic_get_common_base);
	}
	/*
	 * Find out how many interrupts are supported.
	 * The GIC only supports up to 1020 interrupt sources.
	 */
	gic_irqs = readl_relaxed(gic_data_dist_base(gic) + GIC_DIST_CTR) & 0x1f;
	gic_irqs = (gic_irqs + 1) * 32;
	if (gic_irqs > 1020)
		gic_irqs = 1020;
	gic->gic_irqs = gic_irqs;
	if (handle) { /* DT/ACPI */
		gic->domain = irq_domain_create_linear(handle, gic_irqs,
						       &gic_irq_domain_hierarchy_ops,
						       gic);
	} else { /* Legacy support */
		/*
		 * For primary GICs, skip over SGIs.
		 * For secondary GICs, skip over PPIs, too.
		 */
		if (gic == &gic_data[0] && (irq_start & 31) > 0) {
			hwirq_base = 16;
			if (irq_start != -1)
				irq_start = (irq_start & ~31) + 16;
		} else {
			hwirq_base = 32;
		}
		gic_irqs -= hwirq_base; /* calculate # of irqs to allocate */
		irq_base = irq_alloc_descs(irq_start, 16, gic_irqs,
					   numa_node_id());
		if (irq_base < 0) {
			WARN(1, "Cannot allocate irq_descs @ IRQ%d, assuming pre-allocated\n",
			     irq_start);
			irq_base = irq_start;
		}
		gic->domain = irq_domain_add_legacy(NULL, gic_irqs, irq_base,
					hwirq_base, &gic_irq_domain_ops, gic);
	}
	if (WARN_ON(!gic->domain)) {
		ret = -ENODEV;
		goto error;
	}
	gic_dist_init(gic);
	ret = gic_cpu_init(gic);
	if (ret)
		goto error;
	ret = gic_pm_init(gic);
	if (ret)
		goto error;
	return 0;
error:
	if (IS_ENABLED(CONFIG_GIC_NON_BANKED) && gic->percpu_offset) {
		free_percpu(gic->dist_base.percpu_base);
		free_percpu(gic->cpu_base.percpu_base);
	}
	return ret;
}
The main job of this code is to register an irq domain data structure with the system. Why is struct irq_domain needed? From the point of view of the Linux kernel, an interrupt from any external device is an asynchronous event that the kernel has to identify. Inside the kernel, an IRQ number identifies one interrupt request of one device, and the IRQ number leads to the descriptor of that interrupt (struct irq_desc). The interrupt controller, however, knows nothing about IRQ numbers. It only knows the HW interrupt number (the controller assigns a code to each interrupt source it supports, and this code is called the hardware interrupt number). Different software modules identify an interrupt source by different IDs, so a mapping is required. How is a hardware interrupt number mapped to an IRQ number? This needs a translation object, which the kernel defines as struct irq_domain.
Every interrupt controller forms an irq domain that is responsible for resolving the interrupt sources downstream of it. When interrupt controllers are cascaded, a non-root interrupt controller is also an ordinary interrupt source of its parent irq domain. struct irq_domain is defined as follows:
struct irq_domain {
	......
	const struct irq_domain_ops *ops;
	void *host_data;
	......
};
Registering the irq domain of the GIC involves another important data structure, gic_irq_domain_ops, of type struct irq_domain_ops. For the GIC, the irq domain operations are defined as follows:
static const struct irq_domain_ops gic_irq_domain_ops = {
	.map = gic_irq_domain_map,
	.unmap = gic_irq_domain_unmap,
};
The irq domain is a concept of the generic interrupt subsystem. The irq domain callbacks are analyzed below.
gic_irq_domain_map: this callback is invoked when the mapping between an IRQ number and a GIC hw interrupt ID is created. The code is:
static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
			      irq_hw_number_t hw)
{
	struct gic_chip_data *gic = d->host_data;
	if (hw < 32) {
		irq_set_percpu_devid(irq);
		irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
				    handle_percpu_devid_irq, NULL, NULL);
		irq_set_status_flags(irq, IRQ_NOAUTOEN);
	} else {
		irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
				    handle_fasteoi_irq, NULL, NULL);
		irq_set_probe(irq);
		irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq)));
	}
	return 0;
}
This is where desc->handle_irq(desc) gets its value: it is set to handle_percpu_devid_irq for hardware interrupt numbers below 32 and to handle_fasteoi_irq otherwise. Taking handle_percpu_devid_irq as the example:
/**
 * handle_percpu_devid_irq - Per CPU local irq handler with per cpu dev ids
 * @desc: the interrupt description structure for this irq
 *
 * Per CPU interrupts on SMP machines without locking requirements. Same as
 * handle_percpu_irq() above but with the following extras:
 *
 * action->percpu_dev_id is a pointer to percpu variables which
 * contain the real device id for the cpu on which this handler is
 * called
 */
void handle_percpu_devid_irq(struct irq_desc *desc)
{
	struct irq_chip *chip = irq_desc_get_chip(desc);
	struct irqaction *action = desc->action;
	unsigned int irq = irq_desc_get_irq(desc);
	irqreturn_t res;
	kstat_incr_irqs_this_cpu(desc);
	if (chip->irq_ack)
		chip->irq_ack(&desc->irq_data);
	if (likely(action)) {
		trace_irq_handler_entry(irq, action);
		res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
		trace_irq_handler_exit(irq, action, res);
	} else {
		unsigned int cpu = smp_processor_id();
		bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
		if (enabled)
			irq_percpu_disable(desc, cpu);
		pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",
			    enabled ? " and unmasked" : "", irq, cpu);
	}
	if (chip->irq_eoi)
		chip->irq_eoi(&desc->irq_data);
}
The call to action->handler is what finally invokes the service routine we registered.
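The whole chain, from hardware interrupt number to the registered service routine, can be summarized in a user-space model. It is only a sketch under stated assumptions: the `model_*` types and functions, the table sizes and the flat `linear_map` array are hypothetical simplifications of struct irq_domain, struct irq_desc and struct irqaction, and chip operations such as ack and EOI are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_HWIRQS 64                /* hypothetical domain size */
#define MODEL_NR_IRQS   64                /* hypothetical number of Linux irqs */

struct model_action {                     /* stands in for struct irqaction */
    int (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct model_action *next;            /* next handler on a shared line */
};

struct model_desc {                       /* stands in for struct irq_desc */
    void (*handle_irq)(struct model_desc *desc);   /* flow handler */
    struct model_action *action;
    int irq;
    int handled;                          /* sum of handler return values */
};

static struct model_desc descs[MODEL_NR_IRQS];
static int linear_map[MODEL_NR_HWIRQS];   /* hwirq -> Linux irq, 0 = unmapped */

/* Flow handler: walk the action chain and call each registered ISR. */
static void model_flow_handler(struct model_desc *desc)
{
    for (struct model_action *a = desc->action; a; a = a->next)
        desc->handled += a->handler(desc->irq, a->dev_id);
}

/* Counterpart of the .map callback: bind hwirq to irq, pick a flow handler. */
static int model_domain_map(int hwirq, int irq)
{
    if (hwirq < 0 || hwirq >= MODEL_NR_HWIRQS || irq <= 0 || irq >= MODEL_NR_IRQS)
        return -1;
    linear_map[hwirq] = irq;
    descs[irq].irq = irq;
    descs[irq].handle_irq = model_flow_handler;
    return 0;
}

/* Counterpart of __handle_domain_irq(): translate, then call desc->handle_irq. */
static int model_handle_domain_irq(int hwirq)
{
    int irq;

    if (hwirq < 0 || hwirq >= MODEL_NR_HWIRQS)
        return -1;
    irq = linear_map[hwirq];
    if (irq == 0 || descs[irq].handle_irq == NULL)
        return -1;                        /* no mapping: a "bad" interrupt */
    descs[irq].handle_irq(&descs[irq]);
    return 0;
}

/* Example driver ISR: records which Linux irq number it was called with. */
static int model_driver_isr(int irq, void *dev_id)
{
    *(int *)dev_id = irq;
    return 1;                             /* "handled" */
}
```

Mapping hardware interrupt 34 to Linux irq 7, attaching a `model_action` that points at `model_driver_isr`, and then calling `model_handle_domain_irq(34)` runs the driver routine with irq 7. This shows the two distinct function types at work: the flow handler stored in the descriptor, and the per-device handler stored in the action.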