中断的概念
中断是指在CPU正常运行期间,由于内外部事件或由程序预先安排的事件引起的 CPU 暂时停止正在运行的程序,转而为该内部或外部事件或预先安排的事件服务的程序中去,服务完毕后再返回去继续运行被暂时中断的程序。Linux中通常分为外部中断(又叫硬件中断)和内部中断(又叫异常)。
软件对硬件进行配置后,软件期望等待硬件的某种状态(比如,收到了数据),这里有两种方式,一种是轮询(polling): CPU 不断的去读硬件状态。另一种是当硬件完成某种事件后,给 CPU 一个中断,让 CPU 停下手上的事情,去处理这个中断。很显然,中断的交互方式提高了系统的吞吐。
当 CPU 收到一个中断 (IRQ)的时候,会去执行该中断对应的处理函数(ISR)。普通情况下,会有一个中断向量表,向量表中定义了 CPU 对应的每一个外设资源的中断处理程序的入口,当发生对应的中断的时候, CPU 直接跳转到这个入口执行程序。也就是中断上下文。(注意:中断上下文中,不可阻塞睡眠)。
Linux 中断 top/bottom
玩过 MCU 的人都知道,中断服务程序的设计最好是快速完成任务并退出,因为此刻系统处于被中断中。但是在 ISR 中又有一些必须完成的事情,比如:清中断标志,读/写数据,寄存器操作等。
在 Linux 中,同样也是这个要求,希望尽快的完成 ISR。但事与愿违,有些 ISR 中任务繁重,会消耗很多时间,导致响应速度变差。Linux 中针对这种情况,将中断分为了两部分:
-
上半部(top half):收到一个中断,立即执行,有严格的时间限制,只做一些必要的工作,比如:应答,复位等。这些工作都是在所有中断被禁止的情况下完成的。
-
底半部(bottom half):能够被推迟到后面完成的任务会在底半部进行。在适合的时机,下半部会被开中断执行。(具体的机制在接下来章节分析(软中断、tasklet、工作队列))。
中断处理程序
驱动程序可以使用接口:
static inline int __must_check request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,const char *name, void *dev)
像系统申请注册一个中断处理程序。其中的参数:
参数 含义
irq 表了该中断的中断号,一般 CPU 的中断号都会事先定义好。
handler 中断发生后的 ISR
flags 中断标志( IRQF_DISABLED / IRQFSAMPLE_RANDOM / IRQF_TIMER / IRQF_SHARED)
name 中断相关的设备 ASCII 文本,例如 "keyboard",这些名字会在 /proc/irq 和 /proc/interrupts 文件使用
dev 用于共享中断线,传递驱动程序的设备结构。非共享类型的中断,直接设置成为 NULL
中断标志 flag 的含义:
标志 含义
IRQF_DISABLED 设置这个标志的话,意味着内核在处理这个 ISR 期间,要禁止其他中断(多数情况不使用这个)
IRQFSAMPLE_RANDOM 表明这个设备产生的中断对内核熵池有贡献
IRQF_TIMER 为系统定时器准备的标志
IRQF_SHARED 表明多个中断处理程序之间共享中断线。同一个给定的线上注册每个处理程序,必须设置这个
调用 request_irq 成功执行返回 0。常见错误是 -EBUSY,表示给定的中断线已经在使用(或者没有指定 IRQF_SHARED)
注意:request_irq 函数可能引起睡眠,所以不允许在中断上下文或者不允许睡眠的代码中调用。
释放中断:
const void *free_irq(unsigned int irq, void *dev_id) //用于释放中断处理函数。
注意:Linux 中的中断处理程序是无须重入的。当给定的中断处理程序正在执行的时候,其中断线在所有的处理器上都会被屏蔽掉,以防在同一个中断线上又接收到另一个新的中断。通常情况下,除了该中断的其他中断都是打开的,也就是说其他的中断线上的重点都能够被处理,但是当前的中断线总是被禁止的,故,同一个中断处理程序是绝对不会被自己嵌套的,另外ARM上也不支持中断优先级,也就是没有使用FIQ,因此ARM不支持中断嵌套。
中断上下文
与进程上下文不一样,内核执行中断服务程序的时候,处于中断上下文。中断处理程序并没有自己的独立的栈,而是使用了内核栈,其大小一般是有限制的(32bit 机器 8KB)。所以其必须短小精悍。同时中断服务程序是打断了正常的程序流程,这一点上也必须保证快速的执行。同时中断上下文中是不允许睡眠,阻塞的。
中断上下文不能睡眠的原因是:
1、 中断处理的时候,不应该发生进程切换,因为在中断context中,唯一能打断当前中断handler的只有更高优先级的中断,它不会被进程打断,如果在中断context中休眠,则没有办法唤醒它,因为所有的wake_up_xxx都是针对某个进程而言的,而在中断context中,没有进程的概念,没有一个task_struct(这点对于softirq和tasklet一样),因此真的休眠了,比如调用了会导致block的例程,内核几乎肯定会死。
2、schedule()在切换进程时,保存当前的进程上下文(CPU寄存器的值、进程的状态以及堆栈中的内容),以便以后恢复此进程运行。中断发生后,内核会先保存当前被中断的进程上下文(在调用中断处理程序后恢复);但在中断处理程序里,CPU寄存器的值肯定已经变化了吧(最重要的程序计数器PC、堆栈SP等),如果此时因为睡眠或阻塞操作调用了schedule(),则保存的进程上下文就不是当前的进程context了.所以不可以在中断处理程序中调用schedule()。
3、内核中schedule()函数本身在进来的时候判断是否处于中断上下文:
if(unlikely(in_interrupt()))
BUG();
因此,强行调用schedule()的结果就是内核BUG。
4、中断handler会使用被中断的进程内核堆栈,但不会对它有任何影响,因为handler使用完后会完全清除它使用的那部分堆栈,恢复被中断前的原貌。
5、处于中断context时候,内核是不可抢占的。因此,如果休眠,则内核一定挂起
中断处理流程
发生中断时,CPU执行异常向量vector_irq的代码, 即异常向量表中的中断异常的代码,它是一个跳转指令,跳去执行真正的中断处理程序,在vector_irq里面,最终会调用中断处理的总入口函数。
对于 ARM64 处理器的异常级别 1、 2 和 3,每个异常级别都有自己的异常向量表,异常向量表的起始虚拟地址存放在寄存器 VBAR_ELn(向量基准地址寄存器, Vector Based Address Register)中。每个异常向量表有 16 项,分为 4 组,每组 4 项,每项的长度是 128 字节(可以存放32 条指令)。异常级别 n 的异常向量表所示。
异常级别 n 的异常向量表
地址 异常类型 说明
VBAR_ELn + 0x000 同步异常 当前异常级别生成的异常,使用异常
级别0的栈指针寄存器SP_EL0
-
0x080 中断
-
0x100 快速中断
-
0x180 系统错误
-
0x200 同步异常 当前异常级别生成的异常,使用当前
异常级别的栈指针寄存器SP_ELn
-
0x280 中断
-
0x300 快速中断
-
0x380 系统错误
-
0x400 同步异常 64位应用程序在异常级别( n-1)生
成的异常
-
0x480 中断
-
0x500 快速中断
-
0x580 系统错误
-
0x600 同步异常 32位应用程序在异常级别( n-1)生
成的异常
-
0x680 中断
-
0x700 快速中断
-
0x780 系统错误
ARM64 架构内核定义的异常向量表如下:
这部分内容在《Linux应用层和内核交互》中系统调用章节讲过,这里只列出与中断有关的内容;
arch/arm64/kernel/entry.S:
/*
* Exception vectors.
*/
.pushsection ".entry.text", "ax"
.align 11
ENTRY(vectors)
kernel_ventry 1, sync_invalid //异常级别1生成的同步异常,使用栈指针寄存器SP_EL0
kernel_ventry 1, irq_invalid //异常级别1生成的中断,使用栈指针寄存器SP_EL0
kernel_ventry 1, fiq_invalid //异常级别1生成的快速中断,使用栈指针寄存器SP_EL0
kernel_ventry 1, error_invalid //异常级别1生成的系统错误,使用栈指针寄存器SP_EL0
kernel_ventry 1, sync //异常级别1生成的同步异常,使用栈指针寄存器SP_EL1
kernel_ventry 1, irq //异常级别1生成的中断,使用栈指针寄存器SP_EL1
kernel_ventry 1, fiq_invalid //异常级别1生成的快速中断,使用栈指针寄存器SP_EL1
kernel_ventry 1, error_invalid //异常级别1生成的系统错误,使用栈指针寄存器SP_EL1
kernel_ventry 0, sync //64位应用程序在异常级别0生成的同步异常
kernel_ventry 0, irq // 64位应用程序在异常级别0生成的中断
kernel_ventry 0, fiq_invalid // 64位应用程序在异常级别0生成的快速中断
kernel_ventry 0, error_invalid //64位应用程序在异常级别0生成的系统错误
#ifdef CONFIG_COMPAT
kernel_ventry 0, sync_compat, 32 //32位应用程序在异常级别0生成的同步异常
kernel_ventry 0, irq_compat, 32 // 32位应用程序在异常级别0生成的中断
kernel_ventry 0, fiq_invalid_compat, 32 // 32位应用程序在异常级别0生成的快速中断
kernel_ventry 0, error_invalid_compat, 32 // 32位应用程序在异常级别0生成的系统错误
#else
kernel_ventry 0, sync_invalid, 32 //32位应用程序在异常级别0生成的同步异常
kernel_ventry 0, irq_invalid, 32 // 32位应用程序在异常级别0生成的中断
kernel_ventry 0, fiq_invalid, 32 // 32位应用程序在异常级别0生成的快速中断
kernel_ventry 0, error_invalid, 32 // 32位应用程序在异常级别0生成的系统错误
#endif
END(vectors)
kernel_ventry是一个宏,参数是跳转标号,即异常处理程序的标号,宏的定义如下(/arch/arm64/kernel/entry.S):
.macro kernel_ventry, el, label, regsize = 64
.align 7
sub sp, sp, #S_FRAME_SIZE // 将sp预留一个fram_size, 这个size 就是struct pt_regs的大小
#ifdef CONFIG_VMAP_STACK
....这里省略掉检查栈溢出的代码
#endif
b el\()\el\()_\label // 跳转到对应级别的异常处理函数, kernel_entry 1, irq为el1_irq
.endm
" .align 7"表示把下一条指令的地址对齐到 2^7,即对齐到 128; 对于向量表vectors中的kernel_ventry 1, irq , 则 b el\()\el\()_\label跳转到el1_irq函数。 其中1表示的是从哪个异常模式产生的,比如是User->kernel就是0, kernel->kernel就是1.
每个CPU 在初始化是,都会设置中断向量地址。
arch/arm64/kernel/head.S
__primary_switched:
adrp x4, init_thread_union
add sp, x4, #THREAD_SIZE
adr_l x5, init_task
msr sp_el0, x5 // Save thread_info
adr_l x8, vectors // load VBAR_EL1 with virtual
msr vbar_el1, x8 // vector table address
isb
stp xzr, x30, [sp, #-16]!
mov x29, sp
str_l x21, __fdt_pointer, x5 // Save FDT pointer
ldr_l x4, kimage_vaddr // Save the offset between
sub x4, x4, x0 // the kernel virtual and
str_l x4, kimage_voffset, x5 // physical mappings
// Clear BSS
adr_l x0, __bss_start
mov x1, xzr
adr_l x2, __bss_stop
sub x2, x2, x0
bl __pi_memset
dsb ishst // Make zero page visible to PTW
#ifdef CONFIG_KASAN
bl kasan_early_init
#endif
#ifdef CONFIG_RANDOMIZE_BASE
tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?
b.ne 0f
mov x0, x21 // pass FDT address in x0
bl kaslr_early_init // parse FDT for KASLR options
cbz x0, 0f // KASLR disabled? just proceed
orr x23, x23, x0 // record KASLR offset
ldp x29, x30, [sp], #16 // we must enable KASLR, return
ret // to __primary_switch()
0:
#endif
add sp, sp, #16
mov x29, #0
mov x30, #0
b start_kernel
ENDPROC(__primary_switched)
__secondary_switched:
adr_l x5, vectors //设置中断向量地址
msr vbar_el1, x5
isb
adr_l x0, secondary_data
ldr x1, [x0, #CPU_BOOT_STACK] // get secondary_data.stack
mov sp, x1
ldr x2, [x0, #CPU_BOOT_TASK]
msr sp_el0, x2
mov x29, #0
mov x30, #0
b secondary_start_kernel
ENDPROC(__secondary_switched)
有中断产生时, GIC会向相应的CPU发出中断信号,CPU检测到中断信号,根据中断向量表,跳转到el1_irq。
arch/arm64/kernel/entry.S
el1_irq:
kernel_entry 1
enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
#endif
irq_handler
#ifdef CONFIG_PREEMPT
get_thread_info tsk
ldr w24, [tsk, #TI_PREEMPT] // get preempt count
cbnz w24, 1f // preempt count != 0
ldr x0, [tsk, #TI_FLAGS] // get flags
tbz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling?
bl el1_preempt
1:
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_on
#endif
kernel_exit 1
ENDPROC(el1_irq)
/*
* Interrupt handling.
*/
.macro irq_handler
#ifdef CONFIG_STRICT_MEMORY_RWX
ldr x1, =handle_arch_irq
ldr x1, [x1]
#else
ldr x1, handle_arch_irq
#endif
mov x0, sp
blr x1
.endm
.text
arch/arm64/kernel/irq.c
void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
if (handle_arch_irq)
return;
handle_arch_irq = handle_irq;
}
Gicv2中断控制器初始化时会调用set_handle_irq(gic_handle_irq);
dtb:
gic: interrupt-controller@1400000 {
compatible = "arm,gic-400";
#interrupt-cells = <3>;
interrupt-controller;
reg = <0x0 0x1401000 0 0x1000>, /* GICD */
<0x0 0x1402000 0 0x2000>, /* GICC */
<0x0 0x1404000 0 0x2000>, /* GICH */
<0x0 0x1406000 0 0x2000>; /* GICV */
interrupts = <1 9 0xf08>;
};
IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);
设置代码路径:gic_of_init()->__gic_init_bases()->set_handle_irq(gic_handle_irq);
static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
u32 irqstat, irqnr;
struct gic_chip_data *gic = &gic_data[0];
void __iomem *cpu_base = gic_data_cpu_base(gic);
do {
irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);
irqnr = irqstat & GICC_IAR_INT_ID_MASK;
if (likely(irqnr > 15 && irqnr < 1020)) {
if (static_key_true(&supports_deactivate))
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
isb();
handle_domain_irq(gic->domain, irqnr, regs); //调用相应的中断处理函数
continue;
}
if (irqnr < 16) {
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
if (static_key_true(&supports_deactivate))
writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE);
#ifdef CONFIG_SMP
/*
* Ensure any shared data written by the CPU sending
* the IPI is read after we've read the ACK register
* on the GIC.
*
* Pairs with the write barrier in gic_raise_softirq
*/
smp_rmb();
handle_IPI(irqnr, regs); //SMP 核间中断
#endif
continue;
}
break;
} while (1);
}
gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()
static inline int handle_domain_irq(struct irq_domain *domain,
unsigned int hwirq, struct pt_regs *regs)
{
return __handle_domain_irq(domain, hwirq, true, regs);
}
/**
* __handle_domain_irq - Invoke the handler for a HW irq belonging to a domain
* @domain: The domain where to perform the lookup
* @hwirq: The HW irq number to convert to a logical one
* @lookup: Whether to perform the domain lookup or not
* @regs: Register file coming from the low-level handling code
*
* Returns: 0 on success, or -EINVAL if conversion has failed
*/
int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
bool lookup, struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
unsigned int irq = hwirq;
int ret = 0;
irq_enter();
#ifdef CONFIG_IRQ_DOMAIN
if (lookup)
irq = irq_find_mapping(domain, hwirq);
#endif
/*
* Some hardware gives randomly wrong interrupts. Rather
* than crashing, do something sensible.
*/
if (unlikely(!irq || irq >= nr_irqs)) {
ack_bad_irq(irq);
ret = -EINVAL;
} else {
generic_handle_irq(irq);
}
irq_exit();
set_irq_regs(old_regs);
return ret;
}
这里请注意:
先调用了 irq_enter 标记进入了硬件中断:
irq_enter是更新一些系统的统计信息,同时在__irq_enter宏中禁止了进程的抢占。虽然在产生IRQ时,ARM会自动把CPSR中的I位置位,禁止新的IRQ请求,直到中断控制转到相应的流控层后才通过local_irq_enable()打开。那为何还要禁止抢占?这是因为要考虑中断嵌套的问题,一旦流控层或驱动程序主动通过local_irq_enable打开了IRQ,而此时该中断还没处理完成,新的irq请求到达,这时代码会再次进入irq_enter,在本次嵌套中断返回时,内核不希望进行抢占调度,而是要等到最外层的中断处理完成后才做出调度动作,所以才有了禁止抢占这一处理
再调用 generic_handle_irq()最后调用 irq_exit 删除进入硬件中断的标记。
gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()->generic_handle_irq()
/**
* generic_handle_irq - Invoke the handler for a particular irq
* @irq: The irq number to handle
*
*/
int generic_handle_irq(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq);
if (!desc)
return -EINVAL;
generic_handle_irq_desc(desc);
return 0;
}
首先在函数 irq_to_desc 中根据发生中断的中断号,去取出它的 irq_desc 中断描述结构,然后调用 generic_handle_irq_desc:
gic_handle_irq()->handle_domain_irq()->__handle_domain_irq()->generic_handle_irq()->generic_handle_irq_desc()
/*
* Architectures call this to let the generic IRQ layer
* handle an interrupt.
*/
static inline void generic_handle_irq_desc(struct irq_desc *desc)
{
desc->handle_irq(desc);
}
这里调用了 handle_irq 函数。所以,在上述流程中,还需要分析 irq_to_desc 流程:
struct irq_desc *irq_to_desc(unsigned int irq)
{
return (irq < NR_IRQS) ? irq_desc + irq : NULL;
}
NR_IRQS 是支持的总的中断个数,当然,irq 不能够大于这个数目。所以返回 irq_desc + irq。
irq_desc 是一个全局的数组:
struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
[0 ... NR_IRQS-1] = {
.handle_irq = handle_bad_irq,
.depth = 1,
.lock = __RAW_SPIN_LOCK_UNLOCKED(irq_desc->lock),
}
};
这里是这个数组的初始化的地方。所有的 handle_irq 函数都被初始化成为了 handle_bad_irq。
细心的观众可能发现了,调用这个 desc->handle_irq(desc) 函数,并不是咱们注册进去的中断处理函数啊,因为两个函数的原型定义都不一样。这个 handle_irq 是 irq_flow_handler_t 类型,而我们注册进去的服务程序是 irq_handler_t,这两个明显不是同一个东西,所以这里我们还需要继续分析。
1.5.1 中断相关的数据结构
Linux 中断相关的数据结构有 3 个
结构名称 作用
irq_desc IRQ 的软件层面上的资源描述
irqaction IRQ 的通用操作
irq_chip 对应每个芯片的具体实现
1.5.1.1 struct irq_desc
irq_desc 结构如下:
/**
* struct irq_desc - interrupt descriptor
* @irq_common_data: per irq and chip data passed down to chip functions
* @kstat_irqs: irq stats per cpu
* @handle_irq: highlevel irq-events handler
* @preflow_handler: handler called before the flow handler (currently used by sparc)
* @action: the irq action chain
* @status: status information
* @core_internal_state__do_not_mess_with_it: core internal status information
* @depth: disable-depth, for nested irq_disable() calls
* @wake_depth: enable depth, for multiple irq_set_irq_wake() callers
* @irq_count: stats field to detect stalled irqs
* @last_unhandled: aging timer for unhandled count
* @irqs_unhandled: stats field for spurious unhandled interrupts
* @threads_handled: stats field for deferred spurious detection of threaded handlers
* @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
* @lock: locking for SMP
* @affinity_hint: hint to user space for preferred irq affinity
* @affinity_notify: context for notification of affinity changes
* @pending_mask: pending rebalanced interrupts
* @threads_oneshot: bitfield to handle shared oneshot threads
* @threads_active: number of irqaction threads currently running
* @wait_for_threads: wait queue for sync_irq to wait for threaded handlers
* @nr_actions: number of installed actions on this descriptor
* @no_suspend_depth: number of irqactions on a irq descriptor with
* IRQF_NO_SUSPEND set
* @force_resume_depth: number of irqactions on a irq descriptor with
* IRQF_FORCE_RESUME set
* @rcu: rcu head for delayed free
* @kobj: kobject used to represent this struct in sysfs
* @request_mutex: mutex to protect request/free before locking desc->lock
* @dir: /proc/irq/ procfs entry
* @debugfs_file: dentry for the debugfs file
* @name: flow handler name for /proc/interrupts output
*/
struct irq_desc {
struct irq_common_data irq_common_data;
struct irq_data irq_data;
unsigned int __percpu *kstat_irqs;
irq_flow_handler_t handle_irq;
#ifdef CONFIG_IRQ_PREFLOW_FASTEOI
irq_preflow_handler_t preflow_handler;
#endif
struct irqaction *action; /* IRQ action list */
unsigned int status_use_accessors;
unsigned int core_internal_state__do_not_mess_with_it;
unsigned int depth; /* nested irq disables */
unsigned int wake_depth; /* nested wake enables */
unsigned int irq_count; /* For detecting broken IRQs */
unsigned long last_unhandled; /* Aging timer for unhandled count */
unsigned int irqs_unhandled;
atomic_t threads_handled;
int threads_handled_last;
raw_spinlock_t lock;
struct cpumask *percpu_enabled;
const struct cpumask *percpu_affinity;
#ifdef CONFIG_SMP
const struct cpumask *affinity_hint;
struct irq_affinity_notify *affinity_notify;
#ifdef CONFIG_GENERIC_PENDING_IRQ
cpumask_var_t pending_mask;
#endif
#endif
unsigned long threads_oneshot;
atomic_t threads_active;
wait_queue_head_t wait_for_threads;
#ifdef CONFIG_PM_SLEEP
unsigned int nr_actions;
unsigned int no_suspend_depth;
unsigned int cond_suspend_depth;
unsigned int force_resume_depth;
#endif
#ifdef CONFIG_PROC_FS
struct proc_dir_entry *dir;
#endif
#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
struct dentry *debugfs_file;
#endif
#ifdef CONFIG_SPARSE_IRQ
struct rcu_head rcu;
struct kobject kobj;
#endif
struct mutex request_mutex;
int parent_irq;
struct module *owner;
const char *name;
} ____cacheline_internodealigned_in_smp;
1.5.1.2 struct irqaction
irqaction 结构如下:
/**
* struct irqaction - per interrupt action descriptor
* @handler: interrupt handler function
* @name: name of the device
* @dev_id: cookie to identify the device
* @percpu_dev_id: cookie to identify the device
* @next: pointer to the next irqaction for shared interrupts
* @irq: interrupt number
* @flags: flags (see IRQF_* above)
* @thread_fn: interrupt handler function for threaded interrupts
* @thread: thread pointer for threaded interrupts
* @secondary: pointer to secondary irqaction (force threading)
* @thread_flags: flags related to @thread
* @thread_mask: bitmask for keeping track of @thread activity
* @dir: pointer to the proc/irq/NN/name entry
*/
struct irqaction {
irq_handler_t handler;
void *dev_id;
void __percpu *percpu_dev_id;
struct irqaction *next;
irq_handler_t thread_fn;
struct task_struct *thread;
struct irqaction *secondary;
unsigned int irq;
unsigned int flags;
unsigned long thread_flags;
unsigned long thread_mask;
const char *name;
struct proc_dir_entry *dir;
} ____cacheline_internodealigned_in_smp;
1.5.1.3 struct irq_chip
irq_chip 描述如下:
/**
* struct irq_chip - hardware interrupt chip descriptor
*
* @parent_device: pointer to parent device for irqchip
* @name: name for /proc/interrupts
* @irq_startup: start up the interrupt (defaults to ->enable if NULL)
* @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL)
* @irq_enable: enable the interrupt (defaults to chip->unmask if NULL)
* @irq_disable: disable the interrupt
* @irq_ack: start of a new interrupt
* @irq_mask: mask an interrupt source
* @irq_mask_ack: ack and mask an interrupt source
* @irq_unmask: unmask an interrupt source
* @irq_eoi: end of interrupt
* @irq_set_affinity: Set the CPU affinity on SMP machines. If the force
* argument is true, it tells the driver to
* unconditionally apply the affinity setting. Sanity
* checks against the supplied affinity mask are not
* required. This is used for CPU hotplug where the
* target CPU is not yet set in the cpu_online_mask.
* @irq_retrigger: resend an IRQ to the CPU
* @irq_set_type: set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ
* @irq_set_wake: enable/disable power-management wake-on of an IRQ
* @irq_bus_lock: function to lock access to slow bus (i2c) chips
* @irq_bus_sync_unlock:function to sync and unlock slow bus (i2c) chips
* @irq_cpu_online: configure an interrupt source for a secondary CPU
* @irq_cpu_offline: un-configure an interrupt source for a secondary CPU
* @irq_suspend: function called from core code on suspend once per
* chip, when one or more interrupts are installed
* @irq_resume: function called from core code on resume once per chip,
* when one ore more interrupts are installed
* @irq_pm_shutdown: function called from core code on shutdown once per chip
* @irq_calc_mask: Optional function to set irq_data.mask for special cases
* @irq_print_chip: optional to print special chip info in show_interrupts
* @irq_request_resources: optional to request resources before calling
* any other callback related to this irq
* @irq_release_resources: optional to release resources acquired with
* irq_request_resources
* @irq_compose_msi_msg: optional to compose message content for MSI
* @irq_write_msi_msg: optional to write message content for MSI
* @irq_get_irqchip_state: return the internal state of an interrupt
* @irq_set_irqchip_state: set the internal state of a interrupt
* @irq_set_vcpu_affinity: optional to target a vCPU in a virtual machine
* @ipi_send_single: send a single IPI to destination cpus
* @ipi_send_mask: send an IPI to destination cpus in cpumask
* @flags: chip specific flags
*/
struct irq_chip {
struct device *parent_device;
const char *name;
unsigned int (*irq_startup)(struct irq_data *data);
void (*irq_shutdown)(struct irq_data *data);
void (*irq_enable)(struct irq_data *data);
void (*irq_disable)(struct irq_data *data);
void (*irq_ack)(struct irq_data *data);
void (*irq_mask)(struct irq_data *data);
void (*irq_mask_ack)(struct irq_data *data);
void (*irq_unmask)(struct irq_data *data);
void (*irq_eoi)(struct irq_data *data);
int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
int (*irq_retrigger)(struct irq_data *data);
int (*irq_set_type)(struct irq_data *data, unsigned int flow_type);
int (*irq_set_wake)(struct irq_data *data, unsigned int on);
void (*irq_bus_lock)(struct irq_data *data);
void (*irq_bus_sync_unlock)(struct irq_data *data);
void (*irq_cpu_online)(struct irq_data *data);
void (*irq_cpu_offline)(struct irq_data *data);
void (*irq_suspend)(struct irq_data *data);
void (*irq_resume)(struct irq_data *data);
void (*irq_pm_shutdown)(struct irq_data *data);
void (*irq_calc_mask)(struct irq_data *data);
void (*irq_print_chip)(struct irq_data *data, struct seq_file *p);
int (*irq_request_resources)(struct irq_data *data);
void (*irq_release_resources)(struct irq_data *data);
void (*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
void (*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
int (*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);
int (*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
int (*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);
void (*ipi_send_single)(struct irq_data *data, unsigned int cpu);
void (*ipi_send_mask)(struct irq_data *data, const struct cpumask *dest);
unsigned long flags;
};
irq_chip 是一串和芯片相关的函数指针,这里定义的非常的全面,基本上和 IRQ 相关的可能出现的操作都全部定义进去了,具体根据不同的芯片,需要在不同的芯片的地方去初始化这个结构,然后这个结构会嵌入到通用的 IRQ 处理软件中去使用,使得软件处理逻辑和芯片逻辑完全的分开。
我们接下来继续前进。
1.5.2 初始化 Chip 相关的 IRQ
众所周知,启动的时候,C 语言从 start_kernel 开始,在这里面,调用了和 machine 相关的 IRQ 的初始化 init_IRQ():
1.5.2.1 init_IRQ()
asmlinkage __visible void __init start_kernel(void)
{
char *command_line;
char *after_dashes;
.....
early_irq_init();
init_IRQ();
.....
}
1.5.2.1.1 irqchip_init ()
在 init_IRQ 中,调用了irqchip_init ():
void __init init_IRQ(void)
{
init_irq_stacks();
irqchip_init();
if (!handle_arch_irq)
panic("No interrupt controller found.");
}
void __init irqchip_init(void)
{
of_irq_init(__irqchip_of_table);
acpi_probe_device_table(irqchip);
}
__irqchip_of_table就是内核irq chip table的首地址,这个table也就保存了kernel支持的所有的中断控制器的ID信息(用于和device node的匹配)。of_irq_init函数执行之前,系统已经完成了device tree的初始化,因此系统中的所有的设备节点都已经形成了一个树状结构,每个节点代表一个设备的device node。of_irq_init是在所有的device node中寻找中断控制器节点,形成树状结构(系统可以有多个interrupt controller,之所以形成中断控制器的树状结构,是为了让系统中所有的中断控制器驱动按照一定的顺序进行初始化)。之后,从root interrupt controller节点开始,对于每一个interrupt controller的device node,扫描irq chip table,进行匹配,一旦匹配到,就调用该interrupt controller的初始化函数,并把该中断控制器的device node以及parent中断控制器的device node作为参数传递给irq chip driver。。具体的匹配过程的代码属于Device Tree模块的内容,更详细的信息可以参考Device Tree代码分析文档。
1.5.2.1.1.1 of_irq_init()
/**
* of_irq_init - Scan and init matching interrupt controllers in DT
* @matches: 0 terminated array of nodes to match and init function to call
*
* This function scans the device tree for matching interrupt controller nodes,
* and calls their initialization functions in order with parents first.
*/
void __init of_irq_init(const struct of_device_id *matches)
{
const struct of_device_id *match;
struct device_node *np, *parent = NULL;
struct of_intc_desc *desc, *temp_desc;
struct list_head intc_desc_list, intc_parent_list;
INIT_LIST_HEAD(&intc_desc_list);
INIT_LIST_HEAD(&intc_parent_list);
for_each_matching_node_and_match(np, matches, &match) {
if (!of_property_read_bool(np, "interrupt-controller") ||
!of_device_is_available(np))
continue;
if (WARN(!match->data, "of_irq_init: no init function for %s\n",
match->compatible))
continue;
/*
* Here, we allocate and populate an of_intc_desc with the node
* pointer, interrupt-parent device_node etc.
*/
desc = kzalloc(sizeof(*desc), GFP_KERNEL);
if (WARN_ON(!desc)) {
of_node_put(np);
goto err;
}
desc->irq_init_cb = match->data;
desc->dev = of_node_get(np);
desc->interrupt_parent = of_irq_find_parent(np);
if (desc->interrupt_parent == np)
desc->interrupt_parent = NULL;
list_add_tail(&desc->list, &intc_desc_list);
}
/*
* The root irq controller is the one without an interrupt-parent.
* That one goes first, followed by the controllers that reference it,
* followed by the ones that reference the 2nd level controllers, etc.
*/
while (!list_empty(&intc_desc_list)) {
/*
* Process all controllers with the current 'parent'.
* First pass will be looking for NULL as the parent.
* The assumption is that NULL parent means a root controller.
*/
list_for_each_entry_safe(desc, temp_desc, &intc_desc_list, list) {
int ret;
if (desc->interrupt_parent != parent)
continue;
list_del(&desc->list);
of_node_set_flag(desc->dev, OF_POPULATED);
pr_debug("of_irq_init: init %pOF (%p), parent %p\n",
desc->dev,
desc->dev, desc->interrupt_parent);
ret = desc->irq_init_cb(desc->dev,
desc->interrupt_parent);
if (ret) {
of_node_clear_flag(desc->dev, OF_POPULATED);
kfree(desc);
continue;
}
/*
* This one is now set up; add it to the parent list so
* its children can get processed in a subsequent pass.
*/
list_add_tail(&desc->list, &intc_parent_list);
}
/* Get the next pending parent that might have children */
desc = list_first_entry_or_null(&intc_parent_list,
typeof(*desc), list);
if (!desc) {
pr_err("of_irq_init: children remain, but no parents\n");
break;
}
list_del(&desc->list);
parent = desc->dev;
kfree(desc);
}
list_for_each_entry_safe(desc, temp_desc, &intc_parent_list, list) {
list_del(&desc->list);
kfree(desc);
}
err:
list_for_each_entry_safe(desc, temp_desc, &intc_desc_list, list) {
list_del(&desc->list);
of_node_put(desc->dev);
kfree(desc);
}
}
dtb:
gic: interrupt-controller@1400000 {
compatible = "arm,gic-400";
#interrupt-cells = <3>;
interrupt-controller;
reg = <0x0 0x1401000 0 0x1000>, /* GICD */
<0x0 0x1402000 0 0x2000>, /* GICC */
<0x0 0x1404000 0 0x2000>, /* GICH */
<0x0 0x1406000 0 0x2000>; /* GICV */
interrupts = <1 9 0xf08>;
};
IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);
IRQCHIP_DECLARE(arm11mp_gic, "arm,arm11mp-gic", gic_of_init);
IRQCHIP_DECLARE(arm1176jzf_dc_gic, "arm,arm1176jzf-devchip-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init);
IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
IRQCHIP_DECLARE(pl390, "arm,pl390", gic_of_init);
#define IRQCHIP_DECLARE(name, compat, fn) OF_DECLARE_2(irqchip, name, compat, fn)
#define OF_DECLARE_2(table, name, compat, fn) \
_OF_DECLARE(table, name, compat, fn, of_init_fn_2)
#define _OF_DECLARE(table, name, compat, fn, fn_type) \
static const struct of_device_id _of_table##name \
__used section(##table##_of_table) \
= { .compatible = compat, \
.data = (fn == (fn_type)NULL) ? fn : fn }
GIC driver初始化代码分析:
1.5.2.1.1.1.1 gic_of_init()
int __init
gic_of_init(struct device_node *node, struct device_node *parent)
{
struct gic_chip_data *gic;
int irq, ret;
if (WARN_ON(!node))
return -ENODEV;
if (WARN_ON(gic_cnt >= CONFIG_ARM_GIC_MAX_NR))
return -EINVAL;
gic = &gic_data[gic_cnt];
ret = gic_of_setup(gic, node);
if (ret)
return ret;
/*
* Disable split EOI/Deactivate if either HYP is not available
* or the CPU interface is too small.
*/
if (gic_cnt == 0 && !gic_check_eoimode(node, &gic->raw_cpu_base))
static_key_slow_dec(&supports_deactivate);
ret = __gic_init_bases(gic, -1, &node->fwnode);
if (ret) {
gic_teardown(gic);
return ret;
}
if (!gic_cnt) {
gic_init_physaddr(node);
gic_of_setup_kvm_info(node);
}
if (parent) {
irq = irq_of_parse_and_map(node, 0);
gic_cascade_irq(gic_cnt, irq);
}
if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
gicv2m_init(&node->fwnode, gic_data[gic_cnt].domain);
gic_cnt++;
return 0;
}
1.5.2.1.1.1.1.1 gic_init_bases()
__gic_init_bases()->gic_init_bases()
static int gic_init_bases(struct gic_chip_data *gic, int irq_start,
struct fwnode_handle *handle)
{
irq_hw_number_t hwirq_base;
int gic_irqs, irq_base, ret;
if (IS_ENABLED(CONFIG_GIC_NON_BANKED) && gic->percpu_offset) {
/* Frankein-GIC without banked registers... */
unsigned int cpu;
gic->dist_base.percpu_base = alloc_percpu(void __iomem *);
gic->cpu_base.percpu_base = alloc_percpu(void __iomem *);
if (WARN_ON(!gic->dist_base.percpu_base ||
!gic->cpu_base.percpu_base)) {
ret = -ENOMEM;
goto error;
}
for_each_possible_cpu(cpu) {
u32 mpidr = cpu_logical_map(cpu);
u32 core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
unsigned long offset = gic->percpu_offset * core_id;
*per_cpu_ptr(gic->dist_base.percpu_base, cpu) =
gic->raw_dist_base + offset;
*per_cpu_ptr(gic->cpu_base.percpu_base, cpu) =
gic->raw_cpu_base + offset;
}
gic_set_base_accessor(gic, gic_get_percpu_base);
} else {
/* Normal, sane GIC... */
WARN(gic->percpu_offset,
"GIC_NON_BANKED not enabled, ignoring %08x offset!",
gic->percpu_offset);
gic->dist_base.common_base = gic->raw_dist_base;
gic->cpu_base.common_base = gic->raw_cpu_base;
gic_set_base_accessor(gic, gic_get_common_base);
}
/*
* Find out how many interrupts are supported.
* The GIC only supports up to 1020 interrupt sources.
*/
gic_irqs = readl_relaxed(gic_data_dist_base(gic) + GIC_DIST_CTR) & 0x1f;
gic_irqs = (gic_irqs + 1) * 32;
if (gic_irqs > 1020)
gic_irqs = 1020;
gic->gic_irqs = gic_irqs;
if (handle) { /* DT/ACPI */
gic->domain = irq_domain_create_linear(handle, gic_irqs,
&gic_irq_domain_hierarchy_ops,
gic);
} else { /* Legacy support */
/*
* For primary GICs, skip over SGIs.
* For secondary GICs, skip over PPIs, too.
*/
if (gic == &gic_data[0] && (irq_start & 31) > 0) {
hwirq_base = 16;
if (irq_start != -1)
irq_start = (irq_start & ~31) + 16;
} else {
hwirq_base = 32;
}
gic_irqs -= hwirq_base; /* calculate # of irqs to allocate */
irq_base = irq_alloc_descs(irq_start, 16, gic_irqs,
numa_node_id());
if (irq_base < 0) {
WARN(1, "Cannot allocate irq_descs @ IRQ%d, assuming pre-allocated\n",
irq_start);
irq_base = irq_start;
}
gic->domain = irq_domain_add_legacy(NULL, gic_irqs, irq_base,
hwirq_base, &gic_irq_domain_ops, gic);
}
if (WARN_ON(!gic->domain)) {
ret = -ENODEV;
goto error;
}
gic_dist_init(gic);
ret = gic_cpu_init(gic);
if (ret)
goto error;
ret = gic_pm_init(gic);
if (ret)
goto error;
return 0;
error:
if (IS_ENABLED(CONFIG_GIC_NON_BANKED) && gic->percpu_offset) {
free_percpu(gic->dist_base.percpu_base);
free_percpu(gic->cpu_base.percpu_base);
}
return ret;
}
这段代码主要是向系统中注册一个irq domain的数据结构。为何需要struct irq_domain这样一个数据结构呢?从linux kernel的角度来看,任何外部的设备的中断都是一个异步事件,kernel都需要识别这个事件。在内核中,用IRQ number来标识某一个设备的某个interrupt request。有了IRQ number就可以定位到该中断的描述符(struct irq_desc)。但是,对于中断控制器而言,它不并知道IRQ number,它只是知道HW interrupt number(中断控制器会为其支持的interrupt source进行编码,这个编码被称为Hardware interrupt number )。不同的软件模块用不同的ID来识别interrupt source,这样就需要映射了。如何将Hardware interrupt number 映射到IRQ number呢?这需要一个translation object,内核定义为struct irq_domain。
每个interrupt controller都会形成一个irq domain,负责解析其下游的interrut source。如果interrupt controller有级联的情况,那么一个非root interrupt controller的中断控制器也是其parent irq domain的一个普通的interrupt source。struct irq_domain定义如下:
struct irq_domain {
......
const struct irq_domain_ops *ops;
void *host_data;
......
};
在注册GIC的irq domain的时候还有一个重要的数据结构gic_irq_domain_ops,其类型是struct irq_domain_ops ,对于GIC,其irq domain的操作函数是gic_irq_domain_ops,定义如下:
static const struct irq_domain_ops gic_irq_domain_ops = {
.map = gic_irq_domain_map,
.unmap = gic_irq_domain_unmap,
};
irq domain的概念是一个通用中断子系统的概念,
irq domain相关callback函数分析: gic_irq_domain_map函数:创建IRQ number和GIC hw interrupt ID之间映射关系的时候,需要调用该回调函数。具体代码如下:
static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
irq_hw_number_t hw)
{
struct gic_chip_data *gic = d->host_data;
if (hw < 32) {
irq_set_percpu_devid(irq);
irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
handle_percpu_devid_irq, NULL, NULL);
irq_set_status_flags(irq, IRQ_NOAUTOEN);
} else {
irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
handle_fasteoi_irq, NULL, NULL);
irq_set_probe(irq);
irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq)));
}
return 0;
}
由此,这里就找到了desc->handle_irq(desc) 函数被设置为handle_percpu_devid_irq或者handle_fasteoi_irq,以handle_percpu_devid_irq为例:
/**
* handle_percpu_devid_irq - Per CPU local irq handler with per cpu dev ids
* @desc: the interrupt description structure for this irq
*
* Per CPU interrupts on SMP machines without locking requirements. Same as
* handle_percpu_irq() above but with the following extras:
*
* action->percpu_dev_id is a pointer to percpu variables which
* contain the real device id for the cpu on which this handler is
* called
*/
void handle_percpu_devid_irq(struct irq_desc *desc)
{
struct irq_chip *chip = irq_desc_get_chip(desc);
struct irqaction *action = desc->action;
unsigned int irq = irq_desc_get_irq(desc);
irqreturn_t res;
kstat_incr_irqs_this_cpu(desc);
if (chip->irq_ack)
chip->irq_ack(&desc->irq_data);
if (likely(action)) {
trace_irq_handler_entry(irq, action);
res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
trace_irq_handler_exit(irq, action, res);
} else {
unsigned int cpu = smp_processor_id();
bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
if (enabled)
irq_percpu_disable(desc, cpu);
pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",
enabled ? " and unmasked" : "", irq, cpu);
}
if (chip->irq_eoi)
chip->irq_eoi(&desc->irq_data);
}
最终就调用了我们注册进去的服务程序。