S1: InnoDB AIO Internals and Analysis of Related Bugs

Xiyu (希羽), Taobao

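Agenda
• InnoDB AIO parameter settings
• How InnoDB simulated AIO works
• How the read/write threads are invoked, and how to diagnose them
• What the master thread contributes
• Analysis and fix of the table-loss-after-DDL problem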

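InnoDB AIO parameter settings
• innodb_file_io_threads
– Dropped as of the 5.1 plugin and 5.5
– Defaults to 4 in the built-in version, which means:
• one read thread (equivalent to innodb_read_io_threads=1)
• one write thread (equivalent to innodb_write_io_threads=1)
• one insert buffer thread
• one log thread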

• On SSDs, which offer much higher I/O capacity, typical settings are:
– innodb_thread_concurrency=64
– innodb_read_io_threads=8
– innodb_write_io_threads=8
– innodb_io_capacity=2000
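
For reference, the SSD settings listed on this slide could be written as a my.cnf fragment like the one below. The values are simply those from the slide, not tuned recommendations, and placing them under `[mysqld]` is the usual convention rather than something the slide states.

```ini
[mysqld]
innodb_thread_concurrency = 64
innodb_read_io_threads    = 8
innodb_write_io_threads   = 8
innodb_io_capacity        = 2000
```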

InnoDB simulated AIO: initialization
• srv/srv0start.c: innobase_start_or_create_for_mysql

InnoDB simulated AIO: key data structures
• struct os_aio_array_t, 4 instances
– mutex
– not_full/is_empty (os_event_t)
– n_slots/n_segments/n_reserved
– slots (os_aio_slot_t)
• struct os_aio_slot_t
– n_slots = n_[read|write]_segs * n_per_seg
– layout: slot[0] … slot[n_slots - 1]
– fields: pos | reserved | ..
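
To make the relationship between the array, its segments, and its slots concrete, here is a minimal Python sketch. It is only a model of the C structs named on the slide: the class names `AioArray` and `AioSlot` and the helpers `segment_of` and `reserve` are hypothetical, the mutex and the two events are omitted, and the first-free reservation policy is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AioSlot:
    """Simplified stand-in for os_aio_slot_t (only two of its fields)."""
    pos: int                # index of the slot inside its array
    reserved: bool = False  # True while a request occupies the slot


@dataclass
class AioArray:
    """Simplified stand-in for os_aio_array_t (mutex and events omitted)."""
    n_segments: int
    n_per_seg: int
    n_reserved: int = 0
    slots: List[AioSlot] = field(default_factory=list)

    def __post_init__(self):
        # n_slots = n_segments * n_per_seg, as on the slide.
        self.n_slots = self.n_segments * self.n_per_seg
        self.slots = [AioSlot(pos=i) for i in range(self.n_slots)]

    def segment_of(self, slot: AioSlot) -> int:
        """Each segment owns a contiguous run of n_per_seg slots."""
        return slot.pos // self.n_per_seg

    def reserve(self) -> AioSlot:
        """Reserve the first free slot (assumed policy); fail if full."""
        for slot in self.slots:
            if not slot.reserved:
                slot.reserved = True
                self.n_reserved += 1
                return slot
        raise RuntimeError("array full: the real code would wait on not_full")
```

With 8 segments of 4 slots each, for instance, the array holds 32 slots and slot 5 belongs to segment 1.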

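InnoDB simulated AIO: worker-thread and wake-up handlers
• Worker-thread handler: fil/fil0fil.c:fil_aio_wait
– os_aio_simulated_handle
• Wake-up handler for the AIO threads (called from many places)
– os_aio_simulated_wake_handler_thread
– Typical case: forced wake-up when no free slot can be found
[Figure: slots slot[0], slot[1], … slot[k] … slot[n-1]. The handler scans them: if it finds a slot that needs read/write I/O it calls os_file_read/write, else it waits for an event. wake_handler broadcasts that event.]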
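
The wait/wake interplay between a handler thread and its wakers can be sketched with a plain `threading.Event`. This is an illustrative model, not InnoDB code: the `Segment` class, its `pending`/`done` lists, and the `shutdown` flag are assumptions introduced here, and the "I/O" is just moving a request from one list to another.

```python
import threading


class Segment:
    """Toy model of one AIO segment: pending requests plus a wake-up event."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.event = threading.Event()  # stands in for the segment's os_event_t
        self.pending = []               # requests sitting in reserved slots
        self.done = []                  # requests the handler has processed
        self.shutdown = False

    def submit(self, request):
        """Queue a request without waking the handler (a batching caller)."""
        with self.mutex:
            self.pending.append(request)

    def wake_handler_thread(self):
        """Modeled on os_aio_simulated_wake_handler_thread: signal the event."""
        self.event.set()

    def handler_loop(self):
        """Do the I/O if any slot needs it, otherwise wait for the event."""
        while True:
            with self.mutex:
                batch, self.pending = self.pending, []
                stop = self.shutdown
            if batch:
                self.done.extend(batch)  # real code: os_file_read/os_file_write
                continue
            if stop:
                return
            self.event.wait()            # nothing to do: sleep until woken
            self.event.clear()
```

The point of the model is that a submitted request sits untouched until somebody calls `wake_handler_thread()`; if every potential waker is blocked, the pending count never drops.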

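InnoDB simulated AIO: the core function os_aio_simulated_handle
- Get the slots of the segment
- If a slot is reserved and io_already_done is set, goto slot_io_done (release the slot and clear io_already_done)
- If any slot has been reserved for more than 2 s, pick the oldest one to prevent starvation
- If no slot meets that condition, pick the reserved slot with the smallest offset
- If neither rule finds a reserved slot, goto wait_for_io
- Otherwise a slot has been found; look for a reserved slot' whose I/O is contiguous with it, then a slot'' contiguous with slot', and so on (up to 64 slots)
- memcpy the buf of each slot found into combined_buf
- Call os_file_read/os_file_write to perform the read/write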
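
The selection and merging steps listed above can be sketched in Python as follows. This is a simplified model under stated assumptions: the `Slot` class and the function names `pick_first`, `collect_contiguous`, and `combine` are hypothetical, all slots are assumed to target the same file and the same operation type, and the 2-second and 64-slot limits are taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_MERGED = 64          # upper bound on merged slots, from the slide
STARVATION_SECS = 2.0    # age beyond which the oldest request wins


@dataclass(eq=False)     # compare slots by identity, not by field values
class Slot:
    reserved: bool = False
    io_already_done: bool = False
    offset: int = 0
    length: int = 0
    reservation_time: float = 0.0
    buf: bytes = b""


def pick_first(slots: List[Slot], now: float) -> Optional[Slot]:
    """Choose the slot that starts the batch, or None if none is reserved."""
    reserved = [s for s in slots if s.reserved and not s.io_already_done]
    if not reserved:
        return None                                        # -> wait_for_io
    old = [s for s in reserved if now - s.reservation_time > STARVATION_SECS]
    if old:
        return min(old, key=lambda s: s.reservation_time)  # oldest first
    return min(reserved, key=lambda s: s.offset)           # lowest offset


def collect_contiguous(slots: List[Slot], first: Slot) -> List[Slot]:
    """Chain slots whose offset equals the end of the previous one."""
    batch = [first]
    while len(batch) < MAX_MERGED:
        end = batch[-1].offset + batch[-1].length
        nxt = next((s for s in slots
                    if s.reserved and not s.io_already_done
                    and s not in batch and s.offset == end), None)
        if nxt is None:
            break
        batch.append(nxt)
    return batch


def combine(batch: List[Slot]) -> bytes:
    """Concatenate the buffers, as the memcpy into combined_buf does."""
    return b"".join(s.buf for s in batch)
```

Three reserved 16-byte requests at offsets 0, 16, and 32 would therefore be issued as one 48-byte I/O, while a request at offset 96 stays behind for a later pass unless it has waited longer than 2 seconds.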

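Possible room for optimization: increase the number of slots and the number of slots merged into one batched write
With native AIO the slot count is 1/8 of the simulated-AIO count, and os_aio_linux_handle is called instead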

How the read/write threads are invoked and diagnosed: everything starts with a call to fil_io
• fil_io
– fil_mutex_enter_and_prepare_for_io
– fil_node_prepare_for_io
– os_aio
• os_aio_simulated_handle
– os_file_pwrite (lseek && write && [flush])
– fil_node_complete_io
• fil_flush
• Reads are synchronous, writes are asynchronous
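
The slide summarizes os_file_pwrite as "lseek && write && [flush]". A minimal Python sketch of that sequence, using only the standard `os` module, might look like this. The function name `file_pwrite` and its `flush` parameter are hypothetical; the sketch mirrors the slide's description, not the actual InnoDB implementation.

```python
import os


def file_pwrite(fd: int, buf: bytes, offset: int, flush: bool = False) -> int:
    """Position, write, and optionally flush: lseek && write && [flush]."""
    os.lseek(fd, offset, os.SEEK_SET)
    written = 0
    while written < len(buf):       # os.write may write fewer bytes than asked
        written += os.write(fd, buf[written:])
    if flush:
        os.fsync(fd)                # force the data to stable storage
    return written
```

Because the seek and the write are two separate calls on a shared descriptor, concurrent callers would need to serialize them; this sketch leaves that locking out.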

How the read/write threads are invoked and diagnosed: file I/O diagnostic counters
• fil_node_prepare_for_io
– Removes the node from fil_system->LRU (it is now in use)
– n_pending++
• fil_node_complete_io
– n_pending--
– modification_counter++ (set to flush_counter when freed)
– Adds node->space to fil_system->unflushed_spaces
– Puts the node back on fil_system->LRU
• fil_flush
– space->n_pending_flushes++
– n_pending_flushes++
– os_file_flush
– n_pending_flushes--
– space->n_pending_flushes--
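
The counter bookkeeping above can be modeled in a few lines of Python, which makes it easier to see when a file counts as idle. This is a toy model: the classes `Node`, `Space`, and `FilSystem` and the helper `quiescent` are hypothetical, each space is assumed to have a single node, and removing the space from `unflushed_spaces` after a flush is an assumption not stated on the slide.

```python
class Node:
    def __init__(self):
        self.n_pending = 0           # I/Os in flight on this file
        self.n_pending_flushes = 0   # flushes in flight on this file
        self.modification_counter = 0


class Space:
    def __init__(self, node):
        self.node = node             # assumption: one node per space
        self.n_pending_flushes = 0


class FilSystem:
    def __init__(self):
        self.lru = []                # idle nodes
        self.unflushed_spaces = []   # spaces with writes not yet flushed

    def prepare_for_io(self, space):
        node = space.node
        if node in self.lru:
            self.lru.remove(node)    # node is in use, take it off the LRU
        node.n_pending += 1

    def complete_io(self, space):
        node = space.node
        node.n_pending -= 1
        node.modification_counter += 1
        if space not in self.unflushed_spaces:
            self.unflushed_spaces.append(space)
        if node.n_pending == 0:
            self.lru.append(node)    # idle again, back on the LRU

    def flush(self, space, os_file_flush=lambda node: None):
        node = space.node
        space.n_pending_flushes += 1
        node.n_pending_flushes += 1
        os_file_flush(node)
        node.n_pending_flushes -= 1
        space.n_pending_flushes -= 1
        if space in self.unflushed_spaces:   # assumption, not on the slide
            self.unflushed_spaces.remove(space)

    def quiescent(self, space):
        """True when no I/O and no flush is pending on the space's node."""
        node = space.node
        return node.n_pending == 0 and node.n_pending_flushes == 0
```

A non-zero `n_pending` that never drops is exactly the symptom examined in the DDL slides later in the deck.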

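What the master thread contributes
• Calls buf_flush_batch to flush dirty pages and wake up AIO
– Flushes dirty pages at varying rates (I/O capacity is allotted according to how busy I/O currently is)
– Flushes page blocks from flush_list (neighboring pages are flushed too)
• buf_flush_list->buf_flush_batch(BUF_FLUSH_LIST)
– Ends by calling buf_flush_buffered_writes
• Flushes whatever is buffered in the in-memory doublewrite buffer to storage
• With simulated AIO, the AIO threads must then be woken up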

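flush_list and LRU_list
• Characteristics of the two flushing paths
– Every page first goes from the buffer pool to the in-memory doublewrite buffer
• Master thread, triggered periodically (modified-page ratio > dirty-page threshold): flushes from flush_list
• Worker threads, flushing on demand (no free blocks available): flush from LRU_list
– Doublewrite layout: 128 pages of 16 KB each, 2 MB in total

– Starts at buf_flush_page: writes a flushable page from the buffer pool to a file via fil_io (no trx_dwb)

Call chain: buf_flush_page -> buf_flush_write_block_low -> buf_flush_post_to_doublewrite_buf -> buf_flush_buffered_writes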
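
The batching behavior of the in-memory doublewrite buffer can be illustrated with a short Python sketch. It is a model under assumptions: the class `DoublewriteBuffer` and its callback parameters `write_dblwr`, `write_datafile`, and `wake_aio` are hypothetical, and flushing when the buffer is full is a simplification of whatever the real trigger is. The sizes (128 pages of 16 KB) come from the slide.

```python
PAGE_SIZE = 16 * 1024
DBLWR_PAGES = 128            # 128 * 16 KB = 2 MB


class DoublewriteBuffer:
    """Toy in-memory doublewrite buffer; storage is reached via callbacks."""

    def __init__(self, write_dblwr, write_datafile, wake_aio=lambda: None):
        self.pages = []                      # (page_no, data) waiting in memory
        self.write_dblwr = write_dblwr       # writes the doublewrite area
        self.write_datafile = write_datafile # writes a page to its real place
        self.wake_aio = wake_aio             # needed with simulated AIO

    def post(self, page_no, data):
        """Modeled on buf_flush_post_to_doublewrite_buf."""
        if len(self.pages) >= DBLWR_PAGES:   # buffer full: drain it first
            self.flush_buffered_writes()
        self.pages.append((page_no, data))

    def flush_buffered_writes(self):
        """Modeled on buf_flush_buffered_writes."""
        if not self.pages:
            return
        # 1. Write the whole batch to the doublewrite area first.
        self.write_dblwr(b"".join(data for _, data in self.pages))
        # 2. Only then post each page to its real location.
        for page_no, data in self.pages:
            self.write_datafile(page_no, data)
        self.pages = []
        # 3. Wake the AIO threads so the posted writes actually run.
        self.wake_aio()
```

Step 3 is the link to the DDL problem discussed later in the deck: with simulated AIO, posted writes make no progress until a handler thread is woken.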

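Background of the table-loss-after-DDL problem
• Bug #62100
• The table is lost after a failed DDL
• Hit 5 times in production operations over the past year
• Present in the bug list since 2008
• Patch submitted in the second half of 2011
• Mark & Inaam discussed and improved the patch in early 2012
• Patch merged in MySQL 5.5.22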

DDL table-loss analysis: basics
• fil_rename_tablespace
– Sets stop_ios=TRUE
– Waits until n_pending==0 && n_pending_flushes==0
– Sets stop_ios=FALSE
• fil_mutex_enter_and_prepare_for_io
– Waits until stop_ios == FALSE
• Blocked until the wait times out, at which point mysqld aborts

DDL table-loss analysis: key backtrace

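DDL table-loss analysis: questions and answers
• Why is n_pending > 0?
– No write thread is active
• Why was no write thread woken up?
– Blocked on doublewrite->mutex
• Who holds doublewrite->mutex?
– The srv_master thread holds it but is itself blocked
• Why can't the DDL roll back correctly on failure?
– The flag that tells the caller about the failure is not set correctly
• Other risks?
– Whether writes to the DDL's intermediate temporary table get delayed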

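Fix for the table-loss-after-DDL problem
• Force a wake-up of the AIO threads when the wait lasts too long
• Set the rollback flag correctly
• Move the max_open_files check to the correct place in the logic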
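
The first two points of the fix can be illustrated with a small wait loop. This is a sketch of the idea, not the actual patch: the function `wait_until_quiescent`, its parameters, and the thresholds `max_retries` and `wake_after` are all hypothetical, and `node` is any object with `n_pending` and `n_pending_flushes` attributes.

```python
def wait_until_quiescent(node, wake_aio_threads, sleep,
                         max_retries=1000, wake_after=10):
    """Return True once no I/O or flush is pending, False on giving up.

    All names and thresholds here are illustrative assumptions.
    """
    for attempt in range(max_retries):
        if node.n_pending == 0 and node.n_pending_flushes == 0:
            return True
        if attempt >= wake_after:
            # Waited too long: nudge the handler threads in case the
            # thread that normally wakes them is itself blocked.
            wake_aio_threads()
        sleep(0.02)
    # Report failure explicitly so the caller can roll the DDL back
    # instead of leaving the table in a half-renamed state.
    return False
```

Returning an explicit failure value corresponds to the second point: the caller needs an unambiguous signal before it decides whether to roll back.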
