<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>redis &#8211; IT Crafter</title>
	<atom:link href="https://blog.itcrafter.net/archives/tag/redis/feed" rel="self" type="application/rss+xml" />
	<link>https://blog.itcrafter.net</link>
	<description></description>
	<lastBuildDate>Sun, 15 Sep 2024 12:05:19 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>
	<item>
		<title>GitLab 500 Error: Redis Failure</title>
		<link>https://blog.itcrafter.net/archives/185</link>
		
		<dc:creator><![CDATA[IT Crafter]]></dc:creator>
		<pubDate>Sun, 15 Sep 2024 12:05:18 +0000</pubDate>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[error 500]]></category>
		<category><![CDATA[gitlab]]></category>
		<category><![CDATA[redis]]></category>
		<guid isPermaLink="false">https://blog.itcrafter.net/?p=185</guid>

					<description><![CDATA[The outage&#8230; A few days ago a network glitch caused the LUN data disk mounted on the GitLab server to disconnect unexpectedly, corrupting data. A bit ... <a title="GitLab 500 Error: Redis Failure" class="read-more" href="https://blog.itcrafter.net/archives/185" aria-label="Read more about GitLab 500 Error: Redis Failure">Read more</a>]]></description>
										<content:encoded><![CDATA[
<h3 class="wp-block-heading">The outage&#8230;</h3>



<p>A few days ago a network glitch caused the LUN data disk mounted on the GitLab server to disconnect unexpectedly, corrupting data.</p>



<p>After some effort the LUN was recovered and remounted, but GitLab still would not work and returned a 500 error.</p>



<figure class="wp-block-image"><img decoding="async" src="https://pic2.helpfully.top:9522/uploads/2024/09/66e45761eed5c.png" alt="gitlab_500_error"/></figure>



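<p>Presumably critical data was being written at the moment the LUN dropped. The data disk was mounted again, but some service had likely lost critical data and taken the whole of GitLab down with it&#8230;</p>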



<h3 class="wp-block-heading">Troubleshooting&#8230;</h3>



<p>Check the status of each GitLab service:</p>



<pre class="wp-block-preformatted">gitlab-ctl status</pre>



<p>The gitlab-kas and redis services turn out to be down:</p>



<pre class="wp-block-preformatted">run: gitaly: (pid 317) 151836s; run: log: (pid 307) 151836s<br>down: gitlab-kas: (pid 400) <br>run: gitlab-workhorse: (pid 315) 151836s; run: log: (pid 306) 151836s<br>run: logrotate: (pid 37297) 634s; run: log: (pid 313) 151836s<br>run: nginx: (pid 389) 151835s; run: log: (pid 388) 151835s<br>run: postgresql: (pid 319) 151836s; run: log: (pid 310) 151836s<br>run: puma: (pid 318) 151836s; run: log: (pid 309) 151836s<br>down: redis: (pid 314) <br>run: sidekiq: (pid 308) 151836s; run: log: (pid 303) 151836s<br>run: sshd: (pid 36) 151852s; run: log: (pid 35) 151852s</pre>
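


<p>To pick the failed services out of a long status listing, the output can be filtered. A minimal sketch, assuming the <code>state: name: ...</code> line layout shown above; <code>list_down_services</code> is a hypothetical helper name, not part of gitlab-ctl:</p>

```shell
# list_down_services (hypothetical helper): read `gitlab-ctl status` output
# on stdin and print the name of every service whose line starts with "down:".
list_down_services() {
  # Fields are separated by ": ", so field 2 is the service name.
  awk -F': ' '/^down:/ { print $2 }'
}

# Typical use on the GitLab host:
#   gitlab-ctl status | list_down_services
```

<p>Run against the listing above, this would print only gitlab-kas and redis.</p>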



<p>gitlab-kas (Kubernetes Agent Server): the service component that integrates GitLab with Kubernetes. For GitLab itself it is only an optional component;</p>



<p>redis: this one is a critical backend service. It serves as GitLab's cache layer, handles the background job queues, and stores user session data.</p>



<p>On top of that, gitlab-kas most likely depends on Redis for caching and message queuing. So start by finding out what went wrong with redis:</p>



<pre class="wp-block-preformatted">tail -f /var/log/gitlab/redis/current   <em># follow the GitLab Redis log</em></pre>



<p>Sure enough, the log gives some clues:</p>



<pre class="wp-block-preformatted">2024-09-13_04:03:46.62940 25242:M 13 Sep 2024 04:03:46.629 * Loading RDB produced by version 7.0.15<br>2024-09-13_04:03:46.62941 25242:M 13 Sep 2024 04:03:46.629 * RDB age 97363 seconds<br>2024-09-13_04:03:46.62944 25242:M 13 Sep 2024 04:03:46.629 * RDB memory usage when created 6.00 Mb<br>2024-09-13_04:03:46.64184 25242:M 13 Sep 2024 04:03:46.641 <em># Internal error in RDB reading offset 0, function at rdb.c:529 -&gt; Unknown RDB string encoding type 25</em><br>2024-09-13_04:03:46.65352 [offset 0] Checking RDB file dump.rdb<br>2024-09-13_04:03:46.65353 [offset 27] AUX FIELD redis-ver = '7.0.15'<br>2024-09-13_04:03:46.65354 [offset 41] AUX FIELD redis-bits = '64'<br>2024-09-13_04:03:46.65355 [offset 53] AUX FIELD ctime = '1726102863'<br>2024-09-13_04:03:46.65356 [offset 68] AUX FIELD used-mem = '6292872'<br>2024-09-13_04:03:46.65357 [offset 80] AUX FIELD aof-base = '0'<br>2024-09-13_04:03:46.65357 [offset 82] Selecting DB ID 0<br>2024-09-13_04:03:46.65363 --- RDB ERROR DETECTED ---<br>2024-09-13_04:03:46.65365 [offset 719559] Internal error in RDB reading offset 0, function at rdb.c:529 -&gt; Unknown RDB string encoding type 25<br>2024-09-13_04:03:46.65365 [additional info] While doing: read-object-value<br>2024-09-13_04:03:46.65366 [additional info] Reading key 'cron_job:namespaces_process_outdated_namespace_descendants_cron_worker:enqueued'<br>2024-09-13_04:03:46.65367 [additional info] Reading type 5 (zset-v2)<br>2024-09-13_04:03:46.65367 [info] 3555 keys read<br>2024-09-13_04:03:46.65368 [info] 3439 expires<br>2024-09-13_04:03:46.65369 [info] 3405 already expired<br>2024-09-13_04:03:46.65369 25242:M 13 Sep 2024 04:03:46.653 <em># Terminating server after rdb file reading failure.</em></pre>



<p>Judging by the log, the dump.rdb file is the problem&#8230;</p>
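


<p>In a longer log, the lines that matter can be pulled out with a filter. A rough sketch, assuming the log messages match the excerpt above; <code>summarize_rdb_failure</code> is a hypothetical helper name:</p>

```shell
# summarize_rdb_failure (hypothetical helper): print the lines of a Redis log
# that describe an RDB load failure, including the key being read at the time.
# The patterns are taken from the log excerpt in this post (an assumption
# about other Redis versions).
summarize_rdb_failure() {
  grep -E 'RDB ERROR DETECTED|Internal error in RDB|Reading key|Terminating server' "$1"
}

# Typical use on the GitLab host:
#   summarize_rdb_failure /var/log/gitlab/redis/current
```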



<p>The full path of this file is <code>/var/opt/gitlab/redis/dump.rdb</code>, a path that lives on the LUN mount&#8230;</p>



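<p>As an in-memory database, Redis periodically persists its state to disk, much like saving a snapshot, and this <code>dump.rdb</code> file is where the snapshot is stored. When the Redis service starts and finds a <code>dump.rdb</code> file, it automatically loads the data from it.</p>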
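


<p>An RDB snapshot begins with the ASCII magic string <code>REDIS</code> followed by a version number, so a cheap first check is whether that header is still intact. A minimal sketch; <code>has_rdb_magic</code> is a hypothetical helper name. It only catches damage at the very start of the file, so a pass proves little: in the log above the header fields parsed fine and the error showed up much further in.</p>

```shell
# has_rdb_magic (hypothetical helper): succeed if the file starts with the
# 5-byte RDB magic string "REDIS". This checks the header only, not the body.
has_rdb_magic() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "REDIS" ]
}

# Typical use on the GitLab host:
#   has_rdb_magic /var/opt/gitlab/redis/dump.rdb || echo "RDB header is damaged"
```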



<p>Most likely the LUN dropped right in the middle of writing the rdb file, which corrupted the data.</p>



<h3 class="wp-block-heading">The fix</h3>



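<p>Repairing the corrupted <code>dump.rdb</code> is clearly not worth the effort. The best option right now is simply to delete the file. (The known cost is losing session data, which seems acceptable.)</p>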
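


<p>Instead of deleting the file outright, it can be renamed so that a copy remains for later inspection; Redis then finds no <code>dump.rdb</code> on startup. A minimal sketch, assuming Redis is already stopped (as it was here); <code>quarantine_rdb</code> and the <code>.corrupt-</code> suffix are hypothetical names:</p>

```shell
# quarantine_rdb (hypothetical helper): move a damaged snapshot aside under a
# timestamped name and print the new path. Assumes Redis is not running.
quarantine_rdb() {
  src=$1
  dest="$src.corrupt-$(date +%Y%m%d%H%M%S)"
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  mv -- "$src" "$dest" || return 1
  echo "$dest"
}

# Typical use on the GitLab host:
#   quarantine_rdb /var/opt/gitlab/redis/dump.rdb
```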



<p>After deleting <code>dump.rdb</code>, restart the whole of GitLab:</p>



<pre class="wp-block-preformatted">gitlab-ctl stop<br>gitlab-ctl start</pre>



<p>Good, everything is back! (At least nothing important seems to have been lost&#8230;)</p>



<h3 class="wp-block-heading">To sum up</h3>



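<p>For a service like GitLab that is sensitive to data persistence, putting its storage on a LUN does add a layer of risk. Investing a little more in local storage would be the better choice.</p>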
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
