| title: | Re jbd fix error handling for checkpoint i |
|
Hello,
On Thu 21-08-08 19:09:27, Hidehiro Kawai wrote:
The patch titled
jbd: fix error handling for checkpoint io
has been added to the -mm tree. Its filename is
jbd-fix-error-handling-for-checkpoint-io.patch
[snip]
Subject: jbd: fix error handling for checkpoint io
From: Hidehiro Kawai <hidehiro.kawai.ez@xxxxxxxxxxx>
When a checkpointing IO fails, the current JBD code doesn't check the error
and continues journaling. This means the latest metadata can be lost from both
the journal and the filesystem.
This patch leaves the failed metadata blocks in the journal space and
aborts journaling in the case of log_do_checkpoint(). To achieve this, we
need to do:
1. don't remove the failed buffer from the checkpoint list in
the case of __try_to_free_cp_buf(), because it may be released or
overwritten by a later transaction
2. log_do_checkpoint() is the last chance: remove the failed buffer
from the checkpoint list and abort the journal
3. when checkpointing fails, don't update the journal super block, to
prevent the journaled contents from being cleaned. For safety,
don't update j_tail and j_tail_sequence either
4. when checkpointing fails, report this error to the ext3 layer so
that ext3 doesn't clear the needs_recovery flag; otherwise the
journaled contents are ignored and cleaned in the recovery phase
5. if the recovery fails, keep the needs_recovery flag
6. prevent cleanup_journal_tail() from being called between
__journal_drop_transaction() and journal_abort() (a race issue
between journal_flush() and __log_wait_for_space())
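Points 1 to 3 above can be pictured with a small userspace model. This is only a sketch under stated assumptions: every toy_* name, the two structs, and TOY_EIO are hypothetical stand-ins invented for illustration, not the kernel's data structures or the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EIO 5   /* hypothetical stand-in for an I/O error code */

/* Hypothetical toy buffer: not struct buffer_head or journal_head. */
struct toy_buf {
    bool write_failed;  /* the checkpoint write of this buffer failed */
    bool on_cp_list;    /* still linked on the checkpoint list */
};

/* Hypothetical toy journal: tail is a stand-in for j_tail. */
struct toy_journal {
    bool aborted;
    unsigned tail;
};

/* Point 1: opportunistic freeing leaves a failed buffer on the list. */
static bool toy_try_to_free(struct toy_buf *b)
{
    if (b->write_failed)
        return false;       /* the journal copy may be the only good one */
    b->on_cp_list = false;
    return true;
}

/* Point 2: the full checkpoint is the last chance; it drops every
 * buffer and aborts the journal if any write had failed. */
static int toy_do_checkpoint(struct toy_journal *j,
                             struct toy_buf *bufs, size_t n)
{
    int err = 0;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].write_failed)
            err = -TOY_EIO;
        bufs[i].on_cp_list = false;
    }
    if (err)
        j->aborted = true;
    return err;
}

/* Point 3: once the journal is aborted, the tail is never advanced,
 * so the journaled contents are not treated as reclaimable. */
static int toy_cleanup_tail(struct toy_journal *j, unsigned new_tail)
{
    if (j->aborted)
        return -TOY_EIO;
    assert(new_tail >= j->tail);
    j->tail = new_tail;
    return 0;
}
```

In this model, a failed buffer survives toy_try_to_free(), the later toy_do_checkpoint() reports the error and marks the journal aborted, and toy_cleanup_tail() then refuses to move the tail.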
When I read the source code again, I noticed that the race condition described
in 6 doesn't happen. I had thought journal_flush() could invoke
log_do_checkpoint() while __log_wait_for_space() is invoking
log_do_checkpoint(), but that was wrong.
First, journal_flush() invokes the __log_start_commit() and log_wait_commit()
pair. After this, there is no running transaction and no starting handle.
New handles are also not created because j_barrier_count blocks them.
Thus, when journal_flush() invokes log_do_checkpoint(), there is
no other process which invokes __log_wait_for_space() and
log_do_checkpoint() to get free log space. So invocations of
log_do_checkpoint() are always isolated, and the race condition doesn't
happen.
I'm not quite following you. j_barrier_count is increased only in
journal_lock_updates(). No one is forced to call
journal_lock_updates() first and journal_flush() only after that (although
usually it is done that way). So I think taking the j_checkpoint_mutex in
journal_flush() is really a good thing to do.
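To make the point concrete, here is a tiny userspace model of the barrier counter alone. It is a sketch, not kernel code: the toy_* names and the caller_locked_updates parameter are hypothetical, and the only behaviour modelled is that new handles are held off while the count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy journal: barrier_count stands in for j_barrier_count. */
struct toy_journal {
    int barrier_count;
};

/* Stand-in for journal_lock_updates(): the only place the count goes up. */
static void toy_lock_updates(struct toy_journal *j)
{
    j->barrier_count++;
}

static void toy_unlock_updates(struct toy_journal *j)
{
    assert(j->barrier_count > 0);
    j->barrier_count--;
}

/* New handles are blocked only while the barrier count is non-zero. */
static bool toy_may_start_handle(const struct toy_journal *j)
{
    return j->barrier_count == 0;
}

/* Models a flush that does not raise the barrier itself. Returns true
 * if new handles were excluded for the duration of the flush. */
static bool toy_flush_excludes_new_handles(struct toy_journal *j,
                                           bool caller_locked_updates)
{
    bool excluded;

    if (caller_locked_updates)
        toy_lock_updates(j);
    excluded = !toy_may_start_handle(j);
    if (caller_locked_updates)
        toy_unlock_updates(j);
    return excluded;
}
```

Only the caller that took the barrier first gets the exclusion; a caller that goes straight to the flush does not, which is why a lock taken inside the flush path is the safer choice.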
If my understanding is correct, adding mutex_lock() around
log_do_checkpoint() (see below) is unneeded.
What do you think about this?
[snip]
@@ -1359,10 +1369,16 @@ int journal_flush(journal_t *journal)
 	spin_lock(&journal->j_list_lock);
 	while (!err && journal->j_checkpoint_transactions != NULL) {
 		spin_unlock(&journal->j_list_lock);
+		mutex_lock(&journal->j_checkpoint_mutex);
 		err = log_do_checkpoint(journal);
+		mutex_unlock(&journal->j_checkpoint_mutex);
 		spin_lock(&journal->j_list_lock);
Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
|