NEWS

2010.10.06:11:36:28
Statutory support for the development of telecommunications networks
The Act on Supporting the Development of Telecommunications Services and Networks, which entered into force on 15 July this year, introduces a number of innovative measures intended to accelerate the development and spread of telecommunications services in Poland. It will lead to lower costs for telecommunications operators and for the recipients of these services (so-called end users).

 

messageID:559060007587
author:Rishikesh K Rajak
title:Re jbd fix error handling for checkpoint i
Hello,

On Thu 21-08-08 19:09:27, Hidehiro Kawai wrote:
> > The patch titled
> >      jbd: fix error handling for checkpoint io
> > has been added to the -mm tree.  Its filename is
> >      jbd-fix-error-handling-for-checkpoint-io.patch
> [snip]
> > Subject: jbd: fix error handling for checkpoint io
> > From: Hidehiro Kawai <hidehiro.kawai.ez@xxxxxxxxxxx>
> >
> > When a checkpointing IO fails, current JBD code doesn't check the
> > error and continues journaling.  This means latest metadata can be
> > lost from both the journal and filesystem.
> >
> > This patch leaves the failed metadata blocks in the journal space
> > and aborts journaling in the case of log_do_checkpoint().  To
> > achieve this, we need to do:
> >
> > 1. don't remove the failed buffer from the checkpoint list in the
> >    case of __try_to_free_cp_buf() because it may be released or
> >    overwritten by a later transaction
> > 2. log_do_checkpoint() is the last chance, remove the failed buffer
> >    from the checkpoint list and abort the journal
> > 3. when checkpointing fails, don't update the journal super block to
> >    prevent the journaled contents from being cleaned.  For safety,
> >    don't update j_tail and j_tail_sequence either
> > 4. when checkpointing fails, notify this error to the ext3 layer so
> >    that ext3 doesn't clear the needs_recovery flag, otherwise the
> >    journaled contents are ignored and cleaned in the recovery phase
> > 5. if the recovery fails, keep the needs_recovery flag
> > 6. prevent cleanup_journal_tail() from being called between
> >    __journal_drop_transaction() and journal_abort() (a race issue
> >    between journal_flush() and __log_wait_for_space())
>
> When I read the source code again, I noticed the race condition
> described in 6 doesn't happen.  I've thought journal_flush() can
> invoke log_do_checkpoint() while __log_wait_for_space() is invoking
> log_do_checkpoint(), but it would be wrong.
>
> First journal_flush() invokes __log_start_commit() and
> log_wait_commit() pair.  After this, there is no running transaction
> and no starting handle.  New handles are also not created because
> j_barrier_count blocks it.  Thus, when journal_flush() invokes
> log_do_checkpoint(), there is no other process which invokes
> __log_wait_for_space() and log_do_checkpoint() to get free log space.
> So invocations of log_do_checkpoint() are always isolated, the race
> condition doesn't happen.

I'm not quite following you. j_barrier_count is increased only in
journal_lock_updates(). No one is forced to first call
journal_lock_updates() and only after that journal_flush() (although
usually it is done that way). So I think taking the j_checkpoint_mutex
in journal_flush() is really a good thing to do.

> If my understanding is correct, adding mutex_lock() around
> log_do_checkpoint() (see below) is unneeded.  What do you think
> about this?
>
> [snip]
> > @@ -1359,10 +1369,16 @@ int journal_flush(journal_t *journal)
> >  	spin_lock(&journal->j_list_lock);
> >  	while (!err && journal->j_checkpoint_transactions != NULL) {
> >  		spin_unlock(&journal->j_list_lock);
> > +		mutex_lock(&journal->j_checkpoint_mutex);
> >  		err = log_do_checkpoint(journal);
> > +		mutex_unlock(&journal->j_checkpoint_mutex);
> >  		spin_lock(&journal->j_list_lock);

									Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at vger.kernel.org/majordomo-info.html
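
The locking pattern discussed in this thread can be modelled in userspace. What follows is only a rough sketch, assuming POSIX threads: toy_journal, toy_do_checkpoint(), toy_journal_flush() and toy_run_flushes() are hypothetical names made up for illustration and are not the kernel's JBD code. Pthread mutexes stand in for both j_list_lock (a spinlock in the kernel) and j_checkpoint_mutex. The sketch shows one thing: if every caller takes a checkpoint mutex around the checkpoint call, the calls cannot overlap, whether or not the caller went through an update barrier first.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for journal_t; NOT the kernel structure. */
struct toy_journal {
    pthread_mutex_t list_lock;        /* models j_list_lock */
    pthread_mutex_t checkpoint_mutex; /* models j_checkpoint_mutex */
    int pending_transactions;         /* models a non-empty checkpoint list */
    int in_checkpoint;                /* callers currently inside the checkpoint */
    int max_in_checkpoint;            /* highest overlap ever observed */
};

/* Models one checkpoint pass. The caller must hold checkpoint_mutex. */
static int toy_do_checkpoint(struct toy_journal *j)
{
    j->in_checkpoint++;
    if (j->in_checkpoint > j->max_in_checkpoint)
        j->max_in_checkpoint = j->in_checkpoint;

    pthread_mutex_lock(&j->list_lock);
    if (j->pending_transactions > 0)
        j->pending_transactions--;    /* "write out" one transaction */
    pthread_mutex_unlock(&j->list_lock);

    j->in_checkpoint--;
    return 0;
}

/* Same loop shape as the quoted hunk: drop the list lock, take the
 * checkpoint mutex, checkpoint, release it, retake the list lock. */
static int toy_journal_flush(struct toy_journal *j)
{
    int err = 0;

    pthread_mutex_lock(&j->list_lock);
    while (!err && j->pending_transactions != 0) {
        pthread_mutex_unlock(&j->list_lock);
        pthread_mutex_lock(&j->checkpoint_mutex);
        err = toy_do_checkpoint(j);
        pthread_mutex_unlock(&j->checkpoint_mutex);
        pthread_mutex_lock(&j->list_lock);
    }
    pthread_mutex_unlock(&j->list_lock);
    return err;
}

static void *toy_flush_thread(void *arg)
{
    toy_journal_flush((struct toy_journal *)arg);
    return NULL;
}

/* Runs nthreads concurrent flushes over `pending` transactions.
 * Returns the highest checkpoint overlap observed, or -1 on bad input.
 * *remaining receives the number of transactions left at the end. */
int toy_run_flushes(int nthreads, int pending, int *remaining)
{
    enum { TOY_MAX_THREADS = 16 };
    pthread_t tids[TOY_MAX_THREADS];
    struct toy_journal j;
    int i, started = 0;

    if (nthreads < 1 || nthreads > TOY_MAX_THREADS || remaining == NULL)
        return -1;

    pthread_mutex_init(&j.list_lock, NULL);
    pthread_mutex_init(&j.checkpoint_mutex, NULL);
    j.pending_transactions = pending;
    j.in_checkpoint = 0;
    j.max_in_checkpoint = 0;

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, toy_flush_thread, &j) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    *remaining = j.pending_transactions;
    pthread_mutex_destroy(&j.list_lock);
    pthread_mutex_destroy(&j.checkpoint_mutex);
    return j.max_in_checkpoint;
}
```

Calling toy_run_flushes(4, 1000, &left) starts four threads that all flush the same toy journal. Because the overlap counter is only touched while the checkpoint mutex is held, it never rises above one. The model says nothing about whether the kernel race is actually reachable, which is the open question in the exchange above.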