Memory๐Ÿคฏ

Memory๐Ÿคฏ

Tlb

pyhsical address = (((satp[VPN[2]] >> 10)[VPN[1]] >> 10)[VPN[0]] << 2 )& 0xfffffffffff000) | virtual address[11:0]
VPN[2] = virtual address[38:30]
VPN[1] = virtual address[29:21]
VPN[0] = virtual address[20:12]

Brk

brkๅ’Œsbrkๆ˜ฏๅ†…ๆ ธๆƒณ็”จๆˆทๆไพ›็š„ไธคไธช็”จไบŽ็”ณ่ฏทๅ †็ฉบ้—ด็š„็ณป็ปŸ่ฐƒ็”จ๏ผŒไฝ†ไธ€่ˆฌไฝฟ็”จๅฐ่ฃ…ๅฅฝ็š„mallocๆŽฅๅฃ

#include <unistd.h>
 
int brk(void *addr);

void *sbrk(intptr_t increment)

Mmap

Partition

segmentationๅ’Œpaging้ƒฝๆ˜ฏ่ฟ™ไธ€ๆฆ‚ๅฟต็š„ๅฎž็Žฐ๏ผŒๅœจx86_64ไธญไฝฟ็”จ็š„ๆ˜ฏsegmentation๏ผŒ่€ŒๅœจRisc vไธญไฝฟ็”จ็š„๏ผŒๅนถไธ”ๅซๆœ‰ไธ“ๆœ‰็กฌไปถ็š„๏ผŒๆ˜ฏpaging

MMU๏ผš va -> pa

ๅฏนไบŽMemory Allocation Strategiesๆฅ่ฏด๏ผŒๅˆ†ไธบ๏ผš

  • Fixed partitions

    ๅ…ทๆœ‰internal partition

  • Variable paritions

    ๅ…ทๆœ‰external partition

    • first fit

      ็”ณ่ฏท็ฌฌไธ€ไธช่ถณๅคŸๅคง็š„block

    • best fit

      ็”ณ่ฏทๆœ€ๅฐ็š„ๅฏไปฅๆ‰ฟ่ฝฝ่ฟ›็จ‹็š„block

    • worst fit

      ไปŽๆœ€ๅคง็š„holeไธญ็”ณ่ฏท

Segmentation

ๆฏไธ€ไธชSegmentationไธญๅŒ…ๅซๅพˆๅคšsection๏ผŒ็›ธๅŒๆƒ้™็š„sectionๅฏไปฅๆ”พๅˆฐไธ€ไธชsegmentไธญ

ๅœจๅ…ทไฝ“็š„ๅฎž็Žฐไธญ๏ผŒ้‡‡็”จไปฅไธ‹็š„็ป“ๆž„ๆฅๅฎž็Žฐ(่ฏฅ่กจๅญ˜ๅœจไบŽๅ†…ๅญ˜ไธญ)๏ผš

image-20221205191849396

ๆ˜ฏไธ€็งvariable partition

Paging

ๆœ€้‡่ฆ็š„ๆฆ‚ๅฟต๏ผŒๅฐฑๆ˜ฏๅ…ถไธญ็ปดๆŠคๆ˜ ๅฐ„็š„page table็ป“ๆž„

ๅœจtableไธญ๏ผŒๅญ˜ๅ‚จ็š„ๅ€ผไธบframe number๏ผŒ่€Œpage number้šๅซๅœจindexไธญ๏ผŒ็œไธ‹ไธ€ๅŠ็š„็ฉบ้—ด

ไฝฟ็”จๆŒ‡ๅ‘้กต่กจbase็š„ๅฏ„ๅญ˜ๅ™จ๏ผŒ่ฏฅ่กจๅญ˜ๅ‚จๅœจๅ†…ๅญ˜ไธญ

TLB(translation look-aside buffer):

if (the page number in the TLB)
    hit;
else 
    miss;
    walk page table;

ๆฏๆฌกๅœจๅˆ‡ๆขprocessๆ—ถ๏ผŒ้œ€่ฆๆธ…้™ค่ฏฅcache๏ผŒๅธฆๆฅๆ€ง่ƒฝ็š„ๆŸ่€—

็›ฎๅ‰็š„TLBๅฎž็Žฐ๏ผš

Associative memeory:ๆ”ฏๆŒๅนถ่กŒ็š„ๆฏ”่พƒ

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” 
โ”‚              โ”‚  โ”Œโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”   
โ”‚    CPU       โ”œโ”€>โ”‚P  โ”‚  D   โ”‚  
โ”‚              โ”‚  โ””โ”ฌโ”ฌโ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”˜   
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚โ”‚โ”Œโ”€>โ”Œโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”      
                   โ”‚โ”‚โ”œโ”€>โ”‚    โ”‚     โ”‚
                   โ”‚โ”‚โ”œโ”€>โ”‚    โ”‚     โ”‚
                   โ”‚โ””โ”ผโ”€>โ”‚    โ”‚     โ”œโ”€โ”€โ”€โ”        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”   
                   โ”‚ โ”œโ”€>โ”‚    โ”‚     โ”‚   โ”‚        โ”‚      โ”‚  
                   โ”‚ โ”œโ”€>โ”‚    โ”‚     โ”‚   v        โ”‚      โ”‚
                   โ”‚ โ””โ”€>โ””โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”˜ โ”Œโ”€โ”€โ”€โ”ฌโ”€โ”€โ”   โ”‚      โ”‚
                   โ”‚                 โ”‚ F โ”‚D โ”œโ”€โ”€>โ”‚      โ”‚  
                   โ”‚                 โ””โ”€โ”€โ”€โ”ดโ”€โ”€โ”˜   โ”‚      โ”‚ 
                   โ”‚           โ”Œโ”€โ”€โ”€โ”€โ” ^         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ 
                   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€>โ”‚    โ”œโ”€โ”˜  
                               โ”‚    โ”‚
                               โ”‚    โ”‚
                               โ”‚    โ”‚
                               โ”‚    โ”‚
                               โ””โ”€โ”€โ”€โ”€โ”˜   

ๅœจๅคšๆ ธCPUไธญ๏ผŒๆฏไธช่ฟ›็จ‹็š„ๆ‰€ๆœ‰็บฟ็จ‹ๅ…ฑไบซไธ€ไธชpage table

ๅฏนไบŽ็›ธๅŒ่ฟ›็จ‹๏ผŒtextๅœจๅ†…ๅญ˜ไธญๆ˜ฏๅŒไธ€ไธชframe๏ผŒ่€Œๆ•ฐๆฎๆฎต๏ผŒๅˆ™ๅˆ†ๅˆซๅœจไธๅŒ็š„frameไธญ๏ผŒๅช้œ€่ฆๆ˜ ๅฐ„ๅˆฐไธๅŒ็š„่™šๆ‹Ÿๅœฐๅ€ไธญๅณๅฏ

ๅ•ๅฑ‚้กต่กจๆ—ถ๏ผŒๆฏไธช้กต่กจ็š„entryๆ•ฐ้‡ๆ˜ฏ1M๏ผš$2^{20}$

ไธ€่ˆฌๆฏไธชentry็š„้•ฟๅบฆๆ˜ฏ4 byte

ๆญคๆ—ถ๏ผŒๆฏไธช้กต่กจ็š„ๅคงๅฐไธบ4M

  • hierachical page table

    32ไฝๆ—ถ๏ผŒๅฆ‚ๆžœๆœ‰ไธค็บง้กต่กจ๏ผŒๅชไฝฟ็”จไธคไธชentryๆ—ถ๏ผŒๅช้œ€่ฆไธ‰ไธช้กต่กจ๏ผŒ่€Œๆ”นๅŠจๅ‰้œ€่ฆ1Kไธช

  • hashed page table

  • inverted page table

็ฑปfixed partition

ๅœจ่ฟ›่กŒๅ†…ๅญ˜ๅˆ†้…ๆ—ถ๏ผŒ้‡‡็”จ็š„ๆ˜ฏlazy allocation็š„ๆ–นๅผ๏ผŒไฝฟ็”จbrkๆฅๅฏนheap่ฟ›่กŒๅขž้•ฟ๏ผŒๅนถๆฒกๆœ‰ๅฎž้™…่ฟ›่กŒๅˆ†้…๏ผŒๅฏไปฅ็œๅŽปๅˆ†้…ๆ—ถๅ’Œ้‡Šๆ”พๆ—ถ็š„ๆ“ไฝœ

kernelๆฒกๆœ‰lazy allocation๏ผŒๅ› ไธบไธ่ƒฝๅค„็†page fault

mm_struct

struct mm_struct {
	struct {
		struct maple_tree mm_mt;
#ifdef CONFIG_MMU
		unsigned long (*get_unmapped_area) (struct file *filp,
				unsigned long addr, unsigned long len,
				unsigned long pgoff, unsigned long flags);
#endif
		unsigned long mmap_base;	/* base of mmap area */
		unsigned long mmap_legacy_base;	/* base of mmap area in bottom-up allocations */
#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
		/* Base addresses for compatible mmap() */
		unsigned long mmap_compat_base;
		unsigned long mmap_compat_legacy_base;
#endif
		unsigned long task_size;	/* size of task vm space */
		pgd_t * pgd;

#ifdef CONFIG_MEMBARRIER
		/**
		 * @membarrier_state: Flags controlling membarrier behavior.
		 *
		 * This field is close to @pgd to hopefully fit in the same
		 * cache-line, which needs to be touched by switch_mm().
		 */
		atomic_t membarrier_state;
#endif

		/**
		 * @mm_users: The number of users including userspace.
		 *
		 * Use mmget()/mmget_not_zero()/mmput() to modify. When this
		 * drops to 0 (i.e. when the task exits and there are no other
		 * temporary reference holders), we also release a reference on
		 * @mm_count (which may then free the &struct mm_struct if
		 * @mm_count also drops to 0).
		 */
		atomic_t mm_users;

		/**
		 * @mm_count: The number of references to &struct mm_struct
		 * (@mm_users count as 1).
		 *
		 * Use mmgrab()/mmdrop() to modify. When this drops to 0, the
		 * &struct mm_struct is freed.
		 */
		atomic_t mm_count;

#ifdef CONFIG_MMU
		atomic_long_t pgtables_bytes;	/* PTE page table pages */
#endif
		int map_count;			/* number of VMAs */

		spinlock_t page_table_lock; /* Protects page tables and some
					     * counters
					     */
		/*
		 * With some kernel config, the current mmap_lock's offset
		 * inside 'mm_struct' is at 0x120, which is very optimal, as
		 * its two hot fields 'count' and 'owner' sit in 2 different
		 * cachelines,  and when mmap_lock is highly contended, both
		 * of the 2 fields will be accessed frequently, current layout
		 * will help to reduce cache bouncing.
		 *
		 * So please be careful with adding new fields before
		 * mmap_lock, which can easily push the 2 fields into one
		 * cacheline.
		 */
		struct rw_semaphore mmap_lock;

		struct list_head mmlist; /* List of maybe swapped mm's.	These
					  * are globally strung together off
					  * init_mm.mmlist, and are protected
					  * by mmlist_lock
					  */


		unsigned long hiwater_rss; /* High-watermark of RSS usage */
		unsigned long hiwater_vm;  /* High-water virtual memory usage */

		unsigned long total_vm;	   /* Total pages mapped */
		unsigned long locked_vm;   /* Pages that have PG_mlocked set */
		atomic64_t    pinned_vm;   /* Refcount permanently increased */
		unsigned long data_vm;	   /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
		unsigned long exec_vm;	   /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
		unsigned long stack_vm;	   /* VM_STACK */
		unsigned long def_flags;

		/**
		 * @write_protect_seq: Locked when any thread is write
		 * protecting pages mapped by this mm to enforce a later COW,
		 * for instance during page table copying for fork().
		 */
		seqcount_t write_protect_seq;

		spinlock_t arg_lock; /* protect the below fields */

		unsigned long start_code, end_code, start_data, end_data;
		unsigned long start_brk, brk, start_stack;
		unsigned long arg_start, arg_end, env_start, env_end;

		unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */

		/*
		 * Special counters, in some configurations protected by the
		 * page_table_lock, in other configurations by being atomic.
		 */
		struct mm_rss_stat rss_stat;

		struct linux_binfmt *binfmt;

		/* Architecture-specific MM context */
		mm_context_t context;

		unsigned long flags; /* Must use atomic bitops to access */

#ifdef CONFIG_AIO
		spinlock_t			ioctx_lock;
		struct kioctx_table __rcu	*ioctx_table;
#endif
#ifdef CONFIG_MEMCG
		/*
		 * "owner" points to a task that is regarded as the canonical
		 * user/owner of this mm. All of the following must be true in
		 * order for it to be changed:
		 *
		 * current == mm->owner
		 * current->mm != mm
		 * new_owner->mm == mm
		 * new_owner->alloc_lock is held
		 */
		struct task_struct __rcu *owner;
#endif
		struct user_namespace *user_ns;

		/* store ref to file /proc/<pid>/exe symlink points to */
		struct file __rcu *exe_file;
#ifdef CONFIG_MMU_NOTIFIER
		struct mmu_notifier_subscriptions *notifier_subscriptions;
#endif
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
		pgtable_t pmd_huge_pte; /* protected by page_table_lock */
#endif
#ifdef CONFIG_NUMA_BALANCING
		/*
		 * numa_next_scan is the next time that PTEs will be remapped
		 * PROT_NONE to trigger NUMA hinting faults; such faults gather
		 * statistics and migrate pages to new nodes if necessary.
		 */
		unsigned long numa_next_scan;

		/* Restart point for scanning and remapping PTEs. */
		unsigned long numa_scan_offset;

		/* numa_scan_seq prevents two threads remapping PTEs. */
		int numa_scan_seq;
#endif
		/*
		 * An operation with batched TLB flushing is going on. Anything
		 * that can move process memory needs to flush the TLB when
		 * moving a PROT_NONE mapped page.
		 */
		atomic_t tlb_flush_pending;
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
		/* See flush_tlb_batched_pending() */
		atomic_t tlb_flush_batched;
#endif
		struct uprobes_state uprobes_state;
#ifdef CONFIG_PREEMPT_RT
		struct rcu_head delayed_drop;
#endif
#ifdef CONFIG_HUGETLB_PAGE
		atomic_long_t hugetlb_usage;
#endif
		struct work_struct async_put_work;

#ifdef CONFIG_IOMMU_SVA
		u32 pasid;
#endif
#ifdef CONFIG_KSM
		/*
		 * Represent how many pages of this process are involved in KSM
		 * merging.
		 */
		unsigned long ksm_merging_pages;
		/*
		 * Represent how many pages are checked for ksm merging
		 * including merged and not merged.
		 */
		unsigned long ksm_rmap_items;
#endif
#ifdef CONFIG_LRU_GEN
		struct {
			/* this mm_struct is on lru_gen_mm_list */
			struct list_head list;
			/*
			 * Set when switching to this mm_struct, as a hint of
			 * whether it has been used since the last time per-node
			 * page table walkers cleared the corresponding bits.
			 */
			unsigned long bitmap;
#ifdef CONFIG_MEMCG
			/* points to the memcg of "owner" above */
			struct mem_cgroup *memcg;
#endif
		} lru_gen;
#endif /* CONFIG_LRU_GEN */
	} __randomize_layout;

	/*
	 * The mm_cpumask needs to be at the end of mm_struct, because it
	 * is dynamically sized based on nr_cpu_ids.
	 */
	unsigned long cpu_bitmap[];
};

vma

struct vm_area_struct {
	/* The first cache line has the info for VMA tree walking. */

	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */

	struct mm_struct *vm_mm;	/* The address space we belong to. */

	/*
	 * Access permissions of this VMA.
	 * See vmf_insert_mixed_prot() for discussion.
	 */
	pgprot_t vm_page_prot;
	unsigned long vm_flags;		/* Flags, see mm.h. */

	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap interval tree.
	 *
	 * For private anonymous mappings, a pointer to a null terminated string
	 * containing the name given to the vma, or NULL if unnamed.
	 */

	union {
		struct {
			struct rb_node rb;
			unsigned long rb_subtree_last;
		} shared;
		/*
		 * Serialized by mmap_sem. Never use directly because it is
		 * valid only when vm_file is NULL. Use anon_vma_name instead.
		 */
		struct anon_vma_name *anon_name;
	};

	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
	 * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain; /* Serialized by mmap_lock &
					  * page_table_lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */

	/* Function pointers to deal with this struct. */
	const struct vm_operations_struct *vm_ops;

	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units */
	struct file * vm_file;		/* File we map to (can be NULL). */
	void * vm_private_data;		/* was vm_pte (shared mem) */

#ifdef CONFIG_SWAP
	atomic_long_t swap_readahead_info;
#endif
#ifndef CONFIG_MMU
	struct vm_region *vm_region;	/* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
#endif
	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
} __randomize_layout;

ๅœจ็”จๆˆทๆ€่ฟ›็จ‹ๆ‰ง่กŒ็š„ๆ—ถๅ€™๏ผŒไธๅฏ้ฟๅ…ๅœฐไผšๅ‘็”Ÿpage fault๏ผŒๅœจ่ฟ™ไบ›ๅผ‚ๅธธไธญ๏ผŒไธๆ–ญๆ˜ ๅฐ„page๏ผŒๆฅไฟ่ฏ็จ‹ๅบ็š„่ฟ›่กŒ

ๅœจ่ฟ่กŒไปฃ็ ๆ—ถ๏ผŒๆ—ข่ฆๅˆ†้…ๆ–ฐ็š„frame๏ผŒไนŸ่ฆๅฏนๅ†…ๅญ˜่ฟ›่กŒๆ˜ ๅฐ„

**Demand Paging **:ๅฝ“ไธ€ไธชpage้œ€่ฆ่ขซไฝฟ็”จๅˆฐ็š„ๆ—ถๅ€™๏ผŒๆ‰ไผšๅธฆๅ…ฅๅˆฐๅญ˜ๅ‚จไธญ

ไธ€่ˆฌๅœจ่ฎฟ้—ฎ่™šๆ‹Ÿๅœฐๅ€(walk page table)ๆ—ถ๏ผŒไผšๅ‡บ็Žฐไธค็ง้”™่ฏฏ:

  • validไฝ†ๆ˜ฏไธๅœจๅ†…ๅญ˜ไธญ๏ผšbring it to memory
  • ้กต่กจ้กน็š„validไฝไธบ0๏ผšabort the operation

ๆ“ไฝœ็ณป็ปŸ็›ธๅฝ“ไบŽไธ€ไธชๅŽๅฐ่ฟ›็จ‹๏ผŒๅœจๅ‡บ็Žฐ็ณป็ปŸ่ฐƒ็”จๆˆ–ๆ˜ฏๅผ‚ๅธธ็š„ๆ—ถๅ€™๏ผŒๆ‰ไผšไป‹ๅ…ฅ

ๅœจๆฒกๆœ‰ๆ–‡ไปถ่ฏปๅ–ๆ—ถ๏ผŒไธ้œ€่ฆ่ฟ›่กŒ็กฌ็›˜็š„่ฏปๅ–๏ผŒๅช้œ€่ฆๅ–ๅ‡บfree frameๅณๅฏ

 Trap to operation system 
 => Save the user regs and state 
 => determine the interrupt as a page fault 
 => check the page legal and determine the location 
 => issue read from disk to free frame
 => waiting allocate the CPU
 => receive an interrupt from disk
 => save regs and state
 => determine the interrupt from disk
 => correct the page table and other tables to show page is now in memory
 => wait for the CPU to be allocated to this process again
 => restore the user registers tate and the new page table, then resume the interrupted instruction

Overview:

   partition directly on physical memory (segmentation and paging)
=> swapping 
=> MMU
   TLB

Virtual memory: ้š”็ฆป้€ป่พ‘ๅœฐๅ€ๅ’Œ็‰ฉ็†ๅœฐๅ€

page faultๆ˜ฏdemanding paging็š„ๅฎž็Žฐๆ–นๅผ

MMU:

if (page_table[v_addr][valid] == v){
    fetch_the_memory();
}
else if (page_table[v_addr][valid] == i){
    raise (page_fault);
}

kernelไธ้‡‡็”จlazy allocation๏ผŒๆ‰€ไปฅไธไผšๅ‘็”Ÿpage fault

ๅœจๅฎž้™…็š„linuxไธญ๏ผŒๅฏนไบŽvma็š„็ฎก็†ไฝฟ็”จ็š„ๆ•ฐๆฎ็ป“ๆž„ๆ˜ฏ็บข้ป‘ๆ ‘๏ผŒ่€ŒๅฏนไบŽๅœจๆ ‘ไธญ็š„ๆŸฅๆ‰พๆ“ไฝœ๏ผŒๆ˜ฏ็”ฑๆ“ไฝœ็ณป็ปŸๆฅๅฎŒๆˆ๏ผŒๆ‰€ไปฅๆฏไธ€ไธชuser space process๏ผŒ้ƒฝๆœ‰ไธ€ไธช็บข้ป‘ๆ ‘

ๅฏนไบŽ๏ผŒๆœ€ๆž็ซฏ็š„ๆƒ…ๅ†ต๏ผŒๅœจๅˆšๅˆšๆ‰ง่กŒไธ€ไธช็บฟ็จ‹ๆ—ถ๏ผŒไป…ไป…ๅœจๅ†…ๅญ˜ไธญๆ‹ท่ดไบ†pcb๏ผŒๅ…ถๅฎƒ็š„ๅ†…ๅฎนๅนถๆฒกๆœ‰่ฝฝๅ…ฅๅˆฐๅ†…ๅญ˜ไธญ๏ผŒไนŸๆฒกๆœ‰็›ธๅบ”็š„ๆ˜ ๅฐ„

่€Œๅœจๅ‘็”Ÿpage faultๆ—ถ๏ผŒ็ปดๆŠคfree frameๆ—ถ๏ผŒ้œ€่ฆไธ€ไธช้“พ่กจ

ๅฝ“ไธ€ไธชpageๆ—ถanonymous็š„๏ผŒๅˆ™้œ€่ฆๅ†™ๅ…ฅswap area๏ผŒๅฆๅˆ™ๅ›žๆฐธไน…ๅคฑๅŽป่ฏฅ้กต

ๅฝ“้œ€่ฆไฝฟ็”จswapๆ—ถ๏ผŒ้œ€่ฆๆƒ่กกไธๅŒ็š„็ฎ—ๆณ•๏ผš

  • FIFO

    Beladyโ€™s Anomaly:ๅœจๅ†…ๅญ˜ๅขžๅŠ ๆ˜ฏ๏ผŒๅ‡บ็Žฐpage fault็š„ๆฌกๆ•ฐๅ่€ŒๅขžๅŠ 

  • optimal

    ๆ›ฟๆขๆŽ‰ๆœ€ไน…ไธ่ขซ็”จ่ฟ‡็š„frame

  • LRU

    ๆ›ฟๆขๆŽ‰ๆœ€ไน…่ขซ็”จ็š„frame

  • LFU

    ๆ›ฟๆขๆŽ‰ๆœ€ๅฐ‘่ขซ็”จๅˆฐ็š„frame

  • MFU

    ๆ›ฟๆขๆŽ‰ๆœ€ๅคš่ขซ็”จๅˆฐ็š„frame

ไพ‹:

#visiting reference:
#assume 3 frames
7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
FIFO
โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œ*โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œ*โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œ*โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œ*โ” โ”Œโ”€โ” โ”Œโ”€โ”
โ”‚7โ”‚ โ”‚7โ”‚ โ”‚7โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚4โ”‚ โ”‚4โ”‚ โ”‚4โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚7โ”‚ โ”‚7โ”‚ โ”‚7โ”‚
โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค
โ”‚ โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚3โ”‚ โ”‚3โ”‚ โ”‚3โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚1โ”‚ โ”‚1โ”‚ โ”‚1โ”‚ โ”‚0โ”‚ โ”‚0โ”‚
โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค
โ”‚ โ”‚ โ”‚ โ”‚ โ”‚1โ”‚ โ”‚1โ”‚ โ”‚1โ”‚ โ”‚1โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚3โ”‚ โ”‚3โ”‚ โ”‚3โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚1โ”‚
โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜
15 page fault

Optimal
Best for theorem
โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” 
โ”‚7โ”‚ โ”‚7โ”‚ โ”‚7โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚0โ”‚ 
โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค 
โ”‚ โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ โ”‚2โ”‚ โ”‚3โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ 
โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œ*โ”ค 
โ”‚ โ”‚ โ”‚ โ”‚ โ”‚1โ”‚ โ”‚2โ”‚ โ”‚3โ”‚ โ”‚4โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ โ”‚7โ”‚ 
โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ 
9 page fault

LRU
For the actual implementation:
- counter based
uint8 lru_bits = 0;
while(1){
    if (clk == 100ms){
        lru = lru >> 1;
        if (page used) lru |= 0x80;
    }
}
- stack based

# so 11000100 is newer than 01110111
# When MMU access the page table, it changes the 8bits  the same time.
โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ” โ”Œโ”€โ”
โ”‚7โ”‚ โ”‚7โ”‚ โ”‚7โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ โ”‚0โ”‚ โ”‚2โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚4โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚0โ”‚ โ”‚3โ”‚ โ”‚2โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ โ”‚7โ”‚
โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค
โ”‚ โ”‚ โ”‚0โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ โ”‚0โ”‚ โ”‚2โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚4โ”‚ โ”‚2โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚3โ”‚ โ”‚2โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ โ”‚7โ”‚ โ”‚0โ”‚
โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œ*โ”ค โ”œ*โ”ค โ”œโ”€โ”ค โ”œโ”€โ”ค
โ”‚ โ”‚ โ”‚ โ”‚ โ”‚1โ”‚ โ”‚2โ”‚ โ”‚2โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚4โ”‚ โ”‚2โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚3โ”‚ โ”‚0โ”‚ โ”‚3โ”‚ โ”‚2โ”‚ โ”‚0โ”‚ โ”‚1โ”‚ โ”‚7โ”‚ โ”‚0โ”‚ โ”‚1โ”‚
โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜ โ””โ”€โ”˜
12 page fault

clock Page-replacement Algorithm
# give the page second chance:
for page in pages:
    if reference_bit == 0:
        replace
    eles if referece_bit == 1:
        reference_bit = 0
#enhanced version:
#(0,0) not read or write
#(0,1) not read but write
#(1,0) read but not write
#(1,1) read and write

ๅœจๅ‘็”Ÿpage faultๆ—ถ๏ผŒ่ฟ›่กŒ็š„context switch่ฟ‡็จ‹ๅฆ‚ไธ‹๏ผš

process A ---\ 
  ^           \ 
  |            \--- page fault ---\
  |                                \
  |                                 \---> do_fault(A) ---> read_disk() ---\
  |                                       ^   |                 |          \
  |                                       |   |            A in wait queue  \--- switch_to(B) ---\
  |                                       |   |                 |                                 \
  |---------------------------------------|---|                 |                                  \---> process B
                                          |                    done                                          ^  |
                                          |                     |                                           |   |
                                          |                 A in ready queue                   clock interrupt  |
                                          |                     |-------------------------------------------|   |
                                          |---- context switec_to (A) ------------------------------------------|

ๅœจๅ‘็”Ÿpage faultๆ—ถ๏ผŒไผšๅ‘็”Ÿๅทจๅคง็š„ๅผ€้”€๏ผŒ็ฎ€ๅ•ๆต‹่ฏ•ๅฆ‚ไธ‹๏ผš

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
char *buf ;

int main(){
    buf = (char*)malloc(sizeof(char) * 0x400000);
    clock_t st,en;
    st = time(NULL);
    for (int o = 0;o < 1000;o++)
        for (int i = 0;i < 0x400000;i++) buf[i] = 2;
    en = time(NULL);
    printf("time by row is :%lf\n",((double)(en - st)) / CLOCKS_PER_SEC);
    
    st = time(NULL);
    for (int o = 0;o < 1000;o++)
        for (int i = 0;i < 0x1000;i++)
            for (int j = 0;j < 0x400; j++) buf[j * 0x1000 + i] = 1; 
    en = time(NULL);
    printf("time by line is :%lf\n",((double)(en - st)) / CLOCKS_PER_SEC);

}

่ฟ่กŒ็ป“ๆžœ๏ผš

$.\a.exe
time by row is :0.006000
time by line is :0.059000

ไธŠไพ‹ไนŸๆ˜ฏไธ€ไธชThrashing็š„็ฎ€ๅ•ไพ‹ๅญ๏ผŒๅฝ“้ข‘็นๅ‘็”Ÿpage faultๅฏผ่‡ด็š„page replacementๆ—ถ๏ผŒๅฐฑไผšๅ‡บ็Žฐ่ฟ™็งๆƒ…ๅ†ต

WSS: Working Sets

PFF:Page-fault frequency rate

  • ๅฝ“rateๅคชไฝŽๆ—ถ๏ผŒไผšๅ ็”จๅคชๅคš็š„frame๏ผŒไปŽ่€Œ้‡Šๆ”พไธ€ไบ›
  • ๅฝ“rateๅคช้ซ˜ๆ—ถ๏ผŒไผš่Žทๅ–frame

linuxไธญๅค„็†page fault(riscv):

asmlinkage void do_page_fault(struct pt_regs *regs)
{
	struct task_struct *tsk;
	struct vm_area_struct *vma;
	struct mm_struct *mm;
	unsigned long addr, cause;
	unsigned int flags = FAULT_FLAG_DEFAULT;
	int code = SEGV_MAPERR;
	vm_fault_t fault;

	cause = regs->cause;
	addr = regs->badaddr;

	tsk = current;
	mm = tsk->mm;

	/*
	 * Fault-in kernel-space virtual memory on-demand.
	 * The 'reference' page table is init_mm.pgd.
	 *
	 * NOTE! We MUST NOT take any locks for this case. We may
	 * be in an interrupt or a critical region, and should
	 * only copy the information from the master page table,
	 * nothing more.
	 */
	if (unlikely((addr >= VMALLOC_START) && (addr <= VMALLOC_END))) {
		vmalloc_fault(regs, code, addr);
		return;
	}

	/* Enable interrupts if they were enabled in the parent context. */
	if (likely(regs->status & SR_PIE))
		local_irq_enable();

	/*
	 * If we're in an interrupt, have no user context, or are running
	 * in an atomic region, then we must not take the fault.
	 */
	if (unlikely(faulthandler_disabled() || !mm)) {
		no_context(regs, addr);
		return;
	}

	if (user_mode(regs))
		flags |= FAULT_FLAG_USER;

	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);

	if (cause == EXC_STORE_PAGE_FAULT)
		flags |= FAULT_FLAG_WRITE;
	else if (cause == EXC_INST_PAGE_FAULT)
		flags |= FAULT_FLAG_INSTRUCTION;
retry:
	mmap_read_lock(mm);
	vma = find_vma(mm, addr);
	if (unlikely(!vma)) {
		bad_area(regs, mm, code, addr); q 
		return;
	}
	if (likely(vma->vm_start <= addr))
		goto good_area;
	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
		bad_area(regs, mm, code, addr);
		return;
	}
	if (unlikely(expand_stack(vma, addr))) {
		bad_area(regs, mm, code, addr);
		return;
	}

	/*
	 * Ok, we have a good vm_area for this memory access, so
	 * we can handle it.
	 */
good_area:
	code = SEGV_ACCERR;

	if (unlikely(access_error(cause, vma))) {
		bad_area(regs, mm, code, addr);
		return;
	}

	/*
	 * If for any reason at all we could not handle the fault,
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
	fault = handle_mm_fault(vma, addr, flags, regs);

	/*
	 * If we need to retry but a fatal signal is pending, handle the
	 * signal first. We do not need to release the mmap_lock because it
	 * would already be released in __lock_page_or_retry in mm/filemap.c.
	 */
	if (fault_signal_pending(fault, regs))
		return;

	if (unlikely((fault & VM_FAULT_RETRY) && (flags & FAULT_FLAG_ALLOW_RETRY))) {
		flags |= FAULT_FLAG_TRIED;

		/*
		 * No need to mmap_read_unlock(mm) as we would
		 * have already released it in __lock_page_or_retry
		 * in mm/filemap.c.
		 */
		goto retry;
	}

	mmap_read_unlock(mm);

	if (unlikely(fault & VM_FAULT_ERROR)) {
		mm_fault_error(regs, addr, fault);
		return;
	}
	return;
}

ๅฝ“ๅ‘็”Ÿpage faultๆ—ถ๏ผŒ้ฆ–ๅ…ˆ่ฆvmaๆฅfind_vma๏ผŒๆŸฅ็œ‹่ฏฅๅœฐๅ€ๆ˜ฏgood or bad

Buddy System:ไปฅ$2^n$่ฟ›่กŒ็ปดๆŠค

slab Allocator:

  • task struct pool

  • no fragmentation:

    A 12K slab (3 pages) can store 4 3K objects

prepaging:ๅœจpage faultไน‹ๅ‰๏ผŒๅฐฑๅฐ†ๅœฐๅ€ๆ˜ ๅฐ„่ฟ‡ๅŽป

page size:ๅฐฑๅƒไน‹ๅ‰็œ‹ๅˆฐ่ฟ‡็š„๏ผŒlarge page

TLB Reach:TLB reach = (TLB size) * (page size)

Windows XP:working set