# Chapter 5: Pretraining on Unlabeled Data

## Main Chapter Code
- 01_main-chapter-code contains the main chapter code

## Bonus Materials
- 02_alternative_weight_loading contains code to load the GPT model weights from alternative places in case the model weights become unavailable from OpenAI
- 03_bonus_pretraining_on_gutenberg contains code to pretrain the LLM longer on the whole corpus of books from Project Gutenberg
- 04_learning_rate_schedulers contains code implementing a more sophisticated training function that includes learning rate schedulers and gradient clipping (a minimal sketch follows this list)
- 05_bonus_hparam_tuning contains an optional hyperparameter tuning script
- 06_user_interface implements an interactive user interface for interacting with the pretrained LLM
- 07_gpt_to_llama contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI
- 08_memory_efficient_weight_loading contains a bonus notebook showing how to load model weights via PyTorch's `load_state_dict` method more efficiently (a minimal sketch follows this list)
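
To give a rough idea of what the more sophisticated training function in 04_learning_rate_schedulers involves, the sketch below combines a linear-warmup-plus-cosine-decay learning rate schedule with gradient clipping. It is only an illustration under assumed placeholder names (`train_step`, `peak_lr`, `warmup_steps`, and so on), not the exact function used in the bonus code:

```python
import math
import torch

def train_step(model, optimizer, input_batch, target_batch,
               step, warmup_steps, total_steps, peak_lr, min_lr=1e-6):
    # Linear warmup followed by cosine decay (one common scheduler choice)
    if step < warmup_steps:
        lr = peak_lr * (step + 1) / warmup_steps
    else:
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        lr = min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr

    optimizer.zero_grad()
    logits = model(input_batch)  # expected shape: (batch_size, seq_len, vocab_size)
    loss = torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten()
    )
    loss.backward()
    # Gradient clipping caps the global gradient norm (here at 1.0) to stabilize training
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item(), lr
```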
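
For a flavor of the memory-efficient loading covered in 08_memory_efficient_weight_loading, the sketch below shows one way to reduce peak RAM usage when restoring a checkpoint: memory-mapping the file via `torch.load(..., mmap=True)` and assigning the loaded tensors directly with `load_state_dict(..., assign=True)`. This assumes a recent PyTorch release (roughly 2.1 or newer, where these arguments are available); `load_checkpoint_memory_efficiently`, `model_ctor`, and `checkpoint_path` are placeholder names, and the bonus notebook compares several approaches in more detail:

```python
import torch

def load_checkpoint_memory_efficiently(model_ctor, checkpoint_path):
    # Build the model on the meta device so no parameter memory is allocated yet
    with torch.device("meta"):
        model = model_ctor()

    # mmap=True memory-maps the checkpoint file instead of reading it all into RAM;
    # weights_only=True restricts unpickling to plain tensors
    state_dict = torch.load(
        checkpoint_path, map_location="cpu", mmap=True, weights_only=True
    )

    # assign=True keeps the loaded tensors as the model's parameters
    # instead of copying them into freshly allocated ones
    model.load_state_dict(state_dict, assign=True)
    return model
```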