From 5b967db5e30737931443d081824249c2892f4e13 Mon Sep 17 00:00:00 2001 From: MikeySaw Date: Mon, 12 Aug 2024 13:40:40 +0000 Subject: [PATCH] deploy: 32fd6b1ddd84bd32eb667305509486e74ce71aeb --- chapters/08_decoding/08_01_intro/index.html | 7 +------ chapters/08_decoding/08_02_determ/index.html | 5 +++-- chapters/08_decoding/08_03_sampling/index.html | 6 +++++- .../08_decoding/08_04_hyper_param/index.html | 7 ++++--- .../08_decoding/08_05_eval_metrics/index.html | 10 +++++++--- chapters/08_decoding/index.html | 17 +++++++++++------ chapters/08_decoding/index.xml | 2 +- index.html | 2 +- index.xml | 2 +- 9 files changed, 34 insertions(+), 24 deletions(-) diff --git a/chapters/08_decoding/08_01_intro/index.html b/chapters/08_decoding/08_01_intro/index.html index e0e1a19..f652da4 100644 --- a/chapters/08_decoding/08_01_intro/index.html +++ b/chapters/08_decoding/08_01_intro/index.html @@ -57,6 +57,7 @@

Chapter 08.01: What is Decoding?

+

Here we introduce the concept of decoding. Given a prompt and a generative language model, how does it generate text? The model produces a probability distribution over all tokens in the vocabulary. The way the model uses that probability distribution to generate the next token is called the decoding strategy.
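To make this concrete, here is a minimal sketch (not taken from the lecture materials) of how a decoding strategy turns the model's next-token distribution into an actual token; the toy vocabulary and probabilities are invented for illustration.

```python
import numpy as np

# Toy example (invented): a "model" has produced a probability
# distribution over a tiny vocabulary for the next token.
vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]
probs = np.array([0.05, 0.40, 0.25, 0.15, 0.10, 0.05])

# A decoding strategy decides which token to emit from this distribution.
# Greedy decoding: take the most probable token.
greedy_token = vocab[int(np.argmax(probs))]

# Ancestral sampling: draw a token proportionally to its probability.
rng = np.random.default_rng(0)
sampled_token = vocab[rng.choice(len(vocab), p=probs)]

print(greedy_token, sampled_token)
```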

Lecture Slides

@@ -74,12 +75,6 @@

Lecture Slides

-

References

- - -

Chapter 08.02: Greedy & Beam Search

+

Here we introduce two deterministic decoding strategies, greedy & beam search. Both methods are determenistic, which means there is no sampling involved when generating text. While greedy decoding always chooses the token with the highest probability, while beam search keeps track of multiple beams to generate the next token.

Lecture Slides

@@ -74,9 +75,9 @@

Lecture Slides

-

References

+

Additional Resources

diff --git a/chapters/08_decoding/08_03_sampling/index.html b/chapters/08_decoding/08_03_sampling/index.html index 6387532..dc2c084 100644 --- a/chapters/08_decoding/08_03_sampling/index.html +++ b/chapters/08_decoding/08_03_sampling/index.html @@ -57,6 +57,7 @@

Chapter 08.03: Stochastic Decoding & CS/CD

+

In this chapter you will learn about methods beyond simple deterministic decoding strategies. We introduce sampling with temperature, where a temperature parameter is added to the softmax formula; top-k [1] and top-p [2] sampling, where you sample from a restricted set of top tokens; and finally contrastive search [3] and contrastive decoding [4].
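The sketch below illustrates temperature, top-k and top-p filtering on a vector of toy logits with NumPy (contrastive search and contrastive decoding are omitted, since they additionally require the model's hidden states or a second, smaller model); the logits and parameter values are invented for illustration.

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample a token index from logits with temperature, top-k and top-p filtering."""
    rng = rng or np.random.default_rng()
    # Temperature: divide the logits before the softmax; <1 sharpens, >1 flattens.
    probs = softmax(np.asarray(logits, dtype=float) / temperature)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    # Top-p (nucleus): keep the smallest set of tokens whose mass reaches p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: int(np.searchsorted(cumulative, top_p) + 1)]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered

    probs = probs / probs.sum()             # renormalise over the surviving tokens
    return int(rng.choice(len(probs), p=probs))

# Toy logits (invented): sample with a low temperature and a 0.9 nucleus.
logits = [2.0, 1.5, 0.3, -1.0, -2.0]
print(sample_next(logits, temperature=0.7, top_p=0.9))
```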

Lecture Slides

@@ -76,7 +77,10 @@

Lecture Slides

References

diff --git a/chapters/08_decoding/08_04_hyper_param/index.html b/chapters/08_decoding/08_04_hyper_param/index.html index 361b073..a3cc7ad 100644 --- a/chapters/08_decoding/08_04_hyper_param/index.html +++ b/chapters/08_decoding/08_04_hyper_param/index.html @@ -57,6 +57,7 @@

Chapter 08.04: Decoding Hyperparameters & Practical considerations

+

In this chapter you will learn how to use the different decoding strategies in practice. When using models from Hugging Face, you can choose the decoding strategy by specifying the hyperparameters of the generate method of those models.
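A minimal sketch, assuming the Hugging Face transformers library and the publicly available gpt2 checkpoint (any causal LM works); argument defaults can differ between library versions, so treat the values below as examples rather than recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # example checkpoint, not prescribed by the course
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The answer to life is", return_tensors="pt")
pad_id = tokenizer.eos_token_id                     # gpt2 has no pad token

# Greedy decoding (the default when do_sample=False and num_beams=1).
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False, pad_token_id=pad_id)

# Beam search: keep several candidate sequences in parallel.
beam = model.generate(**inputs, max_new_tokens=30, num_beams=5,
                      early_stopping=True, pad_token_id=pad_id)

# Stochastic decoding: temperature, top-k and top-p (nucleus) sampling.
sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                         temperature=0.7, top_k=50, top_p=0.95, pad_token_id=pad_id)

print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```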

Lecture Slides

@@ -74,9 +75,9 @@

Lecture Slides

-

References

+

Additional Resources

@@ -85,7 +86,7 @@

References

  • « Chapter 08.03: Stochastic Decoding & CS/CD
  • -
  • Chapter 08.05: Decoding Hyperparameters & Practical considerations »
  • +
  • Chapter 08.05: Evaluation Metrics »
  • diff --git a/chapters/08_decoding/08_05_eval_metrics/index.html b/chapters/08_decoding/08_05_eval_metrics/index.html index 1a33e2e..9429a99 100644 --- a/chapters/08_decoding/08_05_eval_metrics/index.html +++ b/chapters/08_decoding/08_05_eval_metrics/index.html @@ -7,7 +7,7 @@ -Deep Learning for Natural Language Processing (DL4NLP) | Chapter 08.05: Decoding Hyperparameters & Practical considerations +Deep Learning for Natural Language Processing (DL4NLP) | Chapter 08.05: Evaluation Metrics @@ -56,7 +56,8 @@
    -

    Chapter 08.05: Decoding Hyperparameters & Practical considerations

    +

    Chapter 08.05: Evaluation Metrics

    +

Here we answer the question of how to evaluate generated outputs in open-ended text generation. We first explain BLEU [1] and ROUGE [2], which are metrics for tasks with a gold reference. Then we introduce diversity, coherence [3] and MAUVE [4], which are metrics for tasks without a gold reference, such as open-ended text generation. You will also learn about human evaluation.
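For the reference-based metrics, a minimal sketch using the Hugging Face evaluate library might look as follows; the library and its bleu / rouge metric names are assumptions, not course code, and diversity, coherence and MAUVE are omitted because they require model-based scoring.

```python
import evaluate  # assumes `pip install evaluate` (plus its metric dependencies)

predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

bleu = evaluate.load("bleu")    # BLEU expects a list of reference lists per prediction
rouge = evaluate.load("rouge")

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
```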

    Lecture Slides

    @@ -76,7 +77,10 @@

    Lecture Slides

    References

    diff --git a/chapters/08_decoding/index.html b/chapters/08_decoding/index.html index 79dfb7a..49771b9 100644 --- a/chapters/08_decoding/index.html +++ b/chapters/08_decoding/index.html @@ -70,7 +70,8 @@

    Chapter 8: Decoding Strategies

    Chapter 08.01: What is Decoding? -

    +

Here we introduce the concept of decoding. Given a prompt and a generative language model, how does it generate text? The model produces a probability distribution over all tokens in the vocabulary. The way the model uses that probability distribution to generate the next token is called the decoding strategy. +

    @@ -79,7 +80,8 @@

    Chapter 8: Decoding Strategies

    Chapter 08.02: Greedy & Beam Search -

    +

    Here we introduce two deterministic decoding strategies, greedy & beam search. Both methods are determenistic, which means there is no sampling involved when generating text. While greedy decoding always chooses the token with the highest probability, while beam search keeps track of multiple beams to generate the next token. +

    @@ -88,7 +90,8 @@

    Chapter 8: Decoding Strategies

    Chapter 08.03: Stochastic Decoding & CS/CD -

    +

In this chapter you will learn about methods beyond simple deterministic decoding strategies. We introduce sampling with temperature, where a temperature parameter is added to the softmax formula; top-k [1] and top-p [2] sampling, where you sample from a restricted set of top tokens; and finally contrastive search [3] and contrastive decoding [4]. +

    @@ -97,16 +100,18 @@

    Chapter 8: Decoding Strategies

    Chapter 08.04: Decoding Hyperparameters & Practical considerations -

    +

In this chapter you will learn how to use the different decoding strategies in practice. When using models from Hugging Face, you can choose the decoding strategy by specifying the hyperparameters of the generate method of those models. +

  • - Chapter 08.05: Decoding Hyperparameters & Practical considerations + Chapter 08.05: Evaluation Metrics -

    +

Here we answer the question of how to evaluate generated outputs in open-ended text generation. We first explain BLEU [1] and ROUGE [2], which are metrics for tasks with a gold reference. Then we introduce diversity, coherence [3] and MAUVE [4], which are metrics for tasks without a gold reference, such as open-ended text generation. You will also learn about human evaluation. +

  • diff --git a/chapters/08_decoding/index.xml b/chapters/08_decoding/index.xml index 946f31e..70b2c28 100644 --- a/chapters/08_decoding/index.xml +++ b/chapters/08_decoding/index.xml @@ -1 +1 @@ -Chapter 8: Decoding Strategies on Deep Learning for Natural Language Processing (DL4NLP)https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/Recent content in Chapter 8: Decoding Strategies on Deep Learning for Natural Language Processing (DL4NLP)Hugoen-usChapter 08.01: What is Decoding?https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/Chapter 08.02: Greedy & Beam Searchhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/Chapter 08.03: Stochastic Decoding & CS/CDhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/Chapter 08.04: Decoding Hyperparameters & Practical considerationshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/Chapter 08.05: Decoding Hyperparameters & Practical considerationshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/ \ No newline at end of file +Chapter 8: Decoding Strategies on Deep Learning for Natural Language Processing (DL4NLP)https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/Recent content in Chapter 8: Decoding Strategies on Deep Learning for Natural Language Processing (DL4NLP)Hugoen-usChapter 08.01: What is Decoding?https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/<p>Here we introduce the concept of decoding. Given a prompt and a generative language model, how does it generate text? The model produces a probability distribution over all tokens in the vocabulary. The way the model uses that probability distribution to generate the next token is what is called a decoding strategy.</p>Chapter 08.02: Greedy & Beam Searchhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/<p>Here we introduce two deterministic decoding strategies, greedy &amp; beam search. Both methods are determenistic, which means there is no sampling involved when generating text. While greedy decoding always chooses the token with the highest probability, while beam search keeps track of multiple beams to generate the next token.</p>Chapter 08.03: Stochastic Decoding & CS/CDhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/<p>In this chapter you will learn about more methods beyond simple deterministic decoding strategies. 
We introduce sampling with temperature, where you add a temperature parameter into the softmax formula, top-k [1] and top-p [2] sampling, where you sample from a set of top tokens and finally contrastive search [3] and contrastive decoding [4].</p>Chapter 08.04: Decoding Hyperparameters & Practical considerationshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/<p>In this chapter you will learn how to use the different decoding strategies in practice. When using models from huggingface you can choose the decoding strategy by specifying the hyperparameters of the <code>generate</code> method of those models.</p>Chapter 08.05: Evaluation Metricshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/<p>Here we answer the question on how to evaluate the generated outputs in open ended text generation. We first explain <strong>BLEU</strong> [1] and <strong>ROUGE</strong> [2], which are metrics for tasks with a gold reference. Then we introduce <strong>diversity</strong>, <strong>coherence</strong> [3] and <strong>MAUVE</strong> [4], which are metrics for tasks without a gold reference such as open ended text generation. You will also learn about human evaluation.</p> \ No newline at end of file diff --git a/index.html b/index.html index 4c12a91..247c72e 100644 --- a/index.html +++ b/index.html @@ -230,7 +230,7 @@

    Deep Learning for NLP (DL4NLP)

  • Chapter 08.04: Decoding Hyperparameters & Practical considerations
  • -
  • Chapter 08.05: Decoding Hyperparameters & Practical considerations
  • +
  • Chapter 08.05: Evaluation Metrics
  • diff --git a/index.xml b/index.xml index beca6f0..8aa9ee9 100644 --- a/index.xml +++ b/index.xml @@ -33,5 +33,5 @@ During this process, unnecessary parameters and redundant information are discar For XLNet, the basic idea is to overcome the limitations of unidirectional and bidirectional language models by introducing a permutation-based pre-training objective, the so called permutation language modeling (PLM), that enables the model to consider all possible permutations of the input tokens, capturing bidirectional context.</p>Chapter 06.02: Tasks as text-to-text problemhttps://slds-lmu.github.io/dl4nlp/chapters/06_post_bert_t5/06_02_text2text/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/06_post_bert_t5/06_02_text2text/<p>Reformulating various NLP tasks as text-to-text tasks aims to simplify model architectures and improve performance by treating all tasks as instances of generating output text from input text. This approach addresses shortcomings of BERT&rsquo;s original design, where different tasks required different output layers and training objectives, leading to a complex multitask learning setup. By unifying tasks under a single text-to-text framework, models can be trained more efficiently and generalize better across diverse tasks and domains.</p>Chapter 06.03: Text-to-Text Transfer Transformerhttps://slds-lmu.github.io/dl4nlp/chapters/06_post_bert_t5/06_03_t5/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/06_post_bert_t5/06_03_t5/<p>T5 (Text-To-Text Transfer Transformer) [1] aims to unify various natural language processing tasks by framing them all as text-to-text transformations, simplifying model architectures and enabling flexible training across diverse tasks. It achieves this by formulating input-output pairs for different tasks as text sequences, allowing the model to learn to generate target text from source text regardless of the specific task, facilitating multitask learning and transfer learning across tasks with a single, unified architecture.</p>Chapter 07.01: GPT-1 (2018)https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_01_gpt/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_01_gpt/<p>GPT-1 [1] introduces a novel approach to natural language processing by employing a generative transformer architecture pre-trained on a vast corpus of text data, where task-specific input transformations are performed to adapt the model to different tasks. -By fine-tuning the model on task-specific data with minimal changes to the architecture, GPT-1 demonstrates the effectiveness of transfer learning and showcases the potential of generative transformers in a wide range of natural language understanding and generation tasks.</p>Chapter 07.02: GPT-2 (2019)https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_02_gpt2/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_02_gpt2/<p>GPT-2 [1] builds upon its predecessor with a larger model size, more training data, and improved architecture. Like GPT-1, GPT-2 utilizes a generative transformer architecture but features a significantly increased number of parameters, leading to enhanced performance in language understanding and generation tasks. 
Additionally, GPT-2 introduces a scaled-up version of the training data and fine-tuning techniques to further refine its language capabilities.</p>Chapter 07.03: GPT-3 (2020) & X-shot learninghttps://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_03_gpt3xshot/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_03_gpt3xshot/<p>In this chapter, we&rsquo;ll explore GPT-3 [1]. GPT-3 builds on the successes of its predecessors, boasting a massive architecture and extensive pre-training on diverse text data. Unlike previous models, GPT-3 introduces a few-shot learning approach, allowing it to perform tasks with minimal task-specific training data. With its remarkable scale and versatility, GPT-3 represents a significant advancement in natural language processing, showcasing the potential of large-scale transformer architectures in various applications.</p>Chapter 07.04: Tasks & Performancehttps://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_04_tasks/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_04_tasks/<p>GPT-3 has X-shot abilities, meaning it is able to perform tasks with minimal or even no task-specific training data. This chapter provides an overview over various different tasks and illustrates the X-shot capabilities of GPT-3. Additionally you will be introduced to relevant benchmarks.</p>Chapter 07.05: Discussion: Ethics and Costhttps://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_05_discussion/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_05_discussion/<p>In discussing GPT-3&rsquo;s ethical implications, it is crucial to consider its potential societal impact, including issues surrounding bias, misinformation, and data privacy. With its vast language generation capabilities, GPT-3 has the potential to disseminate misinformation at scale, posing risks to public trust and safety. Additionally, the model&rsquo;s reliance on large-scale pretraining data raises concerns about reinforcing existing biases present in the data, perpetuating societal inequalities. Furthermore, the use of GPT-3 in sensitive applications such as content generation, automated customer service, and decision-making systems raises questions about accountability, transparency, and unintended consequences. 
As such, responsible deployment of GPT-3 requires careful consideration of ethical guidelines, regulatory frameworks, and robust mitigation strategies to address these challenges and ensure the model&rsquo;s ethical use in society.</p>Chapter 08.01: What is Decoding?https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/Chapter 08.02: Greedy & Beam Searchhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/Chapter 08.03: Stochastic Decoding & CS/CDhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/Chapter 08.04: Decoding Hyperparameters & Practical considerationshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/Chapter 08.05: Decoding Hyperparameters & Practical considerationshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/Chapter 09.01: Instruction Fine-Tuninghttps://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_01_instruction_tuning/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_01_instruction_tuning/<p>Instruction fine-tuning aims to enhance the adaptability of large language models (LLMs) by providing explicit instructions or task descriptions, enabling more precise control over model behavior and adaptation to diverse contexts. +By fine-tuning the model on task-specific data with minimal changes to the architecture, GPT-1 demonstrates the effectiveness of transfer learning and showcases the potential of generative transformers in a wide range of natural language understanding and generation tasks.</p>Chapter 07.02: GPT-2 (2019)https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_02_gpt2/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_02_gpt2/<p>GPT-2 [1] builds upon its predecessor with a larger model size, more training data, and improved architecture. Like GPT-1, GPT-2 utilizes a generative transformer architecture but features a significantly increased number of parameters, leading to enhanced performance in language understanding and generation tasks. Additionally, GPT-2 introduces a scaled-up version of the training data and fine-tuning techniques to further refine its language capabilities.</p>Chapter 07.03: GPT-3 (2020) & X-shot learninghttps://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_03_gpt3xshot/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_03_gpt3xshot/<p>In this chapter, we&rsquo;ll explore GPT-3 [1]. GPT-3 builds on the successes of its predecessors, boasting a massive architecture and extensive pre-training on diverse text data. Unlike previous models, GPT-3 introduces a few-shot learning approach, allowing it to perform tasks with minimal task-specific training data. 
With its remarkable scale and versatility, GPT-3 represents a significant advancement in natural language processing, showcasing the potential of large-scale transformer architectures in various applications.</p>Chapter 07.04: Tasks & Performancehttps://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_04_tasks/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_04_tasks/<p>GPT-3 has X-shot abilities, meaning it is able to perform tasks with minimal or even no task-specific training data. This chapter provides an overview over various different tasks and illustrates the X-shot capabilities of GPT-3. Additionally you will be introduced to relevant benchmarks.</p>Chapter 07.05: Discussion: Ethics and Costhttps://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_05_discussion/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/07_gpt/07_05_discussion/<p>In discussing GPT-3&rsquo;s ethical implications, it is crucial to consider its potential societal impact, including issues surrounding bias, misinformation, and data privacy. With its vast language generation capabilities, GPT-3 has the potential to disseminate misinformation at scale, posing risks to public trust and safety. Additionally, the model&rsquo;s reliance on large-scale pretraining data raises concerns about reinforcing existing biases present in the data, perpetuating societal inequalities. Furthermore, the use of GPT-3 in sensitive applications such as content generation, automated customer service, and decision-making systems raises questions about accountability, transparency, and unintended consequences. As such, responsible deployment of GPT-3 requires careful consideration of ethical guidelines, regulatory frameworks, and robust mitigation strategies to address these challenges and ensure the model&rsquo;s ethical use in society.</p>Chapter 08.01: What is Decoding?https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_01_intro/<p>Here we introduce the concept of decoding. Given a prompt and a generative language model, how does it generate text? The model produces a probability distribution over all tokens in the vocabulary. The way the model uses that probability distribution to generate the next token is what is called a decoding strategy.</p>Chapter 08.02: Greedy & Beam Searchhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_02_determ/<p>Here we introduce two deterministic decoding strategies, greedy &amp; beam search. Both methods are determenistic, which means there is no sampling involved when generating text. While greedy decoding always chooses the token with the highest probability, while beam search keeps track of multiple beams to generate the next token.</p>Chapter 08.03: Stochastic Decoding & CS/CDhttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_03_sampling/<p>In this chapter you will learn about more methods beyond simple deterministic decoding strategies. 
We introduce sampling with temperature, where you add a temperature parameter into the softmax formula, top-k [1] and top-p [2] sampling, where you sample from a set of top tokens and finally contrastive search [3] and contrastive decoding [4].</p>Chapter 08.04: Decoding Hyperparameters & Practical considerationshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_04_hyper_param/<p>In this chapter you will learn how to use the different decoding strategies in practice. When using models from huggingface you can choose the decoding strategy by specifying the hyperparameters of the <code>generate</code> method of those models.</p>Chapter 08.05: Evaluation Metricshttps://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/08_decoding/08_05_eval_metrics/<p>Here we answer the question on how to evaluate the generated outputs in open ended text generation. We first explain <strong>BLEU</strong> [1] and <strong>ROUGE</strong> [2], which are metrics for tasks with a gold reference. Then we introduce <strong>diversity</strong>, <strong>coherence</strong> [3] and <strong>MAUVE</strong> [4], which are metrics for tasks without a gold reference such as open ended text generation. You will also learn about human evaluation.</p>Chapter 09.01: Instruction Fine-Tuninghttps://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_01_instruction_tuning/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_01_instruction_tuning/<p>Instruction fine-tuning aims to enhance the adaptability of large language models (LLMs) by providing explicit instructions or task descriptions, enabling more precise control over model behavior and adaptation to diverse contexts. This approach involves fine-tuning LLMs on task-specific instructions or prompts, guiding the model to generate outputs that align with the given instructions. By conditioning the model on explicit instructions, instruction fine-tuning facilitates more accurate and tailored responses, making LLMs more versatile and effective in various applications such as language translation, text summarization, and question answering.</p>Chapter 09.02: Chain-of-thought Promptinghttps://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_02_cot/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_02_cot/<p>Chain of thought (CoT) prompting [1] is a prompting method that encourage Large Language Models (LLMs) to explain their reasoning. This method contrasts with standard prompting by not only seeking an answer but also requiring the model to explain its steps to arrive at that answer. By guiding the model through a logical chain of thought, chain of thought prompting encourages the generation of more structured and cohesive text, enabling LLMs to produce more accurate and informative outputs across various tasks and domains.</p>Chapter 09.03: Emergent Abilitieshttps://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_03_emerging/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/chapters/09_llm/09_03_emerging/<p>Various researchers have reported that LLMs seem to have emergent abilities. These are sudden appearances of new abilities when Large Language Models (LLMs) are scaled up. 
In this section we introduce the concept of emergent abilities and discuss a potential counter argument for the concept of emergence.</p><link>https://slds-lmu.github.io/dl4nlp/exercises/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/exercises/</guid><description>Exercises Exercise Chapter 1 Exercise Chapter 2 Exercise Chapter 3 Exercise Chapter 4 Exercise Chapter 5 Exercise Chapter 6 Exercise Chapter 7 Exercise Chapter 8 Exercise Chapter 9 Exercise Chapter 10</description></item><item><title/><link>https://slds-lmu.github.io/dl4nlp/references/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://slds-lmu.github.io/dl4nlp/references/</guid><description>References Your markdown comes here!</description></item><item><title>Cheat Sheetshttps://slds-lmu.github.io/dl4nlp/appendix/01_cheat_sheets/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/appendix/01_cheat_sheets/possible coming in the future ..Erratahttps://slds-lmu.github.io/dl4nlp/appendix/02_errata/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/appendix/02_errata/Errata in the slides shown in the videos to be added once videos + updated slides thereafter are available 😉Related Courseshttps://slds-lmu.github.io/dl4nlp/appendix/03_related/Mon, 01 Jan 0001 00:00:00 +0000https://slds-lmu.github.io/dl4nlp/appendix/03_related/Other ML courses Introduction to Machine Learning (I2ML) Introduction to Deep Learning (I2DL) \ No newline at end of file