BillionaireClubCollc
  • News
  • Notifications
  • Shop
  • Cart
  • Media
  • Advertise with Us
  • Profile
  • Groups
  • Games
  • My Story
  • Chat
  • Contact Us
home shop notifications more
Signin
  •  Profile
  •  Sign Out
Skip to content

Billionaire Club Co LLC

Believe It and You Will Achieve It

Primary Menu
  • Home
  • Politics
  • TSR
  • Anime
  • Michael Jordan vs.Lebron James
  • Crypto
  • Soccer
  • Dating
  • Airplanes
  • Forex
  • Tax
  • New Movies Coming Soon
  • Games
  • CRYPTO INSURANCE
  • Sport
  • MEMES
  • K-POP
  • AI
  • The Bahamas
  • Digital NoMad
  • Joke of the Day
  • RapVerse
  • Stocks
  • SPORTS BETTING
  • Glamour
  • Beauty
  • Travel
  • Celebrity Net Worth
  • TMZ
  • Lotto
  • COVD-19
  • Fitness
  • The Bible is REAL
  • OutDoor Activity
  • Lifestyle
  • Culture
  • Boxing
  • Food
  • LGBTQ
  • Poetry
  • Music
  • Misc
  • Open Source
  • NASA
  • Science
  • Natural & Holstict Med
  • Gardening
  • DYI
  • History
  • Art
  • Education
  • Pets
  • Aliens
  • Astrology
  • Farming and LiveStock
  • LAW
  • Fast & Furious
  • Fishing & Hunting
  • Health
  • Credit Repair
  • Grants
  • All things legal
  • Reality TV
  • Africa Today
  • China Today
  • "DUMB SHIT.."
  • CRYPTO INSURANCE

Performance of Best of N Baseline for Various N and Sample Responses and GPT-4 Judgments

:::info
Authors:
(1) Rafael Rafailo, Stanford University and Equal contribution; more junior authors listed earlier;
(2) Archit Sharma, Stanford University and Equal contribution; more junior authors listed earlier;
(3) Eric Mitchel, Stanford University and Equal contribution; more junior authors listed earlier;
(4) Stefano Ermon, CZ Biohub;
(5) Christopher D. Manning, Stanford University;
(6) Chelsea Finn, Stanford University.
:::
Table of Links
Abstract and 1. Introduction
2 Related Work
3 Preliminaries
4 Direct Preference Optimization
5 Theoretical Analysis of DPO
6 Experiments
7 Discussion, Acknowledgements, and References
Author Contributions
\
A Mathematical Derivations
A.1 Deriving the Optimum of the KL-Constrained Reward Maximization Objective
A.2 Deriving the DPO Objective Under the Bradley-Terry Model
A.3 Deriving the DPO Objective Under the Plackett-Luce Model
A.4 Deriving the Gradient of the DPO Objective and A.5 Proof of Lemma 1 and 2
A.6 Proof of Theorem 1
\
B DPO Implementation Details and Hyperparameters
\
C Further Details on the Experimental Set-Up and C.1 IMDb Sentiment Experiment and Baseline Details
C.2 GPT-4 prompts for computing summarization and dialogue win rates
C.3 Unlikelihood baseline
\
D Additional Empirical Results
D.1 Performance of Best of N baseline for Various N and D.2 Sample Responses and GPT-4 Judgments
D.3 Human study details
Additional Empirical Results
D.1 Performance of Best of N baseline for Various N
We find that the Best of N baseline is a strong (although computationally expensive, requiring sampling many times) baseline in our experiments. We include an evaluation of the Best of N baseline for various N for the Anthropic-HH dialogue and TL;DR summarization; the results are shown in Figure 4.
D.2 Sample Responses and GPT-4 Judgments
In this section, we present examples of comparisons between DPO and the baseline (PPO temp 0. for summarization, and the ground truth chosen response for dialogue). See Tables 4-6 for summarization examples, and Tables 7-10 for dialogue examples.
\

\

\

\

\

\

\

\

\
:::info
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.
:::
\

Welcome to Billionaire Club Co LLC, your gateway to a brand-new social media experience! Sign up today and dive into over 10,000 fresh daily articles and videos curated just for your enjoyment. Enjoy the ad free experience, unlimited content interactions, and get that coveted blue check verification—all for just $1 a month!

Source link

Share
What's your thought on the article, write a comment
0 Comments
×

Sign In to perform this Activity

Sign in
×

Account Frozen

Your account is frozen. You can still view content but cannot interact with it.

Please go to your settings to update your account status.

Open Profile Settings

Ads

  • Billionaire128 Liquid Gold Laptop Sleeve

    $ 28.00
  • Billionaire128 Liquid Gold Yoga Leggings

    $ 42.00
  • Billionaire128 Liquid Gold Flag

    $ 25.00
  • News Social

    • Facebook
    • Twitter
    • Facebook
    • Twitter
    Copyright © 2024 Billionaire Club Co LLC. All rights reserved