trlx Reports – Weights & Biases

Difference due to the change in base_trainer.decode

2023-09-05

Difference due to changes in base_trainer.decode

2023-09-05

w/ evol-15.08-beluga-13b-threshold

2023-08-16

lora sentiments multigpu comparison

2023-06-22

Different position_ids calculations

2023-06-21

2023-05-12

Single Q vs two Qs on randomwalks

2023-05-09

update-requirements v. main

2023-05-01

Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts #422

2023-04-27

Sequence-wise v. token-wise mean KL

2023-04-19

[fix] Fix ILQL head sync under ZeRO3 #387

https://github.com/CarperAI/trlx/pull/387

2023-03-23

Cuda OOM with PPO on GPT2-medium #372

https://github.com/CarperAI/trlx/issues/372

2023-03-21

Timing difference

2023-03-16

Untitled Report

2023-03-06

2023-03-06

2023-03-06

2023-03-06

Convert the rest of configs from ymls #346

https://github.com/CarperAI/trlx/pull/346

2023-03-01

Add batch_size option for the reward model #322

https://github.com/CarperAI/trlx/pull/322

2023-02-21

Make gather_for_metrics usage more strict #315

seq2seq example: the difference in reward/mean reflects the difference in len(eval_samples); the rest of the metrics are the same

2023-02-20

Make gather_for_metrics more strict #315

https://github.com/CarperAI/trlx/pull/315

2023-02-19

Make gather_for_metrics more strict #315

https://github.com/CarperAI/trlx/pull/315, no changes in behavior

2023-02-17

Untitled Report

2023-02-13

Gather experience samples #305

https://github.com/CarperAI/trlx/pull/305

2023-02-13

Gather experience samples #305

No changes with (1) a deterministic reward_fn and (2) single-process runs of the usual sentiment pipeline

2023-02-11

Gather experience samples #305

https://github.com/CarperAI/trlx/pull/305

2023-02-10

Add Accelerate SFT Trainer #280

2023-02-08

Add Accelerate SFT Trainer #280

https://github.com/CarperAI/trlx/pull/280
CUDA_VISIBLE_DEVICES=0 python examples/ppo_sentiments.py &
CUDA_VISIBLE_DEVICES=1 python examples/sft_sentiments.py &
CUDA_VISIBLE_DEVICES=2 python examples/ilql_sentiments.py &

2023-02-07

Set deepspeed's fp16 auto_cast to false #279

2023-02-06

[fix] Set deepspeed's fp16 auto_cast to false #279

https://github.com/CarperAI/trlx/pull/279

2023-02-06

Improve PPO readability #210

https://github.com/CarperAI/trlx/pull/210

2023-02-05

Fix distributed dataloaders & deduplicate eval #276

https://github.com/CarperAI/trlx/pull/276

2023-02-04

Toy example for PPO does learn as much as expected

2023-02-03

Fix empty elements

2023-02-03

Fix heads dtype

zero3 now works for ILQL, no changes for PPO

2023-01-23

random_walks_document v. main

2023-01-22

gptj-rm-static v. gptj-rm-hh ?

2023-01-19

Difference truncation_side left/right

2023-01-13

Update generation utilities #172

ppo_sentiments

2023-01-12

Update generation utilities #172

ilql_sentiments

2023-01-12

Untitled Report

2023-01-04

Fix PPO calculation with unequal generation lengths

2022-12-14