weave-rag-lc-demo Workspace – Weights & Biases

Assets

gen_eval_dataset-evaluation:v4

Name: gen_eval_dataset-evaluation (5 versions)
Last updated: 1 year ago
Storage size: 0B (0B from all versions)

gen_eval_dataset:v0

CorrectnessLLMJudge:v4

HallucinationLLMJudge:v3

eval_retrieval:v1

Summary

| Model | answer correct.true_count | answer correct.true_fraction | answer correct.stderr | follows from source.true_count | follows from source.true_fraction | first retrieval correct.true_count | first retrieval correct.true_fraction | Avg. Latency | Run Date | Trials |
|---|---|---|---|---|---|---|---|---|---|---|
| | 20.00 | 83.33% | 7.61% | 24.00 | 100.00% | 15.00 | 62.50% | 1.87 | 1 year ago | 1.00 |
| | 14.00 | 58.33% | 10.06% | 12.00 | 50.00% | 15.00 | 62.50% | 32.47 | 1 year ago | 1.00 |
| | 18.00 | 75.00% | 8.84% | 24.00 | 100.00% | 15.00 | 62.50% | 15.60 | 1 year ago | 1.00 |

Total Rows: 3
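
The true_fraction and stderr figures in the summary can be cross-checked from the true_count values. The sketch below is a minimal illustration, not the scoring code behind this page. It rests on two assumptions: that each run scored 24 examples (inferred from a true_count of 24.00 corresponding to 100.00%), and that stderr is the binomial standard error sqrt(p * (1 - p) / n). The helper name `summarize_bool` and the constant `N_EXAMPLES` are hypothetical.

```python
import math

# Assumption: 24 scored examples per run, inferred from the summary
# (a true_count of 24.00 is shown as 100.00%). This is not the Trials column.
N_EXAMPLES = 24


def summarize_bool(true_count: int, n: int) -> dict:
    """Summarize a boolean score column from its count of True values.

    Assumes stderr is the binomial standard error sqrt(p * (1 - p) / n).
    """
    p = true_count / n
    stderr = math.sqrt(p * (1 - p) / n)
    return {"true_count": true_count, "true_fraction": p, "stderr": stderr}


# "answer correct" true counts for the three rows in the summary.
for count in (20, 14, 18):
    s = summarize_bool(count, N_EXAMPLES)
    print(f"{s['true_count']:>2}  {s['true_fraction']:.2%}  {s['stderr']:.2%}")
```

Under these assumptions the printed fractions and standard errors match the three rows of the summary (83.33% / 7.61%, 58.33% / 10.06%, 75.00% / 8.84%). With only 24 examples the standard errors are large: the gap between the 83.33% and 75.00% rows is about one standard error.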