- Log, trace, and debug all LLM inputs and outputs so you can easily examine failures and understand how data flows through your application
- Create a system of record for your LLM apps
- Build rigorous evaluations of LLM use cases to score any aspect of your app
- Organize information generated across the LLM workflow, from experimentation to evaluations to production
- Evaluate and analyze LLM performance, and visually compare model results across different dimensions
- Use human feedback such as emojis and text notes to compile evaluation datasets, monitor performance, and collect examples for fine-tuning
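The logging-and-tracing pattern in the first bullet can be sketched in plain Python. This is an illustrative stand-in, not this project's actual API: `trace_llm_call`, the in-memory `traces` list, and `fake_llm` are all hypothetical names used to show how inputs, outputs, errors, and latency might be captured per call.

```python
import functools
import json
import time

def trace_llm_call(log_store):
    """Decorator that records inputs, outputs, errors, and latency of an LLM call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {"function": fn.__name__,
                      "inputs": {"args": args, "kwargs": kwargs}}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                record["output"] = result
                return result
            except Exception as exc:
                # Failed calls are logged too, so failures stay debuggable.
                record["error"] = repr(exc)
                raise
            finally:
                record["latency_s"] = round(time.perf_counter() - start, 4)
                log_store.append(record)
        return wrapper
    return decorator

# Hypothetical stand-in for a real model call.
traces = []

@trace_llm_call(traces)
def fake_llm(prompt):
    return f"echo: {prompt}"

fake_llm("hello")
print(json.dumps(traces[0], default=str))
```

A real tracing layer would ship records to a persistent store rather than an in-memory list, but the shape of each record (inputs, output or error, latency) is the system-of-record idea the bullets describe.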