Structured Outputs Journey

My experiments with trying to get structured outputs from LLMs
Created on February 2|Last edited on February 2
I used ChatGPT to create several tasks for practising the extraction of structured data from a given document or piece of prose. I have documented them all under the following GitHub repository


Although I didn't use Weave, I tried to apply the concepts I learned from the course on using Instructor for structured output extraction. One important thing I found is that when some information is not provided in the prose, the LLM invents it. It does so quite gracefully, as the output of the third task in the above repo shows, but the value is still made up. We need to add checks and balances wherever possible to address this.
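One simple kind of check is to confirm that every extracted value actually appears in the source prose. The sketch below is only an illustration using the standard library: the function name `ungrounded_fields`, the sample invoice text, and the field names are all made up for this example and do not come from the course or the repo.

```python
def ungrounded_fields(extracted: dict[str, str], source_text: str) -> list[str]:
    """Return the names of fields whose values never appear verbatim in the source."""
    # Normalise case and whitespace so trivial formatting differences are ignored.
    haystack = " ".join(source_text.lower().split())
    missing = []
    for field, value in extracted.items():
        needle = " ".join(str(value).lower().split())
        if needle and needle not in haystack:
            missing.append(field)
    return missing


# Hypothetical example: the model "found" a tax amount the prose never mentions.
prose = "Invoice from Acme Corp. Widgets cost 40 USD and shipping cost 5 USD."
record = {"vendor": "Acme Corp", "widgets": "40 USD", "tax": "8 USD"}
flagged = ungrounded_fields(record, prose)
print(flagged)
```

A check like this only tells us that a value is absent from the text. It cannot tell us whether a value that is present has been attached to the right field.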
I don't think a simple context check will work very well on its own, because attributions can still be wrong even when the context contains all the information. For example, the amount of one line item can be attributed to another line item. Hard checks are also fairly trivial. We need some intelligence in the form of soft checks to make validation more effective. For example, Jason Liu and Jasoon Liu should be treated as different values, but during validation we could ask a clarifying question, or write partial validations, instead of silently accepting or rejecting the value.
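The Jason Liu / Jasoon Liu case can be sketched as a three-way soft check: exact match, near match that deserves a clarifying question, or clearly different. This is a rough sketch using `difflib` from the standard library. The function name `soft_name_check`, the verdict labels, and the 0.85 threshold are assumptions chosen for illustration, not something taken from the course.

```python
from difflib import SequenceMatcher


def soft_name_check(extracted: str, known: str, threshold: float = 0.85) -> str:
    """Classify an extracted name against a known one.

    Returns "match" for identical names (ignoring case and spacing),
    "clarify" for near misses that a human or the LLM should confirm,
    and "different" otherwise. The threshold is an arbitrary assumption.
    """
    a = " ".join(extracted.lower().split())
    b = " ".join(known.lower().split())
    if a == b:
        return "match"
    similarity = SequenceMatcher(None, a, b).ratio()
    return "clarify" if similarity >= threshold else "different"


# Near miss: not treated as the same person, but worth a clarifying question.
print(soft_name_check("Jasoon Liu", "Jason Liu"))
# Unrelated name: treated as a different person outright.
print(soft_name_check("Ivan Leo", "Jason Liu"))
```

The "clarify" verdict is where a follow-up question or a retry could go, while "match" and "different" can be handled by ordinary hard validation.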
I liked the course, and its lessons are going to stay with me. I will keep posting my experiments to the above GitHub repository. Thanks a lot to Jason and wandb for this course: https://wandb.ai/site/courses/structured-outputs/