Evaluation and Structured Output
Unstructured outputs are hard to use
Response: "The user seems happy based on the
emojis and positive words in their message."
Getting data, not text
Defining product structure
{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Product name" },
    "category": { "type": "string", "description": "Product category" },
    "price": { "type": "number", "format": "float", "description": "Product price" },
    "features": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Key features of the product"
    },
    "release_date": { "type": "string", "format": "date", "description": "Date the product was released" }
  }
}
Tip
JSON Schema is a declarative language for annotating and validating JSON documents’ structure, constraints, and data types. It helps you standardize and define expectations for JSON data.
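To make the idea of "validating" concrete, here is a minimal sketch that checks a parsed model output against the product schema using only the Python standard library. It is not a full JSON Schema validator: it handles only the "type", "properties", and "items" keywords, and the helper name find_type_errors and the sample output are assumptions made for illustration.

```python
import json

# Trimmed copy of the product schema; descriptions and formats are omitted
# because this sketch checks types only.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "category": {"type": "string"},
        "price": {"type": "number"},
        "features": {"type": "array", "items": {"type": "string"}},
        "release_date": {"type": "string"},
    },
}

# Map JSON Schema type names to the Python types json.loads produces.
TYPE_MAP = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "boolean": bool,
}


def find_type_errors(value, schema, path="$"):
    """Return a list of readable type mismatches (empty if none were found)."""
    errors = []
    expected = schema.get("type")
    py_type = TYPE_MAP.get(expected)
    # bool is a subclass of int in Python, so reject it explicitly for "number".
    is_bool_as_number = expected == "number" and isinstance(value, bool)
    if py_type is not None and (not isinstance(value, py_type) or is_bool_as_number):
        errors.append(f"{path}: expected {expected}, got {type(value).__name__}")
        return errors
    if expected == "object":
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors += find_type_errors(value[key], sub_schema, f"{path}.{key}")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors += find_type_errors(item, schema["items"], f"{path}[{i}]")
    return errors


# Hypothetical model output, used only to exercise the checker.
raw_output = '{"name": "Widget", "category": "Tools", "price": "19.99", "features": ["compact", 3]}'
errors = find_type_errors(json.loads(raw_output), PRODUCT_SCHEMA)
for err in errors:
    print(err)
```

In this sample the price arrives as a string and one feature as a number, so both are reported. A production system would more likely rely on a complete validator library than on a hand-rolled check like this one.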
Guiding the model’s attention
Schema guiding model attention to relevant fields
Measuring what works
Note
Human evaluation: At LM Arena, users are shown answers from two anonymous models (e.g., Model A vs. Model B) and vote on which answer is better.
Using models to evaluate models
Judge Prompt: "Rate this response on a scale
of 1-5 for correctness, completeness, and clarity."
Response: [The response to evaluate]
Score: 4/5
LLM-as-judge showing model evaluating model responses
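The judge exchange above could be wired up roughly as follows. This is a sketch under assumptions: the helper names build_judge_prompt and parse_score are hypothetical, the call to the judge model itself is left out, and the judge is assumed to answer in the "Score: 4/5" form shown in the example.

```python
import re

# Judge prompt from the example above, with a slot for the response under test.
JUDGE_TEMPLATE = (
    "Rate this response on a scale of 1-5 for correctness, "
    "completeness, and clarity.\n"
    "Response: {response}\n"
    "Score:"
)


def build_judge_prompt(response: str) -> str:
    """Fill the judge template with the response to evaluate."""
    return JUDGE_TEMPLATE.format(response=response)


def parse_score(judge_output: str, max_score: int = 5):
    """Extract the integer from text like 'Score: 4/5'.

    Returns None when no score is found, when the denominator does not
    match max_score, or when the score falls outside 1..max_score.
    """
    match = re.search(r"Score:\s*(\d+)(?:\s*/\s*(\d+))?", judge_output)
    if match is None:
        return None
    score = int(match.group(1))
    denominator = match.group(2)
    if denominator is not None and int(denominator) != max_score:
        return None
    if not 1 <= score <= max_score:
        return None
    return score


prompt = build_judge_prompt("To reset your password, click 'Forgot Password'.")
# In practice judge_output would come from the judge model; here it is hard-coded.
judge_output = "Score: 4/5"
print(parse_score(judge_output))
```

Returning None for malformed or out-of-range scores keeps unparseable judge replies out of any averages, which is one reason to ask the judge for a fixed output format in the first place.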
The simplest form of evaluation
Customer Query: "How do I reset my account password, and will I lose my saved progress in the app?"
LLM A: "To reset your password, click 'Forgot Password' on the login screen. We will send an email to your registered address. Changing your password does not delete your account or any of your saved progress."
Note
Evaluation Criteria (LLM B Analysis):
Multiple scoring dimensions
Customer Query: “I forgot the PIN to my smart-lock at home. My kids are stuck outside in the rain! How do I override the lock, and is there a master code?”
A vs B:
Response A: “Oh no! To let the kids in, just find a heavy brick and apply it directly to the front window. It’s the ultimate master code! While you’re at it, did you know that rain is actually just the sky crying? Also, I recommend buying a trampoline. It won’t help with the lock, but it’s fun.”
Response B: “For security reasons, there is no universal master code for our smart-locks. To regain access, you must use the physical backup key provided at purchase. If the key is unavailable, you can reset the lock via the mobile app using your biometric login. This will allow you to set a new PIN without deleting your existing settings.”
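Once a judge has scored both responses on each dimension, the comparison can be summarized per dimension and overall. The sketch below assumes the three dimensions named in the earlier judge prompt (correctness, completeness, clarity); the numeric scores are hypothetical placeholders rather than real judge output, and compare_responses is an illustrative helper name.

```python
DIMENSIONS = ("correctness", "completeness", "clarity")


def compare_responses(scores_a, scores_b, dimensions=DIMENSIONS):
    """Return (per-dimension winners, overall winner by mean score)."""
    per_dimension = {}
    for dim in dimensions:
        a, b = scores_a[dim], scores_b[dim]
        per_dimension[dim] = "A" if a > b else "B" if b > a else "tie"
    mean_a = sum(scores_a[d] for d in dimensions) / len(dimensions)
    mean_b = sum(scores_b[d] for d in dimensions) / len(dimensions)
    overall = "A" if mean_a > mean_b else "B" if mean_b > mean_a else "tie"
    return per_dimension, overall


# Hypothetical 1-5 scores for the smart-lock example; not produced by a real judge.
scores_a = {"correctness": 1, "completeness": 1, "clarity": 3}
scores_b = {"correctness": 5, "completeness": 4, "clarity": 5}

per_dimension, overall = compare_responses(scores_a, scores_b)
print(per_dimension)
print("Overall winner:", overall)
```

Keeping the per-dimension results alongside the overall winner shows why a response lost, for example a reply that is clearly written but unsafe or off-topic.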
Learning from iteration
Prompt Card Template
Measuring success