Explore topic-wise interview questions and answers.
ReAct & Tool Use
QUESTION 01
What is the ReAct (Reasoning + Acting) framework and how does it work?
DEFINITION:
ReAct is a prompting framework that interleaves reasoning traces (thoughts) with task-specific actions, enabling LLMs to reason about problems, take actions to gather information, and adjust their reasoning based on observations. It combines chain-of-thought reasoning with tool use in a single, integrated loop.
HOW IT WORKS:
The ReAct loop follows a pattern: Thought → Action → Observation → Thought... The model generates a Thought explaining what it needs to do, then an Action specifying a tool call (e.g., Search[query], Calculator[expression]). The system executes the action and returns an Observation. The model then uses this observation in its next Thought, continuing until it has enough information to produce a Final Answer. This interleaving allows the model to dynamically decide what information it needs and adapt its plan based on what it finds.
WHY IT MATTERS:
ReAct solves two key problems: 1) Reasoning alone (CoT) can't access external information, leading to hallucination or outdated knowledge. 2) Acting alone (tool use without reasoning) is brittle - agents don't know why they're taking actions. ReAct combines both, making agents more robust and interpretable. The reasoning trace shows why each action was taken, enabling debugging and trust. ReAct is the foundation of most modern agent systems.
EXAMPLE:
Query: 'What was the temperature in Paris when the Eiffel Tower opened?' ReAct trace: Thought: I need the opening date of Eiffel Tower and weather data for that date. Action: Search[Eiffel Tower opening date]. Observation: March 31, 1889. Thought: Now I need weather data for Paris on that date. Action: Weather[Paris, 1889-03-31]. Observation: 12°C, partly cloudy. Thought: I have both pieces. Final Answer: The temperature was 12°C with partly cloudy skies. This interleaved reasoning+acting is ReAct's power.
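The trace above can be sketched as a minimal loop. This is an illustrative sketch, not a real agent: the "model" is a scripted list of outputs, and Search/Weather are stub tools standing in for real APIs.

```python
import re

# Stub tools standing in for real APIs (illustrative).
TOOLS = {
    "Search": lambda q: "March 31, 1889",
    "Weather": lambda q: "12C, partly cloudy",
}

# Scripted outputs standing in for real LLM generations.
SCRIPTED = [
    "Thought: I need the opening date of the Eiffel Tower.\nAction: Search[Eiffel Tower opening date]",
    "Thought: Now I need weather data for Paris on that date.\nAction: Weather[Paris, 1889-03-31]",
    "Thought: I have both pieces.\nFinal Answer: 12C, partly cloudy",
]

def react_loop(model_outputs):
    """Run the Thought -> Action -> Observation cycle until a Final Answer appears."""
    transcript = []
    for output in model_outputs:
        transcript.append(output)
        if "Final Answer:" in output:
            return output.split("Final Answer:")[1].strip(), transcript
        # Parse 'Action: Tool[input]', execute the tool, feed back the observation.
        match = re.search(r"Action: (\w+)\[(.*?)\]", output)
        observation = TOOLS[match.group(1)](match.group(2))
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("ran out of steps without a Final Answer")

answer, trace = react_loop(SCRIPTED)
```

In a real agent the scripted list is replaced by repeated LLM calls, with the growing transcript passed back as context each time.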
QUESTION 02
What is function calling / tool use in LLMs and how is it implemented?
DEFINITION:
Function calling (or tool use) is a capability that allows LLMs to invoke external functions or APIs by generating structured outputs (typically JSON) that specify which tool to call and with what parameters. The system then executes the tool and returns the result to the model for further processing.
HOW IT WORKS:
Implementation involves: 1) Tool definition - developers define available tools with names, descriptions, and parameter schemas (JSON Schema). 2) Tool registration - these definitions are provided to the LLM in the prompt or via API parameters. 3) Model decides - during generation, the model can output a special structured response indicating it wants to call a tool. 4) Parsing - the framework parses this response, validates parameters against schema. 5) Execution - the tool function is called with the provided parameters. 6) Result return - the tool output is sent back to the model as a new message. 7) Loop continues - model uses result to continue generation. Modern models (GPT-4, Claude 3) have native function calling APIs that handle this flow.
WHY IT MATTERS:
Function calling transforms LLMs from pure text generators into interactive agents that can affect the world. It enables: accessing real-time data (weather, stocks), performing computations (calculator, code execution), interacting with APIs (CRM, databases), and controlling systems (smart home, robotics). It's the foundation of agentic applications.
EXAMPLE:
Weather agent with function calling. Tool definition: get_weather(location: string, date: string). User: 'What's the weather in Paris tomorrow?' Model generates: get_weather(location='Paris', date='2024-03-15'). System calls weather API, returns '15°C, sunny'. Model then says: 'Tomorrow in Paris will be 15°C and sunny.' This structured interaction is more reliable than having model generate free-form text and hoping it matches.
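The parse-validate-execute flow above can be sketched as a minimal dispatch layer. The registry shape and JSON format here are illustrative assumptions; real APIs (OpenAI, Anthropic) wrap this in their own message formats.

```python
import json

# Stub implementation standing in for a real weather API.
def get_weather(location, date):
    return f"15C, sunny in {location} on {date}"

# Hypothetical registry: tool name -> (callable, required parameter names).
REGISTRY = {"get_weather": (get_weather, {"location", "date"})}

def execute_tool_call(raw_json):
    """Parse a model-emitted tool call, validate parameters, execute, return a message."""
    call = json.loads(raw_json)
    func, required = REGISTRY[call["name"]]
    missing = required - call["arguments"].keys()
    if missing:
        # Structured error goes back to the model so it can correct itself.
        return {"role": "tool", "content": f"error: missing {sorted(missing)}"}
    return {"role": "tool", "content": func(**call["arguments"])}

# The model would emit structured output like this instead of free-form text:
model_output = '{"name": "get_weather", "arguments": {"location": "Paris", "date": "2024-03-15"}}'
result_msg = execute_tool_call(model_output)
```

The returned message is appended to the conversation so the model can phrase its final answer from the tool result.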
QUESTION 03
How do you define tools for an LLM to use (JSON schema, OpenAI tool format)?
DEFINITION:
Tools for LLMs are defined using structured specifications that describe the tool's name, purpose, and expected parameters. The most common format is JSON Schema, which provides a standardized way to describe parameter types, requirements, and descriptions that the LLM can understand.
HOW IT WORKS:
A tool definition typically includes: 1) type: 'function' (in OpenAI format). 2) function: object with name, description, and parameters. 3) parameters: JSON Schema object defining each parameter: type (string, number, array, etc.), description (explaining what the parameter is for), enum (allowed values), required (list of required parameters). Example: {name: get_weather, description: Get current weather for a location, parameters: {type: object, properties: {location: {type: string, description: City name}, unit: {type: string, enum: [celsius, fahrenheit]}}, required: [location]}}. This schema tells the model exactly how to call the tool.
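The definition above, written out as an OpenAI-format tool object, with a minimal validation check. The validator is a sketch covering only `required` and `enum`; a real system would use a full JSON Schema library.

```python
# The get_weather definition above, as an OpenAI-format tool object.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}

def validate_args(tool_def, args):
    """Check arguments against the schema (only 'required' and 'enum'; not full JSON Schema)."""
    params = tool_def["function"]["parameters"]
    for name in params["required"]:
        if name not in args:
            return f"missing required parameter: {name}"
    for name, value in args.items():
        spec = params["properties"].get(name, {})
        if "enum" in spec and value not in spec["enum"]:
            return f"invalid value for {name}: {value}"
    return "ok"
```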
WHY IT MATTERS:
Good tool definitions are critical for reliable tool use. Vague descriptions cause the model to misuse tools. Missing parameters cause errors. Clear schemas with examples improve accuracy. The description field is especially important - it's the model's only guide to when and how to use the tool. Investing in high-quality tool definitions directly improves agent performance.
EXAMPLE:
Poor tool definition: get_stock_price(symbol) with no description. Model might call with 'Apple' (should be 'AAPL'). Good definition: get_stock_price(symbol: string, description: 'Stock ticker symbol (e.g., AAPL for Apple)'). Model now knows to use ticker symbols. Adding enum for supported symbols further improves. This attention to detail reduces errors and makes agents reliable.
QUESTION 04
What is the difference between parallel tool calls and sequential tool calls?
DEFINITION:
Parallel tool calls allow an LLM to invoke multiple tools simultaneously in a single response, while sequential tool calls require one tool per response, waiting for each result before proceeding. Parallel calls reduce latency for independent tasks but require careful handling of dependencies.
HOW IT WORKS:
Sequential: model outputs one tool call, system executes, returns result, model continues. Simple, ensures dependencies handled, but slower for multiple independent queries. Parallel: model outputs array of tool calls (e.g., [tool1, tool2, tool3]). System executes all in parallel (or concurrently), returns all results together. Model then processes all observations. Supported by modern APIs (OpenAI parallel function calling). Requirements: tools must be independent (no ordering dependencies), and model must be capable of planning multiple calls.
WHY IT MATTERS:
Parallel calls significantly reduce latency. For a task requiring three independent searches, sequential takes 3× round trips, parallel takes 1. This improves user experience. However, parallel requires more sophisticated planning - model must identify independent sub-tasks. It also complicates error handling (if one fails, others may still succeed). Many frameworks support both modes, letting developers choose based on task.
EXAMPLE:
Query: 'Get weather in Paris, London, and New York.' Sequential: call get_weather('Paris') → result → call get_weather('London') → result → call get_weather('NYC') → result. 3 round trips. Parallel: model outputs three tool calls simultaneously. System executes all three in parallel, returns all results together. 1 round trip, 3× faster. Both produce same information, but parallel is much faster. This is why parallel calling is preferred for independent tasks.
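The latency difference can be sketched with asyncio, using a stub weather call standing in for a ~100 ms API round trip.

```python
import asyncio
import time

async def get_weather(city):
    await asyncio.sleep(0.1)  # stands in for a ~100 ms API round trip
    return f"{city}: 15C"

async def sequential(cities):
    # One call at a time: total time is the sum of the round trips.
    return [await get_weather(c) for c in cities]

async def parallel(cities):
    # All calls at once: total time is roughly one round trip.
    return list(await asyncio.gather(*(get_weather(c) for c in cities)))

cities = ["Paris", "London", "New York"]
t0 = time.perf_counter()
seq = asyncio.run(sequential(cities))
seq_time = time.perf_counter() - t0
t0 = time.perf_counter()
par = asyncio.run(parallel(cities))
par_time = time.perf_counter() - t0
```

Both runs return the same results; only the wall-clock time differs.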
QUESTION 05
How does an LLM decide which tool to call and when?
DEFINITION:
An LLM decides which tool to call based on the tool descriptions provided, the conversation context, and its training on similar tasks. The decision is essentially a classification and planning problem: given the user's request and available tools, which tool(s) can help, and in what order?
HOW IT WORKS:
The process: 1) Tool descriptions are included in the prompt (or via API parameters) with name, description, and parameter schemas. 2) The model analyzes the user's request and determines what information or actions are needed. 3) It matches the request against tool descriptions - if a tool's description matches the need, it's a candidate. 4) The model also considers conversation history and previous tool results. 5) It then decides whether to: respond directly (no tool needed), call one tool, call multiple tools in parallel, or start a sequential chain. 6) This decision is influenced by training (models fine-tuned on tool use examples) and the quality of tool descriptions.
WHY IT MATTERS:
Reliable tool selection is critical. If model calls wrong tool, agent fails. If it calls too early (before enough context), it may waste calls. If it never calls when needed, it hallucinates. Good tool descriptions (clear, specific) improve accuracy. Some frameworks add few-shot examples of when to use each tool. For complex scenarios, the model may need to plan a sequence, which requires stronger reasoning.
EXAMPLE:
User: 'What's the stock price of Apple and the weather in Cupertino?' Model has tools: get_stock_price(symbol), get_weather(location). It decides to call both in parallel: stock tool with symbol 'AAPL' (inferred from Apple), weather tool with location 'Cupertino'. This requires understanding that 'Apple' maps to 'AAPL', and that Cupertino is relevant location. Good tool descriptions ('use stock ticker symbols like AAPL') and model knowledge make this work.
QUESTION 06
What are common tools in agentic systems (web search, code execution, database query)?
DEFINITION:
Agentic systems use a variety of tools to extend LLM capabilities beyond text generation. Common categories include information retrieval (web search, vector DB), computation (calculator, code execution), data access (database queries, API calls), and action execution (sending emails, controlling systems).
HOW IT WORKS:
Web search: tool that queries search API, returns relevant snippets. Code execution: runs Python/JavaScript in sandbox, returns output. Database query: executes SQL, returns results. Calculator: performs precise arithmetic. Document retrieval: queries vector DB for RAG. Email/slack: sends messages. Calendar: schedules events. File I/O: reads/writes files. API calls: interacts with external services (CRM, payment). Each tool has specific parameters and returns structured results that the agent can use.
WHY IT MATTERS:
Tools are how agents connect to the world. Without them, agents are limited to knowledge in their training data (stale) and text output (no actions). With tools, agents can access real-time information, perform computations, and take actions. The set of tools defines what an agent can do. Building good tools - reliable, well-documented, safe - is key to agent capabilities.
EXAMPLE:
Research agent with tools: 1) web_search(query) - finds recent information. 2) arxiv_search(query) - finds academic papers. 3) code_interpreter(code) - runs Python for data analysis. 4) vector_retriever(query) - searches internal knowledge base. User: 'Analyze recent trends in LLM research.' Agent: web_search for recent articles, arxiv_search for papers, code_interpreter to analyze citation patterns, synthesizes comprehensive answer. Without these tools, agent would rely on outdated training data. Tools make it capable.
QUESTION 07
How do you handle tool errors and retries in an agent loop?
DEFINITION:
Handling tool errors in agent loops requires strategies to detect failures, retry appropriately, and fall back gracefully. Since agents interact with unreliable external systems, robust error handling is essential for production reliability.
HOW IT WORKS:
Approaches: 1) Automatic retries - for transient errors (network, rate limits), retry with exponential backoff. 2) Error messages - return structured error to agent, letting it decide next action. 3) Alternative tools - if primary tool fails, agent can try backup (e.g., web search fails → try database). 4) Simplification - agent can break complex tool call into simpler ones. 5) Human escalation - if all fails, ask user. 6) Circuit breakers - stop trying failing tools after repeated errors. 7) Validation - check parameters before calling to prevent errors. Frameworks provide hooks for custom error handling.
WHY IT MATTERS:
Tool errors are inevitable - APIs go down, rate limits hit, parameters malformed. Without handling, agent crashes or gives wrong answer. Good error handling makes agents resilient. It also prevents wasted API calls and costs. For agents that take real actions (e.g., booking flights), error handling is safety-critical.
EXAMPLE:
Agent calls weather API with malformed location 'Pari'. API returns 400 error. System returns error message to agent. Agent thinks: 'Location not recognized, maybe typo. Try 'Paris'.' Retries, succeeds. Without error handling, agent would fail. Another: API rate limited. System auto-retries after 1s, succeeds. User never sees failure. This resilience is what makes agents production-ready.
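The auto-retry behavior can be sketched with exponential backoff; the flaky stub tool and delay values are illustrative.

```python
import time

class TransientError(Exception):
    """Stands in for a rate-limit or network failure from a tool."""

def call_with_retries(tool, *args, max_attempts=3, base_delay=0.01):
    """Retry transient failures with exponential backoff (delay doubles each attempt)."""
    for attempt in range(max_attempts):
        try:
            return tool(*args)
        except TransientError as err:
            if attempt == max_attempts - 1:
                # Out of retries: return a structured error the agent can reason about.
                return {"status": "error", "message": str(err)}
            time.sleep(base_delay * 2 ** attempt)

# Flaky stub tool: fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_weather(city):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return {"status": "ok", "result": f"{city}: 15C"}

result = call_with_retries(flaky_weather, "Paris")
```

Non-transient errors (e.g., a malformed location) should instead be returned to the agent immediately so it can fix its parameters rather than retrying the same bad call.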
QUESTION 08
What is the Thought-Action-Observation loop in ReAct agents?
DEFINITION:
The Thought-Action-Observation loop is the core cycle of ReAct agents. The agent repeatedly: 1) Thinks about what to do next based on current state, 2) Takes an Action (calls a tool), 3) Observes the result, and 4) Incorporates that observation into its next Thought. This continues until the task is complete.
HOW IT WORKS:
Loop structure: 1) Thought - agent's reasoning about the current situation: what it knows, what it needs, what to do next. This provides transparency into agent's decision process. 2) Action - a concrete step, typically a tool call with parameters. Formatted in structured way (e.g., Search[query]). 3) Observation - the result of the action, returned by the tool. 4) Repeat - agent uses observation to inform next Thought. The loop ends when agent produces Final Answer (no more actions needed). This cycle is managed by an executor (AgentExecutor in LangChain) that parses actions, executes tools, and feeds back observations.
WHY IT MATTERS:
The Thought-Action-Observation loop makes agents interpretable and controllable. You can see why the agent took each action (Thought), what it did (Action), and what it learned (Observation). This transparency enables debugging, trust, and refinement. Without it, agents are black boxes. The loop also provides structure for multi-step tasks, preventing the agent from getting lost.
EXAMPLE:
Travel agent: Thought: 'User wants to book flight to Paris. Need dates.' Action: AskUser[travel dates]. Observation: 'March 15-20'. Thought: 'Now search for flights.' Action: SearchFlights[Paris, March 15-20]. Observation: 'List of flights...' Thought: 'Present options to user.' Final Answer: 'Here are flights...' Each step is explicit, debuggable. If agent goes wrong, you see where: e.g., wrong dates in Thought → fix prompt. Loop provides this visibility.
QUESTION 09
How do you prevent an agent from calling tools unnecessarily?
DEFINITION:
Preventing unnecessary tool calls is important for cost control, latency, and avoiding excessive API usage. Strategies include prompting, cost awareness, and validation layers that check whether a tool call is actually needed before execution.
HOW IT WORKS:
Techniques: 1) Prompting - instruct agent to call tools only when necessary: 'Only use tools if you don't have the information or need to perform an action.' 2) Confidence threshold - agent can be trained to call tools only when confidence in answer is low. 3) Verification layer - before executing tool, check if answer already in context or conversation. 4) Tool call budgeting - limit number of tool calls per session. 5) Cost feedback - provide estimated cost of tool calls to agent (though models don't truly understand cost). 6) Caching - store frequent tool results to avoid repeated calls. 7) Human approval - for expensive/dangerous tools, require confirmation.
WHY IT MATTERS:
Unnecessary tool calls waste money and slow down responses. An agent that calls web search for every query, even common knowledge, could cost 10× more than needed. In production, this adds up. It also frustrates users with slow responses. Preventing unnecessary calls makes agents efficient and cost-effective.
EXAMPLE:
Customer support agent with tool to check order status. User: 'What's your phone number?' Agent without guard: calls order status tool (unnecessary), then realizes, then answers. Waste. Agent with prompt: 'Only use tools when needed for the query.' Recognizes phone number is common knowledge, answers directly. Later, user asks about specific order: then calls tool appropriately. This saves costs and speeds response. Good prompting reduces unnecessary calls significantly.
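One of the cheapest guards listed above, caching, can be sketched with functools.lru_cache; the stub tool and call counter are illustrative.

```python
import functools

api_calls = {"count": 0}

@functools.lru_cache(maxsize=256)
def get_weather(city):
    api_calls["count"] += 1  # stands in for a billable external API call
    return f"{city}: 15C"

# Repeated identical queries hit the cache instead of the API.
get_weather("Paris")
get_weather("Paris")
get_weather("London")
```

Caching suits deterministic or slowly changing data; for fast-changing data (stock prices), add a short time-to-live instead.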
QUESTION 10
What is the difference between a tool, a plugin, and an API in the context of agents?
DEFINITION:
In agent contexts, these terms are often used interchangeably but have subtle distinctions: a tool is a function an agent can call; a plugin is a packaged set of tools with some discovery mechanism; an API is the underlying service interface that tools may wrap.
HOW IT WORKS:
Tool: the atomic unit - a single function with name, description, parameters. Example: get_weather(location). Plugin: a collection of related tools, often with additional metadata, authentication, and discovery. Example: 'Weather plugin' containing get_current, get_forecast, get_history. Plugins may be dynamically discovered by agent frameworks. API: the actual web service that provides data/actions. Tools are usually thin wrappers around APIs, handling authentication, rate limiting, and response parsing.
WHY IT MATTERS:
Understanding the distinction helps in design. Tools are what agents directly use - they should be granular and focused. Plugins help organize many tools and enable dynamic discovery (e.g., OpenAI plugins). APIs are the implementation details behind tools. For developers, you define tools that wrap APIs; for users, they may install plugins that provide sets of tools.
EXAMPLE:
Travel agent: API: Amadeus flight API. Tool: search_flights(origin, destination, date) - a wrapper around that API. Plugin: 'Travel plugin' containing search_flights, book_flight, cancel_booking, check_in tools. Agent discovers plugin, uses its tools. This organization makes it easy to add/remove capabilities. Without this abstraction, agent would need to know API details directly, which is messier.
QUESTION 11
How do you secure tool calls to prevent unauthorized actions?
DEFINITION:
Securing tool calls involves multiple layers: authentication (who can call tools), authorization (what they can do), input validation (preventing malicious parameters), and output sanitization (protecting data). For agents with real-world impact, this is critical.
HOW IT WORKS:
Security layers: 1) Authentication - verify user identity before agent can use sensitive tools. 2) Authorization - check if user has permission for specific actions (e.g., refund >$100 requires manager). 3) Input validation - validate all parameters against schemas; reject malicious inputs (SQL injection, path traversal). 4) Rate limiting - prevent abuse, excessive calls. 5) Audit logging - log all tool calls for review. 6) Human approval - require human confirmation for high-risk actions (money transfers, data deletion). 7) Sandboxing - run code execution tools in isolated environments. 8) Output filtering - redact sensitive data from tool responses.
WHY IT MATTERS:
Unsecured tool calls are a major security risk. A compromised agent could delete data, transfer money, or exfiltrate information. In enterprise, this could be catastrophic. Security must be built into tool design, not added after. For agents with write access to systems, multiple safeguards are essential.
EXAMPLE:
Customer support agent with refund_order(order_id, amount) tool. Security: 1) User must be authenticated. 2) Check if user owns this order. 3) Validate amount <= original order total. 4) Require manager approval if amount > $100. 5) Log all refund attempts. 6) Rate limit: max 5 per hour. Without these, attacker could refund arbitrary orders. With them, safe. This multi-layer approach makes tool use secure.
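The layered checks above can be sketched as guards inside the tool itself; the policy values ($100 threshold, field names) are illustrative assumptions, and rate limiting is omitted for brevity.

```python
AUDIT_LOG = []  # every attempt is logged, allowed or not

def refund_order(user, order, amount, manager_approved=False):
    """Refund guarded by the layered checks above (policy values are illustrative)."""
    AUDIT_LOG.append({"user": user["id"], "order": order["id"], "amount": amount})
    if not user.get("authenticated"):                  # layer 1: authentication
        return "denied: not authenticated"
    if order["owner"] != user["id"]:                   # layer 2: ownership
        return "denied: user does not own this order"
    if amount > order["total"]:                        # layer 3: amount validation
        return "denied: amount exceeds order total"
    if amount > 100 and not manager_approved:          # layer 4: human approval
        return "denied: manager approval required"
    return f"refunded {amount}"

user = {"id": "u1", "authenticated": True}
order = {"id": "12345", "owner": "u1", "total": 80}
```

Keeping the guards inside the tool means a misbehaving or prompt-injected agent cannot bypass them, no matter what parameters it generates.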
QUESTION 12
What is the role of tool output formatting in agent reliability?
DEFINITION:
Tool output formatting structures the results returned to agents in consistent, parseable ways that help the agent understand and use the information. Good formatting reduces errors, improves reasoning, and makes agents more reliable.
HOW IT WORKS:
Best practices: 1) Consistent structure - always return same format for same tool (e.g., JSON with predictable fields). 2) Clear success/failure indication - include status field. 3) Summarization - for large results, provide summary plus detail. 4) Metadata - include source, timestamp, confidence. 5) Plain language - when possible, include natural language summary alongside structured data. 6) Error messages - descriptive, actionable error info. 7) Size limits - truncate very large outputs to avoid context overflow.
WHY IT MATTERS:
Agents struggle with inconsistent or poorly formatted tool outputs. If a tool sometimes returns JSON, sometimes plain text, the agent may misparse. If error messages are cryptic, agent can't recover. Good formatting makes the agent's job easier, leading to fewer mistakes and better answers. It's a form of human-AI collaboration - designing outputs for AI consumption.
EXAMPLE:
Weather tool bad output: '15 degrees'. Agent unsure: Celsius? Fahrenheit? Good output: {temperature: 15, unit: celsius, conditions: sunny, location: Paris, timestamp: 2024-03-15T12:00:00Z, summary: It is 15°C and sunny in Paris}. Agent can use structured data for reasoning, summary for response. This consistency improves reliability dramatically.
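A sketch of wrapping a raw API payload in that consistent envelope; the field names are illustrative assumptions.

```python
import json

def format_weather_output(raw):
    """Wrap a raw API payload in a consistent envelope (field names are illustrative)."""
    payload = {
        "status": "ok",
        "temperature": raw["temp"],
        "unit": "celsius",
        "conditions": raw["cond"],
        "location": raw["city"],
        # Natural-language summary alongside the structured fields.
        "summary": f"It is {raw['temp']}°C and {raw['cond']} in {raw['city']}.",
    }
    return json.dumps(payload)

out = json.loads(format_weather_output({"temp": 15, "cond": "sunny", "city": "Paris"}))
```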
QUESTION 13
How do you design a tool that is both useful and safe for an LLM agent?
DEFINITION:
Designing tools for LLM agents requires balancing utility (what the agent can accomplish) with safety (preventing misuse). A well-designed tool has clear purpose, well-defined parameters, appropriate safeguards, and predictable behavior that the agent can reason about.
HOW IT WORKS:
Design principles: 1) Single responsibility - each tool does one thing well. 2) Clear naming and description - helps agent understand when to use it. 3) Parameter validation - reject invalid inputs early. 4) Idempotency where possible - repeating same call has same effect (safe for retries). 5) Rate limiting - prevent abuse. 6) Audit logging - track usage. 7) Human-in-the-loop for high-risk operations. 8) Predictable outputs - consistent format helps agent reason. 9) Fail gracefully - return helpful error messages, not crashes. 10) Context awareness - tool can access relevant context (user ID, session) if needed.
WHY IT MATTERS:
A poorly designed tool is either useless (agent can't figure out when to use it) or dangerous (can be misused). Good design makes tools both powerful and safe. It's an investment in agent capability and risk reduction. For production, tool design is as important as agent logic.
EXAMPLE:
Email sending tool. Bad design: send_email(to, subject, body) with no restrictions. Agent could spam anyone. Good design: send_email(to, subject, body) with validation: to must be in user's contacts, rate limit 10/day, require confirmation for external addresses. Also returns {status: 'sent', message_id} on success, descriptive errors on failure. This tool is useful (agent can send emails) but safe (can't be abused). Design made it production-ready.
QUESTION 14
What are the latency implications of multi-step tool use?
DEFINITION:
Multi-step tool use adds latency from sequential tool calls, each requiring round trips to external services and LLM reasoning steps. Total latency = (LLM reasoning time per step + tool execution time) × number of steps. This can add up quickly, affecting user experience.
HOW IT WORKS:
Each step in agent loop: 1) LLM generates next action (500-2000ms). 2) Tool executes (varies: 100ms for calculator, 500ms for API, 2000ms for web search). 3) Result returned to LLM for next step. With 5 steps, total could be 5-15 seconds. Optimization strategies: 1) Parallel tool calls where possible. 2) Caching frequent tool results. 3) Smaller, faster LLMs for planning steps. 4) Streaming responses to user as information arrives. 5) Predictive execution - guess next tool call and pre-fetch. 6) Reducing steps through better planning.
WHY IT MATTERS:
Users expect fast responses. A 10-second agent feels unusable. Latency must be managed to meet user expectations. This may mean limiting steps, using faster models for planning, or showing intermediate progress. For some tasks, long latency is acceptable (background research), but for interactive agents, it's critical.
EXAMPLE:
Travel booking agent: Step1: LLM decides to ask for dates (1s). Step2: User responds. Step3: LLM decides to search flights (1s) → API call (2s). Step4: LLM decides to present options (1s). Total 5s so far - acceptable. If 10 steps, 10s+ - frustrating. Optimization: combine steps, use parallel calls where possible, cache flight results. This keeps latency acceptable while maintaining capability.
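The back-of-envelope formula from the definition can be made concrete; the 1000 ms reasoning and 500 ms tool figures are assumed averages, not measurements.

```python
def step_latency_ms(llm_ms=1000, tool_ms=500):
    """One loop iteration: LLM reasoning plus tool execution (assumed averages)."""
    return llm_ms + tool_ms

def sequential_ms(n_steps, **kw):
    # Total latency = (reasoning time + tool time) * number of steps.
    return n_steps * step_latency_ms(**kw)

def parallel_ms(n_independent_calls, **kw):
    # One planning step with tools running concurrently: roughly one step's latency.
    return step_latency_ms(**kw)
```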
QUESTION 15
How do you log and audit tool calls in a production agent?
DEFINITION:
Logging and auditing tool calls captures every interaction between agent and external systems: what tool was called, with what parameters, what result was returned, and when. This is essential for debugging, security monitoring, compliance, and improving agent performance.
HOW IT WORKS:
What to log: 1) Timestamp - when call occurred. 2) User ID - who initiated. 3) Session ID - to group related calls. 4) Tool name - which tool. 5) Input parameters - exactly what was sent. 6) Output - result returned (or error). 7) Duration - how long tool took. 8) Token usage - cost of LLM calls for this step. 9) Success/failure status. 10) Raw request/response for debugging. Logs should be stored securely, with PII redacted, and retained according to policy. Tools for logging: structured logging (JSON), integration with observability platforms (Datadog, Grafana), and audit-specific storage for compliance.
WHY IT MATTERS:
Logs enable: 1) Debugging - when agent fails, see exactly what happened. 2) Security - detect unusual patterns (e.g., many refund calls). 3) Compliance - prove what actions were taken and why. 4) Improvement - analyze tool usage to optimize. Without logging, you're blind. For regulated industries, audit logs are legally required.
EXAMPLE:
E-commerce agent logs refund tool call: {timestamp: '2024-03-15T10:30:00Z', user: 'john@example.com', session: 'abc123', tool: 'refund_order', params: {order_id: '12345', amount: 50}, result: {status: 'success', refund_id: 'ref_678'}, duration: 1200ms}. Later, customer disputes refund. Log proves refund was processed. Security team queries logs, finds unusual pattern from another user. This visibility is essential for production.
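A sketch of a wrapper that records those audit fields for every tool call; the stub refund tool and field values are illustrative.

```python
import time

LOG = []

def logged(tool_name, func):
    """Wrap a tool so every call records the audit fields listed above."""
    def wrapper(user, session, **params):
        start = time.perf_counter()
        try:
            result, status = func(**params), "success"
        except Exception as err:
            result, status = str(err), "error"
        LOG.append({
            "timestamp": time.time(),
            "user": user,
            "session": session,
            "tool": tool_name,
            "params": params,  # redact PII here before persisting
            "result": result,
            "status": status,
            "duration_ms": round((time.perf_counter() - start) * 1000),
        })
        return result
    return wrapper

# Stub refund tool wrapped with logging.
refund = logged("refund_order", lambda order_id, amount: {"status": "success", "refund_id": "ref_678"})
refund("john@example.com", "abc123", order_id="12345", amount=50)
```

In production the append would go to structured logging or an observability platform rather than an in-memory list.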
QUESTION 16
What is the difference between deterministic and non-deterministic tools?
DEFINITION:
Deterministic tools always produce the same output given the same input (e.g., calculator, database lookup of static data). Non-deterministic tools may produce different outputs for identical inputs due to external factors (e.g., web search, real-time APIs). This distinction affects agent reliability and testing.
HOW IT WORKS:
Deterministic tools: calculator (2+2 always 4), database lookup of static reference data, code execution with fixed inputs. Results predictable, easy to test, agents can rely on consistency. Non-deterministic: web search (results change over time, different order), weather API (changes), stock price (changes), recommendation engines (personalized). Results vary, making testing harder and agent behavior less predictable.
WHY IT MATTERS:
For testing and reliability, deterministic tools are easier to work with. You can mock them with fixed responses. Non-deterministic tools require strategies: caching results for reproducibility, designing agents to handle variability, and more extensive testing. Agents must be robust to changing results - what worked yesterday may not today. Understanding tool determinism helps in design and testing.
EXAMPLE:
Travel agent uses deterministic tool calculate_tax(price, location) (always same for given inputs) and non-deterministic search_flights(date, route) (prices change). For testing, mock search_flights with fixed results to test agent logic. In production, agent must handle price changes - if flight price increased, it should inform user. This requires different handling than deterministic tools. Distinction matters.
QUESTION 17
How do you handle tools that return very large outputs?
DEFINITION:
Tools that return large outputs (e.g., full documents, search results with many snippets, database query results) can overflow context windows or overwhelm the agent's ability to process. Strategies are needed to handle large outputs without losing information.
HOW IT WORKS:
Approaches: 1) Summarization - use LLM to summarize large output before returning to agent. 2) Truncation - return only first N results with option to request more. 3) Chunking - split large output into multiple messages, agent processes iteratively. 4) Extraction - extract only relevant parts based on query. 5) Metadata + detail - return summary with metadata, allow agent to request specific items. 6) Streaming - stream results incrementally. 7) External storage - store large output externally, give agent reference to retrieve later.
WHY IT MATTERS:
Large outputs can break agents by exceeding context limits or causing the agent to miss important information. If a search returns 100 results, agent can't process all. Good handling ensures agent gets the information it needs without being overwhelmed. This is especially important for research agents dealing with large documents.
EXAMPLE:
Research agent calls search_papers('LLM reasoning') which returns 1000 papers. Bad handling: return all 1000 - agent overwhelmed, context overflow. Good handling: return summary (top 10 papers with abstracts) plus metadata (total count, categories). Agent can then call get_paper_details(paper_ids) for specific papers. This tiered approach provides manageable information while preserving access to full detail.
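The tiered summary-plus-metadata approach can be sketched as below; the paper-search stub is illustrative, and get_paper_details is the hypothetical follow-up tool from the example.

```python
def search_papers(query):
    # Stub returning far more results than fit in a context window.
    return [{"id": i, "title": f"Paper {i}"} for i in range(1000)]

def search_papers_summary(query, top_n=10):
    """Return a manageable slice plus metadata; detail stays behind a follow-up tool call."""
    results = search_papers(query)
    return {
        "total": len(results),
        "top": results[:top_n],
        "note": f"showing {top_n} of {len(results)}; call get_paper_details(ids) for full records",
    }

summary = search_papers_summary("LLM reasoning")
```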
QUESTION 18
What is Anthropic's tool use (function calling) API and how does it work?
DEFINITION:
Anthropic's tool use API allows Claude models to interact with external tools by generating structured JSON outputs that specify tool calls. It's similar to OpenAI's function calling but with Claude's unique approach to prompting and response format.
HOW IT WORKS:
In Anthropic's API, tools are defined in the request with tools parameter, similar to OpenAI. Each tool has name, description, and input_schema (JSON Schema). During generation, Claude may output a tool_use content block containing name and input. The API doesn't execute tools - it returns the tool call to the client, which executes and sends back results with a tool_result block. Claude then continues with the result. This gives developers full control over execution. Claude also supports parallel tool calls and can interleave tool use with text.
WHY IT MATTERS:
Anthropic's implementation emphasizes developer control and clear separation of concerns. The explicit tool_use and tool_result blocks make it easy to parse and manage tool interactions. Claude's strong reasoning capabilities make it particularly good at deciding when and how to use tools. For developers building on Claude, this API provides a robust foundation for agents.
📝 EXAMPLE:
Request to Claude with tools: tools=[{name: get_weather, description: Get weather, input_schema: {type: object, properties: {location: {type: string}}}}]. User: 'Weather in Paris?' Claude responds with tool_use block: {type: tool_use, name: get_weather, input: {location: Paris}}. Client executes tool, returns tool_result block. Claude then says 'The weather in Paris is 15°C and sunny.' This clean protocol makes tool use reliable.
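A minimal sketch of the client side of this protocol, using plain dicts in the shapes the Messages API expects. The model response and the weather lookup are simulated here (a real integration would call the API, e.g. via the anthropic SDK, instead of hard-coding `assistant_content`):

```python
# Tool definition in the shape Anthropic's tools parameter expects.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a location.",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

# Simulated assistant response containing a tool_use content block.
assistant_content = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"location": "Paris"}},
]

def execute_tool(name, tool_input):
    """Hypothetical local implementation; the API never executes tools itself."""
    if name == "get_weather":
        return f"15°C and sunny in {tool_input['location']}"
    raise ValueError(f"unknown tool: {name}")

# The client executes each tool_use block and sends back tool_result blocks,
# matched to the original call via tool_use_id, for Claude to continue from.
tool_results = [
    {"type": "tool_result", "tool_use_id": block["id"],
     "content": execute_tool(block["name"], block["input"])}
    for block in assistant_content if block["type"] == "tool_use"
]
```

Note how the `tool_use_id` links each result to the call that produced it, which is what makes parallel tool calls unambiguous.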
QUESTION 19
How do you evaluate the tool selection accuracy of an agent?
📖 DEFINITION:
Tool selection accuracy measures how often an agent chooses the right tool (or decides not to use any) for a given query, with correct parameters. This is a key metric for agent performance, separate from final answer correctness.
⚙️ HOW IT WORKS:
Evaluation approaches: 1) Golden dataset - create test queries with expected tool calls (tool name, parameters). Run the agent and compare actual tool calls to expected. Metrics: precision (when the agent calls a tool, is it the right one?), recall (when a tool is needed, does the agent call it?), parameter accuracy (are the argument values correct?). 2) Trajectory evaluation - for multi-step tasks, evaluate the sequence of tool calls. 3) Outcome-based - if the final answer is correct, tool selection was likely correct, but not always. 4) Human evaluation - review tool call logs for appropriateness. 5) Adversarial testing - test edge cases where tool selection is ambiguous.
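The golden-dataset approach (1) can be scored with a short function. This is a minimal sketch; the metric definitions are simplified (per-query accuracy rather than full precision/recall) and the scorer interface is an assumption:

```python
def score_tool_selection(golden, actual):
    """Compare expected vs. actual tool calls, one entry per test query.

    Each entry is a (tool_name, params) tuple, or None meaning
    "no tool call expected" / "no tool call made".
    """
    tool_correct = param_correct = 0
    for expected, got in zip(golden, actual):
        exp_name = expected[0] if expected else None
        got_name = got[0] if got else None
        if exp_name == got_name:
            tool_correct += 1
            # Parameter accuracy only counts when the right tool (or no tool) was chosen.
            if expected is None or expected[1] == got[1]:
                param_correct += 1
    n = len(golden)
    return {"tool_accuracy": tool_correct / n,
            "param_accuracy": param_correct / n}

golden = [None,                                    # "25 * 4" needs no tool
          ("weather", {"location": "Paris"})]      # weather query
actual = [("calculator", {"expression": "25*4"}),  # unnecessary call
          ("weather", {"location": "France"})]     # right tool, wrong parameter
scores = score_tool_selection(golden, actual)      # 0.5 tool accuracy, 0.0 param accuracy
```

Separating tool accuracy from parameter accuracy is what makes the diagnosis actionable: the first failure suggests the agent over-triggers tools, the second points at description or schema problems.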
💡 WHY IT MATTERS:
Poor tool selection leads to wrong answers, wasted API calls, and user frustration. An agent that frequently calls the wrong tools is unreliable. Measuring tool selection accuracy helps identify problems: is the agent calling tools too often (spam) or not often enough (missing needed information)? It guides prompt improvement, tool description refinement, and model selection.
📝 EXAMPLE:
Test query: 'What's 25 * 4?' Expected: no tool call (simple arithmetic). Agent calls calculator tool (unnecessary). Tool selection accuracy penalizes this. Another: 'What's the weather in Paris?' Expected: call weather tool with location='Paris'. Agent calls with location='France' (wrong parameter). Parameter accuracy penalizes. These metrics reveal specific failures, enabling targeted fixes.
QUESTION 20
How would you design a customer support agent with three tools: knowledge base search, order lookup, and ticket creation?
📖 DEFINITION:
Designing a customer support agent involves defining tools that map to common support actions, creating clear tool descriptions, and implementing an agent loop that can handle multi-step interactions while maintaining context and escalating appropriately.
⚙️ HOW IT WORKS:
Tool design: 1) search_kb(query) - searches support articles, returns relevant snippets with sources. Description: 'Search the knowledge base for articles matching the query.' 2) lookup_order(order_id) - retrieves order status, items, tracking. Requires authentication. Description: 'Get details for a specific order. Use only when the user provides an order ID.' 3) create_ticket(issue_type, description, optional order_id) - creates a support ticket, returns a ticket number. Description: 'Create a new support ticket when the issue cannot be resolved.' Agent prompt: 'You are a helpful support agent. First, try to answer using the knowledge base. If the user asks about a specific order, use order lookup. If you cannot resolve the issue, create a ticket.' Include guardrails: verify order ID format, confirm before creating a ticket.
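The three tool definitions might be declared as JSON Schemas like this. The schemas follow the design above, but the exact field names and the guardrail (a five-digit order ID format) are assumptions for illustration:

```python
import re

SUPPORT_TOOLS = [
    {"name": "search_kb",
     "description": "Search the knowledge base for articles matching the query.",
     "input_schema": {"type": "object",
                      "properties": {"query": {"type": "string"}},
                      "required": ["query"]}},
    {"name": "lookup_order",
     "description": "Get details for a specific order. Use only when the user provides an order ID.",
     "input_schema": {"type": "object",
                      "properties": {"order_id": {"type": "string"}},
                      "required": ["order_id"]}},
    {"name": "create_ticket",
     "description": "Create a new support ticket when the issue cannot be resolved.",
     "input_schema": {"type": "object",
                      "properties": {"issue_type": {"type": "string"},
                                     "description": {"type": "string"},
                                     "order_id": {"type": "string"}},  # optional
                      "required": ["issue_type", "description"]}},
]

def valid_order_id(order_id):
    """Guardrail: check the ID format before ever calling lookup_order."""
    return bool(re.fullmatch(r"\d{5}", order_id))  # assumed 5-digit format
```

Validating the order ID client-side before dispatching the tool call is one concrete way to implement the privacy guardrail: malformed or fabricated IDs never reach the order system.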
💡 WHY IT MATTERS:
A well-designed support agent can handle most queries automatically, reducing human workload. Clear tool descriptions ensure the agent uses tools appropriately. Order lookup requires careful handling (privacy) - use it only when the user provides an ID. Ticket creation should be a last resort. This design balances automation with safety.
📝 EXAMPLE:
User: 'My order #12345 hasn't arrived.' Agent: calls lookup_order(12345) → shows 'delivered yesterday'. Agent: says 'Order 12345 was delivered yesterday. Let me help you locate it.' User: 'I still don't have it.' Agent: calls search_kb('missing package') → finds article about lost packages. Agent: provides steps. If still unresolved, calls create_ticket with issue type 'missing package' and the order ID. This flow resolves most issues automatically and escalates only when needed. Good design makes this possible.