How do you handle documents or conversations that exceed an LLM's context window?
Every LLM has a maximum context window measured in tokens: GPT-4o supports 128K tokens, Claude 3.5 Sonnet 200K, and Llama 3.1 128K. Depending on the API or framework, input that exceeds this limit is either truncated (silently losing content) or rejected with an error. Several strategies handle long documents:
| Strategy | How it works | Best for |
|---|---|---|
| RAG / chunk-and-retrieve | Embed chunks, retrieve relevant ones, send only retrieved chunks | Question answering over large corpora |
| Summarise then answer | Recursively summarise document sections, then answer over summary | Summarisation tasks |
| Map-reduce | Run LLM on each chunk independently, combine results | Extraction, classification per chunk |
| Refine | Process first chunk; iteratively update answer with each next chunk | Sequential analysis |
| Rolling window | Slide a context window over the document with overlap | Sequential tasks like translation |
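The map-reduce and refine rows differ mainly in how the LLM calls are wired together: map-reduce makes independent (parallelisable) calls and then one combining call, while refine threads a single evolving answer through the chunks in order. The following is a minimal, framework-free sketch, assuming a hypothetical `llm` callable that maps a prompt string to a response string; the prompts are illustrative and no particular provider or LangChain API is implied.

```python
from typing import Callable, List

# Hypothetical LLM interface: prompt in, model text out.
# In practice this would wrap a real chat-completion call.
LLM = Callable[[str], str]


def map_reduce(chunks: List[str], llm: LLM, question: str) -> str:
    """Map: query each chunk independently. Reduce: merge the partial answers."""
    # Map step: one call per chunk; calls do not depend on each other.
    partials = [llm(f"{question}\n\nText:\n{chunk}") for chunk in chunks]
    # Reduce step: a single call that sees only the partial answers.
    combined = "\n".join(partials)
    return llm(f"Combine these partial answers to '{question}':\n{combined}")


def refine(chunks: List[str], llm: LLM, question: str) -> str:
    """Refine: answer from the first chunk, then update it with each later chunk."""
    if not chunks:
        return ""
    answer = llm(f"{question}\n\nText:\n{chunks[0]}")
    for chunk in chunks[1:]:
        # Each call sees the running answer plus one new chunk, strictly in order.
        answer = llm(
            f"{question}\n\nExisting answer:\n{answer}\n\n"
            f"Refine it using this additional text:\n{chunk}"
        )
    return answer
```

For N chunks, map-reduce costs N + 1 calls (the reduce prompt can itself overflow if N is large, which is why real implementations collapse partials recursively), whereas refine costs N sequential calls that cannot run in parallel.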
```python
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
import tiktoken

llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)

# Load a very long document (PyPDFLoader returns one Document per page)
docs = PyPDFLoader('long_report.pdf').load()

# chunk_size and chunk_overlap are measured in characters by default, not tokens
chunks = RecursiveCharacterTextSplitter(
    chunk_size=4000, chunk_overlap=200
).split_documents(docs)

# ── Map-reduce summarisation
map_reduce_chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',  # 'stuff' | 'map_reduce' | 'refine'
    verbose=True,
)
summary = map_reduce_chain.invoke({'input_documents': chunks})
print(summary['output_text'])

# ── Token counting before API calls (avoid surprises)
def count_tokens(text: str, model: str = 'gpt-4o') -> int:
    enc = tiktoken.encoding_for_model(model)  # gpt-4o maps to the o200k_base encoding
    return len(enc.encode(text))

with open('big_doc.txt', encoding='utf-8') as f:
    content = f.read()

n_tokens = count_tokens(content)
max_ctx = 128_000  # gpt-4o context window (shared with the generated reply)
print(f'{n_tokens} tokens: {"fits" if n_tokens <= max_ctx else "exceeds context"}')
```
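The check above compares only the input against the window. Two further pieces are often needed: headroom for the model's reply, since for most chat APIs the completion shares the context window with the prompt, and a way to split an oversized token sequence for the rolling-window strategy. This is a minimal sketch with hypothetical helper names `fits_in_context` and `rolling_windows`, operating on plain lists of token ids so it does not depend on any particular tokenizer.

```python
from typing import List, Sequence


def fits_in_context(prompt_tokens: int, max_output_tokens: int, max_ctx: int) -> bool:
    """True if the prompt plus the reserved reply budget fit in the window."""
    return prompt_tokens + max_output_tokens <= max_ctx


def rolling_windows(tokens: Sequence[int], window: int, overlap: int) -> List[List[int]]:
    """Split tokens into windows of at most `window` tokens; consecutive
    windows share `overlap` tokens so context carries across boundaries."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap  # how far each new window advances
    windows: List[List[int]] = []
    for start in range(0, len(tokens), step):
        windows.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this window already reaches the end of the sequence
    return windows
```

With tiktoken, `enc.encode(content)` produces the token ids to pass in, and `enc.decode(w)` turns each window `w` back into text for its own API call. A larger overlap improves continuity across boundaries at the cost of re-sending more tokens.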
