What happens when an assistant "reads" a file

AI coding assistants work by performing actions on your codebase: reading files, running searches, executing commands. Each of these operations, called tool calls, pulls content into a temporary buffer called the context window. This is not a summary or an index. It is raw content, accumulated across every action in the session.

If you have used ChatGPT, Claude, or Gemini, you already know the basic concept. The context window is the conversation. With a coding assistant, that same conversation also contains the agent's own structured content: file contents it has read, search results, command outputs. This is what gives the model the ability to act on the developer's machine, a build server, or a container environment.

Here is the critical detail: the entire context window is sent to the AI provider's API on every subsequent turn. Not just the new content. Everything. The buffer grows with each action, and the full payload travels with every request.
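The accumulation described above can be sketched in a few lines of Python. The message format and the size accounting are illustrative, not any specific provider's API; the point is only that the full history travels on every request:

```python
# Illustrative sketch of how an agentic session accumulates context.
# Every "send" transmits the FULL message list, not just the newest entry.

messages = []

def add(role, content):
    messages.append({"role": role, "content": content})

def send():
    # A real client would POST the whole list to the provider's API.
    payload = list(messages)  # the complete history, every time
    return sum(len(m["content"]) for m in payload)  # rough payload size

add("user", "Checkout is slow for EU customers. Can you investigate?")
first = send()

# The assistant decides to read a file; the tool result enters the context.
# (Fake secrets for illustration only.)
add("tool", "DB_PASSWORD=hunter2\nSTRIPE_KEY=sk_live_...")
add("user", "Great, open a PR.")
second = send()

assert second > first  # the buffer only grows; everything is re-sent
```

Nothing ever leaves the list. Once a tool result is in `messages`, it rides along on every later request in the session.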

A single session can involve dozens of tool calls. Many happen autonomously, decided by the assistant, not by the developer. Sub-agents can run 20 or 30 tool calls in the background while the developer sees only a one-line summary.

A typical agentic coding session

Consider a fictional but entirely plausible scenario. A developer asks the assistant to investigate slow checkout performance. The assistant decides its own path: list the repo, grep for keywords, read files, run tests. Here is what a typical 13-turn session looks like.

Developer: Checkout is slow for EU customers. Can you investigate?

Agent · tool call: ls -R /home/dev/platform
Tool result: 412 files across 34 folders. Includes /secrets, /customer-exports, .env files.

Agent · tool call: grep "slow|latency|timeout" across entire repo
Tool result · PII exposed: 178 matches returned, including SQL dumps, Slack exports, nginx logs. 312 customer email addresses, 48 real IP addresses, and Stripe customer IDs now in context.

Agent · tool call: Read .env.production. The file is .gitignored, but the agent reads from disk, not from git.
Tool result · secrets exposed: Stripe live key, DB password, AWS credentials, Sentry DSN, JWT signing secret. All production secrets now in context, re-sent on every subsequent turn.

Agent · 5 more tool calls: Read checkout code, payment logic, order queries. Write migration. Run tests. Normal development work. The developer sees none of these calls in detail.

Agent · response: Done. Missing index on orders.user_id. Migration written, tests pass.

Developer: Great, open a PR.

The fix was 2 lines of SQL.

The final API call carried the entire accumulated context: every file listing, every grep match, every secret from .env.production, every customer email address from the database dump. All re-sent in a single request to the provider's API.
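The cost of that re-sending compounds. A rough arithmetic sketch, with an entirely illustrative per-turn size, shows why: if every turn re-sends everything accumulated so far, the total volume transmitted grows quadratically with the number of turns.

```python
# Illustrative numbers only: assume each turn adds ~2,000 tokens of new
# context, and the full accumulated context is re-sent on every turn.
per_turn = 2_000
turns = 13

# Turn t re-sends everything accumulated so far: triangular growth.
total_sent = sum(per_turn * t for t in range(1, turns + 1))

assert total_sent == per_turn * turns * (turns + 1) // 2  # 182,000 tokens
```

A 13-turn session transmits not 13 x 2,000 tokens but 91 x 2,000, and the secrets read on turn 3 are part of every one of the remaining payloads.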

The developer saw a handful of responses. The agent decided what to read on its own.

Same session, five different destinations

Now consider that five developers on your team ran the same kind of session. Same repo, same sensitive data swept into the context. Each developer used a different AI provider, each with a personal API key.

🖥 Local model (Ollama): Context never leaves the machine. No third-party terms, no data transfer.

🇺🇸 Claude (personal API key): Data goes to AWS US-East by default. EU region and training opt-out are available, but only if configured. Did compliance sign off on a personal account?

🇺🇸 ChatGPT Plus: Data goes to Azure US. EU data residency and a DPA require an Enterprise agreement.

🇨🇳 DeepSeek: Data lands in China under PRC jurisdiction. No GDPR-compatible data processing agreement. No credible training opt-out.

🇫🇷 Mistral: Servers in France, GDPR-native, DPA available. Still a third-party API with its own log retention policies.

If you do not know which of these describes your team right now, that is the answer.

Why this matters for your organisation

This is not a hypothetical risk. Coding assistants are already in daily use across development teams. The question is not whether your developers use them. The question is whether you have visibility into what data enters the context and where it goes.

When an assistant reads a file containing customer records, API keys, or internal communications, that content becomes part of every subsequent API call in the session. Depending on the provider and the agreement tier, it may be logged, retained for abuse review, or processed under terms your developer never read.

For organisations subject to GDPR, this matters directly. Personal data sent to a non-EU jurisdiction without appropriate safeguards is a compliance gap, whether the transfer was intentional or not.

Questions worth asking today

  • Which LLM APIs are in use across your development team?
  • Are developers using personal accounts and API keys on company code?
  • Do you have a signed data processing agreement with each provider?
  • Where does the data physically land: EU, US, or somewhere else?
  • Can PII or production secrets reach the context window during routine work?
  • Does your GDPR policy account for agentic tool calls?
  • Do you have guardrails in place that prevent assistants from reading sensitive files?

The solution?

The solution is not to ban coding assistants. They make software development faster. Coding agents are here to stay. The solution is to take ownership of the tooling: manage API keys centrally, choose providers that match your compliance and sustainability requirements, and design processes and practices that prevent sensitive data from entering the context in the first place.
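One such practice is a path-based guardrail: check every file an agent wants to read against a deny-list before the content can enter the context window. The sketch below is illustrative; the patterns and the function name are assumptions, and real assistants expose different mechanisms for this (ignore files, permission prompts, filtering proxies).

```python
import fnmatch

# Hypothetical deny-list; tune the patterns to your own repository layout.
DENY_PATTERNS = [".env*", "*.pem", "secrets/*", "customer-exports/*"]

def is_read_allowed(path: str) -> bool:
    """Return False for any path matching a deny pattern."""
    return not any(fnmatch.fnmatch(path, p) for p in DENY_PATTERNS)

assert not is_read_allowed(".env.production")    # blocked before it leaks
assert is_read_allowed("src/checkout/order.py")  # normal code is fine
```

The check has to run before the tool result is produced. Once the file content is in the context, no later filtering can pull it back out of the requests already sent.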

For many use cases, running LLMs on the company's own network, at a service provider's data centre, or even locally on the developer's own machine is already a realistic option today. When you use agentic development tools backed by an LLM under your own control, the context window and its data do not leak. When using cloud-based LLM APIs, the minimum requirements are EU data residency and clear terms on whether data may be used for training new models, or even retained in logs.
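From the tooling side, the switch can be as small as routing the same chat request to a different endpoint. A minimal sketch, assuming an OpenAI-compatible local server (Ollama exposes one on localhost:11434 by default; the cloud URL here is a placeholder):

```python
# Illustrative endpoint routing: the organisation, not the individual
# developer, decides where requests (and therefore the context) go.
ENDPOINTS = {
    # Stays on the developer's machine; context never crosses the network.
    "local": "http://localhost:11434/v1/chat/completions",
    # Leaves the network; placeholder URL, not a real provider.
    "cloud": "https://api.example-provider.com/v1/chat/completions",
}

def choose_endpoint(policy: str) -> str:
    """Resolve the endpoint from a centrally managed policy."""
    return ENDPOINTS[policy]

assert "localhost" in choose_endpoint("local")
```

Centralising this choice also centralises the API keys, which closes the "personal account on company code" gap from the questions above.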

At Trail Openers, we help organisations build responsible AI practices from the ground up. That includes understanding where data flows, choosing the right architecture, and making sure governance of LLM usage keeps pace with adoption. Want to find out where your team's tokens are going? Get in touch and we will map it out together.