I’m blessed (or maybe cursed) with the genetics that make me an over-thinker. I’m grateful that I can share that with someone (anyone), so thank you for reading.
For years, people have been studying AI—a giant bucket where we toss everything related to computers seeming to act like humans. NPCs from old video games? AI. Chess engines? AI. Image classification? Also AI. Machine Learning (ML), which predicts outcomes or classifies data based on features, falls under this umbrella too.
But in 2022, everything changed. The mainstream introduction of generative text LLMs—like ChatGPT—marked a turning point. It reshaped how we work, how we think, and how we interact with information¹. Whether you see that as good or bad is beside the point—it’s simply a fact. We’re living through the Generative Revolution.
To truly leverage these tools, we need to understand what they are—and what they’re not. I’m not a biologist, mathematician, statistician, or ML/AI/LLM expert. But to grasp how these systems work, we need at least a high-level understanding of those domains. That ability to abstract across disciplines is what makes human intelligence so powerful—and why we’re capable of building tools like these in the first place.
We’ll use a reductionist lens to look at generative LLMs.
At their core, they’re statistical probability machines. Imagine you take the entirety of written knowledge and reduce it to the probability of one word following another. Now, feed the word “The” into this machine. Based on learned probabilities, out comes “quick.” You append that word and feed “The quick” back in. The process repeats until a coherent statement emerges.
But how does the machine know when to stop?
There are a few mechanisms:
- End-of-sequence markers: During training, models learn to recognize when a sentence or thought typically ends.
- Emergent behavior: Through sheer scale and pattern learning, LLMs tend to stay on topic and produce contextually relevant outputs—even without explicit topic enforcement.
- External classifiers: In more specialized systems, a classification engine can evaluate whether the output is well-formed or on-topic. For example, it might assess the probability that a sentence is grammatically correct or semantically aligned with the input.
Now, imagine this process happening millions of times per second. You begin to see something that resembles reasoning.
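To make that loop concrete, here’s a toy sketch of word-by-word generation. The vocabulary, the bigram probabilities, and the `<eos>` marker are all invented for illustration; a real LLM works on tokens and billions of learned parameters rather than a hand-written table, but the shape of the loop is the same: sample a next word, append it, and repeat until an end-of-sequence marker appears.

```python
import random

# A toy "language model": for each word, the probability of the next word.
# These bigram probabilities and the <eos> marker are made up purely for illustration.
bigram_probs = {
    "<s>":    {"The": 1.0},
    "The":    {"quick": 0.7, "lazy": 0.3},
    "quick":  {"brown": 0.9, "red": 0.1},
    "red":    {"fox": 1.0},
    "brown":  {"fox": 0.8, "coyote": 0.2},
    "fox":    {"jumps": 1.0},
    "coyote": {"jumps": 1.0},
    "jumps":  {"over": 1.0},
    "over":   {"the": 1.0},
    "the":    {"lazy": 0.6, "sleepy": 0.4},
    "lazy":   {"dog": 0.7, "turtle": 0.3},
    "sleepy": {"dog": 1.0},
    "dog":    {"<eos>": 1.0},
    "turtle": {"<eos>": 1.0},
}

def generate(seed="<s>", max_words=20):
    """Sample one word at a time until an end-of-sequence marker appears."""
    word, sentence = seed, []
    for _ in range(max_words):
        next_words = bigram_probs[word]
        word = random.choices(list(next_words), weights=next_words.values())[0]
        if word == "<eos>":  # the learned "this thought is finished" signal
            break
        sentence.append(word)
    return " ".join(sentence)

print(generate())  # e.g. "The quick brown fox jumps over the lazy dog"
```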
You might ask: If it’s all based on priors, how can Copilot generate something new?
Generating text with an LLM involves a sampling setting called temperature, which controls randomness. A low temperature yields predictable, safe outputs. A higher temperature introduces variation—like generating “The quick brown coyote jumps over the lazy turtle.” It’s inspired by the original, but creatively divergent (and it also completely misses the intent of the original statement).
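Here’s a minimal sketch of what temperature does to a next-word distribution, assuming a made-up probability table: divide the log-probabilities by the temperature, re-normalize, and sample. Low values sharpen the distribution toward the likeliest word; high values flatten it, so unlikely words (the coyotes and turtles) show up more often.

```python
import math
import random

def sample_with_temperature(probs, temperature=1.0):
    """Re-weight a next-word distribution by temperature, then sample from it."""
    words = list(probs)
    # Scale the log-probabilities by 1/temperature, then re-normalize (a softmax).
    logits = [math.log(p) / temperature for p in probs.values()]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    return random.choices(words, weights=weights)[0]

next_word_probs = {"fox": 0.85, "dog": 0.10, "coyote": 0.05}  # invented distribution

print(sample_with_temperature(next_word_probs, temperature=0.2))  # almost always "fox"
print(sample_with_temperature(next_word_probs, temperature=2.0))  # "coyote" shows up far more often
```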
When you combine a generative engine with a classification engine, you get something that starts to resemble intelligence—not because it understands, but because it can produce coherent, relevant, and sometimes novel outputs.
Let’s define “intelligence” as the ability to make decisions and act on them. In humans, two systems drive this process: the Limbic System (often called the “Lizard Brain”) and the Prefrontal Cortex. These systems can work together—like when a gut feeling aligns with a logical conclusion—or they can conflict, like when you open the fridge for a snack but close it because you’re trying to stick to a diet.
The Limbic System handles emotions, subconscious reactions, and automatic behaviors. It’s evolutionarily older and often drives more decisions than we realize. The Prefrontal Cortex, on the other hand, is responsible for rational thought, planning, and reasoning. It’s what allows us to override instinct with logic, weigh consequences, and make deliberate choices.
If you want to witness human learning in its rawest form, look no further than a toddler. One day, my son was playing near the edge of our bed. I warned him, “Don’t play on the side—you’ll fall off.” Moments later, he took a backward tumble and rug-burned the tip of his nose. Tears followed, but so did learning. From that moment on, he approached edges with caution.
This is a textbook example of the limbic system in action. His brain encoded the experience—edges equal danger—and generalized it. I didn’t need to teach him that the edges of chairs, couches, or playground structures were also risky. He abstracted the concept from a single painful event. This ability to generalize from limited input is a hallmark of human intelligence. We use our senses to perceive depth, our experiences to understand gravity, and our cognition to form broad, transferable rules.
Alongside the limbic system, we have the prefrontal cortex, responsible for higher-order reasoning. While the limbic system drives instinctual behavior—primarily focused on survival and reproduction—the prefrontal cortex allows us to override those instincts with logic and analysis.
Take the fear of heights, for example. It’s a common limbic response designed to keep us safe from falling. But this fear doesn’t always align with actual risk. Consider two scenarios: climbing a 3,000-foot rock wall without a rope (high danger), and standing behind a glass window in a tall building (virtually no danger). In both cases, the limbic system may trigger fear. But the prefrontal cortex can step in and reason: “There’s a solid wall and glass between me and the drop. I’m safe.” Once this logic is internalized, the fear response may diminish in similar future situations.
This dynamic—instinct overridden by reason—mirrors the relationship between generative LLMs and classification engines. LLMs generate text based on statistical patterns, while classifiers evaluate whether the output is coherent, relevant, or complete. But here’s the key difference: humans abstract concepts, while LLMs rely on prior examples and labeled data. My son didn’t need to see thousands of windows to understand what a window is. After seeing one, he could identify others—regardless of shape or size—because he grasped the concept: transparent but solid.
So while LLMs can mimic reasoning through pattern recognition and probabilistic modeling, they don’t possess the kind of conceptual generalization that defines human intelligence.
If you’ve made it this far—thank you. It’s important that we understand what these tools are and how they work before jumping to conclusions. LLMs and AI aren’t replacements for human intelligence; they’re companions—tools that can elevate our thinking and act as catalysts for hyper-productivity. But if we begin to treat them as substitutes for thought, we risk becoming less productive than before. Instead of solving problems, we’ll find ourselves buried under a mountain of generated content—noisy, unfocused, and difficult to make sense of.
Since I have the epic opportunity to be at the forefront of this Generative Revolution, I’ve learned a few things that help ensure your Copilot(s) are doing the best they can for you:
- Improve the signal-to-noise ratio
- Trust but Verify
- Beware of Confirmation Bias
- Garbage in – Garbage out
1. Improve the Signal-to-Noise Ratio
LLMs are vastly better than humans at processing large volumes of textual information. You’ll never be able to read as much or as fast as they can. However, if the information you provide is sparse or disconnected—lacking structure or meaningful relationships—the model can start to produce confused or incoherent output.
Why?
Because LLMs don’t “understand” in the human sense. They don’t build mental models or track long-term goals. Instead, they simulate understanding by identifying and matching patterns in the data. When the input lacks clear connections, the model struggles to find those patterns, which can lead to hallucinations or irrelevant responses.
To get the best results, you should aim to provide clean, structured, and context-rich input. Ideally, your data should be grouped or aggregated in a way that reflects how you would logically approach the problem yourself. If you take a single instance of what you’re trying to achieve and organize the information so that it leads you to a clear conclusion, then your Copilot will likely be able to reach that conclusion faster—and often more efficiently—than you could on your own.
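As a rough illustration of “structured and context-rich,” here’s a hypothetical helper that groups a request the way you might reason through it yourself. The `build_prompt` helper, the section headings, and the example goal are all invented; the point is the grouping, not the specific format.

```python
def build_prompt(goal, code_snippets, constraints):
    """Group related context under clear headings so the model sees the same
    structure you would use to reason about the problem yourself."""
    sections = [
        "## Goal\n" + goal,
        "## Relevant code\n" + "\n\n".join(code_snippets),
        "## Constraints\n" + "\n".join(f"- {c}" for c in constraints),
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    goal="Explain why get_user() returns None for deactivated accounts.",
    code_snippets=["def get_user(user_id):\n    ...  # the function under discussion"],
    constraints=["Do not change the public API", "Target Python 3.11"],
)
print(prompt)
```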
2. Trust but Verify
Imagine an LLM as an extremely junior, extremely ambitious developer who just drank 13 Red Bulls and types at 10,000 words per minute. It generates output based on patterns it has seen in code—but without any real understanding of what the code does. It will leap across logical gaps on the faintest hint of an assumption, and by the end, you might be left with something that looks like a working solution.
This is where “Trust but Verify” becomes essential.
Work incrementally. Check the output at each step to ensure it’s heading in the right direction and doing what you expect. Never blindly trust the output. Proofread emails. Compile code often. Validate assumptions with unit tests. The model can be a powerful accelerator, but only if you stay in the driver’s seat.
Also, be mindful of context drift—LLMs can lose track of your intent if prompts get too long or ambiguous. Keep your instructions clear and focused. And when in doubt, ask for explanations or rationale behind the output. If the model can’t justify a decision, it’s a sign you need to dig deeper.
Treat the LLM like a fast, eager junior—not a senior developer. It can help you move faster, but you are still responsible for the quality, correctness, and consequences of the work.
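A small, hedged example of “validate assumptions with unit tests”: suppose the model proposed the date-parsing helper below (the function and its behavior are hypothetical). A couple of quick checks pin down what you expect before you build anything on top of it.

```python
from datetime import date

def parse_iso_date(text: str) -> date:
    """Candidate implementation suggested by the model (hypothetical)."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)

def test_parse_iso_date():
    # Spell out your assumptions instead of trusting the output on faith.
    assert parse_iso_date("2024-02-29") == date(2024, 2, 29)  # handles a leap day
    try:
        parse_iso_date("not-a-date")  # should fail loudly, not guess
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for malformed input")

test_parse_iso_date()
print("all checks passed")
```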
3. Beware of Confirmation Bias
When using LLMs to understand or generate code, it’s easy to fall into the trap of confirmation bias. If your prompt suggests a specific solution or assumption—like “Why is this function the best way to handle errors?”—the model will likely reinforce that idea, even if it’s flawed. LLMs don’t evaluate correctness or challenge assumptions; they generate output based on patterns in the data they’ve seen. This means they can confidently produce code that looks right but is built on shaky logic or misunderstood requirements.
This becomes especially risky when you’re debugging, refactoring, or trying to understand unfamiliar code. If you feed the model a biased interpretation of what the code is doing, it may generate explanations or modifications that align with your misunderstanding, rather than helping you uncover the truth.
To avoid this, here are a few practical strategies (a short sketch of the reframing follows the list):
- Ask open-ended questions: Instead of “Is this the best way to do X?”, try “What are the trade-offs of this approach?” or “What alternatives exist?”
- Request multiple interpretations: Ask the model to explain the code in different ways or from different perspectives. This can help surface edge cases or overlooked logic.
- Use the model to challenge your assumptions: Prompt it with “What might be wrong with this implementation?” or “What are potential bugs or limitations?”
- Cross-check with documentation and runtime behavior: Don’t rely solely on the model’s explanation—run the code, inspect outputs, and consult official sources.
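To make the reframing concrete, here is the same question asked two ways; the function name retry_with_backoff() is hypothetical.

```python
# The leading version nudges the model to agree with you.
leading_prompt = "Why is retry_with_backoff() the best way to handle these errors?"

# The neutral versions invite it to push back.
neutral_prompts = [
    "What are the trade-offs of handling these errors with retry_with_backoff()?",
    "What alternatives to retry_with_backoff() exist, and when would each be preferable?",
    "What might be wrong with this implementation? List potential bugs or limitations.",
]
```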
Remember, LLMs are powerful tools for accelerating understanding, but they’re not infallible. Treat them like a fast, eager junior developer: helpful, but in need of supervision and critical review.
4. Garbage in – Garbage out
When working with code, the quality of your input directly determines the quality of the output. LLMs don’t understand code the way a human developer does—they generate based on patterns they’ve seen. If you feed them poorly structured, outdated, or ambiguous code, they’ll likely replicate those flaws, or worse, build on them in unpredictable ways.
For example, if you paste in a function with unclear variable names, missing context, or inconsistent formatting, the model may make incorrect assumptions or generate code that looks plausible but doesn’t actually solve the problem. It’s not because the model is broken—it’s because it’s working with garbage.
To get useful, accurate output, you need to do the legwork upfront:
- Clean up your code before prompting: Refactor messy logic, remove dead code, and clarify naming.
- Provide context: Include relevant surrounding code, expected inputs/outputs, and any constraints.
- Be specific: Ask for exactly what you need—whether it’s a bug fix, optimization, or explanation.
- Avoid vague or overly broad prompts: Instead of “Make this better,” try “Optimize this loop for readability and performance.”
Think of the LLM as a high-speed collaborator—it can help you move faster, but only if you give it something solid to build on. Garbage in will always lead to garbage out, no matter how powerful the model is.
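To make that concrete, here’s a hypothetical before-and-after. The names and the 8.25% rate are invented, and the behavior is identical; the cleaned-up version simply gives the model (and the next human) far more signal to work with.

```python
# Before: the kind of snippet a Copilot has to guess about.
def f(x, y, z):
    return (x * 0.0825 + y) if z else y

# After: the same logic with descriptive names, a docstring, and the magic
# number given meaning, ready to be pasted into a prompt.
SALES_TAX_RATE = 0.0825  # invented rate, purely for illustration

def shipping_and_tax(subtotal: float, shipping: float, taxable: bool) -> float:
    """Return the shipping fee plus sales tax on the subtotal (if taxable)."""
    return (subtotal * SALES_TAX_RATE + shipping) if taxable else shipping
```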
In conclusion, like the title says: it’s a Copilot—not an Autopilot.
These tools, when used thoughtfully, can exponentially multiply your ability to get things done. They can help you reason faster, explore more ideas, and build solutions with incredible speed. But when used carelessly, they can just as easily amplify false assumptions, reinforce misunderstandings, and generate vast amounts of noise.
To work effectively with LLMs, treat them like a collaborative partner. Simplify your inputs, aggregate relevant context, and structure your data in a way that helps you reason better—because when you reason better, the machine will too. Better yet, use the machine to help you build better tools that improve your own workflows. That’s some inception-level productivity.
As always – thanks for reading, and Happy Copiloting!
“A Copilot can take you farther, faster—but only if you chart the course. Without direction, it’s just speed without purpose.” — Microsoft Copilot
The ideas in this blog post are my own. But I know what you’re thinking… and the answer is yes… I absolutely used Copilot to help me write this post.
1. Information is not data. Data is just raw values—numbers, text, logs—without meaning. It only becomes information when you add context and structure. LLMs don’t work with raw data; they work with language that already carries meaning. If your input lacks clarity or context, the model will struggle to produce useful output and may just generate noise.