Risks to beware of if you use LLMs for research

A recent Substack article about using AI tools for research, whether by college students or by professors and PIs. It focuses on limitations like selection bias and questionable sources.

Is “deep research” mode going to save us?

Excerpt:

Human brains are amazing but imperfect machines, capable of sifting through massive amounts of input and synthesizing from that (and all our past experience) a flawed but good-enough simulation of the world around us. Our experience is a controlled hallucination, so to speak, but one that tracks the world well enough to keep us alive. 

Really, your brain is a storyteller, and a damn convincing one. We hear, for example, that consumption of nuts is consistently associated with lower mortality, and it’s nearly impossible to avoid jumping to the conclusion that nuts are good for us, that the one variable is causing the other (even if we know better at the conscious level and can easily see how the mortality disparity might be a byproduct of something else entirely about the type of people who tend to include nuts in their diet).
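To make the confounding concrete, here is a minimal Python sketch. All the numbers and the hidden “health-consciousness” variable are invented for illustration (none of this comes from the nut studies themselves): the confounder drives both nut consumption and mortality risk, nuts have zero causal effect by construction, and a sizable correlation shows up anyway.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: overall health-consciousness.
# Invented purely to illustrate how a lurking variable works.
health_conscious = rng.normal(size=n)

# Nut consumption is driven by health-consciousness plus noise...
nut_intake = 0.8 * health_conscious + rng.normal(size=n)

# ...and mortality risk is driven by health-consciousness alone:
# nuts have ZERO direct causal effect in this toy world.
mortality_risk = -0.8 * health_conscious + rng.normal(size=n)

# Yet nuts and mortality are clearly negatively correlated,
# exactly the pattern our storytelling brain reads as "nuts are good for us".
print(np.corrcoef(nut_intake, mortality_risk)[0, 1])  # ≈ -0.39
```

A naive read of that output tells the tidy causal story, even though in this toy world the nuts do nothing at all.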

We see causation everywhere — even where it’s illusory — because that provides a coherent story to make sense of the data we’re faced with¹. Our brain is amazing at bringing together disparate pieces of information and finding — or making up — a pattern that connects them. This is partly why we see patterns in clouds and faces in inanimate objects (i.e. pareidolia) or other forms of apophenia where our brain imposes patterns on things even when those patterns aren’t real (Brugger, 2001).

Unfortunately, as Daniel Kahneman points out in Thinking, Fast and Slow (2011), this means our brain is happy to accept neat and tidy stories even when they aren’t true. It’s much harder — literally, more effortful — to engage the deeper, slower reasoning processes that consider counterfactuals and alternative explanations, that remind us correlation doesn’t necessarily imply causation. It’s also harder to carefully weigh evidence and to consider the quality of information instead of just the quantity or how easily it comes to mind².

WYSIATI

And there’s a more pernicious issue. Our brain is so good at telling a story — finding a pattern amongst all the information and raw data we’re inundated with — that it does so automatically and without taking into account that the information at hand is always incomplete and partial. As Kahneman puts it, the brain typically operates on the principle of “what you see is all there is” (WYSIATI). We don’t consider what we don’t know, what information isn’t present or accessible but which would be incredibly relevant³.

We draw conclusions from partial information, as one must of course, but we do so without taking into account the missing pieces of the puzzle. One common form of this is survivorship bias: we make judgments based on the success stories that are visible and neglect all the unseen cases that didn’t succeed. The lottery winners are visible and salient, but we don’t get nearly as much exposure to each and every person who bought a lottery ticket and didn’t win. We see the businesses that succeed, but spend much less time contemplating all the businesses that fail, so we end up with a skewed intuition of our odds if we were to start a new business. We dissect the origin stories of billionaire college dropouts like Gates, Jobs, and Zuckerberg, but don’t focus on the countless people who made similar choices and didn’t succeed. Thanks to survivorship bias, we end up with a very skewed picture of how the world works: compelling, but not necessarily accurate, stories.
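Here is a minimal sketch of how conditioning on survivors skews the numbers. The survival rate and payoffs are made up for illustration: averaging over only the visible winners overstates the expected return tenfold.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000  # hypothetical population of new ventures

# Assume (numbers invented for illustration) 10% of ventures survive;
# survivors return ~5x the stake on average, failures lose everything.
survived = rng.random(n) < 0.10
returns = np.where(survived, rng.exponential(5.0, size=n), 0.0)

# Judging only the visible successes wildly inflates the apparent payoff:
print(f"mean return, survivors only: {returns[survived].mean():.2f}x")  # ≈ 5.0x
print(f"mean return, everyone:       {returns.mean():.2f}x")            # ≈ 0.5x
```

The survivors’ 5x average is the only number we ever “see” in the wild; the 0.5x average over everyone who tried is the one that actually describes our odds.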

[Continue the full article here, free]

