This is Science Fictions with Stuart Ritchie, a subscriber-only newsletter from i. If you’d like to get this direct to your inbox, every single week, you can sign up here.

Okay, stand back, I’m going to write about sex differences. Even as I type that, I feel a chill of controversy: everyone has to tread very carefully when talking about the science of such a touchy subject.

One of the big battlefields in the debate over sex differences is the toyshop. People disagree about the effects toys might have on children as they develop, by setting particular expectations about how they should behave or what they should be interested in. For example, there’s a movement called “Let Toys Be Toys”, which pushes for fewer gender stereotypes (and certainly fewer blue-for-boys, pink-for-girls colour schemes) in toy design and marketing.

There’s also a related debate about the innateness of toy preferences. Do boys and girls arrive in the world with inbuilt preferences for specific toys and activities, or do society, parenting, and gender stereotypes push them down a particular track?

Scientists have done a lot of research on toy preferences. A 2020 meta-analysis found seventy-five studies that examined toy preferences in children aged up to 11 years, and it reports some of the largest differences I’ve ever seen in psychology. There’s a lot of variation in the effect sizes, but on average boys prefer vehicle-based toys at a dramatically higher rate than girls do, and the gap for dolls, in girls’ favour, is even larger.

These differences are bigger than the height difference between men and women, which is a huge effect; they’re bigger than any other group difference I’ve seen in any area of psychology (and perhaps many parents will now be saying “well duh, of course we know that girls prefer dolls and boys prefer vehicles!” – but it’s important to have collected actual data on this, rather than just going by anecdote).

Pretty impressive results, then. But you’ll have noticed that they don’t address our question of innateness at all: they only provide evidence that toy preferences exist, not the reason behind them.

Here’s where monkeys come in. In a famous study from 2002, scientists had the idea of giving human toys to monkeys (specifically vervet monkeys) to see which ones the males and females would prefer. The logic was this: if male monkeys showed a big preference for toy trucks and female monkeys preferred dolls, that would be very difficult to explain with a socialisation theory. After all, nobody is socialising monkeys to prefer the “appropriate” toy for their gender: monkeys don’t watch gender-targeted adverts on YouTube.

Sure enough, the monkeys showed sex differences in their preferences: the males spent more time playing with a police car toy; the females preferred a doll.

In turn, this might tell us something about human children. The researchers concluded that their results showed that “sexually differentiated object preferences arose early in human evolution” – in other words, the preference for particular types of toy isn’t something that our modern society has instilled in children. Because it’s also seen in monkeys, with whom we share a common ancestor, it must have been there, programmed into our brains, for millions of years.

Another monkey study from 2008 appeared to support the evolutionary idea, even if the results were a little more complex. Using rhesus macaque monkeys this time, the researchers found that the males had a strong preference for wheeled, vehicle-related toys compared to fluffy plush toys, whereas the female monkeys showed no particular preferences. They agreed with the earlier 2002 paper that this can’t have been socialised, but took a somewhat more moderate line, invoking both nature and nurture to explain toy preferences:

“Toy preferences reflect hormonally influenced behavioral and cognitive biases which are sculpted by social processes into the sex differences seen in monkeys and humans.”

Now, though, a study published last month has thrown a spanner in the works. It points out a big flaw in both of the previous monkey experiments: they tested the monkeys in groups. Monkeys have strong dominance hierarchies, and their rank affects how they behave around other monkeys. If only the more dominant monkeys could play with the toys with impunity while the others were cowed by their lack of social standing, the results wouldn’t necessarily give you an accurate view of the monkeys’ true preferences.

So the authors of the new study did something very simple: they tested monkeys’ preferences when they were interacting with the toys on their own. And here, the results were very different from the previous studies. There were no differences in preferences for the male-stereotyped, wheeled toys. There was a difference in preference for the doll – but it was the male monkeys who played with it more.

Okay, so it seems the story of monkeys sharing an evolved preference with human children is a bit less straightforward than it first appeared.

But I’ve left out one hugely important detail about all three studies: their sample size. The 2002 study included only sixty-three monkeys. The 2008 one had thirty-four. And the new one from 2023? Only fourteen monkeys. These are tiny studies: highly subject to random fluctuations in the data – in other, not-too-different parallel universes, the results could’ve been totally different.

You might say: aren’t you looking for enormous effects here? Surely a small study is actually fine, since the effects in this domain would stick out like a sore thumb? But remember, those are the effects in humans. It’s entirely unclear what sort of effect size we should expect for monkeys – small studies could easily miss out on a subtle, but real, preference for a particular kind of toy.
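To put rough numbers on that worry, here’s a back-of-the-envelope sketch in Python of statistical power: the chance that a study of a given size detects a difference that really exists. It’s an illustration under stated assumptions, not a reanalysis of any of the monkey studies. It assumes a simple comparison of average playing time between males and females, an even male/female split in each study (the actual splits aren’t given here), hypothetical effect sizes expressed as Cohen’s d (0.3 for a subtle preference, 2.0 for a very large one), and a normal approximation to a two-sided test at the conventional 5% threshold.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation: power ~ Phi(d * sqrt(n / 2) - z_crit), where d is a
    standardised effect size (Cohen's d) and n is the size of each group.
    It ignores the tiny chance of a "significant" result in the wrong direction.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = d * (n_per_group / 2) ** 0.5  # expected z-score of the true effect
    return std_normal.cdf(shift - z_crit)


if __name__ == "__main__":
    # Total sample sizes of the three monkey studies, as reported in the text.
    for total in (63, 34, 14):
        n = total // 2  # ASSUMPTION: roughly even male/female split, rounded down
        for d in (0.3, 2.0):  # ASSUMPTION: hypothetical "subtle" and "very large" effects
            print(f"{total} monkeys, d = {d}: power ~ {approx_power(d, n):.2f}")
```

By this rough approximation, a subtle preference (d = 0.3) would be detected well under a quarter of the time even with sixty-three monkeys, whereas a very large one would usually be picked up even with fourteen. The normal approximation flatters small samples somewhat, so the true power of the tiniest studies would, if anything, be lower.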

To me, this whole story illustrates something I find depressingly often when looking at the science behind controversial questions: we simply can’t draw any firm conclusions on the back of any of the data we have available. I don’t mean: “we should always remain at least a little uncertain because that’s how science should be approached in general”. I mean: “we are all fighting over the smallest of evidentiary scraps, and each study gives us effectively no reason to update our views on the matter”.

Without solid data, there’s always a way to keep the debate going, endlessly, with no resolution. Everyone can remain angry at the other side; nobody has to put in any effort, and nobody has to change their mind.

(That’s not to say there isn’t other evidence on this question: for example, some studies have looked at girls with Congenital Adrenal Hyperplasia, a condition that means they get extra testosterone and develop more male-typical features. Girls with this condition do, in at least some studies, show preferences for more male-stereotyped toys than do girls without it. But collecting data from a population who all have a particular medical condition might not give you results that are representative of the population at large, and the studies here—by dint of collecting data on a rare medical condition—are usually pretty small themselves).

It doesn’t have to be this way: some scientific fields have got their act together after years of publishing low-quality research, and started working together to produce bigger, better studies. To give just one example, in the closely related field of developmental psychology, the ManyBabies project helps scientists at many different universities and labs across the world collaborate on important research questions by pooling their samples. It means any given researcher doesn’t have to rely only on the babies they can recruit themselves, and lets them study much larger groups.

Other fields—including that of innate sex differences—should follow suit. People really care about these questions. Tiny studies, inconclusive regardless of what their results show, are not the way to go about answering them.

Other stuff I’ve written recently

At the weekend I tweeted about a very bad-looking study on the effects of “cold plunges” that had been shared by Andrew Huberman, an online health influencer with a massive following. Even the study’s abstract was full of red flags, and I bemoaned the fact that Huberman (who is also a professor at Stanford University) would spread something that looked so flawed. Upon further inspection the study really was as bad as it first appeared, and I wrote about it here, as part of a discussion about cold plunges in general.

And apologies if this is all very tangled and self-referential, but I’ve also put together a summary of the best of my recent writing for the i at this link.

Science link of the week

Stian Westlake has written a great piece for the Guardian’s “Big Idea” series on why governments shouldn’t be afraid to test policies using experiments – even if experiments sometimes make people uncomfortable.

Thanks for reading the Science Fictions newsletter. See you next week – and in the meantime feel free to get in touch: [email protected].

