Can machine learning predict your gender from your dating priorities?

How a Random Forest Can Predict Heterosexual Gender Differences in Dating

Serina Grill
7 min read · Oct 24, 2019

Dating services were once the only alternative to meeting someone by chance. You had to actually run into a person at the grocery store or, God forbid, the DMV waiting room. Then, in the late nineties, a magical, expeditious, and ever-so-faintly-Russian-roulette-like method of mate selection was invented: speed dating, which let dozens of people at a time assembly-line their love lives by meeting ten or even twenty prospects in just a few hours.

Despite the advent of online dating apps like Tinder, where you don't need to fill out any kind of self-assessment and you can have love (or what have you) at your house as fast as you can say "INTJ seeks reasonably good time," we will set our sights on the much more wholesome-seeming seas of the speed-dating survey. The question is: can insights be gleaned from an assessment taken by college-educated twenty-somethings about to attend a speed-dating event?

The Experiment

The study examined whether two heterosexual people, after meeting each other during the event, would want to meet again for a date (indicated by a decision of 'yes' or 'no'). The data includes many features, from income, race, and gender to self-image and perceptions of the opposite sex.

The Data

When looking to build a predictive model from a dataset, it's important to recognize whether your data is balanced. 58% of decisions at the end of the speed-dating event were a 'no,' so any useful model had to beat that majority-class baseline. While that isn't difficult to improve upon, I began to wonder why I cared about their answers at all. Can we really generalize from these college-aged Columbia University students to the broader population? Not in the way I wanted, I realized. Sally might want to go on a date with John, but where in the data do we see the reasons for this? I pored over the data at length, hoping some feature might pinpoint them. I became paranoid, delirious: "She rated him a 10 in intelligence during the event, but maybe that was just because she thought he was attractive! What did this John even look like? Oh, he's from Wisconsin?!"
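The class-balance check above can be done in a couple of lines of pandas. A minimal sketch on a toy stand-in for the decision column (the column name `dec` and the 0/1 coding are assumptions, not necessarily the real dataset's names):

```python
import pandas as pd

# Toy stand-in for the post-event decisions: 1 = 'yes', 0 = 'no'.
df = pd.DataFrame({"dec": [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]})

# Majority-class baseline: a model that always predicts the most
# common class is right this often, so any real model must beat it.
baseline = df["dec"].value_counts(normalize=True).max()
print(f"majority-class baseline accuracy: {baseline:.0%}")
```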

The features were interrelated, confounding even, and the utter romantic in me sighed in dismay as none seemed to capture the spark, the koi no yokan, or whatever it was, that drew two strangers to check that ‘yes’ in the decision box.

Well, I couldn't trust the participants' ratings of their partners' individual traits to have come from independent conclusions (they didn't), nor to have been carefully, consciously considered with the foresight that one day an amateur data scientist would sweat buckets in the wee hours deliberating over their across-the-board ratings of 7 (she was going to say 8 on 'ambitiousness,' you know). But could I instead trust that the collective population of the experiment might betray some underlying differences according to gender? After all, it was a heterosexual speed-dating experiment with essentially equal observations in both 'classes' of gender. A perfect target! So perfect that for the first week of looking at the data I was certain my model was overfit (imagine a scatterplot with a line that tries to pass through every single dot). It turns out, stark differences exist in how men and women prioritize 'dateable' traits.

The Model

A Random Forest Classifier seemed the perfect algorithm with which to predict our unholy grail of a target: male or female.

‘You can’t trust the individual but the crowd seems alright.’

That about describes both the algorithm and the dataset, funnily enough. A Random Forest Classifier draws many small bootstrap samples from the dataset at large. Each becomes a tree, or 'estimator,' which works through its sample, making a series of decisions, and arrives at a prediction. Some trees will be wrong, but the majority vote across all of them usually lands on a pretty good answer. In the same way, I dearly hoped that the observations of the dataset population as a whole would be insightful, even if no individual rating could be trusted.
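The majority vote is not a metaphor; you can poll the trees yourself. A minimal sketch on synthetic data (the features here are random stand-ins, not the survey's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data standing in for the survey features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Each of the 100 'estimators' is a tree fit on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Reproduce the forest's answer by asking every tree individually
# and taking the majority vote across their predictions.
votes = np.array([tree.predict(X[:5]) for tree in forest.estimators_])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print(majority, forest.predict(X[:5]))
```

With fully grown trees, averaging each tree's (pure) leaf probabilities is the same as counting hard votes, so the two lines of output agree.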

Data Processing

Using Python, I loaded and began to sift through the mere and dear 197 columns, or 'features,' of the dataset, encountering a high number of missing values (called 'nulls'). There was leakage everywhere. Leakage refers to features recorded during or after the event, which would not be available when making a real prediction beforehand. And truly, all I wanted to know was: beforehand, what were these people thinking? I didn't want to know how they were feeling during the event, or three weeks after. Thus, only 12 features truly mattered to me in that moment, all drawn from the pre-event survey questions.

After many tears and much bloodshed, the truly relevant features could finally shine through.

All hail cleaned data.
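The cleaning boils down to two moves: drop the leaky columns, then deal with the nulls. A minimal sketch with made-up column names (the real dataset's names differ):

```python
import numpy as np
import pandas as pd

# Toy stand-in; these column names are invented for illustration.
df = pd.DataFrame({
    "attr_importance":   [8.0, np.nan, 6.0],  # pre-event survey answer
    "intel_importance":  [9.0, 7.0, np.nan],  # pre-event survey answer
    "like_during_event": [7.0, 5.0, 9.0],     # recorded DURING the event: leakage
})

# Drop anything recorded during or after the event...
df = df.drop(columns=["like_during_event"])

# ...then handle the remaining nulls, e.g. with a column median.
df = df.fillna(df.median())
```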

With my clean data in hand, I was ready for predictive modeling à la Random Forest Classifier.

If only all decision-making came down to having 100 less-informed versions of you making one collectively good decision. Ah, simplicity.

We can see that decisions are made by thresholds: at each node, samples are split into smaller and smaller subsets until every observation lands in a class.
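You can print one such tree's thresholds directly. A tiny tree on a stock dataset (iris, not the speed-dating data), purely to show the mechanics of compare-and-route:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Each node compares one feature to a threshold and sends the sample
# left or right until it reaches a leaf, i.e. a class.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["sep_len", "sep_wid",
                                       "pet_len", "pet_wid"]))
```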

The accuracy of this model with no tuning of its hyperparameters (its specifications) is around 95%; with tuning, 97–99%. Looking at its permutation feature importances let me gauge the effect of individual features on the model, essentially via their absence: shuffle a feature's values and see how much performance drops.
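The shuffle-and-measure idea is available off the shelf in scikit-learn. A sketch on synthetic data, assuming six features like the cleaned dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle one feature at a time on held-out data and measure how much
# accuracy drops: the bigger the drop, the more the model leaned on it.
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```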

Surprise, surprise.

Case closed. Attractiveness is, shockingly, the most important trait of the six. Without knowing how important attractiveness is to a person, our model would perform much, much worse at predicting their gender (if you're shifting uncomfortably in your seat, you know what's coming).

Insights in Gender Differences

Finally, what we’ve all been waiting for!

Insights that’ll whisk you away

That's right: the ol' box-and-whisker plot. The median (the most central value of the ratings) is marked by the line running horizontally through each red or blue box, and the box itself spans the middle 50% of the data for that feature, by gender. Surprisingly, this perhaps old-fashioned, un-flashy little ditty sang a rather interesting tune.
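For readers who want to reproduce the picture, a minimal sketch with made-up ratings for a single trait (the values and the two-group split are invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Fabricated importance ratings for one trait, split by gender.
rng = np.random.default_rng(0)
men = rng.normal(25, 5, 100)
women = rng.normal(18, 5, 100)

fig, ax = plt.subplots()
# Each box spans the middle 50% of ratings; the inner line is the median.
ax.boxplot([men, women])
ax.set_xticklabels(["men", "women"])
ax.set_ylabel("importance rating")
fig.savefig("trait_boxplot.png")
```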

The interest lies in the difference between women's own preferences for dateable traits and women's expectations of what men considered the most dateable traits. And vice versa.

  1. The men and women of the dataset were pretty accurate in their assessments of the opposite sex regarding the importance of ambition, shared interests, and 'fun.'
  2. Both genders gave relatively high priority to intelligence. Yet each thought the opposite sex cared less about intelligence than they themselves did, on average.
  3. Men assumed women cared as much about attractiveness as they did. Apparently, the women did not. Women, on the other hand, seemed to have a rough idea of the dog-and-pony show to be had at this dubious 'speed-dating event,' yet still overestimated, by an even wider margin, how important attractiveness was to the men in the study (crazy, I know).
  4. Finally, when it came to sincerity, women slightly underestimated its importance to men. Perhaps they had placed all their bets on that oh-so-seductive attractiveness preference. We'll never know.

Lessons

  1. Data cleaning, while thrilling, will in fact take 80% of your energy and time as a sacrifice before it allows you to come crawling back on your hands and knees offering various learning models for it to try on that it will probably hate.
  2. A dataset's relevance to you for exploration may not be at all what either you or the creators of the dataset intended. And that's okay. In fact, it's likely. Time-wise, plan accordingly (note to self).

Future Project

Plotly Dash, I’m looking at you. Together we’ll create the best heterosexual-male/female-opposite-sex-dating-priority predicting application out there on the market today.
