Do speech behaviours related to confidence and uncertainty vary between men and women?
Among all species on Earth, humans are unique in communicating through a symbolic system, i.e., spoken and written language [1]. This highly sophisticated language lets humans communicate in a very precise and complex manner. Still, communicative speech acts seem to differ between genders. One of the major reported differences between women’s and men’s speech is that men have been found to dominate conversations through interruptions and overlaps [2]. Additionally, men tend to use strong expletives, while women tend to use more polite alternatives.
In this project, we investigate how speech varies with gender, and how social norms and language use differ between genders. We hypothesise that men and women have different speech behaviours, and in particular that women express more uncertainty (doubt) when they talk. For example, we expect a woman to say “I expect this to do that” while a man would rather say “I know this does that”. Our aim is therefore to analyse whether there is a real difference between genders and, if so, how large it is.
We are interested in using this dataset to answer the following question:
To answer this question, we'll go through the following points:
In the following, we analyse data from Quotebank, an open corpus of 178 million speaker-attributed quotations from 2008 to 2020. In this project, we focus only on the most recent quotations, from 2015 to 2020. We combine this dataset with speakers’ information from Wikidata, a collaboratively edited open source knowledge base.
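As a rough illustration of this selection and combination step, here is a minimal sketch in plain Python. It is not the project's actual pipeline. The field names (`quotation`, `qids`, `date`), the date format, the toy records and the helper `select_and_join` are all assumptions made for the example. The choice to skip quotations whose speaker maps to several Wikidata identifiers is also an assumption.

```python
from datetime import datetime

# Toy quotation records. The field names and date format are assumptions
# made for this sketch, not a guaranteed description of the real files.
quotes = [
    {"quotation": "I think this might work.", "qids": ["Q1"],
     "date": "2016-05-02 00:00:00"},
    {"quotation": "We know this works.", "qids": ["Q2"],
     "date": "2012-01-15 00:00:00"},
    {"quotation": "It will probably rain.", "qids": ["Q1", "Q3"],
     "date": "2019-11-30 00:00:00"},
]

# Toy speaker attributes keyed by a (hypothetical) Wikidata identifier.
speakers = {"Q1": {"gender": "female"}, "Q2": {"gender": "male"}}


def select_and_join(quotes, speakers, first_year=2015, last_year=2020):
    """Keep quotations in the year range and attach speaker attributes."""
    rows = []
    for quote in quotes:
        year = datetime.strptime(quote["date"], "%Y-%m-%d %H:%M:%S").year
        if not first_year <= year <= last_year:
            continue  # outside the period of interest
        if len(quote["qids"]) != 1:
            continue  # assumption: skip ambiguous speaker attributions
        info = speakers.get(quote["qids"][0])
        if info is None:
            continue  # no speaker information available
        rows.append({**quote, **info, "year": year})
    return rows


joined = select_and_join(quotes, speakers)
```

With the toy data above, only the 2016 quotation survives: the 2012 one falls outside the period and the 2019 one has two candidate speakers.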
To get a general overview of the speakers’ occupations, we focus on four main professional fields: arts, science, economy and politics. Speakers are grouped by profession within each field. Additionally, to assess the roles of nationality, religion and education in a possible cultural gender difference in communicative acts, we selected a general data frame with no condition on the profession.
To analyse speech uncertainty, we adapted an existing uncertainty detection classifier [3] that uses 6 features. This classifier is an automatic machine learning method for detecting uncertainty in natural language. Uncertainty is signalled by speculative verbs (like suggest or presume), adjectives and adverbs (like probably or possibly), auxiliary verbs (like must or should), or certain tenses and moods (subjunctive, conditional).
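To make the cue categories concrete, here is a small lexicon-based sketch in Python. It is deliberately much simpler than the machine learning classifier used in the project and is not that classifier: it only looks for the example words listed above. It does not attempt to detect tense or mood. The names `CUES`, `find_cues` and `is_uncertain` are hypothetical.

```python
import re

# Tiny cue lexicon built only from the example words given in the text.
# A real classifier would rely on learned features, not a fixed word list.
CUES = {
    "speculative_verb": {"suggest", "presume"},
    "adjective_adverb": {"probably", "possibly"},
    "auxiliary_verb": {"must", "should"},
}


def find_cues(quotation):
    """Return the uncertainty cues found in a quotation, by category."""
    tokens = re.findall(r"[a-z']+", quotation.lower())
    hits = {}
    for category, words in CUES.items():
        matched = [token for token in tokens if token in words]
        if matched:
            hits[category] = matched
    return hits


def is_uncertain(quotation):
    """Flag a quotation as uncertain if it contains at least one cue."""
    return bool(find_cues(quotation))


example = find_cues("This will probably work, I presume.")
```

Such a word list misses inflected forms and context (for instance, "must" can express obligation and not doubt), which is one reason a trained classifier is preferable for the actual analysis.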
Before starting to investigate our research questions, let's have a look at what our dataset looks like.