This guide outlines a framework for how to ask questions about gender, and consider wider issues of sex and gender, in survey planning and analysis. It’s an attempt to address the complexity of an issue which has, until recently, been considered straightforward and uncontroversial.
“Nothing about us, without us”
The first questionnaire to ask respondents about their gender was the US Census in 1790, which included categories of “Free White males of 16 years and upward,” “Free White males under 16,” and “Free White females.” For much of the time since then, gender information has been collected in a perfunctory way. Often it’s a simple component of demographic information, but sometimes it’s a precursor to the collection of sex- or gender-based information, for example on pregnancies. Separating the population into two immutable categories of men and women (or, interchangeably, males and females), assigned at birth, has been uncontroversial for most of history. Although transgender, intersex, and other people who don’t fit the boxes so neatly have existed throughout history, they’ve been marginalized. It’s only in recent years that this has changed, bringing with it a need to consider what the existence of gender-variant people implies for how we collect data on individuals. It raises questions about how to collect, analyze and present data in a way that reflects the diversity of the population.
Personal note: I am a transgender woman, assigned male at birth, so I have a personal stake in this. I want to be included in surveys, and I want to be included accurately. I’ve seen a lot of very good and very bad attempts to do this in the last few years. There will be a few personal observations, each marked “Personal note,” which should be interpreted as “Source: me.” I don’t speak for the trans community, or for anyone else other than myself.
Background: the US trans community
The best national estimate of the transgender (or trans) population is from survey data compiled by the Williams Institute. An estimated 1.6 million people aged 13+ in the US describe themselves as transgender. It’s a small minority, but not a tiny one — about 0.6% of the US population. The trans population skews young, at least partly because of stronger social disapproval in earlier years. Older trans people may have made their peace with living as a perceived cisgender person and might not report being transgender. Much like the rest of the LGBTQ community, the trans population is not geographically uniform. Many trans people move to, or are more able to come out as trans in, larger cities, and in states that are legally and culturally more friendly, although this is far from universal. The community is divided roughly equally among transgender women, transgender men, and nonbinary and other gender identities.
The appropriate terminology to describe trans and gender-diverse people is an evolving issue, both within and beyond the LGBTQ community. The vocabulary has changed dramatically in the last couple of decades, and it continues to evolve. These are current terms (as of 2024) and may not align fully with sources written in the past, even as recently as a few years ago. It’s important to remember that this text is written from the perspective of a middle-aged white member of the trans community, in the United States, in 2024. Our community exists globally, and across history, and the terminology below is not universal – it may not represent the experiences of others.
- Sex and gender: The English language has evolved in the last couple of decades, and these terms, once considered almost identical in everyday use, have diverged. Gender is now used to describe a person’s position in society (which is NOT the same as sex/gender stereotypes: many people do not conform to these). “Sex” has come to mean “biological sex” – although as described below, this is somewhat misleading.
- Assigned gender at birth: the gender marked on one’s birth certificate, as issued at the time of birth. Trans folks will sometimes use the acronyms AFAB (assigned female at birth) or AMAB (assigned male at birth) for this. If you need to know the respondent’s birth gender, don’t ask for “sex/gender on your birth certificate,” because birth certificates can be amended: 16 states allow people to change their birth certificate to “non-binary” or “intersex.” Asking about the “original birth certificate” would work better. Note that this may be a sensitive topic, as it has the potential to “out” someone as trans. It’s important to promise confidentiality.
- Cisgender or cis: a person whose gender as experienced now matches their assigned gender at birth. This is the majority of the population. Some cisgender people are gender-nonconforming (see below). Personal note: there have been attempts by a few influential people to rebrand “cis” as a slur. This hasn’t got much traction outside the “don’t call me straight, I’m just normal” fringe, but there may be very occasional pushback from respondents.
- Biological man/woman: Often used to indicate a cis person, especially in the media. However, this term is problematic, and I recommend not using this language. “Biological sex” is multifaceted: external phenotype, internal organs, hormonal systems and chromosomes are all aspects of it, and each aspect would divide the population in a somewhat different way. Even at the level of chromosomes, not all cis men have XY sex chromosomes, and not all cis women are XX. “Cis,” or “gender assigned at birth,” would be a better way of identifying a person’s original gender. Personal note: I’d argue that all men and women are biological, and will continue to argue this until I meet a steam-powered one.
- Transgender or trans: someone whose gender as experienced now does not match their assigned gender at birth. A trans woman is a woman who was assigned male at birth, and a trans man is a man who was assigned female at birth. Note that trans is an adjective: “trans man” is correct usage; “transman” is not.
- Nonbinary: someone who describes their gender as neither man nor woman. Many nonbinary people would also describe themselves as trans, but not all do. Don’t make assumptions.
- Gender nonconforming: someone who does not feel comfortable with the societal attributes of their gender, and/or someone whose visual presentation is not typical of their gender. Sometimes trans, sometimes not. Sometimes cis, sometimes not. Sometimes nonbinary, sometimes not. “Transgender” and “gender nonconforming” are independent traits. At the simplest level, don’t assume a woman with a shaved head is a trans man. There are other ways that people who aren’t cis might describe themselves, such as genderqueer, genderfluid, xenogender, and others. This is evolving as the community evolves; it’s helpful to give survey respondents a “write-in” option to allow for this, even if they’ll be classed as “other” in reporting and analysis.
- Transsexual: a somewhat outmoded term, dating from the days when the medical community divided trans people (primarily trans women) into transsexuals and transvestites (which often also meant drag queens and cross-dressers), based on whether they’d had gender-affirming surgery. Please don’t use it, and please don’t ask about surgeries unless you REALLY need this information (for example, in a health questionnaire asking for medical history). Note also that trans and non-binary status is not defined by surgical history, or by whether the person is taking hormones or other medication.
- Deadname: a transgender person’s name in their “old” gender, prior to transition. A person’s legal name may or may not be their deadname. It’s considered the height of rudeness to ask about this, so please don’t unless it’s absolutely necessary.
- Identify as: this is somewhat loaded language and is best avoided unless it’s used consistently for all attributes, not just those regarding gender. For example, it’s very common in the media for trans people, and only trans people, to be described as “identifying as a woman/man”; it’s much rarer to see someone described as “identifying as tall.” Personal note: I hate this phrase. I identify as transgender in exactly the same way that I identify as living in Chicago.
- Intersex or differences in sexual development (DSDs): a person born with a congenital variation of sex characteristics, which may include chromosomes, gonads, genitals, or a combination. Intersex people may describe themselves as cis, trans, nonbinary, or simply as intersex. Many people, especially with chromosomal variations, may not realize that they are intersex. In addition, some people with DSDs reject the term “intersex.”
- Sexual orientation: this won’t be explicitly discussed in this article, but it’s important to remember that trans people can be of any sexual orientation. A person being trans is independent of them being lesbian, gay or bisexual, and it is also independent of whether they were lesbian, gay or bisexual before transition.
Note that in the rest of this piece, I use “trans people” and “the trans community” as shorthand for “people who are transgender, nonbinary, and/or other gender variants.”
Principles and best practice for data collection
Simply asking people whether they are male or female is fraught with difficulties. So how can we collect and present data about gender in a way that captures the complexity of reality? First, and this is absolutely fundamental: understand the scope and limitations of your data, and have a clear plan for how to use them. The approaches for a population-based study and a community-based study will be different. Reasons for asking about gender include:
- to compare outcomes by gender;
- to record gender for use as a risk factor or confounder in modeling;
- to track survey response, or determine study eligibility, by gender;
- to determine which questions should be asked of the respondent;
- to ensure respondents get the appropriate medical or other care.
The study aims will affect the specific questions included. Questions on a medical survey may have to be written very differently from those on a more general social or economic questionnaire. For example, asking about a person’s medical history would probably be inappropriate on the latter.
If you’re working with a sample of the general population, the trans community will be a small proportion of the respondents. Unless your sample size is very large, the ability to conduct statistical analysis specifically of trans people beyond simple descriptive measures will be limited. This may not be the case for a more focused community study, such as visitors to an LGBTQ community center — or, obviously, a study specifically aimed at trans people.
But even if the number of trans respondents is too small for statistical analysis, omitting trans respondents, either explicitly or by asking exclusionary or insulting questions about gender, may bias the results. Clumsily framed questions on gender can adversely affect the response rate for trans people. It’s also important to remember that, while trans people might be familiar with most of the terms I’ve used above, many of those terms might be completely unknown to cis respondents, and we don’t want cis respondents to either get confused or refuse to answer. For example, “gender on original birth certificate” may be more understandable than “assigned gender at birth.” Data collection must ensure the whole population is represented.
The National Academies of Sciences, Engineering, and Medicine (NAS) report Measuring Sex, Gender Identity, and Sexual Orientation (The National Academies Press) sets out five guiding principles for human-subjects data collection, which should be considered when preparing survey questions for trans people and for members of other marginalized communities. Descriptions below are from the NAS publication, lightly edited.
- People deserve to count and be counted (inclusiveness). Everyone should be able to see themselves, and their identities, represented in surveys and other data collection instruments.
- Use precise terminology that reflects the constructs of interest (precision). Sex and gender are complex and multidimensional, and identifying the components of these constructs that are of interest and measuring them using appropriate terminology is critical for collecting reliable data. Questions should clearly specify which component(s) of sex and gender are being measured, and one construct should not be used as a proxy for another.
- Respect identity and autonomy (autonomy). Questions about dimensions of identity, by definition, are asking about a person’s sense of self. Data collection must allow respondents to self-identify whenever possible, and any proxy reporting should reflect what is known about how a person self-identifies. All data collection activities require well-informed consent from potential respondents, with no penalty for those who opt out of sharing personal information about themselves or other household members.
- Collect only necessary data (parsimony). Data should only be gathered in pursuit of a specific and well-defined goal, and data that are not essential to achieve that goal should not be collected.
- Use data in a manner that benefits respondents and respects their privacy and confidentiality (privacy). After collection, aggregate data should be analyzed at the most granular level possible, and research findings should be shared with respondents and their communities to ensure that they benefit from the data they have shared. Throughout all the steps of analysis and dissemination, data on sex and gender, which may be sensitive and vulnerable to misuse, has to be analyzed, maintained, and shared only under rigorous privacy and confidentiality standards.
Additionally, and I can’t stress this enough, talk to trans people! Preferably several trans people, but one person as an absolute bare minimum. You will reduce the risk of low-quality data down the line.
Examples of survey questions
Below is a discussion of a selection of survey questions used to collect data on gender.
The traditional approach
A single sex/gender question was, until recently, all but universal:
What is your sex?
- Male
- Female
This is outmoded and fails to collect data that reflects reality. There have been several attempts to formulate better questions. Here are a few examples.
A stand-alone question: CDC 2021
This question comes from the 2021 Behavioral Risk Factor Surveillance System (BRFSS) questionnaire (CDC – BRFSS). Earlier versions of this survey are the source of the population estimates quoted in the background section above.
Do you consider yourself to be transgender?
- Yes, Transgender, male to female
- Yes, Transgender, female to male
- Yes, Transgender, gender nonconforming
- No
- Don’t know/not sure
This is a reasonably good way to collect the information in a single select-one question. Including a “nonbinary” answer could improve the options for some people who might not see themselves as trans, but who aren’t cisgender either. An “other: please specify” option and text box is also always a good idea, as it is difficult to provide an exhaustive (and time-invariant) list of options. A better question incorporating these options might be:
Do you consider yourself to be transgender or non-binary?
- Transgender male to female
- Transgender female to male
- Transgender, gender nonconforming
- Nonbinary
- Different identity (please state): _______
- No
- Don’t know/not sure
While this version covers additional cases, there will still be people who feel they need to give multiple answers here. Allowing “check all that apply” provides more flexibility.
Two questions: Williams Institute 2014
The influential and generally excellent, if slightly dated, 2014 report from the Williams Institute (Best Practices for Asking Questions to Identify Transgender and Other Gender Minority Respondents on Population-Based Surveys (GenIUSS) – Williams Institute (ucla.edu)) suggests two different two-question approaches. Both begin with:
What sex were you assigned at birth, on your original birth certificate?
- Male
- Female
Followed by either:
Option 1: How do you describe yourself? (check one)
- Male
- Female
- Transgender
- Do not identify as female, male, or transgender
Or Option 2: What is your current gender identity? (Check all that apply)
- Male
- Female
- Trans male/trans man
- Trans female/trans woman
- Genderqueer/gender non-conforming
- Different identity (please state): _______
Option 1 is often used, but the “check one” structure may not allow people the flexibility to express their identity fully. Most trans people would NOT regard themselves as neither male nor female, and many nonbinary or gender-nonconforming people would ALSO describe themselves as transgender (and sometimes also male or female). I do not recommend using option 1, despite its common use. Personal note: If I can’t answer “Female” and “Transgender,” I’m not going to answer this. I’m not a third gender.
Option 2 is better, although “how do you describe yourself?” may be a more understandable opening line than “what is your current gender identity?” “Check all that apply” allows respondents to describe their gender status fully. Note also the write-in option at the bottom. Option 2 could be further improved by adding “nonbinary” to the fifth option to better reflect current language. “Check all that apply” questions are a little more difficult to analyze because they require setting up a series of binary variables instead of a single categorical one. However, they result in better data collection and are more in accord with the first two National Academy of Sciences principles, as described above.
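To make the analysis point concrete, here is a minimal sketch of how “check all that apply” answers might be turned into a series of binary variables. The shortened option labels, the function name and the three respondents are all made up for illustration; they are not taken from the Williams Institute report or from any real dataset.

```python
# Hypothetical option labels, based on the Option 2 answer list above.
OPTIONS = [
    "Male",
    "Female",
    "Trans male/trans man",
    "Trans female/trans woman",
    "Genderqueer/gender non-conforming",
    "Different identity",
]


def to_indicators(selected):
    """Turn one respondent's set of checked boxes into 0/1 variables."""
    unknown = set(selected) - set(OPTIONS)
    if unknown:
        raise ValueError(f"Unrecognized option(s): {sorted(unknown)}")
    return {option: int(option in selected) for option in OPTIONS}


# Three made-up respondents (not real data).
responses = [
    {"Female"},
    {"Female", "Trans female/trans woman"},
    {"Genderqueer/gender non-conforming", "Trans male/trans man"},
]

rows = [to_indicators(r) for r in responses]

# Column totals: how many respondents checked each box.
# Because people can check several boxes, the totals can add up to
# more than the number of respondents.
totals = {option: sum(row[option] for row in rows) for option in OPTIONS}
print(totals)
```

Each option becomes its own yes/no column, so a respondent who checks both “Female” and “Trans female/trans woman” is counted in both, rather than being forced into one category.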
Option 2 illustrates how much has changed in the trans and gender-diverse community. The report recommending these questions is from 2014; from a 2024 perspective some of the language has moved on, especially in its use of “gender nonconforming” as a virtual synonym for “nonbinary.” It shows the importance of seeking input from the LGBTQ community when preparing to collect data. The change in terminology over time has implications for panel and longitudinal studies, where we usually want to keep questions consistent. However, as the vocabulary becomes less current, the quality of data will decrease. It’s better to make sure the language in questions reflects current usage. It’s generally possible to map at least the major categories from one version of the questions to the next.
These questions raise an issue of what to do with the responses of people who give a different gender in the second question (male at birth, female now, or the reverse), without stating they are transgender. It would be impossible to distinguish a trans person who missed checking the “transgender” option, from a cis person who had erroneously checked the wrong gender on one of the questions.
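One possible way to handle this, sketched below, is to keep such responses in their own category instead of guessing. This is only an illustration under my own assumptions: the category labels and the function name are hypothetical, the option strings are shortened versions of the Option 2 answers, and none of this comes from the Williams Institute report.

```python
def classify(birth_sex, current):
    """Return a coarse analysis label for one respondent.

    birth_sex: "Male" or "Female" (first question).
    current:   set of options checked on the second question.
    """
    trans_options = {"Trans male/trans man", "Trans female/trans woman"}
    other_options = {"Genderqueer/gender non-conforming", "Different identity"}

    if current & trans_options:
        return "transgender (stated)"
    if current & other_options:
        return "gender diverse (stated)"
    if current == {birth_sex}:
        return "cisgender (inferred)"
    if current in ({"Male"}, {"Female"}):
        # Birth sex and current gender differ, but no trans option was
        # checked. This could be a trans respondent or a data-entry
        # error, so it is kept separate rather than assigned to a group.
        return "discordant, unconfirmed"
    return "unclassifiable"


print(classify("Male", {"Male"}))
print(classify("Male", {"Female"}))
print(classify("Male", {"Female", "Trans female/trans woman"}))
```

Keeping the “discordant, unconfirmed” responses visible makes it possible to report how many there are, and to check whether conclusions change depending on how they are treated.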
Two questions: Gendered Innovations, 2022
A variant of the above questions is proposed by Gendered Innovations (Surveys | Gendered Innovations (stanford.edu), 2022):
What sex were you assigned at birth?
- Female
- Male
- Intersex
- A sex not listed here (please specify): _______
What is your current gender identity? (Please select all that apply)
- Woman
- Man
- Non-binary
- Genderqueer
- A gender identity not listed here (please specify): _______
- Prefer not to state
This makes a useful distinction in the language used between assigned sex at birth (female/male) and current gender identity (woman/man…). However, the terms “assigned sex at birth” and “current gender identity” may not be well-known in the general population, which may lead to confusion for survey respondents. Interestingly, the term “transgender” is not used at all in these questions: trans people would be identified as those whose answers for sex assigned at birth and current gender differ. Again, this would make it hard to distinguish trans people from people making data entry mistakes.
Recommended questions
Given the issues discussed above, I recommend the following two-question series for gathering sex/gender information in most situations:
What sex was entered on your original birth certificate?
- Male
- Female
How do you describe your gender now? (Check all that apply)
- Man
- Woman
- Transgender
- Nonbinary/genderqueer/genderfluid
- Something else (please state): _______
These questions cover the major gender-identity categories, while keeping the language straightforward and current. They are also unlikely to cause confusion among cisgender people, maximizing the likelihood of accurate responses across the whole population. Note that these questions haven’t been piloted. If anyone would like to test these for response rate and clarity in a sample population, I’d be interested in working with you.
In addition to questions about gender, also consider how sex/gender affects the rest of the questionnaire. For example, many questionnaires have skip patterns, in which some questions are only asked of subsets of respondents. Restricting questions about, say, pregnancy to cis women (“Males: skip the next two questions and go to Question 7”) may bias the results. As an example of what could go wrong here, it’s important to remember that some trans people will, at various points in their lives, be screened for both prostate cancer and breast cancer.
Reporting on trans people in results
How should trans people be represented in the survey analysis or results? In a sample drawn from the general population, trans people will be a small proportion of the participants. This means the statistical power to make inferences about the trans population is likely to be extremely low; in other words, any differences between the trans and cis populations would have to be large in order to be statistically significant.
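A rough calculation shows how little precision a small subgroup gives. The sketch below assumes a hypothetical general-population sample of 2,000 people and uses the roughly 0.6% population share quoted earlier. It relies on the simple normal approximation for a proportion, which is itself unreliable at such small counts, so treat the numbers as indicative only.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


sample_size = 2000   # hypothetical general-population sample
trans_share = 0.006  # about 0.6%, the estimate quoted earlier

n_trans = round(sample_size * trans_share)
n_cis = sample_size - n_trans

print(n_trans)                             # 12
print(round(margin_of_error(n_trans), 2))  # 0.28
print(round(margin_of_error(n_cis), 3))    # 0.022
```

With about 12 trans respondents, an estimated proportion of 50% would carry a margin of error of roughly plus or minus 28 percentage points, compared with about 2 points for the cis respondents.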
There is an additional statistical issue to consider. It’s especially important to reduce the rate of false-positive responses when investigating small groups within a population. If 1% of the cohort is trans, but the questions on trans identity have a 2% false-positive rate among cis respondents, there will be more “trans” respondents who are actually cis than there are trans people in the survey. Even if only 1% of cis people answer the questions incorrectly, that will be roughly the same number of responses as the number of trans people in the survey cohort. To minimize the likelihood of this, it’s extremely important to keep questions clear and easily understandable for cis people who may have little or no knowledge of the trans community.
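The arithmetic behind this can be sketched as follows. The cohort size of 10,000 is hypothetical, the function name is my own, and the sketch assumes every trans respondent answers correctly (no false negatives).

```python
def flagged_counts(cohort_size, trans_share, false_positive_rate):
    """Return (true_trans, false_positive_cis) among respondents flagged as trans.

    Assumes every trans respondent answers correctly (no false negatives).
    """
    true_trans = cohort_size * trans_share
    cis = cohort_size - true_trans
    false_positives = cis * false_positive_rate
    return round(true_trans), round(false_positives)


for rate in (0.01, 0.02):
    true_trans, false_pos = flagged_counts(10_000, 0.01, rate)
    share_true = true_trans / (true_trans + false_pos)
    print(rate, true_trans, false_pos, round(share_true, 2))
```

In a cohort of 10,000 with 100 trans respondents, a 1% false-positive rate among the 9,900 cis respondents adds 99 incorrect “trans” responses, and a 2% rate adds 198. In the second case only about a third of the apparently trans respondents actually are trans.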
Given that the trans population is relatively small, there’s sometimes an argument for not reporting separate statistics for trans people due to data privacy or statistical robustness concerns. In that situation, the most appropriate way to group respondents is dependent on the aims and research questions. In most circumstances, combining all women (cis and trans) and all men (cis and trans) together is more appropriate than combining cis men and trans women. Nonbinary people, and others who are neither men nor women, could then be combined in an “other” category if their numbers are small. In health-related studies dealing with specific medical issues such as cancer screening or access to gynecological services, grouping cis men and trans women may be more appropriate. As with the questionnaire structure, the research question should guide the choice.
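As an illustration of letting the research question drive the grouping, the sketch below collapses answers to the two recommended questions under two alternative schemes. The function name, scheme names and output labels are hypothetical. Treating anyone who checks the nonbinary option as “other” is my simplifying assumption for the example; it is not a rule, and some respondents would reasonably be grouped differently.

```python
NONBINARY = "Nonbinary/genderqueer/genderfluid"


def reporting_group(birth_sex, current, scheme="gender"):
    """Collapse one respondent into a reporting category.

    scheme="gender":    all women together, all men together, everyone
                        else in "other".
    scheme="birth_sex": group by the sex on the original birth
                        certificate (for example, for some sex-specific
                        health analyses).
    """
    if scheme == "birth_sex":
        return birth_sex.lower()
    if scheme != "gender":
        raise ValueError(f"Unknown scheme: {scheme}")

    is_man = "Man" in current
    is_woman = "Woman" in current
    # Assumption for this sketch: checking the nonbinary option, both
    # of Man and Woman, or neither, places the respondent in "other".
    if NONBINARY in current or is_man == is_woman:
        return "other"
    return "women" if is_woman else "men"


print(reporting_group("Male", {"Woman", "Transgender"}))
print(reporting_group("Male", {"Woman", "Transgender"}, scheme="birth_sex"))
print(reporting_group("Female", {NONBINARY}))
```

The same respondent can land in different groups depending on the scheme, which is why the choice should be made, and documented, before the analysis starts.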
Final thoughts
Asking inappropriate, outdated, or clumsy questions about gender can be extremely alienating. Personal note: I’ve seen quite a few questionnaires that I would refuse to answer. Survey response options should cover the full demographic range of the survey cohort. Including all gender-variant people may seem intimidating, but with care, it’s not difficult.
Personal note: I’m a transgender woman, and so do not have direct knowledge of the experiences of trans men, or people in the nonbinary or intersex/DSD communities. I apologize for any inaccuracies, or any areas where I might have glossed over issues or been misleading. Please contact me if you would like to discuss and/or suggest any changes to this guide.