In 1948, a group of researchers knocked on doors in Framingham, Massachusetts, and recruited 5,209 residents for a heart study.[1] They measured blood pressure, cholesterol, and weight, and asked about smoking habits. They followed this cohort for years, then their children, then their grandchildren: three generations encompassing nearly 15,000 participants across seven decades. The Framingham Heart Study became the foundation for nearly everything we know about cardiovascular risk. It's why your doctor checks your cholesterol. It's why you know that high blood pressure is bad. It's the reason the phrase "risk factor" exists in medicine at all.

Now consider the foundational study that defined "normal" reproductive hormone levels. How many women do you think were in that cohort?

Just thirty-seven.

In 1975, Sherman and Korenman published what became the canonical study of the menstrual hormone cycle, the paper that textbook diagrams of estrogen and progesterone curves are based on, the reference that shaped decades of clinical understanding.[2] Their sample size: 37 women. All from one hospital, and all presumed to have "normal" menstrual cycles.

Cardiovascular medicine got a seventy-year, multi-generational research program. Reproductive endocrinology got a single study that would, in most other fields, be considered pilot data, and then the field largely moved on.

This essay is about the distance between those two levels of investment.

The Studies That Built the Textbooks

The clinical reference ranges your doctor uses when interpreting reproductive hormone labs (the ranges that determine whether your estrogen is "normal" or your progesterone is "low") were not established through large-scale, population-level studies. They were established through studies that would, in most other areas of medicine, be considered preliminary or pilot data, likely underpowered to support any statistically significant inferences, much less to define the standard of health for women's reproductive systems.

Sherman and Korenman collected daily blood samples from 37 women across complete menstrual cycles at a single medical center.[2] Daily blood draws for an entire cycle are burdensome (needles every morning for a month), and the fact that they collected this data at all is a genuinely noteworthy feature of their work. But the result was a single composite curve, averaged across 37 women, presented as what the "normal" menstrual cycle looks like. That curve became the diagram in every endocrinology textbook. It became the standard.

In 2006, Stricker and colleagues conducted a similar study to establish reference ranges for reproductive hormones using the Abbott ARCHITECT immunoassay.[3] Their sample size: 20 women. Twenty volunteers from a single center in Geneva, ages 20 to 36. These values are now used by laboratories worldwide to determine whether or not your hormones are within the "normal" range.

McLachlan and colleagues, in 1990, built mathematical models of follicle-stimulating hormone (FSH) and luteinizing hormone (LH) dynamics, the two hormones that orchestrate ovulation, using data from approximately 40 women with regular 28-day menstrual cycles.[4]

When the FDA establishes pharmacokinetic reference values for a new drug, it typically requires hundreds or thousands of participants and a diverse study population. When clinical governing bodies set the reference ranges for cholesterol or blood glucose, they draw on datasets with tens of thousands of people spanning different ethnicities, ages, and conditions. For the hormones that drive the menstrual cycle, the clinical standard was built on groups of 20 to 40 women, from single centers, in Western countries, with primarily monoethnic backgrounds.

These were necessary studies for their time. The problem is that we never followed up with larger, better-powered ones. We took pilot-level data and enshrined them in official guidelines and recommendations.

The Variation We're Ignoring

Small samples typically fail to capture variation. When your reference range is built from data on 20 people, the range reflects the variation inherent to those particular 20 people: their genetics, their diets, their socio-cultural background. Anyone who falls outside of that window then looks "abnormal," even if they're perfectly healthy for their body.
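To make the sample-size point concrete, here is a minimal simulation sketch. It is not drawn from any of the studies cited here: the lognormal population and its parameters (mu = 4.0, sigma = 0.5) are arbitrary assumptions, and the helper names `reference_interval` and `upper_limit_spread` are hypothetical. It repeats an imaginary reference-range study many times at two sample sizes and measures how much the estimated upper limit (the 97.5th percentile, a common convention) moves from one study to the next.

```python
import random
import statistics


def reference_interval(values):
    """Return the (2.5th, 97.5th) percentile pair of a sample."""
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def upper_limit_spread(sample_size, repeats=200, seed=0):
    """Standard deviation of the estimated upper limit across repeated studies."""
    rng = random.Random(seed)
    uppers = []
    for _ in range(repeats):
        # Synthetic "hormone" values; the lognormal parameters are assumptions,
        # not real estradiol or progesterone data.
        sample = [rng.lognormvariate(4.0, 0.5) for _ in range(sample_size)]
        uppers.append(reference_interval(sample)[1])
    return statistics.stdev(uppers)


small_spread = upper_limit_spread(20)
large_spread = upper_limit_spread(2000)
print(f"Spread of upper limit with n=20:   {small_spread:.1f}")
print(f"Spread of upper limit with n=2000: {large_spread:.1f}")
```

Under these assumptions, the upper limit estimated from 20 people shifts far more between repeated studies than the one estimated from 2,000. That instability is what a small reference cohort builds into the range.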

We can use ethnicity as an example. A study of over 1,600 women in the Study of Women's Health Across the Nation (SWAN) cohort found that African-American women had estradiol levels approximately 18% higher than Caucasian women during the early follicular phase.[5] Eighteen percent is not a rounding error. If a doctor compares an African-American woman's estradiol to a reference range derived from a predominantly Caucasian Swiss cohort of 20 people, the comparison is misleading before it begins.
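A rough worked calculation shows why a shift of that size matters. This is a sketch under stated assumptions, not an analysis of the SWAN data: it assumes both groups follow lognormal distributions with the same log-scale spread (`sigma = 0.5` is an arbitrary illustrative value), that one group's values are uniformly 18% higher, and that the reference range's upper limit is the first group's 97.5th percentile. The function name `fraction_flagged_high` is hypothetical.

```python
from math import log
from statistics import NormalDist


def fraction_flagged_high(shift, sigma):
    """Share of a healthy group falling above another group's 97.5th percentile.

    Assumes both groups are lognormal with the same log-scale spread `sigma`,
    and that the second group's values are `shift` times higher (1.18 = 18%).
    """
    z_upper = NormalDist().inv_cdf(0.975)  # about 1.96
    # On the log scale the second group sits log(shift) higher, so the
    # borrowed upper limit lies closer to that group's center.
    z_effective = z_upper - log(shift) / sigma
    return 1.0 - NormalDist().cdf(z_effective)


baseline = fraction_flagged_high(1.0, 0.5)   # same population as the reference
shifted = fraction_flagged_high(1.18, 0.5)   # population running 18% higher
print(f"Flagged high, reference population: {baseline:.1%}")
print(f"Flagged high, shifted population:   {shifted:.1%}")
```

With this assumed spread, roughly twice as many healthy people in the shifted group land above the borrowed upper limit. The exact figure depends entirely on the assumed spread, so treat it as an illustration of the mechanism rather than an estimate.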

Geography matters, too. A 2022 study establishing reference intervals for reproductive hormones in Peruvian women found that estradiol values differed enough from manufacturer-derived ranges that population-specific validation was clinically necessary.[6]

Then there's intra-individual variation. The SWAN Daily Hormone Study tracked 848 women through daily urine collections and documented enormous variability in hormone surge timing and peak levels from one cycle to the next within the same person.[7] Your progesterone surge this month might look nothing like your progesterone surge last month. Both are normal. But a single-point measurement compared to a static reference range built from 20 people can't account for any of this.

How Did This Happen?

In 1977, the FDA issued a policy that effectively excluded women of "childbearing potential" from early-phase clinical trials.[8] The stated concern was protecting potential fetuses from experimental drugs. The concern was legitimate, but the response was an overreaching exclusion that denied women the opportunity to be studied and understood. Rather than developing protocols to mitigate reproductive risk while including women in clinical research, the field simply excluded them. For nearly sixteen years.

The exclusion formally ended in 1993, when the FDA withdrew its restriction and the NIH Revitalization Act mandated the inclusion of women and minorities in federally funded clinical research.[9] But mandating the inclusion of women and actually building the infrastructure to study cyclical biology are two different things. The menstrual cycle is inconvenient to study. Hormones fluctuate daily, sometimes hourly. Cycles vary in length. Proper characterization requires dense longitudinal sampling, which is expensive and exhausting for participants. Studying men, whose hormonal profiles fluctuate comparatively less day to day, was simpler. So even after the legal barrier dissolved, the practical ones remained.

In 2015, the American College of Obstetricians and Gynecologists (ACOG) published a committee opinion recommending that the menstrual cycle be treated as a vital sign,[10] as fundamental to a health assessment as heart rate or blood pressure. The fact that this recommendation was noteworthy in 2015 tells you how far behind the field was.

Apps Track Cycles. Nobody Tracks Hormones.

The obvious rebuttal: what about cycle tracking apps? Natural Cycles has collected data from millions of users. Clue has one of the largest menstrual health datasets in the world. Apple's cycle tracking feature reaches hundreds of millions of devices.

These apps track cycle length, period duration, and symptoms. Some track basal body temperature (BBT). What they don't track is the hormone concentrations themselves. They track the shadow of the cycle, but not the cycle itself.

Knowing that your period started on day 28 doesn't tell you what your estrogen did on day 12. Knowing that your BBT rose by 0.3 degrees confirms that ovulation likely occurred, but it says nothing about whether your progesterone response was adequate or whether your LH surge was shaped normally. It's the difference between tracking weather by recording whether it rained and tracking weather by measuring atmospheric pressure, humidity, temperature, and wind speed. One gives you a diary. The other gives you a forecast.

We have diary data at scale. We have almost no forecast data at all.

The Consequences

Polycystic ovary syndrome (PCOS) affects roughly 8 to 13 percent of reproductive-age women globally.[11] It's one of the most common endocrine disorders. An estimated 50 to 75 percent of cases go undiagnosed.[11] PCOS is a hormonal condition diagnosed, in part, by hormone levels. If the reference ranges used to interpret those levels are based on 20 people from a single center, the diagnostic threshold itself is suspect.

Endometriosis affects approximately 190 million women worldwide. And the average time to diagnosis: seven to ten years.[12] Seven to ten years of pain, of being told things are normal, of appointments that result in no symptomatic relief because the tools for seeing the problem aren't there.

Luteal phase deficiency, or inadequate progesterone production after ovulation, can cause infertility and early pregnancy loss. Diagnosing it requires understanding what normal progesterone patterns look like. Our definition of "normal" comes from 20 to 40 women studied decades ago.

The data gap is not an academic curiosity. It shapes diagnostic thresholds, treatment timelines, and clinical decisions for conditions that affect hundreds of millions of people.

What's Starting to Change

The SWAN Daily Hormone Study remains one of the largest dense-sampling hormone datasets ever assembled, following 848 women through daily urine collections.[7] The limitation: participants were all between 42 and 52 years of age, likely somewhere within the menopausal transition. That study informed much of what we know about perimenopause, but it tells us almost nothing about a 25-year-old.

Oova, a quantitative hormone testing company, has reported collecting data from over 30,000 menstrual cycles, likely the largest real-world hormone dataset assembled outside traditional clinical infrastructure. This is genuine progress.

At Clair, we're approaching the problem from a different angle entirely. Rather than asking women to collect daily samples, we're developing noninvasive continuous hormone inference from wearable sensor data, using the physiological signals the body produces to estimate what hormones are doing in real time. If wearable-based hormone estimation works at scale, it removes the sampling burden that has bottlenecked the field for fifty years. No blood draws, no urine collections, no daily clinic visits.

But the broader point isn't about any single company or dataset. The infrastructure for understanding reproductive hormones at a population level still barely exists. Cardiovascular medicine has Framingham, the Multi-Ethnic Study of Atherosclerosis (MESA), the Atherosclerosis Risk in Communities Study (ARIC), and the UK Biobank. Reproductive endocrinology has a handful of studies with sample sizes in the double digits and one cohort limited to women over 42.

It's 2026

We have population-scale data on cholesterol, blood pressure, glucose metabolism, liver function, kidney function, thyroid function, vitamin D, iron, and dozens of other biomarkers. These reference ranges are built from studies with thousands or tens of thousands of participants, validated across diverse populations, and refined across decades.

For reproductive hormones, the system that governs fertility, influences mood, shapes metabolism, modulates immune function, and affects the daily lived experience of roughly half the human population, we have reference ranges extracted from 20 to 40 women.

It's 2026. We sent a rover to Mars. We sequenced the human genome in a day. We have artificial intelligence that can diagnose retinal disease from a photograph. And we still don't know what "normal" estrogen looks like across a representative population of women.

This isn't a technology problem. The technology to collect these data has existed for decades. This is a priority problem. And priorities can change. They changed for cardiovascular health. They changed for cancer. They changed for diabetes. They can change for this as well.

It requires deciding that the hormonal health of half the population deserves the same scientific investment we've given to everything else.

We can do better.


Clair is building the first noninvasive continuous hormone tracker. Follow our progress at wearclair.com.

References

  1. Dawber TR, Meadors GF, Moore FE. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health. 1951;41(3):279-286.
  2. Sherman BM, Korenman SG. Hormonal characteristics of the human menstrual cycle throughout reproductive life. J Clin Invest. 1975;55(4):699-706.
  3. Stricker R, Eberhart R, Chevailler MC, Quinn FA, Bischof P, Stricker R. Establishment of detailed reference values for luteinizing hormone, follicle stimulating hormone, estradiol, and progesterone during different phases of the menstrual cycle on the Abbott ARCHITECT analyzer. Clin Chem Lab Med. 2006;44(7):883-887.
  4. McLachlan RI, Cohen NL, Dahl KD, Bremner WJ, Soules MR. Serum inhibin levels during the periovulatory interval in normal women: relationships with sex steroid and gonadotrophin levels. Clin Endocrinol (Oxf). 1990;32(1):39-48.
  5. Randolph JF Jr, Sowers M, Gold EB, et al. Reproductive hormones in the early menopausal transition: relationship to ethnicity, body size, and menopausal status. J Clin Endocrinol Metab. 2003;88(4):1516-1522.
  6. Moya-Salazar J, Cerda SP, Cañari B, Moya-Salazar MM, Contreras-Pulache H. Reference intervals of the sex hormonal profile in healthy women: A retrospective single-center study in Peru. Heliyon. 2022;8(9):e10592.
  7. Santoro N, Crawford SL, El Khoudary SR, et al. Menstrual Cycle Hormone Changes in Women Traversing Menopause: Study of Women's Health Across the Nation. J Clin Endocrinol Metab. 2017;102(7):2218-2229.
  8. FDA General Considerations for the Clinical Evaluation of Drugs, 1977. HEW(FDA) 77-3040.
  9. NIH Revitalization Act of 1993, Public Law 103-43.
  10. American College of Obstetricians and Gynecologists. Menstruation in girls and adolescents: using the menstrual cycle as a vital sign. Committee Opinion No. 651. Obstet Gynecol. 2015;126(6):e143-e146.
  11. Bozdag G, Mumusoglu S, Zengin D, Karabulut E, Yildiz BO. The prevalence and phenotypic features of polycystic ovary syndrome: a systematic review and meta-analysis. Hum Reprod. 2016;31(12):2841-2855.
  12. Nnoaham KE, Hummelshoj L, Webster P, et al. Impact of endometriosis on quality of life and work productivity: a multicenter study across ten countries. Fertil Steril. 2011;96(2):366-373.