Correlation between Mean age at first marriage and Contraceptive prevalence

by Devon Ankar for class DSE 6000 @ Wayne State University

Mean age at first marriage of women from gapminder.org

Contraceptive prevalence (% of women ages 15-49) from gapminder.org via World Bank - percentage of women who are practicing, or whose sexual partners are practicing, any form of contraception

Question: Does mean age at first marriage correlate with contraceptive prevalence? Is it a positive or negative correlation?

Initial Hypothesis: Mean age at first marriage correlates positively with contraceptive prevalence. That is, countries with higher mean age at first marriage will also have higher contraceptive prevalence.

In [117]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

age_at_marriage = pd.read_csv('age_at_marriage.csv')
contraceptive_prevalence = pd.read_csv('contraceptive_prevalence.csv')
In [118]:
age_at_marriage.head(5)
Out[118]:
Year Afghanistan Albania Algeria Angola Argentina Armenia Australia Austria Azerbaijan ... Vanuatu Venezuela West Bank and Gaza Western Sahara VietNam Virgin Islands (U.S.) Yemen, Rep. Zambia Zimbabwe MEAN
0 1960 NaN NaN NaN NaN NaN NaN 21.6 24.01 NaN ... NaN NaN NaN NaN NaN 20.828149 NaN NaN NaN 21.347796
1 1961 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 19.020287
2 1962 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 21.267773
3 1963 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 22.100000
4 1964 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 17.540476

5 rows × 187 columns

In [119]:
contraceptive_prevalence.head(5)
Out[119]:
Year Afghanistan Albania Algeria American Samoa Andorra Angola Antigua and Barbuda Argentina Armenia ... Uzbekistan Vanuatu Venezuela Vietnam Virgin Islands (U.S.) West Bank and Gaza Yemen, Rep. Zambia Zimbabwe MEAN
0 1960 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1961 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1962 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1963 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1964 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 215 columns

In [120]:
age_at_marriage.shape
Out[120]:
(46, 187)
In [121]:
contraceptive_prevalence.shape
Out[121]:
(46, 215)
In [122]:
age_at_marriage.info()
contraceptive_prevalence.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Columns: 187 entries, Year to MEAN
dtypes: float64(186), int64(1)
memory usage: 67.3 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Columns: 215 entries, Year to MEAN
dtypes: float64(214), int64(1)
memory usage: 77.3 KB

Data is missing for many years and countries; in the first rows of the contraceptive prevalence table, even the MEAN column is NaN. Epidemiological data can be difficult to obtain, so gaps are understandable. Since we are only looking for overall trends, some missing data is acceptable. We will see whether it becomes a problem later.
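To put a number on how much is missing, one option is to count, for each year, how many country columns actually hold a value. The sketch below assumes the same wide layout as the two tables (a Year column, one column per country, and a MEAN column). The helper name missing_summary and the toy values are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df, year_col="Year", mean_col="MEAN"):
    """Per year: number of countries with a value and share of countries missing."""
    countries = df.drop(columns=[year_col, mean_col])
    return pd.DataFrame({
        year_col: df[year_col],
        "countries_with_data": countries.notna().sum(axis=1),
        "share_missing": countries.isna().mean(axis=1),
    })

# Toy frame in the same wide layout; the numbers are illustrative, not real data.
toy = pd.DataFrame({
    "Year": [1960, 1961, 1962],
    "A": [21.6, np.nan, np.nan],
    "B": [24.0, np.nan, 22.0],
    "MEAN": [22.8, np.nan, 22.0],
})
summary = missing_summary(toy)
print(summary)
```

Calling missing_summary(age_at_marriage) or missing_summary(contraceptive_prevalence) should give the same kind of table for the real data, which would show which years rest on only a handful of countries.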

In [124]:
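# Optional filter, left disabled: keep only the years that have a MEAN value
# (NaN > 0 evaluates to False, so rows with a missing MEAN would be dropped).
#age_at_marriage = age_at_marriage[age_at_marriage['MEAN'] > 0]
#contraceptive_prevalence = contraceptive_prevalence[contraceptive_prevalence['MEAN'] > 0]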

Create a plot of year vs. mean age at first marriage across all countries, to get a quick visual check of whether there is a trend over time.

In [125]:
sns.lmplot(x='Year', y='MEAN', data=age_at_marriage,
           fit_reg=True,
           )
Out[125]:
<seaborn.axisgrid.FacetGrid at 0x1f5058fe0b8>

As expected, the mean age at first marriage increases with the year. That is, as the years progress, women tend to marry at later ages. This is true as an average across all countries, though it may not be true for individual countries. Let's check the US, since that is where we live. I expect this trend will also be present in the US.

In [126]:
sns.lmplot(x='Year', y='United States', data=age_at_marriage,
           fit_reg=True,
           )
Out[126]:
<seaborn.axisgrid.FacetGrid at 0x1f5069bce10>

Indeed, this trend is markedly present in the US.

Create a plot of year vs. contraceptive prevalence across all countries. Again, this is to get a quick visual.

In [127]:
sns.lmplot(x='Year', y='MEAN', data=contraceptive_prevalence,
           fit_reg=True,
           )
Out[127]:
<seaborn.axisgrid.FacetGrid at 0x1f5069fde10>

We do see a weak upward trend in contraceptive prevalence as the years progress. It is not as strong as the previous trend in age at first marriage.
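"Weak" and "strong" are judged by eye from the plots here. A minimal sketch of one way to compare the two trends numerically is to fit a straight line to each MEAN column and report the slope per year together with Pearson's r. The helper name trend and the toy frame are assumptions for illustration. Years are centered before fitting only to keep the least-squares fit well conditioned, which does not change the slope.

```python
import numpy as np
import pandas as pd

def trend(df, column, year_col="Year"):
    """Least-squares slope per year and Pearson r, skipping rows with missing values."""
    pair = df[[year_col, column]].dropna()
    years = pair[year_col].to_numpy(dtype=float)
    values = pair[column].to_numpy(dtype=float)
    # Center the years so the fit is numerically well behaved.
    slope, _intercept = np.polyfit(years - years.mean(), values, deg=1)
    r = pair[year_col].corr(pair[column])
    return slope, r

# Toy data, illustrative only: 0.5 years of increase per calendar year, one gap.
toy = pd.DataFrame({
    "Year": [1960, 1961, 1962, 1963],
    "MEAN": [20.0, np.nan, 21.0, 21.5],
})
slope, r = trend(toy, "MEAN")
print(slope, r)
```

On the real tables, trend(age_at_marriage, 'MEAN') and trend(contraceptive_prevalence, 'MEAN') could be compared directly. The slopes are in different units (years of age vs. percentage points), so r is the fairer measure of how tight each trend is.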

Since we live in the US, I am curious how this plot would look for the US alone. Create a plot of year vs. contraceptive prevalence for the US alone.

In [128]:
sns.lmplot(x='Year', y='United States', data=contraceptive_prevalence,
           fit_reg=True,
           )
Out[128]:
<seaborn.axisgrid.FacetGrid at 0x1f5069bc780>

As a global average, contraceptive prevalence has increased slightly over time, so its correlation with year is positive but weak.

For the US alone, the correlation is more apparent - as the years have gone by, contraceptive prevalence has increased in the US.

The next thing we want to do is check whether there is a correlation between mean age at first marriage and contraceptive prevalence. This is to confirm or refute the initial hypothesis.

In [129]:
x = age_at_marriage['MEAN']
y = contraceptive_prevalence['MEAN']
In [130]:
plt.scatter(x, y, c='b', alpha=0.5)

plt.xlabel('Mean age at first marriage')
plt.ylabel('Contraceptive prevalence (% of women ages 15-49)')

plt.show()
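The scatter plot gives only a visual impression. As a rough complement, Pearson's r can be computed over the years in which both MEAN values are present. The sketch below assumes that x and y are aligned row by row, that is, that both tables list the same years in the same order (both have 46 rows starting at 1960). The helper name paired_correlation and the toy series are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def paired_correlation(x, y):
    """Pearson r over rows where both series have a value, plus the number of such rows."""
    pairs = pd.concat([x, y], axis=1, keys=["x", "y"]).dropna()
    return pairs["x"].corr(pairs["y"]), len(pairs)

# Toy series, illustrative only; each has one gap, leaving three complete pairs.
toy_x = pd.Series([21.0, 19.0, np.nan, 22.0, 23.0])
toy_y = pd.Series([np.nan, 40.0, 45.0, 50.0, 55.0])
r, n = paired_correlation(toy_x, toy_y)
print(r, n)
```

Reporting n alongside r matters here, because with this much missing data the coefficient may rest on far fewer than 46 points.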

Conclusions

Each point in the scatter plot is one year's average across all countries. Contraceptive prevalence stays in a similar band of about 35-60%, almost irrespective of the average age at first marriage. The correlation appears to be very weak, if there is any. My hypothesis was that as mean age at first marriage increased, so would contraceptive prevalence, but the plot does not support this. I cannot conclude that there is a positive correlation between mean age at first marriage and contraceptive prevalence.
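The comparison above pairs yearly cross-country averages, whereas the hypothesis is phrased in terms of countries. A possible follow-up, sketched here under assumptions, is to reshape both tables to one row per (Year, Country) and correlate the values that exist in both. The helper name to_long and the toy frames are made up for illustration. The rename reflects that the two files spell at least one country differently ("VietNam" vs. "Vietnam"); other mismatches may exist and would need the same treatment.

```python
import numpy as np
import pandas as pd

def to_long(df, value_name, year_col="Year", mean_col="MEAN", rename=None):
    """Reshape a wide Year-by-country table into (Year, Country, value) rows."""
    countries = df.drop(columns=[mean_col]).rename(columns=rename or {})
    tidy = countries.melt(id_vars=year_col, var_name="Country", value_name=value_name)
    return tidy.dropna(subset=[value_name])

# Toy frames in the same wide layout; all numbers are illustrative, not real data.
toy_age = pd.DataFrame({"Year": [2000, 2005], "A": [22.0, 23.0],
                        "VietNam": [21.0, np.nan], "MEAN": [21.5, 23.0]})
toy_cp = pd.DataFrame({"Year": [2000, 2005], "A": [50.0, 60.0],
                       "Vietnam": [70.0, 75.0], "MEAN": [60.0, 67.5]})

age_long = to_long(toy_age, "age", rename={"VietNam": "Vietnam"})
cp_long = to_long(toy_cp, "prevalence")

# Keep only the country-years present in both tables.
merged = age_long.merge(cp_long, on=["Year", "Country"], how="inner")
r = merged["age"].corr(merged["prevalence"])
print(merged)
print(r)
```

This pools all years and countries into one coefficient, so it mixes differences between countries with changes over time. It is a rough check, not a replacement for the analysis above.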

I would have more confidence in this analysis if there were less missing data. Both of the CSV files had a lot of missing data.

Also, the most recent data in the set is from 2005, which is now 13 years ago; it is possible that age at first marriage and/or contraceptive prevalence have changed since then.

Lastly, the mean age at first marriage data was compiled by Gapminder using several sources, including their own estimates. According to Gapminder, "The data are based on multiple sources and definitions might vary." This could indicate the data itself is unreliable or inaccurate. Gapminder themselves warn "We discourage the use of this dataset for statistical analysis." It may be enough to get a rough estimate here, but I would have more confidence in this analysis if we had better data to begin with.