I have a data science project in which I try to predict the likelihood that a name is gender-neutral. The project already uses logistic regression, but I need to demonstrate at least one more algorithm (two more if possible), and I cannot decide whether Random Forest, Naive Bayes, or something else would be suitable. The code is in R.

install.packages("devtools")

library(devtools)

library(babynames)

library(dplyr)

library(tidyr)

data(babynames)

neutral_names <- babynames %>%

select(-prop) %>%

#filter only names between years 1930 and 2012

filter(year >= 1930, year <= 2012) %>%

#get the counts of female and male babies for each name per year

spread(key = sex, value = n, fill = 0) %>%

#Calculate the measure of gender-neutrality

mutate(prop_F = 100 * F / (F + M), se = (50 - prop_F)^2) %>%
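
#worked example of the measure above (illustrative numbers, not taken from the data):

#F = 30, M = 70 gives prop_F = 100 * 30 / 100 = 30 and se = (50 - 30)^2 = 400

#an even split (F = M) gives prop_F = 50 and se = 0, the most gender-neutral value

#a single-sex year gives prop_F = 0 or 100 and se = 2500, the least gender-neutral value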

group_by(name) %>%

#per name, find the total number of babies and measure of gender-neutrality

summarise(n = n(), female = sum(F), male = sum(M), total = sum(F + M),

mse = mean(se)) %>%

#keep only names that occur in every year (83 years) and more than 9000 times in total

filter(n == 83, total > 9000) %>%

#sort by gender neutrality

arrange(mse) %>%

#get only the top 10