# Exercise 1 with solution - New York City /r/ with Rbrul - GoldVarb-esque logistic regression with categorical predictor
#
# Start R.
# Load Rbrul.
source("http://www.danielezrajohnson.com/Rbrul.R")
# Start Rbrul.
rbrul()
#
# Open an Internet browser.
# Go to "http://www.danielezrajohnson.com/ds.csv".
# Save page on Desktop or in other directory as "ds.csv".
# From the Rbrul main menu, choose "1" to "load/save data".
1
# If prompted to save current data, press "Enter" for "No".
# Navigate to Desktop or other directory.
[navigate]
# Enter the number to the left of the "ds.csv" file.
[number]
# "What separates the columns?" Enter "c" for commas.
c
# The MAIN MENU should appear again, with "current data structure" as
# follows:
# Current data structure:
# r (integer with 2 values): 1 0 
# store (factor with 3 values): Saks Macy's Klein's 
# emphasis (factor with 2 values): normal emphatic 
# word (factor with 2 values): fouRth flooR 
#
# This is the Labov "fourth floor" department store data collected in
# 1962.
#
# Question 1: Which of the factor groups have a statistically significant
# effect on the use of /r/, which of their factors favor or disfavor /r/,
# and to what extent?
#
# Enter "5-modeling" and the MODELING MENU will open.
5
# Enter "1" to "choose variables".
1
# Enter "1" to choose "r" as the response, or dependent variable.
1
# Press "Enter" as "r" is a binary response.

# Enter "1" to choose "1" (presence of /r/) as the application value.
1
# Enter "2", "3", "4" to choose "store", "emphasis", and "word" as the
# potential predictors.
2
3
4

# "Are any of these predictors continuous?" The answer is no, they are all
# categorical/factors, so press "Enter".

# "Any grouping factors (random effects)?" The answer is no, these are all
# ordinary fixed effects, so press "Enter".

# "Consider an/another pairwise interaction between predictors?" Why not?
# Let's consider the possible interaction of any pair of the predictors. So
# enter "2", "3", "2", "4", "3", "4".
2
3
2
4
3
4

#
# The MODELING MENU should now show, under "Current variables are:"
# response.binary: r (1 vs. 0)
# fixed.factor: store emphasis word
# fixed.interaction: store:emphasis store:word emphasis:word
#
# Enter "5" for "step-up/step-down".
5
# A long output results. The most relevant part of the output, near the
# end, is:
#
# BEST STEP-UP MODEL WAS WITH store (1.08e-18) + word (8.18e-09) [A]
#
# STEP-UP AND STEP-DOWN MATCH!
#
# STEPPING DOWN:
#
# $store
#  factor logodds tokens 1/1+0 centered weight
#    Saks   0.900    177 0.475           0.711
#  Macy's   0.436    336 0.372           0.607
# Klein's  -1.337    216 0.097           0.208
#
# $word
# factor logodds tokens 1/1+0 centered weight
#  flooR   0.493    347 0.412           0.621
# fouRth  -0.493    382 0.228           0.379
#
# In both step-up and step-down, the groups selected as significant are
# STORE and WORD. The p-values are shown in parentheses (for example,
# 1.08e-18 for STORE means p = 1.08 x 10^-18).
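#
# As a rough check on the STORE figure, the sketch below rebuilds per-store
# counts from the "tokens" and "1/1+0" columns of the $store table above and
# runs a likelihood-ratio test of STORE against an intercept-only model.
# Assumptions: the counts of /r/ are recovered by rounding tokens x
# proportion, and Rbrul's step-up p-value is taken to be a likelihood-ratio
# (chi-square) test. All object names are illustrative. Run this in a plain
# R session, not at an Rbrul prompt.
tokens <- c(Saks = 177, "Macy's" = 336, "Klein's" = 216)  # "tokens" column
prop_r <- c(0.475, 0.372, 0.097)                          # "1/1+0" column
r1 <- round(tokens * prop_r)   # approximate number of tokens with /r/
r0 <- tokens - r1              # approximate number of tokens without /r/
store <- factor(names(tokens))
# Grouped binomial models: intercept only vs. STORE as the single predictor.
null_fit  <- glm(cbind(r1, r0) ~ 1,     family = binomial)
store_fit <- glm(cbind(r1, r0) ~ store, family = binomial)
# The drop in deviance is the likelihood-ratio statistic; STORE has 3 levels,
# so it costs 2 degrees of freedom.
lr_stat <- deviance(null_fit) - deviance(store_fit)
p_store <- pchisq(lr_stat, df = 2, lower.tail = FALSE)
print(p_store)
# Under these assumptions the result should be on the order of 1e-18, in
# line with the value Rbrul reports for STORE.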
#
# EMPHASIS is not selected because it does not reach significance (p = 0.07;
# this figure can be seen at step-up Run 6 or step-down Run 9).
#
# The only interaction considered in the step-up model is STORE:WORD,
# after STORE and WORD are individually added. This interaction is not
# significant (p = 0.34).
# In the step-down model, all the interaction effects are dropped first,
# and the EMPHASIS main effect is dropped as well.
#
# The "centered weight" column shows the same output as GoldVarb gave in
# Exercise 1. For STORE, Saks strongly favors /r/ (0.711), Macy's mildly
# favors /r/ (0.607), and Klein's strongly disfavors /r/ (0.208). Within
# WORD, the unstressed "fourth" disfavors /r/ (0.379), while the stressed
# "floor" favors /r/ (0.621).
#
# The Rbrul output also presents results in log-odds, and shows token
# numbers and response proportions in the same table as the factor weights.
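#
# The two scales are related by the inverse-logit function: applying it to
# each log-odds value in the tables above gives the centered factor weight.
# A minimal sketch, using base R's plogis(); the vector names are
# illustrative. Run it in a plain R session, not at an Rbrul prompt.
logodds <- c(Saks = 0.900, "Macy's" = 0.436, "Klein's" = -1.337,
             flooR = 0.493, fouRth = -0.493)
# plogis(x) computes 1 / (1 + exp(-x)), i.e. log-odds -> probability scale.
weights <- round(plogis(logodds), 3)
print(weights)
# These match the "centered weight" column: 0.711, 0.607, 0.208 for STORE
# and 0.621, 0.379 for WORD.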
#
# Enter "9" to return to the main menu.
9
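#
# Outside Rbrul, the selected model can also be fitted directly with glm().
# The sketch below assumes that "ds.csv" is in the working directory, that
# its columns are named as in the data structure shown earlier, and that
# Rbrul's log-odds correspond to sum-coded ("contr.sum") coefficients. The
# helper names fit_store_word() and level_logodds() are illustrative and not
# part of Rbrul. Run this in a plain R session, not at an Rbrul prompt.
fit_store_word <- function(ds) {
  # Treat both predictors as factors and use sum contrasts, so that each
  # coefficient is a deviation from the mean of the levels.
  ds$store <- factor(ds$store)
  ds$word  <- factor(ds$word)
  glm(r ~ store + word, data = ds, family = binomial,
      contrasts = list(store = "contr.sum", word = "contr.sum"))
}
level_logodds <- function(fit, predictor) {
  # With sum contrasts, glm() reports all levels but the last; the last
  # level's log-odds is minus the sum of the others.
  lev <- fit$xlevels[[predictor]]
  est <- coef(fit)[paste0(predictor, seq_len(length(lev) - 1))]
  setNames(c(est, -sum(est)), lev)
}
if (file.exists("ds.csv")) {
  ds  <- read.csv("ds.csv")
  fit <- fit_store_word(ds)
  print(round(level_logodds(fit, "store"), 3))
  print(round(level_logodds(fit, "word"), 3))
}
# If the assumptions hold, the printed log-odds should be close to the
# $store and $word tables in the Rbrul output above.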