February 2022 PDHP workshop - Shared screen with speaker view
Brooke Helppie-McFall
01:15:41
ok thanks. the one that wasn't loading for me was "broom" -- so do I need both?
Andrew
01:15:51
No idea!
Emanuela Struffolino
01:16:22
only broom
Emanuela Struffolino
01:17:27
for those who want to try, here is another possible solution: devtools::install_github("tidymodels/broom")
Emanuela Struffolino
01:17:46
Sorry, this:
Emanuela Struffolino
01:17:47
install.packages("devtools")
devtools::install_github("tidymodels/broom")
Emanuela Struffolino
01:18:06
in case devtools is not already installed on your machine
Brooke Helppie-McFall
01:20:10
it's saying there is a dependency I don't have called dplyr
Brooke Helppie-McFall
01:20:21
i tried installing with pacman. any other advice?
Brooke Helppie-McFall
01:20:27
(that didn't work)
Emanuela Struffolino
01:20:42
you can install dplyr and then try again
Emanuela Struffolino
01:21:16
sometimes the only solution is to install them individually
Elizabeth Eve Bruch
01:21:57
Looks good!
Emanuela Struffolino
01:22:05
Welcome (officially) to everyone! I will be monitoring the chat while Anette presents
Zhaohui Fan
01:23:11
Will the PPT or documents be available after the training today?
Emanuela Struffolino
01:23:30
I think they are already available on the webinar webpage
Emanuela Struffolino
01:23:58
https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/
Emanuela Struffolino
01:24:15
but Paul might follow up on this
Paul Chapin Schulz
01:24:36
Yes, slides are already online and video will be posted shortly after the workshop. https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/
Elizabeth Eve Bruch
01:27:45
Question: what is the relationship between sequence analysis and trajectory analysis?
Emanuela Struffolino
01:28:52
Can you specify what you mean by trajectory analysis?
Emanuela Struffolino
01:29:05
Do you have a paper/book in mind?
Laura Taylor
01:30:31
I had the same question, but in relation to Latent Transition Analysis (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2846549/)
Elizabeth Eve Bruch
01:30:49
@Emanuela: group-based trajectory models, e.g., work by Nagin and coauthors.
Emanuela Struffolino
01:31:32
I see, Anette will discuss this shortly, but the main difference is that here we are considering sequences of categorical states
Elizabeth Eve Bruch
01:32:25
^^ Thank you for this clarification.
Emanuela Struffolino
01:32:41
Here you are!
Laura Taylor
01:34:41
And for the LTA question, also found this paper that directly compares the two: https://www.jstor.org/stable/23361038
Emanuela Struffolino
01:35:46
Yes! That is a very important paper for finding out more about the pros and cons of the two techniques. I would say that a lot depends on your research question
Elizabeth Eve Bruch
01:36:43
@Emanuela: one more question please. How many categorical states can sequence analysis models handle? For example, if you had 100 states, would the models still be estimable (leaving interpretability aside for the moment)? I have only seen them applied with a small number of states.
Youn Park
01:36:46
Process sociology sounds non-sequential in terms of temporality, focusing on changes and instants. How does this ontology still suit optimal matching methods and sequence analysis?
Elizabeth Eve Bruch
01:38:33
I’m also curious about the relationship between sequence analysis and Markov models. My understanding is that sequence analysis captures the sequence but does not attempt to model the probability of moving from one state to another. Is that right?
Emanuela Struffolino
01:40:02
@YounPark: I am not sure I follow your argument here: at least in Abbott's understanding, processual sociology relates exactly to time as an inherent dimension of social occurrences. Could you give me more details so I can try to address your question better?
Paula Fomby
01:41:46
Can you please provide the reference for the Abbott paper Prof. Fasang mentioned that critiques traditional regression models from a processual framework?
Elizabeth Eve Bruch
01:42:11
@Anette: thank you!
Emanuela Struffolino
01:43:08
@PaulaFomby: here are a couple of references, which you can also find in the course outline:
Emanuela Struffolino
01:43:09
• Abbott, A. (1992). From causes to events: Notes on narrative positivism. Sociological Methods & Research, 20(4), 428–455. doi: 10.1177/0049124192020004002
• Abbott, A. (1995). Sequence analysis: New methods for old ideas. Annual Review of Sociology, 21(1), 93–113. doi: 10.1146/annurev.so.21.080195.000521
Paula Fomby
01:43:24
^^Thanks!
Youn Park
01:44:16
In Abbott’s early work, he sees social processes as a sequence of events. But in his later work, he uses the term "lineage of successive events" and seems to take a non-sequential view of temporality. I'm still confused, but that is my understanding so far.
Emanuela Struffolino
01:44:23
Here are some references about combining Markov models and SA:
Emanuela Struffolino
01:44:26
https://arxiv.org/abs/1704.00543
Emanuela Struffolino
01:44:39
https://library.oapen.org/bitstream/handle/20.500.12657/23137/1007017.pdf?sequence=1#page=191
Chengming Han
01:45:22
What is the relationship between sequence analysis and time series analysis? Is time series analysis a variable-based or process-based model?
Elizabeth Eve Bruch
01:46:33
@Emanuela: thank you! This is very useful.
Emanuela Struffolino
01:47:08
@YounPark: I see your point. We will see later, when talking about optimal matching, that "keeping" the temporal order is indeed a key issue that goes beyond the theoretical understanding of temporal processes as such. I will make a note of this and try to come back to it in the Q&A.
Youn Park
01:48:00
@Emanuela, thanks!
Emanuela Struffolino
01:48:19
@ChengmingHan: time series analysis is variable-based and does not aim at identifying the "picture" of how the temporal process unfolds
Chengming Han
01:48:59
I see. Thank you!
Laura Taylor
02:01:40
Are there (rough) guidelines related to sample size/number of time points necessary to even think about SA?
Emanuela Struffolino
02:05:17
@LauraTaylor: as this is a descriptive technique, in theory no. However, as we will see later today, every step implies assumptions, and in some cases the advantage offered by the technique does not justify the whole process. For example, if I have 2 states in my alphabet and 4 time points, I would probably get more efficient results just by using summary indicators in a model. This is also because you are not very likely to find much variation in the clusters you extract from the initial sample.
Emanuela Struffolino
02:06:16
I have myself used sequences of 8 time points and 4 states and got nice results that were over and above what I could get with a standard regression, but this very much depends on your data structure/the process
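Part of the reasoning about tiny alphabets and short sequences is simple counting: with k states and T time points there can be at most k^T distinct sequences. A quick check in R, where the helper `n_possible` and the state labels A/B are hypothetical names chosen for illustration:

```r
# Upper bound on the number of distinct sequences: each of the
# n_time positions can hold any of the n_states states.
n_possible <- function(n_states, n_time) n_states^n_time

n_possible(2, 4)   # 16 possible sequences: 2 states, 4 time points
n_possible(4, 8)   # 65536 possible sequences: 4 states, 8 time points

# Enumerate the 2-state, 4-time-point case explicitly.
all_seqs <- expand.grid(rep(list(c("A", "B")), 4), stringsAsFactors = FALSE)
nrow(all_seqs)     # 16
```

This only bounds how much variety the data could contain; how many of these patterns actually occur, and how they cluster, depends on the data and the process.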
Laura Taylor
02:06:43
Thanks - it's just helpful to have a sense of what you all have worked with!
Brandy R Sinco
02:10:45
Is there an advantage to using R instead of the SAS Genetics module, other than the cost?
Emanuela Struffolino
02:11:42
@BrandyRSinco: I do not know the SAS Genetics module. I only know the Stata packages, and what I can say is that computation time in Stata is much, much longer
NINA CASTRO MENDEZ
02:11:51
Do you recommend weighting the sequence analysis?
Brooke Helppie-McFall
02:16:48
is there anyone here in the ISR building who is a pretty good Rstudio user? I'm still having trouble getting broom to install; I think I've gotten everything else done....
Prof. Dr. Anette Éva Fasang
02:17:04
@PaulaFomby: here is another early paper by Abbott on Transcending general linear reality that appeared in Sociological Theory in 1988
Prof. Dr. Anette Éva Fasang
02:17:06
https://www.jstor.org/stable/202114?seq=1#metadata_info_tab_contents
Brooke Helppie-McFall
02:17:23
wondering if I could pop by someone's office, show them what RStudio on my laptop is telling me, and try to troubleshoot
Simon Brauer
02:18:00
i'm up on the 5th floor
Simon Brauer
02:18:18
room 5080
Frank (Zoom support 734.277.3061)
02:18:21
@Brooke-- I don't use it, but I know that RStudio requires an admin login to install properly. Do you have an admin login to install the packages or programs you need? If not, I can connect remotely and work with you to see if we can get it to install correctly
Huchen Liu
02:19:00
I'd be glad to help you as well. I'm in ISR Thompson 2104.
Emanuela Struffolino
02:19:09
broom is not so important for the hands on, you will be able to do 99% of the things without it
Chi-Lin Yu
02:19:37
Can you explain a bit more why SA is a process-based method instead of a variable-based method? Sounds like we still focus on, e.g., one variable with different states (isn’t it variable-based also)?
Damian Santiago
02:21:03
Hi everyone! Great workshop! Is anyone in the SSA community developing user-friendly software for SSA that doesn't require knowing R or Stata? Greetings from Cuba
Emanuela Struffolino
02:21:06
@NinaCastroMendez: weights are a delicate issue in this context; I will say a few words about them later
Emanuela Struffolino
02:22:30
@DamianSantiago: you will see that with R it is fairly simple to conduct the basic steps. I will say more about this later. I do not know of any other software, I am sorry
Emanuela Struffolino
02:23:10
@Chi-Lin Yu: with visualization this will be clearer, I hope - I will follow up on this
Prof. Dr. Anette Éva Fasang
02:23:14
@Damian: the R package TraMineR does not necessarily require a lot of prior knowledge of R and is very good. It is certainly the easiest and most powerful tool out there and was developed by a team of statisticians and computer scientists in Geneva/Lausanne. The Stata module is convenient for Stata users but has far fewer possibilities for analysis and the code is often longer and more complicated, also computation time in Stata tends to be longer.
Damian Santiago
02:24:16
thanks Prof. Struffolino and Prof. Fasang!
Andrew
02:27:19
So by weights do you mean giving some cases more importance than others?
Anette Fasang
02:28:31
@Andrew: yes exactly. We would try to give individuals who are more likely to drop out of a panel survey higher weights, for example.
Mădălina Manoilă
02:30:35
Could you please share the link for the reduced version of the dataset that we'll use today?
Damian Santiago
02:34:19
How can we analyze several types of trajectories together, for example family formation and a health trajectory?
Anette Fasang
02:35:37
@Madalina: I think you should have received all the materials for today by email beforehand? Emanuela will walk you through it when you open the data.
Anette Fasang
02:36:20
@Damian: This can be done with a method called multichannel sequence analysis, which allows you to analyse several parallel "channels" or dimensions. We will briefly talk about these and other extensions at the end of the course.
Brooke Helppie-McFall
02:36:43
Hey all, I've just had to update Rstudio, and now TraMineR isn't installing using pacman or install.packages. The latter says "object TraMineR not found"
Andrew
02:36:53
Are the data included in the R and RStudio install? I didn’t see a separate link to a data source.
Anette Fasang
02:38:15
Emanuela will show how to open the data and will walk through code in a little bit.
Andrew
02:38:24
Ok thanks 🙂
Brooke Helppie-McFall
02:38:27
got it. i actually figured it out.
Damian Santiago
02:42:07
How does changing the time granularity work? Do you select the modal state when changing from months to years?
Damian Santiago
02:43:26
Are there inferential tests for sequences?
Anette Fasang
02:43:44
@Damian: the modal state simply means the state that is most common at a given time point in a given sample. So it has nothing to do with the time granularity, except that the modal state can be different at each time point.
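To make this definition concrete: the modal state is computed separately at each time point, i.e. column by column over the sample. A minimal base-R sketch on made-up data (the matrix `seqs` and its state codes are hypothetical); in practice TraMineR's `seqmodst()` returns the sequence of modal states for a state-sequence object, so you would not need to write this by hand.

```r
# Hypothetical toy data: 4 individuals observed at 3 time points,
# with states coded as single letters.
seqs <- matrix(
  c("S", "S", "M",
    "S", "C", "M",
    "S", "M", "M",
    "C", "C", "M"),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("t", 1:3))
)

# Modal state at each time point: the most frequent state in that column.
# table() sorts states alphabetically, so which.max() breaks ties in
# favour of the alphabetically first state.
modal_state <- apply(seqs, 2, function(col) {
  counts <- table(col)
  names(counts)[which.max(counts)]
})
print(modal_state)  # t1 "S", t2 "C", t3 "M"
```

Note that the modal state can differ from one time point to the next, as Anette says, and the modal "sequence" built this way need not be the trajectory of any actual individual.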
Anette Fasang
02:44:28
@Damian: https://journals.sagepub.com/doi/full/10.1177/0081175020959401
Anette Fasang
02:44:48
Liao, T. F., & Fasang, A. E. (2021). Comparing groups of life-course sequences using the bayesian information criterion and the likelihood-ratio test. Sociological Methodology, 51(1), 44-85
Anette Fasang
02:45:32
Inferential statistics in sequence analysis are done using bootstrap methods, and there is also an adaptation of the BIC and LRT tests to compare groups of sequences, which I posted above.
Damian Santiago
02:45:38
thanks!
Rees Alice
02:46:29
Another link (colors) : https://www.colourlovers.com/palette/312352/
Damian Santiago
02:47:55
Is it possible to use SSA for non-temporal data, like repeated qualitative data? Any references?
Anette Fasang
02:49:36
@Damian: yes, that is totally possible as long as the qualitative states have an order. This order does not have to be temporal. Repeated qualitative data do have a temporal order, even if it is not monthly or yearly, as long as you have a 1st, 2nd, 3rd time point.
Brooke Helppie-McFall
02:56:16
where is this folder to download?
Anette Fasang
02:56:29
https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/
Brooke Helppie-McFall
02:56:44
i don't see it at that link
Elena Povedano
02:56:44
thanks Anette!
Carol F Scott
02:57:08
No lab materials
Youn Park
03:00:38
In your example, transition rates & numbers are different between monthly & yearly data. I was wondering if identified patterns could be different as well, depending on time intervals?
Paul Chapin Schulz
03:00:39
I will need a few minutes to post online.
NINA CASTRO MENDEZ
03:01:54
Can you tell us about costs? What do you recommend?
Youn Park
03:02:42
Thank you!
NINA CASTRO MENDEZ
03:03:51
thank you!
Andrew
03:04:51
Have you come across the work of David Clarke at the University of Nottingham? He works on sequence analysis but I think it’s more micro-level, for example modeling the sequence of elements of a traffic accident.
Brooke Helppie-McFall
03:07:01
my help page looks very different
Brooke Helppie-McFall
03:07:52
nevermind
Jacqui Smith
03:09:22
We also have open access HRS life history data that could be used for sequence analysis (similar to SHARE, ELSA). For example, employment states, marital states, all within couples. https://hrs.isr.umich.edu/news/data-announcements/cross-wave-2015-2017-life-history-mail-survey-lhms-harmonized-and-aggregated
Emanuela Struffolino
03:11:01
https://www.dropbox.com/sh/ttz1ikpz0fjb62q/AAD28ZKT5HokRZkWmLVUU3fXa?dl=0
Brooke Helppie-McFall
03:12:46
still working
Paula Tufiș
03:12:47
All good - thanks!
Brooke Helppie-McFall
03:12:54
working on 2 computers so still need 2-3mins
Paul Chapin Schulz
03:13:14
Thanks for the link. It is working for me as well.
Andrew
03:13:23
Done
Brooke Helppie-McFall
03:15:03
how do i get my rstudio to show the right file?
Anette Fasang
03:15:07
Issues with loading packages can depend on your computer and on having the admin rights needed to install things.
Brooke Helppie-McFall
03:15:16
i didn't see what she did to get the right folder to show up in lower right
Anette Fasang
03:16:13
@Brooke: You can open the R script on the upper left?
Brooke Helppie-McFall
03:16:25
ok I'll try that
Anette Fasang
03:17:08
the lower right is an output window that shows things when you run the code. R does not show you the data set unless you request it to open the data. the data is kind of in the background and you run the commands in the upper left and get the output in the lower left.
Brooke Helppie-McFall
03:17:20
it's working now. thanks!
Anette Fasang
03:17:24
you can ignore the lower right for now.
Brandy R Sinco
03:20:21
family<--read_dta(C:/Courses/UM_Genetics_SocialScience/Data_01/PartnerBirthbio.dta)
R is giving me the error message: Error: unexpected '/' in "family<--read_dta(C:/"
Does anyone see a problem with my R code?
Anette Fasang
03:21:04
@Brandy: try read.dta with a . instead of _
Anette Fasang
03:21:27
Sometimes it is also about having back or forward slashes
Anette Fasang
03:22:57
I think your error message is about the slashes or your path, not the _
Anette Fasang
03:23:23
Is the code running for most people?
Ignacio Bórquez
03:23:31
Yes here!
Eva Vasiljevic
03:23:36
Yes
Elena Povedano
03:23:37
here too!
Dolly Loomans
03:23:38
Yes it works !
Anette Fasang
03:23:42
good thanks!
Mădălina Manoilă
03:23:43
yes, thanks!
Brandy R Sinco
03:24:44
family<--read.dta("C:/Courses/UM_Genetics_SocialScience/Data_01/PartnerBirthbio.dta")
Error in read.dta("C:/Courses/UM_Genetics_SocialScience/Data_01/PartnerBirthbio.dta") : could not find function "read.dta"
Any suggestions?
Simon Brauer
03:25:27
@Brandy it looks like your issue was you were missing quotes, not the _
Paul Chapin Schulz
03:25:40
FYI links to the materials are now added to the PDHP website as well.
Paul Chapin Schulz
03:25:42
https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/
Anette Fasang
03:25:44
try first setting your working directory with setwd("C:/Courses/UM_Genetics_SocialScience/Data_01/")
Paul Chapin Schulz
03:26:09
There is a link to the Dropbox and also to directly download as a zip file.
Anette Fasang
03:27:40
then <- read_dta(here("01_data", "PartnerBirthbio.dta"))
Anette Fasang
03:28:28
@Brandy: setwd("C:/Courses/UM_Genetics_SocialScience/")
Anette Fasang
03:28:49
family <- read_dta(here("01_data", "PartnerBirthbio.dta"))
Brandy R Sinco
03:31:05
R is telling me that the function read_dta doesn't exist. What is the name of the package to install? I already tried install.packages("read_dta").
Anette Fasang
03:31:27
"haven", ### read data stored in various formats
Anette Fasang
03:31:57
try install.packages("haven")
Anette Fasang
03:32:08
rio is another good import package
Anette Fasang
03:32:50
with the rio package the command
Brooke Helppie-McFall
03:33:33
all set! the code all worked for me once I got the right file open. thank you!
Brandy R Sinco
03:33:34
Thanks AF. I was able to install haven. R will still not accept read_dta.
Chengming Han
03:33:47
What about trying the menu: file->import dataset->from stata
Anette Fasang
03:35:13
try installing the rio package; the command is then simply <- import("PartnerBirthbio.dta")
Elena Povedano
03:35:30
Hi Brandy! try installing package “readstata13”
Paula Fomby
03:42:35
@Brandy, I had same problem as you and took Chengming's advice to import the file. The import window then showed this to be the code to bring in the data:
Paula Fomby
03:42:38
library(haven)
PartnerBirthbio <- read_dta("01_data/PartnerBirthbio.dta")
Emanuela Struffolino
03:43:51
this is a good solution, but then you will have to rename the data object to "family", as the whole code runs with that name, or change the example code.
Paula Fomby
03:44:08
Got it, thanks!
Emanuela Struffolino
03:44:11
family <- PartnerBirthbio
Brandy R Sinco
03:47:50
I was able to install haven, but not readstata13. The error message was that RTools was required.
Emanuela Struffolino
03:48:14
could you then use read_dta?
Brandy R Sinco
03:51:25
Thanks. read_dta is now working.
Emanuela Struffolino
03:51:43
excellent!
NINA CASTRO MENDEZ
04:07:21
What counts as high and low costs?
Mădălina Manoilă
04:08:36
What is you take on using transition rates?
Mădălina Manoilă
04:08:53
your* :)
Mădălina Manoilă
04:09:51
Thank you!
NINA CASTRO MENDEZ
04:10:10
Tahnks!
NINA CASTRO MENDEZ
04:10:13
thanks!
Chengming Han
04:10:29
Is it possible that if I use numbers instead of letters, the results of OM would be different?
Anette Fasang
04:10:51
No, this should not happen.
Anette Fasang
04:11:29
you will have to specify in the seqdef command which categorical states the numbers refer to and then specify a cost setting.
Chengming Han
04:11:43
👍Thank you!
Anette Fasang
04:11:59
so whether you use letters or numbers as codes will not matter for the results.
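Why the coding does not matter can be seen from how OM is computed: the distance depends only on whether two states match and on the cost assigned to each pair of states, never on the label itself. Below is a minimal base-R sketch of the standard OM dynamic program (not TraMineR's internal code), with illustrative costs (indel 1, constant substitution 2, both assumptions), run once on letter codes and once on the same states recoded as numbers. The helper `om_distance` and the state labels are hypothetical.

```r
# OM distance via the edit-distance recursion: each cell takes the
# cheapest of a deletion, an insertion, or a substitution.
om_distance <- function(a, b, sub_cost, indel = 1) {
  n <- length(a)
  m <- length(b)
  d <- matrix(0, n + 1, m + 1)
  d[, 1] <- (0:n) * indel  # delete the first i states of a
  d[1, ] <- (0:m) * indel  # insert the first j states of b
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + indel,                                        # delete a[i]
        d[i + 1, j] + indel,                                        # insert b[j]
        d[i, j] + sub_cost[as.character(a[i]), as.character(b[j])]  # substitute
      )
    }
  }
  d[n + 1, m + 1]
}

# Letter-coded states with an illustrative cost matrix (0 on the diagonal).
states <- c("S", "M", "C")
sub_let <- matrix(2, 3, 3, dimnames = list(states, states))
diag(sub_let) <- 0

# The same states recoded as numbers; the cost matrix is relabelled too.
codes <- c(S = "1", M = "2", C = "3")
sub_num <- sub_let
dimnames(sub_num) <- list(unname(codes[states]), unname(codes[states]))

a_let <- c("S", "S", "M", "M")
b_let <- c("S", "M", "M", "C")
a_num <- unname(codes[a_let])
b_num <- unname(codes[b_let])

d_let <- om_distance(a_let, b_let, sub_let)
d_num <- om_distance(a_num, b_num, sub_num)  # both are 2 with these costs
```

The invariance holds only because the substitution costs travel with the states when they are relabelled, which is why the states are defined first and the cost setting second.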
Kyle Abraham Campbell
04:20:43
Does weights = Indep mean that the observations MUST be independent? That is, correlated observations shouldn't be used in those algorithms?
Brandy R Sinco
04:21:13
Clustering procedures are available in SAS in the STAT module, proc Cluster and proc FASTCLUS. The SAS/STAT module is part of the UM license, but the SAS GENETICS module is not. Personally, I would use R for the previous analyses because the procedures are not part of the UM SAS license, but will use the SAS procedures for cluster analysis.
Dolly Loomans
04:22:27
Is there a way to find out if specific sequences influence the ASW heavily? (i.e. outliers)
Anette Fasang
04:23:04
Generally the individual sequences should be independent, but the distances in the distance matrix are always dependent. So, for example, if you want to put bootstrap confidence intervals around mean distances, you need to resample from the initial pool of independent sequences and recalculate the distances each time; you cannot draw from one distance matrix in which the values are dependent.
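A minimal sketch of this resampling scheme, under stated assumptions: toy random sequences, and Hamming distance (number of time points at which two sequences differ) standing in for OM. In a real analysis you would recompute OM distances inside the loop, e.g. with TraMineR's `seqdist()`. The point illustrated is that each replicate resamples whole sequences and rebuilds the distance matrix; the helper names are hypothetical.

```r
set.seed(42)

# Hypothetical toy data: 30 sequences, 6 time points, alphabet {A, B, C}.
n_seq <- 30
n_time <- 6
seqs <- matrix(sample(c("A", "B", "C"), n_seq * n_time, replace = TRUE),
               nrow = n_seq, ncol = n_time)

# Hamming distance matrix, used here as a stand-in for OM distances.
hamming_matrix <- function(s) {
  d <- matrix(0, nrow(s), nrow(s))
  for (k in seq_len(ncol(s))) {
    d <- d + outer(s[, k], s[, k], "!=")
  }
  d
}

# Mean over distinct pairs (upper triangle, diagonal excluded).
mean_pairwise <- function(d) mean(d[upper.tri(d)])

observed <- mean_pairwise(hamming_matrix(seqs))

# Each replicate resamples the independent units (whole sequences) with
# replacement and then recomputes the full distance matrix.
n_boot <- 200
boot_means <- replicate(n_boot, {
  idx <- sample(n_seq, n_seq, replace = TRUE)
  mean_pairwise(hamming_matrix(seqs[idx, , drop = FALSE]))
})

# Percentile 95% interval for the mean pairwise distance.
ci <- quantile(boot_means, c(0.025, 0.975))
```

One wrinkle: resampling with replacement duplicates some sequences, and duplicated pairs contribute zero distances, which pulls the bootstrap means downward. Treat the interval as a rough guide.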
Dolly Loomans
04:23:41
thank you!
Wenshan Yu
04:24:08
If a group that is of research interest has a relatively small sample size, can I go against the suggestion from cluster techniques and choose a larger number of clusters so that the group can be a cluster on its own?
Anette Fasang
04:24:09
@Brandy: yes, you can easily store the distance matrix from the optimal matching in R and input it into another program to do the clustering there. Matthias Studer developed the WeightedCluster package for R, which has many easy-to-use functions and visualisation tools that work really well for cluster analysis after sequence analysis.
Anette Fasang
04:24:40
https://cran.r-project.org/web/packages/WeightedCluster/vignettes/WeightedCluster.pdf
Mădălina Manoilă
04:24:42
when you visualize the quality criteria values, is there a way to make the legend smaller?
Anette Fasang
04:25:50
@Wenshan: yes, you can do that based on substantive justifications. There are statistical criteria to choose the number of groups, and then there is the idea of "construct validity": that the groups resonate with a theoretically expected or meaningful typology. Emanuela will talk a bit about that
Wenshan Yu
04:27:01
Thanks!
Anette Fasang
04:27:15
@Madalina: You can just plot the legend separately with the command seqlegend(seq.data) and remove the legend from the original graph with the option with.legend = FALSE
Youn Park
04:27:38
What can we do with a bad cluster (one with a substantial share of negative silhouette values), like cluster 3?
Mădălina Manoilă
04:27:46
perfect! thanks!
Anette Fasang
04:27:55
I often do this, because then I can specify in the seqlegend command exactly what I want my legend to look like and my graph is less cluttered by the standard legend.
NINA CASTRO MENDEZ
04:28:57
Do you have any suggestion to associate the sequence analysis results to the historical, social and biographical time?
Anette Fasang
04:29:01
@Youn: good question. For descriptive purposes you can simply state that this is a very heterogeneous group, which can be substantively interesting.
Youn Park
04:31:38
Thank you!
Youn Park
04:37:53
@Emanuela, can you give me any articles/examples of the reallocation of clusters you just mentioned? That sounds interesting.
Emanuela Struffolino
04:39:03
@YounPark, I only know of some that are still under review/ongoing. We'll ask Anette later if she has a reference for this
Youn Park
04:39:29
@Emanuela, thanks!
Paul Chapin Schulz
04:41:01
As we wind down, when you get a chance, please take a minute to fill out our feedback form about the workshop https://pdhp.isr.umich.edu/workshops/workshop-feedback/
Anette Fasang
04:41:20
Jalovaara, Marika, and Anette Eva Fasang. "Family life courses, gender, and mid-life earnings." European Sociological Review 36, no. 2 (2020): 159-178.
Anette Fasang
04:42:40
This paper has an extended appendix on excluding low silhouettes when using the typology in a regression analysis
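Relating to the earlier questions about individual sequences and the ASW: a minimal base-R sketch that computes a silhouette width for every sequence, so that low or negative values can be inspected or set aside. The toy data, Hamming distance in place of OM, and Ward clustering into 4 groups are all assumptions for illustration; this is not the procedure used in the cited paper. The `cluster` package's `silhouette()` computes the same standard quantity if you prefer a library function.

```r
set.seed(1)

# Hypothetical toy data: 40 sequences, 8 time points, alphabet {A, B, C}.
n_seq <- 40
seqs <- matrix(sample(c("A", "B", "C"), n_seq * 8, replace = TRUE), nrow = n_seq)

# Hamming distances as a stand-in for OM distances.
d <- matrix(0, n_seq, n_seq)
for (k in seq_len(ncol(seqs))) d <- d + outer(seqs[, k], seqs[, k], "!=")

# Ward clustering cut into 4 groups (the number 4 is arbitrary here).
clust <- cutree(hclust(as.dist(d), method = "ward.D2"), k = 4)

# Silhouette width of sequence i: s(i) = (b - a) / max(a, b), where a is the
# mean distance to the other members of its own cluster and b is the smallest
# mean distance to the members of any other cluster. Singletons get 0.
silhouette_width <- function(d, clust) {
  sapply(seq_along(clust), function(i) {
    own <- clust == clust[i]
    if (sum(own) == 1) return(0)
    a <- mean(d[i, own & seq_along(clust) != i])
    b <- min(sapply(setdiff(unique(clust), clust[i]),
                    function(g) mean(d[i, clust == g])))
    if (max(a, b) == 0) return(0)
    (b - a) / max(a, b)
  })
}

sil <- silhouette_width(d, clust)
asw <- mean(sil)                            # average silhouette width (ASW)
asw_by_cluster <- tapply(sil, clust, mean)  # ASW within each cluster
low_sil <- which(sil < 0)                   # sequences closer to another cluster
```

Sequences in `low_sil` are the ones that drag a cluster's ASW down; whether to exclude them in a subsequent regression is a substantive decision, not something the silhouette values settle on their own.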
Paul Chapin Schulz
04:54:18
FYI full video of the workshop and the full chat will be posted on our website in the next few days. I'll e-mail out a notice when those go online.
NINA CASTRO MENDEZ
05:09:20
What do you suggest… multinomial logistic regression or logistic regression for each category?
Youn Park
05:10:43
In terms of the role of theory in clustering, would it be acceptable to include a minor cluster that is theoretically important but not captured among the 4-5 clusters suggested by the statistical criteria?
Youn Park
05:11:39
Thank you!
Brooke Helppie-McFall
05:11:51
thank you so much!
Dolly Loomans
05:12:14
Thank you! This was super informative and helpful. (I was happy that this was online, now I could join from the Netherlands :)!)
Chengming Han
05:12:16
Thank you!
Yongxin Shang
05:12:17
Thank you so much! This workshop is super helpful!
Yajuan Si
05:12:20
Thanks!
Connie Hsiung
05:12:24
thank you!
Elena Povedano
05:12:27
thank you!!
NINA CASTRO MENDEZ
05:12:28
Thank you very much!
Youn Park
05:12:30
Thank you so much! Learned a lot!
Frank (Zoom support 734.277.3061)
05:12:34
Thank you!
Mădălina Manoilă
05:12:36
Many thanks!
Andrew
05:12:38
Really interesting, thanks!
Julio
05:12:40
Thank you!!!!
Jacqui Smith
05:12:44
Thank you - hope to see you in Berlin too!