
01:15:41
ok thanks. the one that wasn't loading for me was "broomm" -- so do I need both?

01:15:51
No idea!

01:16:22
only broom

01:17:27
For those who want to try, here is another possible solution: devtools::install_github("tidymodels/broom")

01:17:46
Sorry, this:

01:17:47
install.packages("devtools")
devtools::install_github("tidymodels/broom")

01:18:06
in case devtools is not already installed on your machine

01:20:10
it's saying there is a dependency I don't have called dplyr

01:20:21
i tried installing with pacman. any other advice?

01:20:27
(that didn't work)

01:20:42
You can install dplyr first and then try again

01:21:16
sometimes the only solution is to install them individually
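A minimal sketch of installing the missing pieces one at a time, assuming the packages involved are dplyr and broom as in the messages above. Both `missing_packages` and `install_individually` are hypothetical helpers written for this illustration, not functions from any package.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install each missing package on its own, so that a
# failure can be traced to one package, then report what is still missing.
install_individually <- function(pkgs) {
  for (pkg in missing_packages(pkgs)) {
    install.packages(pkg)
  }
  invisible(missing_packages(pkgs))
}

# Usage (package names taken from the chat; adjust to your own error message):
# install_individually(c("dplyr", "broom"))
```

Installing a dependency such as dplyr before broom usually makes the error message more specific if something still fails.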

01:21:57
Looks good!

01:22:05
Welcome (officially) to everyone! I will be monitoring the chat while Anette presents

01:23:11
Will the PPT or documents be available after the training today?

01:23:30
I think they are already available on the webinar webpage

01:23:58
https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/

01:24:15
but Paul might follow up on this

01:24:36
Yes, slides are already online and video will be posted shortly after the workshop. https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/

01:27:45
Question: what is the relationship between sequence analysis and trajectory analysis?

01:28:52
Can you specify what you mean by trajectory analysis?

01:29:05
Do you have a paper/book in mind?

01:30:31
I had the same question, but in relation to Latent Transition Analysis (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2846549/)

01:30:49
@Emanuela: group based trajectory models, e.g., work by Nagin and coauthors.

01:31:32
I see. Anette is going to discuss this shortly, but the main difference is that here we are considering sequences of categorical states.

01:32:25
^^ Thank you for this clarification.

01:32:41
Here you are!

01:34:41
And for the LTA question, also found this paper that directly compares the two: https://www.jstor.org/stable/23361038

01:35:46
Yes! that is a very important paper indeed to find out more about pros and cons of the two techniques - I would say that a lot depends on your research question

01:36:43
@Emanuela: one more question please. How many categorical states can the sequence analysis models handle? For example, if you had 100 states, would the models still be estimable (leaving interpretability aside for the moment)? I have only seen them applied to a small number of states.

01:36:46
Process sociology sounds like non-sequential in terms of temporality, focusing on changes and instants. How does this ontology still suit optimal matching methods, sequence analysis?

01:38:33
I’m also curious about the relationship between sequence analyses and Markov models. My understanding is that sequence analyses capture the sequence but do not attempt to model the probability of moving from one state to another. Is that right?

01:40:02
@YounPark: I am not sure I follow your argument here: at least in Abbott's understanding, processual sociology relates exactly to time as an inherent dimension of social occurrences. Maybe you can give me more details and I can try to address your question better?

01:41:46
Can you please provide the reference for the Abbott paper Prof. Fasang mentioned that critiques traditional regression models from a processual framework?

01:42:11
@Anette: thank you!

01:43:08
@PaulaFomby: here are a couple of references that you can also find in the course outline:

01:43:09
• Abbott, A. (1992). From causes to events: Notes on narrative positivism. Sociological Methods & Research, 20(4), 428–455. doi: 10.1177/0049124192020004002
• Abbott, A. (1995). Sequence analysis: New methods for old ideas. Annual Review of Sociology, 21(1), 93–113. doi: 10.1146/annurev.so.21.080195.000521

01:43:24
^^Thanks!

01:44:16
In Abbott’s early work, he sees social processes as a sequence of events. But in his later work, he uses the term "the lineage of successive events" and seems to take a non-sequential view of temporality. Still confused, but that is my understanding so far.

01:44:23
Here are some references on combining Markov models and SA:

01:44:26
https://arxiv.org/abs/1704.00543

01:44:39
https://library.oapen.org/bitstream/handle/20.500.12657/23137/1007017.pdf?sequence=1#page=191

01:45:22
What is the relation between sequence analysis and time series analysis? Is time series analysis a variable-based or a process-based model?

01:46:33
@Emanuela: thank you! This is very useful.

01:47:08
@YounPark: I see your point. We will see later, when talking about optimal matching, that "keeping" temporal order is indeed a key issue that goes beyond the theoretical understanding of temporal processes as such. I suggest that I make a note of this and come back to it in the Q&A.

01:48:00
@Emanuela, thanks!

01:48:19
@ChengmingHan: time series analysis is variable based and does not aim to identify the "picture" of how the temporal process unfolds

01:48:59
I see. Thank you!

02:01:40
Are there (rough) guidelines related to sample size/number of time points necessary to even think about SA?

02:05:17
@LauraTaylor: as this is a descriptive technique, in theory no. However, as we will see later today, every step implies assumptions, and in some cases the advantage offered by the technique does not justify the whole process. For example, if I have 2 states in my alphabet and 4 time points, I would probably get more efficient results just using summary indicators in a model. This is also because you are not very likely to find large variation in the clusters you extract from the initial sample.

02:06:16
I have myself used sequences of 8 time points and 4 states and got nice results that were over and above what I could get with a standard regression, but this very much depends on your data structure/the process

02:06:43
Thanks - it's just helpful to have a sense of what you all have worked with!

02:10:45
Is there an advantage to using R rather than the SAS Genetics module, other than the cost?

02:11:42
@BrandyRSinco: I do not know the SAS Genetics module. I only know the Stata packages, and what I can say is that computation time in Stata is much, much longer

02:11:51
Do you recommend weighting the sequence analysis?

02:16:48
is there anyone here in the ISR building who is a pretty good Rstudio user? I'm still having trouble getting broom to install; I think I've gotten everything else done....

02:17:04
@PaulaFomby: here is another early paper by Abbott on Transcending general linear reality that appeared in Sociological Theory in 1988

02:17:06
https://www.jstor.org/stable/202114?seq=1#metadata_info_tab_contents

02:17:23
wondering if i could pop by someone's office and show them what RStudio on my laptop is telling me and try to troubleshoot

02:18:00
i'm up on the 5th floor

02:18:18
room 5080

02:18:21
@Brooke-- I don't use it, but I know that RStudio requires an admin login to install properly. Do you have an admin login to install the packages or programs you need? If not, I can connect remotely and work with you to see if we can get it to install correctly

02:19:00
I'd be glad to help you as well. I'm in ISR Thompson 2104.

02:19:09
broom is not so important for the hands on, you will be able to do 99% of the things without it

02:19:37
Can you explain a bit more why SA is a process-based method instead of a variable-based method? Sounds like we still focus on, e.g., one variable with different states (isn’t it variable-based also)?

02:21:03
Hi everyone! Great workshop! Is anyone in the SSA community working on developing user-friendly software for SSA that doesn't require knowing R or Stata? Greetings from Cuba

02:21:06
@NinaCastroMendez: the weights are a delicate issue in this context, I will say some words later

02:22:30
@DamianSantiago: you will see that with R it will be fairly simple to conduct the basic steps. I will say more about this later. I do not know of any other software, I am sorry

02:23:10
@Chi-Lin Yu: with visualization this will be clearer, I hope - I will follow up on this

02:23:14
@Damian: the R package TraMineR does not necessarily require a lot of prior knowledge of R and is very good. It is certainly the easiest and most powerful tool out there and was developed by a team of statisticians and computer scientists in Geneva/Lausanne. The Stata module is convenient for Stata users but has far fewer possibilities for analysis and the code is often longer and more complicated, also computation time in Stata tends to be longer.

02:24:16
thanks Prof. Struffolino and Prof. Fasang!

02:27:19
So by weights do you mean giving some cases more importance than others?

02:28:31
@Andrew: yes exactly. We would try to give individuals who are more likely to drop out of a panel survey higher weights, for example.

02:30:35
Could you please share the link for the reduced version of the dataset that we'll use today?

02:34:19
How do you analyze several trajectory types at once? For example, family formation and a health trajectory

02:35:37
@Madalina: you should have received all the materials for today by email before I think? Emanuela will walk you through it when you open the data.

02:36:20
@Damian: This can be done with a method called multichannel sequence analysis, which allows you to analyse several parallel "channels" or dimensions. We will briefly talk about these and other extensions at the end of the course.

02:36:43
Hey all, I've just had to update Rstudio, and now TraMineR isn't installing using pacman or install.packages. The latter says "object TraMineR not found"

02:36:53
Are the data included in the R and RStudio install? I didn’t see a separate link to a data source.

02:38:15
Emanuela will show how to open the data and will walk through code in a little bit.

02:38:24
Ok thanks 🙂

02:38:27
got it. i actually figured it out.

02:42:07
How does the change of granularity work? Do you select the modal state when changing from months to years?

02:43:26
are there inferential tests over sequences?

02:43:44
@Damian: the modal state simply means the state that is most common at a given time point in a given sample. So it has nothing to do with the time granularity, except that the modal state can be different at each time point.

02:44:28
@Damian: https://journals.sagepub.com/doi/full/10.1177/0081175020959401

02:44:48
Liao, T. F., & Fasang, A. E. (2021). Comparing groups of life-course sequences using the Bayesian information criterion and the likelihood-ratio test. Sociological Methodology, 51(1), 44–85.

02:45:32
Inferential statistics are done in sequence analysis using bootstrap methods, and there is an adaptation of the BIC and LRT to compare groups of sequences, which I posted above.

02:45:38
thanks!

02:46:29
Another link (colors) : https://www.colourlovers.com/palette/312352/

02:47:55
Is it possible to use SSA for non-temporal data, like repeated qualitative data? Any references?

02:49:36
@Damian: yes, that is totally possible as long as the qualitative states have an order. This order would not have to be a temporal order. Repeated qualitative data does have a temporal order, even if it is not monthly or yearly, if you have a 1st, 2nd, 3rd time point.

02:56:16
where is this folder to download?

02:56:29
https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/

02:56:44
i don't see it at that link

02:56:44
thanks Anette!

02:57:08
No lab materials

03:00:38
In your example, transition rates & numbers are different between monthly & yearly data. I was wondering if identified patterns could be different as well, depending on time intervals?

03:00:39
I will need a few minutes to post online.

03:01:54
Can you talk to us about costs? What do you recommend?

03:02:42
Thank you!

03:03:51
thank you!

03:04:51
Have you come across the work of David Clarke at the University of Nottingham? He works on sequence analysis but I think it’s more micro-level, for example modeling the sequence of elements of a traffic accident.

03:07:01
my help page looks very different

03:07:52
nevermind

03:09:22
We also have open access HRS life history data that could be used for sequence analysis (similar to SHARE, ELSA). For example, employment states, marital states, all within couples. https://hrs.isr.umich.edu/news/data-announcements/cross-wave-2015-2017-life-history-mail-survey-lhms-harmonized-and-aggregated

03:11:01
https://www.dropbox.com/sh/ttz1ikpz0fjb62q/AAD28ZKT5HokRZkWmLVUU3fXa?dl=0

03:12:46
still working

03:12:47
All good - thanks!

03:12:54
working on 2 computers so still need 2-3mins

03:13:14
Thanks for the link. It is working for me as well.

03:13:23
Done

03:15:03
how do i get my rstudio to show the right file?

03:15:07
Issues with loading the packages can depend on your computer and the admin rights needed to install things.

03:15:16
i didn't see what she did to get the right folder to show up in lower right

03:16:13
@Brooke: You can open the R script on the upper left?

03:16:25
ok I'll try that

03:17:08
the lower right is an output window that shows things when you run the code. R does not show you the data set unless you request it to open the data. the data is kind of in the background and you run the commands in the upper left and get the output in the lower left.

03:17:20
it's working now. thanks!

03:17:24
you can ignore the lower right for now.

03:20:21
family<--read_dta(C:/Courses/UM_Genetics_SocialScience/Data_01/PartnerBirthbio.dta). R is giving me the error message, Error: unexpected '/' in "family<--read_dta(C:/". Does anyone see a problem with my R code?

03:21:04
@Brandy: try read.dta with a . instead of _

03:21:27
Sometimes it is also about having back or forward slashes

03:22:57
I think your error message is about the slashes or your path, not the _

03:23:23
Is the code running for most people?

03:23:31
Yes here!

03:23:36
Yes

03:23:37
here too!

03:23:38
Yes it works !

03:23:42
good thanks!

03:23:43
yes, thanks!

03:24:44
family<--read.dta("C:/Courses/UM_Genetics_SocialScience/Data_01/PartnerBirthbio.dta")
Error in read.dta("C:/Courses/UM_Genetics_SocialScience/Data_01/PartnerBirthbio.dta") : could not find function "read.dta"
Any suggestions?

03:25:27
@Brandy it looks like your issue was you were missing quotes, not the _

03:25:40
FYI links to the materials are now added to the PDHP website as well.

03:25:42
https://pdhp.isr.umich.edu/workshops/sequence-analysis-for-social-science/

03:25:44
try first setting your working directory with setwd("C:/Courses/UM_Genetics_SocialScience/Data_01/")

03:26:09
There is a link to the Dropbox and also to directly download as a zip file.

03:27:40
then <- read_dta(here("01_data", "PartnerBirthbio.dta"))

03:28:28
@Brandy: setwd("C:/Courses/UM_Genetics_SocialScience/")

03:28:49
family <- read_dta(here("01_data", "PartnerBirthbio.dta"))

03:31:05
R is telling me that the function, read_dta, doesn't exist. What is the name of the package to install? I already tried install.packages("read_dta").

03:31:27
"haven", ### read data stored in various formats

03:31:57
try install.packages("haven")

03:32:08
rio is another good import package

03:32:50
with the rio package the command

03:33:33
all set! the code all worked for me once I got the right file open. thank you!

03:33:34
Thanks AF. I was able to install haven. R will still not accept read_dta.

03:33:47
What about trying the menu: file->import dataset->from stata

03:35:13
try installing the rio package and then the command is simply <- import("PartnerBirthbio.dta")

03:35:30
Hi Brandy! try installing package “readstata13”

03:42:35
@Brandy, I had same problem as you and took Chengming's advice to import the file. The import window then showed this to be the code to bring in the data:

03:42:38
library(haven)
PartnerBirthbio <- read_dta("01_data/PartnerBirthbio.dta")

03:43:51
This is a good solution, but then you will have to rename the data object to "family", because the rest of the code runs with that name - or change the example code.

03:44:08
Got it, thanks!

03:44:11
family <- PartnerBirthbio

03:47:50
I was able to install haven, but not readstata13. The error message was that RTools was required.

03:48:14
could you then use read_dta?

03:51:25
Thanks. read_dta is now working.
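To recap the import issue, here is a small sketch, assuming the haven package is installed. The temporary file and its two columns are made up so the example runs anywhere; with the workshop materials the call would point at "01_data/PartnerBirthbio.dta" instead.

```r
library(haven)  # provides read_dta() and write_dta() for Stata .dta files

# Stand-in for the workshop file: write a tiny Stata file to a temporary path.
dta_path <- file.path(tempdir(), "example.dta")
write_dta(data.frame(id = 1:3, state = c(1, 2, 2)), dta_path)

# The path is passed as a quoted string. An unquoted C:/... gives
# "unexpected '/'" because R tries to parse the path as code.
# Use a single `<-`: `family<--read_dta(...)` is read as assigning the
# negated result, i.e. family <- -read_dta(...).
family <- read_dta(dta_path)

# In the workshop folder the equivalent call would be:
# family <- read_dta("01_data/PartnerBirthbio.dta")
```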

03:51:43
excellent!

04:07:21
What counts as high and low costs?

04:08:36
What is you take on using transition rates?

04:08:53
your* :)

04:09:51
Thank you!

04:10:10
Thanks!

04:10:13
thanks!

04:10:29
Is it possible that if I use numbers instead of letters as state codes, the results of OM would be different?

04:10:51
No, this should not happen.

04:11:29
you will have to specify in the seqdef command which categorical states the numbers refer to and then specify a cost setting.

04:11:43
👍Thank you!

04:11:59
so whether you use letters or numbers as codes will not matter for the results.
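A small sketch of this point, assuming TraMineR is installed. The toy sequences, the state names and labels, and the cost setting (constant substitution cost 2, indel 1) are all made up for illustration; `om_distance` is a hypothetical helper.

```r
library(TraMineR)

# Toy data: 4 sequences, 5 time points, 3 states coded as letters.
letters_df <- data.frame(
  t1 = c("A", "A", "B", "C"),
  t2 = c("A", "B", "B", "C"),
  t3 = c("B", "B", "C", "C"),
  t4 = c("B", "C", "C", "A"),
  t5 = c("C", "C", "C", "A"),
  stringsAsFactors = FALSE
)

# The same sequences recoded as numbers: A = 1, B = 2, C = 3.
numbers_df <- as.data.frame(
  lapply(letters_df, function(x) match(x, c("A", "B", "C")))
)

seq_letters <- seqdef(letters_df)
# For numeric codes, `states` and `labels` say what each number stands for.
seq_numbers <- seqdef(numbers_df,
                      states = c("S", "M", "P"),
                      labels = c("Single", "Married", "Parent"))

# Hypothetical helper: OM distances with constant substitution costs.
om_distance <- function(seqdata) {
  costs <- seqsubm(seqdata, method = "CONSTANT", cval = 2)
  seqdist(seqdata, method = "OM", indel = 1, sm = costs)
}

d_letters <- om_distance(seq_letters)
d_numbers <- om_distance(seq_numbers)

# Compare the two distance matrices, ignoring row and column names.
all.equal(d_letters, d_numbers, check.attributes = FALSE)
```

The codes only name the states; the distances depend on the cost setting and on where the states occur in each sequence.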

04:20:43
Does weights = Indep mean that the observations MUST be independent? That is, correlated observations shouldn't be used in those algorithms?

04:21:13
Clustering procedures are available in SAS in the STAT module, proc Cluster and proc FASTCLUS. The SAS/STAT module is part of the UM license, but the SAS GENETICS module is not. Personally, I would use R for the previous analyses because the procedures are not part of the UM SAS license, but will use the SAS procedures for cluster analysis.

04:22:27
Is there a way to find out if specific sequences influence the ASW heavily? (i.e. outliers)

04:23:04
Generally the individual sequences should be independent, but the distances in the distance matrix are always dependent. So, for example, if you want to put bootstrap confidence intervals around mean distances, you need to sample from the initial pool of independent sequences and recalculate the distances each time; you cannot draw from one distance matrix in which the values are dependent.

04:23:41
thank you!

04:24:08
If a group that is of research interest has a relatively small sample size, can I go against the suggestion from cluster techniques and choose a larger number of clusters so that the group can be a cluster on its own?

04:24:09
@Brandy: yes, you can easily store the distance matrix from the optimal matching in R and input it into another program to do the clustering there. Matthias Studer wrote the WeightedCluster package for R, which has many easy functions and visualisation tools that work really well for cluster analysis after sequence analysis.

04:24:40
https://cran.r-project.org/web/packages/WeightedCluster/vignettes/WeightedCluster.pdf
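A rough sketch of that hand-off, assuming TraMineR is installed and using the mvad example data that ships with it. The subsample of 100 sequences, the transition-rate costs, the output file name, Ward linkage and the choice of four clusters are all assumptions for illustration, not recommendations from the workshop.

```r
library(TraMineR)

data(mvad)                                 # example data shipped with TraMineR
mvad.seq <- seqdef(mvad[1:100, ], 17:86)   # first 100 cases, monthly states

# Optimal matching distances with substitution costs from transition rates.
costs <- seqsubm(mvad.seq, method = "TRATE")
om <- seqdist(mvad.seq, method = "OM", indel = 1, sm = costs)

# Option 1: store the distance matrix so another program can read it.
out_file <- file.path(tempdir(), "om_distances.csv")
write.csv(om, out_file)

# Option 2: cluster directly in R with base functions.
ward <- hclust(as.dist(om), method = "ward.D2")
clusters <- cutree(ward, k = 4)
table(clusters)
```

The WeightedCluster vignette linked above covers cluster quality measures and plots that go beyond this base R version.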

04:24:42
when you visualize the quality criteria values, is there a way to make the legend smaller?

04:25:50
@Wenshan: yes, you can do that based on substantive justifications - there are statistical criteria to choose the number of groups, and then there is the idea of "construct validity": that the groups resonate with a theoretically expected or meaningful typology. Emanuela will talk a bit about that

04:27:01
Thanks!

04:27:15
@Madalina: You can just plot the legend separately with the command seqlegend(seq.data) and remove the legend from the original graph with the option with.legend = FALSE

04:27:38
What can we do with the bad cluster (a significant amount of negative ASW) like cluster3?

04:27:46
perfect! thanks!

04:27:55
I often do this, because then I can specify in the seqlegend command exactly what I want my legend to look like and my graph is less cluttered by the standard legend.
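A minimal sketch of the separate legend, assuming TraMineR is installed and using its mvad example data. The side-by-side layout and the `cex` value are illustrative choices, and `cex` is assumed to be available as in recent TraMineR versions.

```r
library(TraMineR)

data(mvad)                          # example data shipped with TraMineR
mvad.seq <- seqdef(mvad, 17:86)     # monthly activity states

# Two panels side by side: the plot without its legend, then the legend alone.
par(mfrow = c(1, 2))
seqdplot(mvad.seq, with.legend = FALSE, border = NA)
seqlegend(mvad.seq, cex = 0.8)      # smaller text makes the legend smaller
```

Because the legend is drawn by its own call, its size and position can be changed without touching the main plot.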

04:28:57
Do you have any suggestions for relating the sequence analysis results to historical, social and biographical time?

04:29:01
@Youn: good question. For descriptive purposes you can simply state that this is a very heterogeneous group, which can be substantively interesting.

04:31:38
Thank you!

04:37:53
@Emanuela, can you give me any articles/examples of the reallocation of clusters you just mentioned? That sounds interesting.

04:39:03
@YounPark, I only know about some that are still under review/ongoing. We'll ask Anette later if she has a reference for this

04:39:29
@Emanuela, thanks!

04:41:01
As we wind down, when you get a chance, please take a minute to fill out our feedback form about the workshop https://pdhp.isr.umich.edu/workshops/workshop-feedback/

04:41:20
Jalovaara, Marika, and Anette Eva Fasang. "Family life courses, gender, and mid-life earnings." European Sociological Review 36, no. 2 (2020): 159-178.

04:42:40
This paper has an extended appendix on excluding low silhouettes when using the typology in a regression analysis

04:54:18
FYI full video of the workshop and the full chat will be posted on our website in the next few days. I'll e-mail out a notice when those go online.

05:09:20
What do you suggest… multinomial logistic regression or logistic regression for each category?

05:10:43
In terms of the role of theory in clustering, would it be acceptable to include a minor cluster that is theoretically important but technically not captured among the 4-5 clusters?

05:11:39
Thank you!

05:11:51
thank you so much!

05:12:14
Thank you! This was super informative and helpful. (I was happy that this was online, now I could join from the Netherlands :)!)

05:12:16
Thank you!

05:12:17
Thank you so much! This workshop is super helpful!

05:12:20
Thanks!

05:12:24
thank you!

05:12:27
thank you!!

05:12:28
Thank you very much!

05:12:30
Thank you so much! Learned a lot!

05:12:34
Thank you!

05:12:36
Many thanks!

05:12:38
Really interesting, thanks!

05:12:40
Thank you!!!!

05:12:44
Thank you - hope to see you in Berlin too!