Crystal Lewis

@Cghlewis

3,475 Followers · 1,936 Following · 623 Media · 4,788 Statuses

Research Data Management Consultant | Co-organizer @rladiesstl & @poWOMENer | Data Mgmt Hub | Author of Data Management in Large-Scale Education Research

St Louis, MO
Joined April 2018
Pinned Tweet
@Cghlewis
Crystal Lewis
2 months
Today is the day! Data Management in Large-Scale Education Research is officially out in stores and online! @CRCPress The open access version will remain freely available as well here!
@Cghlewis
Crystal Lewis
26 days
When someone tells me they think a dataset will be quick to clean.
@Cghlewis
Crystal Lewis
4 months
When I receive a new dataset to clean and I’m looking to see what I’ve gotten myself into.
@Cghlewis
Crystal Lewis
3 months
Tweet media one
@Cghlewis
Crystal Lewis
3 months
Just a reminder if you have some messy #rstats code, the {styler} package addins can quickly clean it up according to the tidyverse style guide rules.
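The styler addins act on the open script or the current selection; the same restyling is also available from the console. A minimal sketch, assuming {styler} is installed; the messy string is invented for illustration:

```r
library(styler)

# A made-up line of messy code, stored as a string
messy <- "x<-c(1,2)"

# style_text() applies the tidyverse style guide by default (style = tidyverse_style)
tidy <- style_text(messy)
tidy

# To restyle a whole script in place, style_file() takes a file path instead
```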
@Cghlewis
Crystal Lewis
11 days
Me over here still using RStudio instead of Positron, RMarkdown instead of Quarto, and %>% instead of |>. 😅
@Cghlewis
Crystal Lewis
5 months
For anyone who is interested, the slides from this training session can be found here:
@Cghlewis
Crystal Lewis
5 months
That was a lot of fun! #NCME2024
@Cghlewis
Crystal Lewis
1 year
University professors - please start teaching research data management in your research methods courses. Thank you for coming to my talk.
@Cghlewis
Crystal Lewis
2 months
dplyr::anti_join(), always there to help me quickly figure out who is missing from a dataset.🌟 #rstats
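A minimal sketch of that check, assuming {dplyr} is installed. The `roster` and `survey` tables and their columns are hypothetical; `anti_join()` returns the rows of the first table that have no match in the second:

```r
library(dplyr)

# Hypothetical data: everyone on the roster, and who returned a survey
roster <- tibble(student_id = c(101, 102, 103, 104),
                 school = c("A", "A", "B", "B"))
survey <- tibble(student_id = c(101, 103),
                 score = c(88, 92))

# Keep roster rows with NO matching student_id in the survey data
missing_survey <- anti_join(roster, survey, by = "student_id")
missing_survey
```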
@Cghlewis
Crystal Lewis
2 years
For 2 years I have worked on this #rstats resource. It's a collection of functions used to wrangle data, especially in the field of ed research. Functions are organized by task (such as naming variables), and examples of how to use functions are provided.
@Cghlewis
Crystal Lewis
3 years
Over the years, I think the reason I have slowly pivoted my focus from data analysis to data management is b/c I've seen so much mismanagement of data, in every sector, that I am concerned about how many findings are based on bad data. Not bad analysis, but incorrect data. (1/2)
@Cghlewis
Crystal Lewis
1 month
Looking for a standardized list of steps for cleaning data for the purposes of data sharing? I've got a checklist for you! Each step explained in detail here:
@Cghlewis
Crystal Lewis
2 years
✍️New blog post about what is arguably one of the most important pieces of documentation: A data dictionary.
@Cghlewis
Crystal Lewis
3 years
After losing my mind for a while, I finally figured out how to reference variables in a function using {{ }}, and how to create dynamic variable names using := 😳 #rstats
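A minimal sketch of both ideas, assuming {dplyr} 1.0 or later. `summarise_mean()` and the `scores` data are hypothetical; `{{ }}` passes the caller's bare column names through, and `:=` lets the output column name be built from the argument (here `mean_math`):

```r
library(dplyr)

# Hypothetical helper: mean of any column, grouped by any column
summarise_mean <- function(data, var, group) {
  data %>%
    group_by({{ group }}) %>%
    # "mean_{{ var }}" becomes "mean_math" when var = math
    summarise("mean_{{ var }}" := mean({{ var }}, na.rm = TRUE),
              .groups = "drop")
}

scores <- tibble(school = c("A", "A", "B", "B"),
                 math = c(80, 90, 70, NA))
out <- summarise_mean(scores, math, school)
out
```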
@Cghlewis
Crystal Lewis
2 years
I had never used the lubridate::parse_date_time() function until now. How awesome is this? #rstats
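A minimal sketch of why it helps with messy date entry, assuming {lubridate} is installed; the date strings are made up. `orders` lists candidate formats, and each value is parsed with whichever one fits:

```r
library(lubridate)

# The same date typed three different ways (hypothetical values)
raw_dates <- c("2021-03-15", "03/15/2021", "March 15, 2021")

# Try year-month-day and month-day-year; the result is POSIXct (UTC by default)
parsed <- parse_date_time(raw_dates, orders = c("ymd", "mdy"))

# Drop the time component if only the date is needed
as.Date(parsed)
```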
@Cghlewis
Crystal Lewis
1 month
Tweet media one
@Cghlewis
Crystal Lewis
1 year
I swear, rainbow parentheses is one of the best gifts RStudio has given us. #rstats
@Cghlewis
Crystal Lewis
2 months
I received physical copies of my book and it's difficult to contain my excitement! I spent 2.5 years researching, writing, and revising this content and I'm so happy with how it turned out. Currently available for pre-order! Ships after July 9!
@Cghlewis
Crystal Lewis
2 years
I had no idea the #rstats "todor" Addin existed for RStudio. Thank you to @charliejhadley for pointing this out. I can finally easily find my future notes to myself!
@Cghlewis
Crystal Lewis
2 years
@jacasiegel I would recommend @rfortherest courses. They are super beginner friendly.
@Cghlewis
Crystal Lewis
29 days
Codebooks are excellent summary tools for both reviewing and documenting a dataset. There are several #rstats packages that can quickly create codebooks from existing datasets. Learn about a few of them here.
@Cghlewis
Crystal Lewis
2 months
Because I'm so excited about the official release of my book tomorrow, I wanted to share other open access books that everyone should check out! 1. 2. 3. 4. 5.
@Cghlewis
Crystal Lewis
3 months
If at all possible, clean your data using code. Cleaning data manually is ✔️error prone ✔️difficult to check/reproduce ✔️time consuming
@Cghlewis
Crystal Lewis
2 months
So purrr::map_dfr() is superseded. Is this the newer (tidyverse approved) way to import and bind data frames now? I can't keep up. 😅 #rstats
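In purrr 1.0.0, `map_dfr()` was superseded in favor of `map()` followed by `list_rbind()`. A minimal sketch, assuming purrr 1.0.0 or later; the temporary files and their contents are hypothetical. (`readr::read_csv()` can also read a vector of file paths directly, with its `id` argument recording the source file.)

```r
library(purrr)

# Write two small example CSV files to a temporary folder (hypothetical data)
data_dir <- file.path(tempdir(), "waves")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(data_dir, "wave1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(data_dir, "wave2.csv"), row.names = FALSE)

# Name the paths so list_rbind() can record which file each row came from
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
files <- set_names(files, basename(files))

# map() returns a list of data frames; list_rbind() stacks them
combined <- list_rbind(map(files, read.csv), names_to = "source_file")
combined
```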
@Cghlewis
Crystal Lewis
3 years
If you use #rstats #tidyverse but still don't fully grasp concepts such as what is ~, why do we sometimes add .x in a function, or what is NSE, I "attempted" to demystify some of these concepts as part of my ed data wrangling resource.
@Cghlewis
Crystal Lewis
2 months
When running a research project you will have tons of files to store (data, reports, forms, etc.). Before a project begins, take time to organize your folder structure so that you consistently name and store those files in a way that makes them easy to find and use.
@Cghlewis
Crystal Lewis
1 year
(1/4) When publicly sharing your data it's important to share documentation along with your data. Readmes, data dictionaries, and project documentation can ensure that future users understand both the contents and context of your data. Templates in comments below. #edresearch
@Cghlewis
Crystal Lewis
1 month
It's been a while since I've shared this resource. If you are looking for help wrangling education data in #rstats , this wiki provides code examples organized by common tasks.
@Cghlewis
Crystal Lewis
3 months
If you ever need to move files around in your directory, #rstats can help do this quickly, especially when you are working with hundreds of files! Here is just one way you might quickly sort and copy files into a new location.
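One way to do this in base R, as a sketch; the folder names and the `consent_` file naming pattern are hypothetical. `list.files()` selects files by a regular expression, and `file.copy()` copies a whole vector of paths in one call:

```r
# Hypothetical setup: a folder of mixed files that need sorting
src  <- file.path(tempdir(), "incoming")
dest <- file.path(tempdir(), "consent_forms")
unlink(c(src, dest), recursive = TRUE)  # start clean if re-run
dir.create(src)
dir.create(dest)
invisible(file.create(file.path(src, c("consent_101.pdf",
                                       "consent_102.pdf",
                                       "survey_101.csv"))))

# Select only the files whose names match the pattern
to_copy <- list.files(src, pattern = "^consent_.*\\.pdf$", full.names = TRUE)

# Copy them all into the destination folder at once
copied <- file.copy(from = to_copy, to = dest)
list.files(dest)
```

Swapping `file.copy()` for `file.rename()` (with full destination paths) would move the files instead of copying them.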
@Cghlewis
Crystal Lewis
7 months
As a reminder, you can export data frames to individual spreadsheet tabs using the {openxlsx} package! #rstats
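A minimal sketch, assuming {openxlsx} is installed; the data frames and file name are hypothetical. Passing a named list to `write.xlsx()` writes each data frame to its own worksheet, using the list names as tab names:

```r
library(openxlsx)

# Hypothetical data frames to export
wave1 <- data.frame(id = 1:3, score = c(10, 12, 15))
wave2 <- data.frame(id = 1:3, score = c(11, 14, 16))

# One worksheet per list element; names become the tab names
out_file <- file.path(tempdir(), "scores.xlsx")
write.xlsx(list(wave1 = wave1, wave2 = wave2), file = out_file, overwrite = TRUE)

getSheetNames(out_file)
```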
@Cghlewis
Crystal Lewis
4 days
My periodic reminder that if you are looking for examples of how to structure a repository for public data sharing, as well as what documentation to share along with your data, I created an example project here on @OSFramework !
@Cghlewis
Crystal Lewis
6 months
Interested in learning how to better organize your data on this beautiful Sunday? ☀️ I’ve got you covered:
@Cghlewis
Crystal Lewis
16 days
Managing data without a plan
@Cghlewis
Crystal Lewis
1 month
As I'm running a script to check for entry errors today, I want to remind everyone to never assume your data entry process is error-free. No matter how careful your team is, errors are always possible. Double entry helps catch errors early on!
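A base-R sketch of comparing two rounds of double entry. `find_entry_mismatches()` is a hypothetical helper that assumes both entries share the same columns and the same set of ids. It reports every cell where the two entries disagree, and it counts a value entered in one file but missing in the other as a disagreement:

```r
# Two hypothetical, independently entered copies of the same paper forms
entry1 <- data.frame(id = 1:3, score = c(10, 12, 15), grade = c(3, 4, NA))
entry2 <- data.frame(id = 1:3, score = c(10, 21, 15), grade = c(3, 4, 5))

find_entry_mismatches <- function(a, b, id = "id") {
  # Align rows by id and confirm the two files have the same shape
  a <- a[order(a[[id]]), , drop = FALSE]
  b <- b[order(b[[id]]), , drop = FALSE]
  stopifnot(identical(names(a), names(b)), identical(a[[id]], b[[id]]))

  vars <- setdiff(names(a), id)
  rows <- lapply(vars, function(v) {
    # Values match if equal, or if both are missing
    same <- (a[[v]] == b[[v]]) | (is.na(a[[v]]) & is.na(b[[v]]))
    same[is.na(same)] <- FALSE  # one NA and one value counts as a mismatch
    data.frame(id = a[[id]][!same],
               variable = rep(v, sum(!same)),
               entry1 = as.character(a[[v]][!same]),
               entry2 = as.character(b[[v]][!same]))
  })
  do.call(rbind, rows)
}

find_entry_mismatches(entry1, entry2)
```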
@Cghlewis
Crystal Lewis
2 months
It's 10pm and I'm looking up the arguments for #rstats pivot_wider() for the 100th time. 😅
@Cghlewis
Crystal Lewis
3 months
Check out Cleaning Medical Data with R from the 2023 @r_medicine conference! Both the slides and the recording are available here. #rstats
@Cghlewis
Crystal Lewis
1 year
Earlier this summer @ibddoctor , @PipingHotData , and I gave a workshop on cleaning data in R for the R/Medicine Conference. Not sure when the recording will be available but the slides are available here for anyone who is interested! #rstats
@Cghlewis
Crystal Lewis
4 months
Thanks to @DataSciNews for sharing my Data Cleaning for Data Sharing Using R materials in their most recent newsletter. ☺️
@Cghlewis
Crystal Lewis
2 months
Tips for data cleaning using code! 👇 More info about data cleaning using code: More info about creating a coding style guide:
@Cghlewis
Crystal Lewis
2 years
I 🧡 {pointblank} for validation. It notifies you if your tests fail (ex: values out of range), and provides a csv to review cases that fail each test! It works great in conjunction with a data dictionary where you have predefined expectations for your vars! #rstats #edresearch
@Cghlewis
Crystal Lewis
7 months
Restructuring your longitudinal data from long to wide OR wide to long format is a fairly painless process in #rstats , especially when you consistently name your variables using controlled vocabulary. 🌟 Full script here:
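A minimal sketch with {tidyr}, assuming variables follow a hypothetical `<measure>_<wave>` naming pattern. Because every name splits cleanly on `_`, `pivot_longer()` can recover both the measure and the wave:

```r
library(tidyr)

# Hypothetical long data: one row per student per wave
long <- data.frame(
  student_id = c(1, 1, 2, 2),
  wave = c("w1", "w2", "w1", "w2"),
  math = c(80, 85, 70, 78),
  read = c(60, 62, 65, 70)
)

# Long -> wide: names are built as <measure>_<wave>, e.g. math_w1
wide <- pivot_wider(long, names_from = wave, values_from = c(math, read))

# Wide -> long: ".value" keeps the measure part as column names
back <- pivot_longer(wide, cols = -student_id,
                     names_to = c(".value", "wave"), names_sep = "_")
wide
back
```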
@Cghlewis
Crystal Lewis
2 years
I've been working with #rstats dplyr::rows_update a lot lately and just a word of warning to the other users, this function will update *all* values (note the grade level change to NA below). If you only want to update NA values, use dplyr::rows_patch. 🌟
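A minimal sketch of the difference, assuming {dplyr} 1.0 or later; the tables are hypothetical. Both functions match rows on `by`, but `rows_update()` copies every value from the second table (including `NA`), while `rows_patch()` only fills cells that are `NA` in the first:

```r
library(dplyr)

roster <- tibble(id = 1:3, grade = c(3, NA, 5), school = c("A", NA, "B"))

# A follow-up form; grade was left blank (NA) for id 3
followup <- tibble(id = 2:3, grade = c(4, NA), school = c("A", "C"))

# Overwrites matched rows: id 3's grade becomes NA
updated <- rows_update(roster, followup, by = "id")

# Only fills missing cells: id 3 keeps grade 5 and school "B"
patched <- rows_patch(roster, followup, by = "id")

updated
patched
```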
@Cghlewis
Crystal Lewis
2 years
Do you have a dataset with some missing values, and another form with the missing values completed? Yesterday I was reminded about the rows_update() #rstats function!
@Cghlewis
Crystal Lewis
8 days
Nobody will remember:
- your salary
- how “busy you were”
- how many hours you worked
People will remember:
- that data you shared without good documentation
@Cghlewis
Crystal Lewis
7 months
Recently a researcher gave me a dataset, and I asked: What do the values (1,2,3,4) of this variable mean? Their response: The categories are in order in Qualtrics, so the values should be in order. Me: Are you sure about that? 😉 #alwayscheck
@Cghlewis
Crystal Lewis
1 year
I decided to do another week of #rstats code for common #edresearch data cleaning problems! 🤘 Today I'm sharing a simple solution for renaming many variables at once using an existing data dictionary. I use this method all the time in my work and it saves me so much time!
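A minimal sketch of the idea, assuming {dplyr} 1.0 or later; the raw names and the two-column dictionary (`old_name`, `new_name`) are hypothetical. `rename()` accepts a named vector of the form `c(new = "old")` through `all_of()`:

```r
library(dplyr)

# Hypothetical raw export with unhelpful names
raw <- tibble(Q1 = c(1, 2), Q2 = c(3, 4), StudentID = c(101, 102))

# Hypothetical data dictionary mapping old names to new names
dict <- tibble(old_name = c("StudentID", "Q1", "Q2"),
               new_name = c("stu_id", "math_1", "math_2"))

# Build c(new = "old") and rename every variable in one step
lookup <- setNames(dict$old_name, dict$new_name)
clean <- rename(raw, all_of(lookup))
clean
```

`any_of()` in place of `all_of()` would skip dictionary entries that are absent from the data instead of raising an error.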
@Cghlewis
Crystal Lewis
6 months
If there is only one thing that researchers can do to improve the quality of their data, they should create data dictionaries before collecting any data. Plan for what you want to see in your final datasets. This planning leads to more usable data.
@Cghlewis
Crystal Lewis
25 days
Trying to recall a project decision from 2 years ago because I never wrote it down
@Cghlewis
Crystal Lewis
3 months
Current you looking at your past self wondering why you didn't better document your project procedures? 😭
@Cghlewis
Crystal Lewis
2 years
Why I like data cleaning plans. When you want feedback on the transformations you completed on your data, it can be overwhelming to share your code with team members. A data cleaning plan simplifies that conversation.
@Cghlewis
Crystal Lewis
7 months
When appending (stacking) datasets, variable names and types must be consistent across datasets. It's helpful to check these assumptions before appending your files. The #rstats janitor::compare_df_cols() function is great for this.
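A minimal sketch, assuming {janitor} is installed; the two waves are hypothetical, with `id` stored as integer in one and character in the other. The default output lists each column's class in every data frame, and `return = "mismatch"` keeps only the columns whose classes differ:

```r
library(janitor)

wave1 <- data.frame(id = 1:2, score = c(10, 12))
wave2 <- data.frame(id = c("3", "4"), score = c(11, 14))

# One row per column name, one column per data frame showing its class
compare_df_cols(wave1, wave2)

# Only the columns that would cause trouble when stacking
mismatches <- compare_df_cols(wave1, wave2, return = "mismatch")
mismatches
```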
@Cghlewis
Crystal Lewis
3 months
It's so important to use controlled vocabularies in your variable names (consistent phrases). Not only do they make your variable names easier to interpret, they also allow you to easily retrieve and manipulate variables programmatically. See these examples in R. 👇 #rstats
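A minimal sketch with {dplyr}, assuming a hypothetical `<measure>_<wave>` vocabulary. tidyselect helpers such as `starts_with()` and `ends_with()` pick out related variables by name, and `across()` applies one operation to all of them:

```r
library(dplyr)

# Hypothetical data named with a controlled vocabulary: <measure>_<wave>
df <- tibble(stu_id = 1:2,
             math_w1 = c(80, 70), math_w2 = c(85, 78),
             read_w1 = c(60, 65), read_w2 = c(62, 70))

# All math variables, across waves
math_only <- df %>% select(stu_id, starts_with("math_"))

# All variables from wave 2, across measures
second_wave <- df %>% select(stu_id, ends_with("_w2"))

# The same transformation applied to every math variable at once
scaled <- df %>% mutate(across(starts_with("math_"), ~ .x / 100))
```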
@Cghlewis
Crystal Lewis
27 days
Doing one last check for errors in your data.
@Cghlewis
Crystal Lewis
2 years
I've been spending a lot of time thinking through how we move from just knowing data management best practices, to creating an actual workflow that a team can implement. As always, feedback is welcome.
@Cghlewis
Crystal Lewis
2 years
I had such a great time talking about 10 common data management mistakes with the @ssspecf today! Thank you to everyone who joined. Slides are available here!
@Cghlewis
Crystal Lewis
1 month
If you plan to combine data sources, make sure to collect your variables consistently. 🙌
@Cghlewis
Crystal Lewis
3 years
The times I feel happiest in my work are those when I am helping people organize their data, teaching people (and learning new) data management best practices, and writing data cleaning scripts for really gnarly data. Is that weird? Sharing knowledge and data cleaning. Bam.
@Cghlewis
Crystal Lewis
4 months
Cool resource alert! 🚨 @Key2STATS has a repository of open datasets available to use for example purposes. You can either download them or interact with them in their R coding interface.
@Cghlewis
Crystal Lewis
2 months
Once upon a time some of my data management templates were featured in Nature. 😍 Unfortunately they all lived in Google Drive at the time. They now live on OSF. :)
@Cghlewis
Crystal Lewis
2 years
I created a table to compare #rstats packages that are useful for creating codebooks based on 15 criteria. My eyes are getting blurry from staring at this for so long so if I have misrepresented a category for any package, let me know!
@Cghlewis
Crystal Lewis
2 years
ISO people's favorite #rstats packages to make codebooks. Something similar to an SPSS codebook that has each var's info self-contained:
- var name
- var label
- var type
- values and value labels
- NA values and labels
- Total N
- N and % per category, including NA
@Cghlewis
Crystal Lewis
4 months
Working on another data dictionary today, I am reminded again how important these documents are in the research life cycle. Creating data dictionaries early on creates a roadmap for your entire project, leading to higher-quality, more usable data.
@Cghlewis
Crystal Lewis
2 years
I’m going to be that person wearing the band t-shirt to the concert. #rstudioconf2022 bound!
@Cghlewis
Crystal Lewis
3 months
FYI, base #rstats rounds to the even number at .5, rather than always rounding up: 1.5 -> 2, 2.5 -> 2. If you want to always round up at .5, you can use janitor::round_half_up().
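Base R's `round()` follows the IEC 60559 "round half to even" rule. As a base-R sketch of the alternative, `round_half_up_sketch()` below is a hypothetical helper in the spirit of `janitor::round_half_up()`, not that package's code. It rounds halves away from zero, and the small epsilon guards against floating-point representation error nudging a .5 just below the halfway point:

```r
# Base R: halves go to the nearest even number
round(c(0.5, 1.5, 2.5))

# Hypothetical helper: always round .5 away from zero
round_half_up_sketch <- function(x, digits = 0) {
  scale <- 10^digits
  shifted <- abs(x) * scale + 0.5 + sqrt(.Machine$double.eps)
  sign(x) * trunc(shifted) / scale
}

round_half_up_sketch(c(0.5, 1.5, 2.5))
round_half_up_sketch(-2.5)
round_half_up_sketch(0.125, digits = 2)
```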
@Cghlewis
Crystal Lewis
4 months
Received the typeset proof of my book today. Time to get busy on final edits and indexing.✍️
@Cghlewis
Crystal Lewis
4 months
Don't let bad data quality stop your research project from making an impact. #researchdatamanagement
@Cghlewis
Crystal Lewis
24 days
Data Management in Large-Scale Education Research () includes a collection of checklists, templates, and examples to help you organize your data management workflows. All supplemental materials for the book can be found on OSF here:
@Cghlewis
Crystal Lewis
7 months
It's difficult for others to use your data without first reading documentation. Sharing a data dictionary or codebook alongside your data ensures that users correctly interpret variables. A few examples and templates can be found here:
@Cghlewis
Crystal Lewis
3 months
This project provides an example of the data products a researcher may choose to publicly share at the end of a research project, and how they can facilitate reuse by structuring those products in a clear and organized way.
@Cghlewis
Crystal Lewis
3 years
It’s been around for a while but the {gtrendsR} package is great for easily accessing Google trends. #rstats
@Cghlewis
Crystal Lewis
2 years
I started with base R back in 2011 and it never clicked for me. I tried for years to use R and never liked it. Then I found the tidyverse and started using it exclusively and it finally clicked. And now I actually understand base R better. Use what works for you.
@prisonrodeo
Christopher Zorn
2 years
Good morning. If the only things you’ve ever done with R rely on the “tidyverse,” you don’t know R, and can’t claim to. Be sure your students know this.
@Cghlewis
Crystal Lewis
7 months
Today for Love Data Week I am sharing "The Basics of Data Management", an excellent data management primer, especially for those collecting human subjects data in the field. #lovedata24 💜
@Cghlewis
Crystal Lewis
1 year
I'm so excited to let everyone know that I have signed a contract with @CRCPress to publish my book! Grateful for the help of @crcgrubbsd in this process! Once published, the online version will remain freely available as well. 🌟 #edresearch #datamgmt
@Cghlewis
Crystal Lewis
5 months
Yesterday I reviewed ~30 open datasets and there was a spectrum of quality. While I don't believe all data need to be identically formatted, I do think data should meet these standards and include thorough documentation (e.g., a data dictionary and project summary).
@Cghlewis
Crystal Lewis
2 years
At the risk of sounding really uncool, does anyone else ever feel like they just can't keep up? #rstats
@wang_minjie
perlatex
2 years
😂I'm confused. `%>%` or `|>` , which one should I choose? #rstats #tidyverse #dplyr #datascience #ggplot2 #coding
@Cghlewis
Crystal Lewis
14 days
Learn more about file naming and organization in this chapter!
@Cghlewis
Crystal Lewis
3 months
Is training your team on good data management practices a bucket list item for you this year? Do you have a new grant that you would like to get started on the right foot, or existing projects that you want to better organize? I can help! Learn more:
@Cghlewis
Crystal Lewis
2 months
Today is the day! Data Management in Large-Scale Education Research is available for pre-order! 🥳 Also available on Amazon and Barnes and Noble!
@Cghlewis
Crystal Lewis
7 months
Project style guides are excellent tools for ensuring that team members consistently name and structure folders and files as well as dataset variable names and values. Style guides improve interpretation, usability, and reproducibility of files. Learn more
@Cghlewis
Crystal Lewis
2 months
Even with the best data management practices in place, it's good practice to assume some amount of error is inevitable. Always review your datasets for mistakes before finalizing them. See #16 in this checklist for examples of how to conduct checks.
@Cghlewis
Crystal Lewis
5 days
Ever have multiple datasets, maybe collected longitudinally, that you need to combine into a larger dataset? What is the best way to combine them and how can we combine them using #rstats ? Check out this blog for ideas!
@Cghlewis
Crystal Lewis
2 years
A question I’ve always wanted to ask: Is there any reason I should not use R Notebooks for data cleaning? I have no output that needs to be rendered but I like organizing my steps in code chunks. It makes my code more readable than adding comments to syntax. #rstats
@Cghlewis
Crystal Lewis
1 year
The best way to code missing data is a divisive topic. There is very little agreement on which method to use. Here's an interesting table from this 2013 publication:
@Cghlewis
Crystal Lewis
4 months
Data management can feel overwhelming. Where to begin? Which practices should I use and when should I implement them? This Appendix summarizes common activities that occur in each phase of research into a digestible list for your project. #edresearch
@Cghlewis
Crystal Lewis
2 years
Also, I finally decided to share my new website! It took me a bit. Believe it or not, I actually really dislike promoting myself, not a great trait for an entrepreneur! :) Check the site for current and future #datamgmt and #rstats content!
@Cghlewis
Crystal Lewis
4 months
Quick introduction for new followers!👋 I’m a freelance research data management consultant. I share resources on how to create better project workflows. I also recently finished this book which will be available in print soon but is also open access here:
@Cghlewis
Crystal Lewis
9 months
Is your longitudinal data currently in long format, but you need it in wide format, or vice versa? No worries! It is easy to restructure your data in R using the {tidyr} package. #rstats You can find more examples here:
@Cghlewis
Crystal Lewis
4 months
Creating style guides for your projects (rules for formatting information) improves interoperability, interpretation, and reproducibility. It is helpful to have style guides for: ✔️Structuring directories ✔️Naming files ✔️Naming variables Learn more here:
@Cghlewis
Crystal Lewis
1 year
A comparison of character limits for common statistical programs.
@Cghlewis
Crystal Lewis
3 years
I have finally finished the next module in my education data mgmt training series: Writing a data cleaning plan! I hope this is helpful for all education researchers looking for data management guidance! Feedback welcome! #edresearch #datamgmt
@Cghlewis
Crystal Lewis
2 years
I need to remember to always double my estimate of how long I think it will take me to clean data. I'm always so optimistic! And then I start digging in and seeing what is really going on.
@Cghlewis
Crystal Lewis
1 year
It's Friday! And I'm so happy that I've written another chapter for the book (and revised several others). Only 3 more chapters to write in 5 weeks. I can do this right? 😂🙏
@Cghlewis
Crystal Lewis
7 months
I am so excited that @rfortherest is starting this email series: What's New in R! I was just talking about how difficult it is to keep up with the latest developments, especially with the dwindling #rstats community. If you are interested, sign up here:
@Cghlewis
Crystal Lewis
18 days
In our workshop today I shared this resource with many #rstats users who are looking for help getting started with common data wrangling functions. This wiki might be helpful for you as well.
@Cghlewis
Crystal Lewis
10 months
When you’re managing longitudinal surveys
@Cghlewis
Crystal Lewis
2 years
If you've been looking for a list of research data management resources you can share with your team or students, look no further! I've compiled my go-to resources in my first ever blog post! ✍️ I hope it’s helpful! #datamgmt #edresearch
@Cghlewis
Crystal Lewis
3 years
In @SamanthaCsik 's talk last night, we googled alternate ways to handle case sensitivity in #rstats {stringr}. Normally I first mutate the column to lower case, but more simply you can wrap the pattern in regex() with the argument ignore_case = TRUE, or add (?i) to the start of the pattern! 🤯
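A minimal sketch of both options, assuming {stringr} is installed; the `subjects` vector is made up:

```r
library(stringr)

subjects <- c("Math", "MATH", "math club", "Reading")

# Option 1: wrap the pattern in regex() with ignore_case = TRUE
with_regex <- str_detect(subjects, regex("math", ignore_case = TRUE))

# Option 2: put the inline flag (?i) at the start of the pattern
with_flag <- str_detect(subjects, "(?i)math")

with_regex
with_flag
```

Neither option modifies the data itself, unlike lower-casing the column first with `str_to_lower()`.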
@Cghlewis
Crystal Lewis
1 year
Sometimes I worry that the data mgmt info I share is too basic. But I continue to see poorly managed data and I'm reminded that many of us aren't taught how to manage data and that while organizing data may seem "basic" in theory, it is actually fairly complicated in practice.